Anubis
Revision as of 23:48, 8 June 2025
Anubis is a "self hostable scraper defense software"[1] that is being adopted especially by open source projects struggling under the load that aggressive AI/LLM scrapers place on their infrastructure.
Since the bots and crawlers we employ match the description as well, they can also get detected and blocked by Anubis.
How it works
Anubis is a man-in-the-middle HTTP proxy that requires clients to solve (or to have already solved) a proof-of-work challenge before they can access the site. This is a very simple way to block the most common AI scrapers, because they cannot execute JavaScript to solve the challenge, and the scrapers that can execute JavaScript usually don't support the modern JavaScript features that Anubis requires. If a scraper is dedicated enough to solve the challenge, Anubis lets it through, because at that point it is functionally a browser.
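The proof-of-work idea can be sketched as a hash-puzzle: the client must find a nonce that makes a hash of the challenge start with a number of zeroes. This is a minimal illustration of the general technique, not Anubis's actual scheme; the hash input, encoding, and difficulty are assumptions.

```python
import hashlib
from itertools import count

def solve_challenge(challenge: str, difficulty: int = 4) -> int:
    """Find a nonce so that SHA-256(challenge + nonce) begins with
    `difficulty` hex zeroes. Cheap for a real browser to do once,
    expensive for a scraper to repeat across millions of requests."""
    target = "0" * difficulty
    for nonce in count():
        digest = hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()
        if digest.startswith(target):
            return nonce

def verify(challenge: str, nonce: int, difficulty: int = 4) -> bool:
    """The server-side check is a single hash, so verification is cheap."""
    digest = hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()
    return digest.startswith("0" * difficulty)
```

The asymmetry (many hashes to solve, one hash to verify) is what makes this viable as a filter: the cost lands on the client.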
The most hilarious part about how Anubis is implemented is that it triggers challenges for every request with a User-Agent containing "Mozilla". Nearly all AI scrapers (and browsers) use a User-Agent string that includes "Mozilla" in it. This means that Anubis is able to block nearly all AI scrapers without any configuration. —Xe[2]
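The trigger described above amounts to a substring check on the User-Agent header. A minimal sketch (Anubis itself is not written in Python; this only illustrates the default heuristic):

```python
def should_challenge(user_agent: str) -> bool:
    """Default heuristic as described: any client whose User-Agent
    contains "Mozilla" is served the proof-of-work challenge.
    Browsers and nearly all AI scrapers match; tools like curl do not."""
    return "Mozilla" in user_agent
```

Because almost every browser-impersonating scraper inherits a "Mozilla/5.0 …" User-Agent string, this one check catches nearly all of them with zero configuration.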
Problems caused by Anubis
As the default user agent for ArchiveBot includes the word "Mozilla"[3], it triggers the Anubis challenge as well and ultimately fails because it can't solve it. After a friendly discussion in the #archiveteam-bs channel[4], a temporary solution was found by making ArchiveBot use the 'curl' user agent, which doesn't trigger the challenge; however, the curl user agent could trigger other errors on outlinks.
Wikibot is affected by this as well, but since it can set an arbitrary User-Agent header, the challenge is easy to work around.
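The workaround described above is simply to send a User-Agent that does not contain "Mozilla". A hedged sketch with Python's standard library (the URL is a placeholder, and the chosen User-Agent string is an example, not the exact value any of these bots use):

```python
import urllib.request

# Build a request with a non-"Mozilla" User-Agent, which does not
# trigger Anubis's default challenge heuristic. The target URL here
# is a placeholder, not a real Anubis-protected site.
req = urllib.request.Request(
    "https://example.org/",
    headers={"User-Agent": "curl/8.5.0"},  # no "Mozilla" substring
)
# To actually fetch the page:
# with urllib.request.urlopen(req) as resp:
#     body = resp.read()
```

As noted above for ArchiveBot, this can backfire on other sites that treat non-browser user agents differently, so it is a per-site workaround rather than a general fix.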
Projects and websites known to deploy Anubis
- The Linux Kernel Mailing List archives[5]
- FreeBSD's SVN (and soon git)[5]
- SourceHut[5]
- FFmpeg[5]
- Wine[5]
- UNESCO[5]
- The Science Olympiad Student Center[5]
- Enlightenment (the desktop environment)[5]
- GNOME's GitLab[5]
- https://wiki.archlinux.org/
- https://muc.ccc.de/ and subdomains (they thankfully added an exception rule for Wikibot!)
- The Battle for Wesnoth
- https://osmocom.org/
- https://wiki.freecad.org/
- https://wiki.fhs.sh/
- https://wiki.alopex.li/
- https://wiki.freepascal.org/
- https://www.scioly.org/
- https://wiki.scummvm.org/
- https://azeyma.within.lgbt/git/
- https://forgejo.lyrion.ch/
- https://git.bottomservices.club/
- https://git.colean.cc/
- https://git.owlcode.tech/
- https://git.g3la.de/
- https://tangled.sh/
- https://git.average.name/user/sign_up
- linuxtv.org
- shoutwiki.com (some pages, not all)
- https://bugs.dolphin-emu.org/
- https://wiki.dolphin-emu.org/
- https://fifo.ci/
- https://bugzilla.proxmox.com/
Resources
References
1. https://anubis.techaro.lol/
2. https://xeiaso.net/blog/2025/anubis/
3. https://github.com/ArchiveTeam/ArchiveBot/blob/a975ff994126f60b5b534668ee54b96c29d51707/pipeline/pipeline.py#L124
4. https://irclogs.archivete.am/archiveteam-bs/2025-03-26#lf8cbcc04
5. https://xeiaso.net/notes/2025/anubis-works/