Anubis
Revision as of 12:33, 25 May 2025 (latest edit by Cooljeanius: added The Battle for Wesnoth to "Projects and websites known to deploy Anubis")
Anubis is a "self hostable scraper defense software"[1] that is increasingly being adopted, especially by open-source projects struggling under the load that aggressive AI/LLM scrapers place on their infrastructure.
Because the bots and crawlers we run fit the same description, Anubis can detect and block them as well.
How it works
Anubis is a man-in-the-middle HTTP proxy: it sits in front of a site and requires each client to solve a proof-of-work challenge, or to have already solved one, before it passes the client's requests through. This simple measure blocks the most common AI scrapers, because they cannot execute JavaScript and therefore cannot solve the challenge. Scrapers that can execute JavaScript usually lack support for the modern JavaScript features that Anubis requires. A scraper that is dedicated enough to solve the challenge is let through, because at that point it is functionally a browser.
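The following minimal Python sketch illustrates the proof-of-work idea under one assumption: a hashcash-style scheme. In this scheme, the client searches for a nonce such that the SHA-256 hex digest of the challenge string followed by the nonce starts with a given number of zeros (the difficulty). The names `solve` and `verify` and the challenge string are hypothetical. This is not Anubis's actual code.

```python
import hashlib


def _digest(challenge: str, nonce: int) -> str:
    # Hash the challenge string followed by the decimal nonce.
    return hashlib.sha256(f"{challenge}{nonce}".encode("utf-8")).hexdigest()


def solve(challenge: str, difficulty: int) -> tuple[int, str]:
    """Client side (hypothetical): brute-force a nonce whose digest starts
    with `difficulty` zero hex digits. Expected cost: about 16**difficulty hashes."""
    target = "0" * difficulty
    nonce = 0
    while True:
        digest = _digest(challenge, nonce)
        if digest.startswith(target):
            return nonce, digest
        nonce += 1


def verify(challenge: str, difficulty: int, nonce: int, digest: str) -> bool:
    """Server side (hypothetical): a single hash is enough to check the work."""
    return digest == _digest(challenge, nonce) and digest.startswith("0" * difficulty)


if __name__ == "__main__":
    nonce, digest = solve("example-challenge", 3)
    print(nonce, digest)
    print(verify("example-challenge", 3, nonce, digest))  # True
```

The asymmetry is the point of the design. At difficulty 3 the client needs on average about 4096 hashes, while the proxy checks the result with one.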
The most hilarious part about how Anubis is implemented is that it triggers challenges for every request with a User-Agent containing "Mozilla". Nearly all AI scrapers (and browsers) use a User-Agent string that includes "Mozilla" in it. This means that Anubis is able to block nearly all AI scrapers without any configuration. —Xe[2]
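The default rule quoted above amounts to a substring check on the User-Agent header. The hypothetical Python sketch below models that rule. It also includes an allowlist of the kind of exception rule a site operator might add. The names `needs_challenge` and `ALLOWED_AGENTS` and all example User-Agent strings are assumptions for illustration. This is not Anubis's configuration format.

```python
# Hypothetical allowlist: User-Agent substrings an operator has exempted
# (for example, an archiving bot). Checked before the "Mozilla" rule.
ALLOWED_AGENTS = ("ExampleArchiveBot",)


def needs_challenge(user_agent: str) -> bool:
    """Return True if a request with this User-Agent would be challenged
    under the simple rule described above (hypothetical model)."""
    if any(token in user_agent for token in ALLOWED_AGENTS):
        return False  # an explicit exception rule takes precedence
    return "Mozilla" in user_agent  # matches nearly every browser and many scrapers


if __name__ == "__main__":
    browser_ua = "Mozilla/5.0 (X11; Linux x86_64; rv:128.0) Gecko/20100101 Firefox/128.0"
    print(needs_challenge(browser_ua))                           # True
    print(needs_challenge("curl/8.5.0"))                         # False
    print(needs_challenge("Mozilla/5.0 ExampleArchiveBot/1.0"))  # False
```

The rule only inspects a self-reported header. A client whose User-Agent does not mention "Mozilla" is therefore not challenged by it.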
Problems caused by Anubis
Because ArchiveBot's default user agent includes the word "Mozilla"[3], it triggers the Anubis challenge as well and ultimately fails, since it cannot solve it. After a friendly discussion in the #archiveteam-bs channel[4], a temporary solution was found: making ArchiveBot send a 'curl' user agent, which does not trigger the challenge.
Wikibot is presumably still affected, because its user agent cannot be changed yet.
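The workaround boils down to sending a different User-Agent header. ArchiveBot's own pipeline is not reproduced here. As a generic illustration, this Python sketch builds a request with the standard library's `urllib.request` and overrides the header. The URL, the helper name `make_request`, and the `curl/8.5.0` value are placeholders chosen for illustration.

```python
import urllib.request


def make_request(url: str, user_agent: str = "curl/8.5.0") -> urllib.request.Request:
    """Build (but do not send) a GET request with an explicit User-Agent.
    Without this header, urllib would send its default "Python-urllib/x.y"."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


if __name__ == "__main__":
    req = make_request("https://example.org/")
    # urllib normalizes header names with str.capitalize(), hence "User-agent".
    print(req.get_header("User-agent"))  # curl/8.5.0
    # Passing req to urllib.request.urlopen() would actually send it.
```

Any HTTP client with a configurable header works the same way. What matters is only that the string no longer contains "Mozilla".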
Projects and websites known to deploy Anubis
- The Linux Kernel Mailing List archives[5]
- FreeBSD's SVN (and soon git)[5]
- SourceHut[5]
- FFmpeg[5]
- Wine[5]
- UNESCO[5]
- The Science Olympiad Student Center[5]
- Enlightenment (the desktop environment)[5]
- GNOME's GitLab[5]
- https://wiki.archlinux.org/
- https://muc.ccc.de/ and subdomains (they thankfully added an exception rule for Wikibot!)
- The Battle for Wesnoth
Resources
References
- [1] https://anubis.techaro.lol/
- [2] https://xeiaso.net/blog/2025/anubis/
- [3] https://github.com/ArchiveTeam/ArchiveBot/blob/a975ff994126f60b5b534668ee54b96c29d51707/pipeline/pipeline.py#L124
- [4] https://irclogs.archivete.am/archiveteam-bs/2025-03-26#lf8cbcc04
- [5] https://xeiaso.net/notes/2025/anubis-works/