From Archiveteam
Revision as of 00:46, 8 May 2019 by ATrescue (talk | contribs) (→‎UsageChromeBot usage documentation on GitHub: Rearrangement. Looks less distorted when reading commands in table.)

chromebot is an IRC bot, parallel to ArchiveBot, that uses Google Chrome and is therefore able to archive JavaScript-heavy and bottomless (infinite-scrolling) websites. Both the software and the bot are maintained by User:PurpleSymphony. WARCs are uploaded daily to the chromebot collection on

By default, the bot grabs only a single URL. It also supports recursion, which is rather slow because every single page has to be loaded and rendered by a browser. A dashboard is available for watching the progress of such jobs.


You can call chromebot on the #archivebot (on hackint) IRC channel, which chromebot shares with its parent, ArchiveBot. The bot answers to both “chromebot” and “chromebot:”, that is, with or without the trailing colon. The username can be autocompleted using the “Tab” key in the EFNet web chat interface or in an IRC client.

Commands, each group followed by its description:

chromebot: a <url>
chromebot: a <url> <concurrency>
chromebot: a <url> <concurrency> <policy>
chromebot a <url>
chromebot a <url> <concurrency>
chromebot a <url> <concurrency> <policy>
Archive <url> with <concurrency> processes according to recursion <policy>.

chromebot: s <uuid>
chromebot s <uuid>
Get job status for <uuid>.

chromebot: r <uuid>
chromebot r <uuid>
Revoke or abort the running job with <uuid>.

Please note that the commands are case-sensitive.
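
As an illustration of the command syntax above, the following minimal Python sketch assembles the message text for each command. The helper name `chromebot_command` and its parameters are hypothetical and not part of chromebot. The sketch assumes that a policy can only be given together with a concurrency value, as the command forms above suggest.

```python
from typing import Optional


def chromebot_command(action: str, target: str,
                      concurrency: Optional[int] = None,
                      policy: Optional[str] = None,
                      colon: bool = True) -> str:
    """Build the IRC message text for a chromebot command (hypothetical helper)."""
    # Commands are case-sensitive, so only the lowercase letters are accepted.
    if action not in ("a", "s", "r"):
        raise ValueError(f"unknown action: {action!r}")
    # Only the archive command takes the optional arguments.
    if action != "a" and (concurrency is not None or policy is not None):
        raise ValueError("concurrency and policy only apply to 'a'")
    # Assumption: the policy is positional and must follow a concurrency value.
    if policy is not None and concurrency is None:
        raise ValueError("a policy can only follow a concurrency value")
    parts = ["chromebot:" if colon else "chromebot", action, target]
    if concurrency is not None:
        parts.append(str(concurrency))
    if policy is not None:
        parts.append(policy)
    return " ".join(parts)


# Archive a single URL with two processes; the URL is a placeholder.
print(chromebot_command("a", "https://example.com/", 2))
```

The resulting string is what would be typed into the channel. The uppercase forms are rejected up front because the bot treats commands as case-sensitive.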


chromebot has been blacklisted by Instagram, a website notorious for being difficult to archive.

When asked to archive any Instagram URL, chromebot responds with the following error:

<URL> cannot be queued: Banned by Instagram

One way to partially bypass Instagram's restrictions is to use a third-party web viewer for Instagram that offers an AJAX-free user search and can display profiles without Instagram's new web-app-style website (similar to Twitter Lite). That new website made Instagram inaccessible to the crawlers of the Wayback Machine and Archive.Today; the former gets stuck in an infinite refresh loop.

URL format:

Another way around the restriction is ArchiveBot, which is not blocked by Instagram. Use the snscrape tool to collect the URLs of the posts into a text file, upload that list to a file-hosting service, and pass the link to ArchiveBot's !ao < <link to list file> command.
Pages captured from Instagram this way do store the information, but they cannot be viewed in the version ingested into the Wayback Machine: playback gets stuck in an infinite refresh loop because of Instagram's heavy use of JavaScript (web-app type).
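
The list-building step can be sketched in Python as follows. This is only an illustration under stated assumptions: it assumes the post URLs arrive as plain text, one URL per line (for example, saved snscrape output), and the function names `build_url_list`, `url_list_text`, and `ao_command` are hypothetical. The upload itself is left out, and the list link in the example is a placeholder.

```python
from typing import Iterable, List


def build_url_list(lines: Iterable[str]) -> List[str]:
    """Return the unique http(s) URLs from the input lines, in first-seen order."""
    seen = set()
    urls = []
    for line in lines:
        url = line.strip()
        # Skip blank lines and anything that does not look like a web URL.
        if not url.startswith(("http://", "https://")):
            continue
        if url not in seen:
            seen.add(url)
            urls.append(url)
    return urls


def url_list_text(urls: Iterable[str]) -> str:
    """Render the list file contents: one URL per line."""
    return "".join(url + "\n" for url in urls)


def ao_command(list_url: str) -> str:
    """Format the ArchiveBot command that consumes an uploaded list file."""
    return f"!ao < {list_url}"


# Placeholder input standing in for scraped post URLs.
scraped = [
    "https://www.instagram.com/p/AAA/",
    "https://www.instagram.com/p/AAA/",
    "https://www.instagram.com/p/BBB/",
]
print(url_list_text(build_url_list(scraped)), end="")
print(ao_command("https://example.org/list.txt"))
```

Removing duplicates before uploading keeps the job from fetching the same post twice; whether ArchiveBot would deduplicate on its own is not covered here.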

CloudFlare DDoS protection

Another obstacle for both this bot and ArchiveBot is CloudFlare's DDoS protection, which can prevent the bots from capturing a web page.