Chromebot

From Archiveteam
Revision as of 21:48, 26 April 2019 by ATrescue (talk | contribs) (Mentioning ChromeBot's response to Instagram URL archival attempt.)

chromebot is an IRC bot that runs in parallel to ArchiveBot. It uses Google Chrome and is therefore able to archive JavaScript-heavy websites. Both the software and the bot are maintained by User:PurpleSymphony. WARCs are uploaded daily to the chromebot collection on archive.org.

By default, the bot grabs only a single URL. It also supports recursion, but this is rather slow because every page has to be loaded and rendered by a browser. A dashboard is available for watching the progress of such jobs.

Usage[1]

You can call chromebot on the #archivebot (on hackint) IRC channel, which chromebot shares with its parent ArchiveBot. The bot responds to both “chromebot” and “chromebot:”, so the trailing colon is optional. The username can be autocompleted using the “Tab” key in the EFNet web chat interface or in your IRC client.

Command                Description
chromebot: a <url>     Archive <url> with <concurrency> processes according to recursion <policy>.
chromebot a <url>
chromebot: s <uuid>    Get job status for <uuid>.
chromebot s <uuid>
chromebot: r <uuid>    Revoke or abort running job with <uuid>.
chromebot r <uuid>

Please note that the commands are case-sensitive.
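
As a rough illustration of the command format described above, the following Python sketch builds the message text you would type into the channel. The function name build_command and its parameters are hypothetical and are not part of chromebot itself. The sketch covers only the one-letter command and a single argument, and it does not send anything to IRC.

```python
# Hypothetical helper, not part of chromebot: it only formats the text
# of a message addressed to the bot, following the command table above.

# One-letter commands listed on this page: archive, status, revoke.
VALID_ACTIONS = {"a", "s", "r"}


def build_command(action: str, argument: str, colon: bool = True) -> str:
    """Return the message text for a chromebot command.

    action   -- "a", "s" or "r" (case-sensitive, as noted above)
    argument -- the URL for "a", or the job UUID for "s" and "r"
    colon    -- whether to address the bot as "chromebot:" or "chromebot"
    """
    if action not in VALID_ACTIONS:
        raise ValueError(f"unknown command: {action!r}")
    if not argument or any(ch.isspace() for ch in argument):
        raise ValueError("argument must be a single non-empty token")
    prefix = "chromebot:" if colon else "chromebot"
    return f"{prefix} {action} {argument}"


if __name__ == "__main__":
    print(build_command("a", "https://example.com/"))
    print(build_command("s", "<uuid>", colon=False))
```

Because the commands are case-sensitive, the sketch rejects an uppercase "A" rather than silently lowercasing it.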


Restrictions

Instagram.com

chromebot has been blacklisted by Instagram, a website infamous for being difficult to archive.

When asked to archive any Instagram.com URL, chromebot responds with the following error:

<Instagram.com URL> cannot be queued: Banned by Instagram
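
To avoid submitting a URL that will be rejected, you could check the hostname before asking the bot. The Python sketch below is a client-side guess only: how chromebot itself decides that a URL belongs to Instagram is not documented here, and the function name is_instagram_url is hypothetical.

```python
from urllib.parse import urlsplit


def is_instagram_url(url: str) -> bool:
    """Return True if the URL's host is instagram.com or a subdomain of it.

    Assumption: this is an illustrative check only and may differ from
    the rule chromebot actually applies.
    """
    # Without a scheme, urlsplit treats the host as part of the path,
    # so add "//" to make it parse as a network location.
    if "://" not in url:
        url = "//" + url
    host = urlsplit(url).hostname or ""  # hostname is already lowercased
    return host == "instagram.com" or host.endswith(".instagram.com")


if __name__ == "__main__":
    for candidate in ("https://www.instagram.com/p/abc/", "https://example.com/"):
        print(candidate, is_instagram_url(candidate))
```

Matching on the parsed hostname, rather than searching the whole string, avoids false positives such as a path that merely contains "instagram.com".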

References