Difference between revisions of "Chromebot"

From Archiveteam
Revision as of 22:57, 9 May 2019

chromebot (a.k.a. crocoite) is an IRC bot that runs in parallel to ArchiveBot. It uses Google Chrome and can therefore archive JavaScript-heavy and bottomless (endlessly scrolling) websites. Both the software and the bot are maintained by User:PurpleSymphony. WARCs are uploaded daily to the chromebot collection on archive.org.

By default, the bot grabs only a single URL. It also supports recursion, but this is rather slow because every single page has to be loaded and rendered by a browser. A dashboard is available for watching the progress of such jobs.

Usage

See also: crocoite usage documentation on GitHub.

You can call chromebot in the #archivebot IRC channel on hackint, which it shares with ArchiveBot. The bot can be addressed either as “chromebot” or as “chromebot:”; the colon is optional.

Command
    Description

chromebot: a <url>
chromebot: a <url> <concurrency>
chromebot: a <url> <concurrency> <policy>
chromebot a <url>
chromebot a <url> <concurrency>
chromebot a <url> <concurrency> <policy>
    Archive <url> with <concurrency> processes according to recursion <policy>.

chromebot: s <uuid>
chromebot s <uuid>
    Get job status for <uuid>.

chromebot: r <uuid>
chromebot r <uuid>
    Revoke or abort running job with <uuid>.

Please note that the commands are case-sensitive.
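
The command grammar above can be illustrated with a small parser. This is only a sketch for readers who want to validate a line before sending it, not crocoite's actual implementation. The names Command and parse_command are hypothetical, and treating <concurrency> as an integer is an assumption; <policy> is passed through as an opaque string because its valid values are not listed here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Command:
    action: str                        # "a" (archive), "s" (status) or "r" (revoke)
    target: str                        # URL for "a", job UUID for "s" and "r"
    concurrency: Optional[int] = None  # assumption: an integer process count
    policy: Optional[str] = None       # opaque recursion policy name


def parse_command(line: str, nick: str = "chromebot") -> Optional[Command]:
    """Parse one IRC line; return None if it is not a well-formed bot command."""
    parts = line.split()
    # The bot can be addressed with or without a trailing colon.
    if len(parts) < 3 or parts[0] not in (nick, nick + ":"):
        return None
    action, args = parts[1], parts[2:]
    # Commands are case-sensitive, so "A" or "S" fall through and are rejected.
    if action == "a" and 1 <= len(args) <= 3:
        concurrency = None
        if len(args) >= 2:
            try:
                concurrency = int(args[1])
            except ValueError:
                return None
        policy = args[2] if len(args) == 3 else None
        return Command("a", args[0], concurrency, policy)
    if action in ("s", "r") and len(args) == 1:
        return Command(action, args[0])
    return None
```

For example, parse_command("chromebot: a https://example.com 2") yields an archive command with a concurrency of 2, while parse_command("chromebot: A https://example.com") returns None because of the upper-case command letter.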

Restrictions

Instagram

chromebot has been blacklisted by Instagram. Any attempt to archive an Instagram.com URL is rejected with the following error:

<Instagram.com URL> cannot be queued: Banned by Instagram
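
To avoid submitting a URL that will be rejected anyway, a client could check the host name first. The sketch below is an assumption about how such a check might look, not the bot's own code. The function name instagram_rejection is hypothetical; only the message format is taken from the error shown above.

```python
from typing import Optional
from urllib.parse import urlsplit


def instagram_rejection(url: str) -> Optional[str]:
    """Return the expected rejection message for Instagram URLs, else None."""
    # hostname is None for URLs without a scheme, which are not matched here.
    host = (urlsplit(url).hostname or "").lower()
    if host == "instagram.com" or host.endswith(".instagram.com"):
        return f"{url} cannot be queued: Banned by Instagram"
    return None
```

The check matches instagram.com and its subdomains only, so an unrelated host that merely ends in the same letters is not flagged.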

Cloudflare DDoS protection

chromebot should be able to circumvent Cloudflare's DDoS protection, but scrolling and other behaviour may be disabled after the reload (issue #13 on GitHub).