Revision as of 12:58, 22 May 2019
chromebot aka. crocoite is an IRC bot parallel to ArchiveBot that uses Google Chrome and thus is able to archive JavaScript-heavy and bottomless websites. Both, software and bot, are maintained by User:PurpleSymphony. WARCs are uploaded daily to the chromebot collection on archive.org.
By default the bot only grabs a single URL. However, it also supports recursion, which is rather slow, since every single page needs to be loaded and rendered by a browser. A dashboard is available for watching the progress of such jobs.
Usage
crocoite usage documentation on GitHub
You can call chromebot on the #archivebot IRC channel (on hackint), which chromebot shares with ArchiveBot. Both "chromebot" and "chromebot:" work, i.e. with or without the colon.
Command | Description
---|---
chromebot: a <url> -r <policy> -j <concurrency> | Archive <url> with <concurrency> processes according to recursion <policy>.
chromebot: s <uuid> | Get job status for <uuid>.
chromebot: r <uuid> | Revoke or abort running job with <uuid>.
Please note that the commands are case-sensitive.
URL lists can be archived using recursion, for example:
chromebot: a https://transfer.notkiska.pw/inline/UpfR/HollyConrad-tweets -r 1 -j 4
chromebot will assume that all lines starting with http:// or https:// are valid links. Note that the list itself must be returned by the server as an *inline* document, not as a download (attachment).
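The line-filtering rule described above can be sketched as follows. This is an illustration of the stated behaviour (only lines beginning with http:// or https:// are treated as links), not chromebot's actual code; the function name is hypothetical.

```python
def extract_links(text: str) -> list[str]:
    """Return the lines of a URL list that chromebot would treat as links,
    i.e. lines starting with http:// or https:// (per the rule above)."""
    links = []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("http://") or line.startswith("https://"):
            links.append(line)
    return links

# Example: comment lines and plain text are ignored, only URLs survive.
sample = """# Holly Conrad tweets
https://example.com/page1
not a link
http://example.com/page2
"""
print(extract_links(sample))
# → ['https://example.com/page1', 'http://example.com/page2']
```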
Restrictions
chromebot has been blacklisted by Instagram. When trying to archive any Instagram.com website, chromebot responds with the following error:
<Instagram.com URL> cannot be queued: Banned by Instagram
Cloudflare DDoS protection
chromebot should be able to circumvent Cloudflare's DDoS protection, but scrolling and other behaviour may be disabled after the reload (issue #13 on GitHub).