Difference between revisions of "Chromebot"
Revision as of 22:53, 9 May 2019
chromebot, also known as crocoite, is an IRC bot that runs alongside ArchiveBot. It uses Google Chrome and can therefore archive JavaScript-heavy and bottomless (endlessly scrolling) websites. Both the software and the bot are maintained by User:PurpleSymphony. WARCs are uploaded daily to the chromebot collection on archive.org.
By default, the bot grabs only a single URL. It also supports recursion, but this is rather slow because every page has to be loaded and rendered by a browser. A dashboard is available for watching the progress of such jobs.
Usage
crocoite usage documentation on GitHub
You can call chromebot on the #archivebot (on hackint) IRC channel, which chromebot shares with ArchiveBot. Both “chromebot” and “chromebot:” work, that is, the bot can be addressed with or without the colon.
Command | Description |
---|---|
 | Archive <url> with <concurrency> processes according to recursion <policy>. |
chromebot: s <uuid> or chromebot s <uuid> | Get job status for <uuid>. |
chromebot: r <uuid> or chromebot r <uuid> | Revoke or abort running job with <uuid>. |
Please note that the commands are case-sensitive.
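The addressing rules above (nickname with or without the colon, case-sensitive commands) can be illustrated with a small sketch. This is not crocoite's actual parser; `parse_line`, `Command` and `ACTIONS` are hypothetical names, only the `s` and `r` commands from the table are handled, and it is an assumption that `<uuid>` is a standard UUID string.

```python
import uuid
from typing import NamedTuple, Optional


class Command(NamedTuple):
    action: str        # "status" or "revoke" (labels chosen for this sketch)
    job_id: uuid.UUID  # assumption: job IDs are standard UUIDs


# Only the two commands whose syntax appears in the table above.
ACTIONS = {"s": "status", "r": "revoke"}


def parse_line(line: str) -> Optional[Command]:
    """Parse 'chromebot[:] <s|r> <uuid>'; return None for anything else."""
    parts = line.split()
    if len(parts) != 3:
        return None
    nick, verb, arg = parts
    # The bot answers to its nickname with or without the trailing colon.
    if nick not in ("chromebot", "chromebot:"):
        return None
    # Commands are case-sensitive, so "S" or "R" are rejected here.
    if verb not in ACTIONS:
        return None
    try:
        job_id = uuid.UUID(arg)
    except ValueError:
        return None
    return Command(ACTIONS[verb], job_id)
```

Calling `parse_line("chromebot s <uuid>")` and `parse_line("chromebot: s <uuid>")` with the same valid UUID yields the same result, while an upper-case `S` is rejected.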
Restrictions
Instagram.com
chromebot has been blacklisted by Instagram. When asked to archive any Instagram.com URL, chromebot responds with the following error:
<Instagram.com URL> cannot be queued: Banned by Instagram
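A script that reads the bot's replies could recognise this refusal by its fixed wording. The following is a minimal sketch, assuming the reply text matches the message quoted above exactly; `rejected_url` and `SUFFIX` are hypothetical names, not part of crocoite.

```python
from typing import Optional

# Assumption: the bot's wording is exactly as quoted in the text above.
SUFFIX = " cannot be queued: Banned by Instagram"


def rejected_url(reply: str) -> Optional[str]:
    """Return the URL the bot refused to queue, or None for any other reply."""
    reply = reply.strip()
    if reply.endswith(SUFFIX):
        return reply[: -len(SUFFIX)]
    return None
```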
Cloudflare DDoS protection
chromebot should be able to get past Cloudflare's DDoS protection, but scrolling and other behaviour may be disabled after the page reload involved (issue #13 on GitHub).