Difference between revisions of "Chromebot"
(Mentioning a way to bypass Instagram's restriction.) |
m (Correct website name.) |
Revision as of 22:02, 26 April 2019
chromebot is an IRC bot that runs parallel to ArchiveBot and uses Google Chrome, which lets it archive JavaScript-heavy websites. Both the software and the bot are maintained by User:PurpleSymphony. WARCs are uploaded daily to the chromebot collection on archive.org.
By default, the bot grabs only a single URL. It also supports recursion, but this is rather slow because every page has to be loaded and rendered by a browser. A dashboard is available for watching the progress of such jobs.
Usage[1]
You can call chromebot on the #archivebot IRC channel (on hackint), which chromebot shares with its parent, ArchiveBot. Both “chromebot” and “chromebot:” work as the command prefix; the colon is optional. The username can be autocompleted using the “↹Tab” key in the EFNet web chat interface or in an IRC client.
Command | Description |
---|---|
chromebot: a <url> | Archive <url> with <concurrency> processes according to recursion <policy>. |
chromebot: s <uuid> or chromebot s <uuid> | Get job status for <uuid>. |
chromebot: r <uuid> or chromebot r <uuid> | Revoke or abort running job with <uuid>. |
Please note that the commands are case-sensitive.
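As a rough illustration of the command syntax, the following Python sketch assembles the message text for the status and revoke commands. The function name `build_command` and its parameters are hypothetical and not part of chromebot; the sketch also assumes that job identifiers are standard UUIDs, as the `<uuid>` placeholder suggests.

```python
import uuid

PREFIX = "chromebot"
# Only the status ("s") and revoke ("r") commands from the table are covered.
JOB_ACTIONS = {"s", "r"}


def build_command(action: str, job_id: str, with_colon: bool = True) -> str:
    """Return the IRC message text for a status or revoke command."""
    # Commands are case-sensitive, so "S" or "R" is rejected, not lowercased.
    if action not in JOB_ACTIONS:
        raise ValueError(f"unsupported action: {action!r}")
    # uuid.UUID raises ValueError on malformed input; str() yields the
    # canonical lowercase, hyphenated form.
    canonical_id = str(uuid.UUID(job_id))
    prefix = PREFIX + ":" if with_colon else PREFIX
    return f"{prefix} {action} {canonical_id}"
```

The returned string still has to be sent to the channel by an IRC client; the sketch only formats the text.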
Restrictions
Instagram.com
chromebot has been blacklisted by Instagram, a website notorious for being hard to archive.
When asked to archive any Instagram.com URL, chromebot responds with the following error:
<Instagram.com URL> cannot be queued: Banned by Instagram
One way to partially bypass Instagram's restrictions is to use Insta-Stalker.com, a third-party web viewer for Instagram. It offers an AJAX-free user search and lets you view profiles without going through Instagram's new web-app-style website (similar to Twitter Lite), which made Instagram inaccessible to the crawlers of the Wayback Machine and Archive.Today. The Wayback Machine's crawler gets stuck in an infinite refresh loop.
URL format:
- Search URL: https://insta-stalker.com/search/?q=Search+Term+here
- User URL (example): https://insta-stalker.com/profile/SamsungMobile/
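
The two URL patterns above can be generated programmatically. The following Python sketch is a minimal illustration; the helper names `search_url` and `profile_url` are hypothetical, and it assumes that search terms are encoded with `+` for spaces, as the example search URL suggests.

```python
from urllib.parse import quote, quote_plus

BASE = "https://insta-stalker.com"


def search_url(term: str) -> str:
    """Build a search URL; quote_plus encodes spaces as '+'."""
    return f"{BASE}/search/?q={quote_plus(term)}"


def profile_url(username: str) -> str:
    """Build a profile URL with the trailing slash used in the example."""
    # safe="" percent-encodes any '/' so the username stays one path segment.
    return f"{BASE}/profile/{quote(username, safe='')}/"
```

For example, `search_url("Search Term here")` reproduces the search URL pattern listed above, and `profile_url("SamsungMobile")` reproduces the example user URL.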