Posterous
Revision as of 02:30, 23 February 2013
Posterous
URL: http://posterous.com
Status: Closing
Archiving status: In progress...
Archiving type: Unknown
IRC channel: #preposterus (on hackint)
Posterous is a blogging platform started in May 2008. It was acquired by Twitter on March 12, 2012, and will shut down on April 30, 2013 (announcement: http://blog.posterous.com/thanks-from-posterous).
Seesaw script
Download:
https://gist.github.com/Gelob/16aacab95d2d59887d86
Follow the instructions to install seesaw, and edit the script to set your IP address.
Running too many instances concurrently will get you banned; bans are applied at :50 past the hour.
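The instructions in the gist above are authoritative. As a rough sketch only, the usual seesaw-kit workflow looks like the following; the pipeline file name and the nickname are placeholders, and the package name and --concurrent flag are assumed from seesaw-kit, not from the gist itself:

```shell
# Hypothetical setup sketch; defer to the instructions in the gist.
pip install seesaw                        # seesaw-kit from PyPI (assumed)
# Run the downloaded pipeline script under a nickname. Keep --concurrent
# low to avoid the :50-past-the-hour ban mentioned above.
run-pipeline pipeline.py YOURNICK --concurrent 2
```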
Site List Grab
We have assembled a list of Posterous sites that need grabbing. Total found: 9,898,986.
http://archive.org/details/2013-02-22-posterous-hostname-list
Tools: git
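One way to hand the list out to workers is to dedupe it and split it into fixed-size chunks. This is only an illustrative sketch: the file name hostnames.txt and the chunk size are assumptions, and the sample data is generated inline so the commands run as-is.

```shell
# Illustrative only: "hostnames.txt" stands in for the real list from
# the archive.org item above.
printf 'a.posterous.com\nb.posterous.com\na.posterous.com\n' > hostnames.txt
sort -u hostnames.txt > hostnames.sorted.txt   # dedupe and sort
split -l 1 hostnames.sorted.txt chunk-         # tiny chunks for the demo
wc -l hostnames.sorted.txt                     # 2 unique hostnames remain
```

With the real list, a chunk size like `-l 10000` would be more practical; each chunk file can then be claimed by one downloader.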
Archiving a single blog
Developing a command to archive a single blog, including all images and assets. Note that USER_AGENT must be assigned on its own line before the wget call: in a one-line `USER_AGENT=... wget ... -U "$USER_AGENT"` invocation, the shell expands $USER_AGENT (still empty) before the assignment takes effect. Also, wget appends .warc.gz to the --warc-file prefix, so no extension is given.

USER_AGENT="Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US) AppleWebKit/533.20.25 (KHTML, like Gecko) Version/5.0.4 Safari/533.20.27"
wget "https://$hostname" --warc-file="$hostname" \
  --mirror --no-check-certificate --span-hosts \
  --domains=$hostname,s3.amazonaws.com,files.posterous.com,getfile.posterous.com,getfile0.posterous.com,getfile1.posterous.com,getfile2.posterous.com,getfile3.posterous.com,getfile4.posterous.com,getfile5.posterous.com,getfile6.posterous.com,getfile7.posterous.com,getfile8.posterous.com,getfile9.posterous.com,getfile10.posterous.com \
  -U "$USER_AGENT" -nv -e robots=off --page-requisites \
  --timeout 60 --tries 20 --waitretry 5 \
  --warc-header "operator: Archive Team" \
  --warc-header "posterous-hostname: $hostname"
We use https because it allows for HTTP pipelining, which may help prevent being banned.
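A driver that feeds hostnames from the list into the command above might look like the following sketch. The file name, the echo placeholder, and the pause length are all assumptions, and sample data is generated inline so it runs as-is:

```shell
# Hypothetical driver loop; swap the echo for the full wget command above.
printf 'example1.posterous.com\nexample2.posterous.com\n' > hostnames.txt  # stand-in list
while read -r hostname; do
  echo "would archive https://$hostname -> $hostname.warc.gz"
  sleep 1   # spread grabs out to lower the risk of a ban
done < hostnames.txt | tee grab.log
```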