Difference between revisions of "Posterous"

From Archiveteam
Line 7: Line 7:
| archiving_status = {{inprogress}}
| irc = preposterus
| tracker = [http://tracker.archiveteam.org/posterous/ here]
}}
}}


Line 27: Line 28:
http://archive.org/details/2013-02-22-posterous-hostname-list


Tools: [https://github.com/ArchiveTeam/smeg git]
 
== Archiving a single blog ==
 
This is a work-in-progress command to archive a single blog, including all images and assets.
 
  # Placeholder value: set this to the blog you want to archive.
  hostname="example.posterous.com"
  USER_AGENT="Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US) AppleWebKit/533.20.25 (KHTML, like Gecko) Version/5.0.4 Safari/533.20.27"
  # Mirror the blog plus its assets on the listed asset hosts into a WARC file.
  wget "https://$hostname" --warc-file="$hostname.warc" \
    --mirror --no-check-certificate --span-hosts \
    --domains="$hostname,s3.amazonaws.com,files.posterous.com,getfile.posterous.com,getfile0.posterous.com,getfile1.posterous.com,getfile2.posterous.com,getfile3.posterous.com,getfile4.posterous.com,getfile5.posterous.com,getfile6.posterous.com,getfile7.posterous.com,getfile8.posterous.com,getfile9.posterous.com,getfile10.posterous.com" \
    -U "$USER_AGENT" -nv -e robots=off --page-requisites \
    --timeout 60 --tries 20 --waitretry 5 \
    --warc-header "operator: Archive Team" \
    --warc-header "posterous-hostname: $hostname"
 
The command uses HTTPS because it is believed to allow HTTP pipelining, which may help avoid being banned.
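
The long --domains value in the command is the blog's own hostname followed by a fixed set of asset hosts. As a minimal sketch, it could be generated rather than typed by hand. The function name posterous_domains is hypothetical and not part of any existing tool; the host names are the ones listed in the command.

```shell
# Hypothetical helper: print the comma-separated --domains value for one blog.
# The asset hosts are copied from the wget command; nothing else is assumed.
posterous_domains() {
  blog_host=$1
  domains="$blog_host,s3.amazonaws.com,files.posterous.com,getfile.posterous.com"
  # Append getfile0.posterous.com through getfile10.posterous.com.
  i=0
  while [ "$i" -le 10 ]; do
    domains="$domains,getfile$i.posterous.com"
    i=$((i + 1))
  done
  printf '%s\n' "$domains"
}
```

With this defined, the wget call could pass --domains="$(posterous_domains "$hostname")" in place of the literal list.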

Revision as of 02:31, 23 February 2013

Posterous
URL: http://posterous.com
Status: Closing
Archiving status: In progress...
Archiving type: Unknown
Project tracker: http://tracker.archiveteam.org/posterous/
IRC channel: #preposterus (on hackint)

Posterous is a blogging platform started in May 2008. It was acquired by Twitter on March 12, 2012, and will shut down on April 30, 2013 (announcement).

Seesaw script

Download:

https://gist.github.com/Gelob/16aacab95d2d59887d86

Follow the instructions there to install seesaw, then edit the script for your IP address.

Running too many downloads concurrently will get you banned at 50 minutes past the hour.
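
One way to keep concurrency bounded when working through a hostname list is to cap the number of parallel jobs. The sketch below is not the seesaw script; it is a hypothetical helper named archive_hosts that assumes an xargs with the -P option (GNU and BSD both provide it). The page does not say how many concurrent jobs are safe, so the limit is left as a parameter.

```shell
# Hypothetical helper: run a per-blog command once for each hostname in a
# list file, with at most max_jobs invocations running at the same time.
archive_hosts() {
  list_file=$1
  max_jobs=$2
  shift 2
  # -n 1 passes one hostname per invocation; -P caps the parallel jobs.
  xargs -n 1 -P "$max_jobs" "$@" < "$list_file"
}
```

For example, archive_hosts hostnames.txt 2 ./archive-one-blog.sh would call a (hypothetical) per-blog script for every line of hostnames.txt, two at a time.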

Site List Grab

We have assembled a list of Posterous sites that need grabbing. Total found: 9898986

http://archive.org/details/2013-02-22-posterous-hostname-list

Tools: git (https://github.com/ArchiveTeam/smeg)