Blogger
URL: http://www.blogger.com/
Status: Online!
Archiving status: Upcoming...
Archiving type: Unknown
IRC channel: #frogger (on hackint)
Blogger is a blog hosting service. On February 23, 2015, they announced that "sexually explicit" blogs would be deleted in a month. We're downloading everything.
Downloading a single blog with Wget
These Wget parameters can download a BlogSpot blog, including comments and any on-site dependencies. They should also reject redundant pages such as the /search/ directory and multiple occurrences of the same page under different query strings. This has only been tested on blogs using a Blogger subdomain (e.g. http://foobar.blogspot.com), not custom domains (e.g. http://foobar.com). Both instances of [URL] should be replaced with the same URL. A simple Perl wrapper is available here.
wget --recursive --level=2 --no-clobber --no-parent --page-requisites --continue --convert-links --user-agent="" -e robots=off --reject "*\\?*,*@*" --exclude-directories="/search,/feeds" --referer="[URL]" --wait 1 [URL]
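For convenience, the command can be wrapped in a small shell script so the blog address only has to be supplied once. This is a minimal sketch, not the Perl wrapper mentioned above; the script name and usage line are illustrative assumptions.

 #!/bin/bash
 # Minimal wrapper sketch: fills both [URL] slots of the Wget command above
 # from a single argument. Usage: ./grab-blogspot.sh http://foobar.blogspot.com
 set -eu
 URL="$1"
 wget --recursive --level=2 --no-clobber --no-parent --page-requisites \
      --continue --convert-links --user-agent="" -e robots=off \
      --reject "*\\?*,*@*" --exclude-directories="/search,/feeds" \
      --referer="$URL" --wait 1 "$URL"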
Export XML trick
Add this to a blog URL and it will download the most recent 499 posts (that is the limit): /atom.xml?redirect=false&max-results=499
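For example, assuming a blog at http://foobar.blogspot.com (a placeholder address), the feed could be saved with Wget; the output filename is an arbitrary choice:

 # Save the most recent 499 posts as a single Atom XML file.
 # foobar.blogspot.com and the output filename are placeholders.
 wget -O foobar.atom.xml "http://foobar.blogspot.com/atom.xml?redirect=false&max-results=499"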
External links