Blogger
URL: http://www.blogger.com/
Status: Online!
Archiving status: Not saved yet
Archiving type: Unknown
IRC channel: #archiveteam-bs (on hackint)
Blogger is a blog hosting service.
Downloading a single blog with Wget
These Wget parameters can download a BlogSpot blog, including comments and any on-site dependencies. They also reject redundant pages, such as the /search/ directory and duplicate copies of the same page under different query strings. This has only been tested on blogs using a Blogger subdomain (e.g. http://foobar.blogspot.com), not custom domains (e.g. http://foobar.com). Both instances of [URL] should be replaced with the same URL. A simple Perl wrapper is available at http://pastebin.com/2QUuH26L.
wget --recursive --level=2 --no-clobber --no-parent --page-requisites --continue --convert-links --user-agent="" -e robots=off --reject "*\\?*,*@*" --exclude-directories="/search,/feeds" --referer="[URL]" --wait 1 [URL]
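If you prefer not to use the Perl wrapper, the substitution is easy to do in plain shell. The following is a minimal sketch (the script name and structure are assumptions, not the linked wrapper):

#!/bin/sh
# Hypothetical wrapper: pass the blog URL as the only argument,
# e.g. ./blogspot-grab.sh http://foobar.blogspot.com
# The argument is substituted into both places [URL] appears above.
URL="$1"
wget --recursive --level=2 --no-clobber --no-parent --page-requisites \
  --continue --convert-links --user-agent="" -e robots=off \
  --reject "*\\?*,*@*" --exclude-directories="/search,/feeds" \
  --referer="$URL" --wait 1 "$URL"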
Export XML trick
Append this to a blog's URL and it will download the most recent 499 posts (that is the limit): /atom.xml?redirect=false&max-results=499
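For example, to fetch and save the feed with Wget (foobar.blogspot.com is a placeholder subdomain; the quotes keep the shell from interpreting the &):

wget -O foobar-atom.xml "http://foobar.blogspot.com/atom.xml?redirect=false&max-results=499"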
External links