Starwars.yahoo.com

From Archiveteam

Revision as of 20:14, 23 December 2009

Yahoo took down starwars.yahoo.com on Dec 15, 2009 and redirected the site to starwars.com. We saved 99.9% of /users/, /forums/, /links/, and a smaller percentage of the other content on the site.

Problems encountered:

  • Yahoo returns an error 999 after about 30 minutes of fetching from a single IP address. We used two approaches to get around this:
    • Tor (slow as molasses, but it worked): collected using HTTrack
    • multiple IP addresses (fast, but requires a large pool of addresses): collected using wget
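The multiple-IP approach can be sketched roughly as follows. This assumes the URL list was split across several local source addresses, with each share handed to its own wget process through wget's `--bind-address` option. The addresses, the helper names (`split_urls`, `wget_commands`, `SourcePool`), the `--wait=1` politeness delay, and the retire-an-address-on-999 policy are all hypothetical. The page does not record the commands that were actually used.

```python
from itertools import cycle

# Hypothetical source addresses (documentation range); the real ones are not recorded.
SOURCE_IPS = ["192.0.2.10", "192.0.2.11", "192.0.2.12"]


def split_urls(urls, source_ips):
    """Round-robin the URL list across source IPs so no single IP does all the fetching."""
    buckets = {ip: [] for ip in source_ips}
    for url, ip in zip(urls, cycle(source_ips)):
        buckets[ip].append(url)
    return buckets


def wget_commands(buckets):
    """Build one wget argument list per source IP, bound to that address."""
    cmds = []
    for ip, urls in buckets.items():
        if not urls:
            continue  # nothing assigned to this address
        cmds.append(["wget", f"--bind-address={ip}", "--wait=1",
                     "--force-directories", *urls])
    return cmds


class SourcePool:
    """Hands out source IPs in rotation and retires any IP that receives a 999."""

    def __init__(self, ips):
        self.active = list(ips)
        self.blocked = set()
        self._i = 0

    def next_ip(self):
        if not self.active:
            raise RuntimeError("all source IPs are blocked")
        ip = self.active[self._i % len(self.active)]
        self._i += 1
        return ip

    def report(self, ip, status):
        # Yahoo's rate limit showed up as status 999; stop using that address.
        if status == 999 and ip in self.active:
            self.active.remove(ip)
            self.blocked.add(ip)
```

Each argument list from `wget_commands(split_urls(urls, SOURCE_IPS))` could then be run in parallel, for example with `subprocess.Popen`. Spreading the same total work over N addresses roughly divides the per-address request rate by N. That is why this method was fast but needed a large pool of addresses.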

The tarballs in the archive reflect both archiving methods:

-rw-r--r--  1 root   root   228855239 Dec 15 13:35 starwars.yahoo.com-goekesmi-raw.tar.bz2
-rw-r--r--  1 root   root    36529217 Dec 20 15:53 starwars.yahoo.com-tor.tar.bz2