FilePlanet
[Fileplanet logo]
Website host of game content, 1996-2012

URL: http://www.fileplanet.com
Status: Closing
Archiving status: In progress...
Archiving type: Unknown
IRC channel: #fireplanet (on hackint)

FilePlanet is no longer hosting new content, and "is in the process of being archived [by IGN]."

FilePlanet hosts 87,190 download pages of game-related material (demos, patches, mods, promo stuff, etc.), all of which need to be archived. These tend to be large files, ranging from 10MB patches to 3GB clients. We'll want all the arms we can get for this one, since the job gets harder the farther the archiving goes (files are numbered chronologically, and Skyrim mods are bigger than Doom ones).

What We Need

  • http://blog.fileplanet.com
  • A list of all "site:www.fileplanet.com inurl:hosteddl" URLs, since these files seem not to be in the simple ID range
  • Where do links like http://dl.fileplanet.com/dl/dl.asp?classicgaming/o2home/rtl.zip come from, and can we rescue those too?

How to help

  • Have bash, wget, grep, rev, cut
  • At least 30 gigabytes of free space per 5,000-ID chunk
  • Put https://github.com/SpiritQuaddicted/fileplanet-file-download/blob/master/download_pages_and_files_from_fileplanet.sh somewhere and "chmod +x" it
  • Pick a free range (e.g. 110000-114999) and tell people about it (#fireplanet on EFnet, or post it here)
  • Create a new working directory for your download, named after your chunk, e.g. 110000-114999/
  • INSIDE that directory, run the script with your start and end IDs as arguments, e.g. "./download_pages_and_files_from_fileplanet.sh 110000 114999" (a sketch after this list strings the whole sequence together)
  • Take a walk for half a day.
  • Once you are done with your chunk, you will have pages_xx000-xx999.log and files_xx000-xx999.log plus the www.fileplanet.com/ directory.
  • Do a "grep -i error *.log" first and see if there were error messages. If so, tell us.
  • "cd .." and "tar -cf 110000-114999.tar 110000-114999/"
  • "du -hs 110000-114999/www.fileplanet.com/" and "ls -1 110000-114999/www.fileplanet.com/ | wc -l" and add those numbers to the table below.
  • Done! GOTO 10
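
Strung together, one chunk looks roughly like this. This is a minimal sketch of the steps above, assuming the script was saved one directory above your working area and made executable; the range is a placeholder for whichever chunk you claimed.

START=110000
END=114999
CHUNK="$START-$END"

mkdir "$CHUNK"
cd "$CHUNK"

# Run the downloader from inside the chunk directory
# (here it is assumed to sit one level up).
../download_pages_and_files_from_fileplanet.sh "$START" "$END"

# Check the logs for errors before packing anything up.
grep -i error *.log

# Pack the chunk and collect the numbers for the status table.
cd ..
tar -cf "$CHUNK.tar" "$CHUNK/"
du -hs "$CHUNK/www.fileplanet.com/"
ls -1 "$CHUNK/www.fileplanet.com/" | wc -l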

In the end we'll upload all the parts to archive.org. If you have an account, you can use e.g. s3cmd:

s3cmd --add-header x-archive-auto-make-bucket:1 --add-header "x-archive-meta-description:Files from Fileplanet (www.fileplanet.com), all files from the ID range 110000 to 114999." put 110000-114999.tar s3://FileplanetFiles_110000-114999

s3cmd put 110000-114999/*.log s3://FileplanetFiles_110000-114999/

Mind the trailing slash on the bucket URL in the second command.
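
If you have several finished chunks, the upload can be scripted. A sketch, assuming s3cmd is already configured and the tars and log directories are named as above; the ranges in the loop are placeholders:

for CHUNK in 110000-114999 115000-119999; do
  # Upload the tar, letting the item description name its ID range.
  s3cmd --add-header x-archive-auto-make-bucket:1 \
        --add-header "x-archive-meta-description:Files from Fileplanet (www.fileplanet.com), all files from the ID range ${CHUNK/-/ to }." \
        put "$CHUNK.tar" "s3://FileplanetFiles_$CHUNK"
  # Upload the logs into the same bucket (note the trailing slash).
  s3cmd put "$CHUNK"/*.log "s3://FileplanetFiles_$CHUNK/"
done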

Notes

  • For planning a good range to download, check http://www.quaddicted.com/stuff/temp/file_IDs_from_sitemaps.txt, but be aware that it apparently does not cover all IDs reachable by simply incrementing by 1: Schbirid downloaded file 75059, which is not listed in the sitemaps. So do not trust that ID list to be complete (a sketch for sizing up ranges follows this list).
  • The range 175000-177761 (the odd end number is where the server ran out of space...) had ~1100 files and 69G. We will need to use 1k ID increments for ranges that dense.
  • Schbirid mailed FPOps@IGN.com on the 3rd of May; no reply.
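
To gauge how dense a range is before claiming it, you can bucket the sitemap ID list per 1,000 IDs (roughly what the graph below shows). A minimal sketch, assuming the list is one numeric ID per line; the actual format of the file has not been verified:

wget -q http://www.quaddicted.com/stuff/temp/file_IDs_from_sitemaps.txt
awk '{ count[int($1/1000)*1000]++ }
     END { for (b in count) print b, count[b] }' file_IDs_from_sitemaps.txt | sort -n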

Status

Range          Status                     Files   Size   Downloader
00000-09999    Done, archived              1991     1G   Schbirid
10000-19999    Done, archived              3159     9G   Schbirid
20000-29999    Done, locally               6453     7G   Schbirid
30000-39999    Done, locally               4085     9G   Schbirid
40000-49999    Done, archived              5704    18G   Schbirid
50000-54999    Done, locally               2706    24G   Schbirid
55000-59999    Done, archived (bad URL)    2390    24G   Schbirid
60000-64999    Done, archived              2349    24G   Schbirid
65000-69999    Done, archived               305     4G   Schbirid
70000-79999    Done, archived                59   0.2G   Schbirid
80000-109999   Running                        -      -   Schbirid

Graphs

[Graph: Fileplanet number of IDs from the sitemaps per 1k range]