Difference between revisions of "Fileplanet"

From Archiveteam
 
(152 intermediate revisions by 10 users not shown)
| URL = http://www.fileplanet.com
| image = Fileplanet_snap.png
| project_status = {{closing}}
| project_status = {{specialcase}} (no longer being updated)
| archiving_status = {{inprogress}}
| archiving_status = {{saved}}
| irc = fireplanet
| irc_network = EFnet
| irc_abandoned = true
| data = {{IA collection|archiveteam-fileplanet}}
}}
}}


[http://www.fileplanet.com FilePlanet] is no longer hosting new content, and "is in the process of being archived [by IGN]."
In 2012 [http://www.fileplanet.com FilePlanet] announced it was no longer hosting new content, and "is in the process of being archived [by IGN]."


FilePlanet hosts 87,190 download pages of game-related material (demos, patches, mods, promo stuff, etc.), which needs to be archived. These tend to be larger files, ranging from 10MB patches to 3GB clients. We'll want all the help we can get for this one, since it gets harder the further the archiving goes (files are numbered chronologically, and Skyrim mods are bigger than Doom ones).
FilePlanet hosted tens of thousands of game-related files (demos, patches, mods, promo stuff, etc.). These tend to be larger files, ranging from 10MB patches to 3GB clients.


===What We Need===
===The archival===


* Files! (approx. 45% done 5/15/12)
After first downloading files [[Fileplanet/Status_of_by_id_grab|by iterating IDs on the public website fileplanet.com and uploading those in chunks to archive.org]], as well as [[Fileplanet/non-id-urls|scouting the web for other public URLs]], we were given FTP access to the storage servers by the staff. Thanks!
* /fileinfo/ pages - get URLs from sitemaps (Schbirid is downloading these)
** Afterwards, extract all thumbnail image links and grab the full size images (strip _sm2 from the basename)
* [http://blog.fileplanet.com http://blog.fileplanet.com]
* A list of all "site:www.fileplanet.com inurl:hosteddl" URLs since these files seem not to be in the simple ID range
* Where do links like http://dl.fileplanet.com/dl/dl.asp?classicgaming/o2home/rtl.zip come from and can we rescue those too?
** The non-IDed files are stuck behind the download manager - any clever way past it?  URLs to the files are of the form [http://download.direct2drive.com/ftp2/planetannihilation/mercilesscreations/opflash/opflash_-_uber_editor_tutorial.pdf?clientid=781894158 http://download.direct2drive.com/ftp2/planetannihilation/mercilesscreations/opflash/opflash_-_uber_editor_tutorial.pdf?clientid=781894158] and seem to require a valid ID to fetch.
*** Those URLs are the ones we currently fetch too. The script "visits" the download page and extracts such a URL. The problem with these files is that they open the download link in a new window, and I have not yet found out how to "open" that window correctly with wget. Haven't really tried though. -Schbirid
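The thumbnail step for the /fileinfo/ pages could look roughly like this: collect the image links from a saved page and derive each full-size URL by stripping _sm2 from the basename. This is a minimal Python sketch, not the script used for the grab. It assumes thumbnails are referenced through src="..." attributes and that _sm2 sits directly before the file extension (e.g. shot1_sm2.jpg becomes shot1.jpg); the regex and the function names are hypothetical, and a real HTML parser would be more robust than a regex.

```python
import posixpath
import re
from urllib.parse import urljoin, urlsplit, urlunsplit

# Assumption: thumbnails appear as src="..." attributes whose basename ends in "_sm2.<ext>"
THUMB_SRC = re.compile(r"""src=["']([^"']*_sm2\.[A-Za-z0-9]+)["']""", re.IGNORECASE)


def thumbnail_links(html: str, page_url: str) -> list[str]:
    """Return absolute URLs of all _sm2 thumbnails found in a saved /fileinfo/ page."""
    return [urljoin(page_url, src) for src in THUMB_SRC.findall(html)]


def full_size_url(thumb_url: str) -> str:
    """Strip a trailing "_sm2" from the basename, keeping directory, extension and query."""
    parts = urlsplit(thumb_url)
    directory, basename = posixpath.split(parts.path)
    stem, ext = posixpath.splitext(basename)
    if stem.endswith("_sm2"):
        stem = stem[: -len("_sm2")]
    return urlunsplit(parts._replace(path=posixpath.join(directory, stem + ext)))
```

The resulting list of full-size URLs can then be handed to wget (e.g. with its -i option) like any other URL list.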


===How to help===
https://archive.org/details/archiveteam-fileplanet is the collection.


* Have bash, wget, grep, rev, cut
Unpacked and sorted, it amounts to about 120k files totalling about 10TB. The "ftp2" files (another ~300k files, about 1.2TB in total) cannot be shared publicly because private files are mixed in; we saved them to IA anyway, so maybe they can be sorted out in the future. If you are looking for files from Fileplanet that are not included in the public archives, contact [[User:Schbirid]] with archived URLs that prove they were previously available to the public, e.g. via archived fileplanet.com pages.
* >100 gigabytes of space, just to be safe
* Put https://raw.github.com/SpiritQuaddicted/fileplanet-file-download/master/download_pages_and_files_from_fileplanet.sh somewhere (I'd suggest ~/somepath/fileplanetdownload/ ) and "chmod +x" it
* Pick a free increment (e.g. 110000-114999) and tell people about it (#fireplanet on EFnet, or post it here). Be careful: in lower ranges a 5k range might work, but chunks get HUGE later. In the 220k range, and probably lower too, we had better use 100 IDs per chunk.
** Keep the chunk sizes small. <30G would be nice. The smaller the better.
* Run the script with your start and end IDs as arguments, e.g. "<code>./download_pages_and_files_from_fileplanet.sh 110000 114999</code>"
* Take a walk for half a day.
* You can <code>tail</code> the .log files if you are curious. See right below.
* Once you are done with your chunk, you will have a directory named after your range, e.g. 110000-114999/. Inside it are pages_xx000-xx999.log and files_xx000-xx999.log, plus the www.fileplanet.com/ directory.
* Done! GOTO 10
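Since the chunk size has to shrink as the IDs grow (5k in low ranges, 500 or 1k IDs around 160k-180k, 100 in the 220k range), it can help to split a claimed range into script invocations mechanically. This is a minimal Python sketch, assuming the script takes an inclusive start and end ID as in the example above; the helper names are hypothetical, and it only prints the commands, it does not run them.

```python
from typing import Iterator, List, Tuple

SCRIPT = "./download_pages_and_files_from_fileplanet.sh"


def plan_chunks(start: int, end: int, size: int) -> Iterator[Tuple[int, int]]:
    """Split the inclusive ID range start..end into chunks of at most `size` IDs."""
    if size <= 0:
        raise ValueError("chunk size must be positive")
    first = start
    while first <= end:
        last = min(first + size - 1, end)  # inclusive upper bound, like 110000-114999
        yield first, last
        first = last + 1


def chunk_commands(start: int, end: int, size: int) -> List[str]:
    """One script invocation per chunk, in the same form as the example command."""
    return [f"{SCRIPT} {a} {b}" for a, b in plan_chunks(start, end, size)]


if __name__ == "__main__":
    # 500-ID chunks, as the status table suggests for the 160000-179999 range
    for cmd in chunk_commands(160000, 161999, 500):
        print(cmd)
```

Running the chunks one after another keeps each resulting directory (and upload) small, which is what the <30G guideline is after.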


In the end we'll upload all the parts to archive.org. If you have an account, you can use e.g. s3cmd.
<gallery>
File:Fileplanet ftp structure.png|FTP structure
File:Fileplanet ftp restructured File Size Statistics.png|Size statistics
File:Fileplanet ftp restructured File Age.png|Age statistics
File:Fileplanet ftp restructured File Type Statistics.png|File type statistics
File:Fileplanet ftp restructured Largest Files.png|Largest files
</gallery>


<code>s3cmd --add-header x-archive-auto-make-bucket:1 --add-header "x-archive-meta-description:Files from Fileplanet (www.fileplanet.com), all files from the ID range 110000 to 114999." put 110000-114999/*.log 110000-114999.tar s3://FileplanetFiles_110000-114999/</code>
There is a half-assed search interface available at https://www.quaddicted.com/stuff/fileplanet/fileplanet.php?filename=yourfilenamehere and a directory browser at https://www.quaddicted.com/stuff/fileplanet/fileplanet.php?directory=/some/dir/here/
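Both links take the file name or directory as a query parameter, so names containing spaces or other special characters need URL-encoding. This is a small Python sketch that builds such links with the standard library. The parameter names filename and directory come from the example URLs above; the function names and keeping "/" unescaped in directory paths are assumptions, and the sketch does not check whether the interface is still online.

```python
from urllib.parse import urlencode

SEARCH_BASE = "https://www.quaddicted.com/stuff/fileplanet/fileplanet.php"


def filename_search_url(filename: str) -> str:
    """Link to the filename search; spaces become "+", other specials are %-escaped."""
    return f"{SEARCH_BASE}?{urlencode({'filename': filename})}"


def directory_browse_url(directory: str) -> str:
    """Link to the directory browser; "/" is kept readable, as in the example URL."""
    return f"{SEARCH_BASE}?{urlencode({'directory': directory}, safe='/')}"
```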


'''The log files are important! Make sure they are saved!'''
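The s3cmd upload on this page follows a fixed pattern per range: bucket FileplanetFiles_<range>, a description naming the ID range, the .log files and a <range>.tar. This is a minimal Python sketch that builds the same argument list for any range, so it can be reviewed and then passed to subprocess.run without shell-quoting trouble. It assumes the tarball and log layout from the example command; zero-padding to five digits mirrors the 00000-09999 naming in the status table and is an assumption, and the sketch checks neither IA credentials nor whether the bucket already exists.

```python
import glob
import shlex


def s3cmd_upload_args(start: int, end: int) -> list[str]:
    """Build the s3cmd command from this page for one finished ID range."""
    rng = f"{start:05d}-{end:05d}"  # e.g. "110000-114999" or "00000-09999"
    description = (
        "x-archive-meta-description:Files from Fileplanet (www.fileplanet.com), "
        f"all files from the ID range {start} to {end}."
    )
    logs = sorted(glob.glob(f"{rng}/*.log"))  # pages_*.log and files_*.log, if present
    return [
        "s3cmd",
        "--add-header", "x-archive-auto-make-bucket:1",
        "--add-header", description,
        "put", *logs, f"{rng}.tar",
        f"s3://FileplanetFiles_{rng}/",
    ]


if __name__ == "__main__":
    # Print the command for review; pass the list to subprocess.run(..., check=True) to upload.
    print(shlex.join(s3cmd_upload_args(110000, 114999)))
```

Globbing the .log files in Python rather than in a shell makes it easy to notice when a range directory has no logs before anything is uploaded.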
===Related items===
 
* /fileinfo/ pages and the embedded images/thumbnails from the grab by IDs: https://archive.org/details/FileplanetFiles_fileinfo_pages_images
===Notes===
* /download/ pages and download logs from the grab by IDs: https://archive.org/details/Fileplanet_index.htmls_and_logs_scraped_by_id
* For planning a good range to download, check http://www.quaddicted.com/stuff/temp/file_IDs_from_sitemaps.txt, but be aware that it apparently does not cover all the IDs we can get by simply incrementing by 1. For example, Schbirid downloaded file 75059, which is not listed in the sitemaps. So you cannot rely on that ID list being complete.
* http://blog.fileplanet.com: https://archive.org/details/FileplanetBlogFileplanetCom
* The range 175000-177761 (weird end number since that's when the server ran out of space...) had ~1100 files and 69G. We will need to use 1k ID increments for those ranges.
* http://www.fileplanet.com/fileblog/archives/: https://archive.org/details/FileplanetFileblog
* Schbirid emailed FPOps@IGN.com on the 3rd of May; no reply.
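To use the sitemap ID list for range planning (as in the graph of IDs per 1k range below), the IDs can be counted per 1k block. This is a minimal Python sketch, assuming a local copy of file_IDs_from_sitemaps.txt with one decimal ID per line; the function names are hypothetical. Because the sitemaps miss some IDs (like 75059), the counts are only a lower bound on what a chunk will contain.

```python
from collections import Counter
from typing import Iterable, List


def load_ids(path: str) -> List[int]:
    """Read IDs, assuming one decimal ID per line; blank or other lines are skipped."""
    with open(path, encoding="utf-8") as fh:
        return [int(line) for line in fh if line.strip().isdigit()]


def ids_per_1k_range(ids: Iterable[int]) -> Counter:
    """Count listed IDs per 1k block, keyed by the block's first ID (75059 -> 75000)."""
    return Counter((i // 1000) * 1000 for i in ids)


if __name__ == "__main__":
    counts = ids_per_1k_range(load_ids("file_IDs_from_sitemaps.txt"))
    for block in sorted(counts):
        print(f"{block}-{block + 999}: {counts[block]} IDs listed (lower bound)")
```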
 
===Status===
{| class="wikitable"
|-
! Range
! Status
! Number of files
! Size in gigabytes
! Downloader
|-
| 00000-09999
| Done, [http://archive.org/details/FileplanetFiles_00000-09999 archived]
| 1991
| 1G
| Schbirid
|-
| 10000-19999
| Done, [http://archive.org/details/FileplanetFiles_10000-19999 archived]
| 3159
| 9G
| Schbirid
|-
| 20000-29999
| Done, [http://archive.org/details/FileplanetFiles_20000-29999 archived]
| 6453
| 7G
| Schbirid
|-
| 30000-39999
| Done, [http://archive.org/details/FileplanetFiles_30000-39999 archived]
| 4085
| 9G
| Schbirid
|-
| 40000-49999
| Done, [http://archive.org/details/FileplanetFiles_40000-49999 archived]
| 5704
| 18G
| Schbirid
|-
| 50000-54999
| Done, locally
| 2706
| 24G
| Schbirid
|-
| 55000-59999
| Done, [http://archive.org/details/FileplanetFiles_50000-559999 archived] (bad URL)
| 2390
| 24G
| Schbirid
|-
| 60000-64999
| Done, [http://archive.org/details/FileplanetFiles_60000-64999 archived]
| 2349
| 24G
| Schbirid
|-
| 65000-69999
| Done, [http://archive.org/details/FileplanetFiles_65000-69999 archived]
| 305
| 4G
| Schbirid
|-
| 70000-79999
| Done, [http://archive.org/details/FileplanetFiles_70000-79999 archived]
| 59
| 0.2G
| Schbirid
|-
| 80000-84999
| Done, locally
| 2822
| 31G
| Debianer
|-
| 85000-89999
| Done, [http://archive.org/details/FileplanetFiles_85000-89999 archived]
| 1869
| 29G
| Schbirid
|-
| 90000-109999
| Done, empty
| 0
| 0
| Schbirid
|-
| 110000-114999
| Done, [http://archive.org/details/FileplanetFiles_110000-114999 archived]
| 2139
| 35G
| Schbirid
|-
| 115000-115999
| Done, [http://archive.org/details/FileplanetFiles_115000-115999 archived]
| 932
| 1.9G
| codebear
|-
| 116000-116999
| Done, [http://archive.org/details/FileplanetFiles_116000-116999 archived]
| 694
| 11G
| codebear
|-
| 117000-117999
| Done, [http://archive.org/details/FileplanetFiles_117000-117999 archived]
| 752
| 16G
| codebear
|-
| 118000-118999
| Done, [http://archive.org/details/FileplanetFiles_118000-118999 archived]
| 726
| 16G
| codebear
|-
| 119000-119999
| Done, locally
| 718
| 28G
| codebear
|-
| 120000-124999
| Done, locally
| 3463
| 68G
| codebear
|-
| 125000-129999
| Done, archived [http://archive.org/details/FileplanetFiles_125000-125999] [http://archive.org/details/FileplanetFiles_126000-126999] [http://archive.org/details/FileplanetFiles_127000-127999] [http://archive.org/details/FileplanetFiles_128000-128999] [http://archive.org/details/FileplanetFiles_129000-128999] (bad URL)
| 3384
| 78G
| S[h]O[r]T
|-
| 130000-130999
| Done, [http://archive.org/details/FileplanetFiles_130000-130999 archived]
| 603
| 24G
| codebear
|-
| 131000-131999
| Done, [http://archive.org/details/FileplanetFiles_131000-131999 archived]
| 640
| 22G
| codebear
|-
| 132000-132999
| Done, [http://archive.org/details/FileplanetFiles_132000-132999 archived]
| 626
| 17G
| codebear
|-
| 133000-133999
| Done, [http://archive.org/details/FileplanetFiles_133000-133999 archived]
| 602
| 25G
| codebear
|-
| 134000-134999
| Done, [http://archive.org/details/FileplanetFiles_134000-134999 archived]
| 551
| 19G
| codebear
|-
| 135000-135999
| Done, [http://archive.org/details/FileplanetFiles_135000-135999 archived]
| 763
| 21G
| codebear
|-
| 136000-136999
| Done, [http://archive.org/details/FileplanetFiles_136000-136999 archived]
| 728
| 27G
| codebear
|-
| 137000-137999
| Done, [http://archive.org/details/FileplanetFiles_137000-137999 archived]
| 601
| 18G
| codebear
|-
| 138000-138999
| Done, [http://archive.org/details/FileplanetFiles_138000-138999 archived]
| 689
| 26G
| codebear
|-
| 139000-139999
| Done, [http://archive.org/details/FileplanetFiles_139000-139999 archived]
| 705
| 18G
| codebear
|-
| 140000-140999
| Done, [http://archive.org/details/FileplanetFiles_140000-140999 archived]
| 750
| 26G
| S[h]O[r]T
|-
| 141000-141999
| Done, [http://archive.org/details/FileplanetFiles_141000-141999 archived]
| 586
| 30G
| S[h]O[r]T
|-
| 142000-142999
| Done, [http://archive.org/details/FileplanetFiles_142000-142999 archived]
| 337
| 19G
| S[h]O[r]T
|-
| 143000-143999
| Done, [http://archive.org/details/FileplanetFiles_143000-143999 archived]
| 292
| 14G
| S[h]O[r]T
|-
| 144000-144999
| Done, [http://archive.org/details/FileplanetFiles_144000-144999 archived]
| 328
| 20G
| S[h]O[r]T
|-
| 145000-145999
| Done, [http://archive.org/details/FileplanetFiles_145000-145999 archived]
| 216
| 25G
| Schbirid
|-
| 146000-146999
| Done, [http://archive.org/details/FileplanetFiles_146000-146999 archived]
| 383
| 30G
| Schbirid
|-
| 147000-149999
| In progress
|
|
| Schbirid
|-
| 150000-150499
| Done, [http://archive.org/details/FileplanetFiles_150000-150499 archived]
| 216
| 15G
| S[h]O[r]T
|-
| 150500-150999
| Done, [http://archive.org/details/FileplanetFiles_150500-150999 archived]
| 270
| 13G
| S[h]O[r]T
|-
| 151000-151499
| Done, [http://archive.org/details/FileplanetFiles_151000-151499 archived]
| 310
| 19G
| S[h]O[r]T
|-
| 151500-151999
| Done, [http://archive.org/details/FileplanetFiles_151500-151999 archived]
| 244
| 17G
| S[h]O[r]T
|-
| 152000-152499
| Done, [http://archive.org/details/FileplanetFiles_152000-152499 archived]
| 234
| 19G
| S[h]O[r]T
|-
| 152500-152999
| Done, [http://archive.org/details/FileplanetFiles_152500-152999 archived]
| 255
| 13G
| S[h]O[r]T
|-
| 153000-153999
| Done, [http://archive.org/details/FileplanetFiles_153000-153999 archived]
| 287
| 19G
| S[h]O[r]T
|-
| 154000-154999
| In progress
|
|
| S[h]O[r]T
|-
| 155000-155999
| In progress
|
|
| S[h]O[r]T
|-
| 156000-156999
| In progress
|
|
| S[h]O[r]T
|-
| 157000-157999
| In progress
|
|
| S[h]O[r]T
|-
| 158000-158999
| In progress
|
|
| S[h]O[r]T
|-
| 159000-159999
| In progress
|
|
| S[h]O[r]T
|-
| 160000-179999
| Open
| Better to use ranges of 500 here.
|
|
|-
| 180000-180499
| Done, [http://archive.org/details/FileplanetFiles_180000-180499 archived]
| 179
| 37G
| Schbirid
|-
| 180500-199999
| Open
| Better to use ranges of 100-500 here.
|
|
|-
| 200000-200999
| Done, [http://archive.org/details/FileplanetFiles_200000-200099 archived] (bad URL)
| 247
| 41G
| Schbirid
|-
| 201000-219999
| Open
| Better to use ranges of 100-500 here.
|
|
|-
| 220000-220499
| Done, [http://archive.org/details/FileplanetFiles_220000-220099 archived] (bad URL)
| 250
| 35G
| Schbirid
|-
| 220500+
| Open
|
|
|
|-
|}
 
===Graphs===
[[File:Fileplanet number of IDs from the sitemaps per 1k range.png]]


{{navigation box}}

Latest revision as of 21:36, 28 December 2023

FilePlanet
Website host of game content, 1999-2012
URL http://www.fileplanet.com
Status Special case (no longer being updated)
Archiving status Saved!
Archiving type Unknown
IRC channel #archiveteam-bs (on hackint)
(formerly #fireplanet (on EFnet))
Data archiveteam-fileplanet
