Difference between revisions of "Fileplanet"

From Archiveteam
m (broken headline)
 
(214 intermediate revisions by 11 users not shown)
Line 2: Line 2:
| title = FilePlanet
| title = FilePlanet
| logo = Fileplanet_logo.jpg
| logo = Fileplanet_logo.jpg
| description = Website host of game content, 1996-2012
| description = Website host of game content, 1999-2012
| URL = http://www.fileplanet.com
| URL = http://www.fileplanet.com
| image = Fileplanet_snap.png
| image = Fileplanet_snap.png
| project_status = {{closing}}
| project_status = {{specialcase}} (no longer being updated)
| archiving_status = {{inprogress}}
| archiving_status = {{saved}}
| irc = fireplanet
| irc = fireplanet
| irc_network = EFnet
| irc_abandoned = true
| data = {{IA collection|archiveteam-fileplanet}}
}}
}}


[http://www.fileplanet.com FilePlanet] is no longer hosting new content, and "is in the process of being archived [by IGN]."
In 2012 [http://www.fileplanet.com FilePlanet] announced that it was no longer hosting new content and that the site "is in the process of being archived [by IGN]."


FilePlanet hosts 87,190 download pages of game-related material (demos, patches, mods, promo stuff, etc.), which needs to be archived. These tend to be larger files, ranging from 10MB patches to 3GB clients. We'll want all the hands we can get for this one, since it gets harder the farther the archiving goes (files are numbered chronologically, and Skyrim mods are bigger than Doom ones).
FilePlanet hosted tens of thousands of game-related files (demos, patches, mods, promo stuff, etc.). These tend to be larger files, ranging from 10MB patches to 3GB clients.


To help: Download and run [https://github.com/SpiritQuaddicted/fileplanet-file-download/blob/master/download_pages_and_files_from_fileplanet.sh Schbirid's script].  This grabs the files and the source page, for its metadata.  You'll need to know what the start and stop parameters are - get those numbers from Schbirid in IRC.
===The archival===


[http://www.quaddicted.com/forum/viewtopic.php?pid=251 Here] is a thread keeping track of current status (currently on 60k).
We first downloaded files [[Fileplanet/Status_of_by_id_grab|by iterating IDs on the public website fileplanet.com and uploading those in chunks to archive.org]] and by [[Fileplanet/non-id-urls|scouting the web for other public URLs]]. The staff then gave us FTP access to the storage servers. Thanks!


===What We Need===
https://archive.org/details/archiveteam-fileplanet is the collection.


* Files! (approx. 15% done 5/10/12)
Unpacked and sorted, it amounts to about 120k files at ~10TB. The "ftp2" files (another ~300k files totaling ~1.2TB) cannot be shared publicly because private files are mixed in. We saved them to IA anyway, so they may be sorted out in the future. If you are looking for files from Fileplanet that are not included in the public archives, contact [[User:Schbirid]] with archived URLs that prove they were previously available to the public, e.g. archived fileplanet.com pages.
* /fileinfo/ pages - get URLs from sitemaps (Schbirid is downloading these)
* [http://blog.fileplanet.com http://blog.fileplanet.com]
* A list of all "site:www.fileplanet.com inurl:hosteddl" URLs since these files seem not to be in the simple ID range


===How to help===
<gallery>
File:Fileplanet ftp structure.png|FTP structure
File:Fileplanet ftp restructured File Size Statistics.png|Size statistics
File:Fileplanet ftp restructured File Age.png|Age statistics
File:Fileplanet ftp restructured File Type Statistics.png|File type statistics
File:Fileplanet ftp restructured Largest Files.png|Largest files
</gallery>


* Have bash, wget, grep, rev, cut
There is a half-assed search interface available at https://www.quaddicted.com/stuff/fileplanet/fileplanet.php?filename=yourfilenamehere and a directory browser at https://www.quaddicted.com/stuff/fileplanet/fileplanet.php?directory=/some/dir/here/
* >30 gigabytes of space per 5k increment
* Put https://github.com/SpiritQuaddicted/fileplanet-file-download/blob/master/download_pages_and_files_from_fileplanet.sh somewhere and "chmod +x" it
* Pick a free 5k increment (e.g. 110000-114999) and tell people about it (#fireplanet on EFnet, or post it here)
* Create a new working directory for your download, named after your chunk, e.g. 110000-114999/
* INSIDE that directory, run the script with your start and end IDs as arguments, e.g. "<code>./download_pages_and_files_from_fileplanet.sh 110000 114999</code>"
* Take a walk for half a day.
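
The chunk naming used in the steps above can be computed rather than typed by hand. The following is a minimal sketch, not part of Schbirid's script: the function name <code>chunk_range</code> and its optional second argument are made up for illustration. It prints the increment that contains a given file ID, assuming increments start at multiples of the chunk size.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the download script):
# print the "start-end" increment that contains a given file ID.
chunk_range() {
    local id=$1 size=${2:-5000}          # chunk size defaults to 5k IDs
    # 10# forces base 10 so zero-padded IDs such as 09999 are not read as octal.
    local start=$(( 10#$id / size * size ))
    local end=$(( start + size - 1 ))
    printf '%d-%d\n' "$start" "$end"
}

chunk_range 112345   # prints 110000-114999
```

The output can serve as the working directory name, e.g. <code>mkdir "$(chunk_range 110000)"</code>, before running the download script inside it.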


* Once you are done with your chunk, you will have pages_xx000-xx999.log and files_xx000-xx999.log plus the www.fileplanet.com/ directory.
===Related items===
* Do a "<code>grep -i error *.log</code>" first and see if there were error messages. If so, tell us.
* /fileinfo/ pages and the embedded images/thumbnails from the grab by IDs: https://archive.org/details/FileplanetFiles_fileinfo_pages_images
* "<code>cd ..</code>" and "<code>tar -cf 110000-114999.tar 110000-114999/</code>"
* /download/ pages and download logs from the grab by IDs: https://archive.org/details/Fileplanet_index.htmls_and_logs_scraped_by_id
* "<code>du -hs 110000-114999/www.fileplanet.com/</code>" and "<code>ls -1 110000-114999/www.fileplanet.com/ | wc -l</code>" and add those numbers to the table below.
* http://blog.fileplanet.com: https://archive.org/details/FileplanetBlogFileplanetCom
* Done! GOTO 10
* http://www.fileplanet.com/fileblog/archives/: https://archive.org/details/FileplanetFileblog


In the end we'll upload all the parts to archive.org. If you have an account, you can use a tool such as s3cmd.
{{navigation box}}
 
<code>s3cmd --add-header x-archive-auto-make-bucket:1 --add-header "x-archive-meta-description:Files from Fileplanet (www.fileplanet.com), all files from the ID range 110000 to 114999." put 110000-114999.tar s3://FileplanetFiles_110000-114999</code>
 
<code>s3cmd put 110000-114999/*.log s3://FileplanetFiles_110000-114999/</code>
 
Mind the trailing slash.
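
The manual checks that precede the upload (grep the logs, tar the chunk, collect size and file count) can be wrapped in one function. This is only a sketch under assumptions: the name <code>finish_chunk</code> is hypothetical, it must be run from the parent directory of the chunk directory, and stopping when the logs contain errors is a choice made here, since the steps only say to report them.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the manual post-download steps.
# Run it from the directory that contains the chunk directory.
finish_chunk() {
    local chunk=$1                          # e.g. 110000-114999
    local site="$chunk/www.fileplanet.com"

    # Look for error messages in the page and file logs.
    # Assumption: stop here so the errors can be reported first.
    if grep -i error "$chunk"/*.log; then
        echo "errors found in the logs of $chunk, report them before uploading" >&2
        return 1
    fi

    # Pack the whole chunk directory into one tarball next to it.
    tar -cf "$chunk.tar" "$chunk/" || return 1

    # Size and number of entries, for the status table.
    du -hs "$site/"
    ls -1 "$site/" | wc -l
}
```

Calling <code>finish_chunk 110000-114999</code> leaves <code>110000-114999.tar</code> in the current directory, ready for the s3cmd commands above.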
 
===Status===
{| class="wikitable"
|-
! Range
! Status
! Number of files
! Size in gigabytes
! Downloader
|-
| 00000-09999
| Done, [http://archive.org/details/FileplanetFiles_00000-09999 archived]
| 1991
| 1G
| Schbirid
|-
| 10000-19999
| Done, [http://archive.org/details/FileplanetFiles_10000-19999 archived]
| 3159
| 9G
| Schbirid
|-
| 20000-29999
| Done, locally
| 6453
| 7G
| Schbirid
|-
| 30000-39999
| Done, locally
| 4085
| 9G
| Schbirid
|-
| 40000-49999
| Done, [http://archive.org/details/FileplanetFiles_40000-49999 archived]
| 5704
| 18G
| Schbirid
|-
| 50000-54999
| Done, locally
| 2706
| 24G
| Schbirid
|-
| 55000-59999
| Done, [http://archive.org/details/FileplanetFiles_50000-559999 archived] (bad URL)
| 2390
| 24G
| Schbirid
|-
| 60000-64999
| Done, locally
| 2349
| 24G
| Schbirid
|-
|}

Latest revision as of 21:36, 28 December 2023

FilePlanet
Fileplanet logo
Website host of game content, 1999-2012
URL http://www.fileplanet.com
Status Special case (no longer being updated)
Archiving status Saved!
Archiving type Unknown
IRC channel #archiveteam-bs (on hackint)
(formerly #fireplanet (on EFnet))
Data archiveteam-fileplanet

In 2012 FilePlanet announced that it was no longer hosting new content and that the site "is in the process of being archived [by IGN]."

FilePlanet hosted tens of thousands of game-related files (demos, patches, mods, promo stuff, etc.). These tend to be larger files, ranging from 10MB patches to 3GB clients.

The archival

We first downloaded files by iterating IDs on the public website fileplanet.com and uploading those in chunks to archive.org, and by scouting the web for other public URLs. The staff then gave us FTP access to the storage servers. Thanks!

https://archive.org/details/archiveteam-fileplanet is the collection.

Unpacked and sorted, it amounts to about 120k files at ~10TB. The "ftp2" files (another ~300k files totaling ~1.2TB) cannot be shared publicly because private files are mixed in. We saved them to IA anyway, so they may be sorted out in the future. If you are looking for files from Fileplanet that are not included in the public archives, contact User:Schbirid with archived URLs that prove they were previously available to the public, e.g. archived fileplanet.com pages.

There is a half-assed search interface available at https://www.quaddicted.com/stuff/fileplanet/fileplanet.php?filename=yourfilenamehere and a directory browser at https://www.quaddicted.com/stuff/fileplanet/fileplanet.php?directory=/some/dir/here/
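
Both URLs follow the same pattern, so they can be built from a file name or directory path. A small sketch: the function and variable names are invented here, and no percent-encoding is applied, so names with spaces or special characters would need to be encoded first.

```shell
#!/usr/bin/env bash
# Base URL taken from the text above; the helper names are hypothetical.
FILEPLANET_SEARCH_BASE='https://www.quaddicted.com/stuff/fileplanet/fileplanet.php'

# Build the search URL for a file name (no percent-encoding is done).
search_url() {
    printf '%s?filename=%s\n' "$FILEPLANET_SEARCH_BASE" "$1"
}

# Build the directory browser URL for a path such as /some/dir/here/.
directory_url() {
    printf '%s?directory=%s\n' "$FILEPLANET_SEARCH_BASE" "$1"
}
```

For example, <code>search_url yourfilenamehere</code> prints the first URL shown above, which can then be opened in a browser or fetched with wget.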

Related items