Difference between revisions of "GeoCities Project"
Revision as of 17:56, 27 October 2009
Upon the news that Yahoo was closing Geocities, Archive Team initiated the Geocities Project, a coordinated effort to rescue as much of Geocities' data as possible from the to-be-decommissioned Geocities servers. The project began in April 2009 and continued through the summer of 2009, up to Yahoo's closing date of October 26, 2009. A list of Frequently Asked Questions about this project was generated and is available here.
Parallel to our efforts (and in conjunction with them), archive.org began a major "deep crawl" of Geocities to add to their Wayback Machine. The page for their project is here. Please note that Archive Team and archive.org are 100% separate entities, with different approaches to the project of saving data and history.
It cannot be stressed enough how many people were involved with this project - some preferred to stay behind the scenes, while Jason Scott continued his habit of being a complete media hog, getting a lot of the interviews and face time with people asking what was up. But there were dozens of people involved, and they gave weeks of time and effort to finding efficient ways to download all of this data before it was removed.
Technical Details About Geocities
These are now-defunct facts about Geocities, culled from various sources, intended to provide some technical context for how Geocities was arranged, as discovered during the data-harvesting phase.
Before the acquisition by Yahoo, Geocities used an unusual method of organizing its userbase: Neighborhoods. Pages were separated by subject matter into neighborhoods with names like Area51 (Science Fiction and Fantasy), Nashville (Country Music), Augusta (Golf) and others, which made it easier for visitors to find the subject matter they were looking for. For context, search engines as the modern world knows them did not yet exist in such force.
A neighborhood would have up to 9,999 accounts underneath it, with the numbers representing the user's "block". Over time, Geocities added "Suburbs", which allowed an expansion past 9,999 users; these would have names like "Vault" and "Cavern" under the "Area51" neighborhood. A URL would then be available in the form of www.geocities.com/NEIGHBORHOOD/SUBURB/XXXX.
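As a rough illustration of the URL scheme described above, the following Python sketch splits a neighborhood-style address into its parts. The names Homestead and parse_homestead_url are made up for this example, and it assumes that the block is a number from 1 to 9999 and that the suburb segment may be absent (for accounts created before Suburbs existed); neither detail is confirmed beyond the URL form given above.

```python
import re
from typing import NamedTuple, Optional
from urllib.parse import urlsplit


class Homestead(NamedTuple):
    """Hypothetical container for the parts of a neighborhood-style URL."""
    neighborhood: str
    suburb: Optional[str]
    block: int


def _as_block(segment: str) -> Optional[int]:
    """Return the block number if the segment is 1-4 ASCII digits in 1..9999."""
    if re.fullmatch(r"[0-9]{1,4}", segment):
        number = int(segment)
        if 1 <= number <= 9999:  # assumed valid range
            return number
    return None


def parse_homestead_url(url: str) -> Optional[Homestead]:
    """Parse NEIGHBORHOOD/[SUBURB/]XXXX out of a Geocities URL, or return None."""
    if "//" not in url:
        url = "http://" + url  # allow bare "www.geocities.com/..." input
    segments = [s for s in urlsplit(url).path.split("/") if s]
    if len(segments) >= 2:
        block = _as_block(segments[1])
        if block is not None:  # NEIGHBORHOOD/XXXX
            return Homestead(segments[0], None, block)
    if len(segments) >= 3:
        block = _as_block(segments[2])
        if block is not None:  # NEIGHBORHOOD/SUBURB/XXXX
            return Homestead(segments[0], segments[1], block)
    return None


if __name__ == "__main__":
    print(parse_homestead_url("www.geocities.com/Area51/Vault/1234"))
```

Addresses that do not follow the neighborhood scheme (for example, ones that begin with an account name) simply return None, so a crawler could use a check like this to sort harvested URLs by neighborhood.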
Geocities Homestead Neighborhoods and Suburbs, although it has not been updated since 2007, gives an excellent overview of the history of Geocities' Neighborhood organization.
The Various Names and Incarnations of Geocities
Originally called Beverly Hills Internet, the company opened up free web hosting in 1995 after a beta period. It renamed itself to Geopages, and then Geocities. After its acquisition by Yahoo, its name was changed to Yahoo Geocities, which is what it remained until its demise.
The Size and Amount of Geocities Accounts
Geocities provided a limited amount of space for its users to build websites, although this amount grew over time. While the most famous amount is fifteen megabytes per site, the actual limit varied considerably and changed several times over the service's lifetime. This is an attempt to find citations of the size from various sources; it is clear from the various points of reference that different people got different deals through Geocities over the years, especially with regard to paid versus free hosting.
This small size explains the usual look and feel of Geocities accounts, as users were naturally restricted in what items they could have on their pages, and would lean towards simple graphics or use hotlinks to build their look.
- 1997: 2 MB limit for Geocities.
- April 29, 1997: Geocities welcomes its 500,000th "Homesteader" and increases the limit to 11 MB.
- 1998: 15 MB limit for small business service.
- 1999: Geocities has 12 terabytes of storage.
- 2001: 15 MB for Geocities, 25 MB for $8.95 a month.
- 2002: 15 MB limit for Geocities.
- 2002: 25 MB for the newly introduced "Geocities Plus".
- 2003: 25 MB for Geocities Plus (as of June).
- 2005: 75 MB for Geocities Plus (as of January).
- 2005: 25 MB for Geocities Plus (as of April).
Yahoo's Site Explorer showed 23 million HTML pages in Yahoo's index as of April 29, 2009.
Tips n' Tricks
- Although simple directory listings aren't accessible for users' accounts, you might be able to obtain an Apache-style directory listing for their subdirectories. For example, by stripping off the page filename from http://www.geocities.com/nenehs_world1/discography/homebrew.html, we can obtain an index for the subdirectory http://www.geocities.com/nenehs_world1/discography/; the benefit of this is that there may exist files which are not linked internally or externally, so crawlers are not made aware of them. Unfortunately, it seems many users do not organize their content into subdirectories, instead preferring to dump all files directly into the user directory. Also, they may have been good webmasters and provided a directory index, which overrides the directory listing.
- User:Jscott, Joey paulprote and many others are downloading the main www.geocities.com stuff.
- User:Soult downloaded parts of de.geocities.com, which is available as a tar archive here (the download takes 1-2 minutes to start before the first packets arrive; be patient).
- User:Bbot is mirroring downloaded content.
- User:Scumola is crawling Geocities using the archive.org crawler, but is on hold in June due to Comcast's 250GB bandwidth limit. Will resume in July.
- Asheesh Laroia (User:Paulproteus) helped test User-Agent tricks to download from Geocities, and purchased geociti.es.
- User:Gouki is downloading br.geocities.com.
- User:Jourdy288 is going to try to save Sega Master System Land.
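The directory-listing tip above (stripping the page filename off a URL to reach its subdirectory) can be sketched in Python. This is only an illustration: the function name subdirectory_candidates is made up here, and the code assumes the first path segment is the account name, which holds for the example URL in the tip but not for neighborhood-style addresses. It only computes candidate URLs; whether a given one actually returns a listing had to be checked by requesting it.

```python
from typing import List
from urllib.parse import urlsplit, urlunsplit


def subdirectory_candidates(page_url: str) -> List[str]:
    """Return subdirectory URLs below the account root, deepest first.

    Assumes (hypothetically) that the first path segment is the account
    name, whose own listing is not accessible and is therefore skipped.
    """
    parts = urlsplit(page_url)
    segments = [s for s in parts.path.split("/") if s]
    if not parts.path.endswith("/"):
        segments = segments[:-1]  # strip off the page filename

    candidates = []
    # Keep at least one directory below the account name (segments[0]).
    while len(segments) >= 2:
        path = "/" + "/".join(segments) + "/"
        candidates.append(urlunsplit((parts.scheme, parts.netloc, path, "", "")))
        segments = segments[:-1]  # walk one level up
    return candidates


if __name__ == "__main__":
    url = "http://www.geocities.com/nenehs_world1/discography/homebrew.html"
    for candidate in subdirectory_candidates(url):
        print(candidate)
```

A page that sits directly in the user directory yields an empty list, which matches the caveat in the tip that many users never created subdirectories at all.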