Difference between revisions of "GeoCities URL Lists"

From Archiveteam
(Add section '''URLs drawn from specific sources''')
* swebb3's initial biglist: http://badcheese.com/~steve/biglist.txt  Mirror: http://cshaiku.com/biglist.txt
* sods list: [http://blog.odonnell.nu/static/sites.tar.bz2] - over 400,000 unique GeoCities sites (not pages). I don't have the ability to download them; hopefully some of the downloaders can make use of this.
== URLs drawn from specific sources ==
It is especially important to back up URLs linked from news sites and other projects that cared about the quality of the sites they link to. The following URL lists were all extracted from dumps or crawls of these sites:
* [http://soultcer.net/geocities-urls/digg.bz2 Links to GeoCities from digg.com] (thanks berticus)
* [http://soultcer.net/geocities-urls/dmoz.bz2 Links to GeoCities from the Open Directory Project (dmoz.org)]
* [http://soultcer.net/geocities-urls/slashdot.bz2 Links to GeoCities from slashdot.org] (thanks berticus)
* [http://soultcer.net/geocities-urls/wikipedia_de.bz2 Links to GeoCities from the German-language Wikipedia]
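
For downloaders who want to feed one of these lists into a crawler, here is a minimal Python sketch of how a list might be unpacked and cleaned up first. It assumes each .bz2 file is a bzip2-compressed plain-text file with one URL per line, which this page does not confirm; the function name <code>geocities_urls</code> is made up for illustration.

```python
import bz2
from urllib.parse import urlsplit


def geocities_urls(compressed: bytes) -> list[str]:
    """Return the unique GeoCities URLs in a bzip2-compressed list.

    Assumption (not confirmed by the wiki page): the decompressed data is
    plain text with one URL per line.
    """
    text = bz2.decompress(compressed).decode("utf-8", errors="replace")
    seen = set()
    result = []
    for line in text.splitlines():
        url = line.strip()
        if not url:
            continue  # skip blank lines
        try:
            # hostname is lowercased, and is None when the line has no scheme
            host = urlsplit(url).hostname or ""
        except ValueError:
            continue  # skip lines that cannot be parsed as a URL
        if host == "geocities.com" or host.endswith(".geocities.com"):
            if url not in seen:  # keep first occurrence, preserve order
                seen.add(url)
                result.append(url)
    return result
```

To try it on a downloaded list, read the file in binary mode and pass the bytes in, for example <code>geocities_urls(open("digg.bz2", "rb").read())</code>, where the local file name is only an example. Lines without a scheme such as <code>http://</code> are skipped by this sketch, so adjust the host check if a list turns out to use bare hostnames.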

Revision as of 02:20, 1 May 2009
