|Archiving status||In progress...|
|IRC channel||(on hackint)|
GeoCities Japan is the Japanese version of GeoCities. It survived the 2009 shutdown of the global platform.
On 2018-10-01, Yahoo! Japan announced that they would be closing GeoCities at the end of March 2019. (New accounts can still be created until 2019-01-10.)
- DNS CNAMEs for geocities (JSON format) (dead link): https://transfer.sh/QYWEG/geocities-dns-data
- Several records available at: https://anonfile.com/z1z62ak8ba/records_zip
  - geocities_jp_first.txt: First-level subdirectory list under geocities.jp, compiled from IA CDX data. 566,690 records in total.
  - geocities_co_jp_first.txt: Same as above, for geocities.co.jp. 12,470 records in total.
    - NOTE: Most sites under geocities.co.jp are not first-level sites but second-level "field" sites. In theory there could be 1.79M of them; how many actually exist is unknown. See the explanation below.
  - blogs_yahoo_co_jp_first.txt: Same as above, for blogs.yahoo.co.jp. 646,901 records in total.
  - geocities_co_jp_fields.txt: List of field names under geocities.co.jp.
    - Individual websites are listed in the following format: "http://www.geocities.co.jp/[FieldName]/[AAAA]", where AAAA ranges from 0000 to 9999.
  - include-surts.txt: List of subdomains that should be allowed by your crawler.
- geocities.jp grab from E-Shuushuu Wiki, crawled as job:cu6azkjwy45qmo1wwdxsdfusj: https://pastebin.com/raw/17hLpsN5
- geocities.jp grab from Danbooru, crawled as job:5x0pf7wloqgeqc2r9rddino2l: https://gist.githubusercontent.com/DoomTay/12a146e35fcee745b764ba3ae3c7545f/raw/863a021e43e0c93cb6f8943725a2ef5d1a699477/geocities-danbooru.txt
- geocities.co.jp and missed geocities.jp URLs grabbed from the above targets, crawled as job:31ges4c4c96k140sp6zah5vcc: https://transfer.sh/CLtZc/geocities-patch.txt (dead link)
- geocities.co.jp and geocities.jp crawl from Miss Surfersparadise, crawled as job:e8ynrp5a7p4vwjkyxw9eph9p0: https://archive.org/download/archiveteam_archivebot_go_20181021150002/urls-transfer.sh-geocities-misssp.txt-inf-20181007-102152-3ntkw-urls.txt
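
The field-site URL scheme noted above ("http://www.geocities.co.jp/[FieldName]/[AAAA]", with AAAA from 0000 to 9999) implies 10,000 candidate addresses per field. The following is a minimal sketch of how those candidates could be enumerated for a crawler seed list. The function names and the field name `ExampleField` are hypothetical placeholders, not entries taken from geocities_co_jp_fields.txt, and nothing here checks whether a candidate site actually exists.

```python
from typing import Iterable, Iterator

# Base URL taken from the format described in the list above.
BASE = "http://www.geocities.co.jp"
SITES_PER_FIELD = 10_000  # AAAA runs from 0000 to 9999


def field_site_urls(field_name: str) -> Iterator[str]:
    """Yield every candidate site URL for one field, zero-padded to 4 digits."""
    for number in range(SITES_PER_FIELD):
        yield f"{BASE}/{field_name}/{number:04d}"


def candidate_count(field_names: Iterable[str]) -> int:
    """Upper bound on field sites: 10,000 candidates per field name."""
    return len(list(field_names)) * SITES_PER_FIELD


if __name__ == "__main__":
    # "ExampleField" is a made-up name used only for illustration.
    urls = list(field_site_urls("ExampleField"))
    print(urls[0])
    print(urls[-1])
    print(candidate_count(["ExampleField"]))
```

Because most candidate addresses are likely unused, a list generated this way is only an upper bound; it would normally be filtered against known-live URLs (for example from CDX data) before being fed to a crawl job.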