Nifty
Japanese ISP with web hosting

URL: homepage.nifty.com
Status: Closing
Archiving status: Not saved yet
Archiving type: Unknown
IRC channel: #archiveteam-bs (on hackint)
Nifty is a Japanese ISP providing web hosting. It will close about 140,000 unclaimed homepages by 2016-09-29; see the termination notice (Japanese).
http://homepage.nifty.com/USERNAME/
http://homepage2.nifty.com/USERNAME/
http://homepage3.nifty.com/USERNAME/
URL harvesting
Let's follow Site exploration.
<pre>
<polm> One thing I would recommend is searching Hatena Bookmarks, which is like a Japanese free Pinboard
<polm> Like so: http://b.hatena.ne.jp/entrylist?url=homepage2.nifty.com
<polm> the "of" query parameter paginates like so: http://b.hatena.ne.jp/entrylist?url=homepage2.nifty.com&of=20
<zout> there's some here. https://archive.is/homepage2.nifty.com
</pre>
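A minimal sketch of walking that pagination, assuming the nifty.com URLs can simply be grepped out of the raw entrylist HTML (the regex and stop condition are illustrative, not tied to Hatena's actual markup):

<pre>
# Hedged sketch: walk Hatena Bookmark's entrylist pages for homepage2.nifty.com,
# pulling any nifty.com homepage URLs out of the raw HTML. The page markup is an
# assumption; a plain regex over the response body stands in for a real parser.
import re
import time
import urllib.request

BASE = "http://b.hatena.ne.jp/entrylist?url=homepage2.nifty.com"
URL_RE = re.compile(r'https?://homepage\d?\.nifty\.com/[^"\'<> ]+')

found = set()
offset = 0
while True:
    page = urllib.request.urlopen(f"{BASE}&of={offset}").read().decode("utf-8", "replace")
    urls = set(URL_RE.findall(page))
    if not urls - found:   # no new URLs on this page: assume we have reached the end
        break
    found |= urls
    offset += 20           # the "of" parameter paginates in steps of 20
    time.sleep(1)          # be polite to Hatena

for url in sorted(found):
    print(url)
</pre>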
Progress
- On 2016-09-12, User:Sanqui harvested 8884 *.nifty.com URLs from Wikimedia sites using mwlinkscrape
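For illustration, a hedged sketch of what such a harvest boils down to on a single wiki, using MediaWiki's standard list=exturlusage API. This is not mwlinkscrape's actual interface, and the wildcard query form is an assumption carried over from Special:LinkSearch:

<pre>
# Hedged sketch: pull *.nifty.com external links from one MediaWiki wiki via the
# standard list=exturlusage API module. mwlinkscrape (see Site exploration) runs
# the same idea across many Wikimedia wikis; this shows only the underlying call.
import json
import urllib.parse
import urllib.request

API = "https://en.wikipedia.org/w/api.php"   # repeat per Wikimedia wiki

def exturlusage(query="*.nifty.com", protocol="http"):
    cont = {}
    while True:
        params = {
            "action": "query",
            "list": "exturlusage",
            "euquery": query,        # Special:LinkSearch-style pattern, no protocol
            "euprotocol": protocol,
            "eulimit": "500",
            "format": "json",
            **cont,                  # continuation parameters from the previous batch
        }
        req = urllib.request.Request(
            API + "?" + urllib.parse.urlencode(params),
            headers={"User-Agent": "nifty-harvest-sketch/0.1 (illustrative only)"},
        )
        with urllib.request.urlopen(req) as r:
            data = json.load(r)
        for hit in data["query"]["exturlusage"]:
            yield hit["url"]
        cont = data.get("continue")
        if not cont:
            break

for url in exturlusage():
    print(url)
</pre>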
Next steps
- Attempt to scrape Google, Bing, and Twitter using the hints on Site exploration
- Scrape Hatena Bookmarks
- Scrape archive.is (a sketch follows this list)
- Write a script to unravel URLs (when only a subpage was linked, we also want the homepage itself) and order them by some simple heuristic (Wikipedia gets priority, then high-ranking sites on Google, etc.); see the unravel sketch after this list
- Begin feeding lists, split into reasonable chunks, into ArchiveBot after consulting with yipdw
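For the archive.is item, a minimal sketch assuming the listing page linked in the IRC log exposes the original nifty.com URLs somewhere in its HTML; it just greps the raw page, so nothing else about the markup is relied on:

<pre>
# Hedged sketch: pull nifty.com URLs mentioned on the archive.is listing page for
# homepage2.nifty.com. Assumption: the original URLs appear somewhere in the HTML.
import re
import urllib.request

LISTING = "https://archive.is/homepage2.nifty.com"
URL_RE = re.compile(r'https?://homepage\d?\.nifty\.com/[^"\'<> ]+')

html = urllib.request.urlopen(LISTING).read().decode("utf-8", "replace")
for url in sorted(set(URL_RE.findall(html))):
    print(url)
</pre>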
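And a sketch of the unravel-and-order step, assuming every homepage root has the form http://homepageN.nifty.com/USERNAME/; the ranking heuristic and the sample data are placeholders:

<pre>
# Hedged sketch of the "unravel" step: reduce each harvested deep link to its
# homepage root and keep both, then sort by a simple placeholder priority.
import re

ROOT_RE = re.compile(r'^(https?://homepage\d?\.nifty\.com/[^/?#]+/)')

def unravel(urls):
    """Return the input URLs plus the homepage root of every deep link."""
    out = set(urls)
    for url in urls:
        m = ROOT_RE.match(url)
        if m:
            out.add(m.group(1))
    return out

def rank(url, sources):
    # Placeholder heuristic: URLs seen on Wikipedia sort first, everything else after.
    return 0 if "wikipedia" in sources.get(url, set()) else 1

if __name__ == "__main__":
    # Hypothetical harvested data: URL -> set of places it was found.
    sources = {
        "http://homepage2.nifty.com/example/sub/page.html": {"wikipedia"},
        "http://homepage3.nifty.com/someone/": {"hatena"},
    }
    for url in sorted(unravel(sources), key=lambda u: (rank(u, sources), u)):
        print(url)
</pre>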