Difference between revisions of "Nifty"

Revision as of 12:35, 13 September 2016

Nifty
Japanese ISP with web hosting
URL	homepage.nifty.com
Status	Closing
Archiving status	Not saved yet
Archiving type	Unknown
IRC channel	#archiveteam-bs (on hackint)

Japanese ISP providing web hosting. It will close about 140,000 unclaimed homepages by 2016-09-29. Termination notice (Japanese)

http://homepage.nifty.com/USERNAME/
http://homepage2.nifty.com/USERNAME/
http://homepage3.nifty.com/USERNAME/

URL harvesting

Let's follow Site exploration.

<polm> One thing I would recommend is searching Hatena Bookmarks, which is like a Japanese free Pinboard
<polm> Like so: http://b.hatena.ne.jp/entrylist?url=homepage2.nifty.com
<polm> the "of" query parameter paginates like so: http://b.hatena.ne.jp/entrylist?url=homepage2.nifty.com&of=20
<zout> there's some here. https://archive.is/homepage2.nifty.com
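
The pagination polm describes can be sketched as a small URL generator. This is only a sketch: the step of 20 is inferred from the single `of=20` example above and is an assumption, as are the function name `hatena_page_urls` and the `pages` argument. It builds the list of URLs to fetch and does not make any requests itself.

```python
from urllib.parse import urlencode

# Endpoint taken from the example URLs quoted above.
HATENA_ENTRYLIST = "http://b.hatena.ne.jp/entrylist"


def hatena_page_urls(host, pages, step=20):
    """Return entrylist URLs for the first `pages` result pages of `host`.

    `step` is the assumed offset increment per page (inferred from `of=20`).
    """
    urls = []
    for i in range(pages):
        params = {"url": host}
        if i > 0:
            # The first page carries no offset; later pages add `of`.
            params["of"] = i * step
        urls.append(HATENA_ENTRYLIST + "?" + urlencode(params))
    return urls


if __name__ == "__main__":
    for page_url in hatena_page_urls("homepage2.nifty.com", 3):
        print(page_url)
```

The same generator can be run for homepage.nifty.com and homepage3.nifty.com; when to stop paging (for example, at the first page that yields no new links) is left to whoever does the actual scrape.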

Progress

On 2016-09-12, User:Sanqui harvested 8884 *.nifty.com URLs from Wikimedia sites using mwlinkscrape

Next steps

Make attempts at scraping Google, Bing, Twitter using hints on Site exploration
Scrape hatena
Scrape archive.is
Write a script to unravel URLs (when only a subpage was linked, we want to get the homepage itself too), order strategically by some simple heuristic (Wikipedia gets priority, then high ranking sites on Google, etc.)
Begin feeding lists, split into reasonable chunks, into ArchiveBot after consulting with yipdw
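
A minimal sketch of the unravel, order and chunk steps listed above, assuming the harvested data is a list of (URL, source) pairs. The source labels, the `SOURCE_RANK` table, the function names and the chunk size are all hypothetical; nothing here reflects an actual ArchiveBot interface or a limit agreed with yipdw.

```python
import re
from urllib.parse import urlsplit

# The three hosts listed at the top of this page.
HOST_RE = re.compile(r"^homepage[23]?\.nifty\.com$")

# Hypothetical priority table: lower rank is fed first.
SOURCE_RANK = {"wikimedia": 0, "search": 1}


def homepage_of(url):
    """Return http://HOST/USERNAME/ for a Nifty homepage URL, else None."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if not HOST_RE.match(host):
        return None
    segments = [s for s in parts.path.split("/") if s]
    if not segments:
        return None
    # Assumes the first path segment is always the username.
    return "http://%s/%s/" % (host, segments[0])


def prioritize(harvested):
    """Order (url, source) pairs by source rank; sorting is stable."""
    ranked = sorted(harvested, key=lambda p: SOURCE_RANK.get(p[1], len(SOURCE_RANK)))
    return [url for url, _source in ranked]


def unravel(urls):
    """Emit each homepage followed by the linked subpage, without duplicates."""
    seen = set()
    out = []
    for url in urls:
        home = homepage_of(url)
        if home is None:
            continue  # not a Nifty homepage URL (or no scheme), skip it
        for candidate in (home, url):
            if candidate not in seen:
                seen.add(candidate)
                out.append(candidate)
    return out


def chunks(items, size):
    """Split a list into consecutive chunks of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]
```

Usage would look like `chunks(unravel(prioritize(harvested)), 500)`, where 500 is a placeholder chunk size. URLs without a scheme are skipped by `homepage_of`, so the harvest would need normalizing first if it contains bare hostnames.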

@@ Line 27: / Line 27: @@
 <zout> there's some here. https://archive.is/homepage2.nifty.com
 </pre>
+=== Progress ===
+* On 2016-09-12, [[User:Sanqui]] harvested 8884 *.nifty.com URLs from Wikimedia sites using [[Site exploration#MediaWiki wikis|mwlinkscrape]]
+Next steps
+* Make attempts at scraping Google, Bing, Twitter using hints on [[Site exploration]]
+* Scrape hatena
+* Scrape archive.is
+* Write a script to unravel URLs (when only a subpage was linked, we want to get the homepage itself too), order strategically by some simple heuristic (Wikipedia gets priority, then high ranking sites on Google, etc.)
+* Begin feeding lists, split into reasonable chunks, into ArchiveBot after consulting with yipdw
