Difference between revisions of "Nifty"

From Archiveteam
=== Progress ===
* On 2016-09-12, [[User:Sanqui]] harvested 8884 *.nifty.com URLs from Wikimedia sites using [[Site exploration#MediaWiki wikis|mwlinkscrape]]

Revision as of 12:35, 13 September 2016

Nifty
Japanese ISP with web hosting
URL homepage.nifty.com
Status Closing
Archiving status Not saved yet
Archiving type Unknown
IRC channel #archiveteam-bs (on hackint)

Japanese ISP providing web hosting. It will close about 140,000 unclaimed homepages by 2016-09-29. See the termination notice (Japanese).

Homepage URL patterns:
http://homepage.nifty.com/USERNAME/
http://homepage2.nifty.com/USERNAME/
http://homepage3.nifty.com/USERNAME/

URL harvesting

Let's follow Site exploration.

<polm> One thing I would recommend is searching Hatena Bookmarks, which is like a Japanese free Pinboard
<polm> Like so: http://b.hatena.ne.jp/entrylist?url=homepage2.nifty.com
<polm> the "of" query parameter paginates like so: http://b.hatena.ne.jp/entrylist?url=homepage2.nifty.com&of=20
<zout> there's some here. https://archive.is/homepage2.nifty.com
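
The pagination polm describes can be sketched as a small URL generator. This is only an illustration: the step of 20 between pages is an assumption inferred from the single `of=20` example above, and `hatena_entrylist_urls` is a hypothetical helper name, not an existing tool. It builds the URLs and does not fetch anything.

```python
from urllib.parse import urlencode

# Base URL taken from the chat log above.
HATENA_ENTRYLIST = "http://b.hatena.ne.jp/entrylist"


def hatena_entrylist_urls(host, pages, step=20):
    """Return one entrylist URL per result page for `host`.

    `step` is ASSUMED to be 20 entries per page, based on the
    single `of=20` example; adjust it if the site pages differently.
    """
    urls = []
    for page in range(pages):
        params = {"url": host}
        if page > 0:
            # The "of" query parameter is the offset of the first entry.
            params["of"] = page * step
        urls.append(HATENA_ENTRYLIST + "?" + urlencode(params))
    return urls


if __name__ == "__main__":
    for host in ("homepage.nifty.com", "homepage2.nifty.com", "homepage3.nifty.com"):
        for url in hatena_entrylist_urls(host, pages=3):
            print(url)
```

The resulting list could be handed to whatever fetcher is used for the actual scrape. How many pages exist per host is not known here, so `pages` has to be chosen or discovered by the caller.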

Progress

Next steps

  • Try scraping Google, Bing, and Twitter using the hints on Site exploration
  • Scrape Hatena Bookmarks
  • Scrape archive.is
  • Write a script to unravel URLs (when only a subpage was linked, we also want the homepage itself), then order the results by a simple heuristic (Wikipedia links get priority, then sites that rank high on Google, etc.)
  • Begin feeding lists, split into reasonable chunks, into ArchiveBot after consulting with yipdw
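
A rough sketch of the unravel, order, and chunk steps listed above, under several assumptions: the function names, the source labels ("wikipedia", "google"), and the chunk size are all hypothetical, and the first path segment is assumed to be the USERNAME as in the URL patterns on this page. It is not an existing Archiveteam script, and the right chunk size for ArchiveBot is still to be agreed with yipdw.

```python
import re
from urllib.parse import urlsplit

# Matches homepage.nifty.com, homepage2.nifty.com and homepage3.nifty.com.
HOST_RE = re.compile(r"^homepage[23]?\.nifty\.com$")


def homepage_root(url):
    """Return http://HOST/USERNAME/ for a Nifty homepage URL, else None.

    ASSUMPTION: the first path segment is the username. A URL such as
    http://homepage2.nifty.com/index.html would be misread as a user.
    """
    parts = urlsplit(url)
    host = parts.hostname or ""  # urlsplit lowercases the hostname
    if not HOST_RE.match(host):
        return None
    segments = [s for s in parts.path.split("/") if s]
    if not segments:
        return None
    return "http://%s/%s/" % (host, segments[0])


def unravel(urls_by_source, source_priority):
    """Add each URL's homepage root, deduplicate, and order by source.

    `urls_by_source` maps a source label to the URLs harvested from it;
    `source_priority` lists labels from most to least important.
    """
    rank = {name: i for i, name in enumerate(source_priority)}
    best = {}  # URL -> best (lowest) rank seen so far
    for source, urls in urls_by_source.items():
        r = rank.get(source, len(rank))  # unknown sources go last
        for url in urls:
            root = homepage_root(url)
            if root is None:
                continue  # not a Nifty homepage URL
            for candidate in (root, url):
                if candidate not in best or r < best[candidate]:
                    best[candidate] = r
    return sorted(best, key=lambda u: (best[u], u))


def chunks(items, size):
    """Split a list into consecutive chunks of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


if __name__ == "__main__":
    harvested = {
        "google": ["http://homepage3.nifty.com/bob/a.html"],
        "wikipedia": ["http://homepage2.nifty.com/alice/pics/1.html"],
    }
    ordered = unravel(harvested, ["wikipedia", "google"])
    for n, chunk in enumerate(chunks(ordered, 1000)):
        print("chunk", n, len(chunk), "URLs")
```

Sorting by (rank, URL) keeps each homepage root directly ahead of its subpages within a priority tier, so the root is queued first even when only a deep link was harvested.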