Difference between revisions of "Google Reader"

From Archiveteam
(→‎Archiving: explain a little more how to use CDXes; link my master CDX index which combines them all for easier download & searching)
 
(25 intermediate revisions by 3 users not shown)
Line 3: Line 3:
 
| URL = {{url|1=http://www.google.com/reader/}}
 
| URL = {{url|1=http://www.google.com/reader/}}
 
| image = greader_screenshot_en.gif
 
| image = greader_screenshot_en.gif
| project_status = {{online}}
+
| project_status = {{closed}}
| archiving_status = {{inprogress}}
+
| archiving_status = {{saved}}
 
| source = [https://github.com/ArchiveTeam/greader-grab greader-grab]<br>
 
| source = [https://github.com/ArchiveTeam/greader-grab greader-grab]<br>
 
[https://github.com/ArchiveTeam/greader-directory-grab greader-directory-grab]<br>
 
[https://github.com/ArchiveTeam/greader-directory-grab greader-directory-grab]<br>
Line 16: Line 16:
 
| irc = donereading
 
| irc = donereading
 
}}
 
}}
= Quick Info =
+
'''Google Reader''' was an RSS feed reader, launched by [[Google]] in 2005 and killed off in 2013.
=== Shutdown notification ===
+
== Shutdown notification ==
 
On March 13, 2013, Google announced on the [http://googlereader.blogspot.com/2013/03/powering-down-google-reader.html Official Google Reader Blog] that it would "spring clean" Google Reader:
 
On March 13, 2013, Google announced on the [http://googlereader.blogspot.com/2013/03/powering-down-google-reader.html Official Google Reader Blog] that it would "spring clean" Google Reader:
<blockquote>we will soon retire Google Reader (the actual date is July 1, 2013)</blockquote>
+
:'''Powering Down Google Reader'''
 +
:''3/13/2013 04:06:00 PM''
 +
:''Posted by Alan Green, Software Engineer''
 +
:We have just announced on the [http://googleblog.blogspot.com/2013/03/a-second-spring-of-cleaning.html Official Google Blog] that we will soon retire Google Reader (the actual date is July 1, 2013). We know Reader has a devoted following who will be very sad to see it go. We’re sad too.
 +
:There are two simple reasons for this: usage of Google Reader has declined, and as a company we’re pouring all of our energy into fewer products. We think that kind of focus will make for a better user experience.
 +
:To ensure a smooth transition, we’re providing a three-month sunset period so you have sufficient time to find an alternative feed-reading solution. If you want to retain your Reader data, including subscriptions, you can do so through [https://www.google.com/takeout/?pli=1#custom:reader Google Takeout].
 +
:Thank you again for using Reader as your RSS platform.
 +
Reader and the Reader API were turned off shortly after midnight Pacific time on July 2, 2013.
  
=== Backing up your own data ===
+
== Post-shutdown message ==
* Main page - [http://www.google.com/reader/ google.com/reader/]
+
:'''Thank you for stopping by.'''
* Export via [https://www.google.com/takeout/ Google Takeout]
+
:Google Reader has been [http://googleblog.blogspot.com.au/2013/03/a-second-spring-of-cleaning.html discontinued]. We want to thank all our loyal fans. We understand you may not agree with this decision, but we hope you'll come to love [http://alternativeto.net/software/google-reader/ these alternatives] as much as you loved Reader.
** Contains subscriptions and starred items, but not tags
+
:Sincerely,
** Can be imported into [http://theoldreader.com/ The Old Reader]
+
:The Google Reader team
* API: https://code.google.com/p/pyrfeed/wiki/GoogleReaderAPI
+
:'''Frequently-asked questions'''
 +
:# '''What will happen to my Google Reader data?'''<br />All Google Reader subscription data (eg. lists of people that you follow, items you have starred, notes you have created, etc.) will be systematically deleted from Google servers. You can download a copy of your Google Reader data via [https://www.google.com/takeout/#custom:reader Google Takeout] until 12PM PST July 15, 2013.
 +
:# '''Will there be any way to retrieve my subscription data from Google in the future?'''<br />No -- all subscription data will be permanently, and irrevocably deleted. Google will not be able to recover any Google Reader subscription data for any user after July 15, 2013.
 +
:# '''Why was Google Reader discontinued?'''<br />Please refer to our [http://googleblog.blogspot.com.au/2013/03/a-second-spring-of-cleaning.html blog post] for more information.
  
=== Backing up historical feed data ===
+
== Archiving ==
Google Reader acts as a cache for RSS/Atom feed content, keeping deleted posts and deleted blogs readable (if you can recreate the RSS/Atom feed URL).  After the Reader shutdown, only a small portion (100 posts per blog) [https://groups.google.com/forum/?fromgroups=#!topic/Google-AJAX-Search-API/OaGf0eP57js will be available via the Feeds API], so it is imperative we grab everything before July 1 through the <tt>/reader/</tt> API. To help, read on.
+
Archive Team's [[User:Ivan|Ivan]] launched a heroic effort to retrieve historical feed data from the Google Reader API. Details can be found in the [[Google Reader/War room| war room]].
  
= How you can help =
+
All WARCs have been uploaded to http://archive.org/details/archiveteam_greader. The total size is about 8800 GB (feed data + directory + stats).
  
=== Upload your feed URLs ===
+
We don't yet have a convenient tool to read a specific feed in the uploaded megawarcs.  The [https://archive.org/details/archiveteam-googlereader201306-indexes.cdx master CDX index] (see the [https://archive.org/web/researcher/cdx_file_format.php CDX file format for interpreting each entry]) gives metadata about which megawarc an archived file/feed/URL is in, and also the file's byte range, which allows seeking directly to it in the megawarc.
  
We need to discover as many feed URLs as possible.  Not all of them can be discovered through crawling, so please upload your OPML files.  (If you have any private or password-protected feeds, please strip them out first.)
+
== Backing up your own data ==
 
+
* Main page - [http://www.google.com/reader/ google.com/reader/]
<big><b>Upload OPML files and lists of URLs to:
+
* Export some data via [https://www.google.com/takeout/ Google Takeout] until July 15. Your ZIP file will contain subscriptions and starred items, but not tags. This can be imported into any of the [http://getgini.com/google-reader-alternatives Google Reader alternatives].
 
+
* Before it shut down, you could export ''everything'' with [http://readerisdead.com/ Reader is Dead].
http://allyourfeed.ludios.org:8080/
 
</b></big>
 
 
 
=== Run the grab on your Linux machine ===
 
 
 
This project is not in the Warrior yet, so follow the install steps on these projects:
 
 
 
https://github.com/ArchiveTeam/greader-grab (grabs the [https://www.google.com/reader/api/0/stream/contents/feed/http%3A%2F%2Fgoogleblog.blogspot.com%2Ffeeds%2Fposts%2Fdefault?r=n&n=1000 actual text content of feeds])
 
 
 
https://github.com/ArchiveTeam/greader-directory-grab ([https://www.google.com/reader/view/#directory-search/test//0 searches for feeds using Reader's Feed Directory])
 
 
 
https://github.com/ArchiveTeam/greader-stats-grab ([https://www.google.com/reader/api/0/stream/details?s=feed%2Fhttp%3A%2F%2Fgoogleblog.blogspot.com%2Ffeeds%2Fposts%2Fdefault&tz=0&fetchTrends=true&output=json&client=scroll grabs subscriber counts and other data])
 
 
 
(Up to ~5GB of your disk space will be used; items are immediately uploaded elsewhere.)
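
The two API URLs linked above (stream contents and stream details) follow a simple pattern: the feed URL is percent-encoded and prefixed with <tt>feed/</tt>. The sketch below only rebuilds that URL layout from the examples on this page. The function names are made up for illustration, and the Reader API itself has been offline since July 2013, so these URLs can no longer be fetched.

```python
from urllib.parse import quote, urlencode

# Historical base URL, taken from the example links above (no longer online).
READER_API = "https://www.google.com/reader/api/0"


def stream_contents_url(feed_url, count=1000):
    """Rebuild the stream/contents URL used to fetch the cached items of a feed."""
    # The whole feed URL is percent-encoded, including ':' and '/'.
    stream_id = "feed/" + quote(feed_url, safe="")
    return READER_API + "/stream/contents/" + stream_id + "?" + urlencode({"r": "n", "n": count})


def stream_details_url(feed_url):
    """Rebuild the stream/details URL used to fetch subscriber counts and trends."""
    params = [
        ("s", "feed/" + feed_url),  # urlencode also escapes the '/' after "feed"
        ("tz", 0),
        ("fetchTrends", "true"),
        ("output", "json"),
        ("client", "scroll"),
    ]
    return READER_API + "/stream/details?" + urlencode(params)


if __name__ == "__main__":
    feed = "http://googleblog.blogspot.com/feeds/posts/default"
    print(stream_contents_url(feed))
    print(stream_details_url(feed))
```

The parameter values (<tt>r=n</tt>, <tt>n=1000</tt>, <tt>client=scroll</tt> and so on) are copied from the example links; their exact meaning is not documented here.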
 
 
 
=== Crawl websites to discover blogs and usernames ===
 
 
 
We need to discover millions of blog/username URLs on popular blogging platforms (which we'll turn into feed URLs).
 
 
 
Join [http://chat.efnet.org:9090/?nick=&channels=%23donereading&Login=Login #donereading] and [http://chat.efnet.org:9090/?nick=&channels=%23archiveteam&Login=Login #archiveteam] on efnet if you'd like to help with this.
 
 
 
The counts listed below are underestimates; please ask on IRC for updated counts.
 
 
 
See https://github.com/ludios/greader-item-maker/blob/master/url_filter.py for additional sites not listed here.
 
 
 
* *.tumblr.com [12,065,345 discovered through IA and commoncrawl]
 
** http<font></font>://USERNAME.tumblr<font></font>.com/rss
 
* *.livejournal.com [211,146 discovered through commoncrawl]
 
** http://USERNAME.livejournal.com/data/rss
 
** http://USERNAME.livejournal.com/data/atom
 
** http://USERNAME.livejournal.com/data/rss/
 
** http://USERNAME.livejournal.com/data/atom/
 
** http://www.livejournal.com/users/USERNAME/data/atom/ (older feed location for users)
 
** http://www.livejournal.com/users/USERNAME/data/rss/ (older feed location for users)
 
** http://www.livejournal.com/users/USERNAME/data/atom (older feed location for users)
 
** http://www.livejournal.com/users/USERNAME/data/rss (older feed location for users)
 
** http://community.livejournal.com/COMMUNITY/data/rss (older feed location for communities)
 
** http://community.livejournal.com/COMMUNITY/data/atom (older feed location for communities)
 
** http://www.livejournal.com/community/COMMUNITY/data/rss (older feed location for communities)
 
** http://www.livejournal.com/community/COMMUNITY/data/atom (older feed location for communities)
 
* *.wordpress.com [1,319,787 discovered through commoncrawl]
 
** http://BLOGNAME.wordpress.com/feed/
 
** https://BLOGNAME.wordpress.com/feed/
 
** http://BLOGNAME.wordpress.com/feed/atom/
 
** https://BLOGNAME.wordpress.com/feed/atom/ (probably low hit rate)
 
** http://BLOGNAME.wordpress.com/feed/rss/
 
** https://BLOGNAME.wordpress.com/feed/rss/ (probably low hit rate)
 
** http://BLOGNAME.wordpress.com/feed
 
** https://BLOGNAME.wordpress.com/feed
 
** http://BLOGNAME.wordpress.com/comments/feed/
 
** https://BLOGNAME.wordpress.com/comments/feed
 
* Wordpress blogs not on wordpress.com (easily identified by URLs containing "wp-content" or – with some false positives – by searching for '[0-9]{4}/[0-9]{2}/')
 
** SCHEMA+DOMAIN/feed/
 
** SCHEMA+DOMAIN/feed
 
** SCHEMA+DOMAIN/feed/rss/
 
** SCHEMA+DOMAIN/feed/atom/
 
** SCHEMA+DOMAIN/comments/feed/
 
** SCHEMA+DOMAIN/comments/feed
 
* *.blogspot.com [4,179,274 discovered through commoncrawl]
 
** http://BLOGNAME.blogspot.com/feeds/posts/default
 
** http://BLOGNAME.blogspot.com/feeds/posts/default?alt=rss
 
** http://BLOGNAME.blogspot.com/atom.xml (older feed)
 
** http://BLOGNAME.blogspot.com/rss.xml (older feed)
 
** http://www.BLOGNAME.blogspot.com/feeds/posts/default
 
** http://www.BLOGNAME.blogspot.com/feeds/posts/default?alt=rss
 
** http://www.BLOGNAME.blogspot.com/atom.xml (older feed)
 
** http://www.BLOGNAME.blogspot.com/rss.xml (older feed)
 
** http://BLOGNAME.blogspot.com/feeds/THREADID/comments/default
 
*** e.g. http://digicmb.blogspot.com/feeds/206744415950084609/comments/default
 
* blogger.com feeds, mostly redundant with blogspot.com
 
** http://www.blogger.com/feeds/*/posts/default
 
** http://www.blogger.com/feeds/*/posts/default?alt=rss
 
** https://www.blogger.com/feeds/*/posts/default
 
** https://www.blogger.com/feeds/*/posts/default?alt=rss
 
* http://feeds.feedburner.com/* [455,213 discovered through commoncrawl]
 
** http://feeds.feedburner.com/FEEDNAME
 
** +lowercase FEEDNAME
 
* http://feeds2.feedburner.com/*
 
** http://feeds2.feedburner.com/FEEDNAME
 
** +lowercase FEEDNAME
 
* http://feeds.rapidfeeds.com/*/ [generated 1-60,000]
 
** e.g. http://feeds.rapidfeeds.com/35746/
 
* *.posterous.com [9,901,701 discovered through spidering and commoncrawl]
 
** http://USERNAME.posterous.com/rss.xml
 
** https://USERNAME.posterous.com/rss.xml
 
* http://groups.google.com/group/* [13,966 discovered through commoncrawl]
 
** http://groups.google.com/group/GROUPNAME/feed/rss_v2_0_msgs.xml
 
** https://groups.google.com/group/GROUPNAME/feed/rss_v2_0_msgs.xml
 
** http://groups.google.com/group/GROUPNAME/feed/atom_v1_0_msgs.xml
 
** https://groups.google.com/group/GROUPNAME/feed/atom_v1_0_msgs.xml
 
* http://groups.yahoo.com/group/*/ [48,352 discovered through commoncrawl]
 
** http://rss.groups.yahoo.com/group/GROUPNAME/rss
 
** http://groups.yahoo.com/group/GROUPNAME/messages?rss=1 (older feed)
 
** +lowercase GROUPNAME
 
* *.typepad.com [77,983 domain-blogname pairs discovered through commoncrawl]
 
* *.typepad.jp
 
* http://blog.roodo.com/*
 
* *.diarynote.jp
 
* ameblo.jp/*
 
* http://www.wretch.cc/blog/*
 
* http://www.formspring.me/profile/USERNAME.rss
 
* *.blog.shinobi.jp
 
* *.exblog.jp [114,359 discovered through commoncrawl]
 
** http://BLOGNAME.exblog.jp/index.xml
 
** http://BLOGNAME.exblog.jp/atom.xml
 
** http://rss.exblog.jp/rss/exblog/BLOGNAME/index.xml
 
** http://rss.exblog.jp/rss/exblog/BLOGNAME/atom.xml
 
* http://*.blog.hexun.com
 
** http://USERNAME.blog.hexun.com/rss2.aspx
 
** http://fulltextrssfeed.com/USERNAME.blog.hexun.com/rss2.aspx
 
*** e.g. http://bbc1030.blog.hexun.com.tw/rss2.aspx
 
* http://*.blog.hexun.com.tw
 
** http://USERNAME.blog.hexun.com.tw/rss2.aspx
 
** http://fulltextrssfeed.com/USERNAME.blog.hexun.com.tw/rss2.aspx
 
* http://blog.livedoor.jp/*
 
** http://blog.livedoor.jp/BLOGNAME/index.rdf
 
** http://blog.livedoor.jp/BLOGNAME/atom.xml
 
* http://*.altervista.org/
 
* http://*.qzone.qq.com/
 
** http://feeds.qzone.qq.com/cgi-bin/cgi_rss_out?uin=QQID
 
** e.g. http://feeds.qzone.qq.com/cgi-bin/cgi_rss_out?uin=469826844
 
* http://*.blog.163.com/rss/
 
** http://USERNAME.blog.163.com/rss/
 
** e.g. http://hxcy1965.blog.163.com/rss/
 
* http://*.inube.com/
 
* http://*.my.nero.com/
 
** https://www.google.com/reader/view/#stream/feed%2Fhttp%3A%2F%2Frss.my.nero.com%2FlatestUser lists the usernames
 
* http://www.feed43.com/*
 
** e.g. http://feed43.com/6237213781584644.xml
 
* http://*.blog4ever.com/
 
* http://*.xanga.com/ (previously http://<font></font>www.xanga.com/* )
 
** http://<font></font>USERNAME.xanga.com/rss
 
** http://<font></font>USERNAME.xanga.com/rss/
 
** http://<font></font>www.xanga.com/rss.aspx?user=USERNAME
 
** http://<font></font>www.xanga.com/USERNAME/rss
 
* http://*.pixnet.net/
 
** http://feed.pixnet.net/blog/posts/rss/USERNAME
 
** http://feed.pixnet.net/blog/posts/atom/USERNAME
 
* twitter.com/* [~40M discovered through various datasets]
 
** http://twitter.com/statuses/user_timeline/USER-ID.rss (older feed)
 
** https://twitter.com/statuses/user_timeline/USER-ID.rss (older feed)
 
** http://twitter.com/statuses/user_timeline/USER-ID.atom (older feed)
 
** https://twitter.com/statuses/user_timeline/USER-ID.atom (older feed)
 
** http://twitter.com/statuses/user_timeline/USERNAME.rss (older feed)
 
** https://twitter.com/statuses/user_timeline/USERNAME.rss (older feed)
 
** http://twitter.com/statuses/user_timeline/USERNAME.atom (older feed)
 
** https://twitter.com/statuses/user_timeline/USERNAME.atom (older feed)
 
**
 
** http://api.twitter.com/1/statuses/user_timeline.rss?screen_name=USERNAME
 
** https://api.twitter.com/1/statuses/user_timeline.rss?screen_name=USERNAME
 
** http://api.twitter.com/1/statuses/user_timeline.atom?screen_name=USERNAME [very low hit rate]
 
** https://api.twitter.com/1/statuses/user_timeline.atom?screen_name=USERNAME [very low hit rate]
 
** +lowercase USERNAME for each feed
 
**
 
** http://search.twitter.com/search.rss?q=* (check for feeds Reader already has cached)
 
** https://search.twitter.com/search.rss?q=* ibid
 
** http://search.twitter.com/search.atom?q=* ibid
 
** https://search.twitter.com/search.atom?q=* ibid
 
* facebook.com/*
 
** Has feeds for Pages; see http://ahrengot.com/tutorials/facebook-rss-feed/
 
** Has feeds for Groups as well; see https://apps.facebook.com/groups_to_rss/
 
* plus.google.com/*
 
** http://rss2lj.net/g+/USER-ID
 
** http://gplusrss.com/rss/feed/[some kind of checksum or hash]
 
** http://www.googleplusfeed.net/feed/USER-ID
 
*** e.g. http://www.googleplusfeed.net/feed/115030581977322198102
 
* *.dreamwidth.org
 
* *.blog.com
 
* http://<font></font>pipes.yahoo.com/pipes/pipe.<font></font>run*
 
** You can search for feeds, e.g. http://<font></font>pipes.yahoo.com/pipes/search?r=source%3Afeeds.feedburner.com
 
* http://page2rss.com/rss/*
 
** e.g. http://page2rss.com/rss/0f57ce71ebdd24878485c8d3624c3819
 
* http://page2rss.com/atom/*
 
** e.g. http://page2rss.com/atom/ae56d7ac85827977bcf0aa7857f3f309
 
* 4chan.org
 
** Image Boards: http://boards.4chan.org/BOARD/index.rss (RSS)
 
** Image Boards: https://boards.4chan.org/BOARD/index.rss (RSS)
 
** Text Boards: http://dis.4chan.org/atom/BOARD (Atom)
 
** Text Boards: https://dis.4chan.org/atom/BOARD (Atom)
 
* *.vox.com
 
** http://USERNAME.vox.com/library/posts/atom.xml
 
** http://USERNAME.vox.com/library/posts/atom-full.xml
 
** http://USERNAME.vox.com/library/posts/rss.xml
 
** http://USERNAME.vox.com/library/posts/rss-full.xml
 
** http://USERNAME.vox.com/library/photos/rss.xml (probably skip)
 
* *.jux.com
 
** http://USERNAME.jux.com/quarks.rss
 
** https://USERNAME.jux.com/quarks.rss
 
* *.at.webry.info
 
* http://www.rsspect.com/*
 
** e.g. http://www.rsspect.com/rss/vagrant.xml
 
* http://buzz.googleapis.com/feeds/*/public/posted
 
** e.g. http://buzz.googleapis.com/feeds/112778807045063877346/public/posted
 
* craigslist.org
 
* http://www.mail-archive.com/*/maillist.xml
 
** e.g. http://www.mail-archive.com/linux-zigbee-devel@lists.sourceforge.net/maillist.xml
 
* Reddit users
 
** http://www.reddit.com/user/USERNAME/.rss
 
** https://pay.reddit.com/user/USERNAME/.rss (very low hit rate)
 
** http://www.reddit.com/user/USERNAME/comments/.rss
 
** https://pay.reddit.com/user/USERNAME/comments/.rss (very low hit rate)
 
** http://www.reddit.com/user/USERNAME/submitted/.rss
 
** https://pay.reddit.com/user/USERNAME/submitted/.rss (very low hit rate)
 
** +everything again with lowercased USERNAME
 
* Subreddits [152,042 found]
 
** http://www.reddit.com/r/SUBREDDIT/.rss
 
** https://pay.reddit.com/r/SUBREDDIT/.rss
 
** http://www.reddit.com/r/SUBREDDIT/top/.rss
 
** https://pay.reddit.com/r/SUBREDDIT/top/.rss
 
** http://www.reddit.com/r/SUBREDDIT/controversial/.rss
 
** https://pay.reddit.com/r/SUBREDDIT/controversial/.rss
 
** http://www.reddit.com/r/SUBREDDIT/new/.rss
 
** https://pay.reddit.com/r/SUBREDDIT/new/.rss
 
** +everything again with lowercased SUBREDDIT
 
* http://blog.myspace.com/blog/rss.cfm?friendID=FRIENDID (+ https?)
 
** Are these the blogs that myspace deleted on 2013-06-14?
 
** e.g. http://www.google.com/reader/view/#stream/feed%2Fhttp%3A%2F%2Fblog.myspace.com%2Fblog%2Frss.cfm%3FfriendID%3D181926159
 
* Windows Live Spaces feeds
 
** http://*.spaces.live.com/feed.rss
 
** http://*.spaces.live.com/blog/feed.rss
 
** http://*.spaces.live.com/photos/feed.rss
 
* Old Hacker News feeds
 
** http://rss.searchyc.com/user/USERNAME
 
** http://rss.searchyc.com/user/USERNAME?only=comments
 
** http://rss.searchyc.com/user/USERNAME?only=comments&sort=by_date
 
** http://rss.searchyc.com/user/USERNAME?sort=by_date
 
** http://rss.searchyc.com/USERNAME?sort=by_date
 
* Less Wrong feeds
 
** http://lesswrong.com/user/USERNAME/overview/.rss
 
** http://lesswrong.com/user/USERNAME/submitted/.rss
 
** http://lesswrong.com/user/USERNAME/comments/.rss
 
* Quora feeds
 
** http://www.quora.com/TOPIC/rss [101,265 discovered]
 
** http://www.quora.com/USERNAME/rss
 
** http://www.quora.com/USERNAME/questions/rss
 
** http://www.quora.com/USERNAME/answers/rss
 
* "shared items" feeds created by Reader users
 
** http://www.google.com/reader/public/atom/user/*/state/com.google/broadcast
 
*** e.g. http://www.google.com/reader/public/atom/user/06575532310267031409/state/com.google/broadcast
 
** Probably download these through the special API URL, e.g. https://www.google.com/reader/api/0/stream/contents/user/06575532310267031409/state/com.google/broadcast?r=n&n=1000
 
* "generated feeds" created while the feature was available
 
** http://www.google.com/reader/public/atom/webfeed/*
 
*** e.g. http://www.google.com/reader/public/atom/webfeed/11571763057935010098
 
** Probably download these through the special API URL, e.g. https://www.google.com/reader/api/0/stream/contents/webfeed/11571763057935010098?r=n&n=1000
 
* http://www.kickstarter.com/projects/PROJECTID/PROJECTNAME/posts.atom
 
** e.g. http://www.kickstarter.com/projects/306316578/light-table/posts.atom
 
* del.icio.us feeds
 
** Users: http://del.icio.us/rss/USERNAME
 
** Tags: http://del.icio.us/rss/tag/TAGNAME
 
** Popular: http://del.icio.us/rss/popular
 
** Popular tags: http://del.icio.us/rss/popular/TAGNAME
 
* http://youtube.com/user/*
 
** http://www.youtube.com/rss/user/USERNAME/videos.rss (old feed)
 
** http://gdata.youtube.com/feeds/api/users/USERNAME/uploads
 
** https://gdata.youtube.com/feeds/api/users/USERNAME/uploads
 
** http://gdata.youtube.com/feeds/api/users/USERNAME/uploads?max-results=50
 
** http://gdata.youtube.com/feeds/api/users/USERNAME/uploads?alt=rss&max-results=50
 
** http://gdata.youtube.com/feeds/base/users/USERNAME/uploads?alt=rss&v=2&client=ytapi-youtube-profile
 
** http://gdata.youtube.com/feeds/base/users/USERNAME/uploads?alt=rss&v=2&orderby=published&client=ytapi-youtube-profile
 
** http://gdata.youtube.com/feeds/base/users/USERNAME/uploads?alt=rss&client=ytapi-youtube-rss-redirect&v=2&orderby=updated (redirect from old feed)
 
* http://*.multiply.com/
 
** http://USERNAME.multiply.com/feed.rss
 
** http://USERNAME.multiply.com/feed
 
* http://bandcamp.com
 
** http://USERNAME.bandcamp.com/feed
 
*** Artist pages have a list of fans, and fan pages have a list of artists; by crawling both, you can map out the Bandcamp userbase.
 
** http://USERNAME.bandcamp.com/feed/album/ALBUMNAME
 
*** For obvious reasons, the album needs to have been published by the given username.
 
* http://vimeo.com/USERNAME
 
** http://vimeo.com/USERNAME/videos/rss
 
** https://vimeo.com/USERNAME/videos/rss
 
*** e.g. https://vimeo.com/chriskpalmer/videos/rss
 
* ... and many more (please add them above!)
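
Turning a discovered username into candidate feed URLs is mostly template substitution. The following is a minimal sketch using a few of the patterns listed above. The table is deliberately incomplete, and <tt>FEED_TEMPLATES</tt>, <tt>LOWERCASE_TOO</tt> and <tt>candidate_feed_urls</tt> are illustrative names, not part of any Archive Team tool.

```python
# A small subset of the URL patterns from the list above.
FEED_TEMPLATES = {
    "tumblr": ["http://{name}.tumblr.com/rss"],
    "livejournal": [
        "http://{name}.livejournal.com/data/rss",
        "http://{name}.livejournal.com/data/atom",
    ],
    "posterous": [
        "http://{name}.posterous.com/rss.xml",
        "https://{name}.posterous.com/rss.xml",
    ],
    "feedburner": ["http://feeds.feedburner.com/{name}"],
}

# Platforms for which the list above asks for a lowercased variant as well.
LOWERCASE_TOO = {"feedburner"}


def candidate_feed_urls(platform, name):
    """Return the candidate feed URLs for one username or feed name."""
    names = [name]
    if platform in LOWERCASE_TOO and name.lower() != name:
        names.append(name.lower())
    return [template.format(name=n) for n in names for template in FEED_TEMPLATES[platform]]


if __name__ == "__main__":
    # "ExampleFeed" is a placeholder, not a real feed name.
    for url in candidate_feed_urls("feedburner", "ExampleFeed"):
        print(url)
```

Keeping the patterns in a table makes it easy to add a platform without touching the expansion logic.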
 
 
 
==== Tools for URL discovery ====
 
 
 
* Custom crawls with wget, HTTrack, Python code, etc
 
* https://commoncrawl.org/analysis-of-the-ncsu-library-urls-in-the-common-crawl-index/
 
<pre>
git clone https://github.com/trivio/common_crawl_index
cd common_crawl_index
pip install --user boto
PYTHONPATH=. python bin/index_lookup_remote 'com.blogspot'
</pre>
 
You can copy and edit <tt>bin/index_lookup_remote</tt> to print just the necessary information:
 
<pre>
# In each snippet, `url` is the index key as seen inside bin/index_lookup_remote.
# Judging by the splits below, it is the reversed domain, then the path, then ":scheme".

# Print entire URL:
rest, scheme = url.rsplit(":", 1)
domain, path = rest.split('/', 1)
print(scheme + '://' + '.'.join(domain.split('.')[::-1]) + '/' + path)

# Print just the hostname (e.g. the blog's subdomain):
print('.'.join(url.split('/', 1)[0].split('.')[::-1]))

# Print just the first two URL /path segments:
rest, scheme = url.rsplit(":", 1)
domain, path = rest.split('/', 1)
print(scheme + '://' + '.'.join(domain.split('.')[::-1]) + '/' + '/'.join(path.split('/', 2)[0:2]))

# Print just the first URL /path segment:
rest, scheme = url.rsplit(":", 1)
domain, path = rest.split('/', 1)
print(scheme + '://' + '.'.join(domain.split('.')[::-1]) + '/' + '/'.join(path.split('/', 1)[0:1]))
</pre>
 
 
 
Pipe the output to <tt>| uniq | bzip2 > sitename-list.bz2</tt>, check it with <tt>bzless</tt>, and upload it to [http://allyourfeed.ludios.org:8080/ our OPML collector].
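
The snippets above can also be wrapped in a standalone function for post-processing a saved list of index keys. This is only a sketch: the key layout (reversed domain, path, then <tt>:scheme</tt>) is inferred from the snippets rather than from the index documentation, and <tt>key_to_url</tt> and <tt>unique_hosts</tt> are hypothetical helper names.

```python
def key_to_url(key):
    """Convert an index key such as 'com.blogspot.example/feeds:http' into a URL."""
    rest, scheme = key.rsplit(":", 1)
    # partition() tolerates keys that have no path at all.
    domain, _, path = rest.partition("/")
    host = ".".join(reversed(domain.split(".")))
    return scheme + "://" + host + "/" + path


def unique_hosts(keys):
    """Return the sorted, de-duplicated hostnames found in a list of index keys."""
    hosts = set()
    for key in keys:
        domain = key.rsplit(":", 1)[0].partition("/")[0]
        hosts.add(".".join(reversed(domain.split("."))))
    return sorted(hosts)


if __name__ == "__main__":
    # Made-up sample keys, for illustration only.
    sample = [
        "com.blogspot.example/feeds/posts/default:http",
        "com.blogspot.example/2013/06/post.html:http",
        "com.blogspot.another/:http",
    ]
    for key in sample:
        print(key_to_url(key))
    print(unique_hosts(sample))
```

Unlike <tt>uniq</tt>, which only drops adjacent duplicates, collecting hostnames in a set removes duplicates regardless of input order.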
 
 
 
* site:domain.com or site:domain.com/page/ searches using Google, Bing, startpage
 
* http://dnshistory.org/subdomains/1/domain.com
 
 
 
=== Add to the above list of blog platforms ===
 
 
 
See:
 
 
 
* http://taimoorsultan.com/list-of-25-blogging-platforms/
 
* http://john.do/blogging-platforms/
 
* http://mashable.com/2007/08/06/free-blog-hosts/
 
* Many non-US blogging platforms
 
* Feeds from dead sites: http://www.archiveteam.org/index.php?title=Deathwatch#Dead_as_a_Doornail
 
 
 
= External links =
 
 
 
WARCs are landing at http://archive.org/details/archiveteam_greader
 
  
 
<references/>
 
<references/>
 
[[Category:Google]]
 
[[Category:Google]]
 
{{Navigation box}}
 
{{Navigation box}}

Latest revision as of 16:11, 30 December 2015

Google Reader
URL: http://www.google.com/reader/
Project status: Offline
Archiving status: Saved!
Project source: greader-grab, greader-directory-grab, greader-stats-grab
Project tracker: greader-grab, greader-directory-grab, greader-stats-grab (each also listed with a :80 link)
IRC channel: #donereading (on EFnet)
Project lead: Unknown

Google Reader was an RSS feed reader, launched by Google in 2005 and killed off in 2013.

Shutdown notification

On March 13, 2013, Google announced on the Official Google Reader Blog that it would "spring clean" Google Reader:

Powering Down Google Reader
3/13/2013 04:06:00 PM
Posted by Alan Green, Software Engineer
We have just announced on the Official Google Blog that we will soon retire Google Reader (the actual date is July 1, 2013). We know Reader has a devoted following who will be very sad to see it go. We’re sad too.
There are two simple reasons for this: usage of Google Reader has declined, and as a company we’re pouring all of our energy into fewer products. We think that kind of focus will make for a better user experience.
To ensure a smooth transition, we’re providing a three-month sunset period so you have sufficient time to find an alternative feed-reading solution. If you want to retain your Reader data, including subscriptions, you can do so through Google Takeout.
Thank you again for using Reader as your RSS platform.

Reader and the Reader API were turned off shortly after midnight Pacific time on July 2, 2013.

Post-shutdown message

Thank you for stopping by.
Google Reader has been discontinued. We want to thank all our loyal fans. We understand you may not agree with this decision, but we hope you'll come to love these alternatives as much as you loved Reader.
Sincerely,
The Google Reader team
Frequently-asked questions
  1. What will happen to my Google Reader data?
    All Google Reader subscription data (eg. lists of people that you follow, items you have starred, notes you have created, etc.) will be systematically deleted from Google servers. You can download a copy of your Google Reader data via Google Takeout until 12PM PST July 15, 2013.
  2. Will there be any way to retrieve my subscription data from Google in the future?
    No -- all subscription data will be permanently, and irrevocably deleted. Google will not be able to recover any Google Reader subscription data for any user after July 15, 2013.
  3. Why was Google Reader discontinued?
    Please refer to our blog post for more information.

Archiving

Archive Team's Ivan launched a heroic effort to retrieve historical feed data from the Google Reader API. Details can be found in the war room.

All WARCs have been uploaded to http://archive.org/details/archiveteam_greader. The total size is about 8800 GB (feed data + directory + stats).

We don't yet have a convenient tool to read a specific feed in the uploaded megawarcs. The master CDX index (see the CDX file format for interpreting each entry) gives metadata about which megawarc an archived file/feed/URL is in, and also the file's byte range, which allows seeking directly to it in the megawarc.
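
Until such a tool exists, a record can be pulled out by hand using the offsets in the index. The sketch below assumes that the index starts with a standard CDX header line and that it declares the field letters <tt>a</tt> (original URL), <tt>g</tt> (file name), <tt>V</tt> (compressed offset) and <tt>S</tt> (compressed record size) from the CDX legend; check the actual header before relying on this. It also assumes each record is stored as its own gzip member, as is usual for <tt>.warc.gz</tt> files. The function names are illustrative.

```python
import gzip


def parse_cdx(lines):
    """Yield one dict per CDX entry, keyed by the field letters from the header line."""
    fields = None
    for line in lines:
        parts = line.split()
        if not parts:
            continue
        if parts[0] == "CDX":
            # Header line, e.g. " CDX N b a m s k r M S V g"
            fields = parts[1:]
            continue
        if fields is not None:
            yield dict(zip(fields, parts))


def lookup(lines, original_url):
    """Return all index entries whose original URL (field 'a') matches exactly."""
    return [entry for entry in parse_cdx(lines) if entry.get("a") == original_url]


def read_record(warc_path, offset, length):
    """Seek to one compressed record in a local megawarc and return it decompressed."""
    with open(warc_path, "rb") as f:
        f.seek(offset)
        return gzip.decompress(f.read(length))
```

With a downloaded megawarc and a matching entry, the call would look like <tt>read_record(entry["g"], int(entry["V"]), int(entry["S"]))</tt>, adjusting the path in <tt>g</tt> to wherever the file was saved. The same offset and length could also be sent as an HTTP <tt>Range</tt> request to avoid downloading the whole megawarc.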

Backing up your own data



v · t · e         Archive Team
Current events

Alive... OR ARE THEY · Deathwatch · Projects

Archiveteam.jpg
Archiving projects

APKMirror · Archive.is · BetaArchive · Government Backup (#datarefuge · ftp-gov· Gmane · Internet Archive · It Died · Megalodon.jp · OldApps.com · OldVersion.com · OSBetaArchive · TEXTFILES.COM · The Dead, the Dying & The Damned · The Mail Archive · UK Web Archive · WebCite · Vaporwave.me

Blogging

Blog.pl · Blogger · Blogster · Blogter.hu · Freeblog.hu · Fuelmyblog · Jux · LiveJournal · My Opera · Nolblog.hu · Open Diary · ownlog.com · Posterous · Powerblogs · Proust · Roon · Splinder · Tumblr · Vox · Weblog.nl · Windows Live Spaces · Wordpress.com · Xanga · Yahoo! Blog · Zapd

Cloud hosting/file sharing

aDrive · AnyHub · Box · Dropbox · Docstoc · Fast.io · Google Drive · Google Groups Files · iCloud · Fileplanet · LayerVault · MediaCrush · MediaFire · Mega · MegaUpload · MobileMe · OneDrive · Pomf.se · RapidShare · Ubuntu One · Yahoo! Briefcase

Corporations

Apple · IBM · Google · Loblaw · Lycos Europe · Microsoft · Yahoo!

Events

Arab Spring · Great Ape-Snake War · Spanish Revolution

Font Repos

DaFont · Google Web Fonts · GNU FreeFont · Fontspace

Forums/Message boards

4chan · Captain Luffy Forums · College Confidential · DSLReports · ESPN Forums · Facepunch Forums · forums.starwars.com · HeavenGames · JamiiForums · Invisionfree · NeoGAF · Textream · The Classic Horror Film Board · Yahoo! Messages · Yahoo! Neighbors · Yuku.com · Zetaboards

Gaming

Atomicgamer · Bazaar.tf · City of Heroes · Club Nintendo · Clutch · Counter-Strike: Global Offensive · CS:GO Lounge · Desura · Dota 2 · Dota 2 Lounge · Emulation Zone · ESEA · GameBanana · GameMaker Sandbox · GameTrailers · Halo · HLTV.org · HQ Trivia · Infinite Crisis · joinDOTA · League of Legends · Liquipedia · Minecraft.net · Player.me · Playfire · Raptr · SingStar · Steam · SteamDB · SteamGridDB · Team Fortress 2 · TF2 Outpost · Warhammer · Xfire

Image hosting

500px · AOL Pictures · Blipfoto · Blingee · Canv.as · Camera+ · Cameroid · DailyBooth · Degree Confluence Project · DeviantART · Demotivalo.net · Flickr · Fotoalbum.hu · Fotolog.com · Fotopedia · Frontback · Geograph Britain and Ireland · Giphy · GTF Képhost · ImageShack · Imgh.us · Imgur · Inkblazers · Instagram · Kepfeltoltes.hu · Kephost.com · Kephost.hu · Kepkezelo.com · Keptarad.hu · Madden GIFERATOR · MLKSHK · Microsoft Clip Art · Microsoft Photosynth · Nokia Memories · noob.hu · Odysee · Panoramio · Photobucket · Picasa · Picplz · Pixiv · Portalgraphics.net · PSharing · Ptch · puu.sh · Rawporter · Relay.im · ScreenshotsDatabase.com · Sketch · Smack Jeeves · Snapjoy · Streetfiles · Tabblo · Tinypic · Trovebox · TwitPic · Wallbase · Wallhaven · Webshots · Wikimedia Commons

Knowledge/Wikis

arXiv · Citizendium · Clipboard.com · Deletionpedia · EditThis · Encyclopedia Dramatica · Etherpad · Everything2 · infoAnarchy · GeoNames · GNUPedia · Google Books (Google Books Ngram· Horror Movie Database · Insurgency Wiki · Knol · Lost Media Wiki · Neoseeker.com · Notepad.cc · Nupedia · OpenCourseWare · OpenStreetMap · Orain · Pastebin · Patch.com · Project Gutenberg · Puella Magi · Referata · Resedagboken · SongMeanings · ShoutWiki · The Internet Movie Database · TropicalWikis · Uncyclopedia · Urban Dictionary · Urban Exploration Resource · Webmonkey · Wikia · Wikidot · WikiHow · Wikkii · WikiLeaks · Wikipedia (Simple English Wikipedia· Wikispaces · Wikispot · Wik.is · Wiki-Site · WikiTravel · Word Count Journal

Magazines/Blogs/News

Cyberpunkreview.com · Game Developer Magazine · Gigaom · Hardware Canucks · Helium · JPG Magazine · Make Magazine · The Escapist · Polygamia.pl · San Fransisco Bay Guardian · Scoop · Regretsy · Yahoo! Voices

Microblogging

Heello · Identi.ca · Jaiku · Mommo.hu · Plurk · Sina Weibo · Tencent Weibo · Twitter · TwitLonger

Music/Audio

8tracks · AOL Music · Audimated.com · Cinch · digCCmixter · Dogmazic.net · Earbits · exfm · Free Music Archive · Gogoyoko · Indaba Music · Instacast · Instaudio · Jamendo · Last.fm · Music Unlimited · MOG · PureVolume · Reverbnation · ShareTheMusic · SoundCloud · Soundpedia · Spotify · This Is My Jam · TuneWiki · Twaud.io · WinAmp

People

Aaron Swartz · Michael S. Hart · Steve Jobs · Mark Pilgrim · Dennis Ritchie · Len Sassaman Project

Protocols/Infrastructure

FTP · Gopher · IRC · Usenet · World Wide Web
BitTorrent DHT

Q&A

Askville · Answerbag · Answers.com · Ask.com · Askalo · Baidu Knows · Blurtit · ChaCha · Experts Exchange · Formspring · GirlsAskGuys · Google Answers · Google Baraza · JustAnswer · MetaFilter · Quora · Retrospring · StackExchange · The AnswerBank · The Internet Oracle · Uclue · WikiAnswers · Yahoo! Answers

Recipes/Food

Allrecipes · Epicurious · Food.com · Foodily · Food Network · Punchfork · ZipList

Social bookmarking

Addinto · Backflip · Balatarin · BibSonomy · Bkmrx · Blinklist · BlogMarks · BookmarkSync · CiteULike · Connotea · Delicious · Designer News · Digg · Diigo · Dir.eccion.es · Evernote · Excite Bookmark · Faves · Favilous · folkd · Freelish · Getboo · GiveALink.org · Gnolia · Google Bookmarks · Hacker News · HeyStaks · IndianPad · Kippt · Knowledge Plaza · Licorize · Linkwad · Menéame · Microsoft Developer Network · myVIP · Mister Wong · My Web · Mylink Vault · Newsvine · Oneview · Pearltrees · Pinboard · Pocket · Propeller.com · Reddit · sabros.us · Scloog · Scuttle · Simpy · SiteBar · Slashdot · Squidoo · StumbleUpon · Twine · Voat · Vizited · Yummymarks · Xmarks · Yahoo! Buzz · Zootool · Zotero

Social networks

Bebo · BlackPlanet · Classmates.com · Cyworld · Dogster · Dopplr · douban · Ello · Facebook · Flixster · FriendFeed · Friendster · Friends Reunited · Gaia Online · Google+ · Habbo · hi5 · Hyves · iWiW · LinkedIn · Miiverse · mixi · MyHeritage · MyLife · Myspace · myVIP · Netlog · Odnoklassniki · Orkut · Plaxo · Qzone · Renren · Skyrock · Sonico.com · Storylane · Tagged · tvtag · Upcoming · Viadeo · Vine · Vkontakte · WeeWorld · Weibo · Wretch · Yahoo! Groups · Yahoo! Stars India · Yahoo! Upcoming · more sites...

Shopping/Retail

Alibaba · AliExpress · Amazon · Apple Store · Barnes & Noble · DirectCanada · eBay · Kmart · NCIX · Printfection · RadioShack · Sears · Sears Canada · Target · The Book Depository · ThinkGeek · Toys "R" Us · Walmart

Software/code hosting

Android Development · Alioth · Assembla · BerliOS · Betavine · Bitbucket · BountySource · Codecademy · CodePlex · Freepository · Free Software Foundation · GNU Savannah · GitHost · GitHub · GitHub Downloads · Gitorious · Gna! · Google Code · ibiblio · java.net · JavaForge · KnowledgeForge · Launchpad · LuaForge · Maemo · mozdev · OSOR.eu · OW2 Consortium · Openmoko · OpenSolaris · Ourproject.org · Ovi Store · Project Kenai · RubyForge · SEUL.org · SourceForge · Stypi · TestFlight · tigris.org · Transifex · TuxFamily · Yahoo! Downloads

Television/Radio

ABC · Austin City Limits · BBC · CBC · CBS · Computer Chronicles · CTV · Fox · G4 · Global TV · Jeopardy! · NBC · NHK · PBS · Penn & Teller: Bullshit! · The Howard Stern Show · TV News Archive (Understanding 9/11)

Torrenting/Piracy

ExtraTorrent · EZTV · isoHunt · KickassTorrents · The Pirate Bay · Torrentz · Library Genesis

Video hosting

Academic Earth · Bambuser · Blip.tv · Epic · Freshlive · Google Video · Justin.tv · Mixer · Niconico · Nokia Trailers · Oddshot.tv · Periscope · Plays.tv · Qwiki · Skillfeed · Stickam · TED Talks · Ticker.tv · Twitch.tv · Ustream · Videoplayer.hu · Viddler · Viddy · Vidme · Vimeo · Vine · Vstreamers · Yahoo! Video · YouTube · Famous Internet videos (Me at the zoo)

Web hosting

Angelfire · Brace.io · BT Internet · CableAmerica Personal Web Space · Claranet Netherlands Personal Web Pages · Comcast Personal Web Pages · Extra.hu · FortuneCity · Free ProHosting · GeoCities (patch) · Google Business Sitebuilder · Google Sites · Internet Centrum · MBinternet · MSN TV · Nifty · Nwnyet · Parodius Networking · Prodigy.net · Saunalahti Iso G · Swipnet · Telenor · Tripod · University of Michigan personal webpages · Verizon Mysite · Verizon Personal Web Space · Webs · Webzdarma · Virgin Media

Web applications

Mailman · MediaWiki · phpBB · Simple Machines Forum · vBulletin

Information

A Million Ways to Die on the Web · Backup Tips · Cheap storage · Collecting items randomly · Data compression algorithms and tools · Dev · Discovery Data · DOS Floppies · Fortress of Solitude · Keywords · Naughty List · Nightmare Projects · Rescuing floppy disks · Rescuing optical media · Site exploration · The WARC Ecosystem · Working with ARCHIVE.ORG

Projects

ArchiveCorps · Audit2014 · Emularity · Faceoff · FlickrFckr · Froogle · INTERNETARCHIVE.BAK (Internet Archive Census) · IRC Quotes · JSMESS · JSVLC · Just Solve the Problem · NewsGrabber · Project Newsletter · Valhalla · Web Roasting (ISP Hosting · University Web Hosting) · Woohoo

Tools

ArchiveBot · ArchiveTeam Warrior (Tracker) · Google Takeout · HTTrack · Video downloaders · Wget (Lua · WARC)

Teams

Bibliotheca Anonoma · LibreTeam · URLTeam · Yahoo Video Warroom · WikiTeam

Other

800notes · AOL · Akoha · Ancestry.com · April Fools' Day · Amplicate · AutoAdmit · Bre.ad · Circavie · Cobook · Co.mments · Countdown · Discourse · Distill · Dmoz · Easel · Eircode · Electronic Frontier Foundation · FanFiction.Net · Feedly · Ficlets · Forrst · FunnyExam.com · FurAffinity · Google Helpouts · Google Moderator · Google Poly · Google Reader · ICQmail · IFTTT · Jajah · JuniorNet · Lulu Poetry · Mobile Phone Applications · Mochi Media · Mozilla Firefox · MyBlogLog · NBII · Newgrounds · Neopets · Quantcast · Quizilla · Salon Table Talk · Shutdownify · Slidecast · Stack Overflow · SOPA blackout pages · starwars.yahoo.com · TechNet · Toshiba Support · USA-Gov · Volán · Widgetbox · Windows Technical Preview · Wunderlist · YTMND · Zoocasa

About Archive Team

Introduction · Philosophy · Who We Are · Our stance on robots.txt · Why Back Up? · Software · Formats · Storage Media · Recommended Reading · Films and documentaries about archiving · Talks · In The Media · FAQ