Google Reader
Revision as of 02:53, 27 May 2013
Google Reader
URL: http://www.google.com/reader/
Status: Online!
Archiving status:
Archiving type: Unknown
Project source: https://github.com/ArchiveTeam/greader-grab
Project tracker: N/A
IRC channel: #donereading (on hackint)
Shutdown notification
On March 13, 2013, Google announced on the Official Google Reader Blog that it would "spring clean" Google Reader:
we will soon retire Google Reader (the actual date is July 1, 2013)
Backing up your own data
- Main page - google.com/reader/
- Export via Google Takeout
- Contains subscriptions and starred items, but not tags
- Can be imported into The Old Reader
- API: https://code.google.com/p/pyrfeed/wiki/GoogleReaderAPI
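The Takeout export's subscriptions file is OPML, so the feed URLs are easy to pull out with the standard library. A minimal sketch, assuming the usual OPML shape (each feed is an outline element with an xmlUrl attribute); the sample document here is illustrative, not real export data:

```python
import xml.etree.ElementTree as ET

def list_feed_urls(opml_text):
    # Every feed in OPML is an <outline> with an xmlUrl attribute;
    # folder outlines have no xmlUrl and are skipped.
    root = ET.fromstring(opml_text)
    return [o.get("xmlUrl") for o in root.iter("outline") if o.get("xmlUrl")]

# Illustrative snippet shaped like a subscriptions.xml export (hypothetical data)
sample = """<opml version="1.0"><body>
<outline title="News" text="News">
  <outline type="rss" text="Example" xmlUrl="http://example.com/feed.xml"
           htmlUrl="http://example.com/"/>
</outline>
</body></opml>"""

feeds = list_feed_urls(sample)
```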
Backing up the historical feed data
Google Reader acts as a cache for RSS/Atom feed content, keeping deleted posts and deleted blogs accessible (if you can recreate the RSS/Atom feed URL). After the Reader shutdown, this data might still be available[1] via the Feeds API, but we'd like to grab most of this data before July 1 through the much more straightforward /reader/ API.
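As a sketch of what "the /reader/ API" means here: the cached history of a feed is served page by page from the /reader/atom/feed/ endpoint, with n setting the page size and c carrying the continuation token from the previous page (assuming the endpoint and parameters described in the GoogleReaderAPI notes linked above):

```python
from urllib.parse import quote

READER_ATOM = "https://www.google.com/reader/atom/feed/"

def reader_history_url(feed_url, count=1000, continuation=None):
    # The target feed URL is percent-encoded into the path; c is the
    # continuation token returned inside the previous page of results.
    url = READER_ATOM + quote(feed_url, safe="") + "?n=%d" % count
    if continuation:
        url += "&c=" + continuation
    return url
```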
The ArchiveTeam Warrior job for this project should be ready in a few days.
Your help is needed
Give us your feed URLs
We need to discover as many feed URLs as possible. Not all of them can be discovered through crawling, so we need your OPML files. (If any of your feeds are private or password-protected, please strip them out first.)
Upload OPML files and lists of URLs to:
http://allyourfeed.ludios.org:8080/
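One rough way to strip private feeds before uploading: drop any outline whose URL embeds credentials (user:pass@host). This is only a heuristic sketch using the standard library, and it won't catch feeds that are private by obscurity, so review the result by hand:

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit

def strip_private_feeds(opml_text):
    # Remove feed outlines whose xmlUrl carries inline credentials.
    root = ET.fromstring(opml_text)
    for parent in root.iter():
        for child in list(parent):
            url = child.get("xmlUrl")
            if url and urlsplit(url).username is not None:
                parent.remove(child)
    return ET.tostring(root, encoding="unicode")

# Hypothetical OPML with one public and one credentialed feed
sample = """<opml version="1.0"><body>
<outline type="rss" xmlUrl="http://example.com/feed.xml" />
<outline type="rss" xmlUrl="http://me:secret@private.example/feed.xml" />
</body></opml>"""

cleaned = strip_private_feeds(sample)
```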
Install the ArchiveTeam Warrior
Install the ArchiveTeam Warrior and have it run ArchiveTeam's Choice:
http://www.archiveteam.org/index.php?title=ArchiveTeam_Warrior
Google Reader will (probably) soon become the primary job.
Crawl websites to discover blogs and usernames
We need to discover millions of blog/username URLs on popular blogging platforms (which we'll turn into feed URLs):
- *.tumblr.com [~37,842 discovered so far]
- *.livejournal.com [~193,229 discovered so far]
- *.wordpress.com [~31,053 discovered so far]
- *.blogspot.com
- *.typepad.com
- twitter.com/*
- facebook.com/*
- plus.google.com/*
- *.dreamwidth.org
- *.blog.com
- *.at.webry.info
- *.posterous.com [~9,898,986 discovered]
- Reddit feeds
- http://www.reddit.com/user/USERNAME/.rss
- https://pay.reddit.com/user/USERNAME/.rss
- http://www.reddit.com/user/USERNAME/comments/.rss
- https://pay.reddit.com/user/USERNAME/comments/.rss
- http://www.reddit.com/user/USERNAME/submitted/.rss
- https://pay.reddit.com/user/USERNAME/submitted/.rss
- http://www.reddit.com/r/SUBREDDIT/.rss
- https://pay.reddit.com/r/SUBREDDIT/.rss
- http://www.reddit.com/r/SUBREDDIT/top/.rss
- https://pay.reddit.com/r/SUBREDDIT/top/.rss
- http://www.reddit.com/r/SUBREDDIT/controversial/.rss
- https://pay.reddit.com/r/SUBREDDIT/controversial/.rss
- http://www.reddit.com/r/SUBREDDIT/new/.rss
- https://pay.reddit.com/r/SUBREDDIT/new/.rss
- http://blog.myspace.com/*
- Windows Live Spaces feeds
- Old Hacker News feeds
- Less Wrong feeds
- del.icio.us feeds
- Users: http://del.icio.us/rss/USERNAME
- Tags: http://del.icio.us/rss/tag/TAGNAME
- Popular: http://del.icio.us/rss/popular
- Popular tags: http://del.icio.us/rss/popular/TAGNAME
- ... and many more (please add them here!)
- http://taimoorsultan.com/list-of-25-blogging-platforms/
- http://john.do/blogging-platforms/
- http://mashable.com/2007/08/06/free-blog-hosts/
- Many non-US blogging platforms
- Feeds from dead sites: http://www.archiveteam.org/index.php?title=Deathwatch#Dead_as_a_Doornail
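Once usernames are discovered, turning them into feed URLs is mechanical: each platform's patterns above are just templates with the username substituted in. A small sketch (the helper name and the choice of patterns are illustrative; reddit and del.icio.us user patterns are taken from the list above):

```python
# Feed-URL templates copied from the patterns listed above.
REDDIT_USER = [
    "http://www.reddit.com/user/{u}/.rss",
    "http://www.reddit.com/user/{u}/comments/.rss",
    "http://www.reddit.com/user/{u}/submitted/.rss",
]
DELICIOUS_USER = ["http://del.icio.us/rss/{u}"]

def expand(usernames, patterns):
    # Cross every discovered username with every URL template.
    return [p.format(u=u) for u in usernames for p in patterns]

feeds = expand(["alice", "bob"], REDDIT_USER + DELICIOUS_USER)
```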
Join #donereading and #archiveteam on efnet if you'd like to help with this.
Crawl Google Reader itself for feeds
https://www.google.com/reader/directory/search?q=keyword-here
https://www.google.com/reader/directory/search?q=keyword-here&start=10
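The start parameter pages through directory results in steps of 10, so a crawl over a keyword list reduces to generating these URLs. A minimal sketch (the helper name and page count are illustrative):

```python
def directory_search_urls(keyword, pages=5):
    # First page has no start parameter; later pages advance by 10.
    base = "https://www.google.com/reader/directory/search?q=" + keyword
    urls = [base]
    for page in range(1, pages):
        urls.append(base + "&start=%d" % (page * 10))
    return urls

urls = directory_search_urls("archiveteam", pages=3)
```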
Add gzip support to wget-lua
It would be quite helpful to have a wget-lua build that supports gzip content encoding (vanilla wget doesn't support it either). This would speed up downloads and save a lot of bandwidth.
There have already been some attempts at making wget support gzip:
https://github.com/kravietz/wget-gzip (Windows-only; needs to work on Linux)
https://github.com/ptolts/wget-with-gzip-compression (based on a wget from 2003?)
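For reference, the mechanics wget-lua is missing are small: advertise gzip in Accept-Encoding, then decompress the body if the server answers with Content-Encoding: gzip. A Python sketch of the same idea using only the standard library (not a wget patch, just an illustration of the protocol steps):

```python
import gzip
import urllib.request

def decode_body(body, content_encoding):
    # Undo gzip Content-Encoding if the server applied it.
    if content_encoding == "gzip":
        return gzip.decompress(body)
    return body

def fetch(url):
    # Advertise gzip support, then transparently decompress the response.
    req = urllib.request.Request(url, headers={"Accept-Encoding": "gzip"})
    with urllib.request.urlopen(req) as resp:
        return decode_body(resp.read(), resp.headers.get("Content-Encoding"))
```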