Revision as of 22:38, 10 June 2015
Google Code
URL: http://code.google.com
Status: Closing
Archiving status: Upcoming... (ETA: Late August 2015)
Archiving type: Unknown
IRC channel: #googlecodeblue (on hackint)
Google Code (also known as Project Hosting) is a software repository owned by Google. It hosts only open source software released under an open source license.[1]
Google Code allows people to commit their code to a Subversion (SVN), Git, or Mercurial repository. It has a downloads section where people can upload their software packages (with a quota of 4 GB, which can be increased on request), a wiki where projects can document their work, and an issue tracker for bugs in the project's software.
Vital signs
The site goes read-only on 24 August 2015 and closes on 25 January 2016.[2]
Archiving
Archiving source code repositories is relatively easy (and incremental): just clone the Git/Mercurial repository, or check out the SVN repository. For SVN, make sure you check out all branches, not just trunk. Ideally, for SVN, one would use "svnrdump dump REPO" to dump the complete revision history, not only the latest revision of the repository.
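The steps above can be sketched as a small helper that builds the mirroring command for each VCS. This is a minimal sketch: the function name is hypothetical, and the URLs follow the historical code.google.com layout (git/hg clones from code.google.com/p/PROJECT/, SVN at PROJECT.googlecode.com/svn/), which should be treated as an assumption.

```python
def mirror_command(project, vcs):
    """Return the shell command (as an argument list) to mirror one
    Google Code repository with full history.

    Hypothetical helper; the URL layout mimics the historical
    code.google.com scheme and is an assumption.
    """
    if vcs == "git":
        # --mirror fetches all refs (all branches and tags), not just HEAD
        return ["git", "clone", "--mirror", f"https://code.google.com/p/{project}/"]
    if vcs == "hg":
        # -U clones the full history without checking out a working copy
        return ["hg", "clone", "-U", f"https://code.google.com/p/{project}/"]
    if vcs == "svn":
        # svnrdump dumps every revision of the whole repository
        # (trunk, branches, and tags), unlike a plain checkout of trunk
        return ["svnrdump", "dump", f"http://{project}.googlecode.com/svn/"]
    raise ValueError(f"unknown VCS: {vcs}")
```

The command lists can be passed to subprocess.run; for svnrdump, redirect stdout to a dump file.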
Archiving bugtrackers and the other stuff will be a bit harder.
A tool to export a repository to GitHub is available.[3] Note that once a project has been migrated to GitHub, it is no longer accessible on Google Code.
URL lists
Some seeds for site discovery:
- Underway: Scrape Google Code Search
  - Enumerate a list of labels, then fetch results for each label.
  - Google Code search results can be grabbed in packs of 100; just add "&num=100" to the end of the URL.
  - Phase 1: a quick grep says 114,262 projects, plus 71,972 labels for further searching.
- 463,061 projects (18M text file)
- URLs from ArchiveTeam IRC logs
- List scraped from MediaWiki wikis
- List from FlossMole's data (sorted from a possibly-incomplete survey in November 2012: http://flossdata.syr.edu/data/gc/)
- Links from Open Directory Project
- Links from Kyan
- TODO: Scrape Google Search
- TODO: Scrape Bing
- TODO: Scrape Twitter
- TODO: Scrape the Common Crawl Index
- TODO: Scrape URLTeam dumps
- TODO: Ask Chris DiBona for a complete list of projects
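The label-scraping approach above (enumerate labels, fetch results in packs of 100) can be sketched as a URL generator. This is only a sketch: the function name is hypothetical, and the endpoint path and query parameters mimic the historical code.google.com project-hosting search, so treat them as assumptions.

```python
def label_search_urls(label, pages, per_page=100):
    """Yield hosting-search result URLs for one label, per_page results
    per page (the "&num=100" trick from the notes above).

    Hypothetical helper; the endpoint and parameters follow the
    historical code.google.com hosting search and are assumptions.
    """
    base = "https://code.google.com/hosting/search"
    for page in range(pages):
        start = page * per_page
        yield f"{base}?q=label:{label}&start={start}&num={per_page}"
```

Each yielded URL can then be fetched and its project links extracted; iterating over the 71,972 known labels would seed the project list.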
Tools
- FlossMole provides a set of tools to spider projects from Google Code