Google Code
Google Code | |
URL | Google Code |
Status | Endangered |
Archiving status | Saved! |
Archiving type | Unknown |
Project source | googlecode-grab |
Project tracker | googlecode |
IRC channel | #archiveteam-bs (on hackint) (formerly #googlecodeblue (on EFnet)) |
Data | archiveteam_googlecode |
Google Code (also known as Project Hosting) was a software repository owned by Google. It hosted only open source software, and each project had to be paired with an open source license.[1]
Google Code allowed people to commit their code to a Subversion (SVN), Git, or Mercurial repository. It had a downloads section where people could upload their software packages (with a quota of 4 GB, which could be increased upon request) and a wiki where projects could document their work. There was also an issue tracker for tracking bugs in a project's software.
Google Code officially shut down on January 25, 2016, but Google left a public archive.
Vital signs
The site went read-only on August 24, 2015, and was closed on January 25, 2016.[2] A public archive remains available, though.
Archiving
Archiving source code repositories is fairly easy and can be done incrementally. Clone the Git or Mercurial repository, or check out the SVN repository. For SVN, make sure that you check out all branches, not just trunk. Ideally, for SVN one would use "svnrdump dump REPO" to dump the complete history of the repository rather than only the latest revision.
Archiving the issue trackers and the other project content will be a bit harder.
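The per-VCS steps described here can be sketched as a small Python helper. This is only an illustrative sketch, not the tooling ArchiveTeam used: the function names `archive_command` and `run_archive` are hypothetical, the repository URL and destination path are supplied by the caller, and the choice of `git clone --mirror` and `hg clone --noupdate` is an assumption about what a reasonable full-history copy looks like.

```python
import subprocess
from typing import List


def archive_command(vcs: str, repo_url: str, dest: str) -> List[str]:
    """Build the command that takes a full-history copy of one repository.

    Hypothetical helper; `repo_url` and `dest` come from the caller.
    """
    if vcs == "git":
        # --mirror copies every ref (all branches and tags), not just the default branch.
        return ["git", "clone", "--mirror", repo_url, dest]
    if vcs == "hg":
        # --noupdate skips creating a working copy; the full history is still cloned.
        return ["hg", "clone", "--noupdate", repo_url, dest]
    if vcs == "svn":
        # svnrdump writes the dump stream to stdout, so `dest` is used
        # by run_archive() as the output file rather than as an argument.
        return ["svnrdump", "dump", repo_url]
    raise ValueError(f"unsupported VCS: {vcs!r}")


def run_archive(vcs: str, repo_url: str, dest: str) -> None:
    """Run the archive command, redirecting the SVN dump stream into `dest`."""
    cmd = archive_command(vcs, repo_url, dest)
    if vcs == "svn":
        with open(dest, "wb") as out:
            subprocess.run(cmd, stdout=out, check=True)
    else:
        subprocess.run(cmd, check=True)
```

Dumping the whole SVN repository instead of checking out trunk is what preserves branches, tags, and the revision history in a single file.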
A tool to export a repository to GitHub is available.[3] Once a repository has been migrated to GitHub, the project is no longer accessible on Google Code.
ArchiveTeam started to save Google Code on December 18, 2015, as a Warrior project.
After the closure, Google left a public archive, but it is missing some of the original information.[4] Although the original content was hidden from the public, ArchiveTeam obtained access and continued saving it, so that the Wayback Machine can receive a full copy.[5][6]
URL lists
Some seeds for site discovery:
- Underway: Scrape Google Code Search
- Enumerate a list of labels, then fetch results for each label.
- Google Code search results can be grabbed in packs of 100, just add "&num=100" to the end of the URL.
- Phase 1. Quick grep says 114,262 projects, plus 71,972 labels for further searching.
- 463,061 projects (18M text file)
- URLs from ArchiveTeam IRC logs
- List scraped from MediaWiki wikis
- List from FlossMole's data (sorted from a possibly-incomplete survey in November 2012: http://flossdata.syr.edu/data/gc/)
- Links from Open Directory Project
- Links from Kyan
- TODO: Scrape Google Search
- TODO: Scrape Bing
- TODO: Scrape Twitter
- TODO: Scrape the Common Crawl Index
- TODO: Scrape URLTeam dumps
- TODO: Ask Chris DiBona for a complete list of projects
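Two of the steps in the list above lend themselves to a short Python sketch: requesting search results in packs of 100, and merging seed lists from different sources into one deduplicated set of project names. Everything here is an assumption made for illustration: the function names are hypothetical, the search URL is whatever the scraper already uses, and the two URL shapes matched (`code.google.com/p/NAME` and `NAME.googlecode.com`) as well as the allowed characters in a project name are assumed rather than taken from the lists themselves.

```python
import re
from typing import Iterable, List
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed URL shapes for Google Code projects (not confirmed by the seed lists).
_PROJECT_PATTERNS = (
    re.compile(r"code\.google\.com/p/([a-z0-9][a-z0-9-]*)", re.IGNORECASE),
    re.compile(r"\b([a-z0-9][a-z0-9-]*)\.googlecode\.com", re.IGNORECASE),
)


def with_page_size(search_url: str, num: int = 100) -> str:
    """Return `search_url` with the `num` query parameter set to `num`."""
    parts = urlsplit(search_url)
    # Drop any existing `num` so the parameter is not duplicated.
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
             if k != "num"]
    query.append(("num", str(num)))
    return urlunsplit(parts._replace(query=urlencode(query)))


def project_names(lines: Iterable[str]) -> List[str]:
    """Extract a sorted, deduplicated list of project names from raw text lines."""
    names = set()
    for line in lines:
        for pattern in _PROJECT_PATTERNS:
            for match in pattern.finditer(line):
                names.add(match.group(1).lower())
    return sorted(names)
```

Feeding every seed source (IRC logs, wiki scrapes, directory links) through `project_names` and taking the union gives a single list to hand to the grabber, regardless of which URL form each source used.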
Tools
- FlossMole provides a set of tools to spider projects from GC
Archives
Google Code archives are uploaded to https://archive.org/details/archiveteam_googlecode, in WARC format.
"The Google Code Archive (https://code.google.com/archive/) contains the data found on the Google Code Project Hosting Service, which will be turned down in early 2016. This archive contains over 1.4 million projects, 1.5 million downloads, and 12.6 million issues."
References
- [1] FAQ - support - Project Hosting on Google Code FAQ - User support for Google Project Hosting - Google Project Hosting
- [2] Bidding farewell to Google Code
- [3] Export to GitHub - Google Code
- [4] http://archive.fart.website/bin/irclogger_log/archiveteam?date=2016-03-12,Sat&sel=10#l6
- [5] http://archive.fart.website/bin/irclogger_log/archiveteam?date=2016-02-26,Fri&sel=97#l93
- [6] http://archive.fart.website/bin/irclogger_log/archiveteam?date=2016-03-05,Sat&sel=474#l470