Difference between revisions of "CodePlex"

Revision as of 16:03, 14 February 2021

CodePlex
URL: CodePlex, Archive
Status: Closing
Archiving status: Upcoming...
Archiving type: Unknown
IRC channel: #plexicode (on hackint)
Project lead: Unknown (WARC), User:Sylirana (ZIP)

CodePlex was a software repository owned by Microsoft. It hosted only open source software paired with an open source license.[1]

CodePlex allowed people to commit their code into a Git, Mercurial, or Team Foundation Server version control repository. It had a downloads section for people to upload their software packages, an issue tracker, documentation repository, and discussion forums.

The platform was shut down in 2017, but a read-only archive remained online. This self-archive will be shut down in July 2021.

Vital signs

The shutdown announcement was made on 31st March 2017 [2]. New project creation was disabled at the same time the shutdown announcement was published. According to the announcement, the site would be made read-only on an unspecified date in October 2017, with shutdown scheduled for 15th December 2017.

Archiving

The announcement indicated that after the 15th December 2017 shutdown date, "lightweight archives" containing project source code, documentation, downloads, license, and issues as of the date the site changed to read-only would be available. At the time, there was no planned date to stop hosting these archives.

The shutdown announcement indicated that project owners would be provided a tool to migrate their sites to GitHub. As of the announcement date, the migration tool was "in the works".

Alex Mullans, a Microsoft Program Manager of the Visual Studio Team Services product, stated in a discussion on Hacker News [3] that project archives will be available for anyone to download (as opposed to being restricted to project owners). He further stated that for projects using the Git and Mercurial version control systems the ".git" and ".hg" folders would be included in the archive, so that full source code history would be preserved. For projects using the TFS version control system, however, the full history would not be included in the archive and only the code as-of the site being changed to read-only would be available.

In late January 2021[4], a banner was added to the archive website, announcing that the archive would be shut down in July 2021.

Site structure

There is a sitemap (Warning: Large XML file!) which contains links to 108516 individual projects in the format of https://archive.codeplex.com/?p=<ID>. It has not been confirmed yet whether this contains all of the projects on the site or not.
Some projects listed in the sitemap have been completely removed from the site.

It is important to note that there are two different types of IDs used. The first is used in the sitemap and for some other resources, such as the page JSON; this one is all lowercase (called <ID> here). The second type is the same ID, but with the project name's original uppercase letters preserved, if it had any (called <ID2> here). Because of that, requests that need <ID2> but use the IDs taken from the sitemap will return a 404 for projects that have capital letters in the project name.

The individual sites load the actual contents of the page using JavaScript by requesting multiple JSON files (for the page (https://archive.codeplex.com/metadata/<ID>.json), issues, etc.).

There is a .zip file for each project. This uses the aforementioned second ID and is located at https://codeplexarchive.blob.core.windows.net/archive/projects/<ID2>/<ID2>.zip.
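The URL patterns described in this section can be written down as a small sketch. The function names and the example ID "MyProject" are made up for illustration; only the URL formats themselves come from the text above.

```python
# URL patterns as described above. Function names are hypothetical.
ARCHIVE_BASE = "https://archive.codeplex.com"
ZIP_BASE = "https://codeplexarchive.blob.core.windows.net/archive/projects"


def project_page_url(project_id: str) -> str:
    """Project page URL; uses the all-lowercase <ID> from the sitemap."""
    return f"{ARCHIVE_BASE}/?p={project_id.lower()}"


def metadata_url(project_id: str) -> str:
    """Page JSON URL; also uses the all-lowercase <ID>."""
    return f"{ARCHIVE_BASE}/metadata/{project_id.lower()}.json"


def zip_url(cased_id: str) -> str:
    """ZIP URL; needs <ID2>, i.e. the ID with its original letter case."""
    return f"{ZIP_BASE}/{cased_id}/{cased_id}.zip"


if __name__ == "__main__":
    # "MyProject" is an invented example ID, not a real project.
    print(project_page_url("MyProject"))
    print(metadata_url("MyProject"))
    print(zip_url("MyProject"))
```

Note that only zip_url keeps the letter case, which is why a lowercase sitemap ID alone is not enough for projects with capital letters in their name.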

WARC

The way the site loads its content makes it more difficult to capture all the pages (see above for details).

The first JSON that is requested for each project contains an HTML snippet, which is then inserted into the actual page for the user to view.
The wiki on the site itself is *broken*! All wiki links simply redirect to the project's single page, from which issues and discussions can be read (also loaded with JS). Because of that, the only way to get the wikis seems to be the .zip files. The images only have hashes as filenames, though, so the links in the HTML pages cannot be rewritten without a filename mapping (see below).

Project ZIP Files

User:Sylirana has started archiving all of the .zip files that Microsoft provides for each project. Those contain the code (depending on which versioning system was used, multiple versions, see above) and other data such as issues and wikis.

One thing to note is that while the zip files do contain image attachments, these only have a hash as a filename (and no extension). The HTML files (of the wikis, for example) do NOT link to those files, but instead to the soon-to-be-offline server. This is something to consider for anyone wanting to see images on the HTML pages in the archive. It initially seemed that there was no mapping anywhere between the URL (which contains a proper filename) and the file in the archive, which only has a hash as its filename.

However, upon checking all the JSON files inside the archive, I found that some of them do contain mappings for files in different subdirectories of the archive. This means it should be possible to fully rebuild the wiki (and other) pages *with* attachments from the zip files.
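As a rough illustration of what rebuilding pages with attachments could look like, the sketch below rewrites links in an HTML string once a URL-to-hash mapping is available. The structure of the JSON mapping files is not described here, so the sketch assumes the mapping has already been loaded into a plain dictionary; the function name, the dictionary shape, and the directory name are all hypothetical.

```python
from typing import Mapping


def rewrite_attachment_links(html: str,
                             url_to_hash: Mapping[str, str],
                             attachment_dir: str) -> str:
    """Point links at the hash-named files inside the archive.

    url_to_hash is an ASSUMED shape: original URL -> hash filename.
    How to build it from the JSON files in the zip is not shown,
    because their schema is not documented on this page.
    """
    # Replace longer URLs first so that a URL which is a prefix of
    # another one does not clobber the longer match.
    for url in sorted(url_to_hash, key=len, reverse=True):
        html = html.replace(url, f"{attachment_dir}/{url_to_hash[url]}")
    return html


if __name__ == "__main__":
    # Invented example data; not taken from a real project.
    page = '<img src="https://example.invalid/files/logo.png">'
    mapping = {"https://example.invalid/files/logo.png": "abc123"}
    print(rewrite_attachment_links(page, mapping, "attachments"))
```

Plain string replacement is used here only to keep the sketch short; a real rebuild would probably want an HTML parser and would need to restore file extensions as well.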

The archiving will happen in two steps due to the different ID types.
In the first step, all of the projects with lowercase IDs will be saved.
In the second step, the JSON will be requested for every project that returned a 404 during the first step (meaning the project either has uppercase letters in its ID or has been deleted from the site). From this, another download list will be generated, along with a list of projects that have been deleted from the site (see above).

The reasoning behind doing this in two steps and not just getting the JSON for every single project to check for the letter case is that the majority of projects can be downloaded without those extra steps (and requests to the server!).
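The two-step approach described above can be sketched as follows. This is not the script actually used; the network access is replaced by two caller-supplied functions (both hypothetical) so that the logic can be shown without making any requests. How the correctly cased ID is read out of the page JSON is left to lookup_cased_id, since the JSON field is not documented here.

```python
from typing import Callable, Iterable, List, Optional, Tuple


def run_step_one(sitemap_ids: Iterable[str],
                 try_download: Callable[[str], bool]
                 ) -> Tuple[List[str], List[str]]:
    """Step 1: try the ZIP for every lowercase sitemap ID.

    try_download(id) is assumed to return False on a 404.
    """
    saved: List[str] = []
    not_found: List[str] = []
    for project_id in sitemap_ids:
        if try_download(project_id):
            saved.append(project_id)
        else:
            not_found.append(project_id)
    return saved, not_found


def plan_step_two(not_found: Iterable[str],
                  lookup_cased_id: Callable[[str], Optional[str]]
                  ) -> Tuple[List[str], List[str]]:
    """Step 2: only the 404s need the extra JSON request.

    lookup_cased_id(id) is assumed to return the ID with its original
    letter case, or None if the project's JSON is gone (deleted).
    """
    retry: List[str] = []
    deleted: List[str] = []
    for project_id in not_found:
        cased = lookup_cased_id(project_id)
        if cased is None:
            deleted.append(project_id)
        else:
            retry.append(cased)
    return retry, deleted


if __name__ == "__main__":
    # Invented stand-ins for the real server responses.
    zips_on_server = {"alpha"}
    page_json = {"myproject": "MyProject"}
    saved, missing = run_step_one(["alpha", "myproject", "gone"],
                                  lambda pid: pid in zips_on_server)
    retry, deleted = plan_step_two(missing, page_json.get)
    print(saved, retry, deleted)
```

The extra JSON request is only made for IDs that failed in step 1, which matches the goal of keeping the number of requests to the server low.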

Downloading is rather slow, as the server significantly limits the bandwidth.
Despite those limits, the project ZIP files are on track to be completed during March, which is well ahead of the shutdown in July.

{| class="wikitable"
|+ Current progress of ZIP files (step 1) [DONE]:
|-
! Total !! Total done (1) !! Saved (1) !! 404 (1)
|-
| 108516 || 108516 (100%) || 94097 (~548.3 GB) || 14419
|}

Total: Total according to sitemap.
Total done (1): Total links done during step 1 (or in other words, progress towards step 2).
Saved (1): Saved during step 1.
404 (1): Links which returned a 404 during step 1. This does NOT mean that a project is lost; it might simply have uppercase letters in its ID, which will be checked in step 2 (see above).
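The columns are related in a simple way: every link handled in step 1 was either saved or returned a 404, so the two add up to "Total done (1)". A minimal check using the figures from the table (the function name is made up for illustration):

```python
def step_one_summary(total: int, saved: int, not_found: int):
    """Return (links done, percent of sitemap total) for step 1."""
    done = saved + not_found
    percent = 100 * done / total
    return done, percent


if __name__ == "__main__":
    # Figures from the progress table above.
    done, percent = step_one_summary(total=108516, saved=94097, not_found=14419)
    print(done, percent)
```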

Stats on step 2 will follow soon.

Last updated: 2021-02-14 09:00 UTC

Contact Sylirana on hackint or check the channel (see infobox) if you have any questions.

References

  1. Documentation - CodePlex FAQ - Project Hosting Requirements
  2. Shutting down CodePlex
  3. Hacker News - Shutting down Codeplex discussion
  4. No date is mentioned in the notice, but the Wayback Machine snapshots indicate that it was added in the last week of January 2021.

