Here's where Archive Teamsters can list the projects they are currently working on and organize new projects.

== Projects ==
Our Current Projects page.

See also: [[:Category:In progress]].

== ArchiveTeam Warrior ==
The ArchiveTeam Warrior is a virtual machine that will allow you to lend a hand on large archiving projects whenever they come up.

== ArchiveBot ==
ArchiveBot is an IRC bot that automates archiving for smaller sites.

== Websites at risk ==
=== High Risk ===
{| class="wikitable"
! Website
! Closing date
! Project status
! User
! Archiving Status
! Details
! Archives
! Archive Date
! Archive Format
|-
| [[AOL Music]] [http://music.aol.com/] || 2013-??-?? || Closing || ? || <span style="color:orange">In progress...</span> ||  ||  ||  ||
|-
| rowspan="2" | [[Fileplanet]] [http://www.fileplanet.com/] || rowspan="2" | 2013-??-?? || rowspan="2" | Closing || [[Archiveteam]] || <span style="color:green">Saved</span> || Saved only files 00000-229999 || [[Fileplanet#Status]]  || 2012-05-08 - 2012-07-06 || .tar
|-
| [[User:Arkiver]] || <span style="color:orange">In progress...</span> || Downloading full website ||  ||  || .warc.gz
|-
| [[Warhammer|Warhammer Online: Age of Reckoning]] [http://www.warhammeronline.com/] || 2013-12-18 || Closing || [[User:Arkiver]] || <span style="color:green">Saved</span> || Downloaded the full main website || COMING || 2013-12-04 - 2013-12-14 || .warc.gz
|-
| rowspan="3" | [[WinAmp]] [http://www.winamp.com/] [http://dev.winamp.com/] [http://forums.winamp.com/] [http://blog.winamp.com/] || rowspan="3" | 2013-12-20 || rowspan="3" | Closing || [[User:Arkiver]] || <span style="color:orange">In progress...</span> || Downloading the full website and other domains ||  ||  || .warc.gz
|-
| [[Archivebot]] || <span style="color:green">Saved</span> || Downloaded the website, dev and blog subdomains ||  ||  || .warc.gz
|-
| Various || <span style="color:green">Saved</span> || Downloaded website, forums, skins/plugins || [https://archive.org/search.php?query=winamp+warc] || 2013-11 || .warc.gz
|-
| [[Wretch]] [http://www.wretch.cc/] || 2013-12-26 || Closing || [[Archiveteam]] || <span style="color:orange">In progress...</span> || Downloading the full website + accounts ||  || 2013-12-18 - present || .warc.gz
|-
| [[Quick.io]] [http://www.quik.io/] || 2013-12-31 || Closing || [[User:Arkiver]] || <span style="color:green">Saved</span> || Downloaded the main website and its subdomains || COMING || 2013-12-13 || .warc.gz
|-
| rowspan="2" | [[ptch]] [http://ptch.com/] || rowspan="2" | 2014-01-02 || rowspan="2" | Closing || [[Archiveteam]] || <span style="color:orange">In progress...</span> || Downloading only the accounts and their content ||  ||  || .warc.gz
|-
| [[User:Arkiver]] || <span style="color:green">Saved</span> || Downloaded the main website, blog and help site || COMING || 2013-12-14 || .warc.gz
|-
| [[jajah]] [http://jajah.com/] || 2014-01-31 || Closing || [[User:Arkiver]] || <span style="color:green">Saved</span> || Downloaded the full website || COMING || 2013-12-15 - 2013-12-16 || .warc.gz
|-
| [[My Opera]] [http://my.opera.com/] || 2014-03-01 || Closing || [[User:Mithrandir]](?) || <span style="color:orange">In progress...</span> || Initial grab of files (6.2 GB) || [https://archive.org/details/files.myopera.com-initialgrab]  ||  || .warc.gz
|-
| [[widgetbox]] [http://www.widgetbox.com/] [http://support.widgetbox.com/] [http://blog.widgetbox.com/] [http://cdn.widgetbox.com/] [http://help.widgetbox.com/] [http://pub.widgetbox.com/] [http://files.widgetbox.com/] || 2014-03-28 || Closing || [[User:Arkiver]] || <span style="color:orange">In progress...</span> || Downloading all the websites ||  || 2013-12-19 - present || .warc.gz
|-
| [[TechNet]] [http://technet.microsoft.com/] || 2014-09-30 || Closing || [[User:Arkiver]] || <span style="color:orange">In progress...</span> || Downloading full website ||  ||  || .warc.gz
|-
| [[1UP.com]] [http://www.1up.com/] || 201?-??-?? || Closing || ? || <span style="color:orange">In progress...</span> ||  ||  ||  ||
|-
| [[UGO]] [http://www.ugo.com/] || 201?-??-?? || Closing || ? || <span style="color:orange">In progress...</span> ||  ||  ||  ||
|-
| [[GameSpy]] [http://www.gamespy.com/] || 201?-??-?? || Closing || ? || <span style="color:orange">In progress...</span> ||  ||  ||  ||
|}
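Most of the grabs in the table above end up as .warc.gz files. As a rough illustration only (not the actual pipeline behind any particular grab listed here, which usually involves Warrior projects or per-site scripts), a single-site WARC grab can be produced with wget's built-in WARC support (wget 1.14 or later), driven here from Python; the URL and file name are placeholders:

<pre>
import subprocess

def grab_to_warc(url, warc_name):
    """Mirror one site and write a compressed WARC next to the mirror.

    This is a sketch: real grabs need per-site tuning (exclusions,
    rate limits, cookies) and usually many parallel workers.
    """
    subprocess.run(
        [
            "wget",
            "--mirror",                   # recurse and honour timestamps
            "--page-requisites",          # also grab images/CSS/JS needed to render pages
            "--no-parent",                # stay under the starting path
            "--wait=1",                   # be polite between requests
            "--warc-file=" + warc_name,   # produces <warc_name>.warc.gz
            url,
        ],
        check=True,
    )

if __name__ == "__main__":
    # Placeholder target; substitute the site actually being archived.
    grab_to_warc("http://example.com/", "example")
</pre>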


=== Average Risk ===
See [[Deathwatch]] and [[Alive... OR ARE THEY]].


=== Low Risk ===
See [[Deathwatch]] and [[Alive... OR ARE THEY]].


=== Important Websites ===
{| class="wikitable"
! Website
! User
! Archiving Status
! Details
! Archives
! Archive Date
! Archive Format
|-
| [[Academic Earth]] [http://academicearth.org/] ||  || <span style="color:red">Not saved yet</span> ||  ||  ||  ||
|-
| [[Codecademy]] [http://www.codecademy.com/] ||  || <span style="color:red">Not saved yet</span> ||  ||  ||  ||
|-
| [[Delicious]] [https://delicious.com/] ||  || <span style="color:red">Not saved yet</span> ||  ||  ||  ||
|-
| [[Facebook]] [https://www.facebook.com/] ||  || <span style="color:red">Not saved yet</span> ||  ||  ||  ||
|-
| [[FanFiction]] [https://www.fanfiction.net/] ||  || <span style="color:red">Not saved yet</span> ||  ||  ||  ||
|-
| [[Google]] [https://www.google.nl/] ||  || <span style="color:red">Not saved yet</span> ||  ||  ||  ||
|-
| [[IFTTT]] [https://ifttt.com/] ||  || <span style="color:red">Not saved yet</span> ||  ||  ||  ||
|-
| [[infoAnarchy]] [http://www.infoanarchy.org/] ||  || <span style="color:red">Not saved yet</span> ||  ||  ||  ||
|-
| [[Internet Archive]] [https://archive.org/] ||  || <span style="color:red">Not saved yet</span> ||  ||  ||  ||
|-
| [[last.fm]] [http://www.last.fm/] ||  || <span style="color:red">Not saved yet</span> ||  ||  ||  ||
|-
| [[LiveJournal]] [http://www.livejournal.com/] ||  || <span style="color:red">Not saved yet</span> ||  ||  ||  ||
|-
| rowspan="2" | [[pastebin]] [http://pastebin.com/] || [[User:Arkiver]] || Aborted || Archive power can better be used for other websites. || COMING || 2013-12-14 - 2013-12-17 || .warc.gz
|-
| [[User:joepie91]] || <span style="color:orange">In progress...</span> || Downloading newest pastes ||  ||  || .warc.gz
|-
| [[reddit]] [http://www.reddit.com/] ||  || <span style="color:red">Not saved yet</span> ||  ||  ||  ||
|-
| [[sourceforge]] [http://sourceforge.net/] ||  || <span style="color:red">Not saved yet</span> ||  ||  ||  ||
|-
| [[Twitter]] [https://twitter.com/] ||  || <span style="color:red">Not saved yet</span> ||  ||  ||  ||
|-
| [[WebCite]] [http://www.webcitation.org/] ||  || <span style="color:red">Not saved yet</span> ||  ||  ||  ||
|-
| [[the White House]] [http://www.whitehouse.gov/] ||  || <span style="color:red">Not saved yet</span> ||  ||  ||  ||
|-
| [[wikia]] [http://www.wikia.com/] ||  || <span style="color:red">Not saved yet</span> ||  ||  ||  ||
|-
| [[WikiLeaks]] [http://wikileaks.org/] ||  || <span style="color:red">Not saved yet</span> ||  ||  ||  ||
|-
| [[WikipediA]] [http://www.wikipedia.org/] ||  || <span style="color:red">Not saved yet</span> ||  ||  ||  ||
|}
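For the [[pastebin]] row above, "downloading newest pastes" amounts to polling a public listing of recent pastes and fetching each one. The sketch below is only an illustration: the /archive listing page, the 8-character paste keys and the /raw/<key> path are assumptions about the site's layout rather than a documented API, and the actual grab is stored as .warc.gz rather than loose text files:

<pre>
import re
import time
import urllib.request

BASE = "https://pastebin.com"
HEADERS = {"User-Agent": "archiveteam-paste-sketch/0.1"}

def fetch(url):
    req = urllib.request.Request(url, headers=HEADERS)
    with urllib.request.urlopen(req, timeout=30) as resp:
        return resp.read().decode("utf-8", errors="replace")

def newest_paste_keys():
    """Scrape the assumed public /archive listing for recent paste keys."""
    html = fetch(BASE + "/archive")
    return sorted(set(re.findall(r'href="/([A-Za-z0-9]{8})"', html)))

def save_newest(out_dir="."):
    for key in newest_paste_keys():
        raw = fetch(BASE + "/raw/" + key)   # assumed raw-view path
        with open("%s/%s.txt" % (out_dir, key), "w", encoding="utf-8") as fh:
            fh.write(raw)
        time.sleep(2)  # be gentle with the site

if __name__ == "__main__":
    save_newest()
</pre>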


== Ideas for projects ==
:''See also [[Deathwatch]] and [[Alive... OR ARE THEY]].''
* Various image boards - not the short-lived 4chan clones, but the more permanent ones like www.zerochan.net (which currently hosts over 1.6 million images, each available at a URL like www.zerochan.net/1627488), Pixiv.net and minitokyo.net.
* JoshW's video game music archive (links on http://hcs64.com/mboard/forum.php?showthread=26929). Not a "large" site, but many gigabytes of 7-Zipped WAVs.
* Suggestion: An archive of .gif and .swf preloaders? [[User:Kuro|Kuro]] 19:49, 29 December 2009 (UTC)
**We can extract all the .gif files from the GeoCities archive and compare them using md5sum to discard dupes (a minimal hashing sketch appears after this list). [[User:Emijrp|Emijrp]] 19:58, 21 December 2010 (UTC)
* '''Set up''' an FTP hub which AT members can access and up/down finished projects.
** Internet Archive? Jason created a section for Archive Team: http://www.archive.org/details/archiveteam [[User:Emijrp|Emijrp]] 19:34, 4 June 2011 (UTC)
* Track the 100+ top [[twitter]] feeds, as designated by one of these idiot Twitter grading sites, and back up the top Twitter accounts on a regular basis, for posterity.
* '''[http://www.groklaw.net/ Groklaw]''' has a [http://www.groklaw.net/article.php?story=20090105033126835 project proposal] that we could help with. - [[User:Jscott|Jason]]
** Now that Groklaw is dead, a mirror ought to be made soon. (Especially because their [http://groklaw.net/robots.txt robots.txt] blocks the Wayback Machine.) --[[User:Mithrandir|Mithrandir]] 20:28, 21 August 2013 (EDT)
* '''Archive''' the shutdown announcement pages on dead sites.
** This is being done on each site's wiki page: the shutdown announcement is pasted in and, when possible, archived at WebCite. [[User:Emijrp|Emijrp]] 19:33, 4 June 2011 (UTC)
* '''RSS Feed''' with death notices. - [[User:Jscott|Jason]]
** I'm taking a shot at this with [http://www.deaddyingdamned.com The Dead, the Dying & the Damned]. --[[User:Auguste|Auguste]] 14:34, 4 March 2011 (UTC)
* '''Twitter profile''' might be a good way to broadcast new site obituaries. - psicom
* '''[[TinyURL]]''' and similar services, scraping/backup (a rough enumeration sketch appears after this list) - [[User:scumola|Steve]]
** Highlight services that at least allow exporting data ([[Diigo]] is the one I know of). Next "best": services that require registration and let you view your URLs and save them, e.g. as HTML ([[tr.im]]). Etc. --[[User:Jaakkoh|Jaakkoh]] 05:39, 4 April 2009 (UTC)
** See [[urlteam]]. [[User:Emijrp|Emijrp]] 19:33, 4 June 2011 (UTC)
* '''[http://symphony21.com/ Symphony]''' could [http://nick-dunn.co.uk/article/symphony-as-a-data-preservation-utility/ potentially be used] for archiving structured XML/RSS feeds to a relational database - [[User:nickdunn|Nick]]
* '''A Firefox plugin''' for redirecting users to our archive when they request a site that's been rescued. - ???
**Good idea; the problem is that the archives are not hosted in their original form, but packed. [[User:Emijrp|Emijrp]] 19:32, 4 June 2011 (UTC)
**Something like this already exists: [[wikipedia:MafiaaFire Redirector|MAFIAAFire Redirector]] (though it only redirects links from government-seized domains to backup sites), so anyone who wants to take on this project could start by reviewing how that extension works. Although our files and pages are not hosted on a server in their original layout but are packed into archives, I have read that [[wikipedia:Heritrix|Heritrix]] (the Internet Archive's web crawler) by default stores the web resources it fetches in an [[wikipedia:.arc|ARC]] file, and perhaps we could do something similar, using bzip2, 7z, rar or a combination of formats to manage a site's resources. --[[User:Swicher|Swicher]] 07:23, 27 July 2011 (UTC)
* Archives of MUD, MUSH, MOO game sites and related information.  They won't all be around forever. --[[User:Auguste|Auguste]] 13:59, 24 February 2011 (UTC)
** I'm keeping an eye out for, and archiving sites like [http://www.lambdamoo.info LambdaMOO.info], which are either closing down or may be at risk. --[[User:Auguste|Auguste]] 13:59, 24 February 2011 (UTC)
* [http://ytmnd.com YTMND] [[User:Zachera|Zachera]] 20:06, 25 March 2011 (UTC)
* [http://c2.com/cgi/wiki?WikiWikiWeb WikiWikiWeb] - The first wiki, and still a valuable source of information on programming patterns and related topics. It's still active, though I'm not sure how active. It's been going since 1995, so it has real historical value. Plus it's all text and wouldn't take much space. The owner, Ward Cunningham, might be amenable to providing a copy, so I'd suggest making contact first.
** I've done this and linked the dump from [[WikiTeam]]. -- [[User:Ca7|Ca7]]
* Electronics datasheets: [http://alldatasheet.com this], [http://datasheetarchive.com this], [http://www.datasheetcatalog.com this] [http://www.htmldatasheet.com and this] for example. Many of these datasheets are already very hard to find (esp. for older and rarer parts, e.g. those required to emulate old computer systems) and the sites are often in China, Russia or other countries that might cause problems in the future. Lots of data to grab, and many of these sites only have very slow bandwidth, so it might be good to start archiving them early. --[[User:Darkstar|Darkstar]] 23:47, 9 April 2011 (UTC)
* '''ElfQuest Comics'''. They've recently all been scanned (6500 pages+) and are available [http://www.elfquest.com/gallery/OnlineComics3.html here]. They're hidden behind a Flash-based viewer though so someone would first have to decompile that to get to the links. --[[User:Darkstar|Darkstar]] 20:55, 18 May 2011 (UTC)
**Working on getting this finished up, done downloading all the images, just have to package it up. [[User:Underscor|Underscor]] 22:35, 4 June 2011 (UTC)
* '''TechNet Archive''': [http://www.microsoft.com/technet/archive/default.mspx?mfr=true here] "Technical information about older versions of Microsoft products and technologies. This information is scheduled to be removed soon." --[[User:Marceloantonio1|Marceloantonio1]] 08:24, 9 June 2011 (UTC -3)
**TechNet, and its big cousin MSDN, are already being archived by other sites. For example, {{url|1=http://betaarchive.com}} has archived a huge pile of them, including older ones from the late '90s.
* '''[[Jux]]''' was going to shut down on August 31, 2013, but not anymore. Still might be a good idea to keep them on the radar.
* Archive as many file servers (FTP and HTTP) as possible.
* '''[[Google Answers]]''' has not been accepting new questions for a while now, and whether it will remain online for much longer is debatable.
* '''Newgrounds''' is one of the largest collections of Flash games and movies on the Internet. It would be a shame if it all disappeared.
* [[Yahoo!]] has decided to shut down more services, including [[Yahoo! Stars India]], [[Yahoo! Neighbors]], etc. These should be archived before they shut down. Also, yodel.yahoo.com seems to have been replaced by yahoo.tumblr.com, and should be archived too.
* Archive every [http://www.google.com/doodles/ Google Doodle].
* http://atheistpictures.com/
* Not sure if this goes here, but I have an idea for a program that makes it easier to find the links that belong to a given site. What do I mean by this? In my experience with the [[Windows Live Spaces]] archiving work (and other projects I have only looked at), a problem that comes up frequently is finding the links whose content is to be archived; for example, a Windows Live Space lived at whatever.spaces.live.com, and a video on Google Video at video.google.com/videoplay?docid=-[video ID number]. So the question is: where do I find the links to the pages, videos, articles or anything else of a given site X, so that their content can be archived later? The obvious answer is to use the API of one or more search engines, but the [http://code.google.com/apis/ajaxsearch/documentation/reference.html Google Web Search API] is currently deprecated (besides being very limited), Yahoo's [http://developer.yahoo.com/search/siteexplorer/ Site Explorer API] apparently stops working on September 15, and using [http://msdn.microsoft.com/en-us/library/dd251020.aspx Bing's API] requires a registered AppId (I have not checked other search engines, but I mention these because they are the most used). Since the search-engine APIs come with problems for this project, I think a good solution is [http://www.google.com/search?q=%28automating|automatic|automation|automatization%29+web+%28browsing|browser%29 browser automation]: run the required searches in (almost) any web search engine, walk through all the results, and store the corresponding links somewhere. Some may now ask: why automate a browser when the same thing can be done programmatically by sending [[wikipedia:Hypertext Transfer Protocol#Request message|HTTP requests]] to the server and parsing the HTML of the results? True, that can also be done, but there is a "small" problem: search engines like Google and Bing serve dynamic pages whose source code is a mishmash of HTML and JavaScript that is hard to analyze, whereas with browser automation the results page comes already "served" for parsing, because the browser interprets the code received from the server and turns it into plain HTML in memory. To illustrate this better, here is an example:
:[[File:Behavior of a dynamic page.PNG|thumb|left|Click the picture for a detailed description of the four screenshots it contains (and to view the image at full resolution).]]
:This approach also helps with maintainability and adaptability of the code: with browser automation, all you have to specify is the search engine's results page, the search term (something like site:whatever.com, inurl:.whatever.com/ and the like), the tag that holds the result links, and which element is the "Next" button, which cuts the development and implementation time for each search engine without writing much code. If anyone is still interested in the idea after this long explanation: of the browser-automation tools I have read about, two caught my attention, [http://watir.com/ Watir] (written in Ruby, but cross-platform and multi-browser) and [http://seleniumhq.org/projects/remote-control/ Selenium Remote Control] (also cross-platform and multi-browser, but its API supports C#, Java, Perl, PHP, Python and Ruby), so whoever wants to take on this project can start with one of these (or something similar); a rough WebDriver sketch appears after this list. --[[User:Swicher|Swicher]] 09:41, 1 August 2011 (UTC)
* [http://www.harmonycentral.com Harmony Central] User (-submitted) Reviews were around for over a decade and covered just about every musical instrument and related accessory commercially sold. Site updates have caused these to be offline, though admins say the data still exists. As far as can be determined, Archive.org has little if any of these reviews. [http://www.harmonycentral.com/t5/Feedback/User-reviews/td-p/34660122 This thread] has the whole story. --[[User:Benbradley|Benbradley]] 20:41, 13 July 2013 (EDT)
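The GeoCities preloader idea above (extract every .gif and discard duplicates with md5sum) is essentially a hash-and-keep-first pass over the files. A minimal sketch, assuming an unpacked dump in a local directory (the path is a placeholder):

<pre>
import hashlib
import os

def unique_gifs(root):
    """Walk `root`, MD5-hash every .gif, and yield one path per unique digest.

    MD5 is enough here: we only want to spot byte-identical duplicates.
    """
    seen = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.lower().endswith(".gif"):
                continue
            path = os.path.join(dirpath, name)
            with open(path, "rb") as fh:
                digest = hashlib.md5(fh.read()).hexdigest()
            if digest not in seen:
                seen.add(digest)
                yield path

if __name__ == "__main__":
    # Placeholder path to an unpacked GeoCities dump.
    for keeper in unique_gifs("geocities/"):
        print(keeper)
</pre>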

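For the [[TinyURL]]-style scraping idea above (see [[urlteam]] for the real effort), the core loop is resolving each short code and recording the Location header without following the redirect. A rough sketch; the hostname, code alphabet and code length are assumptions, and urlteam's own tooling is far more careful about rate limiting and error handling:

<pre>
import http.client
import itertools
import string
import time

ALPHABET = string.ascii_lowercase + string.digits   # assumed code alphabet
HOST = "tinyurl.com"                                 # example shortener

def resolve(code):
    """Return the redirect target for one short code, or None if unused."""
    conn = http.client.HTTPSConnection(HOST, timeout=15)
    try:
        conn.request("HEAD", "/" + code)
        resp = conn.getresponse()
        if resp.status in (301, 302, 303, 307, 308):
            return resp.getheader("Location")
        return None
    finally:
        conn.close()

def scrape(length=2, delay=1.0):
    """Enumerate every code of the given length and print code -> target."""
    for combo in itertools.product(ALPHABET, repeat=length):
        code = "".join(combo)
        target = resolve(code)
        if target:
            print(code + "\t" + target)
        time.sleep(delay)  # be polite

if __name__ == "__main__":
    scrape()
</pre>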

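The browser-automation proposal above (driving a search engine with Watir or Selenium and walking its result pages) might look roughly like the sketch below, written against the current Selenium WebDriver API for Python rather than the older Selenium Remote Control mentioned in the comment. The search URL and the CSS selectors for result links and the "Next" button are placeholders that have to be adapted to whatever the target engine actually renders:

<pre>
import time
from urllib.parse import quote

from selenium import webdriver
from selenium.webdriver.common.by import By

# Placeholder selectors -- inspect the engine's result page and adjust;
# search engines change their markup frequently.
RESULT_LINK_SELECTOR = "a.result-link"
NEXT_BUTTON_SELECTOR = "a.next-page"

def harvest(query, max_pages=10):
    """Drive a real browser through result pages and collect outbound links,
    e.g. harvest('site:example.spaces.live.com')."""
    driver = webdriver.Firefox()
    links = set()
    try:
        driver.get("https://search.example.com/search?q=" + quote(query))
        for _ in range(max_pages):
            for anchor in driver.find_elements(By.CSS_SELECTOR, RESULT_LINK_SELECTOR):
                href = anchor.get_attribute("href")
                if href:
                    links.add(href)
            buttons = driver.find_elements(By.CSS_SELECTOR, NEXT_BUTTON_SELECTOR)
            if not buttons:
                break
            buttons[0].click()
            time.sleep(2)  # let the next page of results render
    finally:
        driver.quit()
    return links
</pre>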
== Finished projects ==
:''See also: [[:Category:Rescued Sites]].''
* [[User:Jscott|Jason]] founded the Archive Team ([http://archiveteam.org/index.php?title=Main_Page&diff=prev&oldid=3 see]).
* [[User:Bbot|bbot]] made [http://thepiratebay.org/user/archiveteam/ an archiveteam TPB user]. Get the password from him or Jason. (Not really a ''project'', per se.)
* [[User:Bbot|bbot]] has archived everything2, and will continue to make further archives as more content is added.
* starwars.yahoo.com was successfully archived before it shut down in December 2009.
* Scott has archived the [[infoAnarchy]] wiki site. The archive is complete and is available at the Infoanarchy wiki archive; a 5.1 MB gzipped archive of the wiki is also available. (The infoAnarchy wiki site was down for several months in the first part of 2011, but is back up as of May 2011. There is now very little content updating on the site.)
* Scott has archived/mirrored The Cyberpunk Project. (You'll have to Google it - this wiki won't let me edit a page that includes the Russian TLD.) This Russian-based website is inactive and hasn't been updated or changed since April 2010; most pages haven't been changed since 2007. How long will it stay online? Your guess is as good as mine... The mirror is available at The Cyberpunk Project Mirror.
* As reported on boingboing by Cory Doctorow, all of Gopherspace - scraped in 2007 - needs an archive home. Anybody have 15GB of spare hosted-server space for this project?
** I do, please contact me at admin@emuwiki.com to tell me what to do. [[User:EmuWikiAdmin|EmuWikiAdmin]] 15:17, 2 May 2010 (UTC)
** They were added to iBiblio: http://torrent.ibiblio.org/search.php?query=gopher&submit=search [[User:Emijrp|Emijrp]] 11:34, 2 November 2010 (UTC)
** It was also added to the Internet Archive by Jason: http://www.archive.org/details/2007-gopher-mirror [[User:Emijrp|Emijrp]] 19:23, 4 June 2011 (UTC)
* The data hosted on Kasabi was retrieved and uploaded to the Internet Archive. Edsu 13:03, 19 July 2012 (EDT)
* [[Splinder]] was copied before it shut down in early 2012.
* [[MobileMe]] (me.com) closed on June 30, 2012.
* [[User:Start|Start]] grabbed [https://www.dropbox.com/s/iok7mgvyxm3rvfj/FoxyTunes.zip FoxyTunes] (it's less than 1MB!) right before it shut down.


== Other projects ==
* '''[[FanFiction.Net]]''' is being pre-emptively archived.
* '''[[User:ip2k|seanp2k]]''' is running [http://somaseek.com somaseek.com] and tracking all the song history for all of the internet radio stations on [http://somafm.com somafm.com] since March 2010.
* '''[[User:Start|Start]]''' is archiving Emulation Zone.


== Dead projects ==
* [[User:EmuWikiAdmin|EmuWikiAdmin]] created [http://www.emuwiki.com EmuWiki], a collection of all emulators, emulator documents, and hardware information that exists, regrouped in a referenced database.  Unfortunately, it [http://gbatemp.net/t230096-emuwiki-com-closes-down shut down] in May 2010 due to copyright issues.  A 20GB torrent of the site is apparently floating around somewhere.


