Difference between revisions of "Main Page/Current Projects"

From Archiveteam
(Google Drive to warrior-based)
 
(214 intermediate revisions by 23 users not shown)
Line 1: Line 1:
__NOTOC__
__NOTOC__
== Archive Team recruiting ==
== Archive Team recruiting ==
* Help us: '''[[ArchiveTeam_Warrior|☞ Download and run your warrior ☜]]'''.
* What's on: [https://tracker.archiveteam.org/ online tracker].
* [[Donate|Donate to keep our projects going]].
* [[Dev|Want to code for Archive Team? Here's a starting point.]]
* [[Dev|Want to code for Archive Team? Here's a starting point.]]
* Help us: '''[[ArchiveTeam_Warrior|☞ Download and run your warrior ☜]]'''.<br>
* What's on: [https://tracker.archiveteam.org/ online tracker].<br>
<!--Combined project activity graphs [http://zeppelin.xrtc.net/corp.xrtc.net/shilling.corp.xrtc.net/project_items.html here].-->


== Warrior-based projects ==
== Warrior-based projects ==
{{:CurrentWarriorProject}}
{{:CurrentWarriorProject}}


<!-- Urgent projects -->
=== Short-term, urgent projects ===
* Afghanistan: Archiving the Afghan web due to recent events. '''IRC Channel {{IRC|afghansites|network=hackint}}'''.
<!-- Projects with strong deadline, deadline is in the future -->
* [[XTube]]: The shutdown on 5 September 2021 will surely leave a gaping hole in the web. '''IRC Channel {{IRC|nevermind|network=hackint}}'''.
<!-- sorted by deadline (soonest on top) -->
<!-- Longer but finite projects -->
* [[Typepad]]: A blogging service that will cease to exist by the end of September 2025. '''IRC Channel {{IRC|typebad}}'''
* [[Google Drive]]: Google will break millions of shared Drive links on 13 September 2021. '''IRC Channel {{IRC|googlecrash|network=hackint}}'''.
* Classic [[Google Sites]]: Making more sites inaccessible to the public starting September 1, 2021. '''IRC Channel {{IRC|nearlylostmygoogles|network=hackint}}'''.
* [[Periscope]]: Another Twitter acquisition, another shutdown. This time, its live-streamer gets to join Vine in the bin at the end of March. '''IRC Channel {{IRC|microscope|network=hackint}}'''.
* [[Webs]]: Vistaprint is killing off the Freewebs you knew from the 2000s on <s>March 31</s> June 30, 2021, unless you pay up. '''IRC Channel {{IRC|webbed|network=hackint}}'''.
<!-- Long-term projects -->
* [[GitHub]]: Embraced-uh, I mean, bought by Microsoft. '''IRC Channel {{IRC|gitgud|network=hackint}}'''.
* [[MediaFire]]: [https://twitter.com/textfiles/status/1349516443654758401 Not 'at-risk' but grabbing speculatively to save historic files] '''IRC Channel {{IRC|mediaonfire|network=hackint}}'''.
* [[Reddit]]: Banning communities that generate bad PR for Reddit Inc. Currently grabbing ''new'' material. '''IRC Channel {{IRC|shreddit|network=hackint}}'''.
* [[URLs]]: A random collection of stuff. '''IRC Channel {{IRC|//|network=hackint}}'''.
* [[URLTeam]]: URL shorteners were a fucking awful idea. '''IRC Channel {{IRC|urlteam|network=hackint}}'''.


''An updated Warrior virtual appliance (v3.2) is now available with better support for newer projects that utilize wget-at. Please download it using the link above.''
=== Medium-term projects ===
<!--
<!-- Projects for which the deadline has passed, deadline is unclear, but there is a moment they are "finished" -->
=== Scripts only === -->
<!-- sorted alphabetically -->
* [[Meta Ad Library]]: Database of advertisements running on Facebook and other Meta products. '''IRC Channel {{IRC|fads}}'''
* [[Peing]]: A Japanese question/answer service that was slated to be shut down on {{datetime|2025-08-29}}. '''IRC Channel {{IRC|peingpong}}'''
* [[US Government]]: Archiving the US government. '''IRC Channel {{IRC|UncleSamsArchive}}'''
** [[Radio Free Asia]]: Non-profit media organization owned by USAGM.
** [[Radio Free Europe|Radio Free Europe/Radio Liberty]]: Non-profit media organization owned by USAGM.
** [[Voice of America]]: An internationally-broadcasting state media network at risk of closure.
 
=== Long-term projects ===
<!-- Ongoing projects. No deadline, no moment of "finishing" -->
<!-- sorted alphabetically -->
* [[Microsoft Update]]: Removal of legacy Windows drivers announced. '''IRC Channel {{IRC|windowfixer}}'''
* [[Telegram]]: Archiving public messages in various newsworthy and/or otherwise notable Telegram channels. '''IRC Channel {{IRC|telegrab}}'''.
* [[Twitch]]: Archiving metadata and select videos. '''IRC Channel {{IRC|burnthetwitch}}'''
* [[URLTeam]]: URL shorteners were a fucking awful idea. '''IRC Channel {{IRC|urlteam}}'''.
* [[URLs]]: A random collection of stuff. '''IRC Channel {{IRC|//}}'''.
* [[YouTube]]: Archiving [[YouTube#Scope|selected videos]]. '''IRC Channel {{IRC|down-the-tube}}'''.
 
=== Long-term, slower-paced projects ===
These are projects that are actively running but generally only have small numbers of items available to complete at a time.
<!-- sorted alphabetically -->
* [[Blogger]]: Grabbing inactive Blogger blogs since Google began a mass purge of inactive Google accounts on or after {{datetime|2023-12-01}}. '''IRC Channel {{IRC|frogger}}'''.
* [[GitHub]]: Embraced-uh, I mean, bought by Microsoft. '''IRC Channel {{IRC|gitgud}}'''.
* [[Imgur]]: Unregistered users' "old" and "inactive" images will be purged, and all NSFW content is being shown the door on {{datetime|2023-05-15}}. '''IRC Channel {{IRC|imgone}}'''.
* [[MediaFire]]: [https://twitter.com/textfiles/status/1349516443654758401 Not 'at-risk' but grabbing speculatively to save historic files] '''IRC Channel {{IRC|mediaonfire}}'''.
* [[Pastebin]]: Archiving the pastas. '''IRC Channel {{IRC|pastalavista}}'''.


== Manual projects ==
== Manual projects ==
* [[Coronavirus|2019-2021 coronavirus outbreak]]: Documenting and preserving data, events, and impacts of the virus on society. '''IRC Channel {{IRC|coronarchive|network=hackint}}'''
* [[ArchiveBot]]: For those with lots of disk space, bandwidth and long-term commitment. '''IRC Channel {{IRC|archivebot}}'''.
* [[ArchiveBot]]: For those with lots of disk space, bandwidth and long-term commitment. '''IRC Channel {{IRC|archivebot|network=hackint}}'''.
* [[Codearchiver]]: Dumping and archival of source code repositories and associated version control systems. '''IRC Channel {{IRC|codearchiver}}'''.
* [[WikiTeam]]: Saving wiki dumps (XML) and their external links for the Wayback Machine (WARC), as well as exporting MediaWiki databases. Permanent effort, [https://github.com/WikiTeam/wikiteam/wiki/Tutorial#I_have_no_shell_access_to_server everyone can help] (you choose the size of your downloads). '''IRC Channel {{IRC|wikiteam|network=hackint}}'''.
* Dead people: When people die, their webpages and/or social media might go "Poof!" due to fees and other knick-knack. '''IRC Channel {{IRC|archiveteam}}'''
* [[Wikibot]] and [[WikiTeam]]: Saving wiki dumps (XML) and their external links for the Wayback Machine (WARC), as well as exporting MediaWiki databases. Permanent effort, [https://github.com/WikiTeam/wikiteam/wiki/Tutorial#I_have_no_shell_access_to_server everyone can help] (you choose the size of your downloads). '''IRC Channels {{IRC|wikibot}} {{IRC|wikiteam}}'''.
* [[Formats|File Formats]] and [[Just Solve the Problem 2012|Just Solve]]: Let's Document all the File Formats! Content also lives on subdomains, e.g. [http://fileformats.archiveteam.org fileformats.archiveteam.org] and [http://justsolve.archiveteam.org justsolve.archiveteam.org]. '''IRC Channel {{IRC|justsolve}}'''
 
== Recently finished projects ==
<!-- projects that have finished in the last 30 days go here in reverse-chronological order to be found easily and showcase recent work. additionally, keep projects here that are still in the tracker but not yet deleted so it won't confuse people. -->
* [[Glitch]]: Hobbyist web hosting. Due to be inaccessible {{datetime|2025-07-08}} and expected to fully shut down at the end of 2025. '''IRC Channel {{IRC|ditched}}'''.
* [[Goo.gl]]: Google's URL shortener will shut down on {{datetime|2025-08-25}}, excluding links that were active in late 2024. '''IRC Channel {{IRC|urlteamwasright}}'''


== Upcoming & proposed projects ==
== Upcoming & proposed projects ==
Line 37: Line 59:
<!-- Top priority: could disappear anytime now -->
<!-- Top priority: could disappear anytime now -->
<!-- Shutting down, definite deadline given -->
<!-- Shutting down, definite deadline given -->
* [[Chrome Web Store]]: Google has announced a timeline of policy changes that will lead to content being removed between December 1, 2020 and June 2022. '''IRC Channel {{IRC|chromeweblore|network=hackint}}'''.
* [[Dailymotion]]: Archiving inactive videos. '''IRC Channel {{IRC|DailyDemotion}}'''
* [[Chrome Web Store]]: Google has announced a timeline of policy changes that will lead to content being removed between {{datetime|2021-12-01}} and 2025. '''IRC Channel {{IRC|chromeweblore}}'''.
<!-- Shutting down, vague deadline given -->
<!-- Shutting down, vague deadline given -->
* [[Kinja]]: Deleting all user pages, maybe? '''IRC Channel {{IRC|gokinjagokinjago|network=hackint}}'''.
* [[Photobucket]]: Finally following through on over a year of email threats that free accounts are going to be mass deactivated if they don't pay up. '''IRC Channel {{IRC|photosucket}}'''.
* [[Twitter]]: Deleting inactive accounts <s>2019-12-11</s> sometime. '''IRC Channel {{IRC|twitterdead}}'''.
* Abandoned iOS App Store & Google Play apps: Both Apple and Google are slimming down on abandoned apps, [https://www.cnet.com/tech/mobile/one-third-of-apple-and-google-apps-are-so-outdated-they-could-get-removed/ with an estimated ~1.5M of them at risk]. '''IRC Channel {{IRC|appocalypse}}'''.
* [[Twitter]]: General instability; deleting inactive accounts <s>{{datetime|2019-12-11}}</s> sometime. '''IRC Channel {{IRC|twitterdead|EFnet|abandoned}}'''.
<!-- Shutting down, no deadline given -->
<!-- Shutting down, no deadline given -->
<!-- Archiving the archives -->
<!-- Archiving the archives -->
<!-- Misc. projects (unmaintained sites, distrust in owners) -->
<!-- Misc. projects (unmaintained sites, distrust in owners) -->
* [[YouTube]]: Archiving all YouTube metadata and selected videos afterwards soon. '''IRC Channel {{IRC|down-the-tube|network=hackint}}'''.
* [[VKontakte]]: A Russian equivalent of Facebook carries the risk of tumbling down under the weight of sanctions as a result of the government's invasion of Ukraine. '''IRC Channel {{IRC|lostkontakt}}'''.
* [[Imgur]]: Image hoster decided that using it for hosting images is not permitted. '''IRC Channel {{IRC|imgone}}'''.
* [[JamiiForums]]: the Tanzanian government would like this gone. '''IRC Channel {{IRC|jammedforums|EFnet|abandoned}}'''.
* [[JamiiForums]]: the Tanzanian government would like this gone. '''IRC Channel {{IRC|jammedforums}}'''.
* [[LiveJournal]]: Very old, widely regarded as in decline, and has a lot of important stuff buried in it. '''IRC Channel {{IRC|recordedjournal|EFnet|abandoned}}'''.
* [[LiveJournal]]: Very old, widely regarded as in decline, and has a lot of important stuff buried in it. '''IRC Channel {{IRC|recordedjournal}}'''.
* [[The Pirate Bay]]: Recently came back up, grabbing an archive for sanity's sake. '''IRC Channel {{IRC|yarharfiddlededee|EFnet|abandoned}}'''.
* [[Ownlog]]: Ownlog is losing popularity and support from its owners. '''IRC Channel {{IRC|pwnlog}}'''.
* [[The Pirate Bay]]: Recently came back up, grabbing an archive for sanity's sake. '''IRC Channel {{IRC|yarharfiddlededee}}'''.
* [[Valhalla]]: Where to store what even the [[Internet Archive]] doesn't have space for? '''IRC Channel {{IRC|huntinggrounds}}'''.
* [[Valhalla]]: Where to store what even the [[Internet Archive]] doesn't have space for? '''IRC Channel {{IRC|huntinggrounds}}'''.
* [[Giphy]]: Bought by Facebook, to be "integrated" (assimilated) into Instagram https://news.knowyourmeme.com/news/facebook-to-buy-giphy
* [[Giphy]]: Bought by <s>Facebook</s> Shutterstock, to be "integrated" (assimilated) into <s>Instagram</s> https://news.knowyourmeme.com/news/facebook-to-buy-giphy
 
== Recently finished projects ==
<!-- put projects here that are still in the tracker but not yet deleted so it won't confuse people -->
* [[CodePlex]]: Microsoft's self-archive will be permanently removed from its Recycle Bin after July 1, 2021. '''IRC Channel {{IRC|plexicode|network=hackint}}'''.
* [[Google Poly]]: A 3D art repository that Google will send to the trash compactors on June 30, 2021. New uploads cease April 30. '''IRC Channel {{IRC|polygone|network=hackint}}'''.
* [[Bintray]]: JFrog is dismantling the software distribution platform used by numerous projects in May. '''IRC Channel {{IRC|binnedtray|network=hackint}}'''.


== Hiatus / Missed the Mark ==
== On hiatus ==
* [[Tinkercad]]: Autodesk announced its intent to put designs from inactive OAuth accounts back into minds around May 24, 2021. '''IRC Channel {{IRC|tinkerhad|network=hackint}}'''.
* [[Angelfire]]: Angelfire is a web hosting service that contains big chunks of early WWW history and has no proper backup. '''IRC Channel {{IRC|angelonfire}}'''.
* [[Angelfire]]: Angelfire is a web hosting service that contains big chunks of early WWW history and has no proper backup. '''IRC Channel {{IRC|angelonfire}}'''.
* [[Audit2014|Audit 2014]]: It's time to verify our shit. '''IRC Channel {{IRC|auditteam}}'''. THIS PROJECT IS ON HIATUS AND WILL BE RETURNED TO AS AUDIT2018.
* [[Audit2014|Audit 2014]]: It's time to verify our shit. '''IRC Channel {{IRC|auditteam}}'''.
* [[Flickr]]: <s>[[Yahoo!]]</s> SmugMug decided to kill it after finding Yahoo!'s plans to do so before they were bought by Verizon. '''IRC Channel {{IRC|flickrfckr|network=hackint}}'''.
* [[Flickr]]: <s>[[Yahoo!]]</s> SmugMug decided to kill it after finding Yahoo!'s plans to do so before they were bought by Verizon. '''IRC Channel {{IRC|flickrfckr}}'''.
* [[FTP]]: Help us find and download all FTP sites! '''IRC Channel {{IRC|effteepee|network=hackint}}'''.
* [[FTP]]: Help us find and download all FTP sites! '''IRC Channel {{IRC|effteepee}}'''.
* [[Google Groups]]: "Gone within a year" ([[User:Jscott|SketchCow]], 2016-06-07).
* [[Google Drive]]: Same as MediaFire. '''IRC Channel {{IRC|googlecrash}}'''. Currently on hiatus.
* [[Google Groups]]: "Gone within a year" ([[User:Jscott|SketchCow]], {{datetime|2016-06-07}}).
* [[Google News Archive]]: Let's store all newspapers at Google, WCGW? '''IRC Channel {{IRC|papersplease}}'''.
* [[Google News Archive]]: Let's store all newspapers at Google, WCGW? '''IRC Channel {{IRC|papersplease}}'''.
* [[DevPort]]: This [http://developerportfolio.com/ portfolio SaaS provider] has [http://www.lowendtalk.com/discussion/65135/need-some-help-saas-provider-is-dead-but-my-site-is-still-up-how-should-i-grab-it reportedly] been having infrastructure issues, and removed their social media accounts. Possible impending shutdown.
* [[INTERNETARCHIVE.BAK]]: Grab a slice of the big cake of [[Internet Archive|The Archive]]! '''IRC Channel {{IRC|internetarchive.bak}}'''.
* [[INTERNETARCHIVE.BAK]]: Grab a slice of the big cake of [[Internet Archive|The Archive]]! '''IRC Channel {{IRC|internetarchive.bak}}'''.
* [[ISP Hosting]]: Finding ISP web hosting services before the Grim Reaper finds them. '''IRC Channel {{IRC|webroasting|network=hackint}}'''.
* [[ISP Hosting]]: Finding ISP web hosting services before the Grim Reaper finds them. '''IRC Channel {{IRC|webroasting}}'''.
* [[NewsGrabber]]: Saving all news articles. <!-- Help with server power or by finding more news sites.-->Currently paused. '''IRC Channel {{IRC|newsgrabber|network=hackint}}'''.
* [[Livestream]]: A video stream site merging with Vimeo in {{datetime|2025-01}}. '''IRC Channel {{IRC|deadtrickle}}'''
* [[Miraheze]]: <s>Shutting down sometime between {{datetime|2023-09-01}} and {{datetime|2023-10-31}}.</s> Rescued by new volunteers!
* [[Project Newsletter]]: Archiving e-newsletters, currently in development. '''IRC Channel {{IRC|projectnewsletter}}'''.
* [[Project Newsletter]]: Archiving e-newsletters, currently in development. '''IRC Channel {{IRC|projectnewsletter}}'''.
* [[Quizlet]]: Flashcards and other learning tools. '''IRC Channel {{IRC|quizletusin}}'''.
* [[Quizlet]]: Flashcards and other learning tools. '''IRC Channel {{IRC|quizletusin}}'''.
* [[Tumblr]]: [[Yahoo!]] considered killing it, now Yahoo has been acquired and Verizon declared war on NSFW blogs. '''IRC Channel {{IRC|tumbledown|network=hackint}}'''.
* [[Tinkercad]]: Autodesk announced its intent to put designs from inactive OAuth accounts back into minds around {{datetime|2021-05-24}}. '''IRC Channel {{IRC|tinkerhad}}'''.
* [[yuku]]: Lately yuku is very unstable and hosting thousands of forums. Project currently paused. '''IRC Channel {{IRC|archiveteam|network=hackint}}'''.
* [[Tumblr]]: [[Yahoo!]] considered killing it, now Yahoo has been acquired and Verizon declared war on NSFW blogs. Tumblr has since been sold to Automattic. '''IRC Channel {{IRC|tumbledown}}'''.
* [[Reddit]]: Banning communities that generate bad PR for Reddit Inc. Restricted access to APIs and data on {{datetime|2023-06-19}}. '''IRC Channel {{IRC|shreddit}}'''.


<small>ArchiveTeam primarily uses the hackint IRC network – ircs://irc.hackint.org:6697 (TLS required) – webchat: https://webirc.hackint.org/ – [[Archiveteam:IRC|More info]]</small>
<small>ArchiveTeam uses the hackint IRC network – ircs://irc.hackint.org:6697 (TLS required) – webchat: https://chat.hackint.org/#/connect?join=archiveteam-bs – [[Archiveteam:IRC|More info]]</small>
<small>ArchiveTeam also has some channels left on the EFnet IRC network – irc://irc.efnet.org – webchat: http://chat.efnet.org:9090 – [[Archiveteam:IRC|More info]]</small><br>

Latest revision as of 23:36, 8 September 2025

Archive Team recruiting

Warrior-based projects

ArchiveTeam's Choice: Telegram

Short-term, urgent projects

  • Typepad: A blogging service that will cease to exist by the end of September 2025. IRC Channel #typebad (on hackint)

Medium-term projects

Long-term projects

Long-term, slower-paced projects

These are projects that are actively running but generally only have small numbers of items available to complete at a time.

Manual projects

Recently finished projects

  • Glitch: Hobbyist web hosting. Due to be inaccessible 2025-07-08 and expected to fully shut down at the end of 2025. IRC Channel #ditched (on hackint).
  • Goo.gl: Google's URL shortener will shut down on 2025-08-25, excluding links that were active in late 2024. IRC Channel #urlteamwasright (on hackint)

Upcoming & proposed projects

On hiatus

  • Angelfire: Angelfire is a web hosting service that contains big chunks of early WWW history and has no proper backup. IRC Channel #angelonfire (on hackint).
  • Audit 2014: It's time to verify our shit. IRC Channel #auditteam (on hackint).
  • Flickr: Yahoo! SmugMug decided to kill it after finding Yahoo!'s plans to do so before they were bought by Verizon. IRC Channel #flickrfckr (on hackint).
  • FTP: Help us find and download all FTP sites! IRC Channel #effteepee (on hackint).
  • Google Drive: Same as MediaFire. IRC Channel #googlecrash (on hackint). Currently on hiatus.
  • Google Groups: "Gone within a year" (SketchCow, 2016-06-07).
  • Google News Archive: Let's store all newspapers at Google, WCGW? IRC Channel #papersplease (on hackint).
  • INTERNETARCHIVE.BAK: Grab a slice of the big cake of The Archive! IRC Channel #internetarchive.bak (on hackint).
  • ISP Hosting: Finding ISP web hosting services before the Grim Reaper finds them. IRC Channel #webroasting (on hackint).
  • Livestream: A video stream site merging with Vimeo in 2025-01. IRC Channel #deadtrickle (on hackint)
  • Miraheze: Shutting down sometime between 2023-09-01 and 2023-10-31. Rescued by new volunteers!
  • Project Newsletter: Archiving e-newsletters, currently in development. IRC Channel #projectnewsletter (on hackint).
  • Quizlet: Flashcards and other learning tools. IRC Channel #quizletusin (on hackint).
  • Tinkercad: Autodesk announced its intent to put designs from inactive OAuth accounts back into minds around 2021-05-24. IRC Channel #tinkerhad (on hackint).
  • Tumblr: Yahoo! considered killing it, now Yahoo has been acquired and Verizon declared war on NSFW blogs. Tumblr has since been sold to Automattic. IRC Channel #tumbledown (on hackint).
  • Reddit: Banning communities that generate bad PR for Reddit Inc. Restricted access to APIs and data on 2023-06-19. IRC Channel #shreddit (on hackint).

ArchiveTeam uses the hackint IRC network – ircs://irc.hackint.org:6697 (TLS required) – webchat: https://chat.hackint.org/#/connect?join=archiveteam-bs – More info