Government Backup

From Archiveteam
__NOTOC__
[[Image:Government data.jpg|300px|right]]

=== Internet Archive ===
[[Internet Archive]] has two teams, [[Wayback Machine|Wayback]] and [[Archive-It]] ([https://archive-it.org/ archive-it.org]), working through listings of government websites and data stores. They work internally, using Internet Archive's own crawlers and infrastructure.
The results of these internal efforts are saved in [https://archive.org/details/EndOfTerm2016WebCrawls this collection]. (Other efforts, such as Archive Team's, are stored under this collection as sub-collections.)

=== #DATAREFUGE ===
The [[DataRefuge|Data Refuge]] project has [https://docs.google.com/spreadsheets/d/12-__RqTqQxuxHNOln3H5ciVztsDMJcZ2SVs1BrfqYCc/edit#gid=0 a Google spreadsheet] about climate datasets.

=== Archive Team FTP Backup ===
The [[ftp-gov|Archive Team project]] is backing up 750+ FTP sites on the .MIL and .GOV domains. The work runs in two phases, each with its own tracker: [http://tracker.archiveteam.org/ftpdisco/ discovery] and [http://tracker.archiveteam.org/ftp-gov/ download]. The downloaded data is being sent to [https://archive.org/details/archiveteam_ftpgov this collection].

=== Archive Team General Websites Download ===
Besides the [[ftp-gov|FTP]] data download, Archive Team is also doing a general download (where possible) of many crawlable government websites, such as [[USA-Gov|usa.gov]].
 
== Internet Archive Statements ==
 
* [http://blog.archive.org/2016/11/09/us-election-results/ US Election Results] - Surprise at the outcome of the election and a call to keep libraries open.
* [http://blog.archive.org/2016/11/11/contribute-to-the-2016-u-s-presidential-election-web-archive/ Please Help Build the 2016 End-of-Term Archive] - A call for assistance and volunteers.
* [http://blog.archive.org/2016/11/29/help-us-keep-the-archive-free-accessible-and-private/ Help Us Keep the Archive Free, Accessible, and Reader Private] - First entry indicating that the election has influenced efforts to back up the archive in Canada.
* [http://blog.archive.org/2016/12/03/faqs-about-the-internet-archive-canada/ FAQs about the Internet Archive Canada] - Much-needed clarification about the Internet Archive's mirror in Canada.
* [http://blog.archive.org/2016/12/06/internet-archive-canada-and-national-security-letter-in-the-news-roundup/ Internet Archive Canada and National Security Letter in the News] - Roundup of press mentions about the mirroring efforts.
* [http://blog.archive.org/2016/12/15/preserving-u-s-government-websites-and-data-as-the-obama-term-ends/ Preserving U.S. Government Websites and Data as the Obama Term Ends] - Notes by the head of Archive-It on running the End of Term archiving effort.
* [http://blog.archive.org/2016/12/17/robots-txt-gov-mil-websites/ Robots.txt Files and Archiving .gov and .mil Websites] - The Internet Archive will no longer follow robots.txt directives on .GOV and .MIL websites.
* [http://blog.archive.org/2016/12/20/would-like-to-archive-government-web-services-not-just-web-sites-please-help/ Would Like to Archive Government Web Services, not just Web Sites– Please help] - A further call to archive government web services, not just websites.
 
== Notable Press Mentions and References ==
 
''Note that press coverage alternates between "Internet Archive is adding a mirror in Canada" and "Internet Archive is Moving to Canada".'' For anyone coming to this page cold, the reality is that the Internet Archive has been building a mirror in Canada for a significant period of time, and has a fully functioning facility there that has existed in some form for nearly a decade as of 2016. The current effort merely sped up an inevitable timetable.
 
* [http://www.theverge.com/2016/11/29/13778188/internet-archive-of-canada-backup-trump-surveillance-censorship The Internet Archive is building a Canadian copy to protect itself from Trump], The Verge, November 29, 2016
* [http://www.nbcnews.com/news/us-news/internet-archive-web-s-warehouse-creating-trump-era-copy-canada-n689916 Internet Archive, Web's Warehouse, Creating Trump-Era Copy in Canada], NBC News, November 29, 2016
* [http://www.dailykos.com/story/2016/11/30/1605487/-The-Internet-Archive-is-Moving-to-Canada The Internet Archive is "Moving to Canada"], The Daily Kos, November 30, 2016
* [http://gothamist.com/2016/11/30/even_the_internet_is_getting_ready.php Even The Internet Archive Is Moving To Canada Because Of Trump], Gothamist, November 30, 2016
* [http://www.huffingtonpost.ca/2016/11/30/archive-org-canada-trump_n_13330492.html Archive.org Moving To Canada Over Trump Censorship Fears], Huffington Post Canada, November 30, 2016

Latest revision as of 14:32, 24 January 2017

The US Government has an awful lot of data, and it's in a lot of places. The 2016 elections signaled a sea change in goals and ideals (although previous transitions have always involved such changes). Prompted by this, a number of groups and efforts have sprung up to make off-site backups of as much government data as possible.

This page contains overviews of the efforts by all the teams.
