Difference between revisions of "Government Backup"

From Archiveteam
(add some redlinks; will try to create later)
Line 7:
=== Internet Archive ===


Internet Archive has two teams, Wayback and Archive-It, working through listings of government websites and data stores. They are working internally using Internet Archive's crawlers and environment.
[[Internet Archive]] has two teams, [[Wayback Machine|Wayback]] and [[Archive-It]] ([https://archive-it.org/ archive-it.org]), working through listings of government websites and data stores. They are working internally using Internet Archive's crawlers and environment.


=== #DATAREFUGE ===


The Data Refuge project has [https://docs.google.com/spreadsheets/d/12-__RqTqQxuxHNOln3H5ciVztsDMJcZ2SVs1BrfqYCc/edit#gid=0 the following google document] about climate datasets.
The [[DataRefuge|Data Refuge]] project ([http://www.ppehlab.org/datarefuge ppehlab.org/datarefuge]) has [https://docs.google.com/spreadsheets/d/12-__RqTqQxuxHNOln3H5ciVztsDMJcZ2SVs1BrfqYCc/edit#gid=0 the following Google document] about climate datasets.


=== Archive Team FTP Backup ===


The Archive Team project is backing up 750+ FTP sites hosted at .MIL and .GOV sites. These two projects can be tracked [http://tracker.archiveteam.org/ftpdisco/ here] (discovery phase) and [http://tracker.archiveteam.org/ftp-gov/ here] (download phase). The results of this download are being sent to [https://archive.org/details/archiveteam_ftpgov this collection].
The [[USA-Gov|Archive Team project]] is backing up 750+ FTP sites hosted at .MIL and .GOV sites. These two projects can be tracked [http://tracker.archiveteam.org/ftpdisco/ here] (discovery phase) and [http://tracker.archiveteam.org/ftp-gov/ here] (download phase). The results of this download are being sent to [https://archive.org/details/archiveteam_ftpgov this collection].


=== Archive Team General Websites Download ===


Besides the FTP data download, Archive Team is also doing a general download (where possible) of many crawlable government websites.
Besides the FTP data download, Archive Team is also doing a general download (where possible) of many crawlable government websites, such as [[USA-Gov|usa.gov]].

Revision as of 20:26, 29 December 2016


The US Government has an awful lot of data, and it's in a lot of places. In 2016, elections were held that signaled a sea change in goals and ideals (although previous transitions have also brought such changes). Inspired by this, a number of groups and efforts have sprung up to ensure that off-site backups are made of as much government data as possible.

This page gives an overview of each team's effort.

Internet Archive

Internet Archive has two teams, Wayback and Archive-It (archive-it.org), working through listings of government websites and data stores. They are working internally using Internet Archive's crawlers and environment.

#DATAREFUGE

The Data Refuge project (ppehlab.org/datarefuge) maintains a Google spreadsheet about climate datasets (https://docs.google.com/spreadsheets/d/12-__RqTqQxuxHNOln3H5ciVztsDMJcZ2SVs1BrfqYCc/edit#gid=0).

Archive Team FTP Backup

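The Archive Team project is backing up 750+ FTP sites hosted on .MIL and .GOV domains. The project's two phases can be tracked at http://tracker.archiveteam.org/ftpdisco/ (discovery phase) and http://tracker.archiveteam.org/ftp-gov/ (download phase). The downloaded data is being sent to the archiveteam_ftpgov collection (https://archive.org/details/archiveteam_ftpgov).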

Archive Team General Websites Download

In addition to the FTP download, Archive Team is doing a general download (where possible) of many crawlable government websites, such as usa.gov.