Difference between revisions of "US Government"

From Archiveteam
(→‎See Also: Add link to RFERL)
(Move the site archival information to US Government/War room to prevent confusion)
}}
}}


== Discovery ==
'''US Government websites''' are at risk of going offline or seeing drastic changes in content under the Trump administration.


Official lists of [https://flatgithub.com/cisagov/dotgov-data/blob/main/?filename=current-full.csv&sha=f4ab2336715a72522888b63b5ff92baf7c5a3a86 all registered .gov domains] and [https://flatgithub.com/cisagov/dotgov-data/blob/main/?filename=current-federal.csv federal .gov domains] are available. The raw CSV files and the .gov zone file are also [https://github.com/cisagov/dotgov-data/ available on GitHub].
[[US Government/War room]] contains some information being tracked about the archiving status of various government sites, but is not a comprehensive list of everything that has been archived.
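
As a rough illustration of how the raw CSV files could be turned into a list of domains for discovery, here is a minimal Python sketch. The column header <code>Domain name</code> is an assumption about the CSV layout and should be checked against the actual file; the sample rows are made up for the example.

```python
import csv
import io


def domains_from_csv(csv_text, column="Domain name"):
    """Return lowercased domain names from CSV text, skipping blank values.

    The default column header is an assumption; pass the real header
    if the file uses a different one.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    domains = []
    for row in reader:
        name = (row.get(column) or "").strip().lower()
        if name:
            domains.append(name)
    return domains


# Hypothetical sample rows, not taken from the real file.
sample = "Domain name,Domain type\nCDC.GOV,Federal\ndata.gov,Federal\n"
for domain in domains_from_csv(sample):
    print(f"https://{domain}/")
```

The output is a plain list of seed URLs, which can then be compared against the tables on this page.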
 
=== Content at risk ===
<!--
|-
| Site
| Name
| Reason
| Archival Notes
| Status
-->
 
 
{| class="wikitable"
|-
! Site
! Name
! Reason
! Archival Notes
! Status
|-
| [https://data.gov https://data.gov]
| data.gov
| There have been reports of datasets disappearing from the website,<ref>{{URL|https://old.reddit.com/r/climate/comments/1idiliv/the_us_governments_open_data_on_datagov_is/m9zacub/}}</ref> though this behavior might be normal because of the way the site collects datasets from other locations.
| [https://catalog.data.gov https://catalog.data.gov]<br />[https://inventory.data.gov https://inventory.data.gov]<br />[https://resources.data.gov https://resources.data.gov]<br />[https://strategy.data.gov https://strategy.data.gov]<br />[https://sdg.data.gov https://sdg.data.gov]<br />[https://github.com/gsa/data.gov GitHub]
| {{Job|4hb15f3ijn846c1dw0w58k4fe}}<br />{{Job|4qlh2ol2vq2i525747l0yq6a4}}<br />{{Job|25o494lfnnlxtobegl9grx7tt}}<br />{{Job|e1ioqt5kilh8l4irihid8sqoq}}<br />{{Job|79u49omgtqkj83cnpyuhx0xr8}}<br />{{Job|akwvpyvnzeuhrvgh51tokrmsv}}<br />
|-
| [https://cdc.gov https://cdc.gov]
| Centers for Disease Control and Prevention
| Directed to pause communication<ref>{{URL|https://www.cnn.com/2025/01/21/health/hhs-cdc-fda-trump-pause-communication/index.html}}</ref> along with other health agencies.
| [https://data.cdc.gov/ https://data.cdc.gov/]<br/>[https://ftp.cdc.gov/ https://ftp.cdc.gov/]<br/>[https://github.com/CDCGov GitHub]
| [https://cdc.gov https://cdc.gov] -> {{Job|hd3tvx4w14ybj2al0peewcv}}<br/>[https://ftp.cdc.gov/ https://ftp.cdc.gov/] -> {{Job|8zn8f6a2620t1tnje3f1cyr2o}}<br/>[https://data.cdc.gov/ https://data.cdc.gov/] -> {{Job|1u2ougx4kn6ueaiqddwjfeib7}}
|-
| [https://www.ncei.noaa.gov/ https://www.ncei.noaa.gov/]
| National Centers for Environmental Information
|
| (Some?) data is linked to from [https://data.gov data.gov].<br>It appears to be possible to enumerate datasets with 7-digit integer IDs starting at <code>0000001</code>, e.g. <code><nowiki>https://www.ncei.noaa.gov/access/metadata/landing-page/bin/iso?id=gov.noaa.nodc:0000001</nowiki></code>. The legacy URL format, which redirects, appears to be <code><nowiki>http://accession.nodc.noaa.gov/0000001</nowiki></code>.
|
|-
| [https://www.nccs.nasa.gov/services/data-collections https://www.nccs.nasa.gov/services/data-collections]
| NASA Center for Climate Simulation Data
|
|
|
|-
| [https://www.ipcc-data.org https://www.ipcc-data.org]
| IPCC Data Distribution Centre
|
| Appears to have sequential IDs
|
|-
| [https://www.bco-dmo.org/ https://www.bco-dmo.org/]
| Biological and Chemical Oceanography Data Management Office
|
| Appears to have sequential IDs
|
|-
| [https://ncela.ed.gov/ https://ncela.ed.gov/]
| National Clearinghouse for English Language Acquisition
| The Department of Education has been threatened many, many times
| Its "Resource Library" section is mainly a list of links to both internal and external resources, e.g. PDFs or SoundCloud links.
|
|}
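
The NCEI note in the table above describes 7-digit, zero-padded dataset IDs. A minimal Python sketch of generating candidate landing-page URLs under that assumption follows. Whether every ID resolves, and what the highest valid ID is, is not known here; the function names are hypothetical.

```python
NCEI_LANDING = (
    "https://www.ncei.noaa.gov/access/metadata/landing-page/bin/iso"
    "?id=gov.noaa.nodc:{id}"
)


def ncei_landing_url(n):
    """Build the landing-page URL for dataset number n (1 to 9999999)."""
    if not 1 <= n <= 9_999_999:
        raise ValueError("ID must fit in 7 digits and start at 1")
    # Zero-pad to 7 digits, matching the 0000001 example.
    return NCEI_LANDING.format(id=f"{n:07d}")


def ncei_candidate_urls(start=1, stop=10):
    """Yield candidate URLs for IDs start..stop inclusive (may not all exist)."""
    for n in range(start, stop + 1):
        yield ncei_landing_url(n)


for url in ncei_candidate_urls(1, 3):
    print(url)
```

The resulting list is only a set of candidates; each URL still needs to be fetched to see whether a dataset exists at that ID.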
 
=== Domains and properties that require extra scripting ===
{| class="wikitable"
|-
! Site
! Name
! Reason
! Archival Notes
! Status
|-
| [https://fdic.gov https://fdic.gov]<br/>[https://banks.data.fdic.gov/bankfind-suite/bankfind https://banks.data.fdic.gov/bankfind-suite/bankfind]<br/>[https://orders.fdic.gov/s/?source=govdelivery https://orders.fdic.gov/s/?source=govdelivery]<br/>[https://catalog.fdic.gov/catalog/s/ https://catalog.fdic.gov/catalog/s/]<br/>[https://playmoneysmart.fdic.gov/games https://playmoneysmart.fdic.gov/games]
| Federal Deposit Insurance Corporation (FDIC)
| "The ones I spotted were https://banks.data.fdic.gov/bankfind-suite/bankfind https://orders.fdic.gov/s/?source=govdelivery https://catalog.fdic.gov/catalog/s/ https://playmoneysmart.fdic.gov/games but I suspect even more will need special handling"
|
|
|-
| [https://research-hub.nrel.gov/ https://research-hub.nrel.gov/]
| National Renewable Energy Laboratory
| Cloudflare with TLS sniffing
|
|
|-
| [https://wisqars.cdc.gov/ https://wisqars.cdc.gov/]
| CDC Web-based Injury Statistics Query and Reporting System
| "I don't think they provide raw data for privacy/legal reasons, but I can probably save the data for the default charts at least"
|
|
|-
| [https://liheappm.acf.hhs.gov/datawarehouse https://liheappm.acf.hhs.gov/datawarehouse]
| HHS Low Income Home Energy Assistance Program
| "I've done the grantee profiles from https://liheappm.acf.hhs.gov/datawarehouse. There's probably additional data that could be exported through the reports but the site is fairly complicated"
|
|
|-
| [https://transfer.archivete.am/inline/qtDPx/apps.bea.gov_seed_urls.txt https://transfer.archivete.am/inline/qtDPx/apps.bea.gov_seed_urls.txt]
|
| has some sections using JS
|
|
|-
| [https://www.osti.gov https://www.osti.gov]
|
| https://www.osti.gov + https://www.osti.gov/opennet/ - lots of sitemaps in https://www.osti.gov/robots.txt. https://www.osti.gov/sitemap_ostigov/xml uses https://www.osti.gov/sitemap_ostigov_1.txt, which I don't think ArchiveBot will parse. This looks like something that would be worth doing via DPoS.
|
|
|-
| [https://www.eia.gov/ https://www.eia.gov/]
| Energy Information Administration
| Seems to have some scripty things
|
|
|-
| [https://chemview.epa.gov/chemview/ https://chemview.epa.gov/chemview/]
| EPA Chemview
| looks scripty and complicated, but they do have a tutorial
|
|
|-
| [https://campd.epa.gov/data/bulk-data-files https://campd.epa.gov/data/bulk-data-files]<br />[https://watersgeo.epa.gov/cwa/CWA-JDs/ https://watersgeo.epa.gov/cwa/CWA-JDs/]
| EPA Clean Air Markets Program Data
| looks scripty
|
|
|-
| [https://ejscreen.epa.gov/ https://ejscreen.epa.gov/]
|
| is arcgis
|
|
|-
| [https://www.facadatabase.gov/FACA/s/FACADatasets https://www.facadatabase.gov/FACA/s/FACADatasets]
|
| is salesforce
|
|
|-
| [https://liheappm.acf.hhs.gov/datawarehouse https://liheappm.acf.hhs.gov/datawarehouse]
|
| looks scripty but probably won't be too hard
|
|
|-
| [https://ecos.fws.gov/ecdms4/ https://ecos.fws.gov/ecdms4/]
|
| is arcgis and there's probably more on that domain
|
|
|-
| [https://www.lcacommons.gov/lca-collaboration/ https://www.lcacommons.gov/lca-collaboration/]
|
| is scripty and big looking
|
|
|-
| [https://usgovernmentmanual.gov/ https://usgovernmentmanual.gov/]
|
| looks like a really helpful resource for finding stuff
|
|
|-
| [https://adams-search.nrc.gov/ https://adams-search.nrc.gov/]
|
| is big: searching "nuclear" gives 3252079 results, e.g. https://www.nrc.gov/docs/ML0726/ML072630079.pdf. Coverage seems to be decent but comes from lots of random projects. Of the few I sampled, only https://www.nrc.gov/docs/ML2008/ML20083B799.pdf wasn't saved (added 2024-05-15 but created 1991-09-20).
|
|
|-
| [https://liheappm.acf.hhs.gov/api/search/years https://liheappm.acf.hhs.gov/api/search/years]
|
| etc. requires a token obtained via POST to https://liheappm.acf.hhs.gov/token.php. Will try to figure out if it's just those 3 or if there's more to it.
|
|
|-
| [https://ffrms.climate.gov/ https://ffrms.climate.gov/]<br />[https://floodstandard.climate.gov/ https://floodstandard.climate.gov/]
|
| is arcgis
|
|
|-
| [https://data.fs.usda.gov/geodata/edw/datasets.php https://data.fs.usda.gov/geodata/edw/datasets.php]
|
| arcgis/geodb zip archives
|
|
|-
| [https://remdb.nrel.gov/ https://remdb.nrel.gov/]
|
| is scripty; I'm doing a basic pass over it but it won't get everything (they do have a data download though)
|
|
|-
| [https://maps.nrel.gov/ https://maps.nrel.gov/] [https://climate.nrel.gov/ https://climate.nrel.gov/]
|
| uses arcgis (?)
|
|
|-
| [https://www.fs.usda.gov/nrs/atlas/bird/ https://www.fs.usda.gov/nrs/atlas/bird/]
| Climate Change Bird Atlas
| Looks script-y
|
|
|}
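
For sites like OSTI that publish plain-text sitemaps, one option is to pre-extract the URLs and feed them to a crawler as a list. The Python sketch below assumes the <code>.txt</code> sitemap uses the common one-URL-per-line layout; this has not been verified against the OSTI file, and the sample text is invented for the example.

```python
from urllib.parse import urlsplit


def urls_from_text_sitemap(text):
    """Return the http(s) URLs found one per line in a text sitemap."""
    urls = []
    for line in text.splitlines():
        candidate = line.strip()
        parts = urlsplit(candidate)
        # Keep only lines that look like absolute http(s) URLs.
        if parts.scheme in ("http", "https") and parts.netloc:
            urls.append(candidate)
    return urls


# Hypothetical sitemap content for illustration only.
sample = "https://www.osti.gov/\n\nnot a url\nhttps://www.osti.gov/opennet/\n"
print(urls_from_text_sitemap(sample))
```

Blank lines and anything that is not an absolute URL are skipped, so the output can be saved directly as a seed list.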
 
=== Other content that may be at risk based on subject matter ===
This list was based on a fast manual scroll through the list of .gov domains. It contains some domains that are already dead (and likely have been for a long time) and might contain duplicates with other lists on this page.
 
<nowiki>
https://www.headstart.gov/
https://www.section508.gov/
https://www.ada.gov/
https://agingstats.gov/
https://blackhistorymonth.gov/
https://www.hiv.gov/
https://www.benefeds.gov/
https://aviationweather.gov/
https://birthcontrol.gov/
https://www.childwelfare.gov/
https://childcare.gov/
https://www.childstats.gov/
https://www.coldcaserecords.gov/
https://www.conservation.gov/
https://coralreef.gov/
https://www.employeeexpress.gov/
https://www.evergladesrestoration.gov/
https://familyplanning.gov
https://findtreatment.gov/
https://www.fatherhood.gov/
https://www.samhsa.gov/
https://foreignassistance.gov/
https://girlshealth.gov/
https://forestsandrangelands.gov/
https://greengov.gov/
https://hispanicheritagemonth.gov/
https://www.jewishheritagemonth.gov/
https://www.macpac.gov/
https://migrantworker.gov (-> https://www.dol.gov/general/migrantworker )
https://mitigationcommission.gov/
https://www.ncd.gov/
https://www.nbrc.gov
https://nativeamericanheritagemonth.gov/
https://reproductivehealthservices.gov/
https://www.sustainability.gov/
https://www.usaid.gov/
https://www.vaccines.gov/en/
https://womenshealth.gov/
https://www.workwithusaid.gov/
https://womenshistorymonth.gov/
https://www.dvidshub.net/ - lgbtq content already removed
</nowiki>
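
Since the list above might duplicate entries from other lists on this page, a quick check could look like the following Python sketch. It compares entries by hostname, ignoring case and a leading <code>www.</code>; that normalization rule and the function names are assumptions made for this example.

```python
from urllib.parse import urlsplit


def host_key(entry):
    """Return a normalized hostname for a list entry, or '' if none.

    Only the first whitespace-separated token is parsed, so trailing
    notes such as "- already removed" are ignored.
    """
    tokens = entry.split()
    if not tokens:
        return ""
    host = urlsplit(tokens[0]).hostname or ""
    return host[4:] if host.startswith("www.") else host


def overlapping_hosts(list_a, list_b):
    """Return the sorted hostnames that appear in both lists."""
    hosts_a = {host_key(e) for e in list_a} - {""}
    hosts_b = {host_key(e) for e in list_b} - {""}
    return sorted(hosts_a & hosts_b)


subject_list = ["https://www.usaid.gov/", "https://www.hiv.gov/"]
other_list = ["https://usaid.gov", "https://cdc.gov"]
print(overlapping_hosts(subject_list, other_list))
```

Comparing by hostname treats <code>https://www.usaid.gov/</code> and <code>https://usaid.gov</code> as the same site, which may or may not be what is wanted for every domain.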


== See Also ==
* [[Radio Free Asia]]
* [[Radio Free Europe]]
== References ==
<references/>

Revision as of 21:28, 4 April 2025
