FOIAonline

From Archiveteam
Revision as of 22:15, 30 September 2023 by JustAnotherArchivist (talk | contribs) (It's dead, Jim.)
FOIAonline
URL https://foiaonline.gov/
Status Offline
Archiving status Saved!
Archiving type other
IRC channel #archiveteam-bs (on hackint)
Project lead User:JustAnotherArchivist

FOIAonline was a centralised platform for Freedom of Information Act (FOIA) requests to several federal agencies in the United States. It was decommissioned on 2023-09-30.[1]

Site structure and quirks

  • There is no index of public entries. Iterating over receival dates in the advanced search works but misses some requests. The affected requests that were tested do not show up even when searched for specifically.
  • The search returns at most 10000 results for any single query.
  • Each request has an associated 'type' with three possible values (Request, Appeal, and Referral). It must be present in the URL, although its value doesn't matter beyond some headings.
  • Actual data retrieval happens via JavaScript through an internal, undocumented API. The code is inefficient and retrieves information multiple times with slight differences in the URL (the type being capitalised as above or lowercased for no obvious reason). Cookies, a CSRF token, and referrer checks are in place.
  • Request IDs seem to consist of the relevant agency, sometimes some extra identifiers (e.g. a year), and a sequential six-digit number. There are large gaps in the latter as returned by the receival date search, but at least the vast majority of those 'missing' requests appear to be inaccessible (403 on the internal API request).
  • Both the search and each request's file lists (attachments and records) use POST requests, i.e. they can't work correctly in the Wayback Machine.
  • API weirdness includes:
    • The attachment endpoint has pagination, but it isn't used.
    • File sizes are returned in MiB as either a float or a string, depending on the endpoint.
    • Attachments are only listed when a flag on the request info is set, but that flag doesn't actually mean there are attachments.
    • The search for receival dates returns some duplicates – sometimes even on different dates!
    • On requests with a large number of records, the API returns recordsTotal: 99999 until you get near the end of the pagination.
    • The file list pages for some of the largest requests are very slow to retrieve, especially on the higher page numbers.
  • The site employs an aggressive Web Application Firewall that blocks the downloads of some files whose names contain characters like a semicolon. This can be bypassed by replacing the filename; the file is identified by a UUID, and the filename appears to only be used for the Content-Disposition header. This bypass can't be applied manually in a browser's URL bar since the Referer header must be present and correct.
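
The search cap, the duplicate search results, and the mixed file size types described above could be handled along the following lines. This is a minimal sketch under stated assumptions, not the code used for the archival: count_results is a hypothetical callback standing in for a query against the (now offline) search, the function names are invented for illustration, and size strings are assumed to hold plain decimal numbers.

```python
from __future__ import annotations

from datetime import date, timedelta
from typing import Callable, Iterable, Iterator

SEARCH_CAP = 10_000  # maximum number of results the search returned per query


def split_date_range(
    start: date,
    end: date,
    count_results: Callable[[date, date], int],
    cap: int = SEARCH_CAP,
) -> Iterator[tuple[date, date]]:
    """Yield inclusive date windows whose result count stays below the cap.

    count_results is a hypothetical callback reporting how many results a
    search for the window [start, end] returns.
    """
    if start > end:
        return
    if start == end or count_results(start, end) < cap:
        # A single day cannot be split further, even if it hits the cap.
        yield (start, end)
        return
    # Bisect the window and recurse into both halves.
    middle = start + (end - start) // 2
    yield from split_date_range(start, middle, count_results, cap)
    yield from split_date_range(middle + timedelta(days=1), end, count_results, cap)


def dedupe_ids(request_ids: Iterable[str]) -> list[str]:
    """Drop repeated request IDs while preserving first-seen order."""
    return list(dict.fromkeys(request_ids))


def size_in_mib(value: float | str) -> float:
    """Normalise a size that arrives as a float or as a numeric string."""
    return float(value)
```

A window that reaches the cap exactly is treated as truncated and split again, since there is no way to tell whether more results exist. Because the search sometimes returned the same request on different dates, dedupe_ids would be applied to the combined results of all windows rather than per window, for example dedupe_ids(["X-000001", "Y-000002", "X-000001"]) with made-up IDs.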

References