CVE References


The CVE Project has been worried about linkrot in CVE references for quite a while, and now it's time for action! This project started in Tod's junk drawer but it's here, now.

Reference Links Snapshot

These are all the CVE reference URLs as of 2023-05-17.

Some are useful. Some are not. The ratio is unknown.
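
If you need to regenerate a snapshot like this, the reference URLs can be pulled straight out of the published CVE records. Below is a minimal sketch, assuming a local checkout of the CVE List in the JSON 5.x record format (field names follow that format; adjust the paths if your snapshot is laid out differently):

  #!/usr/bin/env python3
  """Collect reference URLs from a local tree of CVE JSON 5.x records."""
  import json
  import pathlib

  def reference_urls(root):
      # Assumes the cvelistV5-style layout: cves/<year>/<block>/CVE-*.json
      for path in pathlib.Path(root).rglob("CVE-*.json"):
          try:
              record = json.loads(path.read_text(encoding="utf-8"))
          except (OSError, json.JSONDecodeError):
              continue  # skip unreadable or malformed records
          cve_id = record.get("cveMetadata", {}).get("cveId", path.stem)
          cna = record.get("containers", {}).get("cna", {})
          for ref in cna.get("references", []):
              if ref.get("url"):
                  yield cve_id, ref["url"]

  if __name__ == "__main__":
      for cve_id, url in reference_urls("cvelistV5/cves"):
          print(cve_id + "\t" + url)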

Problem Statement

Archiving CVE references will, ideally, preserve vulnerability data for future security researchers, both professional and hobbyist, effectively forever. The internet is going through a phase of pretty serious enshittification right now, so having more than one copy of sometimes-key vuln data seems important. CVE descriptions are often quite short and pithy, with the rich vulnerability details stored only in off-site references. These details include:

  • Patch availability
  • Patch bypasses
  • Related CVE IDs
  • Precise, line-by-line analyses of vulnerable functions
  • Exploitation details, tips, gotchas
  • General, related bug-hunting guidance
  • Media coverage of affected software

Note that not all vulnerability details are particularly worthwhile; some advisories are just as opaque as CVE descriptions and offer nothing new for the intrepid researcher. Also, exploit details can be particularly touchy -- some people have trouble understanding that full transparency about software vulnerabilities is a net good for civilization. See Tod's blog post, The Hidden Harm of Silent Patches, for more detail on why this is.

Goals

  • Archive all CVE references from all time, including the past
    • An ArchiveBot job was run on the URL mentioned above
  • Archive "important" CVE references
    • KEV seems important
  • Archive new CVE references as they come in
    • Via the URLs project
  • Have some mapping between original and archived references
    • It should be easily usable by end users, per CVE ID (see the sketch after this list)
    • The archived copy should be treated as canonical, since it reflects what was "true" at the time it was referenced.
    • The archive should be generated on the spot when CVE IDs are published
  • Archives should be distributed
    • archive.org seems trustworthy, but relying on it alone still puts all the eggs in one basket.
    • Torrents seem to be a good format for enabling distribution
    • Torrents are hard to deeplink into
  • Archives should be easily hot-swappable when a particular archive goes sour.
    • If archive.org ever does go down, it should be easy to rebuild and redirect
  • Maintainers of this whole archive scheme should be easily replaceable
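
As a concrete starting point for the mapping item above, here is one possible shape for a per-CVE-ID record of original vs. archived references, using the public Wayback Machine availability API to locate the closest existing snapshot. The JSON layout and function names are just a sketch, not an agreed format, and the availability API only covers archive.org, not any future distributed copies:

  #!/usr/bin/env python3
  """Sketch of a per-CVE original -> archived reference mapping (hypothetical format)."""
  import json
  import urllib.parse
  import urllib.request

  WAYBACK_AVAILABLE = "https://archive.org/wayback/available?url="

  def closest_snapshot(url):
      # Ask the Wayback Machine availability API for the closest existing capture.
      query = WAYBACK_AVAILABLE + urllib.parse.quote(url, safe="")
      with urllib.request.urlopen(query, timeout=30) as resp:
          data = json.load(resp)
      closest = data.get("archived_snapshots", {}).get("closest", {})
      return closest.get("url") if closest.get("available") else None

  def build_mapping(cve_id, reference_urls):
      return {
          "cve": cve_id,
          "references": [
              {"original": url, "archived": closest_snapshot(url)}
              for url in reference_urls
          ],
      }

  if __name__ == "__main__":
      # Illustrative only; any CVE ID and reference URL could go here.
      print(json.dumps(build_mapping("CVE-2014-0160", ["http://heartbleed.com/"]), indent=2))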

Issues

Lots of URLs no longer exist. Some domains in the AB job failed even though they shouldn't have. So once the AB job uploads are done, the meta WARCs will need to be analyzed.
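
One way to do that analysis is to walk the WARCs and tally HTTP status codes per domain, so the domains that failed stand out. A rough sketch with the warcio library is below; the file name is a placeholder, and ArchiveBot meta WARCs may need extra handling depending on which record types they actually contain:

  #!/usr/bin/env python3
  """Tally HTTP status codes per domain from a WARC file (rough sketch)."""
  from collections import Counter
  from urllib.parse import urlsplit

  from warcio.archiveiterator import ArchiveIterator  # pip install warcio

  def status_by_domain(warc_path):
      counts = Counter()
      with open(warc_path, "rb") as stream:
          for record in ArchiveIterator(stream):
              # Only HTTP response records carry a status code.
              if record.rec_type != "response" or record.http_headers is None:
                  continue
              url = record.rec_headers.get_header("WARC-Target-URI") or ""
              status = record.http_headers.get_statuscode()
              counts[(urlsplit(url).netloc, status)] += 1
      return counts

  if __name__ == "__main__":
      # "cve-refs.warc.gz" is a placeholder file name.
      for (domain, status), n in status_by_domain("cve-refs.warc.gz").most_common(50):
          print(f"{n:8d}  {status}  {domain}")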

Twitter

There is currently no feasible way to archive Twitter threads. At best we can grab (via socialbot's snscrape support) the 3,200 most recent tweets from a Twitter user.
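
For reference, the per-user scrape mentioned above looks roughly like this with snscrape's Python API. Twitter-side changes have repeatedly broken this module, so treat it as a sketch of the approach rather than something guaranteed to work today:

  #!/usr/bin/env python3
  """Grab up to the ~3200 most recent tweets from one account via snscrape (sketch)."""
  import itertools

  import snscrape.modules.twitter as sntwitter  # pip install snscrape

  def recent_tweets(username, limit=3200):
      # get_items() yields newest-first; Twitter caps user timelines around 3200 tweets.
      return itertools.islice(sntwitter.TwitterUserScraper(username).get_items(), limit)

  if __name__ == "__main__":
      # "someuser" is a placeholder account name.
      for tweet in recent_tweets("someuser"):
          # The tweet text lives in .content or .rawContent depending on the snscrape version.
          print(tweet.date, tweet.url)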

archives.neohapsis.com

This is a dead domain, so it was ignored. Much of its data has moved to seclists.org, though, so the CVE folks may want to update the corresponding URLs.

marc.info

This domain is alive, but it failed in AB; possibly the bot got banned.