Places to store data

From Archiveteam
Revision as of 06:39, 5 November 2017 by JesseW (talk | contribs) (more)

This is a list/essay about places to store data, of varying permanence, cost, scale, etc. -- mainly for ArchiveTeam material, but applicable to any data.

The first and most obvious choice is IA, the Internet Archive. They accept pretty much anything, of any scale, intend to keep all of it forever, and will distribute a very large range of material, particularly if no one grumbles at them about it. And aside from direct uploading, for many of the other places listed below, you can save a copy of the page containing the data into the Wayback Machine using the Save Page feature.

But it's good to put stuff in other places, too -- duplication keeps stuff safe, and something can't be erased from history if all the copies can't be found.

The first thing to think about when looking for a place to store data is: How much data is it? Far more places will take a few kilobytes than are willing and able to accept a petabyte that needs a loving home.

Some good size categories (in order of increasing size) include: a few hashes; a few pages of text; a short video clip; a CD-ROM (roughly 650-700 megabytes); a few dozen gigabytes; a terabyte; a petabyte.

Another important consideration is: human-readable text or not? While any byte sequence can be converted into human-readable (if boring) text, this generally expands it considerably, and tends to make people suspicious. So it's good to know what type of material a given place expects.
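To give a rough sense of that expansion, here is a minimal sketch that measures how much a few common binary-to-text encodings inflate a byte sequence. The function name `text_expansion` and the sample data are illustrative assumptions, not anything a particular storage site prescribes.

```python
import base64
import binascii


def text_expansion(data: bytes) -> dict:
    """Return encoded length divided by raw length for a few text encodings."""
    encodings = {
        "hex": binascii.hexlify(data),
        "base32": base64.b32encode(data),
        "base64": base64.b64encode(data),
    }
    return {name: len(encoded) / len(data) for name, encoded in encodings.items()}


# Hypothetical sample: 3072 bytes of arbitrary binary data.
sample = bytes(range(256)) * 12
ratios = text_expansion(sample)

for name, ratio in sorted(ratios.items(), key=lambda item: item[1]):
    print(f"{name}: {ratio:.2f}x the original size")
```

Hex doubles the size, and base64 adds about a third, which is why base64 is the usual choice when a text-only place has a tight length limit.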

Below, grouped by size, are various suggested places to store data. Generally, all the bigger places can also be used for the smaller ones. Additional suggestions and commentary welcomed!

A few hashes

This size category refers to short strings; things like the toplevel hash of some larger pile of data, or a password or other secret, or a significant phrase.

  • Things this short can be stored in an arbitrary website's log files by simply appending them to the domain name as a URL path; the failed request will generate an error report containing the string. This doesn't provide any distribution, however.
  • Many sites' usernames are flexible and long enough to be used as a distribution mechanism for such short strings. And while bigger sites have this happen often enough that they have ways to hide usernames they don't want to distribute, many other sites don't -- and all of them have to notice before they will take any action.
  • Short strings can be embedded in various crypto-currency blockchains, although this generally isn't free.
  • Various URL shortening sites allow for custom short codes; if the string is short enough (or the URL site is flexible enough), you could likely put it there, although that doesn't help much for distribution. The string could also be put in the destination URL.
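For the "toplevel hash of some larger pile of data" case, one possible approach is to hash each item, write the digests into a sorted manifest, and hash the manifest. The sketch below assumes SHA-256 and a "digest, two spaces, name" manifest layout; both are illustrative choices, and `toplevel_hash` is a hypothetical helper rather than an established ArchiveTeam convention.

```python
import hashlib
from typing import Mapping


def toplevel_hash(items: Mapping[str, bytes]) -> str:
    """Hash every item, then hash the sorted 'digest  name' manifest."""
    manifest_lines = []
    for name in sorted(items):  # sorting makes the result order-independent
        digest = hashlib.sha256(items[name]).hexdigest()
        manifest_lines.append(f"{digest}  {name}\n")
    manifest = "".join(manifest_lines).encode("utf-8")
    return hashlib.sha256(manifest).hexdigest()


# Hypothetical pile of data, keyed by file name.
pile = {"a.txt": b"hello", "b.bin": bytes(range(16))}
top = toplevel_hash(pile)
print(top)  # 64 hex characters, short enough for the places listed above
```

Anyone who later obtains the full pile can recompute the same string and compare it with the stored copy; changing any item or any name changes the result.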

A few pages of text

This size category refers to data in the kilobyte range; individual essays in plain text, single images, a large bunch of hashes, etc.

  • The obvious options here are pastebin sites, of which there are many. Make sure to select the "keep forever" option, and know that they probably won't, anyway. But they are (mostly) anonymous, free, and quick. And you can duplicate the data into the Wayback Machine with the Save Page feature, in most cases.
  • Image hosting sites are another option, although, since most lack a business model, they are generally not very good for long-term storage -- but converting text into an image (or vice-versa) is a useful way to create a harder-to-find version.
  • Many wikis can be (mis-)used to host arbitrary content, especially if you insert it into an existing page, and (ideally from a different account) promptly revert the page back to its previous content.
  • All of the ideas documented at the famed DeCSS Gallery (someone add a link, please) are options, although generally labor-intensive.

A short video clip

This size category refers to data in the range of tens to hundreds of megabytes; video clips (not full movies), photo albums, the text content of entire blogs or small forums, etc.