Nightmare Projects

From Archiveteam

A nightmare project is defined as an Archive Team project that puts an outsize load on one of the resources of Archive Team's infrastructure.

Some examples of resources that can be overrun or overloaded:

  • Network bandwidth, either to the Internet Archive or to volunteers
  • Internet Archive disk storage
  • File counts, when a project produces massive numbers of small files
  • Processing time, when individual processes per acquisition item take a huge amount of time

The rules of thumb will shift, but currently, here are some examples of nightmare limits:

  • Projects that will take up more than 5 terabytes of Internet Archive space.
  • Projects that will take more than 30 days to complete.
  • Projects in which individual accounts (per person) are greater than 250 MB.
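The limits above can be sketched as a simple pre-flight check. This is only an illustration, not an Archive Team tool: the names `ProjectEstimate` and `nightmare_reasons` are hypothetical, the decimal units (1 TB = 10^12 bytes) are an assumption, and the per-account limit is applied here to a single "typical account" estimate, which is one possible reading of the rule.

```python
from dataclasses import dataclass

# Assumption: decimal units. The rules of thumb do not say which convention applies.
TB = 10**12
MB = 10**6

# Limits copied from the rules of thumb above; they are expected to shift over time.
MAX_TOTAL_BYTES = 5 * TB
MAX_DURATION_DAYS = 30
MAX_ACCOUNT_BYTES = 250 * MB


@dataclass
class ProjectEstimate:
    """Hypothetical container for rough, pre-project estimates."""
    total_bytes: int            # expected Internet Archive space
    duration_days: float        # expected time to complete
    typical_account_bytes: int  # expected size of one user's account


def nightmare_reasons(est: ProjectEstimate) -> list[str]:
    """Return which limits the estimate exceeds (empty list if none)."""
    reasons = []
    if est.total_bytes > MAX_TOTAL_BYTES:
        reasons.append("storage")
    if est.duration_days > MAX_DURATION_DAYS:
        reasons.append("duration")
    if est.typical_account_bytes > MAX_ACCOUNT_BYTES:
        reasons.append("account_size")
    return reasons
```

A non-empty result would not block a project by itself; it only signals that the discussion described below should happen first.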

While Archive Team, in theory, does not care about the perceived "worth" of user data, it does have to be realistic about what it means to recreate the infrastructure of a massive company that had millions in funding and then pulled that infrastructure away.

At the very least, a discussion needs to happen among members before moving forward. In some cases, it might make sense to take a representative sample (the first year, the most-viewed, the most-linked) instead of the full collection. (An example of this was the Justin.tv project, which had 1.1 petabytes of video data; only 9 terabytes were grabbed, representing the videos with 10 or more views.)
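A view-count cutoff of the kind used for Justin.tv could be sketched roughly as follows. This is a minimal illustration under stated assumptions: the item fields `views` and `size_bytes` and the function name `select_sample` are hypothetical, and a real project would get its metadata from whatever the target site exposes.

```python
def select_sample(items, min_views=10):
    """Keep items with at least `min_views` views.

    `items` is an iterable of dicts with hypothetical keys
    "views" and "size_bytes". Returns (chosen_items, total_bytes).
    """
    chosen = [item for item in items if item["views"] >= min_views]
    total_bytes = sum(item["size_bytes"] for item in chosen)
    return chosen, total_bytes


# Illustrative, made-up metadata (not real project data).
videos = [
    {"id": "a", "views": 3, "size_bytes": 700},
    {"id": "b", "views": 10, "size_bytes": 200},
    {"id": "c", "views": 250, "size_bytes": 100},
]
sample, sample_bytes = select_sample(videos)
```

Comparing the sample's total size against the full collection before starting shows how much the cutoff saves. For the Justin.tv figures quoted above, 9 terabytes out of 1.1 petabytes is under 1% of the data.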

While many projects can be undertaken with ArchiveBot (although abuse of ArchiveBot is a separate problem), and the "standard" projects run through the Warrior or otherwise related to Archive Team projects go well, a nightmare project needs to be prepared for and treated with awareness of the effect it will have.