Talk:INTERNETARCHIVE.BAK
A note on the end-user drives
I feel it is really critical that the drives or directories sitting in the end-user's location be absolutely readable, as a file directory, containing the files. Even if that directory is inside a .tar or .zip or .gz file. Making it into an encrypted item should not happen, unless we make a VERY SPECIFIC and redundant channel for such a thing. --Jscott 00:01, 2 March 2015 (EST)
Potential solutions to the storage problem
- Tahoe-LAFS - decentralized (mostly), client-side encrypted file storage grid
- Requires central introducer and possibly gateway nodes
- Any storage node could perform a Sybil attack until a feature for client-side storage node choice is added to Tahoe.
- git-annex - allows tracking copies of files in git without them being stored in a repository
- Also provides a way to know what sources exist for a given item. git-annex is not (AFAIK) locked to any specific storage medium. -- yipdw
Other anticipated problems
- Users tampering with data - how do we know data a user stored has not been modified since it was pulled from IA?
- Proposed solution: have multiple people make their own collection of checksums of IA files. --Mhazinsk 00:10, 2 March 2015 (EST)
- "Dark" items (e.g. the "Internet Records" collection)
- There are classifications of items within the Archive that should be considered for later waves, and not this initial effort. That includes dark items, television, and others.
- It seems like this would include a lot of what we would want to back up the most though, e.g. a substantial percentage of the books scanned are post-1923 and not public
- Data which may be illegal in certain countries/jurisdictions and expose volunteers to legal risk (terrorist propaganda, pornography, etc.)
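A rough sketch of how the checksum proposal above (several people each keeping their own collection of checksums of IA files) could be used to spot tampering. Everything here is an assumption made for illustration: the choice of SHA-256, the manifest layout (relative path mapped to hex digest), the strict-majority rule, and the function names build_manifest and check_against_witnesses, none of which come from this discussion.

```python
import hashlib
from collections import Counter
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large items do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its SHA-256 hex digest."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def check_against_witnesses(local, witnesses):
    """Compare a volunteer's manifest with independently collected manifests.

    Hypothetical rule: a file is "ok" if its digest equals the digest that a
    strict majority of the witnesses who recorded that file agree on,
    "mismatch" if it differs from that majority digest, and "unverified" if
    no witness recorded it or the witnesses have no strict majority.
    """
    report = {}
    for name, digest in local.items():
        votes = Counter(w[name] for w in witnesses if name in w)
        if not votes:
            report[name] = "unverified"
            continue
        top_digest, top_count = votes.most_common(1)[0]
        if top_count * 2 <= sum(votes.values()):
            report[name] = "unverified"
        elif digest == top_digest:
            report[name] = "ok"
        else:
            report[name] = "mismatch"
    return report
```

Each witness would run build_manifest on their own copy at the time it was pulled from IA and publish the result; a volunteer's drive could later be checked by rebuilding its manifest and passing both to check_against_witnesses. This only detects changes relative to what the witnesses recorded, so it is only as trustworthy as the witnesses are independent.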