Difference between revisions of "Talk:INTERNETARCHIVE.BAK"
Revision as of 08:02, 2 March 2015
A note on the end-user drives
I feel it is really critical that the drives or directories sitting in the end-user's location be absolutely readable, as a file directory, containing the files, even if that directory is inside a .tar or .zip or .gz file. Turning it into an encrypted item should not happen, unless we make a VERY SPECIFIC, redundant channel for such a thing. --Jscott 00:01, 2 March 2015 (EST)
Potential solutions to the storage problem
- Tahoe-LAFS - decentralized (mostly), client-side encrypted file storage grid
- Requires central introducer and possibly gateway nodes
- Any storage node could perform a Sybil attack until a feature for client-side storage node choice is added to Tahoe.
- git-annex - allows tracking copies of files in git without them being stored in a repository
- Also provides a way to know what sources exist for a given item. git-annex is not (AFAIK) locked to any specific storage medium. -- yipdw
Right now, git-annex seems to be in the lead. Besides being flexible about the sources of the material in question, the developer is a member of Archive Team AND has been addressing all the big-picture problems for over a year.
Other anticipated problems
- Users tampering with data - how do we know data a user stored has not been modified since it was pulled from IA?
- Proposed solution: have multiple people make their own collection of checksums of IA files. --Mhazinsk 00:10, 2 March 2015 (EST)
- All IA items already include checksums in the _files.xml. So there could be an effort to back up these xml files in more locations than the data itself (should be feasible since they are individually quite small).
- "Dark" items (e.g. the "Internet Records" collection)
- There are classifications of items within the Archive that should be considered for later waves, and not this initial effort. That includes dark items, television, and others.
- It seems like this would include a lot of what we would want to back up the most though, e.g. a substantial percentage of the books scanned are post-1923 and not public
- Data which may be illegal in certain countries/jurisdictions and expose volunteers to legal risk (terrorist propaganda, pornography, etc.)
- Interesting! Several solutions come to mind. --Jscott 02:35, 2 March 2015 (EST)
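On the tampering question above, here is a rough sketch of how a volunteer's copy of an item could be checked against the checksums in its _files.xml. It assumes the _files.xml layout is a `<files>` root containing `<file name="...">` elements that each carry an `<md5>` child; entries without an `<md5>` are skipped. The function names (`load_expected_md5`, `md5_of`, `verify_item`) are made up for illustration and are not part of any IA or git-annex tool.

```python
import hashlib
import xml.etree.ElementTree as ET
from pathlib import Path


def load_expected_md5(files_xml_text):
    """Map file name -> lowercase MD5 hex digest listed in a _files.xml document.

    Assumed layout: <files><file name="..."><md5>...</md5></file>...</files>
    """
    root = ET.fromstring(files_xml_text)
    expected = {}
    for entry in root.iter("file"):
        name = entry.get("name")
        md5 = entry.findtext("md5")
        if name and md5:  # skip entries that list no MD5
            expected[name] = md5.strip().lower()
    return expected


def md5_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large items do not have to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_item(item_dir, files_xml_text):
    """Return {file name: 'ok' | 'modified' | 'missing'} for one item directory."""
    results = {}
    for name, want in load_expected_md5(files_xml_text).items():
        path = Path(item_dir) / name
        if not path.is_file():
            results[name] = "missing"
        elif md5_of(path) == want:
            results[name] = "ok"
        else:
            results[name] = "modified"
    return results
```

This only helps if the _files.xml comes from somewhere other than the volunteer's own drive, which is the reason for keeping the xml files in more locations than the data. MD5 catches accidental corruption and casual edits; anyone worried about deliberately crafted collisions would want to compare a stronger hash as well.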