INTERNETARCHIVE.BAK/iabak-sharp implementation

From Archiveteam

iabak-sharp is an experimental implementation of INTERNETARCHIVE.BAK.

It is a command-line tool, available for Windows and Linux, that can be left running in the background.

A central server coordinates which Internet Archive items each client should back up. Each item can be given a priority score; currently, priorities are assigned based on the size and "uniqueness"[1] of the item type.

Screenshot: iabak-sharp running on Windows 10.

Currently implemented features

  • User registration (optional)
  • Retrieval of items from IA
  • Hash consistency checks
  • Disk space checks
  • Coordination server and job assignment
  • Self-update
  • Download resume (file granularity)
  • Run on startup (Windows only)
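
The hash check, disk space check, and file-level resume listed above can be illustrated with a short Python sketch. This is not the iabak-sharp code: the function names, the use of MD5, and the `expected` mapping of file names to digests are assumptions made for illustration only.

```python
import hashlib
import shutil
from pathlib import Path


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in chunks so large files need not fit in memory."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def files_to_fetch(item_dir: Path, expected: dict[str, str]) -> list[str]:
    """Return the names of files that are missing or fail the hash check.

    `expected` maps a file name to its expected MD5 hex digest. This input
    format is hypothetical; the real client may use another hash or manifest.
    """
    pending = []
    for name, checksum in expected.items():
        target = item_dir / name
        if not target.is_file() or md5_of(target) != checksum.lower():
            pending.append(name)
    return pending


def has_room_for(directory: Path, needed_bytes: int, reserve_bytes: int = 0) -> bool:
    """True if the volume holding `directory` can take the download,
    leaving at least `reserve_bytes` free afterwards (assumed policy)."""
    free = shutil.disk_usage(directory).free
    return free - reserve_bytes >= needed_bytes
```

Resuming at file granularity then amounts to calling `files_to_fetch` after an interruption and downloading only the names it returns: files that are already complete and verified are skipped, while a partially written file fails the hash check and is fetched again in full.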

GitHub repository

More information is available on the GitHub page: iabak-sharp

Comparison with git-annex implementation

  • Written in a more maintainable language (as opposed to bash)
  • No concept of shards: because we're not constrained by git repository size limits, each client only has to track the metadata of the files it actually stores on its drive. The server only stores a minimal amount of metadata (identifier, total size, and the users who hold that item).
  • We're free to implement features that don't perfectly match the git use cases (e.g. remote verification/challenges, encryption support, alternate distribution mechanisms such as IPFS)
  • Supports Windows (in addition to Linux)
  • Single binary, no external dependencies.
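
The minimal server-side record described above (identifier, total size, and the users holding the item) can be sketched in Python as follows. The class, field, and function names are hypothetical, and the selection rule (fewest existing copies first, then highest priority) is an assumption rather than the documented behaviour of the coordination server.

```python
from dataclasses import dataclass, field
from typing import Iterable, Optional


@dataclass
class ItemRecord:
    identifier: str          # Internet Archive item identifier
    total_size: int          # total size of the item in bytes
    holders: set[str] = field(default_factory=set)  # users storing the item
    priority: float = 0.0    # hypothetical field for the priority score


def pick_item(items: Iterable[ItemRecord], user: str,
              free_bytes: int) -> Optional[ItemRecord]:
    """Choose an item the user does not hold yet and that fits on their disk.

    Assumed policy: prefer items with the fewest holders, breaking ties by
    the highest priority score.
    """
    candidates = [
        item for item in items
        if user not in item.holders and item.total_size <= free_bytes
    ]
    if not candidates:
        return None
    return min(candidates, key=lambda item: (len(item.holders), -item.priority))


def mark_stored(item: ItemRecord, user: str) -> None:
    """Record that the user now holds a verified copy of the item."""
    item.holders.add(user)
```

Because the record carries no per-file metadata, the server's state grows with the number of items and users rather than with the number of files, which is the property the comparison above relies on.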

Notes

  1. For example, "warc-example1.com" has a higher priority than each of "warc-example2-20200623", "warc-example2-20200624", "warc-example2-20200625", etc.
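
One way to read the note above is that items whose identifiers differ only by a date suffix belong to the same item type, so each of them is less "unique" than an item that stands alone. The Python sketch below illustrates that reading. The date-suffix rule and the 1/n score are assumptions; the formula the server actually uses is not documented here.

```python
import re
from collections import Counter

# Assumed convention: a trailing "-YYYYMMDD" marks a dated snapshot.
_DATE_SUFFIX = re.compile(r"-\d{8}$")


def item_type(identifier: str) -> str:
    """Strip a trailing date suffix to get a hypothetical 'item type' key."""
    return _DATE_SUFFIX.sub("", identifier)


def uniqueness_scores(identifiers: list[str]) -> dict[str, float]:
    """Score each identifier as 1 / (number of items sharing its type)."""
    counts = Counter(item_type(name) for name in identifiers)
    return {name: 1.0 / counts[item_type(name)] for name in identifiers}


scores = uniqueness_scores([
    "warc-example1.com",
    "warc-example2-20200623",
    "warc-example2-20200624",
    "warc-example2-20200625",
])
# "warc-example1.com" is alone in its type, so it scores higher than
# any of the three dated "warc-example2" items.
```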