ArchiveBox

From Archiveteam
Historical content

This page or section is no longer actively edited, probably because the project was abandoned or because the information is now collected somewhere else in a different form.

However, this is a good and important record of ArchiveTeam's ancient times and thus must be preserved. Merging it into another article would be difficult, and/or some pieces of information are missing for a new form.

So feel free to read this, but there is probably nothing to add to it now. However, if you resurrect the project or find a way to move this data to a fresh place, you can remove this template.

Note: This is a copy of the ArchiveBox notes from PiratePad, duplicated here in case PiratePad temporarily goes down again. For now, PiratePad is still the point of truth - edit that instead of this.

Archive Team Junior Woodchuck Kit - It's sort of Unix (Quote from SketchCow)

This project aims to equip people who have less experience with Linux systems with the needed tools to aid the Archive Team (AT). The project is split into two parts, one part being the live Debian environment with usable archiving tools, the other being a repository for the Archive Team to prepare for new backup projects via packages.

Virtual Debian environment

A VirtualBox hard drive image that boots into a minimal, customized version of Debian Squeeze, with a terminal environment that downloads according to the "Project Definitions" described in section 2 (Debian ArchiveTeam repository).

Packages in the vanilla Archive Box:

  • wget
  • curl
  • rsync
  • archiveteam-console (see 2.1)
  • X environment with xfce (Comment: I think this should be LXDE, personally. Smaller, faster, more lightweight.)
  • perl

xfce needs to be configured such that archiveteam-console runs in a terminal on boot. If possible we should avoid the user having to log in to the system before use, just power it on and download awesome shit.

From a securing-ourselves-from-stupid-users perspective it would make sense to chmod the files in a way that would make them hard to delete. Then, after rsyncing it to the server we could give them a token that would allow deletion of the files. But is that too 'evil'?
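
A minimal sketch of the chmod idea, in Python. It assumes a POSIX file system, where deleting a file is governed by write permission on the containing directory rather than on the file itself, so the sketch clears the write bits on directories as well as files. The helper name `lock_tree` is hypothetical, and none of this stops a root user, which ties into the open question about which account the user gets.

```python
import os
import stat

# Write bits for user, group, and other.
WRITE_BITS = stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH


def lock_tree(root):
    """Clear write permission on every file and directory under root.

    Hypothetical helper: removing write permission from a directory is what
    stops non-root users from deleting the entries inside it.
    """
    # Walk bottom-up so children are locked before their parent directory.
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        for name in filenames:
            path = os.path.join(dirpath, name)
            mode = stat.S_IMODE(os.stat(path).st_mode)
            os.chmod(path, mode & ~WRITE_BITS)
        mode = stat.S_IMODE(os.stat(dirpath).st_mode)
        os.chmod(dirpath, mode & ~WRITE_BITS)
```

The "token" mentioned above could then amount to a step that restores the write bits once the rsync upload has been confirmed; how that confirmation would work is not decided here.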

Debian ArchiveTeam repository

The Archive Team repository provides Debian packages with the needed scripts (and the packages that specific download scripts depend on, for example Perl or Python) to help in any archiving project. The advantages of this approach are that we can easily push updates to AT-provided software, and that we can define package dependencies (for example, if an archiving script is written in a scripting language not available in the vanilla install).

The server is hosted by JC (jch) and is located at 130.225.236.19 or archiveteam.hackerspaces.dk. We probably need a domain name under .archiveteam.org before going live.

Project definitions are named under the following hierarchy: archiveteam-FOO (FOO being the project name, for example flickrfckr). "archiveteam-" packages will also add a module to archiveteam-console (see 2.1), making it easy to navigate and participate in the project without any knowledge of Linux terminal scripting.

archiveteam-console

archiveteam-console is the main application in any Archive Box install. It's the springboard to any AT-related task, as it allows:

  • Running archiving scripts in the background (behind the scenes we'd like to use screen perhaps)
  • Checking the status of current AT efforts. IDEA: our dumping scripts should answer SIGUSR1 with relevant status info to be shown in archiveteam-console.
  • Pushing the downloaded data to a remote server with rsync.
  • Requesting remote support from IRC users by means of a reverse shell, granting the helper shell access to the machine.
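
The SIGUSR1 idea in the list above could look roughly like this on the dumping-script side. This is only a sketch: the `status` counters, the one-line output format, and the choice of stderr as the channel are all assumptions, since the notes do not say how archiveteam-console would read the reply.

```python
import signal
import sys

# Hypothetical progress counters that a dumping script would update as it works.
status = {"project": "example", "items_done": 0, "items_total": 10}


def format_status(state):
    """Render the counters as one line for archiveteam-console to display."""
    return "{project}: {items_done}/{items_total} items".format(**state)


def report_status(signum, frame):
    """Signal handler: write the current status line to stderr and return."""
    sys.stderr.write(format_status(status) + "\n")


def install_status_handler():
    """Register the handler; must be called from the script's main thread."""
    signal.signal(signal.SIGUSR1, report_status)
```

A dumping script would call `install_status_handler()` once at startup and update `status` as it goes; the console (or a person at a shell) could then request a report with `kill -USR1 <pid>`.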

As stated earlier, archiveteam-console will have a directory structure for adding modules to it. This means that "archiveteam-" packages will actively alter the menu structure in archiveteam-console to reflect the newly installed module.
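
One way the directory-based module scheme might work, sketched in Python: each "archiveteam-" package drops an entry named after its project into a modules directory, and archiveteam-console rebuilds its menu from whatever it finds there. The directory path and both function names are assumptions; the notes do not specify a layout.

```python
import os

# Assumed location; the notes do not say where modules would be installed.
MODULE_DIR = "/usr/share/archiveteam-console/modules"


def discover_modules(module_dir=MODULE_DIR):
    """Return sorted project names, one per entry in the modules directory."""
    if not os.path.isdir(module_dir):
        return []
    return sorted(
        name for name in os.listdir(module_dir) if not name.startswith(".")
    )


def build_menu(projects):
    """Turn project names into numbered menu lines for the console."""
    return ["{}) {}".format(i, name) for i, name in enumerate(projects, start=1)]
```

Because the menu is derived from the directory contents each time, installing or removing a package changes the menu without the console needing its own registry.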

Open problems

  • We need some way to dynamically and reliably resize the file system inside the virtual machine. VirtualBox and other VM hosts allow this to be done; however, the Archive Box should be able to do it without any user intervention (out of the box). (NOTE from BlueMaxima: Tested importing and exporting an appliance, and it seemed that the dynamically expanding hard disk was preserved between import and export.)
  • Should we use a VirtualBox disk image or a VirtualBox appliance? It seems that Appliance is the best way to go.
  • Is it possible to run into problems like proxies or firewalls? If so, there should be a way to configure the Archive Box to work around these or allow configuration to bypass them.
  • Would it make sense to devise a protocol that archiving scripts use to auto-report their status? It would give us a much better overview of how far our archiving effort has come. (Comment: This would make sense. The question is how someone would review this data and control the archiving.)
  • Would we give the user a root account or a normal user account, to prevent any problems that may pop up? (If the archiveteam-console package is designed right, this issue could be avoided.)
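
For the status-reporting question above, one possible shape for a report is a single JSON line per update. This is purely illustrative: every field name below is hypothetical, and the transport (HTTP, IRC, rsync'd log files) is left open, as in the notes.

```python
import json
import time


def make_status_report(project, items_done, items_total, bytes_downloaded, now=None):
    """Build one status report as a JSON string (all field names hypothetical)."""
    report = {
        "project": project,
        "items_done": items_done,
        "items_total": items_total,
        "bytes_downloaded": bytes_downloaded,
        "timestamp": int(time.time() if now is None else now),
    }
    # sort_keys keeps the output stable, which makes reports easy to diff.
    return json.dumps(report, sort_keys=True)


def parse_status_report(line):
    """Decode a report on the collecting side and add percent complete."""
    report = json.loads(line)
    total = report["items_total"]
    report["percent"] = 100.0 * report["items_done"] / total if total else 0.0
    return report
```

A line-oriented format like this would let whoever reviews the data aggregate reports with ordinary text tools, though it does not by itself answer how the archiving would be controlled.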

Who's working on what

  • jch: Writes everything, plans everything, etc. Also sets up the repository server.
  • Auguste: Prepares the actual distribution, currently working on the first version of it.
  • SketchCow: Watching us, making sure we know this is a bad idea. :P
  • BlueMax(ima): Corrects language, unofficial beta test, unofficial groupie
  • Underscor: Able to work on the archiveteam-console app, if people want him to. Python, bash, and ncurses junkie ;D