Note: This is a copy of the ArchiveBox notes from PiratePad, duplicated here in case PiratePad temporarily goes down again. For now, PiratePad is still the point of truth - edit that instead of this.
Archive Team Junior Woodchuck Kit - It's sort of Unix (Quote from SketchCow)
This project aims to give people with little Linux experience the tools they need to help the Archive Team (AT). It is split into two parts: a live Debian environment with ready-to-use archiving tools, and a package repository that lets the Archive Team prepare new backup projects.
Virtual Debian environment
A VirtualBox hard drive image that boots into a minimal, customized version of Debian Squeeze. It provides a terminal environment that downloads according to the "Project Definitions" described under "Debian ArchiveTeam repository" below.
Packages in the vanilla Archive Box:
- archiveteam-console (see the archiveteam-console description below)
- X environment with xfce (NOTE: I think this should be LXDE, personally. Smaller, faster, more lightweight.)
xfce needs to be configured such that archiveteam-console runs in a terminal on boot. If possible we should avoid the user having to log in to the system before use, just power it on and download awesome shit.
From a securing-ourselves-from-stupid-users perspective it would make sense to chmod the files in a way that would make them hard to delete. Then, after rsyncing it to the server we could give them a token that would allow deletion of the files. But is that too 'evil'?
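If we do go down this road, the chmod part could look something like the following. This is only a rough Python sketch: the function name set_tree_writable is made up for illustration, and the token handshake with the server is not covered at all. One thing worth knowing is that deleting a file is governed by write permission on the directory that contains it, so the directories have to be locked as well as the files. The owner (or root) can always chmod things back, so this only guards against accidents.

```python
import os
import stat

# All three write bits (user, group, other).
WRITE_BITS = stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH


def set_tree_writable(root, writable):
    """Lock (writable=False) or unlock (writable=True) a download tree.

    Hypothetical helper: locking clears every write bit on the directories
    and files under root; unlocking gives the owner write access again.
    """
    for dirpath, _dirnames, filenames in os.walk(root):
        targets = [dirpath] + [os.path.join(dirpath, f) for f in filenames]
        for path in targets:
            if os.path.islink(path):
                continue  # os.chmod would follow the link, so skip it
            mode = stat.S_IMODE(os.stat(path).st_mode)
            if writable:
                mode |= stat.S_IWUSR
            else:
                mode &= ~WRITE_BITS
            os.chmod(path, mode)
```

A download script would call set_tree_writable(download_dir, False) once a dump is finished, and the console would call it with True after the rsync to the server has been confirmed.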
Debian ArchiveTeam repository
The Archive Team repository provides Debian packages containing the scripts needed for archiving projects, plus the packages those scripts depend on (for example Perl or Python). This approach has two advantages: we can easily push updates to AT-provided software, and we can declare package dependencies, which matters when an archiving script is written in a scripting language that the vanilla install lacks.
The server is hosted by JC (jch) and is located at 188.8.131.52 or archiveteam.hackerspaces.dk. We probably need a domain name under .archiveteam.org before going live.
Project definitions are named using the following scheme: archiveteam-FOO (FOO being the project name, for example flickrfckr). "archiveteam-" packages will also add a module to archiveteam-console (see the archiveteam-console description below), making it easy to navigate and participate in the project without any knowledge of Linux terminal scripting.
archiveteam-console is the main application in any Archive Box install. It's the springboard to any AT-related task, as it allows:
- Running archiving scripts in the background (behind the scenes we'd like to use screen perhaps)
- Checking the status of current AT efforts. IDEA: our dumping scripts should answer SIGUSR1 with relevant status info to be shown in archiveteam-console.
- Pushing the downloaded data to a remote server with rsync.
- Requesting remote support from IRC users by means of a reverse shell, granting the helper shell access to the machine.
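The SIGUSR1 idea from the list above could look roughly like this on the dumping-script side. It is a minimal Python sketch and not an agreed design: the StatusReporter class, the status directory and the JSON fields are all assumptions made for illustration. On SIGUSR1 the script writes a small status file that archiveteam-console could then read and display.

```python
import json
import os
import signal
import time

# Hypothetical directory where archiveteam-console would look for status files.
STATUS_DIR = "/tmp/archiveteam-status"


class StatusReporter:
    """Tracks the progress of one dumping script and reports it on SIGUSR1."""

    def __init__(self, project, status_dir=STATUS_DIR):
        self.project = project
        self.status_dir = status_dir
        self.items_done = 0
        self.bytes_done = 0
        self.started = time.time()

    def record(self, nbytes):
        """Call after each successfully downloaded item."""
        self.items_done += 1
        self.bytes_done += nbytes

    def snapshot(self):
        """Current progress as a plain dict (field names are assumptions)."""
        return {
            "project": self.project,
            "pid": os.getpid(),
            "items_done": self.items_done,
            "bytes_done": self.bytes_done,
            "uptime_seconds": round(time.time() - self.started, 1),
        }

    def status_path(self):
        name = "%s.%d.json" % (self.project, os.getpid())
        return os.path.join(self.status_dir, name)

    def _handle(self, signum, frame):
        # Write to a temporary file first, then rename, so that a reader
        # never sees a half-written status file.
        os.makedirs(self.status_dir, exist_ok=True)
        tmp = self.status_path() + ".tmp"
        with open(tmp, "w") as fh:
            json.dump(self.snapshot(), fh)
        os.replace(tmp, self.status_path())

    def install(self):
        """Register the handler; must be called from the main thread."""
        signal.signal(signal.SIGUSR1, self._handle)
```

A script would create one StatusReporter, call install() at startup and record() after every item. The console (or a human, with kill -USR1 followed by the script's PID) can then ask for a fresh status file at any time.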
As stated earlier, archiveteam-console will have a directory structure that modules are added to. Installing an "archiveteam-" package will therefore alter the menu structure of archiveteam-console to reflect the new module.
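One way the module directory could drive the menu is sketched below in Python. Everything concrete here is an assumption: the MODULE_DIR path, the rule that each module is a subdirectory containing a file called "run", and the menu layout are invented for illustration. The point is only that the console rebuilds its menu from whatever the installed packages have dropped into the directory.

```python
import os

# Hypothetical location where archiveteam-FOO packages install their module.
MODULE_DIR = "/usr/share/archiveteam-console/modules"


def discover_modules(module_dir=MODULE_DIR):
    """Return the sorted names of all usable modules in module_dir.

    A module counts as usable here if it is a subdirectory that contains
    a file named "run" (an assumed convention, not a decided one).
    """
    if not os.path.isdir(module_dir):
        return []
    names = []
    for entry in sorted(os.listdir(module_dir)):
        path = os.path.join(module_dir, entry)
        if os.path.isdir(path) and os.path.isfile(os.path.join(path, "run")):
            names.append(entry)
    return names


def build_menu(modules):
    """Turn a list of module names into numbered menu lines."""
    lines = ["%d) %s" % (i, name) for i, name in enumerate(modules, start=1)]
    lines.append("q) Quit")
    return lines
```

With this layout, installing or removing a package changes the menu the next time the console starts, without the console needing to know about any project in advance.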
- We need some way to dynamically and reliably resize the file system inside the virtual machine. VirtualBox and other VM hosts can allow this to be done, however, the Archive Box should be able to do this without any user intervention (out-of-the-box). (NOTE from BlueMaxima: Tested importing and exporting an appliance and it seemed that the dynamically expanding hard disk was preserved between import and export.)
- Should we use a VirtualBox disk image or a VirtualBox appliance? It seems that Appliance is the best way to go.
- Could users run into problems with proxies or firewalls? If so, the Archive Box needs a way to be configured to work through them.
- Would it make sense to devise a protocol that archiving scripts use to auto-report their status? It would give us a much better overview of how far our archiving effort has come. (NOTE: This would make sense. The question is how someone would review this data and control the archiving.)
- Should we give the user a root account or a normal user account? A normal account would prevent some problems that may pop up, although if the archiveteam-console package is designed right this issue could be avoided anyway.
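To make the status-reporting question above more concrete, here is a rough Python sketch of what such a protocol might carry and how the reports could be reviewed. None of this is decided: the field names, the one-JSON-object-per-line format and the function names are assumptions, and the transport (HTTP, IRC, files pushed with rsync) is deliberately left out.

```python
import json
import time

# Hypothetical message fields; nothing here is an agreed Archive Team format.
FIELDS = {
    "project": str,
    "client": str,
    "items_done": int,
    "bytes_done": int,
    "sent_at": int,
}


def make_report(project, client, items_done, bytes_done, sent_at=None):
    """Serialise one status report as a single line of JSON."""
    report = {
        "project": project,
        "client": client,
        "items_done": items_done,
        "bytes_done": bytes_done,
        "sent_at": int(time.time()) if sent_at is None else sent_at,
    }
    return json.dumps(report, sort_keys=True)


def parse_report(line):
    """Decode and validate one report; raise ValueError on malformed input."""
    data = json.loads(line)  # json.JSONDecodeError is a ValueError subclass
    if not isinstance(data, dict):
        raise ValueError("report must be a JSON object")
    for name, kind in FIELDS.items():
        if not isinstance(data.get(name), kind):
            raise ValueError("bad or missing field: %s" % name)
    return data


def summarise(lines):
    """Total items and bytes per project, using each client's latest report."""
    latest = {}
    for line in lines:
        report = parse_report(line)
        key = (report["project"], report["client"])
        if key not in latest or report["sent_at"] >= latest[key]["sent_at"]:
            latest[key] = report
    totals = {}
    for (project, _client), report in latest.items():
        entry = totals.setdefault(
            project, {"clients": 0, "items_done": 0, "bytes_done": 0}
        )
        entry["clients"] += 1
        entry["items_done"] += report["items_done"]
        entry["bytes_done"] += report["bytes_done"]
    return totals
```

Whoever reviews the effort would run something like summarise() over the collected reports to see how many boxes are active per project and how much they have fetched. Controlling the archiving from the server side is a separate problem that this sketch does not touch.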
Who's working on what
- jch: Writes everything, plans everything, etc. Also sets up the repository server.
- Auguste: Prepares the actual distribution, currently working on the first version of it.
- SketchCow: Watching us, making sure we know this is a bad idea. :P
- BlueMax(ima): Corrects language, unofficial beta test, unofficial groupie
- Underscor: Able to work on the archiveteam-console app, if people want him to. Python, bash, and ncurses junkie ;D