ArchiveTeam Warrior
The ArchiveTeam Warrior is a virtual archiving appliance. You can run it to help with the ArchiveTeam archiving efforts. It will download sites and upload them to our archive — and it’s really easy to do!
The warrior is a virtual machine, so there is no risk to your computer. The warrior will only use your bandwidth and some of your disk space. It will get tasks from and report progress to the Tracker.
The warrior runs on Windows, OS X and Linux. You’ll need VirtualBox (recommended), VMware or a similar program to run the virtual machine.
Instructions for VirtualBox:
- Download the appliance (174MB).
- In VirtualBox, click File > Import Appliance and open the file.
- Start the virtual machine. It will fetch the latest updates and will eventually tell you to start your web browser.
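The import and start steps above can also be done from a terminal. The following is a minimal sketch, not an official procedure: the file name archiveteam-warrior.ova is a hypothetical placeholder for wherever you saved the download, and the VM name archiveteam-warrior-2 is assumed to be the name the appliance gets on import.

```shell
# Sketch: import the downloaded appliance and start the VM from the command line.
# Both defaults are assumptions; pass your own file path and VM name if they differ.
import_and_start_warrior() {
    appliance="${1:-archiveteam-warrior.ova}"   # hypothetical file name
    vm_name="${2:-archiveteam-warrior-2}"       # assumed name of the imported VM

    # Import the appliance, then start the VM only if the import succeeded.
    VBoxManage import "$appliance" &&
    VBoxManage startvm "$vm_name"
}

# Usage (uncomment to run):
# import_and_start_warrior ~/Downloads/archiveteam-warrior.ova
```

The VM window still shows the startup messages, including the prompt to open your web browser.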
Once you’ve started your warrior:
- Go to http://localhost:8001/ and check the Settings page.
- Choose a username — we’ll show your progress on the leaderboard.
- Go to the All projects tab and pick a project to work on. Even better: select ArchiveTeam’s Choice to let your warrior work on the most urgent project.
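If you would rather not keep reloading the browser while the warrior boots, a small polling loop can tell you when the web interface answers. This is only a sketch: it assumes curl is installed on the host, and the retry count and the 5-second delay are arbitrary choices.

```shell
# Sketch: poll the warrior web interface until it responds or we run out of tries.
wait_for_warrior_ui() {
    url="${1:-http://localhost:8001/}"
    tries="${2:-60}"                 # arbitrary default: 60 attempts

    while [ "$tries" -gt 0 ]; do
        # -f makes curl fail on HTTP errors; the response body is discarded.
        if curl -fsS -o /dev/null "$url" 2>/dev/null; then
            echo "Warrior web interface is up at $url"
            return 0
        fi
        tries=$((tries - 1))
        sleep 5
    done

    echo "Gave up waiting for $url" >&2
    return 1
}

# Usage (uncomment to run):
# wait_for_warrior_ui
```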
Warrior FAQ
Help! The warrior is eating all my bandwidth!
You can limit the warrior's bandwidth quite easily in VirtualBox, as long as you are running a relatively recent version. The command
VBoxManage bandwidthctl archiveteam-warrior-2 --name Limit --add network --limit 3
limits the warrior instance called archiveteam-warrior-2 (currently the default name of the warrior VM) to 3Mb/s. Adjust the value as needed.
In the latest version of VirtualBox on Windows, the syntax appears to have changed. The correct command now seems to be:
VBoxManage bandwidthctl archiveteam-warrior-2 add netlimit --type network --limit 3
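Since the two syntaxes above depend on the VirtualBox version, one option is to try the newer form first and fall back to the older one. The sketch below does that using only the two commands quoted above; the function name and its parameters are made up for illustration.

```shell
# Sketch: limit the warrior's network bandwidth, trying the newer
# bandwidthctl syntax first and falling back to the older one.
limit_warrior_bandwidth() {
    vm_name="${1:-archiveteam-warrior-2}"   # default VM name from the FAQ
    limit="${2:-3}"                         # same value as in the FAQ example
    group="${3:-netlimit}"                  # name of the bandwidth group

    # Newer syntax.
    if VBoxManage bandwidthctl "$vm_name" add "$group" --type network --limit "$limit"; then
        return 0
    fi

    # Older syntax, used only if the newer one was rejected.
    VBoxManage bandwidthctl "$vm_name" --name "$group" --add network --limit "$limit"
}

# Usage (uncomment to run):
# limit_warrior_bandwidth archiveteam-warrior-2 3
```

If the limit seems to have no effect, check the VBoxManage documentation for your version: the bandwidth group may also need to be assigned to the VM's network adapter.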
I turned my warrior off. Will those tasks be lost?
If you've killed your warrior instances, the work they did has been lost, but the tasks will be returned to the pool after a period of time. If you want, you can tell the admins via IRC what happened so they can clear the claims your username may have made, but this isn't very important on most projects.
I need to disconnect my internet / reboot my PC but I don't want to lose work
If you pause/suspend the warrior instance, most projects will let it resume work in progress when you unsuspend it.
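With VirtualBox, suspending and resuming can be scripted. The following is a sketch under the assumption that the VM is named archiveteam-warrior-2; it saves the VM state to disk before a reboot and starts the VM again afterwards, which restores the saved state.

```shell
# Sketch: suspend the warrior VM to disk and bring it back later.
suspend_warrior() {
    # savestate writes the VM's state to disk and stops the VM.
    VBoxManage controlvm "${1:-archiveteam-warrior-2}" savestate
}

resume_warrior() {
    # Starting a VM that has a saved state resumes it from that state.
    VBoxManage startvm "${1:-archiveteam-warrior-2}"
}

# Usage (uncomment to run):
# suspend_warrior    # before disconnecting or rebooting the host
# resume_warrior     # once the host is back online
```

Whether a given project actually picks up where it left off still depends on the project, as noted above.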
I told the warrior to shut down from the interface but nothing has changed! What gives?
The warrior will attempt to finish the currently running tasks before shutting down. If you need to shut down right away, go ahead: your progress will be lost, but the jobs will eventually cycle out to another user.
Projects
Previous and current warrior projects:
Project | Status | Began | Finished | Result | Archive Location |
---|---|---|---|---|---|
MobileMe | Archive Posted | April 3, 2012 | August 8, 2012 | Success | |
FortuneCity | Archive Posted | April 4, 2012 | April 11, 2012 | Partial Success | archive user lookup |
Tabblo | Archive Posted | May 23, 2012 | May 26, 2012 | Success | archive user lookup |
Picplz | Archive Posted | June 3, 2012 | June 15, 2012 | | archive index user lookup |
Tumblr (test project) | Archive Posted | August 9, 2012 | August 19, 2012 | | archive (tar) archive (warc) |
Cinch.FM | Archive Posted | August 20, 2012 | August 22, 2012 | Success | archive |
City of Heroes | Archive Posted | September 3, 2012 | December 1, 2012 | Success | www forums 1 2 3 4 5 |
Webshots | Archive Posted | October 4, 2012 | November 18, 2012 | | index |
BT Internet | Archive Posted | October 10, 2012 | November 2, 2012 | Success | archive |
Daily Booth | Archive Posted | November 19, 2012 | December 29, 2012 | | archive lookup |
Github | Archive Posted | December 13, 2012 | December 17, 2012 | Success | archive index |
Yahoo Blogs (Vietnamese) | Archive Posted | January 8, 2013 | January 19, 2013 | | archive |
weblog.nl | Archive Posted | January 19, 2013 | February 2, 2013 | | archive lookup |
URLTeam | Active | | | | latest |
Punchfork | Archive Posted | January 11, 2013 | March 6, 2013 | | archive user lookup |
Xanga | Downloads Paused | January 22, 2013 | February 16, 2013 | | archive user lookup user list |
Posterous | Downloads Finished | February 23, 2013 | June 29, 2013 | | archive |
Storylane | Downloads Finished | March 8, 2013 | March 15, 2013 | | |
Yahoo! Messages | Downloads Finished | March 20, 2013 | March 31, 2013 | | archive |
Formspring | Downloads Finished | March 24, 2013 | September 19, 2013 | Success | archive |
Yahoo Upcoming | Archive Posted | April 20, 2013 | April 25, 2013 | | archive |
Streetfiles.org | Downloads Finished | April 28, 2013 | April 30, 2013 | Partial | archive |
Xanga | Downloads Paused | June 21, 2013 | August 31, 2013 | | archive |
Status
- In Development: a future project
- Active: start up a Warrior and join the fun; this one is in progress right now
- Downloads Finished: we've finished downloading the data
- Archived: the collected data has been properly archived
- Archive Posted: the archive is available for download
Result
- Success: downloaded all of the data and posted the archive publicly
- Qualified Success: either we couldn't get all of the data, or the archive can't be made public
- Failure: the site closed before we could download anything
Testing pre-production code
(Don't do this unless you really need or want to.) If you are developing a warrior script, you can test it by switching your warrior from the production branch to the master branch.
- Start the warrior.
- Press Alt+F2 and log in with username root and password archiveteam.
- cd /home/warrior/warrior-code
- sudo -u warrior git checkout master
- reboot
By the same route you can return your warrior to the production branch.
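For switching back and forth, the steps can be wrapped in a small helper to run as root inside the warrior VM. This is a sketch only: the function name is made up, and it assumes the branch is called production, as named above.

```shell
# Sketch: switch the warrior code to a given branch and reboot.
# Intended to be run as root inside the warrior VM.
switch_warrior_branch() {
    branch="${1:-production}"                    # branch to switch to
    code_dir="${2:-/home/warrior/warrior-code}"  # path from the steps above

    cd "$code_dir" || return 1
    # Run git as the warrior user, as in the manual steps.
    sudo -u warrior git checkout "$branch" || return 1
    reboot
}

# Usage (uncomment to run):
# switch_warrior_branch master       # test pre-production code
# switch_warrior_branch production   # return to the production branch
```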