ArchiveTeam Warrior

{{notice|1=The Warrior virtual machine appliances are ''currently unable to run wget-at''. You may get a blank page when trying to run projects that require it, even if you select the project in the Warrior user interface. At present, only the Terror of Tiny Town/[[URLTeam]] project still works in the Warrior VM. This is a known issue. A replacement is planned, but it will take a while.
In the meantime, instead of the Warrior VM appliance, you can run projects manually using Docker. (If you like, you can run Docker in a VM of your own choosing. Recent Ubuntu versions are known to work.) For further info, see our guide to [[Running_Archive_Team_Projects_with_Docker]] and each project's Readme instructions in the [https://github.com/ArchiveTeam/ ArchiveTeam GitHub repositories]. If you have any issues or feedback, see the [[Archiveteam:IRC|AT #warrior IRC channel on hackint]].}}
==What is the Archive Team Warrior?==
[[Image:Warrior-vm-screenshot.png|256px|right]]
[[Image:Warrior-web-screenshot.png|256px|right]]
[[File:Archiveteam_warrior_infrastructure.png|thumb|right|256px|[[Dev/Infrastructure|Warrior infrastructure]]]]


The Archive Team Warrior is a virtual archiving appliance. You can run it to help with the ArchiveTeam archiving efforts. It will download sites and upload them to our archive — and it’s really easy to do!
== Basic usage ==

The warrior runs on Windows, macOS, and Linux using a virtual machine. You'll need:

* [https://warriorhq.archiveteam.org/downloads/warrior3/ Warrior Appliance] (300MB) - Download mirrors: [https://www.syping.de/archiveteam/ #1 (de)] [https://archive.org/details/archiveteam-warrior-v3-20171013 #2 (IA)]
Plus one of the following to run it:
* [https://hub.docker.com/r/archiveteam/warrior-dockerfile/ Docker Image] (recommended for anyone who has used Docker before)
* [https://www.virtualbox.org/ VirtualBox] (recommended, especially for the [[#Beta version|beta version of the warrior]])
* [https://www.vmware.com/products/player/ VMware Workstation/Player] (free of charge for personal use)
* [[#Alternative virtual machines|See below for alternative virtual machines]]
=== Quick start instructions for VirtualBox ===

# Download the appliance from the link above.
# Launch VirtualBox.
# In VirtualBox, click File > Import Appliance and open the file.
# Start the virtual machine.
#* It will fetch the latest updates and will eventually tell you to start your web browser.
# Using your regular web browser, visit http://localhost:8001/
# On the left, click "Your settings".
# Choose a username - we'll show your progress on the [[tracker|leaderboard]].
# On the left, click the "Available projects" tab and pick a project to work on.
#* Even better: select "ArchiveTeam's Choice" to let your warrior work on the most urgent project.
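If you would rather script the VirtualBox steps above, <code>VBoxManage import</code> does the same job as File > Import Appliance. Below is a minimal sketch. It assumes VBoxManage is on your PATH. The file name <code>archiveteam-warrior-v3.ova</code> and the VM name are placeholders, not official names. The script only builds and prints the commands, so nothing runs until you call <code>run_all</code>.

```python
import shlex
import subprocess
from typing import List


def import_command(ova_path: str, vm_name: str) -> List[str]:
    """Command equivalent to File > Import Appliance, with a chosen VM name."""
    # --vsys 0 refers to the first (and only) virtual system in the appliance.
    return ["VBoxManage", "import", ova_path, "--vsys", "0", "--vmname", vm_name]


def start_command(vm_name: str) -> List[str]:
    """Command that boots the imported VM (in a normal window by default)."""
    return ["VBoxManage", "startvm", vm_name]


def run_all(commands: List[List[str]]) -> None:
    """Run the commands in order; raises CalledProcessError on the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    vm = "archiveteam-warrior-3"  # placeholder VM name
    steps = [import_command("archiveteam-warrior-v3.ova", vm), start_command(vm)]
    for cmd in steps:
        print(shlex.join(cmd))  # review first, then call run_all(steps)
```

To see what would be imported without changing anything, add <code>--dry-run</code> to the <code>VBoxManage import</code> command.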
=== Start instructions for VMware Player ===
# Download the appliance from the link above.
# Launch VMware Player.
# In Player, click "Open Virtual Machine" on the right, open the file, and import the virtual machine.
# (Optional) Select the virtual machine and click "Edit virtual machine settings".
#* Select Network Adapter and set it to "Bridged: Connected directly to the physical network".
# Start the virtual machine.
#* It will fetch the latest updates and will eventually tell you to start your web browser.
# Using your regular web browser, visit the address shown at the bottom of the VM's screen (e.g. http://192.168.0.100:8001/).
# On the left, click "Your settings".
# Choose a username - we'll show your progress on the [[tracker|leaderboard]].
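If the page does not load, a script can check whether anything is answering at the warrior's address. Below is a minimal sketch that uses only Python's standard library. The default URL is the VirtualBox one. For VMware, pass the address shown on the VM's screen instead. The sketch deliberately bypasses any configured HTTP proxy, because the goal is to reach the VM directly.

```python
import urllib.error
import urllib.request


def warrior_ui_is_up(url: str = "http://localhost:8001/", timeout: float = 5.0) -> bool:
    """Return True if an HTTP server answers at `url` with a non-error status."""
    # An empty ProxyHandler disables proxies taken from the environment.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an HTTP error status (HTTPError).
        return False


if __name__ == "__main__":
    print("web interface reachable:", warrior_ui_is_up())
```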


__TOC__
== Beta version ==
Help us test the new version of the appliance available at https://warriorhq.archiveteam.org/downloads/warrior3/. Report any issues or feedback on [[IRC|#warrior on hackint]].
This version includes updated internal components that allow us to run newer and improved software. Specifically, it runs the warrior code within a Docker container.

== Alternative virtual machines ==


* [https://www.docker.io/ Docker] (Linux)
** [https://hub.docker.com/r/archiveteam/warrior-dockerfile/ official dockerfile] ([https://github.com/ArchiveTeam/warrior-dockerfile repository])
** [https://hub.docker.com/r/infrequent/at-as-dockerfile modified dockerfile - for manual script execution]

* [https://www.microsoft.com/en-us/server-cloud/solutions/virtualization.aspx Hyper-V] (Windows 8 Professional)
Please note that these alternatives are not in widespread use by our warriors, so we may not be able to help with issues or advanced usage.

== Warrior FAQ ==
 
=== Why a virtual machine in the first place? ===
 
The virtual machine is a quick, safe, and easy way for newcomers to help us out. It offers many features:
 
* Graphical interface
* Automatically selects which project is important to run
* Self-updating software infrastructure
* Allows for unattended use
* In case of software faults, your machine is not ruined
* Restarts itself in case of runaway programs
* Runs on Windows, Mac, and Linux painlessly
* Ensures consistency in the archived data regardless of your machine's quirks
 
If you have suggestions for improving this system, please talk to us as described below.
 
=== Can I use whatever internet access for the warrior? ===
 
No. We need "clean" connections. Please ensure the following:
 
* No OpenDNS. No ISP DNS that redirects to a search page. Use non-captive DNS servers.
* No ISP connections that inject advertisements into web pages.
* No proxies. Proxies can return bad data. The original HTTP headers and IP address are needed for the WARC file.
* No content-filtering firewalls.
* No censorship. If you believe your country implements censorship, do not run a warrior.
* No Tor. The server may return an error page instead of content if it bans exit nodes.
* No free cafe wifi. Archiving your cafe's wifi service agreement repeatedly is not helpful.
* No VPNs. Data integrity is a very high priority for the Archive Team, so use of VPNs with the official crawler is discouraged.
* We prefer connections from many public IP addresses if possible. (For example, if your apartment building uses a single IP address, we don't want your apartment banned.)
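You can spot the search-page redirection described in the first point by looking up a hostname that should not exist. A clean resolver reports that the name does not exist. A redirecting resolver returns an address anyway. Below is a minimal heuristic sketch in Python. Using a random label under <code>example.com</code> as the probe is our own choice. If you are offline, the lookup also fails, and the sketch will wrongly report "clean". Treat the result as one hint, not a verdict.

```python
import secrets
import socket
from typing import Callable


def dns_looks_clean(resolve: Callable[[str], str] = socket.gethostbyname,
                    parent_domain: str = "example.com") -> bool:
    """Heuristic: a random, made-up hostname should NOT resolve.

    Returns False when the probe name gets an address, which is what
    DNS services that redirect unknown names to a search page do.
    """
    probe = f"{secrets.token_hex(8)}.{parent_domain}"
    try:
        resolve(probe)
    except socket.gaierror:
        return True   # "no such name" (or any lookup failure)
    return False      # an address came back for a name that shouldn't exist
```

Run <code>dns_looks_clean()</code> on the machine that will run the warrior. The <code>resolve</code> parameter exists so the check can be tested without network access.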
 
=== I turned my warrior VM appliance off. Will those tasks be lost? ===
 
If you've killed your warrior VM instance, then the work your warrior did has been lost. However, the tasks will be returned to the pool after a period of time, and other warriors may claim them.
 
=== I closed my browser or tab with the warrior's web interface. Will those tasks be lost? ===
 
No. The web browser interface just provides a user interface to the warrior. As long as the VM or Docker container is not stopped, it will continue normally.


=== I need to disconnect my internet / reboot my PC. How can I do this without losing work? ===

If you pause/suspend the warrior instance, most projects will allow resuming of work in progress when you unsuspend the warrior instance.

If you decide to use the suspend feature, please note that if you keep it suspended for too long (more than a few hours), the administrators will assume that the item is lost and re-queue it. Using the suspend feature so that you can reboot your computer is perfectly fine.
=== How much disk space will the warrior use? ===

Short answer: it depends on the project. (But never more than 60GB.)

Long answer: each project defines items differently, so sizes vary. A single task may be a small file or a whole subsection of a website. By default, the virtual machine is configured to use an absolute maximum of 60GB. Disk space that the virtual machine has not used does not take up space on the host computer. You may run the virtual machine with less than 60GB if you like to live dangerously. We're downloading the internet, after all!
=== How can I log in to the virtual machine? ===
Unless you know what you are doing, you should not need to do this.

To log in, start up the Warrior VM and wait for it to finish booting, with the screen showing "The warrior has successfully start up". Press ALT+F3 to switch to virtual console number 3. VirtualBox users may need to press the host key (RIGHT_CONTROL) to enter capture mode before pressing ALT+F3. Use ALT+Left or ALT+Right to switch between virtual consoles. There are 6 virtual consoles in total, and consoles 1 and 2 are reserved for the warrior. Switching to a new virtual console will show a login prompt. You can log in using the username <code>root</code> and the password <code>archiveteam</code>. Once logged in as root, you can execute <code>sudo -u warrior -i</code> to log in as the warrior user.
=== How can I run multiple virtual machines at the same time? ===

You'll need to adjust the networking settings.

In VirtualBox, select a virtual machine and open Settings → Network → Adapter 1 → Advanced → Port Forwarding. You need to adjust the host port. For example, set your table (Protocol | Host IP | Host Port | Guest IP | Guest Port) to TCP | 127.0.0.1 | 8123 | | 8001. This maps port 8123 on the host machine (your computer) to port 8001 on the virtual machine (the warrior). You can then access the warrior's web interface at http://localhost:8123/ in your browser.

Each VM you want to access should have a different host port. Do not use port numbers below 1024 unless you know what you are doing.

VMware installations should be using bridged networking. However, if you want, you can switch to NAT (under Settings → Hardware → Virtual Network Adapter) and click Edit to set up port forwarding. On Linux, you can also use lines like <code>8123 = 192.168.0.100:8001</code> in the <code>[incomingtcp]</code> section of nat.conf. (Make sure the VM IP is correct!)
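With many VMs, setting forwarding rules by hand gets tedious. Below is a minimal sketch that builds one <code>VBoxManage modifyvm ... --natpf1</code> rule per VM. Each VM gets its own host port, using the same Protocol/Host IP/Host Port/Guest IP/Guest Port fields as the table above. The rule name <code>warrior-web</code> and the VM names are placeholders. <code>modifyvm</code> needs the VM to be powered off. You should also delete the appliance's existing port-8001 rule, so that the VMs don't all claim host port 8001. Its name is an assumption you must look up in <code>VBoxManage showvminfo</code>.

```python
import subprocess
from typing import List, Sequence

GUEST_PORT = 8001  # the warrior's web interface inside the VM


def natpf_commands(vm_names: Sequence[str], first_host_port: int = 8123,
                   rule_name: str = "warrior-web") -> List[List[str]]:
    """One port-forwarding command per VM, using consecutive host ports."""
    if first_host_port < 1024:
        raise ValueError("avoid host ports below 1024")
    commands = []
    for offset, vm in enumerate(vm_names):
        host_port = first_host_port + offset
        if host_port > 65535:
            raise ValueError("not enough TCP ports left")
        # name,protocol,host ip,host port,guest ip (left empty),guest port
        rule = f"{rule_name},tcp,127.0.0.1,{host_port},,{GUEST_PORT}"
        commands.append(["VBoxManage", "modifyvm", vm, "--natpf1", rule])
    return commands


def delete_rule_command(vm: str, rule_name: str) -> List[str]:
    """Remove an existing NAT rule, e.g. the appliance's original one."""
    return ["VBoxManage", "modifyvm", vm, "--natpf1", "delete", rule_name]


def apply(commands: List[List[str]]) -> None:
    for cmd in commands:
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    for cmd in natpf_commands(["warrior-a", "warrior-b"]):
        print(" ".join(cmd))
```

If a VM is already running, VirtualBox also accepts rules through <code>VBoxManage controlvm VMNAME natpf1 "rule"</code>.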
 
=== How can I run the virtual machine headlessly (without leaving a window open)? ===

From the VirtualBox GUI, after starting the VM, click Machine > Detach GUI. You can then close the VirtualBox Manager window.

For the VirtualBox CLI, use this command:

<pre>vboxmanage startvm archiveteam-warrior-2 --type headless</pre>

Shut down the VM with:

<pre>vboxmanage controlvm archiveteam-warrior-2 acpipowerbutton</pre>

Substituting <code>pause</code> or <code>resume</code> for <code>acpipowerbutton</code> pauses or resumes the VM. <code>savestate</code> saves the VM's state to disk and stops it. The VM name may differ on your system; run <code>vboxmanage list vms</code> to see it. For more information, consult [http://www.virtualbox.org/manual/ch08.html#vboxmanage-startvm the VirtualBox manual (Chapter 8, Sections 12 and 13)].
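The same VirtualBox commands can be wrapped in a small Python helper, for example to call from a cron job or another script. This is a minimal sketch. <code>VBoxManage controlvm</code> has no <code>suspend</code> action. The relevant actions are <code>pause</code>/<code>resume</code> (kept in memory) and <code>savestate</code> (written to disk). The VM name is a placeholder.

```python
import shlex
import subprocess
from typing import List

# controlvm actions covered by this FAQ entry.
ACTIONS = {"acpipowerbutton", "pause", "resume", "savestate"}


def start_headless(vm: str) -> List[str]:
    """Command that boots the VM without opening a window."""
    return ["VBoxManage", "startvm", vm, "--type", "headless"]


def control(vm: str, action: str) -> List[str]:
    """Command for one controlvm action, rejecting unexpected ones."""
    if action not in ACTIONS:
        raise ValueError(f"unsupported action: {action!r}")
    return ["VBoxManage", "controlvm", vm, action]


def run(cmd: List[str]) -> None:
    # Raises CalledProcessError if VBoxManage reports a failure.
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    vm = "archiveteam-warrior-3"  # placeholder; check `VBoxManage list vms`
    print(shlex.join(start_headless(vm)))
    print(shlex.join(control(vm, "acpipowerbutton")))
```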


For the VMware CLI, use this command:

<pre>vmrun start <path to vmx file> nogui</pre>

Shut down with:

<pre>vmrun stop <path to vmx file> soft</pre>

Substituting <code>suspend</code> for <code>stop</code> suspends the VM. Resume with <code>start</code> again. For more information, including the paths to VMX files on different operating systems, consult [http://www.vmware.com/pdf/vix180_vmrun_command.pdf Using vmrun to Control Virtual Machines] (PDF), pages 10 and 11.
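Below is a matching sketch for VMware's <code>vmrun</code> that builds the start, stop and suspend commands listed above. The optional <code>-T</code> host type (<code>ws</code>, <code>player</code> or <code>fusion</code>) is a vmrun option. Whether your installation needs it is an assumption you should check. The .vmx path is a placeholder.

```python
import shlex
from typing import List, Optional

# Modes accepted by the vmrun subcommands used in this FAQ entry.
_MODES = {"start": {"gui", "nogui"}, "stop": {"hard", "soft"},
          "suspend": {"hard", "soft"}}


def vmrun_command(action: str, vmx_path: str, mode: Optional[str] = None,
                  host_type: Optional[str] = None) -> List[str]:
    """Build a vmrun command line, validating the action and its mode."""
    if action not in _MODES:
        raise ValueError(f"unsupported action: {action!r}")
    if mode is not None and mode not in _MODES[action]:
        raise ValueError(f"{action} does not take mode {mode!r}")
    cmd = ["vmrun"]
    if host_type is not None:
        cmd += ["-T", host_type]  # e.g. "ws", "player" or "fusion"
    cmd += [action, vmx_path]
    if mode is not None:
        cmd.append(mode)
    return cmd


if __name__ == "__main__":
    vmx = "/path/to/archiveteam-warrior.vmx"  # placeholder path
    print(shlex.join(vmrun_command("start", vmx, "nogui")))
    print(shlex.join(vmrun_command("stop", vmx, "soft")))
```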
=== How can I set up the virtual machine as a system service (so that it starts up on boot and shuts down automatically)? ===

If you are using VirtualBox and running a Linux distribution that uses the systemd init system (like most recent releases), you can follow the short instructions on [http://www.ericerfanian.com/automatically-starting-virtualbox-vms-on-archlinux-using-systemd/ this page]. (The page title specifies Arch Linux, but this will work for other distros as long as they run systemd.)
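For reference, here is a systemd template unit along the lines commonly suggested for headless VirtualBox VMs. It is a generic pattern, not necessarily what the linked page uses. The Python below only renders the unit text. The user name, the <code>/usr/bin/VBoxManage</code> path, and the <code>vboxusers</code> group are assumptions to adjust. The user must be the account that owns the VM, because VirtualBox registers VMs per user. The unit stops the VM with <code>savestate</code> rather than an ACPI shutdown. <code>savestate</code> waits until the state is saved, whereas <code>acpipowerbutton</code> returns immediately.

```python
UNIT_TEMPLATE = """\
[Unit]
Description=ArchiveTeam Warrior VM %i
Wants=network-online.target
After=network-online.target

[Service]
User={user}
Group=vboxusers
Type=forking
ExecStart={vboxmanage} startvm %i --type headless
ExecStop={vboxmanage} controlvm %i savestate

[Install]
WantedBy=multi-user.target
"""


def render_unit(user: str, vboxmanage: str = "/usr/bin/VBoxManage") -> str:
    """Fill in the account that owns the VM and the VBoxManage path.

    %i is left for systemd: it becomes the instance name, i.e. the VM name.
    """
    if not user or any(ch.isspace() for ch in user):
        raise ValueError("user must be a single non-empty name")
    return UNIT_TEMPLATE.format(user=user, vboxmanage=vboxmanage)


if __name__ == "__main__":
    print(render_unit("archiveteam"), end="")  # hypothetical user name
```

For example, you could save the output as <code>/etc/systemd/system/warrior-vm@.service</code> (a hypothetical name) and enable it with <code>systemctl enable --now warrior-vm@archiveteam-warrior-3.service</code>.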


=== How can I set up the virtual machine with directly-bridged networking instead of NAT? ===

On VirtualBox, use these commands while the VM is powered off:

<pre>vboxmanage modifyvm archiveteam-warrior-2 --nic1 bridged
vboxmanage modifyvm archiveteam-warrior-2 --bridgeadapter1 eth0</pre>

We presume you want to bind to <code>eth0</code>. Adjust as required. :) (<code>vboxmanage list bridgedifs</code> lists the host interfaces you can bridge to.) With bridged networking, the web interface is no longer at localhost. Use the VM's own address on your network instead.

VMware installations should already be using bridged networking.
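If you are not sure which name to use instead of <code>eth0</code>, run <code>VBoxManage list bridgedifs</code>. It prints one block of <code>Key: value</code> lines per host interface. Below is a minimal sketch that parses that output and picks the first interface reporting <code>Status: Up</code>. That choice is a simple heuristic of ours, so pick manually if you have several. It then builds a single <code>modifyvm</code> call that sets both options.

```python
import subprocess
from typing import Dict, List


def parse_bridged_interfaces(listing: str) -> List[Dict[str, str]]:
    """Split `VBoxManage list bridgedifs` output into one dict per interface."""
    interfaces: List[Dict[str, str]] = []
    current: Dict[str, str] = {}
    for line in listing.splitlines():
        if not line.strip():  # a blank line separates interfaces
            if current:
                interfaces.append(current)
                current = {}
            continue
        key, sep, value = line.partition(":")  # split at the first colon only
        if sep:
            current[key.strip()] = value.strip()
    if current:
        interfaces.append(current)
    return interfaces


def pick_interface(interfaces: List[Dict[str, str]]) -> str:
    """Name of the first interface whose Status is Up."""
    for iface in interfaces:
        if iface.get("Status") == "Up":
            return iface["Name"]
    raise LookupError("no host interface reports Status: Up")


def bridge_command(vm: str, interface: str) -> List[str]:
    # Same effect as the two commands above; run with the VM powered off.
    return ["VBoxManage", "modifyvm", vm, "--nic1", "bridged",
            "--bridgeadapter1", interface]


def list_bridged_interfaces() -> str:
    result = subprocess.run(["VBoxManage", "list", "bridgedifs"],
                            capture_output=True, text=True, check=True)
    return result.stdout
```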


=== How can I access the virtual machine from another device on my network? ===

A full guide for VirtualBox users can be found [https://gist.github.com/HeliosLHC/cf3264c8d65b4680474ac13bcc6d0384 here].
=== How can I run the warrior without a virtual machine? (The VM has too much overhead for a VPS!) ===

One option is running a Docker container (see [[#Alternative_virtual_machines|above]]). Containers share the host's kernel instead of emulating a whole computer, so the overhead is far less than running a full VM. If you plan on running the [https://github.com/ArchiveTeam/warrior-dockerfile warrior-dockerfile], make sure to publish the port to allow access to the web interface:

<pre>docker run -d -p 8001:8001 archiveteam/warrior-dockerfile</pre>

This creates a direct port mapping. For host port 38001 to container port 8001, use <code>38001:8001</code>. Adjust as required. :P On a VPS, <code>-p 127.0.0.1:8001:8001</code> keeps the web interface reachable only from the machine itself.

(Multiple projects can also be run in isolated environments (containers) for rapid deployment using [https://hub.docker.com/r/infrequent/at-as-dockerfile at-as-dockerfile].)
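To run several warrior containers on one host, each container needs its own name and host port. Below is a minimal sketch that builds the <code>docker run</code> commands. The container names, the <code>--restart unless-stopped</code> policy, and binding to <code>127.0.0.1</code> are our choices, not requirements of the image. Binding to <code>127.0.0.1</code> means the web interface is only reachable from the host itself, for example over an SSH tunnel.

```python
import shlex
from typing import List

IMAGE = "archiveteam/warrior-dockerfile"
CONTAINER_PORT = 8001  # the web interface port inside the container


def docker_run_commands(count: int, first_host_port: int = 8001,
                        bind_address: str = "127.0.0.1") -> List[List[str]]:
    """One `docker run` command per warrior, with consecutive host ports."""
    if count < 1:
        raise ValueError("count must be at least 1")
    if first_host_port < 1024 or first_host_port + count - 1 > 65535:
        raise ValueError("host ports must stay within 1024-65535")
    commands = []
    for i in range(count):
        host_port = first_host_port + i
        commands.append([
            "docker", "run", "-d",
            "--name", f"warrior{i + 1}",
            "--restart", "unless-stopped",
            "-p", f"{bind_address}:{host_port}:{CONTAINER_PORT}",
            IMAGE,
        ])
    return commands


if __name__ == "__main__":
    for cmd in docker_run_commands(2):
        print(shlex.join(cmd))
```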


Another alternative is '''running the project manually.''' If you are managing a VPS, it's likely you are comfortable with some Linux stuff. Consult the project wiki page or the source code repository readme file.
=== How can I run tons of warriors easily? ===

We assume you've checked with the current ArchiveTeam project what concurrency and resources are needed or useful!

Whether you have your own virtual cluster or you're renting someone else's (aka a "[https://fsfe.org/activities/nocloud/ cloud]"), you probably need some [[wikipedia:Category:Orchestration_software|orchestration software]].

ArchiveTeam volunteers have successfully used a variety of hosting providers and tools (including free trials on AWS and GCE). Often they just build their own flavour of virtual server and then repeat it with simple [https://cloudinit.readthedocs.io/ cloud-init] scripts (to install and launch Docker as above) or whatever tool the hosting provider offers. If you desire full automation, the [https://gitlab.com/diggan/archiveteam-infra archiveteam-infra repository by diggan] helps with [[wikipedia:Terraform (software)|Terraform]] on [[wikipedia:DigitalOcean|DigitalOcean]].

Some custom monitoring scripts also exist, for instance [https://github.com/general-programming/gp-archiveteam-bs/blob/master/tumblr/watcher.py watcher.py].
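As an illustration of the cloud-init approach, the sketch below renders a minimal <code>#cloud-config</code> user-data file. It installs Docker from the distribution's <code>docker.io</code> package, which assumes Ubuntu or Debian, and starts one warrior container per listed port. Each command is written as a JSON array. That is valid YAML, and it tells cloud-init to run the command directly rather than through a shell. The container names and the <code>127.0.0.1</code> binding are our own choices. Check the project's concurrency guidance before scaling this up.

```python
import json
from typing import Sequence


def cloud_config(host_ports: Sequence[int],
                 image: str = "archiveteam/warrior-dockerfile") -> str:
    """Render cloud-init user-data that starts one warrior per host port."""
    if len(set(host_ports)) != len(host_ports):
        raise ValueError("host ports must be unique")
    if any(not 1024 <= port <= 65535 for port in host_ports):
        raise ValueError("host ports must be within 1024-65535")
    lines = ["#cloud-config", "package_update: true",
             "packages:", "  - docker.io", "runcmd:"]
    for i, port in enumerate(host_ports, start=1):
        cmd = ["docker", "run", "-d", "--name", f"warrior{i}",
               "--restart", "unless-stopped",
               "-p", f"127.0.0.1:{port}:8001", image]
        # A JSON array is a YAML flow sequence; cloud-init runs it without a shell.
        lines.append("  - " + json.dumps(cmd))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(cloud_config([8001, 8002]), end="")
```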


=== I'm looking at the leaderboard. What's that icon beside the username? ===

That's just the warrior logo: [[File:Archive_team.png|42px]] (click on the image for a larger version). It means that that person is using the warrior. Those without the icon are running the scripts manually.
[[Image:Archiveteam-warrior-sticker.png|256px|right]]

=== What's that guy doing in the logo? ===

The place is on fire! But don't worry, he safely escaped with the rescued data in his arms.

=== That’s awesome – can I slap this logo on my laptop to show my Internet-preservation pride? ===
 
[http://www.redbubble.com/people/ajhajh/works/12857655-archive-team-warrior-stickers?p=sticker You sure can!] The ArchiveTeam Warrior laptop sticker can start conversations about archiving, if you’re into that.
 
=== I'd like to help write code or I want to tweak the scripts to run to my liking. Where can I find more info? Where is the source code and repository? ===
 
Check out the [[Dev]] documentation for details on the infrastructure and the source code layout.
 
=== I still have a question! ===
 
Check out the [[Frequently Asked Questions|general FAQ page]]. Talk to us on [[IRC]]. Use [ircs://irc.hackint.org:6697/warrior #warrior] for specific warrior questions or [ircs://irc.hackint.org:6697/archiveteam #archiveteam] for general questions.
 
== Troubleshooting ==
 
=== I'm getting errors when I try to launch the VM. ===
 
If you are receiving <code>Breakpoint has been reached (0x80000003)</code>, <code>A critical error has occurred while running the virtual machine and the machine execution has been stopped.</code>, or VT-X errors, you probably do not have virtualization enabled, either because it is turned off in your computer's BIOS or your CPU does not support it.
 
You can check CPU support on Linux with <code>grep -E "vmx|svm" /proc/cpuinfo | sort -u</code>. If there is a line of output starting with "flags", your processor supports virtualization; if there is no output, it does not. You can check whether virtualization is enabled in the BIOS using the <code>rdmsr</code> utility in your distro's <code>msr-tools</code> package.
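Here is the same check in Python, for scripts that need a yes/no answer. This minimal sketch looks for the <code>vmx</code> (Intel VT-x) or <code>svm</code> (AMD-V) token in the <code>flags</code> line of <code>/proc/cpuinfo</code>. It does not tell you whether virtualization is enabled in the BIOS. Inside a VPS the flags are often hidden.

```python
import os
from typing import Optional


def virtualization_flag(cpuinfo_text: str) -> Optional[str]:
    """Return "vmx" (Intel VT-x), "svm" (AMD-V), or None if neither is listed."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            flags = set(value.split())  # whole tokens, so "svm_lock" != "svm"
            if "vmx" in flags:
                return "vmx"
            if "svm" in flags:
                return "svm"
            return None  # normally every core lists the same flags
    return None


def read_cpuinfo(path: str = "/proc/cpuinfo") -> str:
    with open(path, encoding="utf-8") as handle:
        return handle.read()


if __name__ == "__main__" and os.path.exists("/proc/cpuinfo"):
    print(virtualization_flag(read_cpuinfo()) or "no vmx/svm flag found")
```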
 
You can check support and BIOS status on Windows using [https://www.microsoft.com/en-us/download/details.aspx?id=592 Microsoft's Hardware-Assisted Virtualization Detection Tool] or [http://openlibsys.org/index-ja.html VirtualChecker].
 
To enable virtualization on a CPU with support, reboot the computer and enter the BIOS. The virtualization setting is usually under something like 'CPU configuration' or 'advanced settings'.
 
=== I just imported the ova image and the warrior is stuck on "Preparing the data partition". ===
 
This issue has cropped up before, and we do not know what causes it. We recommend you delete the warrior image and import the ova again. Testing shows that such a reimport works in the majority of cases.
 
=== I can't connect to localhost. ===
 
The application is configured to set up port forwarding to the guest machine, and you should be able to access the interface through your web browser at port 8001. If this does not happen, and isn't resolved by rebooting the warrior (using the ACPI power signals, not suspend/save state and resume), you may need to double-check your machine's network settings (as described [[#How_can_I_run_multiple_virtual_machines_at_the_same_time.3F|above]]).
 
=== The warrior can't connect to the internet. ===
 
It's possible that the virtual machine has picked up the address of the local DNS cache on your computer, which the virtual machine does not have access to.
 
If you experience this on VirtualBox, see [http://askubuntu.com/questions/204953/virtualbox-dns-stopped-working-on-upgrade-to-12-10 this question and answer]. Additionally, check to see if "Cable Connected" is unchecked in the advanced settings of the virtual adapter, under the network tab in the virtual machine's settings. Check it if it's unchecked, then save your settings.
 
=== I see a message that no item was received. ===
 
This means that there is no work available. This can happen for several reasons:
 
* The project has just finished and someone is inspecting the work done. If a problem is discovered, items may be re-queued and more work will become available.
* You have checked out/claimed too many items. Reduce your concurrency and let others do some of the work too.
* In a rare case, you have been banned by a tracker administrator because there was a problem with your work: you were requesting too much, you were tampering with the scripts, a malfunction has occurred, or your internet connection is "unclean" (see [[#Can_I_use_whatever_internet_access_for_the_warrior.3F|above]]).
 
=== I see a message about rate limiting. ===
 
Don't worry. Keep in mind that although downloading the internet for fun and digital preservation are the primary goals of all Archive Team activities, serious stress on the target's server may occur. The rate limit is imposed by a [[Tracker#People|tracker administrator]] and should not be subverted.
 
(In other words, we don't want to DDoS the servers.)
 
If you like, you can switch to another [[Warrior projects|project]] with less load.
 
=== I see a message about code being out of date. ===
 
Don't worry. There is a new update ready. You do not need to do anything about this; the warrior will update its code every hour. If you are impatient, please reboot the warrior and it will download the latest code and resume work.
 
=== I'm running the scripts manually and I see a message about code being out of date. ===
 
This happens when a bug in the scripts is discovered. Bugs are unavoidable, especially when the server is out of our control.
 
Try the <code>--auto-update</code> option available in Seesaw version 0.8. However, please be aware that you are now executing code automatically. Be sure to run the scripts in a separate user account for safety.
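If you run a project by hand, it helps to build the command once and reuse it, for example from a service or a screen session. This minimal sketch assumes the usual <code>run-pipeline3 pipeline.py --concurrent N YOURNICK</code> form found in project readmes. Some older projects use <code>run-pipeline</code> instead, so check the readme. It adds the <code>--auto-update</code> option mentioned above. The nickname check is a conservative rule made up for this sketch, not a tracker requirement.

```python
import re
import shlex
from typing import List


def pipeline_command(nickname: str, concurrent: int = 2, auto_update: bool = True,
                     executable: str = "run-pipeline3",
                     pipeline_file: str = "pipeline.py") -> List[str]:
    """Build a seesaw run-pipeline invocation for a manually run project."""
    # Conservative nickname rule chosen for this sketch (not a tracker rule).
    if not re.fullmatch(r"[A-Za-z0-9_-]+", nickname):
        raise ValueError("nickname should be letters, digits, '_' or '-'")
    if concurrent < 1:
        raise ValueError("concurrent must be at least 1")
    cmd = [executable, pipeline_file, "--concurrent", str(concurrent)]
    if auto_update:
        cmd.append("--auto-update")  # available from Seesaw 0.8
    cmd.append(nickname)
    return cmd


if __name__ == "__main__":
    print(shlex.join(pipeline_command("YOURNICKHERE")))
```

As noted above, run the resulting command under a separate user account.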
 
=== I see messages about rsync errors. ===


Uh-oh! Something is not right. Please notify us immediately in the appropriate [[IRC]] channel.

=== I told the warrior to shut down from the interface, but nothing has changed. ===

The warrior will attempt to finish the current running tasks before shutting down. If you need to shut down right away, go ahead. Your progress will be lost, but the jobs will eventually cycle out to another user.


=== The warrior is eating all my bandwidth! ===

On VirtualBox (relatively recent versions), use this command:

<pre>vboxmanage bandwidthctl archiveteam-warrior-3 add limit --type network --limit 3m</pre>

This creates a bandwidth group named <code>limit</code> that caps the warrior at 3Mb/s. Limit units are <code>k</code> for kilobit, <code>m</code> for megabit, <code>g</code> for gigabit, <code>K</code> for kilobyte, <code>M</code> for megabyte, and <code>G</code> for gigabyte, all per second. Without a unit, the number is read as megabytes per second. Adjust as required. :) The group only takes effect once it is assigned to the network adapter. Do this while the VM is powered off with <code>vboxmanage modifyvm archiveteam-warrior-3 --nicbandwidthgroup1 limit</code>.

In the latest version of VirtualBox on Windows, the syntax appears to have changed. The correct command now seems to be:

<pre>VBoxManage bandwidthctl archiveteam-warrior-3 add netlimit --type network --limit 3</pre>

(Because this command has no unit suffix, <code>3</code> here means 3 megabytes per second, not megabits.)

For more information, consult [http://www.virtualbox.org/manual/ch06.html#network_bandwidth_limit the VirtualBox manual (Chapter 6, Section 9)].

On VMware (versions 9 and above), select a virtual machine and open Settings → Hardware → Virtual Network Adapter → Advanced. You can set a bandwidth limit here.
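Here are the VirtualBox steps as a Python sketch. It creates the group and attaches it to the first network adapter with <code>modifyvm --nicbandwidthgroup1</code> (VM powered off). Later you can change the limit with <code>bandwidthctl ... set</code>, which VirtualBox allows while the VM is running. The VM name is the one used on this page. The group name <code>limit</code> is arbitrary.

```python
import re
from typing import List

# Optional unit suffixes for --limit: k/m/g = kilo/mega/gigabits per second,
# K/M/G = kilo/mega/gigabytes per second; no suffix means megabytes per second.
_LIMIT_RE = re.compile(r"[1-9][0-9]*[kmgKMG]?")


def _check_limit(limit: str) -> None:
    if not _LIMIT_RE.fullmatch(limit):
        raise ValueError(f"bad limit {limit!r}; try e.g. '3m' for 3 megabits/s")


def bandwidth_setup(vm: str, limit: str = "3m", group: str = "limit",
                    nic: int = 1) -> List[List[str]]:
    """Create a network bandwidth group and attach it to adapter `nic`."""
    _check_limit(limit)
    return [
        ["VBoxManage", "bandwidthctl", vm, "add", group,
         "--type", "network", "--limit", limit],
        # Without this step the group exists but no adapter uses it.
        ["VBoxManage", "modifyvm", vm, f"--nicbandwidthgroup{nic}", group],
    ]


def bandwidth_change(vm: str, limit: str, group: str = "limit") -> List[str]:
    """Adjust an existing group's limit."""
    _check_limit(limit)
    return ["VBoxManage", "bandwidthctl", vm, "set", group, "--limit", limit]


if __name__ == "__main__":
    for cmd in bandwidth_setup("archiveteam-warrior-3"):
        print(" ".join(cmd))
```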


=== The warrior is using up disk space, even though it's not running a project! ===

Virtual machine disk images do not behave like regular files. Space freed inside the VM is not automatically given back to the host. There are several ways to safely reclaim space:

* Delete the second disk and put back an empty disk. The warrior should reformat the second disk.
* Delete the entire warrior application and re-import it.
* Use the [http://intgat.tigress.co.uk/rmy/uml/index.html zerofree] program and then clone the disk image. Reattach the cloned disk image.


=== The item I'm working on is downloading thousands of URLs and it's taking hours. ===

Please notify us in the appropriate [[IRC]] channel. You may need to reboot the warrior.

=== Why is the default project not working? / Why is a manual project not in the warrior yet? ===

Sorry. Sometimes the administrators are too busy...

=== Why are there no projects? ===

We finished the ones we were working on! If there are no projects showing, you can [[Dev|help us write one]]. No projects does ''not'' mean there is nothing left to archive!

=== The instructions to run the software/scripts are awful and they are difficult to set up. ===

Well, excuuuuse me, princess!

We're not a professional support team, so help us help you help us all. See [[#Where_can_I_file_a_bug.2C_suggestion.2C_or_a_feature_request.3F|below]] for bug reports and suggestions, and [[#I.27d_like_to_help_write_code_or_I_want_to_tweak_the_scripts_to_run_to_my_liking._Where_can_I_find_more_info.3F_Where_is_the_source_code_and_repository.3F|above]] for code contributions.

=== Where can I file a bug, suggestion, or a feature request? ===

If the issue is related to the warrior's web interface or the library that the grab scripts use, see [https://github.com/ArchiveTeam/seesaw-kit/issues seesaw-kit issues]. Other issues should be filed in their own [[Dev/Source_Code|repositories]].

== Projects ==

Previous and current warrior projects:

{| class="wikitable"
! Project !! Status !! Began !! Finished !! Result !! Archive Location
|-
| [[MobileMe]] || '''Archive Posted''' || April 3, 2012 || Aug 8, 2012 || Success ||
[http://archive.org/details/archiveteam-mobileme-hero archive] [http://archive.org/details/archiveteam-mobileme-index index] [http://archive.org/download/archiveteam-mobileme-index/mobileme-20120817.html user lookup]
|-
| [[FortuneCity]] || '''Archive Posted''' || April 4, 2012 || April 11, 2012 || Partial Success || [http://archive.org/details/archiveteam-fortunecity archive] [http://archive.org/download/test-memac-index-test/fortunecity.html user lookup]
|-
| [[Tabblo]] || '''Archive Posted''' || May 23, 2012 || May 26, 2012 || Success || [http://archive.org/details/tabblo-archive archive] [http://archive.org/download/test-memac-index-test/tabblo.html user lookup]
|-
| [[Picplz]] || '''Archive Posted''' || June 3, 2012 || June 15, 2012 || || [http://archive.org/details/archiveteam-picplz archive] [http://archive.org/details/archiveteam-picplz-index index] [http://archive.org/download/archiveteam-picplz-index/picplz-20120823.html user lookup]
|-
| [[Tumblr]] (test project) || '''Archive Posted''' || August 9, 2012 || August 19, 2012 || || [http://archive.org/details/archiveteam-tumblr-test archive (tar)] [http://archive.org/details/archiveteam-tumblr-test-warc archive (warc)]
|-
| [[Cinch]].FM || '''Archive Posted''' || August 20, 2012 || August 22, 2012 || Success || [http://archive.org/details/archiveteam-cinch archive]
|-
| [[City of Heroes]] || '''Archive Posted''' || September 3, 2012 || December 1, 2012 || Success || [http://archive.org/details/archiveteam-city-of-heroes-www www] [http://archive.org/details/archiveteam-city-of-heroes-main forums] [http://archive.org/details/archiveteam-city-of-heroes-forums-megawarc-1 1] [http://archive.org/details/archiveteam-city-of-heroes-forums-megawarc-2 2] [http://archive.org/details/archiveteam-city-of-heroes-forums-megawarc-3 3] [http://archive.org/details/archiveteam-city-of-heroes-forums-megawarc-4 4] [http://archive.org/details/archiveteam-city-of-heroes-forums-megawarc-5 5]
|-
| [[Webshots]] || '''Archive Posted''' || October 4, 2012 || November 18, 2012 || || [http://archive.org/download/webshots-freeze-frame-index/index.html index]
|-
| [[BT Internet]] || '''Archive Posted''' || October 10, 2012 || November 2, 2012 || Success || [http://archive.org/details/archiveteam-btinternet archive]
|-
| [[DailyBooth| Daily Booth]] || '''Archive Posted''' || November 19, 2012 || December 29, 2012 || || [http://archive.org/details/archiveteam_dailybooth archive] [http://archive.org/download/dailybooth-freeze-frame-index/index.html lookup]
|-
| [[GitHub Downloads]] || '''Archive Posted''' || December 13, 2012 || December 17, 2012 || Success || [http://archive.org/details/github-downloads-2012-12 archive] [http://archive.org/details/archiveteam-github-repository-index-201212 index]
|-
| [[Yahoo! Blog]] || '''Archive Posted''' || January 8, 2013 || January 19, 2013 || || [http://archive.org/details/yahoo_korea_blogs archive]
|-
| [[weblog.nl]] || '''Archive Posted''' || January 19, 2013 || February 2, 2013 || || [http://archive.org/details/archiveteam_weblognl archive] [http://archive.org/download/archiveteam_weblognl-index/ lookup]
|-
| [[URLTeam]] || Active || || || || [http://urlte.am/releases/ all releases]
|-
| [[Punchfork]] || '''Archive Posted''' || January 11, 2013 || March 6, 2013 || || [http://archive.org/details/archiveteam_punchfork archive] [http://archive.org/download/archiveteam_punchfork_index/ user lookup]
|-
| [[Xanga]] || Downloads Paused || January 22, 2013 || February 16, 2013 || || [http://archive.org/details/archiveteam_xanga archive] [http://archive.org/download/archiveteam_xanga_index/ user lookup] [http://archive.org/details/archiveteam-xanga-userlist-20130142 user list]
|-
| [[Posterous]] || '''Archive Posted''' || February 23, 2013 || June 29, 2013 || || [http://archive.org/details/archiveteam_posterous archive]
|-
| [[Storylane]] || Downloads Finished || March 8, 2013 || March 15, 2013 || ||
|-
| [[Yahoo! Messages]] || '''Archive Posted''' || March 20, 2013 || March 31, 2013 || || [http://archive.org/details/archiveteam_yahoo_messages archive]
|-
| [[Formspring]] || '''Archive Posted''' || March 24, 2013 || September 19, 2013 || Success || [http://archive.org/details/archiveteam_formspring archive]
|-
| [[Yahoo Upcoming]] || '''Archive Posted''' || April 20, 2013 || April 25, 2013 || || [http://archive.org/details/archiveteam archive]
|-
| [[Streetfiles]].org || '''Archive Posted''' || April 28, 2013 || April 30, 2013 || Partial || [https://archive.org/search.php?query=streetfiles archive]
|-
| [[Xanga]] || Downloads Paused || June 21, 2013 || August 31, 2013 || || [http://archive.org/details/archiveteam_xanga archive]
|-
| [[Zapd]] || '''Archive Posted''' || October 1, 2013 || October 8, 2013 || Success || [https://archive.org/details/archiveteam_zapd archive]
|-
| [[Blip.tv]] || Hiatus || October 11, 2013 || ||  ||
|-
| [[Hyves]] || '''Archives Posted''' || November 10, 2013 || December 2, 2013 || Success ||  [http://archive.org/details/hyves archive]
|-
| [[Wretch]] & [[Yahoo! Blog]] || Archives Posted || December 17, 2013 || January 9, 2014 || Partial || [https://archive.org/details/archiveteam_wretch wretch] [https://archive.org/details/archiveteam_yahooblogs Yahoo Blog]
|-
| [[Dogster]] || '''Archives Posted''' || February 7, 2014 || February 16, 2014 || Success || [https://archive.org/details/archiveteam_dogster archive]
|-
| [[My Opera]] || Archives Posted || February 16, 2014 || March 3, 2014 || Success || [https://archive.org/details/archiveteam_myopera archive]
|-
| [[Bebo]] || Hiatus || February 18, 2014 ||  ||  || [https://archive.org/details/archiveteam_bebo archive]
|-
| [[Viddler]] || Cancelled || February 21, 2014 || February 27, 2014 || Qualified Success ||
|-
| [[Justin.tv]] || Archives Posted || June 5, 2014 || June 15, 2014 || Success || [https://archive.org/details/justintv archive]
|-
| [[Yahoo! Voices]] || Archives Posted  || July 28, 2014 || July 31, 2014  || Success || [https://archive.org/details/archiveteam_yahoovoices archive]
|-
| [[Fotopedia]] || Archives Posted  || August 5, 2014 || August 7, 2014 || Success || [https://archive.org/details/archiveteam_fotopedia archive]
|-
| [[Twitch.tv]] || Archives Posted || August 9, 2014 || August 24, 2014 || Qualified Success ||
|-
| [[Canv.as]] || Archives Posted || August 11, 2014 || August 12, 2014|| Success || [https://archive.org/details/archiveteam_canvas archive]
|-
| [[Swipnet]] || Downloads Finished || August 19, 2014 || September 1, 2014 || Success ||
|-
| [[Verizon Personal Web Space]] || Downloads Finished || September 2, 2014 || October 1, 2014 || Qualified Success ||
|-
| [[TwitPic]] || In progress || September 4, 2014 || || ||
|-
| [[Ancestry.com]] || In progress || September 19, 2014 || || ||
|-
| [[Quizilla]] || Downloads Finished || September 4, 2014 || October 1, 2014 || ||
|-
| [[Qwiki]] || In progress || September 28, 2014 || || ||
|-
| [[Panoramio]] || In development || October 4, 2014 || || ||
|-


|}
If the issue is related to the warrior's web interface or the library that grab scripts are using, see [https://github.com/ArchiveTeam/seesaw-kit/issues seesaw-kit issues]. Other issues should be filed into their own [[Dev/Source_Code|repositories]].


=== Status ===
== Projects ==
:; In Development : a future project
:; Active : start up a Warrior and join the fun; this one is in progress right now
:; Downloads Finished : we've finished downloading the data
:; Archived : the collected data has been properly archived
:; Archive Posted : the archive is available for download


=== Result ===
See [[Warrior projects]].
:; Success : downloaded all of the data and posted the archive publicly
:; Qualified Success :  either we couldn't get all of the data, or the archive can't be made public
:; Failure : the site closed before we could download anything


=== Are you a coder? ===
== Are you a coder? ==


Like the warrior? Interested in how it works under the hood? Got software skills? '''[[Dev|Help us improve it!]]'''
Like the warrior? Interested in how it works under the hood? Got software skills? '''[[Dev|Help us improve it!]]'''
{{Navigation box}}

Revision as of 06:53, 30 December 2020

The Warrior virtual machine appliances are currently unable to run wget-at. You may get a blank page when trying to run projects that require it, even if you select the project in the Warrior user interface. At present, only the Terror of Tiny Town/URLTeam project still works in the Warrior VM. This is a known issue. A replacement is planned, but it will take a while.

In the meantime, instead of the Warrior VM appliance, you can run projects manually using Docker. (If you like, you can run Docker in a VM of your own choosing; recent Ubuntu versions are known to work.) For further information, see our guide to Running Archive Team Projects with Docker and each project's README instructions in the ArchiveTeam GitHub repositories. If you have any issues or feedback, ask in the AT #warrior IRC channel on hackint.

What is the Archive Team Warrior?


The Archive Team Warrior is a virtual archiving appliance. You can run it to help with the ArchiveTeam archiving efforts. It will download sites and upload them to our archive — and it’s really easy to do!

The warrior is a virtual machine, so there is no risk to your computer. The warrior will only use your bandwidth and some of your disk space. It will get tasks from and report progress to the Tracker.

Basic usage

The warrior runs on Windows, macOS, and Linux using a virtual machine. You'll need:

Plus one of the virtualization applications below to run it:

Quick start instructions for VirtualBox

  1. Download the appliance from the link above.
  2. Launch VirtualBox.
  3. In VirtualBox, click File > Import Appliance and open the file.
  4. Start the virtual machine.
    • It will fetch the latest updates and will eventually tell you to start your web browser.
  5. Using your regular web browser, visit http://localhost:8001/
  6. On the left, click "Your settings".
  7. Choose a username; we'll show your progress on the leaderboard.
  8. On the left, click the "Available projects" tab and pick a project to work on.
    • Even better: select "ArchiveTeam's Choice" to let your warrior work on the most urgent project.

Start instructions for VMWare Player

  1. Download the appliance from the link above.
  2. Launch VMWare Player.
  3. In Player, click "Open Virtual Machine" on the right, then open the file and import the virtual machine.
  4. (Optional) Select the virtual machine and click "Edit virtual machine settings".
    • Select Network Adapter and set it to "Bridged: Connected directly to the physical network".
  5. Start the virtual machine.
    • It will fetch the latest updates and will eventually tell you to start your web browser.
  6. Using your regular web browser, visit the address shown at the bottom of the virtual machine's screen (e.g. http://192.168.0.100:8001/).
  7. On the left, click "Your settings".
  8. Choose a username; we'll show your progress on the leaderboard.
  9. On the left, click the "Available projects" tab and pick a project to work on.
    • Even better: select "ArchiveTeam's Choice" to let your warrior work on the most urgent project.


Beta version

Help us test the new version of the appliance available at https://warriorhq.archiveteam.org/downloads/warrior3/. Report any issues or feedback on #warrior on hackint.

This version includes updated internal components that allow us to run newer and improved software. Specifically, it runs the warrior code within a Docker container.

Alternative virtual machines

Thanks to user effort, there are alternatives:

Please note that these alternatives are not in widespread use by our warriors, so we may not be able to help with either issues or advanced usage.

Warrior FAQ

Why a virtual machine in the first place?

The virtual machine is a quick, safe, and easy way for newcomers to help us out. It offers many features:

  • Graphical interface
  • Automatically selects which project is important to run
  • Self-updating software infrastructure
  • Allows for unattended use
  • In case of software faults, your machine is not ruined
  • Restarts itself in case of runaway programs
  • Runs on Windows, Mac, and Linux painlessly
  • Ensures consistency in the archived data regardless of your machine's quirks

If you have suggestions for improving this system, please talk to us as described below.

Can I use any internet connection for the warrior?

No. We need "clean" connections. Please ensure the following:

  • No OpenDNS. No ISP DNS that redirects to a search page. Use non-captive DNS servers.
  • No ISP connections that inject advertisements into web pages.
  • No proxies. Proxies can return bad data. The original HTTP headers and IP address are needed for the WARC file.
  • No content-filtering firewalls.
  • No censorship. If you believe your country implements censorship, do not run a warrior.
  • No Tor. The server may return an error page instead of content if it bans exit nodes.
  • No free cafe wifi. Archiving your cafe's wifi service agreement repeatedly is not helpful.
  • No VPNs. Data integrity is a very high priority for the Archive Team, so use of VPNs with the official crawler is discouraged.
  • We prefer connections from many public IP addresses if possible. (For example, if your apartment building uses a single IP address, we don't want your apartment banned.)

I turned my warrior VM appliance off. Will those tasks be lost?

If you've killed your warrior VM instance, then the work your warrior had in progress has been lost. However, the tasks will be returned to the pool after a period of time, and other warriors may claim them.

I closed my browser or tab with the warrior's web interface. Will those tasks be lost?

No. The web browser interface just provides a user interface to the warrior. As long as the VM or Docker container is not stopped, it will continue normally.

I need to disconnect my internet / reboot my PC. How can I do this without losing work?

If you pause/suspend the warrior instance, most projects will allow resuming of work in progress when you unsuspend the warrior instance.

If you decide to use the suspend feature, please note that if you keep it suspended for too long (more than a few hours), the administrators will assume that the item is lost and re-queue it. Using the suspend feature so that you can reboot your computer is perfectly fine.

How much disk space will the warrior use?

Short answer: it depends on the project. (But never more than 60GB.)

Long answer: because each project defines items differently, sizes may vary. A single task may be a small file or a whole subsection of a website. By default, the virtual machine is configured to use an absolute maximum of 60GB. Its disk image grows only as data is written, so space the virtual machine has not used does not take up room on the host computer. You may run the virtual machine with less than 60GB available if you like to live dangerously. We're downloading the internet, after all!
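If you want to check beforehand whether the host could hold a fully grown warrior disk, you can compare the free space on the drive that stores your virtual machines against the 60GB maximum. The following is a minimal Python sketch, not part of the warrior itself; it assumes 60GB means 60 GiB (the more conservative reading) and that you pass in the folder where your VMs are stored.

```python
import shutil
import sys

# Assumption: treat the warrior's 60GB maximum as 60 GiB, the larger of the two readings.
WARRIOR_MAX_DISK_BYTES = 60 * 1024**3


def has_room_for_warrior(path: str, required: int = WARRIOR_MAX_DISK_BYTES) -> bool:
    """Return True if the filesystem containing `path` has at least `required` bytes free."""
    return shutil.disk_usage(path).free >= required


if __name__ == "__main__":
    # Pass the folder that holds your virtual machines; defaults to the current directory.
    target = sys.argv[1] if len(sys.argv) > 1 else "."
    free_gib = shutil.disk_usage(target).free / 1024**3
    status = "enough for 60GB" if has_room_for_warrior(target) else "less than 60GB"
    print(f"{target}: {free_gib:.1f} GiB free ({status})")
```

Having less than this free does not stop the warrior from running; it only means a project with very large items could fill the drive.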

How can I log in to the virtual machine?

Unless you know what you are doing, you should not need to do this.


To log in, start up the Warrior VM and wait for it to finish booting with the screen showing "The warrior has successfully start up". Press ALT+F3 to switch to virtual console number 3. VirtualBox users may need to press the host key (RIGHT_CONTROL by default) to enter capture mode before pressing ALT+F3. Use ALT+Left or ALT+Right to switch between virtual consoles. There are 6 virtual consoles in total; consoles 1 and 2 are reserved for the warrior. Switching to another virtual console will show a login shell. You can log in using the username root and the password archiveteam. Once logged in as root, you can execute sudo -u warrior -i to log in as the warrior user.


How can I run multiple virtual machines at the same time?

You'll need to adjust the networking settings.

In VirtualBox, select a virtual machine and open up Settings → Network → Adapter 1 → Port Forwarding. You need to adjust the host port. For example, set your table to TCP | 127.0.0.1 | 8123 | | 8001. This maps port 8123 on the host machine (your computer) to port 8001 on the virtual machine (the warrior), and you can then access the warrior's web interface from port 8123 in your browser.

Each VM you want to access should have a different host port. Do not use port numbers below 1024 unless you know what you are doing.
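If you set up several VMs, keeping track of host ports by hand gets tedious. Below is a minimal Python sketch that gives each VM its own host port (never below 1024) and builds the matching VBoxManage modifyvm --natpf1 command, which adds a NAT rule in the same name, protocol, host IP, host port, guest IP, guest port form as the table above. The VM names and the rule name warrior-web are hypothetical. The VMs must be powered off when you run modifyvm, and an existing rule that already forwards host port 8001 should be removed first (in the Port Forwarding dialog, or with --natpf1 delete followed by the rule's name).

```python
from typing import Dict, List

GUEST_PORT = 8001  # the warrior's web interface inside the VM


def plan_host_ports(vm_names: List[str], first_port: int = 8123) -> Dict[str, int]:
    """Give each VM a distinct host port, counting up from first_port."""
    if first_port < 1024:
        raise ValueError("host ports below 1024 need special privileges; pick 1024 or above")
    ports = {}
    for offset, name in enumerate(vm_names):
        port = first_port + offset
        if port > 65535:
            raise ValueError("ran out of TCP port numbers")
        ports[name] = port
    return ports


def natpf_command(vm_name: str, host_port: int, rule_name: str = "warrior-web") -> List[str]:
    """Build a VBoxManage command adding a NAT port-forwarding rule to adapter 1.

    Rule fields: name,protocol,host IP,host port,guest IP,guest port.
    The guest IP is left empty, as in the table above.
    """
    rule = f"{rule_name},tcp,127.0.0.1,{host_port},,{GUEST_PORT}"
    return ["VBoxManage", "modifyvm", vm_name, "--natpf1", rule]


if __name__ == "__main__":
    # Hypothetical VM names; the commands are printed so you can review them before running.
    plan = plan_host_ports(["warrior-a", "warrior-b", "warrior-c"])
    for name, port in plan.items():
        print(" ".join(natpf_command(name, port)))
```

With the defaults, the three VMs get host ports 8123, 8124 and 8125, so their web interfaces are at http://localhost:8123/, http://localhost:8124/ and http://localhost:8125/.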

VMWare installations should be using bridged networking. However, if you want, you can switch to NAT (under Settings → Hardware → Virtual Network Adapter) and click Edit to set up port forwarding. On Linux, you can also use lines like 8123 = 192.168.0.100:8001 in the [incomingtcp] section of nat.conf. (Make sure the VM IP is correct!)

How can I run the virtual machine headlessly (without leaving a window open)?

From the VirtualBox GUI, after opening the VM, click Machine > Detach GUI. You can then close the VirtualBox Manager window.

For the VirtualBox CLI, use this command:

vboxmanage startvm archiveteam-warrior-2 --type headless

Shut down the VM with:

vboxmanage controlvm archiveteam-warrior-2 acpipowerbutton

Substituting pause or resume for acpipowerbutton pauses or resumes the VM. For more information, consult the VirtualBox manual (Chapter 8, Sections 12 and 13).

For the VMWare CLI, use this command:

vmrun start <path to vmx file> nogui

Shut down with:

vmrun stop <path to vmx file> soft

Substituting suspend for stop suspends the VM. Resume with start again. For more information, including the paths to VMX files on different operating systems, consult Using vmrun to Control Virtual Machines (PDF), pages 10 and 11.
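If you start and stop warriors from scripts, a small wrapper can keep the VirtualBox and VMware commands in one place. This minimal Python sketch uses startvm --type headless and controlvm (acpipowerbutton, pause, resume) for VirtualBox, and vmrun start, stop and suspend for VMware; the action names start, stop, pause, suspend and resume are our own labels. It assumes VBoxManage and vmrun are on your PATH.

```python
import subprocess
import sys
from typing import List

# Our own action labels mapped to VirtualBox controlvm subcommands.
_VBOX_CONTROL = {"stop": "acpipowerbutton", "pause": "pause", "resume": "resume"}
# vmrun has no separate resume verb: a suspended VM is resumed with "start" again.
_VMRUN = {
    "start": ["start", "nogui"],
    "stop": ["stop", "soft"],
    "suspend": ["suspend"],
    "resume": ["start", "nogui"],
}


def vbox_command(vm_name: str, action: str) -> List[str]:
    """Build a VBoxManage command for a headless warrior VM."""
    if action == "start":
        return ["VBoxManage", "startvm", vm_name, "--type", "headless"]
    if action not in _VBOX_CONTROL:
        raise ValueError(f"unknown VirtualBox action: {action}")
    return ["VBoxManage", "controlvm", vm_name, _VBOX_CONTROL[action]]


def vmrun_command(vmx_path: str, action: str) -> List[str]:
    """Build a vmrun command; vmx_path is the .vmx file of the warrior VM."""
    if action not in _VMRUN:
        raise ValueError(f"unknown vmrun action: {action}")
    verb, *extra = _VMRUN[action]
    return ["vmrun", verb, vmx_path, *extra]


if __name__ == "__main__":
    # Example (hypothetical script name): python warriorctl.py archiveteam-warrior-2 start
    if len(sys.argv) == 3:
        subprocess.run(vbox_command(sys.argv[1], sys.argv[2]), check=True)
```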

How can I set up the virtual machine as a system service (so that it starts up on boot and shuts down automatically)?

If you are using VirtualBox and running a Linux distribution that uses the systemd init system (like most recent releases), you can follow the short instructions on this page. (The page title specifies Arch Linux, but this will work for other distros as long as they run systemd.)

How can I set up the virtual machine with directly-bridged networking instead of NAT?

On VirtualBox, use these commands:

vboxmanage modifyvm archiveteam-warrior-2 --nic1 bridged
vboxmanage modifyvm archiveteam-warrior-2 --bridgeadapter1 eth0

We presume you want to bind to eth0. Adjust as required. :)

VMWare installations should already be using bridged networking.

How can I access the virtual machine from another device on my network?

A full guide for VirtualBox users is available here.


How can I run the warrior without a virtual machine? (The VM has too much overhead for a VPS!)

One option is running a Docker container (see above). Containers share the host's kernel, so the overhead is far less than running a full VM. If you plan on running the warrior-dockerfile, make sure to publish the port to allow access to the web interface:

docker run -d -p 8001:8001 archiveteam/warrior-dockerfile

This creates a direct port mapping. For host port 38001 to container port 8001, use 38001:8001. Adjust as required. :P
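To run more than one container on the same host, give each one its own host port. This minimal Python sketch builds one docker run command per container, counting up from host port 38001. The container names warrior1, warrior2 and so on are a hypothetical naming scheme, set with Docker's --name flag so each container is easy to find later in docker ps.

```python
import shlex
from typing import List

IMAGE = "archiveteam/warrior-dockerfile"
CONTAINER_PORT = 8001  # the web interface port inside the container


def docker_run_commands(count: int, first_host_port: int = 38001) -> List[List[str]]:
    """Build one `docker run` command per warrior container, each with its own host port."""
    commands = []
    for i in range(count):
        host_port = first_host_port + i
        commands.append([
            "docker", "run", "-d",
            "--name", f"warrior{i + 1}",            # hypothetical naming scheme
            "-p", f"{host_port}:{CONTAINER_PORT}",  # host port : container port
            IMAGE,
        ])
    return commands


if __name__ == "__main__":
    # Print the commands for review; paste them into a shell to start the containers.
    for cmd in docker_run_commands(3):
        print(shlex.join(cmd))
```

The three containers' web interfaces would then be on host ports 38001, 38002 and 38003.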

(Multiple projects can be also run in isolated environments (containers) for rapid deployment using at-as-dockerfile.)

Another alternative is running the project manually. If you are managing a VPS, it's likely you are comfortable with some Linux stuff. Consult the project wiki page or the source code repository readme file.

How can I run tons of warriors easily?

We assume you've checked with the current ArchiveTeam project what concurrency and resources are needed or useful!

Whether you have your own virtual cluster or you're renting someone else's (aka a "cloud"), you probably need some orchestration software.

ArchiveTeam volunteers have successfully used a variety of hosting providers and tools (including free trials on AWS and GCE), often just by building their own flavour of virtual server and then repeating it with simple cloud-init scripts (to install and launch docker as above) or whatever tool the hosting provides. If you desire full automation, the archiveteam-infra repository by diggan helps with Terraform on DigitalOcean.

Some custom monitoring scripts also exist, for instance watcher.py.

I'm looking at the leaderboard. What's that icon beside the username?

That's just the warrior logo. It means that the person is using the warrior. Those without the icon are running the scripts manually.


What's that guy doing in the logo?

The place is on fire! But don't worry, he safely escaped with the rescued data in his arms.

That’s awesome – can I slap this logo on my laptop to show my Internet-preservation pride?

You sure can! The ArchiveTeam Warrior laptop sticker can start conversations about archiving, if you’re into that.

I'd like to help write code or I want to tweak the scripts to run to my liking. Where can I find more info? Where is the source code and repository?

Check out the Dev documentation for details on the infrastructure and the source code layout.

I still have a question!

Check out the general FAQ page. Talk to us on IRC. Use #warrior for specific warrior questions or #archiveteam for general questions.

Troubleshooting

I'm getting errors when I try to launch the VM.

If you are receiving Breakpoint has been reached (0x80000003), A critical error has occurred while running the virtual machine and the machine execution has been stopped., or VT-x errors, you probably do not have virtualization enabled, either because it is turned off in your computer's BIOS or because your CPU does not support it.

You can check CPU support on Linux with grep -E "vmx|svm" /proc/cpuinfo | uniq. If there is a line of output starting with "flags", your processor supports virtualization; if there is no output, it does not. You can check whether virtualization is enabled in the BIOS using the rdmsr utility in your distro's msr-tools package.
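The same check can be done from Python, matching whole flag names only. This is a minimal sketch for x86 Linux: it reads /proc/cpuinfo and looks for vmx (Intel VT-x) or svm (AMD-V) on the "flags" lines. Like the shell command, it only shows whether the CPU supports virtualization, not whether it is enabled in the BIOS.

```python
import os
from typing import Set

VIRT_FLAGS = {"vmx", "svm"}  # Intel VT-x and AMD-V


def virtualization_flags(cpuinfo_text: str) -> Set[str]:
    """Return the virtualization flags found on the 'flags' lines of /proc/cpuinfo text."""
    found = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # Only the plain "flags" field; lines such as "vmx flags" list sub-features instead.
        if sep and key.strip() == "flags":
            found.update(VIRT_FLAGS.intersection(value.split()))
    return found


def cpu_supports_virtualization(path: str = "/proc/cpuinfo") -> bool:
    with open(path) as f:
        return bool(virtualization_flags(f.read()))


if __name__ == "__main__":
    if os.path.exists("/proc/cpuinfo"):
        print("supported" if cpu_supports_virtualization() else "not supported")
    else:
        print("/proc/cpuinfo not found; this check only works on Linux")
```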

You can check support and BIOS status on Windows using Microsoft's Hardware-Assisted Virtualization Detection Tool or VirtualChecker.

To enable virtualization on a CPU with support, reboot the computer and enter the BIOS. The virtualization setting is usually under something like 'CPU configuration' or 'advanced settings'.

I just imported the ova image and the warrior is stuck on "Preparing the data partition".

This issue has cropped up before, and we do not know what causes it. We recommend you delete the warrior image and import the ova again. Testing shows that such a reimport works in the majority of cases.

I can't connect to localhost.

The application is configured to set up port forwarding to the guest machine, and you should be able to access the interface through your web browser at port 8001. If this does not happen, and isn't resolved by rebooting the warrior (using the ACPI power signals, not suspend/save state and resume), you may need to double-check your machine's network settings (as described above).
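To tell whether anything is answering on the warrior's port at all, you can try opening a TCP connection to it. This is a minimal Python sketch, assuming the default forwarding to port 8001 on your own machine; pass a different host or port if you changed the forwarding or use bridged networking.

```python
import socket
import sys


def warrior_ui_reachable(host: str = "127.0.0.1", port: int = 8001, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # connection refused, timed out, host unreachable, ...
        return False


if __name__ == "__main__":
    host = sys.argv[1] if len(sys.argv) > 1 else "127.0.0.1"
    port = int(sys.argv[2]) if len(sys.argv) > 2 else 8001
    ok = warrior_ui_reachable(host, port)
    print(f"{host}:{port} is {'reachable' if ok else 'not reachable'}")
```

If the port is not reachable, the port forwarding or network settings are the first thing to check.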

The warrior can't connect to the internet.

It's possible that the virtual machine has picked up the address of the local DNS cache on your computer, which the virtual machine does not have access to.

If you experience this on VirtualBox, see this question and answer. Additionally, check to see if "Cable Connected" is unchecked in the advanced settings of the virtual adapter, under the network tab in the virtual machine's settings. Check it if it's unchecked, then save your settings.

I see a message that no item was received.

This means that there is no work available. This can happen for several reasons:

  • The project has just finished and someone is inspecting the work done. If a problem is discovered, items may be re-queued and more work will become available.
  • You have checked out/claimed too many items. Reduce your concurrency and let others do some of the work too.
  • In a rare case, you have been banned by a tracker administrator because there was a problem with your work: you were requesting too much, you were tampering with the scripts, a malfunction has occurred, or your internet connection is "unclean" (see above).

I see a message about rate limiting.

Don't worry. Downloading the internet for fun and for digital preservation are the goals of all Archive Team activities, but our downloads can put serious stress on the target's servers. The rate limit is imposed by a tracker administrator to prevent this and should not be subverted.

(In other words, we don't want to DDoS the servers.)

If you like, you can switch to another project with less load.

I see a message about code being out of date.

Don't worry. There is a new update ready. You do not need to do anything about this; the warrior will update its code every hour. If you are impatient, please reboot the warrior and it will download the latest code and resume work.

I'm running the scripts manually and I see a message about code being out of date.

This happens when a bug in the scripts is discovered. Bugs are unavoidable, especially when the server is out of our control.

Try the --auto-update option available in Seesaw version 0.8. However, please be aware that you are now executing code automatically. Be sure to run the scripts in a separate user account for safety.
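If you would rather not let the pipeline update itself, you can instead script run-pipeline with --max-items N and git pull in a loop, so that fresh code is pulled between batches of items. Below is a minimal Python sketch of that approach. The repository folder, pipeline file name, nickname and pause length are placeholders, and the executable name may differ between projects, so check the project's README. Like --auto-update, this runs whatever code it pulls, so use a separate user account.

```python
import subprocess
import sys
import time
from typing import Callable, List, Optional


def pipeline_command(pipeline_file: str, nickname: str, max_items: int) -> List[str]:
    """run-pipeline takes the pipeline file and your leaderboard nickname."""
    return ["run-pipeline", pipeline_file, nickname, "--max-items", str(max_items)]


def update_and_run(
    repo_dir: str,
    pipeline_file: str,
    nickname: str,
    max_items: int = 10,
    pause_seconds: float = 30,
    rounds: Optional[int] = None,  # None means loop forever
    runner: Callable[..., object] = subprocess.run,
) -> None:
    """Alternate `git pull` and a bounded pipeline run so each batch uses fresh code."""
    done = 0
    while rounds is None or done < rounds:
        # Stop if the update fails rather than keep running stale code.
        runner(["git", "pull"], cwd=repo_dir, check=True)
        # A failed batch should not end the loop; the next round tries again.
        runner(pipeline_command(pipeline_file, nickname, max_items), cwd=repo_dir, check=False)
        done += 1
        time.sleep(pause_seconds)


if __name__ == "__main__" and len(sys.argv) == 4:
    # Usage (hypothetical script name): python update_loop.py REPO_DIR PIPELINE_FILE NICKNAME
    update_and_run(sys.argv[1], sys.argv[2], sys.argv[3])
```

The runner parameter exists so the loop can be tried out with a stand-in function before it runs real commands.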

I see messages about rsync errors.

Uh-oh! Something is not right. Please notify us immediately in the appropriate IRC channel.

I told the warrior to shut down from the interface, but nothing has changed.

The warrior will attempt to finish the current running tasks before shutting down. If you need to shut down right away, go ahead. Your progress will be lost, but the jobs will eventually cycle out to another user.

The warrior is eating all my bandwidth!

On VirtualBox (relatively recent versions), use this command:

vboxmanage bandwidthctl archiveteam-warrior-3 add limit --type network --limit 3m

This will limit the warrior to 3Mb/s. (Limit units are k for kilobit, m for megabit, g for gigabit, K for kilobyte, M for megabyte, and G for gigabyte.) Adjust as required. :)

In newer versions of VirtualBox on Windows, the syntax appears to have changed. This command has been reported to work:

VBoxManage bandwidthctl archiveteam-warrior-3 add netlimit --type network --limit 3

For more information, consult the VirtualBox manual (Chapter 6, Section 9).
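If you manage several warriors, a helper that checks the limit value before building the command can catch typos. This minimal Python sketch accepts a number with an optional unit suffix from the list above (or none, as in the Windows variant) and builds the bandwidthctl command. It also builds a VBoxManage modifyvm --nicbandwidthgroup1 command, which is how the VirtualBox manual attaches a bandwidth group to the first network adapter; whether the warrior appliance's adapter is already attached to a group is an assumption we have not verified, so run that step only if the limit otherwise has no effect. The group name netlimit follows the command above, and the VM must be powered off for modifyvm.

```python
import re
from typing import List

# A number with an optional unit suffix: k/m/g are bits, K/M/G are bytes (see above).
_LIMIT_PATTERN = re.compile(r"\d+[kmgKMG]?")


def bandwidth_commands(vm_name: str, limit: str, group: str = "netlimit") -> List[List[str]]:
    """Build VBoxManage commands that create a network bandwidth group and attach it to NIC 1."""
    if not _LIMIT_PATTERN.fullmatch(limit):
        raise ValueError(f"unrecognised limit {limit!r}; expected e.g. 3m or 500k")
    return [
        ["VBoxManage", "bandwidthctl", vm_name, "add", group,
         "--type", "network", "--limit", limit],
        # Assumption: the group only takes effect once attached to the adapter.
        ["VBoxManage", "modifyvm", vm_name, "--nicbandwidthgroup1", group],
    ]


if __name__ == "__main__":
    for cmd in bandwidth_commands("archiveteam-warrior-3", "3m"):
        print(" ".join(cmd))
```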

On VMWare (versions 9 and above), select a virtual machine and open Settings → Hardware → Virtual Network Adapter → Advanced. You can set a bandwidth limit here.

The warrior is using up disk space, even though it's not running a project!

Virtual machine disk images do not behave like regular files: they grow as data is written inside the VM, but they do not shrink when that data is deleted. There are several ways to safely reclaim space:

  • Delete the second disk and put back an empty disk. The warrior should reformat the second disk.
  • Delete the entire warrior application and re-import it.
  • Use the zerofree program and then clone the disk image. Reattach the cloned disk image.

The item I'm working on is downloading thousands of URLs and it's taking hours.

Please notify us in the appropriate IRC channel. You may need to reboot the warrior.

Why is the default project not working? / Why is a manual project not in the warrior yet?

Sorry. Sometimes the administrators are too busy...

Why are there no projects?

We finished the ones we were working on! If there are no projects showing, you can help us write one. No projects does not mean there is nothing left to archive!

The instructions to run the software/scripts are awful and they are difficult to set up.

Well, excuuuuse me, princess!

We're not a professional support team so help us help you help us all. See above for bug reports, suggestions, or code contributions.

Where can I file a bug, suggestion, or a feature request?

If the issue is related to the warrior's web interface or the library that grab scripts are using, see seesaw-kit issues. Other issues should be filed into their own repositories.

Projects

See Warrior projects.

Are you a coder?

Like the warrior? Interested in how it works under the hood? Got software skills? Help us improve it!