Dev/Source Code

From Archiveteam

Fork me on GitHub! File and triage issues, fix bugs, refactor code, submit pull requests… all welcome! Discussion in #archiveteam-dev (on hackint).

See this link for all issues.

The warrior uses the following repos:

Client code

Client code includes code that the Warrior executes.

warrior3 - bootstrap and tools to build the image
Bootstrap code that is pulled from GitHub by the appliance and starts a docker container
archiveteam/warrior-dockerfile - the container
Instructions to bootstrap the docker container
warrior2 - warrior runner code
Main code that runs inside of the docker container
Library that helps build grab scripts, the web interface, and pipeline engine for the warrior. The name "seesaw" comes from its original behavior: download, upload, and repeat.
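
The "download, upload, and repeat" behavior behind the seesaw name can be sketched as a simple loop. This is a minimal illustration under stated assumptions, not the library's actual API: `run_seesaw_loop`, the `download` and `upload` callables, and the item names are all hypothetical stand-ins.

```python
# Illustrative only: these names are stand-ins, not the real seesaw API.
from typing import Callable, Iterable, List


def run_seesaw_loop(
    items: Iterable[str],
    download: Callable[[str], bytes],
    upload: Callable[[str, bytes], None],
) -> List[str]:
    """Download each item, upload the result, then repeat with the next."""
    completed = []
    for item in items:
        data = download(item)   # fetch the item's content
        upload(item, data)      # hand the downloaded data off
        completed.append(item)  # record it and move on to the next item
    return completed


if __name__ == "__main__":
    uploaded = {}  # in-memory stand-in for an upload target
    done = run_seesaw_loop(
        ["item-1", "item-2"],  # made-up item names
        download=lambda item: f"contents of {item}".encode(),
        upload=lambda item, data: uploaded.__setitem__(item, data),
    )
    print(done)
```

Passing the download and upload steps in as callables keeps the loop itself trivial; a real pipeline would add error handling and retries around each step.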


Projects live in separate repositories, typically named with -grab as a suffix.

Item lists that are loaded into the tracker are sometimes saved into a repo with -items as a suffix. Scripts to build searchable index HTML pages are usually suffixed with -index.
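
The suffix conventions above can be expressed as a small lookup. This is only a sketch: `classify_repo` and the example repository names are hypothetical, and because the conventions are described as typical rather than strict, the result should be treated as a hint.

```python
# Hypothetical helper: map a repository name to the role implied by its
# suffix. Not part of any Archive Team tool.
SUFFIX_ROLES = {
    "-grab": "project grab scripts",
    "-items": "item lists loaded into the tracker",
    "-index": "scripts that build searchable index HTML pages",
}


def classify_repo(name: str) -> str:
    """Return the role implied by a repository name's suffix, or 'other'."""
    for suffix, role in SUFFIX_ROLES.items():
        if name.endswith(suffix):
            return role
    return "other"


if __name__ == "__main__":
    # "example-grab" and friends are made-up names used for illustration.
    for repo in ("example-grab", "example-items", "example-index", "wpull"):
        print(f"{repo}: {classify_repo(repo)}")
```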

Server code

Server code includes code that the Tracker executes.

universal-tracker - Ruby

The server that the Seesaw client contacts.

warrior-hq - Ruby

The server that the warrior appliances contact for project metadata.

archiveteam-megawarc-factory - shell

The scripts that bundle the WARC files.

URLTeam code

URLTeam code is independent from the tracker and warrior.



The client code that scrapes the shortlinks. It includes a pipeline shim to run the code.


The server code for the tracker.



A pipeline shim to run the code.


The code for both the client library and tracker.



Dockerfile that runs the warrior inside a Docker container.

ArchiveBot - Ruby, Python, Lua

An IRC bot for archiving websites.

wget-lua - C, Lua

A patched version of Wget for web crawling.

standalone-readme-template - Markdown

A template for readme files included in grab repositories.

archiveteam-dev-env - Shell

Ubuntu preseed for a developer environment for ArchiveTeam projects.

wpull - Python

A Wget-compatible web downloader/crawler.

Developer Documentation