Dev/Source Code

From Archiveteam
Revision as of 14:42, 15 December 2013 by Chfoo (talk | contribs) (label repos with programming language)

Fork me on GitHub! File issues, fix bugs, refactor code, submit pull requests… all welcome!

The warrior uses the following repos:

Client code

Client code is the code that the Warrior executes.

warrior-preseed - shell

For constructing the warrior virtual appliance image

warrior-code2 - shell

Bootstrap code that is pulled from GitHub by the appliance

seesaw-kit - Python

Library that helps build grab scripts, the web interface, and pipeline engine for the warrior. The name "seesaw" comes from its original behavior: download, upload, and repeat.
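The "download, upload, and repeat" behavior behind the seesaw name can be pictured with a minimal Python sketch. This is not the seesaw-kit API: the function `seesaw_loop`, the stand-in callables `fake_download` and `fake_upload`, and the item names are all hypothetical and exist only to illustrate the loop.

```python
from typing import Callable, Dict, Iterable, List


def seesaw_loop(
    items: Iterable[str],
    download: Callable[[str], bytes],
    upload: Callable[[str, bytes], None],
) -> List[str]:
    """Download each item, upload the result, then repeat with the next item."""
    completed = []
    for item in items:
        data = download(item)  # step 1: download
        upload(item, data)     # step 2: upload
        completed.append(item)  # step 3: repeat with the next item
    return completed


# Hypothetical stand-ins so the sketch runs without network access.
uploaded: Dict[str, bytes] = {}


def fake_download(item: str) -> bytes:
    return ("contents of " + item).encode("utf-8")


def fake_upload(item: str, data: bytes) -> None:
    uploaded[item] = data


done = seesaw_loop(["item-1", "item-2"], fake_download, fake_upload)
print(done)  # ['item-1', 'item-2']
```

Passing the download and upload steps in as callables keeps the loop itself independent of any particular site or upload target, which is roughly the separation a per-project grab script would need.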

Projects

Each project lives in its own repository, typically named with a -grab suffix.

Server code

Server code is the code that the Tracker executes.

universal-tracker - Ruby

The server that Seesaw contacts.

warrior-hq - Ruby

The server that the warrior appliances contact for project metadata.

archiveteam-megawarc-factory - shell

The scripts that bundle the WARC files.

URLTeam code

URLTeam code is independent from the tracker and warrior.

tinyback

The client code that scrapes the shortlinks. It includes a pipeline shim to run the code.

tinyarchive

The server code for the tracker.

Misc

warrior-dockerfile

Dockerfile that runs the warrior inside a Docker container.

ArchiveBot - Ruby, Python, Lua

An IRC bot for archiving websites.

wget-lua - C, Lua

A patched version of Wget for web crawling.


Developer Documentation