Difference between revisions of "Dev/Source Code"

From Archiveteam
(reorganize dockerfile, add archivebot and wget-lua)
(label repos with programming language)
Line 1: Line 1:
'''[https://github.com/ArchiveTeam/ Fork me on GitHub!]''' The warrior uses the following repos:
'''[https://github.com/ArchiveTeam/ Fork me on GitHub!]''' File issues, fix bugs, refactor code, submit pull requests… all welcome!
 
The warrior uses the following repos:


== Client code ==
== Client code ==
Line 5: Line 7:
Client code includes code that the [[Warrior]] executes.
Client code includes code that the [[Warrior]] executes.


[https://github.com/ArchiveTeam/warrior-preseed warrior-preseed]
'''[https://github.com/ArchiveTeam/warrior-preseed warrior-preseed]''' - shell
:For constructing the warrior virtual appliance image
:For constructing the warrior virtual appliance image
[https://github.com/ArchiveTeam/warrior-code2 warrior-code2]
'''[https://github.com/ArchiveTeam/warrior-code2 warrior-code2]'''  - shell
:Bootstrap code that is pulled from GitHub by the appliance
:Bootstrap code that is pulled from GitHub by the appliance
[https://github.com/ArchiveTeam/seesaw-kit seesaw-kit]
'''[https://github.com/ArchiveTeam/seesaw-kit seesaw-kit]''' - Python
:Library that helps build grab scripts, the web interface, and pipeline engine for the warrior. The name "seesaw" comes from its original behavior: download, upload, and repeat.
:Library that helps build grab scripts, the web interface, and pipeline engine for the warrior. The name "seesaw" comes from its original behavior: download, upload, and repeat.


Line 20: Line 22:
Server code includes code that the [[Tracker]] executes.
Server code includes code that the [[Tracker]] executes.


[https://github.com/ArchiveTeam/universal-tracker universal-tracker]
'''[https://github.com/ArchiveTeam/universal-tracker universal-tracker]''' - Ruby
:The server that Seesaw contacts
:The server that Seesaw contacts
[https://github.com/ArchiveTeam/warrior-hq warrior-hq]
'''[https://github.com/ArchiveTeam/warrior-hq warrior-hq]''' - Ruby
:The server that the warrior appliances contact for project metadata
:The server that the warrior appliances contact for project metadata
[https://github.com/ArchiveTeam/archiveteam-megawarc-factory archiveteam-megawarc-factory]
'''[https://github.com/ArchiveTeam/archiveteam-megawarc-factory archiveteam-megawarc-factory]''' - shell
:The scripts that bundle the [[The WARC Ecosystem|WARC files]].
:The scripts that bundle the [[The WARC Ecosystem|WARC files]].


Line 31: Line 33:
URLTeam code is independent from the tracker and warrior.
URLTeam code is independent from the tracker and warrior.


[https://github.com/ArchiveTeam/tinyback tinyback]
'''[https://github.com/ArchiveTeam/tinyback tinyback]'''
: The client code that scrapes the shortlinks. It includes a pipeline shim to run the code.
: The client code that scrapes the shortlinks. It includes a pipeline shim to run the code.
[https://github.com/ArchiveTeam/tinyarchive tinyarchive]
'''[https://github.com/ArchiveTeam/tinyarchive tinyarchive]'''
: The server code for the tracker.
: The server code for the tracker.


== Misc ==
== Misc ==


[https://github.com/ArchiveTeam/warrior-dockerfile warrior-dockerfile]
'''[https://github.com/ArchiveTeam/warrior-dockerfile warrior-dockerfile]'''
:Dockerfile that runs the warrior inside a Docker container.
:Dockerfile that runs the warrior inside a Docker container.
[https://github.com/ArchiveTeam/ArchiveBot ArchiveBot]
'''[https://github.com/ArchiveTeam/ArchiveBot ArchiveBot]''' - Ruby, Python, Lua
:An IRC bot for archiving websites.
:An IRC bot for archiving websites.
[https://github.com/ArchiveTeam/wget-lua wget-lua]
'''[https://github.com/ArchiveTeam/wget-lua wget-lua]''' - C, Lua
:A patched version of Wget for web crawling.
:A patched version of Wget for web crawling.


{{devnav}}
{{devnav}}

Revision as of 14:42, 15 December 2013

Fork me on GitHub! File issues, fix bugs, refactor code, submit pull requests… all welcome!

The warrior uses the following repos:

Client code

Client code includes code that the Warrior executes.

warrior-preseed - shell

For constructing the warrior virtual appliance image

warrior-code2 - shell

Bootstrap code that is pulled from GitHub by the appliance

seesaw-kit - Python

Library that helps build grab scripts, the web interface, and pipeline engine for the warrior. The name "seesaw" comes from its original behavior: download, upload, and repeat.
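The "download, upload, and repeat" cycle behind the name can be sketched roughly as below. This is only an illustration of the idea: `run_seesaw`, `download_item`, and `upload_item` are hypothetical names and are not part of seesaw-kit's actual API.

```python
from typing import Callable, Iterable, List


def run_seesaw(
    items: Iterable[str],
    download_item: Callable[[str], bytes],
    upload_item: Callable[[str, bytes], None],
) -> List[str]:
    """Download each item, upload the result, then repeat with the next item.

    Hypothetical sketch: the two callables stand in for whatever a real
    grab script would do to fetch and deliver data.
    """
    completed = []
    for item in items:
        data = download_item(item)  # fetch the item
        upload_item(item, data)     # hand the result off
        completed.append(item)      # then loop to the next item
    return completed
```

Passing the download and upload steps in as callables keeps the loop itself independent of any particular project.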

Projects

Each project lives in its own repository, typically named with a -grab suffix.
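As a small illustration of that naming convention, the sketch below picks out likely project repositories from a list of names. The helper names and the sample name `example-grab` are hypothetical, and because the suffix is only typical, this is a heuristic rather than a definitive test.

```python
from typing import Iterable, List


def is_project_repo(name: str) -> bool:
    """Heuristic: project repositories typically end in '-grab'."""
    return name.endswith("-grab")


def project_repos(names: Iterable[str]) -> List[str]:
    """Return the names that follow the '-grab' naming convention."""
    return [name for name in names if is_project_repo(name)]
```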

Server code

Server code includes code that the Tracker executes.

universal-tracker - Ruby

The server that Seesaw contacts

warrior-hq - Ruby

The server that the warrior appliances contact for project metadata

archiveteam-megawarc-factory - shell

The scripts that bundle the WARC files.

URLTeam code

URLTeam code is independent from the tracker and warrior.

tinyback

The client code that scrapes the shortlinks. It includes a pipeline shim to run the code.

tinyarchive

The server code for the tracker.

Misc

warrior-dockerfile

Dockerfile that runs the warrior inside a Docker container.

ArchiveBot - Ruby, Python, Lua

An IRC bot for archiving websites.

wget-lua - C, Lua

A patched version of Wget for web crawling.

