Difference between revisions of "Dev/Source Code"
(add dockerfile repo) |
(reorganize dockerfile, add archivebot and wget-lua) |
||
Line 3: | Line 3: | ||
== Client code == | == Client code == | ||
Client code includes code that the | Client code includes code that the [[Warrior]] executes. | ||
[https://github.com/ArchiveTeam/warrior-preseed warrior-preseed] | [https://github.com/ArchiveTeam/warrior-preseed warrior-preseed] | ||
:For constructing the virtual appliance image | :For constructing the warrior virtual appliance image | ||
[https://github.com/ArchiveTeam/warrior-code2 warrior-code2] | [https://github.com/ArchiveTeam/warrior-code2 warrior-code2] | ||
:Bootstrap code that is pulled from GitHub by the appliance | :Bootstrap code that is pulled from GitHub by the appliance | ||
[https://github.com/ArchiveTeam/seesaw-kit seesaw-kit] | [https://github.com/ArchiveTeam/seesaw-kit seesaw-kit] | ||
:Library that helps build grab scripts | :Library that helps build grab scripts, the web interface, and pipeline engine for the warrior. The name "seesaw" comes from its original behavior: download, upload, and repeat. | ||
=== Projects === | |||
Projects are in separate repositories typically with the name <code>-grab</code> as a suffix. | Projects are in separate repositories typically with the name <code>-grab</code> as a suffix. | ||
Line 27: | Line 27: | ||
:The scripts that bundles the [[The WARC Ecosystem|WARC files]]. | :The scripts that bundles the [[The WARC Ecosystem|WARC files]]. | ||
=== URLTeam code === | |||
URLTeam code is independent from the tracker and warrior. | URLTeam code is independent from the tracker and warrior. | ||
Line 35: | Line 35: | ||
[https://github.com/ArchiveTeam/tinyarchive tinyarchive] | [https://github.com/ArchiveTeam/tinyarchive tinyarchive] | ||
: The server code for the tracker. | : The server code for the tracker. | ||
== Misc == | |||
[https://github.com/ArchiveTeam/warrior-dockerfile warrior-dockerfile] | |||
:Dockerfile that runs the warrior inside a Docker container. | |||
[https://github.com/ArchiveTeam/ArchiveBot ArchiveBot] | |||
:An IRC bot for archiving websites. | |||
[https://github.com/ArchiveTeam/wget-lua wget-lua] | |||
:A patched version of Wget for web crawling. | |||
{{devnav}} | {{devnav}} |
Revision as of 12:02, 5 December 2013
Fork me on GitHub! The warrior uses the following repos:
Client code
Client code includes code that the Warrior executes.
- warrior-preseed: For constructing the warrior virtual appliance image
- warrior-code2: Bootstrap code that the appliance pulls from GitHub
- seesaw-kit: Library that helps build grab scripts, the web interface, and the pipeline engine for the warrior. The name "seesaw" comes from its original behavior: download, upload, and repeat.
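The "download, upload, and repeat" behavior that gives seesaw its name can be pictured as a simple loop. The following is a minimal sketch, not seesaw-kit's actual API: the function `seesaw_loop`, the stand-in helpers `fake_download` and `fake_upload`, and the item names are all hypothetical and exist only to make the example runnable without network access.

```python
from typing import Callable, Dict, Iterable, List


def seesaw_loop(
    items: Iterable[str],
    download: Callable[[str], bytes],
    upload: Callable[[str, bytes], None],
) -> List[str]:
    """Download each item, upload the result, then repeat with the next item."""
    finished = []
    for item in items:
        data = download(item)  # step 1: download the item
        upload(item, data)     # step 2: upload what was downloaded
        finished.append(item)  # step 3: repeat with the next item
    return finished


# Hypothetical stand-ins for real network operations.
uploaded: Dict[str, bytes] = {}


def fake_download(item: str) -> bytes:
    return f"contents of {item}".encode()


def fake_upload(item: str, data: bytes) -> None:
    uploaded[item] = data


done = seesaw_loop(["item-1", "item-2"], fake_download, fake_upload)
print(done)
```

Passing the download and upload steps in as functions keeps the loop itself independent of any particular project, which loosely mirrors the idea of per-project grab scripts built on a shared library.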
Projects
Projects are in separate repositories, typically named with -grab as a suffix.
Server code
Server code includes code that the Tracker executes.
- The server that Seesaw contacts
- The server that the warrior appliances contact for project metadata
- The scripts that bundle the WARC files.
URLTeam code
URLTeam code is independent from the tracker and warrior.
- The client code that scrapes the shortlinks. It includes a pipeline shim to run the code.
- tinyarchive: The server code for the tracker.
Misc
- warrior-dockerfile: Dockerfile that runs the warrior inside a Docker container.
- ArchiveBot: An IRC bot for archiving websites.
- wget-lua: A patched version of Wget for web crawling.