Dev/Source Code

From Archiveteam
Latest revision as of 23:02, 2 May 2019

Fork me on GitHub (https://github.com/ArchiveTeam/)! File and triage issues, fix bugs, refactor code, submit pull requests… all welcome! Discussion in #archiveteam-dev (on hackint).

See this link for all issues: https://github.com/search?l=&q=user%3AArchiveTeam+state%3Aopen&type=Issues

The warrior uses the following repos:

Client code

Client code includes code that the Warrior executes.

warrior3 (https://github.com/ArchiveTeam/Ubuntu-Warrior) - bootstrap and tools to build the image
Bootstrap code that the appliance pulls from GitHub and that starts a Docker container.
archiveteam/warrior-dockerfile (https://github.com/ArchiveTeam/warrior-dockerfile) - the container
Instructions to bootstrap the Docker container.
warrior2 (https://github.com/ArchiveTeam/warrior-code2) - warrior runner code
Main code that runs inside the Docker container.
seesaw-kit
Library that helps build grab scripts, the web interface, and pipeline engine for the warrior. The name "seesaw" comes from its original behavior: download, upload, and repeat.
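The "download, upload, and repeat" behavior that gives seesaw its name can be pictured with a minimal sketch. This is not the seesaw-kit API: the function `run_seesaw` and the `download` and `upload` callables are hypothetical names introduced only to illustrate the cycle.

```python
from typing import Callable, Dict, Iterable, List


def run_seesaw(
    items: Iterable[str],
    download: Callable[[str], bytes],
    upload: Callable[[str, bytes], None],
) -> List[str]:
    """Hypothetical sketch: download each item, upload it, then repeat."""
    completed = []
    for item in items:
        payload = download(item)  # fetch the data for this item
        upload(item, payload)     # hand the result to the upload target
        completed.append(item)    # then move on to the next item
    return completed


if __name__ == "__main__":
    # In-memory stand-ins for a real downloader and upload target.
    uploaded: Dict[str, bytes] = {}

    def fake_download(item: str) -> bytes:
        return f"contents of {item}".encode()

    def fake_upload(item: str, payload: bytes) -> None:
        uploaded[item] = payload

    print(run_seesaw(["item-1", "item-2"], fake_download, fake_upload))
    print(sorted(uploaded))
```

The real library adds a pipeline engine and a web interface around this idea; the sketch only shows the basic loop.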

Projects

Projects live in separate repositories, typically named with -grab as a suffix.

Item lists that are loaded into the tracker are sometimes saved into a repo with -items as a suffix. Scripts to build searchable index HTML pages are usually suffixed with -index.
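The suffix conventions above could be applied mechanically, for example when sorting through a list of repository names. The sketch below is only an illustration: `classify_repo` and the example names are hypothetical and not part of any ArchiveTeam tooling, and because the conventions are described as typical rather than strict, the result is a guess.

```python
# Hypothetical helper: guess a repository's role from its name suffix,
# following the naming conventions described on this page.
SUFFIX_ROLES = {
    "-grab": "project grab scripts",
    "-items": "item lists loaded into the tracker",
    "-index": "scripts that build searchable index HTML pages",
}


def classify_repo(name: str) -> str:
    """Return the conventional role for a repo name, or 'other'."""
    for suffix, role in SUFFIX_ROLES.items():
        if name.endswith(suffix):
            return role
    return "other"


if __name__ == "__main__":
    # Example names are made up for illustration.
    for repo in ("example-grab", "example-items", "example-index", "seesaw-kit"):
        print(f"{repo}: {classify_repo(repo)}")
```

Names that match none of the suffixes, such as seesaw-kit, fall through to "other".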

Server code

Server code includes code that the Tracker executes.

universal-tracker - Ruby

The server that Seesaw contacts.

warrior-hq - Ruby

The server that the warrior appliances contact for project metadata.

archiveteam-megawarc-factory - shell

The scripts that bundle the WARC files.

URLTeam code

URLTeam code is independent from the tracker and warrior.

Old:

tinyback

The client code that scrapes the shortlinks. It includes a pipeline shim to run the code.

tinyarchive

The server code for the tracker.

New:

terroroftinytown-client-grab

A pipeline shim to run the code.

terroroftinytown

The code for both the client library and tracker.

Misc

warrior-dockerfile

Dockerfile that runs the warrior inside a Docker container.

ArchiveBot - Ruby, Python, Lua

An IRC bot for archiving websites.

wget-lua - C, Lua

A patched version of Wget for web crawling.

standalone-readme-template - Markdown

A template for readme files included in grab repositories.

archiveteam-dev-env - Shell

Ubuntu preseed for a developer environment for ArchiveTeam projects.

wpull - Python

A Wget-compatible web downloader/crawler.

