Dev/Seesaw

From Archiveteam
Revision as of 16:18, 4 December 2013 by Chfoo (add stub about project contents)
This page is a work in progress.

What an Archive Team Project Contains

pipeline.py

This file contains the Seesaw client code for the project.

README.*

This file contains:
* brief information about the project
* instructions for running the scripts manually

[Project Name Here].lua (optional)

This is the Lua script used by Wget-Lua.

warrior_install.sh (optional)

This file is executed by the Warrior to install extra libraries needed by the project.

wget-lua-warrior (optional)

This executable is a build of Wget-Lua for the warrior environment.

Writing a pipeline.py (Seesaw Client)

The Seesaw client is the set of tasks that must be performed on each item. Think of it as a template of instructions. The file is typically called pipeline.py, and it uses the Seesaw library.
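To make the "template" idea concrete, here is a minimal sketch in plain Python. It does not use the Seesaw library; the function names (run_pipeline, get_item, download) and the item fields are hypothetical. It only illustrates the shape of the idea: the same ordered list of tasks is applied to every item, and each item carries its own data.

```python
# Hypothetical model of the Seesaw idea (not the seesaw API): a pipeline is
# an ordered list of tasks, and every item passes through the same tasks.


def get_item(item):
    # Stand-in for receiving an item name from the tracker.
    item.setdefault("item_name", "example-item")


def download(item):
    # Stand-in for running Wget-Lua; records the WARC name on the item.
    item["warc_file"] = item["item_name"] + ".warc.gz"


PIPELINE = [get_item, download]


def run_pipeline(tasks, item):
    """Apply each task, in order, to one item and return the item."""
    for task in tasks:
        task(item)
    return item


if __name__ == "__main__":
    for name in ["alice", "bob"]:
        print(run_pipeline(PIPELINE, {"item_name": name}))
```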

The pipeline file typically uses Wget with Lua scripting. The Lua script is provided as an argument to Wget within the pipeline file. It controls fine-grained operations within Wget, such as rejecting unneeded URLs or adding new URLs as they are discovered.
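A hypothetical sketch of how a pipeline file might assemble the Wget-Lua command line for one item. --lua-script is the option Wget-Lua uses to load the hook script; -U, -nv, -o, --warc-file and --page-requisites are standard GNU Wget options. The function name, file names and user agent are placeholders, and real projects pass many more options.

```python
# Hypothetical sketch: assemble a Wget-Lua command line for one item.
# --lua-script loads the Lua hooks; the other options are standard GNU Wget.
import os


def build_wget_args(wget_lua, lua_script, item_dir, item_name, start_url):
    """Return the argument list for running Wget-Lua on one item."""
    return [
        wget_lua,
        "-U", "ArchiveTeam",                       # user agent (placeholder)
        "-nv",                                     # non-verbose output
        "-o", os.path.join(item_dir, "wget.log"),  # write Wget's log here
        "--lua-script", lua_script,                # hooks that filter/add URLs
        # WARC output is compressed by default, so Wget appends .warc.gz.
        "--warc-file", os.path.join(item_dir, item_name),
        "--page-requisites",                       # also fetch images, CSS, ...
        start_url,
    ]


args = build_wget_args("./wget-lua", "project.lua", "data/item-1",
                       "item-1", "http://example.com/")
# A pipeline would then run it, for example with subprocess.call(args).
```

Keeping the arguments as a list (rather than one shell string) avoids quoting problems with item names.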

Take a look at the grab scripts in recent Archive Team repositories for examples of clients.

Installation

You will need:

  • Python 2.7
  • Lua
  • Wget with Lua hooks

On Debian-based systems, you can typically install these by running:

sudo apt-get install build-essential lua5.1 liblua5.1-0-dev python python-setuptools python-dev openssl libssl-dev python-pip make
sudo pip install seesaw

You will also need to build Wget-Lua. Find the following script in a recent project repository and run it:

./get-wget-lua.sh
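Once Wget-Lua is built, the pipeline file needs to locate the binary (one of the pipeline file's usual components, listed in the next section). A minimal sketch, assuming the routine simply tries candidate paths and checks each one's --version output for an expected marker string. The function name, candidate paths and marker are placeholders; a real project would more likely pin an exact version string.

```python
# Hypothetical "find Wget-Lua" routine: return the first candidate path that
# is executable and whose --version output contains an expected marker.
import os
import subprocess


def find_executable(candidates, version_marker):
    """Return the first matching candidate path, or None if none match."""
    for path in candidates:
        if not (os.path.isfile(path) and os.access(path, os.X_OK)):
            continue  # missing or not executable
        try:
            output = subprocess.check_output([path, "--version"],
                                             stderr=subprocess.STDOUT)
        except (OSError, subprocess.CalledProcessError):
            continue  # could not run it, or it exited with an error
        if version_marker in output.decode("utf-8", "replace"):
            return path
    return None


# Candidate paths and marker are placeholders for illustration.
WGET_LUA = find_executable(
    ["./wget-lua", "./wget-lua-warrior", "/usr/local/bin/wget-lua"],
    "lua",
)
# A real pipeline would stop here with an error if WGET_LUA is None.
```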

The pipeline file

The pipeline file typically includes:

  • Copy-and-pasted monkey patches
  • A routine to find Wget-Lua
  • A version number in the form of YYYYMMDD.NN
  • Tracker hostname
  • Custom Tasks:
    • PrepareDirectories
    • MoveFiles
  • Project information stored in the project variable
  • Instructions for processing each item, stored in the pipeline variable
  • An undeclared downloader variable, which is filled in by the Seesaw library
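A small helper for the YYYYMMDD.NN version convention, assuming YYYYMMDD is the release date and NN is a two-digit counter for releases made on the same day. The function names are hypothetical. Because both parts are fixed-width and zero-padded, plain string comparison orders such versions chronologically.

```python
# Hypothetical helpers for the YYYYMMDD.NN version convention (assumed:
# YYYYMMDD = release date, NN = two-digit counter for same-day releases).
import datetime
import re

VERSION_RE = re.compile(r"(\d{8})\.(\d{2})\Z")


def make_version(date, counter):
    """Build a version string such as 20131204.01."""
    if not 0 <= counter <= 99:
        raise ValueError("counter must fit in two digits")
    return "%s.%02d" % (date.strftime("%Y%m%d"), counter)


def is_valid_version(version):
    """Return True if version is YYYYMMDD.NN with a real calendar date."""
    match = VERSION_RE.match(version)
    if match is None:
        return False
    try:
        datetime.datetime.strptime(match.group(1), "%Y%m%d")
    except ValueError:
        return False
    return True


VERSION = make_version(datetime.date(2013, 12, 4), 1)
```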

It is important to remember that each Task is a template for processing every Item: a single Task object handles many items, possibly concurrently. Item-specific values should therefore not be stored on the Task; save them on the item instead.
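A hypothetical sketch of that rule, loosely modeled on the PrepareDirectories task listed above. It is not the seesaw Task API: the class, its process() method and the item_dir key are illustrative. Configuration shared by all items (data_dir) lives on the task; the per-item directory is stored on the item.

```python
# Hypothetical sketch (not the seesaw API): one task object handles every
# item, so anything specific to an item goes on the item, never on the task.
import os
import shutil
import tempfile


class PrepareDirectories(object):
    """Creates a fresh working directory for each item."""

    def __init__(self, data_dir):
        # Shared, item-independent configuration: fine to keep on the task.
        self.data_dir = data_dir

    def process(self, item):
        item_dir = os.path.join(self.data_dir, item["item_name"])
        if os.path.isdir(item_dir):
            shutil.rmtree(item_dir)  # start from a clean directory
        os.makedirs(item_dir)
        # Item-specific value: store it on the item, not on self.
        item["item_dir"] = item_dir


if __name__ == "__main__":
    task = PrepareDirectories(tempfile.mkdtemp())
    first, second = {"item_name": "alpha"}, {"item_name": "beta"}
    task.process(first)
    task.process(second)
    print(first["item_dir"])
    print(second["item_dir"])
```

If process() assigned self.item_dir instead, processing a second item would overwrite that value while the first item might still need it.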

To run a pipeline file, run the command:

run-pipeline pipeline.py YOUR_NICKNAME

For more information, consult the seesaw-kit wiki.

