Dev/Seesaw
Writing a Seesaw Client
A Seesaw client is a specific set of tasks that must be performed on each item. Think of it as a template of instructions. Typically, the file is called pipeline.py, and it uses the Seesaw library.
The pipeline file typically uses Wget with Lua scripting. The Lua script is passed as an argument to Wget within the pipeline file. It controls fine-grained operations within Wget, such as rejecting unneeded URLs or adding more URLs as they are discovered.
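As a rough illustration of how the Lua script reaches Wget, the sketch below builds an argument list in Python. It assumes the Lua-enabled Wget accepts a --lua-script option, as recent grab scripts appear to use; the helper name build_wget_args and all file names and URLs here are hypothetical.

```python
def build_wget_args(wget_lua_path, lua_script, url, warc_prefix):
    """Return an argument list that runs Wget with a Lua hook script.

    Assumption: the Lua-enabled Wget build accepts --lua-script.
    """
    return [
        wget_lua_path,
        "--lua-script", lua_script,   # Lua hooks that reject or add URLs
        "--warc-file", warc_prefix,   # record the crawl into a WARC file
        url,
    ]


# Hypothetical values, for illustration only.
args = build_wget_args("./wget-lua", "example.lua",
                       "http://example.com/", "example-warc")
```

In a real pipeline file, a list like this would be handed to the task that launches Wget rather than built by a standalone helper.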
Take a look at the grab scripts in recent Archive Team repositories for examples of clients.
Installation
You will need:
- Python 2.7
- Lua
- Wget with Lua hooks
Typically, you can install these by running:
sudo apt-get install build-essential lua5.1 liblua5.1-0-dev python python-setuptools python-dev openssl libssl-dev python-pip make
sudo pip install seesaw
You will also need Wget with Lua, which the packages above do not provide. Look in recent repositories for the following script and run it:
./get-wget-lua.sh
The pipeline file
The pipeline file typically includes:
- Copied-and-pasted monkey patches
- A routine to find Wget Lua
- A version number in the form of YYYYMMDD.NN
- Tracker hostname
- Custom Tasks:
  - PrepareDirectories
  - MoveFiles
- Project information saved into the project variable
- Instructions on how to deal with the item saved into the pipeline variable
- An undeclared downloader variable which will be filled in by the Seesaw library
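Two of the pieces listed above, the version number and the routine to find Wget Lua, can be sketched without the Seesaw library. This is a minimal sketch, not what any particular pipeline file does: it assumes NN means exactly two digits, and it assumes a usable binary can be recognised by a substring in the output of the -V option. The names is_valid_version and find_wget_lua, and the VERSION value, are hypothetical.

```python
import re
import subprocess

VERSION = "20240131.01"  # hypothetical example value in YYYYMMDD.NN form

# Assumption: eight digits for the date, a dot, then two digits.
VERSION_PATTERN = re.compile(r"^\d{8}\.\d{2}$")


def is_valid_version(version):
    """Check that a version string looks like YYYYMMDD.NN."""
    return VERSION_PATTERN.match(version) is not None


def find_wget_lua(candidates, expected):
    """Return the first path whose -V output contains `expected`, else None."""
    for path in candidates:
        try:
            proc = subprocess.Popen([path, "-V"],
                                    stdout=subprocess.PIPE,
                                    stderr=subprocess.STDOUT)
            output = proc.communicate()[0].decode("utf-8", "replace")
        except OSError:
            continue  # missing or non-executable candidate: try the next one
        if expected in output:
            return path
    return None
```

A pipeline file would typically stop with an error when no candidate matches, since nothing can be downloaded without the Lua-enabled Wget.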
It is important to remember that each Task is a template for how to deal with each Item. Item-specific variables should not be stored on a Task; they should be saved onto the Item instead.
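The distinction can be illustrated with plain-Python stand-ins. The Item and PrepareDirectories classes below are simplified placeholders written for this example, not the Seesaw library's own classes, and the item_name and item_dir keys are assumptions made for illustration.

```python
import os


class Item(dict):
    """Stand-in for an item: a dict-like container of per-item values."""


class PrepareDirectories(object):
    """Stand-in task: one instance is reused for every item."""

    def __init__(self, data_dir):
        # Shared configuration is safe to keep on the task.
        self.data_dir = data_dir

    def process(self, item):
        # Per-item values are written onto the item, never onto self,
        # so that one item's state cannot leak into the next.
        item["item_dir"] = os.path.join(self.data_dir, item["item_name"])


task = PrepareDirectories("data")
first = Item(item_name="example-1")
second = Item(item_name="example-2")
task.process(first)
task.process(second)
```

After both calls, each item carries its own item_dir, while the task still holds only the shared data_dir setting.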
To run a pipeline file, run the command:
run-pipeline pipeline.py YOUR_NICKNAME
For more information, consult the seesaw-kit wiki.