Difference between revisions of "User:Djsmiley2k"
Revision as of 09:39, 25 June 2013
Stuff
- Need to figure out the full wiki/site layout - currently everything is a giant mishmash
- Will set fire to anyone who breaks the nice design changes
- While HTML in pages can make them look "nice", it's ****ing annoying to try and edit nicely if you're not an HTML expert - look into converting to proper MediaWiki markup instead
- Can we get some templates for projects (what is a project!?) / archive tasks / other crap
Generic Wget command
export USER_AGENT="Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US) AppleWebKit/533.20.25 (KHTML, like Gecko) Version/5.0.4 Safari/533.20.27"
export SAVE_HOST=""
export WARC_NAME=""
wget \
  -e robots=off --mirror --page-requisites \
  --waitretry 5 --timeout 60 --tries 5 --wait 1 \
  --warc-header "operator: Archive Team" --warc-cdx --warc-file="$WARC_NAME" \
  -U "$USER_AGENT" "$SAVE_HOST"
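WARC_NAME is left blank above, so here is a rough sketch of one way to fill it in from the URL being grabbed. The helper name `warc_name_for` and the `at-<host>` naming scheme are my own assumptions (loosely modelled on the "at-team17-forum" name used further down), not an Archive Team convention.

```shell
# Hypothetical helper: derive a WARC file name from a URL's host part.
warc_name_for() {
    # Drop the scheme (http://, https://) if there is one.
    host=${1#*://}
    # Drop the path, keeping only host[:port].
    host=${host%%/*}
    # Replace dots and colons so the name is filesystem-friendly.
    printf 'at-%s\n' "$host" | sed 's/[.:]/-/g'
}
```

Usage would be something like `export WARC_NAME="$(warc_name_for "$SAVE_HOST")"` before running the wget command.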
Forum Grab
src/wget --save-cookies team17-cookies.txt --post-data 'vb_login_username=USERNAMEGOESHERE&vb_login_password=PASSWORDGOESHERE&securitytoken=guest&cookieuser=1&do=login' "http://forum.team17.com/login.php?do=login"
src/wget --load-cookies team17-cookies.txt -e robots=off --wait 0.25 "http://forum.team17.com/" --mirror --warc-file="at-team17-forum"
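A username or password containing characters like `&`, `=` or a space will break the --post-data string unless it is percent-encoded first. Below is a minimal POSIX sh sketch of an encoder; the function name `urlencode` is my own, and it encodes byte by byte, so multi-byte characters come out as several %XX groups.

```shell
# Hypothetical helper: percent-encode a string for use in --post-data.
urlencode() {
    # Dump the input as one hex byte per line (-v stops od from
    # collapsing repeated lines into "*").
    printf '%s' "$1" | od -An -v -tx1 | tr -s ' ' '\n' | while read -r hex; do
        [ -n "$hex" ] || continue
        case "$hex" in
            # 0-9, A-Z, a-z, and the unreserved - . _ ~ pass through as-is.
            3[0-9]|4[1-9a-f]|5[0-9a]|6[1-9a-f]|7[0-9a]|2d|2e|5f|7e)
                printf "\\$(printf '%03o' "0x$hex")" ;;
            # Everything else becomes %XX with uppercase hex digits.
            *)
                printf '%%%s' "$(printf '%s' "$hex" | tr 'a-f' 'A-F')" ;;
        esac
    done
}
```

The login line could then use `--post-data "vb_login_username=$(urlencode "$FORUM_USER")&vb_login_password=$(urlencode "$FORUM_PASS")&securitytoken=guest&cookieuser=1&do=login"`, where FORUM_USER and FORUM_PASS are placeholder variables.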
Limit Warrior bandwidth
VBoxManage bandwidthctl archiveteam-warrior-2 --name Limit --add network --limit 3
Must be done while VM is powered off - can't be done with saved state. :(
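Since the limit only sticks when the VM is fully powered off, a small check before running bandwidthctl can save a failed attempt. This sketch assumes the `VMState="..."` line that `VBoxManage showvminfo <vm> --machinereadable` prints; the function names are made up for illustration.

```shell
# Read `VBoxManage showvminfo <vm> --machinereadable` output on stdin
# and print the value of the VMState line (e.g. poweroff, saved, running).
vm_state() {
    sed -n 's/^VMState="\(.*\)"$/\1/p'
}

# Succeed only when the VM is fully powered off (a saved state does not count).
can_set_bandwidth() {
    [ "$(vm_state)" = "poweroff" ]
}
```

For example: `VBoxManage showvminfo archiveteam-warrior-2 --machinereadable | can_set_bandwidth && echo "ok to set the limit"`.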
Remote warrior control
Either ssh forward to local system:
ssh -L 8001:localhost:8001 tim.bowers@xxx.xxx.xxx.xxx -f -N
Or select a project through the warrior's web API:
curl -d "project_name=punchfork" http://localhost:8001/api/select-project
New Versions
Build your own EC2 ami/instance
Select whichever instance type you want - this was built on Ubuntu 13.04 on the lowest tier (free!)
Log in via SSH (on Ubuntu you log in as ubuntu)
First we need to set up the basic system
sudo apt-get install build-essential lua5.1 liblua5.1-0-dev python python-setuptools python-dev git-core openssl libssl-dev python-pip rsync gcc make git screen
Then we need the seesaw kit, which is used for the grabbing parts
sudo git clone https://github.com/ArchiveTeam/seesaw-kit.git
cd ./seesaw-kit
sudo pip install -r requirements.txt
Now we move onto the project specific stuff, for xanga we'd do:
cd ..
sudo git clone https://github.com/ArchiveTeam/xanga-grab.git
cd ./xanga-grab
./get-wget-lua.sh ### building wget-lua
And finally, we start the pipeline in a screen session
screen ../seesaw-kit/run-pipeline --concurrent 3 pipeline.py YOURNICKNAME
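The project-specific steps are the same for every grab apart from the repo name, so they could be wrapped up roughly like this. `run`, `setup_project` and the DRY_RUN switch are names I've invented for the sketch; with DRY_RUN=1 it only prints what it would do, and it assumes every project repo ships a get-wget-lua.sh like xanga-grab does.

```shell
# Print the command instead of running it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

# Hypothetical wrapper: clone an ArchiveTeam project repo into the
# current directory and build wget-lua inside it.
setup_project() {
    project=$1
    run sudo git clone "https://github.com/ArchiveTeam/${project}.git" || return 1
    run cd "./${project}" || return 1
    run ./get-wget-lua.sh || return 1
}
```

Run it from the directory that holds seesaw-kit, e.g. `setup_project xanga-grab`, and then start the pipeline as shown above.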
Important URLs
EC2 Instance setups
debian-squeeze-i386-warrior (ami-9c69f1f5)
User Text: {"downloader": "Smiley", "selected_project": "posterous", "concurrent_items": "6", "shared:rsync_threads": "4"}
Add second disk - 10 GB
Open port 22 0.0.0.0/0
Set up SSH forwarding: ssh -i ./.ssh/amazonkey.pem -N -f -L 8002:localhost:8001 ubuntu@***********.compute-1.amazonaws.com
Set automatic shutdown: echo "0 20 * * * root /sbin/shutdown -h now" | sudo tee /etc/cron.d/shutdown
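The cron entry above reads as: minute 0, hour 20, every day, run as root. If a different shutdown hour is wanted, something like the following could generate the line; `shutdown_cron_line` is a made-up helper name, and the hour is in whatever time zone the instance's cron uses.

```shell
# Hypothetical helper: print an /etc/cron.d entry that halts the
# machine every day at the given hour (0-23), on the hour.
shutdown_cron_line() {
    hour=$1
    # Reject empty or non-numeric input.
    case "$hour" in
        ''|*[!0-9]*) echo "hour must be a number" >&2; return 1 ;;
    esac
    if [ "$hour" -gt 23 ]; then
        echo "hour must be between 0 and 23" >&2
        return 1
    fi
    printf '0 %s * * * root /sbin/shutdown -h now\n' "$hour"
}
```

For example: `shutdown_cron_line 20 | sudo tee /etc/cron.d/shutdown`.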
Digital Ocean
sign up for DO -> use SSDTWEET code -> make a $10 payment -> unleash 500 instances upon the world
apt-get update && \
  apt-get -y install git make python-pip libgnutls-dev liblua5.1-dev && \
  pip install seesaw && \
  git clone https://github.com/ArchiveTeam/yahoomessages-grab.git && \
  cd yahoomessages-grab/ && \
  ./get-wget-lua.sh && \
  run-pipeline pipeline.py --disable-web-server Smiley