Difference between revisions of "User:Djsmiley2k"
Latest revision as of 10:41, 21 October 2013
Stuff
- Need to figure out the full wiki/site layout - currently everything is a giant mishmash
- Will set fire to anyone who breaks the nice design changes
- While html in pages can make them look "nice" it's ****ing annoying to try and edit nicely if you're not an html expert - look into converting into proper mediawiki markup instead
- Can we get some templates for projects (what is a project!?) / archive tasks / other crap
Generic Wget command
export USER_AGENT="Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US) AppleWebKit/533.20.25 (KHTML, like Gecko) Version/5.0.4 Safari/533.20.27"
export SAVE_HOST=""
export WARC_NAME=""
wget \
  -e robots=off --mirror --page-requisites \
  --waitretry 5 --timeout 60 --tries 5 --wait 1 \
  --warc-header "operator: Archive Team" --warc-cdx --warc-file="$WARC_NAME" \
  -U "$USER_AGENT" "$SAVE_HOST"
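SAVE_HOST and WARC_NAME are left empty above and have to be filled in per site. A minimal sketch of one way to do that, assuming you want the WARC named after the host plus a date stamp. The helper name warc_name_for and the example URL are made up for illustration, not part of wget or any Archive Team tool:

```shell
# Hypothetical helper: derive a WARC base name from a URL and a date stamp.
# Runs in a subshell ( ... ) so its variables do not leak into the caller.
warc_name_for() (
    url="$1"
    stamp="$2"
    host="${url#*://}"    # drop the scheme, if there is one
    host="${host%%/*}"    # drop everything after the host
    printf '%s-%s\n' "$host" "$stamp"
)

# Example values only; substitute the site you are actually saving.
SAVE_HOST="http://example.com/"
WARC_NAME="$(warc_name_for "$SAVE_HOST" "$(date +%Y%m%d)")"
export SAVE_HOST WARC_NAME
echo "Would write ${WARC_NAME}.warc.gz"
```

With those two variables exported, the wget command above can be run unchanged.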
Forum Grab
src/wget --save-cookies team17-cookies.txt --post-data 'vb_login_username=USERNAMEGOESHERE&vb_login_password=PASSWORDGOESHERE&securitytoken=guest&cookieuser=1&do=login' "http://forum.team17.com/login.php?do=login"
src/wget --load-cookies team17-cookies.txt -e robots=off --wait 0.25 "http://forum.team17.com/" --mirror --warc-file="at-team17-forum"
Limit Warrior b/w
VBoxManage bandwidthctl archiveteam-warrior-2 --name Limit --add network --limit 3
Must be done while VM is powered off - can't be done with saved state. :(
Remote warrior control
Either ssh forward to local system:
ssh -L 8001:localhost:8001 tim.bowers@xxx.xxx.xxx.xxx -f -N
OR
curl -d "project_name=punchfork" http://localhost:8001/api/select-project
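The curl call above can be wrapped so the project and the local port are parameters, which helps when several warriors are forwarded to different ports. This is only a sketch: the function name select_project and the DRY_RUN switch are made up here, and the endpoint is the one from the curl line above.

```shell
# Hypothetical wrapper around the select-project call shown above.
# Usage: select_project PROJECT [LOCAL_PORT]
# Set DRY_RUN=1 to print the command instead of sending it.
select_project() (
    project="$1"
    port="${2:-8001}"    # default to the warrior's usual port
    url="http://localhost:${port}/api/select-project"
    if [ -n "${DRY_RUN:-}" ]; then
        printf 'curl -d project_name=%s %s\n' "$project" "$url"
    else
        curl -d "project_name=${project}" "$url"
    fi
)

# Show what would be sent to a warrior forwarded to local port 8002.
DRY_RUN=1 select_project punchfork 8002
```

Without DRY_RUN the function sends the same POST as the one-liner, so the ssh forward has to be up first.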
New Versions
Build your own EC2 ami/instance
Select whichever instance type you want - this was built on Ubuntu 13.04 on the lowest tier (free!)
Log in via ssh (on Ubuntu you log in as ubuntu)
First we need to set up the basic system
sudo apt-get install build-essential lua5.1 liblua5.1-0-dev python python-setuptools python-dev git-core openssl libssl-dev python-pip rsync gcc make git screen
Then we need the seesaw kit, which is used for the grabbing parts
sudo git clone https://github.com/ArchiveTeam/seesaw-kit.git
cd ./seesaw-kit
sudo pip install -r requirements.txt
Now we move on to the project-specific stuff. For xanga we'd do:
cd ..
sudo git clone https://github.com/ArchiveTeam/xanga-grab.git
cd ./xanga-grab
./get-wget-lua.sh ### building wget-lua
And finally, we start the pipeline in a screen session
screen ../seesaw-kit/run-pipeline --concurrent 3 pipeline.py YOURNICKNAME
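Before starting the pipeline it can save some head-scratching to check that the tools from the apt-get step actually ended up on the PATH. A rough sketch, assuming a POSIX shell; the function name missing_tools is made up, and the list of commands to check is a guess at what the steps above rely on:

```shell
# Hypothetical pre-flight check: print each named command that is not on PATH.
missing_tools() (
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            printf '%s\n' "$tool"
        fi
    done
)

# Commands the build steps above appear to rely on (adjust to taste).
missing="$(missing_tools git make gcc pip rsync screen)"
if [ -n "$missing" ]; then
    echo "Still missing:" $missing
else
    echo "All tools found"
fi
```

Anything it lists needs installing before get-wget-lua.sh or run-pipeline has a chance of working.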
Important URLs
Is the rsync host up? http://isup.me/fos.textfiles.com
EC2 Instance setups
debian-squeeze-i386-warrior (ami-9c69f1f5)
User Text: {"downloader": "Smiley", "selected_project": "posterous", "concurrent_items": "6", "shared:rsync_threads": "4"}
Add second disk - 10 GB
Open port 22 0.0.0.0/0
Setup SSH forwarding: ssh -i ./.ssh/amazonkey.pem -N -f -L 8002:localhost:8001 ubuntu@***********.compute-1.amazonaws.com
Set automatic shutdown : echo "0 20 * * * root /sbin/shutdown -h now" | sudo tee /etc/cron.d/shutdown
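The User Text and the shutdown cron entry are the two bits that change between instances, so they can be generated rather than retyped. A small sketch with made-up helper names (warrior_user_data, shutdown_cron_line); the JSON keys and the cron line are copied from the notes above, and nothing is escaped, so only feed it plain values:

```shell
# Hypothetical helper: build the User Text JSON from four plain values.
# Arguments: downloader, project, concurrent items, rsync threads.
warrior_user_data() (
    printf '{"downloader": "%s", "selected_project": "%s", "concurrent_items": "%s", "shared:rsync_threads": "%s"}\n' \
        "$1" "$2" "$3" "$4"
)

# Hypothetical helper: build the /etc/cron.d line for a daily shutdown.
# Argument: hour of day (0-23), read by cron in the instance's own time zone.
shutdown_cron_line() (
    printf '0 %s * * * root /sbin/shutdown -h now\n' "$1"
)

warrior_user_data Smiley posterous 6 4
shutdown_cron_line 20
```

The second line of output is what gets piped into sudo tee /etc/cron.d/shutdown in the step above.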
Digital Ocean
sign up for DO -> use SSDTWEET code -> make a $10 payment -> unleash 500 instances upon the world
apt-get update && apt-get -y install git make python-pip libgnutls-dev liblua5.1-dev && pip install seesaw && git clone https://github.com/ArchiveTeam/yahoomessages-grab.git && cd yahoomessages-grab/ && ./get-wget-lua.sh && run-pipeline pipeline.py --disable-web-server Smiley