Wiki-Site

A screen shot of the home page taken on 10 April 2012.
URL: http://www.wiki-site.com
Status: Online!
Archiving status: Not saved yet
Archiving type: Unknown
IRC channel: #wikiteam (on hackint)

Wiki-Site is a wikifarm.

For a list of wikis hosted on this wikifarm, see: https://code.google.com/p/wikiteam/source/browse/trunk/listsofwikis

This farm offers an API, but it applies very strict throttling and captchas and is also quite slow; the best results so far have been obtained with a --delay of 200 seconds.
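
For a single wiki, a minimal sketch of such a throttled dump could look like the following; the api.php URL and the dump directory are placeholders, and the dumpgenerator.py flags are the ones used in the patch further down.

import subprocess

# Placeholder wiki and dump directory; substitute the wiki you are saving.
wiki = 'http://somewiki.wiki-site.com/api.php'
wikidir = 'somewiki_wiki-site_com'

# --delay=200 keeps about 200 seconds between requests, the value that has
# worked best against this farm's throttling so far.
subprocess.call('./dumpgenerator.py --api=%s --delay=200 --xml --images --resume --path=%s'
                % (wiki, wikidir), shell=True)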

If you find yourself in a loop of HTTP errors, stop everything for 3600 seconds or more before restarting (see the restart sketch after the patch below).

Do not run uploader.py at the same time, and do not visit or query the website from the same machine while a dump is running!

Command

Concretely, you can apply something like the following patch and then run while true; do python launcher.py wiki-site.com ; sleep 3600s; done inside a screen session. Be ready to make further local patches to reduce requests (e.g. to saveIndexPHP() and friends) and to work around blocks.

73,77c72,73
<
<     # time.sleep(60)
<     # Uncomment what above and add --delay=60 in the dumpgenerator.py calls below for broken wiki farms
<     # such as editthis.info, wiki-site.com, wikkii (adjust the value as needed;
<     # typically they don't provide any crawl-delay value in their robots.txt).
---
>
>     time.sleep(400)
80c76
<         subprocess.call('./dumpgenerator.py --api=%s --xml --images --resume --path=%s' % (wiki, wikidir), shell=True)
---
>         subprocess.call('./dumpgenerator.py --api=%s --delay=200 --xml --images --resume --path=%s' % (wiki, wikidir), shell=True)
82c78
<         subprocess.call('./dumpgenerator.py --api=%s --xml --images' % wiki, shell=True)
---
>         subprocess.call('./dumpgenerator.py --api=%s --delay=200 --xml --images' % wiki, shell=True)
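
If you prefer to keep the restart loop in Python rather than as a shell one-liner, a minimal sketch is shown below. The launcher.py command line is the one given above; treating a non-zero exit code as a sign that the run was cut short is only an assumption (launcher.py does not promise this), so the sleep happens after every run regardless.

import subprocess
import time

# Restart launcher.py forever, mirroring the shell loop
# `while true; do python launcher.py wiki-site.com ; sleep 3600s; done`.
while True:
    exit_code = subprocess.call('python launcher.py wiki-site.com', shell=True)
    # Always pause at least an hour between runs; if the run appears to have
    # been cut short (assumed here to show up as a non-zero exit code, which
    # launcher.py does not guarantee), wait even longer before retrying.
    time.sleep(3600 if exit_code == 0 else 7200)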

See also

External links