From Archiveteam

Archiving vBulletin forums (tested only with http://boards.cityofheroes.com/; you may have to change some things for other forums):

1. Get a recent Wget+Lua version (it should include WARC support).

2. Get the vbulletin.lua script: https://raw.github.com/ArchiveTeam/cityofheroes-grab/master/vbulletin.lua

3. Collect the forum IDs (the f= parameter in the URLs) of the forums and subforums. Some pages have a "Forum Jump" dropdown list that gives you the numbers.

4. Run Wget with the Lua script and seed it with the forum URLs. Start with the URL to /external.php?type=RSS2 to get a session cookie (a session cookie is needed to keep the session ID out of the URLs).

The Lua script navigates the forum pages: it follows pagination links and goes from forumdisplay pages to threads, and from threads to posts and members. Use --page-requisites and --span-hosts to get the images. When preparing the seed URLs, be aware that the Lua script only crawls from forum to thread to post/member. It does not, for example, jump from one forum to another or from a thread back to its forum, so every forum and subforum you want must be listed as a seed URL.
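Steps 3 and 4 can be sketched in shell as follows. This is only an illustration, not part of the cityofheroes-grab scripts: the function names extract_forum_ids and make_seed_urls are made up here, and the first one assumes the forum IDs appear in links of the form forumdisplay.php?...f=N in a saved HTML page (other vBulletin versions may expose them differently, for example only as option values in the "Forum Jump" list).

```shell
# Hypothetical helper: read HTML on stdin and print the unique numeric
# forum IDs found in forumdisplay.php links (assumed link format).
extract_forum_ids() {
    grep -oE 'forumdisplay\.php\?[^"[:space:]]*f=[0-9]+' \
        | grep -oE 'f=[0-9]+$' \
        | cut -d= -f2 \
        | sort -nu
}

# Hypothetical helper: print the seed URLs for a forum.
# $1 is the base URL; the remaining arguments are forum IDs.
# The RSS URL comes first so that Wget gets its session cookie
# before it requests any forumdisplay page.
make_seed_urls() {
    base="$1"
    shift
    printf '%s/external.php?type=RSS2\n' "$base"
    for id in "$@"; do
        printf '%s/forumdisplay.php?f=%s\n' "$base" "$id"
    done
}
```

For instance, `make_seed_urls http://boards.cityofheroes.com $(extract_forum_ids < index.html) > seeds.txt` would write a seed list from a saved index.html (a placeholder file name). Such a list could then be passed to Wget with --input-file seeds.txt instead of listing the URLs on the command line.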

For example, this works for the City of Heroes forums:

./wget-lua \
      -U "$USER_AGENT" \
      -nv \
      -o wget.log \
      --directory-prefix files/ \
      --keep-session-cookies \
      --save-cookies cookies.txt \
      --force-directories \
      --adjust-extension \
      -e "robots=off" \
      --page-requisites --span-hosts \
      --lua-script vbulletin.lua \
      --timeout 10 \
      --tries 3 \
      --waitretry 5 \
      --warc-file forum \
      --warc-header "operator: Archive Team" \
      "http://boards.cityofheroes.com/external.php?type=RSS2" \
      "http://boards.cityofheroes.com/forumdisplay.php?f=547" \
      "http://boards.cityofheroes.com/forumdisplay.php?f=569" \
      "http://boards.cityofheroes.com/forumdisplay.php?f=660"

Here is a list of some old forums (many of which run vBulletin): http://web.archive.org/web/20061229181451/http://rankings.big-boards.com/?p=all

A trivial way to archive vBulletin forums (with recent vBulletin software) is to run a single for loop across all the post IDs. For example, for Physics Forums, loop from https://www.physicsforums.com/posts/1 to https://www.physicsforums.com/posts/5223616.
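A minimal sketch of such a loop is shown below. The function name post_urls is made up for this example, and it only prints the URLs; how you fetch them, and how fast, is up to you.

```shell
# Hypothetical helper: print one post URL per line.
# $1 is the base URL, $2 the first post ID, $3 the last post ID.
post_urls() {
    base="$1"
    i="$2"
    last="$3"
    while [ "$i" -le "$last" ]; do
        printf '%s/posts/%s\n' "$base" "$i"
        i=$((i + 1))
    done
}
```

One possible use is `post_urls https://www.physicsforums.com 1 5223616 | wget --input-file=- --warc-file posts --wait 1`, which reads the URL list from standard input. The --wait value and the WARC file name are arbitrary choices here, not tested settings.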

See also