Cyberpunkreview.com

A screenshot of the home page taken on 10 April 2012.
URL	http://cyberpunkreview.com
Status	Online!
Archiving status	Not saved yet
Archiving type	Unknown
IRC channel	#archiveteam-bs (on hackint)

Cyberpunkreview.com is a Web site that reviews cyberpunk films, music, art, games and more.

Overview

The site does not appear to be inactive. Apart from a hiatus of roughly two months between October and December, it is in good shape. It used to host a very popular forum for cyberpunks at http://cyberpunkreview.com/forums/index.php, but the forum recently moved to a separate domain, http://cyberpunkforums.com/, which is much more active. I have a penchant for anything cyberpunk and will keep an eye on the site.

It looks like a typical WordPress install. I'm not sure whether we have a standard procedure for scraping those. Aggroskater 08:10, 19 March 2012 (EDT)

Mirroring

Currently in the process of mirroring with the following command set:

$ wget https://raw.github.com/ArchiveTeam/fortunecity/master/get-wget-warc.sh
$ chmod 0755 get-wget-warc.sh
$ ./get-wget-warc.sh
$ ./wget-warc --no-parent --no-clobber --html-extension --recursive --convert-links --page-requisites -e robots=off -w 5 --random-wait --warc-file=cpr http://cyberpunkreview.com/

Please email me or contact me in the #archiveteam channel if there's a better way of doing this. I've basically taken the guideline from the Software page and applied it to alard's WARC-enabled wget build. The larger projects seem to have a more systematic way of grabbing terabytes of data; since this is a relatively small site, I don't know whether that's necessary. --Aggroskater 06:58, 13 April 2012 (EDT)

Friday April 20 2012 Update

Mirror complete. The mirror appears fully operational on my machine. The WARC.gz file is 385 megabytes. What's next? --Aggroskater 02:54, 20 April 2012 (EDT)
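Before moving on, the archive can be sanity-checked with standard tools. This is a minimal sketch, not an established ArchiveTeam procedure: check_warc is a hypothetical helper name, and it assumes the output file is cpr.warc.gz (wget appends .warc.gz to the --warc-file prefix). It tests the gzip stream and tallies the WARC-Type of each record:

```shell
# check_warc FILE: hypothetical helper that sanity-checks a gzipped WARC.
# Step 1: `gzip -t` verifies that the whole file decompresses cleanly.
# Step 2: the WARC-Type header of every record is tallied, so you can see
#         how many warcinfo, request, response, etc. records were written.
check_warc() {
    warc="$1"
    if ! gzip -t "$warc"; then
        echo "corrupt or non-gzip file: $warc" >&2
        return 1
    fi
    # WARC header lines end in CRLF, so strip the CR before counting.
    gzip -dc "$warc" |
        grep -a '^WARC-Type:' |
        tr -d '\r' |
        sort |
        uniq -c
}
```

Usage, after sourcing the function: check_warc cpr.warc.gz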

Yeah... so it seems the WordPress site lives at www.cyberpunkreview.com, not just cyberpunkreview.com. I'm running the following to grab the blog itself; it shouldn't take nearly as long. The /wiki and /forums captures from the first run work fine, though.

./wget-warc --no-parent --no-clobber --html-extension --recursive --convert-links --page-requisites --exclude-directories=wiki,forums -e robots=off -w 2 --random-wait --warc-file=cpr-wp-fix http://www.cyberpunkreview.com/
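To confirm that the second run actually picked up the www host, one option is to tally the host names found in the WARC-Target-URI headers of both archives. A minimal sketch: warc_hosts is a hypothetical helper, and the file names assume wget's default of appending .warc.gz to the --warc-file prefix. Request and response records both carry a target URI, so each fetched page is counted roughly twice:

```shell
# warc_hosts FILE...: hypothetical helper that counts WARC records per host,
# based on the WARC-Target-URI header of each record.
warc_hosts() {
    for warc in "$@"; do
        gzip -dc "$warc"
    done |
        grep -a '^WARC-Target-URI:' |
        tr -d '\r' |
        sed -E 's#^WARC-Target-URI: *<?[a-zA-Z]+://([^/:>]+).*#\1#' |
        sort |
        uniq -c
}
```

Usage: warc_hosts cpr.warc.gz cpr-wp-fix.warc.gz. If www.cyberpunkreview.com shows up only in the second file, that matches the diagnosis above.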

Sunday April 22 2012 Update

Running into problems downloading the WordPress portion: wget keeps segfaulting while converting links. I managed to narrow it down to a reproduction case:

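$ wget https://raw.github.com/ArchiveTeam/fortunecity/master/get-wget-warc.sh
$ chmod 0755 get-wget-warc.sh
# Keep the tarball and unpacked source tree ($TARFILE, $TARDIR) instead of deleting them after the build
$ sed -i 's/rm -rf \$TARFILE \$TARDIR\///g' get-wget-warc.sh
# Build with debugging symbols so gdb can map the crash to source lines
$ sed -i 's/.\/configure/CFLAGS="-g" .\/configure/' get-wget-warc.sh
$ ./get-wget-warc.sh
$ cd wget-1.13.4-2582/src
$ gdb ./wget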

...

Reading symbols from /home/preston/cprwp/wget-1.13.4-2582/src/wget...done.
(gdb) set args --html-extension --page-requisites -k -e robots=off --exclude-directories=wiki,forums --reject "*action=print" -w 1 --random-wait --warc-file=cpr-wp-debug http://www.cyberpunkreview.com/movie/upcoming-movies/initial-impressions-review-of-solid-state-society/
(gdb) run

...

Program received signal SIGSEGV, Segmentation fault.
0x0000000000405d56 in convert_links_in_hashtable (downloaded_set=0x679e10, 
    is_css=0, file_count=0x7fffffffdf8c) at convert.c:127
127	          local_name = hash_table_get (dl_url_file_map, u->url);
(gdb) backtrace
#0  0x0000000000405d56 in convert_links_in_hashtable (downloaded_set=0x679e10, 
    is_css=0, file_count=0x7fffffffdf8c) at convert.c:127
#1  0x0000000000405ead in convert_all_links () at convert.c:189
#2  0x0000000000427a62 in main (argc=14, argv=0x7fffffffe2a8) at main.c:1572
(gdb) print 0x679e10
$1 = 6790672
(gdb) print 0x7fffffffdf8c
$2 = 140737488347020
(gdb) info args
downloaded_set = 0x679e10
is_css = 0
file_count = 0x7fffffffdf8c
(gdb) info locals
local_name = 0x67c6d0 "www.cyberpunkreview.com/movie/upcoming-movies/initial-impressions-review-of-solid-state-society/index.html"
u = 0x0
pi = 0x677b80
urls = 0x6bd050
cur_url = 0x69edd0
url = 0x679b70 "http://www.cyberpunkreview.com/movie/upcoming-movies/initial-impressions-review-of-solid-state-society/"
file = 0x67c8a0 "www.cyberpunkreview.com/movie/upcoming-movies/initial-impressions-review-of-solid-state-society/index.html"
i = 0
cnt = 1
file_array = 0x7fffffffdee0

Oh goody, null pointers. info locals shows u = 0x0, and line 127 dereferences u->url, so that looks like the crash. Not sure what to do from here.
