
Revision as of 20:12, 24 September 2014

Wallbase
wallpaper repository

  • URL: http://wallbase.cc
  • Status: Online!
  • Archiving status: In progress...
  • Archiving type: Unknown
  • IRC channel: #archiveteam (on hackint)

wallbase.cc is a store of wallpapers and other high-resolution media typically scraped from chans' /hr, /wg, and /w boards.

WALL YOUR BASE ARE BELONG TO US. I'm... I'm sorry. I'll see myself out.

Overview

Wallbase's forums have gone down; they are indeed unavailable at http://wallbase.cc/forum. This has prompted some concern that the rest of the site may disappear as well. So far, there has been no announcement of a shutdown. However, the site's owner ("Yotoon") is MIA according to the #wallbase Twitter account (apparently run by staff), and the upload function has been disabled.

The staff are working on a project called WallHaven that might serve as a replacement. I don't know what the staff have access to, though presumably they would have all of the metadata themselves. Should extra metadata be grabbed in our scrape?

Work thus far

Page seems to be gone? (2014-09-09)

User pluesch has all images (except the first 10k), all categories (category_id;category_name), all tags (tag_id;tag_name), and all image-to-tag relations (image_id;tag_id1;[...]). He will make them available somewhere as soon as possible.

THIS WILL BE A PROJECT, SCRIPTS ARE BEING WORKED ON. THIS WIKI WILL BE CHANGED VERY SOON.

So far, there is no repo on GitHub; someone should change that. There is, however, a small Ruby script available on the discussion page.

User arkiver is currently scraping as much as he can, and user godane has done a small portion that is available on archive.org (https://archive.org/details/wallpapers.wallbase.cc-rozne-wallpaper-jpg-1-to-100000-20140130). Does anyone plan to implement a Seesaw instance and tracker? Grabbing gigabytes of images adds up quickly; going by the work done thus far, the total backup looks to be about 300 GB.

Site Specifics

The domain implements rate limiting. From my own experience, I seem to recall it also being picky about the Referer header at one point, but a brief examination suggests that may no longer be the case.
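A minimal sketch of a polite fetcher under those observations. The delay value, User-Agent string, and paths are assumptions, not values the site documents, and the Referer check may no longer apply:

```python
import time
import urllib.request

BASE = "http://wallbase.cc"
DELAY_SECONDS = 2.0  # assumption: a conservative fixed delay between requests


def build_request(path, referer=BASE):
    """Build a request that sends a Referer header, in case the site
    still checks it (previously observed behavior; may no longer apply)."""
    req = urllib.request.Request(BASE + path)
    req.add_header("Referer", referer)
    req.add_header("User-Agent", "ArchiveTeam mirror script (example)")
    return req


def fetch_all(paths):
    """Fetch each path in turn, sleeping between requests to stay
    under the site's rate limit."""
    for path in paths:
        req = build_request(path)
        yield urllib.request.urlopen(req).read()
        time.sleep(DELAY_SECONDS)
```

A Seesaw pipeline would wrap something like this per work item; the fixed delay here is the crudest possible throttle.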

Archives

Pluesch's Siterip

If you have enough space, please mirror this. I don't want to be the only one with the data.

Please make your mirrors publicly available if possible.

  • Size: ~1.2 TB

This siterip contains all wallpapers and some meta information.

The meta folder includes:

  • all categories (category_id,category_name)
  • all tags (tag_id,tag_name)
  • all tag-to-category relations (tag_id,category_id1,[...])
  • all image-to-tag relations (image_id,tag_id1,[...])
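Each of these files maps a key id to a variable number of related ids per line. A sketch of a reader for that layout, assuming comma-delimited plain text as described above (file names and sample ids are made up):

```python
import csv
import io


def read_relations(fileobj):
    """Map the first id on each row to the list of ids that follow it,
    e.g. a line "image_id,tag_id1,tag_id2" becomes {image_id: [tag_id1, tag_id2]}."""
    relations = {}
    for row in csv.reader(fileobj):
        if not row:
            continue
        key, *related = row
        relations[int(key)] = [int(r) for r in related]
    return relations


# Hypothetical sample in the image-to-tag format:
sample = io.StringIO("1000,7,42\n1001,42\n")
print(read_relations(sample))  # {1000: [7, 42], 1001: [42]}
```

The same function would work for the tag-to-category file, since both share the "key id, then related ids" shape; the semicolon-delimited variant mentioned earlier would need `csv.reader(fileobj, delimiter=";")`.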

Images 1 to 999 are missing. If you have them, please contact me.