Archiveteam wiki: user contributions for Mutoso (Atom feed, MediaWiki 1.37.1, retrieved 2024-03-28)

= Yuku.com (revision of 2016-05-14, Mutoso) =
<div>{{Infobox project<br />
| title = Yuku.com<br />
| logo = <br />
| image = Www.yuku.com_screencapture.png<br />
| description = <br />
| URL = http://yuku.com<br />
| project_status = {{online}}<br />
| archiving_status = {{inprogress}}<br />
| source = [https://github.com/ArchiveTeam/yuku-grab yuku-grab]<br />
| tracker = [http://tracker.archiveteam.org/yuku yuku]<br />
| irc = archiveteam<br />
}}<br />
<br />
Yuku is an Internet forum host that lets users create forums, each served from its own subdomain of yuku.com. It was originally brought to ArchiveTeam's attention via [[The Classic Horror Film Board]].<br />
<br />
==Structure==<br />
===Forums===<br />
Each forum lives on its own subdomain (example.yuku.com).<br />
* Subforums are numbered sequentially and accessible at example.yuku.com/forums/<forum number><br />
* Topics are likewise sequential: example.yuku.com/topic/<topic number><br />
* Pages within a topic use a query parameter, e.g. example.yuku.com/topic/56108/?page=2<br />
* Each topic also has an RSS feed: example.yuku.com/feed/get/type/rss/source/lead/id/<topic number><br />
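Because both subforum and topic IDs are sequential, a crawler can enumerate candidate URLs directly. A minimal sketch; the subdomain, topic ID, and page count below are placeholders, not values taken from the site:

```python
def topic_urls(subdomain, topic_id, pages=1):
    """Yield the topic URL and, if pages > 1, its paginated views."""
    base = f"http://{subdomain}.yuku.com/topic/{topic_id}"
    yield base
    # Page 1 is the bare topic URL; later pages use the ?page= parameter.
    for page in range(2, pages + 1):
        yield f"{base}/?page={page}"

def topic_rss_url(subdomain, topic_id):
    """Per-topic RSS feed URL, following the scheme described above."""
    return f"http://{subdomain}.yuku.com/feed/get/type/rss/source/lead/id/{topic_id}"
```

A discovery pass would simply walk topic IDs upward until it stops getting hits.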
<br />
===Users===<br />
Users choose their own usernames. Profiles are viewable at <username>.u.yuku.com or <username>.<forum name>.yuku.com; the two may differ depending on how the user registered or whether the registration system changed over time.<br />
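Since a profile may live at either hostname, a discovery pass would want to probe both forms. A tiny hypothetical helper (not part of yuku-grab):

```python
def profile_urls(username, forum=None):
    """Return the candidate profile URLs for a user; both forms may exist."""
    urls = [f"http://{username}.u.yuku.com"]
    if forum is not None:
        # Forum-scoped profile hostname; may or may not match the .u. form.
        urls.append(f"http://{username}.{forum}.yuku.com")
    return urls
```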
<br />
===Images===<br />
All images hosted by Yuku are exposed through the [http://images.yuku.com.s3.amazonaws.com S3 bucket listing], and a complete scrape recording each object key and size is available [https://archive.org/details/images.yuku.com.s3.amazonaws.com here].<br />
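A public S3 bucket listing is paginated XML (the ListBucketResult format): a scraper re-requests with ?marker=<last key> for as long as IsTruncated is true. A sketch of the parsing step only, with fetching left out:

```python
import xml.etree.ElementTree as ET

S3_NS = {"s3": "http://s3.amazonaws.com/doc/2006-03-01/"}

def parse_bucket_listing(xml_text):
    """Return ([(key, size), ...], is_truncated) from one ListBucket page."""
    root = ET.fromstring(xml_text)
    objects = [
        (entry.findtext("s3:Key", namespaces=S3_NS),
         int(entry.findtext("s3:Size", namespaces=S3_NS)))
        for entry in root.findall("s3:Contents", S3_NS)
    ]
    truncated = root.findtext("s3:IsTruncated", default="false",
                              namespaces=S3_NS) == "true"
    return objects, truncated
```

When `truncated` comes back true, the caller requests the next page with the last key seen as the marker.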
<br />
=== Items ===<br />
* Scrape Google [https://raw.githubusercontent.com/chpwssn/yuku-discovery/master/googlescrape-parsed.txt Parsed] [https://raw.githubusercontent.com/chpwssn/yuku-discovery/master/googlescrape-raw.txt Raw]<br />
* Scrape Bing [https://raw.githubusercontent.com/mutoso/yuku-scrape/master/bing/bingscrape-parsed.txt Parsed] [https://raw.githubusercontent.com/mutoso/yuku-scrape/master/bing/bingscrape-raw.txt Raw] (Extracted from the first 20000 pages)<br />
* TODO: Scrape DuckDuckGo [https://raw.githubusercontent.com/mutoso/yuku-scrape/master/ddg/ddgscrape-parsed.txt Parsed] [https://raw.githubusercontent.com/mutoso/yuku-scrape/master/ddg/ddgscrape-raw.txt Raw] ([https://duckduckgo.com/?q=yuku.com Didn't return many results])<br />
* TODO: Scrape Twitter<br />
* TODO: Scrape Reddit<br />
* TODO: Scrape links from MediaWiki wikis<br />
* TODO: Scrape the Open Directory Project<br />
* TODO: Scrape the Common Crawl Index<br />
* TODO: Scrape the Wayback Machine<br />
* TODO: Scrape URLTeam dumps<br />
* TODO: Scrape a list of subdomains from DNSdumpster.com (if applicable)<br />
* pentest-tools.com Subdomain search [https://raw.githubusercontent.com/chpwssn/yuku-discovery/master/subscan.txt Parsed]<br />
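The raw scrape files above are just lists of hit URLs; turning them into a deduplicated subdomain list for the tracker takes a small filter. A sketch (the regex deliberately matches only single-label subdomains, and `www` is skipped):

```python
import re

SUBDOMAIN_RE = re.compile(r"https?://([a-z0-9-]+)\.yuku\.com", re.IGNORECASE)

def extract_subdomains(lines):
    """Collect unique forum subdomains from scraped URLs, ignoring www."""
    found = set()
    for line in lines:
        match = SUBDOMAIN_RE.search(line)
        if match:
            name = match.group(1).lower()
            if name != "www":
                found.add(name)
    return sorted(found)
```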
<br />
{{Navigation box}}</div>

= Canv.as (2014-03-24, Mutoso) =
<div>{{Infobox project<br />
| title = Canvas Networks<br />
| logo = Canvas-beta-logo-medium.png<br />
| image = Canvas homepage screenshot.png<br />
| URL = http://canv.as<br />
| project_status = {{closing}}<br />
| archiving_status = {{inprogress}}<br />
| irc = canvas<br />
| source = https://github.com/ArchiveTeam/canvas-grab<br />
| tracker = http://tracker.archiveteam.org/canvas/<br />
}}<br />
<br />
'''Canvas Networks''' was a website centered on sharing and remixing media, particularly images. The website was founded by the owner of 4chan, Christopher Poole, and backed by Andreessen Horowitz, SV Angel, Lerer Ventures, Founder Collective, and Joshua Schachter. It closed in 2014.<ref>https://en.wikipedia.org/wiki/Canvas_Networks</ref><br />
<br />
== Site Structure == <br />
<br />
* <nowiki>uggcf://pnai.nf/fgngvp/nepuvir_guernq_yvaxf.ugzy</nowiki><br />
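The link above is ROT13-encoded, a convention used on this wiki to keep crawlers from hammering a dying site. It decodes with the standard library:

```python
import codecs

encoded = "uggcf://pnai.nf/fgngvp/nepuvir_guernq_yvaxf.ugzy"
decoded = codecs.decode(encoded, "rot13")
print(decoded)  # https://canv.as/static/archive_thread_links.html
```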
<br />
== How can I help? ==<br />
<br />
The project is not in the [[warrior]] yet.<br />
<br />
You can run the [https://github.com/ArchiveTeam/canvas-grab scripts manually].<br />
<br />
== References ==<br />
<br />
<references/><br />
<br />
== External Links ==<br />
<br />
* {{w|Canvas Networks}}</div>

= Starwars.yahoo.com (2014-02-28, Mutoso: "Fixed broken link") =
<div>{{Infobox project<br />
| title = STARWARS.YAHOO.COM<br />
| image = Starwarszombies1.jpg<br />
| description = <br />
| URL = http://starwars.yahoo.com<br />
| project_status = {{offline}}<br />
| archiving_status = {{saved}}<br />
}}<br />
<br />
'''STARWARS.YAHOO.COM''' was a Star Wars-themed user site containing forums, user pages, and general movie information. Announced as a beta in October 2007, it successfully suckered many users into posting, uploading, and sharing information before Yahoo summarily shut it off with a 30-day notice in December 2009. On December 15, 2009, Yahoo shut the site down completely and redirected all queries to starwars.com.<br />
<br />
''Archive Team'' sprang into action and saved 99.9% of /users/, /forums/, and /links/, plus a smaller percentage of the site's other content. (Most of the general Star Wars information was duplicated on starwars.com anyway; we focused our efforts on user-generated content.)<br />
<br />
Problems encountered:<br />
* Yahoo issued an error 999 after about 30 minutes of fetching from any single IP. We used two approaches to get around this:<br />
** Tor (slow as molasses, but it worked); collected using httrack<br />
** multiple IPs (fast, but needs a large pool of addresses); collected using wget<br />
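The multiple-IP workaround amounts to rotating the source address (or proxy) whenever Yahoo answers with its 999 rate-limit status. A schematic of the rotation logic only, with the HTTP client abstracted behind a hypothetical `fetch(url, proxy)` callable (the real runs used wget and httrack, not this code):

```python
import itertools

def fetch_with_rotation(urls, fetch, proxies):
    """Fetch each URL, moving to the next proxy whenever we get a 999."""
    pool = itertools.cycle(proxies)
    proxy = next(pool)
    results = {}
    for url in urls:
        # Try at most one full pass through the proxy pool per URL.
        for _ in range(len(proxies)):
            status, body = fetch(url, proxy)
            if status != 999:  # 999 was Yahoo's rate-limit response
                break
            proxy = next(pool)
        results[url] = body
    return results
```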
<br />
The tarballs in the archive reflect both archiving methods:<br />
-rw-r--r-- 1 root root 228855239 Dec 15 13:35 starwars.yahoo.com-goekesmi-raw.tar.bz2<br />
-rw-r--r-- 1 root root 36529217 Dec 20 15:53 starwars.yahoo.com-tor.tar.bz2<br />
<br />
This archive of the collected remnants is located [https://archive.org/details/archiveteam-starwars-yahoo here].<br />
<br />
[[Image:Ywosw-200x200.jpg]]<br />
''Just Kidding! Starwars.yahoo.com, 2007-2009''</div>