Archiveteam wiki, user contributions feed [en] for Archive Maniac (retrieved 2024-03-28; MediaWiki 1.37.1)

User talk:Antonizoon, 2015-10-22, Archive Maniac: Replied

----
== 4chan Uploads on IA ==
Uh, hi. I used to be a part of ArchiveTeam, but now I just preserve things on my own. You're a part of Bibliotheca Anonoma, right? Well, I've been helping upload some 4chan (and even 8chan) material to the Internet Archive: YouTube videos, MediaFire uploads, WARC crawls, etc. You might be interested in the "Everything You Need To Know Ever" upload I recently did [https://archive.org/details/EverythingYouNeedToKnowEver here]. Excuse me for this, but I also uploaded Bibliotheca Anonoma's Google Drive account [https://archive.org/details/googledrive-BibliothecaAnonoma here].

I'm mentioning all of this because you may be interested to know that I've been indirectly helping you. I also re-uploaded the archive.moe /sp/ archive at one point, because users reported that your upload was broken. See [https://archive.org/details/archive.moe-sp here]. [[User:Archive Maniac|Archive Maniac]] 18:27, 14 October 2015 (EDT)

:Awesome, thanks for inviting me. I can do a little bit of regex, but I can't code. I'm an expert at searching online, though. [[User:Archive Maniac|Archive Maniac]] 22:58, 21 October 2015 (EDT)
----

Audit2014, 2015-10-02, Archive Maniac: /* WARC */ Added to bulleted list.
<div>We've uploaded a bunch of stuff:
*[https://archive.org/search.php?query=subject:archiveteam subject:archiveteam] = 8,915 items
*[https://archive.org/search.php?query=collection:archiveteam collection:archiveteam] = 41,946 items
*[https://archive.org/search.php?query=NOT%20collection%3A%28archiveteam%29%20AND%20subject%3A%28archiveteam%29 subject:archiveteam AND NOT collection:archiveteam] = 1,561 items

(The third query should eventually return close to zero results.)
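These counts go stale quickly. They can be re-checked against the archive.org advancedsearch endpoint, which reports a total without fetching the items themselves. A minimal sketch using only the Python standard library; the endpoint and the <code>rows=0</code> trick are documented archive.org behaviour, but treat the exact response shape as an assumption:

```python
import json
import urllib.parse
import urllib.request

API = "https://archive.org/advancedsearch.php"

def count_url(query: str) -> str:
    # rows=0 asks the search endpoint for just the total, not the items
    params = {"q": query, "rows": "0", "output": "json"}
    return API + "?" + urllib.parse.urlencode(params)

def count_items(query: str) -> int:
    # Fetch the query and read the reported total number of matches
    with urllib.request.urlopen(count_url(query)) as resp:
        return json.load(resp)["response"]["numFound"]

# Example (network access required):
# count_items("subject:(archiveteam) AND NOT collection:(archiveteam)")
```

Running the third query periodically is an easy way to watch the "should eventually be empty" number shrink.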

Let's go through the list and make sure everything is categorized, has decent metadata, and so on.

Many of our uploads are quite large and have been broken into many items on Archive.org. We'll group them together here and verify each set all at once.

== Things to check ==

; Collection : Are all the related items grouped into a collection?
; Description : Can a visitor figure out what each item represents? Items in a collection don't need to repeat the description of the collection, but it would be nice if they had a sentence or two, plus information about how the item differs from the other items in the collection ("MP3s from earbits.com, files starting with c." from the Earbits items is a good example).
; Inclusion : Are all the related items included in the same collection?
; Categorization : Can a visitor find the item by browsing the collections?
; Cross-references : Can a visitor find other items in a set, starting at any item in the set? Can a visitor find the index of a large set starting from any part of it?
; Indexing : If the item is a collection of sub-items, is one of these sub-items an index of the others? (This is complicated to check for, and to create when it doesn't exist, so we can come back to it after we've checked the rest.)
; Your suggestion here : This list is just off the top of my head.
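Once an item's metadata is in hand (for example via the archive.org metadata API or the third-party <code>internetarchive</code> Python library), the first few checks above can be applied mechanically. A small, hypothetical helper sketching one such pass; the field names <code>description</code> and <code>collection</code> follow the usual archive.org metadata layout:

```python
def audit_item(md: dict, expected_collection: str) -> list:
    """Return a list of audit problems for one item's metadata dict."""
    problems = []
    # Description: can a visitor figure out what the item represents?
    if not md.get("description"):
        problems.append("no description")
    # Inclusion: is the item actually in the expected collection?
    collections = md.get("collection", [])
    if isinstance(collections, str):
        # archive.org returns a bare string when a field has one value
        collections = [collections]
    if expected_collection not in collections:
        problems.append("not in " + expected_collection)
    return problems

# e.g. audit_item({"identifier": "foo"}, "archiveteam")
# flags both a missing description and a missing collection.
```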

== High-level Collections ==
* https://archive.org/details/web
** https://archive.org/details/archiveteam
*** https://archive.org/details/archiveteam-fire
*** https://archive.org/details/archivebot
** https://archive.org/details/wikiteam

== Current Sub-Collections at Archive Team ==

{| class="wikitable sortable"
|-
!Collection
!Status
!Auditor
!Item Count
!Has an Index
!Description of Audit
|-
| '''[https://archive.org/search.php?query=earbits No Category (earbits)]''' || Unaudited || || 98 || Yes || The items are not in a collection. Most items are WARCs; the rest need additional work if anyone is going to be able to find the exact MP3 they want.
|-
| [http://archive.org/details/archiveteam_ptch archiveteam_ptch] || Audited || db48x || 50 || No || The collection has a great description, but no categories. Items in the collection are WARCs. One item is not included in the collection: [https://archive.org/details/deathy-s3-test-ptch deathy-s3-test-ptch].
|-
| [http://archive.org/details/archiveteam_flowerpot archiveteam_flowerpot] || Audited || db48x || 406 || No || The description of the collection is anemic, but each item is well identified.
|-
| [http://archive.org/details/github_files github_files] || Audited || db48x || 1 || No || In pretty bad shape. There is only one item in the collection, and that's only half the data. Was the rest never uploaded? It has no description, keywords, or other metadata. Other GitHub items could be included, such as [https://archive.org/details/archiveteam-github-repository-index-201212 this repository index] and [https://archive.org/search.php?query=ArchiveTeam%20GitHub%20file%20downloads these other file downloads].
|-
| [http://archive.org/details/justintv justintv] || Audited || db48x || 189 || <s>No</s> [http://chfoo-cn.mooo.com/~archiveteam/justintv-index/html/ Partial] [https://github.com/ArchiveTeam/justintv-index (Src)] || Decent description, but no other metadata. There are [https://archive.org/search.php?query=justintv%20and%20-collection%3A%28justintv%29 51 other 'justintv' items], but none of them look to be from us.
|-
| [http://archive.org/details/archiveteam_mochimedia archiveteam_mochimedia] || Audited || db48x || 9 || No || The collection includes Mochi's notice about the shutdown, but no other context. The items are all WARCs, and all have CDXs and JSON indexes, but there's no overall index.

An index can easily be generated from [https://web.archive.org/web/*/http://feedmonger.mochimedia.com/feeds/query/?q=search%3A&limit=81563 this 26 MB JSON file]. --chfoo
|-
| [http://archive.org/details/archivebot archivebot] || Unaudited || || 1070 || Sort of: [http://archive.fart.website/archivebot/viewer/ Viewer] || [[ArchiveBot]]; the viewer doesn't seem to index into crawls, and there's no link from the collection or the items to the viewer (or anywhere else).
|-
| [http://archive.org/details/archiveteam_yahooblogs archiveteam_yahooblogs] and [https://archive.org/details/archiveteam_yahooblog archiveteam_yahooblog] || Audited || db48x || 49 || No || The collection description is just the shutdown notice (and apparently quite a brief one at that) with no other context. Items are all WARCs, and all have CDXs and JSON indexes, but there's no overall index. One item is orphaned in a collection of its own, apparently caused by a typo in the collection name.
|-
| [http://archive.org/details/archiveteam-splinder archiveteam-splinder] || Unaudited || || 53 || || See [[Splinder]]
|-
| [http://archive.org/details/archiveteam-picplz archiveteam-picplz] || Audited || db48x || 141 || Yes || The collection description is just the shutdown message, with no other context. Items are tarballs containing WARCs. There is an index, but it's not a part of the collection ([https://archive.org/download/picplz-00454713-20120603-143400.warc/]). There's also a search page for the index, which is great.
|-
| [http://archive.org/details/archiveteam_puush archiveteam_puush] || Audited || db48x || 1781 || || The collection description is just the shutdown notice, but it's better than average; it includes some context. The items are all WARCs with CDXs, but there's no central index.
|-
| [http://archive.org/details/archiveteam_upcoming archiveteam_upcoming] || Audited || dashcloud1 || 142 || No || The collection description only describes the site, not the items themselves. Individual items have no description of any kind.
|-
| [http://archive.org/details/archiveteam_randomfandom archiveteam_randomfandom] || Audited || dashcloud1 || 42 || Yes || Short collection description, but it has an index, and every collection item is well described. The index is located right on the collection page.
|-
| [http://archive.org/details/archiveteam_antecedents archiveteam_antecedents] || Audited || db48x || 46 || N/A || This collection represents multiple sites, rather than multiple parts of a single large site. The collection description is quite brief, but each item appears to have a paragraph describing what the site is/was, as well as some basic metadata such as keywords. All the items appear to be WARCs with CDXs.
|-
| [http://archive.org/details/archiveteam_jazzhands archiveteam_jazzhands] || Audited || db48x || 443 || No || This one is a collection of items from multiple sites, but those sites are also broken up into multiple items based on when they were scanned. The items have brief descriptions and some keywords, and are WARCs with CDXs. A good way to improve this would be to make a subcollection for each site.
|-
| [http://archive.org/details/archiveteam-mobileme-hero archiveteam-mobileme-hero] || Unaudited || || 4007 || [https://archive.org/download/archiveteam-mobileme-index/mobileme-20120817.html Yes] [https://github.com/ArchiveTeam/mobileme-index (source)] ||
|-
| [http://archive.org/details/archiveteam_myopera archiveteam_myopera] || Audited || dashcloud1 || 155 || No || The collection page has a nice description of the site and the items. The items all appear to be WARCs, and have no descriptions or keywords of any kind.
|-
| [http://archive.org/details/archiveteam_bebo archiveteam_bebo] || Unaudited || [[User:JesseW|JesseW]] || 2867 || || They appear to all be WARCs, most uploaded on the same day; it's not clear whether all of them are in the Wayback Machine. Each item has no description or context.
|-
| [http://archive.org/details/archiveteam_dogster archiveteam_dogster] || Audited || jscott || 55 || ??? || Collection well described. Wayback Machine-ready WARCs, all integrated.
|-
| [http://archive.org/details/hyves hyves] || Unaudited || || 517 || || [[Hyves]]
|-
| [http://archive.org/details/archiveteam_wretch archiveteam_wretch] || Unaudited || || 2163 || || [[Wretch]]; WARCs
|-
| [http://archive.org/details/archiveteam_xanga archiveteam_xanga] || Unaudited || || 454 || || [[Xanga]]; WARCs
|-
| [http://archive.org/details/twitterstream twitterstream] || Unaudited || || 41 || || [[Twitter]]; according to reviews, at least one file is empty.
|-
| [http://archive.org/details/pastebinpastes pastebinpastes] || Unaudited || || 223 || || These are tarballs (usually less than 100 MB), containing each paste in a separate file. Most recently updated on July 1, 2014.
|-
| [http://archive.org/details/archiveteam_zapd archiveteam_zapd] || Unaudited || || 19 || || [[Zapd]]; WARCs
|-
| [http://archive.org/details/archiveteam_patch archiveteam_patch] || Unaudited || || 38 || || [[Patch]]; WARCs
|-
| [http://archive.org/details/archiveteam_posterous archiveteam_posterous] || Unaudited || || 444 || || [[Posterous]]; WARCs
|-
| [http://archive.org/details/archiveteam_greader archiveteam_greader] || Unaudited || || 368 || || [[Google Reader]]; three categories of WARCs: directory, stats, and general. It would probably be good to also put them in separate collections. There is also a [https://archive.org/details/archiveteam_greaderstats_combined combined stats item].
|-
| [http://archive.org/details/archiveteam_ignsites archiveteam_ignsites] || Unaudited || || 81 || || [[IGN]] (needs a link to the archive); each item contains a particular subdomain, with descriptive names. (The [https://archive.org/details/primeblog.ign.com primeblog.ign.com item] needs to be added to the ''archiveteam'' and ''web'' collections.)
|-
| [http://archive.org/details/archiveteam_g4tv_forums archiveteam_g4tv_forums] || Unaudited || || 74 || || ARCs from [[wikipedia:G4 (TV channel)]], mainly from the forum
|-
| [http://archive.org/details/archiveteam-yahoovideo archiveteam-yahoovideo] || Unaudited || || 156 || || [[Yahoo! Video]]; various inconsistencies in naming and categories; some items contain [https://archive.org/details/ARCHIVETEAM-YV-4790761-4799994 zip files], while others contain [https://archive.org/details/ARCHIVETEAM-YV-04980027-04983272 tar files].
|-
| [http://archive.org/details/archive-team-friendster archive-team-friendster] || Unaudited || || 137 || Maybe -> [https://archive.org/details/archiveteam-friendster-index archiveteam-friendster-index] item || [[Friendster]]; early (2011) project, variety of formats
|-
| [http://archive.org/details/archiveteam_formspring archiveteam_formspring] || Unaudited || || 1477 || || [[Formspring]]; WARCs; some duplication in the collection description
|-
| [http://archive.org/details/archiveteam_yahoo_messages archiveteam_yahoo_messages] || Unaudited || || 17 || || [[Yahoo! Messages]]; WARCs; minimal description on the collection, none on the items
|-
| [http://archive.org/details/archiveteam_punchfork archiveteam_punchfork] || Unaudited || || 47 || [https://archive.org/download/archiveteam_punchfork_index/index.html Yes] || [[Punchfork]]; needs a link to the index from the collection description (and item descriptions); three different types of items, with unclear differences
|-
| [http://archive.org/details/yahoo_korea_blogs yahoo_korea_blogs] || Unaudited || || 10 || || WARCs; no item descriptions
|-
| [http://archive.org/details/archiveteam-cinch archiveteam-cinch] || Unaudited || || 20 || No || [[Cinch.fm]]; 10 items, in both WARC and tar formats
|-
| [http://archive.org/details/archiveteam_dailybooth archiveteam_dailybooth] || Unaudited || || 203 || [https://archive.org/download/dailybooth-freeze-frame-index/index.html Yes] || [[DailyBooth]]; the link to the index on the collection page needs adjusting; images seem to be downloadable; individual items lack descriptions
|-
| [http://archive.org/details/archiveteam_weblognl archiveteam_weblognl] || Unaudited || || 26 || No || [[Weblog.nl]]; no English-language description
|-
| [http://archive.org/details/stage6 stage6] || Unaudited || || 790 || || Videos from [[wikipedia:Stage6]]; many seem to be unavailable from IA, due to "issues with the item's content."
|-
| [http://archive.org/details/googlegroups-part2 googlegroups-part2] || Unaudited || || 27 || No || [[Google Groups]]; each item contains a single tar file (ranging in size from 300 MB to over 40 GB); the tar files contain separate zip files for each group, and the zip files contain the actual files. This should probably be grouped with the other grabs of Google Groups.
|-
| [http://archive.org/details/archiveteam-btinternet archiveteam-btinternet] || Unaudited || || 8 || No || WARCs
|-
| [http://archive.org/details/archiveteam-qaudio-archive archiveteam-qaudio-archive] || Unaudited || || 7 || No || Many small WARCs in each item; lengthy explanation in the collection description, none in each item
|-
| [http://archive.org/details/webshots-freeze-frame webshots-freeze-frame] || Unaudited || || 2459 || No || [[Webshots]]; WARCs
|-
| [http://archive.org/details/tabblo-archive tabblo-archive] || Unaudited || || 1806 || Maybe: [https://archive.org/details/tabblo-archive-groups groups] item || [[Tabblo]]; 9 MegaWARCs; the rest of the items are groups of individual accounts as zip files
|-
| [http://archive.org/details/archiveteam-fortunecity archiveteam-fortunecity] || Unaudited || || 55 || [https://archive.org/details/archiveteam-fortunecity-list Yes] || [[FortuneCity]]; 26 "Set" items (each containing a single large tar); also 26 WARC items, and one leftovers item
|-
| [http://archive.org/details/2012-04-30-wikimedia-images-snapshot 2012-04-30-wikimedia-images-snapshot] || Unaudited || Nemo || 148 || Not really || Should become a subcollection of "wikicollections", so that it's next to "wikimediacommons". The "remote" tarballs partially overlap with xowa items nowadays. If a complete mirror of the Your.Org tarballs is desired, we should list it at [https://meta.wikimedia.org/wiki/Mirroring_Wikimedia_project_XML_dumps#Media_tarballs] with some maintenance information. It's not clear whether investing N TB at IA is a priority here, nor whether IA expects WikiTeam to do the uploads instead (in that case, ask Hydriz or Arkiver). Also, the Your.Org dumps are currently blocked on the lack of an rsync server on Wikimedia servers.
|-
| [http://archive.org/details/archiveteam-anyhub archiveteam-anyhub] || Unaudited || || 39 || || [[AnyHub]]; 18 WARC items, 18 tar items, and one called the "Blue Collection"
|-
| [http://archive.org/details/archiveteam-fileplanet archiveteam-fileplanet] || Unaudited || || 675 || || [[FilePlanet]]
|-
| [http://archive.org/details/archiveteam-umich-save archiveteam-umich-save] || Unaudited || || 52 || ||
|-
| [http://archive.org/details/archiveteam-geocities archiveteam-geocities] || Unaudited || || 12 || || [[Geocities]]
|-
| [http://archive.org/details/archiveteam-fire archiveteam-fire] || Unaudited || || 7135 || || A vast and miscellaneous collection; needs quite a bit of TLC. (The [http://archive.org/details/www.asiatorrents.me-subtitle-1-to-38406-20141205 www.asiatorrents.me-subtitle-1-to-38406-20141205 item] needs to be added to the ''archiveteam'' and ''web'' collections.)
|-
| [http://archive.org/details/archiveteam-mypodcast archiveteam-mypodcast] || Unaudited || || 383 || || Each item is a separate podcast, containing individual sound files, playable through the IA interface; there is also a [https://archive.org/download/archiveteam-mypodcast-dataonly misc] item
|-
| [http://archive.org/details/archiveteam-googlegroups archiveteam-googlegroups] || Unaudited || [[User:JesseW|JesseW]] || 1,348 || Partial (each item has a list of groups, but there's no overall list) || [[Google Groups]]; this is divided into items by the initial two letters (or digits or underscore). The item for "[https://archive.org/details/archiveteam-googlegroups-th th]" has an inconsistent title and category.
|-
| isohunt dumps [https://archive.org/details/isohunt.teapot.2013 1] [https://archive.org/details/isohunt.croissant.2013 2] [https://archive.org/details/isohunt.coffeepot.2013 3] || Unaudited || || 3 || No || These are not yet in a dedicated collection, and have never been post-processed. Some of the .torrent files may actually be error pages. This needs work, and proper full auditing.
|-
| '''[https://archive.org/search.php?query=streetfiles No Category (streetfiles)]''' || Unaudited || || || ||
|-
| [https://archive.org/details/archiveteam_yahoovoices archiveteam_yahoovoices] || Unaudited || || 30 || No || [[Yahoo! Voices]]; WARCs
|-
| [https://archive.org/details/archiveteam_twitchtv archiveteam_twitchtv] || Unaudited || || 2213 || [http://chfoo-cn.mooo.com/~archiveteam/twitchtv-index/html/ Yes] [https://github.com/ArchiveTeam/twitchtv-index/ (source)] || [[Twitch.tv]]
|-
| [https://archive.org/details/archiveteam_fotopedia archiveteam_fotopedia] || Unaudited || || 40 || || [[Fotopedia]]; WARCs
|-
| [https://archive.org/details/archiveteam_canvas archiveteam_canvas] || Unaudited || || 47 || || [[Canv.as]]; WARCs
|-
| [https://archive.org/details/archiveteam_ancestry archiveteam_ancestry] || Unaudited || || 82 || || [[Ancestry.com]]; WARCs
|}
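Several rows above (bebo, justintv) note uncertainty about whether uploaded WARCs ever made it into the Wayback Machine. The public Wayback CDX API can spot-check individual URLs. A sketch of building such a query with the standard library; the endpoint is real, but verify the response shape before relying on it:

```python
import urllib.parse

CDX = "https://web.archive.org/cdx/search/cdx"

def cdx_query_url(url: str, limit: int = 5) -> str:
    # output=json yields a header row followed by one row per capture;
    # a small limit keeps the spot-check cheap
    params = {"url": url, "output": "json", "limit": str(limit)}
    return CDX + "?" + urllib.parse.urlencode(params)

# Example (network access required):
# urllib.request.urlopen(cdx_query_url("bebo.com"))
```

An empty result for a URL that appears in an item's CDX file is a sign the WARC was never ingested.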

== [[:Category:In_progress|In progress???]] ==

But what happened afterwards? Where are the archives?

* [[BerliOS]]
* [[Deletionpedia]]
* [[Delicious]]
* [[ExtraTorrent]]
* [[Free ProHosting]]
* [[Google Video]]
* [[Ispygames]]
* [[Len Sassaman Project]]
* [[Lulu Poetry]]
* [[Prodigy.net]]
* [[Resedagboken]]
* [[ScreenshotsDatabase.com]]
* [[Spanish Revolution]]: Is this finished?
* [[University of Michigan personal webpages]]
* [[Wallbase]]
* [[Wallhaven]]
* [[Webmonkey]]
* [[Widgetbox]]
* [[Windows Live Spaces]]

== Oddities, Mislocations, and To Do ==

* https://archive.org/search.php?query=earbits The Earbits grab is in the wrong place and needs additional versions.

=== To be moved to a better collection ===

==== Collections ====
* http://archive.org/details/archiveteam_atomicgamer
* http://archive.org/details/archiveteam_layervault
* http://archive.org/details/archiveteam_madden
* http://archive.org/details/archiveteam_tele2
* http://archive.org/details/archiveteam_viddler
* http://archive.org/details/archiveteam_friendfeed
* http://archive.org/details/archiveteam_furaffinity
* http://archive.org/details/archiveteam_lastfm
* http://archive.org/details/archiveteam_toshibadocs

(The items within them also need to be added to the ''archiveteam'' and ''web'' collections.)
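Adding the missing ''archiveteam'' and ''web'' collections is a metadata edit on each item. Note that moving items into collections on archive.org generally needs the uploader's credentials or IA admin help. The sketch below shows the call shape of the third-party <code>internetarchive</code> library (commented out, since it needs credentials); the <code>missing_collections</code> helper is a hypothetical name:

```python
def missing_collections(current, wanted=("archiveteam", "web")):
    """Return the wanted collections an item does not yet carry."""
    if isinstance(current, str):
        # archive.org returns a bare string when a field has one value
        current = [current]
    return [c for c in wanted if c not in current]

# Real usage (requires `pip install internetarchive` and `ia configure`,
# and may still need IA admin help for collection moves):
# from internetarchive import get_item
# item = get_item("archiveteam_atomicgamer")
# current = item.metadata.get("collection", [])
# add = missing_collections(current)
# if add:
#     item.modify_metadata({"collection": list(current) + add})
```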

==== WARC ====

* Anything under https://archive.org/search.php?query=subject%3A%22warcarchives%22
* https://archive.org/details/fenopy-se-fire-grab-2014-12-30-16-38-13
* https://archive.org/details/netszar_com_2015_06
* https://archive.org/details/swipnet-searchengine-crawl-nonrecursive
* https://archive.org/details/swipnet-searchengine-crawl-recursive
* https://archive.org/details/kajaszoszentpeter_hu_2015_06
* https://archive.org/details/warc-hallofshame.gp.co.at
* https://archive.org/details/warc-freakedenough.at
* https://archive.org/details/nintendoukkidsclub-20150608.warc
* https://archive.org/details/warc-9chin
* https://archive.org/details/warcarchive-www.bun23.com
* https://archive.org/details/warchive-www.sotipro.com
* https://archive.org/details/files.hii-tech.com-warc
* https://archive.org/details/www.synthfool.com
* https://archive.org/details/wwwbiologyarizonaedu
* https://archive.org/details/studionyami-com_penfifteen-2012-03-05
* https://archive.org/details/fybertech
* https://archive.org/details/wwwclarkuedu-djoyce-trig-20150608.warc

==== FTP ====
* https://archive.org/details/2014.0102.mail.digipro.rs
* https://archive.org/details/2014.12.ftp.dlink.biz_201501
* https://archive.org/details/2015.01.12.ftp.sunet.sePubOpenBSD

==== Misc ====

* https://archive.org/details/archiveteam-picplz-index
* https://archive.org/details/Posterous.comHostnames
* https://archive.org/details/YahooBlogSitemaps20131216071927
* https://archive.org/details/archiveteam-mobileme-index
* https://archive.org/details/ESPNForumsPanicgrab
* https://archive.org/details/rawporter-grab
* https://archive.org/details/bitsnoop-dump
* https://archive.org/details/CaliforniaFinanceLobbyData
* https://archive.org/details/ArchiveteamWarriorV220121008Hyperv
* https://archive.org/details/HowFlickr.comLookedLikeIn2010-APlaceOfWorshipOnFlickr-Photo
* https://archive.org/details/myopera_shutdown_notice
* https://archive.org/details/UsenetSci.space.news2003-2012
* https://archive.org/details/Usenet_rec.food.recipesArchive2003-2012
* https://archive.org/details/MirrorOfSiteOrtodoxiesiviata.blogspot.com
* https://archive.org/details/carti.itarea.org
* https://archive.org/details/ovmk_story
* https://archive.org/details/ti_guidebook_en
* https://archive.org/details/ti_guidebook_fr
* https://archive.org/details/ti_guidebook_de
* https://archive.org/details/myopera_usernames_FIXED.7z
* https://archive.org/details/DubaiWikipediaPageOn2012-09-06
* https://archive.org/details/digpicz-2008-07-30-website
* https://archive.org/details/site-wwwangelfirecomazdixieden
* https://archive.org/details/ArkiverCrawlsPack0004
* https://archive.org/details/ArkiverCrawlsPack0005
* https://archive.org/details/ArkiverCrawlsPack0007
* https://archive.org/details/ArkiverCrawlsPack0008
* https://archive.org/details/laptops-manuals-dump-from-tim.id.au-20121111
* https://archive.org/details/paste_lisp_org
* https://archive.org/details/MtGoxSituationCrisisStrategyDraft
* https://archive.org/details/MtGoxBusinessPlan20142017
* https://archive.org/details/nyt_innovation_2014
* https://archive.org/details/slackware-irc-logs
* https://archive.org/details/thekeep_bbs
* https://archive.org/details/mail.google.com-saved-1Oct2014
* https://archive.org/details/Data2September2013.tar (Gunnerkrigg Court homepage comments snapshots)
* https://archive.org/details/fotodisco-raw-items
* https://archive.org/details/qwikidisco-raw-items
* https://archive.org/details/twitpicdisco-raw-items
* https://archive.org/details/maemo-fremantle-ovi
* https://archive.org/details/toontown_infinite_github_20150103
* https://archive.org/details/amplicate_sitemaps_20140218
* https://archive.org/details/twitch-raw-items
* https://archive.org/details/actionbutton_mini.tar
* https://archive.org/details/ageofnerds_mini
* https://archive.org/details/2015feb06a07FuturamerlinAList
* https://archive.org/details/worldpeacehaven_gmail_Xaa
* https://archive.org/details/worldpeacehaven_gmail_Xab
* https://archive.org/details/2015feb02ob
* https://archive.org/details/2014dec09spe2
* https://archive.org/details/bigougit_mini_v2
* https://archive.org/details/galman33_mini
* https://archive.org/details/urls2015dec02n2
* https://archive.org/details/493nfos
* https://archive.org/details/archiveteam_dev_env_v1_appliances
* https://archive.org/details/Kazbeg_Panorama.jpg -- If tags can be edited by non-owners, this probably shouldn't have the ''archiveteam'' tag.
* https://archive.org/search.php?query=subject%3A%22wallbase%22 -- 10 different items, representing efforts at saving [[wallbase.cc]]; they need to be sorted and organized
* https://archive.org/search.php?query=subject%3A%22aol%20archiveteam%2C%20aol%20files%2C%20aol%20protocol%22 -- 6 items that need their subject tags cleaned up
* https://archive.org/search.php?query=subject%3A%22Tabblo%22%20AND%20NOT%20collection%3Aarchiveteam -- 5 of the 11 Tabblo items are not in the Archiveteam collection
* https://archive.org/details/donkeykongsites
* https://archive.org/details/dogpictbot
* https://archive.org/details/HackerNewsStoriesAndCommentsDump
* https://archive.org/details/flipnote-hatena-dkl3collection -- in the wikiteam collection but not a wiki, so it should be somewhere else
* https://archive.org/details/msdos_Chenard_shareware
* https://archive.org/details/msdos_Spanverb_shareware
* https://archive.org/details/msdos_ADELINE_demo

== Missing ==

* [[Yahoo!_Blog]]: What happened to the Vietnam archives? Does anyone have a copy, or at least a blurry screenshot, of the Korean shutdown notice?

[[Category:Archive Team]]

{{Navigation box}}</div>Archive Maniachttps://wiki.archiveteam.org/index.php?title=Audit2014&diff=24072Audit20142015-08-12T18:26:57Z<p>Archive Maniac: /* WARC */ Saves me time listing them all individually.</p>
<hr />
<div>We've uploaded a bunch of stuff: <br />
*[https://archive.org/search.php?query=subject:archiveteam subject:archiveteam] = 8,845 items<br />
*[https://archive.org/search.php?query=collection:archiveteam collection:archiveteam] = 39,562 items<br />
*[https://archive.org/search.php?query=NOT%20collection%3A%28archiveteam%29%20AND%20subject%3A%28archiveteam%29 subject:archiveteam AND NOT collection:archiveteam] = 1,507<br />
<br />
(The 3rd one should eventually be close to empty.)<br />
<br />
Let's go through the list and make sure it's categorized, has decent metadata, etc.<br />
<br />
Many of our uploads are quite large, and have been broken into many items on Archive.org. We'll group them together here and verify each set all at once.<br />
<br />
== Things to check ==<br />
<br />
; Collection : Are all the related items grouped into a collection?<br />
; Description : Can a visitor figure out what each item represents? Items in a collection don't need to repeat the description of the collection, but it'd be nice if they had a sentence or two, and information about how the item differs from the other items in the collection ("MP3s from earbits.com, files starting with c." from the Earbits items is a good example.)<br />
; Inclusion : Are all the related items included in the same collection?<br />
; Categorization : Can a visitor find the item by browsing the collections?<br />
; Cross-references : Can a visitor find other items in a set, starting at any item in the set? Can a visitor find the index of a large set starting from any part of it?<br />
; Indexing : If the item is a collection of sub-items, is one of these sub-items an index of the others? (This is a complicated thing to check for and to create when it doesn't exist, so we can come back to this after we've checked the rest.)<br />
; Your suggestion here : this is just off the top of my head.<br />
<br />
== High-level Collections ==<br />
* https://archive.org/details/web <br />
** https://archive.org/details/archiveteam<br />
*** https://archive.org/details/archiveteam-fire<br />
*** https://archive.org/details/archivebot<br />
** https://archive.org/details/wikiteam<br />
<br />
== Current Sub-Collections at Archive Team ==<br />
<br />
{| class="wikitable sortable"<br />
|-<br />
!Collection<br />
!Status<br />
!Auditor<br />
!Item Count<br />
!Has an Index<br />
!Description of Audit<br />
|-<br />
| '''[https://archive.org/search.php?query=earbits No Category (earbits)]''' || Unaudited || || 98 || Yes || The items are not in a collection. Most items are WARCs; the rest need additional work if anyone is going to be able to find the exact MP3 they want.<br />
|-<br />
| [http://archive.org/details/archiveteam_ptch archiveteam_ptch] || Audited || db48x || 50 || No || Collection has great description, but no categories. Items in collection are WARCS. One item not included in the collection: [https://archive.org/details/deathy-s3-test-ptch deathy-s3-test-ptch]<br />
|-<br />
| [http://archive.org/details/archiveteam_flowerpot archiveteam_flowerpot] || Audited || db48x || 406 || No || The description of the collection is anemic, but each item is well-identified.<br />
|-<br />
| [http://archive.org/details/github_files github_files] || Audited || db48x || 1 || No || In pretty bad shape: only one item in the collection, and that's only half the data. Was the rest never uploaded? Has no description, keywords, or other metadata. Other GitHub items could be included, such as [https://archive.org/details/archiveteam-github-repository-index-201212 this repository index] and [https://archive.org/search.php?query=ArchiveTeam%20GitHub%20file%20downloads these other file downloads]<br />
|-<br />
| [http://archive.org/details/justintv justintv] || Audited || db48x || 189 || <s>No</s> [http://chfoo-cn.mooo.com/~archiveteam/justintv-index/html/ Partial] [https://github.com/ArchiveTeam/justintv-index (Src)]|| Decent description, but no other metadata. There are [https://archive.org/search.php?query=justintv%20and%20-collection%3A%28justintv%29 51 other 'justintv' items], but none of them look to be from us.<br />
|-<br />
| [http://archive.org/details/archiveteam_mochimedia archiveteam_mochimedia] || Audited || db48x || 9 || No || Collection includes Mochi's notice about the shutdown, but no other context. The items are all WARCs, and all have CDXs and JSON indexes, but there's no overall index.<br />
<br />
Index can be easily generated from [https://web.archive.org/web/*/http://feedmonger.mochimedia.com/feeds/query/?q=search%3A&limit=81563 this 26MB JSON file]--chfoo<br />
|-<br />
| [http://archive.org/details/archivebot archivebot] || Unaudited || || 1070 || Sort of: [http://archive.fart.website/archivebot/viewer/ Viewer] || [[ArchiveBot]]; The viewer doesn't seem to index into crawls; there's no link from the collection or the items to the viewer (or anywhere else)<br />
|-<br />
| [http://archive.org/details/archiveteam_yahooblogs archiveteam_yahooblogs] and [https://archive.org/details/archiveteam_yahooblog archiveteam_yahooblog] || Audited || db48x || 49 || No || Collection description is just the shutdown notice (and apparently quite a brief one at that) with no other context. Items are all WARCs, and all have CDXs and JSON indexes, but there's no overall index. One item is orphaned in a collection of its own; apparently caused by a typo in the collection name. <br />
|-<br />
| [http://archive.org/details/archiveteam-splinder archiveteam-splinder] || Unaudited || || 53 || || See [[Splinder]]<br />
|-<br />
| [http://archive.org/details/archiveteam-picplz archiveteam-picplz] || Audited || db48x || 141 || Yes || The collection description is just the shutdown message, with no other context. Items are tarballs containing WARCs. There is an index, but it's not a part of the collection ([https://archive.org/download/picplz-00454713-20120603-143400.warc/]). There's also a search page for the index, which is great.<br />
|-<br />
| [http://archive.org/details/archiveteam_puush archiveteam_puush] || Audited || db48x || 1781 || || The collection description is just the shutdown notice, but it's better than average; it includes some context. The items are all WARCs with CDXs, but there's no central index.<br />
|-<br />
| [http://archive.org/details/archiveteam_upcoming archiveteam_upcoming] || Audited ||dashcloud1 || 142 || no || The collection description only describes the site, not the items themselves. Individual items have no description of any kind.<br />
|-<br />
| [http://archive.org/details/archiveteam_randomfandom archiveteam_randomfandom] || Audited || dashcloud1 || 42 || yes || Short collection description, but has an index, and every collection item is well described. Index is located right on collection page.<br />
|-<br />
| [http://archive.org/details/archiveteam_antecedents archiveteam_antecedents] || Audited || db48x || 46 || N/A || This collection represents multiple sites, rather than multiple parts of a single large site. The collection description is quite brief, but each item appears to have a paragraph describing what the site is/was, as well as some basic metadata such as keywords. All the items appear to be WARCs with CDXs<br />
|-<br />
| [http://archive.org/details/archiveteam_jazzhands archiveteam_jazzhands] || Audited || db48x || 443 || No || This one is a collection of items from multiple sites, but those sites are also broken up into multiple items based on when they were scanned. The items have brief descriptions and some keywords, and are WARCs with CDXs. A good way to improve this would be to make collections for each site as subcollections.<br />
|-<br />
| [http://archive.org/details/archiveteam-mobileme-hero archiveteam-mobileme-hero] || Unaudited || || 4007 || [https://archive.org/download/archiveteam-mobileme-index/mobileme-20120817.html Yes] [https://github.com/ArchiveTeam/mobileme-index (source)] ||<br />
|-<br />
| [http://archive.org/details/archiveteam_myopera archiveteam_myopera] || Audited || dashcloud1 || 155 || No || Collection page has a nice description of the site and the items. The items all appear to be WARCs, and have no descriptions or keywords of any kind on them.<br />
|-<br />
| [http://archive.org/details/archiveteam_bebo archiveteam_bebo] || Unaudited || [[User:JesseW|JesseW]] || 2867 || || They appear to all be WARCs, most uploaded on the same day; it's not clear if all of them are in the Wayback Machine or not. Each item has no description or context.<br />
|-<br />
| [http://archive.org/details/archiveteam_dogster archiveteam_dogster] || Audited || jscott || 55 || ??? || Collection well described. Wayback Machine-Ready WARCs, all integrated.<br />
|-<br />
| [http://archive.org/details/hyves hyves] || Unaudited || || 517 || || [[Hyves]]<br />
|-<br />
| [http://archive.org/details/archiveteam_wretch archiveteam_wretch] || Unaudited || || 2163 || || [[Wretch]]; WARCs<br />
|-<br />
| [http://archive.org/details/archiveteam_xanga archiveteam_xanga] || Unaudited || || 454 || || [[Xanga]]; WARCs<br />
|-<br />
| [http://archive.org/details/twitterstream twitterstream] || Unaudited || || 41 || || [[Twitter]]; according to reviews, at least one file is empty.<br />
|-<br />
| [http://archive.org/details/pastebinpastes pastebinpastes] || Unaudited || || 223 || || These are tarballs (less than 100 MBs, usually), containing each paste in a separate file. Most recently updated on July 1, 2014<br />
|-<br />
| [http://archive.org/details/archiveteam_zapd archiveteam_zapd] || Unaudited || || 19 || || [[Zapd]]; WARCs<br />
|-<br />
| [http://archive.org/details/archiveteam_patch archiveteam_patch] || Unaudited || || 38 || || [[Patch]] ; WARCs<br />
|-<br />
| [http://archive.org/details/archiveteam_posterous archiveteam_posterous] || Unaudited || || 444 || || [[Posterous]] ; WARCs<br />
|-<br />
| [http://archive.org/details/archiveteam_greader archiveteam_greader] || Unaudited || || 368 || || [[Google Reader]]; 3 categories of WARCs: Directory, Stats & general. It would probably be good to also put them in separate collections. There is also a [https://archive.org/details/archiveteam_greaderstats_combined combined stats item].<br />
|-<br />
| [http://archive.org/details/archiveteam_ignsites archiveteam_ignsites] || Unaudited || || 81 || || [[IGN]] (needs link to archive); Each item contains a particular subdomain. Descriptive names.<br />
|-<br />
| [http://archive.org/details/archiveteam_g4tv_forums archiveteam_g4tv_forums] || Unaudited || || 74 || || ARCs from [[wikipedia:G4 (TV channel)]], mainly from the forum<br />
|-<br />
| [http://archive.org/details/archiveteam-yahoovideo archiveteam-yahoovideo] || Unaudited || || 156 || || [[Yahoo! Video]]; various inconsistencies in naming and categories; some items contain [https://archive.org/details/ARCHIVETEAM-YV-4790761-4799994 zip files], while others contain [https://archive.org/details/ARCHIVETEAM-YV-04980027-04983272 tar files].<br />
|-<br />
| [http://archive.org/details/archive-team-friendster archive-team-friendster] || Unaudited || || 137 || Maybe -> [https://archive.org/details/archiveteam-friendster-index archiveteam-friendster-index] item || [[Friendster]]; early (2011) project, variety of formats<br />
|-<br />
| [http://archive.org/details/archiveteam_formspring archiveteam_formspring] || Unaudited || || 1477 || || [[Formspring]]; WARCs; some duplication in collection description<br />
|-<br />
| [http://archive.org/details/archiveteam_yahoo_messages archiveteam_yahoo_messages] || Unaudited || || 17 || || [[Yahoo! Messages]]; WARCs; Minimal description on collection, none on items<br />
|-<br />
| [http://archive.org/details/archiveteam_punchfork archiveteam_punchfork] || Unaudited || || 47 || [https://archive.org/download/archiveteam_punchfork_index/index.html Yes] || [[Punchfork]]; Needs link to index from collection description (and item descriptions); three different types of items, unclear differences<br />
|-<br />
| [http://archive.org/details/yahoo_korea_blogs yahoo_korea_blogs] || Unaudited || || 10 || || WARCs; no item descriptions<br />
|-<br />
| [http://archive.org/details/archiveteam-cinch archiveteam-cinch] || Unaudited || || 20 || No || [[Cinch.fm]]; 10 items, in both WARC and tar formats<br />
|-<br />
| [http://archive.org/details/archiveteam_dailybooth archiveteam_dailybooth] || Unaudited || || 203 || [https://archive.org/download/dailybooth-freeze-frame-index/index.html Yes] || [[DailyBooth]]; link to index on collection page needs adjusting; images seem to be downloadable; individual items lack descriptions<br />
|-<br />
| [http://archive.org/details/archiveteam_weblognl archiveteam_weblognl] || Unaudited || || 26 || No || [[Weblog.nl]]; no English-language description<br />
|-<br />
| [http://archive.org/details/stage6 stage6] || Unaudited || || 790 || || Videos from [[wikipedia:Stage6]]; many seem to be unavailable from IA, due to "issues with the item's content."<br />
|-<br />
| [http://archive.org/details/googlegroups-part2 googlegroups-part2] || Unaudited || || 27 || No || [[Google Groups]]; each item contains a single tar file (ranging in size from 300 MB to over 40 GB); the tar files contain separate zip files for each group, and the zip files contain the actual files. This should probably be grouped with the other grabs of Google Groups.<br />
|-<br />
| [http://archive.org/details/archiveteam-btinternet archiveteam-btinternet] || Unaudited || || 8 || No || WARCs<br />
|-<br />
| [http://archive.org/details/archiveteam-qaudio-archive archiveteam-qaudio-archive] || Unaudited || || 7 || No || Many small WARCs in each item; lengthy explanation in collection description, none in each item<br />
|-<br />
| [http://archive.org/details/webshots-freeze-frame webshots-freeze-frame] || Unaudited || || 2459 || No || [[Webshots]]; WARCs<br />
|-<br />
| [http://archive.org/details/tabblo-archive tabblo-archive] || Unaudited || || 1806 || Maybe: [https://archive.org/details/tabblo-archive-groups groups] item || [[Tabblo]]; 9 MegaWARCs, the rest of the items are groups of individual accounts as zip files<br />
|-<br />
| [http://archive.org/details/archiveteam-fortunecity archiveteam-fortunecity] || Unaudited || || 55 || [https://archive.org/details/archiveteam-fortunecity-list Yes] || [[FortuneCity]]; 26 "Set" items (containing a single large tar in each one); also 26 WARC items, and one leftovers item<br />
|-<br />
| [http://archive.org/details/2012-04-30-wikimedia-images-snapshot 2012-04-30-wikimedia-images-snapshot] || Unaudited || Nemo || 148 || Not really || Should become a subcollection of "wikicollections", so that it's next to "wikimediacommons". The "remote" tarballs partially overlap with xowa items nowadays. If a complete mirror of the Your.Org tarballs is desired, we should list it at [https://meta.wikimedia.org/wiki/Mirroring_Wikimedia_project_XML_dumps#Media_tarballs] with some maintenance information. It's not clear whether investing N TB at IA is a priority here, nor whether IA expects WikiTeam to do the uploads instead (in that case, ask Hydriz or Arkiver). Also, the Your.Org dumps are currently blocked on the lack of a rsync server on Wikimedia servers.<br />
|-<br />
| [http://archive.org/details/archiveteam-anyhub archiveteam-anyhub] || Unaudited || || 39 || || [[AnyHub]]; 18 each WARC & tar items, and one called the "Blue Collection" <br />
|-<br />
| [http://archive.org/details/archiveteam-fileplanet archiveteam-fileplanet] || Unaudited || || 675 || || [[FilePlanet]]<br />
|-<br />
| [http://archive.org/details/archiveteam-umich-save archiveteam-umich-save] || Unaudited || || 52 || || <br />
|-<br />
| [http://archive.org/details/archiveteam-geocities archiveteam-geocities] || Unaudited || || 12 || || [[Geocities]]<br />
|-<br />
| [http://archive.org/details/archiveteam-fire archiveteam-fire] || Unaudited || || 7135 || || A vast and misc. collection; needs quite a bit of TLC<br />
|-<br />
| [http://archive.org/details/archiveteam-mypodcast archiveteam-mypodcast] || Unaudited || || 383 || || Each item is a separate podcast, containing individual sound files, playable through the IA interface; there is also a [https://archive.org/download/archiveteam-mypodcast-dataonly misc] item<br />
|-<br />
| [http://archive.org/details/archiveteam-googlegroups archiveteam-googlegroups] || Unaudited || [[User:JesseW|JesseW]] || 1,348 || Partial (each item has a list of groups, but there's no overall list) || [[Google Groups]]; This is divided into items by the initial two letters (or digits or underscore). The item for "[https://archive.org/details/archiveteam-googlegroups-th th]" has an inconsistent title and category.<br />
|-<br />
| isohunt dumps [https://archive.org/details/isohunt.teapot.2013 1] [https://archive.org/details/isohunt.croissant.2013 2] [https://archive.org/details/isohunt.coffeepot.2013 3] || Unaudited || || 3 || No || These are not yet in a dedicated collection, and have never been post-processed. Some of the .torrent files may actually be error pages. This needs work, and proper full auditing.<br />
|-<br />
| '''[https://archive.org/search.php?query=streetfiles No Category (streetfiles)]''' || Unaudited || || || ||<br />
|-<br />
| [https://archive.org/details/archiveteam_yahoovoices archiveteam_yahoovoices] || Unaudited || || 30 || No || [[Yahoo! Voices]]; WARCs<br />
|-<br />
| [https://archive.org/details/archiveteam_twitchtv archiveteam_twitchtv] || Unaudited || || 2213 || [http://chfoo-cn.mooo.com/~archiveteam/twitchtv-index/html/ Yes] [https://github.com/ArchiveTeam/twitchtv-index/ (source)] || [[Twitch.tv]]<br />
|-<br />
| [https://archive.org/details/archiveteam_fotopedia archiveteam_fotopedia] || Unaudited || || 40 || || [[Fotopedia]]; WARCs<br />
|-<br />
| [https://archive.org/details/archiveteam_canvas archiveteam_canvas] || Unaudited || || 47 || || [[Canv.as]]; WARCs<br />
|-<br />
| [https://archive.org/details/archiveteam_ancestry archiveteam_ancestry] || Unaudited || || 82 || || [[Ancestry.com]]; WARCs<br />
|}<br />
<br />
== [[:Category:In_progress|In progress???]] ==<br />
<br />
But what happened after? Where are the archives?<br />
<br />
* [[BerliOS]]<br />
* [[Deletionpedia]]<br />
* [[Delicious]]<br />
* [[ExtraTorrent]]<br />
* [[Free ProHosting]]<br />
* [[Google Video]]<br />
* [[Ispygames]]<br />
* [[Len Sassaman Project]]<br />
* [[Lulu Poetry]]<br />
* [[Prodigy.net]]<br />
* [[Resedagboken]]<br />
* [[ScreenshotsDatabase.com]]<br />
* [[Spanish Revolution]]: Is this finished?<br />
* [[University of Michigan personal webpages]]<br />
* [[Wallbase]]<br />
* [[Wallhaven]]<br />
* [[Webmonkey]]<br />
* [[Widgetbox]]<br />
* [[Windows Live Spaces]]<br />
<br />
== Oddities, Mislocations, and To Do ==<br />
<br />
* https://archive.org/search.php?query=earbits Earbits gathering is in the wrong place and needs additional versions.<br />
<br />
=== To be moved to better collection ===<br />
<br />
==== WARC ====<br />
Anything under https://archive.org/search.php?query=subject%3A%22warcarchives%22<br />
<br />
* https://archive.org/details/fenopy-se-fire-grab-2014-12-30-16-38-13<br />
* https://archive.org/details/netszar_com_2015_06<br />
* https://archive.org/details/swipnet-searchengine-crawl-nonrecursive<br />
* https://archive.org/details/swipnet-searchengine-crawl-recursive<br />
* https://archive.org/details/kajaszoszentpeter_hu_2015_06<br />
* https://archive.org/details/warc-hallofshame.gp.co.at<br />
* https://archive.org/details/warc-freakedenough.at<br />
* https://archive.org/details/nintendoukkidsclub-20150608.warc<br />
* https://archive.org/details/warc-9chin<br />
* https://archive.org/details/warcarchive-www.bun23.com<br />
* https://archive.org/details/warchive-www.sotipro.com<br />
* https://archive.org/details/files.hii-tech.com-warc<br />
* https://archive.org/details/www.synthfool.com<br />
* https://archive.org/details/wwwbiologyarizonaedu<br />
* https://archive.org/details/studionyami-com_penfifteen-2012-03-05<br />
* https://archive.org/details/fybertech<br />
* https://archive.org/details/wwwclarkuedu-djoyce-trig-20150608.warc<br />
<br />
==== FTP ====<br />
* https://archive.org/details/2014.0102.mail.digipro.rs<br />
* https://archive.org/details/2014.12.ftp.dlink.biz_201501<br />
* https://archive.org/details/2015.01.12.ftp.sunet.sePubOpenBSD<br />
<br />
==== Misc ====<br />
<br />
* https://archive.org/details/archiveteam-picplz-index<br />
* https://archive.org/details/Posterous.comHostnames<br />
* https://archive.org/details/YahooBlogSitemaps20131216071927<br />
* https://archive.org/details/archiveteam-mobileme-index<br />
* https://archive.org/details/ESPNForumsPanicgrab<br />
* https://archive.org/details/rawporter-grab<br />
* https://archive.org/details/bitsnoop-dump<br />
* https://archive.org/details/CaliforniaFinanceLobbyData<br />
* https://archive.org/details/ArchiveteamWarriorV220121008Hyperv<br />
* https://archive.org/details/HowFlickr.comLookedLikeIn2010-APlaceOfWorshipOnFlickr-Photo<br />
* https://archive.org/details/myopera_shutdown_notice<br />
* https://archive.org/details/UsenetSci.space.news2003-2012<br />
* https://archive.org/details/Usenet_rec.food.recipesArchive2003-2012<br />
* https://archive.org/details/MirrorOfSiteOrtodoxiesiviata.blogspot.com<br />
* https://archive.org/details/carti.itarea.org<br />
* https://archive.org/details/ovmk_story<br />
* https://archive.org/details/ti_guidebook_en<br />
* https://archive.org/details/ti_guidebook_fr<br />
* https://archive.org/details/ti_guidebook_de<br />
* https://archive.org/details/myopera_usernames_FIXED.7z<br />
* https://archive.org/details/DubaiWikipediaPageOn2012-09-06<br />
* https://archive.org/details/digpicz-2008-07-30-website<br />
* https://archive.org/details/site-wwwangelfirecomazdixieden<br />
* https://archive.org/details/ArkiverCrawlsPack0004<br />
* https://archive.org/details/ArkiverCrawlsPack0005<br />
* https://archive.org/details/ArkiverCrawlsPack0007<br />
* https://archive.org/details/ArkiverCrawlsPack0008<br />
* https://archive.org/details/laptops-manuals-dump-from-tim.id.au-20121111<br />
* https://archive.org/details/paste_lisp_org<br />
* https://archive.org/details/MtGoxSituationCrisisStrategyDraft<br />
* https://archive.org/details/MtGoxBusinessPlan20142017<br />
* https://archive.org/details/nyt_innovation_2014<br />
* https://archive.org/details/slackware-irc-logs<br />
* https://archive.org/details/thekeep_bbs<br />
* https://archive.org/details/mail.google.com-saved-1Oct2014<br />
* https://archive.org/details/Data2September2013.tar (Gunnerkrigg Court homepage comments snapshots)<br />
* https://archive.org/details/fotodisco-raw-items<br />
* https://archive.org/details/qwikidisco-raw-items<br />
* https://archive.org/details/twitpicdisco-raw-items<br />
* https://archive.org/details/maemo-fremantle-ovi<br />
* https://archive.org/details/toontown_infinite_github_20150103<br />
* https://archive.org/details/amplicate_sitemaps_20140218<br />
* https://archive.org/details/twitch-raw-items<br />
* https://archive.org/details/actionbutton_mini.tar<br />
* https://archive.org/details/ageofnerds_mini<br />
* https://archive.org/details/2015feb06a07FuturamerlinAList<br />
* https://archive.org/details/worldpeacehaven_gmail_Xaa<br />
* https://archive.org/details/worldpeacehaven_gmail_Xab<br />
* https://archive.org/details/2015feb02ob<br />
* https://archive.org/details/2014dec09spe2<br />
* https://archive.org/details/bigougit_mini_v2<br />
* https://archive.org/details/galman33_mini<br />
* https://archive.org/details/urls2015dec02n2<br />
* https://archive.org/details/493nfos<br />
* https://archive.org/details/archiveteam_dev_env_v1_appliances<br />
* https://archive.org/details/Kazbeg_Panorama.jpg -- If tags can be edited by non-owners, this probably shouldn't have the ''archiveteam'' tag.<br />
* https://archive.org/search.php?query=subject%3A%22wallbase%22 -- 10 different items, representing efforts at saving [[wallbase.cc]]; need to be sorted and organized<br />
* https://archive.org/search.php?query=subject%3A%22aol%20archiveteam%2C%20aol%20files%2C%20aol%20protocol%22 -- 6 items that need their subject tags cleaned up<br />
* https://archive.org/search.php?query=subject%3A%22Tabblo%22%20AND%20NOT%20collection%3Aarchiveteam -- 5 of the 11 Tabblo items are not in the Archiveteam collection<br />
* https://archive.org/details/donkeykongsites<br />
* https://archive.org/details/dogpictbot<br />
* https://archive.org/details/HackerNewsStoriesAndCommentsDump<br />
* https://archive.org/details/flipnote-hatena-dkl3collection - in the wikiteam collection but not a wiki, so it should be somewhere else<br />
<br />
== Missing ==<br />
<br />
* [[Yahoo!_Blog]]: What happened to the Vietnam archives? Does anyone have a copy or at least a blurry screenshot of the Korean shutdown notice?<br />
<br />
[[Category:Archive Team]]<br />
<br />
{{Navigation box}}</div>Archive Maniachttps://wiki.archiveteam.org/index.php?title=Blip.tv&diff=24040Blip.tv2015-08-07T10:13:10Z<p>Archive Maniac: For consistency's sake</p>
<hr />
<div>{{Infobox project<br />
| title = Blip.tv<br />
| logo = Blip_web_logo.png<br />
| image = Blip.tv_1303512711518.png<br />
| description = <br />
| URL = http://blip.tv/<br />
| project_status = {{closing}} (for real this time)<br />
| archiving_status = {{partiallysaved}} previously, new things {{notsaved}}<br />
| source = [https://github.com/ArchiveTeam/blip-grab blip-grab]<br />
| tracker = [http://tracker.archiveteam.org/bloopertv/ bloopertv] (previous; broken link)<br>[http://tracker.archiveteam.org/blip/ blip] (current)<br />
| irc = blooper.tv<br />
}}<br />
<br />
'''Blip.tv''', or now Blip, is a [[Video hostings|video hosting]] website for original web series. Video creators, or producers, can apply to have their content monetized; Blip.tv retains a portion of the advertising revenue.<br />
<br />
<blockquote><br />
<p>blip.tv is a free videoblogging, podcasting and video sharing service.</p><br />
<br />
<p>Our goal is to change the world by bringing excellent free video publishing services to people who are unable or unwilling to get outlets from major media organizations in the United States and throughout the world. Our passion is democratization. Our principles are the rules we live by.<ref>http://web.archive.org/web/20060718114128/http://blip.tv/about/</ref></p><br />
<br />
<p>If you don't have a blog we'll give you one, and if you have one already we'll make it a video blog.<ref>http://web.archive.org/web/20060719041121/http://blip.tv/faq/</ref></p><br />
</blockquote><br />
<br />
== Potential Shutdown ==<br />
On May 28th, 2014, Blip sent an email to account holders informing them that video uploads would be disabled after July 7th, 2014, and that their accounts would be deleted on September 1st, 2014.<br />
<br />
[[File:Blip_shutdown_may2014.PNG]]<br />
<br />
== Blip.tv 2.0 ==<br />
<br />
[[File:Blip.jpg]]<br />
<br />
<blockquote><br />
<p>Blip.tv Acquired By Video-Blog Killers Maker Studios</p><br />
<p>''Posted on Oct 8 2013 - 3:12pm by Zennie Abraham</p><br />
<p>Blip.tv, the video sharing site that was created by a team lead by Mike Hudack and Dina Kaplan, is dead. It’s now called “the old Blip.tv” and has been replaced by something owned by that horrible video-channel eating company network Maker Studios. (Mike and Dina left Blip in 2012.)</p><br />
<br />
<p>And if you’re asking “Is that the same Maker Studios that took YouTube Partner Ray William Johnson’s Google AdSense account and never gave it back to him? The same Maker Studios that pushed Pew Die Pie at us on YouTube? The same Maker Studios that was founded by Danny Zappin, Lisa Donovan (LisaNova on YouTube), Scott Katz, Derek Jones and Will Watkin? The same Maker Studios that’s involved in a nasty lawsuit between Mr. Zappin and the others, including his ex-girlfriend Lisa Donovan?</p><br />
<br />
<p>The answer is yes.<ref>http://www.zennie62blog.com/2013/10/08/blip-tv-acquired-by-video-blog-killers-maker-studios-94233/</ref></p><br />
</blockquote><br />
<br />
== ROUND TWO...FIGHT! ==<br />
<br />
Just when people thought the site could survive yet another "shutdown", Disney (through Maker Studios) is finally landing the killing blow on August 20th.<br />
<br />
https://twitter.com/theGunrun/status/623239664292335617<br />
<br />
== How can I help? ==<br />
<br />
You can run the [[ArchiveTeam Warrior]]! Check out that page for more info about running it. (Alternatively, you can run the scripts manually.)<br />
<br />
We need to save about ''228,000 videos before November 7th''.<br />
<br />
=== Additional FAQ ===<br />
<br />
==== How big are the files? ====<br />
<br />
File sizes range from 5 MB to 2 GB. If you have slow upload speeds, you can still help. A small contribution still helps along the way! Still not convinced? Then help us spread the news and get others to help.<br />
<br />
==== I'm running the warrior but it just says "Starting WgetDownload" ====<br />
<br />
It currently does not show your download progress, but don't worry, it is actually downloading. You can check by looking at the bandwidth graph in the lower left corner of the page.<br />
<br />
== Site Structure ==<br />
* On pages like http://blip.tv/comedy-videos there is pagination, but the page links are only '#' and need JavaScript to work<br />
* Each anchor tag does have a data-results_page value that appears to carry the URL information. Here is an example:<br />
<a class='currentResults' href="#" data-results_page="/channel/get_directory_listing?channels_id=46&page=1">1</a><br />
<a class='advanceResults' href="#" data-results_page="/channel/get_directory_listing?channels_id=46&page=2">2</a><br />
<a class='advanceResults' href="#" data-results_page="/channel/get_directory_listing?channels_id=46&page=3">3</a><br />
* Only 3 links of pagination are shown at a time. To find out how many pages a category has, you must click the "double arrow right" at the bottom of the page, which requires JavaScript.<br />
* Each page of results in a category lists only 8 shows at a time.<br />
* Each show has a variable number of episodes<br />
* To view an episode list for a show you must have JavaScript enabled. This is also true for pagination on these pages.<br />
* Some shows have rss feeds. Example http://blip.tv/ylse/rss<br />
* RSS feeds only show a partial list of episodes. http://blip.tv/schlomo/ has more episodes than listed in the RSS feed.<br />
* If you have the URL of a video's page (for example, http://blip.tv/zomblogalypse/zomblogalypse-series-trailer-5617646), get-flash-video can download the video as an MP4 file.<br />
* robots.txt crawl delay is 1 second.<br />
* Here are some 20,000 blip.tv URLs from [[URLTeam]]: http://paste.archivingyoursh.it/raw/gidaqotimo.sm<br />
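The pagination scheme described above can be sketched as a URL builder. This is a hypothetical minimal example: it assumes the /channel/get_directory_listing endpoint accepts exactly the channels_id and page parameters seen in the anchor tags, and channels_id 46 is just the value from the example.<br />

```python
# Sketch only: builds the AJAX pagination URLs that the '#' page links
# fetch via JavaScript. The endpoint and parameters are taken from the
# data-results_page attributes above; treat them as assumptions.
from urllib.parse import urlencode

BASE = "http://blip.tv"

def listing_url(channels_id, page):
    """Build one /channel/get_directory_listing pagination URL."""
    query = urlencode({"channels_id": channels_id, "page": page})
    return "%s/channel/get_directory_listing?%s" % (BASE, query)

def listing_urls(channels_id, last_page):
    """All pagination URLs for a category (pages hold 8 shows each)."""
    return [listing_url(channels_id, p) for p in range(1, last_page + 1)]
```

Since only 3 pagination links render at a time, last_page still has to be discovered by clicking through, or by fetching pages until one comes back empty.<br />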
<br />
=== Sitemap ===<br />
<br />
* The sitemap is http://blip.tv/sitemap/xml/bliptv-sitemap-index.xml which links to more sitemaps.<br />
* It contains 3,397 shows, each consisting of one or more episodes.<br />
* Here is a pretty printed example of one of the sitemap files. http://paste.archivingyoursh.it/vomilosedu.xml<br />
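The two-level sitemap layout above can be walked with the standard library. A minimal sketch; the XML fragment below is made up for illustration (it only follows the standard sitemap namespace, it is not real blip.tv data), and actual fetching is left out.<br />

```python
# Sketch: a sitemap index links to child sitemaps, which list page URLs.
# Both levels use <loc> elements in the standard sitemap namespace, so one
# extractor handles either kind of document.
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def extract_locs(xml_text):
    """Return every <loc> URL from a sitemap or sitemap-index document."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.iter("{%s}loc" % SITEMAP_NS)]

# Illustrative fragment in the shape of bliptv-sitemap-index.xml:
sample = """<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap><loc>http://blip.tv/sitemap/xml/example-child-sitemap.xml</loc></sitemap>
</sitemapindex>"""
```

Running extract_locs over the real index would yield the child sitemap URLs; running it again over each child yields the video page URLs.<br />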
<br />
== URL Discovery ==<br />
<br />
Here are all the video URLs for blip.tv from the sitemap files, sorted and de-duplicated. Total count: 228,133. https://archive.org/details/2013_10_09_bliptv_urls<br />
<br />
== Archives ==<br />
<br />
A logistical snafu occurred and the project slipped through the cracks. (The project had only one leader, and that leader disappeared.) Only 7 TB of the 70 TB was saved. These videos were added to the existing [https://archive.org/details/bliptv bliptv collection].<br />
<br />
== External links ==<br />
* {{url|1=http://blip.tv/|2=Blip.tv}}<br />
* [http://support.blip.tv/entries/23277196-An-Important-Update-from-Blip-Regarding-Account-Removals An Important Update from Blip Regarding Account Removals]<br />
* {{w|Blip (website)}}<br />
<br />
== References ==<br />
<br />
<references/><br />
<br />
{{Navigation box}}<br />
<br />
[[Category:Video hosting]]</div>Archive Maniachttps://wiki.archiveteam.org/index.php?title=User_talk:Jscott&diff=24039User talk:Jscott2015-08-06T21:30:12Z<p>Archive Maniac: /* Lotsa Lotsa WARCs */ new section</p>
<hr />
<div>I'm putting this letter sent to me here, because I think it's a valid idea but don't have time to develop it. I'm throwing it out to the larger world to get people interested.<br />
<br />
Hi Jason,<br />
<br />
This is partly "fuck the cloud" and partly "archive team" related. Are we not preparing ourselves to support crippled "cloud enabled" devices when their master servers somewhere on the internet go bye-bye?<br />
<br />
I'm a fan of some of these neat little Internet-enabled devices. I got a second-hand Squeezebox (Internet radio) and a Karotz (Internet radio-rabbit-camera "companion") and have noticed that both of these products have gone through major outages and are as good as dead when the providers decide their useful life is done. What about when Honeywell finally buys out Nest and kills the early products? Will the firmware be smart enough to revert back to a dumb thermostat, or will it just freeze out the house (literally) when it can't contact the server?<br />
<br />
We're building up for a lot of future-worthless devices right now. The old BlackBerry phones will be quite worthless without a BlackBerry server on the other side of the cell antenna. iPhones will be pretty crap due to their hooks into iCloud. Androids should in theory be hackable to be somewhat autonomous.<br />
<br />
And these purpose-built devices will become more and more worthless. Not only data, but the devices themselves are going to go "down" and become junk. Sure, there's hope with standard ARM platforms, common Android operating systems that we can hack, but there's still plenty of embedded crap that has dependencies on some central server that's duct-taped together.<br />
<br />
Are many thinking about creating/reverse engineering some of these server-side apps to make these devices functional past their prime? I'm thinking about the process of patching various devices to point to a new server (or acquiring the old domain) and reviving/simulating server side functionality so these devices work once again. This shouldn't be required, it stinks of faking a DRM server to play weird windows media files or patching a FlexLM license server so some bizarro software can run again.<br />
<br />
Bandwidth, interactivity, complexity, it's all going to factor in hosting these kinds of things, but this is just a thought.<br />
<br />
Use this for your own thoughts, integrations, ideas, talks, I don't mind at all (but if you want to credit me in some footnote, I'm happy with that but not necessary).<br />
<br />
If this was a bunch of blah-blah, my apologies - but I was thinking this might be up your alley. Happy new year and good luck finishing up the DEFCON documentary!<br />
<br />
Regards,<br />
<br />
Ryan Sayre<br />
London, United Kingdom SW6 4UJ<br />
<br />
:The [http://fileformats.archiveteam.org/wiki/Networked_devices Networked devices page on the File Formats wiki] (an Archive Team subdomain) would be a good place to document anything anybody has managed to discover or reverse-engineer about how those gadgets store and transmit data, where they send it to and receive it from, and how to hack and jailbreak them. [[User:Dan Tobias|Dan Tobias]] 23:22, 10 May 2013 (EDT)<br />
<br />
== Lotsa Lotsa WARCs ==<br />
<br />
Hi, Jason, recently I've been uploading a lot of site archives (as WARCs) onto the Internet Archive, and I categorized all of them under the "warcarchives" subject tag specifically (as well as archiveteam & archivebot). Can 'warcarchives' be a tag used by the ArchiveTeam for WARC files? See, I cannot inject WARC files into the Wayback Machine, but I know that you can. The only way I can think of to queue WARCs is by using a specific subject tag (see [https://archive.org/search.php?query=subject%3A%22warcarchives%22 here] or more [https://archive.org/search.php?query=subject%3A%22warc.gz%22 here]).<br />
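Enumerating everything under such a tag can be scripted against Archive.org's advanced-search JSON endpoint. A sketch follows; the endpoint and parameter names follow the public advancedsearch.php API, but treat them as assumptions to verify against the current documentation.<br />

```python
# Build an Archive.org advanced-search URL for a subject-tag query,
# e.g. subject:"warcarchives". No network access here; a caller would
# fetch the URL and read response["response"]["docs"] from the JSON.
from urllib.parse import urlencode

def ia_search_url(query, fields=("identifier",), rows=100):
    params = {
        "q": query,            # e.g. 'subject:"warcarchives"'
        "fl[]": list(fields),  # metadata fields to return
        "rows": rows,
        "output": "json",
    }
    return "https://archive.org/advancedsearch.php?" + urlencode(params, doseq=True)

print(ia_search_url('subject:"warcarchives"'))
```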
<br />
In case you want to put them into the Wayback Machine, you are welcome to; that's why I uploaded the WARCs. A lot of the URLs in the WARCs haven't even been crawled by the Wayback Machine yet.<br />
<br />
And P.S.: yes, I've changed a lot over the last year. I've matured a lot.<br />
<br />
Thanks for reading,<br />
<br />
[[User:Archive Maniac|Archive Maniac]] 17:30, 6 August 2015 (EDT)</div>Archive Maniachttps://wiki.archiveteam.org/index.php?title=Audit2014&diff=24023Audit20142015-08-03T00:30:39Z<p>Archive Maniac: Actually decided to use https://archive.org/search.php?query=subject%3A%22warcarchives%22 for WARC files.</p>
<hr />
<div>We've uploaded a bunch of stuff: <br />
*[https://archive.org/search.php?query=subject:archiveteam subject:archiveteam] = 8,845 items<br />
*[https://archive.org/search.php?query=collection:archiveteam collection:archiveteam] = 39,562 items<br />
*[https://archive.org/search.php?query=NOT%20collection%3A%28archiveteam%29%20AND%20subject%3A%28archiveteam%29 subject:archiveteam AND NOT collection:archiveteam] = 1,507<br />
<br />
(The 3rd one should eventually be close to empty.)<br />
<br />
Let's go through the list and make sure it's categorized, has decent metadata, etc.<br />
<br />
Many of our uploads are quite large, and have been broken into many items on Archive.org. We'll group them together here and verify each set all at once.<br />
<br />
== Things to check ==<br />
<br />
; Collection : Are all the related items grouped into a collection?<br />
; Description : Can a visitor figure out what each item represents? Items in a collection don't need to repeat the description of the collection, but it'd be nice if they had a sentence or two, and information about how the item differs from the other items in the collection ("MP3s from earbits.com, files starting with c." from the Earbits items is a good example.)<br />
; Inclusion : Are all the related items included in the same collection?<br />
; Categorization : Can a visitor find the item by browsing the collections?<br />
; Cross-references : Can a visitor find other items in a set, starting at any item in the set? Can a visitor find the index of a large set starting from any part of it?<br />
; Indexing : If the item is a collection of sub-items, is one of these sub-items an index of the others? (This is a complicated thing to check for and to create when it doesn't exist, so we can come back to this after we've checked the rest.)<br />
; Your suggestion here : this is just off the top of my head.<br />
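The per-item parts of this checklist could even be scripted over each item's metadata. A rough sketch, assuming a plain dict whose keys mirror common Archive.org metadata fields (collection, description, subject); the field names and thresholds are assumptions for illustration only.<br />

```python
# Toy audit pass over one item's metadata dict: flags the problems the
# checklist above looks for. Field names mirror common Archive.org
# metadata keys; the 40-character threshold is an arbitrary choice.
def audit_item(meta):
    problems = []
    if not meta.get("collection"):
        problems.append("not in any collection")
    desc = (meta.get("description") or "").strip()
    if not desc:
        problems.append("no description")
    elif len(desc) < 40:
        problems.append("description too short to identify the item")
    if not meta.get("subject"):
        problems.append("no subject keywords")
    return problems
```

Running something like this over every item in a collection would produce a worklist to feed the audit table below.<br />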
<br />
== High-level Collections ==<br />
* https://archive.org/details/web <br />
** https://archive.org/details/archiveteam<br />
*** https://archive.org/details/archiveteam-fire<br />
*** https://archive.org/details/archivebot<br />
** https://archive.org/details/wikiteam<br />
<br />
== Current Sub-Collections at Archive Team ==<br />
<br />
{| class="wikitable sortable"<br />
|-<br />
!Collection<br />
!Status<br />
!Auditor<br />
!Item Count<br />
!Has an Index<br />
!Description of Audit<br />
|-<br />
| '''[https://archive.org/search.php?query=earbits No Category (earbits)]''' || Unaudited || || 98 || Yes || The items are not in a collection. Most items are WARCs; the rest need additional work if anyone is going to be able to find the exact MP3 they want.<br />
|-<br />
| [http://archive.org/details/archiveteam_ptch archiveteam_ptch] || Audited || db48x || 50 || No || Collection has a great description, but no categories. Items in the collection are WARCs. One item is not included in the collection: [https://archive.org/details/deathy-s3-test-ptch deathy-s3-test-ptch]<br />
|-<br />
| [http://archive.org/details/archiveteam_flowerpot archiveteam_flowerpot] || Audited || db48x || 406 || No || The description of the collection is anemic, but each item is well-identified.<br />
|-<br />
| [http://archive.org/details/github_files github_files] || Audited || db48x || 1 || No || Pretty bad shape. Only one item in the collection, and that's only half the data. Was the rest never uploaded? Has no description, keywords or other metadata. Other Github items could be included, such as [https://archive.org/details/archiveteam-github-repository-index-201212 this repository index], and [https://archive.org/search.php?query=ArchiveTeam%20GitHub%20file%20downloads these other file downloads]<br />
|-<br />
| [http://archive.org/details/justintv justintv] || Audited || db48x || 189 || <s>No</s> [http://chfoo-cn.mooo.com/~archiveteam/justintv-index/html/ Partial] [https://github.com/ArchiveTeam/justintv-index (Src)]|| Decent description, but no other metadata. There are [https://archive.org/search.php?query=justintv%20and%20-collection%3A%28justintv%29 51 other 'justintv' items], but none of them look to be from us.<br />
|-<br />
| [http://archive.org/details/archiveteam_mochimedia archiveteam_mochimedia] || Audited || db48x || 9 || No || Collection includes Mochi's notice about the shutdown, but no other context. The items are all WARCs, and all have CDXs and JSON indexes, but there's no overall index.<br />
<br />
Index can be easily generated from [https://web.archive.org/web/*/http://feedmonger.mochimedia.com/feeds/query/?q=search%3A&limit=81563 this 26MB JSON file]--chfoo<br />
|-<br />
| [http://archive.org/details/archivebot archivebot] || Unaudited || || 1070 || Sort of: [http://archive.fart.website/archivebot/viewer/ Viewer] || [[ArchiveBot]]; The viewer doesn't seem to index into crawls; there's no link from the collection or the items to the viewer (or anywhere else)<br />
|-<br />
| [http://archive.org/details/archiveteam_yahooblogs archiveteam_yahooblogs] and [https://archive.org/details/archiveteam_yahooblog archiveteam_yahooblog] || Audited || db48x || 49 || No || Collection description is just the shutdown notice (and apparently quite a brief one at that) with no other context. Items are all WARCs, and all have CDXs and JSON indexes, but there's no overall index. One item is orphaned in a collection of its own; apparently caused by a typo in the collection name. <br />
|-<br />
| [http://archive.org/details/archiveteam-splinder archiveteam-splinder] || Unaudited || || 53 || || See [[Splinder]]<br />
|-<br />
| [http://archive.org/details/archiveteam-picplz archiveteam-picplz] || Audited || db48x || 141 || Yes || The collection description is just the shutdown message, with no other context. Items are tarballs containing WARCs. There is an index, but it's not a part of the collection ([https://archive.org/download/picplz-00454713-20120603-143400.warc/]). There's also a search page for the index, which is great.<br />
|-<br />
| [http://archive.org/details/archiveteam_puush archiveteam_puush] || Audited || db48x || 1781 || || The collection description is just the shutdown notice, but it's better than average; it includes some context. The items are all WARCs with CDXs, but there's no central index.<br />
|-<br />
| [http://archive.org/details/archiveteam_upcoming archiveteam_upcoming] || Audited || dashcloud1 || 142 || No || The collection description only describes the site, not the items themselves. Individual items have no description of any kind.<br />
|-<br />
| [http://archive.org/details/archiveteam_randomfandom archiveteam_randomfandom] || Audited || dashcloud1 || 42 || Yes || Short collection description, but has an index, and every collection item is well described. Index is located right on collection page.<br />
|-<br />
| [http://archive.org/details/archiveteam_antecedents archiveteam_antecedents] || Audited || db48x || 46 || N/A || This collection represents multiple sites, rather than multiple parts of a single large site. The collection description is quite brief, but each item appears to have a paragraph describing what the site is/was, as well as some basic metadata such as keywords. All the items appear to be WARCs with CDXs<br />
|-<br />
| [http://archive.org/details/archiveteam_jazzhands archiveteam_jazzhands] || Audited || db48x || 443 || No || This one is a collection of items from multiple sites, but those sites are also broken up into multiple items based on when they were scanned. The items have brief descriptions and some keywords, and are WARCs with CDXs. A good way to improve this would be to make collections for each site as subcollections.<br />
|-<br />
| [http://archive.org/details/archiveteam-mobileme-hero archiveteam-mobileme-hero] || Unaudited || || 4007 || [https://archive.org/download/archiveteam-mobileme-index/mobileme-20120817.html Yes] [https://github.com/ArchiveTeam/mobileme-index (source)] ||<br />
|-<br />
| [http://archive.org/details/archiveteam_myopera archiveteam_myopera] || Audited || dashcloud1 || 155 || No || The collection page has a nice description of the site and the items. The items all appear to be WARCs, and have no descriptions or keywords of any kind.<br />
|-<br />
| [http://archive.org/details/archiveteam_bebo archiveteam_bebo] || Unaudited || [[User:JesseW|JesseW]] || 2867 || || They appear to all be WARCs, most uploaded on the same day; it's not clear if all of them are in the Wayback Machine or not. Each item has no description or context.<br />
|-<br />
| [http://archive.org/details/archiveteam_dogster archiveteam_dogster] || Audited || jscott || 55 || ??? || Collection well described. Wayback Machine-ready WARCs, all integrated.<br />
|-<br />
| [http://archive.org/details/hyves hyves] || Unaudited || || 517 || || [[Hyves]]<br />
|-<br />
| [http://archive.org/details/archiveteam_wretch archiveteam_wretch] || Unaudited || || 2163 || || [[Wretch]]; WARCs<br />
|-<br />
| [http://archive.org/details/archiveteam_xanga archiveteam_xanga] || Unaudited || || 454 || || [[Xanga]]; WARCs<br />
|-<br />
| [http://archive.org/details/twitterstream twitterstream] || Unaudited || || 41 || || [[Twitter]] According to reviews, at least one file is empty.<br />
|-<br />
| [http://archive.org/details/pastebinpastes pastebinpastes] || Unaudited || || 223 || || These are tarballs (usually less than 100 MB), containing each paste in a separate file. Most recently updated on July 1, 2014.<br />
|-<br />
| [http://archive.org/details/archiveteam_zapd archiveteam_zapd] || Unaudited || || 19 || || [[Zapd]]; WARCs<br />
|-<br />
| [http://archive.org/details/archiveteam_patch archiveteam_patch] || Unaudited || || 38 || || [[Patch]] ; WARCs<br />
|-<br />
| [http://archive.org/details/archiveteam_posterous archiveteam_posterous] || Unaudited || || 444 || || [[Posterous]] ; WARCs<br />
|-<br />
| [http://archive.org/details/archiveteam_greader archiveteam_greader] || Unaudited || || 368 || || [[Google Reader]]; 3 categories of WARCs: Directory, Stats & general. It would probably be good to also put them in separate collections. There is also a [https://archive.org/details/archiveteam_greaderstats_combined combined stats item].<br />
|-<br />
| [http://archive.org/details/archiveteam_ignsites archiveteam_ignsites] || Unaudited || || 81 || || [[IGN]] (needs link to archive); Each item contains a particular subdomain. Descriptive names.<br />
|-<br />
| [http://archive.org/details/archiveteam_g4tv_forums archiveteam_g4tv_forums] || Unaudited || || 74 || || ARCs from [[wikipedia:G4 (TV channel)]], mainly from the forum<br />
|-<br />
| [http://archive.org/details/archiveteam-yahoovideo archiveteam-yahoovideo] || Unaudited || || 156 || || [[Yahoo! Video]]; various inconsistencies in naming and categories; some items contain [https://archive.org/details/ARCHIVETEAM-YV-4790761-4799994 zip files], while others contain [https://archive.org/details/ARCHIVETEAM-YV-04980027-04983272 tar files].<br />
|-<br />
| [http://archive.org/details/archive-team-friendster archive-team-friendster] || Unaudited || || 137 || Maybe -> [https://archive.org/details/archiveteam-friendster-index archiveteam-friendster-index] item || [[Friendster]]; early (2011) project, variety of formats<br />
|-<br />
| [http://archive.org/details/archiveteam_formspring archiveteam_formspring] || Unaudited || || 1477 || || [[Formspring]]; WARCs; some duplication in collection description<br />
|-<br />
| [http://archive.org/details/archiveteam_yahoo_messages archiveteam_yahoo_messages] || Unaudited || || 17 || || [[Yahoo! Messages]]; WARCs; Minimal description on collection, none on items<br />
|-<br />
| [http://archive.org/details/archiveteam_punchfork archiveteam_punchfork] || Unaudited || || 47 || [https://archive.org/download/archiveteam_punchfork_index/index.html Yes] || [[Punchfork]]; Needs link to index from collection description (and item descriptions); three different types of items, unclear differences<br />
|-<br />
| [http://archive.org/details/yahoo_korea_blogs yahoo_korea_blogs] || Unaudited || || 10 || || WARCs; no item descriptions<br />
|-<br />
| [http://archive.org/details/archiveteam-cinch archiveteam-cinch] || Unaudited || || 20 || No || [[Cinch.fm]]; 10 items, in both WARC and tar formats<br />
|-<br />
| [http://archive.org/details/archiveteam_dailybooth archiveteam_dailybooth] || Unaudited || || 203 || [https://archive.org/download/dailybooth-freeze-frame-index/index.html Yes] || [[DailyBooth]]; link to index on collection page needs adjusting; images seem to be downloadable; individual items lack descriptions<br />
|-<br />
| [http://archive.org/details/archiveteam_weblognl archiveteam_weblognl] || Unaudited || || 26 || No || [[Weblog.nl]]; no English-language description<br />
|-<br />
| [http://archive.org/details/stage6 stage6] || Unaudited || || 790 || || Videos from [[wikipedia:Stage6]]; many seem to be unavailable from IA, due to "issues with the item's content."<br />
|-<br />
| [http://archive.org/details/googlegroups-part2 googlegroups-part2] || Unaudited || || 27 || No || [[Google Groups]]; each item contains a single tar file (ranging in size from 300 MB to over 40 GB); the tar files contain separate zip files for each group, and the zip files contain the actual files. This should probably be grouped with the other grabs of Google Groups.<br />
|-<br />
| [http://archive.org/details/archiveteam-btinternet archiveteam-btinternet] || Unaudited || || 8 || No || WARCs<br />
|-<br />
| [http://archive.org/details/archiveteam-qaudio-archive archiveteam-qaudio-archive] || Unaudited || || 7 || No || Many small WARCs in each item; lengthy explanation in collection description, none in each item<br />
|-<br />
| [http://archive.org/details/webshots-freeze-frame webshots-freeze-frame] || Unaudited || || 2459 || No || [[Webshots]]; WARCs<br />
|-<br />
| [http://archive.org/details/tabblo-archive tabblo-archive] || Unaudited || || 1806 || Maybe: [https://archive.org/details/tabblo-archive-groups groups] item || [[Tabblo]]; 9 MegaWARCs; the rest of the items are groups of individual accounts as zip files<br />
|-<br />
| [http://archive.org/details/archiveteam-fortunecity archiveteam-fortunecity] || Unaudited || || 55 || [https://archive.org/details/archiveteam-fortunecity-list Yes] || [[FortuneCity]]; 26 "Set" items (containing a single large tar in each one); also 26 WARC items, and one leftovers item<br />
|-<br />
| [http://archive.org/details/2012-04-30-wikimedia-images-snapshot 2012-04-30-wikimedia-images-snapshot] || Unaudited || Nemo || 148 || Not really || Should become a subcollection of "wikicollections", so that it's next to "wikimediacommons". The "remote" tarballs partially overlap with xowa items nowadays. If a complete mirror of the Your.Org tarballs is desired, we should list it at [https://meta.wikimedia.org/wiki/Mirroring_Wikimedia_project_XML_dumps#Media_tarballs] with some maintenance information. It's not clear whether investing N TB at IA is a priority here, nor whether IA expects WikiTeam to do the uploads instead (in that case, ask Hydriz or Arkiver). Also, the Your.Org dumps are currently blocked on the lack of an rsync server on Wikimedia servers.<br />
|-<br />
| [http://archive.org/details/archiveteam-anyhub archiveteam-anyhub] || Unaudited || || 39 || || [[AnyHub]]; 18 each WARC & tar items, and one called the "Blue Collection" <br />
|-<br />
| [http://archive.org/details/archiveteam-fileplanet archiveteam-fileplanet] || Unaudited || || 675 || || [[FilePlanet]]<br />
|-<br />
| [http://archive.org/details/archiveteam-umich-save archiveteam-umich-save] || Unaudited || || 52 || || <br />
|-<br />
| [http://archive.org/details/archiveteam-geocities archiveteam-geocities] || Unaudited || || 12 || || [[Geocities]]<br />
|-<br />
| [http://archive.org/details/archiveteam-fire archiveteam-fire] || Unaudited || || 7135 || || A vast and miscellaneous collection; needs quite a bit of TLC<br />
|-<br />
| [http://archive.org/details/archiveteam-mypodcast archiveteam-mypodcast] || Unaudited || || 383 || || Each item is a separate podcast, containing individual sound files, playable through the IA interface; there is also a [https://archive.org/download/archiveteam-mypodcast-dataonly misc] item<br />
|-<br />
| [http://archive.org/details/archiveteam-googlegroups archiveteam-googlegroups] || Unaudited || [[User:JesseW|JesseW]] || 1,348 || Partial (each item has a list of groups, but there's no overall list) || [[Google Groups]]; This is divided into items by the initial two letters (or digits or underscore). The item for "[https://archive.org/details/archiveteam-googlegroups-th th]" has an inconsistent title and category.<br />
|-<br />
| isohunt dumps [https://archive.org/details/isohunt.teapot.2013 1] [https://archive.org/details/isohunt.croissant.2013 2] [https://archive.org/details/isohunt.coffeepot.2013 3] || Unaudited || || 3 || No || These are not yet in a dedicated collection, and have never been post-processed. Some of the .torrent files may actually be error pages. This needs work, and proper full auditing.<br />
|-<br />
| '''[https://archive.org/search.php?query=streetfiles No Category (streetfiles)]''' || Unaudited || || || ||<br />
|-<br />
| [https://archive.org/details/archiveteam_yahoovoices archiveteam_yahoovoices] || Unaudited || || 30 || No || [[Yahoo! Voices]]; WARCs<br />
|-<br />
| [https://archive.org/details/archiveteam_twitchtv archiveteam_twitchtv] || Unaudited || || 2213 || [http://chfoo-cn.mooo.com/~archiveteam/twitchtv-index/html/ Yes] [https://github.com/ArchiveTeam/twitchtv-index/ (source)] || [[Twitch.tv]]<br />
|-<br />
| [https://archive.org/details/archiveteam_fotopedia archiveteam_fotopedia] || Unaudited || || 40 || || [[Fotopedia]]; WARCs<br />
|-<br />
| [https://archive.org/details/archiveteam_canvas archiveteam_canvas] || Unaudited || || 47 || || [[Canv.as]]; WARCs<br />
|-<br />
| [https://archive.org/details/archiveteam_ancestry archiveteam_ancestry] || Unaudited || || 82 || || [[Ancestry.com]]; WARCs<br />
|}<br />
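On the isohunt note in the table above (some of the .torrent files may actually be saved error pages): a quick triage heuristic is that a valid .torrent is a bencoded dictionary and therefore starts with a "d" byte (typically "d8:announce"), while an HTML error page starts with "<". A minimal sketch:<br />

```python
# Heuristic check for ".torrent" files that are probably saved HTML
# error pages rather than real torrents: bencoded torrents start with
# b"d" (a bencoded dict), HTML starts with b"<".
def looks_like_torrent(first_bytes):
    """True if the leading bytes look like a bencoded dict."""
    return first_bytes.lstrip().startswith(b"d")

# e.g. flag suspect files for manual review:
# suspects = [p for p in torrent_paths
#             if not looks_like_torrent(open(p, "rb").read(16))]
```

This only catches the obvious cases; a full audit would bencode-decode each file and verify the info dictionary.<br />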
<br />
== [[:Category:In_progress|In progress???]] ==<br />
<br />
But what happened after? Where are the archives?<br />
<br />
* [[BerliOS]]<br />
* [[Deletionpedia]]<br />
* [[Delicious]]<br />
* [[ExtraTorrent]]<br />
* [[Free ProHosting]]<br />
* [[Google Video]]<br />
* [[Ispygames]]<br />
* [[Len Sassaman Project]]<br />
* [[Lulu Poetry]]<br />
* [[Prodigy.net]]<br />
* [[Resedagboken]]<br />
* [[ScreenshotsDatabase.com]]<br />
* [[Spanish Revolution]]: Is this finished?<br />
* [[University of Michigan personal webpages]]<br />
* [[Wallbase]]<br />
* [[Wallhaven]]<br />
* [[Webmonkey]]<br />
* [[Widgetbox]]<br />
* [[Windows Live Spaces]]<br />
<br />
== Oddities, Mislocations, and To Do ==<br />
<br />
* https://archive.org/search.php?query=earbits Earbits gathering is in the wrong place and needs additional versions.<br />
<br />
=== To be moved to better collection ===<br />
<br />
==== WARC ====<br />
* https://archive.org/details/fenopy-se-fire-grab-2014-12-30-16-38-13<br />
* https://archive.org/details/netszar_com_2015_06<br />
* https://archive.org/details/swipnet-searchengine-crawl-nonrecursive<br />
* https://archive.org/details/swipnet-searchengine-crawl-recursive<br />
* https://archive.org/details/kajaszoszentpeter_hu_2015_06<br />
* https://archive.org/details/warc-hallofshame.gp.co.at<br />
* https://archive.org/details/warc-freakedenough.at<br />
* https://archive.org/details/nintendoukkidsclub-20150608.warc<br />
* https://archive.org/details/warc-9chin<br />
* https://archive.org/details/warcarchive-www.bun23.com<br />
* https://archive.org/details/warchive-www.sotipro.com<br />
* https://archive.org/details/files.hii-tech.com-warc<br />
<br />
==== FTP ====<br />
* https://archive.org/details/2014.0102.mail.digipro.rs<br />
* https://archive.org/details/2014.12.ftp.dlink.biz_201501<br />
* https://archive.org/details/2015.01.12.ftp.sunet.sePubOpenBSD<br />
<br />
==== Misc ====<br />
<br />
* https://archive.org/details/archiveteam-picplz-index<br />
* https://archive.org/details/Posterous.comHostnames<br />
* https://archive.org/details/YahooBlogSitemaps20131216071927<br />
* https://archive.org/details/archiveteam-mobileme-index<br />
* https://archive.org/details/ESPNForumsPanicgrab<br />
* https://archive.org/details/rawporter-grab<br />
* https://archive.org/details/bitsnoop-dump<br />
* https://archive.org/details/CaliforniaFinanceLobbyData<br />
* https://archive.org/details/ArchiveteamWarriorV220121008Hyperv<br />
* https://archive.org/details/HowFlickr.comLookedLikeIn2010-APlaceOfWorshipOnFlickr-Photo<br />
* https://archive.org/details/myopera_shutdown_notice<br />
* https://archive.org/details/UsenetSci.space.news2003-2012<br />
* https://archive.org/details/Usenet_rec.food.recipesArchive2003-2012<br />
* https://archive.org/details/MirrorOfSiteOrtodoxiesiviata.blogspot.com<br />
* https://archive.org/details/carti.itarea.org<br />
* https://archive.org/details/ovmk_story<br />
* https://archive.org/details/ti_guidebook_en<br />
* https://archive.org/details/ti_guidebook_fr<br />
* https://archive.org/details/ti_guidebook_de<br />
* https://archive.org/details/myopera_usernames_FIXED.7z<br />
* https://archive.org/details/DubaiWikipediaPageOn2012-09-06<br />
* https://archive.org/details/digpicz-2008-07-30-website<br />
* https://archive.org/details/site-wwwangelfirecomazdixieden<br />
* https://archive.org/details/ArkiverCrawlsPack0004<br />
* https://archive.org/details/ArkiverCrawlsPack0005<br />
* https://archive.org/details/ArkiverCrawlsPack0007<br />
* https://archive.org/details/ArkiverCrawlsPack0008<br />
* https://archive.org/details/laptops-manuals-dump-from-tim.id.au-20121111<br />
* https://archive.org/details/paste_lisp_org<br />
* https://archive.org/details/MtGoxSituationCrisisStrategyDraft<br />
* https://archive.org/details/MtGoxBusinessPlan20142017<br />
* https://archive.org/details/nyt_innovation_2014<br />
* https://archive.org/details/slackware-irc-logs<br />
* https://archive.org/details/thekeep_bbs<br />
* https://archive.org/details/mail.google.com-saved-1Oct2014<br />
* https://archive.org/details/Data2September2013.tar (Gunnerkrigg Court homepage comments snapshots)<br />
* https://archive.org/details/fotodisco-raw-items<br />
* https://archive.org/details/qwikidisco-raw-items<br />
* https://archive.org/details/twitpicdisco-raw-items<br />
* https://archive.org/details/maemo-fremantle-ovi<br />
* https://archive.org/details/toontown_infinite_github_20150103<br />
* https://archive.org/details/amplicate_sitemaps_20140218<br />
* https://archive.org/details/twitch-raw-items<br />
* https://archive.org/details/actionbutton_mini.tar<br />
* https://archive.org/details/ageofnerds_mini<br />
* https://archive.org/details/2015feb06a07FuturamerlinAList<br />
* https://archive.org/details/worldpeacehaven_gmail_Xaa<br />
* https://archive.org/details/worldpeacehaven_gmail_Xab<br />
* https://archive.org/details/2015feb02ob<br />
* https://archive.org/details/2014dec09spe2<br />
* https://archive.org/details/bigougit_mini_v2<br />
* https://archive.org/details/galman33_mini<br />
* https://archive.org/details/urls2015dec02n2<br />
* https://archive.org/details/493nfos<br />
* https://archive.org/details/archiveteam_dev_env_v1_appliances<br />
* https://archive.org/details/Kazbeg_Panorama.jpg -- If tags can be edited by non-owners, this probably shouldn't have the ''archiveteam'' tag.<br />
* https://archive.org/search.php?query=subject%3A%22wallbase%22 -- 10 different items, representing efforts at saving [[wallbase.cc]]; need to be sorted and organized<br />
* https://archive.org/search.php?query=subject%3A%22aol%20archiveteam%2C%20aol%20files%2C%20aol%20protocol%22 -- 6 items that need their subject tags cleaned up<br />
* https://archive.org/search.php?query=subject%3A%22Tabblo%22%20AND%20NOT%20collection%3Aarchiveteam -- 5 of the 11 Tabblo items are not in the Archiveteam collection<br />
<br />
== Missing ==<br />
<br />
* [[Yahoo!_Blog]]: What happened to the Vietnam archives? Does anyone have a copy or at least a blurry screenshot of the Korean shutdown notice?<br />
<br />
[[Category:Archive Team]]<br />
<br />
{{Navigation box}}</div>Archive Maniachttps://wiki.archiveteam.org/index.php?title=Audit2014&diff=24013Audit20142015-07-31T22:31:07Z<p>Archive Maniac: /* WARC */ See also: https://archive.org/search.php?query=subject%3A%22warc.gz%22</p>
<hr />
<div>We've uploaded a bunch of stuff: <br />
*[https://archive.org/search.php?query=subject:archiveteam subject:archiveteam] = 8,845 items<br />
*[https://archive.org/search.php?query=collection:archiveteam collection:archiveteam] = 39,562 items<br />
*[https://archive.org/search.php?query=NOT%20collection%3A%28archiveteam%29%20AND%20subject%3A%28archiveteam%29 subject:archiveteam AND NOT collection:archiveteam] = 1,507<br />
<br />
(The 3rd one should eventually be close to empty.)<br />
<br />
Let's go through the list and make sure it's categorized, has decent metadata, etc.<br />
<br />
Many of our uploads are quite large, and have been broken into many items on Archive.org. We'll group them together here and verify each set all at once.<br />
<br />
== Things to check ==<br />
<br />
; Collection : Are all the related items grouped into a collection?<br />
; Description : Can a visitor figure out what each item represents? Items in a collection don't need to repeat the description of the collection, but it'd be nice if they had a sentence or two, and information about how the item differs from the other items in the collection ("MP3s from earbits.com, files starting with c." from the Earbits items is a good example.)<br />
; Inclusion : Are all the related items included in the same collection?<br />
; Categorization : Can a visitor find the item by browsing the collections?<br />
; Cross-references : Can a visitor find other items in a set, starting at any item in the set? Can a visitor find the index of a large set starting from any part of it?<br />
; Indexing : If the item is a collection of sub-items, is one of these sub-items an index of the others? (This is a complicated thing to check for and to create when it doesn't exist, so we can come back to this after we've checked the rest.)<br />
; Your suggestion here : this is just off the top of my head.<br />
<br />
== High-level Collections ==<br />
* https://archive.org/details/web <br />
** https://archive.org/details/archiveteam<br />
*** https://archive.org/details/archiveteam-fire<br />
*** https://archive.org/details/archivebot<br />
** https://archive.org/details/wikiteam<br />
<br />
== Current Sub-Collections at Archive Team ==<br />
<br />
{| class="wikitable sortable"<br />
|-<br />
!Collection<br />
!Status<br />
!Auditor<br />
!Item Count<br />
!Has an Index<br />
!Description of Audit<br />
|-<br />
| '''[https://archive.org/search.php?query=earbits No Category (earbits)]''' || Unaudited || || 98 || Yes || The items are not in a collection. Most items are WARCs; the rest need additional work if anyone is going to be able to find the exact MP3 they want.<br />
|-<br />
| [http://archive.org/details/archiveteam_ptch archiveteam_ptch] || Audited || db48x || 50 || No || Collection has great description, but no categories. Items in collection are WARCS. One item not included in the collection: [https://archive.org/details/deathy-s3-test-ptch deathy-s3-test-ptch]<br />
|-<br />
| [http://archive.org/details/archiveteam_flowerpot archiveteam_flowerpot] || Audited || db48x || 406 || No || The description of the collection is anemic, but each item is well-identified.<br />
|-<br />
| [http://archive.org/details/github_files github_files] || Audited || db48x || 1 || No || Pretty bad shape. Only one item in the collection, and that's only half the data. Was the rest never uploaded? Has no description, keywords or other metadata. Other Github items could be included, such as [https://archive.org/details/archiveteam-github-repository-index-201212 this repository index], and [https://archive.org/search.php?query=ArchiveTeam%20GitHub%20file%20downloads these other file downloads]<br />
|-<br />
| [http://archive.org/details/justintv justintv] || Audited || db48x || 189 || <s>No</s> [http://chfoo-cn.mooo.com/~archiveteam/justintv-index/html/ Partial] [https://github.com/ArchiveTeam/justintv-index (Src)]|| Decent description, but no other metadata. There are [https://archive.org/search.php?query=justintv%20and%20-collection%3A%28justintv%29 51 other 'justintv' items], but none of them look to be from us.<br />
|-<br />
| [http://archive.org/details/archiveteam_mochimedia archiveteam_mochimedia] || Audited || db48x || 9 || No || Collection includes Mochi's notice about the shutdown, but no other context. The items are all WARCs, and all have CDXs and JSON indexes, but there's no overall index.<br />
<br />
Index can be easily generated from [https://web.archive.org/web/*/http://feedmonger.mochimedia.com/feeds/query/?q=search%3A&limit=81563 this 26MB JSON file]--chfoo<br />
|-<br />
| [http://archive.org/details/archivebot archivebot] || Unaudited || || 1070 || Sort of: [http://archive.fart.website/archivebot/viewer/ Viewer] || [[ArchiveBot]]; The viewer doesn't seem to index into crawls; there's no link from the collection or the items to the viewer (or anywhere else)<br />
|-<br />
| [http://archive.org/details/archiveteam_yahooblogs archiveteam_yahooblogs] and [https://archive.org/details/archiveteam_yahooblog archiveteam_yahooblog] || Audited || db48x || 49 || No || Collection description is just the shutdown notice (and apparently quite a brief one at that) with no other context. Items are all WARCs, and all have CDXs and JSON indexes, but there's no overall index. One item is orphaned in a collection of its own; apparently caused by a typo in the collection name. <br />
|-<br />
| [http://archive.org/details/archiveteam-splinder archiveteam-splinder] || Unaudited || || 53 || || See [[Splinder]]<br />
|-<br />
| [http://archive.org/details/archiveteam-picplz archiveteam-picplz] || Audited || db48x || 141 || Yes || The collection description is just the shutdown message, with no other context. Items are tarballs containing WARCs. There is an index, but it's not a part of the collection ([https://archive.org/download/picplz-00454713-20120603-143400.warc/]). There's also a search page for the index, which is great.<br />
|-<br />
| [http://archive.org/details/archiveteam_puush archiveteam_puush] || Audited || db48x || 1781 || || The collection description is just the shutdown notice, but it's better than average; it includes some context. The items are all WARCs with CDXs, but there's no central index.<br />
|-<br />
| [http://archive.org/details/archiveteam_upcoming archiveteam_upcoming] || Audited ||dashcloud1 || 142 || no || The collection description only describes the site, not the items themselves. Individual items have no description of any kind.<br />
|-<br />
| [http://archive.org/details/archiveteam_randomfandom archiveteam_randomfandom] || Audited || dashcloud1 || 42 || yes || Short collection description, but has an index, and every collection item is well described. Index is located right on collection page.<br />
|-<br />
| [http://archive.org/details/archiveteam_antecedents archiveteam_antecedents] || Audited || db48x || 46 || N/A || This collection represents multiple sites, rather than multiple parts of a single large site. The collection description is quite brief, but each item appears to have a paragraph describing what the site is/was, as well as some basic metadata such as keywords. All the items appear to be WARCs with CDXs<br />
|-<br />
| [http://archive.org/details/archiveteam_jazzhands archiveteam_jazzhands] || Audited || db48x || 443 || No || This one is a collection of items from multiple sites, but those sites are also broken up into multiple items based on when they were scanned. The items have brief descriptions and some keywords, and are WARCs with CDXs. A good way to improve this would be to make collections for each site as subcollections.<br />
|-<br />
| [http://archive.org/details/archiveteam-mobileme-hero archiveteam-mobileme-hero] || Unaudited || || 4007 || [https://archive.org/download/archiveteam-mobileme-index/mobileme-20120817.html Yes] [https://github.com/ArchiveTeam/mobileme-index (source)] ||<br />
|-<br />
| [http://archive.org/details/archiveteam_myopera archiveteam_myopera] || Audited || dashcloud1 || 155 || No || Collection page has a nice description of the site and the items. The items all appear to be WARCs, and have no descriptions/keywords of any kind on them.<br />
|-<br />
| [http://archive.org/details/archiveteam_bebo archiveteam_bebo] || Unaudited || [[User:JesseW|JesseW]] || 2867 || || They appear to all be WARCs, most uploaded on the same day; it's not clear if all of them are in the Wayback Machine or not. Each item has no description or context.<br />
|-<br />
| [http://archive.org/details/archiveteam_dogster archiveteam_dogster] || Audited || jscott || 55 || ??? || Collection well described. Wayback Machine-Ready WARCs, all integrated.<br />
|-<br />
| [http://archive.org/details/hyves hyves] || Unaudited || || 517 || || [[Hyves]]<br />
|-<br />
| [http://archive.org/details/archiveteam_wretch archiveteam_wretch] || Unaudited || || 2163 || || [[Wretch]]; WARCs<br />
|-<br />
| [http://archive.org/details/archiveteam_xanga archiveteam_xanga] || Unaudited || || 454 || || [[Xanga]]; WARCs<br />
|-<br />
| [http://archive.org/details/twitterstream twitterstream] || Unaudited || || 41 || || [[Twitter]]; according to reviews, at least one file is empty.<br />
|-<br />
| [http://archive.org/details/pastebinpastes pastebinpastes] || Unaudited || || 223 || || These are tarballs (less than 100 MBs, usually), containing each paste in a separate file. Most recently updated on July 1, 2014<br />
|-<br />
| [http://archive.org/details/archiveteam_zapd archiveteam_zapd] || Unaudited || || 19 || || [[Zapd]]; WARCs<br />
|-<br />
| [http://archive.org/details/archiveteam_patch archiveteam_patch] || Unaudited || || 38 || || [[Patch]]; WARCs<br />
|-<br />
| [http://archive.org/details/archiveteam_posterous archiveteam_posterous] || Unaudited || || 444 || || [[Posterous]]; WARCs<br />
|-<br />
| [http://archive.org/details/archiveteam_greader archiveteam_greader] || Unaudited || || 368 || || [[Google Reader]]; 3 categories of WARCs: Directory, Stats & general. It would probably be good to also put them in separate collections. There is also a [https://archive.org/details/archiveteam_greaderstats_combined combined stats item].<br />
|-<br />
| [http://archive.org/details/archiveteam_ignsites archiveteam_ignsites] || Unaudited || || 81 || || [[IGN]] (needs link to archive); Each item contains a particular subdomain. Descriptive names.<br />
|-<br />
| [http://archive.org/details/archiveteam_g4tv_forums archiveteam_g4tv_forums] || Unaudited || || 74 || || ARCs from [[wikipedia:G4 (TV channel)]], mainly from the forum<br />
|-<br />
| [http://archive.org/details/archiveteam-yahoovideo archiveteam-yahoovideo] || Unaudited || || 156 || || [[Yahoo! Video]]; various inconsistencies in naming and categories; some items contain [https://archive.org/details/ARCHIVETEAM-YV-4790761-4799994 zip files], while others contain [https://archive.org/details/ARCHIVETEAM-YV-04980027-04983272 tar files].<br />
|-<br />
| [http://archive.org/details/archive-team-friendster archive-team-friendster] || Unaudited || || 137 || Maybe -> [https://archive.org/details/archiveteam-friendster-index archiveteam-friendster-index] item || [[Friendster]]; early (2011) project, variety of formats<br />
|-<br />
| [http://archive.org/details/archiveteam_formspring archiveteam_formspring] || Unaudited || || 1477 || || [[Formspring]]; WARCs; some duplication in collection description<br />
|-<br />
| [http://archive.org/details/archiveteam_yahoo_messages archiveteam_yahoo_messages] || Unaudited || || 17 || || [[Yahoo! Messages]]; WARCs; Minimal description on collection, none on items<br />
|-<br />
| [http://archive.org/details/archiveteam_punchfork archiveteam_punchfork] || Unaudited || || 47 || [https://archive.org/download/archiveteam_punchfork_index/index.html Yes] || [[Punchfork]]; Needs link to index from collection description (and item descriptions); three different types of items, unclear differences<br />
|-<br />
| [http://archive.org/details/yahoo_korea_blogs yahoo_korea_blogs] || Unaudited || || 10 || || WARCs; no item descriptions<br />
|-<br />
| [http://archive.org/details/archiveteam-cinch archiveteam-cinch] || Unaudited || || 20 || No || [[Cinch.fm]]; 10 items, in both WARC and tar formats<br />
|-<br />
| [http://archive.org/details/archiveteam_dailybooth archiveteam_dailybooth] || Unaudited || || 203 || [https://archive.org/download/dailybooth-freeze-frame-index/index.html Yes] || [[DailyBooth]]; link to index on collection page needs adjusting; images seem to be downloadable; individual items lack descriptions<br />
|-<br />
| [http://archive.org/details/archiveteam_weblognl archiveteam_weblognl] || Unaudited || || 26 || No || [[Weblog.nl]]; no English-language description<br />
|-<br />
| [http://archive.org/details/stage6 stage6] || Unaudited || || 790 || || Videos from [[wikipedia:Stage6]]; many seem to be unavailable from IA, due to "issues with the item's content."<br />
|-<br />
| [http://archive.org/details/googlegroups-part2 googlegroups-part2] || Unaudited || || 27 || No || [[Google Groups]]; each item contains a single tar file (ranging in size from 300 MB to over 40 GB); the tar files contain separate zip files for each group; the zip files contain the actual files. This should probably be grouped with the other grabs of Google Groups.<br />
|-<br />
| [http://archive.org/details/archiveteam-btinternet archiveteam-btinternet] || Unaudited || || 8 || No || WARCs<br />
|-<br />
| [http://archive.org/details/archiveteam-qaudio-archive archiveteam-qaudio-archive] || Unaudited || || 7 || No || Many small WARCs in each item; lengthy explanation in collection description, none in each item<br />
|-<br />
| [http://archive.org/details/webshots-freeze-frame webshots-freeze-frame] || Unaudited || || 2459 || No || [[Webshots]]; WARCs<br />
|-<br />
| [http://archive.org/details/tabblo-archive tabblo-archive] || Unaudited || || 1806 || Maybe: [https://archive.org/details/tabblo-archive-groups groups] item || [[Tabblo]]; 9 MegaWARCs, the rest of the items are groups of individual accounts as zip files<br />
|-<br />
| [http://archive.org/details/archiveteam-fortunecity archiveteam-fortunecity] || Unaudited || || 55 || [https://archive.org/details/archiveteam-fortunecity-list Yes] || [[FortuneCity]]; 26 "Set" items (containing a single large tar in each one); also 26 WARC items, and one leftovers item<br />
|-<br />
| [http://archive.org/details/2012-04-30-wikimedia-images-snapshot 2012-04-30-wikimedia-images-snapshot] || Unaudited || Nemo || 148 || Not really || Should become a subcollection of "wikicollections", so that it's next to "wikimediacommons". The "remote" tarballs partially overlap with xowa items nowadays. If a complete mirror of the Your.Org tarballs is desired, we should list it at [https://meta.wikimedia.org/wiki/Mirroring_Wikimedia_project_XML_dumps#Media_tarballs] with some maintenance information. It's not clear whether investing N TB at IA is a priority here, nor whether IA expects WikiTeam to do the uploads instead (in that case, ask Hydriz or Arkiver). Also, the Your.Org dumps are currently blocked on the lack of an rsync server on Wikimedia servers.<br />
|-<br />
| [http://archive.org/details/archiveteam-anyhub archiveteam-anyhub] || Unaudited || || 39 || || [[AnyHub]]; 18 WARC items and 18 tar items, plus one called the "Blue Collection"<br />
|-<br />
| [http://archive.org/details/archiveteam-fileplanet archiveteam-fileplanet] || Unaudited || || 675 || || [[FilePlanet]]<br />
|-<br />
| [http://archive.org/details/archiveteam-umich-save archiveteam-umich-save] || Unaudited || || 52 || || <br />
|-<br />
| [http://archive.org/details/archiveteam-geocities archiveteam-geocities] || Unaudited || || 12 || || [[Geocities]]<br />
|-<br />
| [http://archive.org/details/archiveteam-fire archiveteam-fire] || Unaudited || || 7135 || || A vast and misc. collection; needs quite a bit of TLC<br />
|-<br />
| [http://archive.org/details/archiveteam-mypodcast archiveteam-mypodcast] || Unaudited || || 383 || || Each item is a separate podcast, containing individual sound files, playable through the IA interface; there is also a [https://archive.org/download/archiveteam-mypodcast-dataonly misc] item<br />
|-<br />
| [http://archive.org/details/archiveteam-googlegroups archiveteam-googlegroups] || Unaudited || [[User:JesseW|JesseW]] || 1,348 || Partial (each item has a list of groups, but there's no overall list) || [[Google Groups]]; This is divided into items by the initial two letters (or digits or underscore). The item for "[https://archive.org/details/archiveteam-googlegroups-th th]" has an inconsistent title and category.<br />
|-<br />
| isohunt dumps [https://archive.org/details/isohunt.teapot.2013 1] [https://archive.org/details/isohunt.croissant.2013 2] [https://archive.org/details/isohunt.coffeepot.2013 3] || Unaudited || || 3 || No || These are not yet in a dedicated collection, and have never been post-processed. Some of the .torrent files may actually be error pages. This needs work, and proper full auditing.<br />
|-<br />
| '''[https://archive.org/search.php?query=streetfiles No Category (streetfiles)]''' || Unaudited || || || ||<br />
|-<br />
| [https://archive.org/details/archiveteam_yahoovoices archiveteam_yahoovoices] || Unaudited || || 30 || No || [[Yahoo! Voices]]; WARCs<br />
|-<br />
| [https://archive.org/details/archiveteam_twitchtv archiveteam_twitchtv] || Unaudited || || 2213 || [http://chfoo-cn.mooo.com/~archiveteam/twitchtv-index/html/ Yes] [https://github.com/ArchiveTeam/twitchtv-index/ (source)] || [[Twitch.tv]]<br />
|-<br />
| [https://archive.org/details/archiveteam_fotopedia archiveteam_fotopedia] || Unaudited || || 40 || || [[Fotopedia]]; WARCs<br />
|-<br />
| [https://archive.org/details/archiveteam_canvas archiveteam_canvas] || Unaudited || || 47 || || [[Canv.as]]; WARCs<br />
|-<br />
| [https://archive.org/details/archiveteam_ancestry archiveteam_ancestry] || Unaudited || || 82 || || [[Ancestry.com]]; WARCs<br />
|}<br />
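The isohunt row in the table above notes that some of the .torrent files may actually be saved error pages. A quick triage heuristic (a sketch, not a full audit): a real .torrent file is a bencoded dictionary, so its first bytes are `d` followed by a length-prefixed key such as `8:announce`, while a saved HTML error page starts with markup.

```python
def looks_like_torrent(data: bytes) -> bool:
    """Heuristic: a real .torrent file is a bencoded dictionary, so it
    starts with 'd' followed by a length-prefixed string key (e.g.
    'd8:announce'); a saved HTML error page starts with '<' or text."""
    return data[:1] == b"d" and b":" in data[:12]

# A bencoded dictionary passes; an HTML error page does not.
assert looks_like_torrent(b"d8:announce41:http://tracker.example/announce4:infodee")
assert not looks_like_torrent(b"<!DOCTYPE html><html><body>404 Not Found</body></html>")
```

Anything flagged by this check would still need manual review; a proper audit would fully decode the bencoding with a bencode library.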
<br />
== [[:Category:In_progress|In progress???]] ==<br />
<br />
But what happened after? Where are the archives?<br />
<br />
* [[BerliOS]]<br />
* [[Deletionpedia]]<br />
* [[Delicious]]<br />
* [[ExtraTorrent]]<br />
* [[Free ProHosting]]<br />
* [[Google Video]]<br />
* [[Ispygames]]<br />
* [[Len Sassaman Project]]<br />
* [[Lulu Poetry]]<br />
* [[Prodigy.net]]<br />
* [[Resedagboken]]<br />
* [[ScreenshotsDatabase.com]]<br />
* [[Spanish Revolution]]: Is this finished?<br />
* [[University of Michigan personal webpages]]<br />
* [[Wallbase]]<br />
* [[Wallhaven]]<br />
* [[Webmonkey]]<br />
* [[Widgetbox]]<br />
* [[Windows Live Spaces]]<br />
<br />
== Oddities, Mislocations, and To Do ==<br />
<br />
* https://archive.org/search.php?query=earbits Earbits gathering is in the wrong place and needs additional versions.<br />
<br />
=== To be moved to better collection ===<br />
<br />
==== WARC ====<br />
* https://archive.org/details/fenopy-se-fire-grab-2014-12-30-16-38-13<br />
* https://archive.org/details/netszar_com_2015_06<br />
* https://archive.org/details/swipnet-searchengine-crawl-nonrecursive<br />
* https://archive.org/details/swipnet-searchengine-crawl-recursive<br />
* https://archive.org/details/kajaszoszentpeter_hu_2015_06<br />
* https://archive.org/details/warc-hallofshame.gp.co.at<br />
* https://archive.org/details/warc-freakedenough.at<br />
* https://archive.org/details/nintendoukkidsclub-20150608.warc<br />
<br />
==== FTP ====<br />
* https://archive.org/details/2014.0102.mail.digipro.rs<br />
* https://archive.org/details/2014.12.ftp.dlink.biz_201501<br />
* https://archive.org/details/2015.01.12.ftp.sunet.sePubOpenBSD<br />
<br />
==== Misc ====<br />
<br />
* https://archive.org/details/archiveteam-picplz-index<br />
* https://archive.org/details/Posterous.comHostnames<br />
* https://archive.org/details/YahooBlogSitemaps20131216071927<br />
* https://archive.org/details/archiveteam-mobileme-index<br />
* https://archive.org/details/ESPNForumsPanicgrab<br />
* https://archive.org/details/rawporter-grab<br />
* https://archive.org/details/bitsnoop-dump<br />
* https://archive.org/details/CaliforniaFinanceLobbyData<br />
* https://archive.org/details/ArchiveteamWarriorV220121008Hyperv<br />
* https://archive.org/details/HowFlickr.comLookedLikeIn2010-APlaceOfWorshipOnFlickr-Photo<br />
* https://archive.org/details/myopera_shutdown_notice<br />
* https://archive.org/details/UsenetSci.space.news2003-2012<br />
* https://archive.org/details/Usenet_rec.food.recipesArchive2003-2012<br />
* https://archive.org/details/MirrorOfSiteOrtodoxiesiviata.blogspot.com<br />
* https://archive.org/details/carti.itarea.org<br />
* https://archive.org/details/ovmk_story<br />
* https://archive.org/details/ti_guidebook_en<br />
* https://archive.org/details/ti_guidebook_fr<br />
* https://archive.org/details/ti_guidebook_de<br />
* https://archive.org/details/myopera_usernames_FIXED.7z<br />
* https://archive.org/details/DubaiWikipediaPageOn2012-09-06<br />
* https://archive.org/details/digpicz-2008-07-30-website<br />
* https://archive.org/details/site-wwwangelfirecomazdixieden<br />
* https://archive.org/details/ArkiverCrawlsPack0004<br />
* https://archive.org/details/ArkiverCrawlsPack0005<br />
* https://archive.org/details/ArkiverCrawlsPack0007<br />
* https://archive.org/details/ArkiverCrawlsPack0008<br />
* https://archive.org/details/laptops-manuals-dump-from-tim.id.au-20121111<br />
* https://archive.org/details/paste_lisp_org<br />
* https://archive.org/details/MtGoxSituationCrisisStrategyDraft<br />
* https://archive.org/details/MtGoxBusinessPlan20142017<br />
* https://archive.org/details/nyt_innovation_2014<br />
* https://archive.org/details/slackware-irc-logs<br />
* https://archive.org/details/thekeep_bbs<br />
* https://archive.org/details/mail.google.com-saved-1Oct2014<br />
* https://archive.org/details/Data2September2013.tar (Gunnerkrigg Court homepage comments snapshots)<br />
* https://archive.org/details/fotodisco-raw-items<br />
* https://archive.org/details/qwikidisco-raw-items<br />
* https://archive.org/details/twitpicdisco-raw-items<br />
* https://archive.org/details/maemo-fremantle-ovi<br />
* https://archive.org/details/toontown_infinite_github_20150103<br />
* https://archive.org/details/amplicate_sitemaps_20140218<br />
* https://archive.org/details/twitch-raw-items<br />
* https://archive.org/details/actionbutton_mini.tar<br />
* https://archive.org/details/ageofnerds_mini<br />
* https://archive.org/details/2015feb06a07FuturamerlinAList<br />
* https://archive.org/details/worldpeacehaven_gmail_Xaa<br />
* https://archive.org/details/worldpeacehaven_gmail_Xab<br />
* https://archive.org/details/2015feb02ob<br />
* https://archive.org/details/2014dec09spe2<br />
* https://archive.org/details/bigougit_mini_v2<br />
* https://archive.org/details/galman33_mini<br />
* https://archive.org/details/urls2015dec02n2<br />
* https://archive.org/details/493nfos<br />
* https://archive.org/details/archiveteam_dev_env_v1_appliances<br />
* https://archive.org/details/Kazbeg_Panorama.jpg -- If tags can be edited by non-owners, this probably shouldn't have the ''archiveteam'' tag.<br />
* https://archive.org/search.php?query=subject%3A%22wallbase%22 -- 10 different items, representing efforts at saving [[wallbase.cc]]; need to be sorted and organized<br />
* https://archive.org/search.php?query=subject%3A%22aol%20archiveteam%2C%20aol%20files%2C%20aol%20protocol%22 -- 6 items that need their subject tags cleaned up<br />
* https://archive.org/search.php?query=subject%3A%22Tabblo%22%20AND%20NOT%20collection%3Aarchiveteam -- 5 of the 11 Tabblo items are not in the Archiveteam collection<br />
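For items like the AOL ones above whose subject tags need cleaning up, a small normalization helper can do most of the mechanical work. This is a sketch; it assumes the common archive.org convention of storing multiple subjects as a ";"-separated string (they can also arrive as a list).

```python
def clean_subjects(subjects):
    """Normalize an archive.org subject field: accept a ';'-separated
    string or a list, trim whitespace, and drop empty entries and
    case-insensitive duplicates while preserving order."""
    if isinstance(subjects, str):
        subjects = subjects.split(";")
    seen, cleaned = set(), []
    for s in subjects:
        s = s.strip()
        if s and s.lower() not in seen:
            seen.add(s.lower())
            cleaned.append(s)
    return cleaned

# e.g. clean_subjects("aol archiveteam; aol files; ; AOL Files")
# keeps only 'aol archiveteam' and 'aol files'
print(clean_subjects("aol archiveteam; aol files; ; AOL Files"))
```

The cleaned list can then be written back with the `internetarchive` package's command-line tool, along the lines of `ia metadata <identifier> --modify="subject:aol archiveteam;aol files"` (item-owner or admin privileges required).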
<br />
== Missing ==<br />
<br />
* [[Yahoo!_Blog]]: What happened to the Vietnam archives? Does anyone have a copy or at least a blurry screenshot of the Korean shutdown notice?<br />
<br />
[[Category:Archive Team]]<br />
<br />
{{Navigation box}}</div>
Archive Maniac
https://wiki.archiveteam.org/index.php?title=Audit2014&diff=24012
Audit2014 2015-07-31T22:04:03Z
<p>Archive Maniac: /* WARC */ Sorry, just added a little bit of my WARC crawls on this list.</p>
<hr />
<div>We've uploaded a bunch of stuff: <br />
*[https://archive.org/search.php?query=subject:archiveteam subject:archiveteam] = 8,845 items<br />
*[https://archive.org/search.php?query=collection:archiveteam collection:archiveteam] = 39,562 items<br />
*[https://archive.org/search.php?query=NOT%20collection%3A%28archiveteam%29%20AND%20subject%3A%28archiveteam%29 subject:archiveteam AND NOT collection:archiveteam] = 1,507<br />
<br />
(The 3rd one should eventually be close to empty.)<br />
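These counts can be re-checked at any time against the archive.org advanced-search API, which returns the total hit count as `numFound`. A sketch (the numbers above will drift as items are added and recategorized):

```python
import json
import urllib.parse
import urllib.request

def count_url(query):
    """Build an advancedsearch.php URL that asks for zero rows,
    i.e. just the total hit count."""
    params = urllib.parse.urlencode({"q": query, "rows": 0, "output": "json"})
    return "https://archive.org/advancedsearch.php?" + params

def item_count(query):
    """Number of archive.org items matching a search query."""
    with urllib.request.urlopen(count_url(query)) as resp:
        return json.load(resp)["response"]["numFound"]

# e.g. item_count("subject:(archiveteam) AND NOT collection:(archiveteam)")
# should trend toward 0 as the stragglers get recategorized.
```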
<br />
Let's go through the list and make sure it's categorized, has decent metadata, etc.<br />
<br />
Many of our uploads are quite large, and have been broken into many items on Archive.org. We'll group them together here and verify each set all at once.<br />
<br />
== Things to check ==<br />
<br />
; Collection : Are all the related items grouped into a collection?<br />
; Description : Can a visitor figure out what each item represents? Items in a collection don't need to repeat the description of the collection, but it'd be nice if they had a sentence or two, and information about how the item differs from the other items in the collection ("MP3s from earbits.com, files starting with c." from the Earbits items is a good example.)<br />
; Inclusion : Are all the related items included in the same collection?<br />
; Categorization : Can a visitor find the item by browsing the collections?<br />
; Cross-references : Can a visitor find other items in a set, starting at any item in the set? Can a visitor find the index of a large set starting from any part of it?<br />
; Indexing : If the item is a collection of sub-items, is one of these sub-items an index of the others? (This is a complicated thing to check for and to create when it doesn't exist, so we can come back to this after we've checked the rest.)<br />
; Your suggestion here : this is just off the top of my head.<br />
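Most of the checks above can be pre-screened mechanically from each item's metadata record (https://archive.org/metadata/<identifier>). A sketch, assuming the usual shape of that JSON (the description field may be a string or a list, and the 40-character cutoff for an "anemic" description is an arbitrary choice here):

```python
import json
import urllib.request

def audit_issues(meta):
    """Flag checklist problems in the 'metadata' section of an
    archive.org metadata record."""
    issues = []
    if not meta.get("collection"):
        issues.append("not in any collection")
    desc = meta.get("description") or ""
    if isinstance(desc, list):
        desc = " ".join(desc)
    if len(desc.strip()) < 40:  # arbitrary cutoff for "anemic"
        issues.append("missing or very short description")
    if not meta.get("subject"):
        issues.append("no subject keywords")
    return issues

def fetch_metadata(identifier):
    """Fetch an item's metadata record from the Metadata API."""
    url = "https://archive.org/metadata/" + identifier
    with urllib.request.urlopen(url) as resp:
        return json.load(resp).get("metadata", {})

# e.g. audit_issues(fetch_metadata("archiveteam_ptch"))
```

Cross-references and per-collection indexes still need human review; this only catches the mechanical gaps.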
<br />
== High-level Collections ==<br />
* https://archive.org/details/web <br />
** https://archive.org/details/archiveteam<br />
*** https://archive.org/details/archiveteam-fire<br />
*** https://archive.org/details/archivebot<br />
** https://archive.org/details/wikiteam<br />
<br />
== Current Sub-Collections at Archive Team ==<br />
<br />
{| class="wikitable sortable"<br />
|-<br />
!Collection<br />
!Status<br />
!Auditor<br />
!Item Count<br />
!Has an Index<br />
!Description of Audit<br />
|-<br />
| '''[https://archive.org/search.php?query=earbits No Category (earbits)]''' || Unaudited || || 98 || Yes || The items are not in a collection. Most items are WARCs; the rest need additional work if anyone is going to be able to find the exact MP3 they want.<br />
|-<br />
| [http://archive.org/details/archiveteam_ptch archiveteam_ptch] || Audited || db48x || 50 || No || Collection has great description, but no categories. Items in collection are WARCs. One item not included in the collection: [https://archive.org/details/deathy-s3-test-ptch deathy-s3-test-ptch]<br />
|-<br />
| [http://archive.org/details/archiveteam_flowerpot archiveteam_flowerpot] || Audited || db48x || 406 || No || The description of the collection is anemic, but each item is well-identified.<br />
|-<br />
| [http://archive.org/details/github_files github_files] || Audited || db48x || 1 || No || Pretty bad shape. Only one item in the collection, and that's only half the data. Was the rest never uploaded? Has no description, keywords or other metadata. Other Github items could be included, such as [https://archive.org/details/archiveteam-github-repository-index-201212 this repository index], and [https://archive.org/search.php?query=ArchiveTeam%20GitHub%20file%20downloads these other file downloads]<br />
|-<br />
| [http://archive.org/details/justintv justintv] || Audited || db48x || 189 || <s>No</s> [http://chfoo-cn.mooo.com/~archiveteam/justintv-index/html/ Partial] [https://github.com/ArchiveTeam/justintv-index (Src)]|| Decent description, but no other metadata. There are [https://archive.org/search.php?query=justintv%20and%20-collection%3A%28justintv%29 51 other 'justintv' items], but none of them look to be from us.<br />
|-<br />
| [http://archive.org/details/archiveteam_mochimedia archiveteam_mochimedia] || Audited || db48x || 9 || No || Collection includes Mochi's notice about the shutdown, but no other context. The items are all WARCs, and all have CDXs and JSON indexes, but there's no overall index.<br />
<br />
Index can be easily generated from [https://web.archive.org/web/*/http://feedmonger.mochimedia.com/feeds/query/?q=search%3A&limit=81563 this 26MB JSON file]--chfoo<br />
|-<br />
| [http://archive.org/details/archivebot archivebot] || Unaudited || || 1070 || Sort of: [http://archive.fart.website/archivebot/viewer/ Viewer] || [[ArchiveBot]]; The viewer doesn't seem to index into crawls; there's no link from the collection or the items to the viewer (or anywhere else)<br />
|-<br />
| [http://archive.org/details/archiveteam_yahooblogs archiveteam_yahooblogs] and [https://archive.org/details/archiveteam_yahooblog archiveteam_yahooblog] || Audited || db48x || 49 || No || Collection description is just the shutdown notice (and apparently quite a brief one at that) with no other context. Items are all WARCs, and all have CDXs and JSON indexes, but there's no overall index. One item is orphaned in a collection of its own; apparently caused by a typo in the collection name. <br />
|-<br />
| [http://archive.org/details/archiveteam-splinder archiveteam-splinder] || Unaudited || || 53 || || See [[Splinder]]<br />
|-<br />
| [http://archive.org/details/archiveteam-picplz archiveteam-picplz] || Audited || db48x || 141 || Yes || The collection description is just the shutdown message, with no other context. Items are tarballs containing WARCs. There is an index, but it's not a part of the collection ([https://archive.org/download/picplz-00454713-20120603-143400.warc/]). There's also a search page for the index, which is great.<br />
|-<br />
| [http://archive.org/details/archiveteam_puush archiveteam_puush] || Audited || db48x || 1781 || || The collection description is just the shutdown notice, but it's better than average; it includes some context. The items are all WARCs with CDXs, but there's no central index.<br />
|-<br />
| [http://archive.org/details/archiveteam_upcoming archiveteam_upcoming] || Audited ||dashcloud1 || 142 || no || The collection description only describes the site, not the items themselves. Individual items have no description of any kind.<br />
|-<br />
| [http://archive.org/details/archiveteam_randomfandom archiveteam_randomfandom] || Audited || dashcloud1 || 42 || yes || Short collection description, but has an index, and every collection item is well described. Index is located right on collection page.<br />
|-<br />
| [http://archive.org/details/archiveteam_antecedents archiveteam_antecedents] || Audited || db48x || 46 || N/A || This collection represents multiple sites, rather than multiple parts of a single large site. The collection description is quite brief, but each item appears to have a paragraph describing what the site is/was, as well as some basic metadata such as keywords. All the items appear to be WARCs with CDXs<br />
|-<br />
| [http://archive.org/details/archiveteam_jazzhands archiveteam_jazzhands] || Audited || db48x || 443 || No || This one is a collection of items from multiple sites, but those sites are also broken up into multiple items based on when they were scanned. The items have brief descriptions and some keywords, and are WARCs with CDXs. A good way to improve this would be to make collections for each site as subcollections.<br />
|-<br />
| [http://archive.org/details/archiveteam-mobileme-hero archiveteam-mobileme-hero] || Unaudited || || 4007 || [https://archive.org/download/archiveteam-mobileme-index/mobileme-20120817.html Yes] [https://github.com/ArchiveTeam/mobileme-index (source)] ||<br />
|-<br />
| [http://archive.org/details/archiveteam_myopera archiveteam_myopera] || Audited || dashcloud1 || 155 || No || Collection page has a nice description of the site, and the items. The items appear to be all have WARCs, and have no descriptions/keywords of any kind on them.<br />
|-<br />
| [http://archive.org/details/archiveteam_bebo archiveteam_bebo] || Unaudited || [[User:JesseW|JesseW]] || 2867 || || They appear to all be WARCs, most uploaded on the same day; it's not clear if all of them are in the Wayback Machine or not. Each item has no description or context.<br />
|-<br />
| [http://archive.org/details/archiveteam_dogster archiveteam_dogster] || Audited || jscott || 55 || ??? || Collection well described. Wayback Machine-Ready WARCs, all integrated.<br />
|-<br />
| [http://archive.org/details/hyves hyves] || Unaudited || || 517 || || [[Hyves]]<br />
|-<br />
| [http://archive.org/details/archiveteam_wretch archiveteam_wretch] || Unaudited || || 2163 || || [[Wretch]]; WARCs<br />
|-<br />
| [http://archive.org/details/archiveteam_xanga archiveteam_xanga] || Unaudited || || 454 || || [[Xanga]]; WARCs<br />
|-<br />
| [http://archive.org/details/twitterstream twitterstream] || Unaudited || || 41 || || [[Twitter]] According to reviews, at least one file is empty.<br />
|-<br />
| [http://archive.org/details/pastebinpastes pastebinpastes] || Unaudited || || 223 || || These are tarballs (less than 100 MBs, usually), containing each paste in a separate file. Most recently updated on July 1, 2014<br />
|-<br />
| [http://archive.org/details/archiveteam_zapd archiveteam_zapd] || Unaudited || || 19 || || [[Zapd]]; WARCs<br />
|-<br />
| [http://archive.org/details/archiveteam_patch archiveteam_patch] || Unaudited || || 38 || || [[Patch]] ; WARCs<br />
|-<br />
| [http://archive.org/details/archiveteam_posterous archiveteam_posterous] || Unaudited || || 444 || || [[Posterous]] ; WARCs<br />
|-<br />
| [http://archive.org/details/archiveteam_greader archiveteam_greader] || Unaudited || || 368 || || [[Google Reader]]; 3 categories of WARCs: Directory, Stats & general. It would probably be good to also put them in separate collections. There is also a [https://archive.org/details/archiveteam_greaderstats_combined combined stats item].<br />
|-<br />
| [http://archive.org/details/archiveteam_ignsites archiveteam_ignsites] || Unaudited || || 81 || || [[IGN]] (needs link to archive); Each item contains a particular subdomain. Descriptive names.<br />
|-<br />
| [http://archive.org/details/archiveteam_g4tv_forums archiveteam_g4tv_forums] || Unaudited || || 74 || || ARCs from [[wikipedia:G4 (TV channel)]], mainly from the forum<br />
|-<br />
| [http://archive.org/details/archiveteam-yahoovideo archiveteam-yahoovideo] || Unaudited || || 156 || || [[Yahoo! Video]]; various inconsistency in naming and categories; some items contain [https://archive.org/details/ARCHIVETEAM-YV-4790761-4799994 zip files], while others contain [https://archive.org/details/ARCHIVETEAM-YV-04980027-04983272 tar files].<br />
|-<br />
| [http://archive.org/details/archive-team-friendster archive-team-friendster] || Unaudited || || 137 || Maybe -> [https://archive.org/details/archiveteam-friendster-index archiveteam-friendster-index] item || [[Friendster]]; early (2011) project, variety of formats<br />
|-<br />
| [http://archive.org/details/archiveteam_formspring archiveteam_formspring] || Unaudited || || 1477 || || [[Formspring]]; WARCs; some duplication in collection description<br />
|-<br />
| [http://archive.org/details/archiveteam_yahoo_messages archiveteam_yahoo_messages] || Unaudited || || 17 || || [[Yahoo! Messages]]; WARCs; Minimal description on collection, none on items<br />
|-<br />
| [http://archive.org/details/archiveteam_punchfork archiveteam_punchfork] || Unaudited || || 47 || [https://archive.org/download/archiveteam_punchfork_index/index.html Yes] || [[Punchfork]]; Needs link to index from collection description (and item descriptions); three different types of items, unclear differences<br />
|-<br />
| [http://archive.org/details/yahoo_korea_blogs yahoo_korea_blogs] || Unaudited || || 10 || || WARCs; no item descriptions<br />
|-<br />
| [http://archive.org/details/archiveteam-cinch archiveteam-cinch] || Unaudited || || 20 || No || [[Cinch.fm]]; 10 items, in both WARC and tar formats<br />
|-<br />
| [http://archive.org/details/archiveteam_dailybooth archiveteam_dailybooth] || Unaudited || || 203 || [https://archive.org/download/dailybooth-freeze-frame-index/index.html Yes] || [[DailyBooth]]; link to index on collection page needs adjusting; images seem to be downloadable; individual items lack descriptions<br />
|-<br />
| [http://archive.org/details/archiveteam_weblognl archiveteam_weblognl] || Unaudited || || 26 || No || [[Weblog.nl]]; no English-language description<br />
|-<br />
| [http://archive.org/details/stage6 stage6] || Unaudited || || 790 || || Videos from [[wikipedia:Stage6]]; many seem to be unavailable from IA, due to "issues with the item's content."<br />
|-<br />
| [http://archive.org/details/googlegroups-part2 googlegroups-part2] || Unaudited || || 27 || No || [[Google Groups]]; each item contains a single tar file (ranging in size from 300 MB to over 40 GB); the tar files contain separate zip files for each group; the zip files the actual files. This should probably be grouped with the other grabs of Google Groups.<br />
|-<br />
| [http://archive.org/details/archiveteam-btinternet archiveteam-btinternet] || Unaudited || || 8 || No || WARCs<br />
|-<br />
| [http://archive.org/details/archiveteam-qaudio-archive archiveteam-qaudio-archive] || Unaudited || || 7 || No || Many small WARCs in each item; lengthy explanation in collection description, none in each item<br />
|-<br />
| [http://archive.org/details/webshots-freeze-frame webshots-freeze-frame] || Unaudited || || 2459 || No || [[Webshots]]; WARCs<br />
|-<br />
| [http://archive.org/details/tabblo-archive tabblo-archive] || Unaudited || || 1806 || Maybe: [https://archive.org/details/tabblo-archive-groups groups] item || [[Tabblo]]; 9 MegaWARCs, the rest of the items are groups of individual accounts as zip files<br />
|-<br />
| [http://archive.org/details/archiveteam-fortunecity archiveteam-fortunecity] || Unaudited || [https://archive.org/details/archiveteam-fortunecity-list Yes] || 55 || || [[FortuneCity]]; 26 "Set" items (containing a single large tar in each one); also 26 WARC items, and one leftovers item<br />
|-<br />
| [http://archive.org/details/2012-04-30-wikimedia-images-snapshot 2012-04-30-wikimedia-images-snapshot] || Unaudited || Nemo || 148 || Not really || Should become a subcollection of "wikicollections", so that it's next to "wikimediacommons". The "remote" tarballs partially overlap with xowa items nowadays. If a complete mirror of the Your.Org tarballs is desired, we should list it at [https://meta.wikimedia.org/wiki/Mirroring_Wikimedia_project_XML_dumps#Media_tarballs] with some maintenance information. It's not clear whether investing N TB at IA is a priority here, nor whether IA expects WikiTeam to do the uploads instead (in that case, ask Hydriz or Arkiver). Also, the Your.Org dumps are currently blocked on the lack of an rsync server on Wikimedia servers.<br />
|-<br />
| [http://archive.org/details/archiveteam-anyhub archiveteam-anyhub] || Unaudited || || 39 || || [[AnyHub]]; 18 each WARC & tar items, and one called the "Blue Collection" <br />
|-<br />
| [http://archive.org/details/archiveteam-fileplanet archiveteam-fileplanet] || Unaudited || || 675 || || [[FilePlanet]]<br />
|-<br />
| [http://archive.org/details/archiveteam-umich-save archiveteam-umich-save] || Unaudited || || 52 || || <br />
|-<br />
| [http://archive.org/details/archiveteam-geocities archiveteam-geocities] || Unaudited || || 12 || || [[Geocities]]<br />
|-<br />
| [http://archive.org/details/archiveteam-fire archiveteam-fire] || Unaudited || || 7135 || || A vast and misc. collection; needs quite a bit of TLC<br />
|-<br />
| [http://archive.org/details/archiveteam-mypodcast archiveteam-mypodcast] || Unaudited || || 383 || || Each item is a separate podcast, containing individual sound files, playable through the IA interface; there is also a [https://archive.org/download/archiveteam-mypodcast-dataonly misc] item<br />
|-<br />
| [http://archive.org/details/archiveteam-googlegroups archiveteam-googlegroups] || Unaudited || [[User:JesseW|JesseW]] || 1,348 || Partial (each item has a list of groups, but there's no overall list) || [[Google Groups]]; This is divided into items by the initial two letters (or digits or underscore). The item for "[https://archive.org/details/archiveteam-googlegroups-th th]" has an inconsistent title and category.<br />
|-<br />
| isohunt dumps [https://archive.org/details/isohunt.teapot.2013 1] [https://archive.org/details/isohunt.croissant.2013 2] [https://archive.org/details/isohunt.coffeepot.2013 3] || Unaudited || || 3 || No || These are not yet in a dedicated collection, and have never been post-processed. Some of the .torrent files may actually be error pages. This needs work, and proper full auditing.<br />
|-<br />
| '''[https://archive.org/search.php?query=streetfiles No Category (streetfiles)]''' || Unaudited || || || ||<br />
|-<br />
| [https://archive.org/details/archiveteam_yahoovoices archiveteam_yahoovoices] || Unaudited || || 30 || No || [[Yahoo! Voices]]; WARCs<br />
|-<br />
| [https://archive.org/details/archiveteam_twitchtv archiveteam_twitchtv] || Unaudited || || 2213 || [http://chfoo-cn.mooo.com/~archiveteam/twitchtv-index/html/ Yes] [https://github.com/ArchiveTeam/twitchtv-index/ (source)] || [[Twitch.tv]]<br />
|-<br />
| [https://archive.org/details/archiveteam_fotopedia archiveteam_fotopedia] || Unaudited || || 40 || || [[Fotopedia]]; WARCs<br />
|-<br />
| [https://archive.org/details/archiveteam_canvas archiveteam_canvas] || Unaudited || || 47 || || [[Canv.as]]; WARCs<br />
|-<br />
| [https://archive.org/details/archiveteam_ancestry archiveteam_ancestry] || Unaudited || || 82 || || [[Ancestry.com]]; WARCs<br />
|}<br />
<br />
== [[:Category:In_progress|In progress???]] ==<br />
<br />
But what happened after? Where are the archives?<br />
<br />
* [[BerliOS]]<br />
* [[Deletionpedia]]<br />
* [[Delicious]]<br />
* [[ExtraTorrent]]<br />
* [[Free ProHosting]]<br />
* [[Google Video]]<br />
* [[Ispygames]]<br />
* [[Len Sassaman Project]]<br />
* [[Lulu Poetry]]<br />
* [[Prodigy.net]]<br />
* [[Resedagboken]]<br />
* [[ScreenshotsDatabase.com]]<br />
* [[Spanish Revolution]]: Is this finished?<br />
* [[University of Michigan personal webpages]]<br />
* [[Wallbase]]<br />
* [[Wallhaven]]<br />
* [[Webmonkey]]<br />
* [[Widgetbox]]<br />
* [[Windows Live Spaces]]<br />
<br />
== Oddities, Mislocations, and To Do ==<br />
<br />
* https://archive.org/search.php?query=earbits Earbits gathering is in the wrong place and needs additional versions.<br />
<br />
=== To be moved to better collection ===<br />
<br />
==== WARC ====<br />
* https://archive.org/details/fenopy-se-fire-grab-2014-12-30-16-38-13<br />
* https://archive.org/details/netszar_com_2015_06<br />
* https://archive.org/details/swipnet-searchengine-crawl-nonrecursive<br />
* https://archive.org/details/swipnet-searchengine-crawl-recursive<br />
* https://archive.org/details/kajaszoszentpeter_hu_2015_06<br />
* https://archive.org/details/warc-hallofshame.gp.co.at<br />
* https://archive.org/details/warc-freakedenough.at<br />
<br />
==== FTP ====<br />
* https://archive.org/details/2014.0102.mail.digipro.rs<br />
* https://archive.org/details/2014.12.ftp.dlink.biz_201501<br />
* https://archive.org/details/2015.01.12.ftp.sunet.sePubOpenBSD<br />
<br />
==== Misc ====<br />
<br />
* https://archive.org/details/archiveteam-picplz-index<br />
* https://archive.org/details/Posterous.comHostnames<br />
* https://archive.org/details/YahooBlogSitemaps20131216071927<br />
* https://archive.org/details/archiveteam-mobileme-index<br />
* https://archive.org/details/ESPNForumsPanicgrab<br />
* https://archive.org/details/rawporter-grab<br />
* https://archive.org/details/bitsnoop-dump<br />
* https://archive.org/details/CaliforniaFinanceLobbyData<br />
* https://archive.org/details/ArchiveteamWarriorV220121008Hyperv<br />
* https://archive.org/details/HowFlickr.comLookedLikeIn2010-APlaceOfWorshipOnFlickr-Photo<br />
* https://archive.org/details/myopera_shutdown_notice<br />
* https://archive.org/details/UsenetSci.space.news2003-2012<br />
* https://archive.org/details/Usenet_rec.food.recipesArchive2003-2012<br />
* https://archive.org/details/MirrorOfSiteOrtodoxiesiviata.blogspot.com<br />
* https://archive.org/details/carti.itarea.org<br />
* https://archive.org/details/ovmk_story<br />
* https://archive.org/details/ti_guidebook_en<br />
* https://archive.org/details/ti_guidebook_fr<br />
* https://archive.org/details/ti_guidebook_de<br />
* https://archive.org/details/myopera_usernames_FIXED.7z<br />
* https://archive.org/details/DubaiWikipediaPageOn2012-09-06<br />
* https://archive.org/details/digpicz-2008-07-30-website<br />
* https://archive.org/details/site-wwwangelfirecomazdixieden<br />
* https://archive.org/details/ArkiverCrawlsPack0004<br />
* https://archive.org/details/ArkiverCrawlsPack0005<br />
* https://archive.org/details/ArkiverCrawlsPack0007<br />
* https://archive.org/details/ArkiverCrawlsPack0008<br />
* https://archive.org/details/laptops-manuals-dump-from-tim.id.au-20121111<br />
* https://archive.org/details/paste_lisp_org<br />
* https://archive.org/details/MtGoxSituationCrisisStrategyDraft<br />
* https://archive.org/details/MtGoxBusinessPlan20142017<br />
* https://archive.org/details/nyt_innovation_2014<br />
* https://archive.org/details/slackware-irc-logs<br />
* https://archive.org/details/thekeep_bbs<br />
* https://archive.org/details/mail.google.com-saved-1Oct2014<br />
* https://archive.org/details/Data2September2013.tar (Gunnerkrigg Court homepage comments snapshots)<br />
* https://archive.org/details/fotodisco-raw-items<br />
* https://archive.org/details/qwikidisco-raw-items<br />
* https://archive.org/details/twitpicdisco-raw-items<br />
* https://archive.org/details/maemo-fremantle-ovi<br />
* https://archive.org/details/toontown_infinite_github_20150103<br />
* https://archive.org/details/amplicate_sitemaps_20140218<br />
* https://archive.org/details/twitch-raw-items<br />
* https://archive.org/details/actionbutton_mini.tar<br />
* https://archive.org/details/ageofnerds_mini<br />
* https://archive.org/details/2015feb06a07FuturamerlinAList<br />
* https://archive.org/details/worldpeacehaven_gmail_Xaa<br />
* https://archive.org/details/worldpeacehaven_gmail_Xab<br />
* https://archive.org/details/2015feb02ob<br />
* https://archive.org/details/2014dec09spe2<br />
* https://archive.org/details/bigougit_mini_v2<br />
* https://archive.org/details/galman33_mini<br />
* https://archive.org/details/urls2015dec02n2<br />
* https://archive.org/details/493nfos<br />
* https://archive.org/details/archiveteam_dev_env_v1_appliances<br />
* https://archive.org/details/Kazbeg_Panorama.jpg -- If tags can be edited by non-owners, this probably shouldn't have the ''archiveteam'' tag.<br />
* https://archive.org/search.php?query=subject%3A%22wallbase%22 -- 10 different items, representing efforts at saving [[wallbase.cc]]; need to be sorted and organized<br />
* https://archive.org/search.php?query=subject%3A%22aol%20archiveteam%2C%20aol%20files%2C%20aol%20protocol%22 -- 6 items that need their subject tags cleaned up<br />
* https://archive.org/search.php?query=subject%3A%22Tabblo%22%20AND%20NOT%20collection%3Aarchiveteam -- 5 of the 11 Tabblo items are not in the Archiveteam collection<br />
<br />
== Missing ==<br />
<br />
* [[Yahoo!_Blog]]: What happened to the Vietnam archives? Does anyone have a copy or at least a blurry screenshot of the Korean shutdown notice?<br />
<br />
[[Category:Archive Team]]<br />
<br />
{{Navigation box}}</div>Archive Maniachttps://wiki.archiveteam.org/index.php?title=JuniorNet&diff=22019JuniorNet2015-02-25T02:09:51Z<p>Archive Maniac: /* External Links */ Captialization</p>
<hr />
<div>{{Infobox project<br />
| title = JuniorNet<br />
| logo = Juniornet logo.gif<br />
| image = Juniornet screenshot wayback 20010605004318.png<br />
| description = Screenshot of juniornet.com on 2001-06-05<br />
| URL = {{url|1=http://www.juniornet.com}}<br />
| project_status = {{closed}}<br />
| archiving_status = {{lost}}<br />
}}<br />
<br />
'''JuniorNet''' was a subscription-based online portal for children, which included games, email ('Steammail'), child-friendly chatrooms, etc. JuniorNet often distributed free promotional CD-ROMs of some of its software to schools and through software compilations and bundles. <br />
<br />
JuniorNet "nearly became a victim of the dot-com downturn"<ref name=juniornetdownturn>https://web.archive.org/web/20061101052855/https://www.juniornet.com/companyinfo_new/pressreleases/11-16-01.html</ref> when the company went bankrupt in 2001 and was acquired by former employees. The website remained virtually unchanged until around mid-2010 when the site was quietly shut down.<br />
<br />
== History ==<br />
JuniorNet was founded in Boston, Massachusetts<ref>https://web.archive.org/web/20131125171420/http://edition.cnn.com/2000/TECH/computing/01/13/super.kids.idg/index.html</ref> in 1996 by Alan Rothenberg and officially launched in March 1999 as an "[ad-free] on-line learning service for children". The service was marketed mainly towards parents who were concerned about their children's safety online. The idea was that JuniorNet would provide a "safer [online] playground" for children, via its own proprietary network.<ref>https://web.archive.org/web/20141111144109/http://partners.nytimes.com/library/tech/99/03/circuits/articles/11juni.html</ref> Children would not only be protected from "seedy websites" and predators, but also from any advertisements. <br />
<br />
JuniorNet offered games, stories, and other interactive content to their subscribers via CD-ROM and internet download, as well as providing email and screened bulletin boards. JuniorNet also had partnerships with various companies to provide new content to their users, such as Highlights, Sports Illustrated, and Ranger Rick magazine.<br />
<br />
In April 1999, internet service provider RCN Corporation bought a 47.5% stake in JuniorNet Corporation for $47 million,<ref>https://web.archive.org/web/20131125171207/http://www.nytimes.com/1999/04/29/business/company-news-rcn-takes-47.5-stake-in-juniornet-for-47-million.html</ref> with the intention of incorporating the service into its regular plan.<ref>https://web.archive.org/web/20131126013725/http://kidscreen.com/1999/06/01/25645-19990601/</ref> Three VC firms also invested an additional $23 million total into JuniorNet.<ref>https://web.archive.org/web/19991128232433/http://www.boston.com/technology/venture_capital/hub_net.shtml</ref> Concurrently, JuniorNet acquired one of RCN's subsidiaries, Lancit Media Entertainment, for $25 million.<ref name=rcn_annualreport_1>https://web.archive.org/web/20040726080214/http://www.rcn.com/investor/annual_reports/2001/AnnualReport_Financial_1.pdf</ref> Lancit Media had been acquired by RCN in February 1998, as the company was having significant financial problems.<ref>http://www.secinfo.com/dQgdj.79.htm</ref><ref>https://web.archive.org/web/20000607111910/http://www.current.org/ch/ch803L1.html</ref><ref>https://web.archive.org/web/20000919132620/http://www.current.org/ch/ch803L2.html</ref><ref>https://web.archive.org/web/20150222062044/http://www.fool.com/dtrouble/1997/DTrouble970703.htm</ref> At the time, Lancit Media produced several award-winning TV shows, including Reading Rainbow and The Puzzle Place (Reading Rainbow later became a JuniorNet "Premier Partner",<ref>http://web.archive.org/web/20001018013039/http://www.juniornet.com/partners/partners/reading_pop.cgi</ref> with LeVar Burton being named "company spokesperson".<ref>https://web.archive.org/web/20010907093613/http://www.juniornet.com/companyinfo/pressroom.cgi?TAB=pressreleases&NAME=47</ref>) Around the same time as acquiring Lancit Media, JuniorNet independently co-produced a children's television show called The Zack Files.<ref>http://www.imdb.com/company/co0025329/</ref><br />
<br />
Buying Lancit Media helped JuniorNet get a deal with several PBS stations in June 2000,<ref>https://web.archive.org/web/20131125223519/http://www.current.org/wp-content/themes/current/archive-site/cm/cm013jrnet.html</ref><ref>https://web.archive.org/web/20131125225006/http://www.nonprofitnews.com/archive/55977</ref> who were then struggling from government funding cutbacks. The deal involved these PBS stations getting an equity stake in JuniorNet (described as "a sliver" of the company<ref>https://web.archive.org/web/20131125225018/http://articles.baltimoresun.com/2000-06-29/features/0006290016_1_mpt-public-broadcasting-service-online-service</ref>) in exchange for promoting and distributing JuniorNet software and supposedly producing a JuniorNet TV show.<ref>https://web.archive.org/web/20131125225005/http://www.siliconinvestor.com/readreplies.aspx?msgid=13953617</ref> Stations that agreed to promote JuniorNet would also receive $9.95 (the cost of the subscription per month) for each family the station recruited if the subscription lasted at least one year. (JuniorNet later used a similar payment model for their affiliate program.<ref>https://web.archive.org/web/20020415014742/http://www.juniornet.com/affiliate/</ref>)<br />
<br />
The PBS deal wasn't exactly popular. Critics accused the PBS stations of being "self-serving" and complained that the venture threatened the PBS stations' non-commercialism and could lead to "conflicts of interest". PBS itself wasn't a part of the deal either and had "sensitivities" about a potential upcoming JuniorNet-funded season of Reading Rainbow via a limited partnership called "PTV VisionWorks". The supposed JuniorNet-inspired show never came to be, probably partly due to PBS' strict guidelines.<br />
<br />
Apparently neither the PBS deal nor Lancit Media's productions were very profitable for JuniorNet. In early 2000, at the peak of the dot-com bubble, JuniorNet's finances weren't looking very good: JuniorNet executives were looking for someone to buy out or merge the company and RCN was getting concerned about the "ultimate recovery of its investment"<ref>https://web.archive.org/web/20131125173341/http://www.sec.gov/Archives/edgar/containers/fix036/1041858/00/000104185800000015/0001041858-00-000015.txt</ref> as the "estimated future cash flows related to this investment indicated that an impairment of the full value had occurred".<ref>https://web.archive.org/web/20050409000559/http://rcn.com/investor/annual_reports/2000/Final_RCN_Financial.pdf</ref><br />
<br />
In September 2000, JuniorNet announced that 45 elementary schools across the US would receive a free subscription to JuniorNet for the entire 2000-2001 school year, as part of a test to "find out more about how educators and students can use JuniorNet in a school setting."<ref>https://web.archive.org/web/20001030043859/http://www.juniornet.com/companyinfo/pressroom.cgi?TAB=pressreleases&NAME=50</ref><ref>https://web.archive.org/web/20010303134559/http://www.juniornet.com/schooltest/index.cgi</ref> However, in December 2000, JuniorNet fired about a third of their workforce <ref>https://web.archive.org/web/20131125171250/http://www.dmnews.com/four-more-dot-coms-reduce-staff-to-stay-afloat/article/69841/</ref> and by August 2001, the JuniorNet Corporation was no longer in operation.<ref>https://web.archive.org/web/20131125173238/http://www.wnd.com/markets/action/getedgarwindow?accesscode=95012301508149</ref> (According to a former JuniorNet employee, the writing had been on the wall since around late 1999.<ref>https://web.archive.org/web/20131126002028/http://www.mail-archive.com/lingo-l@penworks.com/msg03371.html</ref>) Responsibility for juniornet.com was taken over by J2 Interactive, a company founded by former JuniorNet employees,<ref name=juniornetdownturn /> with RCN owning 23.5% equity in J2.<ref name=rcn_annualreport_1 /> Lancit Media was foreclosed back to RCN; RCN used Lancit Media to start RCN Entertainment, effectively ending the 23 year-old company.<ref>http://www.linkedin.com/pub/laurence-lancit/12/2bb/82a</ref><br />
<br />
After going under, JuniorNet continued to operate in a sort of corporate auto-pilot with virtually no changes to the website until 2010, when the site was quietly shut down.<ref>https://web.archive.org/web/20100723143604/http://www.juniornet.com/</ref> In 2007, JuniorNet's trademarks were canceled by the USPTO due to inactivity/non-payment.<ref>https://web.archive.org/web/20131126004212/http://www.trademarkia.com/juniornet-75438867.html</ref> J2 Interactive went from being "creative media specialists" (as well as specializing in "cybercafe services" and "custom kiosks"<ref>https://web.archive.org/web/20050404133853/http://www.j2kiosks.com/index.asp</ref>) to "healthcare technology consultants" sometime in the early 2010s.<ref>https://web.archive.org/web/20070209042255/http://www.j2interactive.com/</ref><ref>https://web.archive.org/web/20130610165554/http://j2interactive.com/</ref> <br />
<br />
On March 19, 2014, the domain name juniornet.com expired and was not renewed by J2 Interactive.<ref>http://whois.domaintools.com/juniornet.com</ref><br />
<br />
== References ==<br />
<references /><br />
<br />
== External links ==<br />
* [https://web.archive.org/web/20100723143604/http://www.juniornet.com/ JuniorNet Official Website] (2010)<br />
* [https://archive.org/details/juniornet-v1.1 JuniorNet v1.1 CD-ROM] (1999)<br />
* [https://archive.org/details/juniornet-demo-video JuniorNet Demo Intro Video] (1999)<br />
* [https://archive.org/details/juniornet-demo-video-2 JuniorNet Web Demo Intro Video] (~2002)<br />
* [https://archive.org/details/juniornet-demo-games JuniorNet Sample Games (Windows/Mac)]<br />
* [https://archive.org/details/weekly_reader_gore_interview Weekly Reader Magazine - Al Gore Presidential Student Q&A Interview]</div>Archive Maniachttps://wiki.archiveteam.org/index.php?title=User_talk:Bzc6p&diff=21681User talk:Bzc6p2015-02-05T19:48:18Z<p>Archive Maniac: /* FTP Sites */ Messaged</p>
<hr />
<div>{{DISPLAYTITLE:User talk&#58;bzc6p}}<br />
<br />
== Re: Some friendly words ==<br />
<br />
Thanks for appreciating my efforts and explaining the ArchiveTeam to me. I thought "#archiveteam-bs" was for off-topic conversation, though. :/ And of course I didn't give up on archiving. Why would I? I'm getting 24 Blu-ray M-Discs next month, in fact. :) Would you be willing to explain to the other users about the situation? I'm willing to forgive them if they accept it &amp; apologize for my trolling. I'm just glad someone, at the very least, understood my situation and took the time to write to me.<br />
<br />
And I looked at your userpage. I'll see if I can track down some Hungarian sites. You can always use the Google operator "site:.hu" to filter just Hungarian sites. There is, however, [http://donkeykong.gportal.hu/ this site]. I have a backup of it, but not in .warc.gz format. Even worse, Yahoo is stupid enough to be shutting down their first service: dir.yahoo.com (Yahoo! Dir), on 12/31/2014. Stupid Yahoo... <br />
<br />
And by the way, SketchCow disliked the fact that I "asked too many questions". [[User:Archive Maniac|Archive Maniac]] 13:25, 19 October 2014 (EDT)<br />
<br />
:I'm waiting for Wpull to have a Windows release or a Python 2 release. I also stink at Python big time... [[User:Archive Maniac|Archive Maniac]] 17:47, 19 October 2014 (EDT)<br />
::Python3 unfortunately gets mixed up with Python 2 in the Command Prompt (e.g. python3 is not recognized as a command). That's why I've stuck to Python 2, because I use the wiki dump tool with that version. Aside from that, I always get errors when attempting installation, like vcvarsbatall.bat or something error, couldn't find seesaw kit, etc. Python is so user-unfriendly... [[User:Archive Maniac|Archive Maniac]] 17:42, 20 October 2014 (EDT)<br />
<br />
== Any Help on Chat? ==<br />
What's your IRC username? I want help coming back on the ArchiveBot & archiveteam-bs channel. And please tell me what discussions are appropriate for the latter; you do have a way with words. :P [[User:Archive Maniac|Archive Maniac]] 20:59, 21 October 2014 (EDT)<br />
<br />
== ArchiveBot Requests ==<br />
<br />
Hey, Bzc6p. Are you willing to take ArchiveBot requests from me? I also like your Hungarian site archiving. I recently archived smb.gportal.hu on my computer. [[User:Archive Maniac|Archive Maniac]] 18:55, 18 November 2014 (EST)<br />
:I have two more questions (the thing that made users upset at me):<br />
<br />
#I like archiving stuff. What archiving tools do you know of and recommend?<br />
#Is there a way that I can save whole sites to the Wayback Machine without using the ArchiveBot channel? I probably don't think so, but there still might be a chance.<br />
#Why doesn't the ArchiveTeam make C++ ports of their Python tools?<br />
#When I try to use Wget, I get this error in the command prompt: ''Connecting to SITENAME (SITENAME)|IP|:PORT... failed: Bad file descriptor.'' Do you know how to fix this problem?<br />
<br />
I hope you're not too annoyed by these questions, like the others would probably be. [[User:Archive Maniac|Archive Maniac]] 12:01, 20 November 2014 (EST)<br />
:Thanks for the info. And what's been a problem is that I've tried to set ArchiveBot or wpull up a few times, but never had proper 100% cannot fail step-by-step instructions on how to set both up. If you have the time, could you please write a more specific tutorial than the existing one? I preferably want a tutorial on the former [wpull]. [[User:Archive Maniac|Archive Maniac]] 11:45, 22 November 2014 (EST)<br />
<br />
== Blank CD Question ==<br />
<br />
Hi Bzc6p, I am wondering how long CD-Rs and DVD-Rs last with a .iso image burned onto them. Is it just as long as the estimated shelf life? More importantly: what do you recommend for long-term backup solutions? [[User:Archive Maniac|Archive Maniac]] 14:51, 29 November 2014 (EST)<br />
<br />
== Blogter.hu's Unexpected Downfall ==<br />
<br />
Hi Bzc6p. You know how Blogter unexpectedly shut down in December in spite of its popularity? That goes to show that anything, and I mean anything, can happen to web sites that seem okay but actually are in limbo (i.e. extinction). That's why I suggested you archive gportal.hu. I already archived the Mario and DK sites. [[User:Archive Maniac|Archive Maniac]] 19:46, 7 December 2014 (EST)<br />
<br />
== What I'm Currently Doing ==<br />
<br />
Hi Bzc6p, it's been a little bit since I last talked to you. If you want to know what I'm currently doing, it's that I'm searching the depths of the Internet for links and saving them on to the Wayback Machine. I'm also uploading [https://archive.org/search.php?query=subject%3A%22dec3199%22 my own collections to the Internet Archive]. There's some stuff in there which you'll probably enjoy. :)<br />
<br />
And the icing on the cake is that I'm editing a few wikis, cleaning them up and trying to make them more informative.<br />
<br />
P.S. Do you forgive me and understand why I went into a very mad rage here those few times (which I shouldn't have)? I know the experience is over, but I feel embarrassed around you, given my extremely vulgar actions and how you're aware of it.<br />
<br />
Anyway, nice to message you again. Good luck saving Hungarian sites! :) [[User:Archive Maniac|Archive Maniac]] 21:15, 5 January 2015 (EST)<br />
<br />
:Thanks for replying. :) Shortly after I messaged you, somebody on a forum site taught me how to properly burn files to an M-Disc. And it was a success! A good, long offline backup for me! :D<br />
<br />
And it's a shame [[extra.hu]] is gone... It looked like an excellent web host...<br />
<br />
<br />
I also have issues with using wikiadownloader.py. It gives me this error:<br />
<br />
<pre><br />
Traceback (most recent call last):<br />
File "wikiadownloader.py", line 41, in <module><br />
f = open('wikia.com', 'r')<br />
IOError: [Errno 2] No such file or directory: 'wikia.com'<br />
</pre><br />
<br />
<br />
Do you know what that is? [[User:Archive Maniac|Archive Maniac]] 12:25, 6 January 2015 (EST)<br />
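The traceback above points at a missing input file: the script does <pre>open('wikia.com', 'r')</pre> so it expects a local file named ''wikia.com'' in the working directory. A hypothetical shell workaround (the exact file format the script expects is an assumption — one wiki hostname per line is a guess based on the filename):<br />

```shell
# Create the list file wikiadownloader.py tries to open; one wiki
# hostname per line (the expected format is a guess).
printf 'donkeykong.wikia.com\n' > wikia.com
cat wikia.com
# then re-run the script, e.g.: python wikiadownloader.py
```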
<br />
== View Archive.org Directories as Text Only ==<br />
<br />
Hi Bzc6p, I remember someone on the ArchiveTeam taught me how to view archive.org site directories (e.g. like these: http://web.archive.org/*/media.nintendo-europe.com/* ) as text-only in the browser. I forgot how to do it, so I've come to ask you how to do it. Do you know how? [[User:Archive Maniac|Archive Maniac]] 18:56, 22 January 2015 (EST)<br />
:I literally meant what I said. The link I gave you lists all of the URLs on the Internet Archive. I asked how to view it as text-only. (By the way, it was taught to me on the #archivebot channel, which isn't on BadCheese). [[User:Archive Maniac|Archive Maniac]] 18:32, 23 January 2015 (EST)<br />
::Ah, yes. That's what they mentioned. Thanks, bzc6p. I also have a bit of a problem—see, I want to access a site (http://eecad.sogang.ac.kr/~chang/games/dkc2/) on the Wayback Machine, but it's blocked by robots.txt... Also, many of Nintendo Europe's sites (e.g. nintendo.co.uk, nintendo.es, nintendo.fr) are excluded from the Wayback Machine entirely. Is there any way for me to access them? I mean, J.Scott's obviously not going to help out here. [[User:Archive Maniac|Archive Maniac]] 14:46, 24 January 2015 (EST)<br />
:::Wow, he is not nice. Just look how he talks about the people on the IA Forums on IRC. He's also gloating about having access to everything on the Internet Archive. (And I saved your email in case I get banned for voicing my opinion, which really is true...) [[User:Archive Maniac|Archive Maniac]] 17:30, 24 January 2015 (EST)<br />
:::Add a period in front of the domain name, e.g. https://web.archive.org/web/20011211041409/http://.eecad.sogang.ac.kr/~chang/games/dkc2/ (note that you need to do this for all links too) [[User:PiRSquared|PiRSquared]] 23:59, 25 January 2015 (EST)<br />
::::Thanks PiR. (and sorry for what I said above; I was upset about something on IRC). Oh yeah, and I should probably not tell anyone else about it, which I will do. [[User:Archive Maniac|Archive Maniac]] 14:14, 26 January 2015 (EST)<br />
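The leading-period trick above can also be scripted when many links need rewriting. A hypothetical sed sketch (the URL is illustrative, and the Wayback Machine's handling of dotted hostnames may change):<br />

```shell
# Insert a dot before the archived hostname in a Wayback Machine URL,
# e.g. .../web/20011211041409/http://host/ -> .../web/20011211041409/http://.host/
dotify() {
  sed 's|\(/web/[0-9]*/https\{0,1\}://\)|\1.|'
}

printf 'https://web.archive.org/web/20011211041409/http://eecad.sogang.ac.kr/~chang/games/dkc2/\n' | dotify
```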
<br />
== FTP Sites ==<br />
<br />
Hey, bzc6p, have you ever considered trying to crawl FTP sites (see the [[FTP]] article)? As of now, I uploaded two onto the Internet Archive. By the way, I figured out that you can save tons of URLs on the Wayback Machine if you crawl/mirror a site using Wget (url should be http://web.archive.org/save/urlgoeshere ). In total, I do: <pre>wget http://web.archive.org/save/http://exampleurl.com -m -p -np -e robots=off</pre> Hope this helps. It's sort of an ArchiveBot alternative.<br />
<br />
Or if you don't want to save files on to your computer and delete them every time you crawl a site:<br />
<br />
<pre>wget http://web.archive.org/save/urlgoeshere.com -r --spider -np -e robots=off</pre> [[User:Archive Maniac|Archive Maniac]] 14:15, 1 February 2015 (EST)<br />
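A scripted version of the save-endpoint trick above might look like this (a sketch only; ''urls.txt'' is a hypothetical input file with one URL per line, created here for demonstration):<br />

```shell
# Demonstration input file: one URL to save per line.
printf 'http://example.com/a\nhttp://example.com/b\n' > urls.txt

# Prefix each URL with the Wayback Machine's save endpoint.
make_save_urls() {
  sed 's|^|https://web.archive.org/save/|' "$1"
}

make_save_urls urls.txt   # dry run: just print the save URLs
# To actually trigger the saves, feed the list to wget:
# make_save_urls urls.txt | while IFS= read -r u; do
#   wget -q -O /dev/null "$u"; sleep 2   # pause between requests to be polite
# done
```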
<br />
:Thanks for your input; allows me to learn more. And if you'd like, I'm finding some Hungarian FTP hosts to archive. :) [[User:Archive Maniac|Archive Maniac]] 23:45, 1 February 2015 (EST)<br />
<br />
:Like here's the first one I uploaded: https://archive.org/details/ftp.debella.aszi.sztaki.hu . [[User:Archive Maniac|Archive Maniac]] 12:17, 2 February 2015 (EST)<br />
<br />
::Oh, and about the fact that bulk saving URLs onto the Wayback Machine is not as efficient as making WARC files: the newer Wget releases can create WARC files; to be honest, it's effectively an alternative to the ArchiveBot. According to Arkiver, the Internet Archive staff can inject WARC files into the Wayback Machine, even if crawled by Wget.<br />
<br />
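The WARC-writing Wget invocation mentioned above might look like the sketch below (--warc-file is available in Wget 1.14 and later; ''somesite'' and the URL are placeholders, and the command is shown as a dry run so nothing is fetched here):<br />

```shell
# Assemble a WARC-writing mirror command; drop the echo to actually crawl.
cmd='wget --mirror --page-requisites -e robots=off --warc-file=somesite http://example.com/'
echo "$cmd"
```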
::Oh yeah, and check the dec3199 tag daily ([https://archive.org/search.php?query=subject%3A%22dec3199%22 link]). I've put more Hungarian FTP sites on the Internet Archive for your sake (because you're like my buddy) and categorized it under said tag. And guess what surprise I have for you? That's right—I'm busy uploading Microsoft's FTP site (66 GB zipped!) on to the Internet Archive! [[User:Archive Maniac|Archive Maniac]] 14:48, 5 February 2015 (EST)</div>Archive Maniachttps://wiki.archiveteam.org/index.php?title=User_talk:Bzc6p&diff=21659User talk:Bzc6p2015-02-02T17:17:06Z<p>Archive Maniac: /* FTP Sites */ Messaged</p>
<hr />
<div>{{DISPLAYTITLE:User talk&#58;bzc6p}}<br />
<br />
== Re: Some friendly words ==<br />
<br />
Thanks for appreciating my efforts and explaining the ArchiveTeam to me. I thought "#archiveteam-bs" was for off-topic conversation, though. :/ And of course I didn't give up on archiving. Why would I? I'm getting 24 Blu-ray M-Discs next month, in fact. :) Would you willing to explain to the other users about the situation? I'm willing to forgive them if they accept it & apologize for my trolling. I'm just glad someone, by the very least, understood my situation and took the time to write to me.<br />
<br />
And I looked at your userpage. I'll see if I can track down some Hungarian sites. You can always use the Google operator "site:.hu" to filter just Hungarian sites. There is, however, [http://donkeykong.gportal.hu/ this site]. I have a backup of it, but not in .warc.gz format. Even worse, Yahoo is stupid enough to be shutting down their first service: dir.yahoo.com (Yahoo! Dir), on 12/31/2014. Stupid Yahoo... <br />
<br />
And by the way, SketchCow disliked the fact that I "asked too many questions". [[User:Archive Maniac|Archive Maniac]] 13:25, 19 October 2014 (EDT)<br />
<br />
:I'm waiting for Wpull to have a Windows release or a Python 2 release. I also stink at Python big time... [[User:Archive Maniac|Archive Maniac]] 17:47, 19 October 2014 (EDT)<br />
::Python 3 unfortunately gets mixed up with Python 2 in the Command Prompt (e.g. python3 is not recognized as a command). That's why I've stuck with Python 2; I use the wiki dump tool with that version. Aside from that, I always get errors when attempting installation, like a vcvarsall.bat error, "couldn't find seesaw kit", etc. Python is so user-unfriendly... [[User:Archive Maniac|Archive Maniac]] 17:42, 20 October 2014 (EDT)<br />
<br />
== Any Help on Chat? ==<br />
What's your IRC username? I want help coming back on the ArchiveBot & archiveteam-bs channel. And please tell me what discussions are appropriate for the latter; you do have a way with words. :P [[User:Archive Maniac|Archive Maniac]] 20:59, 21 October 2014 (EDT)<br />
<br />
== ArchiveBot Requests ==<br />
<br />
Hey, Bzc6p. Are you willing to take ArchiveBot requests from me? I also like your Hungarian site archiving. I recently archived smb.gportal.hu on my computer. [[User:Archive Maniac|Archive Maniac]] 18:55, 18 November 2014 (EST)<br />
:I have a few more questions (the thing that made users upset at me):<br />
<br />
#I like archiving stuff. What archiving tools do you know of and recommend?<br />
#Is there a way that I can save whole sites to the Wayback Machine without using the ArchiveBot channel? I don't think so, but there still might be a chance.<br />
#Why doesn't the ArchiveTeam make C++ ports of their Python tools?<br />
#When I try to use Wget, I get this error in the command prompt: ''Connecting to SITENAME (SITENAME)|IP|:PORT... failed: Bad file descriptor.'' Do you know how to fix this problem?<br />
<br />
I hope you're not too annoyed by these questions, like the others would probably be. [[User:Archive Maniac|Archive Maniac]] 12:01, 20 November 2014 (EST)<br />
:Thanks for the info. The problem is that I've tried to set up ArchiveBot or wpull a few times, but never had proper, foolproof, step-by-step instructions for either. If you have the time, could you please write a more specific tutorial than the existing one? I'd prefer a tutorial on the latter (wpull). [[User:Archive Maniac|Archive Maniac]] 11:45, 22 November 2014 (EST)<br />
<br />
== Blank CD Question ==<br />
<br />
Hi Bzc6p, I am wondering how long CD-Rs and DVD-Rs last with a .iso image burned onto them. Is it just as long as the estimated shelf life? More importantly: what do you recommend for long-term backup solutions? [[User:Archive Maniac|Archive Maniac]] 14:51, 29 November 2014 (EST)<br />
<br />
== Blogter.hu's Unexpected Downfall ==<br />
<br />
Hi Bzc6p. You know how Blogter unexpectedly shut down in December in spite of its popularity? That goes to show that anything, and I mean anything, can happen to web sites that seem okay but are actually in limbo (i.e. headed for extinction). That's why I suggested you archive gportal.hu. I already archived the Mario and DK sites. [[User:Archive Maniac|Archive Maniac]] 19:46, 7 December 2014 (EST)<br />
<br />
== What I'm Currently Doing ==<br />
<br />
Hi Bzc6p, it's been a little bit since I last talked to you. If you want to know what I'm currently doing, it's that I'm searching the depths of the Internet for links and saving them on to the Wayback Machine. I'm also uploading [https://archive.org/search.php?query=subject%3A%22dec3199%22 my own collections to the Internet Archive]. There's some stuff in there which you'll probably enjoy. :)<br />
<br />
And the icing on the cake is that I'm editing a few wikis, cleaning them up and trying to make them more informative.<br />
<br />
P.S. Do you forgive me and understand why I went into a very mad rage here those few times (which I shouldn't have)? I know the experience is over, but I feel embarrassed around you, given my extremely vulgar actions and how you're aware of it.<br />
<br />
Anyway, nice to message you again. Good luck saving Hungarian sites! :) [[User:Archive Maniac|Archive Maniac]] 21:15, 5 January 2015 (EST)<br />
<br />
:Thanks for replying. :) Shortly after I messaged you, somebody on a forum site taught me how to properly burn files to an M-Disc. And it was a success! A good, long offline backup for me! :D<br />
<br />
And it's a shame [[extra.hu]] is gone... It looked like an excellent web host...<br />
<br />
<br />
I also have issues with using wikiadownloader.py. It gives me this error:<br />
<br />
<pre><br />
Traceback (most recent call last):<br />
  File "wikiadownloader.py", line 41, in <module><br />
    f = open('wikia.com', 'r')<br />
IOError: [Errno 2] No such file or directory: 'wikia.com'<br />
</pre><br />
<br />
<br />
Do you know what that is? [[User:Archive Maniac|Archive Maniac]] 12:25, 6 January 2015 (EST)<br />
<br />
== View Archive.org Directories as Text Only ==<br />
<br />
Hi Bzc6p, I remember someone on the ArchiveTeam taught me how to view archive.org site directories (e.g. like these: http://web.archive.org/*/media.nintendo-europe.com/* ) as text-only in the browser. I forgot how to do it, so I've come to ask you how to do it. Do you know how? [[User:Archive Maniac|Archive Maniac]] 18:56, 22 January 2015 (EST)<br />
:I literally meant what I said. The link I gave you lists all of the URLs on the Internet Archive. I asked how to view it as text-only. (By the way, it was taught to me on the #archivebot channel, which isn't on BadCheese). [[User:Archive Maniac|Archive Maniac]] 18:32, 23 January 2015 (EST)<br />
::Ah, yes. That's what they mentioned. Thanks, bzc6p. I also have a bit of a problem—see, I want to access a site (http://eecad.sogang.ac.kr/~chang/games/dkc2/) on the Wayback Machine, but it's blocked by robots.txt... Also, many of Nintendo Europe's sites (e.g. nintendo.co.uk, nintendo.es, nintendo.fr) are excluded from the Wayback Machine entirely. Is there any way for me to access them? I mean, J.Scott's obviously not going to help out here. [[User:Archive Maniac|Archive Maniac]] 14:46, 24 January 2015 (EST)<br />
:::Wow, he is not nice. Just look how he talks about the people on the IA Forums on IRC. He's also gloating about having access to everything on the Internet Archive. (And I saved your email in case I get banned for voicing my opinion, which really is true...) [[User:Archive Maniac|Archive Maniac]] 17:30, 24 January 2015 (EST)<br />
:::Add a period in front of the domain name, e.g. https://web.archive.org/web/20011211041409/http://.eecad.sogang.ac.kr/~chang/games/dkc2/ (note that you need to do this for all links too) [[User:PiRSquared|PiRSquared]] 23:59, 25 January 2015 (EST)<br />
::::Thanks PiR. (and sorry for what I said above; I was upset about something on IRC). Oh yeah, and I should probably not tell anyone else about it, which I will do. [[User:Archive Maniac|Archive Maniac]] 14:14, 26 January 2015 (EST)<br />
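PiRSquared's trick above boils down to inserting a period before the hostname in the Wayback Machine URL. A throwaway sketch of that string edit (whether the dotted URL actually loads is up to the Wayback Machine itself):

```shell
# Insert a "." after the "http://" inside a Wayback Machine URL,
# turning ".../http://host/..." into ".../http://.host/...".
orig="https://web.archive.org/web/20011211041409/http://eecad.sogang.ac.kr/~chang/games/dkc2/"
dotted=$(echo "$orig" | sed 's|\(/http://\)|\1.|')
echo "$dotted"
```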
<br />
== FTP Sites ==<br />
<br />
Hey, bzc6p, have you ever considered trying to crawl FTP sites (see the [[FTP]] article)? So far, I've uploaded two to the Internet Archive. By the way, I figured out that you can save tons of URLs to the Wayback Machine if you crawl/mirror a site using Wget (the URL should be http://web.archive.org/save/urlgoeshere ). In full, I run: <pre>wget http://web.archive.org/save/http://exampleurl.com -m -p -np -e robots=off</pre> Hope this helps. It's sort of an ArchiveBot alternative.<br />
<br />
Or if you don't want to save files on to your computer and delete them every time you crawl a site:<br />
<br />
<pre>wget http://web.archive.org/save/urlgoeshere.com -r --spider -np -e robots=off</pre> [[User:Archive Maniac|Archive Maniac]] 14:15, 1 February 2015 (EST)<br />
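The same /save/ trick can be applied to a plain list of URLs instead of a recursive crawl. A minimal sketch, assuming a hypothetical input file urls.txt with one URL per line; the loop here only builds the save URLs into save_urls.txt — swap the `echo` for a real wget or curl call to actually submit them:

```shell
# Create a sample list of URLs to save (stand-ins for real pages).
printf '%s\n' "http://example.com/page1" "http://example.com/page2" > urls.txt

# Prefix each URL with the Wayback Machine's /save/ endpoint.
while IFS= read -r url; do
  echo "http://web.archive.org/save/$url"
done < urls.txt > save_urls.txt

cat save_urls.txt
```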
<br />
:Thanks for your input; it helps me learn more. And if you'd like, I'm finding some Hungarian FTP hosts to archive. :) [[User:Archive Maniac|Archive Maniac]] 23:45, 1 February 2015 (EST)<br />
:Like here's the first one I uploaded: https://archive.org/details/ftp.debella.aszi.sztaki.hu . [[User:Archive Maniac|Archive Maniac]] 12:17, 2 February 2015 (EST)</div>Archive Maniachttps://wiki.archiveteam.org/index.php?title=User_talk:Bzc6p&diff=21657User talk:Bzc6p2015-02-02T04:45:27Z<p>Archive Maniac: /* FTP Sites */ Replied</p>
:::Wow, he is not nice. Just look how he talks about the people on the IA Forums on IRC. He's also gloating about having access to everything on the Internet Archive. (And I saved your email in case I get banned for voicing my opinion, which really is true...) [[User:Archive Maniac|Archive Maniac]] 17:30, 24 January 2015 (EST)<br />
:::Add a period in front of the domain name, e.g. https://web.archive.org/web/20011211041409/http://.eecad.sogang.ac.kr/~chang/games/dkc2/ (note that you need to do this for all links too) [[User:PiRSquared|PiRSquared]] 23:59, 25 January 2015 (EST)<br />
::::Thanks PiR. (and sorry for what I said above; I was upset about something on IRC). Oh yeah, and I should probably not tell anyone else about it, which I will do. [[User:Archive Maniac|Archive Maniac]] 14:14, 26 January 2015 (EST)<br />
<br />
== FTP Sites ==<br />
<br />
Hey, bzc6p, have you ever considered trying to crawl FTP sites (see the [[FTP]] article)? As of now, I uploaded two on to the Internet Archive. By the way, I figured out that you can save tons of URLs on the Wayback Machine if you crawl/mirror a site using wget (the URL should be http://web.archive.org/save/urlgoeshere ). In total, I do: <pre>wget http://web.archive.org/save/http://exampleurl.com -m -p -np -e robots=off</pre> Hope this helps. It's sort of an ArchiveBot alternative.<br />
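The /save/ trick described above is also easy to script directly. Below is a minimal Python sketch of the same idea; `save_endpoint` and `save_all` are hypothetical helper names, and the `fetch` parameter exists so the network call can be stubbed out:

```python
from urllib.request import urlopen

SAVE_PREFIX = "https://web.archive.org/save/"

def save_endpoint(url: str) -> str:
    """Build the Wayback Machine 'save page now' URL for a target URL."""
    return SAVE_PREFIX + url

def save_all(urls, fetch=urlopen):
    """Request a snapshot of each URL; returns the endpoints requested."""
    requested = []
    for url in urls:
        endpoint = save_endpoint(url)
        fetch(endpoint)  # a successful response means a capture was queued
        requested.append(endpoint)
    return requested
```

Note that unlike the recursive wget command, this only saves the URLs you already have; it does no crawling of its own.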
<br />
Or if you don't want to save files on to your computer and delete them every time you crawl a site:<br />
<br />
<pre>wget http://web.archive.org/save/urlgoeshere.com -r --spider -e robots=off</pre> [[User:Archive Maniac|Archive Maniac]] 14:15, 1 February 2015 (EST)</div>Archive Maniachttps://wiki.archiveteam.org/index.php?title=User_talk:Bzc6p&diff=21652User talk:Bzc6p2015-02-01T18:05:42Z<p>Archive Maniac: /* FTP Sites */ More</p>
<hr />
<div>{{DISPLAYTITLE:User talk&#58;bzc6p}}<br />
<br />
== Re: Some friendly words ==<br />
<br />
Thanks for appreciating my efforts and explaining the ArchiveTeam to me. I thought "#archiveteam-bs" was for off-topic conversation, though. :/ And of course I didn't give up on archiving. Why would I? I'm getting 24 Blu-ray M-Discs next month, in fact. :) Would you be willing to explain to the other users about the situation? I'm willing to forgive them if they accept it & apologize for my trolling. I'm just glad someone, at the very least, understood my situation and took the time to write to me.<br />
<br />
And I looked at your userpage. I'll see if I can track down some Hungarian sites. You can always use the Google operator "site:.hu" to filter just Hungarian sites. There is, however, [http://donkeykong.gportal.hu/ this site]. I have a backup of it, but not in .warc.gz format. Even worse, Yahoo is stupid enough to be shutting down their first service: dir.yahoo.com (Yahoo! Dir), on 12/31/2014. Stupid Yahoo... <br />
<br />
And by the way, SketchCow disliked the fact that I "asked too many questions". [[User:Archive Maniac|Archive Maniac]] 13:25, 19 October 2014 (EDT)<br />
<br />
:I'm waiting for Wpull to have a Windows release or a Python 2 release. I also stink at Python big time... [[User:Archive Maniac|Archive Maniac]] 17:47, 19 October 2014 (EDT)<br />
::Python3 unfortunately gets mixed up with Python 2 in the Command Prompt (e.g. python3 is not recognized as a command). That's why I've stuck to Python 2, because I use the wiki dump tool with that version. Aside from that, I always get errors when attempting installation, like vcvarsbatall.bat or something error, couldn't find seesaw kit, etc. Python is so user-unfriendly... [[User:Archive Maniac|Archive Maniac]] 17:42, 20 October 2014 (EDT)<br />
<br />
== Any Help on Chat? ==<br />
What's your IRC username? I want help coming back on the ArchiveBot & archiveteam-bs channel. And please tell me what discussions are appropriate for the latter; you do have a way with words. :P [[User:Archive Maniac|Archive Maniac]] 20:59, 21 October 2014 (EDT)<br />
<br />
== ArchiveBot Requests ==<br />
<br />
Hey, Bzc6p. Are you willing to take ArchiveBot requests from me? I also like your Hungarian site archiving. I recently archived smb.gportal.hu on my computer. [[User:Archive Maniac|Archive Maniac]] 18:55, 18 November 2014 (EST)<br />
:I have two more questions (the thing that made users upset at me):<br />
<br />
#I like archiving stuff. What archiving tools do you know of and recommend?<br />
#Is there a way that I can save whole sites to the Wayback Machine without using the ArchiveBot channel? I probably don't think so, but there still might be a chance.<br />
#Why doesn't the ArchiveTeam make C++ ports of their Python tools?<br />
<br />
== FTP Sites ==<br />
<br />
Hey, bzc6p, have you ever considered trying to crawl FTP sites (see the [[FTP]] article)? As of now, I uploaded two on to the Internet Archive. By the way, I figured out that you can save tons of URLs on the Wayback Machine if you crawl/mirror a site using wget (the URL should be http://web.archive.org/save/urlgoeshere ). In total, I do: <pre>wget http://web.archive.org/save/http://exampleurl.com -m -p -np -e robots=off</pre> Hope this helps. It's sort of an ArchiveBot alternative. [[User:Archive Maniac|Archive Maniac]] 13:05, 1 February 2015 (EST)</div>Archive Maniachttps://wiki.archiveteam.org/index.php?title=User_talk:Bzc6p&diff=21651User talk:Bzc6p2015-02-01T17:22:52Z<p>Archive Maniac: /* FTP Sites */ new section</p>
<hr />
<div>{{DISPLAYTITLE:User talk&#58;bzc6p}}<br />
<br />
<br />
== FTP Sites ==<br />
<br />
Hey, bzc6p, have you ever considered trying to crawl FTP sites (see the [[FTP]] article)? As of now, I uploaded two on to the Internet Archive. [[User:Archive Maniac|Archive Maniac]] 12:22, 1 February 2015 (EST)</div>Archive Maniachttps://wiki.archiveteam.org/index.php?title=AOL&diff=21650AOL2015-02-01T17:21:47Z<p>Archive Maniac: /* Software */ Removed broken links.</p>
<hr />
<div>{{Infobox project<br />
| title = AOL<br />
| image = AOL_Screen_Shot_2013-01-27_at_8.42.32_PM.png<br />
| project_status = {{online}} on January 28, 2013<br />
| archiving_status = {{inprogress}} by godane<br />
| irc = aohell<br />
}}<br />
<br />
This is about archiving the original AOL, not AOL's current website.<br />
The AOL system is currently in major disrepair. It is as if AOL left the machines sitting in the datacenter and, as they die, fixes nothing.<br />
There is much broken infrastructure.<br />
<br />
= Getting Started =<br />
You'll need to sign up for a user account here: [https://new.aol.com https://new.aol.com]. Not every field is required; phone definitely isn't.<br />
Make sure there are no special characters in the username or password, and that both are short (8 characters or fewer).<br />
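Those constraints can be checked before signing up. The helper below is hypothetical and reflects only the advice above (letters and digits, 8 characters or fewer); AOL's actual rules aren't documented here:

```python
import string

# Letters and digits only -- "no special chars", per the advice above.
ALLOWED = set(string.ascii_lowercase + string.digits)

def usable_credential(value: str) -> bool:
    """True if a username/password is short (8 chars or fewer) and
    contains no special characters (assumed rule, see lead-in)."""
    return 0 < len(value) <= 8 and set(value.lower()) <= ALLOWED
```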
<br />
== Client software ==<br />
If you're running the software through Wine, your choices are very simple: AOL 4.0 or AOL 5.0. Everything else is in various states of [http://appdb.winehq.org/objectManager.php?sClass=application&iId=80 not-workingness].<br />
<br />
If you're not running through Wine, you can use any version you want, but there are good reasons for not using AOL 6 or higher:<br />
* Fairly rich legacy of third-party tools up to 5.0<br />
* The documentation probably applies to every later version, but there are few (if any) ways to make good use of it in later versions<br />
<br />
Why you might want to use a later version:<br />
* More things on the site will certainly work (there may be less stuff accessible, but more of it should work because it's actively supported by AOL)<br />
<br />
AOL has copies of the 9.x series on their site, and [http://www.oldversion.com/windows/america-online-5-0 Oldversion] has copies of all the other versions.<br />
<br />
== Setup ==<br />
In general (these instructions assume you create the account beforehand on AOL's site):<br />
<br />
<blockquote><br />
Run aol50.exe (or your chosen version's installer) <br><br />
Choose Current member, and then add my account to this computer. <br><br />
When the installer finishes, and AOL launches, it will bring up the AOL setup screen. <br><br />
You'll want Expert Setup, so you can tell AOL right away that you want to use your TCP/IP connection to connect to AOL (instead of dial-up). <br><br />
Once you select TCP/IP connection, AOL asks if you want to sign on right away- unless you have proxy settings or can't use the defaults, just hit next. <br><br />
If you need to change any of the settings before connecting, read the message box carefully- it explains where to go to change the settings. <br><br />
You want to pick You already have a screen name and password - fill in the account name and password you created earlier on AOL's site. <br><br />
Press Next, and you are done. <br><br />
</blockquote><br />
<br />
Wine notes:<br />
<blockquote><br />
The biggest difference is that you can't create an account inside AOL 5 or earlier, so you must create the account beforehand.<br />
</blockquote><br />
<br />
= Protocol =<br />
<br />
== Diagrams ==<br />
<br />
<pre>Quantumlink Packet<br />
+---+---+---+---+---+---+---+---+---+======+---+<br />
| Z | CRC | Seq |ProgID | Data |CR | <br />
+---+---+---+---+---+---+---+---+---+======+---+<br />
</pre><br />
<br />
<pre>P3 Packet (old style)<br />
+---+---+---+---+---+---+---+======+---+<br />
| Z | CRC | Seq | Token | Data |CR | <br />
+---+---+---+---+---+---+---+======+---+<br />
</pre><br />
<br />
<pre>P3 Packet (new style)<br />
+---+---+---+---+---+---+---+---+---+======+---+<br />
| Z | CRC | Seq | Token | Data |CR | <br />
+---+---+---+---+---+---+---+---+---+======+---+<br />
</pre><br />
<br />
== Definitions ==<br />
<br />
=== P3 ===<br />
<br />
A communication protocol designed for lossy channels (though it can also be used over TCP).<br />
<br />
There are two types of packet:<br />
<br />
# the old one with a plain CRC<br />
# the new one with a CRC where it is encoded redundantly as defined in the Q-Link protocol.<br />
<br />
Each packet contains a magic start byte, a CRC, a sequence number, a packet type, the data, and a magic stop byte.<br />
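That framing can be sketched in Python, following the old-style diagram above (Z | CRC | Seq | Token | Data | CR). The CRC-16/ARC variant is an assumption on my part; verify it against penggy's src/p3 or Wireshark's packet-aol.c before relying on this:

```python
import struct

START, STOP = 0x5A, 0x0D  # 'Z' magic start byte, CR magic stop byte

def crc16_arc(data: bytes) -> int:
    # CRC-16/ARC (reflected polynomial 0xA001) -- an assumed variant.
    crc = 0
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ 0xA001 if crc & 1 else crc >> 1
    return crc

def build(seq: int, token: bytes, data: bytes) -> bytes:
    """Frame a packet: Z | CRC(2) | Seq(2) | Token(2) | Data | CR."""
    body = struct.pack(">H", seq) + token + data
    return bytes([START]) + struct.pack(">H", crc16_arc(body)) + body + bytes([STOP])

def parse(frame: bytes):
    """Inverse of build(); raises ValueError on bad framing or CRC."""
    if frame[0] != START or frame[-1] != STOP:
        raise ValueError("bad magic bytes")
    crc = struct.unpack(">H", frame[1:3])[0]
    body = frame[3:-1]
    if crc != crc16_arc(body):
        raise ValueError("CRC mismatch")
    return struct.unpack(">H", body[:2])[0], body[2:4], body[4:]
```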
<br />
=== Form Definition Operator / Form Display Operation (FDO91 or FDO) ===<br />
<br />
Form display convention or protocol. It consists of a Token and an Atom Stream, and goes into the data portion of a P3 packet. Officially, it's a programming language; in reality, it's as if someone mixed a scripting language, a state machine, the X11 display protocol, a tree structure, a database, a GUI toolkit, and an RPC protocol into one big ugly mess.<br />
<br />
FDO91 is for Windows and Macintosh. FDO88 is for Apple II.<br />
<br />
=== Token ===<br />
<br />
A Token is a 2-byte value used to dispatch a handler for the Atom Stream.<br />
<br />
=== Atom Stream ===<br />
<br />
An Atom Stream consists of an Atom Stream ID and Atoms. Atom Streams are serialized (assembled) and unserialized (disassembled).<br />
<br />
=== Atom ===<br />
<br />
An Atom consists of a 2-byte Atom ID, which is processed as two values: the first byte describes the category of the Atom and the second byte describes a specific command or Turing-machine operation. Next come zero or more arguments of various types.<br />
<br />
Care is needed when assembling/disassembling Atom Streams, because a framing error will cause the rest of the Atom Stream to read out like garbage.<br />
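To illustrate the framing hazard, here is a toy disassembler. It assumes every atom is a 2-byte ID followed by a 1-byte argument length; the real FDO encoding is different, so this only demonstrates the shape of the problem:

```python
def disassemble(stream: bytes):
    """Split a toy atom stream into (category, command, args) tuples.
    Assumed layout: category byte, command byte, arg-length byte, args."""
    atoms, pos = [], 0
    while pos + 3 <= len(stream):
        category, command, arglen = stream[pos], stream[pos + 1], stream[pos + 2]
        atoms.append((category, command, stream[pos + 3:pos + 3 + arglen]))
        pos += 3 + arglen
    return atoms
```

Start reading one byte off (e.g. `disassemble(stream[1:])`) and every atom after the error is misread, which is exactly the garbage effect described above.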
<br />
=== Star Tool ===<br />
<br />
The Star Tool consists of additional blobs that are appended or patched into an existing install of an AOL client.<br />
<br />
Typically you find a zip file of the tools and copy its files into the appropriate subfolders of the install folder (i.e., TOL files go into the folder with the other TOL files). When installed, it appears as a <code>*</code> in the application's menu bar. Things such as "Invoke Database Record" are located in this menu.<br />
<br />
=== Atomic Debugger ===<br />
<br />
The Atomic Debugger disassembles the Atom Streams as it passes through the client.<br />
<br />
=== Remote Area Information Manager (Rainman) ===<br />
<br />
Protocol for displaying information (Pages) in a window.<br />
<br />
=== Visual Publisher ===<br />
<br />
Designs Pages to create AMP files.<br />
<br />
=== Database Form ID ===<br />
<br />
A 32-bit unsigned integer, represented as two unsigned 16-bit integers in decimal format, used to retrieve forms. For example, the ID "123-4567" is 8065495 in base 10, or 0x007B11D7.<br />
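The packing is a plain 16-bit split, so conversion in both directions is a one-liner each (the helper names here are hypothetical):

```python
def form_id_to_int(form_id: str) -> int:
    """'123-4567' -> (123 << 16) | 4567 == 0x007B11D7 == 8065495."""
    hi, lo = (int(part) for part in form_id.split("-"))
    return (hi << 16) | lo

def int_to_form_id(value: int) -> str:
    """Split a 32-bit ID back into its two decimal halves."""
    return f"{value >> 16}-{value & 0xFFFF}"
```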
<br />
=== Tool On Demand (TOD) ===<br />
<br />
AOL software update.<br />
<br />
== Links ==<br />
<br />
=== P3/FDO/Tokens/Atoms ===<br />
<br />
* http://web.archive.org/web/20020205182212/http://www.aol-files.com/fdo91/index.html<br />
* http://web.archive.org/web/20020329213511/http://www.aol-files.com/downloads/docs/index.shtml<br />
* http://www.angelfire.com/sk2/Twisted/Anti-AOL.htm<br />
* http://web.archive.org/web/20011011181824/http://www.aol-files.com/misc/theaolprotocol.wri<br />
** Plaintext: http://www.cs.columbia.edu/~ricardo/misc/docs/theaolprotocol.txt<br />
* http://sicexcels.tripod.com/rm-vpd.html<br />
* https://code.wireshark.org/review/gitweb?p=wireshark.git;a=blob;f=epan/dissectors/packet-aol.c<br />
* http://lithiumnode.com/supernewhappy/oldschool/v3/<br />
** http://lithiumnode.com/extract/ln/aol/<br />
<br />
Documents covering how to make AOL forms and various such things:<br />
* http://sicexcels.tripod.com/rm-vpd.html<br />
* aol://4344:1087.navbar.8505088.517972198 Navigation bar specifications- AOL resource with some forms and examples<br />
<br />
Samples of custom forms:<br />
* http://web.archive.org/web/20010123213500/http://www.aol-files.com/fdo91/fdoorig/index.html<br />
<br />
* http://www.mattmazur.com/archive/aol-files/downloads/tools/win/star/index.html - Open up the links under More Info<br />
<br />
Some FDO lessons:<br />
* http://www.mattmazur.com/archive/aol-files/fdo91/tutorial_lesson01.html<br />
* https://web.archive.org/web/20130525202943/http://www.mattmazur.com/archive/aol-files/fdo91/tutorial_lesson02.html<br />
* https://web.archive.org/web/20130525192737/http://www.mattmazur.com/archive/aol-files/fdo91/tutorial_lesson03.html<br />
* http://web.archive.org/web/20010418134911/http://www.aol-files.com/fdo91/fdoman.html<br />
<br />
About the class names:<br />
* http://web.archive.org/web/20010620011948/http://www.aol-files.com/articles/other_classes.html<br />
<br />
Here is an early version of aol-files.com:<br />
* http://web.archive.org/web/20010201224500/http://www.aol-files.com/index.html<br />
<br />
Atoms list:<br />
* http://web.archive.org/web/20010407015458/http://www.aol-files.com/fdo91/atoms/index.html<br />
<br />
More internal docs:<br />
* https://archive.org/details/aol-file-protocol-4400-701-to-800 (has select chapters from the FDO manual!)<br />
<br />
=== General ===<br />
<br />
* http://slashdot.org/story/01/10/09/1826205/the-america-online-protocol-revealed<br />
* http://www.applefritter.com/aol<br />
* http://billpstudios.blogspot.com/ncr/2012/07/how-america-online-created-internet.html<br />
<br />
=== PengAOL / Penggy ===<br />
<br />
* https://github.com/chfoo/penggy-mirror/tree/master/pengfork/src/p3<br />
* https://github.com/chfoo/penggy-mirror/tree/master/pengfork/src/fdo<br />
<br />
=== PlayNet / Quantum Link (Q-Link) ===<br />
* http://games.slashdot.org/comments.pl?sid=22404&cid=2408020<br />
* https://en.wikipedia.org/wiki/PlayNET<br />
* https://en.wikipedia.org/wiki/Quantum_Link<br />
* http://www.lyonlabs.org/svn/<br />
** https://github.com/chfoo/lyonlabs-org-mirror<br />
<br />
=== Instant AOL (Linux) "Gamera" ===<br />
<br />
* http://www.internetnews.com/xSP/article.php/439931<br />
* http://beta.slashdot.org/story/00/08/13/137233/gamera--aol-for-linux<br />
* instantaol.12-03-01.tar.gz<br />
* http://betanews.com/2000/05/29/kenton-releases-information-on-aol-for-linux/<br />
<br />
=== AOL Chat ===<br />
<br />
* http://www.scribd.com/doc/7234971/chat-arch<br />
<br />
== Reverse Engineering ==<br />
The trunk version of Wireshark includes a dissector for the AOL protocol that breaks out the basic header information, such as the packet type and the token. It doesn't go into any detail about the contents of the packet, but it's a good start. It isn't available for download yet, so you'll have to build it yourself from the SVN trunk; once built, Wireshark reports itself as 1.9.0.<br />
<br />
http://db48x.net/temp/Screenshot%20-%2001292013%20-%2008:28:31%20PM.png<br />
<br />
== Packet Types ==<br />
<br />
; INIT (x’23’) : Client sends this to the server to begin communication.<br />
; ACK (x’24’) : Acknowledges a packet as received, for instance an INIT or heartbeat.<br />
; SS (x’21’) : An SS requests that the other end of the connection send an SSR.<br />
; SSR (x’22’) : An SSR is a response to an SS.<br />
; NAK (x’25’) : Negative acknowledgement of a packet, sent when the packet was received incorrectly.<br />
; DATA (x’20’) : A packet containing data, identified by a token.<br />
; HEARTBEAT (x’26’) : The other side suspects that the line has dropped; respond with an ACK.<br />
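To make the framing concrete, here is a sketch of a P3 frame parser in Python. The field layout (0x5A sync byte, 16-bit checksum and length fields, tx/rx sequence bytes, type byte, 0x0D terminator) is an assumption taken from the pengfork source linked above, not from any official AOL documentation:<br />

```python
import struct

# Packet type bytes from the table above (low 7 bits of the type field;
# masking off the high bit is an assumption based on the pengfork source)
P3_TYPES = {
    0x20: "DATA", 0x21: "SS", 0x22: "SSR", 0x23: "INIT",
    0x24: "ACK", 0x25: "NAK", 0x26: "HEARTBEAT",
}

def parse_p3_frame(frame: bytes) -> dict:
    """Split a raw P3 frame into its header fields and payload.

    Assumed layout: 0x5A sync byte, 16-bit checksum, 16-bit length,
    tx/rx sequence bytes, type byte, payload, 0x0D stop byte.
    """
    if frame[0] != 0x5A or frame[-1] != 0x0D:
        raise ValueError("not a P3 frame")
    checksum, length = struct.unpack(">HH", frame[1:5])
    return {
        "checksum": checksum,
        "length": length,
        "tx_seq": frame[5],
        "rx_seq": frame[6],
        "type": P3_TYPES.get(frame[7] & 0x7F, "UNKNOWN"),
        "payload": frame[8:-1],
    }
```

A captured ACK frame would decode to <code>type == "ACK"</code> with an empty payload.<br />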
<br />
== Tokens ==<br />
Each packet has a token that determines what is in the data field of the packet. Documentation for these tokens is very sparse; it's likely that AOL never had a comprehensive document listing all of them. Instead, the documentation merely tells the reader to view the list of tokens while logged into the server.<br />
<br />
* http://web.archive.org/web/20010821101122/http://www.aol-files.com/fdo91/tokens/list_tokens.html<br />
* http://sicexcels.tripod.com/~SicExcels/rm-vpd_info/TokenTypes_Basic.txt<br />
* http://sicexcels.tripod.com/~SicExcels/rm-vpd_info/TokenTypes_Plus.txt<br />
* https://web.archive.org/web/20130130023852/http://koin.org/files/aol.aim/aol/fdo/manuals/WAOL.doc<br />
<br />
== Downloading a file ==<br />
#← mD – client requests a file (by id?)<br />
#→ uJ – unicode file name<br />
#→ tf – start of a download; includes file name (non-unicode?); requests immediate xG ack<br />
#← xG – client acks download<br />
#→ FF – packet containing file data, no ack requested<br />
#→ F7 – packet containing file data, no ack requested<br />
#← xG – periodic acks<br />
#→ F9 – packet containing file data, last in sequence<br />
#← eX – mail download complete (unrelated?)<br />
<br />
[21:24:10] <db48x> there's a packet coming from the server with a token tf<br />
[21:24:16] <db48x> the data has a filename in it<br />
[21:24:59] <db48x> the data is in a series of packets with token FF and F7 (no explanation of the difference is available)<br />
[21:25:24] <balrog_> but like when you view a file library <br />
[21:25:34] <balrog_> how does it tell the server which library to display?<br />
[21:25:36] <db48x> the last packet of the file has token F9<br />
[21:25:42] <db48x> haven't figured that out yet<br />
[21:25:56] <balrog_> ah<br />
[21:26:01] <db48x> before this file in the capture there are packets with tokens EB and uJ going from the client to the server<br />
[21:26:03] <balrog_> none of the documentation covers this?<br />
[21:26:09] <balrog_> aaah<br />
[21:26:44] <db48x> and mD<br />
[21:26:51] <db48x> and tokens AT and tD coming back<br />
[21:29:29] <db48x> looks like the tD coming back has the metadata in it<br />
[21:30:50] <balrog_> http://sicexcels.tripod.com/~SicExcels/rm-vpd_info/TokenTypes_Basic.txt<br />
[21:31:12] <balrog_> http://sicexcels.tripod.com/~SicExcels/rm-vpd_info/TokenTypes_Plus.txt<br />
[21:31:16] <balrog_> quite incomplete <br />
[21:33:26] <db48x> mD = download now, then<br />
[21:34:31] <db48x> and an mF, file description<br />
[21:34:41] <db48x> followed by an AT with a bunch of data<br />
[21:35:35] <db48x> looks like labels for buttons like 'download now', 'download later', 'ask the staff', 'related files'<br />
[21:35:56] <db48x> packet 538<br />
[21:37:19] <db48x> continues in the next AT packet, 540, which looks like it has the description in it<br />
[21:37:29] <db48x> talks about using ShrinkIt to unpack the file<br />
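Putting the capture notes together, the download exchange can be sketched as a minimal tracker that follows the tokens observed above. The state names are invented for illustration; the tokens and their ordering come from the captured session, not from a protocol specification:<br />

```python
def track_download(events):
    """Follow a (direction, token) event stream and report whether a
    complete file download was observed, per the capture above:
    tf starts the transfer, FF/F7 carry data, F9 is the last packet."""
    state = "idle"
    chunks = 0
    for direction, token in events:
        if state == "idle" and direction == "->" and token == "tf":
            state = "receiving"          # server announced the download
        elif state == "receiving" and token in ("FF", "F7"):
            chunks += 1                  # intermediate data packets
        elif state == "receiving" and token == "F9":
            chunks += 1
            state = "done"               # last packet in the sequence
    return state, chunks
```

Feeding in the sequence from the numbered list above (mD, uJ, tf, xG, FF, F7, xG, F9) ends in the "done" state with three data packets counted.<br />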
<br />
== Retransmissions ==<br />
<br />
In normal transmission, messages are being passed in both directions. Each message sent carries the number of the last message correctly received, which is an implicit acknowledgement of all messages up to and including that one. When a message is received correctly, it is passed up to the application level. Then the response number of the message is examined. If it acknowledges any messages currently in the buffer, they are dropped from the buffer. If the receiver of the message has received a certain number of messages without acknowledging, it will send an ACK to keep the sender’s window from closing. (A window is closed when the buffer of sent messages is full, preventing any more transmissions.)<br />
<br />
If a single message gets mangled, the receiver will get a bad checksum, and send a NAK (assuming its window is open) requesting re-transmission of all messages starting at the mangled one. It will then ignore out of sequence messages until it gets the mangled message correctly. If its window is closed, and there is no NAK queued, it will queue the NAK for transmission when the window opens. If there is a NAK queued already, it will ignore the new one.<br />
<br />
The same NAK logic would apply to messages received out of sequence, assuming that a NAK had not already been sent.<br />
<br />
In all cases, where a numbered message is sent, the window is checked. If it is closed, an SS is sent to try to re-open the window. When an SS has been sent and no SSR has been received, all NAKs are accepted, but they are ignored rather than processed.<br />
<br />
When an SSR is received, any messages that were not received are queued for transmission. When there is a message to send and the window is open, it is sent and put into the buffer. If the window is closed, the message is queued for transmission. This queue is separate from the NAK queue.<br />
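The buffering rules above can be modeled in a few lines. The window size here is an assumption (the real value isn't documented), and the <code>Sender</code> class is an illustration of the implicit-ack behaviour, not AOL's actual implementation:<br />

```python
from collections import deque

WINDOW = 4  # assumed window size; the real value isn't documented here

class Sender:
    """Minimal model of the implicit-acknowledgement window above."""
    def __init__(self):
        self.buffer = deque()   # sent but unacknowledged sequence numbers
        self.queue = deque()    # messages waiting for the window to open
        self.next_seq = 1

    def window_open(self):
        return len(self.buffer) < WINDOW

    def send(self, payload):
        if self.window_open():
            self.buffer.append(self.next_seq)
            self.next_seq += 1
        else:
            self.queue.append(payload)

    def on_ack(self, last_received):
        # Each incoming message implicitly acks everything up to and
        # including last_received; drop those from the buffer.
        while self.buffer and self.buffer[0] <= last_received:
            self.buffer.popleft()
        # If the window re-opened, flush queued messages.
        while self.queue and self.window_open():
            self.send(self.queue.popleft())
```

Sending six messages fills the window after four; an ack for message 2 drops the first two from the buffer and lets the two queued messages go out.<br />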
<br />
= URLs =<br />
<br />
From [http://www.applefritter.com/aol http://www.applefritter.com/aol]:<br />
The url for the Apple II New Files library is aol://4400:8287, and here is the URL for the UnForkIt file, contained in that library: aol://4401:8287:636250. <br />
The first value identifies the resource type. In this case, either 4400 for a library, or 4401 for a file. <br />
The second number, 8287, is the library ID. 636250 is the file ID. The file IDs are not consecutive within libraries.<br />
== aol://nnnn ==<br />
<br />
* 1722: Keywords<br />
* 2719: Chatrooms (Private room through keyword: aol://2719:2-2-room name)<br />
* 3548: User profiles<br />
* 4344: Interactive page<br />
* 4400: File libraries<br />
* 4401: Files<br />
* 586x: ???<br />
* 9293: IM: aol://9293:[sn] (from http://justinakapaste.com/category/aolaim-tutorials/)<br />
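A small helper can split these URLs into the resource-type and identifier parts described above. The type names in the mapping are labels taken from this list for illustration; the parser itself is a sketch, not an official grammar:<br />

```python
# Resource types from the list above; the names are informal labels.
RESOURCE_TYPES = {"1722": "keyword", "2719": "chatroom", "3548": "profile",
                  "4344": "interactive", "4400": "library", "4401": "file"}

def parse_aol_url(url):
    """Split an aol:// URL into its resource type and identifier parts."""
    if not url.startswith("aol://"):
        raise ValueError("not an aol:// URL")
    rtype, _, rest = url[6:].partition(":")
    return {"type": RESOURCE_TYPES.get(rtype, "unknown"),
            "code": rtype,
            "ids": rest.split(":") if rest else []}
```

For example, <code>aol://4401:8287:636250</code> parses to a file in library 8287 with file ID 636250, matching the applefritter description above.<br />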
<br />
== Examples ==<br />
* aol://4344:1264.a2main.10029531.514525857<br />
* aol://4400:8287<br />
* aol://4344:1264.a2abt.10037404<br />
* aol://4344:117.mtv.591130<br />
* aol://4344:226.llll.2755674.520114429 (Access code: 3675)<br />
<br />
== Sources ==<br />
<br />
List of <code>aol://</code> URLs. See Links section above for HTTP links about AOL.<br />
<br />
* http://web.archive.org/web/20060207004722/http://daol.aol.com/aolatoz<br />
* http://aolhostages.tripod.com/oldused-KWs.txt<br />
* http://www.oocities.org/sunsetstrip/club/5468/secretz.txt<br />
* http://koin.org/files/aol.aim/aol/fdo/tools/software%20library%20list%20bmb_libs.xls<br />
<br />
== Structure ==<br />
<balrog_> yes, but aol://4344:nnnn doesn't work without the extra<br />
[19:52] <balrog_> aol://4344:1264.a2main.10029531 also works<br />
<balrog_> simply aol://4344:1264.a2main does not work.<br />
<br />
[20:17] <DrainLbry> so to summarize we've got aol://4400:ID (from<br />
spreadsheet), for file libraries, and<br />
aol://4344:uniqueidentifier for interactive content<br />
[20:18] <balrog_> aol://4344:uniqueidentifier:ID<br />
<balrog_> as per<br />
http://web.archive.org/web/20060207004722/http://daol.aol.com/aolatoz<br />
keywords used to be aol://1722:keyword<br />
<balrog_> but that's no longer working<br />
<br />
= Software =<br />
* http://web.archive.org/web/20010713011523/http://www.aol-files.com/downloads/tools/win/dbview/index.html<br />
* http://web.archive.org/web/20010128152400/http://www.aol-files.com/downloads/tools/mac.html<br />
* http://www.ppcmla.com/downloads/<br />
* aol://4344:1344.bpsfront.10164598.526964474 BPS SoftWare - addons for AOL<br />
<br />
<raylee> given #aohell seems dead,<br />
<raylee> i'll just say it here<br />
<raylee> I just found startools aol / master aol / the debugging tools for AOL<br />
<raylee> http://www.aciddr0p.net/aolunorgd/<br />
<raylee> maol*.zip<br />
<raylee> the master.tol / master.aol file goes into the tools dir.. then dbinvokes work PERFECTLY...<br />
<br />
* Regarding archival of file libraries: I (slipstream/raylee) made an AutoIt script to drive the AOL client (it only works reliably on 9.7) to get everything (metadata and files). [http://pastebin.com/bXrENu3W Here's the script. Updated 7-Sep-2014.] The script only fails on connection loss or an AOL client crash. (The reason it doesn't work on 7.0 and below is that I already tried that, and random lag, which anyone who has used these old clients will recognize, basically kills the script.)<br />
<br />
* https://savannah.nongnu.org/projects/pengfork (Penggy)<br />
** https://lists.gnu.org/archive/html/pengfork-devel/<br />
<br />
= Goals =<br />
== save forums/files/etc ==<br />
AOL has a large number of forums on every topic, and file libraries containing art, shareware, game mods, and more. These should be fairly easy to enumerate, and once found it should be fairly easy to download all of the forum messages and files. Archives of these would be worth saving.<br />
== save everything ==<br />
Every window that you can click on in AOL was created by a 'producer' at AOL. Many of them are essentially identical, file libraries for instance, but many are also one-offs created for a specific purpose. We ought to save these as well. Going this route will take a more thorough understanding of both the AOL protocol and the FDO scripts.<br />
= Plans =<br />
There are several ways to go about this, with tradeoffs that are only lightly explored.<br />
== custom scraper ==<br />
Write a scraper in Python that understands the AOL protocol and FDO scripts, and writes everything to WARC files. WARCs save us much of the trouble of figuring out how to organize everything on disk. They also make it much easier to create a server that can pretend to be the AOL server, or that can translate to HTTP/HTML to let anyone with a web browser see what AOL was like.<br />
== wget-aol ==<br />
Modify wget to support the AOL protocol. Very ambitious, but it would let us reuse wget's infrastructure, which may make the task easier; we'd be able to concentrate on just implementing the protocol and FDO parsing and leave the rest to wget. Would that reuse save us time, or would dealing with wget's internals drive us mad? Hard to say. This method would also allow us to create WARC files.<br />
== script the client ==<br />
Drive the real AOL client, perhaps with debugging tools installed, in order to capture both the FDO sources and screenshots of the rendering. Probably more fragile, but we wouldn't have to understand the actual protocol. This method wouldn't let us create WARC files.<br />
<br />
== Complete client clone ==<br />
<br />
An attempt to write a client library, client interface, and client recording suite is located at https://github.com/chfoo/notaol/. It's far from complete; currently stuck on atom serialization/deserialization.<br />
<br />
= Archives =<br />
<br />
Archives are being uploaded to IA by godane: https://archive.org/search.php?query=creator%3A%22AOL+Files%22<br />
<br />
{{navigation box}}</div>Archive Maniachttps://wiki.archiveteam.org/index.php?title=User_talk:Bzc6p&diff=21575User talk:Bzc6p2015-01-26T19:14:19Z<p>Archive Maniac: /* View Archive.org Directories as Text Only */ Another tid-bit.</p>
<hr />
<div>{{DISPLAYTITLE:User talk&#58;bzc6p}}<br />
<br />
== Re: Some friendly words ==<br />
<br />
Thanks for appreciating my efforts and explaining the ArchiveTeam to me. I thought "#archiveteam-bs" was for off-topic conversation, though. :/ And of course I didn't give up on archiving. Why would I? I'm getting 24 Blu-ray M-Discs next month, in fact. :) Would you be willing to explain to the other users about the situation? I'm willing to forgive them if they accept it & apologize for my trolling. I'm just glad someone, at the very least, understood my situation and took the time to write to me.<br />
<br />
And I looked at your userpage. I'll see if I can track down some Hungarian sites. You can always use the Google operator "site:.hu" to filter just Hungarian sites. There is, however, [http://donkeykong.gportal.hu/ this site]. I have a backup of it, but not in .warc.gz format. Even worse, Yahoo is stupid enough to be shutting down their first service: dir.yahoo.com (Yahoo! Dir), on 12/31/2014. Stupid Yahoo... <br />
<br />
And by the way, SketchCow disliked the fact that I "asked too many questions". [[User:Archive Maniac|Archive Maniac]] 13:25, 19 October 2014 (EDT)<br />
<br />
:I'm waiting for Wpull to have a Windows release or a Python 2 release. I also stink at Python big time... [[User:Archive Maniac|Archive Maniac]] 17:47, 19 October 2014 (EDT)<br />
::Python3 unfortunately gets mixed up with Python 2 in the Command Prompt (e.g. python3 is not recognized as a command). That's why I've stuck to Python 2, because I use the wiki dump tool with that version. Aside from that, I always get errors when attempting installation, like vcvarsbatall.bat or something error, couldn't find seesaw kit, etc. Python is so user-unfriendly... [[User:Archive Maniac|Archive Maniac]] 17:42, 20 October 2014 (EDT)<br />
<br />
== Any Help on Chat? ==<br />
What's your IRC username? I want help coming back on the ArchiveBot & archiveteam-bs channel. And please tell me what discussions are appropriate for the latter; you do have a way with words. :P [[User:Archive Maniac|Archive Maniac]] 20:59, 21 October 2014 (EDT)<br />
<br />
== ArchiveBot Requests ==<br />
<br />
Hey, Bzc6p. Are you willing to take ArchiveBot requests from me? I also like your Hungarian site archiving. I recently archived smb.gportal.hu on my computer. [[User:Archive Maniac|Archive Maniac]] 18:55, 18 November 2014 (EST)<br />
:I have a few more questions (the thing that made users upset at me):<br />
<br />
#I like archiving stuff. What archiving tools do you know of and recommend?<br />
#Is there a way that I can save whole sites to the Wayback Machine without using the ArchiveBot channel? I probably don't think so, but there still might be a chance.<br />
#Why doesn't the ArchiveTeam make C++ ports of their Python tools?<br />
#When I try to use Wget, I get this error in the command prompt: ''Connecting to SITENAME (SITENAME)|IP|:PORT... failed: Bad file descriptor.'' Do you know how to fix this problem?<br />
<br />
I hope you're not too annoyed by these questions, like the others would probably be. [[User:Archive Maniac|Archive Maniac]] 12:01, 20 November 2014 (EST)<br />
:Thanks for the info. And what's been a problem is that I've tried to set ArchiveBot or wpull up a few times, but never had proper 100% cannot fail step-by-step instructions on how to set both up. If you have the time, could you please write a more specific tutorial than the existing one? I preferably want a tutorial on the former [wpull]. [[User:Archive Maniac|Archive Maniac]] 11:45, 22 November 2014 (EST)<br />
<br />
== Blank CD Question ==<br />
<br />
Hi Bzc6p, I am wondering how long CD-R's and DVD-R's last with a .iso image burned on to it. Is it just as long as the estimated shelf life? More importantly: what do you recommend for long-term backup solutions? [[User:Archive Maniac|Archive Maniac]] 14:51, 29 November 2014 (EST)<br />
<br />
== Blogter.hu's Unexpected Downfall ==<br />
<br />
Hi Bzc6p. You know how Blogter unexpectedly shut down in December in spite of its popularity? That goes to show that anything, and I mean anything, can happen to web sites that seem okay but actually are in limbo (i.e. extinction). That's why I suggested you archive gportal.hu. I already archived the Mario and DK sites. [[User:Archive Maniac|Archive Maniac]] 19:46, 7 December 2014 (EST)<br />
<br />
== What I'm Currently Doing ==<br />
<br />
Hi Bzc6p, it's been a little bit since I last talked to you. If you want to know what I'm currently doing, it's that I'm searching the depths of the Internet for links and saving them on to the Wayback Machine. I'm also uploading [https://archive.org/search.php?query=subject%3A%22dec3199%22 my own collections to the Internet Archive]. There's some stuff in there which you'll probably enjoy. :)<br />
<br />
And the icing on the cake is that I'm editing a few wikis, cleaning them up and trying to make them more informative.<br />
<br />
P.S. Do you forgive me and understand why I went into a very mad rage here those few times (which I shouldn't have)? I know the experience is over, but I feel embarrassed around you, given my extremely vulgar actions and how you're aware of it.<br />
<br />
Anyway, nice to message you again. Good luck saving Hungarian sites! :) [[User:Archive Maniac|Archive Maniac]] 21:15, 5 January 2015 (EST)<br />
<br />
:Thanks for replying. :) Shortly after I messaged you, somebody on a forum site taught me how to properly burn files to an M-Disc. And it was a success! A good, long offline backup for me! :D<br />
<br />
And it's a shame [[extra.hu]] is gone... It looked like an excellent web host...<br />
<br />
<br />
I also have issues with using wikiadownloader.py. It gives me this error:<br />
<br />
<pre><br />
Traceback (most recent call last):<br />
File "wikiadownloader.py", line 41, in <module><br />
f = open('wikia.com', 'r')<br />
IOError: [Errno 2] No such file or directory: 'wikia.com'<br />
</pre><br />
<br />
<br />
Do you know what that is? [[User:Archive Maniac|Archive Maniac]] 12:25, 6 January 2015 (EST)<br />
<br />
== View Archive.org Directories as Text Only ==<br />
<br />
Hi Bzc6p, I remember someone on the ArchiveTeam taught me how to view archive.org site directories (e.g. like these: http://web.archive.org/*/media.nintendo-europe.com/* ) as text-only in the browser. I forgot how to do it, so I've come to ask you how to do it. Do you know how? [[User:Archive Maniac|Archive Maniac]] 18:56, 22 January 2015 (EST)<br />
:I literally meant what I said. The link I gave you lists all of the URLs on the Internet Archive. I asked how to view it as text-only. (By the way, it was taught to me on the #archivebot channel, which isn't on BadCheese). [[User:Archive Maniac|Archive Maniac]] 18:32, 23 January 2015 (EST)<br />
::Ah, yes. That's what they mentioned. Thanks, bzc6p. I also have a bit of a problem—see, I want to access a site (http://eecad.sogang.ac.kr/~chang/games/dkc2/) on the Wayback Machine, but it's blocked by robots.txt... Also, many of Nintendo Europe's sites (e.g. nintendo.co.uk, nintendo.es, nintendo.fr) are excluded from the Wayback Machine entirely. Is there any way for me to access them? I mean, J.Scott's obviously not going to help out here. [[User:Archive Maniac|Archive Maniac]] 14:46, 24 January 2015 (EST)<br />
:::Wow, he is not nice. Just look how he talks about the people on the IA Forums on IRC. He's also gloating about having access to everything on the Internet Archive. (And I saved your email in case I get banned for voicing my opinion, which really is true...) [[User:Archive Maniac|Archive Maniac]] 17:30, 24 January 2015 (EST)<br />
:::Add a period in front of the domain name, e.g. https://web.archive.org/web/20011211041409/http://.eecad.sogang.ac.kr/~chang/games/dkc2/ (note that you need to do this for all links too) [[User:PiRSquared|PiRSquared]] 23:59, 25 January 2015 (EST)<br />
::::Thanks PiR. (and sorry for what I said above; I was upset about something on IRC). Oh yeah, and I should probably not tell anyone else about it, which I will do. [[User:Archive Maniac|Archive Maniac]] 14:14, 26 January 2015 (EST)</div>Archive Maniac
<br />
<br />
I also have issues with using wikiadownloader.py. It gives me this error:<br />
<br />
<pre><br />
Traceback (most recent call last):<br />
File "wikiadownloader.py", line 41, in <module><br />
f = open('wikia.com', 'r')<br />
IOError: [Errno 2] No such file or directory: 'wikia.com'<br />
</pre><br />
<br />
<br />
Do you know what that is? [[User:Archive Maniac|Archive Maniac]] 12:25, 6 January 2015 (EST)<br />
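(For what it's worth, the traceback above shows the script calling <code>open('wikia.com', 'r')</code>, i.e. it expects a plain text file named <code>wikia.com</code> in the working directory; judging by the script's name, presumably a list of wikis to download, one per line — an assumption, since the script's interface isn't documented here. A minimal sketch of that failure mode and its fix:)<br />

```python
# Sketch of the IOError above: open() fails when the expected local
# file "wikia.com" is absent. Creating it first (here with a sample
# entry; the one-wiki-per-line format is an assumption) lets the
# open() call succeed.
import os

if not os.path.exists("wikia.com"):
    with open("wikia.com", "w") as f:
        f.write("donkeykong.wikia.com\n")

with open("wikia.com", "r") as f:
    wikis = [line.strip() for line in f if line.strip()]
print(wikis)
```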
<br />
== View Archive.org Directories as Text Only ==<br />
<br />
Hi Bzc6p, I remember someone on the ArchiveTeam taught me how to view archive.org site directories (e.g. like these: http://web.archive.org/*/media.nintendo-europe.com/* ) as text-only in the browser. I forgot how to do it, so I've come to ask you how to do it. Do you know how? [[User:Archive Maniac|Archive Maniac]] 18:56, 22 January 2015 (EST)<br />
:I literally meant what I said. The link I gave you lists all of the URLs on the Internet Archive. I asked how to view it as text-only. (By the way, it was taught to me on the #archivebot channel, which isn't on BadCheese). [[User:Archive Maniac|Archive Maniac]] 18:32, 23 January 2015 (EST)<br />
::Ah, yes. That's what they mentioned. Thanks, bzc6p. I also have a bit of a problem—see, I want to access a site (http://eecad.sogang.ac.kr/~chang/games/dkc2/) on the Wayback Machine, but it's blocked by robots.txt... Also, many of Nintendo Europe's sites (e.g. nintendo.co.uk, nintendo.es, nintendo.fr) are excluded from the Wayback Machine entirely. Is there any way for me to access them? I mean, J.Scott's obviously not going to help out here. [[User:Archive Maniac|Archive Maniac]] 14:46, 24 January 2015 (EST)<br />
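(Incidentally, the "text-only" capture listing discussed above matches what the Wayback Machine's CDX API returns: plain text, one capture per line. A sketch of building such a query — only the URL construction is shown, since fetching it needs network access:)<br />

```python
# Build a CDX API query that lists Wayback Machine captures as plain
# text. url/matchType/fl/limit are documented CDX API parameters; the
# target host is taken from the example above.
from urllib.parse import urlencode

params = urlencode({
    "url": "media.nintendo-europe.com",
    "matchType": "prefix",        # all captures under this prefix
    "fl": "timestamp,original",   # fields to return per line
    "limit": 50,
})
cdx_url = "https://web.archive.org/cdx/search/cdx?" + params
print(cdx_url)
# Fetching cdx_url (e.g. with urllib.request.urlopen) returns the
# listing, one capture per line with the requested fields.
```

Opening the printed URL in a browser shows the same listing as raw text.<br />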
:::Wow, he is not nice. Just look how he talks about the people on the IA Forums on IRC. He's also gloating about having access to everything on the Internet Archive. (And I saved your email in case I get banned for voicing my opinion, which really is true...) [[User:Archive Maniac|Archive Maniac]] 17:30, 24 January 2015 (EST)</div>Archive Maniachttps://wiki.archiveteam.org/index.php?title=User_talk:Bzc6p&diff=21557User talk:Bzc6p2015-01-24T19:47:46Z<p>Archive Maniac: /* View Archive.org Directories as Text Only */ Oh yeah.</p>
<hr />
<div>{{DISPLAYTITLE:User talk&#58;bzc6p}}<br />
<br />
== Re: Some friendly words ==<br />
<br />
Thanks for appreciating my efforts and explaining the ArchiveTeam to me. I thought "#archiveteam-bs" was for off-topic conversation, though. :/ And of course I didn't give up on archiving. Why would I? I'm getting 24 Blu-ray M-Discs next month, in fact. :) Would you willing to explain to the other users about the situation? I'm willing to forgive them if they accept it & apologize for my trolling. I'm just glad someone, by the very least, understood my situation and took the time to write to me.<br />
<br />
And I looked at your userpage. I'll see if I can track down some Hungarian sites. You can always use the Google operator "site:.hu" to filter just Hungarian sites. There is, however, [http://donkeykong.gportal.hu/ this site]. I have a backup of it, but not in .warc.gz format. Even worse, Yahoo is stupid enough to be shutting down their first service: dir.yahoo.com (Yahoo! Dir), on 12/31/2014. Stupid Yahoo... <br />
<br />
And by the way, SketchCow disliked the fact that I "asked too many questions". [[User:Archive Maniac|Archive Maniac]] 13:25, 19 October 2014 (EDT)<br />
<br />
:I'm waiting for Wpull to have a Windows release or a Python 2 release. I also stink at Python big time... [[User:Archive Maniac|Archive Maniac]] 17:47, 19 October 2014 (EDT)<br />
::Python3 unfortunately gets mixed up with Python 2 in the Command Prompt (e.g. python3 is not recognized as a command). That's why I've stuck to Python 2, because I use the wiki dump tool with that version. Aside from that, I always get errors when attempting installation, like vcvarsbatall.bat or something error, couldn't find seesaw kit, etc. Python is so user-unfriendly... [[User:Archive Maniac|Archive Maniac]] 17:42, 20 October 2014 (EDT)<br />
<br />
== Any Help on Chat? ==<br />
What's your IRC username? I want help coming back on the ArchiveBot & archiveteam-bs channel. And please tell me what discussions are appropriate for the latter; you do have a way with words. :P [[User:Archive Maniac|Archive Maniac]] 20:59, 21 October 2014 (EDT)<br />
<br />
== ArchiveBot Requests ==<br />
<br />
Hey, Bzc6p. Are you willing to take ArchiveBot requests from me? I also like your Hungarian site archiving. I recently archived smb.gportal.hu on my computer. [[User:Archive Maniac|Archive Maniac]] 18:55, 18 November 2014 (EST)<br />
:I have two more questions (the thing that made users upset at me):<br />
<br />
#I like archiving stuff. What archiving tools do you know of and recommend?<br />
#Is there a way that I can save whole sites to the Wayback Machine without using the ArchiveBot channel? I probably don't think so, but there still might be a chance.<br />
#Why doesn't the ArchiveTeam make C++ ports of their Python tools?<br />
#When I try to use Wget, I get this error in the command prompt: ''Connecting to SITENAME (SITENAME)|IP|:PORT... failed: Bad file descriptor.'' Do you know how to fix this problem?<br />
<br />
I hope you're not too annoyed by these questions, like the others would probably be. [[User:Archive Maniac|Archive Maniac]] 12:01, 20 November 2014 (EST)<br />
:Thanks for the info. And what's been a problem is that I've tried to set ArchiveBot or wpull up a few times, but never had proper 100% cannot fail step-by-step instructions on how to set both up. If you have the time, could you please write a more specific tutorial than the existing one? I preferably want a tutorial on the former [wpull]. [[User:Archive Maniac|Archive Maniac]] 11:45, 22 November 2014 (EST)<br />
<br />
== Blank CD Question ==<br />
<br />
Hi Bzc6p, I am wondering how long CD-R's and DVD-R's last with a .iso image burned on to it. Is it just as long as the estimated shelf life? More importantly: what do you recommend for long-term backup solutions? [[User:Archive Maniac|Archive Maniac]] 14:51, 29 November 2014 (EST)<br />
<br />
== Blogter.hu's Unexpected Downfall ==<br />
<br />
Hi Bzc6p. You know how Blogter unexpectedly shut down in December in spite of its popularity? That goes to show that anything, and I mean anything, can happen to web sites that seem okay but actually are in limbo (i.e. extinction). That's why I suggested you archive gportal.hu. I already archived the Mario and DK sites. [[User:Archive Maniac|Archive Maniac]] 19:46, 7 December 2014 (EST)<br />
<br />
== What I'm Currently Doing ==<br />
<br />
Hi Bzc6p, it's been a little bit since I last talked to you. If you want to know what I'm currently doing, it's that I'm searching the depths of the Internet for links and saving them on to the Wayback Machine. I'm also uploading [https://archive.org/search.php?query=subject%3A%22dec3199%22 my own collections to the Internet Archive]. There's some stuff in there which you'll probably enjoy. :)<br />
<br />
And the icing on the cake is that I'm editing a few wikis, cleaning them up and trying to make them more informative.<br />
<br />
P.S. Do you forgive me and understand why I went into a very mad rage here those few times (which I shouldn't have)? I know the experience is over, but I feel embarrassed around you, given my extremely vulgar actions and how you're aware of it.<br />
<br />
Anyway, nice to message you again. Good luck saving Hungarian sites! :) [[User:Archive Maniac|Archive Maniac]] 21:15, 5 January 2015 (EST)<br />
<br />
:Thanks for replying. :) Shortly after I messaged you, somebody on a forum site taught me how to properly burn files to an M-Disc. And it was a success! A good, long offline backup for me! :D<br />
<br />
And it's a shame [[extra.hu]] is gone... It looked like an excellent web host...<br />
<br />
<br />
I also have issues with using wikiadownloader.py. It gives me this error:<br />
<br />
<pre><br />
Traceback (most recent call last):<br />
File "wikiadownloader.py", line 41, in <module><br />
f = open('wikia.com', 'r')<br />
IOError: [Errno 2] No such file or directory: 'wikia.com'<br />
</pre><br />
<br />
<br />
Do you know what that is? [[User:Archive Maniac|Archive Maniac]] 12:25, 6 January 2015 (EST)<br />
<br />
== View Archive.org Directories as Text Only ==<br />
<br />
Hi Bzc6p, I remember someone on the ArchiveTeam taught me how to view archive.org site directories (e.g. like these: http://web.archive.org/*/media.nintendo-europe.com/* ) as text-only in the browser. I forgot how to do it, so I've come to ask you how to do it. Do you know how? [[User:Archive Maniac|Archive Maniac]] 18:56, 22 January 2015 (EST)<br />
:I literally meant what I said. The link I gave you lists all of the URLs on the Internet Archive. I asked how to view it as text-only. (By the way, it was taught to me on the #archivebot channel, which isn't on BadCheese). [[User:Archive Maniac|Archive Maniac]] 18:32, 23 January 2015 (EST)<br />
::Ah, yes. That's what they mentioned. Thanks, bzc6p. I also have a bit of a problem—see, I want to access a site (http://eecad.sogang.ac.kr/~chang/games/dkc2/) on the Wayback Machine, but it's blocked by robots.txt... Also, many of Nintendo Europe's sites (e.g. nintendo.co.uk, nintendo.es, nintendo.fr) are excluded from the Wayback Machine entirely. Is there any way for me to access them? I mean, J.Scott's obviously not going to help out here. [[User:Archive Maniac|Archive Maniac]] 14:46, 24 January 2015 (EST)</div>Archive Maniachttps://wiki.archiveteam.org/index.php?title=User_talk:Bzc6p&diff=21556User talk:Bzc6p2015-01-24T19:46:14Z<p>Archive Maniac: /* View Archive.org Directories as Text Only */ Replied</p>
<hr />
<div>{{DISPLAYTITLE:User talk&#58;bzc6p}}<br />
<br />
== Re: Some friendly words ==<br />
<br />
Thanks for appreciating my efforts and explaining the ArchiveTeam to me. I thought "#archiveteam-bs" was for off-topic conversation, though. :/ And of course I didn't give up on archiving. Why would I? I'm getting 24 Blu-ray M-Discs next month, in fact. :) Would you willing to explain to the other users about the situation? I'm willing to forgive them if they accept it & apologize for my trolling. I'm just glad someone, by the very least, understood my situation and took the time to write to me.<br />
<br />
And I looked at your userpage. I'll see if I can track down some Hungarian sites. You can always use the Google operator "site:.hu" to filter just Hungarian sites. There is, however, [http://donkeykong.gportal.hu/ this site]. I have a backup of it, but not in .warc.gz format. Even worse, Yahoo is stupid enough to be shutting down their first service: dir.yahoo.com (Yahoo! Dir), on 12/31/2014. Stupid Yahoo... <br />
<br />
And by the way, SketchCow disliked the fact that I "asked too many questions". [[User:Archive Maniac|Archive Maniac]] 13:25, 19 October 2014 (EDT)<br />
<br />
:I'm waiting for Wpull to have a Windows release or a Python 2 release. I also stink at Python big time... [[User:Archive Maniac|Archive Maniac]] 17:47, 19 October 2014 (EDT)<br />
::Python3 unfortunately gets mixed up with Python 2 in the Command Prompt (e.g. python3 is not recognized as a command). That's why I've stuck to Python 2, because I use the wiki dump tool with that version. Aside from that, I always get errors when attempting installation, like vcvarsbatall.bat or something error, couldn't find seesaw kit, etc. Python is so user-unfriendly... [[User:Archive Maniac|Archive Maniac]] 17:42, 20 October 2014 (EDT)<br />
<br />
== Any Help on Chat? ==<br />
What's your IRC username? I want help coming back on the ArchiveBot & archiveteam-bs channel. And please tell me what discussions are appropriate for the latter; you do have a way with words. :P [[User:Archive Maniac|Archive Maniac]] 20:59, 21 October 2014 (EDT)<br />
<br />
== ArchiveBot Requests ==<br />
<br />
Hey, Bzc6p. Are you willing to take ArchiveBot requests from me? I also like your Hungarian site archiving. I recently archived smb.gportal.hu on my computer. [[User:Archive Maniac|Archive Maniac]] 18:55, 18 November 2014 (EST)<br />
:I have two more questions (the thing that made users upset at me):<br />
<br />
#I like archiving stuff. What archiving tools do you know of and recommend?<br />
#Is there a way that I can save whole sites to the Wayback Machine without using the ArchiveBot channel? I probably don't think so, but there still might be a chance.<br />
#Why doesn't the ArchiveTeam make C++ ports of their Python tools?<br />
#When I try to use Wget, I get this error in the command prompt: ''Connecting to SITENAME (SITENAME)|IP|:PORT... failed: Bad file descriptor.'' Do you know how to fix this problem?<br />
<br />
I hope you're not too annoyed by these questions, like the others would probably be. [[User:Archive Maniac|Archive Maniac]] 12:01, 20 November 2014 (EST)<br />
:Thanks for the info. And what's been a problem is that I've tried to set ArchiveBot or wpull up a few times, but never had proper 100% cannot fail step-by-step instructions on how to set both up. If you have the time, could you please write a more specific tutorial than the existing one? I preferably want a tutorial on the former [wpull]. [[User:Archive Maniac|Archive Maniac]] 11:45, 22 November 2014 (EST)<br />
<br />
== Blank CD Question ==<br />
<br />
Hi Bzc6p, I am wondering how long CD-R's and DVD-R's last with a .iso image burned on to it. Is it just as long as the estimated shelf life? More importantly: what do you recommend for long-term backup solutions? [[User:Archive Maniac|Archive Maniac]] 14:51, 29 November 2014 (EST)<br />
<br />
== Blogter.hu's Unexpected Downfall ==<br />
<br />
Hi Bzc6p. You know how Blogter unexpectedly shut down in December in spite of its popularity? That goes to show that anything, and I mean anything, can happen to web sites that seem okay but actually are in limbo (i.e. extinction). That's why I suggested you archive gportal.hu. I already archived the Mario and DK sites. [[User:Archive Maniac|Archive Maniac]] 19:46, 7 December 2014 (EST)<br />
<br />
== What I'm Currently Doing ==<br />
<br />
Hi Bzc6p, it's been a little bit since I last talked to you. If you want to know what I'm currently doing, it's that I'm searching the depths of the Internet for links and saving them on to the Wayback Machine. I'm also uploading [https://archive.org/search.php?query=subject%3A%22dec3199%22 my own collections to the Internet Archive]. There's some stuff in there which you'll probably enjoy. :)<br />
<br />
And the icing on the cake is that I'm editing a few wikis, cleaning them up and trying to make them more informative.<br />
<br />
P.S. Do you forgive me and understand why I went into a very mad rage here those few times (which I shouldn't have)? I know the experience is over, but I feel embarrassed around you, given my extremely vulgar actions and how you're aware of it.<br />
<br />
Anyway, nice to message you again. Good luck saving Hungarian sites! :) [[User:Archive Maniac|Archive Maniac]] 21:15, 5 January 2015 (EST)<br />
<br />
:Thanks for replying. :) Shortly after I messaged you, somebody on a forum site taught me how to properly burn files to an M-Disc. And it was a success! A good, long offline backup for me! :D<br />
<br />
And it's a shame [[extra.hu]] is gone... It looked like an excellent web host...<br />
<br />
<br />
I also have issues with using wikiadownloader.py. It gives me this error:<br />
<br />
<pre><br />
Traceback (most recent call last):<br />
File "wikiadownloader.py", line 41, in <module><br />
f = open('wikia.com', 'r')<br />
IOError: [Errno 2] No such file or directory: 'wikia.com'<br />
</pre><br />
<br />
<br />
Do you know what that is? [[User:Archive Maniac|Archive Maniac]] 12:25, 6 January 2015 (EST)<br />
<br />
== View Archive.org Directories as Text Only ==<br />
<br />
Hi Bzc6p, I remember someone on the ArchiveTeam taught me how to view archive.org site directories (e.g. like these: http://web.archive.org/*/media.nintendo-europe.com/* ) as text-only in the browser. I forgot how to do it, so I've come to ask you how to do it. Do you know how? [[User:Archive Maniac|Archive Maniac]] 18:56, 22 January 2015 (EST)<br />
:I literally meant what I said. The link I gave you lists all of the URLs on the Internet Archive. I asked how to view it as text-only. (By the way, it was taught to me on the #archivebot channel, which isn't on BadCheese). [[User:Archive Maniac|Archive Maniac]] 18:32, 23 January 2015 (EST)<br />
::I also have a bit of a problem—see, I want to access a site (http://eecad.sogang.ac.kr/~chang/games/dkc2/) on the Wayback Machine, but it's blocked by robots.txt... Also, many of Nintendo Europe's sites (e.g. nintendo.co.uk, nintendo.es, nintendo.fr) are excluded from the Wayback Machine entirely. Is there any way for me to access them? I mean, J.Scott's obviously not going to help out here. [[User:Archive Maniac|Archive Maniac]] 14:46, 24 January 2015 (EST)</div>Archive Maniachttps://wiki.archiveteam.org/index.php?title=User_talk:Bzc6p&diff=21552User talk:Bzc6p2015-01-23T23:32:38Z<p>Archive Maniac: /* View Archive.org Directories as Text Only */ Replied</p>
<hr />
<div>{{DISPLAYTITLE:User talk&#58;bzc6p}}<br />
<br />
== Re: Some friendly words ==<br />
<br />
Thanks for appreciating my efforts and explaining the ArchiveTeam to me. I thought "#archiveteam-bs" was for off-topic conversation, though. :/ And of course I didn't give up on archiving. Why would I? I'm getting 24 Blu-ray M-Discs next month, in fact. :) Would you willing to explain to the other users about the situation? I'm willing to forgive them if they accept it & apologize for my trolling. I'm just glad someone, by the very least, understood my situation and took the time to write to me.<br />
<br />
And I looked at your userpage. I'll see if I can track down some Hungarian sites. You can always use the Google operator "site:.hu" to filter just Hungarian sites. There is, however, [http://donkeykong.gportal.hu/ this site]. I have a backup of it, but not in .warc.gz format. Even worse, Yahoo is stupid enough to be shutting down their first service: dir.yahoo.com (Yahoo! Dir), on 12/31/2014. Stupid Yahoo... <br />
<br />
And by the way, SketchCow disliked the fact that I "asked too many questions". [[User:Archive Maniac|Archive Maniac]] 13:25, 19 October 2014 (EDT)<br />
<br />
:I'm waiting for Wpull to have a Windows release or a Python 2 release. I also stink at Python big time... [[User:Archive Maniac|Archive Maniac]] 17:47, 19 October 2014 (EDT)<br />
::Python3 unfortunately gets mixed up with Python 2 in the Command Prompt (e.g. python3 is not recognized as a command). That's why I've stuck to Python 2, because I use the wiki dump tool with that version. Aside from that, I always get errors when attempting installation, like vcvarsbatall.bat or something error, couldn't find seesaw kit, etc. Python is so user-unfriendly... [[User:Archive Maniac|Archive Maniac]] 17:42, 20 October 2014 (EDT)<br />
<br />
== Any Help on Chat? ==<br />
What's your IRC username? I want help coming back on the ArchiveBot & archiveteam-bs channel. And please tell me what discussions are appropriate for the latter; you do have a way with words. :P [[User:Archive Maniac|Archive Maniac]] 20:59, 21 October 2014 (EDT)<br />
<br />
== ArchiveBot Requests ==<br />
<br />
Hey, Bzc6p. Are you willing to take ArchiveBot requests from me? I also like your Hungarian site archiving. I recently archived smb.gportal.hu on my computer. [[User:Archive Maniac|Archive Maniac]] 18:55, 18 November 2014 (EST)<br />
:I have two more questions (the thing that made users upset at me):<br />
<br />
#I like archiving stuff. What archiving tools do you know of and recommend?<br />
#Is there a way that I can save whole sites to the Wayback Machine without using the ArchiveBot channel? I probably don't think so, but there still might be a chance.<br />
#Why doesn't the ArchiveTeam make C++ ports of their Python tools?<br />
#When I try to use Wget, I get this error in the command prompt: ''Connecting to SITENAME (SITENAME)|IP|:PORT... failed: Bad file descriptor.'' Do you know how to fix this problem?<br />
<br />
I hope you're not too annoyed by these questions, like the others would probably be. [[User:Archive Maniac|Archive Maniac]] 12:01, 20 November 2014 (EST)<br />
:Thanks for the info. And what's been a problem is that I've tried to set ArchiveBot or wpull up a few times, but never had proper 100% cannot fail step-by-step instructions on how to set both up. If you have the time, could you please write a more specific tutorial than the existing one? I preferably want a tutorial on the former [wpull]. [[User:Archive Maniac|Archive Maniac]] 11:45, 22 November 2014 (EST)<br />
<br />
== Blank CD Question ==<br />
<br />
Hi Bzc6p, I am wondering how long CD-R's and DVD-R's last with a .iso image burned on to it. Is it just as long as the estimated shelf life? More importantly: what do you recommend for long-term backup solutions? [[User:Archive Maniac|Archive Maniac]] 14:51, 29 November 2014 (EST)<br />
<br />
== Blogter.hu's Unexpected Downfall ==<br />
<br />
Hi Bzc6p. You know how Blogter unexpectedly shut down in December in spite of its popularity? That goes to show that anything, and I mean anything, can happen to web sites that seem okay but actually are in limbo (i.e. extinction). That's why I suggested you archive gportal.hu. I already archived the Mario and DK sites. [[User:Archive Maniac|Archive Maniac]] 19:46, 7 December 2014 (EST)<br />
<br />
== What I'm Currently Doing ==<br />
<br />
Hi Bzc6p, it's been a little bit since I last talked to you. If you want to know what I'm currently doing, it's that I'm searching the depths of the Internet for links and saving them on to the Wayback Machine. I'm also uploading [https://archive.org/search.php?query=subject%3A%22dec3199%22 my own collections to the Internet Archive]. There's some stuff in there which you'll probably enjoy. :)<br />
<br />
And the icing on the cake is that I'm editing a few wikis, cleaning them up and trying to make them more informative.<br />
<br />
P.S. Do you forgive me and understand why I went into a very mad rage here those few times (which I shouldn't have)? I know the experience is over, but I feel embarrassed around you, given my extremely vulgar actions and how you're aware of it.<br />
<br />
Anyway, nice to message you again. Good luck saving Hungarian sites! :) [[User:Archive Maniac|Archive Maniac]] 21:15, 5 January 2015 (EST)<br />
<br />
:Thanks for replying. :) Shortly after I messaged you, somebody on a forum site taught me how to properly burn files to an M-Disc. And it was a success! A good, long offline backup for me! :D<br />
<br />
And it's a shame [[extra.hu]] is gone... It looked like an excellent web host...<br />
<br />
<br />
I also have issues with using wikiadownloader.py. It gives me this error:<br />
<br />
<pre><br />
Traceback (most recent call last):<br />
File "wikiadownloader.py", line 41, in <module><br />
f = open('wikia.com', 'r')<br />
IOError: [Errno 2] No such file or directory: 'wikia.com'<br />
</pre><br />
<br />
<br />
Do you know what that is? [[User:Archive Maniac|Archive Maniac]] 12:25, 6 January 2015 (EST)<br />
<br />
== View Archive.org Directories as Text Only ==<br />
<br />
Hi Bzc6p, I remember someone on the ArchiveTeam taught me how to view archive.org site directories (e.g. like these: http://web.archive.org/*/media.nintendo-europe.com/* ) as text-only in the browser. I forgot how to do it, so I've come to ask you how to do it. Do you know how? [[User:Archive Maniac|Archive Maniac]] 18:56, 22 January 2015 (EST)<br />
:I literally meant what I said. The link I gave you lists all of the URLs on the Internet Archive. I asked how to view it as text-only. (By the way, it was taught to me on the #archivebot channel, which isn't on BadCheese). [[User:Archive Maniac|Archive Maniac]] 18:32, 23 January 2015 (EST)</div>Archive Maniachttps://wiki.archiveteam.org/index.php?title=User_talk:Bzc6p&diff=21550User talk:Bzc6p2015-01-22T23:56:54Z<p>Archive Maniac: /* View Archive.org Directories as Text Only */ new section</p>
<hr />
<div>{{DISPLAYTITLE:User talk&#58;bzc6p}}<br />
<br />
== Re: Some friendly words ==<br />
<br />
Thanks for appreciating my efforts and explaining the ArchiveTeam to me. I thought "#archiveteam-bs" was for off-topic conversation, though. :/ And of course I didn't give up on archiving. Why would I? I'm getting 24 Blu-ray M-Discs next month, in fact. :) Would you willing to explain to the other users about the situation? I'm willing to forgive them if they accept it & apologize for my trolling. I'm just glad someone, by the very least, understood my situation and took the time to write to me.<br />
<br />
And I looked at your userpage. I'll see if I can track down some Hungarian sites. You can always use the Google operator "site:.hu" to filter just Hungarian sites. There is, however, [http://donkeykong.gportal.hu/ this site]. I have a backup of it, but not in .warc.gz format. Even worse, Yahoo is stupid enough to be shutting down their first service: dir.yahoo.com (Yahoo! Dir), on 12/31/2014. Stupid Yahoo... <br />
<br />
And by the way, SketchCow disliked the fact that I "asked too many questions". [[User:Archive Maniac|Archive Maniac]] 13:25, 19 October 2014 (EDT)<br />
<br />
:I'm waiting for Wpull to have a Windows release or a Python 2 release. I also stink at Python big time... [[User:Archive Maniac|Archive Maniac]] 17:47, 19 October 2014 (EDT)<br />
::Python3 unfortunately gets mixed up with Python 2 in the Command Prompt (e.g. python3 is not recognized as a command). That's why I've stuck to Python 2, because I use the wiki dump tool with that version. Aside from that, I always get errors when attempting installation, like vcvarsbatall.bat or something error, couldn't find seesaw kit, etc. Python is so user-unfriendly... [[User:Archive Maniac|Archive Maniac]] 17:42, 20 October 2014 (EDT)<br />
<br />
== Any Help on Chat? ==<br />
What's your IRC username? I want help coming back on the ArchiveBot & archiveteam-bs channel. And please tell me what discussions are appropriate for the latter; you do have a way with words. :P [[User:Archive Maniac|Archive Maniac]] 20:59, 21 October 2014 (EDT)<br />
<br />
== ArchiveBot Requests ==<br />
<br />
Hey, Bzc6p. Are you willing to take ArchiveBot requests from me? I also like your Hungarian site archiving. I recently archived smb.gportal.hu on my computer. [[User:Archive Maniac|Archive Maniac]] 18:55, 18 November 2014 (EST)<br />
:I have two more questions (the thing that made users upset at me):<br />
<br />
#I like archiving stuff. What archiving tools do you know of and recommend?<br />
#Is there a way that I can save whole sites to the Wayback Machine without using the ArchiveBot channel? I probably don't think so, but there still might be a chance.<br />
#Why doesn't the ArchiveTeam make C++ ports of their Python tools?<br />
#When I try to use Wget, I get this error in the command prompt: ''Connecting to SITENAME (SITENAME)|IP|:PORT... failed: Bad file descriptor.'' Do you know how to fix this problem?<br />
<br />
I hope you're not too annoyed by these questions, like the others would probably be. [[User:Archive Maniac|Archive Maniac]] 12:01, 20 November 2014 (EST)<br />
:Thanks for the info. And what's been a problem is that I've tried to set ArchiveBot or wpull up a few times, but never had proper 100% cannot fail step-by-step instructions on how to set both up. If you have the time, could you please write a more specific tutorial than the existing one? I preferably want a tutorial on the former [wpull]. [[User:Archive Maniac|Archive Maniac]] 11:45, 22 November 2014 (EST)<br />
<br />
== Blank CD Question ==<br />
<br />
Hi Bzc6p, I am wondering how long CD-R's and DVD-R's last with a .iso image burned on to it. Is it just as long as the estimated shelf life? More importantly: what do you recommend for long-term backup solutions? [[User:Archive Maniac|Archive Maniac]] 14:51, 29 November 2014 (EST)<br />
<br />
== Blogter.hu's Unexpected Downfall ==<br />
<br />
Hi Bzc6p. You know how Blogter unexpectedly shut down in December in spite of its popularity? That goes to show that anything, and I mean anything, can happen to web sites that seem okay but actually are in limbo (i.e. extinction). That's why I suggested you archive gportal.hu. I already archived the Mario and DK sites. [[User:Archive Maniac|Archive Maniac]] 19:46, 7 December 2014 (EST)<br />
<br />
== What I'm Currently Doing ==<br />
<br />
Hi Bzc6p, it's been a little bit since I last talked to you. If you want to know what I'm currently doing, it's that I'm searching the depths of the Internet for links and saving them on to the Wayback Machine. I'm also uploading [https://archive.org/search.php?query=subject%3A%22dec3199%22 my own collections to the Internet Archive]. There's some stuff in there which you'll probably enjoy. :)<br />
<br />
And the icing on the cake is that I'm editing a few wikis, cleaning them up and trying to make them more informative.<br />
<br />
P.S. Do you forgive me and understand why I went into a very mad rage here those few times (which I shouldn't have)? I know the experience is over, but I feel embarrassed around you, given my extremely vulgar actions and how you're aware of it.<br />
<br />
Anyway, nice to message you again. Good luck saving Hungarian sites! :) [[User:Archive Maniac|Archive Maniac]] 21:15, 5 January 2015 (EST)<br />
<br />
:Thanks for replying. :) Shortly after I messaged you, somebody on a forum site taught me how to properly burn files to an M-Disc. And it was a success! A good, long offline backup for me! :D<br />
<br />
And it's a shame [[extra.hu]] is gone... It looked like an excellent web host...<br />
<br />
<br />
I also have issues with using wikiadownloader.py. It gives me this error:<br />
<br />
<pre><br />
Traceback (most recent call last):<br />
File "wikiadownloader.py", line 41, in <module><br />
f = open('wikia.com', 'r')<br />
IOError: [Errno 2] No such file or directory: 'wikia.com'<br />
</pre><br />
<br />
<br />
Do you know what that is? [[User:Archive Maniac|Archive Maniac]] 12:25, 6 January 2015 (EST)<br />
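For what it's worth, the traceback means <code>wikiadownloader.py</code> could not find a file literally named <code>wikia.com</code> in the working directory: line 41 is <code>open('wikia.com', 'r')</code>, which presumably reads a list of wikis to dump. Assuming that input format (one wiki name per line - the exact format is a guess), a minimal sketch of creating the file and reading it the way the script does:<br />

```python
import os

LIST_FILE = "wikia.com"  # the filename wikiadownloader.py expects

# Create the list file if it is missing; without it, open() raises the
# "IOError: [Errno 2] No such file or directory" seen in the traceback.
if not os.path.exists(LIST_FILE):
    with open(LIST_FILE, "w") as f:
        f.write("starwars\n")  # hypothetical wiki name, one per line

with open(LIST_FILE, "r") as f:  # mirrors line 41 of the script
    wikis = [line.strip() for line in f if line.strip()]
print(wikis)
```

With the list file in place, the IOError should no longer occur.<br />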
<br />
== View Archive.org Directories as Text Only ==<br />
<br />
Hi Bzc6p, I remember someone on the ArchiveTeam taught me how to view archive.org site directories (e.g. like these: http://web.archive.org/*/media.nintendo-europe.com/* ) as text-only in the browser. I forgot how to do it, so I've come to ask you how to do it. Do you know how? [[User:Archive Maniac|Archive Maniac]] 18:56, 22 January 2015 (EST)</div>Archive Maniachttps://wiki.archiveteam.org/index.php?title=User_talk:Bzc6p&diff=21435User talk:Bzc6p2015-01-06T17:25:06Z<p>Archive Maniac: /* What I'm Currently Doing */ Replied</p>
<hr />
<div>{{DISPLAYTITLE:User talk&#58;bzc6p}}<br />
<br />
== Re: Some friendly words ==<br />
<br />
Thanks for appreciating my efforts and explaining the ArchiveTeam to me. I thought "#archiveteam-bs" was for off-topic conversation, though. :/ And of course I didn't give up on archiving. Why would I? I'm getting 24 Blu-ray M-Discs next month, in fact. :) Would you be willing to explain the situation to the other users? I'm willing to forgive them if they accept it & apologize for my trolling. I'm just glad someone, at the very least, understood my situation and took the time to write to me.<br />
<br />
And I looked at your userpage. I'll see if I can track down some Hungarian sites. You can always use the Google operator "site:.hu" to filter just Hungarian sites. There is, however, [http://donkeykong.gportal.hu/ this site]. I have a backup of it, but not in .warc.gz format. Even worse, Yahoo is stupid enough to be shutting down their first service: dir.yahoo.com (Yahoo! Dir), on 12/31/2014. Stupid Yahoo... <br />
<br />
And by the way, SketchCow disliked the fact that I "asked too many questions". [[User:Archive Maniac|Archive Maniac]] 13:25, 19 October 2014 (EDT)<br />
<br />
:I'm waiting for Wpull to have a Windows release or a Python 2 release. I also stink at Python big time... [[User:Archive Maniac|Archive Maniac]] 17:47, 19 October 2014 (EDT)<br />
::Python3 unfortunately gets mixed up with Python 2 in the Command Prompt (e.g. python3 is not recognized as a command). That's why I've stuck to Python 2, because I use the wiki dump tool with that version. Aside from that, I always get errors when attempting installation, like a vcvarsall.bat error, a "couldn't find seesaw kit" message, etc. Python is so user-unfriendly... [[User:Archive Maniac|Archive Maniac]] 17:42, 20 October 2014 (EDT)<br />
<br />
== Any Help on Chat? ==<br />
What's your IRC username? I want help getting back into the ArchiveBot & archiveteam-bs channels. And please tell me what discussions are appropriate for the latter; you do have a way with words. :P [[User:Archive Maniac|Archive Maniac]] 20:59, 21 October 2014 (EDT)<br />
<br />
== ArchiveBot Requests ==<br />
<br />
Hey, Bzc6p. Are you willing to take ArchiveBot requests from me? I also like your Hungarian site archiving. I recently archived smb.gportal.hu on my computer. [[User:Archive Maniac|Archive Maniac]] 18:55, 18 November 2014 (EST)<br />
:I have a few more questions (the kind of thing that made users upset at me):<br />
<br />
#I like archiving stuff. What archiving tools do you know of and recommend?<br />
#Is there a way that I can save whole sites to the Wayback Machine without using the ArchiveBot channel? I don't think so, but there still might be a chance.<br />
#Why doesn't the ArchiveTeam make C++ ports of their Python tools?<br />
#When I try to use Wget, I get this error in the command prompt: ''Connecting to SITENAME (SITENAME)|IP|:PORT... failed: Bad file descriptor.'' Do you know how to fix this problem?<br />
<br />
I hope you're not too annoyed by these questions, like the others would probably be. [[User:Archive Maniac|Archive Maniac]] 12:01, 20 November 2014 (EST)<br />
:Thanks for the info. The problem is that I've tried to set up ArchiveBot or wpull a few times, but never had proper, foolproof, step-by-step instructions on how to set either up. If you have the time, could you please write a more specific tutorial than the existing one? I'd preferably like a tutorial on the latter (wpull). [[User:Archive Maniac|Archive Maniac]] 11:45, 22 November 2014 (EST)<br />
<br />
== Blank CD Question ==<br />
<br />
Hi Bzc6p, I am wondering how long CD-Rs and DVD-Rs last with a .iso image burned onto them. Is it just as long as the estimated shelf life? More importantly: what do you recommend for long-term backup solutions? [[User:Archive Maniac|Archive Maniac]] 14:51, 29 November 2014 (EST)<br />
<br />
== Blogter.hu's Unexpected Downfall ==<br />
<br />
Hi Bzc6p. You know how Blogter unexpectedly shut down in December in spite of its popularity? That goes to show that anything, and I mean anything, can happen to web sites that seem okay but actually are in limbo (i.e. extinction). That's why I suggested you archive gportal.hu. I already archived the Mario and DK sites. [[User:Archive Maniac|Archive Maniac]] 19:46, 7 December 2014 (EST)<br />
<br />
== What I'm Currently Doing ==<br />
<br />
Hi Bzc6p, it's been a little bit since I last talked to you. If you want to know what I'm currently doing, it's that I'm searching the depths of the Internet for links and saving them on to the Wayback Machine. I'm also uploading [https://archive.org/search.php?query=subject%3A%22dec3199%22 my own collections to the Internet Archive]. There's some stuff in there which you'll probably enjoy. :)<br />
<br />
And the icing on the cake is that I'm editing a few wikis, cleaning them up and trying to make them more informative.<br />
<br />
P.S. Do you forgive me and understand why I went into a very mad rage here those few times (which I shouldn't have)? I know the experience is over, but I feel embarrassed around you, given my extremely vulgar actions and how you're aware of it.<br />
<br />
Anyway, nice to message you again. Good luck saving Hungarian sites! :) [[User:Archive Maniac|Archive Maniac]] 21:15, 5 January 2015 (EST)<br />
<br />
:Thanks for replying. :) Shortly after I messaged you, somebody on a forum site taught me how to properly burn files to an M-Disc. And it was a success! A good, long offline backup for me! :D<br />
<br />
And it's a shame [[extra.hu]] is gone... It looked like an excellent web host...<br />
<br />
<br />
I also have issues with using wikiadownloader.py. It gives me this error:<br />
<br />
<pre><br />
Traceback (most recent call last):<br />
File "wikiadownloader.py", line 41, in <module><br />
f = open('wikia.com', 'r')<br />
IOError: [Errno 2] No such file or directory: 'wikia.com'<br />
</pre><br />
<br />
<br />
Do you know what that is? [[User:Archive Maniac|Archive Maniac]] 12:25, 6 January 2015 (EST)</div>Archive Maniachttps://wiki.archiveteam.org/index.php?title=Deathwatch&diff=21076Deathwatch2014-12-14T19:05:00Z<p>Archive Maniac: Is this a better pun-based phrase?</p>
<hr />
<div>The '''Deathwatch''' is a central indicator of websites and networks that are shutting down and serves as an indicator of what happened to particular sites that shut down quickly.<br />
<br />
New sites should be added in chronological order, newest death date first. Forward-looking death dates should be added to the first list only. Sites large enough to warrant additional information will receive a dedicated page, linked from here and on [[:Category:Closing projects]].<br />
<br />
== Watchlist ==<br />
<br />
=== Getting things done ===<br />
<br />
[[Current Projects]] contains the up-to-date projects that are in progress. This small table keeps track of smaller projects by individual members.<br />
<br />
{| class="wikitable"<br />
! Website<br />
! Closing date<br />
! Project status<br />
! User<br />
! Archiving Status<br />
! Details<br />
! Archives<br />
! Archive Date<br />
! Archive Format<br />
|-<br />
| [[ArchiveBot]]<br />
|<br />
|<br />
|<br />
| {{green|Saved}}<br />
| Downloaded website, dev and blog subdomains<br />
|<br />
|<br />
| .warc.gz<br />
|-<br />
| Various<br />
|<br />
|<br />
|<br />
| {{green|Saved}}<br />
| Downloaded website, forums, skins/plugins<br />
| [https://archive.org/search.php?query=winamp+warc]<br />
| 2013-11<br />
| .warc.gz<br />
|- <br />
| [[Quick.io]] [http://www.quik.io/]<br />
| 2013-12-31<br />
| Closing<br />
| [[User:Arkiver]]<br />
| {{green|Saved}}<br />
| Downloaded the main website and the subdomains of the main website<br />
| COMING<br />
| 2013-12-13<br />
| .warc.gz<br />
|-<br />
| [[widgetbox]] [http://www.widgetbox.com/] [http://support.widgetbox.com/] [http://blog.widgetbox.com/] [http://cdn.widgetbox.com/] [http://help.widgetbox.com/] [http://pub.widgetbox.com/] [http://files.widgetbox.com/]<br />
| 2014-03-28<br />
| Closing<br />
| [[User:Arkiver]]<br />
| {{orange|In progress...}}<br />
| Downloading all the websites<br />
|<br />
| 2013-12-19 - present<br />
| .warc.gz<br />
|-<br />
| [[TechNet]] [http://technet.microsoft.com/]<br />
| 2014-09-30<br />
| Closing<br />
| [[User:Arkiver]]<br />
| {{orange|In progress...}}<br />
| Downloading full website<br />
|<br />
|<br />
| .warc.gz<br />
|}<br />
<br />
=== Pining for the Fjords (Dying) ===<br />
<br />
* [[Viddy]] and [[Epic]], video sharing services acquired by Fullscreen will both be shut down on December 15, 2014.<br />
<br />
* [[Relay.im]] was bought by Kik and closes on December 15, 2014.<br />
<br />
* [[Google News]] shuts down in Spain on December 16, 2014.<ref>http://googlepolicyeurope.blogspot.ca/2014/12/an-update-on-google-news-in-spain.html?m=1</ref><br />
<br />
* thatguywiththeglasses.com gets replaced by channelawesome.com on December 17, 2014.<br />
<br />
* [[Roon]] (roon.io) will meet its doom on December 31, 2014.<br />
<br />
* [[Nokia Trailers]] shuts down on December 31, 2014.<br />
<br />
* Yahoo! is destroying more sites. [[Yahoo! Directory]] will shut down on December 31, 2014.<br />
<br />
* nRelate is closing on December 31, 2014.<br />
<br />
* Samsung Video Hub and WatchON shut down on December 31, 2014.<br />
<br />
* [[Nokia Memories]] is no longer a memory on January 12, 2015<ref>http://www.nokia.com/ca-en/support/faq/?action=singleTopic&topic=FA144076</ref>.<br />
<br />
* HP will be shutting down webOS cloud services on January 15, 2015.<br />
<br />
* [[Brace.io]] has been acquired by Squarespace and shuts down on January 19, 2015.<br />
<br />
* Mail.in.com will send its last mail on March 5, 2015.<br />
<br />
* Hiveminder will shut down on April 30, 2015.<br />
<br />
* [[Ovi Store]]'s infrastructure is slowly rotting away.<br />
<br />
* YoyoGames, developer of the GameMaker application, is planning to retire [http://gamemakerblog.com/2014/10/04/its-official-digital-store-will-replace-gamemaker-sandbox/ the old "GameMaker Sandbox" game hosting website] in favor of the "GameMaker: Player" service, by late October. [http://help.yoyogames.com/entries/101815476-GameMaker-Player-FAQs "Sandbox content will remain available for a period of time until the GameMaker: Player is fully live."]<br />
<br />
* [[nokia.com]] is [http://thenextweb.com/microsoft/2014/09/19/goodbye-nokia-com-microsoft-makes-new-home-devices-services-microsoft-com/ being destroyed by Microsoft]. It's currently being saved by the [[ArchiveBot]].<br />
<br />
* [[TwitPic]] shuts down [http://blog.twitpic.com/2014/09/twitpic-is-shutting-down/ September 25, 2014].<br />
<br />
* 20 newspapers in Quebec will shut down in the coming weeks. Here's a list [http://pastebin.com/Xwt19JFQ] of those still up that need to be archived ASAP.<br />
<br />
* Rue Frontenac was a website created during a newspaper lockout in Canada back in 2009. It was saved [http://exruefrontenac.com/ here], but I'm not sure if anybody is maintaining it. Copy?<br />
<br />
* [http://www.thegridto.com The Grid] (magazine in Toronto) printed its last issue on July 3, 2014 ([https://twitter.com/TheGridTO/status/484352888635129856 see here]); not sure how long the site will stay up.<br />
<br />
* [[Blip.tv]] will be removing accounts/videos on September 1st, 2014.<br />
<br />
* [[Full Disclosure]] (http://seclists.org/fulldisclosure/) is a security/hacker mailing list that was [http://seclists.org/fulldisclosure/2014/Mar/332 suddenly suspended] as of March 19, 2014.<br />
<br />
* [[Nakido]] ([http://www.nakido.com/ site]) claims to be a "time capsule" that will "host your files for decades" - except it's a commercial enterprise selling premium accounts, and uses a proprietary P2P platform for delivery. What could possibly go wrong?<br />
<br />
* [[Gatsby]], not sure whether to file this here or under "Dead as a Doornail". [http://gatsby.im/ Frontpage] says that it's dead, but it's unclear whether hosted content is still available. Awaiting [https://vpsboard.com/topic/3475-gatsbyim-discontinued/#entry52574 response] as to what happened to the data.<br />
<br />
* LEGO has a bad habit of deleting Flash games and other materials from their sites. Some of them still lie in pieces on cache.lego.com, awaiting their deletion. Fortunately, some games are still available to play on [http://biomediaproject.com/bmp/games/ BioMediaProject] or [http://4t2portfolio.co.uk 4T2 Portfolio].<br />
<br />
* Nintendo shut down [http://www.fullscreenmario.com Full Screen Mario]. Its [https://github.com/Diogenesthecynic/FullScreenMario GitHub repository] should be archived in case it goes down.<br />
<br />
* [[Yahoo! China]] appears to be in the process of [http://wayback.archive.org/web/201429000000/http://www.bbc.co.uk/news/technology-23929002/ completely shutting itself down].<br />
<br />
* [[Yahoo!]] [http://www.yqlblog.net/blog/2013/11/11/y-ahoo-it-url-shortener-end-of-life-announcement/ retired] the y.ahoo.it [[URLTeam|URL shortener]] on November 20, 2013, but the shortener is still active.<br />
<br />
* WordChamp was supposed to have shut down on June 30, 2013, later changed to September 15, 2013, but is still up and running.<br />
<br />
* [[TuneWiki]] (not a wiki)<br />
<br />
* [[Readmill]], a social e-reader thing, is closing its doors on July 1, 2013. They have quite a few user pages documenting who read what, who says what and what people think of books.<br />
<br />
* These sites are getting an update in the next few months:<br />
** [http://www.lincsfm.co.uk Lincs FM], [http://www.traxfm.co.uk Trax FM], Rutland Radio [www rutlandradio.co.uk - spam filter on here blocked this url], [http://www.dearnefm.co.uk Dearne FM], [http://www.rotherfm.co.uk Rother FM], [http://www.compassfm.co.uk Compass FM], [http://www.kcfm.co.uk KCFM 99.8], [http://www.ridingsfm.co.uk Ridings FM]. All are getting an update, so you might want to back these up; not sure what the best method is, but making a mirror of the Lincs FM Group websites is good for historical reasons.<br />
<br />
* [http://pic2.piczo.com/go/home Piczo], a social network for teens, has announced that it's shutting down.<br />
<br />
* '''[http://1up.com 1up.com]''', [http://www.ugo.com/ ugo.com], and [http://www.gamespy.com/ gamespy.com] a collection of video game, news, and fan sites with lots of user-generated content, was purchased by Ziff Davis in February. CEO Vivek Shah announced on February 21st, 2013 that it will be [http://www.polygon.com/2013/2/21/4014196/ign-layoffs-1up-ugo-and-gamespy-shutting-down "winding down 1UP.com, UGO and Gamespy"].<br />
<br />
* The '''Centralstation Community''' [http://community.thisiscentralstation.com/_Central-Station-v2-Q38As/blog/5449967/126249.html has closed]. The site is a UK-based social network for artists and creatives that provides hosting for content and portfolio. Users are being advised to back up their work as the new version of their platform will rely on existing media hosting sites like Flickr, Vimeo, and Soundcloud.<br />
<br />
* '''[http://www.groklaw.net/article.php?story=20130818120421175 Groklaw]''' will no longer be posting new articles, "due to government monitoring of the internet, particularly e-mail." Whether or not its archives will remain online is unclear, although it does seem rather unlikely it will 100% disappear. OTOH, better safe than sorry.<br />
<br />
* '''[[Webmonkey]]''' won't be posting new content anymore. It probably won't disappear overnight, but [http://longhandpixels.net/blog/2013/sep/20/whatever-happened-to-webmonkey/ "it wouldn’t hurt to create a backup."]<br />
<br />
=== Pre-emptive Alarmbells (Likely To Die) ===<br />
<br />
* Archive Team officially proclaims '''[[Yahoo!]]''' the least trustable host and its arch-enemy. Prove us different, Yahooligans. Or... don't. Expect anything in [http://en.wikipedia.org/wiki/List_of_mergers_and_acquisitions_by_Yahoo! this list] and [http://en.wikipedia.org/wiki/List_of_Yahoo!-owned_sites_and_services this list] to shut down (if it hasn't already).<br />
** Please follow the feeds! [https://twitter.com/YahooVictims] [http://www.google.com/alerts/feeds/03733117766037168292/11115209096644139952]<br />
<br />
* [https://encrypted.google.com/search?&q=inurl%3Arobots.txt+filetype%3Atxt+%2B%22ia_archiver%22 Sites that block the Wayback Machine] are at risk of being completely lost if they ever shut down.<br />
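For reference, a site opts out of the Wayback Machine by disallowing the <code>ia_archiver</code> user agent in its robots.txt. A minimal sketch using Python's standard <code>urllib.robotparser</code> to test a made-up rule set:<br />

```python
from urllib.robotparser import RobotFileParser

# Example robots.txt rules that opt a site out of the Wayback Machine.
robots_lines = [
    "User-agent: ia_archiver",
    "Disallow: /",
]

rp = RobotFileParser()
rp.parse(robots_lines)

# ia_archiver (the Wayback Machine's crawler) is blocked site-wide,
# while crawlers not named in the rules remain allowed by default.
print(rp.can_fetch("ia_archiver", "http://example.com/"))   # False
print(rp.can_fetch("SomeOtherBot", "http://example.com/"))  # True
```

Sites carrying rules like these are the ones the search query above turns up.<br />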
<br />
* [[Google]] has [http://www.seopedia.org/internet-marketing-and-seo/googles-secret-andor-forgotten-places/ quite] [http://www.seopedia.org/seo-news/google-2/googles-56-forgotten-secret-pages-part-two/ a few] old pages on their servers which haven't been updated in a long time. Might be a good idea to save these before they disappear.<br />
<br />
* Like Google, Nintendo of Japan has its share of ancient pages, like [http://www.nintendo.co.jp/n02/dmg/mla/index.html this one].<br />
<br />
* '''[[cyberpunkreview.com]]''': 80s science fiction fansite and community {{url|1=http://cyberpunkreview.com/}} hasn't seen much staff activity in a long time, although the forums are going strong. UPDATE: Looking active again. [[User:Aggroskater|Aggroskater]] 08:26, 19 March 2012 (EDT)<br />
<br />
* '''[[WikiLeaks]]''' ({{url|1=http://wikileaks.org/}}) has an uncertain financial situation, and the site was inaccessible for some time in 2010.<br />
<br />
* '''[[FriendFeed]]''' ({{url|1=http://friendfeed.com/}}) has been purchased by [[Facebook]], leaving FriendFeed users uncertain as to its future and mostly unsupported. The Twitter bridge, for instance, has not worked for years now.<br />
<br />
* '''[[The Pirate Bay]]''' ({{url|1=http://www.thepiratebay.org/}}) still having persistent legal problems. The tracker went down in November, but the site still serves torrents and magnet links. If a torrent is lost, it becomes impossible to connect to other computers distributing the shared files. Considering that there are links to TPB on '''THIS VERY PAGE''', this is pretty dang important. Thankfully, the magnet links and entire siterips have now been made, though keeping them updated is sure to be a pain.<br />
<br />
* '''[http://www.ning.com/ Ning]''' in 2010 laid off 40% of its staff and seems to be running out of money [http://techcrunch.com/2010/04/15/nings-bubble-bursts-no-more-free-networks-cuts-40-of-staff/]. There are certainly some networks worth archiving among the 2 million networks[http://blog.ning.com/2010/01/2-million-ning-networks.html] they host. Grouply[http://blog.grouply.com/grouply-welcomes-ning-networks/] and Posterous[http://blog.posterous.com/posterous-commits-to-building-a-ning-blog-imp] say they are going to offer migration tools.<br />
<br />
* '''[http://debates.oireachtas.ie/ debates.oireachtas.ie]''': on September 18, 2012, the Houses of the Oireachtas website [http://www.kildarestreet.com/statement2012/ announced] that it would no longer be updating its XML data for Irish parliamentary debates (1919-2012). Access to pre-existing data is still available, but is likely to disappear if the current trend continues. It would be useful to at least capture the XML data that is there, while it is still available.<br />
<br />
* As of 2014, ScraperWiki Classic is read-only; existing scrapers can be transferred to Morph.io for continued editing.<br />
<br />
* [http://convozine.com Convozine] hasn't been active lately. Their last reply to a support question was in 2012, their last update in the "News" section was December 2011, and their last blog post was in January 2013. (See [http://convozine.com/zine_forum/discussions/512] and [http://convozine.com/zine_forum/discussions/494].)<br />
<br />
* [http://ownlog.com ownlog.com] - once one of the oldest and most popular blog platforms in Poland - seems to be dying slowly: no development or updates except the most critical maintenance. An archiving project is in progress.<br />
<br />
=== Other endangered species and misc ideas ===<br />
<br />
We have even more small tidbits of information at [[Deathwatch/Misc]].<br />
<br />
=== Just When You Least Expect It ===<br />
<br />
Archive Team keeps a list of [[Fire_Drill|healthy sites]] that could be fine today and not so hot tomorrow. We focus on ways to back your personal data off these sites so you don't put yourself at unnecessary risk.<br />
<br />
== Dead as a Doornail ==<br />
<br />
=== Because we know better ===<br />
* [[Fileplanet]] [http://www.fileplanet.com/]. Already fully archived.<br />
<br />
===2014===<br />
* December 12: BigPond Music shut down.<br />
* December 12: [http://allgame.com Allgame.com] ran out of quarters. It was saved with ArchiveBot [https://archive.org/details/archiveteam_archivebot_go_20141119080001 here] (and available in the Wayback).<br />
* December 10: [[ZipList]] got zipped up.<br />
* December 2: Verizon shut down SugarString.<br />
* November: [[Jux]] went down before their official shutdown date, November 30, 2014.<br />
* November 30: Tree Puncher (Minecraft server host) got chopped down.<br />
* November 1: [[Easel]] shut down.<br />
* November 1: [[Qwiki]] slowed down.<br />
* October 30: [[Listn]] was shut down by Beatport.<br />
* October 24: Haivl.com, Vietnam's 9gag equivalent, shut down by the government authorities.<br />
* October 11: [[OhLife]] became NoLife.<br />
* October 1: '''[[Quizilla]]''' fizzled away. <br />
* September 30: '''[[Verizon Personal Web Space]]''' shut down.<br />
* September 30: '''[[Orkut]]''' got kut, [[Google]] thankfully left a public archive.<br />
* September 30: [[Yahoo! Education]] dropped out.<br />
* September 30: Petition Online shut down.<br />
* September 30: The [[National Atlas]] [http://www.nationalatlas.gov/status.html died].<br />
* August 31: [[Svpply]] and Want by Svpply were shut down.<br />
* August 31: The [[Yahoo! Contributor Network]] is destroyed by [[Yahoo!]].<br />
* August 15: '''[[Heello]]''' said goooodbye.<br />
* August 10: '''[[Fotopedia]]''' leaves a photo finish.<br />
* August 5: '''[[Justin.tv]]''' shuts down completely.<br />
* August 1: '''[[Yahoo! Voices]]''', formerly Associated Content, is shut up by [[Yahoo!]].<br />
* July/August: Potential massive Quebec newspaper shutdown around August 2; 74 newspapers were bought by [http://blog.fagstein.com/2014/05/28/competition-bureau-quebecor-tc-newspapers/ Transcontinental].<br />
* July 31: [[Shortmail]] shut down.<br />
* July 31: [[Snapdisk]] got snapped.<br />
* July 31: [[Yahoo! Shine]] went dark.<br />
* July 31: Pinterest [http://www.iol.co.za/scitech/technology/business/pinterest-buys-startup-icebergs-1.1728612 acquires] Icebergs.com<br />
* June 30: Hungarian [[iWiW]] social network closes; data not available from this date at all.<br />
* June 30: [[Me2Day]], a [[Twitter]]-like social network, kicks the bucket.<br />
* June 15: '''[[Rawporter]]''' enters "into an exclusive business partnership" and deletes user photos and videos (which we rescued).<br />
* June 1: '''[[Ubuntu One]]''' shuts down, gives its users until July 31 to grab their data.<br />
* May 20: Nintendo shut down [[Nintendo Wi-Fi Connection]] (except for the Wii and DSi Shop Channels).<br />
* May 6: [[Userscripts.org]] mysteriously vanished. [http://userscripts-mirror.org A mirror] popped up not long after.<br />
* April: [http://jderef.com/ JDEREF.com] is served a takedown notice by Oracle.<br />
* April 30: [[qik]].com shut down.<br />
* April 18: [[Twitter Music]] shuts down.<br />
* April 15: Beats shuts down [[MOG]].<br />
* April 7: [[Vizify]].com was acquired by [[Yahoo!]]. Bios were deleted on April 7, 2014. (Users could opt-in to extend date to September 4, 2014.)<br />
* March 31: [[IntoNow]], a [[Yahoo!]] acquisition, ceased to function.<br />
* March 31: '''[[Mochi Media]]''' realizes Flash is dead and the game is over.<br />
* March 17: [[doo]] shut down.<br />
* March 11: [[Intel AppUp]] shut down.<br />
* March 3: '''[[My Opera]]''' closes its member profiles.<br />
* February: [[Videogum]] is [http://www.videogum.com/800151/hey-guys-we-have-to-talk-to-you-about-something/letter-from-the-editor/ shutting down].<br />
* February 28: [[Outbox]] shuts down to [http://blog.outboxmail.com/post/74086768959/outbox-is-shutting-down-a-note-of-gratitude rebuild itself].<br />
* February 21: [[Yahoo!]] crashed [[Cloud Party]].<br />
* February 7: Schemer.com shut down by Google. (Time of death: 2014-02-08 00:13:52,184 EST.)<br />
* January 21: DrawQuest and '''[[Canv.as|Canvas]]''' shuts down. moot writes his [http://chrishateswriting.com/post/74083032842/today-my-startup-failed shut down notice.]<br />
* ??? '''[[dl.tv]]''' [http://dl.tv] There has been no new tech podcast here for over a year. It would be a good idea to start backing up all the podcasts on this site. Same for Crankygeeks. [http://www.crankygeeks.com/]<br />
<br />
===2013===<br />
<br />
* December 26: [[Wretch]] and [[Yahoo! Blog]] are closed by Yahoo!.<br />
* December 21: [https://web.archive.org/web/20131222233044/http://clanbase.org/ ClanBase] is no more. The company that bought the website in 2004, Global Gaming League, decided to "move on" after basically running the website into the ground.<br />
* December 20: '''[[WinAmp]]''', home of the Winamp media player, shuts down.<br />
* December 18: [[Warhammer|Warhammer Online: Age of Reckoning]] closes.<br />
* December 15: [[Everpix]], a photo-sharing service, shuts down. [http://www.everpix.com/] [http://www.theverge.com/2013/11/5/5039216/everpix-life-and-death-inside-the-worlds-best-photo-startup] Some material was released on GitHub [https://github.com/everpix/Everpix-Intelligence]; the rest was lost.<br />
* December 12: [[Hyves]] closes it social network, but it's now got games!<br />
* November 18 (and back through 2012): Disney nuked a bunch of online games: Pixie Hollow Online, Cars Online, Pirates of the Caribbean Online, Toontown Online. The futures of the large fan forums and an extremely completist wiki are doubtful. [http://fairies.disney.com/pixie-hollow-faq] [http://www.vmkforums.com/] [http://www.carsonlineforums.com/] [http://www.piratesonlineforums.com/] [http://www.pixiehollowforums.com/] [http://www.disneysonlineworlds.com/index.php/Main_Page] The Pixie Hollow fan forum specifically announced that it will not be archived, and the Cars Online forum seems to have a similar warning, while the forums for the other shut-down games will apparently be migrated into this: [http://www.mmocentralforums.com/]<br />
* November 11: [[Bre.ad]] is dead.<br />
* November 7: [[Dopplr]] drops out from the web.<br />
* November 1: [[Zapd]] deletes its user data from the website.<br />
* November 1: [[iGoogle]] shuts down.<br />
* November: [[Bitmit]], a Bitcoin marketplace, shut down.<br />
* November: Going to call this one before it even starts, friends: '''[https://www.legacylocker.com/ Legacy Locker]''' promises lifetime control of your data and return of your data to loved ones for just $300 for "lifetime", or $30/year. [http://www.washingtonpost.com/wp-dyn/content/article/2009/03/10/AR2009031001211.html] Archive Team says to just say [https://web.archive.org/web/20131121055401/http://legacylocker.com/ No].<br />
* October 21: [[isoHunt]] was always going to shut down after an MPAA settlement. However, it did so earlier than expected to prevent archival efforts, claiming that 95% of torrents were available elsewhere. No mention of the metadata though.<br />
* September 30: [[OMGPOP]] shut down, and now redirects to Zynga's main site. There was a [https://www.facebook.com/SaveOMGPOP petition] to stop it from closing, which did not gain much traction.<br />
* September 30: [[MSN TV]], aka WebTV, no longer accessible.<br />
* September 1: [https://torrentfreak.com/major-tv-torrent-site-thebox-bz-calls-it-quits-130829/ Thebox.bz], a TV torrent tracker/site, calls it quits.<br />
* September: [[Freeblog.hu]] closes without notifying its users. An unknown number of blogs was lost.<br />
* August-September: [http://fileden.com/ FileDen.com], a file hosting website, suddenly shuts down, giving their users little to no warning.<br />
* August 31: [[Rockmelt]] shuts down after being acquired by [[Yahoo!]].<br />
* August 21: [[Amplicate]] vanishes, leaves behind 502 Bad Gateway errors.<br />
* August 20: [[Catch]] closes its doors.<br />
* August 19: [http://wowcoolfactsaboutgaming.com Wow! Cool Facts About Gaming] shuts down, thankfully leaves everything up.<br />
* August 9: [[Google Latitude]] shut down.<br />
* August 5: [[Astrid]] is shut down after being acquired by [[Yahoo!]].<br />
* July 31: All third party downloads disappear from [[Yahoo! Downloads]].<br />
* July 25: [[Yahoo! Stars India]] is shut down.<br />
* July 24: [[Snapjoy]], acquired by Dropbox in December 2012, is shut down.<br />
* July 19: [[Google]] shuts down Alfred.<br />
* July 9: [[Yahoo! Neighbors]] shuts down a day after it was supposed to.<br />
* July 8: AltaVista, one of the oldest search engines, shuts down.<br />
* July 1: FoxyTunes and Yahoo! RSS Alerts disappear from the web.<br />
* June 30: [[Yahoo!]] demolishes Yahoo! WebPlayer.<br />
* June 28: [[Yahoo!]] shuts down Axis, Browser Plus and Citizen Sports.<br />
* June 28: Nintendo shuts down all of its WiiConnect24 services, except for the Mii Channel, Wii Shop Channel, Mario Kart Wii Channel, and the Wii Speak Channel.<br />
* June 4: '''Adrenaline Vault''', a video game review site, has this posted on their Facebook profile: "Over the past weekend hackers hit the site with a DoS attack. Everything had been wiped and with no backups, everything was lost. It has been decided that Avault will remain closed. Rest in Peace, Avault."<br />
* June: '''[http://ompldr.org/ Omploader]''', an anonymous file upload site, has announced that they are about $2500 in the hole on hosting costs, and that there is possibility of their shutting down if donations do not improve. It stands to reason that there are some files among their database that are worth saving. An attempt to contact the administrator for more information and to be given a dump of the site was made, and he responded saying he'd be happy to rsync a copy of the data after some legal issues have been settled.<br />
* April 30: '''[[Posterous]]''', a blogging and life streaming platform, shut down its "Posterous Spaces" to focus on Twitter.<br />
* April 30: '''Circalit''' decides in March that [http://circalit.createsend4.com/t/ViewEmail/r/26E73577D4220DBD/4433C195741969884AB3169DA1FD82E9 deleting is easier than migrating] its prose-writing users.<br />
* April 20: '''Microsoft Collection Book''', a site dedicated to collecting information about Windows betas, shuts down due to a C&D from Microsoft. It reopened on May 5 as '''The Collection Book'''.<br />
* March 31: '''Zug.com''', a comedy website running since 1995 closed down, and replaced all its pages with a goodbye image.<br />
* March 29: [http://wrathofheroes.warhammeronline.com/ Warhammer Online: Wrath of Heroes] (WOH), a Play 4 Free game, shut down.<br />
* March 25: [[Epinions]] locked out its users.<br />
* March 24: '''[http://hub.opensolaris.org/bin/view/Main/ The OpenSolaris Hub]''' and '''all sites under opensolaris.org''', including the site hosting the OpenSolaris source code, were decommissioned by Oracle. OpenSolaris was an open source computer operating system based on Solaris and originally created by Sun Microsystems. After the acquisition of Sun Microsystems in 2010, Oracle decided to discontinue open development of the core software, and replaced the OpenSolaris distribution model with the proprietary Solaris Express.<br />
* February 28: '''[http://www.stickam.com Stickam]''', a major video chat service, shut down. Users were emailed and given the ability to download any recorded videos for 3 weeks in advance of the closing date.<br />
* February: [[Regretsy]] shuts down.<br />
* January 31: [[Do.com]] shuts down.<br />
* January: <nowiki>http://</nowiki>go.to, an [[URLTeam|URL shortener]], has all of its domains on sale on Sedo. No official word just yet, though.<br />
<br />
===2012===<br />
* October 29: '''[http://gamecorner.pl Gamecorner.pl]''', a Polish video game news portal, was closed in May, then wiped entirely on October 29. The articles have been retained at the publisher's other video game portal, Polygamia.pl, but the article comments and the forums are gone. (It also had user blogs, but they seem to have been erased much earlier.)<br />
* August 17: '''Ponibooru''', a famous My Little Pony-related imageboard, [http://www.equestriadaily.com/2012/06/ponibooru-shuts-down.html shut down] by August 17. All of the images themselves (but not the comments) were available to download via torrents, though it is unknown if the torrents are still available. Currently the most popular/upvoted images are available via another imageboard, Derpibooru, but their copy is incomplete.<br />
* August: [[Parodius Networking]], which hosted numerous websites related to classic video game platforms, died.<br />
* July 30: '''[http://kasabi.com Kasabi]''', a data publishing platform created by [http://talis.com Talis] was [http://blog.kasabi.com/2012/07/09/shutting-down-kasabi/ announced] to be closing on July 30, 2012. While the service has only been around for ~2 years it represents a unique look at services for Linked Data, and contains a variety of datasets. Kasabi has a [http://blog.kasabi.com/2012/07/16/archive-of-datasets/ blog post] that announces the availability of datasets contained in Kasabi to ease archiving.<br />
* July 1: The Polish social network '''[https://en.wikipedia.org/wiki/Grono.net Grono.net]''' has disappeared, replaced by a file hosting service '''grono.net.pl''' on July 1, 2012. Most content from the old site was supposed to be migrated, but, according to a message on the main page, technical difficulties have delayed the migration by one or two weeks. It's getting increasingly late...<br />
* June 30: '''Apple''' '''[[MobileMe]]''', '''[[MobileMe#iDisk | iDisk]]''', '''[[MobileMe#web.me.com / iWeb | iWeb]]''', and included services. This major website and these services will shut down in [http://support.apple.com/kb/HT4597 2012], simply because web hosting is boring and they want to focus on the exciting "iCloud". [http://www.apple.com/mobileme/transition.html][http://support.apple.com/kb/HT4597]<br />
* April 30: Google Wave shut down on April 30th.<br />
* April: '''[http://blog.convore.com/post/17951919109/convore-shutting-down-april-1st convore.com]''' shut down in April 2012. The site hosted IRC conversations, and involved a lot of JavaScript.<br />
* February: Hungarian free hosting provider [http://eplanet.hu Eplanet] stops free service as of February 2012; unknown number of pages disappeared and probably deleted.<br />
* January 19: The popular file hosting service '''Megaupload''' has been shut down in January 2012; with it, '''Megavideo''' too is gone. It was mainly used for copyright infringement, but lots of perfectly regular files were hosted on it.<br />
<br />
===2011===<br />
<br />
* End of 2011: '''[http://ghost.cc/ Ghost Cloud Computing]''' became a ghost of itself[http://ghost.cc/home/SignUp.jsp].<br />
<br />
* Late October/November: '''[http://itdied.com/ It Died]''' by Glenn Fleishman, a site dedicated to indicating sites that have died, itself died. (Keep the [http://itdied.com/atom.xml RSS feed] around in case that changes, though.)<br />
<br />
* September/October: The closure of '''Google Buzz''' was announced in October. Luckily Google released a tool to download your content from it called [[Google Takeout]]: https://www.google.com/takeout/ Besides Buzz, Google shut down many of its other minor services, such as Aardvark, Sidewiki, and others: http://googleblog.blogspot.com/2011/09/fall-spring-clean.html<br />
<br />
* August/September: '''[[Google Labs]]''' (http://labs.google.com) closed, and many great, experimental, one-of-a-kind tools vanished with it. http://www.pcmag.com/article2/0,2817,2388881,00.asp#fbid=7kZ39-1XQUH Among many others: "Google Sets", which had been around for a long time; "City Tours", some of which included user-generated content; and the exciting "Google Squared" http://4.bp.blogspot.com/_ZaGO7GjCqAI/SibWbewOy5I/AAAAAAAAQBM/8lb7UA6AWPY/s640/google-squared-species.png an approach to putting more artificial intelligence in the user's hands than conventional search engines (comparable to WolframAlpha), while seemingly based on the same vast pool of data as standard Google search results. Since there is hardly any obvious rationale for closing down Google Labs, it pictures Google as being either less supportive of (or even hostile to) new innovations, less responsible with user-generated content, or more secretive about its ongoing projects than one might have thought before. [[User:Whatsgoingonwithgoogle|Whatsgoingonwithgoogle]] 18:06, 17 October 2011 (UTC)<br />
<br />
* June 3: '''[[Forums.starwars.com]]''': {{url|1=http://www.starwars.com|2=StarWars.Com}} {{url|1=http://forums.starwars.com/ann.jspa?annID=3|2=announced}} the closure of their {{url|1=http://forums.starwars.com|2=forums}} on June 3, 2011. (Forum will lock on 29 April 2011) {{url|1=http://theforce.net/latestnews/story/StarWarscom_Forums_Shutting_Down_In_June_137497.asp|2=tf.n report}}<br />
<br />
* June 1: '''[[Prodigy Pages]]''' shut down on June 1, 2011.<br />
<br />
* May 31: '''[http://web.archive.org/web/20110820081653/http://www.myphotoalbum.com/ MyPhotoAlbum.com]''' closed on May 3, 2011 and deleted everything on May 31, 2011. Users who heard about the closure were given the options of either transferring their photos and videos over to dotPhoto.com or purchasing a DVD with their content for $15.00.<br />
<br />
* May 24: [[Yahoo]]! has {{url|1=http://techcrunch.com/2011/02/24/yahoo-to-shut-down-mybloglog-on-may-24/|2=announced}} that '''[[MyBlogLog]]''' will be closed on 24 May 2011. '''UPDATE:''' Yup.<br />
<br />
* April 16: '''[[Encyclopedia Dramatica]]''' shut down on April 16, 2011 without warning. Reconstruction efforts are ongoing, but a lot of images and articles are probably lost. (The replacement, OhInternet, is a heavily sanitized version of ED.) <s>ED is claiming that they are in danger of shutting down. Despite the controversial nature of many articles hosted on the wiki, this would be a big loss of historical records.</s><br /><font color="red">A lot of the images and pages are still missing. Help appreciated.</font><br />
<br />
* March: '''[http://team.gaia.com/blog/2010/3/important-gaia-announcement Gaia Community]''' shut down at the end of March.<br />
<br />
* March 31: '''[[Yahoo! Video]]''' shut down on March 31st, 2011 and was reborn as a video portal.<br />
<br />
* March 16: '''Microsoft''' closed '''Windows Live Spaces''' on March 16, 2011. Spaces owners had the option to migrate their blogs to '''WordPress''' or to make copies. As of January 4, 2011, they could no longer edit their existing Spaces.[http://explore.live.com/windows-live-spaces-help-center]<br />
<br />
* February 22: The [[Insurgency Wiki]] is a wiki whose community created multiple guides and raids for Anonymous, in a similar manner to [[Encyclopedia Dramatica]]. Its status has always been unclear, with many mirrors coming and going, but as of Feb. 22, 2012, the last mirror, Partyvan.info, shows what looks like a damning database error. Just in case, the Bibliotheca Anonoma has made a full backup, including all available images.<br />
<br />
* January 17: '''The Sims Carnival''': [http://www.simscarnival.com/games/CarnivalMonkey/35068/The-Sims-Carnival-Says-Goodbye]<br />
<br />
* January 1: '''[[ProHosting]]''' (http://free.prohosting.com) closed hosted sites on 1 Jan 2011<br />
<br />
* January: The wiki hosting site '''wik.is''', hosted by MindTouch, shut down on the first week of January 2011; the explanation being that "in order to continue to support the growing needs of our MindTouch Express users, we are offering MindTouch Cloud", which "opens up additional features and functionality that are not available in Wik.is.". The only way you'd know all that is if you receive a warning e-mail from MindTouch. They offer to keep your site running by "upgrading to our paid Cloud version by [http://campaigns.mindtouch.com/Wik.isDecomissioningMigrationInterest.html filling out this short form.]"<br />
<br />
===2010===<br />
* December 17: The '''[http://www.symbian.org/ Symbian Foundation]''' will shut down its websites, Twitter account, Facebook page, bug trackers and remove access to its source code on 17 Dec 2010[http://www.engadget.com/2010/11/27/symbian-foundation-axing-websites-on-december-17th-source-repos/][http://developer.symbian.org/wiki/Symbian_Foundation_web_sites_to_shut_down].<br />
* December: '''[http://machinima.com Machinima.com]''' was reworked in December 2010, and by "reworked" we mean massacred. Most notably, the forums were deleted, as well as tons of older articles. <br />
* November: '''[http://www.brightfuse.net BrightFuse]''' was a small social network started as a side venture by CareerBuilder.com in August 2009. It was quietly shut down in November 2010 without much fanfare. At its height it had 100k users.<br />
* October 31: '''isweb lite''', the Japanese Geocities, shut down on October 31. Thousands of personal homepages of artists and illustrators were deleted forever. A tiny sample of the pages deleted: [http://togetter.com/li/64058] '''isweb''' itself (paid hosting!) will shut down in May 2012. [http://portal.faq.rakuten.co.jp/app/answers/detail/a_id/15387/]<br />
* September/October: '''[http://closing.vox.com/ Vox]''' shut down at the end of September 2010.<br />
* Mid-2010: '''[[JuniorNet]]''', a subscription based online portal for children, was quietly shut down nearly a decade after it dot-bombed and was acquired by former employees.<br />
* April/May: '''[http://www.kidradd.com Kid Radd]''' was a notable and quite popular webcomic which vanished when AT&T discontinued their Worldnet service. Thankfully, an archive is available, e.g. [http://tangent128.name/depot/kid_radd.zip here].<br />
* March 31: '''[http://extra.hu Extra.hu]''', the largest Hungarian free hosting provider, goes paid-only; deletes an unknown number of free sites on 31 March 2010.<br />
* March 1: '''[http://storytlr.com/ Storytlr]''', a lifestreaming site, stopped hosting March 1st 2010.<br />
* March: '''[http://platinum.ac Platinum]''', once a popular Finnish website associated with electronic dance music, clubbing/raving, and other related things, was closed in March after running for years. All the content posted to the forums of the site was, however, obtained and made available by [http://klubitus.org Klubitus], another related portal popular in Finland.<br />
<br />
===2009===<br />
<br />
* December 6: '''favrd''', a website that aggregated favorite tweets from Twitter, abruptly shut down on December 6, 2009 with absolutely no warning, killing off thousands of highlighted entries added by group-consensus over significant months. As a reward for their efforts, founder Dean Allen wrote this helpful message: ''"Alas, stars on Twitter have become mere take-out menus hung on the doors of other restaurants. There are still lots of clever and funny things to read every day, but finding these is no longer a challenge &mdash; you already follow your sources. Sites like this one now serve mainly as fuel for emotional up-fuckedness in the guise of a game. Just an idea: next time you see something you like, write the person who made it a note telling them so. Even better, explain why. Take care!"'' Advice to people who want to work with Dean Allen's projects in the future: don't.<br />
* November: '''here.is''' seems to be permanently offline. It ceased redirecting email some time ago, and as of 11-23-09 it no longer redirects even URLs.<br />
[[Image:Encarta.jpg|right|300px|Discontinuedpedia]] <br />
* October 31: '''Microsoft Encarta''', the online encyclopedia with a 15+ year history, is being shut down. The US version will shut down on October 31, 2009 and the Japanese version on December 31, 2009. [http://www.reuters.com/article/CMPTRS/idUSLV28230720090331]<br />
* October 26: '''[[GeoCities]]''': Shock! Repeat Offender '''[[Yahoo]]''' announced that it would close GeoCities "later this year...We'll send you more details this summer." [http://help.yahoo.com/l/us/yahoo/geocities/geocities-05.html]. The plug was pulled on October 26th 2009. See the [[Geocities]] project page for more details.<br />
* August 31: '''Microsoft's SoapBox''' has announced it is getting off said soapbox on August 31, 2009. [http://arstechnica.com/microsoft/news/2009/07/soapbox-microsofts-youtube-dies-on-august-31-2009.ars]. <br />
* August 30ish: '''ArchNacho's & TortillaGodzilla's Quality ROMs''', a site that hosted ROMs for NES, SNES, and Genesis games, which had announced its effective death back in January of 2006, is now finally completely inaccessible, both on its original domain (http://www.qualityroms.com), and on the site that the domain masked (http://home.no.net/qualrom/). Archive.org has [http://web.archive.org/web/*/http://qualityroms.com mirrors] of the site up through August 30, 2007, which is after all updates to the site ceased. All ROMs hosted on QualityRoms are included in the mirror and can be downloaded from there.<br />
* August 24: '''Microsoft's Popfly''' [http://popflyteam.spaces.live.com/blog/cns!51018025071FD37F!336.entry] pops off into nowhere on August 24, 2009.<br />
* July 13: '''Yahoo! 360''' announces [http://blog.360.yahoo.com/blog-1qCkw2Ehaak.hdNZkEAzDrpa4Q--?cq=1] that they are closing up shop on July 13, 2009. Of course, you can still register an account but that's the first thing you're told.<br />
* June 25: '''Imeem''', a site for sharing music and convincing yourself that what you're hearing is good, [http://blog.imeem.com/2009/06/25/simplifying-imeem/ announced] on June 25, 2009 that they were "simplifying" things and deleting all photos and videos uploaded by users. They gave everyone '''five days''' to get their photos off, then extended it to ''twenty days'' after the ensuing hue and cry. There was no way to extract the uploaded videos.<br />
* June 15: '''[http://www.jumpcut.com Jumpcut.com]''' became the latest example of Yahoo!'s awesome respect for history and data, announcing the closure of the video hosting and editing site, for June 15, 2009. A software utility has been released to allow you to download the movies from Jumpcut. Otherwise, you are not in great shape - Yahoo says you can move your videos to Flickr, but Flickr cuts off at 90 seconds. A lot of homemade video is going to disappear.<br />
* May 31: '''Rejaw''', a microblogging platform, has announced that it will be shutting down on May 31 2009 [http://rejaw.com/rejaw/shout/OOfs2wUaLql]. It's gone.<br />
* May 21: '''MSN QnA Beta''' closed on May 21 [http://liveqna.spaces.live.com/blog/cns!2933A3E375F68349!2244.entry]<br />
* April 20: '''[http://www.coghead.com Coghead]''', " a web-based service for building and hosting custom online database applications and a software as a platform 'utility computing' company", announced it had closed up on February 20, 2009, and that the site would go down permanently on April 20, 2009. [http://blogs.zdnet.com/collaboration/?p=349]. It did.<br />
* April 17: '''[http://furl.net/ Furl]''' was a social bookmarking service that had been around since 2004. It was acquired by [http://diigo.com/ Diigo] (announced on March 9), allowed people to opt into transferring their bookmarks to Diigo, and shut down on April 17. [http://blog.diigo.com/2009/03/16/welcome-furl-users/ Diigo blog post]; [http://www.techcrunch.com/2009/03/09/diigo-buys-web-page-clipping-service-furl-away-from-looksmart/ Techcrunch post].<br />
[[Image:HP upline goes offline.jpg|right|300px|Did we say upline? We meant offline.]]<br />
* March 31: It doesn't get more ironic than this: '''[https://www.upline.com/ Upline]''', a HP-owned online backup service, is being shut down.[http://news.cnet.com/8301-17939_109-10173136-2.html?part=rss&subj=news&tag=2547-1_3-0-5] ''They almost immediately turned off the backup process,'' and then announced all your restorable data would go offline on March 31, roughly 30 days after announcement. Surprise!<br />
* March 31: Google acquired '''[http://etherpad.com Etherpad]''' on December 4, 2009 and immediately [http://etherpad.com/ep/blog/posts/google-acquires-appjet announced] a March 2010 content deletion date. After community pressure, Google decided to [http://etherpad.com/ep/blog/posts/etherpad-back-online-until-open-sourced open source the Etherpad codebase], keeping the service alive until then. The site closed down shortly after. Fortunately there are now [http://www.google.com/search?q=etherpad+alternatives numerous] [http://www.google.com/search?q=etherpad+clone alternatives].<br />
* March 30: '''[[Yahoo_Briefcase|Yahoo Briefcase]]''', a positively ancient site run by Yahoo that provided you with 25 free megabytes of storage space for your junk, sent a mail to what were likely years-old contact addresses to tell them they had a little more than a month to get their files out, March 30, 2009. After that, the files would be deleted. What, Yahoo doesn't have a spare memory stick to store what must be the amount of files in this service for the next year?<br />
* March 25: '''Yahoo! Farechase''', an airline fare aggregation and searching site, was shut down on March 25, 2009. It had previously been its own company, founded in 1999, and purchased by Yahoo! in 2004. [http://news.cnet.com/Yahoo-buys-travel-company/2100-1032_3-5300561.html]<br />
* March 20: '''[http://www.spiralfrog.com Spiralfrog]''', "a FREE service that lets you download over 3 million songs and videos, legally and safely", pulled up stakes in the night and completely shut down on March 20, 2009. [http://arstechnica.com/web/news/2009/03/ad-based-music-service-spiralfrog-croaks.ars] Things looked so promising in 2006: [http://arstechnica.com/old/content/2006/08/7611.ars] Oh, and sadly, all your music you downloaded from them will stop working within 30 days or less. [http://arstechnica.com/old/content/2007/09/spiralfrog-debuts-with-free-ad-supported-music-downloads.ars]<br />
* March 17: '''[http://seattlepi.nwsource.com/ The Seattle Post-Intelligencer]''' was [http://seattlepi.nwsource.com/business/395463_newspapersale10.html put up for sale], but found no buyer, and the print edition stopped on March 17th 2009 after 146 years. [http://www.thenewstribune.com/news/columnists/zeeck/story/591181.html] Initially, reports indicated it would shut down the website as well as the paper, but a plan was apparently in place to run a "skeleton crew" on an internet-only site, which continues to operate.<br />
* March 11: '''[http://www.videosift.com Videosift]''' had a combination database and backup failure, losing: "All votes, ever. All member usernames who registered later than around 12 months ago. All member rankings. Your member profile info (e.g., bio, favorite sift, etc.), if any. All activity that happened on the site yesterday, March 11." This is unlikely to kill the site, but an awful lot of data was lost.<br />
* March 6: '''[http://www.scoopt.com/ Scoopt]''', a "citizen journalism" site run by Getty images to allow the uploading of images by citizen journalists and the chance to be licensed to news organizations, announced they would no longer take any new imagery after February 6, 2009, and will shut down completely on March 6, 2009. Some content uploaders "may" be contacted about being absorbed into the main Getty site.<br />
[[Image:20090227.jpg|right|300px]]<br />
* Sometime between [https://web.archive.org/web/20090228114022/http://music.download.com/ February 28th] and [https://web.archive.org/web/20090323020108/http://music.download.com/ March 23rd]: '''[http://music.download.com/ music.download.com]''' was redirected to Last.fm. The free music it offered does not seem to have been transferred, however (the band Ancient Teknologi had several tracks available on music.download.com, but only one available on Last.fm). As of 28 February, it claimed to have 111,052 MP3s.<br />
* February 28: '''[[Lycos Europe]]''' shut down their '''Tripod''' hosting service on February 28, 2009. [http://www.washingtonpost.com/wp-dyn/content/article/2009/01/18/AR2009011800224.html] [http://www.paidcontent.co.uk/entry/419-lycos-europe-killing-tripod-customers-warned-to-back-up/] Note that Lycos Europe are distinct from Lycos.com. '''[[Lycos Europe]]''' is also shuttering the social networking site '''Jubii''' as of February 15, 2009. [http://www.techcrunch.com/2009/01/18/lycos-kills-jubii-while-theyre-at-it/] A Danish version of the site will remain open for the time being.<br />
* February 27: '''The [http://www.rockymountainnews.com/ Rocky Mountain News]''' has shut down as of February 27, 2009. [http://www.rockymountainnews.com/news/2009/feb/26/rocky-mountain-news-closes-friday-final-edition/] We're watching to see what happens with the website (and the material, and the newspaper itself). With a 150-year history, there's a lot of backstory, and however this chronicler of history ends up, so too will many others. There is an excellent documentary about the last days of the Rocky Mountain News [http://www.vimeo.com/3390739 here].<br />
* February 23: '''Windows Live''' shut down the '''MSN Groups''' on February 23. They extended their original date from February 21st to give Group owners the weekend to prepare. [http://windowslivewire.spaces.live.com/Blog/cns!2F7EB29B42641D59!34861.entry?sa=503427140]<br />
* January 31: '''[http://ma.gnolia.com/ ma.gnolia.com]''' had a catastrophic disk corruption/failure on January 31, 2009. From the message on the main site: ''"As I evaluate recovery options, I can't provide a certain timeline or prognosis as to to when or to what degree Ma.gnolia or your bookmarks will return; only that this process will take days, not hours."'' Ma.gnolia had an excellent export feature... hope you used it and did the backups they didn't!<br />
* January 28: '''[http://dominomag.com/ Domino Magazine]''', a style/interior design magazine, announced that they were shutting down on January 28, 2009. [http://mydecofile.dominomag.com/ My Deco File], one of the site's heavily used social bookmarking features (somewhat like delicious for images) will remain up for a few weeks to allow users to save their stuff.<br />
* January 28: '''[http://culture11.com/home Culture11]''' ran out of money.[http://www.patrolmag.com/scanner/1263/culture11-is-over]<br />
* January 27: '''Yahoo Pets''' was shut down and redirected with absolutely no notice around January 27, 2009. [http://blog.dogster.com/2009/01/28/yahoo-quietly-shutters-yahoo-pets-grin/]<br />
* January 17: '''[[totse]].com''' [http://www.totse.com/ closed its doors] on January 17, 2009. As of Jan 20th, a mirror [http://totse.danladds.com/ exists], alongside a [http://totse.danladds.com/text/ repository of the totse text files].<br />
* January 15: '''[[Ficlets]].com''' (owned by AOL) has announced they are closing on January 15, 2009. [http://www.peopleconnectionblog.com/2008/12/02/ficlets-will-be-shut-down-permanently/]<br />
* January 15: '''[[Circavie]].com''' (owned by AOL) has announced they are closing on January 15, 2009. [http://www.peopleconnectionblog.com/2008/12/03/circavie-will-be-shut-down-permanently/]<br />
* January 14: '''Several Google services''' have shut down. [http://www.readwriteweb.com/archives/google_giveth_and_it_taketh_away.php] Most importantly, Google Video stopped accepting new uploads (to avoid competition with Google-owned YouTube), and Google Catalog Search was erased.<br />
* January 11: '''[[Co.mments]].com''' closed down on January 11, 2009.<br />
* January 9: '''[[AOL_Pictures|AOL Pictures]]''' said so long on January 9, 2009. To their credit, you can still yank your stuff into other photo services until June of 2009. (At least, according to their goodbye letter.)<br />
* January 6: '''Electronic Gaming Monthly''' has recently shut its doors. [http://multiplayerblog.mtv.com/2009/01/06/egm-closed-ziff-lays-off-30/]<br />
<br />
===2008===<br />
<br />
* [http://blogs.zdnet.com/BTL/?p=11227 Overview of 2008 Technology News]<br />
<br />
''Biggest Botched Shutdowns of 2008''<br />
<br />
* October 31: '''[http://www.peopleconnectionblog.com/2008/11/06/hometown-has-been-shutdown AOL Hometown]''' (owned by AOL) was officially killed on October 31, 2008. [http://ascii.textfiles.com/archives/1617 Jason wrote about it.]<br />
[[Image:Stayclassyaol.png|thumb|right|470px|The full extent of warning AOL gave about shutting down Hometown.]]<br />
* October 27: '''Digitalrailroad.net''', a photo hosting site, gave their users a 24-hour eviction notice on October 27, 2008. They shut down 10 hours after the 24-hour notice. [http://news.cnet.com/8301-17939_109-10078042-2.html]<br />
<br />
''Other deaths of 2008''<br />
<br />
* December 31: '''[http://pingmag.jp/ Pingmag]''', the magazine from Tokyo about "Designing and Making things," simultaneously rang in the new year and checked out of existence on December 31, 2008.<br />
* December 31: '''[http://www.lively.com/goodbye.html Lively]''', a 3D Avatar space experiment, was killed in a really crappy way by Google on December 31, 2008.<br />
* December 27: '''[http://blog.mixwit.com/ Mixwit]''' said goodbye on December 27, 2008. [http://news.cnet.com/8301-17939_109-10126057-2.html]<br />
* December 23: '''[http://www.castlecops.com/ Castle Cops]''' put away their badges on December 23, 2008. [http://www.idf50.co.uk/clubhouse/computer-room/15996-castle-cops-closed-down.html]<br />
* December 19(?): '''[[Google Research Datasets]]''' shut down on December 19(?), 2008. [http://blog.wired.com/wiredscience/2008/12/googlescienceda.html]<br />
[[Image:Final image 01.png|400px|right|thumb|The last person at Yahoo! Kickstart turning off the lights.]]<br />
* December 18: '''Yahoo! Kickstart''', a social network for college students revealed in 2007 [http://mashable.com/2007/08/30/yahoo-kickstart/], got expelled around December 18, 2008. [http://www.techpluto.com/yahoo-kickstart-shutdown/]<br />
* December 16: '''Flip.com''', a social network for teenage girls, shut down on December 16, 2008. Users were advised to print out their digital scrapbooks as backups. [http://news.cnet.com/8301-1023_3-10112021-93.html]<br />
* December 15: '''[http://pownce.com/ Pownce]''' was closed on December 15, 2008.<br />
* December 8: '''[http://getsatisfaction.com/iwantsandy/topics/a_fork_in_the_road_an_important_announcement_about_i_want_sandy I Want Sandy]''' [http://www.webcitation.org/5eFA58kqN (WEBCITE)] was shut down on December 8, 2008. A lot of people complained about this one, while others thanked the site for shutting down and wished the founder well! <br />
* December 3: '''[http://live.yahoo.com/ Yahoo Live!]''' died on December 3, 2008. [http://news.cnet.com/8301-13515_3-10081486-26.html]<br />
* October 31: '''[http://ourworld.cs.com/sfrederick2/index.htm?f=fs CompuServe OurWorld]''' slipped into history on October 31, 2008.<br />
* October 29: '''[http://blogrush.com BlogRush.com]''' failed to provide bloggers with the traffic they so desperately desired, and the creator admitted on October 29, 2008 that his 4AM idea may not have been so brilliant. [http://mashable.com/2008/10/29/blogrush-shutdown/]<br />
* September 29: '''ScribbleWiki''' wikis go offline. Apparently [http://wikiindex.org/ScribbleWiki their servers crashed] and they didn't have a backup.<br />
* September 28: '''Yahoo! Mash''', a social networking site, became mush on September 28, 2008, after 30 days' warning. [http://mashable.com/2008/08/28/yahoo-mash-has-been-quashed/] <br />
* September 26: '''Uber.com''' was a social blog site that died. [http://news.cnet.com/8301-13577_3-10052301-36.html]<br />
** Not to be confused with the ridesharing service of the same name, which now owns the domain name uber.com.<br />
* September 18: '''[http://wallop.com/ Wallop]''', Microsoft's attempt at starting a social network, died on September 18, 2008. All that remains is a few Facebook apps. [http://news.cnet.com/8301-13577_3-10041856-36.html] [http://www.techcrunch.com/2008/09/15/wallop-takes-a-leap-into-the-deadpool/]<br />
* May 21: '''Virtual Magic Kingdom''' [http://www.intercot.com/discussion/showthread.php?t=130548 closed its gates] on May 21, 2008. [http://www.virtualworldsnews.com/2008/04/disneys-virtual.html] The amount of broken hearts and anguish over this move was amazing, and a warning sign to any family-oriented site that encourages families to join up.<br />
** Some of the more anguished fans have gotten together in various forms to recreate VMK, including [http://game.myvmk.com/ MyVMK], [http://www.vmkrevisited.com/ VMKRevisited] (a memorial site), and [http://www.openvmk.com/ OpenVMK] (although OpenVMK shut down due to [https://docs.google.com/document/d/1qTdpgcLUd-Hg6-FZzkVEju4x1aMlhbMNX6snlUNgnOM/ internal squabbles]).<br />
* July 31: '''Social.fm''' couldn't stand up to Last.fm, and died. [http://news.cnet.com/8301-13577_3-10005554-36.html]<br />
* May 15: '''Brijit.com''', a news aggregation site, closed on May 15, 2008. It might be closed for good. [http://news.cnet.com/8301-13577_3-9945059-36.html]<br />
* February 14: '''[http://en.wikipedia.org/wiki/Think_Secret Think Secret]''' was killed by Apple and shut down on February 14, 2008. [http://blog.wired.com/business/2007/12/apple-and-think.html]<br />
* February: '''Yahoo! Design''', a showcase of designing and information aesthetics related to the Yahoo! properties, got revised into oblivion in February, 2008 as part of a 1,000 employee layoff. [http://infosthetics.com/archives/2008/02/rip_yahoo_design_closed_down.html]<br />
<br />
===2007===<br />
<br />
* October 31: '''Yahoo! Podcasts''', a Podcast searching site founded in October 2005 [http://www.ysearchblog.com/2005/10/09/listen-to-the-internet-with-yahoo-podcasts/], was closed with no explanation on October 31, 2007. [http://searchengineland.com/yahoo-podcasts-to-close-the-sorry-state-of-podcast-search-12288]<br />
* October 23: '''[http://oink.cd/ OiNK's Pink Palace]''', a music BitTorrent tracker with a huge user community that cared greatly about digital content and music. It would have been a great resource for the industry to research. Shut down October 23, 2007. [http://www.wired.com/entertainment/music/news/2007/10/oink]<br />
* September 20: '''Yahoo! Photos''', a photo sharing service by Yahoo!. Tools: [http://smart-techie.com/yahoo/ Download Hi Resolution Yahoo! Photos] by [http://smart-techie.com/web/ Rohit Sud], [http://kentbrewster.com/download-yahoo-photos/ Download Yahoo! Photos] by [[Kent Brewster]], and [http://yandao.com/yahoograb/ Yahoo! Photos Grabber] by [http://yandao.com Yandao.com]<br />
* March 20: '''[http://jam.bbc.co.uk/ BBC Jam]''' was [http://news.bbc.co.uk/2/hi/uk_news/education/6449619.stm suspended] March 20, 2007 and [http://www.guardian.co.uk/media/2008/feb/28/bbc.digitalmedia will not be coming back].<br />
<br />
===2006===<br />
<br />
===2005===<br />
<br />
* http://IUMA.COM (Internet Underground Music Archive), of Santa Cruz, California, the first website to offer free hosting for bands, including MP3 files of the music they offered, was mostly archived by John Gilmore before going down. At least one IUMA founder now has a copy of that archive. This ~800GB collection has been uploaded to an Archive Team staging server.<br />
<br />
===2004===<br />
<br />
===2003===<br />
<br />
* http://mp3.com went down. Much of it was archived by John Gilmore.<br />
<br />
===2002===<br />
<br />
===2001===<br />
* '''SixDegrees.com''', a social network service website that lasted from 1997 to 2001<br />
* '''The Useless Pages''' (at [http://replay.web.archive.org/20000612123540/http://www.go2net.com/useless/index.html IA])<br />
<br />
== Eleventh Hour Reprieves and Reanimations ==<br />
<br />
* Video host '''[[Viddler]]''' announces in an [http://mad.ly/5ae274?pact=20251445218&fe=1 e-mail newsletter] that they're shutting down free accounts on March 11, 2014. But Archive Team kicked in and began to suck up the place until the owners told us to stop. Videos won't be permanently deleted.<br />
* '''[[4chan|Chanarchive.org]]''' - A site dedicated to saving select quality threads from 4chan, running since 2006 and containing 500 GB of important material. It has shut down entirely, as the owner was banned from PayPal and has no means of paying for the site in its current state. In a [https://boards.4chan.org/q/res/264159 4chan thread], the owner explains that backups will be made available, but there is no guarantee of who will host them, where, or for how long.<br />
* '''[[Berlios.de]]''' will [http://www.berlios.de/ shut down at end of 2011]. The site hosts thousands of open source software projects (git, svn, bzr, mailing lists, bug tracking, etc). [http://developer.berlios.de/docman/display_doc.php?docid=2056&group_id=2 Instructions for exporting a project.] Berlios is still open and they are now [http://joinup.ec.europa.eu/news/german-open-source-development-site-berlios-joins-sourceforge partnered with sourceforge] to keep things running.<br />
* '''[[Citizendium]]''''s finances {{url|1=http://en.wikipedia.org/wiki/Wikipedia:Wikipedia_Signpost/2010-11-08/News_and_notes|2=constantly cry for money}}. Running a MediaWiki site is cheap and Sanger is not homeless, hence it's expected to survive. [[WikiTeam]] archives it on a regular basis.<br />
* '''[[Delicious]]'''[http://www.delicious.com] will be [http://daringfireball.net/linked/2010/12/16/delicious.php shutting down soon]. The whole team was let go on 15 December 2010. [http://tech.slashdot.org/story/10/12/16/2220225/Yahoo-To-Close-Delicious Slashdot link]. Delicious was acquired from Yahoo! in early 2011 by AVOS; however, all the prior content is gone.<br />
* '''Cli.gs''', another URL shortening service, announced closure: "On Sunday, 25 Oct 2009 at 12:00:00 GMT, the service will stop accepting new short URLs and will stop logging analytics."[http://blog.cli.gs/news/cligs-shutting-down] In December 2009, it was announced that the "social bookmarking" site Mister Wong has acquired cli.gs and are keeping it running.[http://blog.cli.gs/news/mister-wong-acquires-cligs] All aboard the [[TinyURL]] project. <br />
* '''[https://duck.co/topic/soft-launch-of-the-new-forum Duck.co]''', the official DuckDuckGo community forums, transitioned to their [https://dukgo.com own platform] and moved all posts over from their old Zoho forum.<br />
* '''[[Earbits]]''' bites the dust on June 16, 2014, but comes back to life on June 19th, 2014. In between, we grabbed ~130GB of images and ~130k MP3s.<br />
* '''Filefront.com''' is closing up shop [http://farewell.filefront.com/]. The site will be suspended on March 30, 2009. 1.5 million files and 48+ TB of space gone just like that. '''UPDATE''' As of April 2, 2009, it looks like there may have been an 11th-hour reprieve for Filefront. According to a message reportedly from the original founders of the service [http://welcome.filefront.com/], the site has been re-acquired by them in order to prevent its proposed shuttering.<br />
* '''[[Formspring]]''' (now called '''spring.me''') announced they'd be [http://formspring.wordpress.com/2013/03/15/formspring-is-shutting-down/ shutting down] on April 15. It was, however, acquired by new management on May 8, 2013, and saved from being shut down.<br />
* '''[[Google Video]]''' threatened to remove all hosted videos with two weeks' notice in April 2011. It backed down after criticism and an archive effort by the Archive Team.<br />
* '''Home of the Underdogs''' went under on Feb 9th. [http://flashofsteel.com/index.php/2009/02/13/rip-hotu/] There have been some words passed along from the site's owner, now working at an NGO, that an attempt to bring it back may happen. (She definitely has backups of the site.) A community-driven effort to revive the site is currently underway. [http://www.hotud.org] Backups were restored, and the remaining files (1,000+) collected from the community. As of Jan 4th 2010, HOTU is reporting that files are back online. [http://www.hotud.org/component/content/article/25133-files-online-and-import-done]<br />
* '''[[JPG Magazine]]''' announced it would shut down on January 5, 2009 [http://jpgmag.com/blog/2009/01/jpg_magazine_says_goodbye.html], but the site [http://jpgmag.com/blog/2009/02/an_exciting_future_for_jpg.html lives on under new ownership]. Feel free to download the [http://thepiratebay.org/torrent/4624703/ torrent].<br />
* [[Jux]] announced that they would be shutting down on August 31, 2013. '''UPDATE''' On July 17, 2013, Jux announced that they would not shut down, apparently due to financial support from one of their members.<br />
* [[KeygenJukebox]], which shut down in 2014, has recently popped back up.<br />
* '''[[MobyGames]]''', the largest database of old game releases on the net, with a huge amount of content found nowhere else. It was bought by [[GameFly]] in 2010 and received a new site design in September 2013, which made almost all contributors emigrate. Many site features disappeared or became broken in the new design, and the large database of cover art and screenshots had problems loading. In December 2013, the site was bought by [http://blueflamelabs.com/ Blue Flame Labs], who restored the old site design and managed to draw back pretty much all the contributors. It seems that the site is going back to full health again. <br />
* '''[[WebCite]]''' &ndash; Has a habit of crying for money, threatening that it will stop accepting submissions. Since October 2013, the Wayback Machine archives pages on demand, hence there's no reason to use a site like WebCite that declares itself at risk. It's expected that they'll send their data to the Internet Archive if they ever really have to shut down.<br />
* '''[[Word Count Journal]]''' ({{url|1=http://www.wordcountjournal.com/about}}) is shutting down on June 11, 2011. '''UPDATE''' The site is fully up and running. (checked on October 21, 2011) '''UPDATE2''': Non-functional, but the website is up with this notice: "Word Count Journal is no longer being supported." (checked on January 26th, 2012)<br />
<br />
== Links ==<br />
<br />
=== Other Sites Remember the Dead ===<br />
<br />
* [http://www.disobey.com/ghostsites/ Ghost Sites of the Web] by Steve Baldwin. [http://www.disobey.com/ghostsites/atom.xml RSS Feed]<br />
* [http://www.techcrunch.com/tag/deadpool/ Techcrunch's Deadpool] is an excellent archive of stories about site closings.<br />
* [http://deletionpedia.dbatley.com/w/index.php?title=Main_Page Deletionpedia] saved the articles deleted from Wikipedia in 2008, and [http://wikidumper.blogspot.com/ Wikidumper] preserves a selection of them.<br />
<br />
=== Tragic ===<br />
<br />
* [http://news.cnet.com/8301-13578_3-10029798-38.html "Russia Web site owner killed after arrest" - article at CNET News]<br />
<br />
=== Humorous ===<br />
<br />
* [http://www.nzherald.co.nz/lifestyle/news/article.cfm?c_id=6&objectid=10448650 "Dating website's miscalculated publicity attempt" - article at New Zealand Herald]<br />
<br />
{{Navigation pager<br />
| previous = Who We Are<br />
| next = Fire Drill<br />
}}<br />
{{Navigation box}}<br />
<br />
[[Category:Archive Team]]</div>Archive Maniachttps://wiki.archiveteam.org/index.php?title=Talk:The_Pirate_Bay&diff=21075Talk:The Pirate Bay2014-12-14T18:32:06Z<p>Archive Maniac: Replied</p>
<hr />
<div>== Site's down ==<br />
The website went down and the backup is being downloaded by many people. Problem is, this backup is about two years old.<br />
Question: Would it be feasible to provide a "delta backup" using the Internet Archive? If someone here has access to their infrastructure, that would mean writing a script to backup all the newer pages that have been archived by the crawler, and publishing the whole thing as a complement to the 2012 archive.<br />
[[User:Burger|Burger]] 15:09, 14 December 2014 (UTC)<br />
:The material's all pirated & a lot of malware is probably in the torrents. By backing up the content, you're giving users access to pirating material on the Wayback Machine. :/ [[User:Archive Maniac|Archive Maniac]] 13:32, 14 December 2014 (EST)</div>Archive Maniachttps://wiki.archiveteam.org/index.php?title=User_talk:Bzc6p&diff=20971User talk:Bzc6p2014-12-08T00:46:54Z<p>Archive Maniac: /* Blogter.hu's Unexpected Downfall */ new section</p>
<hr />
<div>== Re: Some friendly words ==<br />
<br />
Thanks for appreciating my efforts and explaining the ArchiveTeam to me. I thought "#archiveteam-bs" was for off-topic conversation, though. :/ And of course I didn't give up on archiving. Why would I? I'm getting 24 Blu-ray M-Discs next month, in fact. :) Would you be willing to explain to the other users about the situation? I'm willing to forgive them if they accept it & apologize for my trolling. I'm just glad someone, at the very least, understood my situation and took the time to write to me.<br />
<br />
And I looked at your userpage. I'll see if I can track down some Hungarian sites. You can always use the Google operator "site:.hu" to filter just Hungarian sites. There is, however, [http://donkeykong.gportal.hu/ this site]. I have a backup of it, but not in .warc.gz format. Even worse, Yahoo is stupid enough to be shutting down their first service: dir.yahoo.com (Yahoo! Dir), on 12/31/2014. Stupid Yahoo... <br />
<br />
And by the way, SketchCow disliked the fact that I "asked too many questions". [[User:Archive Maniac|Archive Maniac]] 13:25, 19 October 2014 (EDT)<br />
<br />
:I'm waiting for Wpull to have a Windows release or a Python 2 release. I also stink at Python big time... [[User:Archive Maniac|Archive Maniac]] 17:47, 19 October 2014 (EDT)<br />
::Python3 unfortunately gets mixed up with Python 2 in the Command Prompt (e.g. python3 is not recognized as a command). That's why I've stuck to Python 2, because I use the wiki dump tool with that version. Aside from that, I always get errors when attempting installation, like vcvarsbatall.bat or something error, couldn't find seesaw kit, etc. Python is so user-unfriendly... [[User:Archive Maniac|Archive Maniac]] 17:42, 20 October 2014 (EDT)<br />
<br />
== Any Help on Chat? ==<br />
What's your IRC username? I'd like help getting back into the ArchiveBot & archiveteam-bs channels. And please tell me what discussions are appropriate for the latter; you do have a way with words. :P [[User:Archive Maniac|Archive Maniac]] 20:59, 21 October 2014 (EDT)<br />
<br />
== ArchiveBot Requests ==<br />
<br />
Hey, Bzc6p. Are you willing to take ArchiveBot requests from me? I also like your Hungarian site archiving. I recently archived smb.gportal.hu on my computer. [[User:Archive Maniac|Archive Maniac]] 18:55, 18 November 2014 (EST)<br />
:I have a few more questions (the kind of thing that made users upset at me):<br />
<br />
#I like archiving stuff. What archiving tools do you know of and recommend?<br />
#Is there a way that I can save whole sites to the Wayback Machine without using the ArchiveBot channel? I don't think so, but there might still be a chance.<br />
#Why doesn't the ArchiveTeam make C++ ports of their Python tools?<br />
#When I try to use Wget, I get this error in the command prompt: ''Connecting to SITENAME (SITENAME)|IP|:PORT... failed: Bad file descriptor.'' Do you know how to fix this problem?<br />
<br />
I hope you're not too annoyed by these questions, like the others would probably be. [[User:Archive Maniac|Archive Maniac]] 12:01, 20 November 2014 (EST)<br />
:Thanks for the info. And what's been a problem is that I've tried to set ArchiveBot or wpull up a few times, but never had proper 100% cannot-fail step-by-step instructions on how to set both up. If you have the time, could you please write a more specific tutorial than the existing one? I'd prefer a tutorial on the latter (wpull). [[User:Archive Maniac|Archive Maniac]] 11:45, 22 November 2014 (EST)<br />
<br />
== Blank CD Question ==<br />
<br />
Hi Bzc6p, I am wondering how long CD-Rs and DVD-Rs last with a .iso image burned onto them. Is it just as long as the estimated shelf life? More importantly: what do you recommend for long-term backup solutions? [[User:Archive Maniac|Archive Maniac]] 14:51, 29 November 2014 (EST)<br />
<br />
== Blogter.hu's Unexpected Downfall ==<br />
<br />
Hi Bzc6p. You know how Blogter unexpectedly shut down in December in spite of its popularity? That goes to show that anything, and I mean anything, can happen to web sites that seem okay but actually are in limbo (i.e. extinction). That's why I suggested you archive gportal.hu. I already archived the Mario and DK sites. [[User:Archive Maniac|Archive Maniac]] 19:46, 7 December 2014 (EST)</div>Archive Maniachttps://wiki.archiveteam.org/index.php?title=User_talk:Bzc6p&diff=20775User talk:Bzc6p2014-11-29T19:51:13Z<p>Archive Maniac: /* Blank CD Question */ new section</p>
<hr />
<div>== Re: Some friendly words ==<br />
<br />
Thanks for appreciating my efforts and explaining the ArchiveTeam to me. I thought "#archiveteam-bs" was for off-topic conversation, though. :/ And of course I didn't give up on archiving. Why would I? I'm getting 24 Blu-ray M-Discs next month, in fact. :) Would you be willing to explain to the other users about the situation? I'm willing to forgive them if they accept it & apologize for my trolling. I'm just glad someone, at the very least, understood my situation and took the time to write to me.<br />
<br />
And I looked at your userpage. I'll see if I can track down some Hungarian sites. You can always use the Google operator "site:.hu" to filter just Hungarian sites. There is, however, [http://donkeykong.gportal.hu/ this site]. I have a backup of it, but not in .warc.gz format. Even worse, Yahoo is stupid enough to be shutting down their first service: dir.yahoo.com (Yahoo! Dir), on 12/31/2014. Stupid Yahoo... <br />
<br />
And by the way, SketchCow disliked the fact that I "asked too many questions". [[User:Archive Maniac|Archive Maniac]] 13:25, 19 October 2014 (EDT)<br />
<br />
:I'm waiting for Wpull to have a Windows release or a Python 2 release. I also stink at Python big time... [[User:Archive Maniac|Archive Maniac]] 17:47, 19 October 2014 (EDT)<br />
::Python3 unfortunately gets mixed up with Python 2 in the Command Prompt (e.g. python3 is not recognized as a command). That's why I've stuck to Python 2, because I use the wiki dump tool with that version. Aside from that, I always get errors when attempting installation, like vcvarsbatall.bat or something error, couldn't find seesaw kit, etc. Python is so user-unfriendly... [[User:Archive Maniac|Archive Maniac]] 17:42, 20 October 2014 (EDT)<br />
<br />
== Any Help on Chat? ==<br />
What's your IRC username? I'd like help getting back into the ArchiveBot & archiveteam-bs channels. And please tell me what discussions are appropriate for the latter; you do have a way with words. :P [[User:Archive Maniac|Archive Maniac]] 20:59, 21 October 2014 (EDT)<br />
<br />
== ArchiveBot Requests ==<br />
<br />
Hey, Bzc6p. Are you willing to take ArchiveBot requests from me? I also like your Hungarian site archiving. I recently archived smb.gportal.hu on my computer. [[User:Archive Maniac|Archive Maniac]] 18:55, 18 November 2014 (EST)<br />
:I have a few more questions (the kind of thing that made users upset at me):<br />
<br />
#I like archiving stuff. What archiving tools do you know of and recommend?<br />
#Is there a way that I can save whole sites to the Wayback Machine without using the ArchiveBot channel? I don't think so, but there might still be a chance.<br />
#Why doesn't the ArchiveTeam make C++ ports of their Python tools?<br />
#When I try to use Wget, I get this error in the command prompt: ''Connecting to SITENAME (SITENAME)|IP|:PORT... failed: Bad file descriptor.'' Do you know how to fix this problem?<br />
<br />
I hope you're not too annoyed by these questions, like the others would probably be. [[User:Archive Maniac|Archive Maniac]] 12:01, 20 November 2014 (EST)<br />
:Thanks for the info. And what's been a problem is that I've tried to set ArchiveBot or wpull up a few times, but never had proper 100% cannot-fail step-by-step instructions on how to set both up. If you have the time, could you please write a more specific tutorial than the existing one? I'd prefer a tutorial on the latter (wpull). [[User:Archive Maniac|Archive Maniac]] 11:45, 22 November 2014 (EST)<br />
<br />
== Blank CD Question ==<br />
<br />
Hi Bzc6p, I am wondering how long CD-Rs and DVD-Rs last with a .iso image burned onto them. Is it just as long as the estimated shelf life? More importantly: what do you recommend for long-term backup solutions? [[User:Archive Maniac|Archive Maniac]] 14:51, 29 November 2014 (EST)</div>Archive Maniachttps://wiki.archiveteam.org/index.php?title=User_talk:Chfoo&diff=20752User talk:Chfoo2014-11-26T02:04:55Z<p>Archive Maniac: /* Wikiadownloader.py problem */ new section</p>
<hr />
<div>You might want to add this to the Archive. It is Donkey Kong Country GBA's official European site: http://www.mediafire.com/download/y3ydjsg259a3rw3. For more backups (mainly the file contents), check out my other archives here: http://www.dkc-atlas.com/forum/viewtopic.php?f=26&t=1861. We just have to be careful, for sites like NoE's are susceptible to shutting down.<br />
<br />
I hope I helped the Archive Team! :) [[User:Archive Maniac|Archive Maniac]] 11:17, 12 February 2014 (EST)<br />
<br />
:Excuse me, but do you know how I could manually run the Warrior's python scripts? It seems that items download a lot faster with it. Could you please teach me? I also do need help on how to upload the downloaded wikis with "uploader.py". If you could help me with these two problems, I'd much appreciate it. Thanks. [[User:Archive Maniac|Archive Maniac]] 00:39, 17 February 2014 (EST)<br />
<br />
::I have an idea. How about we contribute sites to the Archive Team, whether or not they have been shut down? That way, we have a backup. Also, the best way to back up your files is via (re)writable DVDs and/or CDs. Also, how do you submit your collections to Archive Team? [[User:Archive Maniac|Archive Maniac]] 14:56, 18 March 2014 (EDT)<br />
<br />
== Suggestion: Allowing People to Submit Sites to the Archiveteam ==<br />
<br />
I have a suggestion: why don't we allow users to submit sites to the ArchiveTeam collection on the Internet Archive (archive.org)? I have already done an example site here: https://archive.org/details/site-wwwangelfirecomazdixieden. Just like WikiTeam uses "dumpgenerator.py", I think that users should use wget for downloading websites. Beginners could use HTTrack if they desire.<br />
<br />
I have also set some conventional rules, based on those of the WikiTeam's:<br />
#The collection should have a page title following this format: "Website - SITENAMEGOESHERE".<br />
#The URL identifier should have this format: "site-SITEURLGOESHERE". I believe it should exclude the site URL's periods and "http://" from the title, but I think "www" should be kept, as some sites have it in their URLs, while others don't (e.g. if a site's URL is http://www.nintendo.co.jp, it should have an identifier of "site-wwwnintendocojp").<br />
#Users should enter the keywords "archiveteam" and "web" (maybe the ArchiveTeam can create a website submission collection with its own keyword).<br />
#Users should compress the folder the site was downloaded in as a 7z file. It should be named in this format: SITENAMENOPERIODS-YYYYMMDD.<br />
<br />
I hope you like this idea. :) [[User:Archive Maniac|Archive Maniac]] 14:05, 12 April 2014 (EDT)<br />
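For illustration, the naming rules above could be sketched like this. (The helper functions are hypothetical, written just to demonstrate the proposed convention — they are not part of any existing Archive Team or WikiTeam tool.)<br />

```python
import re
from datetime import date


def make_identifier(url):
    """Turn a site URL into an identifier per the proposed rules:
    drop the scheme, drop periods, keep any leading 'www'."""
    host = re.sub(r'^https?://', '', url).rstrip('/')
    return 'site-' + host.replace('.', '')


def make_archive_name(url, when=None):
    """Name for the 7z upload, in the form SITENAMENOPERIODS-YYYYMMDD."""
    when = when or date.today()
    host = re.sub(r'^https?://', '', url).rstrip('/')
    return '%s-%s' % (host.replace('.', ''), when.strftime('%Y%m%d'))


# Example from the rules above:
# make_identifier('http://www.nintendo.co.jp') -> 'site-wwwnintendocojp'
```
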
<br />
:I can't seem to use the ArchiveBot. I don't have the necessary permissions... And how do I upload sites to the Internet Archive? Do I do it a different way or the same way? :/ [[User:Archive Maniac|Archive Maniac]] 21:31, 13 April 2014 (EDT)<br />
<br />
== Wpull EXE File ==<br />
<br />
Will there ever be an EXE/Windows version of wpull? I'd appreciate it if there were. I also messaged [[User talk:Nemo bis|Nemo bis]] about the unarchived sites of swipnet.se. I was hoping you could do something about this. Thanks. [[User:Archive Maniac|Archive Maniac]] 11:11, 7 August 2014 (EDT)<br />
<br />
== Yahoo Stinks -- Look at What They Are Going to Shut Down Now... ==<br />
<br />
Because you're the programmer here, I thought I might tell you that http://dir.yahoo.com (Yahoo! Directory) is shutting down on December 31, 2014. It's a shame it's on the month and date that my IRC username states. *_* But here's the proof: [http://www.theverge.com/2014/9/27/6854139/yahoo-directory-once-the-center-of-a-web-empire-will-shut-down news article].<br />
<br />
Yahoo sucks. They're nuking their first service. They buy out stuff and completely destroy it all.<br />
<br />
Oh yeah, and I'm sorry for some of my trolling. [[User:Archive Maniac|Archive Maniac]] 13:31, 19 October 2014 (EDT)<br />
<br />
== Wikiadownloader.py problem ==<br />
<br />
Hi, I have a problem with the Wikia downloader script ([https://raw.githubusercontent.com/WikiTeam/wikiteam/master/wikiadownloader.py Link]). When the program tries to open wikia.com, it fails because the file doesn't exist:<br />
<br />
<pre><br />
Traceback (most recent call last):<br />
File "wikiadownloader.py", line 41, in <module><br />
f = open('wikia.com', 'r')<br />
IOError: [Errno 2] No such file or directory: 'wikia.com'<br />
</pre><br />
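In the meantime, a small guard along these lines would at least fail with a clearer message. (This assumes the script expects a plain-text list of wiki URLs in a file named wikia.com in the working directory — that's just my reading of the traceback, not something I've confirmed in the script's code.)<br />

```python
import os


def load_wiki_list(path='wikia.com'):
    """Read one wiki URL per line from *path*; exit with a helpful
    message instead of a bare IOError when the file is missing."""
    if not os.path.isfile(path):
        raise SystemExit("%r not found in %s -- create it with one wiki "
                         "URL per line before running wikiadownloader.py"
                         % (path, os.getcwd()))
    with open(path, 'r') as f:
        # Skip blank lines so trailing newlines don't produce empty entries.
        return [line.strip() for line in f if line.strip()]
```
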
<br />
Please fix this bug. Other Windows users and I would appreciate it if you did. [[User:Archive Maniac|Archive Maniac]] 21:04, 25 November 2014 (EST)</div>Archive Maniachttps://wiki.archiveteam.org/index.php?title=User_talk:Bzc6p&diff=20720User talk:Bzc6p2014-11-22T16:45:45Z<p>Archive Maniac: /* ArchiveBot Requests */ Replied</p>
<hr />
<div>== Re: Some friendly words ==<br />
<br />
Thanks for appreciating my efforts and explaining the ArchiveTeam to me. I thought "#archiveteam-bs" was for off-topic conversation, though. :/ And of course I didn't give up on archiving. Why would I? I'm getting 24 Blu-ray M-Discs next month, in fact. :) Would you be willing to explain to the other users about the situation? I'm willing to forgive them if they accept it & apologize for my trolling. I'm just glad someone, at the very least, understood my situation and took the time to write to me.<br />
<br />
And I looked at your userpage. I'll see if I can track down some Hungarian sites. You can always use the Google operator "site:.hu" to filter just Hungarian sites. There is, however, [http://donkeykong.gportal.hu/ this site]. I have a backup of it, but not in .warc.gz format. Even worse, Yahoo is stupid enough to be shutting down their first service: dir.yahoo.com (Yahoo! Dir), on 12/31/2014. Stupid Yahoo... <br />
<br />
And by the way, SketchCow disliked the fact that I "asked too many questions". [[User:Archive Maniac|Archive Maniac]] 13:25, 19 October 2014 (EDT)<br />
<br />
:I'm waiting for Wpull to have a Windows release or a Python 2 release. I also stink at Python big time... [[User:Archive Maniac|Archive Maniac]] 17:47, 19 October 2014 (EDT)<br />
::Python3 unfortunately gets mixed up with Python 2 in the Command Prompt (e.g. python3 is not recognized as a command). That's why I've stuck to Python 2, because I use the wiki dump tool with that version. Aside from that, I always get errors when attempting installation, like vcvarsbatall.bat or something error, couldn't find seesaw kit, etc. Python is so user-unfriendly... [[User:Archive Maniac|Archive Maniac]] 17:42, 20 October 2014 (EDT)<br />
<br />
== Any Help on Chat? ==<br />
What's your IRC username? I'd like help getting back into the ArchiveBot & archiveteam-bs channels. And please tell me what discussions are appropriate for the latter; you do have a way with words. :P [[User:Archive Maniac|Archive Maniac]] 20:59, 21 October 2014 (EDT)<br />
<br />
== ArchiveBot Requests ==<br />
<br />
Hey, Bzc6p. Are you willing to take ArchiveBot requests from me? I also like your Hungarian site archiving. I recently archived smb.gportal.hu on my computer. [[User:Archive Maniac|Archive Maniac]] 18:55, 18 November 2014 (EST)<br />
:I have a few more questions (the kind of thing that made users upset at me):<br />
<br />
#I like archiving stuff. What archiving tools do you know of and recommend?<br />
#Is there a way that I can save whole sites to the Wayback Machine without using the ArchiveBot channel? I don't think so, but there might still be a chance.<br />
#Why doesn't the ArchiveTeam make C++ ports of their Python tools?<br />
#When I try to use Wget, I get this error in the command prompt: ''Connecting to SITENAME (SITENAME)|IP|:PORT... failed: Bad file descriptor.'' Do you know how to fix this problem?<br />
<br />
I hope you're not too annoyed by these questions, like the others would probably be. [[User:Archive Maniac|Archive Maniac]] 12:01, 20 November 2014 (EST)<br />
:Thanks for the info. And what's been a problem is that I've tried to set ArchiveBot or wpull up a few times, but never had proper 100% cannot fail step-by-step instructions on how to set both up. If you have the time, could you please write a more specific tutorial than the existing one? I preferably want a tutorial on the former [wpull]. [[User:Archive Maniac|Archive Maniac]] 11:45, 22 November 2014 (EST)</div>Archive Maniachttps://wiki.archiveteam.org/index.php?title=User_talk:Bzc6p&diff=20717User talk:Bzc6p2014-11-20T17:02:32Z<p>Archive Maniac: /* ArchiveBot Requests */ Fixed new message of mine.</p>
<hr />
<div>== Re: Some friendly words ==<br />
<br />
Thanks for appreciating my efforts and explaining the ArchiveTeam to me. I thought "#archiveteam-bs" was for off-topic conversation, though. :/ And of course I didn't give up on archiving. Why would I? I'm getting 24 Blu-ray M-Discs next month, in fact. :) Would you be willing to explain the situation to the other users? I'm willing to forgive them if they accept it, and I apologize for my trolling. I'm just glad someone, at the very least, understood my situation and took the time to write to me.<br />
<br />
And I looked at your userpage. I'll see if I can track down some Hungarian sites. You can always use the Google operator "site:.hu" to filter just Hungarian sites. There is, however, [http://donkeykong.gportal.hu/ this site]. I have a backup of it, but not in .warc.gz format. Even worse, Yahoo is stupid enough to be shutting down their first service: dir.yahoo.com (Yahoo! Dir), on 12/31/2014. Stupid Yahoo... <br />
<br />
And by the way, SketchCow disliked the fact that I "asked too many questions". [[User:Archive Maniac|Archive Maniac]] 13:25, 19 October 2014 (EDT)<br />
<br />
:I'm waiting for Wpull to have a Windows release or a Python 2 release. I also stink at Python big time... [[User:Archive Maniac|Archive Maniac]] 17:47, 19 October 2014 (EDT)<br />
::Python 3 unfortunately gets mixed up with Python 2 in the Command Prompt (e.g. "python3" is not recognized as a command). That's why I've stuck to Python 2, because I use the wiki dump tool with that version. Aside from that, I always get errors when attempting installation, like a vcvarsall.bat error, "couldn't find seesaw kit", etc. Python is so user-unfriendly... [[User:Archive Maniac|Archive Maniac]] 17:42, 20 October 2014 (EDT)<br />
<br />
== Any Help on Chat? ==<br />
What's your IRC username? I'd like help getting back into the ArchiveBot & archiveteam-bs channels. And please tell me what discussions are appropriate for the latter; you do have a way with words. :P [[User:Archive Maniac|Archive Maniac]] 20:59, 21 October 2014 (EDT)<br />
<br />
== ArchiveBot Requests ==<br />
<br />
Hey, Bzc6p. Are you willing to take ArchiveBot requests from me? I also like your Hungarian site archiving. I recently archived smb.gportal.hu on my computer. [[User:Archive Maniac|Archive Maniac]] 18:55, 18 November 2014 (EST)<br />
:I have a few more questions (the kind of thing that made users upset at me):<br />
<br />
#I like archiving stuff. What archiving tools do you know of and recommend?<br />
#Is there a way that I can save whole sites to the Wayback Machine without using the ArchiveBot channel? I suspect not, but there still might be a chance.<br />
#Why doesn't the ArchiveTeam make C++ ports of their Python tools?<br />
#When I try to use Wget, I get this error in the command prompt: ''Connecting to SITENAME (SITENAME)|IP|:PORT... failed: Bad file descriptor.'' Do you know how to fix this problem?<br />
<br />
I hope you're not too annoyed by these questions, like the others would probably be. [[User:Archive Maniac|Archive Maniac]] 12:01, 20 November 2014 (EST)</div>Archive Maniachttps://wiki.archiveteam.org/index.php?title=User_talk:Bzc6p&diff=20716User talk:Bzc6p2014-11-20T17:01:53Z<p>Archive Maniac: /* ArchiveBot Requests */ Messaged</p>
<hr />
<div>== Re: Some friendly words ==<br />
<br />
Thanks for appreciating my efforts and explaining the ArchiveTeam to me. I thought "#archiveteam-bs" was for off-topic conversation, though. :/ And of course I didn't give up on archiving. Why would I? I'm getting 24 Blu-ray M-Discs next month, in fact. :) Would you be willing to explain the situation to the other users? I'm willing to forgive them if they accept it, and I apologize for my trolling. I'm just glad someone, at the very least, understood my situation and took the time to write to me.<br />
<br />
And I looked at your userpage. I'll see if I can track down some Hungarian sites. You can always use the Google operator "site:.hu" to filter just Hungarian sites. There is, however, [http://donkeykong.gportal.hu/ this site]. I have a backup of it, but not in .warc.gz format. Even worse, Yahoo is stupid enough to be shutting down their first service: dir.yahoo.com (Yahoo! Dir), on 12/31/2014. Stupid Yahoo... <br />
<br />
And by the way, SketchCow disliked the fact that I "asked too many questions". [[User:Archive Maniac|Archive Maniac]] 13:25, 19 October 2014 (EDT)<br />
<br />
:I'm waiting for Wpull to have a Windows release or a Python 2 release. I also stink at Python big time... [[User:Archive Maniac|Archive Maniac]] 17:47, 19 October 2014 (EDT)<br />
::Python 3 unfortunately gets mixed up with Python 2 in the Command Prompt (e.g. "python3" is not recognized as a command). That's why I've stuck to Python 2, because I use the wiki dump tool with that version. Aside from that, I always get errors when attempting installation, like a vcvarsall.bat error, "couldn't find seesaw kit", etc. Python is so user-unfriendly... [[User:Archive Maniac|Archive Maniac]] 17:42, 20 October 2014 (EDT)<br />
<br />
== Any Help on Chat? ==<br />
What's your IRC username? I'd like help getting back into the ArchiveBot & archiveteam-bs channels. And please tell me what discussions are appropriate for the latter; you do have a way with words. :P [[User:Archive Maniac|Archive Maniac]] 20:59, 21 October 2014 (EDT)<br />
<br />
== ArchiveBot Requests ==<br />
<br />
Hey, Bzc6p. Are you willing to take ArchiveBot requests from me? I also like your Hungarian site archiving. I recently archived smb.gportal.hu on my computer. [[User:Archive Maniac|Archive Maniac]] 18:55, 18 November 2014 (EST)<br />
:I have a few more questions (the kind of thing that made users upset at me):<br />
<br />
#I like archiving stuff. What archiving tools do you know of and recommend?<br />
#Is there a way that I can save whole sites to the Wayback Machine without using the ArchiveBot channel? I suspect not, but there still might be a chance.<br />
#Why doesn't the ArchiveTeam make C++ ports of their Python tools?<br />
#When I try to use Wget, I get this error in the command prompt: ''Connecting to SITENAME (SITENAME)|IP|:PORT... failed: Bad file descriptor.'' Do you know how to fix this problem?<br />
<br />
I hope you're not too annoyed by these questions, like the others would probably be. [[User:Archive Maniac|Archive Maniac]] 12:01, 20 November 2014 (EST)</div>Archive Maniachttps://wiki.archiveteam.org/index.php?title=User_talk:Bzc6p&diff=20690User talk:Bzc6p2014-11-18T23:55:53Z<p>Archive Maniac: /* ArchiveBot Requests */ new section</p>
<hr />
<div>== Re: Some friendly words ==<br />
<br />
Thanks for appreciating my efforts and explaining the ArchiveTeam to me. I thought "#archiveteam-bs" was for off-topic conversation, though. :/ And of course I didn't give up on archiving. Why would I? I'm getting 24 Blu-ray M-Discs next month, in fact. :) Would you be willing to explain the situation to the other users? I'm willing to forgive them if they accept it, and I apologize for my trolling. I'm just glad someone, at the very least, understood my situation and took the time to write to me.<br />
<br />
And I looked at your userpage. I'll see if I can track down some Hungarian sites. You can always use the Google operator "site:.hu" to filter just Hungarian sites. There is, however, [http://donkeykong.gportal.hu/ this site]. I have a backup of it, but not in .warc.gz format. Even worse, Yahoo is stupid enough to be shutting down their first service: dir.yahoo.com (Yahoo! Dir), on 12/31/2014. Stupid Yahoo... <br />
<br />
And by the way, SketchCow disliked the fact that I "asked too many questions". [[User:Archive Maniac|Archive Maniac]] 13:25, 19 October 2014 (EDT)<br />
<br />
:I'm waiting for Wpull to have a Windows release or a Python 2 release. I also stink at Python big time... [[User:Archive Maniac|Archive Maniac]] 17:47, 19 October 2014 (EDT)<br />
::Python 3 unfortunately gets mixed up with Python 2 in the Command Prompt (e.g. "python3" is not recognized as a command). That's why I've stuck to Python 2, because I use the wiki dump tool with that version. Aside from that, I always get errors when attempting installation, like a vcvarsall.bat error, "couldn't find seesaw kit", etc. Python is so user-unfriendly... [[User:Archive Maniac|Archive Maniac]] 17:42, 20 October 2014 (EDT)<br />
<br />
== Any Help on Chat? ==<br />
What's your IRC username? I'd like help getting back into the ArchiveBot & archiveteam-bs channels. And please tell me what discussions are appropriate for the latter; you do have a way with words. :P [[User:Archive Maniac|Archive Maniac]] 20:59, 21 October 2014 (EDT)<br />
<br />
== ArchiveBot Requests ==<br />
<br />
Hey, Bzc6p. Are you willing to take ArchiveBot requests from me? I also like your Hungarian site archiving. I recently archived smb.gportal.hu on my computer. [[User:Archive Maniac|Archive Maniac]] 18:55, 18 November 2014 (EST)</div>Archive Maniachttps://wiki.archiveteam.org/index.php?title=User_talk:Bzc6p&diff=20512User talk:Bzc6p2014-10-22T00:59:16Z<p>Archive Maniac: Added topic</p>
<hr />
<div>== Re: Some friendly words ==<br />
<br />
Thanks for appreciating my efforts and explaining the ArchiveTeam to me. I thought "#archiveteam-bs" was for off-topic conversation, though. :/ And of course I didn't give up on archiving. Why would I? I'm getting 24 Blu-ray M-Discs next month, in fact. :) Would you be willing to explain the situation to the other users? I'm willing to forgive them if they accept it, and I apologize for my trolling. I'm just glad someone, at the very least, understood my situation and took the time to write to me.<br />
<br />
And I looked at your userpage. I'll see if I can track down some Hungarian sites. You can always use the Google operator "site:.hu" to filter just Hungarian sites. There is, however, [http://donkeykong.gportal.hu/ this site]. I have a backup of it, but not in .warc.gz format. Even worse, Yahoo is stupid enough to be shutting down their first service: dir.yahoo.com (Yahoo! Dir), on 12/31/2014. Stupid Yahoo... <br />
<br />
And by the way, SketchCow disliked the fact that I "asked too many questions". [[User:Archive Maniac|Archive Maniac]] 13:25, 19 October 2014 (EDT)<br />
<br />
:I'm waiting for Wpull to have a Windows release or a Python 2 release. I also stink at Python big time... [[User:Archive Maniac|Archive Maniac]] 17:47, 19 October 2014 (EDT)<br />
::Python 3 unfortunately gets mixed up with Python 2 in the Command Prompt (e.g. "python3" is not recognized as a command). That's why I've stuck to Python 2, because I use the wiki dump tool with that version. Aside from that, I always get errors when attempting installation, like a vcvarsall.bat error, "couldn't find seesaw kit", etc. Python is so user-unfriendly... [[User:Archive Maniac|Archive Maniac]] 17:42, 20 October 2014 (EDT)<br />
<br />
== Any Help on Chat? ==<br />
What's your IRC username? I want help coming back on the ArchiveBot & archiveteam-bs channel. And please tell me what discussions are appropriate for the latter; you do have a way with words. :P [[User:Archive Maniac|Archive Maniac]] 20:59, 21 October 2014 (EDT)</div>Archive Maniachttps://wiki.archiveteam.org/index.php?title=User_talk:Bzc6p&diff=20509User talk:Bzc6p2014-10-20T21:42:37Z<p>Archive Maniac: Extended reply</p>
<hr />
<div>== Re: Some friendly words ==<br />
<br />
Thanks for appreciating my efforts and explaining the ArchiveTeam to me. I thought "#archiveteam-bs" was for off-topic conversation, though. :/ And of course I didn't give up on archiving. Why would I? I'm getting 24 Blu-ray M-Discs next month, in fact. :) Would you be willing to explain the situation to the other users? I'm willing to forgive them if they accept it, and I apologize for my trolling. I'm just glad someone, at the very least, understood my situation and took the time to write to me.<br />
<br />
And I looked at your userpage. I'll see if I can track down some Hungarian sites. You can always use the Google operator "site:.hu" to filter just Hungarian sites. There is, however, [http://donkeykong.gportal.hu/ this site]. I have a backup of it, but not in .warc.gz format. Even worse, Yahoo is stupid enough to be shutting down their first service: dir.yahoo.com (Yahoo! Dir), on 12/31/2014. Stupid Yahoo... <br />
<br />
And by the way, SketchCow disliked the fact that I "asked too many questions". [[User:Archive Maniac|Archive Maniac]] 13:25, 19 October 2014 (EDT)<br />
<br />
:I'm waiting for Wpull to have a Windows release or a Python 2 release. I also stink at Python big time... [[User:Archive Maniac|Archive Maniac]] 17:47, 19 October 2014 (EDT)<br />
::Python3 unfortunately gets mixed up with Python 2 in the Command Prompt (e.g. python3 is not recognized as a command). That's why I've stuck to Python 2, because I use the wiki dump tool with that version. Aside from that, I always get errors when attempting installation, like vcvarsbatall.bat or something error, couldn't find seesaw kit, etc. Python is so user-unfriendly... [[User:Archive Maniac|Archive Maniac]] 17:42, 20 October 2014 (EDT)</div>Archive Maniachttps://wiki.archiveteam.org/index.php?title=User_talk:Bzc6p&diff=20508User talk:Bzc6p2014-10-20T19:21:35Z<p>Archive Maniac: /* Re: Some friendly words */ Replied</p>
<hr />
<div>== Re: Some friendly words ==<br />
<br />
Thanks for appreciating my efforts and explaining the ArchiveTeam to me. I thought "#archiveteam-bs" was for off-topic conversation, though. :/ And of course I didn't give up on archiving. Why would I? I'm getting 24 Blu-ray M-Discs next month, in fact. :) Would you be willing to explain the situation to the other users? I'm willing to forgive them if they accept it, and I apologize for my trolling. I'm just glad someone, at the very least, understood my situation and took the time to write to me.<br />
<br />
And I looked at your userpage. I'll see if I can track down some Hungarian sites. You can always use the Google operator "site:.hu" to filter just Hungarian sites. There is, however, [http://donkeykong.gportal.hu/ this site]. I have a backup of it, but not in .warc.gz format. Even worse, Yahoo is stupid enough to be shutting down their first service: dir.yahoo.com (Yahoo! Dir), on 12/31/2014. Stupid Yahoo... <br />
<br />
And by the way, SketchCow disliked the fact that I "asked too many questions". [[User:Archive Maniac|Archive Maniac]] 13:25, 19 October 2014 (EDT)<br />
<br />
:I'm waiting for Wpull to have a Windows release or a Python 2 release. I also stink at Python big time... [[User:Archive Maniac|Archive Maniac]] 17:47, 19 October 2014 (EDT)<br />
::Python3 unfortunately gets mixed up with Python 2 in the Command Prompt (e.g. python3 is not recognized as a command). That's why I've stuck to Python 2, because I use the wiki dump tool with that version. [[User:Archive Maniac|Archive Maniac]] 15:21, 20 October 2014 (EDT)</div>Archive Maniachttps://wiki.archiveteam.org/index.php?title=Yahoo!&diff=20489Yahoo!2014-10-20T01:22:15Z<p>Archive Maniac: I was thinking January 2008, but the ArchiveTeam started on January 2009. :P</p>
<hr />
<div>===As of January 2009, Archive Team no longer considers Yahoo a dependable location for data.===<br />
__NOTOC__<br />
<br />
[[File:Yahoo blog 404 new.png|right|300px]]<br />
<br />
This is not based on their engineering, which has shown itself to be consistent, with few outages. Rather, it appears the company is in relative free-fall with regards to which projects they will maintain and what comes under any given knife for cost-cutting measures.<br />
<br />
When a company enters this sort of spiral with regard to one of their core businesses (hosting and providing of information services), and consistently gives little or no indication of their next move, it becomes incumbent upon the users of that service to either demand changes in policy, or find alternatives, even poor ones, and build those up.<br />
<br />
When a company decides (or, more accurately, someone with the company decides) that a website or sub-site is no longer viable, then it's living on borrowed time. Like a store closing, or a very sick pet, it becomes a matter of how to bring things to a close. This is entirely up to the closing party, and from their behavior, we can see how they will consider doing this.<br />
<br />
Previously, Yahoo showed some level of restraint in how they would shut down services. For example, when [[Yahoo! Photos]], a photo sharing site, was closed in favor of the bright and shiny new property [[Flickr]], it was announced, a special site was provided to assist users in transferring their photos to other sites, and there was an opportunity to purchase an archive CD of your content.<ref>http://help.yahoo.com/l/us/yahoo/photos/photos3/closing/closing-02.html</ref> It should be noted, however, that [[Yahoo! Photos]] was closed over much protest from its userbase, who in some cases had no interest in transferring to [[Flickr]] and wished merely to keep their own interface.<br />
<br />
But now, Yahoo seems to have no issues with very quick shutdown, with little warning, and almost no regard for the quality of the site.<br />
<br />
Some examples of this new behavior:<br />
<br />
* Yahoo is closing [[GeoCities]] down "later this year." <ref>http://help.yahoo.com/l/us/yahoo/geocities/geocities-05.html</ref> Time to start mirroring...<br />
* Yahoo closed [http://www.crunchbase.com/product/yahoo-brickhouse Brickhouse], their in-house development and prototype department (think of it as an incubator) in December of 2008. They were swift enough to close down the building within weeks. <ref>http://george08.blogspot.com/2008/12/not-quite-what-i-had-in-mind.html</ref><br />
* In December of 2008, Yahoo began layoffs at [[Flickr]], a site previously untouchable, including George Oates<ref>http://george08.blogspot.com/2008/12/not-quite-what-i-had-in-mind.html</ref>, who designed the interface of [[Flickr]], and championed the site's interaction with the "Commons", including the US Library of Congress, and making Creative Commons licenses the default for [[Flickr]]'s photo uploads. Oates was laid off mid-trip on a fact-finding and information trip for Yahoo, having met and advocated [[Flickr]] to a number of prominent folks. <ref>http://www.guardian.co.uk/technology/blog/2008/dec/11/yahoo-flickr-layoffs</ref><br />
* On or about January 27, 2009, with ''absolutely no notice'', [[Yahoo Pets]] was shut down, all content removed from the web, and completely redirected under another Yahoo property, [[Shine]]. <ref>http://blog.dogster.com/2009/01/28/yahoo-quietly-shutters-yahoo-pets-grin/</ref><br />
* At the end of 2014, Yahoo! gave three months' notice that it would kill [[Yahoo! Directory]].<ref>Peter Bright. "[http://arstechnica.com/information-technology/2014/09/yahoo-killing-off-yahoo-after-20-years-of-hierarchical-organization/ Yahoo killing off Yahoo after 20 years of hierarchical organization]". ''Ars Technica'', 2014-09-26. Accessed 2014-09-28.</ref> Now what is left?<br />
<br />
===Please do not use Yahoo or Yahoo-owned sites for any non-retrievable personal data.===<br />
<br />
Non-retrievable data means that there is no export function, or way to pull your personal data off the site. You should continue to use it if you can be assured that the Yahoo function you are using will not dramatically affect your life if it disappears tomorrow. Because it might.<br />
<br />
===Yahoo Services===<br />
[[File:Yahooblogman.png|right]]<br />
* [[Flickr]]<br />
* [[Delicious]]<br />
* [[GeoCities]]<br />
* [[MyBlogLog]]<br />
* [[Qwiki]]<br />
* [[Tumblr]]<br />
* [[Yahoo! Answers]]<br />
* [[Yahoo! Blog]]<br />
* [[Yahoo! Briefcase]]<br />
* [[Yahoo! Buzz]]<br />
* [[Yahoo! Directory]]<br />
* [[Yahoo! Education]]<br />
* [[Yahoo! Groups]]<br />
* [[Yahoo! Messages]]<br />
* [[Yahoo! Upcoming]]<br />
* [[Yahoo! Video]]<br />
* [[Yahoo! Voices]]<br />
* [[Wretch|Wretch.cc]]<br />
<br />
===And a Short Time Later....===<br />
<br />
[[Image:Z89.png|center|700px]]<br />
<br />
<center>'''"Can't wait to find out how you got the web cast. Whoever it is, gone!"'''<br>''- Blake Irving, Yahoo! Chief Product Officer, showing his proficiency for trashing things''</center><br />
<br />
Follow their last deeds at https://twitter.com/YahooVictims<br />
<br />
== References ==<br />
<references/><br />
<br />
== External Links ==<br />
<br />
* {{w|List of mergers and acquisitions by Yahoo!}}<br />
* {{w|Timeline of Yahoo!}}<br />
<br />
{{Navigation box}}<br />
<br />
[[Category:Yahoo!]]<br />
[[Category:Corporations]]</div>Archive Maniachttps://wiki.archiveteam.org/index.php?title=User_talk:Bzc6p&diff=20479User talk:Bzc6p2014-10-19T21:47:41Z<p>Archive Maniac: Replied</p>
<hr />
<div>== Re: Some friendly words ==<br />
<br />
Thanks for appreciating my efforts and explaining the ArchiveTeam to me. I thought "#archiveteam-bs" was for off-topic conversation, though. :/ And of course I didn't give up on archiving. Why would I? I'm getting 24 Blu-ray M-Discs next month, in fact. :) Would you be willing to explain the situation to the other users? I'm willing to forgive them if they accept it, and I apologize for my trolling. I'm just glad someone, at the very least, understood my situation and took the time to write to me.<br />
<br />
And I looked at your userpage. I'll see if I can track down some Hungarian sites. You can always use the Google operator "site:.hu" to filter just Hungarian sites. There is, however, [http://donkeykong.gportal.hu/ this site]. I have a backup of it, but not in .warc.gz format. Even worse, Yahoo is stupid enough to be shutting down their first service: dir.yahoo.com (Yahoo! Dir), on 12/31/2014. Stupid Yahoo... <br />
<br />
And by the way, SketchCow disliked the fact that I "asked too many questions". [[User:Archive Maniac|Archive Maniac]] 13:25, 19 October 2014 (EDT)<br />
<br />
:I'm waiting for Wpull to have a Windows release or a Python 2 release. I also stink at Python big time... [[User:Archive Maniac|Archive Maniac]] 17:47, 19 October 2014 (EDT)</div>Archive Maniachttps://wiki.archiveteam.org/index.php?title=User_talk:Chfoo&diff=20475User talk:Chfoo2014-10-19T17:31:17Z<p>Archive Maniac: /* Yahoo Stinks -- Look at What They Are Going to Shut Down Now... */ new section</p>
<hr />
<div>You might want to add this to the Archive. It is Donkey Kong Country GBA's official European site: http://www.mediafire.com/download/y3ydjsg259a3rw3. For more backups (mainly the file contents), check out my other archives here: http://www.dkc-atlas.com/forum/viewtopic.php?f=26&t=1861. We just have to be careful, as sites like NoE's can shut down at any time.<br />
<br />
I hope I helped the Archive Team! :) [[User:Archive Maniac|Archive Maniac]] 11:17, 12 February 2014 (EST)<br />
<br />
:Excuse me, but do you know how I could manually run the Warrior's python scripts? It seems that items download a lot faster with it. Could you please teach me? I also do need help on how to upload the downloaded wikis with "uploader.py". If you could help me with these two problems, I'd much appreciate it. Thanks. [[User:Archive Maniac|Archive Maniac]] 00:39, 17 February 2014 (EST)<br />
<br />
::I have an idea: how about we contribute sites to the Archive Team, whether or not they have been shut down? That way, we have a backup. Also, the best way to back up your files is via (re)writable DVDs and/or CDs. And how do you submit your collections to Archive Team? [[User:Archive Maniac|Archive Maniac]] 14:56, 18 March 2014 (EDT)<br />
<br />
== Suggestion: Allowing People to Submit Sites to the Archiveteam ==<br />
<br />
I have a suggestion: why don't we allow users to submit sites to the ArchiveTeam on the Internet Archive (archive.org)? I have already done an example site here: https://archive.org/details/site-wwwangelfirecomazdixieden. Just like WikiTeam uses "dumpgenerator.py", I think users should use wget for downloading websites. Beginners could use HTTrack if they desire.<br />
<br />
I have also set some conventional rules, based on those of the WikiTeam's:<br />
#The item should have a page title following this format: "Website - SITENAMEGOESHERE".<br />
#The URL identifier should have this format: "site-SITEURLGOESHERE". I believe it should exclude the periods and "http://" from the site's URL, but "www" should be kept, since some sites have it in their URLs while others don't (e.g. if a site's URL is http://www.nintendo.co.jp, it should have an identifier of "site-wwwnintendocojp").<br />
#Users should enter the keywords "archiveteam" and "web" (maybe the ArchiveTeam can create a website submission collection with its own keyword).<br />
#Users should compress the folder the site was downloaded in as a 7z file. It should be named in this format: SITENAMENOPERIODS-YYYYMMDD.<br />
<br />
I hope you like this idea. :) [[User:Archive Maniac|Archive Maniac]] 14:05, 12 April 2014 (EDT)<br />
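The naming convention proposed above can be pinned down with a small helper (a hypothetical function, shown only to make the rule precise):

```python
def site_identifier(url):
    """Derive an archive.org identifier from a site URL per the proposed
    rule: drop the scheme and every period, but keep any leading "www"."""
    for scheme in ("http://", "https://"):
        if url.startswith(scheme):
            url = url[len(scheme):]
            break
    return "site-" + url.rstrip("/").replace(".", "")

print(site_identifier("http://www.nintendo.co.jp"))  # site-wwwnintendocojp
```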
<br />
:I can't seem to use the ArchiveBot. I don't have the necessary permissions... And how do I upload sites to the Internet Archive? Do I do it a different way or the same way? :/ [[User:Archive Maniac|Archive Maniac]] 21:31, 13 April 2014 (EDT)<br />
<br />
== Wpull EXE File ==<br />
<br />
Will there ever be an EXE/Windows version of wpull? I'd appreciate it if there will be. I also messaged [[User talk:Nemo bis|Nemo bis]] about the unarchived sites of swipnet.se. I was hoping you can do something about this. Thanks. [[User:Archive Maniac|Archive Maniac]] 11:11, 7 August 2014 (EDT)<br />
<br />
== Yahoo Stinks -- Look at What They Are Going to Shut Down Now... ==<br />
<br />
Because you're the programmer here, I thought I might tell you that http://dir.yahoo.com (Yahoo Directory!) is shutting down on December 31, 2014. It's a shame it's on the month and date that my IRC username states. *_* But here's the proof: [http://www.theverge.com/2014/9/27/6854139/yahoo-directory-once-the-center-of-a-web-empire-will-shut-down news article].<br />
<br />
Yahoo sucks. They're nuking their first service. They buy out stuff and completely destroy it all.<br />
<br />
Oh yeah, and I'm sorry for some of my trolling. [[User:Archive Maniac|Archive Maniac]] 13:31, 19 October 2014 (EDT)</div>Archive Maniachttps://wiki.archiveteam.org/index.php?title=User_talk:Bzc6p&diff=20474User talk:Bzc6p2014-10-19T17:25:27Z<p>Archive Maniac: Another notice</p>
<hr />
<div>Thanks for appreciating my efforts and explaining the ArchiveTeam to me. I thought "#archiveteam-bs" was for off-topic conversation, though. :/ And of course I didn't give up on archiving. Why would I? I'm getting 24 Blu-ray M-Discs next month, in fact. :) Would you be willing to explain the situation to the other users? I'm willing to forgive them if they accept it, and I apologize for my trolling. I'm just glad someone, at the very least, understood my situation and took the time to write to me.<br />
<br />
And I looked at your userpage. I'll see if I can track down some Hungarian sites. You can always use the Google operator "site:.hu" to filter just Hungarian sites. There is, however, [http://donkeykong.gportal.hu/ this site]. I have a backup of it, but not in .warc.gz format. Even worse, Yahoo is stupid enough to be shutting down their first service: dir.yahoo.com (Yahoo! Dir), on 12/31/2014. Stupid Yahoo... <br />
<br />
And by the way, SketchCow disliked the fact that I "asked too many questions". [[User:Archive Maniac|Archive Maniac]] 13:25, 19 October 2014 (EDT)</div>Archive Maniachttps://wiki.archiveteam.org/index.php?title=User_talk:Bzc6p&diff=20473User talk:Bzc6p2014-10-19T16:13:44Z<p>Archive Maniac: Replied</p>
<hr />
<div>Thanks for appreciating my efforts and explaining the ArchiveTeam to me. I thought "#archiveteam-bs" was for off-topic conversation, though. :/ And of course I didn't give up on archiving. Why would I? I'm getting 24 Blu-ray M-Discs next month, in fact. :) Would you be willing to explain the situation to the other users? I'm willing to forgive them if they accept it, and I apologize for my trolling. I'm just glad someone, at the very least, understood my situation and took the time to write to me.<br />
<br />
And I looked at your userpage. I'll see if I can track down some Hungarian sites. You can always use the Google operator "site:.hu" to filter just Hungarian sites. There is, however, [http://donkeykong.gportal.hu/ this site]. I have a backup of it, but not in .warc.gz format. Even worse, Yahoo is stupid enough to be shutting down their first service: dir.yahoo.com (Yahoo! Dir), on 12/31/2014. Stupid Yahoo... [[User:Archive Maniac|Archive Maniac]] 12:13, 19 October 2014 (EDT)</div>Archive Maniachttps://wiki.archiveteam.org/index.php?title=Audit2014&diff=20355Audit20142014-10-07T01:40:22Z<p>Archive Maniac: /* Misc */ Collection is not affiliated with the ArchiveTeam. Removed from list.</p>
<hr />
<div>We've uploaded a bunch of stuff: [https://archive.org/search.php?query=subject:Archiveteam https://archive.org/search.php?query=subject:Archiveteam]<br />
<br />
Let's go through the list and make sure it's categorized, has decent metadata, etc.<br />
<br />
Many of our uploads are quite large, and have been broken into many items on Archive.org. We'll group them together here and verify each set all at once.<br />
<br />
== Things to check ==<br />
<br />
; Collection : Are all the related items grouped into a collection?<br />
; Description : Can a visitor figure out what each item represents? Items in a collection don't need to repeat the description of the collection, but it'd be nice if they had a sentence or two, and information about how the item differs from the other items in the collection ("MP3s from earbits.com, files starting with c." from the Earbits items is a good example.)<br />
; Inclusion : Are all the related items included in the same collection?<br />
; Categorization : Can a visitor find the item by browsing the collections?<br />
; Cross-references : Can a visitor find other items in a set, starting at any item in the set? Can a visitor find the index of a large set starting from any part of it?<br />
; Indexing : If the item is a collection of sub-items, is one of these sub-items an index of the others? (This is a complicated thing to check for and to create when it doesn't exist, so we can come back to this after we've checked the rest.)<br />
; Your suggestion here : this is just off the top of my head.<br />
<br />
== Current Sub-Collections at Archive Team ==<br />
<br />
{| class="wikitable sortable"<br />
|-<br />
!Collection<br />
!Status<br />
!Auditor<br />
!Item Count<br />
!Has an Index<br />
!Description of Audit<br />
|-<br />
| '''[https://archive.org/search.php?query=earbits No Category]''' || Unaudited || || 98 || Yes || The items are not in a collection. Most items are WARCs; the rest need additional work if anyone is going to be able to find the exact MP3 they want.<br />
|-<br />
| [http://archive.org/details/archiveteam_ptch archiveteam_ptch] || Audited || db48x || 50 || No || Collection has great description, but no categories. Items in collection are WARCS. One item not included in the collection: [https://archive.org/details/deathy-s3-test-ptch deathy-s3-test-ptch]<br />
|-<br />
| [http://archive.org/details/archiveteam_flowerpot archiveteam_flowerpot] || Audited || db48x || 406 || No || The description of the collection is anemic, but each item is well-identified.<br />
|-<br />
| [http://archive.org/details/github_files github_files] || Audited || db48x || 1 || No || Pretty bad shape. Only one item in the collection, and that's only half the data. Was the rest never uploaded? Has no description, keywords or other metadata. Other Github items could be included, such as [https://archive.org/details/archiveteam-github-repository-index-201212 this repository index], and [https://archive.org/search.php?query=ArchiveTeam%20GitHub%20file%20downloads these other file downloads]<br />
|-<br />
| [http://archive.org/details/justintv justintv] || Audited || db48x || 189 || <s>No</s> [http://chfoo-cn.mooo.com/~archiveteam/justintv-index/html/ Partial] [https://github.com/ArchiveTeam/justintv-index (Src)]|| Decent description, but no other metadata. There are [https://archive.org/search.php?query=justintv%20and%20-collection%3A%28justintv%29 51 other 'justintv' items], but none of them look to be from us.<br />
|-<br />
| [http://archive.org/details/archiveteam_mochimedia archiveteam_mochimedia] || Audited || db48x || 9 || No || Collection includes Mochi's notice about the shutdown, but no other context. The items are all WARCs, and all have CDXs and JSON indexes, but there's no overall index.<br />
<br />
Index can be easily generated from [https://web.archive.org/web/*/http://feedmonger.mochimedia.com/feeds/query/?q=search%3A&limit=81563 this 26MB JSON file]--chfoo<br />
|-<br />
| [http://archive.org/details/archivebot archivebot] || Unaudited || || || ||<br />
|-<br />
| [http://archive.org/details/archiveteam_yahooblogs archiveteam_yahooblogs] and [https://archive.org/details/archiveteam_yahooblog archiveteam_yahooblog] || Audited || db48x || 49 || No || Collection description is just the shutdown notice (and apparently quite a brief one at that) with no other context. Items are all WARCs, and all have CDXs and JSON indexes, but there's no overall index. One item is orphaned in a collection of its own; apparently caused by a typo in the collection name. <br />
|-<br />
| [http://archive.org/details/archiveteam-splinder archiveteam-splinder] || Unaudited || || || ||<br />
|-<br />
| [http://archive.org/details/archiveteam-picplz archiveteam-picplz] || Audited || db48x || 141 || Yes || The collection description is just the shutdown message, with no other context. Items are tarballs containing WARCs. There is an index, but it's not a part of the collection ([https://ia601401.us.archive.org/18/items/picplz-00454713-20120603-143400.warc/]). There's also a search page for the index, which is great.<br />
|-<br />
| [http://archive.org/details/archiveteam_puush archiveteam_puush] || Audited || db48x || 1781 || || The collection description is just the shutdown notice, but it's better than average; it includes some context. The items are all WARCs with CDXs, but there's no central index.<br />
|-<br />
| [http://archive.org/details/archiveteam_upcoming archiveteam_upcoming] || Audited || dashcloud1 || 142 || No || The collection description only describes the site, not the items themselves. Individual items have no description of any kind.<br />
|-<br />
| [http://archive.org/details/archiveteam_randomfandom archiveteam_randomfandom] || Audited || dashcloud1 || 42 || Yes || Short collection description, but has an index, and every collection item is well described. Index is located right on collection page.<br />
|-<br />
| [http://archive.org/details/archiveteam_antecedents archiveteam_antecedents] || Audited || db48x || 46 || N/A || This collection represents multiple sites, rather than multiple parts of a single large site. The collection description is quite brief, but each item appears to have a paragraph describing what the site is/was, as well as some basic metadata such as keywords. All the items appear to be WARCs with CDXs.<br />
|-<br />
| [http://archive.org/details/archiveteam_jazzhands archiveteam_jazzhands] || Audited || db48x || 443 || No || This one is a collection of items from multiple sites, but those sites are also broken up into multiple items based on when they were scanned. The items have brief descriptions and some keywords, and are WARCs with CDXs. A good way to improve this would be to make collections for each site as subcollections.<br />
|-<br />
| [http://archive.org/details/archiveteam-mobileme-hero archiveteam-mobileme-hero] || Unaudited || || 4007 || [https://ia600403.us.archive.org/30/items/archiveteam-mobileme-index/mobileme-20120817.html Yes] [https://github.com/ArchiveTeam/mobileme-index (source)] ||<br />
|-<br />
| [http://archive.org/details/archiveteam_myopera archiveteam_myopera] || Audited || dashcloud1 || 155 || No || Collection page has a nice description of the site and the items. The items all appear to have WARCs, but carry no descriptions or keywords of any kind.<br />
|-<br />
| [http://archive.org/details/archiveteam_bebo archiveteam_bebo] || Unaudited || || || ||<br />
|-<br />
| [http://archive.org/details/archiveteam_dogster archiveteam_dogster] || Audited || jscott || 55 || ??? || Collection well described. Wayback Machine-ready WARCs, all integrated.<br />
|-<br />
| [http://archive.org/details/hyves hyves] || Unaudited || || || ||<br />
|-<br />
| [http://archive.org/details/archiveteam_wretch archiveteam_wretch] || Unaudited || || || ||<br />
|-<br />
| [http://archive.org/details/archiveteam_xanga archiveteam_xanga] || Unaudited || || || ||<br />
|-<br />
| [http://archive.org/details/twitterstream twitterstream] || Unaudited || || || ||<br />
|-<br />
| [http://archive.org/details/pastebinpastes pastebinpastes] || Unaudited || || || ||<br />
|-<br />
| [http://archive.org/details/archiveteam-googlegroups-th archiveteam-googlegroups-th] || Unaudited || || || ||<br />
|-<br />
| [http://archive.org/details/archiveteam_zapd archiveteam_zapd] || Unaudited || || || ||<br />
|-<br />
| [http://archive.org/details/archiveteam_patch archiveteam_patch] || Unaudited || || || ||<br />
|-<br />
| [http://archive.org/details/archiveteam_posterous archiveteam_posterous] || Unaudited || || || ||<br />
|-<br />
| [http://archive.org/details/archiveteam_greader archiveteam_greader] || Unaudited || || || ||<br />
|-<br />
| [http://archive.org/details/archiveteam_ignsites archiveteam_ignsites] || Unaudited || || || ||<br />
|-<br />
| [http://archive.org/details/archiveteam_g4tv_forums archiveteam_g4tv_forums] || Unaudited || || || ||<br />
|-<br />
| [http://archive.org/details/archiveteam-yahoovideo archiveteam-yahoovideo] || Unaudited || || || ||<br />
|-<br />
| [http://archive.org/details/archive-team-friendster archive-team-friendster] || Unaudited || || || ||<br />
|-<br />
| [http://archive.org/details/archiveteam_formspring archiveteam_formspring] || Unaudited || || || ||<br />
|-<br />
| [http://archive.org/details/archiveteam_yahoo_messages archiveteam_yahoo_messages] || Unaudited || || || ||<br />
|-<br />
| [http://archive.org/details/archiveteam_punchfork archiveteam_punchfork] || Unaudited || || || ||<br />
|-<br />
| [http://archive.org/details/yahoo_korea_blogs yahoo_korea_blogs] || Unaudited || || || ||<br />
|-<br />
| [http://archive.org/details/archiveteam-cinch archiveteam-cinch] || Unaudited || || || ||<br />
|-<br />
| [http://archive.org/details/archiveteam_dailybooth archiveteam_dailybooth] || Unaudited || || || ||<br />
|-<br />
| [http://archive.org/details/archiveteam_weblognl archiveteam_weblognl] || Unaudited || || || ||<br />
|-<br />
| [http://archive.org/details/stage6 stage6] || Unaudited || || || ||<br />
|-<br />
| [http://archive.org/details/googlegroups-part2 googlegroups-part2] || Unaudited || || || ||<br />
|-<br />
| [http://archive.org/details/archiveteam-btinternet archiveteam-btinternet] || Unaudited || || || ||<br />
|-<br />
| [http://archive.org/details/archiveteam-qaudio-archive archiveteam-qaudio-archive] || Unaudited || || || ||<br />
|-<br />
| [http://archive.org/details/webshots-freeze-frame webshots-freeze-frame] || Unaudited || || || ||<br />
|-<br />
| [http://archive.org/details/tabblo-archive tabblo-archive] || Unaudited || || || ||<br />
|-<br />
| [http://archive.org/details/archiveteam-fortunecity archiveteam-fortunecity] || Unaudited || || || ||<br />
|-<br />
| [http://archive.org/details/2012-04-30-wikimedia-images-snapshot 2012-04-30-wikimedia-images-snapshot] || Unaudited || || || ||<br />
|-<br />
| [http://archive.org/details/archiveteam-anyhub archiveteam-anyhub] || Unaudited || || || ||<br />
|-<br />
| [http://archive.org/details/archiveteam-fileplanet archiveteam-fileplanet] || Unaudited || || || ||<br />
|-<br />
| [http://archive.org/details/archiveteam-umich-save archiveteam-umich-save] || Unaudited || || || ||<br />
|-<br />
| [http://archive.org/details/archiveteam-geocities archiveteam-geocities] || Unaudited || || || ||<br />
|-<br />
| [http://archive.org/details/archiveteam-fire archiveteam-fire] || Unaudited || || || ||<br />
|-<br />
| [http://archive.org/details/archiveteam-mypodcast archiveteam-mypodcast] || Unaudited || || || ||<br />
|-<br />
| [http://archive.org/details/archiveteam-googlegroups archiveteam-googlegroups] || Unaudited || || || ||<br />
|-<br />
| isohunt dumps [https://archive.org/details/isohunt.teapot.2013 1] [https://archive.org/details/isohunt.croissant.2013 2] [https://archive.org/details/isohunt.coffeepot.2013 3] || Unaudited || || || || These are not yet in a dedicated collection, and have never been post-processed. Some of the .torrent files may actually be error pages. This needs work, and proper full auditing.<br />
|-<br />
| '''[https://archive.org/search.php?query=streetfiles No Category]''' || Unaudited || || || ||<br />
|-<br />
| [https://archive.org/details/archiveteam_yahoovoices archiveteam_yahoovoices] || Unaudited || || || ||<br />
|-<br />
| [https://archive.org/details/archiveteam_twitchtv archiveteam_twitchtv] || Unaudited || || || [http://chfoo-cn.mooo.com/~archiveteam/twitchtv-index/html/ Yes] [https://github.com/ArchiveTeam/twitchtv-index/ (source)] ||<br />
|-<br />
| [https://archive.org/details/archiveteam_fotopedia archiveteam_fotopedia] || Unaudited || || || ||<br />
|-<br />
| [https://archive.org/details/archiveteam_canvas archiveteam_canvas] || Unaudited || || || ||<br />
|-<br />
| [https://archive.org/details/archiveteam_ancestry archiveteam_ancestry] || Unaudited || || || ||<br />
|}<br />
<br />
== [[:Category:In_progress|In progress???]] ==<br />
<br />
But what happened after? Where are the archives?<br />
<br />
* [[BerliOS]]<br />
* [[Deletionpedia]]<br />
* [[Delicious]]<br />
* [[ExtraTorrent]]<br />
* [[Free ProHosting]]<br />
* [[Google Video]]<br />
* [[Ispygames]]<br />
* [[Len Sassaman Project]]<br />
* [[Lulu Poetry]]<br />
* [[Prodigy.net]]<br />
* [[Resedagboken]]<br />
* [[ScreenshotsDatabase.com]]<br />
* [[Spanish Revolution]]: Is this finished?<br />
* [[University of Michigan personal webpages]]<br />
* [[Wallbase]]<br />
* [[Wallhaven]]<br />
* [[Webmonkey]]<br />
* [[Widgetbox]]<br />
* [[Windows Live Spaces]]<br />
<br />
== Oddities, Mislocations, and To Do ==<br />
<br />
* https://archive.org/search.php?query=earbits Earbits gathering is in the wrong place and needs additional versions.<br />
* https://archive.org/details/archiveteam_yahooblog_20140123193921 is misplaced in a non-ArchiveTeam collection.<br />
* The wiki front page needs updating.<br />
<br />
=== To be moved to better collection ===<br />
<br />
==== Orphaned [[Canv.as]] ====<br />
* https://archive.org/details/archiveteam_canvas_20140812090142<br />
* https://archive.org/details/archiveteam_canvas_20140812144024<br />
* https://archive.org/details/archiveteam_canvas_20140815175099<br />
* https://archive.org/details/archiveteam_canvas_20140812085210<br />
* https://archive.org/details/archiveteam_canvas_20140812090945<br />
<br />
==== Orphaned [[Twitch.tv]] ====<br />
* https://archive.org/details/archiveteam_twitchtv_20140811223313<br />
<br />
==== WARC ====<br />
* https://archive.org/details/pouet.com_full_grab no WARC file visible for me<br />
* https://archive.org/details/archiveteam_punchfork_archive-archive<br />
* https://archive.org/details/sg1archive.com_forums_20140708<br />
* https://archive.org/details/2013_misc_warcs_02<br />
* https://archive.org/details/2013_misc_warcs_01<br />
* https://archive.org/details/site-donkeyboytripodcom<br />
* https://archive.org/details/site-homeswipnetseclubnintendo007<br />
* https://archive.org/details/site-homeswipnetsecpg<br />
* https://archive.org/details/site-homeswipnetsegamemaster<br />
* https://archive.org/details/homeswipnetsenestabs<br />
* https://archive.org/details/Site-homeswipnetsew-62848<br />
* https://archive.org/details/site-homeswipnetsesofiasgbc<br />
* https://archive.org/details/site-homeswipnetsexcheatsdk<br />
* https://archive.org/details/site-home2swipnetsew26120<br />
* https://archive.org/details/site-home3.swipnet.se-w38081<br />
* https://archive.org/details/site-home4swipnetse-w42641<br />
* https://archive.org/details/site-home4swipnetse-w46722<br />
* https://archive.org/details/site-homeswipnetsefredde2000<br />
* https://archive.org/details/ubuntuone-panicgrab-20140405<br />
* https://archive.org/details/myopera-forums-1700001-1800000<br />
* https://archive.org/details/myopera-forums-1800001-1823192<br />
* https://archive.org/details/rawporter.s3.amazonaws.com_20140616_partial<br />
* https://archive.org/details/technet.microsoft.com-panicgrab-20130706<br />
* https://archive.org/details/isohunt_facebook_page_snapshot WARC and other formats<br />
* https://archive.org/details/Misc.yero.orgMusic<br />
* https://archive.org/details/telinco.co.uk_pages<br />
* https://archive.org/details/tribes_forum_emergency_grab<br />
* https://archive.org/details/isohunt-20131019-mithrandir-extra<br />
* https://archive.org/details/cscope.us-google-pdfs-grab-20130312<br />
* https://archive.org/details/cscope.us-google-pdfs-grab-20130520<br />
* https://archive.org/details/PinkTentacle<br />
* https://archive.org/details/journalstar.com_sports_local_20120730.warc<br />
* https://archive.org/details/www.battleforthenet.com-panicgrab-20140718<br />
* https://archive.org/details/theopeninter.net-panicgrab-20140718<br />
* https://archive.org/details/startupsfornetneutrality.org-panicgrab-20140718<br />
* https://archive.org/details/net.net-panicgrab-20140718<br />
* https://archive.org/details/wwdctimer.com-panicgrab-20140731<br />
* https://archive.org/details/xn--19g.com-panicgrab-20140731<br />
* https://archive.org/details/chromercise.com-panicgrab-20140731<br />
* https://archive.org/details/hiddenfromgoogle.com-panicgrab-20140731<br />
* https://archive.org/details/orteil.dashnet.org-panicgrab-20140731<br />
* https://archive.org/details/pingus.seul.org-panicgrab-20140731<br />
* https://archive.org/details/tux4kids.alioth.debian.org-panicgrab-20140731<br />
* https://archive.org/details/tuxkart.sourceforge.net-panicgrab-20140731<br />
* https://archive.org/details/assets.minecraft.net-panicgrab-20140807<br />
* <nowiki>https://archive.org/details/bmf.*rustedmagick.com-cr-panicgrab-20140808</nowiki> (remove asterisk, spam filter doesn't like this link)<br />
* https://archive.org/details/tppx.herokuapp.com-panicgrab-20140808<br />
* https://archive.org/details/nintendo-warcs<br />
* https://archive.org/details/www.battleforthenet.com-panicgrab-20140912<br />
* https://archive.org/details/mojang.com-notch-panicgrab-20140912<br />
<br />
==== FTP ====<br />
* https://archive.org/details/ftp.idsoftware.com <br />
* https://archive.org/details/ftp.lucasarts.com-20130427<br />
* https://archive.org/details/ftp.santronics.com<br />
* https://archive.org/details/2014.02.ftp.inf.tuDresden.deAtari<br />
<br />
==== Misc ====<br />
<br />
* https://archive.org/details/archiveteam-picplz-index<br />
* https://archive.org/details/Posterous.comHostnames<br />
* https://archive.org/details/YahooBlogSitemaps20131216071927<br />
* https://archive.org/details/archiveteam-mobileme-index<br />
* https://archive.org/details/archiveteam-twitter-stream-2014-05<br />
* https://archive.org/details/ESPNForumsPanicgrab<br />
* https://archive.org/details/rawporter-grab<br />
* https://archive.org/details/bitsnoop-dump<br />
* https://archive.org/details/CaliforniaFinanceLobbyData<br />
* https://archive.org/details/ArchiveteamWarriorV220121008Hyperv<br />
* https://archive.org/details/HowFlickr.comLookedLikeIn2010-APlaceOfWorshipOnFlickr-Photo<br />
* https://archive.org/details/myopera_shutdown_notice<br />
* https://archive.org/details/UsenetSci.space.news2003-2012<br />
* https://archive.org/details/Usenet_rec.food.recipesArchive2003-2012<br />
* https://archive.org/details/MirrorOfSiteOrtodoxiesiviata.blogspot.com<br />
* https://archive.org/details/CaliforniaFinanceLobbyData<br />
* https://archive.org/details/carti.itarea.org<br />
* https://archive.org/details/ovmk_story<br />
* https://archive.org/details/ti_guidebook_en<br />
* https://archive.org/details/ti_guidebook_fr<br />
* https://archive.org/details/ti_guidebook_de<br />
* https://archive.org/details/myopera_usernames_FIXED.7z<br />
* https://archive.org/details/DubaiWikipediaPageOn2012-09-06<br />
* https://archive.org/details/digpicz-2008-07-30-website<br />
* https://archive.org/details/site-wwwangelfirecomazdixieden<br />
* https://archive.org/details/ArkiverCrawlsPack0004<br />
* https://archive.org/details/ArkiverCrawlsPack0005<br />
* https://archive.org/details/ArkiverCrawlsPack0007<br />
* https://archive.org/details/ArkiverCrawlsPack0008<br />
* https://archive.org/details/laptops-manuals-dump-from-tim.id.au-20121111<br />
* https://archive.org/details/paste_lisp_org<br />
* https://archive.org/details/MtGoxSituationCrisisStrategyDraft<br />
* https://archive.org/details/MtGoxBusinessPlan20142017<br />
* https://archive.org/details/nyt_innovation_2014<br />
* https://archive.org/details/slackware-irc-logs<br />
* https://archive.org/details/thekeep_bbs<br />
<br />
== [[URLTeam]] ==<br />
<br />
* Upload the latest official torrent release<br />
* Upload the Dropbox files in the URLTeam wiki page table that are *not* in the latest release<br />
* [[user:chfoo]] needs to create a dedicated/shared account on IA and be given permission for automated/semiautomated rolling releases into a URLTeam collection.<br />
<br />
== Missing ==<br />
<br />
* [[Yahoo!_Blog]]: What happened to the Vietnam archives? Does anyone have a copy or at least a blurry screenshot of the Korean shutdown notice?<br />
<br />
[[Category:Archive Team]]</div>Archive Maniachttps://wiki.archiveteam.org/index.php?title=Talk:Valhalla&diff=20214Talk:Valhalla2014-09-21T15:52:37Z<p>Archive Maniac: /* M-Disc Reliability */ new section</p>
<hr />
<div>== Optical is Terrible, But... ==<br />
http://arstechnica.com/information-technology/2014/01/why-facebook-thinks-blu-ray-discs-are-perfect-for-the-data-center/<br />
Both Amazon Glacier and Facebook use Blu-ray media for long-term storage. It doesn't degrade at all like DVD or CD, since it's a completely different process. So it has some serious potential. However, DVD and CD basically suck so hard that they've given optical media a (justifiably) bad name. <br />
<br />
If we WERE to use Blu-ray media, it would probably be best served as a complementary storage method, especially since its initial cost is so low (~$50 for a home writer, plus the per-TB cost is relatively low)<br />
<br />
== M-Disc Reliability ==<br />
<br />
While the M-Disc is new technology, I do believe that it can protect your data for a long, long time. They even went to the effort of making some videos of the disc:<br />
<br />
*https://www.youtube.com/watch?v=bQENbP8npsw - Milleniata commercial<br />
*https://www.youtube.com/watch?v=v4tjgJHc0NQ - On the KSL5 news channel<br />
*https://www.youtube.com/watch?v=-QODg2U0JR0 - Designed to Endure! ad<br />
*https://www.youtube.com/watch?v=CfBEHlzvZnc - Lasagna test (needs a reaffirmation video)<br />
*https://www.youtube.com/watch?v=eFHYsGUf1Aw - M-Disc inspiration<br />
*https://www.youtube.com/watch?v=Y1zKZISYjZU - Testimonials<br />
<br />
I'm pretty sure it's exaggerating a bit, but I personally think it's the archiving solution for anyone. My point is that the description should be slightly altered. [[User:Archive Maniac|Archive Maniac]] 11:52, 21 September 2014 (EDT)</div>Archive Maniachttps://wiki.archiveteam.org/index.php?title=File:Ms_internet_on_a_disc.jpg&diff=20213File:Ms internet on a disc.jpg2014-09-21T15:41:23Z<p>Archive Maniac: uploaded a new version of &quot;File:Ms internet on a disc.jpg&quot;: Larger. Where the hell do you even buy one of these? I really would love one!</p>
<hr />
<div>The Microsoft Internet on a Disk. It solves everything.</div>Archive Maniachttps://wiki.archiveteam.org/index.php?title=User:Archive_Maniac&diff=20212User:Archive Maniac2014-09-21T15:37:49Z<p>Archive Maniac: Changed wording slightly</p>
<hr />
<div>I am a self-proclaimed archivist who is barely affiliated with the ArchiveTeam. I do appreciate their tools, however.<br />
<br />
I was known as Dec-31-99 on the [[IRC]] until I was kicked out for "asking too many questions" (T_T). My tracker name is "Y2KorSketchBeef". SketchBeef is a satirized title of [[User:Jscott|Jscott]]'s IRC name: "SketchCow" (I'm also jealous of that guy). I also volunteer by archiving wikis for the [[WikiTeam]]. It would be wonderful to archive full URLs to the Wayback Machine. I could totally help with that...<br />
<br />
My interests are pretty off from everyone around here, though, so don't expect me to hang around much. :/<br />
<br />
The friendliest people here are probably the quiet workers like yipdw and chfoo.</div>Archive Maniachttps://wiki.archiveteam.org/index.php?title=User:Archive_Maniac&diff=20197User:Archive Maniac2014-09-19T23:02:21Z<p>Archive Maniac: Created userpage.</p>
<hr />
<div>I am a self-proclaimed archivist who is barely affiliated with the ArchiveTeam. I do appreciate their tools, however.<br />
<br />
I was known as Dec-31-99 on the [[IRC]] until I was kicked out for "asking too many questions" (T_T). My tracker name is "Y2KorSketchBeef". SketchBeef is a satirical take on [[User:Jscott|Jscott]]'s IRC name: "SketchCow" (I'm also jealous of that guy). I also volunteer by archiving wikis for the [[WikiTeam]]. It would be wonderful to archive full URLs to the Wayback Machine. I could totally help with that...<br />
<br />
My interests are pretty off from everyone around here, though, so don't expect me to hang around much. :/<br />
<br />
The friendliest people here are probably the quiet workers like yipdw and chfoo.</div>Archive Maniachttps://wiki.archiveteam.org/index.php?title=WikiTeam&diff=20196WikiTeam2014-09-19T22:48:05Z<p>Archive Maniac: /* Official WikiTeam tools */ Changed links to more modern ones.</p>
<hr />
<div><center><big>'''We save wikis, from Wikipedia to tiniest wikis'''<br/>[http://code.google.com/p/wikiteam/wiki/AvailableBackups 15000+ wikis saved to date]</big></center><br />
{{Infobox project<br />
| title = WikiTeam<br />
| image = Wikiteam.jpg<br />
| description = WikiTeam, a set of tools for wiki preservation and a repository of wikis<br />
| URL = http://code.google.com/p/wikiteam<br />
| project_status = {{online}} (at least some of them)<br />
| tracker = manual for now, check [https://wikiapiary.com/wiki/Category:Website_not_archived not archived wikis on wikiapiary]<br />
| archiving_status = {{inprogress}} ([http://code.google.com/p/wikiteam/wiki/NewTutorial you can help])<br />
| irc = wikiteam<br />
}}<br />
<br />
Welcome to '''WikiTeam'''. A '''wiki''' is a website that allows the creation and editing of any number of interlinked web pages, generally used to store information on a specific subject or subjects. This is done from an ordinary web browser, using either a simplified markup language (wikitext, for example) or a WYSIWYG (what-you-see-is-what-you-get) text editor.<br />
<br />
Most wikis don't offer public backups. How bad!<br />
<br />
== Wikis to archive ==<br />
<br />
Please [https://wikiapiary.com/wiki/Special:FormEdit/Website add a wiki to wikiapiary] if you want someone to archive it sooner or later; or tell us on the #wikiteam channel if it's particularly urgent. Remember that there are thousands of wikis we don't even know about yet.<br />
<br />
[http://code.google.com/p/wikiteam/wiki/NewTutorial You can help] by downloading wikis yourself. If you don't know where to start, pick a [https://wikiapiary.com/wiki/Category:Website_not_archived wiki that has not been archived yet] from the lists on wikiapiary. If you can't, edit those pages to link existing dumps! You'll help others focus their work.<br />
<br />
Examples of huge wikis:<br />
* '''[[Wikipedia]]''' - arguably the largest and one of the oldest wikis on the planet. It offers public backups (also for sister projects): http://dumps.wikimedia.org<br />
** They have some mirrors but not many.<br />
** Every now and then we upload a copy to archive.org, but this is not automated. You can do it in our stead. ;)<br />
* '''[[Wikimedia Commons]]''' - a Wiki of media files available for free usage. It offers public backups: http://dumps.wikimedia.org<br />
** But there is no image dump available, only the image descriptions<br />
** So we made it! http://archive.org/details/wikimediacommons<br />
* '''[[Wikia]]''' - a website that allows the creation and hosting of wikis. Doesn't make regular backups.<br />
<br />
There are also '''[[List of wikifarms|several wikifarms]]''' with hundreds of wikis. On this wiki we only create pages for those we have some special information about that we don't want to lose (like archiving history and tips). For a full list, please use wikiapiary: see the [https://wikiapiary.com/wiki/Farm:Main_Page wikifarms main page].<br />
<br />
We're trying to decide which [https://groups.google.com/forum/#!topic/wikiteam-discuss/TxzfrkN4ohA other wiki engines] to work on: suggestions needed!<br />
<br />
== Tools and source code ==<br />
=== Official WikiTeam tools ===<br />
* [http://code.google.com/p/wikiteam/ WikiTeam Google Code repository]<br />
* '''[https://raw.githubusercontent.com/WikiTeam/wikiteam/master/dumpgenerator.py dumpgenerator.py] to download MediaWiki wikis:''' <tt>python dumpgenerator.py --api=http://archiveteam.org/api.php --xml --images</tt><br />
* [https://raw.githubusercontent.com/WikiTeam/wikiteam/master/wikipediadownloader.py wikipediadownloader.py] to download Wikipedia dumps from download.wikimedia.org: <tt>python wikipediadownloader.py</tt><br />
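When you have more than one wiki to save, the <tt>dumpgenerator.py</tt> command above can be driven in a loop. A minimal sketch (the wiki list and the <code>dump_command</code> helper are illustrative, not part of the official tools):<br />

```python
# Sketch: build dumpgenerator.py invocations for a list of MediaWiki API
# endpoints. The --api/--xml/--images flags are the ones shown above;
# the wiki list and the helper name are illustrative.
import subprocess  # only needed if you actually run the commands

def dump_command(api_url):
    """Return the argv list for backing up one wiki with dumpgenerator.py."""
    return ["python", "dumpgenerator.py", "--api=" + api_url, "--xml", "--images"]

wikis = [
    "http://archiveteam.org/api.php",
    "http://example.org/w/api.php",
]

for cmd in map(dump_command, wikis):
    print(" ".join(cmd))
    # subprocess.run(cmd)  # uncomment to actually start the dump
```

For bulk jobs, <code>launcher.py</code> (see the tips below) already wraps this kind of loop.<br />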
<br />
=== Other ===<br />
* [http://dl.dropbox.com/u/63233/Wikitravel/Source%20Code%20and%20tools/Source%20Code%20and%20tools.7z Scripts of a guy who saved Wikitravel]<br />
* [http://www.communitywiki.org/en/BackupThisWiki OddMuseWiki backup]<br />
* UseModWiki: use wget/curl and [http://www.usemod.com/cgi-bin/wiki.pl?WikiPatches/RawMode raw mode] (a given install might have a different URL scheme, like [http://meatballwiki.org/wiki/action=browse&id=TheTippingPoint&raw=1 this])<br />
** Some wikis: [[UseMod:SiteList]]<br />
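For UseModWiki, the raw-mode URLs can be generated programmatically before handing them to wget or curl. A sketch (the query form follows the raw-mode example above; adjust the base URL for installs with a different scheme):<br />

```python
# Sketch: build UseModWiki raw-mode URLs for a list of page names,
# suitable for feeding to wget/curl. The action/id/raw parameters
# match the raw-mode example linked above.
from urllib.parse import urlencode

def raw_url(base, page):
    """Raw-mode URL for one UseModWiki page."""
    return base + "?" + urlencode({"action": "browse", "id": page, "raw": 1})

print(raw_url("http://www.usemod.com/cgi-bin/wiki.pl", "SiteList"))
# → http://www.usemod.com/cgi-bin/wiki.pl?action=browse&id=SiteList&raw=1
```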
<br />
== Wiki dumps ==<br />
<br />
Most of our dumps are in the [http://www.archive.org/details/wikiteam wikiteam collection at the Internet Archive]. If you want an item to land there, just upload it to the "opensource" collection and remember the "WikiTeam" keyword; it will be moved at some point. When you've uploaded enough wikis, you'll probably be made a collection admin to save others the effort of moving your stuff.<br />
<br />
For a manually curated list, [http://code.google.com/p/wikiteam/wiki/AvailableBackups visit the download section] on Google Code.<br />
<br />
Another set of MediaWiki dumps is located [http://mirrors.sdboyd56.com/WikiTeam/index.html here] on [http://www.archiveteam.org/index.php?title=User:Sdboyd Scott's] website.<br />
<br />
=== Tips ===<br />
Some tips:<br />
* When downloading Wikipedia/Wikimedia Commons dumps, pages-meta-history.xml.7z and pages-meta-history.xml.bz2 contain the same data, but the 7z is usually smaller (better compression ratio), so use 7z.<br />
* To download a mass of wikis with N parallel threads, just <code>split</code> your full <code>$list</code> into N chunks, then start N instances of <code>launcher.py</code> ([https://code.google.com/p/wikiteam/wiki/NewTutorial#Download_a_list_of_wikis tutorial]), one for each list<br />
** If you want to upload dumps as they're ready and clean up your storage: at the same time, in a separate window or screen, run a loop of the kind <code>while true; do ./uploader.py $list --prune-directories --prune-wikidump; sleep 12h; done;</code> (the <code>sleep</code> ensures each run has something to do).<br />
** If you want to go advanced and run really ''many'' instances, use <code>tmux</code>[http://blog.hawkhost.com/2010/07/02/tmux-%E2%80%93-the-terminal-multiplexer-part-2/]! Every now and then, attach to the tmux session and look (<code>ctrl-b f</code>) for windows stuck on "is wrong", "is slow" or "......" loops, or which are inactive[http://unix.stackexchange.com/questions/78093/how-can-i-make-tmux-monitor-a-window-for-inactivity]. Even with a couple of cores you can run a hundred instances; just make sure to have enough disk space for the occasional huge ones (tens of GB).<br />
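The list-splitting step above can also be done in Python instead of <code>split</code>. A sketch (the chunking helper and list-file names are illustrative):<br />

```python
# Sketch: split a full wiki list into N contiguous chunks, one chunk per
# launcher.py instance, as described in the tips above.

def chunks(items, n):
    """Split items into n nearly-equal contiguous chunks."""
    size, extra = divmod(len(items), n)
    out, start = [], 0
    for i in range(n):
        end = start + size + (1 if i < extra else 0)
        out.append(items[start:end])
        start = end
    return out

wikis = ["http://wiki%d.example.org/api.php" % i for i in range(10)]
for i, chunk in enumerate(chunks(wikis, 3)):
    # write each chunk to its own list file, then run one instance per file:
    #   python launcher.py list-<i>
    print("list-%d: %d wikis" % (i, len(chunk)))
```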
<br />
=== BitTorrent downloads ===<br />
You can download and seed the torrents from the archive.org collection.<br />
<br />
=== Old mirrors ===<br />
<span class="plainlinks"><br />
# [https://sourceforge.net/projects/wikiteam/files/ Sourceforge] (also mirrored to another 26 mirrors)<br />
# [http://www.archive.org/details/WikiTeamMirror Internet Archive] ([http://ia700705.us.archive.org/16/items/WikiTeamMirror/ direct link] to directory)<br />
</span><br />
<br />
== See also ==<br />
* [[List of wikifarms]]<br />
<br />
== External links ==<br />
* http://wikiindex.org - A lot of wikis to save<br />
* http://wiki1001.com/ offline?<br />
* http://www.cs.brown.edu/~pavlo/mediawiki/mediawikis.csv - 20,000 wikis<br />
* http://meta.wikimedia.org/wiki/List_of_largest_wikis<br />
* http://s23.org/wikistats/<br />
* http://en.wikipedia.org/wiki/Comparison_of_wiki_farms<br />
* http://en.wikipedia.org/wiki/User:Emijrp/Wikipedia_Archive<br />
* http://blog.shoutwiki.com/<br />
* http://wikiheaven.blogspot.com/<br />
* [http://s23.org/wikistats/largest_html.php?th=15000&lines=500 List of largest wikis in the world]<br />
* Dump of [http://nostalgia.wikipedia.org/ nostalgia], an ancient version of Wikipedia from 2001, [http://dumps.wikimedia.org/nostalgiawiki dump]<br />
* http://code.google.com/p/wikiteam/wiki/AvailableBackups many dumps<br />
<br />
{{Navigation box}}<br />
<br />
[[Category:Archive Team]]</div>Archive Maniac