Difference between revisions of "Audit2014"
Revision as of 19:48, 26 August 2014
We've uploaded a bunch of stuff: https://archive.org/search.php?query=subject:Archiveteam
Let's go through the list and make sure it's categorized, has decent metadata, etc.
Many of our uploads are quite large, and have been broken into many items on Archive.org. We'll group them together here and verify each set all at once.
Things to check
- Collection
- Are all the related items grouped into a collection?
- Description
- Can a visitor figure out what each item represents? Items in a collection don't need to repeat the description of the collection, but it'd be nice if they had a sentence or two, and information about how the item differs from the other items in the collection ("MP3s from earbits.com, files starting with c." from the Earbits items is a good example.)
- Inclusion
- Are all the related items included in the same collection?
- Categorization
- Can a visitor find the item by browsing the collections?
- Cross-references
- Can a visitor find other items in a set, starting at any item in the set? Can a visitor find the index of a large set starting from any part of it?
- Indexing
- If the item is a collection of sub-items, is one of these sub-items an index of the others? (This is a complicated thing to check for and to create when it doesn't exist, so we can come back to this after we've checked the rest.)
- Your suggestion here
- this is just off the top of my head.
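A few of the checks above (collection, description, categorization) can be partly automated. The following is a minimal sketch, not an agreed-upon tool. It assumes each item's metadata is available as a dictionary shaped like the `metadata` object that `https://archive.org/metadata/<identifier>` returns, where a field may be absent, a string, or a list. The function names, the messages, and the 20-character threshold are all made up for illustration. Cross-references and indexing still need a human to look at the item.

```python
MIN_DESCRIPTION_CHARS = 20  # arbitrary threshold, chosen only for illustration


def as_list(value):
    """Normalize a metadata field that may be missing, a string, or a list."""
    if value is None:
        return []
    values = [value] if isinstance(value, str) else list(value)
    # Drop blank entries so an empty string does not count as a value.
    return [v.strip() for v in values if isinstance(v, str) and v.strip()]


def audit_item(metadata):
    """Return a list of problems found in one item's metadata dictionary."""
    problems = []

    # Collection / Inclusion: the item should belong to at least one collection.
    if not as_list(metadata.get("collection")):
        problems.append("not in any collection")

    # Description: a sentence or two is enough, but it must exist.
    description = " ".join(as_list(metadata.get("description")))
    if len(description) < MIN_DESCRIPTION_CHARS:
        problems.append("description missing or too short")

    # Categorization: subject keywords, possibly packed as "a;b;c" in one string.
    subjects = [
        part.strip()
        for value in as_list(metadata.get("subject"))
        for part in value.split(";")
        if part.strip()
    ]
    if not subjects:
        problems.append("no subject keywords")

    return problems


if __name__ == "__main__":
    example = {
        "collection": "archiveteam_ptch",
        "description": "MP3s from earbits.com, files starting with c.",
    }
    print(audit_item(example))  # ['no subject keywords']
```

An empty result only means the basic fields are present. It says nothing about whether the description is actually helpful or whether the item sits in the right collection.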
Current Sub-Collections at Archive Team
Collection | Status | Auditor | Item Count | Has an Index | Description of Audit |
---|---|---|---|---|---|
No Category | Unaudited | | 98 | Yes | The items are not in a collection. Most items are WARCs; the rest need additional work if anyone is going to be able to find the exact MP3 they want. |
archiveteam_ptch | Audited | db48x | 50 | No | Collection has a great description, but no categories. Items in the collection are WARCs. One item is not included in the collection: deathy-s3-test-ptch |
archiveteam_flowerpot | Audited | db48x | 406 | No | The description of the collection is anemic, but each item is well-identified. |
github_files | Audited | db48x | 1 | No | Pretty bad shape. Only one item in the collection, and that's only half the data. Was the rest never uploaded? Has no description, keywords, or other metadata. Other GitHub items could be included, such as this repository index, and these other file downloads. |
justintv | Audited | db48x | 189 | | Decent description, but no other metadata. There are 51 other 'justintv' items, but none of them look to be from us. |
archiveteam_mochimedia | Unaudited | ||||
archivebot | Unaudited | ||||
archiveteam_yahooblogs | Unaudited | ||||
archiveteam-splinder | Unaudited | ||||
archiveteam-picplz | Unaudited | ||||
archiveteam_puush | Unaudited | ||||
archiveteam_upcoming | Unaudited | ||||
archiveteam_randomfandom | Unaudited | ||||
archiveteam_antecedents | Unaudited | ||||
archiveteam_jazzhands | Unaudited | ||||
archiveteam-mobileme-hero | Unaudited | ||||
archiveteam_myopera | Unaudited | ||||
archiveteam_bebo | Unaudited | ||||
archiveteam_dogster | Audited | jscott | 55 | ??? | Collection well described. Wayback Machine-ready WARCs, all integrated. |
hyves | Unaudited | ||||
archiveteam_wretch | Unaudited | ||||
archiveteam_xanga | Unaudited | ||||
twitterstream | Unaudited | ||||
pastebinpastes | Unaudited | ||||
archiveteam-googlegroups-th | Unaudited | ||||
archiveteam_zapd | Unaudited | ||||
archiveteam_patch | Unaudited | ||||
archiveteam_posterous | Unaudited | ||||
archiveteam_greader | Unaudited | ||||
archiveteam_ignsites | Unaudited | ||||
archiveteam_g4tv_forums | Unaudited | ||||
archiveteam-yahoovideo | Unaudited | ||||
archive-team-friendster | Unaudited | ||||
archiveteam_formspring | Unaudited | ||||
archiveteam_yahoo_messages | Unaudited | ||||
archiveteam_punchfork | Unaudited | ||||
yahoo_korea_blogs | Unaudited | ||||
archiveteam-cinch | Unaudited | ||||
archiveteam_dailybooth | Unaudited | ||||
archiveteam_weblognl | Unaudited | ||||
stage6 | Unaudited | ||||
googlegroups-part2 | Unaudited | ||||
archiveteam-btinternet | Unaudited | ||||
archiveteam-qaudio-archive | Unaudited | ||||
webshots-freeze-frame | Unaudited | ||||
tabblo-archive | Unaudited | ||||
archiveteam-fortunecity | Unaudited | ||||
2012-04-30-wikimedia-images-snapshot | Unaudited | ||||
archiveteam-anyhub | Unaudited | ||||
archiveteam-fileplanet | Unaudited | ||||
archiveteam-umich-save | Unaudited | ||||
archiveteam-geocities | Unaudited | ||||
archiveteam-fire | Unaudited | ||||
archiveteam-mypodcast | Unaudited | ||||
archiveteam-googlegroups | Unaudited | ||||
isohunt dumps 1 2 3 | Unaudited | | | | These are not yet in a dedicated collection, and have never been post-processed. Some of the .torrent files may actually be error pages. This needs work, and proper full auditing. |
No Category | Unaudited |
Oddities, Mislocations, and To Do
- https://archive.org/search.php?query=earbits Earbits gathering is in the wrong place and needs additional versions.
- https://archive.org/details/archiveteam_yahooblog_20140123193921 is misplaced in a non-ArchiveTeam collection.
To be moved to better collection
WARC
- https://archive.org/details/pouet.com_full_grab no WARC file visible for me
- https://archive.org/details/archiveteam_punchfork_archive-archive
- https://archive.org/details/sg1archive.com_forums_20140708
- https://archive.org/details/2013_misc_warcs_02
- https://archive.org/details/2013_misc_warcs_01
- https://archive.org/details/site-donkeyboytripodcom
- https://archive.org/details/site-homeswipnetseclubnintendo007
- https://archive.org/details/site-homeswipnetsecpg
- https://archive.org/details/site-homeswipnetsegamemaster
- https://archive.org/details/homeswipnetsenestabs
- https://archive.org/details/Site-homeswipnetsew-62848
- https://archive.org/details/site-homeswipnetsesofiasgbc
- https://archive.org/details/site-homeswipnetsexcheatsdk
- https://archive.org/details/site-home2swipnetsew26120
- https://archive.org/details/site-home3.swipnet.se-w38081
- https://archive.org/details/site-home4swipnetse-w42641
- https://archive.org/details/site-home4swipnetse-w46722
- https://archive.org/details/site-homeswipnetsefredde2000
- https://archive.org/details/ubuntuone-panicgrab-20140405
- https://archive.org/details/myopera-forums-1700001-1800000
- https://archive.org/details/myopera-forums-1800001-1823192
- https://archive.org/details/rawporter.s3.amazonaws.com_20140616_partial
- https://archive.org/details/technet.microsoft.com-panicgrab-20130706
- https://archive.org/details/isohunt_facebook_page_snapshot WARC and other formats
- https://archive.org/details/Misc.yero.orgMusic
- https://archive.org/details/telinco.co.uk_pages
- https://archive.org/details/tribes_forum_emergency_grab
- https://archive.org/details/isohunt-20131019-mithrandir-extra
- https://archive.org/details/cscope.us-google-pdfs-grab-20130312
- https://archive.org/details/cscope.us-google-pdfs-grab-20130520
- https://archive.org/details/PinkTentacle
- https://archive.org/details/journalstar.com_sports_local_20120730.warc
- https://archive.org/details/www.battleforthenet.com-panicgrab-20140718
- https://archive.org/details/theopeninter.net-panicgrab-20140718
- https://archive.org/details/startupsfornetneutrality.org-panicgrab-20140718
- https://archive.org/details/net.net-panicgrab-20140718
- https://archive.org/details/wwdctimer.com-panicgrab-20140731
- https://archive.org/details/xn--19g.com-panicgrab-20140731
- https://archive.org/details/chromercise.com-panicgrab-20140731
- https://archive.org/details/hiddenfromgoogle.com-panicgrab-20140731
- https://archive.org/details/orteil.dashnet.org-panicgrab-20140731
- https://archive.org/details/pingus.seul.org-panicgrab-20140731
- https://archive.org/details/tux4kids.alioth.debian.org-panicgrab-20140731
- https://archive.org/details/tuxkart.sourceforge.net-panicgrab-20140731
- https://archive.org/details/assets.minecraft.net-panicgrab-20140807
- https://archive.org/details/bmf.*rustedmagick.com-cr-panicgrab-20140808 (remove asterisk, spam filter doesn't like this link)
- https://archive.org/details/tppx.herokuapp.com-panicgrab-20140808
- https://archive.org/details/nintendo-warcs
FTP
- https://archive.org/details/ftp.idsoftware.com
- https://archive.org/details/ftp.lucasarts.com-20130427
- https://archive.org/details/ftp.santronics.com
- https://archive.org/details/2014.02.ftp.inf.tuDresden.deAtari
Misc
- https://archive.org/details/archiveteam-picplz-index
- https://archive.org/details/Posterous.comHostnames
- https://archive.org/details/YahooBlogSitemaps20131216071927
- https://archive.org/details/archiveteam-mobileme-index
- https://archive.org/details/archiveteam-twitter-stream-2014-05
- https://archive.org/details/ESPNForumsPanicgrab
- https://archive.org/details/rawporter-grab
- https://archive.org/details/bitsnoop-dump
- https://archive.org/details/CaliforniaFinanceLobbyData
- https://archive.org/details/ArchiveteamWarriorV220121008Hyperv
- https://archive.org/details/HowFlickr.comLookedLikeIn2010-APlaceOfWorshipOnFlickr-Photo
- https://archive.org/details/myopera_shutdown_notice
- https://archive.org/details/UsenetSci.space.news2003-2012
- https://archive.org/details/Usenet_rec.food.recipesArchive2003-2012
- https://archive.org/details/MirrorOfSiteOrtodoxiesiviata.blogspot.com
- https://archive.org/details/carti.itarea.org
- https://archive.org/details/ovmk_story
- https://archive.org/details/ti_guidebook_en
- https://archive.org/details/ti_guidebook_fr
- https://archive.org/details/ti_guidebook_de
- https://archive.org/details/myopera_usernames_FIXED.7z
- https://archive.org/details/DubaiWikipediaPageOn2012-09-06
- https://archive.org/details/digpicz-2008-07-30-website
- https://archive.org/details/site-wwwangelfirecomazdixieden
- https://archive.org/details/ArkiverCrawlsPack0004
- https://archive.org/details/ArkiverCrawlsPack0005
- https://archive.org/details/ArkiverCrawlsPack0007
- https://archive.org/details/ArkiverCrawlsPack0008
- https://archive.org/details/laptops-manuals-dump-from-tim.id.au-20121111
- https://archive.org/details/paste_lisp_org
- https://archive.org/details/MtGoxSituationCrisisStrategyDraft
- https://archive.org/details/MtGoxBusinessPlan20142017
- https://archive.org/details/nyt_innovation_2014
- https://archive.org/details/slackware-irc-logs
- https://archive.org/details/thekeep_bbs
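Moving these items starts with turning the list entries into bare item identifiers. The sketch below is one way that could look, assuming each entry is a line of text containing an `https://archive.org/details/<identifier>` URL, possibly followed by a note. The function name `item_identifier` is hypothetical. It also drops the asterisk that one WARC entry carries to get past the spam filter, as that entry's note asks.

```python
from urllib.parse import urlparse

DETAILS_PREFIX = "/details/"


def item_identifier(line):
    """Return the archive.org item identifier found in one list entry, or None."""
    for token in line.split():
        parsed = urlparse(token)
        if parsed.netloc == "archive.org" and parsed.path.startswith(DETAILS_PREFIX):
            identifier = parsed.path[len(DETAILS_PREFIX):].strip("/")
            # Undo the asterisk inserted to get the link past the wiki spam filter.
            return identifier.replace("*", "") or None
    return None


if __name__ == "__main__":
    entries = [
        "- https://archive.org/details/nintendo-warcs",
        "- https://archive.org/details/isohunt_facebook_page_snapshot WARC and other formats",
        "Misc",
    ]
    for entry in entries:
        print(item_identifier(entry))
```

Entries that are not item links, such as section headings or search URLs, come back as `None`, so they can be skipped when building the list of items to move.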