[[Image:Ms internet on a disc.jpg|300px|right]]<br />
This wiki page is a collection of ideas for Project '''Valhalla'''.<br />
<br />
This project/discussion has come about because, several times a year, a class of data ends up as a massive collection with "large, but nominal" status within the Internet Archive. The largest example is currently MobileMe, which occupies hundreds of terabytes in the Internet Archive system (and is in need of WARC conversion) - a cost far outstripping its use. Another is TwitPic, which is currently available (and might continue to be available) but which has shown itself to be a bad actor with regard to the longevity and predictability of its sunset.<br />
<br />
Therefore, there is an argument that there could be a "third place" where data collected by Archive Team could sit until the Internet Archive (or another entity) grows its coffers/storage enough that 80-100TB is "no big deal" - just as 1TB of data was annoying in 2009 and is now considered totally understandable for the value, e.g. Geocities.<br />
<br />
This is about short-term (say, five years or less) - or potentially also long-term - storage options for data generated by Archive Team.<br />
<br />
* What options are out there, generally?<br />
* What are the costs, roughly?<br />
* What are the positives and negatives?<br />
<br />
There has been a lot of study in this area over the years, of course, so links to known authorities and debates will be welcome as well.<br />
<br />
Join the discussion in [irc://irc.efnet.org/huntinggrounds #huntinggrounds].<br />
<br />
== Goals ==<br />
<br />
We want to:<br />
<br />
* Dump an unlimited<ref>Unlimited doesn't mean infinite, but it does mean that we shouldn't worry about running out of space. We won't be the only expanding data store.</ref> amount of data into something.<br />
* Recover that data at any point.<br />
<br />
We do not care about:<br />
<br />
* Immediate or continuous availability.<br />
<br />
We absolutely require:<br />
<br />
* Low (ideally, zero) human time for maintenance. If we have substantial human maintenance needs, we're probably going to need a Committee of Elders or something.<br />
* Data integrity. The storage medium must either be exceptionally durable or make it inexpensive/easy to copy and verify the data onto a fresh medium.<br />
<br />
It would be nice to have:<br />
<br />
* No special environmental requirements that could not be handled by a third party. (So nobody in Archive Team would have to set up some sort of climate-controlled data-cave; however, if this is already something that e.g. IA does and they are willing to lease space, that's cool.)<br />
<br />
== What does the Internet Archive do for this Situation, Anyway? ==<br />
<br />
''This section has not been cleared by the Internet Archive, and so should be considered a rough sketch.''<br />
<br />
The Internet Archive primarily wants "access" to the data it stores, so its primary storage methodology is spinning hard drives connected to high-speed connections in multiple locations. These hard drives are 4-6TB (as of 2014) and of general grade, as is most of the hardware - the theory being that replacing cheap hardware is better than spending a lot of money on super-grade hardware (whatever that may be) and not being able to make the dollars stretch. Hundreds of drives die in a month, and the resiliency of the system allows staff to hot-swap in replacements. <br />
<br />
There are multiple warehouses for storing the original books that are scanned, as well as materials like CD-ROMs and even hard drives. There are collections of tapes and CD-ROMs from previous iterations of storage, although they are thought of as drop-dead options instead of long-term archival storage - the preference is, first and foremost, the spinning hard drives.<br />
<br />
The Archive does not generally use tape technology, having run into the classic "whoops, no tape drive on earth reads these any more" and "whoops, this tape no longer works properly".<br />
<br />
The Archive has indicated that if Archive Team uses a physical storage method - tapes, paper, hard drives or anything else - they are willing to store these materials "as long as they are exceedingly labelled".<br />
<br />
== Physical Options ==<br />
{| class="wikitable sortable"<br />
! Storage type<br />
! Cost ($/TB/year)<br />
! Storage density (m³/TB)<br />
! Theoretical lifespan<br />
! Practical, tested lifespan<br />
! Notes<br />
|-<br />
| Hard drives (simple distributed pool)<br />
| $150 (full cost of best reasonable 1TB+ external HD)<br />
| <br />
| <br />
| <br />
| As of September 2014, the best reasonable 1TB+ external HD is [http://thewirecutter.com/reviews/the-best-external-desktop-hard-drive/ a 4TB WD]. 25+ pool members would each need one HD, plus a computer, plus software to distribute data across the entire pool.<br />
|-<br />
| Hard drives (dedicated distributed pool)<br />
| <br />
| <br />
| <br />
| <br />
| An off-the-shelf or otherwise specified, dedicated, network storage device used exclusively as part of a distributed pool.<br />
|-<br />
| Hard drives (SPOF) <ref>The [[Internet Archive]]'s cost per TB, with 24/7 online hard drives, is approximately $2000 for forever.</ref><br />
| $62 (but you have to buy 180TB)<br />
| <br />
| <br />
| <br />
| For a single location to provide all storage needs, building a [https://www.backblaze.com/blog/backblaze-storage-pod-4/ Backblaze Storage Pod 4.0] runs an average of $11,000, providing 180TB of [http://bioteam.net/2011/08/why-you-should-never-build-a-backblaze-pod/ non-redundant, not-highly-available] storage. (You really want more than one pod mirroring your data, but this is the most effective way to get that much storage in one place.)<br />
|-<br />
| Commercial / archival-grade tapes<br />
| <br />
| <br />
| <br />
| <br />
| <br />
|-<br />
| Consumer tape systems (VHS, Betamax, cassette tapes, ...)<br />
| <br />
| <br />
| <br />
| <br />
| <br />
|-<br />
| Vinyl<br />
| <br />
| <br />
| <br />
| <br />
| <br />
|-<br />
| [http://www.ollydbg.de/Paperbak/index.html PaperBack]<br />
| <br />
| <br />
| <br />
| <br />
| 500KB per letter sheet means 1TB is 2,199,024 sheets, or ~4,400 reams (500 sheets each), or an 8'x16' room filled with 6' tall stacks. It would take 63.6 days of continuous printing to do this (see the arithmetic sketch below this table).<ref>An HP LaserJet 5Si printing 24 pages per minute at 500KB per page yields approximately 200,000 bytes per second.</ref><br />
|-<br />
| [http://ronja.twibright.com/optar/ Optar]<br />
| <br />
| <br />
| <br />
| <br />
| At 200KB per page, this has less than half the storage density of PaperBack.<br />
|-<br />
| Blu-Ray<br />
| $40 (50 pack spindle of 25GB BD-Rs)<br />
| <br />
| 30 years<ref>On the basis of the described studies and assuming adequate consideration of the specified conditions for storage and handling, as well as verification of data after writing, we estimate the Imation CD, DVD or Blu-ray media to have a theoretical readability of up to 30 years. The primary caveat is how you handle and store the media. http://support.tdkperformance.com/app/answers/detail/a_id/1685/~/life-expectancy-of-optical-media </ref><br />
| <br />
| Lasts a LOT longer than CD/DVD, but should not be assumed to last more than a decade. Facebook considers [http://arstechnica.com/information-technology/2014/01/why-facebook-thinks-blu-ray-discs-are-perfect-for-the-data-center/ Blu-ray discs perfect for the data center]; one suggested approach is raidz3-style parity across groups of 15 discs. Comes to under $0.04/GB, which is cheap, with a low initial investment (drives) too!<br><br />
<br>Specifically, a 50-pack spindle of 25GB BD-Rs could readily hold 1TB of data for $30-50 per spindle. 50GB and 100GB discs are more expensive per GB.<br />
|-<br />
| [http://en.wikipedia.org/wiki/M-DISC M-DISC]<br />
| <br />
| <br />
| <br />
| <br />
| Unproven technology, but potentially interesting.<br />
|-<br />
| Flash media<br />
| <br />
| <br />
| <br />
| <br />
| Very durable for online use; flash usually fails from heavy writes, so a drive might never wear out in cold-storage use. Newer drives can have 10-year warranties. But the cells may leak charge over time: JEDEC JESD218A only specifies 101 weeks (almost two years) of retention without power, so we'd have to check the specs of the specific drives, or power them up and re-write the data to refresh it about once a year. Soliciting donations of old flash media from people, or sponsorship from flash companies?<br />
|-<br />
| Glass/metal etching<br />
| <br />
| <br />
| <br />
| <br />
| <br />
|-<br />
| Amazon Glacier<br />
| $122.88 (storage only, retrieval billed separately)<br />
| <br />
| average annual durability of 99.999999999% <ref>"Amazon Glacier is designed to provide average annual durability of 99.999999999% for an archive. The service redundantly stores data in multiple facilities and on multiple devices within each facility. To increase durability, Amazon Glacier synchronously stores your data across multiple facilities before returning SUCCESS on uploading archives. Glacier performs regular, systematic data integrity checks and is built to be automatically self-healing." Maciej Ceglowski thinks that's [https://blog.pinboard.in/2014/04/cloudy_snake_oil/ kinda bullshit compared to the failure events you don't plan for], of course.</ref><br />
| <br />
| Retrieval is billed separately. Retrieving 5% or less of the archive per month into S3 is free (5% of 100TB is 5TB), and data can be copied out of S3 onto a SATA HD for $2.50/hr plus media handling and shipping fees. Downloading 5TB from S3 over the internet would cost $614.40 (~$122.88/TB), but only $44.82 to transfer to an HD via USB 3 or SATA (USB 2 is slower, so it costs more in billed hours).<br />
|-<br />
| Dropbox for Business<br />
| $160* ($795/year)<br />
| <br />
| <br />
| <br />
| Dropbox for Business provides a shared pool of 1TB per user, at $795/year (five user minimum, 5TB), and $125 each additional user/year.<br />
|-<br />
| Box.com for Business<br />
| $180* ("unlimited" storage for $900/year)<br />
| <br />
| <br />
| <br />
| Box.com for Business provides "unlimited" storage at $15/user/month, five user minimum, or $900/year.<br />
|-<br />
| Google Apps Unlimited<br />
| $120* ("unlimited" storage for $600/year, free for Google Apps for Education/Nonprofits)<br />
|<br />
|<br />
|<br />
| Google Apps Unlimited, Google Apps for Education and Google Apps for Nonprofits provide 1TB of storage per user for domains with fewer than five users, and "unlimited" storage for domains with five or more users. Google Apps Unlimited starts at $10/user/month, so at least $600/year for unlimited storage. Google Apps for Education and Nonprofits are free but require fulfillment of certain criteria, such as being a 501(c)(3) nonprofit.<br />
|-<br />
| Dedicated colocated storage servers<br />
| $100* (e.g. $1300 for one year of 12TB rackmount server rental)<br />
|<br />
|<br />
|<br />
| Rent [http://www.ovh.com/us/dedicated-servers/storage/ storage servers from managed hosting colocation providers], and pool data across them. Benefits include bandwidth and electricity being included in the cost, and files could be made available online immediately. Negatives include needing to administer tens of servers.<br />
|}<br />
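<br />
To make the PaperBack row's arithmetic reproducible, here is a minimal sketch in plain Python. The 500KB-per-sheet and 24-pages-per-minute figures come from the table and its footnote; 1TB is taken as 2^40 bytes, an assumption that matches the table's sheet count.<br />
<pre>
# Verify the PaperBack figures from the table above.
TB = 2 ** 40                     # 1 TiB in bytes
sheet_bytes = 500 * 1000         # 500 KB per letter sheet (from the table)
sheets = TB / sheet_bytes        # ~2,199,023 sheets
reams = sheets / 500             # ~4,398 reams of 500 sheets
days = sheets / 24 / 60 / 24     # 24 pages/minute, printing continuously
print(f"{sheets:,.0f} sheets, {reams:,.0f} reams, {days:.1f} days")
</pre>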
<br />
== Software Options ==<br />
<br />
Some of the physical options require supporting software.<br />
<br />
Removable media requires a centralized index of who has what discs, where they are, how they are labeled, and what the process for retrieval/distribution is. It could just be a wiki page, but it does require something.<br />
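<br />
Even if the index is just a wiki page, it helps to fix the record shape up front. A minimal sketch of one possible entry format (the field names and values are illustrative assumptions, not a settled schema):<br />
<pre>
import csv

# Illustrative removable-media index record; all field names are assumptions.
FIELDS = ["disc_id", "project", "chunk", "sha256", "holder", "location", "burned_on"]

with open("media_index.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerow({
        "disc_id": "twitpic-001-03", "project": "twitpic", "chunk": "3 of 50",
        "sha256": "9f86d081884c7d65...", "holder": "ExampleArchiver",
        "location": "Austin, TX", "burned_on": "2014-09-29",
    })
</pre>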
<br />
A simple pool of HDs ("simple pool") - one without a shared filesystem, just people offering up HDs - requires software running on Windows, Linux and/or Mac hardware so that Archive Team workers can learn who has free disk space and save content to those disks. This could be just an IRC conversation and SFTP, but the more centralized and automated it is, the more likely it is that available disk space will actually be used. Software that is not cross-platform cannot be used here.<br />
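<br />
As a rough sketch of that IRC-and-SFTP baseline, each pool member could run something like the following (standard library only, cross-platform) and paste the output into the channel, or a bot could collect it. The mount point and JSON field names are hypothetical:<br />
<pre>
import json, platform, shutil, time

# Report free space on the archive disk in a machine-readable way.
# "/mnt/archiveteam" and the field names are hypothetical examples.
usage = shutil.disk_usage("/mnt/archiveteam")
print(json.dumps({
    "member": "ExampleArchiver",
    "os": platform.system(),
    "free_gb": round(usage.free / 10**9, 1),
    "total_gb": round(usage.total / 10**9, 1),
    "checked_at": int(time.time()),
}))
</pre>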
<br />
A simple distributed and redundant pool of HDs ("distributed pool") requires software running on Windows, Linux and Mac hardware to manage a global filesystem or object store, and distribute uploads across the entire pool of available space, and make multiple copies on an ongoing basis to ensure preservation of data if a pool member goes offline. This has to be automated and relatively maintenance-free, and ideally low-impact on CPU and memory if it will be running on personal machines with multi-TB USB drives hanging off them. Software that is not cross-platform cannot be used here.<br />
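<br />
The heart of the distributed pool is a placement policy: every upload must land on several independent pool members, with re-replication when members vanish. A minimal placement sketch follows (greedy by free space; the replica count of three and the node list are assumptions, and real systems such as Tahoe-LAFS or MogileFS add failure detection and ongoing repair):<br />
<pre>
# Pick N pool members to hold copies of one archive chunk.
def place(chunk_size, nodes, copies=3):
    """nodes: dict of member name -> free bytes. Returns chosen members."""
    eligible = {n: free for n, free in nodes.items() if free >= chunk_size}
    chosen = sorted(eligible, key=eligible.get, reverse=True)[:copies]
    if len(chosen) < copies:
        raise RuntimeError("not enough pool members with free space")
    return chosen

# Hypothetical pool: dave's 20GB drive can't hold a 25GB chunk.
print(place(25 * 10**9, {"alice": 4e12, "bob": 1e12, "carol": 2e12, "dave": 2e10}))
</pre>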
<br />
A dedicated distributed and redundant pool of HDs ("dedicated pool") requires a selection of dedicated hardware and disks for maximum availability, and software to run on that hardware to manage a global filesystem or object store. It has to be automated and relatively maintenance-free, but would be the only thing running on its dedicated hardware, and as such does not have to be cross-platform.<br />
<br />
{| class="wikitable sortable"<br />
! Software name<br />
! Filesystem or Object Store?<br />
! Platform(s)<br />
! License<br />
! Good for which pool?<br />
! Pros<br />
! Cons<br />
! Notes<br />
|-<br />
| Tahoe-LAFS<br />
| Filesystem<br />
| Windows, Mac, Linux<br />
| GPL 2+<br />
| Distributed, dedicated<br />
| Uses what people already have, can spread expenses out, could be a solution done with only software<br />
| Barrier to leaving is non-existent, might cause data-loss even with auto-fixing infrastructure. Too slow to be a primary offloading site. <ref>"Practically the following results have been reported: 16Mbps in throughput for writing and about 8.8Mbps in reading" -- from https://tahoe-lafs.org/trac/tahoe-lafs/wiki/FAQ, making it non-competitive with the 1-2 gigabit speeds needed when archiving twitch.tv.</ref><br />
| Accounting is experimental, meaning "in practice is that anybody running a storage node can also automatically shove shit onto it, with no way to track down who uploaded how much or where or what it is" -joepie91 on IRC<br />
|-<br />
| Ceph<br />
| Object store, Filesystem<br />
| Linux<br />
| LGPL<br />
| Dedicated<br />
|<br />
|<br />
|<br />
|-<br />
| GlusterFS<br />
| Filesystem<br />
| Linux, BSD, OpenSolaris<br />
| GPL 3<br />
| Dedicated<br />
|<br />
|<br />
|<br />
|-<br />
| Gfarm<br />
| Filesystem<br />
| Mac, Linux, BSD, Solaris<br />
| X11<br />
| Dedicated<br />
|<br />
|<br />
|<br />
|-<br />
| Quantcast<br />
| Filesystem<br />
| Linux<br />
| Apache<br />
| Dedicated<br />
|<br />
| Like HDFS, intended for MapReduce processing, which writes large files, and doesn't delete them. Random access and erasing or moving data around may not be performant.<br />
| <br />
|-<br />
| HDFS<br />
| Filesystem<br />
| Java<br />
| Apache<br />
| Distributed, dedicated<br />
|<br />
| Like Quantcast, intended for MapReduce processing, which writes large files, and doesn't delete them. Random access and erasing or moving data around may not be performant.<br />
| <br />
|-<br />
| XtreemFS<br />
| Filesystem<br />
| Linux, Solaris<br />
| BSD<br />
| Dedicated<br />
|<br />
|<br />
|<br />
|-<br />
| MogileFS<br />
| Object store<br />
| Linux<br />
| GPL<br />
| Dedicated<br />
| Understands distributing files across multiple networks, not just multiple disks<br />
|<br />
| As an object store, you can't just mount it as a disk and dump files onto it, you have to push them into it through its API, and retrieve them the same way.<br />
|-<br />
| Riak CS<br />
| Object store<br />
| Mac, Linux, BSD<br />
| Apache<br />
| Dedicated<br />
| S3 API compatible<br />
| Multi-datacenter replication (which might be what you consider having multiple disparate users on different networks) is only available in the commercial offering.<br />
| A former Basho employee suggests this might not be a good fit due to the high latency and unstable connections we'd be dealing with. Datacenter-to-datacenter sync is an "entirely different implementation" than local replication, and would require the enterprise offering.<br />
|-<br />
| MongoDB GridFS<br />
| Object store<br />
| Windows, Mac, Linux<br />
| AGPL<br />
| Distributed, dedicated<br />
|<br />
| MongoDB fares poorly at data integrity, performance is dubious, and scaling is a nightmare. Not recommended.<br />
|<br />
|-<br />
| LeoFS<br />
| Object store<br />
| Mac, Linux<br />
| Apache<br />
| Dedicated<br />
| S3-compatible interface, beta NFS interface, supports multi-datacenter replication, designed with GUI administration in mind<br />
|<br />
|<br />
|-<br />
| BitTorrent Sync<br />
| Synchronization<br />
| Windows, Mac, Linux, BSD, NAS<br />
| Proprietary<br />
| Simple<br />
| Commercially supported software<br />
| As straight synchronization software, it mirrors folders across devices. Individual users would have to make synched folders available to get copies of archives, and then they would be mirrored, and that's it.<br />
| Synchronization software in general is not the right solution for this problem.<br />
|-<br />
| [http://syncthing.net/ Syncthing]<br />
| Synchronization<br />
| Windows, Mac, Linux, BSD, NAS<br />
| GPL<br />
| Simple<br />
| Open-source software, actively developed, per-node access rights<br />
| As straight synchronization software, it mirrors folders across devices. Individual users would have to make synced folders available to get copies of archives, and then they would be mirrored, and that's it. Rights management allows a node to download files but not change them in the shared folder.<br />
| Synchronization software in general is not the right solution for this problem.<br />
|-<br />
| BitTorrent<br />
| Filesystem<br />
| All<br />
| various<br />
| Distributed<br />
| Readily available technology, easily understood distribution model, contributors can join or leave at any time<br />
| Harder to get people interested in contributing if they have to join BitTorrent swarms<br />
| Breaking a large item up into smaller torrents makes contributing smaller chunks of space possible, and a custom client could be created which would let the user dedicate some space and automatically join the swarms which have the fewest peers. Getting the initial seeds requires coordination to distribute the data across available seeds by other means, creating the sub-torrents, etc.<br />
|-<br />
| Git Annex<br />
| Filesystem<br />
| Linux, Mac, Windows, Android (in that order)<br />
| GPL (mostly)<br />
| Distributed<br />
| Once you join a repository (by using git clone), you can choose which of the files within that repository you will actually store locally.<br />
| Pretty complicated to use on the command line; git-annex assistant isn't quite geared for this use case, but could handle it anyway.<br />
| [[INTERNETARCHIVE.BAK/git-annex_implementation]]<br />
|}<br />
<br />
== Non-options ==<br />
* Ink-based Consumer Optical Media (CDs, DVD, etc.) <br />
** Differences between Blu-Ray and DVD? DVDs do not last very long. The fact is, the history of writable optical media has been one of chicanery, failure, and overpromising while under-delivering. Some DVDs failed within a year. There are claims Blu-Ray is different, but fool me 3,504 times, shame on me.<br />
* BitTorrent Sync<br />
** Proprietary (currently), so not a good idea to use as an archival format/platform<br />
* Amazon S3 / Google Cloud Storage / Microsoft Azure Storage<br />
** Amazon S3 might be a viable waypoint for intra-month storage ($30.68/TB), but retrieval over the internet, as with Glacier, is expensive, $8499.08 for 100TB. Google's and Microsoft's offerings are all in the same price range.<br />
* Floppies<br />
** ''"Because 1.4 trillion floppies exists less than 700 billion floppies. HYPOTHETICALLY, if you set twenty stacks side by side, figure a quarter centimeter per floppy thickness, excluded the size of the drive needed to read the floppies you would still need a structure 175,000 ft. high to house them. Let's also assume that the failure rate for floppies is about 5% (everyone knows that varies by brand, usage, time of manufacture, materials used, etc, but lets say 5% per year). 70 million of those 1.4 trillion floppies are unusuable. Figuring 1.4 MB per floppy disk, you are losing approximately 100MB of porn each year. Assuming it takes 5 seconds to replace a bad floppy, you would have to spend 97,222 hrs/yr to replace them. Considering there are only 8,760 hrs per year, you would require a staff of 12 people replacing floppies around the clock or 24 people on 12 hr shifts. Figuring $7/hr you would spend $367,920 on labor alone. Figuring a nickel per bad floppy, you would need $3,500,000 annually in floppy disks, bringing your 1TB floppy raid operating costs (excluding electricity, etc) to $3,867, 920 and a whole landfill of corrupted porn. Thank you for destroying the planet and bankrupting a small country with your floppy based porn RAID."'' ([http://gizmodo.com/5431497/why-its-better-to-pretend-you-dont-know-anything-about-computers?comment=17793028#comments source])<br />
<br />
== Alternatives ==<br />
<br />
For completeness' sake:<br />
<br />
* Fund raising. If IA had more donations coming in, the problem would be less pressing.<br />
* Grant writing. A more formal form of fund raising, but most grant writers would expect the grantee to be doing the work. Works well in combination with the other methods mentioned above.<br />
* Better accessibility. If the problem is that these archives are simultaneously large and infrequently accessed, then making them more accessible would make the size easier to swallow.<br />
<br />
== From IRC ==<br />
<br />
<Drevkevac> we are looking to store 100TB+ of media offline for 25+ years<br />
<Drevkevac> if anyone wants to drop in, I will pastebin the chat log<br />
<rat> DVDR and BR-R are not high volume. When you have massive amounts of data, raid arrays have too many points of failure.<br />
<rat> Drevkevac: I work in a tv studio. We have 30+ years worth of tapes. And all of them are still good.<br />
<rat> find a hard drive from 30 years ago and see how well it hooks up ;)<br />
<brousch_> 1500 Taiyo Yuden Gold CD-Rs http://www.mediasupply.com/taiyo-yuden-gold-cd-rs.html<br />
<br />
<Drevkevac> still, if its true, you could do, perhaps, raidz3s in groups of 15 disks or so?<br />
<SketchCow> Please add paperbak to the wiki page.<br />
<SketchCow> Fuck Optical Media. not an option;.<br />
<Drevkevac> that would give you ~300GB per disk group, with 3 disks<br />
<br />
== Where are you going to put it? ==<br />
<br />
Okay, so you have the tech. Now you need a place for it to live.<br />
<br />
Possibilities:<br />
<br />
* The Internet Archive Physical Warehouse, Richmond, CA<br />
** The Internet Archive has several physical storage facilities, including warehouses in Richmond, CA (home of the Physical Archive) and the main location in San Francisco, CA. They have indicated they are willing to take copies of Archive Team-sponsored physical materials with the intent of them being ingested into the Archive at large over time, as costs lower and 100tb collections are not as big a drain (or a rash of funding arrives elsewhere).<br />
<br />
* Living Computer Museum, Seattle, WA<br />
** In discussions with Jason Scott, the Living Computer Museum has indicated they will have physical storage available for computer historical materials. Depending on the items being saved by Archive Team, they may be willing to host/hold copies for the foreseeable future.<br />
<br />
* Library of Congress, Washington, DC<br />
** The Library of Congress may be willing to take a donation of physical storage, although it is not indicated what they may do long-term with it.<br />
<br />
Multiple copies would of course be great.<br />
<br />
== No, seriously, how are you going to actually DO it ==<br />
<br />
There are only a few practical hardware+software+process combinations. In order of cost to each volunteer:<br />
<br />
* A pool of volunteers with Blu-ray burners commit to ("the Blu-ray option"): <br />
** buying a 50-disc spindle of 25GB discs per TB per project,<br />
** burning them,<br />
** verifying them,<br />
** storing them somewhere climate-controlled (a shelf in a house with AC and heat is fine, an attic/garage/flooded basement is not),<br />
** verifying them regularly (monthly? quarterly?) and replacing discs if necessary, and<br />
** shipping them somewhere else upon request, with no expectation of return (permanent storage, consolidation, etc.).<br />
<br />
This probably requires a minimum of three volunteers per TB per project. It's probably best to pre-split the data into <25GB chunks, so that every volunteer's disc ''N'' can be labeled the same and expected to hold the same data. Fifty 25GB discs is a little more than a TB, and it's expected you'll lose a few to bad burns each time, so it might be worth buying more than one spindle and generating parity files onto the additional discs.<br />
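<br />
A minimal sketch of the pre-splitting step, in plain Python. The 24GB chunk size (leaving margin for disc filesystem overhead) and the archive filename are assumptions; the manifest it writes feeds the verification sketch further down.<br />
<pre>
import hashlib, os

CHUNK, BLOCK = 24 * 10**9, 64 * 2**20   # chunk size, streaming read size

def split(path):
    """Split `path` into <25GB chunks and write a sha256 manifest."""
    n = 0
    with open(path, "rb") as src, open(path + ".manifest", "w") as man:
        while True:
            digest, written = hashlib.sha256(), 0
            chunk_name = f"{path}.{n:03d}"
            with open(chunk_name, "wb") as out:
                while written < CHUNK:
                    block = src.read(min(BLOCK, CHUNK - written))
                    if not block:
                        break
                    out.write(block)
                    digest.update(block)
                    written += len(block)
            if written == 0:
                os.remove(chunk_name)   # ran past the end of the input
                break
            man.write(f"{digest.hexdigest()}  {chunk_name}\n")
            n += 1

split("twitpic.tar")   # hypothetical project archive
</pre>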
<br />
* A pool of volunteers commit to ("the simple pool"):<br />
** buying a best reasonable external HD,<br />
** downloading archives to it,<br />
** keeping it spun up, or spinning it up regularly (monthly? quarterly?) and running filesystem and content checks on it,<br />
** storing it somewhere climate-controlled (a shelf in a house with AC and heat is fine, an attic/garage/flooded basement is not),<br />
** buying additional HDs once it's full or if there are drive errors, and<br />
** shipping it somewhere else upon request, with no expectation of return (permanent storage, consolidation, etc.).<br />
<br />
Same as with Blu-rays, and not really any more expensive ($150 buys either four 1TB spindles of Blu-rays at $37.50 each, or one 4TB HD), except look at all that disc-swapping time and effort you don't have to do. You don't have to split data into chunks, but you do want to download it in a resumable fashion and verify it afterwards - so, checksums, parity files, something. You also risk losing a lot more if a drive fails, and the per-volunteer cost is higher (replacing a whole drive versus replacing individual discs or spindles). As such, you still probably want a minimum of three volunteers per TB per project (so a 2TB project needs six volunteers with 1TB each, not three volunteers holding all 2TB each).<br />
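<br />
A minimal sketch of the periodic verification pass, assuming the manifest format from the splitting sketch above ("&lt;sha256 hex&gt;  &lt;filename&gt;" per line); where available, <code>sha256sum -c</code> does the same job:<br />
<pre>
import hashlib, sys

def verify(manifest_path):
    """Re-hash every file listed in the manifest; return the mismatches."""
    failed = []
    for line in open(manifest_path):
        expected, name = line.split(maxsplit=1)
        name = name.strip()
        digest = hashlib.sha256()
        with open(name, "rb") as f:
            for block in iter(lambda: f.read(2**20), b""):
                digest.update(block)
        if digest.hexdigest() != expected:
            failed.append(name)
    return failed

bad = verify(sys.argv[1])
print("all OK" if not bad else f"FAILED: {bad}")
</pre>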
<br />
* A pool of volunteers commit to ("the distributed pool"):<br />
** all buying the same standard, inexpensive, hackable RAID 1 NAS,<br />
*** WD My Cloud Mirror (starts at $300 for 2TB [called "4TB," only 2TB with mirroring])<br />
*** QNAP (2-bay starts at $140 without HDs)<br />
*** Synology (2-bay starts at $200 without HDs)<br />
*** Pogoplug Series 4 + two best reasonable external HD + software RAID 1, or a download script that manually mirrors files ($20 without HDs)<br />
** keeping it spun up, online, and possibly accessible by external AT admins,<br />
** storing it somewhere climate-controlled (a shelf in a house with AC and heat is fine, an attic/garage/flooded basement is not),<br />
** buying entire additional units once they are full or if there are drive errors, and<br />
** shipping the drives (or the entire My Cloud Mirror unit, if that's the one selected) somewhere else upon request, with no expectation of return (permanent storage, consolidation, etc.).<br />
<br />
These units provide dramatically improved reliability for content, enough that perhaps you only need two volunteers per project, and no need to split by TB, since each volunteer would have two copies. Having everyone buy the same hardware means reduced administration time overall, especially if custom scripts are involved. QNAP and Synology both have official SDKs, and all of them run some flavor of Linux, with Synology supporting SSH logins out of the box. The Pogoplug is the most underpowered of the options, but even it should be powerful enough to run a MogileFS storage node, or a script that downloads to one HD and copies to the other. (Checksums would be really slow, though.) This is moderately expensive per-volunteer, with an upfront cost of $320-$500.<br />
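<br />
For the Pogoplug variant, the "download script that manually mirrors files" could be as simple as the following sketch (the mount points are hypothetical, and a real version would also log, retry, and checksum):<br />
<pre>
import filecmp, shutil
from pathlib import Path

PRIMARY, MIRROR = Path("/mnt/hd1"), Path("/mnt/hd2")   # hypothetical mounts

# After each download lands on the primary drive, copy anything missing
# or differing onto the mirror drive - a poor man's RAID 1 for a Pogoplug.
for src in PRIMARY.rglob("*"):
    if not src.is_file():
        continue
    dst = MIRROR / src.relative_to(PRIMARY)
    if not dst.exists() or not filecmp.cmp(src, dst, shallow=True):
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)
</pre>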
<br />
Consumer NAS devices can have severe firmware issues, potentially causing full data loss on a trivial operation. One such case was observed after flashing a new official firmware image onto a QNAP Pro series 4-bay NAS (700€ empty) while the RAID was presumably resyncing; it has to be expected that such a device prefers reinitializing the array over being stuck with an error.<br />
<br />
HDD compatibility is limited and needs close investigation; WD Green 2TB drives, for example, tend to frequently degrade the RAID array and accumulate load cycles from frequent head parking.<br />
<br />
* A pool of volunteers commit to ("the dedicated pool"):<br />
** all buying the same, standard, expensive NAS,<br />
*** iXsystems FreeNAS Mini (starts at $1000 without HDs),<br />
*** A DIY FreeNAS box ($300+ without HDs),<br />
*** A DIY NexentaStor box (probably the same as the DIY FreeNAS box)<br />
** keeping it spun up, online, and possibly accessible by external AT admins,<br />
** storing it somewhere climate-controlled and well-ventilated (a shelf with no airflow is not fine),<br />
** replacing drives if there are drive errors,<br />
** migrating the pool to larger disks once it starts getting full, and<br />
** shipping the drives somewhere else upon request, with no expectation of return (permanent storage, consolidation, etc.).<br />
<br />
A set of volunteers with (comparatively) expensive network-attached storage gives you a lot of storage in a lot of locations, potentially tens of redundant TB in each one, depending on the size of the chassis. You want everyone running the same NAS software, but the hardware can vary somewhat; however, the hardware should all have ECC RAM, and the more the better. MogileFS storage nodes are known to run on NexentaStor, and FreeNAS supports plugins, so it could be adapted to run there, or you could figure out e.g. LeoFS (which also expects ZFS). This is the most expensive option per-volunteer, upfront costs starting at around $1300 for a DIY box with four 4TB WD Red drives.<br />
<br />
* A pool of volunteers set up a recurring payment to fund ("the server option"):<br />
** one or more rented, managed, storage servers; or<br />
** saving up to buy one or more storage servers, and then hosting it somewhere.<br />
<br />
A rented server has no hardware maintenance costs: replacing a failed HD is the responsibility of the hosting provider, in both materials and labor. This is not the case with a purchased server, where someone would have to buy a replacement hard drive and either bring it to the colocation center and replace it themselves, or ship it there and be billed for the labor involved in replacing it.<br />
<br />
== What Can You Contribute? == <br />
<br />
{| class="wikitable"<br />
! Name<br />
! What You Can Contribute<br />
! For How Long?<br />
! Exit Strategy<br />
|-<br />
| ExampleArchiver<br />
| Describe what you are willing to buy/build/write/do. Talk about the connection you would use, the storage conditions, etc. How much money can you put into it? <br />
| For how long can you truly commit to this?<br />
| If you need to quit or wind down your contribution, what are you willing to do? Can you guarantee a period of notice? Are you willing to ship your hardware or media to another volunteer anywhere in the world, or will you want to keep it? <br />
|-<br />
| vitorio<br />
|<br />
* Participating in the simple pool (I only have a laptop, so I'd store the HDs offline at home and check them monthly/quarterly)<br />
* Participating in the distributed pool (residential 30/10 connection)<br />
* Contributing $100/mo. for the server option<br />
| Indefinitely<br />
| Can give ample notice for either full upload and/or shipping of all hardware anywhere in the world.<br />
|-<br />
| pluesch<br />
|<br />
* Willing to provide 12 TB storage space (3x 4TB drives) on a rented ovh.com server; Access via SSH; Uploaded stuff can be made available via Rsync and HTTP<br />
| As long as I have a job. It's a very stable position atm.<br />
| If I can't provide the storage anymore I'll inform archiveteam at least 3 months before.<br />
|-<br />
|}<br />
<br />
== Project-specific suggestions ==<br />
<br />
=== Twitch.tv (and other video services) ===<br />
<br />
* Keep the original video files in (semi-)offline storage, and store transcoded (compressed) versions on the Internet Archive.<br />
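<br />
A minimal sketch of the transcode step, assuming ffmpeg is available; the codec and quality settings (H.264 at CRF 23, AAC audio) and the filename are illustrative, not a recommendation:<br />
<pre>
import subprocess
from pathlib import Path

# Keep the original untouched in (semi-)offline storage; upload only the
# compressed copy to the Internet Archive.
def transcode(original):
    compressed = Path(original).with_suffix(".ia.mp4")
    subprocess.run([
        "ffmpeg", "-i", str(original),
        "-c:v", "libx264", "-crf", "23", "-preset", "slow",
        "-c:a", "aac", "-b:a", "128k",
        str(compressed),
    ], check=True)
    return compressed

print(transcode("twitch_vod_1234.flv"))   # hypothetical filename
</pre>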
<br />
== See Also ==<br />
*[[Storage Media]]<br />
* [[INTERNETARCHIVE.BAK]]<br />
<br />
== References ==<br />
<references/><br />
<br />
{{Navigation box}}
<hr />
<div>[[Image:Ms internet on a disc.jpg|300px|right]]<br />
This wiki page is a collection of ideas for Project '''Valhalla'''.<br />
<br />
This project/discussion has come around because there is a class of data currently existing, several times a year, as a massive amount of data with "large, but nominal" status within the Internet Archive. The largest example is currently MobileMe, which is hundreds of terabytes in the Internet Archive system (and in need of WARC conversion), which represents a cost amount far outstripping its use. Another is TwitPic, which is currently available (and might continue to be available) but which has shown itself to be a bad actor with regards to longevity and predictability for its sunset. <br />
<br />
Therefore, there is an argument that there could be a "third place" that data collected by Archive Team could sit, until the Internet Archive (or another entity) grows its coffers/storage enough that 80-100tb is "no big deal", just like 1tb of data was annoying in 2009 and now is totally understandable for the value, i.e. Geocities. <br />
<br />
This is for short-term (or potentially also long-term) storage options, say five years or less, of data generated by Archive Team.<br />
<br />
* What options are out there, generally?<br />
* What are the costs, roughly?<br />
* What are the positives and negatives?<br />
<br />
There has been a lot of study in this area over the years, of course, so links to known authorities and debates will be welcome as well.<br />
<br />
Join the discussion in [irc://irc.efnet.org/huntinggrounds #huntinggrounds].<br />
<br />
== Goals ==<br />
<br />
We want to:<br />
<br />
* Dump an unlimited<ref>Unlimited doesn't mean infinite, but it does mean that we shouldn't worry about running out of space. We won't be the only expanding data store.</ref> amount of data into something.<br />
* Recover that data at any point.<br />
<br />
We do not care about:<br />
<br />
* Immediate or continuous availability.<br />
<br />
We absolutely require:<br />
<br />
* Low (ideally, zero) human time for maintenance. If we have substantial human maintenance needs, we're probably going to need a Committee of Elders or something.<br />
* Data integrity. The storage medium must be impossibly durable or make it inexpensive/easy to copy and verify the data onto a fresh medium.<br />
<br />
It would be nice to have:<br />
<br />
* No special environmental requirements that could not be handled by a third party. (So nobody in Archive Team would have to set up some sort of climate-controlled data-cave; however, if this is already something that e.g. IA does and they are willing to lease space, that's cool.)<br />
<br />
== What does the Internet Archive do for this Situation, Anyway? ==<br />
<br />
''This section has not been cleared by the Internet Archive, and so should be considered a rough sketch.''<br />
<br />
The Internet Archive primarily wants "access" to the data it stores, so the primary storage methodology is spinning hard drives connected to a high-speed connection from multiple locations. These hard drives are between 4-6tb (as of 2014) and are of general grade, as is most of the hardware - the theory is that replacing cheap hardware is better than spending a lot of money on super-grade hardware (whatever that may be) and not being able to make the dollars stretch. Hundreds of drives die in a month and the resiliency of the system allows them all to hot-swap in replacements. <br />
<br />
There are multiple warehouses for storing the original books that are scanned, as well as materials like CD-ROMs and even hard drives. There are collections of tapes and CD-ROMs from previous iterations of storage, although they are thought of as drop-dead options instead of long-term archival storage - the preference is, first and foremost, the spinning hard drives.<br />
<br />
The Archive does not generally use tape technology, having run into the classic "whoops, no tape drive on earth reads these any more" and "whoops, this tape no longer works properly".<br />
<br />
The Archive has indicated that if Archive Team uses a physical storage method, such as tapes, paper, hard drives or anything else, that they are willing to store these materials "as long as they are exceedingly labelled".<br />
<br />
== Physical Options ==<br />
{| class="wikitable sortable"<br />
! Storage type<br />
! Cost ($/TB/year)<br />
! Storage density (m³/TB)<br />
! Theoretical lifespan<br />
! Practical, tested lifespan<br />
! Notes<br />
|-<br />
| Hard drives (simple distributed pool)<br />
| $150 (full cost of best reasonable 1TB+ external HD)<br />
| <br />
| <br />
| <br />
| September 2014, best reasonable 1TB+ external HD is [http://thewirecutter.com/reviews/the-best-external-desktop-hard-drive/ a 4TB WD]. 25+ pool members would need one HD each plus a computer plus software to distribute data across the entire pool.<br />
|-<br />
| Hard drives (dedicated distributed pool)<br />
| <br />
| <br />
| <br />
| <br />
| An off-the-shelf or otherwise specified, dedicated, network storage device used exclusively as part of a distributed pool.<br />
|-<br />
| Hard drives (SPOF) <ref>The [[Internet Archive]]'s cost per TB, with 24/7 online hard drives, is approximately $2000 for forever.</ref><br />
| $62 (but you have to buy 180TB)<br />
| <br />
| <br />
| <br />
| For a single location to provide all storage needs, building a [https://www.backblaze.com/blog/backblaze-storage-pod-4/ Backblaze Storage Pod 4.0] runs an average of $11,000, providing 180TB of [http://bioteam.net/2011/08/why-you-should-never-build-a-backblaze-pod/ non-redundant, not-highly-available] storage. (You really want more than one pod mirroring your data, but this is the most effective way to get that much storage in one place.)<br />
|-<br />
| Commercial / archival-grade tapes<br />
| <br />
| <br />
| <br />
| <br />
| <br />
|-<br />
| Consumer tape systems (VHS, Betamax, cassette tapes, ...)<br />
| <br />
| <br />
| <br />
| <br />
| <br />
|-<br />
| Vinyl<br />
| <br />
| <br />
| <br />
| <br />
| <br />
|-<br />
| [http://www.ollydbg.de/Paperbak/index.html PaperBack]<br />
| <br />
| <br />
| <br />
| <br />
| 500KB per letter sheet means 1TB is 2,199,024 sheets, or ~4400 reams (500 sheets each), or an 8'x16' room filled with 6' tall stacks. It would take 63.6 days of continuous printing to do this.<ref>A HP LaserJet 5Si printing 24 pages per minute which generates the 500K bytes per page, yielding approximately 200,000 bytes per second.</ref><br />
|-<br />
| [http://ronja.twibright.com/optar/ Optar]<br />
| <br />
| <br />
| <br />
| <br />
| At 200KB per page, this has less than half the storage density of Paperback.<br />
|-<br />
| Blu-Ray<br />
| $40 (50 pack spindle of 25GB BD-Rs)<br />
| <br />
| 30 years<ref>On the basis of the described studies and assuming adequate consideration of the specified conditions for storage and handling, as well as verification of data after writing, we estimate the Imation CD, DVD or Blu-ray media to have a theoretical readability of up to 30 years. The primary caveat is how you handle and store the media. http://support.tdkperformance.com/app/answers/detail/a_id/1685/~/life-expectancy-of-optical-media </ref><br />
| <br />
| Lasts a LOT longer than CD/DVD, but should not be assumed to last more than a decade. [http://arstechnica.com/information-technology/2014/01/why-facebook-thinks-blu-ray-discs-are-perfect-for-the-data-center/ Raidz3 with Blu-rays Doing a backup in groups of 15 disks]. Comes to under $.04/GB which is cheap, and low initial investment (drives) too!<br><br />
<br>Specifically, a 50pack spindle of 25GB BD-Rs could readily hold 1TB of data for $30-50 per spindle. 50GB and 100GB discs are more expensive per GB.<br />
|-<br />
| [http://en.wikipedia.org/wiki/M-DISC M-DISC]<br />
| <br />
| <br />
| <br />
| <br />
| Unproven technology, but potentially interesting.<br />
|-<br />
| Flash media<br />
| <br />
| <br />
| <br />
| <br />
| Very durable for online use, and usually fails from lots of writes. A drive might never wear out from cold-storage usage. Newer drives can have 10-year warranties. But capacitors may leak charge over time. JEDEC JESD218A only specifies 101 weeks (almost two years) retention without power, so we'd have to check the spec of the specific drives, or power them up and re-write the data to refresh it about once a year. Soliciting donations for old flash media from people, or sponsorship from flash companies?<br />
|-<br />
| Glass/metal etching<br />
| <br />
| <br />
| <br />
| <br />
| <br />
|-<br />
| Amazon Glacier<br />
| $122.88 (storage only, retrieval billed separately)<br />
| <br />
| average annual durability of 99.999999999% <ref>"Amazon Glacier is designed to provide average annual durability of 99.999999999% for an archive. The service redundantly stores data in multiple facilities and on multiple devices within each facility. To increase durability, Amazon Glacier synchronously stores your data across multiple facilities before returning SUCCESS on uploading archives. Glacier performs regular, systematic data integrity checks and is built to be automatically self-healing." Maciej Ceglowski thinks that's [https://blog.pinboard.in/2014/04/cloudy_snake_oil/ kinda bullshit compared to the failure events you don't plan for], of course.</ref><br />
| <br />
| Retrieval is billed separately. 5% or less per month into S3 is free (5% of 100TB is 5TB), and data can be copied out from S3 to a SATA HD for $2.50/hr. plus media handling and shipping fees. Downloading 5TB from S3 would cost $614.40 (~$122.88/TB), but only $44.82 to transfer to HD via USB 3 or SATA (USB 2 is slower).<br />
|-<br />
| Dropbox for Business<br />
| $160* ($795/year)<br />
| <br />
| <br />
| <br />
| Dropbox for Business provides a shared pool of 1TB per user, at $795/year (five user minimum, 5TB), and $125 each additional user/year.<br />
|-<br />
| Box.com for Business<br />
| $180* ("unlimited" storage for $900/year)<br />
| <br />
| <br />
| <br />
| Box.com for Business provides "unlimited" storage at $15/user/month, five user minimum, or $900/year.<br />
|-<br />
| Dedicated colocated storage servers<br />
| $100* (e.g. $1300 for one year of 12TB rackmount server rental)<br />
|<br />
|<br />
|<br />
| Rent [http://www.ovh.com/us/dedicated-servers/storage/ storage servers from managed hosting colocation providers], and pool data across them. Benefits include bandwidth and electricity being included in the cost, and files could be made available online immediately. Negatives include needing to administer tens of servers.<br />
|}<br />
<br />
== Software Options ==<br />
<br />
Some of the physical options require supporting software.<br />
<br />
Removable media requires a centralized index of who has what discs, where they are, how they are labeled, and what the process for retrieval/distribution is. It could just be a wiki page, but it does require something.<br />
<br />
A simple pool of HDs ("simple pool"), one without a shared filesystem, just people offering up HDs, requires software running on Windows, Linux and/or Mac hardware to allow Archive Team workers to learn who has free disk space, and to save content to those disks. This could be just an IRC conversation and SFTP, but the more centralized and automated, the more likely available disk space will be able to be utilized. Software that is not cross-platform cannot be used here.<br />
<br />
A simple distributed and redundant pool of HDs ("distributed pool") requires software running on Windows, Linux and Mac hardware to manage a global filesystem or object store, and distribute uploads across the entire pool of available space, and make multiple copies on an ongoing basis to ensure preservation of data if a pool member goes offline. This has to be automated and relatively maintenance-free, and ideally low-impact on CPU and memory if it will be running on personal machines with multi-TB USB drives hanging off them. Software that is not cross-platform cannot be used here.<br />
<br />
A dedicated distributed and redundant pool of HDs ("dedicated pool") requires a selection of dedicated hardware and disks for maximum availability, and software to run on that hardware to manage a global filesystem or object store. It has to be automated and relatively maintenance-free, but would be the only thing running on its dedicated hardware, and as such does not have to be cross-platform.<br />
<br />
{| class="wikitable sortable"<br />
! Software name<br />
! Filesystem or Object Store?<br />
! Platform(s)<br />
! License<br />
! Good for which pool?<br />
! Pros<br />
! Cons<br />
! Notes<br />
|-<br />
| Tahoe-LAFS<br />
| Filesystem<br />
| Windows, Mac, Linux<br />
| GPL 2+<br />
| Distributed, dedicated<br />
| Uses what people already have, can spread expenses out, could be a solution done with only software<br />
| Barrier to leaving is non-existent, might cause data-loss even with auto-fixing infrastructure. Too slow to be a primary offloading site. <ref>"Practically the following results have been reported: 16Mbps in throughput for writing and about 8.8Mbps in reading" -- from https://tahoe-lafs.org/trac/tahoe-lafs/wiki/FAQ, making it non-competitive with the 1-2 gigabit speeds needed when archiving twitch.tv.</ref><br />
| Accounting is experimental, meaning "in practice is that anybody running a storage node can also automatically shove shit onto it, with no way to track down who uploaded how much or where or what it is" -joepie91 on IRC<br />
|-<br />
| Ceph<br />
| Object store, Filesystem<br />
| Linux<br />
| LGPL<br />
| Dedicated<br />
|<br />
|<br />
|<br />
|-<br />
| GlusterFS<br />
| Filesystem<br />
| Linux, BSD, OpenSolaris<br />
| GPL 3<br />
| Dedicated<br />
|<br />
|<br />
|<br />
|-<br />
| Gfarm<br />
| Filesystem<br />
| Mac, Linux, BSD, Solaris<br />
| X11<br />
| Dedicated<br />
|<br />
|<br />
|<br />
|-<br />
| Quantcast<br />
| Filesystem<br />
| Linux<br />
| Apache<br />
| Dedicated<br />
|<br />
| Like HDFS, intended for MapReduce processing, which writes large files, and doesn't delete them. Random access and erasing or moving data around may not be performant.<br />
| <br />
|-<br />
| GlusterFS<br />
| Filesystem<br />
| Mac, Linux, BSD, Solaris<br />
| GPL 3<br />
| Dedicated<br />
|<br />
|<br />
|<br />
|-<br />
| HDFS<br />
| Filesystem<br />
| Java<br />
| Apache<br />
| Distributed, dedicated<br />
|<br />
| Like Quantcast, intended for MapReduce processing, which writes large files, and doesn't delete them. Random access and erasing or moving data around may not be performant.<br />
| <br />
|-<br />
| XtreemFS<br />
| Filesystem<br />
| Linux, Solaris<br />
| BSD<br />
| Dedicated<br />
|<br />
|<br />
|<br />
|-<br />
| MogileFS<br />
| Object store<br />
| Linux<br />
| GPL<br />
| Dedicated<br />
| Understands distributing files across multiple networks, not just multiple disks<br />
|<br />
| As an object store, you can't just mount it as a disk and dump files onto it, you have to push them into it through its API, and retrieve them the same way.<br />
|-<br />
| Riak CS<br />
| Object store<br />
| Mac, Linux, BSD<br />
| Apache<br />
| Dedicated<br />
| S3 API compatible<br />
| Multi-datacenter replication (which might be what you consider having multiple disparate users on different networks) is only available in the commercial offering.<br />
| A former Basho employee suggests this might not be a good fit due to the high latency and unstable connections we'd be dealing with. Datacenter-to-datacenter sync is an "entirely different implementation" than local replication, and would require the enterprise offering.<br />
|-<br />
| MongoDB GridFS<br />
| Object store<br />
| Windows, Mac, Linux<br />
| AGPL<br />
| Distributed, dedicated<br />
|<br />
|<br />
|<br />
|-<br />
| LeoFS<br />
| Object store<br />
| Mac, Linux<br />
| Apache<br />
| Dedicated<br />
| S3-compatible interface, beta NFS interface, supports multi-datacenter replication, designed with GUI administration in mind<br />
|<br />
|<br />
|-<br />
| BitTorrent Sync<br />
| Synchronization<br />
| Windows, Mac, Linux, BSD, NAS<br />
| Proprietary<br />
| Simple<br />
| Commercially supported software<br />
| As straight synchronization software, it mirrors folders across devices. Individual users would have to make synched folders available to get copies of archives, and then they would be mirrored, and that's it.<br />
| Synchronization software in general is not the right solution for this problem.<br />
|}<br />
<br />
== Non-options ==<br />
* Ink-based Consumer Optical Media (CDs, DVD, etc.) <br />
** Differences between Blu-Ray and DVD? DVDs do not last very long. The fact is, the history of optical writable media has been on of chicanery, failure, and overpromising while under-delivering. Some DVDs failed within a year. There are claims Blu-Ray is different, but fool me 3,504 times, shame on me.<br />
* BitTorrent Sync<br />
** Proprietary (currently), so not a good idea to use as an archival format/platform<br />
* Amazon S3 / Google Cloud Storage / Microsoft Azure Storage<br />
** Amazon S3 might be a viable waypoint for intra-month storage ($30.68/TB), but retrieval over the internet, as with Glacier, is expensive, $8499.08 for 100TB. Google's and Microsoft's offerings are all in the same price range.<br />
* Floppies<br />
** ''"Because 1.4 trillion floppies exists less than 700 billion floppies. HYPOTHETICALLY, if you set twenty stacks side by side, figure a quarter centimeter per floppy thickness, excluded the size of the drive needed to read the floppies you would still need a structure 175,000 ft. high to house them. Let's also assume that the failure rate for floppies is about 5% (everyone knows that varies by brand, usage, time of manufacture, materials used, etc, but lets say 5% per year). 70 million of those 1.4 trillion floppies are unusuable. Figuring 1.4 MB per floppy disk, you are losing approximately 100MB of porn each year. Assuming it takes 5 seconds to replace a bad floppy, you would have to spend 97,222 hrs/yr to replace them. Considering there are only 8,760 hrs per year, you would require a staff of 12 people replacing floppies around the clock or 24 people on 12 hr shifts. Figuring $7/hr you would spend $367,920 on labor alone. Figuring a nickel per bad floppy, you would need $3,500,000 annually in floppy disks, bringing your 1TB floppy raid operating costs (excluding electricity, etc) to $3,867, 920 and a whole landfill of corrupted porn. Thank you for destroying the planet and bankrupting a small country with your floppy based porn RAID."'' ([http://gizmodo.com/5431497/why-its-better-to-pretend-you-dont-know-anything-about-computers?comment=17793028#comments source])<br />
<br />
== From IRC ==<br />
<br />
<Drevkevac> we are looking to store 100TB+ of media offline for 25+ years<br />
<Drevkevac> if anyone wants to drop in, I will pastebin the chat log<br />
<rat> DVDR and BR-R are not high volume. When you have massive amounts of data, raid arrays have too many points of failure.<br />
<rat> Drevkevac: I work in a tv studio. We have 30+ years worth of tapes. And all of them are still good.<br />
<rat> find a hard drive from 30 years ago and see how well it hooks up ;)<br />
<brousch_> 1500 Taiyo Yuden Gold CD-Rs http://www.mediasupply.com/taiyo-yuden-gold-cd-rs.html<br />
<br />
<Drevkevac> still, if its true, you could do, perhaps, raidz3s in groups of 15 disks or so?<br />
<SketchCow> Please add paperbak to the wiki page.<br />
<SketchCow> Fuck Optical Media. not an option;.<br />
<Drevkevac> that would give you ~300GB per disk group, with 3 disks<br />
<br />
== Where are you going to put it? ==<br />
<br />
Okay, so you have the tech. Now you need a place for it to live.<br />
<br />
Possibilities:<br />
<br />
* The Internet Archive Physical Warehouse, Richmond, CA<br />
** The Internet Archive has several physical storage facilities, including warehouses in Richmond, CA (home of the Physical Archive) and the main location in San Francisco, CA. They have indicated they are willing to take copies of Archive Team-sponsored physical materials with the intent of them being ingested into the Archive at large over time, as costs lower and 100tb collections are not as big a drain (or a rash of funding arrives elsewhere).<br />
<br />
* Living Computer Museum, Seattle, WA<br />
** In discussions with Jason Scott, the Living Computer Museum has indicated they will have physical storage available for computer historical materials. Depending on the items being saved by Archive Team, they may be willing to host/hold copies for the forseable future.<br />
<br />
* Library of Congress, Washington, DC<br />
** The Library of Congress may be willing to take a donation of physical storage, although it is not indicated what they may do long-term with it.<br />
<br />
Multiple copies would of course be great.<br />
<br />
== No, seriously, how are you going to actually DO it ==<br />
<br />
There are only a few practical hardware+software+process combinations. In order of cost to each volunteer:<br />
<br />
* A pool of volunteers with Blu-ray burners commit to ("the Blu-ray option"): <br />
** buying a 50-disc spindle of 25GB discs per TB per project,<br />
** burning them,<br />
** verifying them,<br />
** storing them somewhere climate-controlled (a shelf in a house with AC and heat is fine, an attic/garage/flooded basement is not),<br />
** verifying them regularly (monthly? quarterly?) and replacing discs if necessary, and<br />
** shipping them somewhere else upon request, with no expectation of return (permanent storage, consolidation, etc.).<br />
<br />
This probably requires a minimum of three volunteers per TB per project. Probably best to pre-split the data into < 25GB chunks so each disc can be labeled the same and expected to have the same data on it. Fifty 25GB discs is a little more than a TB, and it's expected you'll lose a few to bad burns each time, but it might be worth buying more than a spindle and generating parity files onto additional discs.<br />
<br />
* A pool of volunteers commit to ("the simple pool"):<br />
** buying a best reasonable external HD,<br />
** downloading archives to it,<br />
** keeping it spun up, or spinning it up regularly (monthly? quarterly?) and running filesystem and content checks on it,<br />
** storing it somewhere climate-controlled (a shelf in a house with AC and heat is fine, an attic/garage/flooded basement is not),<br />
** buying additional HDs once it's full or if there are drive errors, and<br />
** shipping it somewhere else upon request, with no expectation of return (permanent storage, consolidation, etc.).<br />
<br />
Same as with Blu-rays, and not really any more expensive ($150 == $37.50 for one 1TB of Blu-rays * 4, or one 4TB HD), except look at all that disc-swapping time and effort you don't have to do. You don't have to split data into chunks, but you do want to download it in a resumable fashion and verify it afterwards, so, checksums, parity files, something. You also risk losing a lot more if a drive fails, and the cost per-volunteer is higher (replacing a whole drive versus replacing individual discs or spindles). As such, you still probably want a minimum of three volunteers per TB per project (so a 2TB project needs six volunteers with 1TB each, not three volunteers holding all 2TB each).<br />
<br />
* A pool of volunteers commit to ("the distributed pool"):<br />
** all buying the same standard, inexpensive, hackable RAID 1 NAS:<br />
*** WD My Cloud Mirror (starts at $300 for 2TB [called "4TB," only 2TB with mirroring])<br />
*** QNAP (2-bay starts at $140 without HDs)<br />
*** Synology (2-bay starts at $200 without HDs)<br />
*** Pogoplug Series 4 + two best reasonable external HD + software RAID 1, or a download script that manually mirrors files ($20 without HDs)<br />
** keeping it spun up, online, and possibly accessible by external AT admins,<br />
** storing it somewhere climate-controlled (a shelf in a house with AC and heat is fine, an attic/garage/flooded basement is not),<br />
** buying entire additional units once they are full or if there are drive errors, and<br />
** shipping the drives (or the entire My Cloud Mirror unit, if that's the one selected) somewhere else upon request, with no expectation of return (permanent storage, consolidation, etc.).<br />
<br />
These units provide dramatically improved reliability for content, enough that perhaps you only need two volunteers per project, and no need to split by TB, since each volunteer would have two copies. Having everyone buy the same hardware means reduced administration time overall, especially if custom scripts are involved. QNAP and Synology both have official SDKs, and all of them run some flavor of Linux, with Synology supporting SSH logins out of the box. The Pogoplug is the most underpowered of the options, but even it should be powerful enough to run a MogileFS storage node, or a script that downloads to one HD and copies to the other. (Checksums would be really slow, though.) This is moderately expensive per volunteer, with an upfront cost of $320-$500.<br />
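<br />
For the Pogoplug variant, here is a minimal sketch of the "download to one HD, copy to the other" script; the two mount points are placeholders, and on hardware this slow you would run it from cron between downloads rather than continuously:<br />
<pre>
import hashlib
import os
import shutil

PRIMARY = "/mnt/hd0/archiveteam"  # placeholder mount points
MIRROR = "/mnt/hd1/archiveteam"

def sha256_of(path, buf_size=8 * 1024**2):  # small buffers: the box has little RAM
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for buf in iter(lambda: f.read(buf_size), b""):
            digest.update(buf)
    return digest.hexdigest()

def mirror_new_files():
    """Copy anything new on the primary HD to the mirror HD and verify it."""
    for dirpath, _, filenames in os.walk(PRIMARY):
        for name in filenames:
            src = os.path.join(dirpath, name)
            dst = os.path.join(MIRROR, os.path.relpath(src, PRIMARY))
            if os.path.exists(dst):
                continue  # already mirrored on an earlier run
            os.makedirs(os.path.dirname(dst), exist_ok=True)
            shutil.copyfile(src, dst)
            if sha256_of(src) != sha256_of(dst):  # this is the slow part
                os.remove(dst)
                print("hash mismatch, will retry next run: " + src)

if __name__ == "__main__":
    mirror_new_files()
</pre>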
<br />
* A pool of volunteers commit to ("the dedicated pool"):<br />
** all buying the same standard, expensive NAS:<br />
*** iXsystems FreeNAS Mini (starts at $1000 without HDs),<br />
*** A DIY FreeNAS box ($300+ without HDs),<br />
*** A DIY NexentaStor box (probably the same as the DIY FreeNAS box)<br />
** keeping it spun up, online, and possibly accessible by external AT admins,<br />
** storing it somewhere climate-controlled and well-ventilated (a shelf with no airflow is not fine),<br />
** replacing drives if there are drive errors,<br />
** migrating the pool to larger disks once it starts getting full, and<br />
** shipping the drives somewhere else upon request, with no expectation of return (permanent storage, consolidation, etc.).<br />
<br />
A set of volunteers with (comparatively) expensive network-attached storage gives you a lot of storage in a lot of locations, potentially tens of redundant TB in each one, depending on the size of the chassis. You want everyone running the same NAS software, but the hardware can vary somewhat; however, it should all have ECC RAM, and the more the better. MogileFS storage nodes are known to run on NexentaStor, and FreeNAS supports plugins, so it could be adapted to run there, or you could figure out e.g. LeoFS (which also expects ZFS). This is the most expensive option per volunteer, with upfront costs starting at around $1300 for a DIY box with four 4TB WD Red drives.<br />
<br />
* A pool of volunteers set up a recurring payment to fund ("the server option"):<br />
** one or more rented, managed, storage servers; or<br />
** saving up to buy one or more storage servers, and then hosting it somewhere.<br />
<br />
A rented server has no hardware maintenance costs for us; replacing a failed HD is the hosting provider's responsibility, in both materials and labor. This is not the case with a purchased server: someone would have to buy a replacement drive and either bring it to the colocation center and swap it themselves, or ship it there and be billed for the labor of replacing it.<br />
<br />
== What Can You Contribute? == <br />
<br />
{| class="wikitable"<br />
! Name<br />
! What You Can Contribute<br />
! For How Long?<br />
! Exit Strategy<br />
|-<br />
| ExampleArchiver<br />
| Describe what you are willing to buy/build/write/do. Talk about the connection you would use, the storage conditions, etc. How much money can you put into it? <br />
| For how long can you truly commit to this?<br />
| If you need to quit or wind down your contribution, what are you willing to do? Can you guarantee a period of notice? Are you willing to ship your hardware or media to another volunteer anywhere in the world, or will you want to keep it? <br />
|-<br />
| dnova<br />
|<br />
* Willing to burn and maintain a Blu-ray collection (can provide a burner and at least some discs).<br />
* Willing to write/maintain a tape library (but cannot provide tape drive/tapes).<br />
* Willing to participate in the simple pool or a storage pool, depending on technical details. <br />
* I can store media in a class 1000 cleanroom!<br />
* Willing to provide short-term storage for a few hundred GB of RAIDZ-1 storage on a 75/10 residential connection. <br />
| <br />
* 2+ years in my current geographical location and with cleanroom access. <br />
* Willing to continue indefinitely wherever I go, but some details may change accordingly. <br />
| Can give ample notice for either full upload and/or shipping of all media/hardware anywhere in the world. <br />
|-<br />
| vitorio<br />
|<br />
* Participating in the simple pool (I only have a laptop, so I'd store the HDs offline at home and check them monthly/quarterly)<br />
* Participating in the distributed pool (residential 30/10 connection)<br />
* Contributing $100/mo. for the server option<br />
| Indefinitely<br />
| Can give ample notice for either full upload and/or shipping of all hardware anywhere in the world.<br />
|-<br />
|}<br />
<br />
== Project-specific suggestions ==<br />
<br />
=== Twitch.tv (and other video services) ===<br />
<br />
* Keep the original video files in (semi-)offline storage, and store transcoded (compressed) versions on the Internet Archive; see the transcoding sketch below.<br />
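<br />
A minimal sketch of producing the compressed access copy, assuming ffmpeg is installed; the H.264/AAC settings are an assumption about what an acceptable access copy looks like, not an Archive Team standard:<br />
<pre>
import subprocess
import sys

def make_access_copy(original, access_copy):
    """Re-encode a recording for upload; the original is kept byte-for-byte."""
    subprocess.run(
        ["ffmpeg", "-i", original,
         "-c:v", "libx264", "-crf", "28",   # higher CRF = smaller, lossier
         "-c:a", "aac", "-b:a", "128k",
         access_copy],
        check=True,
    )

if __name__ == "__main__":
    make_access_copy(sys.argv[1], sys.argv[2])
</pre>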
<br />
== See Also ==<br />
*[[Storage Media]]<br />
<br />
== References ==<br />
<references/><br />
<br />
{{Navigation box}}</div>Dnovahttps://wiki.archiveteam.org/index.php?title=Valhalla&diff=20259Valhalla2014-09-29T05:20:50Z<p>Dnova: /* What Can You Contribute? */</p>
<hr />
<div>[[Image:Ms internet on a disc.jpg|300px|right]]<br />
This wiki page is a collection of ideas for Project '''Valhalla'''.<br />
<br />
This project/discussion has come around because there is a class of data currently existing, several times a year, as a massive amount of data with "large, but nominal" status within the Internet Archive. The largest example is currently MobileMe, which is hundreds of terabytes in the Internet Archive system (and in need of WARC conversion), which represents a cost amount far outstripping its use. Another is TwitPic, which is currently available (and might continue to be available) but which has shown itself to be a bad actor with regards to longevity and predictability for its sunset. <br />
<br />
Therefore, there is an argument that there could be a "third place" that data collected by Archive Team could sit, until the Internet Archive (or another entity) grows its coffers/storage enough that 80-100tb is "no big deal", just like 1tb of data was annoying in 2009 and now is totally understandable for the value, i.e. Geocities. <br />
<br />
This is for short-term (or potentially also long-term) storage options, say five years or less, of data generated by Archive Team.<br />
<br />
* What options are out there, generally?<br />
* What are the costs, roughly?<br />
* What are the positives and negatives?<br />
<br />
There has been a lot of study in this area over the years, of course, so links to known authorities and debates will be welcome as well.<br />
<br />
Join the discussion in [irc://irc.efnet.org/huntinggrounds #huntinggrounds].<br />
<br />
== Goals ==<br />
<br />
We want to:<br />
<br />
* Dump an unlimited<ref>Unlimited doesn't mean infinite, but it does mean that we shouldn't worry about running out of space. We won't be the only expanding data store.</ref> amount of data into something.<br />
* Recover that data at any point.<br />
<br />
We do not care about:<br />
<br />
* Immediate or continuous availability.<br />
<br />
We absolutely require:<br />
<br />
* Low (ideally, zero) human time for maintenance. If we have substantial human maintenance needs, we're probably going to need a Committee of Elders or something.<br />
* Data integrity. The storage medium must be impossibly durable or make it inexpensive/easy to copy and verify the data onto a fresh medium.<br />
<br />
It would be nice to have:<br />
<br />
* No special environmental requirements that could not be handled by a third party. (So nobody in Archive Team would have to set up some sort of climate-controlled data-cave; however, if this is already something that e.g. IA does and they are willing to lease space, that's cool.)<br />
<br />
== What does the Internet Archive do for this Situation, Anyway? ==<br />
<br />
''This section has not been cleared by the Internet Archive, and so should be considered a rough sketch.''<br />
<br />
The Internet Archive primarily wants "access" to the data it stores, so the primary storage methodology is spinning hard drives connected to a high-speed connection from multiple locations. These hard drives are between 4-6tb (as of 2014) and are of general grade, as is most of the hardware - the theory is that replacing cheap hardware is better than spending a lot of money on super-grade hardware (whatever that may be) and not being able to make the dollars stretch. Hundreds of drives die in a month and the resiliency of the system allows them all to hot-swap in replacements. <br />
<br />
There are multiple warehouses for storing the original books that are scanned, as well as materials like CD-ROMs and even hard drives. There are collections of tapes and CD-ROMs from previous iterations of storage, although they are thought of as drop-dead options instead of long-term archival storage - the preference is, first and foremost, the spinning hard drives.<br />
<br />
The Archive does not generally use tape technology, having run into the classic "whoops, no tape drive on earth reads these any more" and "whoops, this tape no longer works properly".<br />
<br />
The Archive has indicated that if Archive Team uses a physical storage method, such as tapes, paper, hard drives or anything else, that they are willing to store these materials "as long as they are exceedingly labelled".<br />
<br />
== Physical Options ==<br />
{| class="wikitable sortable"<br />
! Storage type<br />
! Cost ($/TB/year)<br />
! Storage density (m³/TB)<br />
! Theoretical lifespan<br />
! Practical, tested lifespan<br />
! Notes<br />
|-<br />
| Hard drives (simple distributed pool)<br />
| $150 (full cost of best reasonable 1TB+ external HD)<br />
| <br />
| <br />
| <br />
| September 2014, best reasonable 1TB+ external HD is [http://thewirecutter.com/reviews/the-best-external-desktop-hard-drive/ a 4TB WD]. 25+ pool members would need one HD each plus a computer plus software to distribute data across the entire pool.<br />
|-<br />
| Hard drives (dedicated distributed pool)<br />
| <br />
| <br />
| <br />
| <br />
| An off-the-shelf or otherwise specified, dedicated, network storage device used exclusively as part of a distributed pool.<br />
|-<br />
| Hard drives (SPOF) <ref>The [[Internet Archive]]'s cost per TB, with 24/7 online hard drives, is approximately $2000 for forever.</ref><br />
| $62 (but you have to buy 180TB)<br />
| <br />
| <br />
| <br />
| For a single location to provide all storage needs, building a [https://www.backblaze.com/blog/backblaze-storage-pod-4/ Backblaze Storage Pod 4.0] runs an average of $11,000, providing 180TB of [http://bioteam.net/2011/08/why-you-should-never-build-a-backblaze-pod/ non-redundant, not-highly-available] storage. (You really want more than one pod mirroring your data, but this is the most effective way to get that much storage in one place.)<br />
|-<br />
| Commercial / archival-grade tapes<br />
| <br />
| <br />
| <br />
| <br />
| <br />
|-<br />
| Consumer tape systems (VHS, Betamax, cassette tapes, ...)<br />
| <br />
| <br />
| <br />
| <br />
| <br />
|-<br />
| Vinyl<br />
| <br />
| <br />
| <br />
| <br />
| <br />
|-<br />
| [http://www.ollydbg.de/Paperbak/index.html PaperBack]<br />
| <br />
| <br />
| <br />
| <br />
| 500KB per letter sheet means 1TB is 2,199,024 sheets, or ~4400 reams (500 sheets each), or an 8'x16' room filled with 6' tall stacks. It would take 63.6 days of continuous printing to do this.<ref>A HP LaserJet 5Si printing 24 pages per minute which generates the 500K bytes per page, yielding approximately 200,000 bytes per second.</ref><br />
|-<br />
| [http://ronja.twibright.com/optar/ Optar]<br />
| <br />
| <br />
| <br />
| <br />
| At 200KB per page, this has less than half the storage density of Paperback.<br />
|-<br />
| Blu-Ray<br />
| $40 (50 pack spindle of 25GB BD-Rs)<br />
| <br />
| 30 years<ref>On the basis of the described studies and assuming adequate consideration of the specified conditions for storage and handling, as well as verification of data after writing, we estimate the Imation CD, DVD or Blu-ray media to have a theoretical readability of up to 30 years. The primary caveat is how you handle and store the media. http://support.tdkperformance.com/app/answers/detail/a_id/1685/~/life-expectancy-of-optical-media </ref><br />
| <br />
| Lasts a LOT longer than CD/DVD, but should not be assumed to last more than a decade. [http://arstechnica.com/information-technology/2014/01/why-facebook-thinks-blu-ray-discs-are-perfect-for-the-data-center/ Raidz3 with Blu-rays Doing a backup in groups of 15 disks]. Comes to under $.04/GB which is cheap, and low initial investment (drives) too!<br><br />
<br>Specifically, a 50pack spindle of 25GB BD-Rs could readily hold 1TB of data for $30-50 per spindle. 50GB and 100GB discs are more expensive per GB.<br />
|-<br />
| [http://en.wikipedia.org/wiki/M-DISC M-DISC]<br />
| <br />
| <br />
| <br />
| <br />
| Unproven technology, but potentially interesting.<br />
|-<br />
| Flash media<br />
| <br />
| <br />
| <br />
| <br />
| Very durable for online use, and usually fails from lots of writes. A drive might never wear out from cold-storage usage. Newer drives can have 10-year warranties. But capacitors may leak charge over time. JEDEC JESD218A only specifies 101 weeks (almost two years) retention without power, so we'd have to check the spec of the specific drives, or power them up and re-write the data to refresh it about once a year. Soliciting donations for old flash media from people, or sponsorship from flash companies?<br />
|-<br />
| Glass/metal etching<br />
| <br />
| <br />
| <br />
| <br />
| <br />
|-<br />
| Amazon Glacier<br />
| $122.88 (storage only, retrieval billed separately)<br />
| <br />
| average annual durability of 99.999999999% <ref>"Amazon Glacier is designed to provide average annual durability of 99.999999999% for an archive. The service redundantly stores data in multiple facilities and on multiple devices within each facility. To increase durability, Amazon Glacier synchronously stores your data across multiple facilities before returning SUCCESS on uploading archives. Glacier performs regular, systematic data integrity checks and is built to be automatically self-healing." Maciej Ceglowski thinks that's [https://blog.pinboard.in/2014/04/cloudy_snake_oil/ kinda bullshit compared to the failure events you don't plan for], of course.</ref><br />
| <br />
| Retrieval is billed separately. 5% or less per month into S3 is free (5% of 100TB is 5TB), and data can be copied out from S3 to a SATA HD for $2.50/hr. plus media handling and shipping fees. Downloading 5TB from S3 would cost $614.40 (~$122.88/TB), but only $44.82 to transfer to HD via USB 3 or SATA (USB 2 is slower).<br />
|-<br />
| Dropbox for Business<br />
| $160* ($795/year)<br />
| <br />
| <br />
| <br />
| Dropbox for Business provides a shared pool of 1TB per user, at $795/year (five user minimum, 5TB), and $125 each additional user/year.<br />
|-<br />
| Box.com for Business<br />
| $180* ("unlimited" storage for $900/year)<br />
| <br />
| <br />
| <br />
| Box.com for Business provides "unlimited" storage at $15/user/month, five user minimum, or $900/year.<br />
|-<br />
| Dedicated colocated storage servers<br />
| $100* (e.g. $1300 for one year of 12TB rackmount server rental)<br />
|<br />
|<br />
|<br />
| Rent [http://www.ovh.com/us/dedicated-servers/storage/ storage servers from managed hosting colocation providers], and pool data across them. Benefits include bandwidth and electricity being included in the cost, and files could be made available online immediately. Negatives include needing to administer tens of servers.<br />
|}<br />
<br />
== Software Options ==<br />
<br />
Some of the physical options require supporting software.<br />
<br />
Removable media requires a centralized index of who has what discs, where they are, how they are labeled, and what the process for retrieval/distribution is. It could just be a wiki page, but it does require something.<br />
<br />
A simple pool of HDs ("simple pool"), one without a shared filesystem, just people offering up HDs, requires software running on Windows, Linux and/or Mac hardware to allow Archive Team workers to learn who has free disk space, and to save content to those disks. This could be just an IRC conversation and SFTP, but the more centralized and automated, the more likely available disk space will be able to be utilized. Software that is not cross-platform cannot be used here.<br />
<br />
A simple distributed and redundant pool of HDs ("distributed pool") requires software running on Windows, Linux and Mac hardware to manage a global filesystem or object store, and distribute uploads across the entire pool of available space, and make multiple copies on an ongoing basis to ensure preservation of data if a pool member goes offline. This has to be automated and relatively maintenance-free, and ideally low-impact on CPU and memory if it will be running on personal machines with multi-TB USB drives hanging off them. Software that is not cross-platform cannot be used here.<br />
<br />
A dedicated distributed and redundant pool of HDs ("dedicated pool") requires a selection of dedicated hardware and disks for maximum availability, and software to run on that hardware to manage a global filesystem or object store. It has to be automated and relatively maintenance-free, but would be the only thing running on its dedicated hardware, and as such does not have to be cross-platform.<br />
<br />
{| class="wikitable sortable"<br />
! Software name<br />
! Filesystem or Object Store?<br />
! Platform(s)<br />
! License<br />
! Good for which pool?<br />
! Pros<br />
! Cons<br />
! Notes<br />
|-<br />
| Tahoe-LAFS<br />
| Filesystem<br />
| Windows, Mac, Linux<br />
| GPL 2+<br />
| Distributed, dedicated<br />
| Uses what people already have, can spread expenses out, could be a solution done with only software<br />
| Barrier to leaving is non-existent, might cause data-loss even with auto-fixing infrastructure. Too slow to be a primary offloading site. <ref>"Practically the following results have been reported: 16Mbps in throughput for writing and about 8.8Mbps in reading" -- from https://tahoe-lafs.org/trac/tahoe-lafs/wiki/FAQ, making it non-competitive with the 1-2 gigabit speeds needed when archiving twitch.tv.</ref><br />
| Accounting is experimental, meaning "in practice is that anybody running a storage node can also automatically shove shit onto it, with no way to track down who uploaded how much or where or what it is" -joepie91 on IRC<br />
|-<br />
| Ceph<br />
| Object store, Filesystem<br />
| Linux<br />
| LGPL<br />
| Dedicated<br />
|<br />
|<br />
|<br />
|-<br />
| GlusterFS<br />
| Filesystem<br />
| Linux, BSD, OpenSolaris<br />
| GPL 3<br />
| Dedicated<br />
|<br />
|<br />
|<br />
|-<br />
| Gfarm<br />
| Filesystem<br />
| Mac, Linux, BSD, Solaris<br />
| X11<br />
| Dedicated<br />
|<br />
|<br />
|<br />
|-<br />
| Quantcast<br />
| Filesystem<br />
| Linux<br />
| Apache<br />
| Dedicated<br />
|<br />
| Like HDFS, intended for MapReduce processing, which writes large files, and doesn't delete them. Random access and erasing or moving data around may not be performant.<br />
| <br />
|-<br />
| GlusterFS<br />
| Filesystem<br />
| Mac, Linux, BSD, Solaris<br />
| GPL 3<br />
| Dedicated<br />
|<br />
|<br />
|<br />
|-<br />
| HDFS<br />
| Filesystem<br />
| Java<br />
| Apache<br />
| Distributed, dedicated<br />
|<br />
| Like Quantcast, intended for MapReduce processing, which writes large files, and doesn't delete them. Random access and erasing or moving data around may not be performant.<br />
| <br />
|-<br />
| XtreemFS<br />
| Filesystem<br />
| Linux, Solaris<br />
| BSD<br />
| Dedicated<br />
|<br />
|<br />
|<br />
|-<br />
| MogileFS<br />
| Object store<br />
| Linux<br />
| GPL<br />
| Dedicated<br />
| Understands distributing files across multiple networks, not just multiple disks<br />
|<br />
| As an object store, you can't just mount it as a disk and dump files onto it, you have to push them into it through its API, and retrieve them the same way.<br />
|-<br />
| Riak CS<br />
| Object store<br />
| Mac, Linux, BSD<br />
| Apache<br />
| Dedicated<br />
| S3 API compatible<br />
| Multi-datacenter replication (which might be what you consider having multiple disparate users on different networks) is only available in the commercial offering.<br />
| A former Basho employee suggests this might not be a good fit due to the high latency and unstable connections we'd be dealing with. Datacenter-to-datacenter sync is an "entirely different implementation" than local replication, and would require the enterprise offering.<br />
|-<br />
| MongoDB GridFS<br />
| Object store<br />
| Windows, Mac, Linux<br />
| AGPL<br />
| Distributed, dedicated<br />
|<br />
|<br />
|<br />
|-<br />
| LeoFS<br />
| Object store<br />
| Mac, Linux<br />
| Apache<br />
| Dedicated<br />
| S3-compatible interface, beta NFS interface, supports multi-datacenter replication, designed with GUI administration in mind<br />
|<br />
|<br />
|-<br />
| BitTorrent Sync<br />
| Synchronization<br />
| Windows, Mac, Linux, BSD, NAS<br />
| Proprietary<br />
| Simple<br />
| Commercially supported software<br />
| As straight synchronization software, it mirrors folders across devices. Individual users would have to make synched folders available to get copies of archives, and then they would be mirrored, and that's it.<br />
| Synchronization software in general is not the right solution for this problem.<br />
|}<br />
<br />
== Non-options ==<br />
* Ink-based Consumer Optical Media (CDs, DVD, etc.) <br />
** Differences between Blu-Ray and DVD? DVDs do not last very long. The fact is, the history of optical writable media has been on of chicanery, failure, and overpromising while under-delivering. Some DVDs failed within a year. There are claims Blu-Ray is different, but fool me 3,504 times, shame on me.<br />
* BitTorrent Sync<br />
** Proprietary (currently), so not a good idea to use as an archival format/platform<br />
* Amazon S3 / Google Cloud Storage / Microsoft Azure Storage<br />
** Amazon S3 might be a viable waypoint for intra-month storage ($30.68/TB), but retrieval over the internet, as with Glacier, is expensive, $8499.08 for 100TB. Google's and Microsoft's offerings are all in the same price range.<br />
* Floppies<br />
** ''"Because 1.4 trillion floppies exists less than 700 billion floppies. HYPOTHETICALLY, if you set twenty stacks side by side, figure a quarter centimeter per floppy thickness, excluded the size of the drive needed to read the floppies you would still need a structure 175,000 ft. high to house them. Let's also assume that the failure rate for floppies is about 5% (everyone knows that varies by brand, usage, time of manufacture, materials used, etc, but lets say 5% per year). 70 million of those 1.4 trillion floppies are unusuable. Figuring 1.4 MB per floppy disk, you are losing approximately 100MB of porn each year. Assuming it takes 5 seconds to replace a bad floppy, you would have to spend 97,222 hrs/yr to replace them. Considering there are only 8,760 hrs per year, you would require a staff of 12 people replacing floppies around the clock or 24 people on 12 hr shifts. Figuring $7/hr you would spend $367,920 on labor alone. Figuring a nickel per bad floppy, you would need $3,500,000 annually in floppy disks, bringing your 1TB floppy raid operating costs (excluding electricity, etc) to $3,867, 920 and a whole landfill of corrupted porn. Thank you for destroying the planet and bankrupting a small country with your floppy based porn RAID."'' ([http://gizmodo.com/5431497/why-its-better-to-pretend-you-dont-know-anything-about-computers?comment=17793028#comments source])<br />
<br />
== From IRC ==<br />
<br />
<Drevkevac> we are looking to store 100TB+ of media offline for 25+ years<br />
<Drevkevac> if anyone wants to drop in, I will pastebin the chat log<br />
<rat> DVDR and BR-R are not high volume. When you have massive amounts of data, raid arrays have too many points of failure.<br />
<rat> Drevkevac: I work in a tv studio. We have 30+ years worth of tapes. And all of them are still good.<br />
<rat> find a hard drive from 30 years ago and see how well it hooks up ;)<br />
<brousch_> 1500 Taiyo Yuden Gold CD-Rs http://www.mediasupply.com/taiyo-yuden-gold-cd-rs.html<br />
<br />
<Drevkevac> still, if its true, you could do, perhaps, raidz3s in groups of 15 disks or so?<br />
<SketchCow> Please add paperbak to the wiki page.<br />
<SketchCow> Fuck Optical Media. not an option;.<br />
<Drevkevac> that would give you ~300GB per disk group, with 3 disks<br />
<br />
== Where are you going to put it? ==<br />
<br />
Okay, so you have the tech. Now you need a place for it to live.<br />
<br />
Possibilities:<br />
<br />
* The Internet Archive Physical Warehouse, Richmond, CA<br />
** The Internet Archive has several physical storage facilities, including warehouses in Richmond, CA (home of the Physical Archive) and the main location in San Francisco, CA. They have indicated they are willing to take copies of Archive Team-sponsored physical materials with the intent of them being ingested into the Archive at large over time, as costs lower and 100tb collections are not as big a drain (or a rash of funding arrives elsewhere).<br />
<br />
* Living Computer Museum, Seattle, WA<br />
** In discussions with Jason Scott, the Living Computer Museum has indicated they will have physical storage available for computer historical materials. Depending on the items being saved by Archive Team, they may be willing to host/hold copies for the forseable future.<br />
<br />
* Library of Congress, Washington, DC<br />
** The Library of Congress may be willing to take a donation of physical storage, although it is not indicated what they may do long-term with it.<br />
<br />
Multiple copies would of course be great.<br />
<br />
== No, seriously, how are you going to actually DO it ==<br />
<br />
There are only a few practical hardware+software+process combinations. In order of cost to each volunteer:<br />
<br />
* A pool of volunteers with Blu-ray burners commit to ("the Blu-ray option"): <br />
** buying a 50-disc spindle of 25GB discs per TB per project,<br />
** burning them,<br />
** verifying them,<br />
** storing them somewhere climate-controlled (a shelf in a house with AC and heat is fine, an attic/garage/flooded basement is not),<br />
** verifying them regularly (monthly? quarterly?) and replacing discs if necessary, and<br />
** shipping them somewhere else upon request, with no expectation of return (permanent storage, consolidation, etc.).<br />
<br />
This probably requires a minimum of three volunteers per TB per project. Probably best to pre-split the data into < 25GB chunks so each disc can be labeled the same and expected to have the same data on it. Fifty 25GB discs is a little more than a TB, and it's expected you'll lose a few to bad burns each time, but it might be worth buying more than a spindle and generating parity files onto additional discs.<br />
<br />
* A pool of volunteers commit to ("the simple pool"):<br />
** buying a best reasonable external HD,<br />
** downloading archives to it,<br />
** keeping it spun up, or spinning it up regularly (monthly? quarterly?) and running filesystem and content checks on it,<br />
** storing it somewhere climate-controlled (a shelf in a house with AC and heat is fine, an attic/garage/flooded basement is not),<br />
** buying additional HDs once it's full or if there are drive errors, and<br />
** shipping it somewhere else upon request, with no expectation of return (permanent storage, consolidation, etc.).<br />
<br />
Same as with Blu-rays, and not really any more expensive ($150 == $37.50 for one 1TB of Blu-rays * 4, or one 4TB HD), except look at all that disc-swapping time and effort you don't have to do. You don't have to split data into chunks, but you do want to download it in a resumable fashion and verify it afterwards, so, checksums, parity files, something. You also risk losing a lot more if a drive fails, and the cost per-volunteer is higher (replacing a whole drive versus replacing individual discs or spindles). As such, you still probably want a minimum of three volunteers per TB per project (so a 2TB project needs six volunteers with 1TB each, not three volunteers holding all 2TB each).<br />
<br />
* A pool of volunteers commit to ("the distributed pool"):<br />
** all buying the same, standard, inexpensive, hackable, RAID 1, NAS,<br />
*** WD My Cloud Mirror (starts at $300 for 2TB [called "4TB," only 2TB with mirroring])<br />
*** QNAP (2-bay starts at $140 without HDs)<br />
*** Synology (2-bay starts at $200 without HDs)<br />
*** Pogoplug Series 4 + two best reasonable external HD + software RAID 1, or a download script that manually mirrors files ($20 without HDs)<br />
** keeping it spun up, online, and possibly accessible by external AT admins,<br />
** storing it somewhere climate-controlled (a shelf in a house with AC and heat is fine, an attic/garage/flooded basement is not),<br />
** buying entire additional units once they are full or if there are drive errors, and<br />
** shipping the drives (or the entire My Cloud Mirror unit, if that's the one selected) somewhere else upon request, with no expectation of return (permanent storage, consolidation, etc.).<br />
<br />
These units provide dramatically improved reliability for content, enough that perhaps you only need two volunteers per project, and no need to split by TB, since each volunteer would have two copies. Having everyone buy the same hardware means reduced administration time overall, especially if custom scripts are involved. QNAP and Synology both have official SDKs, and all of them run some flavor of Linux, with Synology supporting SSH logins out of the box. The Pogoplug is the most underpowered of the options, but even it should be powerful enough to run a MogileFS storage node, or a script that downloads to one HD and copies to the other. (Checksums would be really slow, though.) This is moderately expensive per-volunteer, with an upfront cost of $320-$500.<br />
<br />
* A pool of volunteers commit to ("the dedicated pool"):<br />
** all buying the same, standard, expensive NAS,<br />
*** iXsystems FreeNAS Mini (starts at $1000 without HDs),<br />
*** A DIY FreeNAS box ($300+ without HDs),<br />
*** A DIY NexentaStor box (probably the same as the DIY FreeNAS box)<br />
** keeping it spun up, online, and possibly accessible by external AT admins,<br />
** storing it somewhere climate-controlled and well-ventilated (a shelf with no airflow is not fine),<br />
** replacing drives if there are drive errors,<br />
** migrating the pool to larger disks once it starts getting full, and<br />
** shipping the drives somewhere else upon request, with no expectation of return (permanent storage, consolidation, etc.).<br />
<br />
A set of volunteers with (comparatively) expensive network-attached storage gives you a lot of storage in a lot of locations, potentially tens of redundant TB in each one, depending on the size of the chassis. You want everyone running the same NAS software, but the hardware can vary somewhat; however, the hardware should all have ECC RAM, and the more the better. MogileFS storage nodes are known to run on NexentaStor, and FreeNAS supports plugins, so it could be adapted to run there, or you could figure out e.g. LeoFS (which also expects ZFS). This is the most expensive option per-volunteer, upfront costs starting at around $1300 for a DIY box with four 4TB WD Red drives.<br />
<br />
* A pool of volunteers set up a recurring payment to fund ("the server option"):<br />
** one or more rented, managed, storage servers; or<br />
** saving up to buy one or more storage servers, and then hosting it somewhere.<br />
<br />
A rented server has no hardware maintenance costs; replacing a failed HD is the responsibility of the hosting provider, both in terms of materials cost and in labor cost. This is not the case with a purchased server, where someone would have to buy a replacement hard drive, bring it to the colocation center, and replace the drive; or someone would have to buy a replacement disk, ship it to the colocation center, and then they would bill someone for the labor involved in replacing it.<br />
<br />
== What Can You Contribute? == <br />
<br />
{| class="wikitable"<br />
! Name<br />
! What You Can Contribute<br />
! For How Long?<br />
! Exit Strategy<br />
|-<br />
| ExampleArchiver<br />
| Describe what you are willing to buy/build/write/do. Talk about the connection you would use, the storage conditions, etc. How much money can you put into it? <br />
| For how long can you truly commit to this?<br />
| If you need to quit or wind down your contribution, what are you willing to do? Can you guarantee a period of notice? Are you willing to ship your hardware or media to another volunteer anywhere in the world? <br />
|-<br />
| dnova<br />
|<br />
* Willing to burn and maintain a blu-ray collection (can to provide burner and at least some discs).<br />
* Willing to write/maintain tape library (but cannot provide tape drive/tapes).<br />
* Willing to participate in simple pool or storage pool, depending on technical details. <br />
* I can store media in a class 1000 cleanroom!<br />
* Willing to provide short-term storage for few hundreds of GB of RAIDZ-1 storage on a 75/10 residential connection. <br />
| <br />
* 2+ years in my current geographical location and with cleanroom access. <br />
* Willing to continue indefinitely wherever I go, but some details may change accordingly. <br />
| Can give ample notice for either full upload and/or shipping of all hardware anywhere in the world. <br />
|-<br />
| vitorio<br />
|<br />
* Participating in the simple pool (I only have a laptop, so I'd store the HDs offline at home and check them monthly/quarterly)<br />
* Participating in the distributed pool (residential 30/10 connection)<br />
* Contributing $100/mo. for the server option<br />
| Indefinitely<br />
| Can give ample notice for either full upload and/or shipping of all hardware anywhere in the world.<br />
|-<br />
|}<br />
<br />
== Project-specific suggestions ==<br />
<br />
=== Twitch.tv (and other video services) ===<br />
<br />
* Keep the original video files in (semi-)offline storage, and store transcoded (compressed) versions on the Internet Archive.<br />
<br />
== See Also ==<br />
*[[Storage Media]]<br />
<br />
== References ==<br />
<references/><br />
<br />
{{Navigation box}}</div>Dnovahttps://wiki.archiveteam.org/index.php?title=Valhalla&diff=20258Valhalla2014-09-29T05:18:58Z<p>Dnova: /* What Can You Contribute? */</p>
<hr />
<div>[[Image:Ms internet on a disc.jpg|300px|right]]<br />
This wiki page is a collection of ideas for Project '''Valhalla'''.<br />
<br />
This project/discussion has come around because there is a class of data currently existing, several times a year, as a massive amount of data with "large, but nominal" status within the Internet Archive. The largest example is currently MobileMe, which is hundreds of terabytes in the Internet Archive system (and in need of WARC conversion), which represents a cost amount far outstripping its use. Another is TwitPic, which is currently available (and might continue to be available) but which has shown itself to be a bad actor with regards to longevity and predictability for its sunset. <br />
<br />
Therefore, there is an argument that there could be a "third place" that data collected by Archive Team could sit, until the Internet Archive (or another entity) grows its coffers/storage enough that 80-100tb is "no big deal", just like 1tb of data was annoying in 2009 and now is totally understandable for the value, i.e. Geocities. <br />
<br />
This is for short-term (or potentially also long-term) storage options, say five years or less, of data generated by Archive Team.<br />
<br />
* What options are out there, generally?<br />
* What are the costs, roughly?<br />
* What are the positives and negatives?<br />
<br />
There has been a lot of study in this area over the years, of course, so links to known authorities and debates will be welcome as well.<br />
<br />
Join the discussion in [irc://irc.efnet.org/huntinggrounds #huntinggrounds].<br />
<br />
== Goals ==<br />
<br />
We want to:<br />
<br />
* Dump an unlimited<ref>Unlimited doesn't mean infinite, but it does mean that we shouldn't worry about running out of space. We won't be the only expanding data store.</ref> amount of data into something.<br />
* Recover that data at any point.<br />
<br />
We do not care about:<br />
<br />
* Immediate or continuous availability.<br />
<br />
We absolutely require:<br />
<br />
* Low (ideally, zero) human time for maintenance. If we have substantial human maintenance needs, we're probably going to need a Committee of Elders or something.<br />
* Data integrity. The storage medium must be impossibly durable or make it inexpensive/easy to copy and verify the data onto a fresh medium.<br />
<br />
It would be nice to have:<br />
<br />
* No special environmental requirements that could not be handled by a third party. (So nobody in Archive Team would have to set up some sort of climate-controlled data-cave; however, if this is already something that e.g. IA does and they are willing to lease space, that's cool.)<br />
<br />
== What does the Internet Archive do for this Situation, Anyway? ==<br />
<br />
''This section has not been cleared by the Internet Archive, and so should be considered a rough sketch.''<br />
<br />
The Internet Archive primarily wants "access" to the data it stores, so the primary storage methodology is spinning hard drives connected to a high-speed connection from multiple locations. These hard drives are between 4-6tb (as of 2014) and are of general grade, as is most of the hardware - the theory is that replacing cheap hardware is better than spending a lot of money on super-grade hardware (whatever that may be) and not being able to make the dollars stretch. Hundreds of drives die in a month and the resiliency of the system allows them all to hot-swap in replacements. <br />
<br />
There are multiple warehouses for storing the original books that are scanned, as well as materials like CD-ROMs and even hard drives. There are collections of tapes and CD-ROMs from previous iterations of storage, although they are thought of as drop-dead options instead of long-term archival storage - the preference is, first and foremost, the spinning hard drives.<br />
<br />
The Archive does not generally use tape technology, having run into the classic "whoops, no tape drive on earth reads these any more" and "whoops, this tape no longer works properly".<br />
<br />
The Archive has indicated that if Archive Team uses a physical storage method, such as tapes, paper, hard drives or anything else, that they are willing to store these materials "as long as they are exceedingly labelled".<br />
<br />
== Physical Options ==<br />
{| class="wikitable sortable"<br />
! Storage type<br />
! Cost ($/TB/year)<br />
! Storage density (m³/TB)<br />
! Theoretical lifespan<br />
! Practical, tested lifespan<br />
! Notes<br />
|-<br />
| Hard drives (simple distributed pool)<br />
| $150 (full cost of best reasonable 1TB+ external HD)<br />
| <br />
| <br />
| <br />
| September 2014, best reasonable 1TB+ external HD is [http://thewirecutter.com/reviews/the-best-external-desktop-hard-drive/ a 4TB WD]. 25+ pool members would need one HD each plus a computer plus software to distribute data across the entire pool.<br />
|-<br />
| Hard drives (dedicated distributed pool)<br />
| <br />
| <br />
| <br />
| <br />
| An off-the-shelf or otherwise specified, dedicated, network storage device used exclusively as part of a distributed pool.<br />
|-<br />
| Hard drives (SPOF) <ref>The [[Internet Archive]]'s cost per TB, with 24/7 online hard drives, is approximately $2000 for forever.</ref><br />
| $62 (but you have to buy 180TB)<br />
| <br />
| <br />
| <br />
| For a single location to provide all storage needs, building a [https://www.backblaze.com/blog/backblaze-storage-pod-4/ Backblaze Storage Pod 4.0] runs an average of $11,000, providing 180TB of [http://bioteam.net/2011/08/why-you-should-never-build-a-backblaze-pod/ non-redundant, not-highly-available] storage. (You really want more than one pod mirroring your data, but this is the most effective way to get that much storage in one place.)<br />
|-<br />
| Commercial / archival-grade tapes<br />
| <br />
| <br />
| <br />
| <br />
| <br />
|-<br />
| Consumer tape systems (VHS, Betamax, cassette tapes, ...)<br />
| <br />
| <br />
| <br />
| <br />
| <br />
|-<br />
| Vinyl<br />
| <br />
| <br />
| <br />
| <br />
| <br />
|-<br />
| [http://www.ollydbg.de/Paperbak/index.html PaperBack]<br />
| <br />
| <br />
| <br />
| <br />
| 500KB per letter sheet means 1TB is 2,199,024 sheets, or ~4400 reams (500 sheets each), or an 8'x16' room filled with 6' tall stacks. It would take 63.6 days of continuous printing to do this.<ref>A HP LaserJet 5Si printing 24 pages per minute which generates the 500K bytes per page, yielding approximately 200,000 bytes per second.</ref><br />
|-<br />
| [http://ronja.twibright.com/optar/ Optar]<br />
| <br />
| <br />
| <br />
| <br />
| At 200KB per page, this has less than half the storage density of Paperback.<br />
|-<br />
| Blu-Ray<br />
| $40 (50 pack spindle of 25GB BD-Rs)<br />
| <br />
| 30 years<ref>On the basis of the described studies and assuming adequate consideration of the specified conditions for storage and handling, as well as verification of data after writing, we estimate the Imation CD, DVD or Blu-ray media to have a theoretical readability of up to 30 years. The primary caveat is how you handle and store the media. http://support.tdkperformance.com/app/answers/detail/a_id/1685/~/life-expectancy-of-optical-media </ref><br />
| <br />
| Lasts a LOT longer than CD/DVD, but should not be assumed to last more than a decade. [http://arstechnica.com/information-technology/2014/01/why-facebook-thinks-blu-ray-discs-are-perfect-for-the-data-center/ Raidz3 with Blu-rays Doing a backup in groups of 15 disks]. Comes to under $.04/GB which is cheap, and low initial investment (drives) too!<br><br />
<br>Specifically, a 50pack spindle of 25GB BD-Rs could readily hold 1TB of data for $30-50 per spindle. 50GB and 100GB discs are more expensive per GB.<br />
|-<br />
| [http://en.wikipedia.org/wiki/M-DISC M-DISC]<br />
| <br />
| <br />
| <br />
| <br />
| Unproven technology, but potentially interesting.<br />
|-<br />
| Flash media<br />
| <br />
| <br />
| <br />
| <br />
| Very durable for online use, and usually fails from lots of writes. A drive might never wear out from cold-storage usage. Newer drives can have 10-year warranties. But capacitors may leak charge over time. JEDEC JESD218A only specifies 101 weeks (almost two years) retention without power, so we'd have to check the spec of the specific drives, or power them up and re-write the data to refresh it about once a year. Soliciting donations for old flash media from people, or sponsorship from flash companies?<br />
|-<br />
| Glass/metal etching<br />
| <br />
| <br />
| <br />
| <br />
| <br />
|-<br />
| Amazon Glacier<br />
| $122.88 (storage only, retrieval billed separately)<br />
| <br />
| average annual durability of 99.999999999% <ref>"Amazon Glacier is designed to provide average annual durability of 99.999999999% for an archive. The service redundantly stores data in multiple facilities and on multiple devices within each facility. To increase durability, Amazon Glacier synchronously stores your data across multiple facilities before returning SUCCESS on uploading archives. Glacier performs regular, systematic data integrity checks and is built to be automatically self-healing." Maciej Ceglowski thinks that's [https://blog.pinboard.in/2014/04/cloudy_snake_oil/ kinda bullshit compared to the failure events you don't plan for], of course.</ref><br />
| <br />
| Retrieval is billed separately. 5% or less per month into S3 is free (5% of 100TB is 5TB), and data can be copied out from S3 to a SATA HD for $2.50/hr. plus media handling and shipping fees. Downloading 5TB from S3 would cost $614.40 (~$122.88/TB), but only $44.82 to transfer to HD via USB 3 or SATA (USB 2 is slower).<br />
|-<br />
| Dropbox for Business<br />
| $160* ($795/year)<br />
| <br />
| <br />
| <br />
| Dropbox for Business provides a shared pool of 1TB per user, at $795/year (five user minimum, 5TB), and $125 each additional user/year.<br />
|-<br />
| Box.com for Business<br />
| $180* ("unlimited" storage for $900/year)<br />
| <br />
| <br />
| <br />
| Box.com for Business provides "unlimited" storage at $15/user/month, five user minimum, or $900/year.<br />
|-<br />
| Dedicated colocated storage servers<br />
| $100* (e.g. $1300 for one year of 12TB rackmount server rental)<br />
|<br />
|<br />
|<br />
| Rent [http://www.ovh.com/us/dedicated-servers/storage/ storage servers from managed hosting colocation providers], and pool data across them. Benefits include bandwidth and electricity being included in the cost, and files could be made available online immediately. Negatives include needing to administer tens of servers.<br />
|}<br />
<br />
== Software Options ==<br />
<br />
Some of the physical options require supporting software.<br />
<br />
Removable media requires a centralized index of who has what discs, where they are, how they are labeled, and what the process for retrieval/distribution is. It could just be a wiki page, but it does require something.<br />
<br />
A simple pool of HDs ("simple pool"), one without a shared filesystem, just people offering up HDs, requires software running on Windows, Linux and/or Mac hardware to allow Archive Team workers to learn who has free disk space, and to save content to those disks. This could be just an IRC conversation and SFTP, but the more centralized and automated, the more likely available disk space will be able to be utilized. Software that is not cross-platform cannot be used here.<br />
<br />
A simple distributed and redundant pool of HDs ("distributed pool") requires software running on Windows, Linux and Mac hardware to manage a global filesystem or object store, and distribute uploads across the entire pool of available space, and make multiple copies on an ongoing basis to ensure preservation of data if a pool member goes offline. This has to be automated and relatively maintenance-free, and ideally low-impact on CPU and memory if it will be running on personal machines with multi-TB USB drives hanging off them. Software that is not cross-platform cannot be used here.<br />
<br />
A dedicated distributed and redundant pool of HDs ("dedicated pool") requires a selection of dedicated hardware and disks for maximum availability, and software to run on that hardware to manage a global filesystem or object store. It has to be automated and relatively maintenance-free, but would be the only thing running on its dedicated hardware, and as such does not have to be cross-platform.<br />
<br />
{| class="wikitable sortable"<br />
! Software name<br />
! Filesystem or Object Store?<br />
! Platform(s)<br />
! License<br />
! Good for which pool?<br />
! Pros<br />
! Cons<br />
! Notes<br />
|-<br />
| Tahoe-LAFS<br />
| Filesystem<br />
| Windows, Mac, Linux<br />
| GPL 2+<br />
| Distributed, dedicated<br />
| Uses what people already have, can spread expenses out, could be a solution done with only software<br />
| Barrier to leaving is non-existent, might cause data-loss even with auto-fixing infrastructure. Too slow to be a primary offloading site. <ref>"Practically the following results have been reported: 16Mbps in throughput for writing and about 8.8Mbps in reading" -- from https://tahoe-lafs.org/trac/tahoe-lafs/wiki/FAQ, making it non-competitive with the 1-2 gigabit speeds needed when archiving twitch.tv.</ref><br />
| Accounting is experimental, meaning "in practice is that anybody running a storage node can also automatically shove shit onto it, with no way to track down who uploaded how much or where or what it is" -joepie91 on IRC<br />
|-<br />
| Ceph<br />
| Object store, Filesystem<br />
| Linux<br />
| LGPL<br />
| Dedicated<br />
|<br />
|<br />
|<br />
|-<br />
| GlusterFS<br />
| Filesystem<br />
| Linux, BSD, OpenSolaris<br />
| GPL 3<br />
| Dedicated<br />
|<br />
|<br />
|<br />
|-<br />
| Gfarm<br />
| Filesystem<br />
| Mac, Linux, BSD, Solaris<br />
| X11<br />
| Dedicated<br />
|<br />
|<br />
|<br />
|-<br />
| Quantcast<br />
| Filesystem<br />
| Linux<br />
| Apache<br />
| Dedicated<br />
|<br />
| Like HDFS, intended for MapReduce processing, which writes large files, and doesn't delete them. Random access and erasing or moving data around may not be performant.<br />
| <br />
|-<br />
| GlusterFS<br />
| Filesystem<br />
| Mac, Linux, BSD, Solaris<br />
| GPL 3<br />
| Dedicated<br />
|<br />
|<br />
|<br />
|-<br />
| HDFS<br />
| Filesystem<br />
| Java<br />
| Apache<br />
| Distributed, dedicated<br />
|<br />
| Like Quantcast, intended for MapReduce processing, which writes large files, and doesn't delete them. Random access and erasing or moving data around may not be performant.<br />
| <br />
|-<br />
| XtreemFS<br />
| Filesystem<br />
| Linux, Solaris<br />
| BSD<br />
| Dedicated<br />
|<br />
|<br />
|<br />
|-<br />
| MogileFS<br />
| Object store<br />
| Linux<br />
| GPL<br />
| Dedicated<br />
| Understands distributing files across multiple networks, not just multiple disks<br />
|<br />
| As an object store, you can't just mount it as a disk and dump files onto it, you have to push them into it through its API, and retrieve them the same way.<br />
|-<br />
| Riak CS<br />
| Object store<br />
| Mac, Linux, BSD<br />
| Apache<br />
| Dedicated<br />
| S3 API compatible<br />
| Multi-datacenter replication (which might be what you consider having multiple disparate users on different networks) is only available in the commercial offering.<br />
| A former Basho employee suggests this might not be a good fit due to the high latency and unstable connections we'd be dealing with. Datacenter-to-datacenter sync is an "entirely different implementation" than local replication, and would require the enterprise offering.<br />
|-<br />
| MongoDB GridFS<br />
| Object store<br />
| Windows, Mac, Linux<br />
| AGPL<br />
| Distributed, dedicated<br />
|<br />
|<br />
|<br />
|-<br />
| LeoFS<br />
| Object store<br />
| Mac, Linux<br />
| Apache<br />
| Dedicated<br />
| S3-compatible interface, beta NFS interface, supports multi-datacenter replication, designed with GUI administration in mind<br />
|<br />
|<br />
|-<br />
| BitTorrent Sync<br />
| Synchronization<br />
| Windows, Mac, Linux, BSD, NAS<br />
| Proprietary<br />
| Simple<br />
| Commercially supported software<br />
| As straight synchronization software, it mirrors folders across devices. Individual users would have to make synched folders available to get copies of archives, and then they would be mirrored, and that's it.<br />
| Synchronization software in general is not the right solution for this problem.<br />
|}<br />
<br />
== Non-options ==<br />
* Ink-based Consumer Optical Media (CDs, DVD, etc.) <br />
** Differences between Blu-Ray and DVD? DVDs do not last very long. The fact is, the history of optical writable media has been on of chicanery, failure, and overpromising while under-delivering. Some DVDs failed within a year. There are claims Blu-Ray is different, but fool me 3,504 times, shame on me.<br />
* BitTorrent Sync<br />
** Proprietary (currently), so not a good idea to use as an archival format/platform<br />
* Amazon S3 / Google Cloud Storage / Microsoft Azure Storage<br />
** Amazon S3 might be a viable waypoint for intra-month storage ($30.68/TB), but retrieval over the internet, as with Glacier, is expensive, $8499.08 for 100TB. Google's and Microsoft's offerings are all in the same price range.<br />
* Floppies<br />
** ''"Because 1.4 trillion floppies exists less than 700 billion floppies. HYPOTHETICALLY, if you set twenty stacks side by side, figure a quarter centimeter per floppy thickness, excluded the size of the drive needed to read the floppies you would still need a structure 175,000 ft. high to house them. Let's also assume that the failure rate for floppies is about 5% (everyone knows that varies by brand, usage, time of manufacture, materials used, etc, but lets say 5% per year). 70 million of those 1.4 trillion floppies are unusuable. Figuring 1.4 MB per floppy disk, you are losing approximately 100MB of porn each year. Assuming it takes 5 seconds to replace a bad floppy, you would have to spend 97,222 hrs/yr to replace them. Considering there are only 8,760 hrs per year, you would require a staff of 12 people replacing floppies around the clock or 24 people on 12 hr shifts. Figuring $7/hr you would spend $367,920 on labor alone. Figuring a nickel per bad floppy, you would need $3,500,000 annually in floppy disks, bringing your 1TB floppy raid operating costs (excluding electricity, etc) to $3,867, 920 and a whole landfill of corrupted porn. Thank you for destroying the planet and bankrupting a small country with your floppy based porn RAID."'' ([http://gizmodo.com/5431497/why-its-better-to-pretend-you-dont-know-anything-about-computers?comment=17793028#comments source])<br />
<br />
== From IRC ==<br />
<br />
<Drevkevac> we are looking to store 100TB+ of media offline for 25+ years<br />
<Drevkevac> if anyone wants to drop in, I will pastebin the chat log<br />
<rat> DVDR and BR-R are not high volume. When you have massive amounts of data, raid arrays have too many points of failure.<br />
<rat> Drevkevac: I work in a tv studio. We have 30+ years worth of tapes. And all of them are still good.<br />
<rat> find a hard drive from 30 years ago and see how well it hooks up ;)<br />
<brousch_> 1500 Taiyo Yuden Gold CD-Rs http://www.mediasupply.com/taiyo-yuden-gold-cd-rs.html<br />
<br />
<Drevkevac> still, if its true, you could do, perhaps, raidz3s in groups of 15 disks or so?<br />
<SketchCow> Please add paperbak to the wiki page.<br />
<SketchCow> Fuck Optical Media. not an option;.<br />
<Drevkevac> that would give you ~300GB per disk group, with 3 disks<br />
<br />
== Where are you going to put it? ==<br />
<br />
Okay, so you have the tech. Now you need a place for it to live.<br />
<br />
Possibilities:<br />
<br />
* The Internet Archive Physical Warehouse, Richmond, CA<br />
** The Internet Archive has several physical storage facilities, including warehouses in Richmond, CA (home of the Physical Archive) and the main location in San Francisco, CA. They have indicated they are willing to take copies of Archive Team-sponsored physical materials with the intent of them being ingested into the Archive at large over time, as costs lower and 100tb collections are not as big a drain (or a rash of funding arrives elsewhere).<br />
<br />
* Living Computer Museum, Seattle, WA<br />
** In discussions with Jason Scott, the Living Computer Museum has indicated it will have physical storage available for computer-historical materials. Depending on the items being saved by Archive Team, it may be willing to host/hold copies for the foreseeable future.<br />
<br />
* Library of Congress, Washington, DC<br />
** The Library of Congress may be willing to take a donation of physical storage, although it has not indicated what it would do with it long-term.<br />
<br />
Multiple copies would of course be great.<br />
<br />
== No, seriously, how are you going to actually DO it ==<br />
<br />
There are only a few practical hardware+software+process combinations. In order of cost to each volunteer:<br />
<br />
* A pool of volunteers with Blu-ray burners commit to ("the Blu-ray option"): <br />
** buying a 50-disc spindle of 25GB discs per TB per project,<br />
** burning them,<br />
** verifying them,<br />
** storing them somewhere climate-controlled (a shelf in a house with AC and heat is fine, an attic/garage/flooded basement is not),<br />
** verifying them regularly (monthly? quarterly?) and replacing discs if necessary, and<br />
** shipping them somewhere else upon request, with no expectation of return (permanent storage, consolidation, etc.).<br />
<br />
This probably requires a minimum of three volunteers per TB per project. It's probably best to pre-split the data into <25GB chunks so every volunteer's discs can be labeled identically and expected to hold the same data. Fifty 25GB discs is a little more than a TB, and you should expect to lose a few to bad burns each time, so it may be worth buying more than one spindle and burning parity files onto the extra discs.<br />
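<br />
A minimal sketch of the pre-split step, in Python. The 24GB chunk size (headroom on a 25GB disc), file names, and manifest format are illustrative choices, not a project standard:<br />
<pre>
import hashlib
import os

CHUNK_SIZE = 24 * 1000**3  # stay under a 25GB BD-R; disc capacities are decimal GB

def split_archive(path, out_dir):
    """Split one big archive into disc-sized chunks plus a SHA-256 manifest."""
    os.makedirs(out_dir, exist_ok=True)
    manifest = []
    with open(path, "rb") as src:
        index = 0
        while True:
            chunk_name = "%s.%03d" % (os.path.basename(path), index)
            chunk_path = os.path.join(out_dir, chunk_name)
            digest = hashlib.sha256()
            written = 0
            with open(chunk_path, "wb") as dst:
                while written < CHUNK_SIZE:
                    block = src.read(1024 * 1024)
                    if not block:
                        break
                    dst.write(block)
                    digest.update(block)
                    written += len(block)
            if written == 0:
                os.remove(chunk_path)  # EOF landed exactly on a chunk boundary
                break
            manifest.append("%s  %s" % (digest.hexdigest(), chunk_name))
            index += 1
    # Burn the manifest onto every disc, so any single disc can be verified alone.
    with open(os.path.join(out_dir, "MANIFEST.sha256"), "w") as f:
        f.write("\n".join(manifest) + "\n")

split_archive("project-archive.tar", "discs")  # placeholder archive name
</pre>
Checksums only detect damage; parity files (e.g. par2) on the extra discs are what let you repair it.<br />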
<br />
* A pool of volunteers commit to ("the simple pool"):<br />
** buying a best reasonable external HD,<br />
** downloading archives to it,<br />
** keeping it spun up, or spinning it up regularly (monthly? quarterly?) and running filesystem and content checks on it,<br />
** storing it somewhere climate-controlled (a shelf in a house with AC and heat is fine, an attic/garage/flooded basement is not),<br />
** buying additional HDs once it's full or if there are drive errors, and<br />
** shipping it somewhere else upon request, with no expectation of return (permanent storage, consolidation, etc.).<br />
<br />
Same as with Blu-rays, and not really any more expensive ($37.50 for 1TB of Blu-rays × 4TB = $150, the price of one 4TB HD), except look at all the disc-swapping time and effort you don't have to do. You don't have to split the data into chunks, but you do want to download it in a resumable fashion and verify it afterwards: checksums, parity files, something. You also risk losing a lot more if a drive fails, and the cost per volunteer is higher (replacing a whole drive versus individual discs or spindles). As such, you still probably want a minimum of three volunteers per TB per project (so a 2TB project needs six volunteers with 1TB each, not three volunteers holding all 2TB each).<br />
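<br />
A sketch of the quarterly content check, assuming the drive carries a MANIFEST.sha256 listing like the one the pre-split sketch above produces (the manifest name and two-space format are assumptions; any checksum list would do):<br />
<pre>
import hashlib
import os
import sys

def verify_drive(mount_point, manifest="MANIFEST.sha256"):
    """Re-hash every file named in the manifest and return the ones that fail."""
    failures = []
    with open(os.path.join(mount_point, manifest)) as listing:
        for line in listing:
            expected, name = line.strip().split("  ", 1)
            digest = hashlib.sha256()
            with open(os.path.join(mount_point, name), "rb") as data:
                for block in iter(lambda: data.read(1024 * 1024), b""):
                    digest.update(block)
            if digest.hexdigest() != expected:
                failures.append(name)
    return failures

if __name__ == "__main__":
    bad = verify_drive(sys.argv[1])  # e.g. python verify.py /mnt/external
    print("all files OK" if not bad else "CORRUPT: %s" % ", ".join(bad))
</pre>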
<br />
* A pool of volunteers commit to ("the distributed pool"):<br />
** all buying the same standard, inexpensive, hackable RAID 1 NAS,<br />
*** WD My Cloud Mirror (starts at $300 for 2TB [called "4TB," only 2TB with mirroring])<br />
*** QNAP (2-bay starts at $140 without HDs)<br />
*** Synology (2-bay starts at $200 without HDs)<br />
*** Pogoplug Series 4 + two best reasonable external HD + software RAID 1, or a download script that manually mirrors files ($20 without HDs)<br />
** keeping it spun up, online, and possibly accessible by external AT admins,<br />
** storing it somewhere climate-controlled (a shelf in a house with AC and heat is fine, an attic/garage/flooded basement is not),<br />
** buying entire additional units once they are full or if there are drive errors, and<br />
** shipping the drives (or the entire My Cloud Mirror unit, if that's the one selected) somewhere else upon request, with no expectation of return (permanent storage, consolidation, etc.).<br />
<br />
These units provide dramatically improved reliability for content, enough that perhaps you only need two volunteers per project, and no need to split by TB, since each volunteer would have two copies. Having everyone buy the same hardware means reduced administration time overall, especially if custom scripts are involved. QNAP and Synology both have official SDKs, and all of them run some flavor of Linux, with Synology supporting SSH logins out of the box. The Pogoplug is the most underpowered of the options, but even it should be powerful enough to run a MogileFS storage node, or a script that downloads to one HD and copies to the other. (Checksums would be really slow, though.) This is moderately expensive per-volunteer, with an upfront cost of $320-$500.<br />
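<br />
For the Pogoplug variant, the download-and-mirror script could be as simple as the sketch below; the mount points and URL are placeholders, and it assumes both drives are formatted and mounted and that wget is installed:<br />
<pre>
import os
import shutil
import subprocess

PRIMARY = "/mnt/hd1"  # placeholder mount points for the two external HDs
MIRROR = "/mnt/hd2"

def fetch_and_mirror(url):
    """Download to the primary drive (resumably), then copy to the mirror."""
    # wget -c resumes a partial download; -P sets the target directory
    subprocess.run(["wget", "-c", "-P", PRIMARY, url], check=True)
    name = url.rsplit("/", 1)[-1]
    shutil.copy2(os.path.join(PRIMARY, name), os.path.join(MIRROR, name))

fetch_and_mirror("https://example.org/warcs/project.warc.gz")  # placeholder URL
</pre>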
<br />
* A pool of volunteers commit to ("the dedicated pool"):<br />
** all buying the same standard, expensive NAS,<br />
*** iXsystems FreeNAS Mini (starts at $1000 without HDs),<br />
*** A DIY FreeNAS box ($300+ without HDs),<br />
*** A DIY NexentaStor box (probably the same as the DIY FreeNAS box)<br />
** keeping it spun up, online, and possibly accessible by external AT admins,<br />
** storing it somewhere climate-controlled and well-ventilated (a shelf with no airflow is not fine),<br />
** replacing drives if there are drive errors,<br />
** migrating the pool to larger disks once it starts getting full, and<br />
** shipping the drives somewhere else upon request, with no expectation of return (permanent storage, consolidation, etc.).<br />
<br />
A set of volunteers with (comparatively) expensive network-attached storage gives you a lot of storage in a lot of locations, potentially tens of redundant TB in each one, depending on the size of the chassis. You want everyone running the same NAS software, but the hardware can vary somewhat; however, the hardware should all have ECC RAM, and the more the better. MogileFS storage nodes are known to run on NexentaStor, and FreeNAS supports plugins, so it could be adapted to run there, or you could figure out e.g. LeoFS (which also expects ZFS). This is the most expensive option per-volunteer, upfront costs starting at around $1300 for a DIY box with four 4TB WD Red drives.<br />
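<br />
Whichever NAS software wins, every box still needs an unattended health check. A minimal sketch, assuming ZFS, that keys off the output of zpool status -x (which prints "all pools are healthy" when nothing is wrong); the alerting is left as a stub:<br />
<pre>
import subprocess

def pool_healthy():
    """True if zpool reports every pool on this box healthy."""
    out = subprocess.run(["zpool", "status", "-x"],
                         capture_output=True, text=True, check=True)
    return "all pools are healthy" in out.stdout

if __name__ == "__main__":
    if not pool_healthy():
        # Swap this print for mail/IRC so a degraded pool gets noticed quickly.
        print("ZFS pool degraded -- check zpool status and replace the bad drive")
</pre>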
<br />
* A pool of volunteers set up a recurring payment to fund ("the server option"):<br />
** one or more rented, managed, storage servers; or<br />
** saving up to buy one or more storage servers, and then hosting it somewhere.<br />
<br />
A rented server has no hardware maintenance costs: replacing a failed HD is the hosting provider's responsibility, in both materials and labor. Not so with a purchased server, where someone would have to buy a replacement drive and either bring it to the colocation center and swap it themselves, or ship it there and pay the facility for the labor of replacing it.<br />
<br />
== What Can You Contribute? == <br />
<br />
{| class="wikitable"<br />
! Name<br />
! What You Can Contribute<br />
! For How Long?<br />
! Exit Strategy<br />
|-<br />
| ExampleArchiver<br />
| Describe what you are willing to buy/build/write/do. Talk about the connection you would use, the storage conditions, etc. How much money can you put into it? <br />
| For how long can you truly commit to this?<br />
| If you need to quit or wind down your contribution, what are you willing to do? Can you guarantee a period of notice? Are you willing to ship your hardware or media to another volunteer anywhere in the world? <br />
|-<br />
| dnova<br />
|<br />
* Willing to burn and maintain a Blu-ray collection (can provide a burner and at least some discs).<br />
* Willing to write and maintain a tape library (but cannot provide a tape drive/tapes).<br />
* Willing to participate in the simple pool or a storage pool, depending on technical details.<br />
* I can store media in a class 1000 cleanroom!<br />
* Willing to provide short-term storage of a few hundred GB on RAIDZ-1, over a 75/10 residential connection.<br />
| <br />
* 2+ years in my current geographical location and with cleanroom access. <br />
* Willing to continue indefinitely wherever I go, but some details may change accordingly. <br />
| Can give ample notice for either full upload and/or shipping of media. Willing to ship any storage media anywhere in the world. <br />
|-<br />
| vitorio<br />
|<br />
* Participating in the simple pool (I only have a laptop, so I'd store the HDs offline at home and check them monthly/quarterly)<br />
* Participating in the distributed pool (residential 30/10 connection)<br />
* Contributing $100/mo. for the server option<br />
| Indefinitely<br />
| Can give ample notice for either full upload and/or shipping of all hardware anywhere in the world.<br />
|-<br />
|}<br />
<br />
== Project-specific suggestions ==<br />
<br />
=== Twitch.tv (and other video services) ===<br />
<br />
* Keep the original video files in (semi-)offline storage, and store transcoded (compressed) versions on the Internet Archive.<br />
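<br />
A sketch of the transcode step, assuming ffmpeg is available; the H.264/AAC settings are illustrative, not a vetted access profile:<br />
<pre>
import subprocess

def make_access_copy(original, output):
    """Produce a smaller access copy; the untouched original goes offline."""
    subprocess.run([
        "ffmpeg", "-i", original,
        "-c:v", "libx264", "-crf", "28",  # higher CRF = smaller file, lower quality
        "-c:a", "aac", "-b:a", "128k",
        output,
    ], check=True)

make_access_copy("stream_original.flv", "stream_access.mp4")  # placeholder names
</pre>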
<br />
== See Also ==<br />
*[[Storage Media]]<br />
<br />
== References ==<br />
<references/><br />
<br />
{{Navigation box}}</div>Dnovahttps://wiki.archiveteam.org/index.php?title=Valhalla&diff=20257Valhalla2014-09-29T05:17:08Z<p>Dnova: /* What Can You Contribute? */</p>
<hr />
<div>[[Image:Ms internet on a disc.jpg|300px|right]]<br />
This wiki page is a collection of ideas for Project '''Valhalla'''.<br />
<br />
This project/discussion has come around because there is a class of data currently existing, several times a year, as a massive amount of data with "large, but nominal" status within the Internet Archive. The largest example is currently MobileMe, which is hundreds of terabytes in the Internet Archive system (and in need of WARC conversion), which represents a cost amount far outstripping its use. Another is TwitPic, which is currently available (and might continue to be available) but which has shown itself to be a bad actor with regards to longevity and predictability for its sunset. <br />
<br />
Therefore, there is an argument that there could be a "third place" that data collected by Archive Team could sit, until the Internet Archive (or another entity) grows its coffers/storage enough that 80-100tb is "no big deal", just like 1tb of data was annoying in 2009 and now is totally understandable for the value, i.e. Geocities. <br />
<br />
This is for short-term (or potentially also long-term) storage options, say five years or less, of data generated by Archive Team.<br />
<br />
* What options are out there, generally?<br />
* What are the costs, roughly?<br />
* What are the positives and negatives?<br />
<br />
There has been a lot of study in this area over the years, of course, so links to known authorities and debates will be welcome as well.<br />
<br />
Join the discussion in [irc://irc.efnet.org/huntinggrounds #huntinggrounds].<br />
<br />
== Goals ==<br />
<br />
We want to:<br />
<br />
* Dump an unlimited<ref>Unlimited doesn't mean infinite, but it does mean that we shouldn't worry about running out of space. We won't be the only expanding data store.</ref> amount of data into something.<br />
* Recover that data at any point.<br />
<br />
We do not care about:<br />
<br />
* Immediate or continuous availability.<br />
<br />
We absolutely require:<br />
<br />
* Low (ideally, zero) human time for maintenance. If we have substantial human maintenance needs, we're probably going to need a Committee of Elders or something.<br />
* Data integrity. The storage medium must be impossibly durable or make it inexpensive/easy to copy and verify the data onto a fresh medium.<br />
<br />
It would be nice to have:<br />
<br />
* No special environmental requirements that could not be handled by a third party. (So nobody in Archive Team would have to set up some sort of climate-controlled data-cave; however, if this is already something that e.g. IA does and they are willing to lease space, that's cool.)<br />
<br />
== What does the Internet Archive do for this Situation, Anyway? ==<br />
<br />
''This section has not been cleared by the Internet Archive, and so should be considered a rough sketch.''<br />
<br />
The Internet Archive primarily wants "access" to the data it stores, so the primary storage methodology is spinning hard drives connected to a high-speed connection from multiple locations. These hard drives are between 4-6tb (as of 2014) and are of general grade, as is most of the hardware - the theory is that replacing cheap hardware is better than spending a lot of money on super-grade hardware (whatever that may be) and not being able to make the dollars stretch. Hundreds of drives die in a month and the resiliency of the system allows them all to hot-swap in replacements. <br />
<br />
There are multiple warehouses for storing the original books that are scanned, as well as materials like CD-ROMs and even hard drives. There are collections of tapes and CD-ROMs from previous iterations of storage, although they are thought of as drop-dead options instead of long-term archival storage - the preference is, first and foremost, the spinning hard drives.<br />
<br />
The Archive does not generally use tape technology, having run into the classic "whoops, no tape drive on earth reads these any more" and "whoops, this tape no longer works properly".<br />
<br />
The Archive has indicated that if Archive Team uses a physical storage method, such as tapes, paper, hard drives or anything else, that they are willing to store these materials "as long as they are exceedingly labelled".<br />
<br />
== Physical Options ==<br />
{| class="wikitable sortable"<br />
! Storage type<br />
! Cost ($/TB/year)<br />
! Storage density (m³/TB)<br />
! Theoretical lifespan<br />
! Practical, tested lifespan<br />
! Notes<br />
|-<br />
| Hard drives (simple distributed pool)<br />
| $150 (full cost of best reasonable 1TB+ external HD)<br />
| <br />
| <br />
| <br />
| September 2014, best reasonable 1TB+ external HD is [http://thewirecutter.com/reviews/the-best-external-desktop-hard-drive/ a 4TB WD]. 25+ pool members would need one HD each plus a computer plus software to distribute data across the entire pool.<br />
|-<br />
| Hard drives (dedicated distributed pool)<br />
| <br />
| <br />
| <br />
| <br />
| An off-the-shelf or otherwise specified, dedicated, network storage device used exclusively as part of a distributed pool.<br />
|-<br />
| Hard drives (SPOF) <ref>The [[Internet Archive]]'s cost per TB, with 24/7 online hard drives, is approximately $2000 for forever.</ref><br />
| $62 (but you have to buy 180TB)<br />
| <br />
| <br />
| <br />
| For a single location to provide all storage needs, building a [https://www.backblaze.com/blog/backblaze-storage-pod-4/ Backblaze Storage Pod 4.0] runs an average of $11,000, providing 180TB of [http://bioteam.net/2011/08/why-you-should-never-build-a-backblaze-pod/ non-redundant, not-highly-available] storage. (You really want more than one pod mirroring your data, but this is the most effective way to get that much storage in one place.)<br />
|-<br />
| Commercial / archival-grade tapes<br />
| <br />
| <br />
| <br />
| <br />
| <br />
|-<br />
| Consumer tape systems (VHS, Betamax, cassette tapes, ...)<br />
| <br />
| <br />
| <br />
| <br />
| <br />
|-<br />
| Vinyl<br />
| <br />
| <br />
| <br />
| <br />
| <br />
|-<br />
| [http://www.ollydbg.de/Paperbak/index.html PaperBack]<br />
| <br />
| <br />
| <br />
| <br />
| 500KB per letter sheet means 1TB is 2,199,024 sheets, or ~4400 reams (500 sheets each), or an 8'x16' room filled with 6' tall stacks. It would take 63.6 days of continuous printing to do this.<ref>A HP LaserJet 5Si printing 24 pages per minute which generates the 500K bytes per page, yielding approximately 200,000 bytes per second.</ref><br />
|-<br />
| [http://ronja.twibright.com/optar/ Optar]<br />
| <br />
| <br />
| <br />
| <br />
| At 200KB per page, this has less than half the storage density of Paperback.<br />
|-<br />
| Blu-Ray<br />
| $40 (50 pack spindle of 25GB BD-Rs)<br />
| <br />
| 30 years<ref>On the basis of the described studies and assuming adequate consideration of the specified conditions for storage and handling, as well as verification of data after writing, we estimate the Imation CD, DVD or Blu-ray media to have a theoretical readability of up to 30 years. The primary caveat is how you handle and store the media. http://support.tdkperformance.com/app/answers/detail/a_id/1685/~/life-expectancy-of-optical-media </ref><br />
| <br />
| Lasts a LOT longer than CD/DVD, but should not be assumed to last more than a decade. [http://arstechnica.com/information-technology/2014/01/why-facebook-thinks-blu-ray-discs-are-perfect-for-the-data-center/ Raidz3 with Blu-rays Doing a backup in groups of 15 disks]. Comes to under $.04/GB which is cheap, and low initial investment (drives) too!<br><br />
<br>Specifically, a 50pack spindle of 25GB BD-Rs could readily hold 1TB of data for $30-50 per spindle. 50GB and 100GB discs are more expensive per GB.<br />
|-<br />
| [http://en.wikipedia.org/wiki/M-DISC M-DISC]<br />
| <br />
| <br />
| <br />
| <br />
| Unproven technology, but potentially interesting.<br />
|-<br />
| Flash media<br />
| <br />
| <br />
| <br />
| <br />
| Very durable for online use, and usually fails from lots of writes. A drive might never wear out from cold-storage usage. Newer drives can have 10-year warranties. But capacitors may leak charge over time. JEDEC JESD218A only specifies 101 weeks (almost two years) retention without power, so we'd have to check the spec of the specific drives, or power them up and re-write the data to refresh it about once a year. Soliciting donations for old flash media from people, or sponsorship from flash companies?<br />
|-<br />
| Glass/metal etching<br />
| <br />
| <br />
| <br />
| <br />
| <br />
|-<br />
| Amazon Glacier<br />
| $122.88 (storage only, retrieval billed separately)<br />
| <br />
| average annual durability of 99.999999999% <ref>"Amazon Glacier is designed to provide average annual durability of 99.999999999% for an archive. The service redundantly stores data in multiple facilities and on multiple devices within each facility. To increase durability, Amazon Glacier synchronously stores your data across multiple facilities before returning SUCCESS on uploading archives. Glacier performs regular, systematic data integrity checks and is built to be automatically self-healing." Maciej Ceglowski thinks that's [https://blog.pinboard.in/2014/04/cloudy_snake_oil/ kinda bullshit compared to the failure events you don't plan for], of course.</ref><br />
| <br />
| Retrieval is billed separately. 5% or less per month into S3 is free (5% of 100TB is 5TB), and data can be copied out from S3 to a SATA HD for $2.50/hr. plus media handling and shipping fees. Downloading 5TB from S3 would cost $614.40 (~$122.88/TB), but only $44.82 to transfer to HD via USB 3 or SATA (USB 2 is slower).<br />
|-<br />
| Dropbox for Business<br />
| $160* ($795/year)<br />
| <br />
| <br />
| <br />
| Dropbox for Business provides a shared pool of 1TB per user, at $795/year (five user minimum, 5TB), and $125 each additional user/year.<br />
|-<br />
| Box.com for Business<br />
| $180* ("unlimited" storage for $900/year)<br />
| <br />
| <br />
| <br />
| Box.com for Business provides "unlimited" storage at $15/user/month, five user minimum, or $900/year.<br />
|-<br />
| Dedicated colocated storage servers<br />
| $100* (e.g. $1300 for one year of 12TB rackmount server rental)<br />
|<br />
|<br />
|<br />
| Rent [http://www.ovh.com/us/dedicated-servers/storage/ storage servers from managed hosting colocation providers], and pool data across them. Benefits include bandwidth and electricity being included in the cost, and files could be made available online immediately. Negatives include needing to administer tens of servers.<br />
|}<br />
<br />
== Software Options ==<br />
<br />
Some of the physical options require supporting software.<br />
<br />
Removable media requires a centralized index of who has what discs, where they are, how they are labeled, and what the process for retrieval/distribution is. It could just be a wiki page, but it does require something.<br />
<br />
A simple pool of HDs ("simple pool"), one without a shared filesystem, just people offering up HDs, requires software running on Windows, Linux and/or Mac hardware to allow Archive Team workers to learn who has free disk space, and to save content to those disks. This could be just an IRC conversation and SFTP, but the more centralized and automated, the more likely available disk space will be able to be utilized. Software that is not cross-platform cannot be used here.<br />
<br />
A simple distributed and redundant pool of HDs ("distributed pool") requires software running on Windows, Linux and Mac hardware to manage a global filesystem or object store, and distribute uploads across the entire pool of available space, and make multiple copies on an ongoing basis to ensure preservation of data if a pool member goes offline. This has to be automated and relatively maintenance-free, and ideally low-impact on CPU and memory if it will be running on personal machines with multi-TB USB drives hanging off them. Software that is not cross-platform cannot be used here.<br />
<br />
A dedicated distributed and redundant pool of HDs ("dedicated pool") requires a selection of dedicated hardware and disks for maximum availability, and software to run on that hardware to manage a global filesystem or object store. It has to be automated and relatively maintenance-free, but would be the only thing running on its dedicated hardware, and as such does not have to be cross-platform.<br />
<br />
{| class="wikitable sortable"<br />
! Software name<br />
! Filesystem or Object Store?<br />
! Platform(s)<br />
! License<br />
! Good for which pool?<br />
! Pros<br />
! Cons<br />
! Notes<br />
|-<br />
| Tahoe-LAFS<br />
| Filesystem<br />
| Windows, Mac, Linux<br />
| GPL 2+<br />
| Distributed, dedicated<br />
| Uses what people already have, can spread expenses out, could be a solution done with only software<br />
| Barrier to leaving is non-existent, might cause data-loss even with auto-fixing infrastructure. Too slow to be a primary offloading site. <ref>"Practically the following results have been reported: 16Mbps in throughput for writing and about 8.8Mbps in reading" -- from https://tahoe-lafs.org/trac/tahoe-lafs/wiki/FAQ, making it non-competitive with the 1-2 gigabit speeds needed when archiving twitch.tv.</ref><br />
| Accounting is experimental, meaning "in practice is that anybody running a storage node can also automatically shove shit onto it, with no way to track down who uploaded how much or where or what it is" -joepie91 on IRC<br />
|-<br />
| Ceph<br />
| Object store, Filesystem<br />
| Linux<br />
| LGPL<br />
| Dedicated<br />
|<br />
|<br />
|<br />
|-<br />
| GlusterFS<br />
| Filesystem<br />
| Linux, BSD, OpenSolaris<br />
| GPL 3<br />
| Dedicated<br />
|<br />
|<br />
|<br />
|-<br />
| Gfarm<br />
| Filesystem<br />
| Mac, Linux, BSD, Solaris<br />
| X11<br />
| Dedicated<br />
|<br />
|<br />
|<br />
|-<br />
| Quantcast<br />
| Filesystem<br />
| Linux<br />
| Apache<br />
| Dedicated<br />
|<br />
| Like HDFS, intended for MapReduce processing, which writes large files, and doesn't delete them. Random access and erasing or moving data around may not be performant.<br />
| <br />
|-<br />
| GlusterFS<br />
| Filesystem<br />
| Mac, Linux, BSD, Solaris<br />
| GPL 3<br />
| Dedicated<br />
|<br />
|<br />
|<br />
|-<br />
| HDFS<br />
| Filesystem<br />
| Java<br />
| Apache<br />
| Distributed, dedicated<br />
|<br />
| Like Quantcast, intended for MapReduce processing, which writes large files, and doesn't delete them. Random access and erasing or moving data around may not be performant.<br />
| <br />
|-<br />
| XtreemFS<br />
| Filesystem<br />
| Linux, Solaris<br />
| BSD<br />
| Dedicated<br />
|<br />
|<br />
|<br />
|-<br />
| MogileFS<br />
| Object store<br />
| Linux<br />
| GPL<br />
| Dedicated<br />
| Understands distributing files across multiple networks, not just multiple disks<br />
|<br />
| As an object store, you can't just mount it as a disk and dump files onto it, you have to push them into it through its API, and retrieve them the same way.<br />
|-<br />
| Riak CS<br />
| Object store<br />
| Mac, Linux, BSD<br />
| Apache<br />
| Dedicated<br />
| S3 API compatible<br />
| Multi-datacenter replication (which might be what you consider having multiple disparate users on different networks) is only available in the commercial offering.<br />
| A former Basho employee suggests this might not be a good fit due to the high latency and unstable connections we'd be dealing with. Datacenter-to-datacenter sync is an "entirely different implementation" than local replication, and would require the enterprise offering.<br />
|-<br />
| MongoDB GridFS<br />
| Object store<br />
| Windows, Mac, Linux<br />
| AGPL<br />
| Distributed, dedicated<br />
|<br />
|<br />
|<br />
|-<br />
| LeoFS<br />
| Object store<br />
| Mac, Linux<br />
| Apache<br />
| Dedicated<br />
| S3-compatible interface, beta NFS interface, supports multi-datacenter replication, designed with GUI administration in mind<br />
|<br />
|<br />
|-<br />
| BitTorrent Sync<br />
| Synchronization<br />
| Windows, Mac, Linux, BSD, NAS<br />
| Proprietary<br />
| Simple<br />
| Commercially supported software<br />
| As straight synchronization software, it mirrors folders across devices. Individual users would have to make synched folders available to get copies of archives, and then they would be mirrored, and that's it.<br />
| Synchronization software in general is not the right solution for this problem.<br />
|}<br />
<br />
== Non-options ==<br />
* Ink-based Consumer Optical Media (CDs, DVD, etc.) <br />
** Differences between Blu-Ray and DVD? DVDs do not last very long. The fact is, the history of optical writable media has been on of chicanery, failure, and overpromising while under-delivering. Some DVDs failed within a year. There are claims Blu-Ray is different, but fool me 3,504 times, shame on me.<br />
* BitTorrent Sync<br />
** Proprietary (currently), so not a good idea to use as an archival format/platform<br />
* Amazon S3 / Google Cloud Storage / Microsoft Azure Storage<br />
** Amazon S3 might be a viable waypoint for intra-month storage ($30.68/TB), but retrieval over the internet, as with Glacier, is expensive, $8499.08 for 100TB. Google's and Microsoft's offerings are all in the same price range.<br />
* Floppies<br />
** ''"Because 1.4 trillion floppies exists less than 700 billion floppies. HYPOTHETICALLY, if you set twenty stacks side by side, figure a quarter centimeter per floppy thickness, excluded the size of the drive needed to read the floppies you would still need a structure 175,000 ft. high to house them. Let's also assume that the failure rate for floppies is about 5% (everyone knows that varies by brand, usage, time of manufacture, materials used, etc, but lets say 5% per year). 70 million of those 1.4 trillion floppies are unusuable. Figuring 1.4 MB per floppy disk, you are losing approximately 100MB of porn each year. Assuming it takes 5 seconds to replace a bad floppy, you would have to spend 97,222 hrs/yr to replace them. Considering there are only 8,760 hrs per year, you would require a staff of 12 people replacing floppies around the clock or 24 people on 12 hr shifts. Figuring $7/hr you would spend $367,920 on labor alone. Figuring a nickel per bad floppy, you would need $3,500,000 annually in floppy disks, bringing your 1TB floppy raid operating costs (excluding electricity, etc) to $3,867, 920 and a whole landfill of corrupted porn. Thank you for destroying the planet and bankrupting a small country with your floppy based porn RAID."'' ([http://gizmodo.com/5431497/why-its-better-to-pretend-you-dont-know-anything-about-computers?comment=17793028#comments source])<br />
<br />
== From IRC ==<br />
<br />
<Drevkevac> we are looking to store 100TB+ of media offline for 25+ years<br />
<Drevkevac> if anyone wants to drop in, I will pastebin the chat log<br />
<rat> DVDR and BR-R are not high volume. When you have massive amounts of data, raid arrays have too many points of failure.<br />
<rat> Drevkevac: I work in a tv studio. We have 30+ years worth of tapes. And all of them are still good.<br />
<rat> find a hard drive from 30 years ago and see how well it hooks up ;)<br />
<brousch_> 1500 Taiyo Yuden Gold CD-Rs http://www.mediasupply.com/taiyo-yuden-gold-cd-rs.html<br />
<br />
<Drevkevac> still, if its true, you could do, perhaps, raidz3s in groups of 15 disks or so?<br />
<SketchCow> Please add paperbak to the wiki page.<br />
<SketchCow> Fuck Optical Media. not an option;.<br />
<Drevkevac> that would give you ~300GB per disk group, with 3 disks<br />
<br />
== Where are you going to put it? ==<br />
<br />
Okay, so you have the tech. Now you need a place for it to live.<br />
<br />
Possibilities:<br />
<br />
* The Internet Archive Physical Warehouse, Richmond, CA<br />
** The Internet Archive has several physical storage facilities, including warehouses in Richmond, CA (home of the Physical Archive) and the main location in San Francisco, CA. They have indicated they are willing to take copies of Archive Team-sponsored physical materials with the intent of them being ingested into the Archive at large over time, as costs lower and 100tb collections are not as big a drain (or a rash of funding arrives elsewhere).<br />
<br />
* Living Computer Museum, Seattle, WA<br />
** In discussions with Jason Scott, the Living Computer Museum has indicated they will have physical storage available for computer historical materials. Depending on the items being saved by Archive Team, they may be willing to host/hold copies for the forseable future.<br />
<br />
* Library of Congress, Washington, DC<br />
** The Library of Congress may be willing to take a donation of physical storage, although it is not indicated what they may do long-term with it.<br />
<br />
Multiple copies would of course be great.<br />
<br />
== No, seriously, how are you going to actually DO it ==<br />
<br />
There are only a few practical hardware+software+process combinations. In order of cost to each volunteer:<br />
<br />
* A pool of volunteers with Blu-ray burners commit to ("the Blu-ray option"): <br />
** buying a 50-disc spindle of 25GB discs per TB per project,<br />
** burning them,<br />
** verifying them,<br />
** storing them somewhere climate-controlled (a shelf in a house with AC and heat is fine, an attic/garage/flooded basement is not),<br />
** verifying them regularly (monthly? quarterly?) and replacing discs if necessary, and<br />
** shipping them somewhere else upon request, with no expectation of return (permanent storage, consolidation, etc.).<br />
<br />
This probably requires a minimum of three volunteers per TB per project. Probably best to pre-split the data into < 25GB chunks so each disc can be labeled the same and expected to have the same data on it. Fifty 25GB discs is a little more than a TB, and it's expected you'll lose a few to bad burns each time, but it might be worth buying more than a spindle and generating parity files onto additional discs.<br />
<br />
* A pool of volunteers commit to ("the simple pool"):<br />
** buying a best reasonable external HD,<br />
** downloading archives to it,<br />
** keeping it spun up, or spinning it up regularly (monthly? quarterly?) and running filesystem and content checks on it,<br />
** storing it somewhere climate-controlled (a shelf in a house with AC and heat is fine, an attic/garage/flooded basement is not),<br />
** buying additional HDs once it's full or if there are drive errors, and<br />
** shipping it somewhere else upon request, with no expectation of return (permanent storage, consolidation, etc.).<br />
<br />
Same as with Blu-rays, and not really any more expensive ($150 == $37.50 for one 1TB of Blu-rays * 4, or one 4TB HD), except look at all that disc-swapping time and effort you don't have to do. You don't have to split data into chunks, but you do want to download it in a resumable fashion and verify it afterwards, so, checksums, parity files, something. You also risk losing a lot more if a drive fails, and the cost per-volunteer is higher (replacing a whole drive versus replacing individual discs or spindles). As such, you still probably want a minimum of three volunteers per TB per project (so a 2TB project needs six volunteers with 1TB each, not three volunteers holding all 2TB each).<br />
<br />
* A pool of volunteers commit to ("the distributed pool"):<br />
** all buying the same, standard, inexpensive, hackable, RAID 1, NAS,<br />
*** WD My Cloud Mirror (starts at $300 for 2TB [called "4TB," only 2TB with mirroring])<br />
*** QNAP (2-bay starts at $140 without HDs)<br />
*** Synology (2-bay starts at $200 without HDs)<br />
*** Pogoplug Series 4 + two best reasonable external HD + software RAID 1, or a download script that manually mirrors files ($20 without HDs)<br />
** keeping it spun up, online, and possibly accessible by external AT admins,<br />
** storing it somewhere climate-controlled (a shelf in a house with AC and heat is fine, an attic/garage/flooded basement is not),<br />
** buying entire additional units once they are full or if there are drive errors, and<br />
** shipping the drives (or the entire My Cloud Mirror unit, if that's the one selected) somewhere else upon request, with no expectation of return (permanent storage, consolidation, etc.).<br />
<br />
These units provide dramatically improved reliability for content, enough that perhaps you only need two volunteers per project, and no need to split by TB, since each volunteer would have two copies. Having everyone buy the same hardware means reduced administration time overall, especially if custom scripts are involved. QNAP and Synology both have official SDKs, and all of them run some flavor of Linux, with Synology supporting SSH logins out of the box. The Pogoplug is the most underpowered of the options, but even it should be powerful enough to run a MogileFS storage node, or a script that downloads to one HD and copies to the other. (Checksums would be really slow, though.) This is moderately expensive per-volunteer, with an upfront cost of $320-$500.<br />
<br />
* A pool of volunteers commit to ("the dedicated pool"):<br />
** all buying the same, standard, expensive NAS,<br />
*** iXsystems FreeNAS Mini (starts at $1000 without HDs),<br />
*** A DIY FreeNAS box ($300+ without HDs),<br />
*** A DIY NexentaStor box (probably the same as the DIY FreeNAS box)<br />
** keeping it spun up, online, and possibly accessible by external AT admins,<br />
** storing it somewhere climate-controlled and well-ventilated (a shelf with no airflow is not fine),<br />
** replacing drives if there are drive errors,<br />
** migrating the pool to larger disks once it starts getting full, and<br />
** shipping the drives somewhere else upon request, with no expectation of return (permanent storage, consolidation, etc.).<br />
<br />
A set of volunteers with (comparatively) expensive network-attached storage gives you a lot of storage in a lot of locations, potentially tens of redundant TB in each one, depending on the size of the chassis. You want everyone running the same NAS software, but the hardware can vary somewhat; however, the hardware should all have ECC RAM, and the more the better. MogileFS storage nodes are known to run on NexentaStor, and FreeNAS supports plugins, so it could be adapted to run there, or you could figure out e.g. LeoFS (which also expects ZFS). This is the most expensive option per-volunteer, upfront costs starting at around $1300 for a DIY box with four 4TB WD Red drives.<br />
<br />
* A pool of volunteers set up a recurring payment to fund ("the server option"):<br />
** one or more rented, managed, storage servers; or<br />
** saving up to buy one or more storage servers, and then hosting it somewhere.<br />
<br />
A rented server has no hardware maintenance costs; replacing a failed HD is the responsibility of the hosting provider, both in terms of materials cost and in labor cost. This is not the case with a purchased server, where someone would have to buy a replacement hard drive, bring it to the colocation center, and replace the drive; or someone would have to buy a replacement disk, ship it to the colocation center, and then they would bill someone for the labor involved in replacing it.<br />
<br />
== What Can You Contribute? == <br />
<br />
{| class="wikitable"<br />
! Name<br />
! What You Can Contribute<br />
! For How Long?<br />
! Exit Strategy<br />
|-<br />
| ExampleArchiver<br />
| Describe what you are willing to buy/build/write/do. Talk about the connection you would use, the storage conditions, etc. How much money can you put into it? <br />
| For how long can you truly commit to this?<br />
| If you need to quit or wind down your contribution, what are you willing to do? Can you guarantee a period of notice? Are you willing to ship your hardware or media to another volunteer anywhere in the world? <br />
|-<br />
| dnova<br />
|<br />
* Willing to burn and maintain a blu-ray collection (can to provide burner and at least some discs).<br />
* Willing to write/maintain tape library (but cannot provide tape drive/tapes).<br />
* Willing to participate in simple pool or storage pool, depending on technical details. <br />
* I can store media in a class 1000 cleanroom!<br />
* Willing to provide short-term storage for few hundreds of GB of RAIDZ-1 storage on a 75/10 residential connection. <br />
| 2+ years in my current geographical location and with cleanroom access. Willing to continue wherever I go, but some details may change accordingly. <br />
| Can give ample notice for either full upload and/or shipping of media. Willing to ship any storage media anywhere in the world. <br />
|-<br />
| vitorio<br />
|<br />
* Participating in the simple pool (I only have a laptop, so I'd store the HDs offline at home and check them monthly/quarterly)<br />
* Participating in the distributed pool (residential 30/10 connection)<br />
* Contributing $100/mo. for the server option<br />
| Indefinitely<br />
| Can give ample notice for either full upload and/or shipping of all hardware anywhere in the world.<br />
|-<br />
|}<br />
<br />
== Project-specific suggestions ==<br />
<br />
=== Twitch.tv (and other video services) ===<br />
<br />
* Keep the original video files in (semi-)offline storage, and store transcoded (compressed) versions on the Internet Archive.<br />
<br />
== See Also ==<br />
*[[Storage Media]]<br />
<br />
== References ==<br />
<references/><br />
<br />
{{Navigation box}}</div>Dnovahttps://wiki.archiveteam.org/index.php?title=Valhalla&diff=20255Valhalla2014-09-29T05:12:30Z<p>Dnova: /* What Can You Contribute? */</p>
<hr />
<div>[[Image:Ms internet on a disc.jpg|300px|right]]<br />
This wiki page is a collection of ideas for Project '''Valhalla'''.<br />
<br />
This project/discussion has come around because there is a class of data currently existing, several times a year, as a massive amount of data with "large, but nominal" status within the Internet Archive. The largest example is currently MobileMe, which is hundreds of terabytes in the Internet Archive system (and in need of WARC conversion), which represents a cost amount far outstripping its use. Another is TwitPic, which is currently available (and might continue to be available) but which has shown itself to be a bad actor with regards to longevity and predictability for its sunset. <br />
<br />
Therefore, there is an argument that there could be a "third place" that data collected by Archive Team could sit, until the Internet Archive (or another entity) grows its coffers/storage enough that 80-100tb is "no big deal", just like 1tb of data was annoying in 2009 and now is totally understandable for the value, i.e. Geocities. <br />
<br />
This is for short-term (or potentially also long-term) storage options, say five years or less, of data generated by Archive Team.<br />
<br />
* What options are out there, generally?<br />
* What are the costs, roughly?<br />
* What are the positives and negatives?<br />
<br />
There has been a lot of study in this area over the years, of course, so links to known authorities and debates will be welcome as well.<br />
<br />
Join the discussion in [irc://irc.efnet.org/huntinggrounds #huntinggrounds].<br />
<br />
== Goals ==<br />
<br />
We want to:<br />
<br />
* Dump an unlimited<ref>Unlimited doesn't mean infinite, but it does mean that we shouldn't worry about running out of space. We won't be the only expanding data store.</ref> amount of data into something.<br />
* Recover that data at any point.<br />
<br />
We do not care about:<br />
<br />
* Immediate or continuous availability.<br />
<br />
We absolutely require:<br />
<br />
* Low (ideally, zero) human time for maintenance. If we have substantial human maintenance needs, we're probably going to need a Committee of Elders or something.<br />
* Data integrity. The storage medium must be impossibly durable or make it inexpensive/easy to copy and verify the data onto a fresh medium.<br />
<br />
It would be nice to have:<br />
<br />
* No special environmental requirements that could not be handled by a third party. (So nobody in Archive Team would have to set up some sort of climate-controlled data-cave; however, if this is already something that e.g. IA does and they are willing to lease space, that's cool.)<br />
<br />
== What does the Internet Archive do for this Situation, Anyway? ==<br />
<br />
''This section has not been cleared by the Internet Archive, and so should be considered a rough sketch.''<br />
<br />
The Internet Archive primarily wants "access" to the data it stores, so the primary storage methodology is spinning hard drives connected to a high-speed connection from multiple locations. These hard drives are between 4-6tb (as of 2014) and are of general grade, as is most of the hardware - the theory is that replacing cheap hardware is better than spending a lot of money on super-grade hardware (whatever that may be) and not being able to make the dollars stretch. Hundreds of drives die in a month and the resiliency of the system allows them all to hot-swap in replacements. <br />
<br />
There are multiple warehouses for storing the original books that are scanned, as well as materials like CD-ROMs and even hard drives. There are collections of tapes and CD-ROMs from previous iterations of storage, although they are thought of as drop-dead options instead of long-term archival storage - the preference is, first and foremost, the spinning hard drives.<br />
<br />
The Archive does not generally use tape technology, having run into the classic "whoops, no tape drive on earth reads these any more" and "whoops, this tape no longer works properly".<br />
<br />
The Archive has indicated that if Archive Team uses a physical storage method, such as tapes, paper, hard drives or anything else, that they are willing to store these materials "as long as they are exceedingly labelled".<br />
<br />
== Physical Options ==<br />
{| class="wikitable sortable"<br />
! Storage type<br />
! Cost ($/TB/year)<br />
! Storage density (m³/TB)<br />
! Theoretical lifespan<br />
! Practical, tested lifespan<br />
! Notes<br />
|-<br />
| Hard drives (simple distributed pool)<br />
| $150 (full cost of best reasonable 1TB+ external HD)<br />
| <br />
| <br />
| <br />
| September 2014, best reasonable 1TB+ external HD is [http://thewirecutter.com/reviews/the-best-external-desktop-hard-drive/ a 4TB WD]. 25+ pool members would need one HD each plus a computer plus software to distribute data across the entire pool.<br />
|-<br />
| Hard drives (dedicated distributed pool)<br />
| <br />
| <br />
| <br />
| <br />
| An off-the-shelf or otherwise specified, dedicated, network storage device used exclusively as part of a distributed pool.<br />
|-<br />
| Hard drives (SPOF) <ref>The [[Internet Archive]]'s cost per TB, with 24/7 online hard drives, is approximately $2000 for forever.</ref><br />
| $62 (but you have to buy 180TB)<br />
| <br />
| <br />
| <br />
| For a single location to provide all storage needs, building a [https://www.backblaze.com/blog/backblaze-storage-pod-4/ Backblaze Storage Pod 4.0] runs an average of $11,000, providing 180TB of [http://bioteam.net/2011/08/why-you-should-never-build-a-backblaze-pod/ non-redundant, not-highly-available] storage. (You really want more than one pod mirroring your data, but this is the most effective way to get that much storage in one place.)<br />
|-<br />
| Commercial / archival-grade tapes<br />
| <br />
| <br />
| <br />
| <br />
| <br />
|-<br />
| Consumer tape systems (VHS, Betamax, cassette tapes, ...)<br />
| <br />
| <br />
| <br />
| <br />
| <br />
|-<br />
| Vinyl<br />
| <br />
| <br />
| <br />
| <br />
| <br />
|-<br />
| [http://www.ollydbg.de/Paperbak/index.html PaperBack]<br />
| <br />
| <br />
| <br />
| <br />
| 500KB per letter sheet means 1TB is 2,199,024 sheets, or ~4400 reams (500 sheets each), or an 8'x16' room filled with 6' tall stacks. It would take 63.6 days of continuous printing to do this.<ref>A HP LaserJet 5Si printing 24 pages per minute which generates the 500K bytes per page, yielding approximately 200,000 bytes per second.</ref><br />
|-<br />
| [http://ronja.twibright.com/optar/ Optar]<br />
| <br />
| <br />
| <br />
| <br />
| At 200KB per page, this has less than half the storage density of Paperback.<br />
|-<br />
| Blu-Ray<br />
| $40 (50 pack spindle of 25GB BD-Rs)<br />
| <br />
| 30 years<ref>On the basis of the described studies and assuming adequate consideration of the specified conditions for storage and handling, as well as verification of data after writing, we estimate the Imation CD, DVD or Blu-ray media to have a theoretical readability of up to 30 years. The primary caveat is how you handle and store the media. http://support.tdkperformance.com/app/answers/detail/a_id/1685/~/life-expectancy-of-optical-media </ref><br />
| <br />
| Lasts a LOT longer than CD/DVD, but should not be assumed to last more than a decade. [http://arstechnica.com/information-technology/2014/01/why-facebook-thinks-blu-ray-discs-are-perfect-for-the-data-center/ Raidz3 with Blu-rays Doing a backup in groups of 15 disks]. Comes to under $.04/GB which is cheap, and low initial investment (drives) too!<br><br />
<br>Specifically, a 50pack spindle of 25GB BD-Rs could readily hold 1TB of data for $30-50 per spindle. 50GB and 100GB discs are more expensive per GB.<br />
|-<br />
| [http://en.wikipedia.org/wiki/M-DISC M-DISC]<br />
| <br />
| <br />
| <br />
| <br />
| Unproven technology, but potentially interesting.<br />
|-<br />
| Flash media<br />
| <br />
| <br />
| <br />
| <br />
| Very durable for online use, and usually fails from lots of writes. A drive might never wear out from cold-storage usage. Newer drives can have 10-year warranties. But capacitors may leak charge over time. JEDEC JESD218A only specifies 101 weeks (almost two years) retention without power, so we'd have to check the spec of the specific drives, or power them up and re-write the data to refresh it about once a year. Soliciting donations for old flash media from people, or sponsorship from flash companies?<br />
|-<br />
| Glass/metal etching<br />
| <br />
| <br />
| <br />
| <br />
| <br />
|-<br />
| Amazon Glacier<br />
| $122.88 (storage only, retrieval billed separately)<br />
| <br />
| average annual durability of 99.999999999% <ref>"Amazon Glacier is designed to provide average annual durability of 99.999999999% for an archive. The service redundantly stores data in multiple facilities and on multiple devices within each facility. To increase durability, Amazon Glacier synchronously stores your data across multiple facilities before returning SUCCESS on uploading archives. Glacier performs regular, systematic data integrity checks and is built to be automatically self-healing." Maciej Ceglowski thinks that's [https://blog.pinboard.in/2014/04/cloudy_snake_oil/ kinda bullshit compared to the failure events you don't plan for], of course.</ref><br />
| <br />
| Retrieval is billed separately. 5% or less per month into S3 is free (5% of 100TB is 5TB), and data can be copied out from S3 to a SATA HD for $2.50/hr. plus media handling and shipping fees. Downloading 5TB from S3 would cost $614.40 (~$122.88/TB), but only $44.82 to transfer to HD via USB 3 or SATA (USB 2 is slower).<br />
|-<br />
| Dropbox for Business<br />
| $160* ($795/year)<br />
| <br />
| <br />
| <br />
| Dropbox for Business provides a shared pool of 1TB per user, at $795/year (five user minimum, 5TB), and $125 each additional user/year.<br />
|-<br />
| Box.com for Business<br />
| $180* ("unlimited" storage for $900/year)<br />
| <br />
| <br />
| <br />
| Box.com for Business provides "unlimited" storage at $15/user/month, five user minimum, or $900/year.<br />
|-<br />
| Dedicated colocated storage servers<br />
| $100* (e.g. $1300 for one year of 12TB rackmount server rental)<br />
|<br />
|<br />
|<br />
| Rent [http://www.ovh.com/us/dedicated-servers/storage/ storage servers from managed hosting colocation providers], and pool data across them. Benefits include bandwidth and electricity being included in the cost, and files could be made available online immediately. Negatives include needing to administer tens of servers.<br />
|}<br />
<br />
== Software Options ==<br />
<br />
Some of the physical options require supporting software.<br />
<br />
Removable media requires a centralized index of who has what discs, where they are, how they are labeled, and what the process for retrieval/distribution is. It could just be a wiki page, but it does require something.<br />
<br />
A simple pool of HDs ("simple pool"), one without a shared filesystem, just people offering up HDs, requires software running on Windows, Linux and/or Mac hardware to allow Archive Team workers to learn who has free disk space, and to save content to those disks. This could be just an IRC conversation and SFTP, but the more centralized and automated, the more likely available disk space will be able to be utilized. Software that is not cross-platform cannot be used here.<br />
<br />
A simple distributed and redundant pool of HDs ("distributed pool") requires software running on Windows, Linux and Mac hardware to manage a global filesystem or object store, and distribute uploads across the entire pool of available space, and make multiple copies on an ongoing basis to ensure preservation of data if a pool member goes offline. This has to be automated and relatively maintenance-free, and ideally low-impact on CPU and memory if it will be running on personal machines with multi-TB USB drives hanging off them. Software that is not cross-platform cannot be used here.<br />
<br />
A dedicated distributed and redundant pool of HDs ("dedicated pool") requires a selection of dedicated hardware and disks for maximum availability, and software to run on that hardware to manage a global filesystem or object store. It has to be automated and relatively maintenance-free, but would be the only thing running on its dedicated hardware, and as such does not have to be cross-platform.<br />
<br />
{| class="wikitable sortable"<br />
! Software name<br />
! Filesystem or Object Store?<br />
! Platform(s)<br />
! License<br />
! Good for which pool?<br />
! Pros<br />
! Cons<br />
! Notes<br />
|-<br />
| Tahoe-LAFS<br />
| Filesystem<br />
| Windows, Mac, Linux<br />
| GPL 2+<br />
| Distributed, dedicated<br />
| Uses what people already have, can spread expenses out, could be a solution done with only software<br />
| Barrier to leaving is non-existent, might cause data-loss even with auto-fixing infrastructure. Too slow to be a primary offloading site. <ref>"Practically the following results have been reported: 16Mbps in throughput for writing and about 8.8Mbps in reading" -- from https://tahoe-lafs.org/trac/tahoe-lafs/wiki/FAQ, making it non-competitive with the 1-2 gigabit speeds needed when archiving twitch.tv.</ref><br />
| Accounting is experimental, meaning "in practice is that anybody running a storage node can also automatically shove shit onto it, with no way to track down who uploaded how much or where or what it is" -joepie91 on IRC<br />
|-<br />
| Ceph<br />
| Object store, Filesystem<br />
| Linux<br />
| LGPL<br />
| Dedicated<br />
|<br />
|<br />
|<br />
|-<br />
| GlusterFS<br />
| Filesystem<br />
| Linux, BSD, OpenSolaris<br />
| GPL 3<br />
| Dedicated<br />
|<br />
|<br />
|<br />
|-<br />
| Gfarm<br />
| Filesystem<br />
| Mac, Linux, BSD, Solaris<br />
| X11<br />
| Dedicated<br />
|<br />
|<br />
|<br />
|-<br />
| Quantcast<br />
| Filesystem<br />
| Linux<br />
| Apache<br />
| Dedicated<br />
|<br />
| Like HDFS, intended for MapReduce processing, which writes large files, and doesn't delete them. Random access and erasing or moving data around may not be performant.<br />
| <br />
|-<br />
| GlusterFS<br />
| Filesystem<br />
| Mac, Linux, BSD, Solaris<br />
| GPL 3<br />
| Dedicated<br />
|<br />
|<br />
|<br />
|-<br />
| HDFS<br />
| Filesystem<br />
| Java<br />
| Apache<br />
| Distributed, dedicated<br />
|<br />
| Like Quantcast, intended for MapReduce processing, which writes large files, and doesn't delete them. Random access and erasing or moving data around may not be performant.<br />
| <br />
|-<br />
| XtreemFS<br />
| Filesystem<br />
| Linux, Solaris<br />
| BSD<br />
| Dedicated<br />
|<br />
|<br />
|<br />
|-<br />
| MogileFS<br />
| Object store<br />
| Linux<br />
| GPL<br />
| Dedicated<br />
| Understands distributing files across multiple networks, not just multiple disks<br />
|<br />
| As an object store, you can't just mount it as a disk and dump files onto it, you have to push them into it through its API, and retrieve them the same way.<br />
|-<br />
| Riak CS<br />
| Object store<br />
| Mac, Linux, BSD<br />
| Apache<br />
| Dedicated<br />
| S3 API compatible<br />
| Multi-datacenter replication (which might be what you consider having multiple disparate users on different networks) is only available in the commercial offering.<br />
| A former Basho employee suggests this might not be a good fit due to the high latency and unstable connections we'd be dealing with. Datacenter-to-datacenter sync is an "entirely different implementation" than local replication, and would require the enterprise offering.<br />
|-<br />
| MongoDB GridFS<br />
| Object store<br />
| Windows, Mac, Linux<br />
| AGPL<br />
| Distributed, dedicated<br />
|<br />
|<br />
|<br />
|-<br />
| LeoFS<br />
| Object store<br />
| Mac, Linux<br />
| Apache<br />
| Dedicated<br />
| S3-compatible interface, beta NFS interface, supports multi-datacenter replication, designed with GUI administration in mind<br />
|<br />
|<br />
|-<br />
| BitTorrent Sync<br />
| Synchronization<br />
| Windows, Mac, Linux, BSD, NAS<br />
| Proprietary<br />
| Simple<br />
| Commercially supported software<br />
| As straight synchronization software, it mirrors folders across devices. Individual users would have to make synched folders available to get copies of archives, and then they would be mirrored, and that's it.<br />
| Synchronization software in general is not the right solution for this problem.<br />
|}<br />
<br />
== Non-options ==<br />
* Ink-based Consumer Optical Media (CDs, DVD, etc.) <br />
** Differences between Blu-Ray and DVD? DVDs do not last very long. The fact is, the history of optical writable media has been on of chicanery, failure, and overpromising while under-delivering. Some DVDs failed within a year. There are claims Blu-Ray is different, but fool me 3,504 times, shame on me.<br />
* BitTorrent Sync<br />
** Proprietary (currently), so not a good idea to use as an archival format/platform<br />
* Amazon S3 / Google Cloud Storage / Microsoft Azure Storage<br />
** Amazon S3 might be a viable waypoint for intra-month storage ($30.68/TB), but retrieval over the internet, as with Glacier, is expensive, $8499.08 for 100TB. Google's and Microsoft's offerings are all in the same price range.<br />
* Floppies<br />
** ''"Because 1.4 trillion floppies exists less than 700 billion floppies. HYPOTHETICALLY, if you set twenty stacks side by side, figure a quarter centimeter per floppy thickness, excluded the size of the drive needed to read the floppies you would still need a structure 175,000 ft. high to house them. Let's also assume that the failure rate for floppies is about 5% (everyone knows that varies by brand, usage, time of manufacture, materials used, etc, but lets say 5% per year). 70 million of those 1.4 trillion floppies are unusuable. Figuring 1.4 MB per floppy disk, you are losing approximately 100MB of porn each year. Assuming it takes 5 seconds to replace a bad floppy, you would have to spend 97,222 hrs/yr to replace them. Considering there are only 8,760 hrs per year, you would require a staff of 12 people replacing floppies around the clock or 24 people on 12 hr shifts. Figuring $7/hr you would spend $367,920 on labor alone. Figuring a nickel per bad floppy, you would need $3,500,000 annually in floppy disks, bringing your 1TB floppy raid operating costs (excluding electricity, etc) to $3,867, 920 and a whole landfill of corrupted porn. Thank you for destroying the planet and bankrupting a small country with your floppy based porn RAID."'' ([http://gizmodo.com/5431497/why-its-better-to-pretend-you-dont-know-anything-about-computers?comment=17793028#comments source])<br />
<br />
== From IRC ==<br />
<br />
<Drevkevac> we are looking to store 100TB+ of media offline for 25+ years<br />
<Drevkevac> if anyone wants to drop in, I will pastebin the chat log<br />
<rat> DVDR and BR-R are not high volume. When you have massive amounts of data, raid arrays have too many points of failure.<br />
<rat> Drevkevac: I work in a tv studio. We have 30+ years worth of tapes. And all of them are still good.<br />
<rat> find a hard drive from 30 years ago and see how well it hooks up ;)<br />
<brousch_> 1500 Taiyo Yuden Gold CD-Rs http://www.mediasupply.com/taiyo-yuden-gold-cd-rs.html<br />
<br />
<Drevkevac> still, if its true, you could do, perhaps, raidz3s in groups of 15 disks or so?<br />
<SketchCow> Please add paperbak to the wiki page.<br />
<SketchCow> Fuck Optical Media. not an option;.<br />
<Drevkevac> that would give you ~300GB per disk group, with 3 disks<br />
<br />
== Where are you going to put it? ==<br />
<br />
Okay, so you have the tech. Now you need a place for it to live.<br />
<br />
Possibilities:<br />
<br />
* The Internet Archive Physical Warehouse, Richmond, CA<br />
** The Internet Archive has several physical storage facilities, including warehouses in Richmond, CA (home of the Physical Archive) and the main location in San Francisco, CA. They have indicated they are willing to take copies of Archive Team-sponsored physical materials with the intent of them being ingested into the Archive at large over time, as costs lower and 100tb collections are not as big a drain (or a rash of funding arrives elsewhere).<br />
<br />
* Living Computer Museum, Seattle, WA<br />
** In discussions with Jason Scott, the Living Computer Museum has indicated they will have physical storage available for computer historical materials. Depending on the items being saved by Archive Team, they may be willing to host/hold copies for the foreseeable future.<br />
<br />
* Library of Congress, Washington, DC<br />
** The Library of Congress may be willing to take a donation of physical storage, although they have not indicated what they would do with it long-term.<br />
<br />
Multiple copies would of course be great.<br />
<br />
== No, seriously, how are you going to actually DO it ==<br />
<br />
There are only a few practical hardware+software+process combinations. In order of cost to each volunteer:<br />
<br />
* A pool of volunteers with Blu-ray burners commit to ("the Blu-ray option"): <br />
** buying a 50-disc spindle of 25GB discs per TB per project,<br />
** burning them,<br />
** verifying them,<br />
** storing them somewhere climate-controlled (a shelf in a house with AC and heat is fine, an attic/garage/flooded basement is not),<br />
** verifying them regularly (monthly? quarterly?) and replacing discs if necessary, and<br />
** shipping them somewhere else upon request, with no expectation of return (permanent storage, consolidation, etc.).<br />
<br />
This probably requires a minimum of three volunteers per TB per project. Probably best to pre-split the data into < 25GB chunks so each disc can be labeled the same and expected to have the same data on it. Fifty 25GB discs is a little more than a TB, and it's expected you'll lose a few to bad burns each time, but it might be worth buying more than a spindle and generating parity files onto additional discs.<br />
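<br />
As a sketch of the pre-splitting step, assuming Python is available on the burner's machine: the chunk size, filenames, and manifest format below are illustrative (the manifest follows the sha256sum convention, so discs can also be checked with standard tools), and parity generation (e.g. par2) would be a separate pass over the chunks.<br />
<br />
 import hashlib, os, sys<br />
 <br />
 DISC = 24 * 10**9         # stay under 25GB per BD-R, leaving headroom<br />
 BLOCK = 64 * 1024 * 1024  # stream in 64MB blocks to keep memory use flat<br />
 <br />
 def split_with_manifest(src, outdir):<br />
     """Split src into disc-sized chunks and write a SHA-256 manifest."""<br />
     os.makedirs(outdir, exist_ok=True)<br />
     base = os.path.basename(src)<br />
     with open(src, "rb") as f, \<br />
          open(os.path.join(outdir, "MANIFEST.sha256"), "w") as manifest:<br />
         n = 0<br />
         while True:<br />
             h = hashlib.sha256()<br />
             written = 0<br />
             name = "%s.%04d" % (base, n)<br />
             with open(os.path.join(outdir, name), "wb") as out:<br />
                 while written < DISC:<br />
                     data = f.read(min(BLOCK, DISC - written))<br />
                     if not data:<br />
                         break<br />
                     out.write(data)<br />
                     h.update(data)<br />
                     written += len(data)<br />
             if written == 0:<br />
                 os.remove(os.path.join(outdir, name))  # drop the empty tail chunk<br />
                 break<br />
             manifest.write("%s  %s\n" % (h.hexdigest(), name))<br />
             n += 1<br />
 <br />
 if __name__ == "__main__":<br />
     split_with_manifest(sys.argv[1], sys.argv[2])<br />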
<br />
* A pool of volunteers commit to ("the simple pool"):<br />
** buying a best reasonable external HD,<br />
** downloading archives to it,<br />
** keeping it spun up, or spinning it up regularly (monthly? quarterly?) and running filesystem and content checks on it,<br />
** storing it somewhere climate-controlled (a shelf in a house with AC and heat is fine, an attic/garage/flooded basement is not),<br />
** buying additional HDs once it's full or if there are drive errors, and<br />
** shipping it somewhere else upon request, with no expectation of return (permanent storage, consolidation, etc.).<br />
<br />
Same as with Blu-rays, and not really any more expensive ($150 buys either four $37.50 spindles of Blu-rays covering 4TB, or one 4TB HD), except look at all that disc-swapping time and effort you don't have to do. You don't have to split data into chunks, but you do want to download it in a resumable fashion and verify it afterwards: checksums, parity files, something (a verification sketch follows). You also risk losing a lot more if a drive fails, and the per-volunteer cost of a failure is higher (replacing a whole drive versus replacing individual discs or spindles). As such, you still probably want a minimum of three volunteers per TB per project (so a 2TB project needs six volunteers with 1TB each, not three volunteers holding all 2TB each).<br />
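<br />
A sketch of that periodic check, assuming the same MANIFEST.sha256 convention as the splitting sketch above; the manifest path is whatever convention the pool agrees on.<br />
<br />
 import hashlib, os, sys<br />
 <br />
 BLOCK = 64 * 1024 * 1024  # stream in 64MB blocks<br />
 <br />
 def verify(manifest_path):<br />
     """Re-hash every file listed in a MANIFEST.sha256; return the failures."""<br />
     root = os.path.dirname(manifest_path)<br />
     bad = []<br />
     for line in open(manifest_path):<br />
         expected, name = line.rstrip("\n").split("  ", 1)<br />
         h = hashlib.sha256()<br />
         with open(os.path.join(root, name), "rb") as f:<br />
             for data in iter(lambda: f.read(BLOCK), b""):<br />
                 h.update(data)<br />
         if h.hexdigest() != expected:<br />
             bad.append(name)<br />
     return bad<br />
 <br />
 if __name__ == "__main__":<br />
     failures = verify(sys.argv[1])<br />
     print("%d file(s) failed verification" % len(failures))<br />
     for name in failures:<br />
         print("  " + name)<br />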
<br />
* A pool of volunteers commit to ("the distributed pool"):<br />
** all buying the same standard, inexpensive, hackable, RAID 1 NAS,<br />
*** WD My Cloud Mirror (starts at $300 for 2TB [called "4TB," only 2TB with mirroring])<br />
*** QNAP (2-bay starts at $140 without HDs)<br />
*** Synology (2-bay starts at $200 without HDs)<br />
*** Pogoplug Series 4 + two best reasonable external HDs + software RAID 1, or a download script that manually mirrors files ($20 without HDs)<br />
** keeping it spun up, online, and possibly accessible by external AT admins,<br />
** storing it somewhere climate-controlled (a shelf in a house with AC and heat is fine, an attic/garage/flooded basement is not),<br />
** buying entire additional units once they are full or if there are drive errors, and<br />
** shipping the drives (or the entire My Cloud Mirror unit, if that's the one selected) somewhere else upon request, with no expectation of return (permanent storage, consolidation, etc.).<br />
<br />
These units provide dramatically improved reliability for content, enough that perhaps you only need two volunteers per project, and no need to split by TB, since each volunteer would have two copies. Having everyone buy the same hardware means reduced administration time overall, especially if custom scripts are involved. QNAP and Synology both have official SDKs, and all of them run some flavor of Linux, with Synology supporting SSH logins out of the box. The Pogoplug is the most underpowered of the options, but even it should be powerful enough to run a MogileFS storage node, or a script that downloads to one HD and copies to the other. (Checksums would be really slow, though.) This is moderately expensive per-volunteer, with an upfront cost of $320-$500.<br />
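<br />
For the Pogoplug variant, the "download to one HD and copy to the other" script might look like the following sketch. The mount points are hypothetical, wget is assumed to be installed on the device (its -c flag resumes partial downloads, which matters on residential links), and the full re-hash of both copies is exactly the slow checksum step noted above.<br />
<br />
 import hashlib, shutil, subprocess, sys<br />
 <br />
 PRIMARY = "/mnt/disk1/archives"  # hypothetical mount points for the two drives<br />
 MIRROR = "/mnt/disk2/archives"<br />
 <br />
 def sha256(path):<br />
     h = hashlib.sha256()<br />
     with open(path, "rb") as f:<br />
         for block in iter(lambda: f.read(8 * 1024 * 1024), b""):<br />
             h.update(block)<br />
     return h.hexdigest()<br />
 <br />
 def fetch_and_mirror(url):<br />
     """Download to the primary disk (resumable), then copy and verify the mirror."""<br />
     name = url.rsplit("/", 1)[-1]<br />
     subprocess.check_call(["wget", "-c", "-P", PRIMARY, url])<br />
     shutil.copyfile("%s/%s" % (PRIMARY, name), "%s/%s" % (MIRROR, name))<br />
     if sha256("%s/%s" % (PRIMARY, name)) != sha256("%s/%s" % (MIRROR, name)):<br />
         raise RuntimeError("mirror copy of %s does not match the original" % name)<br />
 <br />
 if __name__ == "__main__":<br />
     fetch_and_mirror(sys.argv[1])<br />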
<br />
* A pool of volunteers commit to ("the dedicated pool"):<br />
** all buying the same, standard, expensive NAS,<br />
*** iXsystems FreeNAS Mini (starts at $1000 without HDs),<br />
*** A DIY FreeNAS box ($300+ without HDs),<br />
*** A DIY NexentaStor box (probably the same as the DIY FreeNAS box)<br />
** keeping it spun up, online, and possibly accessible by external AT admins,<br />
** storing it somewhere climate-controlled and well-ventilated (a shelf with no airflow is not fine),<br />
** replacing drives if there are drive errors,<br />
** migrating the pool to larger disks once it starts getting full, and<br />
** shipping the drives somewhere else upon request, with no expectation of return (permanent storage, consolidation, etc.).<br />
<br />
A set of volunteers with (comparatively) expensive network-attached storage gives you a lot of storage in a lot of locations, potentially tens of redundant TB in each one, depending on the size of the chassis. You want everyone running the same NAS software, but the hardware can vary somewhat; however, the hardware should all have ECC RAM, and the more the better. MogileFS storage nodes are known to run on NexentaStor, and FreeNAS supports plugins, so it could be adapted to run there, or you could figure out e.g. LeoFS (which also expects ZFS). This is the most expensive option per-volunteer, upfront costs starting at around $1300 for a DIY box with four 4TB WD Red drives.<br />
<br />
* A pool of volunteers set up a recurring payment to fund ("the server option"):<br />
** one or more rented, managed, storage servers; or<br />
** saving up to buy one or more storage servers, and then hosting it somewhere.<br />
<br />
A rented server has no hardware maintenance costs for the renter: replacing a failed HD is the responsibility of the hosting provider, in both materials and labor. A purchased server is different: someone would have to buy a replacement drive and either bring it to the colocation center and swap it themselves, or ship it there and pay the facility for the labor of swapping it.<br />
<br />
== What Can You Contribute? == <br />
<br />
{| class="wikitable"<br />
! Name<br />
! What You Can Contribute<br />
! For How Long?<br />
! Exit Strategy<br />
|-<br />
| ExampleArchiver<br />
| Describe what you are willing to buy/build/write/do. Talk about the connection you would use, the storage conditions, etc. How much money can you put into it? <br />
| For how long can you truly commit to this?<br />
| If you need to quit or wind down your contribution, what are you willing to do? Can you guarantee a period of notice? Are you willing to ship your hardware or media to another volunteer anywhere in the world? <br />
|-<br />
| dnova<br />
| Willing to burn and maintain a Blu-ray collection (can provide burner and at least some discs). Willing to write to and maintain a tape library (but cannot provide tape drive/tapes). I can store media in a class 1000 cleanroom! Willing to provide short-term storage of a few hundred GB on RAIDZ-1 over a 75/10 residential connection. <br />
| 2+ years in my current geographical location and with cleanroom access. Willing to continue wherever I go, but some details may change accordingly. <br />
| Can give ample notice for either full upload and/or shipping of media. Willing to ship any storage media anywhere in the world. <br />
|-<br />
|}<br />
<br />
== Project-specific suggestions ==<br />
<br />
=== Twitch.tv (and other video services) ===<br />
<br />
* Keep the original video files in (semi-)offline storage, and store transcoded (compressed) versions on the Internet Archive.<br />
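<br />
A hedged sketch of the transcode step, driving ffmpeg from Python; the codec settings are illustrative, not a vetted recommendation for archival derivatives.<br />
<br />
 import subprocess, sys<br />
 <br />
 def transcode_for_ia(src, dst):<br />
     """Make a smaller H.264/AAC derivative for the online copy; the<br />
     untouched original is what goes to (semi-)offline storage."""<br />
     subprocess.check_call([<br />
         "ffmpeg", "-i", src,<br />
         "-c:v", "libx264", "-crf", "28", "-preset", "slow",<br />
         "-c:a", "aac", "-b:a", "128k",<br />
         dst,<br />
     ])<br />
 <br />
 if __name__ == "__main__":<br />
     transcode_for_ia(sys.argv[1], sys.argv[2])<br />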
<br />
== See Also ==<br />
*[[Storage Media]]<br />
<br />
== References ==<br />
<references/><br />
<br />
{{Navigation box}}</div>Dnovahttps://wiki.archiveteam.org/index.php?title=Valhalla&diff=20254Valhalla2014-09-29T05:09:41Z<p>Dnova: /* What Can You Contribute? */</p>
<hr />
<div>[[Image:Ms internet on a disc.jpg|300px|right]]<br />
This wiki page is a collection of ideas for Project '''Valhalla'''.<br />
<br />
This project/discussion has come around because there is a class of data currently existing, several times a year, as a massive amount of data with "large, but nominal" status within the Internet Archive. The largest example is currently MobileMe, which is hundreds of terabytes in the Internet Archive system (and in need of WARC conversion), which represents a cost amount far outstripping its use. Another is TwitPic, which is currently available (and might continue to be available) but which has shown itself to be a bad actor with regards to longevity and predictability for its sunset. <br />
<br />
Therefore, there is an argument that there could be a "third place" that data collected by Archive Team could sit, until the Internet Archive (or another entity) grows its coffers/storage enough that 80-100tb is "no big deal", just like 1tb of data was annoying in 2009 and now is totally understandable for the value, i.e. Geocities. <br />
<br />
This is for short-term (or potentially also long-term) storage options, say five years or less, of data generated by Archive Team.<br />
<br />
* What options are out there, generally?<br />
* What are the costs, roughly?<br />
* What are the positives and negatives?<br />
<br />
There has been a lot of study in this area over the years, of course, so links to known authorities and debates will be welcome as well.<br />
<br />
Join the discussion in [irc://irc.efnet.org/huntinggrounds #huntinggrounds].<br />
<br />
== Goals ==<br />
<br />
We want to:<br />
<br />
* Dump an unlimited<ref>Unlimited doesn't mean infinite, but it does mean that we shouldn't worry about running out of space. We won't be the only expanding data store.</ref> amount of data into something.<br />
* Recover that data at any point.<br />
<br />
We do not care about:<br />
<br />
* Immediate or continuous availability.<br />
<br />
We absolutely require:<br />
<br />
* Low (ideally, zero) human time for maintenance. If we have substantial human maintenance needs, we're probably going to need a Committee of Elders or something.<br />
* Data integrity. The storage medium must be impossibly durable or make it inexpensive/easy to copy and verify the data onto a fresh medium.<br />
<br />
It would be nice to have:<br />
<br />
* No special environmental requirements that could not be handled by a third party. (So nobody in Archive Team would have to set up some sort of climate-controlled data-cave; however, if this is already something that e.g. IA does and they are willing to lease space, that's cool.)<br />
<br />
== What does the Internet Archive do for this Situation, Anyway? ==<br />
<br />
''This section has not been cleared by the Internet Archive, and so should be considered a rough sketch.''<br />
<br />
The Internet Archive primarily wants "access" to the data it stores, so the primary storage methodology is spinning hard drives connected to a high-speed connection from multiple locations. These hard drives are between 4-6tb (as of 2014) and are of general grade, as is most of the hardware - the theory is that replacing cheap hardware is better than spending a lot of money on super-grade hardware (whatever that may be) and not being able to make the dollars stretch. Hundreds of drives die in a month and the resiliency of the system allows them all to hot-swap in replacements. <br />
<br />
There are multiple warehouses for storing the original books that are scanned, as well as materials like CD-ROMs and even hard drives. There are collections of tapes and CD-ROMs from previous iterations of storage, although they are thought of as drop-dead options instead of long-term archival storage - the preference is, first and foremost, the spinning hard drives.<br />
<br />
The Archive does not generally use tape technology, having run into the classic "whoops, no tape drive on earth reads these any more" and "whoops, this tape no longer works properly".<br />
<br />
The Archive has indicated that if Archive Team uses a physical storage method, such as tapes, paper, hard drives or anything else, that they are willing to store these materials "as long as they are exceedingly labelled".<br />
<br />
== Physical Options ==<br />
{| class="wikitable sortable"<br />
! Storage type<br />
! Cost ($/TB/year)<br />
! Storage density (m³/TB)<br />
! Theoretical lifespan<br />
! Practical, tested lifespan<br />
! Notes<br />
|-<br />
| Hard drives (simple distributed pool)<br />
| $150 (full cost of best reasonable 1TB+ external HD)<br />
| <br />
| <br />
| <br />
| September 2014, best reasonable 1TB+ external HD is [http://thewirecutter.com/reviews/the-best-external-desktop-hard-drive/ a 4TB WD]. 25+ pool members would need one HD each plus a computer plus software to distribute data across the entire pool.<br />
|-<br />
| Hard drives (dedicated distributed pool)<br />
| <br />
| <br />
| <br />
| <br />
| An off-the-shelf or otherwise specified, dedicated, network storage device used exclusively as part of a distributed pool.<br />
|-<br />
| Hard drives (SPOF) <ref>The [[Internet Archive]]'s cost per TB, with 24/7 online hard drives, is approximately $2000 for forever.</ref><br />
| $62 (but you have to buy 180TB)<br />
| <br />
| <br />
| <br />
| For a single location to provide all storage needs, building a [https://www.backblaze.com/blog/backblaze-storage-pod-4/ Backblaze Storage Pod 4.0] runs an average of $11,000, providing 180TB of [http://bioteam.net/2011/08/why-you-should-never-build-a-backblaze-pod/ non-redundant, not-highly-available] storage. (You really want more than one pod mirroring your data, but this is the most effective way to get that much storage in one place.)<br />
|-<br />
| Commercial / archival-grade tapes<br />
| <br />
| <br />
| <br />
| <br />
| <br />
|-<br />
| Consumer tape systems (VHS, Betamax, cassette tapes, ...)<br />
| <br />
| <br />
| <br />
| <br />
| <br />
|-<br />
| Vinyl<br />
| <br />
| <br />
| <br />
| <br />
| <br />
|-<br />
| [http://www.ollydbg.de/Paperbak/index.html PaperBack]<br />
| <br />
| <br />
| <br />
| <br />
| 500KB per letter sheet means 1TB is 2,199,024 sheets, or ~4400 reams (500 sheets each), or an 8'x16' room filled with 6' tall stacks. It would take 63.6 days of continuous printing to do this.<ref>A HP LaserJet 5Si printing 24 pages per minute which generates the 500K bytes per page, yielding approximately 200,000 bytes per second.</ref><br />
|-<br />
| [http://ronja.twibright.com/optar/ Optar]<br />
| <br />
| <br />
| <br />
| <br />
| At 200KB per page, this has less than half the storage density of Paperback.<br />
|-<br />
| Blu-Ray<br />
| $40 (50 pack spindle of 25GB BD-Rs)<br />
| <br />
| 30 years<ref>On the basis of the described studies and assuming adequate consideration of the specified conditions for storage and handling, as well as verification of data after writing, we estimate the Imation CD, DVD or Blu-ray media to have a theoretical readability of up to 30 years. The primary caveat is how you handle and store the media. http://support.tdkperformance.com/app/answers/detail/a_id/1685/~/life-expectancy-of-optical-media </ref><br />
| <br />
| Lasts a LOT longer than CD/DVD, but should not be assumed to last more than a decade. [http://arstechnica.com/information-technology/2014/01/why-facebook-thinks-blu-ray-discs-are-perfect-for-the-data-center/ Raidz3 with Blu-rays Doing a backup in groups of 15 disks]. Comes to under $.04/GB which is cheap, and low initial investment (drives) too!<br><br />
<br>Specifically, a 50pack spindle of 25GB BD-Rs could readily hold 1TB of data for $30-50 per spindle. 50GB and 100GB discs are more expensive per GB.<br />
|-<br />
| [http://en.wikipedia.org/wiki/M-DISC M-DISC]<br />
| <br />
| <br />
| <br />
| <br />
| Unproven technology, but potentially interesting.<br />
|-<br />
| Flash media<br />
| <br />
| <br />
| <br />
| <br />
| Very durable for online use, and usually fails from lots of writes. A drive might never wear out from cold-storage usage. Newer drives can have 10-year warranties. But capacitors may leak charge over time. JEDEC JESD218A only specifies 101 weeks (almost two years) retention without power, so we'd have to check the spec of the specific drives, or power them up and re-write the data to refresh it about once a year. Soliciting donations for old flash media from people, or sponsorship from flash companies?<br />
|-<br />
| Glass/metal etching<br />
| <br />
| <br />
| <br />
| <br />
| <br />
|-<br />
| Amazon Glacier<br />
| $122.88 (storage only, retrieval billed separately)<br />
| <br />
| average annual durability of 99.999999999% <ref>"Amazon Glacier is designed to provide average annual durability of 99.999999999% for an archive. The service redundantly stores data in multiple facilities and on multiple devices within each facility. To increase durability, Amazon Glacier synchronously stores your data across multiple facilities before returning SUCCESS on uploading archives. Glacier performs regular, systematic data integrity checks and is built to be automatically self-healing." Maciej Ceglowski thinks that's [https://blog.pinboard.in/2014/04/cloudy_snake_oil/ kinda bullshit compared to the failure events you don't plan for], of course.</ref><br />
| <br />
| Retrieval is billed separately. 5% or less per month into S3 is free (5% of 100TB is 5TB), and data can be copied out from S3 to a SATA HD for $2.50/hr. plus media handling and shipping fees. Downloading 5TB from S3 would cost $614.40 (~$122.88/TB), but only $44.82 to transfer to HD via USB 3 or SATA (USB 2 is slower).<br />
|-<br />
| Dropbox for Business<br />
| $160* ($795/year)<br />
| <br />
| <br />
| <br />
| Dropbox for Business provides a shared pool of 1TB per user, at $795/year (five user minimum, 5TB), and $125 each additional user/year.<br />
|-<br />
| Box.com for Business<br />
| $180* ("unlimited" storage for $900/year)<br />
| <br />
| <br />
| <br />
| Box.com for Business provides "unlimited" storage at $15/user/month, five user minimum, or $900/year.<br />
|-<br />
| Dedicated colocated storage servers<br />
| $100* (e.g. $1300 for one year of 12TB rackmount server rental)<br />
|<br />
|<br />
|<br />
| Rent [http://www.ovh.com/us/dedicated-servers/storage/ storage servers from managed hosting colocation providers], and pool data across them. Benefits include bandwidth and electricity being included in the cost, and files could be made available online immediately. Negatives include needing to administer tens of servers.<br />
|}<br />
<br />
== Software Options ==<br />
<br />
Some of the physical options require supporting software.<br />
<br />
Removable media requires a centralized index of who has what discs, where they are, how they are labeled, and what the process for retrieval/distribution is. It could just be a wiki page, but it does require something.<br />
<br />
A simple pool of HDs ("simple pool"), one without a shared filesystem, just people offering up HDs, requires software running on Windows, Linux and/or Mac hardware to allow Archive Team workers to learn who has free disk space, and to save content to those disks. This could be just an IRC conversation and SFTP, but the more centralized and automated, the more likely available disk space will be able to be utilized. Software that is not cross-platform cannot be used here.<br />
<br />
A simple distributed and redundant pool of HDs ("distributed pool") requires software running on Windows, Linux and Mac hardware to manage a global filesystem or object store, and distribute uploads across the entire pool of available space, and make multiple copies on an ongoing basis to ensure preservation of data if a pool member goes offline. This has to be automated and relatively maintenance-free, and ideally low-impact on CPU and memory if it will be running on personal machines with multi-TB USB drives hanging off them. Software that is not cross-platform cannot be used here.<br />
<br />
A dedicated distributed and redundant pool of HDs ("dedicated pool") requires a selection of dedicated hardware and disks for maximum availability, and software to run on that hardware to manage a global filesystem or object store. It has to be automated and relatively maintenance-free, but would be the only thing running on its dedicated hardware, and as such does not have to be cross-platform.<br />
<br />
{| class="wikitable sortable"<br />
! Software name<br />
! Filesystem or Object Store?<br />
! Platform(s)<br />
! License<br />
! Good for which pool?<br />
! Pros<br />
! Cons<br />
! Notes<br />
|-<br />
| Tahoe-LAFS<br />
| Filesystem<br />
| Windows, Mac, Linux<br />
| GPL 2+<br />
| Distributed, dedicated<br />
| Uses what people already have, can spread expenses out, could be a solution done with only software<br />
| Barrier to leaving is non-existent, might cause data-loss even with auto-fixing infrastructure. Too slow to be a primary offloading site. <ref>"Practically the following results have been reported: 16Mbps in throughput for writing and about 8.8Mbps in reading" -- from https://tahoe-lafs.org/trac/tahoe-lafs/wiki/FAQ, making it non-competitive with the 1-2 gigabit speeds needed when archiving twitch.tv.</ref><br />
| Accounting is experimental, meaning "in practice is that anybody running a storage node can also automatically shove shit onto it, with no way to track down who uploaded how much or where or what it is" -joepie91 on IRC<br />
|-<br />
| Ceph<br />
| Object store, Filesystem<br />
| Linux<br />
| LGPL<br />
| Dedicated<br />
|<br />
|<br />
|<br />
|-<br />
| GlusterFS<br />
| Filesystem<br />
| Linux, BSD, OpenSolaris<br />
| GPL 3<br />
| Dedicated<br />
|<br />
|<br />
|<br />
|-<br />
| Gfarm<br />
| Filesystem<br />
| Mac, Linux, BSD, Solaris<br />
| X11<br />
| Dedicated<br />
|<br />
|<br />
|<br />
|-<br />
| Quantcast<br />
| Filesystem<br />
| Linux<br />
| Apache<br />
| Dedicated<br />
|<br />
| Like HDFS, intended for MapReduce processing, which writes large files, and doesn't delete them. Random access and erasing or moving data around may not be performant.<br />
| <br />
|-<br />
| GlusterFS<br />
| Filesystem<br />
| Mac, Linux, BSD, Solaris<br />
| GPL 3<br />
| Dedicated<br />
|<br />
|<br />
|<br />
|-<br />
| HDFS<br />
| Filesystem<br />
| Java<br />
| Apache<br />
| Distributed, dedicated<br />
|<br />
| Like Quantcast, intended for MapReduce processing, which writes large files, and doesn't delete them. Random access and erasing or moving data around may not be performant.<br />
| <br />
|-<br />
| XtreemFS<br />
| Filesystem<br />
| Linux, Solaris<br />
| BSD<br />
| Dedicated<br />
|<br />
|<br />
|<br />
|-<br />
| MogileFS<br />
| Object store<br />
| Linux<br />
| GPL<br />
| Dedicated<br />
| Understands distributing files across multiple networks, not just multiple disks<br />
|<br />
| As an object store, you can't just mount it as a disk and dump files onto it, you have to push them into it through its API, and retrieve them the same way.<br />
|-<br />
| Riak CS<br />
| Object store<br />
| Mac, Linux, BSD<br />
| Apache<br />
| Dedicated<br />
| S3 API compatible<br />
| Multi-datacenter replication (which might be what you consider having multiple disparate users on different networks) is only available in the commercial offering.<br />
| A former Basho employee suggests this might not be a good fit due to the high latency and unstable connections we'd be dealing with. Datacenter-to-datacenter sync is an "entirely different implementation" than local replication, and would require the enterprise offering.<br />
|-<br />
| MongoDB GridFS<br />
| Object store<br />
| Windows, Mac, Linux<br />
| AGPL<br />
| Distributed, dedicated<br />
|<br />
|<br />
|<br />
|-<br />
| LeoFS<br />
| Object store<br />
| Mac, Linux<br />
| Apache<br />
| Dedicated<br />
| S3-compatible interface, beta NFS interface, supports multi-datacenter replication, designed with GUI administration in mind<br />
|<br />
|<br />
|-<br />
| BitTorrent Sync<br />
| Synchronization<br />
| Windows, Mac, Linux, BSD, NAS<br />
| Proprietary<br />
| Simple<br />
| Commercially supported software<br />
| As straight synchronization software, it mirrors folders across devices. Individual users would have to make synched folders available to get copies of archives, and then they would be mirrored, and that's it.<br />
| Synchronization software in general is not the right solution for this problem.<br />
|}<br />
<br />
== Non-options ==<br />
* Ink-based Consumer Optical Media (CDs, DVD, etc.) <br />
** Differences between Blu-Ray and DVD? DVDs do not last very long. The fact is, the history of optical writable media has been on of chicanery, failure, and overpromising while under-delivering. Some DVDs failed within a year. There are claims Blu-Ray is different, but fool me 3,504 times, shame on me.<br />
* BitTorrent Sync<br />
** Proprietary (currently), so not a good idea to use as an archival format/platform<br />
* Amazon S3 / Google Cloud Storage / Microsoft Azure Storage<br />
** Amazon S3 might be a viable waypoint for intra-month storage ($30.68/TB), but retrieval over the internet, as with Glacier, is expensive, $8499.08 for 100TB. Google's and Microsoft's offerings are all in the same price range.<br />
* Floppies<br />
** ''"Because 1.4 trillion floppies exists less than 700 billion floppies. HYPOTHETICALLY, if you set twenty stacks side by side, figure a quarter centimeter per floppy thickness, excluded the size of the drive needed to read the floppies you would still need a structure 175,000 ft. high to house them. Let's also assume that the failure rate for floppies is about 5% (everyone knows that varies by brand, usage, time of manufacture, materials used, etc, but lets say 5% per year). 70 million of those 1.4 trillion floppies are unusuable. Figuring 1.4 MB per floppy disk, you are losing approximately 100MB of porn each year. Assuming it takes 5 seconds to replace a bad floppy, you would have to spend 97,222 hrs/yr to replace them. Considering there are only 8,760 hrs per year, you would require a staff of 12 people replacing floppies around the clock or 24 people on 12 hr shifts. Figuring $7/hr you would spend $367,920 on labor alone. Figuring a nickel per bad floppy, you would need $3,500,000 annually in floppy disks, bringing your 1TB floppy raid operating costs (excluding electricity, etc) to $3,867, 920 and a whole landfill of corrupted porn. Thank you for destroying the planet and bankrupting a small country with your floppy based porn RAID."'' ([http://gizmodo.com/5431497/why-its-better-to-pretend-you-dont-know-anything-about-computers?comment=17793028#comments source])<br />
<br />
== From IRC ==<br />
<br />
<Drevkevac> we are looking to store 100TB+ of media offline for 25+ years<br />
<Drevkevac> if anyone wants to drop in, I will pastebin the chat log<br />
<rat> DVDR and BR-R are not high volume. When you have massive amounts of data, raid arrays have too many points of failure.<br />
<rat> Drevkevac: I work in a tv studio. We have 30+ years worth of tapes. And all of them are still good.<br />
<rat> find a hard drive from 30 years ago and see how well it hooks up ;)<br />
<brousch_> 1500 Taiyo Yuden Gold CD-Rs http://www.mediasupply.com/taiyo-yuden-gold-cd-rs.html<br />
<br />
<Drevkevac> still, if its true, you could do, perhaps, raidz3s in groups of 15 disks or so?<br />
<SketchCow> Please add paperbak to the wiki page.<br />
<SketchCow> Fuck Optical Media. not an option;.<br />
<Drevkevac> that would give you ~300GB per disk group, with 3 disks<br />
<br />
== Where are you going to put it? ==<br />
<br />
Okay, so you have the tech. Now you need a place for it to live.<br />
<br />
Possibilities:<br />
<br />
* The Internet Archive Physical Warehouse, Richmond, CA<br />
** The Internet Archive has several physical storage facilities, including warehouses in Richmond, CA (home of the Physical Archive) and the main location in San Francisco, CA. They have indicated they are willing to take copies of Archive Team-sponsored physical materials with the intent of them being ingested into the Archive at large over time, as costs lower and 100tb collections are not as big a drain (or a rash of funding arrives elsewhere).<br />
<br />
* Living Computer Museum, Seattle, WA<br />
** In discussions with Jason Scott, the Living Computer Museum has indicated they will have physical storage available for computer historical materials. Depending on the items being saved by Archive Team, they may be willing to host/hold copies for the forseable future.<br />
<br />
* Library of Congress, Washington, DC<br />
** The Library of Congress may be willing to take a donation of physical storage, although it is not indicated what they may do long-term with it.<br />
<br />
Multiple copies would of course be great.<br />
<br />
== No, seriously, how are you going to actually DO it ==<br />
<br />
There are only a few practical hardware+software+process combinations. In order of cost to each volunteer:<br />
<br />
* A pool of volunteers with Blu-ray burners commit to ("the Blu-ray option"): <br />
** buying a 50-disc spindle of 25GB discs per TB per project,<br />
** burning them,<br />
** verifying them,<br />
** storing them somewhere climate-controlled (a shelf in a house with AC and heat is fine, an attic/garage/flooded basement is not),<br />
** verifying them regularly (monthly? quarterly?) and replacing discs if necessary, and<br />
** shipping them somewhere else upon request, with no expectation of return (permanent storage, consolidation, etc.).<br />
<br />
This probably requires a minimum of three volunteers per TB per project. Probably best to pre-split the data into < 25GB chunks so each disc can be labeled the same and expected to have the same data on it. Fifty 25GB discs is a little more than a TB, and it's expected you'll lose a few to bad burns each time, but it might be worth buying more than a spindle and generating parity files onto additional discs.<br />
<br />
* A pool of volunteers commit to ("the simple pool"):<br />
** buying a best reasonable external HD,<br />
** downloading archives to it,<br />
** keeping it spun up, or spinning it up regularly (monthly? quarterly?) and running filesystem and content checks on it,<br />
** storing it somewhere climate-controlled (a shelf in a house with AC and heat is fine, an attic/garage/flooded basement is not),<br />
** buying additional HDs once it's full or if there are drive errors, and<br />
** shipping it somewhere else upon request, with no expectation of return (permanent storage, consolidation, etc.).<br />
<br />
Same as with Blu-rays, and not really any more expensive ($150 == $37.50 for one 1TB of Blu-rays * 4, or one 4TB HD), except look at all that disc-swapping time and effort you don't have to do. You don't have to split data into chunks, but you do want to download it in a resumable fashion and verify it afterwards, so, checksums, parity files, something. You also risk losing a lot more if a drive fails, and the cost per-volunteer is higher (replacing a whole drive versus replacing individual discs or spindles). As such, you still probably want a minimum of three volunteers per TB per project (so a 2TB project needs six volunteers with 1TB each, not three volunteers holding all 2TB each).<br />
<br />
* A pool of volunteers commit to ("the distributed pool"):<br />
** all buying the same, standard, inexpensive, hackable, RAID 1, NAS,<br />
*** WD My Cloud Mirror (starts at $300 for 2TB [called "4TB," only 2TB with mirroring])<br />
*** QNAP (2-bay starts at $140 without HDs)<br />
*** Synology (2-bay starts at $200 without HDs)<br />
*** Pogoplug Series 4 + two best reasonable external HD + software RAID 1, or a download script that manually mirrors files ($20 without HDs)<br />
** keeping it spun up, online, and possibly accessible by external AT admins,<br />
** storing it somewhere climate-controlled (a shelf in a house with AC and heat is fine, an attic/garage/flooded basement is not),<br />
** buying entire additional units once they are full or if there are drive errors, and<br />
** shipping the drives (or the entire My Cloud Mirror unit, if that's the one selected) somewhere else upon request, with no expectation of return (permanent storage, consolidation, etc.).<br />
<br />
These units provide dramatically improved reliability for content, enough that perhaps you only need two volunteers per project, and no need to split by TB, since each volunteer would have two copies. Having everyone buy the same hardware means reduced administration time overall, especially if custom scripts are involved. QNAP and Synology both have official SDKs, and all of them run some flavor of Linux, with Synology supporting SSH logins out of the box. The Pogoplug is the most underpowered of the options, but even it should be powerful enough to run a MogileFS storage node, or a script that downloads to one HD and copies to the other. (Checksums would be really slow, though.) This is moderately expensive per-volunteer, with an upfront cost of $320-$500.<br />
<br />
* A pool of volunteers commit to ("the dedicated pool"):<br />
** all buying the same, standard, expensive NAS,<br />
*** iXsystems FreeNAS Mini (starts at $1000 without HDs),<br />
*** A DIY FreeNAS box ($300+ without HDs),<br />
*** A DIY NexentaStor box (probably the same as the DIY FreeNAS box)<br />
** keeping it spun up, online, and possibly accessible by external AT admins,<br />
** storing it somewhere climate-controlled and well-ventilated (a shelf with no airflow is not fine),<br />
** replacing drives if there are drive errors,<br />
** migrating the pool to larger disks once it starts getting full, and<br />
** shipping the drives somewhere else upon request, with no expectation of return (permanent storage, consolidation, etc.).<br />
<br />
A set of volunteers with (comparatively) expensive network-attached storage gives you a lot of storage in a lot of locations, potentially tens of redundant TB in each one, depending on the size of the chassis. You want everyone running the same NAS software, but the hardware can vary somewhat; however, the hardware should all have ECC RAM, and the more the better. MogileFS storage nodes are known to run on NexentaStor, and FreeNAS supports plugins, so it could be adapted to run there, or you could figure out e.g. LeoFS (which also expects ZFS). This is the most expensive option per-volunteer, upfront costs starting at around $1300 for a DIY box with four 4TB WD Red drives.<br />
<br />
* A pool of volunteers set up a recurring payment to fund ("the server option"):<br />
** one or more rented, managed, storage servers; or<br />
** saving up to buy one or more storage servers, and then hosting it somewhere.<br />
<br />
A rented server has no hardware maintenance costs; replacing a failed HD is the responsibility of the hosting provider, both in terms of materials cost and in labor cost. This is not the case with a purchased server, where someone would have to buy a replacement hard drive, bring it to the colocation center, and replace the drive; or someone would have to buy a replacement disk, ship it to the colocation center, and then they would bill someone for the labor involved in replacing it.<br />
<br />
== What Can You Contribute? == <br />
<br />
{| class="wikitable"<br />
! Name<br />
! What You Can Contribute<br />
! For How Long?<br />
! Exit Strategy<br />
|-<br />
| ExampleArchiver<br />
| Describe what you are willing to buy/build/write/do. Talk about the connection you would use, the storage conditions, etc. How much money can you put into it? <br />
| For how long can you truly commit to this?<br />
| If you need to quit or wind down your contribution, what are you willing to do? Can you guarantee a period of notice? Are you willing to ship your hardware or media to another volunteer anywhere in the world? <br />
|-<br />
| dnova<br />
| Willing to burn and maintain a blu-ray collection (can to provide burner and at least some discs). Willing to maintain tape library (but cannot provide tape drive/tapes). I can store media in a class 1000 cleanroom! Willing to provide short-term storage for few hundreds of GB of RAIDZ-1 storage on a 75/10 residential connection. <br />
| 2+ years in my current geographical location and with cleanroom access. Willing to continue wherever I go, but some details may change accordingly. <br />
| Can give ample notice for either full upload and/or shipping of media. Willing to ship any storage media anywhere in the world. <br />
|-<br />
|}<br />
<br />
== Project-specific suggestions ==<br />
<br />
=== Twitch.tv (and other video services) ===<br />
<br />
* Keep the original video files in (semi-)offline storage, and store transcoded (compressed) versions on the Internet Archive.<br />
<br />
== See Also ==<br />
*[[Storage Media]]<br />
<br />
== References ==<br />
<references/><br />
<br />
{{Navigation box}}</div>Dnovahttps://wiki.archiveteam.org/index.php?title=Valhalla&diff=20253Valhalla2014-09-29T04:52:28Z<p>Dnova: Added section "What Can You Contribute?"</p>
<hr />
<div>[[Image:Ms internet on a disc.jpg|300px|right]]<br />
This wiki page is a collection of ideas for Project '''Valhalla'''.<br />
<br />
This project/discussion has come around because there is a class of data currently existing, several times a year, as a massive amount of data with "large, but nominal" status within the Internet Archive. The largest example is currently MobileMe, which is hundreds of terabytes in the Internet Archive system (and in need of WARC conversion), which represents a cost amount far outstripping its use. Another is TwitPic, which is currently available (and might continue to be available) but which has shown itself to be a bad actor with regards to longevity and predictability for its sunset. <br />
<br />
Therefore, there is an argument that there could be a "third place" that data collected by Archive Team could sit, until the Internet Archive (or another entity) grows its coffers/storage enough that 80-100tb is "no big deal", just like 1tb of data was annoying in 2009 and now is totally understandable for the value, i.e. Geocities. <br />
<br />
This is for short-term (or potentially also long-term) storage options, say five years or less, of data generated by Archive Team.<br />
<br />
* What options are out there, generally?<br />
* What are the costs, roughly?<br />
* What are the positives and negatives?<br />
<br />
There has been a lot of study in this area over the years, of course, so links to known authorities and debates will be welcome as well.<br />
<br />
Join the discussion in [irc://irc.efnet.org/huntinggrounds #huntinggrounds].<br />
<br />
== Goals ==<br />
<br />
We want to:<br />
<br />
* Dump an unlimited<ref>Unlimited doesn't mean infinite, but it does mean that we shouldn't worry about running out of space. We won't be the only expanding data store.</ref> amount of data into something.<br />
* Recover that data at any point.<br />
<br />
We do not care about:<br />
<br />
* Immediate or continuous availability.<br />
<br />
We absolutely require:<br />
<br />
* Low (ideally, zero) human time for maintenance. If we have substantial human maintenance needs, we're probably going to need a Committee of Elders or something.<br />
* Data integrity. The storage medium must be impossibly durable or make it inexpensive/easy to copy and verify the data onto a fresh medium.<br />
<br />
It would be nice to have:<br />
<br />
* No special environmental requirements that could not be handled by a third party. (So nobody in Archive Team would have to set up some sort of climate-controlled data-cave; however, if this is already something that e.g. IA does and they are willing to lease space, that's cool.)<br />
<br />
== What does the Internet Archive do for this Situation, Anyway? ==<br />
<br />
''This section has not been cleared by the Internet Archive, and so should be considered a rough sketch.''<br />
<br />
The Internet Archive primarily wants "access" to the data it stores, so the primary storage methodology is spinning hard drives connected to a high-speed connection from multiple locations. These hard drives are between 4-6tb (as of 2014) and are of general grade, as is most of the hardware - the theory is that replacing cheap hardware is better than spending a lot of money on super-grade hardware (whatever that may be) and not being able to make the dollars stretch. Hundreds of drives die in a month and the resiliency of the system allows them all to hot-swap in replacements. <br />
<br />
There are multiple warehouses for storing the original books that are scanned, as well as materials like CD-ROMs and even hard drives. There are collections of tapes and CD-ROMs from previous iterations of storage, although they are thought of as drop-dead options instead of long-term archival storage - the preference is, first and foremost, the spinning hard drives.<br />
<br />
The Archive does not generally use tape technology, having run into the classic "whoops, no tape drive on earth reads these any more" and "whoops, this tape no longer works properly".<br />
<br />
The Archive has indicated that if Archive Team uses a physical storage method, such as tapes, paper, hard drives or anything else, that they are willing to store these materials "as long as they are exceedingly labelled".<br />
<br />
== Physical Options ==<br />
{| class="wikitable sortable"<br />
! Storage type<br />
! Cost ($/TB/year)<br />
! Storage density (m³/TB)<br />
! Theoretical lifespan<br />
! Practical, tested lifespan<br />
! Notes<br />
|-<br />
| Hard drives (simple distributed pool)<br />
| $150 (full cost of best reasonable 1TB+ external HD)<br />
| <br />
| <br />
| <br />
| September 2014, best reasonable 1TB+ external HD is [http://thewirecutter.com/reviews/the-best-external-desktop-hard-drive/ a 4TB WD]. 25+ pool members would need one HD each plus a computer plus software to distribute data across the entire pool.<br />
|-<br />
| Hard drives (dedicated distributed pool)<br />
| <br />
| <br />
| <br />
| <br />
| An off-the-shelf or otherwise specified, dedicated, network storage device used exclusively as part of a distributed pool.<br />
|-<br />
| Hard drives (SPOF) <ref>The [[Internet Archive]]'s cost per TB, with 24/7 online hard drives, is approximately $2000 for forever.</ref><br />
| $62 (but you have to buy 180TB)<br />
| <br />
| <br />
| <br />
| For a single location to provide all storage needs, building a [https://www.backblaze.com/blog/backblaze-storage-pod-4/ Backblaze Storage Pod 4.0] runs an average of $11,000, providing 180TB of [http://bioteam.net/2011/08/why-you-should-never-build-a-backblaze-pod/ non-redundant, not-highly-available] storage. (You really want more than one pod mirroring your data, but this is the most effective way to get that much storage in one place.)<br />
|-<br />
| Commercial / archival-grade tapes<br />
| <br />
| <br />
| <br />
| <br />
| <br />
|-<br />
| Consumer tape systems (VHS, Betamax, cassette tapes, ...)<br />
| <br />
| <br />
| <br />
| <br />
| <br />
|-<br />
| Vinyl<br />
| <br />
| <br />
| <br />
| <br />
| <br />
|-<br />
| [http://www.ollydbg.de/Paperbak/index.html PaperBack]<br />
| <br />
| <br />
| <br />
| <br />
| 500KB per letter sheet means 1TB is 2,199,024 sheets, or ~4400 reams (500 sheets each), or an 8'x16' room filled with 6' tall stacks. It would take 63.6 days of continuous printing to do this.<ref>A HP LaserJet 5Si printing 24 pages per minute which generates the 500K bytes per page, yielding approximately 200,000 bytes per second.</ref><br />
|-<br />
| [http://ronja.twibright.com/optar/ Optar]<br />
| <br />
| <br />
| <br />
| <br />
| At 200KB per page, this has less than half the storage density of Paperback.<br />
|-<br />
| Blu-Ray<br />
| $40 (50 pack spindle of 25GB BD-Rs)<br />
| <br />
| 30 years<ref>On the basis of the described studies and assuming adequate consideration of the specified conditions for storage and handling, as well as verification of data after writing, we estimate the Imation CD, DVD or Blu-ray media to have a theoretical readability of up to 30 years. The primary caveat is how you handle and store the media. http://support.tdkperformance.com/app/answers/detail/a_id/1685/~/life-expectancy-of-optical-media </ref><br />
| <br />
| Lasts a LOT longer than CD/DVD, but should not be assumed to last more than a decade. [http://arstechnica.com/information-technology/2014/01/why-facebook-thinks-blu-ray-discs-are-perfect-for-the-data-center/ Raidz3 with Blu-rays Doing a backup in groups of 15 disks]. Comes to under $.04/GB which is cheap, and low initial investment (drives) too!<br><br />
<br>Specifically, a 50pack spindle of 25GB BD-Rs could readily hold 1TB of data for $30-50 per spindle. 50GB and 100GB discs are more expensive per GB.<br />
|-<br />
| [http://en.wikipedia.org/wiki/M-DISC M-DISC]<br />
| <br />
| <br />
| <br />
| <br />
| Unproven technology, but potentially interesting.<br />
|-<br />
| Flash media<br />
| <br />
| <br />
| <br />
| <br />
| Very durable for online use, and usually fails from lots of writes. A drive might never wear out from cold-storage usage. Newer drives can have 10-year warranties. But capacitors may leak charge over time. JEDEC JESD218A only specifies 101 weeks (almost two years) retention without power, so we'd have to check the spec of the specific drives, or power them up and re-write the data to refresh it about once a year. Soliciting donations for old flash media from people, or sponsorship from flash companies?<br />
|-<br />
| Glass/metal etching<br />
| <br />
| <br />
| <br />
| <br />
| <br />
|-<br />
| Amazon Glacier<br />
| $122.88 (storage only, retrieval billed separately)<br />
| <br />
| average annual durability of 99.999999999% <ref>"Amazon Glacier is designed to provide average annual durability of 99.999999999% for an archive. The service redundantly stores data in multiple facilities and on multiple devices within each facility. To increase durability, Amazon Glacier synchronously stores your data across multiple facilities before returning SUCCESS on uploading archives. Glacier performs regular, systematic data integrity checks and is built to be automatically self-healing." Maciej Ceglowski thinks that's [https://blog.pinboard.in/2014/04/cloudy_snake_oil/ kinda bullshit compared to the failure events you don't plan for], of course.</ref><br />
| <br />
| Retrieval is billed separately. 5% or less per month into S3 is free (5% of 100TB is 5TB), and data can be copied out from S3 to a SATA HD for $2.50/hr. plus media handling and shipping fees. Downloading 5TB from S3 would cost $614.40 (~$122.88/TB), but only $44.82 to transfer to HD via USB 3 or SATA (USB 2 is slower).<br />
|-<br />
| Dropbox for Business<br />
| $160* ($795/year)<br />
| <br />
| <br />
| <br />
| Dropbox for Business provides a shared pool of 1TB per user, at $795/year (five user minimum, 5TB), and $125 each additional user/year.<br />
|-<br />
| Box.com for Business<br />
| $180* ("unlimited" storage for $900/year)<br />
| <br />
| <br />
| <br />
| Box.com for Business provides "unlimited" storage at $15/user/month, five user minimum, or $900/year.<br />
|-<br />
| Dedicated colocated storage servers<br />
| $100* (e.g. $1300 for one year of 12TB rackmount server rental)<br />
|<br />
|<br />
|<br />
| Rent [http://www.ovh.com/us/dedicated-servers/storage/ storage servers from managed hosting colocation providers], and pool data across them. Benefits include bandwidth and electricity being included in the cost, and files could be made available online immediately. Negatives include needing to administer tens of servers.<br />
|}<br />
<br />
== Software Options ==<br />
<br />
Some of the physical options require supporting software.<br />
<br />
Removable media requires a centralized index of who has what discs, where they are, how they are labeled, and what the process for retrieval/distribution is. It could just be a wiki page, but it does require something.<br />
<br />
A simple pool of HDs ("simple pool"), one without a shared filesystem, just people offering up HDs, requires software running on Windows, Linux and/or Mac hardware to allow Archive Team workers to learn who has free disk space, and to save content to those disks. This could be just an IRC conversation and SFTP, but the more centralized and automated, the more likely available disk space will be able to be utilized. Software that is not cross-platform cannot be used here.<br />
<br />
A simple distributed and redundant pool of HDs ("distributed pool") requires software running on Windows, Linux and Mac hardware to manage a global filesystem or object store, and distribute uploads across the entire pool of available space, and make multiple copies on an ongoing basis to ensure preservation of data if a pool member goes offline. This has to be automated and relatively maintenance-free, and ideally low-impact on CPU and memory if it will be running on personal machines with multi-TB USB drives hanging off them. Software that is not cross-platform cannot be used here.<br />
<br />
A dedicated distributed and redundant pool of HDs ("dedicated pool") requires a selection of dedicated hardware and disks for maximum availability, and software to run on that hardware to manage a global filesystem or object store. It has to be automated and relatively maintenance-free, but would be the only thing running on its dedicated hardware, and as such does not have to be cross-platform.<br />
<br />
{| class="wikitable sortable"<br />
! Software name<br />
! Filesystem or Object Store?<br />
! Platform(s)<br />
! License<br />
! Good for which pool?<br />
! Pros<br />
! Cons<br />
! Notes<br />
|-<br />
| Tahoe-LAFS<br />
| Filesystem<br />
| Windows, Mac, Linux<br />
| GPL 2+<br />
| Distributed, dedicated<br />
| Uses what people already have, can spread expenses out, could be a solution done with only software<br />
| Barrier to leaving is non-existent, might cause data-loss even with auto-fixing infrastructure. Too slow to be a primary offloading site. <ref>"Practically the following results have been reported: 16Mbps in throughput for writing and about 8.8Mbps in reading" -- from https://tahoe-lafs.org/trac/tahoe-lafs/wiki/FAQ, making it non-competitive with the 1-2 gigabit speeds needed when archiving twitch.tv.</ref><br />
| Accounting is experimental, meaning "in practice is that anybody running a storage node can also automatically shove shit onto it, with no way to track down who uploaded how much or where or what it is" -joepie91 on IRC<br />
|-<br />
| Ceph<br />
| Object store, Filesystem<br />
| Linux<br />
| LGPL<br />
| Dedicated<br />
|<br />
|<br />
|<br />
|-<br />
| GlusterFS<br />
| Filesystem<br />
| Linux, BSD, OpenSolaris<br />
| GPL 3<br />
| Dedicated<br />
|<br />
|<br />
|<br />
|-<br />
| Gfarm<br />
| Filesystem<br />
| Mac, Linux, BSD, Solaris<br />
| X11<br />
| Dedicated<br />
|<br />
|<br />
|<br />
|-<br />
| Quantcast<br />
| Filesystem<br />
| Linux<br />
| Apache<br />
| Dedicated<br />
|<br />
| Like HDFS, intended for MapReduce processing, which writes large files, and doesn't delete them. Random access and erasing or moving data around may not be performant.<br />
| <br />
|-<br />
| GlusterFS<br />
| Filesystem<br />
| Mac, Linux, BSD, Solaris<br />
| GPL 3<br />
| Dedicated<br />
|<br />
|<br />
|<br />
|-<br />
| HDFS<br />
| Filesystem<br />
| Java<br />
| Apache<br />
| Distributed, dedicated<br />
|<br />
| Like Quantcast, intended for MapReduce processing, which writes large files, and doesn't delete them. Random access and erasing or moving data around may not be performant.<br />
| <br />
|-<br />
| XtreemFS<br />
| Filesystem<br />
| Linux, Solaris<br />
| BSD<br />
| Dedicated<br />
|<br />
|<br />
|<br />
|-<br />
| MogileFS<br />
| Object store<br />
| Linux<br />
| GPL<br />
| Dedicated<br />
| Understands distributing files across multiple networks, not just multiple disks<br />
|<br />
| As an object store, you can't just mount it as a disk and dump files onto it, you have to push them into it through its API, and retrieve them the same way.<br />
|-<br />
| Riak CS<br />
| Object store<br />
| Mac, Linux, BSD<br />
| Apache<br />
| Dedicated<br />
| S3 API compatible<br />
| Multi-datacenter replication (which might be what you consider having multiple disparate users on different networks) is only available in the commercial offering.<br />
| A former Basho employee suggests this might not be a good fit due to the high latency and unstable connections we'd be dealing with. Datacenter-to-datacenter sync is an "entirely different implementation" than local replication, and would require the enterprise offering.<br />
|-<br />
| MongoDB GridFS<br />
| Object store<br />
| Windows, Mac, Linux<br />
| AGPL<br />
| Distributed, dedicated<br />
|<br />
|<br />
|<br />
|-<br />
| LeoFS<br />
| Object store<br />
| Mac, Linux<br />
| Apache<br />
| Dedicated<br />
| S3-compatible interface, beta NFS interface, supports multi-datacenter replication, designed with GUI administration in mind<br />
|<br />
|<br />
|-<br />
| BitTorrent Sync<br />
| Synchronization<br />
| Windows, Mac, Linux, BSD, NAS<br />
| Proprietary<br />
| Simple<br />
| Commercially supported software<br />
| As straight synchronization software, it mirrors folders across devices. Individual users would have to make synched folders available to get copies of archives, and then they would be mirrored, and that's it.<br />
| Synchronization software in general is not the right solution for this problem.<br />
|}<br />
<br />
== Non-options ==<br />
* Ink-based Consumer Optical Media (CDs, DVD, etc.) <br />
** Differences between Blu-Ray and DVD? DVDs do not last very long. The fact is, the history of optical writable media has been one of chicanery, failure, and overpromising while under-delivering. Some DVDs failed within a year. There are claims Blu-Ray is different, but fool me 3,504 times, shame on me.<br />
* BitTorrent Sync<br />
** Proprietary (currently), so not a good idea to use as an archival format/platform<br />
* Amazon S3 / Google Cloud Storage / Microsoft Azure Storage<br />
** Amazon S3 might be a viable waypoint for intra-month storage ($30.68/TB), but retrieval over the internet, as with Glacier, is expensive, $8499.08 for 100TB. Google's and Microsoft's offerings are all in the same price range.<br />
* Floppies<br />
** ''"Because 1.4 trillion floppies exists less than 700 billion floppies. HYPOTHETICALLY, if you set twenty stacks side by side, figure a quarter centimeter per floppy thickness, excluded the size of the drive needed to read the floppies you would still need a structure 175,000 ft. high to house them. Let's also assume that the failure rate for floppies is about 5% (everyone knows that varies by brand, usage, time of manufacture, materials used, etc, but lets say 5% per year). 70 million of those 1.4 trillion floppies are unusuable. Figuring 1.4 MB per floppy disk, you are losing approximately 100MB of porn each year. Assuming it takes 5 seconds to replace a bad floppy, you would have to spend 97,222 hrs/yr to replace them. Considering there are only 8,760 hrs per year, you would require a staff of 12 people replacing floppies around the clock or 24 people on 12 hr shifts. Figuring $7/hr you would spend $367,920 on labor alone. Figuring a nickel per bad floppy, you would need $3,500,000 annually in floppy disks, bringing your 1TB floppy raid operating costs (excluding electricity, etc) to $3,867, 920 and a whole landfill of corrupted porn. Thank you for destroying the planet and bankrupting a small country with your floppy based porn RAID."'' ([http://gizmodo.com/5431497/why-its-better-to-pretend-you-dont-know-anything-about-computers?comment=17793028#comments source])<br />
<br />
== From IRC ==<br />
<br />
<Drevkevac> we are looking to store 100TB+ of media offline for 25+ years<br />
<Drevkevac> if anyone wants to drop in, I will pastebin the chat log<br />
<rat> DVDR and BR-R are not high volume. When you have massive amounts of data, raid arrays have too many points of failure.<br />
<rat> Drevkevac: I work in a tv studio. We have 30+ years worth of tapes. And all of them are still good.<br />
<rat> find a hard drive from 30 years ago and see how well it hooks up ;)<br />
<brousch_> 1500 Taiyo Yuden Gold CD-Rs http://www.mediasupply.com/taiyo-yuden-gold-cd-rs.html<br />
<br />
<Drevkevac> still, if its true, you could do, perhaps, raidz3s in groups of 15 disks or so?<br />
<SketchCow> Please add paperbak to the wiki page.<br />
<SketchCow> Fuck Optical Media. not an option;.<br />
<Drevkevac> that would give you ~300GB per disk group, with 3 disks<br />
<br />
== Where are you going to put it? ==<br />
<br />
Okay, so you have the tech. Now you need a place for it to live.<br />
<br />
Possibilities:<br />
<br />
* The Internet Archive Physical Warehouse, Richmond, CA<br />
** The Internet Archive has several physical storage facilities, including warehouses in Richmond, CA (home of the Physical Archive) and the main location in San Francisco, CA. They have indicated they are willing to take copies of Archive Team-sponsored physical materials with the intent of them being ingested into the Archive at large over time, as costs lower and 100tb collections are not as big a drain (or a rash of funding arrives elsewhere).<br />
<br />
* Living Computer Museum, Seattle, WA<br />
** In discussions with Jason Scott, the Living Computer Museum has indicated they will have physical storage available for computer historical materials. Depending on the items being saved by Archive Team, they may be willing to host/hold copies for the foreseeable future.<br />
<br />
* Library of Congress, Washington, DC<br />
** The Library of Congress may be willing to take a donation of physical storage, although it is not indicated what they may do long-term with it.<br />
<br />
Multiple copies would of course be great.<br />
<br />
== No, seriously, how are you going to actually DO it ==<br />
<br />
There are only a few practical hardware+software+process combinations. In order of cost to each volunteer:<br />
<br />
* A pool of volunteers with Blu-ray burners commit to ("the Blu-ray option"): <br />
** buying a 50-disc spindle of 25GB discs per TB per project,<br />
** burning them,<br />
** verifying them,<br />
** storing them somewhere climate-controlled (a shelf in a house with AC and heat is fine, an attic/garage/flooded basement is not),<br />
** verifying them regularly (monthly? quarterly?) and replacing discs if necessary, and<br />
** shipping them somewhere else upon request, with no expectation of return (permanent storage, consolidation, etc.).<br />
<br />
This probably requires a minimum of three volunteers per TB per project. Probably best to pre-split the data into < 25GB chunks so each disc can be labeled the same and expected to have the same data on it. Fifty 25GB discs is a little more than a TB, and it's expected you'll lose a few to bad burns each time, but it might be worth buying more than a spindle and generating parity files onto additional discs.<br />
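<br />
A minimal sketch of the pre-split and parity step, assuming GNU coreutils and par2cmdline are installed; the archive filename is hypothetical:<br />
<pre><br />
# Split into 23GB chunks so each one fits a 25GB BD-R with headroom.<br />
split --bytes=23G --numeric-suffixes project.tar project-part-<br />
<br />
# Record checksums so every volunteer can verify their burns the same way.<br />
sha256sum project-part-* > project.sha256<br />
<br />
# Generate 10% parity data so a few bad discs can be reconstructed.<br />
par2 create -r10 project.par2 project-part-*<br />
</pre><br />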
<br />
* A pool of volunteers commit to ("the simple pool"):<br />
** buying a best reasonable external HD,<br />
** downloading archives to it,<br />
** keeping it spun up, or spinning it up regularly (monthly? quarterly?) and running filesystem and content checks on it,<br />
** storing it somewhere climate-controlled (a shelf in a house with AC and heat is fine, an attic/garage/flooded basement is not),<br />
** buying additional HDs once it's full or if there are drive errors, and<br />
** shipping it somewhere else upon request, with no expectation of return (permanent storage, consolidation, etc.).<br />
<br />
Same as with Blu-rays, and not really any more expensive ($150 buys either four 1TB spindles of Blu-rays at $37.50 each, or one 4TB HD), except look at all that disc-swapping time and effort you don't have to do. You don't have to split data into chunks, but you do want to download it in a resumable fashion and verify it afterwards, so: checksums, parity files, something. You also risk losing a lot more if a drive fails, and the cost per volunteer is higher (replacing a whole drive versus replacing individual discs or spindles). As such, you still probably want a minimum of three volunteers per TB per project (so a 2TB project needs six volunteers with 1TB each, not three volunteers holding all 2TB each).<br />
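<br />
A minimal sketch of the resumable-download-and-verify routine for a simple pool member; the URL and filenames are hypothetical:<br />
<pre><br />
# -c resumes a partial download after interruptions.<br />
wget -c http://example.org/project/project.tar<br />
wget -c http://example.org/project/project.sha256<br />
<br />
# Verify the archive against the published checksums.<br />
sha256sum -c project.sha256<br />
</pre><br />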
<br />
* A pool of volunteers commit to ("the distributed pool"):<br />
** all buying the same, standard, inexpensive, hackable, RAID 1, NAS,<br />
*** WD My Cloud Mirror (starts at $300 for 2TB [called "4TB," only 2TB with mirroring])<br />
*** QNAP (2-bay starts at $140 without HDs)<br />
*** Synology (2-bay starts at $200 without HDs)<br />
*** Pogoplug Series 4 + two best reasonable external HD + software RAID 1, or a download script that manually mirrors files ($20 without HDs)<br />
** keeping it spun up, online, and possibly accessible by external AT admins,<br />
** storing it somewhere climate-controlled (a shelf in a house with AC and heat is fine, an attic/garage/flooded basement is not),<br />
** buying entire additional units once they are full or if there are drive errors, and<br />
** shipping the drives (or the entire My Cloud Mirror unit, if that's the one selected) somewhere else upon request, with no expectation of return (permanent storage, consolidation, etc.).<br />
<br />
These units provide dramatically improved reliability for content, enough that perhaps you only need two volunteers per project, and no need to split by TB, since each volunteer would have two copies. Having everyone buy the same hardware means reduced administration time overall, especially if custom scripts are involved. QNAP and Synology both have official SDKs, and all of them run some flavor of Linux, with Synology supporting SSH logins out of the box. The Pogoplug is the most underpowered of the options, but even it should be powerful enough to run a MogileFS storage node, or a script that downloads to one HD and copies to the other. (Checksums would be really slow, though.) This is moderately expensive per-volunteer, with an upfront cost of $320-$500.<br />
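<br />
For the Pogoplug variant, a sketch of the "download to one HD and copy to the other" script mentioned above; the mount points and URL are hypothetical:<br />
<pre><br />
#!/bin/sh<br />
# Download to the first disk, resuming partial transfers.<br />
wget -c -P /mnt/disk1/archives http://example.org/project/project.tar<br />
<br />
# Mirror the first disk onto the second.<br />
rsync -a /mnt/disk1/archives/ /mnt/disk2/archives/<br />
<br />
# Compare checksums of both copies (slow on this hardware, as noted above).<br />
( cd /mnt/disk1/archives && sha256sum project.tar )<br />
( cd /mnt/disk2/archives && sha256sum project.tar )<br />
</pre><br />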
<br />
* A pool of volunteers commit to ("the dedicated pool"):<br />
** all buying the same, standard, expensive NAS,<br />
*** iXsystems FreeNAS Mini (starts at $1000 without HDs),<br />
*** A DIY FreeNAS box ($300+ without HDs),<br />
*** A DIY NexentaStor box (probably the same as the DIY FreeNAS box)<br />
** keeping it spun up, online, and possibly accessible by external AT admins,<br />
** storing it somewhere climate-controlled and well-ventilated (a shelf with no airflow is not fine),<br />
** replacing drives if there are drive errors,<br />
** migrating the pool to larger disks once it starts getting full, and<br />
** shipping the drives somewhere else upon request, with no expectation of return (permanent storage, consolidation, etc.).<br />
<br />
A set of volunteers with (comparatively) expensive network-attached storage gives you a lot of storage in a lot of locations, potentially tens of redundant TB in each one, depending on the size of the chassis. You want everyone running the same NAS software, but the hardware can vary somewhat; however, the hardware should all have ECC RAM, and the more the better. MogileFS storage nodes are known to run on NexentaStor, and FreeNAS supports plugins, so it could be adapted to run there, or you could figure out e.g. LeoFS (which also expects ZFS). This is the most expensive option per-volunteer, upfront costs starting at around $1300 for a DIY box with four 4TB WD Red drives.<br />
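<br />
For the DIY boxes, a sketch of creating and maintaining a redundant ZFS pool; the device names are assumptions and will differ per machine:<br />
<pre><br />
# Four 4TB drives in raidz2: any two can fail without data loss,<br />
# at the cost of half the raw capacity.<br />
zpool create tank raidz2 ada0 ada1 ada2 ada3<br />
<br />
# Scrub regularly (e.g. from cron) to catch and repair silent corruption.<br />
zpool scrub tank<br />
zpool status tank<br />
</pre><br />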
<br />
* A pool of volunteers set up a recurring payment to fund ("the server option"):<br />
** one or more rented, managed, storage servers; or<br />
** saving up to buy one or more storage servers, and then hosting it somewhere.<br />
<br />
A rented server has no hardware maintenance costs; replacing a failed HD is the responsibility of the hosting provider, both in materials and in labor. This is not the case with a purchased server: someone would have to buy a replacement drive and either bring it to the colocation center and swap it themselves, or ship it there and pay the facility for the labor involved in replacing it.<br />
<br />
== What Can You Contribute? == <br />
<br />
{| class="wikitable sortable"<br />
! Name<br />
! What You Can Contribute<br />
! For How Long?<br />
! Exit Strategy<br />
|-<br />
| ExampleArchiver<br />
| Describe what you are willing to buy/build/write/do. Talk about the connection you would use, the storage conditions, etc. How much money can you put into it? <br />
| For how long can you truly commit to this?<br />
| If you need to quit or wind down your contribution, what are you willing to do? Can you guarantee a period of notice? Are you willing to ship your hardware or media to another volunteer anywhere in the world? <br />
|-<br />
|<br />
|<br />
|<br />
|<br />
|-<br />
|}<br />
<br />
== Project-specific suggestions ==<br />
<br />
=== Twitch.tv (and other video services) ===<br />
<br />
* Keep the original video files in (semi-)offline storage, and store transcoded (compressed) versions on the Internet Archive.<br />
<br />
== See Also ==<br />
*[[Storage Media]]<br />
<br />
== References ==<br />
<references/><br />
<br />
{{Navigation box}}</div>Dnovahttps://wiki.archiveteam.org/index.php?title=Valhalla&diff=20252Valhalla2014-09-29T04:22:43Z<p>Dnova: /* No, seriously, how are you going to actually DO it */</p>
<hr />
<div>[[Image:Ms internet on a disc.jpg|300px|right]]<br />
This wiki page is a collection of ideas for Project '''Valhalla'''.<br />
<br />
This project/discussion has come around because there is a class of data currently existing, several times a year, as a massive amount of data with "large, but nominal" status within the Internet Archive. The largest example is currently MobileMe, which is hundreds of terabytes in the Internet Archive system (and in need of WARC conversion), which represents a cost amount far outstripping its use. Another is TwitPic, which is currently available (and might continue to be available) but which has shown itself to be a bad actor with regards to longevity and predictability for its sunset. <br />
<br />
Therefore, there is an argument that there could be a "third place" that data collected by Archive Team could sit, until the Internet Archive (or another entity) grows its coffers/storage enough that 80-100tb is "no big deal", just like 1tb of data was annoying in 2009 and now is totally understandable for the value, i.e. Geocities. <br />
<br />
This is for short-term (or potentially also long-term) storage options, say five years or less, of data generated by Archive Team.<br />
<br />
* What options are out there, generally?<br />
* What are the costs, roughly?<br />
* What are the positives and negatives?<br />
<br />
There has been a lot of study in this area over the years, of course, so links to known authorities and debates will be welcome as well.<br />
<br />
Join the discussion in [irc://irc.efnet.org/huntinggrounds #huntinggrounds].<br />
<br />
== Goals ==<br />
<br />
We want to:<br />
<br />
* Dump an unlimited<ref>Unlimited doesn't mean infinite, but it does mean that we shouldn't worry about running out of space. We won't be the only expanding data store.</ref> amount of data into something.<br />
* Recover that data at any point.<br />
<br />
We do not care about:<br />
<br />
* Immediate or continuous availability.<br />
<br />
We absolutely require:<br />
<br />
* Low (ideally, zero) human time for maintenance. If we have substantial human maintenance needs, we're probably going to need a Committee of Elders or something.<br />
* Data integrity. The storage medium must be impossibly durable or make it inexpensive/easy to copy and verify the data onto a fresh medium.<br />
<br />
It would be nice to have:<br />
<br />
* No special environmental requirements that could not be handled by a third party. (So nobody in Archive Team would have to set up some sort of climate-controlled data-cave; however, if this is already something that e.g. IA does and they are willing to lease space, that's cool.)<br />
<br />
== What does the Internet Archive do for this Situation, Anyway? ==<br />
<br />
''This section has not been cleared by the Internet Archive, and so should be considered a rough sketch.''<br />
<br />
The Internet Archive primarily wants "access" to the data it stores, so the primary storage methodology is spinning hard drives connected to a high-speed connection from multiple locations. These hard drives are between 4-6tb (as of 2014) and are of general grade, as is most of the hardware - the theory is that replacing cheap hardware is better than spending a lot of money on super-grade hardware (whatever that may be) and not being able to make the dollars stretch. Hundreds of drives die in a month and the resiliency of the system allows them all to hot-swap in replacements. <br />
<br />
There are multiple warehouses for storing the original books that are scanned, as well as materials like CD-ROMs and even hard drives. There are collections of tapes and CD-ROMs from previous iterations of storage, although they are thought of as drop-dead options instead of long-term archival storage - the preference is, first and foremost, the spinning hard drives.<br />
<br />
The Archive does not generally use tape technology, having run into the classic "whoops, no tape drive on earth reads these any more" and "whoops, this tape no longer works properly".<br />
<br />
The Archive has indicated that if Archive Team uses a physical storage method, such as tapes, paper, hard drives or anything else, that they are willing to store these materials "as long as they are exceedingly labelled".<br />
<br />
== Physical Options ==<br />
{| class="wikitable sortable"<br />
! Storage type<br />
! Cost ($/TB/year)<br />
! Storage density (m³/TB)<br />
! Theoretical lifespan<br />
! Practical, tested lifespan<br />
! Notes<br />
|-<br />
| Hard drives (simple distributed pool)<br />
| $150 (full cost of best reasonable 1TB+ external HD)<br />
| <br />
| <br />
| <br />
| September 2014, best reasonable 1TB+ external HD is [http://thewirecutter.com/reviews/the-best-external-desktop-hard-drive/ a 4TB WD]. 25+ pool members would need one HD each plus a computer plus software to distribute data across the entire pool.<br />
|-<br />
| Hard drives (dedicated distributed pool)<br />
| <br />
| <br />
| <br />
| <br />
| An off-the-shelf or otherwise specified, dedicated, network storage device used exclusively as part of a distributed pool.<br />
|-<br />
| Hard drives (SPOF) <ref>The [[Internet Archive]]'s cost per TB, with 24/7 online hard drives, is approximately $2000 for forever.</ref><br />
| $62 (but you have to buy 180TB)<br />
| <br />
| <br />
| <br />
| For a single location to provide all storage needs, building a [https://www.backblaze.com/blog/backblaze-storage-pod-4/ Backblaze Storage Pod 4.0] runs an average of $11,000, providing 180TB of [http://bioteam.net/2011/08/why-you-should-never-build-a-backblaze-pod/ non-redundant, not-highly-available] storage. (You really want more than one pod mirroring your data, but this is the most effective way to get that much storage in one place.)<br />
|-<br />
| Commercial / archival-grade tapes<br />
| <br />
| <br />
| <br />
| <br />
| <br />
|-<br />
| Consumer tape systems (VHS, Betamax, cassette tapes, ...)<br />
| <br />
| <br />
| <br />
| <br />
| <br />
|-<br />
| Vinyl<br />
| <br />
| <br />
| <br />
| <br />
| <br />
|-<br />
| [http://www.ollydbg.de/Paperbak/index.html PaperBack]<br />
| <br />
| <br />
| <br />
| <br />
| 500KB per letter sheet means 1TB is 2,199,024 sheets, or ~4400 reams (500 sheets each), or an 8'x16' room filled with 6' tall stacks. It would take 63.6 days of continuous printing to do this.<ref>An HP LaserJet 5Si printing 24 pages per minute at 500KB per page yields approximately 200,000 bytes per second.</ref><br />
|-<br />
| [http://ronja.twibright.com/optar/ Optar]<br />
| <br />
| <br />
| <br />
| <br />
| At 200KB per page, this has less than half the storage density of Paperback.<br />
|-<br />
| Blu-Ray<br />
| $40 (50 pack spindle of 25GB BD-Rs)<br />
| <br />
| 30 years<ref>On the basis of the described studies and assuming adequate consideration of the specified conditions for storage and handling, as well as verification of data after writing, we estimate the Imation CD, DVD or Blu-ray media to have a theoretical readability of up to 30 years. The primary caveat is how you handle and store the media. http://support.tdkperformance.com/app/answers/detail/a_id/1685/~/life-expectancy-of-optical-media </ref><br />
| <br />
| Lasts a LOT longer than CD/DVD, but should not be assumed to last more than a decade. [http://arstechnica.com/information-technology/2014/01/why-facebook-thinks-blu-ray-discs-are-perfect-for-the-data-center/ Facebook uses Blu-ray in the data center], backing up raidz3-style in groups of 15 discs. Comes to under $0.04/GB, which is cheap, and the initial investment (drives) is low too!<br><br />
<br>Specifically, a 50-pack spindle of 25GB BD-Rs could readily hold 1TB of data for $30-50 per spindle. 50GB and 100GB discs are more expensive per GB.<br />
|-<br />
| [http://en.wikipedia.org/wiki/M-DISC M-DISC]<br />
| <br />
| <br />
| <br />
| <br />
| Unproven technology, but potentially interesting.<br />
|-<br />
| Flash media<br />
| <br />
| <br />
| <br />
| <br />
| Very durable for online use, and usually fails from lots of writes. A drive might never wear out from cold-storage usage. Newer drives can have 10-year warranties. But capacitors may leak charge over time. JEDEC JESD218A only specifies 101 weeks (almost two years) retention without power, so we'd have to check the spec of the specific drives, or power them up and re-write the data to refresh it about once a year. Soliciting donations for old flash media from people, or sponsorship from flash companies?<br />
|-<br />
| Glass/metal etching<br />
| <br />
| <br />
| <br />
| <br />
| <br />
|-<br />
| Amazon Glacier<br />
| $122.88 (storage only, retrieval billed separately)<br />
| <br />
| average annual durability of 99.999999999% <ref>"Amazon Glacier is designed to provide average annual durability of 99.999999999% for an archive. The service redundantly stores data in multiple facilities and on multiple devices within each facility. To increase durability, Amazon Glacier synchronously stores your data across multiple facilities before returning SUCCESS on uploading archives. Glacier performs regular, systematic data integrity checks and is built to be automatically self-healing." Maciej Ceglowski thinks that's [https://blog.pinboard.in/2014/04/cloudy_snake_oil/ kinda bullshit compared to the failure events you don't plan for], of course.</ref><br />
| <br />
| Retrieval is billed separately. 5% or less per month into S3 is free (5% of 100TB is 5TB), and data can be copied out from S3 to a SATA HD for $2.50/hr. plus media handling and shipping fees. Downloading 5TB from S3 would cost $614.40 (~$122.88/TB), but only $44.82 to transfer to HD via USB 3 or SATA (USB 2 is slower).<br />
|-<br />
| Dropbox for Business<br />
| $160* ($795/year)<br />
| <br />
| <br />
| <br />
| Dropbox for Business provides a shared pool of 1TB per user, at $795/year (five user minimum, 5TB), and $125 each additional user/year.<br />
|-<br />
| Box.com for Business<br />
| $180* ("unlimited" storage for $900/year)<br />
| <br />
| <br />
| <br />
| Box.com for Business provides "unlimited" storage at $15/user/month, five user minimum, or $900/year.<br />
|-<br />
| Dedicated colocated storage servers<br />
| $100* (e.g. $1300 for one year of 12TB rackmount server rental)<br />
|<br />
|<br />
|<br />
| Rent [http://www.ovh.com/us/dedicated-servers/storage/ storage servers from managed hosting colocation providers], and pool data across them. Benefits include bandwidth and electricity being included in the cost, and files could be made available online immediately. Negatives include needing to administer tens of servers.<br />
|}<br />
<br />
== Software Options ==<br />
<br />
Some of the physical options require supporting software.<br />
<br />
Removable media requires a centralized index of who has what discs, where they are, how they are labeled, and what the process for retrieval/distribution is. It could just be a wiki page, but it does require something.<br />
<br />
A simple pool of HDs ("simple pool"), one without a shared filesystem, just people offering up HDs, requires software running on Windows, Linux and/or Mac hardware to allow Archive Team workers to learn who has free disk space, and to save content to those disks. This could be just an IRC conversation and SFTP, but the more centralized and automated, the more likely available disk space will be able to be utilized. Software that is not cross-platform cannot be used here.<br />
<br />
A simple distributed and redundant pool of HDs ("distributed pool") requires software running on Windows, Linux and Mac hardware to manage a global filesystem or object store, and distribute uploads across the entire pool of available space, and make multiple copies on an ongoing basis to ensure preservation of data if a pool member goes offline. This has to be automated and relatively maintenance-free, and ideally low-impact on CPU and memory if it will be running on personal machines with multi-TB USB drives hanging off them. Software that is not cross-platform cannot be used here.<br />
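<br />
As a concrete example of the kind of software the distributed pool needs, a rough sketch of the Tahoe-LAFS command-line workflow; the introducer address and capability string are placeholders, and exact flags should be checked against current Tahoe-LAFS docs:<br />
<pre><br />
# Each pool member creates a client pointed at the shared introducer (placeholder address).<br />
tahoe create-client --introducer=pb://example-introducer-furl<br />
tahoe start<br />
<br />
# Uploading returns a capability string; whoever holds it can retrieve the file.<br />
tahoe put project.warc.gz<br />
tahoe get URI:CHK:... project.warc.gz    # capability string elided<br />
</pre><br />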
<br />
A dedicated distributed and redundant pool of HDs ("dedicated pool") requires a selection of dedicated hardware and disks for maximum availability, and software to run on that hardware to manage a global filesystem or object store. It has to be automated and relatively maintenance-free, but would be the only thing running on its dedicated hardware, and as such does not have to be cross-platform.<br />
<br />
{| class="wikitable sortable"<br />
! Software name<br />
! Filesystem or Object Store?<br />
! Platform(s)<br />
! License<br />
! Good for which pool?<br />
! Pros<br />
! Cons<br />
! Notes<br />
|-<br />
| Tahoe-LAFS<br />
| Filesystem<br />
| Windows, Mac, Linux<br />
| GPL 2+<br />
| Distributed, dedicated<br />
| Uses what people already have, can spread expenses out, could be a solution done with only software<br />
| Barrier to leaving is non-existent, might cause data-loss even with auto-fixing infrastructure. Too slow to be a primary offloading site. <ref>"Practically the following results have been reported: 16Mbps in throughput for writing and about 8.8Mbps in reading" -- from https://tahoe-lafs.org/trac/tahoe-lafs/wiki/FAQ, making it non-competitive with the 1-2 gigabit speeds needed when archiving twitch.tv.</ref><br />
| Accounting is experimental, meaning "in practice is that anybody running a storage node can also automatically shove shit onto it, with no way to track down who uploaded how much or where or what it is" -joepie91 on IRC<br />
|-<br />
| Ceph<br />
| Object store, Filesystem<br />
| Linux<br />
| LGPL<br />
| Dedicated<br />
|<br />
|<br />
|<br />
|-<br />
| GlusterFS<br />
| Filesystem<br />
| Linux, BSD, OpenSolaris<br />
| GPL 3<br />
| Dedicated<br />
|<br />
|<br />
|<br />
|-<br />
| Gfarm<br />
| Filesystem<br />
| Mac, Linux, BSD, Solaris<br />
| X11<br />
| Dedicated<br />
|<br />
|<br />
|<br />
|-<br />
| Quantcast<br />
| Filesystem<br />
| Linux<br />
| Apache<br />
| Dedicated<br />
|<br />
| Like HDFS, intended for MapReduce processing, which writes large files, and doesn't delete them. Random access and erasing or moving data around may not be performant.<br />
| <br />
|-<br />
| HDFS<br />
| Filesystem<br />
| Java<br />
| Apache<br />
| Distributed, dedicated<br />
|<br />
| Like Quantcast, intended for MapReduce processing, which writes large files, and doesn't delete them. Random access and erasing or moving data around may not be performant.<br />
| <br />
|-<br />
| XtreemFS<br />
| Filesystem<br />
| Linux, Solaris<br />
| BSD<br />
| Dedicated<br />
|<br />
|<br />
|<br />
|-<br />
| MogileFS<br />
| Object store<br />
| Linux<br />
| GPL<br />
| Dedicated<br />
| Understands distributing files across multiple networks, not just multiple disks<br />
|<br />
| As an object store, you can't just mount it as a disk and dump files onto it, you have to push them into it through its API, and retrieve them the same way.<br />
|-<br />
| Riak CS<br />
| Object store<br />
| Mac, Linux, BSD<br />
| Apache<br />
| Dedicated<br />
| S3 API compatible<br />
| Multi-datacenter replication (which might be what you consider having multiple disparate users on different networks) is only available in the commercial offering.<br />
| A former Basho employee suggests this might not be a good fit due to the high latency and unstable connections we'd be dealing with. Datacenter-to-datacenter sync is an "entirely different implementation" than local replication, and would require the enterprise offering.<br />
|-<br />
| MongoDB GridFS<br />
| Object store<br />
| Windows, Mac, Linux<br />
| AGPL<br />
| Distributed, dedicated<br />
|<br />
|<br />
|<br />
|-<br />
| LeoFS<br />
| Object store<br />
| Mac, Linux<br />
| Apache<br />
| Dedicated<br />
| S3-compatible interface, beta NFS interface, supports multi-datacenter replication, designed with GUI administration in mind<br />
|<br />
|<br />
|-<br />
| BitTorrent Sync<br />
| Synchronization<br />
| Windows, Mac, Linux, BSD, NAS<br />
| Proprietary<br />
| Simple<br />
| Commercially supported software<br />
| As straight synchronization software, it mirrors folders across devices. Individual users would have to make synched folders available to get copies of archives, and then they would be mirrored, and that's it.<br />
| Synchronization software in general is not the right solution for this problem.<br />
|}<br />
<br />
== Non-options ==<br />
* Ink-based Consumer Optical Media (CDs, DVD, etc.) <br />
** Differences between Blu-Ray and DVD? DVDs do not last very long. The fact is, the history of optical writable media has been one of chicanery, failure, and overpromising while under-delivering. Some DVDs failed within a year. There are claims Blu-Ray is different, but fool me 3,504 times, shame on me.<br />
* BitTorrent Sync<br />
** Proprietary (currently), so not a good idea to use as an archival format/platform<br />
* Amazon S3 / Google Cloud Storage / Microsoft Azure Storage<br />
** Amazon S3 might be a viable waypoint for intra-month storage ($30.68/TB), but retrieval over the internet, as with Glacier, is expensive, $8499.08 for 100TB. Google's and Microsoft's offerings are all in the same price range.<br />
* Floppies<br />
** ''"Because 1.4 trillion floppies exists less than 700 billion floppies. HYPOTHETICALLY, if you set twenty stacks side by side, figure a quarter centimeter per floppy thickness, excluded the size of the drive needed to read the floppies you would still need a structure 175,000 ft. high to house them. Let's also assume that the failure rate for floppies is about 5% (everyone knows that varies by brand, usage, time of manufacture, materials used, etc, but lets say 5% per year). 70 million of those 1.4 trillion floppies are unusuable. Figuring 1.4 MB per floppy disk, you are losing approximately 100MB of porn each year. Assuming it takes 5 seconds to replace a bad floppy, you would have to spend 97,222 hrs/yr to replace them. Considering there are only 8,760 hrs per year, you would require a staff of 12 people replacing floppies around the clock or 24 people on 12 hr shifts. Figuring $7/hr you would spend $367,920 on labor alone. Figuring a nickel per bad floppy, you would need $3,500,000 annually in floppy disks, bringing your 1TB floppy raid operating costs (excluding electricity, etc) to $3,867, 920 and a whole landfill of corrupted porn. Thank you for destroying the planet and bankrupting a small country with your floppy based porn RAID."'' ([http://gizmodo.com/5431497/why-its-better-to-pretend-you-dont-know-anything-about-computers?comment=17793028#comments source])<br />
<br />
== From IRC ==<br />
<br />
<Drevkevac> we are looking to store 100TB+ of media offline for 25+ years<br />
<Drevkevac> if anyone wants to drop in, I will pastebin the chat log<br />
<rat> DVDR and BR-R are not high volume. When you have massive amounts of data, raid arrays have too many points of failure.<br />
<rat> Drevkevac: I work in a tv studio. We have 30+ years worth of tapes. And all of them are still good.<br />
<rat> find a hard drive from 30 years ago and see how well it hooks up ;)<br />
<brousch_> 1500 Taiyo Yuden Gold CD-Rs http://www.mediasupply.com/taiyo-yuden-gold-cd-rs.html<br />
<br />
<Drevkevac> still, if its true, you could do, perhaps, raidz3s in groups of 15 disks or so?<br />
<SketchCow> Please add paperbak to the wiki page.<br />
<SketchCow> Fuck Optical Media. not an option;.<br />
<Drevkevac> that would give you ~300GB per disk group, with 3 disks<br />
<br />
== Where are you going to put it? ==<br />
<br />
Okay, so you have the tech. Now you need a place for it to live.<br />
<br />
Possibilities:<br />
<br />
* The Internet Archive Physical Warehouse, Richmond, CA<br />
** The Internet Archive has several physical storage facilities, including warehouses in Richmond, CA (home of the Physical Archive) and the main location in San Francisco, CA. They have indicated they are willing to take copies of Archive Team-sponsored physical materials with the intent of them being ingested into the Archive at large over time, as costs lower and 100tb collections are not as big a drain (or a rash of funding arrives elsewhere).<br />
<br />
* Living Computer Museum, Seattle, WA<br />
** In discussions with Jason Scott, the Living Computer Museum has indicated they will have physical storage available for computer historical materials. Depending on the items being saved by Archive Team, they may be willing to host/hold copies for the foreseeable future.<br />
<br />
* Library of Congress, Washington, DC<br />
** The Library of Congress may be willing to take a donation of physical storage, although it is not indicated what they may do long-term with it.<br />
<br />
Multiple copies would of course be great.<br />
<br />
== No, seriously, how are you going to actually DO it ==<br />
<br />
There are only a few practical hardware+software+process combinations. In order of cost to each volunteer:<br />
<br />
* A pool of volunteers with Blu-ray burners commit to ("the Blu-ray option"): <br />
** buying a 50-disc spindle of 25GB discs per TB per project,<br />
** burning them,<br />
** verifying them,<br />
** storing them somewhere climate-controlled (a shelf in a house with AC and heat is fine, an attic/garage/flooded basement is not),<br />
** verifying them regularly (monthly? quarterly?) and replacing discs if necessary, and<br />
** shipping them somewhere else upon request, with no expectation of return (permanent storage, consolidation, etc.).<br />
<br />
This probably requires a minimum of three volunteers per TB per project. Probably best to pre-split the data into < 25GB chunks so each disc can be labeled the same and expected to have the same data on it. Fifty 25GB discs is a little more than a TB, and it's expected you'll lose a few to bad burns each time, but it might be worth buying more than a spindle and generating parity files onto additional discs.<br />
<br />
* A pool of volunteers commit to ("the simple pool"):<br />
** buying a best reasonable external HD,<br />
** downloading archives to it,<br />
** keeping it spun up, or spinning it up regularly (monthly? quarterly?) and running filesystem and content checks on it,<br />
** storing it somewhere climate-controlled (a shelf in a house with AC and heat is fine, an attic/garage/flooded basement is not),<br />
** buying additional HDs once it's full or if there are drive errors, and<br />
** shipping it somewhere else upon request, with no expectation of return (permanent storage, consolidation, etc.).<br />
<br />
Same as with Blu-rays, and not really any more expensive ($150 buys either four 1TB spindles of Blu-rays at $37.50 each, or one 4TB HD), except look at all that disc-swapping time and effort you don't have to do. You don't have to split data into chunks, but you do want to download it in a resumable fashion and verify it afterwards, so: checksums, parity files, something. You also risk losing a lot more if a drive fails, and the cost per volunteer is higher (replacing a whole drive versus replacing individual discs or spindles). As such, you still probably want a minimum of three volunteers per TB per project (so a 2TB project needs six volunteers with 1TB each, not three volunteers holding all 2TB each).<br />
<br />
* A pool of volunteers commit to ("the distributed pool"):<br />
** all buying the same, standard, inexpensive, hackable, RAID 1, NAS,<br />
*** WD My Cloud Mirror (starts at $300 for 2TB [called "4TB," only 2TB with mirroring])<br />
*** QNAP (2-bay starts at $140 without HDs)<br />
*** Synology (2-bay starts at $200 without HDs)<br />
*** Pogoplug Series 4 + two best reasonable external HD + software RAID 1, or a download script that manually mirrors files ($20 without HDs)<br />
** keeping it spun up, online, and possibly accessible by external AT admins,<br />
** storing it somewhere climate-controlled (a shelf in a house with AC and heat is fine, an attic/garage/flooded basement is not),<br />
** buying entire additional units once they are full or if there are drive errors, and<br />
** shipping the drives (or the entire My Cloud Mirror unit, if that's the one selected) somewhere else upon request, with no expectation of return (permanent storage, consolidation, etc.).<br />
<br />
These units provide dramatically improved reliability for content, enough that perhaps you only need two volunteers per project, and no need to split by TB, since each volunteer would have two copies. Having everyone buy the same hardware means reduced administration time overall, especially if custom scripts are involved. QNAP and Synology both have official SDKs, and all of them run some flavor of Linux, with Synology supporting SSH logins out of the box. The Pogoplug is the most underpowered of the options, but even it should be powerful enough to run a MogileFS storage node, or a script that downloads to one HD and copies to the other. (Checksums would be really slow, though.) This is moderately expensive per-volunteer, with an upfront cost of $320-$500.<br />
<br />
* A pool of volunteers commit to ("the dedicated pool"):<br />
** all buying the same, standard, expensive NAS,<br />
*** iXsystems FreeNAS Mini (starts at $1000 without HDs),<br />
*** A DIY FreeNAS box ($300+ without HDs),<br />
*** A DIY NexentaStor box (probably the same as the DIY FreeNAS box)<br />
** keeping it spun up, online, and possibly accessible by external AT admins,<br />
** storing it somewhere climate-controlled and well-ventilated (a shelf with no airflow is not fine),<br />
** replacing drives if there are drive errors,<br />
** migrating the pool to larger disks once it starts getting full, and<br />
** shipping the drives somewhere else upon request, with no expectation of return (permanent storage, consolidation, etc.).<br />
<br />
A set of volunteers with (comparatively) expensive network-attached storage gives you a lot of storage in a lot of locations, potentially tens of redundant TB in each one, depending on the size of the chassis. You want everyone running the same NAS software, but the hardware can vary somewhat; however, the hardware should all have ECC RAM, and the more the better. MogileFS storage nodes are known to run on NexentaStor, and FreeNAS supports plugins, so it could be adapted to run there, or you could figure out e.g. LeoFS (which also expects ZFS). This is the most expensive option per-volunteer, upfront costs starting at around $1300 for a DIY box with four 4TB WD Red drives.<br />
<br />
* A pool of volunteers set up a recurring payment to fund ("the server option"):<br />
** one or more rented, managed, storage servers; or<br />
** saving up to buy one or more storage servers, and then hosting it somewhere.<br />
<br />
A rented server has no hardware maintenance costs; replacing a failed HD is the responsibility of the hosting provider, both in materials and in labor. This is not the case with a purchased server: someone would have to buy a replacement drive and either bring it to the colocation center and swap it themselves, or ship it there and pay the facility for the labor involved in replacing it.<br />
<br />
== Project-specific suggestions ==<br />
<br />
=== Twitch.tv (and other video services) ===<br />
<br />
* Keep the original video files in (semi-)offline storage, and store transcoded (compressed) versions on the Internet Archive.<br />
<br />
== See Also ==<br />
*[[Storage Media]]<br />
<br />
== References ==<br />
<references/><br />
<br />
{{Navigation box}}</div>Dnovahttps://wiki.archiveteam.org/index.php?title=Puu.sh&diff=17549Puu.sh2013-09-20T20:21:09Z<p>Dnova: </p>
<hr />
<div>{{Infobox project<br />
| title = puu.sh<br />
| logo = puush_logo.png<br />
| image = Puush_homepage_screenshot.png<br />
| URL = http://puush.me/<br />
| project_status = '''Special Case'''<br />
| source = https://github.com/ArchiveTeam/puush-grab<br />
| archiving_status = {{saved}} ~5.8 TB of data<br />
Still {{inprogress}}<br />
| irc = pushharder<br />
| tracker = http://b07s57le.corenetworks.net:8031/puush/<br />
}}<br />
<br />
'''puu.sh''' is a file sharing service that was created in 2010.<br />
<br />
== Image expiry ==<br />
<br />
Early on June 7th, 2013, the following email was sent out to users:<br />
<br />
<blockquote><br />
Hey guys,<br />
<br />
We're making some important changes to puush and want to inform you of how it will affect our service.<br />
<br />
When we first conceived puush in 2010, we wanted to create a straightforward way to help us quickly share what was on our screens. Soon after, we extended puush to allow us to throw small files around too. Since then, we’ve seen a massive uptake and tremendous support from our users. The problem is that a tremendous majority of puushes aren’t being accessed again after 24 hours - in fact, only 10% of puushes are accessed after a month.<br />
<br />
puush to us is a quick way to share things. puush is not a data warehouse.<br />
<br />
We do not wish to become a file locker, file storage or backup service. There are plenty of other solutions out there that do a much better job of this (e.g. Dropbox), so what we want to do is this:<br />
<br />
* Remove the 200mb storage limit for free users<br />
* Stop offering permanent storage, and files will expire after not being accessed for:<br />
** Free users: 1 month<br />
** Pro users: up to 6 months<br />
* Offer an optional Dropbox “sync” for pro users (i.e. automatically save a copy to dropbox)<br />
<br />
How this will affect you after the 1st of August 2013:<br />
<br />
* You will no longer have an account storage limits. Feel free to puush as much as you want!<br />
* We are going to start expiring files. At this point, any files which haven't been recently viewed by anyone will be automatically deleted after 1 month, or up to 6 months for pro users.<br />
* If you wish to grab a copy of your files before this begins, you can download an archive from your My Account page (Account -> Settings -> Pools -> Export).<br />
<br />
As an example, if you have puush'd images which are being used on a forum, as long as that thread is visited at least once a month (or up to 6 months as a pro user) your files will *always be accessible*.<br />
</blockquote><br />
<br />
This notice is also visible on the puu.sh site, where it was announced even earlier.<br />
<br />
== How to Help ==<br />
<br />
If you are comfortable running scripts manually (i.e., outside the Warrior), go to the [https://github.com/ArchiveTeam/puush-grab GitHub repo] for information on how to run the scripts.<br />
<br />
== Where can I find a file? ==<br />
<br />
If you know the item ID, go to the [https://archive.org/web/web.php Wayback Machine] and enter the URL as <code><nowiki>http://puu.sh/XXXXX</nowiki></code> without any filename extension. The Wayback Machine treats the URL as case-insensitive, so you may need to explore which URL is the one you are looking for. <br />
<br />
If the Puush is private, it is unlikely to have been archived, as we do not guess access codes (the string of characters after the item ID). You can, however, use wildcards as a way of browsing the Wayback Machine. [http://web.archive.org/web/*/http://puu.sh/1at* Here's an example].<br />
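<br />
The Wayback Machine's CDX API offers a more script-friendly way to do the same prefix browsing; a hedged sketch (the exact parameters and output fields are best checked against the current API docs):<br />
<pre><br />
# List captures whose URL starts with puu.sh/1at (trailing * means prefix match).<br />
curl 'http://web.archive.org/cdx/search/cdx?url=puu.sh/1at*&limit=50'<br />
</pre><br />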
<br />
== Archives ==<br />
<br />
Archives are uploaded to the [http://archive.org/details/archiveteam_puush Archive Team Puush collection]. These are the original WARC files. They are 10GB in size instead of the typical 50GB because the project is staged on cloud hosting with small disk space.<br />
<br />
== Tracker information ==<br />
<br />
* The tracker and rsync target is being run by [[User:Chfoo]].<br />
* On 2013-08-22, Redis was unable to background save due to failed <code>fork()</code>.<br />
* On 2013-08-27, an attempt was made to clear out the tracker log. Redis crashed.<br />
<br />
=== Logs ===<br />
<br />
* Archived logs (IP address and username scrubbed):<br />
** [https://dl.dropboxusercontent.com/u/672132/archiveteam/puush_log_2013-08.tar.xz puush_log_2013-08.tar.xz]<br />
** 2013-09: todo<br />
* Daily done set dump: [https://dl.dropboxusercontent.com/u/672132/archiveteam/puush_done.txt.xz puush_done.txt.xz]<br />
** Archive: [https://dl.dropboxusercontent.com/u/672132/archiveteam/puush_done-20130827.txt.xz puush_done-20130827.txt.xz]<br />
<br />
=== Ranges ===<br />
<br />
{| class="sortable wikitable" style="width: auto; text-align: center"<br />
! Date Loaded<br />
! Start (Base 10)<br />
! End (Base 10)<br />
! Alphabet<br />
! Notes<br />
|-<br />
|2013-08-06<br />
| 0 (0)<br />
| 3UXX3 (51607749)<br />
| Legacy<br />
| At most 10 URLs per item<br />
|-<br />
|2013-08-27<br />
| 10 (62)<br />
| 3UXX3 (51607749)<br />
| Legacy<br />
| At most 13 URLs per item (unlucky 13)<br />
|-<br />
| 2013-09-08<br />
| 3UXX4 (51607750)<br />
| 49999 (61285459)<br />
| Legacy<br />
| At most 13 URLs per item<br />
|-<br />
| 2013-09-13<br />
| 4999a (61285460)<br />
| 4mPOO (64547754)<br />
| Puush<br />
| At most 13 URLs per item<br />
|-<br />
| 2013-09-15<br />
| 4mPOP (64547755)<br />
| 4rrrr (65645689)<br />
| Puush<br />
| At most 13 URLs per item<br />
|-<br />
| 2013-09-16<br />
| 4rrrs (65645690)<br />
| 4sQ00 (65978416)<br />
| Puush<br />
| At most 13 URLs per item<br />
|-<br />
| ∞<br />
| 4sQ01 (65978417)<br />
| ∞<br />
| Puush<br />
| At most 13 URLs per item. Auto-queues using a script that checks Twitter.<br />
|}<br />
<br />
== Ideas ==<br />
<br />
* Keep accessing each and every file - likely unsustainable in the long run in the event that expiry times are shortened<br />
* Grab everything - the site appears to use incremental image IDs (see the decoding sketch below)<br />
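<br />
A small sketch of decoding a shortcode into its numeric ID, assuming the legacy alphabet 0-9, A-Z, a-z (consistent with the Ranges table above, e.g. 3UXX3 → 51607749; the "Puush" alphabet may differ):<br />
<pre><br />
#!/bin/sh<br />
alphabet='0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz'<br />
code=$1<br />
n=0<br />
while [ -n "$code" ]; do<br />
  c=${code%"${code#?}"}        # first character of the remaining string<br />
  code=${code#?}               # rest of the string<br />
  prefix=${alphabet%%"$c"*}    # everything before $c; its length is the digit value<br />
  n=$(( n * 62 + ${#prefix} ))<br />
done<br />
echo "$n"<br />
</pre><br />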
<br />
=== Shortcode Stats ===<br />
<br />
<pre><br />
Number of shortcodes: 526<br />
Number of string lengths: 3<br />
3 5 0.951%<br />
4 125 23.764%<br />
5 396 75.285%<br />
Number of unique characters: 62<br />
Number of characters used: 2495<br />
0 24 0.962%<br />
1 155 6.212%<br />
2 234 9.379%<br />
3 121 4.850%<br />
4 24 0.962%<br />
5 45 1.804%<br />
6 26 1.042%<br />
7 37 1.483%<br />
8 25 1.002%<br />
9 34 1.363%<br />
A 46 1.844%<br />
B 37 1.483%<br />
C 46 1.844%<br />
D 38 1.523%<br />
E 36 1.443%<br />
F 42 1.683%<br />
G 33 1.323%<br />
H 31 1.242%<br />
I 37 1.483%<br />
J 32 1.283%<br />
K 38 1.523%<br />
L 35 1.403%<br />
M 28 1.122%<br />
N 39 1.563%<br />
O 31 1.242%<br />
P 44 1.764%<br />
Q 28 1.122%<br />
R 36 1.443%<br />
S 31 1.242%<br />
T 26 1.042%<br />
U 29 1.162%<br />
V 32 1.283%<br />
W 45 1.804%<br />
X 30 1.202%<br />
Y 29 1.162%<br />
Z 30 1.202%<br />
a 34 1.363%<br />
b 39 1.563%<br />
c 32 1.283%<br />
d 46 1.844%<br />
e 27 1.082%<br />
f 30 1.202%<br />
g 39 1.563%<br />
h 38 1.523%<br />
i 30 1.202%<br />
j 34 1.363%<br />
k 24 0.962%<br />
l 29 1.162%<br />
m 40 1.603%<br />
n 40 1.603%<br />
o 38 1.523%<br />
p 25 1.002%<br />
q 26 1.042%<br />
r 34 1.363%<br />
s 23 0.922%<br />
t 45 1.804%<br />
u 36 1.443%<br />
v 27 1.082%<br />
w 32 1.283%<br />
x 45 1.804%<br />
y 26 1.042%<br />
z 22 0.882%<br />
</pre><br />
<br />
=== How many items are there? ===<br />
<br />
<blockquote><br />
&lt;chfoo&gt; [...] using the decentralized script i wrote, i've grabbed [randomly] 3824 items (totalling 785M) out of 6409 requests (a 60% hit rate at a max id of "40000" or 59,105,344).<br />
so, in theory, there's 35,463,206 items based on this sample and max id.<br />
</blockquote><br />
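<br />
The arithmetic behind that estimate, as a one-liner; the quoted 35,463,206 comes from rounding the hit rate to exactly 60% (0.6 × 59,105,344), while the raw sample gives a slightly lower figure:<br />
<pre><br />
# hit rate = items found / requests; scale by the highest ID seen<br />
awk 'BEGIN { rate = 3824 / 6409; printf "%.1f%% hit rate, ~%d items\n", rate * 100, rate * 59105344 }'<br />
</pre><br />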
<br />
<br />
{{Navigation box}}<br />
<br />
[[Category:Image hosting services]]</div>Dnovahttps://wiki.archiveteam.org/index.php?title=ArchiveTeam_Warrior&diff=16704ArchiveTeam Warrior2013-05-16T09:12:38Z<p>Dnova: /* I need to disconnect my internet / reboot my PC but I don't want to loose work */</p>
<hr />
<div>[[Image:Archive_team.png|100px|left]]<br />
[[Image:Warrior-vm-screenshot.png|right]]<br />
[[Image:Warrior-web-screenshot.png|right]]<br />
<br />
The ArchiveTeam Warrior is a virtual archiving appliance. You can run it to help with the ArchiveTeam archiving efforts. It will download sites and upload them to our archive — and it’s really easy to do!<br />
<br />
The warrior is a virtual machine, so there is no risk to your computer. The warrior will only use your bandwidth and some of your disk space. It will get tasks from and report progress to the [[Tracker]].<br />
<br />
The warrior runs on Windows, OS X and Linux. You’ll need [https://www.virtualbox.org/ VirtualBox] (recommended), VMware or a similar program to run the virtual machine.<br />
<br />
Instructions for VirtualBox:<br />
<ol><br />
<li>Download the [http://archive.org/download/archiveteam-warrior/archiveteam-warrior-v2-20121008.ova appliance] (174MB).</li><br />
<li>In VirtualBox, click File > Import Appliance and open the file.</li><br />
<li>Start the virtual machine. It will fetch the latest updates and will eventually tell you to start your web browser.</li><br />
</ol><br />
<br />
Once you’ve started your warrior:<br />
<ol><br />
<li>Go to http://localhost:8001/ and check the Settings page.</li><br />
<li>Choose a username — we’ll show your progress on the leaderboard.</li><br />
<li>Go to the All projects tab and pick a project to work on. Even better: select ArchiveTeam’s Choice to let your warrior work on the most urgent project.</li><br />
</ol><br />
<br />
<br />
<br />
<br />
----<br />
<br />
<br />
==Warrior FAQ==<br />
<br />
===Help! The warrior is eating all my bandwidth!===<br />
<br />
You can limit the warrior's bandwidth quite easily for VirtualBox, as long as you are running a relatively recent version.<br />
<br />
<pre>VBoxManage bandwidthctl archiveteam-warrior-2 --name Limit --add network --limit 3</pre> will limit the warrior instance called archiveteam-warrior-2 (the default name of the warrior VM) to 3Mb/s. Adjust as needed.<br />
<br />
In the latest version of VirtualBox on Windows, the syntax appears to have changed. The correct command now seems to be:<br />
<br />
<pre>VBoxManage bandwidthctl archiveteam-warrior-2 add netlimit --type network --limit 3</pre><br />
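<br />
Once a limit group exists, it can be adjusted on the fly or removed entirely; this assumes the group name Limit from the examples above:<br />
<br />
<pre>VBoxManage bandwidthctl archiveteam-warrior-2 set Limit --limit 1</pre><br />
<br />
<pre>VBoxManage bandwidthctl archiveteam-warrior-2 remove Limit</pre><br />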
<br />
=== I turned my warrior off, will those tasks be lost? ===<br />
<br />
If you've killed your warrior instances, then the work your warrior did has been lost; however, the tasks will be returned to the pool after a period of time. If you want, you can alert the admins via IRC of what's happened, and they can clear the claims your username may have made, but this isn't very important on most projects.<br />
<br />
<br />
=== I need to disconnect my internet / reboot my PC but I don't want to lose work ===<br />
<br />
If you pause/suspend the warrior instance, most projects will allow work in progress to resume when you unsuspend it.<br />
<br />
=== I told the warrior to shut down from the interface but nothing has changed! What gives? ===<br />
<br />
The warrior will attempt to finish the currently running tasks before shutting down. If you need to shut down right away, go ahead; your progress will be lost, but the jobs will eventually cycle out to another user.<br />
<br />
== Projects ==<br />
<br />
Previous and current warrior projects:<br />
<br />
{| class="wikitable"<br />
! Project !! Status !! Began !! Finished !! Result !! Archive Location<br />
|-<br />
| MobileMe || '''Archive Posted''' || April 3, 2012 || Aug 8, 2012 || Success || <br />
[http://archive.org/details/archiveteam-mobileme-hero archive] [http://archive.org/details/archiveteam-mobileme-index index] [http://archive.org/download/archiveteam-mobileme-index/mobileme-20120817.html user lookup]<br />
|-<br />
| Fortune City || '''Archive Posted''' || April 4, 2012 || April 11, 2012 || Partial Success || [http://archive.org/details/archiveteam-fortunecity archive] [http://archive.org/download/test-memac-index-test/fortunecity.html user lookup]<br />
|-<br />
| Tabblo || '''Archive Posted''' || May 23, 2012 || May 26, 2012 || Success || [http://archive.org/details/tabblo-archive archive] [http://archive.org/download/test-memac-index-test/tabblo.html user lookup]<br />
|-<br />
| PicPlz || '''Archive Posted''' || June 3, 2012 || June 15, 2012 || || [http://archive.org/details/archiveteam-picplz archive] [http://archive.org/details/archiveteam-picplz-index index] [http://archive.org/download/archiveteam-picplz-index/picplz-20120823.html user lookup]<br />
|-<br />
| Tumblr (test project) || '''Archive Posted''' || August 9, 2012 || August 19, 2012 || || [http://archive.org/details/archiveteam-tumblr-test archive (tar)] [http://archive.org/details/archiveteam-tumblr-test-warc archive (warc)]<br />
|-<br />
| Cinch.FM || '''Archive Posted''' || August 20, 2012 || August 22, 2012 || Success || [http://archive.org/details/archiveteam-cinch archive]<br />
|-<br />
| City Of Heroes || '''Archive Posted''' || September 3, 2012 || December 1, 2012 || Success || [http://archive.org/details/archiveteam-city-of-heroes-www www] [http://archive.org/details/archiveteam-city-of-heroes-main forums] [http://archive.org/details/archiveteam-city-of-heroes-forums-megawarc-1 1] [http://archive.org/details/archiveteam-city-of-heroes-forums-megawarc-2 2] [http://archive.org/details/archiveteam-city-of-heroes-forums-megawarc-3 3] [http://archive.org/details/archiveteam-city-of-heroes-forums-megawarc-4 4] [http://archive.org/details/archiveteam-city-of-heroes-forums-megawarc-5 5]<br />
|-<br />
| Webshots || '''Archive Posted''' || October 4, 2012 || November 18, 2012 || || [http://archive.org/download/webshots-freeze-frame-index/index.html index]<br />
|-<br />
| BT Internet || '''Archive Posted''' || October 10, 2012 || November 2, 2012 || Success || [http://archive.org/details/archiveteam-btinternet archive]<br />
|-<br />
| Daily Booth || '''Archive Posted''' || November 19, 2012 || December 29, 2012 || || [http://archive.org/details/archiveteam_dailybooth archive] [http://archive.org/download/dailybooth-freeze-frame-index/index.html lookup]<br />
|-<br />
| Github || '''Archive Posted''' || December 13, 2012 || December 17, 2012 || Success || [http://archive.org/details/github-downloads-2012-12 archive] [http://archive.org/details/archiveteam-github-repository-index-201212 index]<br />
|-<br />
| Yahoo Blogs (Vietnamese) || '''Archive Posted''' || January 8, 2013 || January 19, 2013 || || [http://archive.org/details/yahoo_korea_blogs archive]<br />
|-<br />
| weblog.nl || '''Archive Posted''' || January 19, 2013 || February 2, 2013 || || [http://archive.org/details/archiveteam_weblognl archive] [http://archive.org/download/archiveteam_weblognl-index/ lookup]<br />
|-<br />
| URLTeam || Active || || || || [http://urlte.am/releases/2013-01-02/urlteam.torrent latest]<br />
|-<br />
| Punchfork || '''Archive Posted''' || January 11, 2013 || March 6, 2013 || || [http://archive.org/details/archiveteam_punchfork archive] [http://archive.org/download/archiveteam_punchfork_index/ user lookup]<br />
|-<br />
| Xanga || Downloads Paused || January 22, 2013 || February 16, 2013 || || [http://archive.org/details/archiveteam_xanga archive] [http://archive.org/download/archiveteam_xanga_index/ user lookup] [http://archive.org/details/archiveteam-xanga-userlist-20130142 user list]<br />
|-<br />
| Posterous || Active || February 23, 2013 || || || [http://archive.org/details/archiveteam_posterous archive]<br />
|-<br />
| Storylane || Downloads Finished || March 8, 2013 || March 15, 2013 || ||<br />
|-<br />
| Yahoo! Messages || Downloads Finished || March 20, 2013 || March 31, 2013 || || [http://archive.org/details/archiveteam_yahoo_messages archive]<br />
|-<br />
| Formspring || Active || March 24, 2013 || || || [http://archive.org/details/archiveteam_formspring archive]<br />
|-<br />
| Yahoo Upcoming || '''Archive Posted''' || April 20, 2013 || April 25, 2013 || || [http://archive.org/details/archiveteam archive]<br />
|-<br />
| Streetfiles.org || Downloads Finished || April 28, 2013 || April 30, 2013 || Partial || [http://archive.org/details/archiveteam archive]<br />
|}<br />
<br />
=== Status ===<br />
:; In Development : a future project<br />
:; Active : start up a Warrior and join the fun; this one is in progress right now<br />
:; Downloads Finished : we've finished downloading the data<br />
:; Downloads Paused : downloading has been temporarily halted<br />
:; Archived : the collected data has been properly archived<br />
:; Archive Posted : the archive is available for download<br />
<br />
=== Result ===<br />
:; Success : downloaded all of the data and posted the archive publicly<br />
:; Qualified Success : either we couldn't get all of the data, or the archive can't be made public<br />
:; Partial : only some of the data could be downloaded before the site closed<br />
:; Failure : the site closed before we could download anything<br />
<br />
== Testing pre-production code ==<br />
<br />
(Don't do this unless you really need or want to.) If you are developing a warrior script, you can test it by switching your warrior from the <code>production</code> branch to the <code>master</code> branch.<br />
<br />
<ol><br />
<li>Start the warrior.</li><br />
<li>Press Alt+F2 and log in with username <code>root</code> and password <code>archiveteam</code>.</li><br />
<li><code>cd /home/warrior/warrior-code</code></li><br />
<li><code>sudo -u warrior git checkout master</code></li><br />
<li><code>reboot</code></li><br />
</ol><br />
<br />
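For reference, the full console session might look like the sketch below (this assumes the default warrior layout described above; prompts and exact output will vary between warrior versions):<br />
<br />
<pre><br />
# at the Alt+F2 console, logged in as root (password: archiveteam)<br />
cd /home/warrior/warrior-code<br />
sudo -u warrior git checkout master    # switch the code to the development branch<br />
reboot                                 # restart so the warrior runs the new code<br />
<br />
# to go back to the stable code later:<br />
cd /home/warrior/warrior-code<br />
sudo -u warrior git checkout production<br />
reboot<br />
</pre><br />
<br />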
By the same route you can return your warrior to the <code>production</code> branch.</div>Dnovahttps://wiki.archiveteam.org/index.php?title=Kickstarter_Prizes&diff=8270Kickstarter Prizes2012-06-29T19:54:27Z<p>Dnova: </p>
<hr />
<div>Just dreaming up some ideas for the Kickstarter prizes.<br />
<br />
Prizes where something is sent to you<br />
<br />
* Makerbot<br />
** Something made with a makerbot<br />
* Internet Archive has a lot of swag like Hats, Shirts, Glasses.<br />
* Single issues of low-value but interesting vintage computing/hacking/electronics/science magazines<br />
* Physical personalized IA "library cards"<br />
<br />
Prizes of a more Ethereal nature<br />
<br />
* Phone call with person of prominence<br />
* Tutorial/instruction/Q&A with person<br />
* Name a server at the IA<br />
<br />
Prizes where "you have to travel to get them".<br />
<br />
* Tour of the Internet Archive<br />
* Dinner with people</div>Dnovahttps://wiki.archiveteam.org/index.php?title=Kickstarter_Prizes&diff=8265Kickstarter Prizes2012-06-29T19:14:34Z<p>Dnova: </p>
<hr />
<div>Just dreaming up some ideas for the Kickstarter prizes.<br />
<br />
Prizes where something is sent to you<br />
<br />
* Makerbot<br />
* Internet Archive has a lot of swag like Hats, Shirts, Glasses.<br />
* Single issues of low-value but interesting vintage computing/hacking/electronics/science magazines<br />
<br />
Prizes where "you have to travel to get them".<br />
<br />
* Tour of the Internet Archive<br />
* Dinner with people</div>Dnovahttps://wiki.archiveteam.org/index.php?title=Main_Page&diff=7917Main Page2012-05-27T07:00:21Z<p>Dnova: Undo revision 7916 by Dnova (talk)</p>
<hr />
<div>__NOTOC__<br />
<center><br />
<!-- [[Image:Jasonappeal.png|center|link=Introduction]]--><br />
<table style="width:100%;border-spacing:8px;margin:12px 0px 0px 0px"><br />
<tr><td style="width:60%;border:1px solid #FFB9B9;background-color:#FFFFF0;vertical-align:top;color:#000"><br />
<table class="thumb" width="100%" cellpadding="2" cellspacing="5" style="vertical-align:top;background-color:#FFFFF0;"><br />
<tr><td><br />
<td style="color:#000;text-align:left;vertical-align:top"><br />
<br />
=== HISTORY IS OUR FUTURE ===<br />
[[Image:Archiveteam.jpg|right|200px]]<br />
''And we've been trashing our history''<br />
<br />
Archive Team is a loose collective of rogue archivists, programmers, writers and loudmouths dedicated to saving our digital heritage. Since 2009 this variant force of nature has caught wind of shutdowns, shutoffs, mergers, and plain old deletions - and done our best to save the history before it's lost forever. Along the way, we've gotten attention, resistance, press and discussion, but most importantly, we've gotten the message out: ''IT DOESN'T HAVE TO BE THIS WAY''.<br />
<br />
This website is intended to be an offloading point and information depot for a number of archiving projects, all related to saving websites or data that is in danger of being lost. Besides serving as a hub for team-based pulling down and mirroring of data, this site will provide advice on managing your own data and rescuing it from the brink of destruction.<br />
<br />
Feel free to join us on the [[IRC_Channel|IRC channel]]! We're on the EFnet network in a channel called '''#archiveteam''', where we say truly awful things, but do NOT talk about Richard M. Stallman. <br />
<!-- featured article ends --><br />
</tr><br />
<tr><th colspan=2><br />
<h2 style="margin:0;background-color:#a3b0bf;font-size:120%;font-weight:bold;border:1px solid #a3b0bf;text-align:left;color:#000;padding:0.2em 0.4em;">Currently Active Projects (Get Involved Here!)</h2><br />
</th></tr><br />
<tr><td style="color:#000" colspan=2><br />
<!-- active starts --><br />
* '''[[MobileMe]]''' - Apple's file storage and sharing service, currently hosting over 200 terabytes of data, is shutting down on June 30, 2012. <br />
** A distributed tracker and very easy-to-use scripts are in place.<br />
** More downloaders are needed for this project!<br />
** Track the download progress at http://memac.heroku.com/ '''235 terabytes''' and counting!<br />
* '''[[Tabblo]]''' - A site where users told stories with pictures. Closing May 30, 2012.<br />
** Very easy to use scripts are in place. <br />
* '''[[FanFiction.Net]]''' - Around 7 million fan-fiction stories hosted on what may be the largest site of its kind in the world. They're not shutting down but Archiveteam wants a copy for posterity. <br />
** Coders are currently needed to figure out an intelligent way to comprehensively archive the site. <br />
<!-- active ends --><br />
<tr><th colspan=2><br />
<h2 style="margin:0;background-color:#a3b0bf;font-size:120%;font-weight:bold;border:1px solid #a3b0bf;text-align:left;color:#000;padding:0.2em 0.4em;">Archive Team News</h2><br />
</th></tr><br />
<tr><td style="color:#000" colspan=2><br />
<!-- news starts --><br />
* '''May, 2012''': Tabblo.com announces its closure scheduled for May 30th, giving its userbase just ten days of warning. Archive Team is on the case. <br />
* '''May, 2012''': ArchiveTeam's save of [http://web.archive.org/web/20080607211809/http://crave.cnet.co.uk/0,39029477,49296926-10,00.htm Stage6], a defunct video sharing site run by DivX, Inc., is permanently preserved at the [http://archive.org/details/stage6 Internet Archive].<br />
* '''April, 2012''': 20 million Friendster accounts spanning 14 terabytes are successfully rescued for permanent storage by Archive Team. <br />
* '''March, 2012''': [http://www.dereferer.org/?http%3A%2F%2Ffortunecity.com FortuneCity] announced the end of its free-hosting model, threatening around one million user-generated websites.<br />
* '''March, 2012''': We've switched servers to one of those new-fangled hosting companies that aren't hacked. We're going to sell you a lot less in the way of medical supplies now.<br />
* '''December, 2011''': POE News says it will soon be nevermore: [http://www.poe-news.com/forums/sp.php?pi=1002546492 Announcement]<br />
* '''December, 2011''': GamePro magazine halts publication and their website goes dark.<br />
* '''November, 2011''': Archiveteam rescues over 1.3 million users' data from [[Splinder]]'s closure.<br />
* '''July 2011''': Archiveteam teaches you how to [[Rescuing_Floppy_Disks|rescue data from Floppy Disks]].<br />
* '''May, 2011''': [[Friendster]] is deleting everything at the end of the month.<br />
* '''May, 2011''': Archiveteam keeps it classy at [[poetry.com]].<br />
* '''April, 2011''': How about some [[Google Video]]?<br />
* '''March, 2011''': The [http://www.archive.org/details/personalarchiveconf 2011 Personal Digital Archiving Conference] talks are available.<br />
* '''February, 2011''': Let's watch some [[Yahoo! Video]]<br />
* '''December, 2010''': Archiveteam is Delicious!<br />
* '''October, 2010''': Archiveteam offers Geocities as a torrent.<br />
* '''December 23, 2009''': Yahoo shut down [[starwars.yahoo.com]]. We got a copy.<br />
* '''October, 2009''': [[Geocities]] closing is definitely the top of the charts.<br />
<!-- news ends --><br />
</td></tr><br />
</table><br />
<td style="width:40%;border:1px solid #cedff2;background-color:#f5faff;vertical-align:top"><br />
<table width="100%" cellpadding="2" cellspacing="5" style="vertical-align:top;background-color:#f5faff"><br />
<tr><th><br />
<h2 style="margin:0;background:#cedff2;font-size:120%;font-weight:bold;border:1px solid #a3b0bf;text-align:left;color:#000;padding:0.2em 0.4em;">What is What</h2><br />
</th></tr><br />
<tr><td style="color:#000"><br />
<!-- links starts --><br />
* [[Who We Are]] and how you can join our cause!<br />
<br />
* [[Deathwatch]] is where we keep track of sites that are sickly, dying or dead.<br />
<br />
* [[Fire Drill]] is where we keep track of sites that seem fine but a lot depends on them.<br />
<br />
* [[Projects]] is a comprehensive list of AT endeavors.<br />
<br />
* [[Philosophy]] describes the ideas underpinning our work.<br />
<br />
* [[Archives]] <br />
<!-- links ends --><br />
</td></tr><br />
<tr><th><br />
<h2 style="margin:0;background:#cedff2;font-size:120%;font-weight:bold;border:1px solid #a3b0bf;text-align:left;color:#000;padding:0.2em 0.4em;">Some Starting Points</h2><br />
</th></tr><br />
<tr><td style="color:#000"><br />
<br />
* [[Introduction|The Introduction]] is an overview of basic archiving methods.<br />
<br />
* [[Why Back Up?]] Because they don't care about you.<br />
<br />
* [[Facebook|Back Up your Facebook Data]] Learn how to liberate your personal data from Facebook.<br />
<br />
* [[Software]] will assist you in regaining control of your data by providing tools for information backup, archiving and distribution. <br />
<br />
* [[Formats]] will familiarise you with the various data formats, and how to ensure your files will be readable in the future.<br />
<br />
* [[Storage Media]] is about where to get it, what to get, and how to use it.<br />
<br />
* [[Recommended Reading]] links to other sites for further information.<br />
<br />
* [[Frequently Asked Questions]] is where we answer common questions.<br />
</td></tr><br />
<tr><th><br />
<h2 style="margin:0;background:#cedff2;font-size:120%;font-weight:bold;border:1px solid #a3b0bf;text-align:left;color:#000;padding:0.2em 0.4em;">Quote of the Moment</h2><br />
</th></tr><br />
<tr><td style="color:#000"><br />
<tr><td style="margin:20;background-color:#000000;font-size:200%;font-weight:bold;border:1px solid #a3b0bf;text-align:center;color:#fff;" ><br />
"[Yahoo!] found the way to destroy <br />
the most massive amount of history<br />
in the shortest amount of time <br />
with absolutely no recourse"<br />
</td></tr><br />
<tr><td style="text-align:right"><br />
[http://www.time.com/time/business/article/0,8599,1936645,00.html Internet Atrocity! GeoCities' Demise Erases Web History] <br />
<br>By Dan Fletcher, TIME Magazine, Monday, Nov. 09, 2009<br />
</td></tr><br />
</table><br />
</td></tr><br />
</table><br />
'''Archive Team is in no way affiliated with the fine folks at [http://www.archive.org ARCHIVE.ORG]'''<br />
'''Archive Team can always be reached at [mailto:archiveteam@archiveteam.org archiveteam@archiveteam.org]'''</div>Dnovahttps://wiki.archiveteam.org/index.php?title=Main_Page&diff=7916Main Page2012-05-27T06:58:20Z<p>Dnova: Reverted edits by Dnova (talk) to last revision by Emijrp</p>
<hr />
<div>__NOTOC__<br />
<center><br />
<!-- [[Image:Jasonappeal.png|center|link=Introduction]]--><br />
<table style="width:100%;border-spacing:8px;margin:12px 0px 0px 0px"><br />
<tr><td style="width:60%;border:1px solid #FFB9B9;background-color:#FFFFF0;vertical-align:top;color:#000"><br />
<table class="thumb" width="100%" cellpadding="2" cellspacing="5" style="vertical-align:top;background-color:#FFFFF0;"><br />
<tr><td><br />
<td style="color:#000;text-align:left;vertical-align:top"><br />
<br />
=== HISTORY IS OUR FUTURE ===<br />
[[Image:Archiveteam.jpg|right|200px]]<br />
''And we've been trashing our history''<br />
<br />
Archive Team is a loose collective of rogue archivists, programmers, writers and loudmouths dedicated to saving our digital heritage. Since 2009 this variant force of nature has caught wind of shutdowns, shutoffs, mergers, and plain old deletions - and done our best to save the history before it's lost forever. Along the way, we've gotten attention, resistance, press and discussion, but most importantly, we've gotten the message out: ''IT DOESN'T HAVE TO BE THIS WAY''.<br />
<br />
This website is intended to be an offloading point and information depot for a number of archiving projects, all related to saving websites or data that is in danger of being lost. Besides serving as a hub for team-based pulling down and mirroring of data, this site will provide advice on managing your own data and rescuing it from the brink of destruction.<br />
<br />
Feel free to join us on the [[IRC_Channel|IRC channel]]! We're on the EFnet network in a channel called '''#archiveteam''', where we say truly awful things.<br />
<!-- featured article ends --><br />
</tr><br />
<tr><th colspan=2><br />
<h2 style="margin:0;background-color:#a3b0bf;font-size:120%;font-weight:bold;border:1px solid #a3b0bf;text-align:left;color:#000;padding:0.2em 0.4em;">Currently Active Projects (Get Involved Here!)</h2><br />
</th></tr><br />
<tr><td style="color:#000" colspan=2><br />
<!-- active starts --><br />
* '''[[MobileMe]]''' - Apple's file storage and sharing service, currently hosting over 200 terabytes of data, is shutting down on June 30, 2012.<br />
** A distributed tracker and very easy-to-use scripts are in place.<br />
** More downloaders are needed for this project!<br />
** Track the download progress at http://memac.heroku.com/ '''80 terabytes''' and counting!<br />
* '''[[FortuneCity]]''' - A "free" webhost founded in 1997 with around 1 million users. Going non-free/closing on April 30, 2012.<br />
** Project essentially complete!<br />
* '''[[FanFiction.Net]]''' - Around 7 million fan-fiction stories hosted on what may be the largest site of its kind in the world. They're not shutting down but Archiveteam wants a copy for posterity. <br />
** Coders are currently needed to figure out an intelligent way to comprehensively archive the site. <br />
<!-- active ends --><br />
<tr><th colspan=2><br />
<h2 style="margin:0;background-color:#a3b0bf;font-size:120%;font-weight:bold;border:1px solid #a3b0bf;text-align:left;color:#000;padding:0.2em 0.4em;">Archive Team News</h2><br />
</th></tr><br />
<tr><td style="color:#000" colspan=2><br />
<!-- news starts --><br />
* '''April, 2012''': 20 million Friendster accounts spanning 14 terabytes are successfully rescued for permanent storage by Archive Team. <br />
* '''March, 2012''': [http://www.dereferer.org/?http%3A%2F%2Ffortunecity.com FortuneCity] announced the end of its free-hosting model, threatening around one million user-generated websites.<br />
* '''March, 2012''': We've switched servers to one of those new-fangled hosting companies that aren't hacked. We're going to sell you a lot less in the way of medical supplies now.<br />
* '''December, 2011''': POE News says it will soon be nevermore: [http://www.poe-news.com/forums/sp.php?pi=1002546492 Announcement]<br />
* '''December, 2011''': GamePro magazine halts publication and their website goes dark.<br />
* '''November, 2011''': Archiveteam rescues over 1.3 million users' data from [[Splinder]]'s closure.<br />
* '''July 2011''': Archiveteam teaches you how to [[Rescuing_Floppy_Disks|rescue data from Floppy Disks]].<br />
* '''May, 2011''': [[Friendster]] is deleting everything at the end of the month.<br />
* '''May, 2011''': Archiveteam keeps it classy at [[poetry.com]].<br />
* '''April, 2011''': How about some [[Google Video]]?<br />
* '''March, 2011''': The [http://www.archive.org/details/personalarchiveconf 2011 Personal Digital Archiving Conference] talks are available.<br />
* '''February, 2011''': Let's watch some [[Yahoo! Video]]<br />
* '''December, 2010''': Archiveteam is Delicious!<br />
* '''October, 2010''': Archiveteam offers Geocities as a torrent.<br />
* '''December 23, 2009''': Yahoo shut down [[starwars.yahoo.com]]. We got a copy.<br />
* '''October, 2009''': [[Geocities]] closing is definitely the top of the charts.<br />
<!-- news ends --><br />
</td></tr><br />
</table><br />
<td style="width:40%;border:1px solid #cedff2;background-color:#f5faff;vertical-align:top"><br />
<table width="100%" cellpadding="2" cellspacing="5" style="vertical-align:top;background-color:#f5faff"><br />
<tr><th><br />
<h2 style="margin:0;background:#cedff2;font-size:120%;font-weight:bold;border:1px solid #a3b0bf;text-align:left;color:#000;padding:0.2em 0.4em;">What is What</h2><br />
</th></tr><br />
<tr><td style="color:#000"><br />
<!-- links starts --><br />
* [[Who We Are]] and how you can join our cause!<br />
<br />
* [[Deathwatch]] is where we keep track of sites that are sickly, dying or dead.<br />
<br />
* [[Fire Drill]] is where we keep track of sites that seem fine but a lot depends on them.<br />
<br />
* [[Projects]] is a comprehensive list of AT endeavors.<br />
<br />
* [[Philosophy]] describes the ideas underpinning our work.<br />
<br />
* [[Archives]] <br />
<!-- links ends --><br />
</td></tr><br />
<tr><th><br />
<h2 style="margin:0;background:#cedff2;font-size:120%;font-weight:bold;border:1px solid #a3b0bf;text-align:left;color:#000;padding:0.2em 0.4em;">Some Starting Points</h2><br />
</th></tr><br />
<tr><td style="color:#000"><br />
<br />
* [[Introduction|The Introduction]] is an overview of basic archiving methods.<br />
<br />
* [[Why Back Up?]] Because they don't care about you.<br />
<br />
* [[Facebook|Back Up your Facebook Data]] Learn how to liberate your personal data from Facebook.<br />
<br />
* [[Software]] will assist you in regaining control of your data by providing tools for information backup, archiving and distribution. <br />
<br />
* [[Formats]] will familiarise you with the various data formats, and how to ensure your files will be readable in the future.<br />
<br />
* [[Storage Media]] is about where to get it, what to get, and how to use it.<br />
<br />
* [[Recommended Reading]] links to other sites for further information.<br />
<br />
* [[Frequently Asked Questions]] is where we answer common questions.<br />
</td></tr><br />
<tr><th><br />
<h2 style="margin:0;background:#cedff2;font-size:120%;font-weight:bold;border:1px solid #a3b0bf;text-align:left;color:#000;padding:0.2em 0.4em;">Quote of the Moment</h2><br />
</th></tr><br />
<tr><td style="color:#000"><br />
<tr><td style="margin:20;background-color:#000000;font-size:200%;font-weight:bold;border:1px solid #a3b0bf;text-align:center;color:#fff;" ><br />
"[Yahoo!] found the way to destroy <br />
the most massive amount of history<br />
in the shortest amount of time <br />
with absolutely no recourse"<br />
</td></tr><br />
<tr><td style="text-align:right"><br />
[http://www.time.com/time/business/article/0,8599,1936645,00.html Internet Atrocity! GeoCities' Demise Erases Web History] <br />
<br>By Dan Fletcher, TIME Magazine, Monday, Nov. 09, 2009<br />
</td></tr><br />
</table><br />
</td></tr><br />
</table><br />
'''Archive Team is in no way affiliated with the fine folks at [http://www.archive.org ARCHIVE.ORG]'''<br />
'''Archive Team can always be reached at [mailto:archiveteam@archiveteam.org archiveteam@archiveteam.org]'''</div>Dnovahttps://wiki.archiveteam.org/index.php?title=Main_Page&diff=7915Main Page2012-05-27T06:46:25Z<p>Dnova: </p>
<hr />
<div>__NOTOC__<br />
<center><br />
<!-- [[Image:Jasonappeal.png|center|link=Introduction]]--><br />
<table style="width:100%;border-spacing:8px;margin:12px 0px 0px 0px"><br />
<tr><td style="width:60%;border:1px solid #FFB9B9;background-color:#FFFFF0;vertical-align:top;color:#000"><br />
<table class="thumb" width="100%" cellpadding="2" cellspacing="5" style="vertical-align:top;background-color:#FFFFF0;"><br />
<tr><td><br />
<td style="color:#000;text-align:left;vertical-align:top"><br />
<br />
=== HISTORY IS OUR FUTURE ===<br />
[[Image:Archiveteam.jpg|right|200px]]<br />
''And we've been trashing our history''<br />
<br />
Archive Team is a loose collective of rogue archivists, programmers, writers and loudmouths dedicated to saving our digital heritage. Since 2009 this variant force of nature has caught wind of shutdowns, shutoffs, mergers, and plain old deletions - and done our best to save the history before it's lost forever. Along the way, we've gotten attention, resistance, press and discussion, but most importantly, we've gotten the message out: ''IT DOESN'T HAVE TO BE THIS WAY''.<br />
<br />
This website is intended to be an offloading point and information depot for a number of archiving projects, all related to saving websites or data that is in danger of being lost. Besides serving as a hub for team-based pulling down and mirroring of data, this site will provide advice on managing your own data and rescuing it from the brink of destruction.<br />
<br />
Feel free to join us on the [[IRC_Channel|IRC channel]]! We're on the EFnet network in a channel called '''#archiveteam''', where we say truly awful things, but do NOT talk about Richard M. Stallman. <br />
<!-- featured article ends --><br />
</tr><br />
<tr><th colspan=2><br />
<h2 style="margin:0;background-color:#a3b0bf;font-size:120%;font-weight:bold;border:1px solid #a3b0bf;text-align:left;color:#000;padding:0.2em 0.4em;">Currently Active Projects (Get Involved Here!)</h2><br />
</th></tr><br />
<tr><td style="color:#000" colspan=2><br />
<!-- active starts --><br />
* '''[[MobileMe]]''' - Apple's file storage and sharing service, currently hosting over 200 terabytes of data, is shutting down on June 30, 2012. <br />
** A distributed tracker and very easy-to-use scripts are in place.<br />
** More downloaders are needed for this project!<br />
** Track the download progress at http://memac.heroku.com/ '''235 terabytes''' and counting!<br />
* '''[[Tabblo]]''' - A site where users told stories with pictures. Closing May 30, 2012.<br />
** Very easy to use scripts are in place. <br />
* '''[[FanFiction.Net]]''' - Around 7 million fan-fiction stories hosted on what may be the largest site of its kind in the world. They're not shutting down but Archiveteam wants a copy for posterity. <br />
** Coders are currently needed to figure out an intelligent way to comprehensively archive the site. <br />
<!-- active ends --><br />
<tr><th colspan=2><br />
<h2 style="margin:0;background-color:#a3b0bf;font-size:120%;font-weight:bold;border:1px solid #a3b0bf;text-align:left;color:#000;padding:0.2em 0.4em;">Archive Team News</h2><br />
</th></tr><br />
<tr><td style="color:#000" colspan=2><br />
<!-- news starts --><br />
* '''May, 2012''': Tabblo.com announces its closure scheduled for May 30th, giving its userbase just ten days of warning. Archive Team is on the case. <br />
* '''May, 2012''': ArchiveTeam's save of [http://web.archive.org/web/20080607211809/http://crave.cnet.co.uk/0,39029477,49296926-10,00.htm Stage6], a defunct video sharing site run by DivX, Inc., is permanently preserved at the [http://archive.org/details/stage6 Internet Archive].<br />
* '''April, 2012''': 20 million Friendster accounts spanning 14 terabytes are successfully rescued for permanent storage by Archive Team. <br />
* '''March, 2012''': [http://www.dereferer.org/?http%3A%2F%2Ffortunecity.com FortuneCity] announced the end of its free-hosting model, threatening around one million user-generated websites.<br />
* '''March, 2012''': We've switched servers to one of those new-fangled hosting companies that aren't hacked. We're going to sell you a lot less in the way of medical supplies now.<br />
* '''December, 2011''': POE News says it will soon be nevermore: [http://www.poe-news.com/forums/sp.php?pi=1002546492 Announcement]<br />
* '''December, 2011''': GamePro magazine halts publication and their website goes dark.<br />
* '''November, 2011''': Archiveteam rescues over 1.3 million users' data from [[Splinder]]'s closure.<br />
* '''July 2011''': Archiveteam teaches you how to [[Rescuing_Floppy_Disks|rescue data from Floppy Disks]].<br />
* '''May, 2011''': [[Friendster]] is deleting everything at the end of the month.<br />
* '''May, 2011''': Archiveteam keeps it classy at [[poetry.com]].<br />
* '''April, 2011''': How about some [[Google Video]]?<br />
* '''March, 2011''': The [http://www.archive.org/details/personalarchiveconf 2011 Personal Digital Archiving Conference] talks are available.<br />
* '''February, 2011''': Let's watch some [[Yahoo! Video]]<br />
* '''December, 2010''': Archiveteam is Delicious!<br />
* '''October, 2010''': Archiveteam offers Geocities as a torrent.<br />
* '''December 23, 2009''': Yahoo shut down [[starwars.yahoo.com]]. We got a copy.<br />
* '''October, 2009''': [[Geocities]] closing is definitely the top of the charts.<br />
<!-- news ends --><br />
</td></tr><br />
</table><br />
<td style="width:40%;border:1px solid #cedff2;background-color:#f5faff;vertical-align:top"><br />
<table width="100%" cellpadding="2" cellspacing="5" style="vertical-align:top;background-color:#f5faff"><br />
<tr><th><br />
<h2 style="margin:0;background:#cedff2;font-size:120%;font-weight:bold;border:1px solid #a3b0bf;text-align:left;color:#000;padding:0.2em 0.4em;">What is What</h2><br />
</th></tr><br />
<tr><td style="color:#000"><br />
<!-- links starts --><br />
* [[Who We Are]] and how you can join our cause!<br />
<br />
* [[Deathwatch]] is where we keep track of sites that are sickly, dying or dead.<br />
<br />
* [[Fire Drill]] is where we keep track of sites that seem fine but a lot depends on them.<br />
<br />
* [[Projects]] is a comprehensive list of AT endeavors.<br />
<br />
* [[Philosophy]] describes the ideas underpinning our work.<br />
<br />
* [[Archives]] <br />
<!-- links ends --><br />
</td></tr><br />
<tr><th><br />
<h2 style="margin:0;background:#cedff2;font-size:120%;font-weight:bold;border:1px solid #a3b0bf;text-align:left;color:#000;padding:0.2em 0.4em;">Some Starting Points</h2><br />
</th></tr><br />
<tr><td style="color:#000"><br />
<br />
* [[Introduction|The Introduction]] is an overview of basic archiving methods.<br />
<br />
* [[Why Back Up?]] Because they don't care about you.<br />
<br />
* [[Facebook|Back Up your Facebook Data]] Learn how to liberate your personal data from Facebook.<br />
<br />
* [[Software]] will assist you in regaining control of your data by providing tools for information backup, archiving and distribution. <br />
<br />
* [[Formats]] will familiarise you with the various data formats, and how to ensure your files will be readable in the future.<br />
<br />
* [[Storage Media]] is about where to get it, what to get, and how to use it.<br />
<br />
* [[Recommended Reading]] links to other sites for further information.<br />
<br />
* [[Frequently Asked Questions]] is where we answer common questions.<br />
</td></tr><br />
<tr><th><br />
<h2 style="margin:0;background:#cedff2;font-size:120%;font-weight:bold;border:1px solid #a3b0bf;text-align:left;color:#000;padding:0.2em 0.4em;">Quote of the Moment</h2><br />
</th></tr><br />
<tr><td style="color:#000"><br />
<tr><td style="margin:20;background-color:#000000;font-size:200%;font-weight:bold;border:1px solid #a3b0bf;text-align:center;color:#fff;" ><br />
"[Yahoo!] found the way to destroy <br />
the most massive amount of history<br />
in the shortest amount of time <br />
with absolutely no recourse"<br />
</td></tr><br />
<tr><td style="text-align:right"><br />
[http://www.time.com/time/business/article/0,8599,1936645,00.html Internet Atrocity! GeoCities' Demise Erases Web History] <br />
<br>By Dan Fletcher, TIME Magazine, Monday, Nov. 09, 2009<br />
</td></tr><br />
</table><br />
</td></tr><br />
</table><br />
'''Archive Team is in no way affiliated with the fine folks at [http://www.archive.org ARCHIVE.ORG]'''<br />
'''Archive Team can always be reached at [mailto:archiveteam@archiveteam.org archiveteam@archiveteam.org]'''</div>Dnovahttps://wiki.archiveteam.org/index.php?title=Main_Page&diff=7878Main Page2012-05-25T05:48:07Z<p>Dnova: </p>
<hr />
<div>__NOTOC__<br />
<center><br />
<!-- [[Image:Jasonappeal.png|center|link=Introduction]]--><br />
<table style="width:100%;border-spacing:8px;margin:12px 0px 0px 0px"><br />
<tr><td style="width:60%;border:1px solid #FFB9B9;background-color:#FFFFF0;vertical-align:top;color:#000"><br />
<table class="thumb" width="100%" cellpadding="2" cellspacing="5" style="vertical-align:top;background-color:#FFFFF0;"><br />
<tr><td><br />
<td style="color:#000;text-align:left;vertical-align:top"><br />
<br />
=== HISTORY IS OUR FUTURE ===<br />
[[Image:Archiveteam.jpg|right|200px]]<br />
''And we've been trashing our history''<br />
<br />
Archive Team is a loose collective of rogue archivists, programmers, writers and loudmouths dedicated to saving our digital heritage. Since 2009 this variant force of nature has caught wind of shutdowns, shutoffs, mergers, and plain old deletions - and done our best to save the history before it's lost forever. Along the way, we've gotten attention, resistance, press and discussion, but most importantly, we've gotten the message out: ''IT DOESN'T HAVE TO BE THIS WAY''.<br />
<br />
This website is intended to be an offloading point and information depot for a number of archiving projects, all related to saving websites or data that is in danger of being lost. Besides serving as a hub for team-based pulling down and mirroring of data, this site will provide advice on managing your own data and rescuing it from the brink of destruction.<br />
<br />
Feel free to join us on the [[IRC_Channel|IRC channel]]! We're on the EFnet network in a channel called '''#archiveteam''', where we say truly awful things.<br />
<!-- featured article ends --><br />
</tr><br />
<tr><th colspan=2><br />
<h2 style="margin:0;background-color:#a3b0bf;font-size:120%;font-weight:bold;border:1px solid #a3b0bf;text-align:left;color:#000;padding:0.2em 0.4em;">Currently Active Projects (Get Involved Here!)</h2><br />
</th></tr><br />
<tr><td style="color:#000" colspan=2><br />
<!-- active starts --><br />
* '''[[MobileMe]]''' - Apple's file storage and sharing service, currently hosting over 200 terabytes of data, is shutting down on June 30, 2012. <br />
** A distributed tracker and very easy-to-use scripts are in place.<br />
** More downloaders are needed for this project!<br />
** Track the download progress at http://memac.heroku.com/ '''235 terabytes''' and counting!<br />
* '''[[Tabblo]]''' - A site where users told stories with pictures. Closing May 30, 2012.<br />
** Very easy to use scripts are in place. <br />
* '''[[FanFiction.Net]]''' - Around 7 million fan-fiction stories hosted on what may be the largest site of its kind in the world. They're not shutting down but Archiveteam wants a copy for posterity. <br />
** Coders are currently needed to figure out an intelligent way to comprehensively archive the site. <br />
<!-- active ends --><br />
<tr><th colspan=2><br />
<h2 style="margin:0;background-color:#a3b0bf;font-size:120%;font-weight:bold;border:1px solid #a3b0bf;text-align:left;color:#000;padding:0.2em 0.4em;">Archive Team News</h2><br />
</th></tr><br />
<tr><td style="color:#000" colspan=2><br />
<!-- news starts --><br />
* '''May, 2012''': Tabblo.com announces its closure scheduled for May 30th, giving its userbase just ten days of warning. Archive Team is on the case. <br />
* '''May, 2012''': ArchiveTeam's save of [http://web.archive.org/web/20080607211809/http://crave.cnet.co.uk/0,39029477,49296926-10,00.htm Stage6], a defunct video sharing site run by DivX, Inc., is permanently preserved at the [http://archive.org/details/stage6 Internet Archive].<br />
* '''April, 2012''': 20 million Friendster accounts spanning 14 terabytes are successfully rescued for permanent storage by Archive Team. <br />
* '''March, 2012''': [http://www.dereferer.org/?http%3A%2F%2Ffortunecity.com FortuneCity] announced the end of its free-hosting model, threatening around one million user-generated websites.<br />
* '''March, 2012''': We've switched servers to one of those new-fangled hosting companies that aren't hacked. We're going to sell you a lot less in the way of medical supplies now.<br />
* '''December, 2011''': POE News says it will soon be nevermore: [http://www.poe-news.com/forums/sp.php?pi=1002546492 Announcement]<br />
* '''December, 2011''': GamePro magazine halts publication and their website goes dark.<br />
* '''November, 2011''': Archiveteam rescues over 1.3 million users' data from [[Splinder]]'s closure.<br />
* '''July 2011''': Archiveteam teaches you how to [[Rescuing_Floppy_Disks|rescue data from Floppy Disks]].<br />
* '''May, 2011''': [[Friendster]] is deleting everything at the end of the month.<br />
* '''May, 2011''': Archiveteam keeps it classy at [[poetry.com]].<br />
* '''April, 2011''': How about some [[Google Video]]?<br />
* '''March, 2011''': The [http://www.archive.org/details/personalarchiveconf 2011 Personal Digital Archiving Conference] talks are available.<br />
* '''February, 2011''': Let's watch some [[Yahoo! Video]]<br />
* '''December, 2010''': Archiveteam is Delicious!<br />
* '''October, 2010''': Archiveteam offers Geocities as a torrent.<br />
* '''December 23, 2009''': Yahoo shut down [[starwars.yahoo.com]]. We got a copy.<br />
* '''October, 2009''': [[Geocities]] closing is definitely the top of the charts.<br />
<!-- news ends --><br />
</td></tr><br />
</table><br />
<td style="width:40%;border:1px solid #cedff2;background-color:#f5faff;vertical-align:top"><br />
<table width="100%" cellpadding="2" cellspacing="5" style="vertical-align:top;background-color:#f5faff"><br />
<tr><th><br />
<h2 style="margin:0;background:#cedff2;font-size:120%;font-weight:bold;border:1px solid #a3b0bf;text-align:left;color:#000;padding:0.2em 0.4em;">What is What</h2><br />
</th></tr><br />
<tr><td style="color:#000"><br />
<!-- links starts --><br />
* [[Who We Are]] and how you can join our cause!<br />
<br />
* [[Deathwatch]] is where we keep track of sites that are sickly, dying or dead.<br />
<br />
* [[Fire Drill]] is where we keep track of sites that seem fine but a lot depends on them.<br />
<br />
* [[Projects]] is a comprehensive list of AT endeavors.<br />
<br />
* [[Philosophy]] describes the ideas underpinning our work.<br />
<br />
* [[Archives]] <br />
<!-- links ends --><br />
</td></tr><br />
<tr><th><br />
<h2 style="margin:0;background:#cedff2;font-size:120%;font-weight:bold;border:1px solid #a3b0bf;text-align:left;color:#000;padding:0.2em 0.4em;">Some Starting Points</h2><br />
</th></tr><br />
<tr><td style="color:#000"><br />
<br />
* [[Introduction|The Introduction]] is an overview of basic archiving methods.<br />
<br />
* [[Why Back Up?]] Because they don't care about you.<br />
<br />
* [[Facebook|Back Up your Facebook Data]] Learn how to liberate your personal data from Facebook.<br />
<br />
* [[Software]] will assist you in regaining control of your data by providing tools for information backup, archiving and distribution. <br />
<br />
* [[Formats]] will familiarise you with the various data formats, and how to ensure your files will be readable in the future.<br />
<br />
* [[Storage Media]] is about where to get it, what to get, and how to use it.<br />
<br />
* [[Recommended Reading]] links to other sites for further information.<br />
<br />
* [[Frequently Asked Questions]] is where we answer common questions.<br />
</td></tr><br />
<tr><th><br />
<h2 style="margin:0;background:#cedff2;font-size:120%;font-weight:bold;border:1px solid #a3b0bf;text-align:left;color:#000;padding:0.2em 0.4em;">Quote of the Moment</h2><br />
</th></tr><br />
<tr><td style="color:#000"><br />
<tr><td style="margin:20;background-color:#000000;font-size:200%;font-weight:bold;border:1px solid #a3b0bf;text-align:center;color:#fff;" ><br />
"[Yahoo!] found the way to destroy <br />
the most massive amount of history<br />
in the shortest amount of time <br />
with absolutely no recourse"<br />
</td></tr><br />
<tr><td style="text-align:right"><br />
[http://www.time.com/time/business/article/0,8599,1936645,00.html Internet Atrocity! GeoCities' Demise Erases Web History] <br />
<br>By Dan Fletcher, TIME Magazine, Monday, Nov. 09, 2009<br />
</td></tr><br />
</table><br />
</td></tr><br />
</table><br />
'''Archive Team is in no way affiliated with the fine folks at [http://www.archive.org ARCHIVE.ORG]'''<br />
'''Archive Team can always be reached at [mailto:archiveteam@archiveteam.org archiveteam@archiveteam.org]'''</div>Dnovahttps://wiki.archiveteam.org/index.php?title=Main_Page&diff=7779Main Page2012-05-17T02:19:32Z<p>Dnova: </p>
<hr />
<div>__NOTOC__<br />
<center><br />
<!-- [[Image:Jasonappeal.png|center|link=Introduction]]--><br />
<table style="width:100%;border-spacing:8px;margin:12px 0px 0px 0px"><br />
<tr><td style="width:60%;border:1px solid #FFB9B9;background-color:#FFFFF0;vertical-align:top;color:#000"><br />
<table class="thumb" width="100%" cellpadding="2" cellspacing="5" style="vertical-align:top;background-color:#FFFFF0;"><br />
<tr><td><br />
<td style="color:#000;text-align:left;vertical-align:top"><br />
<br />
=== HISTORY IS OUR FUTURE ===<br />
[[Image:Archiveteam.jpg|right|200px]]<br />
''And we've been trashing our history''<br />
<br />
Archive Team is a loose collective of rogue archivists, programmers, writers and loudmouths dedicated to saving our digital heritage. Since 2009 this variant force of nature has caught wind of shutdowns, shutoffs, mergers, and plain old deletions - and done our best to save the history before it's lost forever. Along the way, we've gotten attention, resistance, press and discussion, but most importantly, we've gotten the message out: ''IT DOESN'T HAVE TO BE THIS WAY''.<br />
<br />
This website is intended to be an offloading point and information depot for a number of archiving projects, all related to saving websites or data that is in danger of being lost. Besides serving as a hub for team-based pulling down and mirroring of data, this site will provide advice on managing your own data and rescuing it from the brink of destruction.<br />
<br />
Feel free to join us on the [[IRC_Channel|IRC channel]]! We're on the EFnet network in a channel called '''#archiveteam''', where we say truly awful things.<br />
<!-- featured article ends --><br />
</tr><br />
<tr><th colspan=2><br />
<h2 style="margin:0;background-color:#a3b0bf;font-size:120%;font-weight:bold;border:1px solid #a3b0bf;text-align:left;color:#000;padding:0.2em 0.4em;">Currently Active Projects (Get Involved Here!)</h2><br />
</th></tr><br />
<tr><td style="color:#000" colspan=2><br />
<!-- active starts --><br />
* '''[[MobileMe]]''' - Apple's file storage and sharing service, currently hosting over 200 terabytes of data, is shutting down on June 30, 2012. <br />
** A distributed tracker and very easy-to-use scripts are in place.<br />
** More downloaders are needed for this project!<br />
** Track the download progress at http://memac.heroku.com/ '''220 terabytes''' and counting!<br />
* '''[[FortuneCity]]''' - A "free" webhost founded in 1997 with around 1 million users. Closed on April 30, 2012.<br />
** Project essentially complete!<br />
* '''[[FanFiction.Net]]''' - Around 7 million fan-fiction stories hosted on what may be the largest site of its kind in the world. They're not shutting down but Archiveteam wants a copy for posterity. <br />
** Coders are currently needed to figure out an intelligent way to comprehensively archive the site. <br />
<!-- active ends --><br />
<tr><th colspan=2><br />
<h2 style="margin:0;background-color:#a3b0bf;font-size:120%;font-weight:bold;border:1px solid #a3b0bf;text-align:left;color:#000;padding:0.2em 0.4em;">Archive Team News</h2><br />
</th></tr><br />
<tr><td style="color:#000" colspan=2><br />
<!-- news starts --><br />
* '''May, 2012''': ArchiveTeam's save of [http://web.archive.org/web/20080607211809/http://crave.cnet.co.uk/0,39029477,49296926-10,00.htm Stage6], a defunct video sharing site run by DivX, Inc., is permanently preserved at the [http://archive.org/details/stage6 Internet Archive].<br />
* '''April, 2012''': 20 million Friendster accounts spanning 14 terabytes are successfully rescued for permanent storage by Archive Team. <br />
* '''March, 2012''': [http://www.dereferer.org/?http%3A%2F%2Ffortunecity.com FortuneCity] announced the end of its free-hosting model, threatening around one million user-generated websites.<br />
* '''March, 2012''': We've switched servers to one of those new-fangled hosting companies that aren't hacked. We're going to sell you a lot less in the way of medical supplies now.<br />
* '''December, 2011''': POE News says it will soon be nevermore: [http://www.poe-news.com/forums/sp.php?pi=1002546492 Announcement]<br />
* '''December, 2011''': GamePro magazine halts publication and their website goes dark.<br />
* '''November, 2011''': Archiveteam rescues over 1.3 million users' data from [[Splinder]]'s closure.<br />
* '''July 2011''': Archiveteam teaches you how to [[Rescuing_Floppy_Disks|rescue data from Floppy Disks]].<br />
* '''May, 2011''': [[Friendster]] is deleting everything at the end of the month.<br />
* '''May, 2011''': Archiveteam keeps it classy at [[poetry.com]].<br />
* '''April, 2011''': How about some [[Google Video]]?<br />
* '''March, 2011''': The [http://www.archive.org/details/personalarchiveconf 2011 Personal Digital Archiving Conference] talks are available.<br />
* '''February, 2011''': Let's watch some [[Yahoo! Video]]<br />
* '''December, 2010''': Archiveteam is Delicious!<br />
* '''October, 2010''': Archiveteam offers Geocities as a torrent.<br />
* '''December 23, 2009''': Yahoo shut down [[starwars.yahoo.com]]. We got a copy.<br />
* '''October, 2009''': [[Geocities]] closing is definitely the top of the charts.<br />
<!-- news ends --><br />
</td></tr><br />
</table><br />
<td style="width:40%;border:1px solid #cedff2;background-color:#f5faff;vertical-align:top"><br />
<table width="100%" cellpadding="2" cellspacing="5" style="vertical-align:top;background-color:#f5faff"><br />
<tr><th><br />
<h2 style="margin:0;background:#cedff2;font-size:120%;font-weight:bold;border:1px solid #a3b0bf;text-align:left;color:#000;padding:0.2em 0.4em;">What is What</h2><br />
</th></tr><br />
<tr><td style="color:#000"><br />
<!-- links starts --><br />
* [[Who We Are]] and how you can join our cause!<br />
<br />
* [[Deathwatch]] is where we keep track of sites that are sickly, dying or dead.<br />
<br />
* [[Fire Drill]] is where we keep track of sites that seem fine but a lot depends on them.<br />
<br />
* [[Projects]] is a comprehensive list of AT endeavors.<br />
<br />
* [[Philosophy]] describes the ideas underpinning our work.<br />
<br />
* [[Archives]] <br />
<!-- links ends --><br />
</td></tr><br />
<tr><th><br />
<h2 style="margin:0;background:#cedff2;font-size:120%;font-weight:bold;border:1px solid #a3b0bf;text-align:left;color:#000;padding:0.2em 0.4em;">Some Starting Points</h2><br />
</th></tr><br />
<tr><td style="color:#000"><br />
<br />
* [[Introduction|The Introduction]] is an overview of basic archiving methods.<br />
<br />
* [[Why Back Up?]] Because they don't care about you.<br />
<br />
* [[Facebook|Back Up your Facebook Data]] Learn how to liberate your personal data from Facebook.<br />
<br />
* [[Software]] will assist you in regaining control of your data by providing tools for information backup, archiving and distribution. <br />
<br />
* [[Formats]] will familiarise you with the various data formats, and how to ensure your files will be readable in the future.<br />
<br />
* [[Storage Media]] is about where to get it, what to get, and how to use it.<br />
<br />
* [[Recommended Reading]] links to other sites for further information.<br />
<br />
* [[Frequently Asked Questions]] is where we answer common questions.<br />
</td></tr><br />
<tr><th><br />
<h2 style="margin:0;background:#cedff2;font-size:120%;font-weight:bold;border:1px solid #a3b0bf;text-align:left;color:#000;padding:0.2em 0.4em;">Quote of the Moment</h2><br />
</th></tr><br />
<tr><td style="color:#000"><br />
<tr><td style="margin:20;background-color:#000000;font-size:200%;font-weight:bold;border:1px solid #a3b0bf;text-align:center;color:#fff;" ><br />
"[Yahoo!] found the way to destroy <br />
the most massive amount of history<br />
in the shortest amount of time <br />
with absolutely no recourse"<br />
</td></tr><br />
<tr><td style="text-align:right"><br />
[http://www.time.com/time/business/article/0,8599,1936645,00.html Internet Atrocity! GeoCities' Demise Erases Web History] <br />
<br>By Dan Fletcher, TIME Magazine, Monday, Nov. 09, 2009<br />
</td></tr><br />
</table><br />
</td></tr><br />
</table><br />
'''Archive Team is in no way affiliated with the fine folks at [http://www.archive.org ARCHIVE.ORG]'''<br />
'''Archive Team can always be reached at [mailto:archiveteam@archiveteam.org archiveteam@archiveteam.org]'''</div>Dnovahttps://wiki.archiveteam.org/index.php?title=Main_Page&diff=7687Main Page2012-05-08T00:07:44Z<p>Dnova: </p>
<hr />
<div>__NOTOC__<br />
<center><br />
<!-- [[Image:Jasonappeal.png|center|link=Introduction]]--><br />
<table style="width:100%;border-spacing:8px;margin:12px 0px 0px 0px"><br />
<tr><td style="width:60%;border:1px solid #FFB9B9;background-color:#FFFFF0;vertical-align:top;color:#000"><br />
<table class="thumb" width="100%" cellpadding="2" cellspacing="5" style="vertical-align:top;background-color:#FFFFF0;"><br />
<tr><td><br />
<td style="color:#000;text-align:left;vertical-align:top"><br />
<br />
=== HISTORY IS OUR FUTURE ===<br />
[[Image:Archiveteam.jpg|right|200px]]<br />
''And we've been trashing our history''<br />
<br />
Archive Team is a loose collective of rogue archivists, programmers, writers and loudmouths dedicated to saving our digital heritage. Since 2009 this variant force of nature has caught wind of shutdowns, shutoffs, mergers, and plain old deletions - and done our best to save the history before it's lost forever. Along the way, we've gotten attention, resistance, press and discussion, but most importantly, we've gotten the message out: ''IT DOESN'T HAVE TO BE THIS WAY''.<br />
<br />
This website is intended to be an offloading point and information depot for a number of archiving projects, all related to saving websites or data that is in danger of being lost. Besides serving as a hub for team-based pulling down and mirroring of data, this site will provide advice on managing your own data and rescuing it from the brink of destruction.<br />
<br />
Feel free to join us on the [[IRC_Channel|IRC channel]]! We're on the EFnet network in a channel called '''#archiveteam''', where we say truly awful things.<br />
<!-- featured article ends --><br />
</tr><br />
<tr><th colspan=2><br />
<h2 style="margin:0;background-color:#a3b0bf;font-size:120%;font-weight:bold;border:1px solid #a3b0bf;text-align:left;color:#000;padding:0.2em 0.4em;">Currently Active Projects (Get Involved Here!)</h2><br />
</th></tr><br />
<tr><td style="color:#000" colspan=2><br />
<!-- active starts --><br />
* '''[[MobileMe]]''' - Apple's file storage and sharing service, currently hosting over 200 terabytes of data, is shutting down on June 30, 2012. <br />
** A distributed tracker and very easy-to-use scripts are in place.<br />
** More downloaders are needed for this project!<br />
** Track the download progress at http://memac.heroku.com/ '''205 terabytes''' and counting!<br />
* '''[[FortuneCity]]''' - A "free" webhost founded in 1997 with around 1 million users. Closed on April 30, 2012.<br />
** Project essentially complete!<br />
* '''[[FanFiction.Net]]''' - Around 7 million fan-fiction stories hosted on what may be the largest site of its kind in the world. They're not shutting down but Archiveteam wants a copy for posterity. <br />
** Coders are currently needed to figure out an intelligent way to comprehensively archive the site. <br />
<!-- active ends --><br />
<tr><th colspan=2><br />
<h2 style="margin:0;background-color:#a3b0bf;font-size:120%;font-weight:bold;border:1px solid #a3b0bf;text-align:left;color:#000;padding:0.2em 0.4em;">Archive Team News</h2><br />
</th></tr><br />
<tr><td style="color:#000" colspan=2><br />
<!-- news starts --><br />
* '''May, 2012''': ArchiveTeam's save of [http://web.archive.org/web/20080607211809/http://crave.cnet.co.uk/0,39029477,49296926-10,00.htm Stage6], a defunct video sharing site run by DivX, Inc., is permanently preserved at the [http://archive.org/details/stage6 Internet Archive].<br />
* '''April, 2012''': 20 million Friendster accounts spanning 14 terabytes are successfully rescued for permanent storage by Archive Team. <br />
* '''March, 2012''': [http://www.dereferer.org/?http%3A%2F%2Ffortunecity.com FortuneCity] announced the end of its free-hosting model, threatening around one million user-generated websites.<br />
* '''March, 2012''': We've switched servers to one of those new-fangled hosting companies that aren't hacked. We're going to sell you a lot less in the way of medical supplies now.<br />
* '''December, 2011''': POE News says it will soon be nevermore: [http://www.poe-news.com/forums/sp.php?pi=1002546492 Announcement]<br />
* '''December, 2011''': GamePro magazine halts publication and their website goes dark.<br />
* '''November, 2011''': Archiveteam rescues over 1.3 million users' data from [[Splinder]]'s closure.<br />
* '''July 2011''': Archiveteam teaches you how to [[Rescuing_Floppy_Disks|rescue data from Floppy Disks]].<br />
* '''May, 2011''': [[Friendster]] is deleting everything at the end of the month.<br />
* '''May, 2011''': Archiveteam keeps it classy at [[poetry.com]].<br />
* '''April, 2011''': How about some [[Google Video]]?<br />
* '''March, 2011''': The [http://www.archive.org/details/personalarchiveconf 2011 Personal Digital Archiving Conference] talks are available.<br />
* '''February, 2011''': Let's watch some [[Yahoo! Video]]<br />
* '''December, 2010''': Archiveteam is Delicious!<br />
* '''October, 2010''': Archiveteam offers Geocities as a torrent.<br />
* '''December 23, 2009''': Yahoo shut down [[starwars.yahoo.com]]. We got a copy.<br />
* '''October, 2009''': [[Geocities]] closing is definitely the top of the charts.<br />
<!-- news ends --><br />
</td></tr><br />
</table><br />
<td style="width:40%;border:1px solid #cedff2;background-color:#f5faff;vertical-align:top"><br />
<table width="100%" cellpadding="2" cellspacing="5" style="vertical-align:top;background-color:#f5faff"><br />
<tr><th><br />
<h2 style="margin:0;background:#cedff2;font-size:120%;font-weight:bold;border:1px solid #a3b0bf;text-align:left;color:#000;padding:0.2em 0.4em;">What is What</h2><br />
</th></tr><br />
<tr><td style="color:#000"><br />
<!-- links starts --><br />
* [[Who We Are]] and how you can join our cause!<br />
<br />
* [[Deathwatch]] is where we keep track of sites that are sickly, dying or dead.<br />
<br />
* [[Fire Drill]] is where we keep track of sites that seem fine but a lot depends on them.<br />
<br />
* [[Projects]] is a comprehensive list of AT endeavors.<br />
<br />
* [[Philosophy]] describes the ideas underpinning our work.<br />
<br />
* [[Archives]] <br />
<!-- links ends --><br />
</td></tr><br />
<tr><th><br />
<h2 style="margin:0;background:#cedff2;font-size:120%;font-weight:bold;border:1px solid #a3b0bf;text-align:left;color:#000;padding:0.2em 0.4em;">Some Starting Points</h2><br />
</th></tr><br />
<tr><td style="color:#000"><br />
<br />
* [[Introduction|The Introduction]] is an overview of basic archiving methods.<br />
<br />
* [[Why Back Up?]] Because they don't care about you.<br />
<br />
* [[Facebook|Back Up your Facebook Data]] Learn how to liberate your personal data from Facebook.<br />
<br />
* [[Software]] will assist you in regaining control of your data by providing tools for information backup, archiving and distribution. <br />
<br />
* [[Formats]] will familiarise you with the various data formats, and how to ensure your files will be readable in the future.<br />
<br />
* [[Storage Media]] is about where to get it, what to get, and how to use it.<br />
<br />
* [[Recommended Reading]] links to other sites for further information.<br />
<br />
* [[Frequently Asked Questions]] is where we answer common questions.<br />
</td></tr><br />
<tr><th><br />
<h2 style="margin:0;background:#cedff2;font-size:120%;font-weight:bold;border:1px solid #a3b0bf;text-align:left;color:#000;padding:0.2em 0.4em;">Quote of the Moment</h2><br />
</th></tr><br />
<tr><td style="color:#000"><br />
<tr><td style="margin:20;background-color:#000000;font-size:200%;font-weight:bold;border:1px solid #a3b0bf;text-align:center;color:#fff;" ><br />
"[Yahoo!] found the way to destroy <br />
the most massive amount of history<br />
in the shortest amount of time <br />
with absolutely no recourse"<br />
</td></tr><br />
<tr><td style="text-align:right"><br />
[http://www.time.com/time/business/article/0,8599,1936645,00.html Internet Atrocity! GeoCities' Demise Erases Web History] <br />
<br>By Dan Fletcher, TIME Magazine, Monday, Nov. 09, 2009<br />
</td></tr><br />
</table><br />
</td></tr><br />
</table><br />
'''Archive Team is in no way affiliated with the fine folks at [http://www.archive.org ARCHIVE.ORG]'''<br />
'''Archive Team can always be reached at [mailto:archiveteam@archiveteam.org archiveteam@archiveteam.org]'''</div>Dnovahttps://wiki.archiveteam.org/index.php?title=Main_Page&diff=7686Main Page2012-05-07T23:57:52Z<p>Dnova: </p>
<hr />
<div>__NOTOC__<br />
<center><br />
<!-- [[Image:Jasonappeal.png|center|link=Introduction]]--><br />
<table style="width:100%;border-spacing:8px;margin:12px 0px 0px 0px"><br />
<tr><td style="width:60%;border:1px solid #FFB9B9;background-color:#FFFFF0;vertical-align:top;color:#000"><br />
<table class="thumb" width="100%" cellpadding="2" cellspacing="5" style="vertical-align:top;background-color:#FFFFF0;"><br />
<tr><td><br />
<td style="color:#000;text-align:left;vertical-align:top"><br />
<br />
=== HISTORY IS OUR FUTURE ===<br />
[[Image:Archiveteam.jpg|right|200px]]<br />
''And we've been trashing our history''<br />
<br />
Archive Team is a loose collective of rogue archivists, programmers, writers and loudmouths dedicated to saving our digital heritage. Since 2009 this variant force of nature has caught wind of shutdowns, shutoffs, mergers, and plain old deletions - and done our best to save the history before it's lost forever. Along the way, we've gotten attention, resistance, press and discussion, but most importantly, we've gotten the message out: ''IT DOESN'T HAVE TO BE THIS WAY''.<br />
<br />
This website is intended to be an offloading point and information depot for a number of archiving projects, all related to saving websites or data that is in danger of being lost. Besides serving as a hub for team-based pulling down and mirroring of data, this site will provide advice on managing your own data and rescuing it from the brink of destruction.<br />
<br />
Feel free to join us on the [[IRC_Channel|IRC channel]]! We're on the EFnet network in a channel called '''#archiveteam''', where we say truly awful things.<br />
<!-- featured article ends --><br />
</tr><br />
<tr><th colspan=2><br />
<h2 style="margin:0;background-color:#a3b0bf;font-size:120%;font-weight:bold;border:1px solid #a3b0bf;text-align:left;color:#000;padding:0.2em 0.4em;">Currently Active Projects (Get Involved Here!)</h2><br />
</th></tr><br />
<tr><td style="color:#000" colspan=2><br />
<!-- active starts --><br />
* '''[[MobileMe]]''' - Apple's file storage and sharing service, currently hosting over 200 terabytes of data, is shutting down on June 30, 2012. <br />
** A distributed tracker and very easy-to-use scripts are in place.<br />
** More downloaders are needed for this project!<br />
** Track the download progress at http://memac.heroku.com/ ('''205 terabytes''' and counting!)<br />
* '''[[FortuneCity]]''' - A "free" webhost founded in 1997 with around 1 million users. Going non-free/closing on April 30, 2012.<br />
** Project essentially complete!<br />
* '''[[FanFiction.Net]]''' - Around 7 million fan-fiction stories hosted on what may be the largest site of its kind in the world. They're not shutting down, but Archiveteam wants a copy for posterity.<br />
** Coders are currently needed to figure out an intelligent way to comprehensively archive the site. <br />
<!-- active ends --><br />
<tr><th colspan=2><br />
<h2 style="margin:0;background-color:#a3b0bf;font-size:120%;font-weight:bold;border:1px solid #a3b0bf;text-align:left;color:#000;padding:0.2em 0.4em;">Archive Team News</h2><br />
</th></tr><br />
<tr><td style="color:#000" colspan=2><br />
<!-- news starts --><br />
* '''May, 2012''': ArchiveTeam's save of [http://web.archive.org/web/20080607211809/http://crave.cnet.co.uk/0,39029477,49296926-10,00.htm Stage6], a defunct video-sharing site run by DivX, Inc., is permanently preserved at the [http://archive.org/details/stage6 Internet Archive].<br />
* '''April, 2012''': 20 million Friendster accounts spanning 14 terabytes are successfully rescued for permanent storage by Archive Team. <br />
* '''March, 2012''': [http://www.dereferer.org/?http%3A%2F%2Ffortunecity.com FortuneCity] announced the end of its free-hosting model, threatening around one million user-generated websites.<br />
* '''March, 2012''': We've switched servers to one of those new-fangled hosting companies that aren't hacked. We're going to sell you a lot less in the way of medical supplies now.<br />
* '''December, 2011''': POE News says it will soon be nevermore: [http://www.poe-news.com/forums/sp.php?pi=1002546492 Announcement]<br />
* '''December, 2011''': GamePro magazine halts publication and their website goes dark.<br />
* '''November, 2011''': Archiveteam rescues over 1.3 million users' data from [[Splinder]]'s closure.<br />
* '''July, 2011''': Archiveteam teaches you how to [[Rescuing_Floppy_Disks|rescue data from Floppy Disks]].<br />
* '''May, 2011''': [[Friendster]] is deleting everything at the end of the month.<br />
* '''May, 2011''': Archiveteam keeps it classy at [[poetry.com]].<br />
* '''April, 2011''': How about some [[Google Video]]?<br />
* '''March, 2011''': The [http://www.archive.org/details/personalarchiveconf 2011 Personal Digital Archiving Conference] talks are available.<br />
* '''February, 2011''': Let's watch some [[Yahoo! Video]]<br />
* '''December, 2010''': Archiveteam is Delicious!<br />
* '''October, 2010''': Archiveteam offers Geocities as a torrent.<br />
* '''December 23, 2009''': Yahoo shut down [[starwars.yahoo.com]]. We got a copy.<br />
* '''October, 2009''': [[Geocities]] closing is definitely the top of the charts.<br />
<!-- news ends --><br />
</td></tr><br />
</table><br />
<td style="width:40%;border:1px solid #cedff2;background-color:#f5faff;vertical-align:top"><br />
<table width="100%" cellpadding="2" cellspacing="5" style="vertical-align:top;background-color:#f5faff"><br />
<tr><th><br />
<h2 style="margin:0;background:#cedff2;font-size:120%;font-weight:bold;border:1px solid #a3b0bf;text-align:left;color:#000;padding:0.2em 0.4em;">What is What</h2><br />
</th></tr><br />
<tr><td style="color:#000"><br />
<!-- links starts --><br />
* [[Who We Are]] and how you can join our cause!<br />
<br />
* [[Deathwatch]] is where we keep track of sites that are sickly, dying or dead.<br />
<br />
* [[Fire Drill]] is where we keep track of sites that seem fine but a lot depends on them.<br />
<br />
* [[Projects]] is a comprehensive list of AT endeavors.<br />
<br />
* [[Philosophy]] describes the ideas underpinning our work.<br />
<br />
* [[Archives]] <br />
<!-- links ends --><br />
</td></tr><br />
<tr><th><br />
<h2 style="margin:0;background:#cedff2;font-size:120%;font-weight:bold;border:1px solid #a3b0bf;text-align:left;color:#000;padding:0.2em 0.4em;">Some Starting Points</h2><br />
</th></tr><br />
<tr><td style="color:#000"><br />
<br />
* [[Introduction|The Introduction]] is an overview of basic archiving methods.<br />
<br />
* [[Why Back Up?]] Because they don't care about you.<br />
<br />
* [[Facebook|Back Up your Facebook Data]] Learn how to liberate your personal data from Facebook.<br />
<br />
* [[Software]] will assist you in regaining control of your data by providing tools for information backup, archiving and distribution. <br />
<br />
* [[Formats]] will familiarise you with the various data formats, and how to ensure your files will be readable in the future.<br />
<br />
* [[Storage Media]] is about where to get it, what to get, and how to use it.<br />
<br />
* [[Recommended Reading]] links to other sites for further information.<br />
<br />
* [[Frequently Asked Questions]] is where we answer common questions.<br />
</td></tr><br />
<tr><th><br />
<h2 style="margin:0;background:#cedff2;font-size:120%;font-weight:bold;border:1px solid #a3b0bf;text-align:left;color:#000;padding:0.2em 0.4em;">Quote of the Moment</h2><br />
</th></tr><br />
<tr><td style="color:#000"><br />
<tr><td style="margin:20;background-color:#000000;font-size:200%;font-weight:bold;border:1px solid #a3b0bf;text-align:center;color:#fff;" ><br />
"[Yahoo!] found the way to destroy <br />
the most massive amount of history<br />
in the shortest amount of time <br />
with absolutely no recourse"<br />
</td></tr><br />
<tr><td style="text-align:right"><br />
[http://www.time.com/time/business/article/0,8599,1936645,00.html Internet Atrocity! GeoCities' Demise Erases Web History] <br />
<br>By Dan Fletcher, TIME Magazine, Monday, Nov. 09, 2009<br />
</td></tr><br />
</table><br />
</td></tr><br />
</table><br />
'''Archive Team is in no way affiliated with the fine folks at [http://www.archive.org ARCHIVE.ORG]'''<br />
'''Archive Team can always be reached at [mailto:archiveteam@archiveteam.org archiveteam@archiveteam.org]'''</div>Dnovahttps://wiki.archiveteam.org/index.php?title=Main_Page&diff=7680Main Page2012-05-07T18:29:18Z<p>Dnova: </p>
<hr />
<div>__NOTOC__<br />
<center><br />
<!-- [[Image:Jasonappeal.png|center|link=Introduction]]--><br />
<table style="width:100%;border-spacing:8px;margin:12px 0px 0px 0px"><br />
<tr><td style="width:60%;border:1px solid #FFB9B9;background-color:#FFFFF0;vertical-align:top;color:#000"><br />
<table class="thumb" width="100%" cellpadding="2" cellspacing="5" style="vertical-align:top;background-color:#FFFFF0;"><br />
<tr><td><br />
<td style="color:#000;text-align:left;vertical-align:top"><br />
<br />
=== HISTORY IS OUR FUTURE ===<br />
[[Image:Archiveteam.jpg|right|200px]]<br />
''And we've been trashing our history''<br />
<br />
Archive Team is a loose collective of rogue archivists, programmers, writers and loudmouths dedicated to saving our digital heritage. Since 2009 this variant force of nature has caught wind of shutdowns, shutoffs, mergers, and plain old deletions - and done our best to save the history before it's lost forever. Along the way, we've gotten attention, resistance, press and discussion, but most importantly, we've gotten the message out: ''IT DOESN'T HAVE TO BE THIS WAY''.<br />
<br />
This website is intended to be an offloading point and information depot for a number of archiving projects, all related to saving websites or data that is in danger of being lost. Besides serving as a hub for team-based pulling down and mirroring of data, this site will provide advice on managing your own data and rescuing it from the brink of destruction.<br />
<br />
Feel free to join us on the [[IRC_Channel|IRC channel]]! We're on the EFnet network in a channel called '''#archiveteam''', where we say truly awful things.<br />
<!-- featured article ends --><br />
</tr><br />
<tr><th colspan=2><br />
<h2 style="margin:0;background-color:#a3b0bf;font-size:120%;font-weight:bold;border:1px solid #a3b0bf;text-align:left;color:#000;padding:0.2em 0.4em;">Currently Active Projects (Get Involved Here!)</h2><br />
</th></tr><br />
<tr><td style="color:#000" colspan=2><br />
<!-- active starts --><br />
* '''[[MobileMe]]''' - Apple's file storage and sharing service, currently hosting over 200 terabytes of data, is shutting down on June 30, 2012. <br />
** A distributed tracker and very easy-to-use scripts are in place.<br />
** More downloaders are needed for this project!<br />
** Track the download progress at http://memac.heroku.com/ ('''205 terabytes''' and counting!)<br />
* '''[[FortuneCity]]''' - A "free" webhost founded in 1997 with around 1 million users. Going non-free/closing on April 30, 2012.<br />
** Project essentially complete!<br />
* '''[[FanFiction.Net]]''' - Around 7 million fan-fiction stories hosted on what may be the largest site of its kind in the world. They're not shutting down, but Archiveteam wants a copy for posterity.<br />
** Coders are currently needed to figure out an intelligent way to comprehensively archive the site. <br />
<!-- active ends --><br />
<tr><th colspan=2><br />
<h2 style="margin:0;background-color:#a3b0bf;font-size:120%;font-weight:bold;border:1px solid #a3b0bf;text-align:left;color:#000;padding:0.2em 0.4em;">Archive Team News</h2><br />
</th></tr><br />
<tr><td style="color:#000" colspan=2><br />
<!-- news starts --><br />
* '''April, 2012''': 20 million Friendster accounts spanning 14 terabytes are successfully rescued for permanent storage by Archive Team. <br />
* '''March, 2012''': [http://www.dereferer.org/?http%3A%2F%2Ffortunecity.com FortuneCity] announced the end of its free-hosting model, threatening around one million user-generated websites.<br />
* '''March, 2012''': We've switched servers to one of those new-fangled hosting companies that aren't hacked. We're going to sell you a lot less in the way of medical supplies now.<br />
* '''December, 2011''': POE News says it will soon be nevermore: [http://www.poe-news.com/forums/sp.php?pi=1002546492 Announcement]<br />
* '''December, 2011''': GamePro magazine halts publication and their website goes dark.<br />
* '''November, 2011''': Archiveteam rescues over 1.3 million users' data from [[Splinder]]'s closure.<br />
* '''July, 2011''': Archiveteam teaches you how to [[Rescuing_Floppy_Disks|rescue data from Floppy Disks]].<br />
* '''May, 2011''': [[Friendster]] is deleting everything at the end of the month.<br />
* '''May, 2011''': Archiveteam keeps it classy at [[poetry.com]].<br />
* '''April, 2011''': How about some [[Google Video]]?<br />
* '''March, 2011''': The [http://www.archive.org/details/personalarchiveconf 2011 Personal Digital Archiving Conference] talks are available.<br />
* '''February, 2011''': Let's watch some [[Yahoo! Video]]<br />
* '''December, 2010''': Archiveteam is Delicious!<br />
* '''October, 2010''': Archiveteam offers Geocities as a torrent.<br />
* '''December 23, 2009''': Yahoo shut down [[starwars.yahoo.com]]. We got a copy.<br />
* '''October, 2009''': [[Geocities]] closing is definitely the top of the charts.<br />
<!-- news ends --><br />
</td></tr><br />
</table><br />
<td style="width:40%;border:1px solid #cedff2;background-color:#f5faff;vertical-align:top"><br />
<table width="100%" cellpadding="2" cellspacing="5" style="vertical-align:top;background-color:#f5faff"><br />
<tr><th><br />
<h2 style="margin:0;background:#cedff2;font-size:120%;font-weight:bold;border:1px solid #a3b0bf;text-align:left;color:#000;padding:0.2em 0.4em;">What is What</h2><br />
</th></tr><br />
<tr><td style="color:#000"><br />
<!-- links starts --><br />
* [[Who We Are]] and how you can join our cause!<br />
<br />
* [[Deathwatch]] is where we keep track of sites that are sickly, dying or dead.<br />
<br />
* [[Fire Drill]] is where we keep track of sites that seem fine but a lot depends on them.<br />
<br />
* [[Projects]] is a comprehensive list of AT endeavors.<br />
<br />
* [[Philosophy]] describes the ideas underpinning our work.<br />
<br />
* [[Archives]] <br />
<!-- links ends --><br />
</td></tr><br />
<tr><th><br />
<h2 style="margin:0;background:#cedff2;font-size:120%;font-weight:bold;border:1px solid #a3b0bf;text-align:left;color:#000;padding:0.2em 0.4em;">Some Starting Points</h2><br />
</th></tr><br />
<tr><td style="color:#000"><br />
<br />
* [[Introduction|The Introduction]] is an overview of basic archiving methods.<br />
<br />
* [[Why Back Up?]] Because they don't care about you.<br />
<br />
* [[Facebook|Back Up your Facebook Data]] Learn how to liberate your personal data from Facebook.<br />
<br />
* [[Software]] will assist you in regaining control of your data by providing tools for information backup, archiving and distribution. <br />
<br />
* [[Formats]] will familiarise you with the various data formats, and how to ensure your files will be readable in the future.<br />
<br />
* [[Storage Media]] is about where to get it, what to get, and how to use it.<br />
<br />
* [[Recommended Reading]] links to other sites for further information.<br />
<br />
* [[Frequently Asked Questions]] is where we answer common questions.<br />
</td></tr><br />
<tr><th><br />
<h2 style="margin:0;background:#cedff2;font-size:120%;font-weight:bold;border:1px solid #a3b0bf;text-align:left;color:#000;padding:0.2em 0.4em;">Quote of the Moment</h2><br />
</th></tr><br />
<tr><td style="color:#000"><br />
<tr><td style="margin:20;background-color:#000000;font-size:200%;font-weight:bold;border:1px solid #a3b0bf;text-align:center;color:#fff;" ><br />
"[Yahoo!] found the way to destroy <br />
the most massive amount of history<br />
in the shortest amount of time <br />
with absolutely no recourse"<br />
</td></tr><br />
<tr><td style="text-align:right"><br />
[http://www.time.com/time/business/article/0,8599,1936645,00.html Internet Atrocity! GeoCities' Demise Erases Web History] <br />
<br>By Dan Fletcher, TIME Magazine, Monday, Nov. 09, 2009<br />
</td></tr><br />
</table><br />
</td></tr><br />
</table><br />
'''Archive Team is in no way affiliated with the fine folks at [http://www.archive.org ARCHIVE.ORG]'''<br />
'''Archive Team can always be reached at [mailto:archiveteam@archiveteam.org archiveteam@archiveteam.org]'''</div>Dnovahttps://wiki.archiveteam.org/index.php?title=Main_Page&diff=7679Main Page2012-05-07T16:45:09Z<p>Dnova: new mobileme tally and shutdown date</p>
<hr />
<div>__NOTOC__<br />
<center><br />
<!-- [[Image:Jasonappeal.png|center|link=Introduction]]--><br />
<table style="width:100%;border-spacing:8px;margin:12px 0px 0px 0px"><br />
<tr><td style="width:60%;border:1px solid #FFB9B9;background-color:#FFFFF0;vertical-align:top;color:#000"><br />
<table class="thumb" width="100%" cellpadding="2" cellspacing="5" style="vertical-align:top;background-color:#FFFFF0;"><br />
<tr><td><br />
<td style="color:#000;text-align:left;vertical-align:top"><br />
<br />
=== HISTORY IS OUR FUTURE ===<br />
[[Image:Archiveteam.jpg|right|200px]]<br />
''And we've been trashing our history''<br />
<br />
Archive Team is a loose collective of rogue archivists, programmers, writers and loudmouths dedicated to saving our digital heritage. Since 2009 this variant force of nature has caught wind of shutdowns, shutoffs, mergers, and plain old deletions - and done our best to save the history before it's lost forever. Along the way, we've gotten attention, resistance, press and discussion, but most importantly, we've gotten the message out: ''IT DOESN'T HAVE TO BE THIS WAY''.<br />
<br />
This website is intended to be an offloading point and information depot for a number of archiving projects, all related to saving websites or data that is in danger of being lost. Besides serving as a hub for team-based pulling down and mirroring of data, this site will provide advice on managing your own data and rescuing it from the brink of destruction.<br />
<br />
Feel free to join us on the [[IRC_Channel|IRC channel]]! We're on the EFnet network in a channel called '''#archiveteam''', where we say truly awful things.<br />
<!-- featured article ends --><br />
</tr><br />
<tr><th colspan=2><br />
<h2 style="margin:0;background-color:#a3b0bf;font-size:120%;font-weight:bold;border:1px solid #a3b0bf;text-align:left;color:#000;padding:0.2em 0.4em;">Currently Active Projects (Get Involved Here!)</h2><br />
</th></tr><br />
<tr><td style="color:#000" colspan=2><br />
<!-- active starts --><br />
* '''[[MobileMe]]''' - Apple's file storage and sharing service, currently hosting over 200 terabytes of data, is shutting down on September 30, 2012.<br />
** A distributed tracker and very easy-to-use scripts are in place.<br />
** More downloaders are needed for this project!<br />
** Track the download progress at http://memac.heroku.com/ ('''205 terabytes''' and counting!)<br />
* '''[[FortuneCity]]''' - A "free" webhost founded in 1997 with around 1 million users. Going non-free/closing on April 30, 2012.<br />
** Project essentially complete!<br />
* '''[[FanFiction.Net]]''' - Around 7 million fan-fiction stories hosted on what may be the largest site of its kind in the world. They're not shutting down, but Archiveteam wants a copy for posterity.<br />
** Coders are currently needed to figure out an intelligent way to comprehensively archive the site. <br />
<!-- active ends --><br />
<tr><th colspan=2><br />
<h2 style="margin:0;background-color:#a3b0bf;font-size:120%;font-weight:bold;border:1px solid #a3b0bf;text-align:left;color:#000;padding:0.2em 0.4em;">Archive Team News</h2><br />
</th></tr><br />
<tr><td style="color:#000" colspan=2><br />
<!-- news starts --><br />
* '''April, 2012''': 20 million Friendster accounts spanning 14 terabytes are successfully rescued for permanent storage by Archive Team. <br />
* '''March, 2012''': [http://www.dereferer.org/?http%3A%2F%2Ffortunecity.com FortuneCity] announced the end of its free-hosting model, threatening around one million user-generated websites.<br />
* '''March, 2012''': We've switched servers to one of those new-fangled hosting companies that aren't hacked. We're going to sell you a lot less in the way of medical supplies now.<br />
* '''December, 2011''': POE News says it will soon be nevermore: [http://www.poe-news.com/forums/sp.php?pi=1002546492 Announcement]<br />
* '''December, 2011''': GamePro magazine halts publication and their website goes dark.<br />
* '''November, 2011''': Archiveteam rescues over 1.3 million users' data from [[Splinder]]'s closure.<br />
* '''July, 2011''': Archiveteam teaches you how to [[Rescuing_Floppy_Disks|rescue data from Floppy Disks]].<br />
* '''May, 2011''': [[Friendster]] is deleting everything at the end of the month.<br />
* '''May, 2011''': Archiveteam keeps it classy at [[poetry.com]].<br />
* '''April, 2011''': How about some [[Google Video]]?<br />
* '''March, 2011''': The [http://www.archive.org/details/personalarchiveconf 2011 Personal Digital Archiving Conference] talks are available.<br />
* '''February, 2011''': Let's watch some [[Yahoo! Video]]<br />
* '''December, 2010''': Archiveteam is Delicious!<br />
* '''October, 2010''': Archiveteam offers Geocities as a torrent.<br />
* '''December 23, 2009''': Yahoo shut down [[starwars.yahoo.com]]. We got a copy.<br />
* '''October, 2009''': [[Geocities]] closing is definitely the top of the charts.<br />
<!-- news ends --><br />
</td></tr><br />
</table><br />
<td style="width:40%;border:1px solid #cedff2;background-color:#f5faff;vertical-align:top"><br />
<table width="100%" cellpadding="2" cellspacing="5" style="vertical-align:top;background-color:#f5faff"><br />
<tr><th><br />
<h2 style="margin:0;background:#cedff2;font-size:120%;font-weight:bold;border:1px solid #a3b0bf;text-align:left;color:#000;padding:0.2em 0.4em;">What is What</h2><br />
</th></tr><br />
<tr><td style="color:#000"><br />
<!-- links starts --><br />
* [[Who We Are]] and how you can join our cause!<br />
<br />
* [[Deathwatch]] is where we keep track of sites that are sickly, dying or dead.<br />
<br />
* [[Fire Drill]] is where we keep track of sites that seem fine but a lot depends on them.<br />
<br />
* [[Projects]] is a comprehensive list of AT endeavors.<br />
<br />
* [[Philosophy]] describes the ideas underpinning our work.<br />
<br />
* [[Archives]] <br />
<!-- links ends --><br />
</td></tr><br />
<tr><th><br />
<h2 style="margin:0;background:#cedff2;font-size:120%;font-weight:bold;border:1px solid #a3b0bf;text-align:left;color:#000;padding:0.2em 0.4em;">Some Starting Points</h2><br />
</th></tr><br />
<tr><td style="color:#000"><br />
<br />
* [[Introduction|The Introduction]] is an overview of basic archiving methods.<br />
<br />
* [[Why Back Up?]] Because they don't care about you.<br />
<br />
* [[Facebook|Back Up your Facebook Data]] Learn how to liberate your personal data from Facebook.<br />
<br />
* [[Software]] will assist you in regaining control of your data by providing tools for information backup, archiving and distribution. <br />
<br />
* [[Formats]] will familiarise you with the various data formats, and how to ensure your files will be readable in the future.<br />
<br />
* [[Storage Media]] is about where to get it, what to get, and how to use it.<br />
<br />
* [[Recommended Reading]] links to other sites for further information.<br />
<br />
* [[Frequently Asked Questions]] is where we answer common questions.<br />
</td></tr><br />
<tr><th><br />
<h2 style="margin:0;background:#cedff2;font-size:120%;font-weight:bold;border:1px solid #a3b0bf;text-align:left;color:#000;padding:0.2em 0.4em;">Quote of the Moment</h2><br />
</th></tr><br />
<tr><td style="color:#000"><br />
<tr><td style="margin:20;background-color:#000000;font-size:200%;font-weight:bold;border:1px solid #a3b0bf;text-align:center;color:#fff;" ><br />
"[Yahoo!] found the way to destroy <br />
the most massive amount of history<br />
in the shortest amount of time <br />
with absolutely no recourse"<br />
</td></tr><br />
<tr><td style="text-align:right"><br />
[http://www.time.com/time/business/article/0,8599,1936645,00.html Internet Atrocity! GeoCities' Demise Erases Web History] <br />
<br>By Dan Fletcher, TIME Magazine, Monday, Nov. 09, 2009<br />
</td></tr><br />
</table><br />
</td></tr><br />
</table><br />
'''Archive Team is in no way affiliated with the fine folks at [http://www.archive.org ARCHIVE.ORG]'''<br />
'''Archive Team can always be reached at [mailto:archiveteam@archiveteam.org archiveteam@archiveteam.org]'''</div>Dnovahttps://wiki.archiveteam.org/index.php?title=Main_Page&diff=7646Main Page2012-04-24T17:53:38Z<p>Dnova: </p>
<hr />
<div>__NOTOC__<br />
<center><br />
<!-- [[Image:Jasonappeal.png|center|link=Introduction]]--><br />
<table style="width:100%;border-spacing:8px;margin:12px 0px 0px 0px"><br />
<tr><td style="width:60%;border:1px solid #FFB9B9;background-color:#FFFFF0;vertical-align:top;color:#000"><br />
<table class="thumb" width="100%" cellpadding="2" cellspacing="5" style="vertical-align:top;background-color:#FFFFF0;"><br />
<tr><td><br />
<td style="color:#000;text-align:left;vertical-align:top"><br />
<br />
=== HISTORY IS OUR FUTURE ===<br />
[[Image:Archiveteam.jpg|right|200px]]<br />
''And we've been trashing our history''<br />
<br />
Archive Team is a loose collective of rogue archivists, programmers, writers and loudmouths dedicated to saving our digital heritage. Since 2009 this variant force of nature has caught wind of shutdowns, shutoffs, mergers, and plain old deletions - and done our best to save the history before it's lost forever. Along the way, we've gotten attention, resistance, press and discussion, but most importantly, we've gotten the message out: ''IT DOESN'T HAVE TO BE THIS WAY''.<br />
<br />
This website is intended to be an offloading point and information depot for a number of archiving projects, all related to saving websites or data that is in danger of being lost. Besides serving as a hub for team-based pulling down and mirroring of data, this site will provide advice on managing your own data and rescuing it from the brink of destruction.<br />
<br />
Feel free to join us on the [[IRC_Channel|IRC channel]]! We're on the EFnet network in a channel called '''#archiveteam''', where we say truly awful things.<br />
<!-- featured article ends --><br />
</tr><br />
<tr><th colspan=2><br />
<h2 style="margin:0;background-color:#a3b0bf;font-size:120%;font-weight:bold;border:1px solid #a3b0bf;text-align:left;color:#000;padding:0.2em 0.4em;">Currently Active Projects (Get Involved Here!)</h2><br />
</th></tr><br />
<tr><td style="color:#000" colspan=2><br />
<!-- active starts --><br />
* '''[[MobileMe]]''' - Apple's file storage and sharing service, currently hosting over 200 terabytes of data, is shutting down on June 30, 2012.<br />
** A distributed tracker and very easy-to-use scripts are in place.<br />
** More downloaders are needed for this project!<br />
** Track the download progress at http://memac.heroku.com/ ('''160 terabytes''' and counting!)<br />
* '''[[FortuneCity]]''' - A "free" webhost founded in 1997 with around 1 million users. Going non-free/closing on April 30, 2012.<br />
** Project essentially complete!<br />
* '''[[FanFiction.Net]]''' - Around 7 million fan-fiction stories hosted on what may be the largest site of its kind in the world. They're not shutting down, but Archiveteam wants a copy for posterity.<br />
** Coders are currently needed to figure out an intelligent way to comprehensively archive the site. <br />
<!-- active ends --><br />
<tr><th colspan=2><br />
<h2 style="margin:0;background-color:#a3b0bf;font-size:120%;font-weight:bold;border:1px solid #a3b0bf;text-align:left;color:#000;padding:0.2em 0.4em;">Archive Team News</h2><br />
</th></tr><br />
<tr><td style="color:#000" colspan=2><br />
<!-- news starts --><br />
* '''April, 2012''': 20 million Friendster accounts spanning 14 terabytes are successfully rescued for permanent storage by Archive Team. <br />
* '''March, 2012''': [http://www.dereferer.org/?http%3A%2F%2Ffortunecity.com FortuneCity] announced the end of its free-hosting model, threatening around one million user-generated websites.<br />
* '''March, 2012''': We've switched servers to one of those new-fangled hosting companies that aren't hacked. We're going to sell you a lot less in the way of medical supplies now.<br />
* '''December, 2011''': POE News says it will soon be nevermore: [http://www.poe-news.com/forums/sp.php?pi=1002546492 Announcement]<br />
* '''December, 2011''': GamePro magazine halts publication and their website goes dark.<br />
* '''November, 2011''': Archiveteam rescues over 1.3 million users' data from [[Splinder]]'s closure.<br />
* '''July, 2011''': Archiveteam teaches you how to [[Rescuing_Floppy_Disks|rescue data from Floppy Disks]].<br />
* '''May, 2011''': [[Friendster]] is deleting everything at the end of the month.<br />
* '''May, 2011''': Archiveteam keeps it classy at [[poetry.com]].<br />
* '''April, 2011''': How about some [[Google Video]]?<br />
* '''March, 2011''': The [http://www.archive.org/details/personalarchiveconf 2011 Personal Digital Archiving Conference] talks are available.<br />
* '''February, 2011''': Let's watch some [[Yahoo! Video]]<br />
* '''December, 2010''': Archiveteam is Delicious!<br />
* '''October, 2010''': Archiveteam offers Geocities as a torrent.<br />
* '''December 23, 2009''': Yahoo shut down [[starwars.yahoo.com]]. We got a copy.<br />
* '''October, 2009''': [[Geocities]] closing is definitely the top of the charts.<br />
<!-- news ends --><br />
</td></tr><br />
</table><br />
<td style="width:40%;border:1px solid #cedff2;background-color:#f5faff;vertical-align:top"><br />
<table width="100%" cellpadding="2" cellspacing="5" style="vertical-align:top;background-color:#f5faff"><br />
<tr><th><br />
<h2 style="margin:0;background:#cedff2;font-size:120%;font-weight:bold;border:1px solid #a3b0bf;text-align:left;color:#000;padding:0.2em 0.4em;">What is What</h2><br />
</th></tr><br />
<tr><td style="color:#000"><br />
<!-- links starts --><br />
* [[Who We Are]] and how you can join our cause!<br />
<br />
* [[Deathwatch]] is where we keep track of sites that are sickly, dying or dead.<br />
<br />
* [[Fire Drill]] is where we keep track of sites that seem fine but a lot depends on them.<br />
<br />
* [[Projects]] is a comprehensive list of AT endeavors.<br />
<br />
* [[Philosophy]] describes the ideas underpinning our work.<br />
<br />
* [[Archives]] <br />
<!-- links ends --><br />
</td></tr><br />
<tr><th><br />
<h2 style="margin:0;background:#cedff2;font-size:120%;font-weight:bold;border:1px solid #a3b0bf;text-align:left;color:#000;padding:0.2em 0.4em;">Some Starting Points</h2><br />
</th></tr><br />
<tr><td style="color:#000"><br />
<br />
* [[Introduction|The Introduction]] is an overview of basic archiving methods.<br />
<br />
* [[Why Back Up?]] Because they don't care about you.<br />
<br />
* [[Facebook|Back Up your Facebook Data]] Learn how to liberate your personal data from Facebook.<br />
<br />
* [[Software]] will assist you in regaining control of your data by providing tools for information backup, archiving and distribution. <br />
<br />
* [[Formats]] will familiarise you with the various data formats, and how to ensure your files will be readable in the future.<br />
<br />
* [[Storage Media]] is about where to get it, what to get, and how to use it.<br />
<br />
* [[Recommended Reading]] links to other sites for further information.<br />
<br />
* [[Frequently Asked Questions]] is where we answer common questions.<br />
</td></tr><br />
<tr><th><br />
<h2 style="margin:0;background:#cedff2;font-size:120%;font-weight:bold;border:1px solid #a3b0bf;text-align:left;color:#000;padding:0.2em 0.4em;">Quote of the Moment</h2><br />
</th></tr><br />
<tr><td style="color:#000"><br />
<tr><td style="margin:20;background-color:#000000;font-size:200%;font-weight:bold;border:1px solid #a3b0bf;text-align:center;color:#fff;" ><br />
"[Yahoo!] found the way to destroy <br />
the most massive amount of history<br />
in the shortest amount of time <br />
with absolutely no recourse"<br />
</td></tr><br />
<tr><td style="text-align:right"><br />
[http://www.time.com/time/business/article/0,8599,1936645,00.html Internet Atrocity! GeoCities' Demise Erases Web History] <br />
<br>By Dan Fletcher, TIME Magazine, Monday, Nov. 09, 2009<br />
</td></tr><br />
</table><br />
</td></tr><br />
</table><br />
'''Archive Team is in no way affiliated with the fine folks at [http://www.archive.org ARCHIVE.ORG]'''<br />
'''Archive Team can always be reached at [mailto:archiveteam@archiveteam.org archiveteam@archiveteam.org]'''</div>Dnovahttps://wiki.archiveteam.org/index.php?title=Main_Page&diff=7630Main Page2012-04-18T17:14:12Z<p>Dnova: </p>
<hr />
<div>__NOTOC__<br />
<center><br />
<!-- [[Image:Jasonappeal.png|center|link=Introduction]]--><br />
<table style="width:100%;border-spacing:8px;margin:12px 0px 0px 0px"><br />
<tr><td style="width:60%;border:1px solid #FFB9B9;background-color:#FFFFF0;vertical-align:top;color:#000"><br />
<table class="thumb" width="100%" cellpadding="2" cellspacing="5" style="vertical-align:top;background-color:#FFFFF0;"><br />
<tr><td><br />
<td style="color:#000;text-align:left;vertical-align:top"><br />
<br />
=== HISTORY IS OUR FUTURE ===<br />
[[Image:Archiveteam.jpg|right|200px]]<br />
''And we've been trashing our history''<br />
<br />
Archive Team is a loose collective of rogue archivists, programmers, writers and loudmouths dedicated to saving our digital heritage. Since 2009 this variant force of nature has caught wind of shutdowns, shutoffs, mergers, and plain old deletions - and done our best to save the history before it's lost forever. Along the way, we've gotten attention, resistance, press and discussion, but most importantly, we've gotten the message out: ''IT DOESN'T HAVE TO BE THIS WAY''.<br />
<br />
This website is intended to be an offloading point and information depot for a number of archiving projects, all related to saving websites or data that is in danger of being lost. Besides serving as a hub for team-based pulling down and mirroring of data, this site will provide advice on managing your own data and rescuing it from the brink of destruction.<br />
<br />
Feel free to join us on the [[IRC_Channel|IRC channel]]! We're on the EFnet network in a channel called '''#archiveteam''', where we say truly awful things.<br />
<!-- featured article ends --><br />
</tr><br />
<tr><th colspan=2><br />
<h2 style="margin:0;background-color:#a3b0bf;font-size:120%;font-weight:bold;border:1px solid #a3b0bf;text-align:left;color:#000;padding:0.2em 0.4em;">Currently Active Projects (Get Involved Here!)</h2><br />
</th></tr><br />
<tr><td style="color:#000" colspan=2><br />
<!-- active starts --><br />
* '''[[MobileMe]]''' - Apple's file storage and sharing service, currently hosting over 200 terabytes of data, is shutting down on June 30, 2012.<br />
** A distributed tracker and very easy-to-use scripts are in place.<br />
** More downloaders are needed for this project!<br />
** Track the download progress at http://memac.heroku.com/ ('''135 terabytes''' and counting!)<br />
* '''[[FortuneCity]]''' - A "free" webhost founded in 1997 with around 1 million users. Going non-free/closing on April 30, 2012.<br />
** Project essentially complete!<br />
* '''[[FanFiction.Net]]''' - Around 7 million fan-fiction stories hosted on what may be the largest site of its kind in the world. They're not shutting down, but Archiveteam wants a copy for posterity.<br />
** Coders are currently needed to figure out an intelligent way to comprehensively archive the site. <br />
<!-- active ends --><br />
<tr><th colspan=2><br />
<h2 style="margin:0;background-color:#a3b0bf;font-size:120%;font-weight:bold;border:1px solid #a3b0bf;text-align:left;color:#000;padding:0.2em 0.4em;">Archive Team News</h2><br />
</th></tr><br />
<tr><td style="color:#000" colspan=2><br />
<!-- news starts --><br />
* '''April, 2012''': 20 million Friendster accounts spanning 14 terabytes are successfully rescued for permanent storage by Archive Team. <br />
* '''March, 2012''': [http://www.dereferer.org/?http%3A%2F%2Ffortunecity.com FortuneCity] announced the end of its free-hosting model, threatening around one million user-generated websites.<br />
* '''March, 2012''': We've switched servers to one of those new-fangled hosting companies that aren't hacked. We're going to sell you a lot less in the way of medical supplies now.<br />
* '''December, 2011''': POE News says it will soon be nevermore: [http://www.poe-news.com/forums/sp.php?pi=1002546492 Announcement]<br />
* '''December, 2011''': GamePro magazine halts publication and their website goes dark.<br />
* '''November, 2011''': Archiveteam rescues over 1.3 million users' data from [[Splinder]]'s closure.<br />
* '''July, 2011''': Archiveteam teaches you how to [[Rescuing_Floppy_Disks|rescue data from Floppy Disks]].<br />
* '''May, 2011''': [[Friendster]] is deleting everything at the end of the month.<br />
* '''May, 2011''': Archiveteam keeps it classy at [[poetry.com]].<br />
* '''April, 2011''': How about some [[Google Video]]?<br />
* '''March, 2011''': The [http://www.archive.org/details/personalarchiveconf 2011 Personal Digital Archiving Conference] talks are available.<br />
* '''February, 2011''': Let's watch some [[Yahoo! Video]]<br />
* '''December, 2010''': Archiveteam is Delicious!<br />
* '''October, 2010''': Archiveteam offers Geocities as a torrent.<br />
* '''December 23, 2009''': Yahoo shut down [[starwars.yahoo.com]]. We got a copy.<br />
* '''October, 2009''': [[Geocities]] closing is definitely the top of the charts.<br />
<!-- news ends --><br />
</td></tr><br />
</table><br />
<td style="width:40%;border:1px solid #cedff2;background-color:#f5faff;vertical-align:top"><br />
<table width="100%" cellpadding="2" cellspacing="5" style="vertical-align:top;background-color:#f5faff"><br />
<tr><th><br />
<h2 style="margin:0;background:#cedff2;font-size:120%;font-weight:bold;border:1px solid #a3b0bf;text-align:left;color:#000;padding:0.2em 0.4em;">What is What</h2><br />
</th></tr><br />
<tr><td style="color:#000"><br />
<!-- links starts --><br />
* [[Who We Are]] and how you can join our cause!<br />
<br />
* [[Deathwatch]] is where we keep track of sites that are sickly, dying or dead.<br />
<br />
* [[Fire Drill]] is where we keep track of sites that seem fine but a lot depends on them.<br />
<br />
* [[Projects]] is a comprehensive list of AT endeavors.<br />
<br />
* [[Philosophy]] describes the ideas underpinning our work.<br />
<br />
* [[Archives]] <br />
<!-- links ends --><br />
</td></tr><br />
<tr><th><br />
<h2 style="margin:0;background:#cedff2;font-size:120%;font-weight:bold;border:1px solid #a3b0bf;text-align:left;color:#000;padding:0.2em 0.4em;">Some Starting Points</h2><br />
</th></tr><br />
<tr><td style="color:#000"><br />
<br />
* [[Introduction|The Introduction]] is an overview of basic archiving methods.<br />
<br />
* [[Why Back Up?]] Because they don't care about you.<br />
<br />
* [[Facebook|Back Up your Facebook Data]] Learn how to liberate your personal data from Facebook.<br />
<br />
* [[Software]] will assist you in regaining control of your data by providing tools for information backup, archiving and distribution. <br />
<br />
* [[Formats]] will familiarise you with the various data formats, and how to ensure your files will be readable in the future.<br />
<br />
* [[Storage Media]] is about where to get it, what to get, and how to use it.<br />
<br />
* [[Recommended Reading]] links to other sites for further information.<br />
<br />
* [[Frequently Asked Questions]] is where we answer common questions.<br />
</td></tr><br />
<tr><th><br />
<h2 style="margin:0;background:#cedff2;font-size:120%;font-weight:bold;border:1px solid #a3b0bf;text-align:left;color:#000;padding:0.2em 0.4em;">Quote of the Moment</h2><br />
</th></tr><br />
<tr><td style="color:#000"><br />
<tr><td style="margin:20;background-color:#000000;font-size:200%;font-weight:bold;border:1px solid #a3b0bf;text-align:center;color:#fff;" ><br />
"[Yahoo!] found the way to destroy <br />
the most massive amount of history<br />
in the shortest amount of time <br />
with absolutely no recourse"<br />
</td></tr><br />
<tr><td style="text-align:right"><br />
[http://www.time.com/time/business/article/0,8599,1936645,00.html Internet Atrocity! GeoCities' Demise Erases Web History] <br />
<br>By Dan Fletcher, TIME Magazine, Monday, Nov. 09, 2009<br />
</td></tr><br />
</table><br />
</td></tr><br />
</table><br />
'''Archive Team is in no way affiliated with the fine folks at [http://www.archive.org ARCHIVE.ORG]'''<br />
'''Archive Team can always be reached at [mailto:archiveteam@archiveteam.org archiveteam@archiveteam.org]'''</div>Dnovahttps://wiki.archiveteam.org/index.php?title=Main_Page&diff=7624Main Page2012-04-16T12:16:09Z<p>Dnova: </p>
<hr />
<div>__NOTOC__<br />
<center><br />
<!-- [[Image:Jasonappeal.png|center|link=Introduction]]--><br />
<table style="width:100%;border-spacing:8px;margin:12px 0px 0px 0px"><br />
<tr><td style="width:60%;border:1px solid #FFB9B9;background-color:#FFFFF0;vertical-align:top;color:#000"><br />
<table class="thumb" width="100%" cellpadding="2" cellspacing="5" style="vertical-align:top;background-color:#FFFFF0;"><br />
<tr><td><br />
<td style="color:#000;text-align:left;vertical-align:top"><br />
<br />
=== HISTORY IS OUR FUTURE ===<br />
[[Image:Archiveteam.jpg|right|200px]]<br />
''And we've been trashing our history''<br />
<br />
Archive Team is a loose collective of rogue archivists, programmers, writers and loudmouths dedicated to saving our digital heritage. Since 2009 this variant force of nature has caught wind of shutdowns, shutoffs, mergers, and plain old deletions - and done our best to save the history before it's lost forever. Along the way, we've gotten attention, resistance, press and discussion, but most importantly, we've gotten the message out: ''IT DOESN'T HAVE TO BE THIS WAY''.<br />
<br />
This website is intended to be an offloading point and information depot for a number of archiving projects, all related to saving websites or data that is in danger of being lost. Besides serving as a hub for team-based pulling down and mirroring of data, this site will provide advice on managing your own data and rescuing it from the brink of destruction.<br />
<br />
Feel free to join us on the [[IRC_Channel|IRC channel]]! We're on the EFnet network in a channel called '''#archiveteam''', where we say truly awful things.<br />
<!-- featured article ends --><br />
</tr><br />
<tr><th colspan=2><br />
<h2 style="margin:0;background-color:#a3b0bf;font-size:120%;font-weight:bold;border:1px solid #a3b0bf;text-align:left;color:#000;padding:0.2em 0.4em;">Currently Active Projects (Get Involved Here!)</h2><br />
</th></tr><br />
<tr><td style="color:#000" colspan=2><br />
<!-- active starts --><br />
* '''[[MobileMe]]''' - Apple's file storage and sharing service, currently hosting over 200 terabytes of data, is shutting down on June 30, 2012.<br />
** A distributed tracker and very easy-to-use scripts are in place.<br />
** More downloaders are needed for this project!<br />
** Track the download progress at http://memac.heroku.com/ ('''125 terabytes''' and counting!)<br />
* '''[[FortuneCity]]''' - A "free" webhost founded in 1997 with around 1 million users. Going non-free/closing on April 30, 2012.<br />
** Project essentially complete!<br />
* '''[[FanFiction.Net]]''' - Around 7 million fan-fiction stories hosted on what may be the largest site of its kind in the world. They're not shutting down, but Archiveteam wants a copy for posterity.<br />
** Coders are currently needed to figure out an intelligent way to comprehensively archive the site. <br />
<!-- active ends --><br />
<tr><th colspan=2><br />
<h2 style="margin:0;background-color:#a3b0bf;font-size:120%;font-weight:bold;border:1px solid #a3b0bf;text-align:left;color:#000;padding:0.2em 0.4em;">Archive Team News</h2><br />
</th></tr><br />
<tr><td style="color:#000" colspan=2><br />
<!-- news starts --><br />
* '''April, 2012''': 20 million Friendster accounts spanning 14 terabytes are successfully rescued for permanent storage by Archive Team. <br />
* '''March, 2012''': [http://www.dereferer.org/?http%3A%2F%2Ffortunecity.com FortuneCity] announced the end of its free-hosting model, threatening around one million user-generated websites.<br />
* '''March, 2012''': We've switched servers to one of those new-fangled hosting companies that aren't hacked. We're going to sell you a lot less in the way of medical supplies now.<br />
* '''December, 2011''': POE News says it will soon be nevermore: [http://www.poe-news.com/forums/sp.php?pi=1002546492 Announcement]<br />
* '''December, 2011''': GamePro magazine halts publication and their website goes dark.<br />
* '''November, 2011''': Archiveteam rescues over 1.3 million users' data from [[Splinder]]'s closure.<br />
* '''July, 2011''': Archiveteam teaches you how to [[Rescuing_Floppy_Disks|rescue data from Floppy Disks]].<br />
* '''May, 2011''': [[Friendster]] is deleting everything at the end of the month.<br />
* '''May, 2011''': Archiveteam keeps it classy at [[poetry.com]].<br />
* '''April, 2011''': How about some [[Google Video]]?<br />
* '''March, 2011''': The [http://www.archive.org/details/personalarchiveconf 2011 Personal Digital Archiving Conference] talks are available.<br />
* '''February, 2011''': Let's watch some [[Yahoo! Video]]<br />
* '''December, 2010''': Archiveteam is Delicious!<br />
* '''October, 2010''': Archiveteam offers Geocities as a torrent.<br />
* '''December 23, 2009''': Yahoo shut down [[starwars.yahoo.com]]. We got a copy.<br />
* '''October, 2009''': [[Geocities]] closing is definitely the top of the charts.<br />
<!-- news ends --><br />
</td></tr><br />
</table><br />
<td style="width:40%;border:1px solid #cedff2;background-color:#f5faff;vertical-align:top"><br />
<table width="100%" cellpadding="2" cellspacing="5" style="vertical-align:top;background-color:#f5faff"><br />
<tr><th><br />
<h2 style="margin:0;background:#cedff2;font-size:120%;font-weight:bold;border:1px solid #a3b0bf;text-align:left;color:#000;padding:0.2em 0.4em;">What is What</h2><br />
</th></tr><br />
<tr><td style="color:#000"><br />
<!-- links starts --><br />
* [[Who We Are]] and how you can join our cause!<br />
<br />
* [[Deathwatch]] is where we keep track of sites that are sickly, dying or dead.<br />
<br />
* [[Fire Drill]] is where we keep track of sites that seem fine but a lot depends on them.<br />
<br />
* [[Projects]] is a comprehensive list of AT endeavors.<br />
<br />
* [[Philosophy]] describes the ideas underpinning our work.<br />
<br />
* [[Archives]] <br />
<!-- links ends --><br />
</td></tr><br />
<tr><th><br />
<h2 style="margin:0;background:#cedff2;font-size:120%;font-weight:bold;border:1px solid #a3b0bf;text-align:left;color:#000;padding:0.2em 0.4em;">Some Starting Points</h2><br />
</th></tr><br />
<tr><td style="color:#000"><br />
<br />
* [[Introduction|The Introduction]] is an overview of basic archiving methods.<br />
<br />
* [[Why Back Up?]] Because they don't care about you.<br />
<br />
* [[Facebook|Back Up your Facebook Data]] Learn how to liberate your personal data from Facebook.<br />
<br />
* [[Software]] will assist you in regaining control of your data by providing tools for information backup, archiving and distribution. <br />
<br />
* [[Formats]] will familiarise you with the various data formats, and how to ensure your files will be readable in the future.<br />
<br />
* [[Storage Media]] is about where to get it, what to get, and how to use it.<br />
<br />
* [[Recommended Reading]] links to other sites for further information.<br />
<br />
* [[Frequently Asked Questions]] is where we answer common questions.<br />
</td></tr><br />
<tr><th><br />
<h2 style="margin:0;background:#cedff2;font-size:120%;font-weight:bold;border:1px solid #a3b0bf;text-align:left;color:#000;padding:0.2em 0.4em;">Quote of the Moment</h2><br />
</th></tr><br />
<tr><td style="color:#000"><br />
<tr><td style="margin:20;background-color:#000000;font-size:200%;font-weight:bold;border:1px solid #a3b0bf;text-align:center;color:#fff;" ><br />
"[Yahoo!] found the way to destroy <br />
the most massive amount of history<br />
in the shortest amount of time <br />
with absolutely no recourse"<br />
</td></tr><br />
<tr><td style="text-align:right"><br />
[http://www.time.com/time/business/article/0,8599,1936645,00.html Internet Atrocity! GeoCities' Demise Erases Web History] <br />
<br>By Dan Fletcher, TIME Magazine, Monday, Nov. 09, 2009<br />
</td></tr><br />
</table><br />
</td></tr><br />
</table><br />
'''Archive Team is in no way affiliated with the fine folks at [http://www.archive.org ARCHIVE.ORG]'''<br />
'''Archive Team can always be reached at [mailto:archiveteam@archiveteam.org archiveteam@archiveteam.org]'''</div>Dnovahttps://wiki.archiveteam.org/index.php?title=Main_Page&diff=7609Main Page2012-04-11T06:12:33Z<p>Dnova: </p>
<hr />
<div>__NOTOC__<br />
<center><br />
<!-- [[Image:Jasonappeal.png|center|link=Introduction]]--><br />
<table style="width:100%;border-spacing:8px;margin:12px 0px 0px 0px"><br />
<tr><td style="width:60%;border:1px solid #FFB9B9;background-color:#FFFFF0;vertical-align:top;color:#000"><br />
<table class="thumb" width="100%" cellpadding="2" cellspacing="5" style="vertical-align:top;background-color:#FFFFF0;"><br />
<tr><td><br />
<td style="color:#000;text-align:left;vertical-align:top"><br />
<br />
=== HISTORY IS OUR FUTURE ===<br />
[[Image:Archiveteam.jpg|right|200px]]<br />
''And we've been trashing our history''<br />
<br />
Archive Team is a loose collective of rogue archivists, programmers, writers and loudmouths dedicated to saving our digital heritage. Since 2009 this variant force of nature has caught wind of shutdowns, shutoffs, mergers, and plain old deletions - and done our best to save the history before it's lost forever. Along the way, we've gotten attention, resistance, press and discussion, but most importantly, we've gotten the message out: ''IT DOESN'T HAVE TO BE THIS WAY''.<br />
<br />
This website is intended to be an offloading point and information depot for a number of archiving projects, all related to saving websites or data that is in danger of being lost. Besides serving as a hub for team-based pulling down and mirroring of data, this site will provide advice on managing your own data and rescuing it from the brink of destruction.<br />
<br />
Feel free to join us on the [[IRC_Channel|IRC channel]]! We're on the EFnet network in a channel called '''#archiveteam''', where we say truly awful things.<br />
<!-- featured article ends --><br />
</tr><br />
<tr><th colspan=2><br />
<h2 style="margin:0;background-color:#a3b0bf;font-size:120%;font-weight:bold;border:1px solid #a3b0bf;text-align:left;color:#000;padding:0.2em 0.4em;">Currently Active Projects (Get Involved Here!)</h2><br />
</th></tr><br />
<tr><td style="color:#000" colspan=2><br />
<!-- active starts --><br />
* '''[[MobileMe]]''' - Apple's file storage and sharing service, currently hosting over 200 terabytes of data, is shutting down on June 30, 2012.<br />
** A distributed tracker and very easy-to-use scripts are in place.<br />
** More downloaders are needed for this project!<br />
** Track the download progress at http://memac.heroku.com/ ('''110 terabytes''' and counting!)<br />
* '''[[FortuneCity]]''' - A "free" webhost founded in 1997 with around 1 million users. Going non-free/closing on April 30, 2012.<br />
** Project essentially complete!<br />
* '''[[FanFiction.Net]]''' - Around 7 million fan-fiction stories hosted on what may be the largest site of its kind in the world. They're not shutting down, but Archiveteam wants a copy for posterity.<br />
** Coders are currently needed to figure out an intelligent way to comprehensively archive the site. <br />
<!-- active ends --><br />
<tr><th colspan=2><br />
<h2 style="margin:0;background-color:#a3b0bf;font-size:120%;font-weight:bold;border:1px solid #a3b0bf;text-align:left;color:#000;padding:0.2em 0.4em;">Archive Team News</h2><br />
</th></tr><br />
<tr><td style="color:#000" colspan=2><br />
<!-- news starts --><br />
* '''April, 2012''': 20 million Friendster accounts spanning 14 terabytes are successfully rescued for permanent storage by Archive Team. <br />
* '''March, 2012''': [http://www.dereferer.org/?http%3A%2F%2Ffortunecity.com FortuneCity] announced the end of its free-hosting model, threatening around one million user-generated websites.<br />
* '''March, 2012''': We've switched servers to one of those new-fangled hosting companies that aren't hacked. We're going to sell you a lot less in the way of medical supplies now.<br />
* '''December, 2011''': POE News says it will soon be nevermore: [http://www.poe-news.com/forums/sp.php?pi=1002546492 Announcement]<br />
* '''December, 2011''': GamePro magazine halts publication and their website goes dark.<br />
* '''November, 2011''': Archiveteam rescues over 1.3 million users' data from [[Splinder]]'s closure.<br />
* '''July, 2011''': Archiveteam teaches you how to [[Rescuing_Floppy_Disks|rescue data from Floppy Disks]].<br />
* '''May, 2011''': [[Friendster]] is deleting everything at the end of the month.<br />
* '''May, 2011''': Archiveteam keeps it classy at [[poetry.com]].<br />
* '''April, 2011''': How about some [[Google Video]]?<br />
* '''March, 2011''': The [http://www.archive.org/details/personalarchiveconf 2011 Personal Digital Archiving Conference] talks are available.<br />
* '''February, 2011''': Let's watch some [[Yahoo! Video]]<br />
* '''December, 2010''': Archiveteam is Delicious!<br />
* '''October, 2010''': Archiveteam offers Geocities as a torrent.<br />
* '''December 23, 2009''': Yahoo shut down [[starwars.yahoo.com]]. We got a copy.<br />
* '''October, 2009''': [[Geocities]] closing is definitely the top of the charts.<br />
<!-- news ends --><br />
</td></tr><br />
</table><br />
<td style="width:40%;border:1px solid #cedff2;background-color:#f5faff;vertical-align:top"><br />
<table width="100%" cellpadding="2" cellspacing="5" style="vertical-align:top;background-color:#f5faff"><br />
<tr><th><br />
<h2 style="margin:0;background:#cedff2;font-size:120%;font-weight:bold;border:1px solid #a3b0bf;text-align:left;color:#000;padding:0.2em 0.4em;">What is What</h2><br />
</th></tr><br />
<tr><td style="color:#000"><br />
<!-- links starts --><br />
* [[Who We Are]] and how you can join our cause!<br />
<br />
* [[Deathwatch]] is where we keep track of sites that are sickly, dying or dead.<br />
<br />
* [[Fire Drill]] is where we keep track of sites that seem fine but a lot depends on them.<br />
<br />
* [[Projects]] is a comprehensive list of AT endeavors.<br />
<br />
* [[Philosophy]] describes the ideas underpinning our work.<br />
<br />
* [[Archives]] <br />
<!-- links ends --><br />
</td></tr><br />
<tr><th><br />
<h2 style="margin:0;background:#cedff2;font-size:120%;font-weight:bold;border:1px solid #a3b0bf;text-align:left;color:#000;padding:0.2em 0.4em;">Some Starting Points</h2><br />
</th></tr><br />
<tr><td style="color:#000"><br />
<br />
* [[Introduction|The Introduction]] is an overview of basic archiving methods.<br />
<br />
* [[Why Back Up?]] Because they don't care about you.<br />
<br />
* [[Facebook|Back Up your Facebook Data]] Learn how to liberate your personal data from Facebook.<br />
<br />
* [[Software]] will assist you in regaining control of your data by providing tools for information backup, archiving and distribution. <br />
<br />
* [[Formats]] will familiarise you with the various data formats, and how to ensure your files will be readable in the future.<br />
<br />
* [[Storage Media]] is about where to get it, what to get, and how to use it.<br />
<br />
* [[Recommended Reading]] links to other sites for further information.<br />
<br />
* [[Frequently Asked Questions]] is where we answer common questions.<br />
</td></tr><br />
<tr><th><br />
<h2 style="margin:0;background:#cedff2;font-size:120%;font-weight:bold;border:1px solid #a3b0bf;text-align:left;color:#000;padding:0.2em 0.4em;">Quote of the Moment</h2><br />
</th></tr><br />
<tr><td style="color:#000"><br />
<tr><td style="margin:20;background-color:#000000;font-size:200%;font-weight:bold;border:1px solid #a3b0bf;text-align:center;color:#fff;" ><br />
"[Yahoo!] found the way to destroy <br />
the most massive amount of history<br />
in the shortest amount of time <br />
with absolutely no recourse"<br />
</td></tr><br />
<tr><td style="text-align:right"><br />
[http://www.time.com/time/business/article/0,8599,1936645,00.html Internet Atrocity! GeoCities' Demise Erases Web History] <br />
<br>By Dan Fletcher, TIME Magazine, Monday, Nov. 09, 2009<br />
</td></tr><br />
</table><br />
</td></tr><br />
</table><br />
'''Archive Team is in no way affiliated with the fine folks at [http://www.archive.org ARCHIVE.ORG]'''<br />
'''Archive Team can always be reached at [mailto:archiveteam@archiveteam.org archiveteam@archiveteam.org]'''</div>Dnovahttps://wiki.archiveteam.org/index.php?title=Main_Page&diff=7566Main Page2012-04-10T01:38:09Z<p>Dnova: </p>
<hr />
<div>__NOTOC__<br />
<center><br />
<!-- [[Image:Jasonappeal.png|center|link=Introduction]]--><br />
<table style="width:100%;border-spacing:8px;margin:12px 0px 0px 0px"><br />
<tr><td style="width:60%;border:1px solid #FFB9B9;background-color:#FFFFF0;vertical-align:top;color:#000"><br />
<table class="thumb" width="100%" cellpadding="2" cellspacing="5" style="vertical-align:top;background-color:#FFFFF0;"><br />
<tr><td><br />
<td style="color:#000;text-align:left;vertical-align:top"><br />
<br />
=== HISTORY IS OUR FUTURE ===<br />
[[Image:Archiveteam.jpg|right|200px]]<br />
''And we've been trashing our history''<br />
<br />
Archive Team is a loose collective of rogue archivists, programmers, writers and loudmouths dedicated to saving our digital heritage. Since 2009 this variant force of nature has caught wind of shutdowns, shutoffs, mergers, and plain old deletions - and done our best to save the history before it's lost forever. Along the way, we've gotten attention, resistance, press and discussion, but most importantly, we've gotten the message out: ''IT DOESN'T HAVE TO BE THIS WAY''.<br />
<br />
This website is intended to be an offloading point and information depot for a number of archiving projects, all related to saving websites or data that is in danger of being lost. Besides serving as a hub for team-based pulling down and mirroring of data, this site will provide advice on managing your own data and rescuing it from the brink of destruction.<br />
<br />
Feel free to join us on the [[IRC_Channel|IRC channel]]! We're on the EFnet network in a channel called '''#archiveteam''', where we say truly awful things.<br />
<!-- featured article ends --><br />
</tr><br />
<tr><th colspan=2><br />
<h2 style="margin:0;background-color:#a3b0bf;font-size:120%;font-weight:bold;border:1px solid #a3b0bf;text-align:left;color:#000;padding:0.2em 0.4em;">Currently Active Projects (Get Involved Here!)</h2><br />
</th></tr><br />
<tr><td style="color:#000" colspan=2><br />
<!-- active starts --><br />
* '''[[MobileMe]]''' - Apple's file storage and sharing service, currently hosting over 200 terabytes of data, is shutting down on June 30, 2012.<br />
** A distributed tracker and very easy-to-use scripts are in place.<br />
** More downloaders are needed for this project!<br />
** Track the download progress at http://memac.heroku.com/ '''100 terabytes''' and counting!<br />
* '''[[FortuneCity]]''' - A "free" webhost founded in 1997 with around 1 million users. Going non-free/closing on April 30, 2012.<br />
** Project essentially complete!<br />
* '''[[FanFiction.Net]]''' - Around 7 million fan-fiction stories hosted on what may be the largest site of its kind in the world. They're not shutting down but Archiveteam wants a copy for posterity. <br />
** Coders are currently needed to figure out an intelligent way to comprehensively archive the site. <br />
<!-- active ends --><br />
<tr><th colspan=2><br />
<h2 style="margin:0;background-color:#a3b0bf;font-size:120%;font-weight:bold;border:1px solid #a3b0bf;text-align:left;color:#000;padding:0.2em 0.4em;">Archive Team News</h2><br />
</th></tr><br />
<tr><td style="color:#000" colspan=2><br />
<!-- news starts --><br />
* '''April, 2012''': 20 million Friendster accounts spanning 14 terabytes are successfully rescued for permanent storage by Archive Team. <br />
* '''March, 2012''': [http://www.dereferer.org/?http%3A%2F%2Ffortunecity.com FortuneCity] announced the end of its free-hosting model, threatening around one million user-generated websites.<br />
* '''March, 2012''': We've switched servers to one of those new-fangled hosting companies that aren't hacked. We're going to sell you a lot less in the way of medical supplies now.<br />
* '''December, 2011''': POE News says it will soon be nevermore: [http://www.poe-news.com/forums/sp.php?pi=1002546492 Announcement]<br />
* '''December, 2011''': GamePro magazine halts publication and their website goes dark.<br />
* '''November, 2011''': Archiveteam rescues over 1.3 million users' data from [[Splinder]]'s closure.<br />
* '''July 2011''': Archiveteam teaches you how to [[Rescuing_Floppy_Disks|rescue data from Floppy Disks]].<br />
* '''May, 2011''': [[Friendster]] is deleting everything at the end of the month.<br />
* '''May, 2011''': Archiveteam keeps it classy at [[poetry.com]].<br />
* '''April, 2011''': How about some [[Google Video]]?<br />
* '''March, 2011''': The [http://www.archive.org/details/personalarchiveconf 2011 Personal Digital Archiving Conference] talks are available.<br />
* '''February, 2011''': Let's watch some [[Yahoo! Video]]<br />
* '''December, 2010''': Archiveteam is Delicious!<br />
* '''October, 2010''': Archiveteam offers Geocities as a torrent.<br />
* '''December 23, 2009''': Yahoo shut down [[starwars.yahoo.com]]. We got a copy.<br />
* '''October, 2009''': [[Geocities]] closing is definitely the top of the charts.<br />
<!-- news ends --><br />
</td></tr><br />
</table><br />
<td style="width:40%;border:1px solid #cedff2;background-color:#f5faff;vertical-align:top"><br />
<table width="100%" cellpadding="2" cellspacing="5" style="vertical-align:top;background-color:#f5faff"><br />
<tr><th><br />
<h2 style="margin:0;background:#cedff2;font-size:120%;font-weight:bold;border:1px solid #a3b0bf;text-align:left;color:#000;padding:0.2em 0.4em;">What is What</h2><br />
</th></tr><br />
<tr><td style="color:#000"><br />
<!-- links starts --><br />
* [[Who We Are]] and how you can join our cause!<br />
<br />
* [[Deathwatch]] is where we keep track of sites that are sickly, dying or dead.<br />
<br />
* [[Fire Drill]] is where we keep track of sites that seem fine but a lot depends on them.<br />
<br />
* [[Projects]] is a comprehensive list of AT endeavors.<br />
<br />
* [[Philosophy]] describes the ideas underpinning our work.<br />
<br />
* [[Archives]] <br />
<!-- links ends --><br />
</td></tr><br />
<tr><th><br />
<h2 style="margin:0;background:#cedff2;font-size:120%;font-weight:bold;border:1px solid #a3b0bf;text-align:left;color:#000;padding:0.2em 0.4em;">Some Starting Points</h2><br />
</th></tr><br />
<tr><td style="color:#000"><br />
<br />
* [[Introduction|The Introduction]] is an overview of basic archiving methods.<br />
<br />
* [[Why Back Up?]] Because they don't care about you.<br />
<br />
* [[Facebook|Back Up your Facebook Data]] Learn how to liberate your personal data from Facebook.<br />
<br />
* [[Software]] will assist you in regaining control of your data by providing tools for information backup, archiving and distribution. <br />
<br />
* [[Formats]] will familiarise you with the various data formats, and how to ensure your files will be readable in the future.<br />
<br />
* [[Storage Media]] is about where to get it, what to get, and how to use it.<br />
<br />
* [[Recommended Reading]] links to other sites for further information.<br />
<br />
* [[Frequently Asked Questions]] is where we answer common questions.<br />
</td></tr><br />
<tr><th><br />
<h2 style="margin:0;background:#cedff2;font-size:120%;font-weight:bold;border:1px solid #a3b0bf;text-align:left;color:#000;padding:0.2em 0.4em;">Quote of the Moment</h2><br />
</th></tr><br />
<tr><td style="color:#000"><br />
<tr><td style="margin:20;background-color:#000000;font-size:200%;font-weight:bold;border:1px solid #a3b0bf;text-align:center;color:#fff;" ><br />
"[Yahoo!] found the way to destroy <br />
the most massive amount of history<br />
in the shortest amount of time <br />
with absolutely no recourse"<br />
</td></tr><br />
<tr><td style="text-align:right"><br />
[http://www.time.com/time/business/article/0,8599,1936645,00.html Internet Atrocity! GeoCities' Demise Erases Web History] <br />
<br>By Dan Fletcher, TIME Magazine, Monday, Nov. 09, 2009<br />
</td></tr><br />
</table><br />
</td></tr><br />
</table><br />
'''Archive Team is in no way affiliated with the fine folks at [http://www.archive.org ARCHIVE.ORG]'''<br />
'''Archive Team can always be reached at [mailto:archiveteam@archiveteam.org archiveteam@archiveteam.org]'''</div>Dnovahttps://wiki.archiveteam.org/index.php?title=Main_Page&diff=7491Main Page2012-04-02T21:19:56Z<p>Dnova: </p>
<hr />
<div>__NOTOC__<br />
<center><br />
<!-- [[Image:Jasonappeal.png|center|link=Introduction]]--><br />
<table style="width:100%;border-spacing:8px;margin:12px 0px 0px 0px"><br />
<tr><td style="width:60%;border:1px solid #FFB9B9;background-color:#FFFFF0;vertical-align:top;color:#000"><br />
<table class="thumb" width="100%" cellpadding="2" cellspacing="5" style="vertical-align:top;background-color:#FFFFF0;"><br />
<tr><td><br />
[[Image:Archiveteam.jpg|center|300px]]<br />
<td style="color:#000;text-align:left;vertical-align:top"><br />
<br />
=== HISTORY IS OUR FUTURE ===<br />
''And we've been trashing our history''<br />
<br />
Archive Team is a loose collective of rogue archivists, programmers, writers and loudmouths dedicated to saving our digital heritage. Since 2009 this variant force of nature has caught wind of shutdowns, shutoffs, mergers, and plain old deletions - and done our best to save the history before it's lost forever. Along the way, we've gotten attention, resistance, press and discussion, but most importantly, we've gotten the message out: ''IT DOESN'T HAVE TO BE THIS WAY''.<br />
<br />
This website is intended to be an offloading point and information depot for a number of archiving projects, all related to saving websites or data that is in danger of being lost. Besides serving as a hub for team-based pulling down and mirroring of data, this site will provide advice on managing your own data and rescuing it from the brink of destruction.<br />
<br />
Feel free to join us on the [[IRC_Channel|IRC channel]]! We're on the EFnet network in a channel called '''#archiveteam''', where we say truly awful things.<br />
<!-- featured article ends --><br />
</tr><br />
<tr><th colspan=2><br />
<h2 style="margin:0;background-color:#a3b0bf;font-size:120%;font-weight:bold;border:1px solid #a3b0bf;text-align:left;color:#000;padding:0.2em 0.4em;">Currently Active Projects (Get Involved Here!)</h2><br />
</th></tr><br />
<tr><td style="color:#000" colspan=2><br />
<!-- active starts --><br />
* '''[[MobileMe]]''' - Apple's file storage and sharing service, currently hosting over 200 terabytes of data, is shutting down on June 30, 2012.<br />
** A distributed tracker and very easy-to-use scripts are in place.<br />
** More downloaders are needed for this project!<br />
** Track the download progress at http://memac.heroku.com/ '''80 terabytes''' and counting!<br />
* '''[[FortuneCity]]''' - A "free" webhost founded in 1997 with around 1 million users. Going non-free/closing on April 30, 2012.<br />
** Project essentially complete!<br />
* '''[[FanFiction.Net]]''' - Around 7 million fan-fiction stories hosted on what may be the largest site of its kind in the world. They're not shutting down but Archiveteam wants a copy for posterity. <br />
** Coders are currently needed to figure out an intelligent way to comprehensively archive the site. <br />
<!-- active ends --><br />
<tr><th colspan=2><br />
<h2 style="margin:0;background-color:#a3b0bf;font-size:120%;font-weight:bold;border:1px solid #a3b0bf;text-align:left;color:#000;padding:0.2em 0.4em;">Archive Team News</h2><br />
</th></tr><br />
<tr><td style="color:#000" colspan=2><br />
<!-- news starts --><br />
* '''April, 2012''': 20 million Friendster accounts spanning 14 terabytes are successfully rescued for permanent storage by Archive Team. <br />
* '''March, 2012''': [http://www.dereferer.org/?http%3A%2F%2Ffortunecity.com FortuneCity] announced the end of its free-hosting model, threatening around one million user-generated websites.<br />
* '''March, 2012''': We've switched servers to one of those new-fangled hosting companies that aren't hacked. We're going to sell you a lot less in the way of medical supplies now.<br />
* '''December, 2011''': POE News says it will soon be nevermore: [http://www.poe-news.com/forums/sp.php?pi=1002546492 Announcement]<br />
* '''December, 2011''': GamePro magazine halts publication and their website goes dark.<br />
* '''November, 2011''': Archiveteam rescues over 1.3 million users' data from [[Splinder]]'s closure.<br />
* '''July 2011''': Archiveteam teaches you how to [[Rescuing_Floppy_Disks|rescue data from Floppy Disks]].<br />
* '''May, 2011''': [[Friendster]] is deleting everything at the end of the month.<br />
* '''May, 2011''': Archiveteam keeps it classy at [[poetry.com]].<br />
* '''April, 2011''': How about some [[Google Video]]?<br />
* '''March, 2011''': The [http://www.archive.org/details/personalarchiveconf 2011 Personal Digital Archiving Conference] talks are available.<br />
* '''February, 2011''': Let's watch some [[Yahoo! Video]]<br />
* '''December, 2010''': Archiveteam is Delicious!<br />
* '''October, 2010''': Archiveteam offers Geocities as a torrent.<br />
* '''December 23, 2009''': Yahoo shut down [[starwars.yahoo.com]]. We got a copy.<br />
* '''October, 2009''': [[Geocities]] closing is definitely the top of the charts.<br />
<!-- news ends --><br />
</td></tr><br />
</table><br />
<td style="width:40%;border:1px solid #cedff2;background-color:#f5faff;vertical-align:top"><br />
<table width="100%" cellpadding="2" cellspacing="5" style="vertical-align:top;background-color:#f5faff"><br />
<tr><th><br />
<h2 style="margin:0;background:#cedff2;font-size:120%;font-weight:bold;border:1px solid #a3b0bf;text-align:left;color:#000;padding:0.2em 0.4em;">What is What</h2><br />
</th></tr><br />
<tr><td style="color:#000"><br />
<!-- links starts --><br />
* [[Who We Are]] and how you can join our cause!<br />
<br />
* [[Deathwatch]] is where we keep track of sites that are sickly, dying or dead.<br />
<br />
* [[Fire Drill]] is where we keep track of sites that seem fine but a lot depends on them.<br />
<br />
* [[Projects]] is a comprehensive list of AT endeavors.<br />
<br />
* [[Philosophy]] describes the ideas underpinning our work.<br />
<br />
* [[Archives]] <br />
<!-- links ends --><br />
</td></tr><br />
<tr><th><br />
<h2 style="margin:0;background:#cedff2;font-size:120%;font-weight:bold;border:1px solid #a3b0bf;text-align:left;color:#000;padding:0.2em 0.4em;">Some Starting Points</h2><br />
</th></tr><br />
<tr><td style="color:#000"><br />
<br />
* [[Introduction|The Introduction]] is an overview of basic archiving methods.<br />
<br />
* [[Why Back Up?]] Because they don't care about you.<br />
<br />
* [[Facebook|Back Up your Facebook Data]] Learn how to liberate your personal data from Facebook.<br />
<br />
* [[Software]] will assist you in regaining control of your data by providing tools for information backup, archiving and distribution. <br />
<br />
* [[Formats]] will familiarise you with the various data formats, and how to ensure your files will be readable in the future.<br />
<br />
* [[Storage Media]] is about where to get it, what to get, and how to use it.<br />
<br />
* [[Recommended Reading]] links to other sites for further information.<br />
<br />
* [[Frequently Asked Questions]] is where we answer common questions.<br />
</td></tr><br />
<tr><th><br />
<h2 style="margin:0;background:#cedff2;font-size:120%;font-weight:bold;border:1px solid #a3b0bf;text-align:left;color:#000;padding:0.2em 0.4em;">Quote of the Moment</h2><br />
</th></tr><br />
<tr><td style="color:#000"><br />
<tr><td style="margin:20;background-color:#000000;font-size:200%;font-weight:bold;border:1px solid #a3b0bf;text-align:center;color:#fff;" ><br />
"[Yahoo!] found the way to destroy <br />
the most massive amount of history<br />
in the shortest amount of time <br />
with absolutely no recourse"<br />
</td></tr><br />
<tr><td style="text-align:right"><br />
[http://www.time.com/time/business/article/0,8599,1936645,00.html Internet Atrocity! GeoCities' Demise Erases Web History] <br />
<br>By Dan Fletcher, TIME Magazine, Monday, Nov. 09, 2009<br />
</td></tr><br />
</table><br />
</td></tr><br />
</table><br />
'''Archive Team is in no way affiliated with the fine folks at [http://www.archive.org ARCHIVE.ORG]'''<br />
'''Archive Team can always be reached at [mailto:archiveteam@archiveteam.org archiveteam@archiveteam.org]'''</div>Dnovahttps://wiki.archiveteam.org/index.php?title=Main_Page&diff=7487Main Page2012-04-02T11:14:05Z<p>Dnova: </p>
<hr />
<div>__NOTOC__<br />
<center><br />
<!-- [[Image:Jasonappeal.png|center|link=Introduction]]--><br />
<table style="width:100%;border-spacing:8px;margin:12px 0px 0px 0px"><br />
<tr><td style="width:60%;border:1px solid #FFB9B9;background-color:#FFFFF0;vertical-align:top;color:#000"><br />
<table class="thumb" width="100%" cellpadding="2" cellspacing="5" style="vertical-align:top;background-color:#FFFFF0;"><br />
<tr><td><br />
[[Image:Archiveteam.jpg|center|300px]]<br />
<td style="color:#000;text-align:left;vertical-align:top"><br />
<br />
=== HISTORY IS OUR FUTURE ===<br />
''And we've been trashing our history''<br />
<br />
Archive Team is a loose collective of rogue archivists, programmers, writers and loudmouths dedicated to saving our digital heritage. Since 2009 this variant force of nature has caught wind of shutdowns, shutoffs, mergers, and plain old deletions - and done our best to save the history before it's lost forever. Along the way, we've gotten attention, resistance, press and discussion, but most importantly, we've gotten the message out: ''IT DOESN'T HAVE TO BE THIS WAY''.<br />
<br />
This website is intended to be an offloading point and information depot for a number of archiving projects, all related to saving websites or data that is in danger of being lost. Besides serving as a hub for team-based pulling down and mirroring of data, this site will provide advice on managing your own data and rescuing it from the brink of destruction.<br />
<br />
Feel free to join us on the [[IRC_Channel|IRC channel]]! We're on the EFnet network in a channel called '''#archiveteam''', where we say truly awful things.<br />
<!-- featured article ends --><br />
</tr><br />
<tr><th colspan=2><br />
<h2 style="margin:0;background-color:#a3b0bf;font-size:120%;font-weight:bold;border:1px solid #a3b0bf;text-align:left;color:#000;padding:0.2em 0.4em;">Currently Active Projects (Get Involved Here!)</h2><br />
</th></tr><br />
<tr><td style="color:#000" colspan=2><br />
<!-- active starts --><br />
* '''[[MobileMe]]''' - Apple's file storage and sharing service, currently hosting over 200 terabytes of data, is shutting down on June 30, 2012.<br />
** A distributed tracker and very easy-to-use scripts are in place.<br />
** More downloaders are needed for this project!<br />
** Track the download progress at http://memac.heroku.com/ '''80 terabytes''' and counting!<br />
* '''[[FortuneCity]]''' - A "free" webhost founded in 1997 with around 1 million users. Going non-free/closing on April 30, 2012.<br />
** Project essentially complete!<br />
* '''[[FanFiction.Net]]''' - Around 7 million fan-fiction stories hosted on what may be the largest site of its kind in the world. They're not shutting down but Archiveteam wants a copy for posterity. <br />
** Coders are currently needed to figure out an intelligent way to comprehensively archive the site. <br />
<!-- active ends --><br />
<tr><th colspan=2><br />
<h2 style="margin:0;background-color:#a3b0bf;font-size:120%;font-weight:bold;border:1px solid #a3b0bf;text-align:left;color:#000;padding:0.2em 0.4em;">Archive Team News</h2><br />
</th></tr><br />
<tr><td style="color:#000" colspan=2><br />
<!-- news starts --><br />
* '''March, 2012''': [http://www.dereferer.org/?http%3A%2F%2Ffortunecity.com FortuneCity] announced the end of its free-hosting model, threatening around one million user-generated websites.<br />
* '''March, 2012''': We've switched servers to one of those new-fangled hosting companies that aren't hacked. We're going to sell you a lot less in the way of medical supplies now.<br />
* '''December, 2011''': POE News says it will soon be nevermore: [http://www.poe-news.com/forums/sp.php?pi=1002546492 Announcement]<br />
* '''December, 2011''': GamePro magazine halts publication and their website goes dark.<br />
* '''November, 2011''': Archiveteam rescues over 1.3 million users' data from [[Splinder]]'s closure.<br />
* '''July 2011''': Archiveteam teaches you how to [[Rescuing_Floppy_Disks|rescue data from Floppy Disks]].<br />
* '''May, 2011''': [[Friendster]] is deleting everything at the end of the month.<br />
* '''May, 2011''': Archiveteam keeps it classy at [[poetry.com]].<br />
* '''April, 2011''': How about some [[Google Video]]?<br />
* '''March, 2011''': The [http://www.archive.org/details/personalarchiveconf 2011 Personal Digital Archiving Conference] talks are available.<br />
* '''February, 2011''': Let's watch some [[Yahoo! Video]]<br />
* '''December, 2010''': Archiveteam is Delicious!<br />
* '''October, 2010''': Archiveteam offers Geocities as a torrent.<br />
* '''December 23, 2009''': Yahoo shut down [[starwars.yahoo.com]]. We got a copy.<br />
* '''October, 2009''': [[Geocities]] closing is definitely the top of the charts.<br />
<!-- news ends --><br />
</td></tr><br />
</table><br />
<td style="width:40%;border:1px solid #cedff2;background-color:#f5faff;vertical-align:top"><br />
<table width="100%" cellpadding="2" cellspacing="5" style="vertical-align:top;background-color:#f5faff"><br />
<tr><th><br />
<h2 style="margin:0;background:#cedff2;font-size:120%;font-weight:bold;border:1px solid #a3b0bf;text-align:left;color:#000;padding:0.2em 0.4em;">What is What</h2><br />
</th></tr><br />
<tr><td style="color:#000"><br />
<!-- links starts --><br />
* [[Who We Are]] and how you can join our cause!<br />
<br />
* [[Deathwatch]] is where we keep track of sites that are sickly, dying or dead.<br />
<br />
* [[Fire Drill]] is where we keep track of sites that seem fine but a lot depends on them.<br />
<br />
* [[Projects]] is a comprehensive list of AT endeavors.<br />
<br />
* [[Philosophy]] describes the ideas underpinning our work.<br />
<br />
* [[Archives]] <br />
<!-- links ends --><br />
</td></tr><br />
<tr><th><br />
<h2 style="margin:0;background:#cedff2;font-size:120%;font-weight:bold;border:1px solid #a3b0bf;text-align:left;color:#000;padding:0.2em 0.4em;">Some Starting Points</h2><br />
</th></tr><br />
<tr><td style="color:#000"><br />
<br />
* [[Introduction|The Introduction]] is an overview of basic archiving methods.<br />
<br />
* [[Why Back Up?]] Because they don't care about you.<br />
<br />
* [[Facebook|Back Up your Facebook Data]] Learn how to liberate your personal data from Facebook.<br />
<br />
* [[Software]] will assist you in regaining control of your data by providing tools for information backup, archiving and distribution. <br />
<br />
* [[Formats]] will familiarise you with the various data formats, and how to ensure your files will be readable in the future.<br />
<br />
* [[Storage Media]] is about where to get it, what to get, and how to use it.<br />
<br />
* [[Recommended Reading]] links to other sites for further information.<br />
<br />
* [[Frequently Asked Questions]] is where we answer common questions.<br />
</td></tr><br />
<tr><th><br />
<h2 style="margin:0;background:#cedff2;font-size:120%;font-weight:bold;border:1px solid #a3b0bf;text-align:left;color:#000;padding:0.2em 0.4em;">Quote of the Moment</h2><br />
</th></tr><br />
<tr><td style="color:#000"><br />
<tr><td style="margin:20;background-color:#000000;font-size:200%;font-weight:bold;border:1px solid #a3b0bf;text-align:center;color:#fff;" ><br />
"[Yahoo!] found the way to destroy <br />
the most massive amount of history<br />
in the shortest amount of time <br />
with absolutely no recourse"<br />
</td></tr><br />
<tr><td style="text-align:right"><br />
[http://www.time.com/time/business/article/0,8599,1936645,00.html Internet Atrocity! GeoCities' Demise Erases Web History] <br />
<br>By Dan Fletcher, TIME Magazine, Monday, Nov. 09, 2009<br />
</td></tr><br />
</table><br />
</td></tr><br />
</table><br />
'''Archive Team is in no way affiliated with the fine folks at [http://www.archive.org ARCHIVE.ORG]'''<br />
'''Archive Team can always be reached at [mailto:archiveteam@archiveteam.org archiveteam@archiveteam.org]'''</div>Dnovahttps://wiki.archiveteam.org/index.php?title=Facebook&diff=7486Facebook2012-04-02T11:12:33Z<p>Dnova: /* Download Your Data From Facebook */</p>
<hr />
<div>{{Infobox project<br />
| title = Facebook<br />
| image = Facebooklogo.png<br />
| description = Facebook Logo<br />
| URL = http://facebook.com<br />
| project_status = {{online}}<br />
| archiving_status = {{nosavedyet}}<br />
}}<br />
<br />
'''Facebook''' is a social networking site whose popularity has exploded in recent years. As of February 2012, there are more than ''845 million'' active users of the site. Facebook hosts untold billions of users' photos, videos, thoughts, conversations, and other content. <br />
<br />
The judicious user will have a well-designed backup plan for all that content that they retain full control over, but it is a reasonable assumption that the majority of users rely totally on Facebook to safeguard their data. '''This is a mistake.''' <br />
<br />
It might seem completely unthinkable that a site as massive and as popular as Facebook could ever disappear, taking your data with it. The reality is that websites, even hugely popular ones, can decline in popularity over time and eventually go away, taking your data with them with little or no warning. We've seen it happen.<br />
<br />
While Facebook may not be in any immediate danger, you should consider that the data you put on Facebook may be immensely important to you in 10 or 20 years, similar to your family's photo albums. Facebook could be long dead by then. Start planning for this eventuality right now. <br />
<br />
== Download Your Data From Facebook ==<br />
<br />
Facebook has created a tool to download an entire archive of your Facebook account. This includes all of your own photos and videos, chat conversations, messages, status updates and wall posts. It does NOT include photos and videos belonging to other people even if you are tagged in them, so do keep that in mind. <br />
<br />
To create your archive, click the little down arrow next to your name in the upper right area of the page and go to "account settings". You should then see a screen like the one below: <br />
[[File:Fbdownload.png | center]]<br />
<br />
The next screen will explain what's going on. Press "Start my Archive" and you will be presented with a popup telling you that this will take some time - around one hour is not unheard of. Press Start again and Facebook will generate the file for you. This may indeed take a while. In the meantime you can continue using Facebook as usual. They will email you when the archive is ready for download. <br />
<br />
Your email will contain a link to download your archive. Follow that link and enter your Facebook password to continue. The next page presents you with a download button and an estimate of the archive's size. Download it somewhere convenient for you. '''This file contains highly personal and potentially sensitive information''' so keep it safe! You may wish to encrypt it with a password using a free tool like [http://www.axantum.com/axcrypt/ Axcrypt]. The easiest way to browse the information is to extract the contents of the zip file, and then open the index.html file with your browser of choice. From there you can look at your profile, your wall posts, photos and videos, and private messages. <br />
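<br />
For command-line users the same steps can be scripted. This is only a minimal sketch, assuming a Unix-like system with <code>unzip</code> and <code>gpg</code> installed; the archive filename here is a placeholder:<br />
<pre><br />
# Extract the archive and open the index page in a browser.<br />
unzip facebook-archive.zip -d facebook-archive<br />
xdg-open facebook-archive/index.html    # Linux; or open index.html in any browser<br />
# Optionally, encrypt the original zip with a passphrase before storing it.<br />
gpg --symmetric --cipher-algo AES256 facebook-archive.zip<br />
</pre><br />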
<br />
Note that as of April, 2012, this download tool seems to have some bugs -- in my tests it failed to completely back up all of my conversations, for example. It's better than nothing but for now at least I don't trust that it is perfect. <br />
<br />
=== Former unofficial Backup tools ===<br />
<br />
* [http://on10.net/Blogs/larry/export-facebook-to-excel-with-friendcsv/ FriendCSV] exports your contacts to CSV files.<br />
<br />
* [http://www.vincentcheung.ca/facedown/ Facedown] downloads photo albums from Facebook. <br />
<br />
This leaves the wall, profile information and the plethora of Facebook apps out in the cold. Perhaps a backup app could be written for Facebook from within Facebook using the applications framework.<br />
<br />
== Vital Signs ==<br />
<br />
Currently stable.<br />
<br />
== External links ==<br />
* http://facebook.com<br />
<br />
{{Navigation box}}</div>Dnovahttps://wiki.archiveteam.org/index.php?title=Facebook&diff=7485Facebook2012-04-02T11:11:48Z<p>Dnova: </p>
<hr />
<div>{{Infobox project<br />
| title = Facebook<br />
| image = Facebooklogo.png<br />
| description = Facebook Logo<br />
| URL = http://facebook.com<br />
| project_status = {{online}}<br />
| archiving_status = {{nosavedyet}}<br />
}}<br />
<br />
'''Facebook''' is a social networking site whose popularity has exploded in recent years. As of February 2012, there are more than ''845 million'' active users of the site. Facebook hosts untold billions of users' photos, videos, thoughts, conversations, and other content. <br />
<br />
The judicious user will have a well-designed backup plan for all that content that they retain full control over, but it is a reasonable assumption that the majority of users rely totally on Facebook to safeguard their data. '''This is a mistake.''' <br />
<br />
It might seem completely unthinkable that a site as massive and as popular as Facebook could ever disappear, taking your data with it. The reality is that websites, even hugely popular ones, can decline in popularity over time and eventually go away, taking your data with them with little or no warning. We've seen it happen.<br />
<br />
While Facebook may not be in any immediate danger, you should consider that the data you put on Facebook may be immensely important to you in 10 or 20 years, similar to your family's photo albums. Facebook could be long dead by then. Start planning for this eventuality right now. <br />
<br />
== Download Your Data From Facebook ==<br />
<br />
Facebook has created a tool to download an entire archive of your Facebook account. This includes all of your own photos and videos, chat conversations, messages, status updates and wall posts. It does NOT include photos and videos belonging to other people even if you are tagged in them, so do keep that in mind. <br />
<br />
To create your archive, click the little down arrow next to your name in the upper right area of the page and go to "account settings". You should then see a screen like the one below: <br />
[[File:Fbdownload.png | center]]<br />
<br />
The next screen will explain what's going on. Press "Start my Archive" and you will be presented with a popup telling you that this will take some time - around one hour is not unheard of. Press Start again and Facebook will generate the file for you. This may indeed take a while. In the meantime you can continue using Facebook as usual. They will email you when the archive is ready for download. <br />
<br />
Your email will contain a link to download your archive. Follow that link and enter your Facebook password to continue. The next page presents you with a download button and an estimate of the archive's size. Download it somewhere convenient for you. '''This file contains highly personal and potentially sensitive information''' so keep it safe! You may wish to encrypt it with a password using a free tool like [http://www.axantum.com/axcrypt/ Axcrypt]. The easiest way to browse the information is to extract the contents of the zip file, and then open the index.html file with your browser of choice. From there you can look at your profile, your wall posts, photos and videos, and private messages. <br />
<br />
Note that as of April, 2012, this download tool seems to have some bugs -- in my tests it failed to completely back up all of my conversations, for example. It's better than nothing but for now at least I don't trust that it is perfect. <br />
<br />
=== Former unofficial Backup tools ===<br />
<br />
* [http://on10.net/Blogs/larry/export-facebook-to-excel-with-friendcsv/ FriendCSV] exports your contacts to CSV files.<br />
<br />
* [http://www.vincentcheung.ca/facedown/ Facedown] downloads photo albums from Facebook. <br />
<br />
This leaves the wall, profile information and the plethora of Facebook apps out in the cold. Perhaps a backup app could be written for Facebook from within Facebook using the applications framework.<br />
<br />
== Vital Signs ==<br />
<br />
Currently stable.<br />
<br />
== External links ==<br />
* http://facebook.com<br />
<br />
{{Navigation box}}</div>Dnovahttps://wiki.archiveteam.org/index.php?title=Facebook&diff=7484Facebook2012-04-02T09:28:24Z<p>Dnova: </p>
<hr />
<div>{{Infobox project<br />
| title = Facebook<br />
| image = Facebooklogo.png<br />
| description = Facebook Logo<br />
| URL = http://facebook.com<br />
| project_status = {{online}}<br />
| archiving_status = {{nosavedyet}}<br />
}}<br />
<br />
'''Facebook''' is a social networking site whose popularity has exploded in recent years. As of February 2012, there are more than ''845 million'' active users of the site. Facebook hosts untold billions of users' photos, videos, thoughts, conversations, and other content. <br />
<br />
The judicious user will have a well-designed backup plan for all that content that they retain full control over, but it is a reasonable assumption that the majority of users rely totally on Facebook to safeguard their data. '''This is a mistake.''' <br />
<br />
It might seem completely unthinkable that a site as massive and as popular as Facebook could ever disappear, taking your data with it. The reality is that websites, even hugely popular ones, can decline in popularity over time and eventually go away, taking your data with them with little or no warning. We've seen it happen.<br />
<br />
While Facebook may not be in any immediate danger, you should consider that the data you put on Facebook may be immensely important to you in 10 or 20 years, similar to your family's photo albums. Facebook could be long dead by then. Start planning for this eventuality right now. <br />
<br />
== Download Your Data From Facebook ==<br />
<br />
Facebook has created a tool to download an entire archive of your Facebook account. This includes all of your own photos and videos, chat conversations, messages, status updates and wall posts. It does NOT include photos and videos belonging to other people even if you are tagged in them, so do keep that in mind. <br />
<br />
To create your archive, click the little down arrow next to your name in the upper right area of the page and go to "account settings". You should then see a screen like the one below: <br />
[[File:Fbdownload.png | center]]<br />
<br />
The next screen will explain what's going on. Press "Start my Archive" and you will be presented with a popup telling you that this might take a few minutes. Press Start again and Facebook will generate the file for you. This may indeed take several minutes. In the meantime you can continue using Facebook as usual. They will email you when the archive is ready for download. <br />
<br />
more here soon.<br />
<br />
=== Former unofficial Backup tools ===<br />
<br />
* [http://on10.net/Blogs/larry/export-facebook-to-excel-with-friendcsv/ FriendCSV] exports your contacts to CSV files.<br />
<br />
* [http://www.vincentcheung.ca/facedown/ Facedown] downloads photo albums from Facebook. <br />
<br />
This leaves the wall, profile information and the plethora of Facebook apps out in the cold. Perhaps a backup app could be written for Facebook from within Facebook using the applications framework.<br />
<br />
== Vital Signs ==<br />
<br />
Currently stable.<br />
<br />
== External links ==<br />
* http://facebook.com<br />
<br />
{{Navigation box}}</div>Dnovahttps://wiki.archiveteam.org/index.php?title=Facebook&diff=7483Facebook2012-04-02T09:27:52Z<p>Dnova: </p>
<hr />
<div>{{Infobox project<br />
| title = Facebook<br />
| image = Facebooklogo.png<br />
| description = Facebook Logo<br />
| URL = http://facebook.com<br />
| project_status = {{online}}<br />
| archiving_status = {{nosavedyet}}<br />
}}<br />
<br />
'''Facebook''' is a social networking site whose popularity has exploded in recent years. As of February 2012, there are more than ''845 million'' active users of the site. Facebook hosts untold billions of users' photos, videos, thoughts, conversations, and other content. <br />
<br />
The judicious user will have a well-designed backup plan for all that content that they retain full control over, but it is a reasonable assumption that the majority of users rely totally on Facebook to safeguard their data. '''This is a mistake.''' <br />
<br />
It might seem completely unthinkable that a site as massive and as popular as Facebook could ever disappear, taking your data with it. The reality is that websites, even hugely popular ones, can decline in popularity over time and eventually go away, taking your data with them with little or no warning. We've seen it happen.<br />
<br />
While Facebook may not be in any immediate danger, you should consider that the data you put on Facebook may be immensely important to you in 10 or 20 years, similar to your family's photo albums. Facebook could be long dead by then. Start planning for this eventuality right now. <br />
<br />
== Download Your Data From Facebook ==<br />
<br />
Facebook has created a tool to download an entire archive of your Facebook account. This includes all of your own photos and videos, chat conversations, messages, status updates and wall posts. It does NOT include photos and videos belonging to other people even if you are tagged in them, so do keep that in mind. <br />
<br />
To create your archive, click the little down arrow next to your name in the upper right area of the page and go to "account settings". You should then see a screen like the one below: <br />
[[File:Fbdownload.png | center]]<br />
<br />
The next screen will explain what's going on. Press "Start my Archive" and you will be presented with a popup telling you that this might take a few minutes. Press Start again and Facebook will generate the file for you. This may indeed take several minutes. In the meantime you can continue using Facebook as usual. They will email you when the archive is ready for download. <br />
<br />
more here soon.<br />
<br />
=== Former unofficial Backup tools ===<br />
<br />
* [http://on10.net/Blogs/larry/export-facebook-to-excel-with-friendcsv/ FriendCSV] exports your contacts to CSV files.<br />
<br />
* [http://www.vincentcheung.ca/facedown/ Facedown] downloads photo albums from Facebook. <br />
<br />
This leaves the wall, profile information and the plethora of Facebook apps out in the cold. Perhaps a backup app could be written for Facebook from within Facebook using the applications framework.<br />
<br />
== Vital Signs ==<br />
<br />
Currently stable.<br />
<br />
== External links ==<br />
* http://facebook.com<br />
<br />
{{Navigation box}}</div>Dnovahttps://wiki.archiveteam.org/index.php?title=File:Fbdownload.png&diff=7482File:Fbdownload.png2012-04-02T09:20:03Z<p>Dnova: </p>
<hr />
<div></div>Dnovahttps://wiki.archiveteam.org/index.php?title=File:Facebooklogo.png&diff=7481File:Facebooklogo.png2012-04-02T08:54:39Z<p>Dnova: </p>
<hr />
<div></div>Dnovahttps://wiki.archiveteam.org/index.php?title=Main_Page&diff=7480Main Page2012-04-02T08:35:19Z<p>Dnova: </p>
<hr />
<div>__NOTOC__<br />
<center><br />
<!-- [[Image:Jasonappeal.png|center|link=Introduction]]--><br />
<table style="width:100%;border-spacing:8px;margin:12px 0px 0px 0px"><br />
<tr><td style="width:60%;border:1px solid #FFB9B9;background-color:#FFFFF0;vertical-align:top;color:#000"><br />
<table class="thumb" width="100%" cellpadding="2" cellspacing="5" style="vertical-align:top;background-color:#FFFFF0;"><br />
<tr><td><br />
[[Image:Archiveteam.jpg|center|300px]]<br />
<td style="color:#000;text-align:left;vertical-align:top"><br />
<br />
=== HISTORY IS OUR FUTURE ===<br />
''And we've been trashing our history''<br />
<br />
Archive Team is a loose collective of rogue archivists, programmers, writers and loudmouths dedicated to saving our digital heritage. Since 2009 this variant force of nature has caught wind of shutdowns, shutoffs, mergers, and plain old deletions - and done our best to save the history before it's lost forever. Along the way, we've gotten attention, resistance, press and discussion, but most importantly, we've gotten the message out: ''IT DOESN'T HAVE TO BE THIS WAY''.<br />
<br />
This website is intended to be an offloading point and information depot for a number of archiving projects, all related to saving websites or data that is in danger of being lost. Besides serving as a hub for team-based pulling down and mirroring of data, this site will provide advice on managing your own data and rescuing it from the brink of destruction.<br />
<br />
Feel free to join us on the [[IRC_Channel|IRC channel]]! We're on the EFnet network in a channel called '''#archiveteam''', where we say truly awful things.<br />
<!-- featured article ends --><br />
</tr><br />
<tr><th colspan=2><br />
<h2 style="margin:0;background-color:#a3b0bf;font-size:120%;font-weight:bold;border:1px solid #a3b0bf;text-align:left;color:#000;padding:0.2em 0.4em;">Currently Active Projects (Get Involved Here!)</h2><br />
</th></tr><br />
<tr><td style="color:#000" colspan=2><br />
<!-- active starts --><br />
* '''[[MobileMe]]''' - Apple's file storage and sharing service, currently hosting over 200 terabytes of data, is shutting down on June 30, 2012.<br />
** A distributed tracker and very easy-to-use scripts are in place.<br />
** More downloaders are needed for this project!<br />
** Track the download progress at http://memac.heroku.com/ '''80 terabytes''' and counting!<br />
* '''[[FortuneCity]]''' - A "free" webhost founded in 1997 with around 1 million users. Going non-free/closing on April 30, 2012.<br />
** Project essentially complete!<br />
* '''[[FanFiction.Net]]''' - Around 7 million fan-fiction stories hosted on what may be the largest site of its kind in the world. They're not shutting down but Archiveteam wants a copy for posterity. <br />
** Coders are currently needed to figure out an intelligent way to comprehensively archive the site. <br />
<!-- active ends --><br />
<tr><th colspan=2><br />
<h2 style="margin:0;background-color:#a3b0bf;font-size:120%;font-weight:bold;border:1px solid #a3b0bf;text-align:left;color:#000;padding:0.2em 0.4em;">Archive Team News</h2><br />
</th></tr><br />
<tr><td style="color:#000" colspan=2><br />
<!-- news starts --><br />
* '''March, 2012''': [http://www.dereferer.org/?http%3A%2F%2Ffortunecity.com FortuneCity] announced the end of its free-hosting model, threatening around one million user-generated websites.<br />
* '''March, 2012''': We've switched servers to one of those new-fangled hosting companies that aren't hacked. We're going to sell you a lot less in the way of medical supplies now.<br />
* '''December, 2011''': POE News says it will soon be nevermore: [http://www.poe-news.com/forums/sp.php?pi=1002546492 Announcement]<br />
* '''December, 2011''': GamePro magazine halts publication and their website goes dark.<br />
* '''November, 2011''': Archiveteam rescues over 1.3 million users' data from [[Splinder]]'s closure.<br />
* '''July 2011''': Archiveteam teaches you how to [[Rescuing_Floppy_Disks|rescue data from Floppy Disks]].<br />
* '''May, 2011''': [[Friendster]] is deleting everything at the end of the month.<br />
* '''May, 2011''': Archiveteam keeps it classy at [[poetry.com]].<br />
* '''April, 2011''': How about some [[Google Video]]?<br />
* '''March, 2011''': The [http://www.archive.org/details/personalarchiveconf 2011 Personal Digital Archiving Conference] talks are available.<br />
* '''February, 2011''': Let's watch some [[Yahoo! Video]]<br />
* '''December, 2010''': Archiveteam is Delicious!<br />
* '''October, 2010''': Archiveteam offers Geocities as a torrent.<br />
* '''December 23, 2009''': Yahoo shut down [[starwars.yahoo.com]]. We got a copy.<br />
* '''October, 2009''': [[Geocities]] closing is definitely the top of the charts.<br />
<!-- news ends --><br />
</td></tr><br />
</table><br />
<td style="width:40%;border:1px solid #cedff2;background-color:#f5faff;vertical-align:top"><br />
<table width="100%" cellpadding="2" cellspacing="5" style="vertical-align:top;background-color:#f5faff"><br />
<tr><th><br />
<h2 style="margin:0;background:#cedff2;font-size:120%;font-weight:bold;border:1px solid #a3b0bf;text-align:left;color:#000;padding:0.2em 0.4em;">What is What</h2><br />
</th></tr><br />
<tr><td style="color:#000"><br />
<!-- links starts --><br />
* [[Who We Are]] and how you can join our cause!<br />
<br />
* [[Deathwatch]] is where we keep track of sites that are sickly, dying or dead.<br />
<br />
* [[Fire Drill]] is where we keep track of sites that seem fine but a lot depends on them.<br />
<br />
* [[Projects]] is a comprehensive list of AT endeavors.<br />
<br />
* [[Philosophy]] describes the ideas underpinning our work.<br />
<br />
* [[Archives]] <br />
<!-- links ends --><br />
</td></tr><br />
<tr><th><br />
<h2 style="margin:0;background:#cedff2;font-size:120%;font-weight:bold;border:1px solid #a3b0bf;text-align:left;color:#000;padding:0.2em 0.4em;">Some Starting Points</h2><br />
</th></tr><br />
<tr><td style="color:#000"><br />
<br />
* [[Introduction|The Introduction]] is an overview of basic archiving methods.<br />
<br />
* [[Why Back Up?]] Because they don't care about you.<br />
<br />
* [[Software]] will assist you in regaining control of your data by providing tools for information backup, archiving and distribution. <br />
<br />
* [[Formats]] will familiarise you with the various data formats, and how to ensure your files will be readable in the future.<br />
<br />
* [[Storage Media]] is about where to get it, what to get, and how to use it.<br />
<br />
* [[Recommended Reading]] links to other sites for further information.<br />
<br />
* [[Frequently Asked Questions]] is where we answer common questions.<br />
</td></tr><br />
<tr><th><br />
<h2 style="margin:0;background:#cedff2;font-size:120%;font-weight:bold;border:1px solid #a3b0bf;text-align:left;color:#000;padding:0.2em 0.4em;">Quote of the Moment</h2><br />
</th></tr><br />
<tr><td style="color:#000"><br />
<tr><td style="margin:20;background-color:#000000;font-size:200%;font-weight:bold;border:1px solid #a3b0bf;text-align:center;color:#fff;" ><br />
"[Yahoo!] found the way to destroy <br />
the most massive amount of history<br />
in the shortest amount of time <br />
with absolutely no recourse"<br />
</td></tr><br />
<tr><td style="text-align:right"><br />
[http://www.time.com/time/business/article/0,8599,1936645,00.html Internet Atrocity! GeoCities' Demise Erases Web History] <br />
<br>By Dan Fletcher, TIME Magazine, Monday, Nov. 09, 2009<br />
</td></tr><br />
</table><br />
</td></tr><br />
</table><br />
'''Archive Team is in no way affiliated with the fine folks at [http://www.archive.org ARCHIVE.ORG]'''<br />
'''Archive Team can always be reached at [mailto:archiveteam@archiveteam.org archiveteam@archiveteam.org]'''</div>Dnovahttps://wiki.archiveteam.org/index.php?title=FortuneCity&diff=7444FortuneCity2012-03-19T20:09:32Z<p>Dnova: </p>
<hr />
<div>{{Infobox project<br />
| title = FortuneCity<br />
| image = Fortunecity 1304522840896.png<br />
| description = <br />
| URL = {{url|1=http://www.fortunecity.com/}}<br />
| project_status = {{closing}} April 30th, 2012<br />
| tracker = http://focity.heroku.com/<br />
| source = https://github.com/ArchiveTeam/fortunecity<br />
| archiving_status = {{nosavedyet}}<br />
| irc = fortuneshitty<br />
}}<br />
<br />
<br />
[[File:FortuneCityNotice.png | thumb | Shutdown notice]]<br />
== How to help ==<br />
<br />
To run one or more FortuneCity downloaders, you'll need to be on Linux or a Linux-like OS.<br />
<br />
Setting up:<br />
<pre><br />
git clone git://github.com/ArchiveTeam/fortunecity.git<br />
cd fortunecity<br />
./get-wget-warc.sh<br />
</pre><br />
<br />
Check the output: does it say wget is successfully compiled? Great!<br />
<br />
Now you can run a download client. Choose a nickname and run:<br />
<pre><br />
./seesaw.sh YOURNICK<br />
</pre><br />
<br />
The script should start downloading and uploading. If it works, feel free to run a few more!<br />
<br />
<b>Please do not run more than 10 downloaders.</b> It won't work. We need many individuals making small contributions.<br />
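<br />
If you do want a handful running at once, one convenient approach is to launch each downloader in a detached GNU screen session. This is only a sketch, not part of the official scripts, and it assumes screen is installed; the session names are arbitrary:<br />
<pre><br />
# Launch three detached downloaders, each in its own screen session.<br />
for i in 1 2 3; do<br />
  screen -dmS fortunecity-$i ./seesaw.sh YOURNICK<br />
done<br />
</pre><br />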
<br />
If you want to stop, you can just kill the scripts. To stop gracefully,<br />
<pre><br />
touch STOP<br />
</pre><br />
and the script will stop after the current user.<br />
<br />
There is no need to run upload-finished.sh. seesaw.sh will automatically upload your finished users to us.<br />
<br />
== Common Problems ==<br />
<br />
=== configure: error: --with-ssl was given, but GNUTLS is not available. ===<br />
<br />
You don't have the development headers needed to compile wget-warc. This should be fairly easy to fix. If you're using a Debian or Ubuntu-based Linux distribution:<br />
<br />
apt-get install libgnutls-dev<br />
<br />
If you're using a Fedora distribution:<br />
<br />
yum install gnutls-devel<br />
<br />
If you're using FreeBSD:<br />
<br />
 cd /usr/ports/security/gnutls/<br />
 make all install clean<br />
<br />
If you're using something else you'll just have to poke around in the documentation and figure it out :)<br />
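<br />
Whatever the platform, a quick way to check whether the GnuTLS development files are visible to the build is <code>pkg-config</code> (a sketch; assumes pkg-config itself is installed):<br />
<pre><br />
# Succeeds quietly if the GnuTLS development files are installed.<br />
pkg-config --exists gnutls && echo "GnuTLS dev files found" || echo "GnuTLS dev files missing"<br />
</pre><br />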
<br />
== Status ==<br />
<br />
Archiving is well under way. Check the tracker for an up-to-date status report.<br />
<br />
== Site structure ==<br />
<br />
FortuneCity operated in multiple TLDs: <code>com co.uk es it se</code><br />
<br />
(TLDs that once operated but have since died: <code>de fr nl cn.fortunecity.com</code>)<br />
<br />
Main website:<br />
<ul><br />
<li><nowiki>http://www.fortunecity.${tld}/</nowiki></li><br />
</ul><br />
<br />
Username-based sites:<br />
<ul><br />
<li><nowiki>http://members.fortunecity.${tld}/</nowiki></li><br />
<li><nowiki>http://${username}.fortunecity.${tld}/${username}/</nowiki></li><br />
</ul><br />
<br />
Area/street-based sites:<br />
<ul><br />
<li><nowiki>http://www.fortunecity.${tld}/${area}/${street}/${number}/</nowiki></li><br />
<li>on .com also: <nowiki>http://${area}.fortunecity.com/${street}/${number}/</nowiki></li><br />
</ul><br />
<br />
Range of numbers: not fully known, but it includes at least 0 through 2600. Note: the street names are case-sensitive.<br />
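<br />
As an illustration, enumerating candidate URLs for one area/street pair is a simple loop. This is a sketch only, not the project's actual scripts; the area and street names below are placeholders, and the 0-2600 range is the rough estimate above:<br />
<pre><br />
# Emit candidate page URLs for one .com area/street pair.<br />
# "somearea" and "SomeStreet" are placeholders; street names are case-sensitive.<br />
area=somearea<br />
street=SomeStreet<br />
for number in $(seq 0 2600); do<br />
  echo "http://www.fortunecity.com/${area}/${street}/${number}/"<br />
done > url-list.txt<br />
</pre><br />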
<br />
=== Categories, areas, streets ===<br />
<br />
Before 2001, FortuneCity used a category-area-street-based system.<br />
<br />
The categories were the same for all tlds:<br />
<pre><br />
artsandhumanities<br />
businessandcareers<br />
computersandinternet<br />
entertainment<br />
homeandfamily<br />
international<br />
peopleandchat<br />
recreationandsports<br />
scifiandparanormal<br />
travelandtransport<br />
</pre><br />
<br />
(This data is on https://github.com/ArchiveTeam/fortunecity/tree/master/explore)<br />
<br />
The areas for each category can be found via Wayback (1999 or 2000): <nowiki>http://www.fortunecity.${tld}/explore/category/${category}.html</nowiki><br />
<br />
The streets for each area can be found the same way.<br />
For .com in 1999: <nowiki>http://${area}.fortunecity.com/</nowiki><br />
For the other TLDs in 2000: <nowiki>http://www.fortunecity.${tld}/explore/area/${area}.html</nowiki><br />
<br />
The streets for the following .com areas couldn't be found via Wayback:<br />
<pre>challenge cratervalley lavender littleitaly marina olympia skyscraper tatooine victorian</pre><br />
<br />
{{Navigation box}}</div>Dnovahttps://wiki.archiveteam.org/index.php?title=File:FortuneCityNotice.png&diff=7443File:FortuneCityNotice.png2012-03-19T20:05:33Z<p>Dnova: FortuneCity.com's notice of switching off its free services. Screen grabbed 2012-03-19.</p>
<hr />
<div>FortuneCity.com's notice of switching off its free services. Screen grabbed 2012-03-19.</div>Dnovahttps://wiki.archiveteam.org/index.php?title=FanFiction.Net&diff=7441FanFiction.Net2012-03-19T19:59:29Z<p>Dnova: </p>
<hr />
<div>{{Infobox project<br />
| title = FanFiction.Net<br />
| image = FanFiction.Net homepage.png<br />
| URL = {{url|1=http://www.fanfiction.net/}}<br />
| project_status = {{online}}<br />
| archiving_status = {{inprogress}}<br />
| irc = fanfriction<br />
}}<br />
<br />
'''FanFiction.Net''' is the self-proclaimed "world's largest fanfiction archive and forum" and is one of, if not the, largest sites hosting [https://en.wikipedia.org/wiki/Fan_fiction fanfiction]. It has a sister site, Fictionpress, which is much smaller but uses an identical layout.<br />
<br />
It is in the process of being preemptively archived.<br />
<br />
== Current status ==<br />
<br />
Profile IDs are being scraped from indices; downloading will then occur on a per-profile basis, roughly as sketched below. Drop by the IRC channel (#fanfriction on EFnet) for more information.
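<br />
A very rough illustration of the per-profile approach, not the project's actual scripts; the <nowiki>/u/${id}/</nowiki> profile URL pattern is an assumption:<br />
<pre><br />
# Fetch each profile page listed in profile-ids.txt, pausing between requests.<br />
# The /u/${id}/ URL pattern is an assumption, not taken from the project's scripts.<br />
while read id; do<br />
  wget "http://www.fanfiction.net/u/${id}/" -O "profile-${id}.html"<br />
  sleep 1<br />
done < profile-ids.txt<br />
</pre></div>Dnovahttps://wiki.archiveteam.org/index.php?title=Main_Page&diff=7440Main Page2012-03-19T19:57:59Z<p>Dnova: </p>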
<hr />
<div>__NOTOC__<br />
<center><br />
<!-- [[Image:Jasonappeal.png|center|link=Introduction]]--><br />
<table style="width:100%;border-spacing:8px;margin:12px 0px 0px 0px"><br />
<tr><td style="width:60%;border:1px solid #FFB9B9;background-color:#FFFFF0;vertical-align:top;color:#000"><br />
<table class="thumb" width="100%" cellpadding="2" cellspacing="5" style="vertical-align:top;background-color:#FFFFF0;"><br />
<tr><td><br />
[[Image:Archiveteam.jpg|center|300px]]<br />
<td style="color:#000;text-align:left;vertical-align:top"><br />
<br />
=== HISTORY IS OUR FUTURE ===<br />
''And we've been trashing our history''<br />
<br />
Archive Team is a loose collective of rogue archivists, programmers, writers and loudmouths dedicated to saving our digital heritage. Since 2009 this variant force of nature has caught wind of shutdowns, shutoffs, mergers, and plain old deletions - and done our best to save the history before it's lost forever. Along the way, we've gotten attention, resistance, press and discussion, but most importantly, we've gotten the message out: ''IT DOESN'T HAVE TO BE THIS WAY''.<br />
<br />
This website is intended to be an offloading point and information depot for a number of archiving projects, all related to saving websites or data that is in danger of being lost. Besides serving as a hub for team-based pulling down and mirroring of data, this site will provide advice on managing your own data and rescuing it from the brink of destruction.<br />
<br />
Feel free to join us on the [[IRC_Channel|IRC channel]]! We're on the EFnet network in a channel called '''#archiveteam''', where we say truly awful things.<br />
<!-- featured article ends --><br />
</tr><br />
<tr><th colspan=2><br />
<h2 style="margin:0;background-color:#a3b0bf;font-size:120%;font-weight:bold;border:1px solid #a3b0bf;text-align:left;color:#000;padding:0.2em 0.4em;">Currently Active Projects (Get Involved Here!)</h2><br />
</th></tr><br />
<tr><td style="color:#000" colspan=2><br />
<!-- active starts --><br />
* '''[[FortuneCity]]''' - A "free" webhost founded in 1997 with around 1 million users. Going non-free/closing on April 30, 2012.<br />
** A distributed tracker and very easy-to-use scripts are in place.<br />
** Track the download progress at http://focity.heroku.com/<br />
* '''[[MobileMe]]''' - Apple's file storage and sharing service, currently hosting over 200 terabytes of data, is shutting down on June 30, 2012.<br />
** A distributed tracker and very easy-to-use scripts are in place.<br />
** Track the download progress at http://memac.heroku.com/ 50 terabytes and counting! <br />
* '''[[FanFiction.Net]]''' - Around 7 million fan-fiction stories hosted on what may be the largest site of its kind in the world. They're not shutting down but Archiveteam wants a copy for posterity. <br />
** Coders are currently needed to figure out an intelligent way to comprehensively archive the site. <br />
<!-- active ends --><br />
<tr><th colspan=2><br />
<h2 style="margin:0;background-color:#a3b0bf;font-size:120%;font-weight:bold;border:1px solid #a3b0bf;text-align:left;color:#000;padding:0.2em 0.4em;">Archive Team News</h2><br />
</th></tr><br />
<tr><td style="color:#000" colspan=2><br />
<!-- news starts --><br />
* '''March, 2012''': [http://www.FortuneCity.com FortuneCity] announced the end of its free-hosting model, threatening around one million user-generated websites.<br />
* '''March, 2012''': We've switched servers to one of those new-fangled hosting companies that aren't hacked. We're going to sell you a lot less in the way of medical supplies now.<br />
* '''December, 2011''': POE News says it will soon be nevermore: [http://www.poe-news.com/forums/sp.php?pi=1002546492 Announcement]<br />
* '''December, 2011''': GamePro magazine halts publication and their website goes dark.<br />
* '''November, 2011''': Archiveteam rescues over 1.3 million users' data from [[Splinder]]'s closure.<br />
* '''July 2011''': Archiveteam teaches you how to [[Rescuing_Floppy_Disks|rescue data from Floppy Disks]].<br />
* '''May, 2011''': [[Friendster]] is deleting everything at the end of the month.<br />
* '''May, 2011''': Archiveteam keeps it classy at [[poetry.com]].<br />
* '''April, 2011''': How about some [[Google Video]]?<br />
* '''March, 2011''': The [http://www.archive.org/details/personalarchiveconf 2011 Personal Digital Archiving Conference] talks are available.<br />
* '''February, 2011''': Let's watch some [[Yahoo! Video]]<br />
* '''December, 2010''': Archiveteam is Delicious!<br />
* '''October, 2010''': Archiveteam offers Geocities as a torrent.<br />
* '''December 23, 2009''': Yahoo shut down [[starwars.yahoo.com]]. We got a copy.<br />
* '''October, 2009''': [[Geocities]] closing is definitely the top of the charts.<br />
<!-- news ends --><br />
</td></tr><br />
</table><br />
<td style="width:40%;border:1px solid #cedff2;background-color:#f5faff;vertical-align:top"><br />
<table width="100%" cellpadding="2" cellspacing="5" style="vertical-align:top;background-color:#f5faff"><br />
<tr><th><br />
<h2 style="margin:0;background:#cedff2;font-size:120%;font-weight:bold;border:1px solid #a3b0bf;text-align:left;color:#000;padding:0.2em 0.4em;">What is What</h2><br />
</th></tr><br />
<tr><td style="color:#000"><br />
<!-- links starts --><br />
* [[Who We Are]] and how you can join our cause!<br />
<br />
* [[Deathwatch]] is where we keep track of sites that are sickly, dying or dead.<br />
<br />
* [[Fire Drill]] is where we keep track of sites that seem fine but a lot depends on them.<br />
<br />
* [[Projects]] is a comprehensive list of AT endeavors.<br />
<br />
* [[Philosophy]] describes the ideas underpinning our work.<br />
<br />
* [[Archives]] <br />
<!-- links ends --><br />
</td></tr><br />
<tr><th><br />
<h2 style="margin:0;background:#cedff2;font-size:120%;font-weight:bold;border:1px solid #a3b0bf;text-align:left;color:#000;padding:0.2em 0.4em;">Some Starting Points</h2><br />
</th></tr><br />
<tr><td style="color:#000"><br />
<br />
* [[Introduction|The Introduction]] is an overview of basic archiving methods.<br />
<br />
* [[Why Back Up?]] Because they don't care about you.<br />
<br />
* [[Software]] will assist you in regaining control of your data by providing tools for information backup, archiving and distribution. <br />
<br />
* [[Formats]] will familiarise you with the various data formats, and how to ensure your files will be readable in the future.<br />
<br />
* [[Storage Media]] is about where to get it, what to get, and how to use it.<br />
<br />
* [[Recommended Reading]] links to other sites for further information.<br />
<br />
* [[Frequently Asked Questions]] is where we answer common questions.<br />
</td></tr><br />
<tr><th><br />
<h2 style="margin:0;background:#cedff2;font-size:120%;font-weight:bold;border:1px solid #a3b0bf;text-align:left;color:#000;padding:0.2em 0.4em;">Quote of the Moment</h2><br />
</th></tr><br />
<tr><td style="color:#000"><br />
<tr><td style="margin:20;background-color:#000000;font-size:200%;font-weight:bold;border:1px solid #a3b0bf;text-align:center;color:#fff;" ><br />
"[Yahoo!] found the way to destroy <br />
the most massive amount of history<br />
in the shortest amount of time <br />
with absolutely no recourse"<br />
</td></tr><br />
<tr><td style="text-align:right"><br />
[http://www.time.com/time/business/article/0,8599,1936645,00.html Internet Atrocity! GeoCities' Demise Erases Web History] <br />
<br>By Dan Fletcher, TIME Magazine, Monday, Nov. 09, 2009<br />
</td></tr><br />
</table><br />
</td></tr><br />
</table><br />
'''Archive Team is in no way affiliated with the fine folks at [http://www.archive.org ARCHIVE.ORG]'''<br />
'''Archive Team can always be reached at [mailto:archiveteam@archiveteam.org archiveteam@archiveteam.org]'''</div>Dnovahttps://wiki.archiveteam.org/index.php?title=Main_Page&diff=7439Main Page2012-03-19T19:43:51Z<p>Dnova: </p>
<hr />
<div>__NOTOC__<br />
<center><br />
<!-- [[Image:Jasonappeal.png|center|link=Introduction]]--><br />
<table style="width:100%;border-spacing:8px;margin:12px 0px 0px 0px"><br />
<tr><td style="width:60%;border:1px solid #FFB9B9;background-color:#FFFFF0;vertical-align:top;color:#000"><br />
<table class="thumb" width="100%" cellpadding="2" cellspacing="5" style="vertical-align:top;background-color:#FFFFF0;"><br />
<tr><td><br />
[[Image:Archiveteam.jpg|center|300px]]<br />
<td style="color:#000;text-align:left;vertical-align:top"><br />
<br />
=== HISTORY IS OUR FUTURE ===<br />
''And we've been trashing our history''<br />
<br />
Archive Team is a loose collective of rogue archivists, programmers, writers and loudmouths dedicated to saving our digital heritage. Since 2009 this variant force of nature has caught wind of shutdowns, shutoffs, mergers, and plain old deletions - and done our best to save the history before it's lost forever. Along the way, we've gotten attention, resistance, press and discussion, but most importantly, we've gotten the message out: ''IT DOESN'T HAVE TO BE THIS WAY''.<br />
<br />
This website is intended to be an offloading point and information depot for a number of archiving projects, all related to saving websites or data that is in danger of being lost. Besides serving as a hub for team-based pulling down and mirroring of data, this site will provide advice on managing your own data and rescuing it from the brink of destruction.<br />
<br />
Feel free to join us on the [[IRC_Channel|IRC channel]]! We're on the EFnet network in a channel called '''#archiveteam''', where we say truly awful things.<br />
<!-- featured article ends --><br />
</tr><br />
<tr><th colspan=2><br />
<h2 style="margin:0;background-color:#a3b0bf;font-size:120%;font-weight:bold;border:1px solid #a3b0bf;text-align:left;color:#000;padding:0.2em 0.4em;">Currently Active Projects (Get Involved Here!)</h2><br />
</th></tr><br />
<tr><td style="color:#000" colspan=2><br />
<!-- active starts --><br />
* '''[[FortuneCity]]''' - A "free" webhost founded in 1997 with around 1 million users. Going non-free/closing on April 30, 2012.<br />
** A distributed tracker and very easy-to-use scripts are in place.<br />
** Track the download progress at http://focity.heroku.com/<br />
* '''[[MobileMe]]''' - Apple's file storage and sharing service, currently hosting over 200 terabytes of data, is shutting down on June 30, 2012.<br />
** A distributed tracker and very easy-to-use scripts are in place. <br />
* '''[[FanFiction.Net]]''' - Around 7 million fan-fiction stories hosted on what may be the largest site of its kind in the world. They're not shutting down but Archiveteam wants a copy for posterity. <br />
** Coders are currently needed to figure out an intelligent way to comprehensively archive the site. <br />
<!-- active ends --><br />
<tr><th colspan=2><br />
<h2 style="margin:0;background-color:#a3b0bf;font-size:120%;font-weight:bold;border:1px solid #a3b0bf;text-align:left;color:#000;padding:0.2em 0.4em;">Archive Team News</h2><br />
</th></tr><br />
<tr><td style="color:#000" colspan=2><br />
<!-- news starts --><br />
* '''March, 2012''': [http://www.FortuneCity.com FortuneCity] announced the end of its free-hosting model, threatening around one million user-generated websites.<br />
* '''March, 2012''': We've switched servers to one of those new-fangled hosting companies that aren't hacked. We're going to sell you a lot less in the way of medical supplies now.<br />
* '''December, 2011''': POE News says it will soon be nevermore: [http://www.poe-news.com/forums/sp.php?pi=1002546492 Announcement]<br />
* '''December, 2011''': GamePro magazine halts publication and their website goes dark.<br />
* '''November, 2011''': Archiveteam rescues over 1.3 million users' data from [[Splinder]]'s closure.<br />
* '''July, 2011''': Archiveteam teaches you how to [[Rescuing_Floppy_Disks|rescue data from Floppy Disks]].<br />
* '''May, 2011''': [[Friendster]] is deleting everything at the end of the month.<br />
* '''May, 2011''': Archiveteam keeps it classy at [[poetry.com]].<br />
* '''April, 2011''': How about some [[Google Video]]?<br />
* '''March, 2011''': The [http://www.archive.org/details/personalarchiveconf 2011 Personal Digital Archiving Conference] talks are available.<br />
* '''February, 2011''': Let's watch some [[Yahoo! Video]]<br />
* '''December, 2010''': Archiveteam is Delicious!<br />
* '''October, 2010''': Archiveteam offers Geocities as a torrent.<br />
* '''December 23, 2009''': Yahoo shut down [[starwars.yahoo.com]]. We got a copy.<br />
* '''October, 2009''': [[Geocities]] closing is definitely the top of the charts.<br />
<!-- news ends --><br />
</td></tr><br />
</table><br />
<td style="width:40%;border:1px solid #cedff2;background-color:#f5faff;vertical-align:top"><br />
<table width="100%" cellpadding="2" cellspacing="5" style="vertical-align:top;background-color:#f5faff"><br />
<tr><th><br />
<h2 style="margin:0;background:#cedff2;font-size:120%;font-weight:bold;border:1px solid #a3b0bf;text-align:left;color:#000;padding:0.2em 0.4em;">What is What</h2><br />
</th></tr><br />
<tr><td style="color:#000"><br />
<!-- links starts --><br />
* [[Who We Are]] and how you can join our cause!<br />
<br />
* [[Deathwatch]] is where we keep track of sites that are sickly, dying or dead.<br />
<br />
* [[Fire Drill]] is where we keep track of sites that seem fine but a lot depends on them.<br />
<br />
* [[Projects]] is a comprehensive list of AT endeavors.<br />
<br />
* [[Philosophy]] describes the ideas underpinning our work.<br />
<br />
* [[Archives]] <br />
<!-- links ends --><br />
</td></tr><br />
<tr><th><br />
<h2 style="margin:0;background:#cedff2;font-size:120%;font-weight:bold;border:1px solid #a3b0bf;text-align:left;color:#000;padding:0.2em 0.4em;">Some Starting Points</h2><br />
</th></tr><br />
<tr><td style="color:#000"><br />
<br />
* [[Introduction|The Introduction]] is an overview of basic archiving methods.<br />
<br />
* [[Why Back Up?]] Because they don't care about you.<br />
<br />
* [[Software]] will assist you in regaining control of your data by providing tools for information backup, archiving and distribution. <br />
<br />
* [[Formats]] will familiarise you with the various data formats, and how to ensure your files will be readable in the future.<br />
<br />
* [[Storage Media]] is about where to get it, what to get, and how to use it.<br />
<br />
* [[Recommended Reading]] links to other sites for further information.<br />
<br />
* [[Frequently Asked Questions]] is where we answer common questions.<br />
</td></tr><br />
<tr><th><br />
<h2 style="margin:0;background:#cedff2;font-size:120%;font-weight:bold;border:1px solid #a3b0bf;text-align:left;color:#000;padding:0.2em 0.4em;">Quote of the Moment</h2><br />
</th></tr><br />
<tr><td style="color:#000"><br />
<tr><td style="margin:20;background-color:#000000;font-size:200%;font-weight:bold;border:1px solid #a3b0bf;text-align:center;color:#fff;" ><br />
"[Yahoo!] found the way to destroy <br />
the most massive amount of history<br />
in the shortest amount of time <br />
with absolutely no recourse"<br />
</td></tr><br />
<tr><td style="text-align:right"><br />
[http://www.time.com/time/business/article/0,8599,1936645,00.html Internet Atrocity! GeoCities' Demise Erases Web History] <br />
<br>By Dan Fletcher, TIME Magazine, Monday, Nov. 09, 2009<br />
</td></tr><br />
</table><br />
</td></tr><br />
</table><br />
'''Archive Team is in no way affiliated with the fine folks at [http://www.archive.org ARCHIVE.ORG]'''<br />
'''Archive Team can always be reached at [mailto:archiveteam@archiveteam.org archiveteam@archiveteam.org]'''</div>Dnovahttps://wiki.archiveteam.org/index.php?title=Main_Page&diff=7438Main Page2012-03-19T19:42:35Z<p>Dnova: </p>
<hr />
<div>__NOTOC__<br />
<center><br />
<!-- [[Image:Jasonappeal.png|center|link=Introduction]]--><br />
<table style="width:100%;border-spacing:8px;margin:12px 0px 0px 0px"><br />
<tr><td style="width:60%;border:1px solid #FFB9B9;background-color:#FFFFF0;vertical-align:top;color:#000"><br />
<table class="thumb" width="100%" cellpadding="2" cellspacing="5" style="vertical-align:top;background-color:#FFFFF0;"><br />
<tr><td><br />
[[Image:Archiveteam.jpg|center|300px]]<br />
<td style="color:#000;text-align:left;vertical-align:top"><br />
<br />
=== HISTORY IS OUR FUTURE ===<br />
''And we've been trashing our history''<br />
<br />
Archive Team is a loose collective of rogue archivists, programmers, writers and loudmouths dedicated to saving our digital heritage. Since 2009 this variant force of nature has caught wind of shutdowns, shutoffs, mergers, and plain old deletions - and done our best to save the history before it's lost forever. Along the way, we've gotten attention, resistance, press and discussion, but most importantly, we've gotten the message out: ''IT DOESN'T HAVE TO BE THIS WAY''.<br />
<br />
This website is intended to be an offloading point and information depot for a number of archiving projects, all related to saving websites or data that is in danger of being lost. Besides serving as a hub for team-based pulling down and mirroring of data, this site will provide advice on managing your own data and rescuing it from the brink of destruction.<br />
<br />
Feel free to join us on the [[IRC_Channel|IRC channel]]! We're on the EFnet network in a channel called '''#archiveteam''', where we say truly awful things.<br />
<!-- featured article ends --><br />
</tr><br />
<tr><th colspan=2><br />
<h2 style="margin:0;background-color:#a3b0bf;font-size:120%;font-weight:bold;border:1px solid #a3b0bf;text-align:left;color:#000;padding:0.2em 0.4em;">Currently Active Projects (Get Involved Here!)</h2><br />
</th></tr><br />
<tr><td style="color:#000" colspan=2><br />
<!-- active starts --><br />
* '''[[FortuneCity]]''' - A "free" webhost founded in 1997 with around 1 million users. Going non-free/closing on April 30, 2012.<br />
** A distributed tracker and very easy-to-use scripts are in place.<br />
** Track the download progress at http://focity.heroku.com/<br />
* '''[[MobileMe]]''' - Apple's file storage and sharing service, currently hosting over 200 terabytes of data, is shutting down on June 30, 2012.<br />
** A distributed tracker and very easy-to-use scripts are in place. <br />
* '''[[FanFiction.Net]]''' - Around 7 million fan-fiction stories hosted on what may be the largest site of its kind in the world. They're not shutting down but Archiveteam wants a copy for posterity. <br />
** Coders are currently needed to figure out an intelligent way to comprehensively archive the site. <br />
<!-- active ends --><br />
<tr><th colspan=2><br />
<h2 style="margin:0;background-color:#a3b0bf;font-size:120%;font-weight:bold;border:1px solid #a3b0bf;text-align:left;color:#000;padding:0.2em 0.4em;">Archive Team News</h2><br />
</th></tr><br />
<tr><td style="color:#000" colspan=2><br />
<!-- news starts --><br />
* '''March, 2012''': [http://www.FortuneCity.com FortuneCity] announced the end of its free-hosting model, threatening around one million user-generated websites.<br />
* '''March, 2012''': We've switched servers to one of those new-fangled hosting companies that aren't hacked. We're going to sell you a lot less in the way of medical supplies now.<br />
* '''December, 2011''': POE News says it will soon be nevermore: [http://www.poe-news.com/forums/sp.php?pi=1002546492 Announcement]<br />
* '''December, 2011''': GamePro magazine halts publication and their website goes dark.<br />
* '''November, 2011''': Archiveteam rescues over 1.3 million users' data from [[Splinder]]'s closure.<br />
* '''July, 2011''': Archiveteam teaches you how to [[Rescuing_Floppy_Disks|rescue data from Floppy Disks]].<br />
* '''May, 2011''': [[Friendster]] is deleting everything at the end of the month.<br />
* '''May, 2011''': Archiveteam keeps it classy at [[poetry.com]].<br />
* '''April, 2011''': How about some [[Google Video]]?<br />
* '''March, 2011''': The [http://www.archive.org/details/personalarchiveconf 2011 Personal Digital Archiving Conference] talks are available.<br />
* '''February, 2011''': Let's watch some [[Yahoo! Video]]<br />
* '''December, 2010''': Archiveteam is Delicious!<br />
* '''October, 2010''': Archiveteam offers Geocities as a torrent.<br />
* '''December 23, 2009''': Yahoo shut down [[starwars.yahoo.com]]. We got a copy.<br />
* '''October, 2009''': [[Geocities]] closing is definitely the top of the charts.<br />
<!-- news ends --><br />
</td></tr><br />
</table><br />
<td style="width:40%;border:1px solid #cedff2;background-color:#f5faff;vertical-align:top"><br />
<table width="100%" cellpadding="2" cellspacing="5" style="vertical-align:top;background-color:#f5faff"><br />
<tr><th><br />
<h2 style="margin:0;background:#cedff2;font-size:120%;font-weight:bold;border:1px solid #a3b0bf;text-align:left;color:#000;padding:0.2em 0.4em;">What is What</h2><br />
</th></tr><br />
<tr><td style="color:#000"><br />
<!-- links starts --><br />
* [[Who We Are]] and how you can join our cause!<br />
<br />
* [[Deathwatch]] is where we keep track of sites that are sickly, dying or dead.<br />
<br />
* [[Fire Drill]] is where we keep track of sites that seem fine but a lot depends on them.<br />
<br />
* [[Projects]] is a comprehensive list of AT endeavors.<br />
<br />
* [[Philosophy]] describes the ideas underpinning our work.<br />
<br />
* [[Archives]] <br />
<!-- links ends --><br />
</td></tr><br />
<tr><th><br />
<h2 style="margin:0;background:#cedff2;font-size:120%;font-weight:bold;border:1px solid #a3b0bf;text-align:left;color:#000;padding:0.2em 0.4em;">Some Starting Points</h2><br />
</th></tr><br />
<tr><td style="color:#000"><br />
<br />
* [[Introduction|The Introduction]] is an overview of basic archiving methods.<br />
<br />
* [[Why Back Up?]] Because they don't care about you.<br />
<br />
* [[Software]] will assist you in regaining control of your data by providing tools for information backup, archiving and distribution. <br />
<br />
* [[Formats]] will familiarise you with the various data formats, and how to ensure your files will be readable in the future.<br />
<br />
* [[Storage Media]] is about where to get it, what to get, and how to use it.<br />
<br />
* [[Recommended Reading]] links to other sites for further information.<br />
<br />
* [[Frequently Asked Questions]] is where we answer common questions.<br />
</td></tr><br />
<tr><th><br />
<h2 style="margin:0;background:#cedff2;font-size:120%;font-weight:bold;border:1px solid #a3b0bf;text-align:left;color:#000;padding:0.2em 0.4em;">Quote of the Moment</h2><br />
</th></tr><br />
<tr><td style="color:#000"><br />
<tr><td style="margin:20;background-color:#000000;font-size:200%;font-weight:bold;border:1px solid #a3b0bf;text-align:center;color:#fff;" ><br />
"[Yahoo!] found the way to destroy <br />
the most massive amount of history<br />
in the shortest amount of time <br />
with absolutely no recourse"<br />
</td></tr><br />
<tr><td style="text-align:right"><br />
[http://www.time.com/time/business/article/0,8599,1936645,00.html Internet Atrocity! GeoCities' Demise Erases Web History] <br />
<br>By Dan Fletcher, TIME Magazine, Monday, Nov. 09, 2009<br />
</td></tr><br />
</table><br />
</td></tr><br />
</table><br />
'''Archive Team is in no way affiliated with the fine folks at [http://www.archive.org ARCHIVE.ORG]'''<br />
'''Archive Team can always be reached at [mailto:archiveteam@archiveteam.org archiveteam@archiveteam.org]'''</div>Dnovahttps://wiki.archiveteam.org/index.php?title=Main_Page&diff=7181Main Page2012-01-25T20:50:33Z<p>Dnova: </p>
<hr />
<div>__NOTOC__<br />
<center><br />
<!-- [[Image:Jasonappeal.png|center|link=Introduction]]--><br />
<table style="width:100%;border-spacing:8px;margin:12px 0px 0px 0px"><br />
<tr><td style="width:60%;border:1px solid #FFB9B9;background-color:#FFFFF0;vertical-align:top;color:#000"><br />
<table class="thumb" width="100%" cellpadding="2" cellspacing="5" style="vertical-align:top;background-color:#FFFFF0;"><br />
<tr><td><br />
[[Image:Archiveteam.jpg|center|300px]]<br />
<td style="color:#000;text-align:left;vertical-align:top"><br />
<br />
=== HISTORY IS OUR FUTURE ===<br />
''And we've been trashing our history''<br />
<br />
Archive Team is a loose collective of rogue archivists, programmers, writers and loudmouths dedicated to saving our digital heritage. Since 2009 this variant force of nature has caught wind of shutdowns, shutoffs, mergers, and plain old deletions - and done our best to save the history before it's lost forever. Along the way, we've gotten attention, resistance, press and discussion, but most importantly, we've gotten the message out: ''IT DOESN'T HAVE TO BE THIS WAY''.<br />
<br />
This website is intended to be an offloading point and information depot for a number of archiving projects, all related to saving websites or data that is in danger of being lost. Besides serving as a hub for team-based pulling down and mirroring of data, this site will provide advice on managing your own data and rescuing it from the brink of destruction.<br />
<br />
Feel free to join us on the [[IRC_Channel|IRC channel]]! We're on the EFnet network in a channel called '''#archiveteam''', where we say truly awful things.<br />
<!-- featured article ends --><br />
</tr><br />
<tr><th colspan=2><br />
<h2 style="margin:0;background-color:#a3b0bf;font-size:120%;font-weight:bold;border:1px solid #a3b0bf;text-align:left;color:#000;padding:0.2em 0.4em;">Currently Active Projects (Get Involved Here!)</h2><br />
</th></tr><br />
<tr><td style="color:#000" colspan=2><br />
<!-- active starts --><br />
* '''[[Splinder]]''' - An Italian blog/media site with over 1.3 million users, closing on Jan 31, 2012.<br />
** A distributed tracker and very easy-to-use scripts are in place.<br />
** Most of the profiles are now rescued, with some difficult (large) ones remaining.<br />
** '''Anyone with Splinder profiles should rsync them immediately! Contact SketchCow for a slot.'''<br />
* '''[[MobileMe]]''' - Apple's file storage and sharing service, currently hosting over 200 terabytes of data, is shutting down on June 30, 2012.<br />
** A distributed tracker and very easy-to-use scripts are in place.<br />
** It has been reported that MobileMe is going somewhat slowly. At the time of writing, an estimated 2% of the site has been rescued. <br />
* '''[[FanFiction.Net]]''' - Around 7 million fan-fiction stories hosted on what may be the largest site of its kind in the world. They're not shutting down but Archiveteam wants a copy for posterity. <br />
** Coders are currently needed to figure out an intelligent way to comprehensively archive the site. <br />
<!-- active ends --><br />
<tr><th colspan=2><br />
<h2 style="margin:0;background-color:#a3b0bf;font-size:120%;font-weight:bold;border:1px solid #a3b0bf;text-align:left;color:#000;padding:0.2em 0.4em;">Archive Team News</h2><br />
</th></tr><br />
<tr><td style="color:#000" colspan=2><br />
<!-- news starts --><br />
* '''December, 2011''': POE News says it will soon be nevermore: [http://www.poe-news.com/forums/sp.php?pi=1002546492 Announcement]<br />
* '''December, 2011''': GamePro magazine halts publication and their website goes dark.<br />
* '''November, 2011''': Archiveteam rescues over 1.3 million users' data from [[Splinder]]'s closure.<br />
* '''July, 2011''': Archiveteam teaches you how to [[Rescuing_Floppy_Disks|rescue data from Floppy Disks]].<br />
* '''May, 2011''': [[Friendster]] is deleting everything at the end of the month.<br />
* '''May, 2011''': Archiveteam keeps it classy at [[poetry.com]].<br />
* '''April, 2011''': How about some [[Google Video]]?<br />
* '''March, 2011''': The [http://www.archive.org/details/personalarchiveconf 2011 Personal Digital Archiving Conference] talks are available.<br />
* '''February, 2011''': Let's watch some [[Yahoo! Video]]<br />
* '''December, 2010''': Archiveteam is Delicious!<br />
* '''October, 2010''': Archiveteam offers Geocities as a torrent.<br />
* '''December 23, 2009''': Yahoo shut down [[starwars.yahoo.com]]. We got a copy.<br />
* '''October, 2009''': [[Geocities]] closing is definitely the top of the charts.<br />
<!-- news ends --><br />
</td></tr><br />
</table><br />
<td style="width:40%;border:1px solid #cedff2;background-color:#f5faff;vertical-align:top"><br />
<table width="100%" cellpadding="2" cellspacing="5" style="vertical-align:top;background-color:#f5faff"><br />
<tr><th><br />
<h2 style="margin:0;background:#cedff2;font-size:120%;font-weight:bold;border:1px solid #a3b0bf;text-align:left;color:#000;padding:0.2em 0.4em;">What is What</h2><br />
</th></tr><br />
<tr><td style="color:#000"><br />
<!-- links starts --><br />
* [[Who We Are]] and how you can join our cause!<br />
<br />
* [[Deathwatch]] is where we keep track of sites that are sickly, dying or dead.<br />
<br />
* [[Fire Drill]] is where we keep track of sites that seem fine but a lot depends on them.<br />
<br />
* [[Projects]] is a comprehensive list of AT endeavors.<br />
<br />
* [[Philosophy]] describes the ideas underpinning our work.<br />
<br />
* [[Archives]] <br />
<!-- links ends --><br />
</td></tr><br />
<tr><th><br />
<h2 style="margin:0;background:#cedff2;font-size:120%;font-weight:bold;border:1px solid #a3b0bf;text-align:left;color:#000;padding:0.2em 0.4em;">Some Starting Points</h2><br />
</th></tr><br />
<tr><td style="color:#000"><br />
<br />
* [[Introduction|The Introduction]] is an overview of basic archiving methods.<br />
<br />
* [[Why Back Up?]] Because they don't care about you.<br />
<br />
* [[Software]] will assist you in regaining control of your data by providing tools for information backup, archiving and distribution. <br />
<br />
* [[Formats]] will familiarise you with the various data formats, and how to ensure your files will be readable in the future.<br />
<br />
* [[Storage Media]] is about where to get it, what to get, and how to use it.<br />
<br />
* [[Recommended Reading]] links to other sites for further information.<br />
<br />
* [[Frequently Asked Questions]] is where we answer common questions.<br />
</td></tr><br />
<tr><th><br />
<h2 style="margin:0;background:#cedff2;font-size:120%;font-weight:bold;border:1px solid #a3b0bf;text-align:left;color:#000;padding:0.2em 0.4em;">Quote of the Moment</h2><br />
</th></tr><br />
<tr><td style="color:#000"><br />
<tr><td style="margin:20;background-color:#000000;font-size:200%;font-weight:bold;border:1px solid #a3b0bf;text-align:center;color:#fff;" ><br />
"[Yahoo!] found the way to destroy <br />
the most massive amount of history<br />
in the shortest amount of time <br />
with absolutely no recourse"<br />
</td></tr><br />
<tr><td style="text-align:right"><br />
[http://www.time.com/time/business/article/0,8599,1936645,00.html Internet Atrocity! GeoCities' Demise Erases Web History] <br />
<br>By Dan Fletcher, TIME Magazine, Monday, Nov. 09, 2009<br />
</td></tr><br />
</table><br />
</td></tr><br />
</table><br />
'''Archive Team is in no way affiliated with the fine folks at [http://www.archive.org ARCHIVE.ORG]'''<br />
'''Archive Team can always be reached at [mailto:archiveteam@archiveteam.org archiveteam@archiveteam.org]'''</div>Dnovahttps://wiki.archiveteam.org/index.php?title=Splinder&diff=7036Splinder2011-12-16T12:31:15Z<p>Dnova: </p>
<hr />
<div>{{Infobox project<br />
| title = Splinder<br />
| image = Splinder homepage.png<br />
| URL = {{url|1=http://www.splinder.com/}}<br />
{{url|1=http://www.us.splinder.com/}}<br />
| project_status = {{closing}}<br />
| archiving_status = {{inprogress}}<br />
}}<br />
Splinder.com has long been the main blog hosting company in Italy (see [[Wikipedia:it:Splinder]]). It was founded in 2001 and hosts about half a million blogs and over 55 million pages.<br />
Since November 8, 2011, a warning on the home page has said that no new PRO accounts have been created since June 1. The company has confirmed that the website will close on the 24th.[http://soluzioni.splinder.com/post/25737683/avviso-per-gli-utenti-ce-da-preoccuparsi/comment/65653358#cid-65653358]<br />
<br />
'''Update''': the company issued an official statement saying that the closure will happen on January 31, 2012.[http://www.procionegobbo.it/blog/2011/11/splinder-chiude/] According to our tracker, we have downloaded or assigned all users.<br />
<br />
== Upload status ==<br />
<br />
For the time being: please ignore any errors caused by special characters in usernames (| ^ etc.); we'll get those profiles later.<br />
<br />
{| class="wikitable" style="text-align: left"<br />
|+ Uploaded to batcave?<br />
! scope="col" colspan="3" | Phase 1<br />
|-<br />
! scope="col" | Downloader || Count || Status<br />
|-<br />
| closure || 254869 || <br />
|-<br />
| kenneth || 206696 || <br />
|-<br />
| ndurner || 177665 || '''Uploaded'''<br />
|-<br />
| Nemo || 111340 || '''Uploaded''' with errors, some incomplete<br />
|-<br />
| donbex || 71562 || <br />
|-<br />
| dnova || 68740 || '''Uploaded'''; all special char profiles fixed<br />
|-<br />
| underscor || 58774 || <br />
|-<br />
| Wyatt || 54525 || <br />
|-<br />
| crawl336 || 45785 || <br />
|-<br />
| Angra || 35752 || <br />
|-<br />
| cameron_d || 26357 || <br />
|-<br />
| db48x || 23120 || '''Uploaded''', three profiles not uploaded<br />
|-<br />
| yipdw || 18789 || Most uploaded, re-doing some larger blogs with errors<br />
|-<br />
| crawl338 || 17783 || <br />
|-<br />
| crawl337 || 16784 || <br />
|-<br />
| crawl334 || 15897 || <br />
|-<br />
| Coderjoe || 13749 || '''Uploaded''' from both machines all profiles which did not have .incomplete (fixed some backslashes in profile directory names)<br />
|-<br />
| bsmith093 || 13194 || <br />
|-<br />
| DoubleJ || 10301 || '''Uploaded''' from all machines w/ no errors<br />
|-<br />
| crawl339 || 9026 || <br />
|-<br />
| anonymous || 8653 || <br />
|-<br />
| kennethreitz || 8287 || <br />
|-<br />
| alard || 7299 || '''Uploaded''', one error<br />
|-<br />
| dashcloud || 6803 || <br />
|-<br />
| crawl333 || 6292 || <br />
|-<br />
| spirit || 6282 || <br />
|-<br />
| crawl335 || 6106 || <br />
|-<br />
| Paradoks || 5890 || '''Uploaded''', but still downloading it:scatto, which includes files.splinder.com (among others).<br />
|-<br />
| koon || 5029 || <br />
|-<br />
| chronomex || 4913 || '''Partially Uploaded''', moved house and has yet to get computers running<br />
|-<br />
| VMB || 4620 || <br />
|-<br />
| shoop || 4461 || <br />
|-<br />
| marceloantonio1 || 2927 || '''Uploaded'''<br />
|-<br />
| undercave || 2508 || <br />
|-<br />
| DFJustin || 2456 || '''Uploaded''', may have errors<br />
|-<br />
| proub || 1178 || <br />
|-<br />
| Hydriz || 842 || '''Uploaded'''<br />
|-<br />
| canUbeatclosure || 669 || <br />
|-<br />
| tef || 440 || <br />
|-<br />
| arima || 347 || <br />
|-<br />
| NotGLaDOS || 259 || <br />
|-<br />
| sarpedon || 105 || <br />
|-<br />
| pberry || 89 || <br />
|-<br />
| Wyattq || 84 || <br />
|-<br />
| soultcer || 74 || <br />
|-<br />
| Konklone || 56 || <br />
|-<br />
| PepsiMax || 12 || <br />
|-<br />
| mareloantonio1 || 10 || '''Uploaded'''<br />
|-<br />
| hrbrmstr || 9 || <br />
|-<br />
| sente || 7 || <br />
|-<br />
| rebiolca || 6 || <br />
|-<br />
| 2 || 5 || <br />
|-<br />
| Wyatt-B || 3 || <br />
|-<br />
| Wyatt-A || 2 || <br />
|-<br />
| asdf || 2 || <br />
|}<br />
<br />
== How to help archiving ==<br />
<br />
There is a distributed download script that gets usernames from a tracker and downloads the data; a complete session sketch follows the steps below.<br />
<br />
Make sure you are on Linux and that you have curl, git, and a recent version of Bash. Your system must also be able to compile wget.<br />
<br />
# Get the code: <code>git clone https://github.com/ArchiveTeam/splinder-grab</code><br />
# Get and compile the latest version of wget-warc: <code>./get-wget-warc.sh</code><br />
# Think of a nickname for yourself (preferably use your IRC name).<br />
# Run the download script:<br />
#* To run a single downloader, run <code>./dld-client.sh "<YOURNICK>"</code>.<br />
#* To run multiple downloaders (and thus use your bandwidth more efficiently), do either:<br />
#** simply run as many copies of <code>dld-client.sh</code> as you like<br />
#** run <code>./dld-streamer.sh <YOURNICK> <N></code>, where <N> is the number of concurrent downloads you want.<br />
# To stop the script gracefully, run <code>touch STOP</code> in the script's working directory. It will finish the current task and stop.<br />
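<br />
Putting the steps above together, a complete session might look like this (a sketch only; <code>yournick</code> stands in for your own nickname, and four concurrent downloaders are just an example):<br />
<br />
<pre><br />
# A minimal end-to-end session (sketch; "yournick" is a placeholder)<br />
git clone https://github.com/ArchiveTeam/splinder-grab<br />
cd splinder-grab<br />
./get-wget-warc.sh             # fetch and compile wget-warc<br />
./dld-streamer.sh yournick 4   # run four concurrent downloaders<br />
# ... later, to stop gracefully once the current tasks finish:<br />
touch STOP<br />
</pre><br />
<br />
<code>dld-streamer.sh</code> keeps the requested number of downloaders busy; running plain <code>dld-client.sh</code> copies by hand works just as well if you prefer to manage them yourself.<br />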
<br />
===Notes===<br />
<br />
* Compiling wget-warc will require dev packages for the various libraries that it needs. Most questions have been about gnutls; install the <code>gnutls-devel</code> or <code>gnutls-dev</code> package with your favorite package manager (see the example after these notes).<br />
* Downloading one user's data can take between 10 seconds and several days.<br />
* The data for one user is equally varied, from a few kB to several GB.<br />
* The downloaded data will be saved in the <code>./data/</code> subdirectory.<br />
* Download speeds from splinder.com are not that high (servers may be particularly overloaded during the European daytime because of the additional traffic from people exporting their blogs). You can run multiple clients to speed things up.<br />
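<br />
As a concrete example of the gnutls note above, something like this should work on Debian/Ubuntu-style systems; treat the exact package name as an assumption and check your distribution:<br />
<br />
<pre><br />
# Debian/Ubuntu sketch; RPM-based systems typically want gnutls-devel<br />
sudo apt-get install gnutls-dev<br />
</pre><br />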
<br />
===Errors===<br />
* There are some problems with subdomains containing dashes[http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=626472]: if they fail on your machine (reported: wget compiled with +nls), stop and restart the script for now; someone else will do those users (although they seem to fail in part anyway). <br />
*: Some such users: macrisa, -Maryanne-, it:SalixArdens, it:MCris, it:7lilla, it:thepinkpenguin, it:bimbambolina, it:lazzaretta, it:Hedwige, it:N4m3L3Ss, it:Barbabietole_Azzurre, it:celebrolesa2212, it:buongiono.mattina, it:DarkExtra, it:-slash-, it:marlene1, it:Ohina, us:XyKy, us:Naluf, it:elisablu, it:*JuLs*, it:RikuSan, it:Nasutina<br />
* There are also some problems with upload-finished.sh because of some inconsistencies in escaping special characters, e.g. [http://p.defau.lt/?NITL0SVf4K4QFRgCKmlWIg]; remember not to delete those directories without fixing/uploading them.<br />
* The script looks for errors in English, so it's better to force wget-warc to use English (see the sketch below). Otherwise, errors like [http://toolserver.org/~nemobis/wget-phase-1.log these] won't be detected and the script will mark users that failed as done. Please run <code>fix-dld.sh</code> to fix those users, after changing <code>if grep -q "ERROR 50"</code> to your localised output.<br />
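<br />
One way to sidestep the localisation problem entirely (a sketch, assuming your wget build honours the standard locale environment variables) is to force English output for the whole client:<br />
<br />
<pre><br />
# Run the client under the C locale so wget's error strings match<br />
# what the script greps for (sketch; "yournick" is a placeholder)<br />
LC_ALL=C ./dld-client.sh "yournick"<br />
</pre><br />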
<br />
==== splinder_noconn.html errors ====<br />
<br />
Please check your wget logs for the presence of a file named <code>splinder_noconn.html</code>. This is a transient maintenance page that has appeared in some downloads; wget cannot detect it as an error because the page is not returned with an error status code.<br />
<br />
Some examples:<br />
<br />
* https://gist.github.com/a15c7707ee666502a825<br />
* https://gist.github.com/0427b4ed12ae48f2fb5f<br />
* http://p.defau.lt/?sJOFev7prpKYpC_CYRnqrg<br />
<br />
These accounts may have to be re-fetched.<br />
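<br />
To locate affected profiles in bulk, something like the following should work (a sketch; it assumes the default <code>./data/</code> layout and GNU find):<br />
<br />
<pre><br />
# List the profile directories that contain the transient maintenance page<br />
find data -name 'splinder_noconn.html' -printf '%h\n' | sort -u<br />
</pre><br />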
<br />
== Uploading your data ==<br />
<br />
* To upload the data you've downloaded, first contact SketchCow on IRC for an rsync slot. Once you have one, you can run the <code>./upload-finished.sh</code> script to upload your data. For example, run this in your script directory: <code>./upload-finished.sh batcave.textfiles.com::YOURNICK/splinder/</code><br />
* The script will upload only completed users. To check how much space the incomplete users are taking, without killing your disk, you can use <code>ionice -c 3 find -name .incomplete -printf "%h\0" | ionice -c 3 du -mcs --files0-from=-</code> in your <code>splinder-grab</code> directory.<br />
<br />
== Status ==<br />
<br />
There is a [http://splinder.heroku.com real-time dashboard] where you can check the progress.<br />
<br />
==External links==<br />
*http://www.splinder.com/<br />
*http://www.us.splinder.com/<br />
<br />
==Site structure==<br />
<br />
The users are identified by their usernames. Fortunately, the site provides a list of all users. Usernames are not case-sensitive, but there is a case preference.<br />
<br />
==Example URLs==<br />
User profile: <code><nowiki>http://www.splinder.com/profile/<<username>></nowiki></code><br />
<br />
<pre><br />
Example profile:<br />
http://www.splinder.com/profile/difficilifoglie<br />
<br />
View count on profile page:<br />
http://www.splinder.com/ajax.php?type=counter&op=profile&profile=Romanticdreamer<br />
<br />
Example of friends list paging: (160 per page, starting at 0)<br />
http://www.splinder.com/profile/difficilifoglie/friends<br />
http://www.splinder.com/profile/difficilifoglie/friends/160<br />
<br />
Inverse friends (probably also paged):<br />
http://www.splinder.com/profile/difficilifoglie/friendof<br />
<br />
Link to blog: (note: not always the same as the username)<br />
http://difficilifoglie.splinder.com/<br />
http://learnonline.splinder.com/<br />
<br />
Photo:<br />
http://www.splinder.com/profile/difficilifoglie/photo<br />
http://www.splinder.com/mediablog/wondermum/media/24544805<br />
<br />
Video:<br />
http://www.splinder.com/profile/wondermum/video<br />
http://www.splinder.com/mediablog/wondermum/media/25737390<br />
<br />
Audio:<br />
Not a separate user feed, but only accessible via mediablog<br />
http://www.splinder.com/mediablog/learnonline/media/25727030<br />
<br />
Mediablog: combination of the audio + video + photo lists<br />
http://www.splinder.com/mediablog/learnonline<br />
(16 per page, starting at 0)<br />
http://www.splinder.com/mediablog/learnonline/16<br />
<br />
Mediablog has PowerPoint, Word files:<br />
http://www.splinder.com/mediablog/learnonline/media/25641346<br />
http://www.splinder.com/mediablog/learnonline/media/25546305<br />
http://www.splinder.com/mediablog/learnonline/media/21901634<br />
http://www.splinder.com/mediablog/learnonline/media/24875290<br />
<br />
User avatar: grab url from profile page<br />
<br />
Photo file: grab url from photo page and remove _medium to get original picture<br />
http://files.splinder.com/d5e492233631af39212268593afca02d_square.jpg<br />
http://files.splinder.com/d5e492233631af39212268593afca02d_medium.jpg<br />
http://files.splinder.com/d5e492233631af39212268593afca02d.jpg<br />
older photos do not have this structure, different ids for each size:<br />
http://www.splinder.com/mediablog/babboramo/media/17359043<br />
http://files.splinder.com/13b615ccbd75354ee4e0d973da66c2b2.jpeg<br />
http://files.splinder.com/770d7b9ecac27083d9204af327ebe743.jpeg<br />
<br />
PowerPoint, Word files: grab url from media page<br />
http://files.splinder.com/46dbf3d5a0b12e490f81ddb8444b4fad.ppt<br />
http://files.splinder.com/ab3ce16c850ac530351d9df0937152c7.pdf<br />
<br />
Video items: grab url from media page<br />
http://files.splinder.com/8f5caff20685648bacd4ce1acf90e645_square.jpg<br />
http://files.splinder.com/8f5caff20685648bacd4ce1acf90e645_thumbnail.jpg<br />
http://files.splinder.com/8f5caff20685648bacd4ce1acf90e645_small.flv<br />
note: square, thumbnail, small is not always available, check flashvars for vidpath, imgpath<br />
http://www.splinder.com/mediablog/babboramo/media/13131052<br />
http://files.splinder.com/e067653e1532e55ee208605fcb84361a.flv<br />
http://files.splinder.com/f56060b7fef139f03b72e06ca9fcba55.jpeg<br />
<br />
Audio items: grab url from media page, flashvars<br />
sometimes there is a _thumbnail, remove that to get a better quality<br />
http://files.splinder.com/a5043c34a12ee66f5ad995ffd14493ef_thumbnail.mp3<br />
http://files.splinder.com/a5043c34a12ee66f5ad995ffd14493ef.mp3<br />
<br />
Comments on blog posts:<br />
http://www.splinder.com/myblog/comment/list/25742358<br />
on some, but not on all blogs, those comments are also included in the blog page<br />
http://dal15al25.splinder.com/post/25740180<br />
http://soluzioni.splinder.com/post/2802227/blog-pager-su-piu-righe<br />
http://soluzioni.splinder.com/post/25737683/avviso-per-gli-utenti-ce-da-preoccuparsi/<br />
http://civati.splinder.com/post/25742977<br />
pagination: see media comments<br />
<br />
Comments on media items:<br />
http://www.splinder.com/media/comment/list/21254470<br />
http://www.splinder.com/media/comment/list/21254470?from=50<br />
(50 per page, starting at 0)<br />
number of comments is on the media page<br />
http://www.splinder.com/mediablog/danspo/media/21254470<br />
<br />
<br />
Blog urls:<br />
the blogs have content from their own subdomain, but also from<br />
files.splinder.com<br />
www.splinder.com/misc/ (topbar css, gif)<br />
www.splinder.com/includes/ (js)<br />
www.splinder.com/modules/service_links/ (images)<br />
syndication.splinder.com<br />
<br />
links to www.splinder.com that should NOT be followed:<br />
/myblog/<br />
/users/<br />
/media/<br />
/node/<br />
/profile/<br />
/mediablog/<br />
/community/<br />
/user/<br />
/night/<br />
/home/<br />
/mysearch/<br />
/online/<br />
/trackback/<br />
<br />
</pre><br />
<br />
<pre><br />
wget-warc --mirror --page-requisites --span-hosts \<br />
  --domains=learnonline.splinder.com,files.splinder.com,www.splinder.com,syndication.splinder.com \<br />
  --exclude-directories="/users,/media,/node,/profile,/mediablog,/community,/user,/night,/home,/mysearch,/online,/trackback,/myblog/post,/myblog/posts,/myblog/tags,/myblog/tag,/myblog/view,/myblog/latest,/myblog/subscribe" \<br />
  -nv -o wget.log "http://learnonline.splinder.com/"<br />
</pre><br />
<br />
{{Navigation box}}</div>Dnovahttps://wiki.archiveteam.org/index.php?title=Splinder&diff=7029Splinder2011-12-15T12:50:53Z<p>Dnova: </p>
<hr />
<div>{{Infobox project<br />
| title = Splinder<br />
| image = Splinder homepage.png<br />
| URL = {{url|1=http://www.splinder.com/}}<br />
{{url|1=http://www.us.splinder.com/}}<br />
| project_status = {{closing}}<br />
| archiving_status = {{inprogress}}<br />
}}<br />
Splinder.com has long been the main blog hosting company in Italy (see [[Wikipedia:it:Splinder]]). It was founded in 2001 and hosts about half a million blogs and over 55 million pages.<br />
Since November 8, 2011, a warning on the home page has said that no new PRO accounts have been created since June 1. The company has confirmed that the website will close on the 24th.[http://soluzioni.splinder.com/post/25737683/avviso-per-gli-utenti-ce-da-preoccuparsi/comment/65653358#cid-65653358]<br />
<br />
'''Update''': the company issued an official statement saying that the closure will happen on January 31, 2012.[http://www.procionegobbo.it/blog/2011/11/splinder-chiude/] According to our tracker, we have downloaded or assigned all users.<br />
<br />
== Upload status ==<br />
<br />
For the time being: please ignore any errors caused by special characters in usernames (| ^ etc.); we'll get those profiles later.<br />
<br />
{| class="wikitable" style="text-align: left"<br />
|+ Uploaded to batcave?<br />
! scope="col" colspan="3" | Phase 1<br />
|-<br />
! scope="col" | Downloader || Count || Status<br />
|-<br />
| closure || 254869 || <br />
|-<br />
| kenneth || 206696 || <br />
|-<br />
| ndurner || 177665 || '''Uploaded'''<br />
|-<br />
| Nemo || 111340 || '''Uploaded''' with errors, some incomplete<br />
|-<br />
| donbex || 71562 || <br />
|-<br />
| dnova || 68740 || '''Uploaded'''; all special char profiles fixed; still downloading 1 more huge profile ("perijulka")<br />
|-<br />
| underscor || 58774 || <br />
|-<br />
| Wyatt || 54525 || <br />
|-<br />
| crawl336 || 45785 || <br />
|-<br />
| Angra || 35752 || <br />
|-<br />
| cameron_d || 26357 || <br />
|-<br />
| db48x || 23120 || '''Uploaded''', three profiles not uploaded<br />
|-<br />
| yipdw || 18789 || Most uploaded, re-doing some larger blogs with errors<br />
|-<br />
| crawl338 || 17783 || <br />
|-<br />
| crawl337 || 16784 || <br />
|-<br />
| crawl334 || 15897 || <br />
|-<br />
| Coderjoe || 13749 || '''Uploaded''' from both machines all profiles which did not have .incomplete (fixed some backslashes in profile directory names)<br />
|-<br />
| bsmith093 || 13194 || <br />
|-<br />
| DoubleJ || 10301 || '''Uploaded''' from all machines w/ no errors<br />
|-<br />
| crawl339 || 9026 || <br />
|-<br />
| anonymous || 8653 || <br />
|-<br />
| kennethreitz || 8287 || <br />
|-<br />
| alard || 7299 || '''Uploaded''', one error<br />
|-<br />
| dashcloud || 6803 || <br />
|-<br />
| crawl333 || 6292 || <br />
|-<br />
| spirit || 6282 || <br />
|-<br />
| crawl335 || 6106 || <br />
|-<br />
| Paradoks || 5890 || '''Uploaded''', but still downloading it:scatto, which includes files.splinder.com (among others).<br />
|-<br />
| koon || 5029 || <br />
|-<br />
| chronomex || 4913 || '''Partially Uploaded''', moved house and has yet to get computers running<br />
|-<br />
| VMB || 4620 || <br />
|-<br />
| shoop || 4461 || <br />
|-<br />
| marceloantonio1 || 2927 || '''Uploaded'''<br />
|-<br />
| undercave || 2508 || <br />
|-<br />
| DFJustin || 2456 || '''Uploaded''', may have errors<br />
|-<br />
| proub || 1178 || <br />
|-<br />
| Hydriz || 842 || '''Uploaded'''<br />
|-<br />
| canUbeatclosure || 669 || <br />
|-<br />
| tef || 440 || <br />
|-<br />
| arima || 347 || <br />
|-<br />
| NotGLaDOS || 259 || <br />
|-<br />
| sarpedon || 105 || <br />
|-<br />
| pberry || 89 || <br />
|-<br />
| Wyattq || 84 || <br />
|-<br />
| soultcer || 74 || <br />
|-<br />
| Konklone || 56 || <br />
|-<br />
| PepsiMax || 12 || <br />
|-<br />
| mareloantonio1 || 10 || '''Uploaded'''<br />
|-<br />
| hrbrmstr || 9 || <br />
|-<br />
| sente || 7 || <br />
|-<br />
| rebiolca || 6 || <br />
|-<br />
| 2 || 5 || <br />
|-<br />
| Wyatt-B || 3 || <br />
|-<br />
| Wyatt-A || 2 || <br />
|-<br />
| asdf || 2 || <br />
|}<br />
<br />
== How to help archiving ==<br />
<br />
There is a distributed download script that gets usernames from a tracker and downloads the data; a complete session sketch follows the steps below.<br />
<br />
Make sure you are on Linux and that you have curl, git, and a recent version of Bash. Your system must also be able to compile wget.<br />
<br />
# Get the code: <code>git clone https://github.com/ArchiveTeam/splinder-grab</code><br />
# Get and compile the latest version of wget-warc: <code>./get-wget-warc.sh</code><br />
# Think of a nickname for yourself (preferably use your IRC name).<br />
# Run the download script:<br />
#* To run a single downloader, run <code>./dld-client.sh "<YOURNICK>"</code>.<br />
#* To run multiple downloaders (and thus use your bandwidth more efficiently), do either:<br />
#** simply run as many copies of <code>dld-client.sh</code> as you like<br />
#** run <code>./dld-streamer.sh <YOURNICK> <N></code>, where <N> is the number of concurrent downloads you want.<br />
# To stop the script gracefully, run <code>touch STOP</code> in the script's working directory. It will finish the current task and stop.<br />
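<br />
Putting the steps above together, a complete session might look like this (a sketch only; <code>yournick</code> stands in for your own nickname, and four concurrent downloaders are just an example):<br />
<br />
<pre><br />
# A minimal end-to-end session (sketch; "yournick" is a placeholder)<br />
git clone https://github.com/ArchiveTeam/splinder-grab<br />
cd splinder-grab<br />
./get-wget-warc.sh             # fetch and compile wget-warc<br />
./dld-streamer.sh yournick 4   # run four concurrent downloaders<br />
# ... later, to stop gracefully once the current tasks finish:<br />
touch STOP<br />
</pre><br />
<br />
<code>dld-streamer.sh</code> keeps the requested number of downloaders busy; running plain <code>dld-client.sh</code> copies by hand works just as well if you prefer to manage them yourself.<br />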
<br />
===Notes===<br />
<br />
* Compiling wget-warc will require dev packages for the various libraries that it needs. Most questions have been about gnutls; install the <code>gnutls-devel</code> or <code>gnutls-dev</code> package with your favorite package manager (see the example after these notes).<br />
* Downloading one user's data can take between 10 seconds and several days.<br />
* The data for one user is equally varied, from a few kB to several GB.<br />
* The downloaded data will be saved in the <code>./data/</code> subdirectory.<br />
* Download speeds from splinder.com are not that high (servers may be particularly overloaded during the European daytime because of the additional traffic from people exporting their blogs). You can run multiple clients to speed things up.<br />
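<br />
As a concrete example of the gnutls note above, something like this should work on Debian/Ubuntu-style systems; treat the exact package name as an assumption and check your distribution:<br />
<br />
<pre><br />
# Debian/Ubuntu sketch; RPM-based systems typically want gnutls-devel<br />
sudo apt-get install gnutls-dev<br />
</pre><br />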
<br />
===Errors===<br />
* There are some problems with subdomains containing dashes[http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=626472]: if they fail on your machine (reported: wget compiled with +nls), stop and restart the script for now; someone else will do those users (although they seem to fail in part anyway). <br />
*: Some such users: macrisa, -Maryanne-, it:SalixArdens, it:MCris, it:7lilla, it:thepinkpenguin, it:bimbambolina, it:lazzaretta, it:Hedwige, it:N4m3L3Ss, it:Barbabietole_Azzurre, it:celebrolesa2212, it:buongiono.mattina, it:DarkExtra, it:-slash-, it:marlene1, it:Ohina, us:XyKy, us:Naluf, it:elisablu, it:*JuLs*, it:RikuSan, it:Nasutina<br />
* There are also some problems with upload-finished.sh because of some inconsistencies in escaping special characters, e.g. [http://p.defau.lt/?NITL0SVf4K4QFRgCKmlWIg]; remember not to delete those directories without fixing/uploading them.<br />
* The script looks for errors in English, so it's better to force wget-warc to use English (see the sketch below). Otherwise, errors like [http://toolserver.org/~nemobis/wget-phase-1.log these] won't be detected and the script will mark users that failed as done. Please run <code>fix-dld.sh</code> to fix those users, after changing <code>if grep -q "ERROR 50"</code> to your localised output.<br />
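<br />
One way to sidestep the localisation problem entirely (a sketch, assuming your wget build honours the standard locale environment variables) is to force English output for the whole client:<br />
<br />
<pre><br />
# Run the client under the C locale so wget's error strings match<br />
# what the script greps for (sketch; "yournick" is a placeholder)<br />
LC_ALL=C ./dld-client.sh "yournick"<br />
</pre><br />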
<br />
==== splinder_noconn.html errors ====<br />
<br />
Please check your wget logs for the presence of a file named <code>splinder_noconn.html</code>. This is a transient maintenance page that has appeared in some downloads; wget cannot detect it as an error because the page is not returned with an error status code.<br />
<br />
Some examples:<br />
<br />
* https://gist.github.com/a15c7707ee666502a825<br />
* https://gist.github.com/0427b4ed12ae48f2fb5f<br />
* http://p.defau.lt/?sJOFev7prpKYpC_CYRnqrg<br />
<br />
These accounts may have to be re-fetched.<br />
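<br />
To locate affected profiles in bulk, something like the following should work (a sketch; it assumes the default <code>./data/</code> layout and GNU find):<br />
<br />
<pre><br />
# List the profile directories that contain the transient maintenance page<br />
find data -name 'splinder_noconn.html' -printf '%h\n' | sort -u<br />
</pre><br />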
<br />
== Uploading your data ==<br />
<br />
* To upload the data you've downloaded, first contact SketchCow on IRC for an rsync slot. Once you have one, you can run the <code>./upload-finished.sh</code> script to upload your data. For example, run this in your script directory: <code>./upload-finished.sh batcave.textfiles.com::YOURNICK/splinder/</code><br />
* The script will upload only completed users. To check how much space the incomplete users are taking, without killing your disk, you can use <code>ionice -c 3 find -name .incomplete -printf "%h\0" | ionice -c 3 du -mcs --files0-from=-</code> in your <code>splinder-grab</code> directory.<br />
<br />
== Status ==<br />
<br />
There is a [http://splinder.heroku.com real-time dashboard] where you can check the progress.<br />
<br />
==External links==<br />
*http://www.splinder.com/<br />
*http://www.us.splinder.com/<br />
<br />
==Site structure==<br />
<br />
The users are identified by their usernames. Fortunately, the site provides a list of all users. Usernames are not case-sensitive, but there is a case preference.<br />
<br />
==Example URLs==<br />
User profile: <code><nowiki>http://www.splinder.com/profile/<<username>></nowiki></code><br />
<br />
<pre><br />
Example profile:<br />
http://www.splinder.com/profile/difficilifoglie<br />
<br />
View count on profile page:<br />
http://www.splinder.com/ajax.php?type=counter&op=profile&profile=Romanticdreamer<br />
<br />
Example of friends list paging: (160 per page, starting at 0)<br />
http://www.splinder.com/profile/difficilifoglie/friends<br />
http://www.splinder.com/profile/difficilifoglie/friends/160<br />
<br />
Inverse friends (probably also paged):<br />
http://www.splinder.com/profile/difficilifoglie/friendof<br />
<br />
Link to blog: (note: not always the same as the username)<br />
http://difficilifoglie.splinder.com/<br />
http://learnonline.splinder.com/<br />
<br />
Photo:<br />
http://www.splinder.com/profile/difficilifoglie/photo<br />
http://www.splinder.com/mediablog/wondermum/media/24544805<br />
<br />
Video:<br />
http://www.splinder.com/profile/wondermum/video<br />
http://www.splinder.com/mediablog/wondermum/media/25737390<br />
<br />
Audio:<br />
Not a separate user feed, but only accessible via mediablog<br />
http://www.splinder.com/mediablog/learnonline/media/25727030<br />
<br />
Mediablog: combination of the audio + video + photo lists<br />
http://www.splinder.com/mediablog/learnonline<br />
(16 per page, starting at 0)<br />
http://www.splinder.com/mediablog/learnonline/16<br />
<br />
Mediablog has PowerPoint, Word files:<br />
http://www.splinder.com/mediablog/learnonline/media/25641346<br />
http://www.splinder.com/mediablog/learnonline/media/25546305<br />
http://www.splinder.com/mediablog/learnonline/media/21901634<br />
http://www.splinder.com/mediablog/learnonline/media/24875290<br />
<br />
User avatar: grab url from profile page<br />
<br />
Photo file: grab url from photo page and remove _medium to get original picture<br />
http://files.splinder.com/d5e492233631af39212268593afca02d_square.jpg<br />
http://files.splinder.com/d5e492233631af39212268593afca02d_medium.jpg<br />
http://files.splinder.com/d5e492233631af39212268593afca02d.jpg<br />
older photos do not have this structure, different ids for each size:<br />
http://www.splinder.com/mediablog/babboramo/media/17359043<br />
http://files.splinder.com/13b615ccbd75354ee4e0d973da66c2b2.jpeg<br />
http://files.splinder.com/770d7b9ecac27083d9204af327ebe743.jpeg<br />
<br />
PowerPoint, Word files: grab url from media page<br />
http://files.splinder.com/46dbf3d5a0b12e490f81ddb8444b4fad.ppt<br />
http://files.splinder.com/ab3ce16c850ac530351d9df0937152c7.pdf<br />
<br />
Video items: grab url from media page<br />
http://files.splinder.com/8f5caff20685648bacd4ce1acf90e645_square.jpg<br />
http://files.splinder.com/8f5caff20685648bacd4ce1acf90e645_thumbnail.jpg<br />
http://files.splinder.com/8f5caff20685648bacd4ce1acf90e645_small.flv<br />
note: square, thumbnail, small is not always available, check flashvars for vidpath, imgpath<br />
http://www.splinder.com/mediablog/babboramo/media/13131052<br />
http://files.splinder.com/e067653e1532e55ee208605fcb84361a.flv<br />
http://files.splinder.com/f56060b7fef139f03b72e06ca9fcba55.jpeg<br />
<br />
Audio items: grab url from media page, flashvars<br />
sometimes there is a _thumbnail, remove that to get a better quality<br />
http://files.splinder.com/a5043c34a12ee66f5ad995ffd14493ef_thumbnail.mp3<br />
http://files.splinder.com/a5043c34a12ee66f5ad995ffd14493ef.mp3<br />
<br />
Comments on blog posts:<br />
http://www.splinder.com/myblog/comment/list/25742358<br />
on some, but not on all blogs, those comments are also included in the blog page<br />
http://dal15al25.splinder.com/post/25740180<br />
http://soluzioni.splinder.com/post/2802227/blog-pager-su-piu-righe<br />
http://soluzioni.splinder.com/post/25737683/avviso-per-gli-utenti-ce-da-preoccuparsi/<br />
http://civati.splinder.com/post/25742977<br />
pagination: see media comments<br />
<br />
Comments on media items:<br />
http://www.splinder.com/media/comment/list/21254470<br />
http://www.splinder.com/media/comment/list/21254470?from=50<br />
(50 per page, starting at 0)<br />
number of comments is on the media page<br />
http://www.splinder.com/mediablog/danspo/media/21254470<br />
<br />
<br />
Blog urls:<br />
the blogs have content from their own subdomain, but also from<br />
files.splinder.com<br />
www.splinder.com/misc/ (topbar css, gif)<br />
www.splinder.com/includes/ (js)<br />
www.splinder.com/modules/service_links/ (images)<br />
syndication.splinder.com<br />
<br />
links to www.splinder.com that should NOT be followed:<br />
/myblog/<br />
/users/<br />
/media/<br />
/node/<br />
/profile/<br />
/mediablog/<br />
/community/<br />
/user/<br />
/night/<br />
/home/<br />
/mysearch/<br />
/online/<br />
/trackback/<br />
<br />
</pre><br />
<br />
<pre><br />
wget-warc --mirror --page-requisites --span-hosts \<br />
  --domains=learnonline.splinder.com,files.splinder.com,www.splinder.com,syndication.splinder.com \<br />
  --exclude-directories="/users,/media,/node,/profile,/mediablog,/community,/user,/night,/home,/mysearch,/online,/trackback,/myblog/post,/myblog/posts,/myblog/tags,/myblog/tag,/myblog/view,/myblog/latest,/myblog/subscribe" \<br />
  -nv -o wget.log "http://learnonline.splinder.com/"<br />
</pre><br />
<br />
{{Navigation box}}</div>Dnovahttps://wiki.archiveteam.org/index.php?title=Splinder&diff=7028Splinder2011-12-15T12:50:20Z<p>Dnova: </p>
<hr />
<div>{{Infobox project<br />
| title = Splinder<br />
| image = Splinder homepage.png<br />
| URL = {{url|1=http://www.splinder.com/}}<br />
{{url|1=http://www.us.splinder.com/}}<br />
| project_status = {{closing}}<br />
| archiving_status = {{inprogress}}<br />
}}<br />
Splinder.com has long been the main blog hosting company in Italy (see [[Wikipedia:it:Splinder]]). It was founded in 2001 and hosts about half a million blogs and over 55 million pages.<br />
Since November 8, 2011, a warning on the home page has said that no new PRO accounts have been created since June 1. The company has confirmed that the website will close on the 24th.[http://soluzioni.splinder.com/post/25737683/avviso-per-gli-utenti-ce-da-preoccuparsi/comment/65653358#cid-65653358]<br />
<br />
'''Update''': the company issued an official statement saying that the closure will happen on January 31, 2012.[http://www.procionegobbo.it/blog/2011/11/splinder-chiude/] According to our tracker, we have downloaded or assigned all users.<br />
<br />
== Upload status ==<br />
<br />
For the time being: please ignore any errors caused by special characters in usernames (| ^ etc.); we'll get those profiles later.<br />
<br />
{| class="wikitable" style="text-align: left"<br />
|+ Uploaded to batcave?<br />
! scope="col" colspan="3" | Phase 1<br />
|-<br />
! scope="col" | Downloader || Count || Status<br />
|-<br />
| closure || 254869 || <br />
|-<br />
| kenneth || 206696 || <br />
|-<br />
| ndurner || 177665 || '''Uploaded'''<br />
|-<br />
| Nemo || 111340 || '''Uploaded''' with errors, some incomplete<br />
|-<br />
| donbex || 71562 || <br />
|-<br />
| dnova || 68740 || '''Uploaded'''; still downloading 1 more huge profile ("perijulka")<br />
|-<br />
| underscor || 58774 || <br />
|-<br />
| Wyatt || 54525 || <br />
|-<br />
| crawl336 || 45785 || <br />
|-<br />
| Angra || 35752 || <br />
|-<br />
| cameron_d || 26357 || <br />
|-<br />
| db48x || 23120 || '''Uploaded''', three profiles not uploaded<br />
|-<br />
| yipdw || 18789 || Most uploaded, re-doing some larger blogs with errors<br />
|-<br />
| crawl338 || 17783 || <br />
|-<br />
| crawl337 || 16784 || <br />
|-<br />
| crawl334 || 15897 || <br />
|-<br />
| Coderjoe || 13749 || '''Uploaded''' from both machines: all profiles which did not have .incomplete (fixed some backslashes in profile directory names)<br />
|-<br />
| bsmith093 || 13194 || <br />
|-<br />
| DoubleJ || 10301 || '''Uploaded''' from all machines w/ no errors<br />
|-<br />
| crawl339 || 9026 || <br />
|-<br />
| anonymous || 8653 || <br />
|-<br />
| kennethreitz || 8287 || <br />
|-<br />
| alard || 7299 || '''Uploaded''', one error<br />
|-<br />
| dashcloud || 6803 || <br />
|-<br />
| crawl333 || 6292 || <br />
|-<br />
| spirit || 6282 || <br />
|-<br />
| crawl335 || 6106 || <br />
|-<br />
| Paradoks || 5890 || '''Uploaded''', but still downloading it:scatto, which includes files.splinder.com (among others).<br />
|-<br />
| koon || 5029 || <br />
|-<br />
| chronomex || 4913 || '''Partially Uploaded''', moved house and has yet to get computers running<br />
|-<br />
| VMB || 4620 || <br />
|-<br />
| shoop || 4461 || <br />
|-<br />
| marceloantonio1 || 2927 || '''Uploaded'''<br />
|-<br />
| undercave || 2508 || <br />
|-<br />
| DFJustin || 2456 || '''Uploaded''', may have errors<br />
|-<br />
| proub || 1178 || <br />
|-<br />
| Hydriz || 842 || '''Uploaded'''<br />
|-<br />
| canUbeatclosure || 669 || <br />
|-<br />
| tef || 440 || <br />
|-<br />
| arima || 347 || <br />
|-<br />
| NotGLaDOS || 259 || <br />
|-<br />
| sarpedon || 105 || <br />
|-<br />
| pberry || 89 || <br />
|-<br />
| Wyattq || 84 || <br />
|-<br />
| soultcer || 74 || <br />
|-<br />
| Konklone || 56 || <br />
|-<br />
| PepsiMax || 12 || <br />
|-<br />
| mareloantonio1 || 10 || '''Uploaded'''<br />
|-<br />
| hrbrmstr || 9 || <br />
|-<br />
| sente || 7 || <br />
|-<br />
| rebiolca || 6 || <br />
|-<br />
| 2 || 5 || <br />
|-<br />
| Wyatt-B || 3 || <br />
|-<br />
| Wyatt-A || 2 || <br />
|-<br />
| asdf || 2 || <br />
|}<br />
<br />
== How to help archiving ==<br />
<br />
There is a distributed download script that gets usernames from a tracker and downloads the data.<br />
<br />
Make sure you are on Linux and that you have curl, git, and a recent version of Bash. Your system must also be able to compile wget.<br />
<br />
# Get the code: <code>git clone https://github.com/ArchiveTeam/splinder-grab</code><br />
# Get and compile the latest version of wget-warc: <code>./get-wget-warc.sh</code><br />
# Think of a nickname for yourself (preferably use your IRC name).<br />
# Run the download script:<br />
#* To run a single downloader, run <code>./dld-client.sh "<YOURNICK>"</code>.<br />
#* To run multiple downloaders (and thus use your bandwidth more efficiently), do either:<br />
#** simply run as many copies of <code>dld-client.sh</code> as you like<br />
#** run <code>./dld-streamer.sh <YOURNICK> <N></code>, where <N> is the number of concurrent downloads you want.<br />
# To stop the script gracefully, run <code>touch STOP</code> in the script's working directory. It will finish the current task and stop.<br />
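<br />
Putting the steps together, a minimal end-to-end session might look like this (a sketch; "yournick" and the client count of 3 are placeholders):<br />
<br />
<pre><br />
git clone https://github.com/ArchiveTeam/splinder-grab<br />
cd splinder-grab<br />
./get-wget-warc.sh                 # fetch and compile wget-warc<br />
./dld-streamer.sh yournick 3       # run three concurrent downloaders<br />
touch STOP                         # later: finish current tasks, then exit<br />
</pre><br />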
<br />
===Notes===<br />
<br />
* Compiling wget-warc will require dev packages for the various libraries that it needs. Most questions have been about gnutls; install the <code>gnutls-devel</code> or <code>gnutls-dev</code> package with your favorite package manager.<br />
* Downloading one user's data can take between 10 seconds and several days.<br />
* The amount of data per user is equally varied, ranging from a few kB to several GB.<br />
* The downloaded data will be saved in the <code>./data/</code> subdirectory.<br />
* Download speeds from splinder.com are not that high (the servers may be particularly overloaded during the European daytime because of the additional traffic from people exporting their blogs). You can run multiple clients to speed things up.<br />
<br />
===Errors===<br />
* There are some problems with subdomains containing dashes[http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=626472]: if they fail on your machine (reported with wget compiled with +nls), stop and restart the script for now; someone else will do those users (although they seem to fail in part anyway).<br />
*: Some such users: macrisa, -Maryanne-, it:SalixArdens, it:MCris, it:7lilla, it:thepinkpenguin, it:bimbambolina, it:lazzaretta, it:Hedwige, it:N4m3L3Ss, it:Barbabietole_Azzurre, it:celebrolesa2212, it:buongiono.mattina, it:DarkExtra, it:-slash-, it:marlene1, it:Ohina, us:XyKy, us:Naluf, it:elisablu, it:*JuLs*, it:RikuSan, it:Nasutina<br />
* There are also some problems with upload-finished.sh caused by inconsistencies in escaping special characters, e.g. [http://p.defau.lt/?NITL0SVf4K4QFRgCKmlWIg]; remember not to delete those directories without fixing/uploading them.<br />
* The script looks for errors in English, so it's better to configure wget-warc to use English output. Otherwise, errors like [http://toolserver.org/~nemobis/wget-phase-1.log these] won't be detected and the script will mark users which failed as done. Please run <code>fix-dld.sh</code> to fix those users, after changing <code>if grep -q "ERROR 50"</code> to your localised output (see the sketch below).<br />
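<br />
For example, if your wget prints its messages in Italian, the pattern might read "ERRORE 50" (hypothetical; check your own wget.log for the exact wording), in which case a one-line edit is enough:<br />
<br />
<pre><br />
# hypothetical localised pattern: use whatever your wget actually prints<br />
sed -i 's/ERROR 50/ERRORE 50/' fix-dld.sh<br />
</pre><br />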
<br />
==== splinder_noconn.html errors ====<br />
<br />
Please check your wget logs for the presence of a file named <code>splinder_noconn.html</code>. This is a transient maintenance page that has appeared in some downloads; wget cannot detect it as an error because the page is not served with an error status code.<br />
<br />
Some examples:<br />
<br />
* https://gist.github.com/a15c7707ee666502a825<br />
* https://gist.github.com/0427b4ed12ae48f2fb5f<br />
* http://p.defau.lt/?sJOFev7prpKYpC_CYRnqrg<br />
<br />
These accounts may have to be re-fetched.<br />
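<br />
One way to locate affected downloads (a sketch, assuming everything lives under <code>./data/</code> as described above):<br />
<br />
<pre><br />
# print each directory that contains the transient maintenance page<br />
find data -name splinder_noconn.html -printf '%h\n' | sort -u<br />
</pre><br />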
<br />
== Uploading your data ==<br />
<br />
* To upload the data you've downloaded, first contact SketchCow on IRC for an rsync slot. Once you have one, you can run the <code>./upload-finished.sh</code> script to upload your data. For example, run this in your script directory: <code>./upload-finished.sh batcave.textfiles.com::YOURNICK/splinder/</code><br />
* The script will upload only completed users. To check how much space the incomplete users are taking, without killing your disk, you can use <code>ionice -c 3 find -name .incomplete -printf "%h\0" | ionice -c 3 du -mcs --files0-from=-</code> in your <code>splinder-grab</code> directory.<br />
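<br />
The same pipeline, spelled out with comments (behaviour unchanged):<br />
<br />
<pre><br />
# total the size (in MB) of every directory still holding a .incomplete<br />
# marker; ionice -c 3 (idle class) keeps the disk impact low<br />
ionice -c 3 find -name .incomplete -printf "%h\0" |<br />
  ionice -c 3 du -mcs --files0-from=-<br />
</pre><br />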
<br />
== Status ==<br />
<br />
There is a [http://splinder.heroku.com real-time dashboard] where you can check the progress.<br />
<br />
==External links==<br />
*http://www.splinder.com/<br />
*http://www.us.splinder.com/<br />
<br />
==Site structure==<br />
<br />
The users are identified by their usernames. Fortunately, the site provides a list of all users. Usernames are not case-sensitive, but each has a preferred casing.<br />
<br />
==Example URLs==<br />
User profile: <code><nowiki>http://www.splinder.com/profile/<<username>></nowiki></code><br />
<br />
<pre><br />
Example profile:<br />
http://www.splinder.com/profile/difficilifoglie<br />
<br />
View count on profile page:<br />
http://www.splinder.com/ajax.php?type=counter&op=profile&profile=Romanticdreamer<br />
<br />
Example of friends list paging: (160 per page, starting at 0)<br />
http://www.splinder.com/profile/difficilifoglie/friends<br />
http://www.splinder.com/profile/difficilifoglie/friends/160<br />
<br />
Inverse friends (probably also paged):<br />
http://www.splinder.com/profile/difficilifoglie/friendof<br />
<br />
Link to blog: (note: not always the same as the username)<br />
http://difficilifoglie.splinder.com/<br />
http://learnonline.splinder.com/<br />
<br />
Photo:<br />
http://www.splinder.com/profile/difficilifoglie/photo<br />
http://www.splinder.com/mediablog/wondermum/media/24544805<br />
<br />
Video:<br />
http://www.splinder.com/profile/wondermum/video<br />
http://www.splinder.com/mediablog/wondermum/media/25737390<br />
<br />
Audio:<br />
Not a separate user feed, but only accessible via mediablog<br />
http://www.splinder.com/mediablog/learnonline/media/25727030<br />
<br />
Mediablog: combination of the audio + video + photo lists<br />
http://www.splinder.com/mediablog/learnonline<br />
(16 per page, starting at 0)<br />
http://www.splinder.com/mediablog/learnonline/16<br />
<br />
Mediablog has PowerPoint, Word files:<br />
http://www.splinder.com/mediablog/learnonline/media/25641346<br />
http://www.splinder.com/mediablog/learnonline/media/25546305<br />
http://www.splinder.com/mediablog/learnonline/media/21901634<br />
http://www.splinder.com/mediablog/learnonline/media/24875290<br />
<br />
User avatar: grab url from profile page<br />
<br />
Photo file: grab url from photo page and remove _medium to get original picture<br />
http://files.splinder.com/d5e492233631af39212268593afca02d_square.jpg<br />
http://files.splinder.com/d5e492233631af39212268593afca02d_medium.jpg<br />
http://files.splinder.com/d5e492233631af39212268593afca02d.jpg<br />
older photos do not have this structure; there are different ids for each size:<br />
http://www.splinder.com/mediablog/babboramo/media/17359043<br />
http://files.splinder.com/13b615ccbd75354ee4e0d973da66c2b2.jpeg<br />
http://files.splinder.com/770d7b9ecac27083d9204af327ebe743.jpeg<br />
<br />
PowerPoint, Word files: grab url from media page<br />
http://files.splinder.com/46dbf3d5a0b12e490f81ddb8444b4fad.ppt<br />
http://files.splinder.com/ab3ce16c850ac530351d9df0937152c7.pdf<br />
<br />
Video items: grab url from media page<br />
http://files.splinder.com/8f5caff20685648bacd4ce1acf90e645_square.jpg<br />
http://files.splinder.com/8f5caff20685648bacd4ce1acf90e645_thumbnail.jpg<br />
http://files.splinder.com/8f5caff20685648bacd4ce1acf90e645_small.flv<br />
note: square, thumbnail and small are not always available; check flashvars for vidpath, imgpath<br />
http://www.splinder.com/mediablog/babboramo/media/13131052<br />
http://files.splinder.com/e067653e1532e55ee208605fcb84361a.flv<br />
http://files.splinder.com/f56060b7fef139f03b72e06ca9fcba55.jpeg<br />
<br />
Audio items: grab url from media page, flashvars<br />
sometimes there is a _thumbnail; remove that to get better quality<br />
http://files.splinder.com/a5043c34a12ee66f5ad995ffd14493ef_thumbnail.mp3<br />
http://files.splinder.com/a5043c34a12ee66f5ad995ffd14493ef.mp3<br />
<br />
Comments on blog posts:<br />
http://www.splinder.com/myblog/comment/list/25742358<br />
on some blogs, but not all, those comments are also included in the blog page<br />
http://dal15al25.splinder.com/post/25740180<br />
http://soluzioni.splinder.com/post/2802227/blog-pager-su-piu-righe<br />
http://soluzioni.splinder.com/post/25737683/avviso-per-gli-utenti-ce-da-preoccuparsi/<br />
http://civati.splinder.com/post/25742977<br />
pagination: see media comments<br />
<br />
Comments on media items:<br />
http://www.splinder.com/media/comment/list/21254470<br />
http://www.splinder.com/media/comment/list/21254470?from=50<br />
(50 per page, starting at 0)<br />
number of comments is on the media page<br />
http://www.splinder.com/mediablog/danspo/media/21254470<br />
<br />
<br />
Blog urls:<br />
the blogs have content from their own subdomain, but also from<br />
files.splinder.com<br />
www.splinder.com/misc/ (topbar css, gif)<br />
www.splinder.com/includes/ (js)<br />
www.splinder.com/modules/service_links/ (images)<br />
syndication.splinder.com<br />
<br />
links to www.splinder.com that should NOT be followed:<br />
/myblog/<br />
/users/<br />
/media/<br />
/node/<br />
/profile/<br />
/mediablog/<br />
/community/<br />
/user/<br />
/night/<br />
/home/<br />
/mysearch/<br />
/online/<br />
/trackback/<br />
<br />
</pre><br />
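<br />
Given the paging rules above, all comment pages for a media item can be enumerated with a simple loop. A sketch, using the example media id; the total of 123 is hypothetical and would be read off the media page:<br />
<br />
<pre><br />
id=21254470   # example media id from above<br />
total=123     # hypothetical: read the comment count from the media page<br />
for ((from=0; from<total; from+=50)); do<br />
  echo "http://www.splinder.com/media/comment/list/$id?from=$from"<br />
done<br />
</pre><br />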
<br />
wget-warc --mirror --page-requisites --span-hosts --domains=learnonline.splinder.com,files.splinder.com,www.splinder.com,syndication.splinder.com --exclude-directories="/users,/media,/node,/profile,/mediablog,/community,/user,/night,/home,/mysearch,/online,/trackback,/myblog/post,/myblog/posts,/myblog/tags,/myblog/tag,/myblog/view,/myblog/latest,/myblog/subscribe" -nv -o wget.log "http://learnonline.splinder.com/"<br />
<br />
{{Navigation box}}</div>Dnovahttps://wiki.archiveteam.org/index.php?title=Main_Page&diff=6973Main Page2011-12-08T13:12:34Z<p>Dnova: </p>
<hr />
<div>__NOTOC__<br />
<center><br />
<!-- [[Image:Jasonappeal.png|center|link=Introduction]]--><br />
<table style="width:100%;border-spacing:8px;margin:12px 0px 0px 0px"><br />
<tr><td style="width:60%;border:1px solid #FFB9B9;background-color:#FFFFF0;vertical-align:top;color:#000"><br />
<table class="thumb" width="100%" cellpadding="2" cellspacing="5" style="vertical-align:top;background-color:#FFFFF0;"><br />
<tr><td><br />
[[Image:Archiveteam.jpg|center|300px]]<br />
<td style="color:#000;text-align:left;vertical-align:top"><br />
<br />
=== HISTORY IS OUR FUTURE ===<br />
''And we've been trashing our history''<br />
<br />
Archive Team is a loose collective of rogue archivists, programmers, writers and loudmouths dedicated to saving our digital heritage. Since 2009 this variant force of nature has caught wind of shutdowns, shutoffs, mergers, and plain old deletions - and we've done our best to save the history before it's lost forever. Along the way, we've gotten attention, resistance, press and discussion, but most importantly, we've gotten the message out: ''IT DOESN'T HAVE TO BE THIS WAY''.<br />
<br />
This website is intended to be an offloading point and information depot for a number of archiving projects, all related to saving websites or data that is in danger of being lost. Besides serving as a hub for team-based pulling down and mirroring of data, this site will provide advice on managing your own data and rescuing it from the brink of destruction.<br />
<br />
Feel free to join us on the [[IRC_Channel|IRC channel]]! We're on the EFnet network in a channel called '''#archiveteam''', where we say truly awful things.<br />
<!-- featured article ends --><br />
</tr><br />
<tr><th colspan=2><br />
<h2 style="margin:0;background-color:#a3b0bf;font-size:120%;font-weight:bold;border:1px solid #a3b0bf;text-align:left;color:#000;padding:0.2em 0.4em;">Currently Active Projects (Get Involved Here!)</h2><br />
</th></tr><br />
<tr><td style="color:#000" colspan=2><br />
<!-- active starts --><br />
* '''[[Splinder]]''' - An Italian blog/media site with over 1.3 million users, closing on Jan 31, 2012.<br />
** A distributed tracker and very easy-to-use scripts are in place.<br />
** Most of the profiles are now rescued, with some difficult (large) ones remaining.<br />
* '''[[MobileMe]]''' - Apple's file storage and sharing service, currently hosting over 200 terabytes of data, is shutting down on June 30, 2012.<br />
** A distributed tracker and very easy-to-use scripts are in place.<br />
** It has been reported that MobileMe is going somewhat slowly. At the time of writing, an estimated 2% of the site has been rescued.<br />
* '''[[FanFiction.Net]]''' - Around 7 million fan-fiction stories hosted on what may be the largest site of its kind in the world. They're not shutting down, but Archiveteam wants a copy for posterity.<br />
** Coders are currently needed to figure out an intelligent way to comprehensively archive the site. <br />
<!-- active ends --><br />
<tr><th colspan=2><br />
<h2 style="margin:0;background-color:#a3b0bf;font-size:120%;font-weight:bold;border:1px solid #a3b0bf;text-align:left;color:#000;padding:0.2em 0.4em;">Archive Team News</h2><br />
</th></tr><br />
<tr><td style="color:#000" colspan=2><br />
<!-- news starts --><br />
* '''December, 2011''': POE News says it will soon be nevermore: [http://www.poe-news.com/forums/sp.php?pi=1002546492 Announcement]<br />
* '''December, 2011''': GamePro magazine halts publication and their website goes dark.<br />
* '''November, 2011''': Archiveteam rescues over 1.3 million users' data from [[Splinder]]'s closure.<br />
* '''July, 2011''': Archiveteam teaches you how to [[Rescuing_Floppy_Disks|rescue data from Floppy Disks]].<br />
* '''May, 2011''': [[Friendster]] is deleting everything at the end of the month.<br />
* '''May, 2011''': Archiveteam keeps it classy at [[poetry.com]].<br />
* '''April, 2011''': How about some [[Google Video]]?<br />
* '''March, 2011''': The [http://www.archive.org/details/personalarchiveconf 2011 Personal Digital Archiving Conference] talks are available.<br />
* '''February, 2011''': Let's watch some [[Yahoo! Video]]<br />
* '''December, 2010''': Archiveteam is Delicious!<br />
* '''October, 2010''': Archiveteam offers Geocities as a torrent.<br />
* '''December 23, 2009''': Yahoo shut down [[starwars.yahoo.com]]. We got a copy.<br />
* '''October, 2009''': [[Geocities]] closing is definitely the top of the charts.<br />
<!-- news ends --><br />
</td></tr><br />
</table><br />
<td style="width:40%;border:1px solid #cedff2;background-color:#f5faff;vertical-align:top"><br />
<table width="100%" cellpadding="2" cellspacing="5" style="vertical-align:top;background-color:#f5faff"><br />
<tr><th><br />
<h2 style="margin:0;background:#cedff2;font-size:120%;font-weight:bold;border:1px solid #a3b0bf;text-align:left;color:#000;padding:0.2em 0.4em;">What is What</h2><br />
</th></tr><br />
<tr><td style="color:#000"><br />
<!-- links starts --><br />
* [[Who We Are]] and how you can join our cause!<br />
<br />
* [[Deathwatch]] is where we keep track of sites that are sickly, dying or dead.<br />
<br />
* [[Fire Drill]] is where we keep track of sites that seem fine but a lot depends on them.<br />
<br />
* [[Projects]] is a comprehensive list of AT endeavors.<br />
<br />
* [[Philosophy]] describes the ideas underpinning our work.<br />
<br />
* [[Archives]] <br />
<!-- links ends --><br />
</td></tr><br />
<tr><th><br />
<h2 style="margin:0;background:#cedff2;font-size:120%;font-weight:bold;border:1px solid #a3b0bf;text-align:left;color:#000;padding:0.2em 0.4em;">Some Starting Points</h2><br />
</th></tr><br />
<tr><td style="color:#000"><br />
<br />
* [[Introduction|The Introduction]] is an overview of basic archiving methods.<br />
<br />
* [[Why Back Up?]] Because they don't care about you.<br />
<br />
* [[Software]] will assist you in regaining control of your data by providing tools for information backup, archiving and distribution. <br />
<br />
* [[Formats]] will familiarise you with the various data formats, and how to ensure your files will be readable in the future.<br />
<br />
* [[Storage Media]] is about where to get it, what to get, and how to use it.<br />
<br />
* [[Recommended Reading]] links to other sites for further information.<br />
<br />
* [[Frequently Asked Questions]] is where we answer common questions.<br />
</td></tr><br />
<tr><th><br />
<h2 style="margin:0;background:#cedff2;font-size:120%;font-weight:bold;border:1px solid #a3b0bf;text-align:left;color:#000;padding:0.2em 0.4em;">Quote of the Moment</h2><br />
</th></tr><br />
<tr><td style="color:#000"><br />
<tr><td style="margin:20;background-color:#000000;font-size:200%;font-weight:bold;border:1px solid #a3b0bf;text-align:center;color:#fff;" ><br />
"[Yahoo!] found the way to destroy <br />
the most massive amount of history<br />
in the shortest amount of time <br />
with absolutely no recourse"<br />
</td></tr><br />
<tr><td style="text-align:right"><br />
[http://www.time.com/time/business/article/0,8599,1936645,00.html Internet Atrocity! GeoCities' Demise Erases Web History] <br />
<br>By Dan Fletcher, TIME Magazine, Monday, Nov. 09, 2009<br />
</td></tr><br />
</table><br />
</td></tr><br />
</table><br />
'''Archive Team is in no way affiliated with the fine folks at [http://www.archive.org ARCHIVE.ORG]'''<br />
'''Archive Team can always be reached at [mailto:archiveteam@archiveteam.org archiveteam@archiveteam.org]'''</div>Dnovahttps://wiki.archiveteam.org/index.php?title=Splinder&diff=6972Splinder2011-12-08T11:21:29Z<p>Dnova: /* Upload status */</p>
<hr />
<div>{{Infobox project<br />
| title = Splinder<br />
| image = Splinder homepage.png<br />
| URL = {{url|1=http://www.splinder.com/}}<br />
{{url|1=http://www.us.splinder.com/}}<br />
| project_status = {{closing}}<br />
| archiving_status = {{inprogress}}<br />
}}<br />
Splinder.com has long been the main blog hosting company in Italy (see [[Wikipedia:it:Splinder]]). It was founded in 2001 and hosts about half a million blogs and over 55 million pages.<br />
Since 8 November 2011, a warning on the home page has said that no new PRO accounts have been created since 1 June. The company has confirmed that the website will close on the 24th.[http://soluzioni.splinder.com/post/25737683/avviso-per-gli-utenti-ce-da-preoccuparsi/comment/65653358#cid-65653358]<br />
<br />
'''Update''': the company issued an official statement saying that the closure will happen on January 31, 2012.[http://www.procionegobbo.it/blog/2011/11/splinder-chiude/] According to our tracker, we have downloaded or assigned all users.<br />
<br />
== Upload status ==<br />
<br />
For the time being, please ignore any errors caused by special characters in usernames (| ^ etc.); we'll get those profiles later.<br />
<br />
{| class="wikitable" style="text-align: left"<br />
|+ Uploaded to batcave?<br />
! scope="col" colspan="3" | Phase 1<br />
|-<br />
! scope="col" | Downloader || Count || Status<br />
|-<br />
| closure || 254869 || <br />
|-<br />
| kenneth || 206696 || <br />
|-<br />
| ndurner || 177665 || <br />
|-<br />
| Nemo || 111340 || '''Uploaded''' with errors, some incomplete<br />
|-<br />
| donbex || 71562 || <br />
|-<br />
| dnova || 68620 || '''Uploaded'''; still downloading more<br />
|-<br />
| underscor || 58774 || <br />
|-<br />
| Wyatt || 54525 || <br />
|-<br />
| crawl336 || 45785 || <br />
|-<br />
| Angra || 35752 || <br />
|-<br />
| cameron_d || 26357 || <br />
|-<br />
| db48x || 23120 || '''Uploaded''', three profiles not uploaded<br />
|-<br />
| yipdw || 18789 || Most uploaded, re-doing some larger blogs with errors<br />
|-<br />
| crawl338 || 17783 || <br />
|-<br />
| crawl337 || 16784 || <br />
|-<br />
| crawl334 || 15897 || <br />
|-<br />
| Coderjoe || 13749 || <br />
|-<br />
| bsmith093 || 13194 || <br />
|-<br />
| DoubleJ || 10301 || '''Uploaded''' from all machines w/ no errors<br />
|-<br />
| crawl339 || 9026 || <br />
|-<br />
| anonymous || 8653 || <br />
|-<br />
| kennethreitz || 8287 || <br />
|-<br />
| alard || 7299 || '''Uploaded''', one error<br />
|-<br />
| dashcloud || 6803 || <br />
|-<br />
| crawl333 || 6292 || <br />
|-<br />
| spirit || 6282 || <br />
|-<br />
| crawl335 || 6106 || <br />
|-<br />
| Paradoks || 5890 || '''Uploaded''', but still downloading 4 profiles<br />
|-<br />
| koon || 5029 || <br />
|-<br />
| chronomex || 4913 || '''Partially Uploaded''', moved house and has yet to get computers running<br />
|-<br />
| VMB || 4620 || <br />
|-<br />
| shoop || 4461 || <br />
|-<br />
| marceloantonio1 || 2927 || '''Uploaded'''<br />
|-<br />
| undercave || 2508 || <br />
|-<br />
| DFJustin || 2456 || <br />
|-<br />
| proub || 1178 || <br />
|-<br />
| Hydriz || 842 || '''Uploaded'''<br />
|-<br />
| canUbeatclosure || 669 || <br />
|-<br />
| tef || 440 || <br />
|-<br />
| arima || 347 || <br />
|-<br />
| NotGLaDOS || 259 || <br />
|-<br />
| sarpedon || 105 || <br />
|-<br />
| pberry || 89 || <br />
|-<br />
| Wyattq || 84 || <br />
|-<br />
| soultcer || 74 || <br />
|-<br />
| Konklone || 56 || <br />
|-<br />
| PepsiMax || 12 || <br />
|-<br />
| mareloantonio1 || 10 || '''Uploaded'''<br />
|-<br />
| hrbrmstr || 9 || <br />
|-<br />
| sente || 7 || <br />
|-<br />
| rebiolca || 6 || <br />
|-<br />
| 2 || 5 || <br />
|-<br />
| Wyatt-B || 3 || <br />
|-<br />
| Wyatt-A || 2 || <br />
|-<br />
| asdf || 2 || <br />
|}<br />
<br />
== How to help archiving ==<br />
<br />
There is a distributed download script that gets usernames from a tracker and downloads the data.<br />
<br />
Make sure you are on Linux and that you have curl, git, and a recent version of Bash. Your system must also be able to compile wget.<br />
<br />
# Get the code: <code>git clone https://github.com/ArchiveTeam/splinder-grab</code><br />
# Get and compile the latest version of wget-warc: <code>./get-wget-warc.sh</code><br />
# Think of a nickname for yourself (preferably use your IRC name).<br />
# Run the download script:<br />
#* To run a single downloader, run <code>./dld-client.sh "<YOURNICK>"</code>.<br />
#* To run multiple downloaders (and thus use your bandwidth more efficiently), do either:<br />
#** simply run as many copies of <code>dld-client.sh</code> as you like<br />
#** run <code>./dld-streamer.sh <YOURNICK> <N></code>, where <N> is the number of concurrent downloads you want.<br />
# To stop the script gracefully, run <code>touch STOP</code> in the script's working directory. It will finish the current task and stop.<br />
<br />
===Notes===<br />
<br />
* Compiling wget-warc will require dev packages for the various libraries that it needs. Most questions have been about gnutls; install the <code>gnutls-devel</code> or <code>gnutls-dev</code> package with your favorite package manager.<br />
* Downloading one user's data can take between 10 seconds and several days.<br />
* The amount of data per user is equally varied, ranging from a few kB to several GB.<br />
* The downloaded data will be saved in the <code>./data/</code> subdirectory.<br />
* Download speeds from splinder.com are not that high (the servers may be particularly overloaded during the European daytime because of the additional traffic from people exporting their blogs). You can run multiple clients to speed things up.<br />
<br />
===Errors===<br />
* There are some problems with subdomains containing dashes[http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=626472]: if they fail on your machine (reported with wget compiled with +nls), stop and restart the script for now; someone else will do those users (although they seem to fail in part anyway).<br />
*: Some such users: macrisa, -Maryanne-, it:SalixArdens, it:MCris, it:7lilla, it:thepinkpenguin, it:bimbambolina, it:lazzaretta, it:Hedwige, it:N4m3L3Ss, it:Barbabietole_Azzurre, it:celebrolesa2212, it:buongiono.mattina, it:DarkExtra, it:-slash-, it:marlene1, it:Ohina, us:XyKy, us:Naluf, it:elisablu, it:*JuLs*, it:RikuSan, it:Nasutina<br />
* There are also some problems with upload-finished.sh caused by inconsistencies in escaping special characters, e.g. [http://p.defau.lt/?NITL0SVf4K4QFRgCKmlWIg]; remember not to delete those directories without fixing/uploading them.<br />
* The script looks for errors in English, so it's better to configure wget-warc to use English output. Otherwise, errors like [http://toolserver.org/~nemobis/wget-phase-1.log these] won't be detected and the script will mark users which failed as done. Please run <code>fix-dld.sh</code> to fix those users, after changing <code>if grep -q "ERROR 50"</code> to your localised output.<br />
<br />
==== splinder_noconn.html errors ====<br />
<br />
Please check your wget logs for the presence of a file named <code>splinder_noconn.html</code>. This is a transient maintenance page that has appeared in some downloads; wget cannot detect it as an error because the page is not served with an error status code.<br />
<br />
Some examples:<br />
<br />
* https://gist.github.com/a15c7707ee666502a825<br />
* https://gist.github.com/0427b4ed12ae48f2fb5f<br />
* http://p.defau.lt/?sJOFev7prpKYpC_CYRnqrg<br />
<br />
These accounts may have to be re-fetched.<br />
<br />
== Uploading your data ==<br />
<br />
* To upload the data you've downloaded, first contact SketchCow on IRC for an rsync slot. Once you have one, you can run the <code>./upload-finished.sh</code> script to upload your data. For example, run this in your script directory: <code>./upload-finished.sh batcave.textfiles.com::YOURNICK/splinder/</code><br />
* The script will upload only completed users. To check how much space the incomplete users are taking, without killing your disk, you can use <code>ionice -c 3 find -name .incomplete -printf "%h\0" | ionice -c 3 du -mcs --files0-from=-</code> in your <code>splinder-grab</code> directory.<br />
<br />
== Status ==<br />
<br />
There is a [http://splinder.heroku.com real-time dashboard] where you can check the progress.<br />
<br />
==External links==<br />
*http://www.splinder.com/<br />
*http://www.us.splinder.com/<br />
<br />
==Site structure==<br />
<br />
The users are identified by their usernames. Fortunately, the site provides a list of all users. Usernames are not case-sensitive, but each has a preferred casing.<br />
<br />
==Example URLs==<br />
User profile: <code><nowiki>http://www.splinder.com/profile/<<username>></nowiki></code><br />
<br />
<pre><br />
Example profile:<br />
http://www.splinder.com/profile/difficilifoglie<br />
<br />
View count on profile page:<br />
http://www.splinder.com/ajax.php?type=counter&op=profile&profile=Romanticdreamer<br />
<br />
Example of friends list paging: (160 per page, starting at 0)<br />
http://www.splinder.com/profile/difficilifoglie/friends<br />
http://www.splinder.com/profile/difficilifoglie/friends/160<br />
<br />
Inverse friends (probably also paged):<br />
http://www.splinder.com/profile/difficilifoglie/friendof<br />
<br />
Link to blog: (note: not always the same as the username)<br />
http://difficilifoglie.splinder.com/<br />
http://learnonline.splinder.com/<br />
<br />
Photo:<br />
http://www.splinder.com/profile/difficilifoglie/photo<br />
http://www.splinder.com/mediablog/wondermum/media/24544805<br />
<br />
Video:<br />
http://www.splinder.com/profile/wondermum/video<br />
http://www.splinder.com/mediablog/wondermum/media/25737390<br />
<br />
Audio:<br />
Not a separate user feed, but only accessible via mediablog<br />
http://www.splinder.com/mediablog/learnonline/media/25727030<br />
<br />
Mediablog: combination of the audio + video + photo lists<br />
http://www.splinder.com/mediablog/learnonline<br />
(16 per page, starting at 0)<br />
http://www.splinder.com/mediablog/learnonline/16<br />
<br />
Mediablog has PowerPoint, Word files:<br />
http://www.splinder.com/mediablog/learnonline/media/25641346<br />
http://www.splinder.com/mediablog/learnonline/media/25546305<br />
http://www.splinder.com/mediablog/learnonline/media/21901634<br />
http://www.splinder.com/mediablog/learnonline/media/24875290<br />
<br />
User avatar: grab url from profile page<br />
<br />
Photo file: grab url from photo page and remove _medium to get original picture<br />
http://files.splinder.com/d5e492233631af39212268593afca02d_square.jpg<br />
http://files.splinder.com/d5e492233631af39212268593afca02d_medium.jpg<br />
http://files.splinder.com/d5e492233631af39212268593afca02d.jpg<br />
older photos do not have this structure; there are different ids for each size:<br />
http://www.splinder.com/mediablog/babboramo/media/17359043<br />
http://files.splinder.com/13b615ccbd75354ee4e0d973da66c2b2.jpeg<br />
http://files.splinder.com/770d7b9ecac27083d9204af327ebe743.jpeg<br />
<br />
PowerPoint, Word files: grab url from media page<br />
http://files.splinder.com/46dbf3d5a0b12e490f81ddb8444b4fad.ppt<br />
http://files.splinder.com/ab3ce16c850ac530351d9df0937152c7.pdf<br />
<br />
Video items: grab url from media page<br />
http://files.splinder.com/8f5caff20685648bacd4ce1acf90e645_square.jpg<br />
http://files.splinder.com/8f5caff20685648bacd4ce1acf90e645_thumbnail.jpg<br />
http://files.splinder.com/8f5caff20685648bacd4ce1acf90e645_small.flv<br />
note: square, thumbnail and small are not always available; check flashvars for vidpath, imgpath<br />
http://www.splinder.com/mediablog/babboramo/media/13131052<br />
http://files.splinder.com/e067653e1532e55ee208605fcb84361a.flv<br />
http://files.splinder.com/f56060b7fef139f03b72e06ca9fcba55.jpeg<br />
<br />
Audio items: grab url from media page, flashvars<br />
sometimes there is a _thumbnail; remove that to get better quality<br />
http://files.splinder.com/a5043c34a12ee66f5ad995ffd14493ef_thumbnail.mp3<br />
http://files.splinder.com/a5043c34a12ee66f5ad995ffd14493ef.mp3<br />
<br />
Comments on blog posts:<br />
http://www.splinder.com/myblog/comment/list/25742358<br />
on some blogs, but not all, those comments are also included in the blog page<br />
http://dal15al25.splinder.com/post/25740180<br />
http://soluzioni.splinder.com/post/2802227/blog-pager-su-piu-righe<br />
http://soluzioni.splinder.com/post/25737683/avviso-per-gli-utenti-ce-da-preoccuparsi/<br />
http://civati.splinder.com/post/25742977<br />
pagination: see media comments<br />
<br />
Comments on media items:<br />
http://www.splinder.com/media/comment/list/21254470<br />
http://www.splinder.com/media/comment/list/21254470?from=50<br />
(50 per page, starting at 0)<br />
number of comments is on the media page<br />
http://www.splinder.com/mediablog/danspo/media/21254470<br />
<br />
<br />
Blog urls:<br />
the blogs have content from their own subdomain, but also from<br />
files.splinder.com<br />
www.splinder.com/misc/ (topbar css, gif)<br />
www.splinder.com/includes/ (js)<br />
www.splinder.com/modules/service_links/ (images)<br />
syndication.splinder.com<br />
<br />
links to www.splinder.com that should NOT be followed:<br />
/myblog/<br />
/users/<br />
/media/<br />
/node/<br />
/profile/<br />
/mediablog/<br />
/community/<br />
/user/<br />
/night/<br />
/home/<br />
/mysearch/<br />
/online/<br />
/trackback/<br />
<br />
</pre><br />
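<br />
As a sketch of the photo-URL rule above, the original picture is simply the <code>_medium</code> URL with the size suffix stripped:<br />
<br />
<pre><br />
url="http://files.splinder.com/d5e492233631af39212268593afca02d_medium.jpg"<br />
# drop the "_medium" size suffix to obtain the original file<br />
echo "${url/_medium./.}"<br />
</pre><br />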
<br />
wget-warc --mirror --page-requisites --span-hosts --domains=learnonline.splinder.com,files.splinder.com,www.splinder.com,syndication.splinder.com --exclude-directories="/users,/media,/node,/profile,/mediablog,/community,/user,/night,/home,/mysearch,/online,/trackback,/myblog/post,/myblog/posts,/myblog/tags,/myblog/tag,/myblog/view,/myblog/latest,/myblog/subscribe" -nv -o wget.log "http://learnonline.splinder.com/"<br />
<br />
{{Navigation box}}</div>Dnovahttps://wiki.archiveteam.org/index.php?title=User:Dnova&diff=6970User:Dnova2011-12-07T07:02:38Z<p>Dnova: Created page with 'I am dnova. if you need me you can find me on efnet.'</p>
<hr />
<div>I am dnova. If you need me, you can find me on EFnet.</div>Dnovahttps://wiki.archiveteam.org/index.php?title=Main_Page&diff=6969Main Page2011-12-07T06:48:02Z<p>Dnova: </p>
<hr />
<div>__NOTOC__<br />
<center><br />
<!-- [[Image:Jasonappeal.png|center|link=Introduction]]--><br />
<table style="width:100%;border-spacing:8px;margin:12px 0px 0px 0px"><br />
<tr><td style="width:60%;border:1px solid #FFB9B9;background-color:#FFFFF0;vertical-align:top;color:#000"><br />
<table class="thumb" width="100%" cellpadding="2" cellspacing="5" style="vertical-align:top;background-color:#FFFFF0;"><br />
<tr><td><br />
[[Image:Archiveteam.jpg|center|300px]]<br />
<td style="color:#000;text-align:left;vertical-align:top"><br />
<br />
=== HISTORY IS OUR FUTURE ===<br />
''And we've been trashing our history''<br />
<br />
Archive Team is a loose collective of rogue archivists, programmers, writers and loudmouths dedicated to saving our digital heritage. Since 2009 this variant force of nature has caught wind of shutdowns, shutoffs, mergers, and plain old deletions - and we've done our best to save the history before it's lost forever. Along the way, we've gotten attention, resistance, press and discussion, but most importantly, we've gotten the message out: ''IT DOESN'T HAVE TO BE THIS WAY''.<br />
<br />
This website is intended to be an offloading point and information depot for a number of archiving projects, all related to saving websites or data that is in danger of being lost. Besides serving as a hub for team-based pulling down and mirroring of data, this site will provide advice on managing your own data and rescuing it from the brink of destruction.<br />
<br />
Feel free to join us on the [[IRC_Channel|IRC channel]]! We're on the EFnet network in a channel called '''#archiveteam''', where we say truly awful things.<br />
<!-- featured article ends --><br />
</tr><br />
<tr><th colspan=2><br />
<h2 style="margin:0;background-color:#a3b0bf;font-size:120%;font-weight:bold;border:1px solid #a3b0bf;text-align:left;color:#000;padding:0.2em 0.4em;">Currently Active Projects (Get Involved Here!)</h2><br />
</th></tr><br />
<tr><td style="color:#000" colspan=2><br />
<!-- active starts --><br />
* '''[[Splinder]]''' - An Italian blog/media site with over 1.3 million users, closing on Jan 31, 2012.<br />
** A distributed tracker and very easy-to-use scripts are in place.<br />
** Most of the profiles are now rescued, with some difficult (large) ones remaining.<br />
* '''[[MobileMe]]''' - Apple's file storage and sharing service, currently hosting over 200 terabytes of data, is shutting down on June 30, 2012.<br />
** A distributed tracker and very easy-to-use scripts are in place.<br />
** It has been reported that MobileMe is going somewhat slowly. At the time of writing, an estimated 2% of the site has been rescued.<br />
* '''[[FanFiction.Net]]''' - Around 7 million fan-fiction stories hosted on what may be the largest site of its kind in the world. They're not shutting down, but Archiveteam wants a copy for posterity.<br />
** Coders are currently needed to figure out an intelligent way to comprehensively archive the site. <br />
<!-- active ends --><br />
<tr><th colspan=2><br />
<h2 style="margin:0;background-color:#a3b0bf;font-size:120%;font-weight:bold;border:1px solid #a3b0bf;text-align:left;color:#000;padding:0.2em 0.4em;">Archive Team News</h2><br />
</th></tr><br />
<tr><td style="color:#000" colspan=2><br />
<!-- news starts --><br />
* '''December, 2011''': POE News says it will soon be nevermore: [http://www.poe-news.com/forums/sp.php?pi=1002546492 Announcement]<br />
* '''November, 2011''': Archiveteam rescues over 1.3 million users' data from [[Splinder]]'s closure.<br />
* '''July, 2011''': Archiveteam teaches you how to [[Rescuing_Floppy_Disks|rescue data from Floppy Disks]].<br />
* '''May, 2011''': [[Friendster]] is deleting everything at the end of the month.<br />
* '''May, 2011''': Archiveteam keeps it classy at [[poetry.com]].<br />
* '''April, 2011''': How about some [[Google Video]]?<br />
* '''March, 2011''': The [http://www.archive.org/details/personalarchiveconf 2011 Personal Digital Archiving Conference] talks are available.<br />
* '''February, 2011''': Let's watch some [[Yahoo! Video]]<br />
* '''December, 2010''': Archiveteam is Delicious!<br />
* '''October, 2010''': Archiveteam offers Geocities as a torrent.<br />
* '''December 23, 2009''': Yahoo shut down [[starwars.yahoo.com]]. We got a copy.<br />
* '''October, 2009''': [[Geocities]] closing is definitely the top of the charts.<br />
<!-- news ends --><br />
</td></tr><br />
</table><br />
<td style="width:40%;border:1px solid #cedff2;background-color:#f5faff;vertical-align:top"><br />
<table width="100%" cellpadding="2" cellspacing="5" style="vertical-align:top;background-color:#f5faff"><br />
<tr><th><br />
<h2 style="margin:0;background:#cedff2;font-size:120%;font-weight:bold;border:1px solid #a3b0bf;text-align:left;color:#000;padding:0.2em 0.4em;">What is What</h2><br />
</th></tr><br />
<tr><td style="color:#000"><br />
<!-- links starts --><br />
* [[Who We Are]] and how you can join our cause!<br />
<br />
* [[Deathwatch]] is where we keep track of sites that are sickly, dying or dead.<br />
<br />
* [[Fire Drill]] is where we keep track of sites that seem fine but a lot depends on them.<br />
<br />
* [[Projects]] is a comprehensive list of AT endeavors.<br />
<br />
* [[Philosophy]] describes the ideas underpinning our work.<br />
<br />
* [[Archives]] <br />
<!-- links ends --><br />
</td></tr><br />
<tr><th><br />
<h2 style="margin:0;background:#cedff2;font-size:120%;font-weight:bold;border:1px solid #a3b0bf;text-align:left;color:#000;padding:0.2em 0.4em;">Some Starting Points</h2><br />
</th></tr><br />
<tr><td style="color:#000"><br />
<br />
* [[Introduction|The Introduction]] is an overview of basic archiving methods.<br />
<br />
* [[Why Back Up?]] Because they don't care about you.<br />
<br />
* [[Software]] will assist you in regaining control of your data by providing tools for information backup, archiving and distribution. <br />
<br />
* [[Formats]] will familiarise you with the various data formats, and how to ensure your files will be readable in the future.<br />
<br />
* [[Storage Media]] is about where to get it, what to get, and how to use it.<br />
<br />
* [[Recommended Reading]] links to other sites for further information.<br />
<br />
* [[Frequently Asked Questions]] is where we answer common questions.<br />
</td></tr><br />
<tr><th><br />
<h2 style="margin:0;background:#cedff2;font-size:120%;font-weight:bold;border:1px solid #a3b0bf;text-align:left;color:#000;padding:0.2em 0.4em;">Quote of the Moment</h2><br />
</th></tr><br />
<tr><td style="color:#000"><br />
<tr><td style="margin:20;background-color:#000000;font-size:200%;font-weight:bold;border:1px solid #a3b0bf;text-align:center;color:#fff;" ><br />
"[Yahoo!] found the way to destroy <br />
the most massive amount of history<br />
in the shortest amount of time <br />
with absolutely no recourse"<br />
</td></tr><br />
<tr><td style="text-align:right"><br />
[http://www.time.com/time/business/article/0,8599,1936645,00.html Internet Atrocity! GeoCities' Demise Erases Web History] <br />
<br>By Dan Fletcher, TIME Magazine, Monday, Nov. 09, 2009<br />
</td></tr><br />
</table><br />
</td></tr><br />
</table><br />
'''Archive Team is in no way affiliated with the fine folks at [http://www.archive.org ARCHIVE.ORG]'''<br />
'''Archive Team can always be reached at [mailto:archiveteam@archiveteam.org archiveteam@archiveteam.org]'''</div>Dnova