Talk:Alive... OR ARE THEY

From Archiveteam

Has anyone attempted to restore a running copy of Wikipedia from the dumps? If so, do the dumps provide enough data that, in the event of a catastrophic data failure, Wikipedia could be brought back up using just the backups?--Adewale 08:42, 27 February 2009 (UTC)

  • That's a very good question. The problem is that the dumps are now insanely huge. I don't know how many people would even have the capacity to unpack them. --Jscott 15:45, 27 February 2009 (UTC)
  • This suggests that DBpedia and Freebase are attempting to maintain their own structured versions of the Wikipedia dataset. In theory, if Amazon keeps its public dataset up to date, it would be possible to restore Wikipedia from that.
  • Even worse, there is no real backup of the image data. According to their backup procedures[1], they only occasionally rsync the images by hand to another remote host. They have already lost some images to a software bug and could not restore them[2]. Maybe it would be possible to (slowly) download all the images and save them, since they provide a dump of the database table that contains all the image metadata. Given that the uncompressed text-only dump of Wikipedia is already over 2 TB, the images must be HUGE. --Soult 10:57, 4 April 2009 (UTC)
    • Just calculated the total size: all images (excluding metadata) come to about 2604 GB (2.54 TB) as of 24 January 2009, not counting deleted or replaced images. --Soult 23:40, 9 April 2009 (UTC)
All the images in Wikimedia Commons (7 million files) total about 6 TB. Emijrp 23:12, 1 November 2010 (UTC)
  • Uh, well, I use WikiTaxi regularly to read articles from the dumps, and it works quite well. As for how easy it would be to restore the site from the dumps rather than just read from them, I don't know. But I know the information is there, and there are already tools that read it. --Qwerty0 19:21, 20 April 2011 (UTC)
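For anyone wanting to redo the size estimate from the image metadata table, here is a rough sketch. It assumes a hypothetical tab-separated export with an `img_size` column (the real dump is SQL and would need converting first), and the helper names are illustrative only. The binary-unit helper mirrors the conversion above, where 2604 GB becomes 2.54 TB by dividing by 1024; the download-time estimate uses an assumed throttled rate, not a measured one.

```python
import csv
import io


def total_bytes(tsv_file, size_column="img_size"):
    """Sum one column of a tab-separated export of the image table.

    Assumes a header row naming the size column (hypothetical export format).
    """
    reader = csv.DictReader(tsv_file, delimiter="\t")
    return sum(int(row[size_column]) for row in reader)


def to_binary_units(n_bytes):
    """Return (GiB, TiB), dividing by 1024 at each step."""
    gib = n_bytes / 2**30
    return gib, gib / 1024


def days_to_download(n_bytes, bytes_per_second):
    """Days needed to fetch n_bytes at a constant, throttled rate."""
    return n_bytes / bytes_per_second / 86_400


if __name__ == "__main__":
    # Tiny stand-in for a real export file.
    sample = io.StringIO("img_name\timg_size\nExample.jpg\t1000\nExample.png\t24\n")
    print(total_bytes(sample))  # 1024

    gib, tib = to_binary_units(2604 * 2**30)
    print(f"{gib:.0f} GB = {tib:.2f} TB")  # 2604 GB = 2.54 TB

    # Assumed rate of 1 MiB/s, purely for illustration.
    print(f"{days_to_download(2604 * 2**30, 2**20):.1f} days")  # 30.9 days
```

At that assumed 1 MiB/s, fetching the January 2009 image set would take roughly a month, which gives a sense of what "slowly" means here.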

Wikimedia Commons monthly uploads

Month (YYYY-M)	Bytes uploaded (sum of img_size)
2003-1	1360188
2004-9	188850959
2004-10	637349207
2004-11	726517177
2004-12	1503501023
2005-1	1952816194
2005-2	3118680401
2005-3	3820401370
2005-4	5476827971
2005-5	10998180401
2005-6	7160629133
2005-7	9206024659
2005-8	12591218859
2005-9	14060418086
2005-10	17185495206
2005-11	9950998969
2005-12	11430418722
2006-1	15433548270
2006-2	14952310277
2006-3	19415486302
2006-4	23041609453
2006-5	29487911752
2006-6	29856352192
2006-7	32257412994
2006-8	50940607926
2006-9	37624697336
2006-10	33574470896
2006-11	34231957288
2006-12	30607951770
2007-1	40654722866
2007-2	39452895714
2007-3	53706627561
2007-4	72917771224
2007-5	72944518827
2007-6	63504951958
2007-7	76230887667
2007-8	91290158697
2007-9	100120203171
2007-10	89872715966
2007-11	81975793043
2007-12	75515001911
2008-1	84582810181
2008-2	77416420840
2008-3	89120317630
2008-4	98180062150
2008-5	117840970706
2008-6	100352888576
2008-7	128266650486
2008-8	130452484462
2008-9	120247362867
2008-10	122360827827
2008-11	116290099578
2008-12	126446332364
2009-1	127226957021
2009-2	125819024255
2009-3	273597778760
2009-4	212175602700
2009-5	191651496603
2009-6	195998789357
2009-7	241366758346
2009-8	262927838267
2009-9	184963508476
2009-10	345591510325
2009-11	197991117397
2009-12	228003186895
2010-1	226919138307
2010-2	191615007774
2010-3	216425793739
2010-4	312177184245
2010-5	312240110181
2010-6	283374261868
2010-7	362175217639
2010-8	172072631498

Values are in bytes. In July 2010 alone, 362 GB were uploaded. Emijrp 23:16, 1 November 2010 (UTC)
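A small sketch for working with the monthly table: it parses `YYYY-M<TAB>bytes` rows, orders them chronologically, and totals them per year. The function names are illustrative, and the use of decimal gigabytes (10^9 bytes) is an assumption chosen because it is what makes July 2010 come out at 362 GB. Only a few rows are pasted in, so add the rest for full-year totals.

```python
from collections import defaultdict

# A few (month, bytes) rows copied from the Commons upload table;
# paste in the remaining rows for full-year totals.
ROWS = """\
2010-8\t172072631498
2010-7\t362175217639
2009-10\t345591510325
2009-9\t184963508476
"""


def parse(text):
    """Return a list of (year, month, bytes) tuples in chronological order.

    Labels such as "2009-9" and "2009-10" sort wrongly as strings,
    so each label is split into integers before sorting.
    """
    rows = []
    for line in text.strip().splitlines():
        label, size = line.split("\t")
        year, month = (int(part) for part in label.split("-"))
        rows.append((year, month, int(size)))
    return sorted(rows)


def yearly_totals(rows):
    """Sum the byte counts per calendar year."""
    totals = defaultdict(int)
    for year, _month, size in rows:
        totals[year] += size
    return dict(totals)


def to_gb(n_bytes):
    """Decimal gigabytes (10**9 bytes)."""
    return n_bytes / 10**9


if __name__ == "__main__":
    rows = parse(ROWS)
    for year, month, size in rows:
        print(f"{year}-{month:02d}: {to_gb(size):8.1f} GB")
    for year, total in sorted(yearly_totals(rows).items()):
        print(f"{year} (rows listed): {to_gb(total):.1f} GB")
```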

Add a section to the article that suggests workarounds

The easiest way to save content is to upload it to multiple websites. My personal favorites for the typical files I upload are Ovi Share (by Nokia; unlimited disk space for a fairly wide variety of file types, but no easy mass download), Scribd, docstoc, Slideshare and (from which I can download files as a zip file). --Jaakkoh 04:39, 4 April 2009 (UTC)

Wasn't sure where to post this, but what about FriendFeed, given that they're in the process of being acquired by Facebook, and people are talking about leaving and taking their content with them/deleting accounts? TysonKey 17:49, 13 August 2009 (UTC)

Perhaps some mention should be made that the entity which owns this owes hundreds of *trillions* of dollars with no clear plan or schedule to repay this money. As a house in a bad neighbourhood with a world-leading quantity of foreclosures, it's just a matter of time before Commie China Inc. calls in the loans and the entire house of cards collapses. --Carlb 18:24, 13 February 2012 (UTC)

Uncyclopedia and various former Wikia wikis

Quite a few of the individual-language Uncyclopedia parodies of Wikipedia are currently hosted on Wikia, and all but two of the rest have long been available for download. The two major ones missing, because they are hosted independently, are the Russian and Korean editions. There are probably backups of both from before they went independent, but these will be badly out of date. I have no idea whether their new hosts offer any backups for download.

The same pattern likely holds for a long list of wikis that left Wikia because of the site's ad-heavy forced reskins in 2008 and again in 2010. Two sites were set up as consumer-complaint sites about Wikia, but in practice they should be read as lists of newly independent MediaWiki installations, which may or may not offer downloadable backups.

It's best not to assume that, just because Wikia (or another wiki farm) has left an old wiki open and abandoned since 2008, its backups contain anything other than outdated content and vandalism once the community has established itself elsewhere. Wikia is infamous for keeping old wikis open after the community leaves, and Wikia staff have been seen removing links to the new wiki on numerous occasions.

Annoyingly, if some automated process on the old site keeps generating periodic backups, their timestamps will look current even though the underlying data is three years out of date. --Carlb 18:24, 13 February 2012 (UTC)
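One way to avoid being fooled by a fresh-looking file is to check the newest revision timestamp inside the dump rather than the file's own date. A minimal sketch, assuming a standard MediaWiki XML export in which each revision carries a `<timestamp>` element in `YYYY-MM-DDThh:mm:ssZ` form; the function name `newest_revision` is hypothetical.

```python
import io
import xml.etree.ElementTree as ET
from datetime import datetime, timezone


def newest_revision(source):
    """Return the latest revision timestamp in a MediaWiki XML dump, or None.

    `source` may be a path or a binary file object. Element tags carry the
    export namespace (e.g. "{http://www.mediawiki.org/xml/export-0.10/}timestamp"),
    so only the local name is compared.
    """
    newest = None
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag.rsplit("}", 1)[-1] == "timestamp" and elem.text:
            ts = datetime.strptime(elem.text.strip(), "%Y-%m-%dT%H:%M:%SZ")
            ts = ts.replace(tzinfo=timezone.utc)
            if newest is None or ts > newest:
                newest = ts
        elem.clear()  # drop processed content to limit memory on large dumps
    return newest


if __name__ == "__main__":
    sample = io.BytesIO(
        b'<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">'
        b"<page><revision><timestamp>2009-02-01T12:00:00Z</timestamp></revision>"
        b"<revision><timestamp>2009-03-05T08:30:00Z</timestamp></revision></page>"
        b"</mediawiki>"
    )
    newest = newest_revision(sample)
    age = datetime.now(timezone.utc) - newest
    print(f"newest edit: {newest:%Y-%m-%d}, {age.days} days before today")
```

If the newest edit is years older than the backup file itself, the backup is just a fresh copy of stale content.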


Is it really doing 'OK'? Or could it all come crumbling down?

    1. It would be a monumental (and likely impossible) task.
    2. Most likely users would be given ample time and opportunity to grab their own data.
      1. But what if not? What about users who might not, or could not, grab it?
  • [hxxp:// Facebook is Dying, New Class Action Lawsuit Takes Aim]

I think most Facebook data is user-private and cannot be grabbed. Zeronet (talk) 09:12, 26 November 2018 (UTC)

I am not sure if adding these is appropriate.

Youku - Chinese clone of YouTube
Tieba - Chinese clone of Reddit
Weibo - Chinese clone of Twitter
Zhihu - Chinese clone of Quora

niconico - Japanese video site with "danmaku" (comments overlaid on the video in real time)
AcFun - Chinese clone of niconico
bilibili - Chinese clone of AcFun

Douban - Chinese equivalent of IMDb / ISBNdb, with many other features
 - equivalent of Douban, focused on Japanese popular culture
 - music listening and downloading
NetEase Music - music listening and downloading

RuTracker - torrent site

Zeronet (talk) 19:33, 9 December 2018 (UTC)