Talk:Alive... OR ARE THEY

Revision as of 23:40, 9 April 2009

Has anyone attempted to restore a running copy of Wikipedia from the dumps? If so, do the dumps provide enough data that, in the event of a catastrophic data failure, Wikipedia could be brought back up using just the backups? --Adewale 08:42, 27 February 2009 (UTC)
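
As a rough illustration of what such an attempt might involve, here is a minimal, untested sketch that streams a pages-articles XML dump into a local MediaWiki install through the stock importDump.php maintenance script. The dump filename and install path are placeholders, and a full restore would also need the site configuration, extensions, and a pass with the rebuild maintenance scripts (e.g. rebuildall.php) afterwards.

  # Minimal, untested sketch: stream a pages-articles dump into a local
  # MediaWiki install via its stock importDump.php maintenance script.
  # DUMP and MEDIAWIKI below are placeholder paths, not real ones.
  import bz2
  import subprocess

  DUMP = "enwiki-pages-articles.xml.bz2"   # hypothetical local copy of the dump
  MEDIAWIKI = "/var/www/mediawiki"         # hypothetical MediaWiki install path

  with bz2.open(DUMP, "rb") as dump:
      # importDump.php reads the uncompressed XML from standard input.
      proc = subprocess.Popen(
          ["php", f"{MEDIAWIKI}/maintenance/importDump.php"],
          stdin=subprocess.PIPE,
      )
      # Feed the dump in 1 MiB chunks so the multi-gigabyte file never
      # has to sit in memory all at once.
      for chunk in iter(lambda: dump.read(1 << 20), b""):
          proc.stdin.write(chunk)
      proc.stdin.close()
      proc.wait()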

  • That's a very good question. The problem is that the dumps are now insanely huge. I don't know how many people would even have the capacity to unpack them. --Jscott 15:45, 27 February 2009 (UTC)
  • This page, http://aws.amazon.com/publicdatasets/#1, suggests that DBpedia and Freebase are attempting to maintain their own structured versions of the Wikipedia dataset. Theoretically, if Amazon keeps their public dataset up to date, it should be possible to restore Wikipedia from that.
  • Even worse, there is no real backup of the image data. According to their backup procedures (http://wikitech.wikimedia.org/view/Backup_procedures), the images are only occasionally rsynced by hand to another remote host. They have already lost some images to a software bug and could not restore them (http://thread.gmane.org/gmane.org.wikimedia.commons/4175/focus=4178). It might be possible to (slowly) download all images and save them, since they provide a dump of the database table that contains all the image metadata (see the download sketch after this thread). As the text-only dump of Wikipedia is already > 2 TB without compression, the total image size must be HUGE. --Soult 10:57, 4 April 2009 (UTC)
    • Just calculated the total size: all images (without metadata) amount to about 2604 GB (2.54 TB) as of 24 January 2009, not counting deleted or replaced images. --Soult 23:40, 9 April 2009 (UTC)
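
For the image side, the metadata-driven download could work roughly as sketched below (untested). It assumes a plain-text list of file names already extracted from the image table dump (the hypothetical images.txt), and the usual Wikimedia upload URL layout in which originals sit under hash-prefix directories derived from the MD5 of the underscored file name; the commons path used here is also an assumption, since images hosted on the individual wikis live under a different prefix.

  # Rough, untested sketch: slowly mirror original images from
  # upload.wikimedia.org, one file per request with a pause in between.
  # "images.txt" (one file name per line) is a hypothetical list extracted
  # from the image metadata table dump, which is really SQL and would need
  # parsing first.
  import hashlib
  import time
  import urllib.parse
  import urllib.request

  BASE = "https://upload.wikimedia.org/wikipedia/commons"

  def image_url(name: str) -> str:
      # Originals are assumed to live under /<h>/<hh>/, where "hh" is the
      # first two hex digits of the MD5 hash of the underscored file name.
      name = name.replace(" ", "_")
      digest = hashlib.md5(name.encode("utf-8")).hexdigest()
      return f"{BASE}/{digest[0]}/{digest[:2]}/{urllib.parse.quote(name)}"

  def mirror(names, delay=2.0):
      # One image at a time, sleeping between requests, so the crawl stays
      # as slow and polite as the comment above suggests.
      for name in names:
          urllib.request.urlretrieve(image_url(name), name.replace(" ", "_"))
          time.sleep(delay)

  if __name__ == "__main__":
      with open("images.txt", encoding="utf-8") as fh:
          mirror(line.strip() for line in fh if line.strip())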

Add a section to the article that suggests workarounds

The easiest way (to save content) is to submit it to multiple websites. My personal favorites for the typical files I upload are Ovi Share (by Nokia; unlimited disk space for quite a wide variety of files, but no easy mass download), Scribd, docstoc, Slideshare, and Box.net (from which I can download files as a zip file). --Jaakkoh 04:39, 4 April 2009 (UTC)