Wikimedia Commons
Wikimedia Commons | |
Wikimedia Commons mainpage on 2010-12-13 | |
URL | http://commons.wikimedia.org |
Status | Online! |
Archiving status | In progress... |
Archiving type | Unknown |
IRC channel | #archiveteam-bs (on hackint) |
Wikimedia Commons is a database of freely usable media files with more than 10 million files (when it held 6.8M files, the size was 6.6TB).
Current size (based on January 18, 2012 estimate): 13.3TB, old versions 881GB
Archiving process
Tools
- Download script (Python)
- Checker script (Python)
- Feed lists (from 2004-09-07 to 2008-12-31; more coming soon)
How-to
Download the script and the feed lists into the same directory (unpack the feed list; it is a .csv file). Then run:
- python commonsdownloader.py 2005-01-01 2005-01-10 [to download that 10-day range; it generates zip files by day and a .csv file for each day]
Don't forget the 30th and 31st in the months that have them, or February 29th in leap years.
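To avoid missing the last days of a month, the start and end dates can be computed rather than typed by hand. The following is a minimal sketch using only the Python standard library; the helper name month_range is made up for this example and is not part of commonsdownloader.py.

```python
import calendar
from datetime import date


def month_range(year, month):
    """Return (first_day, last_day) of a month as YYYY-MM-DD strings.

    Hypothetical helper, not part of commonsdownloader.py.
    calendar.monthrange handles 30/31-day months and leap years.
    """
    last_day = calendar.monthrange(year, month)[1]
    first = date(year, month, 1).isoformat()
    last = date(year, month, last_day).isoformat()
    return first, last


if __name__ == "__main__":
    # Print one downloader command per month of 2008 (a leap year).
    for month in range(1, 13):
        start, end = month_range(2008, month)
        print("python commonsdownloader.py", start, end)
```

The printed lines use the same command-line form as the example above, so they can be pasted into a shell one at a time.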
To verify the download data use the checker script:
- python commonschecker.py 2005-01-01 2005-01-10 [to check that 10-day range; it works on the .zip and .csv files, not on the original folders]
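Before running the checker, it may help to confirm that every day in the range actually produced its output files. The sketch below is not the checker script and does not verify file contents. It assumes, purely for illustration, that the per-day files are named YYYY-MM-DD.zip and YYYY-MM-DD.csv; adjust the naming to whatever the downloader really writes.

```python
from datetime import date, timedelta
from pathlib import Path


def days_in_range(start, end):
    """Yield every date from start to end (inclusive) as YYYY-MM-DD."""
    current = date.fromisoformat(start)
    last = date.fromisoformat(end)
    while current <= last:
        yield current.isoformat()
        current += timedelta(days=1)


def missing_outputs(directory, start, end):
    """List expected per-day files that are absent from directory.

    ASSUMPTION: files are named <day>.zip and <day>.csv. This naming
    is a guess for the example, not confirmed by the scripts.
    """
    folder = Path(directory)
    missing = []
    for day in days_in_range(start, end):
        for suffix in (".zip", ".csv"):
            if not (folder / (day + suffix)).exists():
                missing.append(day + suffix)
    return missing


if __name__ == "__main__":
    for name in missing_outputs(".", "2005-01-01", "2005-01-10"):
        print("missing:", name)
```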
Tools required
If you are downloading on a freshly provisioned server (e.g. a default virtual machine), you have to install zip (Ubuntu: apt-get install zip).
Python should already be installed on your server; if not, install it.
The script also depends on curl and wget, which should be installed on your server by default.
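A quick way to confirm that the external tools named above are present is to look them up on the PATH. This is a small sketch using shutil.which from the standard library; the function name missing_tools is invented for the example.

```python
import shutil

# External programs mentioned in this section.
REQUIRED_TOOLS = ("zip", "curl", "wget")


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools that cannot be found on the PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    absent = missing_tools()
    if absent:
        print("Missing tools:", ", ".join(absent))
    else:
        print("All required tools found.")
```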
Volunteers
- Please wait until we have done some tests. There is probably a bug with long filenames.
Nick | Start date | End date | Images | Size | Revision | Status | Notes |
---|---|---|---|---|---|---|---|
Hydriz | 2004-09-07 | 2005-06-30 | ? | ? | r643 | Downloaded; uploaded to the Internet Archive | Check: October 2004: [1] November 2004: [2] December 2004: [3] January 2005: [4] February 2005: [5] March 2005: [6] (2005-03-23 - 2005-03-31 was downloaded differently, so it's not available for checking) April 2005: [7] May 2005: [8] June 2005: [9] |
Hydriz | 2005-07-01 | 2005-12-31 | ? | ? | r643 | Downloaded; uploaded to the Internet Archive | Check: July 2005: [10] August 2005: [11] September 2005: [12] October 2005: [13] November 2005: [14] December 2005: [15] |
Hydriz | 2006-01-01 | 2006-01-10 | 13198 | 4.8GB | r349 | Downloaded; uploaded to the Internet Archive | |
Hydriz | 2006-01-11 | 2006-06-30 | ? | ? | r349 | Downloaded; uploaded to the Internet Archive | |
Hydriz | 2006-07-01 | 2006-12-31 | ? | ? | r643 | Downloaded; uploaded to the Internet Archive | Check: July 2006: http://p.defau.lt/?IcMnwkx_j4H09FE_9iVgkQ August 2006: http://p.defau.lt/?EmsKDtM0RXaysFNEABXJCQ September 2006: http://p.defau.lt/?KBZVE9rJ9hdz4DiKnegnUw October 2006: http://p.defau.lt/?f3F85TyqHtdY0LhpQk_m1w November 2006: http://p.defau.lt/?VZwhzt_2doA_Z3c65_JkXg December 2006: http://p.defau.lt/?Ms_TgrcyGDL_0oZQgKCNmw |
Hydriz | 2007-01-01 | 2007-12-31 | ? | ? | r349 | Downloading | Check: January 2007 |
Errors
- oi_archive_name empty fields: http://commons.wikimedia.org/wiki/File:Nl-scheikundig.ogg
- broken file links: http://commons.wikimedia.org/wiki/File:SMS_Bluecher.jpg#filehistory
- Issue 45: 2005-03-23, 2005-08-08, 2005-09-12, 2005-09-18, 2005-09-25, 2005-11-18, 2006-02-05, 2006-02-11, 2006-02-25, 2006-03-10, 2006-03-23, 2006-04-21, 2006-04-25, 2006-05-01, 2006-07-13, 2006-07-30, 2006-08-02, 2006-08-05, 2006-08-13, 2006-09-12, 2006-10-22, 2006-10-26, 2006-11-23, 2006-12-06, 2006-12-13, 2006-12-17.
I'm going to file a bug in Bugzilla.
Uploading
Upload using the identifier format: wikimediacommons-<year><month>
E.g. wikimediacommons-200601 for the January 2006 grab.
If you can, add it to the WikiTeam collection; otherwise tag it with the wikiteam keyword and it will be added later.
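The naming format can be generated rather than typed, which avoids mistakes such as a missing leading zero in the month. A minimal sketch; the function name item_identifier is hypothetical and only illustrates the format described above.

```python
def item_identifier(year, month):
    """Build an upload name of the form wikimediacommons-<year><month>.

    Hypothetical helper for illustration; the month is zero-padded
    to two digits, as in wikimediacommons-200601.
    """
    if not 1 <= month <= 12:
        raise ValueError("month must be between 1 and 12")
    return "wikimediacommons-{:04d}{:02d}".format(year, month)


if __name__ == "__main__":
    print(item_identifier(2006, 1))
```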
Other dumps
There is no public dump of all images. WikiTeam is working on a scraper (see section above).
Pictures of the Year (best ones):
Featured images
Wikimedia Commons contains a lot of high-quality images.
Size stats
Combined image sizes hosted in Wikimedia Commons sorted by month.
date | sum(img_size) in bytes |
---|---|
2003-1 | 1360188 |
2004-9 | 188850959 |
2004-10 | 637349207 |
2004-11 | 726517177 |
2004-12 | 1503501023 |
2005-1 | 1952816194 |
2005-2 | 3118680401 |
2005-3 | 3820401370 |
2005-4 | 5476827971 |
2005-5 | 10998180401 |
2005-6 | 7160629133 |
2005-7 | 9206024659 |
2005-8 | 12591218859 |
2005-9 | 14060418086 |
2005-10 | 17185495206 |
2005-11 | 9950998969 |
2005-12 | 11430418722 |
2006-1 | 15433548270 |
2006-2 | 14952310277 |
2006-3 | 19415486302 |
2006-4 | 23041609453 |
2006-5 | 29487911752 |
2006-6 | 29856352192 |
2006-7 | 32257412994 |
2006-8 | 50940607926 |
2006-9 | 37624697336 |
2006-10 | 33574470896 |
2006-11 | 34231957288 |
2006-12 | 30607951770 |
2007-1 | 40654722866 |
2007-2 | 39452895714 |
2007-3 | 53706627561 |
2007-4 | 72917771224 |
2007-5 | 72944518827 |
2007-6 | 63504951958 |
2007-7 | 76230887667 |
2007-8 | 91290158697 |
2007-9 | 100120203171 |
2007-10 | 89872715966 |
2007-11 | 81975793043 |
2007-12 | 75515001911 |
2008-1 | 84582810181 |
2008-2 | 77416420840 |
2008-3 | 89120317630 |
2008-4 | 98180062150 |
2008-5 | 117840970706 |
2008-6 | 100352888576 |
2008-7 | 128266650486 |
2008-8 | 130452484462 |
2008-9 | 120247362867 |
2008-10 | 122360827827 |
2008-11 | 116290099578 |
2008-12 | 126446332364 |
2009-1 | 127226957021 |
2009-2 | 125819024255 |
2009-3 | 273597778760 |
2009-4 | 212175602700 |
2009-5 | 191651496603 |
2009-6 | 195998789357 |
2009-7 | 241366758346 |
2009-8 | 262927838267 |
2009-9 | 184963508476 |
2009-10 | 345591510325 |
2009-11 | 197991117397 |
2009-12 | 228003186895 |
2010-1 | 226919138307 |
2010-2 | 191615007774 |
2010-3 | 216425793739 |
2010-4 | 312177184245 |
2010-5 | 312240110181 |
2010-6 | 283374261868 |
2010-7 | 362175217639 |
2010-8 | 172072631498 |
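The monthly figures can be added up to estimate how much disk space a date range will need before volunteering for it. The sketch below copies only the four 2004 values from the statistics above as sample data; the names SAMPLE_SIZES, total_bytes and to_gigabytes are made up for this example, and gigabytes are taken as 10^9 bytes.

```python
# Sample rows copied from the size statistics: (year, month) -> bytes.
SAMPLE_SIZES = {
    (2004, 9): 188850959,
    (2004, 10): 637349207,
    (2004, 11): 726517177,
    (2004, 12): 1503501023,
}


def total_bytes(sizes, first, last):
    """Sum the sizes of all months from first to last, inclusive.

    first and last are (year, month) tuples; tuples compare
    chronologically, so no date parsing is needed.
    """
    return sum(size for key, size in sizes.items() if first <= key <= last)


def to_gigabytes(num_bytes):
    """Convert bytes to gigabytes (1 GB = 10**9 bytes)."""
    return num_bytes / 10**9


if __name__ == "__main__":
    total = total_bytes(SAMPLE_SIZES, (2004, 9), (2004, 12))
    print("2004-09 to 2004-12: {:.2f} GB".format(to_gigabytes(total)))
```

These sums cover only the current versions counted in img_size, so old file versions would come on top of the estimate.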
See also
- Wikipedia: some Wikipedias have enabled the local upload form; English Wikipedia contains about 800,000 images, many of them under fair use.