Wikimedia Commons

From Archiveteam
Wikimedia Commons mainpage on 2010-12-13
URL http://commons.wikimedia.org
Status Online!
Archiving status In progress...
Archiving type Unknown
IRC channel #archiveteam-bs (on hackint)

Wikimedia Commons is a database of freely usable media files, with more than 32 million files. As of August 2016, the total size is over 84 TB (check).

Archiving process

Tools

How-to

Download the script and the feed list into the same directory (unpack the feed list; it is a .csv file). Then run:

  • python commonsdownloader.py 2005-01-01 2005-01-10 [to download that 10-day range; it generates zip files by day and a .csv for every day]

Don't forget that some months have a 30th and a 31st day, and that February has a 29th day in leap years.
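To avoid missing month-end days, the date arguments can be computed instead of typed by hand. The following is a minimal sketch, assuming the script takes an inclusive start date and end date as in the example above; month_bounds is a hypothetical helper name, not part of commonsdownloader.py.

```python
import calendar
from datetime import date


def month_bounds(year, month):
    """Return (first_day, last_day) of a month as ISO date strings.

    Hypothetical helper: it only builds the two date arguments,
    it does not call the downloader.
    """
    # monthrange() returns (weekday of day 1, number of days in the month),
    # so 30/31-day months and leap-year Februaries are handled for us.
    last_day = calendar.monthrange(year, month)[1]
    return (date(year, month, 1).isoformat(),
            date(year, month, last_day).isoformat())


# Print one command line per month for the first third of 2008 (a leap year).
for month in (1, 2, 3, 4):
    start, end = month_bounds(2008, month)
    print(f"python commonsdownloader.py {start} {end}")
```

Printing the commands rather than running them keeps the sketch safe to try; paste the lines into a shell once they look right.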

To verify the downloaded data, use the checker script:

  • python commonschecker.py 2005-01-01 2005-01-10 [to check that 10-day range; it works on the .zip and .csv files, not the original folders]
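Independently of the checker script, a quick integrity test of a single .zip file can catch truncated or corrupted archives before uploading. This is only a sketch using Python's standard zipfile module; it is not what commonschecker.py does, and zip_is_intact is a hypothetical name.

```python
import zipfile


def zip_is_intact(path):
    """Return True if the archive opens and every member passes its CRC check."""
    try:
        with zipfile.ZipFile(path) as archive:
            # testzip() returns the name of the first corrupt member,
            # or None when all members are readable.
            return archive.testzip() is None
    except zipfile.BadZipFile:
        # The file is not a zip archive at all (or is badly truncated).
        return False
```

A missing file still raises the usual FileNotFoundError, so a wrong path is not silently reported as a bad archive.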

Tools required

If you are downloading on a freshly installed server (e.g. a default virtual machine), you have to install zip first (Ubuntu: apt-get install zip).

Python should already be installed on your server; if not, install it.

The script also depends on curl and wget, which should be installed on your server by default.
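A quick way to confirm that the external tools are present before starting a long download is to look them up on the PATH. This is a minimal sketch, assuming the three tools named above are the only external dependencies; missing_tools is a hypothetical helper name.

```python
import shutil

# Tools named in this section; adjust if the script needs others.
REQUIRED_TOOLS = ("zip", "curl", "wget")


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools from `tools` that are not found on the PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing tools:", ", ".join(missing))
    else:
        print("All required tools found.")
```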

Volunteers

Please wait until we have run some tests. There is probably a bug with long filenames.
Nick | Start date | End date | Images | Size | Revision | Status | Notes
Hydriz | 2004-09-07 | 2005-06-30 | ? | ? | r643 | Downloaded; uploaded to the Internet Archive | Check: October 2004: [1]; November 2004: [2]; December 2004: [3]; January 2005: [4]; February 2005: [5]; March 2005: [6] (2005-03-23 - 2005-03-31 was downloaded differently, so it is not available for checking); April 2005: [7]; May 2005: [8]; June 2005: [9]
Hydriz | 2005-07-01 | 2005-12-31 | ? | ? | r643 | Downloaded; uploaded to the Internet Archive | Check: July 2005: [10]; August 2005: [11]; September 2005: [12]; October 2005: [13]; November 2005: [14]; December 2005: [15]
Hydriz | 2006-01-01 | 2006-01-10 | 13198 | 4.8GB | r349 | Downloaded; uploaded to the Internet Archive |
Hydriz | 2006-01-11 | 2006-06-30 | ? | ? | r349 | Downloaded; uploaded to the Internet Archive |
Hydriz | 2006-07-01 | 2006-12-31 | ? | ? | r643 | Downloaded; uploaded to the Internet Archive | Check: July 2006: http://p.defau.lt/?IcMnwkx_j4H09FE_9iVgkQ ; August 2006: http://p.defau.lt/?EmsKDtM0RXaysFNEABXJCQ ; September 2006: http://p.defau.lt/?KBZVE9rJ9hdz4DiKnegnUw ; October 2006: http://p.defau.lt/?f3F85TyqHtdY0LhpQk_m1w ; November 2006: http://p.defau.lt/?VZwhzt_2doA_Z3c65_JkXg ; December 2006: http://p.defau.lt/?Ms_TgrcyGDL_0oZQgKCNmw
Hydriz | 2007-01-01 | 2007-12-31 | ? | ? | r349 | Downloading | Check: January 2007; February 2007; March 2007; April 2007; May 2007; June 2007; July 2007

Errors

  • oi_archive_name empty fields: http://commons.wikimedia.org/wiki/File:Nl-scheikundig.ogg
  • broken file links: http://commons.wikimedia.org/wiki/File:SMS_Bluecher.jpg#filehistory
  • Issue 45: 2005-03-23, 2005-08-08, 2005-09-12, 2005-09-18, 2005-09-25, 2005-11-18, 2006-02-05, 2006-02-11, 2006-02-25, 2006-03-10, 2006-03-23, 2006-04-21, 2006-04-25, 2006-05-01, 2006-07-13, 2006-07-30, 2006-08-02, 2006-08-05, 2006-08-13, 2006-09-12, 2006-10-22, 2006-10-26, 2006-11-23, 2006-12-06, 2006-12-13, 2006-12-17.
  • Also issue 45: 2007-01-01, 2007-01-06, 2007-01-14, 2007-01-15, 2007-02-06, 2007-02-13, 2007-02-22, 2007-02-26, 2007-03-07, 2007-03-13, 2007-03-25, 2007-03-30, 2007-04-12, 2007-04-14, 2007-04-20, 2007-05-04, 2007-05-08, 2007-05-10, 2007-05-29, 2007-06-05, 2007-06-22.

I'm going to file a bug in Bugzilla.

Uploading

Upload using the format: wikimediacommons-<year><month>

E.g. wikimediacommons-200601 for the January 2006 grab.
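The naming format can be generated rather than typed, which avoids mistakes such as a missing leading zero in the month. This is a small sketch; upload_name is a hypothetical helper name, and only the wikimediacommons-<year><month> pattern itself comes from this page.

```python
def upload_name(year, month):
    """Build the upload name for one monthly grab, e.g. wikimediacommons-200601."""
    if not 1 <= month <= 12:
        raise ValueError(f"month must be 1-12, got {month}")
    # Four-digit year followed by a zero-padded two-digit month.
    return f"wikimediacommons-{year:04d}{month:02d}"


print(upload_name(2006, 1))  # wikimediacommons-200601
```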

If you can, add it to the WikiTeam collection; otherwise just tag it with the wikiteam keyword, and it will be added later.

Other dumps

There is no public dump of all images. WikiTeam is working on a scraper (see section above).

Pictures of the Year (best ones):

Featured images

Wikimedia Commons contains a lot of high-quality images.

Image: Featured pictures on Wikimedia Commons

Statistics

Stats per year

MariaDB [commonswiki_p]> select year(img_timestamp) as date, count(*) as numimages, round(sum(img_size)/(1024*1024*1024)) as gigabytes from image where 1 group by date;
+------+-----------+-----------+
| date | numimages | gigabytes |
+------+-----------+-----------+
| NULL |         1 |         0 |
| 2003 |         1 |         0 |
| 2004 |     13990 |         3 |
| 2005 |    233100 |        94 |
| 2006 |    583717 |       310 |
| 2007 |   1147177 |       767 |
| 2008 |   1356619 |      1161 |
| 2009 |   1884533 |      2277 |
| 2010 |   2297470 |      3020 |
| 2011 |   3888479 |      5348 |
| 2012 |   3396594 |      7228 |
| 2013 |   4573531 |     12587 |
| 2014 |   4614931 |     19344 |
| 2015 |   5669382 |     17146 |
| 2016 |   3214198 |     17669 |
+------+-----------+-----------+
15 rows in set, 2 warnings (5 min 56.85 sec)
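As a cross-check, the per-year rows can be summed and compared with the headline figures at the top of this page. The sketch below copies the rows from the query output above. The query divides by 1024*1024*1024 and rounds each year separately, so the size total is in GiB and only approximate.

```python
# (year, number of images, size in GiB as rounded by the query above)
ROWS = [
    (None, 1, 0),
    (2003, 1, 0),
    (2004, 13990, 3),
    (2005, 233100, 94),
    (2006, 583717, 310),
    (2007, 1147177, 767),
    (2008, 1356619, 1161),
    (2009, 1884533, 2277),
    (2010, 2297470, 3020),
    (2011, 3888479, 5348),
    (2012, 3396594, 7228),
    (2013, 4573531, 12587),
    (2014, 4614931, 19344),
    (2015, 5669382, 17146),
    (2016, 3214198, 17669),
]


def totals(rows):
    """Return (total images, total GiB) over all rows."""
    images = sum(count for _, count, _ in rows)
    gib = sum(size for _, _, size in rows)
    return images, gib


images, gib = totals(ROWS)
print(f"{images} images, about {gib} GiB ({gib / 1024:.1f} TiB)")
```

The totals come to 32873723 images and about 86954 GiB (roughly 84.9 TiB), which agrees with the "more than 32 million files" and "over 84 TB" figures given in the introduction.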

See also

  • Wikipedia: some Wikipedias have enabled the local upload form. English Wikipedia contains about 800,000 images, many of them under fair use.

External links

Knowledge and Wikis
Software

DokuWiki · MediaWiki · MoinMoin · Oddmuse · PukiWiki · UseModWiki · YukiWiki

Wikifarms

atwiki · Battlestar Wiki · BluWiki · Communpedia · EditThis · elwiki.com · Fandom · Miraheze · Neoseeker.com · Orain · Referata · ScribbleWiki · Seesaa · ShoutWiki · SourceForge · TropicalWikis · Wik.is · Wiki.Wiki · Wiki-Site · Wikidot · WikiHub · Wikispaces · WikiForge · WikiTide · Wikkii · YourWiki.net

Wikimedia

Wikipedia · Wikimedia Commons · Wikibooks · Wikidata · Wikinews · Wikiquote · Wikisource · Wikispecies · Wiktionary · Wikiversity · Wikivoyage · Wikimedia Incubator · Meta-Wiki

Other

Anarchopedia · Citizendium · Conservapedia · Creation Wiki · EcuRed · Enciclopedia Libre Universal en Español · GNUPedia · Moegirlpedia · Nico Nico Pedia · Nupedia · OmegaWiki · OpenStreetMap · Pixiv Encyclopedia

Indexes and stats

WikiApiary · WikiIndex · Wikistats