Difference between revisions of "Wikimedia Commons"

From Archiveteam
Jump to navigation Jump to search
(→‎Size stats: update)
Line 82: Line 82:
 
[[File:Featured pictures on Wikimedia Commons - Wikimedia Commons 1294011879617.png|500px]]
 
[[File:Featured pictures on Wikimedia Commons - Wikimedia Commons 1294011879617.png|500px]]
  
== Size stats ==
+
== Statistics ==
Combined image sizes hosted in Wikimedia Commons sorted by month.
+
 
 +
=== Stats per year ===
 +
 
 
<pre>
 
<pre>
date sum(img_size) in bytes
+
MariaDB [commonswiki_p]> select year(img_timestamp) as date, count(*) as numimages, round(sum(img_size)/(1024*1024*1024)) as gigabytes from image where 1 group by date;
2003-1 1360188
+
+------+-----------+-----------+
2004-10 637349207
+
| date | numimages | gigabytes |
2004-11 726517177
+
+------+-----------+-----------+
2004-12 1503501023
+
| NULL |        1 |        0 |
2004-9 188850959
+
| 2003 |        1 |        0 |
2005-1 1952816194
+
| 2004 |    13990 |        3 |
2005-10 17185495206
+
| 2005 |    233100 |        94 |
2005-11 9950998969
+
| 2006 |    583717 |      310 |
2005-12 11430418722
+
| 2007 |  1147177 |      767 |
2005-2 3118680401
+
| 2008 |  1356619 |      1161 |
2005-3 3820401370
+
| 2009 |  1884533 |      2277 |
2005-4 5476827971
+
| 2010 |  2297470 |      3020 |
2005-5 10998180401
+
| 2011 |  3888479 |      5348 |
2005-6 7160629133
+
| 2012 |  3396594 |      7228 |
2005-7 9206024659
+
| 2013 |  4573531 |    12587 |
2005-8 12591218859
+
| 2014 |  4614931 |    19344 |
2005-9 14060418086
+
| 2015 |  5669382 |    17146 |
2006-1 15433548270
+
| 2016 |  3214198 |    17669 |
2006-10 33574470896
+
+------+-----------+-----------+
2006-11 34231957288
+
15 rows in set, 2 warnings (5 min 56.85 sec)
2006-12 30607951770
 
2006-2 14952310277
 
2006-3 19415486302
 
2006-4 23041609453
 
2006-5 29487911752
 
2006-6 29856352192
 
2006-7 32257412994
 
2006-8 50940607926
 
2006-9 37624697336
 
2007-1 40654722866
 
2007-10 89872715966
 
2007-11 81975793043
 
2007-12 75515001911
 
2007-2 39452895714
 
2007-3 53706627561
 
2007-4 72917771224
 
2007-5 72944518827
 
2007-6 63504951958
 
2007-7 76230887667
 
2007-8 91290158697
 
2007-9 100120203171
 
2008-1 84582810181
 
2008-10 122360827827
 
2008-11 116290099578
 
2008-12 126446332364
 
2008-2 77416420840
 
2008-3 89120317630
 
2008-4 98180062150
 
2008-5 117840970706
 
2008-6 100352888576
 
2008-7 128266650486
 
2008-8 130452484462
 
2008-9 120247362867
 
2009-1 127226957021
 
2009-10 345591510325
 
2009-11 197991117397
 
2009-12 228003186895
 
2009-2 125819024255
 
2009-3 273597778760
 
2009-4 212175602700
 
2009-5 191651496603
 
2009-6 195998789357
 
2009-7 241366758346
 
2009-8 262927838267
 
2009-9 184963508476
 
2010-1 226919138307
 
2010-2 191615007774
 
2010-3 216425793739
 
2010-4 312177184245
 
2010-5 312240110181
 
2010-6 283374261868
 
2010-7 362175217639
 
2010-8 172072631498
 
 
</pre>
 
</pre>
  

Revision as of 21:57, 2 August 2016

Wikimedia Commons
Wikimedia Commons logo
Wikimedia Commons mainpage on 2010-12-13
Wikimedia Commons mainpage on 2010-12-13
URL http://commons.wikimedia.org
Project status Online!
Archiving status In progress...
Project source Unknown
Project tracker Unknown
IRC channel #archiveteam (on EFnet)
Project lead Unknown

Wikimedia Commons is a database of freely usable media files with more than 32 million files. As of August 2016, total size is over 84 TB (check).

Archiving process

Tools

How-to

Download the script and the feed lists (unpack it, it is a .csv file) in the same directory. Then run:

  • python commonsdownloader.py 2005-01-01 2005-01-10 [to download that 10 days range; it generates zip files by day and a .csv for every day]

Don't forget 30th days and 31st days on some months. Also, February 29th in some years.

To verify the download data use the checker script:

  • python commonschecker.py 2005-01-01 2005-01-10 [to check that 10 days range; it works on the .zip and .csv files, not the original folders]

Tools required

If downloading using a very new server (i.e. a default virtual machine), you got to download zip (Ubuntu: apt-get install zip)

Python should be already installed on your server, if not then just install it!

Also has a dependency on curl and wget, which should be installed on your server by default...

Volunteers

Please, wait until we do some tests. Probably, long filenames bug.
Nick Start date End date Images Size Revision Status Notes
Hydriz 2004-09-07 2005-06-30 ? ? r643 Downloaded
Uploaded to the Internet Archive
Check:
October 2004: [1]
November 2004: [2]
December 2004: [3]
January 2005: [4]
February 2005: [5]
March 2005: [6] (2005-03-23 - 2005-03-31 was downloaded differently, so its not available for checking)
April 2005: [7]
May 2005: [8]
June 2005: [9]
Hydriz 2005-07-01 2005-12-31 ? ? r643 Downloaded
Uploaded to the Internet Archive
Check:
July 2005: [10]
August 2005: [11]
September 2005: [12]
October 2005: [13]
November 2005: [14]
December 2005: [15]
Hydriz 2006-01-01 2006-01-10 13198 4.8GB r349 Downloaded
Uploaded to the Internet Archive
Hydriz 2006-01-11 2006-06-30 ? ? r349 Downloaded
Uploaded to the Internet Archive
Hydriz 2006-07-01 2006-12-31 ? ? r643 Downloaded
Uploaded to the Internet Archive
Check:
July 2006: http://p.defau.lt/?IcMnwkx_j4H09FE_9iVgkQ
August 2006: http://p.defau.lt/?EmsKDtM0RXaysFNEABXJCQ
September 2006: http://p.defau.lt/?KBZVE9rJ9hdz4DiKnegnUw
October 2006: http://p.defau.lt/?f3F85TyqHtdY0LhpQk_m1w
November 2006: http://p.defau.lt/?VZwhzt_2doA_Z3c65_JkXg
December 2006: http://p.defau.lt/?Ms_TgrcyGDL_0oZQgKCNmw
Hydriz 2007-01-01 2007-12-31 ? ? r349 Downloading Check:
January 2007
February 2007
March 2007
April 2007
May 2007
June 2007
July 2007

Errors

  • oi_archive_name empty fields: http://commons.wikimedia.org/wiki/File:Nl-scheikundig.ogg
  • broken file links: http://commons.wikimedia.org/wiki/File:SMS_Bluecher.jpg#filehistory
  • Issue 45: 2005-03-23, 2005-08-08, 2005-09-12, 2005-09-18, 2005-09-25, 2005-11-18, 2006-02-05, 2006-02-11, 2006-02-25, 2006-03-10, 2006-03-23, 2006-04-21, 2006-04-25, 2006-05-01, 2006-07-13, 2006-07-30, 2006-08-02, 2006-08-05, 2006-08-13, 2006-09-12, 2006-10-22, 2006-10-26, 2006-11-23, 2006-12-06, 2006-12-13, 2006-12-17.
  • Also issue 45: 2007-01-01, 2007-01-06, 2007-01-14, 2007-01-15, 2007-02-06, 2007-02-13, 2007-02-22, 2007-02-26, 2007-03-07, 2007-03-13, 2007-03-25, 2007-03-30, 2007-04-12, 2007-04-14, 2007-04-20, 2007-05-04, 2007-05-08, 2007-05-10, 2007-05-29, 2007-06-05, 2007-06-22.

I'm going to file a bug in bugzilla.

Uploading

UPLOAD using the format: wikimediacommons-<year><month>

E.g. wikimediacommons-200601 for January 2006 grab.

If you can, add it into the WikiTeam collection, or else just tag it with the wikiteam keyword, and it will be added in later on.

Other dumps

There is no public dump of all images. WikiTeam is working on a scraper (see section above).

Pictures of the Year (best ones):

Featured images

Wikimedia Commons contains a lot images of high quality.

Featured pictures on Wikimedia Commons - Wikimedia Commons 1294011879617.png

Statistics

Stats per year

MariaDB [commonswiki_p]> select year(img_timestamp) as date, count(*) as numimages, round(sum(img_size)/(1024*1024*1024)) as gigabytes from image where 1 group by date;
+------+-----------+-----------+
| date | numimages | gigabytes |
+------+-----------+-----------+
| NULL |         1 |         0 |
| 2003 |         1 |         0 |
| 2004 |     13990 |         3 |
| 2005 |    233100 |        94 |
| 2006 |    583717 |       310 |
| 2007 |   1147177 |       767 |
| 2008 |   1356619 |      1161 |
| 2009 |   1884533 |      2277 |
| 2010 |   2297470 |      3020 |
| 2011 |   3888479 |      5348 |
| 2012 |   3396594 |      7228 |
| 2013 |   4573531 |     12587 |
| 2014 |   4614931 |     19344 |
| 2015 |   5669382 |     17146 |
| 2016 |   3214198 |     17669 |
+------+-----------+-----------+
15 rows in set, 2 warnings (5 min 56.85 sec)

See also

  • Wikipedia, some Wikipedias have enabled the local upload form, English Wikipedia contains about 800000 images, a lot of under fair use

External links


v · t · e         Archive Team
Current events

Alive... OR ARE THEY · Deathwatch · Projects

Archiveteam.jpg
Archiving projects

APKMirror · Archive.is · BetaArchive · Government Backup (#datarefuge · ftp-gov· Gmane · Internet Archive · It Died · Megalodon.jp · OldApps.com · OldVersion.com · OSBetaArchive · TEXTFILES.COM · The Dead, the Dying & The Damned · The Mail Archive · UK Web Archive · WebCite · Vaporwave.me

Blogging

Blog.pl · Blogger · Blogster · Blogter.hu · Freeblog.hu · Fuelmyblog · Jux · LiveJournal · My Opera · Nolblog.hu · Open Diary · ownlog.com · Posterous · Powerblogs · Proust · Roon · Splinder · Tumblr · Vox · Weblog.nl · Windows Live Spaces · Wordpress.com · Xanga · Yahoo! Blog · Zapd

Cloud hosting/file sharing

aDrive · AnyHub · Box · Dropbox · Docstoc · Fast.io · Google Drive · Google Groups Files · iCloud · Fileplanet · LayerVault · MediaCrush · MediaFire · Mega · MegaUpload · MobileMe · OneDrive · Pomf.se · RapidShare · Ubuntu One · Yahoo! Briefcase

Corporations

Apple · IBM · Google · Loblaw · Lycos Europe · Microsoft · Yahoo!

Events

Arab Spring · Great Ape-Snake War · Spanish Revolution

Font Repos

DaFont · Google Web Fonts · GNU FreeFont · Fontspace

Forums/Message boards

4chan · Captain Luffy Forums · College Confidential · DSLReports · ESPN Forums · Facepunch Forums · forums.starwars.com · HeavenGames · JamiiForums · Invisionfree · NeoGAF · Textream · The Classic Horror Film Board · Yahoo! Messages · Yahoo! Neighbors · Yuku.com · Zetaboards

Gaming

Atomicgamer · Bazaar.tf · City of Heroes · Club Nintendo · Clutch · Counter-Strike: Global Offensive · CS:GO Lounge · Desura · Dota 2 · Dota 2 Lounge · Emulation Zone · ESEA · GameBanana · GameMaker Sandbox · GameTrailers · Halo · HLTV.org · HQ Trivia · Infinite Crisis · joinDOTA · League of Legends · Liquipedia · Minecraft.net · Player.me · Playfire · Raptr · SingStar · Steam · SteamDB · SteamGridDB · Team Fortress 2 · TF2 Outpost · Warhammer · Xfire

Image hosting

500px · AOL Pictures · Blipfoto · Blingee · Canv.as · Camera+ · Cameroid · DailyBooth · Degree Confluence Project · DeviantART · Demotivalo.net · Flickr · Fotoalbum.hu · Fotolog.com · Fotopedia · Frontback · Geograph Britain and Ireland · Giphy · GTF Képhost · ImageShack · Imgh.us · Imgur · Inkblazers · Instagram · Kepfeltoltes.hu · Kephost.com · Kephost.hu · Kepkezelo.com · Keptarad.hu · Madden GIFERATOR · MLKSHK · Microsoft Clip Art · Microsoft Photosynth · Nokia Memories · noob.hu · Odysee · Panoramio · Photobucket · Picasa · Picplz · Pixiv · Portalgraphics.net · PSharing · Ptch · puu.sh · Rawporter · Relay.im · ScreenshotsDatabase.com · Sketch · Smack Jeeves · Snapjoy · Streetfiles · Tabblo · Tinypic · Trovebox · TwitPic · Wallbase · Wallhaven · Webshots · Wikimedia Commons

Knowledge/Wikis

arXiv · Citizendium · Clipboard.com · Deletionpedia · EditThis · Encyclopedia Dramatica · Etherpad · Everything2 · infoAnarchy · GeoNames · GNUPedia · Google Books (Google Books Ngram· Horror Movie Database · Insurgency Wiki · Knol · Lost Media Wiki · Neoseeker.com · Notepad.cc · Nupedia · OpenCourseWare · OpenStreetMap · Orain · Pastebin · Patch.com · Project Gutenberg · Puella Magi · Referata · Resedagboken · SongMeanings · ShoutWiki · The Internet Movie Database · TropicalWikis · Uncyclopedia · Urban Dictionary · Urban Exploration Resource · Webmonkey · Wikia · Wikidot · WikiHow · Wikkii · WikiLeaks · Wikipedia (Simple English Wikipedia· Wikispaces · Wikispot · Wik.is · Wiki-Site · WikiTravel · Word Count Journal

Magazines/Blogs/News

Cyberpunkreview.com · Game Developer Magazine · Gigaom · Hardware Canucks · Helium · JPG Magazine · Make Magazine · The Escapist · Polygamia.pl · San Fransisco Bay Guardian · Scoop · Regretsy · Yahoo! Voices

Microblogging

Heello · Identi.ca · Jaiku · Mommo.hu · Plurk · Sina Weibo · Tencent Weibo · Twitter · TwitLonger

Music/Audio

8tracks · AOL Music · Audimated.com · Cinch · digCCmixter · Dogmazic.net · Earbits · exfm · Free Music Archive · Gogoyoko · Indaba Music · Instacast · Instaudio · Jamendo · Last.fm · Music Unlimited · MOG · PureVolume · Reverbnation · ShareTheMusic · SoundCloud · Soundpedia · Spotify · This Is My Jam · TuneWiki · Twaud.io · WinAmp

People

Aaron Swartz · Michael S. Hart · Steve Jobs · Mark Pilgrim · Dennis Ritchie · Len Sassaman Project

Protocols/Infrastructure

FTP · Gopher · IRC · Usenet · World Wide Web
BitTorrent DHT

Q&A

Askville · Answerbag · Answers.com · Ask.com · Askalo · Baidu Knows · Blurtit · ChaCha · Experts Exchange · Formspring · GirlsAskGuys · Google Answers · Google Baraza · JustAnswer · MetaFilter · Quora · Retrospring · StackExchange · The AnswerBank · The Internet Oracle · Uclue · WikiAnswers · Yahoo! Answers

Recipes/Food

Allrecipes · Epicurious · Food.com · Foodily · Food Network · Punchfork · ZipList

Social bookmarking

Addinto · Backflip · Balatarin · BibSonomy · Bkmrx · Blinklist · BlogMarks · BookmarkSync · CiteULike · Connotea · Delicious · Designer News · Digg · Diigo · Dir.eccion.es · Evernote · Excite Bookmark · Faves · Favilous · folkd · Freelish · Getboo · GiveALink.org · Gnolia · Google Bookmarks · Hacker News · HeyStaks · IndianPad · Kippt · Knowledge Plaza · Licorize · Linkwad · Menéame · Microsoft Developer Network · myVIP · Mister Wong · My Web · Mylink Vault · Newsvine · Oneview · Pearltrees · Pinboard · Pocket · Propeller.com · Reddit · sabros.us · Scloog · Scuttle · Simpy · SiteBar · Slashdot · Squidoo · StumbleUpon · Twine · Voat · Vizited · Yummymarks · Xmarks · Yahoo! Buzz · Zootool · Zotero

Social networks

Bebo · BlackPlanet · Classmates.com · Cyworld · Dogster · Dopplr · douban · Ello · Facebook · Flixster · FriendFeed · Friendster · Friends Reunited · Gaia Online · Google+ · Habbo · hi5 · Hyves · iWiW · LinkedIn · Miiverse · mixi · MyHeritage · MyLife · Myspace · myVIP · Netlog · Odnoklassniki · Orkut · Plaxo · Qzone · Renren · Skyrock · Sonico.com · Storylane · Tagged · tvtag · Upcoming · Viadeo · Vine · Vkontakte · WeeWorld · Weibo · Wretch · Yahoo! Groups · Yahoo! Stars India · Yahoo! Upcoming · more sites...

Shopping/Retail

Alibaba · AliExpress · Amazon · Apple Store · Barnes & Noble · DirectCanada · eBay · Kmart · NCIX · Printfection · RadioShack · Sears · Sears Canada · Target · The Book Depository · ThinkGeek · Toys "R" Us · Walmart

Software/code hosting

Android Development · Alioth · Assembla · BerliOS · Betavine · Bitbucket · BountySource · Codecademy · CodePlex · Freepository · Free Software Foundation · GNU Savannah · GitHost  · GitHub · GitHub Downloads · Gitorious · Gna! · Google Code · ibiblio · java.net · JavaForge · KnowledgeForge · Launchpad · LuaForge · Maemo · mozdev · OSOR.eu · OW2 Consortium · Openmoko · OpenSolaris · Ourproject.org · Ovi Store · Project Kenai · RubyForge · SEUL.org · SourceForge · Stypi · TestFlight · tigris.org · Transifex · TuxFamily · Yahoo! Downloads

Television/Radio

ABC · Austin City Limits · BBC · CBC · CBS · Computer Chronicles · CTV · Fox · G4 · Global TV · Jeopardy! · NBC · NHK · PBS · Penn & Teller: Bullshit! · The Howard Stern Show · TV News Archive (Understanding 9/11)

Torrenting/Piracy

ExtraTorrent · EZTV · isoHunt · KickassTorrents · The Pirate Bay · Torrentz · Library Genesis

Video hosting

Academic Earth · Bambuser · Blip.tv · Epic · Freshlive · Google Video · Justin.tv · Mixer · Niconico · Nokia Trailers · Oddshot.tv · Periscope · Plays.tv · Qwiki · Skillfeed · Stickam · TED Talks · Ticker.tv · Twitch.tv · Ustream · Videoplayer.hu · Viddler · Viddy · Vidme · Vimeo · Vine · Vstreamers · Yahoo! Video · YouTube · Famous Internet videos (Me at the zoo)

Web hosting

Angelfire · Brace.io · BT Internet · CableAmerica Personal Web Space · Claranet Netherlands Personal Web Pages · Comcast Personal Web Pages · Extra.hu · FortuneCity · Free ProHosting · GeoCities (patch· Google Business Sitebuilder · Google Sites · Internet Centrum · MBinternet · MSN TV · Nifty · Nwnyet · Parodius Networking · Prodigy.net · Saunalahti Iso G · Swipnet · Telenor · Tripod · University of Michigan personal webpages · Verizon Mysite · Verizon Personal Web Space · Webs · Webzdarma · Virgin Media

Web applications

Mailman · MediaWiki · phpBB · Simple Machines Forum · vBulletin

Information

A Million Ways to Die on the Web · Backup Tips · Cheap storage · Collecting items randomly · Data compression algorithms and tools · Dev · Discovery Data · DOS Floppies · Fortress of Solitude · Keywords · Naughty List · Nightmare Projects · Rescuing floppy disks · Rescuing optical media · Site exploration · The WARC Ecosystem · Working with ARCHIVE.ORG

Projects

ArchiveCorps · Audit2014 · Emularity · Faceoff · FlickrFckr · Froogle · INTERNETARCHIVE.BAK (Internet Archive Census· IRC Quotes · JSMESS · JSVLC · Just Solve the Problem · NewsGrabber · Project Newsletter · Valhalla · Web Roasting (ISP Hosting · University Web Hosting· Woohoo

Tools

ArchiveBot · ArchiveTeam Warrior (Tracker· Google Takeout · HTTrack · Video downloaders · Wget (Lua · WARC)

Teams

Bibliotheca Anonoma · LibreTeam · URLTeam · Yahoo Video Warroom · WikiTeam

Other

800notes · AOL · Akoha · Ancestry.com · April Fools' Day · Amplicate · AutoAdmit · Bre.ad · Circavie · Cobook · Co.mments · Countdown · Discourse · Distill · Dmoz · Easel · Eircode · Electronic Frontier Foundation · FanFiction.Net · Feedly · Ficlets · Forrst · FunnyExam.com · FurAffinity · Google Helpouts · Google Moderator · Google Poly · Google Reader · ICQmail · IFTTT · Jajah · JuniorNet · Lulu Poetry · Mobile Phone Applications · Mochi Media · Mozilla Firefox · MyBlogLog · NBII · Newgrounds · Neopets · Quantcast · Quizilla · Salon Table Talk · Shutdownify · Slidecast · Stack Overflow · SOPA blackout pages · starwars.yahoo.com · TechNet · Toshiba Support · USA-Gov · Volán · Widgetbox · Windows Technical Preview · Wunderlist · YTMND · Zoocasa

About Archive Team

Introduction · Philosophy · Who We Are · Our stance on robots.txt · Why Back Up? · Software · Formats · Storage Media · Recommended Reading · Films and documentaries about archiving · Talks · In The Media · FAQ