Difference between revisions of "Google Video (Archive)"

From Archiveteam
(Add link to collection and item.)
 
(473 intermediate revisions by 67 users not shown)
:''See also [[Google Video Warroom]].''
{{Infobox project
| title = Google Video
| image = Video logo lg.gif
| description = Google Video logo
| URL = http://video.google.com
| project_status = {{offline}} on 2011-04-29[http://video.google.com/support/bin/answer.py?answer=1233300&hl=en]
| archiving_status = {{saved}}
| irc = googlegrape
| irc_network = EFnet
| irc_abandoned = true
| data = {{IA item|google-video-metadata-dumpage}} <br> {{IA collection|googlevideo2011}} (access restricted)
}}
[[File:Papua videos.png|thumb|right|300px|Google Video results for "Papua New Guinea" keyword.]]
__NOTOC__
'''Google Video''' is a [[Video hostings|video sharing]] website which is shutting down.
If you want to '''save your own videos''', see the announcement and tools below.  

If you want to '''help archive Google Video''', get some machines running and join us in [[IRC]].


== Joining the archival effort ==
The automatic scripts only work on FreeBSD, Linux, Solaris, Windows and maybe OS X. They also seem to work fine in Cygwin. Alternatively, you can run *nix in a virtual machine (given you have a fast enough machine).


Anyone can help out, but we would *really* appreciate it if you used a *nix system rather than Windows. If you do choose to pursue the Magical World of Windows, please make sure that what you are collecting is not damaged as a consequence of running it on Windows.

In any case, the first thing to do is to add your name/nickname to [http://piratepad.net/gv-participants this list], along with the storage and bandwidth you have available.

=== What can I do? ===

The two main tasks are indexing and downloading. The easiest and least taxing is indexing (see [[Google Video Warroom#Indexing Videos To Identify Related Videos]]). If you have some extra bandwidth and space, think about running [[Google Video Warroom#Downloading Videos Via Related Video Metadata (aka Listerine)|Listerine]] to download videos. Both tasks are automated and can be left running in the background. It is often good practice to start a few processes of each at once.

=== To help scrape videos ===

'''On Linux Systems'''

* Download [http://www.textfiles.com/videoyahoo/SCRIPTS/youtube-dl youtube-dl], or get it from your distribution.
** Make sure it's marked executable: chmod +x youtube-dl
* Download and install wget for your distribution
* Download [http://199.48.254.90/at/googlegargle googlegargle] (Norc's updated, dupe-safe version of googlegargle is [https://github.com/norcnorc/googlegargle/blob/master/googlegargle here].)
* Get aria2 from your distribution (or if you're on Mac OS X, [http://www.macports.org/install.php MacPorts]) or [http://aria2.sourceforge.net/ SourceForge]
* Pick a seed list from the table below, save it under the filename "list", and add your name next to it in the table (you will need a wiki account)
* Change the first few lines of the googlegargle script to reflect your installation
** If you're using youtube-dl from your distro, run "which youtube-dl" or "sudo updatedb; locate youtube-dl" to find the location of the command, and set DLSCRIPT to that path.
* For older aria2 versions, some options need to be removed (--max-connection-per-server=16 --min-split-size=1M)
** You might need to upgrade aria2 through your system package manager; however, even the most recent packaged version may not suffice.
* Set the ARIA variable in the script to the location of your aria2c executable (on Ubuntu this is usually /usr/bin/aria2c).
** To find where aria2c is located, use either of these commands:
*** "sudo updatedb; locate aria2"
*** "which aria2c"
* Invoke googlegargle
* Check your OS settings to ensure that your computer will not automatically suspend or sleep after long periods of inactivity.
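The path lookups in the steps above can also be scripted. The following is a minimal sketch, assuming a POSIX shell. `find_tool` is a hypothetical helper and not part of googlegargle: it prints the full path of the first command it finds on $PATH. The last two lines only print suggested DLSCRIPT and ARIA values; you still copy them into the script by hand.

```shell
#!/bin/sh
# find_tool (hypothetical helper): print the full path of the first
# command name that exists on $PATH; return 1 if none is found.
find_tool() {
  for name in "$@"; do
    if found=$(command -v "$name" 2>/dev/null); then
      printf '%s\n' "$found"
      return 0
    fi
  done
  echo "not found: $*" >&2
  return 1
}

# Suggested values for the variables at the top of googlegargle.
# A placeholder is printed when a tool is missing.
echo "DLSCRIPT=$(find_tool youtube-dl || echo /path/to/youtube-dl)"
echo "ARIA=$(find_tool aria2c || echo /path/to/aria2c)"
```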
 
'''On Windows Systems'''
 
* Download the scraping script for Windows from http://www.pentium100.com/gg_windows.zip. You still need [http://python.org/download/ python] and [http://sourceforge.net/projects/aria2/files/stable/aria2-1.11.1/aria2-1.11.1-mingw32msvc-build1.zip/download aria2], which must be downloaded separately; instructions are in the archive.
 
 
Don't forget to join the IRC channel to coordinate who's getting what!
 
=== To help index videos (low bandwidth/storage) ===
 
'''On Linux Systems'''
 
'''Note''': This will only work on Linux machines with X running. To run it on a headless server, use Xvfb (virtual framebuffer). On Ubuntu/Debian, install it with 'apt-get install xvfb', then start your main script with xvfb-run, which makes a virtual X server available to any program that needs it.
 
* Get the tools needed to build phantomjs (a headless web browser) and run the script: Qt WebKit, git, and curl. On Debian or Ubuntu, install the packages '''build-essential''', '''curl''', '''git''', '''libqtwebkit4''', '''libqtwebkit-dev''', and '''libqt4-dev''' by issuing the command:
<pre><nowiki>sudo apt-get install build-essential curl git libqtwebkit4 libqtwebkit-dev libqt4-dev</nowiki></pre>
or, on Fedora:
<pre><nowiki>sudo yum install curl git qt-webkit qt-webkit-devel qt-devel</nowiki></pre>
 
* Run the following command to get the phantomjs source code:
<pre><nowiki>git clone https://github.com/ariya/phantomjs.git</nowiki></pre>
 
* Enter the directory that was just created by using the following command:
<pre><nowiki>cd phantomjs</nowiki></pre>
 
* Build phantomjs by issuing the command:
<pre><nowiki>qmake && make -j2</nowiki></pre>
 
* Move the phantomjs binary somewhere in your path by issuing the command:
<pre><nowiki>cd bin && sudo mv ./phantomjs /usr/local/bin</nowiki></pre>
 
* Create a folder called '''gvscript''' and download the script to get the list of Google Video related pages to scrape: http://199.48.254.90/at/google_video_related.tar.gz
 
* Extract the above downloaded file (Right-click and Extract To.. or use '''tar -zxvf ./google_video_related.tar.gz''')
 
* In a terminal, navigate to the folder where you extracted google_video_related.tar.gz (above) and run the following command to help index Google Video:
<pre><nowiki>while : ; do ./related.sh ; done</nowiki></pre>
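The bare loop above restarts related.sh immediately, even when it fails straight away (for example, when the server cannot be reached). The following is a minimal sketch of a gentler loop, assuming a POSIX shell. `run_indexer`, `MAX_RUNS` and `BACKOFF_SECONDS` are hypothetical names chosen for this example. The 30-second default pause is an arbitrary choice, not something the project prescribed.

```shell
#!/bin/sh
# run_indexer (hypothetical helper): run the given command over and over,
# pausing after each failed run so an unreachable server is not hammered.
#   MAX_RUNS        stop after this many runs (0 = run forever, the default)
#   BACKOFF_SECONDS pause after a failed run (default 30)
run_indexer() {
  runs=0
  max=${MAX_RUNS:-0}
  backoff=${BACKOFF_SECONDS:-30}
  while [ "$max" -eq 0 ] || [ "$runs" -lt "$max" ]; do
    runs=$((runs + 1))
    if ! "$@"; then
      echo "run $runs failed; sleeping ${backoff}s" >&2
      sleep "$backoff"
    fi
  done
}

# Usage, from the folder containing related.sh (xvfb-run -a picks a free
# display number; drop "xvfb-run -a" if an X server is already running):
#   run_indexer xvfb-run -a ./related.sh
```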
 
'''On Windows Systems'''
 
Grab the following archive which comes with full instructions:
http://nstrom.chaosnet.org/google_video_related_win.zip
 
Once the script is running, simply leave it running. Head over to #ggtesting on EFnet (IRC) if you need any assistance or if the script has any issues. The script repeatedly asks the server for a page to index, extracts the related-video links from it, and sends the results back. It uses very little processing power and bandwidth on your end (a couple of KB/s, if that).
 
=== Cherry picking ===
The seed files do not currently include all videos, so you might want to save precious videos explicitly. To do that, add their IDs (the docid URL parameter of the Google Video page) to the "list" file in the same directory as the script, one per line, for example:
<pre><nowiki>docid=1545969803753962248
docid=1598207563000425446
docid=-1679753730105404298</nowiki></pre>
and start ./googlegargle.
 
To request a cherrypick, add it to this list: http://piratepad.net/gvspecificrequests
 
If you download something from that list, add its docid to http://piratepad.net/TL7KDN8821 so that others won't download those videos a second time.
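Adding cherry-picks to the "list" file by hand is easy to get wrong when IDs are copied out of full URLs. The following is a minimal sketch, assuming a POSIX shell and the one-`docid=<id>`-per-line format shown above. The hypothetical helper `add_docid` accepts a bare ID (negative IDs included), a `docid=<id>` string, or a videoplay URL. It appends the ID only if that exact line is not already in the file. `LIST_FILE` is an illustrative override; by default the helper writes to ./list.

```shell
#!/bin/sh
# add_docid (hypothetical helper): append "docid=<id>" to the seed file
# ($LIST_FILE, default ./list) unless that exact line is already there.
# Accepts a bare ID, "docid=<id>", or a videoplay URL containing docid=<id>.
add_docid() {
  list=${LIST_FILE:-list}
  # Extract the (possibly negative) numeric ID from a URL or docid= string.
  id=$(printf '%s\n' "$1" | sed -n 's/.*docid=\(-\{0,1\}[0-9][0-9]*\).*/\1/p')
  [ -n "$id" ] || id=$1
  # Reject anything that is not an optional minus sign followed by digits.
  if ! printf '%s\n' "$id" | grep -qx -e '-\{0,1\}[0-9][0-9]*'; then
    echo "not a docid: $1" >&2
    return 1
  fi
  touch "$list"
  grep -qx "docid=$id" "$list" || printf 'docid=%s\n' "$id" >> "$list"
}

# Example:
#   add_docid 'http://video.google.com/videoplay?docid=-4313176927520589553'
#   add_docid 1545969803753962248
```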
 
===Custom keyword searches===
 
'''Linux Bash Command'''
 
If you want to grab videos by your own custom keyword search term, you can use this command:
<pre><nowiki>
SEARCH='my+search+term';for i in `seq 0 10 990 `;do curl -A "AT, Bitches" "http://www.google.com/search?q=$SEARCH+site:video.google.com&hl=en&safe=off&tbm=vid&start=$i&sa=N"|grep -o "docid=[0-9-]*"|sort -u|tee -a seed_videos_$SEARCH;done
</nowiki></pre>
Change "my+search+term" to your search term, using plus signs instead of spaces (and URL-encode any other special characters).
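For search terms with punctuation, the encoding step can also be scripted. The following is a minimal sketch, assuming a POSIX shell and awk. The hypothetical helper `urlencode_query` turns spaces into plus signs, keeps letters, digits and `. _ ~ -`, and percent-encodes every other byte. The result is suitable for the SEARCH variable in the command above. Backslashes in the input are not handled, because `awk -v` interprets escape sequences.

```shell
#!/bin/sh
# urlencode_query (hypothetical helper): encode a search term for the q=
# parameter. Spaces become "+", unreserved characters are kept, and every
# other byte becomes %XX. LC_ALL=C makes awk work byte by byte.
urlencode_query() {
  LC_ALL=C awk -v s="$1" 'BEGIN {
    for (i = 1; i < 256; i++) ord[sprintf("%c", i)] = i   # byte -> code
    out = ""
    for (i = 1; i <= length(s); i++) {
      c = substr(s, i, 1)
      if (c ~ /[A-Za-z0-9._~-]/) out = out c
      else if (c == " ")         out = out "+"
      else                       out = out sprintf("%%%02X", ord[c])
    }
    print out
  }'
}

# Example:
#   SEARCH=$(urlencode_query 'douglas adams')   # douglas+adams
```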
 
'''Linux Bash Script'''
 
An alternative search script, which sorts and de-duplicates results and can restrict searches to long, medium or short videos, is [http://gv.nja.im/index.php?dir=tools here]. (Please improve the script and upload it to GitHub.)
 
'''Searches Undertaken'''
 
To minimize overlap, here are some search terms that are already being downloaded (and the user downloading them):
 
*Darkstar: "rare", "vintage", "commercial"
*NomDuClavier: "douglas adams", "richard dawkins", "charles darwin"
*oli: "australia history"
*dnova: "microelectronics"
*Lightblb: "documentary" (medium and long videos), "lecture" (medium and long videos)
Also check the gvspecificrequests PiratePad listed under Cherry picking on this page.
 
== Seed List Downloads ==
* Original Lists: http://199.48.254.90/at/seeds/
* PLEASE add your custom searches and their details to this table!
<center>
{| class="wikitable" style="text-align: center;"
! Seed list !! Videos (lines) !! Downloader(s) !! Progress (size)
|-
| [http://gv.nja.im/index.php?dir=seed_videos_ml_documentary_dedupe seed_videos_ml_documentary_dedupe] || 1975 || Lightblb || Lightblb: aa
|-
| [http://gv.nja.im/index.php?dir=seed_videos_ml_lecture_dedupe seed_videos_ml_lecture_dedupe] || 1898 || Lightblb || Lightblb: aa
|-
| seed_videos_2_a || 25,761 || swebb || 8.6G (4/17/2011)
|-
| seed_videos_2_k || 19,266 (24,242) || Lightblb, ARc[Clone, crackbab1, Pentium100, Mqrius || [http://notatypewriter.com/googlegargle Split] 49 chunks of 500<br />
Lightblb: aa ab ac ad ae<br />
crackbab1: af (Done: 11GB) working on ak, al<br />
Mqrius: Done: ag (3.9GB). Working on: ah ai aj <br />
Pentium100: az (complete, 5.15GB), ay, ax, aw, av, au, at <br />
ARc[Clone: bf bg bh bi bj bk bl bm bn bo bp bq br bs bt bu bv bw <br />
|-
| seed_videos_2_l || 22,641 || ndurner, wgfreewill || [http://gv.nja.im/index.php?dir=seed_videos_2_l Split] 46 chunks of 500<br />ndurner: 173; wgfreewill running in reverse from the bottom
|-
| seed_videos_2_m || 24,465 || Jade Falcon || Jade @ 2854/24465 ~77G and counting...(100 concurrent threads!)<br/>balrog running in reverse
|-
| seed_videos_2_o || 25,049 || travelinlibrarian || [http://gv.nja.im/index.php?dir=seed_videos_2_o Split] 51 chunks of 500<br />
travelinlibrarian 60/1-500<br />
perfinion downloading seed_videos_2_ob[n-y]
|-
| seed_videos_2_p || 23,713 || oli, Xentac, db48x, otro || [http://gv.nja.im/index.php?dir=seed_videos_2_p Split] 48 chunks of 500
oli: paa to paj<br/>
Xentac is downloading pbt, pbu, and pbv right now.
<br/>db48x: pbu (1.44GB, finished), pbv (187MB, finished), pba-pbf<br/>
otro: pbs
|-
| seed_videos_2_q || 17,727 || DoubleJ ||
|-
| seed_videos_2_t || 25,301 || businux || [http://gv.nja.im/index.php?dir=seed_videos_2_t Split] 51 chunks of 500
|-
| seed_videos_2_u || 23,528 || barbich, negge || [http://elmundo.barbich.net/gargle/ Split] 48 chunks of 500
barbich: currently processing 0 to 15<br/>
negge: getting the whole list (16 threads)
|-
| seed_videos_2_w || 21,732 || nickmoorman || [http://gv.nja.im/index.php?dir=seed_videos_2_w Split] 34 chunks of 500
|-
| seed_videos_2_x || 19,733 || ksh || 35% / 30GB
|-
| seed_videos_2_y || 20,965 || negge || 216G done (100%)
|-
| seed_videos_2_z || 18,877 || flare ||
|-
| seed_videos_a || 1000 || Dr.Sweety ||
|-
| seed_videos_a_related || This list contains errors || Dr.Sweety ||
|-
| seed_videos_b || 999 || bjwebb || 136/999
|-
| seed_videos_c || 981 || dnova || 555/981
|-
| seed_videos_d || 999 || nomduclav || 735/1000
|-
| seed_videos_e || 999 || nomduclav || 374/992
|-
| seed_videos_f || 999 || DoubleJ || Done (25GB)
|-
| seed_videos_g || 999 || dnova || 504/999
|-
| seed_videos_h || 999 || ARc[Clone || Done
|-
| seed_videos_i || 999 || DeCarabas || 683/999
|-
| seed_videos_j || 999 || joethehuman || In Progress (963/1000 35.1 GB)
|-
| seed_videos_k || 999 || aggroskater || In Progress [line 219]
|-
| seed_videos_l || 999 || yipdw || 632/999, 40 GB <br /> [http://guest:guest@googlegrape.iriscouch.com/googlegrape/_design/aggregates/_view/counts_and_sizes Status update, updated every 30 minutes or so]<br />(login as guest:guest if you get an authorization error)
|-
| seed_videos_m || 999 || TJ__ || Done (34.7GB)
|-
| seed_videos_n || 999 || ndurner || Done (38 GB)
|-
| seed_videos_o || 999 || com_lab ||
|-
| seed_videos_p || 999 || Pneu ||
|-
| seed_videos_q || 996 || nomduclavier || Done (~24Gb)
|-
| seed_videos_r || 996 || Pentium || Done (26.5GB), two bad IDs
|-
| seed_videos_s || 999 || Pentium ||
|-
| seed_videos_t || 999 || joethehuman || In Progress (673/1000 32.7 GB)
|-
| seed_videos_u || 999 || perfinion, 0xDEADBEEF, norc ||  0xDEADBEEF 158/1000. norc 500-1000 done, 24GB. Perfinion done, 44GB.
|-
| seed_videos_v || 999 || masterme1 || 162/999 (~11GB)
|-
| seed_videos_w || 1000 || com_lab || Done (~5.7GB)
|-
| seed_videos_x || 1000 || Dark-Star || Done (~33GB)
|-
| seed_videos_y || 1000 || beremat ||  361/1000, (~24.6GB)
|-
| seed_videos_z || 1000 || ksh || Done (27GB)
|-
| [http://pastebin.com/V70TScpb "microelectronics",<br />"intel",<br />"amd",<br />"electronic",<br />"computer",<br />"microprocessor",] || -- || dnova || 17/? (1.15GB)
|-
| [http://pastebin.com/ThCuzFwu "singularity"] || 174 || db48x || (grabbed 8am UTC April 18th 2011)
|-
| [http://pastebin.com/jMtaRuA2 "Feynman"] || 28 || db48x || completed, 2.20GB (grabbed 9am UTC April 18th 2011)
|-
| [http://pastebin.com/3HDwcsk5 "police"] || 998 || lutostag || (grabbed 8am UTC April 18th 2011)
|-
| [http://pastebin.com/Hy1nYdkC "eliezer"] || 1000 || norc || (grabbed 8am UTC April 18th 2011)
|-
| [http://pastebin.com/me1BGvg8 "obama"] || || || (grabbed 8am UTC April 18th 2011)
|-
| [http://pastebin.com/JQfdYaX9 "cia"] || || || (grabbed 8am UTC April 18th 2011)
|-
| [http://pastebin.com/yRQiqG4Q "charlie"] || || || (grabbed 8am UTC April 18th 2011)
|-
| [http://pastebin.com/sHQkmBuH IDs from the metafilter thread] || 28 || db48x || (grabbed 9am UTC April 18th 2011)
|-
| [http://pastebin.com/yFbtiW4b IDs from the reddit thread] || || || (grabbed 9am UTC April 18th 2011)
|-
| "rare" || || Darkstar ||
|-
| "vintage" || || Darkstar ||
|-
| "commercial" || || Darkstar ||
|-
| [http://pastebin.com/ZkzNmwEW "douglas adams",<br />"richard dawkins",<br />"charles darwin"] || 513 || NomDuClavier || Done ([http://pastebin.com/ZkzNmwEW one de-duped list] for the 3 terms)
|-
| "australia history" || 846 || oli || Done
|-
| "Bugs Bunny" || 153 || ||
|-
| [http://pastebin.com/jMtaRuA2 "rodney mullen"] || 176 || ||
|-
| [http://pastebin.com/iHvuYDLt "rick astley"] || 17 || db48x || (grabbed 13:00 UTC April 18th 2011)
|-
| '''Total''' || '''324,699''' || '''Archive Team''' || '''''In progress (417.1 GB and counting)'''''
|}
</center>
 
== Broken DocIDs ==
{| class="wikitable"
! DocID !! Title !! Seed list
|-
| -4313176927520589553 || [http://video.google.com/videoplay?docid=-4313176927520589553 Ferrari 320 km/h SelMcKenzie] || seed_videos_h
|-
| 710915802292429594 || [http://video.google.com/videoplay?docid=710915802292429594# Triple H-Best Pedigree Ever] || seed_videos_h
|-
| 919675995190477263 || 404s || seed_videos_h
|-
| -7433458566080701467 || 404s || seed_videos_2_k
|-
| 7476314005948269525 || [http://video.google.com/videoplay?docid=7476314005948269525# Tan Tay Du Ky 2 tap 1 phan 2] || seed_videos_2_k
|-
| 1310034078921227326 || [http://video.google.com/videoplay?docid=1310034078921227326 Presentatie H. van Garderen] || seed_videos_h
|-
| -8196546459051063200 || [http://video.google.com/videoplay?docid=-8196546459051063200 Ethiopia - Ethiopian Talk Show - Dr. Kinfe M Kassaye] || seed_videos_m
|-
| 6012309833489564165 || [http://video.google.com/videoplay?docid=6012309833489564165 I'm gonna miss you forever] || seed_videos_m
|-
| 1006201176909432045 || [http://video.google.com/videoplay?docid=1006201176909432045 Nick "KNUCKLEHEAD" Thomas Learning to Ride A KX 65] || seed_videos_2_k_br
|-
| 9013618753646293166 || [http://video.google.com/videoplay?docid=9013618753646293166 TooSexii] || seed_videos_m
|-
| 4607644763702261746 || [http://video.google.com/videoplay?docid=4607644763702261746 Most Haunted] || seed_videos_m
|-
| 910327017359455024 || 404s || seed_videos_2_k_br
|-
| -3505183273546479430 || [http://video.google.com/videoplay?docid=-3505183273546479430# Top 10 Dunkers in Slam Dunk Contest History by www.todonba.mx.kz] || seed_videos_2_k_bu
|-
| 515155312540224448 || [http://video.google.com/videoplay?docid=515155312540224448 Prof. Stephen Berk - The Six Day War] -- (Only downloads 106MB & manual seek fails) || seed_videos_m
|-
| 8233620694803027158 || [http://video.google.com/videoplay?docid=8233620694803027158 Tien Kiem Ky Hiep 12a] || seed_videos_2_k_bs
|-
| -7026671761719496982 || [http://video.google.com/videoplay?docid=-7026671761719496982# KV Kortrijk - Virton: kans Vervaeke] || seed_videos_2_k_bo
|-
| 4744936758707683681 || 404s || seed_videos_2_k_bo
|-
| -4138015874145288917 || [http://video.google.com/videoplay?docid=-4138015874145288917# Irvine City Council Regular Meeting] -- content too short (expected 880173643 bytes and served 871) || seed_videos_2_k_bo
|-
| 1751753922865083288 || [http://video.google.com/videoplay?docid=1751753922865083288# Lou Dobbs - Bill Gates Testifies to Senate: Part 2] || seed_videos_h
|-
| -1847242336625060764 || 404s || seed_videos_h
|-
| -840074924615574683 || [http://video.google.com/videoplay?docid=-840074924615574683# H.O.T. TV EPISODE 7] || seed_videos_h
|-
| 5450039563312738134 || || seed_videos_2_o
|-
| 2740779495236816438 || || seed_videos_2_o
 
|}
 
== Tools ==
 
=== Youtube-DL ===
* http://rg3.github.com/youtube-dl/download.html
** python youtube-dl googlevideourl
 
=== DocID scripts ===
* http://piratepad.net/googlevideoscript
 
=== GoogleGargle ===
* http://www.textfiles.com/googlegargle
 
=== Aria2c (APT) ===
* apt-add-repository ppa:t-tujikawa/ppa
* apt-get update
* apt-get install aria2
** http://aria2.sourceforge.net/
=== Aria2c (RPM) ===
Fedora and CentOS have RPMs available.
* yum install aria2
 
== Troubleshooting ==
* /usr/bin/aria2c: unrecognized option '--max-connection-per-server=16'
** The aria2 version packaged by many Linux distributions is too old to recognize this option, and aria2c reports this error.
** To fix this, remove the option from the googlegargle script line starting with "ARIAOPTIONS=".
 
* User 'negge' on IRC reports that the following aria2 options work on Debian Squeeze:
** --max-overall-download-limit=1024M --file-allocation=falloc --max-connection-per-server=4 --min-split-size=1M --log-level=notice --remote-time=true
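Unsupported options can also be removed with sed instead of a text editor. The following is a minimal sketch, assuming a POSIX shell. It also assumes that the options sit on a single googlegargle line that starts with ARIAOPTIONS= and are separated by spaces; the exact layout of that line is an assumption. `strip_aria_option` is a hypothetical helper. It keeps a backup copy with a .bak suffix and may leave a harmless extra space inside the quotes.

```shell
#!/bin/sh
# strip_aria_option (hypothetical helper): delete one "--option=value" from
# the ARIAOPTIONS= line of a script, leaving every other line untouched.
#   $1  option name, e.g. --max-connection-per-server
#   $2  path to the script (a .bak backup is written next to it)
strip_aria_option() {
  sed -i.bak "/^ARIAOPTIONS=/ s/ *$1=[^ \"']*//" "$2"
}

# Example, for an aria2 build that rejects both options:
#   strip_aria_option --max-connection-per-server ./googlegargle
#   strip_aria_option --min-split-size ./googlegargle
```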


== FAQ ==
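* Is there any estimate on how many videos are on Google Video?
** Wikipedia said it has 2,500,000 videos; a semi-official Google blog mentioned 2.8M.
* Is there anything about grabbing metadata for videos, like descriptions?
** Googlegrape does that: it saves the HTML of the video download page.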
* What happens to the data after you claim a seed on the wiki and download it?
** We've got 140TB of space allocated to us on archive.org, and can get more
* Is there already some space where it can be uploaded to?
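** Not yet, the effort is still young and things take time to organize.
* How can I split seed files if I want to download fewer videos or share the task with others?
** On *nix machines use: '''split --lines=500 [seedfile] [seedfile]''' to create a set of files, each 500 lines long, named seedfileaa, seedfileab, and so on.
* How can I check if there are duplicates in a seed file?
** On *nix machines use: '''sort [infile] | uniq -d''' to show all duplicates.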
* How can I remove duplicates from a seed file before I start to use it?
** On *nix machines use: '''sort -u [infile] > [outfile]''' to produce a new seed file with duplicates removed. ('''sort [infile] | uniq -u''' does not do this: it keeps only lines that occur exactly once, so every copy of a duplicated docid would be lost.)
* If I wanted to run more than one listerine process, do I just make multiple clones? Do I need a different username for each?
** Only if you need to tell them apart later on, for example so we can ask for "video 123 from xentac3".
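The difference between `uniq -u` and real de-duplication shows up clearly on a three-line seed file. The following is a minimal demonstration, assuming a POSIX shell; demo_list is a throwaway file name. The awk line is an alternative that keeps the first occurrence of each line in its original order instead of sorting.

```shell
#!/bin/sh
# Throwaway seed file: docid=2 appears twice.
printf 'docid=2\ndocid=1\ndocid=2\n' > demo_list

# uniq -u keeps only lines that occur exactly once, so BOTH copies of
# docid=2 disappear. Prints: docid=1
sort demo_list | uniq -u

# sort -u keeps one copy of every line (sorted). Prints: docid=1, docid=2
sort -u demo_list

# awk keeps the first copy of every line, in the original order.
# Prints: docid=2, docid=1
awk '!seen[$0]++' demo_list
```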


== Announcement: Uploaded video content no longer available  ==


== External links ==
* {{url|1=http://video.google.com|2=Google Video}}
* {{url|1=http://www.deaddyingdamned.com/assets/Google_Video_Shutdown_Email.html|2=Announcement email}}
* {{url|1=http://video.google.com/support/bin/answer.py?answer=1233300&hl=en|2=Announcement on Google Video Help}}


{{Navigation box}}

Latest revision as of 23:36, 22 November 2021

See also Google Video Warroom.
Google Video
Google Video logo
URL http://video.google.com
Status Offline on 2011-04-29[1]
Archiving status Saved!
Archiving type Unknown
IRC channel #archiveteam-bs (on hackint)
(formerly #googlegrape (on EFnet))
Data? google-video-metadata-dumpage
googlevideo2011 (access restricted)
Google Video results for "Papua New Guinea" keyword.


FAQ

  • Is there any estimate on how many videos are on Google Video?
    • Wikipedia said it has 2,500,000 videos; a semi-official Google blog mentioned 2.8M.
  • Is there anything about grabbing metadata for videos, like descriptions?
    • Googlegrape does that: it saves the HTML of the video download page.
  • What happens to the data after you claim a seed on the wiki and download it?
    • We've got 140TB of space allocated to us on archive.org, and can get more
  • Is there already some space where it can be uploaded to?
    • Not yet, the effort is still young and things take time to organize.
  • How can I split seed files if I want to download fewer videos or share the task with others?
    • On *nix machines use: split --lines=500 [seedfile] [seedfile] to create a set of files each 500 lines in length in the form seedfileaa seedfileab ... etc.
  • How can I check if there are duplicates in a seed file?
    • On *nix machines use: sort [infile] | uniq -d to show all duplicates.
  • How can I remove duplicates from a seed file before I start to use it?
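    • On *nix machines use: sort -u [infile] > [outfile] to produce a new seed file with duplicates removed. (sort [infile] | uniq -u keeps only lines that occur exactly once, so it would drop every copy of a duplicated docid.)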
  • If I wanted to run more than one listerine process, do I just make multiple clones? Do I need a different username for each?
    • Only if you need to tell them apart later on, for example so we can ask for "video 123 from xentac3".

Announcement: Uploaded video content no longer available

On April 29, 2011 videos that have been uploaded to Google Video will no longer be available for playback. We’ve added a Download button to the Video Status page, so you can download videos that you want to save. If you don’t want to download your videos, you don’t need to do anything. (The Download feature will be disabled after May 13, 2011.)

How do I download videos that I've uploaded?

On the Video Status page, click Download Video located on the right side of each of your videos in the "Actions" column. Once a video has been downloaded, an "Already Downloaded" message will appear. If you have many videos on Google Video, you may need to use the paging controls located on the bottom right of the page to access them all. This download option will be available through May 13, 2011.

I've downloaded my videos. Now what do I do with these FLV files?

FLV files are videos that have been encoded in the Flash Video Format. You can upload your videos in FLV format to other video hosting sites like YouTube or Picasa Web Albums. If you would like to playback your videos on your computer and they don’t seem to be working, you might need to install an FLV player. In order to find an FLV player to install, try doing a Google search for [ FLV player ].

External links


v · t · e         Archive Team
Current events

Alive... OR ARE THEY · Deathwatch · Projects

Archiveteam.jpg
Archiving projects

APKMirror · Archive.is · BetaArchive · Government Backup (#datarefuge · ftp-gov· Gmane · Internet Archive · It Died · Megalodon.jp · OldApps.com · OldVersion.com · OSBetaArchive · TEXTFILES.COM · The Dead, the Dying & The Damned · The Mail Archive · UK Web Archive · WebCite · Vaporwave.me

Blogging

Blog.pl · Blogger · Blogster · Blogter.hu · Freeblog.hu · Fuelmyblog · Jux · LiveJournal · My Opera · Nolblog.hu · Open Diary · ownlog.com · Posterous · Powerblogs · Proust · Roon · Splinder · Tumblr · Vox · Weblog.nl · Windows Live Spaces · Wordpress.com · Xanga · Yahoo! Blog · Zapd

Cloud hosting/file sharing

aDrive · AnyHub · Box · Dropbox · Docstoc · Fast.io · Google Drive · Google Groups Files · iCloud · Fileplanet · LayerVault · MediaCrush · MediaFire · Mega · MegaUpload · MobileMe · OneDrive · Pomf.se · RapidShare · Ubuntu One · Yahoo! Briefcase

Corporations

Apple · IBM · Google · Loblaw · Lycos Europe · Microsoft · Yahoo!

Events

Arab Spring · Great Ape-Snake War · Spanish Revolution

Font Repos

DaFont · Google Web Fonts · GNU FreeFont · Fontspace

Forums/Message boards

4chan · Captain Luffy Forums · College Confidential · Discourse · DSLReports · ESPN Forums · Facepunch Forums · forums.starwars.com · HeavenGames · JamiiForums · Invisionfree · NeoGAF · Textream · The Classic Horror Film Board · Yahoo! Messages · Yahoo! Neighbors · Yuku.com · Zetaboards

Gaming

Atomicgamer · Bazaar.tf · City of Heroes · Club Nintendo · Clutch · Counter-Strike: Global Offensive · CS:GO Lounge · Desura · Dota 2 · Dota 2 Lounge · Emulation Zone · ESEA · GameBanana · GameMaker Sandbox · GameTrailers · Halo · Heroes of Newerth · HLTV.org · HQ Trivia · Infinite Crisis · joinDOTA · League of Legends · Liquipedia · Minecraft.net · Player.me · Playfire · Raptr · SingStar · Steam · SteamDB · SteamGridDB · Team Fortress 2 · TF2 Outpost · Warhammer · Xfire

Image hosting

500px · AOL Pictures · Blipfoto · Blingee · Canv.as · Camera+ · Cameroid · DailyBooth · Degree Confluence Project · DeviantART · Demotivalo.net · Flickr · Fotoalbum.hu · Fotolog.com · Fotopedia · Frontback · Geograph Britain and Ireland · Giphy · GTF Képhost · ImageShack · Imgh.us · Imgur · Inkblazers · Instagram · Kepfeltoltes.hu · Kephost.com · Kephost.hu · Kepkezelo.com · Keptarad.hu · Madden GIFERATOR · MLKSHK · Microsoft Clip Art · Microsoft Photosynth · Nokia Memories · noob.hu · Odysee · Panoramio · Photobucket · Picasa · Picplz · Pixiv · Portalgraphics.net · PSharing · Ptch · puu.sh · Rawporter · Relay.im · ScreenshotsDatabase.com · Sketch · Smack Jeeves · Snapjoy · Streetfiles · Tabblo · Tinypic · Trovebox · TwitPic · Wallbase · Wallhaven · Webshots · Wikimedia Commons

Knowledge/Wikis

arXiv · Citizendium · Clipboard.com · Deletionpedia · EditThis · Encyclopedia Dramatica · Etherpad · Everything2 · infoAnarchy · GeoNames · GNUPedia · Google Books (Google Books Ngram· Horror Movie Database · Insurgency Wiki · Knol · Lost Media Wiki · Neoseeker.com · Notepad.cc · Nupedia · OpenCourseWare · OpenStreetMap · Orain · Pastebin · Patch.com · Project Gutenberg · Puella Magi · Referata · Resedagboken · SongMeanings · ShoutWiki · The Internet Movie Database · TropicalWikis · Uncyclopedia · Urban Dictionary · Urban Exploration Resource · Webmonkey · Wikia · Wikidot · WikiHow · Wikkii · WikiLeaks · Wikipedia (Simple English Wikipedia· Wikispaces · Wikispot · Wik.is · Wiki-Site · WikiTravel · Word Count Journal

Magazines/Blogs/News

Cyberpunkreview.com · Game Developer Magazine · Gigaom · Hardware Canucks · Helium · JPG Magazine · Make Magazine · The Escapist · Polygamia.pl · San Fransisco Bay Guardian · Scoop · Regretsy · Yahoo! Voices

Microblogging

Heello · Identi.ca · Jaiku · Mommo.hu · Plurk · Sina Weibo · Tencent Weibo · Twitter · TwitLonger

Music/Audio

8tracks · AOL Music · Audimated.com · Cinch · digCCmixter · Dogmazic.net · Earbits · exfm · Free Music Archive · Gogoyoko · Indaba Music · Instacast · Instaudio · Jamendo · Last.fm · Music Unlimited · MOG · PureVolume · Reverbnation · ShareTheMusic · SoundCloud · Soundpedia · Spotify · This Is My Jam · TuneWiki · Twaud.io · WinAmp

People

Aaron Swartz · Michael S. Hart · Steve Jobs · Mark Pilgrim · Dennis Ritchie · Len Sassaman Project

Protocols/Infrastructure

FTP · Gopher · IRC · Usenet · World Wide Web
BitTorrent DHT

Q&A

Askville · Answerbag · Answers.com · Ask.com · Askalo · Baidu Knows · Blurtit · ChaCha · Experts Exchange · Formspring · GirlsAskGuys · Google Answers · Google Baraza · JustAnswer · MetaFilter · Quora · Retrospring · StackExchange · The AnswerBank · The Internet Oracle · Uclue · WikiAnswers · Yahoo! Answers

Recipes/Food

Allrecipes · Epicurious · Food.com · Foodily · Food Network · Punchfork · ZipList

Social bookmarking

Addinto · Backflip · Balatarin · BibSonomy · Bkmrx · Blinklist · BlogMarks · BookmarkSync · CiteULike · Connotea · Delicious · Designer News · Digg · Diigo · Dir.eccion.es · Evernote · Excite Bookmark · Faves · Favilous · folkd · Freelish · Getboo · GiveALink.org · Gnolia · Google Bookmarks · Hacker News · HeyStaks · IndianPad · Kippt · Knowledge Plaza · Licorize · Linkwad · Menéame · Microsoft Developer Network · myVIP · Mister Wong · My Web · Mylink Vault · Newsvine · Oneview · Pearltrees · Pinboard · Pocket · Propeller.com · Reddit · sabros.us · Scloog · Scuttle · Simpy · SiteBar · Slashdot · Squidoo · StumbleUpon · Twine · Voat · Vizited · Yummymarks · Xmarks · Yahoo! Buzz · Zootool · Zotero

Social networks

Bebo · BlackPlanet · Classmates.com · Cyworld · Dogster · Dopplr · douban · Ello · Facebook · Flixster · FriendFeed · Friendster · Friends Reunited · Gaia Online · Google+ · Habbo · hi5 · Hyves · iWiW · LinkedIn · Miiverse · mixi · MyHeritage · MyLife · Myspace · myVIP · Netlog · Odnoklassniki · Orkut · Plaxo · Qzone · Renren · Skyrock · Sonico.com · Storylane · Tagged · tvtag · Upcoming · Viadeo · Vine · VK · WeeWorld · Weibo · Wretch · Yahoo! Groups · Yahoo! Stars India · Yahoo! Upcoming · more sites...

Shopping/Retail

Alibaba · AliExpress · Amazon · Apple Store · Barnes & Noble · DirectCanada · eBay · Kmart · NCIX · Printfection · RadioShack · Sears · Sears Canada · Target · The Book Depository · ThinkGeek · Toys "R" Us · Walmart

Software/code hosting

Android Development · Alioth · Assembla · BerliOS · Betavine · Bitbucket · BountySource · Codecademy · CodePlex · Freepository · Free Software Foundation · GNU Savannah · GitHost · GitHub · GitHub Downloads · Gitorious · Gna! · Google Code · ibiblio · java.net · JavaForge · KnowledgeForge · Launchpad · LuaForge · Maemo · mozdev · OSOR.eu · OW2 Consortium · Openmoko · OpenSolaris · Ourproject.org · Ovi Store · Project Kenai · RubyForge · SEUL.org · SourceForge · Stypi · TestFlight · tigris.org · Transifex · TuxFamily · Yahoo! Downloads

Television/Radio

ABC · Austin City Limits · BBC · CBC · CBS · Computer Chronicles · CTV · Fox · G4 · Global TV · Jeopardy! · NBC · NHK · PBS · Penn & Teller: Bullshit! · The Howard Stern Show · TV News Archive (Understanding 9/11)

Torrenting/Piracy

ExtraTorrent · EZTV · isoHunt · KickassTorrents · The Pirate Bay · Torrentz · Library Genesis

Video hosting

Academic Earth · Bambuser · Blip.tv · Epic · Freshlive · Google Video · Justin.tv · Mixer · Niconico · Nokia Trailers · Oddshot.tv · Periscope · Plays.tv · Qwiki · Skillfeed · Stickam · TED Talks · Ticker.tv · Twitch.tv · Ustream · Videoplayer.hu · Viddler · Viddy · Vidme · Vimeo · Vine · Vstreamers · Yahoo! Video · YouTube · Famous Internet videos (Me at the zoo)

Web hosting

Angelfire · Brace.io · BT Internet · CableAmerica Personal Web Space · Claranet Netherlands Personal Web Pages · Comcast Personal Web Pages · Extra.hu · FortuneCity · Free ProHosting · GeoCities (patch) · Google Business Sitebuilder · Google Sites · Internet Centrum · MBinternet · MSN TV · Nifty · Nwnyet · Parodius Networking · Prodigy.net · Saunalahti Iso G · Swipnet · Telenor · Tripod · University of Michigan personal webpages · Verizon Mysite · Verizon Personal Web Space · Webs · Webzdarma · Virgin Media

Web applications

Mailman · MediaWiki · phpBB · Simple Machines Forum · vBulletin

Information

A Million Ways to Die on the Web · Backup Tips · Cheap storage · Collecting items randomly · Data compression algorithms and tools · Dev · Discovery Data · DOS Floppies · Fortress of Solitude · Keywords · Naughty List · Nightmare Projects · Rescuing floppy disks · Rescuing optical media · Site exploration · The WARC Ecosystem · Working with ARCHIVE.ORG

Projects

ArchiveCorps · Audit2014 · Emularity · Faceoff · FlickrFckr · Froogle · INTERNETARCHIVE.BAK (Internet Archive Census) · IRC Quotes · JSMESS · JSVLC · Just Solve the Problem · NewsGrabber · Project Newsletter · Valhalla · Web Roasting (ISP Hosting · University Web Hosting) · Woohoo

Tools

ArchiveBot · ArchiveTeam Warrior (Tracker) · Google Takeout · HTTrack · Video downloaders · Wget (Lua · WARC)

Teams

Bibliotheca Anonoma · LibreTeam · URLTeam · Yahoo Video Warroom · WikiTeam

Other

800notes · AOL · Akoha · Ancestry.com · April Fools' Day · Amplicate · AutoAdmit · Bre.ad · Circavie · Cobook · Co.mments · Countdown · Discourse · Distill · Dmoz · Easel · Eircode · Electronic Frontier Foundation · FanFiction.Net · Feedly · Ficlets · Forrst · FunnyExam.com · FurAffinity · Google Helpouts · Google Moderator · Google Poly · Google Reader · ICQmail · IFTTT · Jajah · JuniorNet · Lulu Poetry · Mobile Phone Applications · Mochi Media · Mozilla Firefox · MyBlogLog · NBII · Newgrounds · Neopets · Quantcast · Quizilla · Salon Table Talk · Shutdownify · Slidecast · Stack Overflow · SOPA blackout pages · starwars.yahoo.com · TechNet · Toshiba Support · USA-Gov · Volán · Widgetbox · Windows Technical Preview · Wunderlist · YTMND · Zoocasa

About Archive Team

Introduction · Philosophy · Who We Are · Our stance on robots.txt · Why Back Up? · Software · Formats · Storage Media · Recommended Reading · Films and documentaries about archiving · Talks · In The Media · FAQ