Google Video (Archive)

Google Video
Status Closing on 2011-04-29[1]
Archiving status In progress...
Archiving type Unknown
IRC channel #archiveteam-bs (on hackint)

Google Video is a video sharing website which is shutting down.

If you want to save your own videos, see the announcement and tools below.

If you want to help archive Google Video, get some machines running and join us on IRC (EFNet #archiveteam / #googlegrape).

Joining the archival effort

The automatic scripts work only on FreeBSD, Linux, Windows and possibly OS X. They also appear to work under Cygwin. Alternatively, you can run a *nix system in a virtual machine, provided your computer is fast enough.

To help scrape videos

First of all, please add your name/nickname to this list, along with the storage and bandwidth you have available.

On Linux Systems

  • Download youtube-dl, or install it from your distribution.
    • Make sure it's marked executable: chmod +x youtube-dl
  • Download and install wget for your distribution
  • Download googlegargle (Norc's updated, dupe-safe version of googlegargle is here.)
  • Get aria2 from your distribution (or if you're on Mac OS X, MacPorts) or SourceForge
  • Pick a seed list from below, save it under the filename "list" and add your name to the list (you will need a wiki account)
  • Change the first few lines of the googlegargle script to reflect your installation
    • If you're using youtube-dl from your distro, run "which youtube-dl" or "sudo updatedb; locate youtube-dl" to find the location of the command. Change DLSCRIPT to this.
  • For older aria2 versions, some options need to be removed (--max-connection-per-server=16 --min-split-size=1M)
    • You may need to upgrade aria2 through your system package manager; however, even the most recent packaged version may not suffice.
  • Set the ARIA variable in the script to the location of your aria2c executable (on Ubuntu this is usually /usr/bin/aria2c).
    • To find where aria2 is installed, use either of these commands:
      • "sudo updatedb; locate aria2"
      • "which aria2" / "which aria2c"
  • Invoke googlegargle
  • Check your OS settings to ensure that your computer will not automatically suspend or sleep after long periods of inactivity.
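The "find the command, then change the variable" steps above can be scripted. The following is a minimal sketch, not part of googlegargle: it assumes youtube-dl and aria2c are already installed somewhere on your PATH and simply prints DLSCRIPT=... and ARIA=... lines that you can copy into the top of the script. The helper name print_var is hypothetical.

```shell
#!/bin/sh
# Sketch (not part of googlegargle): print suggested values for the
# DLSCRIPT and ARIA variables at the top of the googlegargle script.
# Assumes youtube-dl and aria2c are on your PATH; adjust by hand if not.

# print_var NAME TOOL
# Prints "NAME=/full/path/to/TOOL", or a warning on stderr and a
# non-zero status if TOOL cannot be found on the PATH.
print_var() {
  name=$1
  tool=$2
  # "command -v" is the POSIX way to resolve a command to its path
  if path=$(command -v "$tool" 2>/dev/null); then
    printf '%s=%s\n' "$name" "$path"
  else
    printf 'warning: %s not found on PATH\n' "$tool" >&2
    return 1
  fi
}

# Report both tools even if the first one is missing.
missing=0
print_var DLSCRIPT youtube-dl || missing=1
print_var ARIA aria2c || missing=1
[ "$missing" -eq 0 ] || echo "install the missing tools before running googlegargle" >&2
```

Paste the printed lines over the existing DLSCRIPT and ARIA assignments; anything reported as missing still has to be installed first.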

On Windows Systems

Don't forget to join the IRC channel to coordinate who's getting what!

To help index videos (low bandwidth/storage)

On Linux Systems

Note: This works only on Linux machines running X; you can't run it on headless servers because of phantomjs's requirements.

  • Get the tools needed to build phantomjs (a headless web browser) and run the script: Qt WebKit, git, and curl. On Debian or Ubuntu, install the packages build-essential, curl, git, libqtwebkit4, libqtwebkit-dev, and libqt4-dev by issuing the command:
sudo apt-get install build-essential curl git libqtwebkit4 libqtwebkit-dev libqt4-dev

or, on Fedora:

sudo yum install curl git qt-webkit qt-webkit-devel qt-devel
  • Run the following command to get the phantomjs source code:
git clone
  • Enter the directory that was just created by using the following command:
cd phantomjs
  • Build phantomjs by issuing the command:
qmake && make -j2
  • Move the phantomjs binary somewhere in your path by issuing the command:
cd bin && sudo mv ./phantomjs /usr/local/bin
  • Extract the downloaded google_video_related.tar.gz archive (right-click and choose Extract To..., or run tar -zxvf ./google_video_related.tar.gz)
  • In a terminal, navigate to the folder where you extracted the google_video_related file (above) and run the following command to help scrape Google Video:
while : ; do ./ ; done
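The one-line loop above restarts the indexing script forever, including immediately after a failure. A slightly more defensive variant is sketched below, assuming you would rather pause after a failed run (for example, while the server is unreachable) than retry in a tight loop. The 10-second pause, the optional run limit and the helper name keep_running are illustrative choices, and the script name in the usage comment is a placeholder for the script shipped in google_video_related.tar.gz.

```shell
#!/bin/sh
# Sketch: same "run forever" idea as the one-line loop, but pausing after a
# failed run so a broken script or unreachable server does not spin the CPU.
# The pause length and the run limit are illustrative, not from the original.

# keep_running MAX_RUNS COMMAND [ARGS...]
# Runs COMMAND repeatedly; MAX_RUNS=0 means "forever".
keep_running() {
  max=$1
  shift
  runs=0
  while [ "$max" -eq 0 ] || [ "$runs" -lt "$max" ]; do
    runs=$((runs + 1))
    if ! "$@"; then
      echo "run $runs failed, retrying in 10s" >&2
      sleep 10
    fi
  done
}

# Usage (placeholder script name):
#   keep_running 0 ./your-indexing-script
```

Passing 0 reproduces the original endless loop; a positive number is mainly useful for trying the script out.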

On Windows Systems

Grab the following archive, which comes with full instructions:

Once the script is running, simply leave it running, and head over to #ggtesting on EFnet (IRC) if you need assistance or the script runs into problems. The script contacts the server to get a page whose related-video links it should index, indexes them, sends back the results and repeats. It needs very little processing power and bandwidth on your end (a couple of KB/s, if that).

Cherry picking

The seed files do not currently include all videos, so you may want to save precious videos explicitly. To do that, add their IDs (the docid URL parameter of the Google Video page), one per line, to the "list" file in the same directory as the script, and then start ./googlegargle.
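Adding cherry-picked IDs by hand is easy to get wrong when copying from URLs. Below is a minimal sketch, assuming the "list" file holds one bare docid per line: it accepts either a bare docid or a full URL containing docid=..., keeps the leading minus sign that some IDs have (see the Broken DocIDs table), and skips IDs already in the list. The helper names to_docid and add_cherrypick are hypothetical.

```shell
#!/bin/sh
# Sketch: add one video to googlegargle's "list" file.
# Input may be a bare docid or a URL containing docid=...
# DocIDs can be negative, so an optional leading '-' is kept.

# to_docid INPUT  -> prints the docid, or nothing if none is found
to_docid() {
  printf '%s\n' "$1" |
    sed -n -e 's/^\(-\{0,1\}[0-9]\{1,\}\)$/\1/p' \
           -e 's/.*[?&]docid=\(-\{0,1\}[0-9]\{1,\}\).*/\1/p'
}

# add_cherrypick INPUT [LISTFILE]   (LISTFILE defaults to ./list)
add_cherrypick() {
  list=${2:-list}
  id=$(to_docid "$1")
  if [ -z "$id" ]; then
    echo "no docid found in: $1" >&2
    return 1
  fi
  # -q quiet, -x whole-line match, -F fixed string; "--" because IDs may start with '-'
  if [ -f "$list" ] && grep -qxF -- "$id" "$list"; then
    echo "already listed: $id" >&2
  else
    printf '%s\n' "$id" >> "$list"
  fi
}
```

For example, add_cherrypick 'http://video.google.com/videoplay?docid=-4313176927520589553' appends -4313176927520589553 to ./list unless it is already there.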

To request a cherrypick, add it to this list:

If you download something from that list, add its docid to so that others won't download those videos a second time.

Custom keyword searches

If you want to grab videos by your own custom keyword search term, you can use this command:

SEARCH='my+search+term';for i in `seq 0 10 990 `;do curl -A "AT, Bitches" "$$i&sa=N"|grep -o "docid=[0-9-]*"|tee -a seed_videos_$SEARCH;done

Change "my+search+term" to your search term, using a plus sign instead of each space (and URL-encoding any other special characters).
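The one-liner above leaves the search term encoding to you and writes raw "docid=..." matches, repeats included, to the seed file. The two helpers below are a minimal sketch of those steps; the search URL itself is not reproduced. They assume seed files should hold one bare docid per line, as the cherry-picking instructions suggest, so the "docid=" prefix is stripped and repeats are dropped. encode_term only handles spaces, and both helper names are hypothetical.

```shell
#!/bin/sh
# Sketch: helpers for the custom keyword search one-liner.
# Assumption: seed files hold one bare docid per line.

# encode_term "my search term" -> my+search+term
# Only spaces are converted; other special characters still need URL-encoding.
encode_term() {
  printf '%s' "$1" | tr ' ' '+'
}

# extract_ids < page.html
# Prints each docid in the HTML once, in order of first appearance.
extract_ids() {
  grep -o 'docid=-\{0,1\}[0-9]\{1,\}' |
    sed 's/^docid=//' |
    awk '!seen[$0]++'
}

# Usage idea: SEARCH=$(encode_term 'douglas adams'), then pipe each
# results page fetched by curl through extract_ids >> "seed_videos_$SEARCH"
```

Because results pages often repeat the same video, deduplicating here keeps the seed file from queuing a download twice.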

To minimize overlap, here are the search terms that are already being downloaded, and who is downloading them:

  • Darkstar: "rare", "vintage", "commercial"
  • NomDuClavier: "douglas adams", "richard dawkins", "charles darwin"
  • oli: "australia history"
  • dnova: "microelectronics"
  • Lightblb: "documentary"

Also check the specificrequest PiratePad under Cherry Picking on this page.

Seed List Downloads

Seed list Videos (lines) Downloader Complete? (Size?)
seed_videos_2_a 25,761 swebb 8.6G (4/17/2011)
seed_videos_2_k 19,266 (24,242) Lightblb, ARc[Clone, crackbab1, Pentium100, Mqrius Split 49 chunks of 500

Lightblb: aa ab ac ad ae
crackbab1: af
Mqrius: ag
Pentium100: az
ARc[Clone: bf bg bh bi bj bk bl bm bn bo bp bq br bs bt bu bv bw

seed_videos_2_l 22,641 ndurner, wgfreewill Split 46 chunks of 500
ndurner: 173; wgfreewill running in reverse from the bottom
seed_videos_2_m 24,465 Jade Falcon Jade @ 2854/24465 ~77G and counting...(100 concurrent threads!)
balrog running in reverse
seed_videos_2_o 25,049 travelinlibrarian Split 51 chunks of 500

travelinlibrarian 17/1-500
perfinion downloading seed_videos_2_ob[n-y]

seed_videos_2_p 23,713 oli, Xentac, db48x, otro Split 48 chunks of 500

oli: paa to paj
Xentac is downloading pbt, pbu, and pbv right now.
db48x: pbu, pbv, pba-pbf
otro: pbs

seed_videos_2_q 17,727 DoubleJ
seed_videos_2_t 25,301 businux Split 51 chunks of 500
seed_videos_2_u 23,528 barbich Split 48 chunks of 500

barbich: currently processing 0 to 11

seed_videos_2_w 21,732 nickmoorman
seed_videos_2_x 19,733 ksh 2016/19733
seed_videos_2_y 20,965 negge 188G done
seed_videos_2_z 18,877 flare
seed_videos_a 1000 Dr.Sweety
seed_videos_a_related This list contains errors Dr.Sweety
seed_videos_b 999 bjwebb
seed_videos_c 981 dnova 555/981
seed_videos_d 999 nomduclav 266/1000
seed_videos_e 999 nomduclav 153/992
seed_videos_f 999 DoubleJ Done (25GB)
seed_videos_g 999 dnova 504/999
seed_videos_h 999 ARc[Clone Done
seed_videos_i 999 DeCarabas 683/999
seed_videos_j 999 joethehuman In Progress (776/1000 26.3 GB)
seed_videos_k 999 aggroskater In Progress [line 219]
seed_videos_l 999 yipdw 632/999, 40 GB
Status update, updated every 30 minutes or so
(login as guest:guest if you get an authorization error)
seed_videos_m 999 TJ__ Done (34.7GB)
seed_videos_n 999 ndurner Done (38 GB)
seed_videos_o 999 com_lab
seed_videos_p 999 Pneu
seed_videos_q 996 nomduclavier Done (~24GB)
seed_videos_r 996 Pentium
seed_videos_s 999 Pentium
seed_videos_t 999 joethehuman In Progress (573/1000 24.0 GB)
seed_videos_u 999 perfinion, 0xDEADBEEF, norc 158/1000, norc running in reverse: 59/1000. Perfinion done. 44GB
seed_videos_v 999 masterme1 162/999 (~11GB)
seed_videos_w 1000 com_lab Done (~5.7GB)
seed_videos_x 1000 Dark-Star Done (~33GB)
seed_videos_y 1000 beremat 361/1000, (~24.6GB)
seed_videos_z 1000 ksh Done (27GB)
"microelectronics" 703 dnova
"singularity" 174 db48x (grabbed 8am UTC April 18th 2011)
"Feynman" 28 db48x (grabbed 9am UTC April 18th 2011)
"police" (grabbed 8am UTC April 18th 2011)
"eliezer" (grabbed 8am UTC April 18th 2011)
"obama" (grabbed 8am UTC April 18th 2011)
"cia" (grabbed 8am UTC April 18th 2011)
"charlie" (grabbed 8am UTC April 18th 2011)
IDs from the metafilter thread (grabbed 9am UTC April 18th 2011)
IDs from the reddit thread (grabbed 9am UTC April 18th 2011)
"rare" Darkstar
"vintage" Darkstar
"commercial" Darkstar
"douglas adams" NomDuClavier
"richard dawkins" NomDuClavier
"charles darwin" NomDuClavier
"australia history" oli
"Bugs Bunny" 153
Total 324,699 Archive Team In progress (402.6 GB and counting)

Broken DocIDs

DocID Title list
-4313176927520589553 Ferrari 320 km/h SelMcKenzie seed_videos_h
710915802292429594 Triple H-Best Pedigree Ever seed_videos_h
919675995190477263 404s seed_videos_h
-7433458566080701467 404s seed_videos_2_k
7476314005948269525 Tan Tay Du Ky 2 tap 1 phan 2 seed_videos_2_k
1310034078921227326 Presentatie H. van Garderen seed_videos_h
-8196546459051063200 Ethiopia - Ethiopian Talk Show - Dr. Kinfe M Kassaye seed_videos_m
6012309833489564165 I'm gonna miss you forever seed_videos_m
1006201176909432045 Nick "KNUCKLEHEAD" Thomas Learning to Ride A KX 65 seed_videos_2_k_br
9013618753646293166 TooSexii seed_videos_m
4607644763702261746 Most Haunted seed_videos_m
910327017359455024 404s seed_videos_2_k_br
-3505183273546479430 Top 10 Dunkers in Slam Dunk Contest History by seed_videos_2_k_bu
515155312540224448 Prof. Stephen Berk - The Six Day War -- (Only downloads 106MB & manual seek fails) seed_videos_m
8233620694803027158 Tien Kiem Ky Hiep 12a seed_videos_2_k_bs
-7026671761719496982 KV Kortrijk - Virton: kans Vervaeke seed_videos_2_k_bo
4744936758707683681 404s seed_videos_2_k_bo
-4138015874145288917 Irvine City Council Regular Meeting -- content too short (expected 880173643 bytes and served 871) seed_videos_2_k_bo
1751753922865083288 Lou Dobbs - Bill Gates Testifies to Senate: Part 2 seed_videos_h
-1847242336625060764 404s seed_videos_h
-840074924615574683 H.O.T. TV EPISODE 7 seed_videos_h



DocID scripts


Aria2c (APT)

Aria2c (RPM)

Fedora and CentOS have RPMs available.

  • yum install aria2


  • /usr/bin/aria2c: unrecognized option '--max-connection-per-server=16'
    • The aria2 version packaged by many Linux distributions is outdated and does not recognize this option.
    • To fix this, remove the option from the line in the googlegargle script that starts with "ARIAOPTIONS=".
  • User 'negge' on IRC reports that the following aria2 options work on Debian Squeeze:
    • --max-overall-download-limit=1024M --file-allocation=falloc --max-connection-per-server=4 --min-split-size=1M --log-level=notice --remote-time=true
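The manual fix above (deleting an unrecognized option from the ARIAOPTIONS= line) can be done with sed. This is a minimal sketch, assuming the options appear literally on a line beginning with ARIAOPTIONS= in the googlegargle script; it keeps a .bak copy of the script first. The option text is used as a sed pattern, which is fine for aria2-style options made of letters, digits, '-' and '=', but not for text containing regex metacharacters. The helper name drop_aria_option is hypothetical.

```shell
#!/bin/sh
# Sketch: remove an option that an older aria2c rejects from the
# ARIAOPTIONS= line of the googlegargle script, keeping a .bak copy.

# drop_aria_option SCRIPT OPTION
# e.g. drop_aria_option ./googlegargle --max-connection-per-server=16
drop_aria_option() {
  script=$1
  opt=$2
  cp -- "$script" "$script.bak" || return 1
  # Only lines starting with ARIAOPTIONS= are touched; the option and any
  # spaces after it are deleted. Writing into the existing file keeps its
  # permissions (including the executable bit).
  sed "/^ARIAOPTIONS=/s|$opt *||" "$script.bak" > "$script"
}
```

Run it once per rejected option, then compare the script with the .bak copy before starting googlegargle again.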


  • Is there any estimate on how many videos are on Google Video?
    • Wikipedia said it has 2,500,000 videos; a semi-official Google blog mentioned 2.8 million.
  • Is anything being done to grab metadata for the videos, such as descriptions?
    • Googlegrape does that: it saves the HTML of the video download page.
  • What happens to the data after you claim a seed on the wiki and download it?
    • We've got 100TB of space allocated to us on, and can get more
  • Is there already some space where it can be uploaded to?
    • Not yet; the effort is still young, and things take time to organize.
  • How can I split seed files if I want to download fewer videos or share the task with others?
    • On *nix machines use: split --lines=500 [seedfile] [seedfile] to create a set of files each 500 lines in length in the form seedfileaa seedfileab ... etc.
  • How can I check if there are duplicates in a seed file?
    • On *nix machines use: sort [infile] | uniq -d to show all duplicates.
  • How can I remove duplicates from a seed file before I start to use it?
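    • On *nix machines use: sort -u [infile] > [outfile] to produce a new seed file with duplicates removed. (Avoid sort [infile] | uniq -u here: it drops every ID that appears more than once instead of keeping one copy.)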
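The FAQ steps above (remove duplicates, then split into 500-line chunks) can be combined into one helper. This is a minimal sketch, assuming you want to keep the seed file's original order, which sort -u does not: awk keeps only the first occurrence of each line. Chunks are named like the split example above (seedfileaa, seedfileab, ...). The helper name prepare_seed and the ".dedup" suffix are illustrative.

```shell
#!/bin/sh
# Sketch: deduplicate a seed file (keeping the first occurrence of each ID
# and the original order), then split it into chunks for sharing.

# prepare_seed SEEDFILE [LINES]   (LINES defaults to 500)
prepare_seed() {
  seed=$1
  lines=${2:-500}
  # seen[$0]++ is 0 (false) the first time a line appears, so !seen[$0]++
  # prints each distinct line exactly once.
  awk '!seen[$0]++' "$seed" > "$seed.dedup" || return 1
  # -l is the POSIX spelling of GNU split's --lines
  split -l "$lines" "$seed.dedup" "$seed"
}
```

For example, prepare_seed seed_videos_2_k writes seed_videos_2_k.dedup and chunks seed_videos_2_kaa, seed_videos_2_kab and so on, which can then be claimed individually on the wiki.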

Announcement: Uploaded video content no longer available

On April 29, 2011, videos that have been uploaded to Google Video will no longer be available for playback. We’ve added a Download button to the Video Status page, so you can download videos that you want to save. If you don’t want to download your videos, you don’t need to do anything. (The Download feature will be disabled after May 13, 2011.)

How do I download videos that I've uploaded?

On the Video Status page, click Download Video located on the right side of each of your videos in the "Actions" column. Once a video has been downloaded, an "Already Downloaded" message will appear. If you have many videos on Google Video, you may need to use the paging controls located on the bottom right of the page to access them all. This download option will be available through May 13, 2011.

I've downloaded my videos. Now what do I do with these FLV files?

FLV files are videos that have been encoded in the Flash Video Format. You can upload your videos in FLV format to other video hosting sites like YouTube or Picasa Web Albums. If you would like to play back your videos on your computer and they don’t seem to be working, you might need to install an FLV player. In order to find an FLV player to install, try doing a Google search for [ FLV player ].

External links