Difference between revisions of "Google Video (Archive)"
Masterme120 (talk | contribs) |
Revision as of 01:42, 18 April 2011
Google Video | |
URL | http://video.google.com |
Status | Closing on 2011-04-29[1] |
Archiving status | In progress... |
Archiving type | Unknown |
IRC channel | #archiveteam-bs (on hackint) |
Google Video is a video sharing website which is shutting down.
If you want to save your own videos, see the announcement and tools below.
If you want to help archive Google Video, get some Linux machines running and join us on IRC (EFNet #archiveteam / #googlegrape).
Joining the archival effort
The automatic scripts only work on FreeBSD, Linux, and maybe OS X. They also seem to work fine in Cygwin. Alternatively, you can run *nix in a virtual machine (given you have a fast enough machine).
To help scrape videos
First of all, please add your name/nickname to this list, along with the storage and bandwidth you have available.
- Download youtube-dl, or install it from your distribution.
- Make sure it's marked executable: chmod +x youtube-dl
- Download and install wget for your distribution
- Download googlegargle (Norc's updated, dupe-safe version of googlegargle is here.)
- Get aria2 from your distribution (or from MacPorts if you're on Mac OS X) or from SourceForge
- Pick a seed list from below, save it under the filename "list" and add your name to the list (you will need a wiki account)
- Change the first few lines of the googlegargle script to reflect your installation
- If you're using youtube-dl from your distro, run "which youtube-dl" or "sudo updatedb; locate youtube-dl" to find the location of the command. Change DLSCRIPT to this.
- For older aria2 versions, some options need to be removed (--max-connection-per-server=16 --min-split-size=1M)
- You might need to upgrade aria2 through your system package manager; however, even the most recent packaged version may not suffice.
- Change the ARIA variable in the script to the location of your aria2c executable (on Ubuntu this is usually /usr/bin/aria2c).
- To know where aria2 is located you can use either of these commands:
- "sudo updatedb; locate aria2"
- "which aria2" / "which aria2c"
- Invoke googlegargle
- Check your OS settings to ensure that your computer will not auto-suspend or sleep after long periods of time.
Join the IRC channel to coordinate!
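The path lookups in the steps above can be collected into one small helper. This is only a minimal sketch: the function name find_tool is made up for illustration and is not part of googlegargle; DLSCRIPT and ARIA are the variable names mentioned in the steps above.

```shell
# Print the full path of the first candidate command found in PATH.
# find_tool is a hypothetical helper, not part of googlegargle.
find_tool() {
    for tool_name in "$@"; do
        if tool_path=$(command -v "$tool_name" 2>/dev/null); then
            printf '%s\n' "$tool_path"
            return 0
        fi
    done
    echo "none of: $* found in PATH" >&2
    return 1
}

# Example: values to copy into the first lines of the googlegargle script.
# DLSCRIPT=$(find_tool youtube-dl)
# ARIA=$(find_tool aria2c aria2)
```

Passing both aria2c and aria2 covers either binary name; the first one that exists wins.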
To help index videos (low bandwidth/storage)
Note: This will only work on Linux machines with X running – you can’t run it on headless servers due to phantomjs requirements.
- Get and build phantomjs (a headless web browser). If necessary, first install build-essential, git, libqtwebkit4, libqtwebkit-dev, and libqt4-dev by issuing the command:
sudo apt-get install build-essential git libqtwebkit4 libqtwebkit-dev libqt4-dev
- Run the following command to get the phantomjs source code:
git clone https://github.com/ariya/phantomjs.git
- Enter the directory that was just created by using the following command:
cd phantomjs
- Build phantomjs by issuing the command:
qmake && make
- Move the phantomjs binary somewhere in your path by issuing the command:
cd bin && sudo mv ./phantomjs /usr/local/bin
- Create a folder called gvscript and download the script to get the list of Google Video related pages to scrape: http://199.48.254.90/at/google_video_related.tar.gz
- Extract the above downloaded file (Right-click and Extract To.. or use tar -zxvf ./google_video_related.tar.gz)
- In a terminal, navigate to the folder where you extracted the google_video_related file (above) and run the following command to help scrape Google Video:
while : ; do ./related.sh ; done
That's it! Simply leave the script running, and head on over to #ggtesting on EFnet (IRC) if you need any assistance or in case the script has any issues. The script will contact the server to get a page to index the related video links, do that indexing, send back the results and repeat! It takes very little processing and bandwidth on your end (a couple of kb/sec, if that).
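Before leaving the loop running, a quick pre-flight check can catch the two requirements named in this section: a phantomjs binary in your path and a running X session. The sketch below assumes that a non-empty DISPLAY variable means X is available, which is a simplification; preflight is a hypothetical helper name and is not shipped with the script bundle.

```shell
# Check for a phantomjs binary in PATH and an X session (DISPLAY set).
# preflight is a hypothetical helper name, not part of the script bundle.
preflight() {
    if ! command -v phantomjs >/dev/null 2>&1; then
        echo "phantomjs not found in PATH" >&2
        return 1
    fi
    if [ -z "${DISPLAY:-}" ]; then
        echo "DISPLAY is not set; X does not appear to be running" >&2
        return 1
    fi
    return 0
}

# Example: preflight && while : ; do ./related.sh ; done
```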
Cherry picking
The seed files do not currently include all videos, so you might want to save precious videos explicitly. To do that, add IDs (the docid URL parameter of the Google Video) to the "list" file in the same directory as the script, for example:
docid=1545969803753962248
docid=1598207563000425446
docid=-1679753730105404298
and start ./googlegargle
To request a cherrypick, add it to this list: http://piratepad.net/gvspecificrequests
If you download something from that list, add its docid to http://piratepad.net/TL7KDN8821 so that others won't download those videos a second time.
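Pulling the docid out of a video URL and appending it to the "list" file can be scripted. The following is a sketch under two assumptions: that the list file holds one docid=... entry per line, and that the URL carries the ID in a docid parameter as described above. add_docid is a hypothetical helper name, not part of googlegargle.

```shell
# Extract the docid from a Google Video URL and append it to the list file
# as "docid=<number>", skipping entries that are already present.
# add_docid is a hypothetical helper name, not part of googlegargle.
add_docid() {
    url=$1
    list=${2:-list}
    docid=$(printf '%s\n' "$url" | grep -o 'docid=-\{0,1\}[0-9][0-9]*' | head -n 1)
    if [ -z "$docid" ]; then
        echo "no docid found in: $url" >&2
        return 1
    fi
    touch "$list"
    # -x matches whole lines only, -F treats the entry as a fixed string.
    if ! grep -qxF -e "$docid" "$list"; then
        printf '%s\n' "$docid" >> "$list"
    fi
}

# Example: add_docid 'http://video.google.com/videoplay?docid=1545969803753962248'
```

Skipping entries that are already in the file keeps repeated cherry picks from downloading the same video twice.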
Custom keyword searches
If you want to grab videos by your own custom keyword search term, you can use this command:
SEARCH='my+search+term';for i in `seq 0 10 990 `;do curl -A "AT, Bitches" "http://www.google.com/search?q=$SEARCH+site:video.google.com&hl=en&safe=off&tbm=vid&start=$i&sa=N"|grep -o "docid=[0-9-]*"|tee -a seed_videos_$SEARCH;done
Change "my+search+term" to your search term, and remember to use a plus sign instead of spaces (or url encode the text for other special characters).
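Turning a plain phrase into the form the command expects can also be scripted. This sketch replaces spaces with plus signs and percent-encodes every other byte outside the unreserved set (letters, digits, "-", ".", "_", "~"). encode_term is a hypothetical helper name introduced only for illustration.

```shell
# Encode a search phrase for use in a URL query string:
# spaces become '+', unreserved characters pass through unchanged,
# every other byte becomes %XX.
# encode_term is a hypothetical helper name.
encode_term() {
    printf '%s' "$1" | od -An -v -tx1 | tr -s ' ' '\n' | while read -r hex; do
        [ -n "$hex" ] || continue
        case "$hex" in
            20)
                printf '+' ;;
            2d|2e|5f|7e|3[0-9]|4[1-9a-f]|5[0-9a]|6[1-9a-f]|7[0-9a])
                # Unreserved byte: convert the hex value back to its character.
                printf "\\$(printf '%03o' "0x$hex")" ;;
            *)
                printf '%%%s' "$hex" | tr 'a-f' 'A-F' ;;
        esac
    done
}

# Example: SEARCH=$(encode_term 'douglas adams')
```

Note that a literal plus sign in the phrase is encoded as %2B, so it is not confused with a space.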
Since we want to minimize overlap, here are some search terms that are already being downloaded (and the user who is downloading them):
- Darkstar: "rare", "vintage", "commercial"
- NomDuClavier: "douglas adams", "richard dawkins", "charles darwin"
- oli: "australia history"
- dnova: "microelectronics"
Seed List Downloads
Seed list | Videos (lines) | Downloader | Complete? (Size?) |
---|---|---|---|
seed_videos_2_a | 25,761 | swebb | 8.6G (4/17/2011) |
seed_videos_2_k | 24,242 | Lightblb, ARc[Clone, crackbab1 | Split 49 chunks of 500 Lightblb: aa ab ac ad |
seed_videos_2_l | 22,641 | ndurner | Split 46 chunks of 500 71 |
seed_videos_2_m | 24,465 | Jade Falcon | balrog running in reverse |
seed_videos_2_o | 25,049 | travelinlibrarian | Split 51 chunks of 500 travelinlibrarian 17/1-500 |
seed_videos_2_p | 23,713 | oli | Split 48 chunks of 500 4% I'll need help. Will figure out how soon... |
seed_videos_2_q | 17,727 | DoubleJ | |
seed_videos_2_t | 25,301 | | Split 51 chunks of 500 |
seed_videos_2_u | 23,528 | barbich | |
seed_videos_2_w | 21,732 | nickmoorman | |
seed_videos_2_x | 19,733 | ksh | 2016/19733 |
seed_videos_2_y | 20,965 | negge | 64G done |
seed_videos_2_z | 18,877 | flare | |
seed_videos_a | 1000 | Dr.Sweety | |
seed_videos_a_related | This list contains errors | Dr.Sweety | |
seed_videos_b | 999 | bjwebb | |
seed_videos_c | 981 | dnova | 248/981 |
seed_videos_d | 999 | nomduclav | 266/1000 |
seed_videos_e | 999 | nomduclav | 153/992 |
seed_videos_f | 999 | DoubleJ | Done (25GB) |
seed_videos_g | 999 | dnova | 128/999 |
seed_videos_h | 999 | ARc[Clone | In Progress (have 526/999) |
seed_videos_i | 999 | DeCarabas | 466/999 |
seed_videos_j | 999 | joethehuman | In Progress (776/1000 26.3 GB) |
seed_videos_k | 999 | aggroskater | In Progress [line 219] |
seed_videos_l | 999 | yipdw | 264/999 (20 GB) |
seed_videos_m | 999 | TJ__ | In Progress [line 887] |
seed_videos_n | 999 | ndurner | Done (38 GB) |
seed_videos_o | 999 | com_lab | |
seed_videos_p | 999 | Pneu | |
seed_videos_q | 996 | nomduclavier | Done (~24Gb) |
seed_videos_r | 996 | Pentium | |
seed_videos_s | 999 | Pentium | |
seed_videos_t | 999 | joethehuman | In Progress (573/1000 24.0 GB) |
seed_videos_u | 999 | perfinion, 0xDEADBEEF, norc | 158/1000, norc running in reverse: 59/1000 |
seed_videos_v | 999 | masterme1 | 162/999 (~11GB) |
seed_videos_w | 1000 | com_lab | Done (~5.7GB) |
seed_videos_x | 1000 | Dark-Star | Done (~33GB) |
seed_videos_y | 1000 | beremat | 233/1000, (~19GB) |
seed_videos_z | 1000 | ksh | Done (27GB) |
Total | 323,996 | Archive Team | In progress... |
Broken DocIDs
DocID | Title | list |
---|---|---|
-4313176927520589553 | Ferrari 320 km/h SelMcKenzie | seed_videos_h |
710915802292429594 | Triple H-Best Pedigree Ever | seed_videos_h |
919675995190477263 | 404s | seed_videos_h |
-7433458566080701467 | 404s | seed_videos_2_k |
7476314005948269525 | Tan Tay Du Ky 2 tap 1 phan 2 | seed_videos_2_k |
Tools
Youtube-DL
- http://rg3.github.com/youtube-dl/download.html
- python youtube-dl googlevideourl
DocID scripts
GoogleGargle
Aria2c (APT)
- apt-add-repository ppa:t-tujikawa/ppa
- apt-get update
- apt-get install aria2
Aria2c (RPM)
Fedora and CentOS have RPMs available.
- yum install aria2
Troubleshooting
- /usr/bin/aria2c: unrecognized option '--max-connection-per-server=16'
- The aria2 version available in many Linux distributions is not up to date and will throw errors.
- To fix this, remove the option from the line of the googlegargle script that starts with "ARIAOPTIONS=".
- User 'negge' on IRC reports that the following aria2 command line works on Debian Squeeze:
- --max-overall-download-limit=1024M --file-allocation=falloc --max-connection-per-server=4 --min-split-size=1M --log-level=notice --remote-time=true
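Removing a rejected option from the option string by hand is error-prone, so here is a small sketch that drops a single named option and keeps the rest. strip_option is a hypothetical helper name, and the option string in the example is illustrative only; check the actual ARIAOPTIONS line in your copy of the script.

```shell
# Remove one "--name=value" option from a space-separated option string.
# strip_option is a hypothetical helper, not part of googlegargle.
strip_option() {
    opts=$1
    name=$2
    result=
    for opt in $opts; do
        case "$opt" in
            "$name"|"$name"=*) ;;                    # drop the unwanted option
            *) result="${result:+$result }$opt" ;;  # keep everything else
        esac
    done
    printf '%s\n' "$result"
}

# Example:
# strip_option '--max-connection-per-server=16 --min-split-size=1M' --max-connection-per-server
```

Run it once per option that your aria2c rejects and paste the result back into the script.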
FAQ
- Is there any estimate on how many videos are on Google Video?
- Wikipedia said it has 2,500,000 videos; a semi-official Google blog mentioned 2.8M.
- Is there anything about grabbing metadata for vids? like descriptions?
- Googlegrape does that: it saves the HTML of the video download page.
- What happens to the data after you claim a seed on the wiki and download it?
- We've got 100TB of space allocated to us on archive.org, and can get more
- Is there already some space where it can be uploaded to?
- Not yet, the effort is still young and things take time to organize.
- How can I split seed files if I want to download fewer videos or share the task with others?
- On *nix machines use: split --lines=500 [seedfile] [seedfile] to create a set of files each 500 lines in length in the form seedfileaa seedfileab ... etc.
- How can I check if there are duplicates in a seed file?
- On *nix machines use: sort [infile] | uniq -d to show all duplicates.
- How can I remove duplicates from a seed file before I start to use it?
- On *nix machines use: sort -u [infile] > [outfile] to produce a new seed file that keeps one copy of each line. (uniq -u would instead drop every line that has a duplicate, including its first copy.)
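The seed-file questions above can be tried out on a small sample before touching a real list. The sketch below works in a temporary directory on a made-up four-line file, and uses chunks of 2 lines instead of 500 so the result is easy to inspect. It uses sort -u to keep exactly one copy of each entry, and split -l, the short form of --lines.

```shell
# Build a small sample seed file with one duplicated entry (made-up IDs).
workdir=$(mktemp -d)
seed="$workdir/seed_sample"
printf 'docid=3\ndocid=1\ndocid=2\ndocid=1\n' > "$seed"

# Show entries that occur more than once.
dupes=$(sort "$seed" | uniq -d)
echo "duplicates: $dupes"

# Keep exactly one copy of every entry (the output is sorted).
sort -u "$seed" > "$seed.dedup"

# Split into chunks of 2 lines each; the chunk files get the suffixes aa, ab, ...
split -l 2 "$seed.dedup" "$seed.dedup."
ls "$workdir"
```

Sorting changes the order of the entries, which should not matter if each line is downloaded independently.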
Announcement: Uploaded video content no longer available
On April 29, 2011 videos that have been uploaded to Google Video will no longer be available for playback. We’ve added a Download button to the Video Status page, so you can download videos that you want to save. If you don’t want to download your videos, you don’t need to do anything. (The Download feature will be disabled after May 13, 2011.)
How do I download videos that I've uploaded?
On the Video Status page, click Download Video located on the right side of each of your videos in the "Actions" column. Once a video has been downloaded, an "Already Downloaded" message will appear. If you have many videos on Google Video, you may need to use the paging controls located on the bottom right of the page to access them all. This download option will be available through May 13, 2011.
I've downloaded my videos. Now what do I do with these FLV files?
FLV files are videos that have been encoded in the Flash Video Format. You can upload your videos in FLV format to other video hosting sites like YouTube or Picasa Web Albums. If you would like to play back your videos on your computer and they don’t seem to be working, you might need to install an FLV player. In order to find an FLV player to install, try doing a Google search for [ FLV player ].