YouTube

From Archiveteam
URL http://youtube.com
Status Online! but possibly Endangered, see Vital signs
Archiving status Not saved yet
Archiving type Unknown
IRC channel #archiveteam-bs (on hackint)

YouTube is a video-sharing website owned by Google. It is currently the most popular video hosting website on the planet.

Archiving tools

Several free FLV downloaders and video-to-URL converters exist on the web. Archive Team rescue projects usually use youtube-dl.
YouTube annotations (speech bubbles and notes) are available as XML from the following URL, with the video ID appended:

http://www.youtube.com/api/reviews/y/read2?feat=TCS&video_id=

To transform this XML to SRT, use ann2srt
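
As a small illustration, the annotations URL for a given video can be built by appending the video ID to the endpoint quoted above. This is only a sketch: the function name is made up for this example, and it does not check whether the endpoint still responds.

```python
from urllib.parse import urlencode

# Endpoint and fixed "feat=TCS" parameter are taken from the URL quoted above.
ANNOTATIONS_ENDPOINT = "http://www.youtube.com/api/reviews/y/read2"


def annotations_url(video_id):
    """Return the annotations XML URL for one video ID (hypothetical helper)."""
    query = urlencode({"feat": "TCS", "video_id": video_id})
    return f"{ANNOTATIONS_ENDPOINT}?{query}"
```

The resulting XML can then be saved and passed to ann2srt.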

(Automatic) tubeup.py - Youtube Video IA Archiver

Note: When uploading to the Internet Archive, please avoid exposing the site to legal risk: follow their terms of service and do not upload blatantly copyrighted content. Unfortunately, they are subject to the same threats of DMCA takedowns as YouTube, so do use discretion.
Note: Be very careful when dumping channels of more than 100 videos with this script. Let an admin know what you're doing, dump 50 videos, and have a collection created. The script defaults to the "Community Video" collection. Work has started on a flag to specify a different collection name, but for the time being the script has to be hand-edited to do so. Always try to create an item.

tubeup.py is an automated archival script that uses youtube-dl to download a YouTube video (or a video from any other provider supported by youtube-dl), and then uploads it with all of its metadata to the Internet Archive.

This way, all metadata from the video, such as title, tags, categories, and description, is preserved in the corresponding Internet Archive item without having to be entered manually.

It also uses a standardized Internet Archive item name format, which makes it easy to find a video by its YouTube ID and reduces duplication: https://archive.org/details/youtube-v9sGhNoSG3o
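
The naming scheme can be sketched as follows, assuming the item name is simply "youtube-" followed by the video ID, as in the example URL above. The helper names are illustrative and are not taken from tubeup.py itself.

```python
def ia_identifier(video_id):
    """Item name in the 'youtube-<video ID>' form shown in the example URL."""
    return f"youtube-{video_id}"


def ia_details_url(video_id):
    """Details page where an already-archived copy of the video would live."""
    return f"https://archive.org/details/{ia_identifier(video_id)}"
```

Checking this URL before uploading is one way to avoid creating a duplicate item.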

youtube-dl also works with many other video sites.

(Manual) Recommended way to archive YouTube videos

First, download the video/playlist/channel/user using youtube-dl:

youtube-dl --continue --retries 4 --write-info-json --write-description --write-thumbnail --write-annotations --all-subs --ignore-errors -f bestvideo+bestaudio URL
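
If you drive youtube-dl from a script, the same invocation can be built as an argument list. This is a minimal sketch: the function name is hypothetical, the flags are exactly those in the command above, and it assumes youtube-dl is on your PATH.

```python
def build_download_command(url):
    """Return the youtube-dl command above as an argv list for subprocess."""
    return [
        "youtube-dl",
        "--continue",
        "--retries", "4",
        "--write-info-json",     # metadata as .info.json
        "--write-description",   # .description file
        "--write-thumbnail",     # thumbnail image
        "--write-annotations",   # .annotations.xml
        "--all-subs",
        "--ignore-errors",       # keep going when one video in a playlist fails
        "-f", "bestvideo+bestaudio",
        url,
    ]

# Usage (runs the download; not executed here):
#   import subprocess
#   subprocess.run(build_download_command(video_url), check=False)
```

Passing a list rather than a shell string avoids quoting problems with URLs that contain "&" or "?".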

This can be simplified by running the script by emijrp and others, which also handles upload.

You need a recent (2014) ffmpeg or avconv for the bestvideo+bestaudio muxing to work. On Windows, you also need to run youtube-dl with Python 3.3/3.4 instead of Python 2.7, otherwise non-ASCII filenames will fail to mux.

Also, make sure you're using the most recent version of youtube-dl. Previous versions didn't work if the highest quality video+audio was webm+m4a. New versions should automagically merge incompatible formats into a .mkv file.[1]

Then, upload it to https://archive.org/upload/. Make sure to upload not only the video itself (.mp4 and/or .mkv files), but also the metadata files created along with it (.info.json, .jpg, .annotations.xml and .description).
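
To avoid forgetting the metadata files, the download directory can be filtered by the extensions listed above. A minimal sketch, assuming everything was downloaded into a single directory; the function and constant names are made up for this example.

```python
from pathlib import Path

# Video and metadata extensions named in the paragraph above.
UPLOAD_SUFFIXES = (
    ".mp4", ".mkv", ".info.json", ".jpg", ".annotations.xml", ".description",
)


def files_to_upload(directory):
    """Return the files in `directory` that belong in the upload, sorted by name."""
    return sorted(
        path
        for path in Path(directory).iterdir()
        if path.is_file() and path.name.endswith(UPLOAD_SUFFIXES)
    )
```

Leftover .part files and other temporary files are skipped because they do not end in any of the listed extensions.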

kyan likes this method:

Youtube sucker. Look out: it leaves some incomplete downloads in the directory afterward. To clean up, run rm -v ./*.mp4 ./*.webm, then ls | grep \.part$, get the video IDs out of that listing, redownload them, and repeat as needed. You can upload only the WARCs, e.g. using ia (the Python Internet Archive client) or warcdealer (an automated uploader I hacked together). If you want, you can upload the other files too, but that's kind of wasteful of storage space. In my opinion, getting stuff without a WARC is a great crime, given the ready availability of tools to create WARCs. Note that this method also works for the other Web sites supported by youtube-dl, although they may need different cleanup commands afterward. It depends on youtube-dl, and on warcprox running on localhost:8000.

youtube-dl --continue --retries 100 --write-info-json --write-description --write-thumbnail --proxy="localhost:8000" --write-annotations --all-subs --no-check-certificate --ignore-errors -k -f bestvideo+bestaudio/best (stick the video/channel/playlist/whatever URL in here)
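
The "get the video IDs out of the .part files" step of the cleanup can be sketched in Python. This assumes youtube-dl's default "%(title)s-%(id)s.%(ext)s" file naming and 11-character video IDs. The regular expression and function name are illustrative and would need adjusting for other sites or a custom output template.

```python
import re

# Matches e.g. "Some title-v9sGhNoSG3o.f137.mp4.part" and captures the ID.
# Assumption: default youtube-dl naming, with an optional ".f<format>" part.
PART_RE = re.compile(r"-([0-9A-Za-z_-]{11})(?:\.f\d+)?\.[0-9A-Za-z]+\.part$")


def incomplete_video_ids(filenames):
    """Return the unique video IDs found in leftover .part filenames, in order."""
    ids = []
    for name in filenames:
        match = PART_RE.search(name)
        if match and match.group(1) not in ids:
            ids.append(match.group(1))
    return ids
```

Calling incomplete_video_ids(os.listdir(".")) in the download directory gives the IDs to feed back into youtube-dl for another attempt.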

Site reconnaissance

Little is known about its database, but according to data from 2006, it was 45 TB and doubling every 4 months. At that rate it would have reached 660 petabytes by October 2014.
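
This kind of extrapolation is plain exponential growth: size = initial size × 2^(elapsed months / doubling period). A minimal sketch follows, with illustrative function and parameter names. The result is extremely sensitive to the elapsed time plugged in, and the sketch makes no attempt to reproduce the 660-petabyte figure, since the exact start and end dates behind it are not spelled out here.

```python
def projected_size_tb(initial_tb, doubling_months, elapsed_months):
    """Size in TB after `elapsed_months` of steady doubling every `doubling_months`."""
    return initial_tb * 2 ** (elapsed_months / doubling_months)
```

For example, 45 TB doubling every 4 months is 180 TB after 8 months (two doublings).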

An often-updated Google spreadsheet with Leo Leung's calculations, based on available information, estimates that YouTube's content reached 500 petabytes in size in early 2015.

FYI, all of Google Video was about 45 TB, and the Archive Team's biggest project, MobileMe, was 200 TB. The Internet Archive's total capacity was 50 PB as of August 2014. So let's hope YouTube stays healthy, because the Archive Team may have finally met its match.

Vital signs

Will be living off Google for a long time if nothing changes.

In early 2017, numerous content creators expressed concerns about then-recent changes to YouTube's advertising policies, and many also noticed sharp drops in ad revenue as a result, with some creators like Casey Neistat and h3h3Productions expressing existential fears. While not necessarily a cause for imminent alarm, the situation should be watched closely in case a creator exodus sets off a positive feedback loop.

References

See also

External links