UC Berkeley Course Captures
The University of California, Berkeley is planning to remove their public lecture recordings ("course captures", audio and video) and put them behind authentication. The planned date for the change is 2017-03-15.
The removal will affect at least these public channels:
- https://www.youtube.com/user/UCBerkeley
- https://itunes.apple.com/institution/uc-berkeley/id354813951
- http://webcast.berkeley.edu/series (index of links to YouTube and iTunes)
The #Shutdown notice makes it sound as if YouTube videos will remain online at youtube.com, but will no longer be publicly listed. The new hosting behind authentication will lose playlist information (which links individual lecture videos together for one course). Therefore the pressing thing to do before 2017-03-15 (as regards the YouTube content) is to download indexes of videos and playlists—see #Indexes of files.
On the other hand, "iTunesU Course Capture content will be removed." It's not clear if iTunes content will continue to exist, even behind authentication. It is not yet known how to download from iTunes.
Ideas
Proposed archiving format:
- Sample: https://archive.org/details/TEST2_UCB_CS195_SP2015
- One item per YouTube playlist
- Identifier includes the course number and semester (there's a list of course subject abbreviations at http://guide.berkeley.edu/courses/)
- Upload youtube-dl --dump-json output as youtube-dl.json
- Videos in the preview are YouTube's highest-quality muxed format (format 22?)
- Video file naming convention is %(playlist_index)s-%(title)s.%(ext)s (in youtube-dl's output template format)
- All other formats stored in tar files, one file per format (maybe overkill, as these are derived anyway?)
- Include stderr output of youtube-dl, in order to have a record of videos that aren't accessible (e.g., ERROR: Zrzh3Fz8DhQ: YouTube said: This video contains content from BBC Worldwide, who has blocked it on copyright grounds.)
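youtube-dl output templates are expanded with Python's %-style string formatting against each video's metadata, so the proposed naming convention can be previewed with a plain dictionary. The following is a minimal sketch: the sample metadata and the helper name preview_filename are invented for illustration, and youtube-dl itself also sanitizes titles and may zero-pad the playlist index, which this sketch does not reproduce.

```python
# Preview how the proposed output template expands.
# The sample metadata below is hypothetical, not taken from a real video.
TEMPLATE = "%(playlist_index)s-%(title)s.%(ext)s"


def preview_filename(info, template=TEMPLATE):
    """Expand an output template against a metadata dict using %-formatting."""
    return template % info


sample = {"playlist_index": 3, "title": "Lecture 3", "ext": "mp4"}
print(preview_filename(sample))
```

Putting the playlist index first keeps the files of one course in lecture order when sorted by name, as long as the index is padded consistently.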
There's an existing https://archive.org/details/ucberkeleylectures collection to which the newly archived files could perhaps be added.
Archiving scripts
Scripts for extracting YouTube metadata in an Internet Archive–compatible CSV format (the repo also includes #Indexes of files):
git clone https://repo.eecs.berkeley.edu/git-anon/users/fifield/archive-ucberkeley-webcast.git
Prerequisites:
- Python 2.7
- ia
- jq
- youtube-dl
How to download a playlist
This is how to download all the videos of a playlist in all available formats.
Get a list of playlist titles, IDs, and last line of video description (often lists the license):
gzip -dc indexes/youtube.com-user-UCBerkeley-playlists-20170301.json.gz | jq --compact-output '[.playlist_title,.playlist_id,.description|match(".*\\Z").string]' | uniq -c
Choose a playlist to download. Let's say it's
PLAYLIST=PL-XXv-cvA_iDAJCFxcERyaXngBMTHguhM
OUTDIR="downloads/$PLAYLIST"
Make a directory for the download:
mkdir -p "$OUTDIR"
Extract just the JSON objects corresponding to this playlist:
gzip -dc indexes/youtube.com-user-UCBerkeley-playlists-20170301.json.gz | jq --compact-output "select(.playlist_id==\"$PLAYLIST\")" > "$OUTDIR/youtube-dl.json"
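For readers less familiar with jq, the selection step can be sketched in Python. This is only an illustration, assuming (as the jq command does) that the index holds one JSON object per line with a playlist_id field; the function name and the sample lines are hypothetical.

```python
import json


def select_playlist(lines, playlist_id):
    """Yield the JSON objects whose playlist_id matches, one object per input line."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        obj = json.loads(line)
        if obj.get("playlist_id") == playlist_id:
            yield obj


# Tiny in-memory stand-in for the decompressed index (invented values).
sample_lines = [
    '{"id": "aaa", "playlist_id": "PL1"}',
    '{"id": "bbb", "playlist_id": "PL2"}',
    '{"id": "ccc", "playlist_id": "PL1"}',
]
for obj in select_playlist(sample_lines, "PL1"):
    print(json.dumps(obj, sort_keys=True))
```

With the real index, the lines could come from gzip.open(path, "rt") instead of the in-memory list.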
Now download all the files. It may fail partway through; you can keep running it again and again until it finishes.
youtube-dl --ignore-errors --no-progress --fixup warn --all-formats --output "$OUTDIR"/'%(format_id)s/%(playlist_index)s-%(title)s.%(ext)s' "https://www.youtube.com/playlist?list=$PLAYLIST" 2>&1 | tee -a "$OUTDIR/youtube-dl.log"
If you only want to download the highest-quality file format, use --format=best in place of --all-formats in the youtube-dl command. By default (without any --format option), youtube-dl will use --format=bestvideo+bestaudio, which could locally mux together two separate video and audio streams, resulting in a file that never actually existed on YouTube.
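To see which formats are already muxed on YouTube's side, the formats list in the JSON dump can be inspected. The sketch below assumes youtube-dl's convention of writing the string "none" in vcodec or acodec when a stream is absent; entries missing either field are conservatively treated as not muxed. The sample entries and function names are invented for illustration.

```python
def is_muxed(fmt):
    """True if a format entry carries both a video and an audio stream.

    Assumes a missing stream is marked with the string "none" in the
    vcodec/acodec fields; a missing field is treated the same way.
    """
    return fmt.get("vcodec", "none") != "none" and fmt.get("acodec", "none") != "none"


def muxed_format_ids(info):
    """Return the format_id of every muxed entry in one video's metadata."""
    return [f["format_id"] for f in info.get("formats", []) if is_muxed(f)]


# Hypothetical metadata for one video: audio-only, video-only, and muxed.
sample_info = {
    "formats": [
        {"format_id": "140", "vcodec": "none", "acodec": "mp4a.40.2"},
        {"format_id": "137", "vcodec": "avc1.640028", "acodec": "none"},
        {"format_id": "22", "vcodec": "avc1.64001F", "acodec": "mp4a.40.2"},
    ]
}
print(muxed_format_ids(sample_info))
```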
How to extract metadata
The metadata.py script converts the metadata in the JSON file into CSV format. It's currently hardcoded to always set collection=test_collection, so any uploads will not yet be permanent. You have to edit the script if you want to change that.
Think of an identifier for the item and assign it to the IDENTIFIER shell variable used in the commands below. A list of course subject abbreviations is at http://guide.berkeley.edu/courses/. Then run the metadata.py script.
./metadata.py "$IDENTIFIER" "$OUTDIR/youtube-dl.json" > "$PLAYLIST.metadata.csv"
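As a rough picture of the kind of output involved, here is a minimal sketch of writing a one-row metadata CSV. It is not the repository's metadata.py: the column set (identifier, collection, title) is an assumption for illustration, based on the spreadsheet having an identifier column plus one column per metadata field, and the identifier and entry below are hypothetical.

```python
import csv
import io


def playlist_metadata_csv(identifier, entries, collection="test_collection"):
    """Return CSV text with a header row and one row describing the item.

    The column set here is an assumption for illustration; the real
    metadata.py may emit different fields.
    """
    title = entries[0].get("playlist_title", "") if entries else ""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=["identifier", "collection", "title"])
    writer.writeheader()
    writer.writerow({"identifier": identifier, "collection": collection, "title": title})
    return out.getvalue()


# Hypothetical identifier and entry, for illustration only.
sample_entries = [{"id": "aaa", "playlist_title": "Example Course, Spring 2015"}]
print(playlist_metadata_csv("EXAMPLE_IDENTIFIER", sample_entries))
```

Using the csv module rather than joining strings by hand matters here because playlist titles often contain commas.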
How to upload files and set metadata
Note: you should probably hold off on uploading until there's a plan for naming conventions, etc.
First you have to upload a file (any file) to create the item, before you can set metadata. Important: you need to set the mediatype and collection metadata at this point, because they can't be changed later.
ia upload "$IDENTIFIER" "$OUTDIR"/youtube-dl.* --metadata "mediatype:movies" --metadata "collection:test_collection"
Now you can set the metadata. You'll be able to change this later if necessary.
ia metadata --spreadsheet "$PLAYLIST.metadata.csv"
Then upload video files of a certain format; e.g. for format 22, do:
ia upload "$IDENTIFIER" "$OUTDIR"/22/*
To get an idea of what format to upload, check which directories are the largest:
du -sh "$OUTDIR"/*
You can see short explanations of the available formats with:
jq '.formats[].format' "$OUTDIR/youtube-dl.json"
Indexes of files
Watch out: these indexes may only be partial. For example, they seem to be missing https://www.youtube.com/playlist?list=PL4BBB74C7D2A1049C from 2006.
This guide to the YouTube API may be useful: https://developers.google.com/youtube/v3/guides/implementation/playlists#playlists-retrieve-for-user
- youtube.com-user-UCBerkeley-playlists-20170301.json.gz
- youtube-dl JSON dump of https://www.youtube.com/user/UCBerkeley/playlists, representing 234 playlists and 6,632 videos. It was produced like this:
youtube-dl --ignore-errors --dump-json https://www.youtube.com/user/UCBerkeley/playlists 2>youtube.com-user-UCBerkeley-playlists-20170301.stderr | gzip -9v >youtube.com-user-UCBerkeley-playlists-20170301.json.orig.gz
gzip -dc youtube.com-user-UCBerkeley-playlists-20170301.json.orig.gz | jq --compact-output 'del(.url,((.formats[]?,.requested_formats[]?)|(.url,.manifest_url,.fragments)))' | gzip -9v > youtube.com-user-UCBerkeley-playlists-20170301.json.gz
- youtube.com-user-UCBerkeley-playlists-20170301.stderr
- youtube-dl stderr output for the preceding.
- youtube.com-user-UCBerkeley-videos-20170301.json.gz
- youtube-dl JSON dump of https://www.youtube.com/user/UCBerkeley/videos, representing 9,886 videos, but without playlist information. It was produced like this:
youtube-dl --ignore-errors --dump-json https://www.youtube.com/user/UCBerkeley/videos 2>youtube.com-user-UCBerkeley-videos-20170301.stderr | gzip -9v >youtube.com-user-UCBerkeley-videos-20170301.json.orig.gz
gzip -dc youtube.com-user-UCBerkeley-videos-20170301.json.orig.gz | jq --compact-output 'del(.url,((.formats[]?,.requested_formats[]?)|(.url,.manifest_url,.fragments)))' | gzip -9v > youtube.com-user-UCBerkeley-videos-20170301.json.gz
- youtube.com-user-UCBerkeley-videos-20170301.stderr
- youtube-dl stderr output for the preceding.
- webcast.berkeley.edu-series-20170301.html.gz
- HTML of http://webcast.berkeley.edu/series on 2017-03-01. The page is dynamically generated using JavaScript, so the HTML is taken from the inspector in a browser after the page has loaded. The page contains links to YouTube and iTunes.
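The jq filter used to produce the .json.gz indexes deletes the top-level url field and the url, manifest_url, and fragments fields of every entry in formats and requested_formats. A rough Python equivalent is sketched below for readers who want to apply the same cleanup elsewhere; the function name and sample object are hypothetical, and format entries are assumed to be JSON objects.

```python
REMOVED_KEYS = ("url", "manifest_url", "fragments")


def strip_urls(info):
    """Return a copy of one video's metadata without the download-URL fields."""
    cleaned = dict(info)
    cleaned.pop("url", None)
    for list_key in ("formats", "requested_formats"):
        entries = cleaned.get(list_key)
        # Like the jq "[]?" operator, skip the key when it is absent or null.
        if isinstance(entries, list):
            cleaned[list_key] = [
                {k: v for k, v in fmt.items() if k not in REMOVED_KEYS}
                for fmt in entries
            ]
    return cleaned


# Invented sample object with placeholder values.
sample = {
    "id": "x",
    "url": "u",
    "formats": [{"format_id": "22", "url": "u2", "manifest_url": "m", "fragments": [1]}],
}
print(strip_urls(sample))
```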
Sample commands for working with JSON indexes (using jq):
gzip -dc data/youtube.com-user-UCBerkeley-playlists-20170301.json.gz | jq -r .playlist_title | uniq
- Extract all playlist titles
gzip -dc data/youtube.com-user-UCBerkeley-playlists-20170301.json.gz | jq -r .playlist_id | uniq
- Extract all playlist IDs. Convert an ID into a URL as: https://www.youtube.com/playlist?list=id.
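The ID-to-URL conversion can also be scripted. The following is a small sketch, assuming entries parsed from the playlists index with a playlist_id field; the helper name and sample IDs are made up for illustration. Unlike uniq, it removes duplicates even when entries of the same playlist are not adjacent.

```python
PLAYLIST_URL_PREFIX = "https://www.youtube.com/playlist?list="


def playlist_urls(entries):
    """Return one playlist URL per distinct playlist_id, in first-seen order."""
    seen = set()
    urls = []
    for entry in entries:
        pid = entry.get("playlist_id")
        if pid and pid not in seen:
            seen.add(pid)
            urls.append(PLAYLIST_URL_PREFIX + pid)
    return urls


# Invented entries standing in for parsed lines of the index.
sample_entries = [
    {"id": "aaa", "playlist_id": "PL1"},
    {"id": "bbb", "playlist_id": "PL1"},
    {"id": "ccc", "playlist_id": "PL2"},
    {"id": "ddd", "playlist_id": "PL1"},
]
for url in playlist_urls(sample_entries):
    print(url)
```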
Status
YouTube playlists
These extra playlists look like they contain more than one course and may merit special treatment:
playlist | downloaded | uploaded |
---|---|---|
Fall 2012 Courses, Part 4 (200 videos) | | |
Spring 2013 Courses, Part 1 (193 videos) | | |
Spring 2013 Courses, Part 2 (196 videos) | | |
Spring 2013 Courses, Part 3 (199 videos) | | |
Spring 2013 Courses (116 videos) | | |
YouTube videos without playlists
Nothing yet. We still have to find out which videos are in videos.json but not in playlists.json, and deal with them separately.
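One possible way to find those videos is a set difference on video IDs. The sketch below assumes each parsed entry in both indexes carries the video ID in an id field, as youtube-dl's JSON dumps do; the function name and sample entries are hypothetical.

```python
def videos_without_playlist(video_entries, playlist_entries):
    """Return IDs present in the videos index but absent from the playlists index."""
    in_playlists = {e["id"] for e in playlist_entries if "id" in e}
    missing = []
    seen = set()
    for entry in video_entries:
        vid = entry.get("id")
        if vid and vid not in in_playlists and vid not in seen:
            seen.add(vid)
            missing.append(vid)
    return missing


# Invented entries standing in for the two parsed indexes.
sample_videos = [{"id": "aaa"}, {"id": "bbb"}, {"id": "ccc"}, {"id": "bbb"}]
sample_playlists = [
    {"id": "aaa", "playlist_id": "PL1"},
    {"id": "ccc", "playlist_id": "PL2"},
]
print(videos_without_playlist(sample_videos, sample_playlists))
```

With the real data, each list could be built by reading an index with gzip.open(path, "rt") and calling json.loads on every line.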
iTunes U
Nothing yet.
Shutdown notice
2017-03-01 http://news.berkeley.edu/2017/03/01/course-capture/
Cathy Koshland, UC Berkeley vice chancellor for undergraduate education, sent this message to the campus community today:
Dear Campus Community,
I wanted to share with you the decision to restrict access to our legacy Course Capture (classroom lecture) videos and podcasts, currently searchable at webcast.berkeley.edu and found on YouTube and UC Berkeley iTunesU, to members of the campus community.
As part of the campus’s ongoing effort to improve the accessibility of online content, we have determined that instead of focusing on legacy content that is 3-10 years old, much of which sees very limited use, we will work to create new public content that includes accessible features. Our public legacy libraries on YouTube and iTunesU include over 20,000 publications. This move will also partially address recent findings by the Department of Justice which suggests that the YouTube and iTunesU content meet higher accessibility standards as a condition of remaining publicly available. Finally, moving our content behind authentication allows us to better protect instructor intellectual property from “pirates” who have reused content for personal profit without consent.
Since fall 2015 we have piloted publishing all of our Course Capture content behind CAS/CalNet authentication. This strategy has enhanced our ability to accommodate students and UC Berkeley community members who have demonstrated an accessibility need, and we have concluded that authentication is an intervention that is appropriately responsive to the Berkeley community.
We will continue to evaluate the role of online Course Capture and distribution in tandem with advances in technology befitting the No. 1 public institution in the country. Berkeley will maintain its commitment to sharing content to the public through our partnership with EdX (edx.org). This free and accessible content includes a wide range of educational opportunities and topics from across higher ed.
Beginning March 15, 2017, access to iTunesU course content will be suspended. On the same day we will begin the process of moving the publicly offered YouTube content made from the current legacy channel [youtube.com/ucberkeley] to a new authentication login required channel. The entire process is expected to take three to five months. During this time the ETS team will migrate the videos into the new channel behind CalNet/CAS authentication. Berkeley users seeking to view this older content will be able to access it by logging into YouTube with their bConnected/Google-supported identity.
To help manage the instructional impact, instructors with legacy content have been contacted. Instructors utilizing the ETS Course Capture service since fall 2015 will experience no changes in viewing or accessing content.
Enrolled Berkeley students requiring accommodations will continue to receive support through the Disabled Students Program.
Finally, as we continue to strive for inclusion and effective teaching and learning for all members of the campus community, we encourage you to reference a new campus website designed to help instructors identify best practices and techniques in creating accessible course content for all users: accesscontent.berkeley.edu.
For additional information, please review this FAQ document.
2017-02-24 http://news.berkeley.edu/2017/02/24/faq-on-legacy-public-course-capture-content/
Here is additional information to assist the campus community and the public with upcoming changes to UC Berkeley’s library of legacy public Course Capture (classroom lecture) content from webcast.berkeley.edu, located on YouTube and UC Berkeley iTunesU.
- Who uses this content? How much of the content is used/watched?
- Course recordings are a study-tool for current students. Results from a recent review of our legacy (2006-2015) public course recordings on YouTube show that the average video is watched for less than eight minutes.
- Who are the “pirates” mentioned in the CalMessage?
- Pirates is a term used to describe websites that embed YouTube content without the permission of the original copyright holder for profit. UC Berkeley legacy Course Capture content has been discovered on for-profit websites, which use either a subscription fee or on-page advertising.
- Why now? Is this related to the DOJ letter?
- UC Berkeley stopped posting course lecture videos publicly through webcast.berkeley.edu in 2015 as a way to reduce costs and increase adoption. However, we left legacy content from 2006-2015 in place. The Department of Justice letter indicates that they believe our legacy Course Capture content from webcast.berkeley.edu and located on YouTube and iTunesU is in violation of the Americans with Disabilities Act. We are removing the legacy webcast.berkeley.edu content from public access to focus on making future public content more accessible. Instructors are encouraged to reference accesscontent.berkeley.edu for best practices and resources for making course content accessible.
- If we don’t add captions and descriptions, what happens?
- Failure to meet the expectations of the Department of Justice could mean potential legal and financial ramifications.
- What about current students who need captioning?
- ETS and the Disabled Students Program (DSP) have been partnering over the last several years to identify courses requiring captioning based on student need. The partnership and support of students working with DSP will continue.
- What will happen to the recordings?
- Beginning March 15, 2017, iTunesU Course Capture content will be removed. You may continue to use/download course capture content until that date. Other content in this location such as events, KALX and Public Affairs content will remain available after March 15. On the same day ETS will begin moving the publicly offered YouTube course capture content from the current legacy channel [youtube.com/ucberkeley] to a new authentication login-required channel. The entire process is expected to take three to five months. Berkeley users seeking to view this older content will be able to access it by logging into YouTube with their bConnected/Google supported identity. Instructors with course recordings on YouTube recorded fall 2015 or later will experience no change. Individual video URLs (links) will remain unchanged. Instructors currently using impacted recordings are encouraged to contact the Course Capture team to identify ways to mitigate any effect on their courses: coursecapture@berkeley.edu
- How long will videos be interrupted?
- The entire process to migrate the public YouTube videos from their current location to a new YouTube channel that will be accessible with campus member’s bConnected/Google supported identity will take 8-10 weeks and begin on March 15, 2017. Each video will be unavailable on bCourses for 2-3 business days. If you are a current instructor using impacted legacy recordings please contact the Course Capture team to review your needs: coursecapture@berkeley.edu
- If I have other videos that I want to get captioned or audio described, how would I do that?
- While speech-to-text tools continue to improve, effective captioning remains a very manual process. The UC System has recently introduced contracts with several vendors to provide captioning services. The vendor transcribes a recording and adds the text to the appropriate YouTube video, or a transcriber may be hired to caption an event live. At UC Berkeley, content created/captured by Berkeley Video and Berkeley AV is now being captioned. Information on audio description best practices are available at: https://webaccess.berkeley.edu/resources/tips/audio-description and https://webaccess.berkeley.edu/ask-pecan/descriptive-audio
- I’m using the impacted recordings (iTunesU or spring 2015 or earlier YouTube content) in my course now. What should I do?
- ETS is working hard to mitigate impacts to current instruction. If you already have a list of your video links, you have no additional steps to take. Video URLs will remain unchanged. If you need assistance or have additional concerns, please contact the Course Capture team to review your needs: coursecapture@berkeley.edu
- I am an instructor who is using impacted recordings (iTunesU or spring 2015 or earlier YouTube content) for something outside of UC Berkeley. What should I do?
- If you are an instructor using legacy recordings currently available to the public as an extension of your research or teaching, please contact the Course Capture team: coursecapture@berkeley.edu
- Why was the public not notified before webcast.berkeley.edu content disappeared so that we had a chance to download iTunes legacy content?
- We added notifications to our sites and provided a warning before content began to be removed. The legacy content on webcast.berkeley.edu located on YouTube and UC Berkeley’s iTunes U is three to ten years old.
- I am a Berkeley instructor who wants to use old content in my class, where can I find the URL to share with my students?
- Before videos are migrated: Instructors can copy/paste their YouTube links for future reference. Link URLs will remain unchanged. Educational Technology Services (ETS) is working to modify webcast.berkeley.edu so that videos are accessible to UC Berkeley CalNet users starting in April. Instructors with immediate questions can contact the Course Capture team: coursecapture@berkeley.edu
- Can I get a copy of my old lectures from YouTube to use personally?
- Currently, ETS doesn’t have a service that provides copies of recordings to individuals.
- I am a Berkeley CalNet user, so why can’t I search for videos and playlists that I used to be able to see on webcast.berkeley.edu?
- The process that allows us to place the videos behind authentication removes playlists and content search options. ETS is working to provide campus users a new website that will function as a directory of recordings that should launch sometime in April on the existing webcast.berkeley.edu site.
- Can I still find previous events and other non-Course Capture recordings on YouTube?
- The public UC Berkeley Events Channel (youtube.com/ucberkeleyevents) will continue to be available. Many recordings at this location are already captioned and plans are in place to caption future content.