|Project status||Special Case (Archives to be deleted)|
|Archiving status||In progress...|
|Project source||Phase 1, Phase 2, Items|
|Project tracker||Phase 1, Phase 2|
|IRC channel||(on EFnet)|
Justin.tv—sorry, cough, I mean to say—Twitch.tv is a live video streaming service.
Changes To VODs On Twitch
Aug 06 2014 · Engineering, Tech
Our goal at Twitch is straightforward: deliver the highest quality video. This includes the ability to watch video on demand (VOD) on all of our platforms, not just the website.
In order to create a system that supports live and VOD across the globe and on multiple platforms, we need to make significant changes to the way we’re currently storing video. Today, we’d like to discuss what these changes are, why they’re necessary, and how they benefit the entire Twitch community now and in the future.
Looking at Viewership Data
We found that the vast majority of past broadcast views happen within the first two weeks after they’re created. On the days following, viewership reduces exponentially.
We also discovered that 80% of our storage capacity is filled with past broadcasts that are never watched. That’s multiple petabytes for video that no one has ever viewed.
Highlights, on the other hand, have much more value and longevity. Over their lifetime, highlights get 9x as many views as past broadcasts.
As for existing past broadcasts, beginning three weeks from today, we will begin removing them from Twitch servers. If you would like to keep your past broadcasts, we encourage you to begin exporting or making highlights of your best moments so that they’re saved for posterity.
- HTML page requests: http://secure.twitch.tv/swflibs/TwitchPlayer.swf?videoId=a387099879
- Flash requests: https://api.twitch.tv/api/videos/a387099879?as3=t
- You can just type it directly as well: http://www.twitch.tv/twitchplayspokemon/b/503249758 → https://api.twitch.tv/api/videos/a503249758?as3=t
- There's also this: https://api.justin.tv/api/broadcast/by_archive/503249758.json?onsite=true
- The JSON file contains a list of URLs to the video's FLV files.
- Highlights: https://api.twitch.tv/api/videos/c2673085?as3=t (notice the start and end offsets)
- http://www.twitchtools.com/video-download.php provides the above service
- youtube-dl -i appears to handle some of them
- Scraping: https://api.twitch.tv/kraken/videos/top?limit=20&offset=0&period=all
- Are there any irregularities? Differences between highlights and past broadcasts?
- How to decide which are important? 10+ views again? Do a discovery crawl first?
- Tahoe-LAFS? Grab all the videos into temp storage?
- Compress all the unwatched videos into postage stamp sized videos?
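The URL patterns in the notes above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names are hypothetical, the "b" to "a" prefix swap is taken from the single twitchplayspokemon example, and the "/c/" page path for highlights is assumed rather than confirmed. The code builds URLs only and makes no network requests.

```python
import re
from urllib.parse import urlencode, urlparse

# Page-URL path letter -> API id prefix. Only "b" -> "a" (past broadcasts)
# appears in the notes; "c" for highlights is an assumption.
PREFIX_MAP = {"b": "a", "c": "c"}


def api_url_from_page_url(page_url):
    """Turn http://www.twitch.tv/<channel>/<b|c>/<id> into the video API URL."""
    path = urlparse(page_url).path
    match = re.fullmatch(r"/[^/]+/([a-z])/(\d+)/?", path)
    if match is None or match.group(1) not in PREFIX_MAP:
        raise ValueError("unrecognised video page URL: %s" % page_url)
    prefix = PREFIX_MAP[match.group(1)]
    return "https://api.twitch.tv/api/videos/%s%s?as3=t" % (prefix, match.group(2))


def top_videos_urls(pages, limit=20, period="all"):
    """Yield one 'top videos' listing URL per page, stepping the offset."""
    for page in range(pages):
        query = urlencode(
            [("limit", limit), ("offset", page * limit), ("period", period)]
        )
        yield "https://api.twitch.tv/kraken/videos/top?" + query
```

A discovery crawl could walk `top_videos_urls(...)` first and feed each video it finds through `api_url_from_page_url(...)` to locate the JSON file listing the FLV URLs.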
How can I help?
Download and fire up your Warrior! Then select Twitch Phase 2. Better yet, select Archive Team's Choice.
Alternatively, advanced users can run the scripts manually. Details are described in the source code repos.
Don't forget to donate to the Internet Archive, which will be hosting these files. Disk space is cheap, but maintaining it is not!
For those not using the Warrior
Please run these sysctl tweaks to optimize uploads:
# Add to /etc/sysctl.conf and run "sysctl -p"
# increase TCP max buffer size settable using setsockopt()
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
# increase Linux autotuning TCP buffer limit
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 65536 16777216
You can also apply them without modifying /etc/sysctl.conf, but be aware that settings applied this way won't persist across reboots. For example:
sysctl net.core.rmem_max=16777216 net.core.wmem_max=16777216
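To check whether the tweaks are actually in place, something like the following Python sketch may help. It is not part of the grab scripts; the function names are hypothetical, and it treats any value that differs from the recommended one as missing, even if the current value is larger.

```python
# Recommended values, copied from the sysctl tweaks above.
RECOMMENDED = {
    "net.core.rmem_max": "16777216",
    "net.core.wmem_max": "16777216",
    "net.ipv4.tcp_rmem": "4096 87380 16777216",
    "net.ipv4.tcp_wmem": "4096 65536 16777216",
}


def parse_sysctl(text):
    """Parse sysctl.conf-style 'key = value' lines, ignoring comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", ";")) or "=" not in line:
            continue
        key, value = line.split("=", 1)
        # Collapse runs of whitespace so tab-separated values compare equal.
        settings[key.strip()] = " ".join(value.split())
    return settings


def missing_tweaks(settings):
    """Return the recommended entries that are absent or set differently."""
    return {k: v for k, v in RECOMMENDED.items() if settings.get(k) != v}


def read_live_settings():
    """Read the current values from /proc/sys (Linux only)."""
    live = {}
    for key in RECOMMENDED:
        path = "/proc/sys/" + key.replace(".", "/")
        with open(path) as handle:
            live[key] = " ".join(handle.read().split())
    return live
```

On a Linux machine, `missing_tweaks(read_live_settings())` returns an empty dict once all four values match; `parse_sysctl` can be pointed at the contents of /etc/sysctl.conf to check the persistent configuration instead.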
apt-get install git git-core libgnutls-dev lua5.1 liblua5.1-0 liblua5.1-0-dev screen python-dev python-pip bzip2 zlib1g-dev
git clone https://github.com/ArchiveTeam/twitchtv-grab
cd ./twitchtv-grab
pip install seesaw
./get-wget-lua.sh
...
pip install requests
What we are saving
- Videos with X or more views
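As a rough illustration of a view-count cutoff, the sketch below filters a list of video records by a minimum number of views. The threshold X has not been decided, so it is left as a parameter; the function name and the assumption that each record is a dict with integer "views" and "id" fields are hypothetical.

```python
def select_videos(videos, min_views):
    """Keep only the records whose view count is at least min_views.

    Assumes each record is a dict with an integer "views" field;
    records without that field are treated as having zero views.
    """
    return [video for video in videos if video.get("views", 0) >= min_views]


# Example records (made up for illustration, not real Twitch data).
example = [
    {"id": "a1", "views": 3},
    {"id": "a2", "views": 250},
    {"id": "c3"},
]
kept = select_videos(example, min_views=10)
```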
Anything culturally significant to add? Comment on Talk:Twitch.tv. Don't forget to sign your comments with four tildes (~~~~).
By Archive Team
TODO: Archives will be made available later as WARC files and will be accessible through the Wayback Machine. A searchable index will be made later.
Renegade Stream Archives
These archives are made in a manual fashion through the efforts of streaming communities. Feel free to expand this list.
- Vinesauce Stream Archival Effort - A crowdsourced effort by fans of the Vinesauce Group to archive 1714 of their streams.
- Klaxa.eu's Archive of The 4chan Cup - An existing, complete archive of The 4chan Cup, starting from the 2014 Autumn Games up till today.