Twitch.tv/Warroom
Justin.tv—sorry, cough, I mean to say—Twitch.tv is a live video streaming service.
Twitch was rumored to have been acquired by YouTube/Google, but Amazon was the final buyer.[1]
Shutdown
Changes To VODs On Twitch
Aug 06 2014 · Engineering, Tech
Our goal at Twitch is straightforward: deliver the highest quality video. This includes the ability to watch video on demand (VOD) on all of our platforms, not just the website.
In order to create a system that supports live and VOD across the globe and on multiple platforms, we need to make significant changes to the way we’re currently storing video. Today, we’d like to discuss what these changes are, why they’re necessary, and how they benefit the entire Twitch community now and in the future.
[...]
Looking at Viewership Data
We found that the vast majority of past broadcast views happen within the first two weeks after they’re created. On the days following, viewership reduces exponentially.
We also discovered that 80% of our storage capacity is filled with past broadcasts that are never watched. That’s multiple petabytes for video that no one has ever viewed.
Highlights, on the other hand, have much more value and longevity. Over their lifetime, highlights get 9x as many views as past broadcasts.
[...]
As for existing past broadcasts, beginning three weeks from today, we will begin removing them from Twitch servers. If you would like to keep your past broadcasts, we encourage you to begin exporting or making highlights of your best moments so that they’re saved for posterity.
[...][2]
Site structure
- The HTML page requests: http://secure.twitch.tv/swflibs/TwitchPlayer.swf?videoId=a387099879
- The Flash player then requests: https://api.twitch.tv/api/videos/a387099879?as3=t
- You can also build the API URL directly from a past broadcast URL by prefixing the ID with "a": http://www.twitch.tv/twitchplayspokemon/b/503249758 → https://api.twitch.tv/api/videos/a503249758?as3=t
- There is also this endpoint: https://api.justin.tv/api/broadcast/by_archive/503249758.json?onsite=true
- The JSON file contains a list of URLs to the broadcast's FLV files.
- Highlights: https://api.twitch.tv/api/videos/c2673085?as3=t (notice the start and end offsets)
- http://www.twitchtools.com/video-download.php provides the above service
- youtube-dl -i appears to handle some of them.
- Scraping: https://api.twitch.tv/kraken/videos/top?limit=20&offset=0&period=all
- Are there any irregularities? Differences between highlights and past broadcasts?
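The URL patterns above can be turned into request URLs mechanically. The sketch below assumes, from the single twitchplayspokemon example, that a /b/<id> page maps to the api/videos endpoint with an "a" prefix on the ID, and that the kraken top-videos listing pages by advancing offset in steps of limit. The regex and function names are hypothetical, the endpoints are simply those listed on this page and may no longer respond, and the code only builds URLs without making any network requests.

```python
import re
from typing import Iterator
from urllib.parse import urlencode

# Matches past-broadcast page URLs like http://www.twitch.tv/<channel>/b/<id>
# (pattern inferred from the single example on this page).
PAST_BROADCAST_RE = re.compile(
    r"^https?://(?:www\.)?twitch\.tv/(?P<channel>[^/]+)/b/(?P<id>\d+)/?$"
)


def past_broadcast_api_url(page_url: str) -> str:
    """Map a /b/<id> page URL to the api/videos URL, prefixing the ID with 'a'."""
    match = PAST_BROADCAST_RE.match(page_url)
    if match is None:
        raise ValueError(f"not a past-broadcast URL: {page_url!r}")
    return f"https://api.twitch.tv/api/videos/a{match.group('id')}?as3=t"


def top_videos_page_urls(pages: int, limit: int = 20) -> Iterator[str]:
    """Yield kraken top-videos listing URLs, advancing offset by limit per page."""
    for page in range(pages):
        query = urlencode({"limit": limit, "offset": page * limit, "period": "all"})
        yield f"https://api.twitch.tv/kraken/videos/top?{query}"


print(past_broadcast_api_url("http://www.twitch.tv/twitchplayspokemon/b/503249758"))
# https://api.twitch.tv/api/videos/a503249758?as3=t
```

The first page produced by top_videos_page_urls is exactly the scraping URL listed above; each later page shifts offset by another 20.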
Storage Issues
- How do we decide which are important? Use 10+ views again? Do a discovery crawl first?
- Tahoe-LAFS? Grab all the videos into temp storage?
- Compress all the unwatched videos into postage stamp sized videos?
Discovery
All items discovered are located at twitchtv-items. A collated JSON dump is available.
How can I help?
Download and fire up your warrior! Then select Twitch Phase 2. Better yet, select Archive Team's Choice.
Alternatively, advanced users can run the scripts manually; see below.
Don't forget to donate to the Internet Archive who will be hosting these files. Disk space is cheap but maintaining them is not!
For those not using the Warrior
Advanced User Quick Start
Please run these sysctl tweaks to optimize uploads:
# Add to /etc/sysctl.conf and run "sysctl -p"
# increase TCP max buffer size settable using setsockopt()
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
# increase Linux autotuning TCP buffer limit
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 65536 16777216
You can also apply them without modifying /etc/sysctl.conf by running, e.g.:
sysctl net.core.rmem_max=16777216 net.core.wmem_max=16777216
Be aware that settings applied this way do not persist across reboots.
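To confirm the tweaks took effect, one option on Linux is to read the values back from /proc/sys, where a sysctl key a.b.c appears as the file /proc/sys/a/b/c. A minimal sketch under that assumption; the helper names are hypothetical, and it checks for exact equality with the values above, so a machine already configured with larger buffers would also be reported as a mismatch.

```python
from pathlib import Path
from typing import Dict, Optional

# Target values from the sysctl tweaks above (16777216 bytes = 16 MiB).
RECOMMENDED: Dict[str, str] = {
    "net.core.rmem_max": "16777216",
    "net.core.wmem_max": "16777216",
    "net.ipv4.tcp_rmem": "4096 87380 16777216",
    "net.ipv4.tcp_wmem": "4096 65536 16777216",
}

PROC_SYS = Path("/proc/sys")


def sysctl_path(key: str, root: Path = PROC_SYS) -> Path:
    """On Linux, sysctl key a.b.c is exposed as the file /proc/sys/a/b/c."""
    return root.joinpath(*key.split("."))


def read_settings(root: Path = PROC_SYS) -> Dict[str, Optional[str]]:
    """Read the raw current value of each recommended key (None if unreadable)."""
    current: Dict[str, Optional[str]] = {}
    for key in RECOMMENDED:
        try:
            current[key] = sysctl_path(key, root).read_text()
        except OSError:
            current[key] = None
    return current


def mismatched(current: Dict[str, Optional[str]]) -> Dict[str, Optional[str]]:
    """Return {key: normalized current value} for keys that differ from RECOMMENDED."""
    result: Dict[str, Optional[str]] = {}
    for key, wanted in RECOMMENDED.items():
        raw = current.get(key)
        # Multi-value entries such as tcp_rmem are tab-separated in /proc; collapse whitespace.
        value = " ".join(raw.split()) if raw is not None else None
        if value != wanted:
            result[key] = value
    return result
```

On the machine running the pipeline, mismatched(read_settings()) should return {} once all four values are in effect.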
apt-get install git git-core libgnutls-dev lua5.1 liblua5.1-0 liblua5.1-0-dev screen python-dev python-pip bzip2 zlib1g-dev
git clone https://github.com/ArchiveTeam/twitchtv-grab
cd ./twitchtv-grab
pip install seesaw
./get-wget-lua.sh
...
pip install requests
The wget-lua build may have failed in the previous step; if so, run:
cd get-wget-lua.tmp
mv src/wget ../wget-lua
cd ..
Finally, to actually run the pipeline:
run-pipeline pipeline.py --concurrent 2 YOURNICKHERE --disable-web-server
For troubleshooting and further details, please see the README.
What we are saving
Currently:
- twitchplayspokemon: "test" run, estimated 3 TB ($6000)
- These videos selected from these channels with 100 or more views: estimated 23 TB ($46000)
- These videos selected from these channels with 100 or more views: estimated 0.7 TB ($1400)
- These videos selected from these channels with 100 or more views: estimated 4 TB ($8000)
- These top videos which have 10000 or more views: estimated 20 TB ($40000)
- These videos selected from these channels with 100 or more views: estimated 8 TB ($16000)
- These videos selected from SocialBlade's top Twitch channels with 5000 or more views: estimated 10 TB ($20000)
- These videos selected from these channels with 100 or more views: estimated 15 TB ($30000)
- These videos, from previous suggestions, that are most viewed per channel: estimated 0.1 TB ($200)
- These videos selected from these channels with 100 or more views: estimated 7 TB ($14000)
- These videos selected from these channels with 100 or more views or most viewed video per channel: estimated 1.6 TB ($3200)
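Several of the selections above combine a view-count floor with "most viewed video per channel". A minimal sketch of that rule, assuming a hypothetical Video record with video_id, channel, and views fields (the discovery data's actual field names may differ); it illustrates the criterion and is not the script the project actually used.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Video:
    # Hypothetical record shape; the real discovery data may use different fields.
    video_id: str
    channel: str
    views: int


def select_videos(videos: Iterable[Video], min_views: int = 100) -> List[Video]:
    """Keep videos with at least min_views views, plus each channel's most-viewed video."""
    videos = list(videos)
    top_per_channel: Dict[str, Video] = {}
    for video in videos:
        best = top_per_channel.get(video.channel)
        if best is None or video.views > best.views:
            top_per_channel[video.channel] = video
    keep_ids = {v.video_id for v in videos if v.views >= min_views}
    keep_ids.update(v.video_id for v in top_per_channel.values())
    # Preserve input order so the resulting item list is stable and reproducible.
    return [v for v in videos if v.video_id in keep_ids]
```

Ties for a channel's top spot go to the first video seen, because the comparison is a strict `>`. Passing a higher min_views, such as 10000, approximates the "top videos" style of selection.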
Next:
- Sorry, no more suggestions! More suggestions may be considered if you donate to the Internet Archive.
Dollar figures are shown to illustrate the cost of permanent archiving. They are not actual costs; they are simplified values meant to serve as a sane budget. Dollars are in USD, estimated at $2000 per TB (a figure covering more than the cost of disk space alone).
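As a check on the figures above, summing the size estimates in the "Currently" list at the flat $2000-per-TB rate reproduces the total of the listed dollar amounts. The sketch uses Decimal so that values such as 0.7 and 0.1 add exactly; the variable and function names are only for illustration.

```python
from decimal import Decimal

# Size estimates in TB from the "Currently" list above, in list order.
ESTIMATES_TB = ["3", "23", "0.7", "4", "20", "8", "10", "15", "0.1", "7", "1.6"]
USD_PER_TB = Decimal("2000")  # flat budgeting rate stated above


def budget(estimates_tb):
    """Return (total TB, total USD) at the flat per-TB rate, using exact decimal arithmetic."""
    total_tb = sum((Decimal(tb) for tb in estimates_tb), Decimal("0"))
    return total_tb, total_tb * USD_PER_TB


total_tb, total_usd = budget(ESTIMATES_TB)
print(f"{total_tb} TB -> ${total_usd:,.0f}")  # 92.4 TB -> $184,800
```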
Channels not included:
- speeddemosarchivesda: already in IA
- vinesauce: avoiding duplication, see below
- and others
Anything culturally significant to add? Comment on Talk:Twitch.tv. Don't forget to sign your comments with ~~~~.
References