|Status||Special case (archives of streams actively purged after an amount of time)|
|Archiving status||Partially saved (popular videos only)|
|Project source||Phase 1, Phase 2, Items, Index|
|Project tracker||Phase 1, Phase 2|
|IRC channel||#burnthetwitch (on hackint)|
Justin.tv—sorry, cough, I mean to say—Twitch.tv is a live video streaming service.
Twitch was rumored to have been acquired by YouTube/Google, but Amazon was the final buyer.
Changes To VODs On Twitch
Aug 06 2014 · Engineering, Tech
Our goal at Twitch is straightforward: deliver the highest quality video. This includes the ability to watch video on demand (VOD) on all of our platforms, not just the website.
In order to create a system that supports live and VOD across the globe and on multiple platforms, we need to make significant changes to the way we’re currently storing video. Today, we’d like to discuss what these changes are, why they’re necessary, and how they benefit the entire Twitch community now and in the future.
Looking at Viewership Data
We found that the vast majority of past broadcast views happen within the first two weeks after they’re created. On the days following, viewership reduces exponentially.
We also discovered that 80% of our storage capacity is filled with past broadcasts that are never watched. That’s multiple petabytes for video that no one has ever viewed.
Highlights, on the other hand, have much more value and longevity. Over their lifetime, highlights get 9x as many views as past broadcasts.
As for existing past broadcasts, beginning three weeks from today, we will begin removing them from Twitch servers. If you would like to keep your past broadcasts, we encourage you to begin exporting or making highlights of your best moments so that they’re saved for posterity.
- HTML page requests: http://secure.twitch.tv/swflibs/TwitchPlayer.swf?videoId=a387099879
- Flash requests: https://api.twitch.tv/api/videos/a387099879?as3=t
- You can just type it directly as well: http://www.twitch.tv/twitchplayspokemon/b/503249758 → https://api.twitch.tv/api/videos/a503249758?as3=t
- There's also this: https://api.justin.tv/api/broadcast/by_archive/503249758.json?onsite=true
- JSON file contains list of URLs to their FLV files.
- Highlights: https://api.twitch.tv/api/videos/c2673085?as3=t (notice the start and end offsets)
- http://www.twitchtools.com/video-download.php provides the above service
- youtube-dl -i appears to do some of them
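The page-URL-to-API-URL mapping shown in the list above can be sketched in a few lines of Python. This is only an illustration of the string rewrite for past broadcasts (`/b/<id>` becoming video ID `a<id>`); the function name `broadcast_api_url` is hypothetical, and the endpoints are the ones recorded above, which may no longer respond.

```python
import re

# Past-broadcast page URLs look like http://www.twitch.tv/<channel>/b/<numeric id>,
# as in the twitchplayspokemon example above.
_BROADCAST_URL = re.compile(r"^https?://(?:www\.)?twitch\.tv/[^/]+/b/(\d+)/?$")


def broadcast_api_url(page_url: str) -> str:
    """Map a past-broadcast page URL to the Flash-player API URL (hypothetical helper)."""
    match = _BROADCAST_URL.match(page_url)
    if match is None:
        raise ValueError(f"not a past-broadcast URL: {page_url!r}")
    # The /b/<id> in the page URL becomes video ID "a<id>" in the API URL.
    return f"https://api.twitch.tv/api/videos/a{match.group(1)}?as3=t"


print(broadcast_api_url("http://www.twitch.tv/twitchplayspokemon/b/503249758"))
# prints https://api.twitch.tv/api/videos/a503249758?as3=t
```

Highlights use a `c` prefix in the API URL, as the example above shows; the sketch deliberately handles only the past-broadcast case that the list documents end to end.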
- Scraping: https://api.twitch.tv/kraken/videos/top?limit=20&offset=0&period=all
- Are there any irregularities? Differences between highlights and past broadcasts?
- How to decide which are important? 10+ views again? Do a discovery crawl first?
- Tahoe-LAFS? Grab all the videos into temp storage?
- Compress all the unwatched videos into postage stamp sized videos?
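For the discovery crawl idea, the `kraken/videos/top` scraping URL listed above takes `limit` and `offset` parameters, so successive pages can be generated mechanically. The sketch below assumes that `offset` counts items rather than pages (so it advances by `limit` each time); that is an assumption, not something confirmed here, and the helper name `top_video_page_urls` is hypothetical.

```python
from urllib.parse import urlencode

TOP_VIDEOS_URL = "https://api.twitch.tv/kraken/videos/top"


def top_video_page_urls(pages: int, limit: int = 20, period: str = "all"):
    """Yield URLs for successive pages of the top-videos listing."""
    for page in range(pages):
        # Assumption: offset is an item count, so each page starts `limit` items later.
        query = urlencode({"limit": limit, "offset": page * limit, "period": period})
        yield f"{TOP_VIDEOS_URL}?{query}"


for url in top_video_page_urls(3):
    print(url)
```

The first URL produced is identical to the one in the list; only the URLs are built here, fetching and parsing the JSON is left out.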
All items discovered are located at twitchtv-items. A collated JSON dump is available.
How can I help?
Download and fire up your warrior! Then select Twitch Phase 2. Better yet, select Archive Team's Choice.
Alternatively, advanced users can run the scripts manually; see below.
Don't forget to donate to the Internet Archive, which will be hosting these files. Disk space is cheap, but maintaining it is not!
For those not using the Warrior
Advanced User Quick Start
Please run these sysctl tweaks to optimize uploads:
# Add to /etc/sysctl.conf and run "sysctl -p"
# increase TCP max buffer size settable using setsockopt()
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
# increase Linux autotuning TCP buffer limit
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 65536 16777216
You can also apply them without modifying /etc/sysctl.conf by running, for example:
sysctl net.core.rmem_max=16777216 net.core.wmem_max=16777216
Be aware that settings applied this way do not persist across reboots.
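If you prefer the one-off form for all four settings, the command line can be derived from the sysctl.conf-style text. The following is a minimal sketch, not part of the project scripts: `sysctl_command` is a hypothetical helper that only builds and prints the command, which you would then run as root yourself.

```python
import shlex

SYSCTL_CONF = """\
# increase TCP max buffer size settable using setsockopt()
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
# increase Linux autotuning TCP buffer limit
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 65536 16777216
"""


def sysctl_command(conf_text: str) -> str:
    """Build a single `sysctl key=value ...` command from sysctl.conf-style text."""
    args = ["sysctl"]
    for line in conf_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, _, value = line.partition("=")
        # Collapse whitespace so multi-value settings become "a b c".
        args.append(f"{key.strip()}={' '.join(value.split())}")
    # shlex.join quotes any argument that contains spaces.
    return shlex.join(args)


print(sysctl_command(SYSCTL_CONF))
```

The multi-value `tcp_rmem` and `tcp_wmem` settings come out as single quoted arguments so the shell passes each one to sysctl intact.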
apt-get install git git-core libgnutls-dev lua5.1 liblua5.1-0 liblua5.1-0-dev screen python-dev python-pip bzip2 zlib1g-dev
git clone https://github.com/ArchiveTeam/twitchtv-grab
cd ./twitchtv-grab
pip install seesaw
./get-wget-lua.sh
...
pip install requests
The wget-lua step may have failed earlier. If so, run:
cd get-wget-lua.tmp
mv src/wget ../wget-lua
cd ..
Finally, to actually run the pipeline:
run-pipeline pipeline.py --concurrent 2 YOURNICKHERE --disable-web-server
For troubleshooting and details, please see the README.
What we are saving
- twitchplayspokemon: "test" run, estimated 3 TB ($6000)
- These videos selected from these channels with 100 or more views: estimated 23 TB ($46000)
- These videos selected from these channels with 100 or more views: estimated 0.7 TB ($1400)
- These videos selected from these channels with 100 or more views: estimated 4 TB ($8000)
- These top videos which have 10000 or more views: estimated 20 TB ($40000)
- These videos selected from these channels with 100 or more views: estimated 8 TB ($16000)
- These videos selected from SocialBlade's top Twitch channels with 5000 or more views: estimated 10 TB ($20000)
- These videos selected from these channels with 100 or more views: estimated 15 TB ($30000)
- These videos, from previous suggestions, that are most viewed per channel: estimated 0.1 TB ($200)
- These videos selected from these channels with 100 or more views: estimated 7 TB ($14000)
- These videos selected from these channels with 100 or more views or most viewed video per channel: estimated 1.6 TB ($3200)
- Sorry, no more suggestions! More suggestions may be considered if you donate to the Internet Archive.
Dollar figures are shown to illustrate the cost of permanent archival. They are not actual costs but simplified values meant to act as a sane budget. Figures are in USD, estimated at $2000 per TB (this covers more than disk space alone).
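The dollar figures in the list follow directly from the $2000 per TB estimate. As a small worked example (the function name `archive_cost_usd` is made up for illustration):

```python
USD_PER_TB = 2000  # budget estimate stated in the text, not an actual price


def archive_cost_usd(terabytes: float) -> int:
    """Return the budgeted cost in USD for the given number of terabytes."""
    return round(terabytes * USD_PER_TB)


for tb in (3, 0.7, 23):
    print(f"{tb} TB -> ${archive_cost_usd(tb)}")
# prints:
# 3 TB -> $6000
# 0.7 TB -> $1400
# 23 TB -> $46000
```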
Channels not included:
- speeddemosarchivesda: already in IA
- vinesauce: avoiding duplication, see below
- and others
Anything culturally significant to add? Comment on Talk:Twitch.tv. Don't forget to sign your comments with
By Archive Team
Archives will be made available later as WARC files in the archiveteam_twitchtv collection at the Internet Archive. You can access them through the Wayback Machine, but you'll need to search an index to find the media files.
A work-in-progress searchable index is now available!
Renegade Stream Archives
These archives are made in a manual fashion through the efforts of streaming communities. Feel free to expand this list.
- Vinesauce Stream Archival Effort - A crowdsourced effort by fans of the Vinesauce Group to archive 1714 of their streams.
- Klaxa.eu's Archive of The 4chan Cup - An existing, complete archive of The 4chan Cup, starting from the 2014 Autumn Games up till today.
In June 2016, Twitch deleted a bunch of custom emoticons on the grounds of obscenity. (See http://www.dailydot.com/esports/twitch-butt-emotes/ for details.)
The emotes can be found in the backend at https://static-cdn.jtvnw.net/emoticons/v1/<id number>/<size>, where id number ranges from 1 to 103598 (as of 2016-06-22) with no leading zeroes, and size is 1.0, 2.0 or 3.0.
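The URL pattern lends itself to simple enumeration. Here is a rough sketch that generates the candidate URLs for a range of emote IDs at the three whole-number sizes; the names `emote_urls`, `SIZES` and `MAX_EMOTE_ID` are illustrative, and nothing is downloaded.

```python
EMOTE_URL = "https://static-cdn.jtvnw.net/emoticons/v1/{emote_id}/{size}"
SIZES = ("1.0", "2.0", "3.0")
MAX_EMOTE_ID = 103598  # highest ID as of 2016-06-22, per the text


def emote_urls(first_id: int = 1, last_id: int = MAX_EMOTE_ID):
    """Yield every emote URL for IDs first_id..last_id at each size."""
    for emote_id in range(first_id, last_id + 1):
        for size in SIZES:
            # IDs are written without leading zeroes, so plain str() formatting is enough.
            yield EMOTE_URL.format(emote_id=emote_id, size=size)


for url in emote_urls(9, 9):
    print(url)
```

A list produced this way could be fed to any crawler that accepts a URL list; IDs that no longer exist in the backend would simply fail to fetch.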
Note: sizes 0.5, 1.5, 2.5, 3.5, 4.0, 4.5 and 5.0 are valid as well, but for most (all?) emotes they return the same data as the next-highest whole-number size, or as 3.0 (the largest) when the requested size exceeds it. That is, requests for 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5 and 5.0 return the data for 1.0, 1.0, 2.0, 2.0, 3.0, 3.0, 3.0, 3.0, 3.0 and 3.0 respectively.
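The size mapping in the note amounts to rounding up to the next whole number and capping at 3. A small sketch of that rule follows; `effective_size` is a hypothetical name, and the mapping reflects the observed behaviour described above for most (possibly all) emotes rather than any documented API guarantee.

```python
import math

# Sizes the note lists as valid: 0.5, 1.0, ..., 5.0 in steps of 0.5.
VALID_SIZES = {n / 2 for n in range(1, 11)}


def effective_size(requested: float) -> str:
    """Return the whole-number size whose data a request for `requested` returns."""
    if requested not in VALID_SIZES:
        raise ValueError(f"not a listed size: {requested}")
    # Round up to the next whole number, but never beyond 3.0 (the largest size).
    return f"{min(math.ceil(requested), 3)}.0"


print([effective_size(n / 2) for n in range(1, 11)])
# prints ['1.0', '1.0', '2.0', '2.0', '3.0', '3.0', '3.0', '3.0', '3.0', '3.0']
```

This is why archiving only sizes 1.0, 2.0 and 3.0 should be sufficient to capture the distinct graphics.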
All emote graphics and sizes still existing in the backend system up to emote ID 103598 were archived through ArchiveBot with job IDs dmginny6vayrkxs90ed2kgs5t and b953b4vdqo8m2yjp3wtunl8on; the resulting WARCs can be downloaded through the viewer. Their associated chat 'shortnames' (e.g. "<3" for emote #9) cannot be easily determined and were not archived.
- "TPP'S VODS WILL BE SAVED IN THEIR ENTIRETY"
- "Amazon Might Buy Video Game Streaming Service Twitch Before Google Can"
- "Twitch is Trying to Restore Deleted VODs"