|Status||Special case (archives of streams actively purged after an amount of time)|
|Archiving status||Partially saved (popular videos only)|
|Project source||Phase 1, Phase 2, Items, Index|
|Project tracker||Phase 1, Phase 2|
|IRC channel|| (on hackint)|
(formerly #burnthetwitch (on EFnet))
Justin.tv—sorry, cough, I mean to say—Twitch.tv is a live video streaming service.
Broadcast retention changes
After Twitch's acquisition by Amazon, changes were made to how long broadcasts (sometimes called VODs) are retained for viewing on the site. Previously, all partnered accounts had indefinite storage, while standard accounts had storage for a few days. Storage for partnered accounts was then cut down to two months.
Changes To VODs On Twitch
Aug 06 2014 · Engineering, Tech
Our goal at Twitch is straightforward: deliver the highest quality video. This includes the ability to watch video on demand (VOD) on all of our platforms, not just the website.
In order to create a system that supports live and VOD across the globe and on multiple platforms, we need to make significant changes to the way we’re currently storing video. Today, we’d like to discuss what these changes are, why they’re necessary, and how they benefit the entire Twitch community now and in the future.
Looking at Viewership Data
We found that the vast majority of past broadcast views happen within the first two weeks after they’re created. On the days following, viewership reduces exponentially.
We also discovered that 80% of our storage capacity is filled with past broadcasts that are never watched. That’s multiple petabytes for video that no one has ever viewed.
Highlights, on the other hand, have much more value and longevity. Over their lifetime, highlights get 9x as many views as past broadcasts.
As for existing past broadcasts, beginning three weeks from today, we will begin removing them from Twitch servers. If you would like to keep your past broadcasts, we encourage you to begin exporting or making highlights of your best moments so that they’re saved for posterity.
Although most partner accounts have broadcasts deleted after 60 days, there are some exceptions. Most of these are esports tournament channels, but other channels may be excluded for reasons such as being culturally significant. Below is an incomplete list of these exceptions.
|Channel||Indefinite since||Probable reason|
|Dota 2 The International||Policy Inception||Esports tournament|
|Riot Games, LCS, LEC||Policy Inception||Game developer & esports tournaments|
|Beyond The Summit||Policy Inception||Esports broadcaster|
|Twitch Plays Pokemon||Policy Inception||Culturally significant|
|Games Done Quick||Policy Inception||Culturally significant (charity fundraiser)|
|Tip of the Hats||Policy Inception||Culturally significant (charity fundraiser)|
|Rocket League||April 2016||Game channel & esports tournaments|
|Evo||Policy Inception||Esports tournament|
|ESL CS:GO||Policy Inception||Esports broadcaster|
|DreamHack Counter-Strike||Policy Inception||Esports & LAN event broadcaster|
|shroud||Policy Inception?||Notable streamer & esports personality|
|Ninja||January 2018||Notable streamer|
|pokimane||Policy Inception||Notable streamer|
|StarCraft||Policy Inception||Game channel & esports tournaments|
|PlayOverwatch||August 2017?||Game channel & esports tournaments|
|PlayHearthstone||Policy Inception||Game channel & esports tournaments|
|Warframe||April 2017||Game channel|
|Fortnite||Channel Creation||Game channel & esports tournaments|
|FACEIT TV||Policy Inception||Esports service & tournaments|
|teamfortress.tv||Policy Inception||Esports broadcaster|
|The GD Studio||Policy Inception||Esports broadcaster|
|Room On Fire||Policy Inception||Esports broadcaster|
|Ninjas In Pyjamas||Policy Inception||Esports team/broadcaster|
|Bob Ross||June 2016||TV show & culturally significant|
|PGL||Policy Inception||Esports tournament organizer|
|Notch||Policy Inception||Minecraft creator|
|Porter Robinson||Policy Inception?||Musician|
|MOGRA||Policy Inception?||Music club in Akihabara, JP|
|Mojang||Policy Inception||Game developers, including Minecraft|
|Yogscast||Policy Inception||Culturally significant|
|Twitch||Policy Inception||The site (duh)|
|Twitch Presents||March 2017||Also the site, special event streams|
|The Game Awards||2016?||Awards show|
|The Esports Awards||June 2017||Awards show|
|AMD||Policy Inception||CPU/GPU manufacturer|
|NVIDIA GeForce||Policy Inception||GPU manufacturer (uses NVIDIA these days)|
|Xbox||Policy Inception||Console platform|
|PlayStation||Policy Inception||Console platform|
|Nintendo||Policy Inception||Game developer & console manufacturer|
|PAX||Policy Inception||Gaming convention|
|IGN||Policy Inception||Magazine & website|
|PokerStars||Policy Inception||Poker website & tournaments|
|joeykaotyk||July 7, 2017||Notable streamer|
|Pokemon||Policy Inception||Game franchise & company|
- HTML page requests: http://secure.twitch.tv/swflibs/TwitchPlayer.swf?videoId=a387099879
- Flash requests: https://api.twitch.tv/api/videos/a387099879?as3=t
- You can just type it directly as well: http://www.twitch.tv/twitchplayspokemon/b/503249758 → https://api.twitch.tv/api/videos/a503249758?as3=t
- There's also this: https://api.justin.tv/api/broadcast/by_archive/503249758.json?onsite=true
- JSON file contains list of URLs to their FLV files.
- Highlights: https://api.twitch.tv/api/videos/c2673085?as3=t (notice the start and end offsets)
- http://www.twitchtools.com/video-download.php provides the above service
- youtube-dl -i appears to handle some of them
- Scraping: https://api.twitch.tv/kraken/videos/top?limit=20&offset=0&period=all
- Are there any irregularities? Differences between highlights and past broadcasts?
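The URL patterns in the list above can be sketched in Python as follows. This is only an illustration of the string rewriting: the endpoints are the ones listed here and may well no longer respond. The function names and the `total` parameter are made up for this example, and nothing is fetched.

```python
import re
from urllib.parse import urlencode

# Endpoints copied from the list above; they may no longer be live.
API_BASE = "https://api.twitch.tv/api/videos/"
KRAKEN_TOP = "https://api.twitch.tv/kraken/videos/top"

# Matches past-broadcast pages such as
# http://www.twitch.tv/twitchplayspokemon/b/503249758
_BROADCAST_PAGE = re.compile(r"^https?://(?:www\.)?twitch\.tv/[^/]+/b/(\d+)/?$")


def broadcast_api_url(page_url: str) -> str:
    """Turn a /b/<id> page URL into the a<id> API URL (hypothetical helper)."""
    match = _BROADCAST_PAGE.match(page_url)
    if match is None:
        raise ValueError(f"not a past-broadcast URL: {page_url!r}")
    return f"{API_BASE}a{match.group(1)}?as3=t"


def top_video_pages(total: int, limit: int = 20, period: str = "all"):
    """Yield scraping URLs that page through the top-videos listing.

    `total` is an assumed upper bound on how many videos to walk through.
    """
    for offset in range(0, total, limit):
        query = urlencode({"limit": limit, "offset": offset, "period": period})
        yield f"{KRAKEN_TOP}?{query}"


if __name__ == "__main__":
    print(broadcast_api_url("http://www.twitch.tv/twitchplayspokemon/b/503249758"))
    for url in top_video_pages(total=60):
        print(url)
```

Highlights use a `c` prefix instead of `a` in the API URL, as in the example above; they are left out of this sketch.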
Twitch Chat Downloader
Download link: https://github.com/PetterKraabol/Twitch-Chat-Downloader
Twitch Chat Downloader (tcd) is software to download the Twitch chat of a given VOD, or of a selection of VODs by channel. tcd requires "Python 3.8 or newer" and a "Twitch client ID & client secret" in order to work. The client ID is easy to get: click "Login with Twitch" at https://dev.twitch.tv/console/apps and it will appear in the URL. The client secret is harder to get; you will probably have to log in to Twitch to get it. For more info on "Client ID" and "Client Secret", see https://dev.twitch.tv/docs/api and https://dev.twitch.tv/docs/authentication/register-app. While an account is not required to view the chat of a VOD, downloading it via tcd is loginwalled (you need a Twitch account and probably also API access).
Note that you will want to include (at least) the JSON output, since it preserves the most information about the chat.
Example command to download the chat:
tcd --channel "AOC" -o "." --format "default,irc,json,srt,ssa" --timezone "UTC" --verbose --debug --log
The output files are named after the Twitch video ID, with the given file extensions.
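If you want to run the same grab for several channels, a small wrapper can build the command line for you. The sketch below only uses flags that appear in the example command above; the helper name `build_tcd_command` and the second channel name are made up for illustration, and it assumes `tcd` is already installed and configured with your client ID and secret.

```python
import shlex
import subprocess


def build_tcd_command(channel, output_dir=".", formats=("json",), timezone="UTC"):
    """Build a tcd argument list using only flags from the example above."""
    return [
        "tcd",
        "--channel", channel,
        "-o", output_dir,
        "--format", ",".join(formats),
        "--timezone", timezone,
    ]


def download_chats(channels, dry_run=True):
    """Print each command, and run it unless dry_run is set."""
    for channel in channels:
        command = build_tcd_command(channel, formats=("json", "irc"))
        print(shlex.join(command))
        if not dry_run:
            # check=True raises CalledProcessError if tcd exits with an error.
            subprocess.run(command, check=True)


if __name__ == "__main__":
    # "SomeOtherChannel" is a placeholder, not a real recommendation.
    download_chats(["AOC", "SomeOtherChannel"], dry_run=True)
```

Building the command as a list rather than a single string avoids shell quoting problems with unusual channel names.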
Twitch Leecher
Download link: https://github.com/Franiac/TwitchLeecher
Software to download the raw .ts segments of a Twitch VOD (at a given quality) and combine them, with an optional conversion to MP4.
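To illustrate the "combine" step: MPEG-TS segments can generally be joined by appending them byte for byte in playback order. The sketch below shows that idea in Python. It is not how Twitch Leecher itself is implemented, and it assumes the segments are already on disk with the segment number as the last number in each filename.

```python
import re
import shutil
from pathlib import Path


def segment_index(path: Path) -> int:
    """Return the last number in the filename, so 10.ts sorts after 2.ts."""
    numbers = re.findall(r"\d+", path.stem)
    return int(numbers[-1]) if numbers else -1


def concat_segments(segment_dir: Path, output: Path) -> int:
    """Append every .ts file in segment_dir to output, in numeric order.

    Returns the number of segments written. Keep `output` outside
    segment_dir so a previous result is not picked up as a segment.
    """
    segments = sorted(segment_dir.glob("*.ts"), key=segment_index)
    with output.open("wb") as out:
        for segment in segments:
            with segment.open("rb") as src:
                shutil.copyfileobj(src, out)
    return len(segments)


if __name__ == "__main__":
    count = concat_segments(Path("segments"), Path("vod.ts"))
    print(f"joined {count} segments")
```

Sorting numerically matters here: a plain alphabetical sort would put `10.ts` before `2.ts` and scramble the video.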
Example filename template (set in settings) to include all of the relevant information of the VOD in the filename:
By Archive Team
Archives will be made available later as WARC files in the archiveteam_twitchtv collection at the Internet Archive. You can access them through the Wayback Machine, but you'll need to search an index to find the media files.
A work-in-progress searchable index is now available!
Renegade Stream Archives
These archives are made in a manual fashion through the efforts of streaming communities. Feel free to expand this list.
- Vinesauce Stream Archival Effort - A crowdsourced effort by fans of the Vinesauce Group to archive 1714 of their streams.
- Klaxa.eu's Archive of The 4chan Cup - An existing, complete archive of The 4chan Cup, starting from the 2014 Autumn Games up till today.
User:TheTechRobo runs an IRC bot that archives Twitch metadata and chat into WARC and JSON in (on hackint). Please go easy on the bot! I don't have the best bandwidth or storage.
Note if you're going through the dataset: items grabbed on October 29, 2022 may contain orphan records or partial grabs in the WARCs. This is not faulty data. It is valid data from Twitch's GQL, and the records should be fine. I changed the port warcprox starts up on so I could run test and prod at the same time, but forgot to change the port used in commands etc. This is a fucking lame problem, but it at least isn't a data integrity thing. I'll take "lame" over "these WARCs need to be burned with fire".
In June 2016, Twitch deleted a bunch of custom emoticons on the grounds of obscenity.
The emotes can be found in the backend at: https://static-cdn.jtvnw.net/emoticons/v1/<id number>/<size> where id number ranges from 1 to 103667 (as of 2016-06-22), with no leading zeroes, and size is 1.0, 2.0 or 3.0.
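As a rough sketch, the URL scheme above can be enumerated like this in Python. The helper names are made up for this example, and the code only builds URL strings (for example, to feed into a grab list); it does not fetch anything.

```python
EMOTE_BASE = "https://static-cdn.jtvnw.net/emoticons/v1"
SIZES = ("1.0", "2.0", "3.0")
LAST_KNOWN_ID = 103667  # as of 2016-06-22, per the text above


def emote_url(emote_id: int, size: str) -> str:
    """Build the backend URL for one emote at one size (no leading zeroes)."""
    if emote_id < 1:
        raise ValueError("emote ids start at 1")
    if size not in SIZES:
        raise ValueError(f"size must be one of {SIZES}")
    return f"{EMOTE_BASE}/{emote_id}/{size}"


def emote_urls(first_id: int = 1, last_id: int = LAST_KNOWN_ID):
    """Yield every id/size combination in the given id range, inclusive."""
    for emote_id in range(first_id, last_id + 1):
        for size in SIZES:
            yield emote_url(emote_id, size)


if __name__ == "__main__":
    for url in emote_urls(1, 3):
        print(url)
```

Some ids in the range will simply not exist (or no longer exist), so whatever consumes this list should expect errors for those.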
Note: sizes 0.5, 1.5, 2.5, 3.5, 4.0, 4.5 and 5.0 are valid as well, but for most (all?) emotes they return the same data as the next highest whole-number size, or as the largest available size if there is none. That is, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5 and 5.0 match 1.0, 1.0, 2.0, 2.0, 3.0, 3.0, 3.0, 3.0, 3.0 and 3.0 respectively.
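The size mapping in the note above amounts to "round up to a whole number, then cap at 3.0". A minimal Python sketch of that rule follows. The function name is made up, and the rule is only what was observed for most emotes, not documented behavior.

```python
import math

LARGEST_SIZE = 3  # 3.0 is the largest distinct size observed


def effective_size(requested: float) -> str:
    """Map a requested size (0.5 to 5.0) to the size whose data is returned."""
    if not 0 < requested <= 5:
        raise ValueError("requested size must be in the range 0.5 to 5.0")
    # Round up to the next whole number, then cap at the largest size.
    whole = min(math.ceil(requested), LARGEST_SIZE)
    return f"{whole}.0"


if __name__ == "__main__":
    for size in (0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0):
        print(size, "->", effective_size(size))
```

This is handy for deduplication: only the three whole-number sizes need to be grabbed per emote.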
All emote graphics and sizes still existing in the backend system up to emote ID 103667 were archived through ArchiveBot here; the resulting WARCs can be downloaded through the viewer. Their associated chat 'shortnames' (e.g. "<3" for emote #9) were not archived, as they cannot be easily determined.
- "TPP'S VODS WILL BE SAVED IN THEIR ENTIRETY"
- "Amazon Might Buy Video Game Streaming Service Twitch Before Google Can"
- "Twitch is Trying to Restore Deleted VODs"