Twitch.tv
Twitch.tv | |
URL | http://twitch.tv |
Status | Special case (archives of streams actively purged after an amount of time) |
Archiving status | Partially saved (popular videos only) |
Archiving type | Unknown |
Project source | Phase 1, Phase 2, Items, Index |
Project tracker | Phase 1, Phase 2 |
IRC channel | #archiveteam-bs (on hackint) (formerly #burnthetwitch (on EFnet)) |
Data[how to use] | see #Archives |
Justin.tv—sorry, cough, I mean to say—Twitch.tv is a live video streaming service.
Twitch was rumored to have been acquired by YouTube/Google but Amazon was the final buyer.[1]
Broadcast retention changes
Around the time of Twitch's acquisition by Amazon in August 2014, changes were made to how long broadcasts (sometimes called VODs) were retained for viewing on the site. Previously, all partnered accounts had indefinite storage, while standard accounts had storage for a few days. Storage for partnered accounts was then cut down to two months.
Changes To VODs On Twitch
Aug 06 2014 · Engineering, Tech
Our goal at Twitch is straightforward: deliver the highest quality video. This includes the ability to watch video on demand (VOD) on all of our platforms, not just the website.
In order to create a system that supports live and VOD across the globe and on multiple platforms, we need to make significant changes to the way we’re currently storing video. Today, we’d like to discuss what these changes are, why they’re necessary, and how they benefit the entire Twitch community now and in the future.
[...]
Looking at Viewership Data
We found that the vast majority of past broadcast views happen within the first two weeks after they’re created. On the days following, viewership reduces exponentially.
We also discovered that 80% of our storage capacity is filled with past broadcasts that are never watched. That’s multiple petabytes for video that no one has ever viewed.
Highlights, on the other hand, have much more value and longevity. Over their lifetime, highlights get 9x as many views as past broadcasts.
[...]
As for existing past broadcasts, beginning three weeks from today, we will begin removing them from Twitch servers. If you would like to keep your past broadcasts, we encourage you to begin exporting or making highlights of your best moments so that they’re saved for posterity.
[...][2]
Thus a mission was started to archive as much of Twitch as we reasonably could. Our efforts can be found in the War Room and on the Talk page.
Known exceptions
Although most partner accounts have broadcasts deleted after 60 days, there are some exceptions. Most of these are esports tournament channels, but other channels may be excluded for reasons such as being culturally significant. Below is an incomplete list of these exceptions.
Channel | Indefinite since | Probable reason |
---|---|---|
Dota 2 The International | Policy Inception | Esports tournament |
Riot Games, LCS, LEC | Policy Inception | Game developer & esports tournaments |
Beyond The Summit | Policy Inception | Esports broadcaster |
Twitch Plays Pokemon | Policy Inception | Culturally significant |
Games Done Quick | Policy Inception | Culturally significant (charity fundraiser) |
Tip of the Hats | Policy Inception | Culturally significant (charity fundraiser) |
Rocket League | April 2016 | Game channel & esports tournaments |
Brawlhalla | February 2015 | Game channel & esports tournaments |
Evo | Policy Inception | Esports tournament |
ESL CS | Policy Inception | Esports broadcaster |
DreamHack Counter-Strike | Policy Inception | Esports & LAN event broadcaster |
shroud | Policy Inception? | Notable streamer & esports personality |
Ninja | January 2018 | Notable streamer |
pokimane | Policy Inception | Notable streamer |
sodapoppin | October 2020? | Notable streamer |
Blizzard | Policy Inception | Game developer |
Warcraft | Policy Inception | Game channel & esports tournaments |
StarCraft | Policy Inception | Game channel & esports tournaments |
PlayOverwatch | August 2017? | Game channel & esports tournaments |
PlayHearthstone | Policy Inception | Game channel & esports tournaments |
Warframe | April 2017 | Game channel |
Fortnite | Channel Creation | Game channel & esports tournaments |
Roblox | Policy Inception | Game channel/developers/platform |
FACEIT TV | Policy Inception | Esports service & tournaments |
teamfortress.tv | Policy Inception | Esports broadcaster |
The GD Studio | Policy Inception | Esports broadcaster |
Room On Fire | Policy Inception | Esports broadcaster |
Ninjas In Pyjamas | Policy Inception | Esports team/broadcaster |
Bob Ross | June 2016 | TV show & culturally significant |
PGL | February 2015 | Esports tournament organizer |
PGL Dota 2 | Channel Creation? | Esports tournament organizer |
Minecraft | Policy Inception | Game channel |
Mojang | Policy Inception | Game developers, including Minecraft |
Notch | Policy Inception | Minecraft creator |
deadmau5 | Policy Inception | Musician |
Porter Robinson | Policy Inception? | Musician |
MOGRA | Policy Inception? | Music club in Akihabara, JP |
Yogscast | Policy Inception | Culturally significant |
Twitch | Policy Inception | The site (duh) |
Twitch Presents | March 2017 | Also the site, special event streams |
The Game Awards | 2016? | Awards show |
The Esports Awards | June 2017 | Awards show |
AMD | Policy Inception | CPU/GPU manufacturer |
NVIDIA GeForce | Policy Inception | GPU manufacturer (uses NVIDIA these days) |
2017 | Website | |
Xbox | Policy Inception | Console platform |
PlayStation | Policy Inception | Console platform |
Nintendo | Policy Inception | Game developer & console manufacturer |
PAX | Policy Inception | Gaming convention |
IGN | Policy Inception | Magazine & website |
PokerStars | Policy Inception | Poker website & tournaments |
joeykaotyk | July 7, 2017 | Notable streamer |
AOC | Channel Creation | Politician |
Pokemon | Policy Inception | Game franchise & company |
Site structure
- HTML page requests: http://secure.twitch.tv/swflibs/TwitchPlayer.swf?videoId=a387099879
- Flash requests: https://api.twitch.tv/api/videos/a387099879?as3=t
- You can just type it directly as well: http://www.twitch.tv/twitchplayspokemon/b/503249758 → https://api.twitch.tv/api/videos/a503249758?as3=t
- There's also this: https://api.justin.tv/api/broadcast/by_archive/503249758.json?onsite=true
- JSON file contains list of URLs to their FLV files.
- Highlights: https://api.twitch.tv/api/videos/c2673085?as3=t (notice the start and end offsets)
- http://www.twitchtools.com/video-download.php provides the above service
- yt-dlp -i appears to do some of them
- Scraping: https://api.twitch.tv/kraken/videos/top?limit=20&offset=0&period=all
- Are there any irregularities? Differences between highlights and past broadcasts?
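The URL patterns in the list above can be sketched in a few lines of Python. This is only an illustration of the string mapping and makes no network requests; the endpoints are historical and should not be expected to respond today. The function names are made up for this example. The `/b/` to `a` prefix mapping comes from the twitchplayspokemon example, while the `/c/` to `c` mapping for highlights and the reading of `offset` as an item offset (rather than a page number) are assumptions.

```python
import re

# Historical endpoint pattern from the list above.
API_TEMPLATE = "https://api.twitch.tv/api/videos/{prefix}{video_id}?as3=t"

# "/b/" -> "a" is shown in the twitchplayspokemon example.
# "/c/" -> "c" is an assumption based on the highlight API URL.
PREFIX_FOR_PAGE_TYPE = {"b": "a", "c": "c"}

PAGE_URL_RE = re.compile(r"^https?://(?:www\.)?twitch\.tv/([^/]+)/([bc])/(\d+)/?$")


def api_url_for_page(page_url: str) -> str:
    """Turn an old-style video page URL into the matching API URL."""
    match = PAGE_URL_RE.match(page_url)
    if match is None:
        raise ValueError(f"not an old-style Twitch video URL: {page_url!r}")
    _channel, page_type, video_id = match.groups()
    prefix = PREFIX_FOR_PAGE_TYPE[page_type]
    return API_TEMPLATE.format(prefix=prefix, video_id=video_id)


def top_videos_url(page: int, limit: int = 20) -> str:
    """Build the scraping URL for one page of the top-videos listing.

    Assumes `offset` counts items, so page N starts at N * limit.
    """
    return (
        "https://api.twitch.tv/kraken/videos/top"
        f"?limit={limit}&offset={page * limit}&period=all"
    )


print(api_url_for_page("http://www.twitch.tv/twitchplayspokemon/b/503249758"))
print(top_videos_url(0))
```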
Software
Twitch Chat Downloader
Download link: https://github.com/PetterKraabol/Twitch-Chat-Downloader
Twitch Chat Downloader (tcd) is software to download the Twitch chat of a given Twitch VOD, or of a selection of VODs by channel. tcd requires[1] "Python 3.8 or newer" and "Twitch client ID & client secret" in order to work. The Twitch client ID is easy to get: click "Login with Twitch" at https://dev.twitch.tv/console/apps and you will see it in the URL. The Twitch client secret is harder to get; you will probably have to log in to Twitch to get it. For more info on "Client ID" and "Client Secret", see https://dev.twitch.tv/docs/api and https://dev.twitch.tv/docs/authentication/register-app. Although an account is not required to view the Twitch chat of a VOD, downloading chat via tcd is loginwalled (you need a Twitch account and probably also API access).
Note that you will want to (at least) use the JSON output to preserve the most information about the chat.
Example command to download the chat:
tcd --channel "AOC" -o "." --format "default,irc,json,srt,ssa" --timezone "UTC" --verbose --debug --log
The output files are named after the Twitch video ID and carry the file extensions of the requested formats.
TwitchLeecher
Download link: https://github.com/Franiac/TwitchLeecher
Software to download the raw .ts segments of a Twitch VOD (at a given quality) and combine them, with optional conversion to MP4.
Example filename template (set in settings) to include all of the relevant information of the VOD in the filename:
{date}_{time24}_{channel}_{id}_{game}_{title}_{res}_{fps}fps_{start}_{end}
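To get a feel for the filenames this template produces, the substitution can be mimicked with Python's `str.format`. This is only a sketch: TwitchLeecher does the substitution itself, and every value below is hypothetical, including the layout of the date, time, resolution and start/end fields.

```python
# The template from above; TwitchLeecher expands it, str.format only mimics that here.
TEMPLATE = "{date}_{time24}_{channel}_{id}_{game}_{title}_{res}_{fps}fps_{start}_{end}"

# Hypothetical values for illustration only. The real formatting of each field
# is decided by TwitchLeecher and may differ from what is shown here.
example_fields = {
    "date": "20200101",
    "time24": "1830",
    "channel": "examplechannel",
    "id": "123456789",
    "game": "ExampleGame",
    "title": "ExampleTitle",
    "res": "1080p",
    "fps": "60",
    "start": "000000",
    "end": "013000",
}

filename = TEMPLATE.format(**example_fields)
print(filename)
```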
Archives
By Archive Team
Archives will be made available later as WARC files in the archiveteam_twitchtv collection at the Internet Archive. You can access them through the Wayback Machine, but you'll need to search an index to find the media files.
A work-in-progress searchable index is now available!
Renegade Stream Archives
These archives are made in a manual fashion through the efforts of streaming communities. Feel free to expand this list.
- Vinesauce Stream Archival Effort - A crowdsourced effort by fans of the Vinesauce Group to archive 1714 of their streams.
- Klaxa.eu's Archive of The 4chan Cup - An existing, complete archive of The 4chan Cup, starting from the 2014 Autumn Games up till today.
By TheTechRobo (#burnthetwitch)
User:TheTechRobo runs an IRC bot that archives Twitch metadata and chat into WARC and JSON in #burnthetwitch (on hackint). Please go easy on the bot! I don't have the best bandwidth or storage. Source code is on GitHub. The data can be found at this collection.
The directory structure isn't too difficult to understand. At the top level of the IA item is a single folder named after the Twitch channel, which contains all the relevant data. Inside that folder, the tar files are grabs of the channel metadata and list of VODs, and the subfolders are the VOD grabs (remember that these are metadata only). Each subfolder contains one tar file for each grab of that VOD (there will almost always be only one).
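The layout described above can be walked with a short Python sketch. It assumes the item has already been downloaded to a local directory and that every file found at each level is one of the tar files mentioned above; the function name and the keys of the returned dictionary are made up for this example.

```python
from pathlib import Path


def summarize_item(item_root: Path) -> dict:
    """Group the files of one downloaded item by channel and by VOD."""
    summary = {}
    channel_dirs = sorted(p for p in item_root.iterdir() if p.is_dir())
    for channel_dir in channel_dirs:
        # Files directly inside the channel folder:
        # grabs of the channel metadata and list of VODs.
        channel_grabs = sorted(p.name for p in channel_dir.iterdir() if p.is_file())
        # Subfolders: one per VOD, each holding one file per grab of that VOD.
        vod_grabs = {
            vod_dir.name: sorted(p.name for p in vod_dir.iterdir() if p.is_file())
            for vod_dir in sorted(p for p in channel_dir.iterdir() if p.is_dir())
        }
        summary[channel_dir.name] = {
            "channel_grabs": channel_grabs,
            "vod_grabs": vod_grabs,
        }
    return summary
```

Calling `summarize_item(Path("path/to/item"))` returns one entry per channel folder, which makes it easy to spot VODs that were grabbed more than once.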
Note if you're going through the dataset: in items grabbed on October 29, 2022, there may be orphan records or partial grabs in the WARCs. This is not faulty data: it is valid data from Twitch's GQL, and the records should be fine. I had changed the port warcprox starts up on so that I could test and run prod at the same time, but forgot to change the port used in commands etc. The WARCs should be fine; they just might have unrelated data inside them. Also, items from before roughly June 28, 2023 may have one or more WARCs from failed grabs inside; again, these WARCs should be absolutely fine (the grab just failed).
Other
Butt Controversy
In June 2016, Twitch deleted a bunch of custom emoticons on the grounds of obscenity. (See http://www.dailydot.com/esports/twitch-butt-emotes/ for details.)
The emotes can be found in the backend at: https://static-cdn.jtvnw.net/emoticons/v1/<id number>/<size> where id number ranges from 1 to 103667 (as of 20160622), with no leading zeroes, and size is 1.0, 2.0 or 3.0.
Note: sizes 0.5, 1.5, 2.5, 3.5, 4.0, 4.5 and 5.0 are valid as well, but for most (all?) emotes they return the same data as the next highest available whole-number size, or as the largest available size if there is none above. That is, requests for 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5 and 5.0 return the data for 1.0, 1.0, 2.0, 2.0, 3.0, 3.0, 3.0, 3.0, 3.0 and 3.0 respectively.
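The URL scheme and the size behaviour described above can be sketched in Python as follows. The code only builds URL strings and makes no requests; the function and constant names are made up for this example, and the rounding rule is simply a restatement of the mapping in the note, not something confirmed against the backend.

```python
import math
from typing import Iterator

EMOTE_URL = "https://static-cdn.jtvnw.net/emoticons/v1/{emote_id}/{size}"
WHOLE_SIZES = ("1.0", "2.0", "3.0")
MAX_EMOTE_ID = 103667  # highest id checked as of 2016-06-22, per the text above


def effective_size(requested: float) -> str:
    """Return the whole-number size whose data a requested size resolves to."""
    # Round up to the next whole size, then clamp into the 1.0 to 3.0 range.
    whole = min(3, max(1, math.ceil(requested)))
    return f"{whole}.0"


def emote_urls(first_id: int = 1, last_id: int = MAX_EMOTE_ID) -> Iterator[str]:
    """Yield every emote URL in the id range, for the three distinct sizes."""
    for emote_id in range(first_id, last_id + 1):  # ids have no leading zeroes
        for size in WHOLE_SIZES:
            yield EMOTE_URL.format(emote_id=emote_id, size=size)


for url in emote_urls(9, 9):
    print(url)
```

Since the fractional and oversized sizes resolve to one of the three whole-number sizes, fetching only 1.0, 2.0 and 3.0 should be enough to capture every distinct image.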
All emote graphics and sizes still existing in the backend system up to emote id 103667 were archived through ArchiveBot here; the resulting WARCs can be downloaded through the viewer. Their associated chat 'shortnames' (e.g. "<3" for emote #9) cannot be easily determined and were not archived.
See Also
External links
- "TPP'S VODS WILL BE SAVED IN THEIR ENTIRETY"
- "Amazon Might Buy Video Game Streaming Service Twitch Before Google Can"
- "Twitch is Trying to Restore Deleted VODs"