Twitch.tv

From Archiveteam
Twitch.tv
URL: http://twitch.tv
Status: Special case (archives of streams actively purged after an amount of time)
Archiving status: Partially saved (popular videos only)
Archiving type: Unknown
Project source: Phase 1, Phase 2, Items, Index
Project tracker: Phase 1, Phase 2
IRC channel: #archiveteam-bs (on hackint); formerly #burnthetwitch (on EFnet)
Data: see #Archives

Justin.tv—sorry, cough, I mean to say—Twitch.tv is a live video streaming service.

Twitch was rumored to have been acquired by YouTube/Google, but Amazon was the final buyer.[1]

Broadcast retention changes

After Twitch's acquisition by Amazon, changes were made to how long broadcasts (sometimes called VODs) were retained for viewing on the site. Previously, all partnered accounts had indefinite storage, while standard accounts had storage for a few days. The indefinite storage for partnered accounts was then cut down to two months.

Changes To VODs On Twitch

Aug 06 2014 · Engineering, Tech

Our goal at Twitch is straightforward: deliver the highest quality video. This includes the ability to watch video on demand (VOD) on all of our platforms, not just the website.

In order to create a system that supports live and VOD across the globe and on multiple platforms, we need to make significant changes to the way we’re currently storing video. Today, we’d like to discuss what these changes are, why they’re necessary, and how they benefit the entire Twitch community now and in the future.

[...]

Looking at Viewership Data

We found that the vast majority of past broadcast views happen within the first two weeks after they’re created. On the days following, viewership reduces exponentially.

We also discovered that 80% of our storage capacity is filled with past broadcasts that are never watched. That’s multiple petabytes for video that no one has ever viewed.

Highlights, on the other hand, have much more value and longevity. Over their lifetime, highlights get 9x as many views as past broadcasts.

[...]

As for existing past broadcasts, beginning three weeks from today, we will begin removing them from Twitch servers. If you would like to keep your past broadcasts, we encourage you to begin exporting or making highlights of your best moments so that they’re saved for posterity.

[...][2]

Thus a mission was started to archive as much of Twitch as we reasonably could. Our efforts can be found in the War Room and on the Talk page.

Known exceptions

Although most partner accounts have broadcasts deleted after 60 days, there are some exceptions. Most of these are esports tournament channels, but other channels may be excluded for reasons such as being culturally significant. Below is an incomplete list of these exceptions.

Channel | Indefinite since | Probable reason
Dota 2 The International | Policy Inception | Esports tournament
Riot Games, LCS, LEC | Policy Inception | Game developer & esports tournaments
Beyond The Summit | Policy Inception | Esports broadcaster
Twitch Plays Pokemon | Policy Inception | Culturally significant
Games Done Quick | Policy Inception | Culturally significant (charity fundraiser)
Tip of the Hats | Policy Inception | Culturally significant (charity fundraiser)
Rocket League | April 2016 | Game channel & esports tournaments
Brawlhalla | February 2015 | Game channel & esports tournaments
Evo | Policy Inception | Esports tournament
ESL CS | Policy Inception | Esports broadcaster
DreamHack Counter-Strike | Policy Inception | Esports & LAN event broadcaster
shroud | Policy Inception? | Notable streamer & esports personality
Ninja | January 2018 | Notable streamer
pokimane | Policy Inception | Notable streamer
sodapoppin | October 2020? | Notable streamer
Blizzard | Policy Inception | Game developer
StarCraft | Policy Inception | Game channel & esports tournaments
PlayOverwatch | August 2017? | Game channel & esports tournaments
PlayHearthstone | Policy Inception | Game channel & esports tournaments
Warframe | April 2017 | Game channel
Fortnite | Channel Creation | Game channel & esports tournaments
FACEIT TV | Policy Inception | Esports service & tournaments
teamfortress.tv | Policy Inception | Esports broadcaster
The GD Studio | Policy Inception | Esports broadcaster
Room On Fire | Policy Inception | Esports broadcaster
Ninjas In Pyjamas | Policy Inception | Esports team/broadcaster
Bob Ross | June 2016 | TV show & culturally significant
PGL | Policy Inception | Esports tournament organizer
Minecraft | Policy Inception | Game channel
Mojang | Policy Inception | Game developers, including Minecraft
Notch | Policy Inception | Minecraft creator
deadmau5 | Policy Inception | Musician
Porter Robinson | Policy Inception? | Musician
MOGRA | Policy Inception? | Music club in Akihabara, JP
Yogscast | Policy Inception | Culturally significant
Twitch | Policy Inception | The site (duh)
Twitch Presents | March 2017 | Also the site, special event streams
The Game Awards | 2016? | Awards show
The Esports Awards | June 2017 | Awards show
AMD | Policy Inception | CPU/GPU manufacturer
NVIDIA GeForce | Policy Inception | GPU manufacturer (uses NVIDIA these days)
reddit | 2017 | Website
Xbox | Policy Inception | Console platform
PlayStation | Policy Inception | Console platform
Nintendo | Policy Inception | Game developer & console manufacturer
PAX | Policy Inception | Gaming convention
IGN | Policy Inception | Magazine & website
PokerStars | Policy Inception | Poker website & tournaments
joeykaotyk | July 7, 2017 | Notable streamer
AOC | Channel Creation | Politician
Pokemon | Policy Inception | Game franchise & company

Site structure

Software

Twitch Chat Downloader

Download link: https://github.com/PetterKraabol/Twitch-Chat-Downloader

Twitch Chat Downloader (tcd) downloads the chat of a given Twitch VOD, or of a selection of VODs by channel. tcd requires[1] "Python 3.8 or newer" and a "Twitch client ID & client secret" in order to work. The Twitch client ID is easy to get: click "Login with Twitch" at https://dev.twitch.tv/console/apps and it will appear in the URL. The Twitch client secret is harder to get; you will probably have to log in to Twitch to obtain it. For more information on the client ID and client secret, see https://dev.twitch.tv/docs/api and https://dev.twitch.tv/docs/authentication/register-app. Although no account is required to view the chat of a VOD on the site, downloading chat via tcd is login-walled: you need a Twitch account and probably also API access.

Note that you will want to (at least) use the JSON output to preserve the most information about the chat.

Example command to download the chat:

   tcd --channel "AOC" -o "." --format "default,irc,json,srt,ssa" --timezone "UTC" --verbose --debug --log

The output files are named after the Twitch video ID, one file per requested format, each with the corresponding file extension.
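To repeat the download for several channels, the example command can be assembled in a script. The following is a minimal Python sketch, assuming tcd is installed, on the PATH, and already configured with credentials. The helper names build_tcd_command and download_channels are hypothetical; the flags are only those that appear in the example command above.

```python
import shlex
import subprocess

# Formats copied from the example command; JSON keeps the most detail.
FORMATS = ("default", "irc", "json", "srt", "ssa")


def build_tcd_command(channel, output_dir=".", formats=FORMATS, timezone="UTC"):
    """Return the argument list for one tcd run (hypothetical helper)."""
    return [
        "tcd",
        "--channel", channel,
        "-o", output_dir,
        "--format", ",".join(formats),
        "--timezone", timezone,
        "--verbose",
    ]


def download_channels(channels, output_dir="."):
    """Run tcd once per channel, stopping at the first failure."""
    for channel in channels:
        cmd = build_tcd_command(channel, output_dir)
        print("running:", shlex.join(cmd))
        # check=True raises CalledProcessError if tcd exits non-zero.
        subprocess.run(cmd, check=True)
```

Calling download_channels(["AOC"]) would run the same download as the example command, minus the --debug and --log flags.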

TwitchLeecher

Download link: https://github.com/Franiac/TwitchLeecher

Software to download the raw .ts segments of a Twitch VOD (at a given quality) and combine them, with optional conversion to MP4.

Example filename template (set in settings) to include all of the relevant information of the VOD in the filename:

   {date}_{time24}_{channel}_{id}_{game}_{title}_{res}_{fps}fps_{start}_{end}
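To see which pieces of VOD information the template records, the placeholders can be listed and filled in with sample values. This is only an illustrative Python sketch: it uses Python's own string formatting, which happens to share the brace syntax, and every sample value is made up. How TwitchLeecher itself formats each field (dates, times, resolution) is not shown here and may differ.

```python
from string import Formatter

TEMPLATE = "{date}_{time24}_{channel}_{id}_{game}_{title}_{res}_{fps}fps_{start}_{end}"

# Hypothetical values for illustration only; not real TwitchLeecher output.
SAMPLE_VALUES = {
    "date": "20200101",
    "time24": "183000",
    "channel": "examplechannel",
    "id": "1234567890",
    "game": "Example Game",
    "title": "Example stream",
    "res": "1080p",
    "fps": "60",
    "start": "000000",
    "end": "013000",
}


def template_fields(template):
    """List the placeholder names in the order they appear."""
    return [name for _, name, _, _ in Formatter().parse(template) if name]


def render(template, values):
    """Fill the template using Python's str.format (an approximation)."""
    return template.format(**values)


print(template_fields(TEMPLATE))
print(render(TEMPLATE, SAMPLE_VALUES))
```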


Archives

By Archive Team

Archives will be made available later as WARC files in the archiveteam_twitchtv collection at the Internet Archive. You can access them through the Wayback Machine, but you'll need to search an index to find the media files.

A work-in-progress searchable index is now available!

Renegade Stream Archives

These archives are made in a manual fashion through the efforts of streaming communities. Feel free to expand this list.

By TheTechRobo (#burnthetwitch)

User:TheTechRobo runs an IRC bot in #burnthetwitch (on hackint) that archives Twitch metadata and chat into WARC and JSON. Please go easy on the bot! I don't have the best bandwidth or storage. Source code is on GitHub. The data can be found in this collection.

The directory structure isn't too difficult to understand. At the top level of each IA item is a single folder named after the Twitch channel, which contains all the relevant data. Inside that folder, the tar files are grabs of the channel metadata and list of VODs, and the subfolders are the VOD grabs (remember that these are metadata only). Each VOD folder contains one tar file per grab of that VOD (there will almost always be only one).
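The layout described above can be walked with a few lines of Python. This is a rough sketch under stated assumptions: the item has already been downloaded and extracted locally, and the tar files have a ".tar" suffix somewhere in their name. All folder and file names in demo() are made up for illustration.

```python
import tempfile
from pathlib import Path


def is_tar(path):
    """True for files such as x.tar or x.tar.gz (the suffix is an assumption)."""
    return path.is_file() and ".tar" in path.suffixes


def summarize_channel_dir(channel_dir):
    """Return (channel-level tars, {VOD folder name: [tars for that VOD]})."""
    entries = sorted(Path(channel_dir).iterdir())
    channel_grabs = [p.name for p in entries if is_tar(p)]
    vod_grabs = {
        p.name: sorted(t.name for t in p.iterdir() if is_tar(t))
        for p in entries
        if p.is_dir()
    }
    return channel_grabs, vod_grabs


def demo():
    """Build a tiny, made-up channel folder and summarize it."""
    with tempfile.TemporaryDirectory() as tmp:
        channel = Path(tmp) / "examplechannel"    # hypothetical channel name
        vod = channel / "1234567890"              # hypothetical VOD folder
        vod.mkdir(parents=True)
        (channel / "channel-grab-1.tar").touch()  # hypothetical file names
        (vod / "vod-grab-1.tar").touch()
        return summarize_channel_dir(channel)
```

A VOD folder that lists more than one tar was grabbed more than once, which the text above says is rare.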

Note if you're going through the dataset: in items grabbed on October 29, 2022, there may be orphan records or partial grabs in the WARCs. This is not faulty data; it is valid data from Twitch's GQL, and the records should be fine. I changed the port warcprox starts up on so I could run test and prod at the same time, but forgot to change the port used in commands etc. The WARCs should be fine; they just might have unrelated data inside them. Also, items from before roughly June 28, 2023 may contain one or more WARCs from failed grabs; again, these WARCs should be absolutely fine (the grab just failed).

Other

Butt Controversy

In June 2016, Twitch deleted a bunch of custom emoticons on the grounds of obscenity. (See http://www.dailydot.com/esports/twitch-butt-emotes/ for details.)

The emotes can be found in the backend at https://static-cdn.jtvnw.net/emoticons/v1/<id number>/<size>, where id number ranges from 1 to 103667 (as of 2016-06-22) with no leading zeroes, and size is 1.0, 2.0 or 3.0.
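The URL pattern above can be enumerated with a short generator. This is a minimal Python sketch that only builds the URLs and does not fetch anything; the function names are hypothetical, and the upper bound is the 2016 figure quoted above.

```python
BASE_URL = "https://static-cdn.jtvnw.net/emoticons/v1"
SIZES = ("1.0", "2.0", "3.0")
MAX_EMOTE_ID = 103667  # upper bound as of 2016-06-22, per the text above


def emote_url(emote_id, size):
    """Build the backend URL for one emote id (no leading zeroes) and size."""
    return f"{BASE_URL}/{emote_id}/{size}"


def iter_emote_urls(first_id=1, last_id=MAX_EMOTE_ID, sizes=SIZES):
    """Yield every URL for ids first_id..last_id (inclusive) in each size."""
    for emote_id in range(first_id, last_id + 1):
        for size in sizes:
            yield emote_url(emote_id, size)
```

With the defaults this yields 103667 × 3 = 311001 URLs, which could be written to a file and handed to a crawler.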

Note: sizes 0.5, 1.5, 2.5, 3.5, 4.0, 4.5 and 5.0 are valid as well, but for most (all?) emotes they return the same data as the next-highest available whole-number size, or as the largest size (3.0) when there is no higher one. That is, requests for 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5 and 5.0 return the data for 1.0, 1.0, 2.0, 2.0, 3.0, 3.0, 3.0, 3.0, 3.0 and 3.0 respectively.
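The size mapping in the note amounts to rounding up to the next whole number and capping at 3.0. A small Python sketch of that rule follows; effective_size is a hypothetical name, and the rule reflects only the behaviour observed above, which may not hold for every emote.

```python
import math

# Sizes the note lists as accepted by the backend.
VALID_SIZES = (0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0)
LARGEST_SIZE = 3.0


def effective_size(requested):
    """Return the whole-number size whose data a request is expected to get."""
    if requested not in VALID_SIZES:
        raise ValueError(f"size {requested} is not one of {VALID_SIZES}")
    # Round up to the next whole number, but never beyond the largest size.
    return min(float(math.ceil(requested)), LARGEST_SIZE)
```

Since the extra sizes duplicate 1.0, 2.0 or 3.0, fetching only those three sizes should be enough to capture each emote.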

All emote graphics still existing in the backend system up to emote id 103667, in all sizes, were archived through Archivebot here; the resulting WARCs can be downloaded through the viewer. Their associated chat 'shortnames' (e.g. "<3" for emote #9) were not archived, as they cannot be easily determined.

See Also

External links

References