Twitch.tv
Twitch.tv | |
URL | http://twitch.tv |
Status | Special case (archives of streams actively purged after an amount of time) |
Archiving status | Partially saved (popular videos only) |
Archiving type | Unknown |
Project source | Phase 1, Phase 2, Items, Index |
Project tracker | Phase 1, Phase 2 |
IRC channel | #archiveteam-bs (on hackint) (formerly #burnthetwitch (on EFnet)) |
Data | see #Archives |
Justin.tv—sorry, cough, I mean to say—Twitch.tv is a live video streaming service.
Twitch was rumored to have been acquired by YouTube/Google but Amazon was the final buyer.[1]
Broadcast retention changes
Around the time of Twitch's acquisition by Amazon in August 2014, changes were made to how long broadcasts (sometimes called VODs) were retained for viewing on the site. Previously, partnered accounts had indefinite storage, while standard accounts had storage for a few days. Storage for partnered accounts was then cut down to two months.
Changes To VODs On Twitch
Aug 06 2014 · Engineering, Tech
Our goal at Twitch is straightforward: deliver the highest quality video. This includes the ability to watch video on demand (VOD) on all of our platforms, not just the website.
In order to create a system that supports live and VOD across the globe and on multiple platforms, we need to make significant changes to the way we’re currently storing video. Today, we’d like to discuss what these changes are, why they’re necessary, and how they benefit the entire Twitch community now and in the future.
[...]
Looking at Viewership Data
We found that the vast majority of past broadcast views happen within the first two weeks after they’re created. On the days following, viewership reduces exponentially.
We also discovered that 80% of our storage capacity is filled with past broadcasts that are never watched. That’s multiple petabytes for video that no one has ever viewed.
Highlights, on the other hand, have much more value and longevity. Over their lifetime, highlights get 9x as many views as past broadcasts.
[...]
As for existing past broadcasts, beginning three weeks from today, we will begin removing them from Twitch servers. If you would like to keep your past broadcasts, we encourage you to begin exporting or making highlights of your best moments so that they’re saved for posterity.
[...][2]
Thus a mission was started to archive as much of Twitch as we reasonably could. Our efforts can be found in the War Room and on the Talk page.
The current retention policy as of 2024, introduced in 2022, is 60 days for Partners, Turbo, and Prime users, 14 days for Affiliates, and 7 days for all other (free) users, with exceptions noted below.[3]
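As a rough illustration of the figures above, the following Python sketch estimates when a past broadcast would be purged. The tier names and the `purge_date` helper are hypothetical labels chosen for this example; only the day counts come from the policy described above, and exempt channels are not modeled.

```python
from datetime import date, timedelta

# Retention windows in days, taken from the policy described above.
# The tier keys are hypothetical labels, not values from any Twitch API.
RETENTION_DAYS = {
    "partner": 60,
    "turbo": 60,
    "prime": 60,
    "affiliate": 14,
    "free": 7,
}


def purge_date(broadcast_date: date, tier: str) -> date:
    """Return the estimated date on which a past broadcast is removed."""
    return broadcast_date + timedelta(days=RETENTION_DAYS[tier])


print(purge_date(date(2024, 1, 1), "affiliate"))  # 2024-01-15
```

This kind of estimate can help prioritize which channels to grab first, since free accounts leave only a one-week window.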
Known exceptions
Although most partner accounts have broadcasts deleted after 60 days, there are some exceptions. Most of these are esports tournament channels, but other channels may be excluded for reasons such as being culturally significant. Below is an incomplete list of these exceptions.
Channel | Indefinite since | Probable reason |
---|---|---|
Dota 2 The International | Policy Inception | Esports tournament |
Riot Games, LCS, LEC | Policy Inception | Game developer & esports tournaments |
Beyond The Summit | Policy Inception | Esports broadcaster |
Twitch Plays Pokemon | Policy Inception | Culturally significant |
Games Done Quick | Policy Inception | Culturally significant (charity fundraiser) |
Tip of the Hats | Policy Inception | Culturally significant (charity fundraiser) |
Rocket League | April 2016 | Game channel & esports tournaments |
Brawlhalla | February 2015 | Game channel & esports tournaments |
Evo | Policy Inception | Esports tournament |
ESL CS | Policy Inception | Esports broadcaster |
DreamHack Counter-Strike | Policy Inception | Esports & LAN event broadcaster |
shroud | Policy Inception? | Notable streamer & esports personality |
Ninja | January 2018 | Notable streamer |
pokimane | Policy Inception | Notable streamer |
sodapoppin | October 2020? | Notable streamer |
Blizzard | Policy Inception | Game developer |
Warcraft | Policy Inception | Game channel & esports tournaments |
StarCraft | Policy Inception | Game channel & esports tournaments |
PlayOverwatch | August 2017? | Game channel & esports tournaments |
PlayHearthstone | Policy Inception | Game channel & esports tournaments |
Warframe | April 2017 | Game channel |
Fortnite | Channel Creation | Game channel & esports tournaments |
Roblox | Policy Inception | Game channel/developers/platform |
FACEIT TV | Policy Inception | Esports service & tournaments |
teamfortress.tv | Policy Inception | Esports broadcaster |
The GD Studio | Policy Inception | Esports broadcaster |
Room On Fire | Policy Inception | Esports broadcaster |
Ninjas In Pyjamas | Policy Inception | Esports team/broadcaster |
Bob Ross | June 2016 | TV show & culturally significant |
PGL | February 2015 | Esports tournament organizer |
PGL Dota 2 | Channel Creation? | Esports tournament organizer |
Minecraft | Policy Inception | Game channel |
Mojang | Policy Inception | Game developers, including Minecraft |
Notch | Policy Inception | Minecraft creator |
deadmau5 | Policy Inception | Musician |
Porter Robinson | Policy Inception? | Musician |
MOGRA | Policy Inception? | Music club in Akihabara, JP |
Yogscast | Policy Inception | Culturally significant |
Twitch | Policy Inception | The site (duh) |
Twitch Presents | March 2017 | Also the site, special event streams |
The Game Awards | 2016? | Awards show |
The Esports Awards | June 2017 | Awards show |
AMD | Policy Inception | CPU/GPU manufacturer |
NVIDIA GeForce | Policy Inception | GPU manufacturer (uses NVIDIA these days) |
2017 | Website | |
Xbox | Policy Inception | Console platform |
PlayStation | Policy Inception | Console platform |
Nintendo | Policy Inception | Game developer & console manufacturer |
PAX | Policy Inception | Gaming convention |
IGN | Policy Inception | Magazine & website |
PokerStars | Policy Inception | Poker website & tournaments |
joeykaotyk | July 7, 2017 | Notable streamer |
AOC | Channel Creation | Politician |
Pokemon | Policy Inception | Game franchise & company |
Site structure
- HTML page requests: http://secure.twitch.tv/swflibs/TwitchPlayer.swf?videoId=a387099879
- Flash requests: https://api.twitch.tv/api/videos/a387099879?as3=t
- You can just type it directly as well: http://www.twitch.tv/twitchplayspokemon/b/503249758 → https://api.twitch.tv/api/videos/a503249758?as3=t
- There's also this: https://api.justin.tv/api/broadcast/by_archive/503249758.json?onsite=true
- JSON file contains list of URLs to their FLV files.
- Highlights: https://api.twitch.tv/api/videos/c2673085?as3=t (notice the start and end offsets)
- http://www.twitchtools.com/video-download.php provides the above service
- yt-dlp -i appears to do some of them
- Scraping: https://api.twitch.tv/kraken/videos/top?limit=20&offset=0&period=all
- Are there any irregularities? Differences between highlights and past broadcasts?
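The URL relationships in the list above can be sketched in Python as follows. This is a minimal illustration, not a working scraper: these endpoints are historical and are not expected to respond any more. The helper names are hypothetical, and mapping `/c/` page URLs to the `c` (highlight) prefix is an assumption; only the `/b/` to `a` mapping is shown in the example above.

```python
import re
from urllib.parse import urlencode

# Historical URL shapes copied from the list above.
API_VIDEO = "https://api.twitch.tv/api/videos/{video_id}?as3=t"
KRAKEN_TOP = "https://api.twitch.tv/kraken/videos/top"

# /b/ pages map to an "a" prefix, per the example above.
# Mapping /c/ pages to the "c" (highlight) prefix is an assumption.
PREFIX_FOR_PATH = {"b": "a", "c": "c"}


def api_url_from_page(page_url: str) -> str:
    """Convert a legacy video page URL into the matching API URL."""
    match = re.search(r"twitch\.tv/[^/]+/([bc])/(\d+)", page_url)
    if match is None:
        raise ValueError(f"unrecognised video page URL: {page_url}")
    kind, number = match.groups()
    return API_VIDEO.format(video_id=PREFIX_FOR_PATH[kind] + number)


def top_video_pages(pages: int, limit: int = 20):
    """Yield paginated 'top videos' URLs by stepping the offset parameter."""
    for page in range(pages):
        query = urlencode({"limit": limit, "offset": page * limit, "period": "all"})
        yield f"{KRAKEN_TOP}?{query}"


print(api_url_from_page("http://www.twitch.tv/twitchplayspokemon/b/503249758"))
for url in top_video_pages(2):
    print(url)
```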
Software
Twitch Chat Downloader
Download link: https://github.com/PetterKraabol/Twitch-Chat-Downloader ※This project is no longer being updated
Active fork: https://github.com/TheDrHax/Twitch-Chat-Downloader
Twitch Chat Downloader (tcd) downloads the Twitch chat of a given VOD, or of a selection of VODs by channel. tcd requires[1] "Python 3.8 or newer" and a "Twitch client ID & client secret" in order to work. The client ID is easy to get: click "Login with Twitch" at https://dev.twitch.tv/console/apps and it will appear in the URL. The client secret is harder to get; you will probably have to log in to Twitch to get it. For more info on "Client ID" and "Client Secret", see https://dev.twitch.tv/docs/api and https://dev.twitch.tv/docs/authentication/register-app. While an account is not required to view the chat of a VOD, downloading chat via tcd is loginwalled (you need a Twitch account and probably also API access).
Note that you will want to (at least) use the JSON output to preserve the most information about the chat.
Example command to download the chat:
tcd --channel "AOC" -o "." --format "default,irc,json,srt,ssa" --timezone "UTC" --verbose --debug --log
The output files are named after the Twitch video ID, with the given file extensions.
TwitchLeecher
Download link: https://github.com/Franiac/TwitchLeecher
Software to download the raw .ts segments of a Twitch VOD (of a given quality) and combine them, with an optional conversion to MP4.
Example filename template (set in settings) to include all of the relevant information of the VOD in the filename:
{date}_{time24}_{channel}_{id}_{game}_{title}_{res}_{fps}fps_{start}_{end}
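To show roughly what a filename built from this template looks like, here is a small Python sketch that fills the placeholders with made-up values. All of the values are hypothetical, and the exact formatting TwitchLeecher applies to each field (date layout, resolution label, and so on) may differ.

```python
# The template from above; the placeholder names happen to be valid
# Python format fields, so str.format can expand them.
TEMPLATE = "{date}_{time24}_{channel}_{id}_{game}_{title}_{res}_{fps}fps_{start}_{end}"

# Hypothetical values for illustration only.
fields = {
    "date": "20240101",
    "time24": "183000",
    "channel": "examplechannel",
    "id": "1234567890",
    "game": "ExampleGame",
    "title": "ExampleTitle",
    "res": "1080p",
    "fps": "60",
    "start": "000000",
    "end": "013000",
}

filename = TEMPLATE.format(**fields)
print(filename)
```

Including the channel and video ID in every filename keeps files traceable even after they are moved out of their original folders.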
Twitch-archiver
Twitch-archiver is software that downloads VODs and chat from Twitch. No client ID or similar credential is required by default. A Docker or Python environment is required. Already-downloaded items are recorded in a database file, avoiding duplication.
Download link: https://github.com/Brisppy/twitch-archiver
Example 1: Downloading all VODs (video and chat) from the Brisppy channel to a specific directory:
twitch-archiver -c Brisppy -d "Z:\\twitch-archive"
Example 2: Download Chat Only:
twitch-archiver -D --archive-only --chat -c Brisppy,Brisppy2 -d YourDirectory
Example 3: Downloading specific VODs, only the video, and using a specified number of download threads:
twitch-archiver -v 1276315849,1275305106 -d "/mnt/twitch-archive" -V -t 10
twitch-logger-docker
Twitch Logger is a Docker container designed to log IRC messages from multiple Twitch channels to files. Log files are split monthly and older logs are compressed using GZIP. Timestamps are recorded in UTC format by default.
Key Features:
- Logs multiple Twitch channels concurrently.
- Stores logs in separate folders for each channel.
- Splits log files monthly.
- Compresses old log files with GZIP.
- Uses UTC timestamps for accurate timekeeping.
- Offers options to log raw IRC messages or formatted Twitch chat messages.
- Allows customization of timezone, timestamp format, log rotation, Chatty-style formatting, and Logstash formatting.
- It can also log IRC chat even when the stream is offline.
Example Usage:
Basic Example: Logs messages from the "dbkynd" channel to the C:/Users/DBKynd/Desktop/logs directory.
docker run -d -e TWITCH_CHANNELS=dbkynd -v C:/Users/DBKynd/Desktop/logs:/app/logs dbkynd/twitch-logger
Advanced Example: Logs messages from the "dbkynd" and "annemunition" channels, using Los Angeles time, daily log rotation, Chatty-style formatting, and storing them in the C:/Users/DBKynd/Desktop/logs directory without zipping old logs.
docker run -d -e TWITCH_CHANNELS=dbkynd,annemunition -e RAW=false -e ZIP=false -e TZ=America/Los_Angeles -e TS_FORMAT=hh:mm:ss -e DATE_PATTERN=YYYY-MM-DD -e CHATTY_STYLE=true -v C:/Users/DBKynd/Desktop/logs:/app/logs dbkynd/twitch-logger
Examples of Timestamp and Date Pattern Configuration:
TS_FORMAT="YYYY-MM-DD HH:mm:ss" displays timestamps as 2020-04-18 18:03:35.
TS_FORMAT="hh:mm:ss A" displays timestamps as 05:56:57 PM.
DATE_PATTERN="YYYY-MM" rotates log files monthly (default).
DATE_PATTERN="YYYY-MM-DD" rotates log files daily.
DATE_PATTERN="null" disables log rotation, logging all messages to a single file (warning: this file can become very large). In this case, the ZIP option is not used.
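For readers more familiar with strftime-style codes, the Python sketch below translates the tokens used in the examples above and reproduces the first sample output. It assumes the tokens follow Moment.js-style conventions (for example, `MM` is the month and `mm` the minutes), which the examples suggest but do not state. The `to_strftime` helper is hypothetical and covers only the tokens that appear above.

```python
import re
from datetime import datetime

# Assumed meaning of each token, inferred from the sample outputs above.
TOKEN_TO_STRFTIME = {
    "YYYY": "%Y", "MM": "%m", "DD": "%d",
    "HH": "%H", "hh": "%I", "mm": "%M", "ss": "%S", "A": "%p",
}

# Match tokens in a single pass, longest first, so that replacements
# never interfere with one another.
TOKEN_RE = re.compile("|".join(sorted(TOKEN_TO_STRFTIME, key=len, reverse=True)))


def to_strftime(pattern: str) -> str:
    """Translate a TS_FORMAT-style pattern into a strftime pattern."""
    return TOKEN_RE.sub(lambda m: TOKEN_TO_STRFTIME[m.group()], pattern)


stamp = datetime(2020, 4, 18, 18, 3, 35)
print(stamp.strftime(to_strftime("YYYY-MM-DD HH:mm:ss")))  # 2020-04-18 18:03:35
```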
Archives
By Archive Team
Archives will be made available later as WARC files in the archiveteam_twitchtv collection at the Internet Archive. You can access them through the Wayback Machine, but you'll need to search an index to find the media files.
A work-in-progress searchable index is now available!
Renegade Stream Archives
These archives are made in a manual fashion through the efforts of streaming communities. Feel free to expand this list.
- Vinesauce Stream Archival Effort - A crowdsourced effort by fans of the Vinesauce Group to archive 1714 of their streams.
- Klaxa.eu's Archive of The 4chan Cup - An existing, complete archive of The 4chan Cup, starting from the 2014 Autumn Games up till today.
By TheTechRobo (#burnthetwitch)
User:TheTechRobo runs an IRC bot that archives Twitch metadata and chat into WARC and JSON in #burnthetwitch (on hackint). Please go easy on the bot! I don't have the best bandwidth or storage. Source code is on GitHub. The data can be found at this collection.
The directory structure isn't too difficult to understand. The top level of each IA item contains a single folder named after the Twitch channel, which holds all of the relevant data. Inside that folder, the tar files are grabs of the channel metadata and list of VODs, and the subfolders are the VOD grabs (remember that these are metadata only). Each VOD subfolder contains one tar file per grab of that VOD (there will almost always be only one).
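The layout described above can be inspected with a short Python sketch like the one below. It assumes the item has already been downloaded to a local folder and that grabs can be recognized by a `.tar` extension; both the `summarize_channel` helper and the extension check are assumptions for illustration, not part of the bot's tooling.

```python
from pathlib import Path


def summarize_channel(channel_dir: Path) -> dict:
    """Split a channel folder into channel-level grabs and per-VOD grabs."""
    entries = sorted(channel_dir.iterdir())
    return {
        # Tar files directly inside the channel folder: channel metadata
        # and VOD-list grabs.
        "channel_grabs": [
            p.name for p in entries if p.is_file() and ".tar" in p.suffixes
        ],
        # Each subfolder is one VOD; its tar files are the grabs of that VOD.
        "vod_grabs": {
            d.name: sorted(t.name for t in d.iterdir() if ".tar" in t.suffixes)
            for d in entries
            if d.is_dir()
        },
    }
```

Calling `summarize_channel(Path("examplechannel"))` on a downloaded channel folder returns a dictionary that makes it easy to spot VODs with more than one grab.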
Note if you're going through the dataset: in items grabbed on October 29, 2022, there may be orphan records or partial grabs in the WARCs. This is not faulty data: it is valid data from Twitch's GQL, and the records should be fine. I changed the port warcprox starts up on so I could run test and prod at the same time, but forgot to change the port used in commands etc. The WARCs should be fine; they just might have unrelated data inside them. Also, items from before roughly June 28, 2023, may have one or more WARCs from failed grabs inside; again, these WARCs should be absolutely fine (the grab just failed).
Other
Butt Controversy
In June 2016, Twitch deleted a bunch of custom emoticons on the grounds of obscenity. (See http://www.dailydot.com/esports/twitch-butt-emotes/ for details.)
The emotes can be found in the backend at: https://static-cdn.jtvnw.net/emoticons/v1/<id number>/<size> where id number ranges from 1 to 103667 (as of 2016-06-22), with no leading zeroes, and size is 1.0, 2.0 or 3.0.
Note: sizes 0.5, 1.5, 2.5, 3.5, 4.0, 4.5 and 5.0 are valid as well, but for most (all?) emotes they return the same data as the next-highest available whole-number size, or the largest available size if there is none. That is, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5 and 5.0 match 1.0, 1.0, 2.0, 2.0, 3.0, 3.0, 3.0, 3.0, 3.0 and 3.0 respectively.
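The size mapping described above amounts to rounding up to the next whole number and capping at 3.0. A minimal Python sketch, assuming that mapping holds for every emote (the text above only says "most"), with hypothetical helper names:

```python
import math

EMOTE_URL = "https://static-cdn.jtvnw.net/emoticons/v1/{emote_id}/{size}"

# Sizes the backend accepted, per the note above: 0.5 to 5.0 in steps of 0.5.
VALID_SIZES = [step / 2 for step in range(1, 11)]


def effective_size(requested: float) -> str:
    """Return the whole-number size whose data a requested size matches."""
    if requested not in VALID_SIZES:
        raise ValueError(f"size {requested} is not one of {VALID_SIZES}")
    # Round up to the next whole number, capped at the largest size, 3.0.
    return f"{min(math.ceil(requested), 3)}.0"


def emote_url(emote_id: int, requested: float = 1.0) -> str:
    """Build the URL for an emote, de-duplicating equivalent sizes."""
    return EMOTE_URL.format(emote_id=emote_id, size=effective_size(requested))


print(emote_url(9, 2.5))  # https://static-cdn.jtvnw.net/emoticons/v1/9/3.0
```

Normalizing sizes this way means only three URLs per emote id need to be fetched to cover every distinct image.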
All emote graphics and sizes that still existed in the backend system up to emote id 103667 were archived through ArchiveBot here; the resulting WARCs can be downloaded through the viewer. Their associated chat 'shortnames' (e.g. "<3" for emote #9) were not archived, as they cannot be easily determined.
See Also
External links
- "TPP'S VODS WILL BE SAVED IN THEIR ENTIRETY"
- "Amazon Might Buy Video Game Streaming Service Twitch Before Google Can"
- "Twitch is Trying to Restore Deleted VODs"