
Twitch is a live video streaming service.

Twitch was rumored to have been acquired by YouTube/Google, but Amazon was the final buyer.[1]


Changes To VODs On Twitch

Aug 06 2014 · Engineering, Tech

Our goal at Twitch is straightforward: deliver the highest quality video. This includes the ability to watch video on demand (VOD) on all of our platforms, not just the website.

In order to create a system that supports live and VOD across the globe and on multiple platforms, we need to make significant changes to the way we’re currently storing video. Today, we’d like to discuss what these changes are, why they’re necessary, and how they benefit the entire Twitch community now and in the future.


Looking at Viewership Data

We found that the vast majority of past broadcast views happen within the first two weeks after they’re created. On the days following, viewership reduces exponentially.

We also discovered that 80% of our storage capacity is filled with past broadcasts that are never watched. That’s multiple petabytes for video that no one has ever viewed.

Highlights, on the other hand, have much more value and longevity. Over their lifetime, highlights get 9x as many views as past broadcasts.


As for existing past broadcasts, beginning three weeks from today, we will begin removing them from Twitch servers. If you would like to keep your past broadcasts, we encourage you to begin exporting or making highlights of your best moments so that they’re saved for posterity.


Site structure

Storage Issues

  • How to decide which are important? 10+ views again? Do a discovery crawl first?
  • Tahoe-LAFS? Grab all the videos into temp storage?
  • Compress all the unwatched videos into postage stamp sized videos?


All items discovered are located at twitchtv-items. A collated JSON dump is available.
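One way to approach the "10+ views" question is to filter the discovered items by view count before grabbing anything. The sketch below is only an illustration: the structure of the collated JSON dump is not described here, so the `id` and `views` field names, the inline sample data, and the `select_popular` helper are all hypothetical.

```python
import json


def select_popular(items, min_views=10, views_key="views"):
    """Keep items whose view count meets the threshold.

    Items with no integer view count are skipped rather than guessed at.
    The "views" key is an assumed field name, not the dump's real schema.
    """
    selected = []
    for item in items:
        views = item.get(views_key)
        if isinstance(views, int) and views >= min_views:
            selected.append(item)
    return selected


# Hypothetical sample standing in for the collated JSON dump.
sample = json.loads(
    '[{"id": "a", "views": 3}, {"id": "b", "views": 42}, {"id": "c"}]'
)

popular = select_popular(sample)
print([item["id"] for item in popular])
```

Running a discovery crawl first and filtering afterwards keeps the expensive video downloads limited to items that pass the chosen threshold.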

How can I help?

Download and fire up your warrior! Then select Twitch Phase 2. Better yet, select Archive Team's Choice.

Alternatively, advanced users can run the scripts manually; see below.

Don't forget to donate to the Internet Archive, which will be hosting these files. Disk space is cheap, but maintaining it is not!

For those not using the Warrior

Advanced User Quick Start

Please run these sysctl tweaks to optimize uploads:

# Add to /etc/sysctl.conf and run "sysctl -p"
# increase TCP max buffer size settable using setsockopt()
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
# increase Linux autotuning TCP buffer limit
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 65536 16777216

You can also apply them without modifying /etc/sysctl.conf by running, for example:

sysctl net.core.rmem_max=16777216 net.core.wmem_max=16777216

Be aware that settings applied this way won't persist across reboots.
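To confirm that the tweaks actually took effect, the current values can be read back from /proc/sys and compared with the targets. This is a minimal sketch, assuming a Linux host; the `WANTED` table simply repeats the four settings listed above, and the helper names (`proc_path`, `normalize`, `check`) are made up for this example.

```python
from pathlib import Path

# Target values copied from the sysctl settings listed above.
WANTED = {
    "net.core.rmem_max": "16777216",
    "net.core.wmem_max": "16777216",
    "net.ipv4.tcp_rmem": "4096 87380 16777216",
    "net.ipv4.tcp_wmem": "4096 65536 16777216",
}


def proc_path(key, root="/proc/sys"):
    """Map a sysctl key such as net.core.rmem_max to its /proc/sys file."""
    return Path(root) / key.replace(".", "/")


def normalize(value):
    """Collapse whitespace so tab-separated kernel output compares cleanly."""
    return " ".join(value.split())


def check(wanted=WANTED, root="/proc/sys"):
    """Return {key: (current, wanted)} for every setting that differs."""
    mismatches = {}
    for key, target in wanted.items():
        try:
            current = normalize(proc_path(key, root).read_text())
        except OSError:
            current = None  # key not readable (for example, not on Linux)
        if current != normalize(target):
            mismatches[key] = (current, target)
    return mismatches


if __name__ == "__main__":
    for key, (current, target) in check().items():
        print(f"{key}: current={current!r} wanted={target!r}")
```

If the script prints nothing, all four values already match. Any line it does print names a setting that still needs to be applied.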

apt-get install git git-core libgnutls-dev lua5.1 liblua5.1-0 liblua5.1-0-dev screen python-dev python-pip bzip2 zlib1g-dev
git clone
cd ./twitchtv-grab
pip install seesaw


pip install requests

wget-lua may have failed earlier. If so, run:

cd get-wget-lua.tmp
mv src/wget ../wget-lua
cd ..

Finally, to actually run the pipeline:

run-pipeline pipeline.py --concurrent 2 YOURNICKHERE --disable-web-server

For troubleshooting and further details, please see the README.

What we are saving



Dollar figures are shown to illustrate the cost of permanent archives. They are not actual values; they are simplified estimates meant to serve as a sane budget. Figures are in USD at an estimated $2000 per TB (not per TB of disk space alone).
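The budget arithmetic is a straight multiplication of archive size by the per-TB estimate. A small sketch follows; the sizes in the loop are made-up examples, not measurements of any channel, and `archive_cost_usd` is a hypothetical helper.

```python
USD_PER_TB = 2000  # estimate quoted above; not an actual price


def archive_cost_usd(size_tb, usd_per_tb=USD_PER_TB):
    """Estimate the permanent-archive cost in USD for a given size in TB."""
    if size_tb < 0:
        raise ValueError("size_tb must be non-negative")
    return size_tb * usd_per_tb


# Made-up sizes, purely to show the calculation.
for size_tb in (0.5, 1, 10):
    print(f"{size_tb} TB -> ${archive_cost_usd(size_tb):,.0f}")
```

At this rate, a channel with half a terabyte of video would be budgeted at $1,000.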

Channels not included:

  • speeddemosarchivesda: already in IA
  • vinesauce: avoiding duplication, see below
  • and others

Anything culturally significant to add? Comment on the talk page. Don't forget to sign your comments with ~~~~.