Twitch.tv

From Archiveteam
Revision as of 06:09, 14 September 2014 by Wickedplayer494 (let's make it more specific)
URL: http://twitch.tv
Status: Special Case (archives to be deleted)
Archiving status: Saved! (popular videos only); an unknown number of other videos lost
Archiving type: Unknown
Project source: Phase 1, Phase 2, Items, Index
Project tracker: Phase 1, Phase 2
IRC channel: #burnthetwitch (on hackint)

Justin.tv—sorry, cough, I mean to say—Twitch.tv is a live video streaming service.

Twitch was reportedly about to be acquired by YouTube/Google, but Amazon ended up buying it.[1]

Shutdown

Changes To VODs On Twitch

Aug 06 2014 · Engineering, Tech

Our goal at Twitch is straightforward: deliver the highest quality video. This includes the ability to watch video on demand (VOD) on all of our platforms, not just the website.

In order to create a system that supports live and VOD across the globe and on multiple platforms, we need to make significant changes to the way we’re currently storing video. Today, we’d like to discuss what these changes are, why they’re necessary, and how they benefit the entire Twitch community now and in the future.

[...]

Looking at Viewership Data

We found that the vast majority of past broadcast views happen within the first two weeks after they’re created. On the days following, viewership reduces exponentially.

We also discovered that 80% of our storage capacity is filled with past broadcasts that are never watched. That’s multiple petabytes for video that no one has ever viewed.

Highlights, on the other hand, have much more value and longevity. Over their lifetime, highlights get 9x as many views as past broadcasts.

[...]

As for existing past broadcasts, beginning three weeks from today, we will begin removing them from Twitch servers. If you would like to keep your past broadcasts, we encourage you to begin exporting or making highlights of your best moments so that they’re saved for posterity.

[...][2]

Site structure

Storage Issues

  • How do we decide which videos are important? Use the 10+ views threshold again? Do a discovery crawl first?
  • Store them on Tahoe-LAFS? Grab all the videos into temporary storage first?
  • Re-encode all the unwatched videos as postage-stamp-sized videos?
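The "10+ views" idea above can be sketched as a simple filter over discovered items. This is a minimal sketch only. It assumes each item is a dict with hypothetical "id" and "views" keys, because the real item format is not documented on this page.

```python
from typing import Dict, Iterable, List

# Threshold from the "10+ views" idea above.
MIN_VIEWS = 10


def select_important(items: Iterable[Dict], min_views: int = MIN_VIEWS) -> List[str]:
    """Return the IDs of items whose view count meets the threshold.

    Assumes each item is a dict with hypothetical "id" and "views" keys.
    Items without a "views" key are treated as having zero views.
    """
    return [item["id"] for item in items if item.get("views", 0) >= min_views]


if __name__ == "__main__":
    sample = [
        {"id": "a1", "views": 3},
        {"id": "b2", "views": 10},
        {"id": "c3", "views": 250},
        {"id": "d4"},
    ]
    print(select_important(sample))  # ['b2', 'c3']
```

A discovery crawl that records view counts would have to come first. The threshold is then a one-line decision that is easy to revisit.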

Discovery

All discovered items are listed at twitchtv-items. A collated JSON dump is also available.

How can I help?

Download and fire up your Warrior! Then select Twitch Phase 2, or better yet, select Archive Team's Choice.

Alternatively, advanced users can run the scripts manually; see below.

Don't forget to donate to the Internet Archive, which will be hosting these files. Disk space is cheap, but maintaining the files is not!

For those not using the Warrior

Advanced User Quick Start

Please run these sysctl tweaks to optimize uploads:

# Add to /etc/sysctl.conf and run "sysctl -p"
# increase TCP max buffer size settable using setsockopt()
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
# increase Linux autotuning TCP buffer limit
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 65536 16777216

You can also apply them without modifying /etc/sysctl.conf by running, e.g. (as root):

sysctl -w net.core.rmem_max=16777216 net.core.wmem_max=16777216

Be aware that settings applied this way won't persist across reboots.
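To confirm that the tweaks took effect, you can read them back from /proc/sys. Each dotted sysctl key maps to a slash-separated file there. This is a minimal sketch assuming a Linux host. The helper names are hypothetical, and the expected values are copied from the sysctl.conf snippet above.

```python
from pathlib import Path
from typing import Dict

# Values recommended in the sysctl.conf snippet above.
EXPECTED = {
    "net.core.rmem_max": "16777216",
    "net.core.wmem_max": "16777216",
    "net.ipv4.tcp_rmem": "4096 87380 16777216",
    "net.ipv4.tcp_wmem": "4096 65536 16777216",
}


def key_to_path(key: str, root: str = "/proc/sys") -> Path:
    """Map a dotted sysctl key to its /proc/sys file (dots become path separators)."""
    return Path(root, *key.split("."))


def normalize(value: str) -> str:
    """Collapse the tabs/newlines the kernel prints around multi-value settings."""
    return " ".join(value.split())


def check_settings(root: str = "/proc/sys") -> Dict[str, bool]:
    """Return {key: matches_expected}; unreadable or missing files count as False."""
    results = {}
    for key, want in EXPECTED.items():
        try:
            got = normalize(key_to_path(key, root).read_text())
        except OSError:
            got = None
        results[key] = got == want
    return results


if __name__ == "__main__":
    for key, ok in check_settings().items():
        print(f"{key}: {'OK' if ok else 'not applied'}")
```

If any key reports "not applied", re-run "sysctl -p" as root after editing /etc/sysctl.conf.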

apt-get install git git-core libgnutls-dev lua5.1 liblua5.1-0 liblua5.1-0-dev screen python-dev python-pip bzip2 zlib1g-dev
git clone https://github.com/ArchiveTeam/twitchtv-grab
cd ./twitchtv-grab
pip install seesaw
./get-wget-lua.sh

...

pip install requests


wget-lua may have failed earlier; if so, then:

cd get-wget-lua.tmp
mv src/wget ../wget-lua
cd ..

Finally, to actually run the pipeline:

run-pipeline pipeline.py --concurrent 2 YOURNICKHERE --disable-web-server

For troubleshooting and further details, please see the README.

What we are saving

Currently:

Next:

Dollar figures are shown to illustrate the cost of permanent archives. They are not actual costs; they are simplified values meant to serve as a sane budget. Figures are in USD, estimated at $2000 per TB (an estimate covering more than the raw disk space alone).
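The budget arithmetic is a single multiplication at the $2000-per-TB rate. This is a minimal sketch. The function name and the use of decimal units (1 TB = 1000 GB) are assumptions; the page does not specify them.

```python
# Budget rate from the note above, in USD per TB (covers more than raw disk space).
USD_PER_TB = 2000


def archive_cost_usd(size_gb: float, usd_per_tb: float = USD_PER_TB) -> float:
    """Estimate the permanent-archive budget for size_gb gigabytes of video.

    Assumes decimal units: 1 TB = 1000 GB.
    """
    return size_gb / 1000 * usd_per_tb


if __name__ == "__main__":
    # A hypothetical 750 GB channel.
    print(f"${archive_cost_usd(750):,.2f}")  # $1,500.00
```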

Channels not included:

  • speeddemosarchivesda: already in IA
  • vinesauce: avoiding duplication, see below
  • and others

Anything culturally significant to add? Comment on Talk:Twitch.tv. Don't forget to sign your comments with ~~~~.

Archives

By Archive Team

Archives will be made available later as WARC files in the archiveteam_twitchtv collection at the Internet Archive. You can access them through the Wayback Machine, but you'll need to search an index to find the media files.
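Once the index gives you the original URL of a media file, you can look it up with the Wayback Machine's standard URL forms. "web.archive.org/web/*/URL" lists all captures. "web.archive.org/web/TIMESTAMP/URL" redirects to the capture nearest that 14-digit timestamp. This minimal sketch only builds those URLs. The helper names and the example media URL are hypothetical, and whether a given file was captured depends on what the index lists.

```python
from datetime import datetime


def wayback_listing_url(original_url: str) -> str:
    """URL of the Wayback Machine page listing all captures of original_url."""
    return f"https://web.archive.org/web/*/{original_url}"


def wayback_capture_url(original_url: str, when: datetime) -> str:
    """URL that redirects to the capture of original_url nearest to `when`.

    Wayback timestamps use the 14-digit YYYYMMDDhhmmss format.
    """
    return f"https://web.archive.org/web/{when:%Y%m%d%H%M%S}/{original_url}"


if __name__ == "__main__":
    # Hypothetical media URL; take real ones from the index.
    media = "http://example.com/videos/example.flv"
    print(wayback_listing_url(media))
    print(wayback_capture_url(media, datetime(2014, 9, 1)))
```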

A work-in-progress searchable index is now available!

Renegade Stream Archives

These archives are made in a manual fashion through the efforts of streaming communities. Feel free to expand this list.

See Also

External links

References