Difference between revisions of "The Pirate Bay"

* [http://thepiratebay.se/torrent/7028505/The_Pirate_Bay_full_siterip_2012 Siterip]: This 3GB archive saves all the HTML pages from The Pirate Bay, including comments. No torrent files, as usual.
** '''Magnet Link''': ''magnet:?xt=urn:btih:3ab8dd096aea63ddf668a127b81ba7fb6799364d&dn=The+Pirate+Bay+full+siterip+2012&tr=udp%3A%2F%2Ftracker.openbittorrent.com%3A80&tr=udp%3A%2F%2Ftracker.publicbt.com%3A80&tr=udp%3A%2F%2Ftracker.ccc.de%3A80''
* [https://thepiratebay.se/torrent/7706886/Backup_of_The_Pirate_Bay_%28IDs__3200000_-_7700000%29 IDs 3200000-7699999]: tpb2csv backup as of 2012-10-06. It includes all comments, file lists, and details in CSV format.
** '''Magnet Link''': ''magnet:?xt=urn:btih:0dfe31d5d91058bcbe5cfbcf98646700890afea0&dn=Backup+of+The+Pirate+Bay+%28IDs%3A+3200000+-+7700000%29&tr=udp%3A%2F%2Ftracker.openbittorrent.com%3A80&tr=udp%3A%2F%2Ftracker.publicbt.com%3A80&tr=udp%3A%2F%2Ftracker.istole.it%3A6969&tr=udp%3A%2F%2Ftracker.ccc.de%3A80''
* [https://thepiratebay.se/torrent/8044295/Backup_of_The_Pirate_Bay_%28IDs__7700000_-_7999999%29 IDs 7700000-7999999]: tpb2csv backup as of 2013-01-09. It includes all comments, file lists, and details in CSV format.
** '''Magnet Link''': ''magnet:?xt=urn:btih:9f9c8cab8b68956a25d6e8c190e5e8dc8cf7186c&dn=Backup+of+The+Pirate+Bay+%28IDs%3A+7700000+-+7999999%29&tr=udp%3A%2F%2Ftracker.openbittorrent.com%3A80&tr=udp%3A%2F%2Ftracker.publicbt.com%3A80&tr=udp%3A%2F%2Ftracker.istole.it%3A6969&tr=udp%3A%2F%2Ftracker.ccc.de%3A80''



Revision as of 23:19, 21 December 2014

The Pirate Bay
URL http://www.thepiratebay.org/
Status Offline
Archiving status Partially Saved!
Archiving type Unknown
IRC channel #yarharfiddlededee (on hackint)

The Pirate Bay is one of the largest and most popular torrent search engines.

The site continues to have persistent legal problems. The tracker went down in November 2012, but the site still serves torrents and magnet links. If a torrent is lost, it becomes impossible to connect to the other computers distributing the shared files. Considering that there are links to TPB all over this wiki, this site is pretty dang important.

In December 2014, the website went offline due to an alleged raid[1].

In case of Fire

To limit the damage to the Archive Team wiki if The Pirate Bay ever goes down, we should include a magnet link next to every TPB link we have.
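As a rough illustration of what adding a magnet link involves, the sketch below assembles one from an info hash, a display name, and a list of tracker URLs. The function name build_magnet is hypothetical; the example values are taken from the siterip entry on this page. It only covers the xt, dn, and tr parameters and is not a full implementation of the magnet URI scheme.

```python
from urllib.parse import quote_plus


def build_magnet(info_hash: str, name: str, trackers: list[str]) -> str:
    """Assemble a magnet URI from an info hash, display name, and tracker URLs."""
    # xt carries the BitTorrent info hash; dn is only a human-readable label.
    parts = [f"xt=urn:btih:{info_hash.lower()}", f"dn={quote_plus(name)}"]
    # Each tracker gets its own tr parameter, percent-encoded.
    parts += [f"tr={quote_plus(t)}" for t in trackers]
    return "magnet:?" + "&".join(parts)


link = build_magnet(
    "3ab8dd096aea63ddf668a127b81ba7fb6799364d",
    "The Pirate Bay full siterip 2012",
    [
        "udp://tracker.openbittorrent.com:80",
        "udp://tracker.publicbt.com:80",
        "udp://tracker.ccc.de:80",
    ],
)
print(link)
```

Only the info hash is needed to locate a torrent; the name and trackers are conveniences, so a link built from the hash alone would still be worth recording next to a TPB URL.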

Archival Methods

We can simply scrape the magnet links, descriptions, and comments. The hard part would probably be keeping it all updated... (Maybe we could use a git repository, and pull as necessary?)

Magnet links are provided in the Pirate Bay Magnet Archive below, and descriptions and comments are in the siterip.
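A minimal sketch of the scraping step, assuming each saved torrent page contains an ordinary anchor tag whose href starts with "magnet:". This is not how the tools listed below actually work; the class and function names (MagnetCollector, info_hash) are made up for illustration, and the sample HTML is a stand-in for a real page.

```python
from html.parser import HTMLParser
from urllib.parse import parse_qs, urlsplit


class MagnetCollector(HTMLParser):
    """Collect href values that start with 'magnet:' from anchor tags."""

    def __init__(self):
        super().__init__()
        self.magnets = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href") or ""
        if href.startswith("magnet:"):
            self.magnets.append(href)


def info_hash(magnet: str) -> str:
    """Return the lowercase info hash from a magnet URI, or '' if absent."""
    query = urlsplit(magnet).query
    for xt in parse_qs(query).get("xt", []):
        if xt.startswith("urn:btih:"):
            return xt[len("urn:btih:"):].lower()
    return ""


# Stand-in for one saved page; a real run would feed in files from disk.
sample_html = (
    '<a href="/browse">Browse</a>'
    '<a href="magnet:?xt=urn:btih:3AB8DD096AEA63DDF668A127B81BA7FB6799364D'
    '&amp;dn=The+Pirate+Bay+full+siterip+2012">Get this torrent</a>'
)
collector = MagnetCollector()
collector.feed(sample_html)

# Key by info hash so re-scraping the same torrent does not add duplicates.
by_hash = {info_hash(m): m for m in collector.magnets}
print(sorted(by_hash))
```

Keying the results by info hash is one way to keep repeated scrapes idempotent, which would matter if the output were committed to a repository and refreshed periodically.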

Archival Tools

  • Magnet link Dumper: A Perl script that dumps magnet links into a single text file. It was used to make the magnet archive below.
  • tpb2csv: scrapes The Pirate Bay website and strips out all the HTML crap, leaving only pure sweet metadata. For each torrent it produces:
    • details.csv: Title, Type, Files, Size, IMDB, Spoken Languages, Texted Languages, Tags, Quality (+), Quality (-), Uploaded, By, User Type, Seeders, Leechers, Info Hash, Picture, Capture Date
    • description.txt
    • comments.csv: User Type, Username, Date, Text
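For anyone consuming these files, here is a hedged sketch of reading details.csv with Python's csv module. The column names come from the list above; everything else is an assumption, including comma delimiters, one torrent per row, and whether a header row is present (controlled by the hypothetical has_header flag). The sample row uses made-up values purely to exercise the code.

```python
import csv
import io

# Column order as listed for details.csv above.
DETAILS_COLUMNS = [
    "Title", "Type", "Files", "Size", "IMDB", "Spoken Languages",
    "Texted Languages", "Tags", "Quality (+)", "Quality (-)", "Uploaded",
    "By", "User Type", "Seeders", "Leechers", "Info Hash", "Picture",
    "Capture Date",
]


def read_details(fp, has_header=False):
    """Yield one dict per row, keyed by the details.csv column names."""
    reader = csv.reader(fp)
    if has_header:
        next(reader, None)  # skip the header row if the file has one
    for row in reader:
        if len(row) != len(DETAILS_COLUMNS):
            continue  # skip malformed rows rather than guess at alignment
        yield dict(zip(DETAILS_COLUMNS, row))


# Made-up row for illustration only; not taken from a real backup.
sample_row = [
    "Example title", "Other", "1", "3 GiB", "", "", "", "", "0", "0",
    "2012-02-08", "someuser", "", "10", "2",
    "3ab8dd096aea63ddf668a127b81ba7fb6799364d", "", "2012-10-06",
]
buffer = io.StringIO()
csv.writer(buffer).writerow(sample_row)
buffer.seek(0)

rows = list(read_details(buffer))
print(rows[0]["Title"], rows[0]["Info Hash"])
```

All values come back as strings; converting Seeders, Leechers, or dates is left out because the exact formats in the backups are not documented here.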

Backups


  • Pirate Bay Metadata Git Repo (https://github.com/tpb-archive): tpb2csv-scraped metadata, including comments, file lists, descriptions, and details. The link is broken (GitHub censorship?).
    • IDs 8000000-8999999: tpb2csv-scraped metadata not included in the "Backup" torrents listed above, fetched on 2013-06-02.

At the end of 2014 there were roughly 11,000,000 torrent IDs.

References