Search results

  • ...n URLs across the various domains were grabbed, resulting in 102.6 GiB of (compressed) WARCs.
    6 KB (1,017 words) - 23:41, 29 December 2023
  • ...an be exported with the "Google Takeout" interface which sends a series of compressed archives with data from the various services. It's not always reliable.<ref
    11 KB (1,629 words) - 06:01, 25 November 2023
  • ...ch record is compressed via gzip. A gzip file supports multiple "members"; compressed WARCs end in .warc.gz. According to the guidelines, WARC files should top o
    18 KB (2,481 words) - 01:00, 24 March 2024
  • ...ML text, but doesn't help at all when downloading material that is already compressed, like JPEG or PNG files. To enable compression, use:
    7 KB (1,114 words) - 16:27, 17 January 2017
  • ...| bigint(20) | NO | | NULL | | (length of the (compressed) individual record)
    13 KB (1,827 words) - 16:45, 14 November 2021
  • ...13_common_crawl_index_urls Common Crawl index] is a very big (21 gigabytes compressed) list of URLs in the Common Crawl corpus. Grepping this list may well revea
    9 KB (1,436 words) - 02:35, 18 September 2023
  • 2,200,001 to 2,300,000: Uploaded, 50 GB compressed (Darkstar); 2,300,001 to 2,400,000: Uploaded, 70 GB compressed (Darkstar)
    54 KB (6,859 words) - 16:44, 14 November 2021
  • The file is a tar archive compressed with [http://tukaani.org/xz/ `xz(1)`] from 674MB to 39MB. It contains the c
    12 KB (1,788 words) - 20:15, 14 March 2021
  • ! Archive Name !! Archive Type !! Size (Compressed) !! Size (Uncompressed) !! # of Profiles !! Volunteer
    10 KB (1,143 words) - 01:09, 15 November 2021
  • ...9/8c0e7aae4607412f82bf4a7a4486fe36/fat.jpg~tplv-banciyuan-obj.image is the compressed version of <!-- Referer ACL is enabled on img5, so don't make it a hyperlin
    20 KB (2,985 words) - 21:02, 16 July 2023
  • project pages and random other files wget got. Size: 400 MB compressed.
    14 KB (2,057 words) - 01:47, 11 November 2018
  • ...cly available Reddit comment for research. ~ 1.7 billion comments @ 250 GB compressed. Any interest in this?]
    18 KB (2,797 words) - 21:11, 11 September 2023
  • ...es, with the largest ones being a few tens (less than 100) megabytes (WARC compressed). Note that this is a rough estimate with a small sample. (That would mean
    22 KB (3,273 words) - 00:34, 5 December 2017
  • ...in our torrents too, just in a different format (we use pipe-delimited, xz-compressed files while 301works uses comma-delimited uncompressed files). | divided up into 3,835 files in the last old-style dump, totaling 39 GB (compressed!). Also worked on as a Warrior job, see below.
    82 KB (13,434 words) - 16:01, 25 April 2024
  • ...the original video files in (semi-)offline storage, and store transcoded (compressed) versions on the Internet Archive.
    32 KB (4,950 words) - 22:40, 30 October 2023
  • ...it. So we put ourselves up on The Pirate Bay, we have a 641GB - because it compressed well - torrent, with 7,854 files that were basically 7zs, and we put that s
    41 KB (7,606 words) - 02:37, 12 December 2017
  • ...tation archive is available at {{IA collection|youtubeannotations}}, and a compressed copy can be found at {{IA item|youtubeannotations.tar.zstd}}. 16GB of just
    53 KB (7,698 words) - 07:32, 26 March 2024
  • we have a 641GB - because it compressed well
    64 KB (8,282 words) - 04:09, 25 June 2015
