Data compression algorithms and tools
This list contains the most popular data compression algorithms and tools. All of them are free and open source, an important detail if you want to preserve data for a long time and still be able to decompress it in the future.
General purpose compression
Aside from 7z, there are also:
xz
- Based on the LZMA SDK
- Commonly included by default in Linux distros
lzip
- Not as widely used as xz
- Well-defined file format and an emphasis on data integrity
- lziprecover can correct some bit-flip errors and merge damaged copies
zip
- Supported out of the box in every current Windows version; if you need cross-platform tooling, use 7-Zip
zstd
- Very efficient in both speed and compression ratio
- First-class support for custom dictionaries, which is particularly useful when compressing many small data units (e.g. a WARC file containing many HTML pages from one website). Using a trained dictionary massively improves the compression ratio in such scenarios
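To see why a preset dictionary helps with many tiny inputs, here is a minimal sketch using Python's standard-library zlib, whose zdict parameter implements the same idea as zstd's trained dictionaries. The HTML "dictionary" below is a made-up stand-in for one produced by real training:

```python
import zlib

# Hypothetical "trained" dictionary: boilerplate shared by all the
# small documents we expect to compress.
zdict = b"<html><head><title></title></head><body></body></html>"

page = b"<html><head><title>Hi</title></head><body><p>Hi</p></body></html>"

# Without a dictionary the compressor starts cold on each tiny input.
plain = zlib.compress(page, 9)

# With zdict, back-references into the dictionary are available from
# the first byte, so the shared boilerplate costs almost nothing.
comp = zlib.compressobj(9, zlib.DEFLATED, zdict=zdict)
small = comp.compress(page) + comp.flush()

# The decompressor must be given the same dictionary.
decomp = zlib.decompressobj(zdict=zdict)
assert decomp.decompress(small) == page

print(len(plain), len(small))
```

zstd scales this up: it derives the dictionary automatically from sample files and applies it per compressed unit, which is why the gain is largest when the units are small and similar.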
Heavy duty compression
These programs often use large amounts of memory to get the best possible compression ratio.
"This is a compression program optimised for large files" -lrzip readme
lrzip is fantastic for archiving: its compression ratio improves as the input file grows, although it is a terribly slow compressor. lrzip really shines when compressing large sets of redundant but distant, otherwise unconnected data. General-purpose compression algorithms never see this redundancy, given their tiny compression windows.
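The long-range effect lrzip exploits can be demonstrated with Python's standard library: zlib's 32 KiB window cannot see a repeated block that sits far away, but a crude deduplication pass before compression (what lrzip's rzip stage does, in spirit) recovers the redundancy. The chunking scheme below is purely illustrative:

```python
import os
import zlib

CHUNK = 64 * 1024

# Two identical 64 KiB blocks separated by 256 KiB of unrelated data:
# the repeat lies far outside zlib's 32 KiB window.
block = os.urandom(CHUNK)
filler = os.urandom(4 * CHUNK)
data = block + filler + block

# Straight compression: the window is too small to spot the repeat.
whole = zlib.compress(data, 9)

# Toy long-range dedup: split into fixed chunks and store each
# distinct chunk once (real tools use rolling and strong hashes).
seen = {}
unique = []
refs = []
for i in range(0, len(data), CHUNK):
    c = data[i:i + CHUNK]
    if c not in seen:
        seen[c] = len(unique)
        unique.append(c)
    refs.append(seen[c])

# The deduplicated stream drops one entire 64 KiB block.
dedup = zlib.compress(b"".join(unique), 9)

print(len(whole), len(dedup), refs)
```

Here the deduplicated stream compresses roughly one block smaller than the straight one, because the distant repeat was folded away before the compressor ever saw the data.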
zpaq
- Uses deduplication, journaling, and several different compression algorithms (LZ77, BWT, and PAQ context mixing)
- Supported as a compression backend by lrzip
- EXTREMELY slow
kgb
Uses the PAQ6 compression algorithm. Excellent compression ratio (better than 7z), but slow.
You can install it in Ubuntu with: sudo apt-get install kgb
- kgb -m file.kgb originalfile
- -m is a digit from 0 to 9 (0 gives the lowest compression ratio, 9 the highest; higher levels use up to 1616 MB of RAM and a lot of CPU time)