packrat
A Block-Sorting Lossless Data Compression Tool
Implements the Burrows-Wheeler Transform pipeline with SA-IS suffix array construction for efficient compression of text and source code.
Abstract
packrat is a compression tool that implements the BWT-MTF-RLE-Huffman pipeline, achieving compression ratios of 10–20% on typical source code. The tool employs the SA-IS (Suffix Array by Induced Sorting) algorithm for linear-time suffix array construction, enabling efficient processing of large inputs.
A key feature is solid archive support, which groups files by type before compression to exploit cross-file redundancy patterns common in software projects.
Key Features
BWT Pipeline
Uses Burrows-Wheeler Transform, Move-to-Front, Run-Length Encoding, and Canonical Huffman coding-the same approach as bzip2.
Solid Archives
Groups files by extension (C, Python, JavaScript, etc.) and compresses them together to exploit cross-file redundancy.
SA-IS Algorithm
Implements the Suffix Array by Induced Sorting algorithm for O(n) suffix array construction on large inputs.
Simple Interface
Command-line tool with straightforward flags for compression, decompression, archiving, and extraction.
Usage
# Build from source
make
# Compress a single file
./packrat -c input.txt output.prt
# Decompress
./packrat -d output.prt restored.txt
# Create a solid archive of a directory
./packrat -a --solid archive.prt src/
# Extract archive contents
./packrat -x archive.prt dest/