packrat

A Block-Sorting Lossless Data Compression Tool

Implements the Burrows-Wheeler Transform pipeline with SA-IS suffix array construction for efficient compression of text and source code.

Abstract

packrat is a compression tool that implements the BWT-MTF-RLE-Huffman pipeline, achieving compression ratios of 10–20% on typical source code. The tool employs the SA-IS (Suffix Array by Induced Sorting) algorithm for linear-time suffix array construction, enabling efficient processing of large inputs.

A key feature is solid archive support, which groups files by type before compression to exploit cross-file redundancy patterns common in software projects.

Key Features

BWT Pipeline

Uses Burrows-Wheeler Transform, Move-to-Front, Run-Length Encoding, and Canonical Huffman coding-the same approach as bzip2.

Solid Archives

Groups files by extension (C, Python, JavaScript, etc.) and compresses them together to exploit cross-file redundancy.

SA-IS Algorithm

Implements the Suffix Array by Induced Sorting algorithm for O(n) suffix array construction on large inputs.

Simple Interface

Command-line tool with straightforward flags for compression, decompression, archiving, and extraction.

Usage

# Build from source
make

# Compress a single file
./packrat -c input.txt output.prt

# Decompress
./packrat -d output.prt restored.txt

# Create a solid archive of a directory
./packrat -a --solid archive.prt src/

# Extract archive contents
./packrat -x archive.prt dest/