Overview
DEFLATE is the compression algorithm used in GZIP files. It combines LZ77 sliding window compression with Huffman coding to achieve efficient lossless compression.
The DEFLATE algorithm is specified in RFC 1951 .
DEFLATE Function
From our_zlib.h:71:
char * deflate ( char * filename , char * bytes , size_t len , size_t * out_len )
Parameters:
filename - Pointer to the filename (used for GZIP header metadata)
bytes - Pointer to data to compress
len - Number of bytes of input data
out_len - Output parameter for the length of the compressed data
Returns:
A malloc’d buffer containing a complete GZIP file (header + compressed data + footer)
NULL on error (NULL pointers, compression failure, etc.)
The returned buffer is malloc’d. The caller is responsible for freeing it.
DEFLATE Algorithm
The DEFLATE compression process consists of several steps:
Build GZIP header
Create a GZIP member header with metadata
Chunk data into blocks
Split input into blocks of at most 65,535 bytes
LZ77 compression
Apply LZ77 to find repeated sequences
Huffman encoding
Encode LZ77 tokens with Huffman coding
Build GZIP footer
Compute CRC and add footer
Return complete GZIP file
Concatenate header + compressed blocks + footer
DEFLATE Flowchart
From README lines 551-646:
INPUT: Raw uncompressed data
| Gets chunked into 65535-byte blocks
v
+------------------------------------------------------------------+
| LZ77 TRANSFORM |
+------------------------------------------------------------------+
| - Scan input for matches in sliding window |
| - Output: Stream of literals + <length, distance> pairs |
+------------------------------------------------------------------+
|
| Stream: literals and match tokens
|
v
+------------------------------------------------------------------+
| ENCODE TO LITERAL/LENGTH & DISTANCE CODES |
+------------------------------------------------------------------+
| - Map lengths (3-258) to codes (257-285) + extra bits |
| - Map distances (1-32768) to codes (0-29) + extra bits |
| - Literals (0-255) stay as codes (0-255) |
| - Code 256 = end-of-block |
+------------------------------------------------------------------+
|
v
+------------------------+
| Choose BTYPE |
+------------------------+
|
+----+----+
| |
BTYPE=01 BTYPE=10
(Fixed) (Dynamic)
| |
v v
+--------+ +------------------------------------------+
| FIXED | | BUILD HUFFMAN TREES |
| CODES | +------------------------------------------+
+--------+ | 1. Analyze frequency of lit/len codes |
| | 2. Build optimal lit/len Huffman tree |
| | 3. Analyze frequency of distance codes |
| | 4. Build optimal distance Huffman tree |
| +------------------------------------------+
| |
| v
| +------------------------------------------+
| | EXTRACT CODE LENGTHS FROM TREES |
| +------------------------------------------+
| |
| v
| +------------------------------------------+
| | BUILD CODE LENGTH HUFFMAN TREE |
| +------------------------------------------+
| |
| v
| +------------------------------------------+
| | WRITE DYNAMIC BLOCK HEADER |
| | - HLIT, HDIST, HCLEN |
| | - Code length tree |
| | - Lit/len and distance code lengths |
| +------------------------------------------+
| |
+----------------------+
|
v
+------------------------------------------------------------------+
| ENCODE LZ77 DATA WITH HUFFMAN CODES |
+------------------------------------------------------------------+
| - For each token: |
| * Lookup Huffman code from tree |
| * Write Huffman code bits |
| * Write extra bits (if any) |
| - Write end-of-block (code 256) |
+------------------------------------------------------------------+
|
v
OUTPUT: Compressed DEFLATE block
Block Size Constraints
From README lines 433-444:
For simplicity and consistency during testing, compressed blocks produced by your code should hold at most 65535 bytes, the same restriction placed on uncompressed blocks.
Split input data into chunks of at most 65,535 bytes before compressing each chunk.
From our_zlib.h:15:
#define DEFLATE_BLOCK_SIZE 65535
While your compressor should create blocks of at most 65,535 bytes, your decompressor must handle blocks of any size.
Implementation Steps
Create a GZIP member header:
// Write magic number
output [pos ++ ] = 0x 1f ;
output [pos ++ ] = 0x 8b ;
// Write compression method (8 = DEFLATE)
output [pos ++ ] = 8 ;
// Write flags
unsigned char flags = 0 ;
if (filename && filename [ 0 ]) {
flags |= F_NAME; // Include original filename
}
output [pos ++ ] = flags;
// Write modification time (use current time or 0)
unsigned int mtime = time ( NULL );
write_32_le (output, & pos , mtime);
// Write extra flags (0 = default compression)
output [pos ++ ] = 0 ;
// Write OS (3 = Unix)
output [pos ++ ] = 3 ;
// Write optional filename
if (flags & F_NAME) {
strcpy (( char * ) & output [pos], filename);
pos += strlen (filename) + 1 ; // Include null terminator
}
Step 2: Chunk Data into Blocks
Split the input into blocks:
size_t offset = 0 ;
while (offset < len) {
size_t block_size = (len - offset > DEFLATE_BLOCK_SIZE)
? DEFLATE_BLOCK_SIZE
: (len - offset);
// Compress this block
compress_block (bytes + offset, block_size, is_final_block);
offset += block_size;
}
Step 3: Compress Each Block
For each block:
void compress_block ( const unsigned char * data , size_t len , int is_final ) {
// Step 3a: Apply LZ77 compression
size_t num_tokens;
lz_token_t * tokens = lz_compress_tokens (data, len, & num_tokens);
if ( ! tokens) {
return ; // Error
}
// Step 3b: Huffman encode the tokens
unsigned long bits_written;
size_t encoded_len;
unsigned char btype;
unsigned char * encoded = huffman_encode_tokens (
tokens, num_tokens, & bits_written, & encoded_len, & btype);
free (tokens);
if ( ! encoded) {
return ; // Error
}
// Step 3c: Write block header
write_block_header (is_final, btype);
// Step 3d: Write encoded data
write_block_data (encoded, bits_written);
free (encoded);
}
From README lines 447-456:
Each 'block' of compressed data begins with three header **bits** (NOT bytes).
The first, `BFINAL` is 1 if and only if this block is the last block.
The next two, `BTYPE` contains information about the compression mode.
00 - no compression
01 - compressed with fixed Huffman codes
10 - compressed with dynamic Huffman codes
11 - reserved (your program MUST throw an error if it encounters this)
From our_zlib.h:10-14:
#define BF_SET 1 // BFINAL = 1
#define BT_STATIC 1 // Fixed Huffman
#define BT_DYNAMIC 2 // Dynamic Huffman
#define BT_NO_COMPRESSION 0 // Uncompressed
#define BT_MASK 3 // Mask for BTYPE bits
Writing the header:
void write_block_header ( int is_final , int btype ) {
// 3-bit header: BFINAL (1 bit) + BTYPE (2 bits)
unsigned char header = 0 ;
if (is_final) {
header |= 1 ; // BFINAL = 1
}
header |= (btype << 1 ); // BTYPE in bits 1-2
// Write 3 bits to output stream
write_bits (header, 3 );
}
Step 5: Write Compressed Data
Write the Huffman-encoded data:
void write_block_data ( const unsigned char * data , unsigned long num_bits ) {
// Copy bits to output stream
for ( unsigned long i = 0 ; i < num_bits; i ++ ) {
int bit = ( data [i / 8 ] >> (i % 8 )) & 1 ;
write_bit (bit);
}
}
After all blocks are written:
// Compute CRC32 of original data
unsigned int crc = get_crc (( unsigned char * ) bytes , len);
write_32_le (output, & pos , crc);
// Write uncompressed size (modulo 2^32)
write_32_le (output, & pos , ( unsigned int ) len );
Complete Implementation Outline
char * deflate ( char * filename , char * bytes , size_t len , size_t * out_len ) {
if ( ! bytes || ! len || ! out_len) return NULL ;
// Allocate output buffer (estimate)
size_t output_capacity = len + 1024 ; // Header + footer + overhead
unsigned char * output = malloc (output_capacity);
if ( ! output) return NULL ;
size_t pos = 0 ;
unsigned long bit_pos = 0 ;
// 1. Write GZIP header
pos += write_gzip_header (output + pos, filename);
bit_pos = pos * 8 ;
// 2. Compress data in blocks
size_t offset = 0 ;
while (offset < len) {
size_t block_size = (len - offset > DEFLATE_BLOCK_SIZE)
? DEFLATE_BLOCK_SIZE
: (len - offset);
int is_final = (offset + block_size >= len);
// LZ77 compression
size_t num_tokens;
lz_token_t * tokens = lz_compress_tokens (
( unsigned char * )bytes + offset, block_size, & num_tokens);
if ( ! tokens) {
free (output);
return NULL ;
}
// Huffman encoding
unsigned long block_bits;
size_t block_bytes;
unsigned char btype;
unsigned char * block_data = huffman_encode_tokens (
tokens, num_tokens, & block_bits, & block_bytes, & btype);
free (tokens);
if ( ! block_data) {
free (output);
return NULL ;
}
// Ensure buffer has space
while (pos + block_bytes + 100 > output_capacity) {
output_capacity *= 2 ;
output = realloc (output, output_capacity);
}
// Write block header (3 bits)
unsigned char block_header = (is_final ? 1 : 0 ) | (btype << 1 );
write_bits_to_buffer (output, & bit_pos, block_header, 3 );
// Write block data
copy_bits_to_buffer (output, & bit_pos, block_data, block_bits);
free (block_data);
offset += block_size;
}
// 3. Pad to byte boundary
if (bit_pos % 8 != 0 ) {
bit_pos = (bit_pos + 7 ) & ~ 7 ;
}
pos = bit_pos / 8 ;
// 4. Write GZIP footer
unsigned int crc = get_crc (( unsigned char * )bytes, len);
write_32_le (output, & pos, crc);
write_32_le (output, & pos, ( unsigned int )len);
* out_len = pos;
return ( char * )output;
}
Helper Functions
Writing Little-Endian Integers
void write_32_le ( unsigned char * buf , size_t * pos , unsigned int value ) {
buf [( * pos) ++ ] = value & 0x FF ;
buf [( * pos) ++ ] = (value >> 8 ) & 0x FF ;
buf [( * pos) ++ ] = (value >> 16 ) & 0x FF ;
buf [( * pos) ++ ] = (value >> 24 ) & 0x FF ;
}
void write_16_le ( unsigned char * buf , size_t * pos , unsigned short value ) {
buf [( * pos) ++ ] = value & 0x FF ;
buf [( * pos) ++ ] = (value >> 8 ) & 0x FF ;
}
Bit Writing
void write_bits_to_buffer ( unsigned char * buf , unsigned long * bit_pos ,
unsigned int value , int num_bits ) {
for ( int i = 0 ; i < num_bits; i ++ ) {
int byte_idx = ( * bit_pos) / 8 ;
int bit_idx = ( * bit_pos) % 8 ;
if (value & ( 1 << i)) {
buf [byte_idx] |= ( 1 << bit_idx);
}
( * bit_pos) ++ ;
}
}
Testing DEFLATE
Test Cases
Input: 0 bytes Expected: Valid GZIP file with header + empty block + footer
Input: “Hello, World!” (13 bytes) Expected: Single block, compressible
Input: 200 KB of data Expected: Multiple blocks (at least 4), each ≤ 65,535 bytes
Input: “AAAAAAA…” (1000 A’s) Expected: Very small output due to LZ77 matches
Input: Random bytes (low compressibility) Expected: Output nearly as large as input
Validation
Round-trip test: Compress then decompress, verify output matches input
Standard tools: Use gunzip to decompress your output
Header validation: Check magic number, CRC, size are correct
Block structure: Verify BFINAL is set only on last block
# Test with gunzip
./bin/png -i input.txt -c -o output.gz
gunzip -t output.gz # Test integrity
gunzip -c output.gz > result.txt # Decompress
diff input.txt result.txt # Should be identical
Error Handling
Return NULL if:
bytes is NULL
len is 0 (or handle as special case)
out_len is NULL
lz_compress_tokens() fails
huffman_encode_tokens() fails
Memory allocation fails
Memory Management
Free all intermediate buffers:
LZ77 tokens
Huffman-encoded blocks
Temporary buffers
The caller must free the returned buffer.
Implementation Reference
See main.c:102-139 for how deflate() is called:
case M_DEFLATE: {
// Read entire input file
char * input = malloc (( size_t )file_size);
fread (input, 1 , ( size_t )file_size, file);
// Compress
size_t out_len = 0 ;
char * out_buf = deflate (filename, input, ( size_t )file_size, & out_len);
free (input);
if ( ! out_buf) {
return 1 ; // Error
}
// Write compressed data
FILE * out = fopen (output_filename, "wb" );
fwrite (out_buf, 1 , out_len, out);
fclose (out);
free (out_buf);
break ;
}
Summary
Key DEFLATE implementation points:
Combine LZ77 + Huffman for compression
Build complete GZIP file (header + blocks + footer)
Split data into blocks of ≤ 65,535 bytes
Each block has 3-bit header (BFINAL + BTYPE)
Choose between static and dynamic Huffman per block
Compute CRC32 of original data
Write ISIZE in footer
All multi-byte integers are little-endian
Return malloc’d buffer
Handle all error cases
Next Steps