Huffman Encoding

Overview

Huffman encoding is a lossless data compression algorithm that assigns variable-length codes to characters based on their frequency. More frequent characters get shorter codes, resulting in efficient compression.

The implementation uses a priority queue (min-heap) and binary tree to build optimal prefix codes.

Core Functions

Character Frequency Analysis

character_frequency

Analyzes character frequencies in a string.

huffman.py:5

def character_frequency(s: str) -> dict:
    """ 
    Takes a string a returns a dictionary of character frequencies.

    Args:
        s (str): The string to analyze.

    Returns:
        Dictionary containing character frequency.
    """

Parameters:

s (str): The string to analyze

Returns:

dict: Dictionary mapping each character to its frequency count

build_frequency_table

Creates a priority queue of characters based on their frequencies.

huffman.py:23

def build_frequency_table(s: str) -> PriorityQueue:
    """ 
    Accepts a string to encode and returns a heap of the characters.

    Args:
        s (str): The string to encode.

    Returns:
        A priority queue of the characters based on frequencies.
    """

Parameters:

s (str): The string to encode

Returns:

PriorityQueue: Priority queue with TreeNodes containing characters, prioritized by frequency

Huffman Tree Construction

build_huffman_tree

Builds a Huffman tree from a priority queue of characters.

huffman.py:42

def build_huffman_tree(pq: PriorityQueue) -> Tree:
    """ 
    Accepts a priority queue and returns a Huffman Tree.

    Args:
        pq (PriorityQueue): A PriorityQueue containing TreeNodes of characters based on frequencies. 

    Returns:
        A Huffman Tree.
    """

Parameters:

pq (PriorityQueue): Priority queue containing TreeNodes with characters

Returns:

Tree: The Huffman tree used for encoding/decoding

build_huffman_dictionary

Generates encoding dictionary from a Huffman tree.

huffman.py:60

def build_huffman_dictionary(node: TreeNode, bit_string: str="") -> dict:
    """
    Given a TreeNode, build a Huffman Dictionary.

    Args:
        node (TreeNode): The Huffman Node.
        bit_string (str): The bit string.

    Returns:
        A Huffman Dictionary.
    """

Parameters:

node (TreeNode): Root node of the Huffman tree
bit_string (str): Current bit string (used for recursion)

Returns:

dict: Dictionary mapping characters to their binary codes

Encoding and Decoding

huffman_encode

Encodes a string using a Huffman dictionary.

huffman.py:80

def huffman_encode(st: str, hd: dict) -> str:               
    """
    Encode the string using the Huffman Dictionary.

    Args:
        st (str): The string to encode.
        hd (dict): The Huffman Dictionary.

    Returns:
        The encoded string.
    """

Parameters:

st (str): The string to encode
hd (dict): The Huffman dictionary (character to bit string mapping)

Returns:

str: The encoded bit string

huffman_decode

Decodes an encoded string using a Huffman tree.

huffman.py:96

def huffman_decode(encoded_data: str, tree: Tree) -> str:
    """
    Decode the encoded data using the Huffman Tree.
    
    Args:
        encoded_data (str): The encoded data.
        tree (Tree): The Huffman Tree.

    Returns:
        The decoded data.
    """

Parameters:

encoded_data (str): The encoded bit string
tree (Tree): The Huffman tree used for decoding

Returns:

str: The decoded original string

Byte Conversion Utilities

bitstring_to_bytes

Converts a bit string to bytes for file storage.

huffman.py:120

def bitstring_to_bytes(s: str) -> bytes:
    """
    Convert a bitstring to bytes.

    Args:
        s (str): The bitstring.

    Returns:
        Bitstring converted to bytes.
    """

bytes_to_bitstring

Converts bytes back to a bit string.

huffman.py:132

def bytes_to_bitstring(ba: bytes, bitlength: int=8) -> str:
    """
    Convert bytes to bitstring.

    Args:
        ba (bytes): The bytes to convert.
        bitlength (int): The bit length.
    
    Returns:
        The bytes converted to bitstring.
    """

Complete Usage Example

Import required modules

from dsa.huffman import (
    build_frequency_table,
    build_huffman_tree,
    build_huffman_dictionary,
    huffman_encode,
    huffman_decode,
    bitstring_to_bytes,
    bytes_to_bitstring
)

Build the Huffman tree and dictionary

# Original string to compress
text = "this is an example for huffman encoding"

# Build frequency table (priority queue)
pq = build_frequency_table(text)

# Build Huffman tree
huffman_tree = build_huffman_tree(pq)

# Generate encoding dictionary
huffman_dict = build_huffman_dictionary(huffman_tree.root)

print("Huffman codes:")
for char, code in huffman_dict.items():
    print(f"'{char}': {code}")

Encode the text

# Encode the text
encoded = huffman_encode(text, huffman_dict)

print(f"Original length: {len(text) * 8} bits")
print(f"Encoded length: {len(encoded)} bits")
print(f"Compression ratio: {len(encoded) / (len(text) * 8):.2%}")
print(f"Encoded: {encoded[:50]}...")  # First 50 bits

Decode the text

# Decode back to original text
decoded = huffman_decode(encoded, huffman_tree)

print(f"Decoded: {decoded}")
print(f"Match original: {decoded == text}")

Convert to bytes for storage

# Convert to bytes for file storage
encoded_bytes = bitstring_to_bytes(encoded)

# Convert back to bitstring
bitlength = len(encoded) % 8 or 8
restored_bitstring = bytes_to_bitstring(encoded_bytes, bitlength)

print(f"Bytes size: {len(encoded_bytes)} bytes")

Algorithm Details

# The tree is built by repeatedly merging the two nodes with lowest frequency
while len(pq) > 1:
    priority1, node1 = pq.pop_pair()
    priority2, node2 = pq.pop_pair()
    node = TreeNode(node1.value + node2.value, node1, node2)
    pq.push(priority1 + priority2, node)

Time Complexity

Building frequency table: O(n) where n is the length of the input string
Building Huffman tree: O(k log k) where k is the number of unique characters
Building dictionary: O(k) where k is the number of unique characters
Encoding: O(n) where n is the length of the input string
Decoding: O(m) where m is the length of the encoded bit string

Key Properties

Prefix-free codes: No code is a prefix of another, enabling unambiguous decoding
Optimal compression: Huffman codes are provably optimal for character-based compression
Variable-length codes: More frequent characters get shorter codes
Lossless: Original data can be perfectly reconstructed

The Huffman tree must be stored or transmitted along with the encoded data to enable decoding.

Priority Queue/Heap - Used for building the Huffman tree
Binary Tree - The Huffman tree structure

Get Started

Linear Data Structures

Tree Data Structures

Hash-Based Structures

Graphs

Algorithms

Utilities

Huffman Encoding

Overview

Core Functions

Character Frequency Analysis

character_frequency

build_frequency_table

Huffman Tree Construction

build_huffman_tree

build_huffman_dictionary

Encoding and Decoding

huffman_encode

huffman_decode

Byte Conversion Utilities

bitstring_to_bytes

bytes_to_bitstring

Complete Usage Example

Algorithm Details

Time Complexity

Key Properties

Build docs developers (and LLMs) love

Get Started

Linear Data Structures

Tree Data Structures

Hash-Based Structures

Graphs

Algorithms

Utilities

​Overview

​Core Functions

​Character Frequency Analysis

​character_frequency

​build_frequency_table

​Huffman Tree Construction

​build_huffman_tree

​build_huffman_dictionary

​Encoding and Decoding

​huffman_encode

​huffman_decode

​Byte Conversion Utilities

​bitstring_to_bytes

​bytes_to_bitstring

​Complete Usage Example

​Algorithm Details

​Time Complexity

​Key Properties

​Related

Build docs developers (and LLMs) love

Overview

Core Functions

Character Frequency Analysis

character_frequency

build_frequency_table

Huffman Tree Construction

build_huffman_tree

build_huffman_dictionary

Encoding and Decoding

huffman_encode

huffman_decode

Byte Conversion Utilities

bitstring_to_bytes

bytes_to_bitstring

Complete Usage Example

Algorithm Details

Time Complexity

Key Properties

Related