Huffman encoding is a lossless data compression algorithm that assigns variable-length codes to characters based on their frequency. More frequent characters get shorter codes, resulting in efficient compression.
The implementation uses a priority queue (min-heap) and binary tree to build optimal prefix codes.
def character_frequency(s: str) -> dict: """ Takes a string a returns a dictionary of character frequencies. Args: s (str): The string to analyze. Returns: Dictionary containing character frequency. """
Parameters:
s (str): The string to analyze
Returns:
dict: Dictionary mapping each character to its frequency count
Creates a priority queue of characters based on their frequencies.
huffman.py:23
def build_frequency_table(s: str) -> PriorityQueue: """ Accepts a string to encode and returns a heap of the characters. Args: s (str): The string to encode. Returns: A priority queue of the characters based on frequencies. """
Parameters:
s (str): The string to encode
Returns:
PriorityQueue: Priority queue with TreeNodes containing characters, prioritized by frequency
Builds a Huffman tree from a priority queue of characters.
huffman.py:42
def build_huffman_tree(pq: PriorityQueue) -> Tree: """ Accepts a priority queue and returns a Huffman Tree. Args: pq (PriorityQueue): A PriorityQueue containing TreeNodes of characters based on frequencies. Returns: A Huffman Tree. """
Parameters:
pq (PriorityQueue): Priority queue containing TreeNodes with characters
Generates encoding dictionary from a Huffman tree.
huffman.py:60
def build_huffman_dictionary(node: TreeNode, bit_string: str="") -> dict: """ Given a TreeNode, build a Huffman Dictionary. Args: node (TreeNode): The Huffman Node. bit_string (str): The bit string. Returns: A Huffman Dictionary. """
Parameters:
node (TreeNode): Root node of the Huffman tree
bit_string (str): Current bit string (used for recursion)
Returns:
dict: Dictionary mapping characters to their binary codes
def huffman_encode(st: str, hd: dict) -> str: """ Encode the string using the Huffman Dictionary. Args: st (str): The string to encode. hd (dict): The Huffman Dictionary. Returns: The encoded string. """
Parameters:
st (str): The string to encode
hd (dict): The Huffman dictionary (character to bit string mapping)
def huffman_decode(encoded_data: str, tree: Tree) -> str: """ Decode the encoded data using the Huffman Tree. Args: encoded_data (str): The encoded data. tree (Tree): The Huffman Tree. Returns: The decoded data. """
def bytes_to_bitstring(ba: bytes, bitlength: int=8) -> str: """ Convert bytes to bitstring. Args: ba (bytes): The bytes to convert. bitlength (int): The bit length. Returns: The bytes converted to bitstring. """
# Original string to compresstext = "this is an example for huffman encoding"# Build frequency table (priority queue)pq = build_frequency_table(text)# Build Huffman treehuffman_tree = build_huffman_tree(pq)# Generate encoding dictionaryhuffman_dict = build_huffman_dictionary(huffman_tree.root)print("Huffman codes:")for char, code in huffman_dict.items(): print(f"'{char}': {code}")
# Decode back to original textdecoded = huffman_decode(encoded, huffman_tree)print(f"Decoded: {decoded}")print(f"Match original: {decoded == text}")
5
Convert to bytes for storage
# Convert to bytes for file storageencoded_bytes = bitstring_to_bytes(encoded)# Convert back to bitstringbitlength = len(encoded) % 8 or 8restored_bitstring = bytes_to_bitstring(encoded_bytes, bitlength)print(f"Bytes size: {len(encoded_bytes)} bytes")
# The tree is built by repeatedly merging the two nodes with lowest frequencywhile len(pq) > 1: priority1, node1 = pq.pop_pair() priority2, node2 = pq.pop_pair() node = TreeNode(node1.value + node2.value, node1, node2) pq.push(priority1 + priority2, node)