Hash Tables

Overview

A hash table is a data structure that provides fast insertion, deletion, and lookup operations by mapping keys to array indices using a hash function. This implementation uses chaining to handle collisions, where each array slot contains a linked list of key-value pairs.

Data Structure Definitions

Hash tables in this implementation consist of two structures defined in hash_tables.h:

Hash Node Structure

hash_tables.h

/**
 * struct hash_node_s - Node of a hash table
 *
 * @key: The key, string
 * The key is unique in the HashTable
 * @value: The value corresponding to a key
 * @next: A pointer to the next node of the List
 */
typedef struct hash_node_s
{
    char *key;
    char *value;
    struct hash_node_s *next;
} hash_node_t;

Hash Table Structure

hash_tables.h

/**
 * struct hash_table_s - Hash table data structure
 *
 * @size: The size of the array
 * @array: An array of size @size
 * Each cell of this array is a pointer to the first node of a linked list,
 * because we want our HashTable to use a Chaining collision handling
 */
typedef struct hash_table_s
{
    unsigned long int size;
    hash_node_t **array;
} hash_table_t;

Collision Handling via Chaining:When multiple keys hash to the same index, they form a linked list at that array position:

Array Index 0: [key1:val1] -> [key2:val2] -> NULL
Array Index 1: NULL
Array Index 2: [key3:val3] -> NULL
Array Index 3: [key4:val4] -> [key5:val5] -> [key6:val6] -> NULL

This approach handles collisions gracefully but can degrade to O(n) in worst case.

Core Operations

Creating a Hash Table

hash_table_create.c

/**
 * hash_table_create - creates a hash table
 * @size: size of the table
 * Return: hash table on success, NULL on failure.
 */
hash_table_t *hash_table_create(unsigned long int size)
{
    hash_table_t *hash_table;

    if (size == 0)
        return (NULL);

    hash_table = malloc(sizeof(hash_table_t));

    if (hash_table == NULL)
        return (NULL);

    hash_table->size = size;
    hash_table->array = calloc(size, sizeof(hash_node_t *));
    if (hash_table->array == NULL)
    {
        free(hash_table);
        return (NULL);
    }
    return (hash_table);
}

Important: Uses calloc() to initialize all array pointers to NULL, indicating empty buckets.Time Complexity: O(n) where n is the size
Space Complexity: O(n)

Hash Function: djb2 Algorithm

djb2.c

/**
 * hash_djb2 - implementation of the djb2 algorithm
 * @str: string used to generate hash value
 *
 * Return: hash value
 */
unsigned long int hash_djb2(const unsigned char *str)
{
    unsigned long int hash;
    int c;

    hash = 5381;
    while ((c = *str++))
    {
        hash = ((hash << 5) + hash) + c; /* hash * 33 + c */
    }
    return (hash);
}

Show Understanding the djb2 Algorithm

The djb2 hash function was created by Daniel J. Bernstein and is one of the most effective string hash functions.How it works:

Start with magic constant 5381
For each character in the string:
- Multiply hash by 33 (hash * 33 = (hash << 5) + hash)
- Add the character’s ASCII value

Example: Hashing “Hi”

Initial: hash = 5381
Char 'H' (72): hash = (5381 * 33) + 72 = 177645
Char 'i' (105): hash = (177645 * 33) + 105 = 5862390
Final: 5862390

Why multiply by 33?

Prime number reduces collisions
Bit shift optimization: (hash << 5) + hash is faster than multiplication
Empirically proven to distribute strings well

Time Complexity: O(k) where k is the length of the string

Getting Array Index from Key

key_index.c

/**
 * key_index - returns the index of a key
 * @key: key
 * @size: size of the array of the hash table
 * Return: index
 */
unsigned long int key_index(const unsigned char *key, unsigned long int size)
{
    unsigned long int index = hash_djb2(key) % size;

    return (index);
}

The modulo operation (% size) maps the large hash value to a valid array index:

Hash value: 5862390
Table size: 1024
Index: 5862390 % 1024 = 934

Time Complexity: O(k) where k is key length

Inserting Key-Value Pairs

hash_table_set.c

int hash_table_set(hash_table_t *ht, const char *key, const char *value)
{
    hash_node_t *node;
    hash_node_t *new_node;
    unsigned long int index;

    if (ht == NULL || *key == '\n' || *value == '\n')
        return (0);

    index = key_index((const unsigned char *)key, ht->size);
    node = ht->array[index];

    if (node == NULL)
    {
        new_node = create_new_node(key, value);
        if (new_node == NULL)
            return (0);

        ht->array[index] = new_node;
        return (1);
    }

    /*If key exists, replace value*/
    while (node != NULL)
    {
        if (strcmp(key, node->key) == 0)
        {
            free(node->value);
            node->value = strdup(value);
            return (1);
        }
        node = node->next;
    }
    /*If key doesn't exist, create new node*/
    new_node = create_new_node(key, value);
    if (new_node == NULL)
        return (0);

    new_node->next = ht->array[index];
    ht->array[index] = new_node;
    return (1);
}

/**
 * create_new_node - create a new node
 * @key: is the key. key can not be an empty string
 * @value: value associated with the key.
 * value must be duplicated. value can be an empty string
 * Return: 1 on success, 0 on failure
 */
hash_node_t *create_new_node (const char *key, const char *value)
{
    hash_node_t *new_node;

    new_node = malloc(sizeof(hash_node_t));

    if (new_node == NULL)
        return (NULL);

    new_node->key = strdup(key);
    new_node->value = strdup(value);
    new_node->next = NULL;

    return (new_node);
}

Show Understanding Hash Table Set Algorithm

Three scenarios when setting a key-value pair:Scenario 1: Empty bucket

Array[index] = NULL
↓
Array[index] = [new_key:new_value] -> NULL

Scenario 2: Key already exists (update)

Array[index] = [key1:old_val] -> [key2:val2] -> NULL
                ↓ (update value)
Array[index] = [key1:new_val] -> [key2:val2] -> NULL

Scenario 3: Collision (new key)

Array[index] = [existing_key:val] -> NULL
                ↓ (prepend new node)
Array[index] = [new_key:new_val] -> [existing_key:val] -> NULL

Note: New colliding keys are prepended (O(1)) rather than appended (O(n)).Time Complexity:

Average case: O(1)
Worst case: O(n) where n is the length of the chain

Retrieving Values

hash_table_get.c

/**
 * hash_table_get - retrieves a value associated with a key.
 * @ht: the hash table you want to look into
 * @key: is the key you are looking for
 * Return: value associated with key if found, NULL if failed
 */
char *hash_table_get(const hash_table_t *ht, const char *key)
{
    hash_node_t *node;
    unsigned long int index;

    if (ht == NULL)
        return (NULL);

    index = key_index((const unsigned char *) key, ht->size);
    node = ht->array[index];

    while (node != NULL)
    {
        if (strcmp(node->key, key) == 0)
            return (node->value);

        node = node->next;
    }
    return (NULL);
}

Algorithm:

Calculate index from key using hash function
Traverse the linked list at that index
Compare each node’s key until match found
Return value if found, NULL otherwise

Time Complexity:

Average case: O(1)
Worst case: O(n) if all keys collide

Deleting Hash Table

hash_table_delete.c

/**
 * hash_table_delete - deletes a hash table
 * @ht: the hash table you want to delete
 */
void hash_table_delete(hash_table_t *ht)
{
    unsigned long int i;
    hash_node_t *node;

    if (ht == NULL)
        return;

    for (i = 0; i < ht->size; i++)
    {
        node = ht->array[i];
        free_hash_list(node);
    }
    free(ht->array);
    free(ht);
}

/**
 * free_hash_list - frees a hash_node_t list
 * @head: head of linked list
 */
void free_hash_list(hash_node_t *head)
{
    hash_node_t *current;
    hash_node_t *next;

    current = head;

    while (current != NULL)
    {
        next = current->next;
        free(current->key);
        free(current->value);
        free(current);
        current = next;
    }
}

Memory deallocation order:

For each bucket in the array:
- Free each node’s key string
- Free each node’s value string
- Free the node itself
Free the array of pointers
Free the hash table structure

Critical: Must free strings before nodes, and nodes before array.Time Complexity: O(n) where n is total number of stored elements

Complete API

From hash_tables.h:37-45, the full API includes:

hash_tables.h

hash_table_t *hash_table_create(unsigned long int size);
unsigned long int hash_djb2(const unsigned char *str);
unsigned long int key_index(const unsigned char *key, unsigned long int size);
hash_node_t *create_new_node (const char *key, const char *value);
int hash_table_set(hash_table_t *ht, const char *key, const char *value);
char *hash_table_get(const hash_table_t *ht, const char *key);
void hash_table_print(const hash_table_t *ht);
void hash_table_delete(hash_table_t *ht);
void free_hash_list(hash_node_t *head);

Usage Example

example.c

#include "hash_tables.h"

int main(void)
{
    hash_table_t *ht;
    
    /* Create hash table with 1024 buckets */
    ht = hash_table_create(1024);
    
    /* Insert key-value pairs */
    hash_table_set(ht, "name", "Alice");
    hash_table_set(ht, "age", "25");
    hash_table_set(ht, "city", "San Francisco");
    
    /* Retrieve values */
    printf("Name: %s\n", hash_table_get(ht, "name"));  // Alice
    printf("Age: %s\n", hash_table_get(ht, "age"));    // 25
    
    /* Update existing key */
    hash_table_set(ht, "age", "26");
    printf("Age: %s\n", hash_table_get(ht, "age"));    // 26
    
    /* Non-existent key returns NULL */
    char *country = hash_table_get(ht, "country");
    if (country == NULL)
        printf("Key not found\n");
    
    /* Print entire table */
    hash_table_print(ht);
    
    /* Cleanup */
    hash_table_delete(ht);
    
    return (0);
}

Performance Analysis

Time Complexity

Operation	Average Case	Worst Case	Notes
`hash_table_create`	O(n)	O(n)	n = size of array
`hash_table_set`	O(1)	O(n)	n = chain length
`hash_table_get`	O(1)	O(n)	n = chain length
`hash_table_delete`	O(n)	O(n)	n = total elements
`hash_djb2`	O(k)	O(k)	k = key length

Space Complexity

Hash table structure: O(m) where m is the array size
Stored elements: O(n) where n is the number of key-value pairs
Total: O(m + n)

Load Factor

The load factor (α) determines performance:

α = n / m

where:
  n = number of elements stored
  m = array size (number of buckets)

Performance guidelines:

α < 0.7: Excellent performance, mostly O(1) operations
α ≈ 1.0: Good performance, occasional collisions
α > 2.0: Degraded performance, long chains
α >> 1: Poor performance, approaching O(n)

Choosing table size:For optimal performance:

Use a prime number for the size to reduce clustering
Common choices: 53, 97, 193, 389, 769, 1543, 3079, 6151, 12289, 24593
Resize when load factor exceeds 0.7-0.75

This implementation uses a fixed size without automatic resizing.

Collision Handling: Chaining vs Other Methods

Chaining (used in this implementation):

✓ Simple to implement
✓ Never fills up (limited only by memory)
✓ Easy deletion
✗ Extra memory for pointers
✗ Cache performance suffers with long chains

Alternative: Open Addressing:

✓ Better cache locality
✓ No extra pointer memory
✗ More complex deletion
✗ Must handle table-full condition
✗ Performance degrades faster as table fills

Key Implementation Details

String duplication: Both keys and values are duplicated using strdup() to ensure the hash table owns its data
Collision prepending: New nodes are prepended to chains (O(1)) rather than appended (O(n))
Update semantics: Setting an existing key replaces its value rather than creating a duplicate
NULL handling: Returns NULL for missing keys, distinguishing from empty string values
Memory safety: Comprehensive cleanup in hash_table_delete() prevents memory leaks

Get Started

Fundamentals

Pointers & Memory

Advanced Concepts

Data Structures

System Programming

Overview

Data Structure Definitions

Hash Node Structure

Hash Table Structure

Core Operations

Creating a Hash Table

Hash Function: djb2 Algorithm

Getting Array Index from Key

Inserting Key-Value Pairs

Retrieving Values

Deleting Hash Table

Complete API

Usage Example

Performance Analysis

Time Complexity

Space Complexity

Load Factor

Collision Handling: Chaining vs Other Methods

Key Implementation Details

Build docs developers (and LLMs) love

Get Started

Fundamentals

Pointers & Memory

Advanced Concepts

Data Structures

System Programming

​Overview

​Data Structure Definitions

​Hash Node Structure

​Hash Table Structure

​Core Operations

​Creating a Hash Table

​Hash Function: djb2 Algorithm

​Getting Array Index from Key

​Inserting Key-Value Pairs

​Retrieving Values

​Deleting Hash Table

​Complete API

​Usage Example

​Performance Analysis

​Time Complexity

​Space Complexity

​Load Factor

​Collision Handling: Chaining vs Other Methods

​Key Implementation Details

Build docs developers (and LLMs) love

Overview

Data Structure Definitions

Hash Node Structure

Hash Table Structure

Core Operations

Creating a Hash Table

Hash Function: djb2 Algorithm

Getting Array Index from Key

Inserting Key-Value Pairs

Retrieving Values

Deleting Hash Table

Complete API

Usage Example

Performance Analysis

Time Complexity

Space Complexity

Load Factor

Collision Handling: Chaining vs Other Methods

Key Implementation Details