Skip to main content

Overview

A hash table is a data structure that provides fast insertion, deletion, and lookup operations by mapping keys to array indices using a hash function. This implementation uses chaining to handle collisions, where each array slot contains a linked list of key-value pairs.

Data Structure Definitions

Hash tables in this implementation consist of two structures defined in hash_tables.h:

Hash Node Structure

hash_tables.h
/**
 * struct hash_node_s - Node of a hash table
 *
 * @key: The key, string
 * The key is unique in the HashTable
 * @value: The value corresponding to a key
 * @next: A pointer to the next node of the List
 */
typedef struct hash_node_s
{
    char *key;
    char *value;
    struct hash_node_s *next;
} hash_node_t;

Hash Table Structure

hash_tables.h
/**
 * struct hash_table_s - Hash table data structure
 *
 * @size: The size of the array
 * @array: An array of size @size
 * Each cell of this array is a pointer to the first node of a linked list,
 * because we want our HashTable to use a Chaining collision handling
 */
typedef struct hash_table_s
{
    unsigned long int size;
    hash_node_t **array;
} hash_table_t;
Collision Handling via Chaining:When multiple keys hash to the same index, they form a linked list at that array position:
Array Index 0: [key1:val1] -> [key2:val2] -> NULL
Array Index 1: NULL
Array Index 2: [key3:val3] -> NULL
Array Index 3: [key4:val4] -> [key5:val5] -> [key6:val6] -> NULL
This approach handles collisions gracefully but can degrade to O(n) in worst case.

Core Operations

Creating a Hash Table

hash_table_create.c
/**
 * hash_table_create - creates a hash table
 * @size: size of the table
 * Return: hash table on success, NULL on failure.
 */
hash_table_t *hash_table_create(unsigned long int size)
{
    hash_table_t *hash_table;

    if (size == 0)
        return (NULL);

    hash_table = malloc(sizeof(hash_table_t));

    if (hash_table == NULL)
        return (NULL);

    hash_table->size = size;
    hash_table->array = calloc(size, sizeof(hash_node_t *));
    if (hash_table->array == NULL)
    {
        free(hash_table);
        return (NULL);
    }
    return (hash_table);
}
Important: Uses calloc() to initialize all array pointers to NULL, indicating empty buckets.Time Complexity: O(n) where n is the size
Space Complexity: O(n)

Hash Function: djb2 Algorithm

djb2.c
/**
 * hash_djb2 - implementation of the djb2 algorithm
 * @str: string used to generate hash value
 *
 * Return: hash value
 */
unsigned long int hash_djb2(const unsigned char *str)
{
    unsigned long int hash;
    int c;

    hash = 5381;
    while ((c = *str++))
    {
        hash = ((hash << 5) + hash) + c; /* hash * 33 + c */
    }
    return (hash);
}

Getting Array Index from Key

key_index.c
/**
 * key_index - returns the index of a key
 * @key: key
 * @size: size of the array of the hash table
 * Return: index
 */
unsigned long int key_index(const unsigned char *key, unsigned long int size)
{
    unsigned long int index = hash_djb2(key) % size;

    return (index);
}
The modulo operation (% size) maps the large hash value to a valid array index:
  • Hash value: 5862390
  • Table size: 1024
  • Index: 5862390 % 1024 = 934
Time Complexity: O(k) where k is key length

Inserting Key-Value Pairs

hash_table_set.c
int hash_table_set(hash_table_t *ht, const char *key, const char *value)
{
    hash_node_t *node;
    hash_node_t *new_node;
    unsigned long int index;

    if (ht == NULL || *key == '\n' || *value == '\n')
        return (0);

    index = key_index((const unsigned char *)key, ht->size);
    node = ht->array[index];

    if (node == NULL)
    {
        new_node = create_new_node(key, value);
        if (new_node == NULL)
            return (0);

        ht->array[index] = new_node;
        return (1);
    }

    /*If key exists, replace value*/
    while (node != NULL)
    {
        if (strcmp(key, node->key) == 0)
        {
            free(node->value);
            node->value = strdup(value);
            return (1);
        }
        node = node->next;
    }
    /*If key doesn't exist, create new node*/
    new_node = create_new_node(key, value);
    if (new_node == NULL)
        return (0);

    new_node->next = ht->array[index];
    ht->array[index] = new_node;
    return (1);
}

/**
 * create_new_node - create a new node
 * @key: is the key. key can not be an empty string
 * @value: value associated with the key.
 * value must be duplicated. value can be an empty string
 * Return: 1 on success, 0 on failure
 */
hash_node_t *create_new_node (const char *key, const char *value)
{
    hash_node_t *new_node;

    new_node = malloc(sizeof(hash_node_t));

    if (new_node == NULL)
        return (NULL);

    new_node->key = strdup(key);
    new_node->value = strdup(value);
    new_node->next = NULL;

    return (new_node);
}

Retrieving Values

hash_table_get.c
/**
 * hash_table_get - retrieves a value associated with a key.
 * @ht: the hash table you want to look into
 * @key: is the key you are looking for
 * Return: value associated with key if found, NULL if failed
 */
char *hash_table_get(const hash_table_t *ht, const char *key)
{
    hash_node_t *node;
    unsigned long int index;

    if (ht == NULL)
        return (NULL);

    index = key_index((const unsigned char *) key, ht->size);
    node = ht->array[index];

    while (node != NULL)
    {
        if (strcmp(node->key, key) == 0)
            return (node->value);

        node = node->next;
    }
    return (NULL);
}
Algorithm:
  1. Calculate index from key using hash function
  2. Traverse the linked list at that index
  3. Compare each node’s key until match found
  4. Return value if found, NULL otherwise
Time Complexity:
  • Average case: O(1)
  • Worst case: O(n) if all keys collide

Deleting Hash Table

hash_table_delete.c
/**
 * hash_table_delete - deletes a hash table
 * @ht: the hash table you want to delete
 */
void hash_table_delete(hash_table_t *ht)
{
    unsigned long int i;
    hash_node_t *node;

    if (ht == NULL)
        return;

    for (i = 0; i < ht->size; i++)
    {
        node = ht->array[i];
        free_hash_list(node);
    }
    free(ht->array);
    free(ht);
}

/**
 * free_hash_list - frees a hash_node_t list
 * @head: head of linked list
 */
void free_hash_list(hash_node_t *head)
{
    hash_node_t *current;
    hash_node_t *next;

    current = head;

    while (current != NULL)
    {
        next = current->next;
        free(current->key);
        free(current->value);
        free(current);
        current = next;
    }
}
Memory deallocation order:
  1. For each bucket in the array:
    • Free each node’s key string
    • Free each node’s value string
    • Free the node itself
  2. Free the array of pointers
  3. Free the hash table structure
Critical: Must free strings before nodes, and nodes before array.Time Complexity: O(n) where n is total number of stored elements

Complete API

From hash_tables.h:37-45, the full API includes:
hash_tables.h
hash_table_t *hash_table_create(unsigned long int size);
unsigned long int hash_djb2(const unsigned char *str);
unsigned long int key_index(const unsigned char *key, unsigned long int size);
hash_node_t *create_new_node (const char *key, const char *value);
int hash_table_set(hash_table_t *ht, const char *key, const char *value);
char *hash_table_get(const hash_table_t *ht, const char *key);
void hash_table_print(const hash_table_t *ht);
void hash_table_delete(hash_table_t *ht);
void free_hash_list(hash_node_t *head);

Usage Example

example.c
#include "hash_tables.h"

int main(void)
{
    hash_table_t *ht;
    
    /* Create hash table with 1024 buckets */
    ht = hash_table_create(1024);
    
    /* Insert key-value pairs */
    hash_table_set(ht, "name", "Alice");
    hash_table_set(ht, "age", "25");
    hash_table_set(ht, "city", "San Francisco");
    
    /* Retrieve values */
    printf("Name: %s\n", hash_table_get(ht, "name"));  // Alice
    printf("Age: %s\n", hash_table_get(ht, "age"));    // 25
    
    /* Update existing key */
    hash_table_set(ht, "age", "26");
    printf("Age: %s\n", hash_table_get(ht, "age"));    // 26
    
    /* Non-existent key returns NULL */
    char *country = hash_table_get(ht, "country");
    if (country == NULL)
        printf("Key not found\n");
    
    /* Print entire table */
    hash_table_print(ht);
    
    /* Cleanup */
    hash_table_delete(ht);
    
    return (0);
}

Performance Analysis

Time Complexity

OperationAverage CaseWorst CaseNotes
hash_table_createO(n)O(n)n = size of array
hash_table_setO(1)O(n)n = chain length
hash_table_getO(1)O(n)n = chain length
hash_table_deleteO(n)O(n)n = total elements
hash_djb2O(k)O(k)k = key length

Space Complexity

  • Hash table structure: O(m) where m is the array size
  • Stored elements: O(n) where n is the number of key-value pairs
  • Total: O(m + n)

Load Factor

The load factor (α) determines performance:
α = n / m

where:
  n = number of elements stored
  m = array size (number of buckets)
Performance guidelines:
  • α < 0.7: Excellent performance, mostly O(1) operations
  • α ≈ 1.0: Good performance, occasional collisions
  • α > 2.0: Degraded performance, long chains
  • α >> 1: Poor performance, approaching O(n)
Choosing table size:For optimal performance:
  • Use a prime number for the size to reduce clustering
  • Common choices: 53, 97, 193, 389, 769, 1543, 3079, 6151, 12289, 24593
  • Resize when load factor exceeds 0.7-0.75
This implementation uses a fixed size without automatic resizing.

Collision Handling: Chaining vs Other Methods

Chaining (used in this implementation):
  • ✓ Simple to implement
  • ✓ Never fills up (limited only by memory)
  • ✓ Easy deletion
  • ✗ Extra memory for pointers
  • ✗ Cache performance suffers with long chains
Alternative: Open Addressing:
  • ✓ Better cache locality
  • ✓ No extra pointer memory
  • ✗ More complex deletion
  • ✗ Must handle table-full condition
  • ✗ Performance degrades faster as table fills

Key Implementation Details

  1. String duplication: Both keys and values are duplicated using strdup() to ensure the hash table owns its data
  2. Collision prepending: New nodes are prepended to chains (O(1)) rather than appended (O(n))
  3. Update semantics: Setting an existing key replaces its value rather than creating a duplicate
  4. NULL handling: Returns NULL for missing keys, distinguishing from empty string values
  5. Memory safety: Comprehensive cleanup in hash_table_delete() prevents memory leaks

Build docs developers (and LLMs) love