The HashSet class provides a set data structure that stores unique elements with fast membership testing. It’s built on top of the HashTable class and provides a clean interface for set operations.
HashSet internally uses a HashTable where each item is stored as a key with a value of True, ensuring uniqueness and O(1) average-case lookup time.
from dsa.hashset import HashSet# Create an empty hashset with default capacityhs = HashSet()# Create a hashset with custom capacityhs = HashSet(capacity=50)
Parameters:
capacity (int, optional): The initial capacity of the underlying hash table. Defaults to 10.
# Create from a lisths = HashSet.from_list([1, 2, 3, 4, 5])# Create from a list with duplicates (duplicates are automatically removed)hs = HashSet.from_list([1, 2, 2, 3, 3, 3, 4])print(list(hs)) # [1, 2, 3, 4]# Create from a tuplehs = HashSet.from_list(("apple", "banana", "cherry"))# Create from a string (each character becomes an element)hs = HashSet.from_list("hello")print(list(hs)) # ['h', 'e', 'l', 'o'] - note 'l' appears once
Parameters:
iterable: An iterable collection of initial elements
Returns: A new HashSet instance containing the elements
The from_list method automatically sets the capacity to twice the length of the input iterable for optimal performance.
hs = HashSet()# Add elementshs.add("apple")hs.add("banana")hs.add("cherry")print(len(hs)) # 3# Adding duplicate has no effecths.add("apple")print(len(hs)) # Still 3
hs = HashSet.from_list(["apple", "banana", "cherry"])# Remove an elemenths.remove("banana")print(len(hs)) # 2# Attempting to remove non-existent element raises KeyErrortry: hs.remove("grape")except KeyError as e: print(f"Item not found: {e}")
Parameters:
item: The element to remove
Raises:KeyError if the item is not in the setTime Complexity: O(1) average case
hs = HashSet.from_list(["apple", "banana", "cherry"])# Check membershipif hs.contains("apple"): print("Apple is in the set") # This will printif not hs.contains("grape"): print("Grape is not in the set") # This will print
Parameters:
item: The element to check
Returns:True if the element exists, False otherwiseTime Complexity: O(1) average case
The HashSet implements several magic methods for intuitive Python integration:
Membership
Length
Iteration
Representation
Equality
hs = HashSet.from_list(["apple", "banana", "cherry"])# Check membership with 'in' operatorif "apple" in hs: print("Apple is in the set")if "grape" not in hs: print("Grape is not in the set")
hs = HashSet.from_list([1, 2, 3, 4, 5])# Get number of elementscount = len(hs) # Returns 5# Empty setempty_hs = HashSet()print(len(empty_hs)) # Returns 0
hs = HashSet.from_list(["red", "green", "blue"])# Iterate over elementsfor color in hs: print(color)# List comprehensionuppercase = [color.upper() for color in hs]
from dsa.hashset import HashSet# Track unique user IDs in a sessionactive_users = HashSet()def user_login(user_id): active_users.add(user_id) print(f"User {user_id} logged in") print(f"Total active users: {len(active_users)}")def user_logout(user_id): if user_id in active_users: active_users.remove(user_id) print(f"User {user_id} logged out") else: print(f"User {user_id} was not logged in")user_login("user123")user_login("user456")user_login("user123") # Already logged inuser_logout("user123")user_logout("user789") # Never logged in
The HashSet provides excellent average-case performance for membership testing, making it ideal for scenarios where you need to quickly check if an item has been seen before.
from dsa.hashset import HashSet# Create a word frequency analyzer that tracks unique wordsclass WordAnalyzer: def __init__(self): self.unique_words = HashSet() def add_text(self, text): # Split text into words and add to set words = text.lower().split() for word in words: # Remove punctuation clean_word = ''.join(c for c in word if c.isalnum()) if clean_word: self.unique_words.add(clean_word) def word_count(self): return len(self.unique_words) def has_word(self, word): return word.lower() in self.unique_words def get_words(self): return sorted(self.unique_words.to_list())# Usageanalyzer = WordAnalyzer()analyzer.add_text("The quick brown fox jumps over the lazy dog")analyzer.add_text("The dog was not amused")print(f"Unique words: {analyzer.word_count()}")print(f"Contains 'fox': {analyzer.has_word('fox')}")print(f"Contains 'cat': {analyzer.has_word('cat')}")print(f"All words: {analyzer.get_words()}")
from dsa.hashset import HashSeths = HashSet.from_list([1, 2, 3])hs.add(4)hs.remove(2)if 3 in hs: print("Found")
# Python's built-in set has similar functionalitys = set([1, 2, 3])s.add(4)s.remove(2)if 3 in s: print("Found")
While Python’s built-in set is more optimized and feature-rich, the HashSet class demonstrates the underlying implementation principles and is useful for educational purposes or when you need to understand the internals.