
Overview

The LeetCodeRetriever class provides fast semantic search over LeetCode solutions using FAISS HNSW indexing and SentenceTransformers embeddings. It supports similarity search and metadata-based filtering by company, difficulty, and topics.

Class Definition

from src.DSAAssistant.components.retriever2 import LeetCodeRetriever, Solution

retriever = LeetCodeRetriever(
    index_path="leetcode_hnsw2.index",
    metadata_path="leetcode_metadata2.pkl",
    model_name="all-MiniLM-L6-v2",
    ef_search=32
)

Solution Dataclass

Represents a LeetCode solution with associated metadata.
from dataclasses import dataclass

@dataclass
class Solution:
    title: str          # Problem title (e.g., "Two Sum")
    solution: str       # Complete solution with explanation and code
    difficulty: str     # "Easy", "Medium", or "Hard"
    topics: str         # Comma-separated topics (e.g., "Array, Hash Table")
    companies: str      # Comma-separated companies (e.g., "Amazon, Google")
Attributes:
title
str
The problem title as it appears on LeetCode
solution
str
Complete solution text including approach explanation, code implementation, and complexity analysis
difficulty
str
Problem difficulty level: "Easy", "Medium", or "Hard"
topics
str
Comma-separated list of algorithmic topics and data structures
companies
str
Comma-separated list of companies known to ask this problem
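Because topics and companies are stored as comma-separated strings rather than lists, callers will often want to split them before use. The following is a minimal sketch of one way to do that. The Solution class here is a local copy of the documented dataclass so the snippet runs on its own, and split_field is a hypothetical helper that is not part of the library.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Solution:
    # Local mirror of the documented dataclass, so the sketch is self-contained.
    title: str
    solution: str
    difficulty: str
    topics: str
    companies: str


def split_field(value: str) -> List[str]:
    """Hypothetical helper: split a comma-separated field into trimmed entries."""
    return [part.strip() for part in value.split(",") if part.strip()]


example = Solution(
    title="Two Sum",
    solution="Store seen values in a hash map.",
    difficulty="Easy",
    topics="Array, Hash Table",
    companies="Amazon, Google",
)

print(split_field(example.topics))     # ['Array', 'Hash Table']
print(split_field(example.companies))  # ['Amazon', 'Google']
```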

Constructor Parameters

index_path
str
default:"leetcode_hnsw2.index"
Path to the FAISS HNSW index file. Defaults to leetcode_hnsw2.index in the component directory.
metadata_path
str
default:"leetcode_metadata2.pkl"
Path to the pickled metadata file containing Solution objects. Defaults to leetcode_metadata2.pkl in the component directory.
model_name
str
default:"all-MiniLM-L6-v2"
SentenceTransformer model name for encoding queries. Must match the model used to build the index.
ef_search
int
default:"32"
HNSW search parameter controlling the speed/accuracy trade-off. Higher values improve accuracy but slow down search. Typical range: 16-128.
Initialization Behavior:
  • Loads the SentenceTransformer encoder
  • Reads the FAISS HNSW index from disk
  • Validates that the index is HNSW type
  • Sets HNSW search parameters
  • Loads solution metadata from pickle file
  • Logs successful initialization
Raises:
  • ValueError if the index is not an HNSW index
  • Exception if metadata file cannot be loaded

Methods

search

Search for semantically similar solutions using vector similarity.
results = retriever.search(
    query="dynamic programming coin change",
    k=5,
    return_scores=True
)
query
str
required
Natural language query describing the problem or concept
k
int
default:"3"
Number of top results to return
return_scores
bool
default:"True"
If True, returns tuples of (Solution, score). If False, returns only Solution objects.
results
List[Tuple[Solution, float]] | List[Solution]
List of search results. Format depends on return_scores:
  • If True: List of (Solution, score) tuples, where score is L2 distance (lower is better)
  • If False: List of Solution objects only
Returns empty list on search failure.
Behavior:
  • Encodes query using SentenceTransformer
  • Searches FAISS HNSW index for k nearest neighbors
  • Returns solutions ordered by similarity (ascending L2 distance)
  • Lower scores indicate higher similarity
Example:
retriever = LeetCodeRetriever()

results = retriever.search(
    "matrix distance calculation",
    k=3,
    return_scores=True
)

for solution, score in results:
    print(f"Title: {solution.title}")
    print(f"Score: {score:.3f}")
    print(f"Difficulty: {solution.difficulty}")
    print(f"Topics: {solution.topics}")
    print()

# Output:
# Title: 01 Matrix
# Score: 0.523
# Difficulty: Medium
# Topics: Array, BFS, Matrix
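
To make the ordering concrete, here is a small sketch of nearest-neighbour ranking by ascending L2 distance. It deliberately swaps in an exact brute-force scan over toy 2-D vectors in place of the FAISS HNSW index and the SentenceTransformer encoder, so it illustrates the scoring convention only, not the retriever's implementation. The names l2_distance and brute_force_search are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def l2_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def brute_force_search(
    query: Sequence[float],
    vectors: Sequence[Sequence[float]],
    k: int = 3,
) -> List[Tuple[int, float]]:
    """Return (index, distance) pairs for the k closest vectors.

    Exact linear scan, used here as a stand-in for an approximate HNSW search.
    """
    scored = [(i, l2_distance(query, v)) for i, v in enumerate(vectors)]
    scored.sort(key=lambda pair: pair[1])  # ascending: lower distance ranks first
    return scored[:k]


# Toy vectors standing in for solution embeddings (assumed values).
vectors = [[3.0, 4.0], [1.0, 0.0], [0.0, 2.0]]
print(brute_force_search([0.0, 0.0], vectors, k=2))
# [(1, 1.0), (2, 2.0)]
```

As in the search method, the first result has the smallest distance and is therefore the closest match.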

filter_by_metadata

Filter solutions based on company, difficulty, and topic metadata.
filtered = retriever.filter_by_metadata(
    companies=["Amazon", "Google"],
    difficulty="Medium",
    topics=["BFS", "Dynamic Programming"]
)
companies
List[str]
default:"None"
List of company names to filter by. Matches if any company is found in solution’s companies field (case-insensitive).
difficulty
str
default:"None"
Difficulty level to filter by: "Easy", "Medium", or "Hard" (case-insensitive)
topics
List[str]
default:"None"
List of topics to filter by. Matches if any topic is found in solution’s topics field (case-insensitive).
filtered_solutions
List[Solution]
List of Solution objects matching all specified criteria. Returns all solutions if no filters specified.
Behavior:
  • Applies filters sequentially (companies → difficulty → topics)
  • All filters use case-insensitive matching
  • Filters are cumulative (AND logic)
  • Within each filter type, matching uses OR logic (e.g., any company matches)
  • Returns original solution list if no filters provided
Example:
# Filter by difficulty only
medium_problems = retriever.filter_by_metadata(
    difficulty="Medium"
)
print(f"Found {len(medium_problems)} medium problems")
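
The AND/OR semantics described under Behavior can be sketched as follows. This is an illustration under stated assumptions, not the library's code: it assumes company and topic matching is a case-insensitive substring test against the comma-separated field and that difficulty is compared for case-insensitive equality. filter_solutions and the sample data are made up for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Solution:
    # Local mirror of the documented dataclass, so the sketch is self-contained.
    title: str
    solution: str
    difficulty: str
    topics: str
    companies: str


def filter_solutions(
    solutions: List[Solution],
    companies: Optional[List[str]] = None,
    difficulty: Optional[str] = None,
    topics: Optional[List[str]] = None,
) -> List[Solution]:
    """Apply companies -> difficulty -> topics filters; AND across, OR within."""
    result = list(solutions)
    if companies:
        wanted = [c.lower() for c in companies]
        # OR within the filter: any requested company may match.
        result = [s for s in result if any(c in s.companies.lower() for c in wanted)]
    if difficulty:
        result = [s for s in result if s.difficulty.lower() == difficulty.lower()]
    if topics:
        wanted = [t.lower() for t in topics]
        result = [s for s in result if any(t in s.topics.lower() for t in wanted)]
    return result


# Made-up sample records for illustration only.
sample = [
    Solution("01 Matrix", "", "Medium", "Array, BFS, Matrix", "Amazon, Google"),
    Solution("Two Sum", "", "Easy", "Array, Hash Table", "Amazon, Google"),
    Solution("Coin Change", "", "Medium", "Array, Dynamic Programming, BFS", "Microsoft"),
]

matches = filter_solutions(sample, companies=["amazon"], difficulty="medium", topics=["bfs"])
print([s.title for s in matches])  # ['01 Matrix']
```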

_load_metadata

Internal method to load solution metadata from pickle file.
solutions = retriever._load_metadata("leetcode_metadata2.pkl")
metadata_path
str
required
Path to pickled metadata file
solutions
List[Solution]
List of Solution objects loaded from file
Raises:
  • Exception on file loading or unpickling errors
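
A loader with this contract might look roughly like the sketch below: read the pickle, log any failure, and re-raise. The function name load_metadata and the logging details are assumptions, and the round-trip demo pickles plain dictionaries instead of Solution objects so that it stays self-contained. Note that unpickling can execute arbitrary code, so only load metadata files from a trusted source.

```python
import logging
import os
import pickle
import tempfile
from typing import Any, List

logger = logging.getLogger(__name__)


def load_metadata(metadata_path: str) -> List[Any]:
    """Load a pickled list of records, logging and re-raising on failure."""
    try:
        with open(metadata_path, "rb") as f:
            return pickle.load(f)
    except Exception as exc:
        logger.error("Failed to load metadata from %s: %s", metadata_path, exc)
        raise


# Round-trip demo with a temporary file and placeholder records.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo_metadata.pkl")
    with open(path, "wb") as f:
        pickle.dump([{"title": "Two Sum", "difficulty": "Easy"}], f)
    loaded = load_metadata(path)

print(loaded[0]["title"])  # Two Sum
```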

Attributes

encoder
SentenceTransformer
SentenceTransformer model for encoding text queries into embeddings
index
faiss.IndexHNSWFlat
FAISS HNSW index for fast approximate nearest neighbor search
solutions
List[Solution]
Complete list of Solution objects with metadata

Usage Examples

from src.DSAAssistant.components.retriever2 import LeetCodeRetriever

# Initialize retriever
retriever = LeetCodeRetriever()

# Search for solutions
results = retriever.search(
    "How to find shortest path in a graph?",
    k=5
)

for solution, score in results:
    print(f"{solution.title} (L2 distance: {score:.3f}, lower is better)")
    print(solution.solution[:200])  # First 200 chars
    print()

Performance Characteristics

Search Complexity

Time: approximately O(log n) on average
Space: O(k) for results
HNSW enables sub-linear search time

Filter Complexity

Time: O(n) linear scan
Space: O(m) filtered results
Iterates through all solutions

ef_search Parameter Tuning

ef_search   Speed      Accuracy   Use Case
16          Fastest    Lower      Real-time autocomplete
32          Fast       Good       Default for most cases
64          Moderate   Better     High-quality results
128         Slower     Best       Maximum accuracy needed

Error Handling

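retriever = None
try:
    retriever = LeetCodeRetriever(
        index_path="invalid.index"
    )
except ValueError as e:
    # Raised when the index is not an HNSW index
    print(f"Index error: {e}")
except Exception as e:
    # Any other initialization failure, e.g. the metadata file cannot be loaded
    print(f"Failed to initialize retriever: {e}")

# Search errors return an empty list
if retriever is not None:
    results = retriever.search("query", k=5)
    if not results:
        print("No results found or search failed")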

Integration Example

from src.DSAAssistant.components.retriever2 import LeetCodeRetriever
from rag_engine import RAGEngine

# Initialize retriever with custom settings
retriever = LeetCodeRetriever(
    ef_search=64,  # Higher accuracy
    model_name="all-MiniLM-L6-v2"
)

# Use with RAG engine
rag_engine = RAGEngine(
    retriever=retriever,
    confidence_threshold=0.7
)

# Combined search and generation
answer = rag_engine.answer_question(
    "Explain the approach for the knapsack problem",
    k=10
)
print(answer)
