LeetCodeRetriever

Overview

The LeetCodeRetriever class provides fast semantic search over LeetCode solutions using FAISS HNSW indexing and SentenceTransformers embeddings. It supports similarity search and metadata-based filtering by company, difficulty, and topics.

Class Definition

from src.DSAAssistant.components.retriever2 import LeetCodeRetriever, Solution

retriever = LeetCodeRetriever(
    index_path="leetcode_hnsw2.index",
    metadata_path="leetcode_metadata2.pkl",
    model_name="all-MiniLM-L6-v2",
    ef_search=32
)

Solution Dataclass

Represents a LeetCode solution with associated metadata.

from dataclasses import dataclass

@dataclass
class Solution:
    title: str          # Problem title (e.g., "Two Sum")
    solution: str       # Complete solution with explanation and code
    difficulty: str     # "Easy", "Medium", or "Hard"
    topics: str         # Comma-separated topics (e.g., "Array, Hash Table")
    companies: str      # Comma-separated companies (e.g., "Amazon, Google")

Attributes:

title

str

The problem title as it appears on LeetCode

solution

str

Complete solution text including approach explanation, code implementation, and complexity analysis

difficulty

str

Problem difficulty level: "Easy", "Medium", or "Hard"

topics

str

Comma-separated list of algorithmic topics and data structures

companies

str

Comma-separated list of companies known to ask this problem
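Because topics and companies are stored as comma-separated strings rather than lists, callers may want to split them before comparing individual values. A minimal sketch, assuming a hypothetical helper named split_field that is not part of the class:

```python
from typing import List


def split_field(value: str) -> List[str]:
    """Split a comma-separated metadata string into trimmed, non-empty items."""
    return [item.strip() for item in value.split(",") if item.strip()]


# Example values taken from the field comments in the dataclass above
topics = split_field("Array, Hash Table")
companies = split_field("Amazon, Google")
```

Stripping whitespace and dropping empty items keeps the result stable even if a stored string has stray spaces or a trailing comma.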

Constructor Parameters

index_path

str

default:"leetcode_hnsw2.index"

Path to the FAISS HNSW index file. Defaults to leetcode_hnsw2.index in the component directory.

metadata_path

str

default:"leetcode_metadata2.pkl"

Path to the pickled metadata file containing Solution objects. Defaults to leetcode_metadata2.pkl in the component directory.

model_name

str

default:"all-MiniLM-L6-v2"

SentenceTransformer model name for encoding queries. Must match the model used to build the index.

ef_search

int

default:"32"

HNSW search parameter controlling speed/accuracy trade-off. Higher values improve accuracy but slow down search. Typical range: 16-128.

Initialization Behavior:

Loads the SentenceTransformer encoder
Reads the FAISS HNSW index from disk
Validates that the index is HNSW type
Sets HNSW search parameters
Loads solution metadata from pickle file
Logs successful initialization

Raises:

ValueError if the index is not an HNSW index
Exception if metadata file cannot be loaded

Methods

search

Search for semantically similar solutions using vector similarity.

results = retriever.search(
    query="dynamic programming coin change",
    k=5,
    return_scores=True
)

query

str

required

Natural language query describing the problem or concept

k

int

default:"3"

Number of top results to return

return_scores

bool

default:"True"

If True, returns tuples of (Solution, score). If False, returns only Solution objects.

results

List[Tuple[Solution, float]] | List[Solution]

List of search results. Format depends on return_scores:

If True: List of (Solution, score) tuples, where score is L2 distance (lower is better)
If False: List of Solution objects only

Returns empty list on search failure.

Behavior:

Encodes query using SentenceTransformer
Searches FAISS HNSW index for k nearest neighbors
Returns solutions ordered by similarity (ascending L2 distance)
Lower scores indicate higher similarity
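Since the score is a distance rather than a similarity, it can help to convert it before displaying it. The sketch below rests on two assumptions that this page does not confirm: that the embeddings are unit-normalized, and that the score is a squared L2 distance (FAISS L2 indexes report squared distances). Under those assumptions, ||a - b||^2 = 2 - 2 cos(a, b). The helper name is hypothetical.

```python
def cosine_from_squared_l2(squared_distance: float) -> float:
    """Convert a squared L2 distance between unit vectors to cosine similarity."""
    # For unit vectors: ||a - b||^2 = 2 - 2 * cos(a, b)
    return 1.0 - squared_distance / 2.0


# Sanity check with two orthogonal unit vectors (made-up data)
a = (1.0, 0.0)
b = (0.0, 1.0)
squared = sum((x - y) ** 2 for x, y in zip(a, b))  # squared distance is 2.0
similarity = cosine_from_squared_l2(squared)       # orthogonal vectors
```

If the embeddings are not normalized, this conversion does not hold and the raw distance should be used for ranking only.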

Example:

retriever = LeetCodeRetriever()

results = retriever.search(
    "matrix distance calculation",
    k=3,
    return_scores=True
)

for solution, score in results:
    print(f"Title: {solution.title}")
    print(f"Score: {score:.3f}")
    print(f"Difficulty: {solution.difficulty}")
    print(f"Topics: {solution.topics}")
    print()

# Output:
# Title: 01 Matrix
# Score: 0.523
# Difficulty: Medium
# Topics: Array, BFS, Matrix

filter_by_metadata

Filter solutions based on company, difficulty, and topic metadata.

filtered = retriever.filter_by_metadata(
    companies=["Amazon", "Google"],
    difficulty="Medium",
    topics=["BFS", "Dynamic Programming"]
)

companies

List[str]

default:"None"

List of company names to filter by. Matches if any company is found in solution’s companies field (case-insensitive).

difficulty

str

default:"None"

Difficulty level to filter by: "Easy", "Medium", or "Hard" (case-insensitive)

topics

List[str]

default:"None"

List of topics to filter by. Matches if any topic is found in solution’s topics field (case-insensitive).

filtered_solutions

List[Solution]

List of Solution objects matching all specified criteria. Returns all solutions if no filters specified.

Behavior:

Applies filters sequentially (companies → difficulty → topics)
All filters use case-insensitive matching
Filters are cumulative (AND logic)
Within each filter type, matching uses OR logic (e.g., any company matches)
Returns original solution list if no filters provided
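The filter rules above can be sketched as follows. This is an illustration, not the class's actual implementation: the function names are hypothetical, the sample data is made up, and "found in" is assumed to mean a case-insensitive substring match against the comma-separated field.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Solution:
    title: str
    solution: str
    difficulty: str
    topics: str
    companies: str


def matches_any(wanted: List[str], field: str) -> bool:
    """True if any wanted value occurs in the field (case-insensitive substring)."""
    haystack = field.lower()
    return any(value.lower() in haystack for value in wanted)


def filter_solutions(
    solutions: List[Solution],
    companies: Optional[List[str]] = None,
    difficulty: Optional[str] = None,
    topics: Optional[List[str]] = None,
) -> List[Solution]:
    """Apply companies, then difficulty, then topics; omitted filters are skipped."""
    result = solutions
    if companies:  # OR within the filter: any listed company may match
        result = [s for s in result if matches_any(companies, s.companies)]
    if difficulty:  # AND across filters: each step narrows the previous result
        result = [s for s in result if s.difficulty.lower() == difficulty.lower()]
    if topics:
        result = [s for s in result if matches_any(topics, s.topics)]
    return result


# Made-up sample data for illustration only
sample = [
    Solution("Two Sum", "...", "Easy", "Array, Hash Table", "Amazon, Google"),
    Solution("01 Matrix", "...", "Medium", "Array, BFS, Matrix", "Google"),
    Solution("Coin Change", "...", "Medium", "Array, Dynamic Programming, BFS", "Microsoft"),
]
medium_google_bfs = filter_solutions(
    sample, companies=["google"], difficulty="medium", topics=["BFS"]
)
```

With substring matching, a short filter value can match more than intended (for example, "DP" inside a longer topic name), so exact matching on split items may be preferable in some cases.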

Example:

# Filter by difficulty only
medium_problems = retriever.filter_by_metadata(
    difficulty="Medium"
)
print(f"Found {len(medium_problems)} medium problems")

_load_metadata

Internal method to load solution metadata from pickle file.

solutions = retriever._load_metadata("leetcode_metadata2.pkl")

metadata_path

str

required

Path to pickled metadata file

solutions

List[Solution]

List of Solution objects loaded from file

Raises:

Exception on file loading or unpickling errors
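A loader with this behavior might look like the sketch below. The function name load_pickled_list and the list type check are assumptions added for illustration; the demo pickles plain dictionaries rather than Solution objects so that it runs on its own.

```python
import os
import pickle
import tempfile
from typing import Any, List


def load_pickled_list(path: str) -> List[Any]:
    """Load a pickled list from disk, re-raising any failure to the caller."""
    try:
        with open(path, "rb") as handle:
            data = pickle.load(handle)
    except Exception as exc:
        # Covers missing files, permission errors, and corrupt pickles alike
        print(f"Failed to load metadata from {path}: {exc}")
        raise
    if not isinstance(data, list):
        raise TypeError(f"Expected a list in {path}, got {type(data).__name__}")
    return data


# Round-trip demo using a temporary file and made-up data
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "demo_metadata.pkl")
    with open(demo_path, "wb") as handle:
        pickle.dump([{"title": "Two Sum", "difficulty": "Easy"}], handle)
    loaded = load_pickled_list(demo_path)
```

Two general properties of pickle apply here: unpickling custom objects requires their class to be importable under the same module path used when the file was written, and pickle files should only be loaded from trusted sources.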

Attributes

encoder

SentenceTransformer

SentenceTransformer model for encoding text queries into embeddings

index

faiss.IndexHNSWFlat

FAISS HNSW index for fast approximate nearest neighbor search

solutions

List[Solution]

Complete list of Solution objects with metadata

Usage Examples

from src.DSAAssistant.components.retriever2 import LeetCodeRetriever

# Initialize retriever
retriever = LeetCodeRetriever()

# Search for solutions
results = retriever.search(
    "How to find shortest path in a graph?",
    k=5
)

for solution, score in results:
    print(f"{solution.title} (L2 distance: {score:.3f})")  # lower is better
    print(solution.solution[:200])  # First 200 chars
    print()

Performance Characteristics

Search Complexity

Time: roughly O(log n) per query on average
Space: O(k) for results
HNSW search is approximate, which is what allows sub-linear query time

Filter Complexity

Time: O(n), a linear scan over all n solutions
Space: O(m), where m is the number of matching solutions
Iterates through all solutions on every call

ef_search Parameter Tuning

ef_search	Speed	Accuracy	Use Case
16	Fastest	Lower	Real-time autocomplete
32	Fast	Good	Default for most cases
64	Moderate	Better	High-quality results
128	Slower	Best	Maximum accuracy needed
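The speed and accuracy labels in the table are qualitative. One common way to quantify accuracy for a given ef_search value is recall@k: the share of the exact top-k neighbors that the approximate search also returns. The sketch below uses a hypothetical helper and made-up result IDs; in practice the exact IDs could come from a brute-force index such as faiss.IndexFlatL2 built over the same vectors.

```python
from typing import Sequence


def recall_at_k(approx_ids: Sequence[int], exact_ids: Sequence[int]) -> float:
    """Fraction of the exact top-k neighbors that the approximate search returned."""
    if not exact_ids:
        return 0.0
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)


# Made-up IDs: the approximate search missed one of the five true neighbors
exact = [4, 8, 15, 16, 23]
approx = [4, 8, 15, 42, 23]
recall = recall_at_k(approx, exact)
```

Averaging this value over a set of representative queries at each ef_search setting gives a concrete basis for choosing a row from the table.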

Error Handling

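retriever = None
try:
    retriever = LeetCodeRetriever(
        index_path="invalid.index"
    )
except ValueError as e:
    # Raised when the loaded index is not an HNSW index
    print(f"Index error: {e}")
except Exception as e:
    # Any other failure, e.g. an unreadable index or metadata file
    print(f"Failed to initialize retriever: {e}")

# Search errors return an empty list
if retriever is not None:
    results = retriever.search("query", k=5)
    if not results:
        print("No results found or search failed")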

Integration Example

from src.DSAAssistant.components.retriever2 import LeetCodeRetriever
from rag_engine import RAGEngine

# Initialize retriever with custom settings
retriever = LeetCodeRetriever(
    ef_search=64,  # Higher accuracy
    model_name="all-MiniLM-L6-v2"
)

# Use with RAG engine
rag_engine = RAGEngine(
    retriever=retriever,
    confidence_threshold=0.7
)

# Combined search and generation
answer = rag_engine.answer_question(
    "Explain the approach for the knapsack problem",
    k=10
)
print(answer)
