LeetCodeRetriever Configuration

Overview

The LeetCodeRetriever class handles semantic search over the LeetCode solution database using FAISS HNSW (Hierarchical Navigable Small World) indexing and sentence transformers for embeddings.

Initialization Parameters

index_path

string

default:"leetcode_hnsw2.index"

Path to the FAISS HNSW index file. The default path is relative to the component directory.

from src.DSAAssistant.components.retriever2 import LeetCodeRetriever

retriever = LeetCodeRetriever(
    index_path="/custom/path/to/index.index"
)

The index must be created using faiss.IndexHNSWFlat. Other index types will raise a ValueError.

metadata_path

string

default:"leetcode_metadata2.pkl"

Path to the pickled metadata file containing Solution objects. This file stores the actual problem titles, solutions, difficulty levels, topics, and companies.

retriever = LeetCodeRetriever(
    metadata_path="/custom/path/to/metadata.pkl"
)

model_name

string

default:"all-MiniLM-L6-v2"

The sentence transformer model used for encoding queries. This must match the model used to create the index.

Popular alternatives:

all-MiniLM-L6-v2: Fast, lightweight (default)
all-mpnet-base-v2: More accurate, slower
multi-qa-mpnet-base-dot-v1: Optimized for Q&A tasks

retriever = LeetCodeRetriever(
    model_name="all-mpnet-base-v2"
)

Changing the model requires rebuilding the index with embeddings from the new model.
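A mismatch between model and index can be caught early by comparing vector dimensions. The sketch below is an illustration only: `check_model_matches_index` is a hypothetical helper, not part of LeetCodeRetriever. It assumes the index exposes its dimension as `d` (as FAISS indexes do) and that the encoder offers `get_sentence_embedding_dimension()`. Stand-in objects are used so it runs without FAISS or sentence-transformers installed.

```python
from types import SimpleNamespace


def check_model_matches_index(index, encoder):
    """Raise ValueError if the encoder's output size differs from the index dimension."""
    index_dim = index.d  # FAISS indexes expose their vector dimension as `d`
    model_dim = encoder.get_sentence_embedding_dimension()
    if index_dim != model_dim:
        raise ValueError(
            f"Index expects {index_dim}-dimensional vectors but the model "
            f"produces {model_dim}; rebuild the index with the new model."
        )
    return index_dim


# Stand-ins (assumptions for illustration): a 384-dim index, as produced with
# all-MiniLM-L6-v2, paired with a 768-dim encoder such as all-mpnet-base-v2.
fake_index = SimpleNamespace(d=384)
fake_encoder = SimpleNamespace(get_sentence_embedding_dimension=lambda: 768)

try:
    check_model_matches_index(fake_index, fake_encoder)
except ValueError as err:
    print(err)
```

Equal dimensions are necessary but not sufficient: all-mpnet-base-v2 and multi-qa-mpnet-base-dot-v1 both produce 768-dimensional vectors, so this check cannot tell them apart.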

ef_search

int

default:"32"

HNSW search parameter that controls the speed/accuracy trade-off during retrieval.

Lower values (16-32): Faster search, slightly lower recall
Higher values (64-128): Slower search, higher recall

The default value of 32 provides a good balance for most use cases.

retriever = LeetCodeRetriever(
    ef_search=64  # More accurate, slower
)

Search Methods

Basic Search

results = retriever.search(
    query="dynamic programming coin change",
    k=5,
    return_scores=True
)

for solution, score in results:
    print(f"{solution.title}: {score:.3f}")

query

string

required

The search query (natural language or keywords)

k

int

default:"3"

Number of results to return

return_scores

bool

default:"True"

If True, returns (Solution, float) tuples with similarity scores. If False, returns only Solution objects.
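When scores are returned, a common follow-up is to drop weak matches. The sketch below works on the documented (Solution, float) tuple shape. The `titles_above` helper, the sample scores, and the 0.5 cutoff are all hypothetical values chosen for illustration, not recommendations from the library.

```python
from dataclasses import dataclass


@dataclass
class Solution:
    # Mirrors the fields documented for retrieved solutions.
    title: str
    solution: str
    difficulty: str
    topics: str
    companies: str


def titles_above(results, min_score):
    """Return titles from (Solution, score) pairs whose score is at least min_score."""
    return [sol.title for sol, score in results if score >= min_score]


# Hypothetical results in the shape returned when return_scores=True.
sample = [
    (Solution("Coin Change", "...", "Medium", "Dynamic Programming", "Amazon"), 0.82),
    (Solution("Two Sum", "...", "Easy", "Array, Hash Table", "Google"), 0.31),
]

print(titles_above(sample, 0.5))  # → ['Coin Change']
```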

Metadata Filtering

Filter solutions by difficulty, topics, or companies:

filtered = retriever.filter_by_metadata(
    companies=["Amazon", "Google"],
    difficulty="Medium",
    topics=["Dynamic Programming", "BFS"]
)

companies

List[str]

Filter by companies that ask this question (case-insensitive partial match)

difficulty

string

Filter by difficulty: "Easy", "Medium", or "Hard" (case-insensitive)

topics

List[str]

Filter by topics/tags (case-insensitive partial match)
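The matching rules described above can be sketched in plain Python. This is not the library's implementation: it assumes that the three filters are combined with AND, that any one entry in a list is enough to match (OR within a list), and that difficulty is compared for equality. `filter_solutions` is a hypothetical name.

```python
from dataclasses import dataclass


@dataclass
class Solution:
    title: str
    solution: str
    difficulty: str
    topics: str      # comma-separated
    companies: str   # comma-separated


def _any_substring(needles, haystack):
    """Case-insensitive partial match: True if any needle occurs in haystack."""
    haystack = haystack.lower()
    return any(n.lower() in haystack for n in needles)


def filter_solutions(solutions, companies=None, difficulty=None, topics=None):
    """Assumed semantics: AND across filters, OR within each list."""
    kept = []
    for sol in solutions:
        if difficulty is not None and sol.difficulty.lower() != difficulty.lower():
            continue
        if companies and not _any_substring(companies, sol.companies):
            continue
        if topics and not _any_substring(topics, sol.topics):
            continue
        kept.append(sol)
    return kept


# Made-up records for illustration.
sols = [
    Solution("Coin Change", "...", "Medium", "Dynamic Programming, BFS", "Amazon, Google"),
    Solution("Two Sum", "...", "Easy", "Array, Hash Table", "Google"),
    Solution("Word Ladder", "...", "Hard", "BFS", "Amazon"),
]

print([s.title for s in filter_solutions(sols, companies=["amazon"], difficulty="medium")])
# → ['Coin Change']
```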

Solution Data Structure

Each retrieved solution is a Solution dataclass with these fields:

@dataclass
class Solution:
    title: str          # Problem title (e.g., "Two Sum")
    solution: str       # Full solution with explanation and code
    difficulty: str     # "Easy", "Medium", or "Hard"
    topics: str         # Comma-separated topics
    companies: str      # Comma-separated companies
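Because topics and companies are stored as comma-separated strings, callers that need lists have to split them. A minimal sketch, assuming a plain comma separator with optional surrounding whitespace (`split_field` is a hypothetical helper):

```python
def split_field(value):
    """Split a comma-separated field into a list of trimmed, non-empty entries."""
    return [part.strip() for part in value.split(",") if part.strip()]


print(split_field("Array, Hash Table"))  # → ['Array', 'Hash Table']
```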

HNSW Tuning Guide

Understanding ef_search

The ef_search parameter controls the size of the dynamic candidate list during search:

Settings covered: Low (16), Default (32), High (64), Very High (128).

Low (16)

Speed: Very fast
Accuracy: ~90% recall
Use case: Real-time applications, large datasets

Performance Benchmarks

Assuming 1,000 indexed solutions:

ef_search	Latency	Recall	RAM Usage
16	~2ms	89%	Low
32	~4ms	95%	Low
64	~8ms	98%	Medium
128	~15ms	99%+	High
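Figures like these depend on hardware and data, so it can help to measure them on your own index. The sketch below shows one way to compute recall and average latency. Both helpers are hypothetical and independent of the retriever: `search_fn` stands for any callable that runs one query, and the ID lists are made-up examples.

```python
import time


def recall_at_k(approx_ids, exact_ids):
    """Fraction of the exact top-k neighbours that the approximate search also returned."""
    if not exact_ids:
        return 0.0
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)


def mean_latency_ms(search_fn, queries, repeats=3):
    """Average wall-clock time per query in milliseconds."""
    start = time.perf_counter()
    for _ in range(repeats):
        for query in queries:
            search_fn(query)
    elapsed = time.perf_counter() - start
    return 1000.0 * elapsed / (repeats * len(queries))


# Made-up IDs: the approximate search found 3 of the 4 true neighbours.
print(recall_at_k([1, 2, 3, 9], [1, 2, 3, 4]))  # → 0.75
```

In practice `search_fn` could wrap `retriever.search`, and the exact neighbour IDs could come from a brute-force index such as `faiss.IndexFlatL2` built over the same vectors.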

Example Configurations

Default Configuration

from src.DSAAssistant.components.retriever2 import LeetCodeRetriever

retriever = LeetCodeRetriever()

High-Accuracy Configuration

For maximum retrieval precision:

retriever = LeetCodeRetriever(
    model_name="all-mpnet-base-v2",
    ef_search=128
)

Fast Configuration

For real-time applications:

retriever = LeetCodeRetriever(
    model_name="all-MiniLM-L6-v2",
    ef_search=16
)

Custom Paths

retriever = LeetCodeRetriever(
    index_path="/data/indices/custom.index",
    metadata_path="/data/metadata/custom.pkl",
    ef_search=32
)

Advanced Usage

Combining Search and Filtering

# First filter by metadata
filtered_solutions = retriever.filter_by_metadata(
    difficulty="Medium",
    topics=["Dynamic Programming"]
)

# Then encode the query for semantic search within the filtered results
query = "longest increasing subsequence"
query_vector = retriever.encoder.encode([query])
# Ranking filtered_solutions against query_vector is a manual step, not shown here
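The manual ranking step is left open in the snippet above. One possible approach, sketched here under assumptions, is to score each filtered candidate by cosine similarity against the query vector and keep the top k. `rank_subset` is a hypothetical helper, and the toy 2-D vectors stand in for real embeddings.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_subset(query_vector, candidates, k=3):
    """candidates is a list of (item, vector) pairs; returns the top-k (item, score), best first."""
    scored = [(item, cosine(query_vector, vec)) for item, vec in candidates]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy vectors for illustration only.
candidates = [
    ("Longest Increasing Subsequence", [0.9, 0.1]),
    ("Coin Change", [0.2, 0.8]),
    ("House Robber", [0.5, 0.5]),
]

top = rank_subset([1.0, 0.0], candidates, k=2)
print([title for title, _ in top])
# → ['Longest Increasing Subsequence', 'House Robber']
```

With the real retriever, the candidate vectors would have to be produced with `retriever.encoder.encode(...)` from the same text that was embedded when the index was built; this page does not state which fields that was.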

Inspecting Index Properties

print(f"Index type: {type(retriever.index)}")
print(f"Embedding dimension: {retriever.encoder.get_sentence_embedding_dimension()}")
print(f"Number of solutions: {len(retriever.solutions)}")
print(f"HNSW ef_search: {retriever.index.hnsw.efSearch}")

Configuration Tips

Start with defaults and adjust ef_search only if retrieval quality is insufficient.

For production systems, use an ef_search value between 32 and 64 for the best speed/accuracy balance.

Changing the embedding model requires rebuilding the entire FAISS index. Make sure embeddings are consistent.

The retriever uses cosine similarity (via L2 normalization) for semantic matching. Higher scores indicate better matches.
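The link between L2 normalization and cosine similarity rests on an identity: for unit-length vectors, squared L2 distance equals 2 - 2 * cosine similarity, so the nearest neighbours by L2 distance are also the most similar by cosine. The sketch below checks this numerically on made-up vectors. How the retriever converts distances into the scores it reports is not stated here, so this illustrates the identity only.

```python
import math


def l2_normalize(v):
    """Scale v to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def squared_l2(u, v):
    """Squared Euclidean distance between u and v."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def dot(u, v):
    """Dot product; equals cosine similarity when both inputs are unit vectors."""
    return sum(a * b for a, b in zip(u, v))


a = l2_normalize([3.0, 4.0])  # [0.6, 0.8]
b = l2_normalize([1.0, 0.0])  # [1.0, 0.0]

cos_ab = dot(a, b)
# For unit vectors: ||a - b||^2 == 2 - 2 * cos(a, b)
print(math.isclose(squared_l2(a, b), 2 - 2 * cos_ab))  # → True
```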
