## Overview

Quest's retrieval system uses FAISS (Facebook AI Similarity Search) with HNSW (Hierarchical Navigable Small World) indexing to perform fast, accurate semantic search over coding solutions. The LeetCodeRetriever class handles all retrieval operations.
## LeetCodeRetriever Architecture

### Class Structure
```python
import faiss
from sentence_transformers import SentenceTransformer


class LeetCodeRetriever:
    def __init__(
        self,
        index_path: str = "leetcode_hnsw2.index",
        metadata_path: str = "leetcode_metadata2.pkl",
        model_name: str = "all-MiniLM-L6-v2",
        ef_search: int = 32,
    ):
        """Initialize the retriever with an HNSW index and metadata."""
        self.encoder = SentenceTransformer(model_name)
        self.index = faiss.read_index(index_path)
        self.index.hnsw.efSearch = ef_search
        self.solutions = self._load_metadata(metadata_path)
```
**Key Components:**

- **Encoder**: all-MiniLM-L6-v2 sentence transformer (384-dimensional embeddings)
- **Index**: FAISS HNSW index for approximate nearest-neighbor search
- **Metadata**: pickle file containing solution details (title, code, difficulty, topics)
- **ef_search**: HNSW parameter controlling the search accuracy/speed trade-off
## HNSW Indexing Explained

HNSW (Hierarchical Navigable Small World) is a graph-based algorithm for approximate nearest-neighbor search.
### Why HNSW?

- **Fast Search**: logarithmic search complexity, O(log n) instead of linear O(n)
- **High Recall**: finds the most relevant results with 95%+ accuracy
- **Memory Efficient**: compact graph structure with tunable memory usage
- **No Training**: index construction is fast and requires no training phase
### HNSW Parameters

#### ef_search (Search Beam Width)

The ef_search parameter controls the trade-off between speed and accuracy during search:

```python
self.index.hnsw.efSearch = 32  # Default value
```
| ef_search | Speed | Accuracy | Use Case |
| --- | --- | --- | --- |
| 16 | Very Fast | ~90% | Real-time, approximate results |
| 32 | Fast | ~95% | Default, balanced performance |
| 64 | Moderate | ~98% | High accuracy requirements |
| 128+ | Slow | ~99%+ | Maximum accuracy |
Quest uses ef_search=32 by default, providing an excellent balance between speed and accuracy for on-device retrieval.
### How HNSW Works

HNSW builds a multi-layer graph structure:

```
Layer 2: Entry → Node A → Node B   (sparse, long-range connections)
           ↓        ↓        ↓
Layer 1: Node C → Node D → Node E  (medium density)
           ↓        ↓        ↓
Layer 0: All nodes with fine-grained connections (dense)
```
**Search Process:**

1. Start at the top layer (sparse).
2. Greedily navigate to the node closest to the query.
3. Move down to the next layer.
4. Repeat until reaching layer 0.
5. Perform a local search at layer 0 to collect the final k neighbors.
## Similarity Search

The search method performs semantic search over the solution database:
```python
def search(
    self,
    query: str,
    k: int = 3,
    return_scores: bool = True,
) -> List[Tuple[Solution, float]]:
    """Search for similar solutions using the HNSW index."""
    try:
        # Encode query
        query_vector = self.encoder.encode([query], show_progress_bar=False)
        query_vector = query_vector.astype(np.float32)

        # Search index
        distances, indices = self.index.search(query_vector, k)

        # Return results
        if return_scores:
            return [
                (self.solutions[idx], float(score))
                for idx, score in zip(indices[0], distances[0])
            ]
        return [self.solutions[idx] for idx in indices[0]]
    except Exception as e:
        logger.error(f"Search failed: {e}")
        return []
```
### Search Pipeline

1. **Query Encoding**: convert the text query into a 384-dimensional vector using all-MiniLM-L6-v2.
2. **Vector Search**: FAISS searches the HNSW index for the k nearest neighbors.
3. **Metadata Retrieval**: map the returned indices back to full solution objects.
4. **Score Assignment**: return solutions with similarity scores (lower distance = higher similarity).
### Understanding Similarity Scores

FAISS returns L2 distances (Euclidean distance in vector space):

```python
distances, indices = self.index.search(query_vector, k)
# distances: [0.45, 0.67, 0.89] - lower is more similar
# indices:   [42, 156, 7]       - solution IDs in the metadata
```
**Interpreting Scores:**

- **< 0.5**: highly relevant (exact match or near-duplicate)
- **0.5-0.7**: relevant (similar concepts or patterns)
- **0.7-0.9**: somewhat relevant (related topics)
- **> 0.9**: weakly relevant (may be off-topic)
The RAG engine applies a confidence threshold of 0.6 by default, discarding any solution whose L2 distance exceeds 0.6.
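That threshold amounts to simple post-processing on the (solution, score) pairs returned by search. The helper below is an illustrative sketch, not Quest's actual RAG-engine code; the string/float tuples stand in for real Solution objects.

```python
from typing import List, Tuple

def filter_by_confidence(
    results: List[Tuple[object, float]],
    max_distance: float = 0.6,
) -> List[Tuple[object, float]]:
    """Keep only hits whose L2 distance is at or below the cutoff."""
    return [(sol, score) for sol, score in results if score <= max_distance]

# Dummy (title, distance) pairs standing in for search results:
hits = [("Two Sum", 0.45), ("3Sum", 0.67), ("Word Ladder", 0.95)]
print(filter_by_confidence(hits))  # → [('Two Sum', 0.45)]
```

Loosening the cutoff (e.g. `max_distance=0.7`) trades precision for recall by letting "similar concepts" matches through.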
## Metadata Filtering

Quest supports filtering solutions by metadata attributes:
```python
def filter_by_metadata(
    self,
    companies: List[str] = None,
    difficulty: str = None,
    topics: List[str] = None,
) -> List[Solution]:
    """Filter solutions based on metadata."""
    filtered_solutions = self.solutions

    # Filter by companies
    if companies:
        filtered_solutions = [
            sol for sol in filtered_solutions
            if any(company.lower() in sol.companies.lower()
                   for company in companies)
        ]

    # Filter by difficulty
    if difficulty:
        filtered_solutions = [
            sol for sol in filtered_solutions
            if sol.difficulty.lower() == difficulty.lower()
        ]

    # Filter by topics
    if topics:
        filtered_solutions = [
            sol for sol in filtered_solutions
            if any(topic.lower() in sol.topics.lower()
                   for topic in topics)
        ]

    return filtered_solutions
```
### Solution Dataclass

Each solution contains rich metadata:
```python
from dataclasses import dataclass

@dataclass
class Solution:
    title: str       # Problem name (e.g., "Two Sum")
    solution: str    # Full solution with code and explanation
    difficulty: str  # "Easy", "Medium", or "Hard"
    topics: str      # Comma-separated topics (e.g., "Array, Hash Table")
    companies: str   # Companies that ask this question
```
### Filtering Examples

```python
retriever = LeetCodeRetriever()

# Filter Amazon Medium problems
filtered = retriever.filter_by_metadata(
    companies=["Amazon"],
    difficulty="Medium",
)
for sol in filtered:
    print(f"{sol.title} - {sol.difficulty}")

# Filter by topic
filtered = retriever.filter_by_metadata(topics=["BFS"])
for sol in filtered:
    print(f"{sol.title}: {sol.topics}")

# Combine company, difficulty, and topic filters
filtered = retriever.filter_by_metadata(
    companies=["Google", "Facebook"],
    difficulty="Hard",
    topics=["Dynamic Programming"],
)
```
## Search Speed

On typical hardware (an 8-core CPU):

| Index Size | ef_search | Search Time |
| --- | --- | --- |
| 1K solutions | 32 | ~5 ms |
| 10K solutions | 32 | ~15 ms |
| 100K solutions | 32 | ~50 ms |
HNSW indexing keeps search times in the tens of milliseconds even with tens of thousands of solutions, making Quest responsive on low-end devices.
## Memory Usage

Memory footprint depends on:

- **Embedding dimension**: 384 float32 values per solution (~1.5 KB)
- **HNSW graph**: ~2-4 KB per solution (depends on the M parameter)
- **Metadata**: variable (typically 2-10 KB per solution)
- **Total**: ~5-15 KB per solution

For 10,000 solutions: 50-150 MB total memory usage.
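Multiplying out the per-solution figures above reproduces that range (a back-of-the-envelope calculation using the graph and metadata bounds just quoted):

```python
n = 10_000
embedding_kb = 384 * 4 / 1024                  # 384 float32 values = 1.5 KB
low_mb = n * (embedding_kb + 2 + 2) / 1024     # lean graph + lean metadata
high_mb = n * (embedding_kb + 4 + 10) / 1024   # heavy graph + heavy metadata
print(f"{low_mb:.0f}-{high_mb:.0f} MB")        # → 54-151 MB
```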
## Usage Example

```python
from src.DSAAssistant.components.retriever2 import LeetCodeRetriever

# Initialize retriever
retriever = LeetCodeRetriever(
    index_path="leetcode_hnsw2.index",
    metadata_path="leetcode_metadata2.pkl",
    ef_search=32,
)

# Search for similar solutions
query = "How do I traverse a binary tree in level order?"
results = retriever.search(query, k=5, return_scores=True)

# Display results
for solution, score in results:
    print(f"\nTitle: {solution.title}")
    print(f"Score: {score:.3f}")
    print(f"Topics: {solution.topics}")
    print(f"Difficulty: {solution.difficulty}")
```
- **RAG Engine**: how retrieval integrates with generation
- **Inference Modes**: using retrieved context in prompts