
Overview

Quest’s retrieval system uses FAISS (Facebook AI Similarity Search) with HNSW (Hierarchical Navigable Small World) indexing to perform fast, accurate semantic search over coding solutions. The LeetCodeRetriever class handles all retrieval operations.

LeetCodeRetriever Architecture

Class Structure

class LeetCodeRetriever:
    def __init__(
        self,
        index_path: str = "leetcode_hnsw2.index",
        metadata_path: str = "leetcode_metadata2.pkl",
        model_name: str = "all-MiniLM-L6-v2",
        ef_search: int = 32
    ):
        """Initialize the retriever with an HNSW index and metadata."""
        self.encoder = SentenceTransformer(model_name)
        self.index = faiss.read_index(index_path)
        self.index.hnsw.efSearch = ef_search
        self.solutions = self._load_metadata(metadata_path)
Key Components:
  • Encoder: all-MiniLM-L6-v2 sentence transformer (384-dimensional embeddings)
  • Index: FAISS HNSW index for approximate nearest neighbor search
  • Metadata: Pickle file containing solution details (title, code, difficulty, topics)
  • ef_search: HNSW parameter controlling search accuracy vs speed
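The `_load_metadata` helper is not shown above. A minimal sketch of what it might look like, written as a standalone function and assuming the pickle file simply stores the list of Solution records in index order (the function name and format are assumptions, not taken from the Quest codebase):

```python
import pickle

def load_metadata(metadata_path: str) -> list:
    # Deserialize the pickled list of Solution records that backs
    # the FAISS index; list positions must line up with index IDs,
    # because index.search returns row indices into this list.
    with open(metadata_path, "rb") as f:
        return pickle.load(f)
```

The position alignment is the important invariant: `search` maps FAISS result indices straight into this list.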

HNSW Indexing Explained

HNSW (Hierarchical Navigable Small World) is a graph-based algorithm for approximate nearest neighbor search.

Why HNSW?

  • Fast Search: logarithmic O(log n) search complexity instead of linear O(n)
  • High Recall: finds the most relevant results with 95%+ accuracy
  • Memory Efficient: compact graph structure with tunable memory usage
  • No Training: index construction is fast and requires no training phase

HNSW Parameters

ef_search (Search Beam Width)

The ef_search parameter controls the trade-off between speed and accuracy during search:
self.index.hnsw.efSearch = 32  # Default value

ef_search   Speed       Accuracy   Use Case
16          Very fast   ~90%       Real-time, approximate results
32          Fast        ~95%       Default, balanced performance
64          Moderate    ~98%       High accuracy requirements
128+        Slow        ~99%+      Maximum accuracy

Quest uses ef_search=32 by default, providing an excellent balance between speed and accuracy for on-device retrieval.
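A practical way to pick ef_search is to measure recall against exact (brute-force) neighbors at each candidate setting. A minimal, FAISS-independent recall helper (`recall_at_k` is a hypothetical name, not part of Quest):

```python
def recall_at_k(approx_ids, exact_ids):
    # Fraction of the true top-k neighbors that the approximate
    # (HNSW) search also returned, order-insensitive.
    exact = set(exact_ids)
    return len(exact.intersection(approx_ids)) / len(exact)

# HNSW found 2 of the 3 exact neighbors, so recall is 2/3.
print(recall_at_k([42, 156, 7], [42, 156, 99]))
```

Sweeping ef_search upward until recall plateaus is a common way to confirm that 32 is enough for a given corpus.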

How HNSW Works

HNSW builds a multi-layer graph structure:
Layer 2:  Entry → Node A → Node B (Sparse, long-range connections)
           ↓       ↓       ↓
Layer 1:  Node C → Node D → Node E (Medium density)
           ↓       ↓       ↓
Layer 0:  All nodes with fine-grained connections (Dense)
Search Process:
  1. Start at the top layer (sparse)
  2. Greedily navigate to the closest node
  3. Move down to the next layer
  4. Repeat until reaching layer 0
  5. Perform local search to find exact k neighbors
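For intuition, the greedy navigation used on each layer can be sketched in a few lines of toy code. This is a 1-D illustration only; FAISS's real implementation maintains a candidate heap whose width is governed by efSearch:

```python
def greedy_step(layer, start, query, dist):
    # Walk one layer greedily: hop to the closest neighbor until
    # no neighbor is nearer to the query than the current node.
    current = start
    while True:
        neighbors = layer.get(current, [])
        best = min(neighbors, key=lambda n: dist(n, query), default=current)
        if dist(best, query) >= dist(current, query):
            return current
        current = best

# Toy layer: nodes are 1-D positions, edges link nearby nodes.
layer = {0: [4], 4: [0, 8], 8: [4, 9, 10], 9: [8, 10], 10: [8, 9]}
dist = lambda a, b: abs(a - b)
print(greedy_step(layer, 0, 9.2, dist))  # → 9, the closest reachable node
```

The node where the greedy walk stops on one layer becomes the entry point for the layer below.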

Search Method

The search method performs semantic search over the solution database:
def search(
    self,
    query: str,
    k: int = 3,
    return_scores: bool = True
) -> Union[List[Tuple[Solution, float]], List[Solution]]:
    """Search for similar solutions using the HNSW index."""
    try:
        # Encode query
        query_vector = self.encoder.encode(
            [query], show_progress_bar=False)
        query_vector = query_vector.astype(np.float32)
        
        # Search index
        distances, indices = self.index.search(query_vector, k)
        
        # Return results
        if return_scores:
            return [
                (self.solutions[idx], float(score))
                for idx, score in zip(indices[0], distances[0])
            ]
        return [self.solutions[idx] for idx in indices[0]]
    except Exception as e:
        logger.error(f"Search failed: {e}")
        return []

Search Pipeline

  1. Query Encoding: convert the text query into a 384-dimensional vector using all-MiniLM-L6-v2
  2. Vector Search: FAISS searches the HNSW index for the k nearest neighbors
  3. Metadata Retrieval: map indices back to full solution objects from the metadata
  4. Score Assignment: return solutions with similarity scores (lower distance = higher similarity)

Understanding Similarity Scores

FAISS returns L2 distances (Euclidean distance in vector space):
distances, indices = self.index.search(query_vector, k)
# distances: [0.45, 0.67, 0.89] - lower is more similar
# indices: [42, 156, 7] - solution IDs in metadata
Interpreting Scores:
  • < 0.5: Highly relevant (exact match or near-duplicate)
  • 0.5 - 0.7: Relevant (similar concepts or patterns)
  • 0.7 - 0.9: Somewhat relevant (related topics)
  • > 0.9: Weakly relevant (may be off-topic)
The RAG engine applies a maximum distance threshold of 0.6 by default, filtering out solutions whose L2 distance exceeds 0.6.
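A post-search filter along these lines might look as follows (`filter_by_distance` is a hypothetical helper for illustration, not taken from the Quest codebase):

```python
def filter_by_distance(results, max_distance=0.6):
    # Keep only (solution, distance) pairs under the cutoff;
    # FAISS L2 distances are smaller for closer matches.
    return [(sol, d) for sol, d in results if d <= max_distance]

# With made-up distances, only the first hit survives the 0.6 cutoff:
hits = [("Two Sum", 0.45), ("3Sum", 0.67), ("Word Ladder", 0.92)]
print(filter_by_distance(hits))
```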

Metadata Filtering

Quest supports filtering solutions by metadata attributes:
def filter_by_metadata(
    self,
    companies: Optional[List[str]] = None,
    difficulty: Optional[str] = None,
    topics: Optional[List[str]] = None
) -> List[Solution]:
    """Filter solutions based on metadata."""
    filtered_solutions = self.solutions
    
    # Filter by companies
    if companies:
        filtered_solutions = [
            sol for sol in filtered_solutions
            if any(company.lower() in sol.companies.lower() 
                   for company in companies)
        ]
    
    # Filter by difficulty
    if difficulty:
        filtered_solutions = [
            sol for sol in filtered_solutions
            if sol.difficulty.lower() == difficulty.lower()
        ]
    
    # Filter by topics
    if topics:
        filtered_solutions = [
            sol for sol in filtered_solutions
            if any(topic.lower() in sol.topics.lower() 
                   for topic in topics)
        ]
    
    return filtered_solutions

Solution Dataclass

Each solution contains rich metadata:
@dataclass
class Solution:
    title: str          # Problem name (e.g., "Two Sum")
    solution: str       # Full solution with code and explanation
    difficulty: str     # "Easy", "Medium", or "Hard"
    topics: str         # Comma-separated topics (e.g., "Array, Hash Table")
    companies: str      # Companies that ask this question

Filtering Examples

retriever = LeetCodeRetriever()

# Filter by company and difficulty
filtered = retriever.filter_by_metadata(
    companies=["Amazon"],
    difficulty="Medium"
)
for sol in filtered:
    print(f"{sol.title} - {sol.difficulty}")

# Filter by topic
filtered = retriever.filter_by_metadata(topics=["BFS"])
for sol in filtered:
    print(f"{sol.title}: {sol.topics}")

# Combine multiple filters
filtered = retriever.filter_by_metadata(
    companies=["Google", "Facebook"],
    difficulty="Hard",
    topics=["Dynamic Programming"]
)

Performance Characteristics

Search Speed

On typical hardware (8-core CPU):

Index Size       ef_search   Search Time
1K solutions     32          ~5 ms
10K solutions    32          ~15 ms
100K solutions   32          ~50 ms

HNSW indexing keeps searches in the tens of milliseconds even with a hundred thousand solutions, making Quest responsive on low-end devices.

Memory Usage

Memory footprint depends on:
  • Embedding dimension: 384 floats per solution (1.5 KB)
  • HNSW graph: ~2-4 KB per solution (depends on M parameter)
  • Metadata: Variable (typically 2-10 KB per solution)
Total: ~5-15 KB per solution.
For 10,000 solutions, that is roughly 50-150 MB of total memory.
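The arithmetic behind these figures can be checked with mid-range per-solution costs (the per-item constants below are assumptions drawn from the estimates above, not measurements):

```python
def estimate_memory_mb(n_solutions,
                       embedding_kb=1.5,  # 384 float32 values ≈ 1.5 KB
                       graph_kb=3.0,      # HNSW links; grows with M
                       metadata_kb=6.0):  # mid-range of the 2-10 KB estimate
    # Back-of-envelope total: per-solution cost times corpus size.
    return n_solutions * (embedding_kb + graph_kb + metadata_kb) / 1024

print(f"{estimate_memory_mb(10_000):.0f} MB")  # ≈ 100 MB at these midpoints
```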

Usage Example

from src.DSAAssistant.components.retriever2 import LeetCodeRetriever

# Initialize retriever
retriever = LeetCodeRetriever(
    index_path="leetcode_hnsw2.index",
    metadata_path="leetcode_metadata2.pkl",
    ef_search=32
)

# Search for similar solutions
query = "How do I traverse a binary tree in level order?"
results = retriever.search(query, k=5, return_scores=True)

# Display results
for solution, score in results:
    print(f"\nTitle: {solution.title}")
    print(f"Score: {score:.3f}")
    print(f"Topics: {solution.topics}")
    print(f"Difficulty: {solution.difficulty}")

Related Pages

  • RAG Engine: how retrieval integrates with generation
  • Inference Modes: using retrieved context in prompts
