## Overview

Quest's retrieval system uses FAISS (Facebook AI Similarity Search) with HNSW (Hierarchical Navigable Small World) indexing to perform fast, accurate semantic search over coding solutions. The LeetCodeRetriever class handles all retrieval operations.
## LeetCodeRetriever Architecture

### Class Structure
```python
import faiss
from sentence_transformers import SentenceTransformer


class LeetCodeRetriever:
    def __init__(
        self,
        index_path: str = "leetcode_hnsw2.index",
        metadata_path: str = "leetcode_metadata2.pkl",
        model_name: str = "all-MiniLM-L6-v2",
        ef_search: int = 32,
    ):
        """Initialize the retriever with an HNSW index and metadata."""
        self.encoder = SentenceTransformer(model_name)
        self.index = faiss.read_index(index_path)
        self.index.hnsw.efSearch = ef_search
        self.solutions = self._load_metadata(metadata_path)
```
**Key Components:**

- **Encoder**: all-MiniLM-L6-v2 sentence transformer (384-dimensional embeddings)
- **Index**: FAISS HNSW index for approximate nearest-neighbor search
- **Metadata**: pickle file containing solution details (title, code, difficulty, topics)
- **ef_search**: HNSW parameter controlling the search accuracy/speed trade-off
## HNSW Indexing Explained

HNSW (Hierarchical Navigable Small World) is a graph-based algorithm for approximate nearest-neighbor search.
### Why HNSW?

- **Fast Search**: logarithmic search complexity, O(log n) instead of linear O(n)
- **High Recall**: finds the most relevant results with 95%+ accuracy
- **Memory Efficient**: compact graph structure with tunable memory usage
- **No Training**: index construction is fast and requires no training phase
### HNSW Parameters

#### ef_search (Search Beam Width)

The ef_search parameter controls the trade-off between speed and accuracy during search:

```python
self.index.hnsw.efSearch = 32  # Default value
```
| ef_search | Speed | Accuracy | Use Case |
| --- | --- | --- | --- |
| 16 | Very Fast | ~90% | Real-time, approximate results |
| 32 | Fast | ~95% | Default, balanced performance |
| 64 | Moderate | ~98% | High accuracy requirements |
| 128+ | Slow | ~99%+ | Maximum accuracy |
Quest uses ef_search=32 by default, providing an excellent balance between speed and accuracy for on-device retrieval.
### How HNSW Works

HNSW builds a multi-layer graph structure:

```
Layer 2: Entry → Node A → Node B   (sparse, long-range connections)
           ↓        ↓        ↓
Layer 1: Node C → Node D → Node E  (medium density)
           ↓        ↓        ↓
Layer 0: All nodes with fine-grained connections (dense)
```
**Search Process:**

1. Start at the top layer (sparse).
2. Greedily navigate to the node closest to the query.
3. Move down to the next layer.
4. Repeat until reaching layer 0.
5. Perform a local search at layer 0 to collect the final k neighbors.
## Similarity Search

The search method performs semantic search over the solution database:
```python
def search(
    self,
    query: str,
    k: int = 3,
    return_scores: bool = True,
) -> List[Tuple[Solution, float]]:
    """Search for similar solutions using the HNSW index."""
    try:
        # Encode query
        query_vector = self.encoder.encode([query], show_progress_bar=False)
        query_vector = query_vector.astype(np.float32)

        # Search index
        distances, indices = self.index.search(query_vector, k)

        # Return results
        if return_scores:
            return [
                (self.solutions[idx], float(score))
                for idx, score in zip(indices[0], distances[0])
            ]
        return [self.solutions[idx] for idx in indices[0]]
    except Exception as e:
        logger.error(f"Search failed: {e}")
        return []
```
### Search Pipeline

1. **Query Encoding**: convert the text query into a 384-dimensional vector using all-MiniLM-L6-v2.
2. **Vector Search**: FAISS searches the HNSW index for the k nearest neighbors.
3. **Metadata Retrieval**: map the returned indices back to full solution objects.
4. **Score Assignment**: return solutions with similarity scores (lower distance = higher similarity).
### Understanding Similarity Scores

FAISS returns L2 distances (Euclidean distance in vector space):

```python
distances, indices = self.index.search(query_vector, k)
# distances: [0.45, 0.67, 0.89] - lower is more similar
# indices:   [42, 156, 7]       - solution IDs in the metadata
```
**Interpreting Scores:**

- **< 0.5**: highly relevant (exact match or near-duplicate)
- **0.5-0.7**: relevant (similar concepts or patterns)
- **0.7-0.9**: somewhat relevant (related topics)
- **> 0.9**: weakly relevant (may be off-topic)
The RAG engine applies a confidence threshold of 0.6 by default, discarding any solution whose L2 distance exceeds 0.6.
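That threshold amounts to simple post-processing on the (solution, score) pairs returned by search. The helper below is an illustrative sketch, not Quest's actual RAG-engine code; the string/float tuples stand in for real Solution objects.

```python
from typing import List, Tuple

def filter_by_confidence(
    results: List[Tuple[object, float]],
    max_distance: float = 0.6,
) -> List[Tuple[object, float]]:
    """Keep only hits whose L2 distance is at or below the cutoff."""
    return [(sol, score) for sol, score in results if score <= max_distance]

# Dummy (title, distance) pairs standing in for search results:
hits = [("Two Sum", 0.45), ("3Sum", 0.67), ("Word Ladder", 0.95)]
print(filter_by_confidence(hits))  # → [('Two Sum', 0.45)]
```

Loosening the cutoff (e.g. `max_distance=0.7`) trades precision for recall by letting "similar concepts" matches through.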
## Metadata Filtering

Quest supports filtering solutions by metadata attributes:
```python
def filter_by_metadata(
    self,
    companies: List[str] = None,
    difficulty: str = None,
    topics: List[str] = None,
) -> List[Solution]:
    """Filter solutions based on metadata."""
    filtered_solutions = self.solutions

    # Filter by companies
    if companies:
        filtered_solutions = [
            sol for sol in filtered_solutions
            if any(company.lower() in sol.companies.lower()
                   for company in companies)
        ]

    # Filter by difficulty
    if difficulty:
        filtered_solutions = [
            sol for sol in filtered_solutions
            if sol.difficulty.lower() == difficulty.lower()
        ]

    # Filter by topics
    if topics:
        filtered_solutions = [
            sol for sol in filtered_solutions
            if any(topic.lower() in sol.topics.lower()
                   for topic in topics)
        ]

    return filtered_solutions
```
### Solution Dataclass

Each solution contains rich metadata:
```python
from dataclasses import dataclass

@dataclass
class Solution:
    title: str       # Problem name (e.g., "Two Sum")
    solution: str    # Full solution with code and explanation
    difficulty: str  # "Easy", "Medium", or "Hard"
    topics: str      # Comma-separated topics (e.g., "Array, Hash Table")
    companies: str   # Companies that ask this question
```
### Filtering Examples

```python
retriever = LeetCodeRetriever()

# Filter Amazon Medium problems
filtered = retriever.filter_by_metadata(
    companies=["Amazon"],
    difficulty="Medium",
)
for sol in filtered:
    print(f"{sol.title} - {sol.difficulty}")

# Filter by topic
filtered = retriever.filter_by_metadata(topics=["BFS"])
for sol in filtered:
    print(f"{sol.title}: {sol.topics}")

# Combine company, difficulty, and topic filters
filtered = retriever.filter_by_metadata(
    companies=["Google", "Facebook"],
    difficulty="Hard",
    topics=["Dynamic Programming"],
)
```
## Search Speed

On typical hardware (an 8-core CPU):

| Index Size | ef_search | Search Time |
| --- | --- | --- |
| 1K solutions | 32 | ~5 ms |
| 10K solutions | 32 | ~15 ms |
| 100K solutions | 32 | ~50 ms |
HNSW indexing keeps search times in the tens of milliseconds even with tens of thousands of solutions, making Quest responsive on low-end devices.
## Memory Usage

Memory footprint depends on:

- **Embedding dimension**: 384 float32 values per solution (~1.5 KB)
- **HNSW graph**: ~2-4 KB per solution (depends on the M parameter)
- **Metadata**: variable (typically 2-10 KB per solution)
- **Total**: ~5-15 KB per solution

For 10,000 solutions: 50-150 MB total memory usage.
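Multiplying out the per-solution figures above reproduces that range (a back-of-the-envelope calculation using the graph and metadata bounds just quoted):

```python
n = 10_000
embedding_kb = 384 * 4 / 1024                  # 384 float32 values = 1.5 KB
low_mb = n * (embedding_kb + 2 + 2) / 1024     # lean graph + lean metadata
high_mb = n * (embedding_kb + 4 + 10) / 1024   # heavy graph + heavy metadata
print(f"{low_mb:.0f}-{high_mb:.0f} MB")        # → 54-151 MB
```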
## Usage Example

```python
from src.DSAAssistant.components.retriever2 import LeetCodeRetriever

# Initialize retriever
retriever = LeetCodeRetriever(
    index_path="leetcode_hnsw2.index",
    metadata_path="leetcode_metadata2.pkl",
    ef_search=32,
)

# Search for similar solutions
query = "How do I traverse a binary tree in level order?"
results = retriever.search(query, k=5, return_scores=True)

# Display results
for solution, score in results:
    print(f"\nTitle: {solution.title}")
    print(f"Score: {score:.3f}")
    print(f"Topics: {solution.topics}")
    print(f"Difficulty: {solution.difficulty}")
```
- **RAG Engine**: how retrieval integrates with generation
- **Inference Modes**: using retrieved context in prompts