Overview
The LeetCodeRetriever class handles semantic search over the LeetCode solution database, using FAISS HNSW (Hierarchical Navigable Small World) indexing for retrieval and sentence transformers for embeddings.
Initialization Parameters
Path to the FAISS HNSW index file. The default path is relative to the component directory.
The index must be created using faiss.IndexHNSWFlat. Other index types will raise a ValueError.
Path to the pickled metadata file containing Solution objects. This file stores the actual problem titles, solutions, difficulty levels, topics, and companies.
The sentence transformer model used for encoding queries. This must match the model used to create the index.
Popular alternatives:
- all-MiniLM-L6-v2: Fast, lightweight (default)
- all-mpnet-base-v2: More accurate, slower
- multi-qa-mpnet-base-dot-v1: Optimized for Q&A tasks
HNSW search parameter (ef_search) that controls the speed/accuracy trade-off during retrieval:
- Lower values (16-32): Faster search, slightly lower recall
- Higher values (64-128): Slower search, higher recall
Search Methods
Basic Search
The search query (natural language or keywords)
Number of results to return
If True, returns (Solution, float) tuples with similarity scores. If False, returns only Solution objects.
Metadata Filtering
Filter solutions by difficulty, topics, or companies:
- Filter by companies that ask this question (case-insensitive partial match)
- Filter by difficulty: "Easy", "Medium", or "Hard" (case-insensitive)
- Filter by topics/tags (case-insensitive partial match)
Solution Data Structure
Each retrieved solution is a Solution dataclass holding the problem title, the solution, the difficulty level, the topics, and the companies.
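As a rough illustration of that shape, the sketch below defines a stand-in dataclass and formats one scored result. The field names (title, solution, difficulty, topics, companies) are assumptions inferred from the metadata description, the format_hit helper is hypothetical, and the sample values are placeholders. The real Solution class may name or type its fields differently.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Solution:
    """Stand-in for the retriever's Solution dataclass (field names are assumed)."""
    title: str
    solution: str
    difficulty: str
    topics: List[str] = field(default_factory=list)
    companies: List[str] = field(default_factory=list)


def format_hit(hit: Tuple[Solution, float]) -> str:
    """Render one (Solution, score) tuple as a single display line (hypothetical helper)."""
    sol, score = hit
    return f"{score:.3f}  [{sol.difficulty}] {sol.title} ({', '.join(sol.topics)})"


# Placeholder data, not taken from the real database.
example = Solution(
    title="Two Sum",
    solution="Use a hash map from value to index ...",
    difficulty="Easy",
    topics=["Array", "Hash Table"],
    companies=["ExampleCorp"],
)
line = format_hit((example, 0.91234))
print(line)
```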
HNSW Tuning Guide
Understanding ef_search
The ef_search parameter controls the size of the dynamic candidate list during search:
- Low (16)
  - Speed: Very fast
  - Accuracy: ~90% recall
  - Use case: Real-time applications, large datasets
- Default (32)
- High (64)
- Very High (128)
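To make the "dynamic candidate list" idea concrete, here is a toy, single-layer best-first search in plain Python. It is a simplified sketch of the general HNSW search idea, not FAISS's implementation. The graph, coordinates, and the search_layer function are all made up for illustration. In FAISS itself, the corresponding setting is the efSearch attribute on index.hnsw.

```python
import heapq


def search_layer(graph, dist_to_query, entry, ef):
    """Best-first graph search that keeps at most `ef` results at any time."""
    visited = {entry}
    d0 = dist_to_query(entry)
    candidates = [(d0, entry)]   # min-heap: closest unexplored node first
    results = [(-d0, entry)]     # max-heap via negation: farthest kept result on top
    while candidates:
        d, node = heapq.heappop(candidates)
        if d > -results[0][0]:
            break  # the closest candidate is worse than every kept result
        for nb in graph[node]:
            if nb in visited:
                continue
            visited.add(nb)
            dn = dist_to_query(nb)
            if len(results) < ef or dn < -results[0][0]:
                heapq.heappush(candidates, (dn, nb))
                heapq.heappush(results, (-dn, nb))
                if len(results) > ef:
                    heapq.heappop(results)  # evict the farthest result
    return sorted((-neg_d, node) for neg_d, node in results)


# Toy 1-D data: the query sits at 0.0, so D is the true nearest neighbor.
coords = {"A": 5.0, "B": 4.0, "C": 6.0, "D": 1.0}
graph = {"A": ["B", "C"], "B": ["A"], "C": ["A", "D"], "D": ["C"]}
query = 0.0


def dist(node):
    return abs(coords[node] - query)


narrow = search_layer(graph, dist, "A", ef=1)  # gets stuck at the dead end B
wide = search_layer(graph, dist, "A", ef=3)    # keeps C alive long enough to reach D
print(narrow, wide)
```

With ef=1 the search discards C immediately and never reaches D. With ef=3 it evaluates more nodes and finds the true nearest neighbor, which is the same speed/recall trade-off described above.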
Performance Benchmarks
Assuming 1,000 indexed solutions:
| ef_search | Latency | Recall | RAM Usage |
|---|---|---|---|
| 16 | ~2ms | 89% | Low |
| 32 | ~4ms | 95% | Low |
| 64 | ~8ms | 98% | Medium |
| 128 | ~15ms | 99%+ | High |
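The document does not say how the recall figures were measured. One common way to define recall for approximate search is the fraction of the exact top-k neighbors that the approximate search also returns. The sketch below assumes that definition. The function names and ID lists are illustrative only.

```python
def recall_at_k(approx_ids, exact_ids, k):
    """Fraction of the exact top-k IDs that also appear in the approximate top-k."""
    if k <= 0:
        raise ValueError("k must be positive")
    approx_top = set(approx_ids[:k])
    exact_top = set(exact_ids[:k])
    return len(approx_top & exact_top) / k


def mean_recall(approx_runs, exact_runs, k):
    """Average recall@k over several queries (one ID list per query)."""
    scores = [recall_at_k(a, e, k) for a, e in zip(approx_runs, exact_runs)]
    return sum(scores) / len(scores)


# Illustrative IDs: the approximate search missed one of the four true neighbors.
single = recall_at_k([1, 2, 3, 9], [1, 2, 3, 4], k=4)
average = mean_recall([[1, 2], [5, 7]], [[1, 2], [5, 6]], k=2)
print(single, average)
```

Here the exact IDs would come from a brute-force search over the same embeddings, so the comparison isolates the loss introduced by the HNSW approximation.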
Example Configurations
Default Configuration
High-Accuracy Configuration
For maximum retrieval precision:
Fast Configuration
For real-time applications:
Custom Paths
Advanced Usage
Combining Search and Filtering
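One plausible way to combine ranked search results with the metadata filters described earlier is to fetch more hits than needed, drop the ones that fail the filters, and keep the first k. The sketch below assumes that approach and is not the retriever's actual code. The names matches and filter_hits, the keyword names difficulty, topic, and company, and the sample hits are all hypothetical.

```python
from types import SimpleNamespace


def _partial_match(needle, values):
    """Case-insensitive substring match against any item in `values`."""
    needle = needle.lower()
    return any(needle in value.lower() for value in values)


def matches(solution, difficulty=None, topic=None, company=None):
    """Return True if the solution passes every filter that was supplied."""
    if difficulty is not None and solution.difficulty.lower() != difficulty.lower():
        return False
    if topic is not None and not _partial_match(topic, solution.topics):
        return False
    if company is not None and not _partial_match(company, solution.companies):
        return False
    return True


def filter_hits(hits, k, **filters):
    """Keep the first k (solution, score) hits that pass the filters, in rank order."""
    kept = [(sol, score) for sol, score in hits if matches(sol, **filters)]
    return kept[:k]


# Placeholder hits, already ranked by similarity score.
hits = [
    (SimpleNamespace(title="Problem A", difficulty="Medium",
                     topics=["Dynamic Programming"], companies=["ExampleCorp"]), 0.91),
    (SimpleNamespace(title="Problem B", difficulty="Easy",
                     topics=["Array", "Hash Table"], companies=["ExampleCorp", "SampleSoft"]), 0.87),
    (SimpleNamespace(title="Problem C", difficulty="Easy",
                     topics=["String"], companies=["SampleSoft"]), 0.80),
]
easy_hash = filter_hits(hits, k=2, difficulty="easy", topic="hash")
print([sol.title for sol, _ in easy_hash])
```

Because filtering happens after ranking, a strict filter can leave fewer than k results. Requesting a larger candidate set from the index before filtering is one way to compensate.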
Inspecting Index Properties
Configuration Tips
The retriever uses cosine similarity (via L2 normalization) for semantic matching. Higher scores indicate better matches.
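The link between L2 normalization and cosine similarity can be checked with a few lines of plain Python. Once vectors are scaled to unit length, their dot product equals their cosine similarity, and their squared L2 distance equals 2 - 2·cos. The sketch below assumes the index reports squared L2 distances between unit vectors. Whether the retriever converts distances this way is not stated here, and the helper names are illustrative.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cosine_from_l2(a, b):
    """Recover cosine similarity from the squared L2 distance of unit vectors."""
    ua, ub = l2_normalize(a), l2_normalize(b)
    sq_dist = sum((x - y) ** 2 for x, y in zip(ua, ub))
    return 1.0 - sq_dist / 2.0


query = [3.0, 4.0]
same_direction = [6.0, 8.0]   # parallel to the query
orthogonal = [-4.0, 3.0]      # perpendicular to the query

score_same = cosine_from_l2(query, same_direction)
score_orth = cosine_from_l2(query, orthogonal)
direct = dot(l2_normalize(query), l2_normalize(same_direction))
print(score_same, score_orth, direct)
```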