FastEmbed Local Embeddings
The retrieval package uses FastEmbed for local embedding generation. No external API calls are required: once a model has been downloaded, it runs entirely on your machine.
Overview
FastEmbed provides fast, efficient embedding generation using optimized ONNX models. It suits RAG systems that need:
- Local-first embedding generation
- No API costs or rate limits
- Privacy and security (data never leaves your machine)
- Consistent, reproducible embeddings
Basic Usage
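A minimal sketch of basic usage, assuming the embedder contract described under Embedder Function: documents go in, and embeddings plus their dimensions come out. `Embedder`, `EmbedResult`, and `createStandInEmbedder` are hypothetical names. The stand-in spreads character codes over a fixed-size vector so the example runs without downloading a model; it is not a real embedding model.

```typescript
// Hypothetical names: Embedder, EmbedResult and createStandInEmbedder are
// illustrations, not the retrieval package's actual exports.
interface EmbedResult {
  embeddings: number[][]; // one vector per input document, in input order
  dimensions: number; // length of every vector
}

type Embedder = (documents: string[]) => Promise<EmbedResult>;

// Stand-in so the example runs offline. It spreads character codes over a
// fixed-size vector; it is NOT a semantic embedding model.
function createStandInEmbedder(dimensions: number): Embedder {
  return async (documents: string[]): Promise<EmbedResult> => ({
    embeddings: documents.map((doc) => {
      const vector = new Array<number>(dimensions).fill(0);
      for (let i = 0; i < doc.length; i++) {
        vector[i % dimensions] += doc.charCodeAt(i) / 1000;
      }
      return vector;
    }),
    dimensions,
  });
}

async function basicUsage(): Promise<EmbedResult> {
  // 384 matches the output size of the default BGESmallENV15 model.
  const embed = createStandInEmbedder(384);
  const result = await embed([
    "Local embeddings keep data on your machine.",
    "FastEmbed uses ONNX models.",
  ]);
  console.log(result.embeddings.length, result.dimensions); // 2 384
  return result;
}

basicUsage();
```

Assuming that contract, swapping in the FastEmbed-backed embedder would change only the creation step; the call and the shape of the result stay the same.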
Configuration Options
Model Selection
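Selecting a model comes down to passing one of the names listed under Available Models. A minimal sketch, assuming a hypothetical options object with `model`, `batchSize`, and `cacheDir` fields and the documented default of BGESmallENV15; the real option names may differ. These model names match the members of the `EmbeddingModel` enum exported by the `fastembed` npm package.

```typescript
// Hypothetical option shape; the model names are the ones listed under
// Available Models, but the real option and type names may differ.
type ModelName =
  | "BGESmallENV15"
  | "BGEBaseENV15"
  | "BGESmallEN"
  | "BGEBaseEN"
  | "AllMiniLML6V2"
  | "MLE5Large"
  | "BGESmallZH";

interface EmbedderOptions {
  model?: ModelName; // which model to load
  batchSize?: number; // documents per batch
  cacheDir?: string; // where downloaded models are stored
}

type ResolvedOptions = EmbedderOptions & { model: ModelName };

// Fill in the documented default model when none is given.
function withDefaultModel(options: EmbedderOptions = {}): ResolvedOptions {
  return { ...options, model: options.model ?? "BGESmallENV15" };
}

const general = withDefaultModel();
const higherQuality = withDefaultModel({ model: "BGEBaseENV15" });
const multilingual = withDefaultModel({ model: "MLE5Large" });
console.log(general.model, higherQuality.model, multilingual.model);
// BGESmallENV15 BGEBaseENV15 MLE5Large
```

Typing `model` as a union of the supported names turns a misspelled model name into a compile-time error instead of a failure at runtime.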
Batch Size
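With a batch size of `n`, the model receives documents `n` at a time: fewer, larger calls are faster overall, but each call holds more inputs in memory. A minimal sketch of the splitting step, using a hypothetical `toBatches` helper; FastEmbed batches internally, so this only illustrates what the setting controls.

```typescript
// Hypothetical helper showing what the batch size controls: documents are
// handed to the model in consecutive groups of at most `batchSize`.
function toBatches<T>(items: T[], batchSize: number): T[][] {
  if (!Number.isInteger(batchSize) || batchSize < 1) {
    throw new RangeError(`batchSize must be a positive integer, got ${batchSize}`);
  }
  const batches: T[][] = [];
  for (let start = 0; start < items.length; start += batchSize) {
    batches.push(items.slice(start, start + batchSize));
  }
  return batches;
}

const documents = ["a", "b", "c", "d", "e"];
console.log(JSON.stringify(toBatches(documents, 2))); // [["a","b"],["c","d"],["e"]]
```

Five documents with a batch size of 2 become three model calls; a batch size of 5 would make it one call holding all five inputs at once.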
Cache Directory
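One common pattern is to read the cache location from the deployment environment, so each machine can point at a persistent directory instead of one that is wiped on restart. A minimal sketch, assuming a hypothetical `EMBEDDINGS_CACHE_DIR` variable defined by your own application (FastEmbed does not read it) and the `cacheDir` option mentioned under Model Download.

```typescript
// Hypothetical: EMBEDDINGS_CACHE_DIR is an application-level setting for this
// sketch; it is not a variable FastEmbed itself reads.
type Env = Record<string, string | undefined>;

function resolveCacheDir(env: Env): { cacheDir?: string } {
  const dir = env["EMBEDDINGS_CACHE_DIR"]?.trim();
  // Unset or blank: omit cacheDir so the default system cache directory is used.
  return dir ? { cacheDir: dir } : {};
}

console.log(JSON.stringify(resolveCacheDir({ EMBEDDINGS_CACHE_DIR: "/var/cache/models" })));
// {"cacheDir":"/var/cache/models"}
console.log(JSON.stringify(resolveCacheDir({}))); // {}
```

In Node.js you would pass `process.env`, and point the variable at a directory that survives restarts (for example, a mounted volume in a container) so each model is downloaded only once.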
Available Models
FastEmbed supports several high-quality embedding models:
BGESmallENV15 (Default)
- Dimensions: 384
- Speed: Fast
- Quality: Good
- Best for: General-purpose embeddings, fast inference
BGEBaseENV15
- Dimensions: 768
- Speed: Medium
- Quality: Better
- Best for: Higher quality embeddings, balanced performance
BGESmallEN
- Dimensions: 384
- Speed: Fast
- Quality: Good
- Best for: Alternative to BGESmallENV15
BGEBaseEN
- Dimensions: 768
- Speed: Medium
- Quality: Better
- Best for: Higher quality, v1.0 model
AllMiniLML6V2
- Dimensions: 384
- Speed: Fast
- Quality: Good
- Best for: Lightweight, fast embeddings
MLE5Large
- Dimensions: 1024
- Speed: Slower
- Quality: Best
- Best for: Maximum quality, multilingual support
BGESmallZH
- Dimensions: 512
- Speed: Fast
- Quality: Good
- Best for: Chinese language text
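Each model emits vectors of a fixed length, so a vector store built for one model typically cannot hold vectors from another, and switching models means re-embedding existing documents. A minimal sketch of a dimension check using the values listed above; `MODEL_DIMENSIONS` and `checkDimensions` are hypothetical names.

```typescript
// Output dimensions for each model, as listed under Available Models.
const MODEL_DIMENSIONS = {
  BGESmallENV15: 384,
  BGEBaseENV15: 768,
  BGESmallEN: 384,
  BGEBaseEN: 768,
  AllMiniLML6V2: 384,
  MLE5Large: 1024,
  BGESmallZH: 512,
} as const;

type ModelName = keyof typeof MODEL_DIMENSIONS;

// Throw if any vector does not have the length the chosen model produces,
// e.g. before writing to a store created for a fixed dimension.
function checkDimensions(model: ModelName, embeddings: number[][]): void {
  const expected = MODEL_DIMENSIONS[model];
  embeddings.forEach((vector, i) => {
    if (vector.length !== expected) {
      throw new Error(`Vector ${i} has ${vector.length} dimensions; ${model} produces ${expected}`);
    }
  });
}

try {
  checkDimensions("BGEBaseENV15", [new Array<number>(384).fill(0)]);
} catch (err) {
  console.log((err as Error).message);
  // Vector 0 has 384 dimensions; BGEBaseENV15 produces 768
}
```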
Model Download
Models are downloaded automatically on first use and stored in:
- Default: the system cache directory
- Custom: the directory set with the `cacheDir` option
Embedder Function
Creating the embedder returns a function with the following signature:
- Input: an array of document strings
- Output: an object containing the embeddings and their dimensions
Integration with Ingestion
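The ingestion step pairs each chunk of text with its vector so both can be written to the vector store. A minimal sketch, assuming the embedder contract described under Embedder Function; `embedChunks`, `IngestedChunk`, and the 3-dimensional `fakeEmbed` stand-in are hypothetical and exist only to make the example runnable. The length checks catch a misbehaving embedder before bad rows reach storage.

```typescript
// Hypothetical shapes: EmbedResult, Embedder and IngestedChunk are
// illustrative, not the retrieval package's actual types.
interface EmbedResult {
  embeddings: number[][];
  dimensions: number;
}
type Embedder = (documents: string[]) => Promise<EmbedResult>;

interface IngestedChunk {
  id: number;
  text: string;
  vector: number[];
}

// Embed every chunk in one call and pair each vector with its source text.
async function embedChunks(chunks: string[], embed: Embedder): Promise<IngestedChunk[]> {
  const { embeddings, dimensions } = await embed(chunks);
  if (embeddings.length !== chunks.length) {
    throw new Error(`Expected ${chunks.length} embeddings, got ${embeddings.length}`);
  }
  return chunks.map((text, id) => {
    const vector = embeddings[id];
    if (vector.length !== dimensions) {
      throw new Error(`Chunk ${id}: expected ${dimensions} dimensions, got ${vector.length}`);
    }
    return { id, text, vector };
  });
}

// Stand-in embedder for the example (NOT a real model): 3-dim vectors
// whose first component is the text length.
const fakeEmbed: Embedder = async (docs) => ({
  embeddings: docs.map((d) => [d.length, 1, 0]),
  dimensions: 3,
});

embedChunks(["first chunk", "second"], fakeEmbed).then((rows) => {
  console.log(rows.map((r) => `${r.id}:${r.vector[0]}`).join(" ")); // 0:11 1:6
});
```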
Use the embedder with the ingestion pipeline so that documents are embedded as they are ingested.
Batching
FastEmbed processes documents in batches rather than one at a time; the batch size setting controls how many documents go into each batch.
Performance Tips
- Choose the right model: smaller models (384 dimensions) are faster, while larger models (768 to 1024 dimensions) produce higher-quality embeddings.
- Adjust batch size: larger batches are faster but use more memory. The default is usually optimal.
- Cache models locally: store models in a persistent location (set with `cacheDir`) to avoid re-downloading them.
Model Lazy Loading
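A minimal sketch of the lazy-loading pattern, with a hypothetical `createLazyEmbedder` and an in-memory stand-in for the model loader (in practice the expensive step is downloading and initializing the ONNX model). It caches the load promise so concurrent first calls share one load, and clears it on failure so a later call can retry, for example after a transient download error.

```typescript
// Hypothetical shape for the sketch; the real model object differs.
interface Model {
  embed(texts: string[]): number[][];
}

// Returns an embed function that loads the model on its first call only.
// The load *promise* is cached, so concurrent first calls share one load.
function createLazyEmbedder(load: () => Promise<Model>) {
  let modelPromise: Promise<Model> | undefined;
  return async (texts: string[]): Promise<number[][]> => {
    if (!modelPromise) {
      modelPromise = load().catch((err: unknown) => {
        modelPromise = undefined; // forget the failure so a later call can retry
        throw err;
      });
    }
    const model = await modelPromise;
    return model.embed(texts);
  };
}

async function demo(): Promise<number> {
  let loads = 0;
  const embed = createLazyEmbedder(async () => {
    loads += 1; // stands in for downloading and initializing the model
    return { embed: (texts: string[]) => texts.map((t) => [t.length]) };
  });
  console.log(loads); // 0: creating the embedder loads nothing
  await Promise.all([embed(["a"]), embed(["bb"])]); // concurrent first calls
  await embed(["ccc"]);
  console.log(loads); // 1
  return loads;
}

demo();
```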
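FastEmbed uses lazy loading for efficiency: the model is loaded the first time it is needed rather than when the embedder is created.
Error Handling
Common failure cases: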
- Model download failure (network issues)
- Insufficient memory (large models)
- Invalid input (empty strings, non-text data)
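For the invalid-input case, checking documents before they reach the model gives a clear error that names the offending entry. A minimal sketch, assuming a hypothetical `validateDocuments` guard and `InvalidInputError` class; the package's own error types and messages may differ. Download and memory failures cannot be caught this way; wrap the embedder call in `try`/`catch` to report those.

```typescript
// Hypothetical guard for the "invalid input" case; the package's own error
// types and messages may differ.
class InvalidInputError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "InvalidInputError";
    Object.setPrototypeOf(this, InvalidInputError.prototype); // keep instanceof working on older targets
  }
}

// Reject anything that is not a non-empty string before it reaches the model,
// naming the offending index so the bad document is easy to find.
function validateDocuments(input: unknown): string[] {
  if (!Array.isArray(input)) {
    throw new InvalidInputError("Expected an array of document strings");
  }
  input.forEach((doc: unknown, i: number) => {
    if (typeof doc !== "string") {
      throw new InvalidInputError(`Document ${i} is a ${typeof doc}, not a string`);
    }
    if (doc.trim() === "") {
      throw new InvalidInputError(`Document ${i} is empty`);
    }
  });
  return input as string[];
}

try {
  validateDocuments(["valid text", "   "]);
} catch (err) {
  console.log(err instanceof InvalidInputError, (err as Error).message);
  // true Document 1 is empty
}
```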
Comparing Models
| Model | Dimensions | Speed | Quality | Use Case |
|---|---|---|---|---|
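| BGESmallENV15 | 384 | Fast | Good | General purpose |
| BGESmallEN | 384 | Fast | Good | Alternative to BGESmallENV15 |
| BGEBaseENV15 | 768 | Medium | Better | Higher quality |
| BGEBaseEN | 768 | Medium | Better | Higher quality, v1.0 model |
| AllMiniLML6V2 | 384 | Fast | Good | Lightweight |
| MLE5Large | 1024 | Slower | Best | Maximum quality, multilingual |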
| BGESmallZH | 512 | Fast | Good | Chinese text |
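Dimensions also drive storage size. A rough sketch, assuming vectors are stored as 32-bit floats (4 bytes per dimension) and ignoring index and metadata overhead; the SQLite vector store's actual on-disk format may differ, and `rawVectorBytes` is a hypothetical helper.

```typescript
// ASSUMPTION: each dimension is stored as a 32-bit float (4 bytes) with no
// compression; index structures and metadata are not counted.
const BYTES_PER_DIMENSION = 4;

function rawVectorBytes(dimensions: number, vectorCount: number): number {
  return dimensions * BYTES_PER_DIMENSION * vectorCount;
}

const models: Array<[string, number]> = [
  ["BGESmallENV15", 384],
  ["BGESmallZH", 512],
  ["BGEBaseENV15", 768],
  ["MLE5Large", 1024],
];

for (const [model, dimensions] of models) {
  const megabytes = rawVectorBytes(dimensions, 100_000) / 1_000_000;
  console.log(`${model}: ${megabytes.toFixed(1)} MB per 100k vectors`);
}
// BGESmallENV15: 153.6 MB per 100k vectors
// BGESmallZH: 204.8 MB per 100k vectors
// BGEBaseENV15: 307.2 MB per 100k vectors
// MLE5Large: 409.6 MB per 100k vectors
```

Under this assumption, moving from a 384-dimension model to MLE5Large multiplies raw vector storage by 1024 / 384, roughly 2.7x.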
Example: Complete Setup
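An end-to-end sketch of the flow this page describes: configure an embedder, embed documents (the ingestion side), embed a query, and rank documents by cosine similarity. `EmbedderOptions`, `createStandInEmbedder`, `cosine`, and `completeSetup` are hypothetical names. The stand-in embeds in batches of `batchSize` using letter counts as a 26-dimensional vector, so only the plumbing is meaningful, not the rankings; a real setup would pass `model` and `cacheDir` to FastEmbed and store vectors in the vector store instead of an array.

```typescript
interface EmbedResult {
  embeddings: number[][];
  dimensions: number;
}
type Embedder = (documents: string[]) => Promise<EmbedResult>;

// Hypothetical options mirroring the Configuration Options above. The stand-in
// below uses only batchSize; model and cacheDir are accepted for shape only.
interface EmbedderOptions {
  model: string;
  batchSize: number;
  cacheDir?: string;
}

// Stand-in for the FastEmbed-backed embedder (NOT a semantic model): embeds
// in batches of batchSize, counting letters a to z into a 26-dim vector.
function createStandInEmbedder(options: EmbedderOptions): Embedder {
  const dimensions = 26;
  const embedOne = (text: string): number[] => {
    const v = new Array<number>(dimensions).fill(0);
    const lower = text.toLowerCase();
    for (let i = 0; i < lower.length; i++) {
      const code = lower.charCodeAt(i) - 97; // 'a' is 97
      if (code >= 0 && code < dimensions) v[code] += 1;
    }
    return v;
  };
  return async (documents) => {
    const embeddings: number[][] = [];
    for (let i = 0; i < documents.length; i += options.batchSize) {
      embeddings.push(...documents.slice(i, i + options.batchSize).map(embedOne));
    }
    return { embeddings, dimensions };
  };
}

// Cosine similarity; returns 0 when either vector is all zeros.
function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let na = 0;
  let nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / Math.sqrt(na * nb);
}

async function completeSetup(query: string, documents: string[]) {
  const embed = createStandInEmbedder({ model: "BGESmallENV15", batchSize: 2 });
  const { embeddings } = await embed(documents); // ingestion: embed all documents
  const [queryVector] = (await embed([query])).embeddings; // search: embed the query
  return documents
    .map((text, i) => ({ text, score: cosine(queryVector, embeddings[i]) }))
    .sort((x, y) => y.score - x.score); // best match first
}

completeSetup("local models", ["remote api", "local models", "zzz"]).then((ranked) => {
  console.log(ranked[0].text); // local models
});
```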
Next Steps
- Ingestion: use embeddings for ingestion
- Search: search with embeddings
- Vector Store: learn about SQLite vector storage