Overview
Embedding providers convert text into dense vector representations that enable semantic search. Iqra AI's modular architecture supports multiple embedding providers through a unified interface.

Embedding quality directly impacts retrieval accuracy. Choose providers based on your language requirements, domain specificity, and performance needs.
Supported providers
Google Gemini
Currently, Iqra AI supports Google's Gemini embedding models:
- text-embedding-004: Latest model with improved multilingual support
- Supports variable vector dimensions (128, 256, 512, 768, 1024)
- Optimized for both retrieval and semantic similarity tasks
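As an illustration of how the variable dimension is requested, a minimal sketch of the payload for Gemini's embedContent REST endpoint (the endpoint URL and API-key handling are omitted; Iqra AI's internal request shape may differ):

```typescript
// Builds a request body for the text-embedding-004 embedContent endpoint.
// outputDimensionality selects one of the supported vector sizes.
function buildEmbedRequest(text: string, dimensions: number) {
  return {
    model: "models/text-embedding-004",
    content: { parts: [{ text }] },
    outputDimensionality: dimensions, // 128, 256, 512, 768, or 1024
  };
}
```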
Additional providers
The modular architecture allows adding additional embedding providers:
- OpenAI embeddings (text-embedding-3-small, text-embedding-3-large)
- Azure OpenAI embeddings
- Cohere embeddings
- Custom embedding endpoints
Setting up an embedding provider
Create integration
Navigate to Integrations in your business dashboard and create a new embedding integration.
Configure provider
Select Google Gemini and provide:
- API Key: Your Google AI API key
- Integration Name: Descriptive name for this configuration
Embedding configuration
Vector dimensions
When configuring a knowledge base, select the appropriate vector dimension:
- 128-256: Faster search, lower storage, may sacrifice quality
- 512-768: Balanced performance and quality (recommended)
- 1024: Maximum quality, higher computational cost
Higher dimensions capture more semantic nuance but increase storage requirements and query latency. Test to find the optimal balance.
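To get a rough sense of the storage side of this trade-off, assuming float32 vectors (4 bytes per dimension, index overhead not included):

```typescript
// Rough storage estimate for a vector collection: float32 vectors
// cost 4 bytes per dimension per vector.
function storageBytes(numVectors: number, dims: number): number {
  return numVectors * dims * 4;
}

// One million vectors at 768 dimensions is about 3 GB of raw vector
// data, versus about 0.5 GB at 128 dimensions.
```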
Model selection
Choose embedding models based on:
- Language support: Ensure the model handles your content languages
- Domain alignment: Some models are optimized for specific domains
- Dimension requirements: Match your vector database configuration
- Cost: Balance quality against operational expenses
Embedding cache
Iqra AI implements intelligent embedding caching to optimize performance and reduce costs.

How caching works
Cache key generation
Each embedding request generates a cache key based on:
- Input text
- Provider type (e.g., GoogleGemini)
- Model configuration (model name, dimensions)
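A plausible sketch of this keying scheme, hashing all three inputs together so embeddings produced under different settings never collide (the exact key format used internally may differ):

```typescript
import { createHash } from "node:crypto";

// Derives a cache key from the input text plus the provider and model
// configuration. Any change to text, provider, model, or dimensions
// yields a different key.
function cacheKey(
  text: string,
  provider: string,
  model: string,
  dims: number,
): string {
  const payload = JSON.stringify({ text, provider, model, dims });
  return createHash("sha256").update(payload).digest("hex");
}
```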
Cache lookup
Before calling the embedding API:
- System checks if embedding exists in Redis cache
- If found (cache hit), returns cached embedding
- If not found (cache miss), calls provider API
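The lookup is a standard read-through pattern. A minimal sketch, with an in-memory Map standing in for Redis and the function names chosen for illustration:

```typescript
type Embedder = (text: string) => Promise<number[]>;

// Read-through cache lookup: a hit returns the stored vector; a miss
// calls the provider API and stores the result for next time.
async function getEmbedding(
  cache: Map<string, number[]>,
  key: string,
  embed: Embedder,
  text: string,
): Promise<number[]> {
  const hit = cache.get(key);
  if (hit) return hit; // cache hit: no API call
  const vector = await embed(text); // cache miss: call provider API
  cache.set(key, vector);
  return vector;
}
```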
Cache benefits
- Cost reduction: Avoid redundant API calls for repeated queries
- Latency improvement: Cache hits are 10-100x faster than API calls
- Quota management: Reduce usage against provider rate limits
Cache configuration
The system manages the cache automatically, keyed by the provider and model configuration described above.

Provider implementation
For developers extending Iqra AI with custom providers:

Interface requirements

Implement the IEmbeddingService interface:
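A plausible shape for the interface is sketched below; the method names and signatures here are illustrative, not the exact Iqra AI definitions:

```typescript
// Illustrative contract for an embedding provider. Implementations wrap
// a specific provider API behind a uniform surface.
interface IEmbeddingService {
  // Embeds a single text into a dense vector.
  embed(text: string): Promise<number[]>;
  // Embeds many texts in one provider call where supported.
  embedBatch(texts: string[]): Promise<number[][]>;
  // Output vector size; must match the knowledge base configuration.
  readonly dimensions: number;
}
```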
Example: Google Gemini implementation
The Google Gemini service demonstrates the pattern.

Configuration model

Implement IEmbeddingConfig for cache keying:
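A sketch of what such a configuration might look like; the field names are assumptions, not the real IEmbeddingConfig members:

```typescript
// Illustrative embedding configuration used for cache keying.
interface IEmbeddingConfig {
  provider: string;   // e.g. "GoogleGemini"
  modelName: string;  // e.g. "text-embedding-004"
  dimensions: number; // output vector size
}

// Serializing the whole config makes it part of the cache key, so
// embeddings produced under different parameters are stored separately.
function configKeyPart(config: IEmbeddingConfig): string {
  return JSON.stringify(config);
}
```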
The configuration is serialized to generate cache keys, ensuring embeddings with different parameters are cached separately.
Cost optimization
Batch processing
When indexing documents, the system batches embedding requests:
- Reduces API overhead
- Improves throughput
- May offer cost savings with some providers
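The batching step itself amounts to chunking the document list before embedding. A minimal sketch (the batch size of 100 is an illustrative default, not Iqra AI's configured value):

```typescript
// Splits items into fixed-size batches so each provider call carries
// many texts, amortizing per-request overhead.
function toBatches<T>(items: T[], batchSize = 100): T[][] {
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += batchSize) {
    batches.push(items.slice(i, i + batchSize));
  }
  return batches;
}
```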
Pricing tracking
Configure pricing in the provider model to track:
- Total embedding API calls
- Estimated token usage
- Calculated costs per knowledge base
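Cost calculation from those figures is straightforward. A sketch, assuming per-million-token pricing (the actual rate comes from the pricing you configure in the provider model):

```typescript
// Estimates embedding spend from token usage and a per-million-token
// price. The price is whatever the provider model is configured with.
function embeddingCost(totalTokens: number, pricePerMillionTokens: number): number {
  return (totalTokens / 1_000_000) * pricePerMillionTokens;
}
```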
Cache hit optimization
Maximize cache effectiveness:
- Normalize queries: Clean and standardize text before embedding
- Group by context: Use embedding groups for related queries
- Monitor hit rate: Track cache performance in analytics
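Normalization can be as simple as trimming, lowercasing, and collapsing whitespace, so trivially different queries map to one cache key. A sketch (the exact normalization Iqra AI applies may differ):

```typescript
// Canonicalizes a query before embedding: "  Hello   World " and
// "hello world" produce the same text, hence the same cache key.
function normalizeQuery(text: string): string {
  return text.trim().toLowerCase().replace(/\s+/g, " ");
}
```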
Troubleshooting
API key errors
- Verify API key is correct and active
- Check provider account has sufficient quota
- Ensure API access is enabled for embedding models
Dimension mismatch
- Ensure knowledge base vector dimension matches model output
- Recreate Milvus collection with correct dimension
- Re-index all documents
Rate limiting
- Implement exponential backoff (automatic in system)
- Upgrade provider quota/tier
- Reduce batch size in processing configuration
- Enable and optimize embedding cache
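For reference, the retry schedule the system applies looks roughly like the following; the base delay and cap shown here are illustrative, not the system's actual constants:

```typescript
// Exponential backoff: the delay doubles each attempt, up to a cap.
// Attempts 0, 1, 2, 3... wait 500ms, 1s, 2s, 4s... capped at 30s.
function backoffDelayMs(attempt: number, baseMs = 500, maxMs = 30_000): number {
  return Math.min(baseMs * 2 ** attempt, maxMs);
}
```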
Cache not working
- Verify Redis connection is healthy
- Check embedding group configuration
- Ensure cache keys are being generated correctly
- Confirm MongoDB cache persistence is working
Best practices
- Test embeddings: Validate quality with sample queries before full indexing
- Monitor costs: Track embedding API usage and optimize accordingly
- Use caching: Enable cache for retrieval to reduce latency and costs
- Batch wisely: Balance batch size against rate limits and timeout constraints
- Version carefully: Changing embedding models requires re-indexing all content
Next steps
Setup guide
Create your first knowledge base with embeddings
Retrieval strategies
Configure retrieval to maximize embedding effectiveness