Overview

Embedding providers convert text into dense vector representations that enable semantic search. Iqra AI’s modular architecture supports multiple embedding providers through a unified interface.
Embedding quality directly impacts retrieval accuracy. Choose providers based on your language requirements, domain specificity, and performance needs.

Supported providers

Google Gemini

Currently, Iqra AI supports Google’s Gemini embedding models. The latest model, text-embedding-004, offers:
  • Improved multilingual support
  • Variable vector dimensions (128, 256, 512, 768, 1024)
  • Optimization for both retrieval and semantic similarity tasks
Gemini embeddings provide excellent multilingual support, making them ideal for Arabic and other non-English content in Iqra AI.
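Gemini’s embedContent endpoint accepts the input text plus an optional output dimensionality. A minimal sketch of building such a request body, assuming the shape of Google’s public Generative Language API (the helper function name is illustrative, not part of Iqra AI):

```typescript
// Sketch: build an embedContent request body for text-embedding-004.
// Field names follow Google's Generative Language API.
interface GeminiEmbeddingRequest {
  model: string;
  content: { parts: { text: string }[] };
  outputDimensionality?: number;
}

// Illustrative helper, not part of Iqra AI's codebase.
function buildEmbeddingRequest(text: string, dimension?: number): GeminiEmbeddingRequest {
  return {
    model: "models/text-embedding-004",
    content: { parts: [{ text }] },
    outputDimensionality: dimension,
  };
}

const req = buildEmbeddingRequest("What are the five pillars of Islam?", 768);
// POST this body to:
// https://generativelanguage.googleapis.com/v1beta/models/text-embedding-004:embedContent?key=API_KEY
```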

Additional providers

The modular architecture makes it possible to add further embedding providers, such as:
  • OpenAI embeddings (text-embedding-3-small, text-embedding-3-large)
  • Azure OpenAI embeddings
  • Cohere embeddings
  • Custom embedding endpoints

Setting up an embedding provider

1. Create integration

Navigate to Integrations in your business dashboard and create a new embedding integration.
2. Configure provider

Select Google Gemini and provide:
  • API Key: Your Google AI API key
  • Integration Name: Descriptive name for this configuration
# Get your API key from Google AI Studio
https://aistudio.google.com/app/apikey
3. Add models

Configure the embedding models you want to use:
{
  id: "text-embedding-004",
  name: "Text Embedding 004",
  disabled: false,
  price: 0.00001,              // Per 1,000 tokens
  priceTokenUnit: 1000,
  availableVectorDimensions: [128, 256, 512, 768, 1024]
}
4. Test connection

Use the test interface to verify:
  • API key is valid
  • Model access is working
  • Embeddings are generated successfully

Embedding configuration

Vector dimensions

When configuring a knowledge base, select the appropriate vector dimension:
  • 128-256: Faster search, lower storage, may sacrifice quality
  • 512-768: Balanced performance and quality (recommended)
  • 1024: Maximum quality, higher computational cost
Higher dimensions capture more semantic nuance but increase storage requirements and query latency. Test to find the optimal balance.
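To make the storage side of the trade-off concrete, raw vector storage grows linearly with dimension. A sketch assuming 4-byte float32 values and an illustrative chunk count (index overhead varies by vector database and is ignored here):

```typescript
// Approximate raw vector storage: count × dimension × 4 bytes (float32).
// Ignores index overhead, which varies by vector database.
function vectorStorageBytes(vectorCount: number, dimension: number): number {
  return vectorCount * dimension * 4;
}

const chunks = 100_000; // illustrative chunk count
const at256 = vectorStorageBytes(chunks, 256);   // 102,400,000 bytes ≈ 98 MB
const at1024 = vectorStorageBytes(chunks, 1024); // 409,600,000 bytes ≈ 391 MB
```

Quadrupling the dimension quadruples raw storage, so measure retrieval quality before defaulting to the maximum.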

Model selection

Choose embedding models based on:
  1. Language support: Ensure the model handles your content languages
  2. Domain alignment: Some models are optimized for specific domains
  3. Dimension requirements: Match your vector database configuration
  4. Cost: Balance quality against operational expenses

Embedding cache

Iqra AI implements intelligent embedding caching to optimize performance and reduce costs.

How caching works

1. Cache key generation

Each embedding request generates a cache key based on:
  • Input text
  • Provider type (e.g., GoogleGemini)
  • Model configuration (model name, dimensions)
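A sketch of this keying scheme, hashing the input text together with the provider type and serialized model configuration (function and field names are illustrative, not Iqra AI’s actual implementation):

```typescript
import { createHash } from "node:crypto";

// Illustrative cache key: embeddings are only interchangeable when the
// text, provider, and model configuration all match, so all three go
// into the hashed key material.
function embeddingCacheKey(
  text: string,
  provider: string,
  config: { model: string; vectorDimension: number }
): string {
  const material = JSON.stringify({ text, provider, config });
  return createHash("sha256").update(material).digest("hex");
}

const a = embeddingCacheKey("bismillah", "GoogleGemini",
  { model: "text-embedding-004", vectorDimension: 768 });
const b = embeddingCacheKey("bismillah", "GoogleGemini",
  { model: "text-embedding-004", vectorDimension: 1024 });
// Same text, different dimension → different key, so the two embeddings
// are cached separately.
```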
2. Cache lookup

Before calling the embedding API:
  1. System checks if embedding exists in Redis cache
  2. If found (cache hit), returns cached embedding
  3. If not found (cache miss), calls provider API
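This lookup flow is the classic cache-aside pattern. A sketch with a Map standing in for Redis and a stubbed provider call (the real service is asynchronous; this is synchronous for brevity):

```typescript
// Cache-aside sketch of the lookup flow above. A Map stands in for
// Redis, and the provider call is a stub.
const cache = new Map<string, number[]>();
let apiCalls = 0;

function fetchFromProvider(text: string): number[] {
  apiCalls++; // stands in for the real embedding API call
  return [text.length, 0.5, 0.25]; // fake embedding
}

function getEmbedding(text: string): number[] {
  const hit = cache.get(text);
  if (hit !== undefined) return hit;          // cache hit: no API call
  const embedding = fetchFromProvider(text);  // cache miss: call provider
  cache.set(text, embedding);                 // populate cache for next time
  return embedding;
}

getEmbedding("as-salamu alaykum"); // miss → provider call
getEmbedding("as-salamu alaykum"); // hit → served from cache
```

Only the first request for a given text reaches the provider; every repeat is served from cache.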
3. Cache storage

New embeddings are stored in:
  • Redis: For fast retrieval
  • MongoDB: For persistence and analytics
Cached entries are organized by:
  • Business ID
  • Embedding group ID
  • Language
  • Reference context
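These organizational dimensions can be sketched as a namespaced key path (the `emb:` prefix and field ordering are illustrative, not Iqra AI’s actual layout):

```typescript
// Illustrative Redis-style key path grouping cached embeddings by
// business, embedding group, language, and reference context.
function cacheStoragePath(
  businessId: string,
  groupId: string,
  language: string,
  reference: string,
  contentHash: string
): string {
  return ["emb", businessId, groupId, language, reference, contentHash].join(":");
}

const key = cacheStoragePath("biz-42", "grp-1", "ar", "agent-123", "ab12cd");
// "emb:biz-42:grp-1:ar:agent-123:ab12cd"
```

Namespacing like this keeps tenants isolated and makes per-group or per-language invalidation a prefix operation.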

Cache benefits

  • Cost reduction: Avoid redundant API calls for repeated queries
  • Latency improvement: Cache hits are 10-100x faster than API calls
  • Quota management: Reduce usage against provider rate limits
The embedding cache is particularly effective for:
  • Common user queries
  • Repeated indexing operations
  • Testing and development workflows

Cache configuration

The system manages the cache automatically based on these settings:
{
  // Cache is enabled by default for retrieval queries
  checkEmbeddingCache: true,
  
  // Group embeddings by language and reference
  cacheEmbeddingGroupLanguage: "en",
  cacheReference: "agent-123"
}

Provider implementation

For developers extending Iqra AI with custom providers:

Interface requirements

Implement the IEmbeddingService interface:
public interface IEmbeddingService : IDisposable
{
    // Generate embedding for single text
    Task<FunctionReturnResult<float[]?>> GenerateEmbeddingForTextAsync(
        string text
    );
    
    // Generate embeddings for multiple texts (batched)
    Task<FunctionReturnResult<List<float[]>>> GenerateEmbeddingForTextListAsync(
        List<string> texts
    );
    
    // Get provider type for caching
    InterfaceEmbeddingProviderEnum GetProviderType();
    
    // Get cacheable configuration
    IEmbeddingConfig GetCacheableConfig();
}

Example: Google Gemini implementation

The Google Gemini service demonstrates the pattern:
public class GoogleGeminiEmbeddingService : IEmbeddingService
{
    private readonly HttpClient _httpClient;
    private readonly string _apiKey;
    private readonly GoogleGeminiEmbeddingServiceConfig _config;
    
    public async Task<FunctionReturnResult<float[]?>> 
        GenerateEmbeddingForTextAsync(string text)
    {
        var request = new GeminiEmbeddingRequest
        {
            Model = $"models/{_config.Model}",
            Content = new { Parts = new[] { new { Text = text } } },
            OutputDimensionality = _config.VectorDimension
        };
        
        // Make API call, handle errors, return embedding
    }
    
    public InterfaceEmbeddingProviderEnum GetProviderType() 
        => InterfaceEmbeddingProviderEnum.GoogleGemini;
}

Configuration model

Implement IEmbeddingConfig for cache keying:
public class GoogleGeminiEmbeddingServiceConfig : IEmbeddingConfig
{
    public required string Model { get; set; }
    public required int VectorDimension { get; set; }
}
The configuration is serialized to generate cache keys, ensuring embeddings with different parameters are cached separately.

Cost optimization

Batch processing

When indexing documents, the system batches embedding requests:
  • Reduces API overhead
  • Improves throughput
  • May offer cost savings with some providers
// Batch all chunks from a document
var textsToEmbed = chunks.Select(c => c.Text).ToList();
var embeddings = await embeddingService
    .GenerateEmbeddingForTextListAsync(textsToEmbed);

Pricing tracking

Configure pricing in the provider model:
{
  price: 0.00001,           // Cost per priceTokenUnit
  priceTokenUnit: 1000      // Typically per 1,000 tokens
}
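Given these two fields, estimated cost is simply tokens ÷ priceTokenUnit × price. A sketch (for estimation only, like the pricing itself):

```typescript
// Estimated embedding cost from the pricing fields above:
// cost = (tokens / priceTokenUnit) * price
function estimateCost(tokens: number, price: number, priceTokenUnit: number): number {
  return (tokens / priceTokenUnit) * price;
}

// 5,000,000 tokens at $0.00001 per 1,000 tokens → roughly $0.05
const cost = estimateCost(5_000_000, 0.00001, 1000);
```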
The system tracks:
  • Total embedding API calls
  • Estimated token usage
  • Calculated costs per knowledge base
Pricing is for estimation only. Verify actual costs with your provider’s billing dashboard.

Cache hit optimization

Maximize cache effectiveness:
  1. Normalize queries: Clean and standardize text before embedding
  2. Group by context: Use embedding groups for related queries
  3. Monitor hit rate: Track cache performance in analytics
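A minimal normalization sketch for step 1 (the exact rules are a design choice; these are illustrative):

```typescript
// Illustrative query normalization: identical questions with different
// casing or spacing then share one cache entry.
function normalizeQuery(query: string): string {
  return query.trim().toLowerCase().replace(/\s+/g, " ");
}

const a = normalizeQuery("  What is   Zakat? ");
const b = normalizeQuery("what is zakat?");
// Both normalize to "what is zakat?", yielding a single cache key.
```

Normalize before computing the cache key, but embed the normalized text too, so cached and fresh embeddings always agree.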

Troubleshooting

API key errors

Embedding generation failed: Unauthorized
Solution:
  • Verify API key is correct and active
  • Check provider account has sufficient quota
  • Ensure API access is enabled for embedding models

Dimension mismatch

Vector dimension mismatch: expected 768, got 1024
Solution:
  • Ensure knowledge base vector dimension matches model output
  • Recreate Milvus collection with correct dimension
  • Re-index all documents
Changing vector dimensions requires full re-indexing and cannot be done in-place.

Rate limiting

Embedding provider rate limit exceeded
Solution:
  • Implement exponential backoff (automatic in system)
  • Upgrade provider quota/tier
  • Reduce batch size in processing configuration
  • Enable and optimize embedding cache
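Exponential backoff can be sketched as delay = base × 2^attempt, capped at a maximum (constants here are illustrative; the system’s actual retry policy is internal):

```typescript
// Illustrative exponential backoff schedule: the base delay doubles per
// attempt, capped at maxDelayMs. Production retry loops typically add
// random jitter to avoid synchronized retries.
function backoffDelayMs(attempt: number, baseMs = 500, maxDelayMs = 30_000): number {
  return Math.min(baseMs * 2 ** attempt, maxDelayMs);
}

const schedule = [0, 1, 2, 3, 4, 5, 6, 7].map((n) => backoffDelayMs(n));
// [500, 1000, 2000, 4000, 8000, 16000, 30000, 30000]
```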

Cache not working

Cache hit rate: 0%
Solution:
  • Verify Redis connection is healthy
  • Check embedding group configuration
  • Ensure cache keys are being generated correctly
  • Confirm MongoDB cache persistence is working

Best practices

  1. Test embeddings: Validate quality with sample queries before full indexing
  2. Monitor costs: Track embedding API usage and optimize accordingly
  3. Use caching: Enable cache for retrieval to reduce latency and costs
  4. Batch wisely: Balance batch size against rate limits and timeout constraints
  5. Version carefully: Changing embedding models requires re-indexing all content

Next steps

  • Setup guide: Create your first knowledge base with embeddings
  • Retrieval strategies: Configure retrieval to maximize embedding effectiveness
