
Overview

The embeddings module provides a unified interface for generating text embeddings through multiple providers (cloud APIs or local models), with intelligent caching and automatic fallback strategies.

Key Features:
  • Multi-provider support: Cloud (via LiteLLM) or local (sentence-transformers)
  • Mode flexibility: AUTO, LOCAL, CLOUD, or HYBRID with automatic fallback
  • Built-in caching: Thread-safe LRU cache with fingerprint-based invalidation
  • Batch processing: Efficient batch embedding with per-text cache lookups
  • Async/sync APIs: Both async and synchronous wrappers available
Source: src/utils/embeddings/

EmbeddingManager

The EmbeddingManager is the primary interface for all embedding operations. Always use this class rather than instantiating providers directly.

Constructor

from src.utils.embeddings.manager import EmbeddingManager, EmbeddingMode

manager = EmbeddingManager(
    mode=EmbeddingMode.AUTO,
    cloud_config=None,
    local_config=None,
    domain="guantanamo"
)
mode
EmbeddingMode
default:"None"
Embedding generation mode. If None, reads from the EMBEDDING_MODE env var or domain config. Options:
  • AUTO: Auto-detect (local if sentence-transformers available, else cloud)
  • LOCAL: Force local embedding with sentence-transformers
  • CLOUD: Force cloud embedding via LiteLLM
  • HYBRID: Try cloud first, fallback to local on failure
cloud_config
EmbeddingConfig
default:"None"
Cloud provider configuration. If None, loads from domain config.
local_config
EmbeddingConfig
default:"None"
Local provider configuration. If None, loads from domain config.
domain
str
default:"guantanamo"
Domain name for loading configuration and cache settings.
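The AUTO-detection behavior described for mode can be sketched as follows. This is a simplified illustration of the documented behavior (local if sentence-transformers is importable, else cloud), not the manager's actual resolution code:

```python
def resolve_mode(requested: str) -> str:
    # Simplified sketch of AUTO-mode detection: prefer local embeddings
    # when sentence-transformers is importable, otherwise use cloud.
    if requested != "auto":
        return requested
    try:
        import sentence_transformers  # noqa: F401
        return "local"
    except ImportError:
        return "cloud"
```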

Core Methods

embed_text

await manager.embed_text(text: str, use_cache: bool = True) -> List[float]
Embed a single text asynchronously with automatic caching.
text
str
required
The text to embed.
use_cache
bool
default:"True"
Whether to use LRU cache for lookup and storage.
embeddings
List[float]
The embedding vector for the input text.
import asyncio
from src.utils.embeddings.manager import EmbeddingManager

manager = EmbeddingManager()
vector = await manager.embed_text("The Guantanamo Bay detention camp")
print(f"Embedding dimension: {len(vector)}")

embed_batch

await manager.embed_batch(
    texts: List[str],
    use_cache: bool = True
) -> List[List[float]]
Embed multiple texts efficiently with per-text cache lookups.
Batch embedding is significantly faster than multiple single embeddings, especially for local models which process batches in a single GPU/CPU pass.
texts
List[str]
required
List of texts to embed.
use_cache
bool
default:"True"
Whether to use LRU cache. Cache hits skip embedding computation.
embeddings
List[List[float]]
List of embedding vectors, one per input text (in same order).
from src.utils.embeddings.manager import EmbeddingManager

manager = EmbeddingManager()

texts = [
    "Abdul Rahman Ahmed",
    "Mohamedou Ould Slahi",
    "Abdul Rahman Ahmed",  # Duplicate - will be cached
]

vectors = await manager.embed_batch(texts)
print(f"Generated {len(vectors)} embeddings")
print(f"Cache stats: {manager.cache_stats}")
# Output: {'hits': 1, 'misses': 2, 'hit_rate': 0.33, 'size': 2}

embed_text_result

await manager.embed_text_result(text: str) -> EmbeddingResult
Embed a single text and return full metadata (model, dimension, usage).
EmbeddingResult
object
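The full shape of EmbeddingResult is defined in the source. A minimal sketch based on the metadata mentioned above; the field names embedding, model, dimension, and usage are assumptions and may differ from the actual class:

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class EmbeddingResult:
    # Hypothetical field layout; the actual class lives under
    # src/utils/embeddings/ and may differ.
    embedding: List[float]
    model: str
    dimension: int
    usage: Dict[str, int] = field(default_factory=dict)

result = EmbeddingResult(
    embedding=[0.1, 0.2, 0.3],
    model="jinaai/jina-embeddings-v3",
    dimension=3,
)
```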

Utility Methods

get_active_model_name

manager.get_active_model_name() -> str
Return the model name of the currently active provider.
manager = EmbeddingManager(mode=EmbeddingMode.CLOUD)
model = manager.get_active_model_name()
print(model)  # "jinaai/jina-embeddings-v3"

cache_stats

manager.cache_stats -> dict
Return LRU cache hit/miss statistics for diagnostics.
cache_stats
dict
manager = EmbeddingManager()
await manager.embed_batch(["text1", "text2", "text1"])

stats = manager.cache_stats
print(f"Hit rate: {stats['hit_rate']:.1%}")  # "Hit rate: 33.3%"

fingerprint_from_result

EmbeddingManager.fingerprint_from_result(
    result: EmbeddingResult
) -> Optional[str]
Build a stable fingerprint string from an EmbeddingResult. Format: "{model}:{dimension}" (e.g., "jinaai/jina-embeddings-v3:1024"). Used to detect when an entity’s stored embedding was produced by a different model than the currently active one.
result = await manager.embed_text_result("Some text")
fingerprint = EmbeddingManager.fingerprint_from_result(result)
print(fingerprint)  # "jinaai/jina-embeddings-v3:1024"

Synchronous Wrappers

All async methods have synchronous equivalents:
Async Method               Sync Wrapper
embed_text()               embed_text_sync()
embed_batch()              embed_batch_sync()
embed_text_result()        embed_text_result_sync()
embed_batch_result()       embed_batch_result_sync()
Sync wrappers use a persistent event loop to avoid asyncio.run() overhead per call.
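The persistent-event-loop pattern behind such wrappers can be sketched like this (illustrative only; SyncBridge is a hypothetical name, not the project's actual implementation):

```python
import asyncio
import threading

class SyncBridge:
    """Run coroutines on one long-lived background loop, avoiding the
    overhead of creating a fresh event loop (asyncio.run) per call."""

    def __init__(self):
        self._loop = asyncio.new_event_loop()
        self._thread = threading.Thread(target=self._loop.run_forever, daemon=True)
        self._thread.start()

    def run(self, coro):
        # Submit the coroutine to the background loop and block for the result
        return asyncio.run_coroutine_threadsafe(coro, self._loop).result()

bridge = SyncBridge()

async def double(x):
    return x * 2

print(bridge.run(double(21)))  # 42
```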

Embedding Providers

Base Classes

EmbeddingConfig

from src.utils.embeddings.base import EmbeddingConfig

config = EmbeddingConfig(
    model_name="jinaai/jina-embeddings-v3",
    batch_size=32,
    max_retries=3,
    timeout=30,
    device="auto",
    metadata={"project": "hinbox"}
)
model_name
str
required
Model identifier (e.g., "jinaai/jina-embeddings-v3", "text-embedding-3-small").
batch_size
int
default:"32"
Number of texts to process in a single batch.
max_retries
int
default:"3"
Maximum retry attempts for cloud API calls.
timeout
int
default:"30"
Request timeout in seconds (cloud providers only).
device
str
default:"auto"
Device for local models: "auto", "cpu", "cuda", or "mps". "auto" lets sentence-transformers choose the best available device.
metadata
Dict[str, Any]
default:"{}"
Additional metadata to attach to API calls.

EmbeddingResult

Returned by embed_text_result() and provider embed_batch() methods. See embed_text_result for field details.

CloudEmbeddingProvider

Source: src/utils/embeddings/cloud.py
Cloud embedding provider using LiteLLM for multi-provider API access.
Supports any embedding model available through LiteLLM, including OpenAI, Jina AI, Cohere, and others.
from src.utils.embeddings.cloud import CloudEmbeddingProvider
from src.utils.embeddings.base import EmbeddingConfig

config = EmbeddingConfig(
    model_name="jinaai/jina-embeddings-v3",
    batch_size=100,
    max_retries=3,
    metadata={"project": "hinbox"}
)

provider = CloudEmbeddingProvider(config)
vector = await provider.embed_single("The detention camp at Guantanamo Bay")
Key Features:
  • Exponential backoff retry with configurable attempts
  • Empty text filtering (returns empty vectors)
  • Usage tracking (token counts)
  • Metadata propagation to LiteLLM callbacks
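The retry behavior can be illustrated with a minimal exponential-backoff sketch. embed_with_retry is a hypothetical helper, not the provider's actual code:

```python
import asyncio
import random

async def embed_with_retry(embed_fn, texts, max_retries=3, base_delay=0.1):
    # Retry with exponentially growing delays plus a little jitter;
    # re-raise once the final attempt fails.
    for attempt in range(max_retries):
        try:
            return await embed_fn(texts)
        except Exception:
            if attempt == max_retries - 1:
                raise
            await asyncio.sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.05))
```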

LocalEmbeddingProvider

Source: src/utils/embeddings/local.py
Local embedding provider using sentence-transformers (PyTorch).
Requires the local-embeddings extra: pip install 'hinbox[local-embeddings]'. The provider lazily imports sentence-transformers only when first used, so environments without PyTorch can still run the application in cloud mode.
from src.utils.embeddings.local import LocalEmbeddingProvider
from src.utils.embeddings.base import EmbeddingConfig

config = EmbeddingConfig(
    model_name="jinaai/jina-embeddings-v3",  # or "huggingface/..."
    batch_size=32,
    device="auto"  # or "cuda", "cpu", "mps"
)

provider = LocalEmbeddingProvider(config)
vectors = await provider.embed_batch(["text1", "text2"])
Model Name Transformations:
  • huggingface/... → removes prefix (sentence-transformers default)
  • jina_ai/... → converts to jinaai/... (sentence-transformers format)
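The transformations above amount to simple prefix rewrites; a sketch (normalize_model_name is a hypothetical name for illustration):

```python
def normalize_model_name(name: str) -> str:
    # The huggingface/ prefix is dropped; jina_ai/ becomes jinaai/.
    if name.startswith("huggingface/"):
        return name[len("huggingface/"):]
    if name.startswith("jina_ai/"):
        return "jinaai/" + name[len("jina_ai/"):]
    return name
```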
Device Resolution:
  • "auto": Let sentence-transformers auto-detect (CUDA > MPS > CPU)
  • "cuda", "cpu", "mps": Explicit device selection

Global Manager

get_default_manager

from src.utils.embeddings.manager import get_default_manager

manager = get_default_manager()
vector = await manager.embed_text("Some text")
Returns a singleton EmbeddingManager instance with default configuration.
The default manager is lazily initialized on first call. Subsequent calls return the same instance.

Configuration

Domain Config

Embedding configuration is typically loaded from domain YAML:
configs/guantanamo/config.yaml
embeddings:
  mode: auto  # auto | local | cloud | hybrid
  
  cloud:
    model: jinaai/jina-embeddings-v3
    batch_size: 100
    max_retries: 3
    timeout: 30
  
  local:
    model: jinaai/jina-embeddings-v3
    batch_size: 32
    device: auto

cache:
  enabled: true
  embeddings:
    lru_max_items: 4096  # Set to 0 to disable LRU cache
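The lru_max_items cap corresponds to a standard least-recently-used eviction policy. A self-contained sketch of such a cache (illustrative only, not the project's implementation):

```python
from collections import OrderedDict

class LRUEmbeddingCache:
    # Minimal LRU cache keyed by text; evicts the least recently used
    # entry once max_items is exceeded (max_items=0 disables caching).
    def __init__(self, max_items: int = 4096):
        self.max_items = max_items
        self._store = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, text: str):
        if self.max_items and text in self._store:
            self._store.move_to_end(text)  # mark as most recently used
            self.hits += 1
            return self._store[text]
        self.misses += 1
        return None

    def put(self, text: str, vector):
        if not self.max_items:
            return
        self._store[text] = vector
        self._store.move_to_end(text)
        while len(self._store) > self.max_items:
            self._store.popitem(last=False)  # drop least recently used
```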

Environment Variables

EMBEDDING_MODE
str
Override embedding mode: auto, local, cloud, or hybrid. Takes precedence over domain config.
export EMBEDDING_MODE=local
python process_and_extract.py

Advanced Usage

Hybrid Mode with Fallback

from src.utils.embeddings.manager import EmbeddingManager, EmbeddingMode

# Cloud is tried first; the manager falls back to local automatically
manager = EmbeddingManager(mode=EmbeddingMode.HYBRID)

# Fallback happens inside the manager, so an exception reaches the
# caller only if both providers fail.
try:
    vector = await manager.embed_text("Important text")
except Exception:
    # Both cloud and local providers failed
    pass

Custom Provider Configuration

from src.utils.embeddings.manager import EmbeddingManager, EmbeddingMode
from src.utils.embeddings.base import EmbeddingConfig

cloud_config = EmbeddingConfig(
    model_name="text-embedding-3-large",
    batch_size=50,
    metadata={"project": "hinbox", "domain": "custom"}
)

local_config = EmbeddingConfig(
    model_name="huggingface/sentence-transformers/all-MiniLM-L6-v2",
    batch_size=64,
    device="cuda"
)

manager = EmbeddingManager(
    mode=EmbeddingMode.HYBRID,
    cloud_config=cloud_config,
    local_config=local_config,
    domain="custom"
)

Cache Fingerprint Validation

from src.utils.embeddings.manager import EmbeddingManager

manager = EmbeddingManager()

# Embed and compute the current model's fingerprint
result = await manager.embed_text_result("Entity name")
current_fingerprint = EmbeddingManager.fingerprint_from_result(result)

# Later: check whether a stored embedding matches the current model
stored_fingerprint = "jinaai/jina-embeddings-v3:1024"

if stored_fingerprint != current_fingerprint:
    print("Re-embedding required: model changed")
