## Overview

TypeAgent's embedding system separates provider interfaces from consumer interfaces, with automatic caching built in.

Architecture:

- `IEmbedder` - Minimal provider interface (what providers implement)
- `CachingEmbeddingModel` - Adds a caching layer
- `IEmbeddingModel` - Complete consumer interface (what applications use)
## Type Aliases

```python
from typeagent.aitools.embeddings import (
    NormalizedEmbedding,
    NormalizedEmbeddings,
)

type NormalizedEmbedding = NDArray[np.float32]   # A single embedding vector
type NormalizedEmbeddings = NDArray[np.float32]  # Array of embedding vectors
```

All embeddings are L2-normalized (unit length) for cosine similarity computation.
## IEmbedder Protocol

```python
from typeagent.aitools.embeddings import IEmbedder
```

Minimal provider interface for embedding models. Implement this protocol to add support for a new embedding provider.

### Properties

`model_name` - The name/identifier of the embedding model. Example: `"text-embedding-3-small"`
### Methods

#### get_embedding_nocache

```python
async def get_embedding_nocache(self, input: str) -> NormalizedEmbedding
```

Compute a single embedding without caching.

Returns: L2-normalized embedding vector (`NDArray[np.float32]`).

#### get_embeddings_nocache

```python
async def get_embeddings_nocache(self, input: list[str]) -> NormalizedEmbeddings
```

Compute embeddings for a batch of strings without caching.

Parameters: `input` - List of texts to embed. Must not be empty.

Returns: 2D array of L2-normalized embeddings (shape: `[len(input), embedding_dim]`).

Raises: `ValueError` if `input` is empty.
### Example Implementation

See `PydanticAIEmbedder` in `model_adapters` for the production implementation.
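The protocol shape can also be sketched with a toy provider. The class below is illustrative only (a deterministic hash-based embedder, not a real provider that calls an embedding API); note that because `IEmbedder` is a structural protocol, no explicit inheritance is required:

```python
import hashlib

import numpy as np
from numpy.typing import NDArray

class HashEmbedder:
    """Toy IEmbedder-shaped provider: deterministic and offline, for illustration only."""

    model_name = "hash-demo"  # provider identifier

    async def get_embedding_nocache(self, input: str) -> NDArray[np.float32]:
        # Derive a deterministic pseudo-random vector from the text's SHA-256 digest.
        digest = hashlib.sha256(input.encode()).digest()
        raw = np.frombuffer(digest, dtype=np.uint8).astype(np.float32)
        return raw / np.linalg.norm(raw)  # L2-normalize, as the protocol requires

    async def get_embeddings_nocache(self, input: list[str]) -> NDArray[np.float32]:
        if not input:
            raise ValueError("input must not be empty")
        return np.stack([await self.get_embedding_nocache(s) for s in input])
```

A real provider would replace the hashing with an API call, but the method names, signatures, and normalization contract stay the same.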
## CachingEmbeddingModel

```python
from typeagent.aitools.embeddings import CachingEmbeddingModel
```

Wraps an IEmbedder with an in-memory embedding cache. This is the standard way to use embeddings in TypeAgent.

### Constructor

```python
def __init__(self, embedder: IEmbedder) -> None
```

Parameters: `embedder` - The underlying embedder to wrap.

### Properties

`model_name` - Delegates to the underlying embedder's model name.
### Methods

#### add_embedding

```python
def add_embedding(self, key: str, embedding: NormalizedEmbedding) -> None
```

Manually add a pre-computed embedding to the cache.

Parameters:

- `key` (str, required) - The text key to cache the embedding under.
- `embedding` (NormalizedEmbedding, required) - The L2-normalized embedding vector.

Use Case: Pre-populating the cache from persistent storage.
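Cache warmup from previously persisted vectors might look like the sketch below. The stand-in class and the in-memory "stored" dict are hypothetical stand-ins for a real `CachingEmbeddingModel` and a real on-disk store; only the `add_embedding` signature is taken from the API above:

```python
import numpy as np

class CacheStandIn:
    """Minimal stand-in exposing CachingEmbeddingModel's add_embedding signature."""

    def __init__(self) -> None:
        self.cache: dict[str, np.ndarray] = {}

    def add_embedding(self, key: str, embedding: np.ndarray) -> None:
        self.cache[key] = embedding

# Hypothetical persisted cache: texts alongside their L2-normalized vectors.
keys = ["what is typeagent?", "hello world"]
vectors = np.eye(2, 4, dtype=np.float32)  # stand-in for vectors loaded from disk

model = CacheStandIn()
for key, vec in zip(keys, vectors):
    model.add_embedding(key, vec)  # later get_embedding(key) calls become cache hits
```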
#### get_embedding_nocache

```python
async def get_embedding_nocache(self, input: str) -> NormalizedEmbedding
```

Compute an embedding without using the cache (delegates to the underlying embedder).

#### get_embeddings_nocache

```python
async def get_embeddings_nocache(self, input: list[str]) -> NormalizedEmbeddings
```

Compute batch embeddings without using the cache (delegates to the underlying embedder).

#### get_embedding

```python
async def get_embedding(self, key: str) -> NormalizedEmbedding
```

Retrieve a single embedding, using the cache if available.

Returns: the cached embedding if available; otherwise computes it, caches it, and returns it.

Behavior:

- Check the cache for `key`
- If hit: return the cached embedding
- If miss: compute via `embedder.get_embedding_nocache()`, cache it, return it
#### get_embeddings

```python
async def get_embeddings(self, keys: list[str]) -> NormalizedEmbeddings
```

Retrieve embeddings for multiple keys, using the cache if available.

Parameters: `keys` - List of texts to embed. Must not be empty.

Returns: 2D array of embeddings in the same order as `keys`.

Raises: `ValueError` if `keys` is empty.

Behavior:

- Identify cache misses
- Batch-compute missing embeddings via `embedder.get_embeddings_nocache()`
- Cache the new embeddings
- Return all embeddings in the original order
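The steps above can be sketched with a plain dict cache. This is an illustration of the algorithm, not the library's actual code (`compute_batch` stands in for `embedder.get_embeddings_nocache()`):

```python
import numpy as np

async def get_embeddings_cached(
    cache: dict, compute_batch, keys: list[str]
) -> np.ndarray:
    """Order-preserving cached batch lookup, mirroring the Behavior steps."""
    if not keys:
        raise ValueError("keys must not be empty")
    # 1. Identify cache misses (first-seen order, no duplicates)
    misses = [k for k in dict.fromkeys(keys) if k not in cache]
    if misses:
        # 2. Batch-compute the missing embeddings in one provider call
        computed = await compute_batch(misses)
        # 3. Cache the new embeddings
        cache.update(zip(misses, computed))
    # 4. Return all embeddings in the original key order
    return np.stack([cache[k] for k in keys])
```

The key point is that only the misses hit the provider, yet the returned array keeps one row per input key, in input order.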
### Example Usage

```python
import numpy as np

from typeagent.aitools.model_adapters import create_embedding_model

# Create model with caching (recommended way)
model = create_embedding_model("openai:text-embedding-3-small")

# Single embedding (cached)
embedding = await model.get_embedding("Hello world")
print(embedding.shape)  # (1536,) for text-embedding-3-small

# Batch embeddings (partially cached)
texts = [
    "Hello world",      # Cache hit
    "How are you?",     # Cache miss - computed
    "TypeAgent rocks",  # Cache miss - computed
]
embeddings = await model.get_embeddings(texts)
print(embeddings.shape)  # (3, 1536)

# Second call - all cache hits
embeddings2 = await model.get_embeddings(texts)
assert np.array_equal(embeddings, embeddings2)
```
## IEmbeddingModel Protocol

```python
from typeagent.aitools.embeddings import IEmbeddingModel
```

Complete consumer-facing interface combining IEmbedder methods with caching methods.

Protocol Methods:

- All methods from `IEmbedder`
- `add_embedding()` - Manual cache insertion
- `get_embedding()` - Cached single embedding
- `get_embeddings()` - Cached batch embeddings

Implementations:

- `CachingEmbeddingModel` - Production implementation
## Normalization

All embeddings are automatically L2-normalized to unit length:

```python
embedding = raw_embedding / np.linalg.norm(raw_embedding)
```

Why normalize?

- Enables fast cosine similarity via a dot product: `similarity = embedding1 @ embedding2`
- Ensures consistent similarity scores across different text lengths
- Standard practice for semantic search
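The dot-product shortcut is easy to verify with plain NumPy, independent of TypeAgent:

```python
import numpy as np

a = np.array([3.0, 4.0], dtype=np.float32)
b = np.array([4.0, 3.0], dtype=np.float32)

# Full cosine similarity on the raw vectors...
cosine = float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# ...equals a plain dot product once both are L2-normalized.
a_n = a / np.linalg.norm(a)
b_n = b / np.linalg.norm(b)
assert abs(float(a_n @ b_n) - cosine) < 1e-6  # both give 0.96 for these vectors
```

Normalizing once at embedding time amortizes the division, so every later similarity is just a dot product (or one matrix-vector product for a whole index).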
## Model Selection

Supported embedding models:

### OpenAI Models

```python
model = create_embedding_model("openai:text-embedding-3-small")   # 1536 dims
model = create_embedding_model("openai:text-embedding-3-large")   # 3072 dims
model = create_embedding_model("openai:text-embedding-ada-002")   # 1536 dims (legacy)
```

### Azure OpenAI

Automatically detected when `AZURE_OPENAI_API_KEY` is set:

```python
# Uses Azure OpenAI if OPENAI_API_KEY is not set but AZURE_OPENAI_API_KEY is
model = create_embedding_model("openai:text-embedding-3-small")
```

Model-specific Azure endpoints:

- `AZURE_OPENAI_ENDPOINT_EMBEDDING_3_SMALL`
- `AZURE_OPENAI_ENDPOINT_EMBEDDING_3_LARGE`
- `AZURE_OPENAI_ENDPOINT_EMBEDDING` (fallback)

### Other Providers

Via pydantic-ai:

```python
model = create_embedding_model("cohere:embed-english-v3.0")
model = create_embedding_model("google:text-embedding-004")
```

See Model Adapters for the full provider list.
## Test Embeddings

```python
from typeagent.aitools.model_adapters import create_test_embedding_model
```

Deterministic fake embeddings for testing - no API keys or network required.

```python
def create_test_embedding_model(
    embedding_size: int = 3,
) -> CachingEmbeddingModel
```

Parameters: `embedding_size` - Dimension of the fake embeddings (default: 3 for easy debugging).

Example:

```python
import numpy as np

from typeagent.aitools.model_adapters import create_test_embedding_model

# Create test model
test_model = create_test_embedding_model(embedding_size=8)

# Get deterministic embeddings
emb1 = await test_model.get_embedding("hello")
emb2 = await test_model.get_embedding("hello")
assert np.array_equal(emb1, emb2)  # Deterministic

# No network calls, no API keys needed
assert test_model.model_name == "test"
```

Use Cases:

- Unit tests
- Integration tests without external dependencies
- Local development without API keys

DO NOT use `TEST_MODEL_NAME` or test embeddings in production code. Test embeddings are not semantically meaningful.
## Environment Variables

Embedding model selection via environment variables:

```shell
# OpenAI
export OPENAI_API_KEY="sk-..."
export OPENAI_EMBEDDING_MODEL="text-embedding-3-small"  # Optional

# Azure OpenAI
export AZURE_OPENAI_API_KEY="..."
export AZURE_OPENAI_ENDPOINT_EMBEDDING="https://...openai.azure.com/openai/deployments/.../embeddings?api-version=..."

# Alternative: model-specific Azure endpoints
export AZURE_OPENAI_ENDPOINT_EMBEDDING_3_SMALL="https://..."
export AZURE_OPENAI_ENDPOINT_EMBEDDING_3_LARGE="https://..."
```

See Environment Variables for details.
## Tips

**Batch Requests**: Use `get_embeddings()` for multiple texts to reduce API calls.

**Cache Warmup**: Pre-compute embeddings for common queries using `add_embedding()`.

**Dimension Tradeoff**: Smaller models (3-small) are faster; larger models (3-large) are more accurate.

**Shared Models**: Use the same model instance across indexes to share the cache.
## Complete Example

```python
import numpy as np

from typeagent.aitools.model_adapters import create_embedding_model

# Create embedding model
model = create_embedding_model("openai:text-embedding-3-small")

# Single embedding
query_emb = await model.get_embedding("What is TypeAgent?")
print(f"Embedding dimension: {query_emb.shape[0]}")
print(f"Normalized: {np.linalg.norm(query_emb):.6f}")  # Should be ~1.0

# Batch embeddings
documents = [
    "TypeAgent is a knowledge processing library.",
    "It extracts entities and actions from conversations.",
    "TypeAgent supports transcripts, emails, and chats.",
]
doc_embeddings = await model.get_embeddings(documents)
print(f"Batch shape: {doc_embeddings.shape}")  # (3, 1536)

# Compute similarities
similarities = doc_embeddings @ query_emb  # Dot product = cosine similarity
for i, (doc, sim) in enumerate(zip(documents, similarities)):
    print(f"Doc {i} similarity: {sim:.4f}")
    print(f"  {doc[:60]}...")

# Find the most similar document
best_idx = np.argmax(similarities)
print(f"\nMost relevant: Doc {best_idx} (score: {similarities[best_idx]:.4f})")

# Verify caching
query_emb2 = await model.get_embedding("What is TypeAgent?")  # Cache hit
assert np.array_equal(query_emb, query_emb2)
print("Cache working!")
```