
Overview

TypeAgent’s embedding system separates provider interfaces from consumer interfaces, with automatic caching built in. Architecture:
  • IEmbedder - Minimal provider interface (what providers implement)
  • CachingEmbeddingModel - Adds caching layer
  • IEmbeddingModel - Complete consumer interface (what applications use)

Type Aliases

from typeagent.aitools.embeddings import (
    NormalizedEmbedding,
    NormalizedEmbeddings
)

type NormalizedEmbedding = NDArray[np.float32]   # A single embedding vector
type NormalizedEmbeddings = NDArray[np.float32]  # Array of embedding vectors
All embeddings are L2-normalized (unit length) for cosine similarity computation.

IEmbedder Protocol

from typeagent.aitools.embeddings import IEmbedder
Minimal provider interface for embedding models. Implement this protocol to add support for a new embedding provider.

Properties

model_name
str
The name/identifier of the embedding model. Example: "text-embedding-3-small"

Methods

get_embedding_nocache

async def get_embedding_nocache(self, input: str) -> NormalizedEmbedding
Compute a single embedding without caching.
Parameters:
  • input (str, required) - The text to embed.
Returns:
  • embedding (NormalizedEmbedding) - L2-normalized embedding vector (NDArray[np.float32]).

get_embeddings_nocache

async def get_embeddings_nocache(self, input: list[str]) -> NormalizedEmbeddings
Compute embeddings for a batch of strings without caching.
Parameters:
  • input (list[str], required) - List of texts to embed. Must not be empty.
Returns:
  • embeddings (NormalizedEmbeddings) - 2D array of L2-normalized embeddings (shape: [len(input), embedding_dim]).
Raises: ValueError if input is empty.

Example Implementation

See PydanticAIEmbedder in model_adapters for the production implementation.
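As a sketch of what a provider must supply, here is a hypothetical IEmbedder implementation that hashes text into deterministic pseudo-embeddings instead of calling a real API (FakeEmbedder and its hashing scheme are illustrative only; see PydanticAIEmbedder for the real thing):

```python
import asyncio
import hashlib
import numpy as np
from numpy.typing import NDArray

class FakeEmbedder:
    """Toy provider: deterministic pseudo-embeddings, no network calls."""

    def __init__(self, dim: int = 8) -> None:
        self._dim = dim

    @property
    def model_name(self) -> str:
        return "fake-hash-embedder"

    async def get_embedding_nocache(self, input: str) -> NDArray[np.float32]:
        # Derive a deterministic vector from the text, then L2-normalize it.
        digest = hashlib.sha256(input.encode()).digest()
        raw = np.frombuffer(digest, dtype=np.uint8)[: self._dim].astype(np.float32)
        return raw / np.linalg.norm(raw)

    async def get_embeddings_nocache(self, input: list[str]) -> NDArray[np.float32]:
        if not input:
            raise ValueError("input must not be empty")
        rows = [await self.get_embedding_nocache(text) for text in input]
        return np.stack(rows)  # shape: [len(input), dim]

embedder = FakeEmbedder()
vecs = asyncio.run(embedder.get_embeddings_nocache(["hello", "world"]))
print(vecs.shape)  # (2, 8)
```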

CachingEmbeddingModel

from typeagent.aitools.embeddings import CachingEmbeddingModel
Wraps an IEmbedder with an in-memory embedding cache. This is the standard way to use embeddings in TypeAgent.

Constructor

def __init__(self, embedder: IEmbedder) -> None
Parameters:
  • embedder (IEmbedder, required) - The underlying embedder to wrap.

Properties

model_name
str
Delegates to the underlying embedder’s model name.

Methods

add_embedding

def add_embedding(self, key: str, embedding: NormalizedEmbedding) -> None
Manually add a pre-computed embedding to the cache.
Parameters:
  • key (str, required) - The text key to cache the embedding under.
  • embedding (NormalizedEmbedding, required) - The L2-normalized embedding vector.
Use Case: Pre-populating the cache from persistent storage.
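The warmup pattern can be sketched with a dict-backed stand-in for the real cache (DictCache and the stored rows are hypothetical; the real model exposes the same add_embedding signature):

```python
import numpy as np

class DictCache:
    """Stand-in for CachingEmbeddingModel, so the warmup loop is runnable here."""

    def __init__(self) -> None:
        self._cache: dict[str, np.ndarray] = {}

    def add_embedding(self, key: str, embedding: np.ndarray) -> None:
        self._cache[key] = embedding

# Pretend these rows were loaded from persistent storage (e.g. an .npz file).
stored = {
    "What is TypeAgent?": np.ones(4, dtype=np.float32) / 2.0,  # unit length
    "Hello world": np.array([1, 0, 0, 0], dtype=np.float32),
}

model = DictCache()
for key, vec in stored.items():
    model.add_embedding(key, vec)  # subsequent get_embedding(key) is a cache hit

print(len(model._cache))  # 2
```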

get_embedding_nocache

async def get_embedding_nocache(self, input: str) -> NormalizedEmbedding
Compute embedding without using cache (delegates to underlying embedder).

get_embeddings_nocache

async def get_embeddings_nocache(self, input: list[str]) -> NormalizedEmbeddings
Compute batch embeddings without using cache (delegates to underlying embedder).

get_embedding

async def get_embedding(self, key: str) -> NormalizedEmbedding
Retrieve a single embedding, using cache if available.
Parameters:
  • key (str, required) - The text to embed.
Returns:
  • embedding (NormalizedEmbedding) - The cached embedding if available; otherwise it is computed and cached.
Behavior:
  1. Check cache for key
  2. If hit: return cached embedding
  3. If miss: compute via embedder.get_embedding_nocache(), cache it, return it
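These steps can be sketched as follows (a simplified stand-in, not the actual CachingEmbeddingModel internals; fake_compute substitutes for the underlying embedder):

```python
import asyncio
import numpy as np

cache: dict[str, np.ndarray] = {}

async def fake_compute(key: str) -> np.ndarray:
    # Stand-in for embedder.get_embedding_nocache(); deterministic for the demo.
    raw = np.array([float(len(key)), 1.0], dtype=np.float32)
    return raw / np.linalg.norm(raw)

async def get_embedding(key: str) -> np.ndarray:
    if key in cache:                   # 1-2. check cache; on hit, return it
        return cache[key]
    emb = await fake_compute(key)      # 3. on miss: compute...
    cache[key] = emb                   #    ...cache it...
    return emb                         #    ...and return it

e1 = asyncio.run(get_embedding("hello"))
e2 = asyncio.run(get_embedding("hello"))
assert e1 is e2  # second call returned the cached object
```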

get_embeddings

async def get_embeddings(self, keys: list[str]) -> NormalizedEmbeddings
Retrieve embeddings for multiple keys, using cache if available.
Parameters:
  • keys (list[str], required) - List of texts to embed. Must not be empty.
Returns:
  • embeddings (NormalizedEmbeddings) - 2D array of embeddings in the same order as keys.
Raises: ValueError if keys is empty.
Behavior:
  1. Identify cache misses
  2. Batch compute missing embeddings via embedder.get_embeddings_nocache()
  3. Cache new embeddings
  4. Return all embeddings in original order

Example Usage

import numpy as np

from typeagent.aitools.model_adapters import create_embedding_model

# Create model with caching (recommended way)
model = create_embedding_model("openai:text-embedding-3-small")

# Single embedding (cached)
embedding = await model.get_embedding("Hello world")
print(embedding.shape)  # (1536,) for text-embedding-3-small

# Batch embeddings (partially cached)
texts = [
    "Hello world",      # Cache hit
    "How are you?",     # Cache miss - computed
    "TypeAgent rocks",  # Cache miss - computed
]
embeddings = await model.get_embeddings(texts)
print(embeddings.shape)  # (3, 1536)

# Second call - all cache hits
embeddings2 = await model.get_embeddings(texts)
assert np.array_equal(embeddings, embeddings2)

IEmbeddingModel Protocol

from typeagent.aitools.embeddings import IEmbeddingModel
Complete consumer-facing interface combining the IEmbedder methods with the caching methods.
Protocol Methods:
  • All methods from IEmbedder
  • add_embedding() - Manual cache insertion
  • get_embedding() - Cached single embedding
  • get_embeddings() - Cached batch embeddings
Implementations:
  • CachingEmbeddingModel - Production implementation

Normalization

All embeddings are automatically L2-normalized to unit length:
embedding = raw_embedding / np.linalg.norm(raw_embedding)
Why normalize?
  • Enables fast cosine similarity via dot product: similarity = embedding1 @ embedding2
  • Ensures consistent similarity scores across different text lengths
  • Standard practice for semantic search
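A quick check that the normalized dot product matches the full cosine formula (toy 2-D vectors, for illustration only):

```python
import numpy as np

# Two raw (unnormalized) vectors standing in for model outputs.
raw_a = np.array([3.0, 4.0], dtype=np.float32)
raw_b = np.array([4.0, 3.0], dtype=np.float32)

# L2-normalize to unit length, as the embedding layer does.
a = raw_a / np.linalg.norm(raw_a)
b = raw_b / np.linalg.norm(raw_b)

# After normalization, the dot product *is* the cosine similarity.
dot = float(a @ b)
full_cosine = float(raw_a @ raw_b / (np.linalg.norm(raw_a) * np.linalg.norm(raw_b)))
print(dot, full_cosine)  # both ~0.96
```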

Model Selection

Supported embedding models:

OpenAI Models

model = create_embedding_model("openai:text-embedding-3-small")   # 1536 dims
model = create_embedding_model("openai:text-embedding-3-large")   # 3072 dims
model = create_embedding_model("openai:text-embedding-ada-002")   # 1536 dims (legacy)

Azure OpenAI

Automatically detected when AZURE_OPENAI_API_KEY is set:
# Uses Azure OpenAI if OPENAI_API_KEY not set but AZURE_OPENAI_API_KEY is set
model = create_embedding_model("openai:text-embedding-3-small")
Model-specific Azure endpoints:
  • AZURE_OPENAI_ENDPOINT_EMBEDDING_3_SMALL
  • AZURE_OPENAI_ENDPOINT_EMBEDDING_3_LARGE
  • AZURE_OPENAI_ENDPOINT_EMBEDDING (fallback)

Other Providers

Via pydantic-ai:
model = create_embedding_model("cohere:embed-english-v3.0")
model = create_embedding_model("google:text-embedding-004")
See Model Adapters for full provider list.

Test Embeddings

from typeagent.aitools.model_adapters import create_test_embedding_model
Deterministic fake embeddings for testing - no API keys or network required.
def create_test_embedding_model(
    embedding_size: int = 3,
) -> CachingEmbeddingModel
Parameters:
  • embedding_size (int, default: 3) - Dimension of fake embeddings (the small default makes debugging easy).
Example:
import numpy as np

from typeagent.aitools.model_adapters import create_test_embedding_model

# Create test model
test_model = create_test_embedding_model(embedding_size=8)

# Get deterministic embeddings
emb1 = await test_model.get_embedding("hello")
emb2 = await test_model.get_embedding("hello")
assert np.array_equal(emb1, emb2)  # Deterministic

# No network calls, no API keys needed
assert test_model.model_name == "test"
Use Cases:
  • Unit tests
  • Integration tests without external dependencies
  • Local development without API keys
DO NOT use TEST_MODEL_NAME or test embeddings in production code. Test embeddings are not semantically meaningful.

Environment Variables

Embedding model selection via environment variables:
# OpenAI
export OPENAI_API_KEY="sk-..."
export OPENAI_EMBEDDING_MODEL="text-embedding-3-small"  # Optional

# Azure OpenAI
export AZURE_OPENAI_API_KEY="..."
export AZURE_OPENAI_ENDPOINT_EMBEDDING="https://...openai.azure.com/openai/deployments/.../embeddings?api-version=..."

# Alternative: model-specific Azure endpoints
export AZURE_OPENAI_ENDPOINT_EMBEDDING_3_SMALL="https://..."
export AZURE_OPENAI_ENDPOINT_EMBEDDING_3_LARGE="https://..."
See Environment Variables for details.

Performance Tips

Batch Requests

Use get_embeddings() for multiple texts to reduce API calls.

Cache Warmup

Pre-compute embeddings for common queries using add_embedding().

Dimension Tradeoff

Smaller models (3-small) are faster; larger models (3-large) are more accurate.

Shared Models

Use the same model instance across indexes to share the cache.
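The effect can be sketched with a call-counting stand-in for the caching model (CountingModel and build_index are hypothetical): when two index builds share one instance, the overlapping key is computed only once.

```python
import asyncio
import numpy as np

class CountingModel:
    """Mimics CachingEmbeddingModel's hit/miss behavior, counting compute calls."""

    def __init__(self) -> None:
        self._cache: dict[str, np.ndarray] = {}
        self.compute_calls = 0

    async def get_embedding(self, key: str) -> np.ndarray:
        if key not in self._cache:
            self.compute_calls += 1  # would be an API call in production
            vec = np.random.rand(4).astype(np.float32)
            self._cache[key] = vec / np.linalg.norm(vec)
        return self._cache[key]

async def build_index(model: CountingModel, texts: list[str]) -> None:
    for text in texts:
        await model.get_embedding(text)

shared = CountingModel()
asyncio.run(build_index(shared, ["alpha", "beta"]))  # 2 misses
asyncio.run(build_index(shared, ["beta", "gamma"]))  # "beta" is a hit
print(shared.compute_calls)  # 3, not 4
```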

Complete Example

import numpy as np
from typeagent.aitools.model_adapters import create_embedding_model

# Create embedding model
model = create_embedding_model("openai:text-embedding-3-small")

# Single embedding
query_emb = await model.get_embedding("What is TypeAgent?")
print(f"Embedding dimension: {query_emb.shape[0]}")
print(f"Normalized: {np.linalg.norm(query_emb):.6f}")  # Should be ~1.0

# Batch embeddings
documents = [
    "TypeAgent is a knowledge processing library.",
    "It extracts entities and actions from conversations.",
    "TypeAgent supports transcripts, emails, and chats.",
]
doc_embeddings = await model.get_embeddings(documents)
print(f"Batch shape: {doc_embeddings.shape}")  # (3, 1536)

# Compute similarities
similarities = doc_embeddings @ query_emb  # Dot product = cosine similarity
for i, (doc, sim) in enumerate(zip(documents, similarities)):
    print(f"Doc {i} similarity: {sim:.4f}")
    print(f"  {doc[:60]}...")

# Find most similar
best_idx = np.argmax(similarities)
print(f"\nMost relevant: Doc {best_idx} (score: {similarities[best_idx]:.4f})")

# Verify caching
query_emb2 = await model.get_embedding("What is TypeAgent?")  # Cache hit
assert np.array_equal(query_emb, query_emb2)
print("Cache working!")
