## Overview

TypeAgent's embedding system separates provider interfaces from consumer interfaces, with automatic caching built in.

Architecture:

- `IEmbedder` - Minimal provider interface (what providers implement)
- `CachingEmbeddingModel` - Adds a caching layer
- `IEmbeddingModel` - Complete consumer interface (what applications use)
## Type Aliases

```python
from typeagent.aitools.embeddings import (
    NormalizedEmbedding,
    NormalizedEmbeddings,
)

type NormalizedEmbedding = NDArray[np.float32]   # A single embedding vector
type NormalizedEmbeddings = NDArray[np.float32]  # Array of embedding vectors
```

All embeddings are L2-normalized (unit length) for cosine similarity computation.
## IEmbedder Protocol

```python
from typeagent.aitools.embeddings import IEmbedder
```

Minimal provider interface for embedding models. Implement this protocol to add support for a new embedding provider.

### Properties

`model_name` - The name/identifier of the embedding model. Example: `"text-embedding-3-small"`
### Methods

#### get_embedding_nocache

```python
async def get_embedding_nocache(self, input: str) -> NormalizedEmbedding
```

Compute a single embedding without caching.

Returns: L2-normalized embedding vector (`NDArray[np.float32]`).

#### get_embeddings_nocache

```python
async def get_embeddings_nocache(self, input: list[str]) -> NormalizedEmbeddings
```

Compute embeddings for a batch of strings without caching.

Parameters: `input` - List of texts to embed. Must not be empty.

Returns: 2D array of L2-normalized embeddings (shape: `[len(input), embedding_dim]`).

Raises: `ValueError` if `input` is empty.
### Example Implementation

See `PydanticAIEmbedder` in `model_adapters` for the production implementation.
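The protocol shape can also be sketched with a toy provider. The class below is illustrative only (a deterministic hash-based embedder, not a real provider that calls an embedding API); note that because `IEmbedder` is a structural protocol, no explicit inheritance is required:

```python
import hashlib

import numpy as np
from numpy.typing import NDArray

class HashEmbedder:
    """Toy IEmbedder-shaped provider: deterministic and offline, for illustration only."""

    model_name = "hash-demo"  # provider identifier

    async def get_embedding_nocache(self, input: str) -> NDArray[np.float32]:
        # Derive a deterministic pseudo-random vector from the text's SHA-256 digest.
        digest = hashlib.sha256(input.encode()).digest()
        raw = np.frombuffer(digest, dtype=np.uint8).astype(np.float32)
        return raw / np.linalg.norm(raw)  # L2-normalize, as the protocol requires

    async def get_embeddings_nocache(self, input: list[str]) -> NDArray[np.float32]:
        if not input:
            raise ValueError("input must not be empty")
        return np.stack([await self.get_embedding_nocache(s) for s in input])
```

A real provider would replace the hashing with an API call, but the method names, signatures, and normalization contract stay the same.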
## CachingEmbeddingModel

```python
from typeagent.aitools.embeddings import CachingEmbeddingModel
```

Wraps an IEmbedder with an in-memory embedding cache. This is the standard way to use embeddings in TypeAgent.

### Constructor

```python
def __init__(self, embedder: IEmbedder) -> None
```

Parameters: `embedder` - The underlying embedder to wrap.

### Properties

`model_name` - Delegates to the underlying embedder's model name.
### Methods

#### add_embedding

```python
def add_embedding(self, key: str, embedding: NormalizedEmbedding) -> None
```

Manually add a pre-computed embedding to the cache.

Parameters:

- `key` (str, required) - The text key to cache the embedding under.
- `embedding` (NormalizedEmbedding, required) - The L2-normalized embedding vector.

Use Case: Pre-populating the cache from persistent storage.
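Cache warmup from previously persisted vectors might look like the sketch below. The stand-in class and the in-memory "stored" dict are hypothetical stand-ins for a real `CachingEmbeddingModel` and a real on-disk store; only the `add_embedding` signature is taken from the API above:

```python
import numpy as np

class CacheStandIn:
    """Minimal stand-in exposing CachingEmbeddingModel's add_embedding signature."""

    def __init__(self) -> None:
        self.cache: dict[str, np.ndarray] = {}

    def add_embedding(self, key: str, embedding: np.ndarray) -> None:
        self.cache[key] = embedding

# Hypothetical persisted cache: texts alongside their L2-normalized vectors.
keys = ["what is typeagent?", "hello world"]
vectors = np.eye(2, 4, dtype=np.float32)  # stand-in for vectors loaded from disk

model = CacheStandIn()
for key, vec in zip(keys, vectors):
    model.add_embedding(key, vec)  # later get_embedding(key) calls become cache hits
```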
#### get_embedding_nocache

```python
async def get_embedding_nocache(self, input: str) -> NormalizedEmbedding
```

Compute an embedding without using the cache (delegates to the underlying embedder).

#### get_embeddings_nocache

```python
async def get_embeddings_nocache(self, input: list[str]) -> NormalizedEmbeddings
```

Compute batch embeddings without using the cache (delegates to the underlying embedder).

#### get_embedding

```python
async def get_embedding(self, key: str) -> NormalizedEmbedding
```

Retrieve a single embedding, using the cache if available.

Returns: the cached embedding if available; otherwise computes it, caches it, and returns it.

Behavior:

- Check the cache for `key`
- If hit: return the cached embedding
- If miss: compute via `embedder.get_embedding_nocache()`, cache it, return it
#### get_embeddings

```python
async def get_embeddings(self, keys: list[str]) -> NormalizedEmbeddings
```

Retrieve embeddings for multiple keys, using the cache if available.

Parameters: `keys` - List of texts to embed. Must not be empty.

Returns: 2D array of embeddings in the same order as `keys`.

Raises: `ValueError` if `keys` is empty.

Behavior:

- Identify cache misses
- Batch-compute missing embeddings via `embedder.get_embeddings_nocache()`
- Cache the new embeddings
- Return all embeddings in the original order
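The steps above can be sketched with a plain dict cache. This is an illustration of the algorithm, not the library's actual code (`compute_batch` stands in for `embedder.get_embeddings_nocache()`):

```python
import numpy as np

async def get_embeddings_cached(
    cache: dict, compute_batch, keys: list[str]
) -> np.ndarray:
    """Order-preserving cached batch lookup, mirroring the Behavior steps."""
    if not keys:
        raise ValueError("keys must not be empty")
    # 1. Identify cache misses (first-seen order, no duplicates)
    misses = [k for k in dict.fromkeys(keys) if k not in cache]
    if misses:
        # 2. Batch-compute the missing embeddings in one provider call
        computed = await compute_batch(misses)
        # 3. Cache the new embeddings
        cache.update(zip(misses, computed))
    # 4. Return all embeddings in the original key order
    return np.stack([cache[k] for k in keys])
```

The key point is that only the misses hit the provider, yet the returned array keeps one row per input key, in input order.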
### Example Usage

```python
import numpy as np

from typeagent.aitools.model_adapters import create_embedding_model

# Create model with caching (recommended way)
model = create_embedding_model("openai:text-embedding-3-small")

# Single embedding (cached)
embedding = await model.get_embedding("Hello world")
print(embedding.shape)  # (1536,) for text-embedding-3-small

# Batch embeddings (partially cached)
texts = [
    "Hello world",      # Cache hit
    "How are you?",     # Cache miss - computed
    "TypeAgent rocks",  # Cache miss - computed
]
embeddings = await model.get_embeddings(texts)
print(embeddings.shape)  # (3, 1536)

# Second call - all cache hits
embeddings2 = await model.get_embeddings(texts)
assert np.array_equal(embeddings, embeddings2)
```
## IEmbeddingModel Protocol

```python
from typeagent.aitools.embeddings import IEmbeddingModel
```

Complete consumer-facing interface combining IEmbedder methods with caching methods.

Protocol Methods:

- All methods from `IEmbedder`
- `add_embedding()` - Manual cache insertion
- `get_embedding()` - Cached single embedding
- `get_embeddings()` - Cached batch embeddings

Implementations:

- `CachingEmbeddingModel` - Production implementation
## Normalization

All embeddings are automatically L2-normalized to unit length:

```python
embedding = raw_embedding / np.linalg.norm(raw_embedding)
```

Why normalize?

- Enables fast cosine similarity via a dot product: `similarity = embedding1 @ embedding2`
- Ensures consistent similarity scores across different text lengths
- Standard practice for semantic search
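The dot-product shortcut is easy to verify with plain NumPy, independent of TypeAgent:

```python
import numpy as np

a = np.array([3.0, 4.0], dtype=np.float32)
b = np.array([4.0, 3.0], dtype=np.float32)

# Full cosine similarity on the raw vectors...
cosine = float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# ...equals a plain dot product once both are L2-normalized.
a_n = a / np.linalg.norm(a)
b_n = b / np.linalg.norm(b)
assert abs(float(a_n @ b_n) - cosine) < 1e-6  # both give 0.96 for these vectors
```

Normalizing once at embedding time amortizes the division, so every later similarity is just a dot product (or one matrix-vector product for a whole index).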
## Model Selection

Supported embedding models:

### OpenAI Models

```python
model = create_embedding_model("openai:text-embedding-3-small")   # 1536 dims
model = create_embedding_model("openai:text-embedding-3-large")   # 3072 dims
model = create_embedding_model("openai:text-embedding-ada-002")   # 1536 dims (legacy)
```

### Azure OpenAI

Automatically detected when `AZURE_OPENAI_API_KEY` is set:

```python
# Uses Azure OpenAI if OPENAI_API_KEY is not set but AZURE_OPENAI_API_KEY is
model = create_embedding_model("openai:text-embedding-3-small")
```

Model-specific Azure endpoints:

- `AZURE_OPENAI_ENDPOINT_EMBEDDING_3_SMALL`
- `AZURE_OPENAI_ENDPOINT_EMBEDDING_3_LARGE`
- `AZURE_OPENAI_ENDPOINT_EMBEDDING` (fallback)

### Other Providers

Via pydantic-ai:

```python
model = create_embedding_model("cohere:embed-english-v3.0")
model = create_embedding_model("google:text-embedding-004")
```

See Model Adapters for the full provider list.
## Test Embeddings

```python
from typeagent.aitools.model_adapters import create_test_embedding_model
```

Deterministic fake embeddings for testing - no API keys or network required.

```python
def create_test_embedding_model(
    embedding_size: int = 3,
) -> CachingEmbeddingModel
```

Parameters: `embedding_size` - Dimension of the fake embeddings (default: 3 for easy debugging).

Example:

```python
import numpy as np

from typeagent.aitools.model_adapters import create_test_embedding_model

# Create test model
test_model = create_test_embedding_model(embedding_size=8)

# Get deterministic embeddings
emb1 = await test_model.get_embedding("hello")
emb2 = await test_model.get_embedding("hello")
assert np.array_equal(emb1, emb2)  # Deterministic

# No network calls, no API keys needed
assert test_model.model_name == "test"
```

Use Cases:

- Unit tests
- Integration tests without external dependencies
- Local development without API keys

DO NOT use `TEST_MODEL_NAME` or test embeddings in production code. Test embeddings are not semantically meaningful.
## Environment Variables

Embedding model selection via environment variables:

```shell
# OpenAI
export OPENAI_API_KEY="sk-..."
export OPENAI_EMBEDDING_MODEL="text-embedding-3-small"  # Optional

# Azure OpenAI
export AZURE_OPENAI_API_KEY="..."
export AZURE_OPENAI_ENDPOINT_EMBEDDING="https://...openai.azure.com/openai/deployments/.../embeddings?api-version=..."

# Alternative: model-specific Azure endpoints
export AZURE_OPENAI_ENDPOINT_EMBEDDING_3_SMALL="https://..."
export AZURE_OPENAI_ENDPOINT_EMBEDDING_3_LARGE="https://..."
```

See Environment Variables for details.
## Tips

**Batch Requests**: Use `get_embeddings()` for multiple texts to reduce API calls.

**Cache Warmup**: Pre-compute embeddings for common queries using `add_embedding()`.

**Dimension Tradeoff**: Smaller models (3-small) are faster; larger models (3-large) are more accurate.

**Shared Models**: Use the same model instance across indexes to share the cache.
## Complete Example

```python
import numpy as np

from typeagent.aitools.model_adapters import create_embedding_model

# Create embedding model
model = create_embedding_model("openai:text-embedding-3-small")

# Single embedding
query_emb = await model.get_embedding("What is TypeAgent?")
print(f"Embedding dimension: {query_emb.shape[0]}")
print(f"Normalized: {np.linalg.norm(query_emb):.6f}")  # Should be ~1.0

# Batch embeddings
documents = [
    "TypeAgent is a knowledge processing library.",
    "It extracts entities and actions from conversations.",
    "TypeAgent supports transcripts, emails, and chats.",
]
doc_embeddings = await model.get_embeddings(documents)
print(f"Batch shape: {doc_embeddings.shape}")  # (3, 1536)

# Compute similarities
similarities = doc_embeddings @ query_emb  # Dot product = cosine similarity
for i, (doc, sim) in enumerate(zip(documents, similarities)):
    print(f"Doc {i} similarity: {sim:.4f}")
    print(f"  {doc[:60]}...")

# Find the most similar document
best_idx = np.argmax(similarities)
print(f"\nMost relevant: Doc {best_idx} (score: {similarities[best_idx]:.4f})")

# Verify caching
query_emb2 = await model.get_embedding("What is TypeAgent?")  # Cache hit
assert np.array_equal(query_emb, query_emb2)
print("Cache working!")
```