Overview
Graphiti’s embedder architecture provides a unified interface for generating vector embeddings from text. All embedders extend the base EmbedderClient class and support both single and batch embedding generation.
Key Features
- Unified Interface: Single API across OpenAI, Voyage, Gemini, and other providers
- Batch Processing: Efficient batch embedding generation
- Configurable Dimensions: Control embedding dimensionality
- Provider Flexibility: Easy switching between embedding providers
- Type Safety: Full type hints and Pydantic configuration
Base Client Architecture
All embedders inherit from EmbedderClient (defined in graphiti_core/embedder/client.py) which provides:
Core Methods
create()
Generate a single embedding vector.
Parameters:

- input_data (str | list[str] | Iterable[int] | Iterable[Iterable[int]], required): Input text or token sequence to embed

Returns: list[float] - Embedding vector of length embedding_dim
create_batch()
Generate embeddings for multiple inputs efficiently.
Parameters:

- input_data_list (list[str], required): List of text strings to embed

Returns: list[list[float]] - List of embedding vectors
Configuration
All embedders use EmbedderConfig or provider-specific config classes:
- embedding_dim (int, default 1024): Output embedding dimensionality. Can be set via the EMBEDDING_DIM environment variable.
Available Embedders
- OpenAI: text-embedding-3-small/large models
- Voyage AI: voyage-3 and other Voyage models
- Gemini: Google's text-embedding models
- Azure OpenAI: OpenAI embeddings via Azure
Basic Usage Pattern
All embedders follow this pattern:
```python
from graphiti_core.embedder import OpenAIEmbedder
from graphiti_core.embedder.openai import OpenAIEmbedderConfig

# Initialize embedder
embedder = OpenAIEmbedder(
    config=OpenAIEmbedderConfig(
        api_key="your-key",
        embedding_model="text-embedding-3-small",
        embedding_dim=1024,
    )
)

# Single embedding
vector = await embedder.create("Hello, world!")
print(len(vector))  # 1024

# Batch embeddings
texts = ["First text", "Second text", "Third text"]
vectors = await embedder.create_batch(texts)
print(len(vectors))     # 3
print(len(vectors[0]))  # 1024
```
Embedding Dimensions
Control output dimensionality across all providers:
Environment Variable (Global Default)
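As a sketch of how a global default like this can be resolved, the helper below reads EMBEDDING_DIM from the environment and falls back to 1024 (the default dimension used throughout this page). Note that `resolve_embedding_dim` is an illustrative helper, not a graphiti_core API:

```python
import os

DEFAULT_EMBEDDING_DIM = 1024  # assumed fallback, matching the table below

def resolve_embedding_dim() -> int:
    """Read EMBEDDING_DIM from the environment, falling back to the default."""
    raw = os.environ.get("EMBEDDING_DIM")
    return int(raw) if raw else DEFAULT_EMBEDDING_DIM

os.environ.pop("EMBEDDING_DIM", None)
print(resolve_embedding_dim())  # 1024 (fallback)

os.environ["EMBEDDING_DIM"] = "768"
print(resolve_embedding_dim())  # 768 (environment override)
```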
Configuration (Per-Instance)
```python
config = OpenAIEmbedderConfig(
    embedding_dim=768  # Override global default
)
```
Dimension Truncation
Embedders truncate to the configured dimension:
```python
# OpenAI text-embedding-3-small returns 1536 dims by default
config = OpenAIEmbedderConfig(embedding_dim=1024)
embedder = OpenAIEmbedder(config=config)

vector = await embedder.create("text")
print(len(vector))  # 1024 (truncated from 1536)
```
Batch Processing
All embedders support efficient batch processing:
```python
# Single API call for multiple texts
texts = [f"Document {i}" for i in range(100)]
vectors = await embedder.create_batch(texts)

# Equivalent to 100 individual calls, but much faster:
# vectors = [await embedder.create(text) for text in texts]
```
Batch Size Limits
Some providers have batch size limits:
- OpenAI: no strict limit, but batches under 2048 inputs are recommended
- Voyage AI: provider-specific limits
- Gemini: configurable batch size (default 100; some models are limited to 1)
```python
# For large datasets, chunk the input
def chunk_list(lst, n):
    """Yield successive n-sized chunks from lst."""
    for i in range(0, len(lst), n):
        yield lst[i:i + n]

all_vectors = []
for chunk in chunk_list(large_text_list, 100):
    vectors = await embedder.create_batch(chunk)
    all_vectors.extend(vectors)
```
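The chunked loop above awaits each batch sequentially. Chunks can also be dispatched concurrently with a semaphore to stay under provider rate limits; this is a sketch, where `FakeEmbedder` stands in for any embedder exposing `create_batch()`, and the chunk size (100) and concurrency limit (4) are assumptions:

```python
import asyncio

class FakeEmbedder:
    """Stand-in embedder that returns fixed-size dummy vectors."""
    async def create_batch(self, texts):
        return [[0.0] * 4 for _ in texts]

async def embed_all(embedder, texts, chunk_size=100, max_concurrency=4):
    sem = asyncio.Semaphore(max_concurrency)
    chunks = [texts[i:i + chunk_size] for i in range(0, len(texts), chunk_size)]

    async def run(chunk):
        # Limit the number of in-flight batch requests
        async with sem:
            return await embedder.create_batch(chunk)

    # gather preserves input order, so results line up with the input texts
    results = await asyncio.gather(*(run(c) for c in chunks))
    return [vec for batch in results for vec in batch]

vectors = asyncio.run(embed_all(FakeEmbedder(), [f"doc {i}" for i in range(250)]))
print(len(vectors))  # 250
```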
Provider Comparison
| Provider | Default Model | Default Dims | Native Dims | Batch Support |
|---|---|---|---|---|
| OpenAI | text-embedding-3-small | 1024 | 1536 | Yes |
| Voyage AI | voyage-3 | 1024 | 1024 | Yes |
| Gemini | text-embedding-001 | 1024 | 768 | Yes (limited) |
| Azure OpenAI | text-embedding-3-small | 1024 | 1536 | Yes |
Use with Graphiti
Embedders are typically passed to the Graphiti client:
```python
from graphiti_core import Graphiti
from graphiti_core.embedder import OpenAIEmbedder
from graphiti_core.embedder.openai import OpenAIEmbedderConfig

embedder = OpenAIEmbedder(
    config=OpenAIEmbedderConfig(
        api_key="your-key",
        embedding_dim=1024,
    )
)

graphiti = Graphiti(
    uri="neo4j://localhost:7687",
    user="neo4j",
    password="password",
    embedder=embedder,  # Pass embedder instance
)

# Graphiti uses the embedder internally for:
# - Node embedding generation
# - Edge embedding generation
# - Similarity search
# - Community detection
```
Custom Embedder Implementation
Create custom embedders by extending EmbedderClient:
```python
from collections.abc import Iterable

from graphiti_core.embedder.client import EmbedderClient, EmbedderConfig

class CustomEmbedder(EmbedderClient):
    def __init__(self, config: EmbedderConfig | None = None):
        if config is None:
            config = EmbedderConfig()
        self.config = config
        # Initialize your embedding model/service

    async def create(
        self, input_data: str | list[str] | Iterable[int] | Iterable[Iterable[int]]
    ) -> list[float]:
        # Generate the embedding
        embedding = self._your_embedding_logic(input_data)
        # Truncate to the configured dimension
        return embedding[: self.config.embedding_dim]

    async def create_batch(self, input_data_list: list[str]) -> list[list[float]]:
        # Generate embeddings for the batch
        embeddings = self._your_batch_logic(input_data_list)
        # Truncate each to the configured dimension
        return [emb[: self.config.embedding_dim] for emb in embeddings]
```
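To make the pattern concrete, here is a toy, fully standalone version: a deterministic "embedder" that hashes text into a fixed-size vector. It is written without the `EmbedderClient` base class so the sketch runs anywhere; in real code you would subclass `EmbedderClient` as shown above, and the hashing "model" is purely illustrative:

```python
import asyncio
import hashlib

class ToyEmbedder:
    """Illustrative embedder: maps text to a deterministic pseudo-vector."""

    def __init__(self, embedding_dim: int = 8):
        self.embedding_dim = embedding_dim

    def _embed(self, text: str) -> list[float]:
        digest = hashlib.sha256(text.encode()).digest()  # 32 bytes
        # Map each byte to [0, 1], then truncate to the configured dimension
        values = [b / 255 for b in digest]
        return values[: self.embedding_dim]

    async def create(self, input_data: str) -> list[float]:
        return self._embed(input_data)

    async def create_batch(self, input_data_list: list[str]) -> list[list[float]]:
        return [self._embed(t) for t in input_data_list]

vectors = asyncio.run(ToyEmbedder().create_batch(["a", "b"]))
print(len(vectors), len(vectors[0]))  # 2 8
```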
Best Practices

- Use batch processing: always prefer create_batch() for multiple inputs
- Set appropriate dimensions: lower dimensions mean faster similarity search and less storage
- Consider costs: providers price embeddings differently
- Cache embeddings: store embeddings in your graph database to avoid recomputation
- Monitor API limits: implement rate limiting for large batches
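The caching tip can be sketched as a thin wrapper keyed by a hash of the input text. `CachingEmbedder` and `FakeEmbedder` are illustrative stand-ins, not graphiti_core classes, and a real cache would more likely live in your graph database than in a dict:

```python
import asyncio
import hashlib

class FakeEmbedder:
    """Stand-in embedder that counts how often it is actually called."""
    def __init__(self):
        self.calls = 0

    async def create(self, text: str) -> list[float]:
        self.calls += 1
        return [float(len(text))]

class CachingEmbedder:
    """Wraps any embedder exposing create(); caches results in memory."""
    def __init__(self, inner):
        self.inner = inner
        self._cache: dict[str, list[float]] = {}

    async def create(self, text: str) -> list[float]:
        key = hashlib.sha256(text.encode()).hexdigest()
        if key not in self._cache:
            self._cache[key] = await self.inner.create(text)
        return self._cache[key]

inner = FakeEmbedder()
embedder = CachingEmbedder(inner)
asyncio.run(embedder.create("hello"))
asyncio.run(embedder.create("hello"))  # second call is served from the cache
print(inner.calls)  # 1
```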
Error Handling
All embedders may raise exceptions:
```python
try:
    vector = await embedder.create("text")
except Exception as e:
    # Handle API errors, rate limits, etc.
    print(f"Embedding failed: {e}")
```
Common errors:
- Authentication errors: invalid API key
- Rate limit errors: too many requests
- Input validation errors: empty or invalid input
- Network errors: connection issues
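Since rate-limit and network errors are usually transient, a common pattern is to retry with exponential backoff. This is a sketch, not a graphiti_core feature; `flaky_create` simulates an embedder call that fails twice before succeeding, and the delays are shortened for illustration:

```python
import asyncio

async def with_retries(fn, *, attempts=3, base_delay=0.01):
    """Retry an async call with exponential backoff between attempts."""
    for attempt in range(attempts):
        try:
            return await fn()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of retries, surface the error
            await asyncio.sleep(base_delay * (2 ** attempt))

failures = {"left": 2}

async def flaky_create():
    # Simulated embedder call: transient failures on the first two attempts
    if failures["left"] > 0:
        failures["left"] -= 1
        raise ConnectionError("transient failure")
    return [0.1, 0.2]

vector = asyncio.run(with_retries(flaky_create))
print(vector)  # [0.1, 0.2]
```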
Type Support
Embedders accept multiple input types:
```python
# String input (most common)
vector = await embedder.create("Hello, world!")

# List of strings (for some providers)
vector = await embedder.create(["token1", "token2", "token3"])

# Token IDs (for some providers)
vector = await embedder.create([101, 2023, 2003, 102])

# Nested token IDs
vector = await embedder.create([[101, 2023], [2003, 102]])
```
Not all providers support all input types. Consult provider-specific documentation.