Overview
The OpenAIEmbedder provides embeddings using OpenAI’s text-embedding models, including the latest text-embedding-3-small and text-embedding-3-large models.
Installation
pip install graphiti-core
The OpenAI SDK is included by default.
Basic Usage
from graphiti_core.embedder import OpenAIEmbedder
from graphiti_core.embedder.openai import OpenAIEmbedderConfig
# Initialize embedder
embedder = OpenAIEmbedder(
    config=OpenAIEmbedderConfig(
        api_key="sk-...",
        embedding_model="text-embedding-3-small",
        embedding_dim=1024
    )
)
# Single embedding
vector = await embedder.create("Hello, world!")
print(len(vector)) # 1024
# Batch embeddings
texts = [
    "First document",
    "Second document",
    "Third document"
]
vectors = await embedder.create_batch(texts)
print(len(vectors)) # 3
print(len(vectors[0])) # 1024
Configuration
OpenAIEmbedderConfig
embedding_model (str, default: "text-embedding-3-small")
OpenAI embedding model to use. Options:
- text-embedding-3-small (default, 1536 native dims)
- text-embedding-3-large (3072 native dims)
- text-embedding-ada-002 (legacy, 1536 native dims)
embedding_dim (int)
Output embedding dimensionality. Native dimensions are truncated to this size.
api_key (str | None)
OpenAI API key. If not provided, the OPENAI_API_KEY environment variable is used.
base_url (str | None)
Custom API endpoint for OpenAI-compatible services.
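The environment-variable fallback for the API key can be sketched in plain Python. This is a simplified illustration of the behavior described above, not the library's actual code, and resolve_api_key is a hypothetical helper:

```python
import os

# Simplified sketch (an assumption about the internals, not graphiti-core's
# actual code): an explicitly passed api_key wins; otherwise fall back to
# the OPENAI_API_KEY environment variable.
def resolve_api_key(explicit_key=None):
    return explicit_key if explicit_key is not None else os.environ.get("OPENAI_API_KEY")
```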
Constructor
config (OpenAIEmbedderConfig | None, default: None)
Configuration object. If None, a default config is created.
client (AsyncOpenAI | AsyncAzureOpenAI | None, default: None)
Optional pre-configured client. If not provided, an AsyncOpenAI client is created from the config. Both AsyncOpenAI and AsyncAzureOpenAI instances are supported.
Supported Models
text-embedding-3-small (Recommended)
- Native dimensions: 1536
- Cost: $0.02 / 1M tokens
- Best for: General purpose, cost-effective
config = OpenAIEmbedderConfig(
    embedding_model="text-embedding-3-small",
    embedding_dim=1024  # Truncate to 1024
)
text-embedding-3-large
- Native dimensions: 3072
- Cost: $0.13 / 1M tokens
- Best for: High-quality embeddings, better performance
config = OpenAIEmbedderConfig(
    embedding_model="text-embedding-3-large",
    embedding_dim=1024  # Truncate to 1024
)
text-embedding-ada-002 (Legacy)
- Native dimensions: 1536
- Cost: $0.10 / 1M tokens
- Best for: Backwards compatibility
config = OpenAIEmbedderConfig(
    embedding_model="text-embedding-ada-002",
    embedding_dim=1024
)
Methods
create()
Generate a single embedding vector.
vector = await embedder.create("Your text here")
print(len(vector)) # embedding_dim
Parameters:
input_data (str | list[str] | Iterable[int] | Iterable[Iterable[int]]): Input to embed
Returns: list[float] - Embedding vector
create_batch()
Generate embeddings for multiple texts in a single API call.
texts = ["Text 1", "Text 2", "Text 3"]
vectors = await embedder.create_batch(texts)
print(len(vectors)) # 3
Parameters:
input_data_list (list[str]): List of texts to embed
Returns: list[list[float]] - List of embedding vectors
Dimension Truncation
OpenAI models return embeddings in their native dimensions, which are truncated to embedding_dim:
# text-embedding-3-small returns 1536 dimensions
embedder = OpenAIEmbedder(
    config=OpenAIEmbedderConfig(
        embedding_model="text-embedding-3-small",
        embedding_dim=512  # Truncate to 512
    )
)
vector = await embedder.create("text")
print(len(vector)) # 512 (truncated from 1536)
Implementation:
return result.data[0].embedding[:self.config.embedding_dim]
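The truncation step is just a slice over the raw vector. A minimal stand-alone sketch of that behavior, using a dummy list in place of a real API result:

```python
# Stand-in for a raw text-embedding-3-small result (1536 native dimensions)
native_embedding = [float(i) for i in range(1536)]
embedding_dim = 512

# The embedder keeps only the first embedding_dim values of each vector
truncated = native_embedding[:embedding_dim]
```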
Using with Azure OpenAI
The embedder supports Azure OpenAI through the client parameter:
from openai import AsyncAzureOpenAI
from graphiti_core.embedder import OpenAIEmbedder
from graphiti_core.embedder.openai import OpenAIEmbedderConfig
# Create Azure client
azure_client = AsyncAzureOpenAI(
    api_key="your-azure-key",
    api_version="2024-02-15-preview",
    azure_endpoint="https://your-resource.openai.azure.com"
)
# Create embedder with Azure client
embedder = OpenAIEmbedder(
    config=OpenAIEmbedderConfig(
        embedding_model="text-embedding-3-small",  # Your deployment name
        embedding_dim=1024
    ),
    client=azure_client
)
vector = await embedder.create("text")
For Azure, use your deployment name as the embedding_model, not the base model name.
Alternatively, use the dedicated AzureOpenAIEmbedderClient (see Azure OpenAI Embedder).
Custom Base URL
Use OpenAI-compatible embedding services:
embedder = OpenAIEmbedder(
    config=OpenAIEmbedderConfig(
        api_key="your-key",
        base_url="https://api.your-provider.com/v1",
        embedding_model="custom-model"
    )
)
Batch Processing Best Practices
Optimal Batch Sizes
OpenAI recommends batching for efficiency:
# Good: Single API call for 100 texts
texts = [f"Document {i}" for i in range(100)]
vectors = await embedder.create_batch(texts)
# Less efficient: 100 individual API calls
vectors = [await embedder.create(text) for text in texts]
Large Dataset Processing
For very large datasets, chunk the input:
def chunk_list(lst, n):
    """Yield successive n-sized chunks."""
    for i in range(0, len(lst), n):
        yield lst[i:i + n]

all_vectors = []
for chunk in chunk_list(large_text_list, 2048):
    vectors = await embedder.create_batch(chunk)
    all_vectors.extend(vectors)
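Chunks can also be embedded concurrently rather than one at a time. A sketch using asyncio with a semaphore to cap in-flight requests; embed_all is an illustrative helper (not part of graphiti-core), and embed_chunk stands in for embedder.create_batch:

```python
import asyncio

async def embed_all(chunks, embed_chunk, max_concurrency=4):
    # Cap concurrent requests to stay under provider rate limits
    sem = asyncio.Semaphore(max_concurrency)

    async def worker(chunk):
        async with sem:
            return await embed_chunk(chunk)

    # gather() preserves input order, so results line up with chunks
    results = await asyncio.gather(*(worker(c) for c in chunks))
    # Flatten the per-chunk results back into one list of vectors
    return [vec for batch in results for vec in batch]
```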
With Progress Tracking
from tqdm.asyncio import tqdm
async def embed_with_progress(texts, batch_size=2048):
    chunks = list(chunk_list(texts, batch_size))
    all_vectors = []
    for chunk in tqdm(chunks, desc="Embedding"):
        vectors = await embedder.create_batch(chunk)
        all_vectors.extend(vectors)
    return all_vectors
vectors = await embed_with_progress(large_text_list)
Input Formats
The embedder accepts various input formats:
# String (most common)
vector = await embedder.create("Hello, world!")
# List of strings (converted to single string)
vector = await embedder.create(["Hello", "world"])
# Token IDs (for advanced use)
vector = await embedder.create([15496, 11, 995, 0]) # "Hello, world!"
# Nested iterables
vector = await embedder.create([[15496, 11], [995, 0]])
Error Handling
import openai
try:
    vector = await embedder.create("text")
except openai.AuthenticationError:
    print("Invalid API key")
except openai.RateLimitError:
    print("Rate limit exceeded")
except openai.APIError as e:
    print(f"OpenAI API error: {e}")
except Exception as e:
    print(f"Unexpected error: {e}")
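Rate-limit errors are usually transient, so retrying with exponential backoff is a common pattern. A minimal sketch; create_with_retry is an illustrative helper (not part of graphiti-core), and in practice you would pass openai.RateLimitError as the retryable exception type:

```python
import asyncio

async def create_with_retry(embed, text, retryable=Exception, attempts=3, base_delay=1.0):
    # Retry transient failures with exponential backoff: 1x, 2x, 4x, ... base_delay
    for attempt in range(attempts):
        try:
            return await embed(text)
        except retryable:
            if attempt == attempts - 1:
                raise  # Out of attempts; surface the error to the caller
            await asyncio.sleep(base_delay * (2 ** attempt))
```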
Example: Document Embedding
from graphiti_core.embedder import OpenAIEmbedder
from graphiti_core.embedder.openai import OpenAIEmbedderConfig
import numpy as np
# Initialize embedder
embedder = OpenAIEmbedder(
    config=OpenAIEmbedderConfig(
        api_key="sk-...",
        embedding_model="text-embedding-3-small",
        embedding_dim=1024
    )
)
# Prepare documents
documents = [
    "Machine learning is a subset of artificial intelligence.",
    "Deep learning uses neural networks with multiple layers.",
    "Natural language processing enables computers to understand text.",
    "Computer vision allows machines to interpret visual information."
]
# Generate embeddings
vectors = await embedder.create_batch(documents)
# Convert to numpy for similarity computation
vectors_np = np.array(vectors)
# Compute cosine similarity
from sklearn.metrics.pairwise import cosine_similarity
similarity_matrix = cosine_similarity(vectors_np)
print(similarity_matrix)
# Find most similar to first document
query_vector = vectors[0]
similarities = cosine_similarity([query_vector], vectors_np)[0]
most_similar_idx = np.argsort(similarities)[::-1][1] # Skip self
print(f"Most similar to '{documents[0]}':")
print(f" -> '{documents[most_similar_idx]}'")
Performance Comparison
| Model | Dims | Tokens/sec | Cost/1M tokens | Quality |
|---|---|---|---|---|
| text-embedding-3-small | 1536 | ~1M | $0.02 | Good |
| text-embedding-3-large | 3072 | ~500K | $0.13 | Excellent |
| text-embedding-ada-002 | 1536 | ~800K | $0.10 | Good |
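The per-1M-token prices in the table translate directly into cost estimates. A quick illustrative helper (PRICE_PER_M_TOKENS and embedding_cost are not part of the library):

```python
# Per-1M-token prices from the comparison table above
PRICE_PER_M_TOKENS = {
    "text-embedding-3-small": 0.02,
    "text-embedding-3-large": 0.13,
    "text-embedding-ada-002": 0.10,
}

def embedding_cost(model: str, tokens: int) -> float:
    # Cost scales linearly with token count
    return PRICE_PER_M_TOKENS[model] / 1_000_000 * tokens

# Embedding 10M tokens with text-embedding-3-small costs $0.20
```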
Use with Graphiti
from graphiti_core import Graphiti
from graphiti_core.embedder import OpenAIEmbedder
from graphiti_core.embedder.openai import OpenAIEmbedderConfig
embedder = OpenAIEmbedder(
    config=OpenAIEmbedderConfig(
        api_key="sk-...",
        embedding_dim=1024
    )
)
graphiti = Graphiti(
    uri="neo4j://localhost:7687",
    user="neo4j",
    password="password",
    embedder=embedder
)
# Embeddings are generated automatically for:
# - Entity nodes
# - Relationship edges
# - Community summaries
await graphiti.add_episode(
    name="episode1",
    episode_body="Your text here...",
    source_description="source1"
)