
Overview

The OpenAIEmbedder provides embeddings using OpenAI’s text-embedding models, including the latest text-embedding-3-small and text-embedding-3-large models.

Installation

pip install graphiti-core
The OpenAI SDK is included by default.

Basic Usage

from graphiti_core.embedder import OpenAIEmbedder
from graphiti_core.embedder.openai import OpenAIEmbedderConfig

# Initialize embedder
embedder = OpenAIEmbedder(
    config=OpenAIEmbedderConfig(
        api_key="sk-...",
        embedding_model="text-embedding-3-small",
        embedding_dim=1024
    )
)

# Single embedding
vector = await embedder.create("Hello, world!")
print(len(vector))  # 1024

# Batch embeddings
texts = [
    "First document",
    "Second document",
    "Third document"
]
vectors = await embedder.create_batch(texts)
print(len(vectors))  # 3
print(len(vectors[0]))  # 1024

Configuration

OpenAIEmbedderConfig

embedding_model
str
default:"'text-embedding-3-small'"
OpenAI embedding model to use. Options:
  • text-embedding-3-small (default, 1536 dims)
  • text-embedding-3-large (3072 dims)
  • text-embedding-ada-002 (legacy, 1536 dims)
embedding_dim
int
default:"1024"
Output embedding dimensionality. Truncates native dimensions to this size.
api_key
str | None
default:"None"
OpenAI API key. If not provided, uses OPENAI_API_KEY environment variable.
base_url
str | None
default:"None"
Custom API endpoint for OpenAI-compatible services.
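Since the config falls back to the OPENAI_API_KEY environment variable when api_key is None, you can keep the key out of your code entirely. For example:

```shell
# Export once; OpenAIEmbedderConfig picks this up when api_key is not set.
export OPENAI_API_KEY="sk-..."
```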

Constructor

config
OpenAIEmbedderConfig | None
default:"None"
Configuration object. If None, creates default config.
client
AsyncOpenAI | AsyncAzureOpenAI | None
default:"None"
Optional pre-configured client. If not provided, creates AsyncOpenAI from config. Supports both AsyncOpenAI and AsyncAzureOpenAI instances.

Supported Models

text-embedding-3-small

  • Native dimensions: 1536
  • Cost: $0.02 / 1M tokens
  • Best for: General purpose, cost-effective
config = OpenAIEmbedderConfig(
    embedding_model="text-embedding-3-small",
    embedding_dim=1024  # Truncate to 1024
)

text-embedding-3-large

  • Native dimensions: 3072
  • Cost: $0.13 / 1M tokens
  • Best for: High-quality embeddings, better performance
config = OpenAIEmbedderConfig(
    embedding_model="text-embedding-3-large",
    embedding_dim=1024  # Truncate to 1024
)

text-embedding-ada-002 (Legacy)

  • Native dimensions: 1536
  • Cost: $0.10 / 1M tokens
  • Best for: Backwards compatibility
config = OpenAIEmbedderConfig(
    embedding_model="text-embedding-ada-002",
    embedding_dim=1024
)

Methods

create()

Generate a single embedding vector.
vector = await embedder.create("Your text here")
print(len(vector))  # embedding_dim
Parameters:
  • input_data (str | list[str] | Iterable[int] | Iterable[Iterable[int]]): Input to embed
Returns: list[float] - Embedding vector

create_batch()

Generate embeddings for multiple texts in a single API call.
texts = ["Text 1", "Text 2", "Text 3"]
vectors = await embedder.create_batch(texts)
print(len(vectors))  # 3
Parameters:
  • input_data_list (list[str]): List of texts to embed
Returns: list[list[float]] - List of embedding vectors

Dimension Truncation

OpenAI models return embeddings in their native dimensions, which are truncated to embedding_dim:
# text-embedding-3-small returns 1536 dimensions
embedder = OpenAIEmbedder(
    config=OpenAIEmbedderConfig(
        embedding_model="text-embedding-3-small",
        embedding_dim=512  # Truncate to 512
    )
)

vector = await embedder.create("text")
print(len(vector))  # 512 (truncated from 1536)
Implementation:
return result.data[0].embedding[:self.config.embedding_dim]
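Note that slicing a vector this way breaks unit normalization, and OpenAI's embeddings guide suggests re-normalizing when you shorten embeddings manually. If your downstream similarity code assumes unit-length vectors, a minimal pure-Python sketch (not part of graphiti-core) would be:

```python
import math

def l2_normalize(vec):
    """Rescale a (possibly truncated) embedding back to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else vec

l2_normalize([3.0, 4.0])  # -> [0.6, 0.8]
```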

Using with Azure OpenAI

The embedder supports Azure OpenAI through the client parameter:
from openai import AsyncAzureOpenAI
from graphiti_core.embedder import OpenAIEmbedder
from graphiti_core.embedder.openai import OpenAIEmbedderConfig

# Create Azure client
azure_client = AsyncAzureOpenAI(
    api_key="your-azure-key",
    api_version="2024-02-15-preview",
    azure_endpoint="https://your-resource.openai.azure.com"
)

# Create embedder with Azure client
embedder = OpenAIEmbedder(
    config=OpenAIEmbedderConfig(
        embedding_model="text-embedding-3-small",  # Your deployment name
        embedding_dim=1024
    ),
    client=azure_client
)

vector = await embedder.create("text")
For Azure, use your deployment name as the embedding_model, not the base model name.
Alternatively, use the dedicated AzureOpenAIEmbedderClient (see Azure OpenAI Embedder).

Custom Base URL

Use OpenAI-compatible embedding services:
embedder = OpenAIEmbedder(
    config=OpenAIEmbedderConfig(
        api_key="your-key",
        base_url="https://api.your-provider.com/v1",
        embedding_model="custom-model"
    )
)

Batch Processing Best Practices

Optimal Batch Sizes

OpenAI recommends batching for efficiency:
# Good: Single API call for 100 texts
texts = [f"Document {i}" for i in range(100)]
vectors = await embedder.create_batch(texts)

# Less efficient: 100 individual API calls
vectors = [await embedder.create(text) for text in texts]

Large Dataset Processing

For very large datasets, chunk the input:
def chunk_list(lst, n):
    """Yield successive n-sized chunks."""
    for i in range(0, len(lst), n):
        yield lst[i:i + n]

all_vectors = []
for chunk in chunk_list(large_text_list, 2048):
    vectors = await embedder.create_batch(chunk)
    all_vectors.extend(vectors)
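If latency matters, chunks can also be embedded concurrently. The helper below is an illustrative sketch, not part of the graphiti-core API: it assumes an object exposing the create_batch coroutine shown above, and uses a semaphore to cap in-flight requests so you stay under rate limits.

```python
import asyncio

async def embed_chunks_concurrently(embedder, chunks, max_in_flight=4):
    """Embed chunks in parallel, with at most max_in_flight requests at once."""
    sem = asyncio.Semaphore(max_in_flight)

    async def embed_one(chunk):
        async with sem:
            return await embedder.create_batch(chunk)

    # gather() preserves input order, so results line up with chunks
    results = await asyncio.gather(*(embed_one(c) for c in chunks))
    # Flatten back into one list of vectors
    return [vec for batch in results for vec in batch]
```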

With Progress Tracking

from tqdm.asyncio import tqdm

async def embed_with_progress(texts, batch_size=2048):
    chunks = list(chunk_list(texts, batch_size))
    all_vectors = []
    
    for chunk in tqdm(chunks, desc="Embedding"):
        vectors = await embedder.create_batch(chunk)
        all_vectors.extend(vectors)
    
    return all_vectors

vectors = await embed_with_progress(large_text_list)

Input Types

The embedder accepts various input formats:
# String (most common)
vector = await embedder.create("Hello, world!")

# List of strings (only the first item's embedding is returned;
# see the implementation in Dimension Truncation)
vector = await embedder.create(["Hello", "world"])

# Token IDs (for advanced use)
vector = await embedder.create([15496, 11, 995, 0])  # pre-tokenized input

# Nested iterables
vector = await embedder.create([[15496, 11], [995, 0]])

Error Handling

import openai

try:
    vector = await embedder.create("text")
except openai.AuthenticationError:
    print("Invalid API key")
except openai.RateLimitError:
    print("Rate limit exceeded")
except openai.APIError as e:
    print(f"OpenAI API error: {e}")
except Exception as e:
    print(f"Unexpected error: {e}")
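A common follow-up to catching RateLimitError is retrying with exponential backoff. The helper below is a generic sketch (not part of graphiti-core); in practice you would pass retry_on=(openai.RateLimitError,) and embed_fn=embedder.create.

```python
import asyncio

async def with_backoff(embed_fn, text, retries=3, base_delay=1.0,
                       retry_on=(Exception,)):
    """Call an async embedding function, backing off on transient errors."""
    for attempt in range(retries):
        try:
            return await embed_fn(text)
        except retry_on:
            if attempt == retries - 1:
                raise  # out of retries; let the caller handle it
            # Wait 1s, 2s, 4s, ... between attempts
            await asyncio.sleep(base_delay * 2 ** attempt)
```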

Example: Document Embedding

from graphiti_core.embedder import OpenAIEmbedder
from graphiti_core.embedder.openai import OpenAIEmbedderConfig
import numpy as np

# Initialize embedder
embedder = OpenAIEmbedder(
    config=OpenAIEmbedderConfig(
        api_key="sk-...",
        embedding_model="text-embedding-3-small",
        embedding_dim=1024
    )
)

# Prepare documents
documents = [
    "Machine learning is a subset of artificial intelligence.",
    "Deep learning uses neural networks with multiple layers.",
    "Natural language processing enables computers to understand text.",
    "Computer vision allows machines to interpret visual information."
]

# Generate embeddings
vectors = await embedder.create_batch(documents)

# Convert to numpy for similarity computation
vectors_np = np.array(vectors)

# Compute cosine similarity
from sklearn.metrics.pairwise import cosine_similarity

similarity_matrix = cosine_similarity(vectors_np)
print(similarity_matrix)

# Find most similar to first document
query_vector = vectors[0]
similarities = cosine_similarity([query_vector], vectors_np)[0]
most_similar_idx = np.argsort(similarities)[::-1][1]  # Skip self
print(f"Most similar to '{documents[0]}':")
print(f"  -> '{documents[most_similar_idx]}'")
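If you would rather not pull in scikit-learn for a handful of vectors, cosine similarity is easy to compute directly (and for unit-normalized vectors it reduces to a plain dot product):

```python
import math

def cosine_similarity_pair(a, b):
    """Cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(y * y for y in b)))

cosine_similarity_pair([1.0, 0.0], [1.0, 1.0])  # ~0.707
```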

Performance Comparison

Model                   Dims  Tokens/sec  Cost/1M tokens  Quality
text-embedding-3-small  1536  ~1M         $0.02           Good
text-embedding-3-large  3072  ~500K       $0.13           Excellent
text-embedding-ada-002  1536  ~800K       $0.10           Good
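The per-token prices make cost estimation simple: cost = tokens / 1M × price per million. A quick sketch using the figures from the table above (verify current pricing against OpenAI's pricing page before relying on it):

```python
# USD per 1M tokens, taken from the comparison table above
PRICES = {
    "text-embedding-3-small": 0.02,
    "text-embedding-3-large": 0.13,
    "text-embedding-ada-002": 0.10,
}

def estimate_cost_usd(model, total_tokens):
    """Rough embedding cost for a given token count."""
    return total_tokens / 1_000_000 * PRICES[model]

estimate_cost_usd("text-embedding-3-small", 10_000_000)  # 0.2
```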

Use with Graphiti

from graphiti_core import Graphiti
from graphiti_core.embedder import OpenAIEmbedder
from graphiti_core.embedder.openai import OpenAIEmbedderConfig

embedder = OpenAIEmbedder(
    config=OpenAIEmbedderConfig(
        api_key="sk-...",
        embedding_dim=1024
    )
)

graphiti = Graphiti(
    uri="neo4j://localhost:7687",
    user="neo4j",
    password="password",
    embedder=embedder
)

# Embeddings are generated automatically for:
# - Entity nodes
# - Relationship edges
# - Community summaries
await graphiti.add_episode(
    name="episode1",
    episode_body="Your text here...",
    source_description="source1"
)
