
Overview

The VoyageAIEmbedder provides embeddings using Voyage AI’s embedding models, including voyage-3 which offers state-of-the-art performance for retrieval tasks.

Installation

pip install graphiti-core[voyageai]

Basic Usage

from graphiti_core.embedder import VoyageAIEmbedder
from graphiti_core.embedder.voyage import VoyageAIEmbedderConfig

# Initialize embedder
embedder = VoyageAIEmbedder(
    config=VoyageAIEmbedderConfig(
        api_key="your-voyage-api-key",
        embedding_model="voyage-3",
        embedding_dim=1024
    )
)

# Single embedding
vector = await embedder.create("Hello, world!")
print(len(vector))  # 1024

# Batch embeddings
texts = [
    "First document",
    "Second document",
    "Third document"
]
vectors = await embedder.create_batch(texts)
print(len(vectors))  # 3

Configuration

VoyageAIEmbedderConfig

embedding_model (str, default: "voyage-3")
Voyage AI model to use. Options:
  • voyage-3 (default, latest model)
  • voyage-2
  • voyage-large-2
  • voyage-code-2
  • voyage-lite-02-instruct
embedding_dim (int, default: 1024)
Output embedding dimensionality. Truncates native dimensions to this size.
api_key (str | None, default: None)
Voyage AI API key. If not provided, uses VOYAGE_API_KEY environment variable.

Constructor

config (VoyageAIEmbedderConfig | None, default: None)
Configuration object. If None, creates default config with voyage-3 model.
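Constructed with no arguments, the embedder falls back to the defaults described above (voyage-3, 1024 dimensions, API key read from VOYAGE_API_KEY):

```python
from graphiti_core.embedder import VoyageAIEmbedder

# Uses the default config: voyage-3, 1024 dims, key from VOYAGE_API_KEY
embedder = VoyageAIEmbedder()
```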

Supported Models

voyage-3

  • Native dimensions: 1024
  • Context length: 32K tokens
  • Best for: General purpose, state-of-the-art performance
config = VoyageAIEmbedderConfig(
    embedding_model="voyage-3",
    embedding_dim=1024
)

voyage-2

  • Native dimensions: 1024
  • Context length: 16K tokens
  • Best for: Backwards compatibility

voyage-large-2

  • Native dimensions: 1536
  • Context length: 16K tokens
  • Best for: Maximum quality
config = VoyageAIEmbedderConfig(
    embedding_model="voyage-large-2",
    embedding_dim=1024  # Truncate from 1536
)

voyage-code-2

  • Native dimensions: 1536
  • Context length: 16K tokens
  • Best for: Code search and retrieval
config = VoyageAIEmbedderConfig(
    embedding_model="voyage-code-2",
    embedding_dim=1024
)

voyage-lite-02-instruct

  • Native dimensions: 1024
  • Context length: 4K tokens
  • Best for: Fast, lightweight tasks
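A matching config, following the same pattern as the other models:

```python
config = VoyageAIEmbedderConfig(
    embedding_model="voyage-lite-02-instruct",
    embedding_dim=1024
)
```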

Methods

create()

Generate a single embedding vector.
vector = await embedder.create("Your text here")
Parameters:
  • input_data (str | list[str] | Iterable[int] | Iterable[Iterable[int]]): Input to embed
Returns: list[float] - Embedding vector

Special handling:
  • Converts non-string inputs to string
  • Filters out empty strings
  • Returns empty list if no valid input

create_batch()

Generate embeddings for multiple texts.
texts = ["Text 1", "Text 2", "Text 3"]
vectors = await embedder.create_batch(texts)
Parameters:
  • input_data_list (list[str]): List of texts to embed
Returns: list[list[float]] - List of embedding vectors

Input Handling

The Voyage embedder has special input handling:
# String input (standard)
vector = await embedder.create("Hello, world!")

# List of strings
input_list = ["Hello", "world"]
vector = await embedder.create(input_list)
# Each string is embedded; the first embedding is returned

# Non-string iterables (converted to string)
vector = await embedder.create([1, 2, 3, 4])
# Converts each to string: ["1", "2", "3", "4"]

# Empty input handling
vector = await embedder.create("")
print(vector)  # []

vector = await embedder.create([])  
print(vector)  # []
Implementation:
if isinstance(input_data, str):
    input_list = [input_data]
elif isinstance(input_data, list):
    input_list = [str(i) for i in input_data if i]
else:
    input_list = [str(i) for i in input_data if i is not None]

input_list = [i for i in input_list if i]
if len(input_list) == 0:
    return []
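The normalization logic above can be exercised on its own. This standalone sketch (the function name normalize_input is ours, not part of the library) mirrors the branches shown:

```python
from typing import Any


def normalize_input(input_data: Any) -> list[str]:
    """Mirror of the embedder's input normalization: strings pass through,
    list items are stringified (falsy items dropped), other iterables are
    stringified item-wise, and empty strings are filtered out at the end."""
    if isinstance(input_data, str):
        input_list = [input_data]
    elif isinstance(input_data, list):
        input_list = [str(i) for i in input_data if i]
    else:
        input_list = [str(i) for i in input_data if i is not None]
    return [i for i in input_list if i]


print(normalize_input("Hello, world!"))  # ['Hello, world!']
print(normalize_input([1, 2, 3, 4]))    # ['1', '2', '3', '4']
print(normalize_input(""))              # []
print(normalize_input([]))              # []
```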

Dimension Truncation

Voyage embeddings are truncated to embedding_dim:
# voyage-large-2 returns 1536 dimensions
embedder = VoyageAIEmbedder(
    config=VoyageAIEmbedderConfig(
        embedding_model="voyage-large-2",
        embedding_dim=768  # Truncate to 768
    )
)

vector = await embedder.create("text")
print(len(vector))  # 768 (truncated from 1536)
Implementation:
return [float(x) for x in result.embeddings[0][:self.config.embedding_dim]]
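Outside the library, the truncation is just a list slice. This sketch uses a fake 1536-dimension vector standing in for a voyage-large-2 response:

```python
# Fake 1536-dim embedding standing in for an API response
raw_embedding = [0.001 * i for i in range(1536)]

embedding_dim = 768  # configured target dimensionality

# The embedder keeps only the first embedding_dim components
truncated = [float(x) for x in raw_embedding[:embedding_dim]]

print(len(truncated))  # 768
```

Note that plain truncation discards information; how well the shortened vectors preserve similarity rankings depends on the model.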

Batch Processing

Voyage AI efficiently processes batches:
# Batch embedding
texts = [f"Document {i}" for i in range(100)]
vectors = await embedder.create_batch(texts)

# All vectors truncated to embedding_dim
for vector in vectors:
    print(len(vector))  # 1024 (or configured dim)
Implementation:
result = await self.client.embed(input_data_list, model=self.config.embedding_model)
return [
    [float(x) for x in embedding[:self.config.embedding_dim]]
    for embedding in result.embeddings
]
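Embedding APIs typically cap how many texts one request may carry (the chunk size of 128 below is an assumption; check Voyage AI's limits), so very large corpora are often embedded in chunks. A sketch of a chunked wrapper around any create_batch-style coroutine:

```python
import asyncio
from typing import Awaitable, Callable


async def embed_in_chunks(
    texts: list[str],
    create_batch: Callable[[list[str]], Awaitable[list[list[float]]]],
    chunk_size: int = 128,  # assumed per-request limit
) -> list[list[float]]:
    """Embed a large corpus by splitting it into chunk_size batches."""
    vectors: list[list[float]] = []
    for start in range(0, len(texts), chunk_size):
        chunk = texts[start:start + chunk_size]
        vectors.extend(await create_batch(chunk))
    return vectors


# Demo with a stub in place of embedder.create_batch
async def fake_create_batch(chunk: list[str]) -> list[list[float]]:
    return [[float(len(t))] for t in chunk]


async def main() -> None:
    texts = [f"Document {i}" for i in range(300)]
    vectors = await embed_in_chunks(texts, fake_create_batch, chunk_size=128)
    print(len(vectors))  # 300


asyncio.run(main())
```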

Error Handling

try:
    vector = await embedder.create("text")
except Exception as e:
    # Handle API errors
    print(f"Embedding failed: {e}")
Common errors:
  • Authentication error: Invalid API key
  • Rate limit error: Too many requests
  • Input validation error: Invalid input format
  • Network error: Connection issues
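Transient failures such as rate limits and network errors are usually worth retrying. This generic retry-with-exponential-backoff sketch (not part of graphiti-core) wraps any async call:

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


async def with_retries(
    call: Callable[[], Awaitable[T]],
    max_attempts: int = 3,
    base_delay: float = 0.5,
) -> T:
    """Retry an async call, doubling the delay after each failure."""
    for attempt in range(max_attempts):
        try:
            return await call()
        except Exception:
            if attempt == max_attempts - 1:
                raise
            await asyncio.sleep(base_delay * (2 ** attempt))
    raise RuntimeError("unreachable")


# Demo: a call that fails twice, then succeeds
attempts = 0

async def flaky_embed() -> list[float]:
    global attempts
    attempts += 1
    if attempts < 3:
        raise ConnectionError("transient network error")
    return [0.1, 0.2, 0.3]


vector = asyncio.run(with_retries(flaky_embed, max_attempts=3, base_delay=0.01))
print(attempts)  # 3
```

In production, prefer catching the client's specific rate-limit and network exception types rather than bare Exception.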
Example: Semantic Search

from graphiti_core.embedder import VoyageAIEmbedder
from graphiti_core.embedder.voyage import VoyageAIEmbedderConfig
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Initialize embedder
embedder = VoyageAIEmbedder(
    config=VoyageAIEmbedderConfig(
        api_key="your-key",
        embedding_model="voyage-3",
        embedding_dim=1024
    )
)

# Corpus of documents
documents = [
    "Python is a high-level programming language.",
    "JavaScript is used for web development.",
    "Machine learning models learn from data.",
    "Neural networks are inspired by biological neurons.",
    "Databases store and manage structured data."
]

# Generate embeddings
doc_vectors = await embedder.create_batch(documents)

# Query
query = "What is a programming language?"
query_vector = await embedder.create(query)

# Compute similarities
similarities = cosine_similarity([query_vector], doc_vectors)[0]

# Find most relevant documents
top_indices = np.argsort(similarities)[::-1][:3]

print(f"Query: {query}\n")
for idx in top_indices:
    print(f"Similarity: {similarities[idx]:.4f}")
    print(f"Document: {documents[idx]}\n")
Example: Code Search

from graphiti_core.embedder import VoyageAIEmbedder
from graphiti_core.embedder.voyage import VoyageAIEmbedderConfig

# Use voyage-code-2 for code embeddings
embedder = VoyageAIEmbedder(
    config=VoyageAIEmbedderConfig(
        api_key="your-key",
        embedding_model="voyage-code-2",
        embedding_dim=1024
    )
)

# Code snippets
code_snippets = [
    "def factorial(n): return 1 if n <= 1 else n * factorial(n-1)",
    "function fibonacci(n) { return n <= 1 ? n : fibonacci(n-1) + fibonacci(n-2); }",
    "class BinaryTree { constructor(value) { this.value = value; } }",
    "async function fetchData(url) { const response = await fetch(url); return response.json(); }"
]

# Embed code
code_vectors = await embedder.create_batch(code_snippets)

# Search query
query = "recursive function implementation"
query_vector = await embedder.create(query)

# Find most similar code
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity
similarities = cosine_similarity([query_vector], code_vectors)[0]
best_match_idx = np.argmax(similarities)

print(f"Best match: {code_snippets[best_match_idx]}")

Use with Graphiti

from graphiti_core import Graphiti
from graphiti_core.embedder import VoyageAIEmbedder
from graphiti_core.embedder.voyage import VoyageAIEmbedderConfig

embedder = VoyageAIEmbedder(
    config=VoyageAIEmbedderConfig(
        api_key="your-voyage-key",
        embedding_model="voyage-3",
        embedding_dim=1024
    )
)

graphiti = Graphiti(
    uri="neo4j://localhost:7687",
    user="neo4j",
    password="password",
    embedder=embedder
)

# Voyage embeddings used for all graph operations
await graphiti.add_episode(
    name="episode1",
    episode_body="Your text here...",
    source_description="source1"
)

Performance Tips

  1. Use voyage-3 for general tasks: Best performance/speed balance
  2. Use voyage-code-2 for code: Optimized for code similarity
  3. Use voyage-lite-02-instruct for speed: Faster but lower quality
  4. Batch requests: Always use create_batch() for multiple inputs
  5. Set appropriate dimensions: Lower dims = faster search
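Tip 5's trade-off is easy to quantify: stored as float32, a flat index of n vectors at dimension d takes roughly n × d × 4 bytes, and brute-force similarity search scales linearly with d (this assumes plain float32 storage; real vector stores add overhead):

```python
def index_size_bytes(num_vectors: int, dim: int, bytes_per_float: int = 4) -> int:
    """Approximate memory for a flat float32 vector index."""
    return num_vectors * dim * bytes_per_float


one_million = 1_000_000
print(index_size_bytes(one_million, 1024) / 1e9)  # ~4.1 GB
print(index_size_bytes(one_million, 768) / 1e9)   # ~3.1 GB
```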

Model Comparison

Model                     Dims   Context   Best For
voyage-3                  1024   32K       General purpose
voyage-2                  1024   16K       Backwards compatibility
voyage-large-2            1536   16K       Maximum quality
voyage-code-2             1536   16K       Code search
voyage-lite-02-instruct   1024   4K        Speed

API Key Setup

Get your Voyage AI API key from https://www.voyageai.com:
# Set environment variable
export VOYAGE_API_KEY="your-key"

# Or pass directly in config
config = VoyageAIEmbedderConfig(
    api_key="your-key",
    embedding_model="voyage-3"
)
