## Overview

The `VoyageAIEmbedder` provides embeddings using Voyage AI's embedding models, including `voyage-3`, which offers state-of-the-art performance for retrieval tasks.
## Installation

```bash
pip install graphiti-core[voyageai]
```
## Basic Usage

```python
from graphiti_core.embedder import VoyageAIEmbedder
from graphiti_core.embedder.voyage import VoyageAIEmbedderConfig

# Initialize embedder
embedder = VoyageAIEmbedder(
    config=VoyageAIEmbedderConfig(
        api_key="your-voyage-api-key",
        embedding_model="voyage-3",
        embedding_dim=1024
    )
)

# Single embedding
vector = await embedder.create("Hello, world!")
print(len(vector))  # 1024

# Batch embeddings
texts = [
    "First document",
    "Second document",
    "Third document"
]
vectors = await embedder.create_batch(texts)
print(len(vectors))  # 3
```
## Configuration

### VoyageAIEmbedderConfig

**embedding_model** (`str`)
Voyage AI model to use. Options:
- `voyage-3` (default, latest model)
- `voyage-2`
- `voyage-large-2`
- `voyage-code-2`
- `voyage-lite-02-instruct`

**embedding_dim** (`int`)
Output embedding dimensionality. Embeddings with larger native dimensions are truncated to this size.

**api_key** (`str | None`)
Voyage AI API key. If not provided, uses the `VOYAGE_API_KEY` environment variable.

### Constructor

**config** (`VoyageAIEmbedderConfig | None`, default: `None`)
Configuration object. If `None`, creates a default config with the `voyage-3` model.
## Supported Models

### voyage-3 (Recommended)

- Native dimensions: 1024
- Context length: 32K tokens
- Best for: General purpose, state-of-the-art performance

```python
config = VoyageAIEmbedderConfig(
    embedding_model="voyage-3",
    embedding_dim=1024
)
```

### voyage-2

- Native dimensions: 1024
- Context length: 16K tokens
- Best for: Backwards compatibility

### voyage-large-2

- Native dimensions: 1536
- Context length: 16K tokens
- Best for: Maximum quality

```python
config = VoyageAIEmbedderConfig(
    embedding_model="voyage-large-2",
    embedding_dim=1024  # Truncate from 1536
)
```

### voyage-code-2

- Native dimensions: 1536
- Context length: 16K tokens
- Best for: Code search and retrieval

```python
config = VoyageAIEmbedderConfig(
    embedding_model="voyage-code-2",
    embedding_dim=1024
)
```

### voyage-lite-02-instruct

- Native dimensions: 1024
- Context length: 4K tokens
- Best for: Fast, lightweight tasks
## Methods

### create()

Generate a single embedding vector.

```python
vector = await embedder.create("Your text here")
```

**Parameters:**
- `input_data` (`str | list[str] | Iterable[int] | Iterable[Iterable[int]]`): Input to embed

**Returns:** `list[float]` - Embedding vector

Special handling:
- Converts non-string inputs to strings
- Filters out empty strings
- Returns an empty list if no valid input remains

### create_batch()

Generate embeddings for multiple texts.

```python
texts = ["Text 1", "Text 2", "Text 3"]
vectors = await embedder.create_batch(texts)
```

**Parameters:**
- `input_data_list` (`list[str]`): List of texts to embed

**Returns:** `list[list[float]]` - List of embedding vectors
## Input Handling

The Voyage embedder has special input handling:

```python
# String input (standard)
vector = await embedder.create("Hello, world!")

# List of strings: each item is embedded; the first vector is returned
input_list = ["Hello", "world"]
vector = await embedder.create(input_list)

# Non-string iterables (converted to strings)
vector = await embedder.create([1, 2, 3, 4])
# Converts each item to a string: ["1", "2", "3", "4"]

# Empty input handling
vector = await embedder.create("")
print(vector)  # []

vector = await embedder.create([])
print(vector)  # []
```

Implementation:

```python
if isinstance(input_data, str):
    input_list = [input_data]
elif isinstance(input_data, list):
    input_list = [str(i) for i in input_data if i]
else:
    input_list = [str(i) for i in input_data if i is not None]
    input_list = [i for i in input_list if i]

if len(input_list) == 0:
    return []
```
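This filtering logic can be exercised on its own. A standalone sketch that mirrors the behavior above (`normalize_input` is an illustrative helper, not a function exported by graphiti-core):

```python
from typing import Any

def normalize_input(input_data: Any) -> list[str]:
    """Mirror the embedder's input filtering: stringify items, drop empties."""
    if isinstance(input_data, str):
        input_list = [input_data]
    elif isinstance(input_data, list):
        input_list = [str(i) for i in input_data if i]
    else:
        input_list = [str(i) for i in input_data if i is not None]
    return [i for i in input_list if i]

print(normalize_input("hello"))       # ['hello']
print(normalize_input([1, 2, 0, 3]))  # ['1', '2', '3'] (falsy 0 is dropped)
print(normalize_input(""))            # []
```

Note the asymmetry: list inputs drop any falsy item (including `0`), while other iterables drop only `None` before stringifying, so `0` survives as `"0"`.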
## Dimension Truncation

Voyage embeddings are truncated to `embedding_dim`:

```python
# voyage-large-2 returns 1536 dimensions
embedder = VoyageAIEmbedder(
    config=VoyageAIEmbedderConfig(
        embedding_model="voyage-large-2",
        embedding_dim=768  # Truncate to 768
    )
)

vector = await embedder.create("text")
print(len(vector))  # 768 (truncated from 1536)
```

Implementation:

```python
return [float(x) for x in result.embeddings[0][:self.config.embedding_dim]]
```
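The truncation itself is a plain slice plus float coercion; a standalone sketch of the same operation (`truncate_embedding` is an illustrative helper, not part of the library):

```python
def truncate_embedding(embedding: list[float], dim: int) -> list[float]:
    """Keep the first `dim` components, coercing each to float."""
    return [float(x) for x in embedding[:dim]]

native = [float(i) for i in range(1536)]  # stand-in for a 1536-dim embedding
truncated = truncate_embedding(native, 768)
print(len(truncated))  # 768
```

Since truncation changes vector norms, compare only vectors truncated to the same dimensionality when computing similarities.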
## Batch Processing

Voyage AI efficiently processes batches:

```python
# Batch embedding
texts = [f"Document {i}" for i in range(100)]
vectors = await embedder.create_batch(texts)

# All vectors truncated to embedding_dim
for vector in vectors:
    print(len(vector))  # 1024 (or the configured dim)
```

Implementation:

```python
result = await self.client.embed(input_data_list, model=self.config.embedding_model)
return [
    [float(x) for x in embedding[:self.config.embedding_dim]]
    for embedding in result.embeddings
]
```
## Error Handling

```python
try:
    vector = await embedder.create("text")
except Exception as e:
    # Handle API errors
    print(f"Embedding failed: {e}")
```

Common errors:
- Authentication error: invalid API key
- Rate limit error: too many requests
- Input validation error: invalid input format
- Network error: connection issues
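For transient failures such as rate limits, a simple retry wrapper with exponential backoff is often enough. A sketch (the retry policy, delays, and `embed_with_retry` helper are illustrative, not part of graphiti-core):

```python
import asyncio

async def embed_with_retry(embed_fn, text: str, retries: int = 3, base_delay: float = 1.0):
    """Call an async embedding function, backing off exponentially on failure."""
    for attempt in range(retries):
        try:
            return await embed_fn(text)
        except Exception:
            if attempt == retries - 1:
                raise  # out of retries: surface the error
            await asyncio.sleep(base_delay * 2 ** attempt)

# Demonstration with a flaky stand-in for embedder.create
calls = {"n": 0}

async def flaky_embed(text: str) -> list[float]:
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("rate limited")
    return [0.1, 0.2, 0.3]

vector = asyncio.run(embed_with_retry(flaky_embed, "hello", base_delay=0.01))
print(vector)  # [0.1, 0.2, 0.3] (succeeded on the third attempt)
```

In production you would likely catch only the provider's rate-limit and network exceptions rather than bare `Exception`.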
## Example: Semantic Search

```python
from graphiti_core.embedder import VoyageAIEmbedder
from graphiti_core.embedder.voyage import VoyageAIEmbedderConfig
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Initialize embedder
embedder = VoyageAIEmbedder(
    config=VoyageAIEmbedderConfig(
        api_key="your-key",
        embedding_model="voyage-3",
        embedding_dim=1024
    )
)

# Corpus of documents
documents = [
    "Python is a high-level programming language.",
    "JavaScript is used for web development.",
    "Machine learning models learn from data.",
    "Neural networks are inspired by biological neurons.",
    "Databases store and manage structured data."
]

# Generate embeddings
doc_vectors = await embedder.create_batch(documents)

# Query
query = "What is a programming language?"
query_vector = await embedder.create(query)

# Compute similarities
similarities = cosine_similarity([query_vector], doc_vectors)[0]

# Find the most relevant documents
top_indices = np.argsort(similarities)[::-1][:3]

print(f"Query: {query}\n")
for idx in top_indices:
    print(f"Similarity: {similarities[idx]:.4f}")
    print(f"Document: {documents[idx]}\n")
```
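If you would rather not pull in scikit-learn just for cosine similarity, the same ranking can be done with NumPy alone (a standalone sketch; `cosine_similarities` is an illustrative helper, not part of graphiti-core):

```python
import numpy as np

def cosine_similarities(query: list[float], docs: list[list[float]]) -> np.ndarray:
    """Cosine similarity between one query vector and each document vector."""
    q = np.asarray(query, dtype=float)
    d = np.asarray(docs, dtype=float)
    q = q / np.linalg.norm(q)                          # unit-normalize the query
    d = d / np.linalg.norm(d, axis=1, keepdims=True)   # unit-normalize each doc row
    return d @ q                                       # dot products = cosines

sims = cosine_similarities([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
# sims[0] is 1.0 (identical direction), sims[1] is 0.0 (orthogonal)
```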
## Example: Code Search

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

from graphiti_core.embedder import VoyageAIEmbedder
from graphiti_core.embedder.voyage import VoyageAIEmbedderConfig

# Use voyage-code-2 for code embeddings
embedder = VoyageAIEmbedder(
    config=VoyageAIEmbedderConfig(
        api_key="your-key",
        embedding_model="voyage-code-2",
        embedding_dim=1024
    )
)

# Code snippets
code_snippets = [
    "def factorial(n): return 1 if n <= 1 else n * factorial(n-1)",
    "function fibonacci(n) { return n <= 1 ? n : fibonacci(n-1) + fibonacci(n-2); }",
    "class BinaryTree { constructor(value) { this.value = value; } }",
    "async function fetchData(url) { const response = await fetch(url); return response.json(); }"
]

# Embed code
code_vectors = await embedder.create_batch(code_snippets)

# Search query
query = "recursive function implementation"
query_vector = await embedder.create(query)

# Find the most similar code
similarities = cosine_similarity([query_vector], code_vectors)[0]
best_match_idx = np.argmax(similarities)

print(f"Best match: {code_snippets[best_match_idx]}")
```
## Use with Graphiti

```python
from graphiti_core import Graphiti
from graphiti_core.embedder import VoyageAIEmbedder
from graphiti_core.embedder.voyage import VoyageAIEmbedderConfig

embedder = VoyageAIEmbedder(
    config=VoyageAIEmbedderConfig(
        api_key="your-voyage-key",
        embedding_model="voyage-3",
        embedding_dim=1024
    )
)

graphiti = Graphiti(
    uri="neo4j://localhost:7687",
    user="neo4j",
    password="password",
    embedder=embedder
)

# Voyage embeddings are used for all graph operations
await graphiti.add_episode(
    name="episode1",
    episode_body="Your text here...",
    source_description="source1"
)
```
## Best Practices

- Use `voyage-3` for general tasks: best performance/speed balance
- Use `voyage-code-2` for code: optimized for code similarity
- Use `voyage-lite-02-instruct` for speed: faster but lower quality
- Batch requests: always use `create_batch()` for multiple inputs
- Set appropriate dimensions: lower dimensions mean faster search
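When embedding a large corpus, it can help to split it into fixed-size chunks before calling `create_batch()`, since embedding APIs typically cap the number of texts per request (the chunk size of 128 below is an assumed placeholder; check Voyage AI's current limits):

```python
from typing import Iterator

def chunked(items: list[str], size: int) -> Iterator[list[str]]:
    """Yield consecutive chunks of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

# Hypothetical usage with an embedder:
#   vectors = []
#   for batch in chunked(corpus, 128):
#       vectors.extend(await embedder.create_batch(batch))

print([len(c) for c in chunked([f"doc {i}" for i in range(10)], 4)])  # [4, 4, 2]
```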
## Model Comparison

| Model | Dims | Context | Best For |
|---|---|---|---|
| voyage-3 | 1024 | 32K | General purpose |
| voyage-2 | 1024 | 16K | Backwards compatibility |
| voyage-large-2 | 1536 | 16K | Maximum quality |
| voyage-code-2 | 1536 | 16K | Code search |
| voyage-lite-02-instruct | 1024 | 4K | Speed |
## API Key Setup

Get your Voyage AI API key from https://www.voyageai.com:

```bash
# Set environment variable
export VOYAGE_API_KEY="your-key"
```

```python
# Or pass it directly in the config
config = VoyageAIEmbedderConfig(
    api_key="your-key",
    embedding_model="voyage-3"
)
```