Graphiti uses large language models (LLMs) to extract entities and relationships from your content and to generate summaries. Configure your preferred LLM provider to power these operations.
Supported Providers
OpenAI - GPT-4o, GPT-4o-mini, GPT-5, and more
Azure OpenAI - Enterprise OpenAI models on Azure
Anthropic - Claude 3.5 Sonnet, Claude 3 Opus, and more
Google Gemini - Gemini Pro and Gemini Flash
Groq - Fast inference with Llama, Mixtral, and more
Default Provider (OpenAI)
By default, Graphiti uses OpenAI’s GPT-4o-mini:
from graphiti_core import Graphiti
import os

# Set your API key
os.environ["OPENAI_API_KEY"] = "sk-..."

# Uses OpenAI by default
graphiti = Graphiti(
    uri="bolt://localhost:7687",
    user="neo4j",
    password="password"
)
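With the client configured, you can start ingesting; a minimal usage sketch (the episode fields mirror the add_episode call in Error Handling below, and the episode text is illustrative):

from datetime import datetime, timezone
from graphiti_core.nodes import EpisodeType

# One-time database setup, then ingest a first episode
await graphiti.build_indices_and_constraints()
await graphiti.add_episode(
    name="first-episode",
    episode_body="Alice met Bob at the conference.",
    source=EpisodeType.text,
    source_description="example",
    reference_time=datetime.now(timezone.utc)
)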
OpenAI Configuration
Basic Setup
from graphiti_core import Graphiti
from graphiti_core.llm_client import OpenAIClient, LLMConfig

# Configure the OpenAI client
llm_config = LLMConfig(
    api_key="sk-...",
    model="gpt-4o",
    small_model="gpt-4o-mini",
    temperature=0.7,
    max_tokens=4096
)

llm_client = OpenAIClient(config=llm_config)

graphiti = Graphiti(
    uri="bolt://localhost:7687",
    user="neo4j",
    password="password",
    llm_client=llm_client
)
Configuration Options
| Parameter | Type | Default | Description |
|---|---|---|---|
| api_key | str | From env | OpenAI API key |
| model | str | "gpt-4o-mini" | Primary model for extraction |
| small_model | str | "gpt-4o-nano" | Model for smaller tasks |
| temperature | float | 0.0 | Sampling temperature (0-2) |
| max_tokens | int | 4096 | Maximum tokens per request |
| base_url | str | None | Custom API endpoint |
Environment Variables
OPENAI_API_KEY=sk-...
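Per the table above, api_key falls back to the environment when omitted; a minimal sketch relying on that default (assuming OPENAI_API_KEY is already exported):

from graphiti_core.llm_client import OpenAIClient, LLMConfig

# api_key is omitted; per the defaults above it is read from OPENAI_API_KEY
llm_config = LLMConfig(
    model="gpt-4o",
    small_model="gpt-4o-mini"
)
llm_client = OpenAIClient(config=llm_config)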
Recommended Models
gpt-4o - Best quality for complex extraction
gpt-4o-mini - Balanced performance and cost
gpt-5-mini - Fast extraction with good quality
o1-mini - Reasoning model for complex relationships
Azure OpenAI
Use OpenAI models deployed on Azure:
from graphiti_core import Graphiti
from graphiti_core.llm_client.azure_openai_client import AzureOpenAILLMClient
from graphiti_core.llm_client.config import LLMConfig
from openai import AsyncOpenAI

# Create Azure OpenAI client
azure_client = AsyncOpenAI(
    base_url="https://your-resource.openai.azure.com/openai/v1/",
    api_key="your-azure-api-key"
)

# Configure LLM client
llm_client = AzureOpenAILLMClient(
    azure_client=azure_client,
    config=LLMConfig(
        model="gpt-4.1",  # Your Azure deployment name
        small_model="gpt-4.1"
    )
)

graphiti = Graphiti(
    uri="bolt://localhost:7687",
    user="neo4j",
    password="password",
    llm_client=llm_client
)
Environment Variables
AZURE_OPENAI_ENDPOINT=https://your-resource.openai.azure.com
AZURE_OPENAI_API_KEY=your-key
AZURE_OPENAI_DEPLOYMENT=gpt-4.1
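Those variables can feed the client shown above directly; a minimal sketch (appending the /openai/v1/ suffix to match the base_url format used in the example):

import os
from openai import AsyncOpenAI
from graphiti_core.llm_client.azure_openai_client import AzureOpenAILLMClient
from graphiti_core.llm_client.config import LLMConfig

# Build the Azure client from the environment variables above
azure_client = AsyncOpenAI(
    base_url=f"{os.environ['AZURE_OPENAI_ENDPOINT']}/openai/v1/",
    api_key=os.environ["AZURE_OPENAI_API_KEY"]
)
llm_client = AzureOpenAILLMClient(
    azure_client=azure_client,
    config=LLMConfig(model=os.environ["AZURE_OPENAI_DEPLOYMENT"])
)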
Anthropic (Claude)
Use Claude models for extraction:
from graphiti_core import Graphiti
from graphiti_core.llm_client import AnthropicClient, LLMConfig

llm_config = LLMConfig(
    api_key="sk-ant-...",
    model="claude-3-5-sonnet-20241022",
    small_model="claude-3-5-haiku-20241022",
    temperature=0.0,
    max_tokens=4096
)

llm_client = AnthropicClient(config=llm_config)

graphiti = Graphiti(
    uri="bolt://localhost:7687",
    user="neo4j",
    password="password",
    llm_client=llm_client
)
Environment Variables
ANTHROPIC_API_KEY=sk-ant-...
Recommended Models
claude-3-5-sonnet-20241022 - Best quality and reasoning
claude-3-5-haiku-20241022 - Fast and cost-effective
claude-3-opus-20240229 - Maximum capability
Google Gemini
Use Google’s Gemini models:
from graphiti_core import Graphiti
from graphiti_core.llm_client.gemini_client import GeminiClient
from graphiti_core.llm_client.config import LLMConfig

llm_config = LLMConfig(
    api_key="your-google-api-key",
    model="gemini-1.5-pro",
    small_model="gemini-1.5-flash",
    temperature=0.0,
    max_tokens=4096
)

llm_client = GeminiClient(config=llm_config)

graphiti = Graphiti(
    uri="bolt://localhost:7687",
    user="neo4j",
    password="password",
    llm_client=llm_client
)
Environment Variables
GOOGLE_API_KEY=your-google-api-key
Recommended Models
gemini-1.5-pro - Best quality
gemini-1.5-flash - Fast inference
gemini-2.0-flash - Latest fast model
Groq
Use Groq for ultra-fast inference:
from graphiti_core import Graphiti
from graphiti_core.llm_client.groq_client import GroqClient
from graphiti_core.llm_client.config import LLMConfig

llm_config = LLMConfig(
    api_key="gsk_...",
    model="llama-3.3-70b-versatile",
    small_model="llama-3.1-8b-instant",
    temperature=0.0,
    max_tokens=4096
)

llm_client = GroqClient(config=llm_config)

graphiti = Graphiti(
    uri="bolt://localhost:7687",
    user="neo4j",
    password="password",
    llm_client=llm_client
)
Environment Variables
GROQ_API_KEY=gsk_...
Recommended Models
llama-3.3-70b-versatile - Best Llama model
llama-3.1-8b-instant - Fast inference
mixtral-8x7b-32768 - Good for long contexts
Custom Base URLs
Use custom endpoints for OpenAI-compatible APIs:
from graphiti_core.llm_client import OpenAIClient, LLMConfig

llm_config = LLMConfig(
    api_key="your-key",
    model="custom-model-name",
    base_url="https://api.your-provider.com/v1"
)
llm_client = OpenAIClient(config=llm_config)
This works with:
OpenRouter
Together AI
Local LLM servers (Ollama, vLLM, etc.)
Any OpenAI-compatible API
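For example, a local Ollama server exposes an OpenAI-compatible API on its default port; a minimal sketch (the model name llama3.1 is an assumption, so substitute any model you have pulled):

from graphiti_core.llm_client import OpenAIClient, LLMConfig

llm_config = LLMConfig(
    api_key="ollama",  # Ollama ignores the key, but the client expects one
    model="llama3.1",  # assumption: use a model you have pulled locally
    base_url="http://localhost:11434/v1"
)
llm_client = OpenAIClient(config=llm_config)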
Token Tracking
Graphiti tracks token usage across all LLM calls:
# Add some episodes
await graphiti.add_episode(...)
await graphiti.add_episode(...)

# Get token usage summary
graphiti.token_tracker.print_summary(sort_by='prompt_name')

# Or access programmatically
usage = graphiti.token_tracker.get_total_usage()
print(f"Total tokens: {usage['total_tokens']}")
print(f"Total cost: ${usage['total_cost']:.4f}")

# Get usage by prompt type
by_prompt = graphiti.token_tracker.get_usage()
for prompt_name, stats in by_prompt.items():
    print(f"{prompt_name}: {stats['total_tokens']} tokens")

# Reset tracking
graphiti.token_tracker.reset()
Model Selection Strategy
Graphiti uses two model types:
Primary Model
Used for entity extraction, relationship extraction, and complex reasoning.
Recommended: gpt-4o (OpenAI), claude-3-5-sonnet-20241022 (Anthropic), gemini-1.5-pro (Gemini)

Small Model
Used for simple classifications, quick summaries, and lightweight tasks.
Recommended: gpt-4o-mini (OpenAI), claude-3-5-haiku-20241022 (Anthropic), gemini-1.5-flash (Gemini)
Cost Optimization
Use small models - Set small_model to a cost-effective option like gpt-4o-mini or claude-3-5-haiku-20241022
Batch episodes - Use add_episode_bulk() to process multiple episodes efficiently, as shown in the sketch below
Lower temperature - Use temperature=0.0 for deterministic, focused outputs
Track usage - Monitor token usage with token_tracker to identify optimization opportunities
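A minimal bulk-ingestion sketch for the batching tip above (the RawEpisode import path and field names are assumptions based on graphiti_core's bulk utilities, so verify them against your installed version):

from datetime import datetime, timezone
from graphiti_core.nodes import EpisodeType
from graphiti_core.utils.bulk_utils import RawEpisode

# Collect episodes up front, then ingest them in a single call
episodes = [
    RawEpisode(
        name=f"note-{i}",
        content=text,
        source=EpisodeType.text,
        source_description="bulk import",
        reference_time=datetime.now(timezone.utc)
    )
    for i, text in enumerate(["First note", "Second note"])
]
await graphiti.add_episode_bulk(episodes)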
Reasoning Models
For OpenAI’s reasoning models (o1, o3 series), configure reasoning effort:
from graphiti_core.llm_client import OpenAIClient, LLMConfig

llm_client = OpenAIClient(
    config=LLMConfig(
        model="o1-mini",
        small_model="gpt-4o-mini"
    ),
    reasoning="medium",  # low, medium, high
    verbosity="concise"  # concise, standard, detailed
)
Reasoning models don’t support the temperature parameter. It’s automatically set to None.
Caching
Enable LLM response caching to reduce costs and latency:
from graphiti_core.llm_client import OpenAIClient, LLMConfig

llm_client = OpenAIClient(
    config=LLMConfig(model="gpt-4o-mini"),
    cache=True  # Enable caching
)
Caching stores responses in memory. Use with caution in production environments.
Error Handling
from datetime import datetime, timezone
from graphiti_core.nodes import EpisodeType

try:
    result = await graphiti.add_episode(
        name="Test",
        episode_body="Content",
        source=EpisodeType.text,
        source_description="Test",
        reference_time=datetime.now(timezone.utc)
    )
except Exception as e:
    print(f"LLM error: {e}")
    # Handle rate limits, API errors, etc.
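Transient failures such as rate limits are often worth retrying; a minimal exponential-backoff sketch (the broad Exception catch is deliberate here, so narrow it to your provider's rate-limit error class where possible):

import asyncio

async def add_episode_with_retry(graphiti, max_attempts=3, **episode_kwargs):
    # Retry with exponential backoff: 1s, 2s, 4s, ...
    for attempt in range(max_attempts):
        try:
            return await graphiti.add_episode(**episode_kwargs)
        except Exception:
            if attempt == max_attempts - 1:
                raise
            await asyncio.sleep(2 ** attempt)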
Next Steps
Embeddings - Configure embedding providers for semantic search
Graph Drivers - Choose and configure your graph database
Adding Episodes - Start adding content to your knowledge graph