Graphiti uses large language models (LLMs) to extract entities and relationships and to generate summaries from your content. Configure your preferred LLM provider to power these operations.

Supported Providers

  • OpenAI - GPT-4o, GPT-4o-mini, GPT-5, and more
  • Azure OpenAI - Enterprise OpenAI models on Azure
  • Anthropic - Claude 3.5 Sonnet, Claude 3 Opus, and more
  • Google Gemini - Gemini Pro and Gemini Flash
  • Groq - Fast inference with Llama, Mixtral, and more

Default Provider (OpenAI)

By default, Graphiti uses OpenAI’s GPT-4o-mini:
from graphiti_core import Graphiti
import os

# Set your API key
os.environ["OPENAI_API_KEY"] = "sk-..."

# Uses OpenAI by default
graphiti = Graphiti(
    uri="bolt://localhost:7687",
    user="neo4j",
    password="password"
)

OpenAI Configuration

Basic Setup

from graphiti_core import Graphiti
from graphiti_core.llm_client import OpenAIClient, LLMConfig

# Configure OpenAI client
llm_config = LLMConfig(
    api_key="sk-...",
    model="gpt-4o",
    small_model="gpt-4o-mini",
    temperature=0.7,
    max_tokens=4096
)

llm_client = OpenAIClient(config=llm_config)

graphiti = Graphiti(
    uri="bolt://localhost:7687",
    user="neo4j",
    password="password",
    llm_client=llm_client
)

Configuration Options

Parameter     Type    Default          Description
api_key       str     From env         OpenAI API key
model         str     "gpt-4o-mini"    Primary model for extraction
small_model   str     "gpt-4o-nano"    Model for smaller tasks
temperature   float   0.0              Sampling temperature (0-2)
max_tokens    int     4096             Maximum tokens per request
base_url      str     None             Custom API endpoint

Environment Variables

OPENAI_API_KEY=sk-...

Recommended models:
  • gpt-4o - Best quality for complex extraction
  • gpt-4o-mini - Balanced performance and cost
  • gpt-5-mini - Fast extraction with good quality
  • o1-mini - Reasoning model for complex relationships

Azure OpenAI

Use OpenAI models deployed on Azure:
from graphiti_core import Graphiti
from graphiti_core.llm_client.azure_openai_client import AzureOpenAILLMClient
from graphiti_core.llm_client.config import LLMConfig
from openai import AsyncOpenAI

# Create Azure OpenAI client
azure_client = AsyncOpenAI(
    base_url="https://your-resource.openai.azure.com/openai/v1/",
    api_key="your-azure-api-key"
)

# Configure LLM client
llm_client = AzureOpenAILLMClient(
    azure_client=azure_client,
    config=LLMConfig(
        model="gpt-4.1",  # Your Azure deployment name
        small_model="gpt-4.1"
    )
)

graphiti = Graphiti(
    uri="bolt://localhost:7687",
    user="neo4j",
    password="password",
    llm_client=llm_client
)

Environment Variables

AZURE_OPENAI_ENDPOINT=https://your-resource.openai.azure.com
AZURE_OPENAI_API_KEY=your-key
AZURE_OPENAI_DEPLOYMENT=gpt-4.1
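
In practice you can build the client from these variables instead of hard-coding credentials. A minimal sketch, reusing the AzureOpenAILLMClient setup above; the os.environ lookups are the only addition:
import os

from graphiti_core.llm_client.azure_openai_client import AzureOpenAILLMClient
from graphiti_core.llm_client.config import LLMConfig
from openai import AsyncOpenAI

# Read the endpoint, key, and deployment name from the environment
azure_client = AsyncOpenAI(
    base_url=f"{os.environ['AZURE_OPENAI_ENDPOINT']}/openai/v1/",
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
)

llm_client = AzureOpenAILLMClient(
    azure_client=azure_client,
    config=LLMConfig(model=os.environ["AZURE_OPENAI_DEPLOYMENT"]),
)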

Anthropic (Claude)

Use Claude models for extraction:
from graphiti_core import Graphiti
from graphiti_core.llm_client import AnthropicClient, LLMConfig

llm_config = LLMConfig(
    api_key="sk-ant-...",
    model="claude-3-5-sonnet-20241022",
    small_model="claude-3-5-haiku-20241022",
    temperature=0.0,
    max_tokens=4096
)

llm_client = AnthropicClient(config=llm_config)

graphiti = Graphiti(
    uri="bolt://localhost:7687",
    user="neo4j",
    password="password",
    llm_client=llm_client
)

Environment Variables

ANTHROPIC_API_KEY=sk-ant-...

Recommended models:
  • claude-3-5-sonnet-20241022 - Best quality and reasoning
  • claude-3-5-haiku-20241022 - Fast and cost-effective
  • claude-3-opus-20240229 - Maximum capability

Google Gemini

Use Google’s Gemini models:
from graphiti_core import Graphiti
from graphiti_core.llm_client.gemini_client import GeminiClient
from graphiti_core.llm_client.config import LLMConfig

llm_config = LLMConfig(
    api_key="your-google-api-key",
    model="gemini-1.5-pro",
    small_model="gemini-1.5-flash",
    temperature=0.0,
    max_tokens=4096
)

llm_client = GeminiClient(config=llm_config)

graphiti = Graphiti(
    uri="bolt://localhost:7687",
    user="neo4j",
    password="password",
    llm_client=llm_client
)

Environment Variables

GOOGLE_API_KEY=your-key

Recommended models:
  • gemini-1.5-pro - Best quality
  • gemini-1.5-flash - Fast inference
  • gemini-2.0-flash - Latest fast model

Groq

Use Groq for ultra-fast inference:
from graphiti_core import Graphiti
from graphiti_core.llm_client.groq_client import GroqClient
from graphiti_core.llm_client.config import LLMConfig

llm_config = LLMConfig(
    api_key="gsk_...",
    model="llama-3.3-70b-versatile",
    small_model="llama-3.1-8b-instant",
    temperature=0.0,
    max_tokens=4096
)

llm_client = GroqClient(config=llm_config)

graphiti = Graphiti(
    uri="bolt://localhost:7687",
    user="neo4j",
    password="password",
    llm_client=llm_client
)

Environment Variables

GROQ_API_KEY=gsk_...

Recommended models:
  • llama-3.3-70b-versatile - Best Llama model
  • llama-3.1-8b-instant - Fast inference
  • mixtral-8x7b-32768 - Good for long contexts

Custom Base URLs

Use custom endpoints for OpenAI-compatible APIs:
from graphiti_core.llm_client import OpenAIClient, LLMConfig

llm_config = LLMConfig(
    api_key="your-key",
    model="custom-model-name",
    base_url="https://api.your-provider.com/v1"
)

llm_client = OpenAIClient(config=llm_config)
This works with:
  • OpenRouter
  • Together AI
  • Local LLM servers (Ollama, vLLM, etc.)
  • Any OpenAI-compatible API
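
For example, a minimal sketch pointing Graphiti at a local Ollama server, which serves an OpenAI-compatible API on port 11434; the model name assumes you have already pulled llama3.1:
from graphiti_core.llm_client import LLMConfig, OpenAIClient

# Ollama ignores the API key, but the client requires a value
llm_config = LLMConfig(
    api_key="ollama",
    model="llama3.1",
    base_url="http://localhost:11434/v1",
)

llm_client = OpenAIClient(config=llm_config)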

Token Tracking

Graphiti tracks token usage across all LLM calls:
# Add some episodes
await graphiti.add_episode(...)
await graphiti.add_episode(...)

# Get token usage summary
graphiti.token_tracker.print_summary(sort_by='prompt_name')

# Or access programmatically
usage = graphiti.token_tracker.get_total_usage()
print(f"Total tokens: {usage['total_tokens']}")
print(f"Total cost: ${usage['total_cost']:.4f}")

# Get usage by prompt type
by_prompt = graphiti.token_tracker.get_usage()
for prompt_name, stats in by_prompt.items():
    print(f"{prompt_name}: {stats['total_tokens']} tokens")

# Reset tracking
graphiti.token_tracker.reset()

Model Selection Strategy

Graphiti uses two model types:

Primary model (model)
Used for:
  • Entity extraction
  • Relationship extraction
  • Complex reasoning
Recommended:
  • OpenAI: gpt-4o
  • Anthropic: claude-3-5-sonnet-20241022
  • Gemini: gemini-1.5-pro

Small model (small_model)
Used for smaller tasks, such as generating summaries.
Recommended:
  • OpenAI: gpt-4o-mini
  • Anthropic: claude-3-5-haiku-20241022
  • Gemini: gemini-1.5-flash
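
Putting this strategy into a config, a brief sketch pairing the recommended OpenAI models (values taken from the recommendations above):
from graphiti_core.llm_client import LLMConfig, OpenAIClient

# Primary model for extraction and reasoning; small model for lighter tasks
llm_client = OpenAIClient(
    config=LLMConfig(
        model="gpt-4o",
        small_model="gpt-4o-mini",
    )
)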

Cost Optimization

Use Small Models

Set small_model to a cost-effective option like gpt-4o-mini or claude-3-5-haiku-20241022

Batch Episodes

Use add_episode_bulk() to process multiple episodes efficiently (see the sketch after these tips)

Lower Temperature

Use temperature=0.0 for deterministic, focused outputs

Track Usage

Monitor token usage with token_tracker to identify optimization opportunities
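
A minimal bulk-ingestion sketch; it assumes RawEpisode from graphiti_core.utils.bulk_utils, which is the input type add_episode_bulk() expects:
from datetime import datetime, timezone

from graphiti_core.nodes import EpisodeType
from graphiti_core.utils.bulk_utils import RawEpisode

# Collect episodes up front, then ingest them in a single bulk call
episodes = [
    RawEpisode(
        name=f"episode-{i}",
        content=text,
        source=EpisodeType.text,
        source_description="bulk import",
        reference_time=datetime.now(timezone.utc),
    )
    for i, text in enumerate(["First document", "Second document"])
]

await graphiti.add_episode_bulk(episodes)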

Reasoning Models

For OpenAI’s reasoning models (o1, o3 series), configure reasoning effort:
from graphiti_core.llm_client import OpenAIClient, LLMConfig

llm_client = OpenAIClient(
    config=LLMConfig(
        model="o1-mini",
        small_model="gpt-4o-mini"
    ),
    reasoning="medium",  # low, medium, high
    verbosity="concise"  # concise, standard, detailed
)
Reasoning models don’t support the temperature parameter. It’s automatically set to None.

Caching

Enable LLM response caching to reduce costs and latency:
from graphiti_core.llm_client import OpenAIClient, LLMConfig

llm_client = OpenAIClient(
    config=LLMConfig(model="gpt-4o-mini"),
    cache=True  # Enable caching
)
Caching stores responses in memory. Use with caution in production environments.

Error Handling

from datetime import datetime, timezone

from graphiti_core.nodes import EpisodeType

try:
    result = await graphiti.add_episode(
        name="Test",
        episode_body="Content",
        source=EpisodeType.text,
        source_description="Test",
        reference_time=datetime.now(timezone.utc),
    )
except Exception as e:
    # Handle rate limits, API errors, timeouts, etc.
    print(f"LLM error: {e}")

Next Steps

  • Embeddings - Configure embedding providers for semantic search
  • Graph Drivers - Choose and configure your graph database
  • Adding Episodes - Start adding content to your knowledge graph
