Overview
Graphiti’s LLM client architecture provides a unified interface for interacting with multiple language model providers. All clients extend the base LLMClient class and support structured output generation using Pydantic models.
Key Features
- Unified Interface: Single API across OpenAI, Anthropic, Gemini, and Azure OpenAI
- Structured Output: Automatic JSON schema generation from Pydantic models
- Automatic Retries: Built-in retry logic for transient failures
- Token Tracking: Monitor input/output token usage across providers
- Response Caching: Optional caching of LLM responses
- Tracing Support: OpenTelemetry-compatible tracing
- Multilingual Support: Automatic language preservation in extractions
Base Client Architecture
All LLM clients inherit from LLMClient (defined in graphiti_core/llm_client/client.py), which provides:
Core Methods
generate_response()
Generate a structured response from the language model.
Parameters:
- List of message objects with role and content fields
- Optional Pydantic model for structured output validation
- Maximum tokens to generate (uses config default if not specified)
- Size of model to use (small or medium)
- Optional partition identifier for the graph
- Optional name for tracing and token tracking

Returns: dict[str, Any] - Parsed response matching the response_model schema
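The structured-output flow can be sketched without the real client: the response model's fields define a schema, the model's JSON reply is parsed, and the result is validated against that schema. The stand-in below is illustrative only (a dataclass replaces the Pydantic model, and ExtractedEntity and parse_structured_response are hypothetical names); the actual client handles this internally.

```python
import json
from dataclasses import dataclass, fields

@dataclass
class ExtractedEntity:
    # Hypothetical response model; real callers pass a Pydantic model.
    name: str
    summary: str

def parse_structured_response(raw: str, model: type) -> dict:
    """Parse a JSON reply and check it supplies every field of `model`."""
    data = json.loads(raw)
    missing = [f.name for f in fields(model) if f.name not in data]
    if missing:
        raise ValueError(f"response missing fields: {missing}")
    return data

reply = '{"name": "Ada Lovelace", "summary": "Early computing pioneer."}'
entity = parse_structured_response(reply, ExtractedEntity)
```

On a validation failure, the real client retries with the error message included in the prompt so the model can self-correct (see Error Handling below).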
Configuration
Configuration object with the following fields:
- api_key (str | None): Authentication key for the LLM API
- model (str | None): Primary model name
- small_model (str | None): Model for simpler prompts
- base_url (str | None): Custom API endpoint
- temperature (float): Sampling temperature (default: 1.0)
- max_tokens (int): Maximum output tokens (default: 16384)
- cache: Enable response caching (stored in ./llm_cache)

Available Clients
OpenAI
GPT-4.1, GPT-5, and compatible models
Anthropic
Claude 3.7, 4.5, and Haiku models
Gemini
Gemini 2.5, 3.0 Flash, and Pro models
Azure OpenAI
OpenAI models via Azure endpoint
Error Handling
All clients implement consistent error handling:
- RateLimitError: Rate limit exceeded (no retry)
- RefusalError: Model refused to respond (no retry)
- Transient Errors: Automatic retry with exponential backoff (max 4 attempts)
- Validation Errors: Retry with error context for models to self-correct
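The retry behavior above can be sketched as a generic wrapper. The attempt count and backoff base are illustrative, and the two exception classes stand in for the library's own error types:

```python
import time

class RateLimitError(Exception): ...
class RefusalError(Exception): ...

def call_with_retries(fn, max_attempts=4, base_delay=1.0):
    """Retry transient failures with exponential backoff;
    never retry rate-limit or refusal errors."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except (RateLimitError, RefusalError):
            raise  # non-retryable by design
        except Exception:
            if attempt == max_attempts:
                raise
            time.sleep(base_delay * 2 ** (attempt - 1))

# A flaky call that succeeds on the third attempt:
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise TimeoutError("transient")
    return "ok"

result = call_with_retries(flaky, base_delay=0.01)
```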
Token Usage Tracking
All clients track token usage through the token_tracker attribute.
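A minimal accumulator illustrates the idea of per-prompt input/output counts; the TokenTracker class and record method below are hypothetical, and the real token_tracker API may differ.

```python
from collections import defaultdict

class TokenTracker:
    """Illustrative accumulator; not the library's actual implementation."""
    def __init__(self):
        self.usage = defaultdict(lambda: {"input": 0, "output": 0})

    def record(self, prompt_name, input_tokens, output_tokens):
        # Aggregate token counts per prompt name across calls.
        entry = self.usage[prompt_name]
        entry["input"] += input_tokens
        entry["output"] += output_tokens

tracker = TokenTracker()
tracker.record("extract_nodes", input_tokens=1200, output_tokens=300)
tracker.record("extract_nodes", input_tokens=800, output_tokens=150)
```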
Tracing
Set a custom tracer for observability. generate_response() calls create spans with the following attributes:
- llm.provider: Provider name
- model.size: Model size used
- max_tokens: Token limit
- cache.hit: Whether the response was cached
- prompt.name: Custom prompt identifier
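A recording stub shows the span shape a tracer would observe; RecordingTracer and start_span are hypothetical stand-ins for an OpenTelemetry-compatible tracer, and the attribute values are examples.

```python
from contextlib import contextmanager

class RecordingTracer:
    """Minimal stand-in for an OpenTelemetry-compatible tracer."""
    def __init__(self):
        self.spans = []

    @contextmanager
    def start_span(self, name, attributes):
        # Record the span; a real tracer would also time and export it.
        self.spans.append({"name": name, "attributes": attributes})
        yield

tracer = RecordingTracer()
with tracer.start_span("llm.generate_response", {
    "llm.provider": "openai",
    "model.size": "medium",
    "max_tokens": 16384,
    "cache.hit": False,
    "prompt.name": "extract_nodes",
}):
    pass  # the LLM call would run here
```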
Model Size Selection
Graphiti uses two-tier model sizing:
- ModelSize.medium: Primary model for complex reasoning (default)
- ModelSize.small: Faster, cheaper model for simple tasks
Both tiers are set on LLMConfig via the model and small_model fields.
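A sketch of the two-tier selection, using a dataclass stand-in whose field names mirror the documented configuration; the model names and the pick_model helper are illustrative, not the library's API.

```python
from dataclasses import dataclass
from enum import Enum

class ModelSize(Enum):
    small = "small"
    medium = "medium"

@dataclass
class LLMConfig:
    # Field names mirror the documented configuration; values are examples.
    model: str = "gpt-4.1"
    small_model: str = "gpt-4.1-mini"
    temperature: float = 1.0
    max_tokens: int = 16384

def pick_model(config: LLMConfig, size: ModelSize) -> str:
    """Route simple tasks to the small model, everything else to the primary."""
    return config.small_model if size is ModelSize.small else config.model

config = LLMConfig()
```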
Input Sanitization
All clients automatically clean input text:
- Removes invalid Unicode characters
- Strips zero-width characters (\u200b, \u200c, \u200d, \ufeff, \u2060)
- Filters control characters (except newlines, tabs, carriage returns)
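The zero-width and control-character rules can be sketched directly; clean_input is a hypothetical name, and the library's own sanitizer may differ in detail (e.g. in how it handles invalid Unicode such as lone surrogates, which this sketch does not cover).

```python
import re

# The zero-width characters listed above.
ZERO_WIDTH = "\u200b\u200c\u200d\ufeff\u2060"
# Control characters, excluding tab (\x09), newline (\x0a),
# and carriage return (\x0d).
CONTROL = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\x7f]")

def clean_input(text: str) -> str:
    """Drop zero-width characters, then disallowed control characters."""
    text = text.translate({ord(c): None for c in ZERO_WIDTH})
    return CONTROL.sub("", text)

cleaned = clean_input("hello\u200b world\x00\n")
```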