Overview
Graphiti’s LLM client architecture provides a unified interface for interacting with multiple language model providers. All clients extend the base LLMClient class and support structured output generation using Pydantic models.
Key Features
- Unified Interface: Single API across OpenAI, Anthropic, Gemini, and Azure OpenAI
- Structured Output: Automatic JSON schema generation from Pydantic models
- Automatic Retries: Built-in retry logic for transient failures
- Token Tracking: Monitor input/output token usage across providers
- Response Caching: Optional caching of LLM responses
- Tracing Support: OpenTelemetry-compatible tracing
- Multilingual Support: Automatic language preservation in extractions
Base Client Architecture
All LLM clients inherit from LLMClient (defined in graphiti_core/llm_client/client.py), which provides:
Core Methods
generate_response()
Generate a structured response from the language model.
Parameters:
- List of message objects with role and content fields
- Optional Pydantic model for structured output validation
- Maximum tokens to generate (uses config default if not specified)
- Size of model to use (small or medium)
- Optional partition identifier for the graph
- Optional name for tracing and token tracking

Returns: dict[str, Any] - Parsed response matching the response_model schema
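The structured-output flow can be sketched without the real client: the response model's fields define a schema, the model's JSON reply is parsed, and the result is validated against that schema. The stand-in below is illustrative only (a dataclass replaces the Pydantic model, and ExtractedEntity and parse_structured_response are hypothetical names); the actual client handles this internally.

```python
import json
from dataclasses import dataclass, fields

@dataclass
class ExtractedEntity:
    # Hypothetical response model; real callers pass a Pydantic model.
    name: str
    summary: str

def parse_structured_response(raw: str, model: type) -> dict:
    """Parse a JSON reply and check it supplies every field of `model`."""
    data = json.loads(raw)
    missing = [f.name for f in fields(model) if f.name not in data]
    if missing:
        raise ValueError(f"response missing fields: {missing}")
    return data

reply = '{"name": "Ada Lovelace", "summary": "Early computing pioneer."}'
entity = parse_structured_response(reply, ExtractedEntity)
```

On a validation failure, the real client retries with the error message included in the prompt so the model can self-correct (see Error Handling below).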
Configuration
Configuration object with the following fields:
- api_key (str | None): Authentication key for the LLM API
- model (str | None): Primary model name
- small_model (str | None): Model for simpler prompts
- base_url (str | None): Custom API endpoint
- temperature (float): Sampling temperature (default: 1.0)
- max_tokens (int): Maximum output tokens (default: 16384)
- cache: Enable response caching (stored in ./llm_cache)

Available Clients
OpenAI
GPT-4.1, GPT-5, and compatible models
Anthropic
Claude 3.7, 4.5, and Haiku models
Gemini
Gemini 2.5, 3.0 Flash, and Pro models
Azure OpenAI
OpenAI models via Azure endpoint
Error Handling
All clients implement consistent error handling:
- RateLimitError: Rate limit exceeded (no retry)
- RefusalError: Model refused to respond (no retry)
- Transient Errors: Automatic retry with exponential backoff (max 4 attempts)
- Validation Errors: Retry with error context for models to self-correct
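The retry behavior above can be sketched as a generic wrapper. The attempt count and backoff base are illustrative, and the two exception classes stand in for the library's own error types:

```python
import time

class RateLimitError(Exception): ...
class RefusalError(Exception): ...

def call_with_retries(fn, max_attempts=4, base_delay=1.0):
    """Retry transient failures with exponential backoff;
    never retry rate-limit or refusal errors."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except (RateLimitError, RefusalError):
            raise  # non-retryable by design
        except Exception:
            if attempt == max_attempts:
                raise
            time.sleep(base_delay * 2 ** (attempt - 1))

# A flaky call that succeeds on the third attempt:
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise TimeoutError("transient")
    return "ok"

result = call_with_retries(flaky, base_delay=0.01)
```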
Token Usage Tracking
All clients track token usage through the token_tracker attribute.
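A minimal accumulator illustrates the idea of per-prompt input/output counts; the TokenTracker class and record method below are hypothetical, and the real token_tracker API may differ.

```python
from collections import defaultdict

class TokenTracker:
    """Illustrative accumulator; not the library's actual implementation."""
    def __init__(self):
        self.usage = defaultdict(lambda: {"input": 0, "output": 0})

    def record(self, prompt_name, input_tokens, output_tokens):
        # Aggregate token counts per prompt name across calls.
        entry = self.usage[prompt_name]
        entry["input"] += input_tokens
        entry["output"] += output_tokens

tracker = TokenTracker()
tracker.record("extract_nodes", input_tokens=1200, output_tokens=300)
tracker.record("extract_nodes", input_tokens=800, output_tokens=150)
```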
Tracing
Set a custom tracer for observability. generate_response() calls create spans with the following attributes:
- llm.provider: Provider name
- model.size: Model size used
- max_tokens: Token limit
- cache.hit: Whether the response was cached
- prompt.name: Custom prompt identifier
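A recording stub shows the span shape a tracer would observe; RecordingTracer and start_span are hypothetical stand-ins for an OpenTelemetry-compatible tracer, and the attribute values are examples.

```python
from contextlib import contextmanager

class RecordingTracer:
    """Minimal stand-in for an OpenTelemetry-compatible tracer."""
    def __init__(self):
        self.spans = []

    @contextmanager
    def start_span(self, name, attributes):
        # Record the span; a real tracer would also time and export it.
        self.spans.append({"name": name, "attributes": attributes})
        yield

tracer = RecordingTracer()
with tracer.start_span("llm.generate_response", {
    "llm.provider": "openai",
    "model.size": "medium",
    "max_tokens": 16384,
    "cache.hit": False,
    "prompt.name": "extract_nodes",
}):
    pass  # the LLM call would run here
```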
Model Size Selection
Graphiti uses two-tier model sizing:
- ModelSize.medium: Primary model for complex reasoning (default)
- ModelSize.small: Faster, cheaper model for simple tasks
Both tiers are set on LLMConfig via the model and small_model fields.
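A sketch of the two-tier selection, using a dataclass stand-in whose field names mirror the documented configuration; the model names and the pick_model helper are illustrative, not the library's API.

```python
from dataclasses import dataclass
from enum import Enum

class ModelSize(Enum):
    small = "small"
    medium = "medium"

@dataclass
class LLMConfig:
    # Field names mirror the documented configuration; values are examples.
    model: str = "gpt-4.1"
    small_model: str = "gpt-4.1-mini"
    temperature: float = 1.0
    max_tokens: int = 16384

def pick_model(config: LLMConfig, size: ModelSize) -> str:
    """Route simple tasks to the small model, everything else to the primary."""
    return config.small_model if size is ModelSize.small else config.model

config = LLMConfig()
```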
Input Sanitization
All clients automatically clean input text:
- Removes invalid Unicode characters
- Strips zero-width characters (\u200b, \u200c, \u200d, \ufeff, \u2060)
- Filters control characters (except newlines, tabs, carriage returns)
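The zero-width and control-character rules can be sketched directly; clean_input is a hypothetical name, and the library's own sanitizer may differ in detail (e.g. in how it handles invalid Unicode such as lone surrogates, which this sketch does not cover).

```python
import re

# The zero-width characters listed above.
ZERO_WIDTH = "\u200b\u200c\u200d\ufeff\u2060"
# Control characters, excluding tab (\x09), newline (\x0a),
# and carriage return (\x0d).
CONTROL = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\x7f]")

def clean_input(text: str) -> str:
    """Drop zero-width characters, then disallowed control characters."""
    text = text.translate({ord(c): None for c in ZERO_WIDTH})
    return CONTROL.sub("", text)

cleaned = clean_input("hello\u200b world\x00\n")
```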