
OpenAIClient

The primary client for OpenAI’s GPT models with support for structured outputs using the responses.parse API.

Installation

pip install graphiti-core
The OpenAI SDK is included by default.

Basic Usage

from graphiti_core.llm_client import OpenAIClient
from graphiti_core.llm_client.config import LLMConfig
from graphiti_core.prompts.models import Message
from pydantic import BaseModel

# Initialize client
client = OpenAIClient(
    config=LLMConfig(
        api_key="sk-...",
        model="gpt-4.1-mini",
        temperature=1.0,
        max_tokens=16384
    )
)

# Define response structure
class ExtractedInfo(BaseModel):
    name: str
    age: int
    occupation: str

# Generate structured response
messages = [
    Message(role="system", content="Extract person information from text."),
    Message(role="user", content="John is a 30 year old software engineer.")
]

response = await client.generate_response(
    messages=messages,
    response_model=ExtractedInfo
)

print(response)  # {'name': 'John', 'age': 30, 'occupation': 'software engineer'}
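Note that generate_response returns a plain dict shaped like the response model. If a typed object is more convenient, the dict can be validated back into the Pydantic model (an optional step, not part of the client API):

```python
from pydantic import BaseModel

class ExtractedInfo(BaseModel):
    name: str
    age: int
    occupation: str

# The dict returned by generate_response, shaped like the response model
response = {'name': 'John', 'age': 30, 'occupation': 'software engineer'}

# Validate it back into a typed object for attribute access
info = ExtractedInfo.model_validate(response)
print(info.age)  # 30
```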

Constructor

config (LLMConfig | None, default: None)
  Configuration object. If None, creates a default config.
cache (bool, default: False)
  Enable response caching. Not currently implemented; raises NotImplementedError if True.
client (Any | None, default: None)
  Optional pre-configured AsyncOpenAI client instance. If not provided, one is created from config.
max_tokens (int, default: 16384)
  Maximum output tokens.
reasoning (str, default: 'minimal')
  Reasoning effort for reasoning models (GPT-5, o1, o3). Options: 'minimal', 'low', 'medium', 'high'.
verbosity (str, default: 'low')
  Verbosity level for reasoning models. Options: 'low', 'medium', 'high'.

Supported Models

Reasoning Models (via responses.parse API):
  • gpt-5-* series
  • o1-* series
  • o3-* series
Standard Models (via chat.completions.create):
  • gpt-4.1-mini (recommended)
  • gpt-4.1-nano
  • gpt-4o
  • gpt-4-turbo
  • All other GPT models
Reasoning models (GPT-5, o1, o3) do not support temperature settings. The client automatically omits temperature for these models.
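This routing and the temperature rule can be sketched as follows (the prefixes come from the model list above; the helper names are illustrative, not the client's actual internals):

```python
# Illustrative routing helpers; prefixes taken from the model list above.
REASONING_PREFIXES = ('gpt-5', 'o1', 'o3')

def is_reasoning_model(model: str) -> bool:
    """True for models served via responses.parse."""
    return model.startswith(REASONING_PREFIXES)

def build_request_kwargs(model: str, temperature: float) -> dict:
    """Omit temperature for reasoning models, which do not support it."""
    kwargs = {'model': model}
    if not is_reasoning_model(model):
        kwargs['temperature'] = temperature
    return kwargs
```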

Reasoning Model Configuration

For GPT-5 and o-series models, configure reasoning depth:
client = OpenAIClient(
    config=LLMConfig(
        api_key="sk-...",
        model="gpt-5-preview"
    ),
    reasoning="high",      # More thorough reasoning
    verbosity="medium"     # Detailed output
)

Custom Base URL

Use OpenAI-compatible endpoints:
client = OpenAIClient(
    config=LLMConfig(
        api_key="your-key",
        base_url="https://api.your-provider.com/v1"
    )
)

Response Format

The client uses different APIs based on model capabilities.

Reasoning Models (responses.parse):
response = await client.responses.parse(
    model="gpt-5-preview",
    input=messages,
    max_output_tokens=max_tokens,
    text_format=response_model,
    reasoning={'effort': 'minimal'},
    text={'verbosity': 'low'}
)
Standard Models (chat.completions.create):
response = await client.chat.completions.create(
    model="gpt-4.1-mini",
    messages=messages,
    temperature=1.0,
    max_tokens=max_tokens,
    response_format={'type': 'json_object'}
)

OpenAIGenericClient

A simplified OpenAI client designed for local and third-party OpenAI-compatible models. Does not support caching or the responses.parse API.

When to Use

  • Local models (e.g., Ollama, LM Studio)
  • Third-party OpenAI-compatible APIs
  • Models with higher token limits
  • Simpler integration requirements

Basic Usage

from graphiti_core.llm_client import OpenAIGenericClient
from graphiti_core.llm_client.config import LLMConfig

# For local Ollama instance
client = OpenAIGenericClient(
    config=LLMConfig(
        base_url="http://localhost:11434/v1",
        model="llama3",
        api_key="not-needed"  # Ollama doesn't require an API key
    ),
    max_tokens=32000  # Higher limit for local models
)

Constructor

config (LLMConfig | None, default: None)
  Configuration object. If None, creates a default config.
cache (bool, default: False)
  Caching is not supported; raises NotImplementedError if True.
client (Any | None, default: None)
  Optional pre-configured AsyncOpenAI client instance.
max_tokens (int, default: 16384)
  Maximum output tokens. The high default improves local model compatibility.

Key Differences from OpenAIClient

| Feature             | OpenAIClient                | OpenAIGenericClient |
|---------------------|-----------------------------|---------------------|
| Caching             | Supported (not implemented) | Not supported       |
| responses.parse API | Yes (reasoning models)      | No                  |
| Structured outputs  | Via responses.parse         | Via json_schema     |
| Max retries         | 2 (configurable)            | 2 (fixed)           |
| Default max_tokens  | 16384                       | 16384               |
| Reasoning/verbosity | Yes                         | No                  |

Structured Output Handling

Uses json_schema in response format:
response_format = {
    'type': 'json_schema',
    'json_schema': {
        'name': 'structured_response',
        'schema': response_model.model_json_schema()
    }
}
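For a concrete picture, here is that format built from a small Pydantic model (Summary here is just an example model):

```python
from pydantic import BaseModel

class Summary(BaseModel):
    title: str
    key_points: list[str]

response_format = {
    'type': 'json_schema',
    'json_schema': {
        'name': 'structured_response',
        'schema': Summary.model_json_schema(),
    },
}

# The generated schema marks every model field as required
print(response_format['json_schema']['schema']['required'])  # ['title', 'key_points']
```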

Error Handling

Implements custom retry logic:
  • Max 2 retries on validation/parsing errors
  • No retry for rate limits or refusals
  • Automatic retry for OpenAI client errors (timeout, connection, server errors)
  • Appends error context to messages for model self-correction
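The bullet points above amount to a simple loop. A minimal sketch of the pattern, assuming JSON parsing stands in for full schema validation (parse_with_retries and MAX_RETRIES are illustrative names, not the library's actual identifiers):

```python
import json

MAX_RETRIES = 2  # mirrors the documented retry limit

def parse_with_retries(call, messages):
    """Call the model, retrying on parse errors and appending the error
    to the conversation so the model can self-correct."""
    last_error = None
    for _ in range(MAX_RETRIES + 1):
        raw = call(messages)
        try:
            return json.loads(raw)  # stands in for schema validation
        except json.JSONDecodeError as exc:
            last_error = exc
            # Append error context so the next attempt can self-correct
            messages = messages + [
                {'role': 'user',
                 'content': f'Previous response was invalid JSON: {exc}'}
            ]
    raise last_error
```

In the real client, rate-limit errors and refusals would be re-raised immediately rather than retried, per the list above.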

Example: Local Model

from graphiti_core.llm_client import OpenAIGenericClient
from graphiti_core.llm_client.config import LLMConfig
from graphiti_core.prompts.models import Message
from pydantic import BaseModel

class Summary(BaseModel):
    title: str
    key_points: list[str]

client = OpenAIGenericClient(
    config=LLMConfig(
        base_url="http://localhost:11434/v1",
        model="llama3:70b"
    ),
    max_tokens=8192
)

messages = [
    Message(role="system", content="Summarize the following text."),
    Message(role="user", content="Long article text...")
]

summary = await client.generate_response(
    messages=messages,
    response_model=Summary
)

Compatibility Notes

  • Works with any OpenAI-compatible API
  • Does not use provider-specific features
  • JSON schema support required for structured outputs
  • Temperature and max_tokens always included in requests
