Overview
The AzureOpenAILLMClient provides integration with OpenAI models hosted on Azure, supporting both the native Azure OpenAI SDK and OpenAI’s v1 API compatibility endpoint.
Installation
```bash
pip install graphiti-core
```
The OpenAI SDK (which includes Azure support) is included by default.
Basic Usage
```python
from graphiti_core.llm_client import AzureOpenAILLMClient
from graphiti_core.llm_client.config import LLMConfig
from graphiti_core.prompts.models import Message
from openai import AsyncAzureOpenAI
from pydantic import BaseModel

# Create Azure OpenAI client
azure_client = AsyncAzureOpenAI(
    api_key="your-azure-key",
    api_version="2024-02-15-preview",
    azure_endpoint="https://your-resource.openai.azure.com",
)

# Initialize Graphiti client
client = AzureOpenAILLMClient(
    azure_client=azure_client,
    config=LLMConfig(
        model="gpt-4o",  # Your Azure deployment name
        temperature=1.0,
    ),
    max_tokens=16384,
)

# Define response structure
class Analysis(BaseModel):
    summary: str
    sentiment: str
    key_points: list[str]

# Generate structured response
messages = [
    Message(role="system", content="Analyze the following text."),
    Message(role="user", content="Product review text..."),
]

response = await client.generate_response(
    messages=messages,
    response_model=Analysis,
)
```
Constructor
- `azure_client` (`AsyncAzureOpenAI | AsyncOpenAI`, required): Pre-configured Azure OpenAI client. Must be either an `AsyncAzureOpenAI` instance (native Azure SDK) or an `AsyncOpenAI` instance pointed at Azure's v1 API compatibility endpoint.
- `config` (`LLMConfig | None`, default `None`): Configuration object. If `None`, a default config is created.
- `max_tokens`: Maximum output tokens for responses.
- `reasoning`: Reasoning effort level for reasoning models (GPT-5, o1, o3). Options: `'minimal'`, `'low'`, `'medium'`, `'high'`.
- `verbosity`: Verbosity level for reasoning models. Options: `'low'`, `'medium'`, `'high'`.

Caching is not supported. The `cache` parameter in the base class is always `False`.
Azure SDK Setup
Option 1: AsyncAzureOpenAI (Recommended)
```python
from openai import AsyncAzureOpenAI
from graphiti_core.llm_client import AzureOpenAILLMClient
from graphiti_core.llm_client.config import LLMConfig

azure_client = AsyncAzureOpenAI(
    api_key="your-azure-api-key",
    api_version="2024-02-15-preview",
    azure_endpoint="https://your-resource.openai.azure.com",
)

client = AzureOpenAILLMClient(
    azure_client=azure_client,
    config=LLMConfig(model="gpt-4o-deployment"),  # Your deployment name
)
```
Option 2: AsyncOpenAI with Azure v1 Endpoint
```python
from openai import AsyncOpenAI
from graphiti_core.llm_client import AzureOpenAILLMClient
from graphiti_core.llm_client.config import LLMConfig

# Using Azure's OpenAI v1 compatibility endpoint
openai_client = AsyncOpenAI(
    api_key="your-azure-api-key",
    base_url="https://your-resource.openai.azure.com/openai/deployments/your-deployment",
)

client = AzureOpenAILLMClient(
    azure_client=openai_client,
    config=LLMConfig(model="gpt-4o"),
)
```
Supported Models
All OpenAI models available on Azure are supported:
Reasoning Models (via `responses.parse`):
- `gpt-5-*` deployments
- `o1-*` deployments
- `o3-*` deployments

Standard Models (via `chat.completions` or `beta.chat.completions.parse`):
- `gpt-4o` deployments
- `gpt-4-turbo` deployments
- `gpt-4` deployments
- `gpt-3.5-turbo` deployments
Use your Azure deployment name as the model parameter, not the base model name.
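Because the `model` value must be a deployment name, one lightweight pattern is to keep a small mapping from base model names to your deployment names so configuration code stays readable. The deployment names below are hypothetical; substitute the ones from your Azure portal:

```python
# Hypothetical mapping from base model names to Azure deployment names.
# Replace the values with the deployment names from your Azure portal.
DEPLOYMENTS = {
    "gpt-4o": "prod-gpt4o",
    "gpt-4o-mini": "prod-gpt4o-mini",
}

def deployment_for(base_model: str) -> str:
    """Look up the Azure deployment name for a base model."""
    return DEPLOYMENTS[base_model]

print(deployment_for("gpt-4o"))  # prod-gpt4o
```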
Structured Output Handling
The client automatically selects the appropriate API based on model type:
Reasoning Models (GPT-5, o1, o3)
Uses the `responses.parse` API:
```python
# Illustrative internal call made against the underlying Azure client
response = await client.responses.parse(
    model="gpt-5-deployment",
    input=messages,
    max_output_tokens=max_tokens,
    text_format=response_model,
    reasoning={'effort': 'minimal'},
    text={'verbosity': 'low'},
)
```
Standard Models (GPT-4o, etc.)
Uses the `beta.chat.completions.parse` API:
```python
# Illustrative internal call made against the underlying Azure client
response = await client.beta.chat.completions.parse(
    model="gpt-4o-deployment",
    messages=messages,
    max_tokens=max_tokens,
    temperature=temperature,
    response_format=response_model,  # Structured output
)
```
Response Parsing
The client handles different response formats:
ParsedChatCompletion (Standard Models)
```python
# From beta.chat.completions.parse
if hasattr(message, 'parsed') and message.parsed:
    return message.parsed.model_dump()  # Already a Pydantic model
elif hasattr(message, 'refusal') and message.refusal:
    raise RefusalError(message.refusal)
```
Responses.parse (Reasoning Models)
```python
# From responses.parse
if hasattr(response, 'output_text'):
    return json.loads(response.output_text)
elif hasattr(response, 'refusal') and response.refusal:
    raise RefusalError(response.refusal)
```
Reasoning Model Configuration
For GPT-5 and o-series deployments:
```python
client = AzureOpenAILLMClient(
    azure_client=azure_client,
    config=LLMConfig(model="o1-deployment"),
    reasoning="high",    # More thorough reasoning
    verbosity="medium",  # Detailed output
)
```
Reasoning parameters:
- `reasoning`: `'minimal'`, `'low'`, `'medium'`, `'high'`
- `verbosity`: `'low'`, `'medium'`, `'high'`
Reasoning models do not support temperature. The client automatically omits temperature for these models.
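A minimal sketch of this selection logic, mirroring the behavior described above (illustrative names, not graphiti-core's actual internals):

```python
# Sketch of per-model-family request parameters: reasoning models drop
# temperature and gain reasoning/verbosity options; standard models keep
# temperature. This mirrors the documented behavior, not the library code.
REASONING_PREFIXES = ("o1", "o3", "gpt-5")

def build_request_kwargs(model: str, temperature: float,
                         reasoning: str, verbosity: str) -> dict:
    if model.startswith(REASONING_PREFIXES):
        # Reasoning models: temperature is omitted entirely
        return {
            "model": model,
            "reasoning": {"effort": reasoning},
            "text": {"verbosity": verbosity},
        }
    # Standard models: temperature applies; no reasoning options
    return {"model": model, "temperature": temperature}

print(build_request_kwargs("o1-deployment", 0.7, "high", "medium"))
```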
Error Handling
Refusals
```python
from graphiti_core.llm_client.errors import RefusalError

try:
    response = await client.generate_response(messages=messages)
except RefusalError as e:
    print(f"Model refused to respond: {e}")
    # No retry - request was rejected
```
Rate Limits
```python
from graphiti_core.llm_client.errors import RateLimitError

try:
    response = await client.generate_response(messages=messages)
except RateLimitError as e:
    print(f"Rate limited: {e}")
    # Implement exponential backoff
```
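One possible backoff wrapper is sketched below. `RateLimitError` is stubbed here so the example runs standalone; in real code you would catch the exception from `graphiti_core.llm_client.errors` and pass a closure over `client.generate_response` as `call`:

```python
import asyncio
import random

class RateLimitError(Exception):
    """Stand-in for graphiti_core.llm_client.errors.RateLimitError."""

async def with_backoff(call, max_attempts=5, base_delay=1.0):
    """Retry an async callable with exponential backoff plus jitter."""
    for attempt in range(max_attempts):
        try:
            return await call()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise
            # base_delay, 2*base_delay, 4*base_delay, ... plus jitter
            await asyncio.sleep(base_delay * 2 ** attempt + random.random() * base_delay)

# Demo with a stub that is rate-limited twice before succeeding
attempts = 0
async def flaky():
    global attempts
    attempts += 1
    if attempts < 3:
        raise RateLimitError("429")
    return "ok"

print(asyncio.run(with_backoff(flaky, base_delay=0.01)))  # ok
```

The jitter term spreads retries from concurrent callers so they do not hammer the deployment in lockstep.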
Automatic Retries
The client retries up to 2 times for:
- Validation errors
- JSON parsing errors
- Transient API failures
Error context is appended for model self-correction:
```python
error_context = (
    f'The previous response attempt was invalid. '
    f'Error type: {e.__class__.__name__}. '
    f'Please try again with a valid response.'
)
messages.append(Message(role='user', content=error_context))
```
Token Usage Tracking
Track token consumption across requests:
```python
client = AzureOpenAILLMClient(
    azure_client=azure_client,
    config=LLMConfig(model="gpt-4o-deployment"),
)

response = await client.generate_response(
    messages=messages,
    prompt_name="entity_extraction",
)

# Check usage
usage = client.token_tracker.get_usage()
print(f"Input tokens: {usage['input_tokens']}")
print(f"Output tokens: {usage['output_tokens']}")
print(f"Total tokens: {usage['total_tokens']}")

# By prompt name
usage = client.token_tracker.get_usage_by_prompt("entity_extraction")
```
Model Detection
The client automatically detects reasoning models:
```python
@staticmethod
def _supports_reasoning_features(model: str) -> bool:
    """Return True when the Azure model supports reasoning/verbosity options."""
    reasoning_prefixes = ('o1', 'o3', 'gpt-5')
    return model.startswith(reasoning_prefixes)
```
Behavior changes for reasoning models:
- Uses `responses.parse` instead of `beta.chat.completions.parse`
- Omits the `temperature` parameter
- Includes `reasoning` and `verbosity` options
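The detection shown above is a plain prefix test on the configured model string, so it is easy to verify standalone (the deployment names below are made up):

```python
# Standalone version of the prefix check used for model detection
reasoning_prefixes = ('o1', 'o3', 'gpt-5')

def supports_reasoning(model: str) -> bool:
    return model.startswith(reasoning_prefixes)

print(supports_reasoning('o1-deployment'))      # True
print(supports_reasoning('gpt-4o-deployment'))  # False
```

Note that detection keys off the deployment name, so a reasoning deployment whose name does not start with one of these prefixes (say, `my-o1`) would be treated as a standard model; naming deployments after their base model avoids this.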
Example: Complete Integration
```python
import os
from openai import AsyncAzureOpenAI
from graphiti_core.llm_client import AzureOpenAILLMClient
from graphiti_core.llm_client.config import LLMConfig, ModelSize
from graphiti_core.prompts.models import Message
from pydantic import BaseModel

# Setup Azure client
azure_client = AsyncAzureOpenAI(
    api_key=os.getenv("AZURE_OPENAI_API_KEY"),
    api_version="2024-02-15-preview",
    azure_endpoint=os.getenv("AZURE_OPENAI_ENDPOINT"),
)

# Create Graphiti client
client = AzureOpenAILLMClient(
    azure_client=azure_client,
    config=LLMConfig(
        model="gpt-4o",             # Your deployment name
        small_model="gpt-4o-mini",  # Smaller deployment
        temperature=0.7,
    ),
    max_tokens=8192,
)

# Define schema
class ExtractedEntities(BaseModel):
    people: list[str]
    organizations: list[str]
    locations: list[str]

# Extract entities
messages = [
    Message(
        role="system",
        content="Extract named entities from the text.",
    ),
    Message(
        role="user",
        content="Apple CEO Tim Cook announced a new facility in Cupertino.",
    ),
]

result = await client.generate_response(
    messages=messages,
    response_model=ExtractedEntities,
    prompt_name="entity_extraction",
)

print(result)
# {
#     'people': ['Tim Cook'],
#     'organizations': ['Apple'],
#     'locations': ['Cupertino']
# }

# Check token usage
usage = client.token_tracker.get_usage()
print(f"Tokens used: {usage['total_tokens']}")
```
Best Practices
- Use appropriate deployment sizes: deploy both large and small models
- Set a reasonable `max_tokens`: Azure charges per token
- Monitor quotas: Azure has deployment-specific rate limits
- Use the `model_size` parameter: let Graphiti choose the optimal deployment

```python
# Automatic deployment selection
response = await client.generate_response(
    messages=messages,
    model_size=ModelSize.small,  # Uses the small_model deployment
)
```
Differences from OpenAIClient
| Feature | OpenAIClient | AzureOpenAILLMClient |
|---|---|---|
| Client type | AsyncOpenAI | AsyncAzureOpenAI or AsyncOpenAI |
| Model parameter | Base model name | Azure deployment name |
| API version | Latest | Configurable |
| Endpoint | api.openai.com | Azure resource endpoint |
| Caching | Not implemented | Not supported |
| Structured outputs | responses.parse | responses.parse + beta.parse |
Troubleshooting
Authentication Errors
```python
# Ensure the API key and endpoint are correct
azure_client = AsyncAzureOpenAI(
    api_key="your-key",  # From the Azure portal
    api_version="2024-02-15-preview",
    azure_endpoint="https://your-resource.openai.azure.com",  # Full URL
)
```
Deployment Not Found
```python
# Use the deployment name, not the base model name
config = LLMConfig(
    model="my-gpt4o-deployment",  # Your custom deployment name
)
```
Rate Limiting
```python
# Azure enforces per-deployment quotas. Check the Azure portal for:
# - Tokens per minute (TPM)
# - Requests per minute (RPM)
# Implement backoff or spread load across multiple deployments.
```