Overview

The AzureOpenAILLMClient provides integration with OpenAI models hosted on Azure, supporting both the native Azure OpenAI SDK and OpenAI’s v1 API compatibility endpoint.

Installation

pip install graphiti-core
The OpenAI SDK (which includes Azure support) is included by default.

Basic Usage

from graphiti_core.llm_client import AzureOpenAILLMClient
from graphiti_core.llm_client.config import LLMConfig
from openai import AsyncAzureOpenAI
from pydantic import BaseModel

# Create Azure OpenAI client
azure_client = AsyncAzureOpenAI(
    api_key="your-azure-key",
    api_version="2024-02-15-preview",
    azure_endpoint="https://your-resource.openai.azure.com"
)

# Initialize Graphiti client
client = AzureOpenAILLMClient(
    azure_client=azure_client,
    config=LLMConfig(
        model="gpt-4o",  # Your Azure deployment name
        temperature=1.0
    ),
    max_tokens=16384
)

# Define response structure
class Analysis(BaseModel):
    summary: str
    sentiment: str
    key_points: list[str]

# Generate structured response
from graphiti_core.prompts.models import Message

messages = [
    Message(role="system", content="Analyze the following text."),
    Message(role="user", content="Product review text...")
]

response = await client.generate_response(
    messages=messages,
    response_model=Analysis
)

Constructor

azure_client (AsyncAzureOpenAI | AsyncOpenAI, required)
Pre-configured Azure OpenAI client. Must be either:
  • AsyncAzureOpenAI for the native Azure SDK
  • AsyncOpenAI pointed at an Azure v1 API endpoint

config (LLMConfig | None, default: None)
Configuration object. If None, a default config is created.

max_tokens (int, default: 16384)
Maximum output tokens for responses.

reasoning (str | None, default: None)
Reasoning effort level for reasoning models (GPT-5, o1, o3). Options: 'minimal', 'low', 'medium', 'high'.

verbosity (str | None, default: None)
Verbosity level for reasoning models. Options: 'low', 'medium', 'high'.
Caching is not supported. The cache parameter in the base class is always False.

Azure SDK Setup

Option 1: Native AsyncAzureOpenAI

from openai import AsyncAzureOpenAI
from graphiti_core.llm_client import AzureOpenAILLMClient
from graphiti_core.llm_client.config import LLMConfig

azure_client = AsyncAzureOpenAI(
    api_key="your-azure-api-key",
    api_version="2024-02-15-preview",
    azure_endpoint="https://your-resource.openai.azure.com"
)

client = AzureOpenAILLMClient(
    azure_client=azure_client,
    config=LLMConfig(model="gpt-4o-deployment")  # Your deployment name
)

Option 2: AsyncOpenAI with Azure v1 Endpoint

from openai import AsyncOpenAI
from graphiti_core.llm_client import AzureOpenAILLMClient
from graphiti_core.llm_client.config import LLMConfig

# Using Azure's OpenAI v1 compatibility endpoint
openai_client = AsyncOpenAI(
    api_key="your-azure-api-key",
    base_url="https://your-resource.openai.azure.com/openai/deployments/your-deployment"
)

client = AzureOpenAILLMClient(
    azure_client=openai_client,
    config=LLMConfig(model="gpt-4o")
)

Supported Models

All OpenAI models available on Azure are supported.

Reasoning Models (via responses.parse):
  • gpt-5-* deployments
  • o1-* deployments
  • o3-* deployments
Standard Models (via chat.completions or beta.chat.completions.parse):
  • gpt-4o deployments
  • gpt-4-turbo deployments
  • gpt-4 deployments
  • gpt-3.5-turbo deployments
Use your Azure deployment name as the model parameter, not the base model name.

Structured Output Handling

The client automatically selects the appropriate API based on model type:

Reasoning Models (GPT-5, o1, o3)

Internally calls the responses.parse API on the wrapped client:
response = await azure_client.responses.parse(
    model="gpt-5-deployment",
    input=messages,
    max_output_tokens=max_tokens,
    text_format=response_model,
    reasoning={'effort': 'minimal'},
    text={'verbosity': 'low'}
)

Standard Models (GPT-4o, etc.)

Internally calls the beta.chat.completions.parse API on the wrapped client:
response = await azure_client.beta.chat.completions.parse(
    model="gpt-4o-deployment",
    messages=messages,
    max_tokens=max_tokens,
    temperature=temperature,
    response_format=response_model  # Structured output
)

Response Parsing

The client handles different response formats:

ParsedChatCompletion (Standard Models)

# From beta.chat.completions.parse
if hasattr(message, 'parsed') and message.parsed:
    return message.parsed.model_dump()  # Already a Pydantic model
elif hasattr(message, 'refusal') and message.refusal:
    raise RefusalError(message.refusal)

Responses.parse (Reasoning Models)

# From responses.parse
if hasattr(response, 'output_text'):
    return json.loads(response.output_text)
elif hasattr(response, 'refusal') and response.refusal:
    raise RefusalError(response.refusal)

Reasoning Model Configuration

For GPT-5 and o-series deployments:
client = AzureOpenAILLMClient(
    azure_client=azure_client,
    config=LLMConfig(model="o1-deployment"),
    reasoning="high",      # More thorough reasoning
    verbosity="medium"     # Detailed output
)
Reasoning parameters:
  • reasoning: 'minimal', 'low', 'medium', 'high'
  • verbosity: 'low', 'medium', 'high'
Reasoning models do not support temperature. The client automatically omits temperature for these models.

Error Handling

Refusals

from graphiti_core.llm_client.errors import RefusalError

try:
    response = await client.generate_response(messages=messages)
except RefusalError as e:
    print(f"Model refused to respond: {e}")
    # No retry - request was rejected

Rate Limits

from graphiti_core.llm_client.errors import RateLimitError

try:
    response = await client.generate_response(messages=messages)
except RateLimitError as e:
    print(f"Rate limited: {e}")
    # Implement exponential backoff
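One way to implement that backoff is a small generic wrapper. A sketch — the helper name and default delays are ours, not part of graphiti-core:

```python
import asyncio
import random

async def with_backoff(call, retry_on, max_retries=5, base_delay=1.0):
    """Await `call()`, retrying with exponential backoff plus jitter
    whenever an exception in `retry_on` is raised."""
    for attempt in range(max_retries):
        try:
            return await call()
        except retry_on:
            if attempt == max_retries - 1:
                raise
            # 1s, 2s, 4s, ... plus up to base_delay of jitter
            delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
            await asyncio.sleep(delay)
```

Usage with the client from above: result = await with_backoff(lambda: client.generate_response(messages=messages), retry_on=(RateLimitError,))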

Automatic Retries

The client retries up to 2 times for:
  • Validation errors
  • JSON parsing errors
  • Transient API failures
Error context is appended for model self-correction:
error_context = (
    f'The previous response attempt was invalid. '
    f'Error type: {e.__class__.__name__}. '
    f'Please try again with a valid response.'
)
messages.append(Message(role='user', content=error_context))
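The overall retry loop can be sketched like this (simplified, with plain dict messages; the names are illustrative, not the library's internals):

```python
def retry_with_feedback(generate, messages, max_retries=2):
    """Call `generate(messages)`; on failure, append the error context
    shown above and retry, up to `max_retries` extra attempts."""
    last_error = None
    for _ in range(max_retries + 1):
        try:
            return generate(messages)
        except Exception as e:
            last_error = e
            error_context = (
                f'The previous response attempt was invalid. '
                f'Error type: {e.__class__.__name__}. '
                f'Please try again with a valid response.'
            )
            # Feed the error back so the model can self-correct
            messages = messages + [{'role': 'user', 'content': error_context}]
    raise last_error
```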

Token Usage Tracking

Track token consumption across requests:
client = AzureOpenAILLMClient(
    azure_client=azure_client,
    config=LLMConfig(model="gpt-4o-deployment")
)

response = await client.generate_response(
    messages=messages,
    prompt_name="entity_extraction"
)

# Check usage
usage = client.token_tracker.get_usage()
print(f"Input tokens: {usage['input_tokens']}")
print(f"Output tokens: {usage['output_tokens']}")
print(f"Total tokens: {usage['total_tokens']}")

# By prompt name
usage = client.token_tracker.get_usage_by_prompt("entity_extraction")
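The usage dict makes per-prompt cost estimation straightforward. A sketch — the per-1K-token prices below are placeholders, so substitute your actual Azure pricing:

```python
# Placeholder prices per 1K tokens; check your Azure pricing tier
PRICE_PER_1K = {"input": 0.0025, "output": 0.01}

def estimate_cost(usage: dict) -> float:
    """Estimate spend from a usage dict shaped like the tracker's output."""
    return (usage["input_tokens"] / 1000 * PRICE_PER_1K["input"]
            + usage["output_tokens"] / 1000 * PRICE_PER_1K["output"])

print(estimate_cost({"input_tokens": 2000, "output_tokens": 500}))  # 0.01
```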

Model Detection

The client automatically detects reasoning models:
@staticmethod
def _supports_reasoning_features(model: str) -> bool:
    """Return True when the Azure model supports reasoning/verbosity options."""
    reasoning_prefixes = ('o1', 'o3', 'gpt-5')
    return model.startswith(reasoning_prefixes)
Behavior changes for reasoning models:
  • Uses responses.parse instead of beta.chat.completions.parse
  • Omits temperature parameter
  • Includes reasoning and verbosity options
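Because the check is a prefix match on the configured model string — which on Azure is your deployment name — reasoning features are only detected when deployments are named after the base model. A standalone sketch of the same check:

```python
def supports_reasoning_features(model: str) -> bool:
    # Same prefix check the client applies to the configured model name
    return model.startswith(('o1', 'o3', 'gpt-5'))

print(supports_reasoning_features('gpt-5-prod'))               # True
print(supports_reasoning_features('my-reasoning-deployment'))  # False: prefix not matched
```

If your deployment name does not start with o1, o3, or gpt-5, the client will treat it as a standard model.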

Example: Complete Integration

import os
from openai import AsyncAzureOpenAI
from graphiti_core.llm_client import AzureOpenAILLMClient
from graphiti_core.llm_client.config import LLMConfig, ModelSize
from graphiti_core.prompts.models import Message
from pydantic import BaseModel

# Setup Azure client
azure_client = AsyncAzureOpenAI(
    api_key=os.getenv("AZURE_OPENAI_API_KEY"),
    api_version="2024-02-15-preview",
    azure_endpoint=os.getenv("AZURE_OPENAI_ENDPOINT")
)

# Create Graphiti client
client = AzureOpenAILLMClient(
    azure_client=azure_client,
    config=LLMConfig(
        model="gpt-4o",  # Your deployment name
        small_model="gpt-4o-mini",  # Smaller deployment
        temperature=0.7
    ),
    max_tokens=8192
)

# Define schema
class ExtractedEntities(BaseModel):
    people: list[str]
    organizations: list[str]
    locations: list[str]

# Extract entities
messages = [
    Message(
        role="system",
        content="Extract named entities from the text."
    ),
    Message(
        role="user",
        content="Apple CEO Tim Cook announced a new facility in Cupertino."
    )
]

result = await client.generate_response(
    messages=messages,
    response_model=ExtractedEntities,
    prompt_name="entity_extraction"
)

print(result)
# {
#   'people': ['Tim Cook'],
#   'organizations': ['Apple'],
#   'locations': ['Cupertino']
# }

# Check token usage
usage = client.token_tracker.get_usage()
print(f"Tokens used: {usage['total_tokens']}")

Performance Tips

  1. Use appropriate deployment sizes: Deploy both large and small models
  2. Set reasonable max_tokens: Azure charges per token
  3. Monitor quotas: Azure has deployment-specific rate limits
  4. Use model_size parameter: Let Graphiti choose optimal deployment
# Automatic deployment selection
response = await client.generate_response(
    messages=messages,
    model_size=ModelSize.small  # Uses small_model deployment
)

Differences from OpenAIClient

Feature            | OpenAIClient    | AzureOpenAILLMClient
Client type        | AsyncOpenAI     | AsyncAzureOpenAI or AsyncOpenAI
Model parameter    | Base model name | Azure deployment name
API version        | Latest          | Configurable
Endpoint           | api.openai.com  | Azure resource endpoint
Caching            | Not implemented | Not supported
Structured outputs | responses.parse | responses.parse + beta.parse

Troubleshooting

Authentication Errors

# Ensure API key and endpoint are correct
azure_client = AsyncAzureOpenAI(
    api_key="your-key",  # From Azure portal
    api_version="2024-02-15-preview",
    azure_endpoint="https://your-resource.openai.azure.com"  # Full URL
)

Deployment Not Found

# Use deployment name, not base model
config = LLMConfig(
    model="my-gpt4o-deployment"  # Your custom deployment name
)

Rate Limiting

# Azure has per-deployment quotas
# Check Azure portal for:
# - Tokens per minute (TPM)
# - Requests per minute (RPM)
# Implement backoff or use multiple deployments