Overview

The AzureOpenAILLMClient provides integration with OpenAI models hosted on Azure, supporting both the native Azure OpenAI SDK and OpenAI’s v1 API compatibility endpoint.

Installation

pip install graphiti-core
The OpenAI SDK (which includes Azure support) is included by default.

Basic Usage

from graphiti_core.llm_client import AzureOpenAILLMClient
from graphiti_core.llm_client.config import LLMConfig
from openai import AsyncAzureOpenAI
from pydantic import BaseModel

# Create Azure OpenAI client
azure_client = AsyncAzureOpenAI(
    api_key="your-azure-key",
    api_version="2024-02-15-preview",
    azure_endpoint="https://your-resource.openai.azure.com"
)

# Initialize Graphiti client
client = AzureOpenAILLMClient(
    azure_client=azure_client,
    config=LLMConfig(
        model="gpt-4o",  # Your Azure deployment name
        temperature=1.0
    ),
    max_tokens=16384
)

# Define response structure
class Analysis(BaseModel):
    summary: str
    sentiment: str
    key_points: list[str]

# Generate structured response
from graphiti_core.prompts.models import Message

messages = [
    Message(role="system", content="Analyze the following text."),
    Message(role="user", content="Product review text...")
]

response = await client.generate_response(
    messages=messages,
    response_model=Analysis
)

Constructor

azure_client (AsyncAzureOpenAI | AsyncOpenAI, required)
Pre-configured Azure OpenAI client. Must be either:
  • AsyncAzureOpenAI for the native Azure SDK
  • AsyncOpenAI pointed at an Azure v1 API endpoint

config (LLMConfig | None, default: None)
Configuration object. If None, a default config is created.

max_tokens (int, default: 16384)
Maximum output tokens for responses.

reasoning (str | None, default: None)
Reasoning effort level for reasoning models (GPT-5, o1, o3). Options: 'minimal', 'low', 'medium', 'high'.

verbosity (str | None, default: None)
Verbosity level for reasoning models. Options: 'low', 'medium', 'high'.
Caching is not supported. The cache parameter in the base class is always False.

Azure SDK Setup

Option 1: Native AsyncAzureOpenAI

from openai import AsyncAzureOpenAI
from graphiti_core.llm_client import AzureOpenAILLMClient
from graphiti_core.llm_client.config import LLMConfig

azure_client = AsyncAzureOpenAI(
    api_key="your-azure-api-key",
    api_version="2024-02-15-preview",
    azure_endpoint="https://your-resource.openai.azure.com"
)

client = AzureOpenAILLMClient(
    azure_client=azure_client,
    config=LLMConfig(model="gpt-4o-deployment")  # Your deployment name
)

Option 2: AsyncOpenAI with Azure v1 Endpoint

from openai import AsyncOpenAI
from graphiti_core.llm_client import AzureOpenAILLMClient
from graphiti_core.llm_client.config import LLMConfig

# Using Azure's OpenAI v1 compatibility endpoint
openai_client = AsyncOpenAI(
    api_key="your-azure-api-key",
    base_url="https://your-resource.openai.azure.com/openai/deployments/your-deployment"
)

client = AzureOpenAILLMClient(
    azure_client=openai_client,
    config=LLMConfig(model="gpt-4o")
)

Supported Models

All OpenAI models available on Azure are supported.

Reasoning Models (via responses.parse):
  • gpt-5-* deployments
  • o1-* deployments
  • o3-* deployments
Standard Models (via chat.completions or beta.chat.completions.parse):
  • gpt-4o deployments
  • gpt-4-turbo deployments
  • gpt-4 deployments
  • gpt-3.5-turbo deployments
Use your Azure deployment name as the model parameter, not the base model name.

Structured Output Handling

The client automatically selects the appropriate API based on model type:

Reasoning Models (GPT-5, o1, o3)

Internally calls the responses.parse API on the wrapped client:
response = await azure_client.responses.parse(
    model="gpt-5-deployment",
    input=messages,
    max_output_tokens=max_tokens,
    text_format=response_model,
    reasoning={'effort': 'minimal'},
    text={'verbosity': 'low'}
)

Standard Models (GPT-4o, etc.)

Internally calls the beta.chat.completions.parse API on the wrapped client:
response = await azure_client.beta.chat.completions.parse(
    model="gpt-4o-deployment",
    messages=messages,
    max_tokens=max_tokens,
    temperature=temperature,
    response_format=response_model  # Structured output
)

Response Parsing

The client handles different response formats:

ParsedChatCompletion (Standard Models)

# From beta.chat.completions.parse
if hasattr(message, 'parsed') and message.parsed:
    return message.parsed.model_dump()  # Already a Pydantic model
elif hasattr(message, 'refusal') and message.refusal:
    raise RefusalError(message.refusal)

Responses.parse (Reasoning Models)

# From responses.parse
if hasattr(response, 'output_text'):
    return json.loads(response.output_text)
elif hasattr(response, 'refusal') and response.refusal:
    raise RefusalError(response.refusal)

Reasoning Model Configuration

For GPT-5 and o-series deployments:
client = AzureOpenAILLMClient(
    azure_client=azure_client,
    config=LLMConfig(model="o1-deployment"),
    reasoning="high",      # More thorough reasoning
    verbosity="medium"     # Detailed output
)
Reasoning parameters:
  • reasoning: 'minimal', 'low', 'medium', 'high'
  • verbosity: 'low', 'medium', 'high'
Reasoning models do not support temperature. The client automatically omits temperature for these models.

Error Handling

Refusals

from graphiti_core.llm_client.errors import RefusalError

try:
    response = await client.generate_response(messages=messages)
except RefusalError as e:
    print(f"Model refused to respond: {e}")
    # No retry - request was rejected

Rate Limits

from graphiti_core.llm_client.errors import RateLimitError

try:
    response = await client.generate_response(messages=messages)
except RateLimitError as e:
    print(f"Rate limited: {e}")
    # Implement exponential backoff
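One way to implement that backoff is a small generic wrapper. A sketch — the helper name and default delays are ours, not part of graphiti-core:

```python
import asyncio
import random

async def with_backoff(call, retry_on, max_retries=5, base_delay=1.0):
    """Await `call()`, retrying with exponential backoff plus jitter
    whenever an exception in `retry_on` is raised."""
    for attempt in range(max_retries):
        try:
            return await call()
        except retry_on:
            if attempt == max_retries - 1:
                raise
            # 1s, 2s, 4s, ... plus up to base_delay of jitter
            delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
            await asyncio.sleep(delay)
```

Usage with the client from above: result = await with_backoff(lambda: client.generate_response(messages=messages), retry_on=(RateLimitError,))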

Automatic Retries

The client retries up to 2 times for:
  • Validation errors
  • JSON parsing errors
  • Transient API failures
Error context is appended for model self-correction:
error_context = (
    f'The previous response attempt was invalid. '
    f'Error type: {e.__class__.__name__}. '
    f'Please try again with a valid response.'
)
messages.append(Message(role='user', content=error_context))
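The overall retry loop can be sketched like this (simplified, with plain dict messages; the names are illustrative, not the library's internals):

```python
def retry_with_feedback(generate, messages, max_retries=2):
    """Call `generate(messages)`; on failure, append the error context
    shown above and retry, up to `max_retries` extra attempts."""
    last_error = None
    for _ in range(max_retries + 1):
        try:
            return generate(messages)
        except Exception as e:
            last_error = e
            error_context = (
                f'The previous response attempt was invalid. '
                f'Error type: {e.__class__.__name__}. '
                f'Please try again with a valid response.'
            )
            # Feed the error back so the model can self-correct
            messages = messages + [{'role': 'user', 'content': error_context}]
    raise last_error
```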

Token Usage Tracking

Track token consumption across requests:
client = AzureOpenAILLMClient(
    azure_client=azure_client,
    config=LLMConfig(model="gpt-4o-deployment")
)

response = await client.generate_response(
    messages=messages,
    prompt_name="entity_extraction"
)

# Check usage
usage = client.token_tracker.get_usage()
print(f"Input tokens: {usage['input_tokens']}")
print(f"Output tokens: {usage['output_tokens']}")
print(f"Total tokens: {usage['total_tokens']}")

# By prompt name
usage = client.token_tracker.get_usage_by_prompt("entity_extraction")
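The usage dict makes per-prompt cost estimation straightforward. A sketch — the per-1K-token prices below are placeholders, so substitute your actual Azure pricing:

```python
# Placeholder prices per 1K tokens; check your Azure pricing tier
PRICE_PER_1K = {"input": 0.0025, "output": 0.01}

def estimate_cost(usage: dict) -> float:
    """Estimate spend from a usage dict shaped like the tracker's output."""
    return (usage["input_tokens"] / 1000 * PRICE_PER_1K["input"]
            + usage["output_tokens"] / 1000 * PRICE_PER_1K["output"])

print(estimate_cost({"input_tokens": 2000, "output_tokens": 500}))  # 0.01
```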

Model Detection

The client automatically detects reasoning models:
@staticmethod
def _supports_reasoning_features(model: str) -> bool:
    """Return True when the Azure model supports reasoning/verbosity options."""
    reasoning_prefixes = ('o1', 'o3', 'gpt-5')
    return model.startswith(reasoning_prefixes)
Behavior changes for reasoning models:
  • Uses responses.parse instead of beta.chat.completions.parse
  • Omits temperature parameter
  • Includes reasoning and verbosity options
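Because the check is a prefix match on the configured model string — which on Azure is your deployment name — reasoning features are only detected when deployments are named after the base model. A standalone sketch of the same check:

```python
def supports_reasoning_features(model: str) -> bool:
    # Same prefix check the client applies to the configured model name
    return model.startswith(('o1', 'o3', 'gpt-5'))

print(supports_reasoning_features('gpt-5-prod'))               # True
print(supports_reasoning_features('my-reasoning-deployment'))  # False: prefix not matched
```

If your deployment name does not start with o1, o3, or gpt-5, the client will treat it as a standard model.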

Example: Complete Integration

import os
from openai import AsyncAzureOpenAI
from graphiti_core.llm_client import AzureOpenAILLMClient
from graphiti_core.llm_client.config import LLMConfig, ModelSize
from graphiti_core.prompts.models import Message
from pydantic import BaseModel

# Setup Azure client
azure_client = AsyncAzureOpenAI(
    api_key=os.getenv("AZURE_OPENAI_API_KEY"),
    api_version="2024-02-15-preview",
    azure_endpoint=os.getenv("AZURE_OPENAI_ENDPOINT")
)

# Create Graphiti client
client = AzureOpenAILLMClient(
    azure_client=azure_client,
    config=LLMConfig(
        model="gpt-4o",  # Your deployment name
        small_model="gpt-4o-mini",  # Smaller deployment
        temperature=0.7
    ),
    max_tokens=8192
)

# Define schema
class ExtractedEntities(BaseModel):
    people: list[str]
    organizations: list[str]
    locations: list[str]

# Extract entities
messages = [
    Message(
        role="system",
        content="Extract named entities from the text."
    ),
    Message(
        role="user",
        content="Apple CEO Tim Cook announced a new facility in Cupertino."
    )
]

result = await client.generate_response(
    messages=messages,
    response_model=ExtractedEntities,
    prompt_name="entity_extraction"
)

print(result)
# {
#   'people': ['Tim Cook'],
#   'organizations': ['Apple'],
#   'locations': ['Cupertino']
# }

# Check token usage
usage = client.token_tracker.get_usage()
print(f"Tokens used: {usage['total_tokens']}")

Performance Tips

  1. Use appropriate deployment sizes: Deploy both large and small models
  2. Set reasonable max_tokens: Azure charges per token
  3. Monitor quotas: Azure has deployment-specific rate limits
  4. Use model_size parameter: Let Graphiti choose optimal deployment
# Automatic deployment selection
response = await client.generate_response(
    messages=messages,
    model_size=ModelSize.small  # Uses small_model deployment
)

Differences from OpenAIClient

Feature            | OpenAIClient    | AzureOpenAILLMClient
Client type        | AsyncOpenAI     | AsyncAzureOpenAI or AsyncOpenAI
Model parameter    | Base model name | Azure deployment name
API version        | Latest          | Configurable
Endpoint           | api.openai.com  | Azure resource endpoint
Caching            | Not implemented | Not supported
Structured outputs | responses.parse | responses.parse + beta.parse

Troubleshooting

Authentication Errors

# Ensure API key and endpoint are correct
azure_client = AsyncAzureOpenAI(
    api_key="your-key",  # From Azure portal
    api_version="2024-02-15-preview",
    azure_endpoint="https://your-resource.openai.azure.com"  # Full URL
)

Deployment Not Found

# Use deployment name, not base model
config = LLMConfig(
    model="my-gpt4o-deployment"  # Your custom deployment name
)

Rate Limiting

# Azure has per-deployment quotas
# Check Azure portal for:
# - Tokens per minute (TPM)
# - Requests per minute (RPM)
# Implement backoff or use multiple deployments