
Overview

LiteLLM provides comprehensive support for Azure OpenAI Service, allowing you to use GPT-4, GPT-3.5, embeddings, and more through your Azure deployments.

Quick Start

1. Install LiteLLM

pip install litellm
2. Set Azure Credentials

export AZURE_API_KEY="your-azure-api-key"
export AZURE_API_BASE="https://your-resource.openai.azure.com"
export AZURE_API_VERSION="2024-02-15-preview"
3. Make Your First Call

from litellm import completion

response = completion(
    model="azure/gpt-4o",  # Your Azure deployment name
    messages=[{"role": "user", "content": "Hello Azure!"}]
)
print(response.choices[0].message.content)

Authentication

Set Azure credentials via environment variables:
export AZURE_API_KEY="your-api-key"
export AZURE_API_BASE="https://your-resource.openai.azure.com"
export AZURE_API_VERSION="2024-02-15-preview"
from litellm import completion

response = completion(
    model="azure/gpt-4o",  # Your deployment name
    messages=[{"role": "user", "content": "Hello!"}]
)

Model Naming

Azure uses deployment names, not model names. Format: azure/{deployment_name}
# If your Azure deployment is named "gpt-4o-deployment"
response = completion(
    model="azure/gpt-4o-deployment",
    messages=[{"role": "user", "content": "Hello!"}]
)

# If your deployment is named "my-gpt-35-turbo"
response = completion(
    model="azure/my-gpt-35-turbo",
    messages=[{"role": "user", "content": "Hello!"}]
)

Common Azure Deployments

GPT-4o

model="azure/gpt-4o"
Latest GPT-4o model

GPT-4

model="azure/gpt-4"
GPT-4 Turbo

GPT-3.5 Turbo

model="azure/gpt-35-turbo"
Fast and efficient

Embeddings

model="azure/text-embedding-ada-002"
Text embeddings

API Versions

Azure OpenAI requires an api-version parameter on every request. Recommended versions:

| Version | Features | Recommended For |
| --- | --- | --- |
| 2024-02-15-preview | Latest features | Production use |
| 2024-08-01-preview | Newest preview | Testing new features |
| 2023-12-01-preview | Stable | Legacy support |
response = completion(
    model="azure/gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
    api_version="2024-02-15-preview"
)

Streaming

from litellm import completion

response = completion(
    model="azure/gpt-4o",
    messages=[{"role": "user", "content": "Write a story"}],
    stream=True
)

for chunk in response:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)

Function Calling

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get current weather",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {"type": "string"}
            },
            "required": ["location"]
        }
    }
}]

response = completion(
    model="azure/gpt-4o",
    messages=[{"role": "user", "content": "What's the weather in Seattle?"}],
    tools=tools
)

if response.choices[0].message.tool_calls:
    tool_call = response.choices[0].message.tool_calls[0]
    print(f"Function: {tool_call.function.name}")
    print(f"Arguments: {tool_call.function.arguments}")
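The arguments on a tool call arrive as a JSON string, not a dict, so parse them before dispatching to your own implementation. A minimal sketch — the local `get_weather` function and the registry are hypothetical, not part of LiteLLM:

```python
import json

# Hypothetical local implementation that the model's tool call maps to.
def get_weather(location: str) -> str:
    return f"Sunny in {location}"

# Map tool names to callables so adding a tool only needs one entry.
TOOL_REGISTRY = {"get_weather": get_weather}

def dispatch_tool_call(name: str, arguments: str) -> str:
    # tool_call.function.arguments is a JSON string.
    args = json.loads(arguments)
    return TOOL_REGISTRY[name](**args)

# Example with the shape Azure returns for the request above:
result = dispatch_tool_call("get_weather", '{"location": "Seattle"}')
print(result)  # Sunny in Seattle
```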

Vision (Multimodal)

Use GPT-4 Vision on Azure:
response = completion(
    model="azure/gpt-4o",  # Or your GPT-4-vision deployment
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What's in this image?"},
            {
                "type": "image_url",
                "image_url": {"url": "https://example.com/image.jpg"}
            }
        ]
    }]
)

Embeddings

Generate embeddings using Azure:
from litellm import embedding

response = embedding(
    model="azure/text-embedding-ada-002",  # Your deployment name
    input="Hello world"
)

print(response.data[0].embedding)
print(f"Dimensions: {len(response.data[0].embedding)}")

# Multiple texts
response = embedding(
    model="azure/text-embedding-ada-002",
    input=["Text 1", "Text 2", "Text 3"]
)

Azure Embedding Models

# text-embedding-ada-002
embedding(model="azure/text-embedding-ada-002", input="...")

# text-embedding-3-small
embedding(model="azure/text-embedding-3-small", input="...")

# text-embedding-3-large
embedding(model="azure/text-embedding-3-large", input="...", dimensions=256)
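Embedding vectors are typically compared with cosine similarity. A stdlib-only sketch — the vectors below are toy values; in practice you would pass `response.data[i].embedding` from the calls above:

```python
import math

def cosine_similarity(a, b):
    # dot(a, b) / (|a| * |b|)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

a = [0.1, 0.3, 0.5]
b = [0.2, 0.1, 0.4]
print(round(cosine_similarity(a, b), 4))
```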

Image Generation (DALL-E)

Generate images using DALL-E on Azure:
from litellm import image_generation

response = image_generation(
    model="azure/dall-e-3",
    prompt="A sunset over mountains",
    n=1,
    size="1024x1024",
    quality="standard"  # or "hd"
)

print(response.data[0].url)

Audio Transcription (Whisper)

Transcribe audio using Whisper on Azure:
from litellm import transcription

with open("audio.mp3", "rb") as audio_file:
    response = transcription(
        model="azure/whisper",  # Your Whisper deployment
        file=audio_file,
        language="en"
    )

print(response.text)

Text-to-Speech

Generate speech from text:
from litellm import speech

response = speech(
    model="azure/tts",  # Your TTS deployment
    input="Hello, this is a test.",
    voice="alloy"  # alloy, echo, fable, onyx, nova, shimmer
)

# Save audio file
with open("output.mp3", "wb") as f:
    f.write(response.content)

Batch Processing

Process requests in batches:
from litellm import create_batch, retrieve_batch

# Create batch
batch = create_batch(
    custom_llm_provider="azure",
    input_file_id="file-abc123",
    endpoint="/chat/completions",
    completion_window="24h"
)

print(f"Batch ID: {batch.id}")

# Check status
batch_status = retrieve_batch(
    custom_llm_provider="azure",
    batch_id=batch.id
)
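The input_file_id above refers to a previously uploaded JSONL file in which each line is one request. A sketch of building that file, assuming the OpenAI-style batch request schema (custom_id, method, url, body); the custom_id values and prompts here are placeholders:

```python
import json

# Each line of the batch input file is an independent request.
requests = [
    {
        "custom_id": f"request-{i}",
        "method": "POST",
        "url": "/chat/completions",
        "body": {
            "model": "gpt-4o",  # your Azure deployment name
            "messages": [{"role": "user", "content": prompt}],
        },
    }
    for i, prompt in enumerate(["Hello!", "Summarize batching."], start=1)
]

with open("batch_input.jsonl", "w") as f:
    for req in requests:
        f.write(json.dumps(req) + "\n")
```

Upload the resulting file to Azure to obtain the file ID passed as input_file_id.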

Advanced Features

JSON Mode

response = completion(
    model="azure/gpt-4o",
    messages=[{
        "role": "user",
        "content": "Extract info: John is 30 and lives in NYC"
    }],
    response_format={"type": "json_object"}
)

import json
data = json.loads(response.choices[0].message.content)

Seed for Reproducibility

response = completion(
    model="azure/gpt-4o",
    messages=[{"role": "user", "content": "Tell a joke"}],
    seed=42,
    temperature=0.7
)

Logprobs

response = completion(
    model="azure/gpt-4o",
    messages=[{"role": "user", "content": "Hello"}],
    logprobs=True,
    top_logprobs=3
)

for token in response.choices[0].logprobs.content:
    print(f"{token.token}: {token.logprob}")
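The returned values are natural-log probabilities; exponentiate to recover the model's probability for each token. A quick sketch with made-up logprob values:

```python
import math

# Hypothetical values like those in token.logprob above.
logprobs = [-0.01, -0.69, -2.30]

for lp in logprobs:
    # exp() maps a log probability back to a probability in (0, 1].
    print(f"logprob {lp} -> probability {math.exp(lp):.3f}")
```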

Multiple Azure Deployments

Use different Azure resources:
# Resource 1
response1 = completion(
    model="azure/gpt-4o",
    messages=[{"role": "user", "content": "Hello"}],
    api_base="https://resource1.openai.azure.com",
    api_key="key1"
)

# Resource 2
response2 = completion(
    model="azure/gpt-35-turbo",
    messages=[{"role": "user", "content": "Hello"}],
    api_base="https://resource2.openai.azure.com",
    api_key="key2"
)

Content Filtering

Azure applies content filtering by default:
try:
    response = completion(
        model="azure/gpt-4o",
        messages=[{"role": "user", "content": "..."}]
    )
except Exception as e:
    # Check if content was filtered
    if "content_filter" in str(e).lower():
        print("Content was filtered by Azure")
    raise

# Access content filter results
if hasattr(response.choices[0], 'content_filter_results'):
    print(response.choices[0].content_filter_results)

Error Handling

from litellm import completion
from litellm.exceptions import (
    AuthenticationError,
    RateLimitError,
    ContextWindowExceededError,
    APIError
)

try:
    response = completion(
        model="azure/gpt-4o",
        messages=[{"role": "user", "content": "Hello!"}]
    )
except AuthenticationError:
    print("Invalid API key or auth")
except RateLimitError:
    print("Rate limit exceeded")
except ContextWindowExceededError:
    print("Input too long")
except APIError as e:
    print(f"Azure API error: {e}")

Cost Tracking

from litellm import completion, completion_cost

response = completion(
    model="azure/gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}]
)

# Calculate cost
cost = completion_cost(completion_response=response)
print(f"Cost: ${cost:.6f}")

# Token usage
print(f"Prompt tokens: {response.usage.prompt_tokens}")
print(f"Completion tokens: {response.usage.completion_tokens}")

Regional Deployments

Azure OpenAI is available in multiple regions:
# East US
response = completion(
    model="azure/gpt-4o",
    messages=[{"role": "user", "content": "Hello"}],
    api_base="https://eastus.api.cognitive.microsoft.com/",
    api_key="..."
)

# West Europe
response = completion(
    model="azure/gpt-4o",
    messages=[{"role": "user", "content": "Hello"}],
    api_base="https://westeurope.api.cognitive.microsoft.com/",
    api_key="..."
)

Best Practices

Use Latest API Version

Always use the latest stable API version for new features and improvements.

Handle Content Filters

Azure applies content filtering by default; detect filtered responses and handle them appropriately.

Use Managed Identity

For Azure-hosted apps, use Managed Identity instead of API keys.

Monitor Rate Limits

Track TPM (tokens per minute) and RPM (requests per minute) limits.
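One common way to stay within those limits is exponential backoff. A minimal, provider-agnostic sketch — in real use, catch litellm.exceptions.RateLimitError instead of the stand-in class defined here:

```python
import time

class RateLimitError(Exception):
    """Stand-in for litellm.exceptions.RateLimitError."""

def with_backoff(fn, max_retries=3, base_delay=1.0):
    # Retry fn, doubling the wait after each rate-limit error.
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except RateLimitError:
            if attempt == max_retries:
                raise
            time.sleep(base_delay * (2 ** attempt))

# Example: a callable that fails twice before succeeding.
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RateLimitError()
    return "ok"

print(with_backoff(flaky, base_delay=0.01))  # ok
```

In practice `fn` would be a closure over your `completion(...)` call.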

Troubleshooting

Deployment Not Found

# Make sure deployment name matches Azure
response = completion(
    model="azure/your-exact-deployment-name",  # Must match Azure portal
    messages=[{"role": "user", "content": "Hello"}]
)

API Version Issues

# Use a supported API version
response = completion(
    model="azure/gpt-4o",
    api_version="2024-02-15-preview",  # Check Azure docs for valid versions
    messages=[{"role": "user", "content": "Hello"}]
)

See Also

- OpenAI - Learn about OpenAI models and features
- Streaming - Stream responses in real time
- Function Calling - Implement function calling
- Embeddings - Generate embeddings on Azure
