
Overview

LiteLLM provides comprehensive support for Azure OpenAI Service, allowing you to use GPT-4, GPT-3.5, embeddings, and more through your Azure deployments.

Quick Start

1. Install LiteLLM

pip install litellm
2. Set Azure Credentials

export AZURE_API_KEY="your-azure-api-key"
export AZURE_API_BASE="https://your-resource.openai.azure.com"
export AZURE_API_VERSION="2024-02-15-preview"
3. Make Your First Call

from litellm import completion

response = completion(
    model="azure/gpt-4o",  # Your Azure deployment name
    messages=[{"role": "user", "content": "Hello Azure!"}]
)
print(response.choices[0].message.content)

Authentication

Set Azure credentials via environment variables:
export AZURE_API_KEY="your-api-key"
export AZURE_API_BASE="https://your-resource.openai.azure.com"
export AZURE_API_VERSION="2024-02-15-preview"
from litellm import completion

response = completion(
    model="azure/gpt-4o",  # Your deployment name
    messages=[{"role": "user", "content": "Hello!"}]
)

Model Naming

Azure uses deployment names, not model names. Format: azure/{deployment_name}
# If your Azure deployment is named "gpt-4o-deployment"
response = completion(
    model="azure/gpt-4o-deployment",
    messages=[{"role": "user", "content": "Hello!"}]
)

# If your deployment is named "my-gpt-35-turbo"
response = completion(
    model="azure/my-gpt-35-turbo",
    messages=[{"role": "user", "content": "Hello!"}]
)

Common Azure Deployments

GPT-4o

model="azure/gpt-4o"
Latest GPT-4o model

GPT-4

model="azure/gpt-4"
GPT-4 Turbo

GPT-3.5 Turbo

model="azure/gpt-35-turbo"
Fast and efficient

Embeddings

model="azure/text-embedding-ada-002"
Text embeddings

API Versions

Azure OpenAI requires an api-version parameter on every request. Recommended versions:

| Version | Features | Recommended For |
| --- | --- | --- |
| 2024-02-15-preview | Latest features | Production use |
| 2024-08-01-preview | Newest preview | Testing new features |
| 2023-12-01-preview | Stable | Legacy support |
response = completion(
    model="azure/gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
    api_version="2024-02-15-preview"
)

Streaming

from litellm import completion

response = completion(
    model="azure/gpt-4o",
    messages=[{"role": "user", "content": "Write a story"}],
    stream=True
)

for chunk in response:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)

Function Calling

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get current weather",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {"type": "string"}
            },
            "required": ["location"]
        }
    }
}]

response = completion(
    model="azure/gpt-4o",
    messages=[{"role": "user", "content": "What's the weather in Seattle?"}],
    tools=tools
)

if response.choices[0].message.tool_calls:
    tool_call = response.choices[0].message.tool_calls[0]
    print(f"Function: {tool_call.function.name}")
    print(f"Arguments: {tool_call.function.arguments}")
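The arguments on a tool call arrive as a JSON string, not a dict, so parse them before dispatching to your own implementation. A minimal sketch — the local `get_weather` function and the registry are hypothetical, not part of LiteLLM:

```python
import json

# Hypothetical local implementation that the model's tool call maps to.
def get_weather(location: str) -> str:
    return f"Sunny in {location}"

# Map tool names to callables so adding a tool only needs one entry.
TOOL_REGISTRY = {"get_weather": get_weather}

def dispatch_tool_call(name: str, arguments: str) -> str:
    # tool_call.function.arguments is a JSON string.
    args = json.loads(arguments)
    return TOOL_REGISTRY[name](**args)

# Example with the shape Azure returns for the request above:
result = dispatch_tool_call("get_weather", '{"location": "Seattle"}')
print(result)  # Sunny in Seattle
```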

Vision (Multimodal)

Use GPT-4 Vision on Azure:
response = completion(
    model="azure/gpt-4o",  # Or your GPT-4-vision deployment
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What's in this image?"},
            {
                "type": "image_url",
                "image_url": {"url": "https://example.com/image.jpg"}
            }
        ]
    }]
)

Embeddings

Generate embeddings using Azure:
from litellm import embedding

response = embedding(
    model="azure/text-embedding-ada-002",  # Your deployment name
    input="Hello world"
)

print(response.data[0].embedding)
print(f"Dimensions: {len(response.data[0].embedding)}")

# Multiple texts
response = embedding(
    model="azure/text-embedding-ada-002",
    input=["Text 1", "Text 2", "Text 3"]
)

Azure Embedding Models

# text-embedding-ada-002
embedding(model="azure/text-embedding-ada-002", input="...")

# text-embedding-3-small
embedding(model="azure/text-embedding-3-small", input="...")

# text-embedding-3-large
embedding(model="azure/text-embedding-3-large", input="...", dimensions=256)
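Embedding vectors are typically compared with cosine similarity. A stdlib-only sketch — the vectors below are toy values; in practice you would pass `response.data[i].embedding` from the calls above:

```python
import math

def cosine_similarity(a, b):
    # dot(a, b) / (|a| * |b|)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

a = [0.1, 0.3, 0.5]
b = [0.2, 0.1, 0.4]
print(round(cosine_similarity(a, b), 4))
```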

Image Generation (DALL-E)

Generate images using DALL-E on Azure:
from litellm import image_generation

response = image_generation(
    model="azure/dall-e-3",
    prompt="A sunset over mountains",
    n=1,
    size="1024x1024",
    quality="standard"  # or "hd"
)

print(response.data[0].url)

Audio Transcription (Whisper)

Transcribe audio using Whisper on Azure:
from litellm import transcription

with open("audio.mp3", "rb") as audio_file:
    response = transcription(
        model="azure/whisper",  # Your Whisper deployment
        file=audio_file,
        language="en"
    )

print(response.text)

Text-to-Speech

Generate speech from text:
from litellm import speech

response = speech(
    model="azure/tts",  # Your TTS deployment
    input="Hello, this is a test.",
    voice="alloy"  # alloy, echo, fable, onyx, nova, shimmer
)

# Save audio file
with open("output.mp3", "wb") as f:
    f.write(response.content)

Batch Processing

Process requests in batches:
from litellm import create_batch, retrieve_batch

# Create batch
batch = create_batch(
    custom_llm_provider="azure",
    input_file_id="file-abc123",
    endpoint="/chat/completions",
    completion_window="24h"
)

print(f"Batch ID: {batch.id}")

# Check status
batch_status = retrieve_batch(
    custom_llm_provider="azure",
    batch_id=batch.id
)
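The input_file_id above refers to a previously uploaded JSONL file in which each line is one request. A sketch of building that file, assuming the OpenAI-style batch request schema (custom_id, method, url, body); the custom_id values and prompts here are placeholders:

```python
import json

# Each line of the batch input file is an independent request.
requests = [
    {
        "custom_id": f"request-{i}",
        "method": "POST",
        "url": "/chat/completions",
        "body": {
            "model": "gpt-4o",  # your Azure deployment name
            "messages": [{"role": "user", "content": prompt}],
        },
    }
    for i, prompt in enumerate(["Hello!", "Summarize batching."], start=1)
]

with open("batch_input.jsonl", "w") as f:
    for req in requests:
        f.write(json.dumps(req) + "\n")
```

Upload the resulting file to Azure to obtain the file ID passed as input_file_id.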

Advanced Features

JSON Mode

response = completion(
    model="azure/gpt-4o",
    messages=[{
        "role": "user",
        "content": "Extract info: John is 30 and lives in NYC"
    }],
    response_format={"type": "json_object"}
)

import json
data = json.loads(response.choices[0].message.content)

Seed for Reproducibility

response = completion(
    model="azure/gpt-4o",
    messages=[{"role": "user", "content": "Tell a joke"}],
    seed=42,
    temperature=0.7
)

Logprobs

response = completion(
    model="azure/gpt-4o",
    messages=[{"role": "user", "content": "Hello"}],
    logprobs=True,
    top_logprobs=3
)

for token in response.choices[0].logprobs.content:
    print(f"{token.token}: {token.logprob}")
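The returned values are natural-log probabilities; exponentiate to recover the model's probability for each token. A quick sketch with made-up logprob values:

```python
import math

# Hypothetical values like those in token.logprob above.
logprobs = [-0.01, -0.69, -2.30]

for lp in logprobs:
    # exp() maps a log probability back to a probability in (0, 1].
    print(f"logprob {lp} -> probability {math.exp(lp):.3f}")
```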

Multiple Azure Deployments

Use different Azure resources:
# Resource 1
response1 = completion(
    model="azure/gpt-4o",
    messages=[{"role": "user", "content": "Hello"}],
    api_base="https://resource1.openai.azure.com",
    api_key="key1"
)

# Resource 2
response2 = completion(
    model="azure/gpt-35-turbo",
    messages=[{"role": "user", "content": "Hello"}],
    api_base="https://resource2.openai.azure.com",
    api_key="key2"
)

Content Filtering

Azure applies content filtering by default:
try:
    response = completion(
        model="azure/gpt-4o",
        messages=[{"role": "user", "content": "..."}]
    )
except Exception as e:
    # Check if content was filtered
    if "content_filter" in str(e).lower():
        print("Content was filtered by Azure")
    raise

# Access content filter results
if hasattr(response.choices[0], 'content_filter_results'):
    print(response.choices[0].content_filter_results)

Error Handling

from litellm import completion
from litellm.exceptions import (
    AuthenticationError,
    RateLimitError,
    ContextWindowExceededError,
    APIError
)

try:
    response = completion(
        model="azure/gpt-4o",
        messages=[{"role": "user", "content": "Hello!"}]
    )
except AuthenticationError:
    print("Invalid API key or auth")
except RateLimitError:
    print("Rate limit exceeded")
except ContextWindowExceededError:
    print("Input too long")
except APIError as e:
    print(f"Azure API error: {e}")

Cost Tracking

from litellm import completion, completion_cost

response = completion(
    model="azure/gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}]
)

# Calculate cost
cost = completion_cost(completion_response=response)
print(f"Cost: ${cost:.6f}")

# Token usage
print(f"Prompt tokens: {response.usage.prompt_tokens}")
print(f"Completion tokens: {response.usage.completion_tokens}")

Regional Deployments

Azure OpenAI is available in multiple regions:
# East US
response = completion(
    model="azure/gpt-4o",
    messages=[{"role": "user", "content": "Hello"}],
    api_base="https://eastus.api.cognitive.microsoft.com/",
    api_key="..."
)

# West Europe
response = completion(
    model="azure/gpt-4o",
    messages=[{"role": "user", "content": "Hello"}],
    api_base="https://westeurope.api.cognitive.microsoft.com/",
    api_key="..."
)

Best Practices

Use Latest API Version

Always use the latest stable API version for new features and improvements.

Handle Content Filters

Azure applies content filtering by default; detect filtered responses and handle them appropriately.

Use Managed Identity

For Azure-hosted apps, use Managed Identity instead of API keys.

Monitor Rate Limits

Track TPM (tokens per minute) and RPM (requests per minute) limits.
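One common way to stay within those limits is exponential backoff. A minimal, provider-agnostic sketch — in real use, catch litellm.exceptions.RateLimitError instead of the stand-in class defined here:

```python
import time

class RateLimitError(Exception):
    """Stand-in for litellm.exceptions.RateLimitError."""

def with_backoff(fn, max_retries=3, base_delay=1.0):
    # Retry fn, doubling the wait after each rate-limit error.
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except RateLimitError:
            if attempt == max_retries:
                raise
            time.sleep(base_delay * (2 ** attempt))

# Example: a callable that fails twice before succeeding.
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RateLimitError()
    return "ok"

print(with_backoff(flaky, base_delay=0.01))  # ok
```

In practice `fn` would be a closure over your `completion(...)` call.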

Troubleshooting

Deployment Not Found

# Make sure deployment name matches Azure
response = completion(
    model="azure/your-exact-deployment-name",  # Must match Azure portal
    messages=[{"role": "user", "content": "Hello"}]
)

API Version Issues

# Use a supported API version
response = completion(
    model="azure/gpt-4o",
    api_version="2024-02-15-preview",  # Check Azure docs for valid versions
    messages=[{"role": "user", "content": "Hello"}]
)

See Also

- OpenAI - Learn about OpenAI models and features
- Streaming - Stream responses in real time
- Function Calling - Implement function calling
- Embeddings - Generate embeddings on Azure
