AgentChat agents can work with any LLM provider through the model client abstraction. All model clients implement the ChatCompletionClient interface.

OpenAI

The most commonly used provider:
from autogen_ext.models.openai import OpenAIChatCompletionClient

model_client = OpenAIChatCompletionClient(
    model="gpt-4o",
    api_key="sk-...",  # Or set OPENAI_API_KEY env var
    temperature=0.7,
    max_tokens=2000
)

Available models

  • gpt-4o - Latest multimodal model
  • gpt-4o-mini - Faster, cheaper version
  • gpt-4-turbo - Previous generation
  • gpt-3.5-turbo - Legacy, cheaper

Azure OpenAI

For Azure deployments:
from autogen_ext.models.openai import AzureOpenAIChatCompletionClient

model_client = AzureOpenAIChatCompletionClient(
    azure_endpoint="https://your-resource.openai.azure.com",
    azure_deployment="your-deployment-name",  # Name of your Azure deployment
    model="gpt-4o",  # Underlying model name
    api_version="2024-02-15-preview",
    api_key="...",  # Or use Azure AD authentication
)
See Azure OpenAI documentation for more details.

Anthropic

For Claude models:
from autogen_ext.models.anthropic import AnthropicChatCompletionClient

model_client = AnthropicChatCompletionClient(
    model="claude-3-5-sonnet-20241022",
    api_key="sk-ant-...",  # Or set ANTHROPIC_API_KEY
    max_tokens=4096
)

Available models

  • claude-3-5-sonnet-20241022 - Latest, most capable
  • claude-3-5-haiku-20241022 - Fast and efficient
  • claude-3-opus-20240229 - Most powerful (legacy)

Extended thinking

Claude supports extended thinking mode:
model_client = AnthropicChatCompletionClient(
    model="claude-3-7-sonnet-20250219",
    extended_thinking=True  # Enable thinking mode
)

Ollama

For local models:
from autogen_ext.models.ollama import OllamaChatCompletionClient

model_client = OllamaChatCompletionClient(
    model="llama3.2",
    base_url="http://localhost:11434",  # Ollama server
    temperature=0.7
)
Ollama requires a locally running Ollama server. Install it from ollama.com.

Available models

  • llama3.2 - Meta’s Llama 3.2
  • codellama - Code-specialized
  • mistral - Mistral AI models
  • phi3 - Microsoft’s Phi-3
See Ollama library for all models.

Llama.cpp

For GGUF models:
from autogen_ext.models.llama_cpp import LlamaCppChatCompletionClient

model_client = LlamaCppChatCompletionClient(
    model_path="/path/to/model.gguf",
    n_gpu_layers=35,  # GPU acceleration
    temperature=0.7
)

Configuration options

All model clients support these common parameters:

  • model (str, required) - The model identifier
  • temperature (float, default 1.0) - Sampling temperature (0.0 to 2.0). Lower = more deterministic.
  • max_tokens (int) - Maximum tokens to generate
  • top_p (float, default 1.0) - Nucleus sampling parameter
  • api_key (str) - API key (can also use environment variables)

Streaming

Enable streaming for real-time responses:
import asyncio

from autogen_agentchat.agents import AssistantAgent
from autogen_agentchat.base import TaskResult

agent = AssistantAgent(
    "assistant",
    model_client=model_client,
    model_client_stream=True  # Enable streaming
)

# Stream responses; the final item yielded is a TaskResult
async def main() -> None:
    async for item in agent.run_stream(task="Tell me a story"):
        if isinstance(item, TaskResult):
            print("Done:", len(item.messages), "messages")
        else:
            print(item)

asyncio.run(main())

Token counting

Track token usage:
result = await agent.run(task="Hello!")

# Check token usage (models_usage is None for messages that made no LLM call)
for msg in result.messages:
    usage = getattr(msg, "models_usage", None)
    if usage is not None:
        print(f"Prompt tokens: {usage.prompt_tokens}")
        print(f"Completion tokens: {usage.completion_tokens}")
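To total usage across a whole run, the per-message figures can simply be summed. A minimal sketch, where `Usage` is a stand-in dataclass (so the example is self-contained) and `total_tokens` is a hypothetical helper, not part of the library:

```python
from dataclasses import dataclass

@dataclass
class Usage:
    """Stand-in for a model client's usage record."""
    prompt_tokens: int
    completion_tokens: int

def total_tokens(usages):
    """Sum prompt and completion tokens across a list of usage records."""
    prompt = sum(u.prompt_tokens for u in usages)
    completion = sum(u.completion_tokens for u in usages)
    return prompt, completion, prompt + completion

usages = [Usage(12, 30), Usage(8, 15)]
print(total_tokens(usages))  # → (20, 45, 65)
```

In practice you would collect the non-None `models_usage` values from `result.messages` and pass them to a helper like this.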

Model comparison

| Provider     | Strengths                       | Cost | Local |
| ------------ | ------------------------------- | ---- | ----- |
| OpenAI       | Most capable, multimodal        | $$$  | No    |
| Azure OpenAI | Enterprise features, compliance | $$$  | No    |
| Anthropic    | Long context, safety            | $$$  | No    |
| Ollama       | Free, privacy                   | Free | Yes   |
| Llama.cpp    | Maximum control, GGUF support   | Free | Yes   |

Environment variables

Set API keys via environment variables:
export OPENAI_API_KEY="sk-..."
export ANTHROPIC_API_KEY="sk-ant-..."
export AZURE_OPENAI_API_KEY="..."
export AZURE_OPENAI_ENDPOINT="https://..."
Then create clients without explicit keys:
model_client = OpenAIChatCompletionClient(model="gpt-4o")
# API key read from OPENAI_API_KEY automatically
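The resolution order clients typically follow (explicit argument first, then the environment) can be sketched like this; `resolve_api_key` is a hypothetical helper for illustration, not a library function:

```python
import os

def resolve_api_key(explicit_key=None, env_var="OPENAI_API_KEY"):
    """Prefer an explicitly passed key; fall back to the environment variable."""
    key = explicit_key or os.environ.get(env_var)
    if key is None:
        raise ValueError(f"No API key: pass api_key or set {env_var}")
    return key

os.environ["OPENAI_API_KEY"] = "sk-demo"
print(resolve_api_key())  # → sk-demo
```

Failing fast with a clear error when no key is found is easier to debug than a 401 from the provider at request time.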

Switching providers

Switch between providers by changing the model client:
from autogen_ext.models.openai import OpenAIChatCompletionClient
model_client = OpenAIChatCompletionClient(model="gpt-4o")
The agent code stays the same - just swap the model client.
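A common pattern is to select the provider from configuration. A sketch with a hypothetical `make_client` factory (imports are deferred so only the chosen backend's package is needed):

```python
def make_client(provider: str, **kwargs):
    """Build a model client by provider name. The client classes are the
    ones shown in this page; this dispatch helper itself is illustrative."""
    if provider == "openai":
        from autogen_ext.models.openai import OpenAIChatCompletionClient
        return OpenAIChatCompletionClient(**kwargs)
    if provider == "anthropic":
        from autogen_ext.models.anthropic import AnthropicChatCompletionClient
        return AnthropicChatCompletionClient(**kwargs)
    if provider == "ollama":
        from autogen_ext.models.ollama import OllamaChatCompletionClient
        return OllamaChatCompletionClient(**kwargs)
    raise ValueError(f"Unknown provider: {provider}")

try:
    make_client("unknown")
except ValueError as e:
    print(e)  # → Unknown provider: unknown
```

With this, switching providers becomes a one-line config change, e.g. `make_client("openai", model="gpt-4o")`.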

Best practices

Never hardcode API keys in source code. Use environment variables or secret managers.
Test with cheaper models (gpt-4o-mini, claude-3-5-haiku) before using expensive ones.
Track token usage to control costs. Use TokenUsageTermination in teams.
Use Ollama or Llama.cpp for rapid development without API costs.
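The budget check that a token-based termination condition performs can be sketched as follows; `TokenBudget` is an illustrative class, not the library's `TokenUsageTermination` implementation:

```python
class TokenBudget:
    """Track cumulative token usage and signal when a cap is reached."""

    def __init__(self, max_total_tokens: int):
        self.max_total_tokens = max_total_tokens
        self.total = 0

    def add(self, prompt_tokens: int, completion_tokens: int) -> None:
        """Record usage from one model call."""
        self.total += prompt_tokens + completion_tokens

    @property
    def exceeded(self) -> bool:
        return self.total >= self.max_total_tokens

budget = TokenBudget(max_total_tokens=100)
budget.add(40, 30)
print(budget.exceeded)  # → False
budget.add(20, 15)
print(budget.exceeded)  # → True
```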

Next steps

  • Model Clients Guide - Full model client documentation
  • Azure Integration - Azure-specific configuration
  • Quickstart - Build your first agent
  • Examples - See model clients in action
