AgentChat agents can work with any LLM provider through the model client abstraction. All model clients implement the ChatCompletionClient interface.

OpenAI

The most commonly used provider:
from autogen_ext.models.openai import OpenAIChatCompletionClient

model_client = OpenAIChatCompletionClient(
    model="gpt-4o",
    api_key="sk-...",  # Or set OPENAI_API_KEY env var
    temperature=0.7,
    max_tokens=2000
)

Available models

  • gpt-4o - Latest multimodal model
  • gpt-4o-mini - Faster, cheaper version
  • gpt-4-turbo - Previous generation
  • gpt-3.5-turbo - Legacy, cheaper

Azure OpenAI

For Azure deployments:
from autogen_ext.models.openai import AzureOpenAIChatCompletionClient

model_client = AzureOpenAIChatCompletionClient(
    azure_endpoint="https://your-resource.openai.azure.com",
    azure_deployment="your-deployment-name",  # Name of your Azure deployment
    model="gpt-4o",  # Underlying model name
    api_version="2024-02-15-preview",
    api_key="...",  # Or use Azure AD authentication
)
See Azure OpenAI documentation for more details.

Anthropic

For Claude models:
from autogen_ext.models.anthropic import AnthropicChatCompletionClient

model_client = AnthropicChatCompletionClient(
    model="claude-3-5-sonnet-20241022",
    api_key="sk-ant-...",  # Or set ANTHROPIC_API_KEY
    max_tokens=4096
)

Available models

  • claude-3-5-sonnet-20241022 - Latest, most capable
  • claude-3-5-haiku-20241022 - Fast and efficient
  • claude-3-opus-20240229 - Most powerful (legacy)

Extended thinking

Claude supports extended thinking mode:
model_client = AnthropicChatCompletionClient(
    model="claude-3-7-sonnet-20250219",
    extended_thinking=True  # Enable thinking mode
)

Ollama

For local models:
from autogen_ext.models.ollama import OllamaChatCompletionClient

model_client = OllamaChatCompletionClient(
    model="llama3.2",
    base_url="http://localhost:11434",  # Ollama server
    temperature=0.7
)
Ollama requires a locally running Ollama server. Install it from ollama.com.

Available models

  • llama3.2 - Meta’s Llama 3.2
  • codellama - Code-specialized
  • mistral - Mistral AI models
  • phi3 - Microsoft’s Phi-3
See Ollama library for all models.

Llama.cpp

For GGUF models:
from autogen_ext.models.llama_cpp import LlamaCppChatCompletionClient

model_client = LlamaCppChatCompletionClient(
    model_path="/path/to/model.gguf",
    n_gpu_layers=35,  # GPU acceleration
    temperature=0.7
)

Configuration options

All model clients support these common parameters:

  • model (str, required) - The model identifier
  • temperature (float, default 1.0) - Sampling temperature (0.0 to 2.0). Lower = more deterministic.
  • max_tokens (int) - Maximum tokens to generate
  • top_p (float, default 1.0) - Nucleus sampling parameter
  • api_key (str) - API key (can also use environment variables)

Streaming

Enable streaming for real-time responses:
import asyncio

from autogen_agentchat.agents import AssistantAgent
from autogen_agentchat.base import TaskResult

agent = AssistantAgent(
    "assistant",
    model_client=model_client,
    model_client_stream=True  # Enable streaming
)

# Stream responses; the final item yielded is a TaskResult
async def main() -> None:
    async for item in agent.run_stream(task="Tell me a story"):
        if isinstance(item, TaskResult):
            print("Done:", len(item.messages), "messages")
        else:
            print(item)

asyncio.run(main())

Token counting

Track token usage:
result = await agent.run(task="Hello!")

# Check token usage (models_usage is None for messages that made no LLM call)
for msg in result.messages:
    usage = getattr(msg, "models_usage", None)
    if usage is not None:
        print(f"Prompt tokens: {usage.prompt_tokens}")
        print(f"Completion tokens: {usage.completion_tokens}")
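To total usage across a whole run, the per-message figures can simply be summed. A minimal sketch, where `Usage` is a stand-in dataclass (so the example is self-contained) and `total_tokens` is a hypothetical helper, not part of the library:

```python
from dataclasses import dataclass

@dataclass
class Usage:
    """Stand-in for a model client's usage record."""
    prompt_tokens: int
    completion_tokens: int

def total_tokens(usages):
    """Sum prompt and completion tokens across a list of usage records."""
    prompt = sum(u.prompt_tokens for u in usages)
    completion = sum(u.completion_tokens for u in usages)
    return prompt, completion, prompt + completion

usages = [Usage(12, 30), Usage(8, 15)]
print(total_tokens(usages))  # → (20, 45, 65)
```

In practice you would collect the non-None `models_usage` values from `result.messages` and pass them to a helper like this.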

Model comparison

| Provider     | Strengths                       | Cost | Local |
| ------------ | ------------------------------- | ---- | ----- |
| OpenAI       | Most capable, multimodal        | $$$  | No    |
| Azure OpenAI | Enterprise features, compliance | $$$  | No    |
| Anthropic    | Long context, safety            | $$$  | No    |
| Ollama       | Free, privacy                   | Free | Yes   |
| Llama.cpp    | Maximum control, GGUF support   | Free | Yes   |

Environment variables

Set API keys via environment variables:
export OPENAI_API_KEY="sk-..."
export ANTHROPIC_API_KEY="sk-ant-..."
export AZURE_OPENAI_API_KEY="..."
export AZURE_OPENAI_ENDPOINT="https://..."
Then create clients without explicit keys:
model_client = OpenAIChatCompletionClient(model="gpt-4o")
# API key read from OPENAI_API_KEY automatically
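The resolution order clients typically follow (explicit argument first, then the environment) can be sketched like this; `resolve_api_key` is a hypothetical helper for illustration, not a library function:

```python
import os

def resolve_api_key(explicit_key=None, env_var="OPENAI_API_KEY"):
    """Prefer an explicitly passed key; fall back to the environment variable."""
    key = explicit_key or os.environ.get(env_var)
    if key is None:
        raise ValueError(f"No API key: pass api_key or set {env_var}")
    return key

os.environ["OPENAI_API_KEY"] = "sk-demo"
print(resolve_api_key())  # → sk-demo
```

Failing fast with a clear error when no key is found is easier to debug than a 401 from the provider at request time.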

Switching providers

Switch between providers by changing the model client:
from autogen_ext.models.openai import OpenAIChatCompletionClient
model_client = OpenAIChatCompletionClient(model="gpt-4o")
The agent code stays the same - just swap the model client.
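A common pattern is to select the provider from configuration. A sketch with a hypothetical `make_client` factory (imports are deferred so only the chosen backend's package is needed):

```python
def make_client(provider: str, **kwargs):
    """Build a model client by provider name. The client classes are the
    ones shown in this page; this dispatch helper itself is illustrative."""
    if provider == "openai":
        from autogen_ext.models.openai import OpenAIChatCompletionClient
        return OpenAIChatCompletionClient(**kwargs)
    if provider == "anthropic":
        from autogen_ext.models.anthropic import AnthropicChatCompletionClient
        return AnthropicChatCompletionClient(**kwargs)
    if provider == "ollama":
        from autogen_ext.models.ollama import OllamaChatCompletionClient
        return OllamaChatCompletionClient(**kwargs)
    raise ValueError(f"Unknown provider: {provider}")

try:
    make_client("unknown")
except ValueError as e:
    print(e)  # → Unknown provider: unknown
```

With this, switching providers becomes a one-line config change, e.g. `make_client("openai", model="gpt-4o")`.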

Best practices

Never hardcode API keys in source code. Use environment variables or secret managers.
Test with cheaper models (gpt-4o-mini, claude-3-5-haiku) before using expensive ones.
Track token usage to control costs. Use TokenUsageTermination in teams.
Use Ollama or Llama.cpp for rapid development without API costs.
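The budget check that a token-based termination condition performs can be sketched as follows; `TokenBudget` is an illustrative class, not the library's `TokenUsageTermination` implementation:

```python
class TokenBudget:
    """Track cumulative token usage and signal when a cap is reached."""

    def __init__(self, max_total_tokens: int):
        self.max_total_tokens = max_total_tokens
        self.total = 0

    def add(self, prompt_tokens: int, completion_tokens: int) -> None:
        """Record usage from one model call."""
        self.total += prompt_tokens + completion_tokens

    @property
    def exceeded(self) -> bool:
        return self.total >= self.max_total_tokens

budget = TokenBudget(max_total_tokens=100)
budget.add(40, 30)
print(budget.exceeded)  # → False
budget.add(20, 15)
print(budget.exceeded)  # → True
```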

Next steps

  • Model Clients Guide - Full model client documentation
  • Azure Integration - Azure-specific configuration
  • Quickstart - Build your first agent
  • Examples - See model clients in action
