Overview
The AutoGen.AzureAIInference package integrates with the Azure AI Inference API, providing access to:
Models deployed on Azure AI Studio
GitHub Models marketplace
Azure OpenAI Service
Various model providers through a unified interface
Installation
dotnet add package AutoGen.AzureAIInference
Azure AI Studio Setup
Deploy a model
Deploy a model in Azure AI Studio and obtain:
Endpoint URL
API Key
Model name
Set environment variables
Windows (PowerShell)
$env:AZURE_AI_ENDPOINT = "https://your-endpoint.inference.ai.azure.com"
$env:AZURE_AI_API_KEY = "your-api-key"
macOS/Linux
export AZURE_AI_ENDPOINT="https://your-endpoint.inference.ai.azure.com"
export AZURE_AI_API_KEY="your-api-key"
Create an agent
using AutoGen.AzureAIInference;
using AutoGen.AzureAIInference.Extension;
using Azure;
using Azure.AI.Inference;

var endpoint = Environment.GetEnvironmentVariable("AZURE_AI_ENDPOINT");
var apiKey = Environment.GetEnvironmentVariable("AZURE_AI_API_KEY");
var modelName = "gpt-4"; // Your deployed model

var client = new ChatCompletionsClient(
    new Uri(endpoint),
    new AzureKeyCredential(apiKey));

var agent = new ChatCompletionsClientAgent(
    chatCompletionsClient: client,
    name: "assistant",
    modelName: modelName,
    systemMessage: "You are a helpful AI assistant")
    .RegisterMessageConnector()
    .RegisterPrintMessage();

var response = await agent.SendAsync("Hello!");
GitHub Models
Use models from GitHub Models marketplace:
Get GitHub token
Create a personal access token from GitHub Settings with appropriate scopes.
Set environment variable
Windows (PowerShell)
$env:GITHUB_TOKEN = "ghp_your_token_here"
macOS/Linux
export GITHUB_TOKEN="ghp_your_token_here"
Create agent with GitHub Models
using AutoGen.AzureAIInference;
using AutoGen.AzureAIInference.Extension;
using Azure;
using Azure.AI.Inference;

var githubToken = Environment.GetEnvironmentVariable("GITHUB_TOKEN");
var endpoint = "https://models.inference.ai.azure.com";
var modelName = "gpt-4o-mini"; // Available on GitHub Models

var client = new ChatCompletionsClient(
    new Uri(endpoint),
    new AzureKeyCredential(githubToken));

var agent = new ChatCompletionsClientAgent(
    chatCompletionsClient: client,
    name: "assistant",
    modelName: modelName)
    .RegisterMessageConnector()
    .RegisterPrintMessage();

var response = await agent.SendAsync("Explain async/await in C#");
ChatCompletionsClientAgent
The main agent class for Azure AI Inference:
using AutoGen.AzureAIInference;
using AutoGen.AzureAIInference.Extension;
using Azure;
using Azure.AI.Inference;

var client = new ChatCompletionsClient(
    new Uri(endpoint),
    new AzureKeyCredential(apiKey));

var agent = new ChatCompletionsClientAgent(
    chatCompletionsClient: client,
    name: "assistant",
    modelName: "gpt-4",
    systemMessage: "You are a helpful assistant",
    temperature: 0.7f,
    maxTokens: 1024,
    seed: null,
    responseFormat: null)
    .RegisterMessageConnector()
    .RegisterPrintMessage();
Constructor Parameters
chatCompletionsClient (ChatCompletionsClient, required): Azure AI Inference chat client
name (string, required): Unique identifier for the agent
modelName (string, required): Model name (e.g., "gpt-4", "llama-3.1-8b")
systemMessage (string, default: "You are a helpful AI assistant"): Instructions defining the agent's behavior
temperature (float?, optional): Sampling temperature (0.0 = deterministic, 1.0 = creative)
maxTokens (int?, optional): Maximum number of tokens to generate
seed (int?, optional): Random seed for reproducible outputs
responseFormat (ChatCompletionsResponseFormat?, optional): Response format (e.g., JSON mode)
Available Models
GitHub Models
Models available through GitHub Models marketplace:
OpenAI Models
Meta Models
Mistral Models
Other Models
var endpoint = "https://models.inference.ai.azure.com";
var client = new ChatCompletionsClient(
    new Uri(endpoint),
    new AzureKeyCredential(githubToken));

// GPT-4o
var gpt4o = new ChatCompletionsClientAgent(
    client, "assistant", "gpt-4o")
    .RegisterMessageConnector();

// GPT-4o Mini (fast and efficient)
var gpt4oMini = new ChatCompletionsClientAgent(
    client, "assistant", "gpt-4o-mini")
    .RegisterMessageConnector();

// GPT-4 Turbo
var gpt4Turbo = new ChatCompletionsClientAgent(
    client, "assistant", "gpt-4-turbo")
    .RegisterMessageConnector();

// Llama 3.1 8B Instruct
var llama8b = new ChatCompletionsClientAgent(
    client, "assistant", "meta-llama-3.1-8b-instruct")
    .RegisterMessageConnector();

// Llama 3.1 70B Instruct
var llama70b = new ChatCompletionsClientAgent(
    client, "assistant", "meta-llama-3.1-70b-instruct")
    .RegisterMessageConnector();

// Llama 3.1 405B Instruct (most capable)
var llama405b = new ChatCompletionsClientAgent(
    client, "assistant", "meta-llama-3.1-405b-instruct")
    .RegisterMessageConnector();

// Mistral Small
var mistralSmall = new ChatCompletionsClientAgent(
    client, "assistant", "Mistral-small")
    .RegisterMessageConnector();

// Mistral Large
var mistralLarge = new ChatCompletionsClientAgent(
    client, "assistant", "Mistral-large")
    .RegisterMessageConnector();

// Mistral Nemo
var mistralNemo = new ChatCompletionsClientAgent(
    client, "assistant", "Mistral-Nemo")
    .RegisterMessageConnector();

// Cohere Command R+
var cohere = new ChatCompletionsClientAgent(
    client, "assistant", "cohere-command-r-plus")
    .RegisterMessageConnector();

// Phi-3 Mini
var phi3 = new ChatCompletionsClientAgent(
    client, "assistant", "Phi-3-mini-4k-instruct")
    .RegisterMessageConnector();

// Check the GitHub Models marketplace for the latest available models
Basic Usage
Simple conversation example:
using AutoGen.AzureAIInference;
using AutoGen.AzureAIInference.Extension;
using AutoGen.Core;
using Azure;
using Azure.AI.Inference;

var endpoint = Environment.GetEnvironmentVariable("AZURE_AI_ENDPOINT");
var apiKey = Environment.GetEnvironmentVariable("AZURE_AI_API_KEY");

var client = new ChatCompletionsClient(
    new Uri(endpoint),
    new AzureKeyCredential(apiKey));

var agent = new ChatCompletionsClientAgent(
    chatCompletionsClient: client,
    name: "assistant",
    modelName: "gpt-4",
    systemMessage: "You are a helpful coding assistant")
    .RegisterMessageConnector()
    .RegisterPrintMessage();

// Send text message
var response = await agent.SendAsync(
    new TextMessage(Role.User, "Write a C# factorial function"));

Console.WriteLine(response.GetContent());
Streaming Responses
Stream responses for real-time output:
var agent = new ChatCompletionsClientAgent(
    client, "assistant", "gpt-4")
    .RegisterMessageConnector();

var messages = new[]
{
    new TextMessage(Role.User, "Write a long story")
};

await foreach (var message in agent.GenerateStreamingReplyAsync(messages))
{
    if (message.GetContent() is string content)
    {
        Console.Write(content);
    }
}
Function Calling
Add function calling capabilities:
using AutoGen.Core;
using Microsoft.Extensions.AI;

public partial class DatabaseTools
{
    /// <summary>
    /// Query database
    /// </summary>
    /// <param name="query">SQL query</param>
    [Function]
    public async Task<string> QueryDatabase(string query)
    {
        // Execute query
        return "Query results...";
    }

    /// <summary>
    /// Get table schema
    /// </summary>
    /// <param name="tableName">table name</param>
    [Function]
    public async Task<string> GetSchema(string tableName)
    {
        return $"Schema for {tableName}...";
    }
}

var tools = new DatabaseTools();
AIFunction[] functions = [
    AIFunctionFactory.Create(tools.QueryDatabase),
    AIFunctionFactory.Create(tools.GetSchema),
];

var functionMiddleware = new FunctionCallMiddleware(functions);

var agent = new ChatCompletionsClientAgent(
    client, "assistant", "gpt-4")
    .RegisterMessageConnector()
    .RegisterStreamingMiddleware(functionMiddleware)
    .RegisterPrintMessage();

var response = await agent.SendAsync(
    new TextMessage(
        Role.User,
        "Show me the schema for the Users table"));
Multi-Agent with Different Models
Combine different models in group chat:
using AutoGen.AzureAIInference;
using AutoGen.AzureAIInference.Extension;
using AutoGen.Core;
using Azure;
using Azure.AI.Inference;

var githubToken = Environment.GetEnvironmentVariable("GITHUB_TOKEN");
var endpoint = "https://models.inference.ai.azure.com";

var client = new ChatCompletionsClient(
    new Uri(endpoint),
    new AzureKeyCredential(githubToken));

// Fast agent with GPT-4o Mini
var researcher = new ChatCompletionsClientAgent(
    client,
    "researcher",
    "gpt-4o-mini",
    systemMessage: "You research and gather information quickly")
    .RegisterMessageConnector()
    .RegisterPrintMessage();

// Powerful agent with Llama 405B
var analyst = new ChatCompletionsClientAgent(
    client,
    "analyst",
    "meta-llama-3.1-405b-instruct",
    systemMessage: "You perform deep analysis and reasoning")
    .RegisterMessageConnector()
    .RegisterPrintMessage();

// Cost-effective agent with Phi-3
var writer = new ChatCompletionsClientAgent(
    client,
    "writer",
    "Phi-3-mini-4k-instruct",
    systemMessage: "You write clear, concise summaries")
    .RegisterMessageConnector()
    .RegisterPrintMessage();

var admin = new ChatCompletionsClientAgent(
    client, "admin", "gpt-4o")
    .RegisterMessageConnector();

var group = new GroupChat(
    members: [researcher, analyst, writer],
    admin: admin);

var result = await group.CallAsync(
    new[] { new TextMessage(Role.User, "Analyze market trends in AI") },
    maxRound: 10);
JSON Mode
Force JSON responses:
using Azure.AI.Inference;

var agent = new ChatCompletionsClientAgent(
    chatCompletionsClient: client,
    name: "assistant",
    modelName: "gpt-4",
    systemMessage: "You output valid JSON only",
    responseFormat: new ChatCompletionsResponseFormatJSON())
    .RegisterMessageConnector();

var response = await agent.SendAsync(
    new TextMessage(
        Role.User,
        "Create JSON for a person with name, age, and hobbies"));

Console.WriteLine(response.GetContent());
// Example output: {"name": "John", "age": 30, "hobbies": ["reading", "gaming"]}
Message Support
Supported message types:
using AutoGen.Core;
using Azure.AI.Inference;

// Text message (default)
var textMsg = new TextMessage(Role.User, "Hello");

// Raw Azure AI message
var rawMsg = MessageEnvelope.Create(
    new ChatRequestUserMessage("Hello"));

// Agent handles both formats
var response = await agent.SendAsync(textMsg);
Configuration Options
Temperature and Randomness
// Deterministic (code, analysis)
var deterministic = new ChatCompletionsClientAgent(
    client, "assistant", "gpt-4",
    temperature: 0.0f,
    seed: 42)
    .RegisterMessageConnector();

// Creative (writing, brainstorming)
var creative = new ChatCompletionsClientAgent(
    client, "assistant", "gpt-4",
    temperature: 1.0f)
    .RegisterMessageConnector();
Token Limits
// Short responses (cost-effective)
var concise = new ChatCompletionsClientAgent(
    client, "assistant", "gpt-4o-mini",
    maxTokens: 500)
    .RegisterMessageConnector();

// Long-form content
var verbose = new ChatCompletionsClientAgent(
    client, "assistant", "gpt-4",
    maxTokens: 4096)
    .RegisterMessageConnector();
Best Practices
For speed and cost:
GPT-4o Mini
Phi-3 Mini
Mistral Small
For quality:
GPT-4o
Meta Llama 3.1 405B
Mistral Large
For balance:
GPT-4 Turbo
Meta Llama 3.1 70B
Mistral Nemo
// Use GitHub Models for development/testing (free tier)
var devAgent = new ChatCompletionsClientAgent(
    githubClient,
    "dev_assistant",
    "gpt-4o-mini",
    maxTokens: 500)
    .RegisterMessageConnector();

// Use Azure AI Studio for production
var prodAgent = new ChatCompletionsClientAgent(
    azureClient,
    "prod_assistant",
    "gpt-4",
    maxTokens: 2000)
    .RegisterMessageConnector();
Error Handling
Handle rate limits and service errors with RequestFailedException:
using Azure;

try
{
    var response = await agent.SendAsync(message);
}
catch (RequestFailedException ex) when (ex.Status == 429)
{
    // Rate limited
    Console.WriteLine("Rate limit exceeded. Retrying...");
    await Task.Delay(TimeSpan.FromSeconds(5));
    // Retry
}
catch (RequestFailedException ex)
{
    Console.WriteLine($"Azure AI error: {ex.Message}");
    Console.WriteLine($"Status: {ex.Status}");
}
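The catch block above only logs "Retrying..." without actually retrying. A minimal retry helper with exponential backoff might look like the sketch below; the SendWithRetryAsync name and the attempt/delay parameters are illustrative, not part of the package.

```csharp
using AutoGen.Core;
using Azure;

// Hypothetical helper: retries a send on HTTP 429 with exponential backoff.
static async Task<IMessage> SendWithRetryAsync(
    IAgent agent, IMessage message, int maxAttempts = 3)
{
    for (var attempt = 1; ; attempt++)
    {
        try
        {
            return await agent.SendAsync(message);
        }
        catch (RequestFailedException ex) when (ex.Status == 429 && attempt < maxAttempts)
        {
            // Wait 2, 4, 8... seconds before the next attempt
            var delay = TimeSpan.FromSeconds(Math.Pow(2, attempt));
            Console.WriteLine($"Rate limited. Retrying in {delay.TotalSeconds}s...");
            await Task.Delay(delay);
        }
    }
}
```

On the final attempt the exception propagates to the caller, so permanent failures are still surfaced rather than swallowed.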
Environment Variables
AZURE_AI_ENDPOINT: Your Azure AI endpoint URL
AZURE_AI_API_KEY: API key for your Azure AI deployment
GITHUB_TOKEN: GitHub personal access token for GitHub Models
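Since Environment.GetEnvironmentVariable returns null when a variable is unset, it can help to fail fast with a clear message before constructing the client. A small sketch, assuming the variable names above (the RequireEnv helper is illustrative):

```csharp
using Azure;
using Azure.AI.Inference;

// Illustrative helper: throw a descriptive error if a variable is missing
static string RequireEnv(string name) =>
    Environment.GetEnvironmentVariable(name)
    ?? throw new InvalidOperationException($"Environment variable {name} is not set");

var endpoint = RequireEnv("AZURE_AI_ENDPOINT");
var apiKey = RequireEnv("AZURE_AI_API_KEY");

var client = new ChatCompletionsClient(
    new Uri(endpoint),
    new AzureKeyCredential(apiKey));
```

This turns a confusing ArgumentNullException from `new Uri(null)` into an actionable configuration error.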
GitHub Models vs Azure AI Studio
GitHub Models Advantages:
Free tier for testing
Easy setup with GitHub token
Access to multiple model providers
Great for development
Limitations:
Rate limits on free tier
Not for production at scale
Azure AI Studio Advantages:
Production-ready
Enterprise SLAs
Private deployments
Custom fine-tuning
Higher rate limits
Use for:
Production applications
High-volume scenarios
Enterprise requirements
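One way to apply this split is to choose the endpoint and credential from configuration, so the same agent code runs against GitHub Models in development and an Azure AI Studio deployment in production. A sketch under that assumption; the USE_GITHUB_MODELS flag is illustrative, not a package convention:

```csharp
using Azure;
using Azure.AI.Inference;

// Illustrative: select endpoint/credential by environment
var useGitHub = Environment.GetEnvironmentVariable("USE_GITHUB_MODELS") == "true";

var (endpoint, key) = useGitHub
    ? ("https://models.inference.ai.azure.com",
       Environment.GetEnvironmentVariable("GITHUB_TOKEN"))
    : (Environment.GetEnvironmentVariable("AZURE_AI_ENDPOINT"),
       Environment.GetEnvironmentVariable("AZURE_AI_API_KEY"));

var client = new ChatCompletionsClient(
    new Uri(endpoint),
    new AzureKeyCredential(key));

// Agents built from this client are otherwise identical in both environments
```

Because ChatCompletionsClient is the only environment-specific piece, the rest of the agent setup stays unchanged between the two backends.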
Next Steps
OpenAI Integration Use OpenAI models directly
Function Calling Add tools to your agents
Group Chat Create multi-agent workflows
Examples See complete examples