Overview

The AutoGen.AzureAIInference package integrates with the Azure AI Inference API, providing access to:
  • Models deployed on Azure AI Studio
  • GitHub Models marketplace
  • Azure OpenAI Service
  • Various model providers through a unified interface

Installation

dotnet add package AutoGen.AzureAIInference

Azure AI Studio Setup

1. Deploy a model

Deploy a model in Azure AI Studio and obtain:
  • Endpoint URL
  • API Key
  • Model name
2. Set environment variables

$env:AZURE_AI_ENDPOINT="https://your-endpoint.inference.ai.azure.com"
$env:AZURE_AI_API_KEY="your-api-key"
3. Create an agent

using AutoGen.AzureAIInference;
using AutoGen.AzureAIInference.Extension;
using AutoGen.Core;
using Azure.AI.Inference;
using Azure.Core;

var endpoint = Environment.GetEnvironmentVariable("AZURE_AI_ENDPOINT");
var apiKey = Environment.GetEnvironmentVariable("AZURE_AI_API_KEY");
var modelName = "gpt-4"; // Your deployed model

var client = new ChatCompletionsClient(
    new Uri(endpoint),
    new AzureKeyCredential(apiKey));

var agent = new ChatCompletionsClientAgent(
    chatCompletionsClient: client,
    name: "assistant",
    modelName: modelName,
    systemMessage: "You are a helpful AI assistant")
    .RegisterMessageConnector()
    .RegisterPrintMessage();

var response = await agent.SendAsync("Hello!");

GitHub Models

Use models from GitHub Models marketplace:
1. Get a GitHub token

Create a personal access token from GitHub Settings with appropriate scopes.
2. Set the environment variable

$env:GITHUB_TOKEN="ghp_your_token_here"
3. Create an agent with GitHub Models

using AutoGen.AzureAIInference;
using AutoGen.AzureAIInference.Extension;
using AutoGen.Core;
using Azure.AI.Inference;
using Azure.Core;

var githubToken = Environment.GetEnvironmentVariable("GITHUB_TOKEN");
var endpoint = "https://models.inference.ai.azure.com";
var modelName = "gpt-4o-mini"; // Available on GitHub Models

var client = new ChatCompletionsClient(
    new Uri(endpoint),
    new AzureKeyCredential(githubToken));

var agent = new ChatCompletionsClientAgent(
    chatCompletionsClient: client,
    name: "assistant",
    modelName: modelName)
    .RegisterMessageConnector()
    .RegisterPrintMessage();

var response = await agent.SendAsync("Explain async/await in C#");

ChatCompletionsClientAgent

The main agent class for Azure AI Inference:
using AutoGen.AzureAIInference;
using Azure.AI.Inference;
using Azure.Core;

var client = new ChatCompletionsClient(
    new Uri(endpoint),
    new AzureKeyCredential(apiKey));

var agent = new ChatCompletionsClientAgent(
    chatCompletionsClient: client,
    name: "assistant",
    modelName: "gpt-4",
    systemMessage: "You are a helpful assistant",
    temperature: 0.7f,
    maxTokens: 1024,
    seed: null,
    responseFormat: null)
    .RegisterMessageConnector()
    .RegisterPrintMessage();

Constructor Parameters

  • chatCompletionsClient (ChatCompletionsClient, required): Azure AI Inference chat client
  • name (string, required): Unique identifier for the agent
  • modelName (string, required): Model name (e.g., “gpt-4”, “llama-3.1-8b”)
  • systemMessage (string, default: "You are a helpful AI assistant"): Instructions defining the agent’s behavior
  • temperature (float, default: 0.7): Sampling temperature (0.0 = deterministic, 1.0 = creative)
  • maxTokens (int, default: 1024): Maximum tokens to generate
  • seed (int?, optional): Random seed for reproducible outputs
  • responseFormat (ChatCompletionsResponseFormat?, optional): Response format (e.g., JSON mode)

Available Models

GitHub Models

Models available through GitHub Models marketplace:
var endpoint = "https://models.inference.ai.azure.com";
var client = new ChatCompletionsClient(
    new Uri(endpoint),
    new AzureKeyCredential(githubToken));

// GPT-4o
var gpt4o = new ChatCompletionsClientAgent(
    client, "assistant", "gpt-4o")
    .RegisterMessageConnector();

// GPT-4o Mini (Fast and efficient)
var gpt4oMini = new ChatCompletionsClientAgent(
    client, "assistant", "gpt-4o-mini")
    .RegisterMessageConnector();

// GPT-4 Turbo
var gpt4Turbo = new ChatCompletionsClientAgent(
    client, "assistant", "gpt-4-turbo")
    .RegisterMessageConnector();

Basic Usage

Simple conversation example:
using AutoGen.AzureAIInference;
using AutoGen.AzureAIInference.Extension;
using AutoGen.Core;
using Azure.AI.Inference;
using Azure.Core;

var endpoint = Environment.GetEnvironmentVariable("AZURE_AI_ENDPOINT");
var apiKey = Environment.GetEnvironmentVariable("AZURE_AI_API_KEY");

var client = new ChatCompletionsClient(
    new Uri(endpoint),
    new AzureKeyCredential(apiKey));

var agent = new ChatCompletionsClientAgent(
    chatCompletionsClient: client,
    name: "assistant",
    modelName: "gpt-4",
    systemMessage: "You are a helpful coding assistant")
    .RegisterMessageConnector()
    .RegisterPrintMessage();

// Send text message
var response = await agent.SendAsync(
    new TextMessage(Role.User, "Write a C# factorial function"));

Console.WriteLine(response.GetContent());

Streaming Responses

Stream responses for real-time output:
var agent = new ChatCompletionsClientAgent(
    client, "assistant", "gpt-4")
    .RegisterMessageConnector();

var messages = new[]
{
    new TextMessage(Role.User, "Write a long story")
};

await foreach (var message in agent.GenerateStreamingReplyAsync(messages))
{
    if (message.GetContent() is string content)
    {
        Console.Write(content);
    }
}

Function Calling

Add function calling capabilities:
using AutoGen.Core;
using Microsoft.Extensions.AI;

public partial class DatabaseTools
{
    /// <summary>
    /// Query database
    /// </summary>
    /// <param name="query">SQL query</param>
    [Function]
    public async Task<string> QueryDatabase(string query)
    {
        // Execute query
        return "Query results...";
    }

    /// <summary>
    /// Get table schema
    /// </summary>
    /// <param name="tableName">table name</param>
    [Function]
    public async Task<string> GetSchema(string tableName)
    {
        return $"Schema for {tableName}...";
    }
}

var tools = new DatabaseTools();

AIFunction[] functions = [
    AIFunctionFactory.Create(tools.QueryDatabase),
    AIFunctionFactory.Create(tools.GetSchema),
];

var functionMiddleware = new FunctionCallMiddleware(functions);

var agent = new ChatCompletionsClientAgent(
    client, "assistant", "gpt-4")
    .RegisterMessageConnector()
    .RegisterStreamingMiddleware(functionMiddleware)
    .RegisterPrintMessage();

var response = await agent.SendAsync(
    new TextMessage(
        Role.User,
        "Show me the schema for the Users table"));

Multi-Agent with Different Models

Combine different models in group chat:
using AutoGen.Core;
using AutoGen.AzureAIInference;
using AutoGen.AzureAIInference.Extension;
using Azure.AI.Inference;
using Azure.Core;

var githubToken = Environment.GetEnvironmentVariable("GITHUB_TOKEN");
var endpoint = "https://models.inference.ai.azure.com";

var client = new ChatCompletionsClient(
    new Uri(endpoint),
    new AzureKeyCredential(githubToken));

// Fast agent with GPT-4o Mini
var researcher = new ChatCompletionsClientAgent(
    client,
    "researcher",
    "gpt-4o-mini",
    systemMessage: "You research and gather information quickly")
    .RegisterMessageConnector()
    .RegisterPrintMessage();

// Powerful agent with Llama 405B
var analyst = new ChatCompletionsClientAgent(
    client,
    "analyst",
    "meta-llama-3.1-405b-instruct",
    systemMessage: "You perform deep analysis and reasoning")
    .RegisterMessageConnector()
    .RegisterPrintMessage();

// Cost-effective agent with Phi-3
var writer = new ChatCompletionsClientAgent(
    client,
    "writer",
    "Phi-3-mini-4k-instruct",
    systemMessage: "You write clear, concise summaries")
    .RegisterMessageConnector()
    .RegisterPrintMessage();

var admin = new ChatCompletionsClientAgent(
    client, "admin", "gpt-4o")
    .RegisterMessageConnector();

var group = new GroupChat(
    members: [researcher, analyst, writer],
    admin: admin);

var result = await group.CallAsync(
    new[] { new TextMessage(Role.User, "Analyze market trends in AI") },
    maxRound: 10);

JSON Mode

Force JSON responses:
using Azure.AI.Inference;

var agent = new ChatCompletionsClientAgent(
    chatCompletionsClient: client,
    name: "assistant",
    modelName: "gpt-4",
    systemMessage: "You output valid JSON only",
    responseFormat: new ChatCompletionsResponseFormatJSON())
    .RegisterMessageConnector();

var response = await agent.SendAsync(
    new TextMessage(
        Role.User,
        "Create JSON for a person with name, age, and hobbies"));

Console.WriteLine(response.GetContent());
// Example output: {"name": "John", "age": 30, "hobbies": ["reading", "gaming"]}
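Once the model returns a JSON string, it can be deserialized with System.Text.Json. This is a sketch assuming the example payload above; the `Person` record is illustrative and not part of AutoGen:

```csharp
using System;
using System.Text.Json;

var json = """{"name": "John", "age": 30, "hobbies": ["reading", "gaming"]}""";

// Case-insensitive matching maps "name" -> Name, "age" -> Age, etc.
var options = new JsonSerializerOptions { PropertyNameCaseInsensitive = true };
var person = JsonSerializer.Deserialize<Person>(json, options);

Console.WriteLine(person!.Name);          // prints "John"
Console.WriteLine(person.Hobbies.Length); // prints "2"

// Illustrative shape matching the prompt above (not part of AutoGen).
public record Person(string Name, int Age, string[] Hobbies);
```

In practice, validate the deserialized object before use, since the model may omit fields even in JSON mode.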

Message Support

Supported message types:
using AutoGen.Core;

// Text message (default)
var textMsg = new TextMessage(Role.User, "Hello");

// Raw Azure AI message
var rawMsg = MessageEnvelope.Create(
    new ChatRequestUserMessage("Hello"));

// Agent handles both formats
var response = await agent.SendAsync(textMsg);

Configuration Options

Temperature and Randomness

// Deterministic (code, analysis)
var deterministic = new ChatCompletionsClientAgent(
    client, "assistant", "gpt-4",
    temperature: 0.0f,
    seed: 42)
    .RegisterMessageConnector();

// Creative (writing, brainstorming)
var creative = new ChatCompletionsClientAgent(
    client, "assistant", "gpt-4",
    temperature: 1.0f)
    .RegisterMessageConnector();

Token Limits

// Short responses (cost-effective)
var concise = new ChatCompletionsClientAgent(
    client, "assistant", "gpt-4o-mini",
    maxTokens: 500)
    .RegisterMessageConnector();

// Long-form content
var verbose = new ChatCompletionsClientAgent(
    client, "assistant", "gpt-4",
    maxTokens: 4096)
    .RegisterMessageConnector();

Best Practices

For speed and cost:
  • GPT-4o Mini
  • Phi-3 Mini
  • Mistral Small
For quality:
  • GPT-4o
  • Meta Llama 3.1 405B
  • Mistral Large
For balance:
  • GPT-4 Turbo
  • Meta Llama 3.1 70B
  • Mistral Nemo
// Use GitHub Models for development/testing (free tier)
var devAgent = new ChatCompletionsClientAgent(
    githubClient,
    "dev_assistant",
    "gpt-4o-mini",
    maxTokens: 500)
    .RegisterMessageConnector();

// Use Azure AI Studio for production
var prodAgent = new ChatCompletionsClientAgent(
    azureClient,
    "prod_assistant",
    "gpt-4",
    maxTokens: 2000)
    .RegisterMessageConnector();
Handle errors from the service gracefully:
using Azure;

try
{
    var response = await agent.SendAsync(message);
}
catch (RequestFailedException ex) when (ex.Status == 429)
{
    // Rate limited
    Console.WriteLine("Rate limit exceeded. Retrying...");
    await Task.Delay(TimeSpan.FromSeconds(5));
    // Retry
}
catch (RequestFailedException ex)
{
    Console.WriteLine($"Azure AI error: {ex.Message}");
    Console.WriteLine($"Status: {ex.Status}");
}
Additional tips:
  • Reuse ChatCompletionsClient instances
  • Use streaming for long responses
  • Set appropriate token limits
  • Consider model capabilities vs. cost
  • Use GitHub Models for testing

Environment Variables

  • AZURE_AI_ENDPOINT (string): Your Azure AI endpoint URL
  • AZURE_AI_API_KEY (string): Your Azure AI API key
  • GITHUB_TOKEN (string): GitHub personal access token for GitHub Models
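A small startup check can fail fast when one of these variables is missing, rather than failing later with a confusing null-argument error. `RequireEnv` is an illustrative helper, not part of the package:

```csharp
using System;

// Illustrative helper: read a required environment variable or fail fast.
static string RequireEnv(string name)
{
    var value = Environment.GetEnvironmentVariable(name);
    if (string.IsNullOrWhiteSpace(value))
        throw new InvalidOperationException(
            $"Missing required environment variable: {name}");
    return value;
}

// var endpoint = RequireEnv("AZURE_AI_ENDPOINT");
// var apiKey = RequireEnv("AZURE_AI_API_KEY");
```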

GitHub Models vs Azure AI Studio

GitHub Models

Advantages:
  • Free tier for testing
  • Easy setup with GitHub token
  • Access to multiple model providers
  • Great for development
Limitations:
  • Rate limits on free tier
  • Not for production at scale

Azure AI Studio

Advantages:
  • Production-ready
  • Enterprise SLAs
  • Private deployments
  • Custom fine-tuning
  • Higher rate limits
Use for:
  • Production applications
  • High-volume scenarios
  • Enterprise requirements
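The dev/test vs. production guidance above can be wired up with a simple environment switch. This is a sketch: `SelectBackend` is a hypothetical helper, and `ASPNETCORE_ENVIRONMENT` is one common convention for the flag; the environment variable names are the ones used earlier on this page:

```csharp
using System;

// Illustrative: pick GitHub Models for development, Azure AI Studio for
// production. Returns the endpoint and the name of the variable holding
// the credential for that backend.
static (string Endpoint, string KeyVariable) SelectBackend(bool isProduction) =>
    isProduction
        ? (Environment.GetEnvironmentVariable("AZURE_AI_ENDPOINT") ?? "", "AZURE_AI_API_KEY")
        : ("https://models.inference.ai.azure.com", "GITHUB_TOKEN");

var (endpoint, keyVar) = SelectBackend(
    Environment.GetEnvironmentVariable("ASPNETCORE_ENVIRONMENT") == "Production");

// var client = new ChatCompletionsClient(
//     new Uri(endpoint),
//     new AzureKeyCredential(Environment.GetEnvironmentVariable(keyVar)!));
```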

Next Steps

OpenAI Integration

Use OpenAI models directly

Function Calling

Add tools to your agents

Group Chat

Create multi-agent workflows

Examples

See complete examples
