Other LLM Providers

Beyond Azure OpenAI, OpenAI, and Anthropic, Microsoft Agent Framework supports several additional providers, including AWS Bedrock, Ollama for local models, GitHub Copilot, and Azure AI Foundry Local.

Supported Providers

AWS Bedrock

Access models via Amazon Bedrock

Ollama

Run models locally with Ollama

GitHub Copilot

Use GitHub Copilot models

Azure AI Foundry Local

Local model inference via Foundry

AWS Bedrock

AWS Bedrock provides access to foundation models from multiple providers through a single API.

Installation

pip install agent-framework --pre
pip install agent-framework-bedrock

Authentication

Bedrock uses AWS credentials:
AWS_ACCESS_KEY_ID=your-access-key
AWS_SECRET_ACCESS_KEY=your-secret-key
AWS_SESSION_TOKEN=your-session-token  # Optional
BEDROCK_REGION=us-east-1
BEDROCK_CHAT_MODEL_ID=anthropic.claude-3-sonnet-20240229-v1:0

Basic Usage

import asyncio
from agent_framework import Agent
from agent_framework.amazon import BedrockChatClient

async def main():
    # Create agent with Bedrock
    agent = Agent(
        client=BedrockChatClient(),
        instructions="You are a helpful assistant.",
        name="BedrockAgent",
    )
    
    result = await agent.run("What is the capital of France?")
    print(result.text)

asyncio.run(main())

Available Models

Bedrock provides access to models from multiple providers:
| Provider | Model ID | Best For |
|---|---|---|
| Anthropic | anthropic.claude-3-sonnet-20240229-v1:0 | General purpose |
| Anthropic | anthropic.claude-3-haiku-20240307-v1:0 | Speed and cost |
| Anthropic | anthropic.claude-3-opus-20240229-v1:0 | Maximum capability |
| Meta | meta.llama3-70b-instruct-v1:0 | Open source, reasoning |
| Amazon | amazon.titan-text-premier-v1:0 | AWS-native |
| AI21 Labs | ai21.jamba-instruct-v1:0 | Long context |
| Cohere | cohere.command-r-plus-v1:0 | Retrieval, summarization |
| Mistral | mistral.mistral-large-2407-v1:0 | Multilingual |
Model availability varies by AWS region. Check the Bedrock documentation for details.
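If you switch between these models by use case, one lightweight approach is to keep the IDs from the table in a small registry. The mapping and helper below are illustrative, not part of the framework:

```python
# Illustrative registry of Bedrock model IDs taken from the table above.
BEDROCK_MODELS = {
    "general": "anthropic.claude-3-sonnet-20240229-v1:0",
    "fast": "anthropic.claude-3-haiku-20240307-v1:0",
    "max_capability": "anthropic.claude-3-opus-20240229-v1:0",
    "open_source": "meta.llama3-70b-instruct-v1:0",
    "long_context": "ai21.jamba-instruct-v1:0",
}

def pick_model(use_case: str) -> str:
    """Return a model ID for a use case, falling back to general purpose."""
    return BEDROCK_MODELS.get(use_case, BEDROCK_MODELS["general"])

print(pick_model("fast"))     # anthropic.claude-3-haiku-20240307-v1:0
print(pick_model("unknown"))  # falls back to the general-purpose model
```

The returned ID can then be passed as `model_id` when constructing `BedrockChatClient`.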

Configuration

from agent_framework.amazon import BedrockChatClient

client = BedrockChatClient(
    model_id="anthropic.claude-3-sonnet-20240229-v1:0",
    region="us-east-1",
    max_tokens=4096,
)

Function Calling

import asyncio
from typing import Annotated
from agent_framework import Agent, tool
from agent_framework.amazon import BedrockChatClient
from pydantic import Field

@tool(approval_mode="never_require")
def get_weather(city: Annotated[str, Field(description="City name")]) -> dict:
    """Get the weather for a city."""
    return {"city": city, "forecast": "72F and sunny"}

async def main():
    agent = Agent(
        client=BedrockChatClient(),
        instructions="You are a weather assistant.",
        name="WeatherAgent",
        tools=[get_weather],
    )
    
    result = await agent.run("What's the weather in Seattle?")
    print(result.text)

asyncio.run(main())
Not all Bedrock models support function calling. Claude 3 models have excellent function calling support.

Ollama

Ollama enables running large language models locally on your machine.

Installation

  1. Install Ollama from ollama.com
  2. Pull a model: ollama pull llama3.2
  3. Install the framework package:
pip install agent-framework --pre
pip install agent-framework-ollama

Basic Usage

import asyncio
from agent_framework.ollama import OllamaChatClient

async def main():
    # Ollama must be running locally (ollama serve)
    client = OllamaChatClient()
    agent = client.as_agent(
        instructions="You are a helpful assistant.",
    )
    
    result = await agent.run("What is the capital of France?")
    print(result)

asyncio.run(main())

Configuration

OLLAMA_ENDPOINT=http://localhost:11434
OLLAMA_MODEL_ID=llama3.2

Available Models

Popular models available via Ollama:
| Model | Size | Best For | Function Calling |
|---|---|---|---|
| llama3.2 | 3B | Fast, general purpose | ✅ Limited |
| llama3.1 | 8B/70B | Reasoning, coding | ✅ Limited |
| mistral | 7B | Instruction following | ⚠️ Limited |
| codellama | 7B/13B/34B | Code generation | ❌ |
| phi3 | 3.8B | Small, efficient | ⚠️ Limited |
| gemma2 | 9B/27B | Google’s model | ⚠️ Limited |
| qwen2.5 | 0.5B-72B | Multilingual | ✅ Good |
| deepseek-coder | 6.7B/33B | Code understanding | ❌ |

Install models with ollama pull <model-name>. Not all models support function calling; check model capabilities before using tools.

Multimodal Models

Some Ollama models support vision:
import asyncio
from agent_framework import Message
from agent_framework.ollama import OllamaChatClient

async def main():
    # Use a multimodal model like llava
    client = OllamaChatClient(model_id="llava")
    agent = client.as_agent(
        instructions="You analyze images.",
    )
    
    message = Message(
        role="user",
        text="What's in this image?",
        images=["path/to/image.jpg"],
    )
    
    result = await agent.run(message)
    print(result)

asyncio.run(main())
Multimodal models like llava and llava-phi3 support image inputs. Pull them with ollama pull llava.

GitHub Copilot

Use GitHub Copilot models through the Copilot CLI.

Installation

  1. Install GitHub Copilot CLI
  2. Install the framework package:
pip install agent-framework --pre
pip install agent-framework-github-copilot

Basic Usage

import asyncio
from agent_framework.github import GitHubCopilotAgent

async def main():
    agent = GitHubCopilotAgent(
        instructions="You are a helpful assistant.",
    )
    
    async with agent:
        result = await agent.run("What is the capital of France?")
        print(result)

asyncio.run(main())

Configuration

GITHUB_COPILOT_CLI_PATH=/path/to/copilot-cli
GITHUB_COPILOT_MODEL=gpt-5
GITHUB_COPILOT_TIMEOUT=30
GITHUB_COPILOT_LOG_LEVEL=info
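These variables can be read in one place before constructing the agent. The loader below is a sketch; only the environment variable names come from this page, while the dictionary keys and defaults are illustrative:

```python
import os

def load_copilot_config() -> dict:
    """Read the GitHub Copilot settings above, with illustrative defaults."""
    return {
        "cli_path": os.environ.get("GITHUB_COPILOT_CLI_PATH", "copilot"),
        "model": os.environ.get("GITHUB_COPILOT_MODEL", "gpt-5"),
        "timeout": int(os.environ.get("GITHUB_COPILOT_TIMEOUT", "30")),
        "log_level": os.environ.get("GITHUB_COPILOT_LOG_LEVEL", "info"),
    }

os.environ["GITHUB_COPILOT_MODEL"] = "claude-sonnet-4"
print(load_copilot_config()["model"])  # claude-sonnet-4
```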

Available Models

GitHub Copilot provides access to multiple models:
  • gpt-5 - Latest GPT model
  • claude-sonnet-4 - Anthropic Claude
  • o1-preview - OpenAI reasoning model
  • o3-mini - Compact reasoning model
Model availability depends on your GitHub Copilot subscription and organization settings.

Azure AI Foundry Local

Run models locally via Azure AI Foundry for development and testing.

Installation

pip install agent-framework --pre
pip install agent-framework-foundry-local

Basic Usage

import asyncio
from agent_framework.foundry_local import FoundryLocalAgent

async def main():
    agent = FoundryLocalAgent(
        instructions="You are a helpful assistant.",
    )
    
    result = await agent.run("What is the capital of France?")
    print(result)

asyncio.run(main())

Choosing a Provider

Here’s guidance on when to use each provider:

Choose AWS Bedrock when:
  • You’re already using AWS infrastructure
  • You need access to multiple model providers
  • You want managed scaling and availability
  • You require AWS compliance features
  • You need region-specific deployments

Choose Ollama when:
  • You want to run models locally
  • You need offline operation
  • You’re concerned about data privacy
  • You want to avoid API costs
  • You’re doing local development
  • You need fast iteration without rate limits

Choose GitHub Copilot when:
  • You have a GitHub Copilot subscription
  • You want access to multiple models through one API
  • You’re building developer tools
  • You want model selection flexibility

Choose Azure AI Foundry Local when:
  • You’re developing Azure AI Foundry applications
  • You need local testing before cloud deployment
  • You want to prototype without cloud costs
  • You’re working offline or in restricted environments
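This decision can also be deferred to configuration. The sketch below maps a provider name to the client class paths used earlier on this page; it returns import path strings so it runs without any provider package installed, and in real code you would import and instantiate the class. The function and mapping are illustrative:

```python
# Illustrative: map a provider name to the client shown earlier on this page.
PROVIDER_CLIENTS = {
    "bedrock": "agent_framework.amazon.BedrockChatClient",
    "ollama": "agent_framework.ollama.OllamaChatClient",
    "github_copilot": "agent_framework.github.GitHubCopilotAgent",
    "foundry_local": "agent_framework.foundry_local.FoundryLocalAgent",
}

def resolve_provider(name: str) -> str:
    """Return the import path of the client for a provider name."""
    try:
        return PROVIDER_CLIENTS[name.lower()]
    except KeyError:
        raise ValueError(f"Unknown provider: {name!r}") from None

print(resolve_provider("ollama"))  # agent_framework.ollama.OllamaChatClient
```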

Provider Comparison

| Feature | Bedrock | Ollama | GitHub Copilot | Foundry Local |
|---|---|---|---|---|
| Cost | $$ | Free (local) | $ (subscription) | Free (local) |
| Internet Required | ✅ | ❌ | ✅ | ❌ |
| Setup Complexity | Medium | Low | Low | Medium |
| Model Selection | Multiple providers | Large catalog | Multiple | Limited |
| Function Calling | ✅ Model dependent | ⚠️ Limited | ⚠️ Limited | — |
| Streaming | ✅ | ✅ | ✅ | ✅ |
| Production Ready | ✅ | ⚠️ Depends | ✅ | ❌ Dev only |

Best Practices

AWS Bedrock

  1. Use IAM roles for authentication in production
  2. Enable CloudWatch logging for debugging
  3. Choose a region based on data residency requirements
  4. Monitor costs; different models have different pricing
  5. Test model availability in your target region
Ollama

  1. Ensure sufficient RAM for your chosen model
  2. Use GPU acceleration when available
  3. Keep Ollama updated for the latest models
  4. Test model capabilities before production use
  5. Remember that not all models support function calling
  6. Consider model size vs. quality tradeoffs
GitHub Copilot

  1. Verify your organization allows Copilot use
  2. Check model availability for your subscription
  3. Monitor token usage
  4. Implement retry logic for rate limits
  5. Test fallback to other providers
Azure AI Foundry Local

  1. Only use for development and testing
  2. Transition to cloud for production
  3. Test with the same models as production
  4. Monitor resource usage
  5. Keep dependencies updated

Troubleshooting

AWS Bedrock

  1. Verify AWS credentials are configured correctly
  2. Check IAM permissions for Bedrock access
  3. Ensure the model is available in your region
  4. Verify network connectivity to AWS
  5. Check CloudWatch logs for detailed errors
Ollama

  1. Verify Ollama is running: ollama serve
  2. Check if the model is pulled: ollama list
  3. Verify the endpoint URL (default: http://localhost:11434)
  4. Check system resources (RAM, GPU)
  5. Review Ollama logs for errors
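The first and third checks can be automated with a quick reachability probe. The helper below is a sketch that treats any HTTP 200 from the endpoint root as a running server (a running Ollama server answers a short status message at its root URL):

```python
import urllib.request
import urllib.error

def ollama_is_running(endpoint: str = "http://localhost:11434",
                      timeout: float = 2.0) -> bool:
    """Return True if an Ollama server answers at the endpoint.

    Any connection error (refused, unreachable, timeout) means the
    server is not available at that address.
    """
    try:
        with urllib.request.urlopen(endpoint, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False

print(ollama_is_running())  # False unless `ollama serve` is running locally
```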
GitHub Copilot

  1. Verify the Copilot CLI is installed
  2. Check authentication: gh auth status
  3. Verify your subscription is active
  4. Check model availability
  5. Review CLI logs for details

Next Steps

Provider Comparison

Compare all available providers

Function Tools

Add function calling capabilities

Workflows

Build multi-agent workflows

Hosting & Deployment

Deploy agents to production
