Understanding AI Models

Microsoft Foundry provides access to a comprehensive catalog of AI models from Microsoft, OpenAI, and leading AI companies. This guide explains model concepts and helps you choose the right model for your application.

Model Catalog

The Foundry model catalog contains 1900+ models organized into two main categories:

Models Sold Directly by Azure

These models are hosted and sold by Microsoft under the Microsoft Product Terms.
Characteristics:
  • Direct Microsoft support and SLAs
  • Deep integration with Azure services
  • Reviewed based on Microsoft’s Responsible AI standards
  • Model documentation and transparency reports
  • Enterprise-grade scalability and security
Examples:
  • Azure OpenAI: GPT-4o, GPT-4, GPT-3.5
  • Foundry Direct: DeepSeek, xAI models
  • Microsoft models: Phi family, other Microsoft-developed models
Many Azure models support fungible Provisioned Throughput, allowing flexible use of quota and reservations across models.

Models from Partners and Community

These models are provided by trusted third-party organizations and community contributors.
Characteristics:
  • Diverse specialized capabilities
  • Rapid access to cutting-edge innovations
  • Community-driven development
  • Provider-managed support and maintenance
Examples:
  • Anthropic: Claude family (text and vision)
  • Meta: Llama family (open source)
  • Cohere: Command and Embed models
  • Mistral AI: Mistral models
  • Hugging Face: 1400+ open models

Model Types

Foundation Models

Large-scale models trained on broad datasets:
  • GPT-4o: Multimodal reasoning and generation
  • GPT-4: Advanced language understanding
  • Claude: Long-context processing
  • Llama 3: Open-source alternative

Reasoning Models

Optimized for complex problem-solving:
  • Multi-step reasoning
  • Mathematical problem solving
  • Code generation and analysis
  • Logical inference

Small Language Models (SLMs)

Compact models for efficient deployment:
  • Lower latency and cost
  • On-device deployment
  • Specialized tasks
  • Examples: Phi-3 family

Multimodal Models

Process multiple input types:
  • Text and images
  • Audio and video
  • Examples: GPT-4o, Claude

Domain-Specific Models

Trained for particular industries or tasks:
  • Healthcare and life sciences
  • Financial services
  • Legal and compliance
  • Manufacturing

Deployment Options

Foundry offers multiple ways to deploy and access models:

Serverless API Deployment

How it works:
  • Pay-per-token billing
  • Microsoft-managed infrastructure
  • No capacity management needed
  • Instant scaling
Best for:
  • Development and testing
  • Variable workloads
  • Getting started quickly
  • Cost optimization with sporadic use
Example:
# Model accessed via serverless API
response = client.chat.completions.create(
    model="gpt-4o",  # Deployed as serverless
    messages=messages
)

Provisioned Throughput

How it works:
  • Reserved model capacity
  • Predictable performance and cost
  • Dedicated resources
  • Fungible across compatible models
Best for:
  • Production workloads
  • High-volume applications
  • Latency-sensitive scenarios
  • Predictable usage patterns
Example:
# Create provisioned deployment
deployment = client.deployments.create(
    model="gpt-4o",
    sku={
        "name": "ProvisionedManaged",
        "capacity": 100  # Provisioned Throughput Units
    }
)

Managed Compute

How it works:
  • Deploy to dedicated virtual machines
  • Full control over compute resources
  • Billed for VM hours
  • Supports open-source models
Best for:
  • Custom model deployments
  • Open-source models
  • Fine-tuned models
  • Specific hardware requirements
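Because managed compute is billed for VM hours rather than per token, a quick break-even estimate helps when choosing between it and serverless pricing. A minimal sketch, assuming hypothetical placeholder prices (substitute real rates from the Azure pricing page):

```python
# Rough break-even estimate: managed compute (VM hours) vs. serverless (per token).
# All prices below are hypothetical placeholders, not real Azure rates.
VM_PRICE_PER_HOUR = 3.50               # assumed cost of one dedicated GPU VM
SERVERLESS_PRICE_PER_1K_TOKENS = 0.01  # assumed blended input/output token price

def monthly_vm_cost(hours_per_month: float = 730) -> float:
    """Cost of keeping one VM running for a full month."""
    return VM_PRICE_PER_HOUR * hours_per_month

def monthly_serverless_cost(tokens_per_month: int) -> float:
    """Pay-per-token cost for the same workload on serverless."""
    return SERVERLESS_PRICE_PER_1K_TOKENS * tokens_per_month / 1000

def break_even_tokens() -> float:
    """Monthly token volume above which a dedicated VM is cheaper."""
    return monthly_vm_cost() / SERVERLESS_PRICE_PER_1K_TOKENS * 1000
```

With these placeholder prices, a dedicated VM only pays off above roughly 255M tokens per month; below that, serverless is cheaper.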

Batch Deployment

How it works:
  • Asynchronous processing
  • Cost-optimized for large batches
  • No real-time inference
  • 50% discount vs. standard pricing
Best for:
  • Bulk processing
  • Non-time-sensitive workloads
  • Data analysis and transformation
  • Evaluation and testing
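Batch workloads are typically submitted as a JSONL file with one request per line. A sketch of building such a file, assuming the OpenAI-style batch request shape (`custom_id`, `method`, `url`, `body`); exact field names may differ by service:

```python
import json

def build_batch_file(prompts, model="gpt-4o", path="batch_input.jsonl"):
    """Write one chat-completion request per line in JSONL format."""
    lines = []
    for i, prompt in enumerate(prompts):
        request = {
            "custom_id": f"task-{i}",  # used to match results back to inputs
            "method": "POST",
            "url": "/chat/completions",
            "body": {
                "model": model,
                "messages": [{"role": "user", "content": prompt}],
            },
        }
        lines.append(json.dumps(request))
    with open(path, "w") as f:
        f.write("\n".join(lines))
    return lines

lines = build_batch_file(["Summarize document A", "Summarize document B"])
```

The resulting file is uploaded once and processed asynchronously; results arrive in a matching output file keyed by `custom_id`.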

Model Capabilities

Inference Tasks

Generate natural language text:
  • Chat completions
  • Content creation
  • Summarization
  • Translation

Fine-Tuning Support

Some models support fine-tuning to tailor behavior.
Supported models:
  • GPT-3.5-turbo
  • GPT-4o-mini
  • Babbage-002, Davinci-002
  • Selected open-source models
Fine-tuning options:
  • Supervised fine-tuning
  • Custom dataset training
  • Hyperparameter optimization
# Create fine-tuning job
fine_tune = client.fine_tuning.jobs.create(
    training_file="file-abc123",
    model="gpt-3.5-turbo",
    hyperparameters={
        "n_epochs": 3
    }
)
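The training file referenced above is a JSONL file of example conversations. A minimal sketch of validating that each line follows the chat fine-tuning shape (a `messages` list of role/content pairs), assuming that shape applies to your chosen model:

```python
import json

def validate_training_line(line: str) -> bool:
    """Check one JSONL training example for the chat fine-tuning shape."""
    record = json.loads(line)
    messages = record.get("messages")
    if not isinstance(messages, list) or not messages:
        return False
    return all(
        isinstance(m, dict)
        and m.get("role") in {"system", "user", "assistant"}
        and isinstance(m.get("content"), str)
        for m in messages
    )

example = json.dumps({"messages": [
    {"role": "user", "content": "Translate 'hello' to French"},
    {"role": "assistant", "content": "bonjour"},
]})
```

Validating every line locally before uploading avoids failed fine-tuning jobs caused by a single malformed example.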

Choosing the Right Model

Decision Framework

1. Define Requirements
  • What task do you need to accomplish?
  • What inputs will the model process?
  • What outputs do you expect?
  • What are your latency requirements?
2. Consider Capabilities
  • Does the model support your input types?
  • Can it handle your context length?
  • Does it support necessary features (tool calling, JSON mode)?
3. Evaluate Performance
  • Review benchmark scores
  • Test with your specific use case
  • Compare multiple models
4. Assess Cost
  • Calculate token costs for expected volume
  • Compare serverless vs. provisioned pricing
  • Consider fine-tuning costs if needed
5. Check Availability
  • Verify regional availability
  • Confirm deployment options
  • Check model lifecycle status
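The arithmetic behind step 4 (Assess Cost) is straightforward. A sketch using placeholder per-token prices (substitute the current rates from the pricing page):

```python
# Placeholder prices per 1K tokens; real rates vary by model and region.
PRICES = {
    "gpt-4o":        {"input": 0.005,  "output": 0.015},
    "gpt-3.5-turbo": {"input": 0.0005, "output": 0.0015},
}

def monthly_cost(model, requests_per_month, input_tokens, output_tokens):
    """Estimated serverless cost for a month of traffic."""
    p = PRICES[model]
    per_request = (input_tokens * p["input"] + output_tokens * p["output"]) / 1000
    return requests_per_month * per_request

# 100K requests/month, 500 input + 200 output tokens each:
# gpt-4o: 100000 * (500*0.005 + 200*0.015)/1000 = 550.0
```

Running the same numbers for a cheaper model (here, 55.0 for gpt-3.5-turbo) makes the cost trade-off concrete before you commit to a model.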

Model Comparison

| Model | Strengths | Best For | Context Length |
| --- | --- | --- | --- |
| GPT-4o | Strong reasoning, multimodal, tool use | Complex tasks, agents, vision | 128K tokens |
| GPT-4 | Advanced understanding, reasoning | High-quality content, analysis | 128K tokens |
| GPT-3.5-turbo | Fast, cost-effective | Simple tasks, high volume | 16K tokens |
| Claude 3 | Long context, analysis | Document processing, research | 200K tokens |
| Llama 3 | Open source, customizable | Fine-tuning, specialized tasks | 8K tokens |
| Phi-3 | Small, efficient | Edge deployment, low latency | 4K tokens |
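Context length is often the first filter when shortlisting models from a table like this. A sketch that checks whether an input fits a model's window, using the common rough heuristic of ~4 characters per token (exact counts require the model's own tokenizer):

```python
# Context windows from the comparison table, in tokens.
CONTEXT_WINDOWS = {
    "gpt-4o": 128_000,
    "gpt-3.5-turbo": 16_000,
    "claude-3": 200_000,
    "llama-3": 8_000,
    "phi-3": 4_000,
}

def estimate_tokens(text: str) -> int:
    """Rough heuristic: ~4 characters per token for English text."""
    return max(1, len(text) // 4)

def models_that_fit(text: str, reply_budget: int = 1000):
    """Models whose window holds the input plus a budget for the reply."""
    needed = estimate_tokens(text) + reply_budget
    return [m for m, window in CONTEXT_WINDOWS.items() if window >= needed]
```

For example, a ~40,000-character document (~10K tokens) rules out the 8K and 4K windows before any quality comparison begins.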

Model Features

Tool Calling

Models that support function calling:
  • GPT-4o, GPT-4, GPT-3.5-turbo
  • Claude 3 family
  • Gemini Pro
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get current weather",
        "parameters": {  # JSON Schema describing the function's arguments
            "type": "object",
            "properties": {
                "location": {"type": "string", "description": "City name"}
            },
            "required": ["location"]
        }
    }
}]

response = client.chat.completions.create(
    model="gpt-4o",
    messages=messages,
    tools=tools
)

JSON Mode

Ensure valid JSON output. Note that the messages must also instruct the model to produce JSON (for example, in the system prompt):
response = client.chat.completions.create(
    model="gpt-4o",
    messages=messages,
    response_format={"type": "json_object"}
)

Vision Capabilities

Process images alongside text:
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What's in this image?"},
            {"type": "image_url", "image_url": {"url": image_url}}
        ]
    }]
)

Streaming

Receive responses incrementally:
stream = client.chat.completions.create(
    model="gpt-4o",
    messages=messages,
    stream=True
)

for chunk in stream:
    # The final chunk's delta has no content; guard against printing "None"
    if chunk.choices and chunk.choices[0].delta.content is not None:
        print(chunk.choices[0].delta.content, end="")

Model Lifecycle

Versioning

Models are versioned with release dates:
  • gpt-4o-2024-05-13
  • gpt-4-turbo-2024-04-09
  • gpt-3.5-turbo-0125

Deprecation and Retirement

Models have defined lifecycle phases:
  1. Generally Available: Full support and SLA
  2. Deprecation Notice: 6-12 months advance warning
  3. Deprecated: No new deployments allowed
  4. Retired: Model no longer available
Monitor deprecation announcements and plan migrations early. Set up automatic updates where available.
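Planning migrations ahead of retirement can be automated with a simple date check. A sketch using a hypothetical retirement schedule (actual dates are published in the model lifecycle documentation):

```python
from datetime import date

# Hypothetical retirement dates, for illustration only.
RETIREMENT_DATES = {
    "gpt-35-turbo-0613": date(2024, 10, 1),
    "gpt-4o-2024-05-13": date(2026, 5, 13),
}

def needs_migration(model: str, today: date, warn_days: int = 180) -> bool:
    """True if the model retires within the warning window (or already has)."""
    retires = RETIREMENT_DATES.get(model)
    if retires is None:
        return False  # no announced retirement for this model
    return (retires - today).days <= warn_days
```

Running a check like this in CI or a scheduled job turns deprecation notices into actionable alerts instead of surprises.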

Auto-Update

Some deployments support automatic version updates:
deployment = client.deployments.create(
    model="gpt-4o",
    version_upgrade_option="OnceNewDefaultVersionAvailable"
)

Regional Availability

Model availability varies by region:
  • Check the region support page
  • Consider data residency requirements
  • Evaluate latency for global applications

Next Steps

Model Deployment

Learn how to deploy models

Region Support

Check model availability by region

Model Overview

Explore the model catalog

Quickstart

Start using models
