Model Deployment

Microsoft Foundry offers multiple deployment options, each optimized for different scenarios.

Deployment Methods

Serverless API Deployment

Characteristics:
  • Pay-per-token billing
  • Microsoft-managed infrastructure
  • Automatic scaling
  • No capacity planning
Example:
# The model is reached through a serverless endpoint and billed per token.
# Assumes `client` is an already-configured inference client and `messages`
# is a list of chat messages.
response = client.chat.completions.create(
    model="gpt-4o",
    messages=messages
)

Provisioned Throughput

Characteristics:
  • Reserved capacity (PTUs)
  • Predictable cost and performance
  • Dedicated resources
  • Fungible across models
Example:
# Illustrative control-plane call reserving dedicated capacity for gpt-4o.
# Assumes `client` is a management client for the Foundry resource.
deployment = client.deployments.create(
    model="gpt-4o",
    sku={
        "name": "ProvisionedManaged",
        "capacity": 100  # Provisioned Throughput Units (PTUs)
    }
)

Managed Compute

Characteristics:
  • Deploy to Azure VMs
  • Billed for VM hours
  • Supports open-source models
  • Full infrastructure control
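Because managed compute is billed for VM hours rather than tokens, a rough break-even estimate against serverless pricing can guide the choice between the two. All rates below are hypothetical placeholders, not actual Azure prices:

```python
# Rough cost comparison: managed compute (VM hours) vs. serverless (per token).
# Both rates are hypothetical placeholders -- look up current Azure pricing.
VM_HOURLY_RATE = 3.50            # hypothetical $/hour for a GPU VM
SERVERLESS_PER_1K_TOKENS = 0.01  # hypothetical $/1K tokens

def monthly_vm_cost(instance_count: int, hours: float = 730) -> float:
    """VM-hour billing: you pay for uptime, regardless of traffic."""
    return instance_count * hours * VM_HOURLY_RATE

def monthly_serverless_cost(tokens_per_month: int) -> float:
    """Pay-per-token billing: cost scales directly with usage."""
    return tokens_per_month / 1000 * SERVERLESS_PER_1K_TOKENS

# Monthly token volume at which one always-on VM matches serverless spend
break_even_tokens = monthly_vm_cost(1) / SERVERLESS_PER_1K_TOKENS * 1000
```

Below the break-even volume, serverless is cheaper; above it, dedicated VMs (or provisioned throughput) start to pay off.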

Deployment Process

1. Select Model: choose from the model catalog based on your requirements.
2. Choose Deployment Option: Serverless API, Provisioned Throughput, or Managed Compute.
3. Configure Settings: region, capacity, and model version.
4. Deploy: create the deployment via the portal, CLI, or SDK.
5. Test: verify the deployment with test requests.

Regional Considerations

  • Model availability varies by region; check Region Support before deploying
  • Consider data residency requirements
  • Evaluate latency for global users
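The considerations above can be combined into a simple selection rule: filter regions by model availability and data residency, then prefer the lowest latency. The availability table and latency figures below are hypothetical, not real Azure data:

```python
# Sketch of region selection. MODEL_REGIONS is a hypothetical availability
# table; consult Region Support for the real one.
MODEL_REGIONS = {
    "gpt-4o": {"eastus", "westeurope", "swedencentral"},  # hypothetical
}

def pick_region(model: str, allowed: set, latency_ms: dict) -> str:
    """Choose the allowed region with the lowest latency that offers the model.

    `allowed` encodes data-residency constraints; `latency_ms` maps each
    region to its measured latency for your users.
    """
    candidates = MODEL_REGIONS.get(model, set()) & allowed
    if not candidates:
        raise ValueError(f"{model} is not available in any allowed region")
    return min(candidates, key=lambda r: latency_ms.get(r, float("inf")))
```

For instance, an EU-only residency policy with lower measured latency to West Europe would select `westeurope` over `swedencentral`.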

Model Lifecycle

  • GA: Full support and SLA
  • Deprecation Notice: 6-12 months warning
  • Deprecated: No new deployments
  • Retired: Model unavailable
Configure deployments to auto-update so they transition seamlessly to newer model versions. See Model Overview for model catalog details.
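The lifecycle stages above translate into a simple deployment gate. The stage names mirror this page; the helper itself is an illustrative sketch, not a Foundry API:

```python
# Lifecycle stages as documented above, mapped to their meaning.
LIFECYCLE = {
    "GA": "full support and SLA",
    "deprecation-notice": "6-12 months warning",
    "deprecated": "no new deployments",
    "retired": "model unavailable",
}

def can_create_deployment(state: str) -> bool:
    """New deployments are allowed until a model reaches 'deprecated'."""
    return state in ("GA", "deprecation-notice")

def can_serve_traffic(state: str) -> bool:
    """Existing deployments keep serving until the model is retired."""
    return state != "retired"
```

A deprecation notice is therefore the signal to plan a migration: new deployments still work, but the window is closing.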