Understanding AI Models
Microsoft Foundry provides access to a comprehensive catalog of AI models from Microsoft, OpenAI, and leading AI companies. This guide explains model concepts and helps you choose the right model for your application.
Model Catalog
The Foundry model catalog contains 1900+ models organized into two main categories.
Models Sold Directly by Azure
These models are hosted and sold by Microsoft under the Microsoft Product Terms.
Characteristics:
- Direct Microsoft support and SLAs
- Deep integration with Azure services
- Reviewed based on Microsoft’s Responsible AI standards
- Model documentation and transparency reports
- Enterprise-grade scalability and security
Examples:
- Azure OpenAI: GPT-4o, GPT-4, GPT-3.5
- Foundry Direct: DeepSeek, xAI models
- Microsoft models: Phi family, other Microsoft-developed models
Many Azure models support fungible Provisioned Throughput, allowing flexible use of quota and reservations across models.
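The idea of fungible Provisioned Throughput can be sketched as a single pool of reserved capacity units shared across compatible models. The unit counts below are made up for illustration; actual per-model PTU requirements and eligibility come from Azure.

```python
# Illustrative sketch of "fungible" provisioned throughput: one reservation
# of capacity units (PTUs) reallocated across compatible models on demand.
# The PTU numbers here are placeholders, not real Azure figures.

def can_reallocate(total_ptus: int, desired: dict[str, int]) -> bool:
    """Return True if the desired per-model split fits in one reservation."""
    return sum(desired.values()) <= total_ptus

pool = 300  # hypothetical reserved capacity
# Shift capacity between models without buying a new reservation:
assert can_reallocate(pool, {"gpt-4o": 200, "gpt-4o-mini": 100})      # fits
assert not can_reallocate(pool, {"gpt-4o": 250, "gpt-4o-mini": 100})  # exceeds pool
```

The point of fungibility is that the reservation, not the model, is the billing unit, so the split can change as workload mix changes.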
Models from Partners and Community
These models are provided by trusted third-party organizations and community contributors.
Characteristics:
- Diverse specialized capabilities
- Rapid access to cutting-edge innovations
- Community-driven development
- Provider-managed support and maintenance
Examples:
- Anthropic: Claude family (text and vision)
- Meta: Llama family (open source)
- Cohere: Command and Embed models
- Mistral AI: Mistral models
- Hugging Face: 1400+ open models
Model Types
Foundation Models
Large-scale models trained on broad datasets:
- GPT-4o: Multimodal reasoning and generation
- GPT-4: Advanced language understanding
- Claude: Long-context processing
- Llama 3: Open-source alternative
Reasoning Models
Optimized for complex problem-solving:
- Multi-step reasoning
- Mathematical problem solving
- Code generation and analysis
- Logical inference
Small Language Models (SLMs)
Compact models for efficient deployment:
- Lower latency and cost
- On-device deployment
- Specialized tasks
- Examples: Phi-3 family
Multimodal Models
Process multiple input types:
- Text and images
- Audio and video
- Examples: GPT-4o, Claude
Domain-Specific Models
Trained for particular industries or tasks:
- Healthcare and life sciences
- Financial services
- Legal and compliance
- Manufacturing
Deployment Options
Foundry offers multiple ways to deploy and access models.
Serverless API Deployment
How it works:
- Pay-per-token billing
- Microsoft-managed infrastructure
- No capacity management needed
- Instant scaling
Best for:
- Development and testing
- Variable workloads
- Getting started quickly
- Cost optimization with sporadic use
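With pay-per-token billing, the cost of a single call is just token counts times unit prices. A quick sketch; the per-1M-token prices below are placeholders, so look up current rates on the Azure pricing page for your model.

```python
# Back-of-the-envelope cost of one serverless (pay-per-token) request.
# Prices are illustrative placeholders, not real Azure rates.

def request_cost(input_tokens: int, output_tokens: int,
                 price_in_per_1m: float, price_out_per_1m: float) -> float:
    return (input_tokens / 1_000_000) * price_in_per_1m \
         + (output_tokens / 1_000_000) * price_out_per_1m

# e.g. 2,000 prompt tokens + 500 completion tokens at $5 / $15 per 1M tokens:
cost = request_cost(2_000, 500, 5.00, 15.00)
print(f"${cost:.4f}")  # → $0.0175
```

Multiplying this by expected request volume gives the monthly figure to compare against a provisioned reservation.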
Provisioned Throughput
How it works:
- Reserved model capacity
- Predictable performance and cost
- Dedicated resources
- Fungible across compatible models
Best for:
- Production workloads
- High-volume applications
- Latency-sensitive scenarios
- Predictable usage patterns
Managed Compute
How it works:
- Deploy to dedicated virtual machines
- Full control over compute resources
- Billed for VM hours
- Supports open-source models
Best for:
- Custom model deployments
- Open-source models
- Fine-tuned models
- Specific hardware requirements
Batch Deployment
How it works:
- Asynchronous processing
- Cost-optimized for large batches
- No real-time inference
- 50% discount vs. standard pricing
Best for:
- Bulk processing
- Non-time-sensitive workloads
- Data analysis and transformation
- Evaluation and testing
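Batch jobs take an input file in which each line is one standalone JSON request tagged with a `custom_id` so results can be matched back. This sketch builds such a JSONL file in the OpenAI-style batch format; the model name and `url` value are illustrative, so check the Azure OpenAI batch documentation for the exact shape your deployment expects.

```python
import json

# Build a JSONL batch input file: one chat-completions request per line.
# "gpt-4o-batch" and the url path are illustrative placeholders.

def batch_line(custom_id: str, model: str, prompt: str) -> str:
    return json.dumps({
        "custom_id": custom_id,
        "method": "POST",
        "url": "/chat/completions",
        "body": {
            "model": model,
            "messages": [{"role": "user", "content": prompt}],
        },
    })

prompts = ["Summarize document 1", "Summarize document 2"]
jsonl = "\n".join(batch_line(f"task-{i}", "gpt-4o-batch", p)
                  for i, p in enumerate(prompts))

# Every line round-trips as standalone JSON:
assert all(json.loads(line)["method"] == "POST" for line in jsonl.splitlines())
```

The output file from the batch job mirrors this structure, keyed by the same `custom_id` values.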
Model Capabilities
Inference Tasks
- Text Generation
- Embeddings
- Vision
- Code
Text generation models, for example, generate natural language text for:
- Chat completions
- Content creation
- Summarization
- Translation
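Most text-generation deployments in Foundry accept a chat-completions request body. A minimal sketch of that shape, built as plain data; endpoint, authentication, and exact field support vary by model, so treat this as the general wire format rather than a definitive client.

```python
import json

# Minimal chat-completions request body (the common shape for chat,
# summarization, and translation prompts). Model name is an example.

def chat_request(model: str, system: str, user: str,
                 max_tokens: int = 256) -> dict:
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": system},
            {"role": "user", "content": user},
        ],
        "max_tokens": max_tokens,
    }

body = chat_request("gpt-4o", "You are a concise assistant.",
                    "Summarize the model catalog in one sentence.")
print(json.dumps(body, indent=2))
```

The same body works for the other text tasks above by changing the system and user messages.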
Fine-Tuning Support
Some models support fine-tuning to tailor behavior.
Supported models:
- GPT-3.5-turbo
- GPT-4o-mini
- Babbage-002, Davinci-002
- Selected open-source models
Fine-tuning capabilities:
- Supervised fine-tuning
- Custom dataset training
- Hyperparameter optimization
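Supervised fine-tuning of chat models takes JSONL training data where each line is one example conversation. This sketch emits that format as documented for GPT-3.5-turbo / GPT-4o-mini fine-tuning; the example content is invented.

```python
import json

# One JSONL training line per conversation, in the chat fine-tuning format
# (system / user / assistant messages). Content below is a made-up example.

def training_example(system: str, user: str, assistant: str) -> str:
    return json.dumps({"messages": [
        {"role": "system", "content": system},
        {"role": "user", "content": user},
        {"role": "assistant", "content": assistant},
    ]})

line = training_example(
    "You answer in formal English.",
    "hey whats the refund policy",
    "Our refund policy allows returns within 30 days of purchase.",
)
assert len(json.loads(line)["messages"]) == 3
```

A training file is simply many such lines, one per example, uploaded when the fine-tuning job is created.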
Choosing the Right Model
Decision Framework
Define Requirements
- What task do you need to accomplish?
- What inputs will the model process?
- What outputs do you expect?
- What are your latency requirements?
Consider Capabilities
- Does the model support your input types?
- Can it handle your context length?
- Does it support necessary features (tool calling, JSON mode)?
Evaluate Performance
- Review benchmark scores
- Test with your specific use case
- Compare multiple models
Assess Cost
- Calculate token costs for expected volume
- Compare serverless vs. provisioned pricing
- Consider fine-tuning costs if needed
Model Comparison
| Model | Strengths | Best For | Context Length |
|---|---|---|---|
| GPT-4o | Strong reasoning, multimodal, tool use | Complex tasks, agents, vision | 128K tokens |
| GPT-4 | Advanced understanding, reasoning | High-quality content, analysis | 128K tokens |
| GPT-3.5-turbo | Fast, cost-effective | Simple tasks, high volume | 16K tokens |
| Claude 3 | Long context, analysis | Document processing, research | 200K tokens |
| Llama 3 | Open source, customizable | Fine-tuning, specialized tasks | 8K tokens |
| Phi-3 | Small, efficient | Edge deployment, low latency | 4K tokens |
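One concrete way to apply the table above is to filter candidates by required context window. The figures are copied from the comparison table; verify current limits in the model catalog before relying on them.

```python
# Filter the comparison table by required context length,
# returning candidates sorted smallest-window-first.
# Context lengths copied from the table above (verify in the catalog).

CATALOG = {
    "GPT-4o": 128_000,
    "GPT-4": 128_000,
    "GPT-3.5-turbo": 16_000,
    "Claude 3": 200_000,
    "Llama 3": 8_000,
    "Phi-3": 4_000,
}

def candidates(required_context: int) -> list[str]:
    return sorted((m for m, ctx in CATALOG.items() if ctx >= required_context),
                  key=lambda m: CATALOG[m])

# A 50K-token document rules out the smaller context windows:
assert candidates(50_000) == ["GPT-4o", "GPT-4", "Claude 3"]
```

The same pattern extends to other columns of the table (modality support, tool calling) once those are encoded as fields.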
Model Features
Tool Calling
Models that support function calling:
- GPT-4o, GPT-4, GPT-3.5-turbo
- Claude 3 family
- Gemini Pro
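Function calling works by declaring each tool as a JSON Schema the model can target. Below is the common `tools` declaration shape used by OpenAI-compatible chat APIs; `get_weather` is a made-up example function.

```python
import json

# A tool declaration in the OpenAI-compatible "tools" format:
# a JSON Schema describing the function's parameters.
# get_weather is a hypothetical example, not a real API.

get_weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name"},
                "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
            },
            "required": ["city"],
        },
    },
}

# Passed alongside messages as tools=[get_weather_tool]; when appropriate,
# the model returns a structured tool call instead of free text.
assert json.loads(json.dumps(get_weather_tool))["function"]["name"] == "get_weather"
```

The application then executes the named function with the model-supplied arguments and feeds the result back as a tool message.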
JSON Mode
Ensure valid JSON output.
Vision Capabilities
Process images alongside text.
Streaming
Receive responses incrementally.
Model Lifecycle
Versioning
Models are versioned with release dates, for example:
- gpt-4o-2024-05-13
- gpt-4-turbo-2024-04-09
- gpt-3.5-turbo-0125
Deprecation and Retirement
Models have defined lifecycle phases:
- Generally Available: Full support and SLA
- Deprecation Notice: 6-12 months advance warning
- Deprecated: No new deployments allowed
- Retired: Model no longer available
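The phases above translate directly into deployment policy checks. A sketch of that gating, assuming (per the list) that existing deployments keep serving during deprecation but nothing serves after retirement.

```python
# Gate deployment actions on lifecycle phase, mirroring the stages above:
# no new deployments once deprecated, no traffic once retired.
# (The "deprecated deployments keep serving" assumption is inferred
# from the phase descriptions, not a stated guarantee.)

def can_create_deployment(phase: str) -> bool:
    return phase in ("generally_available", "deprecation_notice")

def can_serve_traffic(phase: str) -> bool:
    return phase in ("generally_available", "deprecation_notice", "deprecated")

assert can_create_deployment("deprecation_notice")
assert not can_create_deployment("deprecated")
assert can_serve_traffic("deprecated") and not can_serve_traffic("retired")
```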
Auto-Update
Some deployments support automatic version updates.
Regional Availability
Model availability varies by region:
- Check the region support page
- Consider data residency requirements
- Evaluate latency for global applications
Next Steps
- Model Deployment: Learn how to deploy models
- Region Support: Check model availability by region
- Model Overview: Explore the model catalog
- Quickstart: Start using models