Overview
Azure OpenAI Service provides REST API access to OpenAI models, including GPT-4, GPT-3.5-Turbo, and embeddings, through Microsoft's Azure platform with enterprise-grade security and compliance.
Setup
Get Connection Details
Collect:
- API Key (from Keys and Endpoint)
- Instance Name (e.g., my-openai-resource)
- Deployment Name (the name you gave your deployment)
- API Version (e.g., 2024-02-01)
Configuration
Basic Parameters
Azure OpenAI API credential containing:
- Azure OpenAI API Key
- Azure OpenAI API Instance Name
- Azure OpenAI API Deployment Name
- Azure OpenAI API Version
Model (Deployment) Name: the deployed model name in your Azure OpenAI resource. Common deployments:
- gpt-4
- gpt-4-32k
- gpt-35-turbo
- gpt-35-turbo-16k
Temperature: sampling temperature between 0 and 2. Higher values increase randomness.
Streaming: enable streaming for real-time token generation.
Advanced Parameters
Maximum Tokens: maximum tokens to generate. Varies by model:
- GPT-4: up to 8192
- GPT-4-32k: up to 32768
- GPT-3.5-Turbo: up to 4096
Top P: nucleus sampling parameter (0-1). Alternative to temperature.
Frequency Penalty: reduces repetition of token sequences (-2.0 to 2.0).
Presence Penalty: encourages talking about new topics (-2.0 to 2.0).
Timeout: request timeout in milliseconds.
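As a sketch of how these knobs fit together, the following builds a chat-completions request body with the advanced parameters above. The specific values are illustrative only and should be tuned per workload:

```python
import json

# Illustrative values only -- tune per workload.
payload = {
    "messages": [{"role": "user", "content": "Summarize Azure OpenAI in one line."}],
    "max_tokens": 256,          # stay under the model's limit (e.g. 4096 for GPT-3.5-Turbo)
    "top_p": 0.9,               # nucleus sampling; use instead of temperature, not alongside it
    "frequency_penalty": 0.5,   # reduce repetition of token sequences
    "presence_penalty": 0.6,    # nudge the model toward new topics
}
print(json.dumps(payload, indent=2))
```

The timeout is not part of the request body; it is set on the HTTP client making the call.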
Vision & Multimodal
Enable image analysis for vision-capable models like GPT-4 Turbo with Vision
Image detail level: low, high, or auto
Reasoning Models
Azure OpenAI supports o1 and o3 reasoning models with special parameters.
Enable reasoning mode for o1/o3 deployments
Set reasoning effort: low, medium, or high
Get reasoning summary: auto, concise, or detailed
Custom Configuration
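One plausible request shape for a reasoning deployment is sketched below, assuming the chat-completions `reasoning_effort` parameter; the summary options may be exposed differently depending on your API version, so treat this as an illustration rather than a definitive schema:

```python
# Sketch of a request body for an o1/o3 deployment.
# `reasoning_effort` is assumed here; verify it against your API version.
ALLOWED_EFFORT = {"low", "medium", "high"}

def reasoning_payload(prompt: str, effort: str = "medium") -> dict:
    """Build a chat request body with a validated reasoning-effort setting."""
    if effort not in ALLOWED_EFFORT:
        raise ValueError(f"effort must be one of {sorted(ALLOWED_EFFORT)}")
    return {
        "messages": [{"role": "user", "content": prompt}],
        "reasoning_effort": effort,
    }
```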
Custom Azure OpenAI endpoint base path (overrides default)
Additional HTTP headers and configuration as JSON
Usage Examples
Basic Azure GPT-4 Setup
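A minimal sketch using only the standard library: it assembles the Azure OpenAI chat-completions URL and request from the connection details collected above. The instance, deployment, and key values are placeholders; substitute your own:

```python
import json

# Placeholder connection details -- substitute your own.
INSTANCE = "my-openai-resource"
DEPLOYMENT = "gpt-4"          # the *deployment* name, not the base model name
API_VERSION = "2024-02-01"
API_KEY = "YOUR_API_KEY"

def chat_url(instance: str, deployment: str, api_version: str) -> str:
    """Assemble the Azure OpenAI chat-completions endpoint URL from its parts."""
    return (
        f"https://{instance}.openai.azure.com/openai/deployments/"
        f"{deployment}/chat/completions?api-version={api_version}"
    )

url = chat_url(INSTANCE, DEPLOYMENT, API_VERSION)
headers = {"api-key": API_KEY, "Content-Type": "application/json"}
body = json.dumps({
    "messages": [{"role": "user", "content": "Hello!"}],
    "temperature": 0.7,
})
print(url)  # POST `body` with `headers` to this URL to get a completion
```

Note that Azure authenticates with an `api-key` header, unlike OpenAI Direct's `Authorization: Bearer` header.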
GPT-4 Vision
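For vision-capable deployments, a user message carries a list of content parts: text plus one or more image parts, each with a detail level (low, high, or auto). A sketch with a hypothetical image URL:

```python
# Sketch of a vision request message: one text part plus one image part.
def vision_message(text: str, image_url: str, detail: str = "auto") -> dict:
    """Build a multimodal user message for a vision-capable deployment."""
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": text},
            {"type": "image_url", "image_url": {"url": image_url, "detail": detail}},
        ],
    }

msg = vision_message("What is in this image?", "https://example.com/photo.png")
```

Low detail is cheaper and faster; high detail lets the model inspect the image at higher resolution.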
Using Environment Variables
You can configure Azure OpenAI credentials via environment variables for server-wide defaults.
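A sketch of reading those defaults, using illustrative variable names (adjust to whatever names your deployment standardizes on):

```python
import os

# Illustrative defaults for demonstration -- in production these come
# from the server environment, not from code.
os.environ.setdefault("AZURE_OPENAI_API_KEY", "YOUR_API_KEY")
os.environ.setdefault("AZURE_OPENAI_ENDPOINT", "https://my-openai-resource.openai.azure.com")
os.environ.setdefault("AZURE_OPENAI_API_VERSION", "2024-02-01")

def load_config() -> dict:
    """Read Azure OpenAI settings from the environment, failing fast if any are absent."""
    required = ("AZURE_OPENAI_API_KEY", "AZURE_OPENAI_ENDPOINT", "AZURE_OPENAI_API_VERSION")
    missing = [name for name in required if not os.environ.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {missing}")
    return {name: os.environ[name] for name in required}
```

Failing fast on missing variables surfaces misconfiguration at startup rather than on the first API call.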
Private Endpoint
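When traffic must stay inside a VNET, the custom base path can point at the Private Link hostname instead of the public endpoint. The sketch below assumes the standard privatelink.openai.azure.com DNS zone; your private DNS configuration may differ:

```python
# Sketch, assuming a Private Link DNS zone of privatelink.openai.azure.com.
# The custom base path overrides the default public endpoint.
def private_chat_url(instance: str, deployment: str, api_version: str) -> str:
    """Assemble the chat-completions URL against a private endpoint host."""
    base = f"https://{instance}.privatelink.openai.azure.com"
    return (
        f"{base}/openai/deployments/{deployment}"
        f"/chat/completions?api-version={api_version}"
    )
```

The request path and headers are unchanged; only the hostname (and hence the network route) differs.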
Azure vs OpenAI Direct
| Feature | Azure OpenAI | OpenAI Direct |
|---|---|---|
| Data Privacy | Stays in Azure region | Sent to OpenAI |
| Compliance | Azure compliance certifications | OpenAI policies |
| SLA | Azure SLA (99.9%) | OpenAI uptime |
| Pricing | Pay-as-you-go or reserved | Pay-per-token |
| Model Updates | Controlled deployment | Automatic |
| Enterprise Features | VNET, Private Link | Standard |
Best Practices
Security
- Use Managed Identity when possible
- Enable Private Endpoints for production
- Rotate API keys regularly
- Use RBAC for access control
Deployment
- Deploy models in same region as app
- Use provisioned throughput for production
- Monitor quota and limits
- Test in non-production first
Cost Management
- Use reserved capacity for predictable workloads
- Monitor token usage in Azure Portal
- Set up budget alerts
- Use appropriate model tiers
Performance
- Use streaming for better UX
- Implement caching where appropriate
- Monitor latency metrics
- Scale deployments based on demand
Common Issues
Deployment Not Found
Ensure:
- Deployment name matches exactly (case-sensitive)
- Model is successfully deployed in Azure OpenAI Studio
- Instance name is correct
- API version is supported
Quota Exceeded
Azure OpenAI has quota limits per region and subscription:
- Check quota in Azure Portal
- Request quota increase if needed
- Consider multiple deployments
- Use provisioned throughput
Network/CORS Issues
For private endpoints:
- Verify VNET configuration
- Check Private Link setup
- Ensure DNS resolution
- Verify firewall rules
API Version Mismatch
Different API versions support different features:
- Use latest stable version (2024-02-01+)
- Check Azure OpenAI API changelog
- Update version in credentials
Monitoring & Diagnostics
Azure provides built-in monitoring:
- Metrics: Track requests, tokens, latency
- Logs: Enable diagnostic logging
- Alerts: Set up alerts for failures or quota
- Cost Management: Track spending by deployment