What are Compute Targets in Azure Machine Learning?
A compute target is a designated compute resource or environment where you run your training script or host your service deployment. Using compute targets makes it easy to change your compute environment without modifying your code. Compute targets provide scalable, managed compute resources for machine learning workloads, from development to production deployment.
Compute Lifecycle
In a typical model development lifecycle, you start by developing and experimenting on a small amount of data, then scale up to larger data and distributed training, and finally deploy your model for inference.

Training Compute Targets

Compute targets for model training can be reused across multiple training jobs.

Azure Machine Learning Compute Options
- Compute Cluster
- Compute Instance
- Serverless Compute
- Attached Compute
Compute Cluster

Managed multinode clusters for scalable training.

Features:
- Single-node or multinode clusters
- Autoscales based on job submission
- Automatic cluster management and job scheduling
- Supports CPU and GPU resources

Ideal for:
- Distributed training
- Large dataset processing
- Hyperparameter tuning
- AutoML experiments
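Because a job references its compute target by name, the same cluster can be reused across many training jobs. A minimal CLI v2 command-job spec might look like the following sketch (the cluster name `cpu-cluster`, the script path, and the environment reference are all placeholder values):

```yaml
# job.yml - command job targeting a named compute cluster (placeholder names)
$schema: https://azuremlschemas.azureedge.net/latest/commandJob.schema.json
command: python train.py
code: ./src
environment: azureml:sklearn-env:1
compute: azureml:cpu-cluster   # the same cluster can back many jobs
```

Submitting with `az ml job create -f job.yml` queues the job on the cluster; changing compute means editing only the `compute:` line, not the training code.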
Training Compute Compatibility
| Compute Type | AutoML | ML Pipelines | Designer |
|---|---|---|---|
| Compute Cluster | ✓ | ✓ | ✓ |
| Serverless Compute | ✓ | ✓ | ✓ |
| Compute Instance | ✓ (via SDK) | ✓ | ✓ |
| Kubernetes | - | ✓ | ✓ |
| Remote VM | ✓ | ✓ | - |
| Apache Spark | ✓ (SDK local) | ✓ | - |
| Databricks | ✓ (SDK local) | ✓ | - |
Inference Compute Targets
Compute for hosting deployed models and performing inference.

Deployment Options
Managed Online Endpoints
Real-time inference with serverless compute
- Automatic scaling
- Fully managed infrastructure
- Built-in monitoring
- No quota consumption
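A managed online endpoint is defined in two parts: the endpoint itself and one or more deployments behind it. A sketch using the CLI v2 YAML schemas (all names, the model reference, and the instance type are example values):

```yaml
# endpoint.yml - managed online endpoint (example names)
$schema: https://azuremlschemas.azureedge.net/latest/managedOnlineEndpoint.schema.json
name: my-endpoint
auth_mode: key
```

```yaml
# deployment.yml - a deployment hosted behind the endpoint
$schema: https://azuremlschemas.azureedge.net/latest/managedOnlineDeployment.schema.json
name: blue
endpoint_name: my-endpoint
model: azureml:my-model:1
instance_type: Standard_DS3_v2
instance_count: 1
```

Create them with `az ml online-endpoint create -f endpoint.yml` followed by `az ml online-deployment create -f deployment.yml`.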
Batch Endpoints
Batch scoring for large datasets
- Process files in parallel
- Scheduled or on-demand
- Cost-effective for bulk inference
- Automatic compute management
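A batch deployment pairs a model with a compute cluster and parallelism settings. The following is an illustrative CLI v2 YAML sketch (schema details vary by CLI version, and the endpoint, model, and cluster names are placeholders):

```yaml
# batch-deployment.yml - batch scoring deployment (illustrative values)
$schema: https://azuremlschemas.azureedge.net/latest/modelBatchDeployment.schema.json
name: batch-dep
endpoint_name: my-batch-endpoint
type: model
model: azureml:my-model:1
compute: azureml:cpu-cluster          # cluster that runs the scoring jobs
resources:
  instance_count: 2                   # nodes used per scoring job
settings:
  max_concurrency_per_instance: 2     # parallel workers per node
  mini_batch_size: 10                 # files handed to each worker call
```

The `instance_count` and `max_concurrency_per_instance` settings together control how many files are processed in parallel.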
Kubernetes Endpoints
On-premises or cloud Kubernetes clusters
- Run anywhere (cloud, edge, on-prem)
- Full infrastructure control
- GPU support
- Custom networking
Azure Container Instances
Development/testing only
- Quick deployment
- No cluster management
- Limited to <48GB RAM
- Small models (<1GB)
Choosing Deployment Compute
Choose deployment compute based on the workload pattern: real-time (low latency), batch processing, or Kubernetes when you need full infrastructure control.
Use Managed Online Endpoints when:
- Response time is critical (<1 second)
- Request-response pattern
- Small payloads (fits in HTTP request)
- Need to scale on traffic
Supported VM Series and Sizes
Azure Machine Learning supports select VM series for compute.

General Purpose VMs
| Series | Use Case | Compute Support |
|---|---|---|
| Dv3/DSv3 | Balanced CPU-memory | Clusters & Instances |
| Dv2/DSv2 | General workloads | Clusters & Instances |
| DDSv4 | Memory optimized | Clusters & Instances |
GPU-Accelerated VMs
| Series | GPU Architecture | CUDA Version | Use Case |
|---|---|---|---|
| ND-H100-v5 | H100 | 11.0+ | Large-scale training |
| ND-H200-v5 | H200 | 11.0+ | AI supercomputing |
| NDasrA100_v4 | Ampere (A100) | 11.0+ | Deep learning |
| NCasT4_v3 | Turing (T4) | 10.0+ | Inference & training |
| NCv3 | Volta (V100) | 9.0+ | Training |
| NDv2 | Volta (V100) | 9.0+ | Distributed training |
High Performance Compute
| Series | Capabilities | Compute Support |
|---|---|---|
| HBv3 | AMD EPYC | Clusters & Instances |
| HBv2 | AMD EPYC | Clusters & Instances |
| HC | Intel Xeon | Clusters & Instances |
Creating Compute Resources
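As one illustration of provisioning, a compute cluster can be defined in a CLI v2 YAML file and created with `az ml compute create -f cluster.yml` (the name, VM size, and autoscale values below are examples, not recommendations):

```yaml
# cluster.yml - compute cluster definition (example values)
$schema: https://azuremlschemas.azureedge.net/latest/amlCompute.schema.json
name: cpu-cluster
type: amlcompute
size: Standard_DS3_v2
min_instances: 0                  # scale to zero when idle
max_instances: 4
idle_time_before_scale_down: 120  # seconds before idle nodes are released
```

The same resource can also be created from the studio UI or the Python SDK; the YAML route makes the configuration reviewable and repeatable.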
Cost Optimization
Minimize Idle Costs

For Compute Clusters:
- Set `min_instances: 0` to scale down when idle
- Configure `idle_time_before_scale_down` (seconds)
- Use low-priority VMs for non-critical workloads

For Compute Instances:
- Enable idle shutdown after a period of inactivity
- Stop instances when not in use
- Use schedules to auto-start/stop
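For compute instances, idle shutdown can be configured at creation time. A sketch using the CLI v2 compute instance schema (the `idle_time_before_shutdown_minutes` field name is taken from recent schema versions and is worth verifying against your CLI version; the name and size are placeholders):

```yaml
# instance.yml - compute instance with automatic idle shutdown (illustrative)
$schema: https://azuremlschemas.azureedge.net/latest/computeInstance.schema.json
name: dev-instance
type: computeinstance
size: Standard_DS3_v2
idle_time_before_shutdown_minutes: 30   # stop automatically after 30 idle minutes
```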
Use Serverless Compute
Benefits:
- No quota management
- Automatic scaling to zero
- Pay only for actual usage
- No idle compute costs
Serverless compute is billed per second of actual compute time.
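In CLI v2, a job runs on serverless compute when its spec omits the `compute:` field; you can optionally pin a VM size under `resources`. A minimal sketch (script path, environment, and instance type are placeholder values):

```yaml
# serverless-job.yml - command job on serverless compute (placeholder names)
$schema: https://azuremlschemas.azureedge.net/latest/commandJob.schema.json
command: python train.py
code: ./src
environment: azureml:sklearn-env:1
resources:
  instance_type: Standard_DS3_v2   # optional; a default size is used if omitted
  instance_count: 1
# no `compute:` field - the job is scheduled on serverless compute
```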
Select Appropriate VM Sizes
Scale Up Strategy:
- Start with 150% of required RAM
- Profile performance
- Adjust size based on metrics
- Increase instance count for throughput
- Use autoscaling for variable loads
Use Spot VMs

For fault-tolerant workloads:

- Savings: up to 80% vs dedicated VMs
- Trade-off: can be evicted when Azure needs capacity
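For compute clusters, spot-style capacity is requested with `tier: low_priority` in the cluster definition; combined with scale-to-zero, only evictable, discounted nodes are ever billed. A sketch with example values:

```yaml
# spot-cluster.yml - low-priority (spot) compute cluster (example values)
$schema: https://azuremlschemas.azureedge.net/latest/amlCompute.schema.json
name: spot-cluster
type: amlcompute
size: Standard_NC6s_v3
min_instances: 0
max_instances: 8
tier: low_priority   # evictable, discounted capacity; default is dedicated
```

Jobs on a low-priority cluster should checkpoint regularly so they can resume after an eviction.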
Compute Isolation
Isolated VM sizes are dedicated to a single customer:

- `Standard_M128ms` - memory optimized
- `Standard_F72s_v2` - compute optimized
- `Standard_NC24s_v3` - GPU accelerated
- `Standard_NC24rs_v3` - RDMA-capable GPU
Use isolated compute for compliance and regulatory requirements requiring physical isolation.
Monitoring Compute Usage
Track compute metrics in Azure ML studio:

- Node allocation: current vs. max instances
- Job queue: Pending jobs waiting for compute
- Run duration: Time spent on compute
- Resource utilization: CPU, GPU, memory usage
Next Steps

- Create Compute Instance: set up your development environment
- Distributed Training: scale training across multiple GPUs
- Deploy Models: deploy to inference endpoints
- Manage Quotas: request and manage compute quotas