
What are Compute Targets in Azure Machine Learning?

A compute target is a designated compute resource or environment where you run your training script or host your service deployment. Using compute targets makes it easy to change your compute environment without modifying your code.
Compute targets provide scalable, managed compute resources for machine learning workloads, from development to production deployment.

Compute Lifecycle

In a typical model development lifecycle:

1. Local development: start with a local environment or cloud-based VM for experimentation.
2. Scaled training: move to compute clusters for larger datasets and distributed training.
3. Production deployment: deploy models to inference endpoints with dedicated compute.

Training Compute Targets

Compute targets for model training can be reused across multiple training jobs.

Azure Machine Learning Compute Options

Managed multinode clusters for scalable training.

Features:
  • Single-node or multinode clusters
  • Autoscales based on job submission
  • Automatic cluster management and job scheduling
  • Supports CPU and GPU resources
Use Cases:
  • Distributed training
  • Large dataset processing
  • Hyperparameter tuning
  • AutoML experiments
from azure.ai.ml.entities import AmlCompute

# Define a CPU cluster that autoscales between 0 and 4 nodes
cluster = AmlCompute(
    name="cpu-cluster",
    size="STANDARD_DS3_v2",
    min_instances=0,
    max_instances=4,
    idle_time_before_scale_down=120,  # seconds of idle time before scaling down
    tier="Dedicated"
)

# Requires an authenticated MLClient (see "Creating Compute Resources" below)
ml_client.compute.begin_create_or_update(cluster)

Training Compute Compatibility

| Compute Type       | AutoML        | ML Pipelines | Designer |
|--------------------|---------------|--------------|----------|
| Compute Cluster    | ✓             | ✓            | ✓        |
| Serverless Compute | ✓             | ✓            | ✓        |
| Compute Instance   | ✓ (via SDK)   | ✓            | ✓        |
| Kubernetes         | -             | ✓            | ✓        |
| Remote VM          | -             | ✓            | -        |
| Apache Spark       | ✓ (SDK local) | ✓            | -        |
| Databricks         | ✓ (SDK local) | ✓            | -        |

Inference Compute Targets

Compute for hosting deployed models and performing inference.

Deployment Options

Managed Online Endpoints

Real-time inference with serverless compute
  • Automatic scaling
  • Fully managed infrastructure
  • Built-in monitoring
  • No quota consumption

Batch Endpoints

Batch scoring for large datasets
  • Process files in parallel
  • Scheduled or on-demand
  • Cost-effective for bulk inference
  • Automatic compute management

Kubernetes Endpoints

On-premises or cloud Kubernetes clusters
  • Run anywhere (cloud, edge, on-prem)
  • Full infrastructure control
  • GPU support
  • Custom networking

Azure Container Instances

Development/testing only
  • Quick deployment
  • No cluster management
  • Limited to <48GB RAM
  • Small models (<1GB)

Choosing Deployment Compute

Use Managed Online Endpoints when:
  • Response time is critical (<1 second)
  • Request-response pattern
  • Small payloads (fits in HTTP request)
  • Need to scale on traffic

Supported VM Series and Sizes

Azure Machine Learning supports select VM series for compute:

General Purpose VMs

| Series   | Use Case                       | Compute Support      |
|----------|--------------------------------|----------------------|
| Dv3/DSv3 | Balanced CPU-memory            | Clusters & Instances |
| Dv2/DSv2 | General workloads              | Clusters & Instances |
| DDSv4    | General purpose with local SSD | Clusters & Instances |

GPU-Accelerated VMs

| Series       | GPU Architecture | CUDA Version | Use Case             |
|--------------|------------------|--------------|----------------------|
| ND-H100-v5   | Hopper (H100)    | 11.8+        | Large-scale training |
| ND-H200-v5   | Hopper (H200)    | 11.8+        | AI supercomputing    |
| NDasrA100_v4 | Ampere (A100)    | 11.0+        | Deep learning        |
| NCasT4_v3    | Turing (T4)      | 10.0+        | Inference & training |
| NCv3         | Volta (V100)     | 9.0+         | Training             |
| NDv2         | Volta (V100)     | 9.0+         | Distributed training |

High Performance Compute

| Series | Capabilities | Compute Support      |
|--------|--------------|----------------------|
| HBv3   | AMD EPYC     | Clusters & Instances |
| HBv2   | AMD EPYC     | Clusters & Instances |
| HC     | Intel Xeon   | Clusters & Instances |
CUDA Compatibility: ensure your CUDA version is compatible with:
  1. The GPU architecture
  2. The ML framework version (PyTorch, TensorFlow)
For PyTorch, check the framework's CUDA compatibility matrix.
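The architecture check can be sketched as a simple lookup. The minimum versions below are illustrative and should be verified against NVIDIA's release notes (for example, H100 support arrived in CUDA 11.8):

```python
# Approximate minimum CUDA toolkit version per GPU architecture.
# Values are illustrative; verify against NVIDIA documentation.
MIN_CUDA = {
    "Hopper": (11, 8),  # H100/H200
    "Ampere": (11, 0),  # A100
    "Turing": (10, 0),  # T4
    "Volta":  (9, 0),   # V100
}

def cuda_ok(arch: str, cuda_version: tuple) -> bool:
    """Return True if the given CUDA toolkit version supports the architecture."""
    return cuda_version >= MIN_CUDA[arch]

print(cuda_ok("Ampere", (11, 8)))  # True: CUDA 11.8 supports A100
print(cuda_ok("Hopper", (11, 0)))  # False: H100 needs a later toolkit
```

The same two-sided check applies to the framework build: a PyTorch wheel compiled against CUDA 11.x will not use a toolkit feature it was not built for, so both the driver and the framework build constrain the usable version.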

Creating Compute Resources

from azure.ai.ml import MLClient
from azure.ai.ml.entities import AmlCompute
from azure.identity import DefaultAzureCredential

ml_client = MLClient(
    DefaultAzureCredential(),
    subscription_id="<subscription-id>",
    resource_group="<resource-group>",
    workspace_name="<workspace>"
)

# Create a GPU compute cluster
cluster = AmlCompute(
    name="gpu-cluster",
    size="STANDARD_NC6",  # NC series (K80) is retired in most regions; consider Standard_NC4as_T4_v3
    min_instances=0,
    max_instances=4,
    idle_time_before_scale_down=300,  # seconds
    tier="Dedicated"
)

ml_client.compute.begin_create_or_update(cluster).result()

Cost Optimization

For Compute Clusters:
  • Set min_instances: 0 to scale down when idle
  • Configure idle_time_before_scale_down (seconds)
  • Use low-priority VMs for non-critical workloads
For Compute Instances:
  • Enable idle shutdown after period of inactivity
  • Stop instances when not in use
  • Use schedules to auto-start/stop
from azure.ai.ml.entities import (
    ComputeInstance,
    ComputeSchedules,
    ComputeStartStopSchedule,
    RecurrenceTrigger,
    RecurrencePattern,
)

# Compute instance that shuts down after 30 idle minutes
# and auto-starts at 09:00 every day
instance = ComputeInstance(
    name="dev-instance",
    size="STANDARD_DS3_v2",
    idle_time_before_shutdown_minutes=30,
    schedules=ComputeSchedules(
        compute_start_stop=[
            ComputeStartStopSchedule(
                trigger=RecurrenceTrigger(
                    frequency="day",
                    interval=1,
                    schedule=RecurrencePattern(hours=[9], minutes=[0])
                ),
                action="start"
            )
        ]
    )
)
ml_client.compute.begin_create_or_update(instance)
Serverless compute benefits:
  • No quota management
  • Automatic scaling to zero
  • Pay only for actual usage
  • No idle compute costs
Serverless compute is billed per second of actual compute time.
Scale Up Strategy:
  1. Start with 150% of required RAM
  2. Profile performance
  3. Adjust size based on metrics
Then Scale Out:
  • Increase instance count for throughput
  • Use autoscaling for variable loads
For fault-tolerant workloads, use low-priority VMs:

from azure.ai.ml.entities import AmlCompute

cluster = AmlCompute(
    name="spot-cluster",
    size="STANDARD_DS3_v2",
    tier="LowPriority",  # spot pricing
    min_instances=0,
    max_instances=10
)
ml_client.compute.begin_create_or_update(cluster)
Savings: up to 80% compared with dedicated VMs.
Trade-off: nodes can be evicted when Azure needs the capacity.

Compute Isolation

Isolated VM sizes dedicated to a single customer:
  • Standard_M128ms - Memory optimized
  • Standard_F72s_v2 - Compute optimized
  • Standard_NC24s_v3 - GPU accelerated
  • Standard_NC24rs_v3 - RDMA capable GPU
Use isolated compute for compliance and regulatory requirements requiring physical isolation.

Monitoring Compute Usage

Track compute metrics in Azure ML studio:
  • Node allocation: Current vs max instances
  • Job queue: Pending jobs waiting for compute
  • Run duration: Time spent on compute
  • Resource utilization: CPU, GPU, memory usage
# Get compute details
compute = ml_client.compute.get("my-cluster")

print(f"Provisioning state: {compute.provisioning_state}")
print(f"Current nodes: {compute.current_node_count}")
print(f"Target nodes: {compute.target_node_count}")

Next Steps

Create Compute Instance

Set up your development environment

Distributed Training

Scale training across multiple GPUs

Deploy Models

Deploy to inference endpoints

Manage Quotas

Request and manage compute quotas
