
What are Compute Targets in Azure Machine Learning?

A compute target is a designated compute resource or environment where you run your training script or host your service deployment. Using compute targets makes it easy to change your compute environment without modifying your code.
Compute targets provide scalable, managed compute resources for machine learning workloads, from development to production deployment.

Compute Lifecycle

In a typical model development lifecycle:

1. Local development: start with a local environment or cloud-based VM for experimentation.
2. Scaled training: move to compute clusters for larger datasets and distributed training.
3. Production deployment: deploy models to inference endpoints with dedicated compute.

Training Compute Targets

Compute targets for model training can be reused across multiple training jobs.

Azure Machine Learning Compute Options

Managed multinode clusters for scalable training.

Features:
  • Single-node or multinode clusters
  • Autoscales based on job submission
  • Automatic cluster management and job scheduling
  • Supports CPU and GPU resources
Use Cases:
  • Distributed training
  • Large dataset processing
  • Hyperparameter tuning
  • AutoML experiments
from azure.ai.ml.entities import AmlCompute

# Define a CPU cluster that autoscales between 0 and 4 nodes
cluster = AmlCompute(
    name="cpu-cluster",
    size="STANDARD_DS3_v2",
    min_instances=0,
    max_instances=4,
    idle_time_before_scale_down=120,  # seconds of idle time before scaling down
    tier="Dedicated"
)

# Requires an authenticated MLClient (see "Creating Compute Resources" below)
ml_client.compute.begin_create_or_update(cluster)

Training Compute Compatibility

| Compute Type       | AutoML        | ML Pipelines | Designer |
|--------------------|---------------|--------------|----------|
| Compute Cluster    | ✓             | ✓            | ✓        |
| Serverless Compute | ✓             | ✓            | ✓        |
| Compute Instance   | ✓ (via SDK)   | ✓            | ✓        |
| Kubernetes         | -             | ✓            | ✓        |
| Remote VM          | -             | ✓            | -        |
| Apache Spark       | ✓ (SDK local) | ✓            | -        |
| Databricks         | ✓ (SDK local) | ✓            | -        |

Inference Compute Targets

Compute for hosting deployed models and performing inference.

Deployment Options

Managed Online Endpoints

Real-time inference with serverless compute
  • Automatic scaling
  • Fully managed infrastructure
  • Built-in monitoring
  • No quota consumption

Batch Endpoints

Batch scoring for large datasets
  • Process files in parallel
  • Scheduled or on-demand
  • Cost-effective for bulk inference
  • Automatic compute management

Kubernetes Endpoints

On-premises or cloud Kubernetes clusters
  • Run anywhere (cloud, edge, on-prem)
  • Full infrastructure control
  • GPU support
  • Custom networking

Azure Container Instances

Development/testing only
  • Quick deployment
  • No cluster management
  • Limited to <48GB RAM
  • Small models (<1GB)

Choosing Deployment Compute

Use Managed Online Endpoints when:
  • Response time is critical (<1 second)
  • Request-response pattern
  • Small payloads (fits in HTTP request)
  • Need to scale on traffic

Supported VM Series and Sizes

Azure Machine Learning supports select VM series for compute:

General Purpose VMs

| Series   | Use Case                       | Compute Support      |
|----------|--------------------------------|----------------------|
| Dv3/DSv3 | Balanced CPU-memory            | Clusters & Instances |
| Dv2/DSv2 | General workloads              | Clusters & Instances |
| DDSv4    | General purpose with local SSD | Clusters & Instances |

GPU-Accelerated VMs

| Series       | GPU Architecture | CUDA Version | Use Case             |
|--------------|------------------|--------------|----------------------|
| ND-H100-v5   | Hopper (H100)    | 11.8+        | Large-scale training |
| ND-H200-v5   | Hopper (H200)    | 11.8+        | AI supercomputing    |
| NDasrA100_v4 | Ampere (A100)    | 11.0+        | Deep learning        |
| NCasT4_v3    | Turing (T4)      | 10.0+        | Inference & training |
| NCv3         | Volta (V100)     | 9.0+         | Training             |
| NDv2         | Volta (V100)     | 9.0+         | Distributed training |

High Performance Compute

| Series | Capabilities | Compute Support      |
|--------|--------------|----------------------|
| HBv3   | AMD EPYC     | Clusters & Instances |
| HBv2   | AMD EPYC     | Clusters & Instances |
| HC     | Intel Xeon   | Clusters & Instances |
CUDA Compatibility: ensure your CUDA version is compatible with:
  1. The GPU architecture
  2. The ML framework version (PyTorch, TensorFlow)
For PyTorch, check the framework's CUDA compatibility matrix.
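The architecture check can be sketched as a simple lookup. The minimum versions below are illustrative and should be verified against NVIDIA's release notes (for example, H100 support arrived in CUDA 11.8):

```python
# Approximate minimum CUDA toolkit version per GPU architecture.
# Values are illustrative; verify against NVIDIA documentation.
MIN_CUDA = {
    "Hopper": (11, 8),  # H100/H200
    "Ampere": (11, 0),  # A100
    "Turing": (10, 0),  # T4
    "Volta":  (9, 0),   # V100
}

def cuda_ok(arch: str, cuda_version: tuple) -> bool:
    """Return True if the given CUDA toolkit version supports the architecture."""
    return cuda_version >= MIN_CUDA[arch]

print(cuda_ok("Ampere", (11, 8)))  # True: CUDA 11.8 supports A100
print(cuda_ok("Hopper", (11, 0)))  # False: H100 needs a later toolkit
```

The same two-sided check applies to the framework build: a PyTorch wheel compiled against CUDA 11.x will not use a toolkit feature it was not built for, so both the driver and the framework build constrain the usable version.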

Creating Compute Resources

from azure.ai.ml import MLClient
from azure.ai.ml.entities import AmlCompute
from azure.identity import DefaultAzureCredential

ml_client = MLClient(
    DefaultAzureCredential(),
    subscription_id="<subscription-id>",
    resource_group="<resource-group>",
    workspace_name="<workspace>"
)

# Create a GPU compute cluster
cluster = AmlCompute(
    name="gpu-cluster",
    size="STANDARD_NC6",  # NC series (K80) is retired in most regions; consider Standard_NC4as_T4_v3
    min_instances=0,
    max_instances=4,
    idle_time_before_scale_down=300,  # seconds
    tier="Dedicated"
)

ml_client.compute.begin_create_or_update(cluster).result()

Cost Optimization

For Compute Clusters:
  • Set min_instances: 0 to scale down when idle
  • Configure idle_time_before_scale_down (seconds)
  • Use low-priority VMs for non-critical workloads
For Compute Instances:
  • Enable idle shutdown after period of inactivity
  • Stop instances when not in use
  • Use schedules to auto-start/stop
from azure.ai.ml.entities import (
    ComputeInstance,
    ComputeSchedules,
    ComputeStartStopSchedule,
    RecurrenceTrigger,
    RecurrencePattern,
)

# Compute instance that shuts down after 30 idle minutes
# and auto-starts at 09:00 every day
instance = ComputeInstance(
    name="dev-instance",
    size="STANDARD_DS3_v2",
    idle_time_before_shutdown_minutes=30,
    schedules=ComputeSchedules(
        compute_start_stop=[
            ComputeStartStopSchedule(
                trigger=RecurrenceTrigger(
                    frequency="day",
                    interval=1,
                    schedule=RecurrencePattern(hours=[9], minutes=[0])
                ),
                action="start"
            )
        ]
    )
)
ml_client.compute.begin_create_or_update(instance)
Serverless compute benefits:
  • No quota management
  • Automatic scaling to zero
  • Pay only for actual usage
  • No idle compute costs
Serverless compute is billed per second of actual compute time.
Scale Up Strategy:
  1. Start with 150% of required RAM
  2. Profile performance
  3. Adjust size based on metrics
Then Scale Out:
  • Increase instance count for throughput
  • Use autoscaling for variable loads
For fault-tolerant workloads, use low-priority VMs:

from azure.ai.ml.entities import AmlCompute

cluster = AmlCompute(
    name="spot-cluster",
    size="STANDARD_DS3_v2",
    tier="LowPriority",  # spot pricing
    min_instances=0,
    max_instances=10
)
ml_client.compute.begin_create_or_update(cluster)
Savings: up to 80% compared with dedicated VMs.
Trade-off: nodes can be evicted when Azure needs the capacity.

Compute Isolation

Isolated VM sizes dedicated to a single customer:
  • Standard_M128ms - Memory optimized
  • Standard_F72s_v2 - Compute optimized
  • Standard_NC24s_v3 - GPU accelerated
  • Standard_NC24rs_v3 - RDMA capable GPU
Use isolated compute for compliance and regulatory requirements requiring physical isolation.

Monitoring Compute Usage

Track compute metrics in Azure ML studio:
  • Node allocation: Current vs max instances
  • Job queue: Pending jobs waiting for compute
  • Run duration: Time spent on compute
  • Resource utilization: CPU, GPU, memory usage
# Get compute details
compute = ml_client.compute.get("my-cluster")

print(f"Provisioning state: {compute.provisioning_state}")
print(f"Current nodes: {compute.current_node_count}")
print(f"Target nodes: {compute.target_node_count}")

Next Steps

Create Compute Instance

Set up your development environment

Distributed Training

Scale training across multiple GPUs

Deploy Models

Deploy to inference endpoints

Manage Quotas

Request and manage compute quotas
