
Azure Machine Learning

Azure Machine Learning is a cloud service that accelerates and manages the machine learning (ML) project lifecycle. ML professionals, data scientists, and engineers use it in their daily workflows to train and deploy models and manage machine learning operations (MLOps).

Train Models

Train custom models or use pre-built models from open-source frameworks

Deploy at Scale

Deploy models as managed endpoints for real-time or batch scoring

MLOps

Manage the complete model lifecycle with enterprise operations

Collaborate

Team workflows with shared notebooks, compute, and environments

What is Azure Machine Learning?

You can create a model in Machine Learning or use a model built from an open-source platform, such as PyTorch, TensorFlow, or scikit-learn. MLOps tools help you monitor, retrain, and redeploy models throughout their lifecycle.
Free Trial! If you don’t have an Azure subscription, create a free account to try Azure Machine Learning. You get credits to spend on Azure services.

Who Is It For?

Azure Machine Learning is designed for individuals and teams implementing MLOps within their organization to bring ML models into production in a secure and auditable environment:
For Model Development
  • Jupyter notebooks in the cloud
  • Experiment tracking and versioning
  • Automated ML for rapid prototyping
  • Model catalog with LLMs and foundation models
  • Visual designer for no-code ML

Core Capabilities

Productivity for Everyone

ML projects often require a team with varied skills. Machine Learning provides tools for everyone:
1. Collaborate

Share notebooks, compute resources, serverless compute, data, and environments with your team.
2. Ensure Fairness

Develop models with built-in fairness and explainability, plus the tracking and auditability needed for lineage and compliance.
3. Deploy Efficiently

Deploy ML models quickly at scale and manage them with MLOps governance.
4. Run Anywhere

Execute machine learning workloads anywhere with built-in governance, security, and compliance.

Cross-Compatible Platform

Use your preferred tools to get the job done:

Azure ML Studio

Web-based UI for no-code and code-first experiences

Python SDK (v2)

Comprehensive Python library for ML workflows

Azure CLI (v2)

Command-line interface for automation

REST APIs

Azure Resource Manager APIs for integration

Azure Machine Learning Studio

The studio offers multiple authoring experiences:

Notebooks

Write and run code in managed Jupyter Notebook servers directly integrated in the studio:
from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential
from azure.ai.ml.entities import Model

# Connect to workspace
ml_client = MLClient(
    DefaultAzureCredential(),
    subscription_id="your-subscription-id",
    resource_group_name="your-resource-group",
    workspace_name="your-workspace"
)

# Register a model
model = Model(
    path="./model",
    name="my-model",
    description="Classification model",
    type="custom_model"
)

ml_client.models.create_or_update(model)
Or open notebooks in VS Code on the web or desktop.

Designer

Drag-and-drop interface to create ML pipelines without writing code:
  • Visual pipeline creation
  • Pre-built components
  • Custom component support
  • Real-time validation
  • One-click deployment

Automated Machine Learning

Let Azure ML automatically find the best model for your data:
For classification (predicting categories), AutoML provides:
  • Binary classification
  • Multi-class classification
  • Automatic feature engineering
  • Model explainability
from azure.ai.ml import automl

# Configure AutoML (training_data is an MLTable input prepared earlier)
classification_job = automl.classification(
    training_data=training_data,
    target_column_name="target",
    primary_metric="accuracy",
    n_cross_validations=5,
    enable_model_explainability=True
)

# Submit job
returned_job = ml_client.jobs.create_or_update(classification_job)

Data Labeling

Efficiently coordinate labeling projects:
  • Image Labeling: Bounding boxes, polygons, classification
  • Text Labeling: Named entity recognition, text classification
  • ML-Assisted Labeling: Speed up labeling with pre-trained models
  • Consensus: Multiple labelers for quality assurance
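Consensus labeling resolves disagreements between labelers; conceptually, it is a majority vote over each item's labels. A toy sketch of that idea (illustration only, not the Azure implementation):

```python
from collections import Counter

def consensus_label(labels):
    """Majority vote across labelers; ties go to the first-seen label."""
    counts = Counter(labels)
    return counts.most_common(1)[0][0]

print(consensus_label(["cat", "cat", "dog"]))  # -> cat
```

In practice you would also track agreement rates per item, routing low-agreement items back for additional review.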

Work with LLMs and Generative AI

Azure Machine Learning includes tools for building Generative AI applications:

Model Catalog

The model catalog features hundreds of models from leading providers:

Azure OpenAI

  • GPT-4, GPT-4 Turbo
  • GPT-3.5 Turbo
  • Embeddings
  • DALL-E

Open Source

  • Llama 3 (Meta)
  • Mistral models
  • Falcon
  • BERT variants

Specialized

  • Cohere models
  • NVIDIA models
  • HuggingFace models
  • Custom fine-tuned models

Prompt Flow

Streamline the development cycle of LLM applications:
from promptflow import PFClient

pf = PFClient()

# Create a flow
flow = pf.flows.create_or_update(
    flow="./my-flow",
    display_name="QA Flow"
)

# Test the flow
result = pf.flows.test(
    flow=flow,
    inputs={"question": "What is Azure ML?"}
)

print(result)
Prompt Flow Features:
  • Visual flow designer
  • Prompt templates and variants
  • Built-in LLM tools
  • Evaluation metrics
  • Deployment to endpoints
  • Integration with AI Search
Use Microsoft Foundry for the latest agent-based LLM capabilities. Azure ML is recommended for custom model training and MLOps.

Training Models

Open and Interoperable

Use models created in common Python frameworks:
from azure.ai.ml import command

job = command(
    code="./src",
    command="python train.py",
    environment="azureml:pytorch-env:1",
    compute="gpu-cluster",
    distribution={
        "type": "PyTorch",
        "process_count_per_instance": 4
    }
)
Also Supported:
  • XGBoost
  • LightGBM
  • R and .NET
  • Custom frameworks

Distributed Training

Scale training across multiple nodes with distributed computing:
from azure.ai.ml import command
from azure.ai.ml.entities import MpiDistribution

job = command(
    code="./src",
    command="python train.py --epochs 100 --batch-size 64",
    environment="azureml:pytorch-gpu:1",
    compute="gpu-cluster",
    distribution=MpiDistribution(
        process_count_per_instance=4
    ),
    instance_count=4,
    instance_type="Standard_NC24s_v3"
)
Distribution Strategies:
  • PyTorch: Distributed Data Parallel (DDP)
  • TensorFlow: MultiWorkerMirroredStrategy
  • MPI: Horovod for custom frameworks
  • Spark: Apache Spark on Synapse clusters
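Data-parallel strategies such as DDP have each worker compute gradients on its own data shard, then average the gradients across workers before every update. A toy sketch of that averaging step (illustration only, not an Azure or PyTorch API):

```python
def allreduce_mean(worker_grads):
    """Average per-worker gradient vectors, as an all-reduce would."""
    n = len(worker_grads)
    dim = len(worker_grads[0])
    return [sum(g[i] for g in worker_grads) / n for i in range(dim)]

# Two workers, each with a 2-element gradient
print(allreduce_mean([[1.0, 2.0], [3.0, 4.0]]))  # -> [2.0, 3.0]
```

Because every worker applies the same averaged gradient, the replicas stay in sync without sharing model parameters directly.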

Hyperparameter Tuning

Automate hyperparameter optimization:
from azure.ai.ml.sweep import Choice, Uniform

command_job = command(
    code="./src",
    command="python train.py --lr ${{inputs.learning_rate}} --batch ${{inputs.batch_size}}",
    inputs={"learning_rate": 0.01, "batch_size": 32},
    compute="gpu-cluster",
    environment="azureml:pytorch-env:1"
)

# Bind the search space, then convert the command job into a sweep job
command_job_for_sweep = command_job(
    learning_rate=Uniform(min_value=0.0001, max_value=0.1),
    batch_size=Choice([16, 32, 64, 128])
)
sweep_job = command_job_for_sweep.sweep(
    compute="gpu-cluster",
    sampling_algorithm="random",
    primary_metric="loss",
    goal="Minimize"
)
sweep_job.set_limits(max_total_trials=20, max_concurrent_trials=4)
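Conceptually, a random-sampling sweep draws each trial's configuration independently from the search space. A stdlib-only sketch of what the service does for a space like the one above (the log-uniform learning-rate draw is an illustrative choice, not the SDK's):

```python
import random

random.seed(0)

# Hypothetical search space mirroring the sweep: log-uniform learning
# rate over [1e-4, 1e-1], categorical batch size
search_space = {
    "learning_rate": lambda: 10 ** random.uniform(-4, -1),
    "batch_size": lambda: random.choice([16, 32, 64, 128]),
}

# Random sampling: each trial independently draws one value per parameter
trials = [{name: draw() for name, draw in search_space.items()} for _ in range(4)]
for trial in trials:
    print(trial)
```

Each trial then runs as its own job, and the sweep keeps the configuration that optimizes the primary metric.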

Deploying Models

Bring models into production with managed endpoints:

Real-Time Endpoints

Low-latency inference for online scenarios:
from azure.ai.ml.entities import ManagedOnlineEndpoint, ManagedOnlineDeployment

# Create endpoint
endpoint = ManagedOnlineEndpoint(
    name="my-endpoint",
    auth_mode="key"
)
ml_client.online_endpoints.begin_create_or_update(endpoint).result()

# Create deployment
deployment = ManagedOnlineDeployment(
    name="blue",
    endpoint_name="my-endpoint",
    model=model,
    instance_type="Standard_DS3_v2",
    instance_count=2
)
ml_client.online_deployments.begin_create_or_update(deployment).result()

# Set traffic
endpoint.traffic = {"blue": 100}
ml_client.online_endpoints.begin_create_or_update(endpoint)
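Once the deployment is live, you score by sending JSON to the endpoint. A sketch of preparing a request; the `data` schema here is an assumption, since your scoring script defines the real contract, and the `invoke` call is shown as a comment because it needs a live endpoint and an authenticated `MLClient`:

```python
import json

# Example scoring payload; the "data" shape is hypothetical and
# depends entirely on what your score.py expects
payload = {"data": [[0.1, 0.2, 0.3, 0.4]]}
with open("sample-request.json", "w") as f:
    json.dump(payload, f)

# With an authenticated MLClient you would then call:
# response = ml_client.online_endpoints.invoke(
#     endpoint_name="my-endpoint",
#     deployment_name="blue",
#     request_file="sample-request.json",
# )
```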

Batch Endpoints

Process large volumes asynchronously:
from azure.ai.ml import Input
from azure.ai.ml.entities import BatchEndpoint, BatchDeployment

# Create batch endpoint
batch_endpoint = BatchEndpoint(
    name="my-batch-endpoint",
    description="Batch scoring endpoint"
)
ml_client.batch_endpoints.begin_create_or_update(batch_endpoint).result()

# Create deployment
batch_deployment = BatchDeployment(
    name="default",
    endpoint_name="my-batch-endpoint",
    model=model,
    compute="cpu-cluster",
    instance_count=5,
    max_concurrency_per_instance=2,
    mini_batch_size=10
)
ml_client.batch_deployments.begin_create_or_update(batch_deployment)

# Invoke batch job
job = ml_client.batch_endpoints.invoke(
    endpoint_name="my-batch-endpoint",
    input=Input(path="azureml:my-data:1")
)

Blue-Green Deployments

Safely roll out new model versions:
# Deploy new version (green)
green_deployment = ManagedOnlineDeployment(
    name="green",
    endpoint_name="my-endpoint",
    model=new_model,
    instance_type="Standard_DS3_v2",
    instance_count=2
)
ml_client.online_deployments.begin_create_or_update(green_deployment)

# Gradually shift traffic
endpoint.traffic = {"blue": 90, "green": 10}  # Test with 10%
ml_client.online_endpoints.begin_create_or_update(endpoint)

# Full cutover
endpoint.traffic = {"green": 100}
ml_client.online_endpoints.begin_create_or_update(endpoint)
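Traffic percentages across a single endpoint's deployments must add up to 100. A small helper you might use to sanity-check a split before applying it (illustrative, not part of the SDK):

```python
def validate_traffic(traffic):
    """Sanity-check a traffic split before applying it to an endpoint."""
    total = sum(traffic.values())
    if total != 100:
        raise ValueError(f"Traffic must sum to 100, got {total}")
    if any(pct < 0 for pct in traffic.values()):
        raise ValueError("Traffic percentages must be non-negative")
    return traffic

validate_traffic({"blue": 90, "green": 10})  # passes
```

Catching a bad split locally is cheaper than waiting for the service to reject the update mid-rollout.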

MLOps: DevOps for ML

Manage the complete model lifecycle with enterprise operations:

ML Pipelines

Create reproducible workflows:
from azure.ai.ml import dsl, Input, Output

# data_prep_component, train_component, and evaluate_component
# are pipeline components defined or loaded earlier
@dsl.pipeline(compute="cpu-cluster")
def training_pipeline(input_data):
    # Data prep component
    prep_data = data_prep_component(
        raw_data=input_data
    )
    
    # Train component
    train_model = train_component(
        training_data=prep_data.outputs.processed_data
    )
    
    # Evaluate component
    evaluate = evaluate_component(
        model=train_model.outputs.model,
        test_data=prep_data.outputs.test_data
    )
    
    return {
        "model": train_model.outputs.model,
        "metrics": evaluate.outputs.metrics
    }

# Create and run pipeline
pipeline = training_pipeline(
    input_data=Input(path="azureml:training-data:1")
)
pipeline_job = ml_client.jobs.create_or_update(pipeline)

Model Registry

Version and track all models:
# Register model
model = Model(
    path="./outputs/model",
    name="fraud-detector",
    version="2",
    description="Updated model with better recall",
    tags={"task": "classification", "framework": "sklearn"},
    properties={"accuracy": "0.95", "recall": "0.92"}
)
registered_model = ml_client.models.create_or_update(model)

# List versions
models = ml_client.models.list(name="fraud-detector")
for m in models:
    print(f"Version {m.version}: {m.properties}")
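One pitfall when working with the versions you just listed: they are strings, so lexicographic comparison would put "9" after "10". A tiny helper that compares numerically (illustrative):

```python
def latest_version(versions):
    """Pick the highest model version; versions are strings, so compare numerically."""
    return max(versions, key=int)

print(latest_version(["1", "2", "10"]))  # -> 10 (plain max() would return "2")
```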

Monitoring and Logging

Track model performance in production:
# Enable data collection
from azure.ai.ml.entities import DataCollector, DeploymentCollection

deployment = ManagedOnlineDeployment(
    name="production",
    endpoint_name="my-endpoint",
    model=model,
    instance_type="Standard_DS3_v2",
    instance_count=1,
    data_collector=DataCollector(
        collections={
            "model_inputs": DeploymentCollection(enabled="true"),
            "model_outputs": DeploymentCollection(enabled="true")
        }
    )
)

# Monitor endpoint metrics
from datetime import timedelta

from azure.identity import DefaultAzureCredential
from azure.monitor.query import MetricsQueryClient

metrics_client = MetricsQueryClient(DefaultAzureCredential())
response = metrics_client.query_resource(
    resource_uri=endpoint_resource_id,
    metric_names=["RequestLatency", "RequestsPerMinute"],
    timespan=timedelta(hours=1)
)

CI/CD Integration

Integrate with Azure DevOps or GitHub Actions:
# .github/workflows/train-deploy.yml
name: Train and Deploy Model

on:
  push:
    branches: [main]

jobs:
  train-deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      
      - name: Azure Login
        uses: azure/login@v1
        with:
          creds: ${{ secrets.AZURE_CREDENTIALS }}
      
      - name: Install Azure ML CLI
        run: az extension add -n ml
      
      - name: Train Model
        run: |
          az ml job create --file train-job.yml \
            --workspace-name ${{ secrets.WORKSPACE_NAME }} \
            --resource-group ${{ secrets.RESOURCE_GROUP }}
      
      - name: Deploy Model
        run: |
          az ml online-deployment create --file deployment.yml \
            --workspace-name ${{ secrets.WORKSPACE_NAME }} \
            --resource-group ${{ secrets.RESOURCE_GROUP }}

Enterprise Integration

Azure Machine Learning integrates with the Azure ecosystem:

Azure Synapse

Process and stream data with Spark

Azure Arc

Run Azure services in Kubernetes

Azure Storage

Store training data and models

Azure App Service

Deploy ML-powered web apps

Microsoft Purview

Data governance and cataloging

Azure Key Vault

Secure secrets management

Security and Compliance

  • Azure Virtual Networks
  • Private endpoints
  • Network security groups
  • Firewall rules
  • VPN gateway support

Getting Started

1. Create a Workspace

Sign in to Azure ML Studio and create your workspace.
2. Create Compute

Set up a compute instance for development or clusters for training.
3. Explore Samples

Browse sample notebooks to learn best practices.
4. Train Your Model

Submit training jobs using the SDK, CLI, or studio.
5. Deploy

Create managed endpoints for real-time or batch inference.
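The first two steps can also be scripted. A minimal SDK v2 sketch, assuming placeholder subscription, resource group, workspace, and cluster names (this is a configuration sketch that needs valid Azure credentials to actually run):

```python
from azure.ai.ml import MLClient
from azure.ai.ml.entities import AmlCompute
from azure.identity import DefaultAzureCredential

# Connect to an existing workspace (names are placeholders)
ml_client = MLClient(
    DefaultAzureCredential(),
    subscription_id="your-subscription-id",
    resource_group_name="your-resource-group",
    workspace_name="your-workspace",
)

# A small autoscaling training cluster; min_instances=0 means it
# scales to zero and costs nothing between jobs
cluster = AmlCompute(
    name="cpu-cluster",
    size="Standard_DS3_v2",
    min_instances=0,
    max_instances=4,
    idle_time_before_scale_down=120,
)
ml_client.begin_create_or_update(cluster)
```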

Resources

Quickstart

Get started in minutes

Tutorials

Step-by-step guides

Python SDK Docs

Complete SDK reference

GitHub Samples

Code examples and templates
Azure Machine Learning doesn’t store or process your data outside of the region where you deploy your workspace.
