
Azure Machine Learning

Azure Machine Learning is a cloud service that accelerates and manages the machine learning (ML) project lifecycle. ML professionals, data scientists, and engineers use it in their daily workflows to train and deploy models and manage machine learning operations (MLOps).

Train Models

Train custom models or use pre-built models from open-source frameworks

Deploy at Scale

Deploy models as managed endpoints for real-time or batch scoring

MLOps

Manage the complete model lifecycle with enterprise operations

Collaborate

Team workflows with shared notebooks, compute, and environments

What is Azure Machine Learning?

You can create a model in Machine Learning or use a model built from an open-source platform, such as PyTorch, TensorFlow, or scikit-learn. MLOps tools help you monitor, retrain, and redeploy models throughout their lifecycle.
Free Trial! If you don’t have an Azure subscription, create a free account to try Azure Machine Learning. You get credits to spend on Azure services.

Who Is It For?

Azure Machine Learning is designed for individuals and teams implementing MLOps within their organization to bring ML models into production in a secure and auditable environment:
For Model Development
  • Jupyter notebooks in the cloud
  • Experiment tracking and versioning
  • Automated ML for rapid prototyping
  • Model catalog with LLMs and foundation models
  • Visual designer for no-code ML

Core Capabilities

Productivity for Everyone

ML projects often require a team with varied skills. Machine Learning provides tools for everyone:
1. Collaborate

Share notebooks, compute resources, serverless compute, data, and environments with your team.
2. Ensure Fairness

Develop models with built-in fairness and explainability, plus the tracking and auditability needed for lineage and compliance.
3. Deploy Efficiently

Deploy ML models quickly at scale and manage them with MLOps governance.
4. Run Anywhere

Execute machine learning workloads anywhere with built-in governance, security, and compliance.

Cross-Compatible Platform

Use your preferred tools to get the job done:

Azure ML Studio

Web-based UI for no-code and code-first experiences

Python SDK (v2)

Comprehensive Python library for ML workflows

Azure CLI (v2)

Command-line interface for automation

REST APIs

Azure Resource Manager APIs for integration

Azure Machine Learning Studio

The studio offers multiple authoring experiences:

Notebooks

Write and run code in managed Jupyter Notebook servers directly integrated in the studio:
from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential
from azure.ai.ml.entities import Model

# Connect to workspace
ml_client = MLClient(
    DefaultAzureCredential(),
    subscription_id="your-subscription-id",
    resource_group_name="your-resource-group",
    workspace_name="your-workspace"
)

# Register a model
model = Model(
    path="./model",
    name="my-model",
    description="Classification model",
    type="custom_model"
)

ml_client.models.create_or_update(model)
Or open notebooks in VS Code on the web or desktop.

Designer

Drag-and-drop interface to create ML pipelines without writing code:
  • Visual pipeline creation
  • Pre-built components
  • Custom component support
  • Real-time validation
  • One-click deployment

Automated Machine Learning

Let Azure ML automatically find the best model for your data:
For classification (predicting categories), AutoML provides:
  • Binary classification
  • Multi-class classification
  • Automatic feature engineering
  • Model explainability
from azure.ai.ml import automl

# Configure AutoML (training_data is an MLTable input prepared earlier)
classification_job = automl.classification(
    training_data=training_data,
    target_column_name="target",
    primary_metric="accuracy",
    n_cross_validations=5,
    enable_model_explainability=True
)

# Submit job
returned_job = ml_client.jobs.create_or_update(classification_job)

Data Labeling

Efficiently coordinate labeling projects:
  • Image Labeling: Bounding boxes, polygons, classification
  • Text Labeling: Named entity recognition, text classification
  • ML-Assisted Labeling: Speed up labeling with pre-trained models
  • Consensus: Multiple labelers for quality assurance
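Consensus labeling resolves disagreements between labelers; conceptually, it is a majority vote over each item's labels. A toy sketch of that idea (illustration only, not the Azure implementation):

```python
from collections import Counter

def consensus_label(labels):
    """Majority vote across labelers; ties go to the first-seen label."""
    counts = Counter(labels)
    return counts.most_common(1)[0][0]

print(consensus_label(["cat", "cat", "dog"]))  # -> cat
```

In practice you would also track agreement rates per item, routing low-agreement items back for additional review.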

Work with LLMs and Generative AI

Azure Machine Learning includes tools for building Generative AI applications:

Model Catalog

The model catalog features hundreds of models from leading providers:

Azure OpenAI

  • GPT-4, GPT-4 Turbo
  • GPT-3.5 Turbo
  • Embeddings
  • DALL-E

Open Source

  • Llama 3 (Meta)
  • Mistral models
  • Falcon
  • BERT variants

Specialized

  • Cohere models
  • NVIDIA models
  • HuggingFace models
  • Custom fine-tuned models

Prompt Flow

Streamline the development cycle of LLM applications:
from promptflow import PFClient

pf = PFClient()

# Create a flow
flow = pf.flows.create_or_update(
    flow="./my-flow",
    display_name="QA Flow"
)

# Test the flow
result = pf.flows.test(
    flow=flow,
    inputs={"question": "What is Azure ML?"}
)

print(result)
Prompt Flow Features:
  • Visual flow designer
  • Prompt templates and variants
  • Built-in LLM tools
  • Evaluation metrics
  • Deployment to endpoints
  • Integration with AI Search
Use Microsoft Foundry for the latest agent-based LLM capabilities. Azure ML is recommended for custom model training and MLOps.

Training Models

Open and Interoperable

Use models created in common Python frameworks:
from azure.ai.ml import command

job = command(
    code="./src",
    command="python train.py",
    environment="azureml:pytorch-env:1",
    compute="gpu-cluster",
    distribution={
        "type": "PyTorch",
        "process_count_per_instance": 4
    }
)
Also Supported:
  • XGBoost
  • LightGBM
  • R and .NET
  • Custom frameworks

Distributed Training

Scale training across multiple nodes with distributed computing:
from azure.ai.ml import command
from azure.ai.ml.entities import MpiDistribution

job = command(
    code="./src",
    command="python train.py --epochs 100 --batch-size 64",
    environment="azureml:pytorch-gpu:1",
    compute="gpu-cluster",
    distribution=MpiDistribution(
        process_count_per_instance=4
    ),
    instance_count=4,
    instance_type="Standard_NC24s_v3"
)
Distribution Strategies:
  • PyTorch: Distributed Data Parallel (DDP)
  • TensorFlow: MultiWorkerMirroredStrategy
  • MPI: Horovod for custom frameworks
  • Spark: Apache Spark on Synapse clusters
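Data-parallel strategies such as DDP have each worker compute gradients on its own data shard, then average the gradients across workers before every update. A toy sketch of that averaging step (illustration only, not an Azure or PyTorch API):

```python
def allreduce_mean(worker_grads):
    """Average per-worker gradient vectors, as an all-reduce would."""
    n = len(worker_grads)
    dim = len(worker_grads[0])
    return [sum(g[i] for g in worker_grads) / n for i in range(dim)]

# Two workers, each with a 2-element gradient
print(allreduce_mean([[1.0, 2.0], [3.0, 4.0]]))  # -> [2.0, 3.0]
```

Because every worker applies the same averaged gradient, the replicas stay in sync without sharing model parameters directly.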

Hyperparameter Tuning

Automate hyperparameter optimization:
from azure.ai.ml.sweep import Choice, Uniform

command_job = command(
    code="./src",
    command="python train.py --lr ${{inputs.learning_rate}} --batch ${{inputs.batch_size}}",
    inputs={"learning_rate": 0.01, "batch_size": 32},
    compute="gpu-cluster",
    environment="azureml:pytorch-env:1"
)

# Bind the search space, then convert the command job into a sweep job
command_job_for_sweep = command_job(
    learning_rate=Uniform(min_value=0.0001, max_value=0.1),
    batch_size=Choice([16, 32, 64, 128])
)
sweep_job = command_job_for_sweep.sweep(
    compute="gpu-cluster",
    sampling_algorithm="random",
    primary_metric="loss",
    goal="Minimize"
)
sweep_job.set_limits(max_total_trials=20, max_concurrent_trials=4)
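Conceptually, a random-sampling sweep draws each trial's configuration independently from the search space. A stdlib-only sketch of what the service does for a space like the one above (the log-uniform learning-rate draw is an illustrative choice, not the SDK's):

```python
import random

random.seed(0)

# Hypothetical search space mirroring the sweep: log-uniform learning
# rate over [1e-4, 1e-1], categorical batch size
search_space = {
    "learning_rate": lambda: 10 ** random.uniform(-4, -1),
    "batch_size": lambda: random.choice([16, 32, 64, 128]),
}

# Random sampling: each trial independently draws one value per parameter
trials = [{name: draw() for name, draw in search_space.items()} for _ in range(4)]
for trial in trials:
    print(trial)
```

Each trial then runs as its own job, and the sweep keeps the configuration that optimizes the primary metric.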

Deploying Models

Bring models into production with managed endpoints:

Real-Time Endpoints

Low-latency inference for online scenarios:
from azure.ai.ml.entities import ManagedOnlineEndpoint, ManagedOnlineDeployment

# Create endpoint
endpoint = ManagedOnlineEndpoint(
    name="my-endpoint",
    auth_mode="key"
)
ml_client.online_endpoints.begin_create_or_update(endpoint).result()

# Create deployment
deployment = ManagedOnlineDeployment(
    name="blue",
    endpoint_name="my-endpoint",
    model=model,
    instance_type="Standard_DS3_v2",
    instance_count=2
)
ml_client.online_deployments.begin_create_or_update(deployment).result()

# Set traffic
endpoint.traffic = {"blue": 100}
ml_client.online_endpoints.begin_create_or_update(endpoint)
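Once the deployment is live, you score by sending JSON to the endpoint. A sketch of preparing a request; the `data` schema here is an assumption, since your scoring script defines the real contract, and the `invoke` call is shown as a comment because it needs a live endpoint and an authenticated `MLClient`:

```python
import json

# Example scoring payload; the "data" shape is hypothetical and
# depends entirely on what your score.py expects
payload = {"data": [[0.1, 0.2, 0.3, 0.4]]}
with open("sample-request.json", "w") as f:
    json.dump(payload, f)

# With an authenticated MLClient you would then call:
# response = ml_client.online_endpoints.invoke(
#     endpoint_name="my-endpoint",
#     deployment_name="blue",
#     request_file="sample-request.json",
# )
```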

Batch Endpoints

Process large volumes asynchronously:
from azure.ai.ml import Input
from azure.ai.ml.entities import BatchEndpoint, BatchDeployment

# Create batch endpoint
batch_endpoint = BatchEndpoint(
    name="my-batch-endpoint",
    description="Batch scoring endpoint"
)
ml_client.batch_endpoints.begin_create_or_update(batch_endpoint).result()

# Create deployment
batch_deployment = BatchDeployment(
    name="default",
    endpoint_name="my-batch-endpoint",
    model=model,
    compute="cpu-cluster",
    instance_count=5,
    max_concurrency_per_instance=2,
    mini_batch_size=10
)
ml_client.batch_deployments.begin_create_or_update(batch_deployment)

# Invoke batch job
job = ml_client.batch_endpoints.invoke(
    endpoint_name="my-batch-endpoint",
    input=Input(path="azureml:my-data:1")
)

Blue-Green Deployments

Safely roll out new model versions:
# Deploy new version (green)
green_deployment = ManagedOnlineDeployment(
    name="green",
    endpoint_name="my-endpoint",
    model=new_model,
    instance_type="Standard_DS3_v2",
    instance_count=2
)
ml_client.online_deployments.begin_create_or_update(green_deployment)

# Gradually shift traffic
endpoint.traffic = {"blue": 90, "green": 10}  # Test with 10%
ml_client.online_endpoints.begin_create_or_update(endpoint)

# Full cutover
endpoint.traffic = {"green": 100}
ml_client.online_endpoints.begin_create_or_update(endpoint)
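Traffic percentages across a single endpoint's deployments must add up to 100. A small helper you might use to sanity-check a split before applying it (illustrative, not part of the SDK):

```python
def validate_traffic(traffic):
    """Sanity-check a traffic split before applying it to an endpoint."""
    total = sum(traffic.values())
    if total != 100:
        raise ValueError(f"Traffic must sum to 100, got {total}")
    if any(pct < 0 for pct in traffic.values()):
        raise ValueError("Traffic percentages must be non-negative")
    return traffic

validate_traffic({"blue": 90, "green": 10})  # passes
```

Catching a bad split locally is cheaper than waiting for the service to reject the update mid-rollout.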

MLOps: DevOps for ML

Manage the complete model lifecycle with enterprise operations:

ML Pipelines

Create reproducible workflows:
from azure.ai.ml import dsl, Input, Output

# data_prep_component, train_component, and evaluate_component
# are pipeline components defined or loaded earlier
@dsl.pipeline(compute="cpu-cluster")
def training_pipeline(input_data):
    # Data prep component
    prep_data = data_prep_component(
        raw_data=input_data
    )
    
    # Train component
    train_model = train_component(
        training_data=prep_data.outputs.processed_data
    )
    
    # Evaluate component
    evaluate = evaluate_component(
        model=train_model.outputs.model,
        test_data=prep_data.outputs.test_data
    )
    
    return {
        "model": train_model.outputs.model,
        "metrics": evaluate.outputs.metrics
    }

# Create and run pipeline
pipeline = training_pipeline(
    input_data=Input(path="azureml:training-data:1")
)
pipeline_job = ml_client.jobs.create_or_update(pipeline)

Model Registry

Version and track all models:
# Register model
model = Model(
    path="./outputs/model",
    name="fraud-detector",
    version="2",
    description="Updated model with better recall",
    tags={"task": "classification", "framework": "sklearn"},
    properties={"accuracy": "0.95", "recall": "0.92"}
)
registered_model = ml_client.models.create_or_update(model)

# List versions
models = ml_client.models.list(name="fraud-detector")
for m in models:
    print(f"Version {m.version}: {m.properties}")
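One pitfall when working with the versions you just listed: they are strings, so lexicographic comparison would put "9" after "10". A tiny helper that compares numerically (illustrative):

```python
def latest_version(versions):
    """Pick the highest model version; versions are strings, so compare numerically."""
    return max(versions, key=int)

print(latest_version(["1", "2", "10"]))  # -> 10 (plain max() would return "2")
```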

Monitoring and Logging

Track model performance in production:
# Enable data collection
from azure.ai.ml.entities import DataCollector, DeploymentCollection

deployment = ManagedOnlineDeployment(
    name="production",
    endpoint_name="my-endpoint",
    model=model,
    instance_type="Standard_DS3_v2",
    instance_count=1,
    data_collector=DataCollector(
        collections={
            "model_inputs": DeploymentCollection(enabled="true"),
            "model_outputs": DeploymentCollection(enabled="true")
        }
    )
)

# Monitor endpoint metrics
from datetime import timedelta

from azure.identity import DefaultAzureCredential
from azure.monitor.query import MetricsQueryClient

metrics_client = MetricsQueryClient(DefaultAzureCredential())
response = metrics_client.query_resource(
    resource_uri=endpoint_resource_id,
    metric_names=["RequestLatency", "RequestsPerMinute"],
    timespan=timedelta(hours=1)
)

CI/CD Integration

Integrate with Azure DevOps or GitHub Actions:
# .github/workflows/train-deploy.yml
name: Train and Deploy Model

on:
  push:
    branches: [main]

jobs:
  train-deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      
      - name: Azure Login
        uses: azure/login@v1
        with:
          creds: ${{ secrets.AZURE_CREDENTIALS }}
      
      - name: Install Azure ML CLI
        run: az extension add -n ml
      
      - name: Train Model
        run: |
          az ml job create --file train-job.yml \
            --workspace-name ${{ secrets.WORKSPACE_NAME }} \
            --resource-group ${{ secrets.RESOURCE_GROUP }}
      
      - name: Deploy Model
        run: |
          az ml online-deployment create --file deployment.yml \
            --workspace-name ${{ secrets.WORKSPACE_NAME }} \
            --resource-group ${{ secrets.RESOURCE_GROUP }}

Enterprise Integration

Azure Machine Learning integrates with the Azure ecosystem:

Azure Synapse

Process and stream data with Spark

Azure Arc

Run Azure services in Kubernetes

Azure Storage

Store training data and models

Azure App Service

Deploy ML-powered web apps

Microsoft Purview

Data governance and cataloging

Azure Key Vault

Secure secrets management

Security and Compliance

  • Azure Virtual Networks
  • Private endpoints
  • Network security groups
  • Firewall rules
  • VPN gateway support

Getting Started

1. Create a Workspace

Sign in to Azure ML Studio and create your workspace.
2. Create Compute

Set up a compute instance for development or clusters for training.
3. Explore Samples

Browse sample notebooks to learn best practices.
4. Train Your Model

Submit training jobs using the SDK, CLI, or studio.
5. Deploy

Create managed endpoints for real-time or batch inference.
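The first two steps can also be scripted. A minimal SDK v2 sketch, assuming placeholder subscription, resource group, workspace, and cluster names (this is a configuration sketch that needs valid Azure credentials to actually run):

```python
from azure.ai.ml import MLClient
from azure.ai.ml.entities import AmlCompute
from azure.identity import DefaultAzureCredential

# Connect to an existing workspace (names are placeholders)
ml_client = MLClient(
    DefaultAzureCredential(),
    subscription_id="your-subscription-id",
    resource_group_name="your-resource-group",
    workspace_name="your-workspace",
)

# A small autoscaling training cluster; min_instances=0 means it
# scales to zero and costs nothing between jobs
cluster = AmlCompute(
    name="cpu-cluster",
    size="Standard_DS3_v2",
    min_instances=0,
    max_instances=4,
    idle_time_before_scale_down=120,
)
ml_client.begin_create_or_update(cluster)
```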

Resources

Quickstart

Get started in minutes

Tutorials

Step-by-step guides

Python SDK Docs

Complete SDK reference

GitHub Samples

Code examples and templates
Azure Machine Learning doesn’t store or process your data outside of the region where you deploy your workspace.
