MLOps: Model Management with Azure Machine Learning

Machine Learning Operations (MLOps) applies DevOps principles to the machine learning lifecycle, improving the quality, consistency, and efficiency of ML solutions.
MLOps enables faster experimentation, deployment, and iteration while maintaining quality assurance and end-to-end lineage tracking.

What is MLOps?

MLOps is based on DevOps principles that increase workflow efficiency:

  • Continuous integration: automated testing and validation of ML code and models
  • Continuous deployment: automated deployment of models to production
  • Continuous delivery: reliable release of ML solutions to users

Benefits of MLOps

Applying MLOps to machine learning results in:
  • Quick iteration on model architectures
  • Parallel experiment tracking
  • Reproducible training pipelines
  • Efficient hyperparameter tuning

MLOps Capabilities in Azure Machine Learning

1. Reproducible ML Pipelines

Define repeatable workflows for data preparation, training, and scoring:
```python
from azure.ai.ml import dsl, Input

# Assumes `prep_component`, `train_component`, and `eval_component` have
# already been loaded (for example with load_component) and that `ml_client`
# is an authenticated MLClient.

@dsl.pipeline(
    name="training_pipeline",
    description="End-to-end training pipeline",
)
def ml_pipeline(pipeline_input_data):
    # Data preparation step
    prep_data = prep_component(raw_data=pipeline_input_data)

    # Training step
    train_model = train_component(
        training_data=prep_data.outputs.prepared_data
    )

    # Evaluation step
    evaluate_model = eval_component(
        model=train_model.outputs.model,
        test_data=prep_data.outputs.test_data,
    )

    return {
        "model": train_model.outputs.model,
        "metrics": evaluate_model.outputs.metrics,
    }

# Create and submit the pipeline job
pipeline_job = ml_pipeline(
    pipeline_input_data=Input(type="uri_folder", path="azureml://datastores/data")
)
ml_client.jobs.create_or_update(pipeline_job)
```

Pipelines provide:
  • Reusability: Use same pipeline for different datasets
  • Versioning: Track pipeline definitions over time
  • Parallelization: Run independent steps concurrently
  • Scheduling: Trigger pipelines on schedules or events
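The pipeline forms a directed acyclic graph, which is what makes parallelization possible: any steps whose inputs are ready can run at the same time. A minimal sketch of that scheduling order in plain Python (illustrative only, not the Azure ML scheduler; `profile_data` is a hypothetical extra step added to show two steps becoming ready together):

```python
from graphlib import TopologicalSorter

# Dependencies mirror the pipeline above, plus one hypothetical step.
deps = {
    "prep_data": set(),
    "profile_data": {"prep_data"},                   # hypothetical step
    "train_model": {"prep_data"},
    "evaluate_model": {"prep_data", "train_model"},
}

ts = TopologicalSorter(deps)
ts.prepare()
waves = []
while ts.is_active():
    ready = sorted(ts.get_ready())  # steps whose dependencies are all satisfied
    waves.append(ready)             # each wave could run concurrently
    ts.done(*ready)

print(waves)
```

The second wave contains two steps, because both depend only on `prep_data`.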

2. Reusable Software Environments

Ensure reproducible builds without manual configuration:
```python
from azure.ai.ml.entities import Environment

env = Environment(
    name="sklearn-env",
    description="Scikit-learn environment",
    image="mcr.microsoft.com/azureml/openmpi4.1.0-ubuntu20.04:latest",
    conda_file="environment.yml",
)

ml_client.environments.create_or_update(env)
```
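The `environment.yml` conda file referenced above might look like this (package names and versions are illustrative; pin the versions your project actually uses):

```yaml
name: sklearn-env
channels:
  - conda-forge
dependencies:
  - python=3.10
  - pip
  - pip:
      - scikit-learn==1.5.1
      - pandas==2.2.2
      - azureml-mlflow
```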

3. Model Registration and Versioning

Store and track models in the Azure Machine Learning registry:
```python
from azure.ai.ml.entities import Model

# Register the model produced by the training job
model = Model(
    path="outputs/model",
    name="fraud-detection-model",
    description="XGBoost model for fraud detection",
    tags={"framework": "xgboost", "task": "classification"},
    properties={"accuracy": "0.95", "dataset": "fraud_v2"},
)

registered_model = ml_client.models.create_or_update(model)
print(f"Registered model: {registered_model.name} version {registered_model.version}")
```

Model registry features:
  • Automatic versioning: each registration increments the version number automatically
  • Metadata tracking: tags and properties make models searchable
  • Lineage: each model links back to its training job, dataset, and environment
  • Model comparison: compare metrics across versions
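Version comparison can be scripted: list the versions of a model (in practice via `ml_client.models.list(name=...)`) and compare the metrics stored in their properties. A sketch of the selection logic on plain dictionaries standing in for registered versions:

```python
# Each dict stands in for a registered model version; names and metric
# values are illustrative.
versions = [
    {"version": "1", "properties": {"accuracy": "0.93"}},
    {"version": "2", "properties": {"accuracy": "0.95"}},
    {"version": "3", "properties": {"accuracy": "0.94"}},
]

# Properties are stored as strings, so cast before comparing.
best = max(versions, key=lambda v: float(v["properties"]["accuracy"]))
print(best["version"])  # the version with the highest recorded accuracy
```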

4. Model Deployment as Endpoints

Deploy models for real-time or batch inference. The example below creates a managed online endpoint and deployment for real-time inference:
```python
from azure.ai.ml.entities import (
    ManagedOnlineEndpoint,
    ManagedOnlineDeployment,
    CodeConfiguration,
)

# Create the endpoint; begin_* calls return a poller, so wait with .result()
endpoint = ManagedOnlineEndpoint(
    name="fraud-detection-endpoint",
    description="Fraud detection API",
    auth_mode="key",
)
ml_client.online_endpoints.begin_create_or_update(endpoint).result()

# Create the deployment behind the endpoint
deployment = ManagedOnlineDeployment(
    name="blue",
    endpoint_name="fraud-detection-endpoint",
    model=registered_model,
    environment="azureml://registries/azureml/environments/sklearn-1.5/versions/1",
    code_configuration=CodeConfiguration(
        code="src",
        scoring_script="score.py",
    ),
    instance_type="Standard_DS3_v2",
    instance_count=2,
)
ml_client.online_deployments.begin_create_or_update(deployment).result()
```

5. Controlled Rollout

Safely deploy new model versions with traffic splitting:
```python
# Deploy the new model version to a "green" deployment alongside "blue"
green_deployment = ManagedOnlineDeployment(
    name="green",
    endpoint_name="fraud-detection-endpoint",
    model=new_model_version,
    instance_type="Standard_DS3_v2",
    instance_count=1,
)
ml_client.online_deployments.begin_create_or_update(green_deployment).result()

# Gradually shift traffic from blue to green
endpoint.traffic = {"blue": 90, "green": 10}
ml_client.online_endpoints.begin_create_or_update(endpoint).result()

# Monitor metrics, then complete the rollout
endpoint.traffic = {"green": 100}
ml_client.online_endpoints.begin_create_or_update(endpoint).result()
```

Traffic management strategies:
  1. Shadow deployment: mirror traffic to the new deployment without affecting production responses
  2. Canary release: route a small percentage of traffic to the new version
  3. Blue-green: switch all traffic between versions instantly
  4. A/B testing: compare the performance of multiple model versions
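The `endpoint.traffic` dictionary is a weighted split. The routing behavior can be sketched with weighted random choice (illustrative only; the actual routing happens inside the managed endpoint):

```python
import random

def route(traffic: dict) -> str:
    """Pick a deployment name with probability proportional to its weight."""
    names = list(traffic)
    weights = [traffic[n] for n in names]
    return random.choices(names, weights=weights, k=1)[0]

random.seed(0)  # deterministic for the demo
traffic = {"blue": 90, "green": 10}
sample = [route(traffic) for _ in range(1000)]
print(sample.count("green"))  # roughly 100 of 1000 requests go to green
```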

Metadata and Lineage Tracking

Azure Machine Learning captures end-to-end lineage:

Data Lineage

```python
from azure.ai.ml.entities import Data
from azure.ai.ml.constants import AssetTypes

# Register a versioned data asset
data_asset = Data(
    name="fraud-training-data",
    version="2024-01",
    description="Fraud transactions dataset",
    path="azureml://datastores/data/paths/fraud/",
    type=AssetTypes.URI_FOLDER,
    tags={"year": "2024", "domain": "finance"},
)

ml_client.data.create_or_update(data_asset)
```
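Version strings like `"2024-01"` above are free-form. One convention (not an Azure ML requirement) is to derive a fingerprint from the data contents, so re-registrations of identical data are detectable; a sketch with the standard library:

```python
import hashlib

def dataset_fingerprint(rows: list) -> str:
    """Hash the dataset contents so identical data always maps to the
    same fingerprint, regardless of when it is registered."""
    digest = hashlib.sha256()
    for row in rows:
        digest.update(row.encode("utf-8"))
    return digest.hexdigest()[:12]  # a short prefix is enough for a tag

# Illustrative CSV rows
rows = ["txn_id,amount,label", "1,9.99,0", "2,420.00,1"]
print(dataset_fingerprint(rows))
```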

Job History

Automatic tracking of:
  • Code snapshots (Git commit)
  • Input datasets and versions
  • Hyperparameters
  • Metrics and outputs
  • Compute environment
  • Duration and costs

```python
# Query the child jobs of a pipeline run
jobs = ml_client.jobs.list(parent_job_name="training-pipeline-run-123")

for job in jobs:
    print(f"{job.name}: {job.status} - {job.properties}")
```

Event-Driven Workflows

Trigger actions based on ML lifecycle events:
```python
from azure.eventgrid import EventGridEvent

# Event types emitted by Azure Machine Learning
event_types = [
    "Microsoft.MachineLearningServices.ModelRegistered",
    "Microsoft.MachineLearningServices.ModelDeployed",
    "Microsoft.MachineLearningServices.DatasetDriftDetected",
]

# Event handler; `trigger_deployment` is assumed to be defined elsewhere
def handle_ml_event(event: EventGridEvent):
    if event.event_type == "Microsoft.MachineLearningServices.ModelRegistered":
        model_name = event.data["modelName"]
        model_version = event.data["modelVersion"]

        # Kick off a downstream deployment pipeline
        trigger_deployment(model_name, model_version)
```
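A handler like the one above generalizes into a small dispatch table keyed on the full event type. A sketch on plain dictionaries (the payload shape is simplified; real Event Grid events carry more fields):

```python
deployed = []

def on_model_registered(data):
    # Record the model that should be deployed next
    deployed.append((data["modelName"], data["modelVersion"]))

# Map full Event Grid event types to handler functions
HANDLERS = {
    "Microsoft.MachineLearningServices.ModelRegistered": on_model_registered,
}

def dispatch(event: dict):
    handler = HANDLERS.get(event["eventType"])
    if handler:  # silently ignore event types we don't subscribe to
        handler(event["data"])

dispatch({
    "eventType": "Microsoft.MachineLearningServices.ModelRegistered",
    "data": {"modelName": "fraud-detection-model", "modelVersion": "3"},
})
print(deployed)
```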

Monitoring and Alerting

Model Monitoring

Track model performance in production:
```python
from azure.ai.ml.entities import (
    AlertNotification,
    MonitorDefinition,
    MonitorSchedule,
    MonitoringTarget,
    RecurrenceTrigger,
    ServerlessSparkCompute,
)

# Daily out-of-box monitoring for the "blue" deployment; the default
# signals cover data drift, prediction drift, and data quality, and
# custom signals (e.g. model performance) can be added via
# MonitorDefinition's monitoring_signals.
monitor = MonitorSchedule(
    name="fraud-detection-monitor",
    trigger=RecurrenceTrigger(frequency="day", interval=1),
    create_monitor=MonitorDefinition(
        compute=ServerlessSparkCompute(
            instance_type="standard_e4s_v3", runtime_version="3.3"
        ),
        monitoring_target=MonitoringTarget(
            ml_task="classification",
            endpoint_deployment_id="azureml:fraud-detection-endpoint:blue",
        ),
        alert_notification=AlertNotification(emails=["[email protected]"]),
    ),
)

ml_client.schedules.begin_create_or_update(monitor).result()
```

Metrics to Monitor

  • Request latency (P50, P95, P99)
  • Throughput (requests/second)
  • Error rate
  • CPU/GPU utilization
  • Memory usage
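The latency percentiles listed above are computed from raw request timings. A sketch with the standard library (P50/P95/P99 are the 50th, 95th, and 99th of 100 quantile cut points):

```python
import statistics

# Simulated request latencies in milliseconds (deterministic demo data)
latencies_ms = list(range(1, 101))  # 1, 2, ..., 100

cuts = statistics.quantiles(latencies_ms, n=100)  # 99 cut points
p50, p95, p99 = cuts[49], cuts[94], cuts[98]
print(p50, p95, p99)
```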

CI/CD with Azure Pipelines

Integrate Azure Machine Learning into DevOps workflows:

Azure DevOps Extension

The Machine Learning extension provides:
  • Azure ML workspace integration
  • Model training triggers
  • Automated deployment tasks
  • Environment management

GitHub Actions

```yaml
name: Train and Deploy ML Model

on:
  push:
    branches: [ main ]
  workflow_dispatch:

jobs:
  train:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3

      - name: Azure Login
        uses: azure/login@v1
        with:
          creds: ${{ secrets.AZURE_CREDENTIALS }}

      - name: Install Azure ML CLI
        run: az extension add -n ml

      - name: Submit Training Job
        run: |
          az ml job create \
            --file jobs/train.yml \
            --resource-group ${{ secrets.RESOURCE_GROUP }} \
            --workspace-name ${{ secrets.WORKSPACE_NAME }}

      - name: Deploy Model
        run: |
          az ml online-endpoint create --file endpoints/endpoint.yml \
            --resource-group ${{ secrets.RESOURCE_GROUP }} \
            --workspace-name ${{ secrets.WORKSPACE_NAME }}
          az ml online-deployment create --file endpoints/deployment.yml \
            --resource-group ${{ secrets.RESOURCE_GROUP }} \
            --workspace-name ${{ secrets.WORKSPACE_NAME }}
```
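The workflow submits `jobs/train.yml`; a minimal command-job spec of that shape might look like the following (the script path, data path, environment, and compute name are illustrative):

```yaml
$schema: https://azuremlschemas.azureedge.net/latest/commandJob.schema.json
command: python train.py --data ${{inputs.training_data}}
code: src
inputs:
  training_data:
    type: uri_folder
    path: azureml://datastores/data/paths/fraud/
environment: azureml:sklearn-env@latest
compute: azureml:cpu-cluster
```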

Best Practices

Version everything. Track versions for:
  • Training code (Git commits)
  • Data assets (versioned datasets)
  • Models (automatic versioning)
  • Environments (pinned dependencies)
  • Pipeline definitions (YAML configs)

Test at every stage. Implement:
  • Unit tests for training code
  • Integration tests for pipelines
  • Model validation tests
  • Deployment smoke tests
  • Performance benchmarks

Monitor continuously. Set up:
  • Real-time dashboards
  • Automated alerts
  • Data drift detection
  • Model performance tracking
  • Cost monitoring

Use a feature store. Benefits:
  • Consistent feature definitions
  • Training-serving skew prevention
  • Feature reusability
  • Point-in-time correctness

Apply governance. Establish:
  • Model approval workflows
  • Access control policies
  • Compliance documentation
  • Audit trails
  • Responsible AI reviews
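Data drift detection, listed under monitoring above, boils down to comparing the distribution of a feature in production against its training baseline. A deliberately simple sketch (a mean-shift test in baseline standard-deviation units; production monitors use proper statistical distances):

```python
import statistics

def mean_shift_drift(baseline, current, threshold=3.0):
    """Flag drift when the current mean moves more than `threshold`
    baseline standard deviations away from the baseline mean."""
    mu = statistics.fmean(baseline)
    sigma = statistics.stdev(baseline)
    return abs(statistics.fmean(current) - mu) > threshold * sigma

# Illustrative feature values from training vs. production
baseline = [10.0, 11.0, 9.0, 10.5, 9.5, 10.0, 10.2, 9.8]
print(mean_shift_drift(baseline, [10.1, 9.9, 10.3]))   # stable distribution
print(mean_shift_drift(baseline, [25.0, 26.0, 24.5]))  # clearly shifted
```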

Next Steps

  • Set up MLOps: configure CI/CD with Azure DevOps
  • Model deployment: deploy models to endpoints
  • Model monitoring: monitor models in production
  • Azure Pipelines: integrate with Azure DevOps
