
Overview

In this practice exercise, you’ll compare managed ML platforms by implementing multi-model endpoints on both AWS SageMaker and GCP Vertex AI. Deploying to each platform side by side will give you a concrete feel for their trade-offs along with practical deployment experience.

Learning Objectives

By completing this exercise, you will:
  1. Deploy multiple models on AWS SageMaker multi-model endpoints
  2. Deploy multiple models on GCP Vertex AI
  3. Compare the developer experience, features, and costs of each platform
  4. Evaluate buy vs build decisions for your specific context
  5. Document platform recommendations with pros and cons

Tasks

Task 1: AWS SageMaker Multi-Model Deployment

Step 1: Set Up the AWS Environment

Configure your AWS credentials and create required IAM roles:
# Configure AWS CLI
aws configure

# Create SageMaker execution role (if not exists)
aws iam create-role \
    --role-name sagemaker-execution-role \
    --assume-role-policy-document file://trust-policy.json

# Attach required policies
aws iam attach-role-policy \
    --role-name sagemaker-execution-role \
    --policy-arn arn:aws:iam::aws:policy/AmazonSageMakerFullAccess
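The trust-policy.json referenced above must let the SageMaker service assume the role. A minimal sketch of writing it, using the standard service-principal form (adjust if your organization adds conditions):

```python
import json

# Trust policy letting the SageMaker service assume the execution role.
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": "sagemaker.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }
    ],
}

# Write it where the create-role command above expects it.
with open("trust-policy.json", "w") as f:
    json.dump(trust_policy, f, indent=2)
```

`aws iam create-role` will then pick the file up via `file://trust-policy.json`.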
Step 2: Prepare Models

Create at least 2 different models for deployment:
  • One image classification model (e.g., ResNet, MobileNet)
  • One text/tabular model (e.g., simple classifier)
Package them in Triton-compatible format:
model_registry/
├── image_classifier_v1/
│   ├── config.pbtxt
│   └── 1/
│       └── model.pt
└── text_classifier_v1/
    ├── config.pbtxt
    └── 1/
        └── model.pt
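Each model directory also needs a config.pbtxt describing its inputs and outputs. A minimal sketch for the TorchScript backend, assuming a 3×224×224 input and 1000 output classes — the model name and tensor shapes are placeholders to adjust for your models:

```python
from pathlib import Path

# Minimal Triton config for a TorchScript image classifier.
# The pytorch_libtorch backend uses positional tensor names INPUT__0 / OUTPUT__0;
# the dims below assume a 3x224x224 input and 1000 classes -- adjust as needed.
CONFIG = """\
name: "image_classifier_v1"
platform: "pytorch_libtorch"
max_batch_size: 8
input [
  {
    name: "INPUT__0"
    data_type: TYPE_FP32
    dims: [ 3, 224, 224 ]
  }
]
output [
  {
    name: "OUTPUT__0"
    data_type: TYPE_FP32
    dims: [ 1000 ]
  }
]
"""

model_dir = Path("model_registry/image_classifier_v1")
(model_dir / "1").mkdir(parents=True, exist_ok=True)  # numbered version subdirectory
(model_dir / "config.pbtxt").write_text(CONFIG)
```

Your exported model.pt then goes into the `1/` version subdirectory, matching the tree above.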
Step 3: Deploy the Multi-Model Endpoint

Use the provided CLI tool to create and deploy:
# Create endpoint
python cli.py create-endpoint

# Add models
python cli.py add-model ./model_registry/image_classifier_v1/ image_v1.tar.gz
python cli.py add-model ./model_registry/text_classifier_v1/ text_v1.tar.gz

# Verify upload
aws s3 ls s3://your-bucket/models/
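If you want to see what add-model does behind the scenes: a SageMaker multi-model endpoint expects each model as a .tar.gz under a common S3 prefix, with config.pbtxt at the archive root. A hypothetical sketch of that packaging step (the function names, bucket, and key are placeholders — the provided cli.py may differ in details):

```python
import tarfile
from pathlib import Path

def package_model(src_dir: str, out_path: str) -> str:
    """Tar a Triton model directory so config.pbtxt sits at the archive root."""
    src = Path(src_dir)
    with tarfile.open(out_path, "w:gz") as tar:
        for item in sorted(src.rglob("*")):
            tar.add(item, arcname=item.relative_to(src))
    return out_path

def upload_model(tar_path: str, bucket: str, key: str) -> None:
    """Upload the archive under the endpoint's model prefix (placeholder bucket)."""
    import boto3  # deferred so packaging works without AWS credentials
    boto3.client("s3").upload_file(tar_path, bucket, key)

# Demo on a throwaway directory so the sketch runs end to end:
demo = Path("demo_model/1")
demo.mkdir(parents=True, exist_ok=True)
(demo.parent / "config.pbtxt").write_text('name: "demo"\n')
(demo / "model.pt").write_bytes(b"")
package_model("demo_model", "demo.tar.gz")
```

The archive name (e.g. image_v1.tar.gz) is what you later pass as the target model when invoking the endpoint.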
Step 4: Test Inference

Invoke both models and measure performance:
# Test image model
python cli.py call-model-image image_v1.tar.gz

# Test text model
python cli.py call-model-vector text_v1.tar.gz
Measure:
  • Cold start latency (first request)
  • Warm latency (subsequent requests)
  • Throughput (requests per second)
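One way to capture these numbers is to time the first request separately from a warm batch. A sketch, assuming you wrap the actual call (the cli.py invocation, or boto3's sagemaker-runtime invoke_endpoint with TargetModel set) in a zero-argument callable — the sleep lambda below is only a stand-in:

```python
import statistics
import time
from typing import Callable

def measure_latency(invoke: Callable[[], object], warm_requests: int = 20) -> dict:
    """Time one cold call, then a batch of warm calls, and summarize."""
    start = time.perf_counter()
    invoke()
    cold_ms = (time.perf_counter() - start) * 1000

    warm_ms = []
    for _ in range(warm_requests):
        start = time.perf_counter()
        invoke()
        warm_ms.append((time.perf_counter() - start) * 1000)

    return {
        "cold_ms": cold_ms,
        "warm_p50_ms": statistics.median(warm_ms),
        "warm_p95_ms": statistics.quantiles(warm_ms, n=20)[-1],  # ~95th percentile
        "throughput_rps": 1000 / statistics.mean(warm_ms),       # single-threaded
    }

# Stand-in for the real endpoint call:
stats = measure_latency(lambda: time.sleep(0.001))
print(stats)
```

For real throughput under load you would issue the warm requests concurrently; this sketch gives single-threaded numbers, which are enough for the cold/warm comparison.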
Step 5: Monitor and Optimize

Check CloudWatch metrics and optimize:
# View model-loading metrics (a time window, period, and statistic are required)
aws cloudwatch get-metric-statistics \
    --namespace AWS/SageMaker \
    --metric-name ModelLoadingTime \
    --dimensions Name=EndpointName,Value=sagemaker-poc \
    --start-time "$(date -u -d '1 hour ago' +%Y-%m-%dT%H:%M:%SZ)" \
    --end-time "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
    --period 300 \
    --statistics Average
Success Criteria:
  • Multi-model endpoint successfully created
  • At least 2 models deployed and callable
  • Performance metrics documented
  • Code committed to repository

Task 2: GCP Vertex AI Multi-Model Deployment

Step 1: Set Up the GCP Environment

Configure GCP credentials and enable required APIs:
# Authenticate
gcloud auth login
gcloud config set project YOUR_PROJECT_ID

# Enable APIs
gcloud services enable aiplatform.googleapis.com
gcloud services enable storage.googleapis.com
Step 2: Deploy Models to Vertex AI

Upload and deploy models:
from google.cloud import aiplatform

aiplatform.init(project="YOUR_PROJECT_ID", location="us-central1")

# Upload model
model = aiplatform.Model.upload(
    display_name="image-classifier",
    artifact_uri="gs://your-bucket/models/image_v1",
    serving_container_image_uri="us-docker.pkg.dev/vertex-ai/prediction/pytorch-gpu.1-13:latest",
)

# Deploy to endpoint
endpoint = model.deploy(
    machine_type="n1-standard-4",
    accelerator_type="NVIDIA_TESLA_T4",
    accelerator_count=1,
)
Step 3: Test Inference

Invoke models and compare with SageMaker:
# Make a prediction (input_data is the preprocessed NumPy array for your model)
prediction = endpoint.predict(instances=[{
    "input": input_data.tolist()
}])

print(prediction.predictions)
Step 4: Compare Approaches

Document differences in:
  • Deployment process complexity
  • API ergonomics
  • Monitoring capabilities
  • Cost structure
Vertex AI Note: Vertex AI doesn’t have direct multi-model endpoint support like SageMaker. You’ll need to:
  • Deploy models to separate endpoints, or
  • Use a custom prediction container that routes to multiple models, or
  • Use Vertex AI Prediction with NVIDIA Triton
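The custom-container option comes down to a name-to-model lookup in front of a single predict interface. A framework-free sketch of that routing layer (the model names and lambda stand-ins are hypothetical; in a real container this dispatch would sit behind the HTTP predict route):

```python
from typing import Any, Callable, Dict

class ModelRouter:
    """Dispatch requests to one of several in-process models by name."""

    def __init__(self) -> None:
        self._models: Dict[str, Callable[[Any], Any]] = {}

    def register(self, name: str, predict_fn: Callable[[Any], Any]) -> None:
        self._models[name] = predict_fn

    def predict(self, name: str, instance: Any) -> Any:
        if name not in self._models:
            raise KeyError(f"unknown model: {name}")
        return self._models[name](instance)

# Stand-in models; in practice these would be loaded Triton/TorchScript models.
router = ModelRouter()
router.register("image_v1", lambda x: {"label": "cat"})
router.register("text_v1", lambda x: {"label": "spam"})
print(router.predict("text_v1", "free money!!!"))
```

The model name would typically arrive as a request field or path segment, mirroring SageMaker's TargetModel parameter.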

Task 3: Platform Comparison Document

Create a comprehensive comparison document covering:
| Feature               | AWS SageMaker              | GCP Vertex AI           | Winner |
| --------------------- | -------------------------- | ----------------------- | ------ |
| Multi-model endpoints | Native support             | Custom container needed | AWS    |
| Deployment API        | Boto3 (verbose)            | Cloud SDK (cleaner)     | ?      |
| Monitoring            | CloudWatch + Model Monitor | Cloud Monitoring        | ?      |
| Custom containers     | Full support               | Full support            | Tie    |
| AutoML                | Built-in                   | Strong AutoML           | ?      |
| GPU support           | Wide range                 | Good selection          | ?      |
| Async inference       | Native support             | Need Cloud Tasks        | AWS    |
| Framework support     | All major frameworks       | All major frameworks    | Tie    |
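For the cost-structure row, a simple starting point is instance-hours times on-demand rate for an always-on endpoint. The hourly rates below are placeholder assumptions, not quoted prices — substitute current figures from each provider's pricing page for your region:

```python
# Rough monthly-cost comparison for always-on endpoints.
# Hourly rates are PLACEHOLDER assumptions -- replace with current on-demand
# prices for your region before putting numbers in the comparison doc.
HOURS_PER_MONTH = 730

scenarios = {
    "SageMaker ml.g4dn.xlarge (assumed $/hr)": 0.74,
    "Vertex n1-standard-4 + T4 (assumed $/hr)": 0.55,
}

for name, hourly in scenarios.items():
    monthly = hourly * HOURS_PER_MONTH
    print(f"{name}: ~${monthly:,.0f}/month")
```

Extend this with per-request costs if you test autoscaling or serverless/async options, where billing is not purely instance-hours.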

Deliverables

PR1: AWS SageMaker

Code for multi-model deployment on AWS SageMaker with:
  • CLI tool for endpoint management
  • At least 2 different model types
  • Testing scripts
  • README with instructions

PR2: GCP Vertex AI

Code for multi-model deployment on GCP Vertex AI with:
  • Deployment scripts
  • At least 2 different model types
  • Testing scripts
  • README with instructions

Platform Comparison

Google Doc or Markdown document with:
  • Technical feature comparison
  • Cost analysis for your workload
  • Developer experience notes
  • Recommendation with rationale

MLOps Stack Templates

Two alternative MLOps stack designs:
  • AWS SageMaker-based stack
  • GCP Vertex AI-based stack
  • Comparison with current implementation

Acceptance Criteria

  • ✅ Code follows project style guide (ruff format, ruff check)
  • ✅ All tests pass (pytest)
  • ✅ Clear README with setup instructions
  • ✅ No hardcoded credentials or account IDs
  • ✅ Proper error handling and logging
  • ✅ Multi-model endpoints successfully deployed
  • ✅ At least 2 models per platform
  • ✅ Inference working for all models
  • ✅ Performance metrics collected
  • ✅ Cleanup scripts provided
  • ✅ Platform comparison document complete
  • ✅ Cost analysis with specific numbers
  • ✅ Clear recommendation with rationale
  • ✅ MLOps stack templates documented
  • ✅ Pros and cons for each approach

Reading List

Work through these resources to build understanding:

MLOps Platforms:
  • AWS SageMaker
  • GCP Vertex AI
  • Azure ML

Tips for Success

Cost Management:
  • Use the smallest instance types for testing
  • Delete endpoints when not in use
  • Set up billing alerts
  • Use free tier where possible
  • Clean up resources after testing
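On SageMaker, "delete endpoints" has an ordering constraint: the endpoint goes first, then its endpoint config, then the model objects. A sketch that separates the plan from the boto3 calls so the ordering is testable without credentials (the assumption that the endpoint config shares the endpoint's name is a placeholder — adjust to your naming):

```python
from typing import Dict, List, Tuple

Step = Tuple[str, Dict[str, str]]

def cleanup_plan(endpoint_name: str, model_names: List[str]) -> List[Step]:
    """SageMaker teardown steps in dependency order: endpoint, config, models."""
    plan: List[Step] = [
        ("delete_endpoint", {"EndpointName": endpoint_name}),
        # Placeholder assumption: the config was created with the endpoint's name.
        ("delete_endpoint_config", {"EndpointConfigName": endpoint_name}),
    ]
    plan += [("delete_model", {"ModelName": m}) for m in model_names]
    return plan

def run_cleanup(plan: List[Step]) -> None:
    import boto3  # deferred so planning stays testable without AWS credentials
    sm = boto3.client("sagemaker")
    for method, kwargs in plan:
        getattr(sm, method)(**kwargs)

plan = cleanup_plan("sagemaker-poc", ["image_v1", "text_v1"])
print([method for method, _ in plan])
```

Remember the S3 bucket holding the model archives as well; it is billed separately from the endpoint.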
Development Workflow:
  1. Start with SageMaker (better multi-model support)
  2. Use the provided CLI tool as a starting point
  3. Test with simple models first (faster iteration)
  4. Document everything as you go
  5. Take screenshots of monitoring dashboards
  6. Track all costs for comparison
Common Issues:
  • IAM permissions: Ensure your execution role has both S3 and SageMaker access
  • Model format: Follow Triton model repository structure
  • Cold starts: First request will be slow (expected)
  • Region consistency: Keep all resources in same region

Keep Iterating!

After completing this module, continue exploring:
  • Azure Machine Learning for third platform perspective
  • Kubernetes-based alternatives (KServe, Seldon)
  • Model serving optimizations (TensorRT, ONNX)
  • A/B testing and canary deployments
  • Cost optimization techniques
  • Multi-cloud strategies

Continue Learning

Explore other modules in the ML in Production course
