Overview
In this practice exercise, you’ll compare managed ML platforms by implementing multi-model endpoints on both AWS SageMaker and GCP Vertex AI. This hands-on experience will help you understand the trade-offs between the platforms and gain practical deployment experience.

Learning Objectives
By completing this exercise, you will:
- Deploy multiple models on AWS SageMaker multi-model endpoints
- Deploy multiple models on GCP Vertex AI
- Compare the developer experience, features, and costs of each platform
- Evaluate buy vs build decisions for your specific context
- Document platform recommendations with pros and cons
Tasks
Task 1: AWS SageMaker Multi-Model Deployment
Prepare Models
Create at least 2 different models for deployment:
- One image classification model (e.g., ResNet, MobileNet)
- One text/tabular model (e.g., simple classifier)
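SageMaker multi-model endpoints expect each model packaged as its own `.tar.gz` archive under a shared S3 prefix, from which individual models are loaded on demand. A minimal packaging sketch (the model names and file contents below are placeholders, not required artifacts):

```python
import tarfile
import tempfile
from pathlib import Path

def package_model(model_dir, out_dir, name: str) -> Path:
    """Pack a model directory into <name>.tar.gz, the per-model
    artifact format SageMaker multi-model endpoints expect."""
    archive = Path(out_dir) / f"{name}.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        # arcname="." keeps files at the archive root, where SageMaker looks
        tar.add(model_dir, arcname=".")
    return archive

if __name__ == "__main__":
    # Package two placeholder model directories standing in for the
    # image and tabular models above
    with tempfile.TemporaryDirectory() as work:
        for name in ("image-classifier", "tabular-classifier"):
            src = Path(work) / name
            src.mkdir()
            (src / "model.bin").write_bytes(b"fake-weights")
            print(package_model(src, work, name).name)
```

In real use you would then upload each archive to the endpoint's S3 prefix (e.g. with `boto3`) and reference it at invoke time via `TargetModel`.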
Test Inference
Invoke both models and measure performance:
- Cold start latency (first request)
- Warm latency (subsequent requests)
- Throughput (requests per second)
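One way to collect these three numbers is a small timing harness wrapped around whatever invoke call each platform's SDK uses. This is a generic sketch, deliberately not tied to either SDK:

```python
import statistics
import time
from typing import Callable

def measure(invoke: Callable[[], object], n: int = 20) -> dict:
    """Time n calls to `invoke`. If the endpoint was idle, the first
    call approximates cold-start latency; the remaining calls
    approximate warm latency and throughput."""
    latencies = []
    for _ in range(n):
        start = time.perf_counter()
        invoke()
        latencies.append(time.perf_counter() - start)
    warm = latencies[1:]
    return {
        "cold_s": latencies[0],
        "warm_p50_s": statistics.median(warm),
        "throughput_rps": len(warm) / sum(warm),
    }
```

Pass in a closure that performs one real inference call (e.g. a SageMaker `invoke_endpoint` or a Vertex `endpoint.predict`) so the same harness produces comparable numbers for both platforms.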
Task 2: GCP Vertex AI Multi-Model Deployment
Note: Vertex AI does not offer native multi-model endpoints the way SageMaker does. You’ll need to do one of the following:
- Deploy models to separate endpoints, or
- Use a custom prediction container that routes to multiple models, or
- Use Vertex AI Prediction with NVIDIA Triton
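The custom-container option boils down to one HTTP server that holds several models in memory and dispatches each request by model name. The serving framework (e.g. FastAPI) is omitted here; this sketch shows only the routing core, with trivial lambdas standing in for real models:

```python
from typing import Any, Callable, Dict

class ModelRouter:
    """Dispatches predictions to one of several in-memory models by
    name — the routing core of a custom multi-model prediction
    container for Vertex AI."""

    def __init__(self) -> None:
        self._models: Dict[str, Callable[[Any], Any]] = {}

    def register(self, name: str, predict_fn: Callable[[Any], Any]) -> None:
        self._models[name] = predict_fn

    def predict(self, name: str, instance: Any) -> Any:
        if name not in self._models:
            raise KeyError(f"unknown model: {name!r}")
        return self._models[name](instance)

# Placeholder predictors standing in for the image and tabular models
router = ModelRouter()
router.register("image-clf", lambda x: {"label": "cat"})
router.register("tabular-clf", lambda x: {"label": int(sum(x) > 0)})
```

In a real container, each request body would carry the model name alongside the instances, and `register` would be called at startup for each model artifact downloaded from Cloud Storage.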
Task 3: Platform Comparison Document
Create a comprehensive comparison document covering:
- Technical Comparison
- Cost Comparison
- Developer Experience
- Recommendation
| Feature | AWS SageMaker | GCP Vertex AI | Winner |
|---|---|---|---|
| Multi-model endpoints | Native support | Custom container needed | AWS |
| Deployment API | Boto3 (verbose) | Cloud SDK (cleaner) | ? |
| Monitoring | CloudWatch + Model Monitor | Cloud Monitoring | ? |
| Custom containers | Full support | Full support | Tie |
| AutoML | Built-in | Strong AutoML | ? |
| GPU support | Wide range | Good selection | ? |
| Async inference | Native support | Need Cloud Tasks | AWS |
| Framework support | All major frameworks | All major frameworks | Tie |
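To make the multi-model row concrete: on SageMaker, a single `invoke_endpoint` call selects the model via `TargetModel`, whereas on Vertex AI each `Endpoint.predict()` call goes to whatever is deployed on that endpoint. A sketch of the SageMaker side, with the runtime client passed in so the logic can be exercised offline:

```python
import json

def invoke_mme(runtime_client, endpoint_name: str,
               target_model: str, payload: dict) -> dict:
    """Invoke one model hosted on a SageMaker multi-model endpoint.
    `TargetModel` names the model .tar.gz to route to — the key API
    difference from a single-model endpoint."""
    response = runtime_client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",
        TargetModel=target_model,  # e.g. "image-classifier.tar.gz"
        Body=json.dumps(payload),
    )
    return json.loads(response["Body"].read())
```

In real use, pass `boto3.client("sagemaker-runtime")` as `runtime_client`; the endpoint and model names here are placeholders.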
Deliverables
PR1: AWS SageMaker
Code for multi-model deployment on AWS SageMaker with:
- CLI tool for endpoint management
- At least 2 different model types
- Testing scripts
- README with instructions
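One possible shape for the endpoint-management CLI (the subcommand and flag names below are suggestions, not a required interface):

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    """Skeleton argument parser for a multi-model endpoint CLI."""
    parser = argparse.ArgumentParser(prog="mme")
    sub = parser.add_subparsers(dest="command", required=True)

    deploy = sub.add_parser("deploy", help="create the multi-model endpoint")
    deploy.add_argument("--endpoint-name", required=True)
    deploy.add_argument("--model-data-prefix", required=True,
                        help="s3:// prefix holding the model .tar.gz files")

    invoke = sub.add_parser("invoke", help="call one model on the endpoint")
    invoke.add_argument("--endpoint-name", required=True)
    invoke.add_argument("--target-model", required=True)

    delete = sub.add_parser("delete", help="tear the endpoint down")
    delete.add_argument("--endpoint-name", required=True)
    return parser
```

A `main()` would then dispatch on `args.command` to the corresponding boto3 calls.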
PR2: GCP Vertex AI
Code for multi-model deployment on GCP Vertex AI with:
- Deployment scripts
- At least 2 different model types
- Testing scripts
- README with instructions
Platform Comparison
Google Doc or Markdown document with:
- Technical feature comparison
- Cost analysis for your workload
- Developer experience notes
- Recommendation with rationale
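For the cost analysis, the dominant term for always-on real-time endpoints is instance-hours. The hourly rates below are placeholders — substitute current per-region pricing — but the arithmetic shows why multi-model endpoints can be cheaper: one instance serves several models, while separate Vertex AI endpoints each need at least one machine:

```python
def monthly_endpoint_cost(hourly_rate_usd: float, instances: int,
                          hours_per_month: float = 730.0) -> float:
    """Always-on real-time endpoint cost: instances x rate x hours."""
    return hourly_rate_usd * instances * hours_per_month

# Placeholder rates (NOT current prices — look them up for your region):
sagemaker_mme = monthly_endpoint_cost(0.23, instances=1)  # 2 models, 1 shared instance
vertex_split  = monthly_endpoint_cost(0.22, instances=2)  # 2 models, 2 endpoints
```

Extend this with data transfer, storage, and any per-prediction charges for a complete comparison.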
MLOps Stack Templates
Two alternative MLOps stack designs:
- AWS SageMaker-based stack
- GCP Vertex AI-based stack
- Comparison with current implementation
Acceptance Criteria
Code Quality
- ✅ Code follows project style guide (ruff format, ruff check)
- ✅ All tests pass (pytest)
- ✅ Clear README with setup instructions
- ✅ No hardcoded credentials or account IDs
- ✅ Proper error handling and logging
Functionality
- ✅ Multi-model endpoints successfully deployed
- ✅ At least 2 models per platform
- ✅ Inference working for all models
- ✅ Performance metrics collected
- ✅ Cleanup scripts provided
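The cleanup criterion matters because real-time endpoints bill while idle. A sketch of a SageMaker cleanup helper — the client is passed in (use `boto3.client("sagemaker")` in practice) so the logic can be tested with a stub:

```python
def cleanup_sagemaker_endpoint(sm_client, endpoint_name: str) -> None:
    """Delete the endpoint, its endpoint config, and its models so no
    idle instances keep accruing charges."""
    config_name = sm_client.describe_endpoint(
        EndpointName=endpoint_name)["EndpointConfigName"]
    config = sm_client.describe_endpoint_config(
        EndpointConfigName=config_name)
    sm_client.delete_endpoint(EndpointName=endpoint_name)
    sm_client.delete_endpoint_config(EndpointConfigName=config_name)
    for variant in config["ProductionVariants"]:
        sm_client.delete_model(ModelName=variant["ModelName"])
```

The Vertex AI cleanup is analogous: undeploy each model from its endpoint, delete the endpoint, then delete the uploaded models.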
Documentation
- ✅ Platform comparison document complete
- ✅ Cost analysis with specific numbers
- ✅ Clear recommendation with rationale
- ✅ MLOps stack templates documented
- ✅ Pros and cons for each approach
Reading List
Work through these resources to build understanding:

MLOps Platforms
AWS SageMaker
- What is Amazon SageMaker?
- Train machine learning models
- Deploy models for inference
- Multi-Model Endpoints
- Create a Multi-Model Endpoint
- Multi-Model Endpoints on GPU
- SageMaker Model Monitor
- Amazon Bedrock
- SageMaker + Hugging Face
GCP Vertex AI
- Giving Vertex AI a Spin
- Vertex Pipelines AutoML Workflow
- Vertex AI Samples
- Serving with NVIDIA Triton
- Vertex AI Model Monitoring
- Monitor feature skew and drift
- Machine Learning Engineer Learning Path
Tips for Success
Common Issues:
- IAM permissions: Ensure your execution role has S3 and SageMaker access
- Model format: Follow Triton model repository structure
- Cold starts: First request will be slow (expected)
- Region consistency: Keep all resources in same region
Keep Iterating!
After completing this module, continue exploring:
- Azure Machine Learning for a third-platform perspective
- Kubernetes-based alternatives (KServe, Seldon)
- Model serving optimizations (TensorRT, ONNX)
- A/B testing and canary deployments
- Cost optimization techniques
- Multi-cloud strategies
Continue Learning
Explore other modules in the ML in Production course