Practice Exercises
Complete these hands-on exercises to build end-to-end training and inference pipelines using all three orchestration frameworks.

Prerequisites
Before starting, ensure you have:
- Kubernetes Cluster
- Environment Variables
Homework 7: Kubeflow + Airflow Pipelines
Learning Objectives
- Deploy Kubeflow Pipelines on Kubernetes
- Write training and inference DAGs for Kubeflow
- Deploy Airflow with KubernetesPodOperator
- Implement parallel training/inference pipelines
Reading List
- Kubeflow Deployment: standalone deployment guide
- KFP SDK Reference: API documentation
- KubernetesPodOperator: Airflow Kubernetes integration
- Pipeline Design Pattern: DoorDash's modular approach
Task Requirements
Both training and inference pipelines must include, at minimum:
- Training Pipeline
- Inference Pipeline
Required Steps:
- Load Training Data
- Train Model
- Save Trained Models
- Data preprocessing/augmentation
- Hyperparameter tuning
- Model evaluation on validation set
- Upload metrics to experiment tracker
- Trained model artifacts
- Training metrics logged to W&B
- Pipeline execution completes successfully
Assignments
PR1: Kubeflow Deployment README
Write a README with instructions on:
- Installing Kubeflow Pipelines on Kind cluster
- Accessing the UI via port-forward
- Configuring the Python SDK
- Verifying installation
- README is clear and reproducible
- Includes troubleshooting common issues
- Tested on a fresh cluster
PR2: Kubeflow Training Pipeline
Implement a Kubeflow training pipeline:
- Use the @dsl.component decorator
- Define typed Input/Output artifacts
- Upload model to W&B registry
- Pipeline compiles without errors
- Runs successfully in Kubeflow UI
- Produces trained model artifact
- Training metrics logged
PR3: Kubeflow Inference Pipeline
Implement a Kubeflow inference pipeline:
- Load model from W&B registry
- Run predictions on test data
- Save results as Dataset artifact
- Pipeline depends on training pipeline outputs
- Artifact lineage visible in UI
- Predictions saved correctly
PR4: Airflow Deployment README
Write a README covering:
- Installing Airflow with Kubernetes provider
- Creating PersistentVolumes for data sharing
- Launching Airflow standalone
- Accessing the web UI
- Instructions work on macOS and Linux
- Explains AIRFLOW_HOME setup
- Documents common errors
PR5: Airflow Training DAG
Implement an Airflow training DAG:
- Use KubernetesPodOperator for tasks
- Mount PersistentVolumes for data sharing
- Clean up storage before/after runs
- DAG appears in Airflow UI
- Triggers successfully via CLI or UI
- Model uploaded to registry
- Tasks run in correct sequence
Success Criteria
- 6 PRs merged with passing reviews
- All pipelines run end-to-end without errors
- Model training completes and uploads to registry
- Inference generates predictions using trained models
Homework 8: Dagster
Learning Objectives
- Implement asset-centric pipelines in Dagster
- Add data quality checks with asset checks
- Compare orchestration frameworks
- Document tradeoffs in design decisions
Reading List
- Dagster ML Pipelines: orchestrating ML workflows
- Fine-tuning LLMs: ML pipelines for LLM training
- Metaflow: alternative framework overview
- Flyte: another orchestration option
Task Requirements
- Training Pipeline
- Inference Pipeline
Required Assets:
- load_training_data: Load and preprocess data
- trained_model: Train model, return model artifact
- model_metrics: Evaluate model on validation set
- Data is not empty
- Model accuracy/metrics exceed threshold
- Training completed without errors
- All assets materialize successfully
- Asset checks pass (or fail with explanations)
- Metadata visible in Dagster UI
Assignments
Update Design Document
Add a Pipeline Orchestration section to your Google Doc comparing the three frameworks.

For Each Framework:
- Why did you choose this framework?
- What are the advantages for your use case?
- What are the limitations?
- How does it handle failures?
- What’s the learning curve?
- Ease of use
- Kubernetes integration
- Artifact tracking
- Data quality checks
- Community support
- Production readiness
PR1: Dagster Training Pipeline
Implement Dagster assets for training:
- Use the @asset decorator
- Add @asset_check for validation
- Attach metadata with context.add_output_metadata()
- Optionally use Modal for GPU execution
- Assets materialize in Dagster UI
- Asset checks run and report status
- Metadata includes samples, metrics, counts
- Model uploaded to registry
PR2: Dagster Inference Pipeline
Implement Dagster assets for inference:
- Depend on trained model asset
- Load model from registry
- Run batch predictions
- Add checks for prediction quality
- Asset lineage shows training → inference flow
- Predictions saved successfully
- Asset checks validate output
- Inference metrics logged
Success Criteria
- 2 PRs merged with passing reviews
- Pipeline section in design document
- All assets materialize without errors
- Asset checks provide useful validation
- Clear recommendation for production use
Bonus Challenges
Multi-Model Comparison
Extend pipelines to train multiple models in parallel:
- Train 3+ models with different hyperparameters
- Compare metrics in W&B
- Select best model for inference
- Implement A/B testing in inference pipeline
Advanced Scheduling
Implement complex scheduling logic:
- Retrain model weekly
- Run inference hourly
- Trigger retraining if inference drift detected
- Send Slack notifications on failures
Cost Optimization
Reduce training costs:
- Use spot instances for training
- Implement early stopping
- Cache intermediate results
- Compare costs across orchestrators
- Document savings (aim for 50%+ reduction)
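Early stopping is easy to prototype outside any orchestrator. The toy loop below (hypothetical function name, operating on a precomputed loss curve) shows the core logic: stop once the validation loss has not improved for `patience` consecutive epochs.

```python
def train_with_early_stopping(losses, patience=3):
    """Return (best_loss, epoch_stopped_at) for a sequence of epoch losses.

    stopped_at is None if training ran through all epochs without
    triggering early stopping.
    """
    best, wait, stopped_at = float("inf"), 0, None
    for epoch, loss in enumerate(losses):
        if loss < best:
            best, wait = loss, 0  # improvement: reset the counter
        else:
            wait += 1
            if wait >= patience:
                stopped_at = epoch
                break
    return best, stopped_at


# Loss stops improving after epoch 1, so training halts at epoch 4.
print(train_with_early_stopping([1.0, 0.8, 0.9, 0.95, 0.99]))  # (0.8, 4)
```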
Data Versioning
Add data versioning and lineage:
- Use DVC or Pachyderm for data versioning
- Track which data version trained each model
- Enable rollback to previous data/model versions
- Implement drift detection on training data
Additional Resources
- Why Data Scientists Shouldn't Know K8s: Chip Huyen's perspective
- MLOps Orchestration: Made With ML course
- Awesome Workflow Engines: comprehensive comparison
Tips for Success
Getting Help
If you’re stuck:
- Check the framework’s documentation
- Search GitHub issues for similar problems
- Review example DAGs/pipelines in this module
- Ask in course discussion forums
- Consult with your peers
Remember: The goal is learning, not perfection. It’s okay if your first pipelines are messy—refactor as you learn!
Submission Checklist
Homework 7 Checklist
- PR1: Kubeflow deployment README
- PR2: Kubeflow training pipeline code
- PR3: Kubeflow inference pipeline code
- PR4: Airflow deployment README
- PR5: Airflow training DAG
- PR6: Airflow inference DAG
- All PRs have passing CI checks
- All pipelines run successfully
- Models uploaded to W&B registry
- Screenshots of UI showing successful runs
Homework 8 Checklist
- PR1: Dagster training pipeline
- PR2: Dagster inference pipeline
- Design doc pipeline section completed
- Comparison table filled out
- Recommendation documented
- All asset checks implemented
- Metadata attached to assets
- Asset lineage visible in UI
What’s Next?
After completing these exercises, you’ll be ready to:
- Deploy production ML pipelines
- Choose appropriate orchestration tools for projects
- Implement data quality checks and monitoring
- Scale ML workflows on Kubernetes
- Compare and evaluate orchestration frameworks
Continue to Module 5
Move on to the next module