Kubeflow Pipelines
Kubeflow Pipelines (KFP) is a platform for building and deploying portable, scalable ML workflows based on Docker containers. It provides native support for ML artifacts, experiment tracking, and component reusability.

Why Kubeflow Pipelines?

Kubeflow Pipelines offers several advantages for ML workflows:

- Component-based architecture: Reusable, containerized pipeline components
- Native artifact tracking: Input/Output artifacts with lineage tracking
- Pipeline versioning: Track and compare different pipeline versions
- Kubernetes-native: Built for cloud-native ML workloads
- Experiment management: Organize runs into experiments
Installation & Setup
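The install commands are not reproduced here; a typical standalone setup installs the KFP SDK and applies the standalone manifests to an existing cluster. The version pin and manifest paths below are illustrative; check the KFP releases page for current values.

```shell
# Install the KFP SDK (v2)
pip install kfp

# Deploy standalone Kubeflow Pipelines to an existing cluster
# (version is illustrative; use the current release)
export PIPELINE_VERSION=2.2.0
kubectl apply -k "github.com/kubeflow/pipelines/manifests/kustomize/cluster-scoped-resources?ref=$PIPELINE_VERSION"
kubectl wait --for condition=established --timeout=60s crd/applications.app.k8s.io
kubectl apply -k "github.com/kubeflow/pipelines/manifests/kustomize/env/dev?ref=$PIPELINE_VERSION"
```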
Access UI and Storage
Forward ports to access the UI and MinIO storage; the UI is then available at http://0.0.0.0:3000.
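The port-forward commands might look like the following, assuming the standalone install's default service names (ml-pipeline-ui and minio-service in the kubeflow namespace):

```shell
# Expose the KFP UI on local port 3000
kubectl port-forward -n kubeflow svc/ml-pipeline-ui 3000:80

# Expose MinIO object storage on local port 9000
kubectl port-forward -n kubeflow svc/minio-service 9000:9000
```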
Training Pipeline

The training pipeline uses Kubeflow’s component-based architecture with typed inputs and outputs.

Pipeline Components
- Load Data Component
- Train Model Component
- Upload Model Component
- Output[Dataset]: Declares typed output artifacts
- Kubeflow automatically tracks artifact lineage
- Artifacts stored in MinIO and passed between components
Pipeline Definition
kubeflow_pipelines/kfp_training_pipeline.py
Compiling and Deploying
Inference Pipeline
The inference pipeline loads a trained model and runs predictions.

Pipeline Components
- Load Model Component
- Run Inference Component
Pipeline Definition
kubeflow_pipelines/kfp_inference_pipeline.py
Running Pipelines
Trigger Runs via UI
- Navigate to http://0.0.0.0:3000
- Go to Pipelines → Select your pipeline
- Click Create run
- Configure run parameters (if any)
- Click Start
Artifact Management
Artifact Types
Kubeflow Pipelines v2 supports typed artifacts:
| Type | Description | Use Case |
|---|---|---|
| Dataset | Tabular or structured data | CSVs, DataFrames |
| Model | ML model artifacts | Trained models |
| Artifact | Generic files | Configs, logs, metadata |
| Metrics | Evaluation metrics | Accuracy, loss |
Artifact Lineage
Kubeflow automatically tracks:
- Which component produced each artifact
- Which components consumed the artifact
- Artifact versions across pipeline runs
- Storage location in MinIO
Accessing Artifacts
Download artifacts from MinIO:
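One way to do this with the MinIO client, assuming MinIO is port-forwarded to local port 9000 and still uses the standalone install's default credentials (minio / minio123) and bucket (mlpipeline); the artifact prefix may differ in your deployment:

```shell
# Point the MinIO client at the port-forwarded service
mc alias set kfp-minio http://0.0.0.0:9000 minio minio123

# Browse pipeline artifacts
mc ls kfp-minio/mlpipeline

# Download artifacts locally (prefix is illustrative; adjust as needed)
mc cp --recursive kfp-minio/mlpipeline/v2/artifacts/ ./artifacts/
```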
Best Practices
Component Design
- Keep components focused and single-purpose
- Use typed inputs/outputs for clarity
- Document component parameters
- Make components reusable across pipelines
Pipeline Versioning
- Upload new versions instead of overwriting
- Tag pipeline versions semantically
- Test pipelines in separate experiments
- Document breaking changes between versions
Resource Management
- Set resource limits on components
- Use node selectors for GPU workloads
- Enable autoscaling for variable workloads
- Monitor MinIO storage usage
Artifact Storage
- Use appropriate artifact types
- Compress large artifacts (models, datasets)
- Clean up old experiments periodically
- Back up MinIO for production
Troubleshooting
Pipeline Upload Fails
If pipeline upload fails, check that the package compiles cleanly and that the KFP API server is reachable:
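Two quick checks, assuming the kfp v2 CLI and the standalone install (the .py path is the file referenced earlier):

```shell
# Recompile to catch definition errors before uploading
kfp dsl compile --py kubeflow_pipelines/kfp_training_pipeline.py --output training_pipeline.yaml

# Confirm the KFP API server pod is healthy
kubectl get pods -n kubeflow | grep ml-pipeline
```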
Component Execution Errors
Debug component failures:
- Click failed component in UI
- View Logs tab for error messages
- Check Input/Output for artifact issues
- Verify base image has required dependencies
- Test component locally:
Artifact Not Found
If artifacts aren’t passed between components:
- Verify component output names match pipeline inputs
- Check MinIO is running: kubectl get pods -n kubeflow | grep minio
- Ensure components write to the .path attribute
- Verify network policies allow pod communication
Additional Resources
- Kubeflow Pipelines Documentation
- KFP SDK Reference
- Artifact Management
- Vertex AI Pipelines (managed KFP)
Next Steps
Explore Dagster
Learn asset-centric orchestration with built-in data quality checks