Installation
- kfp>=2.6.0 - Kubeflow Pipelines SDK v2
- kfp-kubernetes>=1.1.0 - Kubernetes-specific KFP extensions
Available Components
The Kubeflow integration provides:
- Kubeflow Orchestrator: Execute complete pipelines using Kubeflow Pipelines on Kubernetes
Kubeflow Orchestrator
The Kubeflow orchestrator compiles ZenML pipelines into KFP format and executes them on a Kubeflow Pipelines deployment.
Prerequisites
Before using the Kubeflow orchestrator:
- Kubernetes cluster with Kubeflow Pipelines installed
- kubectl access configured
- Container registry accessible from the cluster
- Artifact store accessible from the cluster (S3, GCS, etc.)
Installing Kubeflow Pipelines
Standalone KFP (Recommended):
Configuration
- None (uses defaults if running from within the cluster)
- kubernetes_context - kubectl context name (defaults to current context)
- kubernetes_namespace - Namespace for KFP (default: kubeflow)
- kubeflow_hostname - KFP API endpoint URL
- synchronous - Wait for pipeline completion (default: True)
- skip_local_validations - Skip kubectl checks (default: False)
- skip_ui_daemon_provisioning - Don't start local UI proxy (default: False)
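The options above can be gathered in one place before the orchestrator is registered. The following is a minimal sketch, assuming the option names and defaults listed above; `KubeflowOrchestratorOptions` and `to_register_command` are hypothetical helpers, not part of ZenML, and the generated `zenml orchestrator register ... --flavor=kubeflow` arguments should be checked against your installed ZenML version.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class KubeflowOrchestratorOptions:
    """Hypothetical container mirroring the documented options and defaults."""

    kubernetes_context: Optional[str] = None  # None means the current kubectl context
    kubernetes_namespace: str = "kubeflow"
    kubeflow_hostname: Optional[str] = None  # KFP API endpoint URL
    synchronous: bool = True
    skip_local_validations: bool = False
    skip_ui_daemon_provisioning: bool = False


def to_register_command(name: str, options: KubeflowOrchestratorOptions) -> List[str]:
    """Build CLI arguments, emitting only options that differ from the defaults."""
    defaults = KubeflowOrchestratorOptions()
    command = ["zenml", "orchestrator", "register", name, "--flavor=kubeflow"]
    for option in fields(options):
        value = getattr(options, option.name)
        if value != getattr(defaults, option.name):
            command.append(f"--{option.name}={value}")
    return command


if __name__ == "__main__":
    # "my-cluster" is a placeholder kubectl context name.
    opts = KubeflowOrchestratorOptions(kubernetes_context="my-cluster", synchronous=False)
    print(" ".join(to_register_command("kubeflow_orchestrator", opts)))
```

Emitting only non-default options keeps the command short and makes it obvious which settings deviate from the defaults.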
Access Patterns
Local Access (Port Forwarding):
Step-Level Pod Configuration
Customize Kubernetes Pods for individual steps using KubernetesPodSettings:
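As a rough sketch of what such a customization can look like, the helper below builds a plain dictionary shaped like pod settings and nests it under a step's orchestrator settings. The field names `node_selectors` and `tolerations` are assumed to match KubernetesPodSettings and should be verified against your installed version; `build_pod_settings`, the `pool: high-memory` label, and the `dedicated=ml` taint are hypothetical.

```python
from typing import Any, Dict, List, Optional


def build_pod_settings(
    node_selectors: Optional[Dict[str, str]] = None,
    tolerations: Optional[List[Dict[str, str]]] = None,
) -> Dict[str, Any]:
    """Return a dict shaped like pod settings (field names are assumptions)."""
    return {
        # Node labels the Pod must match to be scheduled.
        "node_selectors": dict(node_selectors or {}),
        # Taints the Pod is allowed to tolerate.
        "tolerations": list(tolerations or []),
    }


# Hypothetical values: a node label and a toleration for a matching taint.
step_settings = {
    "orchestrator": {
        "pod_settings": build_pod_settings(
            node_selectors={"pool": "high-memory"},
            tolerations=[
                {
                    "key": "dedicated",
                    "operator": "Equal",
                    "value": "ml",
                    "effect": "NoSchedule",
                }
            ],
        )
    }
}

print(step_settings)
```

A dictionary like `step_settings` would be attached to a single step's settings, so only that step's Pod is affected and the rest of the pipeline keeps the orchestrator defaults.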
Pipeline Caching
Kubeflow Pipelines supports execution caching:
- Caches at the step level based on inputs and code
- Cached results are reused across pipeline runs
- Cache is stored in the KFP backend
- Disable caching for non-deterministic steps
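To make the idea of caching "based on inputs and code" concrete, here is an illustrative sketch that derives a cache key from a step's source code and its inputs. It is not how the KFP backend computes its keys; `cache_key` and the example step sources are hypothetical, and inputs are assumed to be JSON-serializable.

```python
import hashlib
import json
from typing import Any, Dict


def cache_key(step_source: str, inputs: Dict[str, Any]) -> str:
    """Hash the step's source code together with its inputs."""
    payload = json.dumps({"code": step_source, "inputs": inputs}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


SOURCE_V1 = "def normalize(x):\n    return x / 255"
SOURCE_V2 = "def normalize(x):\n    return x / 256"

first = cache_key(SOURCE_V1, {"x": 10})
repeat = cache_key(SOURCE_V1, {"x": 10})
new_input = cache_key(SOURCE_V1, {"x": 11})
new_code = cache_key(SOURCE_V2, {"x": 10})

print(first == repeat)     # same code and inputs: a cached result could be reused
print(first == new_input)  # changed input: the step would have to rerun
print(first == new_code)   # changed code: the step would have to rerun
```

A step that reads the clock, calls an external service, or draws random numbers can produce different outputs for the same key, which is why caching is best disabled for non-deterministic steps.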
Resource Management
CPU and Memory:
Volume Mounts
Persistent Volumes:
Complete Stack Example
Authentication
Service Account Setup
Create a Kubernetes service account for pipelines:
UI Access
Access the Kubeflow Pipelines UI to monitor runs:
Port Forwarding:
Best Practices
Use Minimal Docker Images
Reduce pull times with slim images:
Set Resource Limits
Always set resource limits to prevent resource exhaustion:
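A minimal sketch of a resource specification that always carries both requests and limits, written with standard Kubernetes quantity strings. `resource_spec` is a hypothetical helper, and the resulting dictionary is only assumed to fit the `resources` field of pod settings.

```python
from typing import Dict, Optional


def resource_spec(
    cpu_request: str,
    memory_request: str,
    cpu_limit: Optional[str] = None,
    memory_limit: Optional[str] = None,
) -> Dict[str, Dict[str, str]]:
    """Build requests and limits; a missing limit falls back to the request."""
    return {
        "requests": {"cpu": cpu_request, "memory": memory_request},
        "limits": {
            "cpu": cpu_limit or cpu_request,
            "memory": memory_limit or memory_request,
        },
    }


# Example values only: request half a core and 1 GiB, cap at 2 cores and 4 GiB.
spec = resource_spec("500m", "1Gi", cpu_limit="2", memory_limit="4Gi")
print(spec)
```

Falling back to the request when no limit is given means a step can never be submitted without an upper bound.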
Use Node Affinity for GPU Jobs
Ensure GPU jobs land on GPU nodes:
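One way to express this is a required node affinity rule in the standard Kubernetes affinity structure. The sketch below builds that structure as a plain dictionary; `gpu_node_affinity` is a hypothetical helper, and the label key `node.example.com/gpu` is a placeholder for whatever label your cluster puts on GPU nodes.

```python
from typing import Any, Dict, List


def gpu_node_affinity(label_key: str, label_values: List[str]) -> Dict[str, Any]:
    """Require scheduling onto nodes whose label matches one of the values."""
    return {
        "nodeAffinity": {
            # "required" rules are hard constraints: no matching node, no scheduling.
            "requiredDuringSchedulingIgnoredDuringExecution": {
                "nodeSelectorTerms": [
                    {
                        "matchExpressions": [
                            {
                                "key": label_key,
                                "operator": "In",
                                "values": list(label_values),
                            }
                        ]
                    }
                ]
            }
        }
    }


# Placeholder label; replace it with the label used on your GPU node pool.
affinity = gpu_node_affinity("node.example.com/gpu", ["true"])
print(affinity)
```

Because the rule is a hard requirement, a GPU step stays Pending rather than running on a CPU-only node when no GPU node is available.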
Enable Pipeline Caching Selectively
Disable caching for non-deterministic steps:
Troubleshooting
Pipeline Compilation Fails
If pipeline compilation errors occur:
- Check KFP version compatibility (kfp>=2.6.0)
- Verify all steps have proper type hints
- Ensure materializers exist for custom types
- Check ZenML version matches integration version
Pods Stuck in Pending
If pods don’t start:
- Check node resources: kubectl describe nodes
- View pod events: kubectl describe pod -n kubeflow
- Verify image pull secrets are configured
- Check resource requests vs. available capacity
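The last check can be sketched as a comparison between a step's resource requests and a node's allocatable capacity. This is a simplified illustration: `parse_quantity` handles only the common Kubernetes quantity suffixes, `unschedulable_resources` is a hypothetical helper, the allocatable figures are made-up values of the kind reported by `kubectl describe nodes`, and resources already claimed by other Pods are ignored.

```python
from decimal import Decimal
from typing import Dict, List

# Common Kubernetes quantity suffixes (a simplified subset).
_SUFFIXES = {
    "Ki": Decimal(2) ** 10,
    "Mi": Decimal(2) ** 20,
    "Gi": Decimal(2) ** 30,
    "Ti": Decimal(2) ** 40,
    "k": Decimal(10) ** 3,
    "M": Decimal(10) ** 6,
    "G": Decimal(10) ** 9,
    "T": Decimal(10) ** 12,
    "m": Decimal("0.001"),  # milli-units, e.g. "500m" CPU is half a core
}


def parse_quantity(quantity: str) -> Decimal:
    """Convert a quantity such as '500m' or '4Gi' into a plain number."""
    quantity = quantity.strip()
    # Try two-character suffixes before one-character ones.
    for suffix in sorted(_SUFFIXES, key=len, reverse=True):
        if quantity.endswith(suffix):
            return Decimal(quantity[: -len(suffix)]) * _SUFFIXES[suffix]
    return Decimal(quantity)


def unschedulable_resources(
    requests: Dict[str, str], allocatable: Dict[str, str]
) -> List[str]:
    """List the resources whose request exceeds the node's allocatable amount."""
    return [
        name
        for name, requested in requests.items()
        if parse_quantity(requested) > parse_quantity(allocatable.get(name, "0"))
    ]


# Made-up node capacity for illustration.
node = {"cpu": "3920m", "memory": "15Gi"}
print(unschedulable_resources({"cpu": "500m", "memory": "1Gi"}, node))
print(unschedulable_resources({"cpu": "4", "memory": "16Gi"}, node))
```

If a request exceeds what every node in the cluster can offer, the Pod stays Pending until the request is lowered or larger nodes are added.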
Cannot Connect to KFP API
If the orchestrator can't reach KFP:
- Verify port forwarding is active
- Check kubeflow_hostname URL is correct
- Ensure firewall rules allow access
- Test with curl $KUBEFLOW_HOSTNAME
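The same connectivity test can be scripted. The sketch below uses only the Python standard library and treats any HTTP response, including an error status, as proof that the endpoint is reachable. `is_reachable` is a hypothetical helper, and the fallback address `http://localhost:8080` is an assumption about a local port forward, not a value taken from this page.

```python
import os
import urllib.error
import urllib.request


def is_reachable(url: str, timeout: float = 5.0) -> bool:
    """Return True if the endpoint answers an HTTP request at all."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server responded, even if with an error status code.
        return True
    except (urllib.error.URLError, ValueError, OSError):
        # DNS failure, refused connection, timeout, or malformed URL.
        return False


if __name__ == "__main__":
    # The fallback address is an assumption; set KUBEFLOW_HOSTNAME to your endpoint.
    host = os.environ.get("KUBEFLOW_HOSTNAME", "http://localhost:8080")
    print(host, "is reachable" if is_reachable(host) else "is unreachable")
```

A result of unreachable points at networking (port forwarding, firewall, DNS) rather than at the pipeline itself.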
Artifact Loading Fails
If steps can’t load artifacts:
- Ensure artifact store is accessible from cluster
- Check service account has storage permissions
- Verify network policies allow egress
- For cloud storage, check credentials are mounted
Differences from Kubernetes Orchestrator
Kubeflow vs. native Kubernetes orchestrator:

| Feature | Kubeflow | Kubernetes |
|---|---|---|
| UI | KFP dashboard | None (kubectl only) |
| Pipeline DAG visualization | ✓ | ✗ |
| Caching | Built-in | Manual |
| Execution engine | Argo Workflows | Direct Jobs |
| Scheduling | Advanced | Basic |
| Monitoring | Extensive | Basic |
| Setup complexity | Higher | Lower |
Next Steps
- Kubernetes Integration: Compare with the native Kubernetes orchestrator
- Container Registries: Configure image registries
- Remote Execution: Production deployment patterns
- Kubeflow Docs: Official Kubeflow Pipelines documentation
