The @kubernetes decorator specifies that a step should execute on Kubernetes.
Basic Usage
Description
The @kubernetes decorator runs a step in a container on a Kubernetes cluster, which provides container orchestration and resource management. Because Kubernetes runs on-premises as well as on the major cloud providers, this gives flexible deployment options.
Prerequisites
- A configured Kubernetes cluster
- A configured cloud datastore (--datastore=s3, --datastore=azure, or --datastore=gs)
- The Kubernetes Python package, installed with:
pip install kubernetes
Parameters
- cpu: Number of CPUs required for this step. If @resources is also present, the maximum value from all decorators is used.
- memory: Memory size (in MB) required for this step. If @resources is also present, the maximum value from all decorators is used.
- disk: Disk size (in MB) required for this step. If @resources is also present, the maximum value from all decorators is used.
- gpu: Number of GPUs required for this step. A value of 0 implies that the scheduled node should not have GPUs.
- image: Docker image to use when launching on Kubernetes. If not specified and METAFLOW_KUBERNETES_CONTAINER_IMAGE is set, that image is used. Otherwise, defaults to a Python image matching your Python version.
- namespace: Kubernetes namespace to use when launching the pod.
- service_account: Kubernetes service account to use when launching the pod.
- secrets: Kubernetes secrets to use when launching the pod. These are in addition to the secrets defined in METAFLOW_KUBERNETES_SECRETS.
- node_selector: Kubernetes node selector(s) to apply to the pod. Can be passed as a comma-separated string, such as 'kubernetes.io/os=linux,kubernetes.io/arch=amd64', or as a dictionary, such as {'kubernetes.io/os': 'linux', 'kubernetes.io/arch': 'amd64'}.
- tolerations: Kubernetes tolerations to use when launching the pod. Defaults to the value of METAFLOW_KUBERNETES_TOLERATIONS.
- labels: Kubernetes labels to apply to the pod.
- annotations: Kubernetes annotations to apply to the pod.
- gpu_vendor: Vendor of the GPUs to be used for this step (e.g., 'nvidia', 'amd').
- image_pull_policy: The imagePullPolicy to apply to the step's Docker image (Always, IfNotPresent, or Never).
- image_pull_secrets: Kubernetes image pull secrets to use when pulling container images. Defaults to the value of METAFLOW_KUBERNETES_IMAGE_PULL_SECRETS.
- persistent_volume_claims: A map of persistent volumes to mount to the pod, from persistent volume claim names to mount paths, e.g., {'pvc-name': '/path/to/mount/on'}.
- shared_memory: Shared memory size (in MiB) required for this step.
- use_tmpfs: Enable an explicit tmpfs mount for this step.
- tmpfs_tempdir: If enabled, sets METAFLOW_TEMPDIR to tmpfs_path.
- tmpfs_size: Size (in MiB) of the tmpfs mount for this step. Defaults to 50% of the allocated memory.
- tmpfs_path: Path to the tmpfs mount for this step.
- port: Port number to specify in the Kubernetes job object.
- compute_pool: Compute pool to use for this step. If not specified, any accessible compute pool within the perimeter is used.
- qos: Quality of Service class to assign to the pod. Supported values: Guaranteed, Burstable, BestEffort.
- security_context: Container security context, applied to the task container. Supported keys: privileged (bool), allow_privilege_escalation (bool), run_as_user (int), run_as_group (int), run_as_non_root (bool).
- hostname_resolution_timeout: Timeout in seconds for worker tasks to resolve the hostname of the control task. Only applicable when @parallel is used.
Examples
Basic Kubernetes Execution
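A step runs on Kubernetes when @kubernetes is placed above @step, optionally with resources, e.g. @kubernetes(cpu=2, memory=8192). If the same step also carries @resources, the larger value of each of cpu, memory, and disk is used. The minimal sketch below models only that merge rule with plain dictionaries; merge_resources is a hypothetical helper, not Metaflow's internal code.

```python
from typing import Dict, Mapping

RESOURCE_KEYS = ("cpu", "memory", "disk")


def merge_resources(*decorator_attrs: Mapping[str, object]) -> Dict[str, float]:
    """Take the maximum of each resource across all decorators that set it.

    Hypothetical helper illustrating the documented rule: when both
    @resources and @kubernetes specify cpu, memory, or disk, the larger wins.
    """
    merged: Dict[str, float] = {}
    for attrs in decorator_attrs:
        for key in RESOURCE_KEYS:
            value = attrs.get(key)
            if value is None:
                continue  # this decorator does not set the resource
            value = float(value)  # accept ints or numeric strings
            merged[key] = max(merged.get(key, value), value)
    return merged


kubernetes_attrs = {"cpu": 2, "memory": 8192}
resources_attrs = {"cpu": 4, "memory": 4096, "disk": 20480}
print(merge_resources(kubernetes_attrs, resources_attrs))
```

Here cpu comes from @resources and memory from @kubernetes, because each resource is compared independently.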
GPU-Accelerated Step
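GPU steps request devices through gpu and gpu_vendor, e.g. @kubernetes(gpu=1, gpu_vendor='nvidia'). On Kubernetes, GPUs are exposed by vendor device plugins as extended resources such as nvidia.com/gpu and amd.com/gpu, which pods request through container limits. The sketch below builds such a limits entry, assuming the '<vendor>.com/gpu' naming and a default vendor of nvidia; gpu_limits is hypothetical, and the manifest Metaflow actually generates may differ.

```python
from typing import Dict, Optional

# Assumption: only the two vendors named in the parameter description.
SUPPORTED_GPU_VENDORS = {"nvidia", "amd"}


def gpu_limits(gpu: Optional[int], gpu_vendor: str = "nvidia") -> Dict[str, str]:
    """Build a container 'limits' entry for the requested GPUs (sketch).

    Assumes the '<vendor>.com/gpu' extended-resource naming used by the
    NVIDIA and AMD device plugins. Returns {} when no GPUs are requested.
    """
    vendor = gpu_vendor.lower()
    if vendor not in SUPPORTED_GPU_VENDORS:
        raise ValueError(f"unsupported gpu_vendor: {gpu_vendor!r}")
    if gpu is None or int(gpu) == 0:
        return {}  # no GPU limit is added for unset or zero GPUs
    if int(gpu) < 0:
        raise ValueError("gpu must be non-negative")
    # Kubernetes quantities are conventionally written as strings.
    return {f"{vendor}.com/gpu": str(int(gpu))}


print(gpu_limits(2))
print(gpu_limits(1, "amd"))
```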
Node Selection
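The two forms accepted by node_selector describe the same mapping, e.g. @kubernetes(node_selector='kubernetes.io/os=linux,kubernetes.io/arch=amd64'). Kubernetes then schedules the pod only on nodes that carry every listed label. The sketch below shows the equivalence by normalizing either form into a dictionary; parse_node_selector is hypothetical and does not reproduce Metaflow's own parsing or error messages.

```python
from typing import Dict


def parse_node_selector(selector) -> Dict[str, str]:
    """Normalize a node selector given as 'k=v,k=v', a dict, or None (sketch)."""
    if selector is None:
        return {}
    if not isinstance(selector, str):
        # Dictionary form: copy it, coercing keys and values to strings.
        return {str(k): str(v) for k, v in dict(selector).items()}
    result: Dict[str, str] = {}
    for pair in selector.split(","):
        pair = pair.strip()
        if not pair:
            continue  # tolerate trailing or doubled commas
        key, sep, value = pair.partition("=")
        if not sep or not key.strip():
            raise ValueError(f"expected key=value, got {pair!r}")
        result[key.strip()] = value.strip()
    return result


as_string = parse_node_selector("kubernetes.io/os=linux,kubernetes.io/arch=amd64")
as_dict = parse_node_selector({"kubernetes.io/os": "linux", "kubernetes.io/arch": "amd64"})
print(as_string == as_dict)
```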
With Tolerations
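Tolerations let the pod land on tainted nodes, such as a dedicated GPU pool. Assuming they are passed as a list of dictionaries in the Kubernetes Toleration format, e.g. @kubernetes(tolerations=[{'key': 'nvidia.com/gpu', 'operator': 'Exists', 'effect': 'NoSchedule'}]), each entry must follow the Kubernetes rules: operator is Equal (the default) or Exists, Exists must not set a value, Equal needs a key, and an empty effect matches all effects. The sketch checks those rules before launch; validate_tolerations is a hypothetical helper.

```python
from typing import List, Mapping

VALID_OPERATORS = {"Equal", "Exists"}
VALID_EFFECTS = {"", "NoSchedule", "PreferNoSchedule", "NoExecute"}


def validate_tolerations(tolerations: List[Mapping[str, str]]) -> List[dict]:
    """Check toleration dicts against the Kubernetes Toleration rules (sketch)."""
    checked = []
    for tol in tolerations:
        operator = tol.get("operator", "Equal")  # Kubernetes default operator
        effect = tol.get("effect", "")           # empty effect matches all effects
        if operator not in VALID_OPERATORS:
            raise ValueError(f"invalid operator: {operator!r}")
        if effect not in VALID_EFFECTS:
            raise ValueError(f"invalid effect: {effect!r}")
        if operator == "Exists" and tol.get("value"):
            raise ValueError("'Exists' tolerations must not set a value")
        if operator == "Equal" and not tol.get("key"):
            raise ValueError("'Equal' tolerations need a key")
        # Make the operator explicit in the returned copy.
        checked.append({**tol, "operator": operator})
    return checked


print(validate_tolerations([
    {"key": "dedicated", "value": "ml", "effect": "NoSchedule"},
    {"key": "nvidia.com/gpu", "operator": "Exists", "effect": "NoSchedule"},
]))
```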
Custom Docker Image
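A custom image is passed directly, e.g. @kubernetes(image='registry.example.com/team/train:1.0'), where the image name is hypothetical; the image needs Python and the libraries the step imports. The sketch below encodes the documented precedence: an explicit image, then METAFLOW_KUBERNETES_CONTAINER_IMAGE, then a Python image matching the local interpreter. The python:<major>.<minor> tag format of the fallback is an assumption, and resolve_image is hypothetical.

```python
import os
import sys
from typing import Mapping, Optional


def resolve_image(image: Optional[str] = None,
                  environ: Mapping[str, str] = os.environ) -> str:
    """Pick the container image using the documented precedence (sketch)."""
    if image:
        return image  # explicit @kubernetes(image=...) wins
    configured = environ.get("METAFLOW_KUBERNETES_CONTAINER_IMAGE")
    if configured:
        return configured
    # Assumed fallback format: the official python:<major>.<minor> image.
    return "python:{}.{}".format(*sys.version_info[:2])


print(resolve_image("registry.example.com/team/train:1.0", {}))
print(resolve_image(None, {}))
```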
With Persistent Volumes
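Volumes are declared as a claim-to-path map, e.g. @kubernetes(persistent_volume_claims={'training-data': '/mnt/data'}), where training-data is a hypothetical PVC that must already exist in the pod's namespace. In Kubernetes terms, each entry becomes a pod volume backed by the claim plus a volumeMount in the container. The sketch shows that translation, assuming the volume is simply named after the claim; pvc_volumes is hypothetical.

```python
from typing import Dict, List, Tuple


def pvc_volumes(claims: Dict[str, str]) -> Tuple[List[dict], List[dict]]:
    """Turn {'pvc-name': '/mount/path'} into pod volumes and volumeMounts (sketch)."""
    volumes, mounts = [], []
    for claim_name, mount_path in claims.items():
        if not mount_path.startswith("/"):
            raise ValueError(f"mount path for {claim_name!r} must be absolute")
        # Assumption: reuse the claim name as the volume name.
        volumes.append({"name": claim_name,
                        "persistentVolumeClaim": {"claimName": claim_name}})
        mounts.append({"name": claim_name, "mountPath": mount_path})
    return volumes, mounts


vols, mnts = pvc_volumes({"training-data": "/mnt/data"})
print(vols)
print(mnts)
```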
With Labels and Annotations
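Labels and annotations are given as dictionaries, e.g. @kubernetes(labels={'team': 'ml-platform'}, annotations={'owner': 'Jane Doe'}) with hypothetical values. Kubernetes restricts label names and values (at most 63 characters, alphanumerics at both ends, with -, _ and . in between), while annotation values may be arbitrary strings, so free text belongs in annotations. The sketch checks labels against those rules before launch; invalid_labels is hypothetical and does not validate the optional key prefix.

```python
import re
from typing import List, Mapping

# Kubernetes syntax for label names and values: alphanumerics at both
# ends, with '-', '_' or '.' allowed in between (empty allowed for values).
_NAME_RE = re.compile(r"([A-Za-z0-9]([-A-Za-z0-9_.]*[A-Za-z0-9])?)?")


def invalid_labels(labels: Mapping[str, object]) -> List[str]:
    """Return the keys whose label name or value breaks Kubernetes syntax (sketch)."""
    bad = []
    for key, value in labels.items():
        name = key.rsplit("/", 1)[-1]  # drop an optional 'prefix/'
        value = str(value)
        name_ok = 0 < len(name) <= 63 and _NAME_RE.fullmatch(name)
        value_ok = len(value) <= 63 and _NAME_RE.fullmatch(value)
        if not (name_ok and value_ok):
            bad.append(key)
    return bad


print(invalid_labels({"team": "ml-platform", "owner": "Jane Doe"}))
```

The space in "Jane Doe" makes it an invalid label value, which is why that text is passed as an annotation in the decorator call above.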
Security Context
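A least-privilege step might use @kubernetes(security_context={'run_as_user': 1000, 'run_as_non_root': True, 'allow_privilege_escalation': False}), where UID 1000 is only an example. The documented keys correspond to fields of the Kubernetes container securityContext (privileged, allowPrivilegeEscalation, runAsUser, runAsGroup, runAsNonRoot). The sketch shows that correspondence and type-checks the values; to_k8s_security_context is hypothetical, since Metaflow builds the object itself.

```python
from typing import Mapping

# Documented key -> (expected Python type, Kubernetes securityContext field).
_SECURITY_FIELDS = {
    "privileged": (bool, "privileged"),
    "allow_privilege_escalation": (bool, "allowPrivilegeEscalation"),
    "run_as_user": (int, "runAsUser"),
    "run_as_group": (int, "runAsGroup"),
    "run_as_non_root": (bool, "runAsNonRoot"),
}


def to_k8s_security_context(ctx: Mapping[str, object]) -> dict:
    """Translate snake_case keys into a Kubernetes securityContext dict (sketch)."""
    manifest = {}
    for key, value in ctx.items():
        if key not in _SECURITY_FIELDS:
            raise KeyError(f"unsupported security_context key: {key!r}")
        expected_type, k8s_field = _SECURITY_FIELDS[key]
        # bool is a subclass of int, so reject True/False where an int is expected.
        wrong_type = not isinstance(value, expected_type) or (
            expected_type is int and isinstance(value, bool))
        if wrong_type:
            raise TypeError(f"{key} must be {expected_type.__name__}")
        manifest[k8s_field] = value
    return manifest


print(to_k8s_security_context(
    {"run_as_user": 1000, "run_as_non_root": True, "allow_privilege_escalation": False}))
```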
Runtime Override
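Kubernetes parameters can also be supplied at runtime, without editing the flow, through the --with option, for example: python myflow.py run --with kubernetes:cpu=4,memory=16000
Environment Variables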
When running on Kubernetes, these environment variables are available:
- METAFLOW_KUBERNETES_WORKLOAD: Indicates that the task is running on Kubernetes
- METAFLOW_KUBERNETES_POD_NAME: The pod name
- METAFLOW_KUBERNETES_POD_NAMESPACE: The pod namespace
- METAFLOW_KUBERNETES_POD_ID: The pod ID
- METAFLOW_KUBERNETES_NODE_IP: The node IP address
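Code inside a step can read these variables to find out where it is running, for example to log the pod name next to its results. The sketch below assumes only that METAFLOW_KUBERNETES_WORKLOAD is present when the task runs on Kubernetes (its exact value is not relied on); kubernetes_context is a hypothetical helper that a step could call and print.

```python
import os
from typing import Mapping, Optional

# Variable names from the list above; the dictionary keys are illustrative.
_K8S_VARS = {
    "pod_name": "METAFLOW_KUBERNETES_POD_NAME",
    "namespace": "METAFLOW_KUBERNETES_POD_NAMESPACE",
    "pod_id": "METAFLOW_KUBERNETES_POD_ID",
    "node_ip": "METAFLOW_KUBERNETES_NODE_IP",
}


def kubernetes_context(environ: Mapping[str, str] = os.environ) -> Optional[dict]:
    """Return pod details when running on Kubernetes, else None (sketch)."""
    if "METAFLOW_KUBERNETES_WORKLOAD" not in environ:
        return None  # e.g. a local run
    # Missing variables come back as None rather than raising.
    return {key: environ.get(var) for key, var in _K8S_VARS.items()}


print(kubernetes_context())
```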
Best Practices
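- Right-size resource requests: Set cpu, memory, and disk from the step's observed usage; over-requesting leaves cluster capacity idle, while under-requesting exposes the step to eviction when the node runs low on resources
- Use node selectors: Route steps to nodes with suitable hardware, such as GPU nodes or a specific CPU architecture
- Quality of Service: Use Guaranteed QoS for critical workloads, since Guaranteed pods are the least likely to be evicted under node pressure
- Persistent storage: Use PVCs for data that needs to persist across runs
- Security: Use security contexts to enforce least privilege, for example run_as_non_root=True and allow_privilege_escalation=False
- Monitoring: Add labels and annotations so pods can be filtered and attributed in monitoring and cost tools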
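Kubernetes assigns the QoS class from the containers' requests and limits: Guaranteed when cpu and memory limits are set and equal to the requests, BestEffort when neither requests nor limits are set, and Burstable otherwise. The sketch below applies those rules to a single-container pod, assuming quantities are written in the same units (so '1' and '1000m' are not treated as equal); qos_class is hypothetical and not part of Metaflow or the Kubernetes client.

```python
from typing import Mapping


def qos_class(requests: Mapping[str, str], limits: Mapping[str, str]) -> str:
    """Classify a single-container pod by Kubernetes QoS rules (cpu/memory only)."""
    resources = ("cpu", "memory")
    if all(r not in requests and r not in limits for r in resources):
        return "BestEffort"
    # Kubernetes defaults a missing request to the limit when only a limit is set.
    effective = {r: requests.get(r, limits.get(r)) for r in resources}
    if all(r in limits and effective[r] == limits[r] for r in resources):
        return "Guaranteed"
    return "Burstable"


print(qos_class({"cpu": "2", "memory": "8Gi"}, {"cpu": "2", "memory": "8Gi"}))
print(qos_class({"cpu": "1", "memory": "4Gi"}, {"cpu": "2", "memory": "8Gi"}))
```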
