Flyte submits @task(task_config=RayJobConfig(...)) tasks to the KubeRay operator, which manages RayJob and RayCluster Kubernetes resources.
Prerequisites
- A running Kubernetes cluster with Flyte installed
- helm and kubectl configured
Step 1: Install the KubeRay operator
Step 2: Enable the Ray plugin in Flyte
Create a values-ray.yaml override file:
- flyte-binary
- flyte-core
Step 3: Write a Ray task
Install the flytekit Ray plugin:
Basic Ray task
Ray task with GPU workers
TTL and cluster cleanup
By default, ttlSecondsAfterFinished controls when Ray clusters are deleted after a job completes. Set it in the plugin config globally or per-task:
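To reason about a given TTL value, the timing it implies can be sketched in plain Python. This is only an illustration of the arithmetic: the function names are hypothetical, and the actual deletion is performed by the KubeRay operator, which may act somewhat later than the computed time.

```python
from datetime import datetime, timedelta, timezone


def cleanup_eligible_at(finished_at: datetime, ttl_seconds: int) -> datetime:
    """Return the earliest time a cluster could be deleted after its job finished.

    Hypothetical helper: it models finish time + TTL and nothing else.
    """
    if ttl_seconds < 0:
        raise ValueError("ttl_seconds must be non-negative")
    return finished_at + timedelta(seconds=ttl_seconds)


def is_cleanup_due(finished_at: datetime, ttl_seconds: int, now: datetime) -> bool:
    """True once the TTL window after job completion has fully elapsed."""
    return now >= cleanup_eligible_at(finished_at, ttl_seconds)


if __name__ == "__main__":
    # Example values are made up for illustration.
    finished = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
    print(cleanup_eligible_at(finished, 3600))
```

A TTL of 0 makes the cluster eligible for deletion as soon as the job finishes, so choose a larger value if you need time to inspect logs or the Ray dashboard.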
Verify
Troubleshooting
Ray cluster stays in Pending state
Check if nodes have sufficient CPU/memory/GPU resources:
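Once you have a node's allocatable resources (for example, from the output of kubectl describe nodes) and the requests of the pending Ray pod, the comparison can be sketched in Python. This is a simplified sketch under stated assumptions: the helper names are hypothetical, only the common Kubernetes quantity suffixes are handled, and it compares against total allocatable capacity without subtracting what other pods already use.

```python
from fractions import Fraction
from typing import Dict, List

# Common Kubernetes quantity suffixes (binary, decimal, and milli).
_SUFFIXES = {
    "Ki": Fraction(2**10), "Mi": Fraction(2**20),
    "Gi": Fraction(2**30), "Ti": Fraction(2**40),
    "k": Fraction(10**3), "M": Fraction(10**6),
    "G": Fraction(10**9), "T": Fraction(10**12),
    "m": Fraction(1, 1000),
}


def parse_quantity(quantity: str) -> Fraction:
    """Convert a quantity such as '500m', '2', or '4Gi' to an exact number."""
    quantity = quantity.strip()
    # Try two-character suffixes first so 'Mi' is not mistaken for 'i'-less forms.
    for suffix in sorted(_SUFFIXES, key=len, reverse=True):
        if quantity.endswith(suffix):
            return Fraction(quantity[: -len(suffix)]) * _SUFFIXES[suffix]
    return Fraction(quantity)


def unmet_resources(requests: Dict[str, str], allocatable: Dict[str, str]) -> List[str]:
    """List the resources a node cannot satisfy; a missing resource counts as 0."""
    return [
        name
        for name, wanted in requests.items()
        if parse_quantity(wanted) > parse_quantity(allocatable.get(name, "0"))
    ]


if __name__ == "__main__":
    # Made-up example values for a GPU worker on a CPU-only node.
    pod = {"cpu": "500m", "memory": "4Gi", "nvidia.com/gpu": "1"}
    node = {"cpu": "2", "memory": "8Gi"}
    print(unmet_resources(pod, node))
```

If every node reports at least one unmet resource for the worker or head pod, the cluster will remain Pending until capacity is added or the requests are lowered.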
Ray job fails with image pull error
Ray workers use the same container image as the head node unless overridden. Ensure the image is accessible from your cluster's nodes. For private registries, configure imagePullSecrets in a default PodTemplate.
TTL not cleaning up clusters
Verify that the KubeRay operator version supports ttlSecondsAfterFinished. Upgrade to kuberay-operator >= 1.0.0:
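If you want to script the version check, a minimal Python sketch might look like the following. It assumes you have already obtained the operator's version string (for example, from its image tag); the function names are hypothetical, and the 1.0.0 threshold is the one stated above.

```python
import re
from typing import Tuple


def parse_version(tag: str) -> Tuple[int, int, int]:
    """Extract (major, minor, patch) from a tag such as 'v1.1.0' or '1.0.0'.

    Pre-release suffixes such as '-rc.0' are ignored, so '1.0.0-rc.0'
    is treated the same as '1.0.0'.
    """
    match = re.match(r"v?(\d+)\.(\d+)\.(\d+)", tag.strip())
    if match is None:
        raise ValueError(f"unrecognized version tag: {tag!r}")
    major, minor, patch = (int(part) for part in match.groups())
    return (major, minor, patch)


def supports_ttl(tag: str, minimum: Tuple[int, int, int] = (1, 0, 0)) -> bool:
    """True if the operator version is at least the minimum for TTL cleanup."""
    return parse_version(tag) >= minimum


if __name__ == "__main__":
    for tag in ("v0.6.0", "1.0.0", "v1.1.0"):
        print(tag, supports_ttl(tag))
```

Comparing integer tuples avoids the pitfall of comparing version strings lexically, where "0.10.0" would sort before "0.9.0".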