This guide covers the most common issues encountered when running Flyte. Before diving in, collect the following diagnostic information:
# Get the pod status and events
kubectl describe pod <PodName> -n <namespace>
# Get pod logs
kubectl logs <PodName> -n <namespace>
<PodName> is the node execution string shown in the Flyte UI. <namespace> corresponds to the Flyte project-domain, e.g. flytesnacks-development.
The Flyte UI shows node execution IDs like ab5mg9lzgth62h82qprp-n0-0. This is also the pod name in Kubernetes.
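The naming rule above can be turned into a small helper that prints both diagnostic commands for a given node execution. This is a minimal sketch, not part of Flyte: the function names `namespace_for` and `diagnostic_commands` are hypothetical, and it assumes the project-domain namespace mapping described above.

```python
import shlex


def namespace_for(project: str, domain: str) -> str:
    """Return the namespace for a project/domain pair.

    Assumption: the '<project>-<domain>' mapping described above.
    """
    return f"{project}-{domain}"


def diagnostic_commands(pod_name: str, namespace: str) -> list[str]:
    """Build the two kubectl commands used to collect diagnostics."""
    pod = shlex.quote(pod_name)
    ns = shlex.quote(namespace)
    return [
        f"kubectl describe pod {pod} -n {ns}",
        f"kubectl logs {pod} -n {ns}",
    ]


if __name__ == "__main__":
    ns = namespace_for("flytesnacks", "development")
    for cmd in diagnostic_commands("ab5mg9lzgth62h82qprp-n0-0", ns):
        print(cmd)
```

The pod name and namespace are shell-quoted so the printed commands can be pasted into a terminal as they are.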
Installation and sandbox issues
Cannot connect to the Docker daemon
Error:
Error: Cannot connect to the Docker daemon at unix:///var/run/docker.sock.
Is the docker daemon running?
This occurs when you run Docker Desktop (or another container runtime) instead of the native Docker Engine: the runtime exposes its socket at a path other than /var/run/docker.sock.
Fix for Docker Desktop on macOS:
sudo ln -s ~/Library/Containers/com.docker.docker/Data/docker.raw.sock /var/run/docker.sock
Fix for Docker Desktop on Linux:
sudo ln -s ~/.docker/desktop/docker.sock /var/run/docker.sock
Fix for Rancher Desktop on Linux:
sudo ln -s ~/.rd/docker.sock /var/run/docker.sock
If you are using another container runtime, link its socket to /var/run/docker.sock.
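Before and after creating the link, it can help to confirm that /var/run/docker.sock actually resolves to a Unix socket. The following is a minimal sketch using only the Python standard library; the helper name `is_unix_socket` is hypothetical.

```python
import os
import stat


def is_unix_socket(path: str) -> bool:
    """Return True if path resolves (following symlinks) to a Unix socket."""
    try:
        # os.stat follows symlinks, so a dangling link raises OSError.
        return stat.S_ISSOCK(os.stat(path).st_mode)
    except OSError:
        return False


if __name__ == "__main__":
    path = "/var/run/docker.sock"
    print(f"{path}: {'ok' if is_unix_socket(path) else 'missing or not a socket'}")
```

A result of "missing or not a socket" after linking usually means the link target does not exist, for example because the container runtime is not running.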
Insufficient CPU when starting sandbox
Error:
message: '0/1 nodes are available: 1 Insufficient cpu.
preemption: 0/1 nodes are available: 1 No preemption victims found for incoming pod.'
This is common on macOS with Docker Desktop.
Fix: Open Docker Desktop settings and increase resources to a minimum of 4 CPU cores and 3 GB RAM.
TLS certificate not trusted (x509 error)
Error:
authentication handshake failed: x509: "Kubernetes Ingress Controller Fake Certificate" certificate is not trusted
This occurs when TLS is not properly configured in a flyte-core deployment.
Fix: Enable TLS in your values.yaml:
ingress:
  host: example.com
  separateGrpcIngress: true
  separateGrpcIngressAnnotations:
    ingress.kubernetes.io/backend-protocol: "grpc"
  annotations:
    ingress.kubernetes.io/app-root: "/console"
    ingress.kubernetes.io/default-backend-redirect: "/console"
    kubernetes.io/ingress.class: haproxy
  tls:
    enabled: true
Also update your flytectl config to disable insecure mode:
admin:
  endpoint: dns:///example.com
  authType: Pkce
  insecure: false
  # Skips verification of the server certificate; appropriate only for self-signed certificates.
  insecureSkipVerify: true
Wrong SSL version (OPENSSL_internal:WRONG_VERSION_NUMBER)
Error:
OPENSSL_internal:WRONG_VERSION_NUMBER
For flyte-binary: Verify that the endpoint name in your config.yaml matches the DNS names in the SSL certificate (whether self-signed or CA-issued).
For sandbox: Verify that the FLYTECTL_CONFIG environment variable points to the correct config file:
export FLYTECTL_CONFIG=~/.flyte/config-sandbox.yaml
Execution failures
OOMKilled — container terminated with exit code 137
Error:
terminated with exit code (137). Reason [OOMKilled]
The container exceeded its memory limit.
Fix 1: For Helm deployments, update task resource defaults in your values.yaml:
inline:
  task_resources:
    defaults:
      cpu: 100m
      memory: 100Mi
      storage: 100Mi
    limits:
      memory: 1Gi
Fix 2: Override resource limits directly in your task code:
from flytekit import Resources, task

@task(limits=Resources(mem="256Mi"))
def your_task() -> None:
    ...
Fix 3: For EKS deployments, adjust limits in the inline section of eks-production.yaml. Use the most recent Helm charts.
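Exit code 137 follows the common convention of 128 plus a signal number, so it corresponds to signal 9 (SIGKILL), the signal a process receives when it is OOM-killed. The sketch below shows that arithmetic; the helper name `signal_from_exit_code` is hypothetical, and it assumes a POSIX system. An exit code of 137 alone does not prove an out-of-memory kill; the OOMKilled reason in the pod status is what confirms it.

```python
import signal
from typing import Optional


def signal_from_exit_code(exit_code: int) -> Optional[str]:
    """Return the signal name encoded in a 128+N exit code, or None."""
    if exit_code <= 128:
        return None  # ordinary exit status, not a signal
    try:
        return signal.Signals(exit_code - 128).name
    except ValueError:
        return None  # no signal with that number on this platform


if __name__ == "__main__":
    print(signal_from_exit_code(137))
```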
Task container image cannot be pulled
Error: Kubernetes cannot pull the task container image.
Fix 1: If your environment uses a network proxy, pass the proxy configuration when starting the sandbox:
flytectl demo start --env HTTP_PROXY=<your-proxy-IP>
Fix 2: Never use latest as an image tag. When the tag is latest, Kubernetes defaults the image pull policy to Always, forcing a pull on every pod start. Use a specific version tag:
from flytekit import task

@task(container_image="my-registry.example.com/my-image:v1.2.3")
def my_task() -> None:
    ...
Fix 3: If the registry requires authentication, create a Kubernetes image pull secret and configure it in your pod template.
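The tag rule in Fix 2 can be checked mechanically, for example in a pre-registration script. This is a minimal sketch under simplifying assumptions: the helper name `is_pinned` is hypothetical, and it does not implement the full image reference grammar. It treats a missing tag like latest, because Kubernetes also defaults the pull policy to Always in that case.

```python
def is_pinned(image: str) -> bool:
    """Return True if the image has an explicit non-latest tag or a digest."""
    if "@" in image:
        # Digest reference, e.g. name@sha256:<digest>
        return True
    # Only inspect the last path segment so that a registry port
    # (localhost:5000/my-image) is not mistaken for a tag.
    last_segment = image.rsplit("/", 1)[-1]
    if ":" not in last_segment:
        return False  # no tag at all
    tag = last_segment.rsplit(":", 1)[1]
    return tag not in ("", "latest")


if __name__ == "__main__":
    for ref in ("my-registry.example.com/my-image:v1.2.3", "my-image:latest"):
        print(ref, is_pinned(ref))
```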
ModuleNotFoundError in container tasks
Error:
ModuleNotFoundError: No module named 'mymodule'
Cause: The Python module is not on the container’s path.
Fix: If using a custom Docker image, ensure:
Your Dockerfile is at the same level as the flyte directory.
An empty __init__.py exists in your project folder.
Expected directory layout:
myflyteapp/
├── Dockerfile
├── docker_build_and_tag.sh
└── flyte/
    ├── __init__.py
    └── workflows/
        ├── __init__.py
        └── example.py
Spark task error: JavaPackage is not callable
Error:
FlyteScopedUserException: 'JavaPackage' object is not callable
Cause: The spark plugin is not enabled in the FlytePropeller configuration.
Fix: Add spark to the enabled-plugins list in your config YAML:
tasks:
  task-plugins:
    enabled-plugins:
      - container
      - sidecar
      - K8S-ARRAY
      - spark
    default-for-task-types:
      - container: container
      - container_array: K8S-ARRAY
Dynamic workflow: failed + succeeded + running inconsistent state
Error: An execution appears stuck or reports an inconsistent failed + succeeded + running state.
Cause: A malformed dynamic workflow was processed by FlytePropeller. This was a known bug fixed in v1.16.4.
Fix: Upgrade to Flyte v1.16.4 or later. If you cannot upgrade immediately, use RecoverExecution to resume from the last known good state:
grpcurl -plaintext \
  -d '{"id": {"project": "flytesnacks", "domain": "development", "name": "<execution-id>"}}' \
  localhost:81 flyteidl.service.AdminService/RecoverExecution
Storage and data issues
AccessDenied when writing to S3 (EKS deployment)
Error:
An error occurred (AccessDenied) when calling the PutObject operation
Cause: The Kubernetes service account that Flyte uses does not have the correct IAM role annotation for IRSA (IAM Roles for Service Accounts).
Fix 1: Verify the service account annotation:
kubectl describe sa <my-flyte-sa> -n <flyte-namespace>
The output should include:
Annotations: eks.amazonaws.com/role-arn: arn:aws:iam::<account-id>:role/flyte-system-role
Fix 2: If the annotation is missing, add it manually:
kubectl annotate serviceaccount -n <flyte-namespace> <my-flyte-sa> \
  eks.amazonaws.com/role-arn=arn:aws:iam::<account-id>:role/<flyte-iam-role>
Refer to the community-maintained Flyte the Hard Way guide for full EKS IAM configuration.
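The annotation check can also be scripted against the JSON form of the service account, for example the output of kubectl get sa <my-flyte-sa> -n <flyte-namespace> -o json. The sketch below is illustrative only: the helper name `irsa_role_arn` and the ARN pattern are assumptions, and it confirms only that the annotation exists and looks like an IAM role ARN, not that the role grants S3 access.

```python
import json
import re
from typing import Optional

IRSA_ANNOTATION = "eks.amazonaws.com/role-arn"
# Assumed shape: arn:<partition>:iam::<12-digit account id>:role/<name>
ROLE_ARN_PATTERN = re.compile(r"^arn:aws[a-z-]*:iam::\d{12}:role/.+$")


def irsa_role_arn(service_account_json: str) -> Optional[str]:
    """Return the IRSA role ARN from a ServiceAccount JSON document, or None."""
    sa = json.loads(service_account_json)
    annotations = sa.get("metadata", {}).get("annotations") or {}
    arn = annotations.get(IRSA_ANNOTATION)
    if arn and ROLE_ARN_PATTERN.match(arn):
        return arn
    return None


if __name__ == "__main__":
    import sys

    arn = irsa_role_arn(sys.stdin.read())
    print(arn if arn else "IRSA annotation missing or malformed")
```

Saved under a name of your choosing, the script reads the kubectl JSON output from standard input.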
Cannot access Minio in local sandbox
When running the local sandbox, Minio is available at the endpoint shown below. For debugging, set these environment variables when running tasks locally:
export FLYTE_AWS_ENDPOINT="http://localhost:30002"
export FLYTE_AWS_ACCESS_KEY_ID="minio"
export FLYTE_AWS_SECRET_ACCESS_KEY="miniostorage"
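When tasks are launched from a Python script or notebook rather than a shell, the same variables can be set in-process before the task runs. This is a minimal sketch: the helper name `apply_sandbox_minio_env` is hypothetical, and the values are the sandbox defaults listed above. Existing values are left untouched so that an explicit shell export still takes precedence.

```python
import os
from typing import MutableMapping, Optional

# Sandbox defaults copied from the exports above.
SANDBOX_MINIO_ENV = {
    "FLYTE_AWS_ENDPOINT": "http://localhost:30002",
    "FLYTE_AWS_ACCESS_KEY_ID": "minio",
    "FLYTE_AWS_SECRET_ACCESS_KEY": "miniostorage",
}


def apply_sandbox_minio_env(
    environ: Optional[MutableMapping[str, str]] = None,
) -> None:
    """Set the sandbox Minio variables unless they are already defined."""
    target = os.environ if environ is None else environ
    for key, value in SANDBOX_MINIO_ENV.items():
        target.setdefault(key, value)


if __name__ == "__main__":
    apply_sandbox_minio_env()
    print(os.environ["FLYTE_AWS_ENDPOINT"])
```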
Authentication issues
Unauthenticated errors in flytectl
Error: rpc error: code = Unauthenticated
Fix 1: Re-authenticate:
flytectl config init --host flyte.example.com
Fix 2: Verify your config file has the correct auth settings:
admin:
  endpoint: dns:///flyte.example.com
  authType: Pkce # or ClientSecret for service accounts
  insecure: false
Fix 3: For development/sandbox, you can disable auth entirely:
admin:
  endpoint: dns:///localhost:30080
  insecure: true
Auth config not found after demo start
After running flytectl demo start, the sandbox config is written to ~/.flyte/config-sandbox.yaml. Export it:
export FLYTECTL_CONFIG=~/.flyte/config-sandbox.yaml
Add this to your shell profile to persist it across sessions.
Getting more help
GitHub Issues: Open a bug report or feature request.
Slack Community: Get real-time help in the #ask-the-community channel.
GitHub Discussions: Ask questions or share ideas with the community.
Documentation: Browse the official Flyte documentation.