This guide covers the most common issues encountered when running Flyte. Before diving in, collect the following diagnostic information:
# Get the pod status and events
kubectl describe pod <PodName> -n <namespace>

# Get pod logs
kubectl logs <PodName> -n <namespace>
<PodName> is the node execution string shown in the Flyte UI. <namespace> corresponds to the Flyte project-domain, e.g. flytesnacks-development.
The Flyte UI shows node execution IDs like ab5mg9lzgth62h82qprp-n0-0. This is also the pod name in Kubernetes.
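As a rough illustration of the mapping described above, the sketch below builds both diagnostic commands from a project, a domain, and a node execution ID. The helper name diagnostic_commands is hypothetical, and the sketch assumes the namespace is always "<project>-<domain>", as in the flytesnacks-development example.

```python
import shlex


def diagnostic_commands(project: str, domain: str, pod_name: str) -> list[str]:
    """Build the two kubectl commands used to inspect a Flyte task pod."""
    # Assumption taken from the guide: the namespace is "<project>-<domain>".
    namespace = f"{project}-{domain}"
    pod = shlex.quote(pod_name)
    ns = shlex.quote(namespace)
    return [
        f"kubectl describe pod {pod} -n {ns}",
        f"kubectl logs {pod} -n {ns}",
    ]


for cmd in diagnostic_commands("flytesnacks", "development", "ab5mg9lzgth62h82qprp-n0-0"):
    print(cmd)
```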

Installation and sandbox issues

Error:
Error: Cannot connect to the Docker daemon at unix:///var/run/docker.sock.
Is the docker daemon running?
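This occurs when you use Docker Desktop (or another container runtime) instead of the native Docker engine on Linux. These runtimes create the daemon socket at a different path, so nothing is listening at /var/run/docker.sock.
Fix for Docker Desktop on macOS: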
sudo ln -s ~/Library/Containers/com.docker.docker/Data/docker.raw.sock /var/run/docker.sock
Fix for Docker Desktop on Linux:
sudo ln -s ~/.docker/desktop/docker.sock /var/run/docker.sock
Fix for Rancher Desktop on Linux:
sudo ln -s ~/.rd/docker.sock /var/run/docker.sock
If you are using another container runtime, link its socket to /var/run/docker.sock.
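If you are not sure which of the sockets above exists on your machine, a small check like the following may help. It is only a sketch: the candidate list contains just the three paths mentioned in this guide, and the helper name find_docker_socket is hypothetical.

```python
from pathlib import Path

# Candidate socket locations taken from the fixes above.
CANDIDATES = [
    "~/Library/Containers/com.docker.docker/Data/docker.raw.sock",  # Docker Desktop, macOS
    "~/.docker/desktop/docker.sock",  # Docker Desktop, Linux
    "~/.rd/docker.sock",  # Rancher Desktop, Linux
]


def find_docker_socket(candidates=CANDIDATES):
    """Return the first candidate that exists and is a Unix socket, else None."""
    for raw in candidates:
        path = Path(raw).expanduser()
        if path.is_socket():
            return path
    return None


found = find_docker_socket()
if found:
    print(f"sudo ln -s {found} /var/run/docker.sock")
else:
    print("none of the known socket paths exist")
```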
Error:
message: '0/1 nodes are available: 1 Insufficient cpu.
preemption: 0/1 nodes are available: 1 No preemption victims found for incoming pod.'
This is common on macOS with Docker Desktop.
Fix: Open Docker Desktop settings and increase resources to a minimum of 4 CPU cores and 3 GB RAM.
Error:
authentication handshake failed: x509: "Kubernetes Ingress Controller Fake Certificate" certificate is not trusted
This occurs when TLS is not properly configured in a flyte-core deployment.
Fix: Enable TLS in your values.yaml:
ingress:
  host: example.com
  separateGrpcIngress: true
  separateGrpcIngressAnnotations:
    ingress.kubernetes.io/backend-protocol: "grpc"
  annotations:
    ingress.kubernetes.io/app-root: "/console"
    ingress.kubernetes.io/default-backend-redirect: "/console"
    kubernetes.io/ingress.class: haproxy
  tls:
    enabled: true
Also update your flytectl config to disable insecure mode:
admin:
  endpoint: dns:///example.com
  authType: Pkce
  insecure: false
  insecureSkipVerify: true
Error:
OPENSSL_internal:WRONG_VERSION_NUMBER
For flyte-binary: Verify that the endpoint name in your config.yaml matches the DNS names in the SSL certificate (whether self-signed or CA-issued).
For sandbox: Verify that the FLYTECTL_CONFIG environment variable points to the correct config file:
export FLYTECTL_CONFIG=~/.flyte/config-sandbox.yaml

Execution failures

Error:
terminated with exit code (137). Reason [OOMKilled]
The container exceeded its memory limit.
Fix 1: For Helm deployments, update task resource defaults in your values.yaml:
inline:
  task_resources:
    defaults:
      cpu: 100m
      memory: 100Mi
      storage: 100Mi
    limits:
      memory: 1Gi
Fix 2: Override resource limits directly in your task code:
from flytekit import Resources, task

# Per-task memory limit; applies to this task only
@task(limits=Resources(mem="256Mi"))
def your_task() -> None:
    ...
Fix 3: For EKS deployments, adjust limits in the inline section of eks-production.yaml. Use the most recent Helm charts.
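When choosing a new limit, it can help to compare quantities such as 256Mi and 1Gi in bytes. The sketch below handles only the common Kubernetes memory suffixes. The helper names to_bytes and fits are hypothetical, and the sketch does not claim anything about how Flyte itself treats a task limit that exceeds the platform limit.

```python
# Binary and decimal suffixes commonly used in Kubernetes memory quantities.
_UNITS = {
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40,
    "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12,
}


def to_bytes(quantity: str) -> int:
    """Convert a quantity such as '256Mi' or '1Gi' to a number of bytes."""
    # Try two-letter suffixes first so 'Mi' is not mistaken for 'M'.
    for suffix in sorted(_UNITS, key=len, reverse=True):
        if quantity.endswith(suffix):
            return int(float(quantity[: -len(suffix)]) * _UNITS[suffix])
    return int(quantity)  # no suffix: a plain byte count


def fits(task_limit: str, platform_limit: str) -> bool:
    """True if the per-task limit does not exceed the platform limit."""
    return to_bytes(task_limit) <= to_bytes(platform_limit)


print(fits("256Mi", "1Gi"))  # True
print(fits("2Gi", "1Gi"))    # False
```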
Error: Kubernetes cannot pull the task container image.
Fix 1: If your environment uses a network proxy, pass the proxy configuration when starting the sandbox:
flytectl demo start --env HTTP_PROXY=<your-proxy-IP>
Fix 2: Never use latest as an image tag. When the tag is latest (or omitted), Kubernetes defaults the image pull policy to Always, forcing a pull on every pod start. Use a specific version tag:
from flytekit import task

# Pin the image to an explicit version tag instead of latest
@task(container_image="my-registry.example.com/my-image:v1.2.3")
def my_task() -> None:
    ...
Fix 3: If the registry requires authentication, create a Kubernetes image pull secret and configure it in your pod template.
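To spot image references that fall under Fix 2, a check along these lines could be run over the container_image values in a project. This is a simplified sketch, not a full image reference parser, and the helper name uses_mutable_tag is hypothetical.

```python
def uses_mutable_tag(image: str) -> bool:
    """True if the image reference has no tag or uses the tag 'latest'."""
    # A digest pins the image exactly, so it is not treated as mutable here.
    if "@" in image:
        return False
    # Only the last path segment can carry a tag; a colon earlier in the
    # reference belongs to a registry port such as localhost:30000.
    last_segment = image.rsplit("/", 1)[-1]
    _name, sep, tag = last_segment.partition(":")
    return not sep or tag == "latest"


print(uses_mutable_tag("my-registry.example.com/my-image:v1.2.3"))  # False
print(uses_mutable_tag("my-registry.example.com/my-image:latest"))  # True
print(uses_mutable_tag("localhost:30000/my-image"))                 # True
```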
Error:
ModuleNotFoundError: No module named 'mymodule'
Cause: The Python module is not on the container's path.
Fix: If using a custom Docker image, ensure:
  1. Your Dockerfile is at the same level as the flyte directory.
  2. An empty __init__.py exists in each package directory of your project (flyte/ and flyte/workflows/ in the layout below).
Expected directory layout:
myflyteapp/
├── Dockerfile
├── docker_build_and_tag.sh
└── flyte/
    ├── __init__.py
    └── workflows/
        ├── __init__.py
        └── example.py
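A quick way to check a layout like the one above is to list the directories that lack an __init__.py. The following is a minimal sketch; the helper name missing_init_files is hypothetical, and it assumes every directory under the package root is meant to be a Python package.

```python
from pathlib import Path


def missing_init_files(package_root: str) -> list[Path]:
    """Return the package root and subdirectories that have no __init__.py."""
    root = Path(package_root)
    subdirs = sorted(
        p for p in root.rglob("*") if p.is_dir() and p.name != "__pycache__"
    )
    return [d for d in [root, *subdirs] if not (d / "__init__.py").is_file()]
```

Calling missing_init_files("flyte") from the myflyteapp directory should return an empty list when the layout matches the one shown above.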
Error:
FlyteScopedUserException: 'JavaPackage' object is not callable
Cause: The spark plugin is not enabled in the FlytePropeller configuration.
Fix: Add spark to the enabled-plugins list in your config YAML:
tasks:
  task-plugins:
    enabled-plugins:
      - container
      - sidecar
      - K8S-ARRAY
      - spark
    default-for-task-types:
      - container: container
      - container_array: K8S-ARRAY
Error: An execution appears stuck, or reports an inconsistent state that mixes failed, succeeded, and running.
Cause: A malformed dynamic workflow was processed by FlytePropeller. This was a known bug, fixed in v1.16.4.
Fix: Upgrade to Flyte v1.16.4 or later. If you cannot upgrade immediately, use RecoverExecution to resume from the last known good state:
grpcurl -plaintext \
  -d '{"id": {"project": "flytesnacks", "domain": "development", "name": "<execution-id>"}}' \
  localhost:81 flyteidl.service.AdminService/RecoverExecution
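To decide whether a deployment still needs the upgrade, compare version numbers numerically rather than as strings, since "v1.16.10" sorts before "v1.16.4" in a plain string comparison. The sketch below assumes plain vMAJOR.MINOR.PATCH version strings without pre-release suffixes; the helper names are hypothetical.

```python
def parse_version(version: str) -> tuple[int, ...]:
    """Turn 'v1.16.4' into (1, 16, 4). Assumes purely numeric components."""
    return tuple(int(part) for part in version.lstrip("v").split("."))


# The release that this guide names as containing the fix.
FIXED_IN = parse_version("v1.16.4")


def needs_upgrade(deployed: str) -> bool:
    """True if the deployed version predates the fix."""
    return parse_version(deployed) < FIXED_IN


print(needs_upgrade("v1.16.3"))   # True
print(needs_upgrade("v1.16.10"))  # False
```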

Storage and data issues

Error:
An error occurred (AccessDenied) when calling the PutObject operation
Cause: The Kubernetes service account Flyte uses does not have the correct IAM role annotation for IRSA (IAM Roles for Service Accounts).
Fix 1: Verify the service account annotation:
kubectl describe sa <my-flyte-sa> -n <flyte-namespace>
Expected output should include:
Annotations: eks.amazonaws.com/role-arn: arn:aws:iam::<account-id>:role/flyte-system-role
Fix 2: If the annotation is missing, add it manually:
kubectl annotate serviceaccount -n <flyte-namespace> <my-flyte-sa> \
  eks.amazonaws.com/role-arn=arn:aws:iam::<account-id>:role/<flyte-iam-role>
Refer to the community-maintained Flyte the Hard Way guide for full EKS IAM configuration.
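A common slip is leaving a placeholder such as <account-id> in the annotation value. The sketch below checks that a value looks like a role ARN before you apply it. The pattern is an assumption that covers only the standard aws partition with a 12-digit account ID, and the helper name parse_role_arn is hypothetical.

```python
import re

# Assumed shape of a role ARN in the standard "aws" partition.
ROLE_ARN = re.compile(r"^arn:aws:iam::(\d{12}):role/([\w+=,.@\-/]+)$")


def parse_role_arn(annotation_value: str):
    """Return (account_id, role_name), or None if the value is not a role ARN."""
    match = ROLE_ARN.match(annotation_value.strip())
    return (match.group(1), match.group(2)) if match else None


print(parse_role_arn("arn:aws:iam::123456789012:role/flyte-system-role"))
print(parse_role_arn("arn:aws:iam::<account-id>:role/flyte-system-role"))
```

The first call returns the account ID and role name; the second returns None because the placeholder was never filled in.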
When running the local sandbox, Minio is available on localhost. For debugging, set these environment variables, which carry the sandbox endpoint and credentials, when running tasks locally:
export FLYTE_AWS_ENDPOINT="http://localhost:30002"
export FLYTE_AWS_ACCESS_KEY_ID="minio"
export FLYTE_AWS_SECRET_ACCESS_KEY="miniostorage"
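If you launch tasks from a Python script or test runner rather than a shell, the same variables can be set in-process before the task code runs. This is a minimal sketch using the values from the exports above; the helper name apply_sandbox_minio_env is hypothetical, and values that are already set are left untouched.

```python
import os

# Sandbox values copied from the export commands above.
SANDBOX_MINIO = {
    "FLYTE_AWS_ENDPOINT": "http://localhost:30002",
    "FLYTE_AWS_ACCESS_KEY_ID": "minio",
    "FLYTE_AWS_SECRET_ACCESS_KEY": "miniostorage",
}


def apply_sandbox_minio_env(env=os.environ) -> None:
    """Set the sandbox Minio variables without overriding existing values."""
    for key, value in SANDBOX_MINIO.items():
        env.setdefault(key, value)
```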

Authentication issues

Error:
rpc error: code = Unauthenticated
Fix 1: Re-authenticate:
flytectl config init --host flyte.example.com
Fix 2: Verify your config file has the correct auth settings:
admin:
  endpoint: dns:///flyte.example.com
  authType: Pkce        # or ClientSecret for service accounts
  insecure: false
Fix 3: For development/sandbox, you can disable auth entirely:
admin:
  endpoint: dns:///localhost:30080
  insecure: true
After running flytectl demo start, the sandbox config is written to ~/.flyte/config-sandbox.yaml. Export it:
export FLYTECTL_CONFIG=~/.flyte/config-sandbox.yaml
Add this to your shell profile to persist it across sessions.
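To confirm that the variable is set and points to a real file, a check like the following can be run in the same shell session. It is only a sketch: the helper name check_flytectl_config is hypothetical, and it does not validate the contents of the file.

```python
import os
from pathlib import Path


def check_flytectl_config(env=os.environ) -> str:
    """Report whether FLYTECTL_CONFIG is set and points to an existing file."""
    value = env.get("FLYTECTL_CONFIG")
    if not value:
        return "FLYTECTL_CONFIG is not set"
    path = Path(value).expanduser()
    if not path.is_file():
        return f"FLYTECTL_CONFIG points to a missing file: {path}"
    return f"using {path}"


print(check_flytectl_config())
```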

Getting more help

GitHub Issues

Open a bug report or feature request.

Slack Community

Get real-time help in the #ask-the-community channel.

GitHub Discussions

Ask questions or share ideas with the community.

Documentation

Browse the official Flyte documentation.
