How Flyte uses storage
| Data type | Storage path | Who writes it |
|---|---|---|
| Workflow metadata (launch plans, executions) | metadataContainer | FlyteAdmin |
| Task input/output literals | userDataContainer | FlyteCopilot sidecar |
| Large datasets (offloaded literals) | userDataContainer | FlytePropeller |
| Cached task outputs | userDataContainer | DataCatalog |
metadataContainer and userDataContainer can point to the same bucket. Using separate buckets allows independent lifecycle policies.
Configuring storage backends
- S3
- GCS
- Azure Blob
- MinIO
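As an illustration, here is a minimal storage block for a MinIO (S3-compatible) backend. The bucket name, endpoint, and credentials are placeholders, and the key names follow the flytestdlib storage config as commonly seen in sandbox deployments, so verify them against your Flyte release:

```yaml
storage:
  type: minio                  # "s3" for AWS S3; stow-based config covers GCS and Azure Blob
  container: my-flyte-bucket   # placeholder bucket name
  connection:
    endpoint: http://localhost:9000   # placeholder MinIO endpoint
    auth-type: accesskey
    access-key: minio                 # placeholder credentials
    secret-key: miniostorage
    region: us-east-1
    disable-ssl: true                 # typical for local MinIO only
```

The same block is shared by FlyteAdmin, FlytePropeller, and DataCatalog, which is why a single backend definition covers all the writers in the table above.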
Signed URLs
Flyte generates pre-signed URLs for FlyteConsole so that users can download task output files directly from the object store. The remoteData config controls how these URLs are created.
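A sketch of a remoteData block that signs URLs valid for a few minutes. The field names here reflect commonly seen flyteadmin configurations and should be treated as assumptions to check against your version:

```yaml
remoteData:
  region: us-east-1    # region used when signing requests
  scheme: aws          # signing scheme for the object store
  signedUrls:
    durationMinutes: 3 # how long a generated download URL stays valid
```

Shorter durations reduce the window in which a leaked URL is usable, at the cost of users occasionally needing to refresh the console to get a fresh link.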
Cache configuration
DataCatalog uses the object store to cache task output metadata. Configure the in-memory cache size to bound its memory footprint.
Download limits
To protect against unexpectedly large task outputs being pulled into FlytePropeller memory, set a download limit.
Offline / offloaded literal data
Large task inputs and outputs can be offloaded to the object store rather than stored inline in the workflow CRD. This is recommended for workflows that pass large datasets between tasks. When a literal is offloaded, FlytePropeller writes it under userDataContainer/data/ and stores a reference in the workflow CRD instead of the raw bytes.
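The cache size and download limit described above both live under the shared storage block. A sketch with flytestdlib's key names; confirm the exact keys and defaults for your release:

```yaml
storage:
  cache:
    max_size_mbs: 10       # in-memory cache for frequently read objects; 0 disables it
    target_gc_percent: 100 # Go garbage-collection target while the cache is active
  limits:
    maxDownloadMBs: 10     # refuse to pull objects larger than this into memory
```

Hitting maxDownloadMBs fails the read rather than the whole process, so size it above your largest expected inline literal but well below the memory available to FlytePropeller.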