- Data lineage — track exactly which data produced which outputs
- Memoization — cache task results and skip re-computation when inputs match
- Auto parallelization — Flyte can reason about data dependencies and schedule tasks optimally
- Simplified data access — automatic upload and download between local execution and cloud storage
- Auto-generated CLI and launch UI — the Flyte console and `pyflyte run` use type information to generate input forms
Python to Flyte type mapping
Flytekit automatically translates most Python types into Flyte types at compile time. The following table describes the full set of supported mappings:

| Python Type | Flyte Type | Conversion | Notes |
|---|---|---|---|
| int | Integer | Automatic | Use Python 3 type hints |
| float | Float | Automatic | Use Python 3 type hints |
| str | String | Automatic | Use Python 3 type hints |
| bool | Boolean | Automatic | Use Python 3 type hints |
| bytes / bytearray | Binary | Not supported | Use a custom type transformer |
| complex | N/A | Not supported | Use a custom type transformer |
| datetime.timedelta | Duration | Automatic | Use Python 3 type hints |
| datetime.datetime | Datetime | Automatic | Use Python 3 type hints |
| datetime.date | Datetime | Automatic | Use Python 3 type hints |
| typing.List[T] / list[T] | Collection[T] | Automatic | T can be any supported type |
| typing.Iterator[T] | Collection[T] | Automatic | T can be any supported type |
| typing.Dict[str, V] / dict[str, V] | Map[str, V] | Automatic | V can be any supported type, including nested dicts |
| dict | Binary | Automatic | Uses MessagePack serialization since flytekit 1.14 |
| FlyteFile / os.PathLike | File | Automatic | Defaults to binary protocol; use FlyteFile["jpg"] to specify format |
| FlyteDirectory | Directory | Automatic | Use FlyteDirectory["protocol"] to specify file format |
| @dataclass | Binary | Automatic | All fields must be annotated; uses MessagePack since flytekit 1.14 |
| np.ndarray | File | Automatic | Use np.ndarray as the type hint |
| pandas.DataFrame | StructuredDataset | Automatic | Column types are not preserved in the simple form |
| polars.DataFrame | StructuredDataset | Automatic | Column types are not preserved in the simple form |
| polars.LazyFrame | StructuredDataset | Automatic | Column types are not preserved in the simple form |
| pyspark.DataFrame | StructuredDataset | Requires flytekitplugins-spark | Use pyspark.DataFrame as the type hint |
| pydantic.BaseModel | Binary | Automatic | Pydantic v2 supported in flytekit ≥ 1.14 |
| torch.Tensor / torch.nn.Module | File | Requires torch | Derived types also supported |
| tf.keras.Model | File | Requires tensorflow | Derived types also supported |
| sklearn.base.BaseEstimator | File | Requires scikit-learn | Derived types also supported |
| User-defined types | Any | Custom transformers | FlytePickle is the default fallback |
Primitive types
The simplest types map directly. Flytekit reads the Python function signature and generates a typed interface automatically:
Collection types
Use `typing.List` (or `list`) and `typing.Dict` (or `dict`) to pass collections between tasks. Flyte maps these to Collection[T] and Map[str, V] in the IDL:
Dataclasses
When you need to pass multiple values together as a structured unit, use Python's `@dataclass` decorator. Since flytekit 1.14, dataclasses are serialized with MessagePack instead of Protobuf struct, which preserves integer types without conversion to float:
If you are using flytekit < 1.11.1, add from dataclasses_json import dataclass_json to your imports and decorate your dataclass with @dataclass_json.
Enum types
Limit acceptable values to a predefined set using Python's `enum.Enum`. Flytekit constrains task inputs and outputs to the declared values:
How Flyte represents types internally
Flyte's type system is defined in FlyteIDL, the Protobuf-based interface definition language shared by all Flyte components. The core `LiteralType` message covers the full type hierarchy:
- Primitive types (`SimpleType`): `INTEGER`, `FLOAT`, `STRING`, `BOOLEAN`, `DATETIME`, `DURATION`, `BINARY`
- Blob types (`BlobType`): single-part (`SINGLE`) for `FlyteFile`, multi-part (`MULTIPART`) for `FlyteDirectory`
- StructuredDataset (`StructuredDatasetType`): columns with names and types, plus storage format
- Collection types: homogeneous lists via `collection_type`
- Map types: string-keyed maps via `map_value_type`
- Enum types (`EnumType`): predefined string values
- Union types (`UnionType`): tagged unions across multiple `LiteralType` variants
Literal values can also carry a hash field, which DataCatalog uses for memoization: if the hash of a task's inputs matches a previously cached execution, Flyte skips re-running the task entirely.
Explore the type system in depth
FlyteFile
Manage individual remote files as typed task inputs and outputs. Flyte handles upload, download, and blob storage automatically.
FlyteDirectory
Work with entire directories of files — useful for model checkpoints, dataset splits, and multi-file artifacts.
StructuredDataset
Abstract tabular type that converts between pandas, Spark, Arrow, and other dataframe libraries with optional column-level type checking.
Dataclass
Combine multiple values into a single typed structure for passing complex configurations between tasks.