flytekit 1.12.0, the default map_task implementation uses ArrayNode, which provides full Flyte functionality for subtask executions including caching, checkpointing, and workflow recovery.
When to use map tasks
Batch processing
Apply the same transformation to every item in a dataset without writing per-item workflow nodes.
Hyperparameter search
Train a model with each combination of hyperparameters concurrently.
Concurrent data loading
Download or preprocess multiple data shards in parallel.
Basic map task
Import the required library and define a mappable task:map_task accepts a single @task and returns a callable that takes a list as input and returns a list of results.
Controlling concurrency and fault tolerance
You can configureconcurrency and min_success_ratio on a map task:
concurrency— maximum number of subtasks running in parallel at any time. If the input list is larger than this value, Flyte runs multiple batches serially. Defaults to unbounded concurrency.min_success_ratio— minimum fraction of subtasks that must succeed before the map task is marked as successful. Allows tolerating a small number of failures.
When
min_success_ratio is set, the return type of the list must be Optional. Because failures are tolerated, some positions in the output list may not be populated.Overriding resources per subtask
Usewith_overrides to customize resource requests for individual subtask executions:
Setting task metadata
UseTaskMetadata to configure caching, retries, and other properties on the map task:
Mapping over multiple inputs
Partial binding with functools.partial
A map task accepts only one varying input. To map over one input while keeping others fixed, use functools.partial:
Binding task outputs to partials
You can also bind the output of an upstream task to a partial:Providing multiple lists
You can pass multiple lists directly to a map task. Each list maps element-wise to its corresponding parameter:You cannot provide a list as an input to a partial task. Use multiple list inputs directly on
map_task instead.ArrayNode enhancements
The ArrayNode-backedmap_task (default since flytekit 1.12.0) provides improvements over the original Kubernetes array plugin:
Wider mapping support
Wider mapping support
ArrayNode extends mapping beyond Kubernetes tasks to Python tasks, container tasks, and pod tasks.
Cache management
Cache management
Supports both cache serialization and cache overwriting for subtask executions.
Intra-task checkpointing
Intra-task checkpointing
Subtasks can checkpoint their state, improving reliability on spot or preemptible instances.
Workflow recovery
Workflow recovery
Subtasks remain recoverable during the workflow recovery process.
Subtask failure handling
Subtask failure handling
Running subtasks are aborted appropriately when a sibling subtask fails beyond the
min_success_ratio.