The @batch decorator specifies that a step should execute on AWS Batch.
Basic Usage
Description
The @batch decorator allows steps to run on AWS Batch, which provides managed compute resources for large-scale batch processing. AWS Batch automatically provisions the right amount of compute resources based on your requirements.
Prerequisites
- AWS credentials configured
- S3 datastore configured (--datastore=s3)
- AWS Batch job queue and compute environment set up
Parameters
- cpu: Number of CPUs required for this step. If @resources is also present, the maximum value from all decorators is used.
- memory: Memory size (in MB) required for this step. If @resources is also present, the maximum value from all decorators is used.
- gpu: Number of GPUs required for this step. If @resources is also present, the maximum value from all decorators is used.
- image: Docker image to use when launching on AWS Batch. If not specified and METAFLOW_BATCH_CONTAINER_IMAGE is set, that image is used. Otherwise, a default Python image matching your Python version is used.
- queue: AWS Batch Job Queue to submit the job to.
- iam_role: AWS IAM role that the AWS Batch container uses to access AWS cloud resources.
- execution_role: AWS IAM role that AWS Batch uses to trigger AWS Fargate tasks.
- shared_memory: The size (in MiB) of the /dev/shm volume for this step. This parameter maps to the --shm-size option in Docker.
- max_swap: The total amount of swap memory (in MiB) a container can use for this step. This parameter is translated to the --memory-swap option in Docker, where the value is the sum of the container memory plus the max_swap value.
- swappiness: Tunes memory swappiness behavior for this step. A value of 0 causes swapping not to happen unless absolutely necessary; a value of 100 causes pages to be swapped very aggressively. Accepted values are whole numbers between 0 and 100.
- inferentia: Number of AWS Inferentia chips required for this step.
- trainium: Number of AWS Trainium chips required for this step. Alias for inferentia; use only one of the two.
- efa: Number of Elastic Fabric Adapter network devices to attach to the container.
- use_tmpfs: Enables an explicit tmpfs mount for this step. Note that tmpfs is not available on Fargate compute environments.
- tmpfs_tempdir: Sets METAFLOW_TEMPDIR to tmpfs_path if enabled.
- tmpfs_size: The size (in MiB) of the tmpfs mount for this step. Defaults to 50% of the memory allocated for this step.
- tmpfs_path: Path to the tmpfs mount for this step.
- ephemeral_storage: The total amount, in GiB, of ephemeral storage to set for the task (21-200 GiB). Only relevant for Fargate compute environments.
- aws_batch_tags: Sets arbitrary AWS tags on the AWS Batch compute environment. Specified as string key-value pairs.
- log_driver: The log driver to use for the Amazon ECS container.
- log_options: List of strings containing options for the chosen log driver. Example: ["awslogs-group:aws/batch/job"]
- privileged: Controls whether the task can run as a privileged process on AWS Batch.
Examples
Basic Batch Execution
GPU-Accelerated Training
Custom Docker Image
Using AWS Inferentia
With Shared Memory
Adding AWS Tags
Runtime Override
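A sketch of the CLI form, assuming the flow is saved as flow.py (the attribute values are illustrative):

```shell
# Attach @batch to all steps at runtime, overriding cpu and memory
python flow.py run --with batch:cpu=8,memory=16000
```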
Override batch parameters at runtime with the --with command-line option; no code changes are required.
Environment Variables
When running on AWS Batch, these environment variables are available:
- AWS_BATCH_JOB_ID - The job ID
- AWS_BATCH_JOB_ATTEMPT - The attempt number
- AWS_BATCH_CE_NAME - Compute environment name
- AWS_BATCH_JQ_NAME - Job queue name
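Inside a step body these can be read with os.environ; a sketch with fallback values for local runs (the defaults are illustrative):

```python
import os

# Populated by AWS Batch when the step runs there; fall back when local.
job_id = os.environ.get("AWS_BATCH_JOB_ID", "local")
attempt = int(os.environ.get("AWS_BATCH_JOB_ATTEMPT", "1"))
queue = os.environ.get("AWS_BATCH_JQ_NAME", "none")
print(f"job={job_id} attempt={attempt} queue={queue}")
```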
Best Practices
- Right-size resources: Monitor actual usage and adjust CPU/memory accordingly
- Use spot instances: Configure your AWS Batch compute environment to use spot instances for cost savings
- Custom images: Build custom Docker images with your dependencies pre-installed
- Timeout decorators: Always use @timeout with @batch to prevent hung jobs
- Retry logic: Combine with @retry to handle transient failures
