The @retry decorator specifies the number of times a task should be retried if it fails.
Basic Usage
Description
The @retry decorator is useful for handling transient errors such as network issues, temporary resource unavailability, or other intermittent failures. When a step fails, Metaflow will automatically retry it according to the specified configuration.
Important: If your task contains operations that can’t be retried safely (e.g., database updates, API calls that aren’t idempotent), use @retry(times=0) to disable retries.
Parameters
- times: Number of times to retry this task on failure. The total number of attempts will be times + 1 (the original attempt plus retries).
- minutes_between_retries: Number of minutes to wait between retry attempts.
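The arithmetic behind these two parameters can be sketched in plain Python. The helper names total_attempts and worst_case_wait_minutes are hypothetical and not part of Metaflow; the second one assumes the delay is applied once before every retry, as the parameter description suggests.

```python
def total_attempts(times: int) -> int:
    """Executions of a step: the original attempt plus `times` retries."""
    if times < 0:
        raise ValueError("times must be non-negative")
    return times + 1


def worst_case_wait_minutes(times: int, minutes_between_retries: float) -> float:
    """Total delay if every attempt fails (assumes one wait before each retry)."""
    if times < 0 or minutes_between_retries < 0:
        raise ValueError("arguments must be non-negative")
    return times * minutes_between_retries


print(total_attempts(3))               # 4
print(total_attempts(0))               # 1 (retries disabled)
print(worst_case_wait_minutes(3, 2))   # 6
```

The worst-case figure covers waiting time only; the run time of each failed attempt comes on top of it.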
Examples
Basic Retry
Custom Retry Delay
Disable Retries
Combining with Other Decorators
Combining with @catch
The @retry decorator works well with @catch. After all retries are exhausted, @catch takes over: it catches the final exception so that the flow can continue, and later steps can run fallback logic.
Retry Behavior
When a step fails:
- Metaflow waits for minutes_between_retries minutes
- The entire step is re-executed from the beginning
- All previous artifacts from the failed attempt are discarded
- This continues until either:
  - The step succeeds, OR
  - All retry attempts are exhausted
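The loop described above can be simulated in plain Python. This is only an illustration of the control flow, not Metaflow's scheduler; run_with_retries and flaky_step are hypothetical names, and the sleep function is injectable so the demo does not actually wait.

```python
import time
from typing import Callable, Optional


def run_with_retries(step_fn: Callable[[int], object], times: int,
                     minutes_between_retries: float = 0.0,
                     sleep: Callable[[float], None] = time.sleep) -> object:
    """Re-run step_fn from scratch until it succeeds or retries run out.

    step_fn receives the zero-based attempt index.
    """
    if times < 0:
        raise ValueError("times must be non-negative")
    last_error: Optional[Exception] = None
    for attempt in range(times + 1):              # original attempt + retries
        if attempt > 0:
            sleep(minutes_between_retries * 60)   # wait before each retry
        try:
            return step_fn(attempt)               # fresh run, nothing carried over
        except Exception as exc:
            last_error = exc                      # remember the failure, try again
    assert last_error is not None
    raise last_error                              # all attempts exhausted


calls = []

def flaky_step(attempt: int) -> str:
    calls.append(attempt)
    if attempt < 2:                               # fail on the first two attempts
        raise ConnectionError("transient failure")
    return "ok"


result = run_with_retries(flaky_step, times=3, sleep=lambda seconds: None)
print(result, calls)  # ok [0, 1, 2]
```

Because each attempt starts from scratch, any side effect performed before the failure happens again on the next attempt, which is why idempotency matters.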
Detecting Retries
You can check whether a step is being retried using the current object: current.retry_count is 0 on the first attempt and increases by one with each retry.
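A minimal sketch of how the attempt index might be used for logging. The helper describe_attempt is hypothetical; it takes the index as a plain argument so it runs without Metaflow, and the comments show where current.retry_count would be passed in inside a real step.

```python
def describe_attempt(retry_count: int) -> str:
    """Return a log-friendly label for a zero-based attempt index."""
    if retry_count < 0:
        raise ValueError("retry_count must be non-negative")
    if retry_count == 0:
        return "first attempt"
    return f"retry #{retry_count}"


# Inside a Metaflow step, the index would come from the current object:
#     from metaflow import current
#     print(describe_attempt(current.retry_count))
for index in range(3):
    print(describe_attempt(index))
```

The same index can drive behavior changes on retries, for example switching to a smaller batch size or a backup endpoint.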
Best Practices
- Use for transient failures: Retries work well for network issues, cloud API throttling, and temporary resource unavailability
- Idempotency: Ensure your step can be safely re-executed multiple times
- Set appropriate delays: Use minutes_between_retries to avoid overwhelming external services
- Combine with timeout: Always use @timeout with @retry to prevent infinite hangs
- Disable when needed: Use @retry(times=0) for non-idempotent operations
Common Patterns
Network Operations
Cloud API Calls
Exponential Backoff Pattern
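Since minutes_between_retries is a single fixed delay, one way to get growing delays is to sleep inside the step for a time derived from the attempt index. The sketch below assumes that approach; backoff_seconds, base, and cap are hypothetical names, not Metaflow parameters.

```python
def backoff_seconds(retry_count: int, base: float = 1.0, cap: float = 60.0) -> float:
    """Extra delay before an attempt: 0 on the first try, then doubling up to `cap`."""
    if retry_count < 0:
        raise ValueError("retry_count must be non-negative")
    if retry_count == 0:
        return 0.0                                # no delay on the original attempt
    return min(cap, base * 2 ** (retry_count - 1))


# Inside a Metaflow step this could be used as:
#     import time
#     from metaflow import current
#     time.sleep(backoff_seconds(current.retry_count))
delays = [backoff_seconds(n) for n in range(8)]
print(delays)  # [0.0, 1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 60.0]
```

The cap keeps late retries from sleeping longer than the step's timeout would allow. Adding random jitter is a common refinement when many tasks retry against the same service.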
Limitations
- Maximum retry count is limited by MAX_ATTEMPTS in Metaflow configuration
- The total number of attempts (original + retries + @catch fallback) must not exceed MAX_ATTEMPTS
- Retries consume additional compute resources and time
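Read literally, the attempt budget above amounts to a simple inequality. The sketch below encodes it with a hypothetical helper, fits_attempt_budget; the max_attempts value of 6 in the demo is only an illustrative assumption, so check the MAX_ATTEMPTS setting of your own Metaflow installation.

```python
def fits_attempt_budget(times: int, has_catch: bool, max_attempts: int) -> bool:
    """Check original + retries + optional @catch fallback against the limit."""
    if times < 0 or max_attempts < 1:
        raise ValueError("times must be >= 0 and max_attempts >= 1")
    total = 1 + times + (1 if has_catch else 0)
    return total <= max_attempts


ASSUMED_MAX_ATTEMPTS = 6  # illustrative value, not read from Metaflow

print(fits_attempt_budget(4, has_catch=True, max_attempts=ASSUMED_MAX_ATTEMPTS))   # True
print(fits_attempt_budget(5, has_catch=True, max_attempts=ASSUMED_MAX_ATTEMPTS))   # False
print(fits_attempt_budget(5, has_catch=False, max_attempts=ASSUMED_MAX_ATTEMPTS))  # True
```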
