Overview
Workflows and tasks can be configured to time out after a defined amount of time. Timeouts prevent workflows from hanging indefinitely and keep stalled operations from consuming resources.
When a timeout occurs, runtimes must abruptly interrupt the execution of the workflow/task, and must raise an error that, if uncaught, forces the workflow/task to transition to the faulted status phase.
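The behaviour described above can be illustrated with a minimal Python sketch. It is not taken from any Serverless Workflow runtime: `WorkflowError`, `run_task`, `run_workflow`, and `slow_operation` are hypothetical names, and `asyncio.wait_for` stands in for whatever interruption mechanism a real runtime uses.

```python
import asyncio

TIMEOUT_ERROR_TYPE = "https://serverlessworkflow.io/spec/1.0.0/errors/timeout"


class WorkflowError(Exception):
    """Hypothetical error carrying the fields of the standardized timeout error."""

    def __init__(self, type_, status, title, instance):
        super().__init__(title)
        self.type = type_
        self.status = status
        self.title = title
        self.instance = instance


async def run_task(operation, timeout_seconds, instance):
    """Run one task; interrupt it and raise a timeout error if it runs too long."""
    try:
        # wait_for cancels the operation once the timeout elapses
        return await asyncio.wait_for(operation(), timeout=timeout_seconds)
    except asyncio.TimeoutError:
        raise WorkflowError(TIMEOUT_ERROR_TYPE, 408, "Timeout", instance) from None


async def run_workflow(operation, timeout_seconds):
    """Return (status, payload); an uncaught task error faults the workflow."""
    try:
        result = await run_task(operation, timeout_seconds, "/do/longRunningTask")
        return "completed", result
    except WorkflowError as error:
        return "faulted", error


async def slow_operation():
    """Stand-in for a long-running call (assumption for illustration)."""
    await asyncio.sleep(1)
    return "done"
```

Running `asyncio.run(run_workflow(slow_operation, timeout_seconds=0.01))` in this sketch yields the `faulted` status together with the timeout error, because nothing catches the error between the task and the workflow.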
Timeout Errors
When a timeout occurs, the runtime raises a standardized error:
- Type: https://serverlessworkflow.io/spec/1.0.0/errors/timeout
- Status: 408 (Request Timeout)
timeoutError:
  type: https://serverlessworkflow.io/spec/1.0.0/errors/timeout
  status: 408
  title: Timeout
  detail: The operation exceeded the configured timeout duration
  instance: /do/longRunningTask
Timeout errors can be caught and handled like any other error using try-catch blocks.
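As a rough illustration of how a catch filter could select timeout errors, the sketch below compares an error against the properties listed in a filter. The function name `error_matches` and the plain-dictionary representation are assumptions for illustration; a real runtime may match errors differently.

```python
TIMEOUT_ERROR_TYPE = "https://serverlessworkflow.io/spec/1.0.0/errors/timeout"


def error_matches(error: dict, error_filter: dict) -> bool:
    """Return True when every property named in the filter equals the error's value."""
    return all(error.get(key) == value for key, value in error_filter.items())


# Example error, using the field values shown in the document
timeout_error = {
    "type": TIMEOUT_ERROR_TYPE,
    "status": 408,
    "title": "Timeout",
    "instance": "/do/longRunningTask",
}
```

With this sketch, a filter of `{"type": TIMEOUT_ERROR_TYPE}` matches the example error, while a filter of `{"status": 503}` does not.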
Workflow Timeouts
Workflow-level timeouts apply to the entire workflow execution:
Basic Workflow Timeout
document:
  dsl: '1.0.3'
  namespace: default
  name: timed-workflow
  version: '1.0.0'
timeout:
  after:
    minutes: 30
do:
  - step1:
      call: function1
  - step2:
      call: function2
  - step3:
      call: function3
Here, `after` is a duration object that specifies when the workflow should time out.
Workflow Timeout with Error Handling
You cannot directly catch a workflow timeout within the workflow itself, but you can make the workflow more resilient by giving long-running tasks their own, shorter timeouts:
document:
  dsl: '1.0.3'
  namespace: default
  name: resilient-workflow
  version: '1.0.0'
timeout:
  after:
    hours: 2
do:
  - processWithTimeout:
      # Each task has its own timeout to fail faster
      timeout:
        after:
          minutes: 15
      call: longOperation
  - handleSuccess:
      call: successHandler
Task Timeouts
Individual tasks can have their own timeout configuration:
Basic Task Timeout
do:
  - fetchData:
      call: http
      with:
        method: get
        endpoint:
          uri: https://api.example.com/data
      timeout:
        after:
          seconds: 30
Here, `after` is a duration object that specifies when the task should time out.
Task Timeout with Error Handling
do:
  - fetchDataWithTimeout:
      # `try` takes a list of named tasks
      try:
        - fetchData:
            call: http
            with:
              method: get
              endpoint:
                uri: https://api.example.com/large-dataset
            timeout:
              after:
                minutes: 5
      catch:
        errors:
          with:
            type: https://serverlessworkflow.io/spec/1.0.0/errors/timeout
        do:
          - logTimeout:
              call: logger
              with:
                message: Data fetch timed out after 5 minutes
          - useCachedData:
              call: cache
              with:
                key: last-known-dataset
Multiple Tasks with Different Timeouts
do:
  - quickOperation:
      call: fastService
      timeout:
        after:
          seconds: 5
  - mediumOperation:
      call: standardService
      timeout:
        after:
          seconds: 30
  - longOperation:
      call: slowService
      timeout:
        after:
          minutes: 10
Duration Units
Serverless Workflow supports multiple duration units for specifying timeouts:
| Unit | Property | Example |
|---|---|---|
| Milliseconds | milliseconds | milliseconds: 500 |
| Seconds | seconds | seconds: 30 |
| Minutes | minutes | minutes: 5 |
| Hours | hours | hours: 2 |
| Days | days | days: 1 |
Duration Examples
Milliseconds
timeout:
  after:
    milliseconds: 500
Seconds
timeout:
  after:
    seconds: 30
Minutes
timeout:
  after:
    minutes: 5
Hours
timeout:
  after:
    hours: 2
Days
timeout:
  after:
    days: 1
Combined Duration Units
You can combine multiple duration units:
timeout:
  after:
    hours: 2
    minutes: 30
    seconds: 45
This creates a timeout of 2 hours, 30 minutes, and 45 seconds (9,045 seconds total).
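The arithmetic can be checked with a short Python sketch that sums a duration object into seconds. The helper name `duration_to_seconds` is hypothetical, and the sketch assumes the duration is given as a plain mapping using only the five units listed in the table above.

```python
from datetime import timedelta

# Units taken from the Duration Units table
ALLOWED_UNITS = {"days", "hours", "minutes", "seconds", "milliseconds"}


def duration_to_seconds(after: dict) -> float:
    """Sum a duration object such as {"hours": 2, "minutes": 30} into seconds."""
    unknown = set(after) - ALLOWED_UNITS
    if unknown:
        raise ValueError(f"unsupported duration units: {sorted(unknown)}")
    # timedelta accepts exactly these keyword names and normalizes them
    return timedelta(**after).total_seconds()


combined = duration_to_seconds({"hours": 2, "minutes": 30, "seconds": 45})
print(combined)  # 9045.0
```

The same helper can be used to sanity-check any combined duration before putting it in a workflow definition.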
Complex Duration Example
processLargeDataset:
  call: dataProcessor
  with:
    dataset: ${ .largeDataset }
  timeout:
    after:
      days: 0
      hours: 1
      minutes: 30
      seconds: 0
      milliseconds: 0
Timeout Patterns
Pattern: Fast-Fail with Fallback
do:
  - tryPrimaryWithTimeout:
      try:
        - callPrimary:
            call: http
            with:
              method: get
              endpoint:
                uri: https://primary.example.com/api
            timeout:
              after:
                seconds: 5
      catch:
        errors:
          with:
            type: https://serverlessworkflow.io/spec/1.0.0/errors/timeout
        as: timeoutError
        # The caught error is only visible inside this catch block (as $timeoutError),
        # so the fallback runs here rather than in a later task
        do:
          - useFallback:
              call: http
              with:
                method: get
                endpoint:
                  uri: https://fallback.example.com/api
              timeout:
                after:
                  seconds: 10
This pattern attempts a fast operation first, then falls back to an alternative with a longer timeout if the first times out.
Pattern: Progressive Timeout
do:
  - attemptQuick:
      try:
        - quick:
            call: processor
            with:
              mode: quick
            timeout:
              after:
                seconds: 10
      catch:
        errors:
          with:
            type: https://serverlessworkflow.io/spec/1.0.0/errors/timeout
        # Each slower attempt runs inside the catch block of the previous one
        do:
          - attemptStandard:
              try:
                - standard:
                    call: processor
                    with:
                      mode: standard
                    timeout:
                      after:
                        seconds: 60
              catch:
                errors:
                  with:
                    type: https://serverlessworkflow.io/spec/1.0.0/errors/timeout
                do:
                  - attemptDeep:
                      call: processor
                      with:
                        mode: deep
                      timeout:
                        after:
                          minutes: 10
This pattern tries progressively slower processing modes with increasing timeouts.
Pattern: Timeout with Retry
reliableOperation:
  try:
    - postData:
        call: http
        with:
          method: post
          endpoint:
            uri: https://api.example.com/process
          body: ${ .data }
        timeout:
          after:
            seconds: 30
  catch:
    errors:
      with:
        type: https://serverlessworkflow.io/spec/1.0.0/errors/timeout
    retry:
      delay:
        seconds: 5
      backoff:
        exponential:
          factor: 2
      limit:
        attempt:
          count: 3
Combining timeouts with retry policies provides robust handling for operations that may occasionally be slow.
Pattern: Partial Results on Timeout
do:
  - fetchWithTimeout:
      try:
        - fetchCompleteData:
            call: http
            with:
              method: get
              endpoint:
                uri: https://api.example.com/complete-data
            timeout:
              after:
                seconds: 15
      catch:
        errors:
          with:
            type: https://serverlessworkflow.io/spec/1.0.0/errors/timeout
        do:
          - fetchPartialData:
              call: http
              with:
                method: get
                endpoint:
                  uri: https://api.example.com/partial-data
                query:
                  limit: 100
              timeout:
                after:
                  seconds: 5
Pattern: Parallel Operations with Individual Timeouts
do:
  - fetchFromMultipleSources:
      fork:
        compete: true
        branches:
          - source1:
              call: http
              with:
                method: get
                endpoint:
                  uri: https://source1.example.com/api
              timeout:
                after:
                  seconds: 10
          - source2:
              call: http
              with:
                method: get
                endpoint:
                  uri: https://source2.example.com/api
              timeout:
                after:
                  seconds: 15
          - source3:
              call: http
              with:
                method: get
                endpoint:
                  uri: https://source3.example.com/api
              timeout:
                after:
                  seconds: 20
In compete mode with different timeouts, the fastest source that completes within its timeout wins.
Timeout Best Practices
Set realistic timeouts
Base timeout values on actual performance measurements, not arbitrary numbers. Monitor your services to understand typical response times.
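One possible way to turn measurements into a timeout value is to take a high percentile of observed latencies and add headroom. The Python sketch below is only an illustration: the helper name `suggest_timeout_seconds`, the nearest-rank percentile method, and the default headroom factor of 1.5 are all assumptions, not recommendations from the specification.

```python
import math


def suggest_timeout_seconds(latencies, percentile=99.0, headroom=1.5):
    """Suggest a timeout: a nearest-rank percentile of latencies times a headroom factor."""
    if not latencies:
        raise ValueError("at least one latency measurement is required")
    ordered = sorted(latencies)
    # Nearest-rank method: smallest value with at least `percentile` % of samples at or below it
    rank = max(1, math.ceil(percentile / 100 * len(ordered)))
    return ordered[rank - 1] * headroom


# Hypothetical measurements in seconds
observed = [0.8, 1.0, 1.2, 0.9, 4.0]
print(suggest_timeout_seconds(observed))  # 6.0
```

With only five samples the 99th percentile is simply the slowest observation, so real measurements should cover far more requests than this toy input.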
Use task timeouts over workflow timeouts
Prefer setting timeouts on individual tasks rather than the entire workflow. This provides more granular control and better error messages.
Always handle timeout errors
Wrap operations with timeouts in try-catch blocks to handle timeout errors gracefully and provide fallback behavior.
Consider different timeout strategies
Use shorter timeouts for fast operations and longer timeouts for heavy processing. Adjust based on the criticality of the operation.
Combine with retry policies
When using retries, ensure the worst-case total (the duration of every attempt plus the delays between attempts, which grow with each retry under exponential backoff) doesn’t exceed your workflow timeout.
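A rough budget check can be sketched in Python. The helper `worst_case_retry_seconds` is hypothetical and rests on stated assumptions: `attempts` counts every execution including the first, each attempt can run for its full timeout, a delay is applied only between attempts, and exponential backoff multiplies the delay by a constant factor each time. A real runtime may count attempts or apply backoff differently.

```python
def worst_case_retry_seconds(attempts, attempt_timeout, delay, backoff_factor=1.0):
    """Upper bound on total time: all attempts time out, plus the delays between them."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    total_attempt_time = attempts * attempt_timeout
    # One delay between each pair of consecutive attempts, growing by backoff_factor
    total_delay = sum(delay * backoff_factor ** i for i in range(attempts - 1))
    return total_attempt_time + total_delay


# 5 attempts of up to 30 s each with a constant 10 s delay
print(worst_case_retry_seconds(5, 30, 10))  # 190.0
# 3 attempts of up to 30 s each, 5 s initial delay doubling each time
print(worst_case_retry_seconds(3, 30, 5, backoff_factor=2))  # 105
```

Comparing the result with the workflow timeout expressed in seconds shows whether the retry policy can run to completion before the workflow itself times out.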
Log timeout occurrences
Always log when timeouts occur to help identify performance issues and adjust timeout values accordingly.
Test timeout scenarios
Include timeout scenarios in your workflow tests to ensure your error handling works correctly.
Timeout Calculation Examples
Example 1: Simple Timeout
timeout:
  after:
    seconds: 30
Total timeout: 30 seconds
Example 2: Combined Units
timeout:
  after:
    minutes: 5
    seconds: 30
Total timeout: (5 × 60) + 30 = 330 seconds (5 minutes 30 seconds)
Example 3: Complex Duration
timeout:
  after:
    hours: 2
    minutes: 15
    seconds: 45
    milliseconds: 500
Total timeout:
- Hours: 2 × 3600 = 7,200 seconds
- Minutes: 15 × 60 = 900 seconds
- Seconds: 45 seconds
- Milliseconds: 500 milliseconds = 0.5 seconds
- Total: 8,145.5 seconds (2 hours 15 minutes 45.5 seconds)
Example 4: Days-Based Timeout
timeout:
  after:
    days: 1
    hours: 6
Total timeout: (1 × 86400) + (6 × 3600) = 108,000 seconds (30 hours)
Common Timeout Scenarios
HTTP API Calls
apiCall:
  call: http
  with:
    method: get
    endpoint:
      uri: https://api.example.com/data
  timeout:
    after:
      seconds: 30 # Standard API timeout
Database Queries
databaseQuery:
  call: database
  with:
    query: SELECT * FROM large_table WHERE condition = true
  timeout:
    after:
      minutes: 5 # Longer timeout for complex queries
File Processing
processFile:
  call: fileProcessor
  with:
    file: ${ .largeFile }
  timeout:
    after:
      hours: 1 # Extended timeout for large file processing
Real-time Operations
realtimeCheck:
  call: realtimeService
  with:
    data: ${ .streamData }
  timeout:
    after:
      milliseconds: 500 # Very short timeout for real-time requirements
Batch Processing
batchProcess:
  call: batchProcessor
  with:
    items: ${ .batchItems }
  timeout:
    after:
      days: 1 # Long-running batch job
Monitoring and Debugging Timeouts
monitoredOperation:
  try:
    - callSlowService:
        call: slowService
        with:
          data: ${ .inputData }
        timeout:
          after:
            minutes: 5
  catch:
    errors:
      with:
        type: https://serverlessworkflow.io/spec/1.0.0/errors/timeout
    as: timeoutError
    do:
      - logTimeoutDetails:
          call: http
          with:
            method: post
            endpoint:
              uri: https://logging.example.com/timeouts
            body:
              workflowId: ${ $workflow.id }
              taskName: ${ $task.name }
              taskReference: ${ $task.reference }
              startedAt: ${ $task.startedAt.iso8601 }
              timeoutDuration: 300 # seconds
              # The caught error is exposed as a runtime expression variable
              error: ${ $timeoutError }
Metrics Collection
metricTrackedOperation:
  try:
    - callTrackedService:
        call: trackedService
        timeout:
          after:
            seconds: 30
  catch:
    errors:
      with:
        type: https://serverlessworkflow.io/spec/1.0.0/errors/timeout
    do:
      - incrementTimeoutMetric:
          call: http
          with:
            method: post
            endpoint:
              uri: https://metrics.example.com/increment
            body:
              metric: task_timeouts
              tags:
                service: trackedService
                workflow: ${ $workflow.definition.document.name }
Common Pitfalls
Pitfall 1: Too Short Timeouts
# Bad: Unrealistic timeout
processComplexData:
  call: complexProcessor
  with:
    data: ${ .largeDataset }
  timeout:
    after:
      seconds: 5 # Too short for complex processing

# Good: Realistic timeout
processComplexData:
  call: complexProcessor
  with:
    data: ${ .largeDataset }
  timeout:
    after:
      minutes: 10 # Appropriate for the operation
Pitfall 2: No Timeout Error Handling
# Bad: Timeout will fault the workflow
criticalOperation:
  call: importantService
  timeout:
    after:
      seconds: 30

# Good: Timeout is handled gracefully
criticalOperation:
  try:
    - callImportantService:
        call: importantService
        timeout:
          after:
            seconds: 30
  catch:
    errors:
      with:
        type: https://serverlessworkflow.io/spec/1.0.0/errors/timeout
    do:
      - handleTimeout:
          call: fallbackService
Pitfall 3: Retry Timeout Exceeds Workflow Timeout
# Bad: Total retry time can exceed workflow timeout
timeout:
  after:
    minutes: 5
do:
  - retryingOperation:
      try:
        - callService:
            call: service
      catch:
        errors: {}
        retry:
          delay:
            minutes: 2
          limit:
            attempt:
              count: 5 # 5 attempts × 2 minutes = 10 minutes > 5 minute workflow timeout

# Good: Retry times are within workflow timeout
timeout:
  after:
    minutes: 10
do:
  - retryingOperation:
      try:
        - callService:
            call: service
            timeout:
              after:
                seconds: 30 # Individual attempt timeout
      catch:
        errors: {}
        retry:
          delay:
            seconds: 10
          limit:
            attempt:
              count: 5 # Attempts capped at 30 s plus 10 s delays: worst case under 4 minutes < 10 minute workflow timeout