
Overview

Serverless Workflow is designed with resilience in mind, acknowledging that errors are an inevitable part of any system. The DSL provides robust mechanisms to identify, describe, and handle errors, so that workflows can recover gracefully from a wide range of failure scenarios.

Errors

Errors in Serverless Workflow are described using Problem Details for HTTP APIs (RFC 7807). This specification standardizes how errors are communicated; in the DSL, the instance property is a JSON Pointer that identifies the specific workflow component that raised the error.

Error Structure

An error follows this structure:
type: https://serverlessworkflow.io/spec/1.0.0/errors/communication
title: Service Unavailable
status: 503
detail: The service is currently unavailable. Please try again later.
instance: /do/getPetById
- `type` (string, required): A URI reference that identifies the error type
- `title` (string): A short, human-readable summary of the error type
- `status` (integer): The HTTP status code for this occurrence of the error
- `detail` (string): A human-readable explanation specific to this occurrence of the error
- `instance` (string): A JSON Pointer to the workflow component that raised the error
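To make the structure concrete, here is a minimal Python sketch of an RFC 7807 problem-details object as used above. The class name and `to_dict` helper are illustrative, not part of the DSL or any runtime API:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class WorkflowError:
    """An RFC 7807 problem-details object, mirroring the structure above."""
    type: str                       # required: URI reference identifying the error type
    title: Optional[str] = None     # short, human-readable summary
    status: Optional[int] = None    # HTTP status code for this occurrence
    detail: Optional[str] = None    # occurrence-specific explanation
    instance: Optional[str] = None  # JSON Pointer to the raising component

    def to_dict(self) -> dict:
        # Serialize, omitting optional fields that were not set.
        return {k: v for k, v in self.__dict__.items() if v is not None}

err = WorkflowError(
    type="https://serverlessworkflow.io/spec/1.0.0/errors/communication",
    title="Service Unavailable",
    status=503,
    detail="The service is currently unavailable. Please try again later.",
    instance="/do/getPetById",
)
```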

Standard Error Types

The Serverless Workflow specification defines several standard error types to describe commonly known errors:
| Error Type | Status | Description |
| --- | --- | --- |
| https://serverlessworkflow.io/spec/1.0.0/errors/configuration | 500 | Configuration error |
| https://serverlessworkflow.io/spec/1.0.0/errors/validation | 400 | Validation error |
| https://serverlessworkflow.io/spec/1.0.0/errors/expression | 400 | Expression evaluation error |
| https://serverlessworkflow.io/spec/1.0.0/errors/communication | 500 | Communication error |
| https://serverlessworkflow.io/spec/1.0.0/errors/timeout | 408 | Timeout error |
| https://serverlessworkflow.io/spec/1.0.0/errors/authorization | 403 | Authorization error |
Using these standard error types ensures that workflows behave consistently across different runtimes and allows authors to rely on predictable error handling and recovery processes.

Defining Custom Errors

You can define reusable custom errors in the use section:
use:
  errors:
    businessRuleViolation:
      type: https://example.com/errors/business-rule
      status: 422
      title: Business Rule Violation
    
    insufficientFunds:
      type: https://example.com/errors/insufficient-funds
      status: 402
      title: Insufficient Funds
    
    invalidOperation:
      type: https://example.com/errors/invalid-operation
      status: 400
      title: Invalid Operation

Try-Catch Pattern

The try task enables you to attempt executing a task and handle errors gracefully:

Basic Try-Catch

tryExample:
  try:
    call: http
    with:
      method: get
      endpoint:
        uri: https://api.example.com/data
  catch:
    errors:
      with:
        status: 503
    as: serviceError
- `try` (object, required): The task to attempt
- `catch` (object, required): Error handling configuration
- `catch.errors` (object): Error filter to specify which errors to catch
- `catch.as` (string): Variable name to store the caught error

Catching Specific Errors

processPayment:
  try:
    call: paymentService
    with:
      amount: ${ .orderTotal }
      customerId: ${ .customerId }
  catch:
    errors:
      with:
        type: https://example.com/errors/insufficient-funds
    do:
      - notifyCustomer:
          call: notificationService
          with:
            message: Insufficient funds for payment

Multiple Error Handlers

processOrder:
  try:
    call: orderService
    with:
      orderId: ${ .orderId }
  catch:
    errors:
      one:
        - with:
            type: https://example.com/errors/validation
          do:
            - handleValidationError:
                call: errorLogger
                with:
                  error: Validation failed
        
        - with:
            type: https://example.com/errors/insufficient-inventory
          do:
            - handleInventoryError:
                call: inventoryService
                with:
                  action: backorder
        
        - with:
            status: 503
          retry:
            delay:
              seconds: 3
            limit:
              attempt:
                count: 5

Catch All Errors

riskyOperation:
  try:
    call: unreliableService
    with:
      data: ${ .inputData }
  catch:
    errors: {}
    as: caughtError
    do:
      - logError:
          call: logger
          with:
            error: ${ .caughtError }
      - useDefault:
          set:
            result: default-value
An empty errors object catches all errors, providing a fallback for any failure.

Retry Policies

Retry policies allow workflows to automatically retry failed operations, which is especially useful for handling transient failures.

Basic Retry

fetchData:
  try:
    call: http
    with:
      method: get
      endpoint:
        uri: https://api.example.com/data
  catch:
    errors:
      with:
        status: 503
    retry:
      delay:
        seconds: 3
      limit:
        attempt:
          count: 5
- `retry.delay` (object): Duration to wait before retrying
- `retry.limit` (object): Limits on retry attempts

Retry with Exponential Backoff

reliableCall:
  try:
    call: http
    with:
      method: post
      endpoint:
        uri: https://api.example.com/process
      body: ${ .data }
  catch:
    errors:
      with:
        type: https://serverlessworkflow.io/spec/1.0.0/errors/communication
    retry:
      delay:
        seconds: 1
      backoff:
        exponential:
          factor: 2
      limit:
        attempt:
          count: 5
- `retry.backoff.exponential` (object): Exponential backoff configuration
- `retry.backoff.exponential.factor` (number, default: 2): Multiplication factor applied to the delay on each retry
With exponential backoff and factor 2:
  • Attempt 1: 1 second delay
  • Attempt 2: 2 seconds delay
  • Attempt 3: 4 seconds delay
  • Attempt 4: 8 seconds delay
  • Attempt 5: 16 seconds delay
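The schedule above follows a simple formula: the delay before retry n is the base delay multiplied by factor^(n-1). A small Python sketch (illustrative only, not part of any runtime):

```python
def exponential_delays(base_seconds: float, factor: float, attempts: int) -> list:
    """Delay before each retry: base * factor**(attempt - 1)."""
    return [base_seconds * factor ** n for n in range(attempts)]

print(exponential_delays(1, 2, 5))  # [1, 2, 4, 8, 16]
```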

Retry with Linear Backoff

steadyRetry:
  try:
    call: dataService
  catch:
    errors:
      with:
        status: 503
    retry:
      delay:
        seconds: 3
      backoff:
        linear: {}
      limit:
        attempt:
          count: 5
- `retry.backoff.linear` (object): Linear backoff configuration (the delay increases by the same amount on each retry)
With linear backoff:
  • Attempt 1: 3 seconds delay
  • Attempt 2: 6 seconds delay
  • Attempt 3: 9 seconds delay
  • Attempt 4: 12 seconds delay
  • Attempt 5: 15 seconds delay
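With linear backoff the delay grows by the base amount on every retry: base, 2*base, 3*base, and so on. A corresponding Python sketch (illustrative only):

```python
def linear_delays(base_seconds: float, attempts: int) -> list:
    """Delay before each retry grows by the base amount: base, 2*base, 3*base, ..."""
    return [base_seconds * (n + 1) for n in range(attempts)]

print(linear_delays(3, 5))  # [3, 6, 9, 12, 15]
```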

Time-Based Retry Limit

timedRetry:
  try:
    call: longRunningService
  catch:
    errors:
      with:
        type: https://serverlessworkflow.io/spec/1.0.0/errors/timeout
    retry:
      delay:
        seconds: 5
      limit:
        duration:
          minutes: 10
- `retry.limit.duration` (object): Maximum total time to spend retrying
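To see what a duration limit implies, a rough back-of-the-envelope sketch in Python (assuming, for simplicity, that attempts themselves take negligible time and only the fixed delay counts against the budget):

```python
def attempts_within_budget(delay_seconds: float, budget_seconds: float) -> int:
    """Count retry attempts whose start time still falls inside the time budget."""
    attempts, elapsed = 0, 0.0
    while elapsed < budget_seconds:
        attempts += 1
        elapsed += delay_seconds
    return attempts

# A 5-second delay with a 10-minute budget allows at most:
print(attempts_within_budget(5, 10 * 60))  # 120
```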

Reusable Retry Policies

Define retry policies once and reuse them across multiple tasks:
use:
  retries:
    standardRetry:
      delay:
        seconds: 2
      backoff:
        exponential:
          factor: 2
      limit:
        attempt:
          count: 5
    
    aggressiveRetry:
      delay:
        milliseconds: 500
      backoff:
        exponential:
          factor: 1.5
      limit:
        attempt:
          count: 10
    
    patientRetry:
      delay:
        seconds: 10
      backoff:
        linear: {}
      limit:
        duration:
          minutes: 30

do:
  - criticalOperation:
      try:
        call: criticalService
      catch:
        errors:
          with:
            status: 503
        retry: standardRetry
  
  - rapidOperation:
      try:
        call: fastService
      catch:
        errors:
          with:
            type: https://serverlessworkflow.io/spec/1.0.0/errors/communication
        retry: aggressiveRetry

Advanced Error Handling Patterns

Error Recovery with Fallback

do:
  - tryPrimaryService:
      try:
        call: http
        with:
          method: get
          endpoint:
            uri: https://primary.example.com/api
      catch:
        errors:
          with:
            status: 503
        retry:
          delay:
            seconds: 2
          limit:
            attempt:
              count: 3
        as: primaryError
  
  - tryFallbackService:
      if: ${ .primaryError != null }
      try:
        call: http
        with:
          method: get
          endpoint:
            uri: https://fallback.example.com/api
      catch:
        errors:
          with:
            status: 503
        as: fallbackError
  
  - useDefaultData:
      if: ${ .primaryError != null and .fallbackError != null }
      set:
        result:
          data: default-data
          source: default

Circuit Breaker Pattern

do:
  - checkCircuitState:
      call: circuitBreakerService
      with:
        service: externalApi
  
  - callService:
      if: ${ .checkCircuitState.output.state != "open" }
      try:
        call: http
        with:
          method: get
          endpoint:
            uri: https://external.example.com/api
      catch:
        errors: {}
        do:
          - recordFailure:
              call: circuitBreakerService
              with:
                action: recordFailure
                service: externalApi
  
  - useCachedData:
      if: ${ .checkCircuitState.output.state == "open" }
      call: cacheService
      with:
        key: lastKnownGood

Saga Pattern with Compensation

do:
  - reserveInventory:
      try:
        call: inventoryService
        with:
          action: reserve
          items: ${ .orderItems }
      catch:
        errors: {}
        then: end
      export:
        as: ${ $context + { inventoryReserved: true } }
  
  - processPayment:
      try:
        call: paymentService
        with:
          amount: ${ .orderTotal }
      catch:
        errors: {}
        do:
          - compensateInventory:
              call: inventoryService
              with:
                action: release
                items: ${ .orderItems }
        then: end
      export:
        as: ${ $context + { paymentProcessed: true } }
  
  - confirmOrder:
      try:
        call: orderService
        with:
          action: confirm
      catch:
        errors: {}
        do:
          - compensatePayment:
              call: paymentService
              with:
                action: refund
                amount: ${ .orderTotal }
          - compensateInventory:
              call: inventoryService
              with:
                action: release
                items: ${ .orderItems }
        then: end

Nested Try-Catch

processWithRecovery:
  try:
    do:
      - step1:
          try:
            call: service1
          catch:
            errors:
              with:
                status: 503
            retry:
              delay:
                seconds: 2
              limit:
                attempt:
                  count: 3
      
      - step2:
          try:
            call: service2
            with:
              data: ${ .step1.output }
          catch:
            errors:
              with:
                type: https://example.com/errors/validation
            do:
              - fixData:
                  call: dataFixer
                  with:
                    data: ${ .step1.output }
              - retryStep2:
                  call: service2
                  with:
                    data: ${ .fixData.output }
  catch:
    errors: {}
    do:
      - logFailure:
          call: logger
          with:
            message: Complete process failed
      - notifyAdmin:
          call: notificationService
          with:
            recipient: [email protected]
            message: Critical workflow failure

Error Handling with Context Preservation

do:
  - initializeContext:
      set:
        processId: ${ .requestId }
        startTime: ${ now }
        status: processing
  
  - processData:
      try:
        call: processor
        with:
          data: ${ .inputData }
      catch:
        errors: {}
        as: processingError
        export:
          as: ${ $context + { 
            error: .processingError, 
            status: "failed",
            failedAt: now 
          } }
  
  - finalizeStatus:
      set:
        finalStatus: ${ if $context.error then "failed" else "success" end }

Raising Errors

The raise task explicitly raises an error:
validateInput:
  call: validator
  with:
    data: ${ .inputData }

checkValidation:
  if: ${ .validateInput.output.isValid == false }
  raise:
    error:
      type: https://serverlessworkflow.io/spec/1.0.0/errors/validation
      status: 400
      title: Validation Failed
      detail: ${ .validateInput.output.message }
- `raise.error` (object, required): The error to raise, following the RFC 7807 Problem Details format
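In a general-purpose language, raising a problem-details error corresponds to throwing an exception that carries the RFC 7807 fields. A hedged Python sketch of the validation example above (the exception class and helper are illustrative, not part of any runtime):

```python
class ProblemDetailsError(Exception):
    """Exception carrying RFC 7807 fields, analogous to the DSL's raise task."""

    def __init__(self, type: str, status: int, title: str = "", detail: str = ""):
        super().__init__(detail or title)
        self.problem = {"type": type, "status": status, "title": title, "detail": detail}

def check_validation(is_valid: bool, message: str) -> None:
    """Raise a validation error when the preceding validation step failed."""
    if not is_valid:
        raise ProblemDetailsError(
            type="https://serverlessworkflow.io/spec/1.0.0/errors/validation",
            status=400,
            title="Validation Failed",
            detail=message,
        )
```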

Conditional Error Raising

do:
  - checkBusinessRules:
      call: businessRuleEngine
      with:
        data: ${ .orderData }
  
  - raiseIfViolation:
      if: ${ .checkBusinessRules.output.violations | length > 0 }
      raise:
        error:
          type: https://example.com/errors/business-rule
          status: 422
          title: Business Rule Violation
          detail: ${ .checkBusinessRules.output.violations | map(.message) | join(", ") }

Error Logging and Monitoring

Logging Errors

processWithLogging:
  try:
    call: riskyOperation
  catch:
    errors: {}
    as: caughtError
    do:
      - logError:
          call: http
          with:
            method: post
            endpoint:
              uri: https://logging.example.com/errors
            body:
              workflowId: ${ $workflow.id }
              taskName: ${ $task.name }
              error: ${ .caughtError }
              timestamp: ${ now }
      
      - handleError:
          call: errorHandler
          with:
            error: ${ .caughtError }

Metrics and Alerting

processWithMetrics:
  try:
    call: monitoredOperation
  catch:
    errors: {}
    as: operationError
    do:
      - incrementErrorCounter:
          call: http
          with:
            method: post
            endpoint:
              uri: https://metrics.example.com/increment
            body:
              metric: operation_errors
              tags:
                service: ${ $task.name }
                errorType: ${ .operationError.type }
      
      - sendAlert:
          if: ${ .operationError.status >= 500 }
          call: alertingService
          with:
            severity: high
            message: ${ .operationError.detail }

Best Practices

1. Catch specific errors first: Handle specific error types before catching general errors to provide targeted recovery strategies.
2. Use appropriate retry strategies: Apply exponential backoff for transient failures and set reasonable retry limits to avoid infinite loops.
3. Log all errors: Always log errors with sufficient context for debugging and monitoring.
4. Provide fallback mechanisms: Implement fallback strategies such as cached data or default values when services are unavailable.
5. Clean up resources: Use compensation tasks to release resources when errors occur midway through a process.
6. Set appropriate timeouts: Combine error handling with timeout configuration to prevent workflows from hanging indefinitely.
7. Use standard error types: Prefer standard error types for common failure scenarios to ensure consistency across workflows.

Common Pitfalls

Catching Too Broadly

# Bad: Catches and ignores all errors
try:
  call: importantOperation
catch:
  errors: {}
  # No error handling or logging

# Good: Specific error handling with logging
try:
  call: importantOperation
catch:
  errors:
    with:
      type: https://example.com/errors/expected-error
  as: error
  do:
    - logError:
        call: logger
        with:
          error: ${ .error }

Infinite Retry Loops

# Bad: No retry limit
retry:
  delay:
    seconds: 1

# Good: Reasonable retry limit
retry:
  delay:
    seconds: 1
  limit:
    attempt:
      count: 5

Not Handling Compensation

# Bad: No compensation for partial failures
do:
  - reserveResource:
      call: reserveService
  - processResource:
      call: processService  # If this fails, resource remains reserved

# Good: Proper compensation
do:
  - reserveResource:
      try:
        call: reserveService
      catch:
        errors: {}
        then: end
  - processResource:
      try:
        call: processService
      catch:
        errors: {}
        do:
          - releaseResource:
              call: releaseService
