Cadence Archival provides long-term storage for workflow histories and visibility records beyond the standard retention period. This enables compliance, auditing, and historical analysis while keeping the primary database lean.

Overview

Archival supports two types of data:
  • History Archival: Stores complete workflow event histories
  • Visibility Archival: Stores workflow metadata for querying
Both can be enabled independently and support multiple storage backends.

Supported Storage Backends

File Storage

Local or network filesystem storage.

Use Cases: Development, testing, NFS-backed storage

Configuration:
archival:
  history:
    status: "enabled"
    enableRead: true
    provider:
      filestore:
        fileMode: "0644"
        dirMode: "0755"

domainDefaults:
  archival:
    history:
      status: "enabled"
      URI: "file:///mnt/cadence/archival/history"

Amazon S3

Scalable cloud object storage.

Use Cases: Production, large-scale deployments, cloud environments

Configuration:
archival:
  history:
    status: "enabled"
    enableRead: true
    provider:
      s3store:
        region: "us-east-1"
  visibility:
    status: "enabled"
    enableRead: true
    provider:
      s3store:
        region: "us-east-1"

domainDefaults:
  archival:
    history:
      status: "enabled"
      URI: "s3://my-cadence-bucket"
    visibility:
      status: "enabled"
      URI: "s3://my-cadence-bucket"
Authentication: Uses the AWS SDK default credential chain, checked in order:
  • Environment variables (AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY)
  • AWS credentials file
  • ECS task role
  • IAM instance profile
Permissions Required:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:PutObject",
        "s3:GetObject",
        "s3:ListBucket"
      ],
      "Resource": [
        "arn:aws:s3:::my-cadence-bucket",
        "arn:aws:s3:::my-cadence-bucket/*"
      ]
    }
  ]
}

Google Cloud Storage

Google Cloud object storage.

Use Cases: GCP deployments, multi-cloud setups

Configuration:
archival:
  history:
    status: "enabled"
    enableRead: true
    provider:
      gstorage:
        credentialsPath: "/path/to/keyfile.json"
  visibility:
    status: "enabled"
    enableRead: true
    provider:
      gstorage:
        credentialsPath: "/path/to/keyfile.json"

domainDefaults:
  archival:
    history:
      status: "enabled"
      URI: "gs://my-cadence-bucket/history"
    visibility:
      status: "enabled"
      URI: "gs://my-cadence-bucket/visibility"
Authentication Priority:
  1. credentialsPath in config
  2. GOOGLE_APPLICATION_CREDENTIALS environment variable
  3. Google default credentials

Setup and Configuration

Global Configuration

Enable archival at the cluster level (config.yaml):
archival:
  history:
    status: "enabled"       # Enable history archival
    enableRead: true        # Allow reading archived data
    provider:
      s3store:
        region: "us-east-1"
        # endpoint: "http://localhost:4572"  # For localstack
        # s3ForcePathStyle: true              # For localstack
  
  visibility:
    status: "enabled"
    enableRead: true
    provider:
      s3store:
        region: "us-east-1"

domainDefaults:
  archival:
    history:
      status: "enabled"
      URI: "s3://cadence-archival"
    visibility:
      status: "enabled"
      URI: "s3://cadence-archival"

Domain Configuration

Enable archival per domain:
# Register domain with archival
cadence --do my-domain domain register \
  --global_domain false \
  --retention 7 \
  --history_archival_status enabled \
  --history_uri "s3://my-bucket/history" \
  --visibility_archival_status enabled \
  --visibility_uri "s3://my-bucket/visibility"

# Update existing domain
cadence --do my-domain domain update \
  --history_archival_status enabled \
  --history_uri "s3://my-bucket/history"

Worker Configuration

Archival workers automatically start with the history service. Configure worker pool size:
services:
  history:
    archival:
      numArchiverConcurrency: 50
      archivalConcurrency: 100
      archivalPollInterval: 100ms

Storage Structure

S3 Storage Layout

s3://my-bucket/
├── <domain-id>/
│   ├── history/
│   │   └── <workflow-id>/
│   │       └── <run-id>
│   └── visibility/
│       ├── workflowTypeName/
│       │   └── <workflow-type>/
│       │       ├── startTimeout/
│       │       │   └── <timestamp>/
│       │       │       └── <run-id>
│       │       └── closeTimeout/
│       │           └── <timestamp>/
│       │               └── <run-id>
│       └── workflowID/
│           └── <workflow-id>/
│               ├── startTimeout/<timestamp>/<run-id>
│               └── closeTimeout/<timestamp>/<run-id>
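The layout above is just a deterministic key scheme. A rough sketch of how such keys could be assembled (illustrative only, not the archiver's actual key-construction code):

```go
package main

import (
	"fmt"
	"strings"
)

// historyKey mirrors the history layout:
// <domain-id>/history/<workflow-id>/<run-id>
func historyKey(domainID, workflowID, runID string) string {
	return strings.Join([]string{domainID, "history", workflowID, runID}, "/")
}

// visibilityKey mirrors the time-indexed visibility layout, e.g.
// <domain-id>/visibility/workflowTypeName/<type>/closeTimeout/<timestamp>/<run-id>
func visibilityKey(domainID, index, value, timeKind, timestamp, runID string) string {
	return strings.Join([]string{domainID, "visibility", index, value, timeKind, timestamp, runID}, "/")
}

func main() {
	fmt.Println(historyKey("d1", "order-12345", "r1"))
	fmt.Println(visibilityKey("d1", "workflowTypeName", "OrderProcessing",
		"closeTimeout", "2026-01-21T00:00:00Z", "r1"))
}
```

Because every component is a fixed-position path segment, visibility lookups reduce to prefix listings under the relevant index directory.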

File Storage Layout

/mnt/cadence/archival/
├── <domain-id>/
│   ├── history/
│   │   └── <workflow-id>/
│   │       └── <run-id>
│   └── visibility/
│       └── (same as S3)

Querying Archived Data

Viewing Archived Workflow History

# Show archived workflow
cadence --do my-domain workflow show \
  -w workflow-id \
  -r run-id

# Show with full detail
cadence --do my-domain workflow showid workflow-id

Searching Archived Visibility

S3 Visibility Query Syntax

Supported columns:
  • WorkflowID (String) - Required if WorkflowTypeName is not specified
  • WorkflowTypeName (String) - Required if WorkflowID is not specified
  • StartTime (Date)
  • CloseTime (Date)
  • SearchPrecision (String: Day, Hour, Minute, Second)
Example Queries:
# Search by workflow ID and date
cadence --do my-domain workflow listarchived \
  -q "WorkflowID='order-12345' AND StartTime='2026-01-21T00:00:00Z' AND SearchPrecision='Day'"

# Search by workflow type
cadence --do my-domain workflow listarchived \
  -q "WorkflowTypeName='OrderProcessing' AND CloseTime='2026-01-21T00:00:00Z' AND SearchPrecision='Hour'"

# With page size
cadence --do my-domain workflow listarchived \
  -ps 50 \
  -q "WorkflowTypeName='OrderProcessing' AND StartTime='2026-01-21T00:00:00Z' AND SearchPrecision='Day'"
Limitations:
  • Only the = operator is supported
  • Date searches (StartTime/CloseTime) require SearchPrecision
  • All timestamps are interpreted as UTC
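These limitations can be checked client-side before submitting a query. A rough sketch of such a validator, assuming only `AND`-joined `key='value'` terms (this is not the server's actual parser):

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// term matches a single key='value' clause; only the = operator is allowed.
var term = regexp.MustCompile(`^\s*(\w+)\s*=\s*'([^']*)'\s*$`)

// validateArchivalQuery enforces the documented limitations: = only,
// WorkflowID or WorkflowTypeName required, and SearchPrecision required
// whenever a date column appears.
func validateArchivalQuery(q string) error {
	cols := map[string]bool{}
	for _, clause := range strings.Split(q, " AND ") {
		m := term.FindStringSubmatch(clause)
		if m == nil {
			return fmt.Errorf("unsupported clause (only key='value' allowed): %q", clause)
		}
		cols[m[1]] = true
	}
	if !cols["WorkflowID"] && !cols["WorkflowTypeName"] {
		return fmt.Errorf("WorkflowID or WorkflowTypeName is required")
	}
	if (cols["StartTime"] || cols["CloseTime"]) && !cols["SearchPrecision"] {
		return fmt.Errorf("date filters require SearchPrecision")
	}
	return nil
}

func main() {
	fmt.Println(validateArchivalQuery(
		"WorkflowID='order-12345' AND StartTime='2026-01-21T00:00:00Z' AND SearchPrecision='Day'"))
	fmt.Println(validateArchivalQuery("StartTime>'2026-01-21'")) // rejected: not =
}
```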

Google Cloud Storage Query Syntax

Same as S3, with additional considerations:
  • StartTime and CloseTime are mutually exclusive
  • Result order not guaranteed with pagination

Archival Process

History Archival Flow

  1. Workflow completes
  2. Retention period expires
  3. History scavenger identifies candidate
  4. Archival worker archives to storage
  5. History deleted from primary database
  6. Archive marker persisted

Visibility Archival Flow

  1. Workflow closes
  2. Visibility archival triggered immediately
  3. Visibility record written to archive storage
  4. After retention expires, deleted from primary database

Archival Retry

Failed archival operations are retried:
  • Non-retriable errors: Permanent failure, workflow not archived
  • Retriable errors: Exponential backoff, max attempts
  • Progress tracking: Resume from last successful point

Local Development with LocalStack

Test S3 archival locally:
# Install LocalStack
pip install localstack

# Start LocalStack
SERVICES=s3 localstack start

# Create bucket
aws --endpoint-url=http://localhost:4572 s3 mb s3://cadence-development

# Configure Cadence
archival:
  history:
    status: "enabled"
    enableRead: true
    provider:
      s3store:
        region: "us-east-1"
        endpoint: "http://127.0.0.1:4572"
        s3ForcePathStyle: true

domainDefaults:
  archival:
    history:
      status: "enabled"
      URI: "s3://cadence-development"

Implementing Custom Archivers

History Archiver Interface

type HistoryArchiver interface {
    // Archive workflow history
    Archive(context.Context, URI, *ArchiveHistoryRequest, ...ArchiveOption) error
    
    // Get archived history
    Get(context.Context, URI, *GetHistoryRequest) (*GetHistoryResponse, error)
    
    // Validate URI format
    ValidateURI(URI) error
}

Visibility Archiver Interface

type VisibilityArchiver interface {
    // Archive visibility record
    Archive(context.Context, URI, *ArchiveVisibilityRequest, ...ArchiveOption) error
    
    // Query archived records
    Query(context.Context, URI, *QueryVisibilityRequest) (*QueryVisibilityResponse, error)
    
    // Validate URI format
    ValidateURI(URI) error
}

Implementation Steps

  1. Create package under common/archiver/
  2. Implement HistoryArchiver and/or VisibilityArchiver
  3. Update provider/provider.go to register archiver
  4. Add configuration struct to common/config.go
  5. Write unit tests
  6. Document URI format and query syntax
See filestore, s3store, or gcloud for examples.
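As a starting point, ValidateURI is usually the first method a new backend needs. A self-contained skeleton, assuming a hypothetical "mystore" scheme (the real interface takes Cadence's archiver.URI type rather than a string):

```go
package main

import (
	"fmt"
	"net/url"
)

// myArchiver is a skeleton for a custom archiver backend.
type myArchiver struct {
	scheme string // e.g. "mystore"
}

// ValidateURI accepts URIs of the form <scheme>://bucket/prefix.
func (a *myArchiver) ValidateURI(uri string) error {
	u, err := url.Parse(uri)
	if err != nil {
		return err
	}
	if u.Scheme != a.scheme {
		return fmt.Errorf("expected scheme %q, got %q", a.scheme, u.Scheme)
	}
	if u.Host == "" {
		return fmt.Errorf("URI must include a bucket/host: %s", uri)
	}
	return nil
}

func main() {
	a := &myArchiver{scheme: "mystore"}
	fmt.Println(a.ValidateURI("mystore://archive-bucket/history")) // accepted
	fmt.Println(a.ValidateURI("s3://archive-bucket"))              // wrong scheme
}
```

Archive, Get, and Query would then wrap your storage client's put/get/list calls using a key scheme like the storage layouts shown earlier.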

Best Practices

Storage Management

  • Bucket Organization: Use separate buckets or prefixes per environment
  • Lifecycle Policies: Configure object expiration for cost management
  • Access Control: Restrict bucket access with IAM policies
  • Encryption: Enable server-side encryption (SSE-S3 or SSE-KMS)

Performance

  • Concurrency: Tune archival worker concurrency for throughput
  • Retention Period: Balance storage costs vs. archival load
  • Query Optimization: Use specific filters to reduce scan scope
  • Batch Retrieval: Retrieve multiple workflows in parallel

Monitoring

  • Archival Rate: Monitor workflows archived per second
  • Archival Latency: Track time from workflow completion to archive
  • Failure Rate: Alert on archival failures
  • Storage Growth: Monitor bucket size and object count

Disaster Recovery

  • Replication: Enable cross-region replication for archives
  • Backup: Regularly backup archival storage
  • Retention: Align archival retention with compliance requirements
  • Testing: Periodically test archival retrieval

Troubleshooting

Archival Not Working

Problem: Workflows are not being archived.

Solution:
# Check domain configuration
cadence --do my-domain domain describe

# Verify archival status
cadence admin domain get-archival-state --domain my-domain

# Check scavenger logs
grep "archival" /var/log/cadence/history.log

# Test storage connectivity
aws s3 ls s3://my-bucket/

Cannot Read Archived Workflows

Problem: Archived workflow retrieval fails.

Solution:
  • Verify enableRead: true in config
  • Check storage permissions (GetObject)
  • Validate URI format
  • Ensure workflow was actually archived
  • Check for storage backend errors

High Archival Latency

Problem: Archival processing is slow.

Solution:
  • Increase numArchiverConcurrency
  • Scale history service horizontally
  • Optimize storage backend (e.g., S3 request rate limits)
  • Review workflow history size (large histories are slower)
