Overview

Cadence has a well-defined persistence layer that abstracts all database operations behind a common interface. The system supports multiple database backends, giving operators flexibility in how data is stored and accessed.

Persistence Architecture

Cadence divides persistence into two major subsystems:
  1. Core Persistence: Stores workflow execution state, domain metadata, and task lists
  2. Visibility Persistence: Stores data for workflow search and listing operations

Configuration Structure

The top-level persistence configuration looks like this:
persistence:
  defaultStore: datastore1      # Core persistence
  visibilityStore: datastore2   # Visibility persistence
  numHistoryShards: 1024        # Number of history shards
  datastores:
    datastore1:
      nosql:                      # or 'sql'
        pluginName: "cassandra"
        # ... connection params
    datastore2:
      sql:
        pluginName: "mysql"
        # ... connection params

Number of History Shards

The numHistoryShards value is set at cluster provisioning time and cannot be changed after initialization. Choose this value carefully based on expected scale.
How to choose:
  • Small deployments: 128-512 shards
  • Medium deployments: 512-2048 shards
  • Large deployments: 2048-4096 shards
Implications:
  • More shards = higher horizontal scalability
  • More shards = more overhead for small clusters
  • Each History instance should own 100-200 shards for optimal performance

Supported Databases

Cassandra

Best for: Large-scale production deployments requiring high availability
persistence:
  datastores:
    cass-default:
      nosql:
        pluginName: "cassandra"
        hosts: "127.0.0.1,127.0.0.2,127.0.0.3"
        user: "username"
        password: "password"
        keyspace: "cadence"
        datacenter: "us-east-1a"     # Optional: DC filter
        maxConns: 2                  # Connections per host
        connectTimeout: 2s
        timeout: 10s
        consistency: LOCAL_QUORUM    # Read/write consistency
        serialConsistency: LOCAL_SERIAL
Consistency Settings:
  • LOCAL_QUORUM: Recommended for multi-DC setups
  • QUORUM: For single-DC deployments
  • LOCAL_SERIAL: For lightweight transactions
Schema Installation:
make install-schema
# Or manually:
cadence-cassandra-tool --ep 127.0.0.1 create -k cadence --rf 3
cadence-cassandra-tool --ep 127.0.0.1 -k cadence setup-schema -v 0.0
cadence-cassandra-tool --ep 127.0.0.1 -k cadence update-schema -d ./schema/cassandra/cadence/versioned

MySQL

Best for: Medium-scale deployments with strong consistency requirements
persistence:
  datastores:
    mysql-default:
      sql:
        pluginName: "mysql"
        databaseName: "cadence"
        connectAddr: "127.0.0.1:3306"
        connectProtocol: "tcp"
        user: "uber"
        password: "uber"
        maxConns: 20
        maxIdleConns: 20
        maxConnLifetime: "1h"
        connectAttributes:
          tx_isolation: "READ-COMMITTED"  # Required for MySQL 5.6 and below
Isolation Level: READ-COMMITTED (default)
For MySQL 5.6 and below, you must explicitly specify tx_isolation: "READ-COMMITTED" in connectAttributes.
Schema Installation:
make install-schema-mysql
# Or manually:
cadence-sql-tool --ep 127.0.0.1 create --db cadence
cadence-sql-tool --ep 127.0.0.1 --db cadence setup-schema -v 0.0
cadence-sql-tool --ep 127.0.0.1 --db cadence update-schema -d ./schema/mysql/v8/cadence/versioned
User Setup:
CREATE USER 'uber'@'%' IDENTIFIED BY 'uber';
GRANT ALL PRIVILEGES ON *.* TO 'uber'@'%';

PostgreSQL

Best for: Deployments preferring open-source SQL databases
persistence:
  datastores:
    postgres-default:
      sql:
        pluginName: "postgres"
        databaseName: "cadence"
        connectAddr: "127.0.0.1:5432"
        connectProtocol: "tcp"
        user: "postgres"
        password: "cadence"
        maxConns: 20
        maxIdleConns: 20
        maxConnLifetime: "1h"
Schema Installation:
make install-schema-postgres
# Or manually:
cadence-sql-tool --plugin postgres --ep 127.0.0.1 create --db cadence
cadence-sql-tool --plugin postgres --ep 127.0.0.1 --db cadence setup-schema -v 0.0
cadence-sql-tool --plugin postgres --ep 127.0.0.1 --db cadence update-schema -d ./schema/postgres/cadence/versioned
User Setup:
psql postgres
postgres=# CREATE USER postgres WITH PASSWORD 'cadence';
postgres=# ALTER USER postgres WITH SUPERUSER;

Multiple Database Sharding (SQL)

For large-scale deployments, Cadence supports sharding across multiple SQL databases:
persistence:
  datastores:
    mysql-sharded:
      sql:
        pluginName: "mysql"
        connectProtocol: "tcp"
        maxConnLifetime: "1h"
        useMultipleDatabases: true
        nShards: 4
        multipleDatabasesConfig:
          - user: "root"
            password: "cadence"
            connectAddr: "127.0.0.1:3306"
            databaseName: "cadence0"
          - user: "root"
            password: "cadence"
            connectAddr: "127.0.0.1:3306"
            databaseName: "cadence1"
          - user: "root"
            password: "cadence"
            connectAddr: "127.0.0.1:3306"
            databaseName: "cadence2"
          - user: "root"
            password: "cadence"
            connectAddr: "127.0.0.1:3306"
            databaseName: "cadence3"

Sharding Strategy

How data is sharded:
  1. Workflow Executions: dbShardID = historyShardID % numDBShards
    • Where historyShardID = hash(workflowID) % numHistoryShards
  2. History Events: dbShardID = hash(treeID) % numDBShards
    • treeID is usually the runID
  3. Tasks: dbShardID = hash(domainID + tasklistName) % numDBShards
  4. Visibility: dbShardID = hash(domainID) % numDBShards
    • Requires advanced visibility (Elasticsearch/Pinot)
  5. Domain Metadata: dbShardID = 0 (not sharded)
  6. Queues: dbShardID = 0 (not sharded)
Multiple database mode requires advanced visibility (Elasticsearch or Pinot) due to cross-shard query limitations.

Data Model

Core Schema Tables

Domains Table

Stores domain metadata:
CREATE TABLE domains (
    id BINARY(16) NOT NULL,
    name VARCHAR(255) NOT NULL UNIQUE,
    data BLOB NOT NULL,
    data_encoding VARCHAR(16) NOT NULL,
    is_global BOOLEAN NOT NULL,
    PRIMARY KEY (id)
);

Shards Table

Tracks history shard ownership and state:
CREATE TABLE shards (
    shard_id INT NOT NULL,
    range_id BIGINT NOT NULL,
    data BLOB NOT NULL,
    data_encoding VARCHAR(16) NOT NULL,
    PRIMARY KEY (shard_id)
);

Executions Table

Stores workflow execution mutable state:
CREATE TABLE executions (
    shard_id INT NOT NULL,
    domain_id BINARY(16) NOT NULL,
    workflow_id VARCHAR(255) NOT NULL,
    run_id BINARY(16) NOT NULL,
    type INT NOT NULL,  -- execution type
    data BLOB,
    data_encoding VARCHAR(16),
    PRIMARY KEY (shard_id, domain_id, workflow_id, run_id, type)
);
In Cassandra, shards, executions, and tasks are stored in the same executions table using different type values to leverage Lightweight Transactions (LWT).

History Tables

events Table: Stores immutable workflow history events
CREATE TABLE events (
    domain_id BINARY(16) NOT NULL,
    workflow_id VARCHAR(255) NOT NULL,
    run_id BINARY(16) NOT NULL,
    first_event_id BIGINT NOT NULL,
    range_id BIGINT NOT NULL,
    tx_id BIGINT NOT NULL,
    data BLOB,
    data_encoding VARCHAR(16),
    PRIMARY KEY (domain_id, workflow_id, run_id, first_event_id)
);

Task Tables

transfer_tasks: Immediate execution tasks
CREATE TABLE transfer_tasks (
    shard_id INT NOT NULL,
    task_id BIGINT NOT NULL,
    data BLOB NOT NULL,
    data_encoding VARCHAR(16) NOT NULL,
    PRIMARY KEY (shard_id, task_id)
);
timer_tasks: Time-delayed tasks
CREATE TABLE timer_tasks (
    shard_id INT NOT NULL,
    visibility_timestamp DATETIME(6) NOT NULL,
    task_id BIGINT NOT NULL,
    data BLOB NOT NULL,
    data_encoding VARCHAR(16) NOT NULL,
    PRIMARY KEY (shard_id, visibility_timestamp, task_id)
);
replication_tasks: Cross-DC replication tasks
CREATE TABLE replication_tasks (
    shard_id INT NOT NULL,
    task_id BIGINT NOT NULL,
    data BLOB NOT NULL,
    data_encoding VARCHAR(16) NOT NULL,
    PRIMARY KEY (shard_id, task_id)
);

Visibility Schema

Basic visibility (stored in the same database as core persistence):
CREATE TABLE executions_visibility (
    domain_id BINARY(16) NOT NULL,
    workflow_id VARCHAR(255) NOT NULL,
    run_id BINARY(16) NOT NULL,
    start_time DATETIME(6) NOT NULL,
    execution_time DATETIME(6) NOT NULL,
    close_time DATETIME(6),
    workflow_type_name VARCHAR(255) NOT NULL,
    status INT,
    history_length BIGINT,
    memo BLOB,
    encoding VARCHAR(16),
    task_list VARCHAR(255),
    PRIMARY KEY (domain_id, workflow_id, run_id)
);
Indexes:
CREATE INDEX by_type_start_time ON executions_visibility 
    (domain_id, workflow_type_name, status, start_time DESC, run_id);

CREATE INDEX by_workflow_id_start_time ON executions_visibility
    (domain_id, workflow_id, status, start_time DESC, run_id);

CREATE INDEX by_status_by_start_time ON executions_visibility
    (domain_id, status, start_time DESC, run_id);

Advanced Visibility

For full-featured search capabilities, use Elasticsearch or Pinot:

Elasticsearch Configuration

persistence:
  visibilityStore: es-visibility
  datastores:
    es-visibility:
      elasticsearch:
        url:
          scheme: "http"
          host: "127.0.0.1:9200"
        indices:
          visibility: cadence-visibility-dev
        username: "elastic"
        password: "password"
Features:
  • Complex search queries
  • Custom search attributes
  • Full-text search
  • Aggregations

Pinot Configuration

persistence:
  visibilityStore: pinot-visibility
  datastores:
    pinot-visibility:
      pinotvisibility:
        cluster: "pinot-cluster"
        broker: "localhost:8099"
        table: "cadence-visibility"
Features:
  • High-performance analytics
  • Real-time ingestion from Kafka
  • OLAP queries
  • Lower latency than Elasticsearch

Persistence Plugin Interface

Cadence supports custom database implementations through plugin interfaces:

SQL Plugin Interface

// SQL plugin interface
type Plugin interface {
    CreateDB(cfg *config.SQL) (DB, error)
    CreateAdminDB(cfg *config.SQL) (AdminDB, error)
}

type DB interface {
    PluginName() string
    // Domain operations
    InsertIntoDomain(ctx context.Context, row *DomainRow) error
    SelectFromDomain(ctx context.Context, filter *DomainFilter) (*DomainRow, error)
    // Execution operations
    InsertIntoExecutions(ctx context.Context, row *ExecutionRow) error
    SelectFromExecutions(ctx context.Context, filter *ExecutionFilter) (*ExecutionRow, error)
    // ... more operations
}

NoSQL Plugin Interface

// NoSQL plugin interface for Cassandra, DynamoDB, etc.
type Plugin interface {
    CreateDB(cfg *config.NoSQL) (DB, error)
}

type DB interface {
    PluginName() string
    // Supports multi-row conditional writes
    InsertWorkflowExecution(ctx context.Context, row *WorkflowExecutionRow) error
    SelectWorkflowExecution(ctx context.Context, shardID int, 
        domainID, workflowID, runID string) (*WorkflowExecutionRow, error)
    // ... more operations
}
Requirements for Custom Databases:
SQL databases must support:
  • Explicit transactions
  • Pessimistic locking
  • ACID guarantees
NoSQL databases must support:
  • Multi-row single-shard conditional writes (like Cassandra LWT)
  • Strong consistency for read/write operations

Database Performance Tuning

Cassandra Optimization

# Recommended settings
consistency: LOCAL_QUORUM
serialConsistency: LOCAL_SERIAL
maxConns: 2  # per host
timeout: 10s
connectTimeout: 2s
Best Practices:
  • Use SSDs for storage
  • Configure appropriate compaction strategy
  • Monitor read/write latencies
  • Set up proper replication factor (RF=3 recommended)

MySQL/PostgreSQL Optimization

maxConns: 20
maxIdleConns: 20
maxConnLifetime: "1h"
Best Practices:
  • Enable connection pooling
  • Configure appropriate buffer pool size
  • Use read replicas for visibility queries
  • Regular VACUUM (PostgreSQL) or OPTIMIZE (MySQL)

Monitoring Metrics

Key persistence metrics to monitor:
  • Latency: P50, P99 for operations
  • Throughput: Requests per second
  • Error Rate: Failed operations
  • Connection Pool: Active/idle connections
  • Queue Depth: Pending operations

Migration and Upgrades

Schema Versioning

Cadence uses versioned schema migrations:
# Check current version
cadence-cassandra-tool --ep 127.0.0.1 -k cadence version

# Update to latest version
cadence-cassandra-tool --ep 127.0.0.1 -k cadence update-schema \
    -d ./schema/cassandra/cadence/versioned

Zero-Downtime Upgrades

  1. Apply schema changes (backward compatible)
  2. Deploy new Cadence version
  3. Gradually roll out to all instances
  4. Finalize schema migration if needed

Next Steps

Cross-DC Replication

Set up multi-region active-active deployments

Service Architecture

Learn about Cadence service components
