Overview
Cadence has a well-defined persistence layer that abstracts database operations behind a common interface. The same server code can run against multiple database backends, giving operators flexibility in how workflow data is stored and accessed.
Persistence Architecture
Cadence divides persistence into two major subsystems:
Core Persistence: Stores workflow execution state, domain metadata, and task lists
Visibility Persistence: Stores data for workflow search and listing operations
Configuration Structure
The top-level persistence configuration looks like this:
persistence:
  defaultStore: datastore1      # Core persistence
  visibilityStore: datastore2   # Visibility persistence
  numHistoryShards: 1024        # Number of history shards
  datastores:
    datastore1:
      nosql:                    # or 'sql'
        pluginName: "cassandra"
        # ... connection params
    datastore2:
      sql:
        pluginName: "mysql"
        # ... connection params
Number of History Shards
The numHistoryShards value is set at cluster provisioning time and cannot be changed after initialization. Choose this value carefully based on expected scale.
How to choose:
Small deployments: 128-512 shards
Medium deployments: 512-2048 shards
Large deployments: 2048-4096 shards
Implications:
More shards = higher horizontal scalability
More shards = more overhead for small clusters
Each History instance should own 100-200 shards for optimal performance
Supported Databases
Cassandra
Best for: Large-scale production deployments requiring high availability
persistence:
  datastores:
    cass-default:
      nosql:
        pluginName: "cassandra"
        hosts: "127.0.0.1,127.0.0.2,127.0.0.3"
        user: "username"
        password: "password"
        keyspace: "cadence"
        datacenter: "us-east-1a"        # Optional: DC filter
        maxConns: 2                     # Connections per host
        connectTimeout: 2s
        timeout: 10s
        consistency: LOCAL_QUORUM       # Read/write consistency
        serialConsistency: LOCAL_SERIAL
Consistency Settings:
LOCAL_QUORUM: Recommended for multi-DC setups
QUORUM: Single-DC deployments
LOCAL_SERIAL: For lightweight transactions
Schema Installation:
make install-schema
# Or manually:
cadence-cassandra-tool --ep 127.0.0.1 create -k cadence --rf 3
cadence-cassandra-tool --ep 127.0.0.1 -k cadence setup-schema -v 0.0
cadence-cassandra-tool --ep 127.0.0.1 -k cadence update-schema -d ./schema/cassandra/cadence/versioned
MySQL
Best for: Medium-scale deployments with strong consistency requirements
persistence:
  datastores:
    mysql-default:
      sql:
        pluginName: "mysql"
        databaseName: "cadence"
        connectAddr: "127.0.0.1:3306"
        connectProtocol: "tcp"
        user: "uber"
        password: "uber"
        maxConns: 20
        maxIdleConns: 20
        maxConnLifetime: "1h"
        connectAttributes:
          tx_isolation: "READ-COMMITTED" # Required for MySQL 5.6
Isolation Level: READ-COMMITTED (default)
For MySQL 5.6 and below, you must explicitly specify tx_isolation: "READ-COMMITTED" in connectAttributes.
Schema Installation:
make install-schema-mysql
# Or manually:
cadence-sql-tool --ep 127.0.0.1 create --db cadence
cadence-sql-tool --ep 127.0.0.1 --db cadence setup-schema -v 0.0
cadence-sql-tool --ep 127.0.0.1 --db cadence update-schema -d ./schema/mysql/v8/cadence/versioned
User Setup:
CREATE USER 'uber'@'%' IDENTIFIED BY 'uber';
GRANT ALL PRIVILEGES ON *.* TO 'uber'@'%';
PostgreSQL
Best for: Deployments preferring open-source SQL databases
persistence:
  datastores:
    postgres-default:
      sql:
        pluginName: "postgres"
        databaseName: "cadence"
        connectAddr: "127.0.0.1:5432"
        connectProtocol: "tcp"
        user: "postgres"
        password: "cadence"
        maxConns: 20
        maxIdleConns: 20
        maxConnLifetime: "1h"
Schema Installation:
make install-schema-postgres
# Or manually:
cadence-sql-tool --plugin postgres --ep 127.0.0.1 create --db cadence
cadence-sql-tool --plugin postgres --ep 127.0.0.1 --db cadence setup-schema -v 0.0
cadence-sql-tool --plugin postgres --ep 127.0.0.1 --db cadence update-schema -d ./schema/postgres/cadence/versioned
User Setup:
psql postgres
postgres=# CREATE USER postgres WITH PASSWORD 'cadence';
postgres=# ALTER USER postgres WITH SUPERUSER;
Multiple Database Sharding (SQL)
For large-scale deployments, Cadence supports sharding across multiple SQL databases:
persistence:
  datastores:
    mysql-sharded:
      sql:
        pluginName: "mysql"
        connectProtocol: "tcp"
        maxConnLifetime: "1h"
        useMultipleDatabases: true
        nShards: 4
        multipleDatabasesConfig:
          - user: "root"
            password: "cadence"
            connectAddr: "127.0.0.1:3306"
            databaseName: "cadence0"
          - user: "root"
            password: "cadence"
            connectAddr: "127.0.0.1:3306"
            databaseName: "cadence1"
          - user: "root"
            password: "cadence"
            connectAddr: "127.0.0.1:3306"
            databaseName: "cadence2"
          - user: "root"
            password: "cadence"
            connectAddr: "127.0.0.1:3306"
            databaseName: "cadence3"
Sharding Strategy
How data is sharded:
Workflow Executions: dbShardID = historyShardID % numDBShards, where historyShardID = hash(workflowID) % numHistoryShards
History Events: dbShardID = hash(treeID) % numDBShards (treeID is usually the runID)
Tasks: dbShardID = hash(domainID + tasklistName) % numDBShards
Visibility: dbShardID = hash(domainID) % numDBShards (requires advanced visibility: Elasticsearch/Pinot)
Domain Metadata: dbShardID = 0 (not sharded)
Queues: dbShardID = 0 (not sharded)
Multiple-database mode requires advanced visibility (Elasticsearch or Pinot) due to cross-shard query limitations.
Data Model
Core Schema Tables
Domains Table
Stores domain metadata:
CREATE TABLE domains (
  id BINARY(16) NOT NULL,
  name VARCHAR(255) NOT NULL UNIQUE,
  data BLOB NOT NULL,
  data_encoding VARCHAR(16) NOT NULL,
  is_global BOOLEAN NOT NULL,
  PRIMARY KEY (id)
);
Shards Table
Tracks history shard ownership and state:
CREATE TABLE shards (
  shard_id INT NOT NULL,
  range_id BIGINT NOT NULL,
  data BLOB NOT NULL,
  data_encoding VARCHAR(16) NOT NULL,
  PRIMARY KEY (shard_id)
);
Executions Table
Stores workflow execution mutable state:
CREATE TABLE executions (
  shard_id INT NOT NULL,
  domain_id BINARY(16) NOT NULL,
  workflow_id VARCHAR(255) NOT NULL,
  run_id BINARY(16) NOT NULL,
  type INT NOT NULL, -- execution type
  data BLOB,
  data_encoding VARCHAR(16),
  PRIMARY KEY (shard_id, domain_id, workflow_id, run_id, type)
);
In Cassandra, shards, executions, and tasks are stored in the same executions table using different type values to leverage Lightweight Transactions (LWT).
History Tables
events Table: Stores immutable workflow history events
CREATE TABLE events (
  domain_id BINARY(16) NOT NULL,
  workflow_id VARCHAR(255) NOT NULL,
  run_id BINARY(16) NOT NULL,
  first_event_id BIGINT NOT NULL,
  range_id BIGINT NOT NULL,
  tx_id BIGINT NOT NULL,
  data BLOB,
  data_encoding VARCHAR(16),
  PRIMARY KEY (domain_id, workflow_id, run_id, first_event_id)
);
Task Tables
transfer_tasks: Immediate execution tasks
CREATE TABLE transfer_tasks (
  shard_id INT NOT NULL,
  task_id BIGINT NOT NULL,
  data BLOB NOT NULL,
  data_encoding VARCHAR(16) NOT NULL,
  PRIMARY KEY (shard_id, task_id)
);
timer_tasks: Time-delayed tasks
CREATE TABLE timer_tasks (
  shard_id INT NOT NULL,
  visibility_timestamp DATETIME(6) NOT NULL,
  task_id BIGINT NOT NULL,
  data BLOB NOT NULL,
  data_encoding VARCHAR(16) NOT NULL,
  PRIMARY KEY (shard_id, visibility_timestamp, task_id)
);
replication_tasks: Cross-DC replication tasks
CREATE TABLE replication_tasks (
  shard_id INT NOT NULL,
  task_id BIGINT NOT NULL,
  data BLOB NOT NULL,
  data_encoding VARCHAR(16) NOT NULL,
  PRIMARY KEY (shard_id, task_id)
);
Visibility Schema
Basic visibility (stored in the same database as core persistence):
CREATE TABLE executions_visibility (
  domain_id BINARY(16) NOT NULL,
  workflow_id VARCHAR(255) NOT NULL,
  run_id BINARY(16) NOT NULL,
  start_time DATETIME(6) NOT NULL,
  execution_time DATETIME(6) NOT NULL,
  close_time DATETIME(6),
  workflow_type_name VARCHAR(255) NOT NULL,
  status INT,
  history_length BIGINT,
  memo BLOB,
  encoding VARCHAR(16),
  task_list VARCHAR(255),
  PRIMARY KEY (domain_id, workflow_id, run_id)
);
Indexes:
CREATE INDEX by_type_start_time ON executions_visibility
  (domain_id, workflow_type_name, status, start_time DESC, run_id);
CREATE INDEX by_workflow_id_start_time ON executions_visibility
  (domain_id, workflow_id, status, start_time DESC, run_id);
CREATE INDEX by_status_by_start_time ON executions_visibility
  (domain_id, status, start_time DESC, run_id);
Advanced Visibility
For full-featured search capabilities, use Elasticsearch or Pinot:
Elasticsearch Configuration
persistence:
  visibilityStore: es-visibility
  datastores:
    es-visibility:
      elasticsearch:
        url:
          scheme: "http"
          host: "127.0.0.1:9200"
        indices:
          visibility: cadence-visibility-dev
        username: "elastic"
        password: "password"
Features:
Complex search queries
Custom search attributes
Full-text search
Aggregations
Pinot Configuration
persistence:
  visibilityStore: pinot-visibility
  datastores:
    pinot-visibility:
      pinotvisibility:
        cluster: "pinot-cluster"
        broker: "localhost:8099"
        table: "cadence-visibility"
Features:
High-performance analytics
Real-time ingestion from Kafka
OLAP queries
Typically lower latency than Elasticsearch for OLAP-style queries
Persistence Plugin Interface
Cadence supports custom database implementations through plugin interfaces:
SQL Plugin Interface
// SQL plugin interface
type Plugin interface {
	CreateDB(cfg *config.SQL) (DB, error)
	CreateAdminDB(cfg *config.SQL) (AdminDB, error)
}

type DB interface {
	PluginName() string

	// Domain operations
	InsertIntoDomain(ctx context.Context, row *DomainRow) error
	SelectFromDomain(ctx context.Context, filter *DomainFilter) (*DomainRow, error)

	// Execution operations
	InsertIntoExecutions(ctx context.Context, row *ExecutionRow) error
	SelectFromExecutions(ctx context.Context, filter *ExecutionFilter) (*ExecutionRow, error)

	// ... more operations
}
NoSQL Plugin Interface
// NoSQL plugin interface for Cassandra, DynamoDB, etc.
type Plugin interface {
	CreateDB(cfg *config.NoSQL) (DB, error)
}

type DB interface {
	PluginName() string

	// Supports multi-row conditional writes
	InsertWorkflowExecution(ctx context.Context, row *WorkflowExecutionRow) error
	SelectWorkflowExecution(ctx context.Context, shardID int,
		domainID, workflowID, runID string) (*WorkflowExecutionRow, error)

	// ... more operations
}
Requirements for Custom Databases:
SQL databases must support:
Explicit transactions
Pessimistic locking
ACID guarantees
NoSQL databases must support:
Multi-row single-shard conditional writes (like Cassandra LWT)
Strong consistency for read/write operations
Cassandra Optimization
# Recommended settings
consistency: LOCAL_QUORUM
serialConsistency: LOCAL_SERIAL
maxConns: 2        # per host
timeout: 10s
connectTimeout: 2s
Best Practices:
Use SSDs for storage
Configure appropriate compaction strategy
Monitor read/write latencies
Set up proper replication factor (RF=3 recommended)
MySQL/PostgreSQL Optimization
maxConns: 20
maxIdleConns: 20
maxConnLifetime: "1h"
Best Practices:
Enable connection pooling
Configure appropriate buffer pool size
Use read replicas for visibility queries
Regular VACUUM (PostgreSQL) or OPTIMIZE (MySQL)
Monitoring Metrics
Key persistence metrics to monitor:
Latency: P50, P99 for operations
Throughput: Requests per second
Error Rate: Failed operations
Connection Pool: Active/idle connections
Queue Depth: Pending operations
Migration and Upgrades
Schema Versioning
Cadence uses versioned schema migrations:
# Check current version
cadence-cassandra-tool --ep 127.0.0.1 -k cadence version
# Update to latest version
cadence-cassandra-tool --ep 127.0.0.1 -k cadence update-schema \
-d ./schema/cassandra/cadence/versioned
Zero-Downtime Upgrades
1. Apply schema changes (backward compatible)
2. Deploy the new Cadence version
3. Gradually roll out to all instances
4. Finalize schema migration if needed
Next Steps
Cross-DC Replication: Set up multi-region active-active deployments
Service Architecture: Learn about Cadence service components