Overview
Cadence has a well-defined persistence layer that abstracts database operations behind a common interface. The same server code can run against multiple database backends, giving operators flexibility in how workflow data is stored and accessed.
Persistence Architecture
Cadence divides persistence into two major subsystems:
Core Persistence: Stores workflow execution state, domain metadata, and task lists
Visibility Persistence: Stores data for workflow search and listing operations
Configuration Structure
The top-level persistence configuration looks like this:
persistence:
  defaultStore: datastore1      # Core persistence
  visibilityStore: datastore2   # Visibility persistence
  numHistoryShards: 1024        # Number of history shards
  datastores:
    datastore1:
      nosql:                    # or 'sql'
        pluginName: "cassandra"
        # ... connection params
    datastore2:
      sql:
        pluginName: "mysql"
        # ... connection params
Number of History Shards
The numHistoryShards value is set at cluster provisioning time and cannot be changed after initialization. Choose this value carefully based on expected scale.
How to choose:
Small deployments: 128-512 shards
Medium deployments: 512-2048 shards
Large deployments: 2048-4096 shards
Implications:
More shards = higher horizontal scalability
More shards = more overhead for small clusters
Each History instance should own 100-200 shards for optimal performance
Supported Databases
Cassandra
Best for: Large-scale production deployments requiring high availability
persistence:
  datastores:
    cass-default:
      nosql:
        pluginName: "cassandra"
        hosts: "127.0.0.1,127.0.0.2,127.0.0.3"
        user: "username"
        password: "password"
        keyspace: "cadence"
        datacenter: "us-east-1a"        # Optional: DC filter
        maxConns: 2                     # Connections per host
        connectTimeout: 2s
        timeout: 10s
        consistency: LOCAL_QUORUM       # Read/write consistency
        serialConsistency: LOCAL_SERIAL
Consistency Settings:
LOCAL_QUORUM: Recommended for multi-DC setups
QUORUM: Single-DC deployments
LOCAL_SERIAL: For lightweight transactions
Schema Installation:
make install-schema
# Or manually:
cadence-cassandra-tool --ep 127.0.0.1 create -k cadence --rf 3
cadence-cassandra-tool --ep 127.0.0.1 -k cadence setup-schema -v 0.0
cadence-cassandra-tool --ep 127.0.0.1 -k cadence update-schema -d ./schema/cassandra/cadence/versioned
MySQL
Best for: Medium-scale deployments with strong consistency requirements
persistence:
  datastores:
    mysql-default:
      sql:
        pluginName: "mysql"
        databaseName: "cadence"
        connectAddr: "127.0.0.1:3306"
        connectProtocol: "tcp"
        user: "uber"
        password: "uber"
        maxConns: 20
        maxIdleConns: 20
        maxConnLifetime: "1h"
        connectAttributes:
          tx_isolation: "READ-COMMITTED" # Required for MySQL 5.6
Isolation Level: READ-COMMITTED (default)
For MySQL 5.6 and below, you must explicitly specify tx_isolation: "READ-COMMITTED" in connectAttributes.
Schema Installation:
make install-schema-mysql
# Or manually:
cadence-sql-tool --ep 127.0.0.1 create --db cadence
cadence-sql-tool --ep 127.0.0.1 --db cadence setup-schema -v 0.0
cadence-sql-tool --ep 127.0.0.1 --db cadence update-schema -d ./schema/mysql/v8/cadence/versioned
User Setup:
CREATE USER 'uber'@'%' IDENTIFIED BY 'uber';
GRANT ALL PRIVILEGES ON *.* TO 'uber'@'%';
PostgreSQL
Best for: Deployments preferring open-source SQL databases
persistence:
  datastores:
    postgres-default:
      sql:
        pluginName: "postgres"
        databaseName: "cadence"
        connectAddr: "127.0.0.1:5432"
        connectProtocol: "tcp"
        user: "postgres"
        password: "cadence"
        maxConns: 20
        maxIdleConns: 20
        maxConnLifetime: "1h"
Schema Installation:
make install-schema-postgres
# Or manually:
cadence-sql-tool --plugin postgres --ep 127.0.0.1 create --db cadence
cadence-sql-tool --plugin postgres --ep 127.0.0.1 --db cadence setup-schema -v 0.0
cadence-sql-tool --plugin postgres --ep 127.0.0.1 --db cadence update-schema -d ./schema/postgres/cadence/versioned
User Setup:
psql postgres
postgres=# CREATE USER postgres WITH PASSWORD 'cadence';
postgres=# ALTER USER postgres WITH SUPERUSER;
Multiple Database Sharding (SQL)
For large-scale deployments, Cadence supports sharding across multiple SQL databases:
persistence:
  datastores:
    mysql-sharded:
      sql:
        pluginName: "mysql"
        connectProtocol: "tcp"
        maxConnLifetime: "1h"
        useMultipleDatabases: true
        nShards: 4
        multipleDatabasesConfig:
          - user: "root"
            password: "cadence"
            connectAddr: "127.0.0.1:3306"
            databaseName: "cadence0"
          - user: "root"
            password: "cadence"
            connectAddr: "127.0.0.1:3306"
            databaseName: "cadence1"
          - user: "root"
            password: "cadence"
            connectAddr: "127.0.0.1:3306"
            databaseName: "cadence2"
          - user: "root"
            password: "cadence"
            connectAddr: "127.0.0.1:3306"
            databaseName: "cadence3"
Sharding Strategy
How data is sharded:
Workflow Executions: dbShardID = historyShardID % numDBShards, where historyShardID = hash(workflowID) % numHistoryShards
History Events: dbShardID = hash(treeID) % numDBShards (treeID is usually the runID)
Tasks: dbShardID = hash(domainID + tasklistName) % numDBShards
Visibility: dbShardID = hash(domainID) % numDBShards (requires advanced visibility: Elasticsearch/Pinot)
Domain Metadata: dbShardID = 0 (not sharded)
Queues: dbShardID = 0 (not sharded)
Multiple-database mode requires advanced visibility (Elasticsearch or Pinot) due to cross-shard query limitations.
Data Model
Core Schema Tables
Domains Table
Stores domain metadata:
CREATE TABLE domains (
  id BINARY(16) NOT NULL,
  name VARCHAR(255) NOT NULL UNIQUE,
  data BLOB NOT NULL,
  data_encoding VARCHAR(16) NOT NULL,
  is_global BOOLEAN NOT NULL,
  PRIMARY KEY (id)
);
Shards Table
Tracks history shard ownership and state:
CREATE TABLE shards (
  shard_id INT NOT NULL,
  range_id BIGINT NOT NULL,
  data BLOB NOT NULL,
  data_encoding VARCHAR(16) NOT NULL,
  PRIMARY KEY (shard_id)
);
Executions Table
Stores workflow execution mutable state:
CREATE TABLE executions (
  shard_id INT NOT NULL,
  domain_id BINARY(16) NOT NULL,
  workflow_id VARCHAR(255) NOT NULL,
  run_id BINARY(16) NOT NULL,
  type INT NOT NULL, -- execution type
  data BLOB,
  data_encoding VARCHAR(16),
  PRIMARY KEY (shard_id, domain_id, workflow_id, run_id, type)
);
In Cassandra, shards, executions, and tasks are stored in the same executions table using different type values to leverage Lightweight Transactions (LWT).
History Tables
events Table: Stores immutable workflow history events
CREATE TABLE events (
  domain_id BINARY(16) NOT NULL,
  workflow_id VARCHAR(255) NOT NULL,
  run_id BINARY(16) NOT NULL,
  first_event_id BIGINT NOT NULL,
  range_id BIGINT NOT NULL,
  tx_id BIGINT NOT NULL,
  data BLOB,
  data_encoding VARCHAR(16),
  PRIMARY KEY (domain_id, workflow_id, run_id, first_event_id)
);
Task Tables
transfer_tasks: Immediate execution tasks
CREATE TABLE transfer_tasks (
  shard_id INT NOT NULL,
  task_id BIGINT NOT NULL,
  data BLOB NOT NULL,
  data_encoding VARCHAR(16) NOT NULL,
  PRIMARY KEY (shard_id, task_id)
);
timer_tasks: Time-delayed tasks
CREATE TABLE timer_tasks (
  shard_id INT NOT NULL,
  visibility_timestamp DATETIME(6) NOT NULL,
  task_id BIGINT NOT NULL,
  data BLOB NOT NULL,
  data_encoding VARCHAR(16) NOT NULL,
  PRIMARY KEY (shard_id, visibility_timestamp, task_id)
);
replication_tasks: Cross-DC replication tasks
CREATE TABLE replication_tasks (
  shard_id INT NOT NULL,
  task_id BIGINT NOT NULL,
  data BLOB NOT NULL,
  data_encoding VARCHAR(16) NOT NULL,
  PRIMARY KEY (shard_id, task_id)
);
Visibility Schema
Basic visibility (stored in the same database as core persistence):
CREATE TABLE executions_visibility (
  domain_id BINARY(16) NOT NULL,
  workflow_id VARCHAR(255) NOT NULL,
  run_id BINARY(16) NOT NULL,
  start_time DATETIME(6) NOT NULL,
  execution_time DATETIME(6) NOT NULL,
  close_time DATETIME(6),
  workflow_type_name VARCHAR(255) NOT NULL,
  status INT,
  history_length BIGINT,
  memo BLOB,
  encoding VARCHAR(16),
  task_list VARCHAR(255),
  PRIMARY KEY (domain_id, workflow_id, run_id)
);
Indexes:
CREATE INDEX by_type_start_time ON executions_visibility
  (domain_id, workflow_type_name, status, start_time DESC, run_id);
CREATE INDEX by_workflow_id_start_time ON executions_visibility
  (domain_id, workflow_id, status, start_time DESC, run_id);
CREATE INDEX by_status_by_start_time ON executions_visibility
  (domain_id, status, start_time DESC, run_id);
Advanced Visibility
For full-featured search capabilities, use Elasticsearch or Pinot:
Elasticsearch Configuration
persistence:
  visibilityStore: es-visibility
  datastores:
    es-visibility:
      elasticsearch:
        url:
          scheme: "http"
          host: "127.0.0.1:9200"
        indices:
          visibility: cadence-visibility-dev
        username: "elastic"
        password: "password"
Features:
Complex search queries
Custom search attributes
Full-text search
Aggregations
Pinot Configuration
persistence:
  visibilityStore: pinot-visibility
  datastores:
    pinot-visibility:
      pinotvisibility:
        cluster: "pinot-cluster"
        broker: "localhost:8099"
        table: "cadence-visibility"
Features:
High-performance analytics
Real-time ingestion from Kafka
OLAP queries
Typically lower latency than Elasticsearch for OLAP-style queries
Persistence Plugin Interface
Cadence supports custom database implementations through plugin interfaces:
SQL Plugin Interface
// SQL plugin interface
type Plugin interface {
	CreateDB(cfg *config.SQL) (DB, error)
	CreateAdminDB(cfg *config.SQL) (AdminDB, error)
}

type DB interface {
	PluginName() string

	// Domain operations
	InsertIntoDomain(ctx context.Context, row *DomainRow) error
	SelectFromDomain(ctx context.Context, filter *DomainFilter) (*DomainRow, error)

	// Execution operations
	InsertIntoExecutions(ctx context.Context, row *ExecutionRow) error
	SelectFromExecutions(ctx context.Context, filter *ExecutionFilter) (*ExecutionRow, error)

	// ... more operations
}
NoSQL Plugin Interface
// NoSQL plugin interface for Cassandra, DynamoDB, etc.
type Plugin interface {
	CreateDB(cfg *config.NoSQL) (DB, error)
}

type DB interface {
	PluginName() string

	// Supports multi-row conditional writes
	InsertWorkflowExecution(ctx context.Context, row *WorkflowExecutionRow) error
	SelectWorkflowExecution(ctx context.Context, shardID int,
		domainID, workflowID, runID string) (*WorkflowExecutionRow, error)

	// ... more operations
}
Requirements for Custom Databases:
SQL databases must support:
Explicit transactions
Pessimistic locking
ACID guarantees
NoSQL databases must support:
Multi-row single-shard conditional writes (like Cassandra LWT)
Strong consistency for read/write operations
Cassandra Optimization
# Recommended settings
consistency: LOCAL_QUORUM
serialConsistency: LOCAL_SERIAL
maxConns: 2        # per host
timeout: 10s
connectTimeout: 2s
Best Practices:
Use SSDs for storage
Configure appropriate compaction strategy
Monitor read/write latencies
Set up proper replication factor (RF=3 recommended)
MySQL/PostgreSQL Optimization
maxConns: 20
maxIdleConns: 20
maxConnLifetime: "1h"
Best Practices:
Enable connection pooling
Configure appropriate buffer pool size
Use read replicas for visibility queries
Regular VACUUM (PostgreSQL) or OPTIMIZE (MySQL)
Monitoring Metrics
Key persistence metrics to monitor:
Latency: P50, P99 for operations
Throughput: Requests per second
Error Rate: Failed operations
Connection Pool: Active/idle connections
Queue Depth: Pending operations
Migration and Upgrades
Schema Versioning
Cadence uses versioned schema migrations:
# Check current version
cadence-cassandra-tool --ep 127.0.0.1 -k cadence version
# Update to latest version
cadence-cassandra-tool --ep 127.0.0.1 -k cadence update-schema \
-d ./schema/cassandra/cadence/versioned
Zero-Downtime Upgrades
1. Apply schema changes (backward compatible)
2. Deploy the new Cadence version
3. Gradually roll out to all instances
4. Finalize schema migration if needed
Next Steps
Cross-DC Replication: Set up multi-region active-active deployments
Service Architecture: Learn about Cadence service components