Temporal Server persists workflow state, history, and task queues in a database. This guide covers persistence layer operations and optimization.

Persistence Architecture

Temporal uses two types of data stores:
  1. Default Store - Core workflow data, task queues, and system state
  2. Visibility Store - Workflow search and list operations

Data Store Types

Supported databases:
  • Cassandra - Horizontally scalable, high throughput
  • PostgreSQL - ACID compliant, strong consistency
  • MySQL - ACID compliant, widespread support
  • SQLite - Development and testing only

Configuration

Basic Setup

persistence:
  defaultStore: "cassandra-default"
  visibilityStore: "elasticsearch-visibility"
  numHistoryShards: 4096
  datastores:
    cassandra-default:
      cassandra:
        hosts: "127.0.0.1"
        port: 9042
        keyspace: "temporal"
        user: "temporal_user"
        password: "${DB_PASSWORD}"
    elasticsearch-visibility:
      elasticsearch:
        url: "http://elasticsearch:9200"
        indices:
          visibility: "temporal_visibility_v1"

History Shards

Workflow executions are sharded across multiple partitions:
persistence:
  numHistoryShards: 4096  # Typically a power of 2 (512, 1024, ..., 16384); fixed at cluster creation
Shard Selection:
  • Based on workflow ID hash
  • Immutable after cluster creation
  • Higher count = better parallelism
Recommended Shard Counts:
  • Development: 1-4
  • Small production: 128-512
  • Medium production: 1024-2048
  • Large production: 4096-16384
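Conceptually, shard selection works like a stable hash of the workflow ID taken modulo the shard count. A minimal sketch (illustrative only; Temporal's actual hash function differs):

```python
# Illustrative sketch of shard selection; Temporal's real implementation
# uses its own hash, but the idea is the same: a deterministic hash of
# the workflow ID, taken modulo numHistoryShards.
import zlib

def shard_for(workflow_id: str, num_history_shards: int) -> int:
    """Map a workflow ID to a shard in [1, num_history_shards]."""
    digest = zlib.crc32(workflow_id.encode("utf-8"))
    return (digest % num_history_shards) + 1

# Deterministic: the same workflow always maps to the same shard, which is
# why the shard count cannot change after cluster creation.
assert shard_for("order-12345", 4096) == shard_for("order-12345", 4096)
```

Because every stored execution's shard is derived from this mapping, changing numHistoryShards would orphan existing data, so pick the count for the cluster's target size up front.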

Cassandra Configuration

Connection Settings

persistence:
  datastores:
    default:
      cassandra:
        hosts: "cassandra-0,cassandra-1,cassandra-2"
        port: 9042
        user: "temporal_user"
        password: "${CASSANDRA_PASSWORD}"
        keyspace: "temporal"
        datacenter: "datacenter1"
        maxConns: 20
        connectTimeout: "600ms"
        timeout: "10s"
        writeTimeout: "10s"
Connection Pool:
  • maxConns - Maximum connections per host (default: 2)
  • Recommended: 10-20 for high throughput
  • Total connections = maxConns × number of hosts × number of history nodes
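As a quick sanity check, the total-connection formula above can be computed directly (the numbers here are illustrative):

```python
# Total Cassandra connections opened by the history service, per the
# formula above: maxConns × number of hosts × number of history nodes.
def total_connections(max_conns: int, num_hosts: int, num_history_nodes: int) -> int:
    return max_conns * num_hosts * num_history_nodes

# e.g. maxConns=20 against a 3-host Cassandra cluster from 16 history nodes:
print(total_connections(20, 3, 16))  # 960
```

Check the result against the database's own connection limits before raising maxConns.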

Consistency Configuration

persistence:
  datastores:
    default:
      cassandra:
        consistency:
          default:
            consistency: "LOCAL_QUORUM"      # Read/write consistency
            serialConsistency: "LOCAL_SERIAL" # Serial consistency
Consistency Levels:
  • LOCAL_QUORUM - Majority of replicas in local datacenter (recommended)
  • QUORUM - Majority across all datacenters
  • ONE - Single replica (not recommended)
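For reference, quorum size follows directly from the replication factor: floor(RF/2) + 1 replicas must acknowledge each operation. A quick sketch:

```python
# Quorum math: with replication factor RF, a quorum is floor(RF/2) + 1.
# LOCAL_QUORUM applies the same formula within a single datacenter.
def quorum(replication_factor: int) -> int:
    return replication_factor // 2 + 1

print(quorum(3))  # 2 -- an RF=3 keyspace stays available with one replica down
print(quorum(5))  # 3
```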

TLS Configuration

persistence:
  datastores:
    default:
      cassandra:
        tls:
          enabled: true
          certFile: "/path/to/client-cert.pem"
          keyFile: "/path/to/client-key.pem"
          caFile: "/path/to/ca-cert.pem"
          enableHostVerification: true
          serverName: "cassandra.example.com"

Address Translation

For environments where Cassandra returns non-routable IPs:
persistence:
  datastores:
    default:
      cassandra:
        addressTranslator:
          translator: "aws"  # or "gcp", "azure"
          options:
            region: "us-east-1"

Cassandra Best Practices

  1. Replication Factor: 3 minimum for production
  2. Compaction Strategy: LeveledCompactionStrategy for temporal tables
  3. Read Repair: Disabled for better performance
  4. Monitoring: Track read/write latency, compaction lag
  5. Separate Clusters: Use different clusters for default and visibility

SQL Configuration (PostgreSQL/MySQL)

PostgreSQL

persistence:
  datastores:
    default:
      sql:
        pluginName: "postgres12"  # or "postgres12_pgx"
        databaseName: "temporal"
        connectAddr: "postgres.example.com:5432"
        connectProtocol: "tcp"
        user: "temporal_user"
        password: "${DB_PASSWORD}"
        maxConns: 100
        maxIdleConns: 20
        maxConnLifetime: "1h"
        connectAttributes:
          sslmode: "require"

MySQL

persistence:
  datastores:
    default:
      sql:
        pluginName: "mysql8"
        databaseName: "temporal"
        connectAddr: "mysql.example.com:3306"
        connectProtocol: "tcp"
        user: "temporal_user"
        password: "${DB_PASSWORD}"
        maxConns: 100
        maxIdleConns: 20
        maxConnLifetime: "1h"
        connectAttributes:
          transaction_isolation: "READ-COMMITTED"  # tx_isolation was removed in MySQL 8.0
          parseTime: "true"

Connection Pool Tuning

persistence:
  datastores:
    default:
      sql:
        maxConns: 100        # Maximum open connections
        maxIdleConns: 20     # Maximum idle connections
        maxConnLifetime: "1h" # Connection lifetime
Pool Sizing:
maxConns = (numHistoryShards / numHistoryNodes) × 2-3
For 4096 shards across 16 history nodes:
maxConns = (4096 / 16) × 2.5 = 640
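The sizing rule above as a small helper (the 2-3× headroom factor is a guideline, not a hard rule):

```python
# Pool sizing per the rule above: connections proportional to the number
# of shards each history node owns, times a headroom factor of 2-3.
def suggested_max_conns(num_history_shards: int, num_history_nodes: int,
                        factor: float = 2.5) -> int:
    shards_per_node = num_history_shards / num_history_nodes
    return int(shards_per_node * factor)

print(suggested_max_conns(4096, 16))  # 640, matching the worked example
```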

SQL TLS Configuration

persistence:
  datastores:
    default:
      sql:
        connectAttributes:
          sslmode: "verify-full"
          sslcert: "/path/to/client-cert.pem"
          sslkey: "/path/to/client-key.pem"
          sslrootcert: "/path/to/ca-cert.pem"

Vitess (MySQL Sharding)

For large-scale MySQL deployments:
persistence:
  datastores:
    default:
      sql:
        pluginName: "mysql8"
        databaseName: "temporal"
        connectAddr: "vtgate.example.com:15306"
        taskScanPartitions: 4  # Number of Vitess shards

Visibility Store Configuration

Elasticsearch

persistence:
  visibilityStore: "elasticsearch-visibility"
  datastores:
    elasticsearch-visibility:
      elasticsearch:
        version: "v7"  # or "v6", "v8"
        url: "http://elasticsearch:9200"
        username: "elastic"
        password: "${ES_PASSWORD}"
        indices:
          visibility: "temporal_visibility_v1"
        numShards: 5
        numReplicas: 1
        logLevel: "error"
Index Sharding:
  • Start with 5 primary shards
  • Increase to 10-20 for > 100M workflows
  • Use 1-2 replicas for production

Dual Visibility

Run two visibility stores simultaneously:
persistence:
  visibilityStore: "elasticsearch-primary"
  secondaryVisibilityStore: "sql-secondary"
  datastores:
    elasticsearch-primary:
      elasticsearch:
        url: "http://elasticsearch:9200"
        indices:
          visibility: "temporal_visibility_v1"
    sql-secondary:
      sql:
        pluginName: "postgres12"
        databaseName: "temporal_visibility"
Useful for:
  • Migration from one visibility store to another
  • Comparing query results
  • Fallback during maintenance

Schema Management

Initial Setup

Temporal provides schema files in the /schema directory:
# Create the keyspace, then set up the initial schema
temporal-cassandra-tool --ep 127.0.0.1 create -k temporal --rf 3
temporal-cassandra-tool --ep 127.0.0.1 -k temporal setup-schema -v 0.0

# Same for the visibility keyspace
temporal-cassandra-tool --ep 127.0.0.1 create -k temporal_visibility --rf 3
temporal-cassandra-tool --ep 127.0.0.1 -k temporal_visibility setup-schema -v 0.0

Schema Updates

Upgrade the schema when moving to a newer Temporal version:
# Apply schema updates up to the target version
temporal-sql-tool \
  --plugin postgres12 \
  --ep postgres.example.com \
  --db temporal \
  update-schema \
  --version 1.1

Schema Versioning

Temporal records the current schema version in the database; versioned schema files ship with the server:
  • /schema/cassandra/temporal/versioned/ - Cassandra schemas
  • /schema/postgresql/v12/temporal/versioned/ - PostgreSQL schemas
  • /schema/mysql/v8/temporal/versioned/ - MySQL schemas

Persistence Metrics

Operation Metrics

All persistence operations emit metrics:
# Operation scopes follow the pattern Persistence{Operation}:
PersistenceGetWorkflowExecution
PersistenceUpdateWorkflowExecution
PersistenceCreateWorkflowExecution
PersistenceDeleteWorkflowExecution
Each emits:
  • Request count
  • Error count
  • Latency histogram
  • Tagged with db_kind

Monitoring Query

# P99 latency for workflow updates
histogram_quantile(0.99, 
  rate(PersistenceUpdateWorkflowExecution_latency_bucket[5m])
)

# Error rate
rate(PersistenceUpdateWorkflowExecution_errors[5m]) / 
rate(PersistenceUpdateWorkflowExecution_requests[5m])

Critical Metrics

  1. Shard Operations
    • GetOrCreateShard - Should be fast (< 10ms)
    • UpdateShard - Latency impacts failover
  2. Workflow Operations
    • UpdateWorkflowExecution - Most frequent, optimize heavily
    • CreateWorkflowExecution - Directly affects start rate
  3. Task Operations
    • GetTransferTasks - Affects task dispatch latency
    • GetTimerTasks - Affects timer firing accuracy

Data Retention

Workflow Retention

Set retention per namespace:
tctl namespace register \
  --namespace production \
  --retention 30  # Days
Or update existing:
tctl namespace update \
  --namespace production \
  --retention 7
Retention Behavior:
  • Applies to closed workflows only
  • History deleted after retention period
  • Visibility records removed
  • Does not affect running workflows
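The timing above can be sketched as follows: history becomes eligible for deletion at close time plus the retention period (dates are illustrative):

```python
# Retention starts at workflow close, not at workflow start: history is
# eligible for deletion at close_time + retention_days.
from datetime import datetime, timedelta

def deletion_eligible_at(close_time: datetime, retention_days: int) -> datetime:
    return close_time + timedelta(days=retention_days)

closed = datetime(2024, 1, 1, 12, 0)
print(deletion_eligible_at(closed, 30))  # 2024-01-31 12:00:00
```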

Database Cleanup

Cassandra:
  • Uses TTL on history tables
  • Automatic compaction removes expired data
  • No manual cleanup needed
SQL Databases:
  • History scavenger deletes old records
  • Runs as system workflow
  • Configure via dynamic config:
worker.executionsScannerEnabled:
  - value: true
    constraints: {}

worker.executionsScannerConcurrency:
  - value: 5
    constraints: {}

Backup and Recovery

Cassandra Backup

# Take snapshot
nodetool snapshot temporal
nodetool snapshot temporal_visibility

# List snapshots
nodetool listsnapshots

# Clear old snapshots
nodetool clearsnapshot -t snapshot_name

PostgreSQL Backup

# Logical backup
pg_dump temporal > temporal_backup.sql
pg_dump temporal_visibility > temporal_visibility_backup.sql

# Point-in-time recovery
pg_basebackup -D /var/lib/postgresql/backup

MySQL Backup

# Using mysqldump
mysqldump temporal > temporal_backup.sql
mysqldump temporal_visibility > temporal_visibility_backup.sql

# Using Percona XtraBackup
xtrabackup --backup --target-dir=/backup/

Recovery Considerations

  1. Consistency: Backup all datastores simultaneously
  2. Downtime: Stop Temporal services during restore
  3. Testing: Regularly test restore procedures
  4. Point-in-Time: Use transaction logs for precise recovery

Troubleshooting

High Latency

Symptoms:
  • Persistence metrics show high p99 latency
  • Workflow operations slow
Solutions:
  1. Check database server metrics (CPU, I/O)
  2. Review query execution plans
  3. Verify connection pool not exhausted
  4. Check network latency to database
  5. Scale the database vertically (read replicas don't help Temporal's write-heavy workload)

Connection Pool Exhaustion

Symptoms:
  • connection refused errors
  • too many connections errors
Solutions:
  1. Increase maxConns in config
  2. Add more history nodes to distribute load
  3. Increase database connection limits
  4. Check for connection leaks

Data Inconsistency

Symptoms:
  • Workflow state doesn't match what the application expects
  • Missing history events
Solutions:
  1. Verify consistency settings (Cassandra)
  2. Check for split-brain scenarios
  3. Review replication lag
  4. Verify no partial failures during writes

Schema Version Mismatch

Symptoms:
  • schema version mismatch errors
  • Server fails to start
Solutions:
  1. Check schema version: SELECT * FROM schema_version;
  2. Run schema update tool
  3. Ensure all nodes use same version
  4. Review schema update logs

Performance Optimization

Cassandra

  1. Compaction: Use LeveledCompactionStrategy
  2. Caching: Enable row cache for small workflows
  3. GC: Tune JVM for low pause times
  4. Consistency: Use LOCAL_QUORUM to avoid cross-datacenter round trips

PostgreSQL

  1. Indexes: Ensure all indexes are healthy
  2. VACUUM: Run auto-vacuum regularly
  3. Shared Buffers: Set to 25% of RAM
  4. Work Memory: Increase for large queries
  5. Connection Pooling: Use pgBouncer

MySQL

  1. InnoDB Buffer Pool: Set to 70-80% of RAM
  2. Binary Logging: Use ROW format
  3. Query Cache: Disable (deprecated in 8.0)
  4. Connection Pooling: Use ProxySQL
