Persistence Architecture
Temporal uses two types of data stores:
- Default Store - Core workflow data, task queues, and system state
- Visibility Store - Workflow search and list operations
Data Store Types
Supported databases:
- Cassandra - Horizontally scalable, high throughput
- PostgreSQL - ACID compliant, strong consistency
- MySQL - ACID compliant, widespread support
- SQLite - Development and testing only
Configuration
Basic Setup
History Shards
Workflow executions are sharded across multiple partitions:
- Based on workflow ID hash
- Immutable after cluster creation
- Higher count = better parallelism
Recommended shard counts:
- Development: 1-4
- Small production: 128-512
- Medium production: 1024-2048
- Large production: 4096-16384
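To illustrate why the shard count cannot change after cluster creation, here is a minimal sketch of hash-based shard assignment. It assumes SHA-256 as a stand-in hash and a simple modulo mapping; the function name `shard_for` is hypothetical and this is not the server's actual hashing scheme.

```python
import hashlib


def shard_for(workflow_id: str, num_shards: int) -> int:
    """Map a workflow ID to a shard index in [0, num_shards).

    Assumption: SHA-256 plus modulo stands in for the real hash function.
    """
    digest = hashlib.sha256(workflow_id.encode("utf-8")).digest()
    # Interpret the first 8 bytes as an unsigned integer, then reduce it.
    return int.from_bytes(digest[:8], "big") % num_shards


if __name__ == "__main__":
    ids = [f"order-{i}" for i in range(1000)]
    # Count how many workflows would land on a different shard if the
    # shard count changed from 512 to 1024.
    moved = sum(1 for wid in ids if shard_for(wid, 512) != shard_for(wid, 1024))
    print(f"{moved} of {len(ids)} workflows would map to a different shard")
```

Because the shard is derived from the hash and the shard count together, changing the count would route existing workflows to shards that do not hold their data.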
Cassandra Configuration
Connection Settings
- maxConns - Maximum connections per host (default: 2)
- Recommended: 10-20 for high throughput
- Total connections = maxConns × number of hosts × number of history nodes
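The total-connections formula above can be worked through as a quick calculation. The cluster sizes used here (6 Cassandra hosts, 4 history nodes) are made-up values for illustration only.

```python
def total_cassandra_connections(max_conns: int, cassandra_hosts: int, history_nodes: int) -> int:
    """Total connections = maxConns x number of hosts x number of history nodes."""
    return max_conns * cassandra_hosts * history_nodes


if __name__ == "__main__":
    # Hypothetical cluster: 6 Cassandra hosts, 4 history nodes.
    print(total_cassandra_connections(2, 6, 4))   # default maxConns -> 48
    print(total_cassandra_connections(20, 6, 4))  # maxConns raised to 20 -> 480
```

Raising maxConns multiplies across every host and history node, so it is worth checking the result against what the Cassandra cluster can accept.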
Consistency Configuration
- LOCAL_QUORUM - Majority of replicas in local datacenter (recommended)
- QUORUM - Majority across all datacenters
- ONE - Single replica (not recommended)
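As a rough illustration of what each consistency level demands, the sketch below computes how many replicas must acknowledge a request. It uses the standard majority rule (floor(RF / 2) + 1); the helper names and the two-datacenter layout are assumptions made for the example.

```python
from typing import Mapping


def quorum(replication_factor: int) -> int:
    """Majority of replicas: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


def replicas_required(level: str, rf_by_dc: Mapping[str, int], local_dc: str) -> int:
    """Number of replica acknowledgements needed for a consistency level."""
    if level == "ONE":
        return 1
    if level == "LOCAL_QUORUM":
        # Only replicas in the local datacenter count.
        return quorum(rf_by_dc[local_dc])
    if level == "QUORUM":
        # Majority over the replicas of all datacenters combined.
        return quorum(sum(rf_by_dc.values()))
    raise ValueError(f"unsupported consistency level: {level}")


if __name__ == "__main__":
    # Hypothetical layout: two datacenters with replication factor 3 each.
    layout = {"dc1": 3, "dc2": 3}
    for level in ("ONE", "LOCAL_QUORUM", "QUORUM"):
        print(level, replicas_required(level, layout, "dc1"))
```

With this layout LOCAL_QUORUM needs 2 local acknowledgements while QUORUM needs 4 across both datacenters, which is why QUORUM adds cross-datacenter latency to every request.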
TLS Configuration
Address Translation
Use address translation in environments where Cassandra returns non-routable IPs.
Cassandra Best Practices
- Replication Factor: 3 minimum for production
- Compaction Strategy: LeveledCompactionStrategy for temporal tables
- Read Repair: Disabled for better performance
- Monitoring: Track read/write latency, compaction lag
- Separate Clusters: Use different clusters for default and visibility
SQL Configuration (PostgreSQL/MySQL)
PostgreSQL
MySQL
Connection Pool Tuning
SQL TLS Configuration
- PostgreSQL
- MySQL
Vitess (MySQL Sharding)
Use Vitess for large-scale MySQL deployments.
Visibility Store Configuration
Elasticsearch
- Start with 5 primary shards
- Increase to 10-20 for > 100M workflows
- Use 1-2 replicas for production
Dual Visibility
Run two visibility stores simultaneously. This is useful for:
- Migration from one visibility store to another
- Comparing query results
- Fallback during maintenance
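The three use cases above can be pictured with a small in-memory sketch of the dual-store pattern: write to both stores, read from the primary, fall back to the secondary, and compare results. All class and method names here are hypothetical; this is not Temporal's visibility API.

```python
from typing import Dict, List


class InMemoryVisibilityStore:
    """Toy visibility store keyed by workflow ID (illustration only)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.records: Dict[str, str] = {}
        self.available = True

    def upsert(self, workflow_id: str, status: str) -> None:
        if not self.available:
            raise ConnectionError(f"{self.name} is unavailable")
        self.records[workflow_id] = status

    def list_by_status(self, status: str) -> List[str]:
        if not self.available:
            raise ConnectionError(f"{self.name} is unavailable")
        return sorted(w for w, s in self.records.items() if s == status)


class DualVisibility:
    """Writes go to both stores; reads prefer the primary."""

    def __init__(self, primary: InMemoryVisibilityStore, secondary: InMemoryVisibilityStore) -> None:
        self.primary = primary
        self.secondary = secondary

    def upsert(self, workflow_id: str, status: str) -> None:
        self.primary.upsert(workflow_id, status)
        self.secondary.upsert(workflow_id, status)

    def list_by_status(self, status: str) -> List[str]:
        try:
            return self.primary.list_by_status(status)
        except ConnectionError:
            # Fallback during maintenance of the primary store.
            return self.secondary.list_by_status(status)

    def diff(self, status: str) -> Dict[str, List[str]]:
        """Compare query results between the two stores."""
        p = set(self.primary.list_by_status(status))
        s = set(self.secondary.list_by_status(status))
        return {"only_primary": sorted(p - s), "only_secondary": sorted(s - p)}
```

During a migration, an empty `diff` for representative queries is one signal that the new store is ready to become the primary.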
Schema Management
Initial Setup
Temporal provides schema files in the /schema directory for each supported database:
- Cassandra
- PostgreSQL
- MySQL
Schema Updates
Update the schema when upgrading to newer Temporal versions.
Schema Versioning
Temporal tracks the schema version in the database. Versioned schema files are located in:
- /schema/cassandra/temporal/versioned/ - Cassandra schemas
- /schema/postgresql/v12/temporal/versioned/ - PostgreSQL schemas
- /schema/mysql/v8/temporal/versioned/ - MySQL schemas
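A minimal sketch of how pending schema updates could be determined from the version stored in the database. It assumes versioned subdirectories are named like `v1.9` and `v1.10`; that naming, and both function names, are assumptions for illustration rather than the schema tool's actual logic.

```python
from typing import List, Tuple


def parse_version(name: str) -> Tuple[int, ...]:
    """Turn 'v1.10' or '1.10' into (1, 10) so versions compare numerically."""
    return tuple(int(part) for part in name.lstrip("v").split("."))


def pending_versions(current: str, available: List[str]) -> List[str]:
    """Return the versions newer than `current`, in the order to apply them."""
    cur = parse_version(current)
    newer = [v for v in available if parse_version(v) > cur]
    return sorted(newer, key=parse_version)


if __name__ == "__main__":
    # Hypothetical directory listing and stored version.
    dirs = ["v1.0", "v1.9", "v1.10", "v1.11"]
    print(pending_versions("1.9", dirs))
```

Comparing versions as integer tuples matters here: a plain string comparison would sort `v1.10` before `v1.9` and skip the update.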
Persistence Metrics
Operation Metrics
All persistence operations emit metrics:
- Request count
- Error count
- Latency histogram
- Tagged with db_kind
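The sketch below shows one way a persistence call could be wrapped to record a request count, an error count, and latency samples with tags. It uses in-memory dictionaries instead of a real metrics client; the decorator name, the extra `operation` tag, and the `postgres` tag value are illustrative assumptions.

```python
import time
from collections import defaultdict
from functools import wraps

# In-memory stand-ins for a metrics backend, keyed by (operation, db_kind).
REQUESTS = defaultdict(int)
ERRORS = defaultdict(int)
LATENCIES = defaultdict(list)


def instrumented(operation: str, db_kind: str):
    """Record request count, error count, and latency for the wrapped call."""
    tags = (operation, db_kind)

    def decorator(fn):
        @wraps(fn)
        def wrapper(*args, **kwargs):
            REQUESTS[tags] += 1
            start = time.perf_counter()
            try:
                return fn(*args, **kwargs)
            except Exception:
                ERRORS[tags] += 1
                raise
            finally:
                # Latency is recorded for successes and failures alike.
                LATENCIES[tags].append(time.perf_counter() - start)

        return wrapper

    return decorator


@instrumented("UpdateWorkflowExecution", "postgres")
def update_workflow_execution(fail: bool = False) -> str:
    """Placeholder for a real persistence call."""
    if fail:
        raise RuntimeError("simulated database error")
    return "ok"
```

Recording latency in the `finally` clause keeps slow failures visible in the histogram instead of only in the error count.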
Monitoring Query
Critical Metrics
- Shard Operations
  - GetOrCreateShard - Should be fast (< 10ms)
  - UpdateShard - Latency impacts failover
- Workflow Operations
  - UpdateWorkflowExecution - Most frequent, optimize heavily
  - CreateWorkflowExecution - Directly affects start rate
- Task Operations
  - GetTransferTasks - Affects task dispatch latency
  - GetTimerTasks - Affects timer firing accuracy
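To make a latency target such as the 10 ms guideline for GetOrCreateShard checkable, the sketch below computes a nearest-rank percentile over latency samples and compares it to a budget. Applying the budget to p99 is an assumption, as are the function names and the sample data.

```python
import math
from typing import Sequence

GET_OR_CREATE_SHARD_BUDGET_MS = 10.0  # from the "< 10ms" guideline


def percentile(samples: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def over_budget(samples_ms: Sequence[float], budget_ms: float, pct: float = 99.0) -> bool:
    """True when the chosen percentile meets or exceeds the latency budget."""
    return percentile(samples_ms, pct) >= budget_ms


if __name__ == "__main__":
    # Made-up samples: mostly 2 ms with a few slow outliers.
    samples = [2.0] * 97 + [4.0, 35.0, 40.0]
    print(percentile(samples, 99.0))
    print(over_budget(samples, GET_OR_CREATE_SHARD_BUDGET_MS))
```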
Data Retention
Workflow Retention
Set retention per namespace:
- Applies to closed workflows only
- History deleted after retention period
- Visibility records removed
- Does not affect running workflows
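The retention rules above reduce to a simple predicate: a workflow becomes eligible for deletion only once it has closed and the retention period has elapsed since then. The record type and function name below are hypothetical, and the exact boundary handling is an assumption.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class WorkflowRecord:
    workflow_id: str
    close_time: Optional[datetime]  # None while the workflow is still running


def is_expired(record: WorkflowRecord, retention: timedelta, now: datetime) -> bool:
    """True when a closed workflow has outlived the namespace retention period."""
    if record.close_time is None:
        # Running workflows are never affected by retention.
        return False
    return now >= record.close_time + retention


if __name__ == "__main__":
    now = datetime(2024, 1, 31, tzinfo=timezone.utc)
    retention = timedelta(days=7)
    running = WorkflowRecord("wf-running", None)
    recent = WorkflowRecord("wf-recent", now - timedelta(days=2))
    old = WorkflowRecord("wf-old", now - timedelta(days=10))
    for rec in (running, recent, old):
        print(rec.workflow_id, is_expired(rec, retention, now))
```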
Database Cleanup
Cassandra:
- Uses TTL on history tables
- Automatic compaction removes expired data
- No manual cleanup needed
SQL (PostgreSQL/MySQL):
- History scavenger deletes old records
- Runs as system workflow
- Configured via dynamic config
Backup and Recovery
Cassandra Backup
PostgreSQL Backup
MySQL Backup
Recovery Considerations
- Consistency: Backup all datastores simultaneously
- Downtime: Stop Temporal services during restore
- Testing: Regularly test restore procedures
- Point-in-Time: Use transaction logs for precise recovery
Troubleshooting
High Latency
Symptoms:
- Persistence metrics show high p99 latency
- Workflow operations slow
Solutions:
- Check database server metrics (CPU, I/O)
- Review query execution plans
- Verify connection pool not exhausted
- Check network latency to database
- Add read replicas to offload reads (they do not help with writes)
Connection Pool Exhaustion
Symptoms:
- connection refused errors
- too many connections errors
Solutions:
- Increase maxConns in config
- Add more history nodes to distribute load
- Increase database connection limits
- Check for connection leaks
Data Inconsistency
Symptoms:
- Workflow state doesn’t match the expected state
- Missing history events
Solutions:
- Verify consistency settings (Cassandra)
- Check for split-brain scenarios
- Review replication lag
- Verify no partial failures during writes
Schema Version Mismatch
Symptoms:
- schema version mismatch errors
- Server fails to start
Solutions:
- Check schema version: SELECT * FROM schema_version;
- Run schema update tool
- Ensure all nodes use same version
- Review schema update logs
Performance Optimization
Cassandra
- Compaction: Use LeveledCompactionStrategy
- Caching: Enable row cache for small workflows
- GC: Tune JVM for low pause times
- Replication: Use LOCAL_QUORUM for better performance
PostgreSQL
- Indexes: Ensure all indexes are healthy
- VACUUM: Keep autovacuum enabled and verify it keeps up with table churn
- Shared Buffers: Set to 25% of RAM
- Work Memory: Increase for large queries
- Connection Pooling: Use pgBouncer
MySQL
- InnoDB Buffer Pool: Set to 70-80% of RAM
- Binary Logging: Use ROW format
- Query Cache: Disable (deprecated in MySQL 5.7, removed in 8.0)
- Connection Pooling: Use ProxySQL
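The memory sizing rules in the PostgreSQL and MySQL lists (25% of RAM for shared buffers, 70-80% of RAM for the InnoDB buffer pool) can be turned into a quick calculation. This sketch assumes a host dedicated to the database and works in MiB; the function names are hypothetical.

```python
from typing import Tuple


def postgres_shared_buffers_mib(ram_mib: int) -> int:
    """Shared buffers at 25% of RAM."""
    return ram_mib * 25 // 100


def innodb_buffer_pool_mib(ram_mib: int) -> Tuple[int, int]:
    """InnoDB buffer pool range at 70-80% of RAM, as (low, high)."""
    return ram_mib * 70 // 100, ram_mib * 80 // 100


if __name__ == "__main__":
    ram = 64 * 1024  # hypothetical 64 GiB database host
    print(postgres_shared_buffers_mib(ram))  # 16384
    print(innodb_buffer_pool_mib(ram))       # (45875, 52428)
```

These are starting points only; hosts that share memory with other services need lower values.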
See Also
- Scaling Guide - Database scaling strategies
- Monitoring - Persistence metrics
- Archival - Long-term history storage