Temporal Server is designed to scale horizontally across multiple dimensions. This guide covers scaling strategies and performance tuning.

Architecture for Scaling

Temporal consists of four independently scalable services:
  • Frontend - API gateway, request routing
  • History - Workflow state management, sharded by workflow ID
  • Matching - Task queue management, poll handling
  • Worker - System workflows (archival, replication, etc.)

Horizontal Scaling

Frontend Service

Scale based on API request rate:
services:
  frontend:
    rpc:
      grpcPort: 7233
      membershipPort: 6933
Scaling Triggers:
  • service_requests{service_role="frontend"} rate
  • service_latency{service_role="frontend"} p99 > target
  • service_pending_requests{service_role="frontend"} > threshold
Recommended Ratio: 1 frontend per 1000-2000 RPS
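As a rough planning aid, the ratio above can be turned into a node-count estimate. A sketch; the 1500 RPS midpoint and the 2-node availability floor are assumptions, not Temporal defaults:

```go
package main

import "fmt"

// frontendNodes estimates frontend instances for a target request rate,
// using ~1500 RPS per node (midpoint of the 1000-2000 RPS guideline).
func frontendNodes(targetRPS int) int {
	const rpsPerNode = 1500 // assumed midpoint; tune from your own latency data
	n := (targetRPS + rpsPerNode - 1) / rpsPerNode // ceiling division
	if n < 2 {
		n = 2 // keep at least 2 instances for availability
	}
	return n
}

func main() {
	fmt.Println(frontendNodes(3000))  // 2
	fmt.Println(frontendNodes(10000)) // 7
}
```

Validate the estimate against the frontend latency and pending-request metrics listed above before committing to it.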

History Service

The most resource-intensive service. Scale based on shard count and load:
persistence:
  numHistoryShards: 4096  # Power of 2 recommended; fixed at cluster creation
Shard Distribution:
  • Each history node owns a subset of shards
  • Shards are distributed via consistent hashing
  • Minimum: 1 history node
  • Recommended: Number of nodes ≤ numHistoryShards / 4
Scaling Triggers:
  • ShardController latency
  • UpdateWorkflowExecution latency
  • CPU utilization > 70%
  • Memory utilization > 80%
Example for 1024 shards:
4 history nodes = 256 shards per node
8 history nodes = 128 shards per node
16 history nodes = 64 shards per node
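The even split above is what consistent hashing approximates on average. A minimal sketch of the per-node arithmetic, using 1024 shards; the server's actual assignment comes from its membership ring, not a plain division:

```go
package main

import "fmt"

// shardsPerNode returns the average number of history shards each node owns.
// Real ownership varies slightly per node when numShards is not evenly
// divisible, and shifts as nodes join or leave the membership ring.
func shardsPerNode(numShards, nodes int) int {
	return numShards / nodes
}

func main() {
	for _, n := range []int{4, 8, 16} {
		fmt.Printf("%2d history nodes = %d shards per node\n", n, shardsPerNode(1024, n))
	}
}
```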

Matching Service

Scale based on task queue throughput:
services:
  matching:
    rpc:
      grpcPort: 7235
      membershipPort: 6935
Scaling Triggers:
  • PollWorkflowTaskQueue latency
  • PollActivityTaskQueue latency
  • Task dispatch latency
  • Number of unique task queues
Recommended Ratio: 1 matching node per 5000 task queues or 10000 tasks/sec

Worker Service

Scale based on system workflow load:
services:
  worker:
    rpc:
      grpcPort: 7239
      membershipPort: 6939
Scaling Triggers:
  • Archival backlog
  • Replication lag (multi-cluster)
  • System workflow queue size

Vertical Scaling

CPU Resources

History Service:
  • Minimum: 4 cores
  • Recommended: 8-16 cores
  • High throughput: 32+ cores
Other Services:
  • Minimum: 2 cores
  • Recommended: 4-8 cores

Memory Resources

History Service:
# Per-node estimate: (numHistoryShards / historyNodes) * (activeWorkflowsPerShard * avgWorkflowSize)
minimum: 4 GB
recommended: 16-32 GB
high_throughput: 64+ GB
Frontend Service:
  • Minimum: 2 GB
  • Recommended: 4-8 GB
Matching Service:
  • Minimum: 2 GB
  • Recommended: 4-8 GB
  • Add 10 MB per 1000 active task queues
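The history memory estimate in the comment above can be made concrete. A sketch; the workload figures (2,000 active workflows per shard, 10 KB average cached state) are illustrative assumptions, not measured defaults:

```go
package main

import "fmt"

// historyNodeMemGB estimates per-node history cache memory from the formula
// (numHistoryShards / historyNodes) * activeWorkflowsPerShard * avgWorkflowBytes.
func historyNodeMemGB(numShards, nodes, activePerShard, avgWorkflowBytes int) float64 {
	shardsPerNode := numShards / nodes
	bytes := float64(shardsPerNode) * float64(activePerShard) * float64(avgWorkflowBytes)
	return bytes / (1 << 30) // bytes to GiB
}

func main() {
	// 4096 shards over 8 nodes; 2000 active workflows/shard at 10 KB each (assumed)
	fmt.Printf("%.1f GB per node\n", historyNodeMemGB(4096, 8, 2000, 10*1024)) // ≈ 9.8 GB
}
```

This covers cached workflow state only; leave headroom for the process itself and the event caches below.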

Persistence Layer Scaling

Cassandra

Recommended Configuration:
persistence:
  datastores:
    default:
      cassandra:
        hosts: "cassandra-0,cassandra-1,cassandra-2"
        port: 9042
        maxConns: 20  # Connection pool size per Temporal server host
        connectTimeout: "600ms"
        timeout: "10s"
        writeTimeout: "10s"
        consistency:
          default:
            consistency: "LOCAL_QUORUM"
            serialConsistency: "LOCAL_SERIAL"
Scaling Guidelines:
  • 3-5 node minimum for production
  • Replication factor: 3
  • Add nodes when CPU > 70% or disk > 70%
  • Use separate clusters for default and visibility stores

PostgreSQL/MySQL

Connection Pool Configuration:
persistence:
  datastores:
    default:
      sql:
        maxConns: 100
        maxIdleConns: 20
        maxConnLifetime: "1h"
Connection Pool Sizing:
total_connections = numHistoryShards * (2-3 connections per shard)
max_per_node = total_connections / number_of_history_nodes
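The two-line sizing rule above as code. A sketch; the inputs (4096 shards, 2 connections per shard, 8 history nodes) are example assumptions:

```go
package main

import "fmt"

// sqlPoolPerNode applies the rule above: total connections scale with the
// shard count, then divide across the history nodes that hold those shards.
func sqlPoolPerNode(numShards, connsPerShard, historyNodes int) int {
	total := numShards * connsPerShard
	return total / historyNodes
}

func main() {
	perNode := sqlPoolPerNode(4096, 2, 8)
	// Compare against sql.maxConns and the database's own max_connections;
	// the smallest of the three is the effective ceiling.
	fmt.Println(perNode)
}
```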
Scaling Options:
  1. Vertical scaling (increase instance size)
  2. Read replicas (not recommended for Temporal)
  3. Sharding (Vitess for MySQL)
Vitess Configuration:
persistence:
  datastores:
    default:
      sql:
        pluginName: "mysql8"
        taskScanPartitions: 4  # Partitions scanned when listing task queues; align with Vitess shard count

Elasticsearch (Visibility)

Recommended Configuration:
persistence:
  datastores:
    visibility:
      elasticsearch:
        url: "http://elasticsearch:9200"
        indices:
          visibility: "temporal_visibility_v1"
        numShards: 5
        numReplicas: 1
Index Sharding:
  • Start with 5 shards
  • Increase to 10-20 for > 100M workflows
  • 1 replica minimum for production
Scaling Triggers:
  • Query latency > 1s p99
  • Index rate > 10000/sec
  • Disk usage > 85%

Dynamic Configuration

Tune performance without server restart:
# config/dynamicconfig/production.yaml

# Rate limiting
frontend.rps:
  - value: 3000
    constraints: {}

frontend.namespaceRPS:
  - value: 1000
    constraints:
      namespace: "high-priority"
  - value: 100
    constraints:
      namespace: "low-priority"

# Matching service
matching.longPollExpirationInterval:
  - value: "60s"
    constraints: {}

matching.maxTaskQueueIdleTime:
  - value: "5m"
    constraints: {}

# History service
history.cacheMaxSize:
  - value: 1024
    constraints: {}

history.historyBuilderMaxSize:
  - value: 1024
    constraints: {}

history.transferProcessorMaxPollInterval:
  - value: "1m"
    constraints: {}

history.timerProcessorMaxPollInterval:
  - value: "5m"
    constraints: {}

# Persistence
persistence.transactionSizeLimit:
  - value: 4194304  # 4MB
    constraints: {}

Dynamic Config Polling

dynamicConfigClient:
  filepath: "config/dynamicconfig/production.yaml"
  pollInterval: "10s"  # Check for updates every 10s

Caching Configuration

History Cache

Cache workflow execution state:
# Dynamic config
history.cacheMaxSize:
  - value: 2048  # Number of workflows cached
    constraints: {}

history.cacheTTL:
  - value: "1h"
    constraints: {}
Sizing:
cache_size = (active_workflows_per_shard * 0.1) to (active_workflows_per_shard * 0.5)
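The sizing heuristic above, expressed as code. A sketch; the 2,000 active workflows per shard is an assumed input:

```go
package main

import "fmt"

// cacheSizeRange applies the heuristic above: cache between 10% and 50%
// of the active workflows on each shard.
func cacheSizeRange(activePerShard int) (low, high int) {
	return activePerShard / 10, activePerShard / 2
}

func main() {
	low, high := cacheSizeRange(2000)
	fmt.Printf("history.cacheMaxSize between %d and %d\n", low, high)
}
```

Start near the low end and raise the value if cache-miss metrics stay high.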

Events Cache

Cache workflow history events:
history.eventsCacheMaxSize:
  - value: 512  # Cache entries, not MB
    constraints: {}

history.eventsCacheTTL:
  - value: "1h"
    constraints: {}

Workflow Limits

Enforce limits to prevent resource exhaustion:
# Workflow execution limits
limit.maxIDLength:
  - value: 1000
    constraints: {}

limit.historySize:
  - value: 52428800  # 50 MB, in bytes
    constraints: {}

limit.historyCount:
  - value: 51200
    constraints: {}

limit.pendingActivities:
  - value: 2000
    constraints: {}

limit.pendingChildWorkflows:
  - value: 2000
    constraints: {}

limit.pendingCancelRequests:
  - value: 2000
    constraints: {}

Network Optimization

gRPC Keep-Alive

services:
  frontend:
    rpc:
      keepAliveServerConfig:
        keepAliveServerParameters:
          maxConnectionIdle: "15m"
          maxConnectionAge: "30m"
          maxConnectionAgeGrace: "5m"
          keepAliveTime: "30s"
          keepAliveTimeout: "10s"
        keepAliveEnforcementPolicy:
          minTime: "10s"
          permitWithoutStream: true
      clientConnectionConfig:
        keepAliveClientParameters:
          keepAliveTime: "30s"
          keepAliveTimeout: "10s"
          keepAlivePermitWithoutStream: true

Connection Limits

services:
  frontend:
    rpc:
      grpcPort: 7233
      # Set OS-level limits:
      # - Max open files: 65536+
      # - TCP backlog: 4096+

Monitoring for Scaling

Key Metrics

Throughput:
# Workflow starts per second
rate(service_requests{operation="StartWorkflowExecution"}[5m])

# Task completions per second
rate(service_requests{operation=~"RespondWorkflowTaskCompleted|RespondActivityTaskCompleted"}[5m])
Latency:
# API latency p99
histogram_quantile(0.99, rate(service_latency_bucket[5m]))

# Persistence latency p99
histogram_quantile(0.99, rate(UpdateWorkflowExecution_latency_bucket[5m]))
Resource Utilization:
# Pending requests (should be low)
service_pending_requests

# Active connections
service_grpc_conn_active

# Lock contention
rate(lock_latency_sum[5m]) / rate(lock_latency_count[5m])

Scaling Decision Matrix

| Symptom | Scale | Configuration |
|---|---|---|
| High frontend latency | Frontend nodes | Add instances |
| High persistence latency | Database | Vertical scale or add nodes |
| Shard ownership changes | History nodes | Add instances |
| Task dispatch delays | Matching nodes | Add instances |
| CPU > 80% | Service nodes | Add instances or increase CPU |
| Memory > 85% | Service nodes | Add instances or increase memory |
| High cache misses | History cache | Increase cacheMaxSize |
| DLQ messages | Worker nodes | Add instances |

Performance Best Practices

1. Shard Count Planning

Choose shard count at deployment time:
Low volume (< 100 workflows/sec): 512 shards
Medium (100-1000/sec): 2048 shards
High (> 1000/sec): 4096-8192 shards
Note: Cannot change shard count after deployment.
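The tiers above as a simple lookup, for use in provisioning scripts. A sketch; the thresholds are the guideline numbers from this section, and when in doubt round up, since the choice is permanent:

```go
package main

import "fmt"

// suggestedShards maps expected workflow starts/sec to the shard tiers above.
// Pick once: numHistoryShards cannot be changed after cluster creation.
func suggestedShards(workflowsPerSec int) int {
	switch {
	case workflowsPerSec < 100:
		return 512
	case workflowsPerSec <= 1000:
		return 2048
	default:
		return 4096 // consider 8192 for sustained very high throughput
	}
}

func main() {
	fmt.Println(suggestedShards(50))   // 512
	fmt.Println(suggestedShards(500))  // 2048
	fmt.Println(suggestedShards(5000)) // 4096
}
```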

2. Task Queue Design

  • Use fewer, busier task queues vs. many idle queues
  • Limit to < 10,000 unique task queues per cluster
  • Use task queue routing for versioning

3. History Growth

  • Use Continue-As-New for long-running workflows
  • Keep workflow histories < 50KB when possible
  • Monitor history_size and history_count metrics

4. Namespace Organization

  • Separate high and low priority workloads
  • Use namespaces for isolation and rate limiting
  • Monitor per-namespace metrics

5. Batch Operations

  • Use batch API operations where available
  • Reduce individual RPC calls
  • Bundle signal sends when possible

6. Client Connection Pooling

In SDK clients:
c, err := client.Dial(client.Options{  // client.Dial supersedes the deprecated client.NewClient
    HostPort: "frontend:7233",
    ConnectionOptions: client.ConnectionOptions{
        KeepAliveTime:    30 * time.Second,
        KeepAliveTimeout: 10 * time.Second,
    },
})

Load Testing

Benchmarking Setup

  1. Start with baseline load
  2. Gradually increase by 20% every 30 minutes
  3. Monitor all metrics
  4. Identify bottlenecks
  5. Scale and repeat

Test Scenarios

Scenario 1: High Throughput
  • Metric: Workflow starts/sec
  • Target: Saturate frontend/history
Scenario 2: High Active Workflows
  • Metric: Concurrent workflow executions
  • Target: Saturate history service memory
Scenario 3: High Task Dispatch
  • Metric: Task polls/sec
  • Target: Saturate matching service
Scenario 4: Large Workflows
  • Metric: History size
  • Target: Test persistence limits

Troubleshooting Performance

High Latency

  1. Check persistence latency
  2. Review cache hit rates
  3. Verify network connectivity
  4. Check for lock contention

Memory Issues

  1. Reduce cache sizes
  2. Decrease workflow retention
  3. Add more history nodes
  4. Review workflow history sizes

CPU Saturation

  1. Add service instances
  2. Optimize workflow code
  3. Reduce task processing frequency
  4. Check for inefficient queries
