Temporal Server is designed to scale horizontally across multiple dimensions. This guide covers scaling strategies and performance tuning.

Architecture for Scaling

Temporal consists of four independently scalable services:
  • Frontend - API gateway, request routing
  • History - Workflow state management, sharded by workflow ID
  • Matching - Task queue management, poll handling
  • Worker - System workflows (archival, replication, etc.)

Horizontal Scaling

Frontend Service

Scale based on API request rate:
services:
  frontend:
    rpc:
      grpcPort: 7233
      membershipPort: 6933
Scaling Triggers:
  • service_requests{service_role="frontend"} rate
  • service_latency{service_role="frontend"} p99 > target
  • service_pending_requests{service_role="frontend"} > threshold
Recommended Ratio: 1 frontend per 1000-2000 RPS
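As a rough planning aid, the ratio above can be turned into a node-count estimate. A sketch; the 1500 RPS midpoint and the 2-node availability floor are assumptions, not Temporal defaults:

```go
package main

import "fmt"

// frontendNodes estimates frontend instances for a target request rate,
// using ~1500 RPS per node (midpoint of the 1000-2000 RPS guideline).
func frontendNodes(targetRPS int) int {
	const rpsPerNode = 1500 // assumed midpoint; tune from your own latency data
	n := (targetRPS + rpsPerNode - 1) / rpsPerNode // ceiling division
	if n < 2 {
		n = 2 // keep at least 2 instances for availability
	}
	return n
}

func main() {
	fmt.Println(frontendNodes(3000))  // 2
	fmt.Println(frontendNodes(10000)) // 7
}
```

Validate the estimate against the frontend latency and pending-request metrics listed above before committing to it.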

History Service

The most resource-intensive service. Scale based on shard count and load:
persistence:
  numHistoryShards: 4096  # Power of 2 recommended; fixed at cluster creation
Shard Distribution:
  • Each history node owns a subset of shards
  • Shards are distributed via consistent hashing
  • Minimum: 1 history node
  • Recommended: Number of nodes ≤ numHistoryShards / 4
Scaling Triggers:
  • ShardController latency
  • UpdateWorkflowExecution latency
  • CPU utilization > 70%
  • Memory utilization > 80%
Example for 1024 shards:
4 history nodes = 256 shards per node
8 history nodes = 128 shards per node
16 history nodes = 64 shards per node
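The even split above is what consistent hashing approximates on average. A minimal sketch of the per-node arithmetic, using 1024 shards; the server's actual assignment comes from its membership ring, not a plain division:

```go
package main

import "fmt"

// shardsPerNode returns the average number of history shards each node owns.
// Real ownership varies slightly per node when numShards is not evenly
// divisible, and shifts as nodes join or leave the membership ring.
func shardsPerNode(numShards, nodes int) int {
	return numShards / nodes
}

func main() {
	for _, n := range []int{4, 8, 16} {
		fmt.Printf("%2d history nodes = %d shards per node\n", n, shardsPerNode(1024, n))
	}
}
```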

Matching Service

Scale based on task queue throughput:
services:
  matching:
    rpc:
      grpcPort: 7235
      membershipPort: 6935
Scaling Triggers:
  • PollWorkflowTaskQueue latency
  • PollActivityTaskQueue latency
  • Task dispatch latency
  • Number of unique task queues
Recommended Ratio: 1 matching node per 5000 task queues or 10000 tasks/sec

Worker Service

Scale based on system workflow load:
services:
  worker:
    rpc:
      grpcPort: 7239
      membershipPort: 6939
Scaling Triggers:
  • Archival backlog
  • Replication lag (multi-cluster)
  • System workflow queue size

Vertical Scaling

CPU Resources

History Service:
  • Minimum: 4 cores
  • Recommended: 8-16 cores
  • High throughput: 32+ cores
Other Services:
  • Minimum: 2 cores
  • Recommended: 4-8 cores

Memory Resources

History Service:
# Per-node estimate: (numHistoryShards / historyNodes) * (activeWorkflowsPerShard * avgWorkflowSize)
minimum: 4 GB
recommended: 16-32 GB
high_throughput: 64+ GB
Frontend Service:
  • Minimum: 2 GB
  • Recommended: 4-8 GB
Matching Service:
  • Minimum: 2 GB
  • Recommended: 4-8 GB
  • Add 10 MB per 1000 active task queues
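The history memory estimate in the comment above can be made concrete. A sketch; the workload figures (2,000 active workflows per shard, 10 KB average cached state) are illustrative assumptions, not measured defaults:

```go
package main

import "fmt"

// historyNodeMemGB estimates per-node history cache memory from the formula
// (numHistoryShards / historyNodes) * activeWorkflowsPerShard * avgWorkflowBytes.
func historyNodeMemGB(numShards, nodes, activePerShard, avgWorkflowBytes int) float64 {
	shardsPerNode := numShards / nodes
	bytes := float64(shardsPerNode) * float64(activePerShard) * float64(avgWorkflowBytes)
	return bytes / (1 << 30) // bytes to GiB
}

func main() {
	// 4096 shards over 8 nodes; 2000 active workflows/shard at 10 KB each (assumed)
	fmt.Printf("%.1f GB per node\n", historyNodeMemGB(4096, 8, 2000, 10*1024)) // ≈ 9.8 GB
}
```

This covers cached workflow state only; leave headroom for the process itself and the event caches below.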

Persistence Layer Scaling

Cassandra

Recommended Configuration:
persistence:
  datastores:
    default:
      cassandra:
        hosts: "cassandra-0,cassandra-1,cassandra-2"
        port: 9042
        maxConns: 20  # Connection pool size per Temporal server host
        connectTimeout: "600ms"
        timeout: "10s"
        writeTimeout: "10s"
        consistency:
          default:
            consistency: "LOCAL_QUORUM"
            serialConsistency: "LOCAL_SERIAL"
Scaling Guidelines:
  • 3-5 node minimum for production
  • Replication factor: 3
  • Add nodes when CPU > 70% or disk > 70%
  • Use separate clusters for default and visibility stores

PostgreSQL/MySQL

Connection Pool Configuration:
persistence:
  datastores:
    default:
      sql:
        maxConns: 100
        maxIdleConns: 20
        maxConnLifetime: "1h"
Connection Pool Sizing:
total_connections = numHistoryShards * (2-3 connections per shard)
max_per_node = total_connections / number_of_history_nodes
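The two-line sizing rule above as code. A sketch; the inputs (4096 shards, 2 connections per shard, 8 history nodes) are example assumptions:

```go
package main

import "fmt"

// sqlPoolPerNode applies the rule above: total connections scale with the
// shard count, then divide across the history nodes that hold those shards.
func sqlPoolPerNode(numShards, connsPerShard, historyNodes int) int {
	total := numShards * connsPerShard
	return total / historyNodes
}

func main() {
	perNode := sqlPoolPerNode(4096, 2, 8)
	// Compare against sql.maxConns and the database's own max_connections;
	// the smallest of the three is the effective ceiling.
	fmt.Println(perNode)
}
```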
Scaling Options:
  1. Vertical scaling (increase instance size)
  2. Read replicas (not recommended for Temporal)
  3. Sharding (Vitess for MySQL)
Vitess Configuration:
persistence:
  datastores:
    default:
      sql:
        pluginName: "mysql8"
        taskScanPartitions: 4  # Partitions scanned when listing task queues; align with Vitess shard count

Elasticsearch (Visibility)

Recommended Configuration:
persistence:
  datastores:
    visibility:
      elasticsearch:
        url: "http://elasticsearch:9200"
        indices:
          visibility: "temporal_visibility_v1"
        numShards: 5
        numReplicas: 1
Index Sharding:
  • Start with 5 shards
  • Increase to 10-20 for > 100M workflows
  • 1 replica minimum for production
Scaling Triggers:
  • Query latency > 1s p99
  • Index rate > 10000/sec
  • Disk usage > 85%

Dynamic Configuration

Tune performance without server restart:
# config/dynamicconfig/production.yaml

# Rate limiting
frontend.rps:
  - value: 3000
    constraints: {}

frontend.namespaceRPS:
  - value: 1000
    constraints:
      namespace: "high-priority"
  - value: 100
    constraints:
      namespace: "low-priority"

# Matching service
matching.longPollExpirationInterval:
  - value: "60s"
    constraints: {}

matching.maxTaskQueueIdleTime:
  - value: "5m"
    constraints: {}

# History service
history.cacheMaxSize:
  - value: 1024
    constraints: {}

history.historyBuilderMaxSize:
  - value: 1024
    constraints: {}

history.transferProcessorMaxPollInterval:
  - value: "1m"
    constraints: {}

history.timerProcessorMaxPollInterval:
  - value: "5m"
    constraints: {}

# Persistence
persistence.transactionSizeLimit:
  - value: 4194304  # 4MB
    constraints: {}

Dynamic Config Polling

dynamicConfigClient:
  filepath: "config/dynamicconfig/production.yaml"
  pollInterval: "10s"  # Check for updates every 10s

Caching Configuration

History Cache

Cache workflow execution state:
# Dynamic config
history.cacheMaxSize:
  - value: 2048  # Number of workflows cached
    constraints: {}

history.cacheTTL:
  - value: "1h"
    constraints: {}
Sizing:
cache_size = (active_workflows_per_shard * 0.1) to (active_workflows_per_shard * 0.5)
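The sizing heuristic above, expressed as code. A sketch; the 2,000 active workflows per shard is an assumed input:

```go
package main

import "fmt"

// cacheSizeRange applies the heuristic above: cache between 10% and 50%
// of the active workflows on each shard.
func cacheSizeRange(activePerShard int) (low, high int) {
	return activePerShard / 10, activePerShard / 2
}

func main() {
	low, high := cacheSizeRange(2000)
	fmt.Printf("history.cacheMaxSize between %d and %d\n", low, high)
}
```

Start near the low end and raise the value if cache-miss metrics stay high.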

Events Cache

Cache workflow history events:
history.eventsCacheMaxSize:
  - value: 512  # Cache entries, not MB
    constraints: {}

history.eventsCacheTTL:
  - value: "1h"
    constraints: {}

Workflow Limits

Enforce limits to prevent resource exhaustion:
# Workflow execution limits
limit.maxIDLength:
  - value: 1000
    constraints: {}

limit.historySize:
  - value: 52428800  # 50 MB, in bytes
    constraints: {}

limit.historyCount:
  - value: 51200
    constraints: {}

limit.pendingActivities:
  - value: 2000
    constraints: {}

limit.pendingChildWorkflows:
  - value: 2000
    constraints: {}

limit.pendingCancelRequests:
  - value: 2000
    constraints: {}

Network Optimization

gRPC Keep-Alive

services:
  frontend:
    rpc:
      keepAliveServerConfig:
        keepAliveServerParameters:
          maxConnectionIdle: "15m"
          maxConnectionAge: "30m"
          maxConnectionAgeGrace: "5m"
          keepAliveTime: "30s"
          keepAliveTimeout: "10s"
        keepAliveEnforcementPolicy:
          minTime: "10s"
          permitWithoutStream: true
      clientConnectionConfig:
        keepAliveClientParameters:
          keepAliveTime: "30s"
          keepAliveTimeout: "10s"
          keepAlivePermitWithoutStream: true

Connection Limits

services:
  frontend:
    rpc:
      grpcPort: 7233
      # Set OS-level limits:
      # - Max open files: 65536+
      # - TCP backlog: 4096+

Monitoring for Scaling

Key Metrics

Throughput:
# Workflow starts per second
rate(service_requests{operation="StartWorkflowExecution"}[5m])

# Task completions per second
rate(service_requests{operation=~"RespondWorkflowTaskCompleted|RespondActivityTaskCompleted"}[5m])
Latency:
# API latency p99
histogram_quantile(0.99, rate(service_latency_bucket[5m]))

# Persistence latency p99
histogram_quantile(0.99, rate(UpdateWorkflowExecution_latency_bucket[5m]))
Resource Utilization:
# Pending requests (should be low)
service_pending_requests

# Active connections
service_grpc_conn_active

# Lock contention
rate(lock_latency_sum[5m]) / rate(lock_latency_count[5m])

Scaling Decision Matrix

| Symptom | Scale | Configuration |
|---|---|---|
| High frontend latency | Frontend nodes | Add instances |
| High persistence latency | Database | Vertical scale or add nodes |
| Shard ownership changes | History nodes | Add instances |
| Task dispatch delays | Matching nodes | Add instances |
| CPU > 80% | Service nodes | Add instances or increase CPU |
| Memory > 85% | Service nodes | Add instances or increase memory |
| High cache misses | History cache | Increase cacheMaxSize |
| DLQ messages | Worker nodes | Add instances |

Performance Best Practices

1. Shard Count Planning

Choose shard count at deployment time:
Low volume (< 100 workflows/sec): 512 shards
Medium (100-1000/sec): 2048 shards
High (> 1000/sec): 4096-8192 shards
Note: Cannot change shard count after deployment.
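The tiers above as a simple lookup, for use in provisioning scripts. A sketch; the thresholds are the guideline numbers from this section, and when in doubt round up, since the choice is permanent:

```go
package main

import "fmt"

// suggestedShards maps expected workflow starts/sec to the shard tiers above.
// Pick once: numHistoryShards cannot be changed after cluster creation.
func suggestedShards(workflowsPerSec int) int {
	switch {
	case workflowsPerSec < 100:
		return 512
	case workflowsPerSec <= 1000:
		return 2048
	default:
		return 4096 // consider 8192 for sustained very high throughput
	}
}

func main() {
	fmt.Println(suggestedShards(50))   // 512
	fmt.Println(suggestedShards(500))  // 2048
	fmt.Println(suggestedShards(5000)) // 4096
}
```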

2. Task Queue Design

  • Use fewer, busier task queues vs. many idle queues
  • Limit to < 10,000 unique task queues per cluster
  • Use task queue routing for versioning

3. History Growth

  • Use Continue-As-New for long-running workflows
  • Keep workflow histories < 50KB when possible
  • Monitor history_size and history_count metrics

4. Namespace Organization

  • Separate high and low priority workloads
  • Use namespaces for isolation and rate limiting
  • Monitor per-namespace metrics

5. Batch Operations

  • Use batch API operations where available
  • Reduce individual RPC calls
  • Bundle signal sends when possible

6. Client Connection Pooling

In SDK clients:
c, err := client.Dial(client.Options{  // client.Dial supersedes the deprecated client.NewClient
    HostPort: "frontend:7233",
    ConnectionOptions: client.ConnectionOptions{
        KeepAliveTime:    30 * time.Second,
        KeepAliveTimeout: 10 * time.Second,
    },
})

Load Testing

Benchmarking Setup

  1. Start with baseline load
  2. Gradually increase by 20% every 30 minutes
  3. Monitor all metrics
  4. Identify bottlenecks
  5. Scale and repeat

Test Scenarios

Scenario 1: High Throughput
  • Metric: Workflow starts/sec
  • Target: Saturate frontend/history
Scenario 2: High Active Workflows
  • Metric: Concurrent workflow executions
  • Target: Saturate history service memory
Scenario 3: High Task Dispatch
  • Metric: Task polls/sec
  • Target: Saturate matching service
Scenario 4: Large Workflows
  • Metric: History size
  • Target: Test persistence limits

Troubleshooting Performance

High Latency

  1. Check persistence latency
  2. Review cache hit rates
  3. Verify network connectivity
  4. Check for lock contention

Memory Issues

  1. Reduce cache sizes
  2. Decrease workflow retention
  3. Add more history nodes
  4. Review workflow history sizes

CPU Saturation

  1. Add service instances
  2. Optimize workflow code
  3. Reduce task processing frequency
  4. Check for inefficient queries
