
Worker Service (Internal)

The Worker Service is an internal component of the Temporal cluster that hosts background processing tasks. Unlike user-hosted workers that execute workflow and activity code, the Worker Service performs cluster maintenance and system operations.
Do not confuse the Worker Service (internal cluster component) with Worker processes (user-hosted processes that execute workflow/activity code).

Overview

The Worker Service provides a framework for running internal Temporal Workflows and Activities that handle:
  • Cross-cluster replication
  • Workflow history archival
  • History and task queue scanning
  • Batch operations
  • System maintenance tasks

Core Components

Replicator

Handles cross-cluster replication in multi-region deployments.

Consumes replication tasks generated by remote Temporal clusters:
  • History replication tasks (workflow events)
  • Sync shard status tasks
  • Sync activity tasks
  • Namespace replication tasks
Resolves conflicts when the same workflow is modified in multiple clusters:
  • Version vectors for causality tracking
  • Last-write-wins with version comparison
  • Handles namespace failover scenarios
Maintains a queue of pending replication tasks:
  • Persisted to handle worker restarts
  • Processed in order per workflow execution
  • Retries on failure with backoff
// Code entrypoint: service/worker/replicator/
// Replication task processing logic
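The last-write-wins comparison above can be sketched as follows. This is a hypothetical simplification: the `branch` type and `resolveConflict` function are illustrative names, and the real resolver in service/worker/replicator/ performs full event-branch reconciliation rather than a two-field comparison.

```go
// branch is an illustrative stand-in for a workflow history branch.
type branch struct {
	Version     int64 // failover version assigned by the cluster that wrote it
	LastEventID int64 // length of the history branch
}

// resolveConflict picks the winning branch: the higher failover version wins;
// on a tie, the longer history wins; otherwise the local branch is kept.
func resolveConflict(local, remote branch) branch {
	if remote.Version > local.Version {
		return remote
	}
	if remote.Version == local.Version && remote.LastEventID > local.LastEventID {
		return remote
	}
	return local
}
```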

Multi-Cluster Setup

For cross-cluster replication:
  1. Configure remote cluster endpoints
  2. Create global namespaces with multiple clusters
  3. Replicator pulls replication tasks from remote clusters
  4. Tasks are applied to local History Service
# Example: Connecting two clusters
tctl --address 127.0.0.1:7233 adm cluster add-remote \
  --frontend_address "localhost:8233"

tctl --address 127.0.0.1:8233 adm cluster add-remote \
  --frontend_address "localhost:7233"
See the Worker Service README for a development quickstart.

Scanner

Performs maintenance scans on cluster data:

Executions Scanner

  • History Scavenger: Cleans up orphaned history branches
  • Execution Fixer: Repairs corrupted workflow executions
  • Concrete Execution Scanner: Validates execution data integrity

Task Queue Scanner

  • Cleans up stale task queues
  • Removes abandoned partitions
  • Consolidates task queue metadata

History Archival

Archives completed workflow histories to long-term storage:
  • Moves history from primary database to archival storage (S3, GCS, etc.)
  • Configurable retention period before archival
  • Reduces load on primary database
  • Enables compliance and auditing
// Code entrypoint: service/worker/scanner/
// Scanner workflow and activity implementations

Configuration

Scanners are configured via dynamic config:
  • Scan interval and batch size
  • Enable/disable specific scanners
  • Concurrent execution limits
  • Error handling policies
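A dynamic config fragment for the scanners might look like the following. This is illustrative only: key names vary by server version, so check the dynamic config reference for your release before using them.

```yaml
# Illustrative dynamic config fragment (key names vary by server version)
worker.executionsScannerEnabled:
  - value: true
worker.taskQueueScannerEnabled:
  - value: true
```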

Batcher

Processes batch operations across multiple workflows.

Use cases:
  • Batch signal workflows matching a query
  • Batch cancel workflows
  • Batch terminate workflows
  • Batch reset workflows
Operation:
  1. Query visibility to find target workflows
  2. Process workflows in batches
  3. Rate limit to avoid cluster overload
  4. Report progress and errors
// Code entrypoint: service/worker/batcher/
// Batch operation workflow and activities
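The rate-limiting step above can be sketched as fixed-interval pacing. `processPaced` is a hypothetical helper: the real batcher's limit is configurable (see `BatcherRPS` in the service configuration), and a durable workflow would use `workflow.Sleep` rather than `time.Sleep`.

```go
import "time"

// processPaced applies op to each batch, spacing calls so at most rps batches
// run per second. Hypothetical helper for illustration only.
func processPaced(batches [][]string, rps int, op func(batch []string)) {
	interval := time.Second / time.Duration(rps)
	for i, b := range batches {
		if i > 0 {
			time.Sleep(interval) // pause between batches to respect the limit
		}
		op(b)
	}
}
```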

Batch Workflow

The batcher runs as a workflow:
// Simplified batch workflow structure:
func BatchWorkflow(ctx workflow.Context, params BatchParams) error {
    // 1. Query visibility for target workflows
    workflows := QueryVisibility(params.Query)

    // 2. Process in batches via activities
    for _, batch := range Paginate(workflows, params.BatchSize) {
        err := workflow.ExecuteActivity(ctx, activities.ProcessBatch, batch, params.Operation).Get(ctx, nil)
        if err != nil {
            return err
        }
    }

    return nil
}
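A `Paginate` helper like the one referenced above might look like this. It is a hypothetical sketch: the real batcher pages through visibility results incrementally rather than materializing every workflow ID up front.

```go
// Paginate splits workflow IDs into fixed-size batches. Assumes size > 0.
func Paginate(ids []string, size int) [][]string {
	if size <= 0 {
		return nil
	}
	var batches [][]string
	for len(ids) > 0 {
		n := size
		if n > len(ids) {
			n = len(ids)
		}
		batches = append(batches, ids[:n])
		ids = ids[n:]
	}
	return batches
}
```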

Parent Close Policy Worker

Handles child workflow cleanup when parent workflows close.

Parent close policies:
  • ABANDON: Leave child workflows running
  • TERMINATE: Terminate all child workflows
  • REQUEST_CANCEL: Send cancel request to children
Implementation:
  • Triggered when parent workflow completes
  • Queries for child workflows
  • Applies policy to each child
  • Retries on failure
// Code entrypoint: service/worker/parentclosepolicy/
// Parent close policy workflow
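Applying the policy to each child can be sketched as a switch over the three cases. The types and names here are hypothetical; the real implementation issues terminate/cancel requests through the History Service and retries failures.

```go
// Illustrative parent close policy values.
type closePolicy int

const (
	policyAbandon closePolicy = iota
	policyTerminate
	policyRequestCancel
)

// applyParentClosePolicy applies the parent's policy to each child workflow ID.
func applyParentClosePolicy(p closePolicy, children []string, terminate, cancel func(id string)) {
	switch p {
	case policyAbandon:
		// leave children running
	case policyTerminate:
		for _, id := range children {
			terminate(id)
		}
	case policyRequestCancel:
		for _, id := range children {
			cancel(id)
		}
	}
}
```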

Per-Namespace Workers

Dynamically creates and manages workers for specific namespaces.

Features:
  • Worker count per namespace is configurable
  • Workers start/stop dynamically based on config
  • Each namespace can have custom worker options
  • Used for namespace-specific workflows (e.g., migrations)
// Code entrypoint: service/worker/pernamespaceworker.go
// Per-namespace worker management
Configuration:
# Dynamic config example
worker.perNamespaceWorkerCount:
  - namespace: special-namespace
    value: 5  # Run 5 workers for this namespace

worker.perNamespaceWorkerOptions:
  - namespace: special-namespace
    value:
      maxConcurrentActivityExecutionSize: 100
      maxConcurrentWorkflowTaskExecutionSize: 50
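Dynamic start/stop on config change can be sketched as a reconcile loop. The names here are hypothetical; see service/worker/pernamespaceworker.go for the real lifecycle logic.

```go
// reconcileWorkers starts or stops workers so each namespace runs its
// desired count, and stops workers for namespaces removed from config.
func reconcileWorkers(running map[string]int, desired map[string]int, start, stop func(ns string)) {
	for ns, want := range desired {
		for running[ns] < want {
			start(ns)
			running[ns]++
		}
		for running[ns] > want {
			stop(ns)
			running[ns]--
		}
	}
	for ns, have := range running {
		if _, ok := desired[ns]; !ok {
			for i := 0; i < have; i++ {
				stop(ns)
			}
			running[ns] = 0
		}
	}
}
```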

Worker Manager

The Worker Service includes a workerManager that:
  • Initializes and starts all internal workers
  • Manages worker lifecycle (start, stop, health check)
  • Coordinates shutdown during service termination
  • Handles worker registration with the cluster
// Code entrypoint: service/worker/worker.go
// WorkerManager implementation

SDK Client Factory

The Worker Service uses the Temporal SDK to run internal workflows:
  • Creates SDK clients connected to local cluster
  • Configures workers with appropriate task queues
  • Handles authentication and TLS
  • Manages connection pooling
// Internal workers use the same SDK as user workers
// This ensures consistency and leverages SDK features

System Namespaces

Internal workers operate on special system namespaces:
  • temporal-system: Main system namespace for internal workflows
  • Scanner workflows: Run in temporal-system
  • Batcher workflows: Run in temporal-system
  • Isolated from user namespaces

Task Queues

Internal workers poll specific task queues:
  • Replicator: temporal-replicator task queue
  • Scanner: temporal-scanner-taskqueue-{type} task queues
  • Batcher: temporal-sys-batcher-taskqueue
  • Parent Close Policy: temporal-sys-pcp-taskqueue

Service Configuration

// Code entrypoint: service/worker/service.go
type Config struct {
    // Scanner configuration
    ScannerCfg *scanner.Config
    
    // Parent close policy configuration
    ParentCloseCfg *parentclosepolicy.Config
    
    // Rate limiting
    PersistenceMaxQPS int
    
    // Batcher configuration
    EnableBatcher        bool
    BatcherRPS           int  // Per-namespace
    BatcherConcurrency   int  // Per-namespace
    
    // Per-namespace workers
    PerNamespaceWorkerCount   int  // Per-namespace
    PerNamespaceWorkerOptions sdkworker.Options
}

Failure Handling

Worker Restarts

Internal workers are resilient to restarts:
  • Workflows are durable and resume after restart
  • In-progress activities are retried
  • Replication queue persists pending tasks
  • Scanner state is checkpointed

Error Handling

Internal workflows implement retry policies:
  • Transient errors: Exponential backoff retry
  • Permanent errors: Fail workflow and alert
  • Rate limiting: Back off and retry

Monitoring

The Worker Service emits metrics:
  • Replication lag per remote cluster
  • Scanner progress and error rates
  • Batcher operation success/failure
  • Per-namespace worker health

Deployment Considerations

Dedicated Worker Service Instances

In production:
  • Run Worker Service on dedicated instances
  • Separate from Frontend/History/Matching for resource isolation
  • Scale independently based on workload
  • Consider resource-intensive tasks (scanner, replicator)

Resource Requirements

Worker Service needs:
  • CPU: For workflow execution and activity processing
  • Memory: For SDK worker caches and in-flight workflows
  • Network: For replication streams and archival uploads
  • Disk: For local activity retries and temporary storage

High Availability

Worker Service instances:
  • Multiple instances can run concurrently
  • Work is distributed via task queues (Matching Service)
  • No single point of failure
  • Automatic failover on instance failure

Development and Testing

Local Development

To run Worker Service locally:
# Start development server with all services
make start

# Worker Service starts automatically
# Logs show internal worker activity

Adding New Internal Workflows

To add a new internal workflow:
  1. Create workflow and activity implementations
  2. Register with worker manager
  3. Configure task queue and worker options
  4. Add metrics and logging
  5. Write tests
// Example: Registering a new internal worker
func (s *Service) Start() {
    s.workerManager.RegisterWorker(
        "my-internal-workflow",
        "my-task-queue",
        myWorkflow,
        myActivities,
    )
}

Further Reading

History Service

How workflow executions are managed

Matching Service

Task distribution for internal workers

Architecture Overview

High-level system architecture

Workflow Lifecycle

How workflows execute
