
Workers

Workers are the compute processes that execute your workflow and activity code. They run in your infrastructure, continuously polling Temporal Server for tasks and executing them using the Temporal SDK.

Worker Architecture

A worker process is a long-running application that:
  1. Registers workflow and activity implementations with the SDK
  2. Polls the Temporal Server for workflow and activity tasks
  3. Executes tasks by running your code
  4. Reports results back to the server
  5. Repeats continuously until shut down
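That loop can be sketched with illustrative (not real-SDK) interfaces; `Task`, `Server`, `runWorker`, and `fakeServer` below are all hypothetical names for the sketch:

```go
package main

import "fmt"

// Task is a simplified stand-in for a workflow or activity task.
type Task struct {
	Token string
	Input string
}

// Server abstracts the endpoints the worker talks to. Method names are
// illustrative; the real API is the Temporal gRPC service via the SDK.
type Server interface {
	PollTask() (*Task, bool) // long-poll; false means shut down (sketch only)
	RespondCompleted(token, result string)
}

// runWorker is the worker's core loop: poll, execute, respond, repeat.
// It returns the number of tasks processed once PollTask signals shutdown.
func runWorker(s Server, handler func(input string) string) int {
	n := 0
	for {
		task, ok := s.PollTask() // blocks until a task arrives (long-poll)
		if !ok {
			return n // real workers re-poll forever; the sketch stops here
		}
		s.RespondCompleted(task.Token, handler(task.Input)) // execute and report
		n++
	}
}

// fakeServer feeds a fixed task list and records completions.
type fakeServer struct {
	tasks   []*Task
	results map[string]string
}

func (f *fakeServer) PollTask() (*Task, bool) {
	if len(f.tasks) == 0 {
		return nil, false
	}
	t := f.tasks[0]
	f.tasks = f.tasks[1:]
	return t, true
}

func (f *fakeServer) RespondCompleted(token, result string) {
	f.results[token] = result
}

func main() {
	fs := &fakeServer{
		tasks:   []*Task{{Token: "t1", Input: "hello"}, {Token: "t2", Input: "world"}},
		results: map[string]string{},
	}
	n := runWorker(fs, func(in string) string { return in + "!" })
	fmt.Println(n, fs.results["t1"], fs.results["t2"]) // 2 hello! world!
}
```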

Polling Mechanism

Workers use long-polling to receive tasks from the server with minimal latency:

Long-Poll Request

  1. Worker sends poll request: the worker makes a PollWorkflowTaskQueue or PollActivityTaskQueue RPC to the Frontend Service
  2. Frontend routes to Matching Service: the request is routed to the Matching Service instance responsible for that task queue
  3. Matching waits for a task: if no task is immediately available, the Matching Service holds the connection open (long-poll) until one arrives
  4. Task arrives: when the History Service dispatches a task, it is matched with the waiting poller
  5. Task returned to worker: the Matching Service returns the task details and closes the poll request
// From service/matching/matcher.go
type TaskMatcher struct {
    // Channels for matching tasks with pollers
    taskC      chan *internalTask  // Regular tasks
    queryTaskC chan *internalTask  // Query tasks
    
    // Forwards tasks to the parent partition when no local poller is waiting
    fwdr *Forwarder
    
    // Track when pollers were last seen
    lastPoller atomic.Int64  // unix nanos of most recent poll
    
    // Rate limiting for task dispatch
    rateLimiter quotas.RateLimiter
}

// Offer matches a task with a waiting poller
func (tm *TaskMatcher) Offer(ctx context.Context, task *internalTask) (bool, error) {
    select {
    case tm.taskC <- task:  // Poller waiting, immediate match
        return true, nil
    default:
        // No poller available, try forwarding to parent partition
        return tm.fwdr.ForwardTask(ctx, task)
    }
}

Sync Match vs Async Match

Sync Match

Instant delivery: Task dispatched directly to waiting poller without touching database. Lowest latency path (~1ms).

Async Match

Queued delivery: No poller available, task written to database and retrieved later when poller arrives. Higher latency but ensures delivery.
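A minimal sketch of the two paths, where a buffered channel stands in for one waiting poller and a slice stands in for the database-backed backlog (`matchOrQueue` and both parameter names are illustrative):

```go
package main

import "fmt"

type internalTask struct{ ID string }

// matchOrQueue models the Matching Service's decision: hand the task to a
// waiting poller (sync match) or persist it for later (async match).
func matchOrQueue(taskC chan *internalTask, backlog *[]*internalTask, t *internalTask) string {
	select {
	case taskC <- t:
		return "sync" // a poller slot was free: delivered without touching the database
	default:
		*backlog = append(*backlog, t) // no poller: write to the backlog for later pickup
		return "async"
	}
}

func main() {
	taskC := make(chan *internalTask, 1) // one "waiting poller"
	var backlog []*internalTask
	fmt.Println(matchOrQueue(taskC, &backlog, &internalTask{ID: "a"})) // sync
	fmt.Println(matchOrQueue(taskC, &backlog, &internalTask{ID: "b"})) // async
	fmt.Println(len(backlog))                                          // 1
}
```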

Workflow Task Execution

When a worker receives a workflow task:

Execution Flow

  1. Worker receives task: the task contains the workflow execution history (all events since start)
  2. SDK replays history: the SDK feeds history events through the workflow code to reconstruct current state; the code must be deterministic
  3. Workflow code executes: the code runs until it blocks (waiting for an activity, timer, or signal) or completes
  4. Generate commands: the SDK collects commands such as ScheduleActivityTask, StartTimer, and CompleteWorkflowExecution
  5. Send response: the worker sends RespondWorkflowTaskCompleted with the command list to the History Service
type RespondWorkflowTaskCompletedRequest struct {
    TaskToken []byte      // Identifies which task this is completing
    Commands  []*Command  // List of commands to execute
    
    // Possible commands:
    // - ScheduleActivityTask
    // - StartTimer 
    // - CompleteWorkflowExecution
    // - FailWorkflowExecution
    // - CancelTimer
    // - RequestCancelActivity
    // - StartChildWorkflowExecution
    // - SignalExternalWorkflowExecution
    // - ContinueAsNewWorkflowExecution
    // - and more...
}

Workflow Task Failures

Workflow tasks can fail for several reasons:

Non-Determinism

Workflow code produced different commands during replay. SDK detects this and fails the task.

Worker Crash

Worker dies during execution. Task timeout fires and task is retried on another worker.

Transient Error

Temporary failure like network issue. Task automatically retried with backoff.

Bad Code

Unhandled exception in workflow code. Can be configured to fail workflow or retry task.
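The non-determinism case can be made concrete: during replay, the commands the workflow code produces must match the commands already recorded in history, in order. A simplified sketch of that check (`checkDeterminism` is a hypothetical name, not the real SDK logic):

```go
package main

import "fmt"

// checkDeterminism compares recorded history commands against the commands
// produced on replay. Any mismatch means the code is non-deterministic (or
// was changed incompatibly), and the workflow task is failed.
func checkDeterminism(recorded, replayed []string) error {
	if len(recorded) != len(replayed) {
		return fmt.Errorf("non-determinism: %d recorded vs %d replayed commands",
			len(recorded), len(replayed))
	}
	for i := range recorded {
		if recorded[i] != replayed[i] {
			return fmt.Errorf("non-determinism at command %d: recorded %q, replayed %q",
				i, recorded[i], replayed[i])
		}
	}
	return nil
}

func main() {
	history := []string{"ScheduleActivityTask", "StartTimer"}
	fmt.Println(checkDeterminism(history, []string{"ScheduleActivityTask", "StartTimer"}))
	fmt.Println(checkDeterminism(history, []string{"StartTimer", "ScheduleActivityTask"}))
}
```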

Activity Task Execution

Activity execution is simpler than workflow execution:
  1. Worker receives activity task: the task contains the activity input, attempt number, and heartbeat details (if resuming)
  2. Execute activity function: the activity code runs with full access to I/O, databases, APIs, etc.; no determinism is required
  3. Send heartbeats (optional): long-running activities periodically heartbeat to prove they're alive
  4. Return result or error: the worker sends RespondActivityTaskCompleted or RespondActivityTaskFailed
// Activity execution is straightforward
func (w *Worker) executeActivity(task *ActivityTask) {
    // Get registered activity function
    activityFn := w.registry.GetActivity(task.ActivityType)
    
    // Execute with timeout and context
    ctx, cancel := context.WithTimeout(context.Background(), task.StartToCloseTimeout)
    defer cancel()
    
    result, err := activityFn(ctx, task.Input)
    
    // Report the result using a fresh context: the activity's own context
    // may already be expired or canceled
    reportCtx := context.Background()
    if err != nil {
        w.client.RespondActivityTaskFailed(reportCtx, task.TaskToken, err)
    } else {
        w.client.RespondActivityTaskCompleted(reportCtx, task.TaskToken, result)
    }
}
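Heartbeating (step 3) is also how progress survives retries: each heartbeat can carry a payload, and a retried attempt receives the last recorded payload so it can resume instead of starting over. A sketch with a hypothetical `processItems` helper:

```go
package main

import "fmt"

// processItems models a heartbeating activity: each heartbeat carries the
// index of the last completed item. On retry, the caller passes the last
// recorded heartbeat as resumeFrom so completed work is skipped.
func processItems(items []string, resumeFrom int, heartbeat func(progress int)) int {
	for i := resumeFrom; i < len(items); i++ {
		// ... process items[i] ...
		heartbeat(i + 1) // the server keeps only the latest heartbeat payload
	}
	return len(items)
}

func main() {
	var lastHeartbeat int
	record := func(p int) { lastHeartbeat = p }

	processItems([]string{"a", "b", "c"}, 0, record)
	fmt.Println(lastHeartbeat) // 3: all items done

	// A retried attempt reads the last heartbeat and resumes from there.
	processItems([]string{"a", "b", "c"}, lastHeartbeat, record)
	fmt.Println(lastHeartbeat) // still 3: nothing reprocessed
}
```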

Worker Configuration

Workers have several important configuration parameters:

Concurrency Settings

  • Workflow task concurrency: controls how many workflow tasks can execute simultaneously; each task runs in its own goroutine/thread. Default: 100. Considerations: workflow tasks are typically CPU-bound (replay logic), so scale based on CPU cores.
  • Activity task concurrency: controls how many activity tasks can execute simultaneously. Default: 100. Considerations: activities often wait on I/O, so this can typically be higher than workflow task concurrency.
  • Local activity concurrency: controls local activity execution parallelism. Default: 100. Considerations: local activities execute inline with the workflow task, sharing its resources.
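In the Go SDK, these three limits roughly correspond to fields on `worker.Options` (from `go.temporal.io/sdk/worker`); verify the exact names against your SDK version. The client variable and task queue name below are placeholders:

```go
// Sketch of setting the three concurrency limits with the Go SDK.
w := worker.New(temporalClient, "orders-task-queue", worker.Options{
	MaxConcurrentWorkflowTaskExecutionSize:  100, // workflow tasks: CPU-bound, scale with cores
	MaxConcurrentActivityExecutionSize:      500, // activities: often I/O-bound, can go higher
	MaxConcurrentLocalActivityExecutionSize: 100, // local activities run inline with workflow tasks
})
```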

Polling Configuration

type WorkerOptions struct {
    MaxConcurrentWorkflowTaskPollers int  // Number of parallel pollers
    MaxConcurrentActivityTaskPollers int  // Number of parallel pollers
    
    WorkflowPollersCount int  // Deprecated, use above
    ActivityPollersCount int  // Deprecated, use above
    
    // How long to wait for task before considering poll timed out
    WorkflowTaskTimeout time.Duration
    
    // Other options...
}
More pollers increase throughput but also increase load on the Matching Service. Typical values: 2-10 pollers per worker process.

Worker Identity and Tracking

Each worker has an identity used for observability:
  • Worker Identity: Unique identifier (hostname + process ID + UUID)
  • Binary Checksum: Hash of worker binary for detecting bad deployments
  • Build ID: Worker version for routing workflows to compatible workers
// Server tracks which worker executed each task
type ActivityInfo struct {
    StartedIdentity         string  // Which worker started this attempt
    RetryLastWorkerIdentity string  // Which worker tried last attempt
    LastWorkerDeploymentVersion *deploymentpb.WorkerDeploymentVersion
    // ...
}

Sticky Execution

For performance, Temporal uses “sticky execution” to route workflow tasks back to the same worker:

How Sticky Execution Works

  1. Worker caches workflow state: after completing a workflow task, the worker keeps the workflow's state in memory
  2. Worker specifies sticky task queue: the response includes a sticky task queue name unique to this worker
  3. Next task routed to sticky queue: the History Service dispatches the next workflow task to the sticky queue first
  4. Fast path if worker available: the same worker picks up the task, skips replay, and resumes from cached state
  5. Fallback to normal queue: if the worker is unavailable after a timeout, the task is dispatched to the normal queue
  • Performance: skips replaying history (can save 100ms+ for large histories)
  • Reduced Server Load: no need to fetch and transmit the full history
  • Lower Latency: workflow tasks complete faster, so workflows make progress quicker
  • Tradeoff: workflow execution is pinned to a specific worker; if the worker dies, a new worker must do a full replay
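The cache decision itself is simple; a sketch with a hypothetical `eventsToApply` helper (a simplification of the real SDK cache):

```go
package main

import "fmt"

// eventsToApply models the sticky fast path: if the worker's in-memory
// cache has already applied a prefix of this workflow's history, only the
// new events need to be applied; otherwise the full history is replayed.
func eventsToApply(appliedCount int, history []string) []string {
	if appliedCount > 0 && appliedCount <= len(history) {
		return history[appliedCount:] // cache hit: apply only new events
	}
	return history // cache miss (e.g. worker restarted): full replay
}

func main() {
	history := []string{"WorkflowStarted", "ActivityScheduled", "ActivityCompleted"}
	fmt.Println(len(eventsToApply(2, history))) // sticky hit: 1 new event
	fmt.Println(len(eventsToApply(0, history))) // cache miss: 3 events replayed
}
```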

Worker Failure Modes

Worker Process Crash

Impact: In-flight tasks time out and retry on other workers; the sticky cache is lost, requiring history replay. Mitigation: Workflow tasks are automatically redistributed; minimal impact if the worker pool is healthy.

Worker Hangs

Impact: Tasks never complete, timeouts fire, and tasks are retried elsewhere. Mitigation: Set an appropriate workflow task timeout and monitor task latency.

Network Partition

Impact: The worker can't reach the server, so it can't poll for tasks or report results. Mitigation: Workers retry with backoff; tasks time out and retry on healthy workers.

Bad Deployment

Impact: All workers deploy broken code and all tasks fail. Mitigation: Binary checksum tracking, gradual rollouts, and automated rollback.

Worker Versioning

Temporal supports routing tasks based on worker version:

Build ID-Based Versioning

  • Workers register with a Build ID (e.g., git commit, version number)
  • Workflows can be pinned to specific Build ID sets
  • New workflows use latest Build ID
  • Running workflows stay on their Build ID or migrate per rules
Use worker versioning for safe, gradual deployments of workflow code changes without disrupting running workflows.
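The core routing rule above can be sketched with a hypothetical `pickBuildID` helper; real versioning rules are richer (compatible sets, ramping), so treat this as an illustration only:

```go
package main

import "fmt"

// pickBuildID: a running workflow stays pinned to the Build ID that started
// it; a new workflow (no pinned ID yet) gets the latest Build ID.
func pickBuildID(pinned, latest string) string {
	if pinned != "" {
		return pinned
	}
	return latest
}

func main() {
	fmt.Println(pickBuildID("", "v2.1"))     // new workflow: latest build
	fmt.Println(pickBuildID("v2.0", "v2.1")) // running workflow: stays pinned
}
```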

Monitoring Workers

Key metrics to monitor:
  • Poll Success Rate: Are workers successfully getting tasks?
  • Task Execution Latency: How long do tasks take?
  • Task Failure Rate: How many tasks are failing?
  • Worker Count: How many workers are active?
  • Sticky Cache Hit Rate: How often is sticky execution working?
  • Queue Backlog: Are tasks piling up?
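Two of these are ratios derived from raw counters; a small sketch (counter and function names are illustrative, not actual Temporal metric names):

```go
package main

import "fmt"

// pollSuccessRate: fraction of poll requests that returned a task.
func pollSuccessRate(successfulPolls, totalPolls int) float64 {
	if totalPolls == 0 {
		return 0
	}
	return float64(successfulPolls) / float64(totalPolls)
}

// stickyHitRate: fraction of workflow tasks served from the sticky cache.
func stickyHitRate(stickyStarts, workflowTaskStarts int) float64 {
	if workflowTaskStarts == 0 {
		return 0
	}
	return float64(stickyStarts) / float64(workflowTaskStarts)
}

func main() {
	fmt.Printf("poll success: %.2f\n", pollSuccessRate(950, 1000))
	fmt.Printf("sticky hits:  %.2f\n", stickyHitRate(80, 100))
}
```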
