Workers
Workers are the compute processes that execute your workflow and activity code. They run in your infrastructure, continuously polling the Temporal Server for tasks and executing them using the Temporal SDK.
Worker Architecture
A worker process is a long-running application that:
- Registers workflow and activity implementations with the SDK
- Polls the Temporal Server for workflow and activity tasks
- Executes tasks by running your code
- Reports results back to the server
- Repeats continuously until shut down
Polling Mechanism
Workers use long-polling to receive tasks from the server with minimal latency:
Long-Poll Request
Worker sends poll request
Worker makes a PollWorkflowTaskQueue or PollActivityTaskQueue RPC to the Frontend Service
Frontend routes to Matching Service
Request is routed to the Matching Service instance responsible for that task queue
Matching waits for task
If no task immediately available, Matching Service holds the connection open (long-poll) waiting for a task
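The long-poll contract described above can be sketched with an in-process queue standing in for the Matching Service: the poll call blocks until a task arrives or a deadline expires, so a task produced mid-poll is delivered immediately. This is an illustrative simulation, not the Temporal SDK API; all names here are hypothetical.

```python
import queue
import threading

# Hypothetical stand-in for the Matching Service: a queue the "worker"
# long-polls. The poll blocks until a task arrives or the timeout fires,
# in which case the server returns an empty poll response.
task_queue = queue.Queue()

def long_poll(q, timeout_seconds):
    """Block waiting for a task; return None on timeout (empty poll response)."""
    try:
        return q.get(timeout=timeout_seconds)
    except queue.Empty:
        return None

# A timer thread stands in for the server dispatching a task while the
# poll connection is being held open.
threading.Timer(0.1, task_queue.put, args=({"task": "workflow-task-1"},)).start()

result = long_poll(task_queue, timeout_seconds=5)
print(result)  # the task delivered while the poll was held open
```

Because the connection is already open when the task arrives, delivery latency is just the wake-up of the blocked call, which is what makes long-polling cheaper than tight polling loops.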
Sync Match vs Async Match
Sync Match
Instant delivery: Task dispatched directly to waiting poller without touching database. Lowest latency path (~1ms).
Async Match
Queued delivery: No poller available, task written to database and retrieved later when poller arrives. Higher latency but ensures delivery.
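The two match paths can be modeled with a parked-poller list and a database-backed backlog. This is a simplified illustration of the dispatch decision, not Temporal's internal implementation; the function and variable names are invented for the sketch.

```python
from collections import deque

waiting_pollers = deque()   # pollers currently parked in a long-poll
backlog = deque()           # stands in for the database-backed task queue

def dispatch(task):
    """Sync match if a poller is waiting; otherwise persist for later."""
    if waiting_pollers:
        poller = waiting_pollers.popleft()
        poller.append(task)          # sync match: no database write
        return "sync"
    backlog.append(task)             # async match: written, delivered later
    return "async"

def poll(poller):
    """Drain the backlog if possible; otherwise park and wait (long-poll)."""
    if backlog:
        poller.append(backlog.popleft())
        return True
    waiting_pollers.append(poller)
    return False

inbox = []
print(dispatch("task-A"))   # "async": no poller was waiting
poll(inbox)                 # backlog drained -> task-A delivered
poll(inbox)                 # backlog empty -> poller parks
print(dispatch("task-B"))   # "sync": handed straight to the parked poller
print(inbox)                # ['task-A', 'task-B']
```

Both tasks are delivered either way; the difference is whether a database round-trip sits on the latency path.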
Workflow Task Execution
When a worker receives a workflow task:
Execution Flow
SDK replays history
SDK feeds history events through workflow code to reconstruct current state. Must be deterministic.
Generate commands
SDK collects commands like ScheduleActivityTask, StartTimer, CompleteWorkflowExecution
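The replay-then-command cycle can be sketched with a toy workflow function. Real SDKs drive this through coroutines and an event loop; here, re-running the same deterministic function over a richer history produces the next batch of commands, with recorded results substituted instead of re-executing activities. The workflow shape and event names are illustrative.

```python
def order_workflow(history):
    """Toy deterministic workflow: schedule one activity, then complete.

    Re-running this function over history always yields the same commands,
    which is what makes replay-based reconstruction possible.
    """
    commands = []
    if "ActivityCompleted" in history:
        result = history["ActivityCompleted"]  # substituted from history
        commands.append(("CompleteWorkflowExecution", result))
    else:
        commands.append(("ScheduleActivityTask", "charge_card"))
    return commands

# First workflow task: empty history -> the SDK emits a schedule command.
first = order_workflow({})
print(first)   # [('ScheduleActivityTask', 'charge_card')]

# Later task: history now contains the activity result -> complete.
second = order_workflow({"ActivityCompleted": "charged"})
print(second)  # [('CompleteWorkflowExecution', 'charged')]
```

Any non-determinism (e.g. branching on wall-clock time) would make the replayed commands diverge from history, which is exactly the failure mode described below.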
Workflow Task Response Structure
Workflow Task Failures
Workflow tasks can fail for several reasons:
Non-Determinism
Workflow code produced different commands during replay. SDK detects this and fails the task.
Worker Crash
Worker dies during execution. Task timeout fires and task is retried on another worker.
Transient Error
Temporary failure like network issue. Task automatically retried with backoff.
Bad Code
Unhandled exception in workflow code. Can be configured to fail workflow or retry task.
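The non-determinism check above amounts to comparing the commands recorded in history against the commands the code produces on replay. A minimal sketch, with invented command strings:

```python
def detect_non_determinism(recorded_commands, replayed_commands):
    """Return an error message on the first mismatch, else None."""
    for i, (recorded, replayed) in enumerate(zip(recorded_commands, replayed_commands)):
        if recorded != replayed:
            return (f"non-determinism at command {i}: "
                    f"recorded {recorded!r}, got {replayed!r}")
    return None

history = ["ScheduleActivityTask:charge_card", "StartTimer:30s"]

# Deterministic code reproduces history exactly -> no error.
ok = detect_non_determinism(history, ["ScheduleActivityTask:charge_card",
                                      "StartTimer:30s"])
print(ok)  # None

# Code that, say, branched on wall-clock time emits a different command.
bad = detect_non_determinism(history, ["ScheduleActivityTask:charge_card",
                                       "StartTimer:60s"])
print(bad)
```

When the check fires, the SDK fails the workflow task rather than letting divergent state be committed.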
Activity Task Execution
Activity execution is simpler than workflow execution:
Worker receives activity task
Task contains activity input, attempt number, and heartbeat details (if resuming)
Execute activity function
Activity code runs with full access to I/O, databases, APIs, etc. No determinism required.
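Because activities are free to do I/O and fail, the server retries them with an incrementing attempt number and backoff. A minimal sketch of that retry loop, run worker-side here for illustration (in Temporal the retry schedule is driven by the server's retry policy, not by the worker):

```python
import time

def run_activity_with_retries(fn, max_attempts, backoff_base=0.0):
    """Call fn(attempt) until it succeeds or attempts are exhausted."""
    attempt = 1
    while True:
        try:
            return fn(attempt)
        except Exception:
            if attempt >= max_attempts:
                raise
            time.sleep(backoff_base * (2 ** (attempt - 1)))  # exponential backoff
            attempt += 1

calls = []

def flaky_charge(attempt):
    """An activity that fails twice, then succeeds -- no determinism needed."""
    calls.append(attempt)
    if attempt < 3:
        raise ConnectionError("payment gateway unreachable")
    return "charged"

result = run_activity_with_retries(flaky_charge, max_attempts=5)
print(result)  # charged
print(calls)   # [1, 2, 3]
```

The attempt number delivered with each task lets the activity code distinguish a fresh call from a retry, e.g. to make side effects idempotent.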
Worker Configuration
Workers have several important configuration parameters:
Concurrency Settings
Max Concurrent Workflow Tasks
Controls how many workflow tasks can execute simultaneously. Each task runs in its own goroutine/thread.
Default: 100
Considerations: Workflow tasks are typically CPU-bound (replay logic). Scale based on CPU cores.
Max Concurrent Activity Tasks
Controls how many activity tasks can execute simultaneously.
Default: 100
Considerations: Activities often wait on I/O. Can typically be higher than workflow task concurrency.
Max Concurrent Local Activity Tasks
Controls local activity execution parallelism.
Default: 100
Considerations: Local activities execute inline with workflow task, sharing its resources.
Polling Configuration
Each worker runs a configurable number of pollers. More pollers increase throughput but also increase load on the Matching Service. Typical values: 2-10 pollers per worker process.
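The interplay of the two knobs — poller count and max concurrent tasks — can be sketched with threads and a semaphore: several pollers pull tasks, but a semaphore caps how many execute at once. The values and names are illustrative, not Temporal defaults.

```python
import queue
import threading

MAX_CONCURRENT_TASKS = 2   # the "max concurrent tasks" cap
POLLER_COUNT = 4           # number of poller threads

tasks = queue.Queue()
for i in range(8):
    tasks.put(i)

slots = threading.Semaphore(MAX_CONCURRENT_TASKS)
lock = threading.Lock()
in_flight = 0
peak = 0
done = []

def poller():
    """Pull tasks until the queue drains; execution is semaphore-bounded."""
    global in_flight, peak
    while True:
        try:
            task = tasks.get_nowait()
        except queue.Empty:
            return
        with slots:                       # blocks when the cap is reached
            with lock:
                in_flight += 1
                peak = max(peak, in_flight)
            # ... task execution would happen here ...
            with lock:
                in_flight -= 1
                done.append(task)

threads = [threading.Thread(target=poller) for _ in range(POLLER_COUNT)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(sorted(done))                    # all 8 tasks executed
print(peak <= MAX_CONCURRENT_TASKS)    # True: cap never exceeded
```

This separation is why poller count governs how fast tasks arrive while the concurrency setting governs how many run, and why tuning one without the other can leave a worker either starved or overloaded.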
Worker Identity and Tracking
Each worker has an identity used for observability:
- Worker Identity: Unique identifier (hostname + process ID + UUID)
- Binary Checksum: Hash of worker binary for detecting bad deployments
- Build ID: Worker version for routing workflows to compatible workers
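A worker identity of the shape described above (hostname + process ID + UUID) can be composed in a few lines. The exact format varies by SDK; the `@`-separated layout here is just one plausible convention:

```python
import os
import socket
import uuid

# Illustrative identity format: hostname@pid@uuid. Real SDKs choose their
# own layout; the point is that the identity is unique per worker process.
worker_identity = f"{socket.gethostname()}@{os.getpid()}@{uuid.uuid4()}"
print(worker_identity)
```

Because the identity appears in task histories and server metrics, a stable, parseable format makes it easy to trace a misbehaving task back to the exact host and process that ran it.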
Sticky Execution
For performance, Temporal uses “sticky execution” to route workflow tasks back to the same worker:
How Sticky Execution Works
Worker caches workflow state
After completing a workflow task, worker keeps workflow state in memory
Next task routed to sticky queue
History Service dispatches next workflow task to sticky queue first
Sticky Execution Benefits
- Performance: Skip replaying history (can save 100ms+ for large histories)
- Reduced Server Load: No need to fetch and transmit full history
- Lower Latency: Workflow tasks complete faster, workflows make progress quicker
- Tradeoff: Workflow execution pinned to specific worker. If worker dies, new worker must do full replay.
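The cache-hit/cache-miss behavior can be sketched with a dictionary keyed by run ID: a hit applies only the new events, while a miss (e.g. after a worker crash) forces a full history replay. The state model and event names are invented for the illustration.

```python
sticky_cache = {}    # run_id -> in-memory workflow state
replay_counts = {}   # run_id -> how many full replays were needed

def apply_events(state, events):
    """Toy state model: workflow state is just the list of applied events."""
    return list(state) + list(events)

def handle_workflow_task(run_id, full_history, new_events):
    if run_id in sticky_cache:                       # sticky hit
        state = apply_events(sticky_cache[run_id], new_events)
    else:                                            # sticky miss: full replay
        replay_counts[run_id] = replay_counts.get(run_id, 0) + 1
        state = apply_events([], full_history)
    sticky_cache[run_id] = state
    return state

history = ["WorkflowStarted", "ActivityScheduled"]
handle_workflow_task("run-1", history, [])           # first task: full replay

history = history + ["ActivityCompleted"]
state = handle_workflow_task("run-1", history, ["ActivityCompleted"])  # hit
print(state)                    # full state rebuilt from cache + new event
print(replay_counts["run-1"])   # 1 -- only the first task replayed history

sticky_cache.clear()            # simulate worker crash: cache lost
handle_workflow_task("run-1", history, ["ActivityCompleted"])
print(replay_counts["run-1"])   # 2 -- the replacement worker replayed fully
```

The final step shows the tradeoff: losing the worker costs one full replay, after which stickiness resumes on the new worker.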
Worker Failure Modes
Worker Process Crash
Impact: In-flight tasks time out and retry on other workers. Sticky cache lost, requiring history replay.
Mitigation: Workflow tasks automatically redistributed. Minimal impact if worker pool is healthy.
Worker Hangs
Impact: Tasks never complete, timeouts fire, tasks retried elsewhere.
Mitigation: Set appropriate workflow task timeout. Monitor task latency.
Network Partition
Impact: Worker can’t reach server, can’t poll for tasks or report results.
Mitigation: Workers retry with backoff. Tasks timeout and retry on healthy workers.
Bad Deployment
Impact: All workers deploy broken code, all tasks fail.
Mitigation: Binary checksum tracking, gradual rollouts, automated rollback.
Worker Versioning
Temporal supports routing tasks based on worker version:
Build ID-Based Versioning
- Workers register with a Build ID (e.g., git commit, version number)
- Workflows can be pinned to specific Build ID sets
- New workflows use latest Build ID
- Running workflows stay on their Build ID or migrate per rules
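The routing rules above can be modeled very roughly: Build IDs are grouped into ordered compatible sets, new workflows go to the newest ID, and running workflows stay on the set they started with. This is a deliberately simplified model; Temporal's actual versioning rules are richer than this sketch.

```python
# Ordered oldest -> newest; IDs within a set are mutually compatible.
build_id_sets = [["v1.0"], ["v1.1", "v1.2"]]

def latest_build_id():
    """New workflows are routed to the newest ID of the newest set."""
    return build_id_sets[-1][-1]

def route(pinned_set_index=None):
    """Pick a Build ID: latest for new workflows, pinned set for running ones."""
    if pinned_set_index is None:              # new workflow
        return latest_build_id()
    return build_id_sets[pinned_set_index][-1]  # newest compatible ID in its set

new_wf = route()     # 'v1.2' -- new workflows use the latest Build ID
old_wf = route(0)    # 'v1.0' -- a running workflow stays on its pinned set
print(new_wf, old_wf)
```

Pinning running workflows to their original set is what lets you deploy incompatible workflow code without breaking executions already in flight.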
Monitoring Workers
Key metrics to monitor:
- Poll Success Rate: Are workers successfully getting tasks?
- Task Execution Latency: How long do tasks take?
- Task Failure Rate: How many tasks are failing?
- Worker Count: How many workers are active?
- Sticky Cache Hit Rate: How often is sticky execution working?
- Queue Backlog: Are tasks piling up?
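Several of the metrics above are ratios derived from raw counters. A small sketch of the arithmetic; the counter names are illustrative, not actual Temporal SDK metric names:

```python
counters = {
    "polls_total": 200, "polls_with_task": 150,
    "tasks_completed": 140, "tasks_failed": 10,
    "sticky_hits": 120, "sticky_misses": 20,
}

def ratio(numerator, denominator):
    """Guarded division so empty counters don't raise."""
    return numerator / denominator if denominator else 0.0

poll_success_rate = ratio(counters["polls_with_task"], counters["polls_total"])
task_failure_rate = ratio(counters["tasks_failed"],
                          counters["tasks_completed"] + counters["tasks_failed"])
sticky_hit_rate = ratio(counters["sticky_hits"],
                        counters["sticky_hits"] + counters["sticky_misses"])

print(poll_success_rate)             # 0.75
print(round(task_failure_rate, 3))   # 0.067
print(round(sticky_hit_rate, 3))     # 0.857
```

A falling sticky hit rate after a deployment, for example, often indicates worker restarts evicting the cache, which shows up downstream as higher workflow task latency.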
Related Concepts
- Task Queues - How workers discover tasks
- Workflows - What workers execute
- Activities - Activity execution in workers