Overview
The History Service is an internal Cadence service responsible for maintaining workflow execution state and event history. It manages the complete lifecycle of workflow executions, handles state transitions, and coordinates with the Matching Service for task dispatch.
Service Location: service/history/handler/interface.go
Health & Lifecycle
Health
Check service health status.
Start
Start the history service.
Stop
Stop the history service gracefully.
PrepareToStop
Prepare the service for shutdown.
Workflow Execution APIs
StartWorkflowExecution
Internal API to start a new workflow execution.
UUID of the domain
The workflow start request from frontend
Task list partition configuration
The unique run ID for this workflow execution
- Checks for duplicate workflow IDs based on WorkflowIdReusePolicy
- Validates workflow and activity timeouts
- Ensures domain is active
- Validates retry policies and cron schedules
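The duplicate-ID check in the list above can be sketched as a small decision function. This is a minimal illustration under stated assumptions, not the History Service's code: `reusePolicy`, `prevRun`, and `canStart` are hypothetical names, and the behavior of each policy value is assumed from the public WorkflowIdReusePolicy names.

```go
package main

import "fmt"

// reusePolicy mirrors the public WorkflowIdReusePolicy values (hypothetical local type).
type reusePolicy int

const (
	allowDuplicateFailedOnly reusePolicy = iota
	allowDuplicate
	rejectDuplicate
	terminateIfRunning
)

// prevRun describes the most recent run with the same workflow ID (hypothetical).
type prevRun struct {
	exists    bool
	running   bool
	succeeded bool // true only if the run closed as completed
}

// canStart reports whether a new run may start and whether the
// previous run would have to be terminated first.
func canStart(p reusePolicy, prev prevRun) (ok bool, terminatePrev bool) {
	if !prev.exists {
		return true, false
	}
	if prev.running {
		// Assumption: only terminateIfRunning may replace an open run.
		replace := p == terminateIfRunning
		return replace, replace
	}
	switch p {
	case allowDuplicate, terminateIfRunning:
		return true, false
	case allowDuplicateFailedOnly:
		return !prev.succeeded, false
	default: // rejectDuplicate
		return false, false
	}
}

func main() {
	ok, _ := canStart(allowDuplicateFailedOnly, prevRun{exists: true, succeeded: true})
	fmt.Println(ok) // false: the previous run completed successfully
}
```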
SignalWorkflowExecution
Send a signal to a running workflow.
UUID of the domain
Signal parameters from frontend
SignalWithStartWorkflowExecution
Signal a workflow, starting it if it doesn’t exist.
Combines StartWorkflowExecution and SignalWorkflowExecution in a single atomic operation.
TerminateWorkflowExecution
Terminate a running workflow execution.
UUID of the domain
Termination parameters
- Immediately closes the workflow execution
- Cancels all pending activities and child workflows
- Records WorkflowExecutionTerminated event
- No further decisions will be scheduled
RequestCancelWorkflowExecution
Request cancellation of a workflow execution.
UUID of the domain
Cancellation request parameters
- Records WorkflowExecutionCancelRequested event
- Schedules a new decision task
- Workflow can handle cancellation gracefully
ResetWorkflowExecution
Reset a workflow execution to a previous decision task.
UUID of the domain
Reset parameters
New run ID after reset
- Terminates current workflow execution
- Creates a new execution with history up to reset point
- Optionally reapplies signals that occurred after reset point
- Schedules a new decision task
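The reset steps above can be illustrated with a simplified history model. This sketch assumes events are ordered by ID and that "up to the reset point" means events with an ID lower than the reset event ID; the `event` type and `resetHistory` function are hypothetical and leave out termination of the current run and the new decision task.

```go
package main

import "fmt"

// event is a simplified history event (hypothetical type).
type event struct {
	ID   int64
	Type string
}

// resetHistory builds the history of the new run: it keeps events before
// resetEventID and, if reapplySignals is set, appends the signals recorded
// at or after the reset point with fresh event IDs.
func resetHistory(events []event, resetEventID int64, reapplySignals bool) []event {
	var out []event
	for _, e := range events {
		if e.ID < resetEventID {
			out = append(out, e)
		}
	}
	if !reapplySignals {
		return out
	}
	next := resetEventID
	for _, e := range events {
		if e.ID >= resetEventID && e.Type == "WorkflowExecutionSignaled" {
			out = append(out, event{ID: next, Type: e.Type})
			next++
		}
	}
	return out
}

func main() {
	history := []event{
		{1, "WorkflowExecutionStarted"},
		{2, "DecisionTaskScheduled"},
		{3, "DecisionTaskStarted"},
		{4, "DecisionTaskCompleted"},
		{5, "WorkflowExecutionSignaled"},
	}
	for _, e := range resetHistory(history, 4, true) {
		fmt.Println(e.ID, e.Type)
	}
}
```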
Mutable State APIs
GetMutableState
Retrieve the current mutable state of a workflow execution.
UUID of the domain
Workflow execution identifier
Expected next event ID for validation
Current history branch token
Version history for conflict detection
Workflow execution information
The workflow type
Next event ID in the history
Event ID of previous decision task started
Current task list
Sticky task list if enabled
Whether the workflow is currently running
Internal workflow state
Close state if workflow is closed
Version histories for multi-cluster replication
PollMutableState
Long poll for mutable state changes.
DescribeMutableState
Get detailed mutable state information for debugging.
JSON representation of in-memory mutable state
JSON representation of persisted mutable state
DescribeWorkflowExecution
Get workflow execution details.
Decision Task APIs
RecordDecisionTaskStarted
Record that a decision task has started.
UUID of the domain
Workflow execution identifier
Schedule event ID of the decision task
Task ID from the task queue
Unique request identifier for idempotency
The original poll request
The workflow type
Previous decision task started event ID
Decision task scheduled event ID
Decision task started event ID
Next event ID to be assigned
Attempt number for this decision task
Whether sticky execution is enabled
Workflow execution history
Normal task list for the workflow
Event store version
Branch token for history events
Pending queries to be answered
RespondDecisionTaskCompleted
Complete a decision task with decisions.
UUID of the domain
Completion request from worker
- Validates all decisions
- Applies decisions to mutable state
- Schedules activity tasks, timers, child workflows
- Records DecisionTaskCompleted event
- Determines if new decision task is needed
DecisionTaskFailedCauseUnhandledDecision: Invalid decision type
DecisionTaskFailedCauseBadScheduleActivityAttributes: Invalid activity parameters
DecisionTaskFailedCauseBadBinary: Workflow worker binary is marked as bad
RespondDecisionTaskFailed
Report a decision task failure.
UUID of the domain
Failure information
ScheduleDecisionTask
Schedule a new decision task for a workflow.
Activity Task APIs
RecordActivityTaskStarted
Record that an activity task has started.
UUID of the domain
Workflow execution identifier
Schedule event ID of the activity
Task ID from the task queue
Unique request identifier
The original poll request
Activity scheduled event
Timestamp when activity started
Retry attempt number
When this attempt was scheduled
Last recorded heartbeat details
The workflow type
Domain name of the workflow
RespondActivityTaskCompleted
Complete an activity task successfully.
UUID of the domain
Completion request from worker
- Records ActivityTaskCompleted event
- Schedules a new decision task
- Passes result to workflow for next decision
RespondActivityTaskFailed
Report an activity task failure.
UUID of the domain
Failure information
- Records ActivityTaskFailed event
- Applies retry policy if configured
- Schedules a new decision task
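Applying a retry policy after a failed attempt typically means computing a backoff delay and checking an attempt limit. The sketch below assumes a simplified policy with an initial interval, a backoff coefficient, a maximum interval, and a maximum attempt count; the `retryPolicy` type and `nextRetryDelay` function are hypothetical, and expiration intervals and non-retryable error reasons are left out.

```go
package main

import (
	"fmt"
	"math"
	"time"
)

// retryPolicy is a simplified retry policy (hypothetical type).
type retryPolicy struct {
	InitialInterval    time.Duration
	BackoffCoefficient float64
	MaximumInterval    time.Duration // should be set; it also guards against overflow
	MaximumAttempts    int           // 0 means no attempt limit
}

// nextRetryDelay returns the delay before the next attempt and whether a
// retry is allowed. attempt is the zero-based index of the attempt that failed.
func nextRetryDelay(p retryPolicy, attempt int) (time.Duration, bool) {
	if p.MaximumAttempts > 0 && attempt+1 >= p.MaximumAttempts {
		return 0, false // attempts exhausted
	}
	d := float64(p.InitialInterval) * math.Pow(p.BackoffCoefficient, float64(attempt))
	if p.MaximumInterval > 0 && d > float64(p.MaximumInterval) {
		d = float64(p.MaximumInterval)
	}
	return time.Duration(d), true
}

func main() {
	p := retryPolicy{
		InitialInterval:    time.Second,
		BackoffCoefficient: 2,
		MaximumInterval:    10 * time.Second,
		MaximumAttempts:    5,
	}
	for attempt := 0; attempt < 5; attempt++ {
		d, ok := nextRetryDelay(p, attempt)
		fmt.Println(attempt, d, ok)
	}
}
```

If no retry is allowed, the failure is recorded and the workflow decides what to do next.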
RespondActivityTaskCanceled
Report an activity task cancellation.
RecordActivityTaskHeartbeat
Record a heartbeat for a long-running activity.
UUID of the domain
Heartbeat details
Whether the activity has been requested to cancel
- Updates last heartbeat timestamp
- Stores heartbeat details for retry
- Checks if cancel was requested
- Resets the activity's heartbeat timeout
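The heartbeat steps above amount to a small update on the activity's pending state. The following is a minimal sketch; `activityInfo` and its fields are hypothetical names, and it assumes the heartbeat deadline is measured from the last recorded heartbeat.

```go
package main

import (
	"fmt"
	"time"
)

// activityInfo holds the heartbeat-related state of a pending activity (hypothetical type).
type activityInfo struct {
	LastHeartbeat    time.Time
	Details          []byte
	CancelRequested  bool
	HeartbeatTimeout time.Duration
}

// recordHeartbeat stores the heartbeat time and details, and reports whether
// cancellation has been requested so the worker can stop early.
func (a *activityInfo) recordHeartbeat(now time.Time, details []byte) (cancelRequested bool) {
	a.LastHeartbeat = now
	a.Details = details
	return a.CancelRequested
}

// heartbeatDeadline is the time by which the next heartbeat is expected.
func (a *activityInfo) heartbeatDeadline() time.Time {
	return a.LastHeartbeat.Add(a.HeartbeatTimeout)
}

func main() {
	a := &activityInfo{HeartbeatTimeout: 30 * time.Second}
	cancel := a.recordHeartbeat(time.Now(), []byte("progress: 50%"))
	fmt.Println(cancel, a.heartbeatDeadline())
}
```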
Child Workflow APIs
RecordChildExecutionCompleted
Record the completion of a child workflow execution.
UUID of the parent workflow’s domain
Parent workflow execution
Event ID that initiated the child workflow
Child workflow execution
Child workflow completion event
Query APIs
QueryWorkflow
Query a workflow execution.
UUID of the domain
Query request from frontend
- If workflow has pending decision task, schedules query with that task
- Otherwise, creates a new transient decision task for the query
- Sends query to matching service
- Returns query result or timeout
Replication APIs
ReplicateEventsV2
Replicate workflow events from another cluster.
UUID of the domain
Workflow execution being replicated
Version history items
Serialized history events
Events for new run (if workflow continued as new)
GetReplicationMessages
Get replication messages for cross-cluster replication.
GetDLQReplicationMessages
Get replication messages from the DLQ.
NotifyFailoverMarkers
Notify about failover markers for graceful failover.
Cross-Cluster APIs
GetCrossClusterTasks
Get cross-cluster tasks for processing.
RespondCrossClusterTasksCompleted
Acknowledge completion of cross-cluster tasks.
DLQ Management APIs
ReadDLQMessages
Read messages from the dead letter queue.
CountDLQMessages
Count messages in the DLQ.
PurgeDLQMessages
Purge messages from the DLQ.
MergeDLQMessages
Merge DLQ messages back into the main queue.
Administrative APIs
DescribeHistoryHost
Get information about a history service host.
Address of the host to describe
Shard ID to get info for
Get host info for this execution
CloseShard
Close a shard on a history host.
RemoveTask
Remove a task from a shard.
ResetStickyTaskList
Reset sticky task list for a workflow.
DescribeQueue
Get information about a task queue.
ResetQueue
Reset a task queue.
RefreshWorkflowTasks
Refresh workflow tasks.
RemoveSignalMutableState
Remove a signal from mutable state.
ReapplyEvents
Reapply events to a workflow execution.
SyncActivity
Sync activity state across clusters.
SyncShardStatus
Sync shard status across hosts.
GetFailoverInfo
Get failover information for a domain.
RatelimitUpdate
Update rate limit settings.
State Management
Mutable State
The History Service maintains workflow mutable state in memory and persists it to the database. Mutable state includes:
- Execution Info: Workflow ID, Run ID, state, timeouts
- Pending Activities: Scheduled/started activities
- Pending Decisions: Scheduled/started decision tasks
- Pending Timers: Active timers
- Pending Child Workflows: Child workflow executions
- Pending Signals: Buffered signals
- Activity Info: Heartbeat details, retry state
- Timer Info: Timer details and fire times
Event History
Workflow event history is stored in a versioned tree structure:
- Branch: A sequence of history events
- Version History: Tracks history across multiple clusters
- Event Store Version: Schema version for events
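One way to picture the tree structure is a branch that owns its own events and inherits earlier events from the branch it forked from. The sketch below is an illustration only: the `branch` type, its fields, and the fork semantics (events below the fork point come from the parent) are assumptions, not the persisted format.

```go
package main

import "fmt"

// branch is one path in a history tree (hypothetical type).
type branch struct {
	parent      *branch
	forkEventID int64   // events with ID < forkEventID are inherited from parent
	events      []int64 // event IDs owned by this branch, in order
}

// history returns the full sequence of event IDs visible on this branch:
// inherited events first, then the branch's own events.
func (b *branch) history() []int64 {
	var out []int64
	if b.parent != nil {
		for _, id := range b.parent.history() {
			if id < b.forkEventID {
				out = append(out, id)
			}
		}
	}
	return append(out, b.events...)
}

func main() {
	root := &branch{events: []int64{1, 2, 3, 4, 5}}
	// A second branch diverges at event 4, for example after a reset
	// or a replication conflict.
	fork := &branch{parent: root, forkEventID: 4, events: []int64{4, 5, 6}}
	fmt.Println(root.history())
	fmt.Println(fork.history())
}
```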
Sharding
Workflows are distributed across shards:
- Shard Count: Typically 1024-16384 shards
- Shard Assignment: Based on workflow ID hash
- Shard Ownership: Each shard owned by one history host
- Shard Movement: Supports dynamic rebalancing
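Hash-based shard assignment can be sketched in a few lines. FNV-1a from the Go standard library is a stand-in chosen for this example and is not necessarily the hash the service uses; `shardForWorkflow` is a hypothetical helper and assumes `numShards` is positive.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// shardForWorkflow maps a workflow ID to a shard in [0, numShards).
// The same workflow ID always maps to the same shard, so all runs of
// a workflow are handled by the host that owns that shard.
func shardForWorkflow(workflowID string, numShards int) int {
	h := fnv.New32a()
	h.Write([]byte(workflowID)) // Write on a hash.Hash never returns an error
	return int(h.Sum32() % uint32(numShards))
}

func main() {
	for _, id := range []string{"order-1001", "order-1002", "billing-7"} {
		fmt.Println(id, "-> shard", shardForWorkflow(id, 1024))
	}
}
```

Because the mapping depends on the shard count, changing the number of shards would move workflows between shards; rebalancing instead moves whole shards between hosts.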
Error Handling
ShardOwnershipLostError
Shard ownership moved to another host.
EventAlreadyStartedError
Attempt to start an already started task.
RetryTaskV2Error
Task should be retried with exponential backoff.
Best Practices
Idempotency
All mutating operations use idempotency tokens:
- RequestId for API calls
- TaskID for task processing
- Event IDs for history deduplication
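Request-ID deduplication can be illustrated with an in-memory map. This is a sketch of the idea only: `startStore` and its methods are hypothetical, and a real service would persist the request ID alongside the workflow state rather than keep it in memory.

```go
package main

import (
	"fmt"
	"sync"
)

// startStore remembers which run each request ID created (hypothetical type).
type startStore struct {
	mu        sync.Mutex
	byRequest map[string]string // request ID -> run ID
	nextRun   int
}

func newStartStore() *startStore {
	return &startStore{byRequest: make(map[string]string)}
}

// start creates a run for a new request ID and returns the existing run
// for a repeated one, so a retried call has no additional effect.
func (s *startStore) start(requestID string) (runID string, created bool) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if id, ok := s.byRequest[requestID]; ok {
		return id, false
	}
	s.nextRun++
	id := fmt.Sprintf("run-%d", s.nextRun)
	s.byRequest[requestID] = id
	return id, true
}

func main() {
	s := newStartStore()
	first, _ := s.start("req-1")
	again, created := s.start("req-1")
	fmt.Println(first == again, created) // true false
}
```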
Conflict Resolution
When concurrent operations occur:
- Use version histories for conflict detection
- Retry with updated version information
- Handle CurrentBranchChangedError
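A retry loop for this kind of conflict might look like the sketch below. `errCurrentBranchChanged` is a local stand-in for the service's error, `retryOnBranchChange` is a hypothetical helper, and the doubling delay is an assumed backoff schedule. The operation is expected to reload version information on each attempt, and `maxAttempts` is assumed to be at least 1.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errCurrentBranchChanged stands in for the service's conflict error (hypothetical).
var errCurrentBranchChanged = errors.New("current branch changed")

// retryOnBranchChange runs op until it succeeds, fails with a different
// error, or maxAttempts is reached. The delay doubles after each conflict.
func retryOnBranchChange(maxAttempts int, base time.Duration, op func(attempt int) error) error {
	delay := base
	var err error
	for attempt := 0; attempt < maxAttempts; attempt++ {
		err = op(attempt)
		if !errors.Is(err, errCurrentBranchChanged) {
			return err // success, or an error that retrying will not fix
		}
		if attempt < maxAttempts-1 {
			time.Sleep(delay)
			delay *= 2
		}
	}
	return err
}

func main() {
	err := retryOnBranchChange(5, 10*time.Millisecond, func(attempt int) error {
		// A real caller would reload version histories here before retrying.
		if attempt < 2 {
			return errCurrentBranchChanged
		}
		return nil
	})
	fmt.Println("result:", err)
}
```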
Performance Optimization
- Sticky Execution: Keeps mutable state in memory
- Batch Operations: Group history events
- Lazy Loading: Load history on demand
- Event Pagination: Stream large histories
See Also
- Frontend Service API - Public workflow APIs
- Matching Service API - Task distribution
- Workflow Types - Workflow type definitions