Overview
When a request is sent to a node and that node takes longer than a configured threshold to respond, the driver sends additional speculative requests to other nodes in parallel. The first response received is used, and the remaining requests are cancelled.
When to Use
Speculative execution is beneficial for:
- Read-heavy workloads
- Latency-sensitive applications
- Scenarios where occasional slow nodes are expected
- Reducing tail latencies (p99, p999)
Avoid speculative execution for:
- Write-heavy workloads
- Non-idempotent operations
- Bandwidth-constrained environments
- When consistent latency is more important than tail latency
SpeculativeExecutionPolicy Trait
Any type implementing this trait can be used as a speculative execution policy.
SimpleSpeculativeExecutionPolicy
Sends a fixed number of speculative requests at fixed intervals.
Configuration
- max_retry_count: Maximum number of speculative requests (excludes original request)
- retry_interval: Delay between each speculative request
Example Timeline
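As an illustration, assume max_retry_count = 2 and retry_interval = 100 ms. The original request goes out at t = 0 ms, the first speculative request at t = 100 ms if nothing has come back, and the second at t = 200 ms under the same condition. The sketch below models that schedule with plain numbers rather than real network calls; `simulate` and `Winner` are hypothetical names for this example and are not part of the driver.

```rust
/// Which attempt answered first, and when (hypothetical type for this sketch).
#[derive(Debug, PartialEq)]
struct Winner {
    attempt: usize,      // 0 = original request, 1.. = speculative requests
    finished_at_ms: u64, // measured from the moment the original request was sent
}

/// Simulates a fixed-interval speculative schedule.
/// `latencies_ms[i]` is how long attempt `i` would take once it is sent.
fn simulate(latencies_ms: &[u64], retry_interval_ms: u64, max_retry_count: usize) -> Option<Winner> {
    let mut best: Option<Winner> = None;
    let max_attempts = max_retry_count.saturating_add(1); // original + speculative
    for (attempt, &latency) in latencies_ms.iter().enumerate().take(max_attempts) {
        let sent_at = attempt as u64 * retry_interval_ms;
        // A response that arrived before this attempt was due stops further sends.
        if let Some(w) = &best {
            if w.finished_at_ms <= sent_at {
                break;
            }
        }
        let finished_at = sent_at + latency;
        if best.as_ref().map_or(true, |w| finished_at < w.finished_at_ms) {
            best = Some(Winner { attempt, finished_at_ms: finished_at });
        }
    }
    best
}
```

Calling `simulate(&[350, 40, 500], 100, 2)` models a slow first node (350 ms) and a fast second node (40 ms): the speculative request sent at 100 ms answers at 140 ms, so the second speculative request is never sent.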
PercentileSpeculativeExecutionPolicy
Triggers speculative execution based on latency percentiles (requires the metrics feature).
How It Works
- Uses collected metrics to calculate the specified percentile latency
- Sets retry_interval to that percentile value
- Adapts automatically to cluster performance
- Falls back to 100ms if metrics are unavailable
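The steps above can be sketched with a nearest-rank percentile over a list of latency samples. This is only an illustration of the idea: the driver's own metrics collection and percentile calculation are not shown here, and `percentile_interval` and `FALLBACK` are hypothetical names.

```rust
use std::time::Duration;

/// Fallback interval used when no usable samples exist (100 ms, as stated above).
const FALLBACK: Duration = Duration::from_millis(100);

/// Returns the latency at the given percentile (0.0..=100.0) using the
/// nearest-rank method, or the fallback when there is nothing to measure.
fn percentile_interval(samples: &[Duration], percentile: f64) -> Duration {
    if samples.is_empty() || !(0.0..=100.0).contains(&percentile) {
        return FALLBACK;
    }
    let mut sorted = samples.to_vec();
    sorted.sort();
    // Nearest rank: the smallest sample such that at least `percentile` percent
    // of all samples are less than or equal to it.
    let rank = ((percentile / 100.0) * sorted.len() as f64).ceil() as usize;
    sorted[rank.max(1) - 1]
}
```

Because the interval is derived from observed latencies, it moves with the cluster: if nodes get slower overall, the threshold rises and speculative requests do not fire for latencies that have become normal.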
Configuration
- max_retry_count: Maximum number of speculative requests
- percentile: Latency percentile to use as threshold (e.g., 99.0 for p99)
Enabling Metrics
For PercentileSpeculativeExecutionPolicy, enable the metrics feature.
Using with Execution Profiles
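Attach the policy via an execution profile.
Idempotency Requirement
Only use speculative execution with idempotent queries. With non-idempotent queries, the same statement may be executed on more than one node, which can lead to:
- Duplicate writes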
- Inconsistent data
- Unexpected behavior
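To see why idempotency matters, consider what happens when the same write is applied twice, as can happen when both the original and a speculative request reach the cluster. The sketch below uses an in-memory map in place of a database; `Write`, `apply`, and `value_after` are hypothetical names for this example only.

```rust
use std::collections::HashMap;

/// Two kinds of write: setting a value is idempotent, incrementing is not.
enum Write {
    Set { key: &'static str, value: i64 },
    Increment { key: &'static str, by: i64 },
}

/// Applies one write to the in-memory store.
fn apply(store: &mut HashMap<&'static str, i64>, write: &Write) {
    match write {
        Write::Set { key, value } => {
            store.insert(*key, *value);
        }
        Write::Increment { key, by } => {
            *store.entry(*key).or_insert(0) += *by;
        }
    }
}

/// Applies the same write `times` times to an empty store and reads `key` back.
fn value_after(write: &Write, key: &'static str, times: usize) -> i64 {
    let mut store = HashMap::new();
    for _ in 0..times {
        apply(&mut store, write);
    }
    store.get(key).copied().unwrap_or(0)
}
```

Applying `Set` once or twice leaves the same value, so a duplicate is harmless. Applying `Increment` twice doubles the effect, which is the duplicate-write problem listed above.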
Choosing Parameters
max_retry_count
retry_interval
Error Handling
Some errors can be ignored if they appear on only one node.
Monitoring
Track speculative execution effectiveness.
Performance Considerations
Benefits
- Reduced tail latencies (p95, p99, p999)
- Better user experience for time-sensitive operations
- Protection against occasional slow nodes
Costs
- Increased network bandwidth usage
- Higher cluster load
- More complex debugging
- Resource usage even when not needed
Load Impact
With max_retry_count=2, in the worst case where every request is still unanswered each time retry_interval elapses:
- Original request: 100% load (baseline)
- First speculative: +100% load
- Second speculative: +100% load
- Total: Up to 3x load
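The 3x figure is a worst case. A rough estimate for a real workload adds, for each speculative slot, the fraction of requests that are still unanswered when that slot comes due. The function below is a back-of-the-envelope sketch under that assumption; `load_multiplier` is a hypothetical helper, not a driver API.

```rust
/// Estimates total request load relative to running without speculation.
/// `unanswered_fraction[k]` is the share of requests (0.0..=1.0) still waiting
/// when the (k + 1)-th speculative request is due.
fn load_multiplier(unanswered_fraction: &[f64]) -> f64 {
    // 1.0 is the original request; each slot adds the share that triggers it.
    1.0 + unanswered_fraction
        .iter()
        .map(|f| f.clamp(0.0, 1.0))
        .sum::<f64>()
}
```

With every request slow, `load_multiplier(&[1.0, 1.0])` gives the 3x worst case. If retry_interval sits near p99, so roughly 1% of requests trigger the first speculative request and 0.1% the second, the estimate is about 1.011x.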
Custom Policy
Implement the trait for custom behavior.
Best Practices
- Start with conservative settings (max_retry_count=1)
- Only use with idempotent queries
- Monitor cluster load and adjust parameters
- Set retry_interval based on your p95/p99 latency
- Consider using PercentileSpeculativeExecutionPolicy for auto-tuning
- Test under load to ensure the cluster can handle additional requests
- Disable for write-heavy workloads
- Use query history to debug and tune settings
Interaction with Other Policies
- Load Balancing: Determines which nodes receive speculative requests
- Retry Policy: Works independently; both retries and speculative requests can happen
- Timeouts: Request timeout applies to all speculative executions collectively
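One consequence of a shared timeout is that it caps how many speculative requests can ever be sent. The sketch below assumes the k-th speculative request is due at k times retry_interval and is only sent if that moment falls before the request timeout; this scheduling model and the name `speculative_requests_before_timeout` are assumptions made for illustration.

```rust
use std::time::Duration;

/// Counts how many speculative requests can start before the overall
/// request timeout expires, under the fixed-interval model described above.
fn speculative_requests_before_timeout(
    timeout: Duration,
    retry_interval: Duration,
    max_retry_count: usize,
) -> usize {
    let mut count = 0;
    while count < max_retry_count {
        // The next speculative request would be due at (count + 1) * interval.
        let due = retry_interval * (count as u32 + 1);
        if due >= timeout {
            break;
        }
        count += 1;
    }
    count
}
```

For example, with a 250 ms timeout and a 100 ms retry_interval, only two speculative requests fit (at 100 ms and 200 ms) even if max_retry_count is 5, so a larger max_retry_count has no effect unless the timeout is raised or the interval lowered.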
Example: Read-Heavy Application
Next Steps
- Retry Policy - Control retry behavior
- Load Balancing - Control node selection
- Metrics - Monitor performance
- Tracing - Debug query execution
