Overview
The SpeculativeExecutionPolicy trait controls speculative execution: a mechanism where the driver sends additional requests to other nodes when the current request is taking too long to complete. This can reduce tail latency when individual requests are slow due to network issues or node load.
Speculative execution is disabled by default. Enable it explicitly when tail latency is a concern.
Source: scylla/src/policies/speculative_execution.rs:32
Trait Definition
pub trait SpeculativeExecutionPolicy: std::fmt::Debug + Send + Sync {
    fn max_retry_count(&self, context: &Context) -> usize;
    fn retry_interval(&self, context: &Context) -> Duration;
}
Methods
max_retry_count
Returns the maximum number of speculative executions to trigger.
fn max_retry_count(&self, context: &Context) -> usize
- context - Context with metrics and other information
- Returns: maximum number of speculative executions (does not include the initial request)
If max_retry_count returns 2, up to 3 requests in total may be in flight: 1 initial + 2 speculative.
retry_interval
Returns the delay before sending each speculative request.
fn retry_interval(&self, context: &Context) -> Duration
- context - Context with metrics and other information
- Returns: time to wait before sending the next speculative request
Supporting Types
Context
Provides context for policy decisions.
pub struct Context {
    #[cfg(feature = "metrics")]
    pub metrics: Arc<Metrics>,
}
- metrics - (with the metrics feature) driver metrics, usable for latency-based decisions
Built-in Policies
SimpleSpeculativeExecutionPolicy
Sends a fixed number of speculative requests at regular intervals.
pub struct SimpleSpeculativeExecutionPolicy {
    pub max_retry_count: usize,
    pub retry_interval: Duration,
}
- max_retry_count - maximum number of speculative executions
- retry_interval - fixed delay between speculative requests
Example:
use scylla::policies::speculative_execution::SimpleSpeculativeExecutionPolicy;
use std::time::Duration;
use std::sync::Arc;
let policy = Arc::new(SimpleSpeculativeExecutionPolicy {
    max_retry_count: 2,
    retry_interval: Duration::from_millis(100),
});
PercentileSpeculativeExecutionPolicy
(Requires the metrics feature)
Triggers speculative execution based on latency percentiles.
#[cfg(feature = "metrics")]
pub struct PercentileSpeculativeExecutionPolicy {
    pub max_retry_count: usize,
    pub percentile: f64,
}
- max_retry_count - maximum number of speculative executions
- percentile - percentile threshold for triggering speculation (e.g., 99.0 for p99)
The retry interval is dynamically calculated based on the specified percentile of observed request latencies.
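To make the percentile calculation concrete, here is a standalone sketch. The driver derives the interval from its own latency histogram; this hypothetical `percentile_ms` function only illustrates the idea using a nearest-rank percentile over a plain list of recorded latencies.

```rust
// Illustrative only: not the driver's histogram implementation.
// Nearest-rank percentile over recorded request latencies (in ms).
fn percentile_ms(mut samples: Vec<u64>, percentile: f64) -> u64 {
    assert!(!samples.is_empty() && (0.0..=100.0).contains(&percentile));
    samples.sort_unstable();
    // Rank of the smallest sample covering the requested percentile.
    let rank = ((percentile / 100.0) * samples.len() as f64).ceil() as usize;
    samples[rank.saturating_sub(1)]
}

fn main() {
    let latencies = vec![10, 12, 11, 95, 14, 13, 12, 11, 10, 200];
    // With percentile = 90.0, a speculative request would wait roughly
    // as long as the slowest "normal" request before being sent.
    println!("p90 = {} ms", percentile_ms(latencies, 90.0));
}
```

A percentile-derived interval means speculation only triggers for requests that are already slower than most observed traffic, which keeps the extra load proportional to actual tail latency.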
Example:
#[cfg(feature = "metrics")]
use scylla::policies::speculative_execution::PercentileSpeculativeExecutionPolicy;
use std::sync::Arc;
let policy = Arc::new(PercentileSpeculativeExecutionPolicy {
    max_retry_count: 2,
    percentile: 99.0, // Trigger when request exceeds p99 latency
});
How Speculative Execution Works
- Initial request: the driver sends the request to the first node
- Wait: the driver waits for retry_interval
- Check: if the previous request hasn't completed:
  - send a speculative request to the next node in the load balancing plan
  - continue with both requests in parallel
- Repeat: continue until max_retry_count is reached or a response is received
- Return: return the first successful response, or the last error if all fail
Speculative executions only happen if the initial (or a previous speculative) request hasn't completed yet; none are sent if the request fails quickly.
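The schedule implied by the steps above can be sketched as a small helper. `send_offsets` is hypothetical, not a driver API: it computes when each request would go out if no response arrives in time, with the initial request at offset zero and each speculative request one retry_interval after the previous send.

```rust
use std::time::Duration;

// Hypothetical helper illustrating the speculation timeline described above.
// Index 0 is the initial request; speculative request i is sent at
// i * retry_interval, assuming no earlier response or terminal error.
fn send_offsets(max_retry_count: usize, retry_interval: Duration) -> Vec<Duration> {
    (0..=max_retry_count)
        .map(|i| retry_interval * i as u32)
        .collect()
}

fn main() {
    // SimpleSpeculativeExecutionPolicy { max_retry_count: 2, retry_interval: 100ms }
    for (i, off) in send_offsets(2, Duration::from_millis(100)).iter().enumerate() {
        println!("request {} sent at {} ms", i, off.as_millis());
    }
}
```

In practice any response (or non-ignorable error) cancels the remaining sends, so these offsets are an upper bound on the requests actually issued.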
Ignorable Errors
The driver ignores certain errors from speculative executions and continues waiting for other responses:
- Connection pool errors
- Network errors (broken connection, stream allocation failure)
- Certain database errors that are node-specific
Errors that apply to all nodes (e.g., serialization errors, invalid queries) immediately stop all speculative executions.
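The node-specific vs. request-wide split can be illustrated with a hedged sketch. The enum and function below are hypothetical; the real driver uses its own error types, but the classification logic has the same shape.

```rust
// Hypothetical error type illustrating the classification above;
// the driver's real error enums differ.
#[allow(dead_code)]
#[derive(Debug)]
enum RequestError {
    ConnectionPoolEmpty, // node-specific: keep waiting on other requests
    BrokenConnection,    // node-specific network error
    Overloaded,          // node-specific database error
    InvalidQuery,        // applies to every node: stop speculating
    SerializationFailed, // applies to every node: stop speculating
}

/// Returns true if the error is node-specific, i.e. speculative execution
/// should ignore it and keep waiting for the remaining in-flight requests.
fn is_ignorable(err: &RequestError) -> bool {
    matches!(
        err,
        RequestError::ConnectionPoolEmpty
            | RequestError::BrokenConnection
            | RequestError::Overloaded
    )
}

fn main() {
    assert!(is_ignorable(&RequestError::BrokenConnection));
    assert!(!is_ignorable(&RequestError::InvalidQuery));
    println!("classification ok");
}
```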
Configuration
Session-Level
use scylla::{SessionBuilder, ExecutionProfile};
use scylla::policies::speculative_execution::SimpleSpeculativeExecutionPolicy;
use std::time::Duration;
use std::sync::Arc;
let policy = Arc::new(SimpleSpeculativeExecutionPolicy {
    max_retry_count: 2,
    retry_interval: Duration::from_millis(100),
});

let profile = ExecutionProfile::builder()
    .speculative_execution_policy(Some(policy))
    .build();

let session = SessionBuilder::new()
    .known_node("127.0.0.1:9042")
    .default_execution_profile_handle(profile.into_handle())
    .build()
    .await?;
Per-Statement
use scylla::ExecutionProfile;

// `policy` is an Arc'ed speculative execution policy, as created above
let speculative_profile = ExecutionProfile::builder()
    .speculative_execution_policy(Some(policy))
    .build();

let mut stmt = session.prepare("SELECT * FROM users WHERE id = ?").await?;
stmt.set_execution_profile_handle(Some(speculative_profile.into_handle()));
Examples
Simple Fixed-Interval Policy
use scylla::{SessionBuilder, ExecutionProfile};
use scylla::policies::speculative_execution::SimpleSpeculativeExecutionPolicy;
use std::time::Duration;
use std::sync::Arc;
#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Speculative execution after 100ms, up to 2 times
    let policy = Arc::new(SimpleSpeculativeExecutionPolicy {
        max_retry_count: 2,
        retry_interval: Duration::from_millis(100),
    });

    let profile = ExecutionProfile::builder()
        .speculative_execution_policy(Some(policy))
        .build();

    let session = SessionBuilder::new()
        .known_node("127.0.0.1:9042")
        .default_execution_profile_handle(profile.into_handle())
        .build()
        .await?;

    // Queries will use speculative execution
    let result = session
        .query_unpaged("SELECT * FROM users WHERE id = ?", (42,))
        .await?;

    Ok(())
}
Percentile-Based Policy
#[cfg(feature = "metrics")]
use scylla::policies::speculative_execution::PercentileSpeculativeExecutionPolicy;
#[cfg(feature = "metrics")]
let policy = Arc::new(PercentileSpeculativeExecutionPolicy {
    max_retry_count: 3,
    percentile: 99.0, // Trigger when request takes longer than p99
});

let profile = ExecutionProfile::builder()
    .speculative_execution_policy(Some(policy))
    .build();
Read-Only Speculative Execution
use scylla::ExecutionProfile;
// Create profile with speculative execution for reads
// `speculative_policy` is a policy created as in the earlier examples
let read_profile = ExecutionProfile::builder()
    .speculative_execution_policy(Some(speculative_policy.clone()))
    .build();

// Create profile without speculative execution for writes
let write_profile = ExecutionProfile::builder()
    .speculative_execution_policy(None)
    .build();

// Use appropriate profile for each operation
let mut select = session.prepare("SELECT * FROM users WHERE id = ?").await?;
select.set_execution_profile_handle(Some(read_profile.into_handle()));

let mut insert = session.prepare("INSERT INTO users (id, name) VALUES (?, ?)").await?;
insert.set_execution_profile_handle(Some(write_profile.into_handle()));
Aggressive Speculative Execution
// Retry quickly and multiple times for latency-sensitive reads
let aggressive_policy = Arc::new(SimpleSpeculativeExecutionPolicy {
    max_retry_count: 4,
    retry_interval: Duration::from_millis(50),
});
Conservative Speculative Execution
// Retry only once after longer delay
let conservative_policy = Arc::new(SimpleSpeculativeExecutionPolicy {
    max_retry_count: 1,
    retry_interval: Duration::from_millis(200),
});
Custom Policy
use scylla::policies::speculative_execution::{
    Context, SpeculativeExecutionPolicy,
};
use std::sync::Arc;
use std::time::Duration;

#[derive(Debug)]
struct AdaptiveSpeculativePolicy {
    min_interval_ms: u64,
    max_retries: usize,
}

impl SpeculativeExecutionPolicy for AdaptiveSpeculativePolicy {
    fn max_retry_count(&self, _context: &Context) -> usize {
        self.max_retries
    }

    fn retry_interval(&self, context: &Context) -> Duration {
        #[cfg(feature = "metrics")]
        {
            // Use p95 latency as the interval, with a minimum floor
            let latency = context
                .metrics
                .get_latency_percentile_ms(95.0)
                .unwrap_or(self.min_interval_ms);
            Duration::from_millis(latency.max(self.min_interval_ms))
        }
        #[cfg(not(feature = "metrics"))]
        {
            let _ = context;
            Duration::from_millis(self.min_interval_ms)
        }
    }
}

// Usage
let policy = Arc::new(AdaptiveSpeculativePolicy {
    min_interval_ms: 50,
    max_retries: 2,
});
Best Practices
- Use for read-heavy workloads where latency is critical
- Don’t use for writes unless they are idempotent
- Start conservative (1-2 retries, 100-200ms interval)
- Monitor metrics to tune interval and retry count
- Consider cost - more speculative requests increase cluster load
- Test under load to ensure benefit outweighs cost
Speculative execution increases load: Each speculative request adds work to your cluster. Use judiciously and monitor cluster health.
When to Use Speculative Execution
✅ Good Use Cases:
- Latency-sensitive read queries
- Applications with strict SLA requirements
- Handling occasional slow nodes or network hiccups
- Read queries with high variance in response time
❌ Bad Use Cases:
- Write operations (unless idempotent)
- Already fast queries (< 10ms)
- Resource-intensive queries
- During cluster overload
- When cluster capacity is limited
Tuning Guidelines
| Workload | max_retry_count | retry_interval | Notes |
|---|---|---|---|
| Low latency reads | 2-3 | 50-100ms | Aggressive |
| Normal reads | 1-2 | 100-200ms | Balanced |
| Heavy queries | 0-1 | 200-500ms | Conservative |
| Writes | 0 | N/A | Disabled |
Metrics and Monitoring
#[cfg(feature = "metrics")]
use scylla::observability::metrics::Metrics;

// Access metrics to tune policy
let metrics = session.get_metrics();

// Get current latency percentiles
let p50 = metrics.get_latency_percentile_ms(50.0)?;
let p99 = metrics.get_latency_percentile_ms(99.0)?;
println!("P50: {}ms, P99: {}ms", p50, p99);

// Adjust policy based on metrics
if p99 > 200 {
    // Consider enabling or tuning speculative execution
}
See Also