
Overview

The SpeculativeExecutionPolicy trait controls speculative execution: a mechanism where the driver sends additional requests to other nodes when the current request is taking too long to respond. This can reduce tail latency when a request is slow because of network issues or node load.
Speculative execution is disabled by default. Enable it explicitly when tail latency is a concern. The driver applies it only to statements marked idempotent.
Source: scylla/src/policies/speculative_execution.rs:32

Trait Definition

pub trait SpeculativeExecutionPolicy: std::fmt::Debug + Send + Sync {
    fn max_retry_count(&self, context: &Context) -> usize;
    
    fn retry_interval(&self, context: &Context) -> Duration;
}

Methods

max_retry_count

Returns the maximum number of speculative executions to trigger.
fn max_retry_count(&self, context: &Context) -> usize
Parameters:
  • context (&Context, required): Context with metrics and other information
Returns:
  • count (usize): Maximum number of speculative executions (does not include the initial request)
If max_retry_count returns 2, up to 3 total requests will be in flight: 1 initial + 2 speculative.

retry_interval

Returns the delay before sending each speculative request.
fn retry_interval(&self, context: &Context) -> Duration
Parameters:
  • context (&Context, required): Context with metrics and other information
Returns:
  • interval (Duration): Time to wait before sending the next speculative request
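To make the two return values concrete, here is a minimal sketch of the timeline they imply. It assumes the interval stays constant and that no response arrives in the meantime. The helper names `speculative_send_offsets` and `max_in_flight` are illustrative and not part of the driver.

```rust
use std::time::Duration;

/// Offsets from the initial request at which speculative requests would be
/// sent, assuming a fixed interval and no response arriving in the meantime.
fn speculative_send_offsets(max_retry_count: usize, retry_interval: Duration) -> Vec<Duration> {
    (1..=max_retry_count)
        .map(|i| retry_interval * i as u32)
        .collect()
}

/// Upper bound on requests in flight at once: the initial request plus
/// every speculative execution.
fn max_in_flight(max_retry_count: usize) -> usize {
    1 + max_retry_count
}
```

With max_retry_count = 2 and retry_interval = 100ms, the offsets are 100ms and 200ms, and at most 3 requests are in flight.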

Supporting Types

Context

Provides context for policy decisions.
pub struct Context {
    #[cfg(feature = "metrics")]
    pub metrics: Arc<Metrics>,
}
Fields:
  • metrics (Arc<Metrics>): Available with the metrics feature. Metrics for latency-based decisions

Built-in Policies

SimpleSpeculativeExecutionPolicy

Sends a fixed number of speculative requests at regular intervals.
pub struct SimpleSpeculativeExecutionPolicy {
    pub max_retry_count: usize,
    pub retry_interval: Duration,
}
Fields:
  • max_retry_count (usize): Maximum number of speculative executions
  • retry_interval (Duration): Fixed delay between speculative requests
Example:
use scylla::policies::speculative_execution::SimpleSpeculativeExecutionPolicy;
use std::time::Duration;
use std::sync::Arc;

let policy = Arc::new(SimpleSpeculativeExecutionPolicy {
    max_retry_count: 2,
    retry_interval: Duration::from_millis(100),
});

PercentileSpeculativeExecutionPolicy

(Requires metrics feature) Triggers speculative execution based on latency percentiles.
#[cfg(feature = "metrics")]
pub struct PercentileSpeculativeExecutionPolicy {
    pub max_retry_count: usize,
    pub percentile: f64,
}
Fields:
  • max_retry_count (usize): Maximum number of speculative executions
  • percentile (f64): Percentile threshold for triggering speculation (e.g., 99.0 for p99)
The retry interval is dynamically calculated based on the specified percentile of observed request latencies.
Example:
// Requires the scylla crate's `metrics` feature.
use scylla::policies::speculative_execution::PercentileSpeculativeExecutionPolicy;
use std::sync::Arc;

let policy = Arc::new(PercentileSpeculativeExecutionPolicy {
    max_retry_count: 2,
    percentile: 99.0,  // Trigger when request exceeds p99 latency
});
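To illustrate what a percentile-derived interval means, the sketch below computes a nearest-rank percentile over a slice of latency samples and turns it into a Duration. This is a stand-in, not the driver's implementation: the driver reads the value from its own Metrics, and the function names, the sample slice, and the `fallback` parameter are all assumptions made for illustration.

```rust
use std::time::Duration;

/// Nearest-rank percentile over latency samples in milliseconds.
/// Returns None when there are no samples or the percentile is outside 0..=100.
fn latency_percentile_ms(samples: &[u64], percentile: f64) -> Option<u64> {
    if samples.is_empty() || !(0.0..=100.0).contains(&percentile) {
        return None;
    }
    let mut sorted = samples.to_vec();
    sorted.sort_unstable();
    // Nearest rank: the smallest sample with at least `percentile` percent
    // of all samples at or below it.
    let rank = (percentile * sorted.len() as f64 / 100.0).ceil() as usize;
    Some(sorted[rank.max(1) - 1])
}

/// Interval to wait before speculating, with a caller-chosen fallback
/// for the case where no percentile can be computed.
fn interval_from_percentile(samples: &[u64], percentile: f64, fallback: Duration) -> Duration {
    latency_percentile_ms(samples, percentile)
        .map(Duration::from_millis)
        .unwrap_or(fallback)
}
```

A higher percentile yields a longer interval, so fewer requests wait long enough to trigger a speculative execution.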

How Speculative Execution Works

  1. Initial Request: Driver sends request to the first node
  2. Wait: Driver waits for retry_interval
  3. Check: If initial request hasn’t completed:
    • Send speculative request to next node in load balancing plan
    • Continue with both requests in parallel
  4. Repeat: Continue until max_retry_count reached or a response is received
  5. Return: Return the first successful response, or the last error if all requests fail
Speculative executions only happen if the initial (or previous speculative) request hasn’t completed yet. They don’t run if the request fails quickly.
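The steps above can be sketched with plain threads and a channel. This is a toy simulation under stated assumptions, not the driver's async implementation: each "node" is a thread that sleeps for a hypothetical latency and then reports its index, every request succeeds, and `run_with_speculation` and `spawn_request` are made-up names.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Simulates sending a request to `node`, which answers after `latency`.
fn spawn_request(tx: &mpsc::Sender<usize>, node: usize, latency: Duration) {
    let tx = tx.clone();
    thread::spawn(move || {
        thread::sleep(latency);
        // The receiver may already be gone if another node answered first.
        let _ = tx.send(node);
    });
}

/// Returns the index of the node whose response arrived first, or None
/// if the plan is empty. `node_latencies[i]` is the i-th node in the plan.
fn run_with_speculation(
    node_latencies: &[Duration],
    max_retry_count: usize,
    retry_interval: Duration,
) -> Option<usize> {
    let (tx, rx) = mpsc::channel();
    let mut plan = node_latencies.iter().copied().enumerate();

    // Step 1: initial request to the first node in the plan.
    let (node, latency) = plan.next()?;
    spawn_request(&tx, node, latency);

    // Steps 2-4: wait one interval at a time, speculating while nothing answered.
    for _ in 0..max_retry_count {
        match rx.recv_timeout(retry_interval) {
            Ok(winner) => return Some(winner),
            Err(mpsc::RecvTimeoutError::Timeout) => match plan.next() {
                Some((node, latency)) => spawn_request(&tx, node, latency),
                None => break, // no more nodes in the plan
            },
            Err(mpsc::RecvTimeoutError::Disconnected) => return None,
        }
    }

    // Step 5: no more speculation; take whichever request finishes first.
    drop(tx);
    rx.recv().ok()
}
```

When the first node is slow and the second is fast, the second node's response wins even though its request started one interval later.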

Ignorable Errors

The driver ignores certain errors from speculative executions and continues waiting for other responses:
  • Connection pool errors
  • Network errors (broken connection, stream allocation failure)
  • Certain database errors that are node-specific
Errors that apply to all nodes (e.g., serialization errors, invalid queries) immediately stop all speculative executions.
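The rule above can be expressed as a small classification function. The `RequestError` enum below is hypothetical and only mirrors the categories listed in this section; the driver's real error types differ. `resolve` folds results in arrival order: the first success wins, a non-ignorable error stops immediately, and otherwise the last ignorable error is returned.

```rust
/// Hypothetical error categories mirroring the list above.
#[derive(Debug, Clone, PartialEq)]
enum RequestError {
    ConnectionPool,
    BrokenConnection,
    StreamAllocation,
    NodeSpecificDb(String),
    Serialization(String),
    InvalidQuery(String),
}

/// Node-specific failures are ignorable: another node may still succeed.
fn is_ignorable(err: &RequestError) -> bool {
    matches!(
        err,
        RequestError::ConnectionPool
            | RequestError::BrokenConnection
            | RequestError::StreamAllocation
            | RequestError::NodeSpecificDb(_)
    )
}

/// Picks the final outcome from per-execution results in arrival order.
/// Returns None only when there were no results at all.
fn resolve<T>(
    results: impl IntoIterator<Item = Result<T, RequestError>>,
) -> Option<Result<T, RequestError>> {
    let mut last_err = None;
    for result in results {
        match result {
            Ok(value) => return Some(Ok(value)),
            // Errors that would hit every node end the whole request.
            Err(e) if !is_ignorable(&e) => return Some(Err(e)),
            Err(e) => last_err = Some(e),
        }
    }
    last_err.map(Err)
}
```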

Configuration

Session-Level

use scylla::client::execution_profile::ExecutionProfile;
use scylla::client::session_builder::SessionBuilder;
use scylla::policies::speculative_execution::SimpleSpeculativeExecutionPolicy;
use std::time::Duration;
use std::sync::Arc;

let policy = Arc::new(SimpleSpeculativeExecutionPolicy {
    max_retry_count: 2,
    retry_interval: Duration::from_millis(100),
});

let profile = ExecutionProfile::builder()
    .speculative_execution_policy(Some(policy))
    .build();

let session = SessionBuilder::new()
    .known_node("127.0.0.1:9042")
    .default_execution_profile_handle(profile.into_handle())
    .build()
    .await?;

Per-Statement

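use scylla::client::execution_profile::ExecutionProfile;

// `policy` is an Arc'd policy built as in the Session-Level example
let speculative_profile = ExecutionProfile::builder()
    .speculative_execution_policy(Some(policy))
    .build();

let mut stmt = session.prepare("SELECT * FROM users WHERE id = ?").await?;
stmt.set_execution_profile_handle(Some(speculative_profile.into_handle()));
// The driver speculates only on statements marked idempotent
stmt.set_is_idempotent(true);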

Examples

Simple Fixed-Interval Policy

use scylla::client::execution_profile::ExecutionProfile;
use scylla::client::session_builder::SessionBuilder;
use scylla::policies::speculative_execution::SimpleSpeculativeExecutionPolicy;
use std::time::Duration;
use std::sync::Arc;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Speculative execution after 100ms, up to 2 times
    let policy = Arc::new(SimpleSpeculativeExecutionPolicy {
        max_retry_count: 2,
        retry_interval: Duration::from_millis(100),
    });

    let profile = ExecutionProfile::builder()
        .speculative_execution_policy(Some(policy))
        .build();

    let session = SessionBuilder::new()
        .known_node("127.0.0.1:9042")
        .default_execution_profile_handle(profile.into_handle())
        .build()
        .await?;

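    // The driver speculates only on statements marked idempotent,
    // so prepare the statement and flag it before executing.
    let mut prepared = session
        .prepare("SELECT * FROM users WHERE id = ?")
        .await?;
    prepared.set_is_idempotent(true);

    let _result = session.execute_unpaged(&prepared, (42,)).await?;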

    Ok(())
}

Percentile-Based Policy

// Requires the scylla crate's `metrics` feature.
use scylla::client::execution_profile::ExecutionProfile;
use scylla::policies::speculative_execution::PercentileSpeculativeExecutionPolicy;
use std::sync::Arc;

let policy = Arc::new(PercentileSpeculativeExecutionPolicy {
    max_retry_count: 3,
    percentile: 99.0,  // Trigger when request takes longer than p99
});

let profile = ExecutionProfile::builder()
    .speculative_execution_policy(Some(policy))
    .build();

Read-Only Speculative Execution

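use scylla::client::execution_profile::ExecutionProfile;

// `speculative_policy` is any Arc'd policy, e.g. a SimpleSpeculativeExecutionPolicy
// Create profile with speculative execution for reads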
let read_profile = ExecutionProfile::builder()
    .speculative_execution_policy(Some(speculative_policy.clone()))
    .build();

// Create profile without speculative execution for writes
let write_profile = ExecutionProfile::builder()
    .speculative_execution_policy(None)
    .build();

// Use appropriate profile for each operation
let mut select = session.prepare("SELECT * FROM users WHERE id = ?").await?;
select.set_execution_profile_handle(Some(read_profile.into_handle()));
select.set_is_idempotent(true); // the driver speculates only on idempotent statements

let mut insert = session.prepare("INSERT INTO users (id, name) VALUES (?, ?)").await?;
insert.set_execution_profile_handle(Some(write_profile.into_handle()));

Aggressive Speculative Execution

// Retry quickly and multiple times for latency-sensitive reads
let aggressive_policy = Arc::new(SimpleSpeculativeExecutionPolicy {
    max_retry_count: 4,
    retry_interval: Duration::from_millis(50),
});

Conservative Speculative Execution

// Retry only once after longer delay
let conservative_policy = Arc::new(SimpleSpeculativeExecutionPolicy {
    max_retry_count: 1,
    retry_interval: Duration::from_millis(200),
});

Custom Policy

use scylla::policies::speculative_execution::{
    SpeculativeExecutionPolicy, Context
};
use std::sync::Arc;
use std::time::Duration;

#[derive(Debug)]
struct AdaptiveSpeculativePolicy {
    min_interval_ms: u64,
    max_retries: usize,
}

impl SpeculativeExecutionPolicy for AdaptiveSpeculativePolicy {
    fn max_retry_count(&self, _context: &Context) -> usize {
        self.max_retries
    }

    fn retry_interval(&self, context: &Context) -> Duration {
        #[cfg(feature = "metrics")]
        {
            // Use p95 latency as interval, with minimum
            let latency = context.metrics
                .get_latency_percentile_ms(95.0)
                .unwrap_or(self.min_interval_ms);
            Duration::from_millis(latency.max(self.min_interval_ms))
        }
        
        #[cfg(not(feature = "metrics"))]
        {
            Duration::from_millis(self.min_interval_ms)
        }
    }
}

// Usage
let policy = Arc::new(AdaptiveSpeculativePolicy {
    min_interval_ms: 50,
    max_retries: 2,
});

Best Practices

  1. Use for read-heavy workloads where latency is critical
  2. Don’t use for writes unless they are idempotent
  3. Start conservative (1-2 retries, 100-200ms interval)
  4. Monitor metrics to tune interval and retry count
  5. Consider cost - more speculative requests increase cluster load
  6. Test under load to ensure benefit outweighs cost
Speculative execution increases load: Each speculative request adds work to your cluster. Use judiciously and monitor cluster health.
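To put a rough number on that extra load, the sketch below estimates how many speculative requests a policy would add per original request, given a set of observed latencies. It is an upper estimate under a simplifying assumption: it counts a speculative request whenever the initial request is still outstanding at a multiple of the interval, ignoring that an earlier speculative request may already have answered. The function name and the sample data are hypothetical.

```rust
/// Upper estimate of speculative requests sent per original request.
///
/// `latencies_ms` are observed latencies of initial requests. The k-th
/// speculative request is counted when the initial request is still
/// outstanding at k * retry_interval_ms, capped at max_retry_count.
fn extra_requests_per_request(
    latencies_ms: &[u64],
    max_retry_count: usize,
    retry_interval_ms: u64,
) -> f64 {
    // Degenerate input: nothing meaningful to estimate.
    if latencies_ms.is_empty() || retry_interval_ms == 0 {
        return 0.0;
    }
    let total: usize = latencies_ms
        .iter()
        .map(|&latency| {
            // Whole intervals that pass strictly before the response arrives.
            let elapsed_intervals = latency.saturating_sub(1) / retry_interval_ms;
            (elapsed_intervals as usize).min(max_retry_count)
        })
        .sum();
    total as f64 / latencies_ms.len() as f64
}
```

For latencies of 10, 20, 150, and 350 ms with a 100 ms interval and max_retry_count of 2, the estimate is 0.75 extra requests per request, or up to 75% more load on the cluster.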

When to Use Speculative Execution

Good Use Cases:
  • Latency-sensitive read queries
  • Applications with strict SLA requirements
  • Handling occasional slow nodes or network hiccups
  • Read queries with high variance in response time
Bad Use Cases:
  • Write operations (unless idempotent)
  • Already fast queries (< 10ms)
  • Resource-intensive queries
  • During cluster overload
  • When cluster capacity is limited

Tuning Guidelines

Workload          | max_retry_count | retry_interval | Notes
Low latency reads | 2-3             | 50-100ms       | Aggressive
Normal reads      | 1-2             | 100-200ms      | Balanced
Heavy queries     | 0-1             | 200-500ms      | Conservative
Writes            | 0               | N/A            | Disabled
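One way to carry these guidelines into code is a lookup that returns starting values per workload. The sketch below picks the conservative end of each range in the table, which is a choice made here for illustration. The `Workload` enum and `starting_point` function are hypothetical; the returned pair would feed the max_retry_count and retry_interval fields of a SimpleSpeculativeExecutionPolicy.

```rust
use std::time::Duration;

/// Hypothetical workload categories matching the rows of the table.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Workload {
    LowLatencyReads,
    NormalReads,
    HeavyQueries,
    Writes,
}

/// Starting (max_retry_count, retry_interval) for a workload, or None
/// when speculative execution should stay disabled.
fn starting_point(workload: Workload) -> Option<(usize, Duration)> {
    match workload {
        Workload::LowLatencyReads => Some((2, Duration::from_millis(100))),
        Workload::NormalReads => Some((1, Duration::from_millis(200))),
        Workload::HeavyQueries => Some((1, Duration::from_millis(500))),
        Workload::Writes => None,
    }
}
```

Treat these values as a first guess and adjust them against measured latencies.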

Metrics and Monitoring

// Requires the scylla crate's `metrics` feature.
use scylla::observability::metrics::Metrics;
use std::sync::Arc;

// Access metrics to tune policy
let metrics: Arc<Metrics> = session.get_metrics();

// Get current latency percentiles
let p50 = metrics.get_latency_percentile_ms(50.0)?;
let p99 = metrics.get_latency_percentile_ms(99.0)?;

println!("P50: {}ms, P99: {}ms", p50, p99);

// Adjust policy based on metrics
if p99 > 200 {
    // Consider enabling or tuning speculative execution
}

