Speculative execution allows the driver to send parallel requests to multiple nodes when the current node takes too long to respond. This can significantly reduce tail latencies for time-sensitive queries.

Overview

When a request is sent to a node, if it takes longer than a specified threshold to respond, the driver sends additional speculative requests to other nodes in parallel. The first response received is used, and the remaining requests are cancelled.

When to Use

Speculative execution is beneficial for:
  • Read-heavy workloads
  • Latency-sensitive applications
  • Scenarios where occasional slow nodes are expected
  • Reducing tail latencies (p99, p999)
Not recommended for:
  • Write-heavy workloads
  • Non-idempotent operations
  • Bandwidth-constrained environments
  • When consistent latency is more important than tail latency

SpeculativeExecutionPolicy Trait

Any type implementing this trait can be used:
pub trait SpeculativeExecutionPolicy: std::fmt::Debug + Send + Sync {
    fn max_retry_count(&self, context: &Context) -> usize;
    fn retry_interval(&self, context: &Context) -> Duration;
}

SimpleSpeculativeExecutionPolicy

Sends a fixed number of speculative requests at fixed intervals:
use scylla::policies::speculative_execution::SimpleSpeculativeExecutionPolicy;
use scylla::client::execution_profile::ExecutionProfile;
use std::sync::Arc;
use std::time::Duration;

let policy = SimpleSpeculativeExecutionPolicy {
    max_retry_count: 2,
    retry_interval: Duration::from_millis(100),
};

let profile = ExecutionProfile::builder()
    .speculative_execution_policy(Some(Arc::new(policy)))
    .build();

Configuration

  • max_retry_count: Maximum number of speculative requests (excludes original request)
  • retry_interval: Delay between each speculative request

Example Timeline

t=0ms:   Send original request to Node1
t=100ms: No response yet, send speculative request to Node2
t=200ms: No response yet, send speculative request to Node3
t=250ms: Node2 responds, cancel Node1 and Node3 requests

PercentileSpeculativeExecutionPolicy

Triggers speculative execution based on latency percentiles (requires metrics feature):
use scylla::policies::speculative_execution::PercentileSpeculativeExecutionPolicy;
use scylla::client::execution_profile::ExecutionProfile;
use std::sync::Arc;

let policy = PercentileSpeculativeExecutionPolicy {
    max_retry_count: 2,
    percentile: 99.0,
};

let profile = ExecutionProfile::builder()
    .speculative_execution_policy(Some(Arc::new(policy)))
    .build();

How It Works

  • Uses collected metrics to calculate the specified percentile latency
  • Sets retry_interval to that percentile value
  • Adapts automatically to cluster performance
  • Falls back to 100ms if metrics are unavailable

Configuration

  • max_retry_count: Maximum number of speculative requests
  • percentile: Latency percentile to use as threshold (e.g., 99.0 for p99)

Enabling Metrics

For PercentileSpeculativeExecutionPolicy, enable the metrics feature:
[dependencies]
scylla = { version = "*", features = ["metrics"] }
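With metrics enabled, the latency data that PercentileSpeculativeExecutionPolicy relies on can also be inspected directly. A minimal sketch, assuming an existing `session: Session` built with the `metrics` feature; method names match the driver's `Metrics` API, though exact signatures may vary between driver versions:

```rust
// Sketch: read the same percentile data the policy uses.
// Assumes `session` is an already-connected scylla Session.
let metrics = session.get_metrics();

println!("Queries requested: {}", metrics.get_queries_num());
// get_latency_percentile_ms returns a Result; fall back to 0 on error here.
println!(
    "p99 latency: {} ms",
    metrics.get_latency_percentile_ms(99.0).unwrap_or(0)
);
```

Comparing the measured p99 against your configured retry_interval is a quick sanity check that the policy will actually fire.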

Using with Execution Profiles

Attach the policy via execution profile:
use scylla::client::execution_profile::ExecutionProfile;
use scylla::client::session_builder::SessionBuilder;
use scylla::policies::speculative_execution::SimpleSpeculativeExecutionPolicy;
use std::sync::Arc;
use std::time::Duration;

let policy = SimpleSpeculativeExecutionPolicy {
    max_retry_count: 3,
    retry_interval: Duration::from_millis(100),
};

let profile = ExecutionProfile::builder()
    .speculative_execution_policy(Some(Arc::new(policy)))
    .build();

let handle = profile.into_handle();

// Use with session
let session = SessionBuilder::new()
    .known_node("127.0.0.1:9042")
    .default_execution_profile_handle(handle)
    .build()
    .await?;

Idempotency Requirement

Only use speculative execution with idempotent queries:
use scylla::statement::Statement;

let mut query = Statement::from("SELECT * FROM keyspace.table");
query.set_is_idempotent(true);
Speculative execution with non-idempotent queries can lead to:
  • Duplicate writes
  • Inconsistent data
  • Unexpected behavior
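For statements that run repeatedly, idempotency is best declared once on the prepared statement so every execution is eligible for speculative retries. A sketch, assuming an existing `session: Session` and a hypothetical `keyspace.users` table:

```rust
// Sketch (assumes an already-connected `session: Session`):
// prepare once, mark idempotent, then execute as usual.
let mut prepared = session
    .prepare("SELECT name FROM keyspace.users WHERE id = ?")
    .await?;
prepared.set_is_idempotent(true);

// 42 is a placeholder partition key value.
let result = session.execute_unpaged(&prepared, (42_i64,)).await?;
```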

Choosing Parameters

max_retry_count

// Conservative (lower resource usage)
max_retry_count: 1

// Moderate (balanced)
max_retry_count: 2

// Aggressive (best tail latency, higher load)
max_retry_count: 3

retry_interval

// For low-latency clusters (sub-10ms p50)
retry_interval: Duration::from_millis(20)

// For medium-latency clusters (10-50ms p50)
retry_interval: Duration::from_millis(100)

// For high-latency clusters (>50ms p50)
retry_interval: Duration::from_millis(200)
Rule of thumb: Set retry_interval slightly below your p99 latency.

Error Handling

When a speculative request fails, some errors are treated as node-specific and simply cause the next node to be tried; others abort all outstanding executions.

Errors that trigger a retry on the next node:
  • ConnectionPoolError
  • BrokenConnectionError
  • UnableToAllocStreamId
  • Database errors that permit speculative retry

Errors that terminate all speculative executions:
  • SerializationError
  • CqlRequestSerialization
  • ResultParseError

Monitoring

Track speculative execution effectiveness with the driver's request history (a sketch; `HistoryCollector` lives in `scylla::observability::history`):
use scylla::observability::history::HistoryCollector;
use scylla::statement::Statement;
use std::sync::Arc;

let collector = Arc::new(HistoryCollector::new());
let mut query = Statement::from("SELECT * FROM keyspace.table");
query.set_history_listener(collector.clone());

// ... execute the query ...

// Count the speculative fibers recorded for each request
let history = collector.clone_structured_history();
for request in &history.requests {
    println!("Speculative requests sent: {}", request.speculative_fibers.len());
}

Performance Considerations

Benefits

  • Reduced tail latencies (p95, p99, p999)
  • Better user experience for time-sensitive operations
  • Protection against occasional slow nodes

Costs

  • Increased network bandwidth usage
  • Higher cluster load
  • More complex debugging
  • Resource usage even when not needed

Load Impact

With max_retry_count=2 and requests that take longer than retry_interval:
  • Original request: baseline load (1x)
  • First speculative request: +1x load
  • Second speculative request: +1x load
  • Worst case: up to 3x load
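The worst-case amplification follows directly from the policy parameters. A tiny helper (hypothetical, not part of the scylla driver) makes the arithmetic explicit:

```rust
/// Worst-case number of in-flight requests per query: the original
/// request plus up to `max_retry_count` speculative requests.
/// (Hypothetical helper, not part of the scylla driver.)
fn worst_case_load_factor(max_retry_count: usize) -> usize {
    1 + max_retry_count
}

fn main() {
    // max_retry_count=2 means up to 3x load in the worst case.
    assert_eq!(worst_case_load_factor(2), 3);
    println!("max_retry_count=2 -> up to {}x load", worst_case_load_factor(2));
}
```

Capacity planning should assume this worst case even though most requests never spawn a speculative fiber.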

Custom Policy

Implement the trait for custom behavior:
use scylla::policies::speculative_execution::{
    SpeculativeExecutionPolicy, Context
};
use std::time::Duration;

#[derive(Debug)]
struct AdaptivePolicy {
    base_interval: Duration,
}

impl SpeculativeExecutionPolicy for AdaptivePolicy {
    fn max_retry_count(&self, _context: &Context) -> usize {
        // Adjust based on context
        2
    }

    fn retry_interval(&self, _context: &Context) -> Duration {
        // Could adjust based on time of day, cluster state, etc.
        self.base_interval
    }
}
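A custom policy plugs into an execution profile exactly like the built-in ones. A sketch wiring up the AdaptivePolicy defined above (the 75 ms base interval is an arbitrary example value):

```rust
use scylla::client::execution_profile::ExecutionProfile;
use std::sync::Arc;
use std::time::Duration;

// Attach the custom policy via the same builder method
// used for SimpleSpeculativeExecutionPolicy.
let profile = ExecutionProfile::builder()
    .speculative_execution_policy(Some(Arc::new(AdaptivePolicy {
        base_interval: Duration::from_millis(75),
    })))
    .build();
```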

Best Practices

  • Start with conservative settings (max_retry_count=1)
  • Only use with idempotent queries
  • Monitor cluster load and adjust parameters
  • Set retry_interval based on your p95/p99 latency
  • Consider using PercentileSpeculativeExecutionPolicy for auto-tuning
  • Test under load to ensure cluster can handle additional requests
  • Disable for write-heavy workloads
  • Use query history to debug and tune settings

Interaction with Other Policies

  • Load Balancing: Determines which nodes receive speculative requests
  • Retry Policy: Works independently; both retries and speculative requests can happen
  • Timeouts: Request timeout applies to all speculative executions collectively

Example: Read-Heavy Application

use scylla::policies::speculative_execution::SimpleSpeculativeExecutionPolicy;
use scylla::client::execution_profile::ExecutionProfile;
use scylla::statement::{Consistency, Statement};
use std::sync::Arc;
use std::time::Duration;

// Profile for latency-sensitive reads
let read_profile = ExecutionProfile::builder()
    .consistency(Consistency::LocalQuorum)
    .speculative_execution_policy(Some(Arc::new(
        SimpleSpeculativeExecutionPolicy {
            max_retry_count: 2,
            retry_interval: Duration::from_millis(50),
        }
    )))
    .build();

let mut query = Statement::from("SELECT * FROM users WHERE id = ?");
query.set_is_idempotent(true);
query.set_execution_profile_handle(Some(read_profile.into_handle()));
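Executing the statement then looks like any other read. A sketch, assuming an existing `session: Session`; 42 is a placeholder user id:

```rust
// Sketch: run the read with the speculative profile attached above.
let result = session.query_unpaged(query, (42_i64,)).await?;
```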
