Speculative execution allows the driver to send parallel requests to multiple nodes when the current node takes too long to respond. This can significantly reduce tail latencies for time-sensitive queries.

Overview

When a request is sent to a node, if it takes longer than a specified threshold to respond, the driver sends additional speculative requests to other nodes in parallel. The first response received is used, and the remaining requests are cancelled.

When to Use

Speculative execution is beneficial for:
  • Read-heavy workloads
  • Latency-sensitive applications
  • Scenarios where occasional slow nodes are expected
  • Reducing tail latencies (p99, p999)
Not recommended for:
  • Write-heavy workloads
  • Non-idempotent operations
  • Bandwidth-constrained environments
  • When consistent latency is more important than tail latency

SpeculativeExecutionPolicy Trait

Any type implementing this trait can be used:
pub trait SpeculativeExecutionPolicy: std::fmt::Debug + Send + Sync {
    fn max_retry_count(&self, context: &Context) -> usize;
    fn retry_interval(&self, context: &Context) -> Duration;
}

SimpleSpeculativeExecutionPolicy

Sends a fixed number of speculative requests at fixed intervals:
use scylla::policies::speculative_execution::SimpleSpeculativeExecutionPolicy;
use scylla::client::execution_profile::ExecutionProfile;
use std::sync::Arc;
use std::time::Duration;

let policy = SimpleSpeculativeExecutionPolicy {
    max_retry_count: 2,
    retry_interval: Duration::from_millis(100),
};

let profile = ExecutionProfile::builder()
    .speculative_execution_policy(Some(Arc::new(policy)))
    .build();

Configuration

  • max_retry_count: Maximum number of speculative requests (excludes original request)
  • retry_interval: Delay between each speculative request

Example Timeline

t=0ms:   Send original request to Node1
t=100ms: No response yet, send speculative request to Node2
t=200ms: No response yet, send speculative request to Node3
t=250ms: Node2 responds, cancel Node1 and Node3 requests

PercentileSpeculativeExecutionPolicy

Triggers speculative execution based on latency percentiles (requires metrics feature):
use scylla::policies::speculative_execution::PercentileSpeculativeExecutionPolicy;
use scylla::client::execution_profile::ExecutionProfile;
use std::sync::Arc;

let policy = PercentileSpeculativeExecutionPolicy {
    max_retry_count: 2,
    percentile: 99.0,
};

let profile = ExecutionProfile::builder()
    .speculative_execution_policy(Some(Arc::new(policy)))
    .build();

How It Works

  • Uses collected metrics to calculate the specified percentile latency
  • Sets retry_interval to that percentile value
  • Adapts automatically to cluster performance
  • Falls back to 100ms if metrics are unavailable

Configuration

  • max_retry_count: Maximum number of speculative requests
  • percentile: Latency percentile to use as threshold (e.g., 99.0 for p99)

Enabling Metrics

For PercentileSpeculativeExecutionPolicy, enable the metrics feature:
[dependencies]
scylla = { version = "*", features = ["metrics"] }
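With metrics enabled, the latency data that PercentileSpeculativeExecutionPolicy relies on can also be inspected directly. A minimal sketch, assuming an existing `session: Session` built with the `metrics` feature; method names match the driver's `Metrics` API, though exact signatures may vary between driver versions:

```rust
// Sketch: read the same percentile data the policy uses.
// Assumes `session` is an already-connected scylla Session.
let metrics = session.get_metrics();

println!("Queries requested: {}", metrics.get_queries_num());
// get_latency_percentile_ms returns a Result; fall back to 0 on error here.
println!(
    "p99 latency: {} ms",
    metrics.get_latency_percentile_ms(99.0).unwrap_or(0)
);
```

Comparing the measured p99 against your configured retry_interval is a quick sanity check that the policy will actually fire.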

Using with Execution Profiles

Attach the policy via execution profile:
use scylla::client::execution_profile::ExecutionProfile;
use scylla::client::session_builder::SessionBuilder;
use scylla::policies::speculative_execution::SimpleSpeculativeExecutionPolicy;
use std::sync::Arc;
use std::time::Duration;

let policy = SimpleSpeculativeExecutionPolicy {
    max_retry_count: 3,
    retry_interval: Duration::from_millis(100),
};

let profile = ExecutionProfile::builder()
    .speculative_execution_policy(Some(Arc::new(policy)))
    .build();

let handle = profile.into_handle();

// Use with session
let session = SessionBuilder::new()
    .known_node("127.0.0.1:9042")
    .default_execution_profile_handle(handle)
    .build()
    .await?;

Idempotency Requirement

Only use speculative execution with idempotent queries:
use scylla::statement::Statement;

let mut query = Statement::from("SELECT * FROM keyspace.table");
query.set_is_idempotent(true);
Speculative execution with non-idempotent queries can lead to:
  • Duplicate writes
  • Inconsistent data
  • Unexpected behavior
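For statements that run repeatedly, idempotency is best declared once on the prepared statement so every execution is eligible for speculative retries. A sketch, assuming an existing `session: Session` and a hypothetical `keyspace.users` table:

```rust
// Sketch (assumes an already-connected `session: Session`):
// prepare once, mark idempotent, then execute as usual.
let mut prepared = session
    .prepare("SELECT name FROM keyspace.users WHERE id = ?")
    .await?;
prepared.set_is_idempotent(true);

// 42 is a placeholder partition key value.
let result = session.execute_unpaged(&prepared, (42_i64,)).await?;
```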

Choosing Parameters

max_retry_count

// Conservative (lower resource usage)
max_retry_count: 1

// Moderate (balanced)
max_retry_count: 2

// Aggressive (best tail latency, higher load)
max_retry_count: 3

retry_interval

// For low-latency clusters (sub-10ms p50)
retry_interval: Duration::from_millis(20)

// For medium-latency clusters (10-50ms p50)
retry_interval: Duration::from_millis(100)

// For high-latency clusters (>50ms p50)
retry_interval: Duration::from_millis(200)
Rule of thumb: Set retry_interval slightly below your p99 latency.

Error Handling

When a speculative request fails, some errors are treated as node-specific and simply cause the next node to be tried; others abort all outstanding executions.

Errors that trigger a retry on the next node:
  • ConnectionPoolError
  • BrokenConnectionError
  • UnableToAllocStreamId
  • Database errors that permit speculative retry

Errors that terminate all speculative executions:
  • SerializationError
  • CqlRequestSerialization
  • ResultParseError

Monitoring

Track speculative execution effectiveness with the driver's request history (a sketch; `HistoryCollector` lives in `scylla::observability::history`):
use scylla::observability::history::HistoryCollector;
use scylla::statement::Statement;
use std::sync::Arc;

let collector = Arc::new(HistoryCollector::new());
let mut query = Statement::from("SELECT * FROM keyspace.table");
query.set_history_listener(collector.clone());

// ... execute the query ...

// Count the speculative fibers recorded for each request
let history = collector.clone_structured_history();
for request in &history.requests {
    println!("Speculative requests sent: {}", request.speculative_fibers.len());
}

Performance Considerations

Benefits

  • Reduced tail latencies (p95, p99, p999)
  • Better user experience for time-sensitive operations
  • Protection against occasional slow nodes

Costs

  • Increased network bandwidth usage
  • Higher cluster load
  • More complex debugging
  • Resource usage even when not needed

Load Impact

With max_retry_count=2 and requests that take longer than retry_interval:
  • Original request: baseline load (1x)
  • First speculative request: +1x load
  • Second speculative request: +1x load
  • Worst case: up to 3x load
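The worst-case amplification follows directly from the policy parameters. A tiny helper (hypothetical, not part of the scylla driver) makes the arithmetic explicit:

```rust
/// Worst-case number of in-flight requests per query: the original
/// request plus up to `max_retry_count` speculative requests.
/// (Hypothetical helper, not part of the scylla driver.)
fn worst_case_load_factor(max_retry_count: usize) -> usize {
    1 + max_retry_count
}

fn main() {
    // max_retry_count=2 means up to 3x load in the worst case.
    assert_eq!(worst_case_load_factor(2), 3);
    println!("max_retry_count=2 -> up to {}x load", worst_case_load_factor(2));
}
```

Capacity planning should assume this worst case even though most requests never spawn a speculative fiber.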

Custom Policy

Implement the trait for custom behavior:
use scylla::policies::speculative_execution::{
    SpeculativeExecutionPolicy, Context
};
use std::time::Duration;

#[derive(Debug)]
struct AdaptivePolicy {
    base_interval: Duration,
}

impl SpeculativeExecutionPolicy for AdaptivePolicy {
    fn max_retry_count(&self, _context: &Context) -> usize {
        // Adjust based on context
        2
    }

    fn retry_interval(&self, _context: &Context) -> Duration {
        // Could adjust based on time of day, cluster state, etc.
        self.base_interval
    }
}
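A custom policy plugs into an execution profile exactly like the built-in ones. A sketch wiring up the AdaptivePolicy defined above (the 75 ms base interval is an arbitrary example value):

```rust
use scylla::client::execution_profile::ExecutionProfile;
use std::sync::Arc;
use std::time::Duration;

// Attach the custom policy via the same builder method
// used for SimpleSpeculativeExecutionPolicy.
let profile = ExecutionProfile::builder()
    .speculative_execution_policy(Some(Arc::new(AdaptivePolicy {
        base_interval: Duration::from_millis(75),
    })))
    .build();
```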

Best Practices

  • Start with conservative settings (max_retry_count=1)
  • Only use with idempotent queries
  • Monitor cluster load and adjust parameters
  • Set retry_interval based on your p95/p99 latency
  • Consider using PercentileSpeculativeExecutionPolicy for auto-tuning
  • Test under load to ensure cluster can handle additional requests
  • Disable for write-heavy workloads
  • Use query history to debug and tune settings

Interaction with Other Policies

  • Load Balancing: Determines which nodes receive speculative requests
  • Retry Policy: Works independently; both retries and speculative requests can happen
  • Timeouts: Request timeout applies to all speculative executions collectively

Example: Read-Heavy Application

use scylla::policies::speculative_execution::SimpleSpeculativeExecutionPolicy;
use scylla::client::execution_profile::ExecutionProfile;
use scylla::statement::{Consistency, Statement};
use std::sync::Arc;
use std::time::Duration;

// Profile for latency-sensitive reads
let read_profile = ExecutionProfile::builder()
    .consistency(Consistency::LocalQuorum)
    .speculative_execution_policy(Some(Arc::new(
        SimpleSpeculativeExecutionPolicy {
            max_retry_count: 2,
            retry_interval: Duration::from_millis(50),
        }
    )))
    .build();

let mut query = Statement::from("SELECT * FROM users WHERE id = ?");
query.set_is_idempotent(true);
query.set_execution_profile_handle(Some(read_profile.into_handle()));
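Executing the statement then looks like any other read. A sketch, assuming an existing `session: Session`; 42 is a placeholder user id:

```rust
// Sketch: run the read with the speculative profile attached above.
let result = session.query_unpaged(query, (42_i64,)).await?;
```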
