ScyllaDB and Cassandra provide built-in tracing capabilities that record detailed information about query execution across the cluster. The driver can enable tracing and retrieve this information for debugging and performance analysis.

Overview

When tracing is enabled for a query, the cluster records:
  • Which coordinator handled the query
  • Query execution parameters
  • Events that occurred during execution
  • Which nodes were contacted
  • Timing information
  • Thread/shard information
Tracing data is stored in the system_traces.sessions and system_traces.events tables.

Enabling Tracing

Enable tracing on a per-statement basis:
use scylla::statement::unprepared::Statement;

let mut query = Statement::from("SELECT * FROM keyspace.table");
query.set_tracing(true);

let result = session.execute(&query, &[]).await?;

For Prepared Statements

use scylla::statement::prepared::PreparedStatement;

let mut prepared = session.prepare("SELECT * FROM keyspace.table").await?;
prepared.set_tracing(true);

let result = session.execute(&prepared, &[]).await?;

For Batch Statements

use scylla::batch::Batch;

let mut batch = Batch::default();
batch.set_tracing(true);

batch.append_statement("INSERT INTO table (a) VALUES (?)");
// Batch values are a sequence with one value list per statement in the batch
let result = session.batch(&batch, ((1,),)).await?;

Retrieving Tracing Information

Get the tracing ID from query results:
let result = session.execute(&query, &[]).await?;

if let Some(tracing_id) = result.tracing_id {
    println!("Tracing ID: {}", tracing_id);
    
    // Retrieve full tracing information
    let tracing_info = session.get_tracing_info(&tracing_id).await?;
}

TracingInfo Structure

The driver provides structured tracing information:
use scylla::tracing::TracingInfo;

pub struct TracingInfo {
    pub client: Option<IpAddr>,
    pub command: Option<String>,
    pub coordinator: Option<IpAddr>,
    pub duration: Option<i32>,  // microseconds
    pub parameters: Option<HashMap<String, String>>,
    pub request: Option<String>,
    pub started_at: Option<CqlTimestamp>,
    pub events: Vec<TracingEvent>,
}

Examining Tracing Data

Basic Information

let tracing_info = session.get_tracing_info(&tracing_id).await?;

println!("Coordinator: {:?}", tracing_info.coordinator);
println!("Duration: {} μs", tracing_info.duration.unwrap_or(0));
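// Borrow the optional strings instead of moving them out of tracing_info,
// so the struct stays usable afterwards
println!("Request: {}", tracing_info.request.as_deref().unwrap_or(""));
println!("Command: {}", tracing_info.command.as_deref().unwrap_or(""));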

Query Parameters

if let Some(params) = &tracing_info.parameters {
    for (key, value) in params {
        println!("{}: {}", key, value);
    }
}

// Common parameters:
// - consistency_level
// - page_size
// - query
// - serial_consistency_level
// - user_timestamp

Events

Examine events that occurred during execution:
for event in &tracing_info.events {
    println!("[{:?}] {} - {}",
        event.source,
        event.source_elapsed.unwrap_or(0),
        event.activity.as_deref().unwrap_or("Unknown")
    );
}

TracingEvent Structure

pub struct TracingEvent {
    pub event_id: CqlTimeuuid,
    pub activity: Option<String>,
    pub source: Option<IpAddr>,
    pub source_elapsed: Option<i32>,  // microseconds
    pub thread: Option<String>,
}

Example Event Activities

  • “Execute CQL3 query”
  • “Parsing a statement [shard 1]”
  • “Sending a mutation to /127.0.0.1 [shard 1]”
  • “Request complete”
  • “Computing ranges to query”
  • “Submitting range requests on N ranges”
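
Several of the example activities above end with a "[shard N]" suffix. The following is a minimal sketch of pulling that number out of an activity string. The helper name shard_from_activity is hypothetical and not part of the driver, and the suffix format is assumed from the examples above rather than from a documented guarantee.

```rust
/// Extracts the shard number from an activity string that contains a
/// "[shard N]" suffix. Returns None when the suffix is missing or N is
/// not a number. (Hypothetical helper; not a driver API.)
fn shard_from_activity(activity: &str) -> Option<u32> {
    const MARKER: &str = "[shard ";
    // Use the last occurrence so earlier brackets in the text are ignored.
    let start = activity.rfind(MARKER)?;
    let rest = &activity[start + MARKER.len()..];
    let end = rest.find(']')?;
    rest[..end].trim().parse().ok()
}
```

It could be applied to each event with event.activity.as_deref().and_then(shard_from_activity).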

Analyzing Node Involvement

let nodes = tracing_info.nodes();
println!("Query involved {} nodes:", nodes.len());
for node in nodes {
    println!("  - {}", node);
}
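
Beyond listing the nodes, it can help to see how many events each node recorded. The sketch below groups events by source address. EventView and NodeSummary are hypothetical stand-ins that mirror only the source and source_elapsed fields of TracingEvent, so the example compiles without the driver.

```rust
use std::collections::BTreeMap;
use std::net::IpAddr;

/// Hypothetical stand-in for the two TracingEvent fields used here.
struct EventView {
    source: Option<IpAddr>,
    source_elapsed: Option<i32>,
}

/// Per-node summary: event count and the largest elapsed offset seen.
#[derive(Debug, Default, PartialEq)]
struct NodeSummary {
    events: usize,
    last_elapsed_us: i32,
}

/// Groups events by source node. A BTreeMap keeps the output ordered by address.
fn summarize_by_node(events: &[EventView]) -> BTreeMap<IpAddr, NodeSummary> {
    let mut out: BTreeMap<IpAddr, NodeSummary> = BTreeMap::new();
    for e in events {
        // Events without a source address cannot be attributed to a node.
        let Some(addr) = e.source else { continue };
        let entry = out.entry(addr).or_default();
        entry.events += 1;
        if let Some(us) = e.source_elapsed {
            entry.last_elapsed_us = entry.last_elapsed_us.max(us);
        }
    }
    out
}
```

With real data, the EventView values would be built from the source and source_elapsed fields of each entry in tracing_info.events.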

Complete Example

use scylla::{Session, SessionBuilder};
use scylla::statement::unprepared::Statement;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let session: Session = SessionBuilder::new()
        .known_node("127.0.0.1:9042")
        .build()
        .await?;

    // Enable tracing
    let mut query = Statement::from("SELECT * FROM keyspace.table WHERE pk = ?");
    query.set_tracing(true);

    // Execute query
    let result = session.execute(&query, (42,)).await?;

    // Get tracing information
    if let Some(tracing_id) = result.tracing_id {
        // Wait a bit for tracing data to be written
        tokio::time::sleep(tokio::time::Duration::from_millis(100)).await;
        
        let tracing_info = session.get_tracing_info(&tracing_id).await?;

        println!("\n=== Query Tracing ===");
        println!("Coordinator: {:?}", tracing_info.coordinator);
        println!("Duration: {} μs", tracing_info.duration.unwrap_or(0));
        println!("Nodes involved: {}", tracing_info.nodes().len());
        
        println!("\n=== Events ===");
        for event in &tracing_info.events {
            println!("[{:6} μs] [{:?}] [{}] {}",
                event.source_elapsed.unwrap_or(0),
                // Print the Option directly; unwrap() would panic if the source is missing
                event.source,
                event.thread.as_deref().unwrap_or("unknown"),
                event.activity.as_deref().unwrap_or("Unknown")
            );
        }
    }

    Ok(())
}

Timing Considerations

Tracing data is written asynchronously, so it may not yet be available when the query returns:
// Query completes
let result = session.execute(&query, &[]).await?;

// Wait for tracing data to be written (ScyllaDB writes it with 0-10ms delay)
tokio::time::sleep(tokio::time::Duration::from_millis(100)).await;

// Now retrieve tracing info
if let Some(tracing_id) = result.tracing_id {
    let tracing_info = session.get_tracing_info(&tracing_id).await?;
}
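
An alternative to a single fixed sleep is to poll a few times with a short delay. The sketch below shows the idea with a synchronous, hypothetical helper, poll_until_some, that takes any lookup closure. It is not a driver API, and in async code the sleep would be an awaited tokio::time::sleep instead.

```rust
use std::thread::sleep;
use std::time::Duration;

/// Calls `fetch` up to `attempts` times, sleeping `delay` between tries,
/// and returns the first Some value. (Hypothetical helper for illustration.)
fn poll_until_some<T>(
    attempts: u32,
    delay: Duration,
    mut fetch: impl FnMut() -> Option<T>,
) -> Option<T> {
    for attempt in 0..attempts {
        if let Some(value) = fetch() {
            return Some(value);
        }
        // Skip the sleep after the final attempt.
        if attempt + 1 < attempts {
            sleep(delay);
        }
    }
    None
}
```

Polling returns as soon as the data appears, so the caller is not held for the full delay when the trace is written quickly.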

Retention

Tracing data has a default TTL:
  • ScyllaDB: 24 hours
  • Cassandra: 24 hours
You can adjust TTL in system_traces keyspace settings.

Use Cases

Debugging Slow Queries

let mut query = Statement::from("SELECT * FROM large_table");
query.set_tracing(true);

let result = session.execute(&query, &[]).await?;

if let Some(tracing_id) = result.tracing_id {
    tokio::time::sleep(tokio::time::Duration::from_millis(100)).await;
    let tracing = session.get_tracing_info(&tracing_id).await?;
    
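    // source_elapsed is the time since the trace started on that node,
    // so the maximum is the offset of the last recorded event, not the
    // duration of a single operation
    let last_event_offset = tracing.events.iter()
        .filter_map(|e| e.source_elapsed)
        .max()
        .unwrap_or(0);
    
    println!("Query took {} μs", tracing.duration.unwrap_or(0));
    println!("Last event recorded at {} μs", last_event_offset);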
}
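
Because source_elapsed is a running offset on each node, one way to locate the slowest step is to measure the gap between consecutive events from the same source. The sketch below does that with a hypothetical TimedEvent stand-in for the relevant TracingEvent fields. It assumes the events are already in chronological order.

```rust
use std::collections::HashMap;
use std::net::IpAddr;

/// Hypothetical stand-in for the TracingEvent fields used here.
struct TimedEvent {
    source: Option<IpAddr>,
    source_elapsed: Option<i32>,
    activity: Option<String>,
}

/// Returns (gap in μs, activity) for the event that followed the longest
/// gap on its own node, or None if no event carries timing data.
fn slowest_step(events: &[TimedEvent]) -> Option<(i32, &str)> {
    // Last elapsed offset seen per node; a node's first event is measured from 0.
    let mut last_seen: HashMap<IpAddr, i32> = HashMap::new();
    let mut slowest: Option<(i32, &str)> = None;
    for e in events {
        let (Some(src), Some(elapsed)) = (e.source, e.source_elapsed) else {
            continue;
        };
        let prev = last_seen.insert(src, elapsed).unwrap_or(0);
        let gap = elapsed - prev;
        let name = e.activity.as_deref().unwrap_or("Unknown");
        // Keep the first event on ties.
        if slowest.map_or(true, |(g, _)| gap > g) {
            slowest = Some((gap, name));
        }
    }
    slowest
}
```

A large gap on the coordinator often just means it was waiting for a replica, so the result is a starting point for reading the trace rather than a verdict.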

Verifying Query Routing

let mut query = Statement::from("SELECT * FROM keyspace.table WHERE pk = ?");
query.set_tracing(true);

let result = session.execute(&query, (42,)).await?;

if let Some(tracing_id) = result.tracing_id {
    tokio::time::sleep(tokio::time::Duration::from_millis(100)).await;
    let tracing = session.get_tracing_info(&tracing_id).await?;
    
    println!("Coordinator: {:?}", tracing.coordinator);
    println!("Contacted nodes: {:?}", tracing.nodes());
    
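    // With token-aware routing the coordinator is a replica for the partition.
    // A single contacted node is consistent with that at consistency level ONE;
    // higher consistency levels contact additional replicas
    if tracing.nodes().len() == 1 {
        println!("Single node contacted (the coordinator served the request itself)");
    } else {
        println!("Query involved {} nodes", tracing.nodes().len());
    }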
}
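
Counting contacted nodes is only an indirect signal. If the replica set for the partition is known from elsewhere, the coordinator can be checked against it directly. The sketch below assumes the caller supplies that replica list. Routing and classify_routing are hypothetical names, not driver APIs.

```rust
use std::net::IpAddr;

/// Outcome of comparing the traced coordinator with the expected replicas.
#[derive(Debug, PartialEq)]
enum Routing {
    CoordinatorIsReplica,
    CoordinatorNotReplica,
    UnknownCoordinator,
}

/// Classifies a trace by whether its coordinator is one of the replicas
/// the caller expects for the partition. (Hypothetical helper.)
fn classify_routing(coordinator: Option<IpAddr>, expected_replicas: &[IpAddr]) -> Routing {
    match coordinator {
        None => Routing::UnknownCoordinator,
        Some(addr) if expected_replicas.contains(&addr) => Routing::CoordinatorIsReplica,
        Some(_) => Routing::CoordinatorNotReplica,
    }
}
```

It would be called with the coordinator field of the tracing info and the replica addresses for the queried key.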

Consistency Level Verification

if let Some(params) = &tracing_info.parameters {
    if let Some(cl) = params.get("consistency_level") {
        println!("Query used consistency level: {}", cl);
    }
}
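
In a test or diagnostic script, the same lookup can be turned into a check that fails when the recorded level differs from the one expected. This is a minimal sketch. check_consistency is a hypothetical helper, and it assumes the level is recorded as plain text such as "QUORUM" under the consistency_level key.

```rust
use std::collections::HashMap;

/// Compares the recorded consistency_level with `expected`, ignoring ASCII
/// case. Returns a descriptive error when it is missing or different.
fn check_consistency(
    params: Option<&HashMap<String, String>>,
    expected: &str,
) -> Result<(), String> {
    let params = params.ok_or_else(|| "trace has no parameters".to_string())?;
    let actual = params
        .get("consistency_level")
        .ok_or_else(|| "consistency_level not recorded".to_string())?;
    if actual.eq_ignore_ascii_case(expected) {
        Ok(())
    } else {
        Err(format!("expected {expected}, got {actual}"))
    }
}
```

It could be called as check_consistency(tracing_info.parameters.as_ref(), "QUORUM").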

Performance Impact

Tracing has overhead:
  • Cluster writes tracing data to system_traces tables
  • Additional latency (typically 1-5ms)
  • Storage overhead
  • CPU overhead for generating traces
Do not enable tracing in production for all queries. Use it selectively for debugging.
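
One way to keep tracing selective is to sample, for example by tracing one statement out of every N. The sketch below is a hypothetical counter-based sampler, not part of the driver. Its result would be passed to set_tracing on the statement.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Decides which statements to trace: the first call and every
/// `every`-th call after it. (Hypothetical helper.)
struct TraceSampler {
    every: u64,
    counter: AtomicU64,
}

impl TraceSampler {
    /// `every` is clamped to at least 1, which traces every statement.
    fn new(every: u64) -> Self {
        Self {
            every: every.max(1),
            counter: AtomicU64::new(0),
        }
    }

    /// Safe to call from several tasks at once. A relaxed atomic counter is
    /// enough because only the count matters, not ordering with other memory.
    fn should_trace(&self) -> bool {
        let n = self.counter.fetch_add(1, Ordering::Relaxed);
        n % self.every == 0
    }
}
```

With a sampler built by TraceSampler::new(1000), calling query.set_tracing(sampler.should_trace()) before each execution would trace roughly 0.1% of statements.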

Best Practices

  • Enable tracing only when debugging specific issues
  • Wait 100-200ms before retrieving tracing data
  • Disable tracing in production unless investigating issues
  • Use tracing to verify:
    • Token-aware routing is working
    • Expected consistency levels
    • Node involvement in queries
    • Query performance bottlenecks
  • Consider using query history for production monitoring
  • Store tracing IDs for queries you want to investigate later

Limitations

  • Tracing adds latency and overhead
  • Tracing data is eventually consistent
  • May not capture all details for very fast queries
  • Retention period is limited (default 24h)
  • Cannot trace internal driver operations

Tracing vs. History

Feature      Tracing          History
Granularity  Cluster events   Driver events
Location     Server-side      Client-side
Overhead     Moderate         Minimal
Coverage     Query execution  Retries, speculative
Storage      system_traces    In-memory
Retention    24h default      Until cleared
Use both together for complete visibility:
  • Tracing: What happened in the cluster
  • History: What the driver did
