## Query Processing Pipeline

Understanding how Snuba processes queries helps you write better ones:

- Parsing: Query string → AST
- Validation: Check entity requirements
- Logical Processing: Apply entity-specific transformations
- Storage Selection: Choose optimal table/view
- Translation: Logical → Physical query
- Physical Processing: Apply optimizations
- Execution: Run on ClickHouse
## Key Optimization Principles

### 1. Always Include Required Conditions

Most entities require specific conditions to prevent expensive queries.

### 2. Use Appropriate Time Ranges

Smaller time ranges are more efficient:

- ✅ Query hours or days for real-time dashboards
- ✅ Query weeks for trend analysis
- ⚠️ Query months sparingly
- ❌ Avoid querying years of data
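As a sketch of both principles (entity, project IDs, and dates are illustrative), a query that includes the required conditions and keeps the time range tight might look like:

```
MATCH (events)
SELECT count() AS event_count
WHERE project_id IN array(1, 2)
  AND timestamp >= toDateTime('2024-01-01T00:00:00')
  AND timestamp < toDateTime('2024-01-01T06:00:00')
```

The six-hour window keeps the scan small while still covering a real-time dashboard's needs.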
### 3. Leverage Indexes

ClickHouse indexes certain columns for faster queries. Typically indexed columns:

- `project_id`
- `timestamp`
- `event_id`, `group_id`, `transaction_id`
- Tags (when promoted)
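Filtering on indexed columns lets ClickHouse skip data blocks instead of scanning them. A sketch (column and entity names are illustrative):

```
MATCH (events)
SELECT title, timestamp
WHERE project_id = 1
  AND timestamp >= toDateTime('2024-01-01T00:00:00')
  AND timestamp < toDateTime('2024-01-02T00:00:00')
  AND group_id = 12345
```

Because `group_id` is typically indexed, this narrows the scan far more cheaply than filtering on an arbitrary string column.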
### 4. Use Appropriate Limits

Always specify reasonable limits. The default limit is 1000 rows. For pagination, use `LIMIT` with `OFFSET`.

## SnQL Optimization Techniques
### Tag Queries

Snuba optimizes tag queries in different ways:

- Use equality (`=`) when possible
- Use `IN` for multiple values: `tags[environment] IN array('prod', 'staging')`
- Avoid `LIKE` on tags unless necessary
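Putting these together, a tag query over multiple environments might look like this sketch (project ID and dates are illustrative):

```
MATCH (events)
SELECT count() AS matching_events
WHERE project_id = 1
  AND timestamp >= toDateTime('2024-01-01T00:00:00')
  AND timestamp < toDateTime('2024-01-02T00:00:00')
  AND tags[environment] IN array('prod', 'staging')
```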
### Aggregation Optimization

#### Use Appropriate Aggregation Functions

| Fast Aggregations | Slower Aggregations |
| --- | --- |
| `count()`, `sum()`, `min()`, `max()` | `uniq()` (uses HyperLogLog, approximate) |
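For example, combining a cheap `count()` with an approximate `uniq()` in one query (the `user` column name is an assumption for illustration):

```
MATCH (events)
SELECT count() AS total_events, uniq(user) AS unique_users
WHERE project_id = 1
  AND timestamp >= toDateTime('2024-01-01T00:00:00')
  AND timestamp < toDateTime('2024-01-02T00:00:00')
```

`uniq()` trades exactness for speed; prefer it over exact distinct counts unless precision is required.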
#### Limit Grouping Cardinality

High-cardinality grouping is expensive:

- Group by categorical data (environment, status codes)
- Avoid grouping by UUIDs, timestamps, or high-cardinality strings
- Use `LIMIT BY` to limit results per group
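A `LIMIT BY` sketch: keep at most five groups per environment while capping the total result size (values are illustrative):

```
MATCH (events)
SELECT environment, group_id, count() AS events
WHERE project_id = 1
  AND timestamp >= toDateTime('2024-01-01T00:00:00')
  AND timestamp < toDateTime('2024-01-02T00:00:00')
GROUP BY environment, group_id
LIMIT 5 BY environment
LIMIT 100
```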
### Sampling for Exploration

Use sampling for exploratory queries, either with a `SAMPLE` clause on the entity or with the `turbo` flag on the request.
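A sampling sketch: query roughly 10% of the data (the sample rate and entity are illustrative):

```
MATCH (events SAMPLE 0.1)
SELECT count() AS sampled_count
WHERE project_id = 1
  AND timestamp >= toDateTime('2024-01-01T00:00:00')
  AND timestamp < toDateTime('2024-01-02T00:00:00')
```

Scale the sampled result back up (here, multiply counts by 10) to estimate the true total. The `turbo` flag is set on the request body (`"turbo": true`) rather than in the SnQL string.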
### Subquery Optimization

Subqueries add overhead. Use them strategically.

### Join Optimization

Joins are expensive. Optimize by:

- Pushing filters into subqueries: filter before joining
- Using appropriate join types: defined by entity relationships
- Joining on indexed columns: join keys are typically indexed
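A join sketch using SnQL's relationship syntax (the `grouped` relationship between `events` and `groupedmessage` follows the SnQL join form; entity aliases and values are illustrative):

```
MATCH (e: events) -[grouped]-> (g: groupedmessage)
SELECT e.group_id, g.status, count() AS events
WHERE e.project_id = 1
  AND g.project_id = 1
  AND e.timestamp >= toDateTime('2024-01-01T00:00:00')
  AND e.timestamp < toDateTime('2024-01-02T00:00:00')
GROUP BY e.group_id, g.status
```

Note that the filters on both sides of the join are applied before joining, per the first bullet above.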
## MQL Optimization Techniques

### Choose Appropriate Granularity

Granularity affects query performance:

- Match `granularity` to storage configuration
- Use larger intervals for longer time ranges
- 1 minute (60s) for real-time dashboards
- 1 hour (3600s) for daily views
- 1 day (86400s) for weekly/monthly views
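As a sketch (the metric MRI and tag value are illustrative), a daily-view query might pair hourly granularity with an MQL string like:

```
sum(d:transactions/duration@millisecond){environment:"prod"} by transaction
```

The granularity itself (e.g. `3600`) is supplied in the request's rollup parameters alongside the MQL string, not in the query text.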
### Filter Early

Filters in the MQL string are more efficient than filtering results after the query returns.

### Use Appropriate Aggregations

Match aggregation to metric type:

- Counters: `sum()`, `count()`
- Distributions: `quantiles()`, `avg()`, `sum()`
- Sets: `uniq()`
- Gauges: `avg()`, `last()`, `min()`, `max()`
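For example, averaging a distribution metric with the filter applied inside the MQL string (MRI and tag value are illustrative):

```
avg(d:transactions/duration@millisecond){environment:"prod"}
```

For a set metric, `uniq()` would be the analogous choice; applying the wrong aggregation to a metric type either fails or produces misleading numbers.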
### Limit Grouping Dimensions

Limit the number of group-by tags; each additional tag multiplies the result cardinality.

## Storage-Level Optimizations
### Final Queries

Snuba may apply `FINAL` to ClickHouse queries to handle deduplication:

- Forces a merge of duplicate rows
- Increases query time
- Applied automatically when necessary, such as for queries on ReplacingMergeTree tables or when consistent results are required

Avoid `FINAL` by:

- Using `turbo: true` (disables FINAL, may return duplicates)
- Querying read-replica storages (already deduplicated)
### Storage Selection

Snuba may have multiple storages for an entity:

- Primary: for consistent queries
- Replica: for high-throughput queries
- Materialized Views: pre-aggregated for specific queries

Which storage is chosen depends on:

- Query pattern
- The `consistent` flag
- Available storages
## Monitoring Query Performance

Use debug mode to understand query performance.

### Interpret Timing Breakdown

- `execute`: time spent in ClickHouse; optimize the query if this is high
- `prepare_query`: query processing time; usually low
- `validate_schema`: validation time; should be minimal

### Interpret Stats

- `final: true`: consider using turbo mode if duplicates are acceptable
- `result_rows: 100000`: reduce the limit or use pagination
- Many results with no grouping: add filtering or grouping

### Review Generated SQL

Debug mode includes the generated SQL. Check for:

- `PREWHERE` vs `WHERE`: PREWHERE is faster for indexed columns
- `FINAL` keyword: indicates deduplication overhead
- Index usage: conditions on indexed columns
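Debug mode is typically enabled by setting `debug: true` on the query request; the exact endpoint path and response fields can vary by Snuba version, so treat this as a sketch:

```
POST /events/snql
{
  "query": "MATCH (events) SELECT count() WHERE project_id = 1 AND timestamp >= toDateTime('2024-01-01T00:00:00') AND timestamp < toDateTime('2024-01-02T00:00:00')",
  "debug": true
}
```

The response should then include the generated SQL, timing breakdown, and stats discussed above.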
## Common Performance Issues

### Issue: Slow Tag Queries

- Problem: filtering tags with `LIKE` or on non-promoted tags forces expensive scans
- Solution: use equality or `IN` on promoted tags

### Issue: High Cardinality Grouping

- Problem: grouping by UUIDs, timestamps, or other high-cardinality values produces huge result sets
- Solution: group by low-cardinality columns and use `LIMIT BY`

### Issue: Missing Required Conditions

- Problem: queries without `project_id` or a timestamp range scan far too much data, or are rejected
- Solution: always include the entity's required conditions

### Issue: Large Time Range

- Problem: querying months or years of data scans too many partitions
- Solution: narrow the time range, or use sampling and larger granularities
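To make these patterns concrete, here is a slow query followed by an optimized rewrite (project ID, tag values, and dates are illustrative). The slow version uses `LIKE` on a tag over a full year:

```
MATCH (events)
SELECT count()
WHERE project_id = 1
  AND timestamp >= toDateTime('2023-01-01T00:00:00')
  AND timestamp < toDateTime('2024-01-01T00:00:00')
  AND tags[environment] LIKE '%prod%'
```

The rewrite filters with equality over a single day:

```
MATCH (events)
SELECT count()
WHERE project_id = 1
  AND timestamp >= toDateTime('2023-12-31T00:00:00')
  AND timestamp < toDateTime('2024-01-01T00:00:00')
  AND tags[environment] = 'prod'
```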
## Query Optimization Checklist

### ✅ Required Conditions

- Include `project_id` condition
- Include timestamp range
- Use `IN array()` for multiple projects
### ✅ Time Ranges
- Use smallest reasonable time range
- Prefer hours/days over weeks/months
- Use appropriate datetime format
### ✅ Filtering
- Filter on indexed columns when possible
- Use equality over LIKE for tags
- Push filters before joins/aggregations
### ✅ Aggregations
- Use appropriate functions for metric types
- Prefer approximate functions (uniq vs uniqExact)
- Group by low-cardinality columns
### ✅ Limits
- Always use reasonable limits (≤1000)
- Use LIMIT BY for per-group limits
- Paginate large result sets
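Pagination can be sketched with `LIMIT` and `OFFSET`; this fetches the third page of 100 results (values are illustrative):

```
MATCH (events)
SELECT event_id, timestamp
WHERE project_id = 1
  AND timestamp >= toDateTime('2024-01-01T00:00:00')
  AND timestamp < toDateTime('2024-01-02T00:00:00')
ORDER BY timestamp DESC
LIMIT 100 OFFSET 200
```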
### ✅ Sampling
- Use SAMPLE for exploratory queries
- Enable turbo mode during development
- Scale results appropriately
## Advanced Techniques
### Concurrent Query Limits

Snuba tracks concurrent queries per project and globally. If you hit these limits, reduce load by:

- Caching results
- Batching queries
- Reducing query frequency
### Rate Limiting

Snuba has rate limits per referrer and organization. Stay within them by:

- Using appropriate polling intervals
- Caching query results
- Batching similar queries
### Allocation Policies

Snuba uses allocation policies to manage resources:

- Bytes Scanned: rejects queries scanning too much data
- Concurrent Queries: limits parallel queries
- Query Duration: may throttle long-running queries
## Testing Query Performance

When optimizing queries:

- Enable debug mode: see timing and generated SQL
- Compare variations: test different approaches
- Check stats: look for red flags (FINAL, high row counts)
- Use sampling first: test on small data samples
- Monitor production: track query performance over time
Query performance can vary based on:
- Time of day (concurrent load)
- Data volume
- ClickHouse cluster health
- Recent cache state
## Next Steps

- SnQL Syntax: learn complete SnQL syntax
- MQL Syntax: learn complete MQL syntax
- Query Overview: understand query fundamentals