Query optimization is crucial for maintaining good performance in Snuba. This guide covers techniques, best practices, and common pitfalls to help you write efficient queries.

Query Processing Pipeline

Understanding how Snuba processes queries helps you write better ones:
  1. Parsing: Query string → AST
  2. Validation: Check entity requirements
  3. Logical Processing: Apply entity-specific transformations
  4. Storage Selection: Choose optimal table/view
  5. Translation: Logical → Physical query
  6. Physical Processing: Apply optimizations
  7. Execution: Run on ClickHouse
Optimizations happen at multiple stages, but writing efficient queries from the start is most important.

Key Optimization Principles

1. Always Include Required Conditions

Most entities require specific conditions to prevent expensive queries.
Required Conditions (typically):
  • project_id condition
  • Timestamp range condition
Queries without these will be rejected or perform very poorly.
MATCH (events)
SELECT count() AS total
WHERE project_id IN array(1, 2, 3)
  AND timestamp >= toDateTime('2024-01-01')
  AND timestamp < toDateTime('2024-01-02')
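If you construct SnQL strings in application code, a small helper can guarantee the required conditions are always present. A minimal Python sketch; `build_events_query` is a hypothetical helper for illustration, not part of Snuba:

```python
from datetime import datetime, timedelta

def build_events_query(project_ids, start, end):
    """Build a SnQL events query that always includes the required
    project_id and timestamp range conditions."""
    ids = ", ".join(str(p) for p in project_ids)
    return (
        "MATCH (events) "
        "SELECT count() AS total "
        f"WHERE project_id IN array({ids}) "
        f"AND timestamp >= toDateTime('{start.isoformat()}') "
        f"AND timestamp < toDateTime('{end.isoformat()}')"
    )

start = datetime(2024, 1, 1)
query = build_events_query([1, 2, 3], start, start + timedelta(days=1))
```

Centralizing query construction this way makes it hard to ship a query that Snuba would reject for missing conditions.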

2. Use Appropriate Time Ranges

Smaller time ranges are more efficient:
  • ✅ Query hours or days for real-time dashboards
  • ✅ Query weeks for trend analysis
  • ⚠️ Query months sparingly
  • ❌ Avoid querying years of data
-- Good: 24 hour window
WHERE timestamp >= toDateTime('2024-01-01T00:00:00')
  AND timestamp < toDateTime('2024-01-02T00:00:00')

-- Better: 1 hour window for real-time
WHERE timestamp >= toDateTime('2024-01-01T12:00:00')
  AND timestamp < toDateTime('2024-01-01T13:00:00')

3. Leverage Indexes

ClickHouse indexes certain columns for faster queries. Typically indexed columns:
  • project_id
  • timestamp
  • Tags (when promoted)
  • event_id, group_id, transaction_id
WHERE project_id = 1
  AND timestamp >= toDateTime('2024-01-01')
  AND tags[environment] = 'production'

4. Use Appropriate Limits

Always specify reasonable limits:
-- Good: Explicit reasonable limit
LIMIT 100

-- Acceptable: Use default (1000)
-- (no LIMIT clause)

-- Bad: Very large limit
LIMIT 100000
Default limit is 1000 rows. For pagination, use LIMIT with OFFSET:
LIMIT 100 OFFSET 200
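When paginating, computing OFFSET from a page number in application code keeps page sizes consistent. A minimal sketch; `page_clause` is an illustrative helper, not a Snuba API:

```python
def page_clause(page, page_size=100):
    """Return a LIMIT/OFFSET clause for a zero-indexed page number."""
    return f"LIMIT {page_size} OFFSET {page * page_size}"

page_clause(2)  # third page of 100 rows
```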

SnQL Optimization Techniques

Tag Queries

Snuba optimizes tag queries in different ways:
WHERE tags[environment] = 'production'
-- Uses hashmap with bloom filter
Best Practices:
  • Use equality (=) when possible
  • Use IN for multiple values: tags[environment] IN array('prod', 'staging')
  • Avoid LIKE on tags unless necessary

Aggregation Optimization

Use Appropriate Aggregation Functions

  • count()
  • sum()
  • min(), max()
  • uniq() - Uses HyperLogLog (approximate)
-- Fast: Approximate unique count
SELECT uniq(user_id) AS unique_users

-- Slow: Exact unique count
SELECT uniqExact(user_id) AS unique_users

Limit Grouping Cardinality

High-cardinality grouping is expensive; prefer a small set of low-cardinality grouping columns:
SELECT count() AS total
BY environment, status_code
-- Few unique combinations
Tips:
  • Group by categorical data (environment, status codes)
  • Avoid grouping by UUIDs, timestamps, or high-cardinality strings
  • Use LIMIT BY to limit results per group

Sampling for Exploration

Use sampling for exploratory queries:
MATCH (events SAMPLE 0.1)  -- Sample 10%
SELECT count() * 10 AS estimated_total  -- Scale up result
WHERE project_id = 1
  AND timestamp >= toDateTime('2024-01-01')
Or use the turbo flag:
{
  "query": "...",
  "turbo": true
}
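When scaling up sampled results in application code, divide the observed count by the sample rate (the `count() * 10` above is the same as dividing by 0.1). A small sketch; the function name is illustrative:

```python
def scale_sampled_count(sampled_count, sample_rate):
    """Estimate the true total from a count taken over a sample.
    For MATCH (events SAMPLE 0.1), pass sample_rate=0.1."""
    if not 0 < sample_rate <= 1:
        raise ValueError("sample_rate must be in (0, 1]")
    return sampled_count / sample_rate
```

Remember the result is an estimate: sampling trades accuracy for speed, which is usually acceptable for exploration.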

Subquery Optimization

Subqueries add overhead. Use them strategically:
MATCH {
  MATCH (transactions)
  SELECT avg(duration) AS avg_d BY transaction
  WHERE project_id = 1
    AND timestamp >= toDateTime('2024-01-01')
    AND timestamp < toDateTime('2024-01-02')
}
SELECT max(avg_d) AS max_avg

Join Optimization

Joins are expensive. Optimize by:
  1. Push filters into subqueries: Filter before joining
  2. Use appropriate join types: Defined by entity relationships
  3. Join on indexed columns: Join keys are typically indexed
MATCH (e: events) -[grouped]-> (gm: groupedmessage)
SELECT count() AS total BY e.group_id
WHERE e.project_id = 1  -- Filter early
  AND gm.project_id = 1
  AND e.timestamp >= toDateTime('2024-01-01')
  AND e.timestamp < toDateTime('2024-01-02')
  AND gm.status = 0  -- Filter on join target

MQL Optimization Techniques

Choose Appropriate Granularity

Granularity affects query performance:
{
  "rollup": {
    "interval": 3600,      // 1 hour intervals
    "granularity": 3600    // Matches storage granularity
  }
}
Guidelines:
  • Match granularity to storage configuration
  • Use larger intervals for longer time ranges
  • 1 minute (60s) for real-time dashboards
  • 1 hour (3600s) for daily views
  • 1 day (86400s) for weekly/monthly views
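The interval guidelines above can be encoded as a simple lookup in application code. The thresholds below are illustrative, not values Snuba mandates:

```python
def pick_interval(range_seconds):
    """Map a query time range to a rollup interval, following the
    guidelines above: minutely for real-time, hourly for daily/weekly
    views, daily for anything longer."""
    if range_seconds <= 6 * 3600:    # up to ~6 hours: real-time
        return 60
    if range_seconds <= 7 * 86400:   # up to a week: hourly buckets
        return 3600
    return 86400                     # longer ranges: daily buckets
```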

Filter Early

Filters in the MQL string are more efficient than post-processing:
sum(transaction.duration){environment:prod, status_code:200}

Use Appropriate Aggregations

Match aggregation to metric type:
  • Counters: sum(), count()
  • Distributions: quantiles(), avg(), sum()
  • Sets: uniq()
  • Gauges: avg(), last(), min(), max()
-- Good: Appropriate for distribution metric
quantiles(0.5, 0.95, 0.99)(d:transactions/duration@millisecond)

-- Bad: Sum is less meaningful for distributions
sum(d:transactions/duration@millisecond)

Limit Grouping Dimensions

Limit the number of group-by tags:
-- Good: 1-2 dimensions
sum(transaction.duration) by environment

-- Acceptable: 2-3 dimensions
sum(transaction.duration) by environment, status_code

-- Avoid: Many dimensions
sum(transaction.duration) by environment, status_code, transaction, user_id

Storage-Level Optimizations

Final Queries

Snuba may apply FINAL to ClickHouse queries to handle deduplication:
SELECT ... FROM events_local FINAL WHERE ...
Impact:
  • Forces merge of duplicate rows
  • Increases query time
  • Applied automatically when necessary
When Applied:
  • Queries on ReplacingMergeTree tables
  • When consistent results are required
Avoid by:
  • Using turbo: true (disables FINAL, may return duplicates)
  • Querying read-replica storages (already deduplicated)

Storage Selection

Snuba may have multiple storages for an entity:
  • Primary: For consistent queries
  • Replica: For high-throughput queries
  • Materialized Views: Pre-aggregated for specific queries
Snuba automatically selects the best storage based on:
  • Query pattern
  • consistent flag
  • Available storages

Monitoring Query Performance

Use debug mode to understand query performance:
{
  "query": "...",
  "debug": true
}

Interpret Timing Breakdown

{
  "timing": {
    "duration_ms": 95,
    "marks_ms": {
      "cache_get": 1,
      "validate_schema": 34,
      "prepare_query": 10,
      "rate_limit": 4,
      "execute": 39,
      "cache_set": 4
    }
  }
}
Key Metrics:
  • execute: Time spent in ClickHouse - optimize query if high
  • prepare_query: Query processing time - usually low
  • validate_schema: Validation time - should be minimal
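When reading a debug response, comparing each mark against the total duration quickly shows which stage dominates. A small sketch, assuming the `timing` object shape shown above (the threshold is illustrative):

```python
def slow_marks(timing, threshold_ratio=0.4):
    """Return the pipeline stages that account for at least
    threshold_ratio of total query time."""
    total = timing["duration_ms"] or 1  # guard against divide-by-zero
    return [
        mark for mark, ms in timing["marks_ms"].items()
        if ms / total >= threshold_ratio
    ]
```

On the example above this flags `execute` (39 of 95 ms), pointing at ClickHouse-side optimization.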

Interpret Stats

{
  "stats": {
    "clickhouse_table": "events_local",
    "final": false,
    "sample": null,
    "result_rows": 100,
    "result_cols": 3
  }
}
Red Flags:
  • final: true - Consider using turbo mode if duplicates are acceptable
  • result_rows: 100000 - Reduce limit or use pagination
  • Many results with no grouping - Add filtering or grouping
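These red flags can also be checked programmatically against the `stats` object. A sketch with illustrative thresholds (Snuba does not define these cutoffs):

```python
def stats_red_flags(stats, row_warning=10_000):
    """Inspect a debug-mode stats dict and return human-readable
    warnings for the red flags listed above."""
    flags = []
    if stats.get("final"):
        flags.append("FINAL applied: consider turbo if duplicates are acceptable")
    if stats.get("result_rows", 0) >= row_warning:
        flags.append("large result set: reduce limit or paginate")
    return flags
```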

Review Generated SQL

Debug mode includes the generated SQL:
{
  "sql": "SELECT count() FROM events_local WHERE project_id = 1 AND timestamp >= ..."
}
Check for:
  • PREWHERE vs WHERE - PREWHERE is faster for indexed columns
  • FINAL keyword - Indicates deduplication overhead
  • Index usage - Conditions on indexed columns

Common Performance Issues

Issue: Slow Tag Queries

WHERE tags[custom_tag] LIKE '%value%'
LIKE forces a scan of every row's tag array, which is slow; prefer equality or IN on tag values

Issue: High Cardinality Grouping

SELECT count() BY event_id, user_id, url
Grouping by event_id, user_id, and url produces far too many unique combinations

Issue: Missing Required Conditions

MATCH (events)
SELECT count()
WHERE type = 'error'
Missing the required project_id and timestamp conditions; this query will be rejected or perform very poorly

Issue: Large Time Range

WHERE timestamp >= toDateTime('2020-01-01')
  AND timestamp < toDateTime('2024-01-01')
Scanning 4 years of data is far too expensive; narrow the time window

Query Optimization Checklist

✅ Required Conditions

  • Include project_id condition
  • Include timestamp range
  • Use IN array() for multiple projects

✅ Time Ranges

  • Use smallest reasonable time range
  • Prefer hours/days over weeks/months
  • Use appropriate datetime format

✅ Filtering

  • Filter on indexed columns when possible
  • Use equality over LIKE for tags
  • Push filters before joins/aggregations

✅ Aggregations

  • Use appropriate functions for metric types
  • Prefer approximate functions (uniq vs uniqExact)
  • Group by low-cardinality columns

✅ Limits

  • Always use reasonable limits (≤1000)
  • Use LIMIT BY for per-group limits
  • Paginate large result sets

✅ Sampling

  • Use SAMPLE for exploratory queries
  • Enable turbo mode during development
  • Scale results appropriately

Advanced Techniques

Concurrent Query Limits

Snuba tracks concurrent queries per project and globally:
{
  "stats": {
    "project_concurrent": 5,
    "global_concurrent": 50
  }
}
High concurrency impacts performance. Consider:
  • Caching results
  • Batching queries
  • Reducing query frequency

Rate Limiting

Snuba has rate limits per referrer and organization:
{
  "stats": {
    "project_rate": 10,
    "global_rate": 100
  }
}
Avoid rate limits by:
  • Using appropriate polling intervals
  • Caching query results
  • Batching similar queries
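Caching query results is often the simplest way to stay under rate limits. A minimal TTL cache sketch; illustrative only, production code would likely use an existing cache library:

```python
import time

class TTLCache:
    """Tiny time-based cache for query results, so repeated
    dashboards don't re-issue the same Snuba query."""
    def __init__(self, ttl_seconds=60):
        self.ttl = ttl_seconds
        self._store = {}

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            del self._store[key]  # expired: evict and miss
            return None
        return value

    def set(self, key, value):
        self._store[key] = (value, time.monotonic() + self.ttl)
```

Pick a TTL to match your polling interval: a dashboard refreshing every minute gains nothing from re-querying more often than that.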

Allocation Policies

Snuba uses allocation policies to manage resources:
  • Bytes Scanned: Rejects queries scanning too much data
  • Concurrent Queries: Limits parallel queries
  • Query Duration: May throttle long-running queries
Write efficient queries to stay within policies.

Testing Query Performance

When optimizing queries:
  1. Enable debug mode: See timing and generated SQL
  2. Compare variations: Test different approaches
  3. Check stats: Look for red flags (FINAL, high row counts)
  4. Use sampling first: Test on small data samples
  5. Monitor production: Track query performance over time
Query performance can vary based on:
  • Time of day (concurrent load)
  • Data volume
  • ClickHouse cluster health
  • Recent cache state
Always test under realistic conditions.

Next Steps

SnQL Syntax

Learn complete SnQL syntax

MQL Syntax

Learn complete MQL syntax

Query Overview

Understand query fundamentals
