System Architecture
Snuba's architecture consists of three main subsystems that work together to provide a complete data pipeline:

- Storage Layer - ClickHouse-backed storage with flexible table engines and replication
- Ingestion Pipeline - Kafka-based event ingestion with message processing and batching
- Query Engine - Multi-stage query processing with optimization and execution
Core Components
Storage
ClickHouse was chosen as the backing store because it provides:
- Real-time performance - Fast query execution for time-series data
- Distributed architecture - Horizontal scalability across multiple nodes
- Flexible storage engines - Multiple table engines for different use cases
- Replication - Data redundancy and high availability
- Materialized views - Pre-aggregated data for optimized queries
Ingestion
Snuba does not provide a direct API endpoint for inserting rows (except in debug mode). Instead, data is loaded through a Kafka-based pipeline:
- Kafka Topics - Events arrive on topics such as events, transactions, and outcomes
- Consumers - Process messages from Kafka in batches
- Message Processors - Transform Kafka messages into ClickHouse rows
- Batch Writers - Write batched rows to ClickHouse tables
Each ClickHouse table is written to by exactly one consumer, enabling consistency guarantees through proper table engine selection.
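The processor-plus-writer flow above can be sketched in a few lines of Python. This is a minimal illustration, not Snuba's actual internals: the `ClickhouseRow` fields, `process_message`, and `BatchWriter` are all hypothetical stand-ins for the real processor and writer classes.

```python
from dataclasses import dataclass

# Hypothetical row type; Snuba's real processors emit richer structures.
@dataclass
class ClickhouseRow:
    event_id: str
    project_id: int
    message: str

def process_message(raw: dict) -> ClickhouseRow:
    """Transform one Kafka message payload into a ClickHouse row."""
    return ClickhouseRow(
        event_id=raw["event_id"],
        project_id=raw["project_id"],
        message=raw.get("message", ""),
    )

class BatchWriter:
    """Accumulate rows and flush them as one batch (one bulk INSERT)."""
    def __init__(self, batch_size: int = 2):
        self.batch_size = batch_size
        self.pending: list[ClickhouseRow] = []
        self.flushed: list[list[ClickhouseRow]] = []

    def write(self, row: ClickhouseRow) -> None:
        self.pending.append(row)
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self) -> None:
        if self.pending:
            # A real writer would issue a single bulk INSERT to ClickHouse here.
            self.flushed.append(self.pending)
            self.pending = []

writer = BatchWriter(batch_size=2)
for payload in [
    {"event_id": "a", "project_id": 1, "message": "boom"},
    {"event_id": "b", "project_id": 1},
]:
    writer.write(process_message(payload))
```

Batching is the key design point: writing many rows per INSERT is what makes ClickHouse ingestion efficient at scale.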
Query Processing
Queries are expressed in SnQL (Snuba Query Language) or MQL (Metrics Query Language) and sent as HTTP POST requests. The query engine:
- Parses the query into an Abstract Syntax Tree (AST)
- Validates against the data model schema
- Applies logical processors for product-level transformations
- Selects storage from available options
- Translates to physical ClickHouse query
- Applies physical processors for query optimization
- Executes on ClickHouse and returns results
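The staged flow above can be sketched as a chain of small functions. Every function name, the stand-in AST dict, the chosen storage name, and the appended SETTINGS clause are illustrative assumptions, not Snuba's real internals; only the order of the stages comes from the text.

```python
# Toy sketch of the multi-stage query pipeline; names are illustrative.

def parse(snql: str) -> dict:
    # Real Snuba builds a full AST; a dict stands in here.
    return {"ast": snql}

def validate(ast: dict) -> dict:
    assert "ast" in ast  # stands in for data-model schema validation
    return ast

def apply_logical_processors(ast: dict) -> dict:
    ast["logical_processed"] = True  # product-level transformations
    return ast

def select_storage(ast: dict) -> str:
    return "errors_local"  # stands in for storage selection logic

def translate(ast: dict, storage: str) -> str:
    # Logical query -> physical ClickHouse SQL.
    return f"SELECT count() FROM {storage}"

def apply_physical_processors(sql: str) -> str:
    # Stands in for ClickHouse-level query optimization.
    return sql + " SETTINGS max_threads = 8"

def run_query(snql: str) -> str:
    ast = apply_logical_processors(validate(parse(snql)))
    storage = select_storage(ast)
    return apply_physical_processors(translate(ast, storage))

sql = run_query("MATCH (events) SELECT count()")
```

The split between logical and physical processors is what lets product concerns (e.g. project permissions) stay independent from ClickHouse-specific optimizations.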
Query Processing Pipeline
Streaming Queries (Subscriptions)
Beyond point-in-time queries, Snuba supports streaming queries through the Subscription Engine:
- Client registers a subscription query via an HTTP endpoint
- Subscription Consumer monitors relevant Kafka topics
- Query runs periodically as new data arrives
- Results are produced to subscription result topics
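A minimal sketch of that loop, assuming a hypothetical `SubscriptionEngine` class: in real Snuba the periodic results go to Kafka result topics, not an in-memory list, and scheduling is driven by commit-log offsets rather than a direct method call.

```python
import itertools

class SubscriptionEngine:
    """Toy subscription engine: re-runs registered queries as data arrives."""
    def __init__(self):
        self.subscriptions: dict[int, str] = {}
        self.results: list[tuple[int, int, str]] = []
        self._ids = itertools.count(1)

    def register(self, query: str) -> int:
        # Stands in for the HTTP registration endpoint.
        sub_id = next(self._ids)
        self.subscriptions[sub_id] = query
        return sub_id

    def on_new_data(self, offset: int) -> None:
        # Each tick, evaluate every registered query over data up to `offset`;
        # real Snuba produces each result to a subscription result topic.
        for sub_id, query in self.subscriptions.items():
            self.results.append((sub_id, offset, f"ran: {query}"))

engine = SubscriptionEngine()
sub = engine.register("MATCH (events) SELECT count() WHERE project_id = 1")
engine.on_new_data(offset=100)
engine.on_new_data(offset=200)
```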
Data Consistency Models
Snuba supports multiple consistency models:

Eventual Consistency (Default)
- Queries can hit any ClickHouse replica
- No guarantee replicas are synchronized
- Optimal for high-throughput queries
Strong Consistency (Optional)
- Forces ClickHouse to reach consistency (FINAL keyword)
- Queries are directed to the specific replica that the consumer writes to
- Achieves sequential consistency
- Higher latency and resource usage
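The tradeoff can be made concrete with a small sketch. FINAL is real ClickHouse syntax (it forces rows to be fully merged at read time); the `build_query` helper and table name are illustrative, not Snuba's actual query builder.

```python
def build_query(table: str, consistent: bool) -> str:
    # FINAL makes ClickHouse merge data parts at read time, trading
    # latency and resources for sequential consistency.
    suffix = " FINAL" if consistent else ""
    return f"SELECT count() FROM {table}{suffix}"

eventual = build_query("errors_local", consistent=False)  # any replica, fast
strong = build_query("errors_local", consistent=True)     # merged view, slower
```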
Deployment Architecture
In a typical Sentry deployment, Snuba processes multiple data pipelines:

Errors and Transactions Pipeline
- Input: events Kafka topic (shared by errors and transactions)
- Consumers: Errors Consumer and Transactions Consumer
- Tables: errors_local/errors_dist and transactions_local/transactions_dist
- Commit Log: snuba-commit-log for synchronization
- Subscriptions: Separate subscription consumers for alerts
- Replacements: Error mutations (merge/unmerge) via the replacements topic
Sessions Pipeline
- Input: sessions Kafka topic
- Consumer: Sessions Consumer
- Purpose: Powers Release Health features
Outcomes Pipeline
- Input: outcomes Kafka topic
- Consumer: Outcomes Consumer
- Purpose: Billing and stats data
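The pipeline wiring above can be expressed as plain data. The topic, consumer, and table names come from the text; the dictionary structure itself is only an illustration of how the pieces relate, not a Snuba configuration format.

```python
# Topic/consumer/table names from the text; the structure is illustrative.
PIPELINES = {
    "errors_and_transactions": {
        "input_topic": "events",
        "consumers": ["errors", "transactions"],
        "tables": ["errors_local", "transactions_local"],
        "commit_log": "snuba-commit-log",
        "replacements_topic": "replacements",
    },
    "sessions": {
        "input_topic": "sessions",
        "consumers": ["sessions"],  # powers Release Health
    },
    "outcomes": {
        "input_topic": "outcomes",
        "consumers": ["outcomes"],  # billing and stats data
    },
}
```

Note the one-consumer-per-table rule from the ingestion section: each table appears under exactly one pipeline, which is what enables the consistency guarantees.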
Key Design Principles
- Horizontal scaling - Independent datasets and storage sets
- Batch processing - Efficient Kafka and ClickHouse operations
- Query optimization - Multi-stage processing pipeline
- Consistency tradeoffs - Configurable consistency models
- Multi-tenancy - Dataset slicing for isolation
Component Location in Source
- Storage: snuba/clusters/cluster.py, snuba/datasets/storage.py
- Ingestion: snuba/consumers/, snuba/processor.py
- Query: snuba/web/query.py, snuba/pipeline/query_pipeline.py
- Datasets: snuba/datasets/configuration/
Next Steps
- Data Model - Learn about Datasets, Entities, and Storages
- Query Processing - Deep dive into the query pipeline
- Ingestion - Understand Kafka consumers and message processing
- Storage - Explore ClickHouse storage implementation