Architecture Overview
The data model consists of three core concepts arranged hierarchically: Datasets, Entities, and Storages. The logical model (Datasets and Entities) is what clients see through the query language. The physical model (Storages) maps to actual database tables.
Datasets
A Dataset is a namespace over Snuba data. It provides its own schema and is independent from other datasets in both logical and physical models.
Characteristics
- Isolated - No relationships between different datasets
- Self-contained - Each has its own schema and configuration
- Query scoped - Every query targets exactly one dataset
Examples
- discover - Events and transactions for the Discover product
- outcomes - Billing and quota data
- sessions - Release health metrics
- metrics - Custom metrics aggregations
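The isolation and query-scoping rules above can be sketched in a few lines of Python. This is a minimal illustration, not Snuba's actual API: the `Dataset` class, the `DATASETS` registry, both helper functions, and the entity names listed under each dataset are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Dataset:
    """Hypothetical stand-in for a dataset: a named, self-contained namespace."""
    name: str
    entity_types: frozenset = field(default_factory=frozenset)


# Illustrative registry; the entity names under each dataset are assumptions.
DATASETS = {
    "discover": Dataset("discover", frozenset({"events", "errors", "transactions"})),
    "outcomes": Dataset("outcomes", frozenset({"outcomes"})),
}


def resolve_dataset(dataset_name: str) -> Dataset:
    """Look up the single dataset a query is scoped to."""
    try:
        return DATASETS[dataset_name]
    except KeyError:
        raise ValueError(f"unknown dataset: {dataset_name!r}") from None


def check_query_scope(dataset_name: str, referenced_entities: set) -> None:
    """Reject a query that references entities outside its one dataset."""
    dataset = resolve_dataset(dataset_name)
    outside = referenced_entities - dataset.entity_types
    if outside:
        raise ValueError(f"{sorted(outside)} not in dataset {dataset.name!r}")
```

Because a query names exactly one dataset, a check like `check_query_scope("discover", {"errors"})` passes, while referencing `outcomes` from a `discover` query is rejected.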
Entities and Entity Types
The fundamental building block of the logical data model is the Entity. An entity represents an instance of an abstract concept (like a transaction or an error).
Entity vs Entity Type
- Entity - A single instance (e.g., one error event)
- Entity Type - The class of entities (e.g., all Errors or all Transactions)
In practice, an Entity corresponds to a row in a database table. The Entity Type defines the schema for all such rows.
Entity Schema
Each Entity Type has:
- Column set - Fields with abstract data types
- Validators - Query validation rules
- Processors - Logical query transformations
- Relationships - Joins to other entity types
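To make the four parts of an entity schema concrete, here is a small sketch that models them as plain Python data. The `EntityType` class, the `prepare` method, the validator and processor functions, and the column names and types are hypothetical and chosen for illustration only; they are not taken from Snuba's code.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Query = Dict[str, object]  # hypothetical query shape used only in this sketch


@dataclass
class EntityType:
    name: str
    columns: Dict[str, str]  # column name -> abstract data type
    validators: List[Callable[[Query], None]] = field(default_factory=list)
    processors: List[Callable[[Query], Query]] = field(default_factory=list)
    relationships: Dict[str, str] = field(default_factory=dict)  # alias -> entity type

    def prepare(self, query: Query) -> Query:
        """Run every validator, then apply each logical processor in order."""
        for validate in self.validators:
            validate(query)
        for process in self.processors:
            query = process(query)
        return query


def known_columns_only(columns: Dict[str, str]) -> Callable[[Query], None]:
    """Build a validator that rejects selected columns missing from the schema."""
    def validate(query: Query) -> None:
        unknown = [c for c in query["select"] if c not in columns]
        if unknown:
            raise ValueError(f"unknown columns: {unknown}")
    return validate


def apply_default_limit(query: Query) -> Query:
    """Logical processor: add a limit when the client did not set one."""
    return {**query, "limit": query.get("limit", 1000)}


# Column names and types below are assumptions for the example.
ERROR_COLUMNS = {"event_id": "UUID", "project_id": "UInt", "group_id": "UInt"}
ERRORS = EntityType(
    name="errors",
    columns=ERROR_COLUMNS,
    validators=[known_columns_only(ERROR_COLUMNS)],
    processors=[apply_default_limit],
    relationships={"grouped": "groupedmessage"},
)
```

In this sketch validation runs before processing, so a query that selects an unknown column fails before any transformation is applied.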
Entity Relationships
Entity Types within a Dataset can be related in two ways:
1. Entity Set Relationship (Foreign Keys)
Mimics foreign key relationships for joins between Entity Types:
- Supports one-to-one and one-to-many relationships
- Enables JOIN queries across entities
- Example: Errors can join with GroupedMessage
2. Inheritance Relationship (Subtyping)
Mimics nominal subtyping where entity types share a parent:
- Subtypes inherit schema from parent type
- Parent represents union of all subtypes
- Queries can target parent to query all subtypes
- Example: Events entity is parent of Errors and Transactions
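Returning to the first relationship type, a foreign-key style relationship can be sketched as a pair of key columns plus an inner join over in-memory rows. The `JoinRelationship` class, the `join_rows` helper, and the key columns `group_id` and `id` are assumptions for illustration; the source only states that Errors can join with GroupedMessage.

```python
from dataclasses import dataclass
from typing import Dict, List

Row = Dict[str, object]


@dataclass(frozen=True)
class JoinRelationship:
    """Hypothetical description of a join between two entity types."""
    lhs_entity: str
    lhs_column: str
    rhs_entity: str
    rhs_column: str


# Assumed key columns; the real join keys may differ.
ERRORS_TO_GROUPS = JoinRelationship("errors", "group_id", "groupedmessage", "id")


def join_rows(rel: JoinRelationship, lhs_rows: List[Row], rhs_rows: List[Row]) -> List[Row]:
    """Inner-join two row lists on the relationship's key columns."""
    # Index the right-hand side by its key so each left row is matched in O(1).
    index: Dict[object, List[Row]] = {}
    for row in rhs_rows:
        index.setdefault(row[rel.rhs_column], []).append(row)

    joined: List[Row] = []
    for left in lhs_rows:
        for right in index.get(left[rel.lhs_column], []):
            # Prefix each column with its entity name to avoid collisions.
            merged = {f"{rel.lhs_entity}.{k}": v for k, v in left.items()}
            merged.update({f"{rel.rhs_entity}.{k}": v for k, v in right.items()})
            joined.append(merged)
    return joined
```

With one group and several errors pointing at it, the result contains one joined row per matching error, which is the one-to-many case described above.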
Example: Discover Dataset with Inheritance
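The following sketch models the Discover hierarchy described above: Events is the parent, and Errors and Transactions are subtypes that inherit its columns. The `EntityTypeNode` class and every column name and type are hypothetical; only the parent and subtype names come from the text.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(eq=False)
class EntityTypeNode:
    """Hypothetical entity type that can inherit its schema from a parent."""
    name: str
    own_columns: Dict[str, str]
    parent: Optional["EntityTypeNode"] = None
    subtypes: List["EntityTypeNode"] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Register with the parent so it can resolve to all of its subtypes.
        if self.parent is not None:
            self.parent.subtypes.append(self)

    @property
    def columns(self) -> Dict[str, str]:
        """Inherited columns first, then the columns this type adds."""
        inherited = self.parent.columns if self.parent is not None else {}
        return {**inherited, **self.own_columns}

    def concrete_types(self) -> List["EntityTypeNode"]:
        """A leaf resolves to itself; a parent resolves to all of its leaves."""
        if not self.subtypes:
            return [self]
        return [leaf for sub in self.subtypes for leaf in sub.concrete_types()]


# Column names and types are assumptions for the example.
EVENTS = EntityTypeNode("events", {"event_id": "UUID", "timestamp": "DateTime"})
ERRORS = EntityTypeNode("errors", {"group_id": "UInt"}, parent=EVENTS)
TRANSACTIONS = EntityTypeNode("transactions", {"duration": "UInt"}, parent=EVENTS)
```

A query that targets `EVENTS` would fan out to the types returned by `EVENTS.concrete_types()`, while a query against `ERRORS` sees both the inherited columns and `group_id`.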
Entity Type and Consistency
The Entity Type is the largest unit where Snuba can provide strong consistency guarantees:
- Possible to query with Serializable Consistency
- Does not extend to multi-entity queries
- Subscription queries work on one Entity Type at a time
Storages
Storages represent the physical data model - they map directly to ClickHouse database concepts.
Storage Characteristics
- Physical mapping - Each storage is a ClickHouse table or materialized view
- Entity relationship - Each storage backs exactly one entity type
- Schema definition - Reflects physical database schema
- DDL generation - Provides details to generate CREATE TABLE statements
Storage Types
ReadableStorage
Anything that can be queried:
- ClickHouse tables
- ClickHouse views
- Materialized views
- Provides query processors for optimization
WritableStorage
Anything that can be written to:
- ClickHouse tables (not views)
- Provides table writer for inserts
- Connected to Kafka stream loader
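The split between the two storage types can be sketched as two small classes. This is an illustration under stated assumptions, not Snuba's class hierarchy: the field names, the `build_sql` and `write` methods, the table names, and the choice to make the writable class a subclass of the readable one (on the reasoning that a writable table can also be queried) are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Row = Dict[str, object]


@dataclass
class ReadableStorage:
    """Hypothetical readable storage: a table or view plus query processors."""
    storage_key: str
    table_name: str
    query_processors: List[Callable[[str], str]] = field(default_factory=list)

    def build_sql(self, select_clause: str) -> str:
        """Build a query and let each processor rewrite it in turn."""
        sql = f"SELECT {select_clause} FROM {self.table_name}"
        for process in self.query_processors:
            sql = process(sql)
        return sql


@dataclass
class WritableStorage(ReadableStorage):
    """Hypothetical writable storage: adds a Kafka topic and a row buffer."""
    kafka_topic: str = ""
    pending_rows: List[Row] = field(default_factory=list)

    def write(self, rows: List[Row]) -> None:
        # A real table writer would insert into ClickHouse; this only buffers.
        self.pending_rows.extend(rows)


# Table and topic names are made up for the example.
errors_ro = ReadableStorage("errors_ro", "errors_dist_ro", [lambda sql: sql + " LIMIT 10"])
errors = WritableStorage("errors", "errors_local", kafka_topic="events")
```

Only `errors` exposes `write`; `errors_ro` can be queried but offers no way to insert rows, which mirrors the "tables, not views" restriction on writable storages.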
Storage-Entity Mapping Rules
- Readable Storages: Each entity type must be backed by at least one readable storage (can have multiple for optimization)
- Writable Storage: Each entity type must have exactly one writable storage for data ingestion
- Exclusive Relationship: Each storage backs exclusively one entity type
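The three mapping rules lend themselves to a simple configuration check. The sketch below is one possible way to express them; the `validate_mapping` function and its input shape (entity name mapped to a list of `(storage_key, is_writable)` pairs) are assumptions made for the example, and it treats every listed storage as readable.

```python
from typing import Dict, List, Tuple


def validate_mapping(entity_storages: Dict[str, List[Tuple[str, bool]]]) -> List[str]:
    """Return a list of rule violations; an empty list means the mapping is valid.

    entity_storages maps an entity type name to (storage_key, is_writable) pairs.
    """
    problems: List[str] = []
    owners: Dict[str, str] = {}  # storage_key -> entity type that claimed it first

    for entity, storages in entity_storages.items():
        # Rule 1: at least one readable storage.
        if not storages:
            problems.append(f"{entity}: needs at least one readable storage")

        # Rule 2: exactly one writable storage.
        writable = [key for key, is_writable in storages if is_writable]
        if len(writable) != 1:
            problems.append(
                f"{entity}: expected exactly one writable storage, found {len(writable)}"
            )

        # Rule 3: a storage backs exclusively one entity type.
        for key, _ in storages:
            if key in owners and owners[key] != entity:
                problems.append(f"{key}: backs both {owners[key]} and {entity}")
            owners.setdefault(key, entity)

    return problems
```

For instance, an `errors` entity with one writable storage and one read-only storage passes, while sharing the read-only storage with a second entity type is reported as a violation.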
Data Model Examples
Single Entity Dataset
A simple dataset with one entity type and multiple storages for performance.
Multi-Entity Dataset with Inheritance
Discover dataset demonstrating inheritance:
- Errors has two storages: main table and read-only replicas
- Events entity provides unified view over both
- Queries to Events return union of Errors and Transactions
Dataset with Joins
A dataset supporting joins between entities.
Configuration Example
Storage configuration is defined in YAML.
Key Takeaways
- Logical Abstraction - Datasets and Entities provide a stable client-facing interface
- Physical Flexibility - Storages enable internal optimization without breaking the API
- Multiple Storages - Pre-aggregations and read replicas improve performance
- Consistency Boundaries - The Entity Type defines the maximum consistency scope
Related Topics
- Storage - ClickHouse storage implementation details
- Query Processing - How queries traverse the data model
- Slicing - Multi-tenancy and data partitioning