Snuba’s storage layer is built on ClickHouse, a columnar database optimized for analytical queries on time-series data. This architecture provides the real-time performance and scalability required for Sentry’s query infrastructure.
Why ClickHouse?
ClickHouse was selected as Snuba’s backing storage for several key reasons:
Real-Time Performance: Fast query execution on large datasets with columnar storage
Distributed Architecture: Built-in sharding and replication for horizontal scaling
Flexible Storage Engines: Multiple table engines for different consistency/performance tradeoffs
SQL Interface: Familiar query language with powerful extensions
ClickHouse Cluster Architecture
Snuba manages connections to ClickHouse through cluster abstractions:
```python
# From snuba/clusters/cluster.py
class ClickhouseCluster:
    """
    Provides reader, writer and ClickHouse connections shared by all
    storages on the cluster.
    """

    def __init__(
        self,
        host: str,
        port: int,
        database: str,
        http_port: int,
        storage_sets: Set[str],
        single_node: bool,
        cluster_name: Optional[str] = None,
        distributed_cluster_name: Optional[str] = None,
    ):
        self.__query_node = ClickhouseNode(host, port)
        self.__database = database
        self.__single_node = single_node
        self.__cluster_name = cluster_name
```
Cluster Types
Single Node Cluster
One ClickHouse server instance
Simplified operations and migrations
No distributed tables required
Suitable for development and small deployments
Multi-Node Cluster
Multiple shards and replicas
Requires cluster_name for local tables
Requires distributed_cluster_name for distributed tables
Enables horizontal scaling
A single proxy address is used for all read/write operations, but DDL operations must run on each individual node.
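As a rough illustration of this routing (hypothetical names, not Snuba's actual API), a multi-node cluster can be modeled as a proxy query node plus the full node list: reads and writes go through the proxy, while DDL fans out to every node.

```python
# Simplified sketch of multi-node routing. `Node` and `Cluster` are
# illustrative types, not Snuba's real classes.
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Node:
    host: str
    port: int


class Cluster:
    def __init__(self, query_node: Node, all_nodes: List[Node]):
        self.query_node = query_node  # proxy used for SELECT/INSERT
        self.all_nodes = all_nodes    # every shard/replica in the cluster

    def execute_query(self, sql: str) -> str:
        # Reads and writes target only the proxy node.
        return f"{self.query_node.host}:{self.query_node.port} <- {sql}"

    def execute_ddl(self, sql: str) -> List[str]:
        # DDL (CREATE/ALTER/DROP) must run on each node individually.
        return [f"{n.host}:{n.port} <- {sql}" for n in self.all_nodes]
```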
Cluster Configuration
Clusters are defined in settings.py:
```python
CLUSTERS = [
    {
        "host": "clickhouse-server",
        "port": 9000,
        "http_port": 8123,
        "user": "default",
        "password": "",
        "database": "default",
        "storage_sets": {"events", "transactions"},
        "single_node": False,
        "cluster_name": "cluster_one_sh",
        "distributed_cluster_name": "cluster_one_sh_dist",
    }
]
```
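Given such a configuration, a storage resolves its cluster through the storage sets each cluster declares. A simplified sketch of that lookup (the `CLUSTERS` list and `get_cluster_for` helper here are illustrative, not Snuba's real implementation):

```python
# Hypothetical sketch: each cluster declares the storage sets it hosts,
# and lookup scans the list for a match.
CLUSTERS = [
    {"host": "clickhouse-server", "storage_sets": {"events", "transactions"}},
    {"host": "clickhouse-metrics", "storage_sets": {"metrics"}},
]


def get_cluster_for(storage_set: str) -> dict:
    for cluster in CLUSTERS:
        if storage_set in cluster["storage_sets"]:
            return cluster
    raise KeyError(f"no cluster hosts storage set {storage_set!r}")
```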
Table Structure
Local vs Distributed Tables
In multi-node deployments, Snuba creates two types of tables:
Local Tables
Suffix: _local
Physical storage on each shard
Data written directly here
Contains actual data files
```sql
-- Example: errors_local table
CREATE TABLE IF NOT EXISTS errors_local
(
    project_id UInt64,
    timestamp DateTime,
    event_id UUID,
    platform String,
    message String,
    -- ... more columns
)
ENGINE = ReplicatedReplacingMergeTree('/clickhouse/tables/{shard}/errors_local', '{replica}')
PARTITION BY (retention_days, toMonday(timestamp))
ORDER BY (project_id, toStartOfDay(timestamp), event_id)
SAMPLE BY cityHash64(event_id)
```
Distributed Tables
Suffix: _dist
Virtual table that routes queries to local tables
Queries distributed across all shards
No data stored locally
```sql
-- Example: errors_dist distributed table
CREATE TABLE IF NOT EXISTS errors_dist
AS errors_local
ENGINE = Distributed('cluster_one_sh', default, errors_local, cityHash64(event_id))
```
Single-node deployments create only local tables; a distributed table would add unnecessary overhead.
Table Engines
ClickHouse provides multiple table engines. Snuba primarily uses these:
ReplicatedReplacingMergeTree
Most common engine for Snuba tables:
```sql
ENGINE = ReplicatedReplacingMergeTree(
    '/clickhouse/tables/{shard}/errors_local',  -- ZooKeeper path
    '{replica}',                                -- Replica name
    deleted                                     -- Version column
)
```
Features:
Replication: Automatic data replication via ZooKeeper
Deduplication: Removes duplicate rows with the same sorting key
Replacement: Keeps the row with the highest version column value
Eventual consistency: Deduplication happens during background merges
Use case: Primary storage for events, transactions, errors
ReplacingMergeTree deduplication is not guaranteed until a merge runs. Use the FINAL modifier for guaranteed deduplication at query time (at a performance cost).
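The replacement semantics can be illustrated in pure Python (this simulates what a merge does; ClickHouse itself performs it during background merges):

```python
# Pure-Python illustration of ReplacingMergeTree merge semantics:
# rows sharing the same sorting key collapse to the row with the
# highest value in the version column.
from typing import Dict, List, Tuple


def merge_replacing(
    rows: List[dict], key_cols: Tuple[str, ...], version_col: str
) -> List[dict]:
    latest: Dict[tuple, dict] = {}
    for row in rows:
        key = tuple(row[c] for c in key_cols)
        # On ties, the later-inserted row wins (>=), mirroring replacement.
        if key not in latest or row[version_col] >= latest[key][version_col]:
            latest[key] = row
    return list(latest.values())
```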
ReplicatedMergeTree
Basic replicated storage without deduplication:
```sql
ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/table', '{replica}')
```
Features:
Replication via ZooKeeper
No automatic deduplication
Faster inserts than ReplacingMergeTree
Use case: Data where duplicates are acceptable or handled upstream
ReplicatedAggregatingMergeTree
Pre-aggregates data during merges:
```sql
ENGINE = ReplicatedAggregatingMergeTree('/clickhouse/tables/{shard}/table', '{replica}')
```
Features:
Stores intermediate aggregation states
Combines states during merges
Requires AggregateFunction column types
Use case: Metrics pre-aggregations, materialized views
Materialized Views
Creates derived tables that auto-update:
```sql
CREATE MATERIALIZED VIEW outcomes_hourly_mv
TO outcomes_hourly_local
AS
SELECT
    toStartOfHour(timestamp) AS hour,
    project_id,
    outcome,
    count() AS count
FROM outcomes_raw_local
GROUP BY hour, project_id, outcome
```
Use case: Pre-aggregated views for faster queries
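The rollup performed by such a view can be sketched in pure Python (an illustrative analogue, not how ClickHouse executes it):

```python
# Pure-Python analogue of the hourly outcomes rollup: aggregate raw
# outcome rows into (hour, project_id, outcome) counts.
from collections import Counter
from datetime import datetime


def to_start_of_hour(ts: datetime) -> datetime:
    # Mirrors ClickHouse toStartOfHour(): truncate to the hour.
    return ts.replace(minute=0, second=0, microsecond=0)


def rollup_hourly(rows) -> Counter:
    counts: Counter = Counter()
    for row in rows:
        key = (to_start_of_hour(row["timestamp"]), row["project_id"], row["outcome"])
        counts[key] += 1
    return counts
```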
Storage Schema
Snuba storages are defined in YAML configuration:
```yaml
# From snuba/datasets/configuration/events/storages/errors.yaml
version: v1
kind: writable_storage
name: errors
storage:
  key: errors
  set_key: events  # Cluster mapping
schema:
  columns:
    - { name: project_id, type: UInt, args: { size: 64 } }
    - { name: timestamp, type: DateTime }
    - { name: event_id, type: UUID }
    - { name: tags, type: Nested, args: {
        subcolumns: [
          { name: key, type: String },
          { name: value, type: String }
        ]
      } }
  local_table_name: errors_local
  dist_table_name: errors_dist
  partition_format:
    - retention_days
    - date
```
Key Schema Elements
Partition Key
Defines how data is partitioned on disk:

```yaml
partition_format:
  - retention_days  # Data retention partition
  - date            # Time-based partition
```
Benefits:
Efficient data deletion (drop entire partitions)
Pruning irrelevant partitions during queries
Parallel processing across partitions
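For instance, the partition expression from the errors_local DDL, `(retention_days, toMonday(timestamp))`, can be computed per row as follows (a pure-Python sketch):

```python
# Sketch of the partition key used by errors_local:
# (retention_days, toMonday(timestamp)).
from datetime import date, timedelta


def to_monday(d: date) -> date:
    # Mirrors ClickHouse toMonday(): round a date down to the Monday
    # of its week.
    return d - timedelta(days=d.weekday())


def partition_key(retention_days: int, d: date) -> tuple:
    return (retention_days, to_monday(d))
```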
Order By / Primary Key
Defines sort order and primary key (from DDL):
```sql
ORDER BY (project_id, toStartOfDay(timestamp), event_id)
```
Impact:
Determines data locality on disk
Enables efficient range scans
Critical for query performance
Choose ORDER BY columns based on the most common query patterns; ClickHouse generally performs best when lower-cardinality columns come first in the sorting key.
Data Types
Snuba uses ClickHouse’s rich type system:
Primitive Types
```yaml
columns:
  - { name: project_id, type: UInt, args: { size: 64 } }      # UInt64
  - { name: timestamp, type: DateTime }                       # DateTime
  - { name: event_id, type: UUID }                            # UUID
  - { name: message, type: String }                           # String
  - { name: retention_days, type: UInt, args: { size: 16 } }  # UInt16
```
Complex Types
Nested
Stores arrays of structs:
```yaml
- name: tags
  type: Nested
  args:
    subcolumns:
      - { name: key, type: String }
      - { name: value, type: String }
```
Stored as parallel arrays: tags.key and tags.value
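A sketch of that layout in pure Python (the `flatten_tags` helper is hypothetical; sorting the keys is an assumption made here for determinism, not a documented Snuba guarantee):

```python
# How a Nested column is laid out: a mapping of tag key/value pairs
# becomes two parallel array columns, tags.key and tags.value.
def flatten_tags(tags: dict) -> dict:
    keys = sorted(tags)  # assumption: keys stored in sorted order
    return {
        "tags.key": keys,
        "tags.value": [tags[k] for k in keys],
    }
```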
Array
Homogeneous arrays:
```yaml
- name: sdk_integrations
  type: Array
  args:
    inner_type: { type: String }
```
Nullable
Allows NULL values (use sparingly):
```yaml
- name: user_id
  type: String
  args:
    schema_modifiers: [ nullable ]
```
Nullable columns have performance overhead. Use empty strings or zero values instead when possible.
Connection Management
Snuba maintains connection pools for efficiency:
```python
class ConnectionCache:
    """Caches ClickHouse connection pools per configuration"""

    def get_node_connection(
        self,
        client_settings: ClickhouseClientSettings,
        node: ClickhouseNode,
        user: str,
        password: str,
        database: str,
    ) -> ClickhousePool:
        # Reuse an existing connection pool or create a new one
        cache_key = (node, client_settings, user, password, database)
        if cache_key not in self.__cache:
            self.__cache[cache_key] = ClickhousePool(
                node.host_name,
                node.port,
                user,
                password,
                database,
                client_settings=client_settings,
            )
        return self.__cache[cache_key]
```
Client Settings
Different operations use different ClickHouse settings:
```python
class ClickhouseClientSettings(Enum):
    QUERY = ClickhouseClientSettingsType({}, None)
    INSERT = ClickhouseClientSettingsType({}, None)
    MIGRATE = ClickhouseClientSettingsType(
        {
            "replication_alter_partitions_sync": 2,
            "mutations_sync": 2,
            "alter_sync": 2,
        },
        300000,  # 5 minute timeout
    )
    DELETE = ClickhouseClientSettingsType({"mutations_sync": 1}, None)
    REPLACE = ClickhouseClientSettingsType(
        {
            "max_block_size": 8192,
            "max_memory_usage": 10_000_000_000,
        },
        None,
    )
```
Batch Writing
Snuba writes to ClickHouse in batches for efficiency:
```python
class HTTPBatchWriter(BatchWriter[JSONRow]):
    """Batches JSON rows and sends them via HTTP to ClickHouse"""

    def write(self, rows: Iterable[JSONRow]) -> None:
        # Format rows as JSON or native format,
        # send them to the ClickHouse HTTP endpoint,
        # and handle retries and errors.
        ...
```
Batch Configuration:
Chunk size: Number of rows per HTTP request
Buffer size: Memory buffer for accumulating rows
Max connections: Connection pool size
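The chunking step can be sketched with a small generator (`chunked` is a hypothetical helper for illustration, not Snuba's actual writer code):

```python
# Minimal sketch of batching: accumulate rows into fixed-size chunks so
# each HTTP request to ClickHouse carries many rows instead of one.
from itertools import islice
from typing import Iterable, Iterator, List


def chunked(rows: Iterable[bytes], chunk_size: int) -> Iterator[List[bytes]]:
    it = iter(rows)
    # Pull up to chunk_size rows at a time until the source is exhausted.
    while chunk := list(islice(it, chunk_size)):
        yield chunk
```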
Storage Sets
Storages are grouped into Storage Sets that share a cluster:
```python
# From snuba/clusters/storage_sets.py
class StorageSetKey(Enum):
    EVENTS = "events"
    TRANSACTIONS = "transactions"
    METRICS = "metrics"
    SESSIONS = "sessions"
    OUTCOMES = "outcomes"
```
Purpose:
Co-locate related storages on same cluster
Share connection pools and resources
Enable cross-storage optimizations
Query Optimization
PREWHERE clause: Filter before reading all columns
Index usage: Leverage ORDER BY for range scans
Partition pruning: Filter by partition key columns
Sampling: Use the SAMPLE clause for approximate queries
Write Optimization
Batch inserts: Write thousands of rows per request
Partition alignment: Respect partition boundaries
Deduplication: Rely on ReplacingMergeTree rather than application-level logic
Storage Optimization
Compression: ClickHouse compresses data automatically
Materialized views: Pre-aggregate common queries
TTL policies: Auto-delete old partitions
Column selection: Only include necessary columns in queries
Because storage is columnar, queries read only the columns they reference, so adding columns to a schema carries little cost for queries that do not select them.