Storage configurations define how data is physically stored in ClickHouse. Snuba supports readable storages (read-only tables or views) and writable storages (tables populated by Kafka stream consumers).

Overview

Storage configurations specify:
  • Schema: Column definitions and table names
  • Storage keys: Unique identifiers and cluster assignments
  • Readiness state: Deployment environment availability
  • Query processors: Storage-level query optimizations
  • Allocation policies: Resource limits and rate limiting
  • Stream loader (writable only): Kafka consumer configuration
  • Deletion settings: Configuration for data deletion operations

Storage Types

Readable Storage

Read-only storage backed by ClickHouse tables or views. Used for immutable or replicated data.

Writable Storage

Read-write storage with Kafka stream consumers. Supports real-time data ingestion and updates.

Readable Storage Schema

version (string, required)
Schema version. Must be v1.

kind (string, required)
Component type. Must be readable_storage.

name (string, required)
Unique name for the storage.

storage (object, required)
Storage identification:
  • key: Unique storage identifier
  • set_key: Storage set/cluster identifier

readiness_state (string, required)
Deployment readiness: limited, partial, complete, experimental, or deprecate.

schema (object, required)
Table schema definition:
  • columns: Array of column definitions
  • local_table_name: Local table name in ClickHouse
  • dist_table_name: Distributed table name
  • partition_format: Optional partition format
  • not_deleted_mandatory_condition: Column used for soft deletion

query_processors (array)
Query processor configurations for storage-level optimizations.

allocation_policies (array)
Resource allocation and rate limiting policies.

mandatory_condition_checkers (array)
Security checks that enforce required query conditions.

deletion_settings (object)
Configuration for deletion operations.

required_time_column (string)
Name of the required time column for time-based queries.
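
For example, a storage whose queries must always be scoped by time can declare its time column as follows (a minimal sketch; the column must also be defined in the schema):

Required Time Column
required_time_column: timestamp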

Writable Storage Schema

Writable storage includes all readable storage fields plus:
kind (string, required)
Must be writable_storage.

stream_loader (object, required)
Kafka consumer configuration:
  • processor: Message processor class name
  • default_topic: Primary Kafka topic
  • commit_log_topic: Commit log topic
  • subscription_scheduled_topic: Subscription scheduling topic
  • subscription_result_topic: Subscription results topic
  • subscription_scheduler_mode: Scheduler mode (partition or global)
  • replacement_topic: Topic for replacements/deletions
  • dlq_topic: Dead letter queue topic
  • pre_filter: Optional message filter

replacer_processor (object)
Configuration for handling replacements (updates and deletions).

writer_options (object)
ClickHouse writer-specific options.
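
For example, writer_options can pass insert settings through to the ClickHouse writer (a hypothetical sketch; the settings shown are standard ClickHouse settings, but which ones apply depends on your deployment and ClickHouse version):

Writer Options
writer_options:
  # Hypothetical example: ClickHouse settings forwarded with each insert
  insert_distributed_sync: 1
  load_balancing: in_order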

Readable Storage Example

errors_ro.yaml
version: v1
kind: readable_storage
name: errors_ro

storage:
  key: errors_ro
  set_key: events_ro

readiness_state: complete

schema:
  columns:
    - name: project_id
      type: UInt
      args:
        size: 64
    - name: timestamp
      type: DateTime
    - name: event_id
      type: UUID
    - name: message
      type: String
    - name: environment
      type: String
      args:
        schema_modifiers: [nullable]
    - name: tags
      type: Nested
      args:
        subcolumns:
          - name: key
            type: String
          - name: value
            type: String
    - name: deleted
      type: UInt
      args:
        size: 8
        
  local_table_name: errors_local
  dist_table_name: errors_dist_ro
  not_deleted_mandatory_condition: deleted

allocation_policies:
  - name: ReferrerGuardRailPolicy
    args:
      required_tenant_types:
        - referrer
      default_config_overrides:
        is_enforced: 1
        
  - name: ConcurrentRateLimitAllocationPolicy
    args:
      required_tenant_types:
        - organization_id
        - referrer
        - project_id

query_processors:
  - processor: UUIDColumnProcessor
    args:
      columns: [event_id, trace_id]
      
  - processor: PrewhereProcessor
    args:
      prewhere_candidates:
        - event_id
        - project_id
        - timestamp

mandatory_condition_checkers:
  - condition: ProjectIdEnforcer

Writable Storage Example

errors.yaml
version: v1
kind: writable_storage
name: errors

storage:
  key: errors
  set_key: events

readiness_state: complete

schema:
  columns:
    - name: project_id
      type: UInt
      args:
        size: 64
    - name: timestamp
      type: DateTime
    - name: event_id
      type: UUID
    - name: message
      type: String
    - name: deleted
      type: UInt
      args:
        size: 8
        
  local_table_name: errors_local
  dist_table_name: errors_dist
  
  partition_format:
    - retention_days
    - date
    
  not_deleted_mandatory_condition: deleted

stream_loader:
  processor: ErrorsProcessor
  default_topic: events
  commit_log_topic: snuba-commit-log
  replacement_topic: event-replacements
  subscription_scheduler_mode: partition
  subscription_scheduled_topic: scheduled-subscriptions-events
  subscription_result_topic: events-subscription-results
  subscription_synchronization_timestamp: received_p99
  subscription_delay_seconds: 30

query_processors:
  - processor: UniqInSelectAndHavingProcessor
  
  - processor: MappingColumnPromoter
    args:
      mapping_specs:
        tags:
          environment: environment
          sentry:release: release
          
  - processor: UUIDColumnProcessor
    args:
      columns: [event_id, primary_hash, trace_id]
      
  - processor: MappingOptimizer
    args:
      column_name: tags
      hash_map_name: _tags_hash_map
      killswitch: events_tags_hash_map_enabled
      
  - processor: PrewhereProcessor
    args:
      omit_if_final:
        - environment
        - release
      prewhere_candidates:
        - event_id
        - project_id
        - timestamp

allocation_policies:
  - name: ReferrerGuardRailPolicy
    args:
      required_tenant_types:
        - referrer
      default_config_overrides:
        is_enforced: 1
        
  - name: BytesScannedRejectingPolicy
    args:
      required_tenant_types:
        - organization_id
        - project_id
        - referrer
      default_config_overrides:
        is_enforced: 1

mandatory_condition_checkers:
  - condition: ProjectIdEnforcer

replacer_processor:
  processor: ErrorsReplacer
  args:
    required_columns:
      - event_id
      - project_id
      - group_id
      - timestamp
    state_name: errors
    storage_key_str: errors

Schema Configuration

Column Definitions

Columns must specify name, type, and optional arguments:
columns:
  - name: project_id
    type: UInt
    args:
      size: 64
      
  - name: timestamp
    type: DateTime
    
  - name: event_id
    type: UUID

Table Names

local_table_name (string, required)
Name of the local table on each ClickHouse node. For single-node deployments, this is the only table.

dist_table_name (string, required)
Name of the distributed table that queries should use. In a distributed ClickHouse deployment, this table routes queries to the local tables on each node.
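
For example, taken from the writable errors storage above:

Table Names
schema:
  local_table_name: errors_local
  dist_table_name: errors_dist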

Partition Format

Define how data is partitioned:
Partition Configuration
schema:
  partition_format:
    - retention_days  # Partition by retention policy
    - date            # Partition by date

Soft Deletion

Configure soft deletion with a flag column:
Soft Deletion
schema:
  columns:
    - name: deleted
      type: UInt
      args:
        size: 8
        
  not_deleted_mandatory_condition: deleted
This ensures all queries automatically filter out deleted rows.

Readiness States

Control where storages are available:
  • limited: Only available in CI and local development. Use for new storages under development.
  • partial: Available in staging environments. Use for testing before production.
  • complete: Fully available in all environments, including production.
  • experimental: Available but marked as experimental. May have stability issues.
  • deprecate: Marked for deprecation. Will be removed in a future release.
Readiness State
readiness_state: complete  # Production-ready

Stream Loader Configuration

For writable storages, configure Kafka consumers:
Stream Loader
stream_loader:
  processor: ErrorsProcessor
  default_topic: events
  commit_log_topic: snuba-commit-log
  replacement_topic: event-replacements
  
  # Subscription configuration
  subscription_scheduler_mode: partition
  subscription_scheduled_topic: scheduled-subscriptions-events
  subscription_result_topic: events-subscription-results
  subscription_synchronization_timestamp: received_p99
  subscription_delay_seconds: 30
  
  # Dead letter queue
  dlq_topic: snuba-dead-letter-queue
  
  # Pre-filtering
  pre_filter:
    type: KafkaHeaderSelectFilter
    args:
      header_key: event_type
      header_value: error

Processor Types

Common message processors:
  • ErrorsProcessor - Processes error events
  • TransactionsProcessor - Processes transaction events
  • OutcomesProcessor - Processes outcomes data
  • MetricsProcessor - Processes metrics data
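
For example, a transactions storage would reference its processor in the stream loader (a minimal sketch; the topic name is illustrative):

Stream Loader Processor
stream_loader:
  processor: TransactionsProcessor
  default_topic: transactions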

Subscription Scheduler Modes

  • partition - Schedule per Kafka partition
  • global - Global scheduling across all partitions

Synchronization Timestamps

  • orig_message_ts - Use original message timestamp
  • received_p99 - Use 99th percentile of received time
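
The stream loader example above uses partition scheduling synchronized on received_p99. A minimal sketch of the alternative values:

Subscription Scheduling
stream_loader:
  subscription_scheduler_mode: global
  subscription_synchronization_timestamp: orig_message_ts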

Query Processors

Storage-level query processors optimize queries:
query_processors:
  - processor: UUIDColumnProcessor
    args:
      columns:
        - event_id
        - trace_id
        - replay_id

Common Query Processors

  • UniqInSelectAndHavingProcessor: Optimizes uniq() calls in SELECT and HAVING clauses.
  • MappingColumnPromoter: Promotes frequently used mapping columns to first-class columns.
  • UUIDColumnProcessor: Optimizes queries on UUID columns.
  • PrewhereProcessor: Applies ClickHouse's PREWHERE optimization for faster queries.
  • MappingOptimizer: Optimizes queries on nested/mapping columns using hash maps.

Allocation Policies

Control resource allocation and rate limiting:
Allocation Policies
allocation_policies:
  - name: ReferrerGuardRailPolicy
    args:
      required_tenant_types:
        - referrer
      default_config_overrides:
        is_enforced: 1
        
  - name: ConcurrentRateLimitAllocationPolicy
    args:
      required_tenant_types:
        - organization_id
        - referrer
        - project_id
        
  - name: BytesScannedRejectingPolicy
    args:
      required_tenant_types:
        - organization_id
        - project_id
        - referrer
      default_config_overrides:
        is_enforced: 1
        
  - name: BytesScannedWindowAllocationPolicy
    args:
      required_tenant_types:
        - organization_id
        - referrer
      default_config_overrides:
        throttled_thread_number: 1
        org_limit_bytes_scanned: 10000000

Policy Types

  • ReferrerGuardRailPolicy: Enforces that queries include a referrer for tracking and debugging.
  • ConcurrentRateLimitAllocationPolicy: Limits concurrent queries per tenant (organization, project, referrer).
  • BytesScannedRejectingPolicy: Rejects queries that would scan too many bytes.
  • BytesScannedWindowAllocationPolicy: Throttles queries based on bytes scanned within a time window.
  • CrossOrgQueryAllocationPolicy: Applies special limits to cross-organization queries.

Mandatory Condition Checkers

Enforce security requirements:
Mandatory Conditions
mandatory_condition_checkers:
  - condition: ProjectIdEnforcer
  - condition: OrgIdEnforcer
These ensure queries always filter by required columns like project_id to prevent data leakage.

Deletion Settings

Configure deletion operations:
Deletion Configuration
deletion_settings:
  is_enabled: 1
  tables:
    - errors_local
  allowed_columns:
    - project_id
    - event_id
    - timestamp
  max_rows_to_delete: 10000
  bulk_delete_only: false
  partition_column: timestamp

Replacer Processor

Handle updates and deletions:
Replacer Configuration
replacer_processor:
  processor: ErrorsReplacer
  args:
    required_columns:
      - event_id
      - primary_hash
      - project_id
      - group_id
      - timestamp
      - deleted
    tag_column_map:
      tags:
        environment: environment
        sentry:release: release
    state_name: errors
    storage_key_str: errors

Best Practices

Security First

Always configure mandatory condition checkers to enforce multi-tenancy.

Resource Limits

Set up allocation policies to prevent resource exhaustion.

Optimize Queries

Use PREWHERE and mapping optimizers for better performance.

Partition Wisely

Choose partition formats that align with query patterns.

Related Pages

  • Entities: Connect storages to entities
  • Datasets: Organize entities in datasets
  • Overview: Configuration system overview
