Entities represent logical data models in Snuba. They define schemas, connect to storage backends, and configure query processing, validation, and translation rules.

Overview

An entity configuration defines:
  • Schema: Column definitions and data types
  • Storage connections: Which physical storages back this entity
  • Storage selector: Logic to choose between multiple storages
  • Query processors: Query transformations and optimizations
  • Validators: Query validation and security rules
  • Translation mappers: Column and function name mappings
  • Subscription rules: Real-time subscription configuration

Schema Reference

  • version (string, required): Schema version. Must be v1.
  • kind (string, required): Component type. Must be entity.
  • name (string, required): Unique name for the entity.
  • schema (array, required): Array of column definitions. Each column specifies name, type, and optional args.
  • storages (array, required): Array of storage connections with optional translation mappers.
  • storage_selector (object, required): Configuration for the storage selector class.
  • query_processors (array, required): Array of query processor configurations.
  • validators (array, required): Array of validator configurations.
  • required_time_column (string, required): Name of the time column used for time-based queries.
  • translation_mappers (object, optional): Translation rules for columns, functions, and subscriptables.
  • validate_data_model (string, optional): Validation mode: do_nothing, warn, or error.
  • partition_key_column_name (string, optional): Column name for partition-based query routing.
  • subscription_processors (array, optional): Array of subscription processor configurations.
  • subscription_validators (array, optional): Array of subscription validator configurations.
  • join_relationships (object, optional): Definitions for entity joins in multi-entity queries.

Basic Example

A minimal entity configuration:
entity.yaml
version: v1
kind: entity
name: events

schema:
  - name: project_id
    type: UInt
    args:
      size: 64
  - name: timestamp
    type: DateTime
  - name: event_id
    type: UUID
  - name: message
    type: String

required_time_column: timestamp

storages:
  - storage: errors
    is_writable: true

storage_selector:
  selector: SimpleQueryStorageSelector
  args:
    storage: errors

query_processors: []

validators:
  - validator: EntityRequiredColumnValidator
    args:
      required_filter_columns:
        - project_id

Schema Definition

Column Types

Define columns with name, type, and optional arguments:
schema:
  - name: project_id
    type: UInt
    args:
      size: 64
      
  - name: timestamp
    type: DateTime
    
  - name: event_id
    type: UUID
    
  - name: message
    type: String

Schema Modifiers

Available schema modifiers:
  • nullable - Column can contain NULL values
  • readonly - Computed/derived column, not directly writable
  • lowcardinality - Optimized for columns with few distinct values
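
Modifiers are attached through a column's args. The fragment below is a sketch that follows the schema_modifiers form used for the environment column in the complete example on this page. The group_first_seen column is hypothetical, and which modifiers a given column type accepts is not confirmed here.

```yaml
schema:
  # Column that may contain NULL values
  - name: environment
    type: String
    args:
      schema_modifiers: [nullable]

  # Column expected to hold few distinct values
  - name: platform
    type: String
    args:
      schema_modifiers: [lowcardinality]

  # Hypothetical derived column, not directly writable
  - name: group_first_seen
    type: DateTime
    args:
      schema_modifiers: [readonly]
```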

Storage Connections

Entities can connect to multiple storages:
storages:
  - storage: errors_ro
    is_writable: false
    translation_mappers:
      columns:
        - mapper: ColumnToColumn
          args:
            from_table_name: null
            from_col_name: username
            to_table_name: null
            to_col_name: user_name
      subscriptables:
        - mapper: SubscriptableMapper
          args:
            from_column_name: tags
            to_nested_col_name: tags
            value_subcolumn_name: value
            nullable: false
            
  - storage: errors
    is_writable: true
    translation_mappers:
      columns:
        - mapper: ColumnToColumn
          args:
            from_table_name: null
            from_col_name: username
            to_table_name: null
            to_col_name: user_name

Storage Selection

The storage selector determines which storage to query:
storage_selector:
  selector: SimpleQueryStorageSelector
  args:
    storage: errors

Translation Mappers

Translation mappers rewrite entity-level columns, functions, and subscript expressions into their storage-level equivalents:

Column Mappers

translation_mappers:
  columns:
    - mapper: ColumnToColumn
      args:
        from_table_name: null
        from_col_name: transaction
        to_table_name: null
        to_col_name: transaction_name

Function Mappers

translation_mappers:
  functions:
    - mapper: FunctionMapper
      args:
        from_name: count_unique
        to_name: uniq

Subscriptable Mappers

Handle subscript operations like tags[key]:
translation_mappers:
  subscriptables:
    - mapper: SubscriptableMapper
      args:
        from_column_name: tags
        to_nested_col_name: tags
        value_subcolumn_name: value
        nullable: false
        
    - mapper: SubscriptableMapper
      args:
        from_column_name: contexts
        to_nested_col_name: contexts
        value_subcolumn_name: value
        nullable: false

Query Processors

Query processors transform queries before execution:
query_processors:
  - processor: TimeSeriesProcessor
    args:
      time_group_columns:
        time: timestamp
        rtime: received
      time_parse_columns:
        - timestamp
        - received
        
  - processor: BasicFunctionsProcessor
  
  - processor: HandledFunctionsProcessor
    args:
      column: exception_stacks.mechanism_handled

Common Query Processors

TimeSeriesProcessor: Handles time-based queries, granularity, and time grouping.
- processor: TimeSeriesProcessor
  args:
    time_group_columns:
      time: timestamp
      rtime: received
    time_parse_columns:
      - timestamp
      - received

BasicFunctionsProcessor: Translates basic function calls to their ClickHouse equivalents.
- processor: BasicFunctionsProcessor

HandledFunctionsProcessor: Processes function calls that check whether an error was handled or unhandled.
- processor: HandledFunctionsProcessor
  args:
    column: exception_stacks.mechanism_handled

Validators

Validators enforce query constraints and security rules:
validators:
  - validator: EntityRequiredColumnValidator
    args:
      required_filter_columns:
        - project_id
        
  - validator: TagConditionValidator
    args: {}
    
  - validator: DatetimeConditionValidator
    args: {}

Common Validators

EntityRequiredColumnValidator: Ensures that the query's WHERE clause filters on every required column.
- validator: EntityRequiredColumnValidator
  args:
    required_filter_columns:
      - project_id
      - organization_id

TagConditionValidator: Validates tag filter conditions.
- validator: TagConditionValidator
  args: {}

DatetimeConditionValidator: Ensures datetime ranges are reasonable.
- validator: DatetimeConditionValidator
  args: {}

Subscription Configuration

Configure real-time query subscriptions:
subscription_validators:
  - validator: AggregationValidator
    args:
      max_allowed_aggregations: 10
      disallowed_aggregations:
        - having
        - orderby
      required_time_column: timestamp
      allows_group_by_without_condition: true
See Subscription Configuration for details.

Join Relationships

Define relationships for multi-entity queries:
join_relationships:
  grouped:
    rhs_entity: groupedmessage
    join_type: inner
    columns:
      - [project_id, project_id]
      - [group_id, id]
      
  assigned:
    rhs_entity: groupassignee
    join_type: inner
    columns:
      - [project_id, project_id]
      - [group_id, group_id]
      
  attributes:
    rhs_entity: group_attributes
    join_type: left
    columns:
      - [project_id, project_id]
      - [group_id, group_id]

Join Types

  • inner - Inner join (only matching rows)
  • left - Left outer join (all rows from left side)
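
As a reading aid, the fragment below annotates the grouped relationship from the example above. It assumes each entry under columns is a pair of the form [column on this entity, column on rhs_entity]. The page does not state the ordering explicitly, so treat the comments as an interpretation rather than a specification.

```yaml
join_relationships:
  # "grouped" is the name of the relationship
  grouped:
    rhs_entity: groupedmessage     # entity on the right-hand side of the join
    join_type: inner               # keep only rows that match on both sides
    columns:
      - [project_id, project_id]   # this entity's project_id = groupedmessage project_id
      - [group_id, id]             # this entity's group_id = groupedmessage id
```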

Complete Example

A full entity configuration for events:
events.yaml
version: v1
kind: entity
name: events

schema:
  - name: project_id
    type: UInt
    args:
      size: 64
  - name: timestamp
    type: DateTime
  - name: event_id
    type: UUID
  - name: message
    type: String
  - name: platform
    type: String
  - name: environment
    type: String
    args:
      schema_modifiers: [nullable]
  - name: tags
    type: Nested
    args:
      subcolumns:
        - name: key
          type: String
        - name: value
          type: String

required_time_column: timestamp

storages:
  - storage: errors
    is_writable: true
    translation_mappers:
      columns:
        - mapper: ColumnToMapping
          args:
            from_col_name: release
            to_nested_col_name: tags
            to_nested_mapping_key: sentry:release
            nullable: false
      subscriptables:
        - mapper: SubscriptableMapper
          args:
            from_column_name: tags
            to_nested_col_name: tags
            value_subcolumn_name: value
            nullable: false

storage_selector:
  selector: ErrorsQueryStorageSelector

query_processors:
  - processor: TimeSeriesProcessor
    args:
      time_group_columns:
        time: timestamp
      time_parse_columns:
        - timestamp
  - processor: BasicFunctionsProcessor

validators:
  - validator: EntityRequiredColumnValidator
    args:
      required_filter_columns:
        - project_id
  - validator: DatetimeConditionValidator
    args: {}

validate_data_model: error

subscription_validators:
  - validator: AggregationValidator
    args:
      max_allowed_aggregations: 10
      required_time_column: timestamp

Best Practices

Security First

Always include required column validators for multi-tenant data (project_id, organization_id).

Translation Consistency

Keep translation mappers consistent across all storages connected to an entity.
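
One way to avoid drift between storages is to define the mappers once and reuse them. The sketch below uses a standard YAML anchor (&errors_mappers) and alias (*errors_mappers), reusing the ColumnToColumn mapper from the Storage Connections section. It assumes the configuration loader resolves anchors and aliases, which this page does not confirm. If the loader does not, repeat the block verbatim under each storage instead.

```yaml
storages:
  - storage: errors_ro
    is_writable: false
    # Define the shared mappers once and label them with an anchor
    translation_mappers: &errors_mappers
      columns:
        - mapper: ColumnToColumn
          args:
            from_table_name: null
            from_col_name: username
            to_table_name: null
            to_col_name: user_name

  - storage: errors
    is_writable: true
    # Reuse the same mappers through the alias
    translation_mappers: *errors_mappers
```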

Optimize Queries

Use query processors to optimize common query patterns and improve performance.

Document Schema

Add YAML comments to explain complex schema structures and column purposes.
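
A short sketch of a commented schema, using columns from the complete example on this page. The comment wording is illustrative only.

```yaml
schema:
  # Tenant key; queries must filter on this column
  # (enforced by EntityRequiredColumnValidator)
  - name: project_id
    type: UInt
    args:
      size: 64

  # Key/value pairs, queried with subscript syntax such as tags[key]
  - name: tags
    type: Nested
    args:
      subcolumns:
        - name: key
          type: String
        - name: value
          type: String
```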

Related Pages

  • Datasets: Group entities into datasets
  • Storages: Configure storage backends
  • Subscriptions: Set up real-time subscriptions
