The events dataset stores error and exception data from Sentry, including stack traces, exception details, user context, and custom tags. It’s the foundational dataset that powers Sentry’s error monitoring features.

Overview

The events dataset is designed for querying error events with rich contextual information. Each event represents a single error occurrence captured by Sentry’s SDKs.

Key Characteristics

  • Storage: Primary storage in errors table with read-only replica errors_ro
  • Entity: Single events entity
  • Partitioning: By retention_days and date
  • Primary Use Cases: Error search, issue aggregation, debugging workflows

Entity: events

The events entity provides the query interface for error data.

Core Columns

Identification

event_id: UUID              # Unique identifier for the event
project_id: UInt64          # Required for all queries
group_id: UInt64            # Issue/group this event belongs to
primary_hash: UUID          # Hash used for grouping

Timestamps

timestamp: DateTime         # When the event occurred (required)
received: DateTime          # When Snuba received the event
message_timestamp: DateTime # Kafka message timestamp
timestamp is the required time column for all time-based queries. It represents when the error actually occurred.

Error Details

message: String            # Error message
title: String              # Event title
culprit: String            # Function/location that caused the error
level: String              # Error severity (fatal, error, warning, info, debug)
type: String               # Event type identifier
location: String           # Source location

User Context

user: String               # User identifier (promoted from tags)
user_id: String            # User ID
user_name: String          # Username
user_email: String         # User email
user_hash: UInt64          # Hash of user identifier (readonly)

HTTP Context

http_method: String        # HTTP request method
http_referer: String       # HTTP referer header
ip_address_v4: IPv4        # Client IPv4 address
ip_address_v6: IPv6        # Client IPv6 address

SDK Information

platform: String           # Platform (python, javascript, etc.)
sdk_name: String           # SDK name (sentry.python, etc.)
sdk_version: String        # SDK version
sdk_integrations: Array    # Enabled SDK integrations

Release Context

release: String            # Release version (promoted from tags)
environment: String        # Environment name (promoted from tags)
dist: String               # Distribution identifier (promoted from tags)
version: String            # Additional version field

Nested Structures

Tags

tags: Nested(
  key: String,
  value: String
)
_tags_hash_map: Array(UInt64)  # Optimization for tag lookups
Tags store custom key-value metadata. Some tags are “promoted” to top-level columns:
  • sentry:release → release
  • sentry:dist → dist
  • sentry:user → user
  • environment → environment
  • level → level
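
The promotion rules above can be expressed as a simple lookup. This is an illustrative sketch, not Snuba's actual code; the helper name is hypothetical:

```python
# Promoted tag keys and the top-level columns they map to (from the list above).
PROMOTED_TAGS = {
    "sentry:release": "release",
    "sentry:dist": "dist",
    "sentry:user": "user",
    "environment": "environment",
    "level": "level",
}

def resolve_tag_column(tag_key: str) -> str:
    """Return the promoted column for a tag key, or a tags[] subscript."""
    return PROMOTED_TAGS.get(tag_key, f"tags[{tag_key}]")
```

Queries against promoted columns avoid scanning the nested tags arrays, which is why the performance guidance below recommends them.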

Contexts

contexts: Nested(
  key: String,
  value: String
)
Contexts store structured data like:
  • geo.country_code, geo.region, geo.city - Geographic information
  • trace.trace_id, trace.span_id - Distributed tracing data
  • Custom context data

Exception Stacks

exception_stacks: Nested(
  type: String,
  value: String,
  mechanism_type: String,
  mechanism_handled: UInt8
)
Stores exception information including type, message, and handling mechanism.
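
ClickHouse Nested columns are stored as parallel arrays, so the i-th element of each sub-column describes the same exception. A hedged sketch of how an event payload's `exception.values` list might map onto those arrays (field names on the input side follow the Sentry event payload; the function itself is illustrative):

```python
def to_exception_stacks(exception_values: list[dict]) -> dict[str, list]:
    """Convert a payload's exception.values list into parallel arrays
    matching the exception_stacks Nested columns (illustrative only)."""
    return {
        "type": [e.get("type", "") for e in exception_values],
        "value": [e.get("value", "") for e in exception_values],
        "mechanism_type": [
            e.get("mechanism", {}).get("type", "") for e in exception_values
        ],
        # mechanism_handled is a UInt8 flag: 1 = handled, 0 = unhandled.
        "mechanism_handled": [
            int(bool(e.get("mechanism", {}).get("handled", True)))
            for e in exception_values
        ],
    }
```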

Exception Frames

exception_frames: Nested(
  abs_path: String,
  filename: String,
  function: String,
  lineno: UInt32,
  colno: UInt32,
  in_app: UInt8,
  package: String,
  module: String,
  stack_level: UInt16
)
Stack trace frames with source location and context.
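
Because frames are also parallel arrays, selecting "in-app frames only" means picking the same index across every sub-column. A sketch, assuming the frames are already in the parallel-array shape shown above:

```python
def in_app_frames(frames: dict[str, list]) -> list[dict]:
    """Zip the parallel exception_frames arrays and keep in_app frames."""
    result = []
    for i, in_app in enumerate(frames["in_app"]):
        if in_app == 1:  # in_app is a UInt8 flag
            result.append({
                "filename": frames["filename"][i],
                "function": frames["function"][i],
                "lineno": frames["lineno"][i],
            })
    return result
```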

Modules

modules: Nested(
  name: String,
  version: String
)
Installed packages/modules at the time of the error.

Tracing & Replay Integration

trace_id: UUID             # Distributed trace ID
span_id: UInt64            # Span ID within trace
trace_sampled: UInt8       # Whether trace was sampled
replay_id: UUID            # Associated session replay
transaction_name: String   # Associated transaction name
transaction_hash: UInt64   # Hash of transaction name

Processing Metadata

partition: UInt16          # Kafka partition
offset: UInt64             # Kafka offset
retention_days: UInt16     # Data retention period
deleted: UInt8             # Soft delete flag
num_processing_errors: UInt64  # Errors during processing

Storage: errors

The primary writable storage for event data.

Table Structure

CREATE TABLE errors_local (
    project_id UInt64,
    timestamp DateTime,
    event_id UUID,
    -- ... other columns
) ENGINE = ReplacingMergeTree()
PARTITION BY (retention_days, toMonday(timestamp))
ORDER BY (project_id, toStartOfDay(timestamp), event_id)
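
The PARTITION BY expression groups rows by retention window and ISO week. ClickHouse's toMonday rounds a timestamp down to the Monday of its week; the equivalent in Python (a sketch, not Snuba code):

```python
from datetime import date, datetime, timedelta

def partition_key(retention_days: int, ts: datetime) -> tuple[int, date]:
    """Mirror PARTITION BY (retention_days, toMonday(timestamp))."""
    d = ts.date()
    # weekday() is 0 for Monday, so subtracting it lands on the week's Monday.
    monday = d - timedelta(days=d.weekday())
    return (retention_days, monday)
```

Bounding queries by timestamp therefore prunes whole weekly partitions, which is why tight time ranges matter for performance.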

Storage Configuration

storage:
  key: errors
  set_key: events
readiness_state: complete
local_table_name: errors_local
dist_table_name: errors_dist
partition_format:
  - retention_days
  - date
not_deleted_mandatory_condition: deleted

Replacer System

The errors storage uses a replacer processor to handle event updates and deletions:
replacer_processor:
  processor: ErrorsReplacer
  args:
    state_name: errors
    storage_key_str: errors
This enables:
  • Soft deletion of events
  • Group ID updates when issues are merged
  • Metadata corrections
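
The net effect of ReplacingMergeTree plus the deleted flag can be sketched as: among rows sharing a key, the most recently written version wins, and queries must still filter out soft-deleted rows. This is a conceptual model, not Snuba's implementation:

```python
def apply_replacements(rows: list[dict]) -> list[dict]:
    """Deduplicate rows by event_id, keeping the last-written version
    (as ReplacingMergeTree does at merge time), then drop soft deletes."""
    latest = {}
    for row in rows:  # later rows overwrite earlier versions of the same event
        latest[row["event_id"]] = row
    return [r for r in latest.values() if r["deleted"] == 0]
```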

Query Processors

Events storage applies multiple query processors for optimization:

UUID Processing

- processor: UUIDColumnProcessor
  args:
    columns: [event_id, primary_hash, trace_id, replay_id]
Converts UUID strings to binary format for efficient storage.
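
The conversion is the standard transform between a 36-character UUID string and its 16-byte binary form, which halves storage and speeds up comparisons. In Python terms:

```python
import uuid

def uuid_to_binary(uuid_str: str) -> bytes:
    """A dashed UUID string becomes 16 raw bytes."""
    return uuid.UUID(uuid_str).bytes

def binary_to_uuid(raw: bytes) -> str:
    """And back again for display."""
    return str(uuid.UUID(bytes=raw))
```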

Mapping Optimization

- processor: MappingOptimizer
  args:
    column_name: tags
    hash_map_name: _tags_hash_map
    killswitch: events_tags_hash_map_enabled
Uses hash maps for fast tag lookups when enabled.
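
The idea behind the hash map: at write time, hash each "key=value" pair into _tags_hash_map; at query time, answer an equality filter with an array-membership test instead of scanning the string arrays. A simplified sketch using Python's built-in hash as a stand-in for the ClickHouse hash function the real column uses:

```python
def build_tags_hash_map(keys: list[str], values: list[str]) -> list[int]:
    """Write-time: one hash per key=value pair (stand-in hash function)."""
    return [hash(f"{k}={v}") for k, v in zip(keys, values)]

def has_tag(tags_hash_map: list[int], key: str, value: str) -> bool:
    """Query-time: membership test against the precomputed hashes."""
    return hash(f"{key}={value}") in tags_hash_map
```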

Prewhere Optimization

- processor: PrewhereProcessor
  args:
    prewhere_candidates:
      - event_id
      - trace_id
      - group_id
      - release
      - message
Moves selective filters to ClickHouse’s PREWHERE clause for faster execution.

Allocation Policies

The errors storage enforces resource limits:
allocation_policies:
  - name: ConcurrentRateLimitAllocationPolicy
    required_tenant_types:
      - organization_id
      - referrer
      - project_id
  - name: BytesScannedRejectingPolicy
    default_config_overrides:
      is_enforced: 1
  - name: CrossOrgQueryAllocationPolicy
    default_config_overrides:
      is_enforced: 1

Data Ingestion

Stream Loader

stream_loader:
  processor: ErrorsProcessor
  default_topic: events
  replacement_topic: event-replacements
  commit_log_topic: snuba-commit-log
  subscription_scheduler_mode: partition

Message Format

Events are ingested from Kafka in the following format:
[
  2,
  "insert",
  {
    "project_id": 1,
    "event_id": "abc123...",
    "data": {
      "timestamp": 1647532800.0,
      "message": "Division by zero",
      "exception": {
        "values": [...]
      },
      "tags": {
        "environment": "production",
        "level": "error"
      },
      "user": {
        "id": "user123",
        "email": "[email protected]"
      }
    }
  }
]
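
Producing such a message is a matter of JSON-encoding the [version, type, payload] triple shown above. A minimal sketch (the helper name is hypothetical; only the fields from the example are assumed):

```python
import json

def make_insert_message(project_id: int, event_id: str, data: dict) -> bytes:
    """Encode an insert message in the [version, type, payload] format."""
    message = [2, "insert", {
        "project_id": project_id,
        "event_id": event_id,
        "data": data,
    }]
    return json.dumps(message).encode("utf-8")
```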

Example Queries

Find events by event ID

MATCH (events)
SELECT event_id, message, timestamp, group_id
WHERE project_id = 1
  AND event_id = 'abc123...'
  AND timestamp >= toDateTime('2024-01-01 00:00:00')
  AND timestamp < toDateTime('2024-01-02 00:00:00')

Group errors by level

MATCH (events)
SELECT level, count() as error_count
WHERE project_id = 1
  AND timestamp >= toDateTime('2024-01-01 00:00:00')
  AND timestamp < toDateTime('2024-01-02 00:00:00')
GROUP BY level
ORDER BY error_count DESC

Search by tag

MATCH (events)
SELECT event_id, message, tags[environment] as env
WHERE project_id = 1
  AND timestamp >= toDateTime('2024-01-01 00:00:00')
  AND timestamp < toDateTime('2024-01-02 00:00:00')
  AND tags[environment] = 'production'
LIMIT 100

Find events with specific exception type

MATCH (events)
SELECT event_id, message, exception_stacks.type
WHERE project_id = 1
  AND timestamp >= toDateTime('2024-01-01 00:00:00')
  AND timestamp < toDateTime('2024-01-02 00:00:00')
  AND arrayExists(x -> x = 'ZeroDivisionError', exception_stacks.type)
LIMIT 100

Aggregate by user

MATCH (events)
SELECT user_email, count() as event_count
WHERE project_id = 1
  AND timestamp >= toDateTime('2024-01-01 00:00:00')
  AND timestamp < toDateTime('2024-01-02 00:00:00')
  AND user_email != ''
GROUP BY user_email
ORDER BY event_count DESC
LIMIT 20

Join Relationships

The events entity supports joins with related datasets:
join_relationships:
  grouped:
    rhs_entity: groupedmessage
    join_type: inner
    columns:
      - [project_id, project_id]
      - [group_id, id]
  assigned:
    rhs_entity: groupassignee
    join_type: inner
    columns:
      - [project_id, project_id]
      - [group_id, group_id]
  attributes:
    rhs_entity: group_attributes
    join_type: left
    columns:
      - [project_id, project_id]
      - [group_id, group_id]
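
Using the grouped relationship above, a join query might look like the following sketch (the groupedmessage column names, such as status, are assumptions, as is the need for a project_id condition on the right-hand entity):

```snql
MATCH (e: events) -[grouped]-> (g: groupedmessage)
SELECT e.event_id, e.message, g.status
WHERE e.project_id = 1
  AND g.project_id = 1
  AND e.timestamp >= toDateTime('2024-01-01 00:00:00')
  AND e.timestamp < toDateTime('2024-01-02 00:00:00')
LIMIT 10
```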

Subscriptions

The events entity supports subscriptions for real-time alerting:
subscription_validators:
  - validator: AggregationValidator
    args:
      max_allowed_aggregations: 10
      disallowed_aggregations:
        - having
        - orderby
      required_time_column: timestamp
      allows_group_by_without_condition: true

Performance Considerations

  • The project_id filter is mandatory and critical for query performance; it is enforced by the EntityRequiredColumnValidator.
  • Limit timestamp ranges to avoid scanning excessive partitions; most queries should cover fewer than 90 days.
  • Prefer promoted columns (release, environment, user) over tag subscripts where possible for better performance.
  • When looking up specific events, include event_id in the WHERE clause to enable the prewhere optimization.
  • Avoid grouping by high-cardinality columns such as event_id or user_email without additional filters.

Related Documentation

  • Transactions Dataset: performance data linked via trace_id
  • Replays Dataset: session replays linked via replay_id
  • Query Optimization: query optimization strategies
  • SnQL Reference: full SnQL language reference
