This feature is under active development and is subject to change. The current implementation supports the architecture described here, but additional features are still being added.
Dataset slicing enables Snuba to partition data across multiple physical resources (ClickHouse clusters, Kafka clusters, Redis instances) while maintaining the same logical schema. This provides isolation, scalability, and performance benefits for multi-tenant deployments.
Slicing Overview
Slicing maps logical partitions to physical slices:
Key Concepts:
Logical Partition: Identifier for a tenant (e.g., organization_id)
Physical Slice: Actual infrastructure resources (ClickHouse cluster, Kafka topics)
Slice ID: Numeric identifier for a physical slice (0, 1, 2, …)
Why Slicing?
Slicing provides several benefits:
Tenant Isolation: Separate infrastructure prevents noisy neighbors
Horizontal Scaling: Add slices to scale capacity independently
Performance Isolation: High-volume tenants don't impact others
Operational Flexibility: Migrate, upgrade, or debug individual slices
Architecture Components
Logical Partition Mapping
Maps organization IDs to slice IDs:
# In settings.py
LOGICAL_PARTITION_MAPPING = {
    StorageSetKey.METRICS: {
        1: 0,  # Org 1 -> Slice 0
        2: 0,  # Org 2 -> Slice 0
        3: 1,  # Org 3 -> Slice 1
        4: 1,  # Org 4 -> Slice 1
        5: 2,  # Org 5 -> Slice 2
        # ...
    }
}
Every logical partition must be assigned to a slice. Valid slice IDs fall in the range [0, slice_count).
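The lookup itself is a plain dictionary access. A minimal, self-contained sketch (the `StorageSetKey` enum and mapping below are trimmed stand-ins mirroring the settings above, not the real Snuba classes):

```python
from enum import Enum


class StorageSetKey(Enum):
    METRICS = "metrics"


# Stand-in for the LOGICAL_PARTITION_MAPPING setting shown above
LOGICAL_PARTITION_MAPPING = {
    StorageSetKey.METRICS: {1: 0, 2: 0, 3: 1, 4: 1, 5: 2},
}


def get_slice(storage_set: StorageSetKey, logical_partition: int) -> int:
    # Raises KeyError if the partition was never assigned to a slice
    return LOGICAL_PARTITION_MAPPING[storage_set][logical_partition]


print(get_slice(StorageSetKey.METRICS, 3))  # 1
```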
Sliced Storage Sets
Defines which storage sets support slicing:
# In settings.py
SLICED_STORAGE_SETS = {
    StorageSetKey.METRICS: 3,  # 3 slices (0, 1, 2)
    StorageSetKey.GENERIC_METRICS: 2,  # 2 slices (0, 1)
}
Sliced ClickHouse Clusters
Maps (storage_set, slice_id) pairs to ClickHouse clusters:
# In settings.py
SLICED_CLUSTERS = [
    {
        "host": "clickhouse-slice-0",
        "port": 9000,
        "http_port": 8123,
        "database": "default",
        "storage_set_slices": [
            ("metrics", 0),
            ("generic_metrics", 0),
        ],
        "single_node": False,
        "cluster_name": "metrics_slice_0",
    },
    {
        "host": "clickhouse-slice-1",
        "port": 9000,
        "http_port": 8123,
        "database": "default",
        "storage_set_slices": [
            ("metrics", 1),
            ("generic_metrics", 1),
        ],
        "single_node": False,
        "cluster_name": "metrics_slice_1",
    },
]
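Internally, Snuba must answer "which cluster serves this (storage_set, slice_id) pair?". A hedged sketch of how such an index might be built from the list above (plain dicts stand in for real cluster objects; the function name is illustrative):

```python
from typing import Any, Dict, List, Tuple

# Trimmed stand-ins for the SLICED_CLUSTERS entries shown above
SLICED_CLUSTERS = [
    {"host": "clickhouse-slice-0", "storage_set_slices": [("metrics", 0), ("generic_metrics", 0)]},
    {"host": "clickhouse-slice-1", "storage_set_slices": [("metrics", 1), ("generic_metrics", 1)]},
]


def build_sliced_cluster_map(
    clusters: List[Dict[str, Any]],
) -> Dict[Tuple[str, int], Dict[str, Any]]:
    # Index each cluster definition by every (storage_set, slice_id) pair it serves
    mapping: Dict[Tuple[str, int], Dict[str, Any]] = {}
    for cluster in clusters:
        for storage_set, slice_id in cluster["storage_set_slices"]:
            mapping[(storage_set, slice_id)] = cluster
    return mapping


cluster_map = build_sliced_cluster_map(SLICED_CLUSTERS)
print(cluster_map[("metrics", 1)]["host"])  # clickhouse-slice-1
```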
Sliced Kafka Topics
Maps (logical_topic, slice_id) to physical topics:
# In settings.py
SLICED_KAFKA_TOPIC_MAP = {
    ("snuba-generic-metrics", 0): "snuba-generic-metrics-0",
    ("snuba-generic-metrics", 1): "snuba-generic-metrics-1",
    ("snuba-generic-metrics", 2): "snuba-generic-metrics-2",
}

SLICED_KAFKA_BROKER_CONFIG = {
    ("snuba-generic-metrics", 0): {
        "bootstrap.servers": "kafka-slice-0:9092",
    },
    ("snuba-generic-metrics", 1): {
        "bootstrap.servers": "kafka-slice-1:9092",
    },
    ("snuba-generic-metrics", 2): {
        "bootstrap.servers": "kafka-slice-2:9092",
    },
}
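Resolving a physical topic and its broker configuration from these maps comes down to a pair of dictionary lookups. A hedged sketch (`resolve_topic` is illustrative, not the real Snuba helper), falling back to the logical name for topics that are not sliced:

```python
from typing import Dict, Tuple

SLICED_KAFKA_TOPIC_MAP: Dict[Tuple[str, int], str] = {
    ("snuba-generic-metrics", 0): "snuba-generic-metrics-0",
    ("snuba-generic-metrics", 1): "snuba-generic-metrics-1",
}
SLICED_KAFKA_BROKER_CONFIG = {
    ("snuba-generic-metrics", 0): {"bootstrap.servers": "kafka-slice-0:9092"},
    ("snuba-generic-metrics", 1): {"bootstrap.servers": "kafka-slice-1:9092"},
}


def resolve_topic(logical_topic: str, slice_id: int) -> Tuple[str, dict]:
    key = (logical_topic, slice_id)
    # Fall back to the logical name (and default broker config) for unsliced topics
    physical_topic = SLICED_KAFKA_TOPIC_MAP.get(key, logical_topic)
    broker_config = SLICED_KAFKA_BROKER_CONFIG.get(key, {})
    return physical_topic, broker_config


print(resolve_topic("snuba-generic-metrics", 1)[0])  # snuba-generic-metrics-1
```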
Cluster Resolution
Snuba resolves clusters based on context:
# From snuba/clusters/cluster.py
def get_cluster(
    storage_set_key: StorageSetKey,
    slice_id: Optional[int] = None,
) -> ClickhouseCluster:
    """
    Return a ClickHouse cluster for a storage set.

    - If slice_id is provided: look up in SLICED_CLUSTERS
    - If slice_id is None: look up in CLUSTERS (non-sliced)
    """
    if slice_id is not None:
        # Sliced storage set
        sliced_map = _get_sliced_storage_set_cluster_map()
        cluster = sliced_map.get((storage_set_key, slice_id))
        if cluster is None:
            raise UndefinedClickhouseCluster(
                f"{(storage_set_key, slice_id)} not defined in SLICED_CLUSTERS"
            )
    else:
        # Non-sliced storage set
        cluster_map = _get_storage_set_cluster_map()
        cluster = cluster_map.get(storage_set_key)
        if cluster is None:
            raise UndefinedClickhouseCluster(
                f"{storage_set_key} not defined in CLUSTERS"
            )
    return cluster
Storage Configuration
Storages declare a partition key for slicing:
# From storage YAML configuration
version: v1
kind: writable_storage
name: generic_metrics_distributions
storage:
  key: generic_metrics_distributions
  set_key: generic_metrics
schema:
  columns:
    - { name: org_id, type: UInt, args: { size: 64 } }
    - { name: project_id, type: UInt, args: { size: 64 } }
    - { name: metric_id, type: UInt, args: { size: 64 } }
# Partition key defines slicing dimension
partition_key_column_name: org_id
Partition Key Column:
Used to calculate the logical partition
Must be present in all queries
Determines slice routing
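How a partition key value becomes a logical partition is not fixed by this configuration. A common scheme, assumed here purely for illustration, is a modulo over a fixed number of logical partitions:

```python
NUM_LOGICAL_PARTITIONS = 256  # assumed fixed partition count, for illustration


def calculate_logical_partition(org_id: int) -> int:
    # Stable mapping: the same org_id always lands on the same logical partition
    return org_id % NUM_LOGICAL_PARTITIONS


print(calculate_logical_partition(300))  # 44
```

Because the partition count stays fixed, repartitioning only needs to change the partition-to-slice mapping, never the partition each organization belongs to.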
Query Routing
When executing queries, Snuba routes them to the appropriate slice:
def route_query_to_slice(query: Query, storage: Storage) -> int:
    """
    Determine which slice to query based on the partition key.
    """
    # Extract the partition key value (e.g., org_id)
    partition_key = storage.get_schema().get_partition_key_column_name()
    partition_value = extract_partition_value(query, partition_key)

    # Look up the slice for this partition
    storage_set_key = storage.get_storage_set_key()
    logical_partition = calculate_logical_partition(partition_value)
    slice_id = LOGICAL_PARTITION_MAPPING[storage_set_key][logical_partition]
    return slice_id
Multi-Slice Queries
Queries spanning multiple organizations require the mega-cluster:
-- Query: multiple organizations
SELECT project_id, count()
FROM generic_metrics_distributions_dist
WHERE org_id IN (1, 2, 3)  -- May span multiple slices
GROUP BY project_id
Mega-Cluster: A special ClickHouse cluster that can query across all slices
Multi-slice queries are expensive. Design the data model and queries to operate within a single slice whenever possible.
Consumer Configuration
Consumers must specify a slice ID:
# Start consumer for slice 0
snuba consumer \
--storage=generic_metrics_distributions \
--consumer-group=snuba-consumers \
--slice-id=0
# Start consumer for slice 1
snuba consumer \
--storage=generic_metrics_distributions \
--consumer-group=snuba-consumers \
--slice-id=1
Consumer behavior:
Reads from sliced Kafka topic for specified slice
Writes to ClickHouse cluster for specified slice
Produces to sliced commit log topic
Consumer Slicing Architecture
# From consumer code (simplified)
def build_batch_writer(
    table_writer: TableWriter,
    slice_id: Optional[int] = None,
) -> ProcessedMessageBatchWriter:
    # Create a writer targeting the cluster for this slice; the cluster
    # is resolved from slice_id as described in Cluster Resolution above
    writer = table_writer.get_batch_writer(
        metrics,
        slice_id=slice_id,  # Routes to the correct cluster
    )
    return ProcessedMessageBatchWriter(writer)
Repartitioning
Moving organizations between slices:
Step 1: Update Mapping
# Before: Org 100 on Slice 0
LOGICAL_PARTITION_MAPPING = {
    StorageSetKey.METRICS: {
        100: 0,  # Old mapping
    }
}

# After: Org 100 on Slice 1
LOGICAL_PARTITION_MAPPING = {
    StorageSetKey.METRICS: {
        100: 1,  # New mapping
    }
}
Step 2: Dual Writing
New data is written to the new slice, while existing data remains on the old slice.
Step 3: Mega-Cluster Queries
Queries span both slices during transition:
# Query execution with partial data
def execute_with_repartition(
    query: Query,
    org_id: int,
    storage_set: StorageSetKey,
) -> QueryResult:
    # Data may exist on multiple slices
    old_slice = get_historical_slice(org_id, storage_set)
    new_slice = get_current_slice(org_id, storage_set)

    if old_slice != new_slice:
        # Query both slices via the mega-cluster
        return query_mega_cluster(query, [old_slice, new_slice])
    else:
        # Query a single slice
        return query_single_slice(query, new_slice)
Step 4: Data Migration (Optional)
Backfill historical data to new slice if needed.
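One way to perform the backfill (an assumption for illustration, not a documented Snuba procedure) is ClickHouse's `remote()` table function, pulling the organization's historical rows from the old slice's cluster into the new one. A sketch that only builds the SQL statement:

```python
def build_backfill_sql(org_id: int, table: str, source_host: str) -> str:
    # Pull one org's historical rows from the old slice's cluster into the
    # local table on the new slice, via ClickHouse's remote() table function.
    return (
        f"INSERT INTO {table} "
        f"SELECT * FROM remote('{source_host}', default.{table}) "
        f"WHERE org_id = {org_id}"
    )


print(build_backfill_sql(100, "metrics_local", "clickhouse-slice-0:9000"))
```

Once the backfill is verified, the old slice's copy of the organization's rows can be dropped.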
Repartitioning is complex. Mega-cluster queries ensure correctness during the transition, but at a performance cost.
Sliced Topic Types
Multiple topic types can be sliced:
# From stream loader configuration
stream_loader:
  processor: GenericMetricsDistributionsProcessor
  # Main data topic (sliced)
  default_topic: snuba-generic-metrics
  # Replacements topic (sliced)
  replacement_topic: generic-metrics-replacements
  # Commit log topic (sliced)
  commit_log_topic: snuba-commit-log
  # Subscription scheduler topic (sliced)
  subscription_scheduled_topic: scheduled-subscriptions-generic-metrics
  # Subscription results (NOT sliced - limitation)
  subscription_result_topic: generic-metrics-subscription-results
Subscription result topics cannot currently be sliced. This is a known limitation.
Configuration Validation
Snuba validates slicing configuration on startup:
def validate_slicing_config() -> None:
    """Ensure the slicing configuration is consistent."""
    for storage_set, slice_count in SLICED_STORAGE_SETS.items():
        # Check that the logical partition mapping covers all slices
        mapping = LOGICAL_PARTITION_MAPPING.get(storage_set, {})
        mapped_slices = set(mapping.values())
        expected_slices = set(range(slice_count))
        if mapped_slices != expected_slices:
            raise ValueError(
                f"Storage set {storage_set} has slices {expected_slices} "
                f"but mapping only covers {mapped_slices}"
            )

        # Check that every slice has a cluster configuration
        for slice_id in range(slice_count):
            try:
                get_cluster(storage_set, slice_id)
            except UndefinedClickhouseCluster:
                raise ValueError(
                    f"No cluster defined for ({storage_set}, {slice_id})"
                )
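Running the coverage check against a deliberately gapped mapping makes the failure mode concrete. A trimmed, self-contained version of the first check (string keys used for brevity in place of `StorageSetKey` members):

```python
# Hypothetical misconfiguration: the storage set declares 3 slices,
# but the mapping only ever assigns slices 0 and 1
SLICED_STORAGE_SETS = {"metrics": 3}
LOGICAL_PARTITION_MAPPING = {"metrics": {1: 0, 2: 1}}


def check_mapping_coverage() -> None:
    for storage_set, slice_count in SLICED_STORAGE_SETS.items():
        mapped = set(LOGICAL_PARTITION_MAPPING.get(storage_set, {}).values())
        expected = set(range(slice_count))
        if mapped != expected:
            raise ValueError(
                f"Storage set {storage_set} has slices {expected} "
                f"but mapping only covers {mapped}"
            )


try:
    check_mapping_coverage()
except ValueError as exc:
    print(exc)  # slice 2 is never mapped, so validation fails
```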
Monitoring Sliced Deployments
Key metrics per slice:
# Consumer lag per slice
metrics.gauge(
"consumer_lag" ,
lag,
tags = { "slice_id" : slice_id, "storage_set" : storage_set}
)
# Query latency per slice
metrics.timing(
"query_duration_ms" ,
duration,
tags = { "slice_id" : slice_id}
)
# Storage size per slice
metrics.gauge(
"table_size_bytes" ,
size,
tags = { "slice_id" : slice_id, "table" : table_name}
)
Best Practices
Slice Assignment
Balance load: Distribute high-volume organizations across slices
Consider growth: Leave room for new organizations
Group related orgs: Co-locate related tenants if beneficial
Operational Considerations
Start with a conservative slice count: Over-slicing increases complexity
Monitor per-slice metrics: Identify imbalanced slices early
Plan repartitioning: Have a process for rebalancing slices
Test the mega-cluster: Ensure multi-slice queries work before repartitioning
Query Design
Include the partition key: Always filter by org_id or equivalent
Avoid cross-slice queries: Design the data model to avoid them when possible
Use single-org queries: The most performant query pattern
Future Development
The following features are planned but not yet implemented:
Automatic rebalancing: Dynamic slice assignment based on load
Sliced subscription results: Support for sliced result topics
Slice-aware subscriptions: Subscription scheduler per slice
Simplified configuration: Auto-generate slicing topology
Related documentation:
Storage - ClickHouse cluster configuration
Ingestion - Consumer configuration for sliced topics
Data Model - Storage sets and partition keys