For common use cases like adding columns, Snuba can auto-generate migrations from storage configuration changes.
**1. Locate the storage configuration**

Find the relevant storage YAML file for your table:
```text
# Storage configurations are located in:
snuba/datasets/configuration/<dataset>/storages/<storage>.yaml

# Example:
snuba/datasets/configuration/group_attributes/storages/group_attributes.yaml
```
**2. Modify the storage schema**

Edit the `schema.columns` section to add your new column:
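For example, adding a hypothetical `user_agent` column might look like the sketch below. The column name and type are illustrative; match the exact syntax of the existing column entries in your storage file:

```yaml
schema:
  columns:
    [
      # ... existing columns ...
      { name: user_agent, type: String },
    ]
```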
**3. Generate the migration**

Run the `snuba migrations generate` command with the path to the modified storage file. This creates a new migration file with the appropriate operations.
**4. Review and commit**

Review the generated migration file, test it, and commit:
```shell
# Test the migration
snuba migrations run --group <group> --migration-id <id> --dry-run

# Commit if everything looks good
git add snuba/snuba_migrations/<group>/<migration_id>.py
git commit -m "Add new_column to group_attributes"
```
Auto-generation currently supports adding columns to existing tables. For other operations, you’ll need to write a custom migration.
Migrations must follow a specific naming convention:
```shell
# Format: <4-digit-number>_<descriptive_name>.py
# Number must be sequential within the group

# Find the next migration number
ls snuba/snuba_migrations/<group>/ | grep -E '^[0-9]{4}' | tail -1

# Create new migration file
touch snuba/snuba_migrations/<group>/0042_add_user_agent_column.py
```
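The sequential-numbering rule can also be computed in code. A minimal stdlib sketch (the helper name is illustrative, not part of Snuba):

```python
import re


def next_migration_id(filenames: list[str]) -> str:
    """Return the next zero-padded 4-digit migration number.

    Scans existing migration filenames (e.g. "0041_add_foo.py") and
    returns the next sequential ID within the group.
    """
    numbers = [
        int(m.group(1))
        for name in filenames
        if (m := re.match(r"(\d{4})_", name))
    ]
    # Empty group starts at 0000
    return f"{max(numbers, default=-1) + 1:04d}"
```

For example, `next_migration_id(["0040_x.py", "0041_y.py"])` returns `"0042"`.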
Data migrations that run Python code rather than DDL subclass `migration.CodeMigration`:

```python
import logging
from typing import Sequence

from snuba.clusters.cluster import ClickhouseClientSettings, get_cluster
from snuba.clusters.storage_sets import StorageSetKey
from snuba.migrations import migration, operations


def migrate_user_data(logger: logging.Logger) -> None:
    """Migrate user data from old format to new format."""
    cluster = get_cluster(StorageSetKey.EVENTS)
    connection = cluster.get_query_connection(
        ClickhouseClientSettings.MIGRATE
    )

    # Perform data migration
    logger.info("Starting user data migration")
    connection.execute(
        "ALTER TABLE errors_local UPDATE user_name = ... WHERE ..."
    )
    logger.info("User data migration complete")


class Migration(migration.CodeMigration):
    blocking = True  # Data migrations should be blocking

    def forwards_global(self) -> Sequence[operations.GenericOperation]:
        return [
            operations.RunPython(
                func=migrate_user_data,
                description="Migrate user data to new format",
            ),
        ]

    def backwards_global(self) -> Sequence[operations.GenericOperation]:
        # Usually cannot reverse data migrations
        return []
```
```python
def backwards_ops(self) -> Sequence[operations.SqlOperation]:
    return [
        # 1. Drop from distributed tables FIRST
        operations.DropColumn(
            table_name="events_dist",
            target=OperationTarget.DISTRIBUTED,
            # ...
        ),
        # 2. Then drop from local tables
        operations.DropColumn(
            table_name="events_local",
            target=OperationTarget.LOCAL,
            # ...
        ),
    ]
```
Drop from distributed tables first, then local. Dropping local columns while the distributed table references them causes errors.
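The ordering rule can be expressed as a small check. A stdlib sketch, where the `Op` tuple and helper are illustrative and not Snuba APIs: adds must touch local before distributed, drops the reverse.

```python
from typing import NamedTuple


class Op(NamedTuple):
    kind: str    # "add" or "drop"
    target: str  # "local" or "distributed"


def targets_ordered_safely(ops: list[Op]) -> bool:
    """Check that adds hit local first and drops hit distributed first."""
    for i, earlier in enumerate(ops):
        for later in ops[i + 1:]:
            if earlier.kind == later.kind == "add":
                # An add on distributed must not precede an add on local
                if earlier.target == "distributed" and later.target == "local":
                    return False
            if earlier.kind == later.kind == "drop":
                # A drop on local must not precede a drop on distributed
                if earlier.target == "local" and later.target == "distributed":
                    return False
    return True
```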
```sql
-- Local op:
ALTER TABLE errors_local ADD COLUMN IF NOT EXISTS user_agent String AFTER user_email;

-- Dist op:
ALTER TABLE errors_dist ADD COLUMN IF NOT EXISTS user_agent String AFTER user_email;
```
**1. Test migrations locally first**

Always test migrations in development before production:
```shell
# Test with dry-run
snuba migrations run --group <group> --migration-id <id> --dry-run

# Execute in local environment
snuba migrations run --group <group> --migration-id <id> --force
```
**2. Write reversible migrations**

Always provide backward operations that restore the original state:
```python
def backwards_ops(self) -> Sequence[operations.SqlOperation]:
    # Don't just return an empty list
    return [
        operations.DropColumn(
            table_name="events_local",
            column_name="user_agent",
            target=OperationTarget.LOCAL,
        ),
    ]
```
**3. Set the blocking flag appropriately**

Mark migrations as blocking if they:

- Migrate large amounts of data
- Require significant processing time
- Need consumers to be stopped
```python
class Migration(migration.ClickhouseNodeMigration):
    blocking = True  # Requires --force
```
**4. Update storage schemas**

After adding a migration, update the corresponding storage schema file to match. Tests will fail if the schemas are inconsistent.
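One way to reason about that consistency requirement: the set of columns declared in the storage YAML must equal the set the migrations produce. An illustrative stdlib sketch (not Snuba's actual consistency test):

```python
def schema_mismatch(yaml_columns: list[str], migrated_columns: list[str]) -> set[str]:
    """Return columns present in one source but not the other.

    An empty result means the storage schema and the migrated table agree.
    """
    return set(yaml_columns) ^ set(migrated_columns)
```

A non-empty result pinpoints exactly which column was added in one place but not the other.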
**Forgetting to set the operation target**

```python
# BAD - target defaults to UNSET
operations.AddColumn(
    storage_set=StorageSetKey.EVENTS,
    table_name="events_local",
    column=Column("field", String()),
)

# GOOD - explicitly set target
operations.AddColumn(
    storage_set=StorageSetKey.EVENTS,
    table_name="events_local",
    column=Column("field", String()),
    target=OperationTarget.LOCAL,  # Explicit target
)
```
**Wrong operation order**
```python
# BAD - distributed before local
def forwards_ops(self):
    return [
        operations.AddColumn(target=OperationTarget.DISTRIBUTED, ...),
        operations.AddColumn(target=OperationTarget.LOCAL, ...),
    ]

# GOOD - local before distributed
def forwards_ops(self):
    return [
        operations.AddColumn(target=OperationTarget.LOCAL, ...),
        operations.AddColumn(target=OperationTarget.DISTRIBUTED, ...),
    ]
```
**Dropping primary key columns**

You cannot drop columns that are part of the primary key or sorting key:
```python
# This will fail if project_id is in ORDER BY
operations.DropColumn(
    table_name="events_local",
    column_name="project_id",
    target=OperationTarget.LOCAL,
)
```