Best practices for designing schemas when migrating to CockroachDB for optimal performance and scalability.
When migrating to CockroachDB, proper schema design is critical for achieving optimal performance and scalability. This guide covers best practices for tables, primary keys, indexes, and data types.
Primary keys are the foundation of your schema design in CockroachDB. They determine how data is distributed across the cluster and significantly impact query performance.
Sequential primary keys (like auto-incrementing integers or timestamps) can create hotspots that severely degrade performance in distributed databases.
Problematic patterns:
-- ❌ Avoid: Auto-incrementing integerCREATE TABLE orders ( id SERIAL PRIMARY KEY, customer_id INT, order_date TIMESTAMPTZ);-- ❌ Avoid: Timestamp as primary keyCREATE TABLE events ( event_time TIMESTAMPTZ PRIMARY KEY, event_type STRING, data JSONB);
Why this is problematic:
All writes go to a single range (the “tail” of the sequential key space)
Creates a hotspot on a single node
Limits write throughput to a single node’s capacity
Composite primary keys are often the best choice for distributed databases. They allow you to partition data by a high-cardinality column while maintaining ordering within partitions.
Always set size limits on variable-length columns, especially those that are indexed. Values exceeding 1MB can cause write amplification and performance issues.
-- ❌ Avoid: Unbounded stringCREATE TABLE users ( id UUID PRIMARY KEY, email STRING);-- ✅ Recommended: Bounded stringCREATE TABLE users ( id UUID PRIMARY KEY, email STRING(255), bio STRING(2000));
Create indexes on columns used in WHERE clauses, JOIN conditions, and ORDER BY clauses:
CREATE TABLE users ( id UUID PRIMARY KEY DEFAULT gen_random_uuid(), email STRING(255) UNIQUE, last_name STRING(100), first_name STRING(100), created_at TIMESTAMPTZ DEFAULT now());-- Index for email lookupsCREATE UNIQUE INDEX users_email_idx ON users(email);-- Composite index for name searchesCREATE INDEX users_name_idx ON users(last_name, first_name);
-- Index only active ordersCREATE INDEX active_orders ON orders(created_at) WHERE status = 'active';-- Index only recent eventsCREATE INDEX recent_events ON events(event_type) WHERE event_time > now() - INTERVAL '30 days';
CREATE TABLE products ( id UUID PRIMARY KEY DEFAULT gen_random_uuid(), name STRING, attributes JSONB, tags STRING[]);-- Inverted index for JSONCREATE INVERTED INDEX product_attributes ON products(attributes);-- Inverted index for arraysCREATE INVERTED INDEX product_tags ON products(tags);
Enables efficient queries:
-- JSON queriesSELECT * FROM products WHERE attributes @> '{"color": "red"}';-- Array queriesSELECT * FROM products WHERE 'electronics' = ANY(tags);
When migrating from databases with auto-incrementing primary keys:
-- Before (PostgreSQL)CREATE TABLE users ( id SERIAL PRIMARY KEY, email VARCHAR(255));-- After (CockroachDB)CREATE TABLE users ( id UUID PRIMARY KEY DEFAULT gen_random_uuid(), email STRING(255));
CockroachDB INT is 64-bit by default, while PostgreSQL and MySQL default to 32-bit integers.
-- PostgreSQL/MySQL INT → CockroachDB-- Option 1: Keep as INT8 (64-bit)CREATE TABLE products ( id INT8, -- or just INT quantity INT8);-- Option 2: Use INT4 for 32-bit compatibilityCREATE TABLE products ( id INT4, quantity INT4);
-- Check for tables without explicit primary keysSELECT table_schema, table_nameFROM information_schema.tablesWHERE table_schema NOT IN ('information_schema', 'pg_catalog', 'crdb_internal') AND table_name NOT IN ( SELECT table_name FROM information_schema.table_constraints WHERE constraint_type = 'PRIMARY KEY' );-- Review index usageSHOW INDEXES FROM database_name;-- Check for large column valuesSELECT table_name, column_name, data_type, character_maximum_lengthFROM information_schema.columnsWHERE character_maximum_length IS NULL AND data_type IN ('STRING', 'BYTES');