Choose SQL when data is relational, consistency is critical, and queries are ad-hoc. Choose NoSQL when access patterns are specific and predictable, horizontal write scalability is required, or data shape is inherently non-relational.
```
// Use SQL (PostgreSQL) when:
- Complex joins and ad-hoc queries
- ACID multi-table transactions required
- Schema is stable, well-understood

// Use NoSQL when:
- Write throughput > single-node SQL capacity
- Access pattern is key-value, range, or time-series
- Data is documents, graphs, or IoT streams
```
Start with PostgreSQL — it handles relational, JSONB, full-text search, and time-series workloads natively. Add specialized stores only when PostgreSQL becomes the proven bottleneck.
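That breadth can be sketched even in miniature. The demo below uses Python's sqlite3 module, with SQLite's JSON1 functions standing in for PostgreSQL's JSONB (`json_extract(metadata, '$.type')` plays the role of `metadata->>'type'`); the table and fields are illustrative:

```python
import sqlite3

# Stand-in demo: SQLite's JSON1 functions play the role of
# PostgreSQL's JSONB here. Table and fields are illustrative.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, metadata TEXT)")
conn.executemany(
    "INSERT INTO products (metadata) VALUES (?)",
    [('{"type": "book", "pages": 320}',), ('{"type": "toy"}',)],
)

# Relational query over document-shaped data -- no separate document store
rows = conn.execute(
    "SELECT id FROM products WHERE json_extract(metadata, '$.type') = 'book'"
).fetchall()
print(rows)  # [(1,)]
```

The same general-purpose instance could also carry full-text search (FTS5 in SQLite, tsvector in PostgreSQL), which is exactly the argument for starting general-purpose.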
NewSQL databases (CockroachDB, Google Spanner, YugabyteDB) provide SQL semantics with NoSQL horizontal write scalability:
```sql
-- CockroachDB: distributed SQL with automatic sharding
CREATE TABLE orders (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    user_id UUID NOT NULL,
    total DECIMAL(10,2),
    created_at TIMESTAMP DEFAULT now()
);
-- Automatically distributed across nodes
-- ACID transactions across shards
-- SQL interface with joins
```
Consider NewSQL when you need SQL semantics (joins, transactions) but have outgrown single-node write capacity. Trade-off: higher write latency, since every commit must reach distributed consensus across nodes.
```sql
-- Single-column index
CREATE INDEX idx_user_id ON orders(user_id);

-- Composite index (order matters!)
CREATE INDEX idx_user_status_date
    ON orders(user_id, status, created_at DESC);

-- Covering index (includes extra columns)
CREATE INDEX idx_user_status_date_covering
    ON orders(user_id, status, created_at DESC)
    INCLUDE (total, item_count);
```
Column order rules:
1. Equality filters first (user_id = ?)
2. Range filters next (created_at > ?)
3. ORDER BY columns last
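These rules can be checked empirically. A minimal sketch using Python's sqlite3 and EXPLAIN QUERY PLAN, with SQLite standing in for PostgreSQL (table and index names mirror the example above):

```python
import sqlite3

# Composite index ordered per the rules: equality columns first
# (user_id, status), then the range/ORDER BY column (created_at).
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        user_id INTEGER,
        status TEXT,
        created_at TEXT
    )
""")
conn.execute(
    "CREATE INDEX idx_user_status_date ON orders(user_id, status, created_at)"
)

plan = conn.execute("""
    EXPLAIN QUERY PLAN
    SELECT * FROM orders
    WHERE user_id = 42 AND status = 'pending' AND created_at > '2024-01-01'
    ORDER BY created_at
""").fetchall()
detail = plan[0][3]
print(detail)  # a SEARCH using idx_user_status_date, no separate sort step
```

Swap the column order in the index (range column first) and the plan degrades: the equality filters can no longer narrow the scan.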
Index only a subset of rows:
```sql
-- Only index active orders (90% are completed)
CREATE INDEX idx_active_orders
    ON orders(user_id, created_at)
    WHERE status IN ('pending', 'processing');
-- 10x smaller index, faster queries for active orders
```
Benefits:
- Smaller index size
- Faster writes (fewer index updates)
- Better cache hit rate
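SQLite supports the same partial-index syntax, so the effect can be sketched with Python's sqlite3. The predicate is narrowed to a single status here so the planner's implication check clearly applies; names mirror the example above:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER,"
    " status TEXT, created_at TEXT)"
)
# Partial index: only rows still in flight are indexed
conn.execute("""
    CREATE INDEX idx_active_orders ON orders(user_id, created_at)
    WHERE status = 'pending'
""")

# The planner may use a partial index only when the query's WHERE
# clause implies the index predicate, as it does here.
plan = conn.execute("""
    EXPLAIN QUERY PLAN
    SELECT * FROM orders
    WHERE user_id = 42 AND status = 'pending'
""").fetchall()
print(plan[0][3])  # a SEARCH using idx_active_orders
```

A query filtering on status = 'completed' would not imply the predicate and would fall back to a full scan (or another index).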
Index on computed expressions:
```sql
-- Index for case-insensitive email lookup
CREATE INDEX idx_email_lower ON users(LOWER(email));

-- Query must use the same expression
SELECT * FROM users WHERE LOWER(email) = '[email protected]';

-- JSONB field index
CREATE INDEX idx_metadata_type ON products((metadata->>'type'));
```
PostgreSQL built-in full-text search:
```sql
-- Add tsvector column
ALTER TABLE articles ADD COLUMN search_vector tsvector;

-- Trigger to keep the search vector up to date
CREATE TRIGGER articles_search_update
    BEFORE INSERT OR UPDATE ON articles
    FOR EACH ROW EXECUTE FUNCTION tsvector_update_trigger(
        search_vector, 'pg_catalog.english', title, body
    );

-- GIN index for full-text search
CREATE INDEX idx_articles_search ON articles USING GIN(search_vector);

-- Query with ranking
SELECT title, ts_rank(search_vector, query) AS rank
FROM articles, to_tsquery('english', 'postgresql & performance') query
WHERE search_vector @@ query
ORDER BY rank DESC;
```
```javascript
// BAD: 1 + N queries
const orders = await db.all('SELECT * FROM orders');
for (const o of orders) {
  // (parameterized -- never interpolate values into SQL strings)
  o.user = await db.get('SELECT * FROM users WHERE id = ?', o.userId);
}

// GOOD: 1 query with JOIN
const orders = await db.all(`
  SELECT o.*, u.name, u.email
  FROM orders o
  JOIN users u ON o.user_id = u.id
`);
```
Detection: APM tools (Datadog, New Relic) show “span repetition” in traces
SELECT *
Problem: Fetching unused columns wastes I/O and network bandwidth
```sql
-- BAD: fetches 50 columns, uses 2
SELECT * FROM users WHERE id = 42;

-- GOOD: explicit column list
SELECT id, name, email FROM users WHERE id = 42;
```
Impact: an index-only scan (covering index) is possible only when the query names explicit columns.
Missing WHERE in UPDATE/DELETE
Problem: Accidentally modifying all rows
```sql
-- DANGEROUS: updates every row!
UPDATE orders SET status = 'cancelled';
-- Missing: WHERE id = 123

-- SAFE: always include WHERE
UPDATE orders SET status = 'cancelled' WHERE id = 123;
```
Enable sql_safe_updates in MySQL or use transactions with explicit commit to prevent accidental bulk modifications.
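The transaction-based guard can be sketched in Python with sqlite3: run the UPDATE inside a transaction, inspect the affected-row count, and roll back if it is unexpectedly large (the threshold and schema are illustrative):

```python
import sqlite3

EXPECTED_MAX = 1  # an UPDATE by primary key should touch at most one row

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT)")
conn.executemany("INSERT INTO orders (status) VALUES (?)",
                 [("pending",), ("pending",), ("shipped",)])
conn.commit()

cur = conn.execute("UPDATE orders SET status = 'cancelled'")  # forgot WHERE!
if cur.rowcount > EXPECTED_MAX:
    conn.rollback()  # undo the accidental bulk modification
else:
    conn.commit()

remaining = conn.execute(
    "SELECT COUNT(*) FROM orders WHERE status = 'cancelled'").fetchone()[0]
print(remaining)  # 0 -- the runaway UPDATE was rolled back
```

The same pattern works with any driver that reports affected rows (rowcount in the Python DB-API, ROW_COUNT() in MySQL).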
Implicit Type Conversion
Problem: WHERE clause type mismatch prevents index usage
```sql
-- BAD: user_id is an INT, compared to a STRING
SELECT * FROM orders WHERE user_id = '42';
-- If the database resolves the mismatch by casting the indexed
-- column on every row, the index is bypassed and the table is scanned

-- GOOD: matching types
SELECT * FROM orders WHERE user_id = 42;
```
```sql
-- Vacuum and analyze (PostgreSQL)
-- Reclaim space and update query planner statistics
VACUUM ANALYZE orders;

-- Auto-vacuum tuning for a write-heavy table
ALTER TABLE orders SET (
    autovacuum_vacuum_scale_factor = 0.05,
    autovacuum_analyze_scale_factor = 0.02
);

-- Rebuild an index to remove bloat, without blocking writes
REINDEX INDEX CONCURRENTLY idx_user_id;

-- Find unused indexes (never scanned), largest first
SELECT schemaname, relname, indexrelname, idx_scan, idx_tup_read
FROM pg_stat_user_indexes
WHERE idx_scan = 0 AND indexrelname NOT LIKE 'pg_toast%'
ORDER BY pg_relation_size(indexrelid) DESC;
```
Run VACUUM ANALYZE regularly on write-heavy tables. Stale statistics cause the query planner to choose suboptimal execution plans.
- S3/Object Storage: Binary files, backups, data lake
- Neo4j: Graph traversal, recommendations
Each additional database type adds operational complexity. Add specialized stores only when PostgreSQL + proper indexing is insufficient for that specific access pattern.