
SQL vs NoSQL

Choose SQL when data is relational, consistency is critical, and queries are ad-hoc. Choose NoSQL when access patterns are specific and predictable, horizontal write scalability is required, or data shape is inherently non-relational.

Decision Framework

// Use SQL (PostgreSQL) when:
→ Complex joins and ad-hoc queries
→ ACID multi-table transactions required
→ Schema is stable, well-understood

// Use NoSQL when:
→ Write throughput > single-node SQL capacity
→ Access pattern is key-value, range, or time-series
→ Data is documents, graphs, or IoT streams

SQL Databases

Strengths:
  • ACID transactions
  • Joins and foreign keys
  • Schema enforcement
  • Mature tooling and ecosystem
Examples: PostgreSQL, MySQL, SQL Server
Use for: Financial transactions, inventory, user accounts

NoSQL Databases

Strengths:
  • Horizontal write scalability
  • Flexible schema
  • Access-pattern optimized
  • High throughput
Types: Key-value (Redis), Document (MongoDB), Wide-column (Cassandra), Graph (Neo4j)
Use for: Caching, sessions, logs, metrics, social graphs
Start with PostgreSQL — it handles relational, JSONB, full-text search, and time-series workloads natively. Add specialized stores only when PostgreSQL becomes the proven bottleneck.
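To illustrate how far PostgreSQL alone can stretch, a single table can combine relational columns, JSONB documents, and full-text search. The table and index names here are hypothetical, a sketch rather than a prescribed schema:

```sql
-- Hypothetical products table: relational + document + search in one store
CREATE TABLE products (
  id         UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  name       TEXT NOT NULL,
  price      DECIMAL(10,2),
  attrs      JSONB,                      -- flexible, schema-less attributes
  created_at TIMESTAMP DEFAULT now()
);

-- GIN indexes accelerate both document queries and text search
CREATE INDEX idx_products_attrs ON products USING GIN (attrs);
CREATE INDEX idx_products_fts
  ON products USING GIN (to_tsvector('english', name));

-- Document-style query: products whose attrs contain {"color": "red"}
SELECT id, name FROM products WHERE attrs @> '{"color": "red"}';

-- Full-text query
SELECT id, name FROM products
WHERE to_tsvector('english', name) @@ to_tsquery('english', 'wireless & mouse');
```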

NoSQL Database Types

Key-Value Stores

Pattern: GET key → value
Examples: Redis, DynamoDB, Memcached
Best for:
  • Session storage
  • Cache layer
  • Rate limiting counters
  • Real-time leaderboards
# Redis examples
SET user:1001:session "jwt_token_here" EX 3600
GET user:1001:session
INCR rate_limit:user:1001
ZADD leaderboard 9500 "player123"
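The INCR-based rate limiting shown above can be sketched in application code. The following is a minimal in-memory model of a fixed-window limiter; a real deployment would issue INCR and EXPIRE against Redis instead of mutating a Map:

```javascript
// In-memory sketch of the Redis INCR fixed-window rate limiter.
// A real deployment would call Redis; this only models the counting logic.
function createRateLimiter(limit, windowMs) {
  const counters = new Map(); // key -> { count, resetAt }
  return function allow(key, now = Date.now()) {
    const entry = counters.get(key);
    if (!entry || now >= entry.resetAt) {
      // New window: equivalent to INCR on a fresh key followed by EXPIRE
      counters.set(key, { count: 1, resetAt: now + windowMs });
      return true;
    }
    entry.count += 1; // equivalent to INCR rate_limit:user:<id>
    return entry.count <= limit;
  };
}

const allow = createRateLimiter(3, 60_000);
const results = [1, 2, 3, 4].map(() => allow('user:1001', 0));
console.log(results); // [ true, true, true, false ] — 4th request rejected
```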

NewSQL: Best of Both Worlds

NewSQL databases (CockroachDB, Google Spanner, YugabyteDB) provide SQL semantics with NoSQL horizontal write scalability:
-- CockroachDB: distributed SQL with automatic sharding
CREATE TABLE orders (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  user_id UUID NOT NULL,
  total DECIMAL(10,2),
  created_at TIMESTAMP DEFAULT now()
);

-- Automatically distributed across nodes
-- ACID transactions across shards
-- SQL interface with joins
Consider NewSQL when you need SQL semantics (joins, transactions) but have outgrown single-node write capacity. Trade-off: increased latency for distributed consensus.

SQL Query Optimization

EXPLAIN ANALYZE

The primary tuning tool for understanding query execution:
-- Before optimization
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM orders WHERE user_id = 42 ORDER BY created_at DESC LIMIT 10;

-- Output shows:
Seq Scan on orders  (cost=0.00..45823.00 rows=1000000 width=120)
  Filter: (user_id = 42)
  Rows Removed by Filter: 999500
  Buffers: shared hit=8000 read=12000
Planning Time: 0.5 ms
Execution Time: 2847.3 ms
Sequential scan with high “Rows Removed by Filter” indicates a missing index. The database is scanning the entire table and discarding most rows.
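The fix is an index matching both the filter and the sort. After creating it, re-running the same EXPLAIN should show an index scan; the plan below is illustrative of the expected shape, not captured output, and actual costs depend on your data:

```sql
-- Add an index matching the WHERE and ORDER BY
CREATE INDEX idx_orders_user_created
  ON orders (user_id, created_at DESC);

EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM orders WHERE user_id = 42 ORDER BY created_at DESC LIMIT 10;

-- Expected plan shape (illustrative):
-- Limit
--   -> Index Scan using idx_orders_user_created on orders
--        Index Cond: (user_id = 42)
```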

Index Strategy

Default index type for most use cases:
-- Single-column index
CREATE INDEX idx_user_id ON orders(user_id);

-- Composite index (order matters!)
CREATE INDEX idx_user_status_date 
  ON orders(user_id, status, created_at DESC);

-- Covering index (includes extra columns)
CREATE INDEX idx_user_status_date_covering
  ON orders(user_id, status, created_at DESC)
  INCLUDE (total, item_count);
Column order rules:
  1. Equality filters first (user_id = ?)
  2. Range filters next (created_at > ?)
  3. ORDER BY last
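Applying those rules, a query with two equality filters, a range filter, and a sort maps onto the composite index above column by column (table and index names follow the earlier examples):

```sql
-- Matches idx_user_status_date's column order exactly,
-- so the filter, range, and sort are all index-driven
SELECT id, total
FROM orders
WHERE user_id = 42                               -- equality: leading column
  AND status = 'shipped'                         -- equality: second column
  AND created_at > now() - INTERVAL '30 days'    -- range: last column
ORDER BY created_at DESC                         -- satisfied by the DESC index
LIMIT 20;
```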

Common Query Anti-Patterns

Problem: N+1 queries — a loop that issues one additional query per row
// BAD: 1 + N queries (string interpolation here also invites SQL injection)
const orders = await db.all('SELECT * FROM orders');
for (const o of orders) {
  o.user = await db.get(`SELECT * FROM users WHERE id=${o.userId}`);
}

// GOOD: 1 query with JOIN
const orders = await db.all(`
  SELECT o.*, u.name, u.email
  FROM orders o
  JOIN users u ON o.user_id = u.id
`);
Detection: APM tools (Datadog, New Relic) surface N+1 patterns as runs of repeated, near-identical spans in traces.

Problem: SELECT * fetches unused columns, wasting I/O and network bandwidth
-- BAD: fetches 50 columns, uses 2
SELECT * FROM users WHERE id = 42;

-- GOOD: explicit column list
SELECT id, name, email FROM users WHERE id = 42;
Impact: index-only scans (covering indexes) are possible only with an explicit column list — SELECT * forces a heap fetch for every row.

Problem: a missing WHERE clause accidentally modifies all rows
-- DANGEROUS: updates every row!
UPDATE orders SET status = 'cancelled';
-- Missing: WHERE id = 123

-- Safe: always include WHERE
UPDATE orders SET status = 'cancelled' WHERE id = 123;
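One way to back up that WHERE-clause discipline: run the UPDATE inside an explicit transaction, check the reported row count, and only then commit. A sketch of the workflow in psql:

```sql
BEGIN;

UPDATE orders SET status = 'cancelled' WHERE id = 123;
-- Inspect the row count reported here: "UPDATE 1" is expected.
-- "UPDATE 500000" means the WHERE clause was wrong.

ROLLBACK;   -- undo if the count looks wrong
-- COMMIT;  -- otherwise make the change permanent
```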
Enable sql_safe_updates in MySQL, or wrap bulk changes in a transaction with an explicit COMMIT, to guard against accidental full-table modifications.

Problem: a type mismatch in the WHERE clause prevents index usage
-- BAD: phone is stored as VARCHAR but compared to a number;
-- the column must be cast for every row, so its index is unusable
SELECT * FROM users WHERE phone = 5551234;
-- Forces a sequential scan!

-- GOOD: matching types — the index on phone can be used
SELECT * FROM users WHERE phone = '5551234';

Database Maintenance

-- Vacuum and analyze (PostgreSQL)
-- Reclaim space and update query planner statistics
VACUUM ANALYZE orders;

-- Auto-vacuum configuration
ALTER TABLE orders SET (
  autovacuum_vacuum_scale_factor = 0.05,
  autovacuum_analyze_scale_factor = 0.02
);

-- Reindex (rebuild indexes to remove bloat)
REINDEX INDEX CONCURRENTLY idx_user_id;

-- Check index usage
SELECT schemaname, tablename, indexname, idx_scan, idx_tup_read
FROM pg_stat_user_indexes
WHERE idx_scan = 0
  AND indexname NOT LIKE 'pg_toast%'
ORDER BY pg_relation_size(indexrelid) DESC;
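Indexes surfaced by that query (zero scans, large on-disk size) are candidates for removal; dropping them concurrently avoids blocking writes. The index name below is hypothetical:

```sql
-- Verify the index is unused on ALL nodes before dropping:
-- pg_stat_user_indexes is per-instance, and a read replica
-- may be the only consumer of an index.
DROP INDEX CONCURRENTLY IF EXISTS idx_orders_legacy_status;
```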
Run VACUUM ANALYZE regularly on write-heavy tables. Stale statistics cause the query planner to choose suboptimal execution plans.

Connection Pooling

Database connections are expensive: each PostgreSQL connection is a separate backend process. Connection poolers sit between the application and the database, multiplexing many client connections over a small pool of server connections.
PgBouncer is a lightweight connection pooler for PostgreSQL:
# pgbouncer.ini
[databases]
mydb = host=postgres port=5432 dbname=production

[pgbouncer]
listen_addr = 0.0.0.0
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt

# Pool settings
pool_mode = transaction        # or session, statement
max_client_conn = 10000       # app can have 10k connections
default_pool_size = 25        # but only 25 to PostgreSQL
reserve_pool_size = 5
reserve_pool_timeout = 3
Pool modes:
  • session: connection held for entire client session
  • transaction: connection returned after each transaction (recommended)
  • statement: connection returned after each statement (breaks transactions)
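The recommended transaction mode can be sketched generically: a client borrows a connection only for the duration of one transaction, so a small pool serves many concurrent clients. This toy in-memory model illustrates only the accounting; a real setup would use PgBouncer or a driver-level pool:

```javascript
// Toy transaction-mode pool: connections are checked out per transaction,
// not per client session, so N clients share far fewer connections.
class TransactionPool {
  constructor(size) {
    this.free = Array.from({ length: size }, (_, i) => ({ id: i }));
    this.inUse = 0;
    this.peakInUse = 0;
  }
  async withTransaction(work) {
    while (this.free.length === 0) {
      await new Promise((r) => setTimeout(r, 1)); // wait for a free conn
    }
    const conn = this.free.pop();
    this.inUse += 1;
    this.peakInUse = Math.max(this.peakInUse, this.inUse);
    try {
      return await work(conn); // BEGIN ... COMMIT would happen here
    } finally {
      this.inUse -= 1;
      this.free.push(conn); // returned as soon as the transaction ends
    }
  }
}

// 100 concurrent "clients" share a pool of only 5 connections
async function demo() {
  const pool = new TransactionPool(5);
  await Promise.all(
    Array.from({ length: 100 }, () =>
      pool.withTransaction(async () => { /* query work */ })
    )
  );
  console.log(pool.peakInUse); // prints 5 — never exceeds the pool size
}
demo();
```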

Polyglot Persistence

Use specialized databases for specific access patterns:
// Typical microservice data architecture
       ┌───────────────┐
       │  API Gateway  │
       └───────┬───────┘
               │
     ┌─────────┼─────────┬──────────┐
     │         │         │          │
┌────▼───┐ ┌───▼────┐ ┌──▼─────┐ ┌──▼──────┐
│ Users  │ │ Orders │ │Product │ │Analytics│
│Service │ │Service │ │Service │ │ Service │
└────┬───┘ └───┬────┘ └──┬─────┘ └──┬──────┘
     │         │         │          │
┌────▼───┐ ┌───▼────┐ ┌──▼─────┐ ┌──▼───────┐
│Postgres│ │Postgres│ │Elastic-│ │ClickHouse│
│        │ │        │ │search  │ │  (OLAP)  │
└────┬───┘ └───┬────┘ └────────┘ └──────────┘
     │         │
     └────┬────┘
          │
     ┌────▼────┐
     │  Redis  │  (shared cache + sessions)
     └─────────┘

When to Add Specialized Stores

  • Elasticsearch: Full-text search, log aggregation, analytics
  • Redis: Caching, sessions, rate limiting, real-time leaderboards
  • ClickHouse: OLAP analytics, time-series aggregation
  • S3/Object Storage: Binary files, backups, data lake
  • Neo4j: Graph traversal, recommendations
Each additional database type adds operational complexity. Add specialized stores only when PostgreSQL + proper indexing is insufficient for that specific access pattern.

Next Steps

Caching

Reduce database load with multi-layer caching strategies

Scalability

Learn about sharding, replication, and horizontal scaling

Load Balancing

Distribute database reads across replicas effectively

Availability

High availability through replication and failover
