Skip to main content

Index Fundamentals

Why Use Indexes?

Indexes are data structures that help MySQL find data records quickly, similar to a book’s table of contents. Without Index: Full table scan required
  • Sequential reads from disk
  • Multiple I/O operations
  • Time complexity: O(n)
With Index: Direct location via B+Tree
  • Logarithmic search time
  • Reduced disk I/O
  • Time complexity: O(log n)
The primary goal of indexes is to reduce disk I/O operations and accelerate query speed.

Index Data Structures

B+Tree Structure

MySQL InnoDB uses B+Tree as the default index structure:
Root Node (Level 2)
    ├── Internal Node (Level 1)
    │     ├── Leaf Node (Level 0) → Data Records
    │     └── Leaf Node (Level 0) → Data Records
    └── Internal Node (Level 1)
          ├── Leaf Node (Level 0) → Data Records
          └── Leaf Node (Level 0) → Data Records
B+Tree Characteristics:
  • All data stored in leaf nodes
  • Leaf nodes linked as doubly-linked list
  • Internal nodes only store keys and pointers
  • Height typically 3-4 levels for millions of records
Why B+Tree is Fast:
  • With 100 records per leaf node and 1000 keys per internal node:
    • Level 1: 1,000 records
    • Level 2: 100,000 records
    • Level 3: 10,000,000 records
    • Level 4: 1,000,000,000 records

Clustered vs Non-Clustered Indexes

Clustered Index (Primary Key)

CREATE TABLE users (
    id INT PRIMARY KEY,      -- Clustered index
    name VARCHAR(50),
    age INT
);
Characteristics:
  • Data stored with index (index is data)
  • One per table (typically primary key)
  • Leaf nodes contain complete row data
  • Fast for primary key lookups

Secondary Index (Non-Clustered)

-- Create secondary index
CREATE INDEX idx_name ON users(name);
Characteristics:
  • Leaf nodes store primary key values
  • Multiple per table allowed
  • Requires “back to table” (回表) operation
  • Stores: index column + primary key

Index Types

Single Column Index

-- Create index on single column
CREATE INDEX idx_salary ON employees(salary);

-- Use in query
SELECT * FROM employees WHERE salary > 8000;

Composite Index

-- Create index on multiple columns
CREATE INDEX idx_dept_salary ON employees(department_id, salary);

-- Follows leftmost prefix rule
SELECT * FROM employees 
WHERE department_id = 10 AND salary > 5000;  -- Uses index

SELECT * FROM employees 
WHERE salary > 5000;  -- Doesn't use index (missing leftmost column)
Leftmost Prefix Rule: Composite index (a, b, c) works for:
  • (a)
  • (a, b)
  • (a, b, c)
But NOT for: (b), (c), (b, c)

Unique Index

-- Enforce uniqueness
CREATE UNIQUE INDEX idx_email ON users(email);

-- Or in table definition
CREATE TABLE users (
    id INT PRIMARY KEY,
    email VARCHAR(100) UNIQUE
);

Query Optimization

Index Optimization Strategies

1. Covering Index

-- Create covering index
CREATE INDEX idx_dept_salary_name 
ON employees(department_id, salary, name);

-- Query only uses index columns (no table access needed)
SELECT department_id, salary, name
FROM employees
WHERE department_id = 10;
Covering Index: When all queried columns exist in the index, avoiding table access.

2. Index Hints

-- Force index usage
SELECT * FROM employees 
FORCE INDEX (idx_salary)
WHERE salary > 8000;

-- Ignore specific index
SELECT * FROM employees 
IGNORE INDEX (idx_age)
WHERE age > 30;

3. Avoid Index Failure

-- BAD: Function on indexed column
SELECT * FROM employees WHERE YEAR(hire_date) = 2020;

-- GOOD: Rewrite without function
SELECT * FROM employees 
WHERE hire_date >= '2020-01-01' AND hire_date < '2021-01-01';

-- BAD: Implicit type conversion
SELECT * FROM users WHERE phone = 13800138000;  -- phone is VARCHAR

-- GOOD: Explicit type match
SELECT * FROM users WHERE phone = '13800138000';

-- BAD: Leading wildcard
SELECT * FROM users WHERE name LIKE '%zhang%';

-- GOOD: Prefix match
SELECT * FROM users WHERE name LIKE 'zhang%';

EXPLAIN Analysis

EXPLAIN SELECT * FROM employees WHERE salary > 8000;
Key EXPLAIN Columns:
ColumnDescriptionImportant Values
typeJoin typeALL (worst) → index → range → ref → const (best)
keyIndex usedNULL means no index
rowsRows examinedLower is better
ExtraAdditional info”Using index” is good, “Using filesort” is bad
EXPLAIN SELECT id, name FROM users WHERE id = 1;
-- type: const
-- key: PRIMARY
-- rows: 1
-- Extra: NULL

Transaction Management

ACID Properties

Atomicity

All operations succeed or all fail. No partial completion.

Consistency

Database transitions from one valid state to another.

Isolation

Concurrent transactions don’t interfere with each other.

Durability

Committed changes persist even after system failure.

Transaction Operations

-- Start transaction
START TRANSACTION;
-- or
BEGIN;

-- Execute operations
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
UPDATE accounts SET balance = balance + 100 WHERE id = 2;

-- Commit changes
COMMIT;

-- Or rollback if error
ROLLBACK;

Savepoints

START TRANSACTION;

UPDATE accounts SET balance = balance - 100 WHERE id = 1;
SAVEPOINT sp1;

UPDATE accounts SET balance = balance + 100 WHERE id = 2;
SAVEPOINT sp2;

UPDATE accounts SET balance = balance - 50 WHERE id = 3;

-- Rollback to savepoint
ROLLBACK TO sp2;

COMMIT;

Isolation Levels

-- Check current isolation level
SELECT @@transaction_isolation;

-- Set isolation level
SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED;
Isolation LevelDirty ReadNon-Repeatable ReadPhantom Read
READ UNCOMMITTED
READ COMMITTED
REPEATABLE READ (default)
SERIALIZABLE
MySQL default: REPEATABLE READ PostgreSQL/Oracle default: READ COMMITTED

Locking Mechanisms

Row-Level Locks

-- Shared lock (S lock) - Read lock
SELECT * FROM users WHERE id = 1 LOCK IN SHARE MODE;

-- Exclusive lock (X lock) - Write lock
SELECT * FROM users WHERE id = 1 FOR UPDATE;

Deadlock Example

-- Transaction 1
START TRANSACTION;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;  -- Locks row 1
UPDATE accounts SET balance = balance + 100 WHERE id = 2;  -- Waits for row 2

-- Transaction 2 (concurrent)
START TRANSACTION;
UPDATE accounts SET balance = balance - 50 WHERE id = 2;   -- Locks row 2
UPDATE accounts SET balance = balance + 50 WHERE id = 1;   -- Waits for row 1 → DEADLOCK!
Avoiding Deadlocks:
  • Access tables/rows in same order
  • Keep transactions short
  • Use appropriate isolation levels
  • Set lock timeout values

Storage Engine: InnoDB

Page Structure

InnoDB manages data in pages (default 16KB):
Page (16KB)
├── Page Header
├── Infimum + Supremum Records
├── User Records (data rows)
├── Free Space
├── Page Directory
└── Page Trailer

Buffer Pool

-- Check buffer pool size
SHOW VARIABLES LIKE 'innodb_buffer_pool_size';

-- Set buffer pool size (in bytes)
SET GLOBAL innodb_buffer_pool_size = 2147483648;  -- 2GB
Buffer Pool Benefits:
  • Caches frequently accessed pages
  • Reduces disk I/O
  • Improves query performance
Recommended buffer pool size: 60-80% of available RAM for dedicated DB servers

Performance Optimization

Query Optimization Techniques

1. Limit Result Sets

-- Use LIMIT for pagination
SELECT * FROM users 
ORDER BY created_at DESC 
LIMIT 20 OFFSET 0;

-- Better: Use WHERE with index
SELECT * FROM users 
WHERE id > 1000
ORDER BY id 
LIMIT 20;

2. Avoid SELECT *

-- BAD
SELECT * FROM users WHERE id = 1;

-- GOOD
SELECT id, name, email FROM users WHERE id = 1;

3. Use EXISTS Instead of IN

-- Less efficient
SELECT * FROM employees 
WHERE department_id IN (SELECT id FROM departments WHERE name = 'IT');

-- More efficient
SELECT * FROM employees e
WHERE EXISTS (
    SELECT 1 FROM departments d 
    WHERE d.id = e.department_id AND d.name = 'IT'
);

4. Optimize JOIN Operations

-- Use small table as driving table
SELECT /*+ LEADING(d e) */ e.name, d.department_name
FROM departments d
INNER JOIN employees e ON d.id = e.department_id
WHERE d.name = 'IT';
Join Limit: Avoid joining more than 3 tables. For complex queries, consider:
  • Denormalization
  • Caching
  • Query splitting

Index Design Principles

✓ Columns in WHERE clauses ✓ Columns in JOIN conditions ✓ Columns in ORDER BY/GROUP BY ✓ Foreign key columns ✓ Frequently queried columns
✗ Columns with few distinct values ✗ Tables with frequent INSERT/UPDATE/DELETE ✗ Columns rarely used in queries ✗ Small tables (< 1000 rows) ✗ Columns with large data types (TEXT, BLOB)
Order columns by:
  1. Equality conditions (equals)
  2. Range conditions (greater than, less than, BETWEEN)
  3. Sort/group columns (ORDER BY, GROUP BY)
  4. High cardinality first

Monitoring and Maintenance

Slow Query Log

-- Enable slow query log
SET GLOBAL slow_query_log = ON;
SET GLOBAL long_query_time = 2;  -- Queries > 2 seconds
SET GLOBAL slow_query_log_file = '/var/log/mysql/slow.log';

-- Check slow queries
SHOW GLOBAL STATUS LIKE 'Slow_queries';

Index Statistics

-- Analyze table to update statistics
ANALYZE TABLE employees;

-- Show index cardinality
SHOW INDEX FROM employees;

-- Check index usage
SELECT * FROM sys.schema_unused_indexes;

Table Optimization

-- Optimize table (reclaim space, rebuild indexes)
OPTIMIZE TABLE employees;

-- Check table integrity
CHECK TABLE employees;

-- Repair corrupted table
REPAIR TABLE employees;

Best Practices Summary

Index Strategy

  • Create indexes on frequently queried columns
  • Use composite indexes wisely
  • Monitor index usage regularly
  • Remove unused indexes

Query Optimization

  • Use EXPLAIN to analyze queries
  • Avoid SELECT *
  • Limit result sets
  • Optimize JOIN operations

Transaction Management

  • Keep transactions short
  • Use appropriate isolation levels
  • Handle deadlocks gracefully
  • Commit or rollback explicitly

Performance Monitoring

  • Enable slow query log
  • Monitor buffer pool hit rate
  • Track query execution time
  • Regular maintenance tasks

Next Steps

MySQL Basics

Review fundamental concepts

Redis

Learn Redis for caching strategies

Build docs developers (and LLMs) love