Introduction
In the real world, many things can go wrong when working with data:
- The database software or hardware may fail at any time (including in the middle of a write operation)
- The application may crash at any time (including in the middle of a series of operations)
- Network interruptions can unexpectedly cut off the application from the database, or one database node from another
- Several clients may write to the database at the same time, overwriting each other’s changes
- A client may read data that doesn’t make sense because it has only partially been updated
- Race conditions between clients can cause surprising bugs
Why transactions?
Transactions simplify the programming model for applications accessing a database. By using transactions, the application can pretend that certain concurrency problems and certain kinds of hardware and software faults don’t exist. A large class of errors is reduced to a simple transaction abort, and the application just needs to retry. Without transactions, you need to worry about:
- What happens if the database crashes while writing multiple records?
- What happens if two clients try to update the same data at the same time?
- How to handle partial failures in complex operations?
The ACID properties
The safety guarantees provided by transactions are often described by the acronym ACID: Atomicity, Consistency, Isolation, and Durability. However, in practice, these terms are somewhat ambiguous and implementations vary. Let’s examine each property in detail.
Atomicity
Atomicity means that a transaction is treated as a single, indivisible unit of work. Either all of its operations succeed (and the transaction commits), or none of them do (the transaction aborts and all changes are rolled back).
Example: bank transfer
The classic example is transferring money between bank accounts.
Key points:
- If any operation in the transaction fails, all previous operations are undone
- The database guarantees that you never end up in a state where money was deducted but not added
- This is sometimes called the all-or-nothing guarantee
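The transfer described above can be sketched with Python’s sqlite3. This is a minimal sketch: the table, account ids, and balances are invented for the example.

```python
# Sketch of an atomic bank transfer using sqlite3 (table/ids are made up).
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()

def transfer(conn, src, dst, amount):
    try:
        with conn:  # opens a transaction; commits on success, rolls back on error
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?",
                         (amount, src))
            (balance,) = conn.execute(
                "SELECT balance FROM accounts WHERE id = ?", (src,)).fetchone()
            if balance < 0:
                raise ValueError("insufficient funds")  # triggers rollback
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?",
                         (amount, dst))
    except ValueError:
        return False
    return True

assert transfer(conn, 1, 2, 30) is True     # both updates commit together
assert transfer(conn, 1, 2, 1000) is False  # aborts: the deduction is rolled back
balances = dict(conn.execute("SELECT id, balance FROM accounts"))
assert balances == {1: 70, 2: 80}
```

The `with conn:` block is what provides the all-or-nothing behavior here: if any statement inside it raises, every change made in the transaction is rolled back.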
Abort and retry
If a transaction aborts, the application can safely retry it. However, retry logic isn’t perfect:
- If the transaction actually committed but the network failed while the server was acknowledging it, retrying performs the work twice
- If the error is due to overload, retrying makes the problem worse (so limit retries and use exponential backoff)
- Retrying only makes sense for transient errors (e.g. deadlock, network interruption), not permanent ones (e.g. constraint violation)
Consistency
Consistency in ACID is actually a property of the application, not the database. The database provides mechanisms (constraints, triggers), but the application is responsible for defining what “consistency” means.
Key points:
- The “C” in ACID is somewhat misnamed - it’s really about application correctness
- Atomicity, Isolation, and Durability are properties of the database
- Consistency is a property of the application
- The application relies on the database’s atomicity and isolation to achieve consistency
Isolation
Isolation means that concurrently executing transactions are isolated from each other. Each transaction can pretend that it’s the only transaction running on the entire database.
The problem: concurrent transactions
Without isolation, concurrent transactions can interfere with each other in confusing ways. For example, one transaction’s write can silently overwrite another’s, as if the first write never happened. This is called a lost update.
Isolation levels
In practice, full isolation is expensive (requires serialization, which hurts performance). Most databases offer several isolation levels, trading off consistency guarantees for performance. We’ll explore each isolation level in detail in the following sections.
Durability
Durability is the promise that once a transaction has committed successfully, any data it has written will not be forgotten, even if there’s a hardware fault or the database crashes.
How durability works
Single-node database:
- Write data to nonvolatile storage (typically using a write-ahead log) before acknowledging the commit
Replicated database:
- Write data to multiple nodes
- Only acknowledge the commit after the data has been written to multiple nodes
- If one node fails, the data survives on the other nodes
Durability limitations
Trade-offs:
- Writing to disk on every commit costs performance, so some databases offer weaker, configurable durability guarantees
- Perfect durability does not exist: disks fail, firmware has bugs, and a correlated fault (fire, power loss) can destroy all replicas at once
Isolation levels and concurrency problems
Now let’s dive deep into isolation - the most complex of the ACID properties. We’ll explore the problems that can occur when transactions run concurrently, and the isolation levels that prevent them.
Read committed
Read Committed is the most basic level of transaction isolation. It makes two guarantees:
- No dirty reads: Only read data that has been committed
- No dirty writes: Only overwrite data that has been committed
No dirty reads
Dirty read: Reading uncommitted data written by another transaction. Read Committed prevents this: while a write is in progress, other transactions continue to see the old committed value, and only see the new value once the writer commits.
No dirty writes
Dirty write: Overwriting uncommitted data written by another transaction. Databases usually prevent dirty writes by making the second writer wait (typically via row-level locks) until the first transaction commits or aborts. Most databases use Read Committed as the default isolation level (e.g. PostgreSQL, Oracle, SQL Server).
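The “no dirty reads” guarantee can be observed from Python using SQLite. This is a minimal sketch (database path and table name are made up), using WAL mode so the reader isn’t blocked while the writer’s transaction is still open:

```python
# Demonstrating "no dirty reads": a reader never sees another
# transaction's uncommitted write.
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.db")
writer = sqlite3.connect(path, isolation_level=None)  # manual transaction control
reader = sqlite3.connect(path, isolation_level=None)
writer.execute("PRAGMA journal_mode=WAL")  # readers don't block the writer

writer.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER)")
writer.execute("INSERT INTO accounts VALUES (1, 100)")

writer.execute("BEGIN")
writer.execute("UPDATE accounts SET balance = 0 WHERE id = 1")

# The reader only sees committed data: still 100, not the in-flight 0.
dirty = reader.execute("SELECT balance FROM accounts WHERE id = 1").fetchone()[0]

writer.execute("COMMIT")
# After the commit, the new value becomes visible to the reader.
committed = reader.execute("SELECT balance FROM accounts WHERE id = 1").fetchone()[0]
assert (dirty, committed) == (100, 0)
```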
Snapshot isolation (repeatable read)
Read Committed prevents dirty reads, but still allows non-repeatable reads.
Non-repeatable read: The same query returns different results within a transaction.
Solution: Snapshot Isolation (also called Repeatable Read). Each transaction reads from a consistent snapshot of the database - the transaction sees all data that was committed at the start of the transaction, plus its own uncommitted writes.
Why snapshot isolation is important
Snapshot Isolation solves critical real-world problems that Read Committed cannot handle.
Real-world scenario 1: Database Backup
Without snapshot isolation, backups are unreliable: a backup that takes hours to copy the database may capture some tables before a batch of writes and other tables after it, so the backup as a whole never corresponded to any real state of the database.
When Snapshot Isolation is Critical
Use Cases Requiring Snapshot Isolation:
- Long-running analytics queries: Need consistent view even when transaction runs for minutes or hours
- Database backups: Must capture consistent state across all tables
- Data integrity checks: Validation must see stable data
- Report generation: Numbers must add up consistently
- ETL processes: Extract data in consistent state
- Batch processing: Process consistent dataset
When Read Committed Is Sufficient:
- Short transactions: If your transaction only does a single read or write, Read Committed is sufficient
- No multi-step reads: If you don’t read the same data twice in a transaction
- When you need to see latest data: Snapshot isolation shows data at start of transaction, not latest
Multi-version concurrency control (MVCC)
Implementation: Keep multiple versions of each object, tagged with the transaction ID that created them. Benefits:
- Long-running read transactions don’t block writes
- Writes don’t block reads
- Better performance for read-heavy workloads
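A toy illustration of the idea - not how any particular database implements it, and it ignores commit/abort bookkeeping by treating every write as committed the moment it happens. Each write appends a version tagged with the writer’s transaction ID, and each reader only sees versions from transactions at or before its own snapshot:

```python
# A toy MVCC store: multiple versions per key, visibility by transaction ID.
class MVCCStore:
    def __init__(self):
        self.versions = {}   # key -> append-only list of (txid, value)
        self.next_txid = 1

    def begin(self):
        txid = self.next_txid
        self.next_txid += 1
        return txid          # snapshot = all writes by transactions <= txid

    def write(self, txid, key, value):
        self.versions.setdefault(key, []).append((txid, value))

    def read(self, txid, key):
        # Newest version created by a transaction within our snapshot.
        visible = [v for (t, v) in self.versions.get(key, []) if t <= txid]
        return visible[-1] if visible else None

store = MVCCStore()
t1 = store.begin()
store.write(t1, "x", "old")
t2 = store.begin()            # t2's snapshot includes t1's write
t3 = store.begin()
store.write(t3, "x", "new")   # t3 creates a newer version
assert store.read(t2, "x") == "old"   # t2 keeps seeing its snapshot
t4 = store.begin()
assert store.read(t4, "x") == "new"   # a fresh transaction sees the new version
```

Because old versions are retained rather than overwritten, the long-running reader (t2) and the writer (t3) never block each other.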
Lost updates
Even with snapshot isolation, some problems remain. One important one is lost updates.
Lost update: Two transactions read a value, modify it, and write it back, with one modification getting lost.
Solutions to lost updates
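Before the solutions, the failure mode itself. This is a deterministic sketch in plain Python (no real database), with the two read-modify-write cycles interleaved by hand in the problematic order:

```python
# A lost update: two clients each read a counter, increment it in
# application code, and write it back. One increment vanishes.
counter = {"likes": 10}

# Both transactions read before either writes (the problematic interleaving).
read_by_tx1 = counter["likes"]
read_by_tx2 = counter["likes"]

counter["likes"] = read_by_tx1 + 1   # tx1 writes 11
counter["likes"] = read_by_tx2 + 1   # tx2 also writes 11, clobbering tx1

assert counter["likes"] == 11        # two increments, only one took effect
```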
Solution 1: Atomic Write Operations
Many databases provide atomic update operations that remove the read-modify-write cycle from application code, e.g. UPDATE counters SET value = value + 1. These are usually implemented by taking an exclusive lock on the object while it is read and updated.
Write skew and phantoms
Write skew is a generalization of lost updates. It happens when two transactions read the same objects, then update some of those objects (different ones). Because they update different objects, neither transaction sees a conflict, yet an invariant is violated.
Example: on-call doctor scheduling
Business rule: At least one doctor must be on call at all times.
Solutions to write skew
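To make the failure concrete before looking at fixes, here is a deterministic sketch of that race. The doctor names are invented, and plain dictionaries stand in for rows and per-transaction snapshots:

```python
# Write skew under snapshot isolation: both transactions check the invariant
# against the same snapshot, then update *different* rows, so neither
# conflicts with the other - yet the invariant is violated.
on_call = {"alice": True, "bob": True}

snapshot_tx1 = dict(on_call)   # each transaction reads a consistent snapshot
snapshot_tx2 = dict(on_call)

# Each doctor checks "at least one other doctor is on call" in their snapshot.
if sum(snapshot_tx1.values()) >= 2:
    on_call["alice"] = False   # tx1: Alice goes off call
if sum(snapshot_tx2.values()) >= 2:
    on_call["bob"] = False     # tx2: Bob goes off call

# Both checks passed, both writes "committed", and now nobody is on call.
assert sum(on_call.values()) == 0
```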
Solution 1: Serializable Isolation
Use the strongest isolation level (covered in the next section).
Solution 2: Explicit Locks
Explicitly lock the rows that the transaction depends on (e.g. SELECT ... FOR UPDATE), so that concurrent transactions are forced to wait instead of proceeding on stale reads.
Serializable isolation
Serializable isolation is the strongest isolation level. It guarantees that even though transactions may execute in parallel, the end result is the same as if they had executed one at a time, in some serial order. Three main techniques for implementing serializability: actual serial execution, two-phase locking, and serializable snapshot isolation.
Actual serial execution
The simplest way to avoid concurrency problems: don’t allow concurrency! Execute one transaction at a time, in serial order. This seems crazy (throwing away concurrency), but it’s viable if:
- Transactions are very fast (no slow I/O)
- Dataset fits in memory (no disk seeks)
- Single-threaded CPU can handle the throughput
Downsides:
- Throughput limited to a single CPU core
- Can’t do slow I/O (network requests, disk seeks)
- Multi-partition transactions expensive (coordination needed)
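The approach can be sketched as a single worker thread draining a queue of transactions, so each transaction runs to completion before the next one starts. This is a toy simplification of the stored-procedure designs real systems use:

```python
# Serial execution: one worker thread applies whole transactions in order,
# so no locks or conflict detection are needed.
import queue
import threading

store = {"x": 0}
tasks = queue.Queue()

def worker():
    while True:
        tx = tasks.get()
        if tx is None:        # sentinel: shut down
            break
        tx(store)             # the whole transaction runs without interleaving
        tasks.task_done()

t = threading.Thread(target=worker)
t.start()
for _ in range(1000):
    # Each submitted transaction is a read-modify-write of the counter.
    tasks.put(lambda s: s.update(x=s["x"] + 1))
tasks.put(None)
t.join()

assert store["x"] == 1000     # no lost updates, despite 1000 racing submissions
```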
Two-phase locking (2PL)
For decades, this was the standard way to implement serializability; its locks are stronger than those used to prevent dirty writes. Rules:
- If a transaction wants to read an object, it must acquire a shared lock
- Multiple transactions can hold shared lock simultaneously
- If transaction wants to write an object, must acquire exclusive lock
- No other locks (shared or exclusive) can be held simultaneously
- If transaction holds a lock, it holds it until transaction commits or aborts (two-phase: acquire locks, then release all at end)
Downsides:
- Poor concurrency: Readers and writers block each other
- Deadlocks: Two transactions waiting for each other’s locks
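The shared/exclusive locks at the heart of 2PL can be sketched with a condition variable. This is a toy: a real lock manager keeps a lock table keyed by object and adds deadlock detection:

```python
# A minimal shared/exclusive lock of the kind 2PL is built on.
import threading

class SharedExclusiveLock:
    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0
        self._writer = False

    def acquire_shared(self):
        with self._cond:
            while self._writer:                   # readers wait for the writer
                self._cond.wait()
            self._readers += 1

    def release_shared(self):
        with self._cond:
            self._readers -= 1
            if self._readers == 0:
                self._cond.notify_all()

    def acquire_exclusive(self):
        with self._cond:
            while self._writer or self._readers:  # writer waits for everyone
                self._cond.wait()
            self._writer = True

    def release_exclusive(self):
        with self._cond:
            self._writer = False
            self._cond.notify_all()

lock = SharedExclusiveLock()
lock.acquire_shared()
lock.acquire_shared()        # multiple readers may hold the lock together
lock.release_shared()
lock.release_shared()
lock.acquire_exclusive()     # a writer holds it alone
lock.release_exclusive()
```

Under 2PL, a transaction would acquire these locks as it touches objects and release all of them only at commit or abort, which is exactly why readers and writers block each other.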
Serializable snapshot isolation (SSI)
A newer algorithm that provides serializable isolation with better performance than 2PL. Key idea: Use snapshot isolation, but detect when serializability has been violated and abort the offending transactions. How it works:
- Transactions execute using snapshot isolation (MVCC)
- Database tracks reads and writes to detect conflicts
- When a potential conflict detected, abort one transaction
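A toy sketch of the detection idea - greatly simplified (real SSI also tracks predicate and index-range reads): each transaction runs against a snapshot, records what it read, and at commit time aborts if anything it read has since been overwritten:

```python
# Toy SSI-style optimistic concurrency: validate reads at commit time.
class SSIDatabase:
    def __init__(self):
        self.data = {}
        self.commit_count = {}   # key -> number of committed writes so far

    def begin(self):
        return {"snapshot": dict(self.data), "read_versions": {}, "writes": {}}

    def read(self, tx, key):
        tx["read_versions"][key] = self.commit_count.get(key, 0)
        return tx["snapshot"].get(key)

    def write(self, tx, key, value):
        tx["writes"][key] = value    # buffered until commit

    def commit(self, tx):
        # Abort if anything this transaction read was committed since.
        for key, version in tx["read_versions"].items():
            if self.commit_count.get(key, 0) != version:
                return False         # conflict detected: abort, caller retries
        for key, value in tx["writes"].items():
            self.data[key] = value
            self.commit_count[key] = self.commit_count.get(key, 0) + 1
        return True

db = SSIDatabase()
db.data["x"] = 1
t1, t2 = db.begin(), db.begin()
db.write(t1, "x", db.read(t1, "x") + 1)
db.write(t2, "x", db.read(t2, "x") + 1)
assert db.commit(t1) is True
assert db.commit(t2) is False   # t2's read of x is now stale, so it aborts
```

Note the trade-off the section describes: neither transaction ever blocked, but the loser only finds out at commit time and must retry.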
Summary
Transactions are fundamental to reliable database systems. Key takeaways:
ACID Properties:
- Atomicity: All or nothing - transaction either completes fully or not at all
- Consistency: Application-defined invariants maintained
- Isolation: Concurrent transactions don’t interfere with each other
- Durability: Committed data persists even after crashes
Isolation levels:
| Level | Prevents Dirty Reads | Prevents Lost Updates | Prevents Write Skew | Performance |
|---|---|---|---|---|
| Read Committed | ✓ | ❌ | ❌ | High |
| Snapshot Isolation | ✓ | Sometimes | ❌ | Medium |
| Serializable | ✓ | ✓ | ✓ | Low |
Choosing an isolation level:
- Read Committed: Default for most applications, good balance
- Snapshot Isolation: Long-running analytics, backups, reports
- Serializable: Critical operations requiring strongest guarantees