CAP theorem
Source: CAP theorem revisited
In a distributed computer system, you can simultaneously support only two of the following three guarantees:
- Consistency - Every read receives the most recent write or an error
- Availability - Every request receives a response, without guarantee that it contains the most recent version of the information
- Partition Tolerance - The system continues to operate despite arbitrary partitioning due to network failures
CP - consistency and partition tolerance
Waiting for a response from a partitioned node might result in a timeout error instead of a possibly stale answer. CP is a good choice if your business needs require atomic reads and writes.
When to choose CP
CP systems sacrifice availability for consistency. Choose CP when:
- Your business requirements demand atomic reads and writes
- Data accuracy is more important than system availability
- You need to prevent conflicting updates
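To make the CP behavior concrete, here is a minimal Python sketch of a majority-quorum register. The names CPRegister, Replica and UnavailableError are hypothetical, and a partition is simulated by flagging replicas as unreachable from the client's point of view, which is a simplification of real network failures. When fewer than a majority of replicas respond, the register raises an error rather than returning data that might be stale.

```python
from dataclasses import dataclass
from typing import Any, List


class UnavailableError(Exception):
    """Raised instead of answering when a majority of replicas is unreachable."""


@dataclass
class Replica:
    value: Any = None
    version: int = 0
    reachable: bool = True  # simulated: False means "on the other side of a partition"


class CPRegister:
    """Hypothetical single-value register that prefers errors over stale data."""

    def __init__(self, n_replicas: int = 3) -> None:
        self.replicas = [Replica() for _ in range(n_replicas)]
        self.majority = n_replicas // 2 + 1

    def _quorum(self) -> List[Replica]:
        live = [r for r in self.replicas if r.reachable]
        if len(live) < self.majority:
            # Minority side of a partition: give up availability, keep consistency.
            raise UnavailableError(
                f"only {len(live)} of {len(self.replicas)} replicas reachable"
            )
        return live

    def write(self, value: Any) -> None:
        live = self._quorum()
        # Any majority overlaps the previous write's majority, so this sees the latest version.
        new_version = max(r.version for r in live) + 1
        for r in live:
            r.value, r.version = value, new_version

    def read(self) -> Any:
        live = self._quorum()
        # The replica with the highest version holds the most recent write.
        return max(live, key=lambda r: r.version).value


account = CPRegister(3)
account.write("balance=100")
account.replicas[1].reachable = False
account.replicas[2].reachable = False  # only 1 of 3 replicas reachable
try:
    account.read()
except UnavailableError as err:
    print("error:", err)  # error: only 1 of 3 replicas reachable
```

Because any two majorities share at least one replica, a read that reaches a majority always sees the latest successful write; the price is that the minority side of a partition cannot serve requests at all.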
Examples of CP systems
- Banking and financial systems
- Systems requiring strong consistency guarantees
- MongoDB (configurable)
- HBase
- Redis (when configured for strong consistency)
AP - availability and partition tolerance
Responses return the most readily available version of the data on any node, which might not be the latest. Writes made during a partition might take some time to propagate after the partition is resolved. AP is a good choice if your business needs allow for eventual consistency or if the system must keep working despite external errors such as network failures.
When to choose AP
AP systems sacrifice consistency for availability. Choose AP when:
- System availability is more critical than data consistency
- Eventual consistency is acceptable for your use case
- You need the system to continue operating during network partitions
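The AP behavior described above can be sketched as follows. In this minimal Python sketch, each hypothetical APReplica answers reads and writes from its local state only, so it keeps responding during a partition; after the partition heals, replicas exchange state and resolve conflicts with last-writer-wins, ordering writes by a Lamport clock with the node id as a tie-breaker. This conflict rule is an assumption for illustration, not the policy of any particular system listed below.

```python
from typing import Any, Dict, Optional, Tuple

Stamp = Tuple[int, str]  # (Lamport clock, node id) gives a total order for tie-breaking


class APReplica:
    """Hypothetical replica that always answers from its own local state."""

    def __init__(self, node_id: str) -> None:
        self.node_id = node_id
        self.clock = 0
        self.data: Dict[str, Tuple[Any, Stamp]] = {}

    def write(self, key: str, value: Any) -> None:
        # Accepted locally even if every other replica is unreachable.
        self.clock += 1
        self.data[key] = (value, (self.clock, self.node_id))

    def read(self, key: str) -> Optional[Any]:
        # Never waits on other replicas, so the answer may be stale.
        entry = self.data.get(key)
        return entry[0] if entry else None

    def sync_from(self, other: "APReplica") -> None:
        """Anti-entropy after the partition heals: last-writer-wins per key."""
        for key, (value, stamp) in other.data.items():
            mine = self.data.get(key)
            if mine is None or stamp > mine[1]:
                self.data[key] = (value, stamp)
        self.clock = max(self.clock, other.clock)


a, b = APReplica("a"), APReplica("b")
a.write("status", "v1")
b.sync_from(a)
a.write("status", "v2")  # partition: b does not hear about this write
print(b.read("status"))  # v1 (available, but stale)
b.sync_from(a)
a.sync_from(b)  # partition healed, replicas exchange state
print(a.read("status"), b.read("status"))  # v2 v2
```

Last-writer-wins keeps the sketch short, but it silently discards one of two concurrent writes (the one with the smaller stamp). Applications that cannot lose updates typically need to track concurrent versions or use data types that merge instead of overwrite.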
Examples of AP systems
- Social media feeds
- DNS
- Caching systems
- Cassandra
- DynamoDB (default configuration)
- CouchDB
Many modern distributed systems aim for “eventual consistency”: they favor availability over immediate consistency, but guarantee that replicas converge to the same state once new writes stop arriving.
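Convergence does not have to come from picking a winner between conflicting writes. As one illustrative technique (not one the text prescribes), a grow-only counter, a simple CRDT, lets every replica accept increments independently and merges states with an element-wise maximum, so replicas reach the same value no matter how often or in what order they exchange state. The class name GCounter is hypothetical in this sketch.

```python
from typing import Dict


class GCounter:
    """Grow-only counter CRDT: each node increments only its own slot."""

    def __init__(self, node_id: str) -> None:
        self.node_id = node_id
        self.counts: Dict[str, int] = {}

    def increment(self, amount: int = 1) -> None:
        if amount < 0:
            raise ValueError("a G-Counter can only grow")
        self.counts[self.node_id] = self.counts.get(self.node_id, 0) + amount

    def value(self) -> int:
        return sum(self.counts.values())

    def merge(self, other: "GCounter") -> None:
        # Element-wise max is commutative, associative and idempotent,
        # so repeated or reordered merges cannot make replicas diverge.
        for node, count in other.counts.items():
            self.counts[node] = max(self.counts.get(node, 0), count)


x, y = GCounter("x"), GCounter("y")
x.increment(3)  # likes recorded on x during a partition
y.increment(2)  # likes recorded on y at the same time
print(x.value(), y.value())  # 3 2 (replicas temporarily disagree)
x.merge(y)
y.merge(x)
print(x.value(), y.value())  # 5 5 (converged)
```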
Real-world considerations
In practice, the CAP theorem is more nuanced:
- Partitions are rare: Network partitions don’t happen constantly, so you don’t always have to choose
- Tunable consistency: Many modern databases offer tunable consistency levels
- Different guarantees for different operations: Some systems provide different guarantees for different types of operations
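Tunable consistency is often expressed with three numbers: N replicas per item, W replicas that must acknowledge a write, and R replicas consulted on a read. If R + W > N, every read set intersects every write set, so a read can see the latest acknowledged write (ignoring complications such as sloppy quorums and concurrent writes); smaller R or W keeps more operations available during failures at the cost of possibly stale reads. The helper below is a hypothetical sketch that checks the worst-case overlap directly, not an API of any particular database.

```python
def quorum_overlaps(n: int, r: int, w: int) -> bool:
    """Hypothetical helper: is every read quorum guaranteed to meet the last write quorum?"""
    if not (1 <= r <= n and 1 <= w <= n):
        raise ValueError("need 1 <= r <= n and 1 <= w <= n")
    # Worst case for the reader: the write reached replicas 0..w-1 while the
    # read contacts replicas n-r..n-1, placing the two sets as far apart as possible.
    write_set = set(range(w))
    read_set = set(range(n - r, n))
    return bool(write_set & read_set)  # same result as r + w > n


for n, r, w in [(3, 1, 1), (3, 2, 2), (3, 1, 3), (5, 3, 3)]:
    print(
        f"N={n} R={r} W={w}: overlap={quorum_overlaps(n, r, w)}, "
        f"reads tolerate {n - r} unreachable, writes tolerate {n - w} unreachable"
    )
```

With N=3 and R=W=2, each read and write tolerates one unreachable replica and the quorums still overlap; with R=W=1, operations tolerate two unreachable replicas but reads may return stale data.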
The CAP theorem describes trade-offs during network partitions. When there are no partitions, you can have both consistency and availability.
Making the choice
Questions to ask when choosing between CP and AP:
- Can your application tolerate stale data?
- Is it acceptable for some users to see different versions of data temporarily?
- What happens if the system becomes unavailable during a partition?
- Do you need transactions that span multiple operations?
- What are the business consequences of inconsistent data?
