Beyond CAP: Understanding the PACELC Theorem in Distributed Systems
In this article, I’m going to further explore the trade-offs that system designers face in distributed systems and databases by introducing the PACELC Theorem (pronounced “pass-elk”), an extension of the popular CAP Theorem.
👉 If you aren’t familiar with CAP, check out my previous article:
📌 [ CAP Theorem Explained: Distributed Systems Series ]
Now, let’s dive in! 🌊
A Quick Refresher: Distributed Systems
A distributed system can be defined as:
A network of computers that work together to provide services or solve problems. These computers in the network are able to communicate with each other in order to execute tasks and applications.
In other words, instead of one powerful machine doing all the work, multiple machines collaborate to achieve scalability, fault tolerance, and performance.
A Quick Refresher: CAP Theorem
The CAP Theorem (formally proved by Gilbert and Lynch) states:
In a network subject to communication failures, it is impossible for any web service to implement an atomic read/write shared memory that guarantees a response to every request.
Simplified: when a network partition (communication failure) happens, a system must choose between:
- Consistency (C): All nodes see the same data at the same time.
- Availability (A): Every request receives a response, even if it’s not the latest data.
- Partition Tolerance (P): The system continues to operate despite communication breakdowns.
Since partitions are inevitable in real-world networks, designers typically trade between Consistency (C) and Availability (A) when a partition occurs.
🦌 Meet PACELC Theorem
So here’s the catch: CAP focuses only on what happens during partitions. But what about when the network is healthy? 🤔
That’s where PACELC comes in. Proposed by Daniel J. Abadi, PACELC extends CAP:
- P → Partition
- A → Availability
- C → Consistency
- E → Else
- L → Latency
- C → Consistency
📢 PACELC says:
If there is a partition (P), how does the system trade off availability and consistency (A and C); else (E), when the system is running normally, how does it trade off latency and consistency (L and C).
💡 Why PACELC Exists
CAP helped us reason about failures. But in practice, systems spend most of their life without partitions. So PACELC reminds us:
- During partitions → trade-off between Availability (A) and Consistency (C).
- Else (normal ops) → trade-off between Latency (L) and Consistency (C).
It’s like saying: “Even when things are good, you still have to choose your battles.” ⚔️
⏱️ What is Latency?
Latency = the time ⏳ it takes for a request to travel from client → system → back.
High consistency often means more coordination → ⚡ slower responses.
Low latency often means relaxing consistency → 📉 risk of stale reads.
So latency is always there, even without partitions.
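To make this concrete, here's a toy sketch of the trade-off in pure Python (all round-trip times and version numbers are made up for illustration): a "fast" read answers from the closest replica and may return stale data, while a "consistent" read waits for a quorum and pays the latency of the slowest quorum member.

```python
# Toy model of the latency/consistency trade-off.
# Replica round-trip times (rtt_ms) and data versions are invented numbers.
replicas = [
    {"rtt_ms": 5,  "version": 1},   # nearby, but stale
    {"rtt_ms": 20, "version": 2},   # up to date
    {"rtt_ms": 45, "version": 2},   # up to date, far away
]

def fast_read(replicas):
    """Low latency: answer from the closest replica; staleness is possible."""
    closest = min(replicas, key=lambda r: r["rtt_ms"])
    return closest["version"], closest["rtt_ms"]

def consistent_read(replicas, quorum=2):
    """Stronger consistency: wait for a read quorum, paying the RTT of the
    slowest quorum member, then return the newest version seen."""
    fastest = sorted(replicas, key=lambda r: r["rtt_ms"])[:quorum]
    latency = max(r["rtt_ms"] for r in fastest)
    version = max(r["version"] for r in fastest)
    return version, latency

print(fast_read(replicas))        # (1, 5)  -> fast, but stale
print(consistent_read(replicas))  # (2, 20) -> fresh, but slower
```

Even with zero failures, coordinating with more replicas buys freshness at the cost of waiting on slower machines, which is exactly the "Else" half of PACELC.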
🛠️ Example System Designs (PACELC in Action)
Let’s walk through some replication strategies and how they balance latency vs. consistency:
1️⃣ Preprocessing Replication 🗂️
Requests go through a preprocessor before reaching replicas.
✅ Strong consistency.
❌ Higher latency due to sequencing and routing delays.
2️⃣ Synchronous Replication 🔄
The primary waits for every replica to apply a write before acknowledging it, so any replica can serve up-to-date reads.
✅ Strong consistency.
❌ Latency depends on the slowest replica.
3️⃣ Asynchronous Replication I ⏩
Primary updates replicas asynchronously.
Reads always go to the primary.
✅ Consistency preserved.
❌ Higher read latency for clients far from the primary, which can also become a bottleneck.
4️⃣ Asynchronous Replication II ⚡
Primary updates replicas asynchronously, but reads can go to any node.
✅ Very low latency.
❌ Inconsistent reads possible.
5️⃣ Hybrid (Async + Sync) Replication ⚖️
Some replicas update synchronously, others asynchronously.
Uses the quorum formula, where N = total number of replicas, W = replicas that must acknowledge a write, and R = replicas consulted on a read:
If R + W > N → Consistency maintained (but higher latency).
If R + W ≤ N → Faster, but risk of stale reads.
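The quorum rule above can be sketched as a one-line check (pure Python, illustrative only): if every read set of size R must overlap every write set of size W among N replicas, a read is guaranteed to see the latest acknowledged write.

```python
def quorum_overlaps(n, r, w):
    """True if any read set of size r must intersect any write set of size w
    among n replicas -- the condition for reads to see the latest write."""
    return r + w > n

# N = 3 replicas: classic settings
print(quorum_overlaps(3, r=2, w=2))  # True  -> consistent, but slower
print(quorum_overlaps(3, r=1, w=1))  # False -> fast, but reads may be stale
```

Tuning R and W per operation is how quorum systems slide along the latency/consistency axis without any partition being involved.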
🌟 Real-World Case Studies
How do famous systems embody PACELC trade-offs? Let’s see 👇
- Amazon DynamoDB (PA/EL)
P → Availability over Consistency.
E → Latency over Consistency (tunable).
⚡ Designed for speed + availability at scale.
- Google Spanner (PC/EC)
P → Consistency over Availability.
E → Consistency over Latency.
⏳ Uses atomic clocks (TrueTime) to maintain global consistency.
- Apache Cassandra (PA/EL with tunable C)
P → Availability over Consistency.
E → Latency over Consistency (but tunable).
🎚️ Developers choose per-query consistency level.
- MongoDB (PC/EC by default, PA/EL optional)
P → Consistency first (writes block until new primary elected).
E → Consistency over Latency, but tunable with read preferences.
- Netflix (PA/EL)
P → Availability.
E → Latency over Consistency.
🍿 Uses Cassandra + EVCache to keep streams smooth—even if recommendations lag a bit.
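Cassandra-style per-query tunable consistency can be mimicked in a toy model (pure Python; the names `Level` and `tunable_read` are invented for this sketch, and QUORUM is hardcoded as a majority of 3 replicas): the caller decides, query by query, how many replicas must answer before a result is returned.

```python
from enum import Enum

class Level(Enum):
    """Toy consistency levels, loosely modeled on Cassandra's ONE/QUORUM/ALL."""
    ONE = 1
    QUORUM = 2   # majority of our 3 replicas
    ALL = 3

def tunable_read(replicas, level):
    """Consult `level.value` replicas (fastest first); return the newest
    version seen and the latency paid (the slowest replica waited on)."""
    consulted = sorted(replicas, key=lambda r: r["rtt_ms"])[:level.value]
    return (max(r["version"] for r in consulted),
            max(r["rtt_ms"] for r in consulted))

replicas = [
    {"rtt_ms": 5,  "version": 1},   # stale but close
    {"rtt_ms": 30, "version": 2},
    {"rtt_ms": 80, "version": 2},
]
print(tunable_read(replicas, Level.ONE))     # (1, 5)  fast, possibly stale
print(tunable_read(replicas, Level.QUORUM))  # (2, 30)
print(tunable_read(replicas, Level.ALL))     # (2, 80)
```

This is the per-query dial the Cassandra case study refers to: the same cluster can serve a latency-first read for a timeline and a consistency-first read for a payment, one request at a time.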
🎯 Key Takeaways
CAP = trade-offs under partitions.
PACELC = trade-offs under partitions and normal ops.
Systems choose based on business needs:
- 🏦 Banking → Consistency first.
- 🛒 E-commerce → Mix of latency + consistency (depending on operation).
- 🎬 Netflix → Latency + availability first.
Both CAP and PACELC aren’t strict rules—they’re thinking frameworks 🧩 for system design.
✨ Question for You: If you were designing a real-time multiplayer game backend, would you prioritize low latency 🎮 or strong consistency 🛡️?