A Quick Intro to Distributed Systems + CAP/ACID/BASE: First Steps Toward “Exactly-Once”

What happens when a single machine hits its limits? Why isn’t the network “perfect”? In a partition, do you pick C or A? A short, punchy primer.

Reading time: ~7–8 min

What Are Distributed Systems and Why Use Them?

A distributed system is made of components running on different servers/devices that coordinate by exchanging messages. Instead of one big box, many machines work together, which gives you:

  • Horizontal scale: add nodes to increase capacity.
  • Fault tolerance: if one node fails, others keep serving.

This lets you handle workloads beyond a single machine and reduce single points of failure. The price: networks, disks, software, and timing can (and will) fail. Design with failure as the default (timeouts, retries, jitter, backpressure, circuit breakers, observability, etc.).

Core Challenges

Network and hardware failures are normal: servers crash, disks die, links drop, latency spikes. The famous fallacies of distributed computing (e.g., “the network is reliable,” “latency is zero,” “bandwidth is infinite”) are traps. These uncertainties cause partial failure—some components fail while others keep running. Developers must plan timeouts, retries, backpressure, and compensation from the start.
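As a rough illustration of "retries with jitter", here is a minimal sketch of exponential backoff with full jitter. The names `TransientError` and `call_with_retries` are made up for this example, and the operation being retried is assumed to enforce its own timeout.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical marker for failures that are worth retrying."""


def call_with_retries(op, attempts=5, base=0.1, cap=5.0, sleep=time.sleep):
    """Call op(); on TransientError, wait a jittered delay and try again.

    The delay ceiling doubles on every attempt (base, 2*base, 4*base, ...)
    up to `cap`; the actual wait is drawn uniformly from [0, ceiling] so
    that many clients retrying at once do not synchronize.
    """
    for attempt in range(attempts):
        try:
            return op()
        except TransientError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the failure to the caller
            ceiling = min(cap, base * (2 ** attempt))
            sleep(random.uniform(0, ceiling))
```

Retrying is only safe when the operation is idempotent or deduplicated on the receiving side; otherwise a retry after a lost response can apply the same change twice.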

CAP Theorem: In a Partition, C or A?

CAP (Brewer’s) Theorem says that under a network partition, you cannot simultaneously guarantee both Consistency (C) and Availability (A); Partition tolerance (P) is a given in real systems. During a partition you must choose:

  • Preserve C → reject/block some requests, sacrificing A.
  • Preserve A → keep responding, accepting that replicas may serve stale or diverging data until the partition heals.

Note: Without a partition, you can often enjoy both C and A just fine. CAP mainly clarifies what you do when the link breaks.
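To make the trade-off concrete, here is a toy model (not a real replication protocol) of one replica in a three-node cluster. The `Replica` class, its `mode` labels, and the `reachable_peers` counter are all assumptions made for illustration.

```python
class Unavailable(Exception):
    """Raised by a CP replica that cannot reach a majority of the cluster."""


class Replica:
    def __init__(self, mode, cluster_size=3):
        self.mode = mode                          # "CP" or "AP" (toy labels)
        self.cluster_size = cluster_size
        self.reachable_peers = cluster_size - 1   # no partition at start
        self.value = None

    def has_quorum(self):
        # This node plus the peers it can still reach must form a majority.
        return (1 + self.reachable_peers) > self.cluster_size // 2

    def write(self, value):
        if self.mode == "CP" and not self.has_quorum():
            # Preserve consistency: refuse rather than risk divergence.
            raise Unavailable("no majority; refusing write")
        # AP (or CP with quorum): accept; an AP node reconciles later.
        self.value = value
        return "ok"
```

Setting `reachable_peers = 0` simulates a node cut off from the rest: the CP replica raises `Unavailable`, while the AP replica accepts the write and leaves reconciliation for when the link comes back.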

Consistency Models: ACID vs BASE

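  • ACID (Atomicity, Consistency, Isolation, Durability): transactional guarantees. Note that the "C" here means preserving application invariants, which is not the same as CAP's "C". When a transaction spans nodes, the commit or replication protocol may block during a partition.
  • BASE (“Basically Available, Soft state, Eventually consistent”): replicas converge over time; favors availability/scale, but needs conflict detection and resolution (e.g., vector clocks to detect concurrent writes, last-writer-wins to pick a winner).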
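The two BASE techniques named above do different jobs, which a small sketch can show. This is an illustrative version only: vector clocks are plain dicts of node id to counter, and the version tuples for last-writer-wins are an assumed layout.

```python
def compare(a, b):
    """Compare two vector clocks (dicts of node id -> counter).

    Returns "before", "after", "equal", or "concurrent". Only the
    "concurrent" case is a real conflict that needs resolving.
    """
    nodes = set(a) | set(b)
    a_le_b = all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
    b_le_a = all(b.get(n, 0) <= a.get(n, 0) for n in nodes)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"
    if b_le_a:
        return "after"
    return "concurrent"


def last_writer_wins(x, y):
    """Pick a winner between two versions shaped (value, timestamp, node_id).

    Ties on timestamp are broken by node id so every replica picks the
    same winner. The losing write is silently discarded.
    """
    return max(x, y, key=lambda version: (version[1], version[2]))
```

Last-writer-wins is simple but lossy and depends on reasonably synchronized clocks, so it fits data where dropping one of two concurrent updates is acceptable.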

How to choose?

By domain: Finance leans ACID; massive social feeds lean BASE.

  • Pick ACID when errors are expensive (money movement, strict inventory, double-spend risk).
  • Pick BASE when you need global reach, extreme read throughput, and brief staleness is acceptable.

Mini Scenario: EV Charging Network with Grid-Aware Sessions

Context: Nationwide EV chargers. When the grid is constrained, the operator pushes dynamic prices and power throttling.

User flow: reserve → authorize → start charging → interim meter reports → stop → billing.

A) Discovery & Offers (AP + BASE)

Station availability (free/busy, wait time) and dynamic price signals must be highly available; a few seconds of staleness are acceptable.

Choice: AP-leaning + BASE (caches/replicas with TTL; tolerate small drift).
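A TTL cache for station status could look roughly like the sketch below. `TTLCache` and the injectable `clock` parameter are assumptions for the example; a real deployment would more likely use a shared cache or read replicas.

```python
import time


class TTLCache:
    """In-memory cache that serves entries for at most `ttl_seconds`."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock            # injectable so tests can control time
        self._entries = {}            # key -> (value, stored_at)

    def put(self, key, value):
        self._entries[key] = (value, self.clock())

    def get(self, key):
        """Return (value, age_seconds), or None if missing or expired."""
        entry = self._entries.get(key)
        if entry is None:
            return None
        value, stored_at = entry
        age = self.clock() - stored_at
        if age > self.ttl:
            del self._entries[key]    # expired: force a fresh lookup
            return None
        return value, age
```

Returning the age alongside the value lets the UI show "updated 3 s ago", which makes the accepted staleness visible to the driver instead of hiding it.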

B) Session Lifecycle (CP + ACID + SAGA)

kWh accounting, payments, reservation locks must be correct—no wrong totals.

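Choice: CP-leaning + ACID inside each service; across services, use a saga, where every step has a compensating action that undoes it if a later step fails. Orchestrators like Temporal or AWS Step Functions persist workflow state and handle retries, so compensations still run after a crash.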
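The compensation idea can be sketched in a few lines. This is an in-memory illustration only: `run_saga` is a made-up helper, and unlike a real orchestrator it does not persist progress, so it would not survive a process crash.

```python
def run_saga(steps):
    """Run (action, compensation) pairs in order.

    If an action raises, run the compensations of the steps that already
    completed, in reverse order, and report failure.
    """
    completed = []
    for action, compensate in steps:
        try:
            action()
        except Exception:
            for undo in reversed(completed):
                undo()
            return False
        completed.append(compensate)
    return True


# Usage: the charger fails to start, so earlier steps are undone.
log = []


def start_charging():
    raise RuntimeError("charger offline")


ok = run_saga([
    (lambda: log.append("reserve"), lambda: log.append("release_reservation")),
    (lambda: log.append("authorize"), lambda: log.append("void_authorization")),
    (start_charging, lambda: log.append("stop_charging")),
])
```

After this run, `ok` is `False` and `log` holds the two completed steps followed by their compensations in reverse order. In practice compensations should themselves be idempotent, since an orchestrator may retry them.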

C) Telemetry and the “Exactly-Once Effect”

Use at-least-once delivery + idempotent consumers: don’t lose meter data; if duplicated, apply it once.
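A minimal sketch of an idempotent consumer for meter reports, assuming each message carries a unique `id` and a `delta_kwh` field (both field names are hypothetical):

```python
class SessionMeter:
    """Accumulates energy for one session, ignoring redelivered messages."""

    def __init__(self):
        self.seen = set()       # ids of messages already applied
        self.total_kwh = 0.0

    def handle(self, message):
        """Apply a meter report once; return False if it was a duplicate."""
        if message["id"] in self.seen:
            return False
        self.total_kwh += message["delta_kwh"]
        self.seen.add(message["id"])
        return True
```

In a real service the "seen" record and the total would live in the same database and be updated in one transaction (the inbox pattern); otherwise a crash between the two writes could lose or double-count a report.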

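Transactional Outbox + CDC (Debezium): the producer writes its business data and an outbox row in one local database transaction; CDC then reads the database's change log and publishes each outbox row to the broker at least once.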
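Here is a rough sketch of the outbox half using SQLite from Python's standard library. The table names, the `session.finalized` topic, and the polling `relay_once` function are assumptions for illustration; the polling relay merely stands in for a log-based CDC tool such as Debezium.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sessions (id TEXT PRIMARY KEY, kwh REAL NOT NULL);
    CREATE TABLE outbox (
        seq INTEGER PRIMARY KEY AUTOINCREMENT,
        topic TEXT NOT NULL,
        payload TEXT NOT NULL,
        published INTEGER NOT NULL DEFAULT 0
    );
""")


def finalize_session(conn, session_id, kwh):
    # One transaction: the session row and its event commit together,
    # or neither is written.
    with conn:
        conn.execute(
            "INSERT INTO sessions (id, kwh) VALUES (?, ?)", (session_id, kwh)
        )
        conn.execute(
            "INSERT INTO outbox (topic, payload) VALUES (?, ?)",
            ("session.finalized", json.dumps({"id": session_id, "kwh": kwh})),
        )


def relay_once(conn, publish):
    """Publish pending outbox rows in order, marking each one afterwards."""
    rows = conn.execute(
        "SELECT seq, topic, payload FROM outbox WHERE published = 0 ORDER BY seq"
    ).fetchall()
    for seq, topic, payload in rows:
        publish(topic, payload)   # a crash here means a re-publish later
        with conn:
            conn.execute("UPDATE outbox SET published = 1 WHERE seq = ?", (seq,))
    return len(rows)
```

Because a row is marked only after it is published, a crash in between leads to a duplicate publish rather than a lost event. That is exactly the at-least-once behavior that idempotent consumers are meant to absorb.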

Product Support (2025)

  • Kafka: Idempotent producers + transactions enable exactly-once semantics (EOS) for consume-process-produce pipelines that stay inside Kafka; side effects in external systems still need idempotency.
  • Apache Pulsar: Transactions unify consume+produce in a single atomic context.
  • Google Cloud Pub/Sub: Exactly-once delivery is available for pull subscriptions, with the guarantee scoped to a single cloud region (mind the constraints).

*End-to-end sequence: Orchestrator → Broker → Station Control → EVSE (OCPP); telemetry to Session Service via Inbox; finalize and billing capture; DLQ replay path.*

Closing

Sound distributed design requires a clear CAP stance for partitions and per-flow ACID/BASE choices. In EV charging, keep reads on AP/BASE for great UX, and enforce CP/ACID for critical accounting and payments. The practical path toward “exactly-once” is paved with idempotency and patterns like outbox/inbox + CDC.
