Why “Exactly Once” in Payments Is a Myth, and What Works Instead
Posted: 2025-08-19
Payment systems live with retries. Customers double-click. Networks misbehave. Providers time out. For most services on the Internet this is not a big problem, but as soon as money is involved the stakes are higher.
“Exactly-once processing” sounds like the answer. In distributed systems, it is not achievable.
As Mathias Verraes joked:
There are only two hard problems in distributed systems:
2. Exactly-once delivery
1. Guaranteed order of messages
2. Exactly-once delivery
Databases Already Solved This
In the 1970s and 80s, relational databases gave us transactions: BEGIN, COMMIT, ROLLBACK. Two-phase commit across a couple of resources.
For many business systems this was enough. Inside one database boundary, you could rely on exactly-once semantics and sleep well.
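Inside that boundary, the engine does the heavy lifting. Here is a minimal sketch in Python with sqlite3; the accounts table and transfer helper are invented for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 0)")
conn.commit()

def transfer(conn, src, dst, amount):
    """Move money atomically: both writes commit, or neither does."""
    try:
        with conn:  # commit on success, roll back on any exception
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?",
                (amount, src),
            )
            row = conn.execute(
                "SELECT balance FROM accounts WHERE id = ?", (src,)
            ).fetchone()
            if row[0] < 0:
                raise ValueError("insufficient funds")  # triggers rollback
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?",
                (amount, dst),
            )
    except ValueError:
        pass  # rolled back: the partial debit never became visible

transfer(conn, "alice", "bob", 60)
```

The point is not the SQL; it is that the failure handling lives in one place, below the application.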
Databases even solved the two “hard problems” of distributed systems, ordering and exactly-once delivery, by hiding them under the hood. A master node applied every write in order, replicas replayed the log, and the system recovered after crashes. To the developer, it looked simple: every committed transaction happened once, in order. The complexity was real, but it was contained inside the database engine.
The World Changed
Modern architectures encourage each microservice to own its database. A single payment now crosses several services and several persistence layers.
Two-phase commit across them is brittle, expensive, and slow. Crashes, retries, and failovers are the norm. The old guarantees don’t reach that far.
As Gregor Hohpe noted in his classic essay Starbucks Does Not Use Two-Phase Commit, the coffee shop doesn't hold a global transaction open until your latte is ready. It takes your order, gives you a token, and processes each step independently. The business process absorbs retries and delays, rather than the infrastructure pretending they cannot happen.
What Works Today
Instead of promising exactly-once, modern systems aim for at-least-once execution with consistent outcomes. The same request may run multiple times, but the final state is unambiguous.
The main tools:
- Transaction tokens (idempotency keys). Generated at the very start, carried through every call and write.
- Idempotent operations. Each subsystem treats “process payment with token X” as a conditional write: succeed once, or return the same result again (see the sketch after this list).
- Persistent logs. Append-only records allow reconciliation after crashes.
- Event-based systems and message queues. Kafka, RabbitMQ, or SQS provide durability and back-pressure. But they only guarantee at-least-once delivery. Your handlers still need tokens and idempotency to prevent duplicates.
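A minimal sketch of the conditional-write pattern, again in Python with sqlite3; the payments table and process_payment helper are illustrative assumptions, not any particular provider's API:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE payments (
    token  TEXT PRIMARY KEY,   -- idempotency key, chosen by the caller
    amount INTEGER,
    result TEXT)""")
conn.commit()

def process_payment(conn, token, amount):
    """Succeed once per token; on replay, return the stored result."""
    try:
        with conn:
            # The PRIMARY KEY makes this a conditional write: a second
            # attempt with the same token conflicts instead of charging twice.
            conn.execute(
                "INSERT INTO payments (token, amount, result) VALUES (?, ?, ?)",
                (token, amount, "charged"),
            )
        return "charged"
    except sqlite3.IntegrityError:
        row = conn.execute(
            "SELECT result FROM payments WHERE token = ?", (token,)
        ).fetchone()
        return row[0]  # same answer as the first call, no second charge

print(process_payment(conn, "tok-42", 100))  # charged
print(process_payment(conn, "tok-42", 100))  # charged (replayed, not repeated)
```

The same handler is safe behind a queue consumer: if Kafka or SQS delivers a message twice, the second delivery hits the key conflict and returns the stored result instead of charging again.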
When Security or Scale Demands More
For flows that span companies, regulators, or geographies, further reinforcement helps:
Ledgers
Append-only histories allow replay, audit, and reconciliation.
In a sense this is not new: every relational database since the 1970s has relied on a log, whether called a write-ahead log, redo log, or commit log. The log is the database. Recovery after crashes has always depended on replaying a durable, preferably idempotent, sequence of changes.
What blockchain technology adds is immutability and verifiability. By hashing every entry and chaining the hashes, you can prove that no historical record has been altered. With Patricia–Merkle trees you can also prove membership and quickly reconcile ledgers across replicas. These mechanisms turn the familiar database log into a tamper-evident ledger, suitable for systems that span organizational or regulatory boundaries.
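A toy sketch of the chaining idea in Python, just enough to show why tampering is detectable; real systems add Merkle trees, signatures, and consensus on top:

```python
import hashlib
import json

def entry_hash(prev_hash, record):
    """Chain each entry to its predecessor by hashing both together."""
    payload = prev_hash + json.dumps(record, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()

def append(ledger, record):
    prev = ledger[-1]["hash"] if ledger else "genesis"
    ledger.append({"record": record, "hash": entry_hash(prev, record)})

def verify(ledger):
    """Recompute every hash; an edited entry breaks the chain after it."""
    prev = "genesis"
    for entry in ledger:
        if entry["hash"] != entry_hash(prev, entry["record"]):
            return False
        prev = entry["hash"]
    return True

ledger = []
append(ledger, {"token": "tok-42", "amount": 100})
append(ledger, {"token": "tok-43", "amount": 250})
assert verify(ledger)

ledger[0]["record"]["amount"] = 1_000_000  # tamper with history
assert not verify(ledger)                  # the chain exposes it
```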
CRDTs
Conflict-free replicated data types allow distributed nodes to update independently and converge without locks. This is where current research and practice meet. Chris Meiklejohn and colleagues have shown how CRDTs can underpin highly available, fault-tolerant distributed systems that still provide strong guarantees of convergence.
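For a flavor of how convergence works, here is a sketch of a grow-only counter (G-counter), one of the simplest CRDTs; the replica names are made up for the example:

```python
class GCounter:
    """Grow-only counter: each replica increments its own slot.
    Merge takes the per-replica maximum, so merges commute and
    every node converges to the same total regardless of order."""

    def __init__(self, replica_id):
        self.replica_id = replica_id
        self.counts = {}

    def increment(self, n=1):
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + n

    def merge(self, other):
        for rid, n in other.counts.items():
            self.counts[rid] = max(self.counts.get(rid, 0), n)

    def value(self):
        return sum(self.counts.values())

# Two replicas update independently, then exchange state in any order.
a, b = GCounter("a"), GCounter("b")
a.increment(3)
b.increment(2)
a.merge(b)
b.merge(a)
assert a.value() == b.value() == 5  # converged without coordination
```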
Tools like Filibuster explore how to systematically test failure scenarios and ensure idempotency across microservices. Academic work such as Meiklejohn & Van Roy’s “Lasp” and later work on replicated data types at scale point to a future where distributed programming models incorporate these guarantees by design, rather than bolting them on afterwards.
These approaches do not remove complexity, but they make convergence predictable. They are the natural extension of the same instinct that made relational databases so powerful in the 70s: encapsulate the hard problems once, so that application developers can move faster with less risk.
Practical Guidance
- If one service and one database can own the whole transaction, trust it.
- If multiple services are involved, add tokens, idempotency, and durable queues.
- If multiple organizations or regions are involved, add ledgers and consider CRDTs.
“Exactly once” was solved by databases fifty years ago. Beyond that boundary, retries return. The goal is not to eliminate duplicate deliveries but to make them safe and boring.