Hacker's Handbook


Exactly once

Posted: 2025-08-19

Why "Exactly Once" in Payments Is a Myth, and What Works Instead

Payment systems live with retries. Customers double-click. Networks misbehave. Providers time out. For most services on the Internet this is not a big problem, but as soon as money is involved the stakes are higher.

"Exactly once processing" sounds like the answer. In distributed systems it is not achievable.

As Mathias Verraes joked:

There are only two hard problems in distributed systems:

  2. Exactly-once delivery
  1. Guaranteed order of messages
  2. Exactly-once delivery

Databases Already Solved This

In the 1970s and 80s, relational databases gave us transactions. BEGIN, COMMIT, ROLLBACK. Two-phase commit across a couple of resources.

For many business systems this was enough. Inside one database boundary, you could rely on exactly-once semantics and sleep well.

Databases even solved the two "hard problems" of distributed systems, ordering and exactly-once delivery, by hiding them under the hood. A master node applied every write in order, replicas replayed the log, and the system recovered after crashes. To the developer, it looked simple: every committed transaction happened once, in order. The complexity was real, but it was contained inside the database engine.
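To make the single-database case concrete, here is a minimal sketch using Python's built-in sqlite3 module. The accounts table, the account names, and the transfer function are assumptions made up for illustration. Both updates commit together or neither does.

```python
import sqlite3

def transfer(conn, src, dst, amount):
    """Move `amount` (minor units) from src to dst atomically, or not at all."""
    # Used as a context manager, the connection commits if the block
    # succeeds and rolls back if an exception escapes it.
    with conn:
        conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src))
        conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst))
        (balance,) = conn.execute(
            "SELECT balance FROM accounts WHERE id = ?", (src,)
        ).fetchone()
        if balance < 0:
            raise ValueError("insufficient funds")  # triggers the rollback

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 0)])
conn.commit()

transfer(conn, "alice", "bob", 60)      # succeeds
try:
    transfer(conn, "alice", "bob", 60)  # would overdraw, so both updates are undone
except ValueError:
    pass

balances = dict(conn.execute("SELECT id, balance FROM accounts"))
```

After the failed second transfer, the balances are exactly what the first transfer left behind. That is the guarantee that stops at the database boundary.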


The World Changed

Modern architectures encourage each microservice to own its database. A single payment now crosses several services and several persistence layers.

Two-phase commit across them is brittle, expensive, and slow. Crashes, retries, and failovers are the norm. The old guarantees don't reach that far.

As Gregor Hohpe noted in his classic essay Starbucks Doesn't Use Two-Phase Commit, the coffee shop doesn't lock a global transaction until your latte is ready. It takes your order, gives you a token, and processes each step independently. The business process absorbs retries and delays, rather than the infrastructure pretending they cannot happen.


What Works Today

Instead of promising exactly once, modern systems aim for at-least-once execution with consistent outcomes. The same request may run multiple times, but the final state is unambiguous.

The main tools:

  • Transaction tokens (idempotency keys). Generated at the very start, carried through every call and write.

  • Idempotent operations. Each subsystem treats "process payment with token X" as a conditional write: succeed once, or return the same result again.

  • Persistent logs. Append-only records allow reconciliation after crashes.

  • Event-based systems and message queues. Kafka, RabbitMQ, or SQS provide durability and back-pressure. But in their common configurations they give you at-least-once delivery, so the same message can arrive more than once. Your handlers still need tokens and idempotency to prevent duplicates.
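The token and conditional-write ideas above can be sketched in a few lines. This is an illustration under stated assumptions, not a description of any provider's implementation: the payments table, the process_payment function, and the charges_made list (standing in for the real side effect) are all hypothetical. The uniqueness constraint on the token does the work.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE payments ("
    " token TEXT PRIMARY KEY,"   # the idempotency key; uniqueness is the guard
    " amount INTEGER NOT NULL,"
    " status TEXT NOT NULL)"
)

charges_made = []  # hypothetical stand-in for the side effect we must not repeat

def process_payment(token, amount):
    """Charge at most once per token; a replay returns the stored result."""
    try:
        with conn:  # commit on success, roll back on error
            conn.execute(
                "INSERT INTO payments VALUES (?, ?, ?)", (token, amount, "captured")
            )
            charges_made.append((token, amount))
    except sqlite3.IntegrityError:
        pass  # token already recorded: this is a retry or a redelivery
    row = conn.execute(
        "SELECT amount, status FROM payments WHERE token = ?", (token,)
    ).fetchone()
    return {"token": token, "amount": row[0], "status": row[1]}

token = str(uuid.uuid4())            # generated once, at the very start
first = process_payment(token, 1999)
retry = process_payment(token, 1999)  # double-click, timeout retry, redelivery
```

Both calls return the same result and the charge happens once. In a real system the side effect is usually an external call that cannot share the database transaction, so the record would typically move through states such as "started" and "captured" and be reconciled after a crash.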


When Security or Scale Demands More

For flows that span companies, regulators, or geographies, further reinforcement helps:

Ledgers

Append-only histories allow replay, audit, and reconciliation.

In payments this is more than a log file. It is often double-entry bookkeeping at the core: every debit has a matching credit, and the ledger balances at all times. Settlement windows (say, end-of-day netting between banks) depend on these records, as do regulatory audit trails.
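A minimal sketch of the double-entry rule follows. The Entry and Ledger classes, the account names, and the sign convention (positive for debit, negative for credit) are assumptions for illustration. The only point is that a transaction whose entries do not sum to zero is rejected.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Entry:
    account: str
    amount: int  # minor units; assumed convention: positive = debit, negative = credit

class Ledger:
    def __init__(self):
        self._entries = []  # append-only: entries are never updated or deleted

    def post(self, entries):
        """Append a transaction only if its debits and credits cancel out."""
        entries = list(entries)
        if sum(e.amount for e in entries) != 0:
            raise ValueError("unbalanced transaction")
        self._entries.extend(entries)

    def balance(self, account):
        return sum(e.amount for e in self._entries if e.account == account)

    def total(self):
        return sum(e.amount for e in self._entries)

ledger = Ledger()
ledger.post([Entry("customer:alice", -500), Entry("merchant:shop", 500)])
ledger.post([Entry("merchant:shop", -15), Entry("fees:provider", 15)])
```

Because every posted transaction sums to zero, ledger.total() stays at zero no matter how many transactions are appended. A non-zero total signals a bug or a corrupted record.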

This is not a new idea. As Jim Gray described in his 1978 work on transaction processing, databases have always relied on a commit log to guarantee durability and recoverability. The log is the database. What modern ledger systems and blockchains add is immutability and verifiability:

  • Hash-chained logs. Every entry in the ledger includes a cryptographic hash of the previous entry. This creates a chain where altering even a single past record changes every hash after it. The effect is tamper evidence: regulators, auditors, or counterparties can verify that no history has been rewritten. In payments this is critical for settlement systems where disputes may arise months later, because the hash chain proves the record is intact.

  • Patricia–Merkle trees. These are tree-shaped data structures where each branch node contains the hash of its children. They allow you to prove the presence (or absence) of a transaction without revealing the entire ledger. For example, a bank can prove to a regulator that a given transfer is included in the ledger, or two institutions can reconcile only the subset of accounts they have in common. This makes cross-organization settlement and audit feasible without exchanging full databases.
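The hash-chained log from the list above can be sketched with the standard library alone. The entry layout, the all-zero genesis value, and the function names are assumptions for illustration. Production ledgers define their own canonical encoding.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed placeholder "previous hash" for the first entry

def entry_hash(prev_hash, payload):
    # Serialize deterministically so the same payload always hashes the same.
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()

def append(chain, payload):
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    chain.append(
        {"prev": prev_hash, "payload": payload, "hash": entry_hash(prev_hash, payload)}
    )

def verify(chain):
    """Recompute every hash from the start; any edit to history breaks the chain."""
    prev_hash = GENESIS
    for entry in chain:
        if entry["prev"] != prev_hash:
            return False
        if entry["hash"] != entry_hash(prev_hash, entry["payload"]):
            return False
        prev_hash = entry["hash"]
    return True

chain = []
append(chain, {"from": "alice", "to": "bob", "amount": 500})
append(chain, {"from": "bob", "to": "carol", "amount": 200})
append(chain, {"from": "carol", "to": "alice", "amount": 50})
ok_before = verify(chain)

chain[1]["payload"]["amount"] = 1  # rewrite history
ok_after = verify(chain)
```

Here ok_before is True and ok_after is False. The tampered entry no longer matches its stored hash, and fixing that hash would break the link from the next entry.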

The result is a tamper-evident accounting system suitable for flows that span organizations and regulators. This is why your bank transfer shows as "pending" for a day or two: it sits in a settlement ledger until the window closes and reconciliation completes.
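The inclusion proofs mentioned above can be illustrated with a plain binary Merkle tree. This is a simplification: it is not a Patricia–Merkle trie, it shows only proof of presence (not absence), and it assumes at least one leaf. The function names, the leaf and node prefixes, and the rule of duplicating the last node on odd levels are choices made for this sketch.

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def build_levels(leaves):
    """Return every tree level, from the leaf hashes up to the single root."""
    level = [h(b"\x00" + leaf) for leaf in leaves]  # 0x00 prefix marks a leaf
    levels = [level]
    while len(level) > 1:
        if len(level) % 2:
            level = level + [level[-1]]  # odd level: pair the last node with itself
        level = [h(b"\x01" + level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels

def prove(levels, index):
    """Collect the sibling hashes needed to rebuild the root from one leaf."""
    proof = []
    for level in levels[:-1]:
        sibling = index ^ 1
        if sibling >= len(level):
            sibling = index  # the node was paired with itself
        proof.append((level[sibling], sibling < index))
        index //= 2
    return proof

def verify(leaf, proof, root):
    node = h(b"\x00" + leaf)
    for sibling, sibling_is_left in proof:
        node = h(b"\x01" + sibling + node) if sibling_is_left else h(b"\x01" + node + sibling)
    return node == root

transfers = [b"tx-1", b"tx-2", b"tx-3", b"tx-4", b"tx-5"]
levels = build_levels(transfers)
root = levels[-1][0]             # the only value the verifier must already trust
proof = prove(levels, 4)         # proof for b"tx-5"
included = verify(b"tx-5", proof, root)
forged = verify(b"tx-6", proof, root)
```

The verifier needs only the root and a proof whose length grows with the logarithm of the ledger size, which is what makes checking one transfer possible without exchanging the full database.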

CRDTs

Conflict-free replicated data types (CRDTs) allow distributed nodes to update independently and converge without locks.

In payments, this matters when multiple actors must see consistent balances without a single global database. Imagine a mobile wallet replicated across regions: users can initiate payments offline or under flaky networks. Each node accepts local updates, and CRDTs guarantee the balances converge once connectivity is restored.
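One of the simplest CRDTs that fits this picture is a PN-Counter: each node tracks its own running totals of credits and debits, and a merge takes the per-node maximum. The class below is a sketch, and the node names and amounts are made up for illustration.

```python
class PNCounter:
    """Balance as a pair of grow-only maps: node id -> running total."""

    def __init__(self, node_id):
        self.node_id = node_id
        self.credits = {}
        self.debits = {}

    def credit(self, amount):
        self.credits[self.node_id] = self.credits.get(self.node_id, 0) + amount

    def debit(self, amount):
        self.debits[self.node_id] = self.debits.get(self.node_id, 0) + amount

    def value(self):
        return sum(self.credits.values()) - sum(self.debits.values())

    def merge(self, other):
        # Per-node max is commutative, associative, and idempotent, so merges
        # can arrive in any order, or more than once, and replicas still agree.
        for mine, theirs in ((self.credits, other.credits), (self.debits, other.debits)):
            for node, total in theirs.items():
                mine[node] = max(mine.get(node, 0), total)

eu = PNCounter("eu")
us = PNCounter("us")

eu.credit(100)   # top-up seen only in the EU region
us.debit(30)     # payment accepted in the US region while partitioned
eu.debit(20)

eu.merge(us)     # connectivity restored: exchange state both ways
us.merge(eu)
us.merge(eu)     # a duplicate merge changes nothing
```

Both replicas end at the same balance. What a counter like this does not do is stop two regions from each accepting a debit that together overdraw the account. Convergence is guaranteed, but an invariant such as "never below zero" still needs a limit per node or some coordination.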

Chris Meiklejohn and colleagues have shown how CRDTs can underpin highly available systems that still provide strong convergence guarantees. Fault-injection tools like Filibuster complement this by systematically testing failure scenarios in microservices, checking that retries and idempotency hold even under complex distributed failures.

Academic work such as Meiklejohn & Van Roy's "Lasp" and later work on replicated data types at scale points to a future where distributed programming models incorporate these guarantees by design, rather than bolting them on afterwards.

These approaches do not remove complexity, but they make convergence predictable. They are the natural extension of the same instinct that made relational databases so powerful in the 70s: encapsulate the hard problems once, so that application developers can move faster with less risk.


Practical Guidance

  • If one service and one database can own the whole transaction, trust it.
  • If multiple services are involved, add tokens, idempotency, and durable queues.
  • If multiple organizations or regions are involved, add ledgers and consider CRDTs.

This is not academic. It's how real-world systems run. Stripe processes millions of payments a day using idempotency keys. Your bank marks a transfer as "pending" because it sits in a settlement ledger until the next batch window. The point is not to eliminate retries or duplicates, but to make them safe and boring.

- Happi


Happi Hacking AB
KIVRA: 556912-2707
106 31 Stockholm