The exactly-once lie: idempotency keys and how I keep payments honest

Most of my career has been spent on systems where money moves. Payroll, bank integrations, the kind of code where a bug does not just throw a stack trace; it pays a stranger twice. The single hardest lesson from that work is also the most boring-sounding one:

"Exactly once" is a lie you tell yourself. Plan for "at least once" and make the duplicates harmless.

Here is why, and the one pattern that has saved me more times than any clever abstraction.

Why "exactly once" doesn't exist

Picture the simplest possible flow. Your service calls a bank API to send a payment. You fire the request. The bank processes it. And then, right before the 200 OK reaches you, the connection drops.

What happened? You have no idea. Maybe the bank sent the money and you lost the response. Maybe the request never arrived. From where you are standing, a successful payment and a failed one look identical.

Now multiply that uncertainty across everything that retries in a real system:

HTTP clients retry on timeouts.
Kafka and RabbitMQ redeliver messages when a consumer does not ack in time.
Load balancers and gateways replay requests.
Users double-click the "Pay" button because the spinner felt slow.

Every one of those is a feature for availability and a footgun for money. The network does not promise to deliver your request exactly once. Once retries are in the picture, the best you get is at least once, and the rest is quietly left to you.
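
To make the dropped-response case concrete, here is a minimal simulation. Everything in it is made up for illustration: FlakyBank is an in-memory stand-in rather than a real bank client, and payNaively is the kind of retry loop that treats every I/O error as "the payment did not happen."

```kotlin
import java.io.IOException

// Hypothetical in-memory stand-in for a bank API; not a real client.
class FlakyBank(private var responsesToDrop: Int) {
    // Amounts the bank actually sent, whether or not the caller heard back.
    val ledger = mutableListOf<Long>()

    fun send(amountCents: Long): String {
        ledger += amountCents // the bank does the work first...
        if (responsesToDrop > 0) {
            responsesToDrop--
            // ...and only then is the response lost on the way back.
            throw IOException("connection reset before the response arrived")
        }
        return "OK"
    }
}

// A naive caller: any I/O error is assumed to mean "nothing happened", so it retries.
fun payNaively(bank: FlakyBank, amountCents: Long, maxAttempts: Int = 3): String {
    var lastError: IOException? = null
    for (attempt in 1..maxAttempts) {
        try {
            return bank.send(amountCents)
        } catch (e: IOException) {
            // Cannot tell "never arrived" apart from "arrived, reply lost".
            lastError = e
        }
    }
    throw lastError ?: IllegalStateException("maxAttempts must be at least 1")
}
```

With one dropped response, payNaively still reports success, but the bank's ledger holds two entries for the same payment. The caller sees nothing that distinguishes this run from a clean one.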

The fix: make the request carry its own identity

An idempotency key is a unique token the caller generates for a logical operation and reuses on every retry of that same operation. The server makes a promise back: for a given key, I will do the work at most once, and return the same result every time you ask.

The client side is trivial. Generate the key once, before the first attempt, and hold it across retries:

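import java.util.UUID

// Generate ONCE per logical operation, not per HTTP attempt.
val idempotencyKey = UUID.randomUUID().toString()

// `retry` stands for whatever retry helper or client plugin you already use.
retry(times = 3) {
    httpClient.post("/payments") {
        // Every attempt carries the same key, so the server can recognise a retry.
        header("Idempotency-Key", idempotencyKey)
        setBody(PaymentRequest(amount, destination))
    }
}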
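
For readers who want something they can run without an HTTP stack, here is a rough sketch of the same idea. The retry helper, RecordingTransport, and sendPayment are all hypothetical names invented for this example; the transport just records which key each attempt carried and fails a configurable number of times. There is no backoff, which a real client would want.

```kotlin
import java.io.IOException
import java.util.UUID

// Hypothetical helper: runs `block` up to `times` times and rethrows the last I/O failure.
fun <T> retry(times: Int, block: () -> T): T {
    require(times >= 1) { "times must be at least 1" }
    var lastError: IOException? = null
    for (attempt in 1..times) {
        try {
            return block()
        } catch (e: IOException) {
            lastError = e
        }
    }
    // Non-null here: the loop ran at least once and every attempt failed.
    throw lastError!!
}

// Stand-in for the HTTP call: remembers the key of every attempt it receives.
class RecordingTransport(private var failuresLeft: Int) {
    val keysSeen = mutableListOf<String>()

    fun post(idempotencyKey: String): String {
        keysSeen += idempotencyKey
        if (failuresLeft > 0) {
            failuresLeft--
            throw IOException("timeout")
        }
        return "201 Created"
    }
}

fun sendPayment(transport: RecordingTransport): String {
    // Created outside the retry block, so all attempts share it.
    val idempotencyKey = UUID.randomUUID().toString()
    return retry(times = 3) { transport.post(idempotencyKey) }
}
```

The bug to avoid is moving the UUID.randomUUID() call inside the retry block. That compiles and looks harmless, but every attempt then presents itself to the server as a brand new payment.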

The interesting half lives on the server. The naive version, check if the key exists then insert, has a race. Two redeliveries arriving at once both pass the check, and you are back to paying twice. The fix is to let the database be the referee with a unique constraint:

@Transactional
fun process(key: String, request: PaymentRequest): PaymentResult {
    // Fast path: this key was already claimed, so hand back what we stored.
    val existing = paymentRepository.findByIdempotencyKey(key)
    if (existing != null) return existing.toResult()

    // First writer wins. A unique index on idempotency_key makes the DB
    // the single source of truth for "have I seen this before?"
    // The try wraps only the insert, so a failure inside bank.send is never
    // mistaken for a lost race.
    val payment = try {
        paymentRepository.saveAndFlush(
            Payment(idempotencyKey = key, status = PENDING, request = request)
        )
    } catch (e: DataIntegrityViolationException) {
        // Lost the insert race. A concurrent request already claimed this key,
        // so return its result instead of doing the work again.
        // Caveat: with Spring and JPA, a failed flush usually marks the current
        // transaction rollback-only, so this lookup may need its own transaction.
        return paymentRepository.findByIdempotencyKey(key)!!.toResult()
    }

    // Only the request that won the insert reaches the side effect.
    val result = bank.send(request)
    payment.markComplete(result)
    return payment.toResult()
}

The unique constraint is the whole trick. You are not checking for duplicates, you are making them impossible to commit, then handling the rejection gracefully.
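
If you want to watch "first writer wins" happen without a database, the sketch below swaps the unique index for a ConcurrentHashMap. This is an analogy, not the implementation above: InMemoryProcessor and raceSameKey are hypothetical names, and computeIfAbsent stands in for the constraint by running the side effect at most once per key while concurrent callers for that key wait for the stored value.

```kotlin
import java.util.concurrent.ConcurrentHashMap
import java.util.concurrent.CountDownLatch
import java.util.concurrent.atomic.AtomicInteger
import kotlin.concurrent.thread

// Hypothetical in-memory processor; the map plays the role of the unique index.
class InMemoryProcessor(private val sideEffect: (String) -> String) {
    private val results = ConcurrentHashMap<String, String>()

    // computeIfAbsent applies the function at most once per key and stores its result;
    // every other caller with the same key gets that stored result back.
    fun process(key: String, request: String): String =
        results.computeIfAbsent(key) { sideEffect(request) }
}

// Fires `callers` threads at the same key at once and reports
// how many side effects ran and which distinct results came back.
fun raceSameKey(callers: Int): Pair<Int, Set<String>> {
    val sideEffects = AtomicInteger(0)
    val processor = InMemoryProcessor { request ->
        "payment-${sideEffects.incrementAndGet()} for $request"
    }
    val start = CountDownLatch(1)
    val seen = ConcurrentHashMap.newKeySet<String>()
    val threads = List(callers) {
        thread {
            start.await() // line everyone up so the requests really do collide
            seen.add(processor.process("key-123", "invoice-42"))
        }
    }
    start.countDown()
    threads.forEach { it.join() }
    return sideEffects.get() to seen.toSet()
}
```

However many callers race, the side effect runs once and everyone gets the same answer. I would not ship this shape for a slow bank call, since computeIfAbsent can block other updates to the map while the function runs, and nothing survives a restart. The database version exists precisely because it handles both.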

The details that bite you later

Getting the happy path right is easy. These are the things I have learned to nail down before shipping:

Store the response, not just the key. A retry should get the same answer as the original (same payment ID, same status), not a fresh 409. The caller should not have to care that it was a retry.
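Persist the key before the side effect, in the same transaction as the rest of your database writes. If you call the bank first and crash before recording the key, you have lost the race against your own retry. Keep in mind that a database transaction cannot roll back the bank call: if the process dies after the bank accepts the payment but before the commit, the key record is lost with it. Committing the PENDING record before the call, or forwarding the same key to the bank when its API accepts one, closes that gap.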
Give keys a TTL. They cannot live forever. Pick a window that comfortably outlasts your longest retry policy (24 to 72 hours is usually plenty) and expire them.
Scope the key correctly. An idempotency key for "pay invoice #42" should be derived from something stable about that invoice, so a genuine retry reuses it but a legitimately new payment gets a fresh one.
Make it boring. This logic should be a thin, well-tested wrapper that every money-moving endpoint shares, not something each feature reinvents.
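
A few of these points fit in one small sketch: a key derived from stable data, a stored response that retries get back unchanged, and a TTL. All names here are hypothetical (keyForInvoicePayment, StoredResponse, IdempotencyStore), the installment number is just one possible way to tell a new payment from a retry, and the store is a single-process map where a real one would be a table with a unique index.

```kotlin
import java.time.Duration
import java.time.Instant
import java.util.UUID

// Hypothetical key scheme: the same invoice and installment always map to the same key.
fun keyForInvoicePayment(invoiceId: String, installment: Int): String =
    UUID.nameUUIDFromBytes("pay-invoice:$invoiceId:$installment".toByteArray()).toString()

// What a retry gets back: the original outcome, not a fresh one.
data class StoredResponse(val paymentId: String, val status: String, val storedAt: Instant)

// Hypothetical single-process store. `now` is injectable so expiry can be tested.
class IdempotencyStore(
    private val ttl: Duration,
    private val now: () -> Instant = { Instant.now() }
) {
    private val entries = HashMap<String, StoredResponse>()

    // Runs `work` only if there is no unexpired entry for `key`;
    // otherwise returns the stored response untouched.
    @Synchronized
    fun execute(key: String, work: () -> Pair<String, String>): StoredResponse {
        val cached = entries[key]
        if (cached != null && cached.storedAt.plus(ttl).isAfter(now())) return cached

        val (paymentId, status) = work()
        return StoredResponse(paymentId, status, now()).also { entries[key] = it }
    }
}
```

Note what happens after the TTL: the entry no longer counts, so the same key runs the work again. That is why the window has to outlast every retry policy that could still be holding the key.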

Why I keep coming back to it

There is a principle under all of this that goes well beyond payments. In a distributed system you cannot control how many times a message arrives, so you control what happens when it arrives more than once. You stop fighting the network's "at least once" and instead make your operations safe to repeat.

Do that, and a dropped connection at the worst possible moment stops being a 2am incident and becomes a non-event. The retry sails through, the key says "already done," and nobody gets paid twice. That is the actual job a lot of the time. Turning scary uncertainty into something boring. Boring means people got paid the right amount.
