Implementing Last-Write-Wins in CouchDB

You reached this page because a CouchDB database is accumulating conflicting leaf revisions and you need a deterministic Last-Write-Wins (LWW) resolver that promotes the newest write and tombstones the losers without stalling replication. LWW is the cheapest strategy in the wider decision covered by algorithm selection for merge, and it is the right default for append-heavy telemetry and sensor metrics where a discarded overwrite is acceptable. This guide walks the exact incident path: confirm the symptom, apply a timezone-safe timestamp comparison across every branch, commit the winner plus loser tombstones in one _bulk_docs batch, then verify the conflict rate actually drops. The failure the whole procedure guards against is clock skew — CouchDB’s _rev ordering is topological, never chronological, so LWW only works when timestamps are trustworthy.

Immediate Triage & Prerequisites

Before writing any resolution logic, confirm that you actually have an LWW-shaped problem and not a filter or auth failure masquerading as one. LWW degradation rarely surfaces as an application error; it shows up as silent divergence, a lagging _changes feed, and growing _conflicts arrays.

Confirm conflicts exist. Read a suspect document with the conflicts flag: curl 'http://localhost:5984/iot_telemetry/sensor-42?conflicts=true'. A non-empty _conflicts array is the signal that competing leaves are being retained. CouchDB never auto-prunes these leaves — they survive compaction until you delete them.
Check replication health. Query GET /_active_tasks (or GET /_scheduler/jobs) and filter for type: "replication" rows with doc_write_failures > 0. If pending changes exceed a few hundred per node, pause continuous replication by setting "continuous": false on the relevant job to stop write amplification while you stabilise the resolver.
Verify an application timestamp exists. Every writer (edge device, mobile client, backend) must stamp a field such as updated_at or ts. Grep a document for it before trusting LWW. If it is missing, temporal resolution is impossible, and you must first choose a different strategy, starting from the conflict generation models analysis.
Environment. Python 3.10+ and a single HTTP client (requests or httpx) are all the resolver needs. Run one resolver replica per replication partition; two racing the same partition just doubles 409 pressure.
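The first and third triage checks can be folded into one small helper. The following is only a sketch: triage is a hypothetical name, it expects a document body already fetched with ?conflicts=true, and it assumes the application timestamp lives in updated_at or ts. It inspects the current winner only, so it does not prove that every losing leaf also carries a timestamp.

```python
def triage(doc: dict, ts_fields: tuple[str, ...] = ("updated_at", "ts")) -> dict:
    """Summarise whether a fetched document looks like a candidate for LWW."""
    conflicts = doc.get("_conflicts", [])
    # First timestamp field that is present and non-empty, if any.
    ts_field = next((f for f in ts_fields if doc.get(f)), None)
    return {
        "conflicted": bool(conflicts),
        "leaf_count": len(conflicts) + 1,  # losing leaves plus the current winner
        "timestamp_field": ts_field,
        "lww_ready": bool(conflicts) and ts_field is not None,
    }
```

A document that reports conflicted but not lww_ready is the case that should go to a different strategy rather than through the resolver.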

The reason _rev cannot substitute for a timestamp is worth internalising: revision IDs are generated by the write node from content and lineage, as detailed in revision tree mechanics, so 2-abc is not “newer in time” than 2-def. Only an application clock encodes wall-time intent.
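To make the point concrete, the sketch below splits revision IDs into their two parts. split_rev is a hypothetical helper and the revision strings are shortened, illustrative values rather than real CouchDB hashes.

```python
def split_rev(rev: str) -> tuple[int, str]:
    """Split '2-abc' into (2, 'abc'): generation count and hash suffix."""
    generation, _, suffix = rev.partition("-")
    return int(generation), suffix


# Two sibling revisions from concurrent edits share a generation number,
# so nothing in either ID says which one was written later.
siblings = [split_rev("2-abc"), split_rev("2-def")]
same_generation = siblings[0][0] == siblings[1][0]

# Plain string comparison does not even follow generation order:
# '10-abc' sorts before '9-abc' because '1' < '9'.
string_order_misleads = "10-abc" < "9-abc"
```

Comparing the raw strings is still useful as a deterministic tie-break, because every worker computes the same answer, but it carries no information about time.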

Step-by-Step Implementation

Each step below includes a verification you can run before moving on. The overall flow is: read all leaves, pick the newest, then commit the winner plus tombstones.

Fetch the document with its conflicts. Request the winning body and the computed loser list in one call: GET /iot_telemetry/{id}?conflicts=true. The response carries _rev (the current read-winner) and a _conflicts array of losing leaf revisions. Verify: assert body.get("_conflicts"), otherwise there is nothing to resolve.
Fetch every losing leaf body. The _conflicts array holds only revision IDs, not bodies. Retrieve each with GET /iot_telemetry/{id}?rev={rev} so you can read its updated_at. Verify: the number of fetched branch bodies equals len(_conflicts) + 1 (losers plus the current winner).
Select the winner by timestamp. Compare the application timestamp across all branches. Parse every value as timezone-aware UTC to avoid regional-offset inversion. When two timestamps tie, break the tie deterministically on the lexicographically highest _rev so parallel workers always agree. Verify: assert parse_ts(winner["updated_at"]) == max(parse_ts(b["updated_at"]) for b in branches), comparing parsed values rather than raw strings, because raw strings with different offsets do not sort chronologically.
Build one _bulk_docs batch. Put the surviving fields (stamped against the current winner’s _rev) first, then a {"_id": id, "_rev": rev, "_deleted": true} tombstone for every losing leaf. Writing the winner alone leaves the document conflicted — the conflict only clears when the losers are tombstoned. Verify: assert len(batch["docs"]) == len(_conflicts) + 1.
POST the batch and handle conflicts. POST /iot_telemetry/_bulk_docs. CouchDB reports a stale _rev per document, as an "error": "conflict" row in the response array, not as an HTTP 409 on the request itself, so inspect every row. A conflict row means a _rev went stale between read and write; re-read the document, rebuild the batch against the fresh _rev, and retry with exponential backoff. Persist race-prone retries alongside your broader error handling & retry logic. Verify: re-read with ?conflicts=true and assert the _conflicts key is now absent.
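Step 3 is where most LWW bugs hide, so here is a small worked example of the regional-offset inversion it warns about. It is a sketch under stated assumptions: to_utc is a hypothetical helper, and the two branches are invented sample data.

```python
from datetime import datetime, timezone


def to_utc(value: str) -> datetime:
    """Parse ISO-8601 (trailing 'Z' or explicit offset) into aware UTC."""
    dt = datetime.fromisoformat(value.replace("Z", "+00:00"))
    if dt.tzinfo is None:
        return dt.replace(tzinfo=timezone.utc)  # assumption: naive means UTC
    return dt.astimezone(timezone.utc)


branches = [
    {"_rev": "3-aaa", "updated_at": "2024-05-01T10:00:00+02:00"},  # 08:00 UTC
    {"_rev": "3-bbb", "updated_at": "2024-05-01T09:00:00Z"},       # 09:00 UTC
]

# Raw string comparison picks the older write: "10:00" sorts after "09:00".
by_string = max(branches, key=lambda b: b["updated_at"])

# Parsed comparison picks the write that is actually newest; _rev breaks ties.
by_time = max(branches, key=lambda b: (to_utc(b["updated_at"]), b["_rev"]))
```

Here by_string selects 3-aaa while by_time selects 3-bbb, which is why the comparison key has to be the parsed value.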

Complete Working Example

This self-contained script resolves a single conflicted document end to end. It uses timezone-aware UTC parsing, a deterministic tie-break, a single _bulk_docs batch, and bounded backoff when a revision goes stale. Note that _bulk_docs is not transactional: each document in the batch succeeds or fails on its own. Set COUCH_URL and DOC_ID in the environment before running.

import os
import time
from datetime import datetime, timezone

import requests


def parse_ts(value: str) -> datetime:
    """Parse an application timestamp as timezone-aware UTC.

    Accepts ISO-8601 with a trailing 'Z' or an explicit offset; naive
    values are assumed UTC so a missing offset can never invert LWW.
    """
    dt = datetime.fromisoformat(value.replace("Z", "+00:00"))
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)
    return dt.astimezone(timezone.utc)


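def fetch_branches(db_url: str, doc_id: str) -> tuple[dict, list[dict]]:
    """Return (winner_body, [all_branch_bodies]) for a conflicted doc."""
    resp = requests.get(f"{db_url}/{doc_id}", params={"conflicts": "true"}, timeout=10)
    resp.raise_for_status()  # a 404 or 401 body must not be mistaken for a document
    winner = resp.json()
    branches = [winner]
    for rev in winner.get("_conflicts", []):
        leaf = requests.get(f"{db_url}/{doc_id}", params={"rev": rev}, timeout=10)
        leaf.raise_for_status()
        branches.append(leaf.json())
    return winner, branches


def pick_winner(branches: list[dict]) -> dict:
    """Newest application timestamp wins; ties break on highest _rev."""
    # A branch without updated_at cannot be ordered in time, so refuse to guess.
    missing = [d["_rev"] for d in branches if not d.get("updated_at")]
    if missing:
        raise ValueError(f"branches missing updated_at: {missing}")
    return max(branches, key=lambda d: (parse_ts(d["updated_at"]), d["_rev"]))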


def resolve_lww(db_url: str, doc_id: str, max_retries: int = 5) -> dict:
    """Promote the newest branch and tombstone every losing leaf in one batch."""
    for attempt in range(1, max_retries + 1):
        winner, branches = fetch_branches(db_url, doc_id)
        losers = winner.get("_conflicts", [])
        if not losers:
            return {"ok": True, "note": "no conflict"}

        chosen = pick_winner(branches)
        survivor = {k: v for k, v in chosen.items() if not k.startswith("_")}
        survivor["_id"] = doc_id
        survivor["_rev"] = winner["_rev"]  # write against the current read-winner
        tombstones = [{"_id": doc_id, "_rev": rev, "_deleted": True} for rev in losers]

        resp = requests.post(f"{db_url}/_bulk_docs", json={"docs": [survivor, *tombstones]}, timeout=10)
        resp.raise_for_status()  # request-level failures are not stale-revision races
        results = resp.json()
        # _bulk_docs is not transactional and reports conflicts per row in the
        # response body, so check every row instead of the HTTP status code.
        if not any("error" in row for row in results):
            return {"ok": True, "attempt": attempt, "result": results}

        time.sleep(min(2 ** attempt, 30))  # a _rev went stale; re-read and retry
    raise RuntimeError(f"exhausted retries resolving {doc_id}")


if __name__ == "__main__":
    url = os.environ.get("COUCH_URL", "http://localhost:5984/iot_telemetry")
    doc = os.environ.get("DOC_ID", "sensor-42")
    print(resolve_lww(url, doc))

The same routine slots directly into a streaming worker: subscribe to the _changes feed with include_docs=true and conflicts=true (CouchDB ignores conflicts unless include_docs is set), and call resolve_lww for every document whose _conflicts array is non-empty. When you need per-field preservation instead of whole-document replacement, escalate to the semantic and CRDT variants weighed in algorithm selection for merge.
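As a rough illustration of the streaming-worker idea, the sketch below picks conflicted document IDs out of an already-parsed _changes response. conflicted_ids is a hypothetical helper, and it assumes the feed was requested with include_docs=true and conflicts=true so that each row carries a doc body.

```python
def conflicted_ids(changes: dict) -> list[str]:
    """Document IDs in a parsed _changes response that still carry conflicts."""
    ids = []
    for row in changes.get("results", []):
        doc = row.get("doc") or {}  # absent when include_docs was not requested
        if doc.get("_conflicts"):
            ids.append(row["id"])
    return ids


# Invented sample shaped like a _changes response body.
sample = {
    "results": [
        {"seq": "1-x", "id": "sensor-41", "changes": [{"rev": "2-aaa"}],
         "doc": {"_id": "sensor-41", "_rev": "2-aaa"}},
        {"seq": "2-y", "id": "sensor-42", "changes": [{"rev": "3-bbb"}],
         "doc": {"_id": "sensor-42", "_rev": "3-bbb", "_conflicts": ["3-ccc"]}},
    ],
    "last_seq": "2-y",
}
to_resolve = conflicted_ids(sample)
```

A worker would then call resolve_lww once per ID in to_resolve and persist last_seq so that a restart resumes from the same point in the feed.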

Gotchas & Edge Cases

Clock skew inverts the winner. Wall-clock LWW trusts the writer’s clock. Synchronise edge devices with NTP — or PTP for tighter bounds — and prefer hybrid logical clocks (HLC) over raw timestamps so a lagging device cannot silently overwrite a newer edit.
revs_limit prunes history, not conflicts. Lowering revs_limit via PUT /{db}/_revs_limit (default 1000; 50–100 suits telemetry) trims retained non-leaf history and reduces tree-traversal overhead, but it never removes conflicting leaf branches. Those must still be tombstoned explicitly.
Missing timestamps break temporal ordering. A branch with no updated_at has no position in the ordering: an empty or absent value is not a valid ISO-8601 timestamp and cannot be compared meaningfully. Verify the field exists on all writers, and route documents lacking it to review rather than dropping them or guessing a winner.
Tie collisions need a stable rule. Two branches can share a millisecond. Without the lexicographic _rev tie-break, two workers can pick different winners and re-conflict the document. Keep the tie-break identical across all resolver replicas.
_attachments are per-revision. When a losing leaf carries an attachment the winner lacks, whole-document LWW discards it. If attachments matter, read ?attachments=true and copy them onto the survivor before committing.
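The clock-skew gotcha above recommends hybrid logical clocks. For readers who have not met them, here is a minimal sketch of the commonly published HLC update rules. HybridLogicalClock, tick, and observe are hypothetical names, not CouchDB or library APIs; the sketch assumes each document stores its stamp as a (wall_ms, counter) pair, and now_ms is injectable so the behaviour can be exercised with a fake clock.

```python
import time
from typing import Callable


class HybridLogicalClock:
    """Minimal HLC producing (wall_ms, counter) stamps that compare as tuples."""

    def __init__(self, now_ms: Callable[[], int] = lambda: int(time.time() * 1000)):
        self.now_ms = now_ms
        self.wall = 0
        self.counter = 0

    def tick(self) -> tuple[int, int]:
        """Stamp a local write."""
        pt = self.now_ms()
        if pt > self.wall:
            self.wall, self.counter = pt, 0
        else:
            self.counter += 1  # physical clock stalled or behind: advance logically
        return self.wall, self.counter

    def observe(self, remote: tuple[int, int]) -> tuple[int, int]:
        """Merge a stamp seen on a replicated document, then stamp locally."""
        r_wall, r_counter = remote
        new_wall = max(self.wall, r_wall, self.now_ms())
        if new_wall == self.wall == r_wall:
            self.counter = max(self.counter, r_counter) + 1
        elif new_wall == self.wall:
            self.counter += 1
        elif new_wall == r_wall:
            self.counter = r_counter + 1
        else:
            self.counter = 0
        self.wall = new_wall
        return self.wall, self.counter
```

With a device whose physical clock is stuck at 1000 ms, observing a remote stamp of (5000, 0) moves its clock to (5000, 1), and its next tick yields (5000, 2). A lagging device that has seen a newer edit therefore cannot stamp its next write earlier than that edit. HLC does not help with a device that never observed the newer write, so NTP or PTP discipline still matters.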

Verification & Observability

Confirm the fix at three levels. First, re-read the resolved document with ?conflicts=true and assert the _conflicts key is gone — the authoritative per-document check. Second, watch the CouchDB cluster: GET /_active_tasks should show doc_write_failures flat and the replication backlog draining, while GET /_scheduler/jobs confirms the job is running rather than crashing. Third, emit your own metrics from the resolver — resolution latency (conflict detected to committed _rev), conflicts-resolved count, and escalation rate — and alert when the conflict rate fails to fall after a resolution sweep, which usually means winners are being written without their tombstones. Schedule POST /{db}/_compact in low-traffic windows to physically reclaim storage from pruned branches and deletion tombstones; compaction is what turns cleared conflicts into recovered disk. When timestamp drift or missing metadata makes a document unresolvable, do not force a lossy write — route it through fallback resolution chains and into manual review sync queues with a resolution_status: "manual_review" flag and full audit context.
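The per-document check and the manual-review flag described above can be sketched as two small helpers. Both function names and the resolution_audit field are hypothetical; only resolution_status: "manual_review" comes from the text. The sketch builds the flagged body and leaves the actual write to the caller.

```python
from datetime import datetime, timezone


def is_resolved(doc: dict) -> bool:
    """True when a body fetched with ?conflicts=true has no _conflicts left."""
    return not doc.get("_conflicts")


def flag_for_review(doc: dict, reason: str) -> dict:
    """Return a flagged copy for manual review instead of forcing a lossy write."""
    flagged = dict(doc)
    # _conflicts is computed by CouchDB on read; do not send it back on write.
    conflicts = list(flagged.pop("_conflicts", []))
    flagged["resolution_status"] = "manual_review"
    flagged["resolution_audit"] = {  # hypothetical field carrying audit context
        "reason": reason,
        "conflicts": conflicts,
        "flagged_at": datetime.now(timezone.utc).isoformat(),
    }
    return flagged
```

Recording the unresolved revision IDs in the audit context gives a reviewer the exact leaves to inspect, and counting how often flag_for_review is called gives you the escalation rate metric.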

FAQ

Why can't I just compare _rev numbers to find the latest write?

Because _rev ordering is topological, not chronological. The leading generation integer counts edits along a branch, and the hash suffix is content-derived — neither encodes wall-clock time. Two concurrent edits produce sibling 2-<hash> revisions with no temporal relationship, so LWW must compare an application timestamp such as updated_at, never the _rev.

Do I have to delete the losing revisions, or is writing a new winner enough?

You must tombstone the losers. Writing a new body only adds another leaf; the document stays conflicted until every rev in its _conflicts array is deleted with {"_id": id, "_rev": rev, "_deleted": true}. Sending the survivor and those tombstones in one _bulk_docs batch clears the conflict in a single request; check each row of the response, because _bulk_docs applies documents individually rather than as a transaction.
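Because _bulk_docs reports the outcome per document, it helps to partition the response before deciding whether the conflict is really cleared. The sketch below is illustrative: split_bulk_results is a hypothetical helper, and the sample rows are invented values in the shape of a _bulk_docs response.

```python
def split_bulk_results(results: list[dict]) -> tuple[list[dict], list[dict]]:
    """Partition a parsed _bulk_docs response into (accepted, rejected) rows."""
    accepted = [row for row in results if row.get("ok")]
    rejected = [row for row in results if "error" in row]
    return accepted, rejected


# Invented sample: the survivor and one tombstone landed, one tombstone raced.
sample = [
    {"ok": True, "id": "sensor-42", "rev": "4-aaa"},
    {"ok": True, "id": "sensor-42", "rev": "4-bbb"},
    {"id": "sensor-42", "error": "conflict", "reason": "Document update conflict."},
]
accepted, rejected = split_bulk_results(sample)
needs_retry = any(row["error"] == "conflict" for row in rejected)
```

When needs_retry is true, re-read the document with ?conflicts=true and rebuild the batch from whatever leaves remain, since part of the previous batch may already have been applied.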

How do I stop clock skew from corrupting LWW decisions?

Synchronise every writer with NTP (or PTP for microsecond bounds), prefer hybrid logical clocks over raw wall-clock values, and always parse timestamps as timezone-aware UTC. HLCs keep causal ordering even when two devices disagree on absolute time, which is exactly the failure LWW is most exposed to.
