Understanding MVCC in CouchDB Replication

You reached this page because a CouchDB database kept both of two concurrent writes instead of rejecting one, _conflicts arrays are growing, and you need to understand the Multi-Version Concurrency Control (MVCC) model that produced that outcome before you can operate it. MVCC is the reason replication across a partition never loses a write and never blocks on a lock — but it is also why divergence surfaces as retained conflict leaves rather than a clean error. This page explains how _rev tokens, generations, and the revision tree encode concurrency, then walks the exact steps to inspect a document’s MVCC state, identify the winning branch, and reason about resolution. It sits under Sync Topology Models, because which nodes write to which determines how often MVCC branches diverge in the first place, and it builds on the broader CouchDB Replication Architecture & Revision Fundamentals that governs every sync stream on this site.

Each update appends a new immutable revision rather than mutating the previous one; the prior revisions remain part of the document’s history.

Immediate Triage & Prerequisites

Before reasoning about MVCC, confirm you are looking at genuine multi-version divergence and not a write you simply have not read yet. MVCC never overwrites state in place; every successful write appends a new revision keyed as generation-hash, where the integer counts edit depth along a branch and the hash cryptographically binds the payload. Concurrent writes to the same _id on disconnected nodes therefore produce sibling revisions that both survive replication.

Confirm the document is actually conflicted. Read it with the conflicts flag: curl 'http://localhost:5984/iot_telemetry/sensor-42?conflicts=true'. A non-empty _conflicts array means MVCC retained more than one live leaf. If the _conflicts key is absent (CouchDB omits it when there is nothing to list), the divergence you saw has already converged to a single winner.
Inspect the revision generation. The _rev prefix tells you edit depth, not wall-clock time. Two leaves at 3-<hash> are the same generation on different branches — neither is “newer” chronologically. This distinction is governed by revision tree mechanics, and misreading it is the single most common MVCC mistake.
Check whether replication is still delivering branches. Query GET /_active_tasks (or GET /_scheduler/jobs) and look for type: "replication" rows whose checkpointed_source_seq is still advancing. New conflict leaves can keep arriving until the source drains, so a conflict count that keeps climbing during triage is expected, not a bug.
Environment. Python 3.10+ and a single HTTP client (requests or httpx) are all you need to inspect and resolve MVCC state. Run one resolver replica per replication partition to avoid two workers racing the same document.
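The triage checks above can be sketched as two small pure functions that work on already-parsed JSON. This is a minimal sketch, not a definitive implementation: the function names and the sample payloads are hypothetical, and it assumes each replication row in _active_tasks carries the replication_id and checkpointed_source_seq fields.

```python
from __future__ import annotations


def has_conflicts(doc: dict) -> bool:
    """True when a ?conflicts=true body lists at least one losing leaf."""
    return bool(doc.get("_conflicts"))


def advancing_replications(before: list[dict], after: list[dict]) -> list[str]:
    """Return replication_ids whose checkpoint changed between two polls."""

    def checkpoints(tasks: list[dict]) -> dict[str, object]:
        # Keep only replication rows; other task types (indexer, compaction) are ignored.
        return {
            t["replication_id"]: t.get("checkpointed_source_seq")
            for t in tasks
            if t.get("type") == "replication"
        }

    old, new = checkpoints(before), checkpoints(after)
    return sorted(rid for rid, seq in new.items() if rid in old and old[rid] != seq)


# Illustrative sample data, not captured from a real cluster.
doc = {"_id": "sensor-42", "_rev": "3-b7", "_conflicts": ["3-a1"]}
poll_1 = [
    {"type": "replication", "replication_id": "rep-1", "checkpointed_source_seq": "120-x"},
    {"type": "indexer"},
]
poll_2 = [
    {"type": "replication", "replication_id": "rep-1", "checkpointed_source_seq": "135-y"},
]

print(has_conflicts(doc))                        # True
print(advancing_replications(poll_1, poll_2))    # ['rep-1']
```

The checkpoint is compared for equality only, because sequence values are best treated as opaque tokens rather than numbers you can order.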

The core mental model to carry into the steps below: MVCC turns “two writers disagreed” from an error into a data condition — the divergence is preserved as a branching tree, and it is your pipeline, not CouchDB, that later collapses it. Why those branches appear at all is the subject of conflict generation models; this page is about reading and resolving the state they leave behind.

Step-by-Step Implementation

Each step includes a verification you can run before moving on. The goal is to move from “the document is conflicted” to “I can see every branch, know which one CouchDB reads by default, and can commit a deterministic resolution.”

Read the document with its conflicts. Request the read-winner body and the loser list in one call: GET /iot_telemetry/{id}?conflicts=true. The response carries _rev (the branch CouchDB serves by default) plus a _conflicts array of the other leaf revisions MVCC retained. Verify: assert body.get("_conflicts"), otherwise the document has already converged and there is nothing to resolve.
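Retrieve the full revision tree. To see the branching structure rather than just the leaves, request GET /iot_telemetry/{id}?revs=true&open_revs=all with an Accept: application/json header so the leaves come back as a JSON array rather than a multipart body. This returns every open leaf with its _revisions history array, exposing where the branches forked. Verify: the number of non-deleted leaves returned equals len(_conflicts) + 1; open_revs=all also returns deleted leaves, which _conflicts does not list.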
Read every losing leaf body. The _conflicts array holds only revision IDs. Fetch each with GET /iot_telemetry/{id}?rev={rev} so you can compare payloads across branches. Verify: assert len(branch_bodies) == len(_conflicts) + 1.
Confirm the deterministic winner. CouchDB’s read-winner is not random: among the leaves that are not deleted, it selects the one with the highest generation number, breaking ties on the lexicographically highest _rev hash. Every leaf listed in _conflicts is live, so the check below needs no deleted-leaf handling. Recompute that selection yourself so your pipeline agrees with the CouchDB cluster. Verify: assert winner_rev == max(leaf_revs, key=lambda r: (int(r.split("-")[0]), r.split("-",1)[1])).
Commit a resolution as one batch. Deciding which branch is authoritative for your data is a policy choice weighed in algorithm selection for merge. Whichever branch you choose, write the survivor plus a {"_id": id, "_rev": rev, "_deleted": true} tombstone for every losing leaf in a single POST /iot_telemetry/_bulk_docs. Current CouchDB releases do not apply _bulk_docs transactionally, so check every row of the response for an error rather than assuming all-or-nothing. The conflict clears only when the losers are tombstoned; writing a new body alone only extends one branch and leaves the others open. Verify: re-read with ?conflicts=true and assert the _conflicts key is now absent.
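To make the revision-tree step concrete, the sketch below expands each leaf's _revisions history into full revision IDs and finds the newest revision two branches share, which is where they forked. It assumes the JSON form of the open_revs=all response (a list of {"ok": leaf} rows). The helper names and the sample response are hypothetical.

```python
from __future__ import annotations


def rev_path(leaf: dict) -> list[str]:
    """Expand a leaf's _revisions into full rev IDs, newest first."""
    revs = leaf["_revisions"]
    start = revs["start"]
    # ids are listed newest first; the generation counts down from start.
    return [f"{start - i}-{digest}" for i, digest in enumerate(revs["ids"])]


def fork_point(leaf_a: dict, leaf_b: dict) -> str | None:
    """Return the newest revision both branches share, or None if none is visible."""
    ancestors_b = set(rev_path(leaf_b))
    for rev in rev_path(leaf_a):  # walk branch A from newest to oldest
        if rev in ancestors_b:
            return rev
    return None


# Illustrative response shape for ?revs=true&open_revs=all (hashes shortened).
response = [
    {"ok": {"_id": "sensor-42", "_rev": "3-aaa3",
            "_revisions": {"start": 3, "ids": ["aaa3", "aaa2", "root"]}}},
    {"ok": {"_id": "sensor-42", "_rev": "2-bbb2",
            "_revisions": {"start": 2, "ids": ["bbb2", "root"]}}},
]
leaves = [row["ok"] for row in response if "ok" in row]

print(rev_path(leaves[0]))               # ['3-aaa3', '2-aaa2', '1-root']
print(fork_point(leaves[0], leaves[1]))  # 1-root
```

A None result does not prove the branches are unrelated: the shared ancestor may simply have been trimmed from the stored history by _revs_limit.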

Complete Working Example

This self-contained script inspects a document’s MVCC state and prints its revision tree, the deterministic read-winner, and the retained conflict leaves — the exact information you need before choosing a resolution. It reproduces CouchDB’s winner ordering (generation first, hash tie-break) so you can verify your pipeline agrees with the CouchDB cluster. Set COUCH_URL and DOC_ID before running.

import os

import requests


def parse_rev(rev: str) -> tuple[int, str]:
    """Split a _rev token into its (generation, hash) sort key.

    CouchDB orders leaves by highest generation, then by the
    lexicographically highest hash suffix as a deterministic tie-break.
    """
    gen, _, digest = rev.partition("-")
    return int(gen), digest


def inspect_mvcc(db_url: str, doc_id: str) -> dict:
    """Return the MVCC state of a document: read-winner, leaves, conflicts."""
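    resp = requests.get(
        f"{db_url}/{doc_id}", params={"conflicts": "true"}, timeout=10
    )
    # Surface a 404/401 here instead of a KeyError on "_rev" further down.
    resp.raise_for_status()
    winner = resp.json()
    conflicts = winner.get("_conflicts", [])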

    # All open leaf revisions = the current read-winner plus every conflict.
    leaf_revs = [winner["_rev"], *conflicts]
    # Recompute the winner CouchDB would serve, independently of the response.
    computed_winner = max(leaf_revs, key=parse_rev)

    leaves = []
    for rev in leaf_revs:
        resp = requests.get(
            f"{db_url}/{doc_id}", params={"rev": rev}, timeout=10
        )
        resp.raise_for_status()
        gen, _ = parse_rev(rev)  # only the generation is reported per leaf
        leaves.append({"rev": rev, "generation": gen, "body": resp.json()})

    return {
        "doc_id": doc_id,
        "read_winner": winner["_rev"],
        "computed_winner": computed_winner,
        "conflicts": conflicts,
        "leaf_count": len(leaf_revs),
        "leaves": leaves,
    }


if __name__ == "__main__":
    url = os.environ.get("COUCH_URL", "http://localhost:5984/iot_telemetry")
    doc = os.environ.get("DOC_ID", "sensor-42")
    state = inspect_mvcc(url, doc)

    print(f"document:        {state['doc_id']}")
    print(f"read-winner:     {state['read_winner']}")
    print(f"computed-winner: {state['computed_winner']}")
    assert state["read_winner"] == state["computed_winner"], "winner mismatch"

    if not state["conflicts"]:
        print("status:          converged (no conflict leaves)")
    else:
        print(f"status:          {state['leaf_count']} divergent leaves")
        for leaf in state["leaves"]:
            marker = " <-- winner" if leaf["rev"] == state["computed_winner"] else ""
            print(f"  gen {leaf['generation']:>3}  {leaf['rev']}{marker}")

Because the script only reads, it is safe to run against production during an incident. Once it confirms which leaves exist, feed the same document into your resolver: apply your chosen merge policy, then commit the survivor plus loser tombstones in one _bulk_docs batch. Wrap the write path in the retry discipline from error handling & retry logic, because a read-winner _rev that went stale between inspection and commit is rejected: a single-document PUT answers with HTTP 409, while _bulk_docs reports a per-document conflict error inside an otherwise successful response.
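The commit step might look like the following sketch. It only builds the _bulk_docs payload and inspects a response; the actual POST is left to your HTTP client. The function names and sample values are hypothetical. It also assumes you write the survivor on top of the current read-winner revision, which is one reasonable choice rather than the only one.

```python
from __future__ import annotations


def build_resolution_batch(survivor: dict, winner_rev: str, loser_revs: list[str]) -> dict:
    """Build a _bulk_docs payload: merged survivor plus one tombstone per loser."""
    doc_id = survivor["_id"]
    # Drop the read-only _conflicts field so it is not sent back to the server.
    merged = {k: v for k, v in survivor.items() if k != "_conflicts"}
    # Extend the current read-winner branch with the merged body.
    merged["_rev"] = winner_rev
    tombstones = [
        {"_id": doc_id, "_rev": rev, "_deleted": True} for rev in loser_revs
    ]
    return {"docs": [merged, *tombstones]}


def failed_rows(bulk_response: list[dict]) -> list[dict]:
    """Rows of a _bulk_docs response that report an error, such as a conflict."""
    return [row for row in bulk_response if "error" in row]


# Illustrative values only.
survivor = {"_id": "sensor-42", "_rev": "3-b7", "_conflicts": ["3-a1"], "temp": 21}
batch = build_resolution_batch(survivor, winner_rev="3-b7", loser_revs=["3-a1"])
print(batch)

sample_response = [
    {"ok": True, "id": "sensor-42", "rev": "4-c9"},
    {"id": "sensor-42", "error": "conflict", "reason": "Document update conflict."},
]
print(failed_rows(sample_response))
```

If failed_rows returns anything, re-read the document with ?conflicts=true and rebuild the batch from the fresh state rather than replaying the old payload.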

Gotchas & Edge Cases

Generation is depth, not time. A 4-<hash> leaf is not automatically newer in wall-clock terms than a 3-<hash> sibling on another branch — it only had one more edit along its own lineage. MVCC encodes causality within a branch, never a global clock, so temporal resolution requires an application timestamp field.
The winner can flip after a delete. Tombstoning the current read-winner promotes the next-highest leaf to winner. If you delete losers and the winner in the same careless batch, the document can resurrect an older branch. Always tombstone only the leaves you intend to discard.
_revs_limit prunes internal history, not conflict leaves. Lowering _revs_limit (default 1000) via PUT /{db}/_revs_limit trims non-leaf ancestors and reduces tree-traversal cost, but it never removes a conflicting leaf. Those survive compaction until explicitly tombstoned.
Deleted-vs-active branch collisions stall sync. When a _deleted: true tombstone on one branch races an active edit on another, MVCC keeps both. Pruning intermediate revisions before resolving this can make the winning branch unreachable and freeze replication for that document.
Attachments are per-revision. If a losing leaf carries an entry in _attachments that the winner lacks, discarding that branch discards the attachment. Read the leaf with ?attachments=true and copy the attachment onto the survivor before committing if it matters.
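The first gotcha is easiest to see with two leaves where generation and wall-clock time disagree. The sketch below assumes each leaf body carries an application timestamp in a hypothetical recorded_at field (ISO 8601 with an explicit offset). CouchDB does not provide such a field; your writers would have to set it.

```python
from datetime import datetime


def parse_rev(rev: str) -> tuple[int, str]:
    """Split a _rev token into its (generation, hash) sort key."""
    gen, _, digest = rev.partition("-")
    return int(gen), digest


def latest_by_timestamp(leaves: list[dict], field: str = "recorded_at") -> dict:
    """Pick the leaf whose application timestamp is most recent."""
    return max(leaves, key=lambda leaf: datetime.fromisoformat(leaf[field]))


# Illustrative leaves: the deeper branch holds the older reading.
leaves = [
    {"_rev": "4-aaa", "recorded_at": "2024-05-01T10:00:00+00:00", "temp": 20},
    {"_rev": "3-zzz", "recorded_at": "2024-05-01T10:05:00+00:00", "temp": 23},
]

by_generation = max(leaves, key=lambda leaf: parse_rev(leaf["_rev"]))
by_time = latest_by_timestamp(leaves)

print(by_generation["_rev"])  # 4-aaa
print(by_time["_rev"])        # 3-zzz
```

A timestamp policy is only as good as the clocks behind it, so treat leaves with missing or implausible timestamps as unresolvable rather than guessing.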

Verification & Observability

Confirm resolution at three levels. First, per document: re-read with ?conflicts=true and assert the _conflicts key is gone — the authoritative check that MVCC has converged to a single leaf. Second, watch the CouchDB cluster: GET /_active_tasks should show doc_write_failures flat and the replication backlog draining, while GET /_scheduler/jobs confirms the job is running rather than crashing. Third, emit your own metrics — conflicts-resolved count, resolution latency (leaf detected to committed _rev), and the fraction of documents whose conflict count is rising — and alert when a resolution sweep fails to drop the conflict rate, which usually means winners are being written without their tombstones. Schedule POST /{db}/_compact in low-traffic windows to physically reclaim the storage that pruned branches and tombstones leave behind; compaction is what turns a cleared MVCC conflict into recovered disk. When timestamp drift or missing metadata makes a branch genuinely unresolvable, do not force a lossy overwrite — route it through fallback resolution chains and into manual review sync queues with full audit context.
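One of the metrics above, the fraction of documents whose conflict count is rising, can be derived from two snapshots taken between sweeps. This is a sketch under assumptions: the snapshot shape (document ID mapped to its number of conflict leaves) and the function name are hypothetical, and how you collect the snapshots is up to your pipeline.

```python
from __future__ import annotations


def rising_fraction(previous: dict[str, int], current: dict[str, int]) -> float:
    """Fraction of currently tracked documents whose conflict count went up."""
    if not current:
        return 0.0
    rising = sum(
        1
        for doc_id, count in current.items()
        # A document missing from the earlier snapshot counts as having had zero conflicts.
        if count > previous.get(doc_id, 0)
    )
    return rising / len(current)


# Illustrative snapshots: doc_id -> number of entries in _conflicts.
before_sweep = {"sensor-1": 2, "sensor-2": 1, "sensor-3": 0}
after_sweep = {"sensor-1": 3, "sensor-2": 1, "sensor-3": 0, "sensor-4": 1}

print(rising_fraction(before_sweep, after_sweep))  # 0.5
```

A value that stays high after a resolution sweep is the signal described above: survivors are probably being written without their tombstones.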

FAQ

Why did CouchDB keep both writes instead of returning a conflict error?

Because MVCC is append-only and never blocks on a lock. When two writers edit the same _id on disconnected nodes, each write succeeds locally against its own _rev and appends a new revision. Replication then carries both branches into the same database, where MVCC retains them as sibling leaves and marks the non-winning ones in _conflicts. You only see a 409 when you write against a _rev that is no longer current on the same node — cross-node divergence is preserved as data, not rejected.

How does CouchDB decide which revision I read by default?

It is fully deterministic and identical on every replica: among leaves that are not deleted, the one with the highest generation number wins, and ties are broken by choosing the lexicographically highest _rev hash. This ordering is topological, not chronological. It guarantees every node converges on the same read-winner without coordination, but it says nothing about which write happened later in real time.

Does lowering _revs_limit reduce my conflict count?

No. _revs_limit bounds how much non-leaf ancestry each branch keeps; it lowers metadata overhead and speeds tree traversal, but it does not touch conflict leaves. A conflict is two or more open leaves, and only tombstoning the losers removes one. Set _revs_limit for storage economy, and resolve conflicts explicitly for correctness.
