How CouchDB Revision IDs Are Generated

You are staring at a cascade of 409 Conflict responses, a growing replication backlog, and two nodes that disagree about which copy of a document is current — and it all traces back to a misunderstanding of the _rev field. In distributed IoT fleets, mobile backends, and edge-sync pipelines the revision ID is routinely mistaken for a sequence counter or a server-assigned UUID. It is neither. _rev is a content-derived lineage token that records a document’s position in its history and lets replication decide which revisions a node is still missing. This page explains exactly how CouchDB computes a revision identifier, how to read the generation counter and hash it encodes, and how to write against the correct _rev so your pipeline stops manufacturing phantom conflicts. It is the field-level companion to revision tree mechanics; if you have not yet mapped how those IDs assemble into a branching tree, start there and come back.

How a revision id is derived on the server: CouchDB hashes the document body, the _deleted flag, the attachment stubs and the parent _rev with MD5, renders the 16-byte digest as 32 hex characters, and prefixes the parent generation incremented by one — yielding N+1-<hash>. None of this is reproducible client-side.
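To make the shape of that derivation concrete, here is a minimal sketch under loud assumptions. The function name `illustrative_next_rev` is hypothetical, and sorted-key JSON stands in for the server's internal encoding, so the digest it prints will not match anything a real CouchDB node produces. It only shows which inputs feed the hash and how the generation prefix is formed.

```python
import hashlib
import json


def illustrative_next_rev(parent_rev, body, deleted=False, attachment_stubs=None):
    """Toy model of the N+1-<hash> shape; NOT CouchDB's real serialization."""
    # A brand-new document has no parent, so its parent generation is 0.
    parent_gen = int(parent_rev.split("-", 1)[0]) if parent_rev else 0
    # Assumption: sorted-key JSON stands in for the server's internal encoding.
    material = json.dumps(
        {
            "body": body,
            "deleted": deleted,
            "attachments": attachment_stubs or [],
            "parent": parent_rev,
        },
        sort_keys=True,
        separators=(",", ":"),
    ).encode("utf-8")
    digest = hashlib.md5(material).hexdigest()  # 16 bytes rendered as 32 hex chars
    return f"{parent_gen + 1}-{digest}"


rev = illustrative_next_rev("3-c20e9f1d8a5b4e2f7c1a0b6d3e8f9a12", {"reading": 21.4})
generation, digest = rev.split("-", 1)
print(generation, len(digest))  # prints: 4 32
```

Changing any input (the body, the deletion flag, the stubs, or the parent) changes the digest, which is the property the rest of this page relies on. Treat the output as a teaching aid only and never send it to a server as a `_rev`.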

Immediate Triage / Prerequisites

Before changing any code, confirm the symptom is genuinely a revision-tracking problem and not a topology or auth fault. Grep the CouchDB log for the write-conflict signature and correlate it with stalled jobs:

```
# 409s on writes surface as doc_update_conflict in the CouchDB log
grep "doc_update_conflict" /var/log/couchdb/couch.log | tail -n 20

# confirm the replication job that is choking, and on what sequence
curl -s http://localhost:5984/_active_tasks | \
  python3 -c "import sys,json;[print(t['doc_id'],t.get('changes_pending'),t.get('through_seq')) for t in json.load(sys.stdin) if t['type']=='replication']"
```

If those 409s pile up on a handful of document IDs, the pipeline is almost certainly writing against a stale _rev. Prerequisites for the steps below: Python 3.8+ and the requests library (pip install requests), plus network reach to the database. One rule to internalize before you write a single line: you cannot compute a valid _rev on the client. CouchDB does not specify a portable JSON canonicalization, so two servers writing the “same” content are not guaranteed to produce the same digest — always fetch the current _rev from the server with a GET or HEAD before issuing a conditional write.

Step-by-Step Implementation

Follow these steps to read, decode, and correctly extend a document’s revision lineage. Each step includes a command or assertion so you can verify state before proceeding.

Fetch the current revision. Issue a HEAD to read the winning _rev from the ETag header without transferring the body:
```
curl -sI http://localhost:5984/iot_telemetry/sensor-42 | grep -i etag
# ETag: "3-c20e9f1d8a5b4e2f7c1a0b6d3e8f9a12"
```
Verify the header is present and quoted. If grep prints nothing, the server answered 404 and the document does not exist yet, so your write must omit _rev entirely.
Decode the N-<hash> structure. A revision ID strictly follows the N-<hash> format: N is the integer generation count (depth in the tree) and <hash> is a 32-character hex encoding of a 16-byte MD5 digest. Split it and assert the shape:
```
rev = "3-c20e9f1d8a5b4e2f7c1a0b6d3e8f9a12"
generation, digest = rev.split("-", 1)
assert generation.isdigit() and len(digest) == 32
```
The generation tells you how many writes deep this branch is; the digest is what makes the revision content-addressable so replicas can agree on identity without a coordinator.
Understand what the server hashes. When CouchDB accepts your write it computes the next digest on the server from the new document content and metadata — the body, the _deleted flag, attachment stubs, and the parent revision ID — not from your raw HTTP payload. The MD5 primitive itself is specified in RFC 1321. The resulting revision is N+1-<newhash>, extending the branch by one generation. There is nothing for you to reproduce here; your only job is to supply the correct parent _rev.
Write conditionally against the fetched _rev. PUT the new body and pass the parent revision in the If-Match header so CouchDB extends the winning branch and rejects the write if that revision is no longer current:
```
curl -s -X PUT http://localhost:5984/iot_telemetry/sensor-42 \
  -H "If-Match: 3-c20e9f1d8a5b4e2f7c1a0b6d3e8f9a12" \
  -d '{"reading": 21.4, "updated_at": "2026-07-04T10:00:00Z"}'
# {"ok":true,"id":"sensor-42","rev":"4-a91f..."}
```
Verify the response contains "ok":true and a rev whose generation is exactly one higher. A 409 here means another writer advanced the document first — re-fetch and retry.
Inspect lineage when a conflict already exists. If the document has forked, enumerate the history and the competing leaves so you can see where the pipeline lost the authoritative _rev:
```
# status of each ancestor along the winning path
curl -s "http://localhost:5984/iot_telemetry/sensor-42?revs_info=true"
# every live leaf, including conflicting branches (ask for JSON; the default response is multipart)
curl -s -H "Accept: application/json" "http://localhost:5984/iot_telemetry/sensor-42?open_revs=all"
```
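Comparing the ancestries of the leaves shows where divergence began: the last revision ID they share is the fork point (add revs=true to the open_revs request to include each leaf's _revisions history). Resolving that fork, by merging the leaves and tombstoning the losers, is covered by revision tree mechanics; for a visual walkthrough of these endpoints see visualizing revision trees in CouchDB.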
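Finding the fork point from two leaves can be sketched as follows. This assumes each leaf was fetched with `?open_revs=all&revs=true` and therefore carries a `_revisions` object of the form `{"start": N, "ids": [...]}` listing hashes newest first. The helper names and the shortened sample hashes are hypothetical, chosen for readability.

```python
def expand_revisions(revisions):
    """Turn a _revisions object into full rev ids, newest first."""
    start = revisions["start"]
    # ids[0] belongs to generation `start`, ids[1] to `start - 1`, and so on.
    return [f"{start - i}-{h}" for i, h in enumerate(revisions["ids"])]


def common_ancestor(leaf_a, leaf_b):
    """Return the deepest rev id present in both histories, or None."""
    history_b = set(expand_revisions(leaf_b["_revisions"]))
    for rev in expand_revisions(leaf_a["_revisions"]):  # walk newest to oldest
        if rev in history_b:
            return rev
    return None


# Hypothetical leaves with shortened hashes for readability.
leaf_a = {"_rev": "4-d4", "_revisions": {"start": 4, "ids": ["d4", "c3", "b2", "a1"]}}
leaf_b = {"_rev": "3-e3", "_revisions": {"start": 3, "ids": ["e3", "b2", "a1"]}}
print(common_ancestor(leaf_a, leaf_b))  # prints: 2-b2
```

A result of `None` suggests the shared history is no longer retained on this node, which is worth checking against the revision limit before assuming the two leaves are unrelated.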

Complete Working Example

The script below is self-contained and runnable. It reads the current _rev, decodes the generation and digest, performs a conditional update, and — critically — retries safely on 409 by re-fetching the server’s revision rather than guessing one client-side.

```
import sys
import time

import requests


def parse_rev(rev: str):
    """Split an N-<hash> revision id into (generation:int, digest:str)."""
    gen, digest = rev.split("-", 1)
    if not gen.isdigit() or len(digest) != 32:
        raise ValueError(f"malformed _rev: {rev!r}")
    return int(gen), digest


def current_rev(db_url: str, doc_id: str, session: requests.Session):
    """Fetch the winning _rev from the ETag header, or None if the doc is new."""
    resp = session.head(f"{db_url}/{doc_id}", timeout=30)
    if resp.status_code == 404:
        return None
    resp.raise_for_status()
    return resp.headers["ETag"].strip('"')  # ETag is the quoted current _rev


def conditional_update(db_url: str, doc_id: str, fields: dict,
                       session: requests.Session, max_retries: int = 5):
    """Extend the winning branch by writing against the server's current _rev.

    Never computes a _rev locally: on 409 it re-reads the authoritative rev
    and retries with capped exponential backoff.
    """
    for attempt in range(1, max_retries + 1):
        rev = current_rev(db_url, doc_id, session)
        body = dict(fields)
        if rev is not None:
            body["_rev"] = rev  # parent revision -> CouchDB writes N+1-<hash>
            gen, _ = parse_rev(rev)
            print(f"parent generation {gen}; writing generation {gen + 1}")
        resp = session.put(f"{db_url}/{doc_id}", json=body, timeout=30)
        if resp.status_code == 409:
            backoff = min(2 ** attempt, 30)
            print(f"409 conflict; another writer advanced the doc, retry in {backoff}s")
            time.sleep(backoff)
            continue
        resp.raise_for_status()
        return resp.json()  # {"ok": true, "id": ..., "rev": "N+1-<hash>"}
    raise RuntimeError("exhausted retries; document is contended")


if __name__ == "__main__":
    db = "http://localhost:5984/iot_telemetry"
    s = requests.Session()
    result = conditional_update(
        db, "sensor-42",
        {"reading": 21.4, "updated_at": "2026-07-04T10:00:00Z"},
        s,
    )
    print(result)
    new_gen, new_digest = parse_rev(result["rev"])
    print(f"committed generation={new_gen} digest={new_digest[:8]}…")
    sys.exit(0)
```

Run it against a live database and it prints the parent generation, the newly committed generation, and the first eight hex characters of the fresh digest, confirming that the write extended the branch by one generation rather than forking it.

Gotchas & Edge Cases

Do not precompute _rev for idempotency. Because CouchDB’s JSON serialization is not portably specified, a digest you compute locally is not guaranteed to match the server’s, and a write against a _rev that is not the document's current revision is rejected with a 409. Always read the current revision first; treat _rev as opaque.
A new document must omit _rev. Sending a _rev for a document that does not exist yields a 409, not a create. Let current_rev return None and drop the field, as the example does.
Equal generations do not mean equal content. Two 3-<hash> leaves that share a parent are a genuine fork; CouchDB returns the lexicographically higher hash as the default winner, but the other leaf still exists. A higher generation number is not proof of “newer in wall-clock time” — generation is topological, not chronological. If you need time ordering, carry an application timestamp such as updated_at.
_revs_limit can erase the shared ancestor. If a replica falls behind by more revisions than _revs_limit retains (default 1000), CouchDB prunes the common ancestor and records an ordinary update as a conflict. Don’t lower it aggressively on frequently-diverging databases.
Attachments and the _deleted flag change the digest. The hash is computed over attachment stubs and the deletion flag too, so a tombstone (_deleted: true) is itself a new revision with its own N+1-<hash> — deletions extend the tree, they do not erase it.
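The default-winner rule from the list above can be sketched in a few lines. This is an illustrative model, not CouchDB's code: it assumes, as our reading of the documented behavior, that live leaves outrank tombstones, that a higher generation then wins, and that the higher hash breaks a tie. The function name `pick_winner` and the shortened sample hashes are hypothetical.

```python
def parse_rev(rev):
    """Split an N-<hash> revision id into (generation:int, digest:str)."""
    gen, digest = rev.split("-", 1)
    return int(gen), digest


def pick_winner(leaves):
    """Pick the default winner from (rev, deleted) pairs.

    Ordering assumed here: live beats deleted, then higher generation,
    then lexicographically higher hash.
    """
    def rank(leaf):
        rev, deleted = leaf
        gen, digest = parse_rev(rev)
        return (not deleted, gen, digest)

    return max(leaves, key=rank)[0]


# Two live leaves of equal generation: the higher hash wins.
print(pick_winner([("3-aaa", False), ("3-bbb", False)]))  # prints: 3-bbb
# Generation compares numerically, so 10 beats 9 despite string ordering.
print(pick_winner([("10-aaa", False), ("9-fff", False)]))  # prints: 10-aaa
```

Note that the generation is compared as an integer. Comparing raw `_rev` strings would rank `9-…` above `10-…`, a subtle bug in hand-rolled resolvers.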

Verification & Observability

Confirm the fix took hold at two levels: the individual document and the pipeline. For the document, re-read with ?revs_info=true and assert the winning path advanced by exactly one generation with no new missing ancestors. For the pipeline, watch the conflict rate fall:

```
# no live conflicting leaves should remain on the target doc
curl -s "http://localhost:5984/iot_telemetry/sensor-42?conflicts=true" | \
  python3 -c "import sys,json;d=json.load(sys.stdin);print('conflicts:',d.get('_conflicts',[]))"

# replication job is draining, not stalled (info can be null before a job first runs)
curl -s http://localhost:5984/_scheduler/jobs | \
  python3 -c "import sys,json;[print(j['id'],(j.get('info') or {}).get('changes_pending')) for j in json.load(sys.stdin)['jobs']]"
```

A healthy result is an empty _conflicts array on the document and a changes_pending value that trends toward zero on the _scheduler/jobs entry. Emit the count of doc_update_conflict log lines per minute as a metric; once your writers fetch-then-write correctly, that curve should flatten. If it does not, the residual 409s are usually two workers racing the same partition rather than stale-_rev writes — a topology problem, not a hashing one.
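One way to turn those log lines into a per-minute metric is sketched below. It assumes each log line carries an ISO-8601 timestamp somewhere in it. The sample lines are hypothetical and not copied from a real CouchDB log, so adjust the pattern to whatever your log format actually emits.

```python
import re
from collections import Counter

# Assumption: a timestamp like 2026-07-04T10:00:07 appears in every log line.
MINUTE = re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}")


def conflicts_per_minute(lines):
    """Count doc_update_conflict lines, bucketed by minute."""
    counts = Counter()
    for line in lines:
        if "doc_update_conflict" not in line:
            continue
        match = MINUTE.search(line)
        if match:  # lines without a recognizable timestamp are skipped
            counts[match.group(0)] += 1
    return dict(sorted(counts.items()))


# Hypothetical log lines for illustration only.
sample = [
    "[error] 2026-07-04T10:00:07.120Z node1 doc_update_conflict sensor-42",
    "[error] 2026-07-04T10:00:41.903Z node1 doc_update_conflict sensor-42",
    "[notice] 2026-07-04T10:00:55.001Z node1 GET /iot_telemetry 200",
    "[error] 2026-07-04T10:01:02.555Z node1 doc_update_conflict sensor-17",
]
print(conflicts_per_minute(sample))
# prints: {'2026-07-04T10:00': 2, '2026-07-04T10:01': 1}
```

Feeding the resulting counts to your metrics system gives the curve described above; a bucket that stays non-zero after the fix is the signal to look for racing workers.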

FAQ

Can I reconstruct a document's _rev hash on the client to avoid a round trip?

No. CouchDB does not define a portable JSON canonicalization, so the digest it computes from the serialized body, _deleted flag, attachment stubs, and parent revision is not reliably reproducible off-server. A write against a _rev that does not match the document's current revision is rejected with a 409. Always fetch the current revision with a GET or HEAD before writing, and treat the value as opaque.

What exactly does the N in N-<hash> mean?

N is the generation counter — the depth of that revision in the tree, incremented by one on every accepted write, including deletions. It is topological, not chronological: a higher N means “more writes deep on this branch,” not “later in wall-clock time.” If you need temporal ordering for conflict resolution, carry your own application timestamp field.

Two nodes wrote identical content but produced different _rev values — is that a bug?

No, that is expected. Replication never assumes independent nodes compute matching digests; it exchanges revision IDs and history through _revs_diff to find which revisions a target is missing and transfers only those. Divergent hashes for “the same” content simply become two leaves that a resolver merges, as described in revision tree mechanics.
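The missing-revision exchange described above can be modeled locally. The sketch below is a simplified stand-in for what a target answers to a `_revs_diff` request: it takes the revisions a source offers per document and reports only those the target does not already hold. The function name, the in-memory data structures, and the shortened hashes are all hypothetical, and the real response can carry additional fields.

```python
def revs_diff(offered, target_known):
    """Report, per document, which offered revisions the target lacks.

    offered:      {doc_id: [rev, ...]} as announced by the source
    target_known: {doc_id: {rev, ...}} revisions already on the target
    """
    missing = {}
    for doc_id, revs in offered.items():
        known = target_known.get(doc_id, set())
        absent = [rev for rev in revs if rev not in known]
        if absent:  # documents that are fully in sync are omitted
            missing[doc_id] = {"missing": absent}
    return missing


# Hypothetical state with shortened hashes.
offered = {"sensor-42": ["4-d4", "3-c3"], "sensor-17": ["2-b2"]}
target_known = {"sensor-42": {"3-c3"}, "sensor-17": {"2-b2"}}
print(revs_diff(offered, target_known))
# prints: {'sensor-42': {'missing': ['4-d4']}}
```

The comparison is purely on revision IDs, never on recomputed digests, which is why two nodes do not need to agree on how a hash was produced.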
