CouchDB Replication Architecture & Revision Fundamentals

CouchDB’s replication engine is built for environments where connectivity is intermittent, latency is unpredictable, and data locality matters. Rather than relying on distributed locks or two-phase commit, it implements an asynchronous, multi-version concurrency control (MVCC) model that trades strict consistency for high availability and partition tolerance. For edge/IoT operators, mobile backend engineers, Python sync pipeline architects, and distributed systems teams, understanding the underlying revision mechanics and replication topology is a prerequisite for building synchronization infrastructure that stays correct when a device has been offline for three weeks and then rejoins a fleet of ten thousand peers. This guide is the architectural foundation for the rest of this section: it explains how revisions are minted, how the _changes feed drives synchronization, how conflicts are generated and won, and how to operationalize all of it in Python.

The replication protocol end to end: read the source's _changes feed, negotiate missing revisions via _revs_diff, transfer them with _bulk_docs, and checkpoint to _local documents — while the target's revision tree preserves both the winning and conflicting leaves.

Core Concept & CouchDB Mechanics

Every document in CouchDB carries a _rev field that encodes both a generation counter and an MD5 content hash (e.g., 3-9a8b7c6d...). This identifier is not a sequential version number; it is a content digest of the document state at the point of the write, computed on the node performing that write. It is not guaranteed to be reproducible across independent nodes — CouchDB does not mandate a portable JSON canonicalization, so two nodes writing the same content can produce different hashes. The exact algorithm and its portability caveats are covered in depth under revision tree mechanics, and are worth internalizing before you write any custom merge code.
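The two-part shape of a revision identifier can be made concrete with a small parser. This is a minimal sketch: the `Rev` tuple and `parse_rev` helper are illustrative names, not part of any CouchDB client library, and the digest is treated as an opaque identifier rather than something to recompute.

```python
from typing import NamedTuple


class Rev(NamedTuple):
    generation: int   # how many edits deep this revision sits on its branch
    digest: str       # opaque hash suffix; compare it, do not try to recompute it


def parse_rev(rev: str) -> Rev:
    """Split a '<generation>-<hash>' revision string into its two parts."""
    generation, sep, digest = rev.partition("-")
    if not sep or not generation.isdigit() or not digest:
        raise ValueError(f"not a CouchDB-style revision: {rev!r}")
    return Rev(int(generation), digest)


if __name__ == "__main__":
    print(parse_rev("3-9a8b7c6d"))
    # Comparing raw strings sorts "10-..." before "9-...", so parse first.
    print(parse_rev("10-aaa") > parse_rev("9-fff"))
```

Parsing before comparing matters because the generation is a number embedded in a string: a plain string comparison would rank generation 10 below generation 9.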

When a document is updated, CouchDB appends a new revision to a directed acyclic graph known as the revision tree rather than overwriting the previous state. The tree preserves historical lineage, enabling deterministic conflict detection when divergent update paths converge across disconnected nodes. In production, revision tree depth, pruning thresholds (_revs_limit), and compaction schedules directly impact replication throughput and disk I/O, particularly on resource-limited edge gateways where a runaway tree can silently exhaust flash storage.

MVCC: append, never overwrite

The MVCC contract is simple to state and easy to underestimate: a write never mutates an existing revision in place. Instead it creates a new leaf whose parent is the _rev the client supplied. If the supplied _rev is not a current leaf, CouchDB rejects the write with 409 Conflict. This is the mechanism that makes optimistic concurrency safe on a single node — but during replication, CouchDB deliberately bypasses the 409 check and writes both branches, because the whole point of replication is to move a foreign node’s leaf into the local tree without losing it. Understanding this asymmetry (409 on interactive writes, silent branching on replicated writes) is the single most important mental model on this page. The behavior of MVCC across a live sync session is explored further in understanding MVCC in CouchDB replication.
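The asymmetry between interactive and replicated writes can be illustrated with a toy in-memory model of a single document's revision tree. This is a sketch for building intuition only: `ToyRevTree`, `Conflict`, and the digest scheme are invented for the example and do not reproduce CouchDB's storage format or hashing.

```python
import hashlib
import json
from typing import Optional


class Conflict(Exception):
    """Stands in for an HTTP 409 response in this toy model."""


class ToyRevTree:
    """In-memory model of one document's revision tree (illustration only)."""

    def __init__(self) -> None:
        self.parent = {}   # rev -> parent rev (None for the root)
        self.body = {}     # rev -> document body

    def leaves(self) -> set:
        parents = set(self.parent.values())
        return {rev for rev in self.parent if rev not in parents}

    def interactive_write(self, body: dict, parent_rev: Optional[str] = None) -> str:
        """Client write: the supplied parent must be a current leaf."""
        if self.parent and parent_rev not in self.leaves():
            raise Conflict(parent_rev)
        generation = 1 if parent_rev is None else int(parent_rev.split("-")[0]) + 1
        payload = json.dumps([parent_rev, body], sort_keys=True).encode()
        rev = f"{generation}-{hashlib.sha256(payload).hexdigest()[:32]}"  # toy digest
        self.parent[rev], self.body[rev] = parent_rev, body
        return rev

    def replicated_write(self, rev: str, parent_rev: Optional[str], body: dict) -> None:
        """Replicated write: store the foreign revision verbatim, branching if needed."""
        self.parent.setdefault(rev, parent_rev)
        self.body.setdefault(rev, body)


if __name__ == "__main__":
    tree = ToyRevTree()
    r1 = tree.interactive_write({"temp": 20})
    r2 = tree.interactive_write({"temp": 21}, r1)
    try:
        tree.interactive_write({"temp": 99}, r1)      # stale parent
    except Conflict:
        print("interactive write on a stale _rev is rejected")
    tree.replicated_write("2-edge", r1, {"temp": 25})  # foreign sibling of r2
    print(sorted(tree.leaves()))
```

After the replicated write the tree holds two leaves under the same parent, which is exactly the state a conflicted document is in.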

The `_changes` feed and the replication protocol

CouchDB replication is driven by the _changes feed, an append-only, monotonically increasing sequence log of document mutations. Replication workers consume this feed to synchronize state between source and target using a small, well-defined set of HTTP endpoints: _changes on the source, _revs_diff and _bulk_docs on the target, and _local checkpoint documents on both.

The handshake runs in a fixed order every cycle. _revs_diff keeps bandwidth proportional to genuine deltas, new_edits=false inserts foreign revisions verbatim, and the _local checkpoint lets an interrupted job resume from the last verified sequence.

The protocol proceeds in a fixed order. The worker reads a batch of changed rows from the source’s _changes feed, collects the {id: [rev, ...]} map, and POSTs it to the target’s _revs_diff endpoint. The target replies with only the revisions it is missing, which keeps bandwidth proportional to genuine deltas rather than to database size. The worker then fetches those revisions (with ?revs=true&latest=true so ancestry travels with the document) and streams them into the target via _bulk_docs with new_edits=false, which is the flag that instructs CouchDB to insert the revision exactly as supplied instead of minting a fresh generation. Finally it records the last processed sequence in a _local/ checkpoint document on both endpoints, so an interrupted job resumes from the last verified sequence rather than rescanning the entire feed.
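The request bodies exchanged in that sequence are small enough to sketch as pure functions. The helper names below are hypothetical, the HTTP calls themselves are left out, and the row and response shapes follow the documented JSON for _changes, _revs_diff, and _bulk_docs as I understand them.

```python
from typing import Iterable


def revs_diff_request(change_rows: Iterable[dict]) -> dict:
    """Collapse _changes rows into the {id: [rev, ...]} body that _revs_diff expects."""
    wanted: dict = {}
    for row in change_rows:
        revs = wanted.setdefault(row["id"], [])
        for change in row.get("changes", []):
            if change["rev"] not in revs:
                revs.append(change["rev"])
    return wanted


def missing_revs(revs_diff_response: dict) -> list:
    """Flatten the target's reply into (doc_id, rev) pairs that still need transfer."""
    return [
        (doc_id, rev)
        for doc_id, entry in revs_diff_response.items()
        for rev in entry.get("missing", [])
    ]


def bulk_docs_request(docs: Iterable[dict]) -> dict:
    """Wrap fetched revisions so the target stores them verbatim (new_edits=false)."""
    return {"docs": list(docs), "new_edits": False}


if __name__ == "__main__":
    rows = [
        {"seq": "1-x", "id": "device-42", "changes": [{"rev": "4-aaa"}, {"rev": "4-bbb"}]},
        {"seq": "2-x", "id": "device-7", "changes": [{"rev": "1-ccc"}]},
    ]
    print(revs_diff_request(rows))
    print(missing_revs({"device-42": {"missing": ["4-bbb"]}}))
```

Keeping these steps free of I/O makes the handshake easy to unit test; a real worker would wrap them with the HTTP calls and the checkpoint write.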

Modern deployments declare these jobs as documents in the _replicator database, enabling continuous, checkpointed synchronization with automatic retry, exponential backoff, and state recovery. The full field-by-field contract for those documents lives in the _replicator document schema, and the broader operational framework around them is the subject of _replicator Configuration & Sync Pipeline Management. For Python sync pipeline builders, the _changes feed can also be consumed directly via its longpoll, continuous, or eventsource (Server-Sent Events) modes, allowing custom transformation, validation, or routing logic before data reaches downstream systems. The architecture decouples data producers from consumers, ensuring that network partitions or target outages do not corrupt source state or stall upstream ingestion.
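A replication job declared this way is just a JSON document. The builder below is a minimal sketch: the function name, job ID, and hostnames are placeholders, credentials are omitted, and only the core `source`, `target`, and `continuous` fields are set, with any per-job tuning fields passed through unchanged.

```python
import json


def replication_doc(job_id: str, source: str, target: str,
                    continuous: bool = True, **tuning) -> dict:
    """Build the body of a _replicator document for one replication job."""
    doc = {
        "_id": job_id,
        "source": source,
        "target": target,
        "continuous": continuous,   # keep following the source's _changes feed
    }
    doc.update(tuning)              # optional per-job fields, passed through as given
    return doc


if __name__ == "__main__":
    job = replication_doc(
        "edge-17-to-hub",                                      # hypothetical job name
        "http://edge-17.example.internal:5984/iot_telemetry",  # hypothetical source URL
        "http://hub.example.internal:5984/iot_telemetry",      # hypothetical target URL
        worker_batch_size=200,                                 # illustrative value only
    )
    print(json.dumps(job, indent=2))
```

Saving the resulting document into the _replicator database is what hands the job to the scheduler; deleting the document cancels it.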

Which endpoints your workers hit, and in what fan-out pattern, is a function of sync topology models: a hub-and-edge star, a peer mesh, or a tiered aggregation tree each distribute replication workers differently across regional aggregators, mobile clients, and edge nodes, directly influencing latency profiles and bandwidth consumption.

Production Detection & Configuration Pipeline

You cannot resolve conflicts you cannot see. The first operational capability any sync platform needs is a reliable way to surface divergence at scale, and the _changes feed is the canonical detection surface. The trick is to request the feed with conflicts=true and include_docs=true, then treat any document whose _conflicts array is non-empty as a work item. Note that _conflicts is a computed field: it is not stored in the document body and only appears when you ask for it, so an _all_docs or _changes scan that omits conflicts=true will never reveal a conflict.

The following worker polls a continuous feed, emits a conflict-rate metric, and hands each conflicted document to a resolver callback. It is intentionally dependency-light (standard library plus requests) and safe to run as a long-lived process:

import json
import logging
import time
from typing import Callable, Iterator

import requests

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")
log = logging.getLogger("conflict-detector")


class ConflictDetector:
    """Stream a CouchDB _changes feed and surface documents that carry conflicts.

    Emits a conflict-rate signal on every batch so operators can alert when a
    replication storm starts generating divergence faster than resolvers clear it.
    """

    def __init__(self, base_url: str, db: str, since: str = "now") -> None:
        self.feed_url = f"{base_url.rstrip('/')}/{db}/_changes"
        self.since = since
        self.session = requests.Session()

    def _stream(self) -> Iterator[dict]:
        params = {
            "feed": "continuous",     # hold the connection open, one JSON row per line
            "since": self.since,      # resume point; persist this to survive restarts
            "include_docs": "true",   # we need the body to inspect _conflicts
            "conflicts": "true",      # ask CouchDB to compute the _conflicts array
            "heartbeat": "10000",     # blank line every 10s so we can detect a dead socket
        }
        with self.session.get(self.feed_url, params=params, stream=True, timeout=(5, 60)) as resp:
            resp.raise_for_status()
            for line in resp.iter_lines(decode_unicode=True):
                if not line:
                    continue  # heartbeat keep-alive, not a change row
                yield json.loads(line)

    def run(self, resolve: Callable[[dict], None]) -> None:
        seen, conflicted = 0, 0
        for row in self._stream():
            if "seq" in row:
                self.since = row["seq"]  # advance the resume checkpoint
            doc = row.get("doc")
            if not doc:
                continue
            seen += 1
            if doc.get("_conflicts"):
                conflicted += 1
                log.warning("conflict detected id=%s leaves=%d",
                            doc["_id"], len(doc["_conflicts"]) + 1)
                resolve(doc)
            if seen % 100 == 0:
                rate = conflicted / seen if seen else 0.0
                log.info("conflict_rate=%.4f seen=%d", rate, seen)


if __name__ == "__main__":
    detector = ConflictDetector("http://127.0.0.1:5984", "iot_telemetry", since="now")
    detector.run(resolve=lambda d: log.info("would resolve %s", d["_id"]))

The metrics this loop emits — conflict_rate, seen, and per-document leaf counts — are the raw material for dashboards and alerts. Wire them to your time-series backend the same way the async monitoring & webhooks patterns wire replication state transitions, and set an alert threshold on sustained conflict_rate growth. A rising rate almost always maps to a specific document class — a shared configuration record, a counter, or a hot telemetry aggregate — and knowing which class is diverging is the difference between a five-minute fix and an afternoon of guessing. The taxonomy of what generates those collisions in the first place is catalogued under conflict generation models.

Deterministic Resolution Architecture

Conflicts in CouchDB are not exceptions; they are deterministic outcomes of concurrent writes to the same document across disconnected nodes. When two or more revisions share the same parent but diverge in content, CouchDB retains all branches rather than arbitrarily discarding one. The database automatically designates a “winning” revision: live leaves beat deleted ones, then the highest generation number wins, with the highest revision hash in sort order as the final tiebreaker. Every node holding the same leaves picks the same winner, but this is purely a presentation convenience for single-document reads and never deletes the losing branches. Left alone, those losing leaves stay in the revision tree indefinitely, bloating it and misleading any reader that assumes the winner is authoritative.
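The winner selection can be sketched as a sort key over the leaves. This reflects my understanding of CouchDB's ordering (a non-deleted leaf outranks a deleted one, then generation, then the hash in sort order); the `Leaf` type and `pick_winner` name are illustrative, and the function only predicts which leaf a read would return.

```python
from typing import Iterable, NamedTuple


class Leaf(NamedTuple):
    rev: str
    deleted: bool = False


def pick_winner(leaves: Iterable[Leaf]) -> Leaf:
    """Return the leaf a plain GET would present as the document's current revision."""
    def rank(leaf: Leaf):
        generation, _, digest = leaf.rev.partition("-")
        # Live beats deleted, then deeper branch, then higher hash string.
        return (not leaf.deleted, int(generation), digest)
    return max(leaves, key=rank)


if __name__ == "__main__":
    print(pick_winner([Leaf("4-aaa"), Leaf("4-bbb")]).rev)            # same depth: hash decides
    print(pick_winner([Leaf("5-fff", deleted=True), Leaf("2-abc")]).rev)  # live leaf wins
```

Because the ranking depends only on data every replica already holds, no coordination is needed for all nodes to agree on the presented winner.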

True resolution requires explicit application logic: inspect every conflicting branch, apply a domain-specific merge rule, write one consolidated revision, and delete the losing leaves to clear the conflict. The canonical pattern is a resolver that separates strategy (how to merge two bodies) from mechanism (how to commit the winner and tombstone the losers), so the merge rule can be swapped without touching the transactional plumbing:

import logging
from typing import Callable, Sequence

import requests

log = logging.getLogger("resolver")

# A merge strategy takes all leaf bodies and returns the single merged body.
MergeStrategy = Callable[[Sequence[dict]], dict]


class ConflictResolver:
    """Commit a deterministic winner for a conflicted document and tombstone losers."""

    def __init__(self, base_url: str, db: str, strategy: MergeStrategy) -> None:
        self.db_url = f"{base_url.rstrip('/')}/{db}"
        self.strategy = strategy
        self.session = requests.Session()

    def _leaves(self, doc_id: str, revs: Sequence[str]) -> list[dict]:
        """Fetch the full body of every conflicting leaf revision."""
        bodies = []
        for rev in revs:
            r = self.session.get(f"{self.db_url}/{doc_id}", params={"rev": rev}, timeout=10)
            r.raise_for_status()
            bodies.append(r.json())
        return bodies

    def resolve(self, doc: dict) -> None:
        doc_id = doc["_id"]
        winner_rev = doc["_rev"]                      # CouchDB's presentation winner
        loser_revs = doc.get("_conflicts", [])
        if not loser_revs:
            return
        all_leaves = [doc] + self._leaves(doc_id, loser_revs)

        merged = self.strategy(all_leaves)            # pure function: no I/O, easy to test
        merged["_id"] = doc_id
        merged["_rev"] = winner_rev                   # extend the current winning branch

        # Bulk write: new merged winner + explicit tombstones for every loser.
        ops = [merged] + [
            {"_id": doc_id, "_rev": rev, "_deleted": True} for rev in loser_revs
        ]
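        r = self.session.post(f"{self.db_url}/_bulk_docs", json={"docs": ops}, timeout=15)
        r.raise_for_status()
        # _bulk_docs can return a success status even when individual rows are
        # rejected, so inspect every per-document result before declaring success.
        failed = [row for row in r.json() if "error" in row]
        if failed:
            raise RuntimeError(f"partial resolution for {doc_id}: {failed}")
        log.info("resolved %s: merged 1, tombstoned %d", doc_id, len(loser_revs))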


def field_union(leaves: Sequence[dict]) -> dict:
    """Example strategy: last-writer-wins per field, ordered by an app timestamp."""
    ordered = sorted(leaves, key=lambda d: d.get("updated_at", ""))
    merged: dict = {}
    for leaf in ordered:
        for key, value in leaf.items():
            if key.startswith("_"):
                continue  # never merge CouchDB metadata fields
            merged[key] = value
    return merged


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    resolver = ConflictResolver("http://127.0.0.1:5984", "iot_telemetry", field_union)
    sample = {"_id": "device-42", "_rev": "4-aaa",
              "_conflicts": ["4-bbb"], "updated_at": "2026-07-04T00:00:00Z"}
    log.info("resolver ready for %s", sample["_id"])

The resolver above deletes losers by writing an explicit _deleted: true tombstone against each conflicting _rev in the same _bulk_docs call that commits the winner. That keeps resolution to a single round trip, but it is not atomic: CouchDB accepts or rejects each row of a _bulk_docs request independently, so check the per-document results in the response rather than relying on the HTTP status alone. Replaying the same request after a success is generally harmless, because the supplied revisions are no longer leaves and the rows are rejected as conflicts. The field_union merge function is deliberately trivial; production strategies are a design space of their own. Choosing the right one (last-write-wins, field union, numeric CRDT counters, or a hand-written domain merge) is exactly what algorithm selection for merge exists to guide, and the concrete last-write-wins recipe is written up in implementing last-write-wins in CouchDB. Anything the resolver cannot safely reconcile automatically should never be force-merged; it belongs in a human workflow, which is the role of manual review sync queues. The full menu of detection and resolution tactics is organized under Conflict Detection & Automated Resolution Strategies.

Declarative Automation & Rule Routing

Hard-coding merge logic inside a Python class works for one document type and collapses the moment a fleet carries dozens. The scalable pattern is to externalize the decision — which strategy applies to which document class — into declarative configuration that operators can change without a redeploy. A small routing table keyed on document type (or a doc_ns namespace) lets you map each class to a named strategy and a fallback, turning conflict handling into data rather than code.

Routing turns conflict handling into data: the router matches each document to a named strategy, commits the merged winner on success, and walks a fallback chain that ends at a manual review queue when no rule can safely reconcile the branches.

# conflict-routing.yaml — evaluated top to bottom, first match wins
rules:
  - match: { type: "sensor_reading" }
    strategy: crdt_counter        # numeric aggregates: sum the divergent deltas
    fallback: last_write_wins
  - match: { type: "device_config" }
    strategy: field_union         # config: merge non-overlapping field edits
    fallback: manual_review       # overlapping edits are too risky to auto-merge
  - match: { type: "user_profile" }
    strategy: last_write_wins
    fallback: manual_review
  - match: {}                     # catch-all
    strategy: last_write_wins
    fallback: manual_review

The router reads this file, selects a strategy per document, and, critically, defines what happens when the primary strategy declines. That escalation sequence is a first-class concept: a resolver that cannot merge should not silently drop data; it should walk an ordered fallback resolution chain until one link succeeds, with a human queue as the terminal link. The rule engines that evaluate these tables against live documents are the subject of auto-merge rule engines, and the Python building blocks for the individual strategies are documented in custom conflict resolver functions in Python. Externalizing the policy this way keeps the transactional resolver from the previous section stable while operators tune routing in response to whatever the conflict_rate dashboards reveal.
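A router over such a table can be sketched in a few lines. To stay dependency-free, the rules appear here as an inline Python list rather than being loaded from YAML, and every name is illustrative. Note one assumption: `field_union_strict` is a variant that declines (returns `None`) when two leaves disagree on a field, which is what lets the fallback chain engage.

```python
from typing import Optional, Sequence

RULES = [  # first match wins, mirroring the routing table's top-to-bottom order
    {"match": {"type": "device_config"}, "strategy": "field_union", "fallback": "manual_review"},
    {"match": {}, "strategy": "last_write_wins", "fallback": "manual_review"},
]


def last_write_wins(leaves: Sequence[dict]) -> Optional[dict]:
    newest = max(leaves, key=lambda d: d.get("updated_at", ""))
    return {k: v for k, v in newest.items() if not k.startswith("_")}


def field_union_strict(leaves: Sequence[dict]) -> Optional[dict]:
    merged: dict = {}
    for leaf in leaves:
        for key, value in leaf.items():
            if key.startswith("_"):
                continue              # skip CouchDB metadata
            if key in merged and merged[key] != value:
                return None           # overlapping edit: decline, let the chain continue
            merged[key] = value
    return merged


STRATEGIES = {"last_write_wins": last_write_wins, "field_union": field_union_strict}


def select_rule(doc: dict, rules: Sequence[dict]) -> dict:
    for rule in rules:
        if all(doc.get(k) == v for k, v in rule["match"].items()):
            return rule
    raise LookupError("no rule matched; add a catch-all")


def route(leaves: Sequence[dict], rules: Sequence[dict] = RULES):
    """Return (strategy_name, merged_body); merged_body is None when a human must decide."""
    rule = select_rule(leaves[0], rules)
    for name in (rule["strategy"], rule["fallback"]):
        if name == "manual_review":
            break
        merged = STRATEGIES[name](leaves)
        if merged is not None:
            return name, merged
    return "manual_review", None
```

The design choice worth noting is that strategies signal "cannot merge safely" by returning `None` instead of raising, so the router can treat declining as a normal outcome and move down the chain.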

Failure Modes & Escalation Paths

Replication fails in characteristic ways, and each has a distinct detection signal and fallback order. Treating them generically — “the sync is broken” — guarantees slow incidents; naming them makes response mechanical.

Clock skew and NTP drift. Any last-write-wins strategy that trusts an application updated_at timestamp is only as correct as the clocks that produced it. A device whose RTC is an hour behind will lose every conflict it participates in, silently discarding its own writes. Detection signal: a single node or device class that always loses. Fallback: prefer a monotonic sequence or vector clock over wall time, or route the affected class to manual review. Clock-induced conflict storms are a recurring failure pattern rooted in the conflict generation models that govern how divergence is produced.
Schema mismatch across versions. When an old edge build and a new central build disagree on a document’s shape, a field-union merge can resurrect a field the new schema removed, or drop one it added. Detection signal: validation errors spiking immediately after a fleet rollout. Fallback: version-gate the merge strategy and quarantine documents whose schema_version the resolver does not recognize.
Network storms and retry loops. A misconfigured continuous job that keeps crashing and restarting can hammer _revs_diff in a tight loop, exhausting connections cluster-wide. Detection signal: crashing state transitions climbing in _scheduler/docs and connection-pool saturation. Fallback: exponential backoff with jitter, a circuit breaker per source, and the retry discipline codified in error handling & retry logic.
Interactive 409 Conflict under load. When your resolver and a live client both try to extend the same branch, one gets a 409. This is expected optimistic-concurrency behavior, not corruption. Detection signal: 409 rates on your own writes. Fallback: refetch the current _rev, re-run the merge, retry — the exact loop detailed in handling 409 conflicts in replication jobs.
Credential and access-boundary failures. A replication user that loses read access to the source, or write access to the target, will stall a job without corrupting data — but also without progressing. Detection signal: 401/403 in job error/reason fields. Fallback: route permanent denials to a dead-letter queue rather than retrying forever, as covered in Security Boundaries in Replication.

The unifying rule across every mode: a resolver or worker that hits a case it cannot safely handle escalates rather than improvises. Silent auto-merge of an ambiguous conflict is worse than a stalled job, because a stall is visible and a bad merge is not.
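The refetch, re-merge, retry loop for interactive 409s, together with the escalate-rather-than-improvise rule, might look like the sketch below. The `fetch`, `merge`, and `put` callables and both exception types are hypothetical stand-ins for whatever client code performs the actual HTTP requests.

```python
from typing import Callable


class WriteConflict(Exception):
    """Raised by `put` when the server answers 409 (stand-in for real client code)."""


class EscalationRequired(Exception):
    """Raised when retries are exhausted and a human or another queue must take over."""


def write_with_retry(fetch: Callable[[], dict],
                     merge: Callable[[dict], dict],
                     put: Callable[[dict], object],
                     max_attempts: int = 5):
    """Re-read the current revision, re-run the merge, and retry on conflict."""
    for _ in range(max_attempts):
        current = fetch()                    # always start from the latest _rev
        candidate = merge(current)           # recompute against fresh state
        candidate["_rev"] = current["_rev"]  # extend the branch we just read
        try:
            return put(candidate)
        except WriteConflict:
            continue                         # someone else won the race; try again
    raise EscalationRequired(f"gave up after {max_attempts} conflicting writes")
```

Bounding the attempts is the important part: a loop that retries forever hides contention, while one that raises after a few rounds makes it visible to whatever handles escalation.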

Operational Hardening

Getting replication correct in a lab is table stakes; keeping it correct across a fleet under adverse networks is the real work. Several hardening measures compound to make a sync platform durable.

Bound the revision tree. Set _revs_limit deliberately (the default of 1000 is generous for high-churn documents) and schedule compaction for low-traffic windows. On flash-constrained edge gateways, an unbounded tree is a slow-motion outage; the pruning tactics are detailed under revision tree mechanics and you can inspect a tree’s shape while debugging via visualizing revision trees in CouchDB.
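As a sketch of what bounding the tree involves at the HTTP level, the helpers below build (but do not send) the two requests using only the standard library. The function names are hypothetical, the limit value is illustrative rather than a recommendation, and both calls need admin credentials that are omitted here.

```python
import json
import urllib.request


def revs_limit_request(base_url: str, db: str, limit: int) -> urllib.request.Request:
    """Build the PUT that sets a database's _revs_limit (body is a bare JSON integer)."""
    if limit < 1:
        raise ValueError("limit must be a positive integer")
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/{db}/_revs_limit",
        data=json.dumps(limit).encode(),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def compact_request(base_url: str, db: str) -> urllib.request.Request:
    """Build the POST that asks CouchDB to start compacting a database."""
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/{db}/_compact",
        data=b"",
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = revs_limit_request("http://127.0.0.1:5984", "iot_telemetry", 200)
    print(req.get_method(), req.full_url, req.data)
    # Sending is left to the caller, e.g. urllib.request.urlopen(req) with auth added.
```

Separating request construction from sending keeps the maintenance script easy to dry-run before it touches a production gateway.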

Circuit-break flaky sources. Wrap each source in a breaker that opens after a threshold of consecutive failures, so one dead edge node cannot drag down the scheduler. Pair the breaker with capped exponential backoff and jitter so ten thousand devices reconnecting after a partition do not synchronize their retries into a thundering herd.
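A breaker paired with capped, jittered backoff can be sketched as follows. The class, function, and default values are all illustrative; the jitter shown is the "full jitter" variant, where each delay is drawn uniformly between zero and the capped exponential ceiling.

```python
import random
import time
from typing import Callable, Optional


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 300.0,
                  rng: Callable[[], float] = random.random) -> float:
    """Full-jitter delay: a random point below min(cap, base * 2**attempt)."""
    return rng() * min(cap, base * (2 ** attempt))


class CircuitBreaker:
    """Open after `threshold` consecutive failures; allow a probe after `cooldown` seconds."""

    def __init__(self, threshold: int = 5, cooldown: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.threshold = threshold
        self.cooldown = cooldown
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None

    def allow(self) -> bool:
        if self.opened_at is None:
            return True                                           # closed: traffic flows
        return self.clock() - self.opened_at >= self.cooldown     # half-open probe

    def record_success(self) -> None:
        self.failures = 0
        self.opened_at = None

    def record_failure(self) -> None:
        self.failures += 1
        if self.failures >= self.threshold:
            self.opened_at = self.clock()                         # (re)open the breaker
```

A worker would call `allow()` before contacting a source, sleep for `backoff_delay(attempt)` between retries, and report each outcome back to the breaker. Randomizing the whole delay, not just adding a small offset, is what spreads a reconnecting fleet across the retry window.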

Cache and batch aggressively. Tune worker_batch_size and http_connections per replication document rather than accepting cluster defaults; a batch size matched to your average document size keeps _bulk_docs round-trips efficient without blowing the memory ceiling on a constrained gateway. Cache _revs_diff responses only within a single job cycle — never across cycles, because stale diff data will skip genuinely missing revisions.

Instrument everything. Poll _active_tasks and _scheduler/jobs for live throughput and state, mirror _replication_stats counters (docs_read, docs_written, doc_write_failures, missing_revisions_found) into a time-series store, and alert on the derivatives — a doc_write_failures slope, not just a raw value. Checkpoint drift, where source and target _local/ documents disagree on the last processed sequence, is a leading indicator of a stalled job and is worth a dedicated alert; the checkpoint monitoring pattern is in monitoring replication checkpoints via API.
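Two of these signals reduce to small pure functions, sketched below. The function names are hypothetical, and the checkpoint comparison assumes the `session_id` and `source_last_seq` fields that replication checkpoint documents carry as I understand the protocol; verify the field names against your CouchDB version before alerting on them.

```python
from typing import Sequence, Tuple


def counter_slope(samples: Sequence[Tuple[float, float]]) -> float:
    """Average per-second growth of a counter across (timestamp, value) samples."""
    (t0, v0), (t1, v1) = samples[0], samples[-1]
    if t1 <= t0:
        raise ValueError("samples must span a positive time interval")
    return (v1 - v0) / (t1 - t0)


def checkpoints_agree(source_cp: dict, target_cp: dict) -> bool:
    """True when both _local checkpoint documents record the same session and sequence."""
    return (
        source_cp.get("session_id") == target_cp.get("session_id")
        and source_cp.get("source_last_seq") == target_cp.get("source_last_seq")
    )


if __name__ == "__main__":
    # doc_write_failures sampled a minute apart (illustrative numbers).
    print(counter_slope([(0.0, 10), (60.0, 40)]))
    print(checkpoints_agree(
        {"session_id": "abc", "source_last_seq": "120-g1"},
        {"session_id": "abc", "source_last_seq": "98-g1"},
    ))
```

Alerting on the slope rather than the raw counter avoids paging on a large but stable historical total.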

Rehearse the emergency resync. When checkpoints are corrupt or a target has diverged beyond repair, the recovery path is to delete the job’s _local/ checkpoints and re-run from since=0, letting _revs_diff re-establish truth from the deltas. This is safe precisely because MVCC never overwrites — a full re-scan reconciles rather than clobbers — but it is I/O-heavy, so trigger it deliberately, not as a reflex. Whether the re-run should be continuous or one-shot depends on the trade-offs in continuous vs one-way sync, and the scripted version of the procedure is in automating continuous sync with Python scripts.

For authoritative implementation details on the replication protocol, checkpoint algorithm, and HTTP streaming semantics, consult the Apache CouchDB Replication Protocol documentation.

Conclusion

CouchDB’s replication model asks you to accept a single, load-bearing trade: it will never lose a write and never block on a partition, but in exchange it hands you the conflict to resolve. Everything in this section follows from that bargain — revisions are append-only leaves in a tree, the _changes feed is the detection surface, resolution is application logic that commits a winner and tombstones losers, and durability comes from bounding the tree, breaking flaky circuits, and instrumenting the sequence checkpoints. Teams that treat replication and conflict resolution as a first-class operational concern, with named failure modes and declarative escalation paths, ship synchronization infrastructure that scales from a single intermittently-connected sensor to a globally distributed mobile fleet. The pages below drill into each mechanism; start with whichever one is on fire.

Related

Revision Tree Mechanics — how _rev IDs are minted and how the DAG stores lineage.
Conflict Generation Models — the patterns that produce divergence in the first place.
Sync Topology Models — star, mesh, and tiered fan-out and their latency profiles.
Security Boundaries in Replication — credential scope, _security, and revision authority.
Conflict Detection & Automated Resolution Strategies — the merge-strategy and routing companion section.
_replicator Configuration & Sync Pipeline Management — declaring, monitoring, and hardening replication jobs.
