Writing Custom Conflict Resolver Functions in Python for CouchDB Replication

You have a continuous replication job fanning documents in from edge devices, and reads on the central database are now returning a non-empty _conflicts array, sometimes several revisions long. CouchDB will not merge those branches for you; it only picks a revision to return on reads (non-deleted leaves first, then the longest revision history, then the highest revision ID in sort order) and leaves every losing leaf in place until an application deletes it. This page walks through building the Python function that does that work: a resolver that takes a base document plus its conflicting revisions, computes one canonical winner, and tombstones the losers in the same write batch. It is the implementation detail behind the auto-merge rule engines that sit downstream of your replicator, and it assumes you already understand which strategy to apply. If you do not, start with algorithm selection for merge and come back to wire it up here.

Immediate Triage & Prerequisites

Before writing a line of resolver code, isolate the revision-tree divergence so you know what your function actually has to reconcile. Query the _changes feed with ?include_docs=true&conflicts=true&feed=longpoll to capture the exact document state triggering the failure, and inspect the length of the _conflicts array. More than three conflicting leaves usually means repeated offline edits from clients operating under clock skew or intermittent connectivity, not a code bug. The shape of that tree is governed by revision tree mechanics; the concurrency patterns that produce it are catalogued in conflict generation models.
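
As a rough sketch of that triage, the helper below tallies conflicting leaves from an already-parsed _changes response. It assumes the response was requested with include_docs=true&conflicts=true and has the usual results/doc shape. The function name, the sample data, and the threshold parameter are illustrative, not part of any CouchDB client.

```python
def conflict_counts(changes: dict, threshold: int = 3) -> dict[str, int]:
    """Map document _id to its number of conflicting leaves, above a threshold."""
    counts: dict[str, int] = {}
    for row in changes.get("results", []):
        doc = row.get("doc") or {}           # rows without include_docs carry no doc
        n = len(doc.get("_conflicts", []))   # key is absent on unconflicted docs
        if n > threshold:
            counts[doc["_id"]] = n
    return counts


# Hypothetical response body, trimmed to the fields the helper reads.
sample_changes = {
    "results": [
        {"id": "sensor-1", "doc": {"_id": "sensor-1", "_rev": "5-aaa",
                                   "_conflicts": ["5-bbb", "4-ccc", "4-ddd", "3-eee"]}},
        {"id": "sensor-2", "doc": {"_id": "sensor-2", "_rev": "2-fff",
                                   "_conflicts": ["2-ggg"]}},
        {"id": "sensor-3", "doc": {"_id": "sensor-3", "_rev": "1-hhh"}},
    ],
    "last_seq": "3-xyz",
}
print(conflict_counts(sample_changes))
```

Passing threshold=0 lists every conflicted document instead of only the heavily diverged ones.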

Confirm the following before you proceed:

Python 3.10+ and httpx (or requests) — the examples below use httpx for its sync/async parity.
Admin or an equivalent database role, because tombstoning losing leaves and writing to _replicator both require write access.
A running replication job. Verify it with curl "$COUCH/_scheduler/jobs" (or GET /_active_tasks) and check that the doc_write_failures counter is not climbing — a rising count means writes are already being rejected, and your resolver will inherit those 409s. Deploy the job with the standard _replicator document schema.
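
The doc_write_failures check can be sketched as a comparison of two /_active_tasks snapshots taken some seconds apart. This assumes each replication task carries type, replication_id, and doc_write_failures fields; the function name and the sample snapshots are hypothetical.

```python
def climbing_failures(before: list[dict], after: list[dict]) -> dict[str, int]:
    """Return replication_id -> increase in doc_write_failures between snapshots."""
    def failures(tasks: list[dict]) -> dict[str, int]:
        return {t["replication_id"]: t.get("doc_write_failures", 0)
                for t in tasks if t.get("type") == "replication"}

    old, new = failures(before), failures(after)
    # A job missing from the first snapshot is compared against zero.
    return {rid: count - old.get(rid, 0)
            for rid, count in new.items() if count > old.get(rid, 0)}


# Hypothetical snapshots, trimmed to the fields the helper reads.
first = [{"type": "replication", "replication_id": "abc+continuous",
          "doc_write_failures": 2},
         {"type": "indexer"}]
second = [{"type": "replication", "replication_id": "abc+continuous",
           "doc_write_failures": 5}]
print(climbing_failures(first, second))
```

An empty result means the counter is flat and it is reasonable to move on to the resolver itself.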

Cross-reference the CouchDB log for doc_update_conflict errors alongside your worker’s structured JSON logs, filtering for a resolver_status=error field. If you see HTTP 409 (or a per-row conflict error from _bulk_docs) during resolution, verify the resolver is writing the merged body under the _rev of the current winner and marking losing branches with _deleted: true. A stale _rev is rejected, and the document stays conflicted until the batch is rebuilt against the fresh revision.

Step-by-Step Implementation

Build the resolver in four verifiable steps. Each step ends with a check you can run before moving on.

1. Fetch the full branch set. A conflicted read gives you the winner’s body plus a list of losing _rev strings, not their contents. Fetch each losing leaf explicitly with ?rev=:

winner = client.get(f"{db}/{doc_id}", params={"conflicts": "true"}).json()
losers = [client.get(f"{db}/{doc_id}", params={"rev": r}).json()
          for r in winner.get("_conflicts", [])]
branches = [winner, *losers]

Verify: assert len(branches) == 1 + len(winner.get("_conflicts", [])).

2. Compute a deterministic winner. The merge must be a pure function of its inputs so that every node reaches the same result. Prefer an embedded application timestamp or a logical counter over the _rev generation number, which encodes edit count, not edit order:

def merge(branches: list[dict]) -> dict:
    # Tie-break on _rev so equal timestamps still yield one winner in any input order.
    return max(branches, key=lambda d: (d.get("updated_at", ""), d["_rev"]))

Verify: assert that merge(branches) == merge(list(reversed(branches))) and that the result’s updated_at is the maximum across all branches.
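
A minimal sketch of that order-independence check follows. It uses a variant of merge that breaks timestamp ties on _rev, which is an assumption of this sketch, and the sample branches are invented. String comparison of updated_at is only safe when every timestamp uses one fixed UTC format.

```python
from itertools import permutations


def merge(branches: list[dict]) -> dict:
    # Compare (timestamp, _rev) so ties on updated_at still pick one winner.
    return max(branches, key=lambda d: (d.get("updated_at", ""), d["_rev"]))


# Hypothetical branches: two share a timestamp, so only the _rev decides.
branches = [
    {"_id": "sensor-1", "_rev": "3-aaa", "updated_at": "2024-05-01T10:00:00Z", "temp": 20},
    {"_id": "sensor-1", "_rev": "3-bbb", "updated_at": "2024-05-01T10:00:00Z", "temp": 21},
    {"_id": "sensor-1", "_rev": "2-ccc", "updated_at": "2024-04-30T09:00:00Z", "temp": 19},
]

# Every ordering of the inputs must produce the same winning revision.
winners = {merge(list(p))["_rev"] for p in permutations(branches)}
assert winners == {"3-bbb"}
```

Without the tie-breaker, max returns the first of the tied branches, so the result would depend on input order.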

3. Rebase the merged body onto the current winner. Write the merged fields under the winner’s _rev so CouchDB accepts the update as a normal edit rather than a new conflicting branch:

merged = {k: v for k, v in merge(branches).items() if not k.startswith("_")}
merged["_id"] = doc_id
merged["_rev"] = winner["_rev"]

Verify: assert merged["_rev"] == winner["_rev"] and no other underscore-prefixed key survives.

4. Commit the winner and tombstone the losers in one batch. This is the step that actually clears the conflict. Writing only a new winner adds a leaf; the document stays conflicted until each losing _rev is deleted:

batch = {"docs": [merged, *({"_id": doc_id, "_rev": r, "_deleted": True}
                            for r in winner.get("_conflicts", []))]}
resp = client.post(f"{db}/_bulk_docs", json=batch)

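Verify: re-read the document with ?conflicts=true and assert the _conflicts key is gone. _bulk_docs answers 201 even when individual rows are rejected, and CouchDB 2.x and later do not apply the batch transactionally, so inspect every row of the response rather than the HTTP status. If any row reports a conflict error, re-read the winner and rebuild the batch against the fresh _rev; do not blindly retry the stale batch. That retry discipline is shared with error handling & retry logic and the specifics of the status code are covered in handling 409 conflicts in replication jobs.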
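
Because _bulk_docs reports failures per row rather than through the HTTP status, the commit check can be sketched as a filter over the parsed response. The helper name and the sample response are hypothetical; the row shape (ok/id/rev on success, id/error/reason on failure) follows the documented _bulk_docs response.

```python
def failed_rows(results: list[dict]) -> list[dict]:
    """Rows of a parsed _bulk_docs response that were not written."""
    return [row for row in results if "error" in row]


# Hypothetical response: the winner was written, one tombstone was rejected.
sample_response = [
    {"ok": True, "id": "sensor-1", "rev": "4-ddd"},
    {"id": "sensor-1", "error": "conflict", "reason": "Document update conflict."},
]

rejected = failed_rows(sample_response)
if rejected:
    # Re-read the document and rebuild the batch; replaying it would fail again.
    print(f"{len(rejected)} row(s) rejected, rebuild against the fresh _rev")
```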

Complete Working Example

The script below is self-contained and runnable. It streams the _changes feed, resolves each conflicted document through a pluggable resolve_fn, commits the winner plus tombstones in one _bulk_docs request, and escalates anything the function declines to merge. Instantiate the resolver per batch; never share mutable global state across documents, or you will cross-contaminate _rev chains.

import json
import logging
import os
import time
from typing import Callable, Optional

import httpx

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")
log = logging.getLogger("custom-resolver")


def lww_resolver(branches: list[dict]) -> Optional[dict]:
    """Last-write-wins by application timestamp; returns None to escalate."""
    dated = [b for b in branches if b.get("updated_at")]
    if not dated:                      # no basis to decide -> manual review
        return None
    return max(dated, key=lambda d: (d["updated_at"], d["_rev"]))  # _rev breaks ties


class ConflictResolver:
    """Drain conflicts off the _changes feed and reconcile them in CouchDB."""

    def __init__(self, db_url: str, resolve_fn: Callable[[list[dict]], Optional[dict]],
                 max_retries: int = 5):
        self.db = db_url.rstrip("/")
        self.resolve_fn = resolve_fn
        self.max_retries = max_retries
        self.client = httpx.Client(timeout=30)

    def _branches(self, doc_id: str, winner: dict) -> list[dict]:
        losers = [self.client.get(f"{self.db}/{doc_id}", params={"rev": r}).json()
                  for r in winner.get("_conflicts", [])]
        return [winner, *losers]

    def resolve(self, doc_id: str, winner: dict) -> str:
        for attempt in range(1, self.max_retries + 1):
            if not winner.get("_conflicts"):
                return "resolved"             # nothing left to do: idempotent no-op
            branches = self._branches(doc_id, winner)
            chosen = self.resolve_fn(branches)
            if chosen is None:
                log.warning("escalating %s to manual review", doc_id)
                return "manual_review"        # caller routes to the review queue

            merged = {k: v for k, v in chosen.items() if not k.startswith("_")}
            merged.update(_id=doc_id, _rev=winner["_rev"])
            tombstones = [{"_id": doc_id, "_rev": r, "_deleted": True}
                          for r in winner["_conflicts"]]
            if self._commit({"docs": [merged, *tombstones]}, doc_id):
                log.info("resolved %s (attempt %d)", doc_id, attempt)
                return "resolved"
            time.sleep(min(2 ** attempt, 30))  # exponential backoff before re-reading
            # The tree moved under us: re-read and re-merge instead of replaying the batch.
            winner = self.client.get(f"{self.db}/{doc_id}",
                                     params={"conflicts": "true"}).json()
        raise RuntimeError(f"exhausted retries resolving {doc_id}")

    def _commit(self, batch: dict, doc_id: str) -> bool:
        resp = self.client.post(f"{self.db}/_bulk_docs", json=batch)
        resp.raise_for_status()               # auth and server errors are not conflicts
        # _bulk_docs answers 201 even when rows are rejected; failures are per-row.
        failed = [row for row in resp.json() if "error" in row]
        for row in failed:
            log.warning("row rejected for %s: %s", doc_id, row.get("error"))
        return not failed

    def run(self, since: str = "now") -> None:
        params = {"feed": "continuous", "since": since, "style": "all_docs",
                  "include_docs": "true", "conflicts": "true", "heartbeat": "10000"}
        with self.client.stream("GET", f"{self.db}/_changes",
                                params=params, timeout=None) as r:
            for line in r.iter_lines():
                if not line:               # heartbeat keep-alive
                    continue
                doc = json.loads(line).get("doc") or {}
                if doc.get("_conflicts"):  # only conflicted docs carry this key
                    self.resolve(doc["_id"], doc)


if __name__ == "__main__":
    url = os.environ.get("COUCH_URL", "http://localhost:5984/iot_telemetry")
    ConflictResolver(url, lww_resolver).run()

Swap lww_resolver for a field-union or CRDT function to change the merge semantics without touching the plumbing — the exact trade-offs are laid out in implementing Last-Write-Wins in CouchDB. When resolve() returns "manual_review", hand the document to your fallback resolution chains so it lands in the manual review sync queues with a full audit trail rather than being silently overwritten.
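
As one illustration of such a swap, here is a minimal field-union function with the same signature as lww_resolver. It is a sketch under stated assumptions: every branch carries an updated_at string in one fixed UTC format, and when two branches disagree on a field the newer branch wins. The function name and sample data are hypothetical.

```python
from typing import Optional


def field_union_resolver(branches: list[dict]) -> Optional[dict]:
    """Union of all fields; on disagreement the newest branch wins per field."""
    if not all(b.get("updated_at") for b in branches):
        return None                        # cannot order the branches -> escalate
    merged: dict = {}
    # Apply oldest first so newer branches overwrite older values.
    for branch in sorted(branches, key=lambda d: (d["updated_at"], d["_rev"])):
        merged.update({k: v for k, v in branch.items() if not k.startswith("_")})
    return merged


# Hypothetical branches that each contributed a different field.
sample_branches = [
    {"_id": "sensor-1", "_rev": "3-aaa", "updated_at": "2024-05-01T10:00:00Z",
     "temp": 21, "firmware": "1.4"},
    {"_id": "sensor-1", "_rev": "3-bbb", "updated_at": "2024-05-01T09:00:00Z",
     "temp": 19, "location": "roof"},
]
print(field_union_resolver(sample_branches))
```

The result keeps location from the older branch and takes temp and updated_at from the newer one, where plain last-write-wins would have dropped location.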

Gotchas & Edge Cases

Clock skew corrupts LWW silently. Wall-clock timestamps from unsynced edge devices can order edits backwards. Synchronise with NTP (or PTP for tighter bounds), parse every timestamp as timezone-aware UTC, and prefer hybrid logical clocks when correctness matters more than simplicity.
Dropped _attachments resurrect or vanish. If a branch carries a binary attachment, a naive field-only merge drops the _attachments stubs, so the new winning revision is written without the blob. Copy the _attachments stubs across when the branch you keep is the current winner; if you keep a losing branch, fetch it with ?attachments=true and re-post the inline data, since stubs only resolve against the revision being updated.
Revision depth limits. Deep conflict trees near _revs_limit (default 1000) can be truncated by compaction before your resolver runs. Resolve promptly and schedule POST /db/_compact only in low-traffic windows, never mid-resolution.
Tombstones are still revisions. Deleting a losing leaf writes a _deleted revision; it does not free space until compaction. A database that conflicts and resolves constantly will grow until you compact it.
Re-processing must be idempotent. During network flapping the same conflict can appear on the feed more than once. A second pass over an already-resolved document must be a no-op — check for the _conflicts key before acting, and never assume you are the only worker on the stream.
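
For the clock-skew point in the list above, a small sketch of parsing timestamps as timezone-aware UTC before comparing them. The helper name is hypothetical, and rejecting naive timestamps outright is a design choice of this sketch, not something CouchDB requires.

```python
from datetime import datetime, timezone


def parse_utc(stamp: str) -> datetime:
    """Parse an ISO 8601 string into an aware UTC datetime; reject naive input."""
    # Python 3.10's fromisoformat rejects a trailing "Z", so normalise it first.
    if stamp.endswith("Z"):
        stamp = stamp[:-1] + "+00:00"
    parsed = datetime.fromisoformat(stamp)
    if parsed.tzinfo is None:
        raise ValueError(f"naive timestamp rejected: {stamp!r}")
    return parsed.astimezone(timezone.utc)


# As strings the first value sorts higher, but it is the earlier instant.
local_edit = "2024-05-01T12:00:00+02:00"   # 10:00 UTC
utc_edit = "2024-05-01T10:30:00Z"          # 10:30 UTC
assert local_edit > utc_edit
assert parse_utc(local_edit) < parse_utc(utc_edit)
```

Using parse_utc(d["updated_at"]) as the sort key fixes offset handling only; it does nothing about devices whose clocks are simply wrong.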

Verification & Observability

Confirm the resolver is actually clearing conflicts, not just running. The direct check is a re-read: curl "$COUCH/iot_telemetry/<doc_id>?conflicts=true" should return a document with no _conflicts key. To measure the fleet, add a design-document view that emits every document whose _conflicts array is non-empty (CouchDB ships no built-in conflicts view, and temporary views were removed in 2.0) and watch the count trend toward zero after deployment.
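
One way to get that fleet-wide count is a small design document, sketched here as a Python dict ready to be serialised. The design document name (_design/conflicts) and view name (open) are hypothetical; the map function relies on doc._conflicts being visible to view code, and _count is CouchDB's built-in reduce.

```python
import json

# JavaScript map function, stored as a string inside the design document.
CONFLICT_MAP = (
    "function (doc) {"
    " if (doc._conflicts) { emit(doc._id, doc._conflicts.length); }"
    " }"
)

CONFLICTS_DDOC = {
    "_id": "_design/conflicts",          # hypothetical name
    "language": "javascript",
    "views": {
        "open": {                        # hypothetical view name
            "map": CONFLICT_MAP,
            "reduce": "_count",          # reduced query returns the fleet total
        }
    },
}

print(json.dumps(CONFLICTS_DDOC, indent=2))
```

After a PUT of this body to $COUCH/iot_telemetry/_design/conflicts, a GET on _design/conflicts/_view/open returns the number of conflicted documents, and adding ?reduce=false lists them individually.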

At the pipeline level, poll GET /_scheduler/jobs and GET /_active_tasks to confirm the replication job stays in the running state with a flat doc_write_failures counter. A climbing count means the replicator’s own writes to the target are being rejected (for example by a validation function), which is a problem upstream of the resolver. Emit three application metrics from the code itself: resolution latency (feed-detected timestamp to committed _rev), escalation rate (share of documents returning "manual_review"), and retry depth per commit. A rising escalation rate points at data your merge function cannot decide; a rising retry depth points at write contention, usually two workers racing the same partition. Deep monitoring hooks for this live under the broader Conflict Detection & Automated Resolution Strategies framework.
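
The three application metrics can be sketched as an in-memory tally. The class and method names are hypothetical, and a real deployment would presumably export these numbers to its metrics backend rather than hold them in lists.

```python
from dataclasses import dataclass, field


@dataclass
class ResolverMetrics:
    """Tallies for resolution latency, escalation rate, and retry depth."""
    outcomes: list[str] = field(default_factory=list)
    latencies: list[float] = field(default_factory=list)
    attempts: list[int] = field(default_factory=list)

    def record(self, outcome: str, detected_at: float, committed_at: float,
               attempts: int) -> None:
        self.outcomes.append(outcome)
        self.latencies.append(committed_at - detected_at)  # seconds
        self.attempts.append(attempts)

    def escalation_rate(self) -> float:
        if not self.outcomes:
            return 0.0
        return self.outcomes.count("manual_review") / len(self.outcomes)

    def max_retry_depth(self) -> int:
        return max(self.attempts, default=0)


metrics = ResolverMetrics()
metrics.record("resolved", detected_at=100.0, committed_at=100.5, attempts=1)
metrics.record("manual_review", detected_at=101.0, committed_at=101.25, attempts=1)
metrics.record("resolved", detected_at=102.0, committed_at=105.0, attempts=3)
print(metrics.escalation_rate(), metrics.max_retry_depth())
```

Timestamps would typically come from time.monotonic() at feed detection and after the commit, so that wall-clock adjustments do not distort the latency figure.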

FAQ

Why does my document stay conflicted after I write a merged version?

Because writing a merged version only adds another leaf to the revision tree. The document remains conflicted until every revision listed in its _conflicts array is deleted with a {"_id": id, "_rev": rev, "_deleted": true} tombstone. Send the merged winner and those tombstones together in one _bulk_docs batch, then check every row of the response, because the batch is not applied transactionally and a single rejected tombstone leaves the conflict open.

Should the resolver run as one process or can I scale it out?

Run one instance per replication partition. Two workers draining the same stream race to tombstone the same losing leaves, which multiplies 409 retries and inflates the revision tree. Scale horizontally by sharding on the document namespace (the _id prefix), not by cloning workers onto a single feed.
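
A minimal sketch of sharding on the _id prefix, assuming document IDs look like "prefix:rest" and that the worker count is fixed. The function name and the separator are hypothetical. It uses hashlib rather than the built-in hash(), because string hashing is salted per process and would assign the same prefix to different workers after a restart.

```python
import hashlib


def worker_for(doc_id: str, workers: int, sep: str = ":") -> int:
    """Map a document to a worker index by hashing its _id prefix."""
    prefix = doc_id.split(sep, 1)[0]     # whole id when the separator is absent
    digest = hashlib.sha256(prefix.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % workers


# Documents sharing a prefix always land on the same worker.
assert worker_for("plant7:sensor-1", 4) == worker_for("plant7:sensor-2", 4)
```

Each worker would then skip feed rows where worker_for(doc["_id"], n) differs from its own index, so no two workers tombstone the same leaves.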

What should the function do when it cannot decide a winner?

Return None (or your equivalent sentinel) so the caller escalates rather than forcing a lossy write. The document should flow through the fallback chain into a manual review queue tagged with a resolution_status of manual_review, preserving every branch until an operator reconciles it.
