Auto-Merge Rule Engines for CouchDB Replication Conflict Resolution & Sync Automation

Edge/IoT deployments and mobile backends routinely operate across partitioned, high-latency networks where concurrent document mutations are inevitable. CouchDB’s replication protocol intentionally avoids automatic field-level merging to preserve data integrity. Instead, it deterministically designates a winning revision (highest generation number, then the lexicographically highest revision hash as a tiebreaker) purely for the default read, while retaining every divergent leaf in the revision tree. Those losing leaves are surfaced as a computed _conflicts array when a document is read with ?conflicts=true. Production systems cannot rely on manual intervention at scale; they require an external auto-merge rule engine to intercept these conflicts, apply deterministic resolution logic, and push reconciled documents back into the replication stream. This guide details the exact _replicator configuration, a production-grade Python conflict resolver, deployment topology, and troubleshooting patterns for continuous sync automation, extending the routing and detection layer defined across the parent Conflict Detection & Automated Resolution Strategies framework.
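The default-winner rule described above can be sketched as a pure function. This is a simplified illustration rather than CouchDB's own implementation: `pick_winner` is a hypothetical helper, it assumes every input is a live (non-deleted) leaf, and it compares only the generation prefix and then the hash suffix of each `_rev` string.

```python
def pick_winner(leaf_revs):
    """Return the leaf served on a default read, assuming all leaves are
    live: highest generation first, then highest revision hash."""
    def sort_key(rev):
        generation, _, rev_hash = rev.partition("-")
        # Parse the generation as an integer so "10-..." outranks "2-...".
        return int(generation), rev_hash
    return max(leaf_revs, key=sort_key)


leaves = ["2-abc", "10-aaa", "10-aab"]
winner = pick_winner(leaves)
losers = [rev for rev in leaves if rev != winner]
print(f"winner={winner} losers={losers}")
```

The integer parse matters: a plain string comparison would rank generation 2 above generation 10.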

An auto-merge rule engine sits downstream of the replicator. Replication moves bytes between nodes and adds divergent leaves to each document's revision tree (see revision tree mechanics); the engine is the stateless process that reads those trees, decides a winner, and tombstones the losers. Keeping the two concerns separate is deliberate: the replicator must never block on business logic, and the engine must be able to crash, restart, and reprocess without corrupting data.

Configuration Schema & Required Parameters

Conflict resolution begins with a correctly declared _replicator document. The engine assumes replication is continuous and conflict-preserving, which is CouchDB’s default behaviour — no flag “enables” conflict retention, because losing leaves are never discarded on their own. Deploy the following document to establish a continuous, bidirectional-capable sync into a single database that the engine will monitor. For the full field reference and edge-node variants, see the canonical _replicator document schema.

{
  "_id": "edge_to_core_sync",
  "source": "http://edge-node-01:5984/iot_telemetry",
  "target": "http://core-cluster-01:5984/iot_telemetry",
  "continuous": true,
  "create_target": true,
  "user_ctx": {
    "name": "replication_svc",
    "roles": ["_admin"]
  }
}

The parameters below control whether the engine ever sees a complete, up-to-date revision tree:

| Parameter | Type | Default | Effect on the merge engine |
| --- | --- | --- | --- |
| `source` | string / object | none (required) | Database the divergent writes originate from. Object form carries `headers`/`auth` for authenticated edges. |
| `target` | string / object | none (required) | Database the engine monitors and writes reconciled docs back into. |
| `continuous` | boolean | `false` | Must be `true`. Keeps a live `_changes` listener open so conflicts surface within seconds, not on the next one-shot run. |
| `create_target` | boolean | `false` | Set `true` when target databases are provisioned lazily during cluster scaling, to avoid replication stalls. |
| `user_ctx.roles` | array | `[]` | Role context the replication job runs under. Tombstones for losing leaves are written by the engine with its own credentials, so grant `_admin` here only if the job itself needs it (for example, to replicate design documents). |
| `doc_ids` | array | unset | Narrows the stream to specific documents. Mutually exclusive with `filter` and `selector`; set at most one. |
| `filter` | string | unset | `ddocname/filtername` design-document filter. Omit to replicate the whole database. |
| `selector` | object | unset | Mango selector applied server-side. Cheaper than a JS `filter` but still mutually exclusive with it. |
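The constraints in the table can be checked before a document is ever submitted to `_replicator`. The sketch below is one possible pre-flight check; `validate_replicator_doc` is a hypothetical helper, not part of CouchDB or any client library, and it covers only the rules listed above.

```python
def validate_replicator_doc(doc):
    """Return a list of problems that would keep the merge engine from
    seeing a complete, live revision tree. An empty list means the
    document passes these checks."""
    problems = []
    for field in ("source", "target"):
        if not doc.get(field):
            problems.append(f"missing required field: {field}")
    if doc.get("continuous") is not True:
        problems.append("continuous must be true")
    # doc_ids, filter and selector are mutually exclusive.
    narrowing = [name for name in ("doc_ids", "filter", "selector") if name in doc]
    if len(narrowing) > 1:
        problems.append("set at most one of: " + ", ".join(narrowing))
    return problems


edge_to_core = {
    "_id": "edge_to_core_sync",
    "source": "http://edge-node-01:5984/iot_telemetry",
    "target": "http://core-cluster-01:5984/iot_telemetry",
    "continuous": True,
    "create_target": True,
}
print(validate_replicator_doc(edge_to_core))
```

Running the check in a deployment pipeline catches a one-shot or doubly-filtered replication before it silently starves the engine of changes.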

Two operational notes matter here. First, conflict resolution is external by design: the replication daemon keeps syncing non-conflicting documents while conflicting revisions accumulate as additional leaf branches, visible per document via ?conflicts=true. Second, never point two engine replicas at the same target partition — concurrent resolvers will race on the same revision tree and produce 409 storms, covered under error handling & retry logic.

Streaming Detection / Monitoring Setup

The engine subscribes to conflicts through the database _changes feed queried with style=all_docs, include_docs=true, and conflicts=true. That last parameter is the load-bearing one: it causes each returned document to carry its computed _conflicts array. You cannot filter for conflicts with a Mango selector, because _conflicts is a read-time projection rather than a stored, indexed field — so the consumer pulls every change and discards the clean ones. The CouchDB Changes API delivers updates in sequence order with a resumable since token, which makes it a natural fit for a restartable streaming consumer.

A minimal detector that isolates only documents needing intervention:

import requests

def stream_conflicts(db_url: str, auth: tuple, since: str = "0"):
    """Return (conflicts, last_seq) for one pass over the feed.

    `conflicts` is a list of (doc_id, losing_revs) pairs for every document
    that currently has divergent leaves; `conflicts=true` is what makes
    `_conflicts` appear. `last_seq` is the checkpoint to resume from."""
    params = {
        "feed": "normal",
        "style": "all_docs",
        "include_docs": "true",
        "conflicts": "true",
        "since": since,
    }
    resp = requests.get(f"{db_url}/_changes", params=params, auth=auth, timeout=30)
    resp.raise_for_status()
    body = resp.json()
    conflicts = []
    for row in body["results"]:
        doc = row.get("doc") or {}
        if doc.get("_conflicts"):
            conflicts.append((doc["_id"], doc["_conflicts"]))
    # Returned alongside the conflicts: a generator's return value would be
    # lost to a plain for loop, so the caller could never persist it.
    return conflicts, body["last_seq"]

if __name__ == "__main__":
    conflicts, last_seq = stream_conflicts(
        "http://localhost:5984/iot_telemetry", ("admin", "password")
    )
    for doc_id, revs in conflicts:
        print(f"conflict: {doc_id} has {len(revs)} losing leaves")
    print(f"resume from since={last_seq}")

Persist last_seq after each batch. On restart the engine resumes from the stored checkpoint instead of re-walking the whole database, which keeps read amplification flat as the dataset grows. The depth of each _conflicts array is also a useful early-warning signal — a document accumulating more than three or four leaves usually indicates repeated offline edits from an edge device under clock skew, a pattern rooted in the conflict generation models of multi-writer topologies.
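One way to persist the checkpoint is a small JSON file that is replaced atomically, sketched below. The file-based store and the helper names `save_checkpoint` and `load_checkpoint` are assumptions made for illustration; a `_local` document or an external key store would serve the same purpose.

```python
import json
import os
import tempfile


def save_checkpoint(path, last_seq):
    """Write last_seq to a temp file, then rename it over the target so a
    crash mid-write never leaves a truncated checkpoint behind."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump({"last_seq": last_seq}, handle)
        handle.flush()
        os.fsync(handle.fileno())
    os.replace(tmp_path, path)


def load_checkpoint(path, default="0"):
    """Return the stored last_seq, or `default` when no usable file exists."""
    try:
        with open(path, encoding="utf-8") as handle:
            return json.load(handle)["last_seq"]
    except (FileNotFoundError, KeyError, json.JSONDecodeError):
        return default
```

A consumer would call `load_checkpoint` once at startup to obtain its `since` value and `save_checkpoint` after each fully processed batch. The sequence is stored as an opaque string because the feed returns it that way.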

Core Implementation

The rule engine runs as a long-lived daemon that consumes the _changes feed, applies configurable merge strategies, and writes reconciled documents through the _bulk_docs endpoint. Resolving a conflict is always a two-part write: the merged winner extends the winning branch, and every losing leaf is tombstoned in the same batch — a conflict is only cleared once no divergent leaves remain. The implementation below adds exponential backoff, checkpointed resumption, and structured logging.

#!/usr/bin/env python3
"""
CouchDB Auto-Merge Rule Engine
Consumes _changes feed, detects conflicts, applies merge rules, writes reconciled docs.
"""
import os
import time
import logging
import requests
from typing import Dict, List, Optional, Any
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s [%(levelname)s] %(message)s"
)

class CouchDBMergeEngine:
    def __init__(self, db_url: str, username: str, password: str):
        self.db_url = db_url.rstrip("/")
        self.session = requests.Session()
        self.session.auth = (username, password)
        # Transport-level retries absorb transient 5xx/429 without duplicating
        # application logic; conflict (409) is handled explicitly, not retried.
        retry_strategy = Retry(
            total=5, backoff_factor=1, status_forcelist=[429, 500, 502, 503, 504]
        )
        self.session.mount("http://", HTTPAdapter(max_retries=retry_strategy))
        self.session.mount("https://", HTTPAdapter(max_retries=retry_strategy))

    def fetch_changes(self, since_seq: str = "0", limit: int = 100) -> Dict[str, Any]:
        """Poll the _changes feed, including each document's conflict metadata.

        `conflicts=true` is what makes returned docs carry a computed
        `_conflicts` array. You cannot filter by `_conflicts` with a Mango
        selector — it is a read-time projection, not a stored/indexed field —
        so we request all changes and let resolve_conflicts() skip clean docs.
        """
        params = {
            "feed": "normal",
            "style": "all_docs",
            "include_docs": "true",
            "conflicts": "true",
            "since": since_seq,
            "limit": limit,
        }
        # An explicit timeout keeps a hung connection from stalling the loop.
        resp = self.session.get(f"{self.db_url}/_changes", params=params, timeout=30)
        resp.raise_for_status()
        return resp.json()

    def resolve_conflicts(self, doc: Dict[str, Any]) -> Optional[Dict[str, Any]]:
        """Apply deterministic merge logic. Extend with typed resolvers as in
        Custom Conflict Resolver Functions in Python.

        Returns the merged winner plus the losing revisions that must be
        tombstoned, or None when the document has no conflicts.
        """
        conflicts = doc.get("_conflicts", [])
        if not conflicts:
            return None

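        # Fetch every losing revision so the merge sees the full branch set.
        # The doc ID is percent-encoded so IDs containing "/" or "?" still work.
        doc_path = f"{self.db_url}/{requests.utils.quote(doc['_id'], safe='')}"
        rev_docs = []
        for rev in conflicts:
            r = self.session.get(doc_path, params={"rev": rev}, timeout=30)
            if r.status_code == 200:
                rev_docs.append(r.json())
            else:
                logging.warning(f"Could not read {doc['_id']} rev {rev}: HTTP {r.status_code}")
        if not rev_docs:
            return None

        # Deterministic merge: field-level union with higher-generation fallback.
        merged = doc.copy()
        merged.pop("_conflicts", None)

        for rev_doc in rev_docs:
            for key, value in rev_doc.items():
                if key.startswith("_"):
                    continue
                if key not in merged or self._is_newer(rev_doc, merged):
                    merged[key] = value

        # The merged winner keeps the current winning _rev so the write extends
        # the winning branch. Only leaves that were actually read and merged are
        # returned for deletion, so a failed fetch never discards unseen data.
        return {"merged": merged, "losing_revs": [d["_rev"] for d in rev_docs]}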

    def _is_newer(self, candidate: Dict, current: Dict) -> bool:
        """Heuristic tiebreaker: compare _rev generation numbers. Replace with
        an application `updated_at` comparison for wall-clock semantics."""
        try:
            cand_gen = int(candidate["_rev"].split("-")[0])
            curr_gen = int(current["_rev"].split("-")[0])
            return cand_gen > curr_gen
        except (ValueError, KeyError):
            return False

    def write_resolved(self, resolution: Dict[str, Any]) -> bool:
        """Commit the merged winner AND delete the losing leaf revisions.

        A conflict is only cleared once every losing leaf is tombstoned, so the
        winner and the deletions are sent together in one _bulk_docs batch.
        """
        merged = resolution["merged"]
        docs = [merged] + [
            {"_id": merged["_id"], "_rev": rev, "_deleted": True}
            for rev in resolution["losing_revs"]
        ]
        resp = self.session.post(
            f"{self.db_url}/_bulk_docs", json={"docs": docs}, timeout=30
        )
        resp.raise_for_status()
        return all(r.get("ok") for r in resp.json())

    def run(self, poll_interval: float = 2.0):
        """Main event loop. Checkpoints on last_seq for restartable resumption."""
        since_seq = os.getenv("ENGINE_SINCE", "0")
        logging.info("Starting CouchDB Auto-Merge Engine...")
        while True:
            try:
                body = self.fetch_changes(since_seq)
                for change in body.get("results", []):
                    doc = change.get("doc", {})
                    if not doc or "_id" not in doc:
                        continue

                    resolution = self.resolve_conflicts(doc)
                    if resolution:
                        if self.write_resolved(resolution):
                            logging.info(f"Resolved conflict for {doc['_id']}")
                        else:
                            logging.warning(f"Failed to resolve {doc['_id']}, routing to fallback")
                # Advance the checkpoint only after the whole batch is processed.
                since_seq = body.get("last_seq", since_seq)
                time.sleep(poll_interval)
            except requests.exceptions.RequestException as e:
                logging.error(f"Network error: {e}. Retrying in {poll_interval * 2}s...")
                time.sleep(poll_interval * 2)

if __name__ == "__main__":
    engine = CouchDBMergeEngine(
        db_url=os.getenv("COUCHDB_URL", "http://localhost:5984/iot_telemetry"),
        username=os.getenv("COUCHDB_USER", "admin"),
        password=os.getenv("COUCHDB_PASS", "password")
    )
    engine.run()

The resolve_conflicts method is intentionally a single, replaceable seam. Real deployments swap its body for a registry of typed resolvers — one per doc.type or doc.schema_version — as described in Custom Conflict Resolver Functions in Python. Keeping the transport, checkpointing, and write-batching stable while the merge body varies is what lets teams hot-swap merge policy without touching the sync loop.
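A registry of typed resolvers could look like the sketch below. Everything in it is illustrative: the `resolver_for` decorator, the `RESOLVERS` mapping, and the `device_config` and `telemetry` type names are hypothetical, and the resolver signature (winner plus a list of losing revision bodies) is an assumption about how the seam might be shaped.

```python
from typing import Any, Callable, Dict, List

Doc = Dict[str, Any]
Resolver = Callable[[Doc, List[Doc]], Doc]

RESOLVERS: Dict[str, Resolver] = {}


def resolver_for(doc_type: str):
    """Decorator that registers a merge function for one doc.type value."""
    def register(func: Resolver) -> Resolver:
        RESOLVERS[doc_type] = func
        return func
    return register


@resolver_for("device_config")
def merge_device_config(winner: Doc, losers: List[Doc]) -> Doc:
    # Keep the winner's values and add any keys only a losing branch has.
    merged = dict(winner)
    for loser in losers:
        for key, value in loser.items():
            if not key.startswith("_") and key not in merged:
                merged[key] = value
    return merged


@resolver_for("telemetry")
def keep_winner(winner: Doc, losers: List[Doc]) -> Doc:
    # Telemetry samples are replaceable: the current winner stands as-is.
    return dict(winner)


def resolve(winner: Doc, losers: List[Doc]) -> Doc:
    """Dispatch on doc.type; unknown types are refused, not force-merged."""
    resolver = RESOLVERS.get(winner.get("type"))
    if resolver is None:
        raise LookupError(f"no resolver registered for type {winner.get('type')!r}")
    return resolver(winner, losers)
```

Raising on an unregistered type gives the caller a clear signal to route the document elsewhere instead of guessing at a merge.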

Strategy Variants & Trade-offs

Auto-merge engines must prioritise predictability over cleverness: a resolver that produces a different winner depending on which node ran it is worse than no resolver at all. Strategy choice should also stay consistent with the broader algorithm selection for merge matrix so that the same data model resolves the same way everywhere. Three strategies cover the majority of production workloads:

Field-Level Union — Non-overlapping keys from each branch are combined; overlapping keys defer to a deterministic tiebreaker (highest _rev generation, an explicit updated_at, or a source-priority rank). Preserves independent edits to structured configuration payloads.
Last-Write-Wins (LWW) — The branch with the highest sequence or timestamp overwrites the others wholesale. Cheapest and simplest; correct only where recency genuinely implies truth, such as telemetry samples.
CRDT-Inspired Merge — Commutative, associative, and idempotent operations (G-Counters, OR-Sets, LWW-registers) guarantee convergence without coordination, at the cost of a bespoke document schema and larger payloads.
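The first two strategies can be sketched as pure functions over a list of branch bodies. This is a minimal illustration under stated assumptions: each branch carries an application-maintained `updated_at` value that sorts correctly (for example, ISO-8601 UTC strings), `_rev` is used only as a stable secondary sort key, and the function names are hypothetical.

```python
def _recency(branch, clock):
    # Total order over branches: application clock first, _rev as tiebreak.
    return branch[clock], branch["_rev"]


def last_write_wins(branches, clock="updated_at"):
    """Keep the most recent branch wholesale and drop the others."""
    return dict(max(branches, key=lambda b: _recency(b, clock)))


def field_level_union(branches, clock="updated_at"):
    """Combine user fields from every branch; on a key collision the value
    from the most recent branch is kept."""
    merged = {}
    # Apply oldest first so newer branches overwrite colliding keys.
    for branch in sorted(branches, key=lambda b: _recency(b, clock)):
        for key, value in branch.items():
            if not key.startswith("_"):
                merged[key] = value
    return merged


older = {"_id": "dev1", "_rev": "3-aaa", "updated_at": "2024-05-01T10:00:00Z",
         "mode": "eco", "interval": 30}
newer = {"_id": "dev1", "_rev": "3-bbb", "updated_at": "2024-05-01T10:05:00Z",
         "mode": "boost", "led": "off"}
print(field_level_union([older, newer]))
print(last_write_wins([older, newer]))
```

Both functions sort on the same total order, so the result does not depend on the order in which branches are passed in. Without a common ancestor the union cannot tell an unchanged field from an edited one, which is why collisions fall back to recency.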

| Strategy | Consistency guarantee | Resolution latency | Implementation complexity | Best-fit data |
| --- | --- | --- | --- | --- |
| Field-Level Union | Convergent for disjoint fields; tiebreak on collisions | Low to moderate (per-field diff) | Moderate (schema-aware diffing) | Config / device state documents |
| Last-Write-Wins | Convergent; silently drops older writes | Lowest (O(1) compare) | Low | Telemetry, sensor metrics |
| CRDT-Inspired | Strong eventual consistency; no data loss | Moderate (op replay) | High (custom schema and ops) | Counters, collaborative offline state |
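To make the CRDT row concrete, here is a minimal G-Counter merge. It assumes the counter is stored as a map from node ID to that node's own increment count, which is the bespoke schema the strategy requires; the node names are made up.

```python
def merge_g_counter(left, right):
    """Merge two grow-only counters by taking the per-node maximum."""
    return {
        node: max(left.get(node, 0), right.get(node, 0))
        for node in left.keys() | right.keys()
    }


def counter_value(counter):
    """The counter's value is the sum of every node's contribution."""
    return sum(counter.values())


edge_a = {"edge-01": 3, "edge-02": 1}
edge_b = {"edge-01": 2, "edge-02": 4, "edge-03": 1}
merged = merge_g_counter(edge_a, edge_b)
print(merged, counter_value(merged))
```

Because the per-node maximum is commutative, associative, and idempotent, any node can merge the branches in any order, or merge the same branch twice, and reach the same result.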

Rule routing is best expressed as a strategy pattern that maps a document classifier to a resolver, isolating business logic from the sync pipeline. Documents no resolver can safely reconcile — incompatible schemas, missing metadata, repeated resolver failure — must not be force-merged; they belong in a review workflow instead.
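The three refusal conditions named above can be expressed as a guard that runs before any resolver. The sketch assumes each branch carries `type`, `schema_version`, and `updated_at` fields; those names, the `needs_manual_review` helper, and the attempt limit are hypothetical.

```python
REQUIRED_FIELDS = ("type", "schema_version", "updated_at")  # assumed metadata


def needs_manual_review(branches, failed_attempts=0, max_attempts=3):
    """Return a reason string when the branches must go to a review
    workflow, or None when an automated resolver may proceed."""
    if failed_attempts >= max_attempts:
        return "repeated resolver failure"
    for branch in branches:
        missing = [name for name in REQUIRED_FIELDS if name not in branch]
        if missing:
            return "missing metadata: " + ", ".join(missing)
    if len({branch["schema_version"] for branch in branches}) > 1:
        return "incompatible schemas"
    return None


branches = [
    {"_rev": "2-a", "type": "device_config", "schema_version": 1,
     "updated_at": "2024-05-01T10:00:00Z"},
    {"_rev": "2-b", "type": "device_config", "schema_version": 2,
     "updated_at": "2024-05-01T10:05:00Z"},
]
print(needs_manual_review(branches))
```

Returning a reason rather than a bare boolean lets the review queue record why a document was escalated.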

Deployment & Orchestration

Running the engine in production is mostly about guaranteeing exactly one resolver acts on a given revision tree at a time. Package it as a small container, inject configuration through environment variables, and enforce a single replica per database partition to avoid split-brain resolution:

Build a lightweight image. A python:3.12-slim base with only requests installed keeps the attack surface and cold-start small.
Configure via environment. Pass COUCHDB_URL, COUCHDB_USER, COUCHDB_PASS, and an optional ENGINE_SINCE checkpoint. Never bake credentials into the image.
Pin one replica per partition. Set the workload to a single instance per monitored database (for example, a Kubernetes Deployment with replicas: 1 and strategy.type: Recreate, so the old pod is stopped before its replacement starts). Two engines on one partition race on _bulk_docs.
Expose a health endpoint. Publish liveness from the poll loop — report the last successful last_seq advance and the seconds since the last processed batch so orchestrators restart a stalled engine.
Persist the checkpoint. Write last_seq to durable storage (a small _local document or an external key store) so a restarted engine resumes instead of re-scanning.
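The liveness signal described in the health-endpoint step can be tracked with a small state object that the poll loop updates and an HTTP handler reads. The sketch below covers only the bookkeeping; `EngineHealth`, its field names, and the 30-second default are assumptions, and serving the snapshot over HTTP is left out.

```python
import time


class EngineHealth:
    """Tracks the last checkpoint advance so a probe can spot a stall."""

    def __init__(self, max_idle_seconds=30.0, clock=time.monotonic):
        self._clock = clock  # injectable so the logic can be tested
        self.max_idle_seconds = max_idle_seconds
        self.last_seq = None
        self._last_batch_at = clock()

    def record_batch(self, last_seq):
        """Call from the poll loop after each fully processed batch."""
        self.last_seq = last_seq
        self._last_batch_at = self._clock()

    def snapshot(self):
        """Return the payload a liveness endpoint would serialise."""
        idle = self._clock() - self._last_batch_at
        return {
            "last_seq": self.last_seq,
            "seconds_since_last_batch": idle,
            "healthy": idle <= self.max_idle_seconds,
        }
```

A monotonic clock is used so that wall-clock adjustments on the host cannot make a healthy engine look stalled or hide a real stall.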

For high-throughput edge fleets, cache conflict metadata between polls to reduce read amplification against the target, and emit conflict density, resolution latency, and fallback-queue depth to your metrics stack. An emergency resync workflow can be triggered by a health-check threshold that patches the _replicator document when conflict rates spike.

Troubleshooting & Common Errors

| Symptom / code | Likely cause | Remediation |
| --- | --- | --- |
| `conflict` error row in the `_bulk_docs` response (or `409 Conflict` on a single-document write) | The winning `_rev` changed between read and write (another writer landed) | Re-read the doc with `?conflicts=true`, re-run the merge, resubmit; see handling 409 conflicts in replication jobs |
| `doc_update_conflict` in CouchDB logs | Stale `_rev` submitted, or two engine replicas on one partition | Enforce single-replica-per-partition; always send the current winning `_rev` |
| Conflicts never clear despite “ok” writes | Winner written but losing leaves not tombstoned | Ensure the batch includes every `_conflicts` rev with `_deleted: true` |
| Conflict count keeps climbing | Resolver output is itself non-deterministic across nodes | Make the tiebreaker total and deterministic (stable sort key, not dict order) |
| `_changes` returns no `_conflicts` arrays | `conflicts=true` omitted, or a Mango `selector` used to filter | Request `conflicts=true` and filter conflicts client-side, not with a selector |
| Engine reprocesses the same batch after restart | `last_seq` checkpoint not persisted | Persist `last_seq` after each batch and load it on startup |
| Growing revision-tree depth / slow reads | Merges extend the tree without pruning | Verify tombstones are landing; schedule compaction on the target database |
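The remediation in the first row (re-read, re-merge, resubmit) is a bounded retry loop. The sketch below takes the three steps as injected callables so it stays independent of any particular client; `resolve_with_retry` and its parameter names are hypothetical, and no backoff delay is included.

```python
def resolve_with_retry(read_doc, merge, write, doc_id, max_attempts=3):
    """Retry a resolution against a fresh read each time.

    read_doc(doc_id)  -> the document as read with conflicts=true
    merge(doc)        -> a resolution, or None when no conflicts remain
    write(resolution) -> True when every row of the batch was accepted
    """
    for _ in range(max_attempts):
        doc = read_doc(doc_id)       # always start from the current winner
        resolution = merge(doc)
        if resolution is None:
            return True              # someone else already cleared it
        if write(resolution):
            return True
    return False                     # caller should route to the fallback queue
```

With the engine shown earlier, `merge` and `write` would plausibly map to `resolve_conflicts` and `write_resolved`, and `read_doc` to a GET on the document with `conflicts=true`. Re-reading on every attempt is the important part: resubmitting the same stale `_rev` can only fail again.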

FAQ

Does an auto-merge engine change CouchDB’s default winning-revision selection?

No. CouchDB still picks a deterministic winner for reads (highest generation, then highest revision hash). The engine does not override that choice at read time — it commits a real merged document that becomes the new winner and tombstones the losing leaves, so the conflict is resolved in stored data rather than only masked on read.

Why write the merged winner and the tombstones in one `_bulk_docs` batch?

A conflict is only cleared when no divergent leaves remain. Sending the winner and every losing-leaf deletion in one request shrinks the window where the winner exists but stale branches still surface via ?conflicts=true. The batch is not transactional, though: _bulk_docs reports success or failure per document, so the engine must check every result row and retry whatever was rejected.

Can I run multiple engine instances for throughput?

Only if each instance owns a disjoint set of documents or partitions. Two resolvers acting on the same revision tree race on _bulk_docs and generate 409 storms. Scale by partitioning the keyspace (via doc_ids, a filter, or per-database sharding), not by adding replicas to the same target.
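One way to give each instance a disjoint set of documents is to hash the document ID and take it modulo the instance count, as sketched below. The helper names and the choice of SHA-256 are assumptions; the essential property is only that every instance computes the same owner for the same ID.

```python
import hashlib


def owner_index(doc_id, instance_count):
    """Map a document ID to exactly one instance in [0, instance_count)."""
    # hashlib is used instead of the built-in hash(), which is salted per
    # process for strings and would give each instance a different answer.
    digest = hashlib.sha256(doc_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % instance_count


def owns(doc_id, instance_index, instance_count):
    """True when this instance is responsible for resolving doc_id."""
    return owner_index(doc_id, instance_count) == instance_index


for doc_id in ("sensor-001", "sensor-002", "gateway-17"):
    print(doc_id, "->", owner_index(doc_id, 3))
```

Each instance would skip any change for which `owns` returns false. Changing the instance count reassigns most documents, so all instances should be restarted together with the new count.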

What happens to documents the engine cannot resolve deterministically?

They must never be force-merged. Route them to a review workflow — see manual review sync queues — where an operator inspects the divergence and re-injects a corrected document into the replication stream. Escalation ordering across automated and human tiers is covered by fallback resolution chains.

How do I confirm the engine is actually keeping up?

Watch three signals: the gap between the feed’s current last_seq and the engine’s persisted checkpoint, resolution latency from first detection to the winning commit, and the depth of the fallback queue. Rising checkpoint lag with a flat error rate usually means the poll interval is too coarse for the conflict volume.
