Setting Up Peer-to-Peer Sync Topologies

You are here because a fleet of edge nodes, mobile backends, or Python sync workers needs to synchronize directly with one another — with no central coordinator to route through — and a naive attempt has already started biting: replication documents that duplicate on every redeploy, _changes feeds that never advance, or a conflict count that climbs the moment two peers reconnect after a partition. This guide walks the exact provisioning path for a peer mesh in CouchDB: confirm the prerequisites, lay down one pair of _replicator documents per link, verify each edge reaches running, and confirm convergence. Peer-to-peer is the highest-resilience wiring covered in sync topology models, and it is also the one that most rewards disciplined setup — every link is bidirectional, so a single mis-scoped edge multiplies conflicts across the whole graph. The mechanics of how those revisions move and diverge come from CouchDB Replication Architecture & Revision Fundamentals.

In a peer mesh, every link is a pair of replication documents (one per direction); there is no central coordinator, so any node can sync directly with any peer.

Immediate Triage / Prerequisites

Before writing a single _replicator document, confirm the ground you are building on. A peer mesh fails silently far more often than it errors loudly, so the checks below catch the misconfigurations that would otherwise surface as phantom conflicts days later.

Confirm mutual reachability. Each peer must reach every peer it links to over HTTPS on the CouchDB port. From node A, curl -sk https://node-b.local:6984/ should return the welcome document. Peers behind NAT or a cellular gateway need a stable route (VPN, tunnel, or a relay) before any edge will hold.
Confirm the replication user exists on both ends. A link from A to B authenticates against B with the credentials in the document’s target.auth. Verify the user resolves: curl -sk https://node-b.local:6984/_users/org.couchdb.user:rep_peer. Scope it to a per-mesh replication role, never a server admin.
Confirm the databases already exist. Set create_target to false so a typo fails fast instead of minting an empty database on a constrained device. Pre-create the database on every peer: curl -skX PUT https://node-b.local:6984/fleet_state.
Decide the scope of each edge. An unscoped mesh replicates every document to every peer — the fastest way to manufacture the very conflict generation models you are trying to avoid. Pick one of selector, filter, or doc_ids per edge (they are mutually exclusive) and shard by device group.
Environment. Python 3.10+ and a single HTTP client (requests) are all the provisioner needs. Run exactly one provisioner against the mesh; two racing the same _replicator database just trade 409s.
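The checks above can be sketched as a small preflight routine. This is a minimal illustration rather than a definitive tool: preflight_urls, preflight, and the StatusFetcher callable are hypothetical names, and the fetcher is injected so you can plug in whatever HTTP client and admin credentials you already use (reading _users normally requires elevated access).

```python
from typing import Callable, Optional

# Hypothetical type: takes a URL, returns the HTTP status code, or None if unreachable.
StatusFetcher = Callable[[str], Optional[int]]


def preflight_urls(peer_url: str, db: str, rep_user: str) -> dict[str, str]:
    """Map each prerequisite check to the URL that answers it."""
    base = peer_url.rstrip("/")
    return {
        "reachable": f"{base}/",
        "replication_user": f"{base}/_users/org.couchdb.user:{rep_user}",
        "database": f"{base}/{db}",
    }


def preflight(peer_url: str, db: str, rep_user: str, fetch_status: StatusFetcher) -> list[str]:
    """Return the names of failed checks; an empty list means the peer is ready."""
    failures = []
    for name, url in preflight_urls(peer_url, db, rep_user).items():
        if fetch_status(url) != 200:
            failures.append(name)
    return failures


# Stand-in fetcher simulating a peer whose database has not been created yet.
def fake_fetch(url: str) -> Optional[int]:
    return 404 if url.endswith("/fleet_state") else 200


print(preflight("https://node-b.local:6984", "fleet_state", "rep_peer", fake_fetch))
```

Running the preflight against every peer before writing any edge keeps a missing database or user from surfacing later as a crashing replication.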

Step-by-Step Implementation

Each step includes a verification you can run before moving on. The flow is: pre-create databases, write the two directed documents that make one link, confirm both reach running, then repeat for every link in the mesh.

Pre-create the database on both peers. A peer-to-peer edge writes into an existing database; create_target: false is the safety net, but the database must genuinely exist. curl -skX PUT https://node-b.local:6984/fleet_state. Verify: curl -sk https://node-b.local:6984/fleet_state | grep '"db_name"' returns the database.
Write the A→B replication document. PUT a document into _replicator on node A with source pointing at A’s local database and target at B, carrying target.auth, continuous: true, create_target: false, a selector to scope it, and a checkpoint_interval tuned to the link. Give it a deterministic _id such as peer_nodeA__to__nodeB so re-applying the manifest updates the edge in place instead of duplicating it. Verify: the PUT returns 201 or 202, and the document appears at GET /_replicator/peer_nodeA__to__nodeB.
Write the mirrored B→A document. The reverse direction is a separate document on node B’s _replicator, with source and target swapped and target.auth presenting the credential B uses to write to A. Bidirectional sync is these two documents together, never one flag. Verify: both documents exist; the link is not live until both directions are present.
Confirm each edge reached running. Poll the scheduler on each peer: GET /_scheduler/docs/_replicator/peer_nodeA__to__nodeB. A healthy edge reports state: "running"; a crashing or failed state carries the cause in the response’s info field (for example an http_request_failed or unauthorized error). Verify: assert state == "running" for both directions before trusting the link.
Repeat for every link, then confirm convergence. Provision each remaining peer pair the same way. Once every edge is running, write a marker document on one peer and confirm it appears on the others: curl -skX PUT https://node-a.local:6984/fleet_state/mesh_probe -d '{"ok":true}', then read mesh_probe from node C. Verify: the probe document is readable on every peer within a checkpoint interval.
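The convergence check in the last step can be automated along these lines. This is a sketch under assumptions: await_convergence and the DocReader callable are illustrative names, and the reader is expected to wrap a GET on the probe document and return whether it succeeded.

```python
import time
from typing import Callable

# Hypothetical reader: (peer_url, doc_id) -> True once the document is readable there.
DocReader = Callable[[str, str], bool]


def await_convergence(
    peer_urls: list[str],
    doc_id: str,
    doc_visible: DocReader,
    timeout: float = 30.0,
    poll_every: float = 1.0,
) -> list[str]:
    """Poll until doc_id is visible on every peer; return the peers still missing it."""
    deadline = time.monotonic() + timeout
    missing = list(peer_urls)
    while True:
        # Only re-check peers that have not shown the probe yet.
        missing = [url for url in missing if not doc_visible(url, doc_id)]
        if not missing or time.monotonic() >= deadline:
            return missing
        time.sleep(poll_every)
```

An empty return value means the probe reached every peer; a non-empty one names the peers to inspect first. A timeout slightly above the configured checkpoint interval is a reasonable starting point.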

For scoping, checkpoint tuning, and the full field reference behind these documents, deploy them against the canonical _replicator document schema. Whether each edge should be continuous or a scheduled one-shot sweep is weighed in continuous vs one-way sync.
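Because selector, filter, and doc_ids are mutually exclusive, it can help to validate each document before it is written. The helper below is a hypothetical sketch (scope_of is not a CouchDB API); rejecting unscoped edges is this guide's policy, not something CouchDB itself enforces.

```python
SCOPE_FIELDS = ("selector", "filter", "doc_ids")


def scope_of(replication_doc: dict) -> str:
    """Return the single scoping field an edge uses, or raise if it has none or several."""
    present = [field for field in SCOPE_FIELDS if field in replication_doc]
    if not present:
        # Policy choice for a mesh: an unscoped edge is treated as a mistake.
        raise ValueError(f"{replication_doc.get('_id')}: edge is unscoped")
    if len(present) > 1:
        raise ValueError(f"{replication_doc.get('_id')}: {present} are mutually exclusive")
    return present[0]
```

Calling scope_of on each generated document before the PUT turns a scoping mistake into a local exception instead of a rejected or over-broad replication.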

Complete Working Example

This self-contained script provisions a full bidirectional peer mesh from a list of peers. It expands every unordered pair into two directed _replicator documents, uses deterministic _ids so re-running converges instead of duplicating, scopes each edge with a Mango selector, and waits for every edge to reach running before returning. Set the peer URLs and credentials at the bottom before running.

import itertools
import os
import time
from dataclasses import dataclass

import requests


@dataclass
class Peer:
    """One CouchDB node participating in the mesh."""

    name: str          # short, stable id used to build the edge _id
    url: str           # e.g. https://node-a.local:6984
    db: str            # database to synchronize, e.g. fleet_state
    rep_user: str      # replication user other peers authenticate as
    rep_password: str  # that user's password (from the secret store)


def edge_id(src: Peer, dst: Peer) -> str:
    """Deterministic id so re-applying the mesh is idempotent."""
    return f"peer_{src.name}__to__{dst.name}"


def edge_doc(src: Peer, dst: Peer, selector: dict) -> dict:
    """One directed replication document: read from src, write to dst."""
    return {
        "_id": edge_id(src, dst),
        "source": {
            "url": f"{src.url}/{src.db}",
            "auth": {"basic": {"username": src.rep_user, "password": src.rep_password}},
        },
        "target": {
            "url": f"{dst.url}/{dst.db}",
            "auth": {"basic": {"username": dst.rep_user, "password": dst.rep_password}},
        },
        "continuous": True,
        "create_target": False,        # fail fast; never mint a DB on a constrained node
        "selector": selector,          # scope the edge so peers do not flood each other
        "checkpoint_interval": 10000,  # 10s balances durability vs. bandwidth on flaky links
    }


def put_edge(admin: requests.Session, host: Peer, doc: dict) -> None:
    """PUT a replication doc on host, reusing the current _rev if the edge already exists."""
    doc_url = f"{host.url}/_replicator/{doc['_id']}"
    existing = admin.get(doc_url)
    if existing.status_code == 200:
        doc["_rev"] = existing.json()["_rev"]  # update in place, do not duplicate the edge
    resp = admin.put(doc_url, json=doc)
    resp.raise_for_status()


def await_running(admin: requests.Session, host: Peer, doc_id: str, timeout: int = 60) -> str:
    """Block until an edge reaches running, or report its failure reason."""
    path = f"{host.url}/_scheduler/docs/_replicator/{doc_id}"
    deadline = time.time() + timeout
    while time.time() < deadline:
        info = admin.get(path).json()
        state = info.get("state")
        if state == "running":
            return state
        if state in {"failed", "crashing"}:
            reason = (info.get("info") or {}).get("error", "unknown")
            raise RuntimeError(f"{doc_id} is {state}: {reason}")
        time.sleep(3)
    raise TimeoutError(f"{doc_id} never reached running")


def build_mesh(peers: list[Peer], selector: dict, admin_auth: tuple[str, str]) -> None:
    """Provision a fully connected bidirectional mesh over the given peers."""
    admin = requests.Session()
    admin.auth = admin_auth
    admin.headers.update({"Content-Type": "application/json"})

    # Every unordered pair becomes two directed edges (A->B and B->A).
    for a, b in itertools.combinations(peers, 2):
        for src, dst in ((a, b), (b, a)):
            doc = edge_doc(src, dst, selector)
            put_edge(admin, src, doc)              # the document lives on the source peer
            print(f"applied {doc['_id']}")
        # Confirm both directions of this link are healthy before moving on.
        for src, dst in ((a, b), (b, a)):
            await_running(admin, src, edge_id(src, dst))
            print(f"running  {edge_id(src, dst)}")


if __name__ == "__main__":
    pw = os.environ.get("REP_PASSWORD", "changeme")
    fleet = [
        Peer("nodeA", "https://node-a.local:6984", "fleet_state", "rep_peer", pw),
        Peer("nodeB", "https://node-b.local:6984", "fleet_state", "rep_peer", pw),
        Peer("nodeC", "https://node-c.local:6984", "fleet_state", "rep_peer", pw),
    ]
    build_mesh(
        fleet,
        selector={"device_group": "west-1"},
        admin_auth=("admin", os.environ.get("ADMIN_PASSWORD", "admin")),
    )

Running the script twice converges to the same mesh rather than doubling it, because each edge resolves to a deterministic _id and put_edge updates the live _rev in place. That idempotence is what lets you keep the peer list under version control and re-apply it on every deploy.

Gotchas & Edge Cases

Bidirectional is two documents, not a flag. There is no bidirectional: true in CouchDB. Forgetting the mirrored document produces a one-way link that looks healthy in _scheduler/docs while changes flow in only one direction — a classic silent-divergence trap in a mesh.
The checkpoint lives in _local, not in _replicator. Progress is stored in _local/<replication_id> documents on the source and target. since_seq is only an optional start sequence you set when creating a job — it is not a knob you reset on a running one. To force a clean re-scan, delete the _local checkpoints (and, if needed, recreate the document); a one-shot job (continuous: false) does a single reconciliation pass.
Edge count, and with it conflict exposure, scales quadratically with peer count. A fully connected mesh of N nodes needs N × (N − 1) directed documents. Every unscoped edge is another path for concurrent writes to diverge, so scope aggressively with selector and reserve full meshes for small peer groups; larger fleets belong on hybrid arbitration.
new_edits: false does not override winner selection. CouchDB never lets an application intercept its winner-selection pipeline; the default read revision is chosen deterministically by revision tree mechanics. Resolution happens after the fact — fetch with conflicts=true, merge, then write the winner and delete the losing leaves in one _bulk_docs batch. new_edits: false is a replication-only mechanism for supplying explicit revision histories, not a resolution shortcut.
A reconnect storm is expected; a sustained storm is not. When a partition heals, every edge resumes at once and merges divergent trees simultaneously — a transient conflict spike. If it never subsides, an edge is unscoped or missing its checkpoint, not healing.
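The after-the-fact resolution described above (fetch with conflicts=true, merge, write the winner, delete the losing leaves in one batch) might look like the sketch below. resolution_batch and the merge callable are hypothetical; the function only builds the _bulk_docs request body and assumes you have already fetched the winning revision and each losing revision.

```python
from typing import Callable


def resolution_batch(
    winner: dict,
    losers: list[dict],
    merge: Callable[[dict, list[dict]], dict],
) -> dict:
    """Build a _bulk_docs body that writes the merged doc and deletes the losing leaves."""
    merged = dict(merge(winner, losers))
    merged["_id"] = winner["_id"]
    merged["_rev"] = winner["_rev"]      # update on top of the current winning leaf
    merged.pop("_conflicts", None)       # informational field from ?conflicts=true
    tombstones = [
        {"_id": winner["_id"], "_rev": loser["_rev"], "_deleted": True}
        for loser in losers
    ]
    return {"docs": [merged, *tombstones]}
```

The merge policy is application-specific, which is why it is passed in rather than hard-coded; the returned dictionary is what you would POST to the database's _bulk_docs endpoint.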

Verification & Observability

Confirm the mesh at three levels.

First, per edge: GET /_scheduler/docs/_replicator/<id> on each peer must report state: "running", and its checkpointed_source_seq must advance over time. If checkpointed_source_seq stalls while the source’s update_seq keeps climbing, writes are failing rather than progressing; check doc_write_failures and inspect the CouchDB log filtered for couch_replicator entries.

Second, across the fleet: GET /_active_tasks filtered for type: "replication" should show changes_pending draining rather than growing; excessive revisions_checked and 429 Too Many Requests responses on a constrained gateway (RFC 6585) usually signal an over-aggressive full re-scan.

Third, at the data level: write a probe document on one peer and read it back from every other peer within a checkpoint interval to prove end-to-end convergence.

Emit your own per-edge metrics, keyed by source and target, so a dashboard renders the mesh as a graph with each edge coloured by health; a single dark edge in a mesh strands only its two endpoints, but the aggregate conflict rate is the signal that a scope is wrong. Persistent authentication failures on an edge belong in error handling & retry logic rather than an unbounded retry loop, and documents no automated resolver can reconcile should escalate to manual review sync queues. Aligning each edge’s credential scope with the network partition it crosses follows security boundaries in replication, so a compromised peer cannot replicate outside its device group.
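One way to turn the "draining rather than growing" check into a per-edge metric is to classify successive changes_pending readings. The function and its labels are illustrative assumptions; it expects readings you have sampled yourself from _active_tasks and skips any that are missing.

```python
from typing import Optional


def pending_trend(samples: list[Optional[int]]) -> str:
    """Classify successive changes_pending readings for one edge, oldest first."""
    known = [s for s in samples if s is not None]  # skip readings that were absent
    if len(known) < 2:
        return "unknown"
    if known[-1] == 0:
        return "caught_up"
    if known[-1] < known[0]:
        return "draining"
    return "stalled_or_growing"
```

An edge that stays in stalled_or_growing across several sampling windows is the candidate for the scope and checkpoint checks described above.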

FAQ

How many _replicator documents does a peer mesh of N nodes need?

A fully connected bidirectional mesh needs N × (N − 1) directed documents — two per node pair, one for each direction — which grows quadratically. That growth is the main reason full meshes are reserved for small peer groups; larger fleets scope edges with selectors so peers exchange only shared documents, or adopt hybrid arbitration so most traffic flows through a coordinator instead of every possible pair.
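The growth is easy to see by enumerating the edges. This short sketch reuses the peer_<src>__to__<dst> naming from the provisioning script; directed_edges is a hypothetical helper.

```python
from itertools import permutations


def directed_edges(peer_names: list[str]) -> list[str]:
    """Every ordered pair of distinct peers is one _replicator document."""
    return [f"peer_{src}__to__{dst}" for src, dst in permutations(peer_names, 2)]


for n in (3, 5, 10):
    names = [f"node{i}" for i in range(n)]
    print(n, len(directed_edges(names)))  # matches N * (N - 1)
```

Three peers need 6 documents, five need 20, and ten already need 90.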

Why do changes flow in only one direction between two peers?

Because only one of the two documents was written. CouchDB has no bidirectional flag — a peer link is a pair of directed _replicator documents with source and target swapped. If one is missing, the surviving edge still reports running in _scheduler/docs, so the link looks healthy while replication is actually one-way. Confirm both directions exist and both reach running.
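A quick audit for this case is to collect the replication document ids from every peer's _replicator database and look for edges without a mirror. The sketch below assumes the peer_<src>__to__<dst> id convention used in this guide; one_way_links is a hypothetical helper and ignores ids that do not follow that pattern.

```python
def one_way_links(edge_ids: set[str]) -> list[tuple[str, str]]:
    """Return (src, dst) for every edge whose reverse direction is absent."""
    pairs = set()
    for edge in edge_ids:
        # Skip design documents and anything else outside the naming convention.
        if edge.startswith("peer_") and "__to__" in edge:
            src, dst = edge[len("peer_"):].split("__to__", 1)
            pairs.add((src, dst))
    return sorted((src, dst) for src, dst in pairs if (dst, src) not in pairs)
```

Each returned pair names a direction that exists while its mirror does not, which is exactly the link that looks healthy while replicating one way.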

How do I force a peer edge to re-scan from the beginning?

Delete the _local/<replication_id> checkpoint documents on the source and target, and if necessary recreate the _replicator document; the edge then rebuilds its checkpoint from the start. Setting since_seq only defines a starting point at job creation — it is not a reset applied to a running edge. For a single reconciliation pass without leaving a persistent listener, run the edge as a one-shot continuous: false job.
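Locating the checkpoint documents could be sketched as follows. This rests on an assumption worth verifying against your CouchDB version: that the replication id reported by the scheduler may carry option suffixes such as +continuous, and that the _local document is named after the part before the first +. checkpoint_urls is a hypothetical helper that only builds URLs; deleting each document still requires its current revision.

```python
def checkpoint_urls(source_db: str, target_db: str, replication_id: str) -> list[str]:
    """URLs of the _local checkpoint documents for one edge (assumed naming)."""
    # Assumption: drop option suffixes such as "+continuous" to get the base id.
    base_id = replication_id.split("+", 1)[0]
    return [f"{db.rstrip('/')}/_local/{base_id}" for db in (source_db, target_db)]
```

Fetching each URL first confirms that the checkpoint exists and yields the _rev needed for the DELETE.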
