_replicator Document Schema: Stateful Sync Control for Distributed Systems

The _replicator database serves as the operational control plane for CouchDB synchronization. Unlike jobs started through the _replicate endpoint, which are transient and do not survive a node restart, _replicator documents persist as first-class JSON objects that the CouchDB scheduler continuously evaluates, schedules, and monitors. For distributed systems teams architecting edge/IoT gateways, mobile backend pipelines, or Python-based sync workers, treating replication as a stateful document workflow means jobs survive restarts and are retried after transient connection failures, replication scope is declared in one place so conflict handling stays manageable, and sync telemetry is auditable. This guide details the schema, field semantics, and production deployment patterns required to operationalize the _replicator Configuration & Sync Pipeline Management framework at scale.

Exact Document Schema & Field Mapping

Every replication job in CouchDB is represented by a single document in the _replicator database. The schema below aligns with CouchDB 3.x+ production standards and reflects the runtime expectations of the Erlang-based replication scheduler. Only source and target are required for the scheduler to accept and enqueue the document; every other authored field is optional.

{
  "_id": "rep_edge_node_01_to_central",
  "_rev": "1-abc123def456",
  "source": "https://edge-node-01.local:5984/iot_telemetry",
  "target": "https://central-cluster.prod:5984/iot_telemetry",
  "create_target": false,
  "continuous": true,
  "doc_ids": ["sensor_001", "sensor_002"],
  "filter": "app/by_device_type",
  "query_params": {"type": "temperature"},
  "user_ctx": {
    "name": "replicator_svc",
    "roles": ["_admin"]
  },
  "owner": "pipeline_automation",

  "_replication_id": "a1b2c3d4e5f6...",
  "_replication_state": "running",
  "_replication_state_time": "2026-05-29T12:00:00Z",
  "_replication_stats": {
    "doc_write_failures": 0,
    "docs_read": 1420,
    "docs_written": 1418,
    "missing_revisions_found": 2,
    "revisions_checked": 1420
  }
}

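The fields above the blank line are authored by you (except _rev, which CouchDB assigns on each write); the _replication_* fields below it are written back by CouchDB and must not be set by hand. The example lists doc_ids and filter side by side only to show their shapes: doc_ids, filter, and selector are mutually exclusive, so a real document uses at most one of them, and query_params applies only together with filter.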
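
The authored/server-managed split described above can be checked before a document is deployed. The following is a minimal sketch, not CouchDB's own validation: the helper names authored_fields and validate are hypothetical, and the rules it encodes are limited to the required source and target fields, the mutual exclusivity of doc_ids, filter, and selector, and the _replication_ prefix of written-back fields.

```python
SERVER_PREFIX = "_replication_"                    # fields CouchDB writes back
SCOPE_FIELDS = ("doc_ids", "filter", "selector")   # mutually exclusive scoping options


def authored_fields(doc):
    """Return a copy of doc without the server-managed _replication_* fields."""
    return {k: v for k, v in doc.items() if not k.startswith(SERVER_PREFIX)}


def validate(doc):
    """Return a list of problems; an empty list means the checks passed."""
    problems = []
    for field in ("source", "target"):
        if not doc.get(field):
            problems.append(f"missing required field: {field}")
    scoped = [f for f in SCOPE_FIELDS if f in doc]
    if len(scoped) > 1:
        problems.append("mutually exclusive: " + ", ".join(scoped))
    if "query_params" in doc and "filter" not in doc:
        problems.append("query_params is only used with filter")
    managed = sorted(k for k in doc if k.startswith(SERVER_PREFIX))
    if managed:
        problems.append("server-managed fields present: " + ", ".join(managed))
    return problems
```

Running validate(authored_fields(doc)) in a CI step is one way to keep documents copied from a live database from carrying stale _replication_* values into version control.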

Critical Field Behavior & Scheduler Mechanics

  • source / target: Accepts absolute database URLs, local database names (e.g., "iot_telemetry"), or structured JSON objects containing url, headers, and auth for cross-cluster authentication. When targeting remote clusters, ensure TLS termination and certificate validation align with your network security posture.
  • continuous: When true, CouchDB keeps following the source database's changes feed and replicates new changes as they arrive; when false or omitted, the job runs once and stops when it has caught up. For bandwidth-constrained deployments, this flag selects between the continuous and one-shot approaches covered in Continuous vs One-Way Sync directly at the document level, and a continuous job removes the need for external cron jobs or Kubernetes CronJob resources to re-trigger sync.
  • Cancellation: To stop a job, delete its _replicator document — that is the supported mechanism. (The cancel: true flag belongs to the one-off POST /_replicate API, not to _replicator documents.) On deletion the scheduler tears down the worker after committing in-flight writes.
  • _replication_id: A computed identifier (a hash of the replication parameters) that CouchDB writes back and uses internally to deduplicate jobs. You do not set it; observe it via the document or _scheduler/jobs. Use a deterministic _id (not _replication_id) to make deployments idempotent across pod restarts or leader elections.
  • _replication_state & _replication_stats: Read-only fields populated by the replication scheduler. A job moves through initializing, running, pending, crashing, completed, and failed, and its live state is reported by _scheduler/docs. In CouchDB 2.1 and later, the default configuration writes only the terminal states (completed and failed) back into the document as _replication_state, so query _scheduler/docs for the non-terminal ones. _replication_stats provides counters for pipeline observability, including revision-reconciliation metrics and write-failure tallies.

stateDiagram-v2
  [*] --> initializing
  initializing --> running
  running --> pending: paused / throttled
  pending --> running: resumed
  running --> completed: one-shot finished
  running --> crashing: transient error
  crashing --> running: retry (exponential backoff)
  crashing --> failed: gives up
  completed --> [*]
  failed --> [*]
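
The transitions in the diagram above can be encoded as a lookup table so that a monitoring worker can flag state sequences it does not expect. This is a sketch of the diagram only, under the assumption that it covers every transition of interest; the real scheduler reports additional states (for example error) and the names ALLOWED, TERMINAL, and first_unexpected are hypothetical.

```python
# Transition table copied from the state diagram; not an exhaustive model
# of the CouchDB scheduler.
ALLOWED = {
    "initializing": {"running"},
    "running": {"pending", "completed", "crashing"},
    "pending": {"running"},
    "crashing": {"running", "failed"},
    "completed": set(),
    "failed": set(),
}

# States with no outgoing transitions.
TERMINAL = {state for state, nxt in ALLOWED.items() if not nxt}


def is_allowed(prev, new):
    """True if the diagram contains an edge prev -> new."""
    return new in ALLOWED.get(prev, set())


def first_unexpected(states):
    """Return the first (prev, new) pair not in the diagram, or None.

    Repeated observations of the same state are ignored, since a poller
    usually sees a state several times before it changes.
    """
    for prev, new in zip(states, states[1:]):
        if prev != new and not is_allowed(prev, new):
            return (prev, new)
    return None
```

A poller can append each observed state to a list and call first_unexpected on it; a non-None result is a hint to inspect the job, not proof of a fault.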

Conflict Resolution & Deterministic State Tracking

CouchDB uses a multi-version concurrency control (MVCC) model. When replication encounters divergent revisions during a sync cycle, it writes all conflicting branches to the target’s revision tree but does not resolve them and does not store a _conflicts field; the losing leaves are surfaced as a computed _conflicts array only when a document is read with ?conflicts=true. Distributed systems engineers must implement deterministic resolution logic in application code (write a merged winner and delete the losing revisions).
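
As one illustration of deterministic resolution, the sketch below picks a winner by last-writer-wins and builds a payload for POST /{db}/_bulk_docs that rewrites the winner and deletes the losing leaves. It assumes the caller has already fetched the full body of every leaf revision (the current winner plus each entry of _conflicts), and that documents carry an ISO 8601 updated_at field; that field and the function name plan_resolution are hypothetical. Last-writer-wins discards data from the losing revisions, so a real pipeline may need a field-level merge instead.

```python
def plan_resolution(revisions):
    """Build a _bulk_docs payload that resolves one conflicted document.

    revisions: list of full document bodies, one per leaf revision,
    each containing "_id" and "_rev".
    """
    if not revisions:
        raise ValueError("at least one revision is required")
    # Newest updated_at wins; the _rev string breaks ties so that every
    # node running this code picks the same winner.
    ordered = sorted(
        revisions,
        key=lambda d: (d.get("updated_at", ""), d["_rev"]),
        reverse=True,
    )
    winner, losers = ordered[0], ordered[1:]
    docs = [dict(winner)]  # update on top of the winning leaf
    docs += [
        {"_id": d["_id"], "_rev": d["_rev"], "_deleted": True}  # tombstone each loser
        for d in losers
    ]
    return {"docs": docs}
```

Because the sort key does not depend on input order, two workers that see the same set of revisions produce the same plan.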

To prevent conflict storms in high-churn IoT environments, leverage the filter and query_params fields to restrict replication scope to device-specific namespaces. When combined with a narrowly scoped user_ctx (the user name and roles the job runs under), you can enforce role-based replication boundaries that align with zero-trust network architectures. The scheduler’s internal state machine tracks revision checkpoints, ensuring that interrupted syncs resume from the last verified sequence ID rather than reprocessing the entire changes feed.
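
A scoped job of this kind might be generated as follows. This is a sketch under several assumptions: the filter app/by_device_type from the schema is taken to compare each document against the type query parameter, the auth.basic object form is assumed to be accepted by the CouchDB release in use, and the helper names endpoint and scoped_replication are hypothetical.

```python
def endpoint(url, username, password):
    """Structured source/target object with per-endpoint basic auth.

    The {"auth": {"basic": ...}} shape is an assumption about the
    server version; older releases may need credentials in headers.
    """
    return {"url": url, "auth": {"basic": {"username": username, "password": password}}}


def scoped_replication(device_type, source, target):
    """Replication document limited to one device type via a filter."""
    return {
        "_id": f"rep_{device_type}_to_central",
        "source": source,
        "target": target,
        "continuous": True,
        "filter": "app/by_device_type",         # design-doc filter named in the schema
        "query_params": {"type": device_type},  # passed to the filter as query parameters
    }
```

Keeping one document per device type means a misbehaving namespace can be stopped by deleting a single document without touching the others.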

For teams building Python-based sync pipelines, integrating the replication state machine with asynchronous event loops allows non-blocking telemetry ingestion. By polling _scheduler/docs, or by following the _replicator _changes feed for documents that reach a terminal state, engineers can trigger downstream processing only after a job reports running or completed. This pattern aligns with the async I/O model documented in the Python asyncio library, enabling high-throughput sync workers without thread contention.
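
The polling pattern can be sketched with asyncio as below. The state source is injected as a coroutine so the loop stays independent of any HTTP client; in a real worker fetch_state would wrap a request to _scheduler/docs and return the job's state string. The function name wait_for_state and its parameters are hypothetical.

```python
import asyncio


async def wait_for_state(fetch_state, wanted, interval=1.0, max_polls=60):
    """Poll fetch_state() until it returns a state in `wanted`.

    fetch_state: coroutine function returning the current state string.
    Raises TimeoutError after max_polls unsuccessful polls.
    """
    for _ in range(max_polls):
        state = await fetch_state()
        if state in wanted:
            return state
        # Yield to the event loop so other tasks keep running while we wait.
        await asyncio.sleep(interval)
    raise TimeoutError(f"state never reached one of {sorted(wanted)}")
```

A caller might run `await wait_for_state(fetch, {"running", "completed"})` before starting downstream processing, and treat TimeoutError as a signal to inspect the job.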

Production Deployment & Pipeline Integration

Deploying _replicator at scale requires treating replication documents as infrastructure-as-code artifacts. Each document should be version-controlled, validated against a JSON schema, and deployed via configuration management tools. When provisioning edge devices, ensure that _replicator documents are seeded with deterministic _id values derived from hardware serial numbers or cloud-assigned node identifiers. This guarantees idempotent deployments across fleet rollouts.
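
One way to derive such identifiers and keep rollouts idempotent is sketched below. The rep_<node>_<direction> naming simply follows the _id used in the schema example; the helper names replication_doc_id and needs_update are hypothetical.

```python
import re


def replication_doc_id(serial, direction="to_central"):
    """Derive a stable _id from a hardware serial or node identifier."""
    slug = re.sub(r"[^a-z0-9]+", "_", serial.strip().lower()).strip("_")
    if not slug:
        raise ValueError("serial contains no usable characters")
    return f"rep_{slug}_{direction}"


def needs_update(existing, desired):
    """True if the stored document differs from the desired one.

    _rev and the server-written _replication_* fields are ignored, so a
    redeploy of an unchanged document results in no write at all.
    """
    if existing is None:
        return True

    def authored(doc):
        return {
            k: v for k, v in doc.items()
            if k != "_rev" and not k.startswith("_replication_")
        }

    return authored(existing) != authored(desired)
```

A deployment script can fetch the document by its derived _id, call needs_update, and skip the PUT when nothing changed, which avoids creating a new revision of the replication document on every rollout.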

For mobile backend engineers, integrating replication lifecycle hooks with centralized logging systems provides critical visibility into sync health. By routing _replication_state transitions and _replication_stats counters to time-series databases, teams can establish alert thresholds for doc_write_failures or missing_revisions_found. Implementing Async Monitoring & Webhooks ensures that pipeline operators receive near-real-time notifications when replication workers enter crashing or failed states, enabling automated remediation workflows.
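
A threshold check of this kind could look like the following sketch. The threshold values are placeholders chosen for illustration, not recommendations, and the names DEFAULT_THRESHOLDS and evaluate are hypothetical; the counter names are the ones shown under _replication_stats in the schema.

```python
# Placeholder limits; tune them to the fleet's normal behaviour.
DEFAULT_THRESHOLDS = {
    "doc_write_failures": 0,
    "missing_revisions_found": 1000,
}


def evaluate(state, stats, thresholds=None):
    """Return a list of alert strings for one replication job."""
    limits = DEFAULT_THRESHOLDS if thresholds is None else thresholds
    alerts = []
    if state in ("crashing", "failed"):
        alerts.append(f"state={state}")
    for counter, limit in limits.items():
        value = stats.get(counter, 0)  # a missing counter is treated as zero
        if value > limit:
            alerts.append(f"{counter}={value} exceeds {limit}")
    return alerts
```

The returned strings can be forwarded to whatever webhook or logging sink the team already uses; an empty list means nothing crossed a limit.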

When targeting resource-constrained IoT gateways, memory and connection pooling must be explicitly tuned. The http_connections, worker_processes, and worker_batch_size fields can be set per replication document (and have cluster-wide defaults in the [replicator] configuration section). Pairing these settings with a document-level continuous: false during high-latency network conditions helps bound memory use. Detailed deployment blueprints for low-power environments are available in Configuring _replicator for IoT Edge Nodes, which outlines CPU throttling, batch size optimization, and offline queue strategies.
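
Per-document tuning can be applied as a small overlay on an existing replication document, as in this sketch. The numeric values are illustrative assumptions for a low-memory gateway, not measured recommendations, and the names LOW_MEMORY_TUNING and with_tuning are hypothetical.

```python
# Illustrative values only; measure on the target hardware before adopting them.
LOW_MEMORY_TUNING = {
    "http_connections": 4,
    "worker_processes": 1,
    "worker_batch_size": 100,
}


def with_tuning(doc, tuning=None):
    """Return a copy of doc with per-document tuning fields applied."""
    overlay = LOW_MEMORY_TUNING if tuning is None else tuning
    for key, value in overlay.items():
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, int) or value < 1:
            raise ValueError(f"{key} must be a positive integer, got {value!r}")
    merged = dict(doc)      # leave the caller's document untouched
    merged.update(overlay)
    return merged
```

Keeping the overlay separate from the base document lets the same source and target definition be deployed with different limits on gateways and on central nodes.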

For authoritative reference on replication internals, checkpointing algorithms, and scheduler configuration parameters, consult the official CouchDB Replicator Documentation. By treating _replicator documents as declarative sync contracts rather than imperative API calls, distributed systems teams achieve resilient, observable, and conflict-aware data synchronization across heterogeneous network topologies.