Arbitration provides high availability for the monitoring layer by running two replication-manager instances across separate datacenters. An external arbitrator service ensures that exactly one instance is active at any time, preventing conflicting operations on your database clusters during network partitions. Arbitration requires a registered instance with a support or partner subscription plan.
Under the free plan, you typically deploy a single replication-manager instance in a third datacenter, where it arbitrates the availability of your databases and proxies. Because this instance does not hold critical data, it is easy to relocate if it fails. You may lose monitoring data and statistics, but you can restore the configuration from your own backups or, for registered instances, from the Signal18 GitLab.
Both instances in the pair must share the same registration URI and the same encryption key. On the second instance, copy the following from the first:
- The encryption key file (monitoring-key-path, default .replication-manager.key)
- cloud18-domain
- cloud18-sub-domain
- cloud18-sub-domain-zone
- cloud18-gitlab-user
- cloud18-gitlab-password

When arbitration is enabled, one instance is elected active and the other enters standby mode. The active instance is the sole authority for all cluster operations. The standby instance monitors the same database clusters but does not modify them.
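Because the pair only works when both instances carry identical values for the copied settings, it can help to compare them before enabling arbitration. The Python sketch below is a hypothetical helper, not part of replication-manager: it assumes each instance's settings have already been loaded into a dict keyed by setting name, and it compares the key files by SHA-256 digest, since monitoring-key-path is only a path and the key material itself must match.

```python
from __future__ import annotations

import hashlib
from pathlib import Path

# Settings that must hold the same value on both instances.
# The encryption key is compared separately, by file content.
SHARED_SETTINGS = (
    "cloud18-domain",
    "cloud18-sub-domain",
    "cloud18-sub-domain-zone",
    "cloud18-gitlab-user",
    "cloud18-gitlab-password",
)


def mismatched_settings(first: dict, second: dict) -> list[str]:
    """Return the shared settings that are missing or differ between two instances."""
    return [name for name in SHARED_SETTINGS if first.get(name) != second.get(name)]


def _key_digest(path: str) -> str:
    """SHA-256 of a key file, so keys can be compared without printing them."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def same_key(first_key_path: str, second_key_path: str) -> bool:
    """True when both instances use byte-identical encryption key files."""
    return _key_digest(first_key_path) == _key_digest(second_key_path)
```

An empty list from mismatched_settings together with a True from same_key means the second instance is aligned with the first for these settings.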
When deploying across two datacenters, you can run two replication-manager instances in an active/standby pair. Both instances exchange heartbeats over HTTP via the replication-manager API, with an external arbitrator service resolving split-brain situations.
- Peer heartbeat: the instances exchange heartbeats through the /api/heartbeat endpoint on the standard API port. This traffic stays on the local network or VPN and requires no authentication (the endpoint is unprotected).
- External arbitrator: a separate service (replication-manager-arb) that both instances contact when a split brain is detected. The arbitrator decides which instance becomes active based on heartbeat reports. It can be a shared public service behind a TLS reverse proxy.

Make sure the replication-manager web server is active on both instances, because the peer heartbeat uses the API port.
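To check the peer link by hand, you can query the peer's heartbeat endpoint yourself. The Python sketch below only tests whether /api/heartbeat answers with HTTP 200 within a timeout; it does not interpret the response body, whose format is not described here. The base URL you pass in, the timeout, and the option to skip TLS verification for a self-signed API certificate are assumptions to adapt to your deployment.

```python
import ssl
import urllib.request


def peer_alive(base_url: str, timeout: float = 2.0, verify_tls: bool = True) -> bool:
    """Return True if GET <base_url>/api/heartbeat answers HTTP 200 within timeout.

    No credentials are sent: the heartbeat endpoint is unauthenticated.
    """
    context = None
    if not verify_tls:
        # Accept a self-signed API certificate (assumption about your setup).
        context = ssl.create_default_context()
        context.check_hostname = False
        context.verify_mode = ssl.CERT_NONE
    url = base_url.rstrip("/") + "/api/heartbeat"
    try:
        with urllib.request.urlopen(url, timeout=timeout, context=context) as resp:
            return resp.status == 200
    except OSError:
        # URLError, HTTPError (non-2xx answers) and timeouts are all OSError subclasses.
        return False
```

Run it from each instance against the other. If it returns False from both sides, check that the replication-manager web server is running and that the API port is reachable over the local network or VPN.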
The standby instance suppresses all operations that could conflict with the active instance or cause unintended changes to the monitored databases.
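One way to picture this suppression is a guard that every state-changing operation must pass. The Python sketch below is a hypothetical model, not replication-manager's code: ClusterController, StandbyError and active_only are illustrative names, "S" is the standby status letter seen in the log output, and "A" for active is an assumption.

```python
import functools


class StandbyError(RuntimeError):
    """Raised when a state-changing operation is attempted on a standby instance."""


def active_only(method):
    """Let the wrapped operation run only when the instance is active."""
    @functools.wraps(method)
    def wrapper(self, *args, **kwargs):
        if self.status != "A":
            raise StandbyError(f"{method.__name__} suppressed: instance is standby")
        return method(self, *args, **kwargs)
    return wrapper


class ClusterController:
    def __init__(self, status: str):
        self.status = status  # "A" = active (assumed letter), "S" = standby

    def topology(self) -> str:
        # Monitoring is read-only, so the standby may do it too.
        return "monitored"

    @active_only
    def failover(self) -> str:
        # State-changing: only the active instance may run it.
        return "failover started"
```

Keeping the check in one decorator means new mutating operations are suppressed on the standby simply by tagging them, rather than by repeating the status test in each method.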
When arbitration is enabled, the active instance pushes cluster configuration to the git repository (git-url). Standby instances pull from the same repository and reload configuration changes automatically.
This ensures that configuration changes made on the active instance (enabling a backup schedule, changing a monitoring threshold, etc.) are propagated to the standby without manual intervention. If the standby is later promoted to active, it operates with the latest configuration.
The sync uses the existing git-url repository — no additional git configuration is required beyond what is already set up for configuration persistence.
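The division of labour between the two roles can be made concrete with plain git commands. replication-manager performs this synchronization on its own; the Python sketch below only illustrates the roles against a local clone of the git-url repository. The commit message, the --ff-only pull, and all function names are assumptions.

```python
import subprocess


def sync_commands(role: str, repo_dir: str) -> list:
    """git command lines a given role would run against its clone of git-url."""
    if role == "active":
        # The active instance publishes its configuration.
        return [
            ["git", "-C", repo_dir, "add", "--all"],
            ["git", "-C", repo_dir, "commit", "--quiet", "-m", "configuration update"],
            ["git", "-C", repo_dir, "push"],
        ]
    if role == "standby":
        # The standby only consumes what the active instance pushed.
        return [["git", "-C", repo_dir, "pull", "--ff-only"]]
    raise ValueError(f"unknown role: {role!r}")


def head(repo_dir: str) -> str:
    """Current commit of the local clone."""
    return subprocess.run(
        ["git", "-C", repo_dir, "rev-parse", "HEAD"],
        check=True, capture_output=True, text=True,
    ).stdout.strip()


def standby_sync(repo_dir: str) -> bool:
    """Pull as a standby; True means new configuration arrived and must be reloaded."""
    before = head(repo_dir)
    for cmd in sync_commands("standby", repo_dir):
        subprocess.run(cmd, check=True)
    return head(repo_dir) != before
```

Comparing HEAD before and after the pull is what lets the standby reload only when the active instance actually changed something.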
When the first instance starts, the peer is not yet reachable. This triggers split-brain detection and the instance contacts the arbitrator:
INFO Arbitrator: External check requested
INFO Arbitrator say winner
The arbitrator grants active status to the first instance.
When the second instance starts, it detects the peer is already active and enters standby mode:
INFO No peer split brain setting status to S
If the active instance goes down, the standby detects the peer is unreachable (split brain), contacts the arbitrator, and is elected as the new active instance.
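The start-up and takeover sequence described here can be modelled as a small state machine. The Python sketch below is simplified and hypothetical: FakeArbitrator stands in for replication-manager-arb and grants activity to one instance at a time, peer reachability is a plain boolean, and "A" for active is an assumed status letter. It also shows why the arbitrator matters: during a partition where the active instance is still alive, the standby asks but is refused.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class FakeArbitrator:
    """Hypothetical stand-in for replication-manager-arb: one winner at a time."""
    winner: str | None = None
    alive: set[str] = field(default_factory=set)

    def elect(self, requester: str) -> bool:
        # Asking proves the requester is alive. The grant only moves if the
        # current winner is no longer known to be alive.
        self.alive.add(requester)
        if self.winner is None or self.winner not in self.alive:
            self.winner = requester
        return self.winner == requester

    def report_down(self, name: str) -> None:
        # In the real service this knowledge comes from heartbeat reports.
        self.alive.discard(name)


@dataclass
class Instance:
    name: str
    arbitrator: FakeArbitrator
    reachable: bool = True  # as seen by the peer
    status: str = "S"       # "A" = active (assumed letter), "S" = standby

    def tick(self, peer: Instance) -> str:
        if not peer.reachable:
            # Split brain: "Arbitrator: External check requested".
            self.status = "A" if self.arbitrator.elect(self.name) else "S"
        elif peer.status == "A":
            # "No peer split brain setting status to S".
            self.status = "S"
        return self.status


def run_scenario() -> list[str]:
    """First start, second start, partition, takeover; returns each decision."""
    arb = FakeArbitrator()
    dc1 = Instance("dc1", arb)
    dc2 = Instance("dc2", arb, reachable=False)  # not started yet
    steps = [dc1.tick(dc2)]      # peer unreachable, arbitrator says winner
    dc2.reachable = True
    steps.append(dc2.tick(dc1))  # peer already active: standby
    dc1.reachable = False        # partition, but dc1 is still alive
    steps.append(dc2.tick(dc1))  # arbitrator refuses: dc1 keeps the grant
    arb.report_down("dc1")       # dc1 really goes down
    steps.append(dc2.tick(dc1))  # standby elected as the new active
    return steps
```

The third step is the case the arbitrator exists for: losing sight of the peer is not enough to become active while the arbitrator still considers the current winner alive.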
Failover in a replication-manager cluster requires arbitration. If the arbitrator cannot be contacted, you can fail over manually from the command line or the API, but first make sure that all other replication-manager instances are stopped.
The primary goal of arbitration is to protect the database infrastructure from having two masters accepting writes simultaneously. During a network partition, the active replication-manager may fail over to a slave and promote it as the new master. If the standby then becomes active, it may still see the old master as writable, leaving two servers that accept writes and causing data divergence.
When the arbitrator elects a winner, the losing instance compares its own master with the winner's elected master. If they differ, the loser demotes its master by reattaching it as a slave of the winner's master using GTID-based replication (CHANGE MASTER). This converges the topology back to a single writer as quickly as possible.
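The losing instance's decision fits in a few lines. The Python sketch below returns the statements that would reattach the loser's master under the winner's master, or nothing when both sides already agree. The STOP/CHANGE MASTER/START sequence, the replication user parameter, and the GTID clause per server flavor (MariaDB MASTER_USE_GTID, MySQL MASTER_AUTO_POSITION) are assumptions for illustration; replication-manager's own statement may differ, and password and TLS options are deliberately left out.

```python
from __future__ import annotations

# GTID positioning clause per server flavor (assumed mapping).
GTID_CLAUSE = {
    "mariadb": "MASTER_USE_GTID=slave_pos",
    "mysql": "MASTER_AUTO_POSITION=1",
}


def demotion_statements(
    local_master: tuple[str, int],
    winner_master: tuple[str, int],
    repl_user: str,
    flavor: str = "mariadb",
) -> list[str]:
    """Statements to run on local_master so it replicates from winner_master.

    Returns an empty list when both instances already agree on the master.
    Values are interpolated without escaping: use with trusted input only.
    """
    if local_master == winner_master:
        return []
    host, port = winner_master
    return [
        "STOP SLAVE",
        f"CHANGE MASTER TO MASTER_HOST='{host}', MASTER_PORT={port}, "
        f"MASTER_USER='{repl_user}', {GTID_CLAUSE[flavor]}",
        "START SLAVE",
    ]
```

Relying on GTID positioning is what makes this safe to automate: the demoted server does not need a binlog file and offset from the winner to find where to resume.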
If you need custom handling of the demoted master (for example, fencing it from client traffic before rejoining), you can use the arbitration-failed-master-script setting to run an external script instead of the automatic GTID rejoin. The script receives the failed master's host and port as arguments.
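A script for arbitration-failed-master-script only has to accept the failed master's host and port. The Python sketch below assumes they arrive as two positional arguments in that order. The function names are hypothetical, and fence() is a placeholder for your own action, such as fencing the host from client traffic.

```python
from __future__ import annotations

import sys


def parse_target(argv: list[str]) -> tuple[str, int]:
    """Extract (host, port) from argv = [script, host, port]."""
    if len(argv) != 3:
        raise ValueError(f"usage: {argv[0] if argv else 'script'} <host> <port>")
    return argv[1], int(argv[2])


def fence(host: str, port: int) -> str:
    # Placeholder: replace with your own fencing action.
    return f"fenced failed master {host}:{port}"


def main(argv: list[str]) -> int:
    """Entry point; returns a process exit code (0 on success, 2 on bad arguments)."""
    try:
        host, port = parse_target(argv)
    except ValueError as err:
        print(err, file=sys.stderr)
        return 2
    print(fence(host, port))
    return 0
```

Saved as an executable file, the script would end with sys.exit(main(sys.argv)) under an if __name__ == "__main__": guard, and arbitration-failed-master-script would point to its path.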
Note: Automatic master rejoin after lost arbitration only works with GTID-based replication.
Starting from the release following 3.1.30, the arbitrator binary (replication-manager-arb) will be included in all release editions, not only the Pro edition. This allows any deployment to set up active/standby pairs with external arbitration.
However, automatic split-brain resolution requires a registered instance with a support or partner subscription plan. Instances that are not registered to Cloud18 via a GitLab user, or that are on the free plan, will:
- Enter the ERR00104 error state when split-brain is detected

This means the active/standby mechanism works for everyone, but automatic recovery from a network partition between the two instances is a supported-plan feature. Manual recovery via the API (ForceArbitratorElection) or the dashboard toggle remains available regardless of plan.
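The behaviour on a split brain therefore depends on two conditions: registration and plan. The Python sketch below condenses them into one hypothetical function. The plan names follow the text (free, support, partner), and the returned labels are illustrative, not values reported by replication-manager.

```python
SUPPORTED_PLANS = {"support", "partner"}


def on_split_brain(registered: bool, plan: str) -> str:
    """What an instance does when it detects a split brain with its peer."""
    if registered and plan in SUPPORTED_PLANS:
        # Automatic resolution through the external arbitrator.
        return "ask-arbitrator"
    # No automatic recovery; manual recovery (ForceArbitratorElection via the API,
    # or the dashboard toggle) stays available on every plan.
    return "ERR00104"
```

Both conditions must hold: a registered instance on the free plan and an unregistered instance that nominally has a support plan both end up in ERR00104.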