Limitation: Two-node clusters lack fault tolerance for automatic failover decisions.
Problems with two nodes:
Split-brain risk:
No quorum:
Single point of failure:
Recommendation: Use a minimum of three nodes.
Alternative: Two-node with external arbitrator to break ties.
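The quorum problem above comes down to simple majority arithmetic. The sketch below is illustrative only: the function names are hypothetical and it is not replication-manager code. It shows why two voters tolerate no failure, while three voters (three nodes, or two nodes plus an arbitrator) tolerate one.

```python
def majority(voters: int) -> int:
    """Smallest number of votes that forms a strict majority of `voters`."""
    return voters // 2 + 1


def tolerated_failures(voters: int) -> int:
    """How many voters can be lost while a strict majority still remains."""
    return voters - majority(voters)


if __name__ == "__main__":
    # Compare a bare 2-node cluster with the two recommended layouts.
    layouts = [("2 nodes", 2), ("2 nodes + arbitrator", 3), ("3 nodes", 3)]
    for label, voters in layouts:
        print(f"{label}: majority={majority(voters)}, "
              f"tolerated failures={tolerated_failures(voters)}")
```

With two voters the majority is two, so losing either node (or the link between them) leaves no side able to decide on its own.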
Reference: /pages/05.configuration/05.replication/docs.md
Symptom: After a failover in a 2-node cluster, restarting replication-manager leaves both servers stuck: one as Failed, the other as Suspect. No master is discovered, and the cluster does not recover.
Cause: On restart, all servers start in Suspect state. Without a quorum (minimum 3 nodes or an arbitrator), replication-manager cannot safely determine which server should be master. It refuses to guess because promoting the wrong server could cause data loss or split-brain.
Why it can't auto-recover: Replication-manager deliberately never promotes a Suspect server to master. During a network glitch, a server that briefly goes Suspect could otherwise be wrongly treated as standalone, triggering an unwanted rejoin that breaks a working cluster.
Solutions:
Use an arbitrator (recommended for 2-node production):
arbitration-external = true
arbitration-external-hosts = "arbitrator.example.com:10001"
The arbitrator remembers who was master and provides the quorum needed for safe election after restart.
Use 3+ nodes: With three or more nodes there is a natural quorum, so replication-manager can safely elect a master by majority.
Manual recovery after restart: If stuck, manually bootstrap replication from the GUI (Actions → Bootstrap Master-Slave) or CLI:
replication-manager-cli bootstrap --cluster=mycluster --topology=master-slave
Prevention: For production 2-node clusters, always configure an arbitrator. Without one, any replication-manager restart after a failover requires manual intervention.
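To make the role of the arbitrator concrete, here is a toy model of a majority-based election. It is an assumption-laden sketch, not replication-manager's actual algorithm: the `elect_master` function, the `reports` structure and the server names are all hypothetical. Each voter (a database node or the arbitrator) names the server it believes is master, and a master is accepted only when a strict majority of voters agree.

```python
from collections import Counter
from typing import Dict, Optional


def elect_master(reports: Dict[str, Optional[str]]) -> Optional[str]:
    """Return the agreed master, or None when no strict majority exists.

    `reports` maps each voter to the server it believes is master,
    or to None if that voter has no opinion or is unreachable.
    """
    needed = len(reports) // 2 + 1  # strict majority of all voters
    votes = Counter(name for name in reports.values() if name is not None)
    if not votes:
        return None
    candidate, count = votes.most_common(1)[0]
    return candidate if count >= needed else None


if __name__ == "__main__":
    # After a failover and restart, each of the two nodes may claim mastership.
    two_nodes = {"db1": "db1", "db2": "db2"}
    print("2 nodes:", elect_master(two_nodes))

    # An arbitrator that remembers the last master breaks the tie.
    with_arbitrator = {"db1": "db1", "db2": "db2", "arbitrator": "db2"}
    print("2 nodes + arbitrator:", elect_master(with_arbitrator))
```

In the two-node case the function returns None, which mirrors the stuck state described in the symptom: refusing to choose is safer than guessing.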
Supported but with constraints:
No serialized isolation:
Deadlock risk:
Certification-based replication:
Recommendations:
Reference: /pages/04.architecture/03.topologies/06.multi-master-galera/docs.md
Required configuration for master-master (multi-master) topology:
Critical setting:
read_only = 1
Must be set in the MariaDB configuration file (my.cnf), not only dynamically at runtime.
How it works:
read_only mode
read_only = 0 on active master only
read_only = 1 on standby master
Without read_only = 1 in config:
Additional protection:
Reference: /pages/05.configuration/05.replication/docs.md
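Because the setting has to be present in the configuration file rather than only applied at runtime, it can help to check the file itself. The following is a minimal sketch under stated assumptions: `read_only_in_config` is a hypothetical helper, it inspects only a handful of server option groups, and it does not follow `!include` or `!includedir` directives.

```python
from typing import Iterable, Optional

TRUE_VALUES = {"1", "on", "true"}
# Assumption: a subset of the option groups a MariaDB server reads.
SERVER_SECTIONS = ("mysqld", "server", "mariadb", "mariadbd")


def read_only_in_config(text: str,
                        sections: Iterable[str] = SERVER_SECTIONS) -> Optional[bool]:
    """Return True/False if read_only is set in a server section, else None.

    Line-based parse of my.cnf-style text; the last occurrence wins.
    Dashes and underscores in the option name are treated alike.
    """
    wanted = {s.lower() for s in sections}
    current = None
    result = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#;!":
            continue  # blank line, comment, or include directive
        if line.startswith("[") and line.endswith("]"):
            current = line[1:-1].strip().lower()
            continue
        if current not in wanted:
            continue
        key, _, value = line.partition("=")
        if key.strip().lower().replace("-", "_") == "read_only":
            value = value.split("#", 1)[0].strip().lower()
            # A bare boolean option with no value counts as enabled.
            result = value == "" or value in TRUE_VALUES
    return result


if __name__ == "__main__":
    sample = "[client]\nport = 3306\n\n[mysqld]\n# start read-only\nread_only = 1\n"
    print("read_only in config:", read_only_in_config(sample))
```

A result of None means the option is absent from the inspected sections, which is the case the text above warns about.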
Problem: Relay node failures are not automatically managed.
Multi-tier example:
DC1: Master → Relay1
DC2: Relay1 → Slave1, Slave2
If Relay1 crashes:
Limitation: Automatic failover is designed for master failure, not for intermediate relay failures.
Workaround options:
Option 1: Manually repoint slaves to master
CHANGE MASTER TO MASTER_HOST='master', ...
Option 2: Use scripts with failover-post-script to handle relay failures
Option 3: Avoid multi-tier topologies in critical paths
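For Option 1, the statements run on each orphaned slave can be generated by a small helper. This is a sketch, not part of replication-manager: `repoint_statements` and the host name are hypothetical, and it assumes MariaDB GTID replication so that each slave can resume from its own position. Replication credentials are omitted on the assumption that the slave's existing ones remain valid on the master.

```python
import re
from typing import List

# Conservative host name check so the value can be embedded in SQL safely.
_HOST_RE = re.compile(r"^[A-Za-z0-9._-]+$")


def repoint_statements(master_host: str, master_port: int = 3306) -> List[str]:
    """Build the statements that repoint one slave directly to the master."""
    if not _HOST_RE.match(master_host):
        raise ValueError(f"unexpected characters in host name: {master_host!r}")
    if not 0 < master_port < 65536:
        raise ValueError(f"port out of range: {master_port}")
    return [
        "STOP SLAVE;",
        # Assumes GTID replication; otherwise log file and position are needed.
        f"CHANGE MASTER TO MASTER_HOST='{master_host}', "
        f"MASTER_PORT={master_port}, MASTER_USE_GTID=slave_pos;",
        "START SLAVE;",
    ]


if __name__ == "__main__":
    # Hypothetical host name for the DC1 master from the example topology.
    for statement in repoint_statements("master.dc1.example.com"):
        print(statement)
```

The printed statements would be run on Slave1 and Slave2 in turn; checking SHOW SLAVE STATUS afterwards confirms that replication has resumed.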
When multi-tier makes sense:
Reference: /pages/05.configuration/05.replication/docs.md