edit this page

Multi node monitoring

Active standby with external arbitrator

When inside a single DC, we can use a single replication-manager that performs failover using keepalived, corosync, openha or etcd, but if you run on 2 DC it is possible to run two replication-manager instances in the same infrastructure. Both instances will use a heartbeat mechanism via the http protocol.

Make sure you activate the web server of replication-manager.

To configure active standby and arbitration, use the following settings on each replication-manager:

`arbitration-external` (1.0)

Item	Value
Description	Enable external arbitration on split brain.
Type	Boolean
Default Value	false

`arbitration-external-hosts` (1.0)

Item	Value
Description	Hosts list of external arbitrator service.
Type	List
Default Value	"arbitrator-fr.signal18.io,arbitrator-us.signal18.io"

Define one secret arbitration-external-secret it should be unique across all users of replication-manager, it is use by the arbitrator to identify your cluster, organization name and random alpha-numeric are very welcome.

`arbitration-external-secret` (1.0)

Item	Value
Description	Full path to an alerting script.
Type	String
Default Value	"cluster01.signal18.io"

Give each replication-manager server a different value for arbitration-external-unique-id and instruct it for the other replication-manager in the cluster.

`arbitration-external-id` (1.0)

Item	Value
Description	Unique value on each replication-manager.
Type	Integer
Default Value	0

`arbitration-peer-hosts` (1.0)

Item	Value
Description	Peer replication-manager node.
Type	String
Default Value	"replication-manager01:10002"

`arbitration-failed-master-script` (2.1)

Item	Value
Description	External script when a master lost arbitration during split brain
Type	String
Default Value	""

Internals

On each instance instruct it's peer replication-manager node

On instance "127.0.0.1:10001" arbitration-peer-hosts ="127.0.0.1:10002"

On instance "127.0.0.1:10002" arbitration-peer-hosts ="127.0.0.1:10001"

Once done start the fist replication-manager.

INFO[2017-03-20T09:48:38+01:00] [cluster_test_2_nodes] ERROR :Get http://127.0.0.1:10001/heartbeat: dial tcp 127.0.0.1:10001: getsockopt: connection refused
INFO[2017-03-20T09:48:38+01:00] [cluster_test_2_nodes] INFO : Splitbrain     
INFO[2017-03-20T09:48:38+01:00] [cluster_test_3_nodes] CHECK: External Abitration
INFO[2017-03-20T09:48:38+01:00] [cluster_test_3_nodes] INFO :Arbitrator say winner
INFO[2017-03-20T09:48:40+01:00] [cluster_test_2_nodes] ERROR :Get http://127.0.0.1:10001/heartbeat: dial tcp 127.0.0.1:10001: getsockopt: connection refused
INFO[2017-03-20T09:48:40+01:00] [cluster_test_2_nodes] INFO : Splitbrain     
INFO[2017-03-20T09:48:40+01:00] [cluster_test_3_nodes] CHECK: External Abitration
INFO[2017-03-20T09:48:40+01:00] [cluster_test_3_nodes] INFO Arbitrator say :winner

The split brain detection is trigger because your are the first instance to start, the peer replication-manager is not joinable so it ask for an arbitration to arbitration-external-hosts where the arbitrator daemon is running.

The arbitrator will enable that node to enter Active Mode

When you start the peer replication-manager server, the split brain is resolve and replication-manager will detect an other active instance is running so it will get the Standby status.

failover in a replication-manager cluster is requesting an arbitration. If arbitrator can't be contacted, you can come back to normal command line mode to failover but make sure you stopped all other running replication-manager.

Run a private arbitrator

`arbitrator-bind-address` (2.0 arb)

Item	Value
Description	Arbitrator bind address
Type	String
Default Value	"0.0.0.0:10001"

`arbitrator-driver` (2.0 arb)

Item	Value
Description	Arbitrator backend storage type
Type	Enum
List Values	sqlite,mysql
Default Value	"sqlite"

Configuration for MySQL backend should be enable via this configuration.

[arbitrator]
title = "arbitrator"  
db-servers-hosts = "192.168.0.201:3306"
db-servers-credential = "user:password"
[default]

Start it via

/usr/bin/replication-manager-arb arbitrator --arbitrator-port=80

Multi node monitoring

Active standby with external arbitrator

arbitration-external (1.0)

arbitration-external-hosts (1.0)

arbitration-external-secret (1.0)

arbitration-external-id (1.0)

arbitration-peer-hosts (1.0)

arbitration-failed-master-script (2.1)