replication-manager writes several log files and diagnostic files. Knowing where each one lives and what it contains is the first step when investigating unexpected behaviour.
The main process log captures everything replication-manager does: topology discovery, failover decisions, API calls, job execution, and internal errors.
| Setting | Effect |
|---|---|
| `log-file` | Path to write the daemon log. Defaults to stdout if not set. Example: `/var/log/replication-manager.log` |
| `log-level` | Verbosity from 1 (errors only) to 7 (full monitoring-loop trace). Levels above 3 print every monitoring tick and are intended for debugging only. |
| `log-syslog` | Mirror all log output to the local UDP syslog port in addition to the log file. Useful for feeding Graylog, Splunk, or ELK without a file agent. |
| `log-rotate-max-age` | Rotate logs older than this many days. Default: 7. |
| `log-rotate-max-backup` | Keep at most this many rotated log files. Default: 7. |
| `log-rotate-max-size` | Rotate when the log file exceeds this size in MB. Default: 5. |
| `verbose` | Add extra debugging output on top of `log-level`. |
All of these are set in the global (non-cluster) section of the configuration file. See Daemon Monitoring for the full parameter reference.
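Putting these together, a global logging section might look like the following sketch. The file path and flag values are illustrative choices, not defaults:

```toml
# Global (non-cluster) section: daemon logging.
log-file = "/var/log/replication-manager.log"  # illustrative path
log-level = 2              # normal operations; raise above 3 only while debugging
log-syslog = true          # mirror to local UDP syslog for Graylog/Splunk/ELK
log-rotate-max-size = 5    # MB; rotate when the file exceeds this size
log-rotate-max-age = 7     # days before a log is rotated
log-rotate-max-backup = 7  # rotated files to keep
```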
| Level | When to use |
|---|---|
| 1 | Production: errors and critical events only |
| 2 | Default: normal operations, including state changes |
| 3 | Failover debugging: shows decision logic |
| 5–7 | Full trace: every SQL statement sent to monitored servers, every monitoring tick |
These files are written inside the replication-manager data directory, one subdirectory per cluster:
```
{monitoring-datadir}/{cluster-name}/
├── sql_general.log        All SQL sent to backend servers during monitoring
├── capture_*.log          Snapshot dumps triggered by error codes
└── {host}_{port}/
    └── ...                Per-server config and state files
```
Enabled with `log-sql-in-monitoring = true`, this log records every SQL query that replication-manager sends to monitored servers (`SHOW SLAVE STATUS`, `SHOW GLOBAL STATUS`, and so on).
This log is valuable during failover and rejoin analysis: you can replay exactly what replication-manager queried and in what order to reconstruct the decision path.
```toml
log-sql-in-monitoring = true
```
When `monitoring-capture` is enabled, replication-manager automatically dumps a diagnostic snapshot whenever a monitored error code fires. Each capture saves:
- `SHOW SLAVE STATUS`
- `SHOW PROCESSLIST`
- `SHOW GLOBAL STATUS`
- `SHOW ENGINE INNODB STATUS`

Capture is triggered by the error codes listed in `monitoring-capture-trigger` (default: `ERR00076,ERR00041`, the max-connections threshold and slave-delay errors). The dump runs for 5 monitoring ticks, and the oldest files are purged once the count exceeds `monitoring-capture-file-keep`.
| Setting | Default | Effect |
|---|---|---|
| `monitoring-capture` | `true` | Enable automatic capture on error |
| `monitoring-capture-trigger` | `"ERR00076,ERR00041"` | Comma-separated error codes that start a capture |
| `monitoring-capture-file-keep` | `5` | Maximum number of capture files to retain |
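These settings also belong in the configuration file. A sketch restating the defaults from the table above, which you would normally only write out when changing one of them:

```toml
monitoring-capture = true
monitoring-capture-trigger = "ERR00076,ERR00041"  # max connections, slave delay
monitoring-capture-file-keep = 5                  # purge oldest beyond this count
```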
```toml
memprofile = "/tmp/repmgr.mprof"
```

When set, replication-manager writes a Go memory profile to this path. Load it with `go tool pprof` to investigate memory growth:

```
go tool pprof /tmp/repmgr.mprof
```
To investigate a failover or topology decision:

- Set `log-level = 5` temporarily and restart the daemon
- Verify `db-servers-hosts`, `db-servers-credential`, and network access
- Enable `log-sql-in-monitoring = true` to see the exact queries being sent
- Note the ERR code that fired
- Inspect the capture files under `{monitoring-datadir}/{cluster-name}/`: they contain the exact server state at the moment of the error
- Review `monitoring-capture-trigger` to see which codes arm a capture

If expected alerts never arrive:

- Check `monitoring-alert-trigger` (see Alerting Configuration Guide)
- Check `monitoring-ignore-errors` (suppression list)
- Check whether `scheduler-alert-disable` is active (scheduled blackout)
- Raise `log-level = 3` to see alert dispatch in the daemon log

If log volume is a problem:

- Set `log-level` back to 1 or 2 for production
- Lower `log-rotate-max-size` to cap file growth
- Use `log-syslog` with a log aggregator that handles volume and retention externally
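For a low-noise production setup, the volume-control settings might be combined as in this sketch (values are illustrative):

```toml
log-level = 2             # normal operations only
log-rotate-max-size = 5   # MB; cap on-disk growth per file
log-syslog = true         # hand retention to an external aggregator
```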