replication-manager uses a multi-tier logging system built on Logrus. Every message is written simultaneously to one or more destinations: a rotating log file, journald (via stdout), the web interface, and optionally syslog or alert channels such as Slack.
Set by log-file. All cluster monitoring, failover, switchover, and API activity is written here.
log-file = "/var/log/replication-manager.log"
When log-file is set, three specialised logs are created automatically beside it by appending a suffix to the base name:
| File | Suffix | Content |
|---|---|---|
| Main log | (base) | General monitoring, HA operations, API |
| Security log | -security |
Authentication, authorisation, credential access |
| Workload log | -workload |
Performance spikes, slow query events |
| Maintenance log | -maintenance |
Backups, SST, tasks, purge, orchestrator provisioning |
Example: if log-file = /var/log/replication-manager.log then:
/var/log/replication-manager.log
/var/log/replication-manager-security.log
/var/log/replication-manager-workload.log
/var/log/replication-manager-maintenance.log
Two SQL logs are written per cluster under the monitoring data directory:
{monitoring-datadir}/{cluster-name}/sql-error.log — SQL errors returned by the database
{monitoring-datadir}/{cluster-name}/sql-general.log — All statements executed by replication-manager
The default monitoring-datadir is /var/lib/replication-manager.
If replication-manager panics, a JSON-formatted stacktrace is written to:
{monitoring-datadir}/panic.log
When the embedded Graphite backend is enabled:
{monitoring-datadir}/carbonapi.log
Log verbosity is controlled with integers. A lower number means fewer, more severe messages; a higher number means more verbose output.
| Level | Value | Meaning |
|---|---|---|
| ERROR | 1 | Errors only |
| WARN | 2 | Errors and warnings |
| INFO | 3 | Normal operational output (default) |
| DEBUG | 4 | Full diagnostic output |
The global level applies to all output unless overridden per module:
log-level = 3
Every subsystem has its own level key. Set any module to 4 (DEBUG) when diagnosing a specific area without flooding the rest of the log.
# Core operations
log-level-topology = 2 # Topology discovery and changes
log-level-heartbeat = 1 # Heartbeat monitoring
log-level-writer-election = 1 # Primary election process
# Config and Git
log-level-config-load = 2 # Config file merging and reload
log-level-git = 2 # Git config backup operations
# Backups
log-level-backup-stream = 4 # Streaming backup verbosity
log-level-restic = 3 # Restic backup operations
log-level-sst = 1 # State Transfer (SST / IST)
# Proxies
# (set in the cluster section — no global key, use log-level)
# Orchestrator and provisioning
log-level-orchestrator = 2 # Provisioning orchestrator calls
# Tasks and maintenance
log-level-task = 3 # Scheduled task execution
# Database log forwarding
log-level-database-errors = 1 # MariaDB/MySQL error log lines
log-level-database-sql-errors = 1 # SQL errors from general_log
log-level-database-slowquery = 2 # Slow query log lines
log-level-database-audit = 3 # Audit plugin events
log-level-database-optimize = 4 # Optimizer trace
# Alerting
log-level-mailer = 3 # Email alert delivery
log-level-support = 2 # Support bundle creation
# Other
log-level-external-script = 3 # External script execution
log-level-stats = 1 # Internal statistics collection
log-level-vault = 1 # HashiCorp Vault operations
log-level-graphite = 2 # Graphite/Carbon metrics
log-level-binlog-purge = 1 # Binary log purge operations
log-level-sql = 2 # SQL statements executed by repman
# Plugin log tailer
log-plugin = false # Enable log tailer plugins
log-level-plugin = 2 # Plugin verbosity
Each log line carries a timestamp, cluster name, module tag, level, and message:
2006/01/02 15:04:05 [mycluster] [heartbeat] INFO - Primary heartbeat received
2006/01/02 15:04:06 [mycluster] [topology] WARN - Replica 10.0.1.11:3306 lag > threshold
2006/01/02 15:04:07 [mycluster] [general] STATE - OPENED ERR00076: Lost primary
Level strings:
| String | Meaning |
|---|---|
ERROR |
Hard error |
WARN |
Warning |
INFO |
Informational |
DEBUG |
Diagnostic detail |
STATE |
Cluster state change (OPENED / RESOLV prefix) |
ALERT |
Critical operational alert |
ALERTOK |
Alert cleared |
START |
Service start / bootstrap message |
TEST |
Regression test output |
BENCH |
Benchmark output |
Module tags appearing in log lines:
| Tag | Area |
|---|---|
general |
General monitoring |
topology |
Topology discovery |
heartbeat |
Heartbeat checks |
election |
Writer election |
sst |
State Transfer |
backup |
Backup streaming |
restic |
Restic operations |
task |
Task / job execution |
maintenance |
Maintenance operations |
proxy |
Generic proxy operations |
proxysql |
ProxySQL |
haproxy |
HAProxy |
maxscale |
MaxScale |
orchestrator |
Provisioning orchestrator |
conf |
Config load and merge |
git |
Git backup |
vault |
HashiCorp Vault |
graphite |
Graphite metrics |
purge |
Binlog purge |
mailer |
Email alerting |
support |
Support bundles |
external-script |
External scripts |
stats |
Internal statistics |
sql |
SQL statements |
errorlog |
Database error log |
slowquery |
Database slow query log |
auditlog |
Database audit log |
sqlerrorlog |
Database SQL errors |
optimize |
Database optimizer trace |
plugin |
Log tailer plugins |
app |
Application-level |
Rotation is handled by lumberjack. All file-based logs (main, security, workload, maintenance, SQL) rotate automatically.
log-rotate-max-size = 5 # Rotate when file reaches this size in MB
log-rotate-max-backup = 7 # Number of rotated files to keep
log-rotate-max-age = 7 # Days before rotated files are deleted
Rotated files are gzip-compressed automatically. A daily forced rotation also runs via the internal scheduler.
To forward all log entries to the local syslog daemon (rsyslog, syslog-ng, etc.):
log-syslog = true
replication-manager connects to localhost:514 over UDP at LOG_INFO severity. The syslog hook is additive — file and syslog output both receive messages.
# Follow the main service journal
journalctl -u replication-manager -f
# Follow and show the last 100 lines
journalctl -u replication-manager -n 100 -f
# Last hour
journalctl -u replication-manager --since "1 hour ago"
# Specific window
journalctl -u replication-manager --since "2024-06-01 00:00" --until "2024-06-01 06:00"
# Since last boot
journalctl -u replication-manager -b
journald maps Logrus levels to syslog priorities automatically when replication-manager logs to stdout:
# Errors only
journalctl -u replication-manager -p err
# Warnings and above
journalctl -u replication-manager -p warning
# Full debug output
journalctl -u replication-manager -p debug
# JSON output — pipe to jq
journalctl -u replication-manager -o json | jq '.MESSAGE'
# Short with microseconds
journalctl -u replication-manager -o short-precise
# Export to file
journalctl -u replication-manager --since today > /tmp/repman-today.log
# Find all failover events
journalctl -u replication-manager -g "failover"
# Find specific cluster
journalctl -u replication-manager -g "\[mycluster\]"
# Find STATE changes
journalctl -u replication-manager -g "STATE"
# Find a specific error code
journalctl -u replication-manager -g "ERR00076"
If log-file is configured:
# Follow main log
tail -f /var/log/replication-manager.log
# Follow only security events
tail -f /var/log/replication-manager-security.log
# Follow maintenance/backup activity
tail -f /var/log/replication-manager-maintenance.log
# Filter a module in real time
tail -f /var/log/replication-manager.log | grep '\[topology\]'
# Filter an error code
tail -f /var/log/replication-manager.log | grep 'ERR0'
# Filter STATE changes only
tail -f /var/log/replication-manager.log | grep 'STATE'
# All ERR codes in the last 7 days of rotated logs
zgrep 'ERR0' /var/log/replication-manager.log* | sort
# All STATE OPENED events
zgrep 'OPENED' /var/log/replication-manager.log*
# Failover events across all rotated files
zgrep -h 'failover' /var/log/replication-manager.log* | sort -k1,2
In-memory log buffers (last ~200 entries per cluster) are available through the API without reading the log files:
| Endpoint | Content |
|---|---|
GET /api/clusters/{name}/log |
General cluster log |
GET /api/clusters/{name}/log-task |
Task and backup log |
GET /api/clusters/{name}/log-security |
Security events |
GET /api/clusters/{name}/log-workload |
Workload spikes |
These endpoints require authentication. Use the admin or dba credential:
curl -sk -u admin:repman \
https://localhost:10005/api/clusters/mycluster/log | jq .
Bump a specific module to DEBUG without restarting — set the level via the API or edit the config and reload:
# Investigate a topology issue
log-level-topology = 4
# Trace a backup problem
log-level-backup-stream = 4
log-level-restic = 4
# Debug SST / IST
log-level-sst = 4
After diagnosing, drop the levels back to their defaults to reduce log volume.
| Goal | Command |
|---|---|
| Follow all logs | journalctl -u replication-manager -f |
| Find switchovers | journalctl -u replication-manager -g "switchover" |
| Find failovers | journalctl -u replication-manager -g "failover" |
| Find all ERR codes | journalctl -u replication-manager -g "ERR0" |
| Find WARN codes | journalctl -u replication-manager -g "WARN0" |
| Find state changes | journalctl -u replication-manager -g "STATE" |
| Topology module only | journalctl -u replication-manager -g "\[topology\]" |
| Security events | tail -f /var/log/replication-manager-security.log |
| Backup / maintenance | tail -f /var/log/replication-manager-maintenance.log |
| Slow SQL | tail -f /var/log/replication-manager-workload.log |
| SQL executed by repman | tail -f {datadir}/{cluster}/sql-general.log |
| Panic stacktrace | cat {datadir}/panic.log \| jq . |