replication-manager embeds a cluster scheduler that automates database maintenance tasks: backups, log collection, table optimization, rolling restarts, and more. The scheduler is disabled by default — each task is individually enabled and given its own cron expression.
Maintenance tasks are coordinated through a message queue table that replication-manager creates and manages on every monitored database server. The flow is:

1. The scheduler inserts a task row into `replication_manager_schema.jobs`. All writes to this table use `sql_log_bin=0` so they do not produce binlog events.
2. replication-manager opens a TCP listener on a port from `scheduler-db-servers-sender-ports` to receive the streamed result. It passes this address and port in the job row.
3. The dbjobs script on the database host picks up the pending row, marks it as processing, executes the work, and streams the output back to replication-manager via socat.
4. The script writes the outcome to the `result` field and sets `done=1`; replication-manager confirms completion on the next check.

The table is created as follows (`end` is backquoted because it is a reserved word):

```sql
CREATE TABLE replication_manager_schema.jobs (
  id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
  task VARCHAR(20),
  port INT,
  server VARCHAR(255),
  done TINYINT NOT NULL DEFAULT 0,
  state TINYINT NOT NULL DEFAULT 0,
  result MEDIUMTEXT,
  payload MEDIUMTEXT,
  start DATETIME,
  `end` DATETIME,
  KEY idx1 (task, done),
  KEY idx2 (result(1), task),
  KEY idx3 (task, state),
  UNIQUE (task)
) ENGINE=InnoDB;
```
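As a rough illustration of the protocol, a polling worker on the database host might claim and complete a job like the sketch below. The credentials and the `ok` result are placeholders (the real dbjobs_new script derives connection details from injected environment variables, described later, and is considerably more elaborate):

```bash
#!/bin/bash
# Hedged sketch of the jobs-table polling protocol; not the actual dbjobs_new code.
MYSQL="mysql -h 127.0.0.1 -P 3306 -u repman -psecret -N -B"   # hypothetical credentials

# Pick one pending task (done=0, state=0 = available)
TASK=$($MYSQL -e "SELECT task FROM replication_manager_schema.jobs WHERE done=0 AND state=0 LIMIT 1;")
[ -z "$TASK" ] && exit 0   # nothing to do

# Claim the row (state=1 = processing); sql_log_bin=0 keeps the write out of the binlog
$MYSQL -e "SET sql_log_bin=0; UPDATE replication_manager_schema.jobs SET state=1, start=NOW() WHERE task='$TASK';"

# ... execute the task and stream its output via socat (see below) ...

# Report completion (state=3 = finished recently, per the lifecycle below)
$MYSQL -e "SET sql_log_bin=0; UPDATE replication_manager_schema.jobs SET done=1, state=3, result='ok' WHERE task='$TASK';"
```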
The state column tracks the task lifecycle:
- `0` — available (pending)
- `1` — processing
- `3` — finished recently (post-job check)
- `5` — halted (replication-manager restarted mid-task; aborted for safety)

The scheduler covers the following tasks:

| Task | Config key | Description |
|---|---|---|
| Logical backup | `scheduler-db-servers-logical-backup` | mydumper or mysqldump full backup |
| Physical backup | `scheduler-db-servers-physical-backup` | mariabackup or xtrabackup xbstream |
| Log collection | `scheduler-db-servers-logs` | Fetch error log and slow query log from each server |
| Log table rotate | `scheduler-db-servers-logs-table-rotate` | Rotate `mysql.general_log` / `slow_log` tables |
| Optimize | `scheduler-db-servers-optimize` | OPTIMIZE TABLE on all schemas |
| Analyze | `scheduler-db-servers-analyze` | ANALYZE TABLE on all schemas |
| Rolling restart | `scheduler-rolling-restart` | Restart replicas then primary, one at a time |
| Rolling reprov | `scheduler-rolling-reprov` | Re-provision services on a rolling basis (pro) |
| SSH job runner | `scheduler-jobs-ssh` | Trigger the dbjobs script via SSH (osc) |
Not all jobs run the same way. Some are executed logically by replication-manager itself (via SQL connections or local binaries on the repman host), while others require a remote script (dbjobs_new) running on the database host because they need local filesystem access.
| Task | Execution | How | Why |
|---|---|---|---|
| Logical backup (mysqldump/mydumper) | Logical | repman executes the dump binary locally, connects to DB over TCP | SQL-based dump, no filesystem access needed |
| Physical backup (mariabackup/xtrabackup) | Remote | dbjobs script runs backup tool on DB host, streams via socat | Requires local access to InnoDB data files |
| Physical restore (reseed) | Remote | dbjobs script receives stream, unpacks and prepares backup locally | Requires local access to data directory |
| Optimize | Remote | dbjobs script runs `mysqlcheck -o` on DB host | Batched via local client binary with `--skip-write-binlog` |
| Analyze | Logical | repman sends ANALYZE TABLE SQL via connection | Pure SQL command, no local access needed |
| Log collection (errorlog/slowquery/auditlog) | Remote | dbjobs script reads log files and sends via TCP or API | Log files are local to the DB host |
| Log table rotate | Logical | repman sends TRUNCATE TABLE SQL on master | Pure SQL command |
| Rolling restart | Remote | dbjobs script runs `systemctl restart` on DB host | Requires OS-level daemon control |
| Rolling reprovision | Logical | repman coordinates re-provisioning via orchestrator | Orchestration-level operation |
| Schema monitoring | Logical | repman queries `information_schema.TABLES` | Pure SQL reads |
| Checksum | Logical | repman sends CHECKSUM TABLE SQL via connection | Pure SQL command |
Rule of thumb: if the task needs access to the database host's filesystem (data files, log files) or system services (systemctl), it runs remotely via the dbjobs script. If it only needs a SQL connection, it runs logically from replication-manager.
Each task has a matching `-cron` key for its schedule. The expressions use a six-field cron format with a leading seconds field, e.g.:

```toml
monitoring-scheduler = true
scheduler-db-servers-logical-backup = true
scheduler-db-servers-logical-backup-cron = "0 0 1 * * *" # 01:00 every day
scheduler-db-servers-logs = true
scheduler-db-servers-logs-cron = "0 0 */6 * * *" # every 6 hours
scheduler-db-servers-optimize = true
scheduler-db-servers-optimize-cron = "0 0 3 * * 0" # Sundays at 03:00
```
The script that runs on each database node is a bash script named dbjobs_new. Its source lives in the replication-manager repository under the configurator's compliance init directory at share/dashboard/static/configurator/init/dbjobs_new. This script can be replaced with a custom script using the onpremise-ssh-db-job-script configuration key (see below). How it gets to the database nodes depends on the orchestration mode:
In pro mode with OpenSVC, Kubernetes, or local orchestration, the dbjobs_new script is packaged into a gzip'd config tarball. An init container that shares the same network namespace as the database container downloads this tarball from replication-manager at startup and unpacks it. The script runs inside the container on the same host as the database process, connecting to the database over 127.0.0.1.
In osc mode, or when using the onpremise orchestrator in pro, replication-manager delivers and runs the script over SSH:
- At startup, replication-manager writes the script to `{datadir}/init/init/dbjobs_new`
- On each scheduled run (`scheduler-jobs-ssh-cron`), replication-manager SSHs into the database host, prepends the environment variable block, and pipes the script through the remote shell (a simplified illustration follows the config below)

Enable SSH job execution:
```toml
scheduler-jobs-ssh = true
scheduler-jobs-ssh-cron = "0 * * * * *" # run every minute (seconds field first)
onpremise-ssh = true
onpremise-ssh-port = 22
onpremise-ssh-credential = "deploy" # user; key auth used if no password
onpremise-ssh-private-key = "/home/repman/.ssh/id_rsa"
```
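Conceptually, the SSH delivery amounts to concatenating an export block with the script body and piping it through the remote shell. A simplified illustration (not the actual implementation, which is Go code inside replication-manager; hostnames and values are placeholders):

```bash
# Prepend the environment block, then pipe the script through the remote shell.
# $DATADIR stands for the replication-manager data directory.
{
  echo "export REPLICATION_MANAGER_URL='https://10.0.1.5:10005'"
  echo "export REPLICATION_MANAGER_HOST_NAME='db1.example.com'"
  # ... remaining variables from the environment table below ...
  cat "$DATADIR/init/init/dbjobs_new"
} | ssh -i /home/repman/.ssh/id_rsa -p 22 deploy@db1.example.com 'bash -s'
```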
To use a custom script instead of the built-in one, point onpremise-ssh-db-job-script to your own script. The custom script will receive the same environment variables (see section 6.1.5) and should follow the same job table protocol to mark tasks as processing/done:
```toml
onpremise-ssh-db-job-script = "/opt/custom/dbjobs.sh"
```
When the script runs via SSH, replication-manager prepends a block of export statements before the script body. The script can use these variables without hardcoding any credentials or addresses:
| Variable | Content |
|---|---|
| `REPLICATION_MANAGER_URL` | Full HTTPS URL of the replication-manager API (`https://host:port`) |
| `REPLICATION_MANAGER_URL_HOST` | API hostname / IP (`monitoring-address`) |
| `REPLICATION_MANAGER_URL_PORT` | API port (`api-port`) |
| `REPLICATION_MANAGER_USER` | API admin username |
| `REPLICATION_MANAGER_PASSWORD` | API admin password |
| `REPLICATION_MANAGER_HOST_NAME` | Database server hostname |
| `REPLICATION_MANAGER_HOST_PORT` | Database server port |
| `REPLICATION_MANAGER_HOST_USER` | Database user replication-manager connects with |
| `REPLICATION_MANAGER_HOST_PASSWORD` | Database password (also exported as `MYSQL_ROOT_PASSWORD`) |
| `MYSQL_ROOT_PASSWORD` | Alias for `REPLICATION_MANAGER_HOST_PASSWORD` |
| `REPLICATION_MANAGER_CLUSTER_NAME` | Cluster section name from the config file |
These variables allow the script to call back into the replication-manager API, connect to the local database, and stream results without any static configuration.
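For example, a custom script could use the injected variables to fetch an API token and query the local database. This is a sketch only: it assumes the standard `/api/login` endpoint and the `jq` tool, neither of which is guaranteed in every deployment:

```bash
# Illustrative use of the injected variables inside a custom job script.
# Obtain a JWT from the API (assumes /api/login accepts a JSON credentials body).
TOKEN=$(curl -sk -X POST "$REPLICATION_MANAGER_URL/api/login" \
  -d "{\"username\":\"$REPLICATION_MANAGER_USER\",\"password\":\"$REPLICATION_MANAGER_PASSWORD\"}" \
  | jq -r .token)

# Connect to the co-located database without any hardcoded credentials.
mysql -h 127.0.0.1 -P "$REPLICATION_MANAGER_HOST_PORT" \
  -u "$REPLICATION_MANAGER_HOST_USER" -p"$REPLICATION_MANAGER_HOST_PASSWORD" \
  -e "SELECT @@hostname, @@version;"
```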
Tasks that produce output (backups, log fetches) stream their data back to replication-manager using socat. replication-manager opens a TCP listener on a port from scheduler-db-servers-sender-ports and records that address in the job row. The script reads the address from the row and pipes the output directly:
```bash
# Example: stream a physical backup back to replication-manager
mariabackup --stream=xbstream ... | socat -u stdio TCP:$ADDRESS
```
replication-manager writes the received stream to:
- `{datadir}/{cluster}/{host_port}/`
- `{datadir}/Backups/{cluster}/{host_port}/` (when `backup-restic = true`)

Configure the TCP address replication-manager advertises to the database nodes:

```toml
monitoring-address = "10.0.1.5" # must be reachable from DB hosts
scheduler-db-servers-sender-ports = "4000,4001,4002" # port pool; one per concurrent task
```
If `scheduler-db-servers-sender-ports` is not set, replication-manager picks any available port on the host. In firewalled environments, setting an explicit port pool allows precise firewall rules.

Important: when replication-manager runs as an unprivileged (non-root) user, do not use ports below 1024, as they are reserved and require root privileges to bind. Use ports in the range 1024–65535, avoiding ports already used by replication-manager itself (default: 10001 for HTTP, 10002 for Graphite Carbon, 10005 for the REST API), for example `4000,4001,4002`.
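With an explicit pool, the corresponding firewall rule on the replication-manager host can be narrow. An illustrative iptables rule, assuming the database hosts live in 10.0.2.0/24 (adapt subnet and ports to your environment):

```bash
# Allow database hosts to reach the configured sender-port pool (4000-4002)
# on the replication-manager host.
iptables -A INPUT -p tcp -s 10.0.2.0/24 --dport 4000:4002 -j ACCEPT
```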
In addition to streaming bulk data (backups) over TCP/socat, the dbjobs script reports execution logs back to replication-manager through encrypted REST API calls. This mechanism is used for error logs, slow query logs, optimize output, and all other job status reporting.
The dbjobs script authenticates with replication-manager using the database server password:
1. The script calls `POST /api/clusters/{clusterName}/servers/{host}/{port}/secret-login` with the database server password encrypted using AES-256-CBC
2. A `system` API user is created (or reused) with the grant set `db proxy`
3. The returned JWT token is passed as `Authorization: Bearer` on every subsequent call

The system user receives all `db-*` grants via prefix matching, including:

- `db-jobs` — required for all job dispatch endpoints (task discovery, state reporting, log push, script upgrade)
- `db-backup`, `db-restore`, `db-logs`, `db-maintenance` — for backup and maintenance operations

No manual user configuration is needed — the system service account is auto-provisioned on the first secret-login call.
ACL grant: `db-jobs` controls access to all job dispatch API endpoints. It is granted automatically to the `system` user and can be assigned to custom service accounts via the user management API if needed.
Job execution logs are sent in real time while the job is running:
- The job command (mariabackup, mysqldump, OPTIMIZE TABLE, ...) writes its output to `{log_dir}/{job}.out`
- `process_log_file()` runs in parallel with the job, tailing the output file from a checkpoint
- `POST /api/clusters/{clusterName}/servers/{host}/{port}/write-log/{task}` receives each encrypted batch

All log data is encrypted in transit using AES-256-CBC.
The encrypted payload is wrapped in a JSON envelope: {"data":"<base64_encrypted_content>"}.
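A hedged sketch of one log push, assuming the key derivation from the security comparison table at the end of this section (SHA-256 of the database password); the real script's IV handling and payload framing may differ, and `$TOKEN` is the JWT obtained from secret-login:

```bash
# Encrypt a batch of log lines with AES-256-CBC and push it to write-log.
KEY=$(printf '%s' "$REPLICATION_MANAGER_HOST_PASSWORD" | sha256sum | awk '{print $1}')
IV=$(openssl rand -hex 16)   # IV handling is illustrative only
DATA=$(printf '%s\n' "backup started" "backup finished" \
  | openssl enc -aes-256-cbc -K "$KEY" -iv "$IV" -base64 -A)

curl -sk -X POST -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d "{\"data\":\"$DATA\"}" \
  "$REPLICATION_MANAGER_URL/api/clusters/$REPLICATION_MANAGER_CLUSTER_NAME/servers/$REPLICATION_MANAGER_HOST_NAME/$REPLICATION_MANAGER_HOST_PORT/write-log/mariabackup"
```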
To avoid unnecessary network traffic, the script checks whether a given log level should be sent before transmitting:
- `GET /api/clusters/{clusterName}/jobs-log-level/{task}/{level}` returns `"true"` or `"false"`

Before executing a task, the script asks replication-manager whether the task is actually needed:

- `GET /api/clusters/{clusterName}/servers/{host}/{port}/needs/{taskname}` returns `"true"` if the task should run, `"false"` otherwise

This allows replication-manager to skip tasks that are not relevant (e.g. an optimize that was already completed, or a backup that is not configured for this server).
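In script terms the gate is a short pre-flight check before the task body; a sketch, using `$CLUSTER`, `$HOST`, and `$PORT` as shorthand for the values from the environment table above:

```bash
# Skip the task unless replication-manager says it is needed.
NEEDS=$(curl -sk -H "Authorization: Bearer $TOKEN" \
  "$REPLICATION_MANAGER_URL/api/clusters/$CLUSTER/servers/$HOST/$PORT/needs/optimize")
[ "$NEEDS" = "true" ] || exit 0   # not pending on this server
```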
Summary of the reporting endpoints:

| Endpoint | Method | Purpose |
|---|---|---|
| `/api/clusters/{cluster}/servers/{host}/{port}/secret-login` | POST | Authenticate and obtain JWT token |
| `/api/clusters/{cluster}/servers/{host}/{port}/write-log/{task}` | POST | Push encrypted job execution logs |
| `/api/clusters/{cluster}/servers/{host}/{port}/needs/{taskname}` | GET | Check if a task should run on this server |
| `/api/clusters/{cluster}/jobs-log-level/{task}/{level}` | GET | Check if a log level is enabled for a task |
| `/api/clusters/{cluster}/servers/{host}/{port}/actions/receive-jobs-check` | GET | Open TCP receiver port for script update |
The script includes retry logic for API calls; failed calls are recorded in `{log_dir}/api_calls.log`. The number of log lines sent per call is configurable:

```toml
## Log batch size — number of log lines sent per API call (default: 5)
scheduler-db-servers-log-batch-size = 5
```
replication-manager supports two modes for dispatching maintenance tasks to database hosts. The mode is controlled by scheduler-jobs-mode.
scheduler-jobs-mode = "sql"
The traditional mode. replication-manager inserts task rows into the replication_manager_schema.jobs table on each database server. The dbjobs script polls this table, picks up pending tasks, executes them, and updates the row with the result.
This mode requires:
- the `replication_manager_schema` database and `jobs` table (created automatically)

```toml
scheduler-jobs-mode = "api"
```
In API mode, replication-manager sets cookie files (trigger signals) instead of inserting SQL rows. The dbjobs script discovers pending tasks by calling the replication-manager REST API, executes them, and reports completion back via the API.
This mode:
- does not use the `replication_manager_schema.jobs` table (it is automatically dropped when switching to API mode)
- works even when the database server is down (e.g. a start task to bring the database back up)
- still creates the `replication_manager_schema` database silently for checksum and benchmark features

All job API endpoints require the `db-jobs` ACL grant. The dbjobs script authenticates via secret-login to obtain a JWT token, then passes it as `Authorization: Bearer` on every call.
| Endpoint | Method | When called | Purpose |
|---|---|---|---|
| `/api/clusters/{cluster}/servers/{host}/{port}/secret-login` | POST | Script startup | Authenticate with encrypted DB password, obtain JWT token |
| `/api/clusters/{cluster}/servers/{host}/{port}/needs/{task}` | POST | Before each task | Check if a task is pending (consumes cookie). Returns `true` or `false` |
| `/api/clusters/{cluster}/servers/{host}/{port}/actions/receive-task/{task}` | POST | After `needs` returns `true` | Open a TCP receiver for tasks that stream data (backups, logs). Returns `RECEIVER_PORT=<port>` |
| `/api/clusters/{cluster}/servers/{host}/{port}/actions/job-state/{task}/{state}` | POST | During/after task | Report task state: processing, done, error, waiting |
| `/api/clusters/{cluster}/servers/{host}/{port}/write-log/{task}` | POST | During task | Push encrypted execution log lines |
| `/api/clusters/{cluster}/jobs-log-level/{task}/{level}` | POST | Before sending logs | Check if a log level (INFO/ERROR/WARN/DEBUG) is enabled |
| `/api/clusters/{cluster}/servers/{host}/{port}/actions/receive-jobs-check` | POST | Script version check | Open receiver for script checksum comparison |
| `/api/clusters/{cluster}/servers/{host}/{port}/actions/send-jobs-upgrade` | POST | Script upgrade | Request new script version from replication-manager |
```
1. Scheduler fires
   └── replication-manager sets a cookie file (@cookie_wait*)
       └── WARN state set (e.g. WARN0073 for backup pending)

2. dbjobs script runs (via SSH cron or container entrypoint)
   ├── POST secret-login → obtains JWT token
   ├── POST needs/{task} → API checks cookie, returns true, deletes cookie
   ├── POST actions/receive-task/{task} → opens TCP receiver (for streaming tasks)
   ├── POST actions/job-state/{task}/processing → repman updates in-memory state
   ├── Script executes the task (backup, optimize, etc.)
   ├── POST write-log/{task} → push execution logs
   ├── POST actions/job-state/{task}/done → repman closes WARN state
   └── Next monitoring tick: WARN not re-set → state machine closes it
```
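Put together as shell calls, one streaming task in API mode might look like the sketch below. Endpoint paths come from the table above; `$BASE`, `$CLUSTER`, `$HOST`, `$PORT`, and the `xtrabackup` task name are illustrative shorthands, not verbatim dbjobs code:

```bash
# One API-mode task, end to end, following the flow above.
BASE="$REPLICATION_MANAGER_URL/api/clusters/$CLUSTER/servers/$HOST/$PORT"
AUTH="Authorization: Bearer $TOKEN"

if [ "$(curl -sk -X POST -H "$AUTH" "$BASE/needs/xtrabackup")" = "true" ]; then
  # Ask repman to open a TCP receiver; the response contains RECEIVER_PORT=<port>
  RECEIVER_PORT=$(curl -sk -X POST -H "$AUTH" "$BASE/actions/receive-task/xtrabackup" \
    | sed -n 's/^RECEIVER_PORT=//p')
  curl -sk -X POST -H "$AUTH" "$BASE/actions/job-state/xtrabackup/processing"

  mariabackup --backup --stream=xbstream \
    --user="$REPLICATION_MANAGER_HOST_USER" --password="$REPLICATION_MANAGER_HOST_PASSWORD" \
    | socat -u stdio "TCP:$REPLICATION_MANAGER_URL_HOST:$RECEIVER_PORT"

  curl -sk -X POST -H "$AUTH" "$BASE/actions/job-state/xtrabackup/done"
fi
```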
Some tasks can run either locally (by replication-manager) or remotely (by the dbjobs script). Use scheduler-jobs-exec-remote to control which tasks are dispatched to the dbjobs script.
| Task | Default | Can override |
|---|---|---|
| Physical backup (xtrabackup/mariabackup) | Remote | No — requires filesystem access |
| Reseed / Flashback (physical) | Remote | No — requires filesystem access |
| Log collection (errorlog/slowquery/auditlog) | Remote | No — reads local log files |
| ZFS snapback | Remote | No — requires ZFS commands |
| Optimize | Remote | Yes — can run as SQL from repman |
| Logical backup (mysqldump/mydumper) | Local | Yes — can run on DB host to avoid version mismatch |
| Stop / Start / Restart | Remote | Yes — orchestrator API (OpenSVC/K8S) or SQL SHUTDOWN handles it locally |
```toml
## Force mysqldump and optimize to run remotely via dbjobs
scheduler-jobs-exec-remote = "mysqldump,optimize"
```
When a task is forced remote, the dbjobs script runs the command on the database host using the local client binaries. This avoids client/server version mismatch issues (e.g. mysqldump version must match the server version).
Job results are persisted to serverstate.json alongside other server metadata (variables, table dictionary, etc.). After a replication-manager restart, the maintenance tab shows full job history immediately — no need to wait for the next SQL refresh or API callback.
| Layer | SQL mode | API mode |
|---|---|---|
| Task dispatch | SQL INSERT (requires DB credentials) | Cookie file on repman host (local filesystem) |
| Script authentication | secret-login → JWT token | Same |
| Endpoint protection | SecretLoginCheck (DB password) | JWT validateTokenMiddleware + `db-jobs` ACL grant |
| Log encryption | AES-256-CBC (key = SHA256 of DB password) | Same |
| Data streaming | socat TCP (port from jobs table row) | socat TCP (port from receive-task API) |
The system user created by secret-login receives the "db proxy" grant set, which includes all `db-*` grants (including `db-jobs`) via prefix matching. No additional user configuration is needed — see Authentication and Authorization for details.
Note: the `secret-login` endpoint itself is unauthenticated — it is the login mechanism. It validates the caller by requiring proof of knowledge of the database server password (encrypted with AES-256-CBC). All other job endpoints require a JWT token with the `db-jobs` ACL grant.