Config File
sznuper uses a single YAML config file. By default it looks for:
- ~/.config/sznuper/config.yml (as user)
- /etc/sznuper/config.yml (as root)
Override with --config <path>. Environment variables are supported anywhere in the file using ${VAR_NAME} syntax.
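To make the `${VAR_NAME}` substitution concrete, here is a minimal Python sketch of how such expansion could work. It is an illustration only, not sznuper's implementation: the function name `expand_env`, the allowed characters in variable names, and the choice to leave unset variables untouched are all assumptions, since this page does not specify them.

```python
import re

# Matches ${VAR_NAME}. The allowed character set is an assumption.
_VAR_PATTERN = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def expand_env(text, env):
    """Replace each ${VAR_NAME} in text with its value from env.

    Unset variables are left as-is in this sketch; how sznuper treats
    them is not stated in this document.
    """
    def substitute(match):
        name = match.group(1)
        return env.get(name, match.group(0))

    return _VAR_PATTERN.sub(substitute, text)


if __name__ == "__main__":
    import os
    # Expand against the real process environment.
    print(expand_env("telegram://${TELEGRAM_TOKEN}@telegram", os.environ))
```

Passing the environment as a plain mapping keeps the function easy to try with a small dictionary instead of real secrets.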
The config has four top-level sections: options, globals, channels, and alerts.
options
Paths for healthcheck storage, caching, and logs.

    options:
      healthchecks_dir: /etc/sznuper/healthchecks
      cache_dir: /var/cache/sznuper
      logs_dir: /var/log/sznuper

All fields are optional. Defaults depend on whether sznuper runs as root or as a regular user.
| Field | Root default | User default |
|---|---|---|
| healthchecks_dir | /etc/sznuper/healthchecks | ~/.config/sznuper/healthchecks |
| cache_dir | /var/cache/sznuper | ~/.cache/sznuper |
| logs_dir | /var/log/sznuper | ~/.local/state/sznuper/logs |
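The defaults in the table can be read as a simple lookup. The Python sketch below encodes it, assuming a hypothetical helper named `default_paths`; it mirrors the table only and says nothing about how sznuper detects root or resolves the home directory.

```python
from pathlib import PurePosixPath


def default_paths(is_root, home):
    """Return the default directories from the table above.

    `home` is the user's home directory and is only used for the
    non-root case.
    """
    if is_root:
        return {
            "healthchecks_dir": "/etc/sznuper/healthchecks",
            "cache_dir": "/var/cache/sznuper",
            "logs_dir": "/var/log/sznuper",
        }
    base = PurePosixPath(home)
    return {
        "healthchecks_dir": str(base / ".config/sznuper/healthchecks"),
        "cache_dir": str(base / ".cache/sznuper"),
        "logs_dir": str(base / ".local/state/sznuper/logs"),
    }
```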
globals
Arbitrary key-value pairs accessible in notification templates. Useful for shared values like hostname.

    globals:
      hostname: my-server
      environment: production

channels
Notification channels using Shoutrrr URLs. Each channel has a name, a URL, and optional default params.
The channel name is arbitrary - it’s just a label you use to reference it later in alerts. You can have multiple channels of the same type with different names:

    channels:
      telegram-ops:
        url: telegram://${TELEGRAM_TOKEN}@telegram
        params:
          chats: ${OPS_CHAT_ID}
      telegram-alerts:
        url: telegram://${TELEGRAM_TOKEN}@telegram
        params:
          chats: ${ALERTS_CHAT_ID}
      discord:
        url: discord://${DISCORD_TOKEN}@${DISCORD_WEBHOOK_ID}

Then reference them by name in your alerts:

    alerts:
      - name: disk
        ...
        notify:
          - telegram-ops
          - discord

alerts
A list of alerts. Each alert defines what to check, when to check it, and who to notify. Example:

    alerts:
      - name: disk
        healthcheck: https://github.com/sznuper/healthchecks/releases/download/v0.4.0/disk_usage
        sha256: abc123...
        triggers:
          - interval: "5m"
        timeout: "30s"
        args:
          mount: /
          threshold_warn_percent: 80
          threshold_crit_percent: 95
        template: "Disk usage on {{globals.hostname}}: {{event.usage_percent}}%"
        cooldown: "1h"
        notify:
          - telegram

Alert fields
| Field | Required | Description |
|---|---|---|
| name | yes | Unique name for this alert |
| healthcheck | yes | URI of the healthcheck (file://, https://, or builtin://) |
| sha256 | no | SHA-256 hash for remote healthchecks, or false to skip verification |
| triggers | no | List of triggers (see below) |
| timeout | no | Max execution time (e.g. "30s") |
| args | no | Key-value arguments passed as HEALTHCHECK_ARG_* env vars |
| side_effects | no | Shell commands to run after event processing |
| template | yes | Go template for the notification message (see below) |
| cooldown | no | Suppress repeated notifications (e.g. "5m", "1h") |
| notify | yes | List of channels to notify |
| events | no | Per-event-type configuration (see below) |
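When pinning a remote healthcheck with the sha256 field, you need the digest of the downloaded file. The Python sketch below is one way to compute it with the standard library; `sha256_of_file` is a hypothetical helper name, and the assumption that sznuper expects a lowercase hex digest is inferred from the `abc123...` placeholder rather than stated on this page.

```python
import hashlib


def sha256_of_file(path, chunk_size=65536):
    """Return the lowercase hex SHA-256 digest of the file at path.

    The file is read in chunks so large binaries do not need to fit
    in memory.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    import sys
    # Usage: python sha256_of_file.py <downloaded-healthcheck>
    print(sha256_of_file(sys.argv[1]))
```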
Triggers
A list of triggers. Each alert can have multiple triggers and they all run independently. Example:

    triggers:
      - interval: "5m"
      - cron: "0 9 * * 1"
      - cron: "0 18 * * *"

This alert would run every 5 minutes, every Monday at 9am, and every day at 6pm.
Available trigger types:
| Type | Description |
|---|---|
| interval | Run on a fixed interval (e.g. "5m", "30s") |
| cron | Cron expression, 5 or 6 fields (e.g. "0 9 * * *") |
| watch | Run when a file changes (e.g. /var/log/app.log) |
| pipe | Continuous shell command whose stdout is fed to the healthcheck (e.g. "tail -F /var/log/app.log") |
| lifecycle | Special trigger that fires on daemon start, stop, and config reload. Only works with the builtin://lifecycle healthcheck. |
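Duration strings such as "5m" and "30s" appear in interval, timeout, and cooldown. The Python sketch below shows how such values map to seconds. It assumes Go-style duration syntax and supports only the `s`, `m`, and `h` units seen on this page, plus a bare "0"; the exact grammar sznuper accepts is not documented here, and `parse_duration` is a hypothetical name.

```python
import re

_UNITS = {"s": 1, "m": 60, "h": 3600}
_TOKEN = re.compile(r"(\d+)([smh])")


def parse_duration(text):
    """Return the number of seconds for strings like "30s", "5m", "1h30m"."""
    if text == "0":
        return 0
    position = 0
    total = 0
    for match in _TOKEN.finditer(text):
        # Reject gaps between tokens, e.g. "5m x 30s".
        if match.start() != position:
            raise ValueError(f"invalid duration: {text!r}")
        total += int(match.group(1)) * _UNITS[match.group(2)]
        position = match.end()
    # Reject empty input and trailing characters.
    if position == 0 or position != len(text):
        raise ValueError(f"invalid duration: {text!r}")
    return total
```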
Lifecycle event types
The builtin://lifecycle healthcheck emits a single event whose type field indicates what happened:
| Event type | When it fires |
|---|---|
| started | Daemon starts |
| stopped | Daemon stops |
| reload_success | Config reloaded successfully via SIGHUP |
| reload_failure | Config reload failed validation (daemon keeps running with previous config) |
You can use events.override to customize behavior per event type, for example routing reload_failure to a higher-priority channel.
Templates
Templates use Go’s text/template syntax with Sprig functions. Four scopes are available:
| Scope | Description |
|---|---|
| event | Fields from the healthcheck output (e.g. {{event.type}}, {{event.usage_percent}}) |
| globals | Values from the globals config section (e.g. {{globals.hostname}}) |
| alert | Alert metadata (e.g. {{alert.name}}) |
| args | Arguments from the alert’s args field (e.g. {{args.mount}}) |
Example:

    template: |-
      [{{event.type | upper}}] {{globals.hostname}}: Disk {{args.mount}} at {{event.usage_percent}}%
      ({{event.available}} remaining)

Notify targets
A list of channels to notify. In the simplest form, just the channel name:

    notify:
      - telegram
      - discord

You can also override params per notification. The params are merged on top of the channel’s base params - any key you set here wins over the channel default. Params are passed as query parameters in the Shoutrrr URL.

    notify:
      - telegram
      - telegram:
          params:
            chats: ${ANOTHER_CHAT_ID}  # override the default chat
            notification: "false"      # send silently
      - discord:
          params:
            username: sznuper-bot
            avatar_url: https://example.com/avatar.png

This sends to the default telegram chat, a second telegram chat silently, and discord with a custom bot name and avatar.
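The merge rule described above (per-notification params win over channel defaults, and the result becomes query parameters) can be sketched in a few lines of Python. The function names `merge_params` and `with_query` are hypothetical, and the exact URL sznuper hands to Shoutrrr may differ; this only illustrates the precedence.

```python
from urllib.parse import urlencode


def merge_params(channel_params, notify_params):
    """Overlay per-notification params on the channel's base params."""
    merged = dict(channel_params)
    merged.update(notify_params)  # keys set on the notify target win
    return merged


def with_query(url, params):
    """Append params to url as a query string."""
    if not params:
        return url
    separator = "&" if "?" in url else "?"
    return url + separator + urlencode(params)
```

For instance, a channel with `chats: "111"` combined with a notify override of `chats: "222"` and `notification: "false"` yields both keys, with `chats` taking the overridden value.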
Events
Each alert has template, notify, and optionally cooldown that apply to all event types by default. The events section lets you override these per event type.
| Field | Description |
|---|---|
| healthy | List of event types considered healthy. When sznuper sees a healthy event after unhealthy ones, it resets cooldowns. |
| on_unmatched | What to do with event types not listed in override: "notify" (default) or "drop". |
| override | Per-event-type overrides for template, cooldown, and notify. |
For example, say you have a disk usage alert with a default cooldown of "1h" and a simple template. But for critical_usage events you want a more urgent message, a shorter cooldown, and to also notify discord:

    alerts:
      - name: disk
        healthcheck: ...
        triggers:
          - interval: "5m"
        template: |-
          [{{event.type | upper}}] {{globals.hostname}}: Disk at {{event.usage_percent}}%
        cooldown: "1h"
        notify:
          - telegram
        events:
          healthy:
            - ok
          on_unmatched: notify
          override:
            critical_usage:
              template: |-
                CRITICAL: {{globals.hostname}} disk at {{event.usage_percent}}%!
                Only {{event.available}} remaining on {{args.mount}}
              cooldown: "5m"
              notify:
                - telegram
                - discord

Here, ok and high_usage events use the alert-level defaults (1h cooldown, telegram only). But critical_usage gets its own template, a 5m cooldown, and notifies both telegram and discord.
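The resolution rules for a single event can be summarised as: start from the alert-level template, cooldown, and notify, then apply any override for that event type, or drop the event if it is unmatched and on_unmatched is "drop". The Python sketch below models this on a plain dictionary shaped like the YAML. `resolve_event` is a hypothetical name and the code is an interpretation of this page, not sznuper's own logic.

```python
def resolve_event(alert, event_type):
    """Return the effective settings for event_type, or None if dropped."""
    events = alert.get("events", {})
    overrides = events.get("override", {})

    # Event types without an override follow on_unmatched ("notify" by default).
    if event_type not in overrides and events.get("on_unmatched", "notify") == "drop":
        return None

    # Alert-level defaults first, then per-event-type overrides on top.
    settings = {key: alert.get(key) for key in ("template", "cooldown", "notify")}
    settings.update(overrides.get(event_type, {}))
    return settings
```

With the config above, `high_usage` resolves to the 1h cooldown and telegram only, while `critical_usage` resolves to the 5m cooldown and both channels.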
Example config

    options:
      healthchecks_dir: /etc/sznuper/healthchecks
      cache_dir: /var/cache/sznuper
      logs_dir: /var/log/sznuper

    globals:
      hostname: my-server

    channels:
      telegram:
        url: telegram://${TELEGRAM_TOKEN}@telegram
        params:
          chats: ${TELEGRAM_CHAT_ID}

    alerts:
      - name: disk
        healthcheck: https://github.com/sznuper/healthchecks/releases/download/v0.4.0/disk_usage
        sha256: abc123...
        triggers:
          - interval: "10m"
        args:
          mount: /
          threshold_warn_percent: 80
          threshold_crit_percent: 95
        template: |-
          Disk usage on {{globals.hostname}}
          Mount: {{event.mount}}
          Usage: {{event.usage_percent}}%
          Available: {{event.available}}
        cooldown: "1h"
        notify:
          - telegram
        events:
          healthy:
            - ok

      - name: ssh
        healthcheck: https://github.com/sznuper/healthchecks/releases/download/v0.4.0/ssh_journal
        sha256: abc123...
        triggers:
          - pipe: "journalctl -fu sshd --output=json"
        template: |-
          SSH {{event.type}} on {{globals.hostname}}
          User: {{event.user}}
          Host: {{event.host}}
        notify:
          - telegram
        events:
          override:
            login:
              cooldown: "0"
            failure:
              cooldown: "5m"