Prometheus Alertmanager and Grafana (especially Grafana!) seem a bit too involved for monitoring my homelab. Prometheus itself is fine: it collects a lot of statistics I don’t care about, but it doesn’t require configuration, so it doesn’t bother me.
Do you know of simpler alternatives?
My goals are relatively simple:
- get a notification when any systemd service fails
- get a notification if there is not much space left on a disk
- get a notification if one of the above can’t be determined (e.g. server down, config error, …)
Seeing graphs with basic system metrics (e.g. CPU/RAM usage) would be nice, but it’s not super-important.
I am a dev, so writing a script that checks for whatever I need is way simpler than learning, writing, and testing YAML configuration. In fact, I was about to write a script to send heartbeats to something like Uptime Kuma or Tianji before I thought of asking you for a nicer solution.
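For reference, the heartbeat script idea could be sketched roughly like this. It is only a minimal sketch under assumptions: the push URL is a placeholder, the 10% threshold and the mount list are arbitrary, and the function names are made up for illustration. The heartbeat is sent only when every check passes, so a failed unit, a full disk, or a check that cannot run at all (server down, script error) all show up as a missed heartbeat on the monitoring side.

```python
import shutil
import subprocess
import sys
import urllib.request

# Assumptions (not from the post): placeholder push URL, 10% free-space
# threshold, and the list of mount points to check.
HEARTBEAT_URL = "https://monitor.example.invalid/push/YOUR_TOKEN"
MIN_FREE_FRACTION = 0.10
MOUNT_POINTS = ["/"]


def parse_failed_units(output):
    """Return unit names from `systemctl list-units --state=failed` output."""
    units = []
    for line in output.splitlines():
        fields = line.split()
        # Some systemd versions prefix failed units with a bullet marker.
        if fields and fields[0] in ("●", "*", "×"):
            fields = fields[1:]
        if fields:
            units.append(fields[0])
    return units


def failed_units():
    """Ask systemd for the currently failed units."""
    result = subprocess.run(
        ["systemctl", "list-units", "--state=failed", "--plain", "--no-legend"],
        capture_output=True, text=True, check=True,
    )
    return parse_failed_units(result.stdout)


def low_disks(mount_points, min_free_fraction, usage=shutil.disk_usage):
    """Return mount points whose free fraction is below the threshold."""
    low = []
    for mount_point in mount_points:
        stats = usage(mount_point)
        if stats.free / stats.total < min_free_fraction:
            low.append(mount_point)
    return low


def main():
    # Any exception (systemctl missing, bad mount point, network error)
    # propagates, so no heartbeat is sent and the monitor sees a missed check-in.
    problems = [f"failed unit: {unit}" for unit in failed_units()]
    problems += [
        f"low disk space: {mount}"
        for mount in low_disks(MOUNT_POINTS, MIN_FREE_FRACTION)
    ]
    if problems:
        print("\n".join(problems), file=sys.stderr)
        return 1
    urllib.request.urlopen(HEARTBEAT_URL, timeout=10).close()
    return 0
```

Calling `main()` from a cron entry or a systemd timer every few minutes would be enough. The notification itself only says that the heartbeat stopped; the details end up on stderr (and in the journal when run from a timer).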
It’s as complex as you make it, is Linux-native, is scriptable, doesn’t use YAML, and is free as in beer, just like SNMP. However, they’ll also get logs on a central server that they can drill into if needed.
I believe that fulfills the requirements of OP’s post.
Sidenote: self-hosting is absolutely overkill in itself, as a theory and as a process. I often read responses to suggestions saying that this or that is overkill, or complicated, or a non-trivial effort.
The self-hosting community is a broad spectrum of users, from those with home labs on an old dying laptop to those with a full rack setup. People have different needs and interests. Some are learning infra and devops for work or to get into a new job. Some are privacy-minded. Some are trying to get the most bang for their buck. Some just want to pay for a cloud-hosted solution. Some just want an automated home. Some run a home business.
Edit: to the point of your valid and helpful SNMP post, most syslog servers will also ingest and report on SNMP traffic. The container I linked does exactly that. If they find they want to automate processes in the future, they can trigger on the syslog stream too. But that complexity is only there if they want it. Otherwise it’s just a stream they can parse to trigger an alert, just like SNMP. So OP could have an extensible solution if they want to expand. Also, Grafana/Prometheus will take in syslog natively with a couple of standard YAML configs if they decide to look at that solution again in the future.
/Rant
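To illustrate the "stream they can parse to trigger an alert" point from the edit above, here is a minimal sketch. The regular expression is an assumption based on the "Failed with result" message that systemd typically logs; the function name is made up, and the pattern would need adjusting to whatever messages the syslog server actually receives.

```python
import re

# Illustrative pattern (an assumption): matches lines such as
# "... systemd[1]: foo.service: Failed with result 'exit-code'."
FAILURE_PATTERN = re.compile(
    r"(?P<unit>\S+\.service): Failed with result '(?P<result>[^']+)'"
)


def alerts_from_lines(lines):
    """Yield (unit, result) for every log line that matches FAILURE_PATTERN."""
    for line in lines:
        match = FAILURE_PATTERN.search(line)
        if match:
            yield match.group("unit"), match.group("result")
```

Feeding it `sys.stdin` while piping the central log into the script would give one `(unit, result)` pair per failure, which can then be handed to whatever notification channel is already in use.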