homelab-configs/monitoring/node-exporter-compute5-compose.yml
tommy e3ee020d53 monitoring: add postfix queue + cert expiry scripts, Phase 4D alerts
postfix-queue-check.sh:
  - Reads mailq queue depth, writes postfix_queue_size{host=X} textfile metric
  - Deployed on compute3 (systemd node_exporter) and compute5 (Docker)
  - Cron: */5 * * * * as root on each host
  - Prometheus alert: postfix_queue_size > 10 (uid: efl8kjns461a8f)
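
The committed postfix-queue-check.sh is not shown on this page. As a rough illustration of the behaviour described above, here is a minimal sketch, not the actual script. The function names, the output file name postfix_queue.prom, and the assumption that `mailq` ends with either "Mail queue is empty" or a "-- N Kbytes in M Requests." summary line are all hypothetical choices made for the example.

```shell
#!/bin/sh
# Hypothetical sketch of a Postfix queue check for the node_exporter
# textfile collector. Names and paths are assumptions, not the real script.
set -eu

# Read `mailq` output on stdin and print the number of queued messages.
# Assumes the last line is "Mail queue is empty" or
# "-- <n> Kbytes in <m> Request(s)."
parse_queue_size() {
  summary="$(tail -n 1)"
  case "$summary" in
    "Mail queue is empty")
      echo 0
      ;;
    *)
      count="$(printf '%s\n' "$summary" |
        sed -n 's/^-- .* in \([0-9][0-9]*\) Request.*$/\1/p')"
      if [ -z "$count" ]; then
        # Fail loudly rather than reporting a misleading 0.
        echo "unrecognized mailq summary: $summary" >&2
        return 1
      fi
      echo "$count"
      ;;
  esac
}

# Print the metric in Prometheus text exposition format.
render_metric() {
  printf '# HELP postfix_queue_size Messages in the Postfix queue.\n'
  printf '# TYPE postfix_queue_size gauge\n'
  printf 'postfix_queue_size{host="%s"} %s\n' "$1" "$2"
}

# Write the metric atomically: the textfile collector only reads *.prom,
# so the temp file is ignored until it is renamed into place.
write_metric() {
  dir="$1"
  tmp="$(mktemp "$dir/postfix_queue.prom.XXXXXX")"
  render_metric "$2" "$3" > "$tmp"
  chmod 644 "$tmp"   # mktemp creates 0600; the exporter may run as another user
  mv "$tmp" "$dir/postfix_queue.prom"
}
```

Under these assumptions, the cron job would end with a call such as `write_metric /var/lib/node_exporter/textfile "$(hostname -s)" "$(mailq | parse_queue_size)"`. Writing to a temp file and renaming it keeps node_exporter from scraping a half-written file.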

node-exporter-compute5-compose.yml:
  - Added textfile volume mount /var/lib/node_exporter/textfile:/textfile:ro
  - Added --collector.textfile.directory=/textfile flag

cert-expiry-check.sh:
  - Also stored here for monitoring/ grouping

Phase 4D Grafana alert rules (all in Infrastructure Alerts folder):
  cfl8jqdlhu680d  TLS Cert Expiry Warning (30d)        — break-tested ✓
  afl8jqdoepwqod  TLS Cert ACME Renewal Failure (14d)  — no real certs in window
  ffl8k2ry0nu2od  Alertmanager Down                     — break-tested, fired ✓
  efl8kjns461a8f  Postfix Queue Backing Up              — metric confirmed, 5m window
  dfl8k2s0xjklcf  Authelia Restart Loop                 — cadvisor-based proxy metric

Rules stored in grafana.db only — not yaml-provisioned (Phase 5 candidate)
2026-05-06 05:46:07 -05:00

22 lines | 656 B | YAML

# node-exporter - Host metrics exporter for Prometheus
# Host: compute5 (192.168.99.196), Port: 9100
services:
  node-exporter:
    image: prom/node-exporter:latest
    container_name: node-exporter
    restart: unless-stopped
    pid: host
    network_mode: host
    volumes:
      - /proc:/host/proc:ro
      - /sys:/host/sys:ro
      - /:/rootfs:ro
      - /var/lib/node_exporter/textfile:/textfile:ro
    command:
      - '--path.procfs=/host/proc'
      - '--path.sysfs=/host/sys'
      - '--path.rootfs=/rootfs'
      - '--collector.textfile.directory=/textfile'
      - '--collector.filesystem.mount-points-exclude=^/(sys|proc|dev|host|etc)($$|/)'