2 Commits

Author SHA1 Message Date
tommy
10b60761ff traefik: add cert-expiry-check.sh with Prometheus textfile output
Reads acme.json hourly on docker-node01, writes:
  traefik_cert_expiry_days{domain="X"} N
  traefik_cert_check_last_run_seconds EPOCH
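
A minimal sketch of how the two metrics above could be rendered in the Prometheus text exposition format. It is not the contents of cert-expiry-check.sh: the function name `render_textfile`, the HELP strings, and the example hostname are hypothetical, and it assumes the expiry dates have already been extracted from acme.json.

```python
from datetime import datetime, timedelta, timezone


def render_textfile(expiries, now):
    """Render textfile lines from {domain: not_after} (timezone-aware datetimes)."""
    lines = [
        "# HELP traefik_cert_expiry_days Days until the certificate expires.",
        "# TYPE traefik_cert_expiry_days gauge",
    ]
    for domain in sorted(expiries):
        # Whole days remaining, rounded down.
        days = int((expiries[domain] - now).total_seconds() // 86400)
        # Label values must be double-quoted in the exposition format.
        lines.append(f'traefik_cert_expiry_days{{domain="{domain}"}} {days}')
    lines += [
        "# HELP traefik_cert_check_last_run_seconds Unix time of the last check.",
        "# TYPE traefik_cert_check_last_run_seconds gauge",
        f"traefik_cert_check_last_run_seconds {int(now.timestamp())}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    # Hypothetical domain and expiry, for illustration only.
    print(render_textfile({"vault.example.test": now + timedelta(days=35)}, now), end="")
```

Writing the result to a temporary file and renaming it into the textfile directory is a common way to keep node-exporter from scraping a half-written file.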

Two Grafana alert thresholds:
  Warning  < 30d: auto-renewal window opened, ntfy high priority
  Critical < 14d: ACME renewal failed, ntfy urgent
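
The two thresholds can be illustrated as a small classifier. This is a sketch only: the real rules live in Grafana, and the names `severity`, `WARNING_DAYS`, and `CRITICAL_DAYS` are hypothetical.

```python
WARNING_DAYS = 30   # below this: auto-renewal window has opened
CRITICAL_DAYS = 14  # below this: ACME renewal is presumed to have failed


def severity(days_left):
    """Map days until expiry to the alert level described above."""
    if days_left < CRITICAL_DAYS:
        return "critical"
    if days_left < WARNING_DAYS:
        return "warning"
    return "ok"
```

Both comparisons are strict, so a certificate with exactly 14 days left is still a warning and one with exactly 30 days left raises nothing.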

Textfile at /var/lib/node_exporter/textfile/cert_expiry.prom
Scraped by existing node-exporter job on 192.168.99.186:9100
Grafana rules: cfl8jqdlhu680d (warning), afl8jqdoepwqod (critical)
Break-tested: with the threshold raised to 35d, alerts fired correctly for vault, pdf, scrutiny, and gitea.

Cron: 0 * * * * sudo /usr/local/bin/cert-expiry-check.sh
2026-05-06 05:34:31 -05:00
tommy
7fac4fc9c7 traefik: sync dynamic_conf.yml and add drift-check cron
node02 was missing two blocks from node01 (canonical):
- strip-trailing-dot-speedtest middleware (regex redirect for speedtest.goattw.net. URLs)
- speedtest-trailing-dot router (catches trailing-dot Host header variant)
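
For illustration, a regex of the kind such a redirect might use, shown with Python's `re` module. The actual pattern in dynamic_conf.yml is not given in this commit, so the pattern and the helper `strip_trailing_dot` below are assumptions.

```python
import re

# Assumed pattern: match the host followed by a trailing dot, but only when
# the dot is followed by a port, a path, or the end of the URL.
TRAILING_DOT = re.compile(r"^(https?://speedtest\.goattw\.net)\.(?=[:/]|$)")


def strip_trailing_dot(url):
    """Return the URL with the trailing dot after the hostname removed."""
    return TRAILING_DOT.sub(r"\1", url)
```

The lookahead keeps the pattern from touching longer hostnames that merely begin with the same name.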

crowdsecLapiHost intentionally differs: node01 uses Docker service name
(crowdsec:8080, container on same host); node02 points to node01 IP
(192.168.99.186:8081, node02 has no local CrowdSec instance).

Added traefik-drift-check.sh — runs daily at 06:00 on ansible-control,
diffs both configs (excluding known crowdsecLapiHost difference),
posts to ntfy homelab-alerts on unexpected divergence.
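
The comparison step can be sketched as follows. This is not the contents of traefik-drift-check.sh: the function `unexpected_drift` and the line-based filter are hypothetical, and the sketch assumes the known difference is confined to lines that mention crowdsecLapiHost.

```python
import difflib

# Keys whose values are expected to differ between the two nodes.
IGNORED_KEYS = ("crowdsecLapiHost",)


def unexpected_drift(canonical_text, other_text, ignored=IGNORED_KEYS):
    """Return unified-diff lines, skipping lines that mention an ignored key."""
    def keep(line):
        return not any(key in line for key in ignored)

    a = [line for line in canonical_text.splitlines() if keep(line)]
    b = [line for line in other_text.splitlines() if keep(line)]
    return list(difflib.unified_diff(a, b, "node01", "node02", lineterm=""))
```

An empty list means the configs match apart from the known difference; any other result would be the payload to post to ntfy.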

Traefik hot-reloaded on node02 via SIGHUP — no restart required.
2026-05-05 20:17:17 -05:00