docs: Phase 5 — incident log, maintenance runbook, NUT config archive

beast/nut/upssched.conf: fixed earlyshutdown timer (30s→300s) and added
  cancel rule. Archived post-fix from Beast.
beast/nut/upssched-cmd: archived for reference.
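The archived upssched.conf is not shown in this view, so the exact rules are not visible here. As a hedged sketch, assume the fix uses the standard upssched directives in the form `AT ONBATT * START-TIMER earlyshutdown 300` and `AT ONLINE * CANCEL-TIMER earlyshutdown`. Under that assumption, a check like the following could confirm both rules before a UPS test. The `check_upssched_conf` function and the ONBATT/ONLINE notify types are assumptions, not part of this commit.

```shell
#!/bin/sh
# Hypothetical pre-test check for upssched.conf (not part of the commit).
# Assumes the earlyshutdown rules use the standard upssched syntax:
#   AT <notifytype> <upsname> START-TIMER <timername> <seconds>
#   AT <notifytype> <upsname> CANCEL-TIMER <timername> [cmd]
# Usage: check_upssched_conf <path-to-upssched.conf> [min-seconds]
check_upssched_conf() {
    conf=$1
    min=${2:-300}

    # Interval of the first START-TIMER earlyshutdown rule.
    # Lines starting with '#' have $1 == "#", so comments never match.
    interval=$(awk '$1 == "AT" && $4 == "START-TIMER" && $5 == "earlyshutdown" { print $6; exit }' "$conf")
    if [ -z "$interval" ]; then
        echo "FAIL: no START-TIMER earlyshutdown rule" >&2
        return 1
    fi
    if [ "$interval" -lt "$min" ]; then
        echo "FAIL: earlyshutdown timer is ${interval}s (< ${min}s)" >&2
        return 1
    fi

    # A matching CANCEL-TIMER keeps a short outage from reaching FSD.
    if ! awk '$1 == "AT" && $4 == "CANCEL-TIMER" && $5 == "earlyshutdown" { found = 1 } END { exit !found }' "$conf"; then
        echo "FAIL: no CANCEL-TIMER earlyshutdown rule" >&2
        return 1
    fi

    echo "OK: earlyshutdown ${interval}s with cancel rule"
}
```

Running it against the pre-fix rules (a 30s interval) would fail the interval check.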

runbooks/phase5-incident-log.md:
  INC-005: NUT upssched earlyshutdown bug — cause, fix, break-test findings.
  INC-001 through INC-004: PBS USB hub/data/re-diagnosis/sdj end-of-life.
  P5-01 through P5-11: pending work queue including Pi4 node-exporter,
  compute5 PCIe PM, PBS recovery gate, qnetd migration, Authelia metrics,
  cert renewal verification, ansible-control disk, Phase 4A resume.

runbooks/maintenance-runbook.md: PBS recovery steps, UPS test procedure,
  cert renewal schedule (May 11/18), Phase 4A per-node procedure, qnetd
  migration, Authelia metrics setup, recurring checklist.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
tommy
2026-05-06 21:25:49 -05:00
parent 545d3563b8
commit 909fe3dc12
4 changed files with 551 additions and 0 deletions

beast/nut/upssched-cmd

@@ -0,0 +1,20 @@
#!/bin/sh
case $1 in  # $1 is the command name passed in by upssched
    onbatt)
        logger -t upssched-cmd "UPS running on battery"
        ;;
    earlyshutdown)
        logger -t upssched-cmd "UPS on battery too long, early shutdown"
        /usr/sbin/upsmon -c fsd  # ask upsmon to begin a forced shutdown (FSD)
        ;;
    shutdowncritical)
        logger -t upssched-cmd "UPS on battery critical, forced shutdown"
        /usr/sbin/upsmon -c fsd
        ;;
    upsgone)
        logger -t upssched-cmd "UPS has been gone too long, can't reach"
        ;;
    *)
        logger -t upssched-cmd "Unrecognized command: $1"
        ;;
esac
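To exercise the dispatch branches of upssched-cmd without writing to syslog or issuing a real forced shutdown, one option is to run a copy of the script with `logger` and `upsmon` swapped for stubs. This is a minimal sketch under stated assumptions. The `dry_run_upssched_cmd` function and the stub approach are hypothetical and not part of this commit. It also assumes `mktemp -d` and `sed` are available.

```shell
#!/bin/sh
# Hypothetical dry-run harness for upssched-cmd (not part of the commit).
# Runs a copy of the script with logger and upsmon stubbed out, so no
# syslog entries are written and no forced shutdown (FSD) is issued.
# Usage: dry_run_upssched_cmd <path-to-upssched-cmd> <command>
dry_run_upssched_cmd() {
    script=$1
    cmd=$2
    stubdir=$(mktemp -d) || return 1

    # Stub logger: drop "-t <tag>" and print the message to stdout.
    printf '#!/bin/sh\nshift 2\necho "logger: $*"\n' > "$stubdir/logger"
    # Stub upsmon: print its arguments instead of signalling upsmon.
    printf '#!/bin/sh\necho "upsmon: $*"\n' > "$stubdir/upsmon"
    chmod +x "$stubdir/logger" "$stubdir/upsmon"

    # The script calls /usr/sbin/upsmon by absolute path, so rewrite that
    # path to the stub; logger is found through PATH.
    sed "s|/usr/sbin/upsmon|$stubdir/upsmon|" "$script" > "$stubdir/upssched-cmd"

    PATH="$stubdir:$PATH" sh "$stubdir/upssched-cmd" "$cmd"
    status=$?
    rm -rf "$stubdir"
    return $status
}
```

For example, `dry_run_upssched_cmd beast/nut/upssched-cmd earlyshutdown` should print the stubbed logger line followed by `upsmon: -c fsd`. Rewriting the copy with sed leaves the archived script untouched.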