Healthcheck endpoints, done right.
Most healthchecks are subtly wrong — they check the database in liveness and turn a brief blip into a restart storm. Pick your framework and get correct /healthz (is the process alive) and /readyz (are dependencies reachable) endpoints.
The subtle bug in most healthchecks
It looks harmless: a single /health endpoint that pings the database and returns 200. The problem shows up the day the database has a two-second hiccup. Every instance fails its check at the same moment, the orchestrator concludes they're all dead and restarts them, and a blip you'd never have noticed becomes a full restart storm. The fix is the liveness/readiness split: liveness fails only when a restart would actually help, while readiness pulls an instance out of rotation without killing it.
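As a rough illustration of that split, here is a minimal sketch using only Python's standard library. The database address, the port, the TCP-connect probe, and the names dependency_reachable, health_status, and serve are assumptions made for the example; a real service would more likely run a cheap query through its existing connection pool.

```python
import socket
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical dependency address; replace with your own database host and port.
DB_ADDR = ("127.0.0.1", 5432)


def dependency_reachable(addr=DB_ADDR, timeout=0.5):
    """Return True if a TCP connection to the dependency succeeds within the timeout."""
    try:
        with socket.create_connection(addr, timeout=timeout):
            return True
    except OSError:
        return False


def health_status(path, check=dependency_reachable):
    """Map a request path to a (status_code, body) pair."""
    if path == "/healthz":
        # Liveness: the process is up and able to answer. No dependency checks here,
        # so a database blip can never trigger a restart.
        return 200, "ok"
    if path == "/readyz":
        # Readiness: 503 asks to be taken out of rotation, not to be killed.
        return (200, "ready") if check() else (503, "not ready")
    return 404, "not found"


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        code, body = health_status(self.path)
        payload = body.encode()
        self.send_response(code)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def serve(port=8080):
    """Start the health server; call this from your own entry point."""
    HTTPServer(("0.0.0.0", port), HealthHandler).serve_forever()
```

With the database down, /healthz still returns 200 and only /readyz turns to 503, so the orchestrator stops routing traffic to the instance instead of restarting it.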
Getting the endpoints right is half of it. The other half is what acts on them — the supervisor that restarts on a failed liveness check, the thing that notices a wedged-but-not-crashed process, and the record of when and why it happened. That layer, watching your services across the hosts you own, is what a control plane provides.
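To make "what acts on them" concrete, here is a small sketch of the decision a supervisor might take after each probe round. It is not a description of any particular product: the three-failure threshold, the action names, and the ProbeState and decide identifiers are all assumptions chosen for illustration.

```python
from dataclasses import dataclass

# Assumed threshold: restart only after this many consecutive liveness failures.
LIVENESS_FAILURE_THRESHOLD = 3


@dataclass
class ProbeState:
    """Per-instance counter carried between probe rounds."""
    consecutive_liveness_failures: int = 0


def decide(state, live_ok, ready_ok, threshold=LIVENESS_FAILURE_THRESHOLD):
    """Return 'restart', 'drain', or 'serve' for one probe round."""
    if live_ok:
        state.consecutive_liveness_failures = 0
    else:
        state.consecutive_liveness_failures += 1
        if state.consecutive_liveness_failures >= threshold:
            # Repeated liveness failures: a restart is likely to help.
            state.consecutive_liveness_failures = 0
            return "restart"
    if not live_ok or not ready_ok:
        # Not ready, or not yet confirmed dead: stop routing traffic, keep the process.
        return "drain"
    return "serve"
```

Requiring several consecutive liveness failures keeps one slow response from causing a restart, while a readiness failure alone never restarts anything.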
Endpoints report. Something has to act.
Infraveil watches your healthchecks across every host you own, restarts what's actually dead, routes around what's not ready, and keeps a tamper-evident record of every recovery — the layer that turns a healthcheck into uptime.
See how it works
Get the keep-it-running playbook
Healthchecks, supervision, and recovery for a backend you run yourself. No spam.