Uptime targets and the downtime they allow
An uptime SLA is a promise about how little downtime you'll have over a period. The arithmetic is simple: allowed downtime = (1 − uptime) × length of the period. The results still surprise people, because the percentages sound airtight and the minutes are brutal. Three nines (99.9%) sounds bulletproof, yet it allows under 44 minutes of downtime a month. Four nines leaves you about 4 minutes a month. Five nines leaves about 26 seconds.
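To make the formula concrete, here is a minimal sketch that applies it to the three targets above. It assumes a "month" is the average calendar month (365.25 / 12 days, about 43,830 minutes); the function name is illustrative, not from any particular tool.

```python
# Allowed downtime = (1 - uptime) x period length.
# Assumption: a "month" is the average Gregorian month, 365.25 / 12 days.
MINUTES_PER_MONTH = 365.25 / 12 * 24 * 60  # ~43,830 minutes


def allowed_downtime_minutes(uptime_pct: float, period_minutes: float) -> float:
    """Return the downtime (in minutes) an uptime target permits over a period."""
    return (1 - uptime_pct / 100) * period_minutes


for target in (99.9, 99.99, 99.999):
    minutes = allowed_downtime_minutes(target, MINUTES_PER_MONTH)
    print(f"{target}%: {minutes:.2f} min/month ({minutes * 60:.0f} s)")
```

Running it gives roughly 43.8, 4.4 and 0.44 minutes a month; the last one is the 26 seconds that five nines allows.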
Error budgets: the number that actually runs your reliability
The gap between your target and 100% is your error budget: the downtime you're allowed to "spend" before you breach. Every outage, maintenance window, and bad deploy draws it down, and a single incident can eat most of a month. At 99.9%, one 30-minute bad deploy spends about two-thirds of the monthly allowance (30 of roughly 44 minutes) in one afternoon.
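One way to picture the budget is as a running balance. The sketch below is a hypothetical tracker (the `ErrorBudget` class and its methods are made up for illustration) that reuses the average-month assumption and records incidents as withdrawals.

```python
from dataclasses import dataclass, field


@dataclass
class ErrorBudget:
    """Hypothetical tracker: allowed downtime is a balance that incidents withdraw from."""

    uptime_pct: float
    period_minutes: float = 365.25 / 12 * 24 * 60  # assumed average month, ~43,830 min
    spent_minutes: float = 0.0
    incidents: list[tuple[str, float]] = field(default_factory=list)

    @property
    def total_minutes(self) -> float:
        # (1 - uptime) x period length
        return (1 - self.uptime_pct / 100) * self.period_minutes

    @property
    def remaining_minutes(self) -> float:
        return self.total_minutes - self.spent_minutes

    def record(self, label: str, downtime_minutes: float) -> None:
        """Withdraw one incident's downtime from the balance."""
        self.incidents.append((label, downtime_minutes))
        self.spent_minutes += downtime_minutes

    def fraction_spent(self) -> float:
        return self.spent_minutes / self.total_minutes

    def breached(self) -> bool:
        return self.remaining_minutes < 0


budget = ErrorBudget(uptime_pct=99.9)
budget.record("bad deploy", 30)
print(f"spent {budget.fraction_spent():.0%}, {budget.remaining_minutes:.1f} min left")
```

For the 30-minute deploy this reports about 68% of the month spent with roughly 13.8 minutes left, so one more incident of that size would breach.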
The biggest, most controllable drain on an error budget isn't hardware — it's change. Deploys, migrations, config edits, and AI agents acting on production. That category is self-inflicted, which means it's the one you can take off the table.
Spend your budget on real surprises, not avoidable mistakes
If change is what burns the budget, gate change. A control plane sits between everything that can modify production — humans, scripts, and AI agents — and production itself: deploys and migrations pause for approval, run with least-privilege access, and are reversible with an audit trail. Your error budget then gets spent on genuine incidents, not a typo in a migration.
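The gating idea can be sketched in a few dozen lines. This is a generic illustration under stated assumptions, not Infraveil's API: `Change`, `run_gated`, and the scope strings are hypothetical; the rule that a requester cannot approve their own change is an added assumption; and the in-memory `audit_log` list stands in for a real, durable audit store.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable, Optional


@dataclass
class Change:
    """Hypothetical production change: what it needs, how to apply it, how to undo it."""

    description: str
    requested_by: str
    scopes: frozenset[str]  # permissions the change requires
    apply: Callable[[], object]
    rollback: Callable[[], object]


audit_log: list[dict] = []  # stand-in for a durable audit store


def log(event: str, change: Change, **extra) -> None:
    audit_log.append({
        "at": datetime.now(timezone.utc).isoformat(),
        "event": event,
        "change": change.description,
        "by": change.requested_by,
        **extra,
    })


def run_gated(change: Change, approver: Optional[str], granted: frozenset[str]) -> bool:
    """Apply a change only if approved and within granted scopes; roll back on failure."""
    if approver is None or approver == change.requested_by:
        log("blocked: needs independent approval", change)
        return False
    if not change.scopes <= granted:  # least privilege: every needed scope must be granted
        log("blocked: scope exceeds grant", change, missing=sorted(change.scopes - granted))
        return False
    log("approved", change, approver=approver)
    try:
        change.apply()
    except Exception as exc:
        change.rollback()  # reversible: undo before reporting failure
        log("rolled back", change, error=repr(exc))
        return False
    log("applied", change)
    return True


# Usage: a toy "migration" that flips a schema version.
state = {"schema": 1}
def migrate(): state["schema"] = 2
def undo(): state["schema"] = 1

ok = run_gated(
    Change("migrate schema", "deploy-bot", frozenset({"db:migrate"}), migrate, undo),
    approver="alice",
    granted=frozenset({"db:migrate"}),
)
```

The checks run in a fixed order (approval, then scope, then apply with rollback), and every outcome, including a block, leaves an audit entry.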
Stop spending nines on self-inflicted outages.
Infraveil is a control plane you run on your own servers. Every production-changing action is gated by your approval, scoped to least privilege, reversible, and written to a tamper-evident audit trail, so most of the deploys and agent mistakes that quietly burn your error budget never ship.
Frequently asked questions
How much downtime does 99.9% uptime allow?
About 43.8 minutes per month, ~10 minutes per week, ~1.4 minutes per day, and ~8h 46m per year. That allowance is your error budget.
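These figures follow from multiplying 0.1% by each period's length. A minimal sketch, assuming a 365.25-day year and a month of one twelfth of that (the `allowed_downtime` helper is illustrative):

```python
from datetime import timedelta

# Assumption: a year is 365.25 days and a month is one twelfth of that.
PERIODS = {
    "day": timedelta(days=1),
    "week": timedelta(weeks=1),
    "month": timedelta(days=365.25 / 12),
    "year": timedelta(days=365.25),
}


def allowed_downtime(uptime_pct: float, period: timedelta) -> timedelta:
    """Allowed downtime for an uptime target over a period: (1 - uptime) x period."""
    return period * (1 - uptime_pct / 100)


for name, period in PERIODS.items():
    allowed = allowed_downtime(99.9, period)
    # Round to whole seconds so the printed h:mm:ss is readable.
    print(f"99.9% per {name}: {timedelta(seconds=round(allowed.total_seconds()))}")
```

The year comes out at 8:45:58, which rounds to the ~8h 46m quoted above.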
What's the difference between 99.9% and 99.99%?
An order of magnitude: 99.9% allows ~44 min/month, 99.99% allows ~4 min/month. Each extra nine cuts your allowed downtime by 10×, and usually costs far more than 10× the engineering to achieve.
What is an error budget?
The amount of downtime your SLA permits before you're in breach. Track it like a balance: outages and risky deploys are withdrawals.
How do I stop deploys from burning my budget?
Gate every production-changing action behind human approval, with least-privilege access and one-click rollback. That removes the largest self-inflicted source of downtime, and it's what Infraveil does.