What's in this guide
- The one mental model that fixes everything
- Five non-negotiable principles
- The safe setup, step by step
- Anti-patterns that get people breached
- The pre-flight checklist
- FAQ
The one mental model that fixes everything
Almost every safe-agent question answers itself once you adopt a single mental model: treat the agent as an untrusted actor. Not because the model is malicious, but because it's non-deterministic. It can be right 999 times and, on the 1,000th, decide that the cleanest way to fix a credential mismatch is to reset the database. You cannot review reasoning that happens in tokens. So you don't try — you put controls around the actions instead.
An untrusted actor doesn't get raw production credentials. It doesn't get to perform an irreversible action without a human saying yes. It doesn't get access to systems it doesn't need. And everything it does is recorded. That's it. That's the whole philosophy — the rest of this guide is just how to implement it.
"It's been fine so far" is not a control. The agents that deleted production databases had also been fine — right up until the run where they weren't. Safety has to be structural, not statistical.
Five non-negotiable principles
1. The agent never decides its own access
Access is granted by a layer the agent doesn't control, scoped to exactly what the current task needs. An agent debugging a slow endpoint needs to read logs and traces — it does not need a credential that can drop tables. Least privilege, enforced outside the agent.
2. The agent never sees raw credentials
If the agent holds your database password or a root API key, you've already lost — a prompt injection, a hallucinated command, or a bad plan now has the keys. The agent should act through a control layer that holds the credentials and exposes only governed actions.
3. State changes require human approval
Reads can flow freely. But anything that changes production — a deploy, a migration, a delete, a scale, a config edit — stops at a gate and waits for an explicit human approval. This is the single highest-value control you can add, and it's the one the headline incidents were missing.
4. Everything is reversible and recorded
Every approved action carries a rollback path before it runs, and emits a signed audit entry after. "Undo" is a button. "What did it do?" has an instant, tamper-evident answer.
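As a minimal sketch of what "carries a rollback path before it runs" can look like, here is a config change that captures the prior state before it mutates anything. The ConfigChange class and the plain-dict config are illustrative assumptions, not part of any particular product.

```python
from typing import Any, Dict


class ConfigChange:
    """Hypothetical reversible action: remembers the old value before writing."""

    def __init__(self, key: str, new_value: Any) -> None:
        self.key = key
        self.new_value = new_value
        self._had_key = False
        self._previous: Any = None

    def apply(self, config: Dict[str, Any]) -> None:
        # Capture the rollback path first, then make the change.
        self._had_key = self.key in config
        self._previous = config.get(self.key)
        config[self.key] = self.new_value

    def rollback(self, config: Dict[str, Any]) -> None:
        # Restore exactly what was there before, including "key did not exist".
        if self._had_key:
            config[self.key] = self._previous
        else:
            config.pop(self.key, None)


config = {"replicas": 2}
change = ConfigChange("replicas", 5)
change.apply(config)
change.rollback(config)
```

Real rollbacks (migrations, deploys) are harder than restoring a dict entry, but the ordering is the point: the undo information exists before the action executes.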
5. Production is a destination, not a default
The agent's normal working environment is not production. Reaching prod is itself an explicit, gated step — so the agent can't drift into it by accident.
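One way to picture this, as a sketch under assumed names: the session starts in a sandbox, and moving to production fails unless a named human has approved it. Env, AgentSession, and the approved_by parameter are hypothetical, and a real gate would verify the approval rather than trust a string.

```python
from enum import Enum
from typing import Optional


class Env(Enum):
    SANDBOX = "sandbox"
    STAGING = "staging"
    PRODUCTION = "production"


class AgentSession:
    def __init__(self) -> None:
        # The default environment is never production.
        self.env = Env.SANDBOX

    def switch(self, target: Env, approved_by: Optional[str] = None) -> None:
        # Placeholder check: a real layer would validate the approval itself.
        if target is Env.PRODUCTION and not approved_by:
            raise PermissionError("production requires an explicit approval")
        self.env = target


session = AgentSession()
session.switch(Env.STAGING)
```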
The safe setup, step by step
Put a control layer between the agent and prod
Instead of handing the agent SSH keys and a database URL, route its production actions through a control plane that holds the credentials and exposes a fixed set of governed operations (deploy, restart, read logs, trace a request, roll back). The agent calls the operation; the layer decides whether and how it runs.
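A rough sketch of that shape, assuming hypothetical names throughout: the layer keeps the secret, the agent can only name one of a fixed set of operations, and anything else is refused. The operation names mirror the list above; the handler and the placeholder secret are illustrative only.

```python
from typing import Callable, Dict

# The fixed set of governed operations the agent is allowed to name.
GOVERNED_OPERATIONS = {"deploy", "restart", "read_logs", "trace_request", "roll_back"}


class OperationNotAllowed(Exception):
    """Raised when a request falls outside the governed set."""


class ControlLayer:
    def __init__(self, secret: str, handlers: Dict[str, Callable[[str, dict], str]]) -> None:
        # The credential stays inside the layer and is never returned to callers.
        self._secret = secret
        self._handlers = dict(handlers)

    def call(self, operation: str, params: dict) -> str:
        handler = self._handlers.get(operation)
        if operation not in GOVERNED_OPERATIONS or handler is None:
            raise OperationNotAllowed(operation)
        return handler(self._secret, params)


def read_logs(_secret: str, params: dict) -> str:
    # Placeholder handler: a real one would query the logging backend.
    return f"last {params.get('lines', 10)} log lines for {params['service']}"


layer = ControlLayer(secret="placeholder-credential", handlers={"read_logs": read_logs})
```

The agent-facing surface is just call(operation, params). There is no method that hands back the secret, and no way to pass arbitrary shell or SQL through.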
Scope permissions to the task
Grant the minimum: read-only for diagnosis, narrow write scopes for specific actions. No standing credential that can delete data or backups. If the agent needs more, that's a deliberate, logged grant — not a default.
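To make "minimum, per task" concrete, here is a sketch of short-lived grants keyed by task type. The scope strings, task kinds, and the 15-minute default lifetime are assumptions chosen for illustration.

```python
import time
from dataclasses import dataclass
from typing import FrozenSet, Optional

# Hypothetical mapping from task kind to the only scopes that task needs.
TASK_SCOPES = {
    "diagnose": frozenset({"logs:read", "traces:read"}),
    "restart_service": frozenset({"logs:read", "service:restart"}),
}


@dataclass(frozen=True)
class Grant:
    task_id: str
    scopes: FrozenSet[str]
    expires_at: float  # Unix timestamp

    def allows(self, scope: str, now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        # A grant is valid only for its listed scopes and only until it expires.
        return now < self.expires_at and scope in self.scopes


def issue_grant(task_id: str, task_kind: str, ttl_seconds: float = 900.0,
                now: Optional[float] = None) -> Grant:
    # Issued by the control layer, never by the agent itself.
    now = time.time() if now is None else now
    return Grant(task_id, TASK_SCOPES[task_kind], now + ttl_seconds)


grant = issue_grant("task-1", "diagnose", now=1000.0)
```

Because no task kind maps to a delete scope, there is no standing path to destroying data; widening access means editing the mapping, which is a deliberate and reviewable change.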
Gate every state change behind human approval
Configure the layer so destructive and production-changing actions pause and require a person to approve. The agent can draft the fix, attach the reasoning and a rollback plan, and queue it — but nothing ships until you click approve.
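The propose, approve, execute flow described above can be sketched as a small queue. This is an illustration under assumed names (Proposal, ApprovalQueue), not a description of how any specific tool implements its gate.

```python
from dataclasses import dataclass
from enum import Enum
from itertools import count
from typing import Callable, Dict, Optional


class Status(Enum):
    PENDING = "pending"
    APPROVED = "approved"
    EXECUTED = "executed"


@dataclass
class Proposal:
    id: int
    action: str
    reasoning: str
    rollback_plan: str
    status: Status = Status.PENDING
    approved_by: Optional[str] = None


class ApprovalQueue:
    def __init__(self) -> None:
        self._ids = count(1)
        self._proposals: Dict[int, Proposal] = {}

    def propose(self, action: str, reasoning: str, rollback_plan: str) -> Proposal:
        # The agent may queue a change, but only with a rollback plan attached.
        if not rollback_plan.strip():
            raise ValueError("a rollback plan is required")
        proposal = Proposal(next(self._ids), action, reasoning, rollback_plan)
        self._proposals[proposal.id] = proposal
        return proposal

    def approve(self, proposal_id: int, approver: str) -> None:
        # Exposed to the human-facing side only, never to the agent.
        proposal = self._proposals[proposal_id]
        if proposal.status is not Status.PENDING:
            raise ValueError("only pending proposals can be approved")
        proposal.status = Status.APPROVED
        proposal.approved_by = approver

    def execute(self, proposal_id: int, run: Callable[[str], str]) -> str:
        proposal = self._proposals[proposal_id]
        if proposal.status is not Status.APPROVED:
            raise PermissionError("proposal has not been approved")
        result = run(proposal.action)
        proposal.status = Status.EXECUTED
        return result
```

The key property is that execute has no code path around the approval check, and an executed proposal cannot be run a second time without a fresh approval.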
Isolate backups out of reach
Keep snapshots off-box, under credentials the operating layer cannot touch. The blast radius of any single action must never include your recovery path.
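A simple way to test this property is to compare what the operating credentials can do against the scopes that protect backups. The sketch below assumes scopes are plain strings that may contain shell-style wildcards; the scope names are hypothetical.

```python
from fnmatch import fnmatchcase
from typing import Iterable, List


def reachable_backup_scopes(operating_scopes: Iterable[str],
                            backup_scopes: Iterable[str]) -> List[str]:
    """Return every backup scope that an operating credential could exercise."""
    patterns = list(operating_scopes)
    return sorted(
        scope for scope in set(backup_scopes)
        if any(fnmatchcase(scope, pattern) for pattern in patterns)
    )


BACKUP_SCOPES = ["backups:read", "backups:delete"]

safe = reachable_backup_scopes(["logs:read", "service:restart"], BACKUP_SCOPES)
unsafe = reachable_backup_scopes(["logs:read", "backups:*"], BACKUP_SCOPES)
```

An empty result means the recovery path sits outside the blast radius; anything else is a finding to fix before the agent goes near production.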
Record and prove everything
Every action — proposed, approved, executed — produces a signed, inspectable record. When something looks wrong, you can see exactly what happened and revert it, instead of interrogating a "panicking" agent.
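One common way to get a signed, tamper-evident record is an HMAC-signed hash chain, where each entry's signature covers the previous entry's signature. The sketch below uses that technique as an assumption about what "signed" could mean here; the field names are illustrative, and the signing key would have to live outside the agent's reach.

```python
import hashlib
import hmac
import json
from typing import Dict, List


def _sign(key: bytes, payload: Dict[str, str]) -> str:
    # Canonical JSON so the same payload always yields the same signature.
    body = json.dumps(payload, sort_keys=True, separators=(",", ":")).encode()
    return hmac.new(key, body, hashlib.sha256).hexdigest()


class AuditLog:
    def __init__(self, key: bytes) -> None:
        self._key = key
        self.entries: List[Dict[str, str]] = []

    def append(self, stage: str, action: str, actor: str) -> Dict[str, str]:
        # Each entry commits to the one before it via "prev".
        prev = self.entries[-1]["signature"] if self.entries else ""
        payload = {"stage": stage, "action": action, "actor": actor, "prev": prev}
        entry = dict(payload, signature=_sign(self._key, payload))
        self.entries.append(entry)
        return entry

    def verify(self) -> bool:
        prev = ""
        for entry in self.entries:
            payload = {k: v for k, v in entry.items() if k != "signature"}
            if payload.get("prev") != prev:
                return False
            if not hmac.compare_digest(entry["signature"], _sign(self._key, payload)):
                return False
            prev = entry["signature"]
        return True


log = AuditLog(key=b"demo-key-held-outside-the-agent")
log.append("proposed", "restart api", "agent")
log.append("approved", "restart api", "alice")
log.append("executed", "restart api", "control-layer")
```

Editing or dropping an entry in the middle breaks verification for everything after it. Detecting removal of the most recent entries needs something extra, such as periodically publishing the latest signature to a separate system.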
Infraveil is this control layer.
It installs on your own servers and becomes the one place your tools — humans and AI agents alike — act on your backend. Agents propose; you approve; the plane executes with scoped permissions and writes a tamper-evident audit trail. You get the speed of agentic ops without handing the model the keys to production.
Anti-patterns that get people breached
- --dangerously-skip-permissions on a production box. It removes the one gate between the model's plan and an irreversible action. The flag is named after exactly what it does.
- Pasting prod DB credentials or root API keys into the agent's context. Now any bad plan or injected instruction has full power.
- Letting the agent run in the same environment it's meant to manage. An agent should never control the box it runs on.
- Auth or paywall checks the agent "added" living only in the client. AI frequently ships access control to the browser; verify it server-side.
- "We'll add guardrails later." Later is after the incident. The gate is cheap before; the breach is expensive after.
The pre-flight checklist
- The agent acts through a control layer, not with raw production credentials.
- Access is least-privilege and granted outside the agent's control.
- Every state-changing action requires explicit human approval.
- Production is an explicitly gated destination, separate from the agent's working environment.
- Backups are off-box and beyond the operating credentials' reach.
- Every action is reversible and produces a signed audit record.
- --dangerously-skip-permissions is never used against production.
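If you want the checklist to be something a pipeline can enforce rather than a page someone skims, it can be encoded as data. This is a small sketch; the keys are made-up identifiers for the items above, and the answers would have to come from real checks rather than self-reporting.

```python
from typing import Dict, List

# Hypothetical keys, one per checklist item above.
PREFLIGHT_CHECKS = {
    "control_layer": "Agent acts through a control layer, not raw credentials",
    "least_privilege": "Access is least-privilege and granted outside the agent",
    "human_approval": "Every state-changing action requires human approval",
    "prod_gated": "Production is an explicitly gated destination",
    "backups_isolated": "Backups are off-box and out of operating reach",
    "reversible_audited": "Every action is reversible and signed in the audit log",
    "no_skip_permissions": "--dangerously-skip-permissions is never used on prod",
}


def failed_checks(answers: Dict[str, bool]) -> List[str]:
    # Anything missing or not explicitly True counts as a failure.
    return [key for key in PREFLIGHT_CHECKS if answers.get(key) is not True]


answers = {key: True for key in PREFLIGHT_CHECKS}
answers["backups_isolated"] = False
blocking = failed_checks(answers)
```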
Frequently asked questions
Is it safe to give Claude Code or Cursor access to production?
It can be — if the agent acts through a control layer instead of holding raw production credentials, every state change is human-approved, and everything is audited. Direct, unsupervised production access is how the worst incidents happened.
What's the safest way for an AI agent to act on production?
Agent proposes → human approves → control layer executes with scoped permissions → signed audit entry recorded. The agent never sees raw secrets and never runs a destructive action without sign-off.
Should I use --dangerously-skip-permissions on a production server?
No. It removes the approval gate that prevents the model's plan from becoming a live, irreversible action. A missing approval gate is the common thread in the most-cited AI-agent disasters.
How is this different from just using a deploy platform?
A deploy platform gets your code onto infrastructure. It doesn't put a human-approval gate, least-privilege governance, and tamper-evident audit between your agents and production. That's the operating layer — a different job.
Give your agents production access without the risk.
Infraveil is one control plane on your own servers — deploy, supervise, secure, recover, and prove what happened, with every change gated by your approval. Let the agent move fast; keep the keys yourself.