Agents like Claude Code and Cursor can now ship backend changes on their own. Banning them loses the speed; trusting them blindly loses production. This guide is the middle path: the principles and the concrete, inspectable practices for letting an agent operate production without handing it the keys.
A human who fat-fingers a destructive command usually notices, hesitates, or gets stopped by a teammate. An AI agent executes at machine speed, doesn't hesitate, and will confidently take an action that looks right from the text it was given but is wrong for your system. The failure modes that matter aren't malice — they're a plausible-but-wrong command, a hallucinated file path, an over-broad permission, or a remediation that fixes one thing and breaks another.
So the goal isn't to make the agent perfect. It's to make the system around the agent safe: bound what any single action can do, require a human where the stakes are high, and make everything that happened inspectable after the fact. The rest of this guide is how.
An agent should hold the narrowest set of capabilities that lets it do its job — and nothing it doesn't currently need. Don't hand it your cloud root credentials so it can restart one service. Scope tokens to a single client, agent, or service; prefer per-action grants over standing access.
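As a rough illustration of a per-action grant, the sketch below models a token scoped to one agent, one service, and one action, with an expiry. The `Grant` type, its fields, and the example agent and service names are hypothetical and are not an Infraveil or cloud-provider API.

```python
import time
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Grant:
    """Hypothetical per-action grant: one agent, one service, one action, short-lived."""
    agent: str
    service: str
    action: str
    expires_at: float  # Unix timestamp after which the grant is void

    def allows(self, agent: str, service: str, action: str,
               now: Optional[float] = None) -> bool:
        # Every field must match exactly; there are no wildcards to over-grant with.
        current = time.time() if now is None else now
        return (
            current < self.expires_at
            and agent == self.agent
            and service == self.service
            and action == self.action
        )


# Illustrative values only: a grant to restart one service, nothing else.
grant = Grant(agent="claude-code", service="billing-api",
              action="restart", expires_at=1_000.0)
```

Because the grant names a single action and expires, a leaked or misused token can restart that one service for a limited time, and cannot delete it or touch anything else.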
Reads are cheap and safe; let the agent read freely. Mutations — deploys, migrations, deletes, permission changes — should be requests that enter an approval queue, not actions the agent applies itself. The agent proposes; a human disposes. This single boundary eliminates most catastrophic outcomes.
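The propose/approve boundary can be sketched as a small in-memory queue. This is an illustration under assumptions, not a production design: `ChangeRequest` and `ApprovalQueue` are hypothetical names, and a real queue would persist requests and authenticate the approver.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class ChangeRequest:
    """A mutation the agent wants; it carries the change but does not run it."""
    description: str
    apply: Callable[[], str]
    approved_by: Optional[str] = None


@dataclass
class ApprovalQueue:
    pending: List[ChangeRequest] = field(default_factory=list)

    def propose(self, request: ChangeRequest) -> ChangeRequest:
        # The only thing an agent can do with a mutation: queue it.
        self.pending.append(request)
        return request

    def approve_and_apply(self, request: ChangeRequest, approver: str) -> str:
        # Only this path runs the change, and it requires a named human.
        if not approver:
            raise PermissionError("a human approver is required")
        self.pending.remove(request)
        request.approved_by = approver
        return request.apply()
```

The agent-facing side would expose only `propose`; `approve_and_apply` belongs to whatever interface the human reviewer uses.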
You should never run privileged code you can't read, and an agent should never run code it can't verify. The agent that has authority over your machine should verify its own source (signature and hash) before executing anything, and you should be able to diff what's running against published source. Trust by inspection, not assertion.
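A minimal sketch of the hash half of that check, using only the Python standard library. It assumes the published SHA-256 digest was obtained over a channel you already trust; the signature half is omitted here because it needs a signing library and a published key, and the function names are hypothetical.

```python
import hashlib
import hmac


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of the artifact as lowercase hex."""
    return hashlib.sha256(data).hexdigest()


def verify_before_run(artifact: bytes, published_sha256: str) -> bytes:
    """Return the artifact only if its digest matches the published one."""
    actual = sha256_hex(artifact)
    # compare_digest avoids leaking where the two strings first differ.
    if not hmac.compare_digest(actual, published_sha256.lower()):
        raise RuntimeError("artifact hash mismatch: refusing to execute")
    return artifact
```

The point of returning the bytes from the check is that the caller executes exactly what was verified, rather than re-reading a file that could have changed in between.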
Assume any single action could be wrong, and design so that "wrong" is survivable. Constrain an agent to one host or one service rather than the whole fleet; roll changes out gradually; keep one-click rollback ready. The question to ask of every grant is: "if this goes wrong, what's the largest thing it can take down?"
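One way to make that question mechanical is to expand a grant's host patterns against the fleet and refuse any grant that reaches more than a set number of hosts. The sketch below is an assumption-laden illustration: the function names, the glob-style patterns, and the host names are hypothetical.

```python
from fnmatch import fnmatchcase
from typing import Iterable, Set


def reachable_hosts(patterns: Iterable[str], fleet: Iterable[str]) -> Set[str]:
    """Return every host in the fleet that at least one grant pattern matches."""
    pattern_list = list(patterns)
    return {
        host for host in fleet
        if any(fnmatchcase(host, pattern) for pattern in pattern_list)
    }


def check_blast_radius(patterns: Iterable[str], fleet: Iterable[str],
                       max_hosts: int = 1) -> Set[str]:
    """Reject a grant whose patterns reach more hosts than max_hosts."""
    hosts = reachable_hosts(patterns, fleet)
    if len(hosts) > max_hosts:
        raise ValueError(
            f"grant reaches {len(hosts)} hosts, limit is {max_hosts}"
        )
    return hosts
```

With a fleet of `web-1`, `web-2`, and `db-1`, a grant for `web-1` passes the default limit, while `web-*` or `*` is rejected before the agent ever receives it.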
Every action an agent takes should land in an append-only, hash-chained ledger that you can verify yourself — offline, trusting nothing. After an incident, "what exactly happened, in what order, and did anyone edit the record?" should have a cryptographic answer, not a verbal one.
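A hash-chained ledger can be sketched in a few lines: each entry's hash covers its index, its action, and the previous entry's hash, so changing, removing, or reordering an earlier entry breaks every hash after it. The entry layout and function names below are assumptions for illustration and do not describe Infraveil's actual ledger format.

```python
import hashlib
import json
from typing import Any, Dict, List

GENESIS = "0" * 64  # Placeholder "previous hash" for the first entry


def entry_hash(prev_hash: str, index: int, action: Dict[str, Any]) -> str:
    """Hash the entry's position, content, and link to the previous entry."""
    payload = json.dumps(
        {"prev": prev_hash, "index": index, "action": action},
        sort_keys=True,
    ).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def append_entry(ledger: List[Dict[str, Any]], action: Dict[str, Any]) -> None:
    """Append an action, chaining it to the hash of the last entry."""
    prev = ledger[-1]["hash"] if ledger else GENESIS
    index = len(ledger)
    ledger.append({
        "index": index,
        "prev": prev,
        "action": action,
        "hash": entry_hash(prev, index, action),
    })


def verify_ledger(ledger: List[Dict[str, Any]]) -> bool:
    """Recompute the whole chain offline; any edit, gap, or reorder fails."""
    prev = GENESIS
    for position, entry in enumerate(ledger):
        expected = entry_hash(prev, position, entry["action"])
        if (entry["index"] != position
                or entry["prev"] != prev
                or entry["hash"] != expected):
            return False
        prev = entry["hash"]
    return True
```

One limit of a bare chain: dropping entries from the end still leaves a valid shorter chain, so the latest hash also needs to be recorded somewhere the ledger's writer cannot change.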
Governance that lives in someone's head doesn't survive an incident at 3am. Put it in a policy file in your repo that states what may change production, which agent may do what, and what always needs a human. Then enforce that same policy in CI and at runtime so that local and production behavior can't drift apart. Infraveil's policy DSL is open source; you can enforce it for free in a CI gate.
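To show the shape of a single policy evaluated in both places, here is a minimal sketch that uses a plain Python dictionary as a stand-in policy. It does not use Infraveil's policy DSL, whose syntax is not shown in this guide; the keys, agent name, and action names are all hypothetical.

```python
from typing import Any, Dict

# Hypothetical stand-in policy; a real one would live in a file in the repo.
POLICY: Dict[str, Any] = {
    "always_needs_human": {"deploy", "migrate", "delete"},
    "agents": {
        "claude-code": {"may": {"read_logs", "read_metrics", "deploy"}},
    },
}


def evaluate(policy: Dict[str, Any], agent: str, action: str) -> str:
    """Return 'allow', 'needs_human', or 'deny' for one agent and action."""
    rules = policy["agents"].get(agent)
    if rules is None or action not in rules["may"]:
        # Unknown agents and ungranted actions are denied by default.
        return "deny"
    if action in policy["always_needs_human"]:
        return "needs_human"
    return "allow"
```

Calling the same `evaluate` function from the CI gate and from the runtime approval path is what keeps the two from disagreeing: there is one decision procedure and one policy file.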
Instead of handing an agent raw SSH or cloud credentials, give it a governed surface where it can read state and request changes that route through approval. An MCP server is a natural fit for Claude Code / Cursor: the agent queries runtime state and files deploy or remediation requests, but cannot apply a change on its own.
Before an agent goes near production, know what it could touch and whether the target is even ready. Two quick checks: the AI-agent blast-radius checker ("what can this agent actually destroy?") and the production-readiness checker (paste a Dockerfile/compose/.env and get a graded checklist).
Trust, then verify, and do the verifying yourself. Re-hash your agent's audit ledger to confirm that no entry was edited, deleted, or reordered and that the sequence has no gaps, then verify release signatures against the published key. See how Infraveil makes the customer-side code inspectable.
The three practices above compose into one shape: a signed, supervised agent on your own server; a governance policy enforced in CI; and a governed MCP server where every change needs human approval. We've published a working, forkable skeleton that includes a policy file, CI gate, MCP wiring, and verification scripts.
What can this agent actually destroy?
Grade a Dockerfile/compose/.env before you ship.
Catch leaked keys before an agent commits them.
Translate a cryptic deploy error into a fix.
More at infraveil.com/tools. Related reading: add governance to your existing deploy.
Yes — but not with unrestricted access. Let it propose and request changes; route everything that touches production through a policy and a human approval. The agent gets speed, you keep control.
Least-privilege scoping, allowed/denied actions declared in a governance policy enforced in CI and at runtime, and a blast radius limited to a single host or service rather than the whole fleet.
Run an agent that verifies its own code (signature and hash) before executing, keep a tamper-evident audit ledger you can re-hash yourself offline, and verify release signatures against a published key.