INFRAVEIL
● Open source · MCP server

A seatbelt for your AI agent

Coding agents are great until the one time they run rm -rf in the wrong directory or drop the production database to "fix" a migration. infraveil-guard puts a governed gate in front of the destructive things an agent can do — and stops the catastrophic ones until a human says yes.

On this page
What it does · Wire it into your agent · How approval works · Inspect everything · What it isn't

What it does

Three things, all on your machine, all readable in about 400 lines of plain Python:

1 — Classify

Knows what's dangerous

Every action your agent proposes is scored for blast radius: rm -rf, DROP TABLE, DELETE with no WHERE, terraform destroy, git push --force, kubectl delete namespace, world-writable perms, piping the internet into a shell, and more. Ordinary work sails through.
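To make the idea concrete, here is a minimal sketch of a pattern-based classifier. It is not infraveil-guard's actual rule set: the `RULES` table, the four severity names, and the `classify` function are all assumptions for illustration, and the only severity taken from this page is that a table drop is critical.

```python
import re

# Hypothetical rules: (pattern, severity, label). Illustrative only,
# not the rules that infraveil-guard actually ships.
RULES = [
    (re.compile(r"\brm\s+-[a-z]*r[a-z]*f|\brm\s+-[a-z]*f[a-z]*r", re.I),
     "critical", "recursive forced delete"),
    (re.compile(r"\bdrop\s+(table|database)\b", re.I),
     "critical", "drop table"),
    (re.compile(r"\bdelete\s+from\s+\w+\s*;?\s*$", re.I),
     "high", "DELETE with no WHERE"),
    (re.compile(r"\bterraform\s+destroy\b", re.I),
     "critical", "terraform destroy"),
    (re.compile(r"\bgit\s+push\b.*(--force|-f\b)", re.I),
     "high", "force push"),
    (re.compile(r"\bkubectl\s+delete\s+(namespace|ns)\b", re.I),
     "critical", "namespace delete"),
    (re.compile(r"\bchmod\s+(-R\s+)?0?777\b", re.I),
     "high", "world-writable perms"),
    (re.compile(r"\b(curl|wget)\b[^|]*\|\s*(sudo\s+)?(ba|z)?sh\b", re.I),
     "high", "piping the internet into a shell"),
]

# Assumed severity scale, lowest to highest.
SEVERITY_ORDER = ["low", "medium", "high", "critical"]


def classify(command: str) -> tuple[str, list[str]]:
    """Return the worst matching severity and the labels of every rule hit."""
    worst = "low"
    reasons = []
    for pattern, severity, label in RULES:
        if pattern.search(command):
            reasons.append(label)
            if SEVERITY_ORDER.index(severity) > SEVERITY_ORDER.index(worst):
                worst = severity
    return worst, reasons
```

With these rules, `classify("ls -la")` comes back as low with no reasons, while `classify("rm -rf /tmp/build")` comes back as critical. Regexes like these are easy to read and audit, but they will miss obfuscated commands, which fits the "high-signal classifier" framing rather than a complete one.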

2 — Gate

Stops the catastrophic ones

Anything at or above your threshold (default: high) is blocked until a human approves it out of band. The agent gets back proceed: false and a clear summary of what it wanted to do — and waits.
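The threshold check can be sketched in a few lines. Only the `proceed` field and the default threshold of high come from this page; the `Decision` type, the `gate` function, and the severity scale are hypothetical names chosen for the example.

```python
from dataclasses import dataclass

# Assumed severity scale, lowest to highest.
SEVERITY_ORDER = ["low", "medium", "high", "critical"]


@dataclass
class Decision:
    proceed: bool   # False means the agent must stop and wait
    severity: str
    summary: str    # what the agent wanted to do, in plain words


def gate(command: str, severity: str, threshold: str = "high") -> Decision:
    """Block anything whose severity is at or above the threshold."""
    blocked = SEVERITY_ORDER.index(severity) >= SEVERITY_ORDER.index(threshold)
    if blocked:
        summary = f"BLOCKED ({severity}): {command!r} needs human approval"
    else:
        summary = f"allowed ({severity}): {command!r}"
    return Decision(proceed=not blocked, severity=severity, summary=summary)
```

Under these assumptions, `gate("DROP TABLE users;", "critical").proceed` is `False`, and raising the threshold to critical lets high-severity actions through.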

3 — Record

Leaves tamper-evident proof

Allowed, blocked, approved, denied — every decision is appended to a hash-chained local ledger. Editing, deleting, or reordering any line breaks the chain, and infraveil-guard verify tells you exactly where.
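A hash chain of this kind can be sketched as follows. This is an in-memory illustration under assumed field names (`prev`, `record`, `hash`) and an assumed SHA-256 scheme; the real ledger is a JSONL file whose exact format is not described on this page.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed starting value for the first entry's "prev"


def entry_hash(prev_hash: str, record: dict) -> str:
    """Hash the previous hash together with a canonical JSON form of the record."""
    payload = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append(ledger: list, record: dict) -> None:
    """Add a record, linking it to the hash of the entry before it."""
    prev = ledger[-1]["hash"] if ledger else GENESIS
    ledger.append({"prev": prev, "record": record,
                   "hash": entry_hash(prev, record)})


def verify(ledger: list) -> dict:
    """Walk the chain and report the index of the first entry that does not fit."""
    prev = GENESIS
    for i, entry in enumerate(ledger):
        if entry["prev"] != prev or entry["hash"] != entry_hash(prev, entry["record"]):
            return {"ok": False, "count": len(ledger), "broken_at": i}
        prev = entry["hash"]
    return {"ok": True, "count": len(ledger), "broken_at": None}
```

Because each hash covers the one before it, editing a record, removing a middle entry, or swapping two entries changes what the next link expects, and `verify` reports the first index where that happens. This sketch alone would not notice entries trimmed from the very end of the list; how the real tool handles that case is not covered here.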

Wire it into your agent

Add it as an MCP server in Claude Code, Cursor, or any MCP client:

{
  "mcpServers": {
    "infraveil-guard": {
      "command": "infraveil-guard"
    }
  }
}

Then add one line to your agent's instructions (CLAUDE.md, system prompt, rules):

Before running any shell command, SQL statement, or infrastructure/cloud operation, first call guard_action with the exact command. Only proceed if it returns proceed: true. If it returns blocked, stop and ask me to approve it.

How approval works

When the agent hits something dangerous it stops and hands you an action_id. You approve in your own terminal — the agent can't do this for itself:

$ infraveil-guard approvals
[9b58e9c499b3] CRITICAL drop table. Irreversible.
DROP TABLE users;

$ infraveil-guard approve 9b58e9c499b3
command: DROP TABLE users;
Approve this action? [y/N] y
APPROVED. Give the agent this one-time code: 8f2510
(valid 15 min, works once)

You hand the agent the code; it calls guard_action again with approval_code set; the gate lets it through exactly once and records the approval. The code is minted only by the human CLI, is single-use, and expires — so it can't be forged or replayed.
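The single-use, expiring code can be sketched like this. The 15-minute lifetime and the six-character code shape come from the example above; the `ApprovalStore` class, its method names, and the in-memory storage are assumptions made for illustration, not the tool's real internals.

```python
import secrets
import time

TTL_SECONDS = 15 * 60  # "valid 15 min" from the CLI output above


class ApprovalStore:
    """Hypothetical store that mints and redeems one-time approval codes."""

    def __init__(self, now=time.time):
        self._now = now    # injectable clock, handy for testing expiry
        self._codes = {}   # code -> (action_id, expires_at)

    def mint(self, action_id: str) -> str:
        """Called from the human-facing CLI after the operator answers 'y'."""
        code = secrets.token_hex(3)  # six hex characters, e.g. "8f2510"
        self._codes[code] = (action_id, self._now() + TTL_SECONDS)
        return code

    def redeem(self, action_id: str, code: str) -> bool:
        """Return True once for a valid, unexpired code bound to this action."""
        # pop() removes the code on any attempt, so it can never be replayed.
        entry = self._codes.pop(code, None)
        if entry is None:
            return False
        bound_action, expires_at = entry
        return bound_action == action_id and self._now() <= expires_at
```

Binding the code to a specific `action_id` means an approval for one command cannot be spent on a different one, and removing the code on the first attempt keeps the failure mode closed: a wrong guess or a late redemption burns the code instead of leaving it usable.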

Inspect everything — trust nothing

It talks to nobody. No account, no network, no telemetry — open your tools panel and watch. Your log lives at ~/.infraveil-guard/ledger.jsonl, and you can re-verify its integrity any time:

$ infraveil-guard verify
{ "ok": true, "count": 42, "message": "Hash chain verified across 42 entries - no tampering." }

It's AGPL and the whole thing is short enough to read in a sitting. That's the point — a safety tool you can't audit isn't a safety tool.

What it is — and isn't

It is a high-signal classifier, an out-of-band human-approval gate, and a tamper-evident local log: the smallest honest version of "a human approves before anything irreversible happens."

It is not a sandbox. It works because your agent is told to route actions through it: a cooperative guardrail, not an unbypassable jail. If you need a gate the agent can't skip because it runs inside the governed runtime, along with least-privilege scoping, central audit, and one-click rollback across a whole fleet, that's the full Infraveil control plane. This is the doorway; that's the house.
