March 30, 2026

Why I Built Warden

Name: Warden
Author: Bitmill

by Liel Kaysari

I spend my days building cloud SDKs and .NET tooling. When AI coding agents arrived, I did what every developer does — I went all in. Claude Code, Gemini CLI, multiple agents running in parallel across different repos. It was great until it wasn’t.

The wake-up calls

The first time an agent ran rm -rf on a directory it had misidentified as temp files, I caught it by luck — I happened to be watching the terminal. The second time an agent looped for 47 turns trying to fix a type error it had introduced three turns earlier, I wasn’t watching. I came back to a burnt context window and zero progress.

Then there was the agent that helpfully added // Generated by Claude to every file it touched. And the one that ran git push --force to main because it thought that was the fastest way to resolve a merge conflict.

These aren’t hypothetical failure modes. These are Tuesday.

The gap in the market

The obvious answer is “just write better instructions.” So I did: carefully crafted CLAUDE.md files, detailed rules, explicit prohibitions. The problem is that those instructions live inside the context window. The model can ignore them. It can hallucinate past them. And when context gets compacted (which, in my experience, happens in every long session), the rules are often the first thing to go.

Other tools exist. Bash wrappers that alias dangerous commands. Prompt engineering frameworks. Output formatters. But none of them operate where it matters: at the tool call level, before the action executes.

I needed something that sits between the agent and my codebase. Something that evaluates every tool call — read, write, bash, MCP — and can block, redirect, or rewrite it. Something the model can’t talk its way around.

Why it had to be Rust

This thing runs on every single tool call. If your agent makes 200 tool calls in a session (which is normal), and each one adds even 50ms of overhead, that’s 10 seconds of accumulated latency. You’d feel it.
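The arithmetic is easy to check. Here is a minimal sketch in Rust; the function name is made up for illustration, and the 5 ms and 1 ms rows are hypothetical comparison points, not measurements of Warden.

```rust
use std::time::Duration;

/// Total latency added over a session when every tool call pays the same
/// fixed overhead. (Hypothetical helper, for illustration only.)
fn accumulated_overhead(tool_calls: u32, per_call: Duration) -> Duration {
    per_call * tool_calls
}

fn main() {
    let calls = 200; // tool calls in one session, the figure used above

    // 50 ms is the figure from the text; 5 ms and 1 ms are assumed
    // comparison points, not benchmarks.
    for ms in [50u64, 5, 1] {
        let total = accumulated_overhead(calls, Duration::from_millis(ms));
        println!("{ms} ms/call x {calls} calls = {:.1} s", total.as_secs_f64());
    }
}
```

At 50 ms per call the total is the 10 seconds mentioned above. The overhead scales linearly with both the per-call cost and the number of calls, which is why the per-call budget matters so much.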

Warden evaluates hooks fast enough that the agent never waits. The entire rule engine — hundreds of compiled patterns covering safety, governance, and intelligence — runs as a single static binary with zero runtime dependencies. No Node runtime, no Python interpreter, no Docker container. One binary, three platforms.

I considered Go (fast enough, but the binary size and GC pauses bothered me) and TypeScript (the ecosystem is right, but the performance ceiling is too low for something on the critical path). Rust was the only option that gave me the performance headroom I needed with deterministic behavior.

What makes this different

Warden is not a prompt wrapper. It doesn’t inject text into your system prompt and hope the model follows it. It’s not an API validator that checks responses after the fact.

It’s a runtime layer. When Claude Code calls the Bash tool with grep -r "TODO" src/, Warden intercepts that call before it executes, checks it against compiled rules, and blocks it with a message: “Use rg instead.” The model never gets a chance to ignore it: the hook returns deny, and that’s the end of it.
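To make the idea concrete, here is a toy sketch of a pre-execution check. It is not Warden's actual rule engine: the Decision type, the REDIRECTS table, and evaluate_bash are all hypothetical names, and it matches on the first word of the command rather than on compiled patterns.

```rust
/// Outcome of evaluating one tool call. (Hypothetical type.)
#[derive(Debug, PartialEq)]
enum Decision {
    Allow,
    Deny(String),
}

/// Hypothetical rule table: blocked program -> preferred replacement.
const REDIRECTS: &[(&str, &str)] = &[("grep", "rg"), ("find", "fd")];

/// Decide whether a Bash command may run, looking only at the program name.
/// A real engine would match far richer patterns; this is a simplification.
fn evaluate_bash(command: &str) -> Decision {
    let program = command.split_whitespace().next().unwrap_or("");
    for (blocked, preferred) in REDIRECTS {
        if program == *blocked {
            return Decision::Deny(format!("Use {preferred} instead."));
        }
    }
    Decision::Allow
}

fn main() {
    // The decision is made before anything executes.
    println!("{:?}", evaluate_bash("grep -r \"TODO\" src/"));
    println!("{:?}", evaluate_bash("rg TODO src/"));
}
```

The point of the sketch is the shape of the contract: the check runs outside the model, receives the command as data, and returns a decision the model cannot override.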

The same mechanism handles output compression. When cargo test produces 500 lines of output, Warden compresses it to the lines that matter — failures and the summary — before it enters the context window. Over a full session, that’s the difference between hitting the context wall at turn 30 versus cruising past turn 80.
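A rough sketch of that kind of filtering, assuming the standard cargo test line format ("test name ... FAILED" and a "test result:" summary line). The function name and the two-rule heuristic are illustrative assumptions, not Warden's real compressor.

```rust
/// Keep only failing test lines and the summary line from `cargo test`
/// output. (Hypothetical heuristic, for illustration only.)
fn compress_test_output(output: &str) -> String {
    output
        .lines()
        .filter(|line| {
            let trimmed = line.trim();
            trimmed.ends_with("... FAILED") || trimmed.starts_with("test result:")
        })
        .collect::<Vec<_>>()
        .join("\n")
}

fn main() {
    let raw = "running 3 tests\n\
               test parser::empty ... ok\n\
               test parser::nested ... FAILED\n\
               test parser::unicode ... ok\n\
               \n\
               test result: FAILED. 2 passed; 1 failed; 0 ignored; 0 measured; 0 filtered out\n";

    // Only the failing test and the summary line survive.
    println!("{}", compress_test_output(raw));
}
```

A real compressor would also need to keep the panic messages and backtraces that explain each failure. This version only shows how much of a passing-heavy run can be dropped.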

The result

Sessions stay productive longer. Context windows don’t overflow with test output and build logs. Dangerous commands get caught before they execute, not after you notice the damage. And the agent gets steered toward better tools — rg instead of grep, fd instead of find — without any prompt engineering.

The best part: when Warden is doing its job well, you don’t notice it. Healthy sessions run silently. It only becomes visible when something goes wrong — and by then, it’s already handled.

What’s next

Right now Warden supports Claude Code, Gemini CLI, and Codex CLI. I’m working on broader agent support, editor integrations, and a governance layer that lets teams define and enforce coding policies across their entire AI-assisted workflow.

If you’re using AI coding agents in production — or even just on side projects you care about — take a look. It’s fully local, free to use, and installs in one command: npx @bitmilldev/warden init.

The docs are at bitmill.dev.