Warden by Bitmill
Documentation

Design Philosophy

Warden exists because of a single observation: intelligence is only valuable if it improves the next decision. A system that detects drift but cannot course-correct is just expensive logging. A system that learns patterns but never applies them is academic. Every module, every hook, every line of Rust in Warden traces back to this premise.

This page documents the principles that guide Warden’s design. They are not aspirational — they are enforced in code, tested in CI, and violated only with explicit justification in a commit message.


Principle 1: Invisible First

The highest compliment a developer can pay Warden is forgetting it exists.

Zero-friction operation means:

  • No manual setup per project. Warden detects the assistant, reads its configuration, and starts working. There is no .warden.yml to create, no init wizard to run, no onboarding flow to complete.
  • No visible output during normal operation. Hooks that approve a tool call produce no terminal output. Only denials, warnings, and injections are surfaced — and even those are delivered through the assistant’s own UI, not a separate channel.
  • No performance tax. The p99 latency budget for a hook invocation is 50ms. If Warden cannot decide in 50ms, it approves and logs the miss. The developer never waits.
  • No cognitive load. There are no commands to memorize for daily use. warden status exists for debugging; warden daemon start exists for CI. Neither is required for normal development.

If a feature requires the developer to change their workflow, it must justify itself against the alternative of not existing.
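The "no performance tax" rule can be sketched as a budget check around the analysis path. This is an illustrative shape, not Warden's actual API: the `Verdict` names and the post-hoc elapsed-time check are assumptions (a real implementation would need to preempt slow analysis, not merely notice it afterward).

```rust
use std::time::{Duration, Instant};

// Illustrative verdict type; names are assumptions, not Warden's API.
#[derive(Debug, PartialEq)]
enum Verdict {
    Approve,
    Deny,
}

// Run an analysis under a latency budget. If the budget was exceeded,
// approve and log the miss so the developer never waits on Warden.
fn with_budget<F: FnOnce() -> Verdict>(budget: Duration, analyze: F) -> Verdict {
    let start = Instant::now();
    let verdict = analyze();
    if start.elapsed() > budget {
        eprintln!("budget miss: {:?} > {:?}, approving", start.elapsed(), budget);
        return Verdict::Approve;
    }
    verdict
}

fn main() {
    // A fast analysis keeps its verdict; a slow one is overridden to Approve.
    let v = with_budget(Duration::from_millis(50), || Verdict::Deny);
    assert_eq!(v, Verdict::Deny);
}
```

The key design point is the direction of the override: a missed deadline never escalates to a denial, only to an approval plus a log entry.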

Principle 2: Bounded Intelligence

Every module in Warden has an explicit cost ceiling. This is not a suggestion — it is enforced by the Budget struct that flows through every engine.

The boundaries take several forms:

  • Time bounds. Sentinel pattern matching uses a compiled RegexSet with DFA matching. It does not backtrack. The entire pattern library evaluates in a single pass. Loopbreaker limits n-gram history to 50 entries and entropy windows to 20 turns.
  • Space bounds. Session state is capped at 64KB serialized. Cross-session artifacts are capped at 32KB per project. The Dream engine’s worker thread checks these limits before every write.
  • Complexity bounds. Trust scores use a fixed formula with 6 input variables. There is no machine learning model, no neural network, no gradient descent. The formula is legible, debuggable, and deterministic given the same inputs.
  • Token bounds. Context injections are measured in tokens (estimated via byte-length heuristics) and capped per trust tier. A high-trust session gets at most 1 injection; a low-trust session gets up to 15. This is counterintuitive until you realize that low-trust sessions need more guardrails, not fewer.

When a module approaches its ceiling, it degrades gracefully rather than failing hard. The Compass phase detector, for instance, falls back to the Exploring phase if it cannot confidently classify the current phase — and Exploring has the most conservative injection budget.
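A minimal sketch of the `Budget` idea, using the ceilings documented above. The field names, constructors, and method are assumptions for illustration; only the numeric limits (50ms, 64KB, 1 vs. 15 injections) come from the text.

```rust
// Sketch of an explicit cost ceiling. Field names and constructors are
// assumptions; the limits themselves are the documented ones.
#[derive(Debug, Clone, Copy)]
struct Budget {
    max_hook_ms: u64,       // time bound per hook invocation
    max_state_bytes: usize, // serialized session state ceiling
    max_injections: u32,    // context injections allowed for this trust tier
}

impl Budget {
    // Low trust gets more guardrails, not fewer.
    fn for_low_trust() -> Self {
        Budget { max_hook_ms: 50, max_state_bytes: 64 * 1024, max_injections: 15 }
    }

    fn for_high_trust() -> Self {
        Budget { max_hook_ms: 50, max_state_bytes: 64 * 1024, max_injections: 1 }
    }

    // Degrade gracefully: refuse the write, never panic.
    fn allows_state_write(&self, serialized_len: usize) -> bool {
        serialized_len <= self.max_state_bytes
    }
}

fn main() {
    let b = Budget::for_low_trust();
    assert!(b.allows_state_write(1_024));
    assert!(!b.allows_state_write(128 * 1024));
}
```

Because the struct is `Copy` and plain data, every engine can carry its ceiling by value, and a test can assert the limits directly.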

Principle 3: Fail-Open

Warden must never be the reason a developer cannot complete their work.

This manifests as:

  • Hook errors return approval. If Warden panics, if the daemon is unreachable, if the binary is missing — the hook script returns exit code 0 (approve). The assistant continues unimpeded.
  • Parse failures skip the module. If a tool call’s JSON is malformed, Warden logs the error and skips analysis rather than denying the call.
  • Version mismatches degrade. If the daemon is running v58 and the hook binary is v57, the system detects the mismatch, logs a warning, and falls back to stateless mode rather than refusing to operate.
  • Missing state initializes defaults. If the session file is corrupted or absent, Warden creates a fresh session with conservative defaults rather than erroring out.

The only exception to fail-open is the Sentinel safety layer. A pattern that matches a known-dangerous command (like rm -rf /) will deny even if surrounding context is ambiguous. Safety overrides liveness.
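The fail-open shape can be captured in a few lines. The exit-code convention below (0 = approve, 2 = deny) is an assumption for the sketch, not necessarily Warden's actual hook contract; what matters is that both failure paths collapse to approval.

```rust
use std::panic;

// Illustrative convention: 0 = approve, 2 = deny. A panic (outer Err)
// and an analysis error (inner Err) both collapse to approval, so the
// assistant continues unimpeded.
fn fail_open(outcome: std::thread::Result<Result<i32, String>>) -> i32 {
    outcome.unwrap_or(Ok(0)).unwrap_or(0)
}

fn main() {
    // Simulate a hook whose analysis dies because the daemon is gone.
    let outcome = panic::catch_unwind(|| -> Result<i32, String> {
        panic!("daemon unreachable");
    });
    assert_eq!(fail_open(outcome), 0);
}
```

Note that the Sentinel exception described above lives inside the analysis, not in this wrapper: a safety deny is a successful result (`Ok(2)` in this sketch), not an error, so fail-open never converts it to an approval.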

Principle 4: Deterministic Over Advisory

Hooks are more reliable than prompts. This is the core architectural bet.

An advisory system might inject a message saying “Please avoid running destructive commands.” The LLM might ignore it, rephrase it, or hallucinate a justification for why this particular destructive command is fine. There is no enforcement mechanism.

A hook-based system intercepts the tool call before execution and returns a verdict: Approve, Deny, or Modify. The LLM never sees the dangerous command succeed. There is no ambiguity, no negotiation, no prompt injection that can bypass the check.

This is why Warden is built as a hook system rather than a system-prompt generator:

  • Hooks are synchronous gates. The tool call cannot proceed until the hook returns.
  • Hooks see the actual payload. Not a summary, not a description — the exact command or file path.
  • Hooks compose. Multiple hooks can analyze the same call independently. A deny from any hook is final.
  • Hooks are testable. Given input X, hook produces output Y. No temperature, no sampling, no stochastic variation.

Context injections (system prompt additions) are Warden’s secondary mechanism, used for guidance that cannot be expressed as a binary gate — phase awareness, focus tracking, convention reminders. But they are always subordinate to hook verdicts.
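The composition rule above can be sketched directly. The deny-is-final behavior is as documented; letting a later `Modify` supersede `Approve` is an assumption made for this sketch.

```rust
// Illustrative verdict type; the Modify payload is a stand-in for a
// rewritten command.
#[derive(Debug, PartialEq)]
enum Verdict {
    Approve,
    Deny,
    Modify(String),
}

// Compose independent hook verdicts: a Deny from any hook is final.
fn compose(verdicts: Vec<Verdict>) -> Verdict {
    let mut result = Verdict::Approve;
    for v in verdicts {
        match v {
            Verdict::Deny => return Verdict::Deny,
            Verdict::Modify(cmd) => result = Verdict::Modify(cmd),
            Verdict::Approve => {}
        }
    }
    result
}

fn main() {
    let verdicts = vec![Verdict::Approve, Verdict::Deny, Verdict::Approve];
    assert_eq!(compose(verdicts), Verdict::Deny);
}
```

Because `compose` is a pure function over its inputs, the testability claim holds trivially: given the same verdicts, it returns the same result every time.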

Principle 5: Runtime Value Only

Warden does not generate code. It does not write tests. It does not refactor. It does not create files.

Warden’s entire value proposition is making the AI assistant better at doing those things. The distinction matters because it defines the scope of what Warden can break. If Warden has a bug in its Compass phase detector, the worst outcome is a slightly mistuned injection budget. The assistant still writes code, still runs tests, still creates files — just with slightly less contextual guidance.

This principle also means Warden has no opinion about what the developer is building. It does not know or care whether you are writing a React app or a Kubernetes operator. Its analysis operates at the tool-call level: is this bash command safe? Is this file write consistent with the session’s focus? Is the assistant looping on the same error?

Principle 6: Small Outputs

Every byte Warden injects into the assistant’s context window has a cost. That cost is measured in tokens displaced — tokens that could have been source code, documentation, or user instructions.

Warden’s output budget is therefore ruthlessly constrained:

  • Deny messages are one sentence. “Blocked: rm -rf matches safety pattern dangerous_recursive_delete.”
  • Context injections are 2-5 lines. A phase indicator, a focus reminder, a convention note. Never a paragraph.
  • Session summaries are capped at 500 tokens. They capture phase, focus, trust, active files, and recent errors — nothing more.
  • Dream artifacts are structured data, not prose. A RepairPattern is a JSON object with error_signature, fix_sequence, and confidence, not a natural-language explanation.

If a module’s output exceeds its budget, it is truncated, not wrapped. Truncation is a feature — it forces authors to front-load the most important information.
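A sketch of truncation under a token budget, using the byte-length heuristic mentioned under Principle 2. The ~4 bytes-per-token ratio is a common rule of thumb assumed here for illustration, not Warden's documented constant.

```rust
// Assumed bytes-per-token ratio for the byte-length heuristic.
const BYTES_PER_TOKEN: usize = 4;

// Estimate tokens from byte length (rounding up).
fn estimate_tokens(s: &str) -> usize {
    s.len().div_ceil(BYTES_PER_TOKEN)
}

// Truncate, never wrap: cut at the budget, backing up to a valid UTF-8
// char boundary so the slice never panics on multi-byte characters.
fn truncate_to_budget(s: &str, max_tokens: usize) -> &str {
    let max_bytes = max_tokens * BYTES_PER_TOKEN;
    if s.len() <= max_bytes {
        return s;
    }
    let mut end = max_bytes;
    while !s.is_char_boundary(end) {
        end -= 1;
    }
    &s[..end]
}

fn main() {
    let msg = "Blocked: rm -rf matches safety pattern dangerous_recursive_delete.";
    assert!(estimate_tokens(truncate_to_budget(msg, 10)) <= 10);
}
```

The char-boundary loop is the subtle part: a naive byte slice of UTF-8 text can panic mid-character, so truncation backs up to the nearest boundary rather than failing hard.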

The 4 Runtime Questions

Every hook invocation in Warden answers at most four questions:

1. Should this be blocked?

The Reflex engine answers this in under 50ms. Pattern matching against known-dangerous commands, loop detection against repetitive behavior, injection detection against prompt manipulation attempts. Binary output: yes or no.
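A dependency-free stand-in for this check: Warden's real Sentinel evaluates a compiled RegexSet in a single pass, so the plain substring patterns below are a simplification to keep the sketch crate-free, and the pattern list itself is illustrative.

```rust
// Illustrative dangerous-command patterns; the real library is a
// compiled regex set, not substrings.
const DANGEROUS: &[&str] = &["rm -rf /", "mkfs.", "> /dev/sda"];

// Question 1, binary output: should this be blocked?
fn should_block(command: &str) -> bool {
    DANGEROUS.iter().any(|pattern| command.contains(pattern))
}

fn main() {
    assert!(should_block("rm -rf / --no-preserve-root"));
    assert!(!should_block("cat README.md"));
}
```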

2. Is the session drifting?

The Anchor engine answers this continuously. The Compass tracks phase transitions, the Focus module tracks file-set coherence, the Ledger tracks turn counts and verification gaps. The output is not binary — it is a set of signals with varying severity.

3. What survives this session?

The Dream engine answers this in the background. When a session ends or goes idle, the worker thread analyzes the session log for patterns worth remembering: successful repair sequences, project conventions, error signatures. These artifacts persist in .warden/cross-session/ and are loaded into future sessions.
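The documented `RepairPattern` fields can be modeled as a plain struct. The hand-rolled JSON below is for illustration only (real code would use a serializer such as serde), and the example field values are invented.

```rust
// The three documented RepairPattern fields, as structured data.
struct RepairPattern {
    error_signature: String,
    fix_sequence: Vec<String>,
    confidence: f64,
}

impl RepairPattern {
    // Hand-rolled JSON for the sketch; Debug formatting of simple ASCII
    // strings happens to match JSON string syntax.
    fn to_json(&self) -> String {
        format!(
            "{{\"error_signature\":{:?},\"fix_sequence\":{:?},\"confidence\":{}}}",
            self.error_signature, self.fix_sequence, self.confidence
        )
    }
}

fn main() {
    let p = RepairPattern {
        error_signature: "E0502 cannot borrow".to_string(),
        fix_sequence: vec!["split the borrow".to_string(), "cargo check".to_string()],
        confidence: 0.8,
    };
    assert!(p.to_json().contains("\"confidence\":0.8"));
}
```

Keeping the artifact structured rather than prose is what makes the 32KB-per-project cap from Principle 2 meaningful: structured records have a predictable size, and a loader can validate fields instead of parsing natural language.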

4. What should the assistant do next?

The Harbor engine answers this through context injection. Based on signals from the other three engines, Harbor formats a compact context block and delivers it to the assistant via the appropriate protocol — system field for Claude Code, MCP tool response for other integrations.

Not every hook invocation touches all four questions. A simple cat README.md might only hit question 1 (not blocked) and skip the rest. A git push --force hits all four.

Harness Engineering vs. Prompt Engineering

Warden represents a bet on harness engineering — building structured systems around LLMs — over prompt engineering — crafting natural-language instructions for LLMs.

Prompt engineering is fragile. It depends on the LLM following instructions reliably, which varies across models, versions, and context lengths. A perfectly crafted system prompt for Claude 3.5 Sonnet might behave differently on Claude 4 Opus, or degrade when the context window fills up.

Harness engineering is structural. Hooks fire regardless of which LLM is behind the assistant. Trust scores compute the same way whether the session has 10 turns or 10,000. Pattern matching does not depend on the model understanding what “dangerous” means — it depends on a regex matching a string.

This does not mean prompts are useless. Warden uses context injections (which are a form of prompt engineering) for guidance that cannot be expressed as binary gates. But the load-bearing logic — the decisions that protect repositories, prevent loops, and maintain session coherence — lives in deterministic Rust code, not in natural-language instructions.

The goal is a system where the LLM’s reliability determines the quality of the output, but the harness’s reliability determines the safety of the process. Warden is the harness.


These principles are not abstract. Each one maps to specific architectural decisions documented in the Engine Overview and enforced by the test suite. When a proposed feature conflicts with a principle, the principle wins unless the conflict is resolved by updating the principle itself — which requires a design discussion, not just a code change.