The Zed AL Extension is a 12-crate Rust workspace with strict crate boundaries, an in-process .NET bridge, and a daemon-based server architecture. Building it requires enforcing rules that an AI agent will cheerfully violate if you let it: thin adapters must not import core internals, leaf crates must not reach up the dependency tree, every task needs test evidence before it can be marked done.
I needed enforcement that actually works. Not “remind the agent in the prompt” — that’s a suggestion, and agents ignore suggestions when they’re inconvenient. I needed violating a boundary to make the tool call fail. Hard stop, no workaround.
Claude Code has a plugin system that turned out to be the right fit. The whole methodology lives in .claude/ as markdown and TOML files. No shell scripts, no custom tooling. Just declarations about what the agent can and can’t do.
What .claude/ looks like
```
.claude/
  agents/                 # subagent definitions (YAML frontmatter + instructions)
  skills/                 # slash commands (SKILL.md files)
  rules/                  # always-loaded or path-scoped context
  hookify.*.local.md      # enforcement hooks (file/stop events)
  constraints.toml        # machine-readable index of what enforces what
  deferred-issues.toml    # known acceptable failures with blocking task refs
  settings.json           # Claude Code permissions
```
The CLAUDE.md puts it bluntly: “Zero custom executable code. Plugins enforce methodology. Agent does the work. Hookify blocks bad patterns.”
Rules
Rules are markdown files in .claude/rules/ that get loaded into every agent session. They’re information, not enforcement. The agent reads them so it knows what it’s working with.
Seven files:
| File | What it tells the agent |
|---|---|
| `architecture.md` | Four dependency layers, direction of allowed imports, one-code-path principle |
| `code-boundaries.md` | Per-crate import permission matrix |
| `testing.md` | Test commands, harness behavior, test-first mandate |
| `thin-adapters.md` | Zero al-core dependency, al-protocol is the only shared crate |
| `zed-fidelity.md` | WASM constraints, extension file structure, LSP conformance |
| `agentic-output.md` | `--json` flag conventions for CLI/MCP output schemas |
| `token-efficiency.md` | Truncation rules, model selection, “never stop between tasks” |
code-boundaries.md is the one I keep adding to. The crate layout:
```
al-lsp → al-core → al-syntax, al-symbols, al-semantic, al-diag
            ↑
       al-protocol (types only, shared)
            ↓
al-cli, al-explorer, al-mcp (runtime JSON-RPC only)
zed-al (WASM)
al-test-harness
```
The thin adapters (al-cli, al-explorer, al-mcp) have zero compile-time dependency on al-core. They talk to the running al-lsp process over a Unix socket using JSON-RPC. The architecture article explains why: al-core links against netcorehost for the in-process .NET bridge, and that native dependency makes compilation slow and cross-compilation painful. Keeping it out of the tool crates is worth the protocol hop.
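On the wire, a lookup from al-cli is a single JSON-RPC request over the Unix socket. A hypothetical exchange might look like this (the method name and query are illustrative, not taken from the actual protocol):

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "workspace/symbol",
  "params": { "query": "Customer" }
}
```

al-lsp answers with a standard JSON-RPC result or error object. The adapter only serializes requests and parses responses; it never links the code that computes them.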
Rules explain this to the agent. Hookify enforces it.
Hookify
Hookify rules are markdown files with YAML frontmatter: an event type, an action, and matching conditions. They fire on file edits or when the agent tries to stop.
Ten rules. Eight guard architecture boundaries:
```markdown
---
name: block-adapter-cargo-deps
event: file
action: block
conditions:
  - path: "crates/al-cli/Cargo.toml"
  - path: "crates/al-explorer/Cargo.toml"
  - path: "crates/al-mcp/Cargo.toml"
  - content: "al-core"
  - content: "al-syntax"
  - content: "al-symbols"
  - content: "al-semantic"
  - content: "al-diag"
---
Thin adapters must not depend on internal crates.
Only al-protocol is allowed.
```
When the agent writes to a thin adapter’s Cargo.toml and the content includes al-core, the write fails. Not a warning. The tool call returns an error and the agent has to find another way.
Same pattern for Rust imports. use al_core:: inside al-cli/src/? Blocked. al-core importing from al-lsp? Blocked. al-syntax importing from a sibling crate? Blocked. One rule file per boundary.
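A source-import rule might look like this. This is a sketch modeled on the Cargo.toml rule above, not the actual file, though the underscore crate names match the narrowing described later in this post:

```markdown
---
name: block-adapter-rs-imports
event: file
action: block
conditions:
  - path: "crates/al-cli/src/"
  - path: "crates/al-explorer/src/"
  - path: "crates/al-mcp/src/"
  - content: "al_core"
  - content: "al_syntax"
  - content: "al_symbols"
  - content: "al_semantic"
  - content: "al_diag"
---
Thin adapters must not import internal crates directly.
Route through al-protocol and the JSON-RPC socket instead.
```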
The other two rules are completion gates on the stop event:
- `require-cargo-check-before-stop`: blocks session end without evidence of `cargo check`
- `require-cargo-test-before-stop`: same for `cargo test`
The agent can’t walk away from broken code. It sees “your session cannot end because these checks haven’t run” and has to fix whatever is failing.
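A stop-gate rule might be shaped like this. I'm guessing at the condition syntax for the stop event, which the post doesn't show; only the name, event, and action are grounded in the rules listed above:

```markdown
---
name: require-cargo-test-before-stop
event: stop
action: block
conditions:
  - missing_evidence: "cargo test"
---
Your session cannot end until cargo test has run and its
output has been recorded.
```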
Skills
Skills are slash commands in .claude/skills/*/SKILL.md. YAML frontmatter with a name, description, optional arguments. They define workflows.
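A SKILL.md header might look like this (hypothetical field names, inferred from the description of frontmatter above rather than copied from a real skill file):

```markdown
---
name: pof
description: Record proof of functionality for a completed task
arguments:
  - task_id
  - work_package
---
Run the workspace test suite, capture the real output, and
append an entry to proof_of_functionality.toml.
```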
/start-work
The main loop:
1. Health check on `.claude/` infrastructure
2. Read `docs/plan.md` and `docs/progress.md`, find the first unchecked task
3. TDD implementation: failing test, make it pass, refactor
4. Invoke `/pof` to record test evidence
5. Mark the task done in `docs/progress.md`
6. Spawn the adversarial agent in the background
7. Pick up the next task immediately
Step 7 is the one that matters most. Without it, the agent completes a task, writes a summary of what it did, and waits for input. The skill overrides that default: keep going until there are no more tasks or something blocks you.
/pof (proof of functionality)
Takes a task ID and work package number. Runs cargo test, writes a TOML entry to proof_of_functionality.toml with the actual output:
```toml
[[entries]]
task = "T301"
wp = "WP3"
timestamp = "2026-03-12T14:23:00Z"
command = "cargo test --workspace"
exit_code = 0
tests_passed = 461
tests_failed = 0
output_excerpt = "test result: ok. 461 passed; 0 failed"
```
Real output, not “tests pass.” What ran, when, what happened. If a future task breaks something, you can trace back to what was passing and when.
/adversarial
Launches the adversarial subagent in the background. It reads recently completed code, looks for bugs, fixes what it can, defers the rest to deferred-issues.toml. Meanwhile the main agent keeps working.
The rest
/ci runs a full validation: hookify rule validity, compilation, tests, config checks. Heavier than /pof because it includes clippy and formatting.
/audit spawns the supervisor for a deep work-package review. /fix-infra launches the infra-fixer to repair broken .claude/ files. /report reads the plan, progress tracker, and proof-of-functionality log and spits out a summary.
Agents
Three subagents, each a markdown file in .claude/agents/.
Adversarial runs on Sonnet. Reads the diff from a completed task, generates edge cases, runs them. If it finds a bug it can fix, it fixes it inline. If the fix depends on infrastructure that hasn’t been built yet, it writes a deferred-issues.toml entry with a blocked_by field pointing to the blocking task.
That deferred issues file turned out to be more useful than I expected. Some bugs genuinely can’t be fixed until later tasks land. Rather than TODO comments (which other hookify rules would catch anyway), deferred issues live in a structured format that /start-work checks each session. When task 7 lands and unblocks the fix for something found during task 3, the agent sees it.
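A deferred entry might look like this. Only the `blocked_by` field is confirmed above; the other field names and the task IDs are illustrative:

```toml
[[issues]]
id = "DI-007"
found_by = "adversarial"
found_during = "T303"
description = "Symbol lookup panics on empty AL source file"
blocked_by = "T307"  # needs the test-harness fixtures task to land first
```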
Supervisor also runs on Sonnet. Spawned by /audit, it checks task completion against the plan, verifies evidence in the proof-of-functionality log, runs tests, and reviews architecture conformance. Outputs a pass/fail report per task.
Infra-fixer is Sonnet too. Long sessions sometimes produce bad edits to .claude/ config files. The infra-fixer reads the error output and fixes the config. Scoped to .claude/ only — can’t touch project source.
All three are spawned by skills, never run directly.
constraints.toml
With enforcement spread across ten hookify files and context spread across seven rule files, it gets easy to lose track. constraints.toml maps each architectural constraint to whatever enforces and documents it:
```toml
[constraints.thin-adapter-no-core-dep]
description = "Thin adapters must not have compile-time dependency on al-core"
enforced_by = ["hookify.block-adapter-cargo-deps.local.md", "hookify.block-adapter-rs-imports.local.md"]
documented_in = ["rules/thin-adapters.md", "rules/code-boundaries.md"]

[constraints.leaf-crate-isolation]
description = "al-syntax, al-symbols, al-semantic, al-diag must not import sibling crates"
enforced_by = ["hookify.block-syntax-boundaries.local.md", "hookify.block-symbols-boundaries.local.md"]
documented_in = ["rules/code-boundaries.md"]
```
When I add a crate or change a boundary, I update this file first. It tells me which hookify rules and which rule files need changes.
A session
I type /start-work. Health check, find the next task, start implementing. TDD: failing test, pass, clean up. Task done, /pof records the evidence, adversarial spawns in the background, next task starts.
If the agent tries to add al-core to al-cli/Cargo.toml, the edit fails. If it tries to use al_core:: in a thin adapter, the edit fails. It routes through JSON-RPC like the architecture requires.
If the agent tries to stop without running cargo check and cargo test, the stop is blocked.
When I come back, I check the progress tracker and the proof-of-functionality log. The adversarial agent may have found and fixed issues, or deferred them. I review diffs, run /report, and either start the next session or /audit a completed work package.
The deferred issues file is the part I didn’t expect to care about. Bugs from task 3 that depend on task 7 infrastructure don’t disappear into forgotten TODOs. They sit in deferred-issues.toml with a blocked_by reference, and when the agent gets to task 7, it picks them up. I’ve tried half a dozen other ways to track cross-task dependencies and this is the first one that actually works without me having to remember anything.
Getting the hookify rules right took a few rounds. The first block-adapter-rs-imports was too broad and caught legitimate al-protocol re-exports because the pattern matched the al_ prefix. Narrowing to exact crate names (al_core, al_syntax, al_symbols, al_semantic, al_diag) fixed the false positives.
Why not shell hooks
The original setup used shell-based hooks. A PreToolUse hook that grepped proposed file content. A PostToolUse hook that kicked off cargo check in the background. Shell scripts checking exit codes and writing state files.
It worked, but it was fragile. What if cargo check is already running? What if the state file is locked? The scripts needed error handling for every edge case, and maintenance every time a path changed. Worse, they were invisible to the agent. It couldn’t read or understand them. Tool calls just failed or succeeded with no context.
The declarative approach fixes this. The agent can read every rule file and understand what it’s not allowed to do. Hookify rules state their conditions and actions in plain text. No hidden state, no race conditions. When something gets blocked, the error message says which rule and why.
constraints.toml closes the loop. One file to audit the full enforcement surface, instead of grepping through shell scripts.
What I’d change
The adversarial agent is too eager on small changes. A three-line function edit doesn’t need a full adversarial pass, but the skill dispatches one anyway. A diff-size threshold would cut the noise.
The stop-event hooks have a gap. They check that cargo check and cargo test ran during the session, but not that they ran after the last edit. An agent can run tests at the start, make 15 changes, and still pass the completion gate. The hooks should require tests after the most recent file modification.
Seven rule files is too many. There’s overlap. I’d consolidate to one per crate layer instead of one per concern, which would cut context tokens without losing information.
If you want to try this on your own project, rules and hookify are where the leverage is. Skills and agents are nice but they’re refinements. The boundary enforcement is the part that saves you from spending an hour untangling something the agent broke twenty edits ago.