AI Part 4: Safety Rails — Allowlists, Atomic Writes, Audit Logs, Rollback

26 April 2026 AI MCP LLM Claude Self-Hosting Tool Design

You can write a perfectly designed tool surface and still have a bad time if the underlying mechanics are sloppy. This post is about the four small things that, between them, make it boring rather than nerve-wracking to leave write access on.

Allowlists, again

I covered allowlists in the last post from a design angle — choosing them, naming them, describing them. The implementation matters too, because an allowlist that’s easy to misimplement is worse than one that fails loudly.

A few things I do that aren’t obvious until you’ve shipped one:

Resolve the path before checking it. A path like src/../../../etc/passwd is not “under src/” once resolved; it’s somewhere else entirely. Resolve first, then compare resolved paths. Anything that does string-prefix matching on the un-resolved input is broken — sometimes silently, often dangerously.
Check that the resolved path stays inside the root. A relative path computed from the root to the resolved target should not start with “..”. This is your “no escape” check, and it lives next to the allowlist check, not in a separate function you forget to call.
One predicate per tool family. Reads share an assertion. Writes (and deletes) share another, which calls the read assertion before doing its own work. Two assertions, used everywhere they’re needed. Easier to audit, easier to extend.
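To make the three points above concrete, here is a minimal sketch in Python. It is not the code from my server: the names PathNotAllowed, assert_readable, assert_writable and the write_allowlist parameter are hypothetical, and it assumes a POSIX-style filesystem with every path interpreted relative to a single root.

```python
import os
from pathlib import Path


class PathNotAllowed(Exception):
    """Raised when a requested path fails the escape or allowlist check."""


def assert_readable(root: Path, candidate: str) -> Path:
    """Resolve first, then check that the result is still inside root."""
    root = root.resolve()
    # resolve() collapses ".." segments and follows symlinks.
    resolved = (root / candidate).resolve()
    rel = os.path.relpath(resolved, root)
    # The "no escape" check: the path back from root must not climb upward.
    if rel == os.pardir or rel.startswith(os.pardir + os.sep):
        raise PathNotAllowed(f"{candidate!r} resolves outside the root")
    return resolved


def assert_writable(root: Path, candidate: str, write_allowlist: list[str]) -> Path:
    """Writes and deletes: run the read assertion, then the write allowlist."""
    resolved = assert_readable(root, candidate)
    root = root.resolve()
    for allowed in write_allowlist:
        allowed_dir = (root / allowed).resolve()
        # Compare resolved paths, never string prefixes of the raw input.
        if resolved == allowed_dir or allowed_dir in resolved.parents:
            return resolved
    raise PathNotAllowed(f"{candidate!r} is not under a writable directory")
```

Every read tool would call assert_readable and every write or delete tool would call assert_writable, so there are only two functions to audit.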

Atomic writes

If you write a file by opening it, truncating it, and streaming bytes into it, there is a window — usually short, sometimes not — when the file on disk is partially written. If anything reads it during that window, it reads a corrupt file. If anything kills your process during that window, the file stays corrupt.

The fix is the oldest trick in the book: write to a sibling tempfile, then rename over the destination. The tempfile sits next to the destination so that both are on the same filesystem, where a POSIX rename is atomic. Either the old file is there, or the new file is there; there is no in-between state visible to a reader.
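A minimal sketch of that pattern in Python, assuming a POSIX system. The function name atomic_write_text is hypothetical, and the fsync call is an extra durability step I am adding for illustration, not something atomicity itself requires.

```python
import os
import tempfile


def atomic_write_text(dest: str, text: str) -> None:
    """Write text to dest so readers see the old file or the new one, never a partial one."""
    dest_dir = os.path.dirname(os.path.abspath(dest))
    # Create the tempfile beside the destination so the rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=dest_dir, prefix=".tmp-", suffix=".partial")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            tmp.write(text)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the bytes to disk before the swap
        # os.replace renames over an existing destination in a single step.
        os.replace(tmp_path, dest)
    except BaseException:
        # On any failure, remove the tempfile and leave the destination untouched.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
```

One caveat with this sketch: mkstemp creates the file readable only by its owner, so the destination ends up with those permissions unless you chmod the tempfile before the rename.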

This matters more for an LLM-driven workflow than for a human one, because the LLM might write a file and then immediately read it back to verify, or write several files in quick succession that other parts of the system are reading. The cost of doing it right is a few lines of code. The cost of doing it wrong is intermittent confusion you won’t reproduce.

The audit log

Every tool call writes one line to a log file. Timestamp, tool name, a one-line summary of what happened. That’s the entire format. It’s append-only, plain text, and lives in a directory that is not on any tool’s write allowlist, so the logging code can append to it but the agent cannot edit it through a tool.
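As a rough sketch of that format in Python: the function name audit, the tab separator, and the UTC ISO 8601 timestamp are my assumptions for illustration. The only fixed parts are the three fields and the append-only behaviour.

```python
import os
from datetime import datetime, timezone


def audit(log_path: str, tool: str, summary: str) -> str:
    """Append one line (timestamp, tool name, summary) to the log and return it."""
    stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
    # Collapse tabs and newlines so each call stays on exactly one line.
    tool_name = " ".join(tool.split())
    one_line = " ".join(summary.split())
    line = f"{stamp}\t{tool_name}\t{one_line}\n"
    # O_APPEND makes every write land at the end of the file.
    fd = os.open(log_path, os.O_WRONLY | os.O_APPEND | os.O_CREAT, 0o644)
    try:
        os.write(fd, line.encode("utf-8"))
    finally:
        os.close(fd)
    return line
```

A call such as audit("/var/log/agent/tools.log", "write_file", "wrote content/post.md") would be made by the server wrapper around each tool, not by the tool itself. Both the path and the tool name there are made up.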

I look at it weekly. Not because I expect to find something — I look because it’s calibration. Reading what the agent actually did, in order, with my own eyes, makes me a better prompter. I notice the prompts that produced four tool calls when I expected one. I notice the times I asked for a thing and the agent picked the wrong tool. I notice the hours when I was tired and asked for things I shouldn’t have asked for.

The log isn’t a security control. It’s a feedback loop. If you only build it for security, you’ll forget to look at it. If you build it as a way to read your own collaboration history, you’ll look at it often.

Rollback

The last rail is the cheapest and the one I’d insist on first if I were starting over: every irreversible thing has a manual undo path that I have written down somewhere I can find at 1am.

For deployments, that’s a backup of the previous server binary on disk, kept around indefinitely, with a versioned suffix. Restoring is a copy and a restart. For site content, it’s the version history in the site repository — every deploy is a commit, every commit is one revert away from being undone. For the audit log, there’s nothing to roll back; it’s append-only and that’s the point.
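The deployment half of that can be sketched in a few lines of Python. This is an illustration under my own assumptions, not my actual deploy script: backup_binary, restore_binary, and the "name.version" suffix scheme are hypothetical, and the restart is left to whatever supervises the service.

```python
import os
import shutil
from pathlib import Path


def backup_binary(binary: Path, version: str) -> Path:
    """Copy the current binary to a versioned sibling before deploying over it."""
    backup = binary.with_name(f"{binary.name}.{version}")
    if backup.exists():
        # Never overwrite an existing backup; backups are kept indefinitely.
        raise FileExistsError(f"backup already exists: {backup}")
    shutil.copy2(binary, backup)  # copy2 also preserves mode and timestamps
    return backup


def restore_binary(binary: Path, version: str) -> None:
    """Put a versioned backup back in place; the backup itself is kept."""
    backup = binary.with_name(f"{binary.name}.{version}")
    staging = binary.with_name(f"{binary.name}.restoring")
    shutil.copy2(backup, staging)
    os.replace(staging, binary)  # swap in one step, then restart the service by hand
```

These are run by a human from a shell, not exposed as tools, which keeps the undo path out of the agent’s reach.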

What I do not have: an in-tool undo. The agent cannot un-publish a post by calling a tool. To revert, I have to do it myself — either by hand or by asking the agent and then reviewing its proposed change before it lands. This is on purpose. An undo tool the agent can call without me means a post can be published, then un-published, then published again, all without me noticing. Keeping the undo out-of-band makes the human-in-the-loop property load-bearing.

What this all costs

These four things — allowlists, atomic writes, audit logs, manual rollback — are maybe two hundred lines of code combined. They aren’t clever. None of them are the part of the project I’d want to talk about at a meetup. They’re the part that means I sleep fine with the service running.

If you’re building something similar and you have to cut scope, cut the tools, not the rails. A two-tool server with all four rails is safer than a ten-tool server with three of them.

The next post is about what it actually feels like to use this, day to day.