AI Part 3: Designing Tools for an LLM, Not for Yourself

When you write an internal API for yourself, you optimise for terseness. You know what the function does; the name is a reminder, not a teacher. When you write a tool for an LLM, you have to flip that. The model has never used the tool before, won’t see the implementation, and will pick from a list of similarly-named candidates based mostly on the description you give it. If your description reads like an internal API doc, the model will pick wrong, often.

This post is about four things I had to learn the slow way: tool naming, tool descriptions, allowlists versus blocklists, and the shape of error messages.

Naming: predictable verbs win

The verb is the most important part of the name. If the model can guess what write_draft does without reading the description, that’s a win — the description gets to be about the interesting parts (constraints, side effects, when to call it, what to do after) instead of restating the verb.

I use a small verb vocabulary: read_*, write_*, list_*, delete_*, deploy_*, rebuild_*. Same verbs across tool families. read_site_file and read_draft behave the way read_* always behaves. When I added the delete tool, naming it was a non-decision.
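
One way to keep the vocabulary from drifting is to check new tool names against it when they are registered. The following is a minimal sketch, not my actual registration code: the helper name check_tool_name is hypothetical, and the verb list is just the one given above.

```python
# Hypothetical helper: the verb list mirrors the vocabulary named in the text.
ALLOWED_VERBS = ("read", "write", "list", "delete", "deploy", "rebuild")


def check_tool_name(name: str) -> str:
    """Return the verb if the name looks like <verb>_<noun>, else raise ValueError."""
    verb, sep, rest = name.partition("_")
    if not sep or not rest:
        raise ValueError(f"Tool name '{name}' must look like <verb>_<noun>")
    if verb not in ALLOWED_VERBS:
        raise ValueError(
            f"Tool name '{name}' starts with '{verb}'; use one of: "
            + ", ".join(ALLOWED_VERBS)
        )
    return verb
```

With this in place, read_site_file and write_draft pass, while a name like update_post is rejected with a message that lists the verbs to choose from.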

Names I rejected: update_post, manage_drafts, commit_changes. They sound right to a human and read as ambiguous to a model. Update is a write or maybe a partial write — which? Manage is several verbs. Commit is git or it’s a transaction or it’s a publish, depending on context. Pick one verb per intent, use it everywhere.

Descriptions: write them for the call site, not the docs page

The model reads tool descriptions when it’s deciding which tool to call right now. The description has to answer the questions a chooser asks, not the questions a maintainer asks.

Things I now put in every description:

  • What it does in one sentence, with the important nouns (what the input represents, what the output contains).
  • What it does NOT do. “Does NOT publish — use deploy_post after.” “Does NOT rebuild — call rebuild_site after.” This is the line that prevents the model from assuming a side effect that isn’t there.
  • Hard constraints, in plain language. Allowlist scope, file size cap, accepted path shape. Not “see schema” — the actual rule, where the model is reading.
  • An example argument: a literal instance of a valid input. It earns its space by heading off a whole class of malformed calls.
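
Put together, a description following that checklist might look like the sketch below. Everything specific here is an assumption for illustration: the dictionary layout, the drafts/<slug>.md path shape, and the 200 KB cap are placeholders rather than the real rules of my tools. The missing_parts helper is a deliberately crude lint that only looks for two marker strings.

```python
# Hypothetical tool definition. The field names, the drafts/ path shape,
# and the 200 KB cap are illustrative assumptions, not real constraints.
WRITE_DRAFT_TOOL = {
    "name": "write_draft",
    "description": (
        "Writes the full Markdown body of a draft post to the given path "
        "and returns the path written. "
        "Does NOT publish: call deploy_post after. "
        "Path must match drafts/<slug>.md; content must be under 200 KB. "
        'Example: {"path": "drafts/tool-design.md", "content": "# Draft title"}'
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "path": {"type": "string"},
            "content": {"type": "string"},
        },
        "required": ["path", "content"],
    },
}


def missing_parts(description: str) -> list[str]:
    """Crude lint: report which checklist items a description appears to lack."""
    checks = {
        "negative statement": "Does NOT" in description,
        "example argument": "Example:" in description,
    }
    return [part for part, present in checks.items() if not present]
```

A string check like this cannot tell whether the description is any good, but it does catch the case where I skipped the "does NOT" line or the example because I was in a hurry.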

What I cut from descriptions: motivation, history, internal terminology, anything starting with “originally we…”. The model doesn’t care.

Allowlists vs blocklists

For anything that mutates state, I prefer an allowlist. Allowlists fail closed: if I forgot a path, the tool refuses, and I find out because the model tells me it couldn’t write somewhere I expected it to. Blocklists fail open: if I forgot to block a path, the tool succeeds, and I find out when I notice the change weeks later.

The read tool is the exception. It uses a blocklist (refuse environment files, dependencies, version control internals, build output) and accepts everything else under the site root. Reads are recoverable; the failure mode is “the model saw something it shouldn’t have”, not “the model destroyed something”. A blocklist for reads keeps the tool useful for inspecting layouts and components without me having to enumerate every directory in advance.
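
A read-side blocklist along those lines could be sketched as follows. The four categories come from the paragraph above; the concrete names (.env, node_modules, .git, dist) are my guesses at typical values and would need adjusting to the actual project. Paths are treated as relative to the site root.

```python
from pathlib import PurePosixPath

# Assumed names: the categories come from the text, the concrete
# directory and file names are typical guesses, not a confirmed list.
BLOCKED_DIRS = {"node_modules", ".git", "dist"}


def is_readable(path: str) -> bool:
    """Fail open: allow anything under the site root that is not blocked."""
    p = PurePosixPath(path)
    # Refuse empty paths and anything that could escape the site root.
    if not p.parts or p.is_absolute() or ".." in p.parts:
        return False
    # Dependencies, version control internals, build output.
    if any(part in BLOCKED_DIRS for part in p.parts):
        return False
    # Environment files such as .env or .env.local.
    if p.name == ".env" or p.name.startswith(".env."):
        return False
    return True
```

Note that the escape checks are still closed by default even though the blocklist itself is open: a blocklist decides what is off limits inside the root, not whether the root boundary applies.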

The write and delete tools share the same allowlist. That isn’t accidental — having one set of rules to reason about means I can describe both tools’ constraints in the same sentence and the model won’t get confused about which paths each one accepts.
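
Sharing the rule can be as simple as routing both tools through one predicate. This is a sketch under stated assumptions: the src/ and public/ prefixes are taken from the error message quoted later in this post, the root config file name is a placeholder, and an in-memory dict stands in for the filesystem so the example stays self-contained.

```python
from pathlib import PurePosixPath

# Assumed rules: the directory prefixes come from the error message quoted
# in this post; the root config file name is a hypothetical placeholder.
WRITABLE_DIRS = ("src", "public")
WRITABLE_ROOT_FILES = {"site.config.json"}


def is_writable(path: str) -> bool:
    """Fail closed: anything not explicitly listed is refused."""
    p = PurePosixPath(path)
    if not p.parts or p.is_absolute() or ".." in p.parts:
        return False
    if len(p.parts) == 1:
        return p.name in WRITABLE_ROOT_FILES
    return p.parts[0] in WRITABLE_DIRS


def write_site_file(path: str, content: str, files: dict) -> None:
    """Write tool: 'files' is an in-memory stand-in for the site directory."""
    if not is_writable(path):
        raise PermissionError(f"Write blocked: '{path}' not in writable allowlist")
    files[path] = content


def delete_site_file(path: str, files: dict) -> None:
    """Delete tool: uses exactly the same allowlist as the write tool."""
    if not is_writable(path):
        raise PermissionError(f"Delete blocked: '{path}' not in writable allowlist")
    files.pop(path, None)
```

Because both guards call is_writable, there is no way for the two tools to disagree about a path, which is what lets one sentence in the descriptions cover both.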

Error messages are also instructions

When a tool refuses, the message it returns is the next thing the model reads. So the message has to do two jobs: tell the model why, and (often) tell it what to do next.

Compare:

  • “Permission denied” — uninformative; the model retries the same call or invents a workaround.
  • “Write blocked: ‘README.md’ not in writable allowlist (src/, public/, or root config)” — the model now knows the rule and can pick a valid path.

I write all my error messages this way. The cost is a few extra characters per throw; the benefit is that when the model hits a guard, it usually gets the next call right without me intervening. That feels like a small thing until you’ve watched a model thrash on an opaque error for three turns.
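
One way to make this the default is a dedicated exception type whose message is written for the model, plus a thin wrapper that returns that message as the tool result. The sketch below is illustrative only: ToolRefusal, run_tool, and the result dictionary shape are hypothetical names, and the path check is a simplified stand-in for a real allowlist.

```python
class ToolRefusal(Exception):
    """A guard refused the call; the message is written for the model."""

    def __init__(self, what: str, why: str, next_step: str = ""):
        message = f"{what}: {why}"
        if next_step:
            message += f". {next_step}"
        super().__init__(message)


def run_tool(handler, **args) -> dict:
    """Call a tool handler and turn refusals into text the model will read."""
    try:
        return {"ok": True, "result": handler(**args)}
    except ToolRefusal as refusal:
        return {"ok": False, "error": str(refusal)}


def write_site_file(path: str, content: str) -> str:
    # Simplified stand-in for a real allowlist check.
    if not path.startswith(("src/", "public/")):
        raise ToolRefusal(
            "Write blocked",
            f"'{path}' not in writable allowlist (src/, public/)",
            "Retry with a path under one of those directories",
        )
    return f"wrote {len(content)} characters to {path}"


refused = run_tool(write_site_file, path="README.md", content="x")
accepted = run_tool(write_site_file, path="src/a.md", content="hi")
```

Here refused carries the rule and the next step in its error field, and accepted carries the normal result. Forcing every refusal through a constructor that asks for a reason makes a bare "Permission denied" harder to write by accident.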

The shape of a good tool

If I had to compress what I’ve learned into one sentence: a good tool for an LLM has a predictable verb in its name, a description written for the moment of choice, an allowlist where the cost of forgetting is high, and error messages that double as instructions. None of that is hard. All of it is easy to skip when you’re moving fast.

The next post is about the boring infrastructure that makes any of this safe to actually leave running.