AI agent skills — why I wrote 33 of them
I didn’t sit down one afternoon and write thirty-three skills. Nobody does that. The drawer filled up the way drawers usually do: one thing at a time, mostly because I needed it that day, occasionally because I’d promised myself I’d be more organized.
This post is about how that happened. Not the taxonomy — that’s a separate post. Just the story. What a skill is, when I started reaching for one, when the count tipped from “useful tool” into “this is an actual library now,” and why I’d recommend the whole exercise to anyone who works with an AI coding agent on the same project for more than a few weeks.
The short version of what a skill is
A skill is a Markdown file. That’s it. It has a name, a one-line description, and a body that explains how to do something — usually with a snippet of shell or code at the top that does the actual work.
The agent doesn’t run skills automatically. You invoke a skill by name, and the agent reads the file into its context. From there it knows the procedure: where the script is, what flags to pass, what the output means, what to do with it next. The skill is a tiny playbook the agent can pick up on demand.
A minimal one looks like this:
````markdown
---
name: get-recent-commits
description: List recent commits on the current branch with author and short message.
---

# Get recent commits

Quick view of the last N commits on the current branch.

## Quick start

```bash
git log --pretty=format:"%h %an: %s" -n 20
```

## Options

- `-n N` — how many commits to show (default 20)
````
That’s a skill. The frontmatter has a name and a description, the body explains what it does, and the bash block at the top is what the agent runs when I ask it to “get recent commits.” The format varies by tool — some call them skills, some call them slash-commands, some call them snippets — but the bones are the same.
How the first one happened
The first skill I wrote was for an AI agent project I work on. The agent persists state across conversations using a graph database, and the way that state gets serialized is a little custom. When something looked weird in production, the natural debugging move was to crack open the database, decode the checkpoint blob, and see what the agent actually thought it was doing.
The first time I did that, I wrote a small Python script. The second time I did it, I forgot which database the script connected to and burned ten minutes finding out. The third time I did it, I made my first skill — a SKILL.md file with the script path, the command, the flags, and a note about which tables to look at if the first one was empty.
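That first skill looked roughly like this. The script path, flags, and notes below are stand-ins rather than the real ones, but the shape is faithful:

````markdown
---
name: inspect-checkpoint
description: Decode the latest persisted checkpoint for a conversation when production behavior looks wrong.
---

# Inspect checkpoint

Decodes the serialized checkpoint blob for a conversation and prints the state
the agent persisted. Expects a `.env` with the production database credentials.

## Quick start

```bash
# Hypothetical script and flags -- adjust to your own repo layout.
python scripts/decode_checkpoint.py --conversation-id <ID>
```

## Notes

- If the first checkpoint table comes back empty, check the other state tables
  before assuming the write failed.
````

The body is deliberately thin: the path, the command, and the one gotcha that had cost me ten minutes.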
The next time something went weird, I asked the agent: check the latest checkpoint for this conversation. The agent read the skill, ran the script, parsed the output, and told me what was off. Total time from question to answer, about thirty seconds. The skill paid for itself the first time it was used.
The threshold for writing one
After that I had a rough rule: the second time I do something, I write a skill.
Not the first time. The first time you don’t yet know what the procedure is. You’re discovering. Writing it down on the way is mostly noise.
But the second time, the procedure has emerged. You know which command, which flags, which gotchas, which related commands. That’s the moment to capture it. If you wait for the third or fourth time, you’ve already burned the savings the skill would have given you.
A skill at this stage is small — a paragraph or two of context, the command, a list of flags. Maybe an example output if it’s the kind of thing that needs interpreting. It takes ten minutes to write. The cost is real but trivial.
The threshold matters for the inverse reason too. If a thing has only ever been done once, do not write a skill for it. I tried that early and it produced a graveyard of skills I never used. Skills are caching. If you cache a thing you’ll never look up again, you’re just paying the storage cost.
Why thirty-three is roughly the right number
That number isn’t a goal. It’s where I happened to land for the project I was working on, and looking back, it makes sense.
The project has a few natural surfaces:
- Live state — what the agent is currently thinking. Skills for inspecting checkpoints, decoding graph state, reading the message thread the agent is responding to.
- Configuration / cache — what’s been computed before. Skills for reading and clearing cached evaluations, looking up rate limits, checking timing.
- External systems — the third-party platforms the agent talks to. Skills for fetching a record by ID, running a search query, posting a message back.
- Internal procedures — repeatable multi-step changes. Skills that walk the agent through “add a new search filter” or “wire a new intent through the pipeline.”
- Quality assurance — replaying scenarios, capturing outputs, regression checks.
Each surface produces somewhere between three and ten skills as the project matures. Multiply by five surfaces and you land somewhere in the high twenties or low thirties. That’s not magic — it’s the natural shape of a non-trivial system that one person plus an AI agent operates on.
When a skill beats a rule
This was the thing I had to get straight in my own head. Rules and skills look similar from the outside — both are Markdown files in a folder, both are loaded into the agent’s context. They do different things.
A rule loads automatically when you’re working on a matching file. It’s ambient. The agent reads it whether or not you ask. Rules are good for steady-state guidance: voice, conventions, safety policies, things that should always shape the agent’s behavior on a given file type.
A skill loads on demand, by name. It’s invoked. The agent picks it up only when you tell it to (or when the description matches the user’s request closely enough). Skills are good for procedures: things you sometimes do, with a defined input and output.
Two practical tests I use:
- Does this guidance apply every time I touch a certain file? That’s a rule.
- Is there a command I run when I want to know X? That’s a skill.
You’ll have a few that feel like both. When that happens I usually split them — a small rule for the always-on part, a skill for the procedural part — because mixing them makes the rule longer than it should be and the skill harder to find.
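In practice the split looks something like this: a short always-on rule next to a named skill. Rule file formats differ between tools, and the paths and wording here are a sketch rather than a real project's files:

```markdown
<!-- rules/checkpoint-serialization.md (loads automatically on matching files) -->
When editing checkpoint serialization code, keep the format backward-compatible:
only add optional fields, never remove or rename existing ones.
```

```markdown
<!-- skills/migrate-checkpoint-schema/SKILL.md (invoked by name) -->
---
name: migrate-checkpoint-schema
description: Use when a checkpoint field has to change shape and old blobs must still decode.
---
The step-by-step migration procedure lives here: add the new field, backfill
old blobs, verify decoding against a replayed conversation.
```

The rule stays a few lines long because it never stops loading; the skill can be as long as the procedure needs, because it only costs context when it's invoked.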
The shape of a good skill body
After I’d written a few, a pattern emerged for what makes a skill body actually useful. The agent reads the body, decides what to do, and either runs the command or answers from what it knows. The structure that serves both modes well:
- Frontmatter with `name` (matches the directory) and `description` (explains when to use the skill — not what it does, but when the agent should reach for it).
- A one-paragraph context block that says what the skill operates on and what assumptions it makes (e.g. expects a `.env` with database credentials).
- The Quick start command, copy-paste ready, including the working directory if it matters.
- An Options table with each flag and what it controls.
- A short reference to the actual implementation files in the codebase — so the agent can find the source of truth if the skill body is ever wrong.
The fifth item is the one most people skip, and shouldn't. A skill is documentation, and documentation drifts. Pointing the agent at the live source code means that when the skill body is out of date, the agent can fall through to the real thing instead of confidently following an outdated procedure.
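Put together, a body that hits all five looks something like this; the script, paths, and flags are illustrative, not from a real project:

````markdown
---
name: read-cached-evaluations
description: Use when a cached score looks stale or wrong and you need to see what was stored and when.
---

# Read cached evaluations

Operates on the evaluation cache. Assumes a `.env` in the repo root with
database credentials; run the command from the repository root.

## Quick start

```bash
# Hypothetical cache reader -- substitute your own script.
python scripts/read_eval_cache.py --key <CACHE_KEY>
```

## Options

| Flag      | What it controls                       |
|-----------|----------------------------------------|
| `--key`   | Cache key to look up                   |
| `--since` | Only show entries newer than this date |
| `--clear` | Delete the entry after printing it     |

## Source of truth

Cache logic lives in `src/cache/evaluations.py` (illustrative path). If this
file and the code disagree, the code wins.
````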
What I didn’t expect
A few things kept happening that I didn’t predict.
The skills became a sort of onboarding doc. I joined a project mid-flight, and the people who’d been there longer didn’t always know what every part of the system did either. The skill drawer slowly accreted into a tour of the system: read these names, you have a rough map of what we operate on. New conversations start with the agent already pointing at relevant skills, which is faster than any wiki I’d ever maintained.
The skill descriptions mattered more than the bodies. When you have thirty-three skills, the agent doesn’t read all of them; it reads the directory of names and descriptions, and pulls in the few that sound relevant. A bad description means the right skill sits there unused. A good description (“Use when debugging why a cached score is stale”) makes the skill discoverable. I rewrote a lot of descriptions a few weeks in, after I’d noticed which ones the agent kept missing.
Skills aged better than I expected. I assumed each skill would rot when the underlying code changed. Some did. But many of them were stable — a script that lists records by ID isn’t going to change much. The ones that aged poorly were the ones tightly coupled to internal logic; those got rewritten or deleted. The boring ones survived.
The hardest skills to write were the procedural ones. Skills that just run a command are easy. Skills that walk the agent through a multi-file change (“add a new filter to the pipeline”) were much harder, because the procedure is the kind of thing that lives in a senior engineer’s head. Writing those forced me to actually understand the procedure, which had value beyond the skill itself.
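A procedural skill ends up reading less like a command and more like a checklist. A rough sketch of the “add a new search filter” kind, with placeholder paths:

```markdown
---
name: add-search-filter
description: Use when adding a new filter to the search pipeline end to end.
---

# Add a search filter

1. Define the filter in `src/search/filters.py` (placeholder path), following
   the existing filter classes.
2. Register it in the pipeline configuration so it actually runs.
3. Expose it in the query parser so callers can request it.
4. Add a regression scenario that exercises the new filter.
5. Replay the QA scenarios and compare outputs before and after.
```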
Should you have thirty-three?
Probably not. Probably you should have however many your project naturally produces, written one at a time, with the threshold I described above: if I do this twice, I capture it.
Some projects are well-served by five skills. Some by fifty. The number is downstream of the system’s complexity and how often you operate on it. Counting your skills is roughly as informative as counting your test cases — useful as a smell, not as a target.
The pattern that holds across project sizes is the simpler one: when an AI coding agent operates on the same project repeatedly, its memory of the project ends up in your repo. Some of that memory ends up in rules, which steer behavior on file edits. The rest ends up in skills, which let the agent run procedures on demand. Both compound. Both are cheap to write the first time, and most of them keep paying you back for as long as the project lives.
That’s what made me keep writing them. Not a plan. Just the steady realization that yesterday’s twenty-minute debugging session is now today’s thirty-second skill invocation.