AI agent skills — why I wrote 33 of them
I didn’t sit down one afternoon and write thirty-three skills. Nobody does that. The drawer filled up the way drawers usually do: one thing at a time, mostly because I needed it that day, occasionally because I’d promised myself I’d be more organized.
This post is about how that happened. Not the taxonomy — that’s a separate post. Just the story. What a skill is, when I started reaching for one, when the count tipped from “useful tool” into “this is an actual library now,” and why I’d recommend the whole exercise to anyone who works with an AI coding agent on the same project for more than a few weeks.
The short version of what a skill is
A skill is a Markdown file. That’s it. It has a name, a one-line description, and a body that explains how to do something — usually with a snippet of shell or code at the top that does the actual work.
The agent doesn’t run skills automatically. You invoke a skill by name, and the agent reads the file into its context. From there it knows the procedure: where the script is, what flags to pass, what the output means, what to do with it next. The skill is a tiny playbook the agent can pick up on demand.
A minimal one looks like this:
````markdown
---
name: get-recent-commits
description: List recent commits on the current branch with author and short message.
---

# Get recent commits

Quick view of the last N commits on the current branch.

## Quick start

```bash
git log --pretty=format:"%h %an: %s" -n 20
```

## Options

- `-n N` — how many commits to show (default 20)
````
That’s a skill. The frontmatter has a name and a description, the body explains what it does, and the bash block at the top is what the agent runs when I ask it to “get recent commits.” The format varies by tool — some call them skills, some call them slash-commands, some call them snippets — but the bones are the same.
How the first one happened
The first skill I wrote was for an AI agent project I work on. The agent persists state across conversations using a graph database, and the way that state gets serialized is a little custom. When something looked weird in production, the natural debugging move was to crack open the database, decode the checkpoint blob, and see what the agent actually thought it was doing.
The first time I did that, I wrote a small Python script. The second time I did it, I forgot which database the script connected to and burned ten minutes finding out. The third time I did it, I made my first skill — a SKILL.md file with the script path, the command, the flags, and a note about which tables to look at if the first one was empty.
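That first skill looked roughly like this. The script path, flags, and notes below are stand-ins rather than the real ones, but the shape is faithful:

````markdown
---
name: inspect-checkpoint
description: Decode the latest persisted checkpoint for a conversation when production behavior looks wrong.
---

# Inspect checkpoint

Decodes the serialized checkpoint blob for a conversation and prints the state
the agent persisted. Expects a `.env` with the production database credentials.

## Quick start

```bash
# Hypothetical script and flags -- adjust to your own repo layout.
python scripts/decode_checkpoint.py --conversation-id <ID>
```

## Notes

- If the first checkpoint table comes back empty, check the other state tables
  before assuming the write failed.
````

The body is deliberately thin: the path, the command, and the one gotcha that had cost me ten minutes.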
The next time something went weird, I asked the agent: check the latest checkpoint for this conversation. The agent read the skill, ran the script, parsed the output, and told me what was off. Total time from question to answer, about thirty seconds. The skill paid for itself the first time it was used.
The threshold for writing one
After that I had a rough rule: the second time I do something, I write a skill.
Not the first time. The first time you don’t yet know what the procedure is. You’re discovering. Writing it down on the way is mostly noise.
But the second time, the procedure has emerged. You know which command, which flags, which gotchas, which related commands. That’s the moment to capture it. If you wait for the third or fourth time, you’ve already burned the savings the skill would have given you.
A skill at this stage is small — a paragraph or two of context, the command, a list of flags. Maybe an example output if it’s the kind of thing that needs interpreting. It takes ten minutes to write. The cost is real but trivial.
The threshold matters for the inverse reason too. If a thing has only ever been done once, do not write a skill for it. I tried that early and it produced a graveyard of skills I never used. Skills are caching. If you cache a thing you’ll never look up again, you’re just paying the storage cost.
Why thirty-three is roughly the right number
That number isn’t a goal. It’s where I happened to land for the project I was working on, and looking back, it makes sense.
The project has a few natural surfaces:
- Live state — what the agent is currently thinking. Skills for inspecting checkpoints, decoding graph state, reading the message thread the agent is responding to.
- Configuration / cache — what’s been computed before. Skills for reading and clearing cached evaluations, looking up rate limits, checking timing.
- External systems — the third-party platforms the agent talks to. Skills for fetching a record by ID, running a search query, posting a message back.
- Internal procedures — repeatable multi-step changes. Skills that walk the agent through “add a new search filter” or “wire a new intent through the pipeline.”
- Quality assurance — replaying scenarios, capturing outputs, regression checks.
Each surface produces somewhere between three and ten skills as the project matures. Multiply by five surfaces and you land somewhere in the high twenties or low thirties. That’s not magic — it’s the natural shape of a non-trivial system that one person plus an AI agent operates on.
When a skill beats a rule
This was the thing I had to get straight in my own head. Rules and skills look similar from the outside — both are Markdown files in a folder, both are loaded into the agent’s context. They do different things.
A rule loads automatically when you’re working on a matching file. It’s ambient. The agent reads it whether or not you ask. Rules are good for steady-state guidance: voice, conventions, safety policies, things that should always shape the agent’s behavior on a given file type.
A skill loads on demand, by name. It’s invoked. The agent picks it up only when you tell it to (or when the description matches the user’s request closely enough). Skills are good for procedures: things you sometimes do, with a defined input and output.
Two practical tests I use:
- Does this guidance apply every time I touch a certain file? That’s a rule.
- Is there a command I run when I want to know X? That’s a skill.
You’ll have a few that feel like both. When that happens I usually split them — a small rule for the always-on part, a skill for the procedural part — because mixing them makes the rule longer than it should be and the skill harder to find.
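In practice the split looks something like this: a short always-on rule next to a named skill. Rule file formats differ between tools, and the paths and wording here are a sketch rather than a real project's files:

```markdown
<!-- rules/checkpoint-serialization.md (loads automatically on matching files) -->
When editing checkpoint serialization code, keep the format backward-compatible:
only add optional fields, never remove or rename existing ones.
```

```markdown
<!-- skills/migrate-checkpoint-schema/SKILL.md (invoked by name) -->
---
name: migrate-checkpoint-schema
description: Use when a checkpoint field has to change shape and old blobs must still decode.
---
The step-by-step migration procedure lives here: add the new field, backfill
old blobs, verify decoding against a replayed conversation.
```

The rule stays a few lines long because it never stops loading; the skill can be as long as the procedure needs, because it only costs context when it's invoked.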
The shape of a good skill body
After I’d written a few, a pattern emerged for what makes a skill body actually useful. The agent reads the body, decides what to do, and either runs the command or answers from what it knows. The structure that serves both modes well:
- Frontmatter with `name` (matches the directory) and `description` (explains when to use the skill — not what it does, but when the agent should reach for it).
- A one-paragraph context block that says what the skill operates on and what assumptions it makes (e.g. expects a `.env` with database credentials).
- The Quick start command, copy-paste ready, including the working directory if it matters.
- An Options table with each flag and what it controls.
- A short reference to the actual implementation files in the codebase — so the agent can find the source of truth if the skill body is ever wrong.
The fifth item is the one most people skip, and shouldn't. A skill is documentation, and documentation drifts. Pointing the agent at the live source code means that when the skill body is out of date, the agent can fall through to the real thing instead of confidently following an outdated procedure.
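Put together, a body that hits all five looks something like this; the script, paths, and flags are illustrative, not from a real project:

````markdown
---
name: read-cached-evaluations
description: Use when a cached score looks stale or wrong and you need to see what was stored and when.
---

# Read cached evaluations

Operates on the evaluation cache. Assumes a `.env` in the repo root with
database credentials; run the command from the repository root.

## Quick start

```bash
# Hypothetical cache reader -- substitute your own script.
python scripts/read_eval_cache.py --key <CACHE_KEY>
```

## Options

| Flag      | What it controls                       |
|-----------|----------------------------------------|
| `--key`   | Cache key to look up                   |
| `--since` | Only show entries newer than this date |
| `--clear` | Delete the entry after printing it     |

## Source of truth

Cache logic lives in `src/cache/evaluations.py` (illustrative path). If this
file and the code disagree, the code wins.
````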
What I didn’t expect
A few things kept happening that I didn’t predict.
The skills became a sort of onboarding doc. I joined a project mid-flight, and the people who’d been there longer didn’t always know what every part of the system did either. The skill drawer slowly accreted into a tour of the system: read these names, you have a rough map of what we operate on. New conversations start with the agent already pointing at relevant skills, which is faster than any wiki I’d ever maintained.
The skill descriptions mattered more than the bodies. When you have thirty-three skills, the agent doesn’t read all of them; it reads the directory of names and descriptions, and pulls in the few that sound relevant. A bad description means the right skill sits there unused. A good description (“Use when debugging why a cached score is stale”) makes the skill discoverable. I rewrote a lot of descriptions a few weeks in, after I’d noticed which ones the agent kept missing.
Skills aged better than I expected. I assumed each skill would rot when the underlying code changed. Some did. But many of them were stable — a script that lists records by ID isn’t going to change much. The ones that aged poorly were the ones tightly coupled to internal logic; those got rewritten or deleted. The boring ones survived.
The hardest skills to write were the procedural ones. Skills that just run a command are easy. Skills that walk the agent through a multi-file change (“add a new filter to the pipeline”) were much harder, because the procedure is the kind of thing that lives in a senior engineer’s head. Writing those forced me to actually understand the procedure, which had value beyond the skill itself.
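A procedural skill ends up reading less like a command and more like a checklist. A rough sketch of the “add a new search filter” kind, with placeholder paths:

```markdown
---
name: add-search-filter
description: Use when adding a new filter to the search pipeline end to end.
---

# Add a search filter

1. Define the filter in `src/search/filters.py` (placeholder path), following
   the existing filter classes.
2. Register it in the pipeline configuration so it actually runs.
3. Expose it in the query parser so callers can request it.
4. Add a regression scenario that exercises the new filter.
5. Replay the QA scenarios and compare outputs before and after.
```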
Should you have thirty-three?
Probably not. Probably you should have however many your project naturally produces, written one at a time, with the threshold I described above: if I do this twice, I capture it.
Some projects are well-served by five skills. Some by fifty. The number is downstream of the system’s complexity and how often you operate on it. Counting your skills is roughly as informative as counting your test cases — useful as a smell, not as a target.
The pattern that holds across project sizes is the simpler one: when an AI coding agent operates on the same project repeatedly, its memory of the project ends up in your repo. Some of that memory ends up in rules, which steer behavior on file edits. The rest ends up in skills, which let the agent run procedures on demand. Both compound. Both are cheap to write the first time, and most of them keep paying you back for as long as the project lives.
That’s what made me keep writing them. Not a plan. Just the steady realization that yesterday’s twenty-minute debugging session is now today’s thirty-second skill invocation.