Random Thoughts

Developer tooling

When your AI writes 200 lines and it could be 50

Friday, May 1, 2026

  • ai-assisted
  • #ai
  • #ai-agents
  • #vibecoding
  • #best-practices
  • #code-review
  • #code-quality
  • #refactoring
  • #karpathy
  • #python
  • #typescript
Sumi-e ink-wash diptych on cream rice paper with hand-deckled edges, separated by a soft vertical fold line down the center of the page. The left half shows a tangled, chaotic mass of overlapping ink brush gestures — many strokes of varying weights piled on top of one another to form a dense bird's-nest-like bundle, with drying-pull marks where the brush ran dry, occasional ink splatter dots around the edges, and smaller ink puddles where the strokes crossed. The composition reads as visual excess: too many gestures saying the same thing. The right half shows a single confident bamboo stalk rendered in just three deliberate brushstrokes — one upward tapering vertical stroke for the stalk, one short angled stroke for a node, and one tapered angular leaf at the top. Generous negative space surrounds the bamboo. A small red square seal-stamp accents the lower right of the right half. The two halves share the same rice-paper ground with its faint paper grain. No readable text or characters anywhere in the composition.
Same gesture, four times less ink. The left version's tangle is roughly what the recurring patterns in this post look like.

I’ve been keeping a small list. Every time the AI agent I work with produces a chunk of code that’s noticeably more elaborate than the problem requires, I jot down what shape the overengineering took. After a few months, the list isn’t long: seven recurring patterns. Once you can name them, you can spot them in seconds.

This post is a tour of those patterns. Real before/after examples, mostly from sessions where I caught the bloat in time and pushed back. The point isn’t to dunk on AI-generated code — these patterns also show up in code I’ve written by hand, just less aggressively. The point is to give the patterns names so they’re easier to fix.

The mental check

Before I get into the patterns, the one-line test that matters more than any of them:

Would a senior engineer say this is overcomplicated?

That’s the check. It’s in the always-on rule file the agent loads at the start of every conversation. Most of what follows is a catalog of specific shapes for which, once you look at the resulting code, the answer is clearly yes.

The check works because senior engineers — the kind who’ve maintained codebases over years — have a strong allergy to code that does more than the moment requires. They’ve seen too many cleverly-abstracted helpers calcify into legacy. They’ve debugged too many mock-everything tests. They know that 50 lines that anyone can read beats 200 lines that only the original author can extend.

The agent, asked to apply that check to its own output, becomes pretty good at simplifying. Not perfect, but pretty good.

Sumi-e study sheet on cream rice paper with hand-deckled edges, in a vertical landscape composition. Seven small individual ink studies are arranged across the page in a relaxed grid — three above, three below, one offset — each contained in its own quiet area of paper with generous breathing room between studies. Each study is a different natural subject rendered in extreme brush-economy of one to three strokes only, all sharing the same sumi-e ink language: a single bamboo stalk in three strokes; a single open enso circle drawn in one continuous brush sweep with a small gap at its closing point; a single mountain peak silhouette in one rolling stroke; a single fish in motion suggested by two curling strokes; a single dragonfly in three small marks; a single rolling wave with a tiny crest fleck; and a single birch leaf with a delicate stem and three vein hatches. The strokes vary in wetness — some confidently saturated, some dry-brushed and sketchy — but each study honors the sumi-e principle of saying as much as possible with as little ink as possible. A single small red square seal-stamp accents the lower-right corner of the page like an artist's signature. No readable text or characters anywhere in the composition.
Seven recurring shapes. Once you can name them, you can spot them in seconds — and refuse them.

Pattern one: a class for a single function

This is the most common one I see, especially in TypeScript and Python. A single function — useful, narrow, called from one place — gets wrapped in a class.

The agent’s first take:

class CategoryAssigner {
  private categoryMap: Map<string, string>;

  constructor() {
    this.categoryMap = new Map([
      ['this-blog', 'blog feature posts'],
      ['developer-tooling', 'practical posts'],
      ['strategy', 'workflow philosophy'],
      ['ai-work', 'reflective AI posts'],
    ]);
  }

  public assign(post: { tags: string[]; topic: string }): string {
    if (post.tags.includes('blog-feature')) return 'this-blog';
    if (post.tags.includes('philosophy')) return 'ai-work';
    if (post.tags.includes('reflection')) return 'ai-work';
    if (post.topic.includes('rule') || post.topic.includes('skill')) {
      return 'developer-tooling';
    }
    return 'strategy';
  }

  public describe(category: string): string {
    return this.categoryMap.get(category) ?? 'unknown';
  }
}

What’s wrong: the class isn’t doing anything a function couldn’t. The categoryMap is constructed in the constructor but never depends on anything per-instance. The describe method is dead — it’s never called by anything in the original task. The assign method is the actual work.

The simplification:

function assignCategory(post: { tags: string[]; topic: string }): string {
  if (post.tags.includes('blog-feature')) return 'this-blog';
  if (post.tags.includes('philosophy')) return 'ai-work';
  if (post.tags.includes('reflection')) return 'ai-work';
  if (post.topic.includes('rule') || post.topic.includes('skill')) {
    return 'developer-tooling';
  }
  return 'strategy';
}

A function. Nine lines including the signature. Same behavior. Reviewable in five seconds. If a second method ever genuinely needs to share state with the assignment logic, that’s the moment to consider a class — not before.

Pattern two: configuration for things that won’t be configured

The agent loves making things configurable. Even things nobody asked it to make configurable.

The first take, where I’d asked for a function that posts a Slack message:

class SlackPoster:
    def __init__(
        self,
        token: str,
        default_channel: str | None = None,
        username_override: str | None = None,
        icon_emoji: str | None = None,
        thread_ts: str | None = None,
        as_user: bool = False,
        link_names: bool = True,
        unfurl_links: bool = True,
        unfurl_media: bool = True,
        retry_count: int = 3,
        retry_delay: float = 1.0,
    ):
        self.token = token
        self.default_channel = default_channel
        self.username_override = username_override
        self.icon_emoji = icon_emoji
        self.thread_ts = thread_ts
        self.as_user = as_user
        self.link_names = link_names
        self.unfurl_links = unfurl_links
        self.unfurl_media = unfurl_media
        self.retry_count = retry_count
        self.retry_delay = retry_delay
        self._session = self._build_session()

    def _build_session(self): ...
    def post(self, channel: str, text: str, **overrides): ...

I had asked for a function. What I got was a class with eleven configuration parameters, three of which I’d actually use, and a session-builder method I hadn’t asked for.

The simplification:

def post_slack_message(token: str, channel: str, text: str) -> dict:
    response = requests.post(
        "https://slack.com/api/chat.postMessage",
        headers={"Authorization": f"Bearer {token}"},
        json={"channel": channel, "text": text},
    )
    response.raise_for_status()
    return response.json()

Seven lines plus the signature. If we ever need the eleven configuration options, we can add them when we need them. YAGNI — you aren’t gonna need it — is one of the oldest pieces of advice in software, and it applies to AI-generated code with extra force because the agent has zero context on whether you’ll need the configurability later.
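
And when a need does materialize later, the change is small. A sketch of what that looks like (thread_ts is a real chat.postMessage field, but the timing and the shape of the change here are illustrative):

import requests

def post_slack_message(
    token: str, channel: str, text: str, thread_ts: str | None = None
) -> dict:
    payload = {"channel": channel, "text": text}
    if thread_ts:  # added the day threaded replies were actually needed
        payload["thread_ts"] = thread_ts
    response = requests.post(
        "https://slack.com/api/chat.postMessage",
        headers={"Authorization": f"Bearer {token}"},
        json=payload,
    )
    response.raise_for_status()
    return response.json()

One parameter, added alongside the caller that uses it. That’s the whole ceremony.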

Pattern three: error handling for impossible scenarios

The agent has been trained on a lot of production code, which has a lot of defensive error handling, much of it for cases that can’t actually happen. That tendency carries over.

The first take, on a function that parses a known-good config file:

def parse_config(path: str) -> Config:
    if not isinstance(path, str):
        raise TypeError(f"Expected str, got {type(path)}")
    if not path:
        raise ValueError("Path cannot be empty")

    try:
        if not os.path.exists(path):
            raise FileNotFoundError(f"Config not found: {path}")

        if not os.path.isfile(path):
            raise ValueError(f"Path is not a file: {path}")

        if os.path.getsize(path) == 0:
            raise ValueError(f"Config file is empty: {path}")

        with open(path, "r", encoding="utf-8") as f:
            try:
                content = f.read()
            except UnicodeDecodeError as e:
                raise ValueError(f"Config is not valid UTF-8: {e}")

        try:
            data = yaml.safe_load(content)
        except yaml.YAMLError as e:
            raise ValueError(f"Invalid YAML: {e}")

        if not isinstance(data, dict):
            raise ValueError(f"Config must be a YAML mapping, got {type(data)}")

        return Config(**data)
    except (OSError, IOError) as e:
        raise RuntimeError(f"Failed to read config: {e}")

The function parses a config file the same script writes. The path is hardcoded, the file is always UTF-8, it’s never empty, it’s always a YAML mapping. Every single one of those error paths is unreachable in this codebase. They look thorough; they’re noise.

The simplification:

def parse_config(path: str) -> Config:
    with open(path, encoding="utf-8") as f:
        return Config(**yaml.safe_load(f))

Three lines. If the file is missing, Python raises FileNotFoundError and the message is clear. If the YAML is malformed, yaml.safe_load raises YAMLError with the line number. If the keys don’t match, Config(**...) raises TypeError with the offending field. The defaults are good. Adding twenty lines of pre-checks just hides the cleaner error you’d have gotten anyway.

The rule isn’t don’t handle errors. The rule is don’t handle errors that can’t happen. If validation matters at this layer because untrusted input flows through it, fine — handle it. If the input is known-good, lean on the standard library’s defaults.
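
For contrast, a sketch of handling that does earn its place: a hypothetical parse_user_config that takes a path typed by a user on the command line, where failure is expected and a friendly message beats a traceback (Config and yaml as in the examples above):

import yaml

def parse_user_config(path: str) -> Config:
    # Untrusted input: the user typed this path, so a missing file or
    # bad YAML is a normal outcome here, not a programming error.
    try:
        with open(path, encoding="utf-8") as f:
            data = yaml.safe_load(f)
    except FileNotFoundError:
        raise SystemExit(f"Config not found: {path}")
    except yaml.YAMLError as e:
        raise SystemExit(f"Invalid YAML in {path}: {e}")
    return Config(**data)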

Pattern four: speculative abstraction

The most subtle one. Code is written in a way that anticipates needs that haven’t materialized.

A version of this came up when I asked the agent to write a function that scores blog posts by tag relevance. The first take:

class ScoringStrategy(ABC):
    @abstractmethod
    def score(self, post: Post, query: Query) -> float: ...

class TagOverlapStrategy(ScoringStrategy):
    def score(self, post: Post, query: Query) -> float:
        return len(set(post.tags) & set(query.tags))

class TitleMatchStrategy(ScoringStrategy):
    def score(self, post: Post, query: Query) -> float:
        return 1.0 if query.text.lower() in post.title.lower() else 0.0

class CombinedScoringStrategy(ScoringStrategy):
    def __init__(self, strategies: list[tuple[ScoringStrategy, float]]):
        self.strategies = strategies

    def score(self, post: Post, query: Query) -> float:
        return sum(s.score(post, query) * w for s, w in self.strategies)

The Strategy pattern, a base class, three implementations, a combiner. Beautiful. Also unnecessary — the function I asked for has exactly one use case: rank posts by tag overlap on the search page. There’s no second strategy. There may never be one.

The simplification:

def score_post(post: Post, query: Query) -> float:
    return len(set(post.tags) & set(query.tags))

One line. If a second scoring strategy ever needs to coexist with this one, that’s the moment to consider extracting an interface. Pulling the interface out before the second implementation exists is solving an imaginary problem and shipping the structure that solves it.
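
For scale, here is the entire call site, wrapped in a hypothetical search helper (Post and Query as in the example above):

def search(posts: list[Post], query: Query) -> list[Post]:
    # Rank by tag overlap, highest score first.
    return sorted(posts, key=lambda p: score_post(p, query), reverse=True)

The sort call is the entire integration; there is nothing for a strategy object to hold.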

This is the pattern Karpathy points at directly: no abstractions for single-use code. It’s the same advice the most experienced engineers I’ve worked with have given me for years. The agent will produce abstractions in the absence of explicit pushback. The pushback is the rule.

Pattern five: tests that test the framework, not the code

This one isn’t about the code itself — it’s about the tests the agent generates around the code.

The first take, on tests for a config parser:

def test_open_is_called_with_correct_path():
    with patch("builtins.open", mock_open(read_data="version: 1")) as m:
        parse_config("/path/to/config.yaml")
        m.assert_called_once_with("/path/to/config.yaml", encoding="utf-8")

def test_yaml_safe_load_is_called():
    with patch("yaml.safe_load") as m:
        m.return_value = {"version": 1}
        with patch("builtins.open", mock_open(read_data="version: 1")):
            parse_config("/path/to/config.yaml")
            m.assert_called_once()

Both tests pass. Both tests are testing that Python’s open and PyYAML’s safe_load got called. They aren’t testing whether the config is correctly parsed. If parse_config did the wrong thing — returned the wrong type, lost a field, miscoerced a value — these tests would still pass.

The simplification:

def test_parses_a_simple_config(tmp_path):
    config_file = tmp_path / "config.yaml"
    config_file.write_text("version: 1\nname: test")
    config = parse_config(str(config_file))
    assert config.version == 1
    assert config.name == "test"

One test. Uses pytest’s tmp_path to write a real file. Calls the real function. Asserts the real outputs. If parse_config regresses, this test catches it. The mocked tests would not have.

The rule: test the behavior the user cares about, not the implementation that produces it. Mocks are a tool for isolating things you can’t reach (network, time, randomness), not for proving the code calls the functions you expect it to call.
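
Here is a sketch of a mock that earns its keep, assuming the post_slack_message function from pattern two is importable: the network is the one thing this test genuinely can’t reach.

from unittest.mock import patch

def test_sends_text_to_the_given_channel():
    with patch("requests.post") as m:
        m.return_value.json.return_value = {"ok": True}
        post_slack_message("xoxb-fake-token", "#general", "hello")
    payload = m.call_args.kwargs["json"]
    assert payload == {"channel": "#general", "text": "hello"}

The assertion is still about behavior (what message got sent where), not about which library functions happened to run.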

Pattern six: comments that narrate the code

A small one, but it adds up. The agent has a tendency to add comments that simply restate what the next line of code does.

The first take:

# Open the config file
with open(path) as f:
    # Read the contents
    content = f.read()

# Parse the YAML
data = yaml.safe_load(content)

# Return a Config instance
return Config(**data)

Every comment is true. Every comment is also redundant. A reader of this code can see that with open(path) as f opens the config file. Stating it again in a comment adds line count, not understanding.

The simplification: just remove the comments.

with open(path) as f:
    content = f.read()
data = yaml.safe_load(content)
return Config(**data)

The rule, in the always-on rule file: don’t add comments that just narrate what the code does. Comments should explain why, not what. The what is right there in the code. The why — the constraint that’s not visible from the code, the trade-off that was considered, the gotcha that bit somebody once — is what comments are for.
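
A hypothetical example of the kind of comment that survives this rule (the constant and the reason are invented, but the shape is the point):

# Readers poll the feed hourly; rebuilding more often burns CI minutes
# without anyone ever seeing the fresher output.
REBUILD_INTERVAL_SECONDS = 3600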

Senior code review consistently flags narration comments as noise. The rule encodes that signal so the agent doesn’t generate them in the first place.

Pattern seven: the worst one, the unrequested feature

The most expensive overengineering isn’t structural. It’s scope creep. You asked for one thing; the agent did seven.

A version of this came up when I asked the agent to add a draft: true flag to a single blog post. The first take included:

  • The change to the post.
  • A new utility function to filter drafts out of an index page.
  • A modification to the search page to also filter drafts.
  • A change to the build script to warn about draft posts.
  • A new section in the project’s documentation explaining the draft system.

All of that is plausibly useful. None of it was asked for. Each addition has its own implications, its own potential bugs, its own review surface. The task expanded from “edit one frontmatter field” to “design a draft system.”

The simplification:

 ---
 title: Some post
 date: 2026-05-02
 category: developer-tooling
+draft: true
 ---

One line. The post now has the field. If filtering and warnings are needed, they’re separate changes with their own scope, their own review, their own commit message. Bundling them into the diff for “add draft flag” makes that diff harder to review and harder to revert.

This pattern is hard to catch because each individual addition feels helpful. The fix is the surgical-changes rule from the previous post: every changed line should trace directly to the user’s request. Lines that don’t trace back to the request — even good lines, even well-intentioned lines — don’t belong in the diff.

What pushback actually looks like

The rule file is necessary but not sufficient. Even with the rule loaded, the agent will sometimes produce overengineered output, especially on the first pass. The remaining work is mine: I have to actually push back when I see one of these patterns.

What pushback looks like, in practice, is short and direct.

  • “This doesn’t need a class. Make it a function.”
  • “Drop all the configuration parameters except the two we use.”
  • “Remove the error handling for cases that can’t happen.”
  • “Drop the abstract base class and the single concrete implementation. Just inline the function.”
  • “I asked for one change. Reduce the diff to that change.”

The agent always complies. The next iteration is always tighter. The cost of one round of pushback is small. The cost of accepting the overengineered version is that you’ve now committed it, and removing it later is more expensive than refusing to add it now.

The compound effect

Each of these patterns, individually, costs maybe ten minutes of review and a paragraph of pushback. None of them is a catastrophe.

The compound effect, across a few months of working with an AI coding agent, is that the codebase either stays clean or starts accumulating bloat. Codebases that accumulate bloat compound the wrong way — every new addition is harder than the last, every refactor is bigger than it should be, the working surface becomes legacy in months instead of years.

So I’ve come to think of the simplicity check as one of the most important behaviors in the always-on rule. The other three behaviors — think before coding, surgical changes, goal-driven execution — are about how the agent works during a task. Simplicity is about what survives after. It’s the rule that keeps the codebase liveable.

The senior-engineer test does most of the work. Apply it constantly. Refuse the 200-line solution that wants to be 50. Most of the time, simplifying isn’t a regression — it’s the actual answer that was hiding behind the elaborate one.

Further reading