Good enough to get going

The winning scenario for agent-assisted code, design, science, etc. is humans having more time for creative and impactful thinking because computers/LLMs handle the tedious setup, take on the easily verified work, and gather the preliminary materials that humans turn into inventions.

FWIW, I don’t think the worst scenarios are likely. The future isn’t atrophying literacy rates or people turning off their brains to tell LLMs what to do. It’s probably not Malthusian job scarcity or Keynesian leisure abundance, either.

The best outcome, IMO, is that producing almost-good-enough software, design, science, etc. is possible for more people, particularly those without specialist degrees.

You won’t have gym owners producing billion dollar SaaS companies, but they might produce software good enough to run their business without needing to contract out to a software developer.

You won’t have software developers producing the same level of design and art direction you see in major films. You might see them producing design good enough and sufficiently distinct that they can wait to bring a designer on until they’ve found their market.

You won’t have writers discovering new axioms of math and science, but you might see them correctly apply statistics and physics so that stories about finance and space battles are slightly more realistic.😉

In short: experts in topic A won’t find themselves held back when an idea also requires expertise in topic B, where topic B is too deep for them to “just get good at”. Fewer Wozniaks will have to find their Steve Jobs, fewer Springsteens will have to find their Landaus.

It won’t exactly be you can just do stuff. But, perhaps you can get far enough along that collaborators who fill in the specialties you lack can find you.


This is a fancier way of saying that it’s unfortunate that naps don’t make one feel more energetic afterward:

So, if you read through the Philokalia, it is very common to find the same author alternating between talking of thoughts (logismoi here referring to negative, vicious thoughts that could lead you astray) and demons. Acedia, the vice of listlessness and despondency, is called, for instance, ‘the noonday demon.’ Sometimes it is treated as an internal struggle; other times it reads as if a personified demon is the cause of these temptations; if Graiver is right, then these are merely two ways of experiencing and conceptualizing the same phenomenon.

— Jared Henderson, Asceticism of the Mind

I give them grief, but I do love afternoon naps.


💤 Midday disappointments:

  • A dip in energy invites a nap, but napping doesn’t actually restore one’s energy
  • A superb nap is restorative, but difficult to predict or consistently generate
  • A short nap might make you feel better, but rarely more energetic
  • Coffee after a certain hour is contrary to sustainable energy and really only papers over the problem

Multitasking with my eccentric pal

Over two experiments across two half-day coding sessions, I got a better glimpse of what working with coding agents might look like over the next couple of years.

1. I teach the agent to build a simple Rails CMS

This is pretty unremarkable, given my experience. But, like so many lately, I wanted more reps at exploring agent coding workflows. In particular: what works for guiding an agent and keeping it on-track?

I did a bit of the upfront setup with wits alone. Executing the correct boilerplate is where our LLM assistants seem to most reliably get themselves into early trouble. Plus, Rails’ whole thing is accelerating this part of a project. If somehow it were slow-going, it would be a tremendous failing of the framework. But it wasn’t!

Next, I set up the basic models I wanted for this particular little CMS. Again, getting new models built just right is a spot where an LLM often wanders off the path, so I did that part myself. Perhaps this is also where I’m most opinionated about getting things right. 🤷🏻‍♂️

Finally, I made sure the app had good guardrails in place. Namely, Justfile tasks to run tests, rubocop checks, and brakeman. Claude Code seems to do really well given that kind of guidance, so setting it all up front makes sense.
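For flavor, those guardrail tasks amount to something like this sketch of a Justfile (the task names and flags are my guesses at a typical Rails setup, not the exact file):

```just
# Justfile — guardrail tasks an agent can run to check its own work

test:
    bin/rails test

lint:
    bundle exec rubocop

security:
    bundle exec brakeman --quiet

# Run everything before declaring a change done
check: test lint security
```

The point is less the specific tools and more that the agent has one obvious command, with a clear pass/fail result, to run after every change.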

From there, I let the machine take the wheel. I gave it brief instructions, sometimes using plan mode to build up context first, and let it go. Document and element model CRUD, done. Tweaking a DaisyUI theme to look a little more Swiss design or Bauhaus-y, done. Build up a document editor that works with the document and element models at the same time, done.

Result: taking a little time up-front to lay the right foundation and guardrails makes agent coding way better.

2. The agent teaches me about LLM evals

This one is (unintentionally) the opposite of what I was doing with Claude and Rails. In that case, I was working with Claude as the expert, guiding it towards an outcome I knew was easily within reach for this project.

For this project, I was experimenting with using Claude to teach me how to do LLM evals. My understanding is that these are super important for building AI-based apps; they are as close as you can get to a TDD-style feedback loop to verify your work when working with nondeterministic language models.
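To make the TDD analogy concrete, here’s a minimal sketch of that feedback loop. The task list, stand-in model, and exact-match grader are all hypothetical placeholders, not code from my actual project:

```python
# A tiny exact-match eval harness: run each case through the model,
# grade the output, and report a pass rate.

def fake_model(prompt: str) -> str:
    # Placeholder for a real LLM call (e.g., a local model via Ollama).
    return {"2 + 2 = ?": "4", "Capital of France?": "Paris"}.get(prompt, "")

def grade(output: str, expected: str) -> bool:
    # Exact-match grading; real evals often use fuzzier or LLM-based judges.
    return output.strip().lower() == expected.strip().lower()

def run_eval(cases, model) -> float:
    # Score the model across all cases, returning the fraction passed.
    results = [grade(model(prompt), expected) for prompt, expected in cases]
    return sum(results) / len(results)

cases = [("2 + 2 = ?", "4"), ("Capital of France?", "Paris")]
score = run_eval(cases, fake_model)
```

Like a test suite, the score gives you a number to watch while you change prompts or swap models, instead of eyeballing nondeterministic output.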

To start, I used Claude (not Code) and asked for a starting point to learn LLM evals. I wanted something simple, like a hello world of that problem domain. I would rather not set up multiple ML frameworks (looking at you, Conda), so I asked to run the models locally and said I was fine with using Python.

Claude generated some Python code, which I pasted into a new uv project and got started. From that point forward, I used Claude Code to restructure code, like moving code out of Jupyter notebooks into source files, or putting code plus explanations back into Jupyter notebooks. I’d also ask why a particular approach was taken or what a bit of jargon means.

In this case, I was really using Claude Code to teach myself something new, and it was the expert guiding me. For something like this, where I’m not caught up in the specifics of machine learning and its problem domain, it’s a perfect approach.

Result: I got way ahead on a new-to-me topic over the course of a couple of hours.

3. 💥 At the same time

The big reveal is, I was working on these two projects at the same time. Two terminal tabs, two Claude Code instances. “Talk” to the first one, let it run while I talk to the second one. First one finishes, talk to it, rinse and repeat.

At first, this was like pairing with two developers working on entirely different projects. And they’re sitting on either side of me, sometimes attempting to talk at the same time. My brain hurt, and it felt like I was moving slowly enough that single-tasking would have been far more effective.

After 60 minutes or so, things seemed to pick up. I felt like I was doing a better job of bouncing back-and-forth between the two agents. I was quickly improving my prompts, allowing the agents to work more effectively while I was working with the other agent. However, I wouldn’t say I mastered multitasking in one hour. More like, I came to grips with keeping each agent going, like keeping a room full of eager developers moving independently despite needing frequent directions.

It’s still crucial to taste test the work frequently. Product-focused developers/managers will have a leg up here because they’re used to working this way. I think anyone who has good attention to detail and the ability to communicate what they want is going to have an advantage here.

From all this, I learned a few things:

  • Working with agents is going to look similar to multitasking and delegation.
  • To keep up with multitasking, humans will need to get as good at context management as agents (we both have limited context windows).
  • Often, we’ll mentor agents to get the best results, particularly by giving them good guardrails (automated and dynamic context generation).
  • In some cases, the agents can teach us just enough to get stuff done.
  • For better or worse, there’s still much to explore, and the frontier is moving quickly.

📚Currently reading:

Ursula LeGuin, The Language of the Night. A series of essays about writing, science fiction, and LeGuin’s own works. The only novel of hers I’ve read is The Dispossessed, but I’m enjoying the heck out of her non-fiction writing and thinking. It’s pretty timely, despite being fifty years old in most cases. Bonus points for LeGuin being a longtime Portland, Oregon resident as well. And, now I really want to read essays and short non-fiction from other favorites like Vonnegut, Stephenson, etc.

James Gleick, Genius. Gleick isn’t quite Caro, but his non-fiction writing really pops. And how could it not, with a subject like Feynman?

Frank Herbert, Children of Dune. Messiah was a bit short, and felt like nothing happened, even though it rather upends the story you’re expecting to get. This one feels like a bunch of scheming and setting the dramatic table. Not as much world building as the original Dune. Unsure if I’m up for reading all six of these books. I’ve heard it gets even weirder though!

Craig Mod, Things Become Other Things. I’m, obviously, a huge fan of Craig’s writing. There’s less “meta” about the actual walking in this one. More of an interwoven story of Mod’s childhood friend, observations of the people and places he meets whilst walking, and his own life story. It’s a lovely bit of memoir.


The problem with liking anything besides “ahead of their time” musical groups from the 90s like Radiohead or Wu-Tang Clan:

  1. Everyone who was a teenager in the 90s has one or two favorite bands or albums that they hold dearly and honestly.
  2. The problem is, basically every musical group from that era has its legitimate detractors.
  3. Thus, 90s kids will have to defend their choice to others for the rest of their known lives.

(Mine are Red Hot Chili Peppers, Primus, and Rollins Band. 🤘🏻🤷🏻‍♂️)


Yesterday I coded with only my wits for the first time in a while. It was pretty great. Not quick, but educational. Not efficient, but full of that great feeling when challenges are tackled with one’s own intelligence and experience. Do recommend.


Coding agents create an opportunity for shorter coding times, faster iteration, and shorter feedback loops. That opportunity is wasted if we don’t solve for all the reasons a software project can go sideways that aren’t “we couldn’t type fast enough”.

“Lowering the cost of writing code was the thing that no engineering leader asked for, but it’s the thing we got.”

— Kellan Elliott-McCrea, Vibe Coding for Teams, Thoughts to Date

“The last part: once you’ve created a situation where failure is safe … you need to be prepared to let failure happen.”

— Jacob Kaplan-Moss, Make Failure A (Safe) Option

If teams are (still) afraid to slip a deadline, accidentally ship a bug, or cause an unforeseen performance problem, there’s still a problem. All the LLMs and coding agents in the world won’t fix it for you.

The more failure is a socially acceptable reality and tool-supported via feature flags, fast rollbacks, observability, blue-green releases, etc., the better your team will operate. A small part of this is building better tooling for detecting, recovering from, and remediating surprises. The majority of the effort is in leaders supporting the team and the team supporting each other in stressful times. 🧠


Computers are also free to be weird:

I’m a programmer. You’re probably a programmer. We think in systems, deterministic workflows, and abstractions. What’s more natural for us than viewing LLMs as an extremely slow kind of unreliable computer that we program with natural language?

This is a weird form of metaprogramming: we write “code” in the form of prompts that execute on the LLM to produce the actual code that runs on real CPUs.

— Mario Zechner, Taming agentic engineering - Prompts are code, .json/.md files are state

This workflow, for tedious porting of animation engine code to many languages and targets, has a lot of human-in-the-loop affordances. If I were a betting man, I’d wager on human supervision entering the lexicon as context engineering and vibe coding fade in the hype cycle.


I kept an 8-week streak of writing and publishing a newsletter essay going. I love the output, but not that it absorbs almost all of my writing energy. Back to re-thinking the format on that one.

“It ships because it’s 11:30am” was useful in setting that streak. But “focus on the 2 or 3 most important things, let everything else sit sadly in the corner” is also a handy regimen. In this case: 1) job searching, 2) get a newsletter out there consistently.

Courtney and I are heads-down at furnishing and tidying our basement, so folks can visit. I’m going to count that work as a win for “building things”, even though it’s not software or writing.📈


📺 Currently watching:

  • MurderBot: I might have cast this differently, but I’m still enjoying it.
  • Poker Face: still an absolute delight.
  • The Bear: they push me away with all the yelling, they pull me back in with “people doing a creative thing together” montages.

Prototyping with LLMs

A few reflections on what I’ve been building (with) lately:

  • llm is great for prototyping many kinds of workflows. If you’re thinking “I’d like to build an app with some intelligence” and “I don’t mind tinkering with CLI apps”, give it a go. In particular, templates and fragments are very useful for assembling the rudiments of a solution.
  • Part and parcel with using llm, I’m tinkering with locally runnable models via Ollama. On my M3 MacBook Pro with 24 GB of total memory, Qwen3 and Mistral are small enough to fit into GPU memory and run pretty quickly. With “thinking” disabled (the open models will spend many tokens talking to themselves!), they are fast enough for development work and experimentation. They definitely aren’t at the same level as the latest from Anthropic or OpenAI. But, the future is promising for using these smaller models instead of relying on metered API access for every single token of intelligence.
  • Putting those two together, I’m hacking out some tools to help with job search. My goal is to reduce the effort to see if a job description matches what I’m looking for, generate ideas for customizing a cover letter to the role, and provide useful answers for any pre-interview questions. Next step: put an actual UI on my llm-based prototype.
  • To wit: I’m studying up on Python, FastAPI, HTMX, etc. by asking Claude Code to write learning projects and then asking it questions about why it wrote them that way. Turns out, this helps me with language idioms and library setup pitfalls. Wild times!
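To give a flavor of the job-search prototype, the matching piece mostly boils down to assembling a prompt from the posting and my criteria. The template, function name, and example inputs below are hypothetical, not my actual tool:

```python
# Assemble a job-fit screening prompt for a local model. This is prompt
# plumbing only; the model call (via `llm` or Ollama) is left out.

MATCH_TEMPLATE = """You are helping screen job postings.

My criteria:
{criteria}

Job description:
{posting}

Answer with MATCH or NO MATCH, then one sentence of reasoning."""

def build_match_prompt(posting: str, criteria: list[str]) -> str:
    # Render criteria as a bullet list and drop them into the template.
    bullet_list = "\n".join(f"- {c}" for c in criteria)
    return MATCH_TEMPLATE.format(criteria=bullet_list, posting=posting.strip())

prompt = build_match_prompt(
    "Senior Rails developer, remote, product-focused team.",
    ["remote-friendly", "Ruby or Python", "product engineering"],
)
```

Keeping the template in one place makes it easy to iterate on wording while the surrounding plumbing (and eventually a FastAPI UI) stays put.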

A thing about homeownership is that every room is a workshop until it becomes whatever room it’s going to be.

This seems to hold for brand-new homes and old homes. Sometimes, for rooms you’ve been using one way for years, and now you want to use them for a different purpose. Or, because you suddenly get intensely bored with their current layout.


It’s unfortunate but true: we can’t sensibly integrate all the great productivity tools out there into our workflows.

I wish I could find a way to put Bike, Antinote, and Agenda into my (daily) workflows. 🤷‍♂️


Systems expansive enough to warrant a course, discussion forum, or orthodoxy may not solve your problem.

I was thinking of knowledge management systems when I wrote this down. But it’s probably more broadly true than just organizing your tasks and notes. 🤔

Related: I Deleted My Second Brain.


I’ve been writing about estimating and planning a lot lately, so this is worth sharing:

Numbers are not the goal. Successful outcomes are.

Those footing the bill would rather have a successful outcome than an excuse to blame the estimator. In exploring better estimation, I’m not looking to protect the estimator or development organization from censure or litigation after a disappointment. Instead, we use estimation to help guide the project to success from the point of view of both those paying for development and those performing it.

— George Dinwiddie, Software Estimation Without Guessing

When I’m feeling lousy about an estimate, it’s usually because I’m treating it like a promise instead of what it actually is: a piece of the puzzle for deciding what to do next.

The tricky bit is setting expectations up front: this estimate is based on what we know today, not what we’ll discover tomorrow. Everyone knows surprises will happen. The estimate should help the team make better decisions when they do, not box them into promises they can’t keep.

The best estimates I’ve given weren’t the most accurate—they were the ones that helped teams navigate uncertainty instead of pretending it away.


We tested for tedium

Folks are ruffled that LLMs, even without tool use, are pretty good at coding interview challenges. How will we even know who is good at coding anymore? (Current hypothesis: by collaborating with them as early as possible.)

Friend, I am here to tell you that LLMs are not the problem. The issue is we were evaluating programmers on speed and competency through the tedious parts of programming. If we ever looked for what makes a programmer effective, it was accidentally. 🫠


In retrospect, we had mediocre insight into what activities are effective use of a programmer’s time. At best, we were using a metric from decades ago, when knowledge of algorithms was rare and solving arcane puzzles was more valuable. The past twenty years of human knowledge accumulation, into reusable libraries, on the web and then distilled into language models, has obviated much of that work.

That’s not even counting all the tedium we didn’t even realize we created. Generating projects with their dependencies and libraries all lined up just the right way (looking at you, C and JavaScript ecosystems), upgrading those libraries, configuring our development environments, accreting unruly towers of software dependencies that grew an industry (software supply chain management), even the seemingly simple task of “commit my work to the correct branch and fix things up when I realize I was on the wrong branch when I committed”. 🙃 All of that tedium is required toil for successfully developing software, but none of it contributes to solving problems for people.


The wrong insight here is that developers will be replaced by LLMs that can quickly implement a linked list. (And much more, if you supervise them well enough.) Maybe the 10x developers were really only good at tedium or algorithm tests. 🌶️

A better insight is that Leetcode and similar schemes were probably never that great. But, they were easier (and perhaps better on average🤷‍♂️) than developing a well-considered coding exercise yourself. Silver lining: it’s an object lesson that you don’t need the best product to succeed in the market.

The best insight: way more of software development is tedium than we previously guessed. And, it’s within our grasp to automate those bits away. If we can reclaim the parts of our brains that remember arcane shell commands or know how to rebase our way out of a sticky git conundrum, all the better.


Communicating early and often is a great hack, and easy to do:

In the end, write the docs you want to write. If no one reads them, or if readers find they are out of date, then consider not writing them next time. But don’t let anyone shame you into wasting time. The question is not, “Do you have documentation?” but rather, “Do you communicate clearly?”

— Kent Beck, The Documentation Tradeoff

“Write documentation” is a tidy but unsubtle maxim. “Tell people what you did” and “help people use your software for the right thing” are better starts. Of late, “invest in written documentation for onboarding humans and agents” is an even better suggestion.

As long as you’re telling people that your thing exists, here’s when you should use it, and here’s how to use it, your documentation bases are covered.


Backyard Coffee And Jazz In Kyoto, Japan:

While getting ready for our vacation in Japan, I read about these famous tiny businesses: bars or izakayas with four seats, narrow little bookstores or record shops in people’s houses or the bottom floors of small buildings, hyper-specialized or themed bars owned by one passionate guy. (There’s one that’s chock-full of Star Wars memorabilia, for example.)

Ain’t this a friend of the gestalt? I’m daydreaming of how to run a pop-up software and coffee shop in my driveway, whilst somehow steering clear of neighborhood associations and zoning regulations. Maybe more of a lemonade stand than a “shop”. 😉


Exit Codes Are Having a Moment

LLMs (and agent coding tools in particular) love a fitness function. Give them a tool that indicates success or failure and they’ll go to town. Compiler errors, linter warnings—they love that stuff.

They also seem to love deleting failing tests—maybe they’re more human than we’d like. 😅

Never before has the exit status of Unix commands been so important. The clarity of errors, logging messages, and progress displays is suddenly crucial. Well-written text, whether it’s a prompt, an error, or an informative log message, nudges LLMs towards the right next step just as it would a human.

Any task with a decent fitness function—an LLM will handle it soon. Currently, they’re limited by human response times and gumption. But once we’re confident enough in their performance and the safeguards we’ve put up, a lot of them will be out there, just doing stuff and starting tasks based on what they predict needs doing. Wild times ahead!
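The whole idea hinges on something this simple: run a command, read the exit status, decide the next step. A sketch, with throwaway stand-ins for real checks like rubocop or a test suite:

```python
# Use a command's exit status as a fitness function: 0 means the change
# passes; anything else means read the output and try again.

import subprocess
import sys

def run_check(cmd: list[str]) -> tuple[bool, str]:
    # Run a guardrail command, capturing the text an agent (or human)
    # would read to figure out what to fix next.
    result = subprocess.run(cmd, capture_output=True, text=True)
    return result.returncode == 0, (result.stdout + result.stderr).strip()

# Stand-ins for real checks; these just exit with success and failure.
ok, _ = run_check([sys.executable, "-c", "raise SystemExit(0)"])
failed, message = run_check(
    [sys.executable, "-c", "print('lint error'); raise SystemExit(1)"]
)
```

This is exactly why clear error text matters: on failure, the captured message is the only signal the agent has for choosing its next move.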