Prototyping with LLMs

A few reflections on what I’ve been building (with) lately:

  • llm is great for prototyping many kinds of workflows. If you’re thinking “I’d like to build an app with some intelligence” and “I don’t mind tinkering with CLI apps”, give it a go. In particular, templates and fragments are very useful for assembling the rudiments of a solution.
  • Part and parcel with using llm, I’m tinkering with locally runnable models via Ollama. On my M3 MacBook Pro with 24 GB of total memory, Qwen3 and Mistral are small enough to fit into GPU memory and run pretty quickly. With “thinking” disabled (the open models will spend many tokens talking to themselves!), they are fast enough for development work and experimentation. They definitely aren’t at the same level as the latest from Anthropic or OpenAI. But, the future is promising for using these smaller models instead of relying on metered API access for every single token of intelligence.
  • Putting those two together, I’m hacking out some tools to help with job search. My goal is to reduce the effort to see if a job description matches what I’m looking for, generate ideas for customizing a cover letter to the role, and provide useful answers for any pre-interview questions. Next step: put an actual UI on my llm-based prototype.
  • To wit: I’m studying up on Python, FastAPI, HTMX, etc. by asking Claude Code to write learning projects and then asking it questions about why it wrote them that way. Turns out, this helps me with language idioms and library setup pitfalls. Wild times!

A thing about homeownership is that every room is a workshop until it becomes whatever room it’s going to be.

This seems to hold for brand-new homes and old homes. Sometimes it’s a room you’ve been using one way for years, and now you want to use it for a different purpose. Or you suddenly get intensely bored with the current layout.


It’s unfortunate, but true, we can’t sensibly integrate all the great productivity tools out there into our workflows.

I wish I could find a way to put Bike, Antinote, and Agenda into my (daily) workflows. 🤷‍♂️


Systems expansive enough to warrant a course, discussion forum, or orthodoxy may not solve your problem.

I was thinking of knowledge management systems when I wrote this down. But it’s probably more broadly true than just organizing your tasks and notes. 🤔

Related: I Deleted My Second Brain.


I’ve been writing about estimating and planning a lot lately, so this is worth sharing:

Numbers are not the goal. Successful outcomes are.

Those footing the bill would rather have a successful outcome than an excuse to blame the estimator. In exploring better estimation, I’m not looking to protect the estimator or development organization from censure or litigation after a disappointment. Instead, we use estimation to help guide the project to success from the point of view of both those paying for development and those performing it.

— George Dinwiddie, Software Estimation Without Guessing

When I’m feeling lousy about an estimate, it’s usually because I’m treating it like a promise instead of what it actually is: a piece of the puzzle for deciding what to do next.

The tricky bit is setting expectations up front: this estimate is based on what we know today, not what we’ll discover tomorrow. Everyone knows surprises will happen. The estimate should help the team make better decisions when they do, not box them into promises they can’t keep.

The best estimates I’ve given weren’t the most accurate—they were the ones that helped teams navigate uncertainty instead of pretending it away.


We tested for tedium

Folks are ruffled that LLMs, even without tool use, are pretty good at coding interview challenges. How will we even know who is good at coding anymore? (Current hypothesis: by collaborating with them as early as possible.)

Friend, I am here to tell you that LLMs are not the problem. The issue is we were evaluating programmers on speed and competency through the tedious parts of programming. If we ever looked for what makes a programmer effective, it was accidentally. 🫠


In retrospect, we had mediocre insight into which activities are an effective use of a programmer’s time. At best, we were using a metric from decades ago, when knowledge of algorithms was rare and solving arcane puzzles was more valuable. The past twenty years of accumulated human knowledge, packaged into reusable libraries, published on the web, and then distilled into language models, has obviated much of that work.

That’s not even counting all the tedium we didn’t even realize we created. Generating projects with their dependencies and libraries all lined up just the right way (looking at you, C and JavaScript ecosystems), upgrading those libraries, configuring our development environments, accreting unruly towers of software dependencies that grew an industry (software supply chain management), even the seemingly simple task of “commit my work to the correct branch and fix things up when I realize I was on the wrong branch when I committed”. 🙃 All of that tedium is required toil for successfully developing software, but none of it contributes to solving problems for people.


The wrong insight here is that developers will be replaced by LLMs that can quickly implement a linked list. (And much more, if you supervise them well enough.) Maybe the 10x developers were really only good at tedium or algorithm tests. 🌶️

A better insight is that Leetcode and similar schemes were probably never that great. But, they were easier (and perhaps better on average 🤷‍♂️) than developing a well-considered coding exercise yourself. Silver lining: it’s an object lesson that you don’t need the best product to succeed in the market.

The best insight: way more of software development is tedium than we previously guessed. And, it’s within our grasp to automate those bits away. If we can reclaim the parts of our brains that remember arcane shell commands or know how to rebase our way out of a sticky git conundrum, all the better.


Communicating early and often is a great hack, and easy to do:

In the end, write the docs you want to write. If no one reads them, or if readers find they are out of date, then consider not writing them next time. But don’t let anyone shame you into wasting time. The question is not, “Do you have documentation?” but rather, “Do you communicate clearly?”

– Kent Beck, The Documentation Tradeoff

“Write documentation” is a tidy but unsubtle maxim. “Tell people what you did” and “help people use your software for the right thing” are better starts. Of late, “invest in written documentation for onboarding humans and agents” is an even better suggestion.

As long as you’re telling people that your thing exists, here’s when you should use it, and here’s how to use it, your documentation bases are covered.


Backyard Coffee And Jazz In Kyoto, Japan:

One of the things I read about while getting ready for our vacation in Japan were these famous tiny businesses: bars or izakayas with four seats, narrow little bookstores or record shops in people’s houses or the bottom floors of small buildings, hyper-specialized or themed bars owned by one passionate guy. (There’s one that’s chock-full of Star Wars memorabilia, for example.)

Ain’t this a friend of the gestalt? I’m daydreaming of how to run a pop-up software and coffee shop in my driveway, whilst somehow steering clear of neighborhood associations and zoning regulations. Maybe more of a lemonade stand than a “shop”. 😉


Exit Codes Are Having a Moment

LLMs (and agent coding tools in particular) love a fitness function. Give them a tool that indicates success or failure and they’ll go to town. Compiler errors, linter warnings—they love that stuff.

They also seem to love deleting failing tests—maybe they’re more human than we’d like. 😅

Never before has the exit status of Unix commands been so important—the clarity of errors, logging messages, and progress displays are all suddenly crucial. Well-written text, whether it’s a prompt, error, or informative log message, nudges LLMs towards the right next step just like it would for a human.
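Here’s a minimal sketch of what that looks like in practice, in Python. The config check, the key names, and the function names are all made up for illustration. The point is only the shape: a zero or non-zero status, plus a message that says what went wrong and what to do next.

```python
import sys


def check_config(settings: dict) -> tuple[int, str]:
    """Return (exit_code, message) for a hypothetical config check."""
    # "name" and "version" are illustrative required keys, not a real schema.
    missing = [key for key in ("name", "version") if key not in settings]
    if missing:
        # Name the problem and the next step; a human or an agent reads this.
        return 1, f"error: missing required keys: {', '.join(missing)}; add them and re-run"
    return 0, "ok: config has all required keys"


def main(settings: dict) -> int:
    code, message = check_config(settings)
    # Failures go to stderr, successes to stdout.
    print(message, file=sys.stderr if code else sys.stdout)
    return code
```

Wiring it up as a command is one more line, `sys.exit(main(settings))`, so the shell (and whatever agent is driving it) sees the status.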

Any task with a decent fitness function—an LLM will handle it soon. Currently, they’re limited by human response times and gumption. But once we’re confident enough in their performance and the safeguards we’ve put up, a lot of them will be out there, just doing stuff and starting tasks based on what they predict needs doing. Wild times ahead!


Differential Coverage for Debugging. Take a failing test. You have no idea where the underlying issue might be. The system you’re working with is possibly deep, wide, and eldritch.

Your move:

  • run the tests and capture code coverage without the failing test
  • run the tests and capture code coverage with the failing test
  • diff the line-by-line code coverage(s) to see which lines are executed in the failing case only
  • rule out the remaining red herrings by reasoning through them

Very clever! Via Thorsten Ball.
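The diff step is small enough to sketch. This assumes each coverage run has already been boiled down to a mapping of file path to the set of executed line numbers; how you get that out of your coverage tool is left out, and the file names and line numbers below are invented.

```python
def failing_only_lines(passing, failing):
    """Lines executed in the failing run but never in the passing run."""
    result = {}
    for path, lines in failing.items():
        # A file absent from the passing run contributes all of its lines.
        extra = set(lines) - set(passing.get(path, ()))
        if extra:
            result[path] = sorted(extra)
    return result


# Made-up coverage data: path -> executed line numbers.
passing_run = {"cart.py": {1, 2, 3, 10}, "tax.py": {1, 2}}
failing_run = {
    "cart.py": {1, 2, 3, 10, 11, 12},
    "tax.py": {1, 2},
    "discount.py": {5, 6},
}

suspects = failing_only_lines(passing_run, failing_run)
print(suspects)
```

With this data, the suspects are lines 11 and 12 of cart.py and lines 5 and 6 of discount.py; tax.py drops out because both runs executed the same lines there.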


So your estimates were wrong

AKA “Help! I estimated a project and hit every branch falling down the surprises tree.” A shocking turn of events that definitely has never happened to any of us.

Don’t worry too hard when it turns out estimates weren’t accurate. They were guesses based on incomplete information and optimism that everything would go right, which it rarely does.

Do communicate to your stakeholders. Tell people whose work depends on your project about delays as soon as you know. Communicate now instead of waiting for the next scheduled update.

Don’t let your team think they have failed. If they executed well but surprises came up, tell them so explicitly. If they aren’t executing well, address that separately.

Do re-evaluate your plan with fresh estimates. If the deadline is fixed, cut scope. You can’t count on catching up by working smarter or having better luck.

Don’t ask people to work harder or longer to catch up. There’s no catching up, only keeping pace, reducing scope, or slipping the deadline.


Take an album track, in this case Daft Punk, deconstruct it, and then reconstruct from scratch in Ableton? That’s a cool hack.

I must admit something: while I consider Daft Punk’s first two albums absolute masterpieces, I’ve never been a huge fan of their guitar tones—especially their later collaborations with Nile Rodgers.

Recreating Daft Punk’s Something About Us

I disagree about Nile Rodgers! The thin-ness of his sound is what makes him distinctive. Chicken grease chords, in a different context.


I was talking to a pal about how some software developers are able to operate in a sort of timeless way. If contemporary tech, the stuff that made “tech people” the antagonist, is on one end of the spectrum, these developers are all the way over on the other end.

Picture a sleepy strip mall: grocery store, clothing store, paint store (why is there always a paint store?), and then, improbably, a software store. Inside, a proud but humble proprietor.

“What do you have for sale?”, you ask.

“One application, available on two or three platforms, you can buy ‘em all for less than $99”, they reply.

“You got anything else?”

“Nope, just this one program.”

Maybe you buy it, maybe you ask a few questions. Either way, the owner heads back to the workshop and keeps at their thing.

It makes an okay living. There are a couple of other people back in the workshop, helping to fix bugs, add minor new features, and keep things moving along. But it hasn’t grown to planet scale, doesn’t require astronomical growth to make payroll and pay holiday bonuses.

I’ve conveniently left out all the other small business things they have to do: accounting, marketing, so many kinds of insurance, budgeting, etc. Small businesses are all alike in that way.

It’s really great that the curious shop in the strip mall of the web, the personal website, continues to forge ahead. Maybe you call it a blog, a webring, a portfolio site, whatever. It’s still out there, doesn’t need astronomical growth to satisfy end-game finance or investors. It just is.


Another month, another Cars and Coffee event. Honestly, 75% of it was Porsche 911s of all vintages. But, I tried to work cars with other names or numbers in there too.

Porsche 911s of all ages and colors

Experience the Rennbow.

A black Mercedes 500 SEC

Lovely nostalgia. A friend’s dad had one of these leviathans back in the day.

An overland-fitted Porsche Cayenne Transsyberia

As adventure Porsches go, this one is delightfully fitted and unique.

It’s a police car, but it’s a sports car

Porsche police cars, they were a thing.

“If you’re not first, you’re last” on the back of a McLaren

I will always abide a Talladega Nights reference.

Gold badging on the back of a bespoke Porsche 911

Porsche Heritage examples are not cheap, but they are tasteful.


Anyone miss running Linux in the 1990s/2000s and leaving a visualizer, xscreensaver, or some kind of demo running while working on other things? Nostalgia is a heck of a thing. 😆 Improbably, Apple Music still has something close to Winamp/XMMS visualizations.


Maintain fewer and smaller backlogs

Easy advice to give, tricky to implement. Sometimes it’s effective, sometimes it’s a wash. Like rearranging for the sake of rearranging. But when it is effective, it creates mental space. And sharpens the skill of editing.

I have entirely too many items in entirely too many lists. De facto backlogs. Watch laters, read laters, tasks, notes with “TK marks”, boards holding dozens of ideas that would be nice to flesh out, someday…

All those “maybe somedays” present a non-trivial mental burden. Is today the right day for one of them? Will these lists ever get shorter? … Were it only that just the right action, supporting material, and mindset arrived all at the same time!

Mind (and trim) your backlogs

This is about organizing and cutting, not productivity. It may sound like weird tricks for squeezing the last bit out of every hour, but it’s not. It’s about developing the ability to do more of the good stuff so you can keep energy in reserve for when it matters.

A lot of the thinking about how to maintain a great system of lists, tasks, emails, notes and supporting material is about crafting a DO NOT ERASE system for that one Truly Awesome Idea. Trouble is, these systems save too many Just Okay Ideas that crowd out the Truly Awesome Ideas.

Our brains have so many ways to erase an idea. Sleep consolidates memories, sometimes indiscriminately. We say something true or brilliant but have to move on to the next calendar event or get back to our routine before we can write it down. Or we simply walk through a door. Suddenly the idea is gone. Memory consolidation is our brain’s editing process. And mostly, it works!

Offloading mental burden to lists is a great hack, on balance. But develop and sharpen your brain’s natural forgetting process too. Every idea can’t be a winner.


On one hand, there’s the classical music catalog mode of blogging. This is a bookmark, that’s a note, a photo here, a link there, an essay when I’m feeling ambitious. Étude, sonata, variations on a theme. Framing the writing by its scale and shape. A taxonomist’s delight.

Then there’s the Jack White sort of blogging: here’s a thing, I made it. The edges weren’t worked down too much, the roughness is still there. It preserves the energy and spirit of the idea more than it tries to explore the idea to its logical end. Better to cut ideas too soon than coddle them.

Both work! Blogging is a big, weird tent.


I miss writing “DO NOT ERASE” on whiteboards when a collaborator and I finally succeed at capturing the mental or visual model of a problem or its solution.

It’s like writing “a good thing happened here today” on the whiteboard.

Close second place: the act of crossing an item off your to-do list, particularly one on paper.


What’s a high-level language? Is it Java, Go, and C#? Or is it Ruby, JavaScript, or Python? Where does Swift go?

It’s all a matter of perspective. To a kernel developer who thinks about bytes on a wire or spinning a disk to read a file, git is incredibly high-level. 🙃 To a Smalltalk developer, files on disk are low-level (I assume).

Anyone telling you otherwise, including past or future Adams, is making a no true Scotsman argument.


📺 Currently watching:

  • The second (and final) season of Andor was spectacular. The best Star War of the past ten years, IMO.
  • S3 and S4 of Slow Horses continue to delight. Capable but slovenly Gary Oldman is a great character.
  • I’m surprised how much I look forward to The Studio every week. Highly recommended, even if Seth Rogen isn’t your thing.
  • S4 of Hacks plays up tension between the leads more than I’d like. Otherwise, the writing and performances are excellent.
  • Reserving judgement on S1 of Murderbot. The first episode did not grab me, but the second was an improvement.
  • Upcoming: Not sure how I feel about The Bear. But, my feelings won’t change how award shows go, so maybe I should just go with it. 🤷🏻‍♂️