The joy of finishing

Then, the finish. Stain. Wipe. Wait. Stain. Wipe. Wait. Sand. Wipe. Stain. Wipe. Wait. Check. Seal. Wait. Sand. Wipe. Seal. Wait. Sand. Seal. Wait. And that’s if everything goes according to plan.

Finishing. It’s about woodworking. Or everything. Or sometimes an essay about furniture is just an essay about furniture.

Let the right something in

[Image: Bridge en route to Chilas]

There will always be more somethings we want to do than we have time to do. Right? Maybe.

  1. A lot of the right somethings can add up to a great thing, even if the somethings aren’t of the highest quality or express the biggest idea.
  2. A lot of the wrong somethings aren’t that interesting, unless your work is generally of great enough import that historians take an interest in it.
  3. A lot of the wrong somethings may not add up to much at all and are unlikely to attract the interest of historians.
  4. If you don’t care about whether something’s great, you can produce a lot of somethings.
  5. If you don’t care if something expresses a big idea, you can produce a lot of somethings.

A lot of truisms will tell you that quality is the important thing and quantity is secondary. But perhaps there are all sorts of cases where that’s not entirely true.

Mozart wrote far more music than Beethoven; Beethoven’s was more sophisticated, but their bodies of work are held in roughly equal regard. There are far more episodes of Law & Order and all its spin-offs than of Breaking Bad; one made more money, the other gathered more acclaim.

Rather than deciding to pursue quality over quantity, perhaps it’s better to:

  • Choose your somethings with care
  • Execute on the idea central to those somethings
  • Produce as many somethings as possible without hating the quality of your work

Not that kind of log

[Image: Logs from Finland]

First, read all of Jay Kreps’ excellent distillation of distributed systems, “The Log: What every software engineer should know about real-time data’s unifying abstraction.” Now, consider this.

There’s a moment, when you’re building services and web APIs, when you think you’ve pretty much got it under control. You’ve got an endpoint for every query, a resource for every workflow. Every use case seems covered. And then the question appears:

“How can I get access to all the updates to all the data? You know, for [REASONS].”

For APIs exposed to external developers over the web, this is where you’d reach for webhooks or PubSubHubbub. Neither is the best solution, but they work. If you’re building an internal system, you could use the same approaches, or… you could build a log.

No, not that kind of log. An event log, like LinkedIn built with Kafka for their internal systems. Every time your data model changes, every create, update, or delete, you drop an event with all the metadata related to the change. The event goes into some kind of single-producer, multiple-consumer queue. Then all the clients that want to know about all the changes to all the things can read events off the queue and do whatever it is they need to for those important REASONS.
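To make the shape of this concrete, here’s a minimal sketch of the pattern in Python. Everything in it, the ChangeEvent and EventLog names, the in-memory list, is illustrative; a real system would put something like Kafka where the list is.

```python
import itertools
import json
import time
from dataclasses import dataclass, field


@dataclass
class ChangeEvent:
    # Metadata describing one create/update/delete against the data model.
    offset: int
    entity: str      # e.g. "user", "invoice"
    action: str      # "create", "update", or "delete"
    payload: dict
    timestamp: float = field(default_factory=time.time)


class EventLog:
    """An append-only log: a single producer writes, many consumers read."""

    def __init__(self):
        self._events = []
        self._offsets = itertools.count()

    def append(self, entity, action, payload):
        event = ChangeEvent(next(self._offsets), entity, action, payload)
        self._events.append(event)
        return event

    def read_from(self, offset):
        # Consumers track their own position and resume from it.
        return self._events[offset:]


log = EventLog()
log.append("user", "create", {"id": 1, "name": "Ada"})
log.append("user", "update", {"id": 1, "name": "Ada Lovelace"})

# A consumer catching up from the beginning, for its own REASONS.
for event in log.read_from(0):
    print(json.dumps({"offset": event.offset, "action": event.action,
                      "payload": event.payload}))
```

The property that matters is the one Kreps emphasizes: events are ordered and replayable, so a consumer that falls behind, or starts from scratch, just reads from an earlier offset.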

If you find this intriguing, note that it’s a lot like replication in database systems. Definitely read LinkedIn’s article on this, and definitely read up on how your database of choice handles replication. And if you’ve built this before and have a good answer to initially populating “replicas” of a database, let me know; I haven’t come up with anything better than “just rsync it”.

Dear Sync Diary

[Image: Synchronize icon]

Brent Simmons is keeping a diary as he works through implementing sync for Vesper, an iOS note-taking app. Building this sort of thing isn’t easy; it took Cultured Code multiple years to implement sync for Things. So it’s pretty neat that Simmons is breaking it down into understandable chunks:

If you think you need to implement a synchronization system for your application, try to find a shortcut so that you don’t. If you can’t find a shortcut, you could do worse than starting with these notes.

Aliens ate my program’s state

So, Adam from a few years ago, you think you can build a distributed system?

Designing a concurrent system, one that runs across multiple processors using shared-memory threads, is difficult because someone can move your cheese. One moment you’ve got food on your plate, and the next moment it’s on someone else’s plate.
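As a toy illustration (mine, not a specific system), two Python threads incrementing a shared counter without a lock can lose updates, because the read-increment-write sequence isn’t atomic:

```python
import threading

counter = 0

def work():
    global counter
    for _ in range(100_000):
        # Read, add, write: another thread can move the cheese
        # between the read and the write, losing an increment.
        counter += 1

threads = [threading.Thread(target=work) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# You hoped for 200000; unsynchronized, it can come up short.
print(counter)
```

The fix here, a threading.Lock around the increment, is easy; the hard part is noticing every place in a real system where the cheese can move.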

Designing a distributed system, one that runs across multiple computers over networks that occasionally do weird things, is difficult because you can’t assume the message will get through. One moment you’re telling your friend about an awesome new album and the next moment they don’t even know who you are anymore.
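Here’s a hypothetical sketch of what that forces on you, with all the names invented for illustration: every send needs a timeout, a retry policy, and a story for duplicates on the receiving end.

```python
import random
import time


class FlakyNetwork:
    """Stand-in for a network that occasionally swallows messages."""

    def send(self, message):
        if random.random() < 0.3:
            raise TimeoutError("message swallowed by a devious alien")
        return "ack"


def send_with_retries(network, message, attempts=5, backoff=0.1):
    # At-least-once delivery: retry until acknowledged. The receiver
    # must therefore deduplicate, e.g. by a message id (idempotency).
    for attempt in range(attempts):
        try:
            return network.send(message)
        except TimeoutError:
            time.sleep(backoff * 2 ** attempt)  # exponential backoff
    raise RuntimeError("gave up; the caller must decide what failure means")


print(send_with_retries(FlakyNetwork(), {"id": 42, "body": "new album!"}))
```

Retrying buys you at-least-once delivery, which is exactly why the receiver has to deduplicate; exactly-once is the part the alien never gives you for free.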

In both scenarios, you can’t assume the simplest case. You have to discard Occam’s razor; it’s entirely possible a perfectly devious alien is tinkering with your system, swallowing messages or preventing threads from running. For every operation in your system, you have to ask yourself: what could go wrong, and how will my system deal with it?

The first jarring thing about these sorts of problems is that you’re probably not using a framework. There are libraries to help you with logging, instrumentation, storage, coordination, and low-level primitives. There aren’t a lot of well-curated collections of opinions expressed as code. There’s no Rails, no Play, no Django, no Cocoa. There’s hardly even a Sinatra, a Celery, a JDBC.

That means you’re going to be designing, for real. Drawing on whiteboards, building prototypes and proofs of concept. Getting feedback from as many smart people as you can corner. Questioning your assumptions, reading everything you can find on the topic. Looking for tradeoffs that decrease risk and simplify your problem domain.

Accept that you’re unlikely to ever get it entirely right. Those who are really good at it are more like Scotty than Spock. They’re surrounded by levers and measurements, tweaking and rebuilding the system as they go. It’s a fun puzzle!