That’s a question

In a technical conversation, I love to hear this: “that’s a good question!” Now we are going to talk about something we might have otherwise missed. Later we will look back at a potential crisis averted.

I groan (inside) when I hear: “that’s an interesting question!” Someone is about to bloviate, philosophize, or otherwise derail the conversation. Later, we will reflect on time poorly spent.

I may have a weird, grumpy relationship with technical conversations.

Life’s Easy Mode

This morning I walked a half mile, not too far, to a neighborhood coffee shop. I had two breakfast tacos and a sweet-flavored latte.

I can choose to walk, and take a Sunday morning (really, a whole weekend) to myself because I went to college, fooled around with computers a bunch, and happened upon a time of tremendous income growth for people who fooled around with computers a lot.

On the way, I walked down a well-maintained and safe sidewalk in an neighborhood in the middle of teardowns and gentrification. At one point, a small branch had grown over the sidewalk. Not big enough to walk around entirely, just the right size to push away.

But then, like a miracle, the wind blew just so and pushed the branch out of my way. It was like nature’s automatic sliding door.

Seems that’s a pretty good way of summing up the Easy Mode of Life that is being a professional white guy.

Doubt mongering

Doubt mongering. It’s a thing that happens because egos are fragile. Some doubts I’ve heard or uttered myself in the past month:

  • That sounds like building a dependency manager, and look how great those are in JavaScript!
  • Swagger is an IDL and I had bad experiences with IDLs when using SOAP and/or Thrift so we probably shouldn’t use Swagger.
  • Microservices sound like microkernels, and that never took off.

They’re FUD and they work off cognitive biases. When someone’s trying to vent, angle into a conversation, or show how smart they are, doubt mongering can happen.

Some of us are more prone to doubt mongering than others. I’m probably more prone to it than I realize. Writing this is making me cringe inside a little.

What irks me is that I often have to pause to separate the doubt mongering from the little bit of insight inside of it.

Say we’re talking about Swagger, for example. Most human endeavors are flawed. It’s perfectly legitimate to say “not all uses of IDLs have succeeded” and “let’s learn from past experience”. That’s a useful insight!

But it’s not okay to do so in a way that takes the energy out of the conversation. It’s not okay do so in a way makes someone feel less smart for suggesting something. It’s not okay to derail. Don’t be a gumption trap.

I still have to remind myself to Yes, And conversations that need a historical context. This isn’t a silver bullet and has its own nuances of application, but at least it’s not a Hard No. It preserves the energy and gumption in a group, rather than sapping it.

What about event sourcing?

I was chatting about Event Sourced data models with a pal last week. He is really taken by the idea and excited that perhaps its a “next big thing” in data modeling. Regretfully, I have an adverse reaction to “next big thing” thinking and pointed out that Event Sourced data models are more complex than the equivalent third-normal form data model. Thus, I said, tooling and education need to set in before Event Sourcing could achieve broad impact.

(Before I proceed, I need to put forth a lament of vocabulary. Events, in this context, are not fine-grained language constructs like in a continuation-passing-style asynchronous system. They are business events, a sale or page impression, or technical events, a request or cache hit. These are not callbacks.)

That said, there’s a few strings to pull from Event Sourcing that seem like possible trends:

  • Integration via event logs using something like Kafka. The low hanging fruit is to replace background jobs with messages on a Kafka stream. The next step is to think about messaging as reading from a database’s replication log.
  • Intermediate storage of historical event records in Hadoop. Once applications are publishing messages on changes to their data, you can slurp up each topic (one per domain model) into a Hadoop table. Then…
  • ETL of event logs in place of some messaging/REST integrations. Instead of querying another system or implementing a topic consumer, periodically query the event data in Hadoop. Transform it if necessary and load it into another application’s database. LinkedIn has extensive tooling for this and it seems like they have done their homework.
  • Data and databases modeled around the passage of time. Event Sourcing is sort of like introducing the notion of accounting to database records. We can go a step further and model our data such that we can travel forward or back in time, not just recalculate from the past. Git has a model of time. Datomic is modeled on time.
  • Event Sourcing as an extension of third-normal form. We still need normalized data models, and we still need the migration, ORM, and reporting tooling built on top of them. Event Sourcing gives us an additional facet to our data. Now, instead of just having the data model, we have the causality that created it. (If you’re curious, probably the enabling technology for storing all that causality is the diminishing cost of storage, adoption of append-only data structures, and data warehouses.)
  • Synchronization streams instead of REST for disconnected clients. When you store the events that brought data to where it is, and you have a total ordering on those events, you can keep disconnected applications up to date by sending them the events they’ve missed. This is way better than clever logic for querying the central database to update state without squashing local state. Hand-wavy analogy: think Git instead of SQLite (both are wonderful software).

In particular, the case for synchronization is when things started clicking for me. Hat tip to David Nolen’s talk on Om Next (start at 17:12) for this. As we continue building native and mobile web apps that are frequently disconnected, we may need an additional tool to augment resource-based workflows. In the same way that perhaps Event Sourcing is something we build as an extension of third-normal form data models, I’ll bet event logs as APIs will pop up more often. But we may see event logs entirely usurping resource workflows. Why implement consuming a log and implementing updates via REST when you could write a log producer and ship new events off to the server?

The developer impedance mismatch I’m finding with message logs is request-reply thinking. There’s a temptation to recreate REST semantics in Kafka topics. If a consumer fails to process a message, does it stop processing entirely, skip the message, discard the message? Does it notify another consumer via a separate topic, or does it phone home to its developers via an error notification? I haven’t found a satisfying answer to this, but I suspect its a matter of time, education, and tooling.

Encapsulation is a tradeoff too

Better understand Encapsulation. I can’t 😍 this article enough:

An individual programmer has fixed limits on how quickly they can build up instructions and later on how quickly they can correct problems. A highly-effective team can support and extend a much larger code base than the sum of its individuals, but eventually the complexity will grow beyond their abilities. There is always some physical maximum after which the work becomes excessively error prone or consistently slower or both. There is no getting around complexity, it is a fundamental limitation on scale.

Useless datapoint: my personal maximum is around three thousand lines of code, or 4–6 weeks of clean-slate effort.

So maybe I need to start encapsulating once I reach that limit?

To get the most out of encapsulation, the contents of the box must do something significantly more than just trivially implement an interface. That is, boxing off something simple is essentially negative, given that the box itself is a bump in complexity. To actually reduce the overall complexity, enough sub-complexity must be hidden away to make the box itself worth the effort.

This has been bugging me for a while. Encapsulation is treated as an unquestionable good by many developers. To question encapsulation is to adopt the opposite, that design isn’t worthwhile.

But it’s a tradeoff! Introducing encapsulation incurs a temporary increase in the net complexity of a system. Over the course of a tactical refactoring of methods and classes, the increased complexity is only observable by one or two developers doing the work.

But, if services are encapsulation (they are!), then rearranging the pieces will leave you paying for the increased complexity for days, weeks, months. Now the encapsulation takes on real costs: the risk of completing it, the burden of explaining to others what you’re doing, etc. That encapsulation better be worth it and not just a hunch!

For example, one could write a new layer on top of a technology like sockets and call it something like ‘connections’, but unless this new layer really encapsulates enough underlying complexity, like implementing a communications protocol and a data transfer format, then it has hurt rather than helped. It is ‘shallow’. What this means is that for any useful encapsulation, it must hide a significant amount of complexity, thus there should be plenty of code and data buried inside of the box that is no longer necessary to know outside of it. It should not leak out any of this knowledge. So a connection that seamlessly synchronizes data between two parties (how? We don’t know) correctly removes a chunk of knowledge out of the upper levels of the system. And it does it in a way that it is clear and easy to triage problems as being ‘in’ or ‘out’ of the box.

My experience is that encapsulation, if it happens at all, starts off shallow. Real encapsulation, where a developer can treat it as a black box, never needing to peak inside to understand the mechanisms or in/out problems, is rare. It takes the best designers of software to achieve it.

We should all be so bold as to attempt building encapsulations of that quality, but not so proud to think that we succeed at it even half the time.

In little programs, encapsulation isn’t really necessary, it might help but there just isn’t enough overall complexity to worry about. Once the system grows however, it approaches the threshold really fast. Fast enough that many software developers ignore it until it is way too late, and then the costs of correcting the code becomes unmanageable.

I feel like prefactoring a program or architecture only increases the complexity growth rate of small systems. A dominant factor in complexity is communication and coordination cost. If you start off with ten classes instead of three, or three services instead of one, you haven’t tripled your complexity, you’ve squared it (or worse).

I’m all for minimal solutions and fighting to keep things small, but not at the cost of incurring large coordination overhead.

To build big systems, you need to build up a huge and extremely complex code base. To keep it manageable you need to keep it heavily organized, but you also need to carve out chunks of it that have been done and dusted for the moment, so that you can focus your current efforts on moving the work forward. There are no short-cuts available in software development that won’t harm a project, just a lot of very careful, dedicated, disciplined work that when done correctly, helps towards continuing the active lifespan of the code.

Emphasis mine. In a successful system, size and complexity are nearly unavoidable. Almost every “best practice” and “leading edge approach” we know of is contextual and expresses trade-offs. Thus I’m left agreeing that the unsatisfying, hand-wavy craft of “careful, dedicated, disciplined work” is the principle most likely to generate code that’s improves (rather than regresses) over its lifetime.

Bridging design and development with data

Programming and designing with Pure UI:

The process involved, among other things, creating a new UI, ditching the dependency on Flash in favor of HTML5 and introducing new functionality…The particular way in which I implemented it led me to some interest insights around the growing convergence of the designer and programmer roles…The fundamental idea I want to discuss is the definition of an application’s UI as a pure function of application state.

This pulls together three threads:

  • that design and development are duals in a deep way
  • thinking in data structures is useful even if you aren’t using gobs of parenthesis (i.e. Lisp)
  • removing resistance to experimenting with software behavior, in this case by describing behavior with data structures instead of conditionals in code, yields good things (see also Bret Victor)

Medium-term bet: Facebook, through tools like React(-Native), continues to push tasks that were previously outside of “text editors”, such as visual design and animations, into things-resembling-code via the function-of-state paradigm that React is sneaking into people’s brains.

(Also, the use of a fixed-width font in the page design there is 💯)

Microservices in context

An interview with John Allspaw, on Etsy infrastructure and operations:

For example, a good friend of mine runs and has run an electronic trading exchange. You could imagine his goals and constraints when designing an electronic trading exchange are very different than, say, Facebook. Facebook might be very different architecturally because they have different constraints than Amazon. And Amazon might be different than even Etsy.

When you have a conversation that unnecessarily paints the discussion as, “Are you micro-services or are you a monolith?” then it wipes away all of the context-specificity. Which you actually have no real way of talking inspecifics.

Compared to the previous buzzword, SOA, what does microservices mean? As far I can tell, its two things:

  • A Rorschach test. What do you see in this buzzword? What does it say to you?
  • A signaling mechanism. I’m most likely to hear about microservices from those trying to distinguish themselves from those other people who write code that doesn’t share their values.

Context-specificity is the important part. I’ve been reading David Byrne’s How Music Works and he spends the first chapter entirely on how the performance venue (a savannah, a noisy club, an austere concert hall) puts its mark on the music that is performed there (percussion oriented, loud and compressed, or quiet and precise).

In architecture, context is also king. Building and deploying services is different at Heroku, Netflix, Facebook, and the place where you work. You can build services of varying size and complexity anywhere on any stack. What the team, culture, and organization prefers is the real determinant.

I find it useful to read about other people’s service architectures to learn what works elsewhere. Even better if they describe the context they built that service architecture in. But it is always foolish cargo-culting to attempt to replicate another team’s architecture without the team and organizational context in which it was born.

When we model

I’ve observed a few levels of modeling (i.e. thinking about a problem and describing it in concepts plus data structures) that software developers do in the wild:

  • structural modeling, describe structure of the problem domain and represent that directly in code, probably using the concepts that your ORM or data layer provide
  • operational modeling, evolving a structural model to include models of the operations and workflows that interact with the structural models
  • deep modeling, evolving an operational model to include language that describes how the model, problem domain, and solution domain interact and describe each other

A structural model is what happens in a “just ship it” culture. If you’re lucky, you might start thinking about an operational model as you convert that just-ship-it app into an ecosystem of services connected by APIs and messaging.

Any of these models could poof into existence at a higher level. That is, a team could pop out an operational or deep model of a system on their first try. This is even more likely if it’s their second or third take on a problem domain.

Some ideas for kinds of even-higher level modeling that high-functioning teams perform: error-case modeling, coordinated system modeling, social modeling, migration modeling.

And, let’s not even speak of metamodeling :P

Word processors, still imitating typewriters

Right after we finish ridding the world of “floppy-disk-to-save” icons, I propose we remove this bit of obtuse skeumorphism from the default view in word processors like Google Docs:

the margin setting thingy on a typewriter
Who uses this anymore?

I vaguely remember using one of these to adjust margins and such on a real typewriter once. Its possible I used one to eek out an extra page in a school report during junior high. Since then? Wasted screen space!

Act like a modern device, word processors. Hide that stuff in a menu somewhere!