2015
Software design, always on the wrong foot
Software design has probably been broken from the start. The earliest business software, machine language encoded to punch cards, was more about fiddling registers and managing memory locations than doing arithmetic or implementing business logic. Even after you fast forward to Unix and compiled languages, software is still more about managing heap memory and arcane details like file or error pointers than it is about business logic.
Fast forward again to the first web apps and it seems like there’s an opportunity to put business logic in the center and the incidental complexity of the computer on the outsides. Alas, when web apps took off, most of their logic was written in scripting languages which often trade organizing code along boundaries for the thrill of just getting stuff done. Sometimes I don’t think we’ve outgrown that urge.
Software design has always started off on the wrong foot. Maybe we know better now, maybe we’re as lost as ever. Perhaps in the future I will only feign surprise when I come across working software that is not exactly ideal on the inside.
Specific, purposeful emails are great
When I’m emailing with teammates, I try to do them a few favors.
I make my purpose clear, specific, and up front. I often write the whole email, figure out the real purpose, and then move it into the very first sentence and subject line. I’m a little pessimistic, so I figure I’ve got three sentences, tops, to persuade someone to read an email. They are way more likely to retain at least part of my meaning if there are bullet soundbites for those unlikely to read past the first paragraph. When I want to get down to details, it all goes “under the fold” of the soundbites.
If at all possible, I don’t want to generate Yet Another Meeting. I’ve been in too many meetings that could have been an email. Need to update me on a project? Write it out. Have a simple question to ask? Write it out. Have a complex question to ask? Boil it down to three simple ones, write it out. Need to explore an idea? That’s closer to requiring a meeting! Want to talk about something that requires the sophistication of reading faces and vocal inflections? That requires a meeting, go ahead and schedule one!
What I try to avoid, at all costs, is to throw a bunch of random datapoints or ideas together without drawing a conclusion. Some of the most frustrating emails I’ve read ended with “Thoughts?”. If I’m going to email someone, I’m going to ask a specific question or make a specific point. Ending with “thoughts?” leaves it up to the recipient to guess what the sender wants from them and then respond in kind.
Don’t ramble, don’t use a meeting when an email will suffice, do make conclusions and do ask specific questions. I will send you email hugs to thank you for respecting my time.
Easy steps to programming language commitment
Feel pressured by other developers telling you that your programming language of choice is old, bad, or that you should feel bad? Apply this heuristic:
- Try different programming languages until you find one that best fits your brain and the problems you want to solve
- Use that language for everything you can
- When a language comes along that fits your brain or your problems even better, switch to that one, ad infinitum
Don’t let the hype of people with different brains or different problems get you down.
Code needs boundaries, but not too many
Let’s talk about boundaries in programs. I need them, otherwise programs grow increasingly inscrutable and impossible to change. A lack of boundaries is nearly as bad as spaghetti code; i.e. it’s really bad.
But, too many boundaries can also make a program inscrutable. To the absurd, a program composed entirely of black boxes each of a single narrow function and behavior is all indirection. Indirection is a cost I pay when I introduce boundaries, e.g. “Your princess is in another castle.” I want to have just the right number of boundaries; not too few, not too many.
Further, I want to avoid establishing the wrong boundaries if I can. Boundaries are hard to move around; creating them is an implicit act of making some kinds of changes more difficult. Awkward boundaries make it difficult to write correct code; hurried developers will yield to the temptation to circumvent the boundary. If you do manage to identify an awkward boundary and correct it, you’ll have some temporary churn in your program while you rejigger the boundary and the code on both sides of it.
On the other hand, the right boundaries are wonderful. They create leverage for the developers working on both sides of the boundaries. They get more done, only needing to know about the boundary and not what lies on the other side. Establishing good boundaries is the first step towards encapsulation and abstraction.
We used to make boundaries from packages and libraries. Now we have added out-of-process services, message passing, and infrastructure as boundaries. This will probably turn out as a net benefit, but right now we’re chasing novelty, blog posts, and conference talks at the expense of increased complexity. We aren’t really “engineering” our boundaries.
Creating boundaries too eagerly increases the odds of imposing the wrong boundaries and churning on said boundaries. Creating boundaries too lazily imposes a high cost of change to create those boundaries once you discover them and accept the implementation challenge.
My favorite kind of boundary is a bounded context. It’s a wonderful epiphany that we don’t all have to agree on the precise definition of words and responsibilities if we can agree where the fences (boundaries) go.
Gary Bernhardt has nice things to say about boundaries. If you like this, you’ll love his ideas.
That's a question
In a technical conversation, I love to hear this: “that’s a good question!” Now we are going to talk about something we might have otherwise missed. Later we will look back at a potential crisis averted.
I groan (inside) when I hear: “that’s an interesting question!” Someone is about to bloviate, philosophize, or otherwise derail the conversation. Later, we will reflect on time poorly spent.
I may have a weird, grumpy relationship with technical conversations.
Life's Easy Mode
This morning I walked a half mile, not too far, to a neighborhood coffee shop. I had two breakfast tacos and a sweet-flavored latte.
I can choose to walk, and take a Sunday morning (really, a whole weekend) to myself because I went to college, fooled around with computers a bunch, and happened upon a time of tremendous income growth for people who fooled around with computers a lot.
On the way, I walked down a well-maintained and safe sidewalk in a neighborhood in the middle of teardowns and gentrification. At one point, a small branch had grown over the sidewalk. Not big enough to walk around entirely, just the right size to push away.
But then, like a miracle, the wind blew just so and pushed the branch out of my way. It was like nature’s automatic sliding door.
Seems that’s a pretty good way of summing up the Easy Mode of Life that is being a professional white guy.
Doubt mongering
Doubt mongering. It’s a thing that happens because egos are fragile. Some doubts I’ve heard or uttered myself in the past month:
- That sounds like building a dependency manager, and look how great those are in JavaScript!
- Swagger is an IDL and I had bad experiences with IDLs when using SOAP and/or Thrift so we probably shouldn’t use Swagger.
- Microservices sound like microkernels, and that never took off.
They’re FUD and they work off cognitive biases. When someone’s trying to vent, angle into a conversation, or show how smart they are, doubt mongering can happen.
Some of us are more prone to doubt mongering than others. I’m probably more prone to it than I realize. Writing this is making me cringe inside a little.
What irks me is that I often have to pause to separate the doubt mongering from the little bit of insight inside of it.
Say we’re talking about Swagger, for example. Most human endeavors are flawed. It’s perfectly legitimate to say “not all uses of IDLs have succeeded” and “let’s learn from past experience”. That’s a useful insight!
But it’s not okay to do so in a way that takes the energy out of the conversation. It’s not okay to do so in a way that makes someone feel less smart for suggesting something. It’s not okay to derail. Don’t be a gumption trap.
I still have to remind myself to Yes, And conversations that need a historical context. This isn’t a silver bullet and has its own nuances of application, but at least it’s not a Hard No. It preserves the energy and gumption in a group, rather than sapping it.
NASA: robots everywhere! Military: nuke the moon!
NASA (2014 funding: $17 billion) has sent man to the moon and robots all over the solar system. The military (2015 funding: unfathomable) wanted to nuke the moon. Maybe we could throw more cash at NASA and less at the military industrial complex?
What about event sourcing?
I was chatting about Event Sourced data models with a pal last week. He is really taken by the idea and excited that perhaps it’s a “next big thing” in data modeling. Regretfully, I have an adverse reaction to “next big thing” thinking and pointed out that Event Sourced data models are more complex than the equivalent third-normal form data model. Thus, I said, tooling and education need to set in before Event Sourcing could achieve broad impact.
(Before I proceed, I need to put forth a lament of vocabulary. Events, in this context, are not fine-grained language constructs like in a continuation-passing-style asynchronous system. They are business events, a sale or page impression, or technical events, a request or cache hit. These are not callbacks.)
That said, there are a few strings to pull from Event Sourcing that seem like possible trends:
- Integration via event logs using something like Kafka. The low hanging fruit is to replace background jobs with messages on a Kafka stream. The next step is to think about messaging as reading from a database’s replication log.
- Intermediate storage of historical event records in Hadoop. Once applications are publishing messages on changes to their data, you can slurp up each topic (one per domain model) into a Hadoop table. Then…
- ETL of event logs in place of some messaging/REST integrations. Instead of querying another system or implementing a topic consumer, periodically query the event data in Hadoop. Transform it if necessary and load it into another application’s database. LinkedIn has extensive tooling for this and it seems like they have done their homework.
- Data and databases modeled around the passage of time. Event Sourcing is sort of like introducing the notion of accounting to database records. We can go a step further and model our data such that we can travel forward or back in time, not just recalculate from the past. Git has a model of time. Datomic is modeled on time.
- Event Sourcing as an extension of third-normal form. We still need normalized data models, and we still need the migration, ORM, and reporting tooling built on top of them. Event Sourcing gives us an additional facet to our data. Now, instead of just having the data model, we have the causality that created it. (If you’re curious, probably the enabling technology for storing all that causality is the diminishing cost of storage, adoption of append-only data structures, and data warehouses.)
- Synchronization streams instead of REST for disconnected clients. When you store the events that brought data to where it is, and you have a total ordering on those events, you can keep disconnected applications up to date by sending them the events they’ve missed. This is way better than clever logic for querying the central database to update state without squashing local state. Hand-wavy analogy: think Git instead of SQLite (both are wonderful software).
In particular, the case for synchronization is when things started clicking for me. Hat tip to David Nolen’s talk on Om Next (start at 17:12) for this. As we continue building native and mobile web apps that are frequently disconnected, we may need an additional tool to augment resource-based workflows. In the same way that perhaps Event Sourcing is something we build as an extension of third-normal form data models, I’ll bet event logs as APIs will pop up more often. But we may see event logs entirely usurping resource workflows. Why implement consuming a log and implementing updates via REST when you could write a log producer and ship new events off to the server?
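A minimal sketch of that synchronization idea, assuming a toy account balance as the data model. The `Event` type, the event kinds, and the helper names are all hypothetical, and the log is a plain in-memory list, not Kafka or any real store. It shows the two properties the list leans on: state is a fold over a totally ordered log, and a disconnected client catches up by replaying only the events it missed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    seq: int     # position in the total ordering of the log
    kind: str    # hypothetical business events: "deposited" or "withdrew"
    amount: int


def apply(balance: int, event: Event) -> int:
    """Fold a single event into the current state."""
    if event.kind == "deposited":
        return balance + event.amount
    if event.kind == "withdrew":
        return balance - event.amount
    raise ValueError(f"unknown event kind: {event.kind}")


def state_at(log: list[Event], seq: int) -> int:
    """Replay the log up to and including `seq`: the state at that point in time."""
    balance = 0
    for event in log:
        if event.seq > seq:
            break
        balance = apply(balance, event)
    return balance


def events_since(log: list[Event], last_seen: int) -> list[Event]:
    """What a reconnecting client needs: only the events it missed."""
    return [event for event in log if event.seq > last_seen]


log = [
    Event(1, "deposited", 100),
    Event(2, "withdrew", 30),
    Event(3, "deposited", 5),
]

# A client that went offline after event 1 catches up by replaying the rest.
client_balance = state_at(log, 1)
for missed in events_since(log, last_seen=1):
    client_balance = apply(client_balance, missed)

print(client_balance)    # 75
print(state_at(log, 2))  # 70
```

The client never queries the server for "current state"; it only asks for events after the last sequence number it saw, which is the Git-like part of the analogy.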
The developer impedance mismatch I’m finding with message logs is request-reply thinking. There’s a temptation to recreate REST semantics in Kafka topics. If a consumer fails to process a message, does it stop processing entirely, skip the message, discard the message? Does it notify another consumer via a separate topic, or does it phone home to its developers via an error notification? I haven’t found a satisfying answer to this, but I suspect it’s a matter of time, education, and tooling.
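To make those options concrete, here is a rough sketch that treats the failure policy as an explicit parameter. It is not an answer to the question, only a way to see the choices side by side. Everything here is hypothetical: the topic is a plain list, `consume` and `parse_amount` are made-up names, and no real Kafka client API is used.

```python
def consume(messages, handler, on_failure):
    """Process messages in order; on_failure is "stop", "skip", or "dead_letter"."""
    processed, dead_letters = [], []
    for offset, message in enumerate(messages):
        try:
            processed.append(handler(message))
        except ValueError as error:
            if on_failure == "stop":
                break  # halt at the bad message; nothing after it is processed
            if on_failure == "dead_letter":
                # Stand-in for publishing to a separate error topic.
                dead_letters.append((offset, message, str(error)))
            # "skip": drop the message and keep going
    return processed, dead_letters


def parse_amount(message: str) -> int:
    return int(message)  # raises ValueError on malformed input


topic = ["10", "oops", "25"]
print(consume(topic, parse_amount, "stop"))  # ([10], [])
print(consume(topic, parse_amount, "skip"))  # ([10, 25], [])
```

With `"dead_letter"`, the second element of the result holds the offset and payload of each failed message, which another consumer (or a human) could pick up later.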
Encapsulation is a tradeoff too
Better understand Encapsulation. I can’t 😍 this article enough:
An individual programmer has fixed limits on how quickly they can build up instructions and later on how quickly they can correct problems. A highly-effective team can support and extend a much larger code base than the sum of its individuals, but eventually the complexity will grow beyond their abilities. There is always some physical maximum after which the work becomes excessively error prone or consistently slower or both. There is no getting around complexity, it is a fundamental limitation on scale.
Useless datapoint: my personal maximum is around three thousand lines of code, or 4–6 weeks of clean-slate effort.
So maybe I need to start encapsulating once I reach that limit?
To get the most out of encapsulation, the contents of the box must do something significantly more than just trivially implement an interface. That is, boxing off something simple is essentially negative, given that the box itself is a bump in complexity. To actually reduce the overall complexity, enough sub-complexity must be hidden away to make the box itself worth the effort.
This has been bugging me for a while. Encapsulation is treated as an unquestionable good by many developers. To question encapsulation is to adopt the opposite, that design isn’t worthwhile.
But it’s a tradeoff! Introducing encapsulation incurs a temporary increase in the net complexity of a system. Over the course of a tactical refactoring of methods and classes, the increased complexity is only observable by one or two developers doing the work.
But, if services are encapsulation (they are!), then rearranging the pieces will leave you paying for the increased complexity for days, weeks, months. Now the encapsulation takes on real costs: the risk of completing it, the burden of explaining to others what you’re doing, etc. That encapsulation better be worth it and not just a hunch!
For example, one could write a new layer on top of a technology like sockets and call it something like ‘connections’, but unless this new layer really encapsulates enough underlying complexity, like implementing a communications protocol and a data transfer format, then it has hurt rather than helped. It is ‘shallow’. What this means is that for any useful encapsulation, it must hide a significant amount of complexity, thus there should be plenty of code and data buried inside of the box that is no longer necessary to know outside of it. It should not leak out any of this knowledge. So a connection that seamlessly synchronizes data between two parties (how? We don’t know) correctly removes a chunk of knowledge out of the upper levels of the system. And it does it in a way that it is clear and easy to triage problems as being ‘in’ or ‘out’ of the box.
My experience is that encapsulation, if it happens at all, starts off shallow. Real encapsulation, where a developer can treat it as a black box, never needing to peek inside to understand the mechanisms or in/out problems, is rare. It takes the best designers of software to achieve it.
We should all be so bold as to attempt building encapsulations of that quality, but not so proud to think that we succeed at it even half the time.
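The sockets example from the quote can be sketched roughly like this. Both class names are hypothetical, and the wire format (a 4-byte length prefix followed by JSON) is an assumption chosen for illustration, not something the article specifies. The shallow version only renames the socket API; the deep version hides a protocol and a data format, so callers deal in whole messages.

```python
import json
import socket
import struct


class ShallowConnection:
    """Renames the socket API without hiding anything: callers still juggle bytes."""

    def __init__(self, sock: socket.socket) -> None:
        self.sock = sock

    def send(self, data: bytes) -> None:
        self.sock.sendall(data)

    def receive(self, size: int) -> bytes:
        return self.sock.recv(size)


class DeepConnection:
    """Hides a wire protocol (length-prefixed frames) and a data format (JSON)."""

    def __init__(self, sock: socket.socket) -> None:
        self._sock = sock

    def send(self, message: dict) -> None:
        payload = json.dumps(message).encode("utf-8")
        self._sock.sendall(struct.pack("!I", len(payload)) + payload)

    def receive(self) -> dict:
        (length,) = struct.unpack("!I", self._read_exactly(4))
        return json.loads(self._read_exactly(length).decode("utf-8"))

    def _read_exactly(self, size: int) -> bytes:
        # recv may return fewer bytes than asked for, so loop until done.
        chunks = []
        while size > 0:
            chunk = self._sock.recv(size)
            if not chunk:
                raise ConnectionError("peer closed the connection mid-message")
            chunks.append(chunk)
            size -= len(chunk)
        return b"".join(chunks)


# A connected pair of sockets in one process, just for demonstration.
left, right = socket.socketpair()
DeepConnection(left).send({"event": "sale", "total": 42})
received = DeepConnection(right).receive()
print(received)
left.close()
right.close()
```

A caller of `DeepConnection` never sees partial reads, byte lengths, or encodings. That hidden knowledge is what the quote means by the box being worth its own bump in complexity.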
In little programs, encapsulation isn’t really necessary, it might help but there just isn’t enough overall complexity to worry about. Once the system grows however, it approaches the threshold really fast. Fast enough that many software developers ignore it until it is way too late, and then the costs of correcting the code becomes unmanageable.
I feel like prefactoring a program or architecture only increases the complexity growth rate of small systems. A dominant factor in complexity is communication and coordination cost. If you start off with ten classes instead of three, or three services instead of one, you haven’t tripled your complexity, you’ve squared it (or worse).
I’m all for minimal solutions and fighting to keep things small, but not at the cost of incurring large coordination overhead.
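As a back-of-the-envelope check on the ten-versus-three claim, assume the worst case where any two parts might need to coordinate. That is an assumption for illustration; real systems are rarely fully connected. The number of possible pairwise channels among n parts is then n(n-1)/2:

```python
def pairwise_channels(parts: int) -> int:
    """Number of distinct pairs among `parts` components: n * (n - 1) / 2."""
    return parts * (parts - 1) // 2


for parts in (1, 3, 10):
    print(parts, pairwise_channels(parts))
# 1 0
# 3 3
# 10 45
```

Going from three parts to ten is about 3.3 times as many parts but 15 times as many potential channels, which is a bit worse than squaring the ratio.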
To build big systems, you need to build up a huge and extremely complex code base. To keep it manageable you need to keep it heavily organized, but you also need to carve out chunks of it that have been done and dusted for the moment, so that you can focus your current efforts on moving the work forward. There are no short-cuts available in software development that won’t harm a project, just a lot of very careful, dedicated, disciplined work that when done correctly, helps towards continuing the active lifespan of the code.
Emphasis mine. In a successful system, size and complexity are nearly unavoidable. Almost every “best practice” and “leading edge approach” we know of is contextual and expresses trade-offs. Thus I’m left agreeing that the unsatisfying, hand-wavy craft of “careful, dedicated, disciplined work” is the principle most likely to generate code that improves (rather than regresses) over its lifetime.
Bridging design and development with data
Programming and designing with Pure UI:
The process involved, among other things, creating a new UI, ditching the dependency on Flash in favor of HTML5 and introducing new functionality…The particular way in which I implemented it led me to some interesting insights around the growing convergence of the designer and programmer roles…The fundamental idea I want to discuss is the definition of an application’s UI as a pure function of application state.
This pulls together three threads:
- that design and development are duals in a deep way
- thinking in data structures is useful even if you aren’t using gobs of parentheses (i.e. Lisp)
- removing resistance to experimenting with software behavior, in this case by describing behavior with data structures instead of conditionals in code, yields good things (see also Bret Victor)
Medium-term bet: Facebook, through tools like React(-Native), continues to push tasks that were previously outside of “text editors”, such as visual design and animations, into things-resembling-code via the function-of-state paradigm that React is sneaking into people’s brains.
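A tiny sketch of “UI as a pure function of application state”, assuming a hypothetical file-upload widget. `UploadState`, `render`, and the dictionary shape of the output are made up for illustration and are not React’s API. The point is that every visual variation is just another state value you can construct and inspect, with no conditionals scattered through event handlers.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class UploadState:
    filename: str
    progress: float              # 0.0 to 1.0
    error: Optional[str] = None


def render(state: UploadState) -> dict:
    """Pure: the same state always yields the same description of the UI."""
    if state.error is not None:
        return {"type": "alert", "text": f"Could not upload {state.filename}: {state.error}"}
    if state.progress >= 1.0:
        return {"type": "label", "text": f"{state.filename} uploaded"}
    return {
        "type": "progress_bar",
        "label": state.filename,
        "percent": round(state.progress * 100),
    }


# Designers and developers can enumerate states as data and look at each one.
for state in (
    UploadState("cat.png", 0.5),
    UploadState("cat.png", 1.0),
    UploadState("cat.png", 0.5, error="network down"),
):
    print(render(state))
```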
(Also, the use of a fixed-width font in the page design there is 💯)
Microservices in context
An interview with John Allspaw, on Etsy infrastructure and operations:
For example, a good friend of mine runs and has run an electronic trading exchange. You could imagine his goals and constraints when designing an electronic trading exchange are very different than, say, Facebook. Facebook might be very different architecturally because they have different constraints than Amazon. And Amazon might be different than even Etsy. When you have a conversation that unnecessarily paints the discussion as, “Are you micro-services or are you a monolith?” then it wipes away all of the context-specificity. Which you actually have no real way of talking in specifics.
Compared to the previous buzzword, SOA, what does microservices mean? As far as I can tell, it’s two things:
- A Rorschach test. What do you see in this buzzword? What does it say to you?
- A signaling mechanism. I’m most likely to hear about microservices from those trying to distinguish themselves from those other people who write code that doesn’t share their values.
Context-specificity is the important part. I’ve been reading David Byrne’s How Music Works and he spends the first chapter entirely on how the performance venue (a savannah, a noisy club, an austere concert hall) puts its mark on the music that is performed there (percussion oriented, loud and compressed, or quiet and precise).
In architecture, context is also king. Building and deploying services is different at Heroku, Netflix, Facebook, and the place where you work. You can build services of varying size and complexity anywhere on any stack. What the team, culture, and organization prefers is the real determinant.
I find it useful to read about other people’s service architectures to learn what works elsewhere. Even better if they describe the context they built that service architecture in. But it is always foolish cargo-culting to attempt to replicate another team’s architecture without the team and organizational context in which it was born.
When we model
I’ve observed a few levels of modeling (i.e. thinking about a problem and describing it in concepts plus data structures) that software developers do in the wild:
- structural modeling, describe structure of the problem domain and represent that directly in code, probably using the concepts that your ORM or data layer provide
- operational modeling, evolving a structural model to include models of the operations and workflows that interact with the structural models
- deep modeling, evolving an operational model to include language that describes how the model, problem domain, and solution domain interact and describe each other
A structural model is what happens in a “just ship it” culture. If you’re lucky, you might start thinking about an operational model as you convert that just-ship-it app into an ecosystem of services connected by APIs and messaging.
Any of these models could poof into existence at a higher level. That is, a team could pop out an operational or deep model of a system on their first try. This is even more likely if it’s their second or third take on a problem domain.
Some ideas for kinds of even-higher level modeling that high-functioning teams perform: error-case modeling, coordinated system modeling, social modeling, migration modeling.
And, let’s not even speak of metamodeling :P
Word processors, still imitating typewriters
Right after we finish ridding the world of “floppy-disk-to-save” icons, I propose we remove this bit of obtuse skeuomorphism from the default view in word processors like Google Docs:
[Image caption: Who uses this anymore?]
I vaguely remember using one of these to adjust margins and such on a real typewriter once. It’s possible I used one to eke out an extra page in a school report during junior high. Since then? Wasted screen space!
Act like a modern device, word processors. Hide that stuff in a menu somewhere!
"Everybody Wants to Rule the World", too much of its time
I really dislike “Everybody Wants to Rule the World” by Tears for Fears because it’s a perfectly written song that sounds exactly like the year it was recorded, 1985. Five years earlier, it would have sounded mildly seventies-ish and been great. Five years later and it would have had a little more grit and sound very late eighties.
What I’m saying is, if I could un-invent certain musical sounds, the bass on that track would appear on the list.
Raising all boats
It’s easy to complain about PHP. For instance, why didn’t they choose ☃ as their namespace resolution operator?! As a developer with lofty opinions, I’m not a big fan of PHP. To me, it’s an argument against allowing accretion to determine the design of a system. I don’t think it’s controversial to call the PHP language, core library, and ecosystem “inconsistent” and “a matter of curious histories”. A language feature here, a library function there, year over year, and you’ve got a “quaint” design. Yes, those are scare-quotes.
Whenever I feel a big rant about PHP shortcomings approaching, I try to remember a few important facets of its success:
- PHP made programming web applications accessible to a lot of people for whom writing CGIs with Perl, Python or Java servlets was overwhelming. Myself included!
- You still can’t beat the simplicity of PHP’s deployment model: acquire commodity web hosting, upload source files, and done.
- Due to its accessibility and ease of deployment, a whole new kind of person started building stuff with code. Jason Kottke called part of this Liberal Arts 2.0. Less mathy programming, more craftsy.
Fast forward to today. PHP is still doing fine, though lots of people switched to Ruby or Python many moons ago, depending on personality type. And lots of those have since moved on to other things. The technology hype curve is an overlapping, ongoing thing.
Of those that switched, many ended up with JavaScript, in the guise of browser-side frameworks or server-side Node (and its ilk). I think there’s a huge opportunity here. JS is not without flaws, like PHP. But it’s sort of backed into really broad reach. Embedded, games, applications, mobile, probably more that I don’t even know about. That could make it compelling for an even less math-y demographic of people building stuff with computers.
And yet, there is no single JS community. There’s browser people, there’s server people. The future may hold mobile, gaming, and device people. That creates dissonance and some uphill battles.
But maybe that’s the really cool part. The JavaScript communities will have to slog uphill a bit to make accessible the previously intimidating domains of mobile apps, games, and embedded software. And that could raise the boat for people who aren’t building web apps but could be building software.
Functions about nothing
The tricky thing about decomposing code into abstractions is you end up with “functions about nothing”. You’ve probably seen one of these: a method or function with really vague names glommed into a utility or enumerations junk drawer. It’s probably innocuous, but as you’re reading code, it takes you out of your flow and forces you to think in the abstract instead of the concrete.
It’s easy to guess how these things happen. Successive refactoring iterations end up pulling business logic into a pile of predicates and side-effects and a separate pile of abstractions. We feel pretty good about ourselves at the end of the refactoring and write a fancy blog post about it!
The rub is when we come back to read the code later. It’s easy to find the abstraction first and get side-tracked by figuring out why it exists, the context in which it was created, and when we might use it again. This is better than predicates and side-effects interwoven. But it’s still a problem.
I don’t have a salve for this. I just wanted to put the phrase “functions about nothing” on the internet. [SLAP BASS OUTRO RIFF PLAYS HERE]