Code that resists
Kellan Elliott-McCrea, on the way towards an understanding of technical debt, catalogs the ways we end up with code that resists our efforts to change it:
Therefore the second common meaning of “technical debt” is the features of the codebase we encounter in our work that make it resist change. Examples of features that can make a codebase resist change include: poor modularization, poor documentation or poor test coverage. Just as easily though an abundance of modularization (and complexity) or an abundance of documentation, and tests encoding the now-incorrect old behavior can apply a strong downward pressure on change.
A little discussed and poorly understood design goal for code is disposability. Given change, what design patterns can we follow that allow us to quickly expunge incorrect behavior from our codebase? Interestingly it is a much more tractable metric for measuring as opposed to more popular criteria like “elegance”. (a post for another day)
Put that in your thinker. Does something like Strategy or Adapter let you throw out whole classes when they prove unnecessary? Or is that so only when you luck out and chose the exact right axes of disposability? Does a microservice really let you discard codebases wholesale? Can maps and functions free you from intertwingled state and behavior or does it move the resistance somewhere else?
Grumpy, opinionated answers: possibly! Even more possibly! Meh. Very meh.
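To make the Strategy question concrete, here is a minimal sketch. Every name in it (`Checkout`, `FlatShipping`, `WeightShipping`, the rates) is hypothetical and not from Kellan's post. Each strategy sits behind one seam, so throwing one out means deleting a class and a registry line. That only pays off if shipping cost really was the axis that changed.

```ruby
# Hypothetical example: shipping-cost strategies behind a single seam.
class FlatShipping
  def call(_order)
    500 # cents, made-up flat rate
  end
end

class WeightShipping
  def call(order)
    order.fetch(:weight_kg) * 120 # cents per kg, made-up rate
  end
end

# The registry is the only place that knows which strategies exist.
SHIPPING_STRATEGIES = {
  flat: FlatShipping.new,
  by_weight: WeightShipping.new
}.freeze

class Checkout
  def initialize(strategies = SHIPPING_STRATEGIES)
    @strategies = strategies
  end

  # Looks up the strategy named by the order and adds its cost.
  def total(order)
    shipping = @strategies.fetch(order.fetch(:shipping)).call(order)
    order.fetch(:subtotal) + shipping
  end
end

order = { subtotal: 2_000, weight_kg: 3, shipping: :by_weight }
puts Checkout.new.total(order)
```

Disposing of `WeightShipping` is a two-line diff here. If the thing that changes is instead how orders are shaped, the seam is in the wrong place and every strategy has to be touched anyway.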
Tinkering with Kinto
Here’s a thing I want to experiment with. Short videos talking about what I’m currently tinkering with. Here’s one!
[Embedded video]
More notes in the repo, if you want to play along at home. Let me know what you think!
Three part method
I find methods/functions decomposed into three parts really satisfying. Consider a typical xUnit test:
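def test_grants_new_role
  # setup
  user = make_user
  new_role = make_new_role

  # behavior under test
  # (the call itself is missing from the original; `grant` is a hypothetical name)
  user.grant(new_role)

  # assert results
  assert_equal [new_role], user.roles
end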
Lately I’ve been structuring Rails controllers similarly:
def create
  # Extract inputs/parameters from HTTP request
  person_params = params.require(:person).permit(:name, :age)

  # Invoke behavior encapsulated in a Plain(ish) Ruby object somewhere
  user = UserService.create_user(person_params)

  # Check the result and make some HTTP output
  if user.persisted?
    redirect_to user_path(user.id)
  else
    @user = user
    render :new
  end
end
Clojure even has the let form, which encourages this style:
; annotated from clj-http
; https://github.com/dakrone/clj-http/blob/master/src/clj_http/util.clj
(defn gzip
  "Returns a gzip'd version of the given byte array."
  [b]
  (when b
    ; set the table
    (let [baos (ByteArrayOutputStream.)
          gos (GZIPOutputStream. baos)]
      ; do the work and clean up
      (IOUtils/copy (ByteArrayInputStream. b) gos)
      (.close gos)
      ; produce a result
      (.toByteArray baos))))
I don’t think there’s anything inherently wrong if a method or function isn’t organized this way. But when I read code structured this way, it feels less like a bunch of random logic and more like a cohesive unit that someone put time into thinking through how someone might try to understand it later. The Rule of Three rules everything around us.
The future of programming is design, teaching, and empathy
The Future Programming Manifesto starts with this header:
Inessential complexity is the root of all evil
OK, I’m on board!
We should measure complexity as the cumulative cognitive effort to learn a technology from novice all the way to expert. One simple surrogate measure is the size of the documentation.
Perhaps we could describe the complexity of a technology in “bookshelves”? For example, in my second internship I met a ClearCase administrator whose office bookcase had one shelf devoted to SunOS, one shelf to Oracle, and the final shelf dedicated to ClearCase itself. How many bookcases for Ruby, Rails, JS, CSS, a database, and all the other stuff you need to know to put a CRUD app in your browser (not even deploy it to the web!)?
- Maintaining compatibility increases complexity.
- Technical debt increases complexity.
- Most R&D is incremental: it adds features and tools and layers. Simplification requires that we throw things away.
- Computer Science rejects simplification as a result because it is subjective.
- The Curse of Knowledge: experts are blind to the complexity they have laboriously mastered.
- Rewarding programmers for their ability to handle complexity selects for those who love it.
- Our gold-rush economy encourages greed and haste.
A weird thing about programmers is that those who rant endlessly about someone else’s complexity, layers, and haste are almost completely blind to the complexity, layers, and haste they make in an effort to set the world just so.
We should work for end-users disenfranchised by lack of programming expertise. We should concentrate on their modest but ubiquitous needs rather than the high-end specialized problems addressed by most R&D. We should take inspiration from end-user tools like spreadsheets and HyperCard. We should avoid the trap of designing for ourselves.
What if more of programming was accessible as data manipulation (cf. spreadsheets, data files, JSX templates) instead of as logic and behavior (i.e. almost every programming language)?
We are doing Design: using experience and judgement to make complex tradeoffs in order to satisfy qualitative human needs.
This reminds me of Developer Experience. “Developer experience” is a weird word right now, but it’s becoming table stakes for success. It’s a design discipline. It’s considering the form and function of code. It’s the opposite of attempting to learn C ;)
Long story short: we’re gonna need more empathy, more design skills, and more teaching skills to reach the next level of great programming languages and tools.
One model doesn't fit all
There are two kinds of developers in the world:
- those who realize data models aren’t monolithic and use business boundaries to their advantage
- those struggling with monolithic data models
Versioning an API is a river delta of pain
Slight rant: versioning a (REST) API inflicts upon you a confluence of factors that will lead to pain no matter what you do.
You’re going to need to version things, which opens you to bikeshedding over which True Scotsman approach to REST versioning you’ll use. Once you’ve expended tons of effort on how clients should specify which version they want (i.e. once you’ve just barely started), you need to figure out how to make that work in your code. Which, after you’re done parsing the HTTP request (the easy part!), is almost certainly going to lead to some unruly layer(s) of indirection. At which point you’re going to hate life and never want to introduce another version ever again. And you won’t even be close to finished.
I hope that, between JSON API and GraphQL, letting the client specify what they want ends up proving way better than relying on the server to carefully (or possibly carelessly) hand craft just the right data for the client.
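For a sense of what that layer of indirection looks like, here is a minimal sketch in plain Ruby. It assumes one of the many possible approaches (a version embedded in a vendor media type in the `Accept` header). The media type `application/vnd.example.vN+json`, the field names, and the default version are all hypothetical.

```ruby
# Hypothetical vendor media type: application/vnd.example.v<N>+json
VERSION_PATTERN = %r{application/vnd\.example\.v(\d+)\+json}

DEFAULT_VERSION = 1

# One serializer per API version: this table is the indirection.
SERIALIZERS = {
  1 => ->(user) { { name: user[:name] } },
  2 => ->(user) { { full_name: user[:name], age: user[:age] } }
}.freeze

# The easy part: pull the requested version out of the header.
def requested_version(accept_header)
  match = VERSION_PATTERN.match(accept_header.to_s)
  match ? match[1].to_i : DEFAULT_VERSION
end

# The part that spreads: every response now goes through a lookup.
def render_user(user, accept_header)
  version = requested_version(accept_header)
  serializer = SERIALIZERS.fetch(version) do
    raise ArgumentError, "unsupported API version: #{version}"
  end
  serializer.call(user)
end

user = { name: "Ada", age: 36 }
p render_user(user, "application/vnd.example.v2+json")
p render_user(user, "application/json")
```

Even in this toy, every new version means another entry per resource, and any behavior change beyond serialization needs its own lookup somewhere else.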
Rails doctrine and Kremlinology
Long story short, Rails now has a nicely written Doctrine that delineates the principles that motivate tradeoffs the framework makes. Any Rails developer can benefit from understanding this and coding with the framework as much as possible.
Short story long: I’ve been trying to mentally track this for a while via something I call DHHology. It’s where I follow @dhh and try to piece together blog posts, tweets, code snippets haphazardly shared, and keynotes to build a mental model for what working with the grain of Rails looks like when you get past a small app.
It’s a lot like Kremlinology, “the study and analysis of the politics and policies of Russia”.
During the Cold War, lack of reliable information about the country forced Western analysts to "read between the lines" and to use the tiniest tidbits, such as the removal of portraits, the rearranging of chairs, positions at the reviewing stand for parades in Red Square, the choice of capital or small initial letters in phrases such as "First Secretary", the arrangement of articles on the pages of the party newspaper Pravda and other indirect signs to try to understand what was happening in internal Soviet politics.
It’s a bit of a stretch, but I think the same kind of “between the lines” thinking helps to understand Rails. Of course, sometimes I’m wrong, but so were the Kremlinologists. On the other hand, there was that one time I got DHH and Gary Bernhardt to kind of agree on Twitter, so that’s nice!
Software design, always on the wrong foot
Software design has probably been broken from the start. The earliest business software, machine language encoded to punch cards, was more about fiddling registers and managing memory locations than doing arithmetic or implementing business logic. Even after you fast forward to Unix and compiled languages, software is still more about managing heap memory and arcane details like file or error pointers than it is about business logic.
Fast forward again to the first web apps and it seems like there’s an opportunity to put business logic in the center and the incidental complexity of the computer on the outsides. Alas, when web apps took off, most of their logic was written in scripting languages which often trade organizing code along boundaries for the thrill of just getting stuff done. Sometimes I don’t think we’ve outgrown that urge.
Software design has always started off on the wrong foot. Maybe we know better now, maybe we’re as lost as ever. Perhaps in the future I will only feign surprise when I come across working software that is not exactly ideal on the inside.
Specific, purposeful emails are great
When I’m emailing with teammates, I try to do them a few favors.
I make my purpose clear, specific, and up front. I often write the whole email, figure out the real purpose, and then move it into the very first sentence and subject line. I’m a little pessimistic, so I figure I’ve got three sentences, tops, to persuade someone to read an email. They are way more likely to retain at least part of my meaning if there are bullet soundbites for those unlikely to read past the first paragraph. When I want to get down to details, it all goes “under the fold” of the soundbites.
If at all possible, I don’t want to generate Yet Another Meeting. I’ve been in too many meetings that could have been an email. Need to update me on a project? Write it out. Have a simple question to ask? Write it out. Have a complex question to ask? Boil it down to three simple ones, write it out. Need to explore an idea? That’s closer to requiring a meeting! Want to talk about something that requires the sophistication of reading faces and vocal inflections? That requires a meeting, go ahead and schedule one!
What I try to avoid, at all costs, is throwing a bunch of random datapoints or ideas together without drawing a conclusion. Some of the most frustrating emails I’ve read ended with “Thoughts?”. If I’m going to email someone, I’m going to ask a specific question or make a specific point. Ending with “thoughts?” leaves it up to the recipient to guess what the sender wants from them and then respond in kind.
Don’t ramble, don’t use a meeting when an email will suffice, do make conclusions and do ask specific questions. I will send you email hugs to thank you for respecting my time.
Easy steps to programming language commitment
Feel pressured by other developers telling you that your programming language of choice is old, bad, or that you should feel bad? Apply this heuristic:
- Try different programming languages until you find one that best fits your brain and the problems you want to solve
- Use that language for everything you can
- When a language comes along that fits your brain or your problems even better, switch to that one, ad infinitum
Don’t let the hype of people with different brains or different problems get you down.
Code needs boundaries, but not too many
Let’s talk about boundaries in programs. I need them, otherwise programs grow increasingly inscrutable and impossible to change. A lack of boundaries is nearly as bad as spaghetti code; i.e. it’s really bad.
But too many boundaries can also make a program inscrutable. Taken to the absurd, a program composed entirely of black boxes, each with a single narrow function and behavior, is all indirection. Indirection is a cost I pay when I introduce boundaries, e.g. “Your princess is in another castle.” I want to have just the right number of boundaries; not too few, not too many.
Further, I want to avoid establishing the wrong boundaries if I can. Boundaries are hard to move around; creating them is an implicit act of making some kinds of changes more difficult. Awkward boundaries make it difficult to write correct code; hurried developers will yield to the temptation to circumvent the boundary. If you do manage to identify an awkward boundary and correct it, you’ll have some temporary churn in your program while you rejigger the boundary and the code on both sides of it.
On the other hand, the right boundaries are wonderful. They create leverage for the developers working on both sides of the boundaries. They get more done, only needing to know about the boundary and not what lies on the other side. Establishing good boundaries is the first step towards encapsulation and abstraction.
We used to make boundaries from packages and libraries. Now we have added out-of-process services, message passing, and infrastructure as boundaries. This will probably turn out as a net benefit, but right now we’re chasing novelty, blog posts, and conference talks at the expense of increased complexity. We aren’t really “engineering” our boundaries.
Creating boundaries too eagerly increases the odds of imposing the wrong boundaries and churning on said boundaries. Creating boundaries too lazily imposes a high cost of change to create those boundaries once you discover them and accept the implementation challenge.
My favorite kind of boundary is a bounded context. It’s a wonderful epiphany that we don’t all have to agree on the precise definition of words and responsibilities if we can agree where the fences (boundaries) go.
Gary Bernhardt has nice things to say about boundaries. If you like this, you’ll love his ideas.
That's a question
In a technical conversation, I love to hear this: “that’s a good question!” Now we are going to talk about something we might have otherwise missed. Later we will look back at a potential crisis averted.
I groan (inside) when I hear: “that’s an interesting question!” Someone is about to bloviate, philosophize, or otherwise derail the conversation. Later, we will reflect on time poorly spent.
I may have a weird, grumpy relationship with technical conversations.
Life's Easy Mode
This morning I walked a half mile, not too far, to a neighborhood coffee shop. I had two breakfast tacos and a sweet-flavored latte.
I can choose to walk, and take a Sunday morning (really, a whole weekend) to myself because I went to college, fooled around with computers a bunch, and happened upon a time of tremendous income growth for people who fooled around with computers a lot.
On the way, I walked down a well-maintained and safe sidewalk in a neighborhood in the middle of teardowns and gentrification. At one point, a small branch had grown over the sidewalk. Not big enough to walk around entirely, just the right size to push away.
But then, like a miracle, the wind blew just so and pushed the branch out of my way. It was like nature’s automatic sliding door.
Seems that’s a pretty good way of summing up the Easy Mode of Life that is being a professional white guy.
Doubt mongering
Doubt mongering. It’s a thing that happens because egos are fragile. Some doubts I’ve heard or uttered myself in the past month:
- That sounds like building a dependency manager, and look how great those are in JavaScript!
- Swagger is an IDL and I had bad experiences with IDLs when using SOAP and/or Thrift so we probably shouldn’t use Swagger.
- Microservices sound like microkernels, and that never took off.
They’re FUD and they work off cognitive biases. When someone’s trying to vent, angle into a conversation, or show how smart they are, doubt mongering can happen.
Some of us are more prone to doubt mongering than others. I’m probably more prone to it than I realize. Writing this is making me cringe inside a little.
What irks me is that I often have to pause to separate the doubt mongering from the little bit of insight inside of it.
Say we’re talking about Swagger, for example. Most human endeavors are flawed. It’s perfectly legitimate to say “not all uses of IDLs have succeeded” and “let’s learn from past experience”. That’s a useful insight!
But it’s not okay to do so in a way that takes the energy out of the conversation. It’s not okay to do so in a way that makes someone feel less smart for suggesting something. It’s not okay to derail. Don’t be a gumption trap.
I still have to remind myself to Yes, And conversations that need a historical context. This isn’t a silver bullet and has its own nuances of application, but at least it’s not a Hard No. It preserves the energy and gumption in a group, rather than sapping it.
NASA: robots everywhere! Military: nuke the moon!
NASA (2014 funding: $17 billion) has sent man to the moon and robots all over the solar system. The military (2015 funding: unfathomable) wanted to nuke the moon. Maybe we could throw more cash at NASA and less at the military industrial complex?
What about event sourcing?
I was chatting about Event Sourced data models with a pal last week. He is really taken by the idea and excited that perhaps it’s a “next big thing” in data modeling. Regretfully, I have an adverse reaction to “next big thing” thinking and pointed out that Event Sourced data models are more complex than the equivalent third-normal form data model. Thus, I said, tooling and education need to set in before Event Sourcing could achieve broad impact.
(Before I proceed, I need to put forth a lament of vocabulary. Events, in this context, are not fine-grained language constructs like in a continuation-passing-style asynchronous system. They are business events, a sale or page impression, or technical events, a request or cache hit. These are not callbacks.)
That said, there are a few strings to pull from Event Sourcing that seem like possible trends:
- Integration via event logs using something like Kafka. The low hanging fruit is to replace background jobs with messages on a Kafka stream. The next step is to think about messaging as reading from a database’s replication log.
- Intermediate storage of historical event records in Hadoop. Once applications are publishing messages on changes to their data, you can slurp up each topic (one per domain model) into a Hadoop table. Then…
- ETL of event logs in place of some messaging/REST integrations. Instead of querying another system or implementing a topic consumer, periodically query the event data in Hadoop. Transform it if necessary and load it into another application’s database. LinkedIn has extensive tooling for this and it seems like they have done their homework.
- Data and databases modeled around the passage of time. Event Sourcing is sort of like introducing the notion of accounting to database records. We can go a step further and model our data such that we can travel forward or back in time, not just recalculate from the past. Git has a model of time. Datomic is modeled on time.
- Event Sourcing as an extension of third-normal form. We still need normalized data models, and we still need the migration, ORM, and reporting tooling built on top of them. Event Sourcing gives us an additional facet to our data. Now, instead of just having the data model, we have the causality that created it. (If you’re curious, probably the enabling technology for storing all that causality is the diminishing cost of storage, adoption of append-only data structures, and data warehouses.)
- Synchronization streams instead of REST for disconnected clients. When you store the events that brought data to where it is, and you have a total ordering on those events, you can keep disconnected applications up to date by sending them the events they’ve missed. This is way better than clever logic for querying the central database to update state without squashing local state. Hand-wavy analogy: think Git instead of SQLite (both are wonderful software).
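The synchronization idea in the last bullet can be sketched in a few lines. This is a toy, in-memory illustration under stated assumptions: `EventLog`, `apply_event`, `catch_up`, and the shopping-cart event types are all hypothetical names, and a real system would persist the log and handle conflicts.

```ruby
# Hypothetical append-only event log with a total ordering (seq).
Event = Struct.new(:seq, :type, :data)

class EventLog
  def initialize
    @events = []
  end

  # Appends an event and assigns it the next sequence number.
  def append(type, data)
    event = Event.new(@events.size + 1, type, data)
    @events << event
    event
  end

  # Everything a client has not seen yet, in order.
  def since(seq)
    @events.select { |event| event.seq > seq }
  end
end

# Current state is a fold over events, so it can be rebuilt from
# scratch or brought up to date from any earlier point.
def apply_event(state, event)
  case event.type
  when :item_added
    state.merge({ event.data[:sku] => event.data[:qty] })
  when :item_removed
    state.reject { |sku, _qty| sku == event.data[:sku] }
  else
    state
  end
end

def catch_up(state, events)
  events.reduce(state) { |acc, event| apply_event(acc, event) }
end

log = EventLog.new
log.append(:item_added, { sku: "tea", qty: 2 })
log.append(:item_added, { sku: "mug", qty: 1 })

# A client syncs, goes offline, and remembers the last seq it saw.
client_state = catch_up({}, log.since(0))
last_seen = 2

# While it is offline, the server records another event.
log.append(:item_removed, { sku: "tea" })

# Reconnecting only needs the events the client missed.
client_state = catch_up(client_state, log.since(last_seen))
p client_state
```

After catching up, the client holds only the mug, which is the same state a full replay from event zero produces. The client never had to query current state and work out a diff.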
In particular, the case for synchronization is when things started clicking for me. Hat tip to David Nolen’s talk on Om Next (start at 17:12) for this. As we continue building native and mobile web apps that are frequently disconnected, we may need an additional tool to augment resource-based workflows. In the same way that perhaps Event Sourcing is something we build as an extension of third-normal form data models, I’ll bet event logs as APIs will pop up more often. But we may see event logs entirely usurping resource workflows. Why implement consuming a log and implementing updates via REST when you could write a log producer and ship new events off to the server?
The developer impedance mismatch I’m finding with message logs is request-reply thinking. There’s a temptation to recreate REST semantics in Kafka topics. If a consumer fails to process a message, does it stop processing entirely, skip the message, or discard the message? Does it notify another consumer via a separate topic, or does it phone home to its developers via an error notification? I haven’t found a satisfying answer to this, but I suspect it’s a matter of time, education, and tooling.
Encapsulation is a tradeoff too
Better understand Encapsulation. I can’t 😍 this article enough:
An individual programmer has fixed limits on how quickly they can build up instructions and later on how quickly they can correct problems. A highly-effective team can support and extend a much larger code base than the sum of its individuals, but eventually the complexity will grow beyond their abilities. There is always some physical maximum after which the work becomes excessively error prone or consistently slower or both. There is no getting around complexity, it is a fundamental limitation on scale.
Useless datapoint: my personal maximum is around three thousand lines of code, or 4–6 weeks of clean-slate effort.
So maybe I need to start encapsulating once I reach that limit?
To get the most out of encapsulation, the contents of the box must do something significantly more than just trivially implement an interface. That is, boxing off something simple is essentially negative, given that the box itself is a bump in complexity. To actually reduce the overall complexity, enough sub-complexity must be hidden away to make the box itself worth the effort.
This has been bugging me for a while. Encapsulation is treated as an unquestionable good by many developers. To question encapsulation is to adopt the opposite, that design isn’t worthwhile.
But it’s a tradeoff! Introducing encapsulation incurs a temporary increase in the net complexity of a system. Over the course of a tactical refactoring of methods and classes, the increased complexity is only observable by one or two developers doing the work.
But, if services are encapsulation (they are!), then rearranging the pieces will leave you paying for the increased complexity for days, weeks, months. Now the encapsulation takes on real costs: the risk of completing it, the burden of explaining to others what you’re doing, etc. That encapsulation better be worth it and not just a hunch!
For example, one could write a new layer on top of a technology like sockets and call it something like ‘connections’, but unless this new layer really encapsulates enough underlying complexity, like implementing a communications protocol and a data transfer format, then it has hurt rather than helped. It is ‘shallow’. What this means is that for any useful encapsulation, it must hide a significant amount of complexity, thus there should be plenty of code and data buried inside of the box that is no longer necessary to know outside of it. It should not leak out any of this knowledge. So a connection that seamlessly synchronizes data between two parties (how? We don’t know) correctly removes a chunk of knowledge out of the upper levels of the system. And it does it in a way that it is clear and easy to triage problems as being ‘in’ or ‘out’ of the box.
My experience is that encapsulation, if it happens at all, starts off shallow. Real encapsulation, where a developer can treat it as a black box, never needing to peek inside to understand the mechanisms or in/out problems, is rare. It takes the best designers of software to achieve it.
We should all be so bold as to attempt building encapsulations of that quality, but not so proud to think that we succeed at it even half the time.
In little programs, encapsulation isn’t really necessary, it might help but there just isn’t enough overall complexity to worry about. Once the system grows however, it approaches the threshold really fast. Fast enough that many software developers ignore it until it is way too late, and then the costs of correcting the code becomes unmanageable.
I feel like prefactoring a program or architecture only increases the complexity growth rate of small systems. A dominant factor in complexity is communication and coordination cost. If you start off with ten classes instead of three, or three services instead of one, you haven’t tripled your complexity, you’ve squared it (or worse).
I’m all for minimal solutions and fighting to keep things small, but not at the cost of incurring large coordination overhead.
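A rough way to put numbers on the “squared it” claim, assuming coordination cost tracks the number of pairs of components that could need to talk to each other. That assumption is a simplification introduced here, not something from the quoted article.

```ruby
# Number of distinct pairs among n components: n * (n - 1) / 2.
def pairwise_channels(components)
  components * (components - 1) / 2
end

[1, 3, 10].each do |n|
  puts "#{n} components: #{pairwise_channels(n)} possible pairwise channels"
end
```

Under that model, one service has no channels and three services have three. Three classes have three channels and ten classes have forty-five, so roughly tripling the parts multiplies the possible interactions by fifteen.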
To build big systems, you need to build up a huge and extremely complex code base. To keep it manageable you need to keep it heavily organized, but you also need to carve out chunks of it that have been done and dusted for the moment, so that you can focus your current efforts on moving the work forward. There are no short-cuts available in software development that won’t harm a project, just a lot of very careful, dedicated, disciplined work that when done correctly, helps towards continuing the active lifespan of the code.
Emphasis mine. In a successful system, size and complexity are nearly unavoidable. Almost every “best practice” and “leading edge approach” we know of is contextual and expresses trade-offs. Thus I’m left agreeing that the unsatisfying, hand-wavy craft of “careful, dedicated, disciplined work” is the principle most likely to generate code that improves (rather than regresses) over its lifetime.
Bridging design and development with data
Programming and designing with Pure UI:
The process involved, among other things, creating a new UI, ditching the dependency on Flash in favor of HTML5 and introducing new functionality…The particular way in which I implemented it led me to some interesting insights around the growing convergence of the designer and programmer roles…The fundamental idea I want to discuss is the definition of an application’s UI as a pure function of application state.
This pulls together three threads:
- that design and development are duals in a deep way
- thinking in data structures is useful even if you aren’t using gobs of parentheses (i.e. Lisp)
- removing resistance to experimenting with software behavior, in this case by describing behavior with data structures instead of conditionals in code, yields good things (see also Bret Victor)
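A minimal sketch of “UI as a pure function of application state”, written in Ruby rather than the JavaScript the article uses. The upload widget, its states, and the markup are hypothetical. What matters is the shape: the visual states live in a data structure, and `render` only looks one up.

```ruby
# Hypothetical upload widget: every visual state is a row of data.
VIEWS = {
  idle:      { label: "Choose a file",   button: "Upload" },
  uploading: { label: "Uploading...",    button: "Cancel" },
  done:      { label: "Upload complete", button: "Upload another" },
  failed:    { label: "Upload failed",   button: "Retry" }
}.freeze

# Pure function: same state in, same markup out, no side effects.
def render(state)
  view = VIEWS.fetch(state.fetch(:status))
  "<p>#{view[:label]}</p><button>#{view[:button]}</button>"
end

puts render({ status: :idle })
puts render({ status: :failed })
```

Because the states are rows in a table, a designer can list every state the widget can be in, and trying out a new one means adding a row rather than threading another conditional through the code.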
Medium-term bet: Facebook, through tools like React(-Native), continues to push tasks that were previously outside of “text editors”, such as visual design and animations, into things-resembling-code via the function-of-state paradigm that React is sneaking into people’s brains.
(Also, the use of a fixed-width font in the page design there is 💯)
Microservices in context
An interview with John Allspaw, on Etsy infrastructure and operations:
For example, a good friend of mine runs and has run an electronic trading exchange. You could imagine his goals and constraints when designing an electronic trading exchange are very different than, say, Facebook. Facebook might be very different architecturally because they have different constraints than Amazon. And Amazon might be different than even Etsy. When you have a conversation that unnecessarily paints the discussion as, “Are you micro-services or are you a monolith?” then it wipes away all of the context-specificity. Which you actually have no real way of talking in specifics.
Compared to the previous buzzword, SOA, what does microservices mean? As far as I can tell, it’s two things:
- A Rorschach test. What do you see in this buzzword? What does it say to you?
- A signaling mechanism. I’m most likely to hear about microservices from those trying to distinguish themselves from those other people who write code that doesn’t share their values.
Context-specificity is the important part. I’ve been reading David Byrne’s How Music Works and he spends the first chapter entirely on how the performance venue (a savannah, a noisy club, an austere concert hall) puts its mark on the music that is performed there (percussion oriented, loud and compressed, or quiet and precise).
In architecture, context is also king. Building and deploying services is different at Heroku, Netflix, Facebook, and the place where you work. You can build services of varying size and complexity anywhere on any stack. What the team, culture, and organization prefers is the real determinant.
I find it useful to read about other people’s service architectures to learn what works elsewhere. Even better if they describe the context they built that service architecture in. But it is always foolish cargo-culting to attempt to replicate another team’s architecture without the team and organizational context in which it was born.
When we model
I’ve observed a few levels of modeling (i.e. thinking about a problem and describing it in concepts plus data structures) that software developers do in the wild:
- structural modeling, describing the structure of the problem domain and representing that directly in code, probably using the concepts that your ORM or data layer provide
- operational modeling, evolving a structural model to include models of the operations and workflows that interact with the structural models
- deep modeling, evolving an operational model to include language that describes how the model, problem domain, and solution domain interact and describe each other
A structural model is what happens in a “just ship it” culture. If you’re lucky, you might start thinking about an operational model as you convert that just-ship-it app into an ecosystem of services connected by APIs and messaging.
Any of these models could poof into existence at a higher level. That is, a team could pop out an operational or deep model of a system on their first try. This is even more likely if it’s their second or third take on a problem domain.
Some ideas for kinds of even-higher level modeling that high-functioning teams perform: error-case modeling, coordinated system modeling, social modeling, migration modeling.
And, let’s not even speak of metamodeling :P