Categorizing and understanding magical code

Sometimes, programmers like to disparage “magical code”. They say magical code is causing their bugs, magical code is offensive to use, magical code is harder to understand, we should try to write “less magical” code.

“Wait, what’s magic?”, I hear you say. That’s what I’m here to talk about! (Warning: this post contains an above-average density of “air quotes”, ask your doctor if your heart is strong enough for “humorous quoting”.)

Magic is code I have yet to understand

It’s not inscrutable code. It’s not bad code. It doesn’t intentionally defy understanding, like an obfuscated code contest or code golfing.

I can start to understand why a bit of code is frustratingly magical to me by categorizing it. (Hi, I’m Adam, I love categorizing things, it’s awful.)

“Mathemagical” code escapes my understanding due to its foundation in math and my lack of understanding therein. I recently read Purely Functional Data Structures, which is a great book, but the parts on proving e.g. worst-case cost for amortized operations on data structures are completely beyond my patience or confidence in math. Once Greek symbols enter the text, my brain kinda “nope!”s out.

“Metamagic” is hard to understand due to its use of metaprogramming. Code that generates code inside code is a) really cool and b) a bit of a mind exploder at first. When it works, it’s glorious and not “magical”. When it falls short, it’s a mess of violated expectations and complaints about magic. PSA: don’t metaprogram when you can program.
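
To make that concrete, here’s a tiny, made-up Ruby sketch of the kind of metaprogramming I mean. It’s delightful when it works, and it’s also why grepping for `timeout` turns up no `def` anywhere:

class Settings
  # Generate a reader method for each setting at load time. There is
  # no literal `def timeout` in the source to grep for -- that's the
  # metamagic.
  %w[timeout retries verbose].each do |name|
    define_method(name) { @values.fetch(name) }
  end

  def initialize(values)
    @values = values
  end
end

Settings.new("timeout" => 30).timeout # => 30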

“Sleight of hand” makes it harder for me to understand code because I don’t know where the control flow or logic goes. Combining inheritance and mixins in Ruby is a good example of control flow sleight-of-hand. If a class extends Foo, includes Bar, and all three define a method do_the_thing, which one gets called (trick question: all of them, trick follow-up question: in what order!)? The Rails router is a good example of logical sleight-of-hand. If I’m wondering how root to: "some_controller#index" works and I have only the Rails sources on me, where would I start looking to find that logic? For the first few years of Rails, I’d dig around in various files before I found the trail to that answer.
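
For the inheritance-plus-mixins trick questions, here’s a minimal sketch (assuming each definition calls super so the whole chain runs); Ruby’s ancestor order decides who goes first:

class Foo
  def do_the_thing
    ["Foo"]
  end
end

module Bar
  def do_the_thing
    ["Bar"] + super
  end
end

class Thing < Foo
  include Bar

  def do_the_thing
    ["Thing"] + super
  end
end

Thing.ancestors        # => [Thing, Bar, Foo, Object, Kernel, BasicObject]
Thing.new.do_the_thing # => ["Thing", "Bar", "Foo"]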

“Multi-level magic schemes” is my new tongue-in-cheek way to explain a tool like tmux. It’s a wonderful tool for those of us who prefer to work in (several) shells all day. I’m terrified of when things go wrong with it, though. To multiplex several shells into one process while persisting that state across user sessions requires tmux to operate at the intersection of Unix shells, process trees, and redrawing interfaces to a terminal emulator. I understand the first two in isolation, but when you put it all together, my brain again “nope!”s out of trying to solve any problems that arise. Other multi-level magic schemes include object-relational mappers, game engines, operating system containers, and datacenter networking.

I can understand magic and so can you!

I’m writing this because I often see ineffective reactions to “magical” code. Namely, 1) identify code that is frustrating, 2) complain on Twitter or Slack, 3) there is no step 3. Getting frustrated is okay and normal! Contributing only negative energy to the situation is not.

Instead, once I find a thing frustrating, I try to step back and figure out what’s going on. How does this bit of code or tool work? Am I doing something that it recommends against or doesn’t expect? Can I get back on the “golden path” the code is built for? Can I find the code and understand what’s going on by reading it? Often some combination of these side quests puts me back on my way and out of frustration’s reach.

Other times, I don’t have time for a side quest of understanding. If that’s the case, I make a mental note that “here be dragons” and try to work around it until I’m done with my main quest. Next time I come across that mental map and remember “oh, there were dragons here!”, I try to understand the situation a little better.

For example, I have a “barely tolerating” relationship with webpack. I’m glad it exists, it mostly works well, but I feel its human factors leave a lot to be desired. It took a few dives into how it works and how to configure it before I started to develop a mental model for what’s going on such that I didn’t feel like it was constantly burning me. I probably even complained about this in the confidence of friends, but for my own personal assurances, attached the caveat of “this is magical because it’s unfamiliar to me.”

Which brings me to my last caveat: all this advice works for me because I’ve been programming for quite a while. I have tons of knowledge, the kind anyone can read and the kind you have to win by experience, to draw upon. If you’re still in your first decade of programming, nearly everything will seem like magic. Worse, it’s hard to tell what’s useful magic, what’s virtuous magic, and what’s plain-old mediocre code. In that case: when you’re confronted with magic, consult me or your nearest Adam-like collaborator.

On Code Review:

Bias to small, digestible review requests. When possible, try to break down your large refactor into smaller, easier to reason about changes, which can be reviewed in sequence (or better still, orthogonally). When your review request gets bigger than about 400 lines of code, ask yourself if it can be compartmentalized. If everyone is efficient at reviewing code as it is published, there’s no advantage to batching small changes together, and there are distinct disadvantages. The most dangerous outcome of a large review request is that reviewers are unable to sustain focus over many lines, and the code isn’t reviewed well or at all.

This has made code review of big features way more plausible on my current team. Large work is organized into epic branches which have review branches which are individually reviewed. This makes the final merge and review way more tractable.

Your description should tell the story of your change. It should not be an automated list of commits. Instead, you should talk about why you’re making the change, what problem you’re solving, what code you changed, what classes you introduced, how you tested it. The description should tell the reviewers what specific pieces of the change they should take extra care in reviewing.

This is a good start for a style guide à la git commit messages!

Fewer changes are faster to deploy than more changes

Itamar Turner-Trauring, Incremental results: how to succeed at large software projects:

  • Faster feedback…
  • Less unnecessary features…
  • Less cancellation risk…
  • Less deployment risk…

👏 👏 👏 👏 👏  read the whole thing, Itamar’s tale is well told.

Consider: incremental approaches consist of taking a large scope and finding smaller, still-valuable scopes inside of it. Risk is 100% proportional to scope. Time-to-deliver grows as scope grows. Cancellation and deployment risk grow as time-to-deliver grows. It’s not quite math, but it’s easy to demonstrate on a whiteboard, in case you happen to work with someone who wants large scope, low risk, and low time-to-delivery.

TIL: divide by 10 with this one weird number

Running an application across two physical databases is not a straightforward thing. One of the relatively easier ways to do it involves assigning each database instance a shard number and then arranging for all your primary key IDs to end with that number. For example, shard 0 generates IDs like 1230, 40, and 482340; shard 1 generates IDs like 1231, 41, and 482341; shard 2 generates IDs like 1232, 42, and 482342; and so on, up to shard 9. If you want more than 10 database shards, it gets more involved.
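
In Ruby-ish terms (a hypothetical sketch, not any particular database’s scheme), generating those IDs looks something like this:

SHARD_COUNT = 10

# Build IDs whose last decimal digit is the shard number. Real systems
# fold in per-shard sequences, timestamps, and so on; this only shows
# the last-digit idea.
def sharded_id(sequence, shard)
  sequence * SHARD_COUNT + shard
end

sharded_id(123, 0)   # => 1230, lives on shard 0
sharded_id(123, 1)   # => 1231, lives on shard 1
sharded_id(48234, 2) # => 482342, lives on shard 2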

My brain is wired oddly, so I came to wonder how you would quickly get the shard ID for an ID (e.g. shard 1 for 1231). This is really easy with decimal math: the shard is the last digit, i.e. the remainder after dividing by 10. However, we run our databases on computers that only do binary math, so it’s not actually that simple.

But it turns out you can do it quite fast! There’s one weird number, `0x1999999A` in hexadecimal, and multiplying by it is very close to multiplying by the fraction 1/10 (plus some further binary math and register trickery). Thus you can divide by 10 in only a few instructions on Intel processors released in the past twenty years.
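
Here’s a rough Ruby rendition of the idea (just a sketch: Ruby integers are arbitrary precision, so this only mimics the 32-bit multiply-and-shift a CPU would do, and real compilers pick a slightly different constant and shift so the result is exact across the whole 32-bit range):

MAGIC = 0x1999999A # roughly (2**32) / 10, rounded up

# Approximate id / 10 with a multiply and a shift instead of a divide.
def div10(id)
  (id * MAGIC) >> 32
end

# The shard is whatever is left after subtracting 10 * (id / 10).
def shard_for(id)
  id - 10 * div10(id)
end

div10(1231)       # => 123
shard_for(1231)   # => 1
shard_for(482342) # => 2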

I’m really glad someone else figured this out.

OAuth2 🔥-takes

Is it too late to do hottakes for something that’s been around for nearly a decade?

OAuth2 pros:

  • I can allow other sites to use my data with some confidence that, at least, my authentication information won’t leak
  • It has made really cool stuff possible at my current workplace and workplace-2
  • Libraries to make it happen in server-side apps are pretty good

Cons:

  • There are a bajillionty implementations and standard definitions of OAuth2 (for somewhat justifiable reasons)
  • If you want to tinker with an OAuth2 API, you’re in for a bit of hurt because you can’t just grab a token and start playing (mostly, depending on the implementer)
  • Those open source libraries are the kind of thing that drive maintainers away pretty quickly 😬

Overall: would not uninvent this technology.

The emotional rollercoaster of extracting code

There’s a moment of despair when extracting functionality from a larger library, framework, or program. The idea grows, a seed at first and then a full-blown tree, that the coupling in this functionality isn’t all bad. A lot of people talk only about coupling and leave out cohesion. They aren’t mutually exclusive! When the two are balanced, it’s hard to come up with a reason to start extracting.

On the other hand, sometimes that moment of despair strikes when you start really digging into the domain and realize this chunk of functionality isn’t what you thought it was. Maybe it’s not coherent (see above!) or perhaps the model of the domain isn’t deep enough. This is a pretty good signal to hit the brakes on the refactoring, figure the domain out, and reconsider the course of action.

Feature envy rears its head in extractions too. Patterns of crosstalk between the existing thing and the new thing are a sure sign of feature envy. It’s tempting to say, hey maybe you really need a third thing in the middle. That’s probably making matters worse though.

That said, changing bidirectional communication to unidirectional is usually a positive thing. Same for replacing any kind of asynchronous communication with synchronous. Or replacing lockstep coordination with asynchronous messaging. Envy is tricky!

I often encourage starting a new service or application within your existing “mothership”. The trendy way to say this right now is “monorepo all the things” or build a “modular monolith”. I find this compelling because you can leverage a lot of the effort already put into operationalizing, tooling, and infrastructure. Once you know the domain and technical concerns specific to the new thing, you can easily extract it into its own thing if you need to. The other edge of the monorepolith is that path dependence is a hell of a thing: today is almost certainly an easier day to split stuff out than tomorrow.

One extraction worth considering is a backend-for-frontend service, built in pursuit of a specific frontend. It doesn’t even have to serve an application: you may have services that are specific to mobile, desktop, apps, APIs, integrations, etc. Each of these may need drastically different rates of change, technical features, and teams.

Probably don’t split out a service so that a bunch of specialized people can build a “center of excellence” for the rest of the organization to rely upon. This is a very fancy way to say “we are too cool for everyone else and we just can’t stand the work everyone else is doing”. On their best day, the Excellence team will be overwhelmed by the volume of work they have put in front of themselves to make Everything Good. On their worst day, they will straight give up.

If you split something out, realize you’re going to have to maintain it until you replace it. And you’re going to rebuild the airplane while it’s flying. If you’re not really into that, stop now. Just because you can’t stand Rails, relational databases, or whatever doesn’t mean you should jump into an extraction.

More ideas for framework people

A few months ago I wrote about Framework and Library people. I had great follow-up conversations with Ben Hamill, Brad Fults, and Nathan Ladd about it. Some ideas from those conversations:

  • use a well-worn framework when it addresses your technical complexities (e.g. expose functionality via the web or build a 3-d game) and your domain complexity (e.g. shopping, social networking, or multi-dimensional bowling) is your paramount concern

  • once you have some time/experience in your problem domain, start rounding off corners to leave future teammates a metaframework that reduces decision/design burdens and gives them some kind of golden path

  • frameworks may end up less useful as integration surface area increases

  • napkin math makes it hard to justify not using a framework; without one you have to build all that machinery yourself and accept the cost of not having a community to support you and hire from

  • to paraphrase Sandi Metz on the wrong abstraction: “(Using) no abstraction is better than the wrong abstraction”; if you’ve had a bad time with a framework, chances are it was an inappropriate abstraction or you used the abstraction incorrectly

Did you try editing the right file?

The first few years of my career, I edited the wrong file all the time. I could spend hours making changes, wondering why nothing was happening, until I realized I’d been tinkering in the wrong place because I was misreading a file path or not paying close enough attention to control flow.

Fast forward to now, and I’m pretty quick to drop a raise "BLORP" in code I’m tinkering with if things aren’t working like I think they should. All hail puts debuggerering.

However, it turns out I found a new class of this operator error today. I was diligently re-running a test case and expecting new results, but the test fixture file I thought I had changed was the wrong file. Once I deleted the right file, I was back on my way.

Joyful and grumpy are we who can find new ways to screw up every day!

Chaining Ruby enumerators

I want to connect two Ruby enumerators. Give me all the values from the first, then the second, and so on. Ideally, without forcing any lazy evaluations and flat so I don’t have to think about nested stuff. Like so:

xs = [1, 2, 3].to_enum
ys = [4, 5, 6].to_enum
[xs, ys].chain.to_a # => [1, 2, 3, 4, 5, 6]

I couldn’t figure out how to do that with Ruby’s standard library alone. But, it wasn’t that hard to write my own:

def chain(*enums)
  return to_enum(:chain, *enums) unless block_given?

  enums.each { |enum| enum.each { |e| yield e } }
end
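
For what it’s worth, the hand-rolled version is called as a plain function rather than the `[xs, ys].chain` I wished for above, and since it hands back an Enumerator, you can keep composing it:

xs = [1, 2, 3].to_enum
ys = [4, 5, 6].to_enum

chain(xs, ys).to_a                             # => [1, 2, 3, 4, 5, 6]
chain(xs, ys).lazy.map { |n| n * 10 }.first(2) # => [10, 20]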

But it seems like Ruby’s library, Enumerable in particular, is so strong I must have missed something. So, mob programmers, is there a better way to do this? A fancier enumerator-combining thing I’m missing?

Stored Procedure Modern

The idea behind Facebook’s Relay is to write declarative queries, put them next to the user interaction code that uses them, and compose those queries. It’s a solid idea. But this snippet about Relay Modern made me chuckle:

The teams realized that if the GraphQL queries instead were statically known — that is, they were not altered by runtime conditions — then they could be constructed once during development time and saved on the Facebook servers, and replaced in the mobile app with a tiny identifier. With this approach, the app sends the identifier along with some GraphQL variables, and the Facebook server knows which query to run. No more overhead, massively reduced network traffic, and much faster mobile apps.

Relay Modern adopts a similar approach. The Relay compiler extracts colocated GraphQL snippets from across an app, constructs the necessary queries, saves them on the server ahead of time, and outputs artifacts that the Relay runtime uses to fetch those queries and process their results at runtime.

How many meetings did they need before they renamed this from “GraphQL stored procedures” to “Relay Modern”?

(FWIW, I worked on a system that exposed stored procedures through a web service for client-side interaction code. It wasn’t too bad, setting aside the need to hand write SQL and XSLT.)
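
For a sense of what that “tiny identifier” flow looks like on the wire, here’s a hypothetical Ruby sketch (made-up endpoint, field names, and identifier; not Facebook’s or Relay’s actual format):

require "net/http"
require "json"

# Before: every request ships the whole query text.
full_request = {
  query: "query Profile($id: ID!) { user(id: $id) { name friends { count } } }",
  variables: { id: "4" }
}

# After: the query was stored server-side at build time; send only its id.
persisted_request = {
  documentId: "abc123", # hypothetical identifier assigned at build time
  variables: { id: "4" }
}

uri = URI("https://api.example.com/graphql")
Net::HTTP.post(uri, JSON.generate(persisted_request), "Content-Type" => "application/json")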