Three Easy Essays on Distributed Systems

Ryan Smith is pretty good at thinking about distributed systems. Distributed systems, the systems we (sometimes unwittingly) create on a regular basis these days, are a complicated, dense, far-reaching topic. Ryan’s managed to take a few of its problems and concisely introduce them with simple solutions that apply to all but the largest systems.

In The Worker Pattern, he presents a novel solution to a problem you are probably tackling with background or asynchronous job queues. Teaser: do you know what the HTTP 202 status code does?

A web service that requires high throughput will undoubtedly need to ensure low latency while processing requests. In other words, the process that is serving HTTP requests should spend the least amount of time possible to serve the request. Subsequently if the server does not have all of the data necessary to properly respond to the request, it must not wait until the data is found. Instead it must let the client know that it is working on the fulfillment of the request and that the client should check back later.
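That's exactly what HTTP 202 Accepted is for. Here's a framework-free sketch of the flow; the route names and the in-memory job store are my invention, not Ryan's code:

```ruby
require 'securerandom'

# In-memory stand-in for wherever jobs really live (Redis, a table, etc.).
JOBS = {}

# POST /reports -- accept the work and answer immediately with a 202.
def create_report(_params)
  id = SecureRandom.hex(8)
  JOBS[id] = { status: :pending, result: nil }
  # A background worker picks the job up from here; the client gets a
  # Location to poll in the meantime.
  [202, { 'Location' => "/reports/#{id}" }, 'Accepted']
end

# GET /reports/:id -- hand over the result, or say "check back later".
def show_report(id)
  job = JOBS.fetch(id)
  if job[:status] == :done
    [200, {}, job[:result]]
  else
    [202, { 'Retry-After' => '2' }, 'Still working']
  end
end
```

The request-serving process never blocks on the slow work; the client polls the Location until the worker has filled in the result.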

Coordinating multiple processes that need to process a dataset in bulk is tricky. Large systems usually end up needing some kind of consensus service, à la Paxos, like Doozer or ZooKeeper to keep all the worker processes from butting heads or duplicating work. Leader Election shows how, by scoping the problem space to existing tools, it becomes possible to put together a solution that scales down to small and medium-sized systems:

My environment already is dependent on Ruby & PostgreSQL so I want a solution that leverages my existing technologies. Also, I don’t want to create a table other than the one which I need to process.
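Whatever the exact SQL, the shape of the trick is a try-lock: every worker attempts to grab one shared lock, the winner leads for a while, and everyone else backs off. Here's that shape sketched with an in-process Mutex; with PostgreSQL you'd attempt something like an advisory lock instead. This is my illustration of the idea, not Ryan's exact solution:

```ruby
# Stand-in for a shared lock, e.g. a Postgres advisory lock.
LOCK = Mutex.new

# Each worker calls this on a timer. Exactly one caller at a time wins
# the election and processes the batch; the rest try again next tick.
def elect_and_work(worker_id, processed)
  return false unless LOCK.try_lock
  begin
    processed << worker_id   # the leader's turn to chew through rows
    true
  ensure
    LOCK.unlock
  end
end
```

Because the lock lives in the database you already run, no extra coordination service (or extra table) is needed.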

As applications grow, they tend to maintain more and more state across more and more systems. Incidental state is problematic, especially when you have to maintain several services to keep all of it available. Applying Event Buffering mitigates many of these problems. The core idea of this one is my favorite:

We have seen several examples of how to transfer state from our client to our server. The primary reason that we take these steps to transfer state is to eliminate the number of services in our distributed system that have to maintain state. Keeping a database on a service eventually becomes an operational hazard.
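The core move is easy to picture: the client piles events up locally and ships them in batches, so nothing in the middle has to hold state. A minimal sketch, with the flush sink as a plain block (in practice it would be an HTTP POST to the server):

```ruby
class EventBuffer
  def initialize(flush_size, &sink)
    @flush_size = flush_size
    @sink = sink   # e.g. POST the batch to the server's collection endpoint
    @events = []
  end

  # Buffer an event, shipping a batch once enough have piled up.
  def record(event)
    @events << event
    flush if @events.size >= @flush_size
  end

  # Ship whatever is buffered, e.g. on shutdown or a timer tick.
  def flush
    return if @events.empty?
    @sink.call(@events)
    @events = []
  end
end
```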

Most of the systems we build on the web today are distributed systems. Ryan’s writings are an excellent introduction to thinking about and building these systems. It certainly helps to comb through research papers on the topic, but these three essays are excellent starters down the path to intentionally building distributed systems.


Ruby anthropology with Hopper

Zach Holman is doing some interesting code anthropology on the Ruby community. Consider Aggressively Probing Ruby Projects:

Hopper is a Sinatra app designed to pull down tens of thousands of Ruby projects from GitHub, snapshot each repository into ten equidistant revisions, run them through a battery of tests (which we call Probes), and hopefully come up with some deeply moving insights about how we write Ruby.

There are plenty of code metric gizmos out there. At a glance, Hopper takes a few nice steps over extant projects. Unlike previous tools, it has a clear design, an obvious extension mechanism, and the analysis tools are distinct from the reporting tools. Further, it’s designed to run out-in-the-open, on existing open source projects. This makes it immediately useful and gives it a ton of data to work with.

For entertainment, here’s some information collected on some stuff I worked on at Gowalla: Chronologic and Audit.


A real coding workspace

Do you miss the ability to take a bunch of paper, books, and writing utensils and spread them out over a huge desk or table? Me too!

Light Table is based on a very simple idea: we need a real work surface to code on, not just an editor and a project explorer. We need to be able to move things around, keep clutter down, and bring information to the foreground in the places we need it most.

This project is fantastic. It’s taking a page from the Smalltalk environments of yore, cross-referencing that with Bret Victor’s ideas on workspace interactivity. The result is a kick-in-the-pants to almost every developer’s current workflow.

There’s a lot to think about here. A lot of people focus on making their workflow faster, but what about a workspace that makes it easier to think? There’s a lot of room to design a better workspace, even if you’re not going as far as Light Table does.

There’s a project on Kickstarter to fund further development of Light Table. If you write software, it’s likely in your interest to chip in.


UserVoice's extremely detailed project workflow

Some nice people at UserVoice took the time to jot down how they manage their product. Amongst the lessons learned:

Have a set amount of time per week that will be spent on bugs

We have roughly achieved this by setting a limit on the number of bugs we’ll accept into Next Up per week. This was a bit contentious at first but has resolved a lot of strife about whether a bug is worthy. The customer team is now empowered (or burdened) with choice of choosing which cards will move on. It’s the product development version of the Hunger Games.

This, to me, is an interesting juxtaposition. Normally, I think of bugs as things that should all be fixed, eventually. Putting some scarcity of labor into them is a great idea. Fixing bugs is great, until it negatively affects morale; better to address the most critical and pressing bugs and then move the product ball forward. A mechanism to limit the number of bugs to fix, plus the feedback loop of recognizing those who fix bugs in an iteration (they mention this elsewhere in the article), works well on both counts.


Cowboy dependencies

So you’ve written a cool open source library. It’s at the point where it’s useful. You’re pretty excited. Even better, it seems like something that might be useful at your day job. You could go ahead and integrate it. Win-win! You get to work out the rough edges on your open source project and make progress on your professional project.

This is tricky ground and it’s not as win-win as you might think. Integrating a new dependency, whether it’s one maintained by a teammate or not, requires communication. Everyone on the team will have to know about the dependency, how to work with it, and how to maintain it within the project. If there’s a deal-breaking concern with the library, consider it feedback on your library; it either needs to better address the problem, or it needs better documentation to address why the problem isn’t so much a problem.

It all comes down to communication. Adding a dependency, even if you know the person who wrote it really well, requires collaboration from your teammates. If you’re not talking to your teammates, you’re just cowboy coding.

Don’t cowboy dependencies into your project!


A Presenter is a signal

When someone says “your view or API layer needs presenters”, it’s easy to get confused. Presenter has become wildcard jargon for a lot of different sorts of things: coordinators, conductors, representations, filters, projections, helpers, etc. Even worse, many developers are in the “uncanny valley” stage of understanding the pattern; it’s close to being a thing, but not quite. I’ve come across presenters with entirely too much logic, presenters that don’t pull their weight as objects, and presenters that are merely indirection. Presenter is becoming a catch-all for organizing logic more strictly than your framework typically provides for.

I could make it my mission to tell every single person they’re wrong about presenters, but that’s not productive and it’s not entirely correct. Rather, presenters are a signal. When you say “we use presenters for our API”, I hear you say “we found we had too much logic hanging around in our templates and so we started moving it into objects”. From there on out, every application is likely to vary. Some applications are template-focused and so need objects focused on presentational logic. Other apps are API-focused and need more support in the area of emitting and parsing JSON.
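In the template-focused case, that move usually looks something like this: a plain object wraps the model and soaks up the display decisions the template had been making. All the names here are invented for illustration:

```ruby
require 'date'

Order = Struct.new(:total_cents, :shipped_on)

# Wraps an Order for the template's benefit: formatting and display
# logic lives here instead of scattered through ERB.
class OrderPresenter
  def initialize(order)
    @order = order
  end

  def total
    format('$%.2f', @order.total_cents / 100.0)
  end

  def shipping_status
    return 'Processing' unless @order.shipped_on
    "Shipped #{@order.shipped_on.strftime('%B %-d')}"
  end
end
```

The win isn't the pattern name; it's that this logic is now trivially unit-testable and the template shrinks to method calls.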

At first, I was a bit concerned about the explosion of options for putting more objects into the view part of your typical model-view-controller application. But as I see more applications and highly varied approaches, I’m fine with Rails not providing a standard option. Better to decide what your application really needs and craft it yourself or find something viable off the shelf.

As long as your “presenters” reduce the complexity of your templates and make logic easier to decouple and test, we’re all good, friend.


Learn Unix the Jesse Storimer way

11 Resources for Learning Unix Programming:

I tend to steer clear of the thick reference books and go instead for books that give me a look into how smart people think about programming.

I have a soft spot in my heart for books that are way too long. But Jesse’s on to something, I think. The problem with big Unix books is that they are tomes of arcane rites; most of it just isn’t relevant to those building systems on a modern Unix (Linux) with modern tools (Java, Python, Ruby, etc.).

Jesse’s way of learning you a Unix is way better, honestly. Read concise programs, cross-reference them with manual pages. Try writing your own stuff. Rinse. Repeat.


How to approach a database-shaped problem

When it comes to caching and primary storage of an application’s data, developers are faced with a plethora of shiny tools. It’s easy to get caught up in how novel these tools are and get over enthusiastic about adopting them; I certainly have in the past! Sadly, this route often leads to pain. Databases, like programming languages, are best chosen carefully, rationally, and somewhat conservatively.

The thought process you want to go through is a lot like what former Gowalla colleague Brad Fults did at his new gig with OtherInbox. He needed to come up with a new way for them to store a mapping of emails. He didn’t jump on the database of the day, the system with the niftiest features, the one with the greatest scalability, or the one that would look best on his resume. Instead, he proceeded as follows:

  1. Describe the problem domain and narrow it down to two specific, actionable challenges
  2. Elaborate on the existing solution and its shortcomings
  3. Identify the possible databases to use and summarize their advantages and shortcomings
  4. Describe the new system and how it solves the specific challenges

Of course, what Brad wrote is post-hoc. He most likely did the first two steps in a matter of hours, took some days to evaluate each possible solution, decided which path to take, and then hacked out the system he later wrote about.

But more importantly, he cheated aggressively. He didn’t choose one database, he chose two! He identified a key attribute of his problem: he only needed a subset of his data to be relatively fresh. This gave him the luxury of choosing a cheaper, easier data store for the complete dataset.

In short: solve your problem, not the problem that fits the database, and cheat aggressively when you can.


How I use vim splits

[vimeo www.vimeo.com/38571167 w=500&h=394]

A five-minute exploration of how I use splits in vim to navigate between production or test code and how I move up and down through layers of abstractions. Spoiler: I use vertical splits to put test and production code side-by-side; horizontal splits come into play when I’m navigating through the layers of a web app or something with a client/server aspect to it. I haven’t customized vim to make this work; I just use the normal window keybindings and a bunch of commands from vim-rails.

I seriously heart this setup. It’s worth taking thirty minutes to figure out how you can approximate it with whatever editor setup you enjoy most. Hint: splits are really fantastic.


Tool agnosticism is good for you

When it comes to programming editors, frameworks, and languages, you’re likely to take one of three stances: marry, boff, or kill. There are tools that you want to use for the rest of your life, tools that you want to moonlight or tinker with, and tools you never want to use again in your life.

Tool antagonism, ranting about the tools you want to kill, is fun. It plays well on internet forums. It goes nicely with beer and friends. But on a team that isn’t using absolutely terrible tools, it’s a waste of time.

Unless your team is bizarrely like-minded, it’s likely some disagreement will arise along the lines of editors, frameworks, and languages. I’ve been that antagonistic guy. We can’t use this language, it has an annoying feature. We can’t use this framework, it doesn’t protect us from an annoying kind of bug. I’ve learned these conversations are a waste of social capital and unnecessarily divisive. Don’t be that guy.

There are usually a few tools that are appropriate to a given task. I often have a preference and choose specific tools for my personal projects. There are other tools that I’m adept at using or have used before, but find minor faults with. I’ve found it way better for me to accept that other, non-preferred tool if it’s already in place or others have convictions about using it. Better to put effort into making the project or product better than spinning wheels on programming arcanery.

When it comes to programmer tools, rational agnosticism beats antagonism every time. Train yourself to work amazingly well with a handful of tools, reach adeptness with a few others, and learn how to think with as many tools as possible.


Rails 4, all about scaling down?

To some, Rails is getting big. It brings a lot of functionality to the table. This makes apps easier to get off the ground, especially if you aren’t an expert. But, as apps grow, it can lead to pain; there’s so much machinery in Rails, it’s likely you’re abusing something. It’s easy to look at other, smaller tools and think there’s green grass over there.

Rails 4 might have answers to this temptation.

On the controller side of things, it seems likely that some form of Strobe’s Rails extensions will find their way into Rails, making it easier to create apps (or sub-apps) that are focused on providing an API and eschew the parts of ActionPack that aren’t necessary for services. The thing I like a lot about this idea is that it covers a gap between Sinatra and Rails: you can prototype your app with all the conveniences of Rails, then strip out the parts you don’t need as it grows into focused services. You could certainly still rewrite services in Sinatra, Grape, or Goliath, but it’s nice to have an option.

On the model side of things, people are, well, modeling. Simpler ways to use ActiveModel with ActionPack will appear in Rails 4. The components the DataMapper team is working on, in the form of Virtus, seem really interesting too. If you want to get started now, you can check out ActiveAttr, sort of the bonus track version of ActiveModel. Chris Griego’s put a lot of solid thought into this; you definitely want to check out his slides on models everywhere: they’re lurking in your controllers, your requests, your responses, your API clients, everywhere.

In short, my best guess on Rails 4, right now, is that it will continue to give developers a curated set of choices and frameworks to get their application off the ground. It will add options to grow your application’s codebase sensibly once it’s proven out.

What I know, for sure, is that the notion of Rails 4 seems really strange to me. How fast time flies. Uphill, both ways.


This silver bullet has happened before and it will happen again

Today it's Node. Before it was Rails. Before it was PHP. Before it was Java. Cogs Bad:

There’s a whole mindset - a modern movement - that solves things in terms of working out how to link together a constellation of different utility components like noSQL databases, frontends, load balancers, various scripts and glue and so on. It’s not that one tool fits all; it’s that they want to use all the shiny new tools. And this is held up as good architecture! Good separation. Good scaling.

I’ve fallen victim to this mindset. Make everything a Rails app, solve all the problems with Ruby, store all the data in distributed databases! It’s not that any of these technologies are wrong; it’s just that they might not yet be right for the problem at hand.

You can almost age generations of programmers like tree rings. I’m of the PHP/Rails generation, though I started in the Linux generation. A few years ago, I thought I could school the Java generations. But it turns out, I’ve learned a ton from them, even when I was a bit of a hubristic brat. The Node generation of developers will teach me more still, both in finding new virtuous paths and in going down false paths so I don’t have to follow them.

That said, it would be delightful if there was a shortcut to get these new generations past the “re-invent all the things!” phase and straight into the “make useful things and constructive dialog about how they’re better” phase.


Bootstrap, subproject, and document your way to a bigger team

Zach Holman's slides on the patterns GitHub uses to scale their team, Ruby Patterns from GitHub's Codebase:

Your company is going to have tons of success, which means you'll have to hire tons of people.

My favorites:

  • Every project gets a script/bootstrap for fetching dependencies, putting data in place, and getting new people ready to go ASAP. This script comes in handy for CI too.
  • Try new techniques by deploying them only to team members at first. The example here was auto-escaping markup: they started with it enabled only for staff, instead of turning it on for everyone and feeling the hurt.
  • Build projects within projects. Inevitably areas of functionality start to get so complex or generic that they want to be their own thing. Start by partitioning these things into lib/some_project, document it with a README in lib/some_project, and put the tests in test/some_project. If you need to share it across apps or scale it differently someday, you can pull those folders out and there you go.
  • Write internal, concise API docs with TomDoc. Most things only need 1-3 lines of prose to explain what’s going on. Don’t worry about generating browse-able docs, just look in the code. I heart TomDoc so much.
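For flavor, here's what a TomDoc-style comment looks like on a made-up method (my example, not one from GitHub's codebase):

```ruby
# Public: Truncate text to a maximum length, appending an ellipsis
# when anything was cut.
#
# text   - The String to shorten.
# length - The Integer maximum length (default: 30).
#
# Examples
#
#   truncate("hello world", 8)
#   # => "hello..."
#
# Returns the truncated String.
def truncate(text, length = 30)
  return text if text.length <= length
  text[0, length - 3] + '...'
end
```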

These ideas really aren’t about patterns, or scaling the volume of traffic your business can handle. They’re about scaling the size of your team and getting more effectiveness out of every person you add.


Write more manpages

Every program, library, framework, and application needs documentation of some sort; this much is uncontroversial. How much documentation, what kinds of documentation, and where to put that documentation are the questions that often elicit endless prognostication.

When it comes to documentation aimed at developers, there’s a spectrum. On one end, there’s zero documentation, only code. On the other end of the spectrum are literate programs; the code is intertwined with the documentation, and the language is equally geared towards marking up a document and translating ideas into machine-executable code.

Somewhere along this spectrum exists a happy ideal for most programmers. Inline API docs ala JavaDoc, RDoc, and YARD have been popular for a while. Lately, tools like docco and rocco have raised enthusiasm for “semi-literate programming”. There’s also a lot of potential in projects exhaustively documenting themselves in their Cucumber features as vcr does.

All of these tools couple code with documentation, per the notion that putting them right next to each other makes it more likely documentation gets updated in sync with the code. The downside to this approach is that code gets ‘noised up’ with comments. Often this is a fair trade, but it occasionally makes navigating a file cumbersome.

It happens that Unix, in its age-old sage ways, has been storing its docs out-of-line with the relevant code for years. They’re called manpages, and they mostly don’t suck. Every C API on a modern Unix has a corresponding manpage that describes the relevant functions and structures, how to use it, and any bugs that may exist. They’re actually a pretty good source of info.

Scene change.

It so happens that Ryan Tomayko is a Unix aficionado and wrote a tool that is even better for writing manpages than the original Unix tooling. It’s called ronn, and it’s pretty rad; you write Markdown, and you get bona fide UNIX manpages plus manpage-styled HTML.

Perhaps this is a useful way to write programmer-focused docs. Keep docs out of the code, put them in manpages instead, push them to GitHub Pages. Code stays focused, docs still look great.
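For reference, a ronn source file is just Markdown with a `name(section) -- description` header line on top; a hypothetical fragment (invented library, not a real manpage) might look like:

```markdown
widget(3) -- frobnicate widgets from Ruby
=========================================

## SYNOPSIS

    require 'widget'
    Widget.new.frobnicate

## DESCRIPTION

Prose about the library lives here, away from the code.
```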

I took John Nunemaker’s scam gem and put this idea to the test. Here’s what the manpage looks like, with the default styling provided by ronn:

The manpage for scam

Here’s the raw ronn document:

Eat a plain-text sandwich

No Ruby files were harmed in the making of this documentation.

It took me about ninety minutes to put this together. Probably 33-50% of that time was simply tinkering with ronn and making sure I was writing in the style that is typical of manpages. So we’re talking about forty-five minutes to document a mixin with seven methods. For pretty good looking output and simple tooling, that’s a very modest investment.

The potential drawbacks are the same as any kind of documentation; it could fall out-of-sync with the production code. Really, I think this is more of a workflow issue. Ideally, every commit or merged branch is reviewed to make sure that relevant docs are updated. As a baseline, the release workflow for a project should include a step to make sure docs are up-to-date.

In short, I have one datapoint that tells me that ronn is a pretty great way to generate programmer-oriented documentation. I’ll keep tinkering with it and encourage other developers to do the same.


How to make a CIA spy, and other anecdotes

And the hilariously incompetent, such as the OSS operative whose cover was so far blown that when he dropped into his favorite restaurant, the band played “Boo! Boo! I’m a Spy.”

Interesting, new-to-me tidbits on what goes into making CIA spies, what they actually do in the field, and how the practitioners of spy craft have changed over the years. The bad news: spy recruitment doesn’t exactly work like it does in Spies Like Us. The good news: the CIA and its spying is closer to “just as bad/inept as you’d think” than “as diabolical as a James Bond villain”.


Own your development tools, and other cooking metaphors

Noel Rappin encourages all of us to use our development tools efficiently. If your editor or workflow aren’t working for you, get a new tool and learn to use it.

I’ve been working with another principle lately: minimize moving parts. I used to spend time setting up tools like autotest, guard, or spork. But it ended up that I spent too much time tweaking them or, even worse, figuring out exactly how they were working.

I’ve since adopted a much simpler workflow. Just a terminal, a text editor, and some scripts/functions/aliases/etc. for running the stuff I do all the time. I take note when I’m doing something repeatedly and figure out how I can automate it. Besides that, I don’t spend much time thinking about my tools. I spend time thinking about the problem in front of me. It makes a lot of sense, when you think about it.

I say you should “own” your tools and minimize moving parts because you should understand how they all work together and how they might change the behavior of your code. If you don’t own your tools in this way, you’ll end up wasting time debugging someone else’s code, i.e. a misbehaving tool. That’s just a waste of time; when you come across a tool that offends in this way, put aside a time block to fix it, or discard it outright.


Automated code goodness checking with cane

A script to run cane in a project

A few nights ago, I added Xavier Shay’s cane to Sifter. It was super simple, and cane runs surprisingly fast. Cane is designed to run as part of CI, but Sifter doesn’t really have an actual CI box. Instead, I’ve added it to our preflight script, which tells us whether deploying is a good idea or not based on our spec suite. Now preflight can tell us if we’ve regressed on code complexity or style as well. I’m pretty pumped about this setup.

Next step: add glib comments on failure. I’m thinking of something like “Yo, imma let you finish but the code you’re about to deploy is not that great.”
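A preflight script like this is just "run each check, refuse to bless the deploy if any fail". Here's a sketch of the shape (not Sifter's actual script; the check commands are placeholders), glib comment included:

```ruby
# Run each named shell command; report which checks failed. In real use
# the commands would be the spec suite and cane, e.g. "bundle exec cane".
def preflight(checks)
  failures = checks.reject { |_name, command| system(command) }.keys
  if failures.empty?
    puts 'Preflight clear. Ship it.'
  else
    puts "Yo, imma let you finish, but these checks failed: #{failures.join(', ')}"
  end
  failures.empty?
end
```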


What kind of HTTP API is that?

An API Ontology: if you were curious about the differences between RPC, SOAP, REST, and Hypermedia APIs, but were afraid to ask. In my opinion, this is not prescription; I don't think there's anything inherently wrong with using any of these, except SOAP. Sometimes an RPC or a simple GET is all you need.


On rolling one's own metrics kit

On instrumenting Rails, custom aggregators, bespoke dashboards, and reinventing the wheel; 37signals documents their own metrics infrastructure. They’re doing some cool things here:

  • a StatsD fork that stores to Redis; for most people, this is way more sensible than the effort involved in installing Graphite, let alone maintaining it
  • storing aggregated metrics to flat files; it’s super-tempting to overbuild this part, but if flat files work for you, run with it
  • leaning on ActiveSupport notifications for instrumentation; I’ve tinkered with this a little and it’s awesome, I highly recommend it if you have the means
  • building a custom reporting app on top of their metric data; anything is better than the Graphite reporting app

More like this, please.
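On the notifications point: the instrument/subscribe shape reduces to something like this plain-Ruby sketch. The names are illustrative; this is the idea, not the real ActiveSupport::Notifications API:

```ruby
# Subscribers register for an event name; instrumented blocks are timed,
# and each subscriber sees the event name, its duration, and a payload.
class Instrumenter
  def initialize
    @subscribers = Hash.new { |hash, key| hash[key] = [] }
  end

  def subscribe(event, &block)
    @subscribers[event] << block
  end

  def instrument(event, payload = {})
    started = Time.now
    result = yield
    duration = Time.now - started
    @subscribers[event].each { |sub| sub.call(event, duration, payload) }
    result
  end
end
```

The nice property: application code instruments itself once, and you can bolt on as many sinks as you like (StatsD, logs, flat files) by subscribing.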

One could take issue with them rolling this all on their own, rather than relying on existing services. If 37signals were a fresh new shop, one would have a point. Building out metrics infrastructure, even today with awesome tools like StatsD, can turn into an epic time sink. However, once you’ve been around for several years and thought enough about what you need to measure and act on in your application, rolling your own metrics kit starts to make a lot of sense. It’s fun too!

Of course, the important part is they’re measuring things and operating based on evidence. Whether you roll your own metrics infrastructure or use something off the shelf like Librato or NewRelic, operating on hard data is the coolest thing of all.


Whither code review and pairing

Jesse Storimer has great thoughts on code review and pairing. You Should be Doing Formal Code Review:

Let’s face it, developers are often overly confident in their work, and telling them that something is done wrong can be taken as a personal attack. If you get used to letting other people look at, and critique, your code then disidentification becomes a necessity. This also goes vice versa, you need to be able to talk about the code of your peers without worrying about them taking your critiques as a personal attack. The goal here is to ensure that the best code possible makes it into your final release.

I struggle with this so much, on both the giving and receiving side. When I’m reviewing code, I find myself holding back so as not to come off as saying the other person’s code is awful and offensive. On the receiving side, I often get frustrated and feel like a huge impediment has been put in front of my ability to ship code. In reality, neither is the case. Whether I’m the reviewer or the reviewee, the other party is simply trying to get the best code possible into production.

Jesse has further great points: review helps you avoid shortcuts, encourages one to review their own code (my favorite), and it makes for better code.

More recently, Jesse’s pointed out that pairing isn’t necessarily a substitute for code review: “…pairing is heavyweight and rare. Code review is lightweight and always.”

In my experience, pairing is great for cornering a problem and figuring out what the path to the solution is. Pairing is great for bringing people into the fold of a new team or project. Review is great for enforcing team standards and identifying wholly missing functionality. Review is sometimes great for finding little bugs, as is pairing.

Neither pairing nor code review is a silver bullet for better software, but when a team applies them well, really awesome things can happen.