The Grinder
As teams grow and specialize, I’ve noticed people tend to take on characters that I see over and over: archetypes that go beyond one project and show up on every team I work with. Today I want to talk about one of those archetypes: the Grinder.
The Grinder isn’t the smartest or most skilled person on your team. They don’t write the prettiest code, they aren’t up to date on the state of the art, and the way they use tools can seem simplistic. Often, the Grinder doesn’t even push working code; there are often tiny bugs lurking, or even syntax errors. They push or deploy this code perhaps dozens of times a day. At first glance, the Grinder is a Terrifying Problem.
What sets The Grinder apart from your garden-variety mediocre developer is that The Grinder is an expert at making progress. They move rapidly, they upset things, and then they get it working. The Grinder is an indispensable part of your team because they’re a bit cutthroat. They’re not worrying about the coolest new tech or design approaches the intelligentsia are raving about. They’re just thinking, “how do I get this into production, get feedback, and get on with the next thing?”
The Grinder also balances out the thinkers and worriers on your team. While the thinkers are asking “can we?” and “should we?”, the Grinder is just getting it done. Grinders expand the realm of possibility by taking a journey of a thousand steps. They don’t invent a jetpack or hoverboard first; they just go with what they have.
The Grinders I’ve known are typically humble, kind people. They know how to operate their tools to get stuff done, and that’s mostly good enough for them. They’re not opposed to hearing about new techniques, but they want to know how it’s going to help them push code out faster. They are not particularly fazed by brainy tech that appeals to novelty.
Pair a Grinder with a thinker who values how their skills complement each other and you can make a ton of progress without making a huge mess. A team of all Grinders would eventually burn itself out. Grinders stop when the feature is done, not when the day is over or their brain is out of gas. Grinders need thinkers to encourage them to regulate their work pace and to help them understand how to make rapid progress without coding themselves into a corner.
It’s not hard to recognize the Grinder on your team; it’s likely even the non-technical people in the company know who they are and recognize their strengths. If you’re a thinker who is a little flabbergasted that the Grinder approach works, take some inspiration from how they do what they do and ship some stuff with them.
If it’s late or the Grinder has been working long hours, tap them on the shoulder and tell them they do good work. Send them home so they can sustain it over weeks and months without running themselves down. A well-rested, excited Grinder is one of your team’s best assets.
Keep your application state out of your queue
I’m going to pick on Resque here, since it’s nominally great and widely used. You’re probably using it in your application right now. Unfortunately, I need to tell you something unsettling.
¡DRAMA MUSIC!
There’s an outside chance your application is dropping jobs. The Resque worker process pulls a job off the queue, turns it into an object, and passes it to your class for processing. This handles the simple case beautifully. But failure cases are important, if tedious, to consider.
What happens if there’s an exception in your processing code? There’s a plugin for that. What happens if you need to restart your Resque process while a job is processing? There’s a signal for that. What if, between taking a job off the queue and fully processing it, the whole server disappears due to a network partition, hardware failure, or the FBI “borrowing” your rack?
¡DRAMA MUSIC!
Honestly, you shouldn’t treat Resque, or even Redis, like you would a relational or dynamo-style database. Redis, like memcached, is designed as a cache. You put things in, you get them out, really fast. Redis, like memcached, rarely falls over. But if it does, the remediation steps are manual. Redis currently doesn’t have a good High Availability setup (it’s being contemplated).
Further, Resque assumes that clients will properly process every message they dequeue. This isn’t a bad assumption. Most systems work most of the time. But, if a Resque worker process fails, it’s not great. It will lose any messages held in memory, and the Redis instance that runs your Resque queues is none the wiser.
¡DRAMA MUSIC!
In my past usage of Resque, this hasn’t been that big of a deal. Most jobs aren’t business-critical. If the occasional image doesn’t get resized or a notification email doesn’t go out, life goes on. A little logging and data tweaking cures many ills.
But, some jobs are business-critical. They need stronger semantics than Resque provides. The processing of those jobs, the state of that processing, is part of our application’s logic. We need to model those jobs in our application and store that state somewhere we can trust.
I first became really aware of this problem, and a nice solution to it, listening to the Ruby Rogues podcast. Therein, one of the panelists advised everyone to model crucial processing as state machines. The jobs become the transitions from one model state to the next. You store the state alongside an appropriate model in your database. If a job should get dropped, it’s possible to scan the database for models that are in an inconsistent state and issue the job again.
Example follows
Let’s work an example. For our imaginary application, comment notifications are really important. We want to make sure they get sent, come hell or high water. Here’s what our comment model looks like originally:
class Comment < ActiveRecord::Base
  after_create :send_notification

  def send_notification
    Resque.enqueue(NotifyUser, self.user_id, self.id)
  end
end
Now we’ll add a job to send that notification:
class NotifyUser
  @queue = :notify_user

  def self.perform(user_id, comment_id)
    # Load the user and comment, send a notification!
  end
end
But, as I’ve pointed out with great drama, this code can drop jobs. Let’s throw that state machine in:
class Comment < ActiveRecord::Base
  # We'll use acts-as-state-machine. It's a classic.
  include AASM

  # We store the state of sending this notification in the aptly named
  # `notification_state` column. AASM gives us predicate methods to see if this
  # model is in the `pending?` or `sent?` states and a `notification_sent!`
  # method to go from one state to the next.
  aasm :column => :notification_state do
    state :pending, :initial => true
    state :sent

    event :notification_sent do
      transitions :to => :sent, :from => [:pending]
    end
  end

  after_create :send_notification

  def send_notification
    Resque.enqueue(NotifyUser, self.user_id, self.id)
  end
end
Our notification has two states: pending and sent. Our web app creates it in the pending state. After the job finishes, it will put it in the sent state.
class NotifyUser
  @queue = :notify_user

  def self.perform(user_id, comment_id)
    user = User.find(user_id)
    comment = Comment.find(comment_id)
    user.notify_for(comment)

    # Notification success! Update the comment's state.
    comment.notification_sent!
  end
end
This is a good start for more reliably processing jobs. However, most jobs handle an interaction between two systems. This notification is a great example: it integrates our application with a mail server or another service that handles our notifications. Those systems probably aren’t tolerant of duplicate requests. If our process croaks between the time it tells the mail server to send and the time it updates the notification state in our database, we could accidentally process this notification twice. Back to square one?
¡DRAMA MUSIC!
Not quite. We can reduce our problem space once more by adding another state to our model.
class Comment < ActiveRecord::Base
  include AASM

  aasm :column => :notification_state do
    state :pending, :initial => true
    state :sending # We have attempted to send a notification
    state :sent    # The notification succeeded
    state :error   # Something is amiss :(

    # This happens right before we attempt to send the notification
    event :notification_attempted do
      transitions :to => :sending, :from => [:pending]
    end

    # We take this transition if an exception occurs
    event :notification_error do
      transitions :to => :error, :from => [:sending]
    end

    # When everything goes to plan, we take this transition
    event :notification_sent do
      transitions :to => :sent, :from => [:sending]
    end
  end

  after_create :send_notification

  def send_notification
    Resque.enqueue(NotifyUser, self.user_id, self.id)
  end
end
Now, when our job processes a notification, it first uses notification_attempted. Should this job fail, we’ll know which jobs we should look for in our logs. We could also get a little sneaky and monitor the number of jobs in this state if we think we’re bottlenecking around sending the actual notification. Once the job completes, we transition to the sent state. If anything goes wrong, we catch the exception and put the job in the error state. We definitely want to monitor this state and use the logs to figure out what went wrong, manually fix it, and perhaps write some code to fix bugs or add robustness.
The sending state is entered when at least one worker has picked up a notification and tried to send the message. Should that worker fail in sending the message or updating the database, we will know. When trouble strikes, we’ll know we have two cases to deal with: notifications that haven’t been touched at all, and notifications that were attempted and may have succeeded. The former, we’ll handle by requeueing them. The latter, we’ll probably have to write a script to grep our mail logs and see if we successfully sent the message. (You are logging everything, centrally aggregating it, and know how to search it, right?)
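Wiring the job up to these transitions might look something like this; a sketch, assuming AASM’s bang event methods (notification_attempted! and friends) persist each transition:
class NotifyUser
  @queue = :notify_user

  def self.perform(user_id, comment_id)
    user = User.find(user_id)
    comment = Comment.find(comment_id)

    # Record that a worker has picked this notification up.
    comment.notification_attempted!
    begin
      user.notify_for(comment)
      # Notification success! Update the comment's state.
      comment.notification_sent!
    rescue => e
      # Park the comment in the error state so we can find it later,
      # then re-raise so Resque still sees the failure.
      comment.notification_error!
      raise
    end
  end
end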
The truth is, the integration points between systems are a gnarly problem. You don’t so much solve the problem as get really good at detecting and responding to edge cases. Thus is life in production. But losing jobs is something we can make really good progress on. Don’t worry about your low-value jobs; start with the really important ones, and weave the state of those jobs into the rest of your application. Some day, you’ll thank yourself for doing that.
A Presenter is a signal
When someone says “your view or API layer needs presenters”, it’s easy to get confused. Presenter has become wildcard jargon for a lot of different sorts of things: coordinators, conductors, representations, filters, projections, helpers, etc. Even worse, many developers are in the “uncanny valley” stage of understanding the pattern; it’s close to being a thing, but not quite. I’ve come across presenters with entirely too much logic, presenters that don’t pull their weight as objects, and presenters that are merely indirection. Presenter is becoming a catch-all that stands for organizing logic more strictly than your framework typically provides for.
I could make it my mission to tell every single person they’re wrong about presenters, but that’s not productive, and it’s not entirely correct. Rather, presenters are a signal. When you say “we use presenters for our API”, I hear you say “we found we had too much logic hanging around in our templates and so we started moving it into objects”. From there on out, every application is likely to vary. Some applications are template focused and so need objects that are focused on presentational logic. Other apps are API focused and need more support in the area of emitting and parsing JSON.
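For the template-focused case, the kind of object people usually mean is pretty small; a minimal sketch, with hypothetical names:
class CommentPresenter
  def initialize(comment)
    @comment = comment
  end

  # Formatting that used to live in the template.
  def posted_at
    @comment.created_at.strftime("%B %e, %Y")
  end

  # Conditional logic that used to live in the template.
  def author_name
    @comment.user ? @comment.user.name : "Anonymous"
  end
end
The template renders CommentPresenter.new(comment) instead of the raw model, and the conditionals and formatting move somewhere easy to unit test.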
At first, I was a bit concerned about the explosion of options for putting more objects into the view part of your typical model-view-controller application. But as I see more applications and highly varied approaches, I’m fine with Rails not providing a standard option. Better to decide what your application really needs and craft it yourself or find something viable off the shelf.
As long as your “presenters” reduce the complexity of your templates and make logic easier to decouple and test, we’re all good, friend.
How to approach a database-shaped problem
When it comes to caching and primary storage of an application’s data, developers are faced with a plethora of shiny tools. It’s easy to get caught up in how novel these tools are and get over enthusiastic about adopting them; I certainly have in the past! Sadly, this route often leads to pain. Databases, like programming languages, are best chosen carefully, rationally, and somewhat conservatively.
The thought process you want to go through is a lot like what former Gowalla colleague Brad Fults did at his new gig with OtherInbox. He needed to come up with a new way for them to store a mapping of emails. He didn’t jump on the database of the day, the system with the niftiest features, the one with the greatest scalability, or the one that would look best on his resume. Instead, he proceeded as follows:
- Describe the problem domain and narrow it down to two specific, actionable challenges
- Elaborate on the existing solution and its shortcomings
- Identify the possible databases to use and summarize their advantages and shortcomings
- Describe the new system and how it solves the specific challenges
Of course, what Brad wrote is post-hoc. He most likely did the first two steps in a matter of hours, took some days to evaluate each possible solution, decided which path to take, and then hacked out the system he later wrote about.
But more importantly, he cheated aggressively. He didn’t choose one database, he chose two! He identified a key unique attribute of his problem: he only needed a subset of his data to be relatively fresh. This gave him the luxury of choosing a cheaper, easier data store for the complete dataset.
In short: solve your problem, not the problem that fits the database, and cheat aggressively when you can.
Tool agnosticism is good for you
When it comes to programming editors, frameworks, and languages, you’re likely to take one of three stances: marry, boff, or kill. There are tools that you want to use for the rest of your life, tools that you want to moonlight or tinker with, and tools you never want to use again in your life.
Tool antagonism, ranting about the tools you want to kill, is fun. It plays well on internet forums. It goes nicely with beer and friends. But on a team that isn’t using absolutely terrible tools, it’s a waste of time.
Unless your team is bizarrely like-minded, it’s likely some disagreement will arise along the lines of editors, frameworks, and languages. I’ve been that antagonistic guy. We can’t use this language, it has an annoying feature. We can’t use this framework, it doesn’t protect us from an annoying kind of bug. I’ve learned these conversations are a waste of social capital and unnecessarily divisive. Don’t be that guy.
There are usually a few tools that are appropriate to a given task. I often have a preference and choose specific tools for my personal projects. There are other tools that I’m adept at using or have used before, but find minor faults with. I’ve found it way better to accept that other, non-preferred tool if it’s already in place or others have convictions about using it. Better to put effort into making the project or product better than to spin wheels on programming arcanery.
When it comes to programmer tools, rational agnosticism beats antagonism every time. Train yourself to work amazingly well with a handful of tools, reach adeptness with a few others, and learn how to think with as many tools as possible.
Rails 4, all about scaling down?
To some, Rails is getting big. It brings a lot of functionality to the table. This makes apps easier to get off the ground, especially if you aren’t an expert. But, as apps grow, it can lead to pain; there’s so much machinery in Rails, it’s likely you’re abusing something. It’s easy to look at other, smaller tools and think the grass is greener over there.
Rails 4 might have answers to this temptation.
On the controller side of things, it seems likely that some form of Strobe’s Rails extensions will find their way into Rails, making it easier to create apps (or sub-apps) that are focused on providing an API and eschew the parts of ActionPack that aren’t necessary for services. The thing I like a lot about this idea is it covers a gap between Sinatra and Rails. You can prototype your app with all the conveniences of Rails and then, as the app grows, strip out the parts you don’t need to provide quick services. You could certainly still rewrite services in Sinatra, Grape, or Goliath, but it’s nice to have an option.
On the model side of things, people are, well, modeling. Simpler ways to use ActiveModel with ActionPack will appear in Rails 4. The components the DataMapper team is working on, in the form of Virtus, seem really interesting too. If you want to get started now, you can check out ActiveAttr, sort of the bonus-track version of ActiveModel. Chris Griego’s put a lot of solid thought into this; you definitely want to check out his slides on models everywhere: they’re lurking in your controllers, your requests, your responses, your API clients, everywhere.
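The idea is that a “model” doesn’t have to mean a database table; a minimal sketch of a plain Ruby object using ActiveModel validations:
require "active_model"

class Signup
  include ActiveModel::Validations

  attr_accessor :email
  validates_presence_of :email

  def initialize(attributes = {})
    @email = attributes[:email]
  end
end

signup = Signup.new
signup.valid?         # => false
signup.errors[:email] # => ["can't be blank"]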
In short, my best guess on Rails 4, right now, is that it will continue to give developers a curated set of choices and frameworks to get their application off the ground. It will add options to grow your application’s codebase sensibly once it’s proven out.
What I know, for sure, is that the notion of Rails 4 seems really strange to me. How fast time flies. Uphill, both ways.
Write more manpages
Every program, library, framework, and application needs documentation of some sort; this much is uncontroversial. How much documentation, what kinds of documentation, and where to put that documentation are the questions that often elicit endless prognostication.
When it comes to documentation aimed at developers, there’s a spectrum. On one end, there’s zero documentation, only code. On the other end of the spectrum are literate programs; the code is intertwined with the documentation, and the language is equally geared towards marking up a document and translating ideas into machine-executable code.
Somewhere along this spectrum exists a happy ideal for most programmers. Inline API docs ala JavaDoc, RDoc, and YARD have been popular for a while. Lately, tools like docco and rocco have raised enthusiasm for “semi-literate programming”. There’s also a lot of potential in projects exhaustively documenting themselves in their Cucumber features as vcr does.
All of these tools couple code with documentation, per the notion that putting them right next to each other makes it more likely documentation gets updated in sync with the code. The downside to this approach is that code gets ‘noised up’ with comments. Often this is a fair trade, but it occasionally makes navigating a file cumbersome.
It happens that Unix, in its age-old sage ways, has been storing its docs out-of-line with the relevant code for years. They’re called manpages, and they mostly don’t suck. Every C API on a modern Unix has a corresponding manpage that describes the relevant functions and structures, how to use it, and any bugs that may exist. They’re actually a pretty good source of info.
Scene change.
It so happens that Ryan Tomayko is a Unix aficionado and wrote a tool that is even better for writing manpages than the original Unix tooling. It’s called ronn, and it’s pretty rad; you write Markdown, and you get bona fide UNIX manpages plus manpage-styled HTML.
Perhaps this is a useful way to write programmer-focused docs. Keep docs out of the code, put them in manpages instead, push them to GitHub Pages. Code stays focused, docs still look great.
I took John Nunemaker’s scam gem and put this idea to the test. Here’s what the manpage looks like, with the default styling provided by ronn:

Here’s the raw ronn document:
No Ruby files were harmed in the making of this documentation.
It took me about ninety minutes to put this together. Probably 33-50% of that time was simply tinkering with ronn and making sure I was writing in the style that is typical of manpages. So we’re talking about forty-five minutes to document a mixin with seven methods. For pretty good looking output and simple tooling, that’s a very modest investment.
The potential drawbacks are the same as any kind of documentation; it could fall out-of-sync with the production code. Really, I think this is more of a workflow issue. Ideally, every commit or merged branch is reviewed to make sure that relevant docs are updated. As a baseline, the release workflow for a project should include a step to make sure docs are up-to-date.
In short, I have one datapoint that tells me that ronn is a pretty great way to generate programmer-oriented documentation. I’ll keep tinkering with it and encourage other developers to do the same.
Automated code goodness checking with cane
A few nights ago, I added Xavier Shay’s cane to Sifter. It was super simple, and cane runs surprisingly fast. Cane is designed to run as part of CI, but Sifter doesn’t really have an actual CI box. Instead, I’ve added it to our preflight script, which tells us whether deploying is a good idea or not, based on our spec suite. Now preflight can tell us if we’ve regressed on code complexity or style as well. I’m pretty pumped about this setup.
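Cane ships with a Rake task, so wiring it into a preflight or CI script only takes a few lines; a sketch, with thresholds picked arbitrarily:
begin
  require 'cane/rake_task'

  desc "Run cane to check quality metrics"
  Cane::RakeTask.new(:quality) do |cane|
    cane.abc_max = 12 # flag methods whose ABC complexity exceeds 12
    cane.add_threshold 'coverage/covered_percent', :>=, 90
  end
rescue LoadError
  warn "cane not available, quality task not provided."
end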
Next step: add glib comments on failure. I’m thinking of something like “Yo, imma let you finish but the code you’re about to deploy is not that great.”
On rolling one's own metrics kit
On instrumenting Rails, custom aggregators, bespoke dashboards, and reinventing the wheel; 37signals documents their own metrics infrastructure. They’re doing some cool things here:
- a StatsD fork that stores to Redis; for most people, this is way more sensible than the effort involved in installing Graphite, let alone maintaining it
- storing aggregated metrics to flat files; it’s super-tempting to overbuild this part, but if flat files work for you, run with it
- leaning on ActiveSupport notifications for instrumentation; I’ve tinkered with this a little and it’s awesome, I highly recommend it if you have the means (there’s a quick sketch of the API just below)
- building a custom reporting app on top of their metric data; anything is better than the Graphite reporting app
More like this, please.
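For the curious, ActiveSupport::Notifications boils down to instrumenting a block of work and subscribing to the events it emits; a minimal sketch, with a made-up event name:
require "active_support/notifications"

# Subscribe to an event; the block receives timing info and a payload.
ActiveSupport::Notifications.subscribe("render.my_app") do |name, start, finish, id, payload|
  puts "#{name} took #{((finish - start) * 1000).round(1)}ms #{payload.inspect}"
end

# Instrument a chunk of work; subscribers fire when the block finishes.
ActiveSupport::Notifications.instrument("render.my_app", :widget_id => 42) do
  sleep 0.01 # stand-in for real work
end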
One could take issue with them rolling this all on their own, rather than relying on existing services. If 37signals were a fresh new shop, one would have a point. Building out metrics infrastructure, even today with awesome tools like StatsD, can turn into an epic time sink. However, once you’ve been around for several years and thought enough about what you need to measure and act on your application, rolling your own metrics kit starts to make a lot of sense. It’s fun too!
Of course, the important part is they’re measuring things and operating based on evidence. Whether you roll your own metrics infrastructure or use something off the shelf like Librato or NewRelic, operating on hard data is the coolest thing of all.
Whither code review and pairing
Jesse Storimer has great thoughts on code review and pairing. You Should be Doing Formal Code Review:
Let’s face it, developers are often overly confident in their work, and telling them that something is done wrong can be taken as a personal attack. If you get used to letting other people look at, and critique, your code then disidentification becomes a necessity. This also goes vice versa, you need to be able to talk about the code of your peers without worrying about them taking your critiques as a personal attack. The goal here is to ensure that the best code possible makes it into your final release.
I struggle with this so much, on both the giving and receiving side. When I’m reviewing code, I find myself holding back so as not to come off as saying the other person’s code is awful and offensive. On the receiving side, I often get frustrated and feel like a huge impediment has been put in front of my ability to ship code. In reality, neither is the case. Whether I’m the reviewer or the reviewee, the other party is simply trying to get the best code possible into production.
Jesse has further great points: review helps you avoid shortcuts, encourages you to review your own code (my favorite), and makes for better code.
More recently, Jesse’s pointed out that pairing isn’t necessarily a substitute for code review: “…pairing is heavyweight and rare. Code review is lightweight and always.”
In my experience, pairing is great for cornering a problem and figuring out what the path to the solution is. Pairing is great for bringing people into the fold of a new team or project. Review is great for enforcing team standards and identifying wholly missing functionality. Review is sometimes great for finding little bugs, as is pairing.
Neither pairing nor code review is a silver bullet for better software, but when a team applies them well, really awesome things can happen.
Hip-hop for nerds: "Otis"
(Ed. Herein, I attempt to break down a current favorite of mine, “Otis” by Jay-Z and Kanye West, in terms familiar and interesting to nerds, specifically of the nerd and/or comedy persuasion.)
“Otis” is a song arranged and performed by two best pals, Jay-Z and Kanye West. It opens with a sample of Otis Redding (hence the title) singing “Try a little tenderness”. Opening with a sample like this tells us two things:
- Misters Z and West enjoy the music of Mr. Redding enough that they were compelled to include it in their own music.
- The gentlemen are also well connected and affluent, as not just anyone can afford to sample a legend like Redding in their music.
A digression: sampling in hip-hop is one of its key characteristics and is of particular interest to nerds. It is a way that we can connect, through “nerding out” with the artist and find what it is that they respect and listen to. It is also a bit of a recursive structure; “Otis” samples Otis, Otis borrowed from gospel and blues, blues and gospel borrowed from traditional songs, etc. Finally, sampling is a recombinant form; in “Otis”, there is a verbatim sample in the opening bars, but the sample devolves to a looped-beat in the middle of the song and a mere sound-effect at the end of the song.
As Misters Z and West enter the song proper, the rappers trade verses about their affluence (“New watch alert, Hublot’s / Or the big face Rollie I got two of those”), the recursive (again, nerdy) deception they use to evade the paparazzi (“They ain’t see me ‘cuz I pulled up in my other Benz / Last week I was in my other other Benz”), a conflicting verse about how they would seek the paparazzi out (“Photo shoot fresh, looking like wealth / I’m ‘bout to call the paparazzi on myself”), more boasting of their affluence and skill (“Couture level flow, it’s never going on sale / Luxury rap, the Hermes of verses”), and such.
This song features a video, so we wouldn’t be properly doing a nerd dissection of it if we were to neglect that. It opens with our heroes approaching a Maybach sedan with a saw and a blow torch. Following a “car modification montage”, it appears the doors have been removed from the car and the front end of the car has been placed on the back, and vice versa. Another display of affluence, with perhaps a touch of hipster irony thrown in.
The video follows with various shots of our heroes rapping and driving the Maybach through an abandoned dock or airfield. Our heroes are in the front seat of the car and there are four models in the back seat, one seated precariously atop the one in the middle as our heroes make dangerous-looking maneuvers in the car. At one point it appears they will lose a model through the door-less side of the car. At multiple points, it appears the boobs of the models might fight free of their loose-fitting shirts. It should be noted that the appearance of a possible free boob could be considered quite progressive for a hip-hop music video.
The Maybach is, in my opinion, the most difficult to interpret signal the song and video send. Are we to understand that Misters Z and West are so affluent they can afford to put down six figures on the purchase and massively impractical modification of a high-end luxury car? Perhaps they had a spare one laying around and felt it would be a better use to destroy it than to leave it around. Or, perhaps this was a vehicle for a clever tax deduction?
Mr. West's CPA: You're going to owe a lot of tax on this purchase of your other-other Benz, 'Ye.
Mr. West: What if I were to use it in a music video for the purposes of promoting my upcoming album?
Mr. West's CPA: Well then you could depreciate it at 50% this year and 25% for the next two years, but you're still going to owe a lot.
Mr. West: If I were to take it to a chop shop and have them put the ass-end of the car on the front and turn the doors into wings, could I depreciate it faster?
Mr. West's CPA: throw some models in the back seat, and it *just might work*!
(Ed. as it turns out, the vehicle was to be auctioned and the proceeds donated to charity)
The other enigma of the video is the presence of comedian Aziz Ansari. Mr. Ansari has documented (Ed. hilariously) his friendship with Mr. West. Thus it is not shocking to see him appear in the video. He appears for only an instant, and his appearance marks the absence of the models in the rest of the video. Perhaps, we are to believe, Mr. Ansari is the pumpkin that the models turn into after some deadline has passed for Misters Z and West.
Despite, or perhaps because of, its mysteries, I find “Otis” a fantastic piece of hip-hop production. The sample is well chosen and deconstructed, the verses are interesting (if mentally unchallenging), and the video is engaging to watch. I would easily rank it amongst the top songs of recent memory, were I one to make lists of top songs.
When to Class.new
In response to Why metaprogram when you can program?, an astute reader asked for an example of when you would want to use Class.new in Ruby. It’s a rarely needed method, but really fun when faced with a tasteful application. Herein, a couple ways I’ve used it and an example from the wild.
Dead-simple doubles
In my opinion, the most “wholly legitimate” frequent application of Class.new is in test code. It’s a great tool for creating test doubles, fakes, stubs, and mocks without the weight of pulling in a framework. To wit:
TinyFake = Class.new do
  def slow_operation
    "SO FAST"
  end

  def critical_operation
    @critical = true
  end

  def critical_called?
    @critical
  end
end

tiny_fake = TinyFake.new
tiny_fake.slow_operation
tiny_fake.critical_operation
tiny_fake.critical_called? == true
TinyFake functions as a fake and as a mock. We can call a dummy implementation of slow_operation without worrying about the snappiness of our suite. We can verify that a method was called in the verification section of our test method. Normally you would only do one of these things at a time, but this shows how easy it is to roll your own doubles, fakes, stubs, or mocks.
The thing I like about this approach over defining classes inside a test file or class is that it’s all scoped inside the method. We can assign the class to a local and keep the context for each test method small. This approach is also really great for testing mixins and parent classes; define a new class, add the desired functionality, and test to suit.
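That mixin-testing trick looks something like this; a sketch, with a made-up mixin:
module Greets
  def greet
    "Hello, #{name}"
  end
end

def test_greets_mixin
  # The class exists only inside this test method; no constant leaks out.
  klass = Class.new do
    include Greets

    def name
      "world"
    end
  end

  klass.new.greet == "Hello, world" or raise "Greets didn't greet"
end

test_greets_mixin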
DSL internals
Rack and Resque are two examples of libraries that expose an API based largely on writing a class with a specific entry point. Rack middlewares are objects with a call method that generates a response based on an environment hash and any other middlewares that are contained within the middleware. Resque expects the classes that work through enqueued jobs to define a perform method.
In practice, putting these methods in a class is the way to go. But, hypothetically, we are way too lazy to type class/end, or perhaps we want to wrap a bunch of standard instrumentation and logging around a simple chunk of code. In that case, we can write ourselves a little shortcut:
module TinyDSL
  def self.performer(&block)
    c = Class.new
    c.class_eval { define_method(:perform, block) }
    c
  end
end

Thingy = TinyDSL.performer { |*args| p args }
Thingy.new.perform("one", 2, :three)
This little DSL gives us a shortcut for defining classes that implement whatever contract is expected of performer objects. From this humble beginning, we could mix in modules to add functionality around the performer, or we could pass a parent class to Class.new to make the generated class inherit from another class.
That leads us to the sort-of shortcoming of this particular application of Class.new: if the unique function of performer is to wrap a class around a method (for instance, as part of an API exported by another library), why not just subclass or mix in that functionality in the client application? This is the question you have to ask yourself when using Class.new in this way and decide if the metaprogramming is pulling its weight.
How Class.new is used in Sinatra
Sinatra is a little language for writing web applications. The language specifies how HTTP requests are mapped to blocks of Ruby. Originally, you wrote your Sinatra applications like so:
get('/') { [200, {"Content-Type" => "text/plain"}, "Hello, world!"] }
Right before Sinatra 1.0, the team added a cleaner way to build and compose applications as Ruby classes. It looks the same, except it happens inside the scope of a class instead of the global scope:
class SomeApp < Sinatra::Base
  get('/') { [200, {"Content-Type" => "text/plain"}, "Hello, world!"] }
end
It turns out that the former is implemented in terms of the latter. When you use the old, global-level DSL, it creates a new class via Class.new(Sinatra::Base) and then class_evals a block into it to define the routes. Short, clever, effective: the best sort of Class.new.
So that’s how you might see Class.new used in the wild. As with any metaprogramming or construct labeled “Advanced (!)”, the main thing to keep in mind, when you use it or when you set upon refactoring an existing usage, is whether it is pulling its conceptual weight. If there’s a simpler way to use it, do that instead.
But sometimes a nail is, in fact, a nail.
Why metaprogram when you can program?
When I sought to learn Ruby, it was for three reasons. I’d heard of this cool thing called blocks, and that they had a lot of great use cases. I read there was this thing called metaprogramming and it was easier and more practical than learning Lisp. Plus, I knew several smart, nice people who were doing Ruby so it was probably a good thing to pay attention to. As it turns out, I will never go back to a language without the first and last. I can’t live without blocks, and I can’t live without smart, kind, fun people.
Metaprogramming requires a little more nuance. I understand metaprogramming well enough to get clever with it, and I understand it well enough to mostly understand what other people’s metaprogramming does. I still struggle with the nomenclature (eigenclass, metaclass, class Class?) and I often fall back to trial and error or brute-force tinkering to get things working.
On the other hand, I think I’ve come far enough that I can start to smell out when metaprogramming is done in good taste. See, every language has a feature that is terribly abused because it’s the cool, clever thing in the language: operator overloading in Scala, monadic everything in Haskell, XML in Java, and metaprogramming in Ruby.
Adam’s Handy Guide to Metaprogramming
This guide won’t teach you how to metaprogram, but it will teach you when to metaprogram.
I want you to think twice the next time you reach for the metaprogramming hammer. It’s a great tool for building developer-friendly APIs, little languages, and using code as data. But often, it’s a step too far. Normal, everyday programming will do you just fine.
There are two principles at work here.
Don’t metaprogram when you can just program
Exhaust all your tricks before you reach for metaprogramming. Use Ruby’s mixins and method delegation to compose a class. Dip into your Gang of Four book and see if there isn’t a pattern that solves your problem.
Lots of metaprogramming is in support of callback-oriented programming. Think “before”/”after”/”around” hooks. You can do this by defining extension points in the public API for your class and mixing other modules into the class that implement logic around those public methods.
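Here’s one way that plays out without any metaprogramming; a sketch, where the extension point is simply the public perform method and a module wraps it:
class Job
  def perform
    puts "doing the work"
  end
end

# The "around" logic lives in a module that overrides the public method.
module Instrumented
  def perform
    puts "before hook"
    result = super # drops through to Job#perform
    puts "after hook"
    result
  end
end

class InstrumentedJob < Job
  include Instrumented
end

InstrumentedJob.new.perform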
Another common form is configuring an object or framework. Think about things that declare models, connections, or queries. Use method chaining to build or configure an object that acts as a parameter list for another method or object.
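And the configuration form might look like this; a sketch of method chaining, with hypothetical names:
class QueryConfig
  def initialize
    @options = {}
  end

  # Each setter returns self, which is what makes the chain work.
  def limit(n)
    @options[:limit] = n
    self
  end

  def order(field)
    @options[:order] = field
    self
  end

  def to_hash
    @options
  end
end

# The chained object acts as a parameter list for whatever runs the query.
QueryConfig.new.limit(10).order(:created_at).to_hash
# => {:limit=>10, :order=>:created_at}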
Use the weakest form of metaprogramming possible
Once you’ve exhausted your patterns and static Ruby tricks, it’s time to play a game: how little metaprogramming can you do and get the job done?
Various forms of metaprogramming are weaker or stronger than others. The weaker ones are harder to screw up and less likely to require a deep understanding of Ruby. The stronger ones have trade-offs that require careful application and possibly need a lot of explanation to newcomers to your codebase.
Now, I will present to you a partial ordering of metaprogramming forms, in order of weak to strong. We can bicker on their specific placement, but I’m pretty certain that the first one is far better to use frequently than the last.
- Blocks - I hesitate to call this a form of metaprogramming. But, it is sometimes abused, and it is sometimes smart to use blocks instead of tricks further down this list. That said, if you find yourself needing more than one block parameter to a method, you should consider a parameter object that holds those blocks instead.
- Dynamic message send on a static object - You set a symbol on an object and later it will send that symbol as a method selector to an object that doesn’t change at runtime. This is weak because the only thing that varies is the method that gets called. On the other hand, you could have just used a block.
- Dynamic message send on a dynamic object - You set a symbol and a receiver object, and at some point they are combined into a method call. This is stronger than the previous form because you’ve got two points of variability, which means two things to hunt down and two more things to hold in your brain.
- Class.new - I love this method so much. But, it’s a source of potential hurt when trying to understand a new piece of code. Classes magically poofing into existence at runtime makes code harder to read and navigate with simple tools. At the very least, have the civility to assign classes created this way to a constant so they feel like a normal class. Downsides, err, aside, I love this method so much; having it around is way better than not.
- define_method - I like this method a lot too. Again, it’s way better to have it around than not. It’s got two modes of use, one gnarly and one not-so-bad. If you look at how it’s used in Rails, you’ll see a lot of instances where it’s passed a string of code, sometimes with interpolations inside said string. This is the gnarly form; unfortunately, it’s also faster on MRI and maybe other runtimes. There is another form, where you pass a block to define_method and the block becomes the body of the newly defined method. This one is far easier to read (there’s a short sketch contrasting the two forms after this list). Don’t even ask me the differences in how variables are bound in that block; Evan Phoenix and Wilson Bilkovich tried to explain it to me once and I just stared at them like a yokel.
- class_eval - We’re getting into the big guns of metaprogramming now. The trick with class_eval is that it’s tricky to understand exactly which class (the metaclass or the class itself) the parameters to class_eval apply to. The upside is that’s mostly a write-time problem. It’s easy to look at code that uses class_eval and figure out what it intends to do. Just don’t put that stuff in front of me in an interview and expect me to tell you where the methods land without typing the damn thing into IRB.
- instance_eval - Same tricks as class_eval. This may have simpler semantics, but I always find myself falling back to tinkering with IRB; your mileage may vary. The one really tricky thing you can do with instance_eval (and the class << some_obj trick) is put methods on specific instances of an object. Another thing that’s better to have around than not, but it always gives me pause when I see it or think I should use it.
- method_missing - Behold, the easiest form of metaprogramming to grasp and thus the most widely abused. Don’t feel like typing out methods to delegate, or want to build an API that’s easy to use but impossible to document? method_missing that stuff! Builder objects are a legitimate use of method_missing. Everything else requires deep zen to justify. Remember: friends don’t let friends write objects that indiscriminately swallow messages.
- eval - You almost certainly don’t need this; almost everything else is better off as a weaker form of metaprogramming. If I see this, I expect that you’re doing something really, really clever and therefore have a well-written justification and a note from your parents.
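To make the define_method point concrete, here’s a sketch contrasting the gnarly string-eval style with the block form, generating trivial readers either way:
class Widget
  # Gnarly: evaluating a string of code (here via class_eval); fast on MRI,
  # but easy to typo and opaque to simple tools.
  %w[height width].each do |attr|
    class_eval "def #{attr}; @#{attr}; end"
  end

  # Nicer: define_method with a block; the block closes over `attr`.
  %w[depth weight].each do |attr|
    define_method(attr) { instance_variable_get("@#{attr}") }
  end
end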
Bonus principle!
At some point you will accidentally type “meatprogram” instead of “metaprogram”. Cherish that moment!
It’s OK to write a few more lines of code if they’re simple, concise, and easy to test. Use delegation, decorators, adapters, etc. before you metaprogram. Exhaust your GoF tricks. Read up on SOLID principles and understand how they change how you program and give you much of the flexibility that metaprogramming provides without all the trickery. When you do resort to trickery, use the simplest trickery you can. Document it, test it, and have someone review it.
When it comes to metaprogramming, it’s not about how much of the language you use. It’s about what the next person to see the code whispers under their breath. Don’t let your present self make future enemies.
Cassandra at Gowalla
Over the past year, I’ve done a lot of work making Cassandra part of Gowalla’s multi-prong database strategy. I recently spoke at Austin on Rails on this topic, doing a sort of retrospective on our adoption of Cassandra and what I learned in the process. You can check out the slide deck, or if you’re a database nerd like me, dig into the really nerdy details below.
Why does Gowalla use Cassandra?
We have a few motivations for using Cassandra at Gowalla. First off, it’s become our database of choice for applications with relatively fixed query patterns that, for us to succeed, need to handle a rapidly growing dataset. Cassandra’s read and write paths are optimized for these kinds of applications. It’s good at keeping the hot subset of a database in memory while keeping queries that require hitting disk pretty quick too.
Cassandra is also great for time-oriented applications. Any time we need to fetch data based primarily on some sort of timestamp, Cassandra is a great fit. It’s a bit unique in this regard, and that’s one of the main reasons I’m so interested in Cassandra.
Cassandra is a Dynamo-style database, which yields some nice operational aspects. If a node goes down over night, we don’t take an availability hit; the ops people can sleep through the night and fix it later. The Cassandra developers have also done a great job of eliminating the cases where one needs to restart an entire Cassandra cluster at one time, resulting in downtime.
When does Gowalla not use Cassandra?
I don’t think Cassandra is all that great for iterating on prototypes. When you’re not sure what your data or queries will end up looking like, it’s hard to build a schema that works well with Cassandra. You’re also unlikely to need the strengths that a distributed, column-oriented database offers at that stage. Plus, there aren’t any options for outsourced Cassandra right now, and early-stage applications/businesses rarely want to devote expertise to hosting a database.
Applications that don’t grow data quickly, or that can fit their entire dataset in memory on a pair of machines, don’t play to Cassandra’s strengths either. Given that you can get a machine with a few dozen gigabytes of memory for the cost of rent in the valley, sometimes it does pay off to scale vertically instead of horizontally as Cassandra encourages.
Cassandra applications at Gowalla
We have a handful of applications going that use Cassandra:
- Audit: Stores ActiveRecord change data to Cassandra. This was our training-wheels trial project where we experimented with Cassandra to see if it was useful for us. It was incrementally deployed using rollout and degrade. Worked well, so we proceeded.
- Chronologic: This is an activity feed service, storing the events and timelines in Cassandra. It started off life as a secondary index cache, but became a system of record in our latest release. It works great operationally, but the query/access model didn’t always jive with how web developers expected to access data.
- Active stories: We store “joinability” data for users at a spot so we can pre-merge stories and prevent proliferation of a bunch of boring, one-person stories. This was built by Brad Fults and integrated in one pull request a few weeks before launch. The nice thing about this one was that it was able to take advantage of Cassandra’s column expiration and fit really nicely into Cassandra’s data model.
- Social graph caches: We store friend data from other systems so we can quickly list/suggest friends when they connect their Gowalla profile to Facebook or Twitter. This started life on Redis, but the data was growing too quickly. We decoupled it from Redis and wrote a Cassandra backend over a few days. We incrementally deployed it and got Redis out of the picture within two weeks. That was pretty cool.
What worked?
- Stable at launch. A couple weeks before launch, I switched to “devops” mode. Along with Adam McManus, our ops guy, we focused on tuning Cassandra for better read performance and to resolve stability problems. We ended up bringing in a DataStax consultant to help us verify we were doing the right things with Cassandra. The result of this was that, at launch, our cluster held up well and we didn’t have any Cassandra-related problems.
- Easy to tune. I found Cassandra interesting and easy to tune. There is a little bit of upfront research in figuring out exactly what the knobs mean and what the reporting tools are saying. Once I figured that out, it was easy to iteratively tweak things and see if they were having a positive effect on the performance of our cluster.
- Time-series or semi-granular data. Of the databases I’ve tinkered with, Cassandra stands out in terms of modeling time-related data. If an application is going to pull data in time-order most of the time, Cassandra is a really great place to start. I also like the column-oriented data model. It’s great if you mostly need a key-value store, but occasionally need a key-key-value store.
What would we do differently next time?
- Developer localhost setups. We started using Cassandra in the 0.6 release, when it was a giant pain to set up locally (XML configs). It’s better now, but I should have put more energy into helping the other developers on our team get Cassandra up and working properly. If I were to do it again, I’d probably look into leaning on the install scripts the cassandra gem includes, rather than Homebrew and a myriad of scripts to hack the Cassandra config.
- Eventual consistency and magic database voodoo. Cassandra does not work like MySQL or Redis. It has different design constraints and a relatively unique approach to those constraints. In advocating and explaining Cassandra, I think I pitched it too much as a database nerd and not enough as “here’s a great tool that can help us solve some problems”. I hope that CQL makes it easier to put Cassandra in front of non-database nerds in terms that they can easily relate to and immediately find productivity.
- Rigid query model. Once we got several million rows of data into Cassandra, we found it difficult to quickly change how we represented that data. It became a game of “how can we incrementally rejigger this data structure to have these other properties we just figured out we want?” I’m not sure that’s a game you can easily win at with Cassandra. I’d love to read more about building evolvable data structures in Cassandra and see how people are dealing with high-volume, evolving data.
Things we’ll try differently next time
- More like a hash, less like a database. Having developed a database-like thing, I have come to the conclusion that developers really don’t like them very much. ActiveRecord was hugely successful because it was so much more effective than anything previous to it that tried to make databases just go away. The closer a database is to one of the native data structures in the host language, the better. If it’s not a native data structure, it should be something they can create in a REPL and then say “magically save this for me!”
- Better tools and automation. That said, every abstraction leaks. Once it does, developers want simple and useful tools that let them figure out what’s going on, what the data really looks like, tinker with it, and get back to their abstracted world as quickly as possible. This starts with tools for setting up the database, continues through interacting with it (database REPL), and for operating it (logging, introspection, etc.) Cassandra does pretty well with these tools, but they’re still a bit nerdy.
- More indexes. We didn’t design our applications to use secondary indexes (a great feature) because they didn’t exist just yet. I should have spent more time integrating this into the design of our services. We got bit a lot towards the end of our release cycle because we were building all of our indexes in the application and hadn’t designed for reverse indexes. We also designed a rather coarse schema, which further complicated ad-hoc querying, which is another thing non-database-nerds love.
What’s that mean for me?
Cassandra has a lot of strengths. Once you get to a scale where you’re running data through a replicated database setup and some kind of key-value database or cache, it makes sense to start thinking about Cassandra. There are a lot of things you can do with it, and it lets you cheat in interesting ways. Take some extra time to think about the data model you build and how you’ll change it in the future. Like anything else, build tools for yourself to automate the things you do repeatedly.
Don’t use it because you read a blog post about it. Use it because it fits your application and your team is excited about using it.