2011
When to Class.new
In response to Why metaprogram when you can program?, an astute reader asked for an example of when you would want to use Class.new in Ruby. It’s a rarely needed method, but really fun when faced with a tasteful application. Herein, a couple ways I’ve used it and an example from the wild.
Dead-simple doubles
In my opinion, the most “wholly legitimate” frequent application of Class.new is in test code. It’s a great tool for creating test doubles, fakes, stubs, and mocks without the weight of pulling in a framework. To wit:
TinyFake = Class.new do
  def slow_operation
    "SO FAST"
  end

  def critical_operation
    @critical = true
  end

  def critical_called?
    @critical
  end
end

tiny_fake = TinyFake.new
tiny_fake.slow_operation
tiny_fake.critical_operation
tiny_fake.critical_called? == true
TinyFake functions as a fake and as a mock. We can call a dummy implementation of slow_operation without worrying about the snappiness of our suite. We can verify that a method was called in the verification section of our test method. Normally you would only do one of these things at a time, but this shows how easy it is to roll your own doubles, fakes, stubs, or mocks.
The thing I like about this approach over defining classes inside a test file or class is that it’s all scoped inside the method. We can assign the class to a local and keep the context for each test method small. This approach is also really great for testing mixins and parent classes; define a new class, add the desired functionality, and test to suit.
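Here is a minimal sketch of the mixin case. Greeter and check_greeter_mixin are hypothetical names made up for illustration; the point is only that the host class lives in a local variable inside the test method.

```ruby
# Hypothetical mixin under test; it expects the host class to define #name.
module Greeter
  def greeting
    "Hello, #{name}!"
  end
end

# Stand-in for a test method: the anonymous host class is held in a local,
# so nothing leaks out once the method returns.
def check_greeter_mixin
  host = Class.new do
    include Greeter

    def name
      "Tiny"
    end
  end

  host.new.greeting
end

result = check_greeter_mixin
```

In a real suite the last line would be an assertion in whatever test framework is already in use.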
DSL internals
Rack and Resque are two examples of libraries that expose an API based largely on writing a class with a specific entry point. Rack middlewares are objects with a call method that generates a response based on an environment hash and any other middlewares that are contained within the middleware. Resque expects the classes that work through enqueued jobs to define a perform method.
In practice, putting these methods in a class is the way to go. But, hypothetically, we are way too lazy to type class/end, or perhaps we want to wrap a bunch of standard instrumentation and logging around a simple chunk of code. In that case, we can write ourselves a little shortcut:
module TinyDSL
  def self.performer(&block)
    c = Class.new
    c.class_eval { define_method(:perform, block) }
    c
  end
end

Thingy = TinyDSL.performer { |*args| p args }
Thingy.new.perform("one", 2, :three)
This little DSL gives us a shortcut for defining classes that implement whatever contract is expected of performer objects. From this humble beginning, we could mix in modules to add functionality around the performer, or we could pass a parent class to Class.new to make the generated class inherit from another class.
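A rough sketch of both extensions follows. Timed and BaseJob are hypothetical names invented for this example, and prepend assumes Ruby 2.0 or later, which is newer than the Ruby this post was written against.

```ruby
# Hypothetical instrumentation mixin: wraps whatever #perform the class defines.
module Timed
  attr_reader :elapsed

  def perform(*args)
    started = Time.now
    result = super            # calls the block-defined #perform
    @elapsed = Time.now - started
    result
  end
end

# Hypothetical parent class supplying shared behavior.
class BaseJob
  def queue
    :default
  end
end

module TinyDSL
  def self.performer(parent = Object, &block)
    c = Class.new(parent)     # generated class inherits from parent
    c.class_eval do
      define_method(:perform, &block)
      prepend Timed           # Timed#perform runs first, then calls super
    end
    c
  end
end

Doubler = TinyDSL.performer(BaseJob) { |n| n * 2 }
job = Doubler.new
answer = job.perform(21)      # 42, with the timing recorded in job.elapsed
```

The module has to be prepended rather than included; an included module sits behind the class in method lookup, so its perform would never run.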
That leads us to the sort-of shortcoming of this particular application of Class.new: if the unique function of performer is to wrap a class around a method (for instance, as part of an API exported by another library), why not just subclass or mix in that functionality in the client application? This is the question you have to ask yourself when using Class.new in this way and decide if the metaprogramming is pulling its weight.
How Class.new is used in Sinatra
Sinatra is a little language for writing web applications. The language specifies how HTTP requests are mapped to blocks of Ruby. Originally, you wrote your Sinatra applications like so:
get('/') { [200, {"Content-Type" => "text/plain"}, "Hello, world!"] }
Right before Sinatra 1.0, the team added a cleaner way to build and compose applications as Ruby classes. It looks the same, except it happens inside the scope of a class instead of the global scope:
class SomeApp < Sinatra::Base
  get('/') { [200, {"Content-Type" => "text/plain"}, "Hello, world!"] }
end
It turns out that the former is implemented in terms of the latter. When you use the old, global-level DSL, it creates a new class via Class.new(Sinatra::Base) and then class_evals a block into it to define the routes. Short, clever, effective: the best sort of Class.new.
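To make the shape of that trick concrete, here is a toy sketch. It is not Sinatra's actual code: MiniBase stands in for Sinatra::Base, MiniDSL.app stands in for the global DSL, and both names are invented for this example.

```ruby
# Toy stand-in for Sinatra::Base: routes are stored per class.
class MiniBase
  def self.routes
    @routes ||= {}
  end

  def self.get(path, &handler)
    routes[path] = handler
  end

  def call(path)
    handler = self.class.routes[path]
    handler ? handler.call : [404, {}, "Not found"]
  end
end

# Toy stand-in for the global DSL: build a subclass, then evaluate the
# route definitions inside it.
module MiniDSL
  def self.app(&block)
    klass = Class.new(MiniBase)
    klass.class_eval(&block)
    klass
  end
end

App = MiniDSL.app do
  get('/') { [200, { "Content-Type" => "text/plain" }, "Hello, world!"] }
end

response = App.new.call('/')
```

Inside the block, self is the freshly created class, so get lands on it exactly as it would inside a hand-written subclass.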
So that’s how you might see Class.new used in the wild. As with any metaprogramming or construct labeled “Advanced (!)”, the main thing to keep in mind, when you use it or when you set upon refactoring an existing usage, is whether it is pulling its conceptual weight. If there’s a simpler way to use it, do that instead.
But sometimes a nail is, in fact, a nail.
Four essential topics of 2011, in charts
The Year In 4 Charts: Planet Money does an excellent job collecting four economic charts (themselves chosen from three collections of best-of charts). I’m a dilettante as far as economics goes, but these charts do a great job of rolling up what seemed to have been the essential stories of the year.

A picture can nullify a thousand talking points, no?
Making a little musical thing
After software development, music is probably the thing I know the most about. My brain is full of history, trivia, and a modest bit of practical knowledge on how to read notation and make music come out. That said, I haven’t really practiced music in several years. I’ve been busy nerding out on other things, and I’ve grown a bit lazy. Too lazy to find people to play with, too lazy for scales, too lazy to even tune a stringed instrument. Very, very lazy.
Long story short, I’ve been wanting to get back into music lately, but I want to learn something new. Something entirely mysterious to me. Given my recent fascination with hip-hop, I’m eager to try my hand at making the beats that form the musical basis of the form.
There are a lot of priors to cover (tinkering with various sequencers, drum machines, and synthesizers; steeping myself in sample culture; listening to the actual music and understanding its history), but I just made a short, mediocre little beat and put it on the internet. Herein, I reflect on making that little musical thing:
- I’m sure that, if I get serious about this, I’ll need real software like Ableton or Logic. But for my tinkering, it turns out GarageBand is sufficient. The included software instruments aren’t amazing or even idiomatic samples (no TR808, no “Apache” break included), but with a little bit of tinkering, they produce results.
- Laying a drum track down that is little more than a fancy click track helps to get started. GarageBand has a handy feature where you can define a number of bars as a loop and then record multiple takes, review them, and discard the takes you don’t want.
- What an app lacks in samples you can make up in effects. Throwing a heavy dose of echo and a ridiculous helping of reverb made an otherwise pedestrian drum track way more interesting.
- I didn’t go into this with anything in my head that I wanted to make real. For the drum track, I ended up with a pretty typical beat. A little quantization made it end up sounding better and more interesting than it really is. This process, manual input with some computer-assisted tweaking, produced way better results than the iOS drum machines I’ve used in the past.
- Tapping out the bass-line took a little more time than the drums. I didn’t have anything “standard” in my head, so I doodled a bit. This is where the “takes” gizmo in GarageBand came in really handy. Record a bunch of things, decide which one is most interesting, clean it up a little, throw an effect or two on it to make it more interesting, on to the next track.
- In retrospect, lots of effects is maybe a crutch. I don’t have enough taste yet to tell.
- With the drums and bass down, it’s time to adorn the track with a melody or interesting hit for effect. I added one subtle thing, but couldn’t think of anything I liked that was worth making prominent. If I were actually trying to use this beat for something, I’d keep digging. But for my first or second beat, it’s not a big deal.
I wanted to jot down my thoughts because I’d like to write more about making and understanding music, but also because I keep meaning to write down what I find challenging and interesting as I start from a “beginner’s mind” in some craft or skill. And so I did.
You’re six hundred words into this thing now, so I’ll reward you, if we could call it a reward, with “An Beat”.
Crafting lightsabers, uptime the systems, a little Clojure
Herein, some great technical writings from the past week or two.
Crafting your editor lightsaber
Vim: revisited, on how to approach Vim and build your very own config from first principles. My personal take on editor/shell configurations is that it’s way better to have someone else maintain them. Find something like Janus or oh-my-zsh, tweak the things it includes to work for you, and get back to doing what you do. That said, I’m increasingly tempted to craft my own config, if only to promote the fullness and shine of my neck beard.
Uptime all the systems
Making the Netflix API More Resilient lays out the system of circuit breakers, dashboards, and automatons Netflix uses to proactively maintain API reliability in the face of external failures. Great ideas for anyone maintaining a service that needs to stay online.
List All of the Riak Keys, on the trickiness of SELECT * FROM all_the_things-style queries in Riak, or any distributed database, really. The short story is that these kinds of queries are impractical and not something you can do in production. The longer story is that there are ways to work around it with clever use of indexes and data structures. Make sure you check out the Riak Handbook from the same author.
A little bit of Clojure
Introducing Knockbox introduces a Clojure library for dealing with conflict resolution in data stored in distributed databases like Riak. If you’re working with any database that leaves you wondering what to do when two clients get in a race condition, these are the droids you’re looking for. I would have paid pretty good money to have known about this a few months ago.
Clojure’s Mini-languages is a great teaser on Clojure if, like me, you’ve tinkered with it before but are coming back to it. This is particularly useful if you’ve seen some Lisp or Scheme before, but are slightly confused by what’s going on with all the non-paren characters that appear in your typical Clojure program. Having taken a recent dive into the JVM ecosystem, I have to say there’s a lot to like in Clojure. If your brain understands static types but thinks better in dynamic types (mine does), give this a look.
I occasionally post links with shorter comments, if you’d like a slightly more-frequent dose of what you just read.
Quality in the inner loop
In software, this means that every piece of code and UI matters on its own, as it’s being crafted. Quality takes on more of a verb-like nature under this conception: to create quality is to care deeply about each bit of creation as it is added and to strive to improve one’s ability to translate that care into lasting skills and appreciable results.
When I wrote on “quality” a few months ago, I was thinking of it as an attribute one would use to describe the outer loop of a project. Do a bunch of work, locate areas that need more quality, put a few touches on those areas or note improvements for the next iteration, and ship it.
But what Brad is describing is putting quality into the inner loop. Work attains “the quality” as it is created, rather than as a secondary editing or review step. Little is done without considering its quality.
I’m extrapolating a bit from the letter of what Brad has written here, but that’s because I’ve been lucky enough to work with him. Indeed Brad’s work is of consistently high quality. Hopefully he’ll write more specifics about how quality code is created in the future (hint, Brad), and how much it relates to Christopher Alexander’s “quality without a name”.
Why metaprogram when you can program?
When I sought to learn Ruby, it was for three reasons. I’d heard of this cool thing called blocks, and that they had a lot of great use cases. I read there was this thing called metaprogramming and it was easier and more practical than learning Lisp. Plus, I knew several smart, nice people who were doing Ruby so it was probably a good thing to pay attention to. As it turns out, I will never go back to a language without the first and last. I can’t live without blocks, and I can’t live without smart, kind, fun people.
Metaprogramming requires a little more nuance. I understand metaprogramming well enough to get clever with it, and I understand it well enough to mostly understand what other people’s metaprogramming does. I still struggle with the nomenclature (eigenclass, metaclass, class Class?) and I often fall back to trial and error or brute-force tinkering to get things working.
On the other hand, I think I’ve come far enough that I can start to smell out when metaprogramming is done in good taste. See, every language has a feature that is terribly abused because it’s the cool, clever thing in the language: operator overloading in Scala, monadic everything in Haskell, XML in Java, and metaprogramming in Ruby.
Adam’s Handy Guide to Metaprogramming
This guide won’t teach you how to metaprogram, but it will teach you when to metaprogram.
I want you to think twice the next time you reach for the metaprogramming hammer. It’s a great tool for building developer-friendly APIs, little languages, and using code as data. But often, it’s a step too far. Normal, everyday programming will do you just fine.
There are two principles at work here.
Don’t metaprogram when you can just program
Exhaust all your tricks before you reach for metaprogramming. Use Ruby’s mixins and method delegation to compose a class. Dip into your Gang of Four book and see if there isn’t a pattern that solves your problem.
Lots of metaprogramming is in support of callback-oriented programming. Think “before”/“after”/“around” hooks. You can do this by defining extension points in the public API for your class and mixing other modules into the class that implement logic around those public methods.
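A small sketch of that idea, with no hook DSL involved. Document, Auditing, and AuditedDocument are hypothetical names made up for illustration; save is the extension point.

```ruby
# Hypothetical class whose public #save is the agreed extension point.
class Document
  attr_reader :log

  def initialize
    @log = []
  end

  def save
    @log << :saved
    true
  end
end

# The "around" logic is an ordinary module method that calls super.
module Auditing
  def save
    @log << :before_save
    result = super
    @log << :after_save
    result
  end
end

# Mixing the module into a subclass places it in front of Document#save.
class AuditedDocument < Document
  include Auditing
end

doc = AuditedDocument.new
saved = doc.save
```

After the call, doc.log holds the before marker, the save itself, and the after marker, in that order.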
Another common form is configuring an object or framework. Think about things that declare models, connections, or queries. Use method chaining to build or configure an object that acts as a parameter list for another method or object.
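For the configuration case, a chained builder might look roughly like this. Query and describe are hypothetical, invented only to show an object acting as a parameter list for another method.

```ruby
# Hypothetical query builder: each call returns self so calls can be chained.
class Query
  attr_reader :table, :conditions, :max

  def initialize(table)
    @table = table
    @conditions = {}
    @max = nil
  end

  def where(conditions)
    @conditions.merge!(conditions)
    self
  end

  def limit(count)
    @max = count
    self
  end
end

# Hypothetical consumer that treats the built query as its parameter list.
def describe(query)
  parts = ["from #{query.table}"]
  query.conditions.each { |key, value| parts << "#{key}=#{value}" }
  parts << "limit #{query.max}" if query.max
  parts.join(" ")
end

query = Query.new(:users).where(city: "Austin").limit(10)
summary = describe(query)
```

Everything here is plain method calls and instance variables, which is the whole appeal.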
Use the weakest form of metaprogramming possible
Once you’ve exhausted your patterns and static Ruby tricks, it’s time to play a game: how little metaprogramming can you do and get the job done?
Various forms of metaprogramming are weaker or stronger than others. The weaker ones are harder to screw up and less likely to require a deep understanding of Ruby. The stronger ones have trade-offs that require careful application and possibly need a lot of explanation to newcomers to your codebase.
Now, I will present to you a partial ordering of metaprogramming forms, in order of weak to strong. We can bicker on their specific placement, but I’m pretty certain that the first one is far better to use frequently than the last.
- Blocks - I hesitate to call this a form of metaprogramming. But, it is sometimes abused, and it is sometimes smart to use blocks instead of tricks further down this list. That said, if you find yourself needing more than one block parameter to a method, you should consider a parameter object that holds those blocks instead.
- Dynamic message send on a static object - You set a symbol on an object and later it will send that symbol as a method selector to an object that doesn’t change at runtime. This is weak because the only thing that varies is the method that gets called. On the other hand, you could have just used a block.
- Dynamic message send on a dynamic object - You set a symbol and a receiver object, at some point they are combined into a method call. This is stronger than the previous form because you’ve got two points of variability, which means two things to hunt down and two more things to hold in your brain.
- Class.new - I love this method so much. But, it’s a source of potential hurt when trying to understand a new piece of code. Classes magically poofing into existence at runtime makes code harder to read and navigate with simple tools. At the very least, have the civility to assign classes created this way to a constant so they feel like a normal class. Downsides, err, aside, I love this method so much, having it around is way better than not.
- define_method - I like this method a lot too. Again, it’s way better to have it around than not. Defining methods at runtime comes in two flavors, one gnarly and one not-so-bad. If you look at how it’s done in Rails, you’ll see a lot of instances where a string of code is evaluated (strictly speaking via class_eval or module_eval, since define_method itself does not accept a string), sometimes with interpolations inside said string. This is the gnarly form; unfortunately, it’s also faster on MRI and maybe other runtimes. The other form is to pass a block to define_method, and the block becomes the body of the newly defined method. This one is far easier to read. Don’t even ask me the differences in how variables are bound in that block; Evan Phoenix and Wilson Bilkovich tried to explain it to me once and I just stared at them like a yokel.
- class_eval - We’re getting into the big guns of metaprogramming now. The trick with class_eval is that it’s tricky to understand exactly which class (the metaclass or the class itself) the parameters to class_eval apply to. The upside is that’s mostly a write-time problem. It’s easy to look at code that uses class_eval and figure out what it intends to do. Just don’t put that stuff in front of me in an interview and expect me to tell you where the methods land without typing the damn thing into IRB.
- instance_eval - Same tricks as class_eval. This may have simpler semantics, but I always find myself falling back to tinkering with IRB; your mileage may vary. The one really tricky thing you can do with instance_eval (and the class << some_obj trick) is put methods on specific instances of an object. Another thing that’s better to have around than not, but always gives me pause when I see it or think I should use it.
- method_missing - Behold, the easiest form of metaprogramming to grasp and thus the most widely abused. Don’t feel like typing out methods to delegate or want to build an API that’s easy to use but impossible to document? method_missing that stuff! Builder objects are a legitimate use of method_missing. Everything else requires deep zen to justify. Remember: friends don’t let friends write objects that indiscriminately swallow messages.
- eval - You almost certainly don’t need this; almost everything else is better off as a weaker form of metaprogramming. If I see this, I expect that you’re doing something really, really clever and therefore have a well-written justification and a note from your parents.
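To make the weak end of that ordering concrete, here is a sketch of the first three forms side by side. each_twice, Transform, and Callback are hypothetical names invented for this example.

```ruby
# Form 1: a block. The caller hands over behavior directly.
def each_twice(value)
  2.times { yield value }
end

# Form 2: dynamic send on a static object. Only the selector varies;
# the receiver is always the Math module.
class Transform
  def initialize(function)
    @function = function          # e.g. :sqrt or :cbrt
  end

  def apply(x)
    Math.public_send(@function, x)
  end
end

# Form 3: dynamic send on a dynamic object. Receiver and selector both vary,
# so there are two things to hunt down when reading the code.
class Callback
  def initialize(receiver, selector)
    @receiver = receiver
    @selector = selector
  end

  def fire(*args)
    @receiver.public_send(@selector, *args)
  end
end

seen = []
each_twice(:x) { |v| seen << v }

root = Transform.new(:sqrt).apply(16.0)

list = []
Callback.new(list, :push).fire(:a)
```

Using public_send rather than send keeps private methods out of reach, which trims one more way for the dynamic forms to surprise a reader.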
Bonus principle!
At some point you will accidentally type “meatprogram” instead of “metaprogram”. Cherish that moment!
It’s OK to write a few more lines of code if they’re simple, concise, and easy to test. Use delegation, decorators, adapters, etc. before you metaprogram. Exhaust your GoF tricks. Read up on SOLID principles and understand how they change how you program and give you much of the flexibility that metaprogramming provides without all the trickery. When you do resort to trickery, use the simplest trickery you can. Document it, test it, and have someone review it.
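As one example of reaching for delegation first, Ruby's standard Forwardable module covers a lot of what method_missing gets used for. Playlist is a hypothetical class made up for this sketch.

```ruby
require "forwardable"

# Hypothetical wrapper around an Array: only the listed methods are forwarded,
# so the public API stays explicit and documentable.
class Playlist
  extend Forwardable
  include Enumerable

  def_delegators :@tracks, :size, :empty?, :each

  def initialize
    @tracks = []
  end

  def add(track)
    @tracks << track
    self
  end
end

playlist = Playlist.new.add("An Beat").add("Another Beat")
count = playlist.size
shouted = playlist.map(&:upcase)   # Enumerable builds on the delegated #each
```

Unlike a method_missing proxy, playlist.respond_to?(:pop) is false here; the object does not swallow messages it was never meant to handle.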
When it comes to metaprogramming, it’s not about how much of the language you use. It’s about what the next person to see the code whispers under their breath. Don’t let your present self make future enemies.
Modern Von Neumann machines, how do they work?
Modern Microprocessors - A 90 Minute Guide!. If you didn't find a peculiar joy in computer architecture classes or the canonical tomes on the topic by Patterson and Hennessy, this is the thing for you. It's a great dive into how modern processors work, what the design challenges and trade-offs are, and what you need to know as a software developer.
Totally unrelated: when I interned at Texas Instruments, my last project was writing tests for a pre-silicon DSP. Because there were no test devices, I had to run my code against a simulator. It simulated several million gates of logic and output the result of my program as the wires that come out of the processor registers. This was fun, again in a way peculiar to my interest, at the time, in being a hardware designer/driver hacker. Let me tell you, every debugging tool you will ever see is better than inspecting hex values coming out of registers.
Anyway, these programs ran super slow; each run took about an hour. One day I did the math and figured out the simulator was basically running at 100 Hz. Not kilohertz or megahertz. One hundred hertz. So, yeah. In the snow, uphill, both ways.
Changing legacy code, made less painful
Rescuing Legacy Code by Extracting Pure Functions. Come across strange, pre-existing code. Decide you need to change it. Follow the pattern described herein. Apply TDD afterwards. I so wish someone had shown me this technique years and years ago. Also, Composed Method (from Smalltalk Best Practice Patterns) is so great, I can't even put it into words.
Cassandra at Gowalla
Over the past year, I’ve done a lot of work making Cassandra part of Gowalla’s multi-prong database strategy. I recently spoke at Austin on Rails on this topic, doing a sort of retrospective on our adoption of Cassandra and what I learned in the process. You can check out the slide deck, or if you’re a database nerd like me, dig into the really nerdy details below.
Why does Gowalla use Cassandra?
We have a few motivations for using Cassandra at Gowalla. First off, it’s become our database of choice for applications with relatively fixed query patterns that, for us to succeed, need to handle a rapidly growing dataset. Cassandra’s read and write paths are optimized for these kinds of applications. It’s good at keeping the hot subset of a database in memory while keeping queries that require hitting disk pretty quick too.
Cassandra is also great for time-oriented applications. Any time we need to fetch data based primarily on some sort of timestamp, Cassandra is a great fit. It’s a bit unique in this regard, and that’s one of the main reasons I’m so interested in Cassandra.
Cassandra is a Dynamo-style database, which yields some nice operational aspects. If a node goes down overnight, we don’t take an availability hit; the ops people can sleep through the night and fix it later. The Cassandra developers have also done a great job of eliminating all the cases where one needs to restart an entire Cassandra cluster at one time, resulting in downtime.
When does Gowalla not use Cassandra?
I don’t think Cassandra is all that great for iterating on prototypes. When you’re not sure what your data or queries will end up looking like, it’s hard to build a schema that works well with Cassandra. You’re also unlikely to need the strengths that a distributed, column-oriented database offers at that stage. Plus, there aren’t any options for outsourced Cassandra right now, and early-stage applications/businesses rarely want to devote expertise to hosting a database.
Applications that don’t grow data quickly, or can fit their entire dataset in memory on a pair of machines, don’t play to Cassandra’s strengths either. Given that you can get a machine with a few dozen gigabytes of memory for the cost of rent in the valley, sometimes it does pay to scale vertically instead of horizontally as Cassandra encourages.
Cassandra applications at Gowalla
We have a handful of applications going that use Cassandra:
- Audit: Stores ActiveRecord change data to Cassandra. This was our training-wheels trial project where we experimented with Cassandra to see if it was useful for us. It was incrementally deployed using rollout and degrade. Worked well, so we proceeded.
- Chronologic: This is an activity feed service, storing the events and timelines in Cassandra. It started off life as a secondary index cache, but became a system of record in our latest release. It works great operationally, but the query/access model didn’t always jibe with how web developers expected to access data.
- Active stories: We store “joinability” data for users at a spot so we can pre-merge stories and prevent proliferation of a bunch of boring, one-person stories. This was built by Brad Fults and integrated in one pull request a few weeks before launch. The nice thing about this one was that it was able to take advantage of Cassandra’s column expiration and fit really nicely into Cassandra’s data model.
- Social graph caches: We store friend data from other systems so we can quickly list/suggest friends when they connect their Gowalla profile to Facebook or Twitter. This started life on Redis, but the data was growing too quickly. We decoupled it from Redis and wrote a Cassandra backend over a few days. We incrementally deployed it and got Redis out of the picture within two weeks. That was pretty cool.
What worked?
- Stable at launch. A couple weeks before launch, I switched to “devops” mode. Along with Adam McManus, our ops guy, we focused on tuning Cassandra for better read performance and to resolve stability problems. We ended up bringing in a DataStax consultant to help us verify we were doing the right things with Cassandra. The result of this was that, at launch, our cluster held up well and we didn’t have any Cassandra-related problems.
- Easy to tune. I found Cassandra interesting and easy to tune. There is a little bit of upfront research in figuring out exactly what the knobs mean and what the reporting tools are saying. Once I figured that out, it was easy to iteratively tweak things and see if they were having a positive effect on the performance of our cluster.
- Time-series or semi-granular data. Of the databases I’ve tinkered with, Cassandra stands out in terms of modeling time-related data. If an application is going to pull data in time-order most of the time, Cassandra is a really great place to start. I also like the column-oriented data model. It’s great if you mostly need a key-value store, but occasionally need a key-key-value store.
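To picture the key-key-value shape, here is an in-memory sketch. It is plain Ruby and not a Cassandra client API; TimelineStore and its methods are hypothetical names chosen for illustration, and the sorting that Cassandra does on disk is simulated at read time.

```ruby
# In-memory illustration only. A row is found by its row key, and each row
# holds columns keyed by a second key, here an integer timestamp.
class TimelineStore
  def initialize
    @rows = Hash.new { |rows, key| rows[key] = {} }
  end

  def insert(row_key, timestamp, value)
    @rows[row_key][timestamp] = value
  end

  # Newest-first slice of a single row, like reading a timeline.
  def latest(row_key, count)
    @rows.fetch(row_key, {})
         .sort_by { |timestamp, _value| -timestamp }
         .first(count)
         .map { |_timestamp, value| value }
  end
end

store = TimelineStore.new
store.insert("user:1", 100, "checked in at the coffee shop")
store.insert("user:1", 300, "checked in at the office")
store.insert("user:1", 200, "checked in at the taco truck")

recent = store.latest("user:1", 2)
```

The outer key picks the timeline and the inner key orders it, which is why time-ordered reads feel natural in this model.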
What would we do differently next time?
- Developer localhost setups. We started using Cassandra in the 0.6 release, when it was a giant pain to set up locally (XML configs). It’s better now, but I should have put more energy into helping the other developers on our team get Cassandra up and working properly. If I were to do it again, I’d probably look into leaning on the install scripts the cassandra gem includes, rather than Homebrew and a myriad of scripts to hack the Cassandra config.
- Eventual consistency and magic database voodoo. Cassandra does not work like MySQL or Redis. It has different design constraints and a relatively unique approach to those constraints. In advocating and explaining Cassandra, I think I pitched it too much as a database nerd and not enough as “here’s a great tool that can help us solve some problems”. I hope that CQL makes it easier to put Cassandra in front of non-database nerds in terms that they can easily relate to and immediately find productivity.
- Rigid query model. Once we got several million rows of data into Cassandra, we found it difficult to quickly change how we represented that data. It became a game of “how can we incrementally rejigger this data structure to have these other properties we just figured out we want?” I’m not sure that’s a game you can easily win at with Cassandra. I’d love to read more about building evolvable data structures in Cassandra and see how people are dealing with high-volume, evolving data.
Things we’ll try differently next time
- More like a hash, less like a database. Having developed a database-like thing, I have come to the conclusion that developers really don’t like them very much. ActiveRecord was hugely successful because it was so much more effective than anything previous to it that tried to make databases just go away. The closer a database is to one of the native data structures in the host language, the better. If it’s not a native data structure, it should be something they can create in a REPL and then say “magically save this for me!”
- Better tools and automation. That said, every abstraction leaks. Once it does, developers want simple and useful tools that let them figure out what’s going on, what the data really looks like, tinker with it, and get back to their abstracted world as quickly as possible. This starts with tools for setting up the database, continues through interacting with it (database REPL), and for operating it (logging, introspection, etc.) Cassandra does pretty well with these tools, but they’re still a bit nerdy.
- More indexes. We didn’t design our applications to use secondary indexes (a great feature) because they didn’t exist just yet. I should have spent more time integrating this into the design of our services. We got bit a lot towards the end of our release cycle because we were building all of our indexes in the application and hadn’t designed for reverse indexes. We also designed a rather coarse schema, which further complicated ad-hoc querying, which is another thing non-database-nerds love.
What’s that mean for me?
Cassandra has a lot of strengths. Once you get to a scale where you’re running data through a replicated database setup and some kind of key-value database or cache, it makes sense to start thinking about Cassandra. There are a lot of things you can do with it, and it lets you cheat in interesting ways. Take some extra time to think about the data model you build and how you’ll change it in the future. Like anything else, build tools for yourself to automate the things you do repeatedly.
Don’t use it because you read a blog post about it. Use it because it fits your application and your team is excited about using it.
Pass interference: can't live with it, can't live without it.
Bill Barnwell on revamping defensive penalties. Pass interference is tough business in the NFL. It's one of the easiest calls to get wrong on the field (besides the myriad of missed holding calls), but the easiest to fix with a slow-motion camera. It's too easy for both sides to game it as well. There are some good ideas in here, but I think just making pass interference calls and non-calls reviewable is a simple first step.
The pitfalls of growing a team
Premature Ramp-up, Martin Fowler on the perils of building up a development team too quickly: loss of code cohesion, breakdown of communication, plus the business costs of on-boarding. The problem I'm more concerned with, when growing a software team, is maintaining culture.
Adding a new person to a team is a process of integrating the new person’s unique good qualities to the team’s existing culture. It’s critical to use their prior experiences to clean up the sharp edges of the existing team practice without accidentally integrating new sharp edges. It’s a careful balancing act of taking advantage of the beginner’s mind and cultural indoctrination. Both sides have to give and take.
If you grow too quickly, it’s very easy for this balancing act to get, well, out of balance. The new people are only indoctrinated and the team doesn’t learn, or the new people don’t understand the team and go about doing whatever they felt was successful at their previous gig.
It’s common to focus on the difficulty of recruiting a team, but finding a culture match and growing that culture is equally, if not more, challenging.
A food/software change metaphor
Are You Changing the Menu or the Food? Incremental change, the food metaphor edition. It's about software and startups. But food too. Think "software" when he says "food". Just read it, OK?
Your frienemy, the ORM
When modeling how our domain objects map to what is stored in a database, an object-relational mapper often comes into the picture. And then, the angst begins. Bad queries are generated, weird object models evolve, junk-drawer objects emerge, cohesion goes down and coupling goes up.
It’s not that ORMs are a smell. They are genuinely useful things that make it easier for developers to go from an idea to a working, deployable prototype. But it’s easy to fall into the habit of treating them as a top-level concern in our applications.
Maybe that is the problem!
What if our domain models weren’t built out from the ORM? Some have suggested treating the ORM, and the persistence of our objects themselves, as mere implementation details. What might that look like?
Hide the ORM like you’re ashamed of it
Recently, I had the need to build an API for logging the progress of a data migration as we ran it over many million records, spitting out several new records for every input record. Said log ended up living in PostgreSQL[1].
Visions of decoupled grandeur in my head, I decided that my API should not leak its databaseness out to the user. I started off trying to make the API talk directly to the PostgreSQL driver, but I wasn’t making much progress down that road. Further, I found myself reinventing things I would get for free in ActiveRecord-land.
Instead, I took a principled plunge. I surrendered to using an AR model, but I kept it tucked away inside the class for my API. My API makes several calls into the AR model, but it never leaks that ARness out to users of the API.
I liked how this ended up. I was free to use AR’s functionality within the inner model. I can vary the API and the AR model independently. I can stub out, or completely replace the model implementation. It feels like I’m doing OO right.
Enough of the suspense, let’s see a hypothetical example
Take a User model. Every user has a name, a city, and a URL. We can all do this in our sleep, right?
I start by defining an API. Note that all it knows is that there is some object called Model that it delegates to.
class User
  attr_accessor :name, :city, :url
  def self.fetch(key)
    Model.fetch(key)
  end
  def self.fetch_by_city(key)
    Model.fetch_by_city(key)
  end
  def save
    Model.create(name, city, url)
  end
  def ==(other)
    name == other.name && city == other.city && url == other.url
  end
end
That’s a pretty straightforward Ruby class, eh? The RSpec examples for it aren’t elaborate either.
describe User do
  let(:name) { "Shauna McFunky" }
  let(:city) { "Chasteville" }
  let(:url) { "http://mcfunky.com" }
  let(:user) do
    User.new.tap do |u|
      u.name = name
      u.city = city
      u.url = url
    end
  end
  it "has a name, city, and URL" do
    user.name.should eq(name)
    user.city.should eq(city)
    user.url.should eq(url)
  end
  it "saves itself to a row" do
    key = user.save
    User.fetch(key).should eq(user)
  end
  it "supports lookup by city" do
    user.save
    User.fetch_by_city(user.city).should eq(user)
  end
end
Not much coupling going on here either. Coding in a blog post is full of beautiful idealism, isn’t it?
“Needs more realism”, says the critic. Obliged:
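class User::Model < ActiveRecord::Base
  # Map this namespaced class onto the users table explicitly.
  self.table_name = "users"
  # Returns the new row's id so callers can hand it back to fetch.
  def self.create(name, city, url)
    super(:name => name, :city => city, :url => url).id
  end
  def self.fetch(key)
    from_model(find(key))
  end
  def self.fetch_by_city(city)
    from_model(where(:city => city).first)
  end
  # Copies the row's columns into a plain User so no ActiveRecord object leaks out.
  def self.from_model(model)
    User.new.tap do |u|
      u.name = model.name
      u.city = model.city
      u.url = model.url
    end
  end
end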
Here’s the first implementation of an actual access layer for my user model. It’s coupled to the actual user model by names, but it’s free to map those names to database tables, indexes, and queries as it sees fit. If I’m clever, I might write a shared example group for the behavior of whatever implements create, fetch, and fetch_by_city in User::Model, but I’ll leave that as an exercise to the reader.
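For a rough idea of what that shared behavior might pin down, here is a sketch in plain Ruby rather than as an RSpec shared example group, so it runs without any gems. BackendContract, Person, and MemoryBackend are hypothetical names made up for illustration; MemoryBackend stands in for whatever implements create, fetch, and fetch_by_city.

```ruby
# Hypothetical stand-in for the User class.
Person = Struct.new(:name, :city, :url)

# Hypothetical in-memory backend implementing the three-method contract.
class MemoryBackend
  def initialize
    @rows = {}
  end

  def create(name, city, url)
    key = @rows.size + 1
    @rows[key] = Person.new(name, city, url)
    key
  end

  def fetch(key)
    @rows.fetch(key)
  end

  def fetch_by_city(city)
    @rows.values.find { |row| row.city == city }
  end
end

# Returns a list of contract violations; an empty list means the backend conforms.
module BackendContract
  REQUIRED = [:create, :fetch, :fetch_by_city]

  def self.violations(backend)
    missing = REQUIRED.reject { |m| backend.respond_to?(m) }
    return missing.map { |m| "missing #{m}" } unless missing.empty?

    problems = []
    key = backend.create("Shauna McFunky", "Chasteville", "http://mcfunky.com")
    unless backend.fetch(key).name == "Shauna McFunky"
      problems << "fetch did not round-trip"
    end
    found = backend.fetch_by_city("Chasteville")
    unless found && found.city == "Chasteville"
      problems << "fetch_by_city did not find the record"
    end
    problems
  end
end
```

Any backend that returns an empty list from BackendContract.violations can sit behind the API; the same checks would translate fairly directly into a shared example group.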
To hook my model up when I run RSpec, I add a moderately involved before hook:
before(:all) do
  ActiveRecord::Base.establish_connection(
    :adapter => 'sqlite3',
    :database => ':memory:'
  )
  ActiveRecord::Schema.define do
    create_table :users do |t|
      t.string :name, :null => false
      t.string :city, :null => false
      t.string :url
    end
  end
end
As far as I know, this is about as simple as it gets to bootstrap ActiveRecord outside of a Rails test. So it goes.
Let’s fake that out
Now I’ve got a working implementation. Yay! However, it would be nice if I didn’t need all that ActiveRecord stuff when I’m running isolated unit tests. Because my model and data access layer are decoupled, I can totally do that. Hold on to your pants:
require 'active_support/core_ext/class'
class User::Model
  cattr_accessor :users
  cattr_accessor :users_by_city
  # Resets both indexes; call this before each example.
  def self.init
    self.users = {}
    self.users_by_city = {}
  end
  def self.create(name, city, url)
    # Counter-based key; a timestamp key collides when two users are created in the same second.
    key = users.size + 1
    hsh = {:name => name, :city => city, :url => url}
    users[key] = hsh
    users_by_city[city] = hsh
    key
  end
  def self.fetch(key)
    attrs = users[key]
    from_attrs(attrs)
  end
  def self.fetch_by_city(city)
    attrs = users_by_city[city]
    from_attrs(attrs)
  end
  def self.from_attrs(attrs)
    User.new.tap do |u|
      u.name = attrs[:name]
      u.city = attrs[:city]
      u.url = attrs[:url]
    end
  end
end
This “storage” layer is a bit more involved because I can’t lean on ActiveRecord to handle all the particulars for me. Specifically, I have to handle indexing the data in not one but two hashes. But, it fits on one screen and it’s in memory, so I get fast tests at not too much overhead.
This is a classic test fake. It’s not the real implementation of the object; it’s just enough for me to hack out tests that need to interact with the storage layer. It doesn’t tell me whether I’m doing anything wrong like a mock or stub might. It just gives me some behavior to collaborate with.
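To make the contrast with a mock or stub concrete, here is a sketch of the other kind of double: one that stores nothing and only remembers how it was called, so a test can check the interaction itself. RecordingModel and SketchUser are hypothetical names, and SketchUser takes its model as an argument purely to keep the example self-contained; that is an assumption of the sketch, not how the User class in this post is wired.

```ruby
# Hypothetical recording double: it keeps no data, only a log of calls.
class RecordingModel
  attr_reader :calls

  def initialize
    @calls = []
  end

  def create(name, city, url)
    @calls << [:create, name, city, url]
    :fake_key
  end
end

# Hypothetical, simplified user that is handed its model explicitly.
class SketchUser
  attr_accessor :name, :city, :url

  def save(model)
    model.create(name, city, url)
  end
end

model = RecordingModel.new
user = SketchUser.new
user.name = "Shauna McFunky"
user.city = "Chasteville"
user.url = "http://mcfunky.com"
user.save(model)

# A test can now assert on the interaction instead of on stored state.
recorded = model.calls.first
```

The fake answers "does the rest of my code work against a storage layer?", while a recording double like this answers "did save talk to the storage layer the way I expected?".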
Switching my specs to use this fake is pretty darn easy. I just change my before hook to this:
before { User::Model.init }
Life is good.
Now for some overkill
Time passes. Specs are written, code is implemented to pass them. The application grows. Life is good.
Then one day the ops guy wakes up, finds the site going crazy slow, and sees that there are a couple hundred million users in the system. That’s a lot of rows. We’re gonna need a bigger database.
Migrating millions of rows to a new database is a pretty big headache. Even if it’s fancy and distributed. But, it turns out changing our code doesn’t have to tax our brains so much. Say, for example, we chose Cassandra:
require 'cassandra/0.7'
require 'active_support/core_ext/class'
class User::Model
  cattr_accessor :connection
  cattr_accessor :cf
  def self.create(name, city, url)
    generate_key.tap do |k|
      cols = {"name" => name, "city" => city, "url" => url}
      connection.insert(cf, k, cols)
    end
  end
  def self.generate_key
    SimpleUUID::UUID.new.to_guid
  end
  def self.fetch(key)
    cols = connection.get(cf, key)
    from_columns(cols)
  end
  def self.fetch_by_city(city)
    expression = connection.create_index_expression("city", city, "EQ")
    index_clause = connection.create_index_clause([expression])
    slices = connection.get_indexed_slices(cf, index_clause)
    cols = hash_from_slices(slices).values.first
    from_columns(cols)
  end
  def self.from_columns(cols)
    User.new.tap do |u|
      u.name = cols["name"]
      u.city = cols["city"]
      u.url = cols["url"]
    end
  end
  def self.hash_from_slices(slices)
    slices.inject({}) do |hsh, (k, columns)|
      column_hash = columns.inject({}) do |inner, col|
        column = col.column
        inner.update(column.name => column.value)
      end
      hsh.update(k => column_hash)
    end
  end
end
Not nearly as simple as the ActiveRecord example. But sometimes it’s about making hard problems possible even if they’re not mindless retyping. In this case, I had to implement ID/key generation for myself (Cassandra doesn’t implement any of that). I also had to do some cleverness to generate an indexed query and then to convert the hashes that Cassandra returns into my User model.
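Key generation is the piece that is easiest to sketch in isolation. The version below uses SecureRandom.uuid from Ruby's standard library, which returns a random (version 4) UUID, in place of the SimpleUUID call in the listing above. KeyedStore is a hypothetical in-memory stand-in for the column family, there only so the example runs without Cassandra.

```ruby
require 'securerandom'

# Hypothetical in-memory store that, like the Cassandra-backed model,
# has to mint its own keys because the storage layer will not.
class KeyedStore
  def initialize
    @rows = {}
  end

  # Generates a key, stores the columns under it, and returns the key.
  def insert(cols)
    generate_key.tap { |k| @rows[k] = cols }
  end

  def get(key)
    @rows[key]
  end

  private

  # Random UUID string such as "xxxxxxxx-xxxx-4xxx-xxxx-xxxxxxxxxxxx".
  def generate_key
    SecureRandom.uuid
  end
end
```

The create-then-return-the-key shape matches the generate_key.tap idiom in the listing; only the source of the key differs.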
But hey, look! I changed the whole underlying database without worrying too much about mucking with my domain models. I can dig that. Further, none of my specs need to know about Cassandra. I do need to test the interaction between Cassandra and the rest of my stack in an integration test, but that’s generally true of any kind of isolated testing.
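The listings in this post swap storage by redefining User::Model. A variation on the same idea, sketched here under my own assumptions rather than taken from the code above, is to make the backend an assignable attribute so it can be replaced at runtime. Account and HashBackend are hypothetical names.

```ruby
# Hypothetical domain object; the backend is assigned, not hard-coded.
class Account
  class << self
    attr_accessor :backend
  end

  attr_accessor :name, :city

  # Asks the current backend for attributes and builds a plain Account.
  def self.fetch(key)
    attrs = backend.fetch(key)
    new.tap do |a|
      a.name = attrs[:name]
      a.city = attrs[:city]
    end
  end

  # Delegates persistence to the current backend and returns its key.
  def save
    self.class.backend.create(:name => name, :city => city)
  end
end

# Hypothetical backend: a Hash keyed by an incrementing integer.
class HashBackend
  def initialize
    @rows = {}
  end

  def create(attrs)
    key = @rows.size + 1
    @rows[key] = attrs
    key
  end

  def fetch(key)
    @rows.fetch(key)
  end
end

Account.backend = HashBackend.new
```

Assigning a different object to Account.backend changes where records go without touching Account itself, which is the same decoupling described here with one less class to reopen.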
This has all happened before and it will all happen again
None of this is new. Data access layers have been a thing for a long time. Maybe institutional memory and/or scars have prevented us from bringing them over from Smalltalk, Java, or C#.
I’m just sayin’, as you think about how to tease your system apart into decoupled, cohesive, easy-to-test units, you should pause and consider the idea that pushing all your persistence needs down into an object you later delegate to can make your future self think highly of your present self.
[1] This ended up being a big mistake. I could have saved myself some pain, and our ops team even more pain, if I’d done an honest back-of-the-napkin calculation and stepped back for a few minutes to figure out a better angle on storage.