writings

October 9, 2011

Your frenemy, the ORM

When modeling how our domain objects map to what is stored in a database, an object-relational mapper often comes into the picture. And then, the angst begins. Bad queries are generated, weird object models evolve, junk-drawer objects emerge, cohesion goes down and coupling goes up.

It’s not that ORMs are a smell. They are genuinely useful things that make it easier for developers to go from an idea to a working, deployable prototype. But it’s easy to fall into the habit of treating them as a top-level concern in our applications.

Maybe that is the problem!

What if our domain models weren’t built out from the ORM? Some have suggested treating the ORM, and the persistence of our objects themselves, as mere implementation details. What might that look like?

Hide the ORM like you’re ashamed of it

Recently, I had the need to build an API for logging the progress of a data migration as we ran it over many million records, spitting out several new records for every input record. Said log ended up living in PostgreSQL[1].

Visions of decoupled grandeur in my head, I decided that my API should not leak its databaseness out to the user. I started off trying to make the API talk directly to the PostgreSQL driver, but I wasn’t making much progress down that road. Further, I found myself reinventing things I would get for free in ActiveRecord-land.

Instead, I took a principled plunge. I surrendered to using an AR model, but I kept it tucked away inside the class for my API. My API makes several calls into the AR model, but it never leaks that ARness out to users of the API.

I liked how this ended up. I was free to use AR’s functionality within the inner model. I can vary the API and the AR model independently. I can stub out, or completely replace the model implementation. It feels like I’m doing OO right.

Enough of the suspense, let’s see a hypothetical example

User model. Everyone has a name, a city, and a URL. We can all do this in our sleep, right?

I start by defining an API. Note that all it knows is that there is some object called Model that it delegates to.

class User
  attr_accessor :name, :city, :url

  def self.fetch(key)
    Model.fetch(key)
  end

  def self.fetch_by_city(key)
    Model.fetch_by_city(key)
  end

  def save
    Model.create(name, city, url)
  end

  def ==(other)
    name == other.name && city == other.city && url == other.url
  end

end

That’s a pretty straightforward Ruby class, eh? The RSpec examples for it aren’t elaborate either.

describe User do

  let(:name) { "Shauna McFunky" }
  let(:city) { "Chasteville" }
  let(:url) { "http://mcfunky.com" }

  let(:user) do
    User.new.tap do |u|
      u.name = name
      u.city = city
      u.url = url
    end
  end

  it "has a name, city, and URL" do
    user.name.should eq(name)
    user.city.should eq(city)
    user.url.should eq(url)
  end

  it "saves itself to a row" do
    key = user.save
    User.fetch(key).should eq(user)
  end

  it "supports lookup by city" do
    user.save
    User.fetch_by_city(user.city).should eq(user)
  end

end

Not much coupling going on here either. Coding in a blog post is full of beautiful idealism, isn’t it?

“Needs more realism”, says the critic. Obliged:

  class User::Model < ActiveRecord::Base
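    # table_name= is the assignment form of set_table_name.
    self.table_name = "users"

    # Return the new row's primary key rather than the record itself,
    # so callers of User#save never get handed an ActiveRecord object.
    def self.create(name, city, url)
      super(:name => name, :city => city, :url => url).id
    end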

    def self.fetch(key)
      from_model(find(key))
    end

    def self.fetch_by_city(city)
      from_model(where(:city => city).first)
    end

    def self.from_model(model)
      User.new.tap do |u|
        u.name = model.name
        u.city = model.city
        u.url = model.url
      end
    end

  end

Here’s the first implementation of an actual access layer for my user model. It’s coupled to the actual user model by names, but it’s free to map those names to database tables, indexes, and queries as it sees fit. If I’m clever, I might write a shared example group for the behavior of whatever implements create, fetch, and fetch_by_city in User::Model, but I’ll leave that as an exercise to the reader.
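If RSpec isn't handy, one dependency-free way to approximate that shared example group is a plain Ruby contract check that any storage layer can be run through. This is only a sketch: `ModelContract`, `HashModel`, and `SketchUser` are names made up for illustration, and `SketchUser` is a Struct standing in for the User class above.

```ruby
# Stand-in for the User class above; Struct gives us accessors and == for free.
SketchUser = Struct.new(:name, :city, :url)

# Hypothetical contract check: raises unless `model` behaves like a User::Model.
module ModelContract
  def self.verify!(model)
    key = model.create("Shauna McFunky", "Chasteville", "http://mcfunky.com")
    expected = SketchUser.new("Shauna McFunky", "Chasteville", "http://mcfunky.com")

    raise "create must return a key" if key.nil?
    raise "fetch must round-trip a saved user" unless model.fetch(key) == expected
    raise "fetch_by_city must find a saved user" unless model.fetch_by_city("Chasteville") == expected
    true
  end
end

# A throwaway hash-backed implementation, only here to exercise the check.
class HashModel
  def initialize
    @rows = {}
  end

  def create(name, city, url)
    key = @rows.size + 1
    @rows[key] = SketchUser.new(name, city, url)
    key
  end

  def fetch(key)
    @rows.fetch(key)
  end

  def fetch_by_city(city)
    @rows.values.find { |user| user.city == city }
  end
end

ModelContract.verify!(HashModel.new)
```

Any new storage layer would then only need to pass `verify!` before it gets swapped in behind the API.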

To hook my model up when I run RSpec, I add a moderately involved before hook:

  before(:all) do
    ActiveRecord::Base.establish_connection(
      :adapter => 'sqlite3',
      :database => ':memory:'
    )

    ActiveRecord::Schema.define do
      create_table :users do |t|
        t.string :name, :null => false
        t.string :city, :null => false
        t.string :url
      end
    end
  end

As far as I know, this is about as simple as it gets to bootstrap ActiveRecord outside of a Rails test. So it goes.

Let’s fake that out

Now I’ve got a working implementation. Yay! However, it would be nice if I didn’t need all that ActiveRecord stuff when I’m running isolated unit tests. Because my model and data access layer are decoupled, I can totally do that. Hold on to your pants:

require 'active_support/core_ext/class'

class User::Model
  cattr_accessor :users
  cattr_accessor :users_by_city

  def self.init
    self.users = {}
    self.users_by_city = {}
  end

  def self.create(name, city, url)
    # A simple counter avoids key collisions when two users are
    # created within the same second.
    key = users.size + 1
    hsh = {:name => name, :city => city, :url => url}
    users[key] = hsh
    users_by_city[city] = hsh
    key
  end

  def self.fetch(key)
    attrs = users[key]
    from_attrs(attrs)
  end

  def self.fetch_by_city(city)
    attrs = users_by_city[city]
    from_attrs(attrs)
  end

  def self.from_attrs(attrs)
    User.new.tap do |u|
      u.name = attrs[:name]
      u.city = attrs[:city]
      u.url = attrs[:url]
    end
  end

end

This “storage” layer is a bit more involved because I can’t lean on ActiveRecord to handle all the particulars for me. Specifically, I have to handle indexing the data in not one but two hashes. But it fits on one screen and it’s in memory, so I get fast tests at not too much overhead.

This is a classic test fake. It’s not the real implementation of the object; it’s just enough for me to hack out tests that need to interact with the storage layer. It doesn’t tell me whether I’m doing anything wrong like a mock or stub might. It just gives me some behavior to collaborate with.

Switching my specs to use this fake is pretty darn easy. I just change my before hook to this:

  before { User::Model.init }

Life is good.

Now for some overkill

Time passes. Specs are written, code is implemented to pass them. The application grows. Life is good.

Then one day the ops guy wakes up, finds the site going crazy slow, and sees that there are a couple hundred million users in the system. That’s a lot of rows. We’re gonna need a bigger database.

Migrating millions of rows to a new database is a pretty big headache. Even if it’s fancy and distributed. But, it turns out changing our code doesn’t have to tax our brains so much. Say, for example, we chose Cassandra:

require 'cassandra/0.7'
require 'active_support/core_ext/class'

class User::Model

  cattr_accessor :connection
  cattr_accessor :cf

  def self.create(name, city, url)
    generate_key.tap do |k|
      cols = {"name" => name, "city" => city, "url" => url}
      connection.insert(cf, k, cols)
    end
  end

  def self.generate_key
    SimpleUUID::UUID.new.to_guid
  end

  def self.fetch(key)
    cols = connection.get(cf, key)
    from_columns(cols)
  end

  def self.fetch_by_city(city)
    expression = connection.create_index_expression("city", city, "EQ")
    index_clause = connection.create_index_clause([expression])
    slices = connection.get_indexed_slices(cf, index_clause)
    cols = hash_from_slices(slices).values.first
    from_columns(cols)
  end

  def self.from_columns(cols)
    User.new.tap do |u|
      u.name = cols["name"]
      u.city = cols["city"]
      u.url = cols["url"]
    end
  end

  def self.hash_from_slices(slices)
    slices.inject({}) do |hsh, (k, columns)|
      column_hash = columns.inject({}) do |inner, col|
        column = col.column
        inner.update(column.name => column.value)
      end
      hsh.update(k => column_hash)
    end
  end
end

Not nearly as simple as the ActiveRecord example. But sometimes it’s about making hard problems possible even if they’re not mindless retyping. In this case, I had to implement ID/key generation for myself (Cassandra doesn’t implement any of that). I also had to do some cleverness to generate an indexed query and then to convert the hashes that Cassandra returns into my User model.
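To make the hash conversion less abstract, here is a small sketch of what hash_from_slices does to its input. The shape of the input is an assumption read off the code above: a hash from row key to a list of wrapper objects, each exposing a column with a name and a value. `Column` and `Wrapper` are hypothetical Structs standing in for whatever the client library actually returns.

```ruby
# Stand-ins for the objects the client hands back (assumed shape:
# each slice entry wraps a column that has a name and a value).
Column  = Struct.new(:name, :value)
Wrapper = Struct.new(:column)

# Same logic as hash_from_slices above, pulled out as a plain method.
def hash_from_slices(slices)
  slices.inject({}) do |hsh, (key, columns)|
    column_hash = columns.inject({}) do |inner, col|
      inner.update(col.column.name => col.column.value)
    end
    hsh.update(key => column_hash)
  end
end

slices = {
  "row-1" => [
    Wrapper.new(Column.new("name", "Shauna McFunky")),
    Wrapper.new(Column.new("city", "Chasteville")),
    Wrapper.new(Column.new("url",  "http://mcfunky.com"))
  ]
}

rows = hash_from_slices(slices)
# rows maps "row-1" to a plain hash of column name => column value,
# which is exactly what from_columns expects to receive.
```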

But hey, look! I changed the whole underlying database without worrying too much about mucking with my domain models. I can dig that. Further, none of my specs need to know about Cassandra. I do need to test the interaction between Cassandra and the rest of my stack in an integration test, but that’s generally true of any kind of isolated testing.

This has all happened before and it will all happen again

None of this is new. Data access layers have been a thing for a long time. Maybe institutional memory and/or scars have prevented us from bringing them over from Smalltalk, Java, or C#.

I’m just sayin’, as you think about how to tease your system apart into decoupled, cohesive, easy-to-test units, you should pause and consider the idea that pushing all your persistence needs down into an object you later delegate to can make your future self think highly of your present self.

[1] This ended up being a big mistake. I could have saved myself some pain, and our ops team even more pain, if I’d done an honest back-of-the-napkin calculation and stepped back for a few minutes to figure out a better angle on storage.

August 31, 2011

Relentless Shipping

Relentless Quality is a great piece. We should all strive to make really fantastic stuff. But I think there’s a nuance worth observing here:

Sharpen the edges, polish the surface and make it shine.

I’m afraid that some people are going to read more than Kneath intends here. Quality does not mean perfection. Perfection is the enemy of shipping. Quality is useless if it doesn’t ship. Quality is not an excuse for not shipping.

Quality is a subjective, amorphous thing. To you, it means the fit and finish. To me, it means that all the bugs have been eliminated and possible bugs thought about and excised. Even to Christopher Alexander, quality isn’t nailed down; he refers to good buildings as possessing the “quality without a name”.

To wit, this shortcoming is pointed out in the original essay:

Move fast and break things, then move fast and fix it. Ship early, ship often, sacrificing features, never quality.

Scope and quality are sometimes at odds. Schedules and quality are sometimes at odds. There may come a time when you have to decide between shipping, maintaining quality, and including all the features.

The great thing about shipping is that if you can do it often enough, these problems of slipping features or making sacrifices in quality can fade away. If you can ship quickly, you can build features out, test them, and put that quality on them in an iterative fashion. Shipping can’t cure all ills, but it can ease many of them.

Kneath is urging you to maintain quality; I’m urging you to ship some acceptable value of quality and then iterate to make it amazing. Relent on quality, if you must, so you can ship relentlessly.

July 16, 2011

The guy doing the typing makes the call

Everyone brings a unique perspective to a team. Each person has learned from successes and failures. There is a spectrum of things that are highly valued and that are strongly avoided, and each team member is a different point on that spectrum.

It’s easy to bikeshed decisions. Everyone should feel free to share their ideas if they have something useful and constructive to contribute. High-functioning teams share assets and liabilities, so naturally they should share and discuss ideas.

That said, teams don’t exist for rhetorical indulgence. They exist to get shit done. Teams have to get all the ideas on the floor, decide what is practical, and move on to the next thing.

If there isn’t an outstanding consensus, the tie breaker is simple: the person who ends up doing the work makes the call. That’s not to say they should go cowboy and do whatever they want; they should use their knowledge of the “situation on the ground” to figure out what is most practical. With responsibility comes the right to pick a resolution.

It’s worth repeating: the guy doing the typing makes the decision.

July 9, 2011

How to listen to Stravinsky's Rite of Spring

Igor Stravinsky’s The Rite of Spring is an amazing piece of classical music. It’s one of the rare pieces that was really revolutionary in its time. But in our time, almost one hundred years on, it doesn’t sound that different.

Music has moved on. We are used to the odd times of “Take Five” and the dissonant horns of a John Williams soundtrack. Music offending the status quo is nothing unheard of.

To enjoy Rite of Spring in its proper context, you have to forget all that. Put yourself in the shoes of a Parisian in 1913, probably well off. You probably just enjoyed a Monet and a coffee. But your world is changing. Something about workers revolting. A transition from manual labor to mechanical labor.

Now imagine yourself at the premiere for this new ballet from Russia. You being a Parisian, you’re probably expecting something along the lines of Debussy or perhaps Berlioz.

Instead, you get mild dissonance and then total chaos. The changing time signatures, the dissonance, the subject of virgin sacrifice. You’d probably riot too!

July 2, 2011

Skip the hyperbole

Hyperbole is a tricky thing. In a joke, it works great. It’s the foundation of a tall tale (TO BRASKY!). But in a conversation of ideas, it can backfire.

The trick about humans is that we rarely know exactly what the humans around us are thinking. Do they agree with what I’m saying? Are my jokes bombing? Is this presentation interesting or is the audience playing on their phones?

So the trick with hyperbole is that I might make an exaggerated statement to move things along. But the other people in the conversation might think I actually mean what I said. Maybe they understand the thought behind the hyperbole, but maybe I end up unintentionally derailing the conversation. More times than I can remember, I’ve said something bold to move things along and it totally backfired. Hyperbole backfired.

Nothing beats concise language.

June 12, 2011

Locking and how did I get here?

I've got a bunch of browser tabs open. This is unusual; I try to have zero open. Except right now. I'm digging into something. I'm spreading ephemeral papers around on my ephemeral desk and trying to make a concept, not ephemeral, at least in my head.

It all started with locking. It's a hard concept, but some programs need it. In particular, applications running across multiple machines connected by imperfect software and unreliable networks need it. And this sort of thing ends up being difficult to get right.

I've poked around with this before. Reading the code of some libraries that are implementing locking in a way that might come in handy to me, I check out some documentation that I've seen referenced a couple times. Redis' setnx command can function as a useful primitive for implementing locks. It turns out getset is pretty interesting too. Ohm, redis-objects and adapter-redis all implement locking using a combination of those two primitives. Then I start to dig deeper into Ohm; there's some interesting stuff here. Activity feeds with Ohm is relevant to my interests. I've got a thing for persistence tools that enumerate their philosophy. Nest seems like a useful set of concepts too.
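For the curious, here is roughly how those two primitives combine into a lock with an expiring lease. It's a sketch under loud assumptions: `MemoryStore` is a made-up in-memory stand-in that only mimics setnx, get, getset, and del, `ExpiringLock` is a hypothetical name, and none of this is lifted from Ohm, redis-objects, or adapter-redis. Timestamps are passed in explicitly so the behavior is easy to follow.

```ruby
# In-memory stand-in for the handful of commands the lock needs.
# (Hypothetical class; a real client would talk to a server instead.)
class MemoryStore
  def initialize
    @data = {}
    @mutex = Mutex.new
  end

  # Set key only if it is absent; true when the write happened.
  def setnx(key, value)
    @mutex.synchronize do
      next false if @data.key?(key)
      @data[key] = value
      true
    end
  end

  def get(key)
    @mutex.synchronize { @data[key] }
  end

  # Write the new value and return whatever was there before.
  def getset(key, value)
    @mutex.synchronize do
      old = @data[key]
      @data[key] = value
      old
    end
  end

  def del(key)
    @mutex.synchronize { @data.delete(key) }
  end
end

# The lock value is the time at which the holder's lease expires.
class ExpiringLock
  def initialize(store, key, timeout)
    @store, @key, @timeout = store, key, timeout
  end

  # Returns true if we now hold the lock, false otherwise.
  def try_acquire(now = Time.now.to_f)
    expires_at = now + @timeout
    return true if @store.setnx(@key, expires_at)

    current = @store.get(@key)
    # Still held by someone else (or released a moment ago; try again later).
    return false if current.nil? || current > now

    # The lease ran out. getset tells us whether we won the race to
    # replace it: only the first caller gets the stale timestamp back.
    previous = @store.getset(@key, expires_at)
    previous.nil? || previous <= now
  end

  # Naive release: a holder whose lease already expired could delete
  # a lock that now belongs to someone else.
  def release
    @store.del(@key)
  end
end

store  = MemoryStore.new
first  = ExpiringLock.new(store, "lock.migration", 10)
second = ExpiringLock.new(store, "lock.migration", 10)

first.try_acquire(100.0)   # true: nobody held the lock
second.try_acquire(105.0)  # false: first's lease runs until 110.0
second.try_acquire(111.0)  # true: the lease expired, so second takes over
```

The setnx call covers the common case, and getset only matters when cleaning up after a holder that died without releasing.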

I'm mentally wandering here. Let's rewind back to what I'm really after: a way to do locking in Cassandra. There's a blog post I came across before on doing critical sections in Cassandra, but it uses ZooKeeper, so that's cheating. Then I get distracted by a thing on HBase vs. Cassandra and another perspective on Cassandra that mentions but does not really focus on locking.

And then, paydirt. A wiki page on locking in Cassandra. It may be a little rough, and might not even work, but it's worth playing with. Turns out it's an adaptation of an algorithm devised by Leslie Lamport for implementing locking with atomic primitives. It uses a bakery as an analogy. Neat.
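The bakery idea is easier to see in its classic shared-memory form than in a database. What follows is a sketch of Lamport's bakery algorithm for threads inside one Ruby process, not the Cassandra adaptation from the wiki page. `BakeryLock` is a hypothetical name, and the sketch assumes reads and writes to the shared arrays become visible to other threads in order, which holds under MRI's global lock but may not on other Ruby implementations.

```ruby
# Lamport's bakery algorithm for a fixed number of workers, using only
# plain reads and writes to shared arrays (no Mutex).
class BakeryLock
  def initialize(workers)
    @choosing = Array.new(workers, false)
    @ticket   = Array.new(workers, 0)
  end

  def lock(id)
    # Take a ticket one higher than any ticket currently held.
    @choosing[id] = true
    @ticket[id] = @ticket.max + 1
    @choosing[id] = false

    @ticket.each_index do |other|
      next if other == id
      # Wait until `other` has finished picking its ticket.
      Thread.pass while @choosing[other]
      # Wait while `other` is ahead of us in line.
      Thread.pass while ahead_of?(other, id)
    end
  end

  def unlock(id)
    @ticket[id] = 0
  end

  private

  # `other` goes first if it holds a ticket with a lower number;
  # ties on the number are broken by the lower worker id.
  def ahead_of?(other, id)
    theirs = @ticket[other]
    theirs != 0 && ([theirs, other] <=> [@ticket[id], id]) < 0
  end
end

lock = BakeryLock.new(3)
counter = 0

threads = 3.times.map do |id|
  Thread.new do
    20.times do
      lock.lock(id)
      current = counter
      Thread.pass            # invite a race; the lock should prevent it
      counter = current + 1
      lock.unlock(id)
    end
  end
end
threads.each(&:join)
# If mutual exclusion held, no increment was lost and counter is 60.
```

Each worker takes a number and waits for everyone holding a lower number, just like customers at a bakery counter.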

Then I get really distracted again. I remember doozer, a distributed consensus gizmo developed by Blake Mizerany at Heroku. I get to reading its documentation and come across the protocol spec, which has an intriguing link to a Plan 9 manpage on the Plan 9 File Protocol. That somehow drives me to ponder serialization and read about TNetstrings.

At this point, my cup overfloweth. I've got locking, distributed consensus, serialization, protocols, and philosophies all on my mind. Lots of fun intellectual fodder, but I'll get nowhere if I don't stick my nose into one of them exclusively and really try to figure out what it's about. So I do. Fin.

May 4, 2011

Post-hoc career advice for twenty-something Adam

No program was ever made better by one developer scoffing at another. Computer science does not move forward with condescending attitudes. Success in software isn’t the result of looking down your nose or wagging your finger at others.

And yet, if you observe from the outside, you’d think that we all live in a wacky world of wonks, one where it’s not the facts, but how violently you repeat your talking points that matters the most. The JavaScript guys do this in 2011, the Ruby guys did it in 2005, the .NET people before that in 2002, and on down the line.

Civility isn’t always what gets you noticed, but if you don’t have an outsized ability to focus on technical problems for tens of hours, it sure helps. You’re not the most brilliant developer on the planet, but you like to make people laugh, and you like to hang around those who are smarter than you. That’s not the recipe for a solid career in programming, but it’s a good bridge to get you from the journeyman side of the river over to the side where people think you might know what you’re doing.

Once you reach the other side, it’s a matter of putting in the hours, doing the practice, learning things, and always challenging yourself. Work with the smartest people you can, push yourself to make something better every day. Grind on that enough and you’ll get to the point where you really know what you’re doing.

Then, you close the loop. You were civil, you didn’t piss too many people off. They are eager to hear about the awesome and exciting things you did. So tell them. Even if you don’t think it’s all that awesome, some will know that you’ve got the awesome in you and that it will come out eventually. Some of them aren’t your mom!

This is what some call a successful career. It’s not so bad, but it’s not exactly the extravagant lifestyle you imagined when you were twenty. On the plus side, you do roughly the same things on a daily basis as you did back then, which isn’t so bad. Being an adult turns out to be pretty alright.

At some point, you write this advice to yourself on your weblog, except in the second person. Hopefully someone younger, perhaps on the precipice of idolizing a brilliant asshole, will read it and take a more civil path. Maybe you’ll get to work with them someday. Let’s hope it’s not too awkward.