Your frienemy, the ORM
When modeling how our domain objects map to what is stored in a database, an object-relational mapper often comes into the picture. And then, the angst begins. Bad queries are generated, weird object models evolve, junk-drawer objects emerge, cohesion goes down and coupling goes up.
It’s not that ORMs are a smell. They are genuinely useful things that make it easier for developers to go from an idea to a working, deployable prototype. But it’s easy to fall into the habit of treating them as a top-level concern in our applications.
Maybe that is the problem!
What if our domain models weren’t built out from the ORM? Some have suggested treating the ORM, and the persistence of our objects themselves, as mere implementation details. What might that look like?
Hide the ORM like you’re ashamed of it
Recently, I needed to build an API for logging the progress of a data migration as we ran it over many millions of records, spitting out several new records for every input record. Said log ended up living in PostgreSQL.[1]
Visions of decoupled grandeur in my head, I decided that my API should not leak its databaseness out to the user. I started off trying to make the API talk directly to the PostgreSQL driver, but I wasn’t making much progress down that road. Further, I found myself reinventing things I would get for free in ActiveRecord-land.
Instead, I took a principled plunge. I surrendered to using an AR model, but I kept it tucked away inside the class for my API. My API makes several calls into the AR model, but it never leaks that ARness out to users of the API.
I liked how this ended up. I was free to use AR’s functionality within the inner model. I can vary the API and the AR model independently. I can stub out, or completely replace the model implementation. It feels like I’m doing OO right.
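To make that shape concrete, here is a minimal sketch. The names MigrationLog, Store, record, and entries_for are hypothetical (they are not taken from the real migration API), and a small in-memory class stands in for the ActiveRecord model so the example runs on its own.

```ruby
# Hypothetical facade: callers see plain Ruby values, never storage objects.
class MigrationLog
  # Stand-in for the inner ActiveRecord model; an assumption for illustration.
  class Store
    def self.rows
      @rows ||= []
    end

    def self.insert(attrs)
      rows << attrs
      rows.size # pretend primary key
    end

    def self.where_source(source_id)
      rows.select { |row| row[:source_id] == source_id }
    end
  end

  # Records one output row produced for an input row and returns its key.
  def self.record(source_id, output_id)
    Store.insert(:source_id => source_id, :output_id => output_id)
  end

  # Returns plain output ids, so nothing storage-specific leaks out.
  def self.entries_for(source_id)
    Store.where_source(source_id).map { |row| row[:output_id] }
  end
end

MigrationLog.record(1, "a")
MigrationLog.record(1, "b")
MigrationLog.record(2, "c")
MigrationLog.entries_for(1) # => ["a", "b"]
```

Because callers only ever touch record and entries_for, the Store class could be swapped for a real model without changing them.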
Enough of the suspense, let’s see a hypothetical example
Consider a User model. Everyone has a name, a city, and a URL. We can all do this in our sleep, right?
I start by defining an API. Note that all it knows is that there is some object called Model that it delegates to.
class User
  attr_accessor :name, :city, :url

  def self.fetch(key)
    Model.fetch(key)
  end

  def self.fetch_by_city(key)
    Model.fetch_by_city(key)
  end

  def save
    Model.create(name, city, url)
  end

  def ==(other)
    name == other.name && city == other.city && url == other.url
  end
end
That’s a pretty straightforward Ruby class, eh? The RSpec examples for it aren’t elaborate either.
describe User do
  let(:name) { "Shauna McFunky" }
  let(:city) { "Chasteville" }
  let(:url) { "http://mcfunky.com" }
  let(:user) do
    User.new.tap do |u|
      u.name = name
      u.city = city
      u.url = url
    end
  end

  it "has a name, city, and URL" do
    user.name.should eq(name)
    user.city.should eq(city)
    user.url.should eq(url)
  end

  it "saves itself to a row" do
    key = user.save
    User.fetch(key).should eq(user)
  end

  it "supports lookup by city" do
    user.save
    User.fetch_by_city(user.city).should eq(user)
  end
end
Not much coupling going on here either. Coding in a blog post is full of beautiful idealism, isn’t it?
“Needs more realism”, says the critic. Obliged:
class User::Model < ActiveRecord::Base
  self.table_name = "users"

  # Returns the new row's id so callers can pass it straight to fetch.
  def self.create(name, city, url)
    super(:name => name, :city => city, :url => url).id
  end

  def self.fetch(key)
    from_model(find(key))
  end

  def self.fetch_by_city(city)
    from_model(where(:city => city).first)
  end

  # Copies the row's columns onto a plain User so no ActiveRecord object escapes.
  def self.from_model(model)
    User.new.tap do |u|
      u.name = model.name
      u.city = model.city
      u.url = model.url
    end
  end
end
Here’s the first implementation of an actual access layer for my user model. It’s coupled to the actual user model by names, but it’s free to map those names to database tables, indexes, and queries as it sees fit. If I’m clever, I might write a shared example group for the behavior of whatever implements create, fetch, and fetch_by_city in User::Model, but I’ll leave that as an exercise for the reader.
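One way to approach that exercise is sketched here in plain Ruby rather than as an RSpec shared example group, so that it runs without any gems. StorageContract, PlainUser, and HashModel are hypothetical names; the check exercises only create, fetch, and fetch_by_city as described above.

```ruby
# Hypothetical value object standing in for User.
PlainUser = Struct.new(:name, :city, :url)

# Hypothetical in-memory backend implementing the three-method contract.
class HashModel
  def self.reset
    @rows = {}
    @next_key = 0
  end

  def self.create(name, city, url)
    @next_key += 1
    @rows[@next_key] = PlainUser.new(name, city, url)
    @next_key
  end

  def self.fetch(key)
    @rows.fetch(key)
  end

  def self.fetch_by_city(city)
    @rows.values.find { |user| user.city == city }
  end
end

# Runs the same checks against any backend and returns a list of failures.
module StorageContract
  def self.failures_for(model)
    failures = []
    key = model.create("Shauna McFunky", "Chasteville", "http://mcfunky.com")
    by_key = model.fetch(key)
    by_city = model.fetch_by_city("Chasteville")
    failures << "fetch lost the name" unless by_key.name == "Shauna McFunky"
    failures << "fetch lost the URL" unless by_key.url == "http://mcfunky.com"
    failures << "fetch_by_city disagrees with fetch" unless by_city == by_key
    failures
  end
end

HashModel.reset
StorageContract.failures_for(HashModel) # => []
```

The same failures_for call could then be pointed at each storage implementation in turn, which is roughly the job a shared example group would do inside RSpec.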
To hook my model up when I run RSpec, I add a moderately involved before hook:
before(:all) do
  ActiveRecord::Base.establish_connection(
    :adapter => 'sqlite3',
    :database => ':memory:'
  )

  ActiveRecord::Schema.define do
    create_table :users do |t|
      t.string :name, :null => false
      t.string :city, :null => false
      t.string :url
    end
  end
end
As far as I know, this is about as simple as it gets to bootstrap ActiveRecord outside of a Rails test. So it goes.
Let’s fake that out
Now I’ve got a working implementation. Yay! However, it would be nice if I didn’t need all that ActiveRecord stuff when I’m running isolated unit tests. Because my model and data access layer are decoupled, I can totally do that. Hold on to your pants:
require 'active_support/core_ext/class'

class User::Model
  cattr_accessor :users
  cattr_accessor :users_by_city

  def self.init
    self.users = {}
    self.users_by_city = {}
  end

  def self.create(name, city, url)
    # Note: one-second resolution, so two creates in the same second share a key.
    key = Time.now.tv_sec
    hsh = {:name => name, :city => city, :url => url}
    users[key] = hsh
    # Note: keeps only the most recently created user for each city.
    users_by_city[city] = hsh
    key
  end

  def self.fetch(key)
    attrs = users[key]
    from_attrs(attrs)
  end

  def self.fetch_by_city(city)
    attrs = users_by_city[city]
    from_attrs(attrs)
  end

  def self.from_attrs(attrs)
    User.new.tap do |u|
      u.name = attrs[:name]
      u.city = attrs[:city]
      u.url = attrs[:url]
    end
  end
end
This “storage” layer is a bit more involved because I can’t lean on ActiveRecord to handle all the particulars for me. Specifically, I have to handle indexing the data in not one but two hashes. But it fits on one screen and it’s in memory, so I get fast tests without too much overhead.
This is a classic test fake. It’s not the real implementation of the object; it’s just enough for me to hack out tests that need to interact with the storage layer. It doesn’t tell me whether I’m doing anything wrong like a mock or stub might. It just gives me some behavior to collaborate with.
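Two shortcuts in the fake are worth knowing about: Time.now.tv_sec can hand out the same key twice within one second, and users_by_city keeps only the latest user per city. If a spec ever trips over either, a variation along these lines would avoid both. It is a sketch only: FakeStore and fetch_all_by_city are hypothetical names, and it returns plain hashes rather than User objects to stay self-contained.

```ruby
# Hypothetical fake with collision-free keys and a multi-valued city index.
class FakeStore
  def initialize
    @rows = {}
    @keys_by_city = Hash.new { |hash, city| hash[city] = [] }
    @next_key = 0
  end

  def create(name, city, url)
    key = (@next_key += 1) # a counter never repeats within a run
    @rows[key] = {:name => name, :city => city, :url => url}
    @keys_by_city[city] << key
    key
  end

  def fetch(key)
    @rows[key]
  end

  # Returns every user in the city, in insertion order.
  def fetch_all_by_city(city)
    @keys_by_city.fetch(city, []).map { |key| @rows[key] }
  end
end

store = FakeStore.new
store.create("Shauna McFunky", "Chasteville", "http://mcfunky.com")
store.create("Hypothetical Neighbor", "Chasteville", nil)
store.fetch_all_by_city("Chasteville").size # => 2
```

The second hash now maps a city to a list of keys instead of to a single row, which is closer to what a database index does.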
Switching my specs to use this fake is pretty darn easy. I just change my before hook to this:
before { User::Model.init }
Life is good.
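With the setup so far, the swap happens by loading a different definition of User::Model. Another option, sketched here under assumptions, is to make the backend an explicit setting that a spec can choose. Account, Account.backend, and MemoryBackend are hypothetical names and not part of the code above.

```ruby
# Hypothetical domain class whose storage backend can be swapped at runtime.
class Account
  class << self
    attr_accessor :backend # any object responding to create and fetch
  end

  attr_accessor :name

  def initialize(name = nil)
    @name = name
  end

  def save
    self.class.backend.create(name)
  end

  def self.fetch(key)
    new(backend.fetch(key))
  end
end

# Hypothetical in-memory backend for isolated specs.
class MemoryBackend
  def initialize
    @names = []
  end

  def create(name)
    @names << name
    @names.size - 1 # the array index doubles as the key
  end

  def fetch(key)
    @names.fetch(key)
  end
end

Account.backend = MemoryBackend.new
key = Account.new("Shauna McFunky").save
Account.fetch(key).name # => "Shauna McFunky"
```

Assigning a fresh MemoryBackend in a before hook would give each example a clean slate, much as User::Model.init does for the fake.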
Now for some overkill
Time passes. Specs are written, code is implemented to pass them. The application grows. Life is good.
Then one day the ops guy wakes up, finds the site going crazy slow, and sees that there are a couple hundred million users in the system. That’s a lot of rows. We’re gonna need a bigger database.
Migrating millions of rows to a new database is a pretty big headache. Even if it’s fancy and distributed. But, it turns out changing our code doesn’t have to tax our brains so much. Say, for example, we chose Cassandra:
require 'cassandra/0.7'
require 'active_support/core_ext/class'

class User::Model
  cattr_accessor :connection
  cattr_accessor :cf

  def self.create(name, city, url)
    generate_key.tap do |k|
      cols = {"name" => name, "city" => city, "url" => url}
      connection.insert(cf, k, cols)
    end
  end

  def self.generate_key
    SimpleUUID::UUID.new.to_guid
  end

  def self.fetch(key)
    cols = connection.get(cf, key)
    from_columns(cols)
  end

  def self.fetch_by_city(city)
    expression = connection.create_index_expression("city", city, "EQ")
    index_clause = connection.create_index_clause([expression])
    slices = connection.get_indexed_slices(cf, index_clause)
    cols = hash_from_slices(slices).values.first
    from_columns(cols)
  end

  def self.from_columns(cols)
    User.new.tap do |u|
      u.name = cols["name"]
      u.city = cols["city"]
      u.url = cols["url"]
    end
  end

  def self.hash_from_slices(slices)
    slices.inject({}) do |hsh, (k, columns)|
      column_hash = columns.inject({}) do |inner, col|
        column = col.column
        inner.update(column.name => column.value)
      end
      hsh.update(k => column_hash)
    end
  end
end
Not nearly as simple as the ActiveRecord example. But sometimes it’s about making hard problems possible even if they’re not mindless retyping. In this case, I had to implement ID/key generation for myself (Cassandra doesn’t implement any of that). I also had to apply some cleverness to generate an indexed query and then to convert the hashes that Cassandra returns into my User model.
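Both of those chores can be tried without a running cluster. In this sketch, SecureRandom.uuid from Ruby's standard library stands in for SimpleUUID, and FakeColumn and FakeSlice are hypothetical structs that mimic only the col.column.name and col.column.value calls the conversion code makes; the real objects returned by the client carry more than that.

```ruby
require 'securerandom'

# Hypothetical stand-ins for the objects get_indexed_slices yields.
FakeColumn = Struct.new(:name, :value)
FakeSlice = Struct.new(:column)

# Client-side key generation; SecureRandom is a stdlib substitute here.
def generate_key
  SecureRandom.uuid
end

# Same fold as hash_from_slices: {row_key => [slice, ...]} becomes
# {row_key => {column_name => value}}.
def hash_from_slices(slices)
  slices.each_with_object({}) do |(key, columns), rows|
    rows[key] = columns.each_with_object({}) do |col, row|
      row[col.column.name] = col.column.value
    end
  end
end

key = generate_key
slices = {
  key => [
    FakeSlice.new(FakeColumn.new("name", "Shauna McFunky")),
    FakeSlice.new(FakeColumn.new("city", "Chasteville"))
  ]
}
hash_from_slices(slices)[key]["city"] # => "Chasteville"
```

Generating the key on the client means create can return it without a round trip to ask the database what it assigned.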
But hey, look! I changed the whole underlying database without worrying too much about mucking with my domain models. I can dig that. Further, none of my specs need to know about Cassandra. I do need to test the interaction between Cassandra and the rest of my stack in an integration test, but that’s generally true of any kind of isolated testing.
This has all happened before and it will all happen again
None of this is new. Data access layers have been a thing for a long time. Maybe institutional memory and/or scars have prevented us from bringing them over from Smalltalk, Java, or C#.
I’m just sayin’, as you think about how to tease your system apart into decoupled, cohesive, easy-to-test units, you should pause and consider the idea that pushing all your persistence needs down into an object you later delegate to can make your future self think highly of your present self.
[1] This ended up being a big mistake. I could have saved myself some pain, and our ops team even more pain, if I’d done an honest back-of-the-napkin calculation and stepped back for a few minutes to figure out a better angle on storage.