
September 1, 2012

My inner dialog while coding

I’m a bit of a sailor when I’m wrangling my own creations.

August 31, 2012

Hello, you beautiful fixed-width font

Pitch. Not quite a programmer’s font, but holy cow is it gorgeous.

I love the thought put into this type; the creator actually tried to recreate the artifacts of type created by physically striking paper. It turned out that detracted from the font, but it’s delightful that he went that deep in considering what a fixed-width font should feel like.

The history of fixed-width, typewriter-esque fonts is fantastic too. Even if you’re not as typography-curious as I am, you should read the whole thing and not just look at the lovely specimens.

August 17, 2012

Futures, Features, and the Enterprise-D

A future is a financial instrument (a thing you invest in) where you commit to paying a price today to receive something tomorrow. The price could go up or down tomorrow, but you’re locked into today’s price. Price goes up, you profit; price goes down, you eat the difference.

A feature is a thing that software does. For our purposes, we’ll say it’s also work that enables a feature: setting up CI, writing tests, refactoring code, adding documentation, etc. The general idea behind software development is that you should gain more time using a feature than the time you spent implementing it.

The Enterprise-D is a fictional spaceship in the Star Trek: The Next Generation universe. It can split into two spaceships and is pretty well armed for a ship with an exploratory mission.

Today, Geordi and Worf (middle management) are recalibrating the forward sensor array. It takes them most of the day, but they get the job done. Captain Picard is studying ancient pan-flutes of the iron-age Vulcan era. Data (an android), as an experiment on his positronic net, is trying to learn how to tell an Aristocrat joke.

Tomorrow, in a series of events no one could predict, our friends find themselves in a tense situation with a Romulan Bird of Prey. Luckily, Worf detected it minutes before it decloaked, thanks to the work he and Geordi had performed the day before. This particular Bird of Prey is carrying ancient Romulan artifacts dating back to their own iron age. Amazingly, Picard saves the day by translating the artifacts’ inscriptions, which aren’t too different from those on Vulcan pan-flutes, and prevents an ancient doomsday weapon from consuming the Bird of Prey and Enterprise alike.

Data’s Aristocrat joke is never used. That’s good, because this is a family show.

Our friends on the Enterprise are savvy investors who look at their efforts in terms of risk and reward. They each invest time today into an activity (an instrument, in financial terms) which they may or may not use tomorrow. We can say that if they end up using the instrument, it pays off. We can then measure the pay-off of that instrument by assigning a value to the utility of that instrument. If the value of the instrument exceeds the time they invested in “acquiring” it, there is a profit.

Geordi and Worf’s investment was clearly a profit-bearing endeavor. Few other uses of their time, such as aligning the warp crystals or practicing Klingon combat moves, could have detected an invisible ship before it uninvisibles itself. In Wall Street terms, Geordi and Worf are getting the fat bonus and bottle of Bollinger champagne.

Picard’s investment seems less clear cut. It did come in handy in this particular case, but it probably wasn’t the only activity that would have saved the day. He could have belted out some Shakespeare or delegated to one of his officers to reconfigure the deflector dish. We’ll mark Picard as even for the day.

Data totally blew this one. His Aristocrat joke went unused. Even if he had used it, the best outcome would be that it’s a lame, sterile groaner that only ends up on a DVD extras reel. Data is in the red.

In terms of futures, we can say that the price of working on the forward sensor array went up, the price of pan-flute research was largely unchanged, and the price of Aristocrat jokes plummeted. Our friends on the Enterprise implicitly decided what risks are the most important to them and hedged against three of them. Some of them even came out ahead!

I’m working on software. Today, I can choose to do things on that software. I could 1) start on adding a new feature, 2) shore up the test suite, or 3) get CI set up and all green. Respectively, these are futures addressing 1) the risk of losing money due to missing functionality, 2) the risk of losing money because adding features takes too long to get right, or 3) the risk of losing money because things are broken or not communicated in a timely manner.

Like our Enterprise episode, it’s hard to value these futures. If I deliver the feature tomorrow and it generates more money than the time I put into implementing, testing, and deploying the code, we’re looking at a clear profit. Revenue minus expense equals profit, grossly speaking.

Shoring up the test suite might make another feature easier to implement. It might give me the confidence to move code around to make room for that feature. It could tell me when I’ve broken some code, or when some code is poorly designed and holding me back. But these values are super hard to quantify. Did I save two hours on some feature because I spent one hour on the test suite yesterday? Tricky question!

Chore-ish tasks, like standing up a CI server or centralizing logs, are even harder to quantify. Either one of these tasks could save hours and days of wasted time due to missed communication or troubleshooting an opaque system. Or, they might not pay off at all for weeks and months.

I’m going to start writing down what I worked on every day, guess how many hours I spent on it, and then revisit each task weekly or monthly to guess if it paid out. Maybe I’ll develop an intuition for risk and reward for the things I work on. Maybe I’ll just end up with a mess of numbers. Almost certainly, I will seem pretty bookish and weird for tracking these sorts of things.
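If you want to play along, the bookkeeping could be as simple as the sketch below. It is a hypothetical illustration, not a tool from this post: Investment and Ledger are made-up names, the numbers are invented, and it assumes both the cost and the pay-off are measured in hours, so profit is just hours paid back minus hours spent.

```ruby
# A hypothetical ledger for the futures experiment; Investment and Ledger
# are illustrative names, not part of any library.
Investment = Struct.new(:date, :task, :hours_spent, :paid_out_hours) do
  # Hours the task paid back minus hours it cost; nil until it's been revisited.
  def profit
    paid_out_hours && paid_out_hours - hours_spent
  end
end

class Ledger
  def initialize
    @entries = []
  end

  # Daily: write down what you worked on and guess how long it took.
  def record(date, task, hours_spent)
    @entries << Investment.new(date, task, hours_spent, nil)
  end

  # Weekly or monthly: guess how many hours the task has paid back so far.
  def revisit(task, paid_out_hours)
    @entries.each { |e| e.paid_out_hours = paid_out_hours if e.task == task }
  end

  # Total profit (or loss, if negative) across every revisited task.
  def net_profit
    @entries.map(&:profit).compact.sum
  end

  # Tasks that haven't been revisited yet.
  def unrevisited
    @entries.select { |e| e.paid_out_hours.nil? }.map(&:task)
  end
end

ledger = Ledger.new
ledger.record("2012-08-17", "shore up the test suite", 1.0)
ledger.record("2012-08-17", "stand up a CI server", 3.0)
ledger.revisit("shore up the test suite", 2.0)
puts ledger.net_profit   # prints 1.0
p ledger.unrevisited     # prints ["stand up a CI server"]
```

Unrevisited tasks are left out of the profit total rather than counted as losses, since a chore like a CI server might not pay off for weeks.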

You should look bookish and weird too. Let me know what you find. I’ll write up whatever we figure out. Maybe there’s something to this whole “finance” thing besides nearly wrecking the global economy!

July 17, 2012

A romantic comedy: OO and FP

My magic ball predicts that OO and FP are going to take something of a “romantic comedy” path of evolution.

Act I. OO and FP are introduced at dinner parties. They could not seem more dissimilar, and hilarious arguments ensue. No one goes home together. Despite the initial miss, the end of Act I finds OO and FP separately telling friends that they want the same things.

Act II. OO and FP run into each other at the coffee shop, and then again at the gym. OO is reading a book on ideas that FP loves. One of their friends invites them both to a bar, they get a little sauced and end up making out a bit. OO starts wearing FP’s jacket around town, even finding it a little comfortable. Towards the end of Act II, OO and FP are a bona fide thing, both borrowing ideas from each other. It’s pretty cute.

Act III. Open with a fight between OO and FP. It seems they just can’t come to agree on some important topic like mutability or the nature of behavior and state. Unfortunate and emotional words are uttered. The internet is abuzz with talk of the drama. They go back to their respective friends and rant about the shortcomings of the other. But, late at night, OO finds that not having FP around is less awesome than having FP around. OO cooks up a kooky plan to get FP back into their life. Hilarity and a little awkwardness ensue. In the end, FP and OO go great together and we end with a montage of “everyone lived happily ever after” and see a clip that alludes to an OO/FP baby on the way.

If you’re playing at home, we’re already in Act II. Ruby and Python borrow various ideas on iteration from FP languages. We might be towards the end of Act II; Scala is very much wearing ML’s jacket around town. Surely there will be fallout at some point: someone ranting about how OO/FP hybrids are too large, too poorly designed, too complicated, etc. The dust will settle, and someone will build an even better OO/FP hybrid. Act III will play repeatedly until no one thinks of languages as OO/FP hybrids; they just think of them as languages.
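For a concrete taste of that borrowing, here is a small Ruby illustration (the names square and even are just for the example): Enumerable’s select, map, and reduce take functions instead of hand-written loops, and lambdas can be composed. Note that Proc#>> arrived in Ruby 2.6, well after this post was written.

```ruby
# Higher-order iteration borrowed from FP: pass functions, not loop bodies.
numbers = (1..10).to_a

square = ->(n) { n * n }
even   = ->(n) { n.even? }

# Keep the even numbers, square them, and fold the results into a sum.
sum_of_even_squares = numbers.select(&even).map(&square).reduce(0, :+)

# Function composition: square first, then negate (Proc#>> needs Ruby 2.6+).
square_then_negate = square >> ->(n) { -n }

puts sum_of_even_squares    # prints 220
puts square_then_negate.(3) # prints -9
```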

Then something different from OO or FP will become obviously useful and this whole romantic comedy will play again. It’s the way of Hollywood, and the way of software development. Everything old is new again; everything new is old again. Rinse, repeat.

July 12, 2012

Rediscovery: OO and FP

I’ve noticed some of the sharpest developers I know are doing one or both of these things:

Rediscovering object oriented design. Practicing evolving a design, often driven by the pain points illuminated by automated tests. Thinking about coupling and cohesion. Trying to encapsulate the right behaviors and decide which principles are the most appropriate to the languages and systems they’re using.
Rediscovering functions. Applying functional programming to everyday programming problems. Using the features of functional languages as an advantage to build concurrent and distributed systems. Finding the differences in functional design and writing more idiomatic code.

The first is a cyclical thing. It happened in Java, it happened in .NET, it’s happening in Ruby now. People come to a language for what makes it different, write a lot of stuff, and keep bumping into the same problems. They (re-)discover OO, start refactoring things and shaping their systems differently. Some people dig it, others dismiss it as too much effort or ceremony. A new thing comes along, and it all happens again.

The second is harder for me to read. I’ve spent a fair amount of time studying FP, though I have yet to apply it to production software. Despite that, I have come across a lot of good ideas that are already part of the code I work with daily, or that I wish were part of the code I work with. FP has good answers to composing systems, reasoning about state, and handling concurrency. It has often suffered from a lack of pragmatism, overly dense literature, and rough tooling. The ideas are worth stealing, even if they haven’t broadly succeeded.

Both of these trends are crucial to moving the practice of software development forward. We need to keep rediscovering and sharpening old ideas whilst experimenting with new ideas to find out which ones are good and which ones less so.

July 10, 2012

Three kinds of distributed systems

Little-d distributed systems: the accidental sort. You built a program, it ran on one server. Then you added a database, some caches, perhaps a job worker somewhere. Whoops, you made a distributed system! Almost everything works this way now.

Big-D distributed systems: you read the Dynamo paper, maybe some Lamport papers too, and you set out to build on the principles set forth by those who have researched the topic. This is mostly open source distributed databases, but other systems surely fall under this category.

Ph.D. distributed systems: you went to a top CS school, you ended up working with a distributed systems professor, and you wrote a system. You then graduated, ended up at Google, Facebook, Amazon, etc., and wrote more distributed systems, on a team of even more Ph.D.s.

If you’re building a little-d distributed system, study the patterns in the Big-D distributed systems. If you’re building a Big-D distributed system, study what the Ph.D. guys are writing. If you’re a Ph.D. distributed systems guy, please, write in clear and concise language! No one knows or cares what all the little Greek symbols are; they just want to know what works, what doesn’t work, and why.

July 5, 2012

Protect that state: locks, monitors, and atomics

You need to protect a piece of data, like a counter or an output stream, from getting garbled by multiple threads.

Three choices, hot shot:

Explicit locks (aka mutexes): acquire a lock around the “critical section”, munge the data, release the lock. You have to manage the lock yourself. Threads contending for the lock take turns through the critical section instead of running it concurrently.
Implicit locks (aka monitors): annotate methods that modify important data. The monitor library manages the lock for you. Threads still serialize around the lock, reducing concurrency.
Atomic objects (aka compare-and-swap): use data structures that take advantage of runtime or processor semantics to guarantee that competing threads never interfere with each other. No locks! Much less serializing! Not broadly applicable, but I highly recommend them when you have the means.

Mutexes, aka lock “classic”

Mutexes are the lowest level of locks, at least in Ruby. They are the ur-locks, the most primitive of locks; everything is built on top of them. With any luck, you won’t ever need to use them directly, but it helps to know how they work.

Eighty percent of what you need to know is synchronize. You create a lock, and then you use it to protect a piece of code that would go sideways if multiple threads hit it at the exact same time. Here’s a little class that locks around printing to standard output:

class Output

  def initialize
    @lock = Mutex.new
  end

  def log(msg)
    @lock.synchronize { puts msg }
  end

end

Using Output#log instead of puts will prevent the output of your multithreaded program from getting jumbled and confused by everyone writing to stdout at the same time. You could manually lock and unlock a Mutex if you had special needs.
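For those special needs, here is a minimal sketch of the manual route. ManualOutput, log_pair, and try_log are hypothetical names; Mutex#lock, #unlock, and #try_lock are real methods on Ruby’s Mutex. The important habit is releasing the lock in an ensure clause, so an exception in the critical section can’t leave it held.

```ruby
require 'thread'
require 'stringio'

# A hypothetical variation on Output that manages its Mutex by hand.
class ManualOutput
  def initialize(io = $stdout)
    @io = io
    @lock = Mutex.new
  end

  # Hold the lock across two writes so no other thread's output lands between them.
  def log_pair(header, body)
    @lock.lock
    begin
      @io.puts header
      @io.puts body
    ensure
      # Always release; otherwise the lock can stay held and other threads block on it.
      @lock.unlock
    end
  end

  # Non-blocking variant: write only if no other thread holds the lock right now.
  # Returns true if the message was written, false if it was skipped.
  def try_log(msg)
    return false unless @lock.try_lock
    begin
      @io.puts msg
      true
    ensure
      @lock.unlock
    end
  end
end

buffer = StringIO.new
out = ManualOutput.new(buffer)
out.log_pair("Thread 0", "Iteration 100")
out.try_log("done")
print buffer.string
```

Most of the time Mutex#synchronize does exactly this lock/ensure/unlock dance for you, which is why it’s the eighty-percent answer.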

Let’s talk counters

For the next couple examples, we’re going to implement a counter. Multiple threads will update said counter, so it needs to protect itself. Here’s how we use the counter:

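    require 'thread'

    CORES = 2
    ITERS = 1_000

    out = Output.new
    # Swap in WildCounter, MonitorCounter, or AtomicCounter from below
    counter = WildCounter.new

    threads = CORES.times.map do |n|
      Thread.new do
        ITERS.times do |i|
          # Report progress every hundred iterations
          out.log("Thread #{n}: Iteration: #{i} Counter: #{counter.value}") if i % 100 == 0
          counter.incr
        end
      end
    end

    threads.each(&:join)
    p counter.value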

My MacBook Air has two real cores (don’t believe the hype!) and we’ll increment the counter a thousand times in each thread. Every hundred times through the loop, we’ll show some progress. At the end, we join each thread and then print the value of our counter. If all goes well, it will be CORES * ITERS.

All would not go well with this naive implementation:

class WildCounter

  def initialize
    @counter = 0
  end

  def incr
    @counter = @counter + 1
  end

  def value
    @counter
  end

end

If two threads execute incr at the same time, they can both read the same value of @counter and each write back that value plus one, unintentionally overwriting an increment that happened behind their back. The final count comes up short.

We could protect this counter with a mutex, but I want to show you two other ways to go about it.
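For comparison, here is a sketch of the mutex version (MutexCounter is a hypothetical name): the Output pattern from earlier, applied to the read-modify-write in incr.

```ruby
require 'thread'

# The mutex-protected counter: one explicit lock around the read-modify-write.
class MutexCounter
  def initialize
    @counter = 0
    @lock = Mutex.new
  end

  def incr
    # Only one thread at a time can read, add, and write back.
    @lock.synchronize { @counter = @counter + 1 }
  end

  def value
    @counter
  end
end

counter = MutexCounter.new
threads = 2.times.map { Thread.new { 1_000.times { counter.incr } } }
threads.each(&:join)
puts counter.value # prints 2000
```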

Monitors, aka intrinsic locks

Turns out, a well-designed class will tend to isolate state changes to a few methods. These “tell, don’t ask” methods are what you’ll likely end up locking. It would be pretty rad if you could just wrap a lock around the whole method without having to create variables and do a bunch of typing, don’t you think?

Those are a thing! They’re called monitors. You can read a bunch of academic stuff about them, but the crux of the biscuit is, a monitor is a lock around an entire instance of an object. You then declare methods that can only execute when that lock is held. Here’s a counter that uses a monitor:

require 'monitor'

class MonitorCounter

  def initialize
    @counter = 0
    # Mixes MonitorMixin into this one object and sets up its lock
    extend(MonitorMixin)
  end

  def incr
    synchronize { @counter = @counter + 1 }
  end

  def value
    @counter
  end
end

It doesn’t look too much different from our naive counter. In the constructor, we extend the object with Ruby’s MonitorMixin, which imbues each counter instance with a lock and a synchronize method to protect mutator methods. (Ed.: if anyone knows why the extend has to happen in the constructor instead of in the class declaration, I’m extremely stumped!)

In incr, where we do the dirty work of updating the counter, all we need to do is put the actual logic inside a synchronize block. This ensures that only one thread at a time may execute that block on any given object instance. Two threads could increment two counters safely, but if those two threads want to increment the same counter, they have to take turns.
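As for the editor’s note, the usual explanation is that inside the class body self is the class, so extend there gives the class object itself a monitor rather than each instance. To set it up from the class declaration, include MonitorMixin instead and call super() from initialize, since MonitorMixin’s own initialize is what creates the lock. A sketch, with IncludedMonitorCounter as a hypothetical name:

```ruby
require 'monitor'

class IncludedMonitorCounter
  # include (not extend) puts MonitorMixin's methods on every instance.
  include MonitorMixin

  def initialize
    # MonitorMixin#initialize sets up this instance's lock; without super(),
    # synchronize would fail because the lock was never created.
    super()
    @counter = 0
  end

  def incr
    synchronize { @counter = @counter + 1 }
  end

  def value
    @counter
  end
end

counter = IncludedMonitorCounter.new
threads = 2.times.map { Thread.new { 1_000.times { counter.incr } } }
threads.each(&:join)
puts counter.value # prints 2000
```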

A brief note on terminology: many Java concurrency texts refer to monitors as “intrinsic” locks because, in Java, they are part of every object. Mutexes are referred to as “extrinsic” locks because they aren’t tightly associated with any particular object instance.

Atomics, aka “wow that’s clever!”

It turns out that, in some cases, you can skip locks altogether. Amazing, right?!

Unfortunately, Ruby doesn’t have core support for atomic objects. Fortunately, Charles Nutter’s atomic library provides just that. It exploits operations provided by the underlying platform (the JVM in the case of JRuby, atomic compare-and-swap operations on Intel in the case of Rubinius) to implement objects whose updates are guaranteed to be atomic: each one happens as a single, indivisible step that no other thread can interrupt or observe half-done. These operations work by taking two parameters, the old value and the new value; if the current value still matches the old value, it’s safe to update it to the new value. If it doesn’t match, another thread got there first, the operation fails, and you have to try again.
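To make the compare-and-swap-then-retry dance concrete, here is a toy model. ToyAtomic and compare_and_swap are hypothetical names, not the atomic gem’s API, and the Mutex inside is only a stand-in for the indivisible hardware instruction that real atomics use. The part worth studying is update’s retry loop.

```ruby
require 'thread'

# A toy teaching model of compare-and-swap. Real atomics get indivisibility
# from a CPU instruction or the JVM; the Mutex below merely stands in for it.
class ToyAtomic
  def initialize(value)
    @value = value
    @cas_guard = Mutex.new # stand-in for the processor's atomic instruction
  end

  def value
    @value
  end

  # Install new_value only if the current value still equals expected.
  # Returns true on success, false if another thread changed it first.
  def compare_and_swap(expected, new_value)
    @cas_guard.synchronize do
      return false unless @value == expected
      @value = new_value
      true
    end
  end

  # The classic retry loop: read, compute, try to swap, start over on failure.
  def update
    loop do
      old = @value
      new_value = yield(old)
      return new_value if compare_and_swap(old, new_value)
    end
  end
end

counter = ToyAtomic.new(0)
threads = 2.times.map { Thread.new { 1_000.times { counter.update { |v| v + 1 } } } }
threads.each(&:join)
puts counter.value # prints 2000
```

Notice that a thread never waits for another thread to finish its computation; it just notices it lost the race and recomputes.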

Phew! Now you know a lot about atomic processor operations.

“Show me right now, Adam!” you say. Much obliged.

require 'atomic'

class AtomicCounter

  def initialize
    @counter = ::Atomic.new(0)
  end

  def incr
    @counter.update { |v| v + 1 }
  end

  def value
    @counter.value
  end

end

Luckily, Atomic encapsulates all the business of comparing and swapping and knowing how to use atomic instructions. It maintains the value of the object internally and handles all the swapping logic for you. Call update, compute the new value in the block and return it, and go on with your life. Return a new value rather than mutating the old one in place; if another thread wins the race, the block runs again with the fresh value. No locks necessary!

If that doesn’t make you love modern computer hardware, you are a programmer who does not know joy.

Tread carefully

Congratulations, you are now somewhat conversant on the topic of locking in concurrent Ruby programs. You know what the tools are, but, unfortunately, I haven’t the space to educate you on all the ways you are now equipped to shoot yourself in the foot. If you’re curious, you can read up on deadlock, livelock, starvation, priority inversion, and all the failure cases for dead processes left holding a lock.

The principle I try to follow, when I’m presented with a problem that needs locking, is to ask if I can work around the need for locking somehow. Could I use a Queue or atomic? Could I isolate this state in one thread and obviate the need for the lock? Is this state really necessary at all?
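Here is a minimal sketch of that isolate-it-in-one-thread idea, often called thread confinement. ConfinedCounter and its message names are hypothetical. Only the owner thread ever touches the count; everyone else sends messages through a Queue. The Queue does its own locking internally, but none of our code has to manage a lock.

```ruby
require 'thread'

# Thread confinement: only the owner thread touches the count,
# so our code needs no lock. Other threads talk to it via a Queue.
class ConfinedCounter
  def initialize
    @inbox = Queue.new
    @owner = Thread.new do
      count = 0 # local to the owner thread; nobody else can reach it
      loop do
        message, reply = @inbox.pop
        case message
        when :incr  then count += 1
        when :value then reply << count
        when :stop  then break
        end
      end
    end
  end

  def incr
    @inbox << [:incr, nil]
  end

  # Ask the owner thread for the current count and wait for the answer.
  def value
    reply = Queue.new
    @inbox << [:value, reply]
    reply.pop
  end

  def stop
    @inbox << [:stop, nil]
    @owner.join
  end
end

counter = ConfinedCounter.new
workers = 2.times.map { Thread.new { 1_000.times { counter.incr } } }
workers.each(&:join)
final = counter.value
counter.stop
puts final # prints 2000
```

Because the inbox is first-in, first-out, a value request sent after the workers finish is answered only after all their increments have been applied.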

To anti-quote Ferris Bueller’s Day Off, when it comes to adding locks, “I highly unrecommend it, if you have the means”.

July 2, 2012

Future lies

It’s easy to delude yourself when writing software. Do these tests really describe what the application does? Does the documentation really describe how the system works now? Is this comment an accurate assertion about the state of affairs in the application?

My experience is that there’s little to solve this problem besides discipline. Always double check that you haven’t invalidated something that was written down in the margins. If there’s a way to encode something in code instead of prose, do it.
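A tiny illustration of encoding something in code instead of prose. The charge method is hypothetical; the point is that a comment claiming “amount is always positive cents” can quietly rot into a future lie, while a guard clause fails loudly the moment the assumption stops holding.

```ruby
# Prose version, which can silently become a future lie:
#   # amount is always a positive number of cents
#
# Code version: the assumption is checked on every call.
def charge(amount_in_cents)
  unless amount_in_cents.is_a?(Integer) && amount_in_cents.positive?
    raise ArgumentError, "amount must be a positive Integer of cents, got #{amount_in_cents.inspect}"
  end
  # ... the actual charging is out of scope for this sketch
  amount_in_cents
end

puts charge(500) # prints 500
```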

Vigilance against future lies is a never-ending challenge.

June 27, 2012

Too eager to add code

I’m a little too eager to add code. If there’s a mess that needs refurbishing, rather than refactoring, I’m too quick to create a parallel world that is nice and tidy the way I’d like it. Problem is, I don’t come back often enough to the code that needs refurbishing. I know I should rejigger it to use the new shiny bits. For some reason, call it inertia, I don’t.

This is a shot across my own bow. Prefer refactoring to refurbishing. Prefer refurbishing to jumping into something new. Prefer shipping code to all of the previous tactics.

June 18, 2012

Getting started with Ruby Concurrency using two simple classes

Building a concurrent system isn’t as hard as they say it is. What it boils down to is, you can’t program by coincidence. Here’s a list of qualities in a strong developer:

Comfort in thinking about the state(s) of their program
Studies and understands the abstractions one or two layers above and below their program
Listens to the design forces on their code

Happily, that’s all you need to get started writing code running in multiple threads. You don’t need a graduate degree, mathematical tricks, a specially-ordained language, or membership in the cult of writing concurrent programs.

Today, I want to give you a starting point for tinkering with and understanding concurrent programs, particularly in modern Ruby (JRuby and Rubinius 2.0).

Work queues, out-of-process and in-process

Lots of apps use a queue to get stuff done. Throw jobs on a queue, spin up a bunch of processes, run a job worker in those processes. Simple, right? Well, not entirely. You’ve got to store those jobs somewhere, make sure pulling jobs out of it won’t lose critical work, run worker processes somewhere, restart them if they fail, make sure they don’t leak memory, etc. Writing that first bit of code is easy, but deploying it ends up being a little costly.

If process concurrency is the only available trick, running a Resque-style job queue works. But now that thread concurrency is viable with Ruby, we can look at handling these same kinds of jobs in-process instead of externally. At the cost of some additional code and additional possible states in our process, we save all sorts of operational complexity.

Baby’s first work queue

Resque is a great abstraction. Let’s see if we can build something like it. Here’s how we’ll add jobs to our in-process queue:

Work.enqueue(EchoJob, "I am doing work!")

And this is how we’ll define a worker. Note that I’ve gone with call instead of perform, because that is my wont lately.

class EchoJob
  def call(message)
    puts message
  end
end

Simple enough. Now let’s make this thing actually work!

Humble beginnings

require 'thread'
require 'timeout'

module Work
  @queue = Queue.new
  @n_threads = 2
  @workers = []
  @running = true

  Job = Struct.new(:worker, :params)

First off, we pull in thread, which gives us Thread and our new best friend, Queue. We also need timeout so we have a way to interrupt methods that block.

Then we define our global work queue, aptly named Work. It’s got a modest amount of state: a queue to store pending work on, a parameter for the number of threads (I went with two since my MacBook Air has two real cores), an array to keep track of the worker threads, and a flag that indicates whether the work queue should keep running.

Finally, we define a little job object, because schlepping data around inside a library with a hash is suboptimal. Data that represents a concept deserves a name and some structure!

A public API appears

  module_function

  def enqueue(worker, *params)
    @queue << Job.new(worker, params)
  end

  def start
    @workers = @n_threads.times.map { Thread.new { process_jobs } }
  end

This is the heart of the API. Note the use of module_function with no arguments; this makes all the following methods attach to the module object like class methods. This saves us the tedium of typing def self.some_method for every method. Happy fingers!

Users of Work will add new jobs with enqueue, just like Resque. It’s a lot simpler in our case, though, because we never have to cross process boundaries. No marshaling, no problem.

Once the queue is loaded up (or even if it’s not), users then call start. This fires up a bunch of threads and starts processing jobs. We need to keep track of those threads for later, so we toss them into a module instance variable.

The crux of the biscuit

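  def process_jobs
    while @running
      job = nil
      begin
        # Wait at most a second for work so we can re-check @running
        Timeout.timeout(1) do
          job = @queue.pop
        end
      rescue Timeout::Error
        next # no job arrived; loop back around and check @running again
      end
      job.worker.new.call(*job.params)
    end
  end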

Here’s the heart of this humble little work queue. It’s easiest to look at this one from the inside out. The crux of the biscuit is popping off the queue. For one thing, this is thread-safe, so two workers can pop off the queue at the same time and get different jobs back.

More importantly, @queue.pop will block, forever, if the queue is empty. That makes it easy for us to avoid hogging the CPU fruitlessly looking for new work. It does, however, mean we need to wrap the pop operation in a timeout, so that we can eventually get back to our loop and do some housekeeping.

Housekeeping task the first, run that job. This looks almost exactly like the code you’ll find inside Resque workers. Create a new instance of the class that handles this job, invoke our call interface, pass the job params on. Easy!

Housekeeping task the second, see if the worker should keep running. If the @running flag is still set, we’re good to continue consuming work off the queue. If not, something has signaled that it’s time to wrap up.

Shutting down

  def drain
    loop do
      break if @queue.empty?
      sleep 1
    end
  end

  def stop
    @running = false
    @workers.each(&:join)
  end
end

Shutting down our work queue is a matter of draining any pending jobs and then closing out the running threads. drain is a little oddly named. It doesn’t actually do the draining, but it does block until the queue is drained. We use it as a precondition for calling stop, which tells all the workers to finish the job they’ve got and then exit their processing loop. stop then calls Thread#join on each worker, waiting for it to finish before returning.

All together now

This is how we use our cute little work queue:

10.times { |n| Work.enqueue(EchoJob, "I counted to #{n}") }
# Process jobs in other threads
Work.start
# Block until all jobs are processed
Work.drain
# Stop the workers
Work.stop

Create work, start our workers, block until they finish, and then stop working. Not too bad for fifty lines of code.

That wasn’t too hard

A lot is made of the difficulty of concurrent programming. “Oh, the locks, oh the error cases!” they cry. Maybe it is trickier. But it’s not rocket science hard. Hell, it’s not even monads and contravariance hard.

What I hope I’ve demonstrated today is that concurrent programming, even in Ruby with all its implementation shortcomings, is approachable. To wit:

Ruby has a great API for working with threads themselves. You call Thread.new and pass it some code to run in a thread. Done!
Ruby’s Queue class is threadsafe and a great tool for coordinating concurrent programs. With a queue, you can get pretty far without thinking about locks. Push things onto it from one thread, pull things off from another. Block on a queue until you get the signal you’re waiting for. It’s a lovely little abstraction.
It’s easy to tinker with concurrency. You don’t have to write a giant program or have exotic problems to take advantage of Ruby for concurrency.

All that said, as I was writing this post up, some shortcomings in this example script jumped out at me. Output from the workers can appear out of order (classic concurrent program challenge), we can drain the queue while new work is still arriving (easily solved, but not with queues) and sleep loops (like in drain) are inelegant. If you want to read ahead, locks and latches are the droids you’re looking for.
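As a peek at the latch idea, here is a minimal countdown latch built from Ruby’s Mutex and ConditionVariable, which could replace a sleep loop like drain’s. CountDownLatch is a hypothetical class written for this sketch (concurrent-ruby ships a similar Concurrent::CountDownLatch, but this sticks to the standard library), and the jobs below are stand-ins that just square numbers.

```ruby
require 'thread'

# Waiters block until count_down has been called `count` times.
class CountDownLatch
  def initialize(count)
    @count = count
    @lock = Mutex.new
    @zero = ConditionVariable.new
  end

  def count_down
    @lock.synchronize do
      @count -= 1 if @count > 0
      @zero.broadcast if @count.zero? # wake every waiter at once
    end
  end

  # Block until the count reaches zero; no sleep loop required.
  def wait
    @lock.synchronize do
      # Re-check in a loop: condition variables can wake up spuriously.
      @zero.wait(@lock) until @count.zero?
    end
  end
end

latch = CountDownLatch.new(10)
results = Queue.new
10.times do |n|
  Thread.new do
    results << n * n  # stand-in for real job work
    latch.count_down  # signal: one more job finished
  end
end
latch.wait            # returns once all 10 jobs have counted down
puts results.size     # prints 10
```

Unlike drain, the waiter wakes up the moment the last job finishes instead of up to a second later.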

I hope you see the ease with which we can get started doing concurrent Ruby programming by learning just two new classes. Don’t fear the threads, friend!

June 17, 2012

Chronologic, a piece of software history

It’s long past time to call Chronologic a project at its end-of-life. About a year ago, it went into serious use as the storage system for social timelines in Gowalla. About six months ago, the Gowalla servers were powered down; no epilogue was written. In my work since then, I haven’t had the need for storing timelines and I haven’t been using Cassandra much at all. So, what purpose can Chronologic serve now that it’s not actively moving bits around in production?

A pretty OK Ruby/Cassandra example application

If you’re curious about how to use Cassandra with Ruby, Chronologic is probably an excellent starting point. This is doubly so if you’re interested in building your own indexes or using the lower-level driver API instead of CQL. If you’re interested in the latest and greatest in Cassandra schema design, which you should be, Chronologic won’t help you learn how to use CQL, secondary indexes, or composite columns.

Chronologic is also an acceptable take on building service-oriented, distributed systems with Ruby. It is a good demonstration of a layered, if not particularly OO, architecture. The test suite is fast, which was nice.

A source of software archaeology

Chronologic started out as a vacation hack that Scott Raymond put together in the winter of 2010. I picked it up and soft launched parts of it in early 2011. As Gowalla went into a drastic product push in the summer of 2011, the development of Chronologic accelerated sharply. A couple other developers started making contributions as Chronologic became a bigger factor in how we built our application and API.

Within the branches and commits, you can probably see design decisions come and go. Dumb bugs discovered and fixed. Sophisticated bugs instrumented, fixes attempted, final solutions triumphantly committed. An ambitious software historian might even glean the pace of development within the Gowalla team and infer the emotional rollercoaster of a big product push through the tone and pace of commits to Chronologic.

An epilogue for Chronologic

I’m a little sad that Chronologic couldn’t become a more general thing useful to a lot of people. I’m a lot sad that, by the time Gowalla was winding down, I was sufficiently burnt out that I wanted little to do with the Chronologic code. All that said, I’m very glad that Scott Raymond encouraged me to work on it and that the team at Gowalla worked with me as I blundered through building my first distributed, high-volume system. It was stressful and challenging, but I’m proud of the work I did and what I learned.

June 15, 2012

The Grinder

As teams grow and specialize, I’ve noticed people tend to take on characters that I see over and over. Archetypes that seem to go beyond one project and apply to each team I work on over time. Today I want to talk about one of those archetypes: the Grinder.

The Grinder isn’t the smartest or most skilled guy on your team. They don’t write the prettiest code, they aren’t up-to-date on the state of the art, and the way they use tools can seem simplistic. Often, The Grinder doesn’t even push working code; tiny bugs, or even syntax errors, lurk in it. They push or deploy this code perhaps dozens of times a day. At first glance, The Grinder is a Terrifying Problem.

What sets The Grinder apart from your garden variety mediocre developer is that The Grinder is an expert at making progress. They move rapidly, they upset things, and then they get it working. The Grinder is an indispensable part of your team because they’re a bit cutthroat. They’re not worrying about the coolest new tech or design approaches the intelligentsia are raving about. They’re just thinking, “how do I get this into production, get feedback, and get on with the next thing?”

The Grinder is an indispensable part of your team because they balance out the thinkers and worriers. While they’re asking “can we?” and “should we?” the Grinder is just getting it done. Grinders expand the realm of possibility by taking a journey of a thousand steps. They don’t invent a jetpack or hoverboard first; they just go with what they have.

The Grinders I’ve known are typically humble, kind people. They know how to operate their tools to get stuff done, and that’s mostly good enough for them. They’re not opposed to hearing about new techniques, but they want to know how it’s going to help them push code out faster. They are not particularly fazed by brainy tech that appeals to novelty.

Pair a Grinder with a thinker who values how their skills complement each other and you can make a ton of progress without making a huge mess. A team of all Grinders would eventually burn itself out. Grinders stop when the feature is done, not when the day is over or their brain is out of gas. Grinders need thinkers to encourage them to regulate their work pace and to help them understand how to make rapid progress without coding themselves into a corner.

It’s not hard to recognize the Grinder on your team; it’s likely even the non-technical people in the company know who they are and recognize their strengths. If you’re a thinker who is a little flabbergasted that the Grinder approach works, take some inspiration from how they do what they do and ship some stuff with them.

If it’s late or the Grinder has been working long hours, tap them on the shoulder and tell them they do good work. Send them home so they can sustain it over weeks and months without running themselves down. A well-rested, excited Grinder is one of your team’s best assets.

June 5, 2012

Keep your application state out of your queue

I’m going to pick on Resque here, since it’s nominally great and widely used. You’re probably using it in your application right now. Unfortunately, I need to tell you something unsettling.

¡DRAMA MUSIC!

There’s an outside chance your application is dropping jobs. The Resque worker process pulls a job off the queue, turns it into an object, and passes it to your class for processing. This handles the simple case beautifully. But failure cases are important, if tedious, to consider.

What happens if there’s an exception in your processing code? There’s a plugin for that. What happens if you need to restart your Resque process while a job is processing? There’s a signal for that. What if, between taking a job off the queue and fully processing it, the whole server disappears due to a network partition, hardware failure, or the FBI “borrowing” your rack?

¡DRAMA MUSIC!

Honestly, you shouldn’t treat Resque, or even Redis, like you would a relational or Dynamo-style database. Redis, like memcached, is designed as a cache. You put things in it and you get them out, really fast. Redis, like memcached, rarely falls over. But if it does, the remediation steps are manual. Redis currently doesn’t have a good High Availability setup (one is being contemplated).

Further, Resque assumes that workers will properly process every message they dequeue. This isn’t a bad assumption; most systems work most of the time. But if a Resque worker process dies, the outcome isn’t great: it loses whatever message(s) it was holding in memory, and the Redis instance that runs your Resque queues is none the wiser.

¡DRAMA MUSIC!

In my past use of Resque, this hasn’t been that big of a deal. Most jobs aren’t business-critical. If the occasional image doesn’t get resized or a notification email doesn’t go out, life goes on. A little logging and data tweaking cures many ills.

But some jobs are business-critical. They need stronger semantics than Resque provides. The processing of those jobs, and the state of that processing, is part of our application’s logic. We need to model those jobs in our application and store that state somewhere we can trust.

I first became really aware of this problem, and a nice solution to it, listening to the Ruby Rogues podcast. Therein, one of the panelists advised everyone to model crucial processing as state machines. The jobs become the transitions from one model state to the next. You store the state alongside an appropriate model in your database. If a job should get dropped, it’s possible to scan the database for models that are in an inconsistent state and issue the job again.

Example follows

Let’s work an example. For our imaginary application, comment notifications are really important. We want to make sure they get sent, come hell or high water. Here’s what our comment model looks like originally:

    class Comment < ActiveRecord::Base

      after_create :send_notification

      def send_notification
        Resque.enqueue(NotifyUser, self.user_id, self.id)
      end

    end

Now we’ll add a job to send that notification:

    class NotifyUser
      @queue = :notify_user

      def self.perform(user_id, comment_id)
        # Load the user and comment, send a notification!
      end

    end

But, as I’ve pointed out with great drama, this code can drop jobs. Let’s throw that state machine in:

    class Comment < ActiveRecord::Base
      # We'll use acts-as-state-machine. It's a classic.
      include AASM

      # We store the state of sending this notification in the aptly named
      # `notification_state` column. AASM gives us predicate methods to see if this
      # model is in the `pending?` or `sent?` states and a `notification_sent!`
      # method to go from one state to the next.
      aasm :column => :notification_state do
        state :pending, :initial => true
        state :sent

        event :notification_sent do
          transitions :to => :sent, :from => [:pending]
        end
      end

      after_create :send_notification

      def send_notification
        Resque.enqueue(NotifyUser, self.user_id, self.id)
      end

    end

Our notification has two states: pending and sent. When a comment is created, its notification starts in the pending state. After the job finishes, it moves to the sent state.

    class NotifyUser
      @queue = :notify_user

      def self.perform(user_id, comment_id)
        user    = User.find(user_id)
        comment = Comment.find(comment_id)

        user.notify_for(comment)
        # Notification success! Update the comment's state.
        comment.notification_sent!
      end

    end

This is a good start toward processing jobs more reliably. However, most jobs handle the interaction between two systems, and this notification is a great example. It integrates our application with a mail server or another service that handles our notifications. Those services probably aren’t tolerant of duplicate requests. If our process croaks between the time it tells the mail server to send and the time it updates the notification state in our database, we could accidentally process this notification twice. Back to square one?

¡DRAMA MUSIC!

Not quite. We can reduce our problem space once more by adding another state to our model.

    class Comment < ActiveRecord::Base
      include AASM

      aasm :column => :notification_state do
        state :pending, :initial => true
        state :sending # We have attempted to send a notification
        state :sent    # The notification succeeded
        state :error   # Something is amiss :(

        # This happens right before we attempt to send the notification
        event :notification_attempted do
          transitions :to => :sending, :from => [:pending]
        end

        # We take this transition if an exception occurs
        event :notification_error do
          transitions :to => :error, :from => [:sending]
        end

        # When everything goes to plan, we take this transition
        event :notification_sent do
          transitions :to => :sent, :from => [:sending]
        end

      end

      after_create :send_notification

      def send_notification
        Resque.enqueue(NotifyUser, self.user_id, self.id)
      end

    end

Now, when our job processes a notification, it first fires the notification_attempted event. Should this job fail, we’ll know which comments to look for in our logs. We could also get a little sneaky and monitor the number of comments in the sending state if we think we’re bottlenecking on sending the actual notification. Once the job completes, we transition to the sent state. If anything goes wrong, we catch the exception and put the comment in the error state. We definitely want to monitor that state, use the logs to figure out what went wrong, fix it manually, and perhaps write some code to fix bugs or add robustness.
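The updated job isn’t spelled out above, so here is one way it could look. This is a sketch, not the only option: it assumes the three-state Comment model above (AASM generates the `pending?` predicate and the bang event methods) and the same `User#notify_for` used in the earlier job. The early return for comments that aren’t pending, and the re-raise after recording the error, are assumptions of this sketch rather than anything AASM or Resque requires.

```ruby
# Sketch of NotifyUser for the three-state Comment model.
# Assumes the app provides Comment (with the AASM events above) and
# User#notify_for, exactly as in the earlier version of this job.
class NotifyUser
  @queue = :notify_user

  def self.perform(user_id, comment_id)
    user    = User.find(user_id)
    comment = Comment.find(comment_id)

    # Assumption: a comment that is already sending, sent, or in error was
    # handled by another run, so this delivery of the job is a no-op.
    return unless comment.pending?

    # Record the attempt *before* talking to the mail server.
    comment.notification_attempted!

    begin
      user.notify_for(comment)
    rescue StandardError
      # Park the comment in :error for monitoring and manual follow-up,
      # then re-raise so Resque still records the failed job.
      comment.notification_error!
      raise
    end

    # Notification success! If this update itself fails, the comment stays
    # in :sending, which is exactly the "may have succeeded" case.
    comment.notification_sent!
  end
end
```

Re-raising keeps the failure visible to Resque’s failure backend while the comment waits in the error state. The `pending?` check isn’t atomic, so two workers handed the same comment at the same instant could both pass it; wrapping the check and `notification_attempted!` in a row lock (ActiveRecord’s `with_lock`, for example) would close that gap.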

The sending state is entered when at least one worker has picked up a notification and tried to send the message. Should that worker die while sending the message or updating the database, the comment is left in the sending state, so we will know. When trouble strikes, we’ll have two cases to deal with: notifications that haven’t been touched at all, and notifications that were attempted and may have succeeded. We’ll handle the former by requeueing them. For the latter, we’ll probably have to write a script to grep our mail logs and see whether we successfully sent the message. (You are logging everything, centrally aggregating it, and know how to search it, right?)
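A recovery sweep for those two cases might look something like this. To keep it runnable without Rails, it is a plain-Ruby sketch: `CommentRow` is a hypothetical stand-in for rows of the comments table, the `enqueue` callable stands in for `Resque.enqueue(NotifyUser, user_id, id)`, and the grace period (so we don’t touch jobs that are merely still waiting in the queue) is an assumption. In a real app the loop would be a database query on `notification_state`.

```ruby
# Hypothetical stand-in for a row in the comments table.
CommentRow = Struct.new(:id, :user_id, :notification_state, :created_at)

# Requeues untouched notifications and returns the ids of comments whose
# notification may or may not have gone out (these need a log check).
def sweep_notifications(rows, now:, grace_period:, enqueue:)
  needs_review = []

  rows.each do |row|
    # Skip recent rows: their job is probably still in the queue.
    next if now - row.created_at < grace_period

    case row.notification_state
    when :pending
      # No worker ever touched it, so issuing the job again is safe.
      enqueue.call(row.user_id, row.id)
    when :sending
      # A worker tried; only the mail logs can tell us what happened.
      needs_review << row.id
    end
  end

  needs_review
end
```

Rows in the error state are deliberately left alone here; they are the ones to monitor and fix by hand.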

The truth is, the integration points between systems are a gnarly problem. You don’t so much solve the problem as get really good at detecting and responding to edge cases. Such is life in production. But on losing jobs, we can make really good progress. Don’t worry about your low-value jobs; start with the really important ones, and weave the state of those jobs into the rest of your application. Some day, you’ll thank yourself for doing that.

May 17, 2012

Tables and lambdas, a cure for smelly cases

Lots of folks consider case expressions in Ruby a code smell. I’m not ready to write them off just yet, but I know a good replacement for some uses of case when I see it. Rad co-worker David Copeland’s Lookup Tables With Lambdas is one of those replacements. For cases where a method takes a parameter, throws it into a case, and returns a value, I can replace all that lookup business with a hash lookup. To carry the metaphor through, the hash is the lookup table. Rad.
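Here is a hypothetical before-and-after of that replacement; the shipping-time mapping is made up for illustration. Both methods return the same values, but the second keeps the mapping as data and uses `Hash#fetch` to supply the default that the case expression’s `else` branch provided.

```ruby
# Before: the mapping is buried in control flow.
def shipping_days_case(service)
  case service
  when :overnight then 1
  when :express   then 3
  when :ground    then 7
  else 14
  end
end

# After: the mapping is a lookup table (a plain, frozen Hash).
SHIPPING_DAYS = { overnight: 1, express: 3, ground: 7 }.freeze

def shipping_days(service)
  # fetch's second argument plays the role of the case's else branch.
  SHIPPING_DAYS.fetch(service, 14)
end
```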

Where it gets fun is when I need to do some kind of dynamic lookup in the hash. Normally I wouldn’t want that lookup to run when Ruby evaluates my hash literal. If I reach into my functional programming bag of tricks, I recall that lambdas can be used to defer evaluation. And that’s exactly what David recommends. If I’ve got database lookups or logic I need to embed in my tables, Ruby’s lambda comes to the rescue!
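A small sketch of the lambda variant, using a made-up pricing table; it illustrates the idea and is not code from David’s post. `current_tax_rate` is a hypothetical stand-in for a database lookup. Because it sits inside lambdas, it isn’t called when the hash literal is evaluated, only when an entry is invoked, so each call sees the current value.

```ruby
# Hypothetical stand-in for a slow or changing lookup (e.g. a DB query).
def current_tax_rate
  8 # percent
end

# Each value is a lambda, so nothing inside runs when this literal is
# evaluated; the work is deferred until a specific entry is called.
PRICE_ADJUSTMENTS = {
  taxed:      ->(cents) { cents + cents * current_tax_rate / 100 },
  tax_exempt: ->(cents) { cents },
  clearance:  ->(cents) { cents / 2 },
}.freeze

def adjusted_price(kind, cents)
  # fetch raises KeyError for unknown kinds instead of silently returning nil.
  PRICE_ADJUSTMENTS.fetch(kind).call(cents)
end
```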

This approach works great at the small-to-medium scale. That said, I always keep in mind that a bunch of methods manipulating a hash, using its keys as a convention, is an encapsulated, orthogonal object begging to happen. Remember, it’s Ruby; we can make our objects behave like hashes but still do OO- and test-driven design.
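One hypothetical shape for that object: it keeps the hash-style `[]` that callers already use, but owns the default value and any behavior that operates on the table, so it can be test-driven like any other object. The class and its methods are illustrative only.

```ruby
# Hypothetical shipping-time table wrapped in an object of its own.
class ShippingTable
  DEFAULT_DAYS = 14

  def initialize(days_by_service)
    @days_by_service = days_by_service.dup.freeze
  end

  # Hash-style lookup, so code that treated this as a Hash keeps working.
  def [](service)
    @days_by_service.fetch(service, DEFAULT_DAYS)
  end

  # Behavior that would otherwise be a loose helper poking at the hash.
  def fastest
    service, _days = @days_by_service.min_by { |_service, days| days }
    service
  end
end
```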

May 15, 2012

Turns out I was wrong about RSpec subjects

I was afraid that David Chelimsky was going to take away my toys! Consider his argument that explicit use of subject in RSpec is a smell:

The problem with this example is that the word “subject” is not very intention revealing. That might not appear problematic in this small example because you can see the declaration on line 3 and the reference on line 6. But when this group grows to where you have to scroll up from the reference to find the declaration, the generic nature of the word “subject” becomes a hindrance to understanding and slows you down.

I’m so guilty of using subject heavily. Even worse, I’ve been advocating it to others too. In my defense, it does lend a good deal of concision to specs and seemed like a golden path.

Luckily, David isn’t taking away my toys. He’s got an even better recommendation: just use a method or let with an intention-revealing name. Here’s his example:

    describe Article do
      def article; Article.new; end

      it "validates presence of :title" do
        article.should validate_presence_of(:title)
      end
    end

This is, now that I’m looking at it, way better. As this spec grows, you can add helpers for article_with_comments, article_with_author, etc., and it’s clear, right on the line where a helper is used, what’s going on. No jumping back and forth between contexts. Thumbs up!

April 28, 2012

Three Easy Essays on Distributed Systems

Ryan Smith is pretty good at thinking about distributed systems. Distributed systems, the systems we (sometimes unwittingly) create on a regular basis these days, are a complicated, dense, far-reaching topic. Ryan’s managed to take a few of its problems and concisely introduce them with simple solutions that apply to all but the largest systems.

In The Worker Pattern, he presents a novel solution to a problem you are probably tackling with background or asynchronous job queues. Teaser: do you know what the HTTP 202 status code does?

A web service that requires high throughput will undoubtedly need to ensure low latency while processing requests. In other words, the process that is serving HTTP requests should spend the least amount of time possible to serve the request. Subsequently if the server does not have all of the data necessary to properly respond to the request, it must not wait until the data is found. Instead it must let the client know that it is working on the fulfillment of the request and that the client should check back later.
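To make the idea concrete, here is a minimal in-process sketch of a “202 Accepted, check back later” flow. It is not Ryan’s implementation: the endpoints, the in-memory `JOBS` hash and `QUEUE` array, and the choice to answer polls with 202 until the work is done are all assumptions. Responses are Rack-style `[status, headers, body]` arrays, but nothing here depends on Rack.

```ruby
require "securerandom"

# In-memory stand-ins for a real datastore and job queue.
JOBS  = {}
QUEUE = []

# POST /reports: record the request, queue the slow work, answer at once.
def create_report(params)
  id = SecureRandom.hex(8)
  JOBS[id] = { status: :pending, result: nil }
  QUEUE << [id, params]
  [202, { "Location" => "/reports/#{id}" }, ["accepted"]]
end

# GET /reports/:id: the client polls the Location it was handed.
def show_report(id)
  job = JOBS[id]
  return [404, {}, ["not found"]] unless job
  return [202, {}, ["still working"]] unless job[:status] == :done

  [200, {}, [job[:result]]]
end

# A separate worker process would run this loop; here we drain it by hand.
def work_off_queue
  until QUEUE.empty?
    id, params = QUEUE.shift
    JOBS[id][:result] = "report for #{params[:user]}" # the slow part
    JOBS[id][:status] = :done
  end
end
```

The request-serving methods never wait on the slow work; they only read and write the job record, which keeps their latency low.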

Coordinating multiple processes that need to process a dataset in bulk is tricky. Large systems usually end up needing some kind of Paxos-style coordination service like Doozer or ZooKeeper to keep all the worker processes from butting heads or duplicating work. Leader Election shows how, by scoping the problem space to existing tools, it becomes possible to put together a solution that scales down to small and medium-sized systems:

My environment already is dependent on Ruby & PostgreSQL so I want a solution that leverages my existing technologies. Also, I don’t want to create a table other than the one which I need to process.

As applications grow, they tend to maintain more and more state across more and more systems. Incidental state is problematic, especially when you have to maintain several services to keep all of it available. Applying Event Buffering mitigates many of these problems. The core idea of this one is my favorite:

We have seen several examples of how to transfer state from our client to our server. The primary reason that we take these steps to transfer state is to eliminate the number of services in our distributed system that have to maintain state. Keeping a database on a service eventually becomes an operational hazard.

Most of the systems we build on the web today are distributed systems. Ryan’s writings are an excellent introduction to thinking about and building these systems. It certainly helps to comb through research papers on the topic, but these three essays are excellent starters down the path to intentionally building distributed systems.

April 14, 2012

Cowboy dependencies

So you’ve written a cool open source library. It’s at the point where it’s useful. You’re pretty excited. Even better, it seems like something that might be useful at your day job. You could go ahead and integrate it. Win-win! You get to work out the rough edges on your open source project and make progress on your professional project.

This is tricky ground and it’s not as win-win as you might think. Integrating a new dependency, whether it’s one maintained by a teammate or not, requires communication. Everyone on the team will have to know about the dependency, how to work with it, and how to maintain it within the project. If there’s a deal-breaking concern with the library, consider it feedback on your library; it either needs to better address the problem, or it needs better documentation to address why the problem isn’t so much a problem.

It all comes down to communication. Adding a dependency, even if you know the person who wrote it really well, requires collaboration from your teammates. If you’re not talking to your teammates, you’re just cowboy coding.

Don’t cowboy dependencies into your project!

March 30, 2012

A Presenter is a signal

When someone says “your view or API layer needs presenters”, it’s easy to get confused. Presenter has become wildcard jargon for a lot of different sorts of things: coordinators, conductors, representations, filters, projections, helpers, etc. Even worse, many developers are in the “uncanny valley” stage of understanding the pattern; it’s close to being a thing, but not quite. I’ve come across presenters with entirely too much logic, presenters that don’t pull their weight as objects, and presenters that are merely indirection. Presenter is becoming a catch-all that stands for organizing logic more strictly than your framework typically provides for.

I could make it my mission to tell every single person they’re wrong about presenters, but that’s not productive and it’s not entirely correct. Rather, presenters are a signal. When you say “we use presenters for our API”, I hear you say “we found we had too much logic hanging around in our templates and so we started moving it into objects”. From there on out, every application is likely to vary. Some applications are template-focused and so need objects that are focused on presentational logic. Other apps are API-focused and need more support for emitting and parsing JSON.
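As one example of the template-focused flavor, here is a sketch of a presenter built on Ruby’s standard-library `SimpleDelegator`. `Article` and its fields are hypothetical; the point is only that formatting decisions move out of the template into a small object that is easy to test. An API-focused app would likely want something quite different.

```ruby
require "delegate"

# Hypothetical model; in a Rails app this would be an ActiveRecord object.
Article = Struct.new(:title, :published_at, :comment_count)

# Wraps an article, forwards unknown methods (title, etc.) to it, and holds
# the formatting decisions that would otherwise pile up in the template.
class ArticlePresenter < SimpleDelegator
  def byline_date
    published_at ? published_at.strftime("%B %-d, %Y") : "Draft"
  end

  def comment_summary
    case comment_count
    when 0 then "No comments yet"
    when 1 then "1 comment"
    else "#{comment_count} comments"
    end
  end
end
```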

At first, I was a bit concerned about the explosion of options for putting more objects into the view part of your typical model-view-controller application. But as I see more applications and highly varied approaches, I’m fine with Rails not providing a standard option. Better to decide what your application really needs and craft it yourself or find something viable off the shelf.

As long as your “presenters” reduce the complexity of your templates and make logic easier to decouple and test, we’re all good, friend.

March 20, 2012

How to approach a database-shaped problem

When it comes to caching and primary storage of an application’s data, developers are faced with a plethora of shiny tools. It’s easy to get caught up in how novel these tools are and get overenthusiastic about adopting them; I certainly have in the past! Sadly, this route often leads to pain. Databases, like programming languages, are best chosen carefully, rationally, and somewhat conservatively.

The thought process you want to go through is a lot like what former Gowalla colleague Brad Fults did at his new gig with OtherInbox. He needed to come up with a new way for them to store a mapping of emails. He didn’t jump on the database of the day, the system with the niftiest features, the one with the greatest scalability, or the one that would look best on his resume. Instead, he proceeded as follows:

Describe the problem domain and narrow it down to two specific, actionable challenges
Elaborate on the existing solution and its shortcomings
Identify the possible databases to use and summarize their advantages and shortcomings
Describe the new system and how it solves the specific challenges

Of course, what Brad wrote is post-hoc. He most likely did the first two steps in a matter of hours, took some days to evaluate each possible solution, decided which path to take, and then hacked out the system he later wrote about.

But more importantly, he cheated aggressively. He didn’t choose one database, he chose two! He identified a key, unique attribute of his problem: he only needed a subset of his data to be relatively fresh. This gave him the luxury of choosing a cheaper, easier data store for the complete dataset.

In short: solve your problem, not the problem that fits the database, and cheat aggressively when you can.

March 15, 2012

How I use vim splits

Video: vimeo.com/38571167

A five-minute exploration of how I use splits in vim to navigate between production or test code and how I move up and down through layers of abstractions. Spoiler: I use vertical splits to put test and production code side-by-side; horizontal splits come into play when I’m navigating through the layers of a web app or something with a client/server aspect to it. I haven’t customized vim to make this work; I just use the normal window keybindings and a bunch of commands from vim-rails.

I seriously heart this setup. It’s worth taking thirty minutes to figure out how you can approximate it with whatever editor setup you enjoy most. Hint: splits are really fantastic.