2012
Three application growth stories
First you grow your application, then you grow your organization, and then you get down to the metal and eke out all the performance you can.
Evolution of SoundCloud’s Architecture: this is how you grow an application without eating the elephant too soon. I would love to send this back to Adam from two years ago. Note that they ended up using RabbitMQ as a broker instead of Resque as a job queue. This nuanced position put them in a pretty nice place, architecturally.
Addicted to Stable is equal parts “hey, you should automate things and use graphs/monitoring to tell you when things break” and “look at all of GitHub’s nifty internal tools”. Even though I’ve seen the latter a few times already, I like my pal John Nunemaker’s peek into how it all comes together.
High Performance Network Programming on the JVM explains how to choose network programming libraries for the JVM, some pros and cons to know about, and lays out a nice conceptual model for building network services. Seems like this is where you want to start once you reach the point where your application needs to serve tens of thousands of clients concurrently.
I’m going to keep posting links like these until, some day, I feel like I’m actually doing it right. Until then, stand on other people’s shoulders, learn from experience.
Hello, you beautiful fixed-width font
Pitch. Not quite a programmer’s font, but holy cow is it gorgeous.
I love the thought put into this type; the creator actually tried to recreate the artifacts of type created by physically striking paper. Turned out that took away from the font, but it’s delightful that he went that deep in considering what a fixed-width font should feel like.
The history of fixed-width, typewriter-esque fonts is fantastic too. Even if you’re not typography-curious like myself, you should read the whole thing and not just look at the fantastic specimens.
Know a feedback loop
TDD is one way to create a feedback loop for building your application. Spiking code out and then stabilizing it is another:
For most people, TDD is a mechanism for discovery and learning. For some of us, if we can write an example in our heads, our biggest areas of learning probably lie elsewhere. Since ignorance is the constraint in our system, and we’re not ignorant about much that TDD can teach us, skipping TDD allows us to go faster. This isn’t true of everything. Occasionally the feedback is on some complex piece of business logic. Every time I’ve tried to do that without TDD it’s stung me, so I’m getting better at working out when to do it, and when it’s OK to skip it.
TDD helps me a lot when I have an idea what the problem looks like. Spiking out a prototype and backfilling tests helps me when I don’t know what the problem looks like.
You’re possibly different in how you approach problems. If you’re flying more by the seat of your pants, or you aren’t including the composition and organization of the code in your feedback loop, I will probably insist you work on something that isn’t in the core layers of the application. That’s cool though; as long as you have any feedback loop that will nudge you towards better discovering and solving the core problem, we’re cool.
Futures, Features, and the Enterprise-D
A future is a financial instrument (a thing you invest in) where you commit to paying a price today to receive something tomorrow. The price could go up or down tomorrow, but you’re locked into today’s price. Price goes up, you profit; price goes down, you eat the difference.
A feature is a thing that software does. For our purposes, we’ll say it’s also work that enables a feature: setting up CI, writing tests, refactoring code, adding documentation, etc. The general idea behind software development is that you should gain more time using a feature than the time you spent implementing it.
The Enterprise-D is a fictional space ship in the Star Trek: The Next Generation universe. It can split into two spaceships and is pretty well armed for a ship with an exploratory mission.
Today, Geordi and Worf (middle management) are recalibrating the forward sensor array. It takes them most of the day, but they get the job done. Captain Picard is studying ancient pan-flutes of the iron-age Vulcan era. Data (an android), as an experiment on his positronic net, is trying to learn how to tell an Aristocrat joke.
Tomorrow, in a series of events no one could predict, our friends find themselves in a tense situation with a Romulan Bird of Prey. Luckily, Worf detected it minutes before it decloaked, thanks to the work he and Geordi had performed the day before. This particular Bird of Prey is carrying ancient Romulan artifacts dating back to their own iron age. Amazingly, Picard is able to save the day by translating the inscriptions, which aren’t too different from Vulcan pan-flutes, and prevents an ancient doomsday weapon from consuming the Bird of Prey and Enterprise alike.
Data’s Aristocrat joke is never used. That’s good, because this is a family show.
Our friends on the Enterprise are savvy investors who look at their efforts in terms of risk and reward. They each invest time today into an activity (an instrument, in financial terms) which they may or may not use tomorrow. We can say that if they end up using the instrument, it pays off. We can then measure the pay-off of that instrument by assigning a value to the utility of that instrument. If the value of the instrument exceeds the time they invested in “acquiring” it, there is a profit.
Geordi and Worf’s investment was clearly a profit-bearing endeavor. Few other uses of their time, such as aligning the warp crystals or practicing Klingon combat moves, could have detected an invisible ship before it uninvisibles itself. In Wall Street terms, Geordi and Worf are getting the fat bonus and bottle of Bollinger champagne.
Picard’s investment seems less clear cut. It did come in handy in this particular case, but it probably wasn’t the only activity that would have saved the day. He could have belted out some Shakespeare or delegated to one of his officers to reconfigure the deflector dish. We’ll mark Picard as even for the day.
Data totally blew this one. His Aristocrat joke went unused. Even if he had used it, the best outcome would be that it’s a lame, sterile groaner that only ends up on a DVD extras reel. Data is in the red.
In terms of futures, we can say that the price of working on the forward sensor array went up, the price of pan-flute research was largely unchanged, and the price of Aristocrat jokes plummeted. Our friends on the Enterprise implicitly decided what risks are the most important to them and hedged against three of them. Some of them even came out ahead!
I’m working on software. Today, I can choose to do things on that software. I could 1) start on adding a new feature, 2) shore up the test suite, or 3) get CI set up and all-green. Respectively, these are futures addressing 1) the risk of losing money due to missing functionality, 2) losing money because adding features takes too long to get right, or 3) losing money because things are broken or not communicated in a timely manner.
Like our Enterprise episode, it’s hard to value these futures. If I deliver the feature tomorrow and it generates more money than the time I put into implementing, testing, and deploying the code, we’re looking at a clear profit. Revenue minus expense equals profit, grossly speaking.
Shoring up the test suite might make another feature easier to implement. It might give me confidence in moving code around to facilitate. It could tell me when I’ve broken some code, or some code is poorly designed and holding me back. But, these values are super hard to quantify. Did I save two hours on some feature because I spent one hour on the test suite yesterday? Tricky question!
Chore-ish tasks, like standing up a CI server or centralizing logs, are even harder to quantify. Either one of these tasks could save hours and days of wasted time due to missed communication or troubleshooting an opaque system. Or, they might not pay off at all for weeks and months.
I’m going to start writing down what I worked on every day, guess how many hours I spent on it, and then revisit each task weekly or monthly to guess if it paid out. Maybe I’ll develop an intuition for risk and reward for the things I work on. Maybe I’ll just end up with a mess of numbers. Almost certainly, I will seem pretty bookish and weird for tracking these sorts of things.
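A minimal sketch of what such a log could look like in Ruby. The `Task` struct, the `payoff` helper, and every number below are hypothetical and only illustrate the bookkeeping: hours invested today, and a later guess at the hours saved.

```ruby
# Hypothetical work log: each entry records hours invested and, once
# revisited, a guess at the hours it saved (nil means "not revisited yet").
Task = Struct.new(:name, :hours_spent, :hours_saved) do
  # Profit in hours; nil until the task has been revisited.
  def payoff
    hours_saved && hours_saved - hours_spent
  end
end

log = [
  Task.new("shore up test suite", 1.0, 2.0),
  Task.new("stand up CI server", 4.0, nil), # too early to tell
  Task.new("spike on new feature", 3.0, 1.0),
]

# Only tasks that have been revisited count toward the running total.
revisited = log.reject { |task| task.payoff.nil? }
net = revisited.sum(&:payoff)

revisited.each { |task| puts format("%-22s %+.1f h", task.name, task.payoff) }
puts format("%-22s %+.1f h", "net so far", net)
```

Leaving `hours_saved` empty until the weekly or monthly review keeps unrevisited work from counting as a loss.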
You should look bookish and weird too. Let me know what you find. I’ll write up whatever we figure out. Maybe there’s something to this whole “finance” thing besides nearly wrecking the global economy!
The test-driven astronaut
Don't Make Your Code "More Testable", make the design of your program better. Snappy test suites are all the vogue, but that misses the point of even writing tests: create a feedback loop to know when your program works and when your program is organized well. Listen carefully to the whispers in your code; if you're spending all your time writing tests or shuffling code instead of adding features, improving features, or shipping features then you're falling to the siren song of the test-driven astronaut.
Simplicators for sanity
For those rainy days when integrating with a not-entirely sane system is getting you down:
A Simplicator introduces a new seam into the system that did not exist when the service's byzantine API was used directly. As well as helping us test the system, I've noticed that this seam is ideal for monitoring and regulating our systems' use of external services. If a widely supported protocol is used, we can do this with off-the-shelf components.
The Simplicator is a component that lives outside the architecture of your system. It exports a sane interface to your system. You test it separately from your system. Its only purpose in life is to deal with the insanity of others.
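As a rough sketch of the shape of the idea: the quote describes a separate component, but the same seam can be shown in-process for brevity. `LegacyBillingClient`, its `exec_txn` method, the op code, and the status strings are all invented here to stand in for a byzantine third-party API.

```ruby
# Hypothetical stand-in for a third-party client with a byzantine API.
class LegacyBillingClient
  def exec_txn(op_code, payload, flags)
    return { "STATUS" => "E42", "MSG" => "bad op" } unless op_code == 17
    { "STATUS" => "00", "DATA" => { "AMT_CENTS" => payload["amt"].to_s } }
  end
end

# The simplicator: the only code that knows about op codes, flags, and
# status strings. The rest of the system sees `charge` and nothing else.
class BillingSimplicator
  ChargeFailed = Class.new(StandardError)

  def initialize(client)
    @client = client
  end

  # Returns the charged amount in cents, or raises ChargeFailed.
  def charge(amount_cents)
    reply = @client.exec_txn(17, { "amt" => amount_cents }, 0)
    raise ChargeFailed, reply["MSG"] unless reply["STATUS"] == "00"
    Integer(reply.dig("DATA", "AMT_CENTS"))
  end
end

billing = BillingSimplicator.new(LegacyBillingClient.new)
puts billing.charge(1250)
```

Because the client is passed in, the simplicator can be tested on its own against a fake, and the system can be tested against a fake simplicator.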
Hell is other people’s systems; QED this is a heavenly idea.
Smelly obsessions
Get Rid of That Code Smell - Primitive Obsession:
Think about it this way: would you use a string to represent a date? You could, right? Just create a string, let’s say "2012-06-25" and you’ve got a date! Well, no, not really – it’s a string. It doesn’t have semantics of a date, it’s missing a lot of useful methods that are available in an instance of Date class. You should definitely use Date class and that’s probably obvious for everybody. This is exactly what Primitive Obsession smell is about.
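The quote's point can be seen in a few lines of Ruby. This is only an illustration using the standard library's `Date`; the variable names are made up.

```ruby
require 'date'

as_string = "2012-06-25"
as_date   = Date.new(2012, 6, 25)

# The string can only do string things: "adding" to it is concatenation.
longer_string = as_string + "1"
puts longer_string

# The Date carries date semantics.
next_day   = as_date + 1                           # the next calendar day
days_until = (Date.new(2012, 7, 4) - as_date).to_i # whole days between dates

puts next_day
puts as_date.monday?
puts days_until
```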
Rails developers can fall into another kind of obsession: framework obsession. Rails gives you folders for models, views, controllers, etc. Everything has to be one of those. Logic is shoehorned into models instead of put in objects unrelated to persistence. Controller methods and helpers grow huge with conditionals and accreted behavior.
This is partially an education and advocacy problem. Luckily, folks like Avdi Grimm, Corey Haines, Gary Bernhardt, and Steve Klabnik, amongst others, are spreading the word of how to use object oriented principles to design Rails applications without obsessing over the constructs in the Rails framework.
The second part is practice. Once you’ve educated yourself and bought into the notion that a Rails app isn’t all Rails classes, you’ve got to practice and struggle with the concepts. It won’t be pretty the first time; at least, it wasn’t for me. But with time, I’ve come to feel far better about how I design applications using both Rails principles and object-oriented principles.
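One possible shape for logic that lives outside the framework's folders is a plain Ruby object. The `OrderTotal` class, the `LineItem` struct, and the tax rate below are hypothetical; nothing here depends on Rails.

```ruby
# A plain Ruby object holding business logic. It knows nothing about
# persistence, controllers, or views; `line_items` is any enumerable of
# objects responding to `price_cents` and `quantity`.
class OrderTotal
  TAX_RATE = 0.08 # hypothetical rate, for illustration only

  def initialize(line_items)
    @line_items = line_items
  end

  def subtotal_cents
    @line_items.sum { |item| item.price_cents * item.quantity }
  end

  def total_cents
    (subtotal_cents * (1 + TAX_RATE)).round
  end
end

# In a Rails app the items might be model instances; a Struct works just
# as well, which is what keeps this class cheap to test.
LineItem = Struct.new(:price_cents, :quantity)
total = OrderTotal.new([LineItem.new(1000, 2), LineItem.new(500, 1)])
puts total.subtotal_cents
puts total.total_cents
```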
How to think about organizing folders: don't.
Mountain Lion’s New File System:
Folders tend to grow deeper and deeper. As soon as we have more than a handful of notions, or (beware!) more than one hierarchical level of notions, it gets hard for most brains to build a mental model of that information architecture. While it is common to have several hierarchy levels in applications and file systems, they actually don’t work very well. We are just not smart enough to deal with notional pyramids. Trying to picture notional systems with several levels is like thinking three moves ahead in chess. Everybody believes that they can, but only a few skilled people really can do it. If you doubt this, prove me wrong by telling me what is in each file menu in your browser…
A well-considered essay on the non-recursive design of folders in iCloud, how people think about organizing documents, the emotions of organizing documents, and how it comes together in an app like iCloud. Great reading.
A romantic comedy: OO and FP
My magic ball predicts that OO and FP are going to take something of a “romantic comedy” path of evolution.
Act I. OO and FP are introduced at dinner parties and they could not seem more dissimilar and hilarious arguments ensue. No one goes home together. Despite the initial miss, the end of Act I finds OO and FP separately talking to friends about how they want the same things.
Act II. OO and FP run into each other at the coffee shop, and then again at the gym. OO is reading a book on ideas that FP loves. One of their friends invites them both to a bar, they get a little sauced and end up making out a bit. OO starts wearing FP’s jacket around town, even finding it a little comfortable. Towards the end of Act II, OO and FP are a bona fide thing, both borrowing ideas from each other. It’s pretty cute.
Act III. Open with a fight between OO and FP. It seems they just can’t come to agree on some important topic like mutability or the nature of behavior and state. Unfortunate and emotional words are uttered. The internet is abuzz with talk of the drama. They go back to their respective friends and rant about the shortcomings of the other. But, late at night, OO finds that not having FP around is less awesome than having FP around. OO cooks up a kooky plan to get FP back into their life. Hilarity, and a little awkwardness, ensue. In the end, FP and OO go great together and we end with a montage of “everyone lived happily ever after” and see a clip that alludes to an OO/FP baby on the way.
If you’re playing at home, we’re already in Act II. Ruby and Python borrow various ideas on iteration from FP languages. We might be towards the end of Act II; Scala is very much wearing ML’s jacket around town. Surely there will be fallout at some point, someone ranting about how OO FP hybrids are too large, too poorly designed, too complicated, etc. The dust will settle, and someone will build an even better OO FP hybrid. Act III will play repeatedly until no one thinks of languages as OO FP hybrids, they just think of them as another language.
Then something different from OO or FP will become obviously useful and this whole romantic comedy will play again. It’s the way of Hollywood, and the way of software development. Everything old is new again; everything new is old again. Rinse, repeat.
Rediscovery: OO and FP
I’ve noticed some of the sharpest developers I know are doing one or both of these things:
Rediscovering object oriented design. Practicing evolving a design, often driven by the pain points illuminated by automated tests. Thinking about coupling and cohesion. Trying to encapsulate the right behaviors and decide which principles are the most appropriate to the languages and systems they’re using.
Rediscovering functions. Applying functional programming to everyday programming problems. Using the features of functional languages as an advantage to build concurrent and distributed systems. Finding the differences in functional design and writing more idiomatic code.
The first is a cyclical thing. It happened in Java, it happened in .NET, it’s happening in Ruby now. People come to a language for what makes it different, write a lot of stuff, and keep bumping into the same problems. They (re-)discover OO, start refactoring things and shaping their systems differently. Some people dig it, others dismiss it as too much effort or ceremony. A new thing comes along, and it all happens again.
The second is harder for me to read. I’ve spent a fair amount of time studying FP, though I have yet to apply it to production software. Despite that, I have come across a lot of good ideas that are already part of the code I work with daily, or that I wish were part of the code I work with. FP has good answers to composing systems, reasoning about state, and handling concurrency. It has often suffered from a lack of pragmatism, overly dense literature, and rough tooling. The ideas are worth stealing, even if they haven’t broadly succeeded.
Both of these trends are crucial to moving the practice of software development forward. We need to keep rediscovering and sharpening old ideas whilst experimenting with new ideas to find out which ones are good and which ones less so.
Three kinds of distributed systems
Little-d distributed systems: the accidental sort. You built a program, it ran on one server. Then you added a database, some caches, perhaps a job worker somewhere. Whoops, you made a distributed system! Almost everything works this way now.
Big-D distributed systems: you read the Dynamo paper, maybe some Lamport papers too, and you set out to build on the principles set forth by those who have researched the topic. This is mostly open source distributed databases, but other systems surely fall under this category.
Ph.D distributed systems: you went to a top CS school, you ended up working with a distributed systems professor, and you wrote a system. You then graduated, ended up at Google, Facebook, Amazon, etc. and ended up writing more distributed systems, on a team of even more Ph.D’s.
If you’re building a little-d distributed system, study the patterns in the Big-D distributed systems. If you’re building a Big-D distributed system, study what the Ph.D guys are writing. If you’re a Ph.D distributed systems guy, please, write in clear and concise language! No one knows or cares what all the little Greek symbols are; they just want to know what works, what doesn’t work, and why.
Protect that state: locks, monitors, and atomics
You need to protect a piece of data, like a counter or an output stream, from getting garbled by multiple threads.
Three choices, hot shot:
- Explicit locks (aka mutexes): acquire a lock around the “critical section”, munge the data, release the lock. You have to manage the lock yourself. Multiple threads accessing the lock will not run concurrently anymore.
- Implicit locks (aka monitors): annotate methods that modify important data. The monitor library manages the lock for you. Threads still serialize around the lock, reducing concurrency.
- Atomic objects (aka compare-and-swap): use data structures that take advantage of runtime or processor semantics to guarantee that competing threads never interfere with each other. No locks! Much less serializing! Not broadly applicable, but I highly recommend them when you have the means.
Mutexes, aka lock “classic”
Mutexes are the lowest level of locks, at least in Ruby. They are the ur-locks, the most primitive of locks; everything is built on top of them. With any luck, you won’t ever need to use them directly, but it helps knowing how they work.
Eighty percent of what you need to know is synchronize. You create a lock, and then you use it to protect a piece of code that would go sideways if multiple threads hit it at the exact same time. Here’s a little class that locks around printing to standard output:
class Output
  def initialize
    @lock = Mutex.new
  end

  def log(msg)
    @lock.synchronize { puts msg }
  end
end
Using Output#log instead of puts will prevent the output of your multithreaded program from getting jumbled and confused by everyone writing to stdout at the same time. You could manually lock and unlock a Mutex if you had special needs.
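For those special needs, a minimal sketch of manual locking with `Mutex#lock`, `Mutex#unlock`, and `Mutex#try_lock`. The `lines` array and thread names are illustrative; the main thing to copy is the `ensure`, which releases the lock even when the protected code raises.

```ruby
lock = Mutex.new
lines = []

writer = Thread.new do
  lock.lock # blocks until the mutex is available
  begin
    lines << "written while holding the lock"
  ensure
    lock.unlock # always release, even if the code above raises
  end
end
writer.join

# try_lock returns immediately (true or false) instead of blocking.
if lock.try_lock
  begin
    lines << "main thread got the lock without waiting"
  ensure
    lock.unlock
  end
end

puts lines
```

`synchronize` does this lock/ensure/unlock dance for you, which is why it covers most cases.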
Let’s talk counters
For the next couple examples, we’re going to implement a counter. Multiple threads will update said counter, so it needs to protect itself. Here’s how we use the counter:
require 'thread'

CORES = 2
ITERS = 1_000

# Output is the class above; swap in any of the counter classes below.
out = Output.new
counter = WildCounter.new

threads = CORES.times.map do |n|
  Thread.new do
    ITERS.times do |i|
      out.log("Thread #{n}: Iteration: #{i} Counter: #{counter.value}") if i % 100 == 0
      counter.incr
    end
  end
end

threads.each(&:join)
p counter.value
My Macbook Air has two real cores (don’t believe the hype!) and we’ll increment the counter a thousand times in each thread. Every hundred times through the loop, we’ll show some progress. At the end, we join each thread and then print the value of our counter. If all goes well, it will be CORES * ITERS.
All would not go well with this naive implementation:
class WildCounter
  def initialize
    @counter = 0
  end

  def incr
    @counter = @counter + 1
  end

  def value
    @counter
  end
end
If two threads execute incr at the same time, they will misread @counter or unintentionally overwrite a perfectly good value that was incremented behind their back.
We could protect this counter with a mutex, but I want to show you two other ways to go about it.
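For completeness, the mutex-protected version might look like the sketch below. `MutexCounter` is a name made up here; it keeps the same `incr`/`value` interface as the naive counter and wraps both in `Mutex#synchronize`.

```ruby
class MutexCounter
  def initialize
    @counter = 0
    @lock = Mutex.new
  end

  # Read-modify-write happens entirely inside the critical section.
  def incr
    @lock.synchronize { @counter += 1 }
  end

  def value
    @lock.synchronize { @counter }
  end
end

counter = MutexCounter.new
threads = 2.times.map do
  Thread.new { 1_000.times { counter.incr } }
end
threads.each(&:join)
p counter.value
```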
Monitors, aka intrinsic locks
Turns out, a well-designed class will tend to isolate state changes to a few methods. These “tell, don’t ask” methods are what you’ll likely end up locking. It would be pretty rad if you could just wrap a lock around the whole method without having to create variables and do a bunch of typing, don’t you think?
Those are a thing! They’re called monitors. You can read a bunch of academic stuff about them, but the crux of the biscuit is, a monitor is a lock around an entire instance of an object. You then declare methods that can only execute when that lock is held. Here’s a counter that uses a monitor:
require 'monitor'

class MonitorCounter
  def initialize
    @counter = 0
    # No idea why this doesn't work inside the class declaration
    extend(MonitorMixin)
  end

  def incr
    synchronize { @counter = @counter + 1 }
  end

  def value
    @counter
  end
end
It doesn’t look too much different from our naive counter. In the constructor, we extend Ruby’s MonitorMixin, which imbues this class with a lock and a synchronize method to protect mutator methods. (Ed. if anyone knows why the extend has to happen in the constructor instead of in the class declaration, I’m extremely stumped as to why!)
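A likely explanation: `extend` called in the class body applies the mixin to the class object itself, so instances never receive `synchronize`, whereas `extend` in the constructor applies it to each new instance. The class-level equivalent is `include MonitorMixin` plus a `super()` call in `initialize` so the mixin can set up its per-instance lock. A sketch, with `IncludedMonitorCounter` as a made-up name:

```ruby
require 'monitor'

class IncludedMonitorCounter
  include MonitorMixin

  def initialize
    super() # lets MonitorMixin initialize the per-instance monitor
    @counter = 0
  end

  def incr
    synchronize { @counter += 1 }
  end

  def value
    synchronize { @counter }
  end
end

counter = IncludedMonitorCounter.new
threads = 2.times.map do
  Thread.new { 1_000.times { counter.incr } }
end
threads.each(&:join)
p counter.value
```

The empty parentheses on `super()` matter: they pass no arguments up the chain regardless of what `initialize` itself accepts.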
In incr, where we do the dirty work of updating the counter, all we need to do is put the actual logic inside a synchronize block. This ensures that only one thread may execute this method on any given object instance at a time. Two threads could increment two counters safely, but if those two threads want to increment the same counter, they have to take turns.
A brief note on terminology: many Java concurrency texts refer to monitors as “intrinsic” locks because, in Java, they are part of every object. Mutexes are referred to as “extrinsic” locks because they aren’t tightly associated with any particular object instance.
Atomics, aka “wow that’s clever!”
It turns out that, in some cases, you can skip locks altogether. Amazing, right?!
Unfortunately, Ruby doesn’t have core support for atomic objects. Fortunately, Charles Nutter’s atomic library provides just that. It exploits operations provided by the underlying platform (the JVM in the case of JRuby, atomic compare-and-swap operations on Intel in the case of Rubinius) to implement objects that are guaranteed to update atomically: no other thread can ever observe a half-finished update. These operations work by taking two parameters, the old value and the new value; if the current value matches the old value, it’s safe to update it to the new value. If it doesn’t match, the operation fails and you have to try again.
Phew! Now you know a lot about atomic processor operations.
“Show me right now, Adam!” you say. Much obliged.
require 'atomic'

class AtomicCounter
  def initialize
    @counter = ::Atomic.new(0)
  end

  def incr
    @counter.update { |v| v + 1 }
  end

  def value
    @counter.value
  end
end
Luckily, Atomic encapsulates all the business of comparing and swapping and knowing about how to use atomic instructions. It maintains the value of the object internally and handles all the swapping logic for you. Call update, change the object in the block, and go on with your life. No locks necessary!
If that doesn’t make you love modern computer hardware, you are a programmer who does not know joy.
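The retry loop that an atomic `update` hides can be sketched in plain Ruby. This is not how the atomic library is implemented: `ToyCell` is a made-up class, and its `compare_and_set` uses a Mutex purely as a stand-in for the processor's compare-and-swap instruction so that the read, compute, swap-or-retry logic is visible.

```ruby
# A toy cell whose compare_and_set is made atomic with a Mutex. Real
# atomics use a processor instruction instead; the Mutex is only a
# stand-in so the retry logic can be shown in plain Ruby.
class ToyCell
  def initialize(value)
    @value = value
    @guard = Mutex.new
  end

  def get
    @guard.synchronize { @value }
  end

  # Set to new_value only if the current value still equals `expected`.
  def compare_and_set(expected, new_value)
    @guard.synchronize do
      return false unless @value == expected
      @value = new_value
      true
    end
  end

  # Read, compute, attempt the swap; start over if another thread won.
  def update
    loop do
      old = get
      new_value = yield(old)
      return new_value if compare_and_set(old, new_value)
    end
  end
end

cell = ToyCell.new(0)
threads = 2.times.map do
  Thread.new { 1_000.times { cell.update { |v| v + 1 } } }
end
threads.each(&:join)
p cell.get
```

Note that the block passed to `update` may run more than once for a single call, so it should be free of side effects.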
Tread carefully
Congratulations, you are now somewhat conversant on the topic of locking in concurrent Ruby programs. You know what the tools are, but, unfortunately, I haven’t the space to educate you on all the ways you are now equipped to shoot yourself in the foot. If you’re curious, you can read up on deadlock, livelock, starvation, priority inversion, and all the failure cases for dead processes left holding a lock.
The principle I try to follow, when I’m presented with a problem that needs locking, is to ask if I can work around the need for locking somehow. Could I use a Queue or atomic? Could I isolate this state in one thread and obviate the need for the lock? Is this state really necessary at all?
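As one sketch of isolating state in a single thread: the count below lives only inside the `owner` thread, and other threads talk to it through Ruby's thread-safe `Queue`. The `:incr` and `:done` messages are conventions invented for this example.

```ruby
# All increments flow through a queue to a single owner thread, so the
# count itself is never shared and needs no lock of its own.
requests = Queue.new

owner = Thread.new do
  count = 0
  # Queue#pop blocks until a message arrives; :done is our stop signal.
  while (msg = requests.pop) != :done
    count += 1 if msg == :incr
  end
  count
end

producers = 2.times.map do
  Thread.new { 1_000.times { requests << :incr } }
end
producers.each(&:join)

requests << :done
total = owner.value # Thread#value joins and returns the block's result
p total
```

The queue still synchronizes internally, but that locking is someone else's well-tested code rather than yours.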
To anti-quote Ferris Bueller’s Day Off, when it comes to adding locks, “I highly unrecommend it, if you have the means”.
Future lies
It’s easy to delude yourself when writing software. Do these tests really describe what the application does? Does the documentation really describe how the system works now? Is this comment an accurate assertion on the state of affairs in the application?
My experience is that there’s little to solve this problem besides discipline. Always double check that you haven’t invalidated something that was written down in the margins. If there’s a way to encode something in code instead of prose, do it.
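A small illustration of encoding a rule in code instead of prose. The `RetryPolicy` class and its limit are hypothetical; the point is that a comment can silently go stale, while a check in the constructor fails loudly the moment it stops being true.

```ruby
# Prose version, free to drift out of date:
#   "retries must never exceed 5"
#
# Code version: the rule is checked every time the object is built.
class RetryPolicy
  MAX_RETRIES = 5 # hypothetical limit, for illustration

  attr_reader :retries

  def initialize(retries)
    unless (0..MAX_RETRIES).cover?(retries)
      raise ArgumentError, "retries must be between 0 and #{MAX_RETRIES}"
    end
    @retries = retries
  end
end

policy = RetryPolicy.new(3)
puts policy.retries
```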
Vigilance against future-lies is an ever-mindful challenge.
Too eager to add code
I’m a little too eager to add code. If there’s a mess that needs refurbishing, rather than refactoring, I’m too quick to create a parallel world that is nice and tidy like I’d like it. Problem is, I don’t come back to the code in want of refurbishing enough. I know I should rejigger it to use the new shiny bits. For some reason, call it inertia, I don’t.
This is a shot across my own bow. Prefer refactoring to refurbishing. Prefer refurbishing to jumping into something new. Prefer shipping code to all of the previous tactics.
"Surround yourself with beautiful software"
Building an army of robots, Kyle Neath on GitHub's internal tools. The closing line of this deck is "Surround yourself with beautiful software". One of the most compelling things I've looked at this year.
Etsy's rules of distributed systems
Architecting for change. Complex systems and change:
- Distributed systems are inherently complex.
- The outcome of change in complex systems is hard to predict.
- The outcome of small, frequent, measurable changes are easier to predict, easier to recover from, and promote learning.
I’d have thought all the useful things to say about Etsy were said, at this point, but I’d have thought wrong!
There’s a good saying about designing distributed systems that goes something like “avoid it as long as possible”. I think these three guidelines are worth adding to that saying. Iterate, examine, repeat. Don’t make big, tricky changes. In fact, large changes you can’t recover from are nearly impossible to make anyway, so route around them entirely.
The last bit, “promote learning”, is great too. I follow distributed systems and database designers on Twitter and see tons of great papers and ideas in the exchange. More than that, always teach your teammates about the distributed systems you’re building. The more they know about the design and constraints of the system you’re making, the easier it is for them to work with those systems. If you can’t teach someone to use your system, you probably don’t understand it well enough.