Curated
Get in my ears, you dissonant chord
Petrushka chord. Two major chords played a tritone apart. So, it sounds good, except it sounds grating. It's a motif throughout Stravinsky's ballet Petrouchka. Ergo, like everything Stravinsky, get in my ears! Listen to it and learn more about the chord from the awesome "Feynman of classical music" (I just made that up) Leonard Bernstein.
Ruthlessness and fighting for it
Being ruthless to yourself means every time you say “oh, I’ll just open up this internal bit over here…” use that moment to give yourself whatever negative feedback you need to go back and write the correct interface. Imagine the bugs you’ll get later. Give yourself a 12 volt shock through your chair. Picture the sleepless nights chasing down an issue that only happens for that one guy but it’s the guy who signs your paycheck.
I dropped this in my drafts folder months ago and came back to it today. It’s still something I need to hear. Get it working, and then ruthlessly edit and refactor until it’s code that won’t cause you to cringe when others bring it up.
In improv and code, I’ve recently come across the notion that there are things we need to fight for. Fight, not in the sense of conflict, but in the sense that there is an easy or natural way to do something, and then there is the way that maintains our sense of pride and quality. Not necessarily the “right” or high-minded way to do something, but the way that does not leave us feeling compromised as a creative person.
Your homework is to write down the qualities important to you, the ones that make you proud of your work and happy to share it. Then work from this checklist every day:
- Write the code, rehearse the scene, play the song, etc.
- Decide whether it expresses your qualities.
- If it does, ship it. If it doesn't, edit ruthlessly until it does.
Rinse/repeat/tell everyone about it.
Working with Ruby's GVL
Visualising the Ruby Global VM Lock. A nice commit-by-commit look at how extensions for Ruby 1.9 work with the GVL, what that looks like as tests run, and how to release the GVL to allow for better parallelism.
How Ruby IO is formed
Ruby's IO Buffering And You! Jesse Storimer screencasts his way through what happens when you read and write to files and sockets in Ruby, explaining the behavior and spelunking through Rubinius' implementation of IO. You'll learn stuff. If you want to learn even more stuff, check out Jesse's new book Working with TCP Sockets. Jesse is fantastic at describing Unixy things concisely; you'll like it.
A handful of useful project mantras
You could do a lot worse than following the heuristics set out by this Software Architecture cheat sheet. The tip I need to follow more often is "Is There Another Way"; I frequently get way too caught up in my first idea, which is usually too simplistic or requires too much architecture. The tip I often try to guide people towards is "What If I Didn't Have This Problem?"; routing around problems or trying to reduce them to problems that require less code is a super-powerful judo chop.
bitly's nsq has some good ideas
NSQ is a realtime message processing system designed to operate at bitly's scale, handling billions of messages per day. It promotes distributed and decentralized topologies without single points of failure, enabling fault tolerance and high availability coupled with a reliable message delivery guarantee.
No SPOFs and reliable message delivery, without relying on something like ZooKeeper, is a big claim. They have some novel approaches to these problems.
First, they run an intermediary daemon, `nsqlookupd`, between the producers/consumers and the actual queues. These daemons monitor all the available queue servers and tell the clients what to connect to; applications never need to be configured with the actual queue servers. They then run multiple lookup daemons, which are stateless and don’t need to agree with each other in order for the system to operate properly.
Reliable message delivery is provided with at-least-once delivery semantics. They require all consumers to de-duplicate messages or perform only idempotent operations. Not exactly legacy-friendly, as many applications are coded with the assumption of a closed, one-shot world. But. Idempotence: I highly recommend it if you have the means.
If you need to prevent losing messages due to the FBI stealing your servers, which is something you definitely need to account for, you can set up redundant pairs of servers and rely on deduplication/idempotence to make sure you’re only processing messages once, even if you consume them multiple times.
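To make the deduplication idea concrete, here’s a rough sketch of the consumer side. This is not NSQ’s client API; the `DedupingConsumer` name and the Redis key scheme are things I made up for illustration, and it assumes every message carries a unique id:

```ruby
require "redis"

# A consumer that tolerates at-least-once delivery by remembering which
# message ids it has already processed. Illustrative only; not NSQ's
# client library.
class DedupingConsumer
  def initialize(redis: Redis.new, ttl: 86_400)
    @redis = redis
    @ttl   = ttl
  end

  def handle(message)
    # SET with NX succeeds only the first time this id shows up, so a
    # redelivered message (or its copy from a redundant server) is skipped.
    first_time = @redis.set("seen:#{message[:id]}", "1", nx: true, ex: @ttl)
    process(message) if first_time
  end

  private

  def process(message)
    # The actual work; keep it idempotent anyway, just in case.
  end
end
```

The same trick is what makes the redundant-pair setup above workable: consuming a message twice is fine, because only the first pass gets past the seen-id check.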
In summary: lots of good ideas here. Perhaps some of them could be applied to how people are using Resque?
I got Clojure stacks
Here’s a Sunday afternoon hack. It’s a “stack” machine implemented in Clojure. I intended for it to be a stack machine, no airquotes, but I got it working and realized what I’d really built was a machine with two registers and instructions that treat those two registers as a stack. Pretty weird, but it’s not bad for a weekend hack.
I’m going to break my little machine down and highlight things that will feel refreshingly different to someone, like me, who has spent the past several years in object-oriented languages like Ruby. What follows are observations; I’m still very new to Clojure, despite familiarity with the concepts, so I’ll pass on making global judgements.
Data structures as programs as data
I’ve seen more than one Rubyist, myself included, say that code-as-data, a concept borrowed from Lisp’s syntax, is possible and regularly practiced in Ruby. DSLs and class-oriented little languages accomplish this, to some degree. In my experience, this metaprogramming is really happening at the class level, using the class to hold data that dynamic code parses to generate new behaviors.
In contrast, because Clojure is a Lisp, programs really are data. To wit, this is the crux of my stack machine; the actual stack machine program is a Clojure data structure that in turn specifies some Clojure functions to execute:
(def program
  [['mpush 1]
   ['mpush 2]
   ['madd]
   ['mpush 4]
   ['msub]
   ['mhalt]])

(run program)
If you’ve never looked at Clojure or Lisp code, just squint and I bet you’ll keep up. This snippet defines a global variable, of sorts, `program`, whose value is a list of lists (think Arrays) specifying the instructions in my stack machine program. In short, this program pushes two values on the stack, 1 and 2, adds them, pushes another value 4, subtracts 4 from the result of the addition, and then halts, which prints out the current state of the “stack” registers.
I’ve got a function named `run` which takes all these instructions, does some Clojure things, then hands them off to instruction functions for execution.
Some familiar idioms
Let’s look at `run`. It’s really simple.
(defn run [instructions]
  (reduce execute initial-state instructions))
This function takes one argument, `instructions`, a Clojure collection (generally called a seq; this one in particular is a vector). Clojure has an amazing library of functions that operate on collections, just as Ruby has `Enumerable`. In fact, `reduce` in Clojure is the same idea as `inject` in Ruby (`reduce` is aliased to `inject` in Ruby!). The way I’m calling it says “iterate over a collection, `instructions`, calling `execute` on each item; on the first iteration, use `initial-state` as the initial value of the accumulated collection”.
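If that `reduce` call reads strangely, here’s the same shape in Ruby’s `inject`, with a stubbed-out `execute` so the snippet stands alone:

```ruby
# The same call shape as (reduce execute initial-state instructions):
# start from an initial value, then feed each item plus the accumulated
# value so far into a function, keeping whatever it returns.
def execute(state, inst)
  state.merge(last_instruction: inst.first) # stand-in for the real thing
end

initial_state = { op_a: nil, op_b: nil }
instructions  = [[:mpush, 1], [:mpush, 2], [:madd]]

final_state = instructions.inject(initial_state) do |state, inst|
  execute(state, inst)
end
# => {:op_a=>nil, :op_b=>nil, :last_instruction=>:madd}
```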
`initial-state` is another global variable whose value is a mapping (in Ruby, a hash) that maintains the state of the machine. It has two keys, `op-a` and `op-b`, representing my two stack-ish registers.
(def initial-state
  {:op-a nil :op-b nil})
Now you’d expect to find an `execute` function that takes a collection plus a value and generates a new version of the collection, just like Ruby’s `inject`. And here that function is:
(defn execute [state inst]
  (let [fun (ns-resolve *ns* (first inst))
        params (rest inst)]
    (apply fun [params state])))
This one might require extra squinting for eyes new to Clojure. `execute` takes two arguments: the current state of the stack machine, `state`, and the instruction to execute, `inst`. It then uses `let` to create local variables based on the values of the function’s parameters. I use Clojure’s mechanism for turning a quoted variable name (quoting, in Lisp, means escaping a variable name so the interpreter doesn’t try to evaluate it) into a function reference. Because the instruction is of the form `[instruction-name arg arg arg ...]`, I use `first` and `rest` to split the instruction into the function name, bound to `fun`, and the argument list, bound to `params`.
The meat of the function “applies” the function I extracted in the `let` block to the arguments I extracted out of the instruction. Think of `apply` like `send` in Ruby; it’s a way to call a function when you have a reference to it.
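To stretch the analogy, here’s a rough Ruby equivalent of that dispatch; the `mpush` method and the arguments are just made up to mirror the machine:

```ruby
# Dispatching by name, roughly the way apply dispatches through a resolved
# function reference. Splatting the argument array is the closest thing to
# apply spreading a seq of arguments.
def mpush(params, state)
  { op_a: params.first, op_b: state[:op_a] }
end

name = :mpush
args = [[1], { op_a: nil, op_b: nil }]
send(name, *args) # => {:op_a=>1, :op_b=>nil}
```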
The sharp reader would now start searching for a bunch of functions, each of which implements an instruction for our stack machine. And so…
Some boilerplate arrives
Here is the implementation for `mpush`, `madd`, and `mhalt`:
(defn mpush [params state]
  (let [a (state :op-a)
        b (state :op-b)
        v (first params)]
    {:op-a v :op-b a}))

(defn madd [params state]
  (let [a (state :op-a)
        b (state :op-b)]
    {:op-a (+ a b) :op-b nil}))

(defn mhalt [params state]
  (println state))
Each instruction takes some arguments and the state of the machine, does some work, and returns a new state of the stack machine. Easy, and oh-so-typically functional!
These instructions are where I’d introduce something clever-ish in Ruby. That `let` where the register values are extracted feels really boilerplate-y. In Ruby, I know what I would do about that: a method taking a block, probably.
I’m not sure how I’d clean this up in Clojure. A macro, a function abstraction? I leave it as an exercise to the reader, and to myself, to find something that involves less copypasta each time a new instruction is implemented.
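For the record, here’s roughly what that block-taking method would look like back in Ruby; `with_registers` is a made-up name, and this is a sketch of the idea rather than a port of the machine:

```ruby
# One helper owns the boilerplate of pulling the registers out of the
# state; each instruction becomes a block that returns the new state.
def with_registers(state)
  yield state[:op_a], state[:op_b]
end

def mpush(params, state)
  with_registers(state) { |a, _b| { op_a: params.first, op_b: a } }
end

def madd(_params, state)
  with_registers(state) { |a, b| { op_a: a + b, op_b: nil } }
end

madd([], { op_a: 1, op_b: 2 }) # => {:op_a=>3, :op_b=>nil}
```

A Clojure version would presumably be a small higher-order function or macro doing the same destructuring; I just haven’t written it yet.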
I found some pleasant surprises in this foray into Clojure:
- Building programs from bottom-up functions in a functional language is at least as satisfying as doing the same with a TDD loop in an object-oriented language. It is just as conducive to dividing a problem into quickly solved blocks and then putting the whole thing together. It does, however, lack a repeatable verification artifact as a secondary output.
- At first I was a little skeptical of the fact that Clojure mappings (hashes) can be treated as data structures, by passing them to functions, or as functions, by calling them with the key to extract as the argument. In practice, this is a really awesome thing, and it’s a nice way to write one’s own abstractions as well. There’s something to using higher-order functions more prevalently than Ruby does.
- The JVM startup isn’t quick in absolute terms, but at this point it’s faster than almost any Rails app, and many pure Ruby apps, to boot. Damning praise for the JVM and Ruby, but the take-away is I never felt distracted or out-of-flow due to waiting around on the JVM.
Bottom line: there’s a lot to like in Clojure. It’s likely you’ll read about more forays into Clojure in this space.
Faster, computer program, kill kill!
Making code faster requires insight into the particulars of how computers work. Processor instructions, hardware behavior, data structures, concurrency; it’s a lot of black art. Here’s a few things to read on the forbidden lore of fast programs:
Fast interpreters are made of machine sympathy. Implementing Fast Interpreters. What makes the Lua interpreter, and some JavaScript interpreters, so quick. Includes assembly and machine code details. Juicy!
Lockless data structures, the easy way. A Java lock-free data structures deep dive. How do those fancy java concurrent libraries work? Fancy processor instructions! Great deep dive.
Now is an interesting time to be a bottleneck. Your bottleneck is dead. Hardware, particularly IO, is advancing such that bottlenecks in code are exposed. If you’re running on physical hardware, especially if you have solid-state disks, your bottleneck is probably language-bound or CPU-bound code.
Go forth, read a lot, measure twice (beware the red herrings!), and make faster programs!
When to Sinatra, when to Rails
On Rails, Sinatra, and picking the right tool for the job. Pedro Belo, of Heroku fame, finds Rails is way better for pure-web apps and Sinatra is way better for pure-API apps. Most of it comes down to this: Rails has better tooling, and Sinatra is better for scratching itches, which happens a lot more in APIs than in applications. I’m not ready to pronounce this the final word, but what he’s saying lines up with much of my experience.
That said, you can get pretty far with a Rails API by segregating it from your application. That is, your app controllers inherit from `ApplicationController` and your API controllers inherit from `ApiController`. This keeps the often wildly different needs of applications and APIs nice and distinct.
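Sketched out, the split might look something like this; the token check and the `Stats` model are placeholders, and the filter names assume a reasonably modern Rails:

```ruby
# Application-facing controllers: sessions, CSRF protection, HTML views.
class ApplicationController < ActionController::Base
  protect_from_forgery
end

# API-facing controllers: token auth, JSON in and out, no CSRF dance.
class ApiController < ActionController::Base
  before_action :authenticate_with_token!

  private

  # Placeholder token check; substitute whatever auth scheme the API uses.
  def authenticate_with_token!
    head :unauthorized unless ApiToken.valid?(request.headers["Authorization"])
  end
end

module Api
  class StatsController < ApiController
    def show
      render json: Stats.recent # placeholder model
    end
  end
end
```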
Common sense code checks
Etsy’s Static Analysis for PHP. This isn’t as complicated as you might think. While Facebook’s HipHop is used, and is quite sophisticated, a lot of this is just common sense. Trigger code reviews when oft-misused functions are used or when functions that involve security things are introduced.
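The low-tech end of this is surprisingly approachable. Here’s a toy version of the idea, nothing like Etsy’s actual tooling; the watchlist and the git invocation are stand-ins:

```ruby
#!/usr/bin/env ruby
# Scan staged changes for functions we've decided deserve extra eyes,
# and nag for a code review if any of them show up in added lines.
WATCHLIST = %w[eval exec system mysql_query md5]

diff = `git diff --cached --unified=0`
flagged = diff.lines.select do |line|
  line.start_with?("+") && WATCHLIST.any? { |fn| line.include?("#{fn}(") }
end

unless flagged.empty?
  puts "Added lines touching functions on the review watchlist:"
  puts flagged
  puts "Consider asking for a code review before this ships."
end
```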
This stuff is great for an intern or new team member to get a quick win with. So next time you bring someone onto your team, why not turn them loose on these kinds of quick, big wins?
Cardinal sins
It is conceivable that a really good machine can learn our hash algorithm really well, but in the case of string hashing we still have to walk some memory to give us reasonable assurance of unique hash codes. So there's performance sin #1 violated: never read from memory.

Avoiding Hash Lookups in a Ruby Implementation, on the quest to eliminate the use of ad-hoc hashes inside JRuby. I love that the cardinal sin of a runtime is a memory read. It makes avoiding random database lookups in web applications look like a walk in the park.
On the other hand, consider how much fun it is to write compilers; their cardinal sin is a conditional, or anything else that would stall the processor pipeline. If that seems pedestrian, then consider the cardinal sin of a processor designer: anything that takes longer than one clock cycle, or half a billionth of a second if you’re keeping score at home.
Three application growth stories
First you grow your application, then you grow your organization, and then you get down to the metal and eke out all the performance you can.
Evolution of SoundCloud’s Architecture: this is how you grow an application without eating the elephant too soon. I would love to send this back to Adam from two years ago. Note that they ended up using RabbitMQ as a broker instead of Resque as a job queue. This nuanced position put them in a pretty nice place, architecturally.
Addicted to Stable is equal parts “hey, you should automate things and use graphs/monitoring to tell you when things break” and “look at all of GitHub’s nifty internal tools”. Even though I’ve seen the latter a few times already, I like my pal John Nunemaker’s peek into how it all comes together.
High Performance Network Programming on the JVM explains how to choose network programming libraries for the JVM, covers some pros and cons to know about, and lays out a nice conceptual model for building network services. Seems like this is where you want to start once you reach the point where your application needs to serve tens of thousands of clients concurrently.
I’m going to keep posting links like these until, some day, I feel like I’m actually doing it right. Until then, stand on other people’s shoulders, learn from experience.
Know a feedback loop
TDD is one way to create a feedback loop for building your application. Spiking code out and then stabilizing it is another:
For most people, TDD is a mechanism for discovery and learning. For some of us, if we can write an example in our heads, our biggest areas of learning probably lie elsewhere. Since ignorance is the constraint in our system, and we’re not ignorant about much that TDD can teach us, skipping TDD allows us to go faster. This isn’t true of everything. Occasionally the feedback is on some complex piece of business logic. Every time I’ve tried to do that without TDD it’s stung me, so I’m getting better at working out when to do it, and when it’s OK to skip it.
TDD helps me a lot when I have an idea what the problem looks like. Spiking out a prototype and backfilling tests helps me when I don’t know what the problem looks like.
You’re possibly different in how you approach problems. If you’re flying more by the seat of your pants, or you aren’t including the composition and organization of the code in your feedback loop, I will probably insist you work on something that isn’t in the core layers of the application. That’s cool though; as long as you have any feedback loop that will nudge you towards better discovering and solving the core problem, we’re cool.
The test-driven astronaut
Don't Make Your Code "More Testable", make the design of your program better. Snappy test suites are all the rage, but that misses the point of even writing tests: create a feedback loop to know when your program works and when your program is organized well. Listen carefully to the whispers in your code; if you're spending all your time writing tests or shuffling code instead of adding features, improving features, or shipping features, then you're falling for the siren song of the test-driven astronaut.
Simplicators for sanity
For those rainy days when integrating with a not-entirely sane system is getting you down:
A Simplicator introduces a new seam into the system that did not exist when the service's byzantine API was used directly. As well as helping us test the system, I've noticed that this seam is ideal for monitoring and regulating our systems' use of external services. If a widely supported protocol is used, we can do this with off-the-shelf components.
The Simplicator is a component that lives outside the architecture of your system. It exports a sane interface to your system. You test it separately from your system. Its only purpose in life is to deal with the insanity of others.
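As a sketch of the shape, here’s a tiny Sinatra-flavored Simplicator; the upstream URL, parameter names, and response handling are all invented for illustration:

```ruby
require "sinatra"
require "net/http"
require "json"

# The Simplicator exposes one sane, JSON-speaking endpoint to our system
# and hides the byzantine upstream API behind it.
UPSTREAM = URI("http://legacy.example.com/SvcGateway/FetchCust")

get "/customers/:id" do
  # Translate our tidy request into whatever the legacy service expects.
  upstream = UPSTREAM.dup
  upstream.query = URI.encode_www_form("CUST_NUM" => params[:id], "FMT" => "XML2")
  response = Net::HTTP.get_response(upstream)

  halt 502, "upstream unhappy" unless response.is_a?(Net::HTTPSuccess)

  # Hand back only the sane subset our system actually cares about.
  content_type :json
  { id: params[:id], raw: response.body }.to_json
end
```

Because it’s a separate, boring little service, you can point monitoring at it, test it in isolation, and swap the upstream without the rest of your system noticing.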
Hell is other people’s systems; QED this is a heavenly idea.
Smelly obsessions
Get Rid of That Code Smell - Primitive Obsession:
Think about it this way: would you use a string to represent a date? You could, right? Just create a string, let’s say "2012-06-25" and you’ve got a date! Well, no, not really – it’s a string. It doesn’t have semantics of a date, it’s missing a lot of useful methods that are available in an instance of Date class. You should definitely use Date class and that’s probably obvious for everybody. This is exactly what Primitive Obsession smell is about.
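A tiny illustration of the difference the quote is pointing at (dates picked arbitrarily):

```ruby
require "date"

# As a string it only looks like a date: no #month, no #monday?, and
# "2012-06-25" + 30 just raises a TypeError.
release = "2012-06-25"
release.class          # => String, with only string behavior

# As a Date, the semantics come along for free.
release = Date.parse(release)
(release + 30).to_s    # => "2012-07-25"
release.monday?        # => true
```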
Rails developers can fall into another kind of obsession: framework obsession. Rails gives you folders for models, views, controllers, etc. Everything has to be one of those. Logic is shoehorned into models instead of put in objects unrelated to persistence. Controller methods and helpers grow huge with conditionals and accreted behavior.
This is partially an education and advocacy problem. Luckily, folks like Avdi Grimm, Corey Haines, Gary Bernhardt, and Steve Klabnik, amongst others, are spreading the word of how to use object oriented principles to design Rails applications without obsessing over the constructs in the Rails framework.
The second part is practice. Once you’ve educated yourself and bought into the notion that a Rails app isn’t all Rails classes, you’ve got to practice and struggle with the concepts. It won’t be pretty the first time; at least, it wasn’t for me. But with time, I’ve come to feel far better about how I design applications using both Rails principles and object-oriented principles.
How to think about organizing folders: don't.
Mountain Lion’s New File System:
Folders tend to grow deeper and deeper. As soon as we have more than a handful of notions, or (beware!) more than one hierarchical level of notions, it gets hard for most brains to build a mental model of that information architecture. While it is common to have several hierarchy levels in applications and file systems, they actually don’t work very well. We are just not smart enough to deal with notional pyramids. Trying to picture notional systems with several levels is like thinking three moves ahead in chess. Everybody believes that they can, but only a few skilled people really can do it. If you doubt this, prove me wrong by telling me what is in each file menu in your browser…
A well-considered essay on the non-recursive design of folders in iCloud, how people think about organizing documents, the emotions of organizing documents, and how it comes together in an app like iCloud. Great reading.
"Surround yourself with beautiful software"
Building an army of robots, Kyle Kneath on GitHub's internal tools. The closing line of this deck is "Surround yourself with beautiful software". One of the most compelling things I've looked at this year.