Logs from Finland

Not that kind of log

First, read all of this excellent distillation of distributed systems by Jay Kreps,  The Log: What every software engineer should know about real-time data’s unifying abstraction. Now, consider this.

There’s a moment, when you’re building services and web APIs, when you think you’ve pretty much got it under control. You’ve got an endpoint for every query, a resource for every workflow. All the use-cases seem to be under control. And then, the question appears:

“How can I get access to all the updates to all the data? You know, for [REASONS].”

For APIs exposed to external developers over the web, this is where you’d reach for web hooks or PubSubHubbub. It’s not the best solution, but it works. If you’re building an internal system, you could  use the same approaches, or…you could build a log.

No not that kind of log. An event log, like LinkedIn did with Kafka for their internal systems. Every time your data model changes, every create, update, or delete, you drop an event with all the metadata related to the change. The event goes into some kind of single-producer, multiple consumer queue. Then all the clients that want to know about all the changes to all the things can read events off the queue and do whatever it is they need to for those important REASONS.

If you find this intriguing, this is a lot like replication in database systems. Definitely read LinkedIn’s article on this, and definitely read up on how your database of choice handles replication. And if you’ve built this before and have a good answer to initially populating “replicas” of a database, let me know; I haven’t come up with anything better than “just rsync it”.

Synchronize icon

Dear Sync Diary

Brent Simmons is keeping a diary as he works through implementing sync for Vesper, an iOS note-taking app. Building this sort of thing isn’t easy; cf. it took Cultured Code multiple years to implement it for Things. Thus it’s pretty neat that Simmons is breaking it down into understandable chunks:

If you think you need to implement a synchronization system for your application, try to find a shortcut so that you don’t. If you can’t find a shortcut, you could do worse than starting with these notes.

Tony Romo media circuses over the years

2011: should not have thrown that pass
2012: should not have allowed that pass to be tipped and intercepted
2013: should not have allowed his back to become herniated
2014: should not have gone back for that last donut

The Romo singularity is nigh! Soon Romo’s every facial expression will trigger wild fluctuations in betting lines. Prepare yourself.

Cute aliens

Aliens ate my program’s state

So, Adam from a few years ago, you think you can build a distributed system?

Designing a concurrent system, one that runs across multiple processors using shared memory threads, is difficult because someone can move your cheese. One moment you’ve got food on your plate, and the next moment it’s on someone else’s plate.

Designing distributed systems, one that runs across multiple computers over networks that occasionally do weird things, is difficult because you can’t assume that the message will get through. One moment you’re telling your friend about an awesome new album and the next moment they don’t even know who you are anymore.

In both scenarios, you can’t assume the simplest case. You have to discard Occam’s razor; it’s entirely possible a perfectly devious alien is tinkering with your system, swallowing messages or preventing threads from running. For every operation in your system, you have to ask yourself what could go wrong and how will my system deal with that?

The first jarring thing about these sorts of problems is that you’re probably not using a framework. There are libraries to help you with logging, instrumentation, storage, coordination, and low-level primitives. There aren’t a lot of well-curated collections of opinions expressed as code. There’s no Rails, no Play, no Django, no Cocoa. There’s hardly even a Sinatra, a Celery, a JDBC.

That means you’re going to be designing, for real. Drawing on whiteboard, building prototypes and proofs of concepts. Getting feedback from as many smart people as you can corner. Questioning your assumptions, reading everything you can find on the topic. Looking for tradeoffs that decrease risk and simplify your problem domain.

Accept that you’re unlikely to ever get it entirely right. Those who are really good at it are more like Scotty than Spock. They’re surrounded by levers and measurements, tweaking and rebuilding the system as they go. It’s a fun puzzle!

NSHipster rainbow bar

NSHipster Rainbow bar

My favorite design touch in the Gowalla apps was the rainbow bar. Small wonder that fellow Gowalla alumni Mattt Thompson included one on the landing page for his collection of NSHipster essays that you can now buy as an eBook with cash money. Ed. apparently all the Gumroad pages have a rainbow bar. Mattt’s still awesome.

Database rivalry in the Valley

A couple years ago, Google released an embeddedable key-value database called LevelDB. There was much rejoicing. Recently, Facebook released their fork of LevelDB, RocksDB. There’s a slide deck comparing the two, but the amusing part is the “our way is better!” subtext. A little bit of Valley rivalry there. You can also learn a lot of interesting tidbits about the design of modern, high-performance database systems from the Architecture Guide and Table Format documentation of RocksDB.

Say what you will about tiny, highly-targeted ads. To some extent they are subsidizing a lot of interesting technology development and the open sourcing thereof.