Cassandra at Gowalla

Over the past year, I’ve done a lot of work making Cassandra part of Gowalla’s multi-prong database strategy. I recently spoke at Austin on Rails on this topic, doing a sort of retrospective on our adoption of Cassandra and what I learned in the process. You can check out the slide deck, or if you’re a database nerd like me, dig into the really nerdy details below.

Why does Gowalla use Cassandra?

We have a few motivations for using Cassandra at Gowalla. First off, it’s become our database of choice for applications with relatively fixed query patterns that, for us to succeed, need to handle a rapidly growing dataset. Cassandra’s read and write paths are optimized for these kinds of applications. It’s good at keeping the hot subset of a database in memory while keeping queries that require hitting disk pretty quick too.

Cassandra is also great for time-oriented applications. Any time we need to fetch data based primarily on some sort of timestamp, Cassandra is a great fit. It’s a bit unique in this regard, and that’s one of the main reasons I’m so interested in Cassandra.

Cassandra is a Dynamo-style database, which yields some nice operational aspects. If a node goes down overnight, we don’t take an availability hit; the ops people can sleep through the night and fix it later. The Cassandra developers have also done a great job of eliminating the cases where one needs to restart an entire Cassandra cluster at once, which would result in downtime.

When does Gowalla not use Cassandra?

I don’t think Cassandra is all that great for iterating on prototypes. When you’re not sure what your data or queries will end up looking like, it’s hard to build a schema that works well with Cassandra. You’re also unlikely to need the strengths that a distributed, column-oriented database offers at that stage. Plus, there aren’t any options for outsourced Cassandra right now, and early-stage applications/businesses rarely want to devote expertise to hosting a database.

Applications that don’t grow data quickly, or that can fit their entire dataset in memory on a pair of machines, don’t play to Cassandra’s strengths either. Given that you can get a machine with a few dozen gigabytes of memory for the cost of rent in the valley, sometimes it does pay to scale vertically instead of horizontally as Cassandra encourages.

Cassandra applications at Gowalla

We have a handful of applications going that use Cassandra:

Audit: Stores ActiveRecord change data to Cassandra. This was our training-wheels trial project where we experimented with Cassandra to see if it was useful for us. It was incrementally deployed using rollout and degrade. Worked well, so we proceeded.
Chronologic: This is an activity feed service, storing the events and timelines in Cassandra. It started off life as a secondary index cache, but became a system of record in our latest release. It works great operationally, but the query/access model didn’t always jibe with how web developers expected to access data.
Active stories: We store “joinability” data for users at a spot so we can pre-merge stories and prevent proliferation of a bunch of boring, one-person stories. This was built by Brad Fults and integrated in one pull request a few weeks before launch. The nice thing about this one was that it was able to take advantage of Cassandra’s column expiration and fit really nicely into Cassandra’s data model.
Social graph caches: We store friend data from other systems so we can quickly list/suggest friends when they connect their Gowalla profile to Facebook or Twitter. This started life on Redis, but the data was growing too quickly. We decoupled it from Redis and wrote a Cassandra backend over a few days. We incrementally deployed it and got Redis out of the picture within two weeks. That was pretty cool.
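
To make the column expiration idea from the active stories item a little more concrete, here is a minimal in-memory sketch in Python. It is not Gowalla's code and it does not talk to Cassandra; the class name `ExpiringColumns`, the row key `spot:42`, and the column names are all made up for illustration. It only models the behavior that makes expiring columns handy: you write a value with a time-to-live, and reads stop returning it once that time has passed, without any cleanup code in the application.

```python
import time


class ExpiringColumns:
    """Toy model of rows whose columns disappear after a time-to-live.

    Hypothetical illustration only; a real cluster handles expiry itself.
    """

    def __init__(self, clock=time.time):
        # The clock is injectable so the behavior can be checked without sleeping.
        self._clock = clock
        self._rows = {}  # row key -> {column name: (value, expires_at)}

    def insert(self, row_key, column, value, ttl_seconds):
        # Record the value along with the moment it stops being visible.
        expires_at = self._clock() + ttl_seconds
        self._rows.setdefault(row_key, {})[column] = (value, expires_at)

    def get_row(self, row_key):
        # Return only the columns that have not expired yet.
        now = self._clock()
        row = self._rows.get(row_key, {})
        return {col: val for col, (val, exp) in row.items() if exp > now}


# Usage with a fake clock: two users show up at the same (hypothetical) spot.
fake_now = [1000.0]
store = ExpiringColumns(clock=lambda: fake_now[0])
store.insert("spot:42", "user:1", "joinable", ttl_seconds=60)
fake_now[0] += 30
store.insert("spot:42", "user:2", "joinable", ttl_seconds=60)
fake_now[0] += 45  # user:1 has now expired, user:2 has not
print(store.get_row("spot:42"))
```

The appeal of this shape for something like joinability data is that stale entries simply age out, so the application never has to schedule deletes.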

What worked?

Stable at launch. A couple of weeks before launch, I switched to “devops” mode. Adam McManus, our ops guy, and I focused on tuning Cassandra for better read performance and on resolving stability problems. We ended up bringing in a DataStax consultant to help us verify we were doing the right things with Cassandra. The result of this was that, at launch, our cluster held up well and we didn’t have any Cassandra-related problems.
Easy to tune. I found Cassandra interesting and easy to tune. There is a little bit of upfront research in figuring out exactly what the knobs mean and what the reporting tools are saying. Once I figured that out, it was easy to iteratively tweak things and see if they were having a positive effect on the performance of our cluster.
Time-series or semi-granular data. Of the databases I’ve tinkered with, Cassandra stands out in terms of modeling time-related data. If an application is going to pull data in time-order most of the time, Cassandra is a really great place to start. I also like the column-oriented data model. It’s great if you mostly need a key-value store, but occasionally need a key-key-value store.
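
If the “key-key-value” idea is unfamiliar, a rough way to picture it is a hash of sorted hashes: a row key leads to a set of columns, and the columns stay sorted by name. When the column name is a timestamp, reading in time order is just a slice. The sketch below models that in plain Python. It is an in-memory toy, not a Cassandra client, and the names (`WideRowStore`, `timeline:alice`, the sample events) are hypothetical.

```python
from bisect import insort
from collections import defaultdict


class WideRowStore:
    """Toy key-key-value store: row key -> columns kept sorted by timestamp."""

    def __init__(self):
        # Each row is a list of (timestamp, value) pairs, always in sorted order.
        self._rows = defaultdict(list)

    def insert(self, row_key, timestamp, value):
        # insort places the pair in order, so reads never have to sort.
        insort(self._rows[row_key], (timestamp, value))

    def latest(self, row_key, count):
        # Newest-first slice of a single row.
        columns = self._rows.get(row_key, [])
        tail = columns[max(len(columns) - count, 0):]
        return [value for _, value in reversed(tail)]

    def between(self, row_key, start, finish):
        # Values with start <= timestamp <= finish, oldest first.
        columns = self._rows.get(row_key, [])
        return [value for ts, value in columns if start <= ts <= finish]


# Usage: writes arrive out of order, reads come back in time order.
store = WideRowStore()
store.insert("timeline:alice", 300, "checked in at a coffee shop")
store.insert("timeline:alice", 100, "added a friend")
store.insert("timeline:alice", 200, "earned a pin")
print(store.latest("timeline:alice", 2))
print(store.between("timeline:alice", 100, 200))
```

The point of the sketch is the access pattern: one lookup by row key, then a cheap ordered slice, which is the kind of query a timeline needs most of the time.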

What would we do differently next time?

Developer localhost setups. We started using Cassandra in the 0.6 release, when it was a giant pain to set up locally (XML configs). It’s better now, but I should have put more energy into helping the other developers on our team get Cassandra up and working properly. If I were to do it again, I’d probably look into leaning on the install scripts the cassandra gem includes, rather than Homebrew and a myriad of scripts to hack the Cassandra config.
Eventual consistency and magic database voodoo. Cassandra does not work like MySQL or Redis. It has different design constraints and a relatively unique approach to those constraints. In advocating and explaining Cassandra, I think I pitched it too much as a database nerd and not enough as “here’s a great tool that can help us solve some problems”. I hope that CQL makes it easier to put Cassandra in front of non-database nerds in terms they can easily relate to, so they can be productive right away.
Rigid query model. Once we got several million rows of data into Cassandra, we found it difficult to quickly change how we represented that data. It became a game of “how can we incrementally rejigger this data structure to have these other properties we just figured out we want?” I’m not sure that’s a game you can easily win at with Cassandra. I’d love to read more about building evolvable data structures in Cassandra and see how people are dealing with high-volume, evolving data.

Things we’ll try differently next time

More like a hash, less like a database. Having developed a database-like thing, I have come to the conclusion that developers really don’t like them very much. ActiveRecord was hugely successful because it was so much more effective than anything previous to it that tried to make databases just go away. The closer a database is to one of the native data structures in the host language, the better. If it’s not a native data structure, it should be something they can create in a REPL and then say “magically save this for me!”
Better tools and automation. That said, every abstraction leaks. Once it does, developers want simple and useful tools that let them figure out what’s going on, what the data really looks like, tinker with it, and get back to their abstracted world as quickly as possible. This starts with tools for setting up the database, continues through interacting with it (database REPL), and extends to operating it (logging, introspection, etc.). Cassandra does pretty well with these tools, but they’re still a bit nerdy.
More indexes. We didn’t design our applications to use secondary indexes (a great feature) because they didn’t exist just yet. I should have spent more time integrating this into the design of our services. We got bitten a lot towards the end of our release cycle because we were building all of our indexes in the application and hadn’t designed for reverse indexes. We also designed a rather coarse schema, which further complicated ad-hoc querying, which is another thing non-database-nerds love.
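
For readers who haven't built indexes in the application before, here is a small Python sketch of what that means in general. It is an illustration under assumptions, not our actual schema: `EventStore`, the `user` and `spot` fields, and the sample IDs are invented. Every write updates the primary record plus one lookup table per query you want to answer; any query you didn't plan a table for means scanning everything or backfilling a new index.

```python
from collections import defaultdict


class EventStore:
    """Toy store where the application maintains its own indexes."""

    def __init__(self):
        self._events = {}                   # primary: event id -> event
        self._by_user = defaultdict(list)   # index: user -> event ids
        self._by_spot = defaultdict(list)   # reverse index: spot -> event ids

    def record(self, event_id, user, spot):
        # One logical write fans out to the record and every index.
        self._events[event_id] = {"user": user, "spot": spot}
        self._by_user[user].append(event_id)
        self._by_spot[spot].append(event_id)

    def events_for_user(self, user):
        return [self._events[i] for i in self._by_user.get(user, [])]

    def events_at_spot(self, spot):
        # Only answerable cheaply because the reverse index was planned up front.
        return [self._events[i] for i in self._by_spot.get(spot, [])]


# Usage with hypothetical data.
events = EventStore()
events.record("e1", user="alice", spot="coffee-shop")
events.record("e2", user="bob", spot="coffee-shop")
events.record("e3", user="alice", spot="taco-truck")
print(events.events_for_user("alice"))
print(events.events_at_spot("coffee-shop"))
```

The design cost shows up in `record`: each new way of querying adds another structure to keep in sync on every write, which is why deciding on reverse indexes late in a release cycle hurts.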

What’s that mean for me?

Cassandra has a lot of strengths. Once you get to a scale where you’re running data through a replicated database setup and some kind of key-value database or cache, it makes sense to start thinking about Cassandra. There are a lot of things you can do with it, and it lets you cheat in interesting ways. Take some extra time to think about the data model you build and how you’ll change it in the future. Like anything else, build tools for yourself to automate the things you do repeatedly.

Don’t use it because you read a blog post about it. Use it because it fits your application and your team is excited about using it.