What about event sourcing?

I was chatting about Event Sourced data models with a pal last week. He is really taken by the idea and excited that perhaps it’s a “next big thing” in data modeling. Regretfully, I have an adverse reaction to “next big thing” thinking, and I pointed out that Event Sourced data models are more complex than the equivalent third-normal-form data models. Thus, I said, tooling and education need to mature before Event Sourcing can achieve broad impact.

(Before I proceed, I need to lament some vocabulary. Events, in this context, are not fine-grained language constructs, as in a continuation-passing-style asynchronous system. They are business events, like a sale or a page impression, or technical events, like a request or a cache hit. These are not callbacks.)

That said, there are a few threads to pull from Event Sourcing that look like possible trends:

  • Integration via event logs, using something like Kafka. The low-hanging fruit is to replace background jobs with messages on a Kafka stream. The next step is to think of messaging as reading from a database’s replication log.
  • Intermediate storage of historical event records in Hadoop. Once applications are publishing messages on changes to their data, you can slurp up each topic (one per domain model) into a Hadoop table. Then…
  • ETL of event logs in place of some messaging/REST integrations. Instead of querying another system or implementing a topic consumer, periodically query the event data in Hadoop. Transform it if necessary and load it into another application’s database. LinkedIn has extensive tooling for this and it seems like they have done their homework.
  • Data and databases modeled around the passage of time. Event Sourcing is sort of like introducing the notion of accounting to database records. We can go a step further and model our data such that we can travel forward or back in time, not just recalculate from the past. Git has a model of time. Datomic is modeled on time.
  • Event Sourcing as an extension of third-normal form. We still need normalized data models, and we still need the migration, ORM, and reporting tooling built on top of them. Event Sourcing gives us an additional facet to our data. Now, instead of just having the data model, we have the causality that created it. (If you’re curious, the enabling technologies for storing all that causality are probably the diminishing cost of storage, the adoption of append-only data structures, and data warehouses.)
  • Synchronization streams instead of REST for disconnected clients. When you store the events that brought data to where it is, and you have a total ordering on those events, you can keep disconnected applications up to date by sending them the events they’ve missed. This is way better than clever logic for querying the central database to update state without squashing local state. Hand-wavy analogy: think Git instead of SQLite (both are wonderful software).
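To make the “data plus causality” idea concrete, here’s a minimal sketch of an event-sourced record in Python. State is never updated in place; it’s derived by folding over an append-only log, which also buys you travel back to any point in time. All the names here (`Event`, `apply`, `balance_at`) are mine, for illustration, not from any particular framework.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Event:
    seq: int      # position in the total order of events
    kind: str     # "deposited" or "withdrawn"
    amount: int

def apply(balance: int, event: Event) -> int:
    """Fold one event into the current state."""
    if event.kind == "deposited":
        return balance + event.amount
    if event.kind == "withdrawn":
        return balance - event.amount
    return balance

def balance_at(log: list[Event], seq: int) -> int:
    """Replay the log up to a point in time (an event sequence number)."""
    state = 0
    for event in log:
        if event.seq > seq:
            break
        state = apply(state, event)
    return state

log = [
    Event(1, "deposited", 100),
    Event(2, "withdrawn", 30),
    Event(3, "deposited", 10),
]
print(balance_at(log, 2))  # 70: the balance as of event 2
print(balance_at(log, 3))  # 80: the current balance
```

Note that the third-normal-form row (“balance: 80”) is still there; it’s just a cache of the fold, and any historical state can be recomputed from the log.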

In particular, the synchronization case is where things started clicking for me. Hat tip to David Nolen’s talk on Om Next (start at 17:12) for this. As we continue building native and mobile web apps that are frequently disconnected, we may need an additional tool to augment resource-based workflows. In the same way that Event Sourcing might be built as an extension of third-normal-form data models, I’ll bet event logs as APIs will pop up more often. We may even see event logs entirely usurp resource workflows. Why consume a log and implement updates via REST when you could write a log producer and ship new events off to the server?
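Here’s a hand-wavy sketch of that synchronization stream. The client remembers the sequence number of the last event it applied; on reconnect, it requests only the events after that point and folds them into local state. Every name here is hypothetical, and a real protocol would need to handle the client’s own offline writes, too.

```python
def missed_events(server_log, last_seen_seq):
    """Everything the client hasn't applied yet, in total order."""
    return [entry for entry in server_log if entry[0] > last_seen_seq]

def sync(local_state, last_seen_seq, server_log, apply):
    """Catch a disconnected client up by replaying missed events."""
    for seq, event in missed_events(server_log, last_seen_seq):
        local_state = apply(local_state, event)
        last_seen_seq = seq
    return local_state, last_seen_seq

# The server's totally ordered log of (seq, event) pairs.
server_log = [(1, ("add", 5)), (2, ("add", 3)), (3, ("sub", 2))]

def apply(state, event):
    op, n = event
    return state + n if op == "add" else state - n

# The client applied events 1-2 before going offline, holding state 8.
state, seen = sync(8, 2, server_log, apply)
print(state, seen)  # 6 3
```

No clever diffing against the central database, no risk of squashing local state: the client just replays what it missed.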

The developer impedance mismatch I’m finding with message logs is request-reply thinking. There’s a temptation to recreate REST semantics in Kafka topics. If a consumer fails to process a message, does it stop processing entirely, skip the message, or discard it? Does it notify another consumer via a separate topic, or does it phone home to its developers via an error notification? I haven’t found a satisfying answer to this, but I suspect it’s a matter of time, education, and tooling.
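One commonly discussed option is the “dead letter” topic: rather than halting the consumer or silently dropping the message, divert failures to a separate topic for later inspection or replay. A toy sketch of that policy, with no real Kafka client and entirely made-up names:

```python
def consume(messages, handle, dead_letters):
    """Process messages in order; divert failures rather than halting."""
    processed = 0
    for msg in messages:
        try:
            handle(msg)
            processed += 1
        except Exception as err:
            # Alternative policies: re-raise to stop the consumer cold,
            # or `continue` without recording to discard the message.
            dead_letters.append((msg, str(err)))
    return processed

def handle(msg):
    if msg < 0:
        raise ValueError("negative")

dead = []
print(consume([1, -2, 3], handle, dead))  # 2
print(dead)                               # [(-2, 'negative')]
```

Each choice trades off differently: halting preserves ordering guarantees, skipping preserves throughput, and the dead-letter topic preserves the failed message itself — which is why none of them is a universally satisfying answer.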

Adam Keys @therealadam