Event sourcing is a powerful architectural pattern that records all changes made to an application’s state, in the sequence in which the changes were originally applied. This sequence serves both as the system of record, from which current state can be derived, and as an audit log of everything that happened within the application over its lifetime. Event sourcing promotes decentralized change handling and state querying, meaning the pattern scales well and is a good fit for many types of systems that already deal with event processing, or that are looking to gain some of the many benefits offered by such a design.

What is Event Sourcing?

Subject matter experts (SMEs) typically describe their systems as a combination of entities, representing containers of system state, and change events, representing how entities get altered as a result of processing a variety of inputs within defined business processes. Events are often triggered in response to commands issued by users of the system, by background processes, or by external system integrations.

At its core, event sourcing focuses on system change events.

Many architectural patterns deal with entities as their primary concern; these patterns consider how entities get stored, accessed and modified. Within this architectural view, change events are often invisible consequences of entity mutation.

Typically the system of record within these architectural patterns is the entity store, such as a relational database or a document store. While events can still be present in such architectures, they’re often transient in nature, possibly decoupled from the entities they relate to and hidden behind layers of business logic.

Event sourcing flips this design view around, focusing on how events get represented, how they get persisted, and how a sequence of events can get used to derive entity state. The system of record is a sequential log of all events that have occurred during a system’s lifetime.

The following illustrates what an event store vs. an entity store would look like for the same system (keep reading to find out what these represent):

[Figure: the same system recorded as an event store vs. as an entity store]

By promoting events as the primary architectural concept, event sourcing is also a domain modelling paradigm that encourages closer alignment with an SME’s view of the system. Designing systems with a focus on events and event logs provides several main benefits:

  • Helps reduce impedance mismatches and the need for concept mapping, allowing technology teams to ‘speak the same language’ as the business when discussing the system
  • Encourages command/query responsibility separation, allowing writes and reads to be independently optimized
  • Provides state temporality and audit history as a matter of course – answering the questions of what the system looked like at specific times in the past, and which events had occurred up to those points

How does Event Sourcing work?

Consider a simple banking example of a current account. In such a system there is a single entity, representing a bank account. For simplicity, we assume there is a single account, so it does not need to be explicitly identifiable via an account number or otherwise. The account contains the current account balance.

Two commands are available – instructions to deposit and withdraw money – and each needs to specify the amount of money to deposit or withdraw. A business rule ensures a withdrawal command can only be processed if the requested amount is equal to or less than the current account balance.

Given this design, two events can be identified – Account Credited and Account Debited. These events include the amount of money that was deposited or withdrawn. This could even be simplified to a single event with either a positive or negative amount, but for the sake of this example they are kept separate.

The data model universe is therefore: two commands (deposit and withdraw), two events (Account Credited and Account Debited) and one entity (the bank account).
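
In place of the original diagram, here is a minimal TypeScript sketch of the same model; the type and field names are illustrative choices, not part of the original design:

```typescript
// Commands: requests to change the system (imperative mood).
type Deposit  = { kind: "deposit";  amount: number };
type Withdraw = { kind: "withdraw"; amount: number };
type Command  = Deposit | Withdraw;

// Events: facts recorded after a command succeeds (past tense).
// recordedAt is an ISO 8601 timestamp, giving each event its place in time.
type AccountCredited = { kind: "AccountCredited"; amount: number; recordedAt: string };
type AccountDebited  = { kind: "AccountDebited";  amount: number; recordedAt: string };
type AccountEvent    = AccountCredited | AccountDebited;

// Entity: the single bank account, holding derived state.
type BankAccount = { currentBalance: number };
```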

Notice that the events are ‘past tense’ – they specify what happened in the system at the time they were recorded, and are only recorded if processing a command was successful. With designs like this, care needs to be taken so that commands are not confused with events, especially if they effectively mirror each other.

Given the following command sequence:

1. deposit { amount: 100 }
2. withdraw { amount: 80 }
3. withdraw { amount: 50 }

The most basic event sourcing implementation requires an event log, which is just a sequence of events. The system processing these commands would end up with an event log of:

1. Account Credited { amount: 100 }
2. Account Debited { amount: 80 }

The third command could not be processed as the requested amount exceeded the available balance.
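
A minimal command handler for this model might look like the following sketch, reusing the types defined earlier (the function name and the null-for-rejection convention are illustrative):

```typescript
// Processes a command against the current balance. Returns the event to
// append to the log on success, or null if the business rule rejects it.
function handle(command: Command, currentBalance: number): AccountEvent | null {
  const recordedAt = new Date().toISOString();
  switch (command.kind) {
    case "deposit":
      return { kind: "AccountCredited", amount: command.amount, recordedAt };
    case "withdraw":
      // Business rule: the balance must never go negative.
      if (command.amount > currentBalance) return null;
      return { kind: "AccountDebited", amount: command.amount, recordedAt };
  }
}
```

Replaying the three commands above through this handler yields exactly the two-event log shown earlier, with the final withdrawal rejected.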

To derive the current account balance, the system needs to process or ‘source’ the events in sequential order. Given the two events, this derivation would look like:

  • bank account { current balance: 0 } (starting state)
  • bank account { current balance: 100 } (processed: Account Credited, +100)
  • bank account { current balance: 20 } (processed: Account Debited, -80)

The current balance gets determined by processing all events up to the current time. But since each event carries a timestamp of when it was recorded, the account balance can also be determined as of any point in the past, by processing only the events up to that time.
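
In code, this derivation is a fold over the log. A sketch using the earlier types; the optional asOf parameter (an ISO timestamp, an assumption of this sketch) shows how point-in-time state falls out naturally:

```typescript
// Sources account state by replaying events in order, optionally including
// only those recorded up to a given point in time (ISO timestamps compare
// correctly as strings).
function sourceBalance(log: AccountEvent[], asOf?: string): BankAccount {
  return log
    .filter((e) => asOf === undefined || e.recordedAt <= asOf)
    .reduce(
      (account, e) => ({
        currentBalance:
          e.kind === "AccountCredited"
            ? account.currentBalance + e.amount
            : account.currentBalance - e.amount,
      }),
      { currentBalance: 0 } // starting state
    );
}
```

Calling sourceBalance(log) reproduces the derivation above: 0, then 100, then 20.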

This is a complete (if trivial) event sourcing design. In a real system, this example would likely require a few more pieces.

Implementers may want to record the sequence of commands, to be able to identify how an event came to be, as well as a separate ‘error event’ log that records command requests that failed to process – so that accurate error handling can take place, and so that a complete history of the entire system is maintained, successes and failures alike.

Over time, as the number of commands increases, the system may also want a way to record a running tally of the current account balance, so that when a withdraw command gets received, the business logic doesn’t need to reprocess the full list of events every time to determine if the command can get processed (i.e. that the account has sufficient balance available to allow the withdrawal). This is an example of a derived state store, and is effectively the same as what an entity store would be for the system.
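
One way to sketch this running tally: apply each newly appended event to a cached balance, so command validation reads the cache instead of replaying the log (illustrative names only, reusing the AccountEvent type from earlier):

```typescript
// A derived state store, kept up to date incrementally as events are appended.
class BalanceProjection {
  private balance = 0;

  // Fold one newly recorded event into the cached state.
  apply(e: AccountEvent): void {
    this.balance += e.kind === "AccountCredited" ? e.amount : -e.amount;
  }

  // Fast read for command validation; no event replay required.
  current(): number {
    return this.balance;
  }
}
```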

The following illustrates what the entity store for this example would look like, once all commands have been processed:

bank account { current balance: 20 }

It’s clear this is a lot simpler than an event store implementation – which is a big reason why many designers choose to only use entity stores. The current account balance is immediately available to read without having to process all historic events.

It is not, however, an either/or choice between event sourcing and maintaining entity stores. It is often the case that entity stores are also present within event sourcing designs.

Event Sourcing Implementation Options

Technically, the only requirement to implement event sourcing is a way to store events in, and read them back from, an event log.

This could be as simple as using an append-only file, with each line representing a new event. A sequence of files in a filesystem would also work, with each file representing an event. However, more robust options are usually preferred when designing larger systems with greater concurrency and scalability requirements.
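
A sketch of the simplest option – an append-only file holding one JSON-encoded event per line (Node.js fs; fine for a demonstration, not for concurrent writers; the file name is an arbitrary choice):

```typescript
import * as fs from "fs";

const LOG_PATH = "events.log"; // illustrative path

// Append one event as a single JSON line; this file is the system of record.
function append(event: AccountEvent): void {
  fs.appendFileSync(LOG_PATH, JSON.stringify(event) + "\n");
}

// Read the full log back, in the order the events were recorded.
function readLog(): AccountEvent[] {
  if (!fs.existsSync(LOG_PATH)) return [];
  return fs
    .readFileSync(LOG_PATH, "utf8")
    .split("\n")
    .filter((line) => line.length > 0)
    .map((line) => JSON.parse(line) as AccountEvent);
}
```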

Logs are a very common pattern within the software industry, available via technologies such as messaging integrations (message brokers, message-oriented middleware) and event streaming systems. Event sourcing typically builds upon these technologies, with some form of messaging middleware acting as the event log. If the complete event history is needed, these technologies can be configured to persist all messages indefinitely.

Relational or document models usually focus on system entities; current state is easily accessible by reading one or more rows or documents out of a store. It’s worth mentioning that event sourcing and relational modelling paradigms are not mutually exclusive; event sourcing systems often include both. The key difference with an event sourcing design is that entity stores no longer act as the system of record – they could easily get replaced or rebuilt as needed by reprocessing the event log.

More complex event sourcing systems need to consider derived entity state stores for read efficiency reasons, as processing the full event log in order to calculate current system state is not always scalable over time. The database technologies that back relational or document designs can get used as both an event log (for example via an append-only ‘Events’ database table), as well as derived entity stores to allow quick retrieval of the system’s current state. This separation of concerns is effectively CQRS – a derived state store provides a separated ‘query’ responsibility for the application, that can get optimized independently of writing to the event log.
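
As a sketch of the relational option, using node-postgres – the table layout here is an assumption for illustration, not a standard schema:

```typescript
import { Client } from "pg";

// An append-only events table as the event log, plus a derived entity table
// that serves the 'query' side and can be rebuilt from the log at any time.
async function setup(client: Client): Promise<void> {
  await client.query(`
    CREATE TABLE IF NOT EXISTS events (
      sequence    BIGSERIAL PRIMARY KEY,      -- total order of the log
      kind        TEXT        NOT NULL,
      payload     JSONB       NOT NULL,
      recorded_at TIMESTAMPTZ NOT NULL DEFAULT now()
    );
    CREATE TABLE IF NOT EXISTS account_balance (
      id      INT     PRIMARY KEY DEFAULT 1,  -- single account in this example
      balance NUMERIC NOT NULL
    );
  `);
}

// Append an event to the log; a separate projector updates account_balance.
async function appendEvent(client: Client, kind: string, payload: object): Promise<void> {
  await client.query("INSERT INTO events (kind, payload) VALUES ($1, $2)", [
    kind,
    payload,
  ]);
}
```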

While it’s clear there are several technical aspects that need to be considered, these are not the only challenges when implementing an event sourcing system.

Potential Event Sourcing Challenges

While an event sourcing design provides many benefits, it also comes with its own drawbacks.

The biggest issues are usually around the mindset of development teams looking to implement such a design. Teams need to think beyond traditional CRUD apps and their entity stores. In conceiving such a design, teams need to begin viewing events as the primary concept of value.

The majority of effort in implementing event sourcing is usually spent on accurately modelling events. Once an event is written to the log it should be considered immutable (otherwise the system’s history – and indeed its current state – could be corrupted or misrepresented). The event log is the system of record, meaning great care should be taken to ensure events contain all the information they need to represent the system’s full state, as per the business requirements at that point in time. Consideration also needs to be given so that events can get properly interpreted, and possibly even re-processed, in the future, as the system (and the business it represents) changes over time. Strategies are also needed to handle incorrect or poison events so that data validity issues can get properly corrected.

For simple domain models, this mindset flip can be easy to accomplish, but can get challenging with more complex models (especially those with many dependencies or relationships between entities). Larger systems that rely on integrations with and data sourced from external systems can also prove challenging – in cases where external systems aren’t able to provide point-in-time views of their data, event sourcing systems may need to consider a facade over their external integrations that can simulate fetching historic data. This significantly increases complexity as more integrations get added.

Event sourcing can work well in large systems, as the event log pattern naturally scales horizontally given enough partitioning in the system’s dataset. For example, if events are aligned with the entities they represent, the log for one entity does not necessarily need to coexist with that of any other entity. However, this ease of scalability brings further challenges in the form of asynchronous processing and eventually consistent data modifications. State change instructions could be received by any command processing node, after which the system needs to: identify which other nodes are responsible for the affected entities; route the command to those nodes; process the command; then lastly replicate the generated events across further log storage nodes. Only after this process completes is the new event available to be sourced as part of reading the latest available system state; this is why event sourcing designs effectively require command processing channels to be separated from state querying channels – i.e. CQRS.

Event sourcing systems therefore need a way to deal with the intermediate time between issuing a command and receiving notification that an event is successfully recorded in the log. The current state of the system that users see during this intermediate time may be ‘incorrect’ – or more accurately – slightly out of date; appropriate designs (UX, supporting technologies, etc) need to be put in place to mitigate this risk. Sufficient error handling processes are also needed for situations where commands fail to process, are cancelled while still pending, or even when they are superseded by later events as part of data correction.

Another challenge emerges once an event sourcing system has been recording events for some time. A way to deal with historic events becomes necessary – it’s one thing to record all events a system has processed, but if that history can no longer be interpreted, the event log loses its value entirely. This is especially relevant during system failure recovery, or when migrating derived state stores, where the full event log may need to be reprocessed to bring the system’s data universe up to date. For systems dealing with huge numbers of events, where reprocessing the full event log would exceed any recovery time objectives, periodic system state snapshots may also be required so that recovery can begin from a more recent known good state.
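
A sketch of snapshot-based recovery, assuming a snapshot pairs derived state with the count of events it already covers (both assumptions of this sketch, reusing the earlier types):

```typescript
// A snapshot records derived state plus how far into the log it reflects.
type Snapshot = { state: BankAccount; eventCount: number };

// Recover by starting from the latest snapshot (if any) and replaying only
// the events recorded after it, instead of the full log.
function recover(snapshot: Snapshot | null, log: AccountEvent[]): BankAccount {
  let account = snapshot ? snapshot.state : { currentBalance: 0 };
  for (const e of log.slice(snapshot ? snapshot.eventCount : 0)) {
    account = {
      currentBalance:
        account.currentBalance +
        (e.kind === "AccountCredited" ? e.amount : -e.amount),
    };
  }
  return account;
}
```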

Teams need to consider how events get structured; how that structure can evolve over time as the set of fields changes; and, given changes to how the business operates over time, how events with older structures could possibly get processed with the current business logic. Having a defined, extensible event schema may help with future-proofing when recording the events, but extra processing rules may also be needed in the latest business logic so that older event structures can still be understood. Periodic snapshots could also serve as dividers between major changes to the event structure, where historic events end up costing more to support than the inherent value they have within the log.
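
One common technique is to version events and ‘upcast’ older shapes into the current one before they reach business logic. A sketch – the version field, the added currency field and its default are all illustrative assumptions:

```typescript
// Suppose v1 events had no currency, and v2 added one.
type AccountCreditedV1 = { kind: "AccountCredited"; version: 1; amount: number };
type AccountCreditedV2 = { kind: "AccountCredited"; version: 2; amount: number; currency: string };

// Upcaster: translate any historic shape into the latest before processing,
// so the current business logic only ever sees the newest structure.
function upcast(e: AccountCreditedV1 | AccountCreditedV2): AccountCreditedV2 {
  if (e.version === 1) {
    return { ...e, version: 2, currency: "GBP" }; // assumed historical default
  }
  return e;
}
```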

Event Sourcing Conclusion

Event sourcing is a powerful pattern offering several valuable benefits, as outlined above. One further benefit is simplified future expansion, given that the event log also serves as a long-timeframe pub/sub mechanism. New, unforeseen processing components or integrations can easily get added at any point, where they can then process the event log to bring themselves up to current state.

However, as with any large architectural design decision, great care needs to be taken to ensure it is appropriate for a particular use case. Constraints around domain model complexity, data consistency/availability requirements, data growth rates and system lifespan/long-term scalability all need to get considered (by no means an exhaustive list!). Equally importantly, consideration also needs to be given to the teams that will be developing and supporting such a system over its lifespan.

As always, the most valuable piece of software engineering wisdom applies – strive to keep things as simple as possible.

If you need help architecting your next enterprise application or improving your current architecture, please contact us to discuss how we can help!