Event Sourcing (12): Snapshots to Accelerate Aggregate Reads

Original link: https://teddy-chen-tw.blogspot.com/2022/07/12.html

July 04 21:23~23:39

▲Figure 1: Using snapshots to accelerate Aggregate read speed

Foreword

In an Event Sourcing system, the write side is simple and fast. The read side, however, has to apply every domain event that belongs to an Aggregate instance, one by one, to obtain its latest state, so it often feels “very slow” (an illusion?). Therefore, besides applying CQRS to separate the write and read models and speed up reads, another common acceleration technique is to create a snapshot of the Aggregate's state. That is the topic of this episode.

***

Snapshot principle

Please refer to Figure 1. The source data of the Account Aggregate is stored in the Event Stream. Every time AccountRepository loads an Account, it must pass events E1 through EN to the Account, which re-applies them to compute its latest state. To speed up loading, a snapshot is generated for a certain version of the Account and stored in the Snapshot Stream. The next time AccountRepository loads the Account, it finds the latest snapshot in the Snapshot Stream, writes the snapshot data back into the Account, reads from the original Event Stream only the events produced after the snapshot, and passes those events to the Account to be applied.

In Figure 1, there are two snapshots in the Snapshot Stream. The first has version=10, meaning it captures the state after the first 10 events. The second is the latest snapshot, with version=20, meaning it captures the state after the first 20 events. When AccountRepository loads the Account, it reads the last entry of the Snapshot Stream to obtain the latest snapshot. Because the version of that snapshot is 20, AccountRepository then reads events from position 21 of the Event Stream to the end and applies them. In this example, obtaining the latest state originally required applying N events; with the snapshot, only (N – 20) events need to be applied.
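
The loading procedure described above can be sketched as follows. This is only a minimal illustration, not the code behind Figure 1: the AmountChanged event, the AccountSnapshot record, and the AccountLoader class are hypothetical names, and the streams are plain in-memory lists.

```java
import java.util.List;

// Hypothetical event: a signed change to the account balance.
record AmountChanged(long delta) {}

// Hypothetical snapshot: the balance after the first `version` events.
record AccountSnapshot(long balance, int version) {}

final class Account {
    private long balance;
    private int version; // number of events applied so far

    // Restore an Account directly from a snapshot instead of replaying events.
    static Account fromSnapshot(AccountSnapshot snapshot) {
        Account account = new Account();
        account.balance = snapshot.balance();
        account.version = snapshot.version();
        return account;
    }

    void apply(AmountChanged event) {
        balance += event.delta();
        version++;
    }

    long balance() { return balance; }
    int version() { return version; }
}

final class AccountLoader {
    // Without a snapshot: apply E1..EN.
    static Account replayAll(List<AmountChanged> eventStream) {
        Account account = new Account();
        eventStream.forEach(account::apply);
        return account;
    }

    // With a snapshot: restore the state, then apply only the events after it.
    static Account loadWithSnapshot(AccountSnapshot latest, List<AmountChanged> eventStream) {
        Account account = Account.fromSnapshot(latest);
        // Event k (1-based) sits at index k - 1, so event (version + 1) sits at index `version`.
        eventStream.subList(latest.version(), eventStream.size()).forEach(account::apply);
        return account;
    }
}
```

With a snapshot at version 20 and a stream of N events, loadWithSnapshot applies N – 20 events and ends in the same state as replayAll.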

***

Implementation

There are many ways to implement snapshots. A common one is to have the Aggregate apply the Memento design pattern, so that it generates snapshots and restores its state from them by itself. Figure 2 is a class diagram of the Memento design pattern. The pattern has three roles:

Originator : The object whose state needs to be saved as a snapshot. In the ezKanban example introduced later, it is the Tag Aggregate.
Memento : The snapshot. In the Memento design pattern, a snapshot is called a Memento (memorandum).
Caretaker : The object responsible for storing the snapshot. In ezKanban, you can modify the TagRepository code directly so that it acts as the Caretaker. If you want to follow the Open-Closed Principle and would rather not modify a TagRepository that already works, you can apply the Decorator design pattern to “plug” the snapshot function onto the Repository.

▲Figure 2: Memento design pattern class diagram

***

Figure 3 shows the snapshot interface defined in ezKanban. It has only two methods, getSnapshot() and setSnapshot(). The former is used to generate snapshots, and the latter is used to restore the state from snapshots.

▲Figure 3: Memento interface

Figure 4 shows how the Tag Aggregate implements the Memento interface. The TagSnapshot record on line 14 is the snapshot itself and represents the current state of the Tag. The getSnapshot method on line 25 simply returns a new TagSnapshot, and the setSnapshot method on line 30 writes the values of the incoming TagSnapshot into the Tag, which amounts to restoring the Tag's state. Line 18 is a static factory method that creates a Tag object directly from a TagSnapshot.

▲Figure 4: Tag Aggregate implements Memento interface
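
For readers who cannot see the figures, here is a rough sketch of what such an interface and its implementation might look like. It is not the ezKanban source code: only the names getSnapshot, setSnapshot, Tag, and TagSnapshot come from the text above. The generic type parameter, the fields (tagId, name, color, version), the rename method, and the factory name fromSnapshot are assumptions made for illustration.

```java
// Assumed shape of the snapshot interface: one method produces a snapshot,
// the other restores the state from a snapshot.
interface Memento<S> {
    S getSnapshot();
    void setSnapshot(S snapshot);
}

// The snapshot itself: an immutable copy of the Tag's state (fields are hypothetical).
record TagSnapshot(String tagId, String name, String color, long version) {}

// The Originator role: the Tag creates and consumes its own snapshots.
final class Tag implements Memento<TagSnapshot> {
    private String tagId;
    private String name;
    private String color;
    private long version;

    Tag(String tagId, String name, String color) {
        this.tagId = tagId;
        this.name = name;
        this.color = color;
    }

    private Tag() {}

    // Static factory: build a Tag directly from a snapshot.
    static Tag fromSnapshot(TagSnapshot snapshot) {
        Tag tag = new Tag();
        tag.setSnapshot(snapshot);
        return tag;
    }

    // A hypothetical state change, so that the snapshot has something to capture.
    void rename(String newName) {
        this.name = newName;
        this.version++;
    }

    @Override
    public TagSnapshot getSnapshot() {
        return new TagSnapshot(tagId, name, color, version);
    }

    @Override
    public void setSnapshot(TagSnapshot snapshot) {
        this.tagId = snapshot.tagId();
        this.name = snapshot.name();
        this.color = snapshot.color();
        this.version = snapshot.version();
    }

    String name() { return name; }
}
```

Because the snapshot is an immutable record, the Caretaker can store it without being able to change the Tag's state behind its back.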

Figure 5 shows the save method of the snapshot-enabled TagRepository. Line 72 stores the Tag's domain events, and lines 74 to 84 decide whether to save a snapshot. This version of TagRepository uses the user-configured snapshotIncrement value to decide how many domain events should pass between snapshots. If snapshotIncrement is 100, a snapshot is recorded each time more than 100 new events have accumulated.

▲Figure 5: The save method of TagRepository (support snapshot version)
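
The decision logic can be sketched roughly as follows. This is an assumption-laden illustration rather than the code in Figure 5: the SnapshottingStore class and the StoredSnapshot record are hypothetical, events and state are plain strings, and the exact threshold comparison (at least snapshotIncrement events since the last snapshot) is a guess, since the figure itself is not reproduced here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical snapshot entry: serialized aggregate state plus the version it covers.
record StoredSnapshot(String state, long version) {}

final class SnapshottingStore {
    private final List<String> eventStream = new ArrayList<>();
    private final List<StoredSnapshot> snapshotStream = new ArrayList<>();
    private final long snapshotIncrement;

    SnapshottingStore(long snapshotIncrement) {
        if (snapshotIncrement <= 0) {
            throw new IllegalArgumentException("snapshotIncrement must be positive");
        }
        this.snapshotIncrement = snapshotIncrement;
    }

    void save(List<String> newEvents, String currentState) {
        // Step 1: always append the new domain events.
        eventStream.addAll(newEvents);

        // Step 2: take a snapshot only if enough events have accumulated
        // since the last one (assumed comparison: >= snapshotIncrement).
        long version = eventStream.size();
        long lastSnapshotVersion = snapshotStream.isEmpty()
                ? 0
                : snapshotStream.get(snapshotStream.size() - 1).version();
        if (version - lastSnapshotVersion >= snapshotIncrement) {
            snapshotStream.add(new StoredSnapshot(currentState, version));
        }
    }

    List<StoredSnapshot> snapshots() {
        return List.copyOf(snapshotStream);
    }
}
```

Note that a snapshot is never a replacement for the events: the Event Stream remains the source of truth, and the Snapshot Stream can be rebuilt from it at any time.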

Figure 6 shows the findById method of the snapshot-enabled TagRepository. Line 46 loads the previously saved Snapshot domain event from the Snapshot Stream. If no Snapshot exists, line 49 loads the Tag in the original way. If one exists, line 53 creates the Tag from the Snapshot, then the Tag's original domain events, starting from the Snapshot version number plus 1, are loaded (lines 54~55) and applied one by one (line 56).

▲Figure 6: TagRepository’s findById method (supports snapshot version)
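
The two branches of findById can be sketched like this. Again, this is not the ezKanban implementation: SimpleTag, SimpleTagSnapshot, TagRenamed, and InMemoryTagRepository are simplified, hypothetical stand-ins, and both streams are in-memory maps keyed by tag id.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

record TagRenamed(String newName) {}                     // hypothetical domain event
record SimpleTagSnapshot(String name, int version) {}    // hypothetical snapshot

final class SimpleTag {
    private String name = "";
    private int version = 0;

    static SimpleTag fromSnapshot(SimpleTagSnapshot snapshot) {
        SimpleTag tag = new SimpleTag();
        tag.name = snapshot.name();
        tag.version = snapshot.version();
        return tag;
    }

    void apply(TagRenamed event) {
        name = event.newName();
        version++;
    }

    String name() { return name; }
    int version() { return version; }
}

final class InMemoryTagRepository {
    private final Map<String, List<TagRenamed>> eventStreams = new HashMap<>();
    private final Map<String, List<SimpleTagSnapshot>> snapshotStreams = new HashMap<>();
    private int eventsAppliedOnLastLoad; // kept only to make the saving visible

    void appendEvent(String tagId, TagRenamed event) {
        eventStreams.computeIfAbsent(tagId, id -> new ArrayList<>()).add(event);
    }

    void saveSnapshot(String tagId, SimpleTagSnapshot snapshot) {
        snapshotStreams.computeIfAbsent(tagId, id -> new ArrayList<>()).add(snapshot);
    }

    Optional<SimpleTag> findById(String tagId) {
        List<TagRenamed> events = eventStreams.get(tagId);
        if (events == null || events.isEmpty()) {
            return Optional.empty();
        }
        List<SimpleTagSnapshot> snapshots = snapshotStreams.getOrDefault(tagId, List.of());

        SimpleTag tag;
        int firstUnappliedIndex;
        if (snapshots.isEmpty()) {
            // No snapshot: load the original way, replaying from the first event.
            tag = new SimpleTag();
            firstUnappliedIndex = 0;
        } else {
            // Snapshot found: restore from it, then continue at version + 1.
            SimpleTagSnapshot latest = snapshots.get(snapshots.size() - 1);
            tag = SimpleTag.fromSnapshot(latest);
            firstUnappliedIndex = latest.version(); // event (version + 1) is at index `version`
        }

        List<TagRenamed> remaining = events.subList(firstUnappliedIndex, events.size());
        for (TagRenamed event : remaining) {
            tag.apply(event);
        }
        eventsAppliedOnLastLoad = remaining.size();
        return Optional.of(tag);
    }

    int eventsAppliedOnLastLoad() { return eventsAppliedOnLastLoad; }
}
```

Both branches end in the same state; the snapshot only changes how many events have to be applied to get there.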

The above method of generating and loading snapshots can be used not only on an Aggregate but also on an Event Sourced Read Model. For example, it could be used to accelerate a Read Model built on an EventStoreDB user-defined Projection, as described in the previous episode, <Event Sourcing (11): Writing JavaScript to Generate Custom Projection in EventStoreDB>.

***

Do you really need snapshots?

If snapshots are implemented in the way described above, the Aggregate Root needs to implement the Memento interface because the Memento design pattern is applied. In other words, from the perspective of Clean Architecture, not only the Repository implementation in the Adapter layer but also the Aggregate in the Entity layer has to be modified. This is not a big change, but it does make the system more complicated. Do not take snapshots unless it is necessary.

When is it necessary? Judge this by the length of the Aggregate's life cycle. Taking ezKanban as an example, a Card Aggregate represents a work item on the kanban board. From birth to death (the work is completed and archived), it may accumulate only dozens of domain events. In this case, creating snapshots for Card is unnecessary. A bank's Account object, however, represents a customer's transaction records. Bank customers usually stay for years or even decades, and the accumulated transaction records can be very large. In this case, there may be performance problems if loading is not accelerated with snapshots.

However, in an Event Sourcing system, snapshots and CQRS are not the only ways to speed up reads. There is also an “annual settlement” (periodic settlement) approach that is similar to snapshots. Taking a bank as an example, at the beginning of each year, the previous year's transaction records of each Account can be “condensed” (compressed) into one new domain event, and the old events can be moved from the original event stream to another event stream (alternatively, a new event stream can be created every year).

Teddy once read a saying in a book (or an article; the source is forgotten): “If there are fewer than 10,000 domain events, you don't need to take a snapshot.” Readers can use this figure as a reference.
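
A rough sketch of the annual settlement idea follows. Everything in it is hypothetical and only meant to make the idea concrete: LedgerEvent carries a signed amount, the condensed event is called BalanceCarriedForward, and the streams are in-memory lists. A real system would also have to make the move between streams safe against failures.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical domain event: `amount` is a signed change to the balance.
record LedgerEvent(int year, String type, long amount) {}

final class AnnualSettlement {
    // The streams after settlement: a short active stream and an archive of old events.
    record Result(List<LedgerEvent> activeStream, List<LedgerEvent> archiveStream) {}

    // Condense all events up to and including `closedYear` into one new event.
    static Result settle(List<LedgerEvent> eventStream, int closedYear) {
        List<LedgerEvent> archive = new ArrayList<>();
        List<LedgerEvent> active = new ArrayList<>();
        long carriedForward = 0;

        for (LedgerEvent event : eventStream) {
            if (event.year() <= closedYear) {
                archive.add(event);               // old events move to another stream
                carriedForward += event.amount(); // and are summed into one figure
            } else {
                active.add(event);
            }
        }
        // The condensed event becomes the first event of the new active stream.
        active.add(0, new LedgerEvent(closedYear + 1, "BalanceCarriedForward", carriedForward));
        return new Result(active, archive);
    }

    // The balance is the sum of all amounts in a stream.
    static long balance(List<LedgerEvent> eventStream) {
        return eventStream.stream().mapToLong(LedgerEvent::amount).sum();
    }
}
```

Unlike a snapshot, the condensed event lives in the event stream itself, so the Repository can keep loading the Aggregate in the original way without knowing that a settlement took place.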

***

Next episode preview

The next episode covers another more advanced but important topic: Event Versioning.

***

Yuzo’s inner monologue: The Memento design pattern finally comes in handy.

