July 05 18:08~19:23; 20:07~21:06
▲Figure 1: Do cats need to do version control?
Because Event Sourcing treats events as the single source of truth, events are themselves an API, and an event revision is like an API revision. Versioning is therefore a very important issue for an Event Sourcing system: if different versions of an event are incompatible, replaying events later can fail, causing system errors or even making the system unusable. This episode introduces the basic concepts and practices of Event Versioning.
Schema on Write and Schema on Read
In a traditional relational database, the table schema must be known before data can be written to a table. This approach is called schema-on-write: the data format is defined in advance, and data is then written to the database according to that format. The advantage is that the format can be verified at write time; the disadvantage is that the format is fixed at write time, so it is relatively inflexible.
NoSQL databases (including EventStoreDB) and event brokers such as Apache Pulsar use another approach called schema-on-read, which does not check the data format at write time (the data may be stored in the database as a byte array), so it is more flexible. The correctness of the data becomes the responsibility of the reader.
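The difference between the two approaches can be sketched with Python's built-in sqlite3 module. This is only an illustration: the `card` and `events` tables and the event shapes are made up for the example, and SQLite stands in for both kinds of store.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")

# Schema-on-write: the table layout is fixed before any row is written.
conn.execute("CREATE TABLE card (id TEXT NOT NULL, title TEXT NOT NULL)")
conn.execute("INSERT INTO card (id, title) VALUES (?, ?)", ("c1", "Write post"))


def write_with_unknown_column():
    """Return True if the database rejects a row that does not fit the schema."""
    try:
        conn.execute("INSERT INTO card (id, owner) VALUES (?, ?)", ("c2", "Teddy"))
    except sqlite3.OperationalError:
        return True
    return False


# Schema-on-read: the store only sees opaque bytes, so any shape is accepted.
conn.execute("CREATE TABLE events (payload BLOB NOT NULL)")
for event in ({"type": "CardCreated", "id": "c1"},
              {"type": "CardRenamed", "title": 42}):  # suspicious, but still stored
    conn.execute("INSERT INTO events (payload) VALUES (?)",
                 (json.dumps(event).encode("utf-8"),))


def read_events():
    """The reader decodes the bytes and must judge correctness itself."""
    rows = conn.execute("SELECT payload FROM events ORDER BY rowid").fetchall()
    return [json.loads(row[0].decode("utf-8")) for row in rows]


rejected = write_with_unknown_column()
events = read_events()
```

The relational table refuses the row with the unknown `owner` column, while the byte-array table accepts an event whose `title` is a number, leaving it to the reader to notice.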
In an Event Sourcing system, every kind of domain event has a different format, so it is almost impossible to store domain events with schema-on-write, and schema-on-read is the natural choice. But this raises two questions:
- With schema-on-read, how can you still verify that the data format is correct when the data is written?
- With schema-on-read, how can you verify that the data format is correct when the data is read?
There are basically two ways to verify the correctness of event data under schema-on-read: Strong Schema and Weak Schema. The former is the approach used by event brokers such as Apache Kafka and Apache Pulsar, and the latter is the approach Gregory Young suggests in Versioning in an Event Sourced System.
As shown in Figure 2, under schema-on-read one of the easiest ways to verify the correctness of written event data is to store the event's schema definition in front of the payload (the event data) when the data is written, so that the program that writes the event (the Serializer) can automatically verify that the event is in the correct format. The program that reads the event (the De-serializer) can also parse the data according to this schema.
▲Figure 2: Each event carries a schema definition, which is used to verify the event format when writing and serves as the basis for parsing event data when reading.
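The idea in Figure 2 can be sketched as follows. The schema format (field name mapped to a type name), the byte layout, and the `CardCreated`-style fields are assumptions made for this example, not the format of any particular broker.

```python
import json
import struct

# Hypothetical schema format for this sketch: field name -> expected type name.
TYPES = {"string": str, "int": int}


def validate(schema, event):
    """Raise ValueError if the event does not match the schema."""
    for field, type_name in schema.items():
        if field not in event:
            raise ValueError(f"missing field: {field}")
        if not isinstance(event[field], TYPES[type_name]):
            raise ValueError(f"wrong type for field: {field}")


def serialize(schema, event):
    validate(schema, event)  # format is verified at write time
    schema_bytes = json.dumps(schema).encode("utf-8")
    payload = json.dumps(event).encode("utf-8")
    # Layout: 4-byte schema length, the schema definition, then the payload.
    return struct.pack(">I", len(schema_bytes)) + schema_bytes + payload


def deserialize(record):
    (schema_len,) = struct.unpack(">I", record[:4])
    schema = json.loads(record[4:4 + schema_len])
    event = json.loads(record[4 + schema_len:])
    validate(schema, event)  # the embedded schema also guides the reader
    return event


card_created_schema = {"cardId": "string", "title": "string"}
record = serialize(card_created_schema, {"cardId": "c1", "title": "Write post"})
```

Every record produced this way repeats the full schema, which is exactly the overhead discussed next.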
Although the method in Figure 2 is simple, every event carries a schema definition. These repeated schema definitions consume bandwidth during network transmission and space in storage. This led to the idea: “Why not centralize these schema definitions in one place, so that the serializer and de-serializer can read them from there when needed?” That idea gave rise to the software known as a Schema Registry.
Figure 3 is a schematic diagram of how the Apache Pulsar Schema Registry operates. The Schema Registry stores every event version, together with its schema, that has ever appeared in each topic. When the Writer wants to write an event, it uses the Pulsar client to ask the Schema Registry whether the event's schema (a SchemaInfo object) has been registered; if it has, a serializer is returned and used to write the event. When reading events, the reader likewise provides a SchemaInfo object to the Schema Registry, which returns a de-serializer through which the events are read.
Basically, a Schema Registry is a centralized lookup service. With a Schema Registry, events written to the database no longer need to each carry a schema definition as in Figure 2. As long as the event version and its schema definition are registered with the Schema Registry, the serializer returned by the Schema Registry knows the event's version number when the event is written, and only that version number needs to be added when writing to the database. As shown in Figure 3, the number in the green box in front of each event is the event's version number.
▲Figure 3: Schematic diagram of the operation of the Apache Pulsar Schema Registry, based on “Apache Pulsar in Action”
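A minimal in-memory sketch of the lookup idea follows. It is not the Pulsar client API: the `SchemaRegistry` class, its method names, the schema format, and the 2-byte version prefix are all hypothetical and only illustrate storing a version number instead of a full schema.

```python
import json
import struct


class SchemaRegistry:
    """Hypothetical in-memory stand-in for a centralized schema registry."""

    def __init__(self):
        self._versions = {}  # topic -> list of schemas; list index + 1 is the version

    def register(self, topic, schema):
        schemas = self._versions.setdefault(topic, [])
        if schema not in schemas:
            schemas.append(schema)
        return schemas.index(schema) + 1

    def version_of(self, topic, schema):
        schemas = self._versions.get(topic, [])
        if schema not in schemas:
            raise LookupError(f"schema not registered for topic {topic!r}")
        return schemas.index(schema) + 1

    def schema_for(self, topic, version):
        return self._versions[topic][version - 1]


def check(schema, event):
    missing = [name for name in schema["fields"] if name not in event]
    if missing:
        raise ValueError(f"missing fields: {missing}")


def write_event(registry, topic, schema, event):
    version = registry.version_of(topic, schema)  # fails if never registered
    check(schema, event)
    # Only the version number is stored in front of the payload.
    return struct.pack(">H", version) + json.dumps(event).encode("utf-8")


def read_event(registry, topic, record):
    (version,) = struct.unpack(">H", record[:2])
    event = json.loads(record[2:])
    check(registry.schema_for(topic, version), event)
    return version, event


registry = SchemaRegistry()
v1 = {"fields": ["cardId", "title"]}
v2 = {"fields": ["cardId", "title", "priority"]}
registry.register("card-events", v1)
registry.register("card-events", v2)
record = write_event(registry, "card-events", v2,
                     {"cardId": "c1", "title": "Write post", "priority": 1})
```

Compared with carrying the schema in every event, each record here grows by only two bytes, at the cost of depending on the registry for every write and read.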
The operation of a Schema Registry involves many details. Interested villagers can look at Apache Pulsar or Apache Kafka; both use a Schema Registry, although their default Schema Registry software differs. The book “Apache Pulsar in Action” describes the operating principles of Pulsar's Schema Registry in great detail.
Strong Schema provides a mechanism that automatically validates event formats on both write and read under schema-on-read, which feels good and reduces human error. But in “Versioning in an Event Sourced System”, Gregory Young recommends another approach called Weak Schema, whose characteristics are shown in Figure 4.
Weak Schema does not read data through Strong Schema's deserialization; it uses mapping instead. What is mapping? It is similar to hand-writing a Mapper that converts a domain object into a DTO (data transfer object) before passing it to the front end. The difference is that in Weak Schema the mapping goes from events stored in the database to domain event objects.
The mapping principle is simple: there are only three rules, as illustrated in Figure 4.
▲Figure 4: Weak Schema feature description
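Since the figure is not reproduced here, the sketch below assumes the three rules are the ones described in Young's book: a field present in both the stored data and the target object is copied, a field present only in the stored data is ignored, and a field present only on the target object takes its default value. The `CardCreated` event and its fields are hypothetical.

```python
import json
from dataclasses import dataclass, fields


@dataclass
class CardCreated:  # hypothetical domain event
    card_id: str = ""
    title: str = ""
    priority: int = 0  # imagined as a field added in a later version


def map_event(cls, raw):
    """Map stored JSON bytes onto a domain event object."""
    data = json.loads(raw)
    known = {f.name for f in fields(cls)}
    # Rule 1: fields present on both sides are copied.
    # Rule 2: fields present only in the stored data are dropped.
    matching = {name: value for name, value in data.items() if name in known}
    # Rule 3: fields present only on the object keep their dataclass default.
    return cls(**matching)


old_event = map_event(CardCreated, b'{"card_id": "c1", "title": "Write post"}')
newer_event = map_event(
    CardCreated,
    b'{"card_id": "c2", "title": "Review", "priority": 3, "color": "red"}')
```

The same mapper reads both records without knowing which version wrote them, which is why no version number is needed.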
With the Weak Schema approach, the event itself does not need a version number: as long as the Weak Schema rules are followed, the reader can always read an event through mapping, no matter how many versions of that event are stored in the database. This approach also does not require a centralized Schema Registry.
However, Weak Schema has a very important limitation: event fields cannot be renamed. In addition, if a required field was removed or omitted in some event versions, the mapper must perform special checks, as shown in the code in the lower right corner of Figure 4.
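Both limitations can be illustrated with a small sketch. It is not the code from Figure 4; the `CardCreated` event, the `REQUIRED` tuple, and the renamed field are assumptions chosen for the example.

```python
from dataclasses import dataclass, fields


@dataclass
class CardCreated:  # hypothetical domain event
    card_id: str = ""
    name: str = ""  # imagine this field was called "title" in older events


# Hypothetical list of fields the domain cannot do without.
REQUIRED = ("card_id",)


def map_card_created(data):
    # Special check: a default value would hide a missing required field.
    for field_name in REQUIRED:
        if field_name not in data:
            raise ValueError(f"required field missing: {field_name}")
    known = {f.name for f in fields(CardCreated)}
    return CardCreated(**{k: v for k, v in data.items() if k in known})


# An event stored before the rename still uses the old field name.
stored = {"card_id": "c1", "title": "Write post"}
mapped = map_card_created(stored)
```

Here `mapped.name` silently falls back to the empty default because the stored data only contains `title`, which is why renaming a field is not allowed under these rules.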
Event versioning involves many more details and corresponding techniques. Teddy suggests that interested villagers read Gregory Young's “Versioning in an Event Sourced System” (an e-book).
Which one is better?
Which is better, Strong Schema or Weak Schema? To be honest, Teddy doesn't know either. ezKanban currently prefers Weak Schema, and as far as Teddy understands, EventStoreDB does not seem to support a Schema Registry. EventStoreDB was originally developed by Gregory Young; since the “grandfather” himself suggests Weak Schema, it makes sense that EventStoreDB does not support a Schema Registry.
However, if the scenario is cross-microservice messaging rather than pure event sourcing, event broker software such as Kafka or Pulsar is usually used, and in that case using a Schema Registry (Strong Schema) is very common.
That said, according to Gregory Young, Weak Schema can still be used even for cross-microservice messaging. As far as Teddy knows, an Apache Pulsar topic can be configured not to check the schema on write and read, which allows events to be read by mapping in the Weak Schema style.
Both methods have their own advantages and disadvantages, and it is left to the villagers to decide which method to use.
Next episode preview
The next episode talks about a topic related to event versioning: behavior versioning.
Yuzo’s inner monologue: It took me a while to figure out this issue.
This article is reprinted from http://teddy-chen-tw.blogspot.com/2022/07/13.html
This site is for inclusion only, and the copyright belongs to the original author.