Unraveling the Concepts
I have previously discussed events on this blog, describing them as announcements of something that has occurred within the system domain, accompanied by relevant data about that occurrence. At first glance, this might seem identical to CDC — something changes in a system, and this needs to be communicated to other systems — which is precisely what CDC is about.
However, there is a crucial distinction to be made here. Events are defined at a much higher level of abstraction than data changes because they represent meaningful changes to the domain. Data representing an entity can change without having any “business” impact on the overall entity that the data represents. There can be several sub-states of an order that an order management system might maintain internally but which do not matter to the outside world.
An order moving to these states would not generate events, but changes would be logged in the CDC system. On the other hand, there are states that the rest of the world cares about (created, dispatched, etc.) and the order management system explicitly exposes to the outside world. Changes to or from these states would generate events.
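The distinction above can be sketched in code. This is an illustrative example, not from the original article: the state names and the `record_status_change` function are hypothetical. Every status change produces a CDC change-log entry, but only transitions into externally meaningful states produce a domain event.

```python
# Hypothetical state sets for an order management system: PUBLIC_STATES
# matter to the outside world; internal sub-states do not.
PUBLIC_STATES = {"created", "dispatched", "delivered", "cancelled"}
INTERNAL_STATES = {"fraud_check_pending", "warehouse_allocated", "picking"}

def record_status_change(order_id, old_state, new_state):
    """Return (changelog_entry, event_or_None) for one state transition."""
    changelog_entry = {                  # every change is captured by CDC
        "order_number": order_id,
        "changed_field": "status",
        "old_value": old_state,
        "new_value": new_state,
    }
    event = None
    if new_state in PUBLIC_STATES:       # only public transitions become events
        event = {"order_number": order_id, "event_type": f"order {new_state}"}
    return changelog_entry, event
```

A move to an internal sub-state such as `warehouse_allocated` yields a change-log entry but no event; a move to `cancelled` yields both.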
CQRS-based systems are a case study in the intricacies of asynchronous programming. Updates to the command model are propagated to the read model, typically (but not necessarily) asynchronously. Can we leverage CDC for this purpose, or should the command module emit events that are read by the query module to build its data model?
If the read model is essentially a different representation of the command-side data (e.g., a search index in ElasticSearch), then utilizing DB-level change logs is not a bad idea. This, of course, is just an opinion. Using events here would not be bad either, especially if different teams manage the two models.
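A minimal sketch of the CDC option, with all names invented for illustration: the query module folds change-log entries from the command side into a denormalized read model. A plain dict stands in for a document store such as ElasticSearch.

```python
def apply_change(read_model, entry):
    """Fold one CDC change-log entry into the denormalized read model."""
    doc = read_model.setdefault(entry["order_number"], {})
    doc[entry["changed_field"]] = entry["new_value"]
    return read_model

# Replaying the change log rebuilds the read model from scratch.
read_model = {}
changes = [
    {"order_number": "12345", "changed_field": "status",
     "old_value": "in progress", "new_value": "cancelled"},
    {"order_number": "12345", "changed_field": "carrier",
     "old_value": None, "new_value": "DHL"},
]
for change in changes:
    apply_change(read_model, change)
```

A side benefit of this shape is that the read model can be rebuilt at any time by replaying the log from the beginning.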
Decoupling systems this way can imbue the overall architecture with evolutionary characteristics. Change logs can be published to Kafka and can then be consumed by other systems that want to store this data. A very popular and efficient way of building CDC systems is by tailing the internal log files of databases (MySQL and other relational DBs always have these for transaction management; ElasticSearch has a change stream in its newer versions) using something like Filebeat and then publishing the logs over Kafka. Consumers may be Logstash-type plugins that ingest the data into other systems that persist this change log. Consumers may also be Spark/Flink-style streaming applications that consume and transform this data into a form suitable for other use cases.
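The Spark/Flink-style consumer can be sketched as a plain Python generator so it runs without a cluster or a Kafka broker; the record shapes and function name are assumptions for the example. It filters raw change-log records down to a compact feed of status transitions for some downstream use case.

```python
def status_transitions(changelog_stream):
    """Keep only status changes and reshape them for downstream consumers."""
    for record in changelog_stream:
        if record.get("changed_field") != "status":
            continue                      # drop non-status field changes
        yield {
            "order": record["order_number"],
            "from": record["old_value"],
            "to": record["new_value"],
        }

# Two raw change-log records; only the status change survives the transform.
raw = [
    {"order_number": "12345", "changed_field": "status",
     "old_value": "in progress", "new_value": "cancelled"},
    {"order_number": "67890", "changed_field": "address",
     "old_value": "A", "new_value": "B"},
]
feed = list(status_transitions(raw))
```

In a real deployment the input would be a Kafka topic and the output another topic or a sink, but the transform logic stays this simple.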
In certain scenarios, tapping into database change logs may not be a viable option, as not all systems provide this functionality. In such cases, we need to integrate code into the application layer to generate change logs. Ensuring that the data update and the log emission happen together, or not at all, is an extremely challenging problem. It is essentially an atomic-update problem: how do you guarantee that the database update and the Kafka event emission both happen or both roll back? Data consistency is paramount in a Change Data Capture (CDC) system.
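One widely used answer to this atomic-update problem is the transactional outbox pattern: write the change-log entry to an "outbox" table in the same database transaction as the data update, and let a separate relay (or a CDC tool like Debezium) publish the outbox rows to Kafka afterwards. Below is a sketch using SQLite; the table and column names are made up for the example.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_number TEXT PRIMARY KEY, status TEXT);
    CREATE TABLE outbox (id INTEGER PRIMARY KEY AUTOINCREMENT, payload TEXT);
    INSERT INTO orders VALUES ('12345', 'in progress');
""")

def change_status(conn, order_number, new_status):
    # One transaction: the order update and the outbox insert commit
    # together, or neither does.
    with conn:
        (old_status,) = conn.execute(
            "SELECT status FROM orders WHERE order_number = ?",
            (order_number,)).fetchone()
        conn.execute("UPDATE orders SET status = ? WHERE order_number = ?",
                     (new_status, order_number))
        conn.execute("INSERT INTO outbox (payload) VALUES (?)", (json.dumps({
            "order_number": order_number, "changed_field": "status",
            "old_value": old_status, "new_value": new_status,
        }),))

change_status(conn, "12345", "cancelled")
```

The publish step never races the update: if the transaction rolls back, there is no outbox row to publish, and if it commits, the relay will eventually deliver the entry.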
For example, compare an internal change-log entry with the corresponding domain event for an order cancellation:

ChangeLog {"order number": "12345", "changed field": "status", "old value": "in progress", "new value": "cancelled"}

OrderEvent {"order number": "12345", "event type": "order cancellation"}