Why I Hate Microservices Part 3: The Identity Crisis 😵
Imagine going out to buy bread, knowing that there are two types of bakeries. The first gives you the bread right away because it knows exactly when it will be done. The second takes your order, tells you to go home and delivers the bread to you once it's ready, because it can't tell exactly when it will be done. With either bakery, you know what to expect.
Now imagine a third type of bakery: one that promises to give you the bread right away, even though it has no idea when the bread will be done. You expect the bakery to behave one way, but it actually behaves another. Now imagine working in such a bakery.
Eventual 📨 VS Transactional ⚡
In my humble opinion, the problem I'm discussing in this article is probably the most critical one when designing a system. It's a problem about the identity of your solution. Your system needs a clear identity that dictates how everything interacts and behaves, both within the system and with external systems.
In any system design, the most important aspect of how every transaction behaves in your technical ecosystem is consistency: is it eventual or transactional?
To explain the problem we had in our microservices project, I first have to clarify the difference between the two kinds of consistency.
Going back to my brilliant bakery analogy, let's take a closer look at both types.
The Transactional Bakery
The first bakery is a traditional bakery with limited resources. It serves only the few buildings around it. And it knows approximately how much time is needed in order to prepare a fresh batch of bread.
Thus, when someone walks in and places an order, both the customer and the bakery know how long the transaction will take, and the customer can go home and have his breakfast.
This type of bakery keeps its ovens, cashier and products in the same shop, because it is a small shop that serves a limited number of people.
The Eventual Bakery
The other bakery, however, is a very famous one. It serves hundreds of customers each day, has a big budget and serves the whole city, not only the few buildings around it.
To accommodate these circumstances, this bakery has a place that sells the bread, a place that bakes the bread and another place that takes the orders from the customers.
Because of its scale and all the variable factors involved, this bakery takes your order and tells you to either wait for your number or go home, and your bread will be delivered to you once it's ready.
Thus, the customer knows for a fact that they will be notified once their order is ready and that they will be able to claim it eventually. Not now, not after a certain amount of time, but eventually.
As you can see, it's either one or the other, and how transparently you communicate the consistency model to the end-user is everything. The end-user must know what to expect when dealing with your solution. Is it the "I will do what you want right away" approach, or the "I will give you a call when I'm done doing what you asked me to do" approach?
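In code, the difference between the two bakeries shows up in what a call returns. Below is a minimal Python sketch under stated assumptions: the class and method names are hypothetical, and a dict and a deque stand in for a real database and message broker. The transactional version returns only once the data is stored, while the eventual version returns a ticket, so the caller can tell from the return type alone that the result comes later.

```python
import uuid
from collections import deque
from dataclasses import dataclass


@dataclass
class Saved:
    request_id: str  # the request is already stored when the caller gets this


@dataclass
class Accepted:
    tracking_id: str  # only a promise: the request has not been stored yet


class TransactionalBakery:
    """Does the work before answering, so the caller receives the final result."""

    def __init__(self):
        self.db = {}

    def save(self, payload: dict) -> Saved:
        request_id = str(uuid.uuid4())
        self.db[request_id] = payload  # the work is finished before we return
        return Saved(request_id)


class EventualBakery:
    """Queues the work and answers at once with a ticket instead of a result."""

    def __init__(self):
        self.queue = deque()
        self.db = {}

    def submit(self, payload: dict) -> Accepted:
        tracking_id = str(uuid.uuid4())
        self.queue.append((tracking_id, payload))  # nothing is stored yet
        return Accepted(tracking_id)

    def process_pending(self):
        """Runs later, e.g. in a background worker; this is the 'eventually'."""
        while self.queue:
            tracking_id, payload = self.queue.popleft()
            self.db[tracking_id] = payload


eventual = EventualBakery()
ticket = eventual.submit({"item": "bread"})
print(ticket.tracking_id in eventual.db)  # False: only a ticket so far
eventual.process_pending()
print(ticket.tracking_id in eventual.db)  # True
```

The design choice worth noticing is that `Accepted` carries no result at all, so a caller cannot mistake it for a finished save.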
What happens if you apply both consistencies and your system loses its identity?
Eventually Transactional 😵
Back to the microservices system I worked on: it had a form that users fill in to save a request. Let me show you the user's point of view vs. what actually happens in the background.
User POV:
- Loading screen.
What's actually happening?
1. Clicking save calls an API in the request (aggregate root) service, which publishes a message to a Kafka topic.
2. A consumer (hosted in the same request service that published the message, by the way) consumes this message (eventually) and saves the request in the database.
3. The consumer goes on to save the request details in all the other concerned databases.
4. It then prepares an aggregated document and publishes it to another Kafka topic.
5. The Elasticsearch Kafka Connector consumes that message (eventually) and saves it in an index.
6. The notification service also consumes the message (eventually) and sends a notification through SignalR to the client-side to stop the loading screen and confirm that the request was saved successfully.
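The flow above can be modeled in a few lines of Python to show where the client loses track. This is a hypothetical, in-memory sketch, not the project's code: deques stand in for the two Kafka topics, dicts for the databases and the Elasticsearch index, and a list for the SignalR messages. In real Kafka, steps 5 and 6 would be separate consumer groups with their own offsets; one loop plays both roles here to keep it short.

```python
from collections import deque


class Pipeline:
    """Hypothetical in-memory model of the six steps described above."""

    def __init__(self):
        self.request_topic = deque()    # stands in for the first Kafka topic
        self.aggregate_topic = deque()  # stands in for the second Kafka topic
        self.requests_db = {}
        self.details_db = {}
        self.search_index = {}          # stands in for the Elasticsearch index
        self.notifications = []         # what SignalR would push to the client

    def save_api(self, request_id, form):
        # Step 1: publish and return at once; the response says nothing about the outcome.
        self.request_topic.append((request_id, form))
        return {"status": "published"}

    def run_request_consumer(self):
        # Steps 2-4 run whenever the consumer gets to the message (eventually).
        while self.request_topic:
            request_id, form = self.request_topic.popleft()
            self.requests_db[request_id] = form                              # step 2
            self.details_db[request_id] = form.get("details", [])            # step 3
            self.aggregate_topic.append((request_id, dict(form, id=request_id)))  # step 4

    def run_downstream_consumers(self, notifier_fails=False):
        # Steps 5 and 6, simplified into a single loop (see the note above).
        while self.aggregate_topic:
            request_id, doc = self.aggregate_topic.popleft()
            self.search_index[request_id] = doc                              # step 5
            if not notifier_fails:
                self.notifications.append(("saved", request_id))             # step 6


p = Pipeline()
print(p.save_api("r1", {"title": "Request", "details": ["a"]}))  # {'status': 'published'}
p.run_request_consumer()
p.run_downstream_consumers(notifier_fails=True)  # simulate a step 6 failure
print("r1" in p.requests_db, p.notifications)    # True []
```

Everything the client learns after step 1 has to travel through the last step; if that step fails, the request is saved but the client never hears about it.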
What's wrong with that?
- After the save button calls the API that publishes the message, the client-side has no idea what happens next. The loading screen is there for no other reason than to pacify the user.
- If any error occurs after step 1, the client-side has no way of knowing unless a message is published to the save-failed topic. So most of the time, the user just got the generic "Something went wrong. Please try again later." message.
- The user has no idea that this is an asynchronous operation and is expected to wait indefinitely.
- If, for example, step 5 failed due to a schema conflict and the user tried again, a duplicate data error would pop up, because only step 5 failed, not step 2. This is an issue I talked about in my previous post.
- Sometimes the notification service also hits a schema conflict while consuming a message and can't tell the client-side that the request was saved successfully. The system then reports that the save failed, yet when the user goes to the requests page, they find their request there, because only step 6 failed.
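The duplicate-data error on retry comes from the steps committing independently, with nothing rolling back step 2 when a later step fails. The sketch below is hypothetical and collapses the asynchronous steps into one synchronous call purely to keep it short; the class, the business key and the error types are made up for illustration. Step 2 writes the request and keeps it, step 5 then fails, and the retry the user was told to attempt runs into the uniqueness check.

```python
class DuplicateDataError(Exception):
    """Raised when a request with the same business key already exists."""


class NonAtomicSave:
    """Hypothetical model: steps 2 and 5 commit separately, with no rollback between them."""

    def __init__(self):
        self.requests_db = {}   # written by step 2
        self.search_index = {}  # written by step 5

    def save(self, key, form, index_fails=False):
        if key in self.requests_db:  # unique constraint on the business key
            raise DuplicateDataError(f"request {key!r} already exists")
        self.requests_db[key] = form  # step 2 succeeds and stays committed
        if index_fails:
            raise RuntimeError("schema conflict")  # step 5 fails afterwards
        self.search_index[key] = form


store = NonAtomicSave()
try:
    store.save("REQ-1", {"title": "Request"}, index_fails=True)
except RuntimeError:
    pass  # the user only sees "Something went wrong. Please try again later."
try:
    store.save("REQ-1", {"title": "Request"})  # the user does as told and retries
except DuplicateDataError as err:
    print(err)  # request 'REQ-1' already exists
```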
The problem with the loading screen is that it's like a prison: it prevents the user from doing anything but wait.
And if any of the steps fails for any reason, the loading goes on forever. A horrible fix was introduced that made matters even worse: a static timeout for the loading screen, which is the equivalent of giving someone with a gunshot wound a painkiller.
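A static timeout only changes how long the client waits before guessing. The asyncio sketch below is hypothetical: a queue plays the SignalR channel, the delays are arbitrary, and the backend saves the request but never sends the confirmation (the step 6 failure described above). The client times out and shows the generic error even though the request exists.

```python
import asyncio

GENERIC_ERROR = "Something went wrong. Please try again later."


async def wait_for_confirmation(notifications: asyncio.Queue, timeout_s: float) -> str:
    """Hypothetical client logic: keep the loading screen up until a push or a fixed timeout."""
    try:
        await asyncio.wait_for(notifications.get(), timeout=timeout_s)
        return "Request saved successfully."
    except asyncio.TimeoutError:
        # A timeout cannot tell "still running", "failed" and "saved but never notified" apart.
        return GENERIC_ERROR


async def demo(notifier_fails: bool, timeout_s: float = 0.2):
    db = {}
    notifications: asyncio.Queue = asyncio.Queue()

    async def backend():
        await asyncio.sleep(0.05)          # arbitrary delay; in reality there is no fixed duration
        db["r1"] = {"title": "Request"}    # the request is saved either way
        if not notifier_fails:
            await notifications.put("saved")  # step 6 reaches the client

    task = asyncio.create_task(backend())
    message = await wait_for_confirmation(notifications, timeout_s)
    await task
    return message, "r1" in db


print(asyncio.run(demo(notifier_fails=True)))
# ('Something went wrong. Please try again later.', True)
```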
What should be done?
- As I said, this is an initial design problem. Your system has to have an identity. If you decide to go with eventual consistency, you have to commit to this identity throughout your whole solution.
- Transparency with the user is very important. They have to know what to expect so that they don't get frustrated.
- The user should not have to stare at a loading screen for an indefinite amount of time while the asynchronous transactions finish. They should understand that the request will not be saved immediately, but that it will eventually be saved and that they will be notified once it is. If anything goes wrong during the saving process, they have to understand exactly what went wrong so that they know what to avoid.
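One way to commit to the eventual identity is to hand the user a ticket right away and let every step report into it. The tracker below is a hypothetical sketch; the names, statuses and messages are made up for illustration, and a list stands in for real notifications. The user is told up front that the save will happen later, gets notified when it does, and on failure sees which step failed and why.

```python
import uuid
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Status(Enum):
    PENDING = "pending"
    SAVED = "saved"
    FAILED = "failed"


@dataclass
class Ticket:
    id: str
    status: Status = Status.PENDING
    failed_step: Optional[str] = None
    reason: Optional[str] = None
    message: str = "Your request was received. It will be saved shortly and you will be notified."


class RequestTracker:
    """Hypothetical status store that every step of the pipeline reports to."""

    def __init__(self):
        self.tickets = {}  # ticket id -> Ticket
        self.inbox = []    # stand-in for notifications delivered to the user

    def submit(self) -> Ticket:
        # Returned immediately: the user is told up front that the save is asynchronous.
        ticket = Ticket(id=str(uuid.uuid4()))
        self.tickets[ticket.id] = ticket
        return ticket

    def report_success(self, ticket_id: str) -> None:
        ticket = self.tickets[ticket_id]
        ticket.status = Status.SAVED
        ticket.message = "Your request was saved."
        self.inbox.append(ticket.message)

    def report_failure(self, ticket_id: str, step: str, reason: str) -> None:
        # Record exactly which step failed and why, instead of a generic error.
        ticket = self.tickets[ticket_id]
        ticket.status = Status.FAILED
        ticket.failed_step, ticket.reason = step, reason
        ticket.message = f"Saving failed at {step}: {reason}"
        self.inbox.append(ticket.message)

    def status(self, ticket_id: str) -> Ticket:
        # Lets the user check progress at any time instead of watching a spinner.
        return self.tickets[ticket_id]


tracker = RequestTracker()
ticket = tracker.submit()
print(ticket.status.value)  # pending
tracker.report_failure(ticket.id, "step 5", "schema conflict on field 'date'")
print(tracker.status(ticket.id).message)  # Saving failed at step 5: schema conflict on field 'date'
```

A status lookup like `status` also makes the static timeout unnecessary: the client can stop waiting and check back later without losing track of the request.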
In all honesty, I have learned a lot from this overengineered project. All the problems I faced and the interesting design decisions that were made have made me a better software engineer. I am grateful for going through such an experience.
I will continue to learn from my mistakes and other people's mistakes to become better. I do not claim to be better than the people who worked on this project before me; I'm only sharing my experience to help others and guide them away from repeating what made us struggle and suffer.
This will be my last article about this project, but rest assured, if I ever have another nightmare that reminds me of another problem I faced while working on this masterpiece, I will be back.