Microservice Architecture Case Study: Hidden Category

Developing my side project I've bumped into an interesting challenge, relating to data synchronization in microservices. In short, I want to mark a category as hidden, simultaneously hiding items in a category from an unfiltered list. The challenge seems to be pretty common, still having practised microservices for a few years I couldn't find a standard solution. Here I will share some different options and my thoughts on them, finishing with the decision I've finally made. If that sounds interesting let me give you details of the problem first.

The Problem

The Finrir app, in which the challenge occurs, allows a user to track expenses. For better organization, the app has a thing called Contexts, which is essentially a folder for expenses. The app talks to a Gateway, which talks to other microservices, in this particular case the microservices we are interested in are Contexts and Expenses microservices. Contexts microservice stores a list of user contexts and their configuration, Expenses microservice stores various data about an expense, including a reference to a context to which the expense belongs.

The goal is to architect the handling of a "Hide Context" command. When a context is hidden it will not appear in the filter dropdown and, more interestingly, expenses from the context are not shown anywhere e.g. if a user picks "any" for contexts filter expenses from hidden contexts are still excluded from the list.

Here's a very simplified pseudocode of how this would work if it were an in-memory system.

ruby

hideContext
    theContext = contexts.by @contextId
    theContext.hidden = true

allContexts
    return contexts.where =>
        @context.hidden != true

allExpenses
    return expenses.where => 
        @expense.context.hidden != true

The root problem of the challenge is that the expenses microservice does not (and probably shouldn't) know if the context is hidden or not. The article will not discuss whether the list of services is correct - though I'm open for discussion in the comments. Other than that, we can architect the system however we want. Let me share with you the options I've come up with!

"Just use a Saga! 🤓"
Unfortunately, that alone doesn't solve anything. Things I say about every option below are as applicable to saga steps as it is to procedural calls. It may change the complexity difference, though. So your final choice may easily be different than mine.

Option 1. Gateway Filtering

The first option is to not store the hidden flag on expenses at all. Instead, if the expenses query doesn't filter them on a specific context do a cleanup in a gateway: get hidden contexts from contexts service and exclude expenses belonging to those services. Here's how the option looks in pseudocode:

ruby

Gateway
    hideContext
        contextsService.patch @context hidden:true
    
    getExpenses
        expenses = _expensesService.get @request.query

        if @request.query.context == null
            hiddenContexts = contextsService.get @userId hidden:true
            expenses = expenses.excluding =>
                hiddenContexts.contains @expense
        
        return expenses

Although this option is pretty straightforward, there are quite a few pitfalls in this approach. But let me start with what I like about it:

Benefits:

Data Consistency Guarantee. Since we don't need to synchronize data in any distributed storage we get a level of data consistency we'll if that would be a single service
Fast update. As there's an absolute minimum of operations in hideContext we'll get the fastest possible update.

Drawbacks:

Slow read. With this approach, we accept both reading expenses that will not end up in a result and doing a roundtrip to contexts service to fetch hidden contexts. This is a big overhead, especially knowing that in the app the read operation is way more common than the update operation, the performance of which we can avoid caring too much about.
Explicit orchestration. This problem is present for every option with the word Gateway in it. Let me first explain what I mean by "orchestration". My informal definition would be: ensuring that the data distributed among services is adequately represented i.e. expenses are hidden when parent context is marked hidden. When the orchestration is done explicitly i.e. by a particular service we fall into the risk of the service becoming too big, which seems quite contrary to the whole idea of creating microservices.
Overinformed read. This is closely related to explicit orchestration. In this case "read" operation has an implicit dependency on the "update" operation. Keeping in mind, that we talk about "read" and "update" operations of different entities (contexts and expenses) you probably can see how such dependency could result in a maintenance nightmare.

Although the option solves the root problem of the issue (data inconsistency) it brings a whole set of other unpleasant problems. Let's see what we'll get if we were to pick a more direct approach.

Option 2. Unchecked Sequential Actions In the Gateway

So how about we just update expenses after updating the context, just call the two services one by one? Something like this:

ruby

Gateway
    hideContexts
        contextsService.patch @userId @context hidden:true
        expensesService.patch
            by 
                context = @context
            set
                hidden = true

I assume if you are experienced in microservices you already see the main problem with that. If the second operation fails we'll have an inconsistent state: context is hidden, but the expenses in it are visible. But more on that later, let's start with something more optimistic.

Benefits:

Straightforwardness. I just couldn't think of any more obvious way to achieve the result. It's simple to write, simple to understand, and simple to reason about.
Fast read. We don't even touch the read operation in this case, so it's the simplest possible read - just read all expenses without the hidden flag.

Drawbacks:

Potential data inconsistency. This is the problem I've talked about before. If the second operation fails, we'll have an inconsistent state: context is hidden, but the expenses in it are visible.
Slow update. We'll now do virtually everything in the update section, which obviously increases the amount of resources we need to execute the operation.
Explicit orchestration.

The good news, about data inconsistency is that it's not that critical. A user will see an error occurring so the inconsistency will be at least somewhat expected and the user can restore the adequate state by switching the hidden toggle on and off, in other words, the idempotency of the operation makes data inconsistency not that terrible. But that's the thing about microservices: we lose atomicity, which we took for granted in the good old monolith. Let's find out what we can do about it.

Option 3. Sequential Actions With Compensation In the Gateway

Well, the first, easy thing that comes to mind is to perform a compensation action. Meaning if we weren't able to update expenses before returning an error also update the context to be visible again. Something along those lines:

ruby

Gateway
    hideContexts
        contextsService.patch @userId @context hidden:true
        try
            expensesService.patch
                by 
                    context = @context
                set
                    hidden = true
        catch
            contextsService.patch @context hidden:false
            throw

Benefits:

Lowered data inconsistency risk. This is the point of this option. In the scenario when the expenses update fails we restore a valid state (like it was before the operation).
Relatively straightforward. Although this option is slightly more complicated it's still relatively easy to comprehend.
Fast read.

Drawbacks:

Still potential data inconsistency. Compensations being a solution to "in case it fails" has an obvious risk of, "but what if the compensation fails". Although the risk of two independent services failing is much lower it's still surely possible if there's for example an internet connection problem.
Explicit orchestration.
Slow update.

Although the option seems to be more stable it's not an ultimate solution. Let's try something more extravagant!

Option 4. Sequential Actions With Filtering In the Gateway

We have late filtering, that guarantees data consistency but comes with a performance hit. At the same time, we have caught an update, that is relatively fast but is not certain. How about we do the late filtering only if we know that there was an error? This might look something like this:

ruby

Gateway
    hideContexts
        contextsService.patch @userId @context hidden:true
        try
            expensesService.patch
                by 
                    context = @context
                set
                    hidden = true
        catch
            errors.add @userId 'contextHiding' @context

    getExpenses
        expenses = _expensesService.get @request.query

        if @request.query.context == null
            if errors.has @userId 'contextHiding'
                hiddenContexts = contextsService.get @userId hidden:true
                expenses = expenses.excluding =>
                    hiddenContexts.contains @expense
        
        return expenses

We can also try patching the expenses if an error is present, before executing the filtering (so that error is not present the next time)

Benefits:

Supposed data consistency. If we are able to implement the error storage that approach doesn't seem to have a path to data inconsistency
Positive feedback. In case the second operation fails, it will be not visible to a user, we'll just handle the error under the hood.
Relatively fast read. For happy flow, the overhead would be roughly equal to the time we need to check if an error exists. If the error storage will be implemented to be fast the overhead shouldn't be too big.

Drawbacks:

Need for the complicated error storage. Note how every benefit started with "if". This is because we'll need the error storage to be distributed as well. So implementing it will probably be a great challenge on its own.
Slow update.
Explicit Orchestration.
Overinformed read.

I have very mixed feelings about this solution. It sort of solves all technical challenges, but it screams over-engineering. Although I can think of ways to make it even more sophisticated it's already beyond complexity I'll be ready to put in a real system. So let's not go deeper on that path and instead pivot our brainstorming a little bit.

Option 5. Untracked Events

Options 1 - 4 all have explicit orchestration as a drawback. But what's the alternative? Well, we can implement an implicit orchestration via events. The Gateway service will know nothing about the need to update expenses, but the expenses service will receive events about contexts being hidden and update its records accordingly:

ruby

Gateway
    hideContexts
        contextsService.patch @userId @context hidden:true

Contexts
    patchContexts
        updatedContext = db.contexts.update @contextId hidden:true
        bus.publish contextHiddenEvent

Expenses
    onContextHiddenEvent
        db.update
            by =
                context = @context
            set =
                hidden = true

Benefits:

Fast read. Reading doesn't receive any overhead
Fast update. The update stays as tiny as possible, too
Implicit Orchestration. In this configuration, the Gateway service controls nothing about the dependency between data of the services.

Drawbacks:

Data inconsistency. Publishing and handling events take time (in the background). Therefore, if read expenses request in send right after the hide contexts command it's virtually guaranteed to get an invalid result.
Hidden problems. Unlike in sequential calls approach if the second operation fails user will not just get inconsistent results, but also have no idea something wrong even happened.

Although the benefits of this approach are great the price seems unbearable. Let's check out one more fancy idea of overcoming the problem and call it an article.

Option 6. Events With Filtering In the Gateway

Well, the root problem of the previous approach is that Gateway service knows nothing about data synchronization and its status, and thus can not save users from data inconsistencies. How about we wait for a message indicating that data synchronization is complete and do the protective late filtering for our users before the synchronization is complete? It should go like this:

ruby

Gateway
    hideContexts
        contextsService.patch @userId @context hidden:true @traceId
        awaitedOperations.add 'contextHiding' @userId @traceId
        
    onExpensesUpdated
        awaitedOperations.remove @traceId

    getExpenses
        expenses = _expensesService.get @request.query

        if @request.query.context == null
            if awaitedOperations.has 'contextHiding' @userId
                hiddenContexts = contextsService.get @userId hidden:true
                expenses = expenses.excluding =>
                    hiddenContexts.contains @expense
        
        return expenses

Contexts
    patchContexts
        updatedContext = db.contexts.update @contextId hidden:true
        bus.publish contextHiddenEvent @traceId

Expenses
    onContextHiddenEvent
        db.update
            by = context:@context
            set = hidden:true
        bus.publish expensesUpdated @traceId

Benefits:

Supposed data consistency.
Relatively fast read.
Relatively fast update.

Drawbacks:

Need for a complicated awaited operations storage.
Semi-explicit orchesration. Why the "semi", how is it different from all the other orchestrations? Well, unlike in other explicit orchestrations removing the orchestrator will not make the system unusable, instead, we sort of degrade to option 6.
Overinformed read. With all said above, it is worth mentioning that this option produces a lot of tricky dependencies.

This was the last option. I feel like that covers at least the main options I can come up with. Now, how about I give you my final take on the options?

What Option Did I Choose?

I've decided to go with option 2 (unchecked sequential actions in the gateway). That approach requires the least effort and although it may produce data inconsistency, that inconsistency is: a. Anticipated, b. Repairable. In other words, for a corner case asking a user to toggle the hidden toggle back and forth doesn't seem like too high a price, provided that the user will see that some error has occurred.

Regarding the danger of Gateway becoming "too big", I'm indeed not happy about the solution from this perspective. However, my experience tells me that making a decision, based primarily on a thing that may happen is the dominant reason for over-engineering, it is sort of a premature optimization of architecture. So the thing I tell myself on "it may become too big!" is "yes, but it may not!".

If you would take a different path I would be happy to discuss it in the comments! And, as always, claps are appreciated! 👏

Microservice Architecture Case Study: Hidden Category ​

The Problem ​

Option 1. Gateway Filtering ​

Option 2. Unchecked Sequential Actions In the Gateway ​

Option 3. Sequential Actions With Compensation In the Gateway ​

Option 4. Sequential Actions With Filtering In the Gateway ​

Option 5. Untracked Events ​

Option 6. Events With Filtering In the Gateway ​

What Option Did I Choose? ​

Microservice Architecture Case Study: Hidden Category

The Problem

Option 1. Gateway Filtering

Option 2. Unchecked Sequential Actions In the Gateway

Option 3. Sequential Actions With Compensation In the Gateway

Option 4. Sequential Actions With Filtering In the Gateway

Option 5. Untracked Events

Option 6. Events With Filtering In the Gateway

What Option Did I Choose?