Taking advantage of CQRS in legacy applications



This content originally appeared on DEV Community and was authored by Carlos Gándara

When we think about Command Query Responsibility Segregation (CQRS), chances are our mental model is an application built around dispatching events from a write model and processing them to update a read model, with event stores, message buses, projections, and so on.

In this post we will explore how we can benefit from CQRS without all that tooling, which may seem impossible to introduce in legacy systems. To do so, we will look at Pattern CQRS, a low-level approach to separating command and query operations.

I will refer to operations or use cases that change state as commands or writes. After one of these runs, the system is different from how it was before. Example: activating a new user.

For operations or use cases that read data I will use query or read. No matter how many times you run them, the system stays unchanged as the operation has no side effects. Example: listing all users.
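To make the distinction concrete, here is a minimal sketch. The `UserRegistry` class and its methods are hypothetical and only stand in for "the system"; they are not part of the example codebase used later in the post.

```php
<?php

// Hypothetical in-memory stand-in for the system state, for illustration only.
final class UserRegistry
{
    /** @var array<string, bool> email => active flag */
    private array $users = [];

    // Command: changes state and returns nothing.
    public function register(string $email): void
    {
        $this->users[$email] = false;
    }

    // Command: changes state and returns nothing.
    public function activate(string $email): void
    {
        if (array_key_exists($email, $this->users)) {
            $this->users[$email] = true;
        }
    }

    /**
     * Query: returns data and leaves the state untouched.
     *
     * @return list<string>
     */
    public function activeEmails(): array
    {
        // array_filter without a callback keeps only the truthy (active) entries.
        return array_keys(array_filter($this->users));
    }
}

$registry = new UserRegistry();
$registry->register('ada@example.com');
$registry->register('bob@example.com');
$registry->activate('ada@example.com');

$first = $registry->activeEmails();
$second = $registry->activeEmails(); // running the query again changes nothing
```

The commands return `void` and the query has no side effects, so calling it any number of times yields the same answer until another command runs.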

Architectural CQRS vs Pattern CQRS

CQRS is, in essence, about separating the code responsible for changing state from the code responsible for reading data, which is literally what the pattern name means.

This separation may shape the whole application architecture or be applied in a more granular way. Both options have their ups and downs.

Note: Architectural and Pattern CQRS are just names I made up for this post. This is not standardized terminology.

Architectural CQRS implies the architecture is built around it and maximizes the pattern benefits. In exchange, it requires specific tooling (like the pieces mentioned before: events, buses, etc.). Since changing the architecture of an application is usually a costly process, even when done iteratively, applying this approach to existing codebases may mean a significant effort.

Pattern CQRS is a lower-level thingy, and therefore easy to apply on a case-by-case basis. It will probably be easier to refactor an existing application with this approach, even if the benefits come in smaller bits.

There is this excellent post from Alberte Mozo, explaining the pattern's initial formulation and how neither the architectural nor the pattern approach (nor anything in between) is necessarily better or more pure, if being pure is even a relevant thing.

Here we are interested in Pattern CQRS in a brownfield project, and we will use a simplified example to illustrate it.

A familiar scenario

Let’s assume an existing codebase with the following:

  • The application is organized in use cases.
  • There is some abstraction to fetch entities from our domain model, like repositories.
  • The entities are modeled to cover both command and query operations.
  • Some read operations involve many entities.

For instance, we have a use case to activate users and another one to list all users. The User entity is used in both, but listing users requires showing the subscription plan each user has, so we need to involve the Subscription entity in the listing as well.

class User {
    public function __construct(
        //Many other properties
        public string $email,
        private Status $status,
    ) {}

    public function activate(): void {
        if ($this->status->canBeActivated()) {
            $this->status = $this->status->toActive();
        }
    }

    public function getStatus(): string {
        return $this->status->toString();
    }
}

class Subscription {
    public function __construct(
        //many properties unrelated to listing users
        private Type $type,
    ) {}

    public function typeAsString(): string {
        return $this->type->asString();
    }
}

The activation use case is a command and uses User to change state:

class ActivateUser {
    public function run(
        UserRepository $userRepository,
        UserId $userId
    ): void {
        $user = $userRepository->getWithId($userId);
        $user->activate();
    }
}

While listing users uses both entities as the data required is scattered among them:

class ListUsers {
    public function run(
        UserRepository $userRepository,
        SubscriptionRepository $subscriptionRepository,
    ): ListUserCollection {
        $users = $userRepository->findAll();

        $response = new ListUserCollection();        
        foreach ($users as $user) {
            $userSubscription = $subscriptionRepository->forUser($user->id());
            $response = $response->add(
                [
                    'email' => $user->email,
                    'subscription_type' => $userSubscription->typeAsString(),
                    //many other fields
                ]
            );
        }

        return $response;
    }
}

There are a number of potential problems here.

Entities are messed up. They cover both types of operations, and chances are some of their properties are there just to show data while others are also used to perform business logic. The properties used by read use cases are accessible either via getters or public properties in order to build a response. This breaks encapsulation and harms our ability to refactor and evolve the domain model. Not to mention that the entities are bloated with properties not intended for business logic.

Our example is simplified, but we can assume User has far too many properties. Subscription as well, and it’s especially painful here because we only care about the string representation of its type. We need to include getters for a string representation of user status and subscription type, which makes changing them more complex than if they were an internal concern of the entities.

Another potential problem is performance and wasted resources. Overall, read use cases are expected to be served faster because there is somebody waiting for a response on the other side of the wire. Since we use entities to access data, a read use case may require many of them to pick just small bits from each. Entities will include many other properties that are irrelevant for the current needs, but time and resources are spent fetching them anyway.

In our example, if building a Subscription requires many joins at database level, we are incurring that overhead just to get the type.

CQRS benefits

By applying CQRS we prevent the above issues by creating different models for the write and read operations. Entities are freed of read-only data and we introduce a separate read model that is only a data structure with no business logic involved.

Our write model can evolve in a more flexible way because there are no extra properties and the exposed public interface is smaller.

The read model is ad-hoc for the query use cases and the data access operations can be optimized for faster responses.

Both Architectural and Pattern CQRS help with better entity modelling, while the optimized reads are far more powerful in Architectural CQRS than in the Pattern version, which may still require some query gymnastics to access all the data.

Implementing Pattern CQRS

In Architectural CQRS the write model (the entities) publishes events when there is a state change. These events are processed to update a persistence dedicated only to the read model, and for read operations we use simple SELECT queries. We can even use a different persistence technology to favor read speed. As mentioned, it influences the whole architecture and needs specific tooling to accommodate it.
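For contrast with what follows, this is a rough in-memory sketch of the event flow just described. Every name here (`UserActivated`, `releaseEvents()`, `UserListProjection`) is hypothetical, and a plain `foreach` stands in for the message bus and the dedicated read storage a real setup would use.

```php
<?php

// Hypothetical event emitted by the write model when a user is activated.
final class UserActivated
{
    public function __construct(public string $userId) {}
}

// Hypothetical write-side entity that records events on state changes.
final class User
{
    /** @var list<object> */
    private array $recordedEvents = [];
    private bool $active = false;

    public function __construct(private string $id) {}

    public function activate(): void
    {
        if (!$this->active) {
            $this->active = true;
            $this->recordedEvents[] = new UserActivated($this->id);
        }
    }

    /** @return list<object> events recorded since the last call */
    public function releaseEvents(): array
    {
        $events = $this->recordedEvents;
        $this->recordedEvents = [];

        return $events;
    }
}

// Hypothetical projection: keeps a denormalized read model up to date.
final class UserListProjection
{
    /** @var array<string, array{status: string}> */
    private array $rows = [];

    public function apply(object $event): void
    {
        if ($event instanceof UserActivated) {
            $this->rows[$event->userId] = ['status' => 'active'];
        }
    }

    /** @return array<string, array{status: string}> */
    public function all(): array
    {
        return $this->rows;
    }
}

$user = new User('user-1');
$user->activate();

$projection = new UserListProjection();
foreach ($user->releaseEvents() as $event) {
    $projection->apply($event); // a message bus would deliver this in a real system
}
```

Even in this toy form there are three moving parts (events, a way to deliver them, a projection), which hints at why retrofitting the approach into a legacy codebase is costly.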

In Pattern CQRS we give up some of the benefits of the Architectural version, but its implementation is more straightforward. Introducing it in existing applications is overall simpler and gives immediate benefits.

  • Create a new model for the read use cases consisting only of the data they require.
  • Create a new data access abstraction that fetches this model.
  • Replace the usage of the write model repositories with the new abstraction, and use the read model to produce the response.
  • Profit.

Applied to our example, the activate use case stays the same but the read is simplified to:

class ListUsers {
    public function run(
        ListUsersQuery $query,
    ): ListUserCollection {
        return $query->findAll();
    }
}

The ListUsersQuery takes care of running the query that fetches exactly the data needed.

The User entity no longer requires data intended for read operations and has a smaller public interface:
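The post does not show what sits behind `ListUsersQuery`, so the following is only one possible sketch. It assumes `ListUsersQuery` is an interface, that `ListUserCollection` is an immutable collection with a `toArray()` method, and that the data lives in `users` and `subscriptions` tables; all of those, along with the `PdoListUsersQuery` name, are assumptions made for illustration. It uses PDO with an in-memory SQLite database (which needs the `pdo_sqlite` extension) so the snippet can run on its own.

```php
<?php

// Assumed shape of the response collection: immutable, add() returns a new instance.
final class ListUserCollection
{
    /** @param list<array<string, string>> $items */
    public function __construct(private array $items = []) {}

    public function add(array $item): self
    {
        return new self(array_merge($this->items, [$item]));
    }

    /** @return list<array<string, string>> */
    public function toArray(): array
    {
        return $this->items;
    }
}

// Data access abstraction for the read side (assumed to be an interface).
interface ListUsersQuery
{
    public function findAll(): ListUserCollection;
}

// Hypothetical implementation: one query selecting only the columns the listing needs.
final class PdoListUsersQuery implements ListUsersQuery
{
    public function __construct(private PDO $connection) {}

    public function findAll(): ListUserCollection
    {
        $statement = $this->connection->query(
            'SELECT u.email, s.type AS subscription_type
             FROM users u
             JOIN subscriptions s ON s.user_id = u.id
             ORDER BY u.email'
        );

        $collection = new ListUserCollection();
        foreach ($statement->fetchAll(PDO::FETCH_ASSOC) as $row) {
            $collection = $collection->add($row);
        }

        return $collection;
    }
}

// Usage against a throwaway in-memory database with a made-up schema.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, status TEXT)');
$pdo->exec('CREATE TABLE subscriptions (user_id INTEGER, type TEXT)');
$pdo->exec("INSERT INTO users VALUES (1, 'ada@example.com', 'active'), (2, 'bob@example.com', 'pending')");
$pdo->exec("INSERT INTO subscriptions VALUES (1, 'premium'), (2, 'free')");

$query = new PdoListUsersQuery($pdo);
$result = $query->findAll();
```

No User or Subscription entity is built here: a single join replaces one repository call per user, and the query touches the same tables the write model uses.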

class User {
    public function __construct(
        //Other properties, but none for read operations
        private Status $status,
    ) {}

    public function activate(): void {
        //same as before
    }
}

Changing its internal representation involves fewer changes since it’s not exposed anymore.

Upsides

As we see in our naive example we don’t need to change the command use cases, and for the query use case we introduce a new model and the abstraction to get it from persistence. This can be done in an iterative way and with a reduced amount of changes introduced each time.

While we migrate read use cases to Pattern CQRS, we can incrementally get rid of unused properties in our entities. They become less verbose and include only data used for command operations. A smaller public interface means better encapsulation and makes refactoring and evolving them easier than with a bigger public surface.

Actually, the more we attach entities to read operations the more we are inclined to mimic the database when designing them, which harms the domain design flexibility. CQRS helps to reduce that temptation.

Performance can potentially improve as well, since the queries fetch only what we need.

Finally, by applying Pattern CQRS we end up with a use case design prepared to transition to Architectural CQRS more easily, since the read use cases will be basically the same. Many other changes are needed, but in other places; the transition is easier but not trivial.

Downsides

There is always a tradeoff and Pattern CQRS is no exception. We used a simple example in this post, but sometimes (always) reality is not that straightforward.

If our read use case requires data that is computed through business logic, we would need to fetch the entities anyway to run the calculations (or to replicate the business logic in the queries, which is a very bad idea).

For instance, if listing orders shows the order total and this value is not persisted but calculated in the Order entity (by adding each order line total, applying taxes, discounts, etc.) we will need to fetch and use Order to build our response.
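A small sketch of that situation follows. The `Order` and `OrderLine` classes, the tax and discount rules, and the `listOrders()` function are all invented for illustration; the only point is that the total exists solely as entity logic, so the read side cannot get it from a plain query.

```php
<?php

// Hypothetical order line; amounts are integer cents to avoid float rounding.
final class OrderLine
{
    public function __construct(
        private int $unitPriceCents,
        private int $quantity,
    ) {}

    public function total(): int
    {
        return $this->unitPriceCents * $this->quantity;
    }
}

// Hypothetical Order entity: the total is business logic, not a stored column.
final class Order
{
    /** @param list<OrderLine> $lines */
    public function __construct(
        private string $id,
        private array $lines,
        private int $taxPercent,
        private int $discountCents,
    ) {}

    public function id(): string
    {
        return $this->id;
    }

    public function total(): int
    {
        $subtotal = 0;
        foreach ($this->lines as $line) {
            $subtotal += $line->total();
        }

        // Made-up rule: apply the discount first, then add tax on the remainder.
        $discounted = max(0, $subtotal - $this->discountCents);

        return $discounted + intdiv($discounted * $this->taxPercent, 100);
    }
}

/**
 * The read use case still has to work with entities to obtain the computed total.
 *
 * @param list<Order> $orders
 * @return list<array{id: string, total_cents: int}>
 */
function listOrders(array $orders): array
{
    return array_map(
        fn (Order $order): array => ['id' => $order->id(), 'total_cents' => $order->total()],
        $orders
    );
}

$orders = [new Order('order-1', [new OrderLine(1000, 2), new OrderLine(500, 1)], 10, 500)];
$listing = listOrders($orders);
```

With these numbers the subtotal is 2500 cents, the discount brings it to 2000, and 10% tax gives a total of 2200.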

We should evaluate whether adopting CQRS is worth the effort, as it may reduce clarity and even degrade performance due to fetching entities on top of the read model.

From a maintenance perspective, by adopting Pattern CQRS we are introducing a new model without fully breaking the persistence coupling between write and read models. We should expect certain maintenance overhead. Simple domains without relevant business logic may be totally fine mixing up write and read data since they tend to not change much, and we may be investing resources in a separation that may not reap any benefits.

Conclusion

We have seen how we can benefit from CQRS by introducing it in existing applications without a big bang architectural change. How beneficial this can be should be analyzed in each case, since it is not a silver bullet, and it can sometimes do more harm than good.

If you, like me, are annoyed by unreadable entities cluttered with getters and so attached to the database design that it is cumbersome to evolve them, consider giving Pattern CQRS a try. A single well-chosen use case refactor can show its potential right away and convince skeptics and those wary of over-engineering.

[Cover image](https://commons.wikimedia.org/wiki/File:Fountain_pen_writing(literacy).jpg) by PetarM, under Creative Commons Attribution-Share Alike 4.0 International license.

