UMLBoard

repository

a domain-driven design pattern

Separate your domain logic from your persistence logic.

The Repository Design Pattern

The Repository is a central design pattern of the Domain-Driven Design (DDD) philosophy. As such, it's relatively common in larger enterprise code bases. The idea behind DDD is to make the business model and its domain concepts the central aspect of your software design -- in contrast to earlier strategies, where the main focus was on technology and implementation. In this context, the Repository aims to decouple the domain logic from the underlying database layer, making your architecture more flexible and easier to maintain.

We will see in a minute how this works, but let's start with some basics first.

Database Access

Almost every app needs to store data, be it the customer's address for an online shop or the current world state in a video game. For your data store, there is a variety of technologies available: relational SQL, document-based NoSQL, or even your own proprietary binary format -- it's totally up to you.

The tricky part comes when you want to integrate your data store logic into your application code. Often, your database and programming language use different paradigms (relational vs. object-oriented, schemaless vs. statically typed... the variations are endless). And it doesn't stop there: there is also a mismatch on a conceptual level. Your application is about domain concepts and business logic, while your database is about tables and rows -- you know, all that low-level stuff.

Bridging both worlds is a challenging task. Luckily, there are solutions to these problems. Let's look at some of them:

Direct Access

The easiest way is to incorporate database access directly into your application code, e.g., by sending SQL statements from your business service or application controller. While this can be a reasonable solution for smaller applications or rapid prototyping, it can soon become unmanageable when your application starts to grow (and trust me, almost every application does this). Some problems you're going to face then are:

Your data access code gets scattered around the whole application, increasing the risk that query or update logic is duplicated at several places.
Business logic and database technology become strongly coupled, making it more challenging to update or replace one of them.
Domain rules can be broken. Your domain model may have invariants that must always hold. Direct database access allows developers to bypass these constraints and save entities in an inconsistent state.

The larger the application gets, the more severe these issues can get.
Doesn't sound too good? Yeah, we better check out some other options...

Data Mapper

Instead of invoking the database directly, you can use a Data Mapper that controls all database calls. That way, enforcing domain rules and avoiding duplications is easier. But from a conceptual point of view, this pattern is still close to the database, as it's basically just a wrapper around your database API, often with a direct mapping between domain entities and database tables.
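As a rough illustration of the idea, the following sketch shows what a Data Mapper could look like. Everything in it is an assumption made for the example: the Person class, the PersonMapper name, and the in-memory map that stands in for a real "person" table.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Plain domain class: it contains no persistence code at all.
final class Person {
    private final long id;
    private final String firstName;
    private final String lastName;

    Person(long id, String firstName, String lastName) {
        this.id = id;
        this.firstName = firstName;
        this.lastName = lastName;
    }

    long getId() { return id; }
    String getFirstName() { return firstName; }
    String getLastName() { return lastName; }
}

// Hypothetical mapper: the only place that knows how a Person is laid out in storage.
final class PersonMapper {
    // Stand-in for a "person" table: primary key -> {first_name, last_name}.
    private final Map<Long, String[]> personTable = new HashMap<>();

    void insert(Person person) {
        if (personTable.containsKey(person.getId())) {
            throw new IllegalStateException("duplicate key " + person.getId());
        }
        personTable.put(person.getId(), toRow(person));
    }

    void update(Person person) {
        if (!personTable.containsKey(person.getId())) {
            throw new IllegalStateException("no row with key " + person.getId());
        }
        personTable.put(person.getId(), toRow(person));
    }

    void delete(long id) {
        personTable.remove(id);
    }

    Optional<Person> find(long id) {
        String[] row = personTable.get(id);
        return row == null ? Optional.empty() : Optional.of(new Person(id, row[0], row[1]));
    }

    // One entity maps directly to one row, which is why this stays close to the database.
    private static String[] toRow(Person person) {
        return new String[] { person.getFirstName(), person.getLastName() };
    }
}
```

Note how the mapper's methods (insert, update, delete, find) mirror table operations rather than use cases of the domain.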

Active Record

The Active Record pattern follows a similar approach: Here, a single object in memory represents a single row in a database table. Accordingly, all edit operations performed on this object will be propagated straight to the database.
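A minimal sketch of this pattern might look as follows. The ProductRecord class and the static map that plays the role of the table are assumptions for illustration only. The sketch also assumes one common variant of the pattern, in which changes are written when save() is called.

```java
import java.math.BigDecimal;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// One instance corresponds to one row; the class carries its own persistence methods.
final class ProductRecord {
    // Stand-in for a "product" table. A real Active Record would issue SQL here.
    private static final Map<Long, ProductRecord> TABLE = new HashMap<>();

    private final long id;
    private String name;
    private BigDecimal price;

    ProductRecord(long id, String name, BigDecimal price) {
        this.id = id;
        this.name = name;
        this.price = price;
    }

    long getId() { return id; }
    String getName() { return name; }
    void setName(String name) { this.name = name; }
    BigDecimal getPrice() { return price; }
    void setPrice(BigDecimal price) { this.price = price; }

    // Writes the current state of this object to its row (insert or update).
    void save() {
        TABLE.put(id, new ProductRecord(id, name, price));
    }

    // Removes the row that belongs to this object.
    void delete() {
        TABLE.remove(id);
    }

    // Loads a fresh copy of the row with the given key, if there is one.
    static Optional<ProductRecord> find(long id) {
        ProductRecord row = TABLE.get(id);
        return row == null
                ? Optional.empty()
                : Optional.of(new ProductRecord(row.id, row.name, row.price));
    }
}
```

Domain state and database access live in the same class here, which is convenient but ties the object closely to its table.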

Data Access Object

And last but not least, the Data Access Object. Similar to the Data Mapper, it provides an object-oriented interface that wraps access to your database. The most well-known implementations of these patterns are OR-Mappers like Entity Framework: You interact with your database through objects provided by the mapper, while in the background, it generates and executes all relevant database queries to synchronize your object state with your database.

These patterns help to decouple the database from the application and thus make our code easier to understand and maintain. Still, they don't solve all our problems: Conceptually, they are on the technical level and often map domain entities directly to database tables. This means our implementation still largely depends on our database layer instead of being focused on our domain.

The Repository Pattern

The Repository pattern aims to provide all the benefits of the above patterns without the disadvantage of being database-centric. At first glance, it looks similar to the Data Access Object, as it also provides a data-access interface. The main difference, however, is the level of abstraction at which the pattern operates: The Repository shifts the focus from the technology to the domain. Its API defines query and update methods that operate on domain objects rather than on database functionality. The idea is that by using a Repository, application developers can stay in the domain and write their business code without having to deal with lower-level database details: the interface provided by the Repository is tailored to the use cases of the domain, and all database handling is hidden inside its implementation. This also makes our code independent of the database technology in use -- whether it's document-based storage or a relational database, the repository interface stays the same!

Figure: The Repository pattern not only provides an interface to the database layer, it also translates between domain and storage concepts.
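The translation between domain and storage concepts can be sketched roughly like this. The Customer entity, the CustomerRepository interface, and the column names are hypothetical, and a map of rows stands in for whatever database sits behind the interface.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Domain object: knows nothing about tables or columns.
final class Customer {
    private final String id;
    private final String name;
    private final String city;

    Customer(String id, String name, String city) {
        this.id = id;
        this.name = name;
        this.city = city;
    }

    String getId() { return id; }
    String getName() { return name; }
    String getCity() { return city; }
}

// Domain-facing interface: the only thing application code depends on.
interface CustomerRepository {
    Optional<Customer> findById(String id);
    List<Customer> findByCity(String city);
    void save(Customer customer);
}

// Hypothetical implementation; a map of "rows" stands in for a real database.
final class InMemoryCustomerRepository implements CustomerRepository {
    private final Map<String, Map<String, String>> rows = new HashMap<>();

    @Override
    public Optional<Customer> findById(String id) {
        return Optional.ofNullable(rows.get(id)).map(InMemoryCustomerRepository::toDomain);
    }

    @Override
    public List<Customer> findByCity(String city) {
        List<Customer> result = new ArrayList<>();
        for (Map<String, String> row : rows.values()) {
            if (city.equals(row.get("city"))) {
                result.add(toDomain(row));
            }
        }
        return result;
    }

    @Override
    public void save(Customer customer) {
        // Translation from domain object to storage representation.
        Map<String, String> row = new HashMap<>();
        row.put("customer_id", customer.getId());
        row.put("full_name", customer.getName());
        row.put("city", customer.getCity());
        rows.put(customer.getId(), row);
    }

    // Translation from storage representation back to a domain object.
    private static Customer toDomain(Map<String, String> row) {
        return new Customer(row.get("customer_id"), row.get("full_name"), row.get("city"));
    }
}
```

Application code only ever sees CustomerRepository and Customer, so swapping the in-memory rows for a relational or document store would not change any caller.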

Implementing a repository for simple queries is straightforward: All you have to provide is a set of APIs that access the database layer and translate the results into your domain objects. Unfortunately, business domains don't tend to stay simple: Sooner or later, your queries become more complex, requiring you to join dozens of tables and filter the results by various criteria. To achieve this, you can basically try two different strategies:

Implementation Details

A generic repository API with additional filter criteria

One way is to keep your APIs quite general but add a way to define additional filter options. This can either be through a parameter object that lets you specify search criteria (like, for instance, some MongoDB APIs do) or by returning a query object you can use to construct more complex queries (that's how Entity Framework does it). No matter which way you choose, your API's signature stays relatively simple, and your application code has to do the heavy lifting of constructing the filter object or querying the result set.

For example, consider the following code:

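import java.math.BigDecimal;
import java.util.List;

// A single, general query method; callers describe what they want via the criteria object.
public interface ProductRepository {
    List<Product> findByCriteria(ProductFilterCriteria criteria);
}

// Parameter object that lets callers specify their search criteria.
public class ProductFilterCriteria {
    private String name;
    private BigDecimal minPrice;
    private BigDecimal maxPrice;
    private String category;
    // Getters and setters for the fields
}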

This repository only has a single query method, and the callers are responsible for constructing the filter criteria specific to their use case. While this makes the repository very flexible, this approach has some drawbacks:

The caller is responsible for setting up the correct filter criteria. Sure, that's still better than writing SQL queries directly in your application code. But it goes in the same direction, since the application developer needs to know all the details of your filter language.

But there is another, more severe issue: Today, most database APIs and ORMs already provide very convenient query interfaces, and if you want your repository to support that logic, you end up reimplementing all of it. Unfortunately, in most cases, the result is less sophisticated than the original -- see the Inner Platform Effect for details on this anti-pattern.
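To make the caller's share of the work concrete, here is a sketch of how the criteria-based repository could be implemented and used. The Product fields, the in-memory implementation, the "null means no filter" rule, and the CriteriaDemo helper are all assumptions made for this example.

```java
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.List;

final class Product {
    private final String name;
    private final String category;
    private final BigDecimal price;

    Product(String name, String category, BigDecimal price) {
        this.name = name;
        this.category = category;
        this.price = price;
    }

    String getName() { return name; }
    String getCategory() { return category; }
    BigDecimal getPrice() { return price; }
}

class ProductFilterCriteria {
    private String name;
    private BigDecimal minPrice;
    private BigDecimal maxPrice;
    private String category;

    String getName() { return name; }
    void setName(String name) { this.name = name; }
    BigDecimal getMinPrice() { return minPrice; }
    void setMinPrice(BigDecimal minPrice) { this.minPrice = minPrice; }
    BigDecimal getMaxPrice() { return maxPrice; }
    void setMaxPrice(BigDecimal maxPrice) { this.maxPrice = maxPrice; }
    String getCategory() { return category; }
    void setCategory(String category) { this.category = category; }
}

interface ProductRepository {
    List<Product> findByCriteria(ProductFilterCriteria criteria);
}

// Hypothetical implementation; a list stands in for the database.
final class InMemoryProductRepository implements ProductRepository {
    private final List<Product> products;

    InMemoryProductRepository(List<Product> products) {
        this.products = new ArrayList<>(products);
    }

    @Override
    public List<Product> findByCriteria(ProductFilterCriteria c) {
        List<Product> result = new ArrayList<>();
        for (Product p : products) {
            // Assumption: a null criterion means "do not filter on this field".
            boolean matches =
                    (c.getName() == null || c.getName().equals(p.getName()))
                    && (c.getCategory() == null || c.getCategory().equals(p.getCategory()))
                    && (c.getMinPrice() == null || p.getPrice().compareTo(c.getMinPrice()) >= 0)
                    && (c.getMaxPrice() == null || p.getPrice().compareTo(c.getMaxPrice()) <= 0);
            if (matches) {
                result.add(p);
            }
        }
        return result;
    }
}

// Caller side: the application code has to know how to assemble the filter.
final class CriteriaDemo {
    static List<Product> cheapBooks(ProductRepository repository) {
        ProductFilterCriteria criteria = new ProductFilterCriteria();
        criteria.setCategory("books");
        criteria.setMaxPrice(new BigDecimal("20.00"));
        return repository.findByCriteria(criteria);
    }
}
```

Even in this small case, the caller must know which fields exist, how they combine, and what leaving one unset means.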

But as I said, there are two ways. So let's check out the other implementation strategy:

A repository with explicit query logic

This is the opposite of the previous approach. Rather than having a general interface, we provide a dedicated method per use case. Our example from above would then have the following interface instead.

import java.math.BigDecimal;
import java.util.List;

// One dedicated query method per use case.
public interface ProductRepository {
    List<Product> findByName(String name);
    List<Product> findByCategory(String category);
    List<Product> findByPriceRange(BigDecimal minPrice, BigDecimal maxPrice);
    Product findProductWithHighestPrice();
}

There is no separate filter object anymore. Instead, every use case comes with its own API. This is very convenient for the application developer, as they can now pick the method they need without any configuration. All possible use cases are made explicit through the Repository's API.
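For comparison, a sketch of what sits behind such an explicit interface is shown here. It uses a trimmed-down copy of the interface with only two of its methods, an in-memory list as a stand-in for the database, and (as an assumption that differs from the interface above) an Optional return type to cover an empty store.

```java
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

final class Product {
    private final String name;
    private final BigDecimal price;

    Product(String name, BigDecimal price) {
        this.name = name;
        this.price = price;
    }

    String getName() { return name; }
    BigDecimal getPrice() { return price; }
}

// Trimmed-down copy of the explicit interface, for illustration only.
interface ProductRepository {
    List<Product> findByPriceRange(BigDecimal minPrice, BigDecimal maxPrice);
    Optional<Product> findProductWithHighestPrice();
}

// Hypothetical implementation; each use case gets its own hand-written query.
final class InMemoryProductRepository implements ProductRepository {
    private final List<Product> products;

    InMemoryProductRepository(List<Product> products) {
        this.products = new ArrayList<>(products);
    }

    @Override
    public List<Product> findByPriceRange(BigDecimal minPrice, BigDecimal maxPrice) {
        List<Product> result = new ArrayList<>();
        for (Product p : products) {
            // Both bounds are inclusive in this sketch.
            if (p.getPrice().compareTo(minPrice) >= 0 && p.getPrice().compareTo(maxPrice) <= 0) {
                result.add(p);
            }
        }
        return result;
    }

    @Override
    public Optional<Product> findProductWithHighestPrice() {
        return products.stream().max(Comparator.comparing(Product::getPrice));
    }
}
```

The caller's code shrinks to a single call such as repository.findProductWithHighestPrice(), while the query logic moves into the repository implementation.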

However, while you now save a lot of time by not reimplementing the filter logic of your database, you may already see where this is going: The more use cases you have, the more cluttered your repository interface gets. Take, for instance, the findProductWithHighestPrice() method: I doubt that every developer on your team needs to know about this query. But since it's in the repository's public interface, it's available to everyone. Also, it can become really tedious to implement a new query method for every use case...

So which approach is better? Well, both have their pros and cons. But luckily, you don't have to stick with a single approach, as combining them is easily possible: For easy cases, provide simple filter logic, and implement the more complex queries explicitly.

Some frameworks can also help you here. For instance, Spring Data JPA provides a nice convention-based solution: If you follow a specific naming pattern in your repository's API, the framework generates the relevant query code automatically. The naming convention acts as your filter criteria, but your API stays explicit.

FAQ

A repository for every database table?

It may be tempting to implement one repository class per database table, but that's not the idea behind the pattern: The repository's purpose is to reflect your domain, where such things as tables usually don't exist. Instead, we have a concept called Aggregate Roots, which represent the main entities of our domain. These root objects can have several child entities (like the OrderItems organized within an Order), but access to these child entities occurs exclusively through their Aggregate Root, even though they will likely be stored in separate database tables. Hence, in most cases, they don't require a dedicated repository.
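The Order example could be sketched as follows. The field names, the positive-quantity rule, and the in-memory repository are assumptions chosen for illustration. The point is that there is an OrderRepository but deliberately no OrderItemRepository.

```java
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Child entity: only ever created and reached through its Order.
final class OrderItem {
    private final String productName;
    private final int quantity;
    private final BigDecimal unitPrice;

    OrderItem(String productName, int quantity, BigDecimal unitPrice) {
        this.productName = productName;
        this.quantity = quantity;
        this.unitPrice = unitPrice;
    }

    String getProductName() { return productName; }
    BigDecimal subtotal() { return unitPrice.multiply(BigDecimal.valueOf(quantity)); }
}

// Aggregate Root: all changes to the items go through it, so it can enforce invariants.
final class Order {
    private final String id;
    private final List<OrderItem> items = new ArrayList<>();

    Order(String id) { this.id = id; }

    String getId() { return id; }

    void addItem(String productName, int quantity, BigDecimal unitPrice) {
        if (quantity <= 0) {
            throw new IllegalArgumentException("quantity must be positive");
        }
        items.add(new OrderItem(productName, quantity, unitPrice));
    }

    // Read-only view, so callers cannot bypass addItem().
    List<OrderItem> getItems() { return Collections.unmodifiableList(items); }

    BigDecimal total() {
        BigDecimal sum = BigDecimal.ZERO;
        for (OrderItem item : items) {
            sum = sum.add(item.subtotal());
        }
        return sum;
    }
}

// The repository loads and stores whole aggregates.
interface OrderRepository {
    Optional<Order> findById(String id);
    void save(Order order);
}

// Hypothetical implementation; a real one might write to "orders" and "order_items" tables.
final class InMemoryOrderRepository implements OrderRepository {
    private final Map<String, Order> orders = new HashMap<>();

    @Override
    public Optional<Order> findById(String id) { return Optional.ofNullable(orders.get(id)); }

    @Override
    public void save(Order order) { orders.put(order.getId(), order); }
}
```

How many tables an Order occupies is then a detail of the repository implementation, invisible to the domain code.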

Generic repositories: yes or no?

Whether a generic repository is a good thing or not is the subject of heated debates. On the pro side, people argue that most repositories require some CRUD functionality, so having a generic base class that provides all these methods saves a lot of time. On the other side, critics say that repositories should only implement concrete use cases, and a generic method like GetAll is usually not one of them. Also, since you always inherit all these generic methods, they clutter your repository's interface, whether you need them or not.

So using generic repositories is a trade-off between code reuse and expressiveness. Which one is better depends on your individual scenario.
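The trade-off can be seen in a small sketch. The GenericRepository interface, its method names, and the in-memory base class are hypothetical and not taken from any particular framework.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical generic CRUD contract shared by all repositories.
interface GenericRepository<T, ID> {
    Optional<T> findById(ID id);
    List<T> findAll();
    void save(T entity);
    void deleteById(ID id);
}

// Generic base class: written once, reused by every concrete repository.
abstract class InMemoryGenericRepository<T, ID> implements GenericRepository<T, ID> {
    private final Map<ID, T> store = new LinkedHashMap<>();

    // Subclasses tell the base class how to obtain an entity's key.
    protected abstract ID idOf(T entity);

    @Override
    public Optional<T> findById(ID id) { return Optional.ofNullable(store.get(id)); }

    @Override
    public List<T> findAll() { return new ArrayList<>(store.values()); }

    @Override
    public void save(T entity) { store.put(idOf(entity), entity); }

    @Override
    public void deleteById(ID id) { store.remove(id); }
}

final class Product {
    private final String id;
    private final String name;

    Product(String id, String name) {
        this.id = id;
        this.name = name;
    }

    String getId() { return id; }
    String getName() { return name; }
}

// The concrete interface adds one use-case method but also inherits all CRUD methods,
// including findAll(), whether the domain needs them or not.
interface ProductRepository extends GenericRepository<Product, String> {
    List<Product> findByName(String name);
}

final class InMemoryProductRepository
        extends InMemoryGenericRepository<Product, String>
        implements ProductRepository {

    @Override
    protected String idOf(Product product) { return product.getId(); }

    @Override
    public List<Product> findByName(String name) {
        List<Product> result = new ArrayList<>();
        for (Product product : findAll()) {
            if (product.getName().equals(name)) {
                result.add(product);
            }
        }
        return result;
    }
}
```

The concrete repository needs only a few lines of its own, which is the reuse argument. Its public interface, however, now exposes five methods where the domain may only have asked for one, which is the expressiveness argument.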

Transaction Handling

Again, not a trivial problem: Sometimes, there are several update operations where you must ensure that either all of them succeed or, in case of an error, all of them get reverted together. To group these operations, you must execute them within a single transaction. But the details of this process depend on your concrete database technology, so how can we use this mechanism without exposing database details to the domain layer?

One approach is to use a UnitOfWork object that controls the beginning and end of your transaction. Instead of retrieving your repositories directly (or via dependency injection), you request them from your UnitOfWork instance. This ensures that all repositories operate within the same database context.

Figure: A UnitOfWork class is connected to two repositories and is accessed by a client. The UnitOfWork ensures that all repositories operate on the same database context.

To process several dependent updates, we access the repositories through the Unit of Work, call their update methods, and finally invoke an operation on the UnitOfWork that completes the transaction (such as saveChanges() or commit()). That way, you can ensure that all relevant updates happen within the same transaction.

You can even go one step further and provide a single method that internally makes all calls within a transaction. Clients then only have to call this method with all the entities they want to update together.
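Both variants can be sketched with an in-memory stand-in. All names here (InMemoryDatabase, StockRepository, OrderRepository, OrderService) are hypothetical, and working copies of the data play the role that a real database transaction would play. A production implementation would delegate to the transaction support of the actual database instead.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Stand-in for a database; its state only changes when a unit of work commits.
final class InMemoryDatabase {
    Map<String, Integer> stock = new HashMap<>(); // product name -> units in stock
    List<String> orders = new ArrayList<>();      // ids of placed orders
}

final class StockRepository {
    private final Map<String, Integer> stock;

    StockRepository(Map<String, Integer> stock) { this.stock = stock; }

    void remove(String product, int units) {
        int current = stock.getOrDefault(product, 0);
        if (current < units) {
            throw new IllegalStateException("not enough stock for " + product);
        }
        stock.put(product, current - units);
    }
}

final class OrderRepository {
    private final List<String> orders;

    OrderRepository(List<String> orders) { this.orders = orders; }

    void add(String orderId) { orders.add(orderId); }
}

final class UnitOfWork {
    private final InMemoryDatabase database;
    // Working copies act as the shared context of this transaction.
    private final Map<String, Integer> stockCopy;
    private final List<String> ordersCopy;
    private final StockRepository stockRepository;
    private final OrderRepository orderRepository;

    UnitOfWork(InMemoryDatabase database) {
        this.database = database;
        this.stockCopy = new HashMap<>(database.stock);
        this.ordersCopy = new ArrayList<>(database.orders);
        this.stockRepository = new StockRepository(stockCopy);
        this.orderRepository = new OrderRepository(ordersCopy);
    }

    // Repositories are handed out by the unit of work, so they share its context.
    StockRepository stock() { return stockRepository; }
    OrderRepository orders() { return orderRepository; }

    // Publishes all pending changes at once. Without this call, nothing is written.
    void commit() {
        database.stock = new HashMap<>(stockCopy);
        database.orders = new ArrayList<>(ordersCopy);
    }
}

// The "single method" variant: clients never see the unit of work.
final class OrderService {
    private final InMemoryDatabase database;

    OrderService(InMemoryDatabase database) { this.database = database; }

    void placeOrder(String orderId, String product, int units) {
        UnitOfWork unitOfWork = new UnitOfWork(database);
        unitOfWork.orders().add(orderId);
        unitOfWork.stock().remove(product, units); // may throw; then nothing is committed
        unitOfWork.commit();
    }
}
```

If removing the stock fails, the exception leaves placeOrder() before commit() runs, so the order added a line earlier never reaches the database either.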

Performance Considerations

While repositories help you focus on the domain, they also put another level of abstraction on top of your database layer. Every abstraction creates a slight overhead, and sometimes, writing the queries directly can be more efficient, especially if you have complex mapping logic between your domain and the persistence layer. Still, in these cases, you can reduce the abstraction and its overhead, for example, by calling a stored procedure from within your repository.
