The repository pattern explained and implemented

The pattern documented and named “Repository” is one of the most misunderstood and misused. In this post we’ll implement the pattern in C# to achieve this simple line of code:

var premiumCustomers = customers.Matching(new PremiumCustomersFilter());

as well as discuss the origins of the pattern and the original definitions to clear up some of the misrepresentations.

The formal description

My first contact with the repository pattern was through the study of Domain Driven Design. In his book [DDD] (p. 147), Eric Evans simply states:

Associations allow us to find an object based on its relationships to another. But we must have a starting point for a traversal to an ENTITY or VALUE in the middle of its life cycle.

My interpretation of the section on repositories is simple: the domain does not care where the objects are stored in the middle of their life cycle, but we still need a place in our code to get (and often store) them.

In Patterns of Enterprise Application Architecture [PoEAA] (p. 322), a repository is described as follows:

Mediates between the domain and the data mapping layers using a collection-like interface for accessing domain objects.

Examining both chapters, we quickly come to understand that the original idea of the repository is to honor and gain the benefits of Separation of Concerns and to hide infrastructure plumbing.

With these principles and descriptions this simple rule emerges:

Repositories are the single point where we hand off and fetch objects. It is also the boundary where communication with the storage starts and ends.

This rule should guide any and all attempts at following the Repository pattern.

Implementations

There are several ways of implementing a repository. Some honor the original definitions more than others (and some are just plain confusing). A classic implementation looks a lot like a DAL class:

public class CustomerRepository
{
       public IEnumerable<Customer> FindCustomersByCountry(string country) {…}
}

With this implementation strategy, the result is often several repository classes with a lot of methods and code duplicated across them. It misses the beauty and simplicity of the original definitions. Both [PoEAA] and [DDD] use a form of the Specification pattern (implemented as Query Object in PoEAA) and ask the repository for objects that match the specification, instead of relying on named methods.

In code this gives the effect of having several small classes instead of a couple of huge ones. Here is a typical example:

public IEnumerable<Customer> Matching(IQuery query) { … }

var premiumCustomers = customers.Matching(new PremiumCustomersFilter());

The above code is a great improvement over the named methods strategy. But let’s take it a little further.

The Generic Repository

The key to a generic repository is to think about what can be shared across different entities and what needs to be separate. Usually the initialization of infrastructure and the commands to materialize is sharable while the queries are specific. Let’s create a simple implementation for Entity Framework:

public class Repository<T> : IRepository<T> 
    where T : class
{
    protected ObjectContext Context;
    protected ObjectSet<T> QueryBase;

    public Repository(ObjectContext context)
    {
        Context = context;
        QueryBase = context.CreateObjectSet<T>();
    }

    // Named Matching to agree with the usage examples in this post.
    public IEnumerable<T> Matching(ICriteria<T> criteria)
    {
        var query = criteria.BuildQueryFrom(QueryBase);
        return query.ToList();
    }

    public void Save(T entity)
    {
        QueryBase.AddObject(entity);
    }
}

Using generics in the definition of the repository allows us to reuse the basics while still being specific through the criteria. In this naïve implementation there is not much to share, but add things like logging, exception handling and validation and there are lines of code to be saved. Notice that the query is executed inside the repository, which returns an IEnumerable<T> with the result, as we expect all communication with the store to go through the repository.

The query objects then implement the ICriteria<T> interface and add any filtering needed. An example query can look like this:

public class WarehousesWithReservableQuantitiesFor : ICriteria<Warehouse>
{
    private readonly string _itemCode;
    private readonly int _minimumRemainingQuantity;

    public WarehousesWithReservableQuantitiesFor(string itemCode, 
                                                int minimumRemainingQuantity)
    {
        _itemCode = itemCode;
        _minimumRemainingQuantity = minimumRemainingQuantity;
    }

    // Explicit interface implementation: only code holding an
    // ICriteria<Warehouse> reference (the repository) can call this.
    IQueryable<Warehouse> 
        ICriteria<Warehouse>.BuildQueryFrom(ObjectSet<Warehouse> queryBase)
    {
        // Builds the query only; executing it is left to the repository.
        return (from warehouse in queryBase
                from stock in warehouse.ItemsInStock
                where stock.Item.Code == _itemCode 
                        && (stock.Quantity - stock.ReservedQuantity) 
                                > _minimumRemainingQuantity
                select warehouse)
                .AsQueryable();
    }
}

There are a couple of things to notice here. First of all, the interface is implemented explicitly. This “hides” the method from any code that isn’t, and shouldn’t be, aware that it is possible to create a query here. Remember: “…It is also the boundary where communication with the storage starts and ends.”

Another thing to note is that it only handles the query creation, not the execution of it. That is still handled by the Generic repository. For me, using the above type of repository / query separation achieves several goals.

There is high reuse of the plumbing. We write it once and use it everywhere:

var customers = new Repository<Customer>(context);
var warehouses = new Repository<Warehouse>(context);

This makes it fairly quick to start working with new entities or change plumbing strategies.

Usage creates clean code that clearly communicates intent:

 

var reservables = warehouses.Matching(
	new WarehousesWithReservableQuantitiesFor(code, 100));

 

We get several small classes, each with one specific purpose, instead of a couple of huge classes with loads of methods.


It might seem like a small difference. But the ability to focus on just the code for a single query in one page and the ease of navigating to queries (especially if you use R#’s Find type) makes this an enormous boost in maintainability.

The above example is based on Entity Framework, but I’ve successfully used the same kind of implementation with NHibernate and Linq To SQL as well.

Composing Criteria

By utilizing Decorators or similar composition patterns to build the criteria, it’s possible to compose queries for each scenario. Something like:

var premiumCustomers = customers.Matching(
	new PremiumCustomersFilter( 
		new PagedResult(page, pageSize))
);

Or:

var premiumCustomers = customers.Matching(
	new PremiumCustomersFilter(),  
	new FromCountry("SE")
);

The implementation of the above examples is outside the scope of this post and is left as an exercise to the reader for now.
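As a starting point for that exercise, here is a minimal sketch of the second variant (several criteria passed to Matching). It is not the implementation used in the project. To keep it runnable without Entity Framework, it assumes a simplified ICriteria<T> that receives a plain IQueryable<T> instead of an ObjectSet<T>, and an in-memory repository. The Customer properties and the InMemoryRepository<T> class are made up for illustration.

```csharp
using System.Collections.Generic;
using System.Linq;

// Assumption: simplified criteria contract based on IQueryable<T>
// (the post's version takes an ObjectSet<T>).
public interface ICriteria<T>
{
    IQueryable<T> BuildQueryFrom(IQueryable<T> queryBase);
}

// Hypothetical entity, shaped only for this example.
public class Customer
{
    public string Name { get; set; }
    public string Country { get; set; }
    public bool IsPremium { get; set; }
}

public class PremiumCustomersFilter : ICriteria<Customer>
{
    // Explicit implementation, as in the post: only the repository sees it.
    IQueryable<Customer> ICriteria<Customer>.BuildQueryFrom(IQueryable<Customer> queryBase)
    {
        return queryBase.Where(c => c.IsPremium);
    }
}

public class FromCountry : ICriteria<Customer>
{
    private readonly string _country;

    public FromCountry(string country)
    {
        _country = country;
    }

    IQueryable<Customer> ICriteria<Customer>.BuildQueryFrom(IQueryable<Customer> queryBase)
    {
        return queryBase.Where(c => c.Country == _country);
    }
}

// Hypothetical in-memory stand-in for the generic repository.
public class InMemoryRepository<T>
{
    private readonly List<T> _items;

    public InMemoryRepository(IEnumerable<T> items)
    {
        _items = items.ToList();
    }

    public IEnumerable<T> Matching(params ICriteria<T>[] criteria)
    {
        // Each criteria narrows the query produced by the previous one.
        IQueryable<T> query = _items.AsQueryable();
        foreach (var criterion in criteria)
        {
            query = criterion.BuildQueryFrom(query);
        }

        // Execution still happens inside the repository.
        return query.ToList();
    }
}
```

Because every criteria only adds to the query it is handed, the order of the arguments doesn’t matter for plain filters. Something like paging would have to be applied last.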

Repositories and testing

In my experience there is little point in unit testing a repository. It exists as a bridge to communicate with the store, and therein lies its value. Trying to unit test a repository and/or its queries often ends up testing how they use the infrastructure, which has little value.

That said, you might find it useful to ensure that logging and exception handling works properly. This turns out to be a limited set of tests, especially if you follow the implementation above.

Integration tests are another story. Validating that queries and communication with the database act as expected is extremely important. How to be effective in testing against a store is another subject which we won’t be covering here.

Making repositories available for unit testing to other parts of the system is fairly simple. As long as you honor the boundary mentioned earlier and only return well-known interfaces or entities (like IEnumerable<T>), mocking or faking repositories will be easy and technology agnostic (e.g. using Rhino Mocks):

ProductListRepository = 
	MockRepository.GenerateMock<IRepository<ProductListRules>>();

ProductListRepository.Expect(
		repository => repository.Matching(new RuleFilter("PR1")))
	.Return(productListRules);
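If you prefer a hand-written fake over a mocking framework, a sketch could look like the one below. The IRepository<T> members are inferred from the Repository<T> class shown earlier (the interface itself is never listed in this post). ICriteria<T> is reduced to an empty marker, and the shapes of ProductListRules and RuleFilter are invented for the example.

```csharp
using System.Collections.Generic;

// Assumption: empty marker standing in for the real criteria contract.
public interface ICriteria<T> { }

// Assumption: members inferred from Repository<T> earlier in the post.
public interface IRepository<T>
{
    IEnumerable<T> Matching(ICriteria<T> criteria);
    void Save(T entity);
}

// Hypothetical entity and criteria, shaped only for this example.
public class ProductListRules { }

public class RuleFilter : ICriteria<ProductListRules>
{
    public string Code { get; private set; }

    public RuleFilter(string code)
    {
        Code = code;
    }
}

// A fake that returns a canned result and records how it was used.
public class FakeRepository<T> : IRepository<T>
{
    private readonly List<T> _cannedResult;
    private readonly List<T> _saved = new List<T>();

    public FakeRepository(IEnumerable<T> cannedResult)
    {
        _cannedResult = new List<T>(cannedResult);
    }

    // The criteria passed to the most recent Matching call.
    public ICriteria<T> LastCriteria { get; private set; }

    // Every entity handed to Save, in call order.
    public IList<T> Saved { get { return _saved; } }

    public IEnumerable<T> Matching(ICriteria<T> criteria)
    {
        LastCriteria = criteria;
        return _cannedResult;
    }

    public void Save(T entity)
    {
        _saved.Add(entity);
    }
}
```

Since the fake only deals with IEnumerable<T> and the criteria object itself, it knows nothing about the underlying data access technology, which is the point of keeping the boundary clean.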

 

In summary

The repository pattern in conjunction with others is a powerful tool that lowers friction in development. Used correctly, and honoring the pattern definitions, you gain a lot of flexibility even when you have testing in the mix.

To read more about repositories I suggest picking up a copy of [PoEAA] or [DDD].


 

Performance differences between LINQ To SQL and NHibernate

In my current project, one of the actions I’ve taken is to have the project and team move away from Linq To Sql to NHibernate. A multitude of issues formed the basis for this move. Some of the main reasons are outlined in my post “Top 10 reasons to think twice about using Linq To Sql in your project”, but there were others as well, like the inability to tweak Linq To Sql to perform in different scenarios, which often led to stored procedures. NHibernate has a multitude of buttons to push and lets us tweak almost every aspect of the data access, which gives us more options.

This post is not about the lack of optimization options in LTS, nor about the options in NHibernate. It’s just to illustrate a simple truth: NHibernate is more mature and has had time to optimize core functionality like object creation (materialization). Compare these two images:

 

[image: LTSProfiler_small]
Linq To Sql: materialization 703 ms, DB access 7 ms

[image: NhibernateProfiler_small]
NHibernate: materialization 159 ms, DB access 7 ms

Unfortunately I can’t show the real model that we loaded, but it was an aggregate with a couple of lists which were 3 levels deep.

Another thing to note here: Linq To Sql needed a stored procedure to load this (because load spans work really badly at depths deeper than 2 levels; read more here: LINQ To SQL: Support for Eager Loading, Really?). NHibernate uses a couple of batched select queries that were built using NHibernate Criteria and the Future feature.

Look at the database execution times: both are 7 ms. So can we please kill the “stored procedures are faster than ORM-generated queries” debate now? Both scenarios were run on the same database with the same indexes, and the database was equally warmed up before each run.

Managing Parent/Child relationships with NHibernate (Inverse management)

When working with parent/child relationships in object models it is important to know what kind of inverse management your ORM technology has. Inverse management means handling all the relationships and keys shared between the parent and the child. This post will help you understand how NHibernate manages these relationships and what options you have.

Standard parent / child with inverted properties

The standard parent / child object model usually looks something like the picture below:

figure 1, Standard parent / child

In figure 1 you see that the comment entity has an inverse property back to product.

Note:
NHibernate requires you to manually set the product property on the comment to the correct object; it has no automatic inverse management of properties. This is usually done by adding an “AddComment” method on the product that encapsulates the logic needed to get the relationship right.
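A minimal sketch of such an AddComment method could look like this. Only the Product and Comment names and the Comments / Product properties come from the mapping in this post; the Text property and the rest of the class shapes are assumptions made for illustration. Members are virtual because NHibernate needs that to create lazy-loading proxies.

```csharp
using System.Collections.Generic;

public class Comment
{
    // Hypothetical payload property, not part of the mapping shown in the post.
    public virtual string Text { get; set; }

    // The inverse property pointing back to the parent.
    public virtual Product Product { get; set; }
}

public class Product
{
    public Product()
    {
        Comments = new List<Comment>();
    }

    // Mapped as a <bag>; the setter is protected so only NHibernate and
    // the class itself replace the collection instance.
    public virtual IList<Comment> Comments { get; protected set; }

    // Keeps both sides of the relationship in sync in one place.
    public virtual void AddComment(Comment comment)
    {
        comment.Product = this;   // the inverse side, which NHibernate persists
        Comments.Add(comment);    // the collection side, used by the object model
    }
}
```

Note that the sketch still exposes the list itself, so callers can bypass AddComment by calling Comments.Add directly. Whether to lock that down is a design choice outside the scope of this sketch.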

Figure 2 shows how the foreign key constraint looks in the database for the standard parent / child model:

figure 2, parent child database model

In this case the inverted property ensures that the comment object itself will “contain” a copy of the product Id to insert into the database. You tell NHibernate about this relationship and how to handle the keys by setting up the mapping as in listing 1:

<class name="Product" table="Products">
    ...
  <bag name="Comments" inverse="true" cascade="all">
    <key column="ProductId" />
    <one-to-many class="Comment"/>
  </bag>
</class>

<class name="Comment" table="Comments">
  ...
  <many-to-one name="Product" column="ProductId" />
</class>

listing 1, standard parent / child mapping

Using the above XML, NHibernate has enough information to figure out that the product id should be persisted into the comments table together with the comment itself.

The bag mapping tells the product that there is an inverse property on the comment and instructs NHibernate to let the comment handle the relationship on its own.

Variation 1, no inverse property on the child

A common approach in object modeling is to use aggregate roots and just let the relationship flow from the parent to the child, not an inverse back. This makes sense when you think about the object behaviors; comment will never stand on its own, it will always be accessed through the product.

Figure 3 illustrates what such a model looks like:

figure 3, Aggregate model

This approach leaves NHibernate a bit dry. In this variation, comment can’t stand on its own and will not be able to deliver the product id to the database. It will instead rely on the comments list from the product to provide that. NHibernate needs to be told that this is your intention, so the bag declaration has to be changed into:

<bag name="Comments" inverse="false" cascade="all">

NHibernate now knows that the comment entity doesn’t have a parent property that points back.

There is a caveat with this, though: NHibernate waits a bit to insert the identity of the product into the comment. Figure 4 shows the statements NHibernate sends to the database:

figure 4, Statements sent to the database

As you can see, the product id is sent in a separate statement after the rows have been inserted. This means that the product Id column in the comments table has to be nullable. As long as the save happens in a transaction and the number of rows is small, this is a viable solution. Just be aware of the mechanics NHibernate uses.

Variation 2, the hybrid solution

If you don’t want the inverse property and can’t set the foreign key to nullable the two above solutions won’t help you. For this variation you need to put a hybrid solution together.

This is similar to the standard parent / child, but instead of the full entity we will only use a protected field on the comment. The field you want to add would look something like the following:

protected int _MAP_productId;

which then would be mapped like a regular property, not an object reference;

<property name="_MAP_productId" access="field" />

Note: It’s usually a very good idea to give the field an awkward name like the one above. This ensures that developers after you will think twice before using it for any purpose other than mapping. This is also a place where I would consider adding a code comment.

To set the field you could either create a constructor or expose an internal property that the product can use. Don’t try to write to the field from the outside directly: NHibernate has issues with internal fields, and making it public will just be ugly.

The drawback with this approach is that NHibernate won’t be able to automatically set the identity on the comment. This means that you have one of two options for getting that product id:

  1. Don’t use auto-generated Id’s, make sure you assign one to the product before adding any comments.
  2. Save the product first, before adding any comments to it. This way the Id will be set in time.
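The constructor approach could be sketched roughly as follows. Apart from the _MAP_productId field, every member here (Text, the constructors, AddComment taking a string) is an assumption made for the example, and the sketch treats an Id of 0 as “not assigned yet”.

```csharp
using System;
using System.Collections.Generic;

public class Comment
{
    // Mapping only: NHibernate reads and writes this through access="field".
    // Do not use it for anything else.
    protected int _MAP_productId;

    // Hypothetical payload property.
    public virtual string Text { get; protected set; }

    // NHibernate needs a no-argument constructor; it may be protected.
    protected Comment() { }

    public Comment(int productId, string text)
    {
        _MAP_productId = productId;
        Text = text;
    }
}

public class Product
{
    public virtual int Id { get; protected set; }
    public virtual IList<Comment> Comments { get; protected set; }

    // Used when the Id is generated on save (option 2).
    public Product()
    {
        Comments = new List<Comment>();
    }

    // Used when the Id is assigned up front (option 1).
    public Product(int id) : this()
    {
        Id = id;
    }

    public virtual Comment AddComment(string text)
    {
        // Assumption: 0 means the product has no identity yet.
        if (Id == 0)
        {
            throw new InvalidOperationException(
                "Assign or save the product Id before adding comments.");
        }

        var comment = new Comment(Id, text);
        Comments.Add(comment);
        return comment;
    }
}
```

The guard in AddComment makes the ordering requirement explicit: with either option, the product must have its Id before the first comment is created.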

I’m sure there is an extension point somewhere in NHibernate that would allow for the above variation to be automatically handled. I will get back to you when and if I find it.

Summary

The object model and the relational model are different schemas, and as such compromises have to happen. NHibernate does a very good job of hiding those compromises in most cases, but when it comes to inverse management you, the developer, need to take a stand on which compromise is the right one for your solution. Now you know your options, choose wisely.

Resources

NHibernate project website:

http://www.nhforge.org

NHibernate documentation about parent / child:

https://www.hibernate.org/hib_docs/nhibernate/html/example-parentchild.html

NHProf application by Ayende that was used to inspect the queries sent:

http://www.nhprof.com

Top 10 reasons to think twice about using Linq To Sql in your project

I love ORM technology. I use a lot of it when building applications. I never did get completely in love with Linq To Sql though.  I’ve been using it a lot to teach ORM fundamentals just because the learning curve is really low. It just hasn’t appealed to me for production systems.

In my current project I’ve had reason to revisit why Linq To Sql failed to appeal to me, and this is the top 10 list.

10) A lot of the generated T-SQL queries are unnecessarily complex. Take a look at the query generated by a simple “firstName like ‘%a%’” kind of query:

SELECT [t0].[id], [t0].[firstname], [t0].[lastname], [t0].[streetaddress],
[t0].[city], [t0].[zipcode], [t0].[email]
FROM [dbo].[Director] AS [t0]
WHERE (
    (CASE
        WHEN (DATALENGTH(@p0) / 2) = 0 THEN 0
        ELSE CHARINDEX(@p0, [t0].[firstname]) - 1
     END)) > @p1

9) Associations between entities in most cases mean that keys have to be expressed twice in the model (once in the child entity and once in the parent entity). This leads to awkward entity design.

8) There is no batch fetch or batch insert/update for when you want to send a lot of rows to the database in one go. This seriously cripples Linq To Sql’s ability to handle anything but smaller object graphs.

7) Linq To Sql lacks the ability to do many-to-many relationships.

6) Inheritance is limited to one of the least useful scenarios; discriminator columns.

5) You can’t break up a big table into a nice object graph. There is no support for components, so you can’t have a table that looks like this:

[image: directors]

And have an entity that looks like this:

[image: directorsModel]

4) You can’t map your own value objects. So instead of a reusable class like this:

 public class Director
    { ...
       public Email Email { get; set; }
    ... }    

You need to do this:

    public partial class Director
    {   ...
        partial void OnEmailChanging(string value)
        {
            _validator.ValidateEmail(value);
        }        ...
    }
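For reference, the kind of reusable value object that point 4 has in mind could be sketched like this. The Email class is not shown anywhere in the original model, so its shape is an assumption, and the validation is deliberately naive (it only checks that an “@” sits somewhere inside the string).

```csharp
using System;

// Hypothetical value object: immutable, validated on creation,
// and compared by value instead of by reference.
public sealed class Email : IEquatable<Email>
{
    public string Address { get; }

    public Email(string address)
    {
        // Naive check for illustration only, not a real e-mail validator.
        int at = string.IsNullOrWhiteSpace(address) ? -1 : address.IndexOf('@');
        if (at <= 0 || at == address.Length - 1)
        {
            throw new ArgumentException("Not a valid e-mail address.", nameof(address));
        }

        Address = address;
    }

    public bool Equals(Email other)
    {
        return other != null
            && string.Equals(Address, other.Address, StringComparison.OrdinalIgnoreCase);
    }

    public override bool Equals(object obj)
    {
        return Equals(obj as Email);
    }

    public override int GetHashCode()
    {
        return StringComparer.OrdinalIgnoreCase.GetHashCode(Address);
    }

    public override string ToString()
    {
        return Address;
    }
}
```

With a class like this the validation lives in one place and every entity that has an e-mail address can reuse it, which is exactly what the partial-method workaround above can’t offer.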

3) There is insufficient support for loading N+1 relationships. Linq To Sql only supports joins of 1+1 level graphs, for N+1 it starts throwing out selects. I have blogged about this before: Linq To Sql: Support for Eager Loading, Really?

2) It’s table driven and enforces the database design upon your entity model, leaving you no room for clever modeling in code that isn’t available in relational table structures.

1) Microsoft has announced that Entity Framework is their recommended data access strategy.

http://blogs.msdn.com/adonet/archive/2008/10/31/clarifying-the-message-on-l2s-futures.aspx 

So to sum up, Linq To Sql just doesn’t meet my needs for a competent ORM that allows me to model my application as I wish. It doesn’t give me the high performance from the data access I want either.

If the limitations I see don’t worry you or aren’t applicable in your projects, by all means use it.