Your Repository is not a Data Access Layer

This post is more than 7 years old.

Posted at 09:00 on 10 July 2018

The Repository pattern has come in for a lot of criticism over the past few years by high-end .NET developers. This is understandable, because in most projects, the Repository layer is usually one of the worst-implemented parts of the codebase.

Now I've been critical of badly implemented Repositories myself, but to be fair, I don't think we should ditch the pattern altogether. On the contrary, I think that we could make much more effective use of the Repository pattern if we just abandoned one popular misconception about it.

Your Repository is (mostly) not a DAL.

If you're wondering what I mean, here is an example of a typical Repository method. It comes from BlogEngine.net, an open source ASP.NET blogging platform, and it is typical of the kinds of Repository methods that you and I have been working with on a daily basis for years:

public CommentsVM Get()
{
    if (!Security.IsAuthorizedTo(Rights.ViewPublicComments))
        throw new UnauthorizedAccessException();

    var vm = new CommentsVM();
    var comments = new List<Comment>();
    var items = new List<CommentItem>();

    var all = Security.IsAuthorizedTo(Rights.EditOtherUsersPosts);
    foreach (var p in Post.Posts)
    {
        if (all || p.Author.ToLower() == Security.CurrentUser.Identity.Name.ToLower())
        {
            comments.AddRange(p.Comments);
        }
    }
    foreach (var c in comments)
    {
        items.Add(Json.GetComment(c, comments));
    }
    vm.Items = items;

    vm.Detail = new CommentDetail();
    vm.SelectedItem = new CommentItem();

    return vm;
}

Now this isn't bad code. It's actually quite clean code. It's clear, well-formatted, and easy to understand, even if returning a ViewModel from your Repository does make me twitch a bit. But where is the data access logic?

There is not a single line in this code that tells me which underlying persistence mechanism is being used. Are we talking to Entity Framework? To NHibernate? To RavenDB? To a web service? To Amazon DynamoDB? Or to a program for comparing human and chimp genomes? In just about every .NET project that I've encountered, the Repository classes are all populated with methods just like this one. They may contain some LINQ queries, but these won't give me any indication either. Yet in every single case, they've been in projects called My.Project.DAL or something along those lines.

We're sometimes told that the role of the Repository layer is to abstract away your data access logic from your business logic. But in methods such as this, the data access logic appears pretty thoroughly abstracted to me already.

No, this is business logic, pure and simple.

Why we've been thinking of the Repository as a DAL

The reasons why Repositories are viewed as a data access layer are purely historical. The classic three-layer architecture dates back to the late 1990s, when everybody thought that stored procedures were the One True Best Practice™, and that moving your BLL and DAL onto separate hardware was the right approach to scalability problems that almost nobody ever had to face in the Real World. Back in the early days of .NET 1.0, your typical Repository contained method after method that looked something like this:

public User GetUser(int userID)
{
    using (SqlConnection cn = new SqlConnection(connectionString))
    using (SqlCommand cmd = new SqlCommand("usp_GetUser", cn)) {
        cmd.Parameters.Add(new SqlParameter("@UserID", userID));
        cmd.CommandType = CommandType.StoredProcedure;
        using (SqlDataReader reader = cmd.ExecuteReader()) {
            if (reader.Read()) {
                return new User(
                    (int)reader["UserID"],
                    (string)reader["UserName"],
                    (string)reader["DisplayName"],
                    (string)reader["Email"]
                );
            }
            else {
                return null;
            }
        }
    }
}

It was pretty much in-your-face that this was data access code. It was also very, very tedious and repetitive to maintain. It was this tedium that gave rise to modern O/R mappers, and in fact, in the early days, offerings such as LLBLGen Pro and NHibernate were sometimes actually referred to as "generic DALs." Then, eventually, Microsoft got in on the act with Entity Framework.

In a nutshell, your data access layer is now Entity Framework itself.

Your Repository is first and foremost business logic

The problem with viewing modern-day Repositories as a DAL is that it demands that you draw a clear distinction between data access logic and business logic while obfuscatinig that very distinction.

I'm yet to see a clear, coherent definition of where the distinction lies. The nearest I can get is a vague and woolly concept of LINQ code as being data access on the grounds of equally vague and woolly concepts of IQueryable<T> being tight coupling. Now Mark Seemann makes some valid points in his blog post -- LINQ is indeed a leaky abstraction -- but what that means in practice is that if you run up against the leaks in the abstraction, you are dealing with inseparable concerns, which simply can't be categorised cleanly as either business logic or data access logic, and have to be tested using integration tests rather than unit tests. Another example of inseparable concerns is where you have to bypass Entity Framework altogether to go directly to the database, for example, for performance reasons.

In fact, LINQ may be a leaky abstraction, but it's a much better abstraction than any alternative you're going to come up with. Once again, LINQ code gives you no indication whatsoever of what underlying data access mechanism you are actually using, and in many cases you can -- and should -- test anything you do with IQueryable<T> without hitting the database. In any case, query construction implements business rules and is therefore well and truly a business concern.

So what is the Repository pattern, as implemented in most projects, best for? Simple: a query layer. While query objects are a better choice for more complex queries, and extension methods on IQueryable<T> should be considered seriously for cross-cutting concerns such as paging and sorting, for simpler queries with only a few arguments each, a Repository is not a bad choice.