james mckay dot net
because there are few things that are less logical than business logic

Posts tagged: n-tier deconstructed

Inseparable concerns

Separation of concerns is often cited as the reasoning behind the traditional three-layer architecture. It is important, otherwise you will end up with a Big Ball of Mud.

However, in order to separate out your concerns, you must first categorise them correctly as either business concerns, presentational concerns or data access concerns. Otherwise you will end up with unnecessary complexity, poor performance, anaemic layers, and/or poor testability.

Unfortunately, most three-layer applications completely fail to categorise their concerns correctly. More often than not this is because it is simply not possible to do so, as some concerns fall into more than one category and can’t be refactored out without introducing adverse effects. I propose the term inseparable concerns for such cases.

The key to separation of concerns is to let it be driven by your tests. Under TDD, the first thing you would do if a particular line of code contained a bug would be to write a failing unit test that would pass given the expected correct behaviour. It is what this test does that tells you whether the code under test is a business concern, a data access concern, or a presentational concern.

It is a presentational concern if the test simulates raw user input, or examines final rendered output. Examples include mocking any part of a raw HTTP request (GET or POST arguments, cookies, HTTP headers, and so on), verifying the returned HTTP status code, or examining the output generated by a view. In general, if it’s your controllers or your views that you’re testing, it’s a presentational concern.

It is a business concern if the test verifies the correctness of a business rule. Basically, this means that queries are business concerns, period. If they are not returning the correct results, then they have not implemented some business rule or other correctly. Other examples of business concerns include validation, verifying that the data passed to the database or a web service from a command is correct, or confirming that the correct exception is thrown in response to various failure modes.

It is a data access concern if the test requires the code to hit the database. Note that this is where the so-called “best practice” that your unit tests should never hit the database breaks down: if you are adhering to it strictly, sooner or later you will encounter a bug where it stops you from writing a failing test. Most people, when confronted with such cases, skip this step. Don’t: TDD should take precedence. Set up a test database and write the test already.

It is an inseparable concern if it falls into more than one of the above categories. Pretty much any performance-related optimisation that you do will be an example here. For example, if you have to bypass Entity Framework and drop down to raw SQL, you will have to hit the database to verify that business logic is correct. Therefore, it is both a business concern and a data access concern.

Inseparable concerns are much more prevalent than you might expect. IQueryable<T> is the best that we’ve got in terms of making your business and data access layers separable, but, as Mark Seemann points out, it still falls short: a query that runs happily against an in-memory collection can throw NotSupportedException when a real LINQ provider is unable to translate it. Another example is calling .Include() on a DbSet to include child entities. Although this is a no-op on Mock<IDbSet<T>>, you can’t verify that you are making the correct calls to .Include() in the first place without hitting the database. Besides which, if you’re mocking DbSet<T> instead of IDbSet<T>, as you’re supposed to be able to do with EF6, calling .Include() throws an exception.
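
To make the NotSupportedException point concrete, here is a minimal sketch. The Post class and the PostRules helper are hypothetical names made up for illustration. The query passes when run against an in-memory list, because LINQ to Objects simply calls IsRecent. Point the same expression at Entity Framework and it throws NotSupportedException when the query is executed, because LINQ to Entities cannot translate an arbitrary C# method into SQL. Only a test that hits the database will tell you that.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical entity, for illustration only.
public class Post
{
    public string Title { get; set; }
    public DateTime PostDate { get; set; }
}

public static class PostRules
{
    // An ordinary C# method. LINQ to Objects can call it, but a provider
    // that generates SQL has no way to translate it.
    public static bool IsRecent(Post post, DateTime now)
    {
        return (now - post.PostDate).TotalDays <= 30;
    }

    // The business rule under test: "recent" means the last 30 days.
    public static IQueryable<Post> RecentPosts(IQueryable<Post> posts, DateTime now)
    {
        return posts.Where(post => IsRecent(post, now));
    }
}

public static class Program
{
    public static void Main()
    {
        var now = new DateTime(2014, 1, 31);
        var inMemory = new List<Post>
        {
            new Post { Title = "Old", PostDate = new DateTime(2013, 1, 1) },
            new Post { Title = "New", PostDate = new DateTime(2014, 1, 15) }
        }.AsQueryable();

        // Works in memory. Against a real Entity Framework context, the same
        // call would fail at this point with NotSupportedException.
        var titles = PostRules.RecentPosts(inMemory, now)
            .Select(post => post.Title)
            .ToList();

        Console.WriteLine(string.Join(", ", titles)); // prints "New"
    }
}
```

The in-memory test goes green either way, which is exactly why this counts as both a business concern and a data access concern.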

I would just like to stress here that inseparable concerns are not an antipattern—they are a fact of life. All but the simplest of code bases will have them somewhere. The real antipattern is not introducing them, but trying to treat them as if they were something that they’re not.

Moving a problem from one part of your codebase to another does not eliminate it

This is, of course, a statement of the obvious, but I’ve come across quite a few “best practices” in recent years that violate it.

People come up with some design pattern or other, telling you that it solves some problem or other. At first sight, it appears that it does eliminate the problem from one part of your codebase, but on closer inspection it turns out that it merely shifts it to another, and sometimes even introduces other problems in the process.

I first noticed this in a Web Forms application, where our resident Best Practices Guy berated me for using inline data binding expressions in the .aspx files. These were actually simple data binding expressions, with no business logic, a bit like this:

<asp:Repeater id="rptData" runat="server">
  <ItemTemplate>
    <p>
      <asp:Label Text='<%# Eval("Text") %>' runat="server" />
    </p>
  </ItemTemplate>
</asp:Repeater>

Just like you’ve seen in every Web Forms tutorial since 2001. But he said I should have been looking up the label in the ItemDataBound event and assigning it there instead:

void rptData_ItemDataBound(object sender, RepeaterItemEventArgs e)
{
    // Look up the label by its ID string; "as" yields null if it isn't found.
    var label = e.Item.FindControl("lblParagraph") as Label;
    if (label != null)
    {
        label.Text = ((LineItem)e.Item.DataItem).Text;
    }
}

He claimed that it would prevent problems if I’d mistyped the property name in the .aspx file, because the C# compiler would catch it.

The reason this is a fallacy is that it just moves the problem into your code-behind file. You’re just as likely to mistype the name of the control — lblParagraph — in the string and end up with exactly the same problem. Only it’ll be easier to miss it in testing because the null check means that it will fail silently. On top of that, you’re using more than twice as many lines of code spread over two different files rather than just one to do the same thing.

I noticed a similar problem when I was evaluating OOCSS — a design pattern that’s supposed to reduce duplication in your CSS, by having you declare separate CSS classes for different functional aspects such as “button” or “highlighted” or “media”. Twitter Bootstrap uses it fairly heavily. Its selling point is that it’s supposed to make your CSS more maintainable and lightweight without using a pre-processor by reducing duplication in your stylesheets. Unfortunately, in the process, it introduces a lot of duplication and weight into your HTML because you now have to set additional class declarations on a huge number of elements.

Then of course there’s our old friend, the Repository Facade, whose proponents tell you that it reduces tight coupling between your business layer and your ORM. Of course a generic Repository Facade does this at the expense of making it impossible to optimise your queries for performance, but with a specialised one — where you’re moving your queries into your Repository Facade itself — you’re just moving the tight coupling from one part of your codebase to another. It doesn’t reduce the amount of work that you would have to do to switch your data source in the slightest, and in the process it prevents you from unit testing your business logic independently of the database.

The Repository Facade

Most developers use the term “Repository” to refer to a wrapper or abstraction layer around your O/R mapper, supposedly to let you switch out one persistence mechanism for another. However, if you look at its definition in its historical context, you’ll see that this isn’t what it refers to at all.

The Repository pattern is a part of your O/R mapper itself.

The Repository pattern was first described as follows in Martin Fowler’s Patterns of Enterprise Application Architecture:

“Mediates between the domain and data mapping layers using a collection-like interface for accessing domain objects.”

Patterns of Enterprise Application Architecture was written in 2003, at a time when O/R mapping technology was in its infancy. Most ORMs were commercial products, very simple by today’s standards, and more akin to the likes of Dapper or PetaPoco than to modern heavyweights like NHibernate or Entity Framework. Hand-rolled data access layers were very much the order of the day. Furthermore, many of the patterns described in P of EAA (Table Data Gateway, Row Data Gateway, Data Mapper, Unit of Work, Identity Map, Lazy Load, and so on) catalogue what are now different components of modern-day ORMs.

So when the Repository pattern talks about mediating between the domain and “data mapping layers,” it isn’t referring to your ORM as a whole, as most developers seem to assume, but to just one component of your ORM — specifically, the component that copies data from the results of the generated SQL query into your entities. This mediating layer is also an element of functionality provided by modern ORMs.

For example, Entity Framework’s DbSet<T> is a Repository. So too is NHibernate’s ISession, with methods such as QueryOver<T>().

So what is the wrapper class that people write around their ORMs then, the one that they tend to refer to as a Repository? A more accurate term for this is, in actual fact, a Repository Facade.

It’s important to draw the distinction, especially with the debate around whether this pattern has any value or not. Referring to your ORM itself as a Repository makes it easy for people to make the conceptual leap that allows them to just plug Entity Framework straight into their business service classes without the additional layer of abstraction, but on the other hand it can cause a bit of confusion if you then start saying that “the Repository pattern is harmful.” That’s why I’m now being careful to use the term “Repository” to refer to Entity Framework, NHibernate or the RavenDB client itself, and the term “Repository Facade” to refer to the practice of adding an extra abstraction layer around it.
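
The distinction can be sketched in code. Everything here is an illustrative assumption: IBlogContext stands in for an Entity Framework context, with a plain IQueryable<Post> in place of DbSet<Post> so that the example runs without a database, and IPostRepository and PostRepositoryFacade are hypothetical names for the kind of wrapper that people usually write.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Post
{
    public int Id { get; set; }
    public string Title { get; set; }
}

// Stand-in for the ORM. In Entity Framework, Posts would be a DbSet<Post>,
// which is already a Repository: a collection-like interface to domain objects.
public interface IBlogContext
{
    IQueryable<Post> Posts { get; }
}

// Hypothetical Repository Facade: an extra layer that forwards to the ORM.
public interface IPostRepository
{
    Post GetById(int id);
    IList<Post> GetAll();
}

public class PostRepositoryFacade : IPostRepository
{
    private readonly IBlogContext _context;

    public PostRepositoryFacade(IBlogContext context)
    {
        _context = context;
    }

    public Post GetById(int id)
    {
        return _context.Posts.SingleOrDefault(post => post.Id == id);
    }

    public IList<Post> GetAll()
    {
        return _context.Posts.ToList();
    }
}

// In-memory context so that the sketch runs without a database.
public class InMemoryBlogContext : IBlogContext
{
    private readonly List<Post> _posts;

    public InMemoryBlogContext(IEnumerable<Post> posts)
    {
        _posts = posts.ToList();
    }

    public IQueryable<Post> Posts
    {
        get { return _posts.AsQueryable(); }
    }
}

public static class Program
{
    public static void Main()
    {
        IBlogContext context = new InMemoryBlogContext(new[]
        {
            new Post { Id = 1, Title = "Inseparable concerns" },
            new Post { Id = 2, Title = "The Repository Facade" }
        });

        // Using the Repository (the ORM) directly.
        var direct = context.Posts.Single(post => post.Id == 2);

        // Going through the Repository Facade gives the same answer.
        var viaFacade = new PostRepositoryFacade(context).GetById(2);

        Console.WriteLine(direct.Title == viaFacade.Title); // prints "True"
    }
}
```

In this sketch the facade adds a second interface and a second class, but every call still ends up at context.Posts.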

Query Objects: a better approach than your BLL/repository

If you’ve been following what I’ve been saying here on my blog and on the ASP.NET forums over the past month or so, you’ll no doubt realise that I’m not a fan of the traditional layered architecture, with your presentation layer only allowed to talk to your business layer, your business layer only allowed to talk to your repository, only your repository allowed to talk to your ORM, and all of them in separate assemblies for no reason whatsoever other than That Is How You Are Supposed To Do It. It adds a lot of friction and ceremony, it restricts you in ways that are harmful, its only benefits are unnecessary and dubious, and every implementation of it that I’ve come across has been horrible.

Here’s a far better approach:

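using System.Data.Entity;   // Include() extension method on IQueryable<T>
using System.Linq;
using System.Web.Mvc;

public class BlogController : Controller
{
    private readonly IBlogContext _context;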
 
    public BlogController(IBlogContext context)
    {
        _context = context;
    }
 
    public ActionResult ShowPosts(PostsQuery query)
    {
        query.PrefetchComments = false;
        var posts = query.GetPosts(_context);
        return View(posts);
    }
}
 
[Bind(Exclude="PrefetchComments")]
public class PostsQuery
{
    private const int DefaultPageSize = 10;
 
    public int? PageNumber { get; set; }
    public int? PageSize { get; set; }
    public bool Descending { get; set; }
    public bool PrefetchComments { get; set; }
 
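    public IQueryable<Post> GetPosts(IBlogContext context)
    {
        // Declared as IQueryable<Post> because Include, Skip and Take return
        // IQueryable<Post>, not the IOrderedQueryable<Post> that OrderBy gives.
        IQueryable<Post> posts = Descending
            ? context.Posts.OrderByDescending(post => post.PostDate)
            : context.Posts.OrderBy(post => post.PostDate);
        if (PrefetchComments) {
            // Include is the extension method from System.Data.Entity.
            posts = posts.Include("Comments");
        }
        var pageSize = PageSize ?? DefaultPageSize;
        if (PageNumber.HasValue && PageNumber.Value > 1) {
            // Skip takes an int, so unwrap the nullable page number first.
            posts = posts.Skip((PageNumber.Value - 1) * pageSize);
        }
        return posts.Take(pageSize);
    }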
}

A few points to note here.

First, you are injecting your Entity Framework DbContext subclass (the implementation of IBlogContext) directly into your controllers. Get over it: it’s not as harmful as you think it is. Your IoC container can (and should) manage its lifecycle.

Secondly, your query object follows the Open/Closed Principle: you can easily add new sorting and filtering options without having to modify either the method signatures of your controllers or the query object’s existing properties and methods. With a query method on your Repository, on the other hand, adding new options would be a breaking change.
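
As a rough sketch of that second point, suppose you later need to filter by author. The Author property and the stripped-down types below are assumptions for illustration (IBlogContext exposes a plain IQueryable<Post> so that the example runs in memory, and paging is left out for brevity). The new option is one extra property and one extra clause; GetPosts keeps its signature, so existing callers carry on working.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Post
{
    public string Author { get; set; }
    public DateTime PostDate { get; set; }
}

public interface IBlogContext
{
    IQueryable<Post> Posts { get; }
}

// In-memory stand-in for the real context (an assumption of this sketch).
public class InMemoryBlogContext : IBlogContext
{
    private readonly List<Post> _posts;

    public InMemoryBlogContext(params Post[] posts)
    {
        _posts = posts.ToList();
    }

    public IQueryable<Post> Posts
    {
        get { return _posts.AsQueryable(); }
    }
}

public class PostsQuery
{
    public bool Descending { get; set; }

    // New, optional filter (hypothetical). Callers that never set it
    // get exactly the behaviour they had before.
    public string Author { get; set; }

    public IQueryable<Post> GetPosts(IBlogContext context)
    {
        IQueryable<Post> posts = context.Posts;
        if (!string.IsNullOrEmpty(Author))
        {
            posts = posts.Where(post => post.Author == Author);
        }
        return Descending
            ? posts.OrderByDescending(post => post.PostDate)
            : posts.OrderBy(post => post.PostDate);
    }
}

public static class Program
{
    public static void Main()
    {
        var context = new InMemoryBlogContext(
            new Post { Author = "alice", PostDate = new DateTime(2014, 1, 1) },
            new Post { Author = "bob", PostDate = new DateTime(2014, 1, 2) },
            new Post { Author = "alice", PostDate = new DateTime(2014, 1, 3) });

        // An existing caller, written before Author was added.
        var all = new PostsQuery { Descending = true }.GetPosts(context);

        // A new caller using the new option.
        var byAlice = new PostsQuery { Author = "alice" }.GetPosts(context);

        Console.WriteLine(all.Count());     // prints 3
        Console.WriteLine(byAlice.Count()); // prints 2
    }
}
```

The equivalent change to a repository method such as a hypothetical GetPosts(int page, int pageSize, bool descending) would mean a new parameter or a new overload, and a visit to every caller.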

Thirdly, it is very easy to avoid SELECT n+1 problems on the one hand while at the same time not fetching screeds of data that you don’t need on the other, as the PrefetchComments property illustrates.

Fourthly, this approach is no less testable than your traditional BLL/BOL/DAL approach. By mocking your IBlogContext and IDbSet<T> interfaces, you can test your query object in isolation from your database. You would need to hit the database for more advanced Entity Framework features of course, but the same would be true with query methods on your repository.
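
Here is a minimal sketch of such a test. To keep it self-contained it makes a few assumptions: IBlogContext exposes IQueryable<Post> rather than IDbSet<Post>, a hand-written InMemoryBlogContext takes the place of a mocking library, the PrefetchComments/Include branch is left out because it needs Entity Framework, and a plain check that throws stands in for a test framework.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Post
{
    public int Id { get; set; }
    public DateTime PostDate { get; set; }
}

public interface IBlogContext
{
    IQueryable<Post> Posts { get; }
}

// Hand-rolled fake; a mocking library would do the same job.
public class InMemoryBlogContext : IBlogContext
{
    private readonly List<Post> _posts;

    public InMemoryBlogContext(IEnumerable<Post> posts)
    {
        _posts = posts.ToList();
    }

    public IQueryable<Post> Posts
    {
        get { return _posts.AsQueryable(); }
    }
}

// Simplified copy of the query object, without the Include branch.
public class PostsQuery
{
    private const int DefaultPageSize = 10;

    public int? PageNumber { get; set; }
    public int? PageSize { get; set; }
    public bool Descending { get; set; }

    public IQueryable<Post> GetPosts(IBlogContext context)
    {
        IQueryable<Post> posts = Descending
            ? context.Posts.OrderByDescending(post => post.PostDate)
            : context.Posts.OrderBy(post => post.PostDate);
        var pageSize = PageSize ?? DefaultPageSize;
        if (PageNumber.HasValue && PageNumber.Value > 1)
        {
            posts = posts.Skip((PageNumber.Value - 1) * pageSize);
        }
        return posts.Take(pageSize);
    }
}

public static class PostsQueryTests
{
    // Builds posts with Ids 1..count, each dated one day after the last.
    public static IBlogContext ContextWithPosts(int count)
    {
        var start = new DateTime(2014, 1, 1);
        return new InMemoryBlogContext(Enumerable.Range(1, count)
            .Select(i => new Post { Id = i, PostDate = start.AddDays(i) }));
    }

    public static void SecondPageDescendingReturnsPostsFifteenDownToSix()
    {
        var query = new PostsQuery { PageNumber = 2, Descending = true };

        var ids = query.GetPosts(ContextWithPosts(25))
            .Select(post => post.Id)
            .ToList();

        // Page 1 is posts 25..16, so page 2 should be posts 15..6.
        var expected = Enumerable.Range(6, 10).Reverse().ToList();
        if (!ids.SequenceEqual(expected))
        {
            throw new Exception("Unexpected page: " + string.Join(",", ids));
        }
    }
}

public static class Program
{
    public static void Main()
    {
        PostsQueryTests.SecondPageDescendingReturnsPostsFifteenDownToSix();
        Console.WriteLine("passed");
    }
}
```

This covers the ordering and paging rules. The Include call is precisely the part that it can’t cover, and that part still needs a test against a real database.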

Fifthly, note that your query object is automatically created and populated with the correct settings by ASP.NET MVC’s model binder.

All in all, a very simple, elegant and DRY approach.

(Hat tip: Jimmy Bogard for the original inspiration. This version simply adds the twist of having your query objects created and initialised by ASP.NET MVC’s model binder.)

If your tests aren’t hitting the database, you might as well not write tests at all

Out of all the so-called “best practices” that are nothing of the sort, this one comes right up at the top of my list. It’s the idea that hitting the database in your tests is somehow harmful.

I’m quite frankly amazed that this one gets as much traction as it does, because it’s actively dangerous. Some parts of your codebase require even more attention from your tests than others — in particular, parts which are:

  1. easy to get wrong
  2. tricky to get right
  3. not obvious when you’re getting it wrong
  4. difficult to verify manually
  5. high-impact if you do screw up.

Your data access layer, your database itself, and the interactions between them and the rest of your application fall squarely into all the above categories. There are a lot of moving parts in any persistence mechanism (foreign key constraints, which end of a many-to-many relationship you declare as the inverse, mappings, migrations, and so on), and it’s very easy to make a mistake on any of them. If you’ve ever had to wrestle with the myriad of obscure, surprising and gnarly error messages that you get with both NHibernate and Entity Framework, you’ll know exactly what I mean.

If you never test against a real database, but rely exclusively on mocking out your data access layer, you are leaving vast swathes of your most error-prone and business-critical functionality with no test coverage at all. You might as well not be testing anything.

Yes, tests that hit the database are slow. Yes, it’s off-putting to write slow tests. But tests that don’t hit the database don’t test things that need to be tested. Sometimes, there are no short cuts.

(Incidentally, this is also why you shouldn’t waste time writing unit tests for your getters and setters or for anaemic business services: these are low-risk, low-impact aspects of your codebase that usually break other tests anyway if you do get them wrong. Testing your getters and setters isn’t unit testing, it’s unit testing theatre.)

“But that rule just applies to unit tests. Integration, functional and regression tests are different.”

I agree there, and I’m not contradicting that. But if you’re saying “don’t hit your database in your unit tests” and then trying to qualify it in this way, you’re just causing confusion.

Regardless of what you are trying to say, people will hear “don’t hit the database in your tests, period.” People scan what you write and pick out sound bites. They see the headline, and skip over the paragraph about integration tests and so on as if it were merely a footnote.

By all means tell people to test their business logic independently of the database if you like, but phrase it in a way that’s less likely to be misunderstood. If you’re leaving them with the impression that they shouldn’t be testing their database, their data access layer, and the interaction between them and the rest of your application, then even if that isn’t your intention, you’re doing them a serious disservice.