james mckay dot net

Blah blah scribble scribble waffle waffle

If your tests aren’t hitting the database, you might as well not write tests at all

Out of all the so-called “best practices” that are nothing of the sort, this one comes right up at the top of my list. It’s the idea that hitting the database in your tests is somehow harmful.

I’m quite frankly amazed that this one gets as much traction as it does, because it’s actively dangerous. Some parts of your codebase require even more attention from your tests than others — in particular, parts which are:

  1. easy to get wrong
  2. tricky to get right
  3. not obvious when you’re getting it wrong
  4. difficult to verify manually
  5. high-impact if you do screw up.

Your data access layer, your database itself, and the interactions between them and the rest of your application fall squarely into all the above categories. There are a lot of moving parts in any persistence mechanism — foreign key constraints, which end of a many-to-many relationship you declare as the inverse, mappings, migrations, and so on, and it’s very easy to make a mistake on any of them. If you’ve ever had to wrestle with the myriad of obscure, surprising and gnarly error messages that you get with both NHibernate and Entity Framework, you’ll know exactly what I mean.

If you never test against a real database, but rely exclusively on mocking out your data access layer, you are leaving vast swathes of your most error-prone and business-critical functionality with no test coverage at all. You might as well not be testing anything.

Yes, tests that hit the database are slow. Yes, it’s off-putting to write slow tests. But tests that don’t hit the database don’t test things that need to be tested. Sometimes, there are no short cuts.

(Incidentally, this is also why you shouldn’t waste time writing unit tests for your getters and setters or for anaemic business services: these are low-risk, low-impact aspects of your codebase that usually break other tests anyway if you do get them wrong. Testing your getters and setters isn’t unit testing, it’s unit testing theatre.)

“But that rule just applies to unit tests. Integration, functional and regression tests are different.”

I agree there, and I’m not contradicting that. But if you’re saying “don’t hit your database in your unit tests” and then trying to qualify it in this way, you’re just causing confusion.

Regardless of what you are trying to say, people will hear “don’t hit the database in your tests, period.” People scan what you write and pick out sound bites. They see the headline, and skip over the paragraph about integration tests and so on as if it were merely a footnote.

By all means tell people to test their business logic independently of the database if you like, but phrase it in a way that’s less likely to be misunderstood. If you’re leaving them with the impression that they shouldn’t be testing their database, their data access layer, and the interaction between them and the rest of your application, then even if that isn’t your intention, you’re doing them a serious disservice.


Interchangeable data access layers == stealing from your client

This is one of those so-called “best practices” that crops up a lot in the .net world:

You need to keep the different layers of your application loosely coupled with a clean separation of concerns so that you can swap out your data access layer for a different technology if you need to.

It all sounds right, doesn’t it? Separation of concerns, loose coupling…very clean, SOLID, and Uncle Bob-compliant, right?

Just. A. Minute.

The separation of concerns you are proposing is high-maintenance, high-friction, usually unnecessary, obstructive to important performance optimisations and other requirements, and, as this post by Oren Eini aka Ayende Rahien points out, usually doesn’t work anyway.

In what universe is it a best practice to allocate development time and resources, for which your client is paying, towards implementing a high-maintenance, high-friction, broken, unnecessary, non-functional requirement that they are not asking for, at the expense of business value that they are?

In the universe where I live, that is called “stealing from your client.”

Nobody is saying here that separation of concerns is bad per se. What is bad, however, is inappropriate separation of concerns — an attempt to decouple parts of your system that don’t lend themselves to being decoupled. Kent Beck has a pretty good guideline as to when separation of concerns is appropriate and when it isn’t: you should be dealing with two parts of your system which you can reason about independently.

You can not reason about your business layer, your presentation layer, and your data access layer independently. User stories that require related changes right across all your layers are very, very common.

Every project that I’ve ever seen that has attempted this kind of abstraction has been riddled with severe SELECT n+1 problems that could not be resolved without breaking encapsulation.

(Nitpickers’ corner: I’m not talking about test mocks here. That’s different. It’s relatively easy to make your test mocks behave like Entity Framework. It’s orders of magnitude harder to make NHibernate or RavenDB behave like Entity Framework.)

If you can present a valid business case for making your persistence mechanism interchangeable, then it’s a different matter, of course. But in that case, you need to implement both (or all) the different options up-front right from the start, and to bear in mind that the necessary separation of concerns almost certainly won’t cleanly follow the boundary between your business layer and your DAL. You should also warn your client of the extra costs involved, otherwise you won’t be delivering good value for money.


The Anaemic Business Layer

The three-layer architecture, with your presentation layer, your business layer and your data access layer, is a staple of traditional .net applications, being heavily promoted on sites such as MSDN, CodeProject and the ASP.NET forums. Its advantage is that it is a fairly canonical way of doing things, so (in theory at least) when you get a new developer on the team, they should have no trouble in finding where everything is.

Its disadvantage is that it tends to breed certain antipatterns that crop up over and over and over again. One such antipattern is what I call the Anaemic Business Layer.

The Anaemic Business Layer is a close cousin of the Anaemic Domain Model, and often appears hand in hand with it. It is characterised by business “logic” classes that don’t actually have any logic in them at all, but only shunt data between the domain model returned from your ORM and a set of identical model classes with identical method signatures in a different namespace. Sometimes it may wrap all the calls to your repository in catch-log-throw blocks, which is another antipattern in itself, but that’s a rant for another time.

The problem with the Anaemic Business Layer is that it makes your code much more time consuming and difficult to maintain, since you have to drill down through more classes just to figure out what is going on, and you have to edit more files to make a single change. This in turn increases risk because it’s all too easy to overlook one of the places where you have to make a change. It also makes things restrictive, because you lose access to certain advanced features of your ORM such as lazy loading, query shaping, transaction management, cross-cutting concerns or concurrency control, that can only properly be handled in the business layer.

The Anaemic Business Layer is usually symptomatic of an over-strict and inflexible insistence on a “proper” layered architecture, where your UI is only allowed to talk to your business layer and your business layer is only allowed to talk to your data access layer. You could make an argument for the need for encapsulation — so that you can easily change the implementation of the methods in the business layer if need be — but that’s only really important if you’re producing an API for public consumption by the rest of the world. Your app is not the rest of the world, and besides, those specific changes tend not to happen (especially for basic CRUD operations), so I’d be inclined to call YAGNI on that one.

The other reason why you might have an Anaemic Business Layer is that you’ve got too much going on in your controllers or your data access layer. You shouldn’t have any business logic in either, as that hinders testability, especially if you’re of the school of thought that says your unit tests shouldn’t hit the database. But if that’s not the case, then it’s time to stop being so pedantic. An Anaemic Business Layer serves no purpose other than to get in the way and slow you down. So ditch your unhelpful faux-“best practices,” bypass it altogether, and go straight from your UI to your repository.


Your best practices are (probably) nothing of the sort

Now I have nothing against best practices per se.

But if you are going to tell me that something is a “best practice,” please first make sure that it really is a best practice. The software development world is plagued by so-called “best practices” that are nothing of the sort, that just introduce friction, ceremony and even risk without offering any benefits whatsoever in return. Some of them were once perfectly valid but have been superseded by developments in technology; some of them were based on widely held assumptions that have since been proven to be incorrect; some of them are based on total misunderstandings of something that someone famous once said; and some of them are just spurious.

I’ll give one example here, which came up in a discussion on Twitter the other day. It’s quite common for people to put their interfaces in one assembly, their business logic in a second, their repositories in a third, their models in a fourth, their front end in a fifth, and so on. This is all done in the name of “having a layered architecture.” The problem with this is that it makes dependency management harder (in fact in the pre-NuGet days it was an absolute nightmare) and forces you to jump around all over the place in your solution when making changes to related classes. It just adds friction, without even solving the problem it claims to solve: separate assemblies are neither necessary nor sufficient for a layered architecture. Oh, and it also violates the Common Closure Principle, which states that classes that change together must be packaged together.

Unfortunately, these so-called “best practices” proliferate because most developers lack the courage to question them, for fear of being viewed as incompetent or inexperienced by those with the authority to hire, fire or promote them. The people who promote garbage “best practices” tend to have Many Years Of Experience At Very Impressive Sounding Companies, and if you’re not that experienced (or confident) yourself, that can be quite intimidating. You don’t agree that we should put our interfaces, enums, business classes, repositories and presentation layers in separate assemblies? You obviously don’t understand a layered architecture!

Don’t let that intimidate you though. When somebody tells you that “you’re not following best practices,” it’s an indication that in their case, Many Years Of Experience At Very Impressive Sounding Companies actually means one year of experience repeated many times building run of the mill CRUD applications on outdated technologies at places that store users’ passwords in plain text. They are almost certainly not active on GitHub, or Twitter, or Stack Overflow, they are very unlikely to have hobby projects, and they probably never discuss programming with experts from outside their own team, let alone from other technology stacks.

In other words, The Emperor Has No Clothes.

But when something really is a best practice, it’ll be quite different. For starters, they will cite the practice concerned by name. They won’t tell you that “you’re not following best practices” but that “you’re violating the Single Responsibility Principle” or “you’re making test driven development harder” or “You Ain’t Gonna Need It” or something else specific. Another hallmark of a genuine best practice is that it will have tangible, enumerable benefits that are actually relevant to your situation. Here are some questions you can and should ask about it:

  1. Does it make it easier to get things right?
  2. Does it make it harder to get things wrong?
  3. Does it make it easier to back out when things go wrong?
  4. Does it make it easier to diagnose problems?
  5. Does it make it easier to get things done faster and with less effort without compromising points 1-4?
  6. Does it deliver the benefits that it claims to deliver? What evidence do you have that it does?
  7. Does it solve problems that you are actually likely to face, or is it one big YAGNI-fest?
  8. Are the problems that it solves still relevant, taking into account the current state of technology, market forces, security threats, and legislative or regulatory requirements?
  9. What alternative approaches have you considered, and how did they compare? It’s nonsensical to talk about “best practices” when you have nothing to compare them against, because the word “best” is meaningless in a sample size of one.
  10. Do its benefits actually outweigh its costs? In practice? In your situation?
  11. Have you understood it correctly, and are you sure you’re not confusing it with something else?

Any best practice that is worth following will stand up to scrutiny. And scrutinised it should be. Because blindly doing something just because somebody cries “best practices” is just cargo cult. And cargo cult programming is never a best practice.