james mckay dot net
because there are few things that are less logical than business logic

How often should you check in code?

There’s a lot of confusion among developers about how often to check in code to source control. Many projects have histories riddled with huge commits making sweeping changes to dozens of files, often with only a vague commit summary or even no commit summary at all. Those projects that have guidelines and policies in place usually don’t have a clear justification for those policies, and some of them are downright unhelpful, such as, “at least once a day,” or “whenever you come to a natural break in your workflow, such as lunchtime.”

The problem is that if you’re all doing everything in a single branch, typically trunk, it is not possible to come up with a straight answer to the question.

Should you “check in early, check in often,” which Jeff Atwood once described as the golden rule of source control? This ensures that you never lose much code, you keep up to date with everyone else, and you don’t go dark. However, if you’re all working on different tasks on the same branch, you will end up with two sets of unrelated revisions tangled up together in your history. If one of them needs to go live, like, yesterday, and the other has had to be put on hold for any reason, as happened to us at the end of our last sprint, you’ll run into difficulties.

Alternatively, you could check in only completed units of work. However, this causes other problems. Deferring check-in until a unit of work is complete often results in huge, monolithic commits that increase the risk of integration conflicts. Furthermore, if you get into a mess attempting to resolve said integration conflicts, there is no way to back out to where you were before you ran svn update. I’ve had colleagues in this situation end up with no option but to roll back to the latest revision in source control, losing days of work that only existed in their working copy in the process.

Furthermore, large, monolithic commits are impossible to describe comprehensively and accurately in a commit summary, and they cause problems when carrying out a binary search of your history for the revision that introduced a bug.

Of course, you should be dividing your work up into smaller units as much as possible anyway to minimise the risk of this happening, but this isn’t always possible. Whichever of the two options you choose, you’re going to run into problems sooner or later.

Having a separate branch for each feature resolves this dilemma neatly. This sounds scary at first if you aren’t used to branching and merging, but provided your tooling supports it, it isn’t as bad as it sounds. Feature branches are usually fairly short-lived, so you don’t get as many Big Scary Merges as you would expect. Besides, even when you do get a Big Scary Merge, it’s better than an otherwise identical Big Scary Commit, because if your attempts to resolve the conflicts go wrong, you can at least roll back to what you had before you attempted the merge and try again.

With that in mind, we can come up with some more sensible guidelines on how often to commit to source control.

1. Every commit should serve one, and only one, purpose.

This is a straightforward corollary to the Single Responsibility Principle. If you have to use the word “and” or “also” in your commit summary, you’re probably checking in too much.
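The “and”/“also” test is simple enough to automate. Here is a minimal sketch in Python, assuming you want a rough heuristic rather than a hard rule; the function name looks_multi_purpose and the sample summaries are made up for illustration.

```python
import re

# Heuristic only: whole-word "and" or "also" hints that a commit does two things.
_MULTI_PURPOSE_WORDS = re.compile(r"\b(and|also)\b", re.IGNORECASE)


def looks_multi_purpose(summary: str) -> bool:
    """Return True if the commit summary contains "and" or "also" as a whole word."""
    return bool(_MULTI_PURPOSE_WORDS.search(summary))


if __name__ == "__main__":
    # Hypothetical commit summaries, purely for illustration.
    for summary in (
        "Fix null check in invoice export",
        "Fix null check and rename the export service",
    ):
        verdict = "split this?" if looks_multi_purpose(summary) else "ok"
        print(f"{verdict:12} {summary}")
```

Treat a hit as a prompt to look again, not as a verdict: a summary such as “Add drag and drop support” trips the check while still describing a single purpose.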

If you have two unrelated changes in your working copy, you need to break them up. This is called the “tangled working copy problem,” and modern SCMs give you tools to sort it out. If you’re using Subversion or TFS, on the other hand, well, you should have been more careful. Unfortunately, a tangled working copy can be pretty hard, or in some cases even impossible, to avoid.

Needless to say, you should never check in code to two separate branches, let alone to two separate products, in a single commit, even if your source control allows you to do so.

2. Every commit should be small enough to be described in detail in the summary.

Your commit message won’t necessarily cover every last line of code in your change. If you’ve added a whole bunch of stuff, as long as it’s reasonably self-explanatory and isn’t riddled with meaningless method names such as doIt(), a single-line commit message may suffice. But the combination of your code and your commit message should explain every line that has changed. And if you’ve removed or edited existing code, that will all need explaining in your commit summary too, particularly if the change is counterintuitive or at first sight could be mistaken for a bad practice, such as changing an encoding from UTF-8 to 7-bit ASCII.

If your commit is too large to make this practical, your commit is too large, period.

3. Every commit should build and (usually) pass all your unit tests.

Some DVCS users may disagree with me on this one, insisting that you can use your local history as a sandbox for your commits, so it doesn’t matter, but I stand by it. Broken builds have to be marked as untestable by your bisect tool, which complicates pinpointing the change that introduced the bug. A string of broken builds in succession makes matters worse. Besides, both Git and Mercurial provide mechanisms to allow you to resolve this situation by combining breaking changesets with ones that fix them: interactive rebase (or git commit --amend) in Git, and Mercurial Queues in Mercurial.
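To see why untestable revisions hurt, here is a rough Python sketch of a binary search over history that has to skip revisions which don’t build. It is an illustration only, not how git bisect or hg bisect are actually implemented. The function bisect_with_skips, the revision names and the verdict strings are all invented for the example, and it assumes that once the bug appears, every later revision has it.

```python
from typing import Callable, List, Sequence


def bisect_with_skips(
    revisions: Sequence[str], check: Callable[[str], str]
) -> List[str]:
    """Narrow down the revision that introduced a bug.

    revisions[0] is assumed known good and revisions[-1] known bad.
    check(rev) returns "good", "bad" or "skip" (for example, it doesn't build).
    Returns the smallest run of revisions that must contain the first bad one.
    """
    good, bad = 0, len(revisions) - 1
    skipped = set()
    while True:
        # Revisions strictly between the known-good and known-bad ends
        # that we are still able to test.
        candidates = [i for i in range(good + 1, bad) if i not in skipped]
        if not candidates:
            break
        mid = (good + bad) // 2
        # Test whichever remaining candidate is closest to the midpoint.
        pick = min(candidates, key=lambda i: abs(i - mid))
        verdict = check(revisions[pick])
        if verdict == "good":
            good = pick
        elif verdict == "bad":
            bad = pick
        else:
            skipped.add(pick)
    return list(revisions[good + 1 : bad + 1])


if __name__ == "__main__":
    history = [f"r{i}" for i in range(8)]  # r0 is good, r7 is bad

    # Every revision builds; the bug arrived in r4.
    clean = {rev: ("good" if i < 4 else "bad") for i, rev in enumerate(history)}
    print(bisect_with_skips(history, clean.__getitem__))  # ['r4']

    # Same history, but r3 and r4 don't build.
    broken = dict(clean, r3="skip", r4="skip")
    print(bisect_with_skips(history, broken.__getitem__))  # ['r3', 'r4', 'r5']
```

With every revision buildable, the search ends on a single culprit. With two broken builds in the wrong place, the best it can do is hand you three suspects to read through by hand.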

The only exception to the rule that every commit should pass your unit tests is when you are working in a test-driven manner, where you write a failing test, then write code to make it pass. Here, you may want to consider checking in the new test separately from the code that makes it pass, in order to audit just how test-driven your development really is.

4. Use feature branches liberally, and merge to your main development branch only when the task is complete.

This guideline is a more sensible version of “check in only completed units of work.” Single-responsibility, easily describable commits are obviously fairly small and frequent (a few lines of code, representing less than an hour’s work), and usually do not represent a completed unit of work.

That’s why feature branches are so important if you are to observe best practices with source control. In this case, “check in only completed units of work” becomes “integrate only completed units of work,” and the conflict between the two different best practices is thereby resolved. When you merge, always say what you are merging, with an issue number in your bug tracker where appropriate. Don’t just write “Merge.”
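If you want to nudge people away from a bare “Merge,” a check along the following lines is one option. It is only a sketch: the function merge_message_problems is hypothetical, and the issue reference formats it accepts (“#123” or a tracker key such as “PROJ-123”) are assumptions you would adapt to your own bug tracker.

```python
import re
from typing import List

# Assumed issue reference formats: "#123" or a tracker key such as "PROJ-123".
_ISSUE_REF = re.compile(r"#\d+|\b[A-Z][A-Z0-9]+-\d+\b")
# Matches messages that say nothing beyond "Merge", "Merged" or "Merge.".
_BARE_MERGE = re.compile(r"^\s*merged?\.?\s*$", re.IGNORECASE)


def merge_message_problems(message: str) -> List[str]:
    """Return a list of complaints about a merge commit message (empty if none)."""
    problems = []
    if _BARE_MERGE.match(message):
        problems.append("says only 'Merge'; say what is being merged")
    if not _ISSUE_REF.search(message):
        problems.append("no issue number found")
    return problems


if __name__ == "__main__":
    # Hypothetical merge messages, purely for illustration.
    for message in ("Merge.", "Merge feature/invoice-export into trunk (#482)"):
        print(repr(message), merge_message_problems(message))
```

Since an issue number only belongs in the message where appropriate, the second complaint is better treated as a reminder than as a reason to reject the merge.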

In an ideal world, every feature should be developed on a separate branch. With a modern DVCS, this is of course the default, and very easy. With centralised source control, however, it can take considerably more effort depending on your tool and your project setup, but it is by no means impossible. In cases such as these, you may need to make some compromises, and decide on a threshold above which to create a feature branch. But in general, it’s best to keep this threshold as low as you can get away with, or possibly even lower it gradually as you and your team-mates become more confident with branching and merging. Certainly, if you’re doing exclusively trunk-based development, you’re denying yourself a straight answer to the question of how often to check in code, and asking for problems sooner or later. Whatever SCM tool you are using, if you don’t know how to branch and merge with it, you should learn how to do so.