Solving the tangled working copy problem with hunk selection and Mercurial Queues

This post is more than 13 years old.

Posted at 07:00 on 30 September 2010

Programming is full of dilemmas.

You’ll be deep in concentration, working on your new application, adding some new payment options, when all of a sudden you notice a potential race condition in a nearby method that might cause customers to be billed twice. You know it’ll take all of two lines to fix, so you pop in the fix and carry on with your new functionality.

A few minutes later, you notice that another method is pulling in an RSS feed from a hard coded, and outdated, source, so you stop to extract it to a configuration setting and use the more up to date feed.

You finish fixing up your new functionality, then you come to check in your code. Now, you have a problem. You have three separate changes tangled up in your working copy.

Most developers would simply bundle all three changes into a single commit, possibly only leaving a commit summary (you do fill in your commit summaries, don’t you?) saying “Added some new payment options to application.” This is misleading, because it doesn’t say anything about the race condition or the RSS fix.

You could say “Added some new payment options to application, fixed a race condition and used a more up to date feed.” But this doesn’t make it all that clear which part of your commit fixes which problem. Someone looking through your history six months later might see your race condition fix has introduced a regression, not realise that it is there to fix a race condition, and revert it to what it was before.

You really need to observe the Single Responsibility Principle, and split the three tasks into separate commits.

So, what do you do?

With traditional source control tools, you are likely to be told, “You should have shelved your changes, reverted your working copy, and performed these tasks as separate commits. Or, if your source control tool doesn’t support shelving, you should save a patch, then revert your working copy, then make the new change, then re-apply the patch.”

There’s just one problem with this bit of advice. It is inefficient, and a total mismatch to the way your average programmer's brain works.

To see why, let’s rewind your last half hour of coding and start again.

You’ll be deep in concentration, working on your new application, adding some new payment options, when all of a sudden you notice a potential race condition in a nearby method that might cause customers to be billed twice. You know it’ll take all of two lines to fix, but you need to keep these changes separate.

So you shelve your changes, revert your working copy, getting prompted to save/reload/merge your files in the process, and then Visual Studio insists on reloading your entire solution because you had changed something in the .sln file. And since your solution contains more than three projects and they reference more than two assemblies that aren’t in the GAC, it takes forever to reload and you've got distracted onto something else while you're waiting.

By the time you manage to start editing your project again, you’ve been completely knocked out of the zone, and you’ve forgotten why you shelved your changes in the first place.

You see? All the so-called best practice advice about shelving, reverting your working copy, and all that, overlooks one very important fact about programming, namely that it is a mentally intensive discipline that often requires you to juggle several complex details in your mind at once, and even small diversions, such as having to save files and wade through menus to find your shelving tool then think of a name for your shelve set, can have a detrimental effect on your workflow. It adds to the mental burden on you and makes your job more difficult. It's not a best practice at all, but a workaround for the fact that you don't have the right tools for the job.

Wouldn't it be better to just to get the changes down as you notice them and then use a tool that lets you sort out your commits later, going through all the changes you’re checking in, cherry-picking them into a series of separate patches?

Git users wax lyrical about the index, or staging area, because it is designed to solve just this problem. It provides an intermediate store between your working copy and your history, where you can stage your changes, not just one file at a time, but one hunk at a time, using the command git add -p. Once you’ve staged your changes in this way, you can then commit them as a separate, logical change set.

Mercurial has a similar feature in TortoiseHg called “hunk selection.” By double-clicking on a change in the “Hunk selection” tab on the commit dialog, you can include or exclude it from the check-in. If you’re a command line freak, the record extension does something similar, and the crecord extension allows you to take it down to the line-by-line level.

image

You can click on “Commit preview” once you’re done to see what’s going to go in your commit.

There’s just one problem with all this though. As Eric Sink points out, you’re checking in a version of your code which you’ve never tested. This is a bad practice, and it can bite you if you ever need to run git/hg bisect to track down a regression.

So let’s sum up what your options are so far.

  • Check in everything in a single commit at once. This is bad practice.
  • Use git add -p or hg record/TortoiseHg’s hunk selection to separate out your changes into separate commits. This is also bad practice.
  • Use shelving and patches to separate out your changes. This is a hack, which slows you down and risks knocking you out of the zone and making you lose track of your changes altogether.

So is there anything we can do to fix this? As a matter of fact, it turns out that there is.

One of my favourite features of Mercurial is the mq (Mercurial Queues) extension. This may sound a little esoteric, but what it does is quite simple. You can put a whole series of commits into a separate staging area, where you can edit them, reorder them, apply them, unapply them, chop and change them, split them up or combine them together, and of course, most importantly, run your unit tests on them, to your heart’s content before applying them to your master repository.

Let’s just say I am working on some changes to my Comment Timeout WordPress plugin. I’ve done two different things: updated the version number to 2.1.2, and tidied up some code formatting. I want to separate these into two different commits. First of all, I select the hunks that I want to go into the first commit, and then I type a name for the patch into the “QNew” box (keep this short, a couple of words should do):

image

You’ll note that the “Commit” button changes to “QNew” to indicate that your next commit goes into the patch queue. Clicking this will automatically show you the patch queue and change the button to “QRefresh”:

image

You can change the message, or edit the files, or select and unselect hunks to your heart’s content, then click QRefresh. Then you can add a second commit by typing another name into the QNew box:

image

Clicking the “QNew” button creates a second patch:

image

Okay, so now we have a whole series of patches. It’s a bit like the Git index, except that rather than having just one staging area, you have several, all stacked one top of each other. In the Repository Explorer, these revisions appear as a regular part of your DAG:

image

The yellow label “qparent” indicates the parent revision on top of which the patch queue is being applied; “qbase” indicates the first patch in the queue; “qtip” indicates the last; and the blue labels give the names of the patches. You could push them to another repository if you wanted, but I don’t recommend this. Keep them on your own machine for the time being.

Now that we’ve separated out our commits into a series of patches, we can get on with the job of placating the people who are worried about best practices. Namely: testing each patch before applying it.

First, double click on “[qparent]”:

image

You’ll note that our two patches have both dropped below the line, and they’re now greyed out. If you take a look at the repository explorer, you’ll see that there’s no sign of them:

image

The last revision has been marked in bold to indicate that that’s the one where your working copy is at.

If you double click on “tidy up” it will move above the line and turn blue again, to indicate that your working copy has been updated to this version:

image

That patch is now where your working copy is at. Do whatever testing you want to do on it, then click on the next one to apply it:

image

Once you’re satisfied that all your patches are ready, right-click on any of them and choose “Finish applied”:

image

Hey presto! Your work is now all committed to your repository, ready to be pushed, pulled or otherwise shared with the big wide world.

image

There are other things you can do with patches in your queue which I haven’t covered here, such as reordering them, or combining two or more of them into one.

Patch queues and hunk selection are two extremely powerful features of Mercurial. While they require a little bit of care and attention in order to adhere to best practices, this is no more arduous than the discipline needed for any source control tool, and they can provide a significant productivity boost, simply because they let your tools work around you rather than forcing you to work around your tools.