james mckay dot net

because there are few things that are less logical than business logic

September 2010

30
Sep

Solving the tangled working copy problem with hunk selection and Mercurial Queues

Programming is full of dilemmas.

You’ll be deep in concentration, working on your new application, adding some new payment options, when all of a sudden you notice a potential race condition in a nearby method that might cause customers to be billed twice. You know it’ll take all of two lines to fix, so you pop in the fix and carry on with your new functionality.

A few minutes later, you notice that another method is pulling in an RSS feed from a hard coded, and outdated, source, so you stop to extract it to a configuration setting and use the more up to date feed.

You finish fixing up your new functionality, then you come to check in your code. Now, you have a problem. You have three separate changes tangled up in your working copy.

Most developers would simply bundle all three changes into a single commit, possibly only leaving a commit summary (you do fill in your commit summaries, don’t you?) saying “Added some new payment options to application.” This is misleading, because it doesn’t say anything about the race condition or the RSS fix.

You could say “Added some new payment options to application, fixed a race condition and used a more up to date feed.” But this doesn’t make it all that clear which part of your commit fixes which problem. Someone looking through your history six months later might see your race condition fix has introduced a regression, not realise that it is there to fix a race condition, and revert it to what it was before.

You really need to observe the Single Responsibility Principle, and split the three tasks into separate commits.

So, what do you do?

With traditional source control tools, you are likely to be told, “You should have shelved your changes, reverted your working copy, and performed these tasks as separate commits. Or, if your source control tool doesn’t support shelving, you should save a patch, then revert your working copy, then make the new change, then re-apply the patch.”

There’s just one problem with this bit of advice. It is inefficient, and a total mismatch to the way your average programmer’s brain works.

To see why, let’s rewind your last half hour of coding and start again.

You’ll be deep in concentration, working on your new application, adding some new payment options, when all of a sudden you notice a potential race condition in a nearby method that might cause customers to be billed twice. You know it’ll take all of two lines to fix, but you need to keep these changes separate.

So you shelve your changes, revert your working copy, getting prompted to save/reload/merge your files in the process, and then Visual Studio insists on reloading your entire solution because you had changed something in the .sln file. And since your solution contains more than three projects and they reference more than two assemblies that aren’t in the GAC, it takes forever to reload and you’ve got distracted onto something else while you’re waiting.

By the time you manage to start editing your project again, you’ve been completely knocked out of the zone, and you’ve forgotten why you shelved your changes in the first place.

You see? All the so-called best practice advice about shelving, reverting your working copy, and all that, overlooks one very important fact about programming, namely that it is a mentally intensive discipline that often requires you to juggle several complex details in your mind at once, and even small diversions, such as having to save files and wade through menus to find your shelving tool then think of a name for your shelve set, can have a detrimental effect on your workflow. It adds to the mental burden on you and makes your job more difficult. It’s not a best practice at all, but a workaround for the fact that you don’t have the right tools for the job.

Wouldn’t it be better to just to get the changes down as you notice them and then use a tool that lets you sort out your commits later, going through all the changes you’re checking in, cherry-picking them into a series of separate patches?

Git users wax lyrical about the index, or staging area, because it is designed to solve just this problem. It provides an intermediate store between your working copy and your history, where you can stage your changes, not just one file at a time, but one hunk at a time, using the command git add -p. Once you’ve staged your changes in this way, you can then commit them as a separate, logical change set.

Mercurial has a similar feature in TortoiseHg called “hunk selection.” By double-clicking on a change in the “Hunk selection” tab on the commit dialog, you can include or exclude it from the check-in. If you’re a command line freak, the record extension does something similar, and the crecord extension allows you to take it down to the line-by-line level.

image

You can click on “Commit preview” once you’re done to see what’s going to go in your commit.

There’s just one problem with all this though. As Eric Sink points out, you’re checking in a version of your code which you’ve never tested. This is a bad practice, and it can bite you if you ever need to run git/hg bisect to track down a regression.

So let’s sum up what your options are so far.

  • Check in everything in a single commit at once. This is bad practice.
  • Use git add -p or hg record/TortoiseHg’s hunk selection to separate out your changes into separate commits. This is also bad practice.
  • Use shelving and patches to separate out your changes. This is a hack, which slows you down and risks knocking you out of the zone and making you lose track of your changes altogether.

So is there anything we can do to fix this? As a matter of fact, it turns out that there is.

One of my favourite features of Mercurial is the mq (Mercurial Queues) extension. This may sound a little esoteric, but what it does is quite simple. You can put a whole series of commits into a separate staging area, where you can edit them, reorder them, apply them, unapply them, chop and change them, split them up or combine them together, and of course, most importantly, run your unit tests on them, to your heart’s content before applying them to your master repository.

Let’s just say I am working on some changes to my Comment Timeout WordPress plugin. I’ve done two different things: updated the version number to 2.1.2, and tidied up some code formatting. I want to separate these into two different commits. First of all, I select the hunks that I want to go into the first commit, and then I type a name for the patch into the “QNew” box (keep this short, a couple of words should do):

image

You’ll note that the “Commit” button changes to “QNew” to indicate that your next commit goes into the patch queue. Clicking this will automatically show you the patch queue and change the button to “QRefresh”:

image

You can change the message, or edit the files, or select and unselect hunks to your heart’s content, then click QRefresh. Then you can add a second commit by typing another name into the QNew box:

image

Clicking the “QNew” button creates a second patch:

image

Okay, so now we have a whole series of patches. It’s a bit like the Git index, except that rather than having just one staging area, you have several, all stacked one top of each other. In the Repository Explorer, these revisions appear as a regular part of your DAG:

image

The yellow label “qparent” indicates the parent revision on top of which the patch queue is being applied; “qbase” indicates the first patch in the queue; “qtip” indicates the last; and the blue labels give the names of the patches. You could push them to another repository if you wanted, but I don’t recommend this. Keep them on your own machine for the time being.

Now that we’ve separated out our commits into a series of patches, we can get on with the job of placating the people who are worried about best practices. Namely: testing each patch before applying it.

First, double click on “[qparent]”:

image

You’ll note that our two patches have both dropped below the line, and they’re now greyed out. If you take a look at the repository explorer, you’ll see that there’s no sign of them:

image

The last revision has been marked in bold to indicate that that’s the one where your working copy is at.

If you double click on “tidy up” it will move above the line and turn blue again, to indicate that your working copy has been updated to this version:

image

That patch is now where your working copy is at. Do whatever testing you want to do on it, then click on the next one to apply it:

image

Once you’re satisfied that all your patches are ready, right-click on any of them and choose “Finish applied”:

image

Hey presto! Your work is now all committed to your repository, ready to be pushed, pulled or otherwise shared with the big wide world.

image

There are other things you can do with patches in your queue which I haven’t covered here, such as reordering them, or combining two or more of them into one.

Patch queues and hunk selection are two extremely powerful features of Mercurial. While they require a little bit of care and attention in order to adhere to best practices, this is no more arduous than the discipline needed for any source control tool, and they can provide a significant productivity boost, simply because they let your tools work around you rather than forcing you to work around your tools.

20
Sep

On code generation

A lot of developers love code generation. Some pretty smart ones hate it. The following is my observation on the matter.

Code generation done right can save masses of time. Code generation done wrong has the exact opposite effect.

Here is what code generation done wrong looks like. I am asked to help out with your project, and told that it’s corrupting data. After a bit of rootling around, I find two hundred business service classes, each containing numerous very similar looking methods that all follow this pattern:

public bool SaveWidget(Widget widget) {
    try {
        dataSession.Save(widget);
        return true;
    }
    catch (Exception) {
        return false;
    }
}

Pffft. Pokémon exception handling all over the place. No wonder it’s corrupting data.

If you’d left your templates in your solution, with a comment explaining where to find them, I could get rid of Pikachu and Bulbasaur and friends in five minutes. But since you didn’t, I have the thankless task of spending the rest of the week going through the whole lot by hand.

Here’s how to do code generation right:

  1. Do include a header at the top of each autogenerated file, indicating (a) that it has been autogenerated, and (b) what it was autogenerated by.
  2. Do include your template files and/or scripts in source control.
  3. Do re-generate your autogenerated files as part of your build process, before running your unit tests.
  4. Don’t edit autogenerated files manually. Use partial classes and partial methods instead.

By following these four simple rules, you can ensure that if someone comes to your project at a later date and finds that there is a fundamental flaw in your generated code, they can easily fix it.

17
Sep

Diaspora

I was hoping that Diaspora would prove to be a Facebook killer, but somehow, I don’t think that’s going to happen. Their choice of technology stack — Ruby on Rails with a MongoDB back-end — has seen to that.

Rails is something of a darling among geeks, and it is great for web applications that you only install on your own servers. So too are the plethora of NoSQL databases that seem to be all the rage among that kind of crowd these days. However, they are totally unsuitable for distribution to the general public.

Why? Simply because most cheap’n’nasty web hosts do not support them.

This is something that the WordPress core developers understood. Their decision to continue supporting PHP 4 long after its end of life brought howls of derision from WordPress plugin developers everywhere, but they were thinking first and foremost about their users — people with only limited computer skills who would deploy WordPress on their own shared hosting accounts. They wanted to get the widest possible audience.

Diaspora should be targetting run-of-the-mill Facebook users, not geeks. If it had been written in PHP with a MySQL back end, it would be more likely to see widespread adoption. However, by opting for Ruby on Rails, they’ve placed a barrier to adoption in front of the kind of technically-capable-but-not-particularly-geeky users on a tight budget who would otherwise set up Diaspora seeds for local schools, churches, clubs and so on. They risk turning it into a geek ghetto.

Sigh. Perhaps I’ll just have to put up with Faceborg after all.

04
Sep

Send patchbombs to the mailing list, not pull requests to the project lead

I sent a pull request to the lead developer of an open source project that I’m starting to contribute to. This is the default that both github and bitbucket give you, and the first thing that newbie open source contributors on those sites will think of doing.

I got this response from him:

I’d *massively* prefer patch submission via patchbomb to the … list – that’s generally where they come in, and sometimes people other than me find potential problems in the patch. Can you do that?

D’oh! My bad! It’s the same kind of thing as the rule that you should ask the whole community, not just one of its members. Admittedly I had noticed the existence of the mailing list, but only after I sent off the pull request, so his response did not surprise me.

Moral of the story: if you want to contribute to an open source project, look for a mailing list before you do anything else. And if there is one, use it.