james mckay dot net
because there are few things that are less logical than business logic

December 2010

Finding bugs with a binary search of your source control history

Mercurial’s bisect command is a fantastically useful tool when you’re faced with a regression: a bug that you know wasn’t there in some earlier revision.

It’s a very simple idea. You start off with your latest revision, which you know has the bug, go back to a revision that you know didn’t have the bug, and do a binary search until you find the revision that introduced it.

So let’s say your latest revision was number 500. You’d mark that one as bad, then test, say, revision 100, find that it works as expected, and mark that as your last known good revision. Mercurial will then automatically update to revision number 300 (halfway in between) for you to test. Mark as good or bad as appropriate, lather, rinse and repeat until you find the change that introduced the bug.
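In Mercurial, a session like the one above looks something like this (the revision numbers are illustrative):

hg bisect --reset      # clear any earlier bisection state
hg bisect --bad        # the current revision, 500, has the bug
hg bisect --good 100   # revision 100 works; hg updates to the midpoint
hg bisect --good       # (or --bad) after testing the revision hg gives you

Repeat the last step until Mercurial names the first bad revision.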

With every test that you make, the gap between the last known “good” and first known “bad” revisions halves, quickly closing in on the culprit:

[Figure: each test halves the range between the last known good and first known bad revisions]

Consequently you will be able to pinpoint the breaking change after approximately log2 n tests, so a thousand revisions would only take one more test than 500, and a million would only take one more test than 500,000. Once you’ve found the offending change, you can very easily zoom right in on the problematic lines of code, rather than having to spend ages stepping through it all in the debugger.

You don’t need to be using Mercurial to apply this technique. You can do it manually with any version control tool, though you’ll have to keep track of the good and bad revisions yourself if your tool doesn’t do the bookkeeping for you. It can also be pretty slow with centralised tools, since you have to hit the network for every test.

There are a few points to note about this procedure, however.

First, bisect is most effective when your revisions are small and serve a single purpose. If the breaking revision changes a lot of code, and tackles too many things at once, it may be difficult to identify the source of the problem once you have located the offending change. This is why it is important to “check in early, check in often.” This is also why good, informative commit summaries are important.

Second, remember that you’re looking for the revision that introduced a specific bug. If a revision does not have this specific bug but has other problems, you should mark it as good nonetheless.

Revisions that don’t compile, or have other problems that prevent you from determining whether the bug exists in the first place, should not be marked as either “good” or “bad” but should be flagged to be skipped. Skipped revisions don’t update your “last known good” and “first known bad” revisions, so the number of tests you have to make increases, slowing down your search. Consequently it is good practice to ensure that every commit you make to source control builds correctly and, ideally, passes all your unit tests as well. When you’re using a DVCS it can be tempting to disregard this altogether, but if hg bisect reports that your error is somewhere in a string of twenty successive revisions, none of which compiles, you’ll have more of a headache sorting out what’s what. Certainly, broken check-ins should be very much the exception rather than the rule.
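In Mercurial, flagging an untestable revision is a one-liner:

hg bisect --skip       # can’t test this one; hg tries a nearby revision instead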

Understanding Planning Poker

We’ve been using Planning Poker for a couple of months now in our sprint planning meetings, and on the whole it’s been quite a success. If you’ve never come across it before, Planning Poker is an estimation technique that’s been gaining popularity among agilists recently: it’s not only surprisingly easy and accurate, but it’s also really good fun.

The procedure is as follows. Each participant is given a deck of cards numbered with the first few terms of the Fibonacci sequence: 1, 2, 3, 5, 8, 13, and so on. One developer chairs the meeting, and a project manager comes along to explain the requirements for each task that needs to be tackled. The developers then discuss what is involved and break the task down into sub-tasks.

Once the task has been broken down in this way, each sub-task is taken in turn. Every member of the team selects a card indicating how long they think it will take (in our team we measure it in hours, though some teams use vaguer units of measurement). Each card is placed face down on the table; once everyone has chosen an estimate, all the cards are turned face up at the same time.

Sometimes there will be a consensus; on other occasions, there may be one or two dissenters from the majority opinion. In that case, the dissenters are asked to justify their estimates to the rest of the group. If the estimate comes to more than eight hours, the group further divides the sub-task into smaller tasks.

The “poker” element is a great leveller. It means that the discussion isn’t dominated by the loudest or most senior member of the group. It means that the introverts are brought into the discussion. But there are a few points that some people seem to have difficulty grasping.

First, proposing actual estimates while breaking down the task is strictly forbidden. The time to make your estimate is when you come to put down your card, not before. There is a good reason for this: if you start quoting figures, you will be “anchoring” the estimate and influencing your teammates. Even if they consciously ignore what you say, they may subconsciously pull their estimates into line with yours, when they might otherwise have taken into account factors that you hadn’t thought of. And if you actually want to influence your teammates’ estimates, you have no business whatsoever doing so: the idea of planning poker is that estimates should be made on the basis of evidence, not subjective opinion. If you do have a strong opinion that differs from the team consensus, you will be given a platform to justify it after everyone has revealed their cards.

If people persist in proposing estimates prematurely, it might be a good idea to institute a “penalty box”: offenders forfeit, say, 50 pence per violation, which can then be donated to charity.

Secondly, don’t get too hung up on the accuracy of your estimates. Planning poker uses the Fibonacci sequence for a very good reason: your estimates will be subject to an uncertainty of perhaps 50% or more. Some estimates will be too small and others too large, so at the end of the day it will all balance out. Four is not an option because it is well within the uncertainty range of both three and five. Similarly, don’t get into arguments about estimates: that just wastes time. Besides, if you can’t come to a consensus on a task within a minute or so, you probably haven’t broken it down clearly enough.

Finally, even if the details are wrong, the estimate may still be right. A few weeks ago, our team estimated for one particular task using Planning Poker when I was on holiday. Since I had the necessary domain knowledge for this particular task, I ended up working on it when I returned. They got the implementation details of the task concerned completely wrong when estimating, but incredibly, the time that I took to complete it turned out to be within about five percent of what was estimated.

Programmer jargon: Blub

A month or so ago, elite C# gurus such as Eric Lippert and Jon Skeet were discussing the new async and await keywords that are forthcoming in C# 5. Eric Lippert introduced them with a series of blog posts on continuation passing style programming that I haven’t yet got round to reading properly.

People who are into continuation passing tend to wax lyrical about it, portraying it as the latest cure for cancer. But I’m wondering just how much of my time I should spend on it. To me, it looks a bit whacked out — isn’t it just some kind of nonlocal goto, and aren’t goto statements considered harmful? How will it help me write code to keep the Great British Public informed about what their elected representatives are discussing in Select Committees?

You never know. No doubt in a few years’ time, I too will be passing continuations around like sweets at a children’s party. I had similar thoughts a few years back when I first read about LINQ. Why, I thought, would anyone want to embed SQL statements in their C# code? It sounded a bit like the bad old days of tag soup in classic ASP. But nowadays I use LINQ all the time, and I’ve long since figured out that LINQ is not LINQ to SQL. In fact, I don’t see how you can get anything done without it. It’s just so much cleaner than writing a whole bunch of for loops, and it gives you lazy evaluation, which can yield massive performance benefits if you use it properly.
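To illustrate (a made-up example, not from the original posts), here is the kind of for loop that a LINQ query replaces. Note that the query itself is lazily evaluated: nothing runs until it is enumerated.

using System;
using System.Collections.Generic;
using System.Linq;

class LinqExample
{
    static void Main()
    {
        var numbers = new[] { 5, 12, 3, 42, 7, 19 };

        // The loop version: imperative, with a temporary list.
        var bigOnes = new List<int>();
        foreach (var n in numbers)
        {
            if (n > 10)
                bigOnes.Add(n);
        }
        bigOnes.Sort();

        // The LINQ version: declarative and lazily evaluated;
        // the query doesn't execute until the foreach below enumerates it.
        var query = numbers.Where(n => n > 10).OrderBy(n => n);

        foreach (var n in query)
            Console.WriteLine(n);
    }
}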

This is a common paradox in programming. Once you get to grips with a new programming language feature, concept or tool, you wonder how on earth you managed to write any code at all without it. But on the other hand, when you look at all the concepts, features and tools that people are waxing lyrical about in those terms, they all look weird, if not completely scary. And when you wax lyrical about something or other, as sure as eggs are eggs, someone will tell you your tool is the weird and scary one.

In his essay, “Beating the Averages,” Paul Graham calls this the Blub Paradox:

Blub falls right in the middle of the abstractness continuum. It is not the most powerful language, but it is more powerful than Cobol or machine language.

And in fact, our hypothetical Blub programmer wouldn’t use either of them. Of course he wouldn’t program in machine language. That’s what compilers are for. And as for Cobol, he doesn’t know how anyone can get anything done with it. It doesn’t even have x (Blub feature of your choice).

As long as our hypothetical Blub programmer is looking down the power continuum, he knows he’s looking down. Languages less powerful than Blub are obviously less powerful, because they’re missing some feature he’s used to. But when our hypothetical Blub programmer looks in the other direction, up the power continuum, he doesn’t realize he’s looking up. What he sees are merely weird languages. He probably considers them about equivalent in power to Blub, but with all this other hairy stuff thrown in as well. Blub is good enough for him, because he thinks in Blub.

When we switch to the point of view of a programmer using any of the languages higher up the power continuum, however, we find that he in turn looks down upon Blub. How can you get anything done in Blub? It doesn’t even have y.

I guess the best definition of “Blub” is the collection of programming language features, concepts and tools that you know and understand — or at least, that you can see might come in useful, so you really ought to get round to learning about them. It’s what makes sense from your vantage point. But when you look up and see someone waxing lyrical about something unfamiliar, don’t just write it off as weird. They may quite possibly be right.

How to host WCF services in ASP.NET applications without bloating your web.config

If you add a WCF service to a web project in Visual Studio, it will dump a whole lot of garbage in your web.config file to make it work. If you want to keep your web.config file slim and clean (and you should), there is an alternative. You can of course create the services programmatically in a console application, but how do you do it in a web application?

Simply open the .svc file itself (not the .svc.cs codebehind file) in a text editor (you can right-click on the file and choose “Edit markup” on the context menu), and add the following attribute to the <%@ServiceHost %> tag:

Factory="System.ServiceModel.Activation.WebServiceHostFactory"

This will give you something like this:

<%@ ServiceHost Language="C#" Debug="true" Service="My.Wcf.Service" CodeBehind="Service.svc.cs" Factory="System.ServiceModel.Activation.WebServiceHostFactory" %>

You can then remove all the extraneous cruft that Visual Studio adds to the <system.serviceModel> section in your web.config, thereby keeping it cleaner and more manageable.
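For comparison, the programmatic console hosting mentioned above looks something like this. This is a minimal sketch: My.Wcf.Service, its My.Wcf.IService contract and the address are placeholder names, not anything from a real project.

using System;
using System.ServiceModel;

class Program
{
    static void Main()
    {
        // Host the (placeholder) service type at a base address.
        var host = new ServiceHost(typeof(My.Wcf.Service),
            new Uri("http://localhost:8080/service"));

        // Expose the (placeholder) contract over basic HTTP.
        host.AddServiceEndpoint(typeof(My.Wcf.IService),
            new BasicHttpBinding(), "");

        host.Open();
        Console.WriteLine("Service running; press Enter to stop.");
        Console.ReadLine();
        host.Close();
    }
}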

The benefits of the command line

The question of the command line is a controversial one, with some people declaring that it should be a core competency without which you should be considered an automatic “no hire,” and others claiming that it’s an anachronistic irrelevance in these days of graphical user interfaces, drag and drop, and integrated everything in Eclipse and Visual Studio.

Now I’ve repeatedly described command line instructions as a bad marketing strategy, and I stand by that. The command line is pretty intimidating if you aren’t already familiar with it: it has a much steeper learning curve than GUI tools, it lacks discoverability, and it gives you only a narrow view of what you’re doing, which at first can feel like peering down the wrong end of a telescope. There are also some tasks for which it is simply not suited. Furthermore, its support on Windows is abysmal.

But these sentiments of mine are only about marketing, and do not extend to day-to-day use. Over the years, I’ve actually come to appreciate the benefits of the command line. This is probably because I did a lot of work with Linux at my last job, so I had to get to grips with it. (One will recall that I was thrown in right at the deep end by being told to install Gentoo on a spare server.) And in fact, once you’re familiar with it, it offers some significant advantages for many tasks. These are they.

1. Ergonomics and speed.

The command line is keyboard-centric. GUI tools, on the other hand, are mouse-centric.

Mouse-centric environments have a nasty habit of forcing you to reach to and fro between your keyboard and mouse the whole time, which not only slows you down considerably, but can also cause repetitive strain injury. The fact that your keyboard puts a numeric keypad in the way doesn’t help, and with some otherwise ergonomic keyboards, such as the Microsoft Natural 4000, the stretch can be horrendous. That’s why many power users prefer command line instructions and keyboard-focused text editors such as emacs or vim: your hands stay in one place, letting you work a lot more quickly and fluently.

Distraction is another factor. When you have to move your hand between the keyboard and the mouse all the time, it can be tempting to focus on mouse-centric tasks such as sorting out your e-mail inbox and surfing the web. On the other hand, when you’re spending most of your time on the keyboard, it’s much easier to stay focused on what you have to do without getting distracted.

2. Scriptability.

Most command-line haters think nothing of writing a long, complex howto document outlining the instructions needed to carry out a particular repetitive task, such as setting up a new project in source control, building your project, or adding a new component from a template. The problem is that when someone new comes onto the team, they have to work through these instructions step by step, and the more steps there are, the more likely they are to misunderstand one, get one wrong or out of order, and end up deleting all their files by mistake.

You could write a PowerShell script to automate it — provided, of course, that the tool concerned actually exposes enough functionality through PowerShell or the command line to let you do so — but most people don’t, because that would mean going to the trouble of learning a whole new language for something they don’t do very often. On the other hand, if you’re used to using the command line (or PowerShell) all the time in the first place, you already have a suitable language at your fingertips and don’t need to learn anything new. Scripting your repetitive tasks becomes the easy, logical and natural thing to do.
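For instance, the “new project in source control” task above might boil down to something like this hypothetical PowerShell sketch (the template path and commit message are made up):

param([string]$Name)

# Create the repository and seed it from a standard skeleton.
hg init $Name
Copy-Item -Path .\project-template\* -Destination $Name -Recurse

# Record the initial state.
Set-Location $Name
hg addremove
hg commit -m "Initial skeleton for $Name"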

3. Cut and paste.

Command line instructions are much easier than screenshots to copy and paste into an e-mail, or a howto guide, or a blog post, or an instant messaging conversation. While this is inappropriate for introductory tutorials, it can save a lot of time when you need to Get Things Done.

For the past few days I’ve had to work from home because of the snow. At one point, I had to give a colleague some instructions on how to do one or two simple things with Mercurial. If all we had had was TortoiseHg, I would have had to set up a remote assistance session with him, which would have been overkill for what he needed to do. With the command line, however, I was able to type the necessary instructions straight into Microsoft Office Communicator, and it was all very quick and painless.

In conclusion.

The command line is not a golden hammer. There are some tasks for which it is not well suited, or which it makes considerably more difficult. But for a lot of tasks, it can provide significant benefits. Agility with the command line can still be a useful skill in a developer’s toolkit, and it is quite wrong to write it off completely.