james mckay dot net

because there are few things that are less logical than business logic

February 2011

22 Feb

On named branches in Mercurial

There seems to be a common misconception among some Git users that in order to branch your code in Mercurial, you have to clone your repository. While some Mercurial users prefer to work that way, it isn’t actually necessary, and Mercurial does provide you with a much more lightweight alternative. The easiest way to branch your code is simply to hg update to the revision you wish to branch from; the next time you hg commit, Mercurial will implicitly create a new branch for you. Similarly, when you hg merge, it will implicitly close the branch off. I tend to use a mixture of the two approaches, with repository clones for longer-running feature branches, and in-place branching for ad-hoc experimentation, smaller features, and the like.
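To make the in-place workflow a little more concrete, here is a minimal Python sketch of a toy commit graph. It is not Mercurial’s implementation or API: the Repo class and its commit, update, merge and heads methods are hypothetical names invented for this illustration. All it shows is the idea described above: committing on top of a revision that already has a child leaves you with two heads, and merging brings you back to one.

```python
class Repo:
    """Toy commit graph: maps each revision number to its parent revisions."""

    def __init__(self):
        self.parents = {}    # revision -> tuple of parent revisions
        self.working = None  # revision the working copy is based on

    def commit(self, second_parent=None):
        """Record a new revision on top of the working parent and move to it."""
        rev = len(self.parents)
        ps = tuple(p for p in (self.working, second_parent) if p is not None)
        self.parents[rev] = ps
        self.working = rev
        return rev

    def update(self, rev):
        """Base the working copy on an existing revision (loosely, `hg update`)."""
        self.working = rev

    def merge(self, other):
        """Commit a revision with two parents (loosely, `hg merge` then commit)."""
        return self.commit(second_parent=other)

    def heads(self):
        """Revisions that have no children."""
        with_children = {p for ps in self.parents.values() for p in ps}
        return sorted(r for r in self.parents if r not in with_children)


repo = Repo()
repo.commit()        # revision 0
repo.commit()        # revision 1
repo.commit()        # revision 2
print(repo.heads())  # [2]

repo.update(1)       # go back to the revision to branch from
repo.commit()        # revision 3, a second child of revision 1
print(repo.heads())  # [2, 3]

repo.merge(2)        # revision 4, with parents 3 and 2
print(repo.heads())  # [4]
```

Nothing in the sketch has a name: the second line of development exists purely because revision 1 ended up with two children.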

A lot of confusion seems to centre round the concept of named branches though. If you’re used to the way Git works, you’d be forgiven for thinking that pulling from a remote repository would replace your “foo” branch with the incoming one, sending your work off to be garbage collected unless you merge immediately after pulling. Mercurial doesn’t actually work that way — what you get is two parallel branches, both called “foo”, which you can then merge, rebase or strip out as appropriate. This is because Mercurial tends to view the DAG as more immutable than Git does, and if you want to remove branches that are no longer needed, you do it explicitly using hg strip (a part of the Mercurial Queues extension).
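The difference can be sketched with a toy model in Python. This is an illustration only, not how Mercurial stores or exchanges changesets: the changeset IDs, the pull and branch_heads helpers, and the dictionary layout are all assumptions made for the example. Each repository is a mapping from a changeset ID to its parents and its branch name, and pulling simply adds the changesets you don’t already have.

```python
def pull(local, remote):
    """Return a repo holding every changeset from both sides; nothing is replaced."""
    merged = dict(local)
    for node, entry in remote.items():
        merged.setdefault(node, entry)
    return merged


def branch_heads(repo, branch):
    """Changesets on `branch` that have no child on the same branch."""
    on_branch = {node for node, (_, name) in repo.items() if name == branch}
    has_child = {parent for node in on_branch for parent in repo[node][0]}
    return sorted(on_branch - has_child)


# Hypothetical history: both sides started from "a1" on the branch "foo".
shared = {"a1": ((), "foo")}
local = {**shared, "b2": (("a1",), "foo")}   # your work
remote = {**shared, "c3": (("a1",), "foo")}  # somebody else's work

after_pull = pull(local, remote)
print(branch_heads(after_pull, "foo"))  # ['b2', 'c3']
```

Both b2 and c3 survive the pull as heads of “foo”, and it is then up to you to merge, rebase or strip one of them.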

For what it’s worth, I don’t like the way Mercurial uses the word “branch” here, since it doesn’t accurately reflect what you expect the word “branch” to mean: a single code line where every node in the DAG has exactly one parent and exactly one child. It seems to me that it’s something of a leftover from centralised, line-based tools such as Subversion and Perforce, where every branch has to have a name because of the need to place it somewhere in the file system.

But I don’t find it a big deal. I find the best way to handle branching and merging in Mercurial is to view your branches as essentially anonymous. Branch names, tags and bookmarks then become purely a documentation layer added on top of the DAG. I personally view branch names in particular as largely vestigial and almost never use them: I always commit exclusively to default, and generally recommend that others do the same unless they have a valid use case for named branches. If you need to keep track of which head is which, the bookmarks extension provides similar functionality to Git branches, and is far less confusing.

Incidentally, one DVCS that does seem to require you to clone your repository in order to create a new branch is Bazaar. I’ve spent a few hours tinkering with Bazaar on and off over the past few months and I haven’t yet been able to find a way to branch in-place similar to hg update/edit/hg commit or git branch. Perhaps someone could enlighten me?

16 Feb

Google is not your doctor

Now if you start getting symptoms that make your heart miss a beat, it’s tempting in these days of instant information to turn to Google. Or Wikipedia.

Don’t do it!

I’ve learned this the hard way over the past couple of months. Some of the symptoms I’ve been getting during and since my recent particularly nasty bout of flu have made me wonder whether there was something serious going on.

So I went to Google, typed in my symptoms, only to be led to various articles that told me I could drop dead at any moment. I turned to Wikipedia, and the information it presented from Reliable Sources told me exactly the same.

Then yesterday I saw my doctor and he told me that this is unlikely. In fact, he was quite firm that he doesn’t think I have any significant, immediate, life-threatening concerns. He wants to do some tests, but he put me at ease about everything. He’s good at that.

If you’re worried about your health, avoid Google. Avoid Wikipedia. Go to your doctor.

Your doctor knows your medical history. He knows what you actually have. Google doesn’t. Google only knows what you think you have, and even then, it conflates that with things that your loved ones have, things you think your loved ones have, things you think your favourite celebrities have, and things that the characters in the novel that you’re trying to write think they have. Google will lead you onto discussion forums frequented by (a) people who actually do have significant, immediate, life-threatening concerns, (b) people who haven’t a clue what they’re talking about, (c) conspiracy theorists, and (d) spammers.

Google will present you with scare stories about misdiagnoses that make you mistrust your doctor. While these are tragic, they are very much the exception rather than the rule, and they only make the headlines because they are unusual. Google will lead you to articles presenting worst-case scenarios alongside everyday ailments that can be treated with over-the-counter remedies. Google will present all this to you in a blunt, deadpan, just-the-facts-ma’am manner. You’ll end up diagnosing yourself with mutually contradictory disorders. And then, to add insult to injury, for the next three weeks, every website you visit will carry adverts for quack remedies for syndromes that kids you don’t have don’t have.

Your doctor, on the other hand, will filter out all the irrelevant stuff for you. He won’t tell you about incurable life-threatening diseases when all you have is Team Foundation Syndrome, which can be easily treated by switching to Mercurial. And because you are in a face-to-face interaction with someone who is trained to help worried people, and you have the non-verbal communication element, even if the news is bad or esoteric, it will be easier to handle.

If you want an online resource that can give you advice on what to do, go to a site such as the NHS symptom checker. It will ask you a series of questions about your symptoms, and based on your responses, it will tell you whether to call an ambulance, go to the chemist for an over-the-counter remedy, or see your doctor.

14 Feb

Team Foundation Server is the Lotus Notes of version control tools

tl;dr: Advocates of Team Foundation Server, Microsoft’s ALM suite, respond to criticism by saying that TFS is not just source control but an end-to-end integrated ALM suite. This completely misses the point of our criticism of TFS in the first place: that we find it restrictive, bureaucratic, unreliable, and extremely difficult to use. End-to-end integration does not justify unreliability or a poor user experience.

About a year ago, Martin Fowler conducted a survey of ThoughtWorks developers to find out what they thought about various source control tools. Not surprisingly, Git came out top. The one that came out bottom? Team Foundation Server. In fact, TFS was unique in getting no positive responses at all: out of 54 respondents who had used it, every single one of them rated it as either “problematic” or “dangerous.”

Team Foundation Server advocates claim it’s unfair to compare TFS to other source control tools, since it’s not just source control, but an integrated end-to-end application lifecycle management solution. Comparing TFS to, say, Subversion, is like comparing Microsoft Office to Notepad, so they say.

Now where have I heard something like that before? Oh yes, Lotus Notes:

The main focus for frustration is Notes’s odd way with email, and its unintuitive interface. But to complain about that is to miss the point, says Ben Rose, founder and leader of the UK Notes User Group (www.lnug.org.uk). He’s a Notes administrator, for “a large automotive group”.

“It’s regarded by many as an email program, but it’s actually groupware,” Rose explains. “It does do email, and calendaring, but can host discussion forums, and the collaboration can extend to long-distance reporting. It will integrate at the back end with huge systems. It’s extremely powerful.”

The thing is, it wasn’t the detractors who were missing the point. It was the Lotus Notes guys. You see, e-mail is right at the heart of any groupware application. It’s the part of the application that users interact with the most. It’s where usability matters the most. And it’s what Notes got wrong the most.

It’s exactly the same with ALM tools. Source control is the part of your ALM tool that is most visible to developers. It’s source control rather than, say, work item tracking or continuous integration, that can make or break your workflow. It is source control where a zero-friction experience is most important.

Team Foundation Server is not zero-friction. Not by a long shot.

I guess if you have only ever used TFS, Visual SourceSafe, and perhaps exclusively trunk-based development in merge-paranoid Subversion teams that use if statements and configuration settings to avoid branching, you would be happy enough with it, since that’s all that you know source control to be capable of. But once you’ve actually used one of the alternatives that offers you fluent, unrestricted branching and merging, a local sandbox, flexible workflows, self-consistent best practices, and source control as an extension of your undo button, the limitations of TFS become so massive that it’s not even funny any more. (Incidentally, if you tot up the figures in Fowler’s survey, you’ll find that his respondents had, on average, experience with six different tools.)

But even if you’ve never used a DVCS and are only comparing it to Subversion, it’s still a usability disaster. Subversion may have pitfalls and gotchas and limitations of its own, but once you know your way around it, you can at least work as fluently with it as is possible with a primarily trunk-based, centralised tool. In TFS, even the simplest tasks become Herculean undertakings. How do you back out a changeset that isn’t the latest, for instance? Why can’t I have a check-in screen that shows only the files that have actually changed since my last commit? Why does it take me half a dozen mouse clicks for each file in my check-in screen to find out that it doesn’t have any changes? Why is it asking me to check in files that don’t have any changes in the first place? Why does it turn Visual Studio into a Berlin Wall around my code with these awful read-only files? Why does it lobotomise the branching and merging experience with baseless merges, making feature branches — pretty much a must-have for a pain-free ALM experience these days — impractical for all but the largest tasks? Why can’t it cache my login credentials to a server on a different domain like Subversion does? Why does the command line interface bring up dialog boxes? Is it a command line interface or isn’t it? And that’s barely scratching the surface of its usability problems. It doesn’t even have a search tool to speak of.

Furthermore, the source control component is the one part of TFS that you can’t swap out for something else. You can use TFS source control with Trac, Mantis, FogBugz or Jira, or with TeamCity or FinalBuilder, but you can’t use TFS work items or TFS build servers with Subversion, or Git, or Mercurial. As far as TFS is concerned, source control is their way or the highway.

End-to-end integration is all very well, but it is hardly a killer feature, and when the most visible component that it integrates is difficult to use and gets in the way, it ceases to be an asset and it becomes a liability. It’s far better to have a selection of separate tools, each of which is designed to do its job well, than a single monolithic application that does everything badly.

07 Feb

How often should you check in code?

There’s a lot of confusion among developers about how often to check in code to source control. Many projects have histories riddled with huge commits making sweeping changes to dozens of files, often with only a vague commit summary or even no commit summary at all. Those projects that have guidelines and policies in place usually don’t have a clear justification for those policies, and some of them are downright unhelpful, such as, “at least once a day,” or “whenever you come to a natural break in your workflow, such as lunchtime.”

The problem is that if you’re all doing everything in a single branch, typically trunk, it is not possible to come up with a straight answer to the question.

Should you “check in early, check in often,” which Jeff Atwood once described as the golden rule of source control? This ensures that you never lose much code, you keep up to date with everyone else, and you don’t go dark. However, if you’re all working on different tasks on the same branch, you will end up with two sets of unrelated revisions tangled up together in your history. If one needs to go live, like, yesterday, and the other has had to be put on hold for any reason, as happened to us at the end of our last sprint, you’ll run into difficulties.

Alternatively, you could check in only completed units of work. However, this causes other problems. Deferring check-in until a unit of work is complete often results in huge, monolithic commits that increase the risk of integration conflicts. Furthermore, if you get into a mess attempting to resolve said integration conflicts, there is no way to back out to where you were before you ran svn update. I’ve had colleagues in this situation end up with no option but to roll back to the latest revision in source control, losing days of work that only existed in their working copy in the process.

Furthermore, large, monolithic commits are impossible to describe comprehensively and accurately in a commit summary, and they cause problems when carrying out a binary search of your history for the revision that introduced a bug.

Of course, you should be dividing your work up into smaller units as much as possible anyway to minimise the risk of this happening, but this isn’t always possible. Whichever of the two options you choose, you’re going to run into problems sooner or later.

Having a separate branch for each feature resolves this dilemma neatly. This sounds scary at first if you aren’t used to branching and merging, but provided your tooling supports it, it isn’t as bad as it sounds: feature branches are usually fairly short, so you don’t get as many Big Scary Merges as you would expect. Besides, even when you do get a Big Scary Merge, it’s better than an otherwise identical Big Scary Commit, because if your attempts to resolve the conflicts go wrong, you can at least roll back to what you had before you attempted the merge and try again.

With that in mind, we can come up with some more sensible guidelines on how often to commit to source control.

1. Every commit should serve one, and only one, purpose.

This is a straightforward corollary to the Single Responsibility Principle. If you have to use the word “and” or “also” in your commit summary, you’re probably checking in too much.
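As a rough illustration of that rule of thumb, a commit hook or review script could flag summaries that contain those words. The Python sketch below is a hypothetical helper, not part of any SCM: the function name and the word list are invented for this example, and a match is only a hint to look again, since a perfectly focused commit can still have “and” in its summary.

```python
import re

# Whole words only, so that "Handle" or "command" do not trigger a match.
_CONJUNCTIONS = re.compile(r"\b(and|also)\b", re.IGNORECASE)


def looks_like_multiple_changes(message: str) -> bool:
    """Return True if the summary (first line) contains 'and' or 'also'."""
    stripped = message.strip()
    summary = stripped.splitlines()[0] if stripped else ""
    return bool(_CONJUNCTIONS.search(summary))


print(looks_like_multiple_changes("Fix login redirect and update changelog"))  # True
print(looks_like_multiple_changes("Handle empty candidate list"))              # False
```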

If you have two unrelated changes in your working copy, you need to break them up. This is called the “tangled working copy problem,” and modern SCMs give you tools to sort it out. If you’re using Subversion or TFS, on the other hand, well, you should have been more careful. Unfortunately, a tangled working copy can be pretty hard, or in some cases even impossible, to avoid.

Needless to say, you should never check in code to two separate branches, let alone to two separate products, in a single commit, even if your source control allows you to do so.

2. Every commit should be small enough to be described in detail in the summary.

Your commit message won’t necessarily cover every last line of code in your change. If you’ve added a whole bunch of stuff, as long as it’s reasonably self-explanatory and isn’t riddled with meaningless method names such as doIt(), a single line commit message may suffice. But the combination of your code and your commit message should explain every line that has changed. And if you’ve removed or edited existing code, that will all need explaining in your commit summary too, particularly if it’s counterintuitive or at first sight could be mistaken for a bad practice, such as changing an encoding from UTF-8 to 7-bit ASCII.

If your commit is too large to make this practical, your commit is too large, period.

3. Every commit should build and (usually) pass all your unit tests.

Some DVCS users may disagree with me on this one, insisting that you can use your local history as a sandbox for your commits, so it doesn’t matter, but I stand by it. Broken builds have to be marked as untestable by your bisect tool, which complicates pinpointing the change that introduced the bug. A string of broken builds in succession makes matters worse. Besides, both Git and Mercurial provide mechanisms to allow you to resolve this situation by combining breaking changesets with ones that fix them — namely, interactive rebase (or git commit --amend) and Mercurial Queues respectively.
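To see why untestable revisions hurt, here is a simplified Python sketch of a bisection over a linear history. It is not the algorithm that git bisect or hg bisect actually implement: the bisect_first_bad function, the 'good', 'bad' and 'skip' verdict strings, the revision numbers and the linear history are all assumptions made for illustration. When the search lands on a revision that cannot be built, it has to try a neighbour instead, and if a skipped revision sits right next to the first bad one, the culprit can only be narrowed down to a range.

```python
def bisect_first_bad(test, good, bad):
    """Narrow down the first bad revision in a linear history.

    `test(rev)` returns 'good', 'bad' or 'skip' (cannot be built or tested).
    `good` and `bad` are revisions already known to be good and bad.
    Returns the list of revisions that could be the first bad one.
    """
    skipped = set()
    while True:
        candidates = [r for r in range(good + 1, bad) if r not in skipped]
        if not candidates:
            break
        middle = (good + bad) // 2
        # Test the untried revision closest to the midpoint.
        rev = min(candidates, key=lambda r: abs(r - middle))
        verdict = test(rev)
        if verdict == "good":
            good = rev
        elif verdict == "bad":
            bad = rev
        else:
            skipped.add(rev)
    return list(range(good + 1, bad + 1))


def all_buildable(rev):
    """Hypothetical history in which revision 5 introduced the bug."""
    return "bad" if rev >= 5 else "good"


def with_broken_build(rev):
    """Same history, except that revision 4 does not build."""
    return "skip" if rev == 4 else all_buildable(rev)


print(bisect_first_bad(all_buildable, 0, 7))      # [5]
print(bisect_first_bad(with_broken_build, 0, 7))  # [4, 5]
```

With every revision buildable, the search pins the bug on revision 5. With one broken build next to it, the best it can say is that the bug arrived in revision 4 or 5, and a string of broken builds widens that range further.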

The only exception to the rule that every commit should pass your unit tests is when you are working in a test-driven manner, where you write a failing test then write code to make it pass. Here, you may want to consider checking in the new test separately from the code to fulfil its requirements, in order to audit just how test-driven your development really is.

4. Use feature branches liberally, and merge to your main development branch only when the task is complete.

This guideline is a more sensible version of “check in only completed units of work.” Single-responsibility, easily describable commits are obviously fairly small and frequent (a few lines of code, representing less than an hour’s work), and usually do not represent a completed unit of work.

That’s why feature branches are so important if you are to observe best practices with source control. In this case, “check in only completed units of work” becomes “integrate only completed units of work,” and the conflict between the two different best practices is thereby resolved. When you merge, always say what you are merging, with an issue number in your bug tracker where appropriate. Don’t just write “Merge.”

In an ideal world, every feature should be developed on a separate branch. With a modern DVCS, this is of course the default, and very easy. With centralised source control, however, it can take considerably more effort depending on your tool and your project setup, but it is by no means impossible. In cases such as these, you may need to make some compromises, and decide on a threshold above which to create a feature branch. But in general, it’s best to keep this threshold as low as you can get away with, or possibly even lower it gradually as you and your team-mates become more confident with branching and merging. Certainly, if you’re doing exclusively trunk-based development, you’re denying yourself a straight answer to the question of how often to check in code, and asking for problems sooner or later. Whatever SCM tool you are using, if you don’t know how to branch and merge with it, you should learn how to do so.