james mckay dot net

because there are few things that are less logical than business logic
30
Sep

Solving the tangled working copy problem with hunk selection and Mercurial Queues

Programming is full of dilemmas.

You’ll be deep in concentration, working on your new application, adding some new payment options, when all of a sudden you notice a potential race condition in a nearby method that might cause customers to be billed twice. You know it’ll take all of two lines to fix, so you pop in the fix and carry on with your new functionality.

A few minutes later, you notice that another method is pulling in an RSS feed from a hard coded, and outdated, source, so you stop to extract it to a configuration setting and use the more up to date feed.

You finish fixing up your new functionality, then you come to check in your code. Now, you have a problem. You have three separate changes tangled up in your working copy.

Most developers would simply bundle all three changes into a single commit, possibly only leaving a commit summary (you do fill in your commit summaries, don’t you?) saying “Added some new payment options to application.” This is misleading, because it doesn’t say anything about the race condition or the RSS fix.

You could say “Added some new payment options to application, fixed a race condition and used a more up to date feed.” But this doesn’t make it all that clear which part of your commit fixes which problem. Someone looking through your history six months later might see your race condition fix has introduced a regression, not realise that it is there to fix a race condition, and revert it to what it was before.

You really need to observe the Single Responsibility Principle, and split the three tasks into separate commits.

So, what do you do?

With traditional source control tools, you are likely to be told, “You should have shelved your changes, reverted your working copy, and performed these tasks as separate commits. Or, if your source control tool doesn’t support shelving, you should save a patch, then revert your working copy, then make the new change, then re-apply the patch.”

There’s just one problem with this bit of advice. It is inefficient, and a total mismatch to the way your average programmer’s brain works.

To see why, let’s rewind your last half hour of coding and start again.

You’ll be deep in concentration, working on your new application, adding some new payment options, when all of a sudden you notice a potential race condition in a nearby method that might cause customers to be billed twice. You know it’ll take all of two lines to fix, but you need to keep these changes separate.

So you shelve your changes, revert your working copy, getting prompted to save/reload/merge your files in the process, and then Visual Studio insists on reloading your entire solution because you had changed something in the .sln file. And since your solution contains more than three projects and they reference more than two assemblies that aren’t in the GAC, it takes forever to reload and you’ve got distracted onto something else while you’re waiting.

By the time you manage to start editing your project again, you’ve been completely knocked out of the zone, and you’ve forgotten why you shelved your changes in the first place.

You see? All the so-called best practice advice about shelving, reverting your working copy, and all that, overlooks one very important fact about programming, namely that it is a mentally intensive discipline that often requires you to juggle several complex details in your mind at once, and even small diversions, such as having to save files and wade through menus to find your shelving tool then think of a name for your shelve set, can have a detrimental effect on your workflow. It adds to the mental burden on you and makes your job more difficult. It’s not a best practice at all, but a workaround for the fact that you don’t have the right tools for the job.

Wouldn’t it be better to just to get the changes down as you notice them and then use a tool that lets you sort out your commits later, going through all the changes you’re checking in, cherry-picking them into a series of separate patches?

Git users wax lyrical about the index, or staging area, because it is designed to solve just this problem. It provides an intermediate store between your working copy and your history, where you can stage your changes, not just one file at a time, but one hunk at a time, using the command git add -p. Once you’ve staged your changes in this way, you can then commit them as a separate, logical change set.

Mercurial has a similar feature in TortoiseHg called “hunk selection.” By double-clicking on a change in the “Hunk selection” tab on the commit dialog, you can include or exclude it from the check-in. If you’re a command line freak, the record extension does something similar, and the crecord extension allows you to take it down to the line-by-line level.

image

You can click on “Commit preview” once you’re done to see what’s going to go in your commit.

There’s just one problem with all this though. As Eric Sink points out, you’re checking in a version of your code which you’ve never tested. This is a bad practice, and it can bite you if you ever need to run git/hg bisect to track down a regression.

So let’s sum up what your options are so far.

  • Check in everything in a single commit at once. This is bad practice.
  • Use git add -p or hg record/TortoiseHg’s hunk selection to separate out your changes into separate commits. This is also bad practice.
  • Use shelving and patches to separate out your changes. This is a hack, which slows you down and risks knocking you out of the zone and making you lose track of your changes altogether.

So is there anything we can do to fix this? As a matter of fact, it turns out that there is.

One of my favourite features of Mercurial is the mq (Mercurial Queues) extension. This may sound a little esoteric, but what it does is quite simple. You can put a whole series of commits into a separate staging area, where you can edit them, reorder them, apply them, unapply them, chop and change them, split them up or combine them together, and of course, most importantly, run your unit tests on them, to your heart’s content before applying them to your master repository.

Let’s just say I am working on some changes to my Comment Timeout WordPress plugin. I’ve done two different things: updated the version number to 2.1.2, and tidied up some code formatting. I want to separate these into two different commits. First of all, I select the hunks that I want to go into the first commit, and then I type a name for the patch into the “QNew” box (keep this short, a couple of words should do):

image

You’ll note that the “Commit” button changes to “QNew” to indicate that your next commit goes into the patch queue. Clicking this will automatically show you the patch queue and change the button to “QRefresh”:

image

You can change the message, or edit the files, or select and unselect hunks to your heart’s content, then click QRefresh. Then you can add a second commit by typing another name into the QNew box:

image

Clicking the “QNew” button creates a second patch:

image

Okay, so now we have a whole series of patches. It’s a bit like the Git index, except that rather than having just one staging area, you have several, all stacked one top of each other. In the Repository Explorer, these revisions appear as a regular part of your DAG:

image

The yellow label “qparent” indicates the parent revision on top of which the patch queue is being applied; “qbase” indicates the first patch in the queue; “qtip” indicates the last; and the blue labels give the names of the patches. You could push them to another repository if you wanted, but I don’t recommend this. Keep them on your own machine for the time being.

Now that we’ve separated out our commits into a series of patches, we can get on with the job of placating the people who are worried about best practices. Namely: testing each patch before applying it.

First, double click on “[qparent]”:

image

You’ll note that our two patches have both dropped below the line, and they’re now greyed out. If you take a look at the repository explorer, you’ll see that there’s no sign of them:

image

The last revision has been marked in bold to indicate that that’s the one where your working copy is at.

If you double click on “tidy up” it will move above the line and turn blue again, to indicate that your working copy has been updated to this version:

image

That patch is now where your working copy is at. Do whatever testing you want to do on it, then click on the next one to apply it:

image

Once you’re satisfied that all your patches are ready, right-click on any of them and choose “Finish applied”:

image

Hey presto! Your work is now all committed to your repository, ready to be pushed, pulled or otherwise shared with the big wide world.

image

There are other things you can do with patches in your queue which I haven’t covered here, such as reordering them, or combining two or more of them into one.

Patch queues and hunk selection are two extremely powerful features of Mercurial. While they require a little bit of care and attention in order to adhere to best practices, this is no more arduous than the discipline needed for any source control tool, and they can provide a significant productivity boost, simply because they let your tools work around you rather than forcing you to work around your tools.

21
Jun

TortoiseHg as a github client on Windows

(Update: I’ve updated these instructions for Mercurial 1.6/TortoiseHg 1.1.)

I’m going to get really controversial here and say that I think Mercurial is better than git. My reasoning (as with the reasoning of everyone else who takes sides in this particular debate) is entirely subjective, so we won’t belabour the point here too much. Nevertheless, some of us do have a preference for one over the other, and many Subversion refugees like me who do most of their work in Windows tend to lean towards Mercurial.

But there’s no denying that github is fast becoming the Facebook of open source programming (albeit hopefully without the unethical bits, Farmville, and people tagging you in embarrassing photos for all and sundry to see), and if you want to strut your stuff as a developer, that’s the place to do it. Github is, of course, a hosting facility for git repositories, as one would expect of a site whose name says what it means and means what it says.

Fortunately, it is quite possible to use Mercurial as a client against github repositories via the hg-git extension, and you can pull and push from one to the other pretty much losslessly.

However, setting it all up on Windows is not entirely straightforward, and there doesn’t seem to be a decent guide to it anywhere on the Internet: most of the instructions that you read assume that you’re using either (a) Linux or a Mac, (b) the command line, or (c) both. You also have to figure it out from various places all over the web, and searches on Google and Stack Overflow proved to be surprisingly fruitless. Furthermore, the most comprehensive howto that I came across elsewhere contained several instructions that were just plain wrong.

So, after spending two solid evenings struggling against a myriad of error messages and cryptic dialog boxes, I finally managed to get it working, and for future reference (and anyone else who wants to know how), I’ve documented what I’ve found actually works for me as best I can.

1. Install TortoiseHg and hg-git.

Install TortoiseHg 1.1 or later. If you are using an earlier version, upgrade: these instructions may work if you don’t, but I can’t make any guarantees.

I downloaded hg-git by cloning the repository. You can get it from either github and Bitbucket. The advantage of cloning the repository is that you can upgrade to the latest version quickly and easily by hg pull then hg update, or use the graphical tools if you prefer. You can also easily switch between the bleeding edge version of the code and a stable release if you like.

hg clone http://bitbucket.org/durin42/hg-git c:\abc\mercurial\hg-git

I downloaded hg-git into the directory c:\abc\mercurial\hg-git. If you put it elsewhere in your filespace, alter these instructions to suit.

2. Update to the appropriate version of hg-git.

If you are using TortoiseHg 1.1, you will need to use hg-git 0.2.3. If you ignored my advice to upgrade, and are still using version 1.0, you will need to use hg-git 0.2.1. Don’t use version 0.2.2: it doesn’t work with either version of TortoiseHg.

The official hg-git documentation tells us that we also need to download and install Dulwich 0.4.0 or later. The latest version of hg-git requires Dulwich 0.6.0. In any case, Dulwich is included with TortoiseHg (version 0.6.0 with TortoiseHg 1.1; version 0.5.0 with TortoiseHg 1.0) so you don’t need to do anything else there. Open up the TortoiseHg repository explorer on your clone of hg-git, choose the “Tagged” radio button to show only tagged releases, and update to version 0.2.3:

image

3. Configure Mercurial to use hg-git and an appropriate SSH client.

To do this, you need to edit your mercurial.ini file. You can get to this simply by choosing “Global Settings” on the TortoiseHg context menu in Windows Explorer, and clicking “Edit file” to bring it up in Notepad. Add the following lines to your configuration file:

[extensions]
hggit = C:\abc\mercurial\hg-git\hggit

[ui]
ssh = "C:\Program Files\TortoiseHg\TortoisePlink.exe"

The [extensions] section loads hg-git into Mercurial; the ssh option in the [ui] section specifies an SSH command line client to use to communicate with github. TortoiseHg gives us TortoisePlink, which works fine for me.

4. Create a public key/private key pair.

There are some instructions on github on how to create a public key/private key pair. Unfortunately, these don’t tell you that key pairs come in two formats: OpenSSH (as used by git itself and github), and PuTTY (as used by Tortoise Everything).

A simpler approach is to download PuTTY (you can get it from here) and use PuTTYgen to generate your key pair:

PuTTYgen screenshot

Once you have generated your SSH key, copy and paste the “Public key for pasting into OpenSSH authorized_keys file” into github. Save your public key and private key to your hard disk somewhere.

5. Start Pageant

Pageant is a program that stores all your private keys in memory, where the SSH client used by Mercurial, that we configured above, can find them. It comes with both PuTTY and TortoiseHg. You can set it to load in your private key(s) when you log on to Windows by creating a new shortcut in the Startup folder of your Start menu with this command:

"C:\Program Files\TortoiseHg\Pageant.exe" "c:\abc\github.ppk"

Note that if you don’t start Pageant first and load in your private key, you will not be able to push to github.

6. Clone a repository and start pushing!

You should make sure that you get the format of your repository URL correct. It should be:

git+ssh://git@github.com/your-github-username/your-repo-name.git

The rest from there on is all plain sailing. All being well, you should now be able to pull from your github repository and push changes back up as if it were a Mercurial repository.

Things to check if it goes wrong.

Now all this is a bit of a fiddly process, there is plenty of room for error, and some of the error messages you are likely to get can be a little bit cryptic. However, most of it was due to me trying things that weren’t properly documented, and they all boiled down to a few things that you can check if you’ve followed the above instructions:

  • Are you using the correct version of hg-git? While you can use versions later than 0.2.1, you need to use a later version of Dulwich than that which comes with TortoiseHg 1.0.
  • The “ssh” option in your mercurial.ini file should only specify the name of the executable, without command line options. Some articles tell you that you can fill in the path to your private key in this option. Personally, I couldn’t get this to work, so I just stuck with Pageant.
  • Is Pageant running?
  • Is your private key loaded into Pageant?
  • Do your public and private keys match?
  • Is your private key saved in PuTTY format? If you generated your key pair using git, as per the instructions on github, it will be saved in OpenSSH format instead, and Pageant can’t handle that.1
  • Have you specified the URL to your github repository correctly? The version I gave above works, while missing out various parts of the URL (e.g. using “github.com” instead of “git@github.com“) doesn’t.
1 You can tell the difference between a PuTTY private key and an OpenSSH private key by opening them in Notepad. An OpenSSH private key will start off looking like this:

-----BEGIN RSA PRIVATE KEY-----
<transmission line noise>
-----END RSA PRIVATE KEY-----

whereas a PuTTY private key will look like this:

PuTTY-User-Key-File-2: ssh-rsa
Encryption: none
Comment: imported-openssh-key
Public-Lines: 6
<transmission line noise>
Private-Lines: 14
<transmission line noise>
Private-MAC:
<transmission line noise>
08
Mar

Command line instructions are not a good marketing strategy

Dear fellow Mercurial fans,

Please stop using the command line when you’re writing articles telling us how wonderful Mercurial is.

I don’t need to be convinced that it is superior to Subversion. I’ve been using it for about nine months alongside our central Subversion repository at work, as well as for my private projects at home, and there’s no doubt in my mind which is better by a long shot. Easy branching and merging, and local versioning for experimental development and refactoring, are killer features as far as I’m concerned. And ease of use is supposed to be its big selling point over git.

But other developers do need convincing, and if you’re apparently fanboying the command line, it doesn’t help. In fact, it’s downright embarrassing. Remember, you may be a Linux geek who writes code for fun at weekends, but most of them are nine to five Windows developers who switch out of code mode the minute they leave the office and don’t want to have to learn anything new unless it’s strictly necessary. To them, it looks elitist, arrogant, off-putting, and Luddite.

When I first heard about Mercurial and git about two years ago, neither of them had any form of graphical user interface to speak of. It was a case of hg this, hg that, git this, git that in a command shell versus TortoiseSVN’s repo-browser, show log and commit dialogs. You know, like, where you can actually see what you’re doing? Where you can frequently figure out what you need to do by experimentation and educated guesses rather than having to wade through a morass of man pages? Forget it, I thought. Come back to me in a year or two’s time when you have a decent graphical front end for it. In the meantime, I’m sticking with TortoiseSVN.

Heck, I’m the kind of developer who likes to try out new things. I like Linq, and MVC, and jQuery, and Python, and IOC containers, and Colemak keyboards. I know Linux and I’m not afraid to use it. If I was put off by the impression that Mercurial was command-line only, what hope do you have of convincing the rank and file Windows developers who are scared of the command prompt?

Nowadays, of course, we have TortoiseHg, which gives it a decent, powerful and intuitive front end. In fact it was TortoiseHg that sold me on Mercurial in the first place, because it lets you see exactly what you’re doing when you’re branching and merging, as well as flattening out the learning curve dramatically. Just take a look at its repository explorer, for instance:

image

See? You even get a nice little graph showing you exactly where all your branches are. Context menus make it easy to figure out what to do next and actually do it. Oh, and it shows you the most recent changes first, rather than just vomiting everything out onto the screen and leaving you staring at changeset zero, like you get when you run hg log:

image

To a seasoned developer, there are advantages to the command prompt. It’s easier to type into your blog, easier to copy and paste, and easier to script. But there is a time and a place for everything, and introductory tutorials for tools with perfectly good graphical front ends are not the time and place for a command prompt. Doing a screen capture, firing up Paint.net and cropping your image to the right size may be more of a faff, but in an introductory tutorial, merely typing hg push instead is either outright elitism or sheer laziness. Please, cut it out. Use TortoiseHg to introduce Mercurial, and keep the command line for more advanced tasks.

01
Jun

Why would anyone not use source control?

There’s a question over on Stack Overflow that asks if there are any good reasons for not using source control. It’s a question I’ve been racking my brains over for a while now, especially since you do occasionally encounter people who claim they have good reasons not to. The most common such reason that I come across is that they’re a lone developer — an excuse that simply shows that they haven’t a clue what source control actually is.

One person pointed out that physicists are particularly unlikely to use source control:

For the casual programmers – those to whom programming is just a tool, such as many of the people I work with (scientists) – much of the work is hackish and small scale, there may be a dozen other things that are more likely to fail outside the code which could also be eliminated with better practices.

As a colleague put it, “we don’t get published for writing beautiful code”.

Interesting point that. Most programs written by physicists tend to be no more than a few hundred lines long, or even just a Microsoft Excel spreadsheet, and once they’re debugged and working, they usually don’t change. This is of course the exact opposite of business and web programming, where requirements change faster than you can keep up with them. However, you can’t really generalise here. I’d be very surprised, for instance, if NASA doesn’t use some from of source control for the Mars rovers.

Another person gave an answer that was especially worth commenting on:

“For the first 10 years of kernel maintenance, we literally used tarballs and patches, which is a much superior source control management system than CVS is” –Torvalds

If you’ve got quick/easy/automatic backups, you’ve already got 95% of what most of us use VC for. Somebody with a local DVCS repository on his HD but no backups is actually in much worse shape.

Using a VCS does have a real cost, and it’s usually a small one but not always. Every VCS I’ve ever used, I’ve had days where I had to fight with it for hours just to get it to do something that should have been simple.

To those that think “There are no good reasons not to use version control”, where does it end? Must every project have 100% unit test code coverage? Must every project have code reviews? Coding standards? A complete functional spec?

There’s a whole spectrum of programming projects in the world. Not everybody is writing code for the space shuttle. Sometimes being able to diff my code from 11:00am and 11:30am is simply not that important.

Some are merely managing globally-distributed teams of thousands writing operating system kernels.

This is another interesting point — if the Linux kernel managed fine without source control for ten years, why should we use it? In actual fact, the commenter is not entirely correct: the Linux kernel has been under source control since 2002 and Linus Torvalds even wrote his own source control system because he was dissatisfied with all the others that were available at the time. But this is an indictment of CVS in particular, not of source control in general — at the time the choice that you had was between that and something costing an arm and a leg.

This highlights another fairly common reason why people shy away from source control: they perceive it as being more trouble than it’s worth. In recent years, most developers’ first experience of source control has been Subversion. Once you get used to it, Subversion is pretty powerful and works very well, but unfortunately it is not a good example to throw at beginners when telling them they need to use source control. Getting your project under source control in the first place with it is a faff, and I’ve lost count of the number of times that it’s gotten so confused with itself that I’ve had to do a fresh checkout just to get it working properly again. And all those extraneous .svn directories that pollute your project’s filespace can be a major irritation at times.

So what is the best option to convince the naysayers? In a word: Mercurial.

Recently I’ve been playing with some of the new distributed source control systems such as Git and Mercurial, and I get the impression that they are much better suited to new and casual developers than Subversion. They’re a lot easier to use for starters — in combination with visual front ends such as TortoiseHg, you can get your entire project under source control with only three or four mouse clicks. They also have fewer pitfalls and gotchas — you can rename and delete files and directories much more easily without creating a whole lot of confusion, for instance.

Another big advantage of modern distributed source control systems such as Mercurial is that they scale down as well as up. Mercurial creates a single .hg directory in your project’s root which acts as a complete repository in and of itself. For a lone developer this is probably all you need, in tandem with a decent backup strategy, and it even makes it entirely reasonable to get your throwaway scripts under source control. After all, throwaway scripts have a rather nasty habit of not being as throwaway as we first thought they would be.

For development teams, you can have a central repository in addition to the developers’ personal ones, and push the changes to the central server once you’re done. For really big projects, you can have a whole hierarchy of source control servers, with changes being pushed up to the next level once they have passed quality control and whatever other processes you may have in place.

There may have been reasonable excuses for not using source control five years ago on small, trivial projects. But with the latest generation of tools, these excuses are getting flimsier and flimsier every day. Even for physicists.

17
Mar

Making the most of your source control summaries

The summary for Django’s changeset number 9756 caught my attention recently:

Fixed #8138 — Changed django.test.TestCase to rollback tests (when the database supports it) instead of flushing and reloading the database. This can substantially reduce the time it takes to run large test suites.

This change may be slightly backwards incompatible, if existing tests need to test transactional behavior, or if they rely on invalid assumptions or a specific test case ordering. For the first case, django.test.TransactionTestCase should be used. TransactionTestCase is also a quick fix to get around test case errors revealed by the new rollback approach, but a better long-term fix is to correct the test case. See the testing doc for full details.

Many thanks to:

* Marc Remolt for the initial proposal and implementation.
* Luke Plant for initial testing and improving the implementation.
* Ramiro Morales for feedback and help with tracking down a mysterious PostgreSQL issue.
* Eric Holscher for feedback regarding the effect of the change on the Ellington testsuite.
* Russell Keith-Magee for guidance and feedback from beginning to end.

Before I came across this particular essay (via Eric Holscher and Simon Willison), like most developers, I only ever put a short one-sentence summary in my source control commits. It had never occurred to me to do anything else — in fact when Trac lists the changes in your repository, it only displays the first line anyway. Some developers don’t even do that, instead leaving the summary blank altogether.

But hey, I thought — why not? Why shouldn’t we write a whole two paragraphs to add a bit more detail to our descriptions of our source control commits? After all, something short and simple may be OK when you’re the only developer working on a project, but when you’re working as part of a team, you need to keep the communication up.

You see, suppose that one day you are testing some code that you have been working on and you discover that it is barfing. So you scratch your head for a bit, step through it in the debugger, and so on, and you narrow it down to one particular line of code. You look at the revision logs and you find that I was the one who checked in the offending code. You’ll probably be upset with me, but if I have given a decent explanation of what I’ve done when I check in my changes to source control, it will make the situation a lot easier for all concerned.

So don’t just gloss over your source control summaries: make the most of them. You’ll be doing all your colleagues a massive favour if you do.

14
Apr

If you think you don't need source control, you haven't understood it

I have a friend who does not use source control for his programming projects. As far as I can tell, it’s a conscious and deliberate decision on his part, and although he has his reasons, I’ve never got round to asking him what they are. However, I doubt if they are very good ones.

Source control is the very bedrock of modern software development, yet it’s surprising how many developers there are like him, who still don’t see the value of it. One common argument that crops up from time to time is “I don’t use source control on any of my projects because I’m the only person working on them.” This is really a rather lame excuse, because no-one making such an argument would say “I don’t back up my projects because I’m the only person working on them,” nor would they say “I don’t use the undo button because I’m the only person working on my projects.”

For that is exactly what source control does. It provides you with a complete history of the changes you’ve made to your project, when you made them, and (assuming you’ve filled in the comments with each commit), why. If you have ever spent three days working on your code, only to find that your changes aren’t working out or are becoming very messy, and you’ve needed to roll back, only you don’t have a suitable snapshot to roll back to, you will know exactly what I mean.

Sadly, there is a fairly widespread misconcepton knocking around that source control is only useful for large development teams. This isn’t helped by the fact that companies such as Microsoft promote it as such. Visual Studio only includes source control with the (more expensive) Visual Studio Team System, whose very name says to solo developers, “Nothing to see here, move along please.” And many articles on source control (even including Joel’s comments on the subject) concentrate more on the team aspects of source control than on what it can offer for solo developers.

Another widespread misconception is that setting up a source control system is too much effort, or too expensive, or requires a separate server, and is overkill for small projects. Again, this is completely false. Subversion, probably the most popular source control package in the world, is free and open source. TortoiseSVN is a Windows client for it that installs as a shell extension and gives you a whole lot of easy to use source control features from within Windows Explorer. You can even create a repository in any empty folder on your hard disk with only a couple of mouse clicks:

image

I’ve been using source control with Subversion/TortoiseSVN for three years now, and I am embarrassed that I didn’t get started much earlier. To be sure, there is a bit of a learning curve, and there are a few gotchas that you need to watch out for, but really, it isn’t rocket science by a long shot, and it certainly isn’t overkill even for fairly minor projects and scripts. If you are writing any kind of software and aren’t already using source control, I strongly recommend you get started. And if you think you don’t need it, I recommend you take another look at it, because you’ve almost certainly misunderstood something.