james mckay dot net

because there are few things that are less logical than business logic
22
Feb

On named branches in Mercurial

There seems to be a common misconception among some Git users that in order to branch your code in Mercurial, you have to clone your repository. While some Mercurial users prefer to work that way, it isn’t actually necessary, and Mercurial does provide you with a much more lightweight alternative. The easiest way to branch your code is simply to hg update to the revision off which you wish to branch, then when you next hg commit, it will implicitly create a new branch for you. Similarly, when you hg merge, it will implicitly close the branch off. I tend to use a mixture of the two approaches, with repository clones for longer-running feature branches, and in-place branching for ad-hoc experimentation, smaller features, and the like.
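
To make the in-place workflow a bit more concrete, here is a toy model in Python of how heads appear and disappear as you update, commit and merge. It is only a sketch: `ToyRepo` and its methods are hypothetical names invented for this illustration and have nothing to do with Mercurial's real internals. Note also that `merge()` here records the merge in a single step, whereas in Mercurial you run hg merge and then hg commit.

```python
class ToyRepo:
    """A toy revision DAG: each revision number maps to its parents."""

    def __init__(self):
        self.parents = {0: []}   # revision 0 is the initial commit
        self.working = 0         # revision the working copy is based on

    def update(self, rev):
        """Move the working copy to an existing revision (like hg update)."""
        if rev not in self.parents:
            raise ValueError(f"unknown revision {rev}")
        self.working = rev

    def commit(self):
        """Add a revision whose parent is the working copy's revision."""
        new = len(self.parents)
        self.parents[new] = [self.working]
        self.working = new
        return new

    def merge(self, other):
        """Add a revision with two parents (merge and commit in one step)."""
        new = len(self.parents)
        self.parents[new] = [self.working, other]
        self.working = new
        return new

    def heads(self):
        """Revisions with no children, i.e. the tips of each branch."""
        has_child = {p for ps in self.parents.values() for p in ps}
        return sorted(r for r in self.parents if r not in has_child)


if __name__ == "__main__":
    repo = ToyRepo()
    repo.commit()            # revision 1
    repo.commit()            # revision 2
    repo.update(1)           # go back to the revision to branch from
    repo.commit()            # revision 3: a second head now exists
    print("heads after branching:", repo.heads())
    repo.merge(2)            # revision 4 joins the two heads
    print("heads after merging:", repo.heads())
```

The point of the model is that nothing special happens when you branch: committing on top of a revision that already has a child simply leaves you with two heads, and merging brings you back to one.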

A lot of confusion seems to centre round the concept of named branches though. If you’re used to the way Git works, you’d be forgiven for thinking that pulling from a remote repository would replace your “foo” branch with the incoming one, sending your work off to be garbage collected unless you merge immediately after pulling. Mercurial doesn’t actually work that way — what you get is two parallel branches, both called “foo”, which you can then merge, rebase or strip out as appropriate. This is because Mercurial tends to view the DAG as more immutable than Git does, and if you want to remove branches that are no longer needed, you do it explicitly using hg strip (a part of the Mercurial Queues extension).

For what it’s worth, I don’t like the way Mercurial uses the word “branch” here, since it doesn’t accurately reflect what you expect the word “branch” to mean: a single code line where every node in the DAG has exactly one parent and exactly one child. It seems to me that it’s something of a leftover from centralised, line-based tools such as Subversion and Perforce, where every branch has to have a name because of the need to place it somewhere in the file system.

But I don’t find it a big deal. I find the best way to handle branching and merging in Mercurial is to view your branches as essentially anonymous. Branch names, tags and bookmarks then become purely a documentation layer added on top of the DAG. I personally view branch names in particular as largely vestigial and almost never use them: I always commit exclusively to default, and generally recommend that others do the same unless they have a valid use case for named branches. If you need to keep track of which head is which, the bookmarks extension provides similar functionality to Git branches, and is far less confusing.

Incidentally, one DVCS that does seem to require you to clone your repository in order to create a new branch is Bazaar. I’ve spent a few hours tinkering with Bazaar on and off over the past few months and I haven’t yet been able to find a way to branch in-place similar to hg update/edit/hg commit or git branch. Perhaps someone could enlighten me?

21
Dec

Finding bugs with a binary search of your source control history

Mercurial’s bisect command is a fantastically useful tool when you’re faced with a bug.

It’s a very simple idea. You start off with your latest revision, which you know has the bug, go back to a revision that you know didn’t have the bug, and do a binary search until you find the revision that introduced it.

So let’s say your latest revision was number 500. You’d mark that one as bad, then test, say, revision 100, find that it works as expected, and mark that as your last known good revision. Mercurial will then automatically update to revision number 300 (halfway in between) for you to test. Mark as good or bad as appropriate, lather, rinse and repeat until you find the change that introduced the bug.

With every test that you make, the difference between the “good” and “bad” revisions decreases by a half, quickly narrowing the gap:

bisect1

Consequently you will be able to pinpoint the breaking change after approximately log₂ n tests, so a thousand revisions would only take one more test than 500, and a million would only take one more test than 500,000. Once you’ve found the offending change, you can very easily zoom right in on the problematic lines of code, rather than having to spend ages stepping through it all in the debugger.
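
The arithmetic is easy to check with a small simulation. The sketch below is a toy model in Python, not Mercurial's own bisect implementation: it assumes a strictly linear history in which every revision from `first_bad` onwards has the bug, and `count_bisect_tests` and `first_bad` are names made up for this illustration.

```python
def count_bisect_tests(good, bad, first_bad):
    """Count the tests a binary search needs to locate first_bad.

    Toy model of a linear history: `good` is a revision known to work,
    `bad` is a revision known to be broken, and every revision from
    `first_bad` onwards has the bug.
    """
    tests = 0
    while bad - good > 1:
        mid = (good + bad) // 2      # the revision to test next
        tests += 1
        if mid >= first_bad:         # bug present: mark as bad
            bad = mid
        else:                        # bug absent: mark as good
            good = mid
    return bad, tests


if __name__ == "__main__":
    for n in (500, 1000, 500_000, 1_000_000):
        # Worst case for this model: the bug arrived in the very last revision.
        found, tests = count_bisect_tests(0, n, first_bad=n)
        print(f"{n:>9} revisions: found r{found} after {tests} tests")
```

Running it with the example from earlier, `count_bisect_tests(100, 500, 300)` tests revision 300 first, just as described above, and then keeps halving the gap until the two markers are adjacent.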

You don’t need to be using Mercurial to apply this technique. You can do it manually with any version control tool, though you will need to keep a manual note of what’s what if it doesn’t provide you with the necessary tools to do it. It can also be pretty slow with centralised tools, since you have to hit the network for every test.

There are a couple of points to note with this procedure, however.

First, bisect is most effective when your revisions are small and serve a single purpose. If the breaking revision changes a lot of code, and tackles too many things at once, it may be difficult to identify the source of the problem once you have located the offending change. This is why it is important to “check in early, check in often.” This is also why good, informative commit summaries are important.

Second, remember that you’re looking for the revision that introduced a specific bug. If a revision does not have this specific bug but has other problems, you should mark it as good nonetheless.

Revisions that don’t compile, or have other problems that prevent you from determining whether the bug exists in the first place, should not be marked as either “good” or “bad” but should be flagged to be skipped. In this case, your “last known good” and “first known bad” revisions won’t be updated, and the number of tests you have to make will increase, slowing down your search. Consequently it is good practice to ensure that every commit that you make to source control builds correctly and, ideally, also passes all your unit tests. When you’re using a DVCS it can be tempting to disregard this altogether, but if hg bisect reports that your error is somewhere in a string of twenty successive revisions, none of which compiles, you’ll have more of a headache sorting out what’s what. Certainly, broken check-ins should be very much the exception rather than the rule.

04
Oct

Perforce Merge: a very nice free replacement for TortoiseMerge

No matter which source control tool you’re using, sooner or later you’ll encounter a merge conflict. When this happens, a decent graphical merge tool is a must-have.

There are two different types of merge tools. Two-way merge tools show you your version of the file and the other person’s version of the file side by side. Three-way merge tools also show you the original file in the middle. This helps clear up a lot of confusion, since you can see what the original file looked like before anyone did anything to it.

So far, I’ve been using TortoiseMerge as my merge tool of choice, since it comes with TortoiseSVN, it’s familiar, it’s reasonably usable, and it is not too ugly. The only downside is that it’s two-way, rather than three-way. TortoiseHg gives you kdiff3 by default instead, which is a three-way merge tool, but it’s an absolute eyesore and its usability leaves a lot to be desired. Up to now, I’ve always switched it out in favour of TortoiseMerge.

Recently I came across the Perforce merge tool P4Merge (hat tip: Novaleaf Game Studios) and I must say that I’m impressed. It gives a very clear, intuitive view of what’s changed, with a text editor underneath that lets you resolve the conflicts easily. The icons to the right-hand side of the text editor allow you to select which version you want to cherry-pick. Oh, and visually, it looks fantastic.

Perforce merge tool in action

P4Merge comes with the Perforce client tools which are a free download: if you’re not using Perforce itself for source control, select only the merge tool on the installation wizard and deselect everything else.

image

Once you’ve installed P4Merge, TortoiseHg will automatically detect it and list it as an option in the TortoiseHg configuration dialog or merge wizard. If you’re using Subversion or Git with their respective Tortoises, you need to specify the command line in the options dialog: Using a cool merge tool with SVN or GIT tells you how. Team Foundation Server is somewhat more complicated, but still doable: Using P4Merge with Visual Studio 2008 and TFS explains how to tackle it.

The only downside next to TortoiseMerge is that the option to cherry-pick changes only works on the block level, rather than on a line-by-line basis. However, since the resolution panel at the bottom is of course a free-form text editor, you can easily copy and paste as necessary, so this is no big deal. I think I’ll be using it as my merge tool of choice from now on.

30
Sep

Solving the tangled working copy problem with hunk selection and Mercurial Queues

Programming is full of dilemmas.

You’ll be deep in concentration, working on your new application, adding some new payment options, when all of a sudden you notice a potential race condition in a nearby method that might cause customers to be billed twice. You know it’ll take all of two lines to fix, so you pop in the fix and carry on with your new functionality.

A few minutes later, you notice that another method is pulling in an RSS feed from a hard-coded, and outdated, source, so you stop to extract it to a configuration setting and use the more up-to-date feed.

You finish fixing up your new functionality, then you come to check in your code. Now, you have a problem. You have three separate changes tangled up in your working copy.

Most developers would simply bundle all three changes into a single commit, possibly only leaving a commit summary (you do fill in your commit summaries, don’t you?) saying “Added some new payment options to application.” This is misleading, because it doesn’t say anything about the race condition or the RSS fix.

You could say “Added some new payment options to application, fixed a race condition and used a more up to date feed.” But this doesn’t make it all that clear which part of your commit fixes which problem. Someone looking through your history six months later might see your race condition fix has introduced a regression, not realise that it is there to fix a race condition, and revert it to what it was before.

You really need to observe the Single Responsibility Principle, and split the three tasks into separate commits.

So, what do you do?

With traditional source control tools, you are likely to be told, “You should have shelved your changes, reverted your working copy, and performed these tasks as separate commits. Or, if your source control tool doesn’t support shelving, you should save a patch, then revert your working copy, then make the new change, then re-apply the patch.”

There’s just one problem with this bit of advice. It is inefficient, and a total mismatch to the way your average programmer’s brain works.

To see why, let’s rewind your last half hour of coding and start again.

You’ll be deep in concentration, working on your new application, adding some new payment options, when all of a sudden you notice a potential race condition in a nearby method that might cause customers to be billed twice. You know it’ll take all of two lines to fix, but you need to keep these changes separate.

So you shelve your changes, revert your working copy, getting prompted to save/reload/merge your files in the process, and then Visual Studio insists on reloading your entire solution because you had changed something in the .sln file. And since your solution contains more than three projects and they reference more than two assemblies that aren’t in the GAC, it takes forever to reload and you’ve got distracted onto something else while you’re waiting.

By the time you manage to start editing your project again, you’ve been completely knocked out of the zone, and you’ve forgotten why you shelved your changes in the first place.

You see? All the so-called best practice advice about shelving, reverting your working copy, and all that, overlooks one very important fact about programming, namely that it is a mentally intensive discipline that often requires you to juggle several complex details in your mind at once, and even small diversions, such as having to save files and wade through menus to find your shelving tool then think of a name for your shelve set, can have a detrimental effect on your workflow. It adds to the mental burden on you and makes your job more difficult. It’s not a best practice at all, but a workaround for the fact that you don’t have the right tools for the job.

Wouldn’t it be better just to get the changes down as you notice them and then use a tool that lets you sort out your commits later, going through all the changes you’re checking in, cherry-picking them into a series of separate patches?

Git users wax lyrical about the index, or staging area, because it is designed to solve just this problem. It provides an intermediate store between your working copy and your history, where you can stage your changes, not just one file at a time, but one hunk at a time, using the command git add -p. Once you’ve staged your changes in this way, you can then commit them as a separate, logical change set.

Mercurial has a similar feature in TortoiseHg called “hunk selection.” By double-clicking on a change in the “Hunk selection” tab on the commit dialog, you can include or exclude it from the check-in. If you’re a command line freak, the record extension does something similar, and the crecord extension allows you to take it down to the line-by-line level.

image

You can click on “Commit preview” once you’re done to see what’s going to go in your commit.

There’s just one problem with all this though. As Eric Sink points out, you’re checking in a version of your code which you’ve never tested. This is a bad practice, and it can bite you if you ever need to run git/hg bisect to track down a regression.

So let’s sum up what your options are so far.

  • Check in everything in a single commit at once. This is bad practice.
  • Use git add -p or hg record/TortoiseHg’s hunk selection to separate out your changes into separate commits. This is also bad practice.
  • Use shelving and patches to separate out your changes. This is a hack, which slows you down and risks knocking you out of the zone and making you lose track of your changes altogether.

So is there anything we can do to fix this? As a matter of fact, it turns out that there is.

One of my favourite features of Mercurial is the mq (Mercurial Queues) extension. This may sound a little esoteric, but what it does is quite simple. You can put a whole series of commits into a separate staging area, where you can edit them, reorder them, apply them, unapply them, chop and change them, split them up or combine them together, and of course, most importantly, run your unit tests on them, to your heart’s content before applying them to your master repository.

Let’s just say I am working on some changes to my Comment Timeout WordPress plugin. I’ve done two different things: updated the version number to 2.1.2, and tidied up some code formatting. I want to separate these into two different commits. First of all, I select the hunks that I want to go into the first commit, and then I type a name for the patch into the “QNew” box (keep this short, a couple of words should do):

image

You’ll note that the “Commit” button changes to “QNew” to indicate that your next commit goes into the patch queue. Clicking this will automatically show you the patch queue and change the button to “QRefresh”:

image

You can change the message, or edit the files, or select and unselect hunks to your heart’s content, then click QRefresh. Then you can add a second commit by typing another name into the QNew box:

image

Clicking the “QNew” button creates a second patch:

image

Okay, so now we have a whole series of patches. It’s a bit like the Git index, except that rather than having just one staging area, you have several, all stacked one on top of another. In the Repository Explorer, these revisions appear as a regular part of your DAG:

image

The yellow label “qparent” indicates the parent revision on top of which the patch queue is being applied; “qbase” indicates the first patch in the queue; “qtip” indicates the last; and the blue labels give the names of the patches. You could push them to another repository if you wanted, but I don’t recommend this. Keep them on your own machine for the time being.

Now that we’ve separated out our commits into a series of patches, we can get on with the job of placating the people who are worried about best practices. Namely: testing each patch before applying it.

First, double click on “[qparent]”:

image

You’ll note that our two patches have both dropped below the line, and they’re now greyed out. If you take a look at the repository explorer, you’ll see that there’s no sign of them:

image

The last revision has been marked in bold to indicate that that’s the one where your working copy is at.

If you double click on “tidy up” it will move above the line and turn blue again, to indicate that your working copy has been updated to this version:

image

That patch is now where your working copy is at. Do whatever testing you want to do on it, then click on the next one to apply it:

image

Once you’re satisfied that all your patches are ready, right-click on any of them and choose “Finish applied”:

image

Hey presto! Your work is now all committed to your repository, ready to be pushed, pulled or otherwise shared with the big wide world.

image

There are other things you can do with patches in your queue which I haven’t covered here, such as reordering them, or combining two or more of them into one.

Patch queues and hunk selection are two extremely powerful features of Mercurial. While they require a little bit of care and attention in order to adhere to best practices, this is no more arduous than the discipline needed for any source control tool, and they can provide a significant productivity boost, simply because they let your tools work around you rather than forcing you to work around your tools.

21
Jun

TortoiseHg as a github client on Windows

(Update: I’ve updated these instructions for Mercurial 1.6/TortoiseHg 1.1.)

I’m going to get really controversial here and say that I think Mercurial is better than git. My reasoning (as with the reasoning of everyone else who takes sides in this particular debate) is entirely subjective, so we won’t belabour the point here too much. Nevertheless, some of us do have a preference for one over the other, and many Subversion refugees like me who do most of their work in Windows tend to lean towards Mercurial.

But there’s no denying that github is fast becoming the Facebook of open source programming (albeit hopefully without the unethical bits, Farmville, and people tagging you in embarrassing photos for all and sundry to see), and if you want to strut your stuff as a developer, that’s the place to do it. Github is, of course, a hosting facility for git repositories, as one would expect of a site whose name says what it means and means what it says.

Fortunately, it is quite possible to use Mercurial as a client against github repositories via the hg-git extension, and you can pull and push from one to the other pretty much losslessly.

However, setting it all up on Windows is not entirely straightforward, and there doesn’t seem to be a decent guide to it anywhere on the Internet: most of the instructions that you read assume that you’re using either (a) Linux or a Mac, (b) the command line, or (c) both. You also have to figure it out from various places all over the web, and searches on Google and Stack Overflow proved to be surprisingly fruitless. Furthermore, the most comprehensive howto that I came across elsewhere contained several instructions that were just plain wrong.

So, after spending two solid evenings struggling against a myriad of error messages and cryptic dialog boxes, I finally managed to get it working, and for future reference (and anyone else who wants to know how), I’ve documented what I’ve found actually works for me as best I can.

1. Install TortoiseHg and hg-git.

Install TortoiseHg 1.1 or later. If you are using an earlier version, upgrade: these instructions may work if you don’t, but I can’t make any guarantees.

I downloaded hg-git by cloning the repository. You can get it from either github or Bitbucket. The advantage of cloning the repository is that you can upgrade to the latest version quickly and easily by hg pull then hg update, or use the graphical tools if you prefer. You can also easily switch between the bleeding edge version of the code and a stable release if you like.

hg clone http://bitbucket.org/durin42/hg-git c:\abc\mercurial\hg-git

I downloaded hg-git into the directory c:\abc\mercurial\hg-git. If you put it elsewhere in your filespace, alter these instructions to suit.

2. Update to the appropriate version of hg-git.

If you are using TortoiseHg 1.1, you will need to use hg-git 0.2.3. If you ignored my advice to upgrade, and are still using version 1.0, you will need to use hg-git 0.2.1. Don’t use version 0.2.2: it doesn’t work with either version of TortoiseHg.

The official hg-git documentation tells us that we also need to download and install Dulwich 0.4.0 or later. The latest version of hg-git requires Dulwich 0.6.0. In any case, Dulwich is included with TortoiseHg (version 0.6.0 with TortoiseHg 1.1; version 0.5.0 with TortoiseHg 1.0) so you don’t need to do anything else there. Open up the TortoiseHg repository explorer on your clone of hg-git, choose the “Tagged” radio button to show only tagged releases, and update to version 0.2.3:

image

3. Configure Mercurial to use hg-git and an appropriate SSH client.

To do this, you need to edit your mercurial.ini file. You can get to this simply by choosing “Global Settings” on the TortoiseHg context menu in Windows Explorer, and clicking “Edit file” to bring it up in Notepad. Add the following lines to your configuration file:

[extensions]
hggit = C:\abc\mercurial\hg-git\hggit

[ui]
ssh = "C:\Program Files\TortoiseHg\TortoisePlink.exe"

The [extensions] section loads hg-git into Mercurial; the ssh option in the [ui] section specifies an SSH command line client to use to communicate with github. TortoiseHg gives us TortoisePlink, which works fine for me.

4. Create a public key/private key pair.

There are some instructions on github on how to create a public key/private key pair. Unfortunately, these don’t tell you that key pairs come in two formats: OpenSSH (as used by git itself and github), and PuTTY (as used by Tortoise Everything).

A simpler approach is to download PuTTY (you can get it from here) and use PuTTYgen to generate your key pair:

PuTTYgen screenshot

Once you have generated your SSH key, copy and paste the “Public key for pasting into OpenSSH authorized_keys file” into github. Save your public key and private key to your hard disk somewhere.

5. Start Pageant

Pageant is a program that stores all your private keys in memory, where the SSH client used by Mercurial, that we configured above, can find them. It comes with both PuTTY and TortoiseHg. You can set it to load in your private key(s) when you log on to Windows by creating a new shortcut in the Startup folder of your Start menu with this command:

"C:\Program Files\TortoiseHg\Pageant.exe" "c:\abc\github.ppk"

Note that if you don’t start Pageant first and load in your private key, you will not be able to push to github.

6. Clone a repository and start pushing!

You should make sure that you get the format of your repository URL correct. It should be:

git+ssh://git@github.com/your-github-username/your-repo-name.git

The rest from there on is all plain sailing. All being well, you should now be able to pull from your github repository and push changes back up as if it were a Mercurial repository.
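
If you want to sanity-check a URL before trying it, something like the following Python sketch will do. It only compares a URL against the format shown above; it says nothing about what hg-git itself will or won't accept, and `check_github_url` is a hypothetical helper invented for this illustration.

```python
from urllib.parse import urlsplit


def check_github_url(url):
    """Return a list of problems with a github URL, judged only against
    the format shown in this article (not against hg-git itself)."""
    parts = urlsplit(url)
    problems = []
    if parts.scheme != "git+ssh":
        problems.append("scheme should be git+ssh")
    if parts.username != "git":
        problems.append("user should be git, as in git@github.com")
    if parts.hostname != "github.com":
        problems.append("host should be github.com")
    # Expect exactly /username/repo-name.git
    if not parts.path.endswith(".git") or parts.path.count("/") != 2:
        problems.append("path should be /username/repo-name.git")
    return problems


if __name__ == "__main__":
    url = "git+ssh://git@github.com/your-github-username/your-repo-name.git"
    print(check_github_url(url) or "looks fine")
```

An empty list means the URL has the same shape as the one given above; anything else tells you which part to look at first.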

Things to check if it goes wrong.

Now all this is a bit of a fiddly process: there is plenty of room for error, and some of the error messages you are likely to get can be a little bit cryptic. However, most of the errors I ran into were due to me trying things that weren’t properly documented, and they all boiled down to a few things that you can check if you’ve followed the above instructions:

  • Are you using the correct version of hg-git? You can use versions later than 0.2.1, but they need a later version of Dulwich than the one that comes with TortoiseHg 1.0.
  • The “ssh” option in your mercurial.ini file should only specify the name of the executable, without command line options. Some articles tell you that you can fill in the path to your private key in this option. Personally, I couldn’t get this to work, so I just stuck with Pageant.
  • Is Pageant running?
  • Is your private key loaded into Pageant?
  • Do your public and private keys match?
  • Is your private key saved in PuTTY format? If you generated your key pair using git, as per the instructions on github, it will be saved in OpenSSH format instead, and Pageant can’t handle that. [1]
  • Have you specified the URL to your github repository correctly? The version I gave above works, while missing out various parts of the URL (e.g. using “github.com” instead of “git@github.com”) doesn’t.

[1] You can tell the difference between a PuTTY private key and an OpenSSH private key by opening them in Notepad. An OpenSSH private key will start off looking like this:

-----BEGIN RSA PRIVATE KEY-----
<transmission line noise>
-----END RSA PRIVATE KEY-----

whereas a PuTTY private key will look like this:

PuTTY-User-Key-File-2: ssh-rsa
Encryption: none
Comment: imported-openssh-key
Public-Lines: 6
<transmission line noise>
Private-Lines: 14
<transmission line noise>
Private-MAC:
<transmission line noise>
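
The same check can be scripted. The Python sketch below guesses the format purely from the first line of the file, using the two headers shown above; `private_key_format` is a hypothetical helper made up for this illustration, and it does not validate the key in any way.

```python
def private_key_format(text):
    """Guess a private key's format from its first non-blank line.

    Returns "putty", "openssh" or "unknown". This only looks at the
    header line; it does not check that the key itself is valid.
    """
    stripped = text.strip()
    first_line = stripped.splitlines()[0].strip() if stripped else ""
    if first_line.startswith("PuTTY-User-Key-File-"):
        return "putty"
    if first_line.startswith("-----BEGIN") and first_line.endswith("PRIVATE KEY-----"):
        return "openssh"
    return "unknown"


if __name__ == "__main__":
    import sys

    # Usage: python check_key.py path\to\key-file
    with open(sys.argv[1], encoding="ascii", errors="replace") as key_file:
        print(private_key_format(key_file.read()))
```

If it reports "openssh" for the key you are trying to load into Pageant, that is the one to convert to PuTTY format.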
08
Mar

Command line instructions are not a good marketing strategy

Dear fellow Mercurial fans,

Please stop using the command line when you’re writing articles telling us how wonderful Mercurial is.

I don’t need to be convinced that it is superior to Subversion. I’ve been using it for about nine months alongside our central Subversion repository at work, as well as for my private projects at home, and there’s no doubt in my mind which is better by a long shot. Easy branching and merging, and local versioning for experimental development and refactoring, are killer features as far as I’m concerned. And ease of use is supposed to be its big selling point over git.

But other developers do need convincing, and if you’re apparently fanboying the command line, it doesn’t help. In fact, it’s downright embarrassing. Remember, you may be a Linux geek who writes code for fun at weekends, but most of them are nine to five Windows developers who switch out of code mode the minute they leave the office and don’t want to have to learn anything new unless it’s strictly necessary. To them, it looks elitist, arrogant, off-putting, and Luddite.

When I first heard about Mercurial and git about two years ago, neither of them had any form of graphical user interface to speak of. It was a case of hg this, hg that, git this, git that in a command shell versus TortoiseSVN’s repo-browser, show log and commit dialogs. You know, like, where you can actually see what you’re doing? Where you can frequently figure out what you need to do by experimentation and educated guesses rather than having to wade through a morass of man pages? Forget it, I thought. Come back to me in a year or two’s time when you have a decent graphical front end for it. In the meantime, I’m sticking with TortoiseSVN.

Heck, I’m the kind of developer who likes to try out new things. I like Linq, and MVC, and jQuery, and Python, and IOC containers, and Colemak keyboards. I know Linux and I’m not afraid to use it. If I was put off by the impression that Mercurial was command-line only, what hope do you have of convincing the rank and file Windows developers who are scared of the command prompt?

Nowadays, of course, we have TortoiseHg, which gives it a decent, powerful and intuitive front end. In fact it was TortoiseHg that sold me on Mercurial in the first place, because it lets you see exactly what you’re doing when you’re branching and merging, as well as flattening out the learning curve dramatically. Just take a look at its repository explorer, for instance:

image

See? You even get a nice little graph showing you exactly where all your branches are. Context menus make it easy to figure out what to do next and actually do it. Oh, and it shows you the most recent changes first, rather than just vomiting everything out onto the screen and leaving you staring at changeset zero, like you get when you run hg log:

image

To a seasoned developer, there are advantages to the command prompt. It’s easier to type into your blog, easier to copy and paste, and easier to script. But there is a time and a place for everything, and introductory tutorials for tools with perfectly good graphical front ends are not the time and place for a command prompt. Doing a screen capture, firing up Paint.net and cropping your image to the right size may be more of a faff, but in an introductory tutorial, merely typing hg push instead is either outright elitism or sheer laziness. Please, cut it out. Use TortoiseHg to introduce Mercurial, and keep the command line for more advanced tasks.

01
Jun

Why would anyone not use source control?

There’s a question over on Stack Overflow that asks if there are any good reasons for not using source control. It’s a question I’ve been racking my brains over for a while now, especially since you do occasionally encounter people who claim they have good reasons not to. The most common such reason that I come across is that they’re a lone developer — an excuse that simply shows that they haven’t a clue what source control actually is.

One person pointed out that physicists are particularly unlikely to use source control:

For the casual programmers – those to whom programming is just a tool, such as many of the people I work with (scientists) – much of the work is hackish and small scale, there may be a dozen other things that are more likely to fail outside the code which could also be eliminated with better practices.

As a colleague put it, “we don’t get published for writing beautiful code”.

Interesting point that. Most programs written by physicists tend to be no more than a few hundred lines long, or even just a Microsoft Excel spreadsheet, and once they’re debugged and working, they usually don’t change. This is of course the exact opposite of business and web programming, where requirements change faster than you can keep up with them. However, you can’t really generalise here. I’d be very surprised, for instance, if NASA doesn’t use some form of source control for the Mars rovers.

Another person gave an answer that was especially worth commenting on:

“For the first 10 years of kernel maintenance, we literally used tarballs and patches, which is a much superior source control management system than CVS is” –Torvalds

If you’ve got quick/easy/automatic backups, you’ve already got 95% of what most of us use VC for. Somebody with a local DVCS repository on his HD but no backups is actually in much worse shape.

Using a VCS does have a real cost, and it’s usually a small one but not always. Every VCS I’ve ever used, I’ve had days where I had to fight with it for hours just to get it to do something that should have been simple.

To those that think “There are no good reasons not to use version control”, where does it end? Must every project have 100% unit test code coverage? Must every project have code reviews? Coding standards? A complete functional spec?

There’s a whole spectrum of programming projects in the world. Not everybody is writing code for the space shuttle. Sometimes being able to diff my code from 11:00am and 11:30am is simply not that important.

Some are merely managing globally-distributed teams of thousands writing operating system kernels.

This is another interesting point — if the Linux kernel managed fine without source control for ten years, why should we use it? In actual fact, the commenter is not entirely correct: the Linux kernel has been under source control since 2002 and Linus Torvalds even wrote his own source control system because he was dissatisfied with all the others that were available at the time. But this is an indictment of CVS in particular, not of source control in general — at the time the choice that you had was between that and something costing an arm and a leg.

This highlights another fairly common reason why people shy away from source control: they perceive it as being more trouble than it’s worth. In recent years, most developers’ first experience of source control has been Subversion. Once you get used to it, Subversion is pretty powerful and works very well, but unfortunately it is not a good example to throw at beginners when telling them they need to use source control. Getting your project under source control in the first place with it is a faff, and I’ve lost count of the number of times that it’s gotten so confused with itself that I’ve had to do a fresh checkout just to get it working properly again. And all those extraneous .svn directories that pollute your project’s filespace can be a major irritation at times.

So what is the best option to convince the naysayers? In a word: Mercurial.

Recently I’ve been playing with some of the new distributed source control systems such as Git and Mercurial, and I get the impression that they are much better suited to new and casual developers than Subversion. They’re a lot easier to use for starters — in combination with visual front ends such as TortoiseHg, you can get your entire project under source control with only three or four mouse clicks. They also have fewer pitfalls and gotchas — you can rename and delete files and directories much more easily without creating a whole lot of confusion, for instance.

Another big advantage of modern distributed source control systems such as Mercurial is that they scale down as well as up. Mercurial creates a single .hg directory in your project’s root which acts as a complete repository in and of itself. For a lone developer this is probably all you need, in tandem with a decent backup strategy, and it even makes it entirely reasonable to get your throwaway scripts under source control. After all, throwaway scripts have a rather nasty habit of not being as throwaway as we first thought they would be.

For development teams, you can have a central repository in addition to the developers’ personal ones, and push the changes to the central server once you’re done. For really big projects, you can have a whole hierarchy of source control servers, with changes being pushed up to the next level once they have passed quality control and whatever other processes you may have in place.

There may have been reasonable excuses for not using source control five years ago on small, trivial projects. But with the latest generation of tools, these excuses are getting flimsier and flimsier every day. Even for physicists.