05
Jul

Some notes on upgrading to Mercurial 1.6 with hgsubversion

I encountered a few wrinkles when I upgraded to Mercurial 1.6/TortoiseHg 1.1 on Friday. These were entirely due to the hgsubversion extension—an essential part of my toolkit these days, since it upgrades the experience of working with Subversion from “infuriating” to “tolerable.”

First, you should clone or pull the latest version of hgsubversion and update it to tip before you upgrade Mercurial. This is due to breaking changes in Mercurial’s internal API after hgsubversion 1.1 was released.
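If you keep several extensions as clones, it's easy to forget one. Here's a small Python sketch of the pull-and-update step. It assumes hg is on your PATH and that your hgsubversion clone lives in c:\abc\mercurial\hgsubversion, which is a hypothetical path, so use your own. By default it only prints the commands.

```python
from __future__ import annotations

import subprocess

# Hypothetical location of the hgsubversion clone; adjust to wherever yours lives.
HGSUBVERSION_CLONE = r"c:\abc\mercurial\hgsubversion"


def update_commands(repo: str) -> list[list[str]]:
    """The hg invocations that bring an extension clone up to tip."""
    return [
        ["hg", "-R", repo, "pull"],           # fetch new changesets from the default path
        ["hg", "-R", repo, "update", "tip"],  # move the working copy to tip
    ]


def update_to_tip(repo: str = HGSUBVERSION_CLONE) -> None:
    # Run each step in turn; check=True stops at the first failing command.
    for argv in update_commands(repo):
        subprocess.run(argv, check=True)


if __name__ == "__main__":
    # Show what would run; call update_to_tip() yourself once the path is right.
    for argv in update_commands(HGSUBVERSION_CLONE):
        print(" ".join(argv))
```

Run it before installing the new Mercurial, so the extension is already compatible when the new version first loads it.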

Second, there is a bug in hgsubversion, again caused by breaking changes in the Mercurial API, that throws an exception when it is used alongside the bookmarks extension in Mercurial 1.6. I have implemented a fix for this in my own fork of hgsubversion, though at the time of writing it hasn't yet made it back into the official repo.

(Update: this fix has now been merged into hgsubversion 1.2.2.)

Apart from that, everything seems to work reasonably well. My initial tests indicate that hg-git works fine with Mercurial 1.6. TortoiseHg 1.1 comes with Dulwich 0.6.0, so you can now safely update hg-git to the latest stable version (0.2.2) or to tip if you're that way inclined.

28
Jun

Just how smart is Git?

The merge algorithms in distributed source control systems are a vast improvement on Subversion's, which fall over on all but the simplest cases, but contrary to the claims of some, they are not omniscient.

For example, it is commonly believed that Git can track functions being moved from one file to another, and merge in the changes appropriately. It’s a plausible claim—I’m pretty sure that such a merge algorithm is doable—and it would also be particularly useful, since it would open up several more possibilities for the kind of refactoring that you can manage in a distributed workflow with parallel work streams.

Of course, this is very easy to test, so on Friday evening, I decided to test it.

I loaded the code for the latest version of my Comment Timeout WordPress plugin into a new git repository and created a branch called Alice. I then edited the file class.post-processor.php to add a comment in the constructor, and checked it in.

I then switched back to the original version and created a branch called Bob. I cut the constructor from class.post-processor.php and pasted it into class.comment-processor.php, and checked the resulting changes into Bob. I then attempted to merge Alice and Bob.

Lo and behold, I got a merge conflict.
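Why? Git merges file by file (after pairing up files it thinks were renamed), and within each file it compares both branches against their common ancestor. The toy three-way merge below is a heavily simplified Python model of that, not Git's actual algorithm. It treats overlapping or touching changes as conflicts, and the PHP lines are invented stand-ins for class.post-processor.php. In that file, Bob's deletion of the constructor overlaps Alice's new comment, and nothing in a per-file merge knows that the deleted lines turned up again in class.comment-processor.php.

```python
from __future__ import annotations

import difflib


class MergeConflict(Exception):
    """Both sides changed overlapping (or touching) regions of the common ancestor."""


def hunks(base: list[str], side: list[str], label: str) -> list[tuple]:
    """Regions of `base` that `side` changed, as (start, end, new_lines, label)."""
    matcher = difflib.SequenceMatcher(None, base, side, autojunk=False)
    return [
        (i1, i2, tuple(side[j1:j2]), label)
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


def ordered_hunks(base, ours, theirs):
    mine = hunks(base, ours, "ours")
    same = {h[:3] for h in mine}
    # An identical change made on both sides is only applied once.
    yours = [h for h in hunks(base, theirs, "theirs") if h[:3] not in same]
    return sorted(mine + yours, key=lambda h: (h[0], h[1]))


def conflicts(base, ours, theirs):
    """Neighbouring hunks from different sides that overlap or touch."""
    ordered = ordered_hunks(base, ours, theirs)
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if a[3] != b[3] and b[0] <= a[1]]


def merge3(base, ours, theirs):
    clashes = conflicts(base, ours, theirs)
    if clashes:
        raise MergeConflict(clashes)
    merged, pos = [], 0
    for start, end, new_lines, _ in ordered_hunks(base, ours, theirs):
        merged.extend(base[pos:start])  # unchanged lines before this hunk
        merged.extend(new_lines)        # the changed side's replacement
        pos = end
    merged.extend(base[pos:])
    return merged


# Invented stand-in for class.post-processor.php at the common ancestor.
BASE = [
    "class PostProcessor {",
    "  function __construct() {",
    "    $this->init();",
    "  }",
    "  function run() {}",
    "}",
]
# Alice adds a comment inside the constructor.
ALICE = BASE[:2] + ["    // added by Alice"] + BASE[2:]
# Bob cuts the constructor out (it now lives in class.comment-processor.php).
BOB = BASE[:1] + BASE[4:]

if __name__ == "__main__":
    print(conflicts(BASE, ALICE, BOB))
```

If Bob had edited run() instead of moving the constructor, the two hunks would sit apart and merge3 would combine them cleanly.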

Both Git and Mercurial support a merge-over-rename scenario which Subversion doesn't: if Alice makes some edits to foo.txt and Bob renames it to bar.txt, Alice's changes to foo.txt will be correctly merged into Bob's bar.txt. I haven't done any more exhaustive testing than that, but I suspect that Git and Mercurial have roughly similar merging capabilities overall, though there will undoubtedly be edge cases that each one handles and the other doesn't.

It seems that the biggest difference is that Git uses various heuristics to automatically detect file renames, whereas Mercurial expects the user to flag them explicitly. Which approach is better is controversial, but Mercurial users who prefer Git’s behaviour in this respect may be interested in this experimental extension to detect and register obvious renames at commit time which I knocked together over the weekend.
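For the curious, the core of Git-style rename detection is simple: pair each deleted file with the most similar added file, provided they're similar enough. Here it is as a Python sketch. The 50% threshold mirrors Git's default, but the similarity measure is difflib's line-based ratio rather than Git's own score, and the file names are made up. On the Mercurial side, hg addremove --similarity does a comparable job and records what it finds as explicit renames.

```python
from __future__ import annotations

import difflib

# Git's default rename threshold is 50% similarity; we borrow the number, not the metric.
THRESHOLD = 0.5


def similarity(old: str, new: str) -> float:
    """Line-based similarity in [0, 1] (difflib's ratio, not Git's own score)."""
    matcher = difflib.SequenceMatcher(None, old.splitlines(), new.splitlines(), autojunk=False)
    return matcher.ratio()


def detect_renames(removed: dict[str, str], added: dict[str, str],
                   threshold: float = THRESHOLD) -> dict[str, str]:
    """Pair each removed file with at most one added file, best matches first."""
    candidates = [
        (similarity(old_text, new_text), old, new)
        for old, old_text in removed.items()
        for new, new_text in added.items()
    ]
    renames: dict[str, str] = {}
    claimed: set[str] = set()
    for score, old, new in sorted(candidates, reverse=True):
        if score < threshold:
            break  # sorted best-first, so nothing further qualifies
        if old in renames or new in claimed:
            continue  # each file takes part in at most one rename
        renames[old] = new
        claimed.add(new)
    return renames


if __name__ == "__main__":
    print(detect_renames({"foo.txt": "a\nb\nc\nd\n"},
                         {"bar.txt": "a\nb\nc\nd\ne\n", "baz.txt": "x\ny\n"}))
```

Whether this guessing happens when you diff (Git) or once at commit time and then gets recorded (Mercurial) is exactly the difference described above.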

21
Jun

TortoiseHg as a github client on Windows

(Update: I’ve updated these instructions for Mercurial 1.6/TortoiseHg 1.1.)

I’m going to get really controversial here and say that I think Mercurial is better than git. My reasoning (as with the reasoning of everyone else who takes sides in this particular debate) is entirely subjective, so we won’t belabour the point here too much. Nevertheless, some of us do have a preference for one over the other, and many Subversion refugees like me who do most of their work in Windows tend to lean towards Mercurial.

But there’s no denying that github is fast becoming the Facebook of open source programming (albeit hopefully without the unethical bits, Farmville, and people tagging you in embarrassing photos for all and sundry to see), and if you want to strut your stuff as a developer, that’s the place to do it. Github is, of course, a hosting facility for git repositories, as one would expect of a site whose name says what it means and means what it says.

Fortunately, it is quite possible to use Mercurial as a client against github repositories via the hg-git extension, and you can pull and push from one to the other pretty much losslessly.

However, setting it all up on Windows is not entirely straightforward, and there doesn’t seem to be a decent guide to it anywhere on the Internet: most of the instructions that you read assume that you’re using either (a) Linux or a Mac, (b) the command line, or (c) both. You also have to figure it out from various places all over the web, and searches on Google and Stack Overflow proved to be surprisingly fruitless. Furthermore, the most comprehensive howto that I came across elsewhere contained several instructions that were just plain wrong.

So, after spending two solid evenings struggling against a myriad of error messages and cryptic dialog boxes, I finally managed to get it working, and for future reference (and anyone else who wants to know how), I’ve documented what I’ve found actually works for me as best I can.

1. Install TortoiseHg and hg-git.

Install TortoiseHg 1.1 or later. If you are using an earlier version, upgrade: these instructions may work if you don’t, but I can’t make any guarantees.

I downloaded hg-git by cloning the repository. You can get it from either github or Bitbucket. The advantage of cloning the repository is that you can upgrade to the latest version quickly and easily with hg pull followed by hg update, or with the graphical tools if you prefer. You can also easily switch between the bleeding-edge version of the code and a stable release if you like.

hg clone http://bitbucket.org/durin42/hg-git c:\abc\mercurial\hg-git

I downloaded hg-git into the directory c:\abc\mercurial\hg-git. If you put it elsewhere in your filespace, alter these instructions to suit.

2. Update to the appropriate version of hg-git.

If you are using TortoiseHg 1.1, you will need to use hg-git 0.2.3. If you ignored my advice to upgrade, and are still using version 1.0, you will need to use hg-git 0.2.1. Don’t use version 0.2.2: it doesn’t work with either version of TortoiseHg.

The official hg-git documentation tells us that we also need to download and install Dulwich 0.4.0 or later. The latest version of hg-git requires Dulwich 0.6.0. In any case, Dulwich is included with TortoiseHg (version 0.6.0 with TortoiseHg 1.1; version 0.5.0 with TortoiseHg 1.0) so you don’t need to do anything else there. Open up the TortoiseHg repository explorer on your clone of hg-git, choose the “Tagged” radio button to show only tagged releases, and update to version 0.2.3:

image
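The version pairings above are easy to get wrong, so here they are as a small Python lookup table, with the clone-and-update commands built from it. This is a sketch: the table only holds the two TortoiseHg releases covered here, the Bitbucket URL and clone path are the ones from step 1, and the Dulwich column is there for reference only.

```python
from __future__ import annotations

import subprocess

HG_GIT_SOURCE = "http://bitbucket.org/durin42/hg-git"
HG_GIT_CLONE = r"c:\abc\mercurial\hg-git"

# TortoiseHg release -> (hg-git tag that works with it, Dulwich version it bundles),
# as described in step 2. hg-git 0.2.2 is deliberately absent: it works with neither.
PAIRINGS = {
    "1.1": ("0.2.3", "0.6.0"),
    "1.0": ("0.2.1", "0.5.0"),
}


def hg_git_tag(tortoisehg_version: str) -> str:
    """Return the hg-git tag to update to for a given TortoiseHg release."""
    try:
        return PAIRINGS[tortoisehg_version][0]
    except KeyError:
        raise ValueError(f"no tested hg-git release for TortoiseHg {tortoisehg_version}") from None


def setup_commands(tortoisehg_version: str, dest: str = HG_GIT_CLONE) -> list[list[str]]:
    """Clone hg-git, then update the working copy to the matching tag."""
    tag = hg_git_tag(tortoisehg_version)
    return [
        ["hg", "clone", HG_GIT_SOURCE, dest],
        ["hg", "-R", dest, "update", tag],
    ]


def run(commands: list[list[str]]) -> None:
    for argv in commands:
        subprocess.run(argv, check=True)  # stop at the first failure


if __name__ == "__main__":
    for argv in setup_commands("1.1"):
        print(" ".join(argv))
```

If you've already cloned hg-git, drop the first command and pass the rest to run().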

3. Configure Mercurial to use hg-git and an appropriate SSH client.

To do this, you need to edit your mercurial.ini file. You can get to this simply by choosing “Global Settings” on the TortoiseHg context menu in Windows Explorer, and clicking “Edit file” to bring it up in Notepad. Add the following lines to your configuration file:

[extensions]
hggit = C:\abc\mercurial\hg-git\hggit

[ui]
ssh = "C:\Program Files\TortoiseHg\TortoisePlink.exe"

The [extensions] section loads hg-git into Mercurial; the ssh option in the [ui] section specifies an SSH command line client to use to communicate with github. TortoiseHg gives us TortoisePlink, which works fine for me.
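If you'd rather script this than edit the file by hand, Python's configparser can add both settings. Treat it as a sketch: Mercurial's config format is only INI-like (it has extras such as %include that configparser doesn't understand), and configparser discards comments when it writes the file back, so keep a copy of mercurial.ini before trying it. The paths are the ones used above.

```python
from __future__ import annotations

import configparser
import io

HG_GIT_EXTENSION = r"C:\abc\mercurial\hg-git\hggit"
PLINK = r"C:\Program Files\TortoiseHg\TortoisePlink.exe"


def add_hg_git_settings(ini_text: str, extension_path: str = HG_GIT_EXTENSION,
                        plink_path: str = PLINK) -> str:
    """Return ini_text with hg-git enabled and TortoisePlink set as the SSH client."""
    # interpolation=None: Windows paths must not be treated as %(name)s templates.
    cfg = configparser.ConfigParser(interpolation=None, allow_no_value=True)
    cfg.optionxform = str  # keep option names exactly as written
    cfg.read_string(ini_text)
    for section in ("extensions", "ui"):
        if not cfg.has_section(section):
            cfg.add_section(section)
    cfg.set("extensions", "hggit", extension_path)
    # Quote the path because "Program Files" contains a space.
    cfg.set("ui", "ssh", f'"{plink_path}"')
    out = io.StringIO()
    cfg.write(out)
    return out.getvalue()


if __name__ == "__main__":
    print(add_hg_git_settings("[ui]\nusername = Your Name <you@example.com>\n"))
```

Read your existing mercurial.ini into a string, pass it in, and write the result back over the file.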

4. Create a public key/private key pair.

There are some instructions on github on how to create a public key/private key pair. Unfortunately, these don’t tell you that key pairs come in two formats: OpenSSH (as used by git itself and github), and PuTTY (as used by Tortoise Everything).

A simpler approach is to download PuTTY and use PuTTYgen to generate your key pair:

PuTTYgen screenshot

Once you have generated your SSH key, copy and paste the “Public key for pasting into OpenSSH authorized_keys file” into github. Save your public key and private key to your hard disk somewhere.

5. Start Pageant

Pageant is a program that holds your private keys in memory, where the SSH client we configured Mercurial to use above can find them. It comes with both PuTTY and TortoiseHg. You can have it load your private key(s) when you log on to Windows by creating a new shortcut in the Startup folder of your Start menu with this command:

"C:\Program Files\TortoiseHg\Pageant.exe" "c:\abc\github.ppk"

Note that if you don’t start Pageant first and load in your private key, you will not be able to push to github.

6. Clone a repository and start pushing!

You should make sure that you get the format of your repository URL correct. It should be:

git+ssh://git@github.com/your-github-username/your-repo-name.git

The rest from there on is all plain sailing. All being well, you should now be able to pull from your github repository and push changes back up as if it were a Mercurial repository.
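Since the URL format is the usual stumbling block, here's a Python sketch that builds it and checks a URL against the same shape. The regular expression is my own loose approximation of valid github user and repository names, not github's actual rules, and the names in the example are placeholders.

```python
import re

# git+ssh scheme, user "git", host github.com, then owner/repo ending in .git.
# The character classes for the two names are a loose approximation.
GITHUB_SSH_URL = re.compile(
    r"^git\+ssh://git@github\.com/(?P<user>[A-Za-z0-9-]+)/(?P<repo>[\w.-]+)\.git$"
)


def github_url(user: str, repo: str) -> str:
    """Build the URL format that works with hg-git and TortoisePlink."""
    return f"git+ssh://git@github.com/{user}/{repo}.git"


def is_github_url(url: str) -> bool:
    return GITHUB_SSH_URL.match(url) is not None


if __name__ == "__main__":
    url = github_url("your-github-username", "your-repo-name")
    print(url, is_github_url(url))
    print(["hg", "clone", url])  # the clone command you would then run
```

A URL without the git@ part, as in the troubleshooting notes, fails the check.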

Things to check if it goes wrong.

Now, all this is a fiddly process with plenty of room for error, and some of the error messages you are likely to get can be a little cryptic. However, most of my problems came from trying things that weren't properly documented, and they all boiled down to a few things that you can check if you've followed the above instructions:

  • Are you using the correct version of hg-git? Versions later than 0.2.1 need a newer version of Dulwich than the one that comes with TortoiseHg 1.0, so use 0.2.3 with TortoiseHg 1.1 and 0.2.1 with TortoiseHg 1.0.
  • The “ssh” option in your mercurial.ini file should only specify the name of the executable, without command line options. Some articles tell you that you can fill in the path to your private key in this option. Personally, I couldn’t get this to work, so I just stuck with Pageant.
  • Is Pageant running?
  • Is your private key loaded into Pageant?
  • Do your public and private keys match?
  • Is your private key saved in PuTTY format? If you generated your key pair using git, as per the instructions on github, it will be saved in OpenSSH format instead, and Pageant can’t handle that.1
  • Have you specified the URL to your github repository correctly? The format I gave above works, while leaving out parts of it (e.g. using “github.com” instead of “git@github.com”) doesn't.
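Two of these checks can be automated by reading mercurial.ini. The following Python sketch flags a missing hggit entry and an ssh option that contains anything beyond the executable. It can't tell whether Pageant is running or whether your keys match, so those remain manual checks, and it parses the file with configparser, which only approximates Mercurial's own config parser.

```python
from __future__ import annotations

import configparser
import shlex


def diagnose(ini_text: str) -> list[str]:
    """Report the mercurial.ini problems from the checklist that can be spotted statically."""
    problems: list[str] = []
    cfg = configparser.ConfigParser(interpolation=None, allow_no_value=True, strict=False)
    cfg.read_string(ini_text)

    if not cfg.has_option("extensions", "hggit"):
        problems.append("hg-git is not enabled in [extensions]")

    ssh = cfg.get("ui", "ssh", fallback=None)
    if not ssh:
        problems.append("no SSH client set with the ssh option in [ui]")
    else:
        # posix=False keeps backslashes in Windows paths and treats a quoted
        # path as one word, so any further words are extra options.
        words = shlex.split(ssh, posix=False)
        if len(words) > 1:
            problems.append(f"[ui] ssh should name only the (quoted) executable, found extra: {words[1:]}")
    return problems


if __name__ == "__main__":
    import sys
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8") as f:
            print(path, diagnose(f.read()) or "no problems found")
```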
1 You can tell the difference between a PuTTY private key and an OpenSSH private key by opening them in Notepad. An OpenSSH private key will start off looking like this:

-----BEGIN RSA PRIVATE KEY-----
<transmission line noise>
-----END RSA PRIVATE KEY-----

whereas a PuTTY private key will look like this:

PuTTY-User-Key-File-2: ssh-rsa
Encryption: none
Comment: imported-openssh-key
Public-Lines: 6
<transmission line noise>
Private-Lines: 14
<transmission line noise>
Private-MAC:
<transmission line noise>
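The same test in a few lines of Python, as a sketch: it only looks at the first line of the file. Newer versions of ssh-keygen write -----BEGIN OPENSSH PRIVATE KEY----- and newer PuTTY releases write PuTTY-User-Key-File-3, and both are matched. If it reports openssh, PuTTYgen's Conversions menu can import the key and save it in PuTTY format.

```python
from pathlib import Path


def private_key_format(text: str) -> str:
    """Classify a private key file's contents as "putty", "openssh" or "unknown"."""
    lines = text.strip().splitlines()
    first = lines[0] if lines else ""
    if first.startswith("PuTTY-User-Key-File-"):
        return "putty"  # .ppk file, which Pageant can load
    if first.startswith("-----BEGIN ") and first.endswith("PRIVATE KEY-----"):
        return "openssh"  # PEM-style key, e.g. from ssh-keygen; convert it with PuTTYgen
    return "unknown"


if __name__ == "__main__":
    import sys
    for name in sys.argv[1:]:
        print(name, private_key_format(Path(name).read_text(errors="replace")))
```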
08
Mar

Command line instructions are not a good marketing strategy

Dear fellow Mercurial fans,

Please stop using the command line when you’re writing articles telling us how wonderful Mercurial is.

I don’t need to be convinced that it is superior to Subversion. I’ve been using it for about nine months alongside our central Subversion repository at work, as well as for my private projects at home, and there’s no doubt in my mind which is better by a long shot. Easy branching and merging, and local versioning for experimental development and refactoring, are killer features as far as I’m concerned. And ease of use is supposed to be its big selling point over git.

But other developers do need convincing, and if you’re apparently fanboying the command line, it doesn’t help. In fact, it’s downright embarrassing. Remember, you may be a Linux geek who writes code for fun at weekends, but most of them are nine to five Windows developers who switch out of code mode the minute they leave the office and don’t want to have to learn anything new unless it’s strictly necessary. To them, it looks elitist, arrogant, off-putting, and Luddite.

When I first heard about Mercurial and git about two years ago, neither of them had any form of graphical user interface to speak of. It was a case of hg this, hg that, git this, git that in a command shell versus TortoiseSVN’s repo-browser, show log and commit dialogs. You know, like, where you can actually see what you’re doing? Where you can frequently figure out what you need to do by experimentation and educated guesses rather than having to wade through a morass of man pages? Forget it, I thought. Come back to me in a year or two’s time when you have a decent graphical front end for it. In the meantime, I’m sticking with TortoiseSVN.

Heck, I'm the kind of developer who likes to try out new things. I like Linq, and MVC, and jQuery, and Python, and IoC containers, and Colemak keyboards. I know Linux and I'm not afraid to use it. If I was put off by the impression that Mercurial was command-line only, what hope do you have of convincing the rank-and-file Windows developers who are scared of the command prompt?

Nowadays, of course, we have TortoiseHg, which gives it a decent, powerful and intuitive front end. In fact it was TortoiseHg that sold me on Mercurial in the first place, because it lets you see exactly what you’re doing when you’re branching and merging, as well as flattening out the learning curve dramatically. Just take a look at its repository explorer, for instance:

image

See? You even get a nice little graph showing you exactly where all your branches are. Context menus make it easy to figure out what to do next and actually do it. Oh, and it shows you the most recent changes first, rather than just vomiting everything out onto the screen and leaving you staring at changeset zero, like you get when you run hg log:

image

To a seasoned developer, there are advantages to the command prompt. It’s easier to type into your blog, easier to copy and paste, and easier to script. But there is a time and a place for everything, and introductory tutorials for tools with perfectly good graphical front ends are not the time and place for a command prompt. Doing a screen capture, firing up Paint.net and cropping your image to the right size may be more of a faff, but in an introductory tutorial, merely typing hg push instead is either outright elitism or sheer laziness. Please, cut it out. Use TortoiseHg to introduce Mercurial, and keep the command line for more advanced tasks.

01
Jun

Why would anyone not use source control?

There’s a question over on Stack Overflow that asks if there are any good reasons for not using source control. It’s a question I’ve been racking my brains over for a while now, especially since you do occasionally encounter people who claim they have good reasons not to. The most common such reason that I come across is that they’re a lone developer — an excuse that simply shows that they haven’t a clue what source control actually is.

One person pointed out that physicists are particularly unlikely to use source control:

For the casual programmers – those to whom programming is just a tool, such as many of the people I work with (scientists) – much of the work is hackish and small scale, there may be a dozen other things that are more likely to fail outside the code which could also be eliminated with better practices.

As a colleague put it, “we don’t get published for writing beautiful code”.

Interesting point, that. Most programs written by physicists tend to be no more than a few hundred lines long, or even just a Microsoft Excel spreadsheet, and once they're debugged and working, they usually don't change. This is of course the exact opposite of business and web programming, where requirements change faster than you can keep up with them. However, you can't really generalise here. I'd be very surprised, for instance, if NASA didn't use some form of source control for the Mars rovers.

Another person gave an answer that was especially worth commenting on:

“For the first 10 years of kernel maintenance, we literally used tarballs and patches, which is a much superior source control management system than CVS is” –Torvalds

If you’ve got quick/easy/automatic backups, you’ve already got 95% of what most of us use VC for. Somebody with a local DVCS repository on his HD but no backups is actually in much worse shape.

Using a VCS does have a real cost, and it’s usually a small one but not always. Every VCS I’ve ever used, I’ve had days where I had to fight with it for hours just to get it to do something that should have been simple.

To those that think “There are no good reasons not to use version control”, where does it end? Must every project have 100% unit test code coverage? Must every project have code reviews? Coding standards? A complete functional spec?

There’s a whole spectrum of programming projects in the world. Not everybody is writing code for the space shuttle. Sometimes being able to diff my code from 11:00am and 11:30am is simply not that important.

Some are merely managing globally-distributed teams of thousands writing operating system kernels.

This is another interesting point. If the Linux kernel managed fine without source control for ten years, why should we use it? In actual fact, the commenter is not entirely correct: the Linux kernel has been under source control since 2002, first in BitKeeper and then in Git, which Linus Torvalds wrote himself in 2005 because none of the other systems available at the time met his needs. But the quote is an indictment of CVS in particular, not of source control in general: at the time, the choice you had was between CVS and something costing an arm and a leg.

This highlights another fairly common reason why people shy away from source control: they perceive it as being more trouble than it's worth. In recent years, most developers' first experience of source control has been Subversion. Once you get used to it, Subversion is pretty powerful and works very well, but unfortunately it is not a good example to throw at beginners when telling them they need to use source control. Getting your project under source control with it in the first place is a faff, and I've lost count of the number of times that it has got itself so confused that I've had to do a fresh checkout just to get it working properly again. And all those extraneous .svn directories that pollute your project's filespace can be a major irritation at times.

So what is the best option to convince the naysayers? In a word: Mercurial.

Recently I've been playing with some of the new distributed source control systems such as Git and Mercurial, and I get the impression that they are much better suited to new and casual developers than Subversion. For starters, they're a lot easier to use: in combination with visual front ends such as TortoiseHg, you can get your entire project under source control with only three or four mouse clicks. They also have fewer pitfalls and gotchas; for instance, you can rename and delete files and directories much more easily without creating a whole lot of confusion.

Another big advantage of modern distributed source control systems such as Mercurial is that they scale down as well as up. Mercurial creates a single .hg directory in your project’s root which acts as a complete repository in and of itself. For a lone developer this is probably all you need, in tandem with a decent backup strategy, and it even makes it entirely reasonable to get your throwaway scripts under source control. After all, throwaway scripts have a rather nasty habit of not being as throwaway as we first thought they would be.
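That single .hg directory is enough because Mercurial, when run anywhere inside your project, looks for .hg in the current directory and then in each parent in turn. Here's a Python sketch of that lookup, an illustration of the idea rather than Mercurial's own code, with a throwaway demo tree built in a temporary directory.

```python
from __future__ import annotations

import tempfile
from pathlib import Path


def find_repo_root(start: str | Path) -> Path | None:
    """Return the nearest directory at or above `start` that contains .hg, else None."""
    here = Path(start).resolve()
    for candidate in (here, *here.parents):
        if (candidate / ".hg").is_dir():
            return candidate
    return None


def make_demo_tree(root: str | Path) -> Path:
    """Build root/project/.hg plus a nested folder, and return the nested folder."""
    project = Path(root) / "project"
    (project / ".hg").mkdir(parents=True)
    nested = project / "scripts" / "throwaway"
    nested.mkdir(parents=True)
    return nested


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(find_repo_root(make_demo_tree(tmp)))  # the "project" directory
```

Compare that with Subversion, which at the time needed a .svn directory in every folder of the working copy.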

For development teams, you can have a central repository in addition to the developers’ personal ones, and push the changes to the central server once you’re done. For really big projects, you can have a whole hierarchy of source control servers, with changes being pushed up to the next level once they have passed quality control and whatever other processes you may have in place.

There may have been reasonable excuses for not using source control five years ago on small, trivial projects. But with the latest generation of tools, these excuses are getting flimsier and flimsier every day. Even for physicists.