26
Mar

On web deployment

Scott Hanselman says that if you’re using XCopy for deploying web applications, you’re doing it wrong. He is talking, of course, about the web deployment features of Visual Studio 2010, which constitute Microsoft’s attempts to solve a problem that is a lot less trivial than it looks.

It’s a bit of a strong statement, and I’m not sure that I agree with it. For the past four years or so, I’ve used a variant of XCopy deployment that I’ve found to be very effective. I put each release of the website into a separate folder, numbered after the version reported by Subversion, Mercurial, CruiseControl or TeamCity (whichever of these I happen to be using), and then I just switch the directory in IIS; on Linux, it’s just a case of changing a symlink. This all but eliminates downtime for the vast majority of upgrades, and it lets you roll back in seconds to any previous version that you still have available if things go pear-shaped.
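To make the Linux side of this concrete, here is a minimal sketch in Python of the symlink switch. It is an illustration under assumptions rather than a definitive deployment script: the `releases/<revision>` layout, the `current` link name and the `activate_release` function are all hypothetical names chosen for the example.

```python
import os
import tempfile


def activate_release(site_root, revision):
    """Point site_root/current at site_root/releases/<revision>.

    The new link is created under a temporary name and then renamed over
    the old one, so the switch is a single atomic rename on POSIX systems.
    """
    target = os.path.join(site_root, "releases", str(revision))
    if not os.path.isdir(target):
        raise FileNotFoundError("no such release: " + target)

    current = os.path.join(site_root, "current")
    staging = current + ".new"
    if os.path.lexists(staging):
        os.remove(staging)          # clear a leftover from a failed switch
    os.symlink(target, staging)
    os.replace(staging, current)    # swap the link in one step
    return target


if __name__ == "__main__":
    # Demonstrate an upgrade followed by a rollback in a scratch directory.
    root = tempfile.mkdtemp()
    for rev in (101, 102):
        os.makedirs(os.path.join(root, "releases", str(rev)))
    activate_release(root, 101)
    activate_release(root, 102)   # upgrade
    activate_release(root, 101)   # roll back
    print(os.path.realpath(os.path.join(root, "current")))
```

Rolling back is the same call with an older revision number, which is why it only takes seconds as long as the old folder is still on disk.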

I’d like to see how Web Deploy handles upgrades like this. My experience of software upgrades is that they are rarely that seamless and usually involve several seconds of downtime, though having said that, if your website is so busy that half a minute of downtime is a serious problem, the chances are that you have failover servers that you can bring in while you upgrade.

A more serious issue, however, is rollback. Some of the sites I work on are pretty high profile, and the ability to roll back in seconds if things don’t work out is a deal breaker as far as I’m concerned. That’s why I’ve found the XCopy/IIS settings switchover approach to be such a winner.

I am not impressed with the approach that Visual Studio 2010 adopts to managing web.config files, however. This approach sets your connection strings etc at build time, which can be pretty painful since you have to have different builds for development, integration, test, production and so on, and once you start branching and merging, and have to have separate connection strings for separate branches, it can completely blow up in your face if you’re not careful. No, configuration is a deployment time operation and needs to be treated as such. The best place for your configuration settings is outside your application root, in a common location easily accessible to every version of your site.

Finally, one last tip. Never deploy on a Friday. There are two reasons for this: first, it’s the end of the week, you’re tired, you just want to go home, and you’re much more likely to make a mistake than on a Monday when you’re fresh. Second, if something does go wrong, it will really, really, really ruin your weekend.

23
Mar

NAnt and MSBuild are completely pointless

I mentioned this in passing in a recent blog entry, and I thought I’d expand on it a bit.

I do not like NAnt.

I do not like MSBuild either.

I’ve used both of them, and quite frankly, I don’t see the point of either of them. To be sure, MSBuild allows you to build Visual Studio solutions from the command line, but that’s MSBuild the program. MSBuild the language, on the other hand, is a completely, utterly pointless reinvention of NAnt, which itself is probably the most completely, utterly pointless domain specific language in widespread use that I’ve ever come across.

Neither language does anything that you can’t do in Python. In fact, most of the time, Python does it better, with a cleaner syntax because it isn’t XML-based. XML is fine for some things, but the foundation for a scripting language is not one of them. When you’re using XML as the basis for your scripting language, you’re getting dangerously into “all you have is a hammer, so everything looks like a nail” territory.

Besides, neither of them is used anywhere for anything other than writing build scripts. If you use Python, at least you can leverage your knowledge for other domains, such as web development, game development, OS scripting, and much, much more.

I say Python here purely because I happen to know it. There are other decent, popular, multi-purpose languages that you can use to write build scripts. There’s no reason why you can’t use Ruby, or PowerShell, or even good old-fashioned batch files, for instance. But having to learn and use a fiddly, awkward new language solely for the purpose of setting up or changing your build scripts (something that you only do relatively infrequently) simply doesn’t make sense.
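For what it’s worth, here is roughly what the skeleton of a build script looks like in plain Python. This is a minimal sketch, not a complete build system: the step names are made up, and the commands are harmless placeholders (they just run the Python interpreter) so that the example works anywhere. A real script would call the compiler, the test runner and so on in their place.

```python
import subprocess
import sys


def run_steps(steps):
    """Run each (name, command) pair in order; stop at the first failure."""
    completed = []
    for name, command in steps:
        print("==> " + name)
        result = subprocess.run(command)
        if result.returncode != 0:
            raise RuntimeError("build step failed: " + name)
        completed.append(name)
    return completed


# Placeholder commands so that the sketch runs anywhere. A real script
# would call the compiler, test runner and packaging tool here instead.
STEPS = [
    ("compile", [sys.executable, "-c", "print('compiling')"]),
    ("test", [sys.executable, "-c", "print('running tests')"]),
    ("package", [sys.executable, "-c", "print('packaging')"]),
]

if __name__ == "__main__":
    run_steps(STEPS)
```

Everything here is an ordinary list, loop and function call, so adding a conditional step or a loop over several projects needs no new syntax.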

15
Mar

If part of your framework is not fit for purpose, don’t use it

I have a long-standing gripe about web.config files. They are where you are “officially” supposed to put all your application’s configuration settings, but the framework throws in a whole lot of other so-called configuration settings that are, to all intents and purposes, code: HTTP modules, assembly references, which version of the C# compiler to use, and so on.

This is bad. A well-designed application configuration file will only contain settings that vary from one deployment to the next. Anything that is the same across the majority of deployments should be set in your code either by convention or by default values. Anything that doesn’t change from one deployment to the next is not configuration, but code.

But why not just stop using web.config for your app settings and connection strings altogether? There’s nothing stopping you from writing your own configuration class which pulls in all your application settings and connection strings from a JSON file in the parent directory to your application root, or in the Windows registry, if that’s what you need to do.
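The idea is language-neutral, so here is a minimal sketch of it in Python rather than C#. The file name `settings.json`, the `load_settings` function and the `connection_string` key are assumptions made for the example, not part of any framework.

```python
import json
import os
import tempfile


def load_settings(app_root, filename="settings.json"):
    """Load settings from a JSON file in the parent directory of app_root."""
    parent = os.path.dirname(os.path.abspath(app_root))
    path = os.path.join(parent, filename)
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


if __name__ == "__main__":
    # Scratch layout: <site>/settings.json sits beside the release folders,
    # so every deployed version under <site> reads the same file.
    site = tempfile.mkdtemp()
    app_root = os.path.join(site, "1234")
    os.makedirs(app_root)
    with open(os.path.join(site, "settings.json"), "w", encoding="utf-8") as f:
        json.dump({"connection_string": "Server=db01;Database=Example"}, f)
    print(load_settings(app_root)["connection_string"])
```

Because the file lives outside the application root, deploying a new version never touches it, and the same build can be dropped onto development, test and production servers unchanged.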

Build scripts are another example. What language do you use to write your build scripts? Chances are, you either use NAnt or MSBuild. But both of these are XML-based, unwieldy, tricky to learn and use, and somewhat limited. What’s to stop you using Python or Ruby instead, or even batch files, for instance? They can do everything that NAnt and MSBuild can do and more, they are much simpler to understand and edit, and you can leverage the knowledge involved elsewhere.

Just because the framework provides you with an “official” way of doing something, it doesn’t mean you have to use it. Using a different approach to the officially touted way may sound a bit radical or perhaps even iconoclastic at first, but it makes perfect sense once you think about it. After all, if the accepted wisdom passed down from Redmond is not fit for purpose, blindly sticking with it even though it gets in the way is just cargo cult programming.

08
Mar

Command line instructions are not a good marketing strategy

Dear fellow Mercurial fans,

Please stop using the command line when you’re writing articles telling us how wonderful Mercurial is.

I don’t need to be convinced that it is superior to Subversion. I’ve been using it for about nine months alongside our central Subversion repository at work, as well as for my private projects at home, and there’s no doubt in my mind which is better by a long shot. Easy branching and merging, and local versioning for experimental development and refactoring, are killer features as far as I’m concerned. And ease of use is supposed to be its big selling point over git.

But other developers do need convincing, and if you’re apparently fanboying the command line, it doesn’t help. In fact, it’s downright embarrassing. Remember, you may be a Linux geek who writes code for fun at weekends, but most of them are nine to five Windows developers who switch out of code mode the minute they leave the office and don’t want to have to learn anything new unless it’s strictly necessary. To them, it looks elitist, arrogant, off-putting, and Luddite.

When I first heard about Mercurial and git about two years ago, neither of them had any form of graphical user interface to speak of. It was a case of hg this, hg that, git this, git that in a command shell versus TortoiseSVN’s repo-browser, show log and commit dialogs. You know, like, where you can actually see what you’re doing? Where you can frequently figure out what you need to do by experimentation and educated guesses rather than having to wade through a morass of man pages? Forget it, I thought. Come back to me in a year or two’s time when you have a decent graphical front end for it. In the meantime, I’m sticking with TortoiseSVN.

Heck, I’m the kind of developer who likes to try out new things. I like Linq, and MVC, and jQuery, and Python, and IOC containers, and Colemak keyboards. I know Linux and I’m not afraid to use it. If I was put off by the impression that Mercurial was command-line only, what hope do you have of convincing the rank and file Windows developers who are scared of the command prompt?

Nowadays, of course, we have TortoiseHg, which gives it a decent, powerful and intuitive front end. In fact it was TortoiseHg that sold me on Mercurial in the first place, because it lets you see exactly what you’re doing when you’re branching and merging, as well as flattening out the learning curve dramatically. Just take a look at its repository explorer, for instance:

[image: TortoiseHg repository explorer]

See? You even get a nice little graph showing you exactly where all your branches are. Context menus make it easy to figure out what to do next and actually do it. Oh, and it shows you the most recent changes first, rather than just vomiting everything out onto the screen and leaving you staring at changeset zero, like you get when you run hg log:

[image: output of hg log]

To a seasoned developer, there are advantages to the command prompt. It’s easier to type into your blog, easier to copy and paste, and easier to script. But there is a time and a place for everything, and introductory tutorials for tools with perfectly good graphical front ends are not the time and place for a command prompt. Doing a screen capture, firing up Paint.net and cropping your image to the right size may be more of a faff, but in an introductory tutorial, merely typing hg push instead is either outright elitism or sheer laziness. Please, cut it out. Use TortoiseHg to introduce Mercurial, and keep the command line for more advanced tasks.

04
Feb

Catching Exception is almost never justified and almost always harmful

I was doing an ad-hoc review of another developer’s code not long ago when I saw something like this:

try {
    return bool.Parse(GetSomething());
}
catch (Exception) {
    return false;
}

I gently pointed out to him that this is a bad practice. Apart from the fact that you can use bool.TryParse() instead of bool.Parse(), your GetSomething() method may be throwing exceptions indicating a rather more serious problem, such as your database being down.

Catching Exception is one of my pet peeves, but sadly it’s far too common, even among smart developers that I’d have expected to know better, cropping up in commercial products and open source projects alike. Part of the problem is the code samples in the MSDN documentation itself, which are littered with completely unnecessary try ... catch (Exception) blocks, that people copy and paste without thinking about it. But it’s also a quick and dirty hack — it’s easier to simply catch Exception and cross your fingers than to look up the documentation to find out exactly what you should be catching.

But this is reckless and dangerous. Catching exceptions inappropriately can lead to some very serious bugs in your code — serious, because you are deliberately ignoring them while they wreak havoc with your data. In one instance, I was asked to troubleshoot an application where a database upgrade had been botched and nobody had noticed for several days until the users started complaining that their changes weren’t being saved. You may also be ignoring misconfiguration, missing assemblies, external services being offline, and so on. And even if the effects aren’t serious, the bugs can still be particularly difficult to track down, as your logs will likely contain misleading error reports, if indeed they contain any error reports at all.

Catching general exception types without re-throwing them is almost never justified, and almost always harmful.

The correct approach to exceptions is to allow them to bubble up to the topmost level of your code, and handle them there by logging them and presenting an appropriate error message to the user. For ASP.NET applications, this is the Application_Error event handler in your global.asax file, or perhaps an error logging framework such as ELMAH. For console applications, it is your Main method. For separate threads, it is the topmost method of the thread. And so on.

Well written code has very few try ... catch blocks. The most common case where you would have a general exception handler is when you need to roll back a transaction or otherwise leave your application in a consistent state when you re-throw:

try {
    BeginTransaction();
    DoStuff();
    Commit();
}
catch {
    RollBack();
    throw;
}

Aside: when you re-throw the exception, always use throw; here (which preserves the stack trace), not throw ex; (which doesn’t).

Apart from that, you should only catch specific exception types that you are both able and willing to handle meaningfully. Certainly, catching Exception should be treated as the nuclear option — and if there really is no alternative, you should always log the exceptions and rigorously justify your decision both in comments and in a code review. And next time you are tempted to write catch (Exception), ask yourself this question:

What would this code do if the exception were due to a botched deployment, an out of memory error, or a misconfiguration?
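The same principle applies outside .NET. Here is a minimal sketch of it in Python: catch only the one exception type you know how to handle, and let everything else propagate. The function names (`parse_flag`, `read_flag`, `database_is_down`) are hypothetical and exist only for this example.

```python
def parse_flag(text):
    """Return True/False for 'true'/'false'; raise ValueError otherwise."""
    value = text.strip().lower()
    if value == "true":
        return True
    if value == "false":
        return False
    raise ValueError("not a boolean: " + repr(text))


def read_flag(get_something):
    """Treat unparseable text as False, but let every other error propagate."""
    text = get_something()          # a failure here is not ours to swallow
    try:
        return parse_flag(text)
    except ValueError:              # the one error we know how to handle
        return False


def database_is_down():
    # Hypothetical stand-in for a data source that has failed.
    raise ConnectionError("database is down")


if __name__ == "__main__":
    print(read_flag(lambda: "true"))
    print(read_flag(lambda: "banana"))
    try:
        read_flag(database_is_down)
    except ConnectionError as error:
        print("propagated:", error)
```

Note that the call to the data source sits outside the try block, so a dead database surfaces as an error at the top level instead of quietly turning into a false value.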

02
Feb

Are deletionists harming Wikipedia?

There’s a discussion over on the Colemak forums at the moment about the Wikipedia problem. It seems that, not content with having the article deleted on the grounds of non-notability a while ago, some Wikipedians are trying to eradicate every last mention of the layout from anywhere on the site. The deletion decision had eventually ended up as a redirect to a section on the Keyboard layout article, but it seems that even that’s been removed now, by a particularly argumentative individual who is rigidly and inflexibly applying his interpretation of the Reliable Sources policy.

Now as a satisfied Colemak typist I may be somewhat biased on this matter, but this one should be obvious. Colemak may be a pretty niche subject, but it has been covered a couple of times in the media—not a lot, but usually sufficient to at least get a “no consensus” decision in an AfD debate, which automatically defaults to “keep.” On top of that, it is included in X11 and every Linux distribution going. It’s one of only about half a dozen options for keyboard layout variant displayed on the installation screens of Ubuntu. It’s right in your face, not tucked away in some obscure and dangerous config file. Everyone who installs Ubuntu will be aware of it. Some of them will want to find out more about it. And they will expect Wikipedia to say something about it. But it won’t.

Of course, if it were just Colemak that were affected, I’m sure you could just dismiss this as a fanboy rant on my part, but this actually illustrates a much wider problem. With over three million articles, on everything from minor league ice hockey players to fictional foods in Babylon 5, Wikipedia is now the first place people turn to for information on anything obscure and only marginally notable. Wikipedia’s end users expect it to be an indiscriminate collection of information. Yet an indiscriminate collection of information is one of the things that Wikipedians are adamant that Wikipedia is not.

This is like being told that a problem in Sage or QuickBooks that is causing your tax return to be filled out with gibberish is not a bug, but a feature.

The problem is that there is a massive disconnect between Wikipedia’s users—casual visitors who often don’t even bother to create an account—and its overlords—the regular, active Wikipedians with edit counts in the thousands or even tens of thousands and an encyclopaedic knowledge and understanding of its policies. It is at its most striking in the whole inclusionist versus deletionist debate. And the deletionists are alienating a lot of would-be Wikipedians.

It turns out that this is one of the biggest criticisms levelled at Wikipedia by occasional editors. People come onto the site knowing nothing of Wikipedia’s policies, but plenty about some (possibly very niche) subject. They make half a dozen or so edits, then return a week later to find that their article has been deleted with no apparent explanation. Or perhaps it will be flagged with a deletion debate, crammed full of arcane and cabalistic abbreviations such as WP:NFT, WP:NOTE, WP:V, WP:WAX, WP:SOAP, WP:IAR, and so on, all pointing to Wikipedia’s byzantine and convoluted policies, guidelines and procedures. What kind of impression does this leave on the casual editor? That Wikipedia is a hideout for a bunch of antisocial, bureaucratic teenage control freaks: a kind of online equivalent to the kids on the beach who kick the sandcastle you’ve just spent three hours building into your face. And since first impressions count the most, they will go off, never contribute anything else, and rant on blogs and forums about how insular and out of touch with Real Life these Wikipedians are.

Why is this harming Wikipedia? Because these are the people who contribute the overwhelming majority of substantive, meaningful content to the site.

This study by Aaron Swartz will be particularly enlightening to anyone who doubts this claim. His research on a data dump of Wikipedia indicated that most contributions of actual substantive content are made by new and casual users, many of whom never even create an account and most of whom only make a handful of edits to the site. Regular Wikipedians, on the other hand, tend to spend most of their time tidying things up—moving text around, correcting spelling mistakes, wikifying things—and deleting stuff.

I’ve sometimes looked at these deletion debates and wondered how many of the people voting for deletion with reference to obscure areas of Wiki policy even begin to understand the subject matter of the article under discussion itself. Some of the arguments for deletion of Colemak are laughable for starters. They’d have us believe that nobody uses it (a brief glance at the activity on the forums and the Facebook group and even the AfD debate itself will quickly dispel this notion); that X11 is an anarchic free-for-all where you could submit a patch containing a rootkit backdoor and it would be accepted; and that the only way to enable Colemak in Ubuntu is to edit some obscure and dangerous config file where it’s buried in a list of gazillions of options and a slight typo will make your computer unbootable.

Certainly, searches for reliable sources are usually cursory: no hits on Google News, no hits on Google Scholar, so delete. Blogs are automatically not considered reliable sources, even if they’re written by experts in the industry such as Tim Bray, Simon Willison or Jeff Atwood. In fact, Jeff Atwood’s Wikipedia entry also fell foul of the deletionists a year ago, when Stack Overflow was in public beta, which shows just how completely out of touch with reality they are. (Incidentally, web development is one area in particular where WP:RS is a very bad metric for notability, simply because it’s an industry where a lot of key activity happens at the grassroots level. The sources that web developers regard as reliable enough for practical purposes are generally high profile blogs like Jeff’s, while the academics writing papers on how to use lines of code per day as a productivity metric are frequently regarded as an irrelevance at best and harmful at worst.)

There’s also a lot of bluster and bullying that goes on when the deletionists crop up. Throwing acronyms around sends a signal to newbies that they’re not welcome. If you Twitter about a deletion debate, you’re accused of canvassing and booed off. Anonymous accounts and new users are often regarded with suspicion as potential sock puppets. Most people find it hostile and intimidating, and perhaps even a bit childish, but the deletionists don’t care. They’re so obsessed with making Wikipedia what they think it should be that they’ve completely lost sight of the end users.

16
Dec

Can your database versioning tool do this?

I’ve been evaluating DbGhost recently. It’s one of those database lifecycle management tools that, at first glance, seems to be based around the whole schema comparison/data comparison approach.

Unfortunately, I distrust this approach intensely simply because it’s such a leaky abstraction. Nevertheless, people who have used DbGhost tend to wax lyrical about it, and some people even report that their DBAs like it. This means that either they’re missing something about it, or else I am.

Indeed, it seems that DbGhost does allow you to throw custom scripts into the mix somehow or other, for the cases that schema and data comparison just can’t handle.

So, rather than give any specific comments on DbGhost, or any other database lifecycle management solution, I shall propose a scenario that can be used for evaluation of any tool or approach to database lifecycle management.

This scenario is completely fictitious and not related to anything I’m working on, but it represents the kind of changes that you are likely to come across sooner or later in your application lifecycle. And in recent months I’ve had to perform several database refactorings much more complex than this.

Let’s say you have a database, with a Users table, which in Version 1.0 looks something like this:

Schema for v1.0

As you can see, it is not in first normal form, as you discover when one of your users phones up complaining that they can’t register six e-mail addresses. So in version 1.1, you extract the Email fields to a separate table:

Schema for v1.1

This can be done using a SQL query like this:

create table UserEmails(
    EmailID integer not null identity(1, 1) primary key clustered,
    UserID integer not null foreign key
        references Users(UserID)
        on update cascade on delete cascade,
    Email nvarchar(100)
)

insert into UserEmails(UserID, Email)
    select UserID, Email from Users
        where Email is not null and Email != ''

insert into UserEmails(UserID, Email)
    select UserID, Email2 from Users
        where Email2 is not null and Email2 != ''

insert into UserEmails(UserID, Email)
    select UserID, Email3 from Users
        where Email3 is not null and Email3 != ''

alter table Users
    drop column Email, column Email2, column Email3

Now one point about this refactoring is that it is impossible to complete it correctly using schema comparison and data comparison tools. This is true of any normalisation refactoring, or indeed, any refactoring where you are moving live, constantly changing data from one table to another. Another point is that these are fairly common scenarios. They are not some obscure academic concept only of interest to PhD students; it is inevitable that you’ll have to get your hands dirty and write some SQL at some stage in your application’s lifecycle if you want to hit the high notes with it. In fact, in my experience, approximately 20-30% of all database migrations that I write are beyond the scope of schema and data comparison.

So that brings you to version 1.1 of your product. Then you realise that your website-y fields are not only not in first normal form, they’re out of date. You still have a field for your users’ Pownce profiles! In case you’d forgotten, Pownce doesn’t exist any more. In fact, nobody used Pownce in the first place even though it had people like Robert Scoble fanboying it. And what about the other social networking sites that aren’t listed? Like, for instance, Flickr, Delicious, or Github? It seems that a bit more normalisation is in order:

Schema for v1.2

We’re now up to version 1.2 of our product. But it doesn’t stop there! In version 1.3, we do even more normalisation. Look at those IsAdministrator and IsStaff columns. We need to move them into a separate Roles table to give us more granular control over our website security:

Schema for v1.3

We now have three upgrades to our product, and in each upgrade, we have performed a database refactoring that to the best of my knowledge and understanding cannot be handled correctly by schema comparison and data comparison tools alone. These changes need to be scripted by hand; there are no two ways about it.

So here are my questions for DbGhost, or for any competing product or process for managing your database changes:

  • Does it give you a means to upgrade this database to version 1.3 from any previous version, be it 1.0, or 1.1, or 1.2?
  • In one step?
  • Despite the fact that none of the migrations are possible using schema comparison or data comparison tools?
  • And is the process idiot-proof, intuitive and well documented?

This kind of thing is very straightforward with a migration-based approach, such as that offered by Ruby on Rails. I haven’t yet figured out how to do it with a comparison based tool such as DbGhost, but if it can handle it nonetheless, I will be most impressed.
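To show what I mean by a migration-based approach, here is a minimal sketch in Python, using SQLite as a stand-in for a real database server. It is not how Rails or any particular tool does it. The cut-down Users table, the UserRoles table and the use of SQLite’s `user_version` pragma to record the schema version are all assumptions made for the example. The 1.1 step mirrors the SQL shown earlier; the 1.2 step is left out because its columns aren’t spelled out in the text, and the old columns are not dropped because that needs a recent version of SQLite.

```python
import sqlite3

# Ordered list of (version label, statements). Each step is hand-written SQL.
MIGRATIONS = [
    ("1.1", [
        "create table UserEmails("
        " EmailID integer primary key autoincrement,"
        " UserID integer not null references Users(UserID),"
        " Email text)",
        "insert into UserEmails(UserID, Email) select UserID, Email"
        " from Users where Email is not null and Email != ''",
        "insert into UserEmails(UserID, Email) select UserID, Email2"
        " from Users where Email2 is not null and Email2 != ''",
        "insert into UserEmails(UserID, Email) select UserID, Email3"
        " from Users where Email3 is not null and Email3 != ''",
    ]),
    ("1.3", [
        "create table UserRoles(UserID integer not null, Role text not null)",
        "insert into UserRoles(UserID, Role)"
        " select UserID, 'Administrator' from Users where IsAdministrator = 1",
        "insert into UserRoles(UserID, Role)"
        " select UserID, 'Staff' from Users where IsStaff = 1",
    ]),
]


def upgrade(conn, migrations=MIGRATIONS):
    """Apply every migration newer than the database's recorded version."""
    current = conn.execute("pragma user_version").fetchone()[0]
    applied = []
    for number, (label, statements) in enumerate(migrations, start=1):
        if number <= current:
            continue                      # already applied to this database
        conn.execute("begin")
        try:
            for sql in statements:
                conn.execute(sql)
            conn.execute("pragma user_version = %d" % number)
            conn.execute("commit")
        except Exception:
            conn.execute("rollback")      # leave the database consistent
            raise
        applied.append(label)
    return applied


def create_v1_0(conn):
    """Build a cut-down, assumed version of the v1.0 Users table."""
    conn.execute(
        "create table Users(UserID integer primary key, Email text,"
        " Email2 text, Email3 text, IsAdministrator integer, IsStaff integer)")
    conn.execute(
        "insert into Users values"
        " (1, 'a@example.com', 'b@example.com', '', 1, 0)")


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:", isolation_level=None)
    create_v1_0(conn)
    print(upgrade(conn))   # applies every pending step
    print(upgrade(conn))   # nothing left to do
```

Because the database records which steps it has already had, the same `upgrade` call brings any earlier version up to date in one go, whichever version it starts from.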

30
Nov

Keep your passwords safe with KeePass

Website logins scare me. It’s frightening how many incompetent and/or lazy and/or irresponsible web developers there are out there who see nothing wrong with storing passwords in plain text in a database, and even worse, give attackers wiggle room to find them by peppering their code with SQL injection vulnerabilities.

Unfortunately, with so many different websites implementing their own login systems, inevitably you have to create dozens of different accounts. And to get round this, pretty much everyone re-uses their passwords all over the place.

The result of this is that if you register on, say, a Christian dating website that subsequently gets hacked, you run the risk of your Facebook account being compromised.

But it simply isn’t practical to have a different password for every site you register on.

Or is it?

Recently I decided to do something about it, so I downloaded and installed KeePass. It’s a Windows program that keeps all your passwords in a strongly encrypted database, allowing you to have different passwords for every site where you have an account, and make them as strong as the site will allow. It has an auto-type feature, where you can get it to enter your user name and password into a web input form for you, and there is a version that you can save on a USB key disk and run on any computer, even if you don’t have administrative rights on it.

image

With a tool such as this, you can make your passwords as strong as you like. I set the password generator to choose 25 character passwords containing any kind of character that it’ll give me: letters, numbers, punctuation marks, brackets, you name it. Passwords such as these would keep all the computers in the world guessing well into the Degenerate Era.
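If you’re curious what generating a password like that involves, here is a minimal sketch in Python. It is not KeePass’s own generator, just an illustration using the standard library’s `secrets` module; the `generate_password` function and the choice of alphabet are assumptions for the example.

```python
import secrets
import string

# Letters, digits and ASCII punctuation: 94 characters in total.
ALPHABET = string.ascii_letters + string.digits + string.punctuation


def generate_password(length=25, alphabet=ALPHABET):
    """Return a random password drawn from a cryptographically secure source."""
    return "".join(secrets.choice(alphabet) for _ in range(length))


if __name__ == "__main__":
    print(generate_password())
```

With 94 possible characters in each of 25 positions, that works out at roughly 164 bits of entropy, which is far beyond the reach of brute force.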

I’m now trying to remember all the websites where I’ve ever registered an account, so I can change my password on all of them. I’ve done all the high risk ones that I use regularly, such as my bank, my web hosting, Facebook and so on. Google has been jogging my memory on various other ones — some of which I had forgotten even existed.

28
Oct

A day of Stack Overflow

Half a dozen or so of us from work were at the London Stack Overflow Dev Days event with several hundred other developers today. I’ve been pretty impressed with the way Jeff Atwood and Joel Spolsky’s enterprise has turned out to be such a resounding success, and I’ve also been an avid reader of Jeff’s blog, Coding Horror, for several years now, so I was naturally delighted to get the opportunity to go.

Encountering Joel and Jeff in real life was an interesting experience, since I’ve only ever read their blogs and Twitter feeds up to now. Over the past year or so, I’ve had to get used to seeing certain people in real life that most people only ever see on TV or on the Internet, but it still seems a bit odd when you do. It certainly gives you a totally different impression of them from what you had before though. You can certainly see why Joel and Jeff in particular are both so successful in what they do: as well as being excellent online communicators, they are both brilliantly engaging and entertaining public speakers. So too was Jon Skeet, who gave a very funny talk about localisation entitled “Humanity: Epic Fail,” assisted by a sock puppet called Tony the Pony. Joel’s talk on FogBugz was a pretty hard sell, but it certainly looks impressive, boasting a feature set that makes Trac look like Notepad.

The other talks included an introduction to Python by Michael Sparks of the BBC, who explained to us Peter Norvig’s 21 line spelling corrector (it didn’t escape my attention that Jon Skeet spent the lunch break porting it to C#); introductions to mobile development for no fewer than three rival platforms (Google Android by Reto Meier, iPhone by Phil Nash, and Qt/Nokia by Pekka Kosonen); introductions to jQuery (Remy Sharp) and Yahoo! Developer Tools (Christian Heilmann); and an academic talk on “How not to design a scripting language” by Paul Biggar, who recommended the book “Engineering a Compiler,” by Cooper and Torczon as a superior alternative to the Dragon Book.

I spoke to Jeff during the afternoon break and asked him if he had any plans to publish the best of Coding Horror in a book. He said he’d thought about it a bit, but wasn’t entirely convinced it was worth doing. It’s something I’ve recently thought that he’d do well to do: a lot of his posts are ones I’d consider “must-reads” for every working developer, and if he did, I’d buy it like a shot. He wouldn’t be the first person to do something like that either; after all, Joel did it (twice), and so did Raymond Chen. It was interesting what he asked me when I told him I work for Parliament: he was most interested to know whether Britain is part of Europe or not. It’s a good question, that. Officially we are, but unofficially I sometimes think that as a country, we’re not entirely sure ourselves.

There were just two disappointments to the day. One was the catering. I was half expecting something along the lines of a buffet lunch—after all, I do tend to think of the Fog Creek Way as one where they go the extra mile to get these things perfect—but it turned out to be the kind of mass produced sandwiches that you get in a motorway service station that are all ridiculously overpriced, taste exactly the same as each other, and don’t meet with my approval anyway because they’re spread up with margarine. The other disappointment was the venue itself. Kensington town hall simply is not big enough for however many of us (800? 1000?) were there today. Consequently it felt very crowded and claustrophobic, and even a little bit uncomfortable, especially during the breaks when we all crowded into the foyer and had to form a queue stretching seemingly all the way to Barking and back to get to the food.

The day ended at about ten past six and I came away with a whole lot of freebies: a Qt rucksack, a copy of the Aardvark’d DVD, a handful of FogBugz pens, and a handful of Stack Overflow, Server Fault and Super User stickers. All in all, it was a pretty full day (I had to get up half an hour earlier than usual and I got home an hour and a half later than usual, and sitting through seven hours of talks was pretty intense) but it was well worth it.

26
Oct

How to validate a URL in .NET

System.Uri.TryCreate.

You don’t need to use regular expressions.

More generally, if you are trying to do something extremely common, the chances are that whatever framework you’re using, there’s a method or function somewhere in there which will do it for you. And it will almost certainly do it much better than your home-brewed solution will.