Programming


04 May

How to match any character (including newlines) in a JavaScript regular expression

There is a little gotcha with JavaScript regular expressions. The . (dot) character, which supposedly matches any character, does not match newlines.

Now this is actually standard (if somewhat counter-intuitive) behaviour in regular expressions in most languages, but it can be changed, for example, by setting the RegexOptions.Singleline option in .NET, the /s modifier in Perl, or the PCRE_DOTALL option in PHP.

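Unfortunately, at the time of writing there doesn’t seem to be a corresponding option in JavaScript. (ECMAScript 2018 later added the s, or dotAll, flag, but older browsers do not support it.)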

However, there is a workaround. The \s character class matches any white space character (including carriage returns and line feeds), whereas the \S character class matches any non-whitespace character, i.e., anything not included in \s. Since every character is either whitespace or not, the class [\s\S] matches any character at all. So if you want to match any character in JavaScript, including newlines, using [\s\S] instead of the dot should do the trick.

For example, to extract the contents of the <body> section of an HTML document:

/<body[^>]*?>([\s\S]*)<\/body>/.exec(html)
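Here is a minimal runnable comparison in JavaScript, using a made-up HTML string: the dot stops at the first line break, so the match fails, while [\s\S] carries on to the closing tag. For comparison it also shows the s (dotAll) flag, which only works in engines that implement ES2018 or later. Regular expressions are fine for quick extraction like this, but a proper HTML parser is safer for anything more involved.

```javascript
// A made-up HTML document whose <body> spans several lines
const html = '<html>\n<body class="main">\n<p>Hello</p>\n</body>\n</html>';

// The dot does not match "\n", so this pattern can never reach </body>
const withDot = /<body[^>]*?>(.*)<\/body>/.exec(html);

// [\s\S] matches every character, so the capture can span line breaks
const withClass = /<body[^>]*?>([\s\S]*)<\/body>/.exec(html);

// ES2018+ only: the s (dotAll) flag makes the dot match line breaks too
const withDotAll = /<body[^>]*?>(.*)<\/body>/s.exec(html);

console.log(withDot);                        // null
console.log(JSON.stringify(withClass[1]));   // "\n<p>Hello</p>\n"
console.log(withClass[1] === withDotAll[1]); // true
```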

30 Apr

Why I hate web.config

One thing that is vital when deploying web applications is that you should be able to reduce the process of deploying upgrades and changes to as few steps as possible. Furthermore, every step should be a no-brainer — so simple that the scope for fat fingering something is strictly limited.

This kind of thing is acceptable:

  1. Get the appropriate stable build from your daily build server.
  2. FTP it onto the web server into a directory in an appropriate location. (Even better: have an option in your build script to do this automatically.)
  3. Change the IIS settings to point to the new version.
  4. You’re done!

Now in order to do this effectively, you need to build some foundations into your project. You need to isolate every setting that varies between your production environment and your developer box and put them in a separate location outside the website’s hierarchy that does not change from build to build.

These settings are purely server-specific. They change from one machine to the next and will be different between developer machines and the production server. Examples include connection strings, SMTP server details, custom errors and trace settings. They aren’t necessarily stored in your source control, except as a sample file for documentation purposes, and they should definitely not be deployed afresh to the server with every build.

There are other settings that are tied much more closely to the code itself. Examples include HTTP handlers and modules, assemblies referenced in the <compilation> section, and all the additional stuff that ASP.NET Ajax or ASP.NET 3.5 adds to tell it that you’re using the C# 3.0 compiler, not the C# 2.0 compiler. These settings may change from one build to the next, but they are the same on every machine where they are used. They are, to all intents and purposes, code, and should be treated as such, kept in your source control, and deployed unmodified to the server with every new build.

Unfortunately, web.config mixes the two willy-nilly in a thoroughly cavalier way, with the result that there are several additional, more complex and error-prone steps that you need to take:

  1. Locate the previous build.
  2. Copy the web.config file into the new build.
  3. Merge in the changes manually.

These steps are less straightforward and provide much more scope for error. What if you forget to do them, make a dog’s dinner of merging in the changes, or worse, introduce some subtle and mysterious bug that isn’t there on your development machine?

ASP.NET 2.0 added a new feature to sort this mess out. You can now specify an alternative file for your <appSettings> section. By doing <appSettings file="..\myappsettings.config" /> you can even specify a file outside your web application root. Whoopee! Problem solved!

Not so fast. What about the settings that don’t fit into <appSettings>? For example, connection strings now go in the <connectionStrings> section; custom errors should be enabled on the server but disabled on your development box; tracing should be enabled on your development machine but not on the server; and so on.

It turns out that these too have an option to allow you to reference external files. You can set, say, <connectionStrings configSource="blah" /> to put your connection strings in a separate file. Unfortunately, unlike with <appSettings>, you can’t put this outside your application root.

Meh. Why not??? This is a major pain in the neck — especially for <connectionStrings>.

To make matters worse, there are some elements that straddle both camps. <compilation> is the most obvious example. It needs to have the attribute debug="true" on a development server, but in production you will need to insert debug="false" for improved performance. However, within your <compilation> element, you have a list of additional assembly references for things such as the ASP.NET Ajax extensions. And you can’t put these in a separate file.

All in all, configSource and <appSettings file="blah" /> go some of the way towards solving the deployment problem. Unfortunately, they still have awkward limitations that hobble the process and make them a major annoyance.

23 Apr

The two golden rules of exception handling

The standard advice that tends to get bandied about on when to throw exceptions is probably the most useless piece of non-advice I have ever come across.

“You should only throw exceptions in conditions that are exceptional.”

What on earth is that supposed to mean? It’s like saying “You should only eat food that is edible.” And that doesn’t tell you, for instance, that Agaricus bisporus is edible but Amanita muscaria isn’t.

However, there are actually two very simple guidelines that tell you, in terms that are much easier to understand, exactly when to throw an exception and when to catch one. They are:

1. Throw an exception when your method can’t do what its name says it does.

Scott Hanselman blogged about this a while ago and it is probably the best piece of advice I’ve read on the subject:

If your method is called “Save” and it can’t Save, then throw. If it’s called DoSomething and it can’t DoSomething, throw. The idea is that the method name is a verb and a contract. It’s promising to do its best and if it can’t do it, it’s very likely exceptional.

It’s clear, unambiguous, to the point and easy to understand. Each method should do one thing, its name should say what that one thing is, and if it can’t do that one thing, that is when you throw an exception.
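As a minimal JavaScript sketch of rule 1, consider a hypothetical DocumentStore class (not from any real library). Its save method throws when it cannot do what its name promises. By contrast, find only promises to look, so a missing document is reported by returning undefined rather than by throwing. Treating lookups this way is one reading of the rule, not something Hanselman spells out.

```javascript
// Hypothetical in-memory store with a fixed capacity, used only for illustration.
class DocumentStore {
  constructor(capacity) {
    this.capacity = capacity;
    this.docs = new Map();
  }

  // "save" is a contract: the document ends up stored, or the caller gets an exception.
  save(id, content) {
    const isNew = !this.docs.has(id);
    if (isNew && this.docs.size >= this.capacity) {
      throw new Error(`Cannot save ${id}: store is full`);
    }
    this.docs.set(id, content);
  }

  // "find" only promises to look, so a missing document is a normal result, not an error.
  find(id) {
    return this.docs.get(id); // undefined when absent
  }
}

const store = new DocumentStore(1);
store.save("a", "first");
try {
  store.save("b", "second");
} catch (err) {
  console.log(err.message); // Cannot save b: store is full
}
console.log(store.find("b")); // undefined
```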

2. Don’t catch an exception unless you intend to do something about it.

Now the first rule is about throwing exceptions; it doesn’t say anything about catching them. However, Patrick Cauldwell, whom Scott links to, has this to say:

Never catch Exception

  • If it’s not a problem you know about, delegate up the call stack, because you don’t know what to do about it anyway

Far too often I see code that catches and silently discards all exceptions. This may give your program a superficial appearance of being more robust, but in actual fact, it is merely masking a problem that may need to be addressed and dealt with.

Remember that an exception means that something has gone wrong. Unless you know exactly what to do about it to recover from the situation, you should let it propagate up to the top level of your program, where you should log it, and, if appropriate, fire off an e-mail to the developers and/or systems administrator to let them know about it.

There may be times when you need to catch Exception. Transactions may need to be rolled back, resources may need to be disposed, and so on. However, unless you can recover from the situation that caused the exception in the first place, you should re-throw it. And even if you do swallow exceptions for any reason (and you shouldn’t do so unless you have a good reason), you should never do so silently: the very least you should do is record a warning in the application’s event log.
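The following JavaScript sketch shows the catch-clean-up-rethrow pattern described above. The transaction object, runInTransaction and main are all hypothetical stand-ins, not a real database API. The middle layer catches everything only so it can roll back, then re-throws because it cannot fix the cause. The top level is the one place that handles the error, and it records it rather than discarding it. In a real application that is where you would also log properly and notify someone.

```javascript
// Hypothetical transaction object standing in for a real database API.
function createTransaction() {
  return {
    committed: false,
    rolledBack: false,
    commit() { this.committed = true; },
    rollback() { this.rolledBack = true; },
  };
}

// Catches everything only to clean up, then re-throws: it cannot fix the problem itself.
function runInTransaction(tx, work) {
  try {
    work();
    tx.commit();
  } catch (err) {
    tx.rollback(); // undo partial changes, whatever the cause
    throw err;     // let the failure carry on up the call stack
  }
}

// Top level: the one place that deals with unexpected errors, by recording them.
function main(errorLog) {
  const tx = createTransaction();
  try {
    runInTransaction(tx, () => {
      throw new Error("insufficient funds");
    });
  } catch (err) {
    errorLog.push(`ERROR: ${err.message}`); // recorded, never silently discarded
  }
  return tx;
}

const errorLog = [];
const result = main(errorLog);
console.log(result.rolledBack); // true
console.log(errorLog[0]);       // ERROR: insufficient funds
```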

14 Apr

If you think you don’t need source control, you haven’t understood it

I have a friend who does not use source control for his programming projects. As far as I can tell, it’s a conscious and deliberate decision on his part, and although he has his reasons, I’ve never got round to asking him what they are. However, I doubt if they are very good ones.

Source control is the very bedrock of modern software development, yet it’s surprising how many developers there are like him, who still don’t see the value of it. One common argument that crops up from time to time is “I don’t use source control on any of my projects because I’m the only person working on them.” This is really a rather lame excuse, because no-one making such an argument would say “I don’t back up my projects because I’m the only person working on them,” nor would they say “I don’t use the undo button because I’m the only person working on my projects.”

For that is exactly what source control does. It provides you with a complete history of the changes you’ve made to your project, when you made them, and (assuming you’ve filled in the comments with each commit), why. If you have ever spent three days working on your code, only to find that your changes aren’t working out or are becoming very messy, and you’ve needed to roll back, only you don’t have a suitable snapshot to roll back to, you will know exactly what I mean.

Sadly, there is a fairly widespread misconception knocking around that source control is only useful for large development teams. This isn’t helped by the fact that companies such as Microsoft promote it as such. Visual Studio only includes source control with the (more expensive) Visual Studio Team System, whose very name says to solo developers, “Nothing to see here, move along please.” And many articles on source control (even including Joel’s comments on the subject) concentrate more on the team aspects of source control than on what it can offer for solo developers.

Another widespread misconception is that setting up a source control system is too much effort, or too expensive, or requires a separate server, and is overkill for small projects. Again, this is completely false. Subversion, probably the most popular source control package in the world, is free and open source. TortoiseSVN is a Windows client for it that installs as a shell extension and gives you a whole lot of easy-to-use source control features from within Windows Explorer. You can even create a repository in any empty folder on your hard disk with only a couple of mouse clicks.

I’ve been using source control with Subversion/TortoiseSVN for three years now, and I am embarrassed that I didn’t get started much earlier. To be sure, there is a bit of a learning curve, and there are a few gotchas that you need to watch out for, but really, it isn’t rocket science by a long shot, and it certainly isn’t overkill even for fairly minor projects and scripts. If you are writing any kind of software and aren’t already using source control, I strongly recommend you get started. And if you think you don’t need it, I recommend you take another look at it, because you’ve almost certainly misunderstood something.

07 Mar

Copycat frameworks

(Update: it appears that I misjudged Grails here. The author of Grails has advised me that it is built on mature, tried and tested Java technologies such as Hibernate, Spring, and so on, and it seems that Groovy is not just another random programming language but an extension of Java itself to incorporate language features such as closures, dynamic typing and operator overloading. Unfortunately I won’t be able to attend the lecture myself, but it may well be worth checking out if you are a Sussex-based Java developer. More details are in the comments.)

I was asked today if I’m interested in going to a lecture at Sussex University on a new web framework called Grails, which is written in a language called Groovy that runs on the Java platform.

Not really.

One glance tells me it’s Yet Another Rails Copycat. It seems that everyone and his dog are writing them these days, and most of them are completely unnecessary. If I really wanted to do something Rails-ish on the Java platform I’d use Rails with JRuby.

The fact of the matter is that while it’s worth knowing two or three different web frameworks, some of them are just too niche to bother with. Groovy is currently at number 32 in TIOBE’s Top Fifty, just above PL/I, Smalltalk and Haskell, with half the popularity of Fortran and a third of the popularity of Scheme. And nobody except Paul Graham writes web applications in Scheme.

03 Mar

Derailed

This evening, I decided to have another go at Ruby on Rails. I’ve dipped my toe in the water a couple of times but I’ve never managed to spend more than an evening or two on it so far, and I thought it was about time I learned it properly. So I got out my copy of Agile Web Development with Rails and started to work my way through it.

Only to find that putting scaffold :model in your controller — one of the biggest selling points of Rails, and one of the first things that all the tutorials teach you — no longer works in version 2.0. Turns out that it has been relegated to a plugin, which doesn’t work without a second plugin, which I couldn’t get to install.

The new scaffolding doesn’t do what I want it to either. I just wanted to scaffold a new controller and associated views that would work on an existing model. The new approach only allows you to create your scaffolding in tandem with a new model, which is not what I wanted.

This does not inspire confidence. If the upgrade to 2.0 breaks one of the very first things that Rails n00bs like me have to learn, I shudder to think what it does to more advanced functionality.

Can somebody please explain to me why Rails is supposed to be so cool?

16 Feb

Pro JavaScript Techniques

Since I started taking JavaScript seriously just over a year ago, I’ve found myself a bit disappointed with most of the online resources for it that are knocking around. The main ones seem to concentrate solely on the basics, and tend to be aimed at beginners — people who are happy to write code in a purely procedural manner and just want the basic information needed to get the job done, even if it does mean writing gratuitous amounts of copy and paste code.

Personally, I find this frustrating. I’ve said before that I think of JavaScript as the new Scheme — so with that in mind, anything that treats it as if it were merely client-side PHP will naturally be something of a disappointment. Perhaps this is a case of quidquid latine dictum sit, altum videtur on my part, but I like to use closures, lambdas, iterators, generics, LINQ and so on in my code to maximum effect. I am also firmly of the opinion that every professional developer needs to be familiar with these concepts too — after all, they show that you have the kind of mind that can handle the complexities of software development, and won’t stumble over the FizzBuzz problem.

So when I came across Pro JavaScript Techniques by John Resig on Thursday, I thought it sounded like a breath of fresh air. Resig is something of a JavaScript guru: he is the lead developer of jQuery, and really knows his stuff, so I expected it to hit the mark in this respect. I ordered it straight away through Amazon.co.uk, and it arrived pleasingly promptly yesterday lunchtime.

The book certainly does not disappoint. Following a short introductory chapter, it gets right down to business with a chapter on the more advanced features of the language such as closures, currying, scoping rules, and how to make full use of JavaScript’s somewhat unorthodox, prototype-based approach to object oriented programming. This is followed by advice on how to write reusable code, unit tests for your scripts, and how to enforce good code conventions with tools such as JSLint. The rest of the book focuses on the practicalities of real world JavaScript as it works in the browser, with chapters on the DOM, events, CSS and forms, and Ajax, and ties it all together with several practical examples including an image gallery, an Ajax search box, an Ajax wiki, and a Google Reader-style “never-ending” WordPress theme. He treats the subject in a fair amount of depth, about as thoroughly as you can in 350 pages, covering gotchas and issues with common browsers along the way, and points to resources on the web where you can find out further information.

This isn’t a book for complete JavaScript novices — it assumes a certain amount of familiarity with the language, and is written more from a perspective of adopting professional best practices and producing high quality code rather than from simply getting you up and running quickly, so at least some experience of JavaScript is necessary. However, it is not a difficult read, and a competent developer with at least some basic JavaScript experience should find it fairly accessible.

The only downside to it is that as it is now two years old, there are already one or two omissions that date it somewhat: it does not cover Internet Explorer 7 or Safari for Windows, for instance, and it still recommends testing your code against Internet Explorer 5.5, which has since pretty much fallen off the radar. However, all the content is still applicable today, and no doubt will remain so for quite some time to come. Personally, I would recommend it to any professional web developer who wants to improve their JavaScript skills. (And if you are a professional web developer, you jolly well should be improving your JavaScript skills.)

14 Jan

Missing ASP.NET tab in IIS on Windows Server 2003

One thing that’s been at the back of my mind since I upgraded to Windows Server 2003 is that the ASP.NET tab in IIS had taken a walk. Even running aspnet_regiis -i did nothing to solve the problem. Up till now I had no cause to fix it, but today I had to troubleshoot one of our apps that is (still!) using ASP.NET 1.1. This meant I needed to configure it to run version 1.1 in order to debug it using Visual Studio 2003. Might as well get to the bottom of this while I’m at it.

It turns out the problem is something to do with VMware Server, which I also have on the same machine, hosting a couple of instances of Ubuntu for my PHP work. For some reason this conflicts with the ASP.NET tab on Windows Server 2003. Fortunately, a quick Google search led me to this post, which has a fix described in the comments.

So, if you have the same problem:

  1. Stop IIS using iisreset /stop
  2. Open the file C:\WINDOWS\system32\inetsrv\MetaBase.xml in Notepad.
  3. Find and delete the line that says Enable32BitAppOnWin64="TRUE"
  4. Restart IIS using iisreset /start
  5. If you still don’t see your ASP.NET tab, aspnet_regiis -i should now work.
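If you would rather script step 3 than edit the file by hand, here is a minimal JavaScript (Node.js) sketch. It is written as a pure string function with a hypothetical name, so it can be checked without touching a real server. It assumes, as the steps above do, that the attribute sits on a line of its own. A line where the attribute shares space with other text is deliberately left alone. You would still need to stop IIS and back up MetaBase.xml yourself first.

```javascript
// Sketch of step 3 as a text transformation: drop every line whose only content
// is Enable32BitAppOnWin64="TRUE", leaving all other lines exactly as they were.
function removeEnable32BitLine(metabaseXml) {
  return metabaseXml
    .split("\n")
    // trim() also strips a trailing \r, so CRLF files are handled too
    .filter((line) => line.trim() !== 'Enable32BitAppOnWin64="TRUE"')
    .join("\n");
}

// Made-up fragment for illustration only; it is not copied from a real MetaBase.xml.
const sample = [
  "<SomeElement",
  '\tEnable32BitAppOnWin64="TRUE"',
  '\tOtherSetting="1"',
  ">",
].join("\r\n");

const cleaned = removeEnable32BitLine(sample);
console.log(cleaned.includes("Enable32BitAppOnWin64")); // false
console.log(cleaned.split("\r\n").length);              // 3
```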

11 Dec

Volta, GWT and leaky abstractions

There’s been quite a bit of hype recently about Volta, the latest and greatest offering from Microsoft. It’s a bit like the Google Web Toolkit or RJS in Ruby on Rails, in that it allows you to write everything in C# and have it translated into JavaScript. You don’t even have to use C#—you could just as easily use VB, since it works on the compiled MSIL, converting that into JavaScript. It allows you to split your application at the lower tiers as well, automatically generating web services so that you can put, say, the user authentication part of your application on a different server to the main site.

It sounds like a good idea in theory, and no doubt it will attract quite a bit of attention from developers who do not want to have to learn yet another programming language. The main attraction of this kind of framework is for developers who are frightened off JavaScript by all the cross-browser insanities and the useless, bizarre and often totally misleading messages that Internet Explorer throws up when it encounters a JavaScript error. The old “Syntax error in line 0” syndrome. There is also the issue of testing on multiple browsers on multiple operating systems. But hey, now we can write JavaScript without writing JavaScript!

But is it really necessary?

About a year ago, I would have given an emphatic “yes” in answer to that question. However, a lot has happened in the JavaScript world in the past year and a half. We now have free virtualisation software and Intel Macs, so you can run several different operating systems—Windows XP, Windows Vista, Windows Server 2003, Linux and Mac OS X—on the same machine if you are that way inclined, making cross browser testing a whole lot easier. Firebug turns Firefox from a humble browser into a powerful debugging tool. JavaScript frameworks such as Prototype, Scriptaculous, jQuery and Dojo abstract away all the nasty cross-browser stuff, allowing you to discover just how nice a language JavaScript really is. And on top of that, they give you transitions, drag and drop, thickboxes, fade anything techniques, and a whole lot of other eye candy and cool stuff as a bonus.

I’m also rather sceptical of the whole write-language-A-in-language-B business.

Admittedly, I’ve never actually tried the Google Web Toolkit or RJS, but I suspect that while there’s undoubtedly a lot you can do with them, they aren’t the most efficient option. Writing JavaScript in C# or Java or Ruby will inevitably involve a layer of abstraction, and all abstractions are, to a greater or lesser extent, leaky.

Now before you shout me down on this one, yes, I know that Prototype and jQuery are abstraction layers, and therefore may well have plenty of leaks of their own. However, my point is that converting between languages adds a further abstraction layer on top of that, making the whole thing even leakier.

At present, Volta has a lot of leaks. Late binding is not supported, for instance, which means that languages such as IronPython, IronRuby or PHP (via Phalanger) are effectively ruled out, and Visual Basic requires you to use Option Strict. This seems a bit surreal given that JavaScript itself is also a dynamically typed language, but it is a consequence of the fact that it all goes through the statically typed MSIL intermediary, and support for reflection (which is needed to simulate dynamic typing on a statically typed platform) is severely limited.

I am also sceptical of the benefit of being able to move parts of your application between the client and the server. Far from making things simpler, this could introduce a whole new can of worms if it is not carefully thought out, partly in terms of performance, particularly if you end up with a very chatty interface between them, but much more seriously, in terms of security. Maintaining state across multiple tiers is also very difficult if not impossible to abstract completely transparently, and it will be interesting to see how they tackle this problem.

However, it is probably a little unfair to knock it too much at this stage. Volta is only a technology preview and pretty experimental, so obviously some of these leaks will be patched as it matures. On the other hand, undoubtedly other leaks will remain and may even prove impossible to patch—in particular, performance will never be the same as with pure vanilla JavaScript, and download sizes will still be greater. So by all means check it out if you like, but as far as I’m concerned, if a task calls for JavaScript, JavaScript is what I intend to use.

15 Nov

Password Reminders Considered Harmful

How does your website handle users who have forgotten their password?

Chances are, you ask for their e-mail address, look them up, extract their password from the database, and e-mail it to them. Nice and simple, and convenient for the end user, and easy to program.

Unfortunately, it is seriously and dangerously flawed.

Almost everyone re-uses login details across multiple web sites. It simply is not realistic to expect them to do otherwise. As a result, if an attacker manages to compromise your user database, they will be able to impersonate your users on potentially thousands of websites, including some that store their credit card details.

Never think you are immune to this. It happened to Reddit, a popular user-generated news site similar to Digg, and it can happen to you. It is very difficult to be 100% sure that your database will never fall into the wrong hands: unless you have enterprise-level security staff, infrastructure, procedures and budget, every single person involved with your data will be a weak link in the chain, from the developers to the DBAs to the dodgy geezer who comes in as a contractor to do the building’s networking. Do you know where all the copies of your data are — even the partial, out of date ones that your developers use for testing? Are you sure there aren’t any hanging around on backup CDs, USB key disks, laptops, or old PCs that you are throwing out?

No, you should never store your users’ passwords directly in a database. Instead, you must store a salted hash: the output of a one-way hash function applied to the password combined with a random per-user salt, which makes it impossible (or at the very least, computationally very expensive and impractical) to work back to the original password.
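Here is a minimal sketch of salted hashing in JavaScript for Node.js, using only the built-in crypto module. The post does not name an algorithm. scrypt is swapped in here as one deliberately slow choice, and bcrypt or PBKDF2 would serve the same purpose. The "salt:hash" storage format and the function names are assumptions made for illustration.

```javascript
const crypto = require("crypto");

// Hash a password with a fresh random salt using scrypt, a deliberately slow key-derivation function.
function hashPassword(password) {
  const salt = crypto.randomBytes(16);
  const hash = crypto.scryptSync(password, salt, 64);
  // The salt is not secret; storing it with the hash lets us re-derive the hash at login.
  return `${salt.toString("hex")}:${hash.toString("hex")}`;
}

// Re-derive the hash from the stored salt and compare in constant time.
function verifyPassword(password, stored) {
  const [saltHex, hashHex] = stored.split(":");
  const expected = Buffer.from(hashHex, "hex");
  const actual = crypto.scryptSync(password, Buffer.from(saltHex, "hex"), expected.length);
  return crypto.timingSafeEqual(actual, expected);
}

const stored = hashPassword("correct horse battery staple");
console.log(verifyPassword("correct horse battery staple", stored)); // true
console.log(verifyPassword("hunter2", stored));                      // false
// A new salt each time means the same password never produces the same stored value
console.log(stored === hashPassword("correct horse battery staple")); // false
```

Note that nothing here can turn the stored value back into the password. Consequently, a system built this way has nothing to put in a reminder e-mail.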

Unfortunately, this means that you can’t send password reminders to your users. Instead, you have to send them a single-use link to a page where they can reset their passwords on confirmation of their e-mail address. Because of this, some people prefer to sacrifice security in favour of convenience here. In fact, if the comments that were left on Jeff Atwood’s blog when he wrote about this subject are anything to go by, sometimes this design decision is imposed on developers, against their recommendations, by their managers.
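A minimal JavaScript (Node.js) sketch of the single-use reset link follows. Everything in it is an illustrative assumption rather than a prescribed design: the one-hour expiry, the in-memory Map standing in for a database table, the function names, and the example URL in the comment. The sketch keeps only a SHA-256 hash of each token. As with passwords, this means a leaked table does not hand an attacker working links.

```javascript
const crypto = require("crypto");

// Assumption: reset links expire after one hour.
const RESET_TTL_MS = 60 * 60 * 1000;

// Hypothetical in-memory store mapping sha256(token) -> { email, expiresAt }.
const pendingResets = new Map();

function sha256Hex(text) {
  return crypto.createHash("sha256").update(text).digest("hex");
}

// Create a random, unguessable token to put in the e-mailed link.
function createResetToken(email, now = Date.now()) {
  const token = crypto.randomBytes(32).toString("hex");
  pendingResets.set(sha256Hex(token), { email, expiresAt: now + RESET_TTL_MS });
  return token;
}

// Return the e-mail address the token was issued for, or null if it is unknown,
// already used or expired. The entry is deleted on first use, so the link is single-use.
function consumeResetToken(token, now = Date.now()) {
  const key = sha256Hex(token);
  const entry = pendingResets.get(key);
  pendingResets.delete(key);
  if (!entry || now > entry.expiresAt) return null;
  return entry.email;
}

// The user would receive a link such as https://example.com/reset?token=... (hypothetical URL)
const token = createResetToken("alice@example.com");
console.log(consumeResetToken(token)); // alice@example.com
console.log(consumeResetToken(token)); // null, because the token has already been used
```

Once the token is accepted, the reset page lets the user choose a new password, which is then stored as a salted hash like any other.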

I think that Mats Helander comes up with the best response to this, when he says that it should be illegal to store passwords in a database in plain text:

Many comments on Jeff [Atwood]’s blog lamented the fact that sometimes your boss will decide for you that passwords should be stored in plaintext (or two-way encrypted using a secret key, which the hacker will of course be able to obtain as readily as your password list, meaning it’s as good as plaintext). One often suggested reason would be a requirement that the system must be able to mail back a user’s forgotten password.

In my opinion, this is one of the very rare cases where I think the law should get involved, protecting the developer from having to compromise my security in order to keep his job. The developer should be able to say “No boss, that would be against the law”.

I couldn’t agree more. Really, the extra complexity introduced by the “reset password” option is very minor, and given the potential consequences of losing your data to an attacker, seriously compromising my security in favour of convenience in this way is inexcusably reckless, especially in a day and age when identity theft is a serious and growing problem.