You know an advert is intensely annoying when you start whistling the tune from it even though you hate it. #gocompare 3 days ago
26
Mar

On web deployment

Scott Hanselman says that if you’re using XCopy for deploying web applications, you’re doing it wrong. He is talking, of course, about the web deployment features of Visual Studio 2010, which constitute Microsoft’s attempts to solve a problem that is a lot less trivial than it looks.

It’s a bit of a strong statement, and I’m not sure that I agree with it. For the past four years or so, I’ve used a variant of XCopy deployment that I’ve found to be very effective. I put each release of the website into a separate folder, numbered after the version reported by Subversion, Mercurial, CruiseControl or TeamCity, depending on which of the above I’m using, and I just switch the directory in IIS, or on Linux it’s just a case of changing a symlink. This all but eliminates downtime for the vast majority of upgrades, as well as allowing you to roll back in seconds to any previous version that you still have available if things go pear shaped.

I’d like to see how Web Deploy handles upgrades like this. My experience of software upgrades is that they are rarely that seamless and usually involve several seconds of downtime, though having said that, if your website is so busy that half a minute of downtime is a serious problem, the chances are that you have failover servers that you can bring in while you upgrade.

A more serious issue, however, is rollback. Some of the sites I work on are pretty high profile, and the ability to roll back in seconds if things don’t work out is a deal breaker as far as I’m concerned. That’s why I’ve found the XCopy/IIS settings switchover approach to be such a winner.

I am not impressed with the approach that Visual Studio 2010 adopts to managing web.config files, however. This approach sets your connection strings etc at build time, which can be pretty painful since you have to have different builds for development, integration, test, production and so on, and once you start branching and merging, and have to have separate connection strings for separate branches, it can completely blow up in your face if you’re not careful. No, configuration is a deployment time operation and needs to be treated as such. The best place for your configuration settings is outside your application root, in a common location easily accessible to every version of your site.

Finally, one last tip. Never deploy on a Friday. There are two reasons for this: first, it’s the end of the week, you’re tired, you just want to go home, and you’re much more likely to make a mistake than on a Monday when you’re fresh. Second, if something does go wrong, it will really, really, really ruin your weekend.

16
Dec

Can your database versioning tool do this?

I’ve been evaluating DbGhost recently. It’s one of those database lifecycle management tools that, at first glance, seems to be based around the whole schema comparison/data comparison approach.

Unfortunately, I distrust this approach intensely simply because it’s such a leaky abstraction. Nevertheless, people who have used DbGhost tend to wax lyrical about it, and some people even report that their DBAs like it. This means that either they’re missing something about it, or else I am.

Indeed, it seems that DbGhost does allow you to throw custom scripts into the mix somehow or other, for the cases that schema and data comparison just can’t handle.

So, rather than give any specific comments on DbGhost, or any other database lifecycle management solution, I shall propose a scenario that can be used for evaluation of any tool or approach to database lifecycle management.

This scenario is completely fictitious and not related to anything I’m working on, but it represents the kind of changes that you are likely to come across sooner or later in your application lifecycle. And in recent months I’ve had to perform several database refactorings much more complex than this.

Let’s say you have a database, with a Users table, which in Version 1.0 looks something like this:

Schema for v1.0

As you can see, it is not in first normal form, as you discover when one of your users phones up complaining that they can’t register six e-mail addresses. So in version 1.1, you extract the Email fields to a separate table:

Schema for v1.1

This can be done using a SQL query like this:

create table UserEmails(
    EmailID integer not null identity(1, 1) primary key clustered,
    UserID integer not null foreign key
        references Users(UserID)
        on update cascade on delete cascade,
    Email nvarchar(100)
)

insert into UserEmails(UserID, Email)
    select UserID, Email from Users
        where Email is not null and Email != ''

insert into UserEmails(UserID, Email)
    select UserID, Email2 from Users
        where Email2 is not null and Email2 != ''

insert into UserEmails(UserID, Email)
    select UserID, Email3 from Users
        where Email3 is not null and Email3 != ''

alter table Users
    drop column Email, column Email2, column Email3

Now one point about this refactoring is that it is impossible to complete it correctly using schema comparison and data comparison tools. This is true of any normalisation refactoring, or indeed, any refactoring where you are moving live, constantly changing data from one table to another. Another point is that these are fairly common scenarios. They are not some obscure academic concept only of interest to PhD students; it is inevitable that you’ll have to get your hands dirty and write some SQL at some stage in your application’s lifecycle if you want to hit the high notes with it. In fact, in my experience, approximately 20-30% of all database migrations that I write are beyond the scope of schema and data comparison.

So that brings you to version 1.1 of your product. Then you realise that your website-y fields are not only not in first normal form, they’re out of date. You still have a field for your users’ Pownce profiles! In case you’d forgotten, Pownce doesn’t exist any more. In fact, nobody used Pownce in the first place even though it had people like Robert Scoble fanboying it. And what about the other social networking sites that aren’t listed? Like, for instance, Flickr, Delicious, or Github? It seems that a bit more normalisation is in order:

Schema for v1.2

We’re now up to version 1.2 of our product. But it doesn’t stop there! In version 1.3, we do even more normalisation. Look at those IsAdministrator and IsStaff columns. We need to move them into a separate Roles table to give us more granular control over our website security:

Schema for v1.3

We now have three upgrades to our product, and in each upgrade, we have performed a database refactoring that to the best of my knowledge and understanding can not be handled correctly by schema comparison and data comparison tools alone. These changes need to be scripted by hand, there are no two ways about it.

So here are my questions for DbGhost, or for any competing product or process for managing your database changes:

  • Does it give you a means to upgrade this database to version 1.3 from any previous version, be it 1.0, or 1.1, or 1.2?
  • In one step?
  • Despite the fact that none of the migrations are possible using schema comparison or data comparison tools?
  • And is the process idiot-proof, intuitive and well documented?

This kind of thing is very straightforward with a migration-based approach, such as that offered by Ruby on Rails. I haven’t yet figured out how to do it with a comparison based tool such as DbGhost, but if it can handle it nonetheless, I will be most impressed.

29
Jun

Why SQL Server 2005 database projects in VSTS are a bad idea

I’ve been working a bit lately with a project that uses SQL Server 2005 database projects in Visual Studio 2008 Team System. These are different from the conventional database projects that you get in Visual Studio Professional, since they have extra features that allow you to do schema and data comparisons, and, in theory at least, manage database deployments and migrations.

The idea is that you should be able to design your database using visual designers rather than having to write all that nasty SQL code to script it for you. Visual designers make things so much easier at the planning and initial design stage, and once you are done, you can use the various schema comparison and script generation tools to generate your production database.

The problem comes when you want to manage your database’s entire lifecycle. I’m sure that many developers will have scratched their heads at some stage about this problem. You chop and change your database on your development server, most likely using the visual tools—but how do you reliably replicate these changes on your live server?

The Microsoft approach here is to rely on the schema comparison tools to generate change scripts that you can then run against your database. Some people think it’s a silver bullet. I beg to differ.

The first problem is that while schema comparisons can make a good starting point, the scripts they generate don’t always work properly out of the box, if at all. Some database refactorings simply can’t be done using schema comparison tools. Examples include normalisation refactorings such as moving data from one table or column to another; introducing constraints or changing a column’s data type when you need to do some data cleanup first; or modifying reference data. Even relatively straightforward refactorings—or even, in some cases, no refactoring at all—can be problematic: if your production and development databases get their collation orders out of sync, for instance, the script may refuse to run at all. And the thought that anyone would blindly use this option on the project properties page makes me shiver:

Perform "smart" column name matching when you add or rename a column

In other words, you’re asking it to guess what’s changed.

Testability—and when you’re dealing with an abstraction as leaky as this one, testability is vital—is another issue. Unfortunately, SQL Server 2005 database projects have serious shortcomings in this area too. They do offer unit testing features, but these only apply to the final database. There doesn’t seem to be any way of integration testing your migrations themselves: you don’t have a consistent record of what’s changed in a format that can easily be applied to a blank or reference database, so there’s no way of verifying that you’re getting the expected results when you’re going from “before” to “after.”

Then there are the change scripts generated by schema compare itself. They are a morass of long winded, convoluted, hard to maintain, spaghetti code. Adding a column to a database table involves dropping and re-creating the table: this is understandable if you need to put the column in the middle of the roster, since SQL Server does not have an AFTER clause in the ALTER TABLE statement like MySQL does, but even if you add a column on to the end, it still drops and re-creates the table. A task that needs only one or two lines of code ends up taking eighty. If you rely on schema comparison tools, sooner or later you are going to need to edit your scripts, and when that happens you’ll find that you’d have been quicker just writing the change that you needed by hand in the first place.

All in all, this seems far too leaky an abstraction to give me any confidence in it to manage a database lifecycle. There is simply no substitute for scripting every database migration, checking it into source control, and having your unit tests run them all on a blank, or reference, database, and having some record in your production database of which scripts have been run and which haven’t. And while schema comparison tools may be a life saver if you lose track of things for any reason, they are a very poor alternative to generating your migration scripts by hand.

02
Apr

Continuous Deployment: radical idea or April fool?

No doubt many of my readers will be aware of the concept of Continuous Integration. It takes daily builds one step further by building your project every time you check in some code, running all the unit tests, creating your packages, and generating some reports. It means that if you break the build, you get fairly quick feedback on the fact. Continuous Deployment takes this even further still, by automatically deploying every successful build to your live production environment.

Now perhaps it’s just coincidence, but I first came across the concept of Continuous Deployment yesterday morning when I saw this headline from Simon Willison pop up in Google Reader — on April Fool’s Day of all days. However, clicking through to the article he linked to led me to a couple of blog entries written back in February, and a Google blog search turned up several dated hits indicating that this isn’t a joke concept that was churned out just for the day, like the old classic about Swiss spaghetti harvesters. Believe it or not, apparently some people are actually running automated deployments to their live production environment of every last check-in.

But is it a good idea?

Well let’s just say it has that kind of radical, crazy startup feel to it. It’s the kind of thing that you do to prove to lesser developers that you’re in a completely different league to the rest of us, and then everybody else tries to copy you and falls flat on their face because it doesn’t work in their situation, or they’ve completely misunderstood you, or they haven’t done it properly in the first place. Either that or else as link bait — a bit like Joel writing his own compiler, or Paul using Lisp for everything.

I guess you could get it to work, if you have the right kind of team, the right kind of processes and the right kind of people working on it. But in effect, you are doing away with your QA and change control processes altogether, and relying entirely on your unit tests as a substitute. I don’t think much of this idea at all myself. Unit tests have their limitations: they can only ever cover a tiny fraction of the possible use cases for your code; making a change to one part of your system often introduces a bug in another part of the system that your unit tests hadn’t foreseen; there are things that it is very difficult to test properly through unit tests, such as user interfaces or external services; and your unit tests themselves may have bugs and be testing the wrong thing anyway. You just have to have them backed up by some manual ad-hoc testing to find more possible ways that issues could crop up.

You could of course — and indeed probably should — deploy continuously to a SAT or UAT environment: that way, your internal testers can evaluate changes fairly quickly and you can adopt a truly agile approach. However, deploying continuously to your live production environment sounds pretty reckless to me. I’m still tempted to call April Fool on this one.

30
Apr

Why I hate web.config

One thing that is vital when deploying web applications is that you should be able to reduce the process of deploying upgrades and changes to as few steps as possible. Furthermore, every step should be a no-brainer — so simple that the scope for fat fingering something is strictly limited.

This kind of thing is acceptable:

  1. Get the appropriate stable build from your daily build server.
  2. FTP it onto the web server into a directory in an appropriate location. (Even better: have an option in your build script to do this automatically.)
  3. Change the IIS settings to point to the new version.
  4. You’re done!

Now in order to do this effectively, you need to build some foundations into your project. You need to isolate every setting that varies between your production environment and your developer box and put them in a separate location outside the website’s hierarchy that does not change from build to build.

These settings are purely concerned with server-specific configuration settings. They change from one machine to the next and will be different between developer machines and the production server. Examples include connection strings, SMTP server details, custom errors and trace settings. They aren’t necessarily stored in your source control, except as a sample file for documentation purposes, and they should definitely not be deployed afresh to the server with every build.

There are other settings that are tied much more closely to the code itself. Examples include HTTP handlers and modules, assemblies referenced in the <compilation> section, and all the additional stuff that ASP.NET Ajax or ASP.NET 3.5 adds to tell it that you’re using the C# 3.0 compiler, not the C# 2.0 compiler. These settings may change from one build to the next, but they are the same on every machine where they are used. They are, to all intents and purposes, code, and should be treated as such, kept in your source control, and deployed unmodified to the server with every new build.

Unfortunately, web.config mixes the two willy-nilly in a thoroughly cavalier way, with the result that there are several additional, more complex and error-prone steps that you need to take:

  1. Locate the previous build.
  2. Copy the web.config file into the new build.
  3. Merge in the changes manually.

These steps are less straightforward and provide much more scope for error. What if you forget to do them, make a dog’s dinner of merging in the changes, or worse, introduce some subtle and mysterious bug that isn’t there on your development machine?

ASP.NET 2.0 added a new feature to sort this mess out. You can now specify an alternative file for your <appSettings> section. By doing <appSettings file="..\myappsettings.config" /> you can even specify a file outside your web application root. Whoopee! Problem solved!

Not so fast. What about the settings that don’t fit in to <appSettings>? For example, connection strings now go in the <connectionStrings> section; custom errors should be enabled on the server but disabled on your development box; tracing should be enabled on your development machine but not on the server; and so on.

It turns out that these too have an option to allow you to reference external files. You can set, say, <connectionStrings configSource="blah" /> to put your connection strings in a separate file. Unfortunately, unlike with <appSettings>, you can’t put this outside your application root.

Meh. Why not??? This is a major pain in the neck — especially for <connectionStrings>.

To make matters worse, there are some elements that straddle both camps. <compilation> is the most obvious example. It needs to have the attribute debug="true" on a development server, but in production you will need to insert debug="false" for improved performance. However, within your <compilation> element, you have a list of additional assembly references for things such as the ASP.NET Ajax extensions. And you can’t put these in a separate file.

All in all, configSource and <appSettings file="blah" /> go some of the way towards solving the deployment problem. Unfortunately, they still have limitations that are awkward and hobble the process and are a major annoyance.

24
Apr

Making website upgrades seamless: script the changes

One of my tasks this week has been to roll out some changes to a fairly busy web application. Because it is pretty heavily used, when we make changes we need to get it right and minimise downtime.

How do you do this? Script the upgrade process. Ideally, you should be able to simply copy a working build to the production server, hit a button (or run a script) to switch from the old version to the new one, and you’re done.

This particular web application’s upgrade process is very simple. You get the code files from the build that you want to deploy and copy them onto the web server — into a new directory for each version, e.g. “html-3.14″. You change a symbolic link at the web application’s root that points to the version you want to use. You’re done. Rolling back — should things go pear-shaped — is simply a case of changing the symlink back again.

To get this working effectively, I had to streamline the application itself. Some files, such as site configuration, may vary from one server to the next, so I had to separate these out. I partitioned the application into three directories: the code, the configuration files, and dynamic data, such as logs and uploaded files.

Only the code directory changes between versions: its contents are exactly the same whether they are on the developer’s machine, the staging server or the live server, and it has no special requirements in terms of access rights or contents. Files that need to vary between different servers (such as configuration files) are kept in a separate directory and are not changed during a normal upgrade process. This makes the upgrade script very simple.

This explanation is probably a little bit simplistic, although it can easily be extended to take account of extra requirements such as changes to the database schema or addition of new configuration options. Some upgrades will also have much more complex requirements, but the general principle is the same. Script the upgrade process so that it runs in a single step. It’s the same principle as the one Joel Spolsky makes when he asks in The Joel Test, “Can you make a build in one step?” If you have to edit several files and jump through several hoops, the risk increases that you will make a mistake somewhere along the line. Being able to do it all in a single step is much more robust, and mostly seamless from the end user’s perspective, because if all goes well, they will notice no downtime, but only that there are some new features and some irritating bugs have been fixed.