james mckay dot net
because there are few things that are less logical than business logic

Can your database versioning tool do this?

I’ve been evaluating DbGhost recently. It’s one of those database lifecycle management tools that, at first glance, seems to be based around the whole schema comparison/data comparison approach.

Unfortunately, I distrust this approach intensely simply because it’s such a leaky abstraction. Nevertheless, people who have used DbGhost tend to wax lyrical about it, and some people even report that their DBAs like it. This means that either they’re missing something about it, or else I am.

Indeed, it seems that DbGhost does allow you to throw custom scripts into the mix somehow or other, for the cases that schema and data comparison just can’t handle.

So, rather than give any specific comments on DbGhost, or any other database lifecycle management solution, I shall propose a scenario that can be used for evaluation of any tool or approach to database lifecycle management.

This scenario is completely fictitious and not related to anything I’m working on, but it represents the kind of changes that you are likely to come across sooner or later in your application lifecycle. And in recent months I’ve had to perform several database refactorings much more complex than this.

Let’s say you have a database, with a Users table, which in Version 1.0 looks something like this:

Schema for v1.0

As you can see, it is not in first normal form, as you discover when one of your users phones up complaining that they can’t register six e-mail addresses. So in version 1.1, you extract the Email fields to a separate table:

Schema for v1.1

This can be done using a SQL query like this:

create table UserEmails(
    EmailID integer not null identity(1, 1) primary key clustered,
    UserID integer not null foreign key
        references Users(UserID)
        on update cascade on delete cascade,
    Email nvarchar(100)
)

insert into UserEmails(UserID, Email)
    select UserID, Email from Users
        where Email is not null and Email != ''

insert into UserEmails(UserID, Email)
    select UserID, Email2 from Users
        where Email2 is not null and Email2 != ''

insert into UserEmails(UserID, Email)
    select UserID, Email3 from Users
        where Email3 is not null and Email3 != ''

alter table Users
    drop column Email, column Email2, column Email3

Now one point about this refactoring is that it is impossible to complete it correctly using schema comparison and data comparison tools. This is true of any normalisation refactoring, or indeed, any refactoring where you are moving live, constantly changing data from one table to another. Another point is that these are fairly common scenarios. They are not some obscure academic concept only of interest to PhD students; it is inevitable that you’ll have to get your hands dirty and write some SQL at some stage in your application’s lifecycle if you want to hit the high notes with it. In fact, in my experience, approximately 20-30% of all database migrations that I write are beyond the scope of schema and data comparison.

So that brings you to version 1.1 of your product. Then you realise that your website-y fields are not only not in first normal form, they’re out of date. You still have a field for your users’ Pownce profiles! In case you’d forgotten, Pownce doesn’t exist any more. In fact, nobody used Pownce in the first place even though it had people like Robert Scoble fanboying it. And what about the other social networking sites that aren’t listed? Like, for instance, Flickr, Delicious, or Github? It seems that a bit more normalisation is in order:

Schema for v1.2

We’re now up to version 1.2 of our product. But it doesn’t stop there! In version 1.3, we do even more normalisation. Look at those IsAdministrator and IsStaff columns. We need to move them into a separate Roles table to give us more granular control over our website security:

Schema for v1.3

We now have three upgrades to our product, and in each upgrade, we have performed a database refactoring that to the best of my knowledge and understanding can not be handled correctly by schema comparison and data comparison tools alone. These changes need to be scripted by hand, there are no two ways about it.

So here are my questions for DbGhost, or for any competing product or process for managing your database changes:

  • Does it give you a means to upgrade this database to version 1.3 from any previous version, be it 1.0, or 1.1, or 1.2?
  • In one step?
  • Despite the fact that none of the migrations are possible using schema comparison or data comparison tools?
  • And is the process idiot-proof, intuitive and well documented?

This kind of thing is very straightforward with a migration-based approach, such as that offered by Ruby on Rails. I haven’t yet figured out how to do it with a comparison based tool such as DbGhost, but if it can handle it nonetheless, I will be most impressed.