@ayende You ought to try Mercurial. in reply to ayende 1 week ago
16
Dec

Can your database versioning tool do this?

I’ve been evaluating DbGhost recently. It’s one of those database lifecycle management tools that, at first glance, seems to be based around the whole schema comparison/data comparison approach.

Unfortunately, I distrust this approach intensely simply because it’s such a leaky abstraction. Nevertheless, people who have used DbGhost tend to wax lyrical about it, and some people even report that their DBAs like it. This means that either they’re missing something about it, or else I am.

Indeed, it seems that DbGhost does allow you to throw custom scripts into the mix somehow or other, for the cases that schema and data comparison just can’t handle.

So, rather than give any specific comments on DbGhost, or any other database lifecycle management solution, I shall propose a scenario that can be used for evaluation of any tool or approach to database lifecycle management.

This scenario is completely fictitious and not related to anything I’m working on, but it represents the kind of changes that you are likely to come across sooner or later in your application lifecycle. And in recent months I’ve had to perform several database refactorings much more complex than this.

Let’s say you have a database, with a Users table, which in Version 1.0 looks something like this:

Schema for v1.0

As you can see, it is not in first normal form, as you discover when one of your users phones up complaining that they can’t register six e-mail addresses. So in version 1.1, you extract the Email fields to a separate table:

Schema for v1.1

This can be done using a SQL query like this:

create table UserEmails(
    EmailID integer not null identity(1, 1) primary key clustered,
    UserID integer not null foreign key
        references Users(UserID)
        on update cascade on delete cascade,
    Email nvarchar(100)
)

insert into UserEmails(UserID, Email)
    select UserID, Email from Users
        where Email is not null and Email != ''

insert into UserEmails(UserID, Email)
    select UserID, Email2 from Users
        where Email2 is not null and Email2 != ''

insert into UserEmails(UserID, Email)
    select UserID, Email3 from Users
        where Email3 is not null and Email3 != ''

alter table Users
    drop column Email, column Email2, column Email3

Now one point about this refactoring is that it is impossible to complete it correctly using schema comparison and data comparison tools. This is true of any normalisation refactoring, or indeed, any refactoring where you are moving live, constantly changing data from one table to another. Another point is that these are fairly common scenarios. They are not some obscure academic concept only of interest to PhD students; it is inevitable that you’ll have to get your hands dirty and write some SQL at some stage in your application’s lifecycle if you want to hit the high notes with it. In fact, in my experience, approximately 20-30% of all database migrations that I write are beyond the scope of schema and data comparison.

So that brings you to version 1.1 of your product. Then you realise that your website-y fields are not only not in first normal form, they’re out of date. You still have a field for your users’ Pownce profiles! In case you’d forgotten, Pownce doesn’t exist any more. In fact, nobody used Pownce in the first place even though it had people like Robert Scoble fanboying it. And what about the other social networking sites that aren’t listed? Like, for instance, Flickr, Delicious, or Github? It seems that a bit more normalisation is in order:

Schema for v1.2

We’re now up to version 1.2 of our product. But it doesn’t stop there! In version 1.3, we do even more normalisation. Look at those IsAdministrator and IsStaff columns. We need to move them into a separate Roles table to give us more granular control over our website security:

Schema for v1.3

We now have three upgrades to our product, and in each upgrade, we have performed a database refactoring that to the best of my knowledge and understanding can not be handled correctly by schema comparison and data comparison tools alone. These changes need to be scripted by hand, there are no two ways about it.

So here are my questions for DbGhost, or for any competing product or process for managing your database changes:

  • Does it give you a means to upgrade this database to version 1.3 from any previous version, be it 1.0, or 1.1, or 1.2?
  • In one step?
  • Despite the fact that none of the migrations are possible using schema comparison or data comparison tools?
  • And is the process idiot-proof, intuitive and well documented?

This kind of thing is very straightforward with a migration-based approach, such as that offered by Ruby on Rails. I haven’t yet figured out how to do it with a comparison based tool such as DbGhost, but if it can handle it nonetheless, I will be most impressed.

29
Jun

Why SQL Server 2005 database projects in VSTS are a bad idea

I’ve been working a bit lately with a project that uses SQL Server 2005 database projects in Visual Studio 2008 Team System. These are different from the conventional database projects that you get in Visual Studio Professional, since they have extra features that allow you to do schema and data comparisons, and, in theory at least, manage database deployments and migrations.

The idea is that you should be able to design your database using visual designers rather than having to write all that nasty SQL code to script it for you. Visual designers make things so much easier at the planning and initial design stage, and once you are done, you can use the various schema comparison and script generation tools to generate your production database.

The problem comes when you want to manage your database’s entire lifecycle. I’m sure that many developers will have scratched their heads at some stage about this problem. You chop and change your database on your development server, most likely using the visual tools—but how do you reliably replicate these changes on your live server?

The Microsoft approach here is to rely on the schema comparison tools to generate change scripts that you can then run against your database. Some people think it’s a silver bullet. I beg to differ.

The first problem is that while schema comparisons can make a good starting point, the scripts they generate don’t always work properly out of the box, if at all. Some database refactorings simply can’t be done using schema comparison tools. Examples include normalisation refactorings such as moving data from one table or column to another; introducing constraints or changing a column’s data type when you need to do some data cleanup first; or modifying reference data. Even relatively straightforward refactorings—or even, in some cases, no refactoring at all—can be problematic: if your production and development databases get their collation orders out of sync, for instance, the script may refuse to run at all. And the thought that anyone would blindly use this option on the project properties page makes me shiver:

Perform "smart" column name matching when you add or rename a column

In other words, you’re asking it to guess what’s changed.

Testability—and when you’re dealing with an abstraction as leaky as this one, testability is vital—is another issue. Unfortunately, SQL Server 2005 database projects have serious shortcomings in this area too. They do offer unit testing features, but these only apply to the final database. There doesn’t seem to be any way of integration testing your migrations themselves: you don’t have a consistent record of what’s changed in a format that can easily be applied to a blank or reference database, so there’s no way of verifying that you’re getting the expected results when you’re going from “before” to “after.”

Then there are the change scripts generated by schema compare itself. They are a morass of long winded, convoluted, hard to maintain, spaghetti code. Adding a column to a database table involves dropping and re-creating the table: this is understandable if you need to put the column in the middle of the roster, since SQL Server does not have an AFTER clause in the ALTER TABLE statement like MySQL does, but even if you add a column on to the end, it still drops and re-creates the table. A task that needs only one or two lines of code ends up taking eighty. If you rely on schema comparison tools, sooner or later you are going to need to edit your scripts, and when that happens you’ll find that you’d have been quicker just writing the change that you needed by hand in the first place.

All in all, this seems far too leaky an abstraction to give me any confidence in it to manage a database lifecycle. There is simply no substitute for scripting every database migration, checking it into source control, and having your unit tests run them all on a blank, or reference, database, and having some record in your production database of which scripts have been run and which haven’t. And while schema comparison tools may be a life saver if you lose track of things for any reason, they are a very poor alternative to generating your migration scripts by hand.