29
Sep

Book review: The Art of Unit Testing

Unit testing is one of the programming disciplines that, to my mind at least, sets apart the reputable developers from the cowboys. Test driven development has become increasingly popular over the past few years, and for many teams, it’s becoming as fundamental a discipline as using source control. It’s an idea that sounds almost like a no-brainer at first: you write a bunch of methods that feed test data into your code and examine the result, and they report either success or failure. Then when you start extending and modifying your code, you can re-run your tests to check that everything still works as it’s supposed to, and you haven’t broken any dependencies.
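To make that concrete, here is a minimal sketch in Python using the standard `unittest` module. The `apply_discount` function and its rules are invented purely for illustration; the point is only the shape: feed in known data, examine the result, report success or failure.

```python
import unittest


def apply_discount(price, percent):
    """Hypothetical code under test: reduce a price by a percentage."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return round(price * (100 - percent) / 100, 2)


class ApplyDiscountTests(unittest.TestCase):
    def test_ten_percent_off(self):
        self.assertEqual(apply_discount(50.0, 10), 45.0)

    def test_zero_percent_leaves_price_unchanged(self):
        self.assertEqual(apply_discount(20.0, 0), 20.0)

    def test_rejects_percentages_over_100(self):
        with self.assertRaises(ValueError):
            apply_discount(50.0, 150)


def run_tests():
    """Run the suite and return the result object (success or failure)."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(ApplyDiscountTests)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

Calling `run_tests()` after every change to `apply_discount` is the “re-run your tests” step: if a modification breaks the existing behaviour, one of the three tests should fail and tell you so.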

Yet of all the programming disciplines that I’ve had to get to grips with over the years, unit testing has probably been one of the most difficult. Setting up your project to facilitate automated testing involves quite a lot of groundwork. You have to configure a test environment, perhaps with a test database or other form of test data source, and you also have to completely re-think the way you write your code. It requires greater discipline — it’s all too easy to think “Stuff that,” and slip into the old mindset of just slapping down your code and testing it manually. Some legacy code may seem almost impossible to test, especially if it is a Big Ball of Mud with long, multi-purpose methods and no clear separation of concerns. Then there is the question of what to do with external dependencies — components that are beyond the control of your unit tests, such as third party web services, input devices, and even the current date and time.
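The current date and time is a good example of a dependency that looks untestable until you pass it in from outside. The following is only a sketch in Python, with class and function names that I have made up for illustration: the production code receives a clock object, and the test substitutes a stub that always reports the same moment.

```python
from datetime import datetime


class SystemClock:
    """Production clock: asks the operating system for the time."""

    def now(self):
        return datetime.now()


class FixedClock:
    """Stub clock for tests: always reports the same moment."""

    def __init__(self, moment):
        self._moment = moment

    def now(self):
        return self._moment


def greeting(clock):
    """Hypothetical code under test; the clock is injected, not hard-coded."""
    return "Good morning" if clock.now().hour < 12 else "Good afternoon"
```

A test can then call `greeting(FixedClock(datetime(2009, 9, 29, 9, 0)))` and get the same answer whatever time of day it happens to run, while the real application passes in `SystemClock()`.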

If you’re wrestling with problems such as these, Roy Osherove’s book, The Art of Unit Testing, is a must-read. Most treatments of unit testing that I’ve come across only cover the topic at a fairly basic level, but this one picks up where they leave off. I was pleased to see some good solid chapters on stubs, mocks and dependency injection, for instance: these are essential tools needed to break external dependencies. I was also pleased to see a whole chapter devoted to working effectively with legacy code; another whole chapter is devoted to the political and organisational implications of introducing unit testing into your organisation.

The book assumes a basic familiarity with object oriented programming and design patterns (which every working developer should understand properly anyway) but it isn’t a difficult read, and it explains things very clearly. Code samples are given in C#, but developers working with other languages will also find much in it of benefit. There are plenty of suggestions that may not have occurred to you, and you’ll end up much more aware of the “tricks of the trade.”

My only disappointment was that four major, and potentially thorny, aspects of the subject (database, UI, web and thread-related testing) were only given a brief overview in the second half of Appendix B. Of course it could be argued that these were outside the scope of this book: indeed, Osherove argues that unit testing in some of these areas has a relatively low return on investment, and in others, unit testing frameworks and methodologies are still very much in their infancy. However, I was hoping for a somewhat more extensive treatment, and it was a little disappointing that these topics only got six pages.

But this doesn’t detract from the value of the book. Whether unit testing is your “thing” or not, it is very much a must-read for every working .NET developer, and indeed for developers working with other languages too. It deserves to be as much a classic as books such as Code Complete, Design Patterns, or Martin Fowler’s Refactoring.

25
Sep

Joel Spolsky, cowboy coder

Some people at work think I’m a bit of a Joel Spolsky fanboy. I’ve certainly got a lot of useful hints and tips out of his blog, Joel on Software, and I’ve been recommending it as a must-read for fellow developers and managers alike.

Until now.

What’s really, really annoying me about Joel these days is that he’s started preaching cowboy coding. And I don’t like it.

His latest post, The Duct Tape Programmer, illustrates this perfectly. He has yet another dig at Architecture Astronauts who come up to you when you’re racing to get an upgrade ready for deployment and tell you that you need to refactor your core code to use multi-apartment threaded COM. Not unreasonable. But then, he goes on to wax lyrical about Jamie Zawinski, who is, to use Joel’s words, the Pretty Boy of Software Development. He wrote Netscape Navigator in the 1990s and is Joel’s hero because—get this—he didn’t write unit tests:

Zawinski didn’t do many unit tests. They “sound great in principle. Given a leisurely development pace, that’s certainly the way to go. But when you’re looking at, ‘We’ve got to go from zero to done in six weeks,’ well, I can’t do that unless I cut something out. And what I’m going to cut out is the stuff that’s not absolutely critical. And unit tests are not critical. If there’s no unit test the customer isn’t going to complain about that.”

Now perhaps Mr Zawinski is brilliant enough to get away with a “duct tape programming” approach, but it is not something that we should be encouraging, not least because it is deeply unprofessional. Besides, what he says about unit testing is just WRONG.

Here’s the score with test-driven development. Once you know what you’re doing, it doesn’t slow you down overall. Yes, it takes longer to write the code in the first place, but this is offset by a decrease in time spent debugging, and on top of that, you get a dramatic increase in confidence in your code, a more robust design, and a reproducible, easy to run way of verifying that your code does what it’s supposed to. Every professional developer who is concerned about quality these days writes unit tests insofar as they can. If you don’t like the idea, you are a cowboy coder. Period.

Sentiments such as these from Joel have been bugging me since the start of this year, when he crossed swords with “Uncle Bob” Robert C Martin over the SOLID principles. Back in January on the Stack Overflow podcast, he made this observation:

Joel revisits the SOLID principles, and compares them to designing replaceable batteries, or a headphone jack, into your product. Appropriate in some narrow cases, but not all the time. Imagine a consumer product where every single part of it could be plugged in and replaced with another compatible part. Is that realistic?

Actually, Joel, I can perfectly well imagine such a consumer product. In fact I own one such consumer product myself where this kind of design pattern is very important. It’s called a car.

Let me explain. On the journey up to Faith Camp in August, another driver ran into the back of said car and left it looking rather sorry for itself. The repairs took three weeks, and involved replacing the rear bumper, the rear hatch, and various panels. Ford makes appropriate parts that can be slotted and bolted into place with relative ease as replacements for the damaged originals.

Imagine a car built the way Joel’n’Jeff seemed to be suggesting, where it was just one monolithic structure, with non-interchangeable parts. One ding like that and it’s a writeoff. When the tyres wear thin, or the bulbs go, you have to scrap it. Is that realistic?

Architecture astronauts are a straw man here. We’re not talking about highfalutin over-engineered ProviderSingletonVisitorAdapterMediatorRepositoryFactory design patterns, nor are we talking about more layers of abstraction than The Princess and the Pea, we’re talking about a common sense approach to software development. It’s the kind of thing that you’re either doing anyway but you just don’t know what it’s called, or else you get that kind of “aha!” moment when you first encounter it that makes you think, “Now how come I never thought of that before?”

One other thing. When you’re repairing a car after an accident, you most certainly do NOT just use duct tape and WD-40. No matter how much of a Pretty Boy you think you are.

16
Sep

If you are saving passwords in clear text, you are probably breaking the law

The Christian dating website that got hacked by 4chan back in August was a textbook illustration of why you should not store users’ passwords in plain text. Most of the users of the site had re-used their user names and passwords on Facebook and their e-mail accounts, which were compromised as a result, in many cases in extremely embarrassing ways.

Reading about this (and a similar event somewhat closer to home a week later) has got me thinking about the whole issue again. A couple of years ago, Mats Helander proposed on his blog that saving plain text passwords should be illegal. (Unfortunately he lost his domain name to squatters a few months later, but the post is still up in the Wayback Machine.) His post was in response to some of Jeff Atwood’s readers, who pointed out that many web developers have bosses and clients who insist on them storing passwords in clear text so that they can e-mail password reminders to their users. To be sure, you can try explaining to them that there are alternative approaches that don’t compromise usability, but if your boss is an “I’m not a computer person” type, or just doesn’t care, you might as well try to strike a match on jelly, or you may even find your job on the line. However, if you could tell your boss or clients that they were asking you to do something illegal, you’d be in a much stronger position to push back.

Now I am not a lawyer, but the other day, I took a close look at the Data Protection Act 1998, and if I understand it correctly, saving passwords in clear text is indeed illegal here in the UK.

The relevant part of the Act is Schedule 1, Part I, paragraph 7, which states the seventh of eight Data Protection Principles:

Appropriate technical and organisational measures shall be taken against unauthorised or unlawful processing of personal data and against accidental loss or destruction of, or damage to, personal data.

This is expanded on in Schedule 1, Part II, paragraphs 9-12, which tells us how to interpret this principle. Paragraph 9 in particular says:

9 Having regard to the state of technological development and the cost of implementing any measures, the measures must ensure a level of security appropriate to—

(a) the harm that might result from such unauthorised or unlawful processing or accidental loss, destruction or damage as are mentioned in the seventh principle, and

(b) the nature of the data to be protected.

It should be noted that these restrictions apply to “personal data” as well as to “sensitive personal data.”

As Mats argued, and I would reiterate, and the Christian dating website/4chan incident illustrates dramatically, losing people’s passwords has the potential for immense harm. Defacing Facebook profiles can cause serious embarrassment and possibly even wreck careers, but if the attacker then gets access to your e-mail account, they can obtain or request new passwords for even more sensitive websites such as your bank, your credit cards, and so on.

It seems obvious to me that storing plain text passwords in a database most certainly does not “ensure a level of security appropriate to the harm that might result from such unauthorised or unlawful processing or accidental loss” as required by the law. The state of technological development provides us with a much better solution — a one-way salted hash, which is computationally infeasible to reverse engineer — and since there are still perfectly adequate solutions to the login recovery problem, the cost of doing so is negligible.
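For anyone wondering what a one-way salted hash looks like in practice, here is a minimal sketch using only the Python standard library. PBKDF2 is just one reasonable choice of algorithm, and the iteration count below is an assumed figure rather than a recommendation; the function names are mine.

```python
import hashlib
import hmac
import secrets

ITERATIONS = 200_000  # assumed work factor; tune it for your own hardware


def hash_password(password):
    """Return (salt, digest) for storage; the password itself is never stored."""
    salt = secrets.token_bytes(16)  # fresh random salt for every user
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_password(password, salt, expected_digest):
    """Re-derive the digest from the login attempt and compare in constant time."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return hmac.compare_digest(candidate, expected_digest)
```

You store the salt and the digest against the user record. Since the original password cannot be recovered from them, a forgotten password has to be handled by letting the user set a new one rather than by e-mailing the old one back.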

I’d be interested to hear from anyone who specialises in the legal issues surrounding computer security whether my understanding of the Data Protection Act is correct here. Do you concur with my conclusions? Or do you think that the law needs to be made more explicit on this matter?

29
Aug

Twitter through the eyes of a nine year old

My nephew Aaron (age 9¾) recently started experimenting with Twitter, as I discovered about a week ago when he started following me. He was not the youngest person on Twitter (that honour surely goes to @rockhardawesome son of @codinghorror son of @spolsky) but his experiment seems to have been fairly short lived. His parents decided to enforce Twitter’s terms and conditions (which require you to be thirteen or over to use the service) after a couple of spammers started following him, but his last three tweets seem to suggest that he won’t be missing it:

why do you talk about boring stuff people ?????? talk of soming not boring please

Hmmm, there are certain people on Twitter that I could name who really, really need to read that…

27
Aug

Why can’t every call centre let you know how long you’ll be waiting?

There are some smart companies that regularly tell you how many people there are ahead of you when your phone call is in a queue waiting to be answered. I wish every call centre would do that. In fact, I wish it were a legal requirement.

Unfortunately, they are very much in the minority. Most companies just churn out canned platitudes that “Your call is important to us” every minute or so without giving you the faintest indication whether you’ll be on hold for two minutes or half an hour. Of course your call is important to them. Especially if you have called an 0870 number, where the longer they keep you on hold, the more money they earn. If you knew you were going to be kept waiting for twenty minutes at 7.5 pence a minute, you’d no doubt take your business somewhere more efficient.

17
Aug

Web development is hard, m’kay?

There seems to be a bit of intellectual snobbery among some non-web developers, who regard web development as a soft skill, something other than software engineering that’s only for programming wusses who can’t make the grade to get into desktop development. It’s an attitude that is typified by this flabbergastingly arrogant post by Michael Braude (hat tip: Jeff Atwood) who has this to say:

But then, that’s just it, isn’t it?  The reason most people want to program for the web is that they’re not smart enough to do anything else.  They don’t understand compilers, concurrency, 3D or class inheritance.  They haven’t got a clue why I’d use an interface or an abstract class.  They don’t understand: virtual methods, pointers, references, garbage collection, finalizers, pass-by-reference vs. pass-by-value, virtual C++ destructors, or the differences between C# structs and classes.  They also know nothing about process.  Waterfall?  Spiral?  Agile?  Forget it.  They’ve never seen a requirements document, they’ve never written a design document, they’ve never drawn a UML diagram, and they haven’t even heard of a sequence diagram.

Well I have news for you, Michael. This is just plain wrong. As web developers, we have to know and understand almost everything you’ve listed above, and more. Practically the only exceptions that you have listed are compilers, pointers and virtual C++ destructors, and even then some of us may need to get to grips with them from time to time. We need to understand class inheritance, interfaces, abstract classes, virtual methods, pass by reference versus pass by value, garbage collection and so on or we’re toast.

In fact, some of the concepts he’s listed above are even more critical to the web than other, supposedly superior, forms of development. Take concurrency for instance. This is a particularly difficult concept to understand, debug, work with and test, and on desktop applications which only ever run in single user mode, it is very often a non-issue. You can build desktop applications for years without needing to know jack squat about concurrency, but once you start building web applications, which are multi-user by their very nature, it’s only a matter of time before you get bitten by it.
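As a rough illustration of the kind of thing that bites you, here is a sketch in Python of a page-view counter shared between several simultaneous users. The class and function names are invented for this example, and threads stand in for concurrent web requests.

```python
import threading


class HitCounter:
    """Hypothetical shared state, e.g. a page-view counter in a web app."""

    def __init__(self):
        self._count = 0
        self._lock = threading.Lock()

    def record_hit(self):
        # Read-modify-write must happen as one step, so guard it with the lock
        with self._lock:
            self._count += 1

    @property
    def count(self):
        return self._count


def simulate_requests(counter, users=8, hits_per_user=10_000):
    """Pretend each thread is a user hammering the site at the same time."""
    def one_user():
        for _ in range(hits_per_user):
            counter.record_hit()

    workers = [threading.Thread(target=one_user) for _ in range(users)]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    return counter.count
```

With the lock in place, every hit is counted. Take the lock away and the increment is no longer guaranteed to be atomic, so two users may read the same old value and one of the updates can be silently lost, which is exactly the sort of bug that never shows up when you are the only person using the application.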

There are other aspects to web development that we need to understand to a much greater extent than desktop application developers. Security is much more important, for instance. It isn’t exactly a non-issue for desktop applications—web browsers, Microsoft Anything, and so on can be potential attack vectors, especially if they allow scripting—but web servers and web applications are low hanging fruit as far as hackers are concerned, and you have to contend with bots actively probing your site, 24/7, for data injection and cross site scripting vulnerabilities.
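SQL injection is probably the best known form of injection attack, and the standard defence is to pass user input as a parameter rather than pasting it into the query string. Here is a minimal sketch using Python’s built-in `sqlite3` module; the table and the `find_user` function are made up for the example.

```python
import sqlite3


def find_user(conn, username):
    """Look up a user; the "?" placeholder keeps username as data, not SQL."""
    cursor = conn.execute(
        "SELECT id, username FROM users WHERE username = ?", (username,)
    )
    return cursor.fetchall()


# Hypothetical in-memory database, just enough to try the query out
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT)")
conn.executemany("INSERT INTO users (username) VALUES (?)", [("alice",), ("bob",)])
```

A bot that submits `' OR '1'='1` as the user name gets an empty result back, because the whole string is compared against the `username` column instead of being interpreted as part of the query.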

I could give other examples. We have to understand scalability and caching. We have to understand not only HTML but also the protocols that run the web, such as HTTP, TCP/IP, SMTP, SOAP, and so on. We have to work with several different languages at once—HTML, JavaScript, PHP or C#, SQL, CSS, and XML and its many domain-specific variants.

We need to work in cross-browser environments that very often make test-driven development particularly tricky if not impossible. (How do you unit test CSS positioning, for instance?) We need to understand graphic design, accessibility, and usability. We certainly do need to understand business processes such as agile, waterfall, scrum, and so on. We need to understand UML. Most difficult of all, we need to know how to bridge the gap between the exact literalism of computers and the vagaries of humans and other living beings.

And guess what? Far from being a cuddly toy, JavaScript is a rich, powerful, functional language that, despite its C-style syntax, was originally heavily influenced by Scheme.

To be sure, there are a lot of incompetent web developers out there, and the barrier to entry is ridiculously low, but it is thoroughly incorrect and misleading to say that this is “because you don’t need to know complicated things to be a web developer.” On the contrary, if you’re working on a site of any significant proportions, you need to know some pretty hard stuff. You need to be pretty smart to be a successful web developer.

14
Aug

Please untangle this great long conversation by COP today

Some years ago, I received an e-mail message consisting of the following instruction:

James, please deal with this by COP today.

followed by thirty or so screenfuls of the sender’s correspondence with the other interested parties. Half of this consisted of the typical lengthy disclaimers that corporate e-mail systems add to all outgoing messages by default; of the other half, 90% was of only tangential relevance to what he actually wanted me to do, and even after I had waded through the entire e-mail, I was still unclear as to what exactly he was asking for. On top of that, this was the first time I had ever encountered the cryptic abbreviation “COP” meaning “Close of Play,” so I had no idea what he meant. He had obviously just hurriedly and lazily hit “Forward” in his e-mail client, appended a quick note, and left me to untangle the mess.

Folks, don’t do this. It simply isn’t fair on someone to expect them to spend half an hour wading through thirty screenfuls of noise to filter out your instructions, when you could just as easily spend a couple of minutes including a summary at the top, and trimming out the extraneous, irrelevant waffle if necessary. Unfortunately, this particular individual made a habit of doing this kind of thing, and it annoyed me no end.

Besides, e-mail is not a suitable medium for communicating requirements that need to be dealt with by COP today. Your recipient may not be at their desk, or may have their e-mail client turned off, or may have a hundred other messages that also need to be dealt with by COP today, or the message may have been trapped in their spam filter. If it is time sensitive, a telephone call is more appropriate.

22
Jul

London Victoria’s sneaky back entrance

There is a sneaky back entrance to London Victoria station that I’ve taken to using. It’s near the end of Platform One, and it takes you out onto the corner of Hudson’s Place and Bridge Place. It avoids all the crowds outside the main entrance to the station and on Victoria Street, and because it’s much quieter, you also avoid those really annoying characters spamming you with the London Lite and other similar vacuous drivel wherever you turn on the way back in the evenings.

Obviously you still have to negotiate the crowds inside the station, but once you’re out, the twenty minute walk to Millbank up Vauxhall Bridge Road, Francis Street, Greencoat Place, Greycoat Place and Great Peter Street is about as pleasant and stress free as you can possibly get in central London during the rush hour.

29
Jun

Why SQL Server 2005 database projects in VSTS are a bad idea

I’ve been working a bit lately with a project that uses SQL Server 2005 database projects in Visual Studio 2008 Team System. These are different from the conventional database projects that you get in Visual Studio Professional, since they have extra features that allow you to do schema and data comparisons, and, in theory at least, manage database deployments and migrations.

The idea is that you should be able to design your database using visual designers rather than having to write all that nasty SQL code to script it for you. Visual designers make things so much easier at the planning and initial design stage, and once you are done, you can use the various schema comparison and script generation tools to generate your production database.

The problem comes when you want to manage your database’s entire lifecycle. I’m sure that many developers will have scratched their heads at some stage about this problem. You chop and change your database on your development server, most likely using the visual tools—but how do you reliably replicate these changes on your live server?

The Microsoft approach here is to rely on the schema comparison tools to generate change scripts that you can then run against your database. Some people think it’s a silver bullet. I beg to differ.

The first problem is that while schema comparisons can make a good starting point, the scripts they generate don’t always work properly out of the box, if at all. Some database refactorings simply can’t be done using schema comparison tools. Examples include normalisation refactorings such as moving data from one table or column to another; introducing constraints or changing a column’s data type when you need to do some data cleanup first; or modifying reference data. Even relatively straightforward refactorings—or even, in some cases, no refactoring at all—can be problematic: if your production and development databases get their collation orders out of sync, for instance, the script may refuse to run at all. And the thought that anyone would blindly use this option on the project properties page makes me shiver:

Perform "smart" column name matching when you add or rename a column

In other words, you’re asking it to guess what’s changed.

Testability—and when you’re dealing with an abstraction as leaky as this one, testability is vital—is another issue. Unfortunately, SQL Server 2005 database projects have serious shortcomings in this area too. They do offer unit testing features, but these only apply to the final database. There doesn’t seem to be any way of integration testing your migrations themselves: you don’t have a consistent record of what’s changed in a format that can easily be applied to a blank or reference database, so there’s no way of verifying that you’re getting the expected results when you’re going from “before” to “after.”

Then there are the change scripts generated by schema compare itself. They are a morass of long winded, convoluted, hard to maintain, spaghetti code. Adding a column to a database table involves dropping and re-creating the table: this is understandable if you need to put the column in the middle of the roster, since SQL Server does not have an AFTER clause in the ALTER TABLE statement like MySQL does, but even if you add a column on to the end, it still drops and re-creates the table. A task that needs only one or two lines of code ends up taking eighty. If you rely on schema comparison tools, sooner or later you are going to need to edit your scripts, and when that happens you’ll find that you’d have been quicker just writing the change that you needed by hand in the first place.

All in all, this seems far too leaky an abstraction to give me any confidence in it to manage a database lifecycle. There is simply no substitute for scripting every database migration, checking it into source control, having your unit tests run them all on a blank or reference database, and keeping some record in your production database of which scripts have been run and which haven’t. And while schema comparison tools may be a life saver if you lose track of things for any reason, they are a very poor alternative to writing your migration scripts by hand.
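The approach I have in mind can be sketched in a few lines. To keep the example self-contained I have used Python with SQLite rather than SQL Server, and the migration names, the `schema_migrations` table and the `migrate` function are all my own inventions, so treat it as an outline of the idea rather than a finished tool.

```python
import sqlite3

# Hypothetical hand-written migrations, in the order they must run
MIGRATIONS = [
    ("001_create_customers",
     "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL)"),
    ("002_add_email",
     "ALTER TABLE customers ADD COLUMN email TEXT"),
]


def migrate(conn):
    """Apply any migrations not yet recorded; return the names applied."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS schema_migrations (name TEXT PRIMARY KEY)"
    )
    done = {row[0] for row in conn.execute("SELECT name FROM schema_migrations")}
    applied = []
    for name, sql in MIGRATIONS:
        if name in done:
            continue  # already run against this database
        conn.execute(sql)
        conn.execute("INSERT INTO schema_migrations (name) VALUES (?)", (name,))
        applied.append(name)
    conn.commit()
    return applied
```

Running `migrate` against a blank database in your tests exercises every script from “before” to “after”, and running it against production applies only the scripts that the `schema_migrations` table has no record of. Note that the second migration adds a column in a single line.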

05
Jun

Keyboard switching in IE8 is insane

Earlier this week I took delivery of a new laptop at work. Because I use Colemak with my Microsoft Natural 4000 keyboard and qwerty when the ergonomic option is not available (unfortunately I find flat keyboards and Colemak just don’t mix, though the Colemak/ergo combination is light years ahead in terms of comfort) this means I am likely to be switching to and fro between the two layouts a lot more on the same machine.

Unfortunately, the Windows keyboard switcher is completely insane in this respect. It’s maddening that it sets your keyboard layout separately for each individual window rather than letting you set it across the board for all the windows that you have open, and even more so that it doesn’t give you an option to change this behaviour.

But it gets worse. In IE8 you can set the keyboard layout individually for each tab. This meant that at one point this morning I had Colemak in Twitter and qwerty in the browser’s address bar.

Yes, I know there’s the whole thing about each tab being in a separate process, but Google Chrome has a similar architecture and gets this right. Microsoft: this is a bug, not a feature. Please fix it.