Miscellany

Random things that defy categorisation

16 Sep

If you are saving passwords in clear text, you are probably breaking the law

The Christian dating website that got hacked by 4chan back in August was a textbook illustration of why you should not store users’ passwords in plain text. Most of the users of the site had re-used their user names and passwords on Facebook and their e-mail accounts, which were compromised as a result, in many cases in extremely embarrassing ways.

Reading about this (and a similar event somewhat closer to home a week later) has got me thinking about the whole issue again. A couple of years ago, Mats Helander proposed on his blog that saving plain text passwords should be illegal. (Unfortunately he lost his domain name to squatters a few months later, but the post is still up in the Wayback Machine.) His post was in response to some of Jeff Atwood’s readers, who pointed out that many web developers have bosses and clients who insist on them storing passwords in clear text so that they can e-mail password reminders to their users. To be sure, you can try explaining to them that there are alternative approaches that don’t compromise usability, but if your boss is an “I’m not a computer person” type, or just doesn’t care, you might as well try to strike a match on jelly, or you may even find your job on the line. However, if you could tell your boss or clients that they were asking you to do something illegal, you’d be in a much stronger position to push back.

Now I am not a lawyer, but the other day, I took a close look at the Data Protection Act 1998, and if I understand it correctly, saving passwords in clear text is indeed illegal here in the UK.

The relevant part of the Act is Schedule 1, Part I, paragraph 7, which states the seventh of eight Data Protection Principles:

Appropriate technical and organisational measures shall be taken against unauthorised or unlawful processing of personal data and against accidental loss or destruction of, or damage to, personal data.

This is expanded on in Schedule 1, Part II, paragraphs 9-12, which tells us how to interpret this principle. Paragraph 9 in particular says:

9 Having regard to the state of technological development and the cost of implementing any measures, the measures must ensure a level of security appropriate to—

(a) the harm that might result from such unauthorised or unlawful processing or accidental loss, destruction or damage as are mentioned in the seventh principle, and

(b) the nature of the data to be protected.

It should be noted that these restrictions apply to “personal data” as well as to “sensitive personal data.”

As Mats argued, and I would reiterate, and the Christian dating website/4chan incident illustrates dramatically, losing people’s passwords has the potential for immense harm. Defacing Facebook profiles can cause serious embarrassment and possibly even wreck careers, but if the attacker then gets access to your e-mail account, they can obtain or request new passwords for even more sensitive websites such as your bank, your credit cards, and so on.

It seems obvious to me that storing plain text passwords in a database most certainly does not “ensure a level of security appropriate to the harm that might result from such unauthorised or unlawful processing or accidental loss” as required by the law. The state of technological development provides us with a much better solution — a one-way salted hash, which is computationally infeasible to reverse engineer — and since there are still perfectly adequate solutions to the login recovery problem, the cost of doing so is negligible.
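To show how little work a one-way salted hash actually involves, here is a minimal sketch in Python. It assumes PBKDF2 from the standard library as the hash; the function names and the iteration count are my own choices for illustration, not a recommendation from the Act or from anyone quoted above.

```python
import hashlib
import hmac
import os

# Assumed parameters for this sketch; tune them for your own hardware.
ITERATIONS = 200_000
SALT_BYTES = 16


def hash_password(password: str) -> tuple[bytes, bytes]:
    """Return (salt, digest) for storage; the password itself is never saved."""
    salt = os.urandom(SALT_BYTES)  # a fresh random salt for every user
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_password(password: str, salt: bytes, digest: bytes) -> bool:
    """Re-derive the digest from the login attempt and compare in constant time."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return hmac.compare_digest(candidate, digest)
```

You store the salt and the digest against the user record and nothing else. Because the hash cannot be reversed, a forgotten password has to be handled by resetting it rather than by e-mailing the old one.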

I’d be interested to hear from anyone who specialises in the legal issues surrounding computer security whether my understanding of the Data Protection Act is correct here. Do you concur with my conclusions? Or do you think that the law needs to be made more explicit on this matter?

29 Aug

Twitter through the eyes of a nine year old

My nephew Aaron (age 9¾) recently started experimenting with Twitter, as I discovered about a week ago when he started following me. He was not the youngest person on Twitter (that honour surely goes to @rockhardawesome son of @codinghorror son of @spolsky) but his experiment seems to have been fairly short lived. His parents decided to enforce Twitter’s terms and conditions (which require you to be thirteen or over to use the service) after a couple of spammers started following him, but his last three tweets seem to suggest that he won’t be missing it:

why do you talk about boring stuff people ?????? talk of soming not boring please

Hmmm, there are certain people on Twitter that I could name who really, really need to read that…

27 Aug

Why can’t every call centre let you know how long you’ll be waiting?

There are some smart companies that regularly tell you how many people there are ahead of you when your phone call is in a queue waiting to be answered. I wish every call centre would do that. In fact, I wish it were a legal requirement.

Unfortunately, they are very much in the minority. Most companies just churn out canned platitudes that “Your call is important to us” every minute or so without giving you the faintest indication whether you’ll be on hold for two minutes or half an hour. Of course your call is important to them. Especially if you have called an 0870 number, where the longer they keep you on hold, the more money they earn. If you knew you were going to be kept waiting for twenty minutes at 7.5 pence a minute, you’d no doubt take your business somewhere more efficient.

17 Aug

Web development is hard, m’kay?

There seems to be a bit of intellectual snobbery among some non-web developers, who regard web development as a soft skill, something other than software engineering that’s only for programming wusses who can’t make the grade to get into desktop development. It’s an attitude that is typified by this flabbergastingly arrogant post by Michael Braude (hat tip: Jeff Atwood) who has this to say:

But then, that’s just it, isn’t it?  The reason most people want to program for the web is that they’re not smart enough to do anything else.  They don’t understand compilers, concurrency, 3D or class inheritance.  They haven’t got a clue why I’d use an interface or an abstract class.  They don’t understand: virtual methods, pointers, references, garbage collection, finalizers, pass-by-reference vs. pass-by-value, virtual C++ destructors, or the differences between C# structs and classes.  They also know nothing about process.  Waterfall?  Spiral?  Agile?  Forget it.  They’ve never seen a requirements document, they’ve never written a design document, they’ve never drawn a UML diagram, and they haven’t even heard of a sequence diagram.

Well I have news for you, Michael. This is just plain wrong. As web developers, we have to know and understand almost everything you’ve listed above, and more. Practically the only exceptions in your list are compilers, pointers and virtual C++ destructors, and even then some of us may need to get to grips with them from time to time. We need to understand class inheritance, interfaces, abstract classes, virtual methods, pass by reference versus pass by value, garbage collection and so on or we’re toast.

In fact, some of the concepts he’s listed above are even more critical to the web than other, supposedly superior, forms of development. Take concurrency for instance. This is a particularly difficult concept to understand, debug, work with and test, and on desktop applications which only ever run in single user mode, it is very often a non-issue. You can build desktop applications for years without needing to know jack squat about concurrency, but once you start building web applications, which are multi-user by their very nature, it’s only a matter of time before you get bitten by it.

There are other aspects to web development that we need to understand to a much greater extent than desktop application developers. Security is much more important, for instance. It isn’t exactly a non-issue for desktop applications—web browsers, Microsoft Anything, and so on can be potential attack vectors, especially if they allow scripting—but web servers and web applications are low hanging fruit as far as hackers are concerned, and you have to contend with bots actively probing your site, 24/7, for data injection and cross site scripting vulnerabilities.
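For anyone who hasn’t had to deal with these, here is a rough Python sketch of the two standard defences. I am assuming SQL injection as the kind of data injection in question, and using an in-memory SQLite database as a stand-in for a real one; the table and the function names are made up for illustration.

```python
import html
import sqlite3

# A throwaway in-memory database with one hypothetical user.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, bio TEXT)")
conn.execute("INSERT INTO users VALUES ('alice', 'hello')")


def find_user(name: str) -> list:
    # The ? placeholder passes the value separately from the SQL text,
    # so input such as "' OR '1'='1" is treated as data, not as SQL.
    return conn.execute(
        "SELECT name, bio FROM users WHERE name = ?", (name,)
    ).fetchall()


def render_bio(bio: str) -> str:
    # Escape user-supplied text before it goes into the HTML output,
    # so that markup typed by a visitor is displayed rather than executed.
    return "<p>" + html.escape(bio) + "</p>"
```

Neither defence is complicated on its own. The hard part is applying them consistently to every single input on every single page, because the bots only need you to forget once.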

I could give other examples. We have to understand scalability and caching. We have to understand not only HTML but also the protocols that run the web, such as HTTP, TCP/IP, SMTP, SOAP, and so on. We have to work with several different languages at once—HTML, JavaScript, PHP or C#, SQL, CSS, and XML and its many domain-specific variants.

We need to work in cross-browser environments that very often make test-driven development particularly tricky if not impossible. (How do you unit test CSS positioning, for instance?) We need to understand graphic design, accessibility, and usability. We certainly do need to understand business processes such as agile, waterfall, scrum, and so on. We need to understand UML. Most difficult of all, we need to know how to bridge the gap between the exact literalism of computers and the vagaries of humans and other living beings.

And guess what? Far from being a cuddly toy, JavaScript is a rich, powerful, functional language that, despite its C-style syntax, was originally heavily influenced by Scheme.

To be sure, there are a lot of incompetent web developers out there, and the barrier to entry is ridiculously low, but it is thoroughly incorrect and misleading to say that this is “because you don’t need to know complicated things to be a web developer.” On the contrary, if you’re working on a site of any significant proportions, you need to know some pretty hard stuff. You need to be pretty smart to be a successful web developer.

14 Aug

Please untangle this great long conversation by COP today

Some years ago, I received an e-mail message consisting of the following instruction:

James, please deal with this by COP today.

followed by thirty or so screenfuls of the sender’s correspondence with the other interested parties. Half of this consisted of the typical lengthy disclaimers that corporate e-mail systems add to all outgoing messages by default; of the other half, 90% was of only tangential relevance to what he actually wanted me to do. Even after I had waded through the entire e-mail, I was still unclear as to what exactly he was asking for. On top of that, this was the first time I had ever encountered the cryptic abbreviation “COP” meaning “Close of Play,” so I had no idea what he meant. He had obviously just hurriedly and lazily hit “Forward” in his e-mail client, appended a quick note, and left me to untangle the mess.

Folks, don’t do this. It simply isn’t fair on someone to expect them to spend half an hour wading through thirty screenfuls of noise to filter out your instructions, when you could just as easily spend a couple of minutes including a summary at the top, and trimming out the extraneous, irrelevant waffle if necessary. Unfortunately, this particular individual made a habit of doing this kind of thing, and it annoyed me no end.

Besides, e-mail is not a suitable medium for communicating requirements that need to be dealt with by COP today. Your recipient may not be at their desk, or may have their e-mail client turned off, or may have a hundred other messages that also need to be dealt with by COP today, or the message may have been trapped in their spam filter. If it is time sensitive, a telephone call is more appropriate.

22 Jul

London Victoria’s sneaky back entrance

There is a sneaky back entrance to London Victoria station that I’ve taken to using. It’s near the end of Platform One, and it takes you out onto the corner of Hudson’s Place and Bridge Place. It avoids all the crowds outside the main entrance to the station and on Victoria Street, and because it’s much quieter, you also avoid those really annoying characters spamming you with the London Lite and other similar vacuous drivel wherever you turn on the way back in the evenings.

Obviously you still have to negotiate the crowds inside the station, but once you’re out, the twenty minute walk to Millbank up Vauxhall Bridge Road, Francis Street, Greencoat Place, Greycoat Place and Great Peter Street is about as pleasant and stress free as you can possibly get in central London during the rush hour.

29 Jun

Why SQL Server 2005 database projects in VSTS are a bad idea

I’ve been working a bit lately with a project that uses SQL Server 2005 database projects in Visual Studio 2008 Team System. These are different from the conventional database projects that you get in Visual Studio Professional, since they have extra features that allow you to do schema and data comparisons, and, in theory at least, manage database deployments and migrations.

The idea is that you should be able to design your database using visual designers rather than having to write all that nasty SQL code to script it for you. Visual designers make things so much easier at the planning and initial design stage, and once you are done, you can use the various schema comparison and script generation tools to generate your production database.

The problem comes when you want to manage your database’s entire lifecycle. I’m sure that many developers will have scratched their heads at some stage about this problem. You chop and change your database on your development server, most likely using the visual tools—but how do you reliably replicate these changes on your live server?

The Microsoft approach here is to rely on the schema comparison tools to generate change scripts that you can then run against your database. Some people think it’s a silver bullet. I beg to differ.

The first problem is that while schema comparisons can make a good starting point, the scripts they generate don’t always work properly out of the box, if at all. Some database refactorings simply can’t be done using schema comparison tools. Examples include normalisation refactorings such as moving data from one table or column to another; introducing constraints or changing a column’s data type when you need to do some data cleanup first; or modifying reference data. Even relatively straightforward refactorings—or even, in some cases, no refactoring at all—can be problematic: if your production and development databases get their collation orders out of sync, for instance, the script may refuse to run at all. And the thought that anyone would blindly use this option on the project properties page makes me shiver:

Perform "smart" column name matching when you add or rename a column

In other words, you’re asking it to guess what’s changed.

Testability—and when you’re dealing with an abstraction as leaky as this one, testability is vital—is another issue. Unfortunately, SQL Server 2005 database projects have serious shortcomings in this area too. They do offer unit testing features, but these only apply to the final database. There doesn’t seem to be any way of integration testing your migrations themselves: you don’t have a consistent record of what’s changed in a format that can easily be applied to a blank or reference database, so there’s no way of verifying that you’re getting the expected results when you’re going from “before” to “after.”

Then there are the change scripts generated by schema compare itself. They are a morass of long winded, convoluted, hard to maintain, spaghetti code. Adding a column to a database table involves dropping and re-creating the table: this is understandable if you need to put the column in the middle of the roster, since SQL Server does not have an AFTER clause in the ALTER TABLE statement like MySQL does, but even if you add a column on to the end, it still drops and re-creates the table. A task that needs only one or two lines of code ends up taking eighty. If you rely on schema comparison tools, sooner or later you are going to need to edit your scripts, and when that happens you’ll find that you’d have been quicker just writing the change that you needed by hand in the first place.

All in all, this seems far too leaky an abstraction to give me any confidence in it to manage a database lifecycle. There is simply no substitute for scripting every database migration, checking it into source control, and having your unit tests run them all on a blank, or reference, database, and having some record in your production database of which scripts have been run and which haven’t. And while schema comparison tools may be a life saver if you lose track of things for any reason, they are a very poor alternative to generating your migration scripts by hand.
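To make that concrete, here is a minimal sketch of the idea in Python. It uses an in-memory SQLite database as a stand-in for SQL Server, and the migration names and the schema_migrations table are hypothetical, so treat it as an outline of the approach rather than a finished tool.

```python
import sqlite3

# Hypothetical migrations, in order. In a real project each one would be
# a hand-written script file checked into source control.
MIGRATIONS = [
    ("001_create_customers",
     "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)"),
    ("002_add_email",
     "ALTER TABLE customers ADD COLUMN email TEXT"),
]


def migrate(conn: sqlite3.Connection) -> list[str]:
    """Run every migration not yet recorded, and return the names that ran."""
    # The database itself keeps the record of which scripts have been run.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS schema_migrations (name TEXT PRIMARY KEY)"
    )
    done = {row[0] for row in conn.execute("SELECT name FROM schema_migrations")}
    ran = []
    for name, sql in MIGRATIONS:
        if name in done:
            continue  # already applied to this database
        conn.execute(sql)
        conn.execute("INSERT INTO schema_migrations (name) VALUES (?)", (name,))
        conn.commit()
        ran.append(name)
    return ran
```

Running migrate against a blank database in your tests exercises every script from “before” to “after,” and running it against production applies only the scripts that have not been run there yet. Note that the second migration adds a column in a single statement, with no dropping and re-creating of the table.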

05 Jun

Keyboard switching in IE8 is insane

Earlier this week I took delivery of a new laptop at work. Because I use Colemak with my Microsoft Natural 4000 keyboard and qwerty when the ergonomic option is not available (unfortunately I find flat keyboards and Colemak just don’t mix, though the Colemak/ergo combination is light years ahead in terms of comfort) this means I am likely to be switching to and fro between the two layouts a lot more on the same machine.

Unfortunately, the Windows keyboard switcher is completely insane in this respect. It’s maddening that it sets your keyboard layout separately for each individual window rather than letting you set it across the board for all the windows that you have open, and even more so that it doesn’t give you an option to change this behaviour.

But it gets worse. In IE8 you can set the keyboard layout individually for each tab. This meant that at one point this morning I had Colemak in Twitter and qwerty in the browser’s address bar.

Yes, I know there’s the whole thing about each tab being in a separate process, but Google Chrome has a similar architecture and gets this right. Microsoft: this is a bug, not a feature. Please fix it.

01 Jun

Why would anyone not use source control?

There’s a question over on Stack Overflow that asks if there are any good reasons for not using source control. It’s a question I’ve been racking my brains over for a while now, especially since you do occasionally encounter people who claim they have good reasons not to. The most common such reason that I come across is that they’re a lone developer — an excuse that simply shows that they haven’t a clue what source control actually is.

One person pointed out that physicists are particularly unlikely to use source control:

For the casual programmers – those to whom programming is just a tool, such as many of the people I work with (scientists) – much of the work is hackish and small scale, there may be a dozen other things that are more likely to fail outside the code which could also be eliminated with better practices.

As a colleague put it, “we don’t get published for writing beautiful code”.

Interesting point that. Most programs written by physicists tend to be no more than a few hundred lines long, or even just a Microsoft Excel spreadsheet, and once they’re debugged and working, they usually don’t change. This is of course the exact opposite of business and web programming, where requirements change faster than you can keep up with them. However, you can’t really generalise here. I’d be very surprised, for instance, if NASA doesn’t use some form of source control for the Mars rovers.

Another person gave an answer that was especially worth commenting on:

“For the first 10 years of kernel maintenance, we literally used tarballs and patches, which is a much superior source control management system than CVS is” –Torvalds

If you’ve got quick/easy/automatic backups, you’ve already got 95% of what most of us use VC for. Somebody with a local DVCS repository on his HD but no backups is actually in much worse shape.

Using a VCS does have a real cost, and it’s usually a small one but not always. Every VCS I’ve ever used, I’ve had days where I had to fight with it for hours just to get it to do something that should have been simple.

To those that think “There are no good reasons not to use version control”, where does it end? Must every project have 100% unit test code coverage? Must every project have code reviews? Coding standards? A complete functional spec?

There’s a whole spectrum of programming projects in the world. Not everybody is writing code for the space shuttle. Sometimes being able to diff my code from 11:00am and 11:30am is simply not that important.

Some are merely managing globally-distributed teams of thousands writing operating system kernels.

This is another interesting point — if the Linux kernel managed fine without source control for ten years, why should we use it? In actual fact, the commenter is not entirely correct: the Linux kernel has been under source control since 2002 and Linus Torvalds even wrote his own source control system because he was dissatisfied with all the others that were available at the time. But this is an indictment of CVS in particular, not of source control in general — at the time the choice that you had was between that and something costing an arm and a leg.

This highlights another fairly common reason why people shy away from source control: they perceive it as being more trouble than it’s worth. In recent years, most developers’ first experience of source control has been Subversion. Once you get used to it, Subversion is pretty powerful and works very well, but unfortunately it is not a good example to throw at beginners when telling them they need to use source control. Getting your project under source control in the first place with it is a faff, and I’ve lost count of the number of times that it’s gotten so confused with itself that I’ve had to do a fresh checkout just to get it working properly again. And all those extraneous .svn directories that pollute your project’s filespace can be a major irritation at times.

So what is the best option to convince the naysayers? In a word: Mercurial.

Recently I’ve been playing with some of the new distributed source control systems such as Git and Mercurial, and I get the impression that they are much better suited to new and casual developers than Subversion. They’re a lot easier to use for starters — in combination with visual front ends such as TortoiseHg, you can get your entire project under source control with only three or four mouse clicks. They also have fewer pitfalls and gotchas — you can rename and delete files and directories much more easily without creating a whole lot of confusion, for instance.

Another big advantage of modern distributed source control systems such as Mercurial is that they scale down as well as up. Mercurial creates a single .hg directory in your project’s root which acts as a complete repository in and of itself. For a lone developer this is probably all you need, in tandem with a decent backup strategy, and it even makes it entirely reasonable to get your throwaway scripts under source control. After all, throwaway scripts have a rather nasty habit of not being as throwaway as we first thought they would be.

For development teams, you can have a central repository in addition to the developers’ personal ones, and push the changes to the central server once you’re done. For really big projects, you can have a whole hierarchy of source control servers, with changes being pushed up to the next level once they have passed quality control and whatever other processes you may have in place.

There may have been reasonable excuses for not using source control five years ago on small, trivial projects. But with the latest generation of tools, these excuses are getting flimsier and flimsier every day. Even for physicists.

30 May

Sorry, but I am not a SharePoint expert

I’ve just been taking a look to see who’s following me on Twitter, and it seems that I’ve picked up a handful of SharePoint developers along the way. No doubt this stems from the fact that two of my most popular blog entries are SharePoint posts, almost entirely due to the fact that they feature rather prominently in various Google searches of a SharePointesque nature. It makes me wonder if I’ve unwittingly picked up a bit of a reputation as something of a B-list SharePoint guru.

Well I’m sorry to disappoint you folks, but I’m not one.

Those particular blog entries were actually my initial impressions of the first, and so far the only, SharePoint project I have ever worked on. I had only the vaguest idea of what I was doing and most of my SharePoint efforts at the time were firmly in the “cargo cult” category, as they generally are when you’re plunged in at the deep end with a new, unfamiliar and complex technology, no training, and a tight deadline. Furthermore, neither of them was intended as a knowledge base type article, but as a rant: one of them about insanely over-complicated functionality and the other about an idiotic MSDN knowledge base article that didn’t work.

Now I have no idea what effect this post is going to have on my Feedburner subscriptions and Twitter following. If you fall into that category you’re more than welcome to stick around of course, but I just don’t have anything more to say on the subject. I think my SharePoint skills advanced beyond the cargo cult stage as the project progressed, but since I have not been developing for that particular platform for nearly a year now, I am no longer blogging about it either.