SQL injection is the FizzBuzz of web security

FizzBuzz is the (in)famous interview question designed to filter out totally unqualified candidates for a programming job at a very early stage in the process. The kind who can’t solve even the very simplest programming problems and who would be wasting your time and money if you called them in for an interview after the phone screen.

You can—and should—do something similar for web security. Take a look at this snippet of Python code:

def check_password(username, password):
    db = MySQLdb.connect(passwd=DB_PASSWORD, db=DB_DATABASE)
    c = db.cursor()
    c.execute("SELECT password from tblUsers " +
        "WHERE username = \"" + username + "\"")
    row = c.fetchone()
    if row:
        return row[0] == password
    else:
        return False

Did you spot the problem? If you have any significant experience at all as a web developer, it should stand out to you like a sore thumb. You should be able to spot it in seconds, even if you have never used Python before in your life. Even if you’re the kind of .NET-only developer who insists on being spoon-fed by Microsoft and believes that Python is a dangerous heresy, it should still be glaringly obvious to you.

A couple of years ago, I used a similar question to this one on a number of interview candidates—some of them with twenty or more years of experience at a variety of impressive sounding companies. Yet it shocked me just how many of them required very heavy prompting to see it.

If you’re interviewing a candidate for a software developer role, show them this snippet of code. If they can’t tell you in seconds that it contains a SQL injection vulnerability in line 5, don’t hire them. If they can’t tell you why it’s a SQL injection vulnerability, don’t hire them. No exceptions, no excuses.

SQL injection vulnerabilities are quite frankly inexcusable. Out of all the different kinds of security vulnerabilities that you can get, they are the easiest to understand, the easiest to spot, and the easiest to avoid. Anywhere that you see user input being smashed together with any kind of instructions—SQL, SPARQL, LDAP queries, whatever—it should raise a massive red flag. A candidate who can’t spot security vulnerabilities will write security vulnerabilities (or more likely, copy and paste them from the Internet)—and if they can’t spot the simplest vulnerability of the lot, they’re going to have trouble even understanding more complex ones. And that’s before you even get started on other aspects of programming such as data integrity or performance.

With the rise of ransomware and other increasingly nasty exploits, you simply can not afford to be careless or blasé about IT security these days. As software developers, we all have a responsibility to make sure our knowledge and skills are sharp and up to date in this area, and as a recruiter, you can’t afford to take on anyone who isn’t taking this responsibility seriously.

Finally: there is a second glaring security flaw in this snippet, and candidates should be expected to spot it as well. I shall leave that one as an exercise for the reader.

Some thoughts on CQRS and Event Sourcing

CQRS and Event Sourcing have had quite a bit of a following among some developers in recent years, especially among the “passionate programmers” in the .NET ecosystem, who have been touting it as a Best Practice. They tell me that it can provide significant benefits when it’s done right, even on small projects.

This may or may not be true. The principles behind CQRS and Event Sourcing look fairly straightforward and the benefits seem clear enough. It does add some extra complexity to your architecture, but it does so in order to solve specific problems. If these are problems that the business is actually asking you to solve, and you’re taking care to keep the extra complexity under control, then you may well be doing CQRS right.

Unfortunately, I’ve more experience with projects that have got CQRS wrong than with ones that have got it right. Some of the worst examples that I’ve seen have been so bad that they made me think of Poe’s Law.

“Without a winking smiley or other blatant display of humour, it is impossible to create a parody of enterprise software best practices that someone won’t mistake for the real thing.”—Poe’s Law, Enterprise Edition™

The principles of CQRS and Event Sourcing may be straightforward enough, but the implementation is a different matter altogether. It’s very easy to end up over-complicating it, and combined with the tendency of many .NET developers to split up their solutions into far more projects and far more layers than necessary, the result can be borderline unmaintainable. I’ve seen one example that required thirty-six files to be edited in order to add a single column to a database table, and another that contained no less than thirty-nine projects in a solution consisting of a single three-page CRUD application that was only ever used by one person once a week.

One sure-fire way of doing CQRS and Event Sourcing wrong is failing to get your entire team on side with it. You may think it’s a better mental model, but the people who are going to maintain it after you need to think the same. This includes not only your domain modelling experts but also your infrastructure and DevOps guys, project managers, and anyone else who is likely to be affected by it. DevOps will want to know how you’re going to handle schema migrations, for example. HR and your project managers will want to know what skills to look for when recruiting, and how to interview for them or provide the necessary training.

I know one project that ended up being completely rewritten because the team of CQRS aficionados who built it failed to put any of this in place. They were all contractors who left without training up any of the organisation’s permanent staff in it. The result was that nobody was able to understand the codebase, let alone maintain it, and it ended up being rewritten from scratch as a conventional BOL/BLL/DAL application. The team who did so now automatically puts any CV they receive that has CQRS listed on it straight into the bin.

Another way of doing it wrong is as an exercise in CV driven development. Some people introduce CQRS and Event Sourcing into a project that they’re working on for no reason whatsoever other than to be able to list it on their CV. Don’t even think of doing this. Besides the fact that it’s stealing from the business, you’ll just end up misunderstanding it, building something that isn’t CQRS, that doesn’t solve the problems that CQRS is supposed to solve, and that over-complicates things without providing any benefit whatsoever. Ideally, you should have at least someone on your team with serious experience supporting and maintaining a legacy CQRS app in production before you try using it on greenfield work, so they can tell you what pitfalls to avoid. Like, for example, when not to use it.

I’ve no doubt that CQRS and Event Sourcing can provide significant benefits when used correctly. However, at the end of the day, any technique, tool or practice will have costs as well as benefits, whether in terms of infrastructure, training, support, maintainability, or general increased complexity. These costs need to be justified by benefits that the business is actually asking for, otherwise you will just be wasting everybody’s time and money.

Martin Fowler and feature branches revisited

Fábio asks me this by e-mail:

I found this very interesting article (http://web.archive.org/web/20110721063430/http://jamesmckay.net/2011/07/why-does-martin-fowler-not-understand-feature-branches/) that once was in your blog, and it seems it can only be accessed in the archive, or at least I was not able to find it in the live version of your blog.

Is there a reason for that? Do you still keep the same opinion? If not, can you give me a quick hint of what to read to reach the same conclusion?

In answer to his question, I took the original blog post down in January 2013 along with everything else on my blog, because I wanted to give it a complete reboot. In the years since then, I’ve restored some of them going as far back as 2009,  but I hadn’t restored that particular post, mainly because I felt that I’d been too confrontational with it and it hadn’t made me look good. However, since it’s out there in archive.org, and people are still asking about it, I’ve now restored it for historical reference. Unfortunately, I haven’t restored the comments because I no longer have a backup of them.

A bit of historical background.

I wrote the original post in July 2011. At the time, distributed version control tools such as Git and Mercurial had been around for about six years, but were still very new and unfamiliar to most developers. Most enterprise organisations were deeply entrenched in old-school tools such as Subversion and Team Foundation Server, which made branching and merging much harder and much more confusing than necessary, most corporate developers were terrified of the concept, and vendors of these old-school tools were playing on this fear for all it was worth. This was intensely frustrating to those of us who had actually used Git and Mercurial, could clearly see the benefits, and yet were being stonewalled by 5:01 dark matter colleagues and managers who didn’t want to know. To us, Subversion was The Enemy, and to see someone of Martin Fowler’s stature apparently siding with the enemy was maddening.

On the other hand, these tools weren’t yet quite enterprise ready, with only weak Windows support and rudimentary GUIs. SourceTree was not yet a thing. Furthermore, the best practices surrounding them were still being thrashed out and debated by early adopters in the blogosphere and at conferences. In a sense, nobody properly understood feature branches yet. If you read all the discussions and debates at the time, we didn’t even agree about exactly what feature branches were. Martin Fowler, Jez Humble and others were working with the assumption that they referred to branches lasting several days or even weeks, while people such as Adam Dymitruk and myself went by a much broader definition that basically amounted to what we would call a pull request today, albeit with the proviso that they should be kept as short as possible.

These days, of course, everybody (or at least, everybody who’s worth working for) uses Git, so that particular question is moot, and we can now freely discuss best practices around branching and merging as mature adults.

The state of the question in 2017.

I’m generally somewhat more sympathetic to Martin Fowler’s position now than I was six years ago. Most of his concerns about feature branches are valid ones. Long-lived feature branches can be difficult to work with, especially for inexperienced teams and badly architected codebases, where big bang merges can be a significant problem. Feature branches can also be problematic for Continuous Integration and opportunistic refactoring, so they should very much be the exception rather than the rule. Feature toggles can also provide considerable benefits. You can use them for A/B testing, country-specific features, premium features for paying customers, and so on and so forth. I even started writing my own .NET-based web framework built around feature toggles, though as I’m no longer working with .NET, it’s not being actively developed.

However, many of my original points still stand. Short-lived branches, such as pull requests, are fine, and should in fact be the default, because code should be reviewed before it is integrated, not after the fact. Furthermore, if you’re using feature toggles solely as a substitute for branching and merging, you are releasing code into production that you know for a fact to be immature, untested, buggy, unstable and not fit for purpose. Your feature toggles are supposed to isolate this code of course, but there is always a risk that the isolation could be incomplete, or that the toggle could be flipped prematurely by mistake. When feature branches go wrong, they only go wrong in your development environment, and the damage is relatively limited. When feature toggles go wrong, on the other hand, they go wrong in production—sometimes with catastrophic results.

Just how catastrophic? The feature toggles article on Martin Fowler’s own website itself cites an example where badly implemented feature toggles cost one company $460 million in just 45 minutes. The company concerned, we are told, “went from being the largest trader in US equities and a major market maker in the NYSE and NASDAQ to bankrupt.” To be fair, there were other problems—they relied extensively on manual deployment processes, for example—but it is still a cautionary tale for anyone who thinks feature toggles should always be viewed as a Best Practice without exception.

Of course, not all feature toggles carry this level of risk, and in many cases, the effects of inadvertent exposure will be mostly harmless. In these cases, feature toggles will indeed be the better option, not least because they’re easier to work with. But in general, when deciding whether to use a feature branch or a feature toggle, always ask yourself the question: what would be the damage that this feature would cause if it were activated prematurely? If it’s not a risk you’re prepared to take, your code is better off on a separate branch.

Productivity suggestion: keep lab notes

I’ve recently been getting into various discussions about science and faith with friends at church and on Facebook, and one of the things I’ve been doing is explaining to them exactly how the scientific method works. This got me thinking—there are some aspects of the scientific method that we don’t always apply to software development, that maybe we should.

People with no scientific training often don’t appreciate just how rigorous and exacting the scientific method is. It has a lot of very strict protocols to ensure that research meets stringent standards of quality control. One of these protocols is keeping detailed laboratory notes. In other words, you document everything you do. In meticulous detail. As you do it—not after the fact.

There are several benefits to keeping a detailed record of everything you do in this way. It helps you to keep focused on the task at hand. It gives you something more constructive to do during slow edit/compile/test loops than reading Reddit. It helps you to pick up where you left off after your lunch break or when you get into work first thing in the morning. It jogs your memory for your daily stand-up meeting. It provides you with a searchable audit trail that you can consult to figure out where to find things, why things were done in a certain way, or what exactly happened when you tried X. It provides source material for when you need to write up the documentation proper. And it covers your back. For example, if you’re working from home, it provides evidence—should you need it—that you were actually working and not slacking off.

Of course, if you’re a fan of the Agile Manifesto, you are probably cringing at this idea. Aren’t we supposed to favour working software over comprehensive documentation? In fact, the whole idea no doubt sounds like anathema to many software developers, who hate writing documentation with a passion.

But if we were research scientists rather than software developers, we would be doing this as a matter of course, and we wouldn’t think twice about it. Besides, as Martin Fowler says, if it hurts, do it more often.

Sharing data between Terraform configurations

I’ve been doing quite a lot of work with Terraform lately, Hashicorp’s excellent infrastructure-as-code software, to provision some infrastructure on AWS.

When designing your infrastructure, it is often necessary to break it up into different configurations. For example, you may want to have one set of configuration for your data (databases, S3 buckets, EBS volumes, and so on), and another set of configuration for various compute resources (EC2 instances, ECS containers, AWS Lambda scripts, and so on) — for example, to allow you to tear down your development environment when you are not using it in order to reduce costs, while preserving your data. Or, you may want to reference your DNS hosted zones in multiple different configurations.

When you do this, you will inevitably need to pass state information from one configuration through to the next — for example, EBS volume identifiers, IP addresses, and so on. If you are using remote state, you can do this quite simply using a remote state data source.

In this example, I’ll assume that you have configured Terraform to use remote state hosted in an S3 bucket, that you are using a separate configuration called “dns” for your DNS hosted zones, and that you wish to reference them in a different configuration.

First declare the values that you wish to reference as outputs:

output "zone_id" {
    value = "${module.dns.zone_id}"
}

output "zone_name" {
    value = "${module.dns.zone_name}"
}

In your target configuration, declare the remote state data source:

data "terraform_remote_state" "dns" {
    backend = "s3"
    config {
        bucket = "name-of-your-S3-bucket"
        key = "dns"
        region = "eu-west-1"
    }
}

Then, whenever you need to use any of the remote values, you can reference them as follows:

resource "aws_route53_record" "www" {
   zone_id = "${data.terraform_remote_state.dns.zone_id}"
   name = "www.${data.terraform_remote_state.dns.zone_name}"
   type = "A"
   ttl = "300"
   records = ["${aws_eip.lb.public_ip}"]
}

Note that you will need to run terraform apply on the source environment, even if you have not made any changes, before you attempt to do anything with the target environment. This is because you need to make sure that the outputs have been written into the Terraform state file, and are available to the other modules that depend on them.

Hat tip: Thanks to Paul Stack for pointing me in the right direction here.