Martin Fowler and feature branches revisited

Fábio asks me this by e-mail:

I found this very interesting article (http://web.archive.org/web/20110721063430/http://jamesmckay.net/2011/07/why-does-martin-fowler-not-understand-feature-branches/) that once was in your blog, and it seems it can only be accessed in the archive, or at least I was not able to find it in the live version of your blog.

Is there a reason for that? Do you still keep the same opinion? If not, can you give me a quick hint of what to read to reach the same conclusion?

In answer to his question: I took the original blog post down in January 2013, along with everything else on my blog, because I wanted to give it a complete reboot. In the years since then I’ve restored some of my old posts, going as far back as 2009, but I hadn’t restored that particular one, mainly because I felt I’d been too confrontational in it and it hadn’t made me look good. However, since it’s out there on archive.org and people are still asking about it, I’ve now restored it for historical reference. Unfortunately, I haven’t restored the comments, because I no longer have a backup of them.

A bit of historical background.

I wrote the original post in July 2011. At the time, distributed version control tools such as Git and Mercurial had been around for about six years, but were still very new and unfamiliar to most developers. Most enterprise organisations were deeply entrenched in old-school tools such as Subversion and Team Foundation Server, which made branching and merging much harder and much more confusing than necessary, most corporate developers were terrified of the concept, and vendors of these old-school tools were playing on this fear for all it was worth. This was intensely frustrating to those of us who had actually used Git and Mercurial, could clearly see the benefits, and yet were being stonewalled by 5:01 dark matter colleagues and managers who didn’t want to know. To us, Subversion was The Enemy, and to see someone of Martin Fowler’s stature apparently siding with the enemy was maddening.

On the other hand, these tools weren’t yet quite enterprise ready, with only weak Windows support and rudimentary GUIs. SourceTree was not yet a thing. Furthermore, the best practices surrounding them were still being thrashed out and debated by early adopters in the blogosphere and at conferences. In a sense, nobody properly understood feature branches yet. If you read all the discussions and debates at the time, we didn’t even agree about exactly what feature branches were. Martin Fowler, Jez Humble and others were working with the assumption that they referred to branches lasting several days or even weeks, while people such as Adam Dymitruk and myself went by a much broader definition that basically amounted to what we would call a pull request today, albeit with the proviso that they should be kept as short as possible.

These days, of course, everybody (or at least, everybody who’s worth working for) uses Git, so that particular question is moot, and we can now freely discuss best practices around branching and merging as mature adults.

The state of the question in 2017.

I’m generally somewhat more sympathetic to Martin Fowler’s position now than I was six years ago. Most of his concerns about feature branches are valid ones. Long-lived feature branches can be difficult to work with, especially for inexperienced teams working on badly architected codebases, where big-bang merges can be a significant problem. Feature branches can also be problematic for Continuous Integration and opportunistic refactoring, so they should very much be the exception rather than the rule. Feature toggles can also provide considerable benefits: you can use them for A/B testing, country-specific features, premium features for paying customers, and so on. I even started writing my own .NET-based web framework built around feature toggles, though as I’m no longer working with .NET, it’s not being actively developed.

However, many of my original points still stand. Short-lived branches, such as pull requests, are fine, and should in fact be the default, because code should be reviewed before it is integrated, not after the fact. Furthermore, if you’re using feature toggles solely as a substitute for branching and merging, you are releasing code into production that you know for a fact to be immature, untested, buggy, unstable and not fit for purpose. Your feature toggles are supposed to isolate this code of course, but there is always a risk that the isolation could be incomplete, or that the toggle could be flipped prematurely by mistake. When feature branches go wrong, they only go wrong in your development environment, and the damage is relatively limited. When feature toggles go wrong, on the other hand, they go wrong in production—sometimes with catastrophic results.

Just how catastrophic? The feature toggles article on Martin Fowler’s own website cites an example where badly implemented feature toggles cost one company $460 million in just 45 minutes. The company concerned, we are told, “went from being the largest trader in US equities and a major market maker in the NYSE and NASDAQ to bankrupt.” To be fair, there were other problems—they relied extensively on manual deployment processes, for example—but it is still a cautionary tale for anyone who thinks feature toggles should always be viewed as a Best Practice without exception.

Of course, not all feature toggles carry this level of risk, and in many cases, the effects of inadvertent exposure will be mostly harmless. In these cases, feature toggles will indeed be the better option, not least because they’re easier to work with. But in general, when deciding whether to use a feature branch or a feature toggle, always ask yourself the question: what would be the damage that this feature would cause if it were activated prematurely? If it’s not a risk you’re prepared to take, your code is better off on a separate branch.

Productivity suggestion: keep lab notes

I’ve recently been getting into various discussions about science and faith with friends at church and on Facebook, and one of the things I’ve been doing is explaining to them exactly how the scientific method works. This got me thinking—there are some aspects of the scientific method that we don’t always apply to software development, that maybe we should.

People with no scientific training often don’t appreciate just how rigorous and exacting the scientific method is. It has a lot of very strict protocols to ensure that research meets stringent standards of quality control. One of these protocols is keeping detailed laboratory notes. In other words, you document everything you do. In meticulous detail. As you do it—not after the fact.

There are several benefits to keeping a detailed record of everything you do in this way. It helps you to keep focused on the task at hand. It gives you something more constructive to do during slow edit/compile/test loops than reading Reddit. It helps you to pick up where you left off after your lunch break or when you get into work first thing in the morning. It jogs your memory for your daily stand-up meeting. It provides you with a searchable audit trail that you can consult to figure out where to find things, why things were done in a certain way, or what exactly happened when you tried X. It provides source material for when you need to write up the documentation proper. And it covers your back. For example, if you’re working from home, it provides evidence—should you need it—that you were actually working and not slacking off.

Of course, if you’re a fan of the Agile Manifesto, you are probably cringing at this idea. Aren’t we supposed to favour working software over comprehensive documentation? In fact, the whole idea no doubt sounds like anathema to many software developers, who hate writing documentation with a passion.

But if we were research scientists rather than software developers, we would be doing this as a matter of course, and we wouldn’t think twice about it. Besides, as Martin Fowler says, if it hurts, do it more often.

Sharing data between Terraform configurations

I’ve been doing quite a lot of work lately with Terraform, HashiCorp’s excellent infrastructure-as-code tool, to provision some infrastructure on AWS.

When designing your infrastructure, it is often necessary to break it up into separate configurations. For example, you may want one configuration for your data (databases, S3 buckets, EBS volumes, and so on) and another for your compute resources (EC2 instances, ECS containers, AWS Lambda functions, and so on), so that you can tear down your development environment when you are not using it to reduce costs, while preserving your data. Or you may want to reference your DNS hosted zones from several different configurations.

When you do this, you will inevitably need to pass state information from one configuration through to the next — for example, EBS volume identifiers, IP addresses, and so on. If you are using remote state, you can do this quite simply using a remote state data source.

In this example, I’ll assume that you have configured Terraform to use remote state hosted in an S3 bucket, that you are using a separate configuration called “dns” for your DNS hosted zones, and that you wish to reference them in a different configuration.

First, in your source (“dns”) configuration, declare the values that you wish to reference as outputs:

output "zone_id" {
    value = "${module.dns.zone_id}"
}

output "zone_name" {
    value = "${module.dns.zone_name}"
}

In your target configuration, declare the remote state data source:

data "terraform_remote_state" "dns" {
    backend = "s3"
    config {
        # These settings must match the remote state configuration
        # of the source ("dns") environment
        bucket = "name-of-your-S3-bucket"
        key    = "dns"
        region = "eu-west-1"
    }
}

Then, whenever you need to use any of the remote values, you can reference them as follows:

resource "aws_route53_record" "www" {
    zone_id = "${data.terraform_remote_state.dns.zone_id}"
    name    = "www.${data.terraform_remote_state.dns.zone_name}"
    type    = "A"
    ttl     = "300"
    records = ["${aws_eip.lb.public_ip}"]
}

Note that you will need to run terraform apply on the source configuration, even if you have not made any changes to it, before you attempt to do anything with the target configuration. This is to make sure that the outputs have been written to the Terraform state file and are available to the other configurations that depend on them.

Hat tip: Thanks to Paul Stack for pointing me in the right direction here.

Password hashing as a microservice with Docker Compose

So you’ve stopped breaking the law by storing your passwords in plain text, and you’re aware that MD5 doesn’t cut it any more in a world of GPU-based cracking tools, so you’ve started using bcrypt instead. But can you make your passwords any more secure? Can you even protect your users who have chosen passwords such as “password” or “qwerty” or “123456” or “letmein”? And since bcrypt is so computationally expensive, how can you stop an attempt to brute force your admin password from bringing your entire site down?

Here’s what you can do. Use a microservice.

This gives you two advantages:

  1. Your passwords are stored in a separate database from your user accounts. The “password” field in your users table only contains a randomly allocated identifier; without the password database as well, this tells an attacker nothing. Nada. Nichts. Bupkis.
  2. Your password hash algorithm can be scaled independently from the rest of your site. It can be run on completely different hardware. A brute force attack on your admin site won’t end up DoS-ing everything else, no matter how slow you make the hash.

Over the past week or so I’ve been experimenting with an implementation of this approach. You can find it on GitHub here.

How it works

Since it’s 2016 and Docker is all the rage, the password hashing microservice, the main web application, and their respective databases, are all in separate Docker containers, managed as a group using Docker Compose and defined in the docker-compose.yml file. Tools such as Docker Swarm or Kubernetes make it a breeze to scale up and down as needed.

The password hashing microservice is in /passwords in the Git repo. It is implemented as a very basic Flask application in passwords/serve.py, storing the passwords in a MongoDB database. It exposes three methods:

  • POST /password: hash a password, allocate it a random ID, and save it.
  • POST /password/test/<id>: test a password against the saved hash with the given ID.
  • DELETE /password/<id>: delete the hash with the given ID from the database.

The POST methods both take the password in a form field called password. The password ID returned by the first method is a randomly generated GUID.
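
For the sake of illustration, here’s a minimal sketch of what a service like this might look like. The real implementation is in passwords/serve.py in the repo; the collection name, hostnames and handler names below are just placeholders, and it assumes the Flask, bcrypt and pymongo packages:

import uuid

import bcrypt
from flask import Flask, request, abort
from pymongo import MongoClient

app = Flask(__name__)

# The MongoDB hostname is an assumption; in the Docker Compose setup it would
# be the name of the database container.
db = MongoClient("mongodb://passwords-db:27017/").passwords

@app.route("/password", methods=["POST"])
def create_password():
    # Hash the password, allocate it a random GUID, and save the pair.
    password = request.form["password"].encode("utf-8")
    hashed = bcrypt.hashpw(password, bcrypt.gensalt())
    key = str(uuid.uuid4())
    db.hashes.insert_one({"_id": key, "hash": hashed})
    return key, 201

@app.route("/password/test/<id>", methods=["POST"])
def test_password(id):
    # Check a candidate password against the stored hash for this GUID.
    record = db.hashes.find_one({"_id": id})
    if record is None:
        abort(404)
    password = request.form["password"].encode("utf-8")
    if bcrypt.checkpw(password, record["hash"]):
        return "OK", 200
    return "Forbidden", 403

@app.route("/password/<id>", methods=["DELETE"])
def delete_password(id):
    # Remove the stored hash from the password database.
    db.hashes.delete_one({"_id": id})
    return "", 204

You could smoke-test a service along these lines with curl, for example: curl -d "password=letmein" http://localhost:5000/password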

The sample web application is a Django application in /web in the Git repo, backed by a PostgreSQL database. The file web/webapp/security.py contains a custom password hasher which saves the GUID returned from the microservice in the password column of the users table. The password ID thereby acts as a proxy for the password hash, but since it is randomly generated rather than derived from the password, the password cannot be brute forced from it.
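
To give an idea of what the Django side involves, here’s a sketch of a custom hasher along the same lines. It isn’t the actual contents of web/webapp/security.py; the class name and service URL are assumptions, and it uses the requests library together with Django’s BasePasswordHasher:

import requests
from django.contrib.auth.hashers import BasePasswordHasher

# The service address is an assumption; in the Docker Compose setup it would
# point at the load balancer in front of the passwords-svc containers.
PASSWORD_SERVICE = "http://passwords-svc:5000"

class MicroservicePasswordHasher(BasePasswordHasher):
    algorithm = "microservice"

    def salt(self):
        # Salting is the microservice's responsibility, not Django's.
        return ""

    def encode(self, password, salt):
        # Store only the GUID returned by the service, never a hash.
        response = requests.post(PASSWORD_SERVICE + "/password",
                                 data={"password": password})
        response.raise_for_status()
        return "%s$%s" % (self.algorithm, response.text.strip())

    def verify(self, password, encoded):
        # Ask the service whether the candidate password matches.
        algorithm, key = encoded.split("$", 1)
        response = requests.post(PASSWORD_SERVICE + "/password/test/" + key,
                                 data={"password": password})
        return response.status_code == 200

You would then list a hasher like this first in Django’s PASSWORD_HASHERS setting so that it is used for all new passwords.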

The docker-compose.yml file also puts a Træfɪk proxy server between the web application and the microservice to act as a load balancer. This allows you to scale the microservice as necessary simply by typing docker-compose scale passwords-svc=<n> where <n> is the number of containers you wish to spin up.

Ramping up the security even further…

To make things even harder for an attacker, the hashing service combines both the password and its identifier with application-specific secret strings before saving them into the password database. These secrets will frustrate anyone who manages to get hold of both the user database and the password database: without their values, you will not be able to brute force the passwords, nor will you be able to tell which password hashes correspond to which users.

These secrets are stored in environment variables called PASSWORD_SECRET and KEY_SECRET, which in this example are defined in the docker-compose.yml file. Naturally, you should change them in your own application to other, similarly complex, random strings. If possible, you should load them from a dedicated secret store such as HashiCorp Vault.
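
The exact scheme is in the repo, but to give a flavour of the idea, something along these lines would do the job; the function names here are purely illustrative:

import hashlib
import hmac
import os

PASSWORD_SECRET = os.environ["PASSWORD_SECRET"].encode("utf-8")
KEY_SECRET = os.environ["KEY_SECRET"].encode("utf-8")

def pepper_password(password):
    # Mix the application secret into the password before it is bcrypt-hashed,
    # so the stored hashes are useless without the secret.
    return hmac.new(PASSWORD_SECRET, password.encode("utf-8"),
                    hashlib.sha256).hexdigest()

def obscure_key(key):
    # Derive the password database's lookup key from the GUID, so the GUID in
    # the users table cannot be matched directly against the password database.
    return hmac.new(KEY_SECRET, key.encode("utf-8"),
                    hashlib.sha256).hexdigest()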

Unless the attacker has access to the user database and the password database and both these secret keys, they aren’t going to be able to crack any of your users’ passwords. Even if your users have used silly passwords such as “password” or “123456” or “arsenal” or “chelsea” their credentials will still be safe.

This is what microservices are for.

There’s a lot of debate going on at the moment about how to use microservices. Should you keep your application as a single monolith and only extract out certain tiny pieces of functionality, or should you break it down into a large number of separate microservices? In each case, just how much should each microservice handle?

Password hashing, as we’ve implemented it here, is a great example of where microservices are appropriate, and of how much each one should try to achieve. We’ve identified some very specific problems; there is no speculation, guesswork or YAGNI involved. Most importantly, we’ve seen that they provide well-defined, real-world benefits.

The Continuous Retrospective

The sprint retrospective is one of the most important ceremonies in Scrum. At the end of every sprint, the team meets to discuss what went well, what went badly, and how to improve its processes and working practices.

Effective as it may be, the end-of-sprint retrospective isn’t perfect. It’s very common to encounter problems during the sprint, only to get to the retrospective and find that you’ve forgotten what they were. On the other hand, you can end up wasting a lot of time complaining about things that you aren’t able to do anything about.

To resolve these problems, we have recently implemented a “continuous retrospective” within our team. Instead of waiting until the end of the sprint, we can highlight problems as and when they arise, and action them accordingly.


Besides the fact that issues are less likely to be forgotten, there are several other advantages to this approach. It is less formal, and a better use of people’s time. It improves communication. It can also improve visibility and transparency, allowing people to see that you are aware of, and addressing, issues that you are encountering.

A different approach.

This is a different approach from your traditional, end-of-sprint retrospective. For starters, it is less meeting-oriented. You simply set up a kanban-style board in your team’s working area, where you can add Post-It notes for things that are going well, things that need action, and things that need further discussion, perhaps to reach a team consensus.

Write things down as soon as you think of them — don’t wait until the end of the sprint, or even until the daily stand-up meeting. You should aim to discuss them with the rest of the team as soon as possible, and reach a consensus on what action to take. Possible actions include:

  • Creating backlog tasks for the items you bring up
  • Updating documentation
  • Communicating problems or findings with other teams or the wider software development community as appropriate (e.g. through email, Twitter, blogs, or forums)
  • Flagging issues to raise with senior management

Don’t just limit items to things that are within your team’s remit. Anything that needs attention or can be improved is fair game — the appropriate action may be to raise it with other teams or senior management, for example.

Do you still need an end-of-sprint retrospective?

Maybe, or maybe not. It depends on your team.

Some teams may find that a continuous retrospective renders the end-of-sprint retrospective mostly superfluous. Others may find that the end-of-sprint retrospective still provides value — for example, by allowing you to check your progress, or to prioritise issues that have proven to be more complex to address.

But regardless of whether we retain the end-of-sprint retrospective or not, our goal is to make the continuous retrospective, carried out while the sprint is in progress, the main driver of our team’s improvement. Agility is all about keeping your feedback loops as tight as possible, and like continuous integration and continuous delivery, continuous retrospectives are another way to achieve that end.

Thanks to Steven Wade, Vinitha Devadas and Dan Barrett for their contributions to this post.