
Martin Fowler and feature branches revisited

Fábio asks me this by e-mail:

I found this very interesting article (http://web.archive.org/web/20110721063430/https://jamesmckay.net/2011/07/why-does-martin-fowler-not-understand-feature-branches/) that once was in your blog, and it seems it can only be accessed in the archive, or at least I was not able to find it in the live version of your blog.

Is there a reason for that? Do you still keep the same opinion? If not, can you give me a quick hint of what to read to reach the same conclusion?

In answer to his question: I took the original post down in January 2013, along with everything else on my blog, because I wanted to give the site a complete reboot. In the years since then I’ve restored a number of old posts, going as far back as 2009, but not that particular one, mainly because I felt I’d been too confrontational in it and it hadn’t made me look good. However, since it’s out there on archive.org and people are still asking about it, I’ve now restored it for historical reference. Unfortunately, I haven’t restored the comments, because I no longer have a backup of them.

A bit of historical background.

I wrote the original post in July 2011. At the time, distributed version control tools such as Git and Mercurial had been around for about six years, but were still very new and unfamiliar to most developers. Most enterprise organisations were deeply entrenched in old-school tools such as Subversion and Team Foundation Server, which made branching and merging much harder and much more confusing than necessary, most corporate developers were terrified of the concept, and vendors of these old-school tools were playing on this fear for all it was worth. This was intensely frustrating to those of us who had actually used Git and Mercurial, could clearly see the benefits, and yet were being stonewalled by 5:01 dark matter colleagues and managers who didn’t want to know. To us, Subversion was The Enemy, and to see someone of Martin Fowler’s stature apparently siding with the enemy was maddening.

On the other hand, these tools weren’t yet quite enterprise ready, with only weak Windows support and rudimentary GUIs. SourceTree was not yet a thing. Furthermore, the best practices surrounding them were still being thrashed out and debated by early adopters in the blogosphere and at conferences. In a sense, nobody properly understood feature branches yet. If you read all the discussions and debates at the time, we didn’t even agree about exactly what feature branches were. Martin Fowler, Jez Humble and others were working with the assumption that they referred to branches lasting several days or even weeks, while people such as Adam Dymitruk and myself went by a much broader definition that basically amounted to what we would call a pull request today, albeit with the proviso that they should be kept as short as possible.

These days, of course, everybody (or at least, everybody who’s worth working for) uses Git, so that particular question is moot, and we can now freely discuss best practices around branching and merging as mature adults.

The state of the question in 2017.

I’m generally somewhat more sympathetic to Martin Fowler’s position now than I was six years ago. Most of his concerns about feature branches are valid. Long-lived feature branches can be difficult to work with, especially for inexperienced teams and badly architected codebases, where big bang merges can be a significant problem. They can also be problematic for Continuous Integration and opportunistic refactoring, so long-lived branches should very much be the exception rather than the rule. Feature toggles can also provide considerable benefits: you can use them for A/B testing, country-specific features, premium features for paying customers, and so on. I even started writing my own .NET-based web framework built around feature toggles, though as I’m no longer working with .NET, it’s not being actively developed.
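For readers who haven’t worked with them: at its simplest, a feature toggle is just a conditional guard around unfinished code, typically driven by configuration. Here is a minimal sketch in Python; the flag name, the pricing functions and the order structure are all hypothetical.

    import os

    def legacy_pricing_engine(order):
        # The proven code path that customers currently rely on.
        return order["quantity"] * order["unit_price"]

    def new_pricing_engine(order):
        # Unfinished work in progress. It still ships to production,
        # hidden behind the toggle rather than parked on a branch.
        raise NotImplementedError("still under development")

    def price_order(order):
        # The toggle: one environment variable decides which path runs.
        if os.environ.get("USE_NEW_PRICING") == "1":
            return new_pricing_engine(order)
        return legacy_pricing_engine(order)

    print(price_order({"quantity": 3, "unit_price": 9.99}))

Note that the unfinished path is physically present in the production binary; the only thing standing between it and your customers is that one conditional, which is precisely the risk discussed below.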

However, many of my original points still stand. Short-lived branches, such as pull requests, are fine, and should in fact be the default, because code should be reviewed before it is integrated, not after the fact. Furthermore, if you’re using feature toggles solely as a substitute for branching and merging, you are releasing code into production that you know for a fact to be immature, untested, buggy, unstable and not fit for purpose. Your feature toggles are supposed to isolate this code of course, but there is always a risk that the isolation could be incomplete, or that the toggle could be flipped prematurely by mistake. When feature branches go wrong, they only go wrong in your development environment, and the damage is relatively limited. When feature toggles go wrong, on the other hand, they go wrong in production—sometimes with catastrophic results.

Just how catastrophic? The feature toggles article on Martin Fowler’s own website cites an example where badly implemented feature toggles cost one company $460 million in just 45 minutes. The company concerned, we are told, “went from being the largest trader in US equities and a major market maker in the NYSE and NASDAQ to bankrupt.” To be fair, there were other problems—they relied extensively on manual deployment processes, for example—but it is still a cautionary tale for anyone who thinks feature toggles should always be viewed as a Best Practice without exception.

Of course, not all feature toggles carry this level of risk, and in many cases the effects of inadvertent exposure will be mostly harmless. In those cases, feature toggles will indeed be the better option, not least because they’re easier to work with. But in general, when deciding between a feature branch and a feature toggle, always ask yourself: how much damage would this feature do if it were activated prematurely? If that’s not a risk you’re prepared to take, your code is better off on a separate branch.

Why you should use a general purpose scripting language for your build scripts

For quite some time now, given the choice, I’ve opted to write my build scripts in plain Python using nothing but the standard libraries. I personally believe (with good reason) that build scripts are best served by a general purpose scripting language such as this, and that domain-specific languages or frameworks for build scripting have little if anything to offer. Most people use DSLs such as NAnt, MSBuild, Rake, Grunt or Gradle for their build scripts simply because they believe That Is How You Are Supposed To Do It, but in most cases it isn’t necessary, and in many cases it is even counterproductive.

In this post, I’d like to explain the reasons why I recommend using general purpose scripting languages and avoiding specialised build frameworks, and to address some commonly held misconceptions about build scripts in general. If I’m saying a lot about MSBuild, that’s because it’s the tool that I have the most experience with; however, many of my points apply to other tools as well, including those that aren’t XML-based.

1. Build scripts are code, not configuration.

Most build frameworks view build scripts as configuration first and code second. This is wrong. Badly wrong.

You see this in the way they adopt a declarative, rather than an imperative, approach, defining your script in terms of “targets” or “tasks” with dependencies between them. This approach can make sense in some situations — in particular, where you have a large number of similar tasks in a complex dependency graph, and you need to allow the build engine to determine the order in which they are run. This is the case, for example, with Visual Studio solutions that consist of a large number of projects.

But top level build scripts don’t work that way. At the topmost level, build scripts are inherently imperative in nature, so a declarative approach doesn’t make a whole lot of sense. Your typical build script consists of a sequence of diverse tasks which are run in an order that you define, or sometimes in a loop based on values in a collection. For example, it may look something like this:

  • Fetch the latest version of your code from source control
  • Delete any leftover files from the previous build
  • Write a file to disk containing version information
  • Fetch your project’s dependencies (NuGet packages, for example)
  • Compile your project
  • Bundle and minify your assets
  • Run your unit tests
  • Run your integration tests
  • Prepare installation packages
  • Deploy your build to the appropriate servers
  • Prepare reports (e.g. code coverage)

Writing your build script imperatively, with each of these steps as a function call in the top level of your code, allows you to see, at a glance, what your script is doing. On the other hand, writing them declaratively, with each task specifying its own dependencies, often requires you to jump around all over your build script just to get a handle on things.
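To make that concrete, here is a minimal sketch of such a top-level script in plain Python. The solution name, tool invocations and step bodies are illustrative stand-ins rather than a recommendation; the point is the shape of the script, a straight top-to-bottom sequence of function calls.

    #!/usr/bin/env python
    """Build script: each step is an ordinary function, run in the order written."""
    import shutil
    import subprocess

    def clean():
        # Delete any leftover files from the previous build.
        shutil.rmtree("build", ignore_errors=True)

    def restore_packages():
        # Fetch the project's dependencies.
        subprocess.check_call(["nuget", "restore", "MySolution.sln"])

    def compile_solution():
        subprocess.check_call(["msbuild", "MySolution.sln", "/p:Configuration=Release"])

    def run_unit_tests():
        subprocess.check_call(["nunit3-console", "build/MyProject.Tests.dll"])

    if __name__ == "__main__":
        # The whole build reads top to bottom; no dependency graph to untangle.
        clean()
        restore_packages()
        compile_solution()
        run_unit_tests()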

One important thing that build scripts need is proper control flow: conditions, loops, iteration over arrays, parametrised subroutines, variables, and so on. You simply can’t represent these properly in a declarative language. Sure, you can define tasks to handle some of these requirements, such as regex-based find and replace, but that will never be as clear as a purely imperative approach.
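For instance, looping a parametrised step over a list of targets is trivial in a scripting language. A short sketch, with hypothetical server names and a hypothetical deployment command:

    import subprocess

    def deploy(package, server, dry_run=False):
        # A parametrised subroutine: one definition, reused for every target.
        cmd = ["scp", package, server + ":/opt/app/"]
        if dry_run:
            print("would run:", " ".join(cmd))
        else:
            subprocess.check_call(cmd)

    # A plain loop and a condition: one-liners here, painful in XML.
    for server in ["web1.example.com", "web2.example.com"]:
        deploy("build/app.tar.gz", server, dry_run=True)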

I’ve never come across a definitive explanation of why build frameworks should all be based on the declarative, configuration-like approach of tasks with dependencies, other than a vague, hand-waving and unsubstantiated claim that “people prefer it that way.” Personally, I think it’s more likely that people saw that this was how make, the granddaddy of build tools, was designed, assumed it was a Best Practice, and blindly copied it.

2. Build scripts need to be maintained.

Build scripts don’t tend to change very often — perhaps once every three to six months — so it’s tempting to view them as something you write once and forget about completely. But they do change, and when they do, readability and maintainability are critical. A well-written build script can make the difference between a change taking half an hour and taking half a sprint; between it working as intended and being riddled with bugs.

This means, of course, that XML-based build languages such as MSBuild or NAnt are a very bad idea. This has nothing to do with a lack of “cool”: it’s a lack of readability and maintainability, pure and simple. XML simply isn’t capable of expressing the control flow structures you need in a succinct, readable manner. MSBuild is particularly bad here: its lack of support for looping, iteration or parametrised subroutines makes it difficult, if not impossible, to write anything more complex than the simplest of build scripts without resorting to painful amounts of copy-and-paste code. Since DRY is a vital discipline in keeping your code maintainable, anything that forces you to violate it as much as MSBuild does should be avoided with extreme prejudice.

To mitigate the problem, Ant, NAnt and MSBuild all allow you to embed snippets of code in other languages, such as PowerShell. But quite apart from the fact that the syntax for doing so is so verbose and cumbersome that it’s scarcely worth it, this raises an obvious question: why not just use PowerShell end to end instead?

3. Build scripts need to be run from the command line.

It’s all too common to find build scripts that are very tightly integrated with the Continuous Integration server. This usually happens when you have vast swathes of configuration settings in TeamCity, TFS, Jenkins or what have you. This causes two problems: first, you have a lot of important and potentially breaking detail that isn’t checked into source control; second, it becomes very difficult if not impossible to run your build on your local machine, end to end, from the command line.

If you can’t run your build from the command line, debugging it will be painful. Every iteration of your edit-compile-test loop will require a separate check-in and a sit-on-your-hands wait for several minutes until it either completes or breaks. This is a very inefficient and wasteful way of doing things. It can also cause problems when you have to track down a regression with git bisect, because you’ll have a whole string of broken revisions to contend with.
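One simple way to keep the build runnable from the command line is to expose each step as a subcommand of the script itself. A minimal sketch using Python’s standard argparse module; the step names and bodies are placeholders:

    import argparse

    def compile_solution():
        print("compiling...")      # stand-in for the real compile step

    def run_tests():
        print("running tests...")  # stand-in for the real test step

    STEPS = {"compile": compile_solution, "test": run_tests}

    if __name__ == "__main__":
        parser = argparse.ArgumentParser(description="Run build steps locally or on CI.")
        parser.add_argument("steps", nargs="+", choices=sorted(STEPS),
                            help="build steps to run, in order")
        args = parser.parse_args()
        for name in args.steps:
            STEPS[name]()  # the CI server runs exactly the same command you do

The CI server then invokes something like python build.py compile test, and so can you, which keeps the whole edit-compile-test loop on your own machine.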

4. Build scripts have few other domain-specific requirements, if any.

Apart from this, there are only two other requirements that your build scripts have. Your build language needs to be interpreted rather than compiled (otherwise you’ll have a chicken-and-egg problem), and it needs to be able to run other programs: your compiler; NuGet to fetch your dependencies; your test runner; and so on. But that’s pretty much it. Just about any general purpose scripting language — Python, Ruby, PowerShell, bash, DOS batch files, heck even PHP if you’re that way inclined — will fit the bill.

What about the specific (N)Ant/MSBuild tasks that you need to call? Most of these can be implemented quite simply as calls to either the language’s standard library or a command line interface.
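For example, the sort of Delete, Copy and Exec tasks you would otherwise write in XML map directly onto the standard library and subprocess. A sketch with hypothetical paths:

    import shutil
    import subprocess

    # The equivalent of a Delete/RemoveDir task: one standard library call.
    shutil.rmtree("build/output", ignore_errors=True)

    # The equivalent of a Copy task.
    shutil.copytree("assets", "build/output/assets")

    # The equivalent of an Exec task: run the tool's command line directly.
    subprocess.check_call(["nuget", "restore", "MySolution.sln"])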

Some .NET developers don’t like this approach because they say that using, say, Python or PowerShell would mean having to learn a new language. Personally I find this a very strange argument, because if you’re using MSBuild, you’re already doing exactly that. Not only that, but the learning curve you’re climbing is actually steeper: the conceptual differences between, say, C# and Python are superficial compared to the conceptual differences between C# and MSBuild. Besides, learning a scripting language is a skill you can transfer to other problem domains if necessary, whereas MSBuild is a very specialised, niche language that only ever gets used for build scripts for .NET projects.

Just because you are presented with something that describes itself as a build tool doesn’t mean to say you have to use it. Aim to choose tools and languages that allow you to write code that is easy to read, understand and maintain. You’ll be much more productive and much less stressed — and the people who have to maintain your code after you will thank you for it.

For further reading