james mckay dot net
because there are few things that are less logical than business logic

Posts tagged: git

Martin Fowler and feature branches revisited

Fábio asks me this by e-mail:

I found this very interesting article (http://web.archive.org/web/20110721063430/https://jamesmckay.net/2011/07/why-does-martin-fowler-not-understand-feature-branches/) that once was in your blog, and it seems it can only be accessed in the archive, or at least I was not able to find it in the live version of your blog.

Is there a reason for that? Do you still keep the same opinion? If not, can you give me a quick hint of what to read to reach the same conclusion?

In answer to his question, I took the original blog post down in January 2013 along with everything else on my blog, because I wanted to give it a complete reboot. In the years since then, I’ve restored some of them going as far back as 2009,  but I hadn’t restored that particular post, mainly because I felt that I’d been too confrontational with it and it hadn’t made me look good. However, since it’s out there in archive.org, and people are still asking about it, I’ve now restored it for historical reference. Unfortunately, I haven’t restored the comments because I no longer have a backup of them.

A bit of historical background.

I wrote the original post in July 2011. At the time, distributed version control tools such as Git and Mercurial had been around for about six years, but were still very new and unfamiliar to most developers. Most enterprise organisations were deeply entrenched in old-school tools such as Subversion and Team Foundation Server, which made branching and merging much harder and much more confusing than necessary, most corporate developers were terrified of the concept, and vendors of these old-school tools were playing on this fear for all it was worth. This was intensely frustrating to those of us who had actually used Git and Mercurial, could clearly see the benefits, and yet were being stonewalled by 5:01 dark matter colleagues and managers who didn’t want to know. To us, Subversion was The Enemy, and to see someone of Martin Fowler’s stature apparently siding with the enemy was maddening.

On the other hand, these tools weren’t yet quite enterprise ready, with only weak Windows support and rudimentary GUIs. SourceTree was not yet a thing. Furthermore, the best practices surrounding them were still being thrashed out and debated by early adopters in the blogosphere and at conferences. In a sense, nobody properly understood feature branches yet. If you read all the discussions and debates at the time, we didn’t even agree about exactly what feature branches were. Martin Fowler, Jez Humble and others were working with the assumption that they referred to branches lasting several days or even weeks, while people such as Adam Dymitruk and myself went by a much broader definition that basically amounted to what we would call a pull request today, albeit with the proviso that they should be kept as short as possible.

These days, of course, everybody (or at least, everybody who’s worth working for) uses Git, so that particular question is moot, and we can now freely discuss best practices around branching and merging as mature adults.

The state of the question in 2017.

I’m generally somewhat more sympathetic to Martin Fowler’s position now than I was six years ago. Most of his concerns about feature branches are valid ones. Long-lived feature branches can be difficult to work with, especially for inexperienced teams and badly architected codebases, where big bang merges can be a significant problem. Feature branches can also be problematic for Continuous Integration and opportunistic refactoring, so they should very much be the exception rather than the rule. Feature toggles can also provide considerable benefits. You can use them for A/B testing, country-specific features, premium features for paying customers, and so on and so forth. I even started writing my own .NET-based web framework built around feature toggles, though as I’m no longer working with .NET, it’s not being actively developed.

However, many of my original points still stand. Short-lived branches, such as pull requests, are fine, and should in fact be the default, because code should be reviewed before it is integrated, not after the fact. Furthermore, if you’re using feature toggles solely as a substitute for branching and merging, you are releasing code into production that you know for a fact to be immature, untested, buggy, unstable and not fit for purpose. Your feature toggles are supposed to isolate this code of course, but there is always a risk that the isolation could be incomplete, or that the toggle could be flipped prematurely by mistake. When feature branches go wrong, they only go wrong in your development environment, and the damage is relatively limited. When feature toggles go wrong, on the other hand, they go wrong in production—sometimes with catastrophic results.

Just how catastrophic? The feature toggles article on Martin Fowler’s own website itself cites an example where badly implemented feature toggles cost one company $460 million in just 45 minutes. The company concerned, we are told, “went from being the largest trader in US equities and a major market maker in the NYSE and NASDAQ to bankrupt.” To be fair, there were other problems—they relied extensively on manual deployment processes, for example—but it is still a cautionary tale for anyone who thinks feature toggles should always be viewed as a Best Practice without exception.

Of course, not all feature toggles carry this level of risk, and in many cases, the effects of inadvertent exposure will be mostly harmless. In these cases, feature toggles will indeed be the better option, not least because they’re easier to work with. But in general, when deciding whether to use a feature branch or a feature toggle, always ask yourself the question: what would be the damage that this feature would cause if it were activated prematurely? If it’s not a risk you’re prepared to take, your code is better off on a separate branch.

The three essential files required by every Git repository

There are three files that you should add to every new Git repository, right from the outset. Unfortunately I frequently see projects that don’t have all three of these files, for whatever reason. These are they.

.gitignore

This is the one you’re most likely to have. However, you may be doing it wrong.

You’ll no doubt be aware what .gitignore does—it specifies a list of file patterns to ignore. However, did you know that GitHub maintains a repository of technology-specific .gitignore files? Or did you know that there is a website called gitignore.io that lets you combine two or more of them into one? So for example, if you are using Visual Studio together with Node.js, Python and Visual Studio Code, you can request a .gitignore file that contains all four sets of patterns.

This should be your starting point when you’re setting up your .gitignore file. Some platforms, such as Visual Studio, have some pretty complex ignore requirements, and it’s all too easy to end up ignoring too much, or too little. But since these ready-made .gitignore templates are peer reviewed and have proven themselves in numerous projects, they save you a lot of guesswork. Additionally, because they cover most if not all of the cases that you need, they are generally a case of “set it and forget it.”

.gitattributes

As you will no doubt be aware, Windows handles line endings differently from Unix or OS X. Windows uses the ASCII control characters CR and LF (0x0D, 0x0A) to indicate an end of line, whereas Unix and OS X use only LF (0x0A).

To allow users of different operating systems to work on the same codebase, Git can be configured either to normalise line endings on check-in and check-out, or to leave them as-is, using the core.autocrlf configuration setting. However, not everyone configures Git the same way, and this can cause confusion (and unnecessary merge conflicts) if you’re not careful, as well as confusing certain text editors such as Notepad or gedit. To avoid problems here, you can (and should) override the core.autocrlf setting for your project using a .gitattributes file. This should contain just one line:

* text=auto

You can set other options with .gitattributes, but this simple example will be sufficient for 99% of cases. It will ensure that text files are checked out on Windows with CRLF endings and on Unix/OS X with LF endings.

In some cases, you may need to make sure that code is checked out with Unix (LF) line endings on both platforms. This can be the case if your files get shared between Windows and Unix systems by mechanisms other than Git — for example, using tools such as Vagrant, Terraform or Docker. In this case, use the following line:

* text=auto eol=lf

Note however that if you are using git-svn against a Subversion repository, you want to make sure that line ending normalisation is turned off, otherwise both Git and Subversion will attempt to handle line endings, leading to confusion among Subversion users who aren’t using git-svn. In this case, you should use:

* -text

README.md

The third file does not actually affect Git’s behaviour. However, it is every bit as important. Your README.md file — a Markdown document located in your root directory is the home page for your developer documentation.

I say this because it is the first place that anyone working on your project will see. GitHub, Bitbucket, GitLab and most other modern Git hosts render it when you visit your source repository’s home page in your browser. As such, even if your actual developer documentation home page is elsewhere, you should at the very minimum have a readme file containing a link pointing to it. This is a well-established standard, and by sticking to it you will make your documentation much easier to find.

Signing Git commits with GPG on Windows

One of the “gotchas” with Git is that it allows you to check in code as anyone. By running git config user.name and git config user.email, you can put anyone’s name to your commits—Stephen Hawking, Linus Torvalds, Henry VIII, or even me. If you want an idea of some of the problems this can cause, Mike Gerwitz’s article A Git Horror Story is a cautionary tale.

To resolve this problem, Git allows you to sign commits using GPG (GNU Privacy Guard, the GNU implementation of PGP), and in fact, Git includes the command-line version of GPG out of the box. You can run it within a Git Bash console.

However, most Windows users would prefer a GUI-based version, and gpg4win (GPG for Windows) is your go-to option here. You can install it either using the downloadable installer or else via Chocolatey.

To use gpg4win with Git needs a little bit of configuration, but first we’ll generate a new certificate. Go to your Start menu and start up Kleopatra, the gpg4win key manager:

image

Now click on the “File” menu and choose “New certificate…”

image

Choose the first option here—a personal OpenPGP key pair. Enter your name and e-mail address, and optionally a comment:

image

Review the certificate parameters and click “Create Key”:

image

You will be prompted to enter a passphrase:

image

Finally, your key pair will be successfully created:

image

Click on “Make a Backup of Your Key Pair” to back it up to your hard disk.

image

Save your private key somewhere safe. Your password manager database is as good a place as any. (You are using a password manager, aren’t you?)

Once you’ve gone through the wizard, you will see your new key pair in the Kleopatra main window. Click on the “My Certificates” tab if you don’t see it at first:

image

The number in the right hand column, in this case 1B9DC839, is your key ID. You now need to configure Git to use it. Type this in a Git shell, replacing “1B9DC839” with your own GPG key:

git config --global user.signingkey 1B9DC839

Prior to version 2.0, you had to instruct Git to sign each commit one at a time by specifying the -S parameter to git commit. However, Git 2.0 introduced a configuration option that instructs it to sign every commit automatically. Type this at the console:

git config --global commit.gpgsign true

Finally, you need to tell Git to use the gpg4win version of gpg.exe. Git comes with its own version of gpg.exe, but it is the MinGW version—a direct port of the Linux version, which saves your keychain in the ~/.gnupg folder in your home directory. The gpg4win port, on the other hand, saves your keychain in ~/AppData/Roaming/GnuPG. Certificates managed by one won’t be seen by the other. You will also need to use the gpg4win version if you want to use a GUI such as SourceTree, since the MinGW version of gpg.exe is entirely command line based and doesn’t play nicely with Git GUIs. By contrast, the gpg4win version brings up a dialog box to prompt for your password.

git config --global gpg.program "C:\Program Files (x86)\GnuPG\bin\gpg.exe"

If you are using 32-bit Windows, or if you have installed gpg4win into a custom location, you will need to tweak the location of the program. (Update February 2018: the path to gpg.exe has changed with the release of gpg4win 3.0. I have updated the path here to point to the new location. Note also that if you are using a Cygwin shell, you may need to specify the path in a different format — see the comments below.)

To check that it works, commit some code to a repository somewhere. You should be prompted for the passphrase that you entered earlier:

image

You can then verify that your commit has been signed as follows:

$ git log c05ddaa8 --show-signature -1
commit c05ddaa8e9da289fa5148d370b8ba9e5c419df9a
gpg: Signature made 02/24/16 08:08:39 GMT Standard Time using RSA key ID 1B9DC839^M
gpg: Good signature from "James McKay (Signed Git commits) <code@JAMESMCKAY.NET>" [ultimate]^M
Author: James McKay <code@JAMESMCKAY.NET>
Date:   Wed Feb 24 08:07:47 2016 +0000

You won’t be asked for your passphrase every time. Once you’ve entered it once, gpg spins up a process called gpg-agent.exe, which caches it in memory for a while.

Check-in before code review is an antipattern

I’ve never been that satisfied with most explanations that I see on the Internet of why Git is better than Subversion. Usually they wax lyrical about distributed versus centralised workflows or the advantages of branching and merging, but I find that kind of misses something because it takes way too long to get to the point. The question is, what is Git’s biggest advantage, in business terms, over Subversion?

The answer is quite simple. Git supports workflows that Subversion does not, that have significant benefits for your code quality, team collaboration and knowledge sharing.

There are a few such workflows, and the ones that have become popular all have one thing in common. Code gets reviewed before it is merged into the main codebase, rather than waiting till after the fact.

In actual fact, Git’s flexibility about when you conduct code reviews leaves Subversion dead in the water. Pull requests on a web-based Git server allow you to not only review code before it is merged, but to involve the whole team in the code review and even to carry out code reviews on work that is still in progress if you’re that way inclined. In effect, every task, every user story, every feature becomes a complete conversation.

To be fair, you can adopt this workflow with Subversion, using either task branches or patches, but Subversion makes branching and merging so clunky, user-unfriendly and error-prone that it simply isn’t practical, and besides being similarly clunky, submitting patches blocks further work until your submission has been reviewed. Consequently in practice, on most Subversion-based projects, commit-before-review is the norm, and review-before-merge is only used on the most high-impact, high-risk work. And it shows—trunk-based Subversion-hosted projects almost always have a far, far lower quality than pull request-based Git-hosted projects.

Why the difference? Simple. In the commit-before-review workflow, every check-in becomes a fait accompli.

This can be a recipe for disaster.

If bad code gets checked in, you have to explicitly ask for it to be backed out or modified—and you have to follow through to ensure that this is done. Sometimes it can’t be backed out or modified, because other code has been checked in that depends on it. Contentious design decisions can all too easily be steamrollered in without any discussion—and in the event of a disagreement, backing them out can all too easily be filibustered.

There’s also a strong psychological pressure to let standards slip as well. When you’re checking in code as a fait accompli, it’s all too easy to check in ill-thought-out variable names, poor test coverage, that doesn’t follow the team’s coding standards, with useless commit summaries to boot. It’s also far too easy for your reviewer (singular—you seldom if ever get more than one person reviewing your code in this model) to decide to pick his or her battles and only focus on the more important things.

On the other hand, when the default action is “reject,” as with pull requests, the onus is on you as the author of the code to prove that your changes are fit for purpose. This gives you all the more incentive to get things right—to stick to the team’s agreed coding conventions, to write tests, to separate concerns correctly, and so on. It also means that you pay more attention to making your code readable and your commit summaries informative. After all, your team-mates (plural) are going to have to make this judgment call based on whether they can understand what you’ve done or not.

Another significant benefit of pull requests is that they dramatically improve knowledge sharing among the team. A new developer may submit a pull request that reinvents methods that already exist, or that violates coding standards that they didn’t know existed. Pull requests are an opportunity for education here—you can easily point them in the right direction. On the other hand, with commit-before-review, because it is so easy to overlook things such as these, opportunities to educate your team-mates get lost.

One other thing bears saying here. Even if you do manage to adopt a pull request-like workflow with Subversion, you still face one major limitation: changesets in Subversion are immutable. With Git, if the commit history of a task branch makes it difficult to review, you can always ask the author to revise it—clarifying commit summaries, squashing superfluous changesets, and perhaps (for experienced Git users) even teasing changesets apart. You can do this quite effectively with the git rebase --interactive command. With Subversion, on the other hand, once it’s in, you’re stuck with it.

Pull requests are not the only advantage that Git has over Subversion. But they are the most important and the most business-critical. A pull request-based workflow with Git will give you a codebase that is much cleaner and much more robust, with fewer nasty surprises and an informative and useful source history. Trunk-based development in Subversion, on the other hand, can leave you with a very bad taste in your mouth. For this reason, sticking with Subversion raises serious questions about the quality and maintainability of your codebase.

Mercurial is doing better than you think

Here’s something interesting that I came across the other day. Apparently, Facebook has recently switched from Git to Mercurial for source control for its internal projects, and on top of that, they have hired several key members of the core Mercurial crew, including Matt Mackall, the Mercurial project lead.

Their main reason for this decision is performance: some of Facebook’s repositories are so large that they are bringing Git to its knees. They looked at the Git source code and the Mercurial source code and felt that the latter would be easier to fine tune to give them the performance that they needed.

This is an interesting development. With Git now on the verge of overtaking Subversion to the most widely used SCM in corporate settings, it’s tempting to write off Mercurial as something of a lost cause. Back in January, when I posted a suggestion on the Visual Studio UserVoice forums that Microsoft should support Mercurial as well as Git in Team Foundation Server, I thought it would be doing well to get three hundred votes and then plateau. But as it stands, it’s now passed 2,500 votes and still going strong, making it the tenth most popular open request on the forums and the second most popular request for TFS in particular, with nearly twice as many votes as the original DVCS request had. Git may have cornered the market for open source collaboration, but its unnecessarily steep learning curve and often pathological behaviour make it surprisingly unpopular with the majority of developers for whom public collaboration on open source projects is not a priority.

It’ll be interesting to see the outcome of this, but one thing is for certain: it’s not all over yet.