james mckay dot net

Because there are few things that are less logical than business logic

Sharing data between Terraform configurations

I’ve been doing quite a lot of work lately with Terraform, HashiCorp’s excellent infrastructure-as-code tool, to provision some infrastructure on AWS.

When designing your infrastructure, it is often necessary to break it up into different configurations. For example, you may want one configuration for your data (databases, S3 buckets, EBS volumes, and so on) and another for your compute resources (EC2 instances, ECS containers, AWS Lambda scripts, and so on) — perhaps so that you can tear down your development environment when you are not using it to reduce costs, while preserving your data. Or you may want to reference your DNS hosted zones in several different configurations.

When you do this, you will inevitably need to pass state information from one configuration through to the next — for example, EBS volume identifiers, IP addresses, and so on. If you are using remote state, you can do this quite simply using a remote state data source.

In this example, I’ll assume that you have configured Terraform to use remote state hosted in an S3 bucket, that you are using a separate configuration called “dns” for your DNS hosted zones, and that you wish to reference them in a different configuration.

First, in the source (“dns”) configuration, declare the values that you wish to reference as outputs:

output "zone_id" {
    value = "${module.dns.zone_id}"

output "zone_name" {
    value = "${module.dns.zone_name}"

In your target configuration, declare the remote state data source:

data "terraform_remote_state" "dns" {
    backend = "s3"
    config {
        bucket = "name-of-your-S3-bucket"
        key = "dns"
        region = "eu-west-1"

Then, whenever you need to use any of the remote values, you can reference them as follows:

resource "aws_route53_record" "www" {
   zone_id = "${data.terraform_remote_state.dns.zone_id}"
   name = "www.${data.terraform_remote_state.dns.zone_name}"
   type = "A"
   ttl = "300"
   records = ["${aws_eip.lb.public_ip}"]

Note that you will need to run terraform apply on the source configuration, even if you have not made any changes, before you attempt to do anything with the target configuration. This is because you need to make sure that the outputs have been written into the Terraform state file and are available to the other configurations that depend on them.

Hat tip: Thanks to Paul Stack for pointing me in the right direction here.


Password hashing as a microservice with Docker Compose

So you’ve stopped breaking the law by storing your passwords in plain text, and you’re aware that MD5 doesn’t cut it any more in a world of GPU-based cracking tools, so you’ve started using bcrypt instead. But can you make your passwords any more secure? Can you even protect your users who have chosen passwords such as “password” or “qwerty” or “123456” or “letmein”? And since bcrypt is so computationally expensive, how can you stop an attempt to brute force your admin password from bringing your entire site down?

Here’s what you can do. Use a microservice.

This gives you two advantages:

  1. Your passwords are stored in a separate database from your user accounts. The “password” field in your users table only contains a randomly allocated identifier; without the password database as well, this tells an attacker nothing. Nada. Nichts. Bupkis.
  2. Your password hash algorithm can be scaled independently from the rest of your site. It can be run on completely different hardware. A brute force attack on your admin site won’t end up DOS-ing everything else, no matter how slow you make it.

Over the past week or so I’ve been experimenting with an implementation of this approach. You can find it on GitHub here.

How it works

Since it’s 2016 and Docker is all the rage, the password hashing microservice, the main web application, and their respective databases are all in separate Docker containers, managed as a group using Docker Compose and defined in the docker-compose.yml file. Tools such as Docker Swarm or Kubernetes make it a breeze to scale up and down as needed.

The password hashing microservice is in /passwords in the Git repo. It is implemented as a very basic Flask application in passwords/serve.py, storing the passwords in a MongoDB database. It exposes three methods:

  • POST /password: hash a password, allocate it a random ID, and save it.
  • POST /password/test/<id>: test a password against the saved hash with the given ID.
  • DELETE /password/<id>: delete the hash with the given ID from the database.

The POST methods both take the password in a form field called password. The password ID returned by the first method is a randomly generated GUID.
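
To make this concrete, here is a minimal sketch of what such a service might look like. This is illustrative only, not the code from the repo: it keeps the hashes in an in-memory dict rather than MongoDB, and the function names are made up.

# Illustrative sketch of the password-hashing service.
# The real service persists hashes to MongoDB; a dict stands in for it here.
import uuid

import bcrypt
from flask import Flask, abort, request

app = Flask(__name__)
hashes = {}  # password ID -> bcrypt hash


@app.route("/password", methods=["POST"])
def create_password():
    # Hash the password, allocate a random ID, and save it.
    password = request.form["password"].encode("utf-8")
    key = str(uuid.uuid4())
    hashes[key] = bcrypt.hashpw(password, bcrypt.gensalt())
    return key, 201


@app.route("/password/test/<key>", methods=["POST"])
def test_password(key):
    # Test a password against the saved hash with the given ID.
    if key not in hashes:
        abort(404)
    password = request.form["password"].encode("utf-8")
    return {"valid": bcrypt.checkpw(password, hashes[key])}


@app.route("/password/<key>", methods=["DELETE"])
def delete_password(key):
    # Delete the hash with the given ID from the store.
    hashes.pop(key, None)
    return "", 204


if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)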

The sample web application is a Django application in /web in the Git repo, backed by a PostgreSQL database. The file web/webapp/security.py contains a custom password hasher which saves the GUID returned from the microservice in the password column in the users table. The password ID thereby acts as a proxy for the password hash, but since it cannot be derived from the hash, the password cannot be brute forced from it.
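
To give a rough idea of how the Django side plugs in, here is a hypothetical hasher that delegates to the microservice over HTTP. The class name, the service URL and the response format are assumptions for the sake of illustration; the actual web/webapp/security.py may differ in its details.

# Hypothetical Django password hasher that defers to the password microservice.
import requests
from django.contrib.auth.hashers import BasePasswordHasher

PASSWORD_SVC = "http://passwords-svc"  # assumed URL of the service behind the proxy


class RemotePasswordHasher(BasePasswordHasher):
    algorithm = "remote"

    def salt(self):
        # Salting happens inside the microservice, so no local salt is needed.
        return ""

    def encode(self, password, salt):
        # Ask the service to hash and store the password; keep only the returned ID.
        response = requests.post(PASSWORD_SVC + "/password", data={"password": password})
        response.raise_for_status()
        return "%s$%s" % (self.algorithm, response.text.strip())

    def verify(self, password, encoded):
        # Ask the service to check the password against the stored hash.
        algorithm, key = encoded.split("$", 1)
        response = requests.post(PASSWORD_SVC + "/password/test/" + key,
                                 data={"password": password})
        return response.ok and response.json().get("valid", False)

    def safe_summary(self, encoded):
        algorithm, key = encoded.split("$", 1)
        return {"algorithm": algorithm, "key": key}

The value stored in the password column is then just the algorithm name plus the GUID; nothing derived from the password itself ever reaches the user database.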

The docker-compose.yml file also puts a Træfɪk proxy server between the web application and the microservice to act as a load balancer. This allows you to scale the microservice as necessary simply by typing docker-compose scale passwords-svc=<n> where <n> is the number of containers you wish to spin up.

Ramping up the security even further…

To make things even harder for an attacker, the hashing service combines both the password and its identifier with application-specific secret strings before saving them into the password database. These secrets will frustrate anyone who manages to get hold of both the user database and the password database: unless you know their values, you cannot brute force the passwords, nor can you tell which password hashes correspond to which users in the database.

These secrets are stored as environment variables called PASSWORD_SECRET and KEY_SECRET and are defined in this example in the docker-compose.yml file. Naturally, you should change them in your own application to other, similarly complex, random strings. If possible, you should load them in from a dedicated secret store such as Hashicorp Vault.
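
As an illustration of the idea, the service could fold the secrets in along the lines below. The exact scheme used in the repo may differ, and the function here is hypothetical.

# Sketch of mixing the application secrets into what gets stored.
import hashlib
import hmac
import os

import bcrypt

PASSWORD_SECRET = os.environ["PASSWORD_SECRET"].encode("utf-8")
KEY_SECRET = os.environ["KEY_SECRET"].encode("utf-8")


def peppered_record(password, key):
    # Mix the secret into the password before hashing it with bcrypt...
    mixed = hmac.new(PASSWORD_SECRET, password.encode("utf-8"), hashlib.sha256).digest()
    password_hash = bcrypt.hashpw(mixed, bcrypt.gensalt())
    # ...and store the hash against a keyed digest of the ID rather than the raw ID,
    # so GUIDs in the user table cannot be matched to rows in the password database.
    stored_key = hmac.new(KEY_SECRET, key.encode("utf-8"), hashlib.sha256).hexdigest()
    return stored_key, password_hash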

Unless the attacker has access to the user database and the password database and both these secret keys, they aren’t going to be able to crack any of your users’ passwords. Even if your users have used silly passwords such as “password” or “123456” or “arsenal” or “chelsea” their credentials will still be safe.

This is what microservices are for.

There’s a lot of debate going on at the moment about how to use microservices. Should you keep your application as a single monolith and only extract out certain tiny pieces of functionality, or should you break it down into a large number of separate microservices? In each case, just how much should each microservice handle?

Password hashing, as we’ve implemented it here, is a great example of where microservices are appropriate, and of how much they should try to achieve. We’ve identified some very specific problems. There is no speculation, guesswork or YAGNI involved. Most importantly, we’ve seen that they provide well-defined, real-world benefits.


The Continuous Retrospective

The sprint retrospective is one of the most important ceremonies in Scrum. At the end of every sprint, the team meets to discuss what went well, what went badly, and how you can improve your processes and working practices.

Effective as it may be, the end-of-sprint retrospective isn’t perfect. It’s very common to encounter problems during the sprint, only to get to the retrospective and find that you’ve forgotten what they were. On the other hand, you can end up wasting a lot of time complaining about things that you aren’t able to do anything about.

To resolve these problems, we have recently implemented a “continuous retrospective” within our team. Instead of waiting until the end of the sprint, we can highlight problems as and when they arise, and action them accordingly.


Besides the fact that issues are less likely to be forgotten, there are several other advantages to this approach. It is less formal, and a better use of people’s time. It improves communication. It can also improve visibility and transparency, allowing people to see that you are aware of, and addressing, issues that you are encountering.

A different approach.

This is a different approach from your traditional, end-of-sprint retrospective. For starters, it is less meeting-oriented. You simply set up a kanban-style board in your team’s working area, where you can add Post-It notes for things that are going well, things that need action, and things that need further discussion, perhaps to reach a team consensus.

Write things down as soon as you think of them — don’t wait until the end of the sprint, or even until the daily stand-up meeting. You should aim to discuss them with the rest of the team as soon as possible, and reach a consensus on what action to take. Actions may be, for example:

  • Creating backlog tasks for the items you bring up
  • Updating documentation
  • Communicating problems or findings with other teams or the wider software development community as appropriate (e.g. through email, Twitter, blogs, or forums)
  • Flagging issues to raise with senior management.

Don’t just limit items to things that are within your team’s remit. Anything that needs attention or can be improved is fair game — the appropriate action may be to raise it with other teams or senior management, for example.

Do you still need an end of sprint retrospective?

Maybe, or maybe not. It depends on your team.

Some teams may find that a continuous retrospective renders the end of sprint retrospective mostly superfluous. Others may find that the end of sprint retrospective still provides value — for example, by allowing you to check your progress, or to prioritise issues that have proven to be more complex to address.

But regardless of whether we retain the end of sprint retrospective or not, our goal is to make the continuous retrospective, as the sprint is in progress, the main driver of our team’s improvement. Agility is all about keeping your feedback loops as tight as possible, and like continuous integration and continuous delivery, continuous retrospectives are another way to achieve that end.

Thanks to Steven Wade, Vinitha Devadas and Dan Barrett for their contributions to this post.


The three essential files required by every Git repository

There are three files that you should add to every new Git repository, right from the outset. Unfortunately I frequently see projects that don’t have all three of these files, for whatever reason. These are they.


.gitignore

This is the one you’re most likely to have. However, you may be doing it wrong.

You’ll no doubt be aware of what .gitignore does—it specifies a list of file patterns to ignore. However, did you know that GitHub maintains a repository of technology-specific .gitignore files? Or did you know that there is a website called gitignore.io that lets you combine two or more of them into one? So, for example, if you are using Visual Studio together with Node.js, Python and Visual Studio Code, you can request a .gitignore file that contains all four sets of patterns.

This should be your starting point when you’re setting up your .gitignore file. Some platforms, such as Visual Studio, have some pretty complex ignore requirements, and it’s all too easy to end up ignoring too much, or too little. But since these ready-made .gitignore templates are peer reviewed and have proven themselves in numerous projects, they save you a lot of guesswork. Additionally, because they cover most if not all of the cases that you need, they are generally a case of “set it and forget it.”


.gitattributes

As you will no doubt be aware, Windows handles line endings differently from Unix or OS X. Windows uses the ASCII control characters CR and LF (0x0D, 0x0A) to indicate an end of line, whereas Unix and OS X use only LF (0x0A).

To allow users of different operating systems to work on the same codebase, Git can be configured either to normalise line endings on check-in and check-out, or to leave them as-is, using the core.autocrlf configuration setting. However, not everyone configures Git the same way, and this can cause confusion (and unnecessary merge conflicts) if you’re not careful, as well as tripping up certain text editors such as Notepad or gedit. To avoid problems here, you can (and should) override the core.autocrlf setting for your project using a .gitattributes file. This should contain just one line:

* text=auto

You can set other options with .gitattributes, but this simple example will be sufficient for 99% of cases. It will ensure that text files are checked out on Windows with CRLF endings and on Unix/OS X with LF endings.

In some cases, you may need to make sure that code is checked out with Unix (LF) line endings on both platforms. This can be the case if your files get shared between Windows and Unix systems by mechanisms other than Git — for example, using tools such as Vagrant, Terraform or Docker. In this case, use the following line:

* text=auto eol=lf

Note however that if you are using git-svn against a Subversion repository, you want to make sure that line ending normalisation is turned off, otherwise both Git and Subversion will attempt to handle line endings, leading to confusion among Subversion users who aren’t using git-svn. In this case, you should use:

* -text


README.md

The third file does not actually affect Git’s behaviour. However, it is every bit as important. Your README.md file, a Markdown document located in your root directory, is the home page for your developer documentation.

I say this because it is the first thing that anyone working on your project will see. GitHub, Bitbucket, GitLab and most other modern Git hosts render it when you visit your source repository’s home page in your browser. As such, even if your actual developer documentation home page is elsewhere, you should at the very minimum have a readme file containing a link pointing to it. This is a well-established convention, and by sticking to it you will make your documentation much easier to find.


Fuzzy dates aren’t as good an idea as you think

I received an e-mail from a colleague the other day about some code that I’d recently pushed to GitHub. Since I’d pushed some more changes round about the time he sent the e-mail, I needed to know which revision he was referring to.

There’s just one problem:


Of course, I could have got the proper times from SourceTree or by typing git log in the console, but it’s still annoying, especially since the GitHub page was more readily to hand. And GitHub does show you the exact time as a tooltip—something I missed at the time—but that doesn’t help much when you have to hover over half a dozen different datestamps to find the one you’re looking for.

We need to have a rethink about fuzzy dates. Yes, I know that it’s friendly and cuddly and warm and fuzzy and cute (and more interesting to code) to say “two days ago” or “eighteen hours ago,” but when I’m trying to refer back to 17:22 precisely, it’s utterly useless and just adds friction without providing any value whatsoever.


Signing Git commits with GPG on Windows

One of the “gotchas” with Git is that it allows you to check in code as anyone. By setting git config user.name and git config user.email, you can put anyone’s name to your commits—Stephen Hawking, Linus Torvalds, Henry VIII, or even me. If you want an idea of some of the problems this can cause, Mike Gerwitz’s article A Git Horror Story is a cautionary tale.

To resolve this problem, Git allows you to sign commits using GPG (GNU Privacy Guard, the GNU implementation of PGP), and in fact, Git includes the command-line version of GPG out of the box. You can run it within a Git Bash console.

However, most Windows users would prefer a GUI-based version, and gpg4win (GPG for Windows) is your go-to option here. You can install it either using the downloadable installer or else via Chocolatey.

Using gpg4win with Git needs a little bit of configuration, but first we’ll generate a new certificate. Go to your Start menu and start up Kleopatra, the gpg4win key manager:


Now click on the “File” menu and choose “New certificate…”


Choose the first option here—a personal OpenPGP key pair. Enter your name and e-mail address, and optionally a comment:


Review the certificate parameters and click “Create Key”:


You will be prompted to enter a passphrase:


Finally, your key pair will be successfully created:


Click on “Make a Backup of Your Key Pair” to back it up to your hard disk.


Save your private key somewhere safe. Your password manager database is as good a place as any. (You are using a password manager, aren’t you?)

Once you’ve gone through the wizard, you will see your new key pair in the Kleopatra main window. Click on the “My Certificates” tab if you don’t see it at first:


The number in the right-hand column, in this case 1B9DC839, is your key ID. You now need to configure Git to use it. Type this in a Git shell, replacing “1B9DC839” with your own key ID:

git config --global user.signingkey 1B9DC839

Prior to version 2.0, you had to instruct Git to sign each commit one at a time by specifying the -S parameter to git commit. However, Git 2.0 introduced a configuration option that instructs it to sign every commit automatically. Type this at the console:

git config --global commit.gpgsign true

Finally, you need to tell Git to use the gpg4win version of gpg.exe. Git comes with its own version of gpg.exe, but it is the MinGW version—a direct port of the Linux version, which saves your keychain in the ~/.gnupg folder in your home directory. The gpg4win port, on the other hand, saves your keychain in ~/AppData/Roaming/GnuPG. Certificates managed by one won’t be seen by the other. You will also need to use the gpg4win version if you want to use a GUI such as SourceTree, since the MinGW version of gpg.exe is entirely command line based and doesn’t play nicely with Git GUIs. By contrast, the gpg4win version brings up a dialog box to prompt for your password. To point Git at the gpg4win version, type this at the console:

git config --global gpg.program "c:/Program Files (x86)/GNU/GnuPG/gpg2.exe"

If you are using 32-bit Windows, or if you have installed gpg4win into a custom location, you will need to tweak the location of the program.

To check that it works, commit some code to a repository somewhere. You should be prompted for the passphrase that you entered earlier:


You can then verify that your commit has been signed as follows:

$ git log c05ddaa8 --show-signature -1
commit c05ddaa8e9da289fa5148d370b8ba9e5c419df9a
gpg: Signature made 02/24/16 08:08:39 GMT Standard Time using RSA key ID 1B9DC839^M
gpg: Good signature from "James McKay (Signed Git commits) <code@JAMESMCKAY.NET>" [ultimate]^M
Author: James McKay <code@JAMESMCKAY.NET>
Date:   Wed Feb 24 08:07:47 2016 +0000

You won’t be asked for your passphrase every time. After you’ve entered it once, gpg spins up a process called gpg-agent.exe, which caches it in memory for a while.


“I’ve never had a problem with it.”

From time to time, when I’m discussing possible tools or techniques for a project with other developers, one of them will defend their choice by saying that they’ve never had a problem with it.

This immediately makes me sceptical. If you say you’ve never had a problem with something technical, it tells me one of two things. Either you’re lying, or you don’t have enough experience with it to be able to recommend it.

Pretty much every tool, every technique, every framework, every language, every best practice, has its pitfalls. It may work well in some situations but not in others. It may only work if you follow a strict set of protocols to the letter. It may have bugs or unintended consequences. It may not scale to meet your needs. It may deliver benefits that you don’t need at the expense of ones that you do. It may not even deliver the benefits it claims to deliver at all. In nearly every case there are multiple ways of getting it badly wrong. More often than not, you don’t discover what the pitfalls are until you’ve been using it for quite some time.

There’s no shame in admitting this. Problems with your approach aren’t necessarily a deal breaker. But if you’re recommending something to me, I want to know that you’ll be able to steer me through the problems and teach me how to avoid them, mitigate them, or recover from them. By saying that you’ve never had a problem with it, you’re telling me that you will not be able to do so. And that is a deal breaker.


Check-in before code review is an antipattern

I’ve never been that satisfied with most of the explanations I see on the Internet of why Git is better than Subversion. Usually they wax lyrical about distributed versus centralised workflows or the advantages of branching and merging, but I find that this rather misses the mark, because it takes far too long to get to the point. The question is: what is Git’s biggest advantage, in business terms, over Subversion?

The answer is quite simple. Git supports workflows that Subversion does not, that have significant benefits for your code quality, team collaboration and knowledge sharing.

There are a few such workflows, and the ones that have become popular all have one thing in common. Code gets reviewed before it is merged into the main codebase, rather than waiting till after the fact.

In actual fact, Git’s flexibility about when you conduct code reviews leaves Subversion dead in the water. Pull requests on a web-based Git server allow you to not only review code before it is merged, but to involve the whole team in the code review and even to carry out code reviews on work that is still in progress if you’re that way inclined. In effect, every task, every user story, every feature becomes a complete conversation.

To be fair, you can adopt this workflow with Subversion, using either task branches or patches, but Subversion makes branching and merging so clunky, user-unfriendly and error-prone that it simply isn’t practical; submitting patches is similarly clunky, and it blocks further work until your submission has been reviewed. Consequently, in practice, commit-before-review is the norm on most Subversion-based projects, and review-before-merge is reserved for the most high-impact, high-risk work. And it shows—trunk-based Subversion-hosted projects almost always end up with far lower code quality than pull request-based Git-hosted projects.

Why the difference? Simple. In the commit-before-review workflow, every check-in becomes a fait accompli.

This can be a recipe for disaster.

If bad code gets checked in, you have to explicitly ask for it to be backed out or modified—and you have to follow through to ensure that this is done. Sometimes it can’t be backed out or modified, because other code has been checked in that depends on it. Contentious design decisions can all too easily be steamrollered in without any discussion—and in the event of a disagreement, backing them out can all too easily be filibustered.

There’s also a strong psychological pressure to let standards slip. When you’re checking in code as a fait accompli, it’s all too easy to check in ill-thought-out variable names, poor test coverage and code that doesn’t follow the team’s coding standards, with useless commit summaries to boot. It’s also far too easy for your reviewer (singular—you seldom if ever get more than one person reviewing your code in this model) to decide to pick his or her battles and only focus on the more important things.

On the other hand, when the default action is “reject,” as with pull requests, the onus is on you as the author of the code to prove that your changes are fit for purpose. This gives you all the more incentive to get things right—to stick to the team’s agreed coding conventions, to write tests, to separate concerns correctly, and so on. It also means that you pay more attention to making your code readable and your commit summaries informative. After all, your team-mates (plural) are going to have to make this judgment call based on whether they can understand what you’ve done or not.

Another significant benefit of pull requests is that they dramatically improve knowledge sharing among the team. A new developer may submit a pull request that reinvents methods that already exist, or that violates coding standards that they didn’t know existed. Pull requests are an opportunity for education here—you can easily point them in the right direction. On the other hand, with commit-before-review, because it is so easy to overlook things such as these, opportunities to educate your team-mates get lost.

One other thing bears saying here. Even if you do manage to adopt a pull request-like workflow with Subversion, you still face one major limitation: changesets in Subversion are immutable. With Git, if the commit history of a task branch makes it difficult to review, you can always ask the author to revise it—clarifying commit summaries, squashing superfluous changesets, and perhaps (for experienced Git users) even teasing changesets apart. You can do this quite effectively with the git rebase --interactive command. With Subversion, on the other hand, once it’s in, you’re stuck with it.

Pull requests are not the only advantage that Git has over Subversion. But they are the most important and the most business-critical. A pull request-based workflow with Git will give you a codebase that is much cleaner and much more robust, with fewer nasty surprises and an informative and useful source history. Trunk-based development in Subversion, on the other hand, can leave you with a very bad taste in your mouth. For this reason, sticking with Subversion raises serious questions about the quality and maintainability of your codebase.


Inseparable concerns

Separation of concerns is often cited as the reasoning behind the traditional three-layer architecture. It is important, otherwise you will end up with a Big Ball of Mud.

However, in order to separate out your concerns, you must first categorise them correctly as either business concerns, presentational concerns or data access concerns. Otherwise you will end up with unnecessary complexity, poor performance, anaemic layers, and/or poor testability.

Unfortunately, most three-layer applications completely fail to categorise their concerns correctly. More often than not this is because it is simply not possible to do so, as some concerns fall into more than one category and can’t be refactored out without introducing adverse effects. I propose the term inseparable concerns for such cases.

The key to separation of concerns is to let it be driven by your tests. Under TDD, the first thing you would do if a particular line of code contained a bug would be to write a failing unit test that would pass given the expected correct behaviour. It is what this test does that tells you whether the code under test is a business concern, a data access concern, or a presentational concern.

It is a presentational concern if the test simulates raw user input, or examines final rendered output. For example, mocking any part of a raw HTTP request (GET or POST arguments, cookies, HTTP headers, and so on), verifying the returned HTTP status code, or examining the output generated by a view. In general, if it’s your controllers or your views that you’re testing, it’s a presentational concern.

It is a business concern if the test verifies the correctness of a business rule. Basically, this means that queries are business concerns, period. If they are not returning the correct results, then they have not implemented some business rule or other correctly. Other examples of business concerns include validation, verifying that the data passed to the database or a web service from a command is correct, or confirming that the correct exception is thrown in response to various failure modes.

It is a data access concern if the test requires the code to hit the database. Note that this is where the so-called “best practice” that your unit tests should never hit the database breaks down: if you are adhering to it strictly, sooner or later you will encounter a bug where it stops you from writing a failing test. Most people, when confronted with such cases, skip this step. Don’t: TDD should take precedence. Set up a test database and write the test already.

It is an inseparable concern if it falls into more than one of the above categories. Pretty much any performance-related optimisation that you do will be an example here. For example, if you have to bypass Entity Framework and drop down to raw SQL, you will have to hit the database to verify that business logic is correct. Therefore, it is both a business concern and a data access concern.

Inseparable concerns are much more prevalent than you might expect. IQueryable<T> is the best that we’ve got in terms of making your business and data access layers separable, but, as Mark Seemann points out, it still falls short because NotSupportedException. Another example is calling .Include() on a DbSet to include child entities. Although this is a no-op on Mock<IDbSet<T>>, you can’t verify that you are making the correct calls to .Include() in the first place without hitting the database. Besides which, if you’re mocking DbSet<T> instead of IDbSet<T>, as you’re supposed to be able to do with EF6, calling .Include() throws an exception.

I would just like to stress here that inseparable concerns are not an antipattern—they are a fact of life. All but the simplest of code bases will have them somewhere. The real antipattern is not introducing them, but trying to treat them as if they were something that they’re not.


“That’s just your opinion” means “I’m not listening”

Ever been trying to present a reasoned argument, with evidence, for something, only to be rebuffed with a response like this?

  • “That’s just your opinion.”
  • “Well, everyone’s entitled to their own opinion.”
  • “You just don’t like it because it’s not cool and trendy.”

What they mean is this:

  • “I’m not listening. I’ve got my fingers in my ears. Neener, neener, neener.”