james mckay dot net
because there are few things that are less logical than business logic

Posts tagged: passwords

Password hashing as a microservice with Docker Compose

So you’ve stopped breaking the law by storing your passwords in plain text, and you’re aware that MD5 doesn’t cut it any more in a world of GPU-based cracking tools, so you’ve started using bcrypt instead. But can you make your passwords any more secure? Can you even protect your users who have chosen passwords such as “password” or “qwerty” or “123456” or “letmein”? And since bcrypt is so computationally expensive, how can you stop an attempt to brute force your admin password from bringing your entire site down?

Here’s what you can do. Use a microservice.

This gives you two advantages:

  1. Your passwords are stored in a separate database from your user accounts. The “password” field in your users table only contains a randomly allocated identifier; without the password database as well, this tells an attacker nothing. Nada. Nichts. Bupkis.
  2. Your password hash algorithm can be scaled independently from the rest of your site. It can be run on completely different hardware. A brute force attack on your admin site won’t end up DoS-ing everything else, no matter how slow you make the hash.

Over the past week or so I’ve been experimenting with an implementation of this approach. You can find it on GitHub here.

How it works

Since it’s 2016 and Docker is all the rage, the password hashing microservice, the main web application, and their respective databases are all in separate Docker containers, managed as a group using Docker Compose and defined in the docker-compose.yml file. Tools such as Docker Swarm or Kubernetes make it a breeze to scale up and down as needed.

The password hashing microservice is in /passwords in the Git repo. It is implemented as a very basic Flask application in passwords/serve.py, storing the passwords in a MongoDB database. It exposes three methods:

  • POST /password: hash a password, allocate it a random ID, and save it.
  • POST /password/test/<id>: test a password against the saved hash with the given ID.
  • DELETE /password/<id>: delete the hash with the given ID from the database.

The POST methods both take the password in a form field called password. The password ID returned by the first method is a randomly generated GUID.
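The three operations can be sketched without Flask or MongoDB. This is a minimal illustration, not the code from the repo: an in-memory dict stands in for the MongoDB collection, PBKDF2 from the standard library stands in for bcrypt, and the function names and ITERATIONS value are assumptions made for the example.

```python
import hashlib
import hmac
import os
import uuid

# Stand-in for the MongoDB collection: maps password ID -> (salt, hash).
_store = {}

ITERATIONS = 100_000  # assumed work factor for the PBKDF2 stand-in


def _hash(password, salt):
    """Slow salted hash; the real service uses bcrypt rather than PBKDF2."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)


def create_password(password):
    """POST /password: hash the password, allocate a random ID, save, return the ID."""
    password_id = str(uuid.uuid4())
    salt = os.urandom(16)
    _store[password_id] = (salt, _hash(password, salt))
    return password_id


def check_password(password_id, password):
    """POST /password/test/<id>: compare a candidate against the saved hash."""
    record = _store.get(password_id)
    if record is None:
        return False
    salt, expected = record
    # Constant-time comparison avoids leaking how many bytes matched.
    return hmac.compare_digest(_hash(password, salt), expected)


def delete_password(password_id):
    """DELETE /password/<id>: remove the hash; True if something was deleted."""
    return _store.pop(password_id, None) is not None
```

The caller only ever holds the ID returned by create_password, which is a random value with no mathematical relationship to the hash.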

The sample web application is a Django application in /web in the Git repo, backed by a PostgreSQL database. The file web/webapp/security.py contains a custom password hasher which saves the GUID returned from the microservice in the password column in the users table. The password ID thereby acts as a proxy for the password hash, but since it cannot be derived from the hash, the password cannot be brute forced from it.
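The proxy idea might look roughly like this. The sketch deliberately does not import Django: RemotePasswordHasher only mirrors the encode/verify shape of a password hasher, FakePasswordService is a hypothetical in-process stand-in for the HTTP calls, and the "remote" prefix is an assumption, so none of these names should be read as what security.py actually contains.

```python
import hashlib
import hmac
import os
import uuid


class FakePasswordService:
    """Hypothetical in-process stand-in for the HTTP microservice."""

    def __init__(self):
        self._records = {}

    @staticmethod
    def _digest(password, salt):
        # PBKDF2 stands in for the bcrypt hash the real service computes.
        return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 50_000)

    def create(self, password):
        password_id = str(uuid.uuid4())
        salt = os.urandom(16)
        self._records[password_id] = (salt, self._digest(password, salt))
        return password_id

    def check(self, password_id, password):
        record = self._records.get(password_id)
        if record is None:
            return False
        salt, expected = record
        return hmac.compare_digest(self._digest(password, salt), expected)


class RemotePasswordHasher:
    """Stores only the service's ID in the users table, never a hash."""

    algorithm = "remote"  # hypothetical prefix for the password column

    def __init__(self, service):
        self.service = service

    def encode(self, password):
        # The value saved in the password column is just "<prefix>$<id>".
        return f"{self.algorithm}${self.service.create(password)}"

    def verify(self, password, encoded):
        algorithm, _, password_id = encoded.partition("$")
        if algorithm != self.algorithm:
            return False
        return self.service.check(password_id, password)
```

Because verify has to ask the service every time, a dump of the users table alone gives an attacker nothing to run a cracking tool against.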

The docker-compose.yml file also puts a Træfɪk proxy server between the web application and the microservice to act as a load balancer. This allows you to scale the microservice as necessary simply by typing docker-compose scale passwords-svc=<n> where <n> is the number of containers you wish to spin up.

Ramping up the security even further…

To make things even harder for an attacker, the hashing service combines both the password and its identifier with application-specific secret strings before saving them into the password database. These secrets will frustrate anyone who manages to get hold of both the user database and the password database: without their values, an attacker can neither brute force the passwords nor tell which password hashes correspond to which users.

These secrets are stored as environment variables called PASSWORD_SECRET and KEY_SECRET and are defined in this example in the docker-compose.yml file. Naturally, you should change them in your own application to other, similarly complex, random strings. If possible, you should load them in from a dedicated secret store such as HashiCorp Vault.
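One plausible way to combine the values with the secrets is a keyed HMAC. How the repo actually mixes them in is not spelled out here, so treat the HMAC-SHA256 construction, the function names, and the placeholder fallback values below as assumptions; only the two environment variable names come from the text above.

```python
import hashlib
import hmac
import os

# Placeholder fallbacks so the sketch runs; real values belong in a secret store.
PASSWORD_SECRET = os.environ.get("PASSWORD_SECRET", "change-me-password-secret").encode("utf-8")
KEY_SECRET = os.environ.get("KEY_SECRET", "change-me-key-secret").encode("utf-8")


def pepper_password(password):
    """Mix the password with PASSWORD_SECRET before it reaches the slow hash."""
    return hmac.new(PASSWORD_SECRET, password.encode("utf-8"), hashlib.sha256).digest()


def storage_key(password_id):
    """Derive the key used in the password database from the ID held in the users table."""
    return hmac.new(KEY_SECRET, password_id.encode("utf-8"), hashlib.sha256).hexdigest()
```

With this arrangement the password database is indexed by storage_key(id) rather than by the ID itself, so someone holding both databases but not KEY_SECRET cannot match rows to users, and someone without PASSWORD_SECRET cannot test guesses against the stored hashes.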

Unless the attacker has access to the user database, the password database, and both of these secret keys, they aren’t going to be able to crack any of your users’ passwords. Even if your users have used silly passwords such as “password” or “123456” or “arsenal” or “chelsea”, their credentials will still be safe.

This is what microservices are for.

There’s a lot of debate going on at the moment about how to use microservices. Should you keep your application as a single monolith and only extract out certain tiny pieces of functionality, or should you break it down into a large number of separate microservices? In each case, just how much should each microservice handle?

Password hashing, as we’ve implemented it here, is a great example of where microservices are appropriate, and how much they should try to achieve. We’ve identified some very specific problems. There is no speculation, guesswork or YAGNI involved. Most importantly, we’ve seen that they provide well defined real-world benefits.

Your password hash algorithm is (probably) snake oil

For several years now, it’s been standard practice among web developers who know what they’re doing to store passwords as a one-way salted SHA-1 hash. Using a salt means that they aren’t vulnerable to rainbow table attacks, for instance, so the only realistic option open to hackers is a dictionary attack, which is slower. Or so the thinking goes at any rate.

There’s just one problem. Dictionary attacks are blazingly fast these days, thanks to the massive parallelism that you can get from the GPU in your graphics card. Just how fast? Coda Hale explains:

“Rainbow tables, despite their recent popularity as a subject of blog posts, have not aged gracefully. CUDA/OpenCL implementations of password crackers can leverage the massive amount of parallelism available in GPUs, peaking at billions of candidate passwords a second. You can literally test all lowercase, alphabetic passwords which are ≤7 characters in less than 2 seconds. And you can now rent the hardware which makes this possible to the tune of less than $3/hour. For about $300/hour, you could crack around 500,000,000,000 candidate passwords a second.”

How fast is that? Let’s put it this way: for all but your most security-conscious users, a salted SHA-1 now offers no more protection than clear text. Commodity hardware (an off-the-shelf laptop) can test candidate passwords at a rate of up to five billion per second. Given how bad people are at choosing secure passwords, huge swathes of your user base could easily have their credentials cracked at a rate of several thousand every second.

If you are serious about protecting your users’ passwords (and you should be, because if you’re not, you’re legally on thin ice), you need to use a salted hash algorithm that is designed specifically with passwords in mind. The one that seems to be getting most mindshare at the moment is bcrypt. Its killer features are that it is (a) slow, and (b) adaptive. By tweaking its “work factor,” you can decide whether it takes milliseconds or hours, as well as making it slower as hardware gets faster.
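To make "adaptive" concrete: bcrypt's work factor is the base-2 logarithm of its iteration count, so each increment doubles the cost. The sketch below picks a work factor by measurement. It uses PBKDF2 from the standard library as a stand-in for bcrypt, and the function names and the cost range are assumptions chosen for the example.

```python
import hashlib
import os
import time


def time_hash(cost):
    """Time one hash at 2**cost iterations (PBKDF2 standing in for bcrypt)."""
    salt = os.urandom(16)
    start = time.perf_counter()
    hashlib.pbkdf2_hmac("sha256", b"correct horse", salt, 2 ** cost)
    return time.perf_counter() - start


def calibrate(target_seconds, min_cost=10, max_cost=24):
    """Return the smallest cost whose measured hash time reaches the target."""
    for cost in range(min_cost, max_cost):
        if time_hash(cost) >= target_seconds:
            return cost
    return max_cost
```

Calling calibrate(0.25) on the production hardware gives a starting point for a quarter-second hash; re-running it after a hardware upgrade is how the work factor keeps pace with faster machines.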

(Bcrypt implementations: .NET | Java | PHP | Python | Ruby | Perl | Erlang)

Just how slow should you make it? As slow as you can get away with. From your point of view, your login screen will be one of the less visited pages on your website. People will only enter their password once a week or so, and when they do, they will be pretty serious about using your site and willing to put up with a short delay before they are authenticated. A delay of about 100-500 milliseconds won’t faze them. On the other hand, an attacker cares massively how fast your hash algorithm is. If he can only test a couple of candidate passwords a second, a dictionary attack will be totally impractical for all but your most idiotic users — the kind who choose “password” or “123456” as their passwords.
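The difference is easy to put numbers on. This worked example counts every lowercase alphabetic password of up to seven characters and compares the five-billion-per-second rate quoted earlier with an attacker limited to two guesses per second; the helper names are made up for the illustration.

```python
import string

SECONDS_PER_YEAR = 365 * 24 * 60 * 60


def keyspace(alphabet_size, max_length):
    """Number of passwords of length 1..max_length over the given alphabet."""
    return sum(alphabet_size ** n for n in range(1, max_length + 1))


def seconds_to_exhaust(candidates, guesses_per_second):
    """Time to try every candidate at a fixed guessing rate."""
    return candidates / guesses_per_second


lowercase_up_to_7 = keyspace(len(string.ascii_lowercase), 7)  # 8,353,082,582

fast = seconds_to_exhaust(lowercase_up_to_7, 5_000_000_000)  # salted SHA-1 rate
slow = seconds_to_exhaust(lowercase_up_to_7, 2)  # slow hash, two guesses a second
slow_years = slow / SECONDS_PER_YEAR

print(f"fast hash: {fast:.2f} seconds")      # about 1.67 seconds
print(f"slow hash: {slow_years:.0f} years")  # about 132 years
```

This is exhaustive search, the attacker's worst case. A dictionary attack tries the likeliest passwords first, which is why “password” and “123456” still fall quickly even against a slow hash.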

There are a couple of other secure password hashing algorithms available, namely PBKDF2 and scrypt. They each have their advantages and disadvantages, and as with all things programmer, they are the subject of a bit of a Religious War as to which one you should use. If you’re interested in the ins and outs of the debate, this Hacker News thread and this question on security.stackexchange.com provide some good discussion on the pros and cons of each. But whatever you do, don’t you dare use a naive MD5, or SHA-1, or SHA-256, or SHA-512 for your passwords. When it comes to hashing passwords, these general-purpose algorithms are simply not fit for purpose.