Password hashing as a microservice with Docker Compose

So you’ve stopped breaking the law by storing your passwords in plain text, and you’re aware that MD5 doesn’t cut it any more in a world of GPU-based cracking tools, so you’ve started using bcrypt instead. But can you make your passwords any more secure? Can you even protect your users who have chosen passwords such as “password” or “qwerty” or “123456” or “letmein”? And since bcrypt is so computationally expensive, how can you stop an attempt to brute force your admin password from bringing your entire site down?

Here’s what you can do. Use a microservice.

This gives you two advantages:

  1. Your passwords are stored in a separate database from your user accounts. The “password” field in your users table only contains a randomly allocated identifier; without the password database as well, this tells an attacker nothing. Nada. Nichts. Bupkis.
  2. Your password hash algorithm can be scaled independently from the rest of your site. It can be run on completely different hardware. A brute force attack on your admin site won’t end up DOS-ing everything else, no matter how slow you make it.

Over the past week or so I’ve been experimenting with an implementation of this approach. You can find it on GitHub here.

How it works

Since it’s 2016 and Docker is all the rage, the password hashing microservice, the main web application, and their respective databases, are all in separate Docker containers, managed as a group using Docker Compose and defined in the docker-compose.yml file. Tools such as Docker Swarm or Kubernetes make it a breeze to scale up and down as needed.

The password hashing microservice is in /passwords in the Git repo. It is implemented as a very basic Flask application in passwords/serve.py, storing the passwords in a MongoDB database. It exposes three methods:

  • POST /password: hash a password, allocate it a random ID, and save it.
  • POST /password/test/<id>: test a password against the saved hash with the given ID.
  • DELETE /password/<id>: delete the hash with the given ID from the database.

The POST methods both take the password in a form field called password. The password ID returned by the first method is a randomly generated GUID.

The sample web application is a Django application in /web in the Git repo, backed by a Postgresql database. The file web/webapp/security.py contains a custom password hasher which saves the GUID returned from the microservice in the password column in the users table. The password ID thereby acts as a proxy for the password hash, but since it can not be derived from the hash, the password can not be brute forced from it.

The docker-compose.yml file also puts a Træfɪk proxy server between the web application and the microservice to act as a load balancer. This allows you to scale the microservice as necessary simply by typing docker-compose scale passwords-svc=<n> where <n> is the number of containers you wish to spin up.

Ramping up the security even further…

To make things even harder for an attacker, the hashing service combines both the password and its identifier with application-specific secret strings before saving them into the password database. These secrets will frustrate anyone who manages to get hold of both the user database and the password database, as unless you know their values, you will not be able to brute force the passwords, nor will you be able to tell which password hashes correspond to which users in the database.

These secrets are stored as environment variables called PASSWORD_SECRET and KEY_SECRET and are defined in this example in the docker-compose.yml file. Naturally, you should change them in your own application to other, similarly complex, random strings. If possible, you should load them in from a dedicated secret store such as Hashicorp Vault.

Unless the attacker has access to the user database and the password database and both these secret keys, they aren’t going to be able to crack any of your users’ passwords. Even if your users have used silly passwords such as “password” or “123456” or “arsenal” or “chelsea” their credentials will still be safe.

This is what microservices are for.

There’s a lot of debate going on at the moment about how to use microservices. Should you keep your application as a single monolith and only extract out certain tiny pieces of functionality, or should you break it down into a large number of separate microservices? In each case, just how much should each microservice handle?

Password hashing, as we’ve implemented it here, is a great example of where microservices are appropriate, and how much they should try to achieve. We’ve identified some very specific problems. There is no speculation, guesswork or YAGNI involved. Most importantly, we’ve seen that they provide well defined real-world benefits.

Your password hash algorithm is (probably) snake oil

For several years now, it’s been standard practice among web developers who know what they’re doing to store passwords as a one-way salted SHA-1 hash. Using a salt means that they aren’t vulnerable to rainbow table attacks, for instance, so the only realistic option open to hackers is a dictionary attack, which is slower. Or so the thinking goes at any rate.

There’s just one problem. Dictionary attacks are blazingly fast these days, thanks to the massive parallelism that you can get from the GPU in your graphics card. Just how fast? Coda Hale explains:

Rainbow tables, despite their recent popularity as a subject of blog posts, have not aged gracefully. CUDA/OpenCL implementations of password crackers can leverage the massive amount of parallelism available in GPUs, peaking at billions of candidate passwords a second. You can literally test all lowercase, alphabetic passwords which are ≤7 characters in less than 2 seconds. And you can now rent the hardware which makes this possible to the tune of less than $3/hour. For about $300/hour, you could crack around 500,000,000,000 candidate passwords a second.

How fast is that? Let’s put it this way: for all but your most security-conscious users, a salted SHA-1 now offers no more protection than clear text. Commodity hardware (an off-the-shelf laptop) can test candidate passwords at a rate of up to five billion per second. Given how bad people are at choosing secure passwords, huge swathes of your user base could easily have their credentials cracked at a rate of several thousand every second.

If you are serious about protecting your users’ passwords (and you should be, because if you’re not, you’re legally on thin ice), you need to use a salted hash algorithm that is designed specifically with passwords in mind. The one that seems to be getting most mindshare at the moment is bcrypt. Its killer features are that it is (a) slow, and (b) adaptive. By tweaking its “work factor,” you can decide whether it takes milliseconds or hours, as well as making it slower as hardware gets faster.

(Bcrypt implementations: .NET | Java | PHP | Python | Ruby | Perl | Erlang)

Just how slow should you make it? As slow as you can get away with. From your point of view, your login screen will be one of the less visited pages on your website. People will only enter their password once a week or so, and when they do, they will be pretty serious about using your site and willing to put up with a short delay before they are authenticated. A delay of about 100-500 milliseconds won’t faze them. On the other hand, an attacker cares massively how fast your hash algorithm is. If he can only test a couple of candidate passwords a second, a dictionary attack will be totally impractical for all but your most idiotic users — the kind who choose “password” or “123456” as their passwords.

There are a couple of other secure password hashing algorithms available, namely PBKDF2 and scrypt. They each have their advantages and disadvantages, and as with all things programmer, they are the subject of a bit of a Religious War as to which one you should use. If you’re interested in the ins and outs of the debate, this Hacker News thread and this question on security.stackexchange.com provide some good discussion on the pros and cons of each. But whatever you do, don’t you dare use a naive MD5, or SHA-1, or SHA-256, or SHA-512 for your passwords. When it comes to hashing passwords, these general-purpose algorithms are simply not fit for purpose.

Avoiding password re-use is not that easy

Another wrong argument that occasionally comes up in favour of plain text passwords is that you shouldn’t be responsible for the fact that your users do stupid things, like re-using passwords. Unfortunately, it’s not that simple.

In the eighteen months since I started using KeePass to manage my online passwords, I’ve found that it involves a certain amount of friction. For starters, it’s less user-friendly. Rather than just typing your user name and password straight into your browser, you have to switch to the KeePass window, find your website, and then paste. There are shortcut keys and a search facility to make things easier, but it is a bit of a learning curve. Furthermore, when you register with a new site, you have to fiddle about with the password generator in order to create a new password, and on top of that, some websites have undocumented limitations or bugs in their password forms. One well known website that I use allows you to set passwords of any length, but limits you to 20 characters in the login screen, for example.

That’s a relatively minor complaint, of course. A much more serious difficulty is synchronising passwords between different devices, not all of which may support your password manager of choice. KeePass for the iPhone is not yet available outside the USA and Canada, for example. Third party password managers may not even be an option on certain other Internet-enabled devices such as the PlayStation, Internet TV, and so on. Then there are situations such as a friend’s computer when you don’t have your password database on you; Internet cafes and kiosks; and locked-down workstations on corporate networks.

Clearly, avoiding password reuse requires a certain amount of discipline, sacrifice, and technical know-how. Even many relatively tech-savvy users view it as one of those things that “I must get round to someday” — a bit like taking up exercise or flossing your teeth. So where does that leave your non-technical users?

The overwhelming majority of non-technical users are shockingly ignorant about even the most basic aspects of web use. A couple of years ago, Google interviewed passers-by in Times Square, New York, and found that more than ninety percent of the people they stopped didn’t even know the difference between a web browser and a search engine:

The upshot of all this is that as a data controller, it is totally unrealistic to expect your users not to re-use passwords. They shouldn’t re-use passwords, and they should use a password manager, and they should choose secure passwords, and you should warn them of the risks. But to many of your users, if you try to explain the risks to them and what to do about it, their eyes will just glaze over and they will say to you, “Oh, I’m not a computer person. That’s all too technical to me.” They will re-use passwords. In so doing, many of them will entrust you with the login details to their e-mail, Facebook, PayPal, and possibly even their bank accounts. They shouldn’t, but they do. And with that in mind, you have a responsibility to do everything in your power to protect those details.

Eight wrong reasons why you are storing passwords for clear text recovery

The recent data loss by Sony of seventy million PlayStation users’ personal details was a serious lapse. They have assured us that their users’ credit card details were encrypted, though Graham Cluley of Sophos has asked questions about exactly what kind of encryption they were using.

A more serious issue is that they admitted that users’ passwords were compromised. This can mean only one thing: they were storing passwords in a form that could be recovered to plain text. The press doesn’t tend to make as much noise about passwords as about credit cards, which is a pity, because passwords are at least as valuable to a fraudster, if not more, for the simple reason that the overwhelming majority of people re-use passwords on multiple websites, including high-value ones such as their e-mail and their bank.

It’s a sobering reminder to all of us web developers that we should not be implementing plain text password recovery, period. It is inexcusable. And, it could be argued, illegal in some jurisdictions.

It is inexcusable because it is a security compromise that is totally unnecessary. Yet it is sometimes mandated by pointy-haired bosses and clueless clients despite developers’ objections. There was a question on Stack Overflow a while ago asking how you can ethically approach storing passwords for later plain text retrieval. I replied that you can’t.

Here are some of the excuses that people make for storing passwords in plain text.

1. “My site is small and obscure, nobody is going to try hacking it.”

If you are thinking that way, I have bad news for you. Hackers love small, obscure websites precisely because their developers think exactly that, and they are consequently riddled with SQL injection vulnerabilities exposing plain text passwords. Sites like that are low hanging fruit to a malicious user.

Furthermore, hackers operate botnets that trawl the Internet probing every website they come across for security holes. Even the smallest, most obscure website should expect to be probed by malicious scanners weekly if not daily.

How small and obscure is “smallest and most obscure”? How about my own blog, circa December 2005? At the time, it had been going for only seven months, and since I was (a) a lot more reserved and introverted back then than I am now, and (b) a complete blogging newbie, I had kept a very low profile with it up to then. Only half a dozen or so of my close friends knew of its existence. I had even turned off the feature to ping Technorati and other blog aggregators. Yet there it was, on the receiving end of an SMTP injection probe.

Bottom line: if you are collecting user registrations, you are not small and obscure. Period.

2. “Users’ logins to my site are a low-value asset.”

No they are not.

No matter how much we nag them, how much we tell them to be careful, people will still re-use passwords. There are far too many people who, when you tell them they need to use KeePass or something similar, will just shrug and say to you, “Oh, that’s too complicated for me, I’m not a computer person.” They regard good password practices as a tinfoil hat, because, after all, you’re not Fort Knox.

Consequently, if your password database gets breached, your attackers have not just got away with your users’ logins on your site. They have got away with your users’ logins on Facebook, eBay, PayPal, GMail, and their bank accounts. Give an identity thief access to your clients’ e-mail, and you have given them access to their entire life.

Far from being a low-value asset, your users’ logins are the most valuable asset that you are dealing with. They are arguably even more valuable than their credit card numbers.

3. “A password reset is harder from a usability perspective than a password reminder.”

No it is not. Not if you implement it correctly.

People only think that a password reset is harder to use because they think you have to e-mail your users a new password, and it usually ends up looking like transmission line noise. But there is a better alternative.

E-mail your users a one-time, time-limited link to a page on your website that lets them choose a new password then automatically logs them in once they’ve done so. This is actually easier to use than either a password reset or a password reminder, since one click from their e-mail client opens their web browser straight to the “choose a new password” page.

If you want to see this in action, this is how Amazon does it. So does Twitter. So does any other website that takes security seriously these days.

4. “As an administrator, I need to be able to log on as any other user, so that I can troubleshoot if need be.”

Regardless of whether this user story is a good idea or not (it might be controversial if, for instance, you are Facebook and word of it gets out), you do not need to know the other user’s password in order to facilitate it.

All you need is a button on your user administration page that allows an already authenticated superuser to switch identity to the user in question with a single click of the mouse. It’s exactly the same principle as the Unix su command.

Any web developer who isn’t able to implement this is, quite frankly, incompetent.

5. “Two-way encryption is sufficient.”

No it is not.

If you want to allow your users to recover their passwords through your web front end, your web front end has to be able to decrypt them. If your end users can decrypt their passwords, there is always the possibility that a malicious user can do likewise. If your code has a bug in it that lets a malicious user view passwords to which they are not entitled, that would totally nullify the effect of the encryption. Furthermore, you would have to put both your public key and your private key on a public-facing web server, making it all the easier for an attacker to get hold of the means to decrypt your users’ passwords.

6. “But what about credit cards? You need two-way encryption for them, surely?”

Your users’ credit cards should not be recoverable to plain text by the web server either.

If you look at how Amazon stores your credit card information, you will see that they don’t display it back to you once you’ve typed it in, apart from the last four digits. Yet they need access to this information in order to bill you correctly. So how do they do it?

The answer is simple. The web server only has access to their public key. They put the private key (and their back-end administration system) on a completely different server in a completely different subnet, separated from their web servers by firewalls with strict security.

If you want to store credit card details for later processing, this is what you must do. If you can’t do so, you should use a third-party payment provider such as PayPal, send their credit card details to them using web services, and do not store them in your own database.

Despite this, this is still not as secure as a one-way salted hash, and there is still a chance that an attacker could steal your users’ credit card database and the private key. But in this case there is a limit to how much security you can practically apply without rendering the system unusable. This limit does not apply to passwords, since clear text recovery, as we have seen, is completely unnecessary for passwords.

7. “You need to do a risk assessment before deciding what security is appropriate.”

This one came up on a Stack Overflow discussion a while ago and I just about hit the roof when I saw it.

For crying out loud people, this isn’t a hypothetical, theoretical scenario! It is a scenario that has happened in practice, over and over and over again. It is a scenario that is keeping seventy million PlayStation users awake at night at this very moment. The risks are well known and well documented. If your users’ passwords get compromised, hackers can take control not only of their login to your dinky little website, but also to Facebook, PayPal, their e-mail, their bank. You are exposing them to the risk of identity theft, which can get pretty nasty.

Besides, a risk assessment is only meaningful if you can demonstrate that there is a convincing business case for any security compromises that you have to make. And what compromises do you have to make in order to avoid sending plain text passwords? There aren’t any! I’ve explained above just why the perceived disadvantages are not disadvantages of the sort, and how recoverable passwords are totally unnecessary for a user-friendly, flexible authentication system.

8. “The client/my boss insists on it.”

The best way to approach this is to show them how Twitter and Amazon handle login recovery (“show, don’t tell”), build a prototype, and point to some real-world examples of where things went wrong — the 4chan/Christian dating website that I mentioned a couple of years back is as good an example as any.

If, in spite of this, and being advised that they are probably breaking the law in requiring plain text passwords, they still insist on having passwords recoverable to plain text, you are likely to have other problems with your client. The Daily WTF kind of problems.

If you end up in this kind of situation, sack your client. Or start looking for another job if you’re a permie. Now.

So are there any valid reasons why you can be storing passwords for clear text recovery?

Yes, there is one. And only one. It is this.

You have inherited a legacy application, written by someone else, that has so much bad design and technical debt that it is infeasible to migrate your passwords to an encrypted version.

If you end up in such a situation, get the code under source control before you do anything else, to cover your back if nothing else. Advise the client that they are storing passwords dangerously. Warn them of the implications. Propose an alternative, and quote for it. And recommend that they make it a priority.