james mckay dot net
because there are few things that are less logical than business logic

Posts tagged: security

Necessary and sufficient conditions

Take a look at these two statements. Are they both saying the same thing?

  1. “If you are using HTTPS, then your website is secure.”
  2. “If you are not using HTTPS, then your website is not secure.”

In actual fact, they are not. Furthermore, only the second statement is true: the first statement is false.

The first statement is an example of a sufficient condition. If it were true, all you would need to do to secure your website would be to install an SSL certificate and you’d be done.

The second statement, on the other hand, is an example of a necessary condition. There are, of course, other things you need to do to ensure that your website is secure: for example, take care to avoid SQL injection and cross-site scripting attacks, keep your servers patched and up to date, and so on. But you still need to use HTTPS in addition to all these. If you don’t, your site will be vulnerable to a man-in-the-middle attack.

You can see the difference if I draw up a truth table for a sufficient condition:

Sufficient condition | Other stuff | Secure?
-------------------- | ----------- | -------
No                   | No          | No
Yes                  | No          | Yes
No                   | Yes         | Maybe
Yes                  | Yes         | Yes

On the other hand, a necessary condition looks like this:

Necessary condition | Other stuff | Secure?
------------------- | ----------- | -------
No                  | No          | No
Yes                 | No          | Maybe
No                  | Yes         | No
Yes                 | Yes         | Maybe

Some conditions can be both necessary and sufficient. In this case, the truth table looks like this:

Necessary and sufficient condition | Other stuff | Secure?
---------------------------------- | ----------- | -------
No                                 | No          | No
Yes                                | No          | Yes
No                                 | Yes         | No
Yes                                | Yes         | Yes

A necessary and sufficient condition can be written as “if and only if.” This is sometimes shortened to “iff.”
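The difference between the two opening statements can be checked mechanically. This sketch models each statement as a Boolean formula over two variables, "using HTTPS" and "secure", and prints which of the four combinations each one permits. It is an illustration only: it drops the "Other stuff" column and treats security as a plain yes/no.

```python
from itertools import product


def implies(p: bool, q: bool) -> bool:
    """Material implication: 'if p then q' fails only when p holds and q does not."""
    return (not p) or q


def sufficient(https: bool, secure: bool) -> bool:
    # Statement 1: "If you are using HTTPS, then your website is secure."
    return implies(https, secure)


def necessary(https: bool, secure: bool) -> bool:
    # Statement 2: "If you are not using HTTPS, then your website is not secure."
    # Logically equivalent to its contrapositive: secure implies HTTPS.
    return implies(not https, not secure)


def necessary_and_sufficient(https: bool, secure: bool) -> bool:
    # "Your website is secure if and only if you are using HTTPS."
    return https == secure


if __name__ == "__main__":
    print(f"{'HTTPS':<7}{'Secure':<8}{'Stmt 1':<8}{'Stmt 2':<8}Iff")
    for https, secure in product([False, True], repeat=2):
        print(
            f"{str(https):<7}{str(secure):<8}"
            f"{str(sufficient(https, secure)):<8}"
            f"{str(necessary(https, secure)):<8}"
            f"{necessary_and_sufficient(https, secure)}"
        )
```

The two statements disagree in exactly two rows. Statement 1 forbids only a site that uses HTTPS yet is insecure, which is easy to find in the wild; statement 2 forbids only a site that is secure without HTTPS. The "iff" column is true exactly where both statements are.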

Insufficient does not mean unnecessary.

The most common misunderstanding that people have about necessary and sufficient conditions is the mistaken belief that one implies the other, or that a lack of one implies a lack of the other.

  • It is possible for conditions to be sufficient but not necessary.
  • It is possible for conditions to be necessary but not sufficient.

Take, for example, this comment:

Google is just a bully because it is so big. It can go f*** itself. A standard webpage is not insecure and the use of SSL doesn’t make it secure either. Maybe everyone forgets that when SSL certs were comprised. I do work on e-commerce sites and I have seen clients who sites got hacked, not because of lack of SSL, but because of bad code on their backend. The hackers proceeded to add code so they would get emailed the credit card info after it was submitted. The user would never know, because the big green icon in the browser said it was secure. The whole thing is just a way for companies to make money.

This commenter correctly realised that SSL is insufficient, but then concluded that it is therefore unnecessary. This is of course incorrect. SSL may be insufficient, but it is very, very necessary.

Unfortunately, in the world of IT security, there are plenty of necessary conditions. But there are no sufficient ones.

SQL injection is the FizzBuzz of web security

FizzBuzz is the (in)famous interview question designed to filter out totally unqualified candidates for a programming job at a very early stage in the process: the kind who can’t solve even the very simplest programming problems, and who would be wasting your time and money if you called them in for an interview after the phone screen.

You can—and should—do something similar for web security. Take a look at this snippet of Python code:

def check_password(username, password):
    db = MySQLdb.connect(passwd=DB_PASSWORD, db=DB_DATABASE)
    c = db.cursor()
    c.execute("SELECT password from tblUsers " +
        "WHERE username = \"" + username + "\"")
    row = c.fetchone()
    if row:
        return row[0] == password
    else:
        return False

Did you spot the problem? If you have any significant experience at all as a web developer, it should stand out to you like a sore thumb. You should be able to spot it in seconds, even if you have never used Python before in your life. Even if you’re the kind of .NET-only developer who insists on being spoon-fed by Microsoft and believes that Python is a dangerous heresy, it should still be glaringly obvious to you.

A couple of years ago, I used a similar question on a number of interview candidates, some of them with twenty or more years of experience at a variety of impressive-sounding companies. Yet it shocked me just how many of them required very heavy prompting to see it.

If you’re interviewing a candidate for a software developer role, show them this snippet of code. If they can’t tell you in seconds that it contains a SQL injection vulnerability in line 5, where username is concatenated into the query, don’t hire them. If they can’t tell you why it’s a SQL injection vulnerability, don’t hire them. No exceptions, no excuses.

SQL injection vulnerabilities are quite frankly inexcusable. Out of all the different kinds of security vulnerabilities that you can get, they are the easiest to understand, the easiest to spot, and the easiest to avoid. Anywhere that you see user input being smashed together with any kind of instructions—SQL, SPARQL, LDAP queries, whatever—it should raise a massive red flag. A candidate who can’t spot security vulnerabilities will write security vulnerabilities (or more likely, copy and paste them from the Internet)—and if they can’t spot the simplest vulnerability of the lot, they’re going to have trouble even understanding more complex ones. And that’s before you even get started on other aspects of programming such as data integrity or performance.

With the rise of ransomware and other increasingly nasty exploits, you simply cannot afford to be careless or blasé about IT security these days. As software developers, we all have a responsibility to keep our knowledge and skills in this area sharp and up to date, and as a recruiter, you can’t afford to take on anyone who isn’t taking this responsibility seriously.

Finally: there is a second glaring security flaw in this snippet, and candidates should be expected to spot it as well. I shall leave that one as an exercise for the reader.
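For comparison with the snippet above, here is the same kind of lookup written both ways. This sketch uses Python's built-in sqlite3 module instead of MySQLdb so that it runs without a database server, against a hypothetical two-column tblUsers table, and it only looks up usernames. With MySQLdb the placeholder is %s rather than ?, but the principle is identical: pass user input to execute() as a separate parameter and never splice it into the SQL string.

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    """Create a throwaway in-memory table standing in for tblUsers (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tblUsers (username TEXT PRIMARY KEY, email TEXT)")
    conn.executemany(
        "INSERT INTO tblUsers (username, email) VALUES (?, ?)",
        [("alice", "alice@example.com"), ("bob", "bob@example.com")],
    )
    return conn


def find_users_unsafe(conn: sqlite3.Connection, username: str) -> list:
    # VULNERABLE: the input is pasted into the SQL text, so a quote character
    # in it can close the string literal and rewrite the rest of the query.
    sql = "SELECT username FROM tblUsers WHERE username = '" + username + "'"
    return sorted(row[0] for row in conn.execute(sql))


def find_users_safe(conn: sqlite3.Connection, username: str) -> list:
    # The ? placeholder sends the input as a value, never as SQL.
    sql = "SELECT username FROM tblUsers WHERE username = ?"
    return sorted(row[0] for row in conn.execute(sql, (username,)))


if __name__ == "__main__":
    conn = make_demo_db()
    attack = "' OR '1'='1"
    print(find_users_unsafe(conn, attack))  # ['alice', 'bob']
    print(find_users_safe(conn, attack))    # []
```

With the crafted input, the unsafe version runs WHERE username = '' OR '1'='1' and matches every row; the safe version simply looks for a user whose name is that odd string, and finds none.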

Password hashing as a microservice with Docker Compose

So you’ve stopped breaking the law by storing your passwords in plain text, and you’re aware that MD5 doesn’t cut it any more in a world of GPU-based cracking tools, so you’ve started using bcrypt instead. But can you make your passwords any more secure? Can you even protect your users who have chosen passwords such as “password” or “qwerty” or “123456” or “letmein”? And since bcrypt is so computationally expensive, how can you stop an attempt to brute force your admin password from bringing your entire site down?

Here’s what you can do. Use a microservice.

This gives you two advantages:

  1. Your passwords are stored in a separate database from your user accounts. The “password” field in your users table only contains a randomly allocated identifier; without the password database as well, this tells an attacker nothing. Nada. Nichts. Bupkis.
  2. Your password hash algorithm can be scaled independently from the rest of your site. It can be run on completely different hardware. A brute force attack on your admin site won’t end up DoS-ing everything else, no matter how slow you make the hash.

Over the past week or so I’ve been experimenting with an implementation of this approach, which is available on GitHub.

How it works

Since it’s 2016 and Docker is all the rage, the password hashing microservice, the main web application and their respective databases are all in separate Docker containers, managed as a group using Docker Compose and defined in the docker-compose.yml file. Tools such as Docker Swarm or Kubernetes make it a breeze to scale them up and down as needed.

The password hashing microservice is in /passwords in the Git repo. It is implemented as a very basic Flask application in passwords/serve.py, storing the passwords in a MongoDB database. It exposes three methods:

  • POST /password: hash a password, allocate it a random ID, and save it.
  • POST /password/test/<id>: test a password against the saved hash with the given ID.
  • DELETE /password/<id>: delete the hash with the given ID from the database.

The POST methods both take the password in a form field called password. The password ID returned by the first method is a randomly generated GUID.
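To make the three methods concrete, here is an in-process sketch of what the service does with each request. It is not the code in the repository: a dict stands in for MongoDB, and PBKDF2 from the standard library stands in for bcrypt (which needs a third-party package). The class name, method names and iteration count are illustrative assumptions.

```python
import hashlib
import hmac
import os
import uuid


class PasswordStore:
    """Hypothetical stand-in for the password microservice's storage logic."""

    ITERATIONS = 100_000  # illustrative work factor; calibrate for real use

    def __init__(self) -> None:
        # password_id -> (salt, derived key); MongoDB in the real service
        self._hashes = {}

    def _derive(self, password: str, salt: bytes) -> bytes:
        return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, self.ITERATIONS)

    def create(self, password: str) -> str:
        """POST /password: hash it, allocate a random GUID, save, return the GUID."""
        salt = os.urandom(16)
        password_id = str(uuid.uuid4())
        self._hashes[password_id] = (salt, self._derive(password, salt))
        return password_id

    def verify(self, password_id: str, password: str) -> bool:
        """POST /password/test/<id>: check a candidate against the stored hash."""
        entry = self._hashes.get(password_id)
        if entry is None:
            return False
        salt, expected = entry
        # compare_digest avoids leaking how many leading bytes matched
        return hmac.compare_digest(self._derive(password, salt), expected)

    def delete(self, password_id: str) -> None:
        """DELETE /password/<id>: forget the hash."""
        self._hashes.pop(password_id, None)


if __name__ == "__main__":
    store = PasswordStore()
    pid = store.create("letmein")
    print(pid)  # a random GUID: the only thing the users table would keep
    print(store.verify(pid, "letmein"), store.verify(pid, "qwerty"))
```

The ID returned by create() is random, so on its own it says nothing about the password; everything needed to check a login lives on the service's side.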

The sample web application is a Django application in /web in the Git repo, backed by a PostgreSQL database. The file web/webapp/security.py contains a custom password hasher which saves the GUID returned from the microservice in the password column in the users table. The password ID thereby acts as a proxy for the password hash, but since it is randomly allocated rather than derived from the hash, it reveals nothing about the password and cannot be used to brute force it.
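Seen from the web application's side, talking to the service comes down to three kinds of HTTP request. This sketch builds them with the standard library's urllib.request rather than showing the repository's actual hasher; BASE_URL and the function names are hypothetical, and only the paths and the password form field come from the description above.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request

# Hypothetical address: substitute whatever host name your docker-compose.yml
# (and the Træfɪk proxy in front of the service) actually exposes.
BASE_URL = "http://passwords-svc"
FORM_HEADERS = {"Content-Type": "application/x-www-form-urlencoded"}


def _form(password: str) -> bytes:
    # Both POST endpoints read the password from a form field called "password".
    return urlencode({"password": password}).encode("utf-8")


def create_password_request(password: str) -> Request:
    """POST /password: the service hashes the password and returns a new GUID."""
    return Request(f"{BASE_URL}/password", data=_form(password),
                   headers=FORM_HEADERS, method="POST")


def verify_password_request(password_id: str, password: str) -> Request:
    """POST /password/test/<id>: check a candidate against the stored hash."""
    return Request(f"{BASE_URL}/password/test/{quote(password_id)}",
                   data=_form(password), headers=FORM_HEADERS, method="POST")


def delete_password_request(password_id: str) -> Request:
    """DELETE /password/<id>: remove the stored hash."""
    return Request(f"{BASE_URL}/password/{quote(password_id)}", method="DELETE")
```

Sending a request is just urllib.request.urlopen(req). Reading the GUID or the test result out of the reply depends on the service's response format, which isn't described here, so that part is left out.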

The docker-compose.yml file also puts a Træfɪk proxy server between the web application and the microservice to act as a load balancer. This allows you to scale the microservice as necessary simply by typing docker-compose scale passwords-svc=<n>, where <n> is the number of containers you wish to spin up.

Ramping up the security even further…

To make things even harder for an attacker, the hashing service combines both the password and its identifier with application-specific secret strings before saving them into the password database. These secrets will frustrate anyone who manages to get hold of both the user database and the password database: without their values, an attacker can neither brute force the passwords nor tell which password hashes correspond to which users.

These secrets are stored as environment variables called PASSWORD_SECRET and KEY_SECRET and are defined in this example in the docker-compose.yml file. Naturally, you should change them in your own application to other, similarly complex, random strings. If possible, you should load them in from a dedicated secret store such as HashiCorp Vault.
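The exact way the service combines the secrets isn't spelled out here, so the following is only one plausible scheme, sketched with HMAC-SHA256 from the standard library: KEY_SECRET disguises the password ID before it is used as a key in the password database, and PASSWORD_SECRET is mixed into the password (a "pepper") before the slow hash. The environment variable names come from the text above; the function names, fallback values and the use of PBKDF2 in place of bcrypt are assumptions.

```python
import hashlib
import hmac
import os

# Variable names from the text; the fallbacks are hypothetical and only for
# local experiments. Real deployments must supply strong random values.
PASSWORD_SECRET = os.environ.get("PASSWORD_SECRET", "dev-only-password-secret").encode("utf-8")
KEY_SECRET = os.environ.get("KEY_SECRET", "dev-only-key-secret").encode("utf-8")


def storage_key(password_id: str, key_secret: bytes = KEY_SECRET) -> str:
    """Key under which a hash is filed in the password database.

    The users table holds password_id, but the password database only ever
    sees this HMAC of it, so linking hashes to users requires KEY_SECRET."""
    return hmac.new(key_secret, password_id.encode("utf-8"), hashlib.sha256).hexdigest()


def pepper(password: str, password_secret: bytes = PASSWORD_SECRET) -> bytes:
    """Mix PASSWORD_SECRET into the password, so guesses against a stolen
    password database are useless without the secret."""
    return hmac.new(password_secret, password.encode("utf-8"), hashlib.sha256).digest()


def hash_for_storage(password: str, salt: bytes, iterations: int = 100_000) -> bytes:
    # Slow salted hash of the peppered password (PBKDF2 standing in for bcrypt).
    return hashlib.pbkdf2_hmac("sha256", pepper(password), salt, iterations)
```

Because storage_key() is deterministic, the service can still find a hash given the GUID it receives, but someone holding only the two databases cannot compute the mapping.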

Unless the attacker has access to the user database and the password database and both these secret keys, they aren’t going to be able to crack any of your users’ passwords. Even if your users have used silly passwords such as “password” or “123456” or “arsenal” or “chelsea”, their credentials will still be safe.

This is what microservices are for.

There’s a lot of debate going on at the moment about how to use microservices. Should you keep your application as a single monolith and only extract out certain tiny pieces of functionality, or should you break it down into a large number of separate microservices? In each case, just how much should each microservice handle?

Password hashing, as we’ve implemented it here, is a great example of where microservices are appropriate, and how much they should try to achieve. We’ve identified some very specific problems. There is no speculation, guesswork or YAGNI involved. Most importantly, we’ve seen that they provide well-defined real-world benefits.

Your password hash algorithm is (probably) snake oil

For several years now, it’s been standard practice among web developers who know what they’re doing to store passwords as a one-way salted SHA-1 hash. Using a salt means that they aren’t vulnerable to rainbow table attacks, for instance, so the only realistic option open to hackers is a dictionary attack, which is slower. Or so the thinking goes, at any rate.

There’s just one problem. Dictionary attacks are blazingly fast these days, thanks to the massive parallelism that you can get from the GPU in your graphics card. Just how fast? Coda Hale explains:

Rainbow tables, despite their recent popularity as a subject of blog posts, have not aged gracefully. CUDA/OpenCL implementations of password crackers can leverage the massive amount of parallelism available in GPUs, peaking at billions of candidate passwords a second. You can literally test all lowercase, alphabetic passwords which are ≤7 characters in less than 2 seconds. And you can now rent the hardware which makes this possible to the tune of less than $3/hour. For about $300/hour, you could crack around 500,000,000,000 candidate passwords a second.

How fast is that? Let’s put it this way: for all but your most security-conscious users, a salted SHA-1 now offers no more protection than clear text. Commodity hardware (an off-the-shelf laptop) can test candidate passwords at a rate of up to five billion per second. Given how bad people are at choosing secure passwords, huge swathes of your user base could easily have their credentials cracked at a rate of several thousand every second.
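The arithmetic behind "all lowercase, alphabetic passwords of up to 7 characters in under 2 seconds" is easy to reproduce. This sketch counts that keyspace and divides by the two rates quoted above: five billion guesses a second on a laptop and 500 billion on rented hardware. A salt does not change this calculation for any single user; it only stops the attacker reusing work across users.

```python
def keyspace(alphabet_size: int, max_length: int) -> int:
    """Number of candidate passwords of length 1..max_length over the alphabet."""
    return sum(alphabet_size ** n for n in range(1, max_length + 1))


def seconds_to_exhaust(candidates: int, guesses_per_second: float) -> float:
    """Worst-case time to try every candidate at a constant guessing rate."""
    return candidates / guesses_per_second


if __name__ == "__main__":
    n = keyspace(26, 7)  # every lowercase alphabetic password of 1 to 7 characters
    print(f"{n:,} candidates")  # 8,353,082,582 candidates
    for label, rate in [("laptop", 5e9), ("rented GPUs", 5e11)]:
        print(f"{label}: {seconds_to_exhaust(n, rate):.2f} s")
    # laptop: 1.67 s, rented GPUs: 0.02 s
```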

If you are serious about protecting your users’ passwords (and you should be, because if you’re not, you’re legally on thin ice), you need to use a salted hash algorithm that is designed specifically with passwords in mind. The one that seems to be getting most mindshare at the moment is bcrypt. Its killer features are that it is (a) slow, and (b) adaptive. By tweaking its “work factor,” you can decide whether it takes milliseconds or hours, and you can increase it as hardware gets faster.

(Bcrypt implementations: .NET | Java | PHP | Python | Ruby | Perl | Erlang)

Just how slow should you make it? As slow as you can get away with. From your point of view, your login screen will be one of the less visited pages on your website. People will only enter their password once a week or so, and when they do, they will be pretty serious about using your site and willing to put up with a short delay before they are authenticated. A delay of about 100-500 milliseconds won’t faze them. On the other hand, an attacker cares massively how fast your hash algorithm is. If he can only test a couple of candidate passwords a second, a dictionary attack will be totally impractical for all but your most idiotic users — the kind who choose “password” or “123456” as their passwords.
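As a sketch of the "as slow as you can get away with" advice, the code below picks a work factor by timing a short run and scaling it to a target delay (250 ms here, inside the 100-500 ms range above). It uses PBKDF2 from Python's standard library in place of bcrypt, since bcrypt needs a third-party package; the function names and the storage format are hypothetical.

```python
import hashlib
import hmac
import os
import time


def pbkdf2_seconds(iterations: int) -> float:
    """Time a single PBKDF2-HMAC-SHA256 derivation at the given iteration count."""
    start = time.perf_counter()
    hashlib.pbkdf2_hmac("sha256", b"benchmark", b"\x00" * 16, iterations)
    return time.perf_counter() - start


def calibrate_iterations(target_seconds: float = 0.25, probe: int = 10_000) -> int:
    """Pick an iteration count that takes roughly target_seconds on this machine."""
    elapsed = max(pbkdf2_seconds(probe), 1e-9)
    # PBKDF2's cost is linear in the iteration count, so scale the probe up.
    return max(probe, int(probe * target_seconds / elapsed))


def hash_password(password: str, iterations: int) -> str:
    """Record the iteration count next to the salt and hash so it can be raised later."""
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return f"pbkdf2_sha256${iterations}${salt.hex()}${digest.hex()}"


def verify_password(password: str, stored: str) -> bool:
    _scheme, iterations, salt_hex, digest_hex = stored.split("$")
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), bytes.fromhex(salt_hex), int(iterations)
    )
    return hmac.compare_digest(digest, bytes.fromhex(digest_hex))


if __name__ == "__main__":
    n = calibrate_iterations()
    print(f"About {n:,} iterations for roughly 250 ms on this machine")
    stored = hash_password("letmein", n)
    print(verify_password("letmein", stored), verify_password("qwerty", stored))
```

Because each stored hash carries its own iteration count, you can raise the work factor as hardware gets faster and rehash a user's password on their next successful login. With the Python bcrypt package, the equivalent knob is the rounds argument to bcrypt.gensalt().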

There are a couple of other secure password hashing algorithms available, namely PBKDF2 and scrypt. They each have their advantages and disadvantages, and as with all things programmer, they are the subject of a bit of a Religious War as to which one you should use. If you’re interested in the ins and outs of the debate, this Hacker News thread and this question on security.stackexchange.com provide some good discussion on the pros and cons of each. But whatever you do, don’t you dare use a naive MD5, or SHA-1, or SHA-256, or SHA-512 for your passwords. When it comes to hashing passwords, these general-purpose algorithms are simply not fit for purpose.