james mckay dot net
because there are few things that are less logical than business logic

Positive, negative, or error?

In my recent blog post about error handling in Go, I wrote about why I thought that the language’s designers made the wrong decision by not implementing exception handling. One of the posts I linked to was this article by a Go developer called Dave Cheney.

In another article by the same author, he says that he will prove that Go’s error handling is superior. However, the example he gives is based on flawed logic. Nevertheless, it’s instructive to look at it, because there are some important things that we can learn from it.

His argument is based around this snippet of code:

package main

import "fmt"

// Positive returns true if the number is positive, false if it is negative.
func Positive(n int) bool {
    return n > -1
}

func Check(n int) {
    if Positive(n) {
        fmt.Println(n, "is positive")
    } else {
        fmt.Println(n, "is negative")
    }
}

func main() {
    Check(1)
    Check(0)
    Check(-1)
}

which, as he points out, gives this answer:

1 is positive
0 is positive
-1 is negative

which is wrong, because 0 is not positive. However, as he says, 0 is not negative either. To fix this, he proposes returning an error condition as well:

// Positive returns true if the number is positive, false if it is negative.
// The second return value indicates if the result is valid, which in the case
// of n == 0, is not valid.
func Positive(n int) (bool, bool) {
    if n == 0 {
        return false, false
    }
    return n > -1, true
}

func Check(n int) {
    pos, ok := Positive(n)
    if !ok {
        fmt.Println(n, "is neither")
        return
    }
    if pos {
        fmt.Println(n, "is positive")
    } else {
        fmt.Println(n, "is negative")
    }
}

Here’s the flaw.

What a function does should be defined by its name and any relevant established conventions.

The flaw in Dave’s logic is that he is trying to use the Positive() function for purposes for which it should not be intended. There is nothing in the function’s name that tells you that it will determine whether or not a number is negative. It only tells you that it will determine whether or not it is positive.

You can see this more clearly if we change the requirements a bit. What would happen if he were asked to produce a program that, rather than telling whether a number is positive or negative, told whether it was prime or Fibonacci? The series of prime numbers goes 2, 3, 5, 7, 11, 13, 17, 19 … whereas the series of Fibonacci numbers goes 1, 1, 2, 3, 5, 8, 13 and so on. But should we have functions IsFibonacci() and IsPrime() that throw errors for 4, 6, 9, 10, 12, 14, 15, 16, 18, 20? Of course not!
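To make that concrete, here is a minimal sketch of the two predicates as plain boolean functions. The names IsPrime and IsFibonacci come from the paragraph above; the implementations are illustrative assumptions rather than anything from either article, and the Fibonacci series is assumed to start 1, 1, 2 as quoted.

```go
package main

import "fmt"

// IsPrime reports whether n is a prime number. Every int is a valid input;
// anything below 2 is simply not prime.
func IsPrime(n int) bool {
	if n < 2 {
		return false
	}
	for d := 2; d*d <= n; d++ {
		if n%d == 0 {
			return false
		}
	}
	return true
}

// IsFibonacci reports whether n appears in the series 1, 1, 2, 3, 5, 8, ...
// Numbers outside the series yield false rather than an error.
func IsFibonacci(n int) bool {
	a, b := 1, 1
	for a < n {
		a, b = b, a+b
	}
	return a == n
}

func main() {
	for _, n := range []int{4, 5, 7, 8} {
		fmt.Println(n, "prime:", IsPrime(n), "fibonacci:", IsFibonacci(n))
	}
}
```

For 4, which is neither prime nor Fibonacci, both functions return false. No error value is needed, because each function has answered exactly the question its name asks.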

What he needs to do instead is declare a second function, Negative():

// Positive reports whether n is greater than zero.
func Positive(n int) bool {
    return n > 0
}

// Negative reports whether n is less than zero.
func Negative(n int) bool {
    return n < 0
}

func Check(n int) {
    pos := Positive(n)
    neg := Negative(n)
    if pos {
        fmt.Println(n, "is positive")
    } else if neg {
        fmt.Println(n, "is negative")
    } else {
        fmt.Println(n, "is neither")
    }
}

Neither a function called Positive() nor one called Negative() should have any preconditions whatsoever. A number — any number, zero included — is either positive or it isn’t. Zero is not positive, so you would return false. If we were dealing with floating point numbers, NaN (not a number) would not be positive, so you would return false. Strictly speaking, in dynamically typed languages, null, "Hello world", an HTTP client class, or an aardvark, are not positive either, so you would return false.

Remember that, whether you are using exceptions or error codes, an error indicates that your function could not do what its name says that it does. Making zero — or anything else — an error condition violates this rule, is outside of the scope implied by the function’s name and any well known conventions I can think of, and as such, it is counterintuitive, confusing, and wrong.

Supergiant stars are hot vacuums

You may have seen this video showing the relative sizes of different stars, and how massive some of them are:

What’s often not appreciated is that as well as being mind-bogglingly large, the largest supergiant stars are also incredibly tenuous.

Just how tenuous? Let’s take Betelgeuse as an example. There’s some uncertainty about the exact figures, but it has a mass about eleven times that of the sun, and a radius of about 900 solar radii. In other words:

Mass: 2.2×10³¹ kg
Radius: 6.3×10¹¹ m
Volume: 10³⁶ m³
Density: 2×10⁻⁵ kg/m³

On earth, a density of just 2×10⁻⁵ kg/m³ would be considered a hard vacuum. It is just one sixty thousandth of the density of the earth’s atmosphere, 1.2 kg/m³. The density of the largest known star in the video, VY Canis Majoris, is roughly similar.

Yet despite being so tenuous, Betelgeuse has an escape velocity of about 70 km/second at its outer extremity. Betelgeuse’s enormous size is due to extreme nuclear fusion reactions in its core, where helium is converted into carbon and oxygen, and then subsequently into heavier elements, culminating in the conversion of silicon to iron. Once its supply of silicon is exhausted, nuclear fusion is no longer possible, the star collapses in on itself, and a supernova explosion results. For Betelgeuse, it is estimated that this will happen sometime in the next million years or so.

What is xkcd “Time” all about?

You can’t go far on the Internet these days without coming across the webcomic, xkcd, by Randall Munroe. Three times a week, he publishes comics giving a twist on science, pop culture, and general knowledge, that are amusing, informative, and at times pretty cryptic. Every so often, he posts something that reaches epic proportions. One of his most spectacular creations was comic number 1190 — “Time.”

At first glance it appears to show this particularly puzzling looking picture:

What is it? And what does it represent?

A brief history of Time

“Time” is, in actual fact, a masterpiece of storytelling that won the 2014 Hugo Award for Best Graphic Story. What you see here is the last frame in an epic series of 3,099 different images, published one at a time starting on March 25, 2013. For the first five days, a new image appeared every half hour; on March 30 the frequency changed to hourly, and it continued updating with a new picture every hour until late July that year. If you click through the picture on the xkcd site, you will be taken to another website, which allows you to scroll through the images and follow the story.

The first picture was every bit as puzzling as the last. It simply showed two characters (known to xkcd fans as Cueball and Megan) sitting on a beach:

The tooltip that appeared when you hovered over the picture initially said, “Wait for it.” So we waited.

Over time, as the story unfolded, we saw them building a sandcastle of epic proportions:

As they worked, they noticed something strange: that the sea was rising. Further, and faster, than they’d ever known it to rise before.

Why is the sea rising? Where is the sea anyway? If you want to see the whole story first, head over to the Geekwagon viewer. You may want to do so now because the rest of this post contains spoilers.

Eventually, our twosome decide to set off on a journey to try and find out why the sea is rising. On the way, they encounter some epic scenery, including rivers:

Baobabs:

Waterfalls:

Abandoned dwellings:

They get attacked by a large cat:

After fending off the cat, they spend a tense night out in the open, sleeping beneath the stars:

And finally they meet up with some people in woolly hats who speak a strange language:

The Beanie Bunch take them to their leader, who explains to them why the sea is rising:

The leader then shows them a map telling them how far she thinks the sea will rise, and at this point, all is revealed:

What happened to the Mediterranean?

It turns out that the Straits of Gibraltar had become blocked, the Mediterranean had dried up, and Cueball and Megan were members of a small settlement of just forty people living by the shores of a small, hyper-saline sea at the bottom of the Mediterranean basin, a bit like the Dead Sea of today. But now the blockage at the straits had been breached, water was starting to flow into the basin, and that was why the sea was rising.

This is something that has actually happened before. There is a lot of evidence that 5.97 million years ago, the Straits of Gibraltar were blocked up by the northward movement of the African continental plate towards Europe. This precipitated a period called the Messinian Salinity Crisis, in which the Mediterranean Sea dried up. We can see the legacy of this event in the vast deposits of halite (i.e. salt) on the floor of the Mediterranean basin, and also in large, now filled-in canyons that were carved by the Nile and other rivers that flow into it.

The Messinian Salinity Crisis came to an end about 5.33 million years ago, when the Straits of Gibraltar were breached one last time, in an event called the Zanclean Flood. It is estimated that the Mediterranean took several months to two years to fill up again, with torrents of water flowing in from the Atlantic at a rate a thousand times greater than the flow of the Amazon river. It left its mark by cutting the Straits of Gibraltar into a deep and wide channel, whose maximum depth today is 900 metres.

It turns out that the events of Time are set about 10,000 years into the future — in April, 13291 AD to be precise. What caused the Mediterranean to be blocked up this time round is not stated, but it could perhaps have been an engineering mega-project sometime in the next thousand years or so. There have been ambitious proposals made in the past to dam up the Straits of Gibraltar, lower the level of the Mediterranean, and create massive amounts of hydroelectric power and open up new lands for settlement. The likelihood of such proposals ever being implemented, however, is low, to say the least.

What happened to Cueball and Megan?

Ah, spoilers! You’ll just have to visit the Geekwagon viewer to see the whole story.

Now can somebody please make this delightful epic into a Hollywood movie?

Error handling is the one thing that puts me off learning Go

It seems that all the cool kids are learning Go these days. It’s certainly appealing to be able to get the performance of C without all the headaches, and to be able to package your program up into a single, tight binary without masses and masses of bloated dependencies. Furthermore, since it’s the language of choice for a lot of important software, such as Kubernetes and Terraform, I’m probably going to have to get my head round it one way or another sooner or later.

But there’s one thing about Go that I really, really do not like one little bit: its approach to error handling. Rather than having exceptions, it reports errors as return codes.

Go’s justification for not having exceptions is as follows:

We believe that coupling exceptions to a control structure, as in the try-catch-finally idiom, results in convoluted code. It also tends to encourage programmers to label too many ordinary errors, such as failing to open a file, as exceptional.

Go takes a different approach. For plain error handling, Go’s multi-value returns make it easy to report an error without overloading the return value. A canonical error type, coupled with Go’s other features, makes error handling pleasant but quite different from that in other languages.

They really need to give some examples to back up this assertion, because when they say that exceptions result in convoluted code, I have no idea what on earth they are talking about. Sure, I’ve seen code that gets exception handling wrong, but that was more due to the code itself being bad rather than any problem with the concept of exceptions itself. I’ve also worked with codebases that get it right, and all I can say is that exceptions done right are much easier to work with and reason about than error codes.

Wrong reasons for objecting to exceptions

There are various reasons why people don’t like exceptions. Some people react against them because they were popularised by Java, and a whole lot of other Bad Things were popularised by Java as well. And yes, maybe Java did make some mistakes by implementing checked exceptions, but please, don’t throw out the baby with the bathwater.

Others complain about exceptions because people get them wrong, doing stupid things like this:

try:
    do_something()
except:
    pass

Please, people. The correct response to misuse is not disuse, but proper use. People will do stupid things with any programming language construct. It doesn’t mean that the constructs themselves are bad.

Others consider not having to write extra error handling code as laziness. But so what? Work is not about favouring busyness over laziness; work is about delivering value to your customers. If you can deliver the same value in half the time with half the bugs and half as many lines of code, you’re not being lazy; you’re being efficient.

Others complain about exceptions crashing their Python or C# code with an ugly looking stack trace. But this is easy to fix, simply by implementing a global exception handler at the top of your code, sending the stack trace to a logging service such as Elasticsearch, and just showing an appropriate message. In any case, which is worse: a stack trace, or a program that silently corrupts your data?

Others complain that with exceptions you don’t know which functions might throw errors and which might not. But the only safe assumption that you can make is that any line of code might throw an error, often for reasons that you were not expecting and did not anticipate.

What are you supposed to do with errors anyway?

The most important thing you need to realise is that both exceptions and error codes should have a very specific meaning — namely, that your method was unable to do what its specification says that it does. It could have failed for any number of reasons: bad user input, missing dependencies, external services having gone offline, timeouts, foreign key violations, null references, division by zero, out of memory, stack overflow, array bounds errors, or even literal bugs. But the important information that they convey is that you asked some other code to do X, and for whatever reason, it did not do X.

It is completely unhelpful to try to categorise them as “ordinary errors” or “exceptional errors.” This distinction is so vague and ambiguous as to be effectively meaningless, and in any case will depend more on context and your own use cases than on any intrinsic properties of the exceptions themselves. The main distinction that you need to make with errors is between those that you have anticipated and can meaningfully correct, and everything else.

For errors that you are able to handle, the correct action will usually be specified in your user stories, and as such, they will need to be handled on a case by case basis. But for errors that you have not yet anticipated, 95% of the time the correct action will be to assume that your own code is also unable to do what it is supposed to, stop what it is doing, and report a failure to the caller.

It is almost never appropriate for your code to carry on regardless after an error. If it does so, it will be running under assumptions that are incorrect. At best, it will result in further errors. At worst, it will silently corrupt your data.

Exceptions are about convention over configuration and safe defaults

Convention over configuration is a principle of language and framework design that says that if one specific course of action predominates, it should be made an implicit convention, and extra code should only be needed to override it.

With error codes, the default behaviour is to do precisely what you are not supposed to do — carry on regardless. Consequently, every single function call needs to be followed by a test for the return value. And the code to do so will be mindlessly, frustratingly repetitive:

    if err := datastore.Get(c, key, record); err != nil {
        return &appError{err, "Record not found", 404}
    }
    if err := viewTemplate.Execute(w, record); err != nil {
        return &appError{err, "Can't display record", 500}
    }

But what this is doing is exactly what exceptions do anyway! The whole point of exceptions is to take this repetitive boilerplate code and make it implicit. Other mechanisms, such as try/catch/finally blocks, exist to provide clear and specific ways to override this convention. The result is code that is clearer and easier to understand, with a significantly improved signal-to-noise ratio.

Yet the language designers of Go consider this boilerplate a virtue!

In Go, error handling is important. The language’s design and conventions encourage you to explicitly check for errors where they occur (as distinct from the convention in other languages of throwing exceptions and sometimes catching them). In some cases this makes Go code verbose, but fortunately there are some techniques you can use to minimize repetitive error handling.

Whatever happened to DRY? Whatever happened to convention over configuration?

Cleaning up

When your exception handling code demands more complex scenarios than just propagating the error up the call stack, 90% of the time it will simply be to clean up: close file handles, release locks, roll back transactions, and then propagate the error condition up the call stack. Furthermore, the cleanup code will usually be common to a whole block of other method calls, any of which could raise an error. This Python code is an example:

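import json

# repository is assumed to be defined elsewhere in the application.
f = open('data.json')
try:
    transaction = repository.start_transaction()
    try:
        data = json.load(f)
        repository.update(data, transaction)
        transaction.commit()
    except:
        # Undo any partial changes, then let the caller see the original error.
        transaction.rollback()
        raise
finally:
    # Runs on success and on failure alike.
    f.close()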

I shall leave it as an exercise for the reader to translate this block of code into Go. Go does give you the defer statement, which lets you queue up function calls to run when the surrounding function returns, but if you have to run separate code for success (transaction.commit()) and failure (transaction.rollback()), as is the case here, your code will be significantly more complex. Additionally, many exception-based languages give you syntactic sugar for the most common of these cleanup cases: in particular, the using statement in C# and the with statement in Python.

Stick to the conventions of your language

Of course, if you’re using Go, handling error codes is what you have to do. The Go designers decided to do without exceptions, and to introduce them at this stage would just cause confusion. You would end up with some functions returning error codes while others throw exceptions under exactly the same failure conditions.

Now Go does have a panic/recover construct that is similar to exceptions in some respects. But it is rarely — and inconsistently — used. We are told that it is supposed to be reserved for “truly exceptional conditions.” But what, exactly, makes one condition “truly exceptional” and another not? Why, for example, are array bounds errors exceptional, but Println errors, bad format strings, and broken connections are not? There is neither rhyme nor reason to the distinction.
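For readers who have not met the construct, here is a minimal sketch of panic/recover in action. safeIndex is a hypothetical helper written for this illustration: it turns the runtime panic raised by an out-of-range index into an ordinary error value.

```go
package main

import "fmt"

// safeIndex returns xs[i]. If the index is out of range, the runtime panics;
// the deferred function recovers and reports the panic as an error instead.
func safeIndex(xs []int, i int) (v int, err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("recovered: %v", r)
		}
	}()
	return xs[i], nil
}

func main() {
	xs := []int{1, 2, 3}

	v, err := safeIndex(xs, 1)
	fmt.Println(v, err)

	v, err = safeIndex(xs, 5)
	fmt.Println(v, err)
}
```

The deferred function with recover() behaves much like a catch block, which makes it all the harder to see why one class of failure travels this way while another comes back as a return value.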

The Go community seems to be avoiding problems with error handling for now, but that is mainly because Go programmers tend to be experienced, high-end developers who are used to the discipline of meticulously writing all the extra error-handling code. But with the rise in Go’s popularity, sooner or later it is going to experience an eternal September, with newbies piling in and forgetting the all-important error-handling boilerplate code left, right and centre, and when that happens, they will discover that return codes instead of exceptions are no panacea.

You have to tell AWS CLI that your EC2 instance is not in Virginia

Here’s a little gotcha with AWS that I keep running into time and time again. The aws command line interface, and AWS API libraries such as boto3, will use the us-east-1 (Virginia) region by default, even when running on EC2 instances in other regions.

This is not what you expect, and it is almost never what you want.

There is an issue on the awscli GitHub issue tracker to fix this, but it is still open four years after first being raised, with no indication when (or even whether) it will ever be addressed.

User @BradErz suggests including these lines in your user_data to set the default region:

region=$(curl http://169.254.169.254/latest/dynamic/instance-identity/document|grep region|awk -F\" '{print $4}')
echo "[default]" > /root/.aws/config
echo "region = ${region}" >> /root/.aws/config

Note however that this will only set the default region for the root user; you will need to configure aws-cli separately for any other logins on your instance.

Annoying as this behaviour is, I would be surprised to see it fixed any time soon, as it would be a breaking change.