james mckay dot net
because there are few things that are less logical than business logic

Accurate and honest metric weights and measurements

My height is 1 metre and 78 centimetres. I refuse point-blank to quote that in feet and inches.

My weight, as of 12:30 on Saturday 19 August, is 79.8 kilograms. Once again, I don’t care what that is in stones and pounds, so working it out is left as an exercise for the reader.

Over the past three months I have made it my goal to take a five kilometre walk every day that I can. Once again, converting that into miles is left as an exercise for the reader.

Those who have engaged with me in (sometimes lively) debates about science and faith will be aware that one particular passage from the Bible I am always quoting, over and over again, is Deuteronomy 25:13-16, which says this:

13 Do not have two differing weights in your bag — one heavy, one light. 14 Do not have two differing measures in your house — one large, one small. 15 You must have accurate and honest weights and measures, so that you may live long in the land the Lord your God is giving you. 16 For the Lord your God detests anyone who does these things, anyone who deals dishonestly.

It should come as no surprise, therefore, to learn that I am an ardent proponent of metrication, frustrated at the lack of progress that the UK has made in this area since the initial push in the 1960s and 1970s, and totally opposed to any attempt to head in the opposite direction.

This one should be a no-brainer. With the exception of the UK, the USA, Myanmar and Liberia, almost every other nation on Earth uses metric units exclusively. It’s not hard to see why, either. Metric units of measurement make sense. Different units for the same quantity are related to each other by powers of ten, with a consistent set of prefixes denoting their relationships. So to go between metres and kilometres, grams and kilograms, bytes and kilobytes and so on, you just notice the prefix “kilo” at the start and multiply or divide by one thousand.
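
To see just how little work is involved, take the figures quoted above as a quick worked example:

1.78\ \mathrm{m} = 178\ \mathrm{cm}, \quad 79.8\ \mathrm{kg} = 79\,800\ \mathrm{g}, \quad 5\ \mathrm{km} = 5\,000\ \mathrm{m}

Every one of those conversions is nothing more than a shift of the decimal point.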

On top of that, metric units, or SI units, are the foundation for measurement in science, engineering, commerce, finance, law, education and just about every other context where measurement is used. They are based on well defined, easily measurable, high-precision quantities. They are consistent, unambiguous, precise, easy to understand, easy to work with, internationally recognised, and exactly the same everywhere you go. They are the lingua franca of accurate and honest weights and measurements worldwide.

Imperial measurements, by contrast, are a mess. There are sixteen ounces in a pound and fourteen pounds in a stone—or is it the other way around? Up until the nineteenth century, the number of pounds in a stone varied depending on where you were and on what was being measured. There are eight furlongs in a mile, ten chains in a furlong, 22 yards in a chain, three feet in a yard, twelve inches in a foot … how on earth are you supposed to remember all the details? British and American gallons are different. British and American tons are different. I have no idea how much a fluid ounce is supposed to be. Repeat after me: an acre is the area of a rectangle whose length is one furlong and whose width is one tenth of a furlong (one chain). None of it offers you a shred of sense or coherence whatsoever.

Someone asked me the other day on Facebook, in response to my quoting of Deuteronomy 25 yet again, whether I thought that having a mixture of imperial and metric measurements was unbiblical. I replied that it quite possibly could be. Having two different systems of measurement makes it a whole lot harder for consumers to compare like for like when trying to figure out how much something costs. Supermarkets being supermarkets, they will take every opportunity they legally can to pull off shenanigans like that. This was a concern back in the 1970s when the push for metrication was in full swing, and it would be a concern again if we were to try to turn the clock back to imperial measurements.

Unfortunately, there are certain politicians here in the UK who, in the wake of Brexit, want to do precisely that. One of the most prominent of these is, of course, the Right Honourable Member for the eighteenth century, Sir Jacob Rees-Mogg, who instructs his staff, in no uncertain terms, that imperial measurements they must use. The only reason why anyone would want to do things such as this is some sort of misguided rose-tinted nostalgia for the good old days of the 1940s and 1950s or earlier. Why don’t we just bring back post-war rationing, outside loos, black and white TV, and horses and carriages while we’re at it?

The time I forgot about the speed of light


The Fallacies of Distributed Computing are a set of eight assumptions, originally noted by L Peter Deutsch and others at Sun Microsystems, that are commonly made by programmers and architects who are new to distributed computing and network architecture. They are:

  1. The network is reliable;
  2. Latency is zero;
  3. Bandwidth is infinite;
  4. The network is secure;
  5. Topology doesn’t change;
  6. There is one administrator;
  7. Transport cost is zero;
  8. The network is homogeneous.

All eight of these assumptions are wrong. In fact, these eight fallacies are the reason why Martin Fowler came up with his First Law of Distributed Object Design: don’t distribute your objects.

The first time I fell foul of these fallacies was particularly embarrassing, because it was in a situation where I should have known better.

I was working for a small web agency at the time. A lot of our work was graphic design, webmastering and SEO for local businesses, but we had a few larger clients on our books for whom we had to do some actual coding and server administration. One of them was an airport taxi reservations company who wanted a new front end for their online portal.

Their database was running on SQL Server Express Edition (the free but hopelessly under-powered version), on the same server as the web front end, hosted in a data centre in Germany. Because this was running at pretty much full capacity, they asked us to move it to a larger, beefier server. Since this meant moving up to one of the paid editions, my boss did a bit of shopping around and came to the conclusion that we could save several thousand euros by moving the database to a hosting provider in the USA. Due to the complexity of the code and the fact that it was running on a snowflake server, however, the web front end had to stay put in Germany.

He asked me what I thought about the idea. I may have raised an eyebrow at it, but I didn’t say anything. It sounded like a bit of an odd idea, but I didn’t see any reason why it shouldn’t work.

But I should have. There was one massive, glaring reason — one that I, in possession of a physics degree, should have spotted straight away.

The speed of light.

The ultimate speed limit of the universe.

It is 299,792,458 metres per second now and it was 299,792,458 metres per second then. It has been 299,792,458 metres per second everywhere in the visible universe for the past 13.8 billion years, and expecting it to change to something bigger in time for our launch date would have been, let’s just say, a tad optimistic.

Now that may sound like a lot, but the distance light travels in one second is only about 48 times the distance from Frankfurt to New York, and every database query has to cross the Atlantic twice: once for the request and once for the response. It simply isn’t physically possible for a web page in Germany to make more than twenty or so consecutive requests a second to a database in America — and many web pages in data-driven applications, ours included, need a whole lot more requests than that.
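
To put rough numbers on it (a back-of-the-envelope sketch, assuming a straight-line distance of about 6,200 km and light travelling in a vacuum; real networks, with fibre, routers and serialisation delays, are slower still):

2 \times 6\,200\ \mathrm{km} \approx 12\,400\ \mathrm{km}, \quad 12\,400\ \mathrm{km} \div 299\,792\ \mathrm{km/s} \approx 41\ \mathrm{ms\ per\ round\ trip}

which caps a strictly sequential page at roughly 24 round trips per second, and that is before any real-world overhead is added.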

Needless to say, when we flipped the switch, the site crashed.

To say this was an embarrassment to me is an understatement. I should have spotted it immediately. I have a university degree in physics, and the physics that I needed to spot this was stuff that I learned in school. What was I thinking?

Featured image by WikiImages from Pixabay

Light speed

As you will no doubt be aware, the speed of light in a vacuum is 299,792,458 metres per second exactly. It always has been, and it always will be. In fact, so confident are physicists that it has never changed that since 1983, it has been used as the definition of the metre in the SI system of units.
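
In other words, since 1983 the metre has been defined in terms of the speed of light:

1\ \mathrm{m} = \text{the distance light travels in a vacuum in}\ \tfrac{1}{299\,792\,458}\ \mathrm{s}

so the numerical value of c is now fixed by definition rather than being something we measure.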

And now, here it is, in a sudoku.

Normal sudoku rules apply. Digits on thermometers increase from the bulb end. Cells separated by a knight’s move in chess can not contain the same digit.

Try it on f-puzzles.com.

Finding bugs in your code quickly using git bisect

git bisect is one of my favourite features of Git. It is a binary search tool that lets you quickly track down the revision that introduced a bug. Surprisingly, it doesn’t seem to be all that well known, so I thought it would be worth writing a refresher on what it is and how to use it.

git bisect: an introduction

The idea is very simple. If you know that your latest revision has a bug that wasn’t there a few weeks ago, and you can find a “known good” revision from round about that time, you can conduct a binary search of the revisions in between to find out which one introduced it.

So let’s say that you have 500 revisions to start off with. You’d mark the latest one as bad, then test, say, the 100th revision, find that it works as expected, and mark that as your last known good revision. Git will then automatically update to the 300th revision (halfway in between) for you to test. Mark as good or bad as appropriate, lather, rinse and repeat until you’re done.

Each test halves the range of revisions left to be tested, quickly narrowing the gap. In total, you have to test just \mathcal{O}(\log_2 n) revisions. This means that 1,000 revisions would only take one more test than 500, and one million would only take one more test than 500,000 and ten more tests than a thousand. Once you’ve found the offending change, you can very easily zoom right in on the problematic lines of code, rather than having to spend ages stepping through it all in the debugger.
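
To put concrete figures on that (rounding to whole tests):

\log_2 500 \approx 9, \quad \log_2 1\,000 \approx 10, \quad \log_2 1\,000\,000 \approx 20

so a history of 500 revisions needs about nine tests, a thousand about ten, and a million only about twenty.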

How to use it

Before you start, save your work using git commit or git stash. Then kick off your bisect session by typing:

$ git bisect start

Next you need to tell Git the range to start off with. If your current HEAD revision is the bad one, you can just mark it as bad as follows:

$ git bisect bad

Next check out a revision that you know to be good and tell Git that it is a good one:

$ git checkout KNOWN_GOOD_REVISION
$ git bisect good
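
If you already know both endpoints, you can instead pass them to git bisect start in one go, bad revision first, and Git will set up the same bisect session without the separate checkout:

$ git bisect start BAD_REVISION KNOWN_GOOD_REVISION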

Git will now move to a revision halfway in between the two, choosing the next revision for you to test. You will see something like this:

Bisecting: 31 revisions left to test after this (roughly 5 steps)
[89f7bc018b5fc34c01bea545e3641ee2c77241ac] Bump version

Recompile and re-test your code at this revision. Look for the specific bug that you are trying to track down (ignore any other bugs for the time being) and mark as either bad or good as required:

$ git bisect bad
$ git bisect good

After each of these steps, Git will choose another revision halfway in between, until you end up with the revision that introduced the bug:

$ git bisect bad
164f5061d3f54ab5cba9d5d14ac04c71d4690a71 is the first bad commit
commit 164f5061d3f54ab5cba9d5d14ac04c71d4690a71
Author: James McKay <code@jamesmckay.net>
Date:   Sun Nov 11 14:18:44 2018 +0000

    Move some test fixtures about for consistency.

:040000 040000 d8dc665d03d1e9b37c5ee2dcde8acc032e306de8 0077c62618b69a20e5dbf6a61b42701a3ba2c156 M	src

Once you’ve found the offending commit, reset to go back to where you started:

$ git bisect reset

Some useful tips

Use git bisect log to see a list of all the revisions you’ve checked so far:

$ git bisect log
git bisect start
# bad: [e38970b3100deecfdbc0ec183c527b49a6e68157] Don't auto-register types by default. Resolves #27.
git bisect bad e38970b3100deecfdbc0ec183c527b49a6e68157
# good: [dcb6a346e9130e736f45f65761ee57fd337483d7] Bit of tidying up.
git bisect good dcb6a346e9130e736f45f65761ee57fd337483d7
# good: [89f7bc018b5fc34c01bea545e3641ee2c77241ac] Bump version
git bisect good 89f7bc018b5fc34c01bea545e3641ee2c77241ac
# bad: [c08ed22ef9ac9cc66c56562b01143333fd61beae] Builders for conventions by name and by scan.
git bisect bad c08ed22ef9ac9cc66c56562b01143333fd61beae
# bad: [3fbc17dc37c35f963c5cea22814408ceac61787f] Bump version: release 0.2.0.
git bisect bad 3fbc17dc37c35f963c5cea22814408ceac61787f
# good: [e60f5d82b16e7b6ae739fa21cb1fc6c224d11c1a] Add link to documentation
git bisect good e60f5d82b16e7b6ae739fa21cb1fc6c224d11c1a
# good: [052e765169b71e691c70b7f458593f5552c75d41] Add resolution for arrays.
git bisect good 052e765169b71e691c70b7f458593f5552c75d41
# bad: [164f5061d3f54ab5cba9d5d14ac04c71d4690a71] Move some test fixtures about for consistency.
git bisect bad 164f5061d3f54ab5cba9d5d14ac04c71d4690a71
# first bad commit: [164f5061d3f54ab5cba9d5d14ac04c71d4690a71] Move some test fixtures about for consistency.

Use git bisect visualize to show your bisect progress in a GUI tool:

$ git bisect visualize

If you can’t tell whether a revision is bad or good (for example, because it won’t compile), use git bisect skip:

$ git bisect skip
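
If your test can be scripted, you don’t even need to do the marking by hand. As a sketch, assuming a hypothetical script called test-for-bug.sh that exits with status 0 when the bug is absent, a non-zero status when it is present, and 125 when the revision should be skipped, git bisect run will drive the whole search for you:

$ git bisect run ./test-for-bug.sh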

On a final note, you don’t need to worry if you haven’t been meticulous about using git rebase to keep your source history linear. git bisect is smart enough to handle branches.

All in all, git bisect is a really useful tool. It allows you to zoom in on bugs in your source code very quickly, even in large repositories with extensive histories. It is a skill that I would heartily recommend adding to every developer’s and tester’s toolbox.

On the “reproducibility crisis” in science

I’ve had two or three people tell me about the “reproducibility crisis” in science in the past few months. The most recent such comment was at the weekend, which coincidentally came right at the time when a 2016 Nature article on the subject was at the top of Hacker News. Here are some thoughts on the matter.

First of all, I’d like to make it clear that the reproducibility crisis doesn’t call the entire scientific method into question right across the board. There may be a lot of papers published in the scientific literature that can’t be replicated, but there are also vast swathes of others that can be and are — often by multiple independent methods. The fact that some studies can’t be reproduced says nothing whatsoever about the validity of the ones that can, and it’s the ones that can that go on to establish the scientific consensus and make their way into school and university textbooks.

In fact, it’s only to be expected that the scientific literature would contain a sizeable proportion — perhaps even a majority — of non-reproducible studies. Scientists are only human, and if they rarely if ever made any mistakes, then that would suggest there was some form of underhanded collusion going on. It’s all too easy for them to inadvertently end up making mistakes, taking shortcuts, or writing down lab notes that don’t accurately describe exactly what they did. But that is why science demands reproducibility in the first place — to filter out problems such as these.

It’s important to realise that the reproducibility crisis only really affects the very frontiers of science — cutting edge research where the practices and protocols are often still being developed. There will always be a certain amount of churn in areas such as these. It rarely if ever affects more well established results, and it’s not even remotely realistic to expect it to cast any doubt on the core fundamentals. We can be absolutely confident that subjects such as relativity, quantum mechanics, Maxwell’s Equations, thermodynamics, the Periodic Table, evolution, radiometric dating, Big Bang cosmology and so on are here to stay.

Furthermore, scientists are actively working on ways to improve things. There is a whole scientific discipline called “meta-science,” which is devoted to increasing quality while reducing waste in scientific research. That is why scientists have adopted techniques such as peer review, blind studies, statistical methods to detect fraud (using techniques such as Benford’s Law) and the like. One recent innovation has been pre-registration of clinical trials as a means to combat publication bias and selective reporting: in many cases, the studies are peer reviewed before the results are collected rather than after the fact.

Interestingly, the disciplines that are most profoundly affected by the “reproducibility crisis” are the social sciences — sociology, psychology, medicine, and so on. These are subjects which first and foremost concern the vagaries of humans and other living beings, which deal with very imprecise data sets with wide spreads of results, and which predominantly rely on statistics and correlations that are much more open to interpretation and studies that are qualitative rather than quantitative in nature. It is less of a problem for the more exact sciences, such as physics, chemistry, mathematics, geology, astronomy, or computer science.

The thing about science is that its foundations of testability and rigorous fact-checking tend to bring it into direct conflict with dishonest people, hidden agendas, and vested commercial or political interests. Consequently there is no shortage of people who will do whatever they can to try and undermine public trust in the scientific community and even the scientific method itself. One of the ways that they do so is to take real or perceived imperfections and shortcomings in science, blow them out of all proportion, and make them appear far more significant and far more damaging to the legitimacy of scientific scrutiny than they really are. But that’s just dishonest. Science may not be perfect, and non-reproducible papers may be plentiful, but nobody gets a free pass to reject anything and everything about science that they don’t like.

Featured image: United States Air Force Academy