james mckay dot net
because there are few things that are less logical than business logic

# The time I forgot about the speed of light

The Fallacies of Distributed Computing are a set of eight assumptions, originally noted by L Peter Deutsch and others at Sun Microsystems, that are commonly made by programmers and architects who are new to distributed computing and network architecture. They are:

1. The network is reliable;
2. Latency is zero;
3. Bandwidth is infinite;
4. The network is secure;
5. Topology doesn’t change;
6. There is one administrator;
7. Transport cost is zero;
8. The network is homogeneous.

All eight of these assumptions are wrong. In fact, these eight fallacies are the reason why Martin Fowler came up with his First Law of Distributed Object Design: don’t distribute your objects.

The first time I fell foul of these fallacies was particularly embarrassing, because it was in a situation where I should have known better.

I was working for a small web agency at the time. A lot of our work was graphic design, webmastering and SEO for local businesses, but we had a few larger clients on our books for whom we had to do some actual coding and server administration. One of them was an airport taxi reservations company who wanted a new front end for their online portal.

Their database was running on SQL Server Express Edition (the free but hopelessly under-powered version), on the same server as the web front end, hosted in a data centre in Germany. Because this was running at pretty much full capacity, they asked us to move it to a larger, beefier server. Since this meant moving up to one of the paid editions, my boss did a bit of shopping around and came to the conclusion that we could save several thousand euros by moving the database to a hosting provider in the USA. Due to the complexity of the code and the fact that it was running on a snowflake server, however, the web front end had to stay put in Germany.

He asked me what I thought about the idea. I may have raised an eyebrow at it, but I didn’t say anything. It sounded like a bit of an odd idea, but I didn’t see any reason why it shouldn’t work.

But I should have. There was one massive, glaring reason — one that I, in possession of a physics degree, should have spotted straight away.

The speed of light.

The ultimate speed limit of the universe.

It is 299,792,458 metres per second now and it was 299,792,458 metres per second then. It has been 299,792,458 metres per second everywhere in the visible universe for the past 13.8 billion years, and expecting it to change to something bigger in time for our launch date would have been, let’s just say, a tad optimistic.

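Now the distance that light covers in a second may sound like a lot, but it’s only about 48 times the distance from Frankfurt to New York. Every request has to make the trip there and back before the next one can start, so even in a perfect vacuum that leaves room for about 24 a second, and real networks are slower still. It simply isn’t physically possible for a web page in Germany to make more than twenty or so consecutive requests a second to a database in America, and many web pages in data-driven applications, ours included, need a whole lot more requests than that.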
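
The arithmetic behind that limit is easy to check. The following is only a back-of-the-envelope sketch: the 6,200 km figure is an assumed round number for the Frankfurt to New York distance rather than a measured cable length, the two-thirds factor for optical fibre is likewise an approximation, and the function name is made up for illustration.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458
FRANKFURT_TO_NEW_YORK_M = 6_200_000  # assumption: roughly 6,200 km

def max_sequential_round_trips_per_second(distance_m, speed_m_per_s=SPEED_OF_LIGHT_M_PER_S):
    """Upper bound on back-to-back request/response cycles per second."""
    round_trip_seconds = 2 * distance_m / speed_m_per_s
    return 1 / round_trip_seconds

# How many times light could cover the one-way distance in a second.
one_way_ratio = SPEED_OF_LIGHT_M_PER_S / FRANKFURT_TO_NEW_YORK_M

in_vacuum = max_sequential_round_trips_per_second(FRANKFURT_TO_NEW_YORK_M)

# Assumption: signals in optical fibre travel at about two thirds of c.
in_fibre = max_sequential_round_trips_per_second(
    FRANKFURT_TO_NEW_YORK_M, SPEED_OF_LIGHT_M_PER_S * 2 / 3
)

print(f"One-way trips per second at light speed: {one_way_ratio:.1f}")
print(f"Round trips per second in a vacuum:      {in_vacuum:.1f}")
print(f"Round trips per second in fibre:         {in_fibre:.1f}")
```

These are upper bounds only. Routing, switching and the database's own processing time all eat further into the budget, which is why queries that run one after another suffer so much more than ones that can be batched into a single request.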

Needless to say, when we flipped the switch, the site crashed.

To say this was an embarrassment to me is an understatement. I should have spotted it immediately. I have a university degree in physics, and the physics that I needed to spot this was stuff that I learned in school. What was I thinking?

Featured image by WikiImages from Pixabay

# Light speed

As you will no doubt be aware, the speed of light in a vacuum is 299,792,458 metres per second exactly. It always has been, and it always will be. In fact, so confident are physicists that it has never changed that since 1983, it has been used as the definition of the metre in the SI system of units.

And now, here it is, in a sudoku.

Normal sudoku rules apply. Digits on thermometers increase from the bulb end. Cells separated by a knight’s move in chess cannot contain the same digit.

# Finding bugs in your code quickly using git bisect

git bisect is one of my favourite features of Git. It is a binary search tool that lets you quickly track down the revision that introduced a bug. Surprisingly, it doesn’t seem to be all that well known, so I thought it would be worth writing a refresher on what it is and how to use it.

## git bisect: an introduction

The idea is very simple. If you know that your latest revision has a bug that wasn’t there a few weeks ago, and you can find a “known good” revision from round about that time, you can conduct a binary search of the revisions in between to find out which one introduced it.

So let’s say that you have 500 revisions to start off with. You’d mark the latest one as bad, then test, say, the 100th revision, find that it works as expected, and mark that as your last known good revision. Git will then automatically update to the 300th revision (halfway in between) for you to test. Mark as good or bad as appropriate, lather, rinse and repeat until you’re done.

Each test halves the range of revisions left to be tested, quickly narrowing the gap. In total, you have to test just $\mathcal{O}(\log_2 n)$ revisions. This means that 1,000 revisions would only take one more test than 500, and one million would only take one more test than 500,000 and ten more tests than a thousand. Once you’ve found the offending change, you can very easily zoom right in on the problematic lines of code, rather than having to spend ages stepping through it all in the debugger.
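
To see where those numbers come from, here is a minimal sketch of the same binary search over a simple linear history. It is an illustration only, not how Git itself is implemented: `count_bisect_tests` and `worst_case_tests` are made-up names, and the `is_bad` check stands in for building and testing a revision by hand.

```python
def count_bisect_tests(num_revisions, first_bad):
    """Binary-search a linear history for the first bad revision.

    Revisions are numbered 0..num_revisions-1. Revision 0 is known good and
    the last one is known bad. Returns (first bad revision found, tests run).
    """
    def is_bad(revision):  # stand-in for "build it and try it out"
        return revision >= first_bad

    good, bad, tests = 0, num_revisions - 1, 0
    while bad - good > 1:
        middle = (good + bad) // 2
        tests += 1
        if is_bad(middle):
            bad = middle
        else:
            good = middle
    return bad, tests

def worst_case_tests(num_revisions):
    """Tests needed if the bug always hides in the larger half."""
    remaining = num_revisions - 1  # gap between known good and known bad
    tests = 0
    while remaining > 1:
        remaining -= remaining // 2  # the larger half survives
        tests += 1
    return tests

found, tests = count_bisect_tests(500, first_bad=321)
print(f"Found revision {found} after {tests} tests")

for n in (500, 1_000, 500_000, 1_000_000):
    print(f"{n:>9,} revisions: at most {worst_case_tests(n)} tests")
```

Doubling the size of the history adds only one test to the worst case, which is the whole appeal of the technique.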

## How to use it

Before you start your bisect session, save your work using git commit or git stash. Then to start off your bisect session, type:

    $ git bisect start

Next you need to tell Git the range to start off with. If your current HEAD revision is the bad one, you can just mark it as bad as follows:

    $ git bisect bad


Next check out a revision that you know to be good and tell Git that it is a good one:

    $ git checkout KNOWN_GOOD_REVISION
    $ git bisect good


Git will now move to a revision halfway in between the two, choosing the next revision for us to test. You will see something like this:

    Bisecting: 31 revisions left to test after this (roughly 5 steps)
    [89f7bc018b5fc34c01bea545e3641ee2c77241ac] Bump version


Recompile and re-test your code at this revision. Look for the specific bug that you are trying to track down (ignore any other bugs for the time being) and mark as either bad or good as required:

    $ git bisect bad     # if the bug is present in this revision
    $ git bisect good    # if it is not


After each of these steps, Git will choose another revision halfway in between, until you end up with the revision that introduced the bug:

    $ git bisect bad
    164f5061d3f54ab5cba9d5d14ac04c71d4690a71 is the first bad commit
    commit 164f5061d3f54ab5cba9d5d14ac04c71d4690a71
    Author: James McKay <code@jamesmckay.net>
    Date:   Sun Nov 11 14:18:44 2018 +0000

        Move some test fixtures about for consistency.

    :040000 040000 d8dc665d03d1e9b37c5ee2dcde8acc032e306de8 0077c62618b69a20e5dbf6a61b42701a3ba2c156 M	src

Once you’ve found the offending commit, reset to go back to where you started:

    $ git bisect reset


## Some useful tips

Use git bisect log to see a list of all the revisions you’ve checked so far:

    $ git bisect log
    git bisect start
    # bad: [e38970b3100deecfdbc0ec183c527b49a6e68157] Don't auto-register types by default. Resolves #27.
    git bisect bad e38970b3100deecfdbc0ec183c527b49a6e68157
    # good: [dcb6a346e9130e736f45f65761ee57fd337483d7] Bit of tidying up.
    git bisect good dcb6a346e9130e736f45f65761ee57fd337483d7
    # good: [89f7bc018b5fc34c01bea545e3641ee2c77241ac] Bump version
    git bisect good 89f7bc018b5fc34c01bea545e3641ee2c77241ac
    # bad: [c08ed22ef9ac9cc66c56562b01143333fd61beae] Builders for conventions by name and by scan.
    git bisect bad c08ed22ef9ac9cc66c56562b01143333fd61beae
    # bad: [3fbc17dc37c35f963c5cea22814408ceac61787f] Bump version: release 0.2.0.
    git bisect bad 3fbc17dc37c35f963c5cea22814408ceac61787f
    # good: [e60f5d82b16e7b6ae739fa21cb1fc6c224d11c1a] Add link to documentation
    git bisect good e60f5d82b16e7b6ae739fa21cb1fc6c224d11c1a
    # good: [052e765169b71e691c70b7f458593f5552c75d41] Add resolution for arrays.
    git bisect good 052e765169b71e691c70b7f458593f5552c75d41
    # bad: [164f5061d3f54ab5cba9d5d14ac04c71d4690a71] Move some test fixtures about for consistency.
    git bisect bad 164f5061d3f54ab5cba9d5d14ac04c71d4690a71
    # first bad commit: [164f5061d3f54ab5cba9d5d14ac04c71d4690a71] Move some test fixtures about for consistency.

Use git bisect visualize to show your bisect progress in a GUI tool:

    $ git bisect visualize


If you can’t tell whether a revision is bad or good (for example, because it won’t compile), use git bisect skip:

    $ git bisect skip
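
If the check can be scripted, Git can also make these good, bad and skip decisions for you with `git bisect run`, which reads the exit status of a command: 0 means good, 125 means skip, and any other status from 1 to 127 means bad. The script below is a hypothetical sketch of such a command. The `make` build step is a placeholder rather than anything from a real project, and the function names are invented for illustration.

```python
import subprocess
import sys

GOOD, BAD, SKIP = 0, 1, 125  # exit statuses that `git bisect run` understands

# Assumption: a placeholder build step; substitute your project's own.
BUILD_COMMAND = ["make"]

def verdict(build_ok, test_ok):
    """Translate build and test results into a bisect exit status."""
    if not build_ok:
        return SKIP  # can't tell either way, like `git bisect skip`
    return GOOD if test_ok else BAD

def succeeds(command):
    """Run a command and report whether it exited with status 0."""
    return subprocess.run(command).returncode == 0

def main(build_command, test_command):
    build_ok = succeeds(build_command)
    # Only run the test if the build worked.
    test_ok = build_ok and succeeds(test_command)
    return verdict(build_ok, test_ok)

if __name__ == "__main__" and len(sys.argv) > 1:
    # Everything after the script name is treated as the test command.
    sys.exit(main(BUILD_COMMAND, sys.argv[1:]))
```

Saved as, say, `bisect_check.py`, this could be started with `git bisect run python3 bisect_check.py make test` once the initial good and bad revisions have been marked. The test command should fail only for the specific bug you are hunting, for the same reason that you would ignore unrelated bugs when bisecting by hand.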


On a final note, you don’t need to worry if you haven’t been meticulous about using git rebase to keep your source history linear. git bisect is smart enough to handle branches.

All in all, git bisect is a really useful tool. It allows you to zoom in on bugs in your source code very quickly even in large repositories with extensive histories. Using it is a skill that I would heartily recommend for every developer and tester’s toolbox.

# On the “reproducibility crisis” in science

I’ve had two or three people tell me about the “reproducibility crisis” in science in the past few months. The most recent such comment was at the weekend, which coincidentally came right at the time when a 2016 Nature article on the subject was at the top of Hacker News. Here are some thoughts on the matter.

First of all, I’d like to make it clear that the reproducibility crisis doesn’t call the entire scientific method into question right across the board. There may be a lot of papers published in the scientific literature that can’t be replicated, but there are also vast swathes of others that can be and are, often by multiple independent methods. The fact that some studies can’t be reproduced says nothing whatsoever about the validity of the ones that can, and it’s the ones that can that go on to establish the scientific consensus and make their way into school and university textbooks.

In fact, it’s only to be expected that the scientific literature would contain a sizeable proportion — perhaps even a majority — of non-reproducible studies. Scientists are only human, and if they rarely if ever made any mistakes, then that would suggest there was some form of underhanded collusion going on. It’s all too easy for them to inadvertently end up making mistakes, taking shortcuts, or writing down lab notes that don’t accurately describe exactly what they did. But that is why science demands reproducibility in the first place — to filter out problems such as these.

It’s important to realise that the reproducibility crisis only really affects the very frontiers of science — cutting edge research where the practices and protocols are often still being developed. There will always be a certain amount of churn in areas such as these. It rarely if ever affects more well established results, and it’s not even remotely realistic to expect it to cast any doubt on the core fundamentals. We can be absolutely confident that subjects such as relativity, quantum mechanics, Maxwell’s Equations, thermodynamics, the Periodic Table, evolution, radiometric dating, Big Bang cosmology and so on are here to stay.

Furthermore, scientists are actively working on ways to improve things. There is a whole scientific discipline called “meta-science,” which is devoted to increasing quality while reducing waste in scientific research. That is why scientists have adopted techniques such as peer review, blind studies, statistical methods to detect fraud (using techniques such as Benford’s Law) and the like. One recent innovation has been pre-registration of clinical trials as a means to combat publication bias and selective reporting: in many cases, the studies are peer reviewed before the results are taken rather than after the fact.
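
As an aside for the programmers, Benford’s Law says that in many naturally occurring data sets the leading digit d turns up with probability log10(1 + 1/d), so about 30% of values start with a 1. Figures that stray a long way from this can be a hint that they were made up. Here is a minimal sketch of the comparison, using powers of two purely as illustrative data that is known to follow the law closely; the function names are invented, and a real fraud check would use a proper statistical test rather than eyeballing the output.

```python
import math
from collections import Counter

def benford_expected(digit):
    """Probability that the leading digit is `digit` under Benford's Law."""
    return math.log10(1 + 1 / digit)

def leading_digit_frequencies(values):
    """Observed share of each leading digit 1-9 among positive integers."""
    digits = [int(str(v)[0]) for v in values]
    counts = Counter(digits)
    return {d: counts[d] / len(digits) for d in range(1, 10)}

# Illustrative data only: the first thousand powers of two.
observed = leading_digit_frequencies([2 ** k for k in range(1, 1001)])

for d in range(1, 10):
    print(f"{d}: expected {benford_expected(d):.3f}, observed {observed[d]:.3f}")
```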

Interestingly, the disciplines that are most profoundly affected by the “reproducibility crisis” are the social and life sciences: sociology, psychology, medicine, and so on. These are subjects which first and foremost concern the vagaries of humans and other living beings, which deal with very imprecise data sets with wide spreads of results, and which predominantly rely on statistics and correlations that are much more open to interpretation and studies that are qualitative rather than quantitative in nature. It is less of a problem for the more exact sciences, such as physics, chemistry, mathematics, geology, astronomy, or computer science.

The thing about science is that its foundations of testability and rigorous fact-checking tend to bring it into direct conflict with dishonest people, hidden agendas, and vested commercial or political interests. Consequently there is no shortage of people who will do whatever they can to try and undermine public trust in the scientific community and even the scientific method itself. One of the ways that they do so is to take real or perceived imperfections and shortcomings in science, blow them out of all proportion, and make them appear far more significant and far more damaging to the legitimacy of scientific scrutiny than they really are. But that’s just dishonest. Science may not be perfect, and non-reproducible papers may be plentiful, but nobody gets a free pass to reject anything and everything about science that they don’t like.

Featured image: United States Air Force Academy

# How not to stop Brexit

For better or for worse, the Conservatives under Boris Johnson have won the General Election with a majority of either 78 or 80, depending on which way the result in St Ives turns out. This means that, for better or for worse, Brexit is definitely going ahead, and there will not be a second referendum.

I personally voted Remain in 2016. Leaving the EU didn’t make much sense to me from either an economic or a logistical perspective, and I was particularly unimpressed with the arguments I was seeing from the “Leave” side, many of which seemed anti-intellectual, tin-foil hat conspiratorial, or simply not true. And I’ve never been impressed with the incessant references to the referendum result as “The Will Of The People.” The 48.1% of us who voted Remain are people too.

But Brexiteers have one legitimate concern that I have to agree with. The EU has a problem with taking “no” for an answer.

I’ve seen this playing out time and time again for over a quarter of a century. We saw it, for example, with the Maastricht Treaty and with the Lisbon Treaty (which was just a rebranding of the EU Constitution). Whenever an EU member state has a referendum that gives a result that Brussels doesn’t like, they simply make them vote again until they come up with the “right” result.

This isn’t democracy: it’s democracy theatre. It’s a complete sham, and if truth be told it makes the idea of a so-called “People’s Vote” seem really, really creepy, because it would just be more of the same. It’s a toxic, anti-democratic practice that needs to be broken.

Nevertheless, the 2016 referendum could potentially have been undone if only Remainers had gone about it the right way. If the UK were to leave the EU with some kind of interim arrangement in place, and then have a “rejoin” referendum some months later, that would respect the mandate from 2016, avoid the mathematical problems with having three options on the ballot paper (deal/no deal/remain) rather than two, and generally have a much more credible claim towards being truly democratic. It would be clean, fair and above board.

Unfortunately, no political party proposed this option. Instead, far too many politicians did everything that they could to try to undermine and frustrate the referendum result before it could be carried out. In fighting tooth and nail for approaches that were not democratically credible, Remainers failed to come up with one that was. And in so doing, they made the whole process far, far, far more chaotic, stressful and acrimonious than it could otherwise have been.

Featured image credit: Tim Reckmann