MODSIM Day 1: Honeybee Colony Collapse

The MODSIM 2015 International Congress on Modelling and Simulation opened today with a plenary talk by Mary Myerscough on honeybee colony collapse disorder. The talk was based on work published in PLOS ONE and in PNAS.

Mathematical modelling strongly suggests that the problem is caused by the death of foraging bees. The colony reacts by drafting younger hive bees into the foraging role. This strategy works well as a response to short-term problems but, since younger bees are less effective foragers, it sets up a positive feedback loop which can cause colony collapse. What is worse, the signs of impending collapse are subtle, being reflected only in the number of adult bees.

This interesting talk also provided a wonderful answer to the perennial question “how is mathematics useful?” The mathematics was accessible to anyone who could understand differential equations, and the problem was accessible to anyone at all. And, because of their role as pollinators, bees are very, very important.

Pseudoscience: Essential oils

Four essential oils: tarragon, apricot seed, lemon, and mandarin

One of my favourite laboratory exercises from my undergraduate chemistry days was extracting an essential oil by steam distillation, and then analysing it using infrared spectroscopy and other methods. Knowing something about essential oils, I was rather surprised to read this on the Internet recently:

“Bruce Tanio, of Tainio Technology and head of the Department of Agriculture at Eastern Washington University, has developed a Calibrated Frequency Monitor (CFM) that has been used to measure the frequencies of essential oils and their effect on human frequencies when applied to the body. Therapeutic Grade Essential Oils begin at 52 and go as high as 320 MHz. For example: Rose 320 MHz, Helichrysum 181 MHz, Frankincense 147 MHz, Ravensara 134 MHz, Lavender 118 MHz, Myrrh 105 MHz, German Camomile 105 MHz, Juniper 98 MHz, Sandalwood 96 MHz, Angelica 85 MHz, Peppermint 78 MHz.”

“A healthy body, from head to foot, typically has a frequency ranging from 62 to 78 MHz, while disease begins at 58MHz. During some testing with frequency and the frequency of essential oils it was measured that: Holding a cup of coffee dropped one man’s frequency from 66 MHz to 58 MHz in just 3 seconds. It took three days for his frequency to return to normal. Another man drank the coffee and his frequency dropped from 66 MHz to 52 MHz. After inhaling the pure therapeutic grade essential oil, his frequency returned to 66 MHz in just 21 seconds.”

“In another case: A man’s frequency dropped from 65 MHz to 48 MHz when he simply held a cigarette. When he smoked the cigarette, his frequency dropped to 42 MHz, the same frequency as cancer. Other studies show that: Negative thoughts lower our frequency on average 12 MHz. Positive thoughts raises our frequency on average 10 MHz.”

The oldest versions of these claims on the Internet seem to date from around the year 2000, and they appear to have been systematically recopied and elaborated since then. They are generally associated with the false claim that essential oils can cure a range of diseases such as cancer and Ebola, which of course they cannot (in fact, used inappropriately, essential oils can be quite dangerous).

All the stuff about frequencies is of course complete nonsense – human bodies and essential oils do not in fact have characteristic frequencies, nor do they broadcast radio waves in the VHF (30–300 MHz) band, nor is there any association between frequency and disease (individual chemical bonds within molecules do have characteristic vibrational frequencies, in the infrared or visible-light range, but that is not what is being discussed here). Scientific words are being used here in a nonsensical way, in an attempt to lend credibility to the associated medical claims. The link to Eastern Washington University is being used in the same way. In fact, Eastern Washington University does not even have a Department of Agriculture (so the late Bruce Tainio could not have headed it), nor is the company he founded mentioned on the university’s web site at all. And yet, inexplicably, people seem to believe this stuff. Why?

Mathematics in action: returning from a random walk

Three 2-dimensional random walks. All three start at the black circle and finish, after 100 steps, at a coloured square. Later steps are in darker colours. Considerable backtracking occurs.

We have discussed one-dimensional random walks, but it is possible to have random walks in more than one dimension. In two dimensions (above), we can go left, right, forward, and back. A random walk in two dimensions can be played as a kind of game (as can one-dimensional random walks). In three dimensions (below) we can also move vertically. Three-dimensional random walks are related to the motion of molecules in a gas or liquid.

Three 3-dimensional random walks. All three start at the black circle (in the centre of the cube) and finish, after 100 steps, at a coloured square. Later steps are in darker colours.

One very interesting question is whether a random walk ever returns to its starting point. In one dimension, the probability of returning in exactly n ≥ 1 steps is 0 if n is odd, and C(n, n/2) / 2^n if n is even, where C(n, k) is the number of ways of choosing k items out of n, defined by C(n, k) = n! / (k! (n − k)!).

For large numbers n, Stirling’s approximation says that n! is approximately sqrt(2πn) (n/e)^n. If we let m = n/2, some tedious algebra gives the probability of returning in exactly n = 2m steps as 1/sqrt(πm) ≈ 0.56/sqrt(m). When I ran some experiments I actually got a factor of 0.55, which is pretty close. Given infinite time, the expected number of times we return to the starting point is then:

0.56 (1 + 1/sqrt(2) + 1/sqrt(3) + 1/sqrt(4) + …) = ∞

This means that an eventual return to the starting point is certain. It may take a while, however – in 100 random walks, summarised in the histogram below, I once had to wait for 11452 steps for a return to the starting point.
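These calculations are easy to verify numerically. Below is a minimal Python sketch (the function name `p_return` is my own) comparing the exact formula with the Stirling-based approximation:

```python
from math import comb, sqrt, pi

def p_return(n):
    """Probability that a 1-D random walk is back at its start after n steps."""
    if n % 2 == 1:
        return 0.0          # an odd number of +/-1 steps cannot sum to zero
    return comb(n, n // 2) / 2 ** n

# Compare the exact probability with the approximation 1/sqrt(pi*m), n = 2m
for m in (1, 10, 100, 1000):
    print(2 * m, p_return(2 * m), 1 / sqrt(pi * m))
```

The two columns agree better and better as m grows, as Stirling’s approximation promises.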

Random walks in two dimensions can be understood as two independent random walks in one dimension happening simultaneously (one along each diagonal direction of the grid). We return to the start in exactly n = 2m steps precisely when both one-dimensional walks return together. The probability is therefore the one above squared, i.e. 1/(πm) ≈ 0.318/m. Again, given infinite time, the expected number of times we return to the starting point is:

0.318 (1 + 1/2 + 1/3 + 1/4 + …) = ∞

This means that a return to the starting point is also theoretically certain, although it will typically take much, much longer than in the one-dimensional case. In a simple experiment, four random walks returned to the starting point in 6814, 2, 21876, and 38 steps respectively, but the fifth attempt took so long that I gave up. In three or more dimensions, the corresponding sum converges, and a return to the starting point might never occur – in three dimensions, the walk returns with a probability of only about 0.34.
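The experiment above is easy to repeat. Here is a minimal sketch (the function name and the step cap are my own choices), simulating a two-dimensional walk until it first returns – or until we give up:

```python
import random

def steps_to_return_2d(max_steps=100_000, seed=0):
    """Walk randomly on a 2-D grid; report the step of the first return to the origin."""
    rng = random.Random(seed)
    x = y = 0
    for step in range(1, max_steps + 1):
        dx, dy = rng.choice([(1, 0), (-1, 0), (0, 1), (0, -1)])
        x, y = x + dx, y + dy
        if x == 0 and y == 0:
            return step
    return None  # gave up, as with the fifth attempt described above

print([steps_to_return_2d(seed=s) for s in range(5)])
```

Running this for several seeds shows the same wild spread of return times – some walks come home in a couple of steps, others exhaust the step budget.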

Games: the Good, the Bad, the Ugly

I recently redrew a classic graph by Oliver Roeder, showing the ratings of various board and card games on Board Game Geek. These ratings run from 1 (“Defies description of a game. You won’t catch me dead playing this. Clearly broken.”) to 10 (“Outstanding. Always want to play, expect this will never change.”). I have used the same dataset (downloaded by Rasmus Greve in 2014, so slightly old now), but removed games rated by fewer than 100 people, leaving a total of 5121 games. The average rating for these games is 6.42 (or 6.92 for the average weighted by number of ratings).

I’ve labelled three kinds of outlier in the graph above, and listed the corresponding games below. The Frequently Rated Games on the right are rated often because they are played often, and so they are generally very good games (the graph shows a weak correlation, reflecting this popularity–quality link). These games include Carcassonne (a superb family game, because very young children can join in if they are given hints about the best move), Dominion (my favourite card game), and Pandemic (one of the best collaborative games). Overlapping with this category are the Highly Rated Games at the top, some of which are aimed at hard-core gamers, while others (like Puerto Rico) are more widely popular. It should be noted, however, that game expansions tend to get deceptively high ratings, since they are generally only played by fans of the original game.

At the bottom are a number of Poorly Rated Games, which (sadly!) includes many of the games I grew up with. These flawed games include those which are too simple (Tic-Tac-Toe, Battleship); which are too heavily based on chance (Snakes and Ladders, Risk); which eliminate players before the end of the game (Risk, Monopoly); which take an unpredictable amount of time (Risk, Monopoly); or which have other problems. This category includes seven of the eight games on this list of flawed games by Ben Guarino, and all six in this post by Luke McKinney (which recommends Power Grid, Settlers of Catan, Ricochet Robots, Alien Frontiers, Ticket to Ride, and King of Tokyo as substitutes for Monopoly, Risk, Battleship, Connect Four, The Game of Life, and Snakes and Ladders, respectively).

Frequently Rated Games

Highly Rated Games

Poorly Rated Games

Mathematics in action: Risky Random Walks

A game of Risk (photo: A.R.N. Rødner)

The board game Risk, though far from being my favourite game (and rated only 5.59/10 on Board Game Geek), nevertheless has some interesting strategic aspects and some interesting mathematical ones.

Combat units in Risk (photo: “Tambako The Jaguar”)

A key feature of the game is a combat between a group of N attacking units and a group of M defending units. The combat involves several steps, in each of which the attacker rolls 3 dice (or N if N < 3) and the defender rolls 2 dice (or 1 if M = 1). The highest value rolled by the attacker is compared against the highest rolled by the defender, and ditto for the second highest values, as shown in the picture below. For each comparison, if the attacker has a higher value, the defender loses a unit, while if the values are tied, or the defender has a higher value, the attacker loses a unit.

Comparing attacker (left) and defender (right) dice in Risk (photo: “Val42”)

Working through the 6^5 = 7776 possibilities, the attacker will be down 2 units 29.3% of the time, both sides will have equal losses 33.6% of the time, and the attacker will be up 2 units (relative to the defender) 37.2% of the time. On average, the attacker will be up very slightly (0.1582 of a unit). A fairly simple computation (square each of the deviations from the mean, −2.1582, −0.1582, and 1.8418; multiply by the corresponding probabilities 0.293, 0.336, and 0.372 and sum; then take the square root) shows that the standard deviation of the outcomes is 1.6223.
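The enumeration can be reproduced directly; the sketch below uses `itertools.product` to generate all 7776 equally likely dice combinations:

```python
from itertools import product

# Enumerate all 6**5 = 7776 equally likely combinations of the attacker's
# three dice and the defender's two dice in a single Risk combat step.
counts = {-2: 0, 0: 0, 2: 0}
for roll in product(range(1, 7), repeat=5):
    attack = sorted(roll[:3], reverse=True)
    defend = sorted(roll[3:], reverse=True)
    # Compare the two highest attacker dice with the defender's dice;
    # ties go to the defender.
    net = sum(1 if a > d else -1 for a, d in zip(attack, defend))
    counts[net] += 1

total = 6 ** 5
mean = sum(k * v for k, v in counts.items()) / total
var = sum((k - mean) ** 2 * v for k, v in counts.items()) / total
sd = var ** 0.5
print(counts, mean, sd)
```

This recovers the counts 2275, 2611, and 2890 out of 7776 for the three outcomes, along with the mean of 0.1582 and the standard deviation of 1.6223.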

When this basic combat step is repeated multiple times, the result is a random walk. For example, with 10 steps, the mean attacker advantage is 1.582 units, and (by the standard formula for random walks discussed in a previous post) the standard deviation is 1.6223 times the square root of the number of steps, i.e. 5.1302.

The histogram below shows the probability of the various outcomes after 10 steps, ranging from the attacker being 20 units down (0.0005% of the time) to the attacker being 20 units up (0.005% of the time). Superimposed on the plot are a bell curve with the appropriate mean and standard deviation, together with five actual ten-step random walks. While the outcome does indeed favour the attacker, there is considerable random variability here – which makes the game rather unpredictable.
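A quick Monte-Carlo sketch of the ten-step walk (the single-step probabilities are taken from the enumeration above; the sample size and seed are arbitrary choices of mine):

```python
import random

rng = random.Random(42)
outcomes = (-2, 0, 2)
weights = (2275, 2611, 2890)   # counts out of the 6**5 dice combinations

# Sum ten combat steps, many times over, and check the sample mean and
# standard deviation against the theoretical 1.582 and 5.1302.
samples = [sum(rng.choices(outcomes, weights)[0] for _ in range(10))
           for _ in range(100_000)]
mean = sum(samples) / len(samples)
sd = (sum((x - mean) ** 2 for x in samples) / len(samples)) ** 0.5
print(mean, sd)
```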

Mathematics in action: Flipping coins

Our second post about probability is about flipping coins and random walks. Once again I’ve used computer-generated random numbers, this time representing the coin flips as 1 for heads and −1 for tails. The expected mean is therefore 0 (I actually got 0.0034), and the expected variance is s − 0^2 = 1, where s = 1 is the mean of the squares of the numbers −1 and 1 (for the variance, I actually got 1.000088).

We then consider a random walk where we repeatedly flip a coin and walk a block west if it’s tails and a block east if it’s heads. In particular, we consider doing so 144 times. How far would we expect to get that way? Well, on average, nowhere – we are adding 144 coin flips, and the mean distance travelled will be 0. The coloured lines in the diagram above show ten example random walks (with time running vertically upwards). These finish up between 18 blocks west and 20 blocks east of the starting point, so the mean of 0 represents an average of outcomes where we wind up several blocks west or east.

Since we can add variances, the variance for the random walk will be the variance of a single step times the number of steps. Alternatively, the standard deviation will be the standard deviation of a single step times the square root of the number of steps. In this case, the expected standard deviation of the random walk is 1 × sqrt(144) = 12 (for the ten random walks in the diagram above, I actually got 14.55; for a larger sample, 12.086). The width of the bell curve in the diagram illustrates the theoretical standard deviation (the height of the bell curve is not meaningful).

The expected absolute value of the distance travelled depends on the mean value of half a bell curve: it is 12 × sqrt(2/π) = 9.5746 (I actually got 12.6; for a larger sample, 9.631). So, for our random walk, we can expect to wind up around 10 blocks from the starting point – sometimes more, sometimes less. Naturally, this is just a simple example – there’s a lot more interesting mathematics in the theory of random walks, especially when we consider travelling in more than one dimension.
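All three predictions – a mean of 0, a standard deviation of 12, and a mean absolute distance of about 9.57 – can be checked with a short simulation (the sample size and seed here are arbitrary choices of mine):

```python
import random

rng = random.Random(2015)

def walk(n=144):
    """Finishing position after n fair +1/-1 coin-flip steps."""
    return sum(rng.choice((-1, 1)) for _ in range(n))

finishes = [walk() for _ in range(10_000)]
mean = sum(finishes) / len(finishes)
sd = (sum((x - mean) ** 2 for x in finishes) / len(finishes)) ** 0.5
mean_abs = sum(abs(x) for x in finishes) / len(finishes)
print(mean, sd, mean_abs)
```

The three printed values should land close to 0, 12, and 9.57 respectively.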

GPS Overestimates Distance Traveled

This is interesting: GPS units will systematically overestimate distance travelled (by as much as 2%), according to researchers from Salzburg and Delft. See this IEEE Spectrum article for details, or this paper. Systematic bias in measurement error is the cause. This issue has relevance to long road races like the World Solar Challenge, if teams are using a GPS unit to calculate distance travelled – they may not be as close to the finish line as they think!

Mathematics in action: Probability


Today we begin a series of three posts on probability, starting with some experiments on rolling dice. To save my wrists, I’ve used computer-generated random numbers instead of physical dice.

We begin with individual dice rolls. Rolling 1200 times, we expect each number to come up about 200 times, and the histogram below shows that that’s pretty much what happens. We expect the average (mean) of the numbers rolled to be 3.5, and for my 1200 rolls it was actually very close to that (3.495833). We expect the standard deviation of the numbers rolled to be sqrt(s − 3.5^2) = sqrt(35/12) = 1.707825, where s is the mean of the squares of the numbers (1, 4, 9, 16, 25, 36). The actual sample standard deviation was 1.704133. It is generally more convenient to work with the variance, however, which is the square of the standard deviation (2.904069 for my sample, which is very close to 35/12 = 2.916667).
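The theoretical mean and standard deviation of a single die can be computed exactly in a couple of lines:

```python
from math import sqrt

faces = range(1, 7)
mean = sum(faces) / 6              # expected mean: 3.5
s = sum(f * f for f in faces) / 6  # mean of the squares: 91/6
sd = sqrt(s - mean ** 2)           # sqrt(35/12) = 1.707825...
print(mean, sd)
```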

Histogram of 1200 individual dice rolls. The dark blue dots and line show the expected results.

Rolling pairs of dice, we expect a sum of 2 to come up, on average, once in every 36 rolls; a sum of 3 twice in every 36 rolls (as 1+2 and as 2+1); a sum of 4 three times (as 1+3, 2+2, and 3+1); and so on. The histogram below shows what actually happens.

We expect the average (mean) of the numbers rolled to be 7, and it was actually very close to that (7.019167). The great thing about adding random numbers is that the variance of the sum is the sum of the individual variances, so we expect the variance to be 70/12 = 5.833333. The actual sample variance for my 1200 rolls was 5.831993.
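The claim that variances add can be confirmed by enumerating all 36 equally likely pairs:

```python
from itertools import product

# All 36 equally likely rolls of a pair of dice: the mean of the sum
# should be 7, and the variance 70/12 (twice the single-die 35/12).
sums = [a + b for a, b in product(range(1, 7), repeat=2)]
mean = sum(sums) / 36
var = sum((x - mean) ** 2 for x in sums) / 36
print(mean, var)
```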

Histogram of 1200 dice pair rolls. The dark blue dots and line show the expected results.

There is an amusing take on dice pair rolls in Asterix and the Soothsayer, where the soothsayer is trying not to predict the result of a dice pair roll. Given the choice of numbers from I to XII, he guesses VII, which is of course the most likely outcome, and it is indeed what comes up.

Rolling a larger number of dice, the expected outcome follows a bell curve. The expected mean for twenty dice is 20 × 3.5 = 70 (for my 800 rolls of twenty dice it was actually 70.12). The expected variance is 20 × 35 / 12 = 58.33333 (for my 800 rolls of twenty dice the sample variance was actually 60.42113). The standard deviation (the square root of the variance) determines the “width” of the bell curve. Similar bell curves occur whenever a result is composed of a large number of independent factors (of roughly equal weight) added together.
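A quick simulation of 800 rolls of twenty dice (the seed is an arbitrary choice of mine) reproduces these figures approximately:

```python
import random

rng = random.Random(7)

# 800 rolls of twenty dice: the totals should cluster around a mean of
# 70 with a variance near 58.33, following a bell curve.
totals = [sum(rng.randint(1, 6) for _ in range(20)) for _ in range(800)]
mean = sum(totals) / len(totals)
var = sum((x - mean) ** 2 for x in totals) / len(totals)
print(mean, var)
```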

Histogram of 800 rolls of twenty dice. The dark blue bell curve shows the expected results.