Something is going on with the primes…

The chart below illustrates the Erdős–Kac theorem. This relates to the number of distinct prime factors of large numbers (integer sequence A001221 in the On-Line Encyclopedia):

Number          No. of prime factors    No. of distinct prime factors
 1              0                       0
 2 (prime)      1                       1
 3 (prime)      1                       1
 4 = 2×2        2                       1
 5 (prime)      1                       1
 6 = 2×3        2                       2
 7 (prime)      1                       1
 8 = 2×2×2      3                       1
 9 = 3×3        2                       1
10 = 2×5        2                       2
11 (prime)      1                       1
12 = 2×2×3      3                       2
13 (prime)      1                       1
14 = 2×7        2                       2
15 = 3×5        2                       2
16 = 2×2×2×2    4                       1

The Erdős–Kac theorem says that, for large numbers n, the number of distinct prime factors of numbers near n approaches a normal distribution with mean and variance log(log(n)), where the logarithms are to the base e. That seems to be saying that prime numbers are (in some sense) randomly distributed, which is very odd indeed.

In the chart, the observed mean of 3.32 is close to log(log(10⁹)) = 3.03, although the observed variance of 1.36 is smaller. The sample in the chart includes 17 numbers with 8 distinct prime factors, including 1,000,081,530 = 2×3×3×5×7×19×29×43×67 (9 prime factors, 8 of which are distinct).
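
For readers who want to check this for themselves, here is a minimal R sketch along the lines of the chart above (the sampling range and sample size are arbitrary choices of mine, and the trial-division factoriser is simply the easiest thing that works, not the code behind the chart):

```r
# Count the distinct prime factors of n by simple trial division.
omega <- function(n) {
  count <- 0
  d <- 2
  while (d * d <= n) {
    if (n %% d == 0) {
      count <- count + 1
      while (n %% d == 0) n <- n %/% d   # strip out repeated factors
    }
    d <- d + 1
  }
  if (n > 1) count <- count + 1          # whatever is left is itself prime
  count
}

# Sample numbers near 10^9 and compare with the Erdős–Kac prediction.
ns <- sample(1e9:(1e9 + 1e6), 1000)
omegas <- sapply(ns, omega)
c(observed_mean = mean(omegas), predicted = log(log(1e9)))
```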

The Erdős–Kac theorem led to a memorable episode: following the death of Paul Erdős in 1996, Carl Pomerance spoke about the theorem at a conference session in honour of Erdős in 1997. Quoting Albert Einstein (“God does not play dice with the universe”), Pomerance went on to say that he would like to think that Erdős and [Mark] Kac replied, “Maybe so, but something is going on with the primes.” The quote is now widely misattributed to Erdős himself.


Recreational mathematics


The wolf, the goat, and the cabbages

Dancing alongside the more serious practitioners of mainstream mathematics are the purveyors of mathematical puzzles and problems. These go back at least as far as Diophantus (c. 200–284), the Alexandrian “father of algebra.” Alcuin of York (c. 735–804) produced a collection of problems that included the wolf, the goat, and the cabbages (above); the three men who need to cross a river with their sisters; and problems similar to the bird puzzle published by Fibonacci a few centuries later. In more modern times, Martin Gardner (1914–2010) did more than anyone else to popularise this offshoot of mathematics. It is often called “recreational mathematics,” because people do it for fun (in part because they are not told that it is mathematics).

Particularly popular in recent times have been Sudoku (which is really a network colouring problem in disguise) and the Rubik’s Cube (which illustrates many concepts of group theory, although it was not invented with that in mind). Sudoku puzzles have been printed in more than 600 newspapers worldwide, and more than 20 million copies of Sudoku books have been sold. The Rubik’s Cube has been even more popular: more than 350 million have been sold.


A Soma cube, assembled

Recreational puzzles may be based on networks, as in Hashi (“Bridges”). They may be based on fitting two-dimensional or three-dimensional shapes together, as in pentominoes or the Soma cube. They may be based on transformations, as in the Rubik’s Cube. They may even be based on arithmetic, as in Fibonacci’s problem of the birds, or the various barrel problems, which go back at least as far as the Middle Ages.

In one barrel problem, two men acquire an 8-gallon barrel of wine, which they wish to divide exactly in half. They have an empty 5-gallon barrel and an empty 3-gallon barrel to assist with this. How can it be done? It is impossible to gauge accurately how much wine is inside a barrel, so all the men can do is pour wine from one barrel to another, stopping when one barrel is empty or the other is full [highlight to show solution → (8, 0, 0) → (3, 5, 0) → (3, 2, 3) → (6, 2, 0) → (6, 0, 2) → (1, 5, 2) → (1, 4, 3) → (4, 4, 0)]. There is a similar problem where the barrel sizes are 10, 7, and 3.


The barrels
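
Puzzles like this can also be solved mechanically. Here is a brute-force breadth-first search in R (a sketch of my own; the state is simply the number of gallons in each barrel):

```r
# Breadth-first search for the (8, 5, 3) barrel puzzle.
caps  <- c(8, 5, 3)                       # barrel capacities
start <- c(8, 0, 0)                       # all wine in the 8-gallon barrel
is_goal <- function(s) s[1] == 4 && s[2] == 4

# Pour from barrel i into barrel j, stopping when i is empty or j is full.
pour <- function(s, i, j) {
  amount <- min(s[i], caps[j] - s[j])
  s[i] <- s[i] - amount
  s[j] <- s[j] + amount
  s
}

key <- function(s) paste(s, collapse = ",")
frontier <- list(list(state = start, path = list(start)))
seen <- key(start)
while (length(frontier) > 0) {
  node <- frontier[[1]]
  frontier <- frontier[-1]
  if (is_goal(node$state)) {
    print(do.call(rbind, node$path))      # a shortest sequence of pours
    break
  }
  for (i in 1:3) for (j in (1:3)[-i]) {
    nxt <- pour(node$state, i, j)
    if (!(key(nxt) %in% seen)) {
      seen <- c(seen, key(nxt))
      frontier <- c(frontier, list(list(state = nxt,
                                        path = c(node$path, list(nxt)))))
    }
  }
}
```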

Apart from being fun, puzzles of this kind have an educational benefit, training people to think. For this reason, Alcuin called his collection of problems Propositiones ad Acuendos Juvenes (Problems to Sharpen the Young). Problems like these may also benefit the elderly – the Alzheimer’s Association in the United States suggests that they may slow the onset of dementia. This is plausible, in that thinking hard boosts blood flow to the brain, and research supports the idea (playing board games and playing musical instruments are even better).


Mathematics and Art: Why can’t we be friends?


The figures of Geometry and Arithmetic by the Coëtivy Master, late 15th century (detail from Philosophy Presenting the Seven Liberal Arts to Boethius)

For most of history, mathematics and the visual arts have been friends. Art was not distinguished from what we now call “craft,” and mathematics – geometry and arithmetic – provided both a source of inspiration and a set of tools. Polykleitos, for example, in the 5th century BC, outlined a set of “ideal” proportions for use in sculpture, based on the square root of two (1.414…). Some later artists used the golden ratio (1.618…) instead.

Symmetry has also been an important part of art, as in the Navajo rug below, as well as a topic of investigation for mathematicians.


Navajo woollen rug, early 20th century (Honolulu Museum of Art)

The Renaissance saw the beginning of the modern idolisation of artists, with Giorgio Vasari’s The Lives of the Most Excellent Painters, Sculptors, and Architects. However, the friendship between mathematics and art became even closer. The theory of perspective was developed during the 14th and 15th centuries, so that paintings of the time have one or more “vanishing points,” much like the photograph below.


Perspective in the Galerie des Batailles at Versailles (base image: 1890s Photochrom print, Library of Congress)

Along with the theory of perspective, there was an increasing interest in the mathematics of shape. In particular, the 13 solid shapes known as Archimedean polyhedra were rediscovered. Piero della Francesca rediscovered six, and other artists, such as Luca Pacioli, rediscovered others (the last few were rediscovered by Johannes Kepler in the early 17th century). Perspective, polyhedra, and proportion also come together in the work of Albrecht Dürer. Illustrations of the Archimedean polyhedra by Leonardo da Vinci appear in Luca Pacioli’s book De Divina Proportione.


Illustration of a Cuboctahedron by Leonardo da Vinci for Luca Pacioli’s De Divina Proportione (1509)

Some modern artists have continued friendly relations with mathematics. The Dutch artist M. C. Escher (reminiscent of Dürer in some ways) sought inspiration in the diagrams of scientific publications, for example.


Tiling by M. C. Escher on the wall of a museum in Leeuwarden (photo: Bouwe Brouwer)

Today it is possible to follow in Escher’s footsteps by studying a Bachelor of Fine Arts / Bachelor of Science double degree at some institutions. There is also a renewed interest in the beauty of mathematical objects, whether three-dimensional (such as polyhedra) or two-dimensional (such as the Mandelbrot set). The role of the artist then becomes that of bringing out the beauty of the object through rendering, colouring, choice of materials, sculptural techniques, and the like.


View of the Mandelbrot set at −0.7435669 + 0.1314023 i with width 0.0022878 (image: Wolfgang Beyer)
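
For anyone wanting to experiment with such rendering, here is a small R sketch that draws roughly the view in the caption above (the resolution, iteration limit, and colour palette are arbitrary choices of mine):

```r
# Render the Mandelbrot set near -0.7435669 + 0.1314023i (width 0.0022878).
centre <- complex(real = -0.7435669, imaginary = 0.1314023)
width  <- 0.0022878
n      <- 400                         # grid resolution
iters  <- 500                         # maximum iterations

xs <- seq(Re(centre) - width / 2, Re(centre) + width / 2, length.out = n)
ys <- seq(Im(centre) - width / 2, Im(centre) + width / 2, length.out = n)
c_grid <- outer(xs, ys, function(x, y) complex(real = x, imaginary = y))

z <- matrix(0 + 0i, n, n)
escape <- matrix(iters, n, n)         # iteration at which each point escapes
for (k in 1:iters) {
  z <- z^2 + c_grid                   # the Mandelbrot iteration z -> z^2 + c
  newly <- Mod(z) > 2 & escape == iters
  escape[newly] <- k
  z[Mod(z) > 2] <- 2                  # clamp escaped points to avoid overflow
}
image(xs, ys, escape, col = hcl.colors(64, "Spectral"), asp = 1)
```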

Artistic techniques such as these (must we call them “craft” or “graphic design”?) are also important in the field of data visualisation, and are recognised by the “Information is Beautiful” Awards. Speaking of which, this year’s awards are now open for submissions.


Sequences, R, and the Free Monoid

An important concept in computer science is the free monoid on a set A, which essentially consists of sequences ⟨a₁, …, aₙ⟩ of elements drawn from A. The key operations on the free monoid are:

  • ⟨a⟩, forming a singleton sequence from a single element a of A
  • x ⊕ y, concatenation of the sequences x and y, which satisfies the associative law: (x ⊕ y) ⊕ z = x ⊕ (y ⊕ z)
  • ⟨⟩, the empty sequence, which acts as an identity for concatenation: ⟨⟩ ⊕ x = x ⊕ ⟨⟩ = x

The free monoid satisfies the mathematical definition of a monoid, and is free in the sense that it satisfies no laws beyond those the definition requires. There are many possible implementations of the free monoid, but they are all mathematically equivalent, which justifies calling it the free monoid.

In the R language, there are four main implementations of the free monoid: vectors, lists, dataframes (considered as sequences of rows), and strings (although for strings it’s difficult to tell where elements start and stop). The key operations are:

Operation               Vectors      Lists      Dataframes           Strings
⟨⟩, empty               c()          list()     data.frame(n=c())    ""
⟨a⟩, singleton          implicit*    list(a)    data.frame(n=a)      as.character(a)
x ⊕ y, concatenation    c(x,y)       c(x,y)     rbind(x,y)           paste0(x,y)

* implicit, because single values are already 1-element vectors
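
As a quick sanity check, the monoid laws for vectors can be verified directly at the R console:

```r
# Concatenation is associative, and c() (i.e. NULL) is its identity.
x <- c(1, 2); y <- 3; z <- c(4, 5)
identical(c(c(x, y), z), c(x, c(y, z)))   # TRUE: associativity
identical(c(c(), x), x)                   # TRUE: left identity
identical(c(x, c()), x)                   # TRUE: right identity
```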

An arbitrary monoid on a set A is a set B equipped with:

  • a function f from A to B
  • a binary operation x ⊗ y, which again satisfies the associative law: (x ⊗ y) ⊗ z = x ⊗ (y ⊗ z)
  • an element e which acts as an identity for the binary operator: e ⊗ x = x ⊗ e = x

As an example, we might have A = {2, 3, 5, …} be the prime numbers, B = {1, 2, 3, 4, 5, …} be the positive whole numbers, f(n) = n be the obvious injection function, ⊗ be multiplication, and (of course) e = 1. Then B is a monoid on A.

A homomorphism from the free monoid to B is a function h which respects the monoid-on-A structure. That is:

  • h(⟨⟩) = e
  • h(⟨a⟩) = f(a)
  • h(x ⊕ y) = h(x) ⊗ h(y)

As a matter of fact, these restrictions uniquely define the homomorphism from the free monoid to B to be the function which maps the sequence ⟨a₁, …, aₙ⟩ to f(a₁) ⊗ ⋯ ⊗ f(aₙ).

In other words, simply specifying the monoid B with its function f from A to B and its binary operator ⊗ uniquely defines the homomorphism from the free monoid on A. Furthermore, this homomorphism logically splits into two parts:

  • Map: apply the function f to every element of the input sequence ⟨a₁, …, aₙ⟩
  • Reduce: combine the results of mapping using the binary operator, to give f(a₁) ⊗ ⋯ ⊗ f(aₙ)
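
For the prime-multiplication monoid above, the whole homomorphism takes just a couple of lines of base R (f and h are my names for the injection and the homomorphism):

```r
# h maps <2, 3, 5> to f(2) * f(3) * f(5) = 30; the empty sequence maps to e = 1.
f <- identity                                      # the injection from A into B
h <- function(s) Reduce(`*`, Map(f, s), init = 1)  # map, then reduce
h(c(2, 3, 5))   # 30
h(c())          # 1, the identity element e
```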

The combination of map and reduce is inherently parallel, since the binary operator ⊗ is associative. If our input sequence is spread out over a hundred computers, each can apply map and reduce to its own segment. The hundred results can then be sent to a central computer where the final 99 ⊗ operations are performed. Among other organisations, Google has made heavy use of this MapReduce paradigm, which goes back to Lisp and APL.
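
To sketch the idea on a single machine (with split standing in for the hundred computers, and addition as the associative operator):

```r
# Reduce each segment independently, then reduce the partial results.
xs <- 1:100
segments <- split(xs, rep(1:4, each = 25))       # four "computers"
partials <- sapply(segments, function(s) Reduce(`+`, s))
Reduce(`+`, partials)                            # 5050, the same as sum(xs)
```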

R also provides support for the basic map and reduce operations (albeit with some inconsistencies):

Map with f:
  • Vectors: sapply(v,f), purrr::map_dbl(v,f) and related functions, or simply f(v) for vectorised functions
  • Lists: lapply(x,f) or purrr::map(x,f)
  • Dataframes: vector operations on columns, possibly with dplyr::mutate, dplyr::transmute, purrr::pmap, or mapply
  • Strings: not possible, unless strsplit or tokenisation is used

Reduce with ⊗:
  • Vectors: Reduce(g,v), purrr::reduce(v,g), or specific functions like sum, prod, and min
  • Lists: purrr::reduce(x,g)
  • Dataframes: vector operations on columns, or specific functions like colSums, with purrr::reduce2(x,y,g) useful for two-column dataframes
  • Strings: not possible, unless strsplit or tokenisation is used

It can be seen that the conceptual reduce operation on dataframes, in particular, is poorly supported by the R language. Nevertheless, the map and reduce operations are both powerful mechanisms for manipulating data.

For non-associative binary operators, purrr::reduce(x,g) and similar functions remain extremely useful, but they become inherently sequential.

For more about purrr, see purrr.tidyverse.org.


ASC 16: Tires


Tire on Bochum’s 2017 car (photo: Anthony Dekker)

In response to my Challenger strategy post, someone asked “what about tires?”

Yes, the rolling resistance of tires is a factor with solar cars. As tires rotate, the rubber in them flexes, and some energy gets lost this way. For a top Challenger class car, with good tires, about 85% of the solar energy is lost through aerodynamic drag, and about 15% through tires. That’s why my simple car model considered only aerodynamic drag.

The force required to overcome rolling resistance is g = 9.8 m/s² times the mass of the car times a tire-specific coefficient of rolling resistance (ranging from about 0.002 for an expensive solar car tire to 0.01 for a typical car tire). For a top Challenger class car, car and driver together will weigh around 250 kg. For a four-person Cruiser, it’s more like 800 kg – about triple. Bearing in mind that Cruisers often have tires with a higher coefficient of rolling resistance, this means that rolling resistance becomes quite significant for Cruisers. That is why Cruisers will sometimes turf out passengers to cut their energy use – even when running on a flat road. When climbing mountains, overcoming gravity also comes into play, and that uses up even more energy.
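
As a rough sketch of the arithmetic (the Cruiser coefficient of 0.008 below is an illustrative guess on my part, not a measured value for any particular car):

```r
# Power lost to rolling resistance: F = Crr * m * g, and P = F * v.
rolling_power <- function(mass_kg, crr, speed_kmh) {
  g <- 9.8                        # gravitational acceleration, m/s^2
  force <- crr * mass_kg * g      # rolling-resistance force, newtons
  force * speed_kmh / 3.6         # power in watts (speed converted to m/s)
}
rolling_power(250, 0.002, 75)     # top Challenger: about 100 W
rolling_power(800, 0.008, 75)     # hypothetical Cruiser: about 1300 W
```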


In ASC race news, I have updated my information page and teams list with news about day 1 of scrutineering.


ASC 13: Challenger strategy

This will be a lengthy post on race strategy. I will concentrate on Challenger-class cars, for which the main decision is at what speed to drive (there are also some tactical and psychological issues associated with overtaking, but I won’t get into them here).

The kind of analysis I’m doing assumes that teams have a good predictive model of how their car will perform under various conditions (collecting the data required to build such a model is yet another reason to do lots of testing). I’m using a very simplistic model of a hypothetical car – I only took half a day to build it (in R, of course); a real solar car strategy guy would take months and produce something much more detailed. I’m assuming that aerodynamic drag is the main consumer of energy. Remember that the drag force is:

F = ½ Cd A ρ v²

where Cd is the drag coefficient, A is the frontal area of the car, ρ is the density of air (about 1.2 kg/m³), and v is the speed of the car. The key thing here is that faster speeds burn up more energy.
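
To make that concrete, here is drag power as a function of speed in R (the Cd and A values are illustrative guesses for a Challenger-class car, not the actual parameters of my model):

```r
# Aerodynamic drag power: F = 0.5 * Cd * A * rho * v^2, and P = F * v.
drag_power <- function(speed_kmh, Cd = 0.10, A = 1.0, rho = 1.2) {
  v <- speed_kmh / 3.6                  # convert km/h to m/s
  force <- 0.5 * Cd * A * rho * v^2     # drag force in newtons
  force * v                             # power in watts
}
drag_power(c(60, 75, 90))   # power grows with the cube of speed
```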

I’m further assuming a solar model loosely based on the region of the American Solar Challenge, a perfectly flat road (no, that’s not realistic), a race start time of 8:00 AM, and a finish time of 5:00 PM with no rest breaks (no, that’s not realistic either). My hypothetical car has 4 square metres of solar panel with 25% efficiency, and a 5 kWh battery pack which is 80% full at the start of the day. In my first simulation, my car runs at constant speed until 5:00 PM or the battery dies, whichever comes first (solar cars are not intended to run with totally empty batteries). This chart shows what happens:

Initially, the graph is a straight line. The faster you drive, the further you go. But this only lasts up to 73.7 km/h, which takes you a distance of 663 km. Faster than that, and your battery dies before the end of the day, so that you go less far in total. In reality, of course, the driver would slow down before the battery died completely, but I’m not modelling that.
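
A bare-bones version of such a constant-speed simulation might look like the sketch below. It reuses drag_power from the sketch above, and the sinusoidal solar-input curve is a crude stand-in for a real solar model, so its break-even speed lands near 100 km/h rather than at 73.7 km/h:

```r
# Drive at constant speed from 8:00 until 17:00 or until the battery dies.
simulate_day <- function(speed_kmh,
                         battery_Wh = 0.8 * 5000,     # 5 kWh pack, 80% full
                         panel_W = 4 * 1000 * 0.25) { # 4 m^2 at 25% efficiency
  distance <- 0
  for (t in seq(8, 17, by = 1 / 60)) {                    # minute time-steps
    solar_in <- panel_W * max(0, sin(pi * (t - 6) / 12))  # crude solar curve
    battery_Wh <- battery_Wh + (solar_in - drag_power(speed_kmh)) / 60
    if (battery_Wh <= 0) break                            # battery dead: stop
    distance <- distance + speed_kmh / 60
  }
  distance                                                # km travelled
}
sapply(c(70, 90, 110), simulate_day)   # too fast, and the battery dies early
```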

My second chart looks at varying the speed of the car. We start the day at one speed (horizontal axis), gradually speeding up or slowing down to reach a second speed at solar noon (vertical axis), and then slowly shifting back so as to end the day at the start speed:

In the chart, the distance travelled is shown by colour and number. It can be seen that, at least for my hypothetical car, the optimum is fairly forgiving: speeds between about 60 and 85 km/h are OK, as long as they average out to about 73 km/h, which means 653 km travelled. You will also notice that there is no incentive to break the speed limit – the rules have been carefully constructed to ensure that this is true.

In my third chart, I’m assuming a patch of cloud on the road ahead, between 200 km and 400 km from the start. We start the day at one speed (horizontal axis), switch to a second speed once inside the cloud (vertical axis), and then switch to a third speed when leaving the cloud (the third speed is chosen to be whatever works best with the remaining battery charge).

Two vertical stripes are noticeable on the left and right sides of the chart. At 20 km/h, the race day ends before you reach the cloud, so that the choice of “cloud speed” doesn’t actually matter. And at 100 km/h, your battery dies before you reach the cloud, so that the choice of “cloud speed” doesn’t matter either. The optimum is to run at a constant 70 km/h initially, and then to speed up to 75 km/h inside the cloud (so as to get back into sunshine that little bit sooner). If you run at an optimal 63 km/h after leaving the cloud, this lets you cover 619 km.

The final scenario changes the one above to have a moving cloud. This cloud is circular and once again 200 km in diameter, with a centre 300 km away from the start position. The edge of the cloud is initially 210 km from the road, but moving at a rapid 50 km/h towards and across it:

The optimum here is to outrun the cloud: to run at 85 km/h till the cloud is past, and then to drop back to whatever speed gives you a 73 km/h average. Not surprisingly, this gives essentially the same outcome as my first varying-speed example. A solution that is almost as good is to run at 70 km/h initially, and speed up to 80 km/h once inside the cloud. This gives you about an hour and 20 minutes inside the cloud. Slowing down to an optimal 70 km/h after leaving the cloud then lets you cover 642 km.

In real life, outrunning clouds and speeding up slightly through them can both be feasible options – if you have the right prediction software, and if you know exactly what the weather is doing (which is the hard part). This is why the strategy guys in the chase car are always so busy!


Nuon’s WSC 2011 chase car (image credit)


ASC 9: Aerodynamics

For both the Challenger class and the Cruiser class, solar car racing is to a large extent about aerodynamic drag. That’s overwhelmingly what the hard-earned solar energy is being wasted on, and therefore it’s what teams need to concentrate on minimising. The drag force on a car is given by the equation:

F = ½ Cd A ρ v²

Breaking that down, v is the speed of the car, ρ is the density of air (about 1.2 kg/m³), A is the frontal area of the car, and Cd is the drag coefficient, a number which indicates how aerodynamic (and therefore, how energy-efficient) the shape of the car is. For Challengers, minimising Cd allows the speed v to be increased, while for Cruisers (for which the average v is essentially given), minimising Cd allows non-solar energy use to be minimised. Of course, minimising frontal area is important too (and that is the motivation behind asymmetric Challenger cars).
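
One back-of-the-envelope consequence (a sketch, not team data): at a fixed power budget, drag power scales as Cd v³, so the achievable speed scales as the inverse cube root of Cd:

```r
# Halving Cd at constant power multiplies achievable speed by 2^(1/3).
speed_ratio <- function(cd_old, cd_new) (cd_old / cd_new)^(1 / 3)
speed_ratio(0.2, 0.1)   # about 1.26, i.e. roughly 26% faster for half the drag
```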

To give a feeling for the all-important Cd, here are some vehicles with values ranging from 0.19 to 0.57:


Drag coefficients for a selection of vehicles. Clockwise from top left: 0.57 – Hummer H2 (photo: Thomas Doerfer); 0.30 – Saab 92 (photo: “Liftarn”); 0.26 – BMW i8 (photo: “youkeys”); 0.19 – General Motors EV1 (photo: Rick Rowen)

Because Cd is so all-important, it is the one thing that solar car teams are really secretive about. Challengers generally have values under 0.1. With no need for practicality, they chase their way towards the impossible goal of Cd = 0, trying to come up with the perfect race car, which will slice through air like a hot knife through butter:


Nuon’s 2005 car, Nuna 3, with Cd = 0.07 (photo: Hans-Peter van Velthoven)

Cruisers, on the other hand, have to balance aerodynamics with practicality. Bochum’s early SolarWorld GT had Cd = 0.137:


Bochum’s 2011 car, SolarWorld GT, with Cd = 0.137 (photo: “SolarLabor”)

Eindhoven’s recent Stella Vie, with its sleek aerodynamic shape, does much better than that (but they won’t say how much better):


Eindhoven’s 2017 car, Stella Vie (photo: TU Eindhoven, Bart van Overbeeke)

I understand that Sunswift’s 2013–2015 car eVe had Cd = 0.16. Appalachian State (Sunergy) have stated that their newly-built ROSE has Cd = 0.17. PrISUm’s Penumbra has a higher value (Cd = 0.2), because of the blunt end which they chose for practicality reasons (although they did do a few clever things to reduce the impact of that blunt end). I’m not aware of the Cd values for other ASC cars.


Appalachian State’s beautiful ROSE, with Cd = 0.17 (image credit)