Sequences, R, and the Free Monoid

An important concept in computer science is the free monoid on a set A, which essentially consists of sequences ⟨a₁, …, aₙ⟩ of elements drawn from A. The key operations on the free monoid are:

  • ⟨a⟩, forming a singleton sequence from a single element a of A
  • x⊕y, concatenation of the sequences x and y, which satisfies the associative law: (x⊕y)⊕z = x⊕(y⊕z)
  • ⟨⟩, the empty sequence, which acts as an identity for concatenation: ⟨⟩⊕x = x⊕⟨⟩ = x

The free monoid satisfies the mathematical definition of a monoid, and is free in the sense of satisfying nothing else. There are many possible implementations of the free monoid, but they are all mathematically equivalent, which justifies calling it the free monoid.

In the R language, there are four main implementations of the free monoid: vectors, lists, dataframes (considered as sequences of rows), and strings (although for strings it’s difficult to tell where elements start and stop). The key operations are:

| Operation | Vectors | Lists | Dataframes | Strings |
| --- | --- | --- | --- | --- |
| ⟨⟩, empty | c() | list() | data.frame(n=c()) | "" |
| ⟨a⟩, singleton | implicit (single values are 1-element vectors) | list(a) | data.frame(n=a) | as.character(a) |
| x⊕y, concatenation | c(x,y) | c(x,y) | rbind(x,y) | paste0(x,y) |

An arbitrary monoid on a set A is a set B equipped with:

  • a function f from A to B
  • a binary operation x⊗y, which again satisfies the associative law: (x⊗y)⊗z = x⊗(y⊗z)
  • an element e which acts as an identity for the binary operator: e⊗x = x⊗e = x

As an example, we might have A = {2, 3, 5, …} be the prime numbers, B = {1, 2, 3, 4, 5, …} be the positive whole numbers, f(n) = n be the obvious injection function, ⊗ be multiplication, and (of course) e = 1. Then B is a monoid on A.

A homomorphism from the free monoid to B is a function h which respects the monoid-on-A structure. That is:

  • h(⟨⟩) = e
  • h(⟨a⟩) = f(a)
  • h(x⊕y) = h(x) ⊗ h(y)

As a matter of fact, these restrictions uniquely define the homomorphism from the free monoid to B to be the function which maps the sequence ⟨a₁, …, aₙ⟩ to f(a₁) ⊗ ⋯ ⊗ f(aₙ).

In other words, simply specifying the monoid B with its function f from A to B and its binary operator ⊗ uniquely defines the homomorphism from the free monoid on A. Furthermore, this homomorphism logically splits into two parts:

  • Map: apply the function f to every element of the input sequence ⟨a₁, …, aₙ⟩
  • Reduce: combine the results of mapping using the binary operator, to give f(a₁) ⊗ ⋯ ⊗ f(aₙ)

The combination of map and reduce is inherently parallel, since the binary operator ⊗ is associative. If our input sequence is spread out over a hundred computers, each can apply map and reduce to its own segment. The hundred results can then be sent to a central computer where the final 99 ⊗ operations are performed. Among other organisations, Google has made heavy use of this MapReduce paradigm, which goes back to Lisp and APL.
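The map-then-reduce construction can be sketched in a few lines. The sketch below uses Python rather than R purely for illustration, and the helper name hom is hypothetical. It takes the example from the text (primes injected into the positive whole numbers under multiplication) and checks that reducing separate chunks first gives the same answer.

```python
from functools import reduce
from operator import mul


def hom(f, op, e):
    """Build the homomorphism h determined by f, the operator op, and identity e."""
    def h(seq):
        # Map with f, then reduce with op, starting from the identity e.
        return reduce(op, map(f, seq), e)
    return h


# The example from the text: f(n) = n, the operator is multiplication, e = 1.
h = hom(lambda n: n, mul, 1)

x, y = [2, 3], [5, 7, 2]
print(h([]))         # the empty sequence maps to the identity
print(h(x + y))      # h applied to the concatenation
print(h(x) * h(y))   # the same value, since h respects concatenation

# Associativity lets separate workers reduce their own chunks first;
# the partial results are then combined with the same operator.
chunks = [[2, 3], [5], [7, 2]]
combined = reduce(mul, (h(c) for c in chunks), 1)
print(combined)
```

Python lists play the role of the free monoid here, with + as concatenation and [] as the empty sequence.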

R also provides support for the basic map and reduce operations (albeit with some inconsistencies):

| Operation | Vectors | Lists | Dataframes | Strings |
| --- | --- | --- | --- | --- |
| Map with f | sapply(v,f), purrr::map_dbl(v,f) and related operators, or simply f(v) for vectorized functions | lapply(x,f) or purrr::map(x,f) | Vector operations on columns, possibly with dplyr::mutate, dplyr::transmute, purrr::pmap, or mapply | Not possible, unless strsplit or tokenisation is used |
| Reduce with ⊗ | Reduce(g,v), purrr::reduce(v,g), or specific functions like sum, prod, and min | purrr::reduce(x,g) | Vector operations on columns, or specific functions like colSums, with purrr::reduce2(x,y,g) useful for two-column dataframes | Not possible, unless strsplit or tokenisation is used |

It can be seen that it is particularly the conceptual reduce operator on dataframes that is poorly supported by the R language. Nevertheless, the map and reduce operations are both powerful mechanisms for manipulating data.

For non-associative binary operators, purrr::reduce(x,g) and similar functions remain extremely useful, but they become inherently sequential.

For more about purrr, see purrr.tidyverse.org.



Mathematics in Action: Vehicle Identification Numbers

Motor vehicles have a 17-character Vehicle Identification Number or VIN on a metal plate like the one below, usually on the driver’s side dashboard, or on the driver’s side door jamb, or in front of the engine block:


A Vehicle Identification Number (VIN) plate (Photo: Michiel1972)

VINs offer an interesting example of check digit calculation. The central digit (or an X representing 10) is a check digit (calculated modulo 11) used to detect errors. Any letters in the rest of the VIN are decoded like this:

A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
1 2 3 4 5 6 7 8 - 1 2 3 4 5 - 7 - 9 2 3 4 5 6 7 8 9

(The letters I, O, and Q are not used in VINs.)

The check digit calculation involves decoding the VIN and multiplying the resulting numbers by the weights (the “Weights” row below), giving the products (the “Product” row):

VIN       L  J  C  P  C  B  L  C  X  1  1  0  0  0  2  3  7
Decoded   3  1  3  7  3  2  3  3 10  1  1  0  0  0  2  3  7
Weights   8  7  6  5  4  3  2 10  0  9  8  7  6  5  4  3  2
Product  24  7 18 35 12  6  6 30  0  9  8  0  0  0  8  9 14

These products are added up modulo 11 (meaning the sum is divided by 11 and the remainder taken). In this case, the sum is 186 ≡ 10 (mod 11), and 10 is written as X. This makes the VIN valid, because it matches the original central X. What about the VIN on your vehicle?
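The calculation is easy to automate. Here is a minimal sketch in Python, using the letter values and weights given above. The function names are hypothetical. The check position itself has weight 0, so whatever value it decodes to does not affect the sum.

```python
# Letter values from the decoding table; I, O and Q do not appear in VINs.
LETTER_VALUES = dict(zip(
    "ABCDEFGHJKLMNPRSTUVWXYZ",
    [1, 2, 3, 4, 5, 6, 7, 8, 1, 2, 3, 4, 5, 7, 9, 2, 3, 4, 5, 6, 7, 8, 9],
))
WEIGHTS = [8, 7, 6, 5, 4, 3, 2, 10, 0, 9, 8, 7, 6, 5, 4, 3, 2]


def decode(ch):
    """Digits stand for themselves; letters are looked up in the table."""
    return int(ch) if ch.isdigit() else LETTER_VALUES[ch]


def weighted_sum(vin):
    return sum(decode(ch) * w for ch, w in zip(vin, WEIGHTS))


def check_digit(vin):
    """Weighted sum modulo 11, with a remainder of 10 written as X."""
    remainder = weighted_sum(vin) % 11
    return "X" if remainder == 10 else str(remainder)


def is_valid(vin):
    # The check digit sits in the central (ninth) position.
    return len(vin) == 17 and check_digit(vin) == vin[8]


print(weighted_sum("LJCPCBLCX11000237"), is_valid("LJCPCBLCX11000237"))
```

For the VIN in the table this prints 186 and True. Changing any single character to one with a different decoded value (other than the check digit itself) changes the remainder, so the check fails.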


The Sand Reckoner

In his short work The Sand Reckoner, Archimedes (c. 287 BC – c. 212 BC) identifies a number larger than what he believed was the number of grains of sand which would fit into the Universe. He was hampered by the fact that the largest number-word he knew was myriad (10,000), so that he had to invent his own notation for large numbers (I will use modern scientific notation instead).

Archimedes began with poppyseeds, which he estimated were at least 0.5 mm in diameter (using modern terminology), and which would contain at most 10,000 grains of sand. This makes the volume of a sand-grain at least 6.5×10⁻¹⁵ cubic metres (in fact, even fine sand-grains have a volume at least 10 times that).

Archimedes estimated the diameter of the sphere containing the fixed stars (yellow in the diagram below) as about 2 light-years or 2×10¹⁶ metres (we now know that even the closest star is about 4 light-years away). This makes the volume of the sphere 4×10⁴⁸ cubic metres which means, as Archimedes shows, that fewer than 10⁶³ grains of sand will fit.

A more modern figure for the diameter of the observable universe is 93 billion light-years, which means that fewer than 10⁹⁵ grains of sand would fit. For atoms packed closely together (as in ordinary matter), fewer than 10¹¹⁰ atoms would fit. For neutrons packed closely together (as in a neutron star), fewer than 10¹²⁶ neutrons would fit. But these are still puny numbers compared to, say, 2^77,232,917 − 1, the largest known prime at the time of writing!
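The sand-grain arithmetic can be checked with a short calculation. This is a sketch in Python that treats poppyseeds, the sphere of fixed stars, and the observable universe as spheres. The length of a light-year in metres is an assumed constant that is not given in the text.

```python
from math import pi

LIGHT_YEAR_M = 9.4607e15  # metres per light-year (assumed constant)


def sphere_volume(diameter):
    """Volume in cubic metres of a sphere with the given diameter in metres."""
    return (4 / 3) * pi * (diameter / 2) ** 3


# A poppyseed 0.5 mm across, holding at most 10,000 grains of sand.
grain_volume = sphere_volume(0.5e-3) / 10_000

# Archimedes' sphere of the fixed stars, about 2e16 metres across.
archimedes_grains = sphere_volume(2e16) / grain_volume

# The observable universe, 93 billion light-years across.
modern_grains = sphere_volume(93e9 * LIGHT_YEAR_M) / grain_volume

print(f"{grain_volume:.1e} {archimedes_grains:.1e} {modern_grains:.1e}")
```

Both grain counts come out below the bounds of 10⁶³ and 10⁹⁵ quoted above. The figures for atoms and neutrons would need assumed particle sizes, so they are left out of this sketch.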


Snakes and Ladders


Snakes and Ladders board, dated 1966, from the Auckland Museum (credit)

Snakes and Ladders is an ancient board game originating in India. It is totally random, and hence not very interesting. If players start on square #1, then after one turn, they have equal probabilities of being on squares #2, #3, #4, #5, #6, and #7. This image shows the probability distribution:

After two turns, the probability distribution is as follows (the most likely total of two dice rolls is 7, taking a player to square #8 and up a ladder to #26):

After 8 turns, players would be scattered all over the board. There is a 1% chance that any given player has won:

After 19 turns, there is a 24.7% chance that any given player has won:

This probability grows to 50.4% after 35 turns. But no matter how long you play, it remains possible (though increasingly unlikely) that nobody has won yet. Yet another reason why children tend to rapidly tire of the game.
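The first two turns described above can be reproduced by pushing a probability distribution through one die roll at a time. The sketch below, in Python, is deliberately incomplete: it includes only the one ladder mentioned in the text (#8 to #26), and the board size and the rule for overshooting the last square are assumptions. It therefore does not reproduce the later winning percentages, which depend on the full board.

```python
from fractions import Fraction

# Only the ladder mentioned in the text; a real board has many more jumps.
JUMPS = {8: 26}
LAST_SQUARE = 100  # assumed board size


def take_turn(dist):
    """Advance a {square: probability} distribution by one roll of a die."""
    new_dist = {}
    for square, prob in dist.items():
        for roll in range(1, 7):
            target = square + roll
            if target > LAST_SQUARE:
                target = square  # assumed rule: overshooting means no move
            target = JUMPS.get(target, target)  # follow a snake or ladder
            new_dist[target] = new_dist.get(target, Fraction(0)) + prob / 6
    return new_dist


after_one = take_turn({1: Fraction(1)})
after_two = take_turn(after_one)
most_likely = max(after_two, key=after_two.get)
print(most_likely, after_two[most_likely])
```

With this reduced board, the most likely square after two turns is #26, with probability 1/6. Repeating take_turn on a full table of snakes and ladders would give the distribution after any number of turns.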

For an alternative view of the probability analysis, see this animation:


The Game of Mu Torere

The New Zealand game of mū tōrere is illustrated above with a beautiful handmade wooden board. The game seems to have been developed by the Māori people in response to the European game of draughts (checkers). Play is quite different from draughts, however. The game starts as shown above, with Black to move first. Legal moves involve moving a piece to an adjacent empty space:

  • along the periphery (kewai), or
  • from the centre (pūtahi) to the periphery, or
  • from the periphery to the centre, provided the moved piece is adjacent to an opponent’s piece.
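These three rules translate directly into a move generator. The following Python sketch makes some assumptions: the board is a 9-character string with the eight kewai in circular order followed by the pūtahi, and the start position has each player's four pieces on adjacent kewai. The names legal_moves and START are hypothetical.

```python
CENTRE = 8  # index of the putahi; indices 0-7 are the kewai, in circular order


def legal_moves(board, player):
    """Return (from, to) index pairs for player ('B' or 'W')."""
    opponent = "W" if player == "B" else "B"
    moves = []
    for pos, piece in enumerate(board):
        if piece != player:
            continue
        if pos == CENTRE:
            # From the centre to any empty point on the periphery.
            moves += [(pos, k) for k in range(8) if board[k] == "."]
        else:
            neighbours = [(pos - 1) % 8, (pos + 1) % 8]
            # Along the periphery to an adjacent empty point.
            moves += [(pos, n) for n in neighbours if board[n] == "."]
            # To the centre, only if the piece is next to an opponent's piece.
            if board[CENTRE] == "." and any(board[n] == opponent for n in neighbours):
                moves.append((pos, CENTRE))
    return moves


START = "BBBBWWWW."  # assumed start: four Black, four White, centre empty
print(legal_moves(START, "B"))
```

From the assumed start position, Black has exactly two legal moves: either of the two end pieces of the Black group can move to the centre. A player has lost when legal_moves returns an empty list.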

Play continues until a draw is called (by mutual consent) or a player loses by being unable to move. Neither player can force a win, in general, so a loss is always the result of a mistake. For each player there is one “big trap” and four “small traps.” This is the “big trap” (Black wins in 5 moves):

  
The board on the left is the “big trap” for White – Black can force a win by moving as shown, which leaves only one move for White.

  
Again, Black moves as shown, which leaves only one move for White.

  
Now, when Black moves as shown, White cannot move, which means that White loses.

Here is one of the four “small traps” for White. The obvious move by Black results in White losing (but avoiding this does not require looking quite so far ahead as with the “big trap”):

Here is the complete network of 86 game states for mū tōrere (40 board positions which can occur in both a “Black to move” and a “White to move” form, plus 6 other “lost” board positions). Light-coloured circles indicate White to move, and dark-coloured circles Black to move, with the start position in blue at the top right. Red and pink circles are a guaranteed win for Black, while green circles are a guaranteed win for White. Arrows indicate moves, with coloured arrows being forced moves. The diagram (produced in R) does not fully indicate the symmetry of the network. Many of the cycles are clearly visible, however:


Fibonacci and his birds (solution)

In the previous post, we described Fibonacci’s “problem of the birds” (“the problem of the man who buys thirty birds of three kinds for 30 denari”). In English:

“A man buys 30 birds of three kinds (partridges, doves, and sparrows) for 30 denari. He buys a partridge for 3 denari, a dove for 2 denari, and 2 sparrows for 1 denaro, that is, 1 sparrow for ½ denaro. How many birds of each kind does he buy?”

The man must buy at least one of each kind of bird, or he wouldn’t be buying “birds of three kinds.” Also, he must buy fewer than 10 partridges, because 10 partridges (at 3 denari each) would use up all his money. Similarly, he must buy fewer than 15 doves. We can thus make up a table of possible solutions:

Of those 126 possible solutions, only one works out correctly in terms of cost, and that’s the answer. But that’s an unbelievably tedious way of getting the answer, and you’d be rather foolish to try to solve the problem that way. The obvious approach is to use algebra. Write p for the number of partridges bought, d for the number of doves, and s for the number of sparrows. Because the man buys 30 birds, we have the equation:

p + d + s = 30

And because the costs add up to 30 denari, we have:

3 p + 2 d + ½ s = 30

Doubling that second equation gets rid of the annoying fraction:

6 p + 4 d + s = 60

If you’ve done any high school algebra, no doubt you want to subtract the first equation from this, which will eliminate the variable s:

5 p + 3 d = 30

But now what? That gives a relationship between the variables p and d, but there doesn’t seem to be enough information to get specific values for those variables.

Fibonacci solves the problem a different way. His solution is based on a key insight – the man buys 30 birds for 30 denari, so that the birds cost, on average, 1 denaro each. Fibonacci then makes up “packages” of birds averaging 1 denaro each. There are only two ways of doing this. Package A has 1 partridge and 4 sparrows (5 birds for 5 denari), and package B has 1 dove and 2 sparrows (3 birds for 3 denari). The solution will be a combination of those two packages.

Now the man can take 1, 2, 3, 4, or 5 copies of package A, leaving 25, 20, 15, 10, or 5 birds to be made up of package B. But the number of birds making up package B must be a multiple of 3, so that the only possible answer is 3 copies of package A and 5 copies of package B. This means that the man buys 3 partridges, 5 doves, and 3×4 + 5×2 = 22 sparrows. That’s 30 birds and 3×3 + 2×5 + ½×22 = 30 denari.

Now it turns out that, had we kept on going with the algebraic approach, we would have gotten the same answer. We had:

5 p + 3 d = 30

Given that the numbers of partridges and doves (p and d) had to be positive whole numbers, that meant that p had to be a multiple of 3 (since 5 p = 30 − 3 d is divisible by 3), and d a multiple of 5 (since 3 d = 30 − 5 p is divisible by 5). That could only be achieved with p = 3 and d = 5.
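The tedious approach is, of course, no trouble for a computer. As a check, this short Python sketch tries all 126 combinations of partridges and doves described earlier and keeps those where the cost comes to exactly 30 denari.

```python
from fractions import Fraction

solutions = []
for p in range(1, 10):        # fewer than 10 partridges
    for d in range(1, 15):    # fewer than 15 doves
        s = 30 - p - d        # the rest of the 30 birds are sparrows
        cost = 3 * p + 2 * d + Fraction(s, 2)
        if s >= 1 and cost == 30:
            solutions.append((p, d, s))

print(solutions)  # [(3, 5, 22)]
```

Exact fractions are used for the half-denaro sparrows so that the cost comparison is not affected by rounding.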

We can also return to the diagrammatic approach. The equation:

5 p + 3 d = 30

describes the diagonal red line in the diagram below. That line only crosses one of the possible solutions, namely the dot corresponding to 3 partridges and 5 doves.

In mathematics, there’s more than one way to skin a cat. Or, in this case, a bird.


Fibonacci and his birds

The mathematician Leonardo of Pisa (better known as Fibonacci) is famous for his rabbits, but I was recently reminded of his “problem of the birds” or “the problem of the man who buys thirty birds of three kinds for 30 denari.” This problem appears in his influential book, the Liber Abaci.

The “problem of the birds” is expressed in terms of Italian currency of the time – 12 denari (singular: denaro) made up a soldo, and 20 soldi made up a lira. In the original Latin, the problem reads:

“Quidam emit aves 30 pro denariis 30. In quibus fuerunt perdices, columbe, et passeres: perdices vero emit denariis 3, columba denariis 2, et passeres 2 pro denario 1, scilicet passer 1 pro denariis ½. Queritur quot aves emit de unoquoque genere.”

In English, that translates to:

“A man buys 30 birds of three kinds (partridges, doves, and sparrows) for 30 denari. He buys a partridge for 3 denari, a dove for 2 denari, and 2 sparrows for 1 denaro, that is, 1 sparrow for ½ denaro. How many birds of each kind does he buy?”

How many birds of each kind does the man buy? It may help to cut out and play with the bird tokens below. In a similar vein, what if the man buys birds as follows (still purchasing birds of all three kinds, and at the same price)?

  • 4 birds for 6 denari
  • between 6 and 10 birds for twice as many denari as birds
  • 8, 11, 13–14, 16–22, 24–25, or 27 birds for the same number of denari as birds
  • 8 birds for 12 denari
  • 12 birds for 18 denari
  • 16 birds for 12 denari
  • 28 birds for 21 denari
  • 6, 8–9, or 14 birds for 11 denari
  • 7–10, 12, 15, or 18 birds for 13 denari
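Answers to these variations can be checked by machine. Writing p, d, and s for the numbers of partridges, doves, and sparrows, n birds for m denari means p + d + s = n and 3 p + 2 d + ½ s = m. Doubling the second equation and subtracting the first gives 5 p + 3 d = 2 m − n. The Python sketch below (the function name solve is hypothetical) searches for positive whole-number solutions of that equation.

```python
def solve(birds, denari):
    """All (partridges, doves, sparrows) with at least one bird of each kind."""
    target = 2 * denari - birds  # 5p + 3d must equal this
    found = []
    for p in range(1, target // 5 + 1):
        rest = target - 5 * p    # what is left over must equal 3d
        if rest > 0 and rest % 3 == 0:
            d = rest // 3
            s = birds - p - d    # the remaining birds are sparrows
            if s >= 1:
                found.append((p, d, s))
    return found


print(solve(4, 6))
```

For example, solve(4, 6) finds the single answer of 1 partridge, 1 dove, and 2 sparrows. An empty list means there is no way to buy all three kinds of bird for that price.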

Solution to the main problem here.