An important concept in computer science is the **free monoid** on a set *A*, which essentially consists of **sequences** ⟨*a*_{1}…*a*_{n}⟩ of elements drawn from *A*. The key operations on the free monoid are:

- ⟨
*a*⟩, forming a singleton sequence from a single element of*A* *x*⊕*y*,**concatenation**of the sequences*x*and*y*, which satisfies the**associative law**: (*x*⊕*y*)⊕*z*=*x*⊕(*y*⊕*z*)- ⟨⟩, the empty sequence, which acts as an identity for concatenation: ⟨⟩⊕
*x*=*x*⊕⟨⟩ =*x*

The free monoid satisfies the mathematical definition of a **monoid**, and is **free** in the sense of satisfying nothing else. There are many possible implementations of the free monoid, but they are all mathematically equivalent, which justifies calling it **the** free monoid.

In the R language, there are four main implementations of the free monoid: vectors, lists, dataframes (considered as sequences of rows), and strings (although for strings itâ€™s difficult to tell where elements start and stop). The key operations are:

Vectors | Lists | Dataframes | Strings | |
---|---|---|---|---|

⟨⟩, empty | `c()` |
`list()` |
`data.frame(n=c())` |
`""` |

⟨a⟩, singleton |
implicit (single values are 1-element vectors) | `list(a)` |
`data.frame(n=a)` |
`as.character(a)` |

x⊕y, concatenation |
`c(x,y)` |
`c(x,y)` |
`rbind(x,y)` |
`paste0(x,y)` |

An **arbitrary** monoid on a set *A* is a set *B* equipped with:

- a function
*f*from*A*to*B* - a binary operation
*x*⊗*y*, which again satisfies the**associative law**: (*x*⊗*y*)⊗*z*=*x*⊗(*y*⊗*z*) - an element
*e*which acts as an identity for the binary operator:*e*⊗*x*=*x*⊗*e*=*x*

As an example, we might have *A* = {2, 3, 5, …} be the prime numbers, *B* = {1, 2, 3, 4, 5, …} be the positive whole numbers, *f*(*n*) = *n* be the obvious injection function, ⊗ be multiplication, and (of course) *e* = 1. Then *B* is a monoid on *A*.

A **homomorphism** from the free monoid to *B* is a function *h* which respects the monoid-on-*A* structure. That is:

*h*(⟨⟩) =*e**h*(⟨*a*⟩) =*f*(*a*)*h*(*x*⊕*y*) =*h*(*x*) ⊗*h*(*y*)

As a matter of fact, these restrictions uniquely define the homomorphism from the free monoid to *B* to be the function which maps the sequence ⟨*a*_{1}…*a*_{n}⟩ to *f*(*a*_{1}) ⊗ ⋯ ⊗ *f*(*a*_{n}).

In other words, simply specifying the monoid *B* with its function *f* from *A* to *B* and its binary operator ⊗ uniquely defines the homomorphism from the free monoid on *A*. Furthermore, this homomorphism logically splits into two parts:

**Map**: apply the function*f*to every element of the input sequence ⟨*a*_{1}…*a*_{n}⟩**Reduce**: combine the results of mapping using the binary operator, to give*f*(*a*_{1}) ⊗ ⋯ ⊗*f*(*a*_{n})

The combination of **map** and **reduce** is inherently parallel, since the binary operator ⊗ is associative. If our input sequence is spread out over a hundred computers, each can apply map and reduce to its own segment. The hundred results can then be sent to a central computer where the final 99 ⊗ operations are performed. Among other organisations, Google has made heavy use of this **MapReduce** paradigm, which goes back to Lisp and APL.

R also provides support for the basic **map** and **reduce** operations (albeit with some inconsistencies):

Vectors | Lists | Dataframes | Strings | |
---|---|---|---|---|

Map with f |
`sapply(v,f)` , `purrr::map_dbl(v,f)` and related operators, or simply `f(v)` for vectorized functions |
`lapply(x,f)` or `purrr::map(x,f)` |
Vector operations on columns, possibly with `dplyr::mutate` , `dplyr::transmute` , `purrr::pmap` , or `mapply` |
Not possible, unless `strsplit` or tokenisation is used |

Reduce with ⊗ | `Reduce(g,v)` , `purrr::reduce(v,g)` , or specific functions like `sum` , `prod` , and `min` |
`purrr::reduce(x,g)` |
Vector operations on columns, or specific functions like `colSums` , with `purrr::reduce2(x,y,g)` useful for two-column dataframes |
Not possible, unless `strsplit` or tokenisation is used |

It can be seen that it is particularly the conceptual **reduce** operator on dataframes that is poorly supported by the R language. Nevertheless, the **map** and **reduce** operations are both powerful mechanisms for manipulating data.

For **non-associative** binary operators, `purrr::reduce(x,g)`

and similar functions remain extremely useful, but they become inherently sequential.

For more about purrr, see purrr.tidyverse.org.