Thinking about complexity

Posted on June 28, 2014 by Tony

I was recently involved in a discussion on complexity. Complexity seems like a natural idea – “abababababababababababababababababababababababababababababab” is a simple sequence of letters, while “Whether ’tis nobler in the mind to suffer the slings and arrows of outrageous fortune” is a complex one. Formalising this idea is a little tricky, however. As with some other concepts (time, for example), we recognise complexity when we see it, but defining it precisely is difficult. One of the leading approaches is Kolmogorov complexity.

Roughly, Kolmogorov complexity measures the complexity of a sequence by the size of the simplest (shortest) program that can generate that sequence, which is a formal way of finding the simplest description of the sequence. For example, that first sequence is simple because it can be described as “ab”×30.

The simplest programming language I know is combinatory logic (much simpler even than Turing machines). Combinatory logic has all the theoretical power of any other language, but programs are composed only of brackets and the constants S and K (which I will treat as equivalent to 0 and 1). The brackets satisfy (x y) z = x y z. Execution proceeds by term rewriting, and there are just two execution rules:

• S x y z → x z (y z)

• K x y → x

We can set up a version of Kolmogorov complexity based on combinatory logic. Through Gödel numbering, each program in combinatory logic is associated with a natural number. Execution of the programs gives finite or infinite sequences of S, K, and brackets. I will treat S as 0 and K as 1, ignoring the brackets. I will treat finite sequences as repeating.

Program #0 in the Gödel numbering is just the constant S, which terminates immediately and is equivalent to producing the sequence “0,0,0,…” Program #1 is just the constant K, equivalent to “1…” A slightly more complex program is #96, which is S S K K S, and executes as follows:

1. S S K K S → S K (K K) S

2. S K (K K) S → K S ((K K) S) = K S (K K S)

3. K S (K K S) → S

Execution stops with S in this case, which is equivalent to producing the sequence “0…”
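The three rewriting steps above can be reproduced with a small interpreter. The following is a minimal Python sketch, not the code used for this post: the helper names (app, step, show, normalise) and the leftmost-first reduction order are assumptions made for illustration, and the Gödel numbering itself is not reconstructed here.

```python
# Terms are either the atoms "S" / "K" or a pair (function, argument).

def app(*terms):
    """Left-associative application: app(a, b, c) means (a b) c."""
    result = terms[0]
    for t in terms[1:]:
        result = (result, t)
    return result

def step(term):
    """Perform one leftmost reduction step; return (new_term, changed)."""
    # Unwind the application spine: term = head a1 a2 ... an
    args = []
    head = term
    while isinstance(head, tuple):
        head, arg = head
        args.append(arg)
    args.reverse()
    if head == "K" and len(args) >= 2:      # K x y -> x
        return app(args[0], *args[2:]), True
    if head == "S" and len(args) >= 3:      # S x y z -> x z (y z)
        x, y, z = args[0], args[1], args[2]
        return app(x, z, (y, z), *args[3:]), True
    # The head cannot fire, so try to reduce inside the arguments.
    for i, a in enumerate(args):
        new_a, changed = step(a)
        if changed:
            return app(head, *args[:i], new_a, *args[i + 1:]), True
    return term, False

def show(term):
    """Print a term with the minimum number of brackets."""
    if isinstance(term, str):
        return term
    f, x = term
    right = show(x) if isinstance(x, str) else "(" + show(x) + ")"
    return show(f) + " " + right

def normalise(term, max_steps=100):
    """Reduce until no rule applies, or give up after max_steps."""
    trace = [show(term)]
    for _ in range(max_steps):
        term, changed = step(term)
        if not changed:
            break
        trace.append(show(term))
    return trace

for line in normalise(app("S", "S", "K", "K", "S")):
    print(line)
# S S K K S
# S K (K K) S
# K S (K K S)
# S
```

The max_steps cut-off is there because some programs loop forever, which is one reason the complexity of a sequence cannot always be calculated.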

I will take the simplest program producing a sequence to be the first program in the Gödel numbering, with the complexity of the sequence being the number of bits in the Gödel number. The two sequences “0…” and “1…” therefore both have complexities of one bit. Repeating pairs have complexities of 2 or 3 bits, while repeating triples have complexity 4 to 6 (see below). Because programs may produce unproductive infinite loops, actual calculation of the complexity of a sequence is not always possible.

Sequence	Program	Complexity
010101010101010101010101010101 = 01…	#3	2
101010101010101010101010101010 = 10…	#4	3
010010010010010010010010010010 = 010…	#8	4
001001001001001001001001001001 = 001…	#9	4
100100100100100100100100100100 = 100…	#13	4
011011011011011011011011011011 = 011…	#15	4
101101101101101101101101101101 = 101…	#25	5
110110110110110110110110110110 = 110…	#49	6
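The complexity column can be checked against the program numbers. This Python sketch assumes that “number of bits in the Gödel number” means the length of the number written in binary, with program #0 counted as one bit; that reading is an assumption, although it agrees with every row of the table above.

```python
def complexity_bits(program_number):
    """Bits needed to write the program number in binary (0 counts as one bit)."""
    return max(1, program_number.bit_length())

# Program numbers and complexities copied from the table above.
table = {3: 2, 4: 3, 8: 4, 9: 4, 13: 4, 15: 4, 25: 5, 49: 6}

for number, listed in table.items():
    print(number, complexity_bits(number), listed)
```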

Some more complex sequences (having non-trivial descriptions) are listed below. The highest complexity will be associated with random sequences – which implies that random-number generators are machines for creating complexity. In the images at the top of the page, the random pattern of mineral flecks in granite therefore forms the most complex pattern. That may or may not be what we intended the word “complexity” to mean.

Sequence	Program	Complexity
010000100001000010000100001000 = 01000…	#141	8
001010010100101001010010100101 = 00101…	#153	8
010110101101011010110101101011 = 01011…	#167	8
000110010001100100011001000110 = 00011001…	#183	8
010000100010000100010000100010 = 010000100…	#189	8
001000001000001000001000001000 = 001000…	#198	8
000101001000101001000101001000 = 000101001…	#201	8
001000010010000100100001001000 = 00100001…	#216	8
001000100100010010001001000100 = 0010001…	#233	8
010100101010010101001010100101 = 0101001…	#251	8
010101101010101101010101101010 = 010101101…	#275	9
001100011000110001100011000110 = 00110…	#305	9
000010000001000000100000010000 = 0000100…	#333	9
001110110011101100111011001110 = 00111011…	#359	9
010110010110010110010110010110 = 010110…	#369	9
001010001010001010001010001010 = 001010…	#392	9
010100011010001010100011010001 = 010100011010001…	#407	9
000001010000100000101000010000 = 0000010100001…	#425	9
001000010001000010001000010001 = 001000010…	#434	9
010001001000100100010010001001 = 0100010…	#465	9
001001101001001001101001001001 = 001001101001…	#471	9

Simulating the World Solar Challenge

Posted on June 23, 2014 by Tony

Inspired by the recent guest post by Georg Russ at solarracing.org, I have built a (very simplistic) simulation of the World Solar Challenge in NetLogo. The model and associated map file can be downloaded here.

The image below shows a snapshot of the simulation (in which I have completely ignored control stops). There are three graphs on the left. The first graph plots energy from the solar panels (following the discussion by Georg Russ). The second graph plots the speed of Car 1 (the blue car, running at 75 km/h) as well as the battery state (as a percentage of full charge).

The third graph (in the style of my race charts for WSC 2013) plots distance from Darwin on the horizontal axis with time (relative to an 80 km/h baseline speed) on the vertical axis. In other words, a vertical position of 1 hour means that a car is running 1 hour behind the baseline speed. A steady 80 km/h speed would thus be indicated by a horizontal line, with faster speeds sloping downwards and slower speeds sloping upwards. The graph shows that Car 3 (pink) has been running ahead of the baseline speed, but only by draining its battery. A close examination of the graph shows that Car 3 has already been forced to slow down. This highlights the need to strategically choose car speed, for the reasons discussed by Georg Russ.
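The vertical coordinate of this chart is simple arithmetic: elapsed time minus the time the baseline car would have needed to cover the same distance. A minimal Python sketch follows (the function name is hypothetical, and it is not taken from the NetLogo model):

```python
def hours_behind_baseline(distance_km, elapsed_hours, baseline_kmh=80.0):
    """Vertical chart position: positive means behind the baseline car."""
    return elapsed_hours - distance_km / baseline_kmh

# A car holding 75 km/h for 8 hours has covered 600 km.
print(hours_behind_baseline(600, 8))   # 0.5  (half an hour behind, line slopes upwards)
# A car holding 80 km/h stays on the horizontal line.
print(hours_behind_baseline(640, 8))   # 0.0
# A car holding 90 km/h for 8 hours is ahead, so the value is negative.
print(hours_behind_baseline(720, 8))   # -1.0
```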

The afternoon of Day 1: Car 3 (pink) is leading, but has started to slow because of a drained battery

For a video of the running simulation, see here.

One feature of the race is that the early-morning and late-evening sunshine can provide substantial charge, if the panels are tilted to face the sun. It will be interesting to see how the new WSC rules impact this practice.

Nuon Solar Team gathering sunshine during WSC 2013 (photo: Jorrit Lousberg)

The image below shows a second snapshot of the simulation on day 5, after Car 1 (blue) has won. Notice that the solar energy has varied with the (imaginary) weather conditions (top graph), causing Car 1 to slow down with a drained battery on the fourth day (middle graph). This highlights the importance of weather forecasting in race strategy – it would have been better to run at a sustainable steady speed. In 2013, Solar Team Twente was assisted in this regard by an attached military weather forecaster – who blogged his (Dutch) story here (robot-translated here).

The afternoon of Day 5: Car 1 (blue) has won

To underscore the weather issue even more, here is a map (from the Australian Bureau of Meteorology) of solar exposure on the rainy fifth day of WSC 2013:

Below is my race chart for WSC 2013, where the baseline speed was 97 km/h (which equates to an overall average speed of 85 km/h when forced waits at control stops are included). Cruiser Class entries (shown as dashed lines) were treated exactly as if they were in the Challenger Class, which means that for those cars the chart tells us road position, but is not very helpful on speed. Since the vertical axis represents hours behind the baseline speed, arrival times can be read off on the right-hand scale. Clearly visible on the graph are a miscalculation by Team Tokai concerning race strategy on the rainy fifth day, and some problems experienced by Michigan and Punch Powertrain.

Eight unsolved mathematical problems

Posted on June 20, 2014 by Tony

Eight things to think about…

1) Are there infinitely many twin prime pairs p and p+2? We have 3/5, 5/7, 11/13, 17/19, etc. Does that sequence go on forever?

2) Are there infinitely many Sophie Germain prime pairs p and 2p+1? We have 2/5, 3/7, 5/11, 11/23, etc. Does that sequence go on forever?

3) Is every even number greater than 4 the sum of two odd primes? We have 6 = 3 + 3, 8 = 3 + 5, 10 = 3 + 7, 12 = 5 + 7, etc. Does that always work?

4) Are there infinitely many perfect numbers? We have 6 = 1 + 2 + 3, 28 = 1 + 2 + 4 + 7 + 14, 496, 8128, 33550336, etc. Does that sequence go on forever? And are there any odd numbers in the sequence?

5) Is π a normal number? That is, do the digits 0 to 9 (and, more generally, all blocks of digits of any given length) occur equally often in 3.14159265358979323846264338327950288419716939937510582…, and does that also work for bases other than decimal?

6) Do the nontrivial zeros of the Riemann zeta function all have real part ½?

7) Does P = NP? This is perhaps the most important (and most famous) unsolved problem in computer science, and there is a million-dollar prize for solving it.

8) Do smooth solutions to the Navier–Stokes equations always exist? There is a million-dollar prize for this one too.
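The small cases quoted in problems 1 to 4 are easy to check by brute force, although that says nothing about whether the patterns continue forever. A Python sketch, with helper names chosen purely for illustration:

```python
def is_prime(n):
    """Trial division; fine for the small numbers used here."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True

def proper_divisor_sum(n):
    """Sum of the divisors of n that are smaller than n."""
    total = 1 if n > 1 else 0
    i = 2
    while i * i <= n:
        if n % i == 0:
            total += i
            if i != n // i:
                total += n // i
        i += 1
    return total

def goldbach_pair(n):
    """Smallest odd prime p with n - p also an odd prime, or None."""
    for p in range(3, n // 2 + 1, 2):
        if is_prime(p) and is_prime(n - p):
            return (p, n - p)
    return None

twin_primes = [(p, p + 2) for p in range(2, 20) if is_prime(p) and is_prime(p + 2)]
sophie_germain = [(p, 2 * p + 1) for p in range(2, 12) if is_prime(p) and is_prime(2 * p + 1)]
perfect = [n for n in range(2, 10000) if proper_divisor_sum(n) == n]

print(twin_primes)      # [(3, 5), (5, 7), (11, 13), (17, 19)]
print(sophie_germain)   # [(2, 5), (3, 7), (5, 11), (11, 23)]
print([goldbach_pair(n) for n in (6, 8, 10, 12)])  # [(3, 3), (3, 5), (3, 7), (5, 7)]
print(perfect)          # [6, 28, 496, 8128]
```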

Visual modalities and data visualisation

Posted on June 15, 2014 by Tony

In this post we will look at some visual modalities for data visualisation – size, colour, and shape. People use these all the time, but not always well. We will use the following matrix of numbers as an example, encoding the numbers in different ways (large values are highlighted in the image on the right):

Using shape to convey magnitude does not reveal the pattern here, since the triangles do not stand out from the diamonds and squares (the relationship between magnitude and shape is also rather arbitrary):

However, both size and brightness succeed in getting the message across (size works best):

On the other hand, varying only the hue or the saturation of colours does a poor job:

Combining hue and saturation variation with brightness variation, as in the various sequential ColorBrewer palettes, works very well – which is why this is recommended for colouring maps:

See also my three lenses on data visualisation post from last year and this paper by Cynthia Brewer on colour use guidelines.

A weak BICEP2?

Posted on June 12, 2014 by Tony

Earlier this year, the team behind the Background Imaging of Cosmic Extragalactic Polarization instrument, or BICEP2 (photo below by “Amble”), reported the discovery of primordial gravitational waves – supporting the theory of cosmic inflation. It now seems that their observations could instead be explained by cosmic dust polarisation. In a recent Nature column, Princeton physicist Paul Steinhardt outlines some ideas for analysing the results of future experiments of this kind. He also makes the rather disturbing suggestion that inflationary theory may be unfalsifiable. If that is true, is it really science at all?

Revisiting Artificial Anasazi: a NetLogo tutorial (part 2)

Posted on June 10, 2014 by Tony

This post continues my tutorial revisiting the famous “Artificial Anasazi” model of Axtell et al. (2002), ported to NetLogo by Marco Janssen in 2008 (see his paper). I have rewritten the model for explanatory purposes (though very much inspired by Janssen’s work). Part 1 of the tutorial looked at the patch dynamics, and we now turn to modelling the people.

The model studies the population of Anasazi farmers living in Long House Valley (near Black Mesa in Arizona) between 800 and 1350 AD. The image below shows the NetLogo view of this valley, as modelled for 1000 AD, with each person representing a household. For simplicity, I am not modelling the availability of drinking water or the locations of dwellings, so the locations of the people on the diagram below represent farm locations. For the same reason, the spatial distribution of people should not be taken too seriously – I am only hoping to accurately model population numbers.

Anasazi society was matriarchal (households were headed by women) and matrilocal (daughters lived near their mothers). Since archaeology (counting and dating house ruins) gives us household numbers (rather than people numbers), the model operates at a household level. However, food consumption calculations are based on five people per household. Young women are assumed to marry at age 16, so that the possible history of a household is as follows:

Household age	Matriarch age	First possible daughter	Last possible daughter
0 (begins)	16	–	–
1	17	0 (born)	–
17	33	16 (marries)	–
18	34	–	–
19	35	–	–
20	36	–	0 (born)
36	52	–	16 (marries)
37	53	–	–
38	54	–	–

The earliest age at which a household can reproduce (spin off a daughter household) is 17 (at which time the mother is aged 33). Matriarchs are assumed to remain fertile until age 36 (based on the discussion in Axtell et al.), so the last possible daughter is born at a household age of 20, and the latest possible daughter household is produced at a household age of 36. Matriarchs are assumed to die after age 54, so that the oldest possible household is 38 years (which is perhaps a little old for a society of this kind).
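The household ages in the table follow from three assumed matriarch ages. The arithmetic can be written out as a short Python sketch; the constant names are hypothetical, and the one-year delay before the first birth is an assumption read off the table rather than something stated in the text.

```python
MARRIAGE_AGE = 16          # a young woman marries, founding a household of age 0
LAST_BIRTH_AGE = 36        # matriarch's age when her last possible daughter is born
MATRIARCH_DEATH_AGE = 54   # matriarchs die after this age
FIRST_BIRTH_DELAY = 1      # assumption: first daughter is born in household year 1

def household_age(matriarch_age):
    """Age of a household whose matriarch has the given age."""
    return matriarch_age - MARRIAGE_AGE

# A daughter born in household year b founds her own household in year b + 16.
fertility_age = FIRST_BIRTH_DELAY + MARRIAGE_AGE                  # 1 + 16 = 17
fertility_end_age = household_age(LAST_BIRTH_AGE) + MARRIAGE_AGE  # 20 + 16 = 36
death_age = household_age(MATRIARCH_DEATH_AGE)                    # 54 - 16 = 38

print(fertility_age, fertility_end_age, death_age)  # 17 36 38
```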

The Anasazi were able to store corn (according to Axtell et al., for 2 years), which gave them the ability to ride out short droughts. Daughter households were gifted some of their mother’s corn (about one third), to get them started in life. This gives us the following constants for the model:

to setup
  ...
  set start-year           800        ;; initial year
  set food-requirement     (5 * 160)  ;; a person needs 160 kg of corn each year, while a typical household consists of 5 persons
  set fertility-age        17         ;; the minimum age of agents (households) that can reproduce (spin off daughter households)
  set fertility-end-age    36         ;; the maximum age of agents (households) that can reproduce
  set death-age            38         ;; the age after which agents (households) die (dissolve when the matriarch dies)
  set corn-gift-ratio      0.33       ;; each new household gets this fraction of the corn storage of the parent household
  set corn-stock-years     2          ;; corn can be stored for this many years
  ...
end

Households must make certain decisions. Initially, and in times of famine, they must choose a (new) farm. We introduce a reporter (function) to return an estimate of the productivity of a farm (patch). This reporter will be used twice, so it is good practice to give the calculation its own name. The result of the reporter is in fact just the base yield of a patch, but we could expand it to incorporate historical information and/or decision errors. Giving the calculation its own name makes it easier to include such extensions down the track.

to-report estimated-farm-yield
  report base-yield
end

We use this estimate to generate an agent set of potential farms – those that produce enough food for a household and are not being farmed by anybody else:

to-report potential-farms
  report (patches with [ estimated-farm-yield >= food-requirement and not being-farmed? ])
end

The following procedure picks out the best farm from an agent set and moves the household there. If there are no suitable farms, the household is assumed to leave the valley, which is equivalent (from a model point of view) to dying. But what is the “best farm”? The matrilocal structure means that agents try not to move very far (i.e. they select a low distance from their existing location), but we also introduce a slight bias towards a high estimated farm yield (minimising 1000 / estimated-farm-yield). The NetLogo built-in min-one-of operator does the hard work of finding the patch with the lowest combined score. Incidentally, for distances to be calculated correctly, the NetLogo world must be set to not “wrap” either horizontally or vertically.

to find-best-farm [ farms ]
  ifelse (count farms = 0)
    [ die ]
    [ let existing-farm patch-here
      let best-farm (min-one-of farms [ distance existing-farm + 1000 / estimated-farm-yield ])
      ask best-farm [ set being-farmed? true ]
      move-to best-farm ]
end
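The scoring rule may be easier to see outside NetLogo. Below is a minimal Python sketch of the same idea, using plain tuples instead of patches; the function name and the (x, y, yield) layout are assumptions for illustration only.

```python
import math

def best_farm(existing_xy, farms):
    """Pick the farm minimising distance + 1000 / yield, or None if there are none.

    farms is a list of (x, y, estimated_yield) tuples (a hypothetical layout).
    """
    if not farms:
        return None  # in the model, the household leaves the valley

    def score(farm):
        x, y, estimated_yield = farm
        distance = math.hypot(x - existing_xy[0], y - existing_xy[1])
        return distance + 1000 / estimated_yield

    return min(farms, key=score)

farms = [
    (0, 2, 800),    # score 2 + 1.25  = 3.25
    (2, 0, 1600),   # score 2 + 0.625 = 2.625 (same distance, better yield)
    (3, 4, 2000),   # score 5 + 0.5   = 5.5
]
print(best_farm((0, 0), farms))  # (2, 0, 1600)
```

With yields of 800 kg or more, the yield term is at most 1.25, so it mostly acts as a tie-breaker between farms at similar distances.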

According to the archaeological record, the population of the valley began with 14 households. We can initialise them with the code below (which assumes that households are what NetLogo calls a “breed”). We use a utility procedure initialise-household to set various attributes. This procedure takes a starting age as a parameter. Starting ages run from 0 to 28, but we will use the same procedure later for new-born households, which will always start with an age of zero. We also give each household an initial list of corn storage stocks, divided into ages of 2, 1, and 0 years. The oldest corn (at the front of the list) will be eaten first, and corn older than 2 years will be thrown away. The NetLogo built-in n-values operator generates a list of the required length, calling random-float three times.

to setup
  ...
  set-default-shape households "person"
	
  calculate-patch-yields  ;; needed to set up decision-making
  create-households 14 [
    initialise-household (random 29)
    set corn-storage (n-values (corn-stock-years + 1) [ 600 + random-float 400 ])
    find-best-farm potential-farms
  ]
  ...
end

to initialise-household [ start-age ]
  set age start-age
  set harvest 0
  set size 4
  set color black
end

Closely related code allows a household to reproduce. For efficiency reasons, we pass the set of potential farms as a parameter. The NetLogo built-in hatch operator allows an agent to create other agents. We also use the NetLogo built-in map operator twice – once to generate a list of gift corn, with each entry being about one third of the corresponding entry in the parent household’s list; and once to subtract the entries in the gift-corn list from the corresponding entries in the parent household’s list (as a general rule, if a list is being transformed to another list of the same length, the map operator usually provides the most elegant NetLogo solution).

to do-reproduce [ farms ]
  let gift-corn (map [ ? * corn-gift-ratio ] corn-storage)
  set corn-storage (map [ ?1 - ?2 ] corn-storage gift-corn)
  
  hatch 1 [ 
    initialise-household 0
    set corn-storage gift-corn
    find-best-farm farms
  ]
end
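The two map calls amount to an element-by-element split of the parent’s stocks. A Python sketch of the same split, using list comprehensions in place of map (the function name and the example stocks are made up):

```python
def split_corn(parent_storage, gift_ratio=0.33):
    """Return (remaining parent stocks, gift stocks), entry by entry."""
    gift = [amount * gift_ratio for amount in parent_storage]
    remaining = [amount - g for amount, g in zip(parent_storage, gift)]
    return remaining, gift

# Oldest corn first, as in the model's corn-storage list.
remaining, gift = split_corn([900, 600, 300])
print([round(x, 2) for x in gift])        # [297.0, 198.0, 99.0]
print([round(x, 2) for x in remaining])   # [603.0, 402.0, 201.0]
```

Because the split is done per entry, the daughter household inherits corn of each age in the same proportion, and no corn is created or lost.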

We can now provide the code for a simulation step, although annual-household-activities and plot-interesting-data still need to be defined. There are two ask loops. In the first loop, agents set off in search of a new farm if they predict that their corn stocks (plus their expected future harvest) will not meet their annual food requirements (plus a 10% margin). In the second loop, households of fertile age reproduce with a probability given by the fertility slider. A local variable stores the result of the potential-farms calculation, to avoid recomputing it unnecessarily.

to go
  calculate-patch-yields
  annual-household-activities
    
  ask households [  ;; agents with insufficient food for next year try to find a new farm
    if (food-estimate < 1.1 * food-requirement) [
      ask patch-here [ set being-farmed? false ]
      find-best-farm potential-farms
    ]
  ]
  
  let farms potential-farms  ;; avoid recomputing this unnecessarily
  ask households [
    if (age >= fertility-age and age <= fertility-end-age and random-float 1 < fertility and count farms > 0) [ 
      do-reproduce farms
      set farms potential-farms
    ]
  ]
  
  plot-interesting-data
  if (year = 1350 or count households = 0) [ stop ]
  set year year + 1
  tick
end

to-report food-estimate  ;; estimate the amount of food available for next year, based on current stocks of corn, and an estimate of the future harvest
  let future-estimate harvest  ;; predict next year's harvest to be the same as this year's harvest
  report (future-estimate + sum (but-first corn-storage))  ;; ignore the corn on the front of the list, which is now too old to eat
end
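The decision to move can be traced with a couple of numbers. This is a Python sketch of the same test, with hypothetical function names and invented stock figures:

```python
def food_estimate(harvest, corn_storage):
    """Expected food for next year: a repeat of this year's harvest plus
    all stored corn except the oldest entry (too old to eat by then)."""
    return harvest + sum(corn_storage[1:])

def must_move(harvest, corn_storage, food_requirement=800, margin=1.1):
    """True if the household should look for a new farm."""
    return food_estimate(harvest, corn_storage) < margin * food_requirement

# 500 + (200 + 500) = 1200, comfortably above the threshold of about 880.
print(must_move(500, [400, 200, 500]))  # False
# 300 + (100 + 300) = 700, below the threshold, so the household moves.
print(must_move(300, [900, 100, 300]))  # True
```

Note that the 900 kg at the front of the second list does not help, because it is ignored by the estimate.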

The annual-household-activities procedure increments the age of households, has them eat their annual food quota, and kills them off if they are too old or have insufficient food. Points to note include:

1. We set up a global variable harvest-list to be a list of this year’s harvests (this will later be plotted as a histogram).
2. We supplement the spatial variability in farm quality with a temporal variability in annual harvests, re-using our apply-variability reporter.
3. We maintain the household’s 3-element list of corn by dropping the oldest corn at the front, and adding the new harvest to the back.
4. We introduce a household attribute unsatisfied-hunger to process the food that the household needs.
5. We use a NetLogo trick to eat corn starting from the front of the list, using the eat-corn reporter (below).

to annual-household-activities
  set harvest-list [ ]  ;; a list of this year's harvests (1)
  ask households [
    set age age + 1
    if (age > death-age) [
      ask patch-here [ set being-farmed? false ]
      die
    ]

    set harvest (apply-variability harvest-variability) * ([ base-yield ] of patch-here)  ;; this year's harvest (2)
    set harvest-list (fput harvest harvest-list)  ;; (1)
    
    set corn-storage (lput harvest (but-first corn-storage))  ;; oldest corn is at the front of the list (3)
    set unsatisfied-hunger food-requirement  ;; a household attribute (4)
    set corn-storage (map [ eat-corn ? ] corn-storage)  ;; eat corn starting from the front of the list (5)

    if (unsatisfied-hunger > 0) [
      ask patch-here [ set being-farmed? false ]
      die
    ]
  ]
end

to setup
  ...
  set harvest-list [ ]
  ...
end

In transforming a list to another of the same length, using map is usually a good idea. Here it processes the list from the front, applying eat-corn to each element. The unsatisfied-hunger attribute links together the different calls to eat-corn, so that they either (a) subtract the unsatisfied food requirement from a list element, or (b) consume all of a list element and update unsatisfied-hunger for the next iteration. The eat-corn reporter thus has two simple cases:

to-report eat-corn [ existing-amount ]
  ifelse (existing-amount >= unsatisfied-hunger)
    [ let remaining-amount (existing-amount - unsatisfied-hunger)
      set unsatisfied-hunger 0
      report remaining-amount ]
    [ set unsatisfied-hunger (unsatisfied-hunger - existing-amount)
      report 0 ]
end

A diagram showing three example calls to eat-corn may make this clearer. It can be seen that the oldest corn at the front of the list is being eaten first:
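The same bookkeeping can be followed in a short Python sketch, where an explicit loop and a local variable play the roles of map and the unsatisfied-hunger attribute (the function name and the stock figures are invented for illustration):

```python
def eat_from_front(corn_storage, food_requirement):
    """Eat corn starting with the oldest entry.

    Returns (remaining stocks, hunger left unsatisfied).
    """
    unsatisfied = food_requirement
    remaining = []
    for amount in corn_storage:
        if amount >= unsatisfied:
            # This entry covers the rest of the requirement.
            remaining.append(amount - unsatisfied)
            unsatisfied = 0
        else:
            # This entry is eaten completely and hunger carries over.
            unsatisfied -= amount
            remaining.append(0)
    return remaining, unsatisfied

# Three entries, oldest first: 300 and 400 are used up, then 100 of the 500.
print(eat_from_front([300, 400, 500], 800))  # ([0, 0, 400], 0)
# Not enough corn: 200 kg of hunger remains, and in the model the household dies.
print(eat_from_front([100, 200, 300], 800))  # ([0, 0, 0], 200)
```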

The following code handles plotting and statistics. We use plotxy so that our x-coordinates can start at 800. We plot histograms of household ages and our previously calculated harvest-list (we don’t use the obvious [ harvest ] of households because that would exclude households that starved to death, and would also include new daughter households that have no harvest yet). Unfortunately, histograms in NetLogo require explicit control of the x axis, which adds some complexity. We also maintain a list of population levels in population-sequence, and calculate how well this fits the historical sequence (excluding the portion of the historical sequence where the population is zero). We measure fitness using the root-mean-square of differences between the two sequences (the square root of the mean of the squares of the differences). This is equivalent to using what the literature calls the L² norm (which penalises large errors), but is expressed in units of numbers of households, which is easier to understand. Values of the root-mean-square error tend to be around 30, which means that “typically” the model differs from the archaeological reality by about 30 households.

to plot-interesting-data
  set-current-plot "Population (households)"
  set-current-plot-pen "Potential"
  plotxy year (count patches with [ mean historical-base-yield >= food-requirement ])
  set-current-plot-pen "Historical"
  plotxy year (item (year - 800) historical-sequence)  ;; index list starting at 0
  set-current-plot-pen "Model"
  plotxy year (count households)
  
  let age-list ([ age ] of households)
  
  set-current-plot "Household ages"
  let xmax (1 + max (fput 0 age-list))
  set-histogram-num-bars xmax
  if (plot-x-max < xmax) [ set-plot-x-range 0 xmax ]  ;; histograms require explicit x-max control
  histogram age-list
  
  set-current-plot "Household harvests"
  let block-size 100
  set xmax (1 + ceiling (max (fput 0 harvest-list) / block-size))
  if (plot-x-max < xmax * block-size) [ set-plot-x-range 0 (xmax * block-size) ]
  set-histogram-num-bars (plot-x-max / block-size)
  histogram harvest-list

  if (length population-sequence < desired-fit-length) [ set population-sequence (lput (count households) population-sequence) ]
  let cut-historical-sequence (n-values (length population-sequence) [ item ? historical-sequence ])
  set fit-badness (sqrt (mean (map [ (?1 - ?2) * (?1 - ?2) ] population-sequence cut-historical-sequence)))
end

to setup
  ...
  set population-sequence [ ]
  set desired-fit-length (length filter [ ? > 0 ] historical-sequence)
  ...
end
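The fit calculation is small enough to check by hand. Here is a Python sketch of the root-mean-square error and of the fit length, using tiny invented sequences rather than the real historical data (the function names are hypothetical):

```python
import math

def desired_fit_length(historical):
    """Number of years with a non-zero historical population."""
    return len([h for h in historical if h > 0])

def rms_error(model, historical):
    """Root-mean-square difference over the years simulated so far."""
    if not model:
        raise ValueError("no simulated years yet")
    cut = historical[:len(model)]
    squares = [(m - h) ** 2 for m, h in zip(model, cut)]
    return math.sqrt(sum(squares) / len(squares))

historical = [14, 14, 14, 0, 0]
print(desired_fit_length(historical))   # 3
# Differences of -4 and +4 give squares of 16 and 16, a mean of 16, and a root of 4.
print(rms_error([10, 18], historical))  # 4.0
```

Squaring before averaging means that one year with a large error counts for more than several years with small ones.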

A typical plot is shown below (simulated population levels are in red). The root-mean-square error here is 25.963, which is quite good. Our decision to introduce the usable-farm-fraction slider is partially responsible for this, since it limits excessively high populations without unduly restricting low populations.

The entire setup procedure has now been described. For completeness, here it is in its entirety, together with the declarations at the top of the program. The full NetLogo program is at Modeling Commons.

extensions [ gis table ]

breed [ households household ]

globals [ start-year fertility-age fertility-end-age death-age food-requirement
          food-estimate-margin corn-gift-ratio corn-stock-years good-farm-bias
          year pdsi-table population-sequence historical-sequence harvest-list fit-badness desired-fit-length ]

patches-own [ value zone patch-quality base-yield historical-base-yield being-farmed? ]

households-own [ harvest age corn-storage unsatisfied-hunger ]

to setup
  clear-all  ;; as previously described
  set-default-shape households "person"
  gis-map-load "adata/Map.asc"
  set pdsi-table load-zone-file "adata/ZoneAdjPDSI.txt"
  ask patches [ initialise-patch ]

  set start-year           800        ;; initial year
  set food-requirement     (5 * 160)  ;; a person needs 160 kg of corn each year, while a typical household consists of 5 persons
  set fertility-age        17         ;; the minimum age of agents (households) that can reproduce (spin off daughter households)
  set fertility-end-age    36         ;; the maximum age of agents (households) that can reproduce
  set death-age            38         ;; the age after which agents (households) die (dissolve when the matriarch dies)
  set corn-gift-ratio      0.33       ;; each new household gets this fraction of the corn storage of the parent household
  set corn-stock-years     2          ;; corn can be stored for this many years
  set year start-year
  set population-sequence [ ]
  set harvest-list [ ]
  
  set historical-sequence
    [ 14 14 14 14 14 14 14 14 14 14 13 13 13 12 12 12 12 12 12 12 11 11 11 11 11 10 10 10 10 10 9 9 9 9 9 9 9 9 8 8 7 7 7 7 7 7 7 7 7 7
      28 29 29 29 29 29 29 29 29 29 29 30 30 31 31 31 31 32 32 32 32 33 33 33 33 33 37 37 37 37 37 39 39 39 40 40 40 40 41 41 41 42 42
      42 42 42 42 42 42 42 56 58 58 58 58 58 58 58 58 58 58 60 60 60 60 60 60 61 61 61 60 61 61 61 61 59 61 61 61 61 61 62 62 62 62 62
      62 62 62 62 61 63 63 63 63 63 63 63 63 63 66 67 67 67 67 67 67 67 67 67 66 67 67 67 67 67 67 66 66 66 66 69 69 69 69 65 68 68 68
      68 66 67 67 67 66 66 66 66 66 66 66 68 68 68 68 68 68 68 68 68 87 87 87 87 87 87 87 86 86 87 85 86 86 87 87 87 87 88 88 88 87 87
      87 87 87 88 90 90 90 90 89 91 92 92 95 95 95 95 97 97 94 94 94 94 95 95 95 95 95 95 134 139 139 139 139 139 139 142 142 142 142
      143 143 146 146 146 146 151 151 153 151 151 151 151 151 156 164 164 164 164 163 163 163 163 165 165 165 165 164 164 163 164 164
      164 161 162 162 162 162 162 161 164 164 165 165 166 166 166 167 166 166 166 166 159 159 160 160 158 159 160 160 161 162 162 162
      148 150 151 151 151 151 151 149 149 147 148 148 148 148 150 150 150 151 151 152 152 152 153 153 153 116 117 118 118 119 119 120
      121 122 126 124 124 124 126 127 127 129 131 131 133 132 134 134 136 137 133 138 138 139 139 138 140 140 142 142 142 143 143 144
      145 145 147 146 147 147 149 149 150 151 151 145 146 146 147 148 149 149 151 153 154 155 157 159 160 161 163 164 166 167 167 169
      169 171 171 173 170 173 176 176 178 178 180 182 184 184 185 188 189 190 192 191 193 192 194 194 194 195 196 199 200 192 193 194
      196 196 197 200 202 202 204 201 209 208 211 212 212 213 214 215 216 210 208 206 201 196 188 181 176 172 167 159 156 148 146 141
      120 118 114 106 103 95 90 88 83 74 70 68 60 58 56 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
      0 0 0 0 0 0 0 0 0 0 0 0 ]
  
  make-region-label 45 98  "Non-arable uplands" black
  make-region-label 28 47  "General valley" brown
  make-region-label 23 73  "North valley" red
  make-region-label 50 68  "Mid valley" gray
  make-region-label 25 84  "Dunes" green
  make-region-label 20 102 "Kinbiko Canyon" pink
  make-region-label 39 11  "Arable uplands" orange
 
  set desired-fit-length (length filter [ ? > 0 ] historical-sequence)

  calculate-patch-yields
  create-households 14 [
    initialise-household (random 29)
    set corn-storage (n-values (corn-stock-years + 1) [ 600 + random-float 400 ])
    find-best-farm potential-farms
  ]

  reset-ticks
  tick-advance start-year
end

The interface for the model looks like this:

Most of the parameters for this model have been given plausible values determined by the literature. The fertility and usable-farm-fraction sliders, however, are adjusted to maximise the quality of fit between the model and archaeological reality, and we need to take a closer look at these. The diagram below shows root-mean-square errors for various parameter combinations, averaged over ten runs each. A good value for usable-farm-fraction is 0.21. A fertility value of 0.9 generally works well, since it gives a realistically slow population rise, but occasionally it can lead to the population dying out. A value of 0.1 performs almost as well, and is somewhat safer. It is also not far off the value of 0.125 suggested by Axtell et al.

The graph below shows 100 runs of the model (red), compared to the historical data (blue). It can be seen that, in one of the runs, the population dies out early. The other 99 runs follow the historical record reasonably well, however.

This model begs for extension, of course. Not just by restoring the water availability and dwelling location factors I have excluded – the quality of human decision-making could also be improved. The difficulty is that land-use models generally rely on surveying and interviewing farmers in order to reveal their decision processes. The Anasazi are no longer available for such investigation. However, even the simplified model we have presented here has succeeded in replicating the population variation in Long House Valley over time. The exception, as Axtell et al. point out, is the final 50-year period. Archaeology shows that the valley was deserted around 1300, whereas the model indicates that a reasonable population could have survived. Some other factors must have been in operation to cause the mass exodus that occurred. Plausible guesses about those factors could also be included in the model.

The Queen’s Birthday honours list for Australia

Posted on June 9, 2014 by Tony

The Queen’s Birthday honours list for Australia is out. Appointments of scientists to the Order of Australia include (among others):

Dr Megan Clark, AC – “For eminent service to scientific research and development through fostering innovation, to science administration through strategic leadership roles, and to the development of public policy for technological sciences.”
Professor Marc Feldmann, AC, FAA – “For eminent service to medicine and to public health as an acclaimed researcher in the field of chronic immune disease, and through the development of innovative treatment therapies.”
Professor Richard Gibbs, AC – “For eminent service to science and academic medicine as a leading researcher, author and scholar, particularly in the field of genetics and human genome sequencing, and as a mentor of emerging scientists.”
Dr Ian Allison, AO, AAM – “For distinguished service to the environment as a glaciologist, to furthering international understanding of the science of the Antarctic region, and to climate research.”
Professor Nicholas Hoogenraad, AO – “For distinguished service to science education and technological development, particularly in the fields of biochemistry and molecular biology.”
Professor Philip Lake, AO – “For distinguished service to conservation and the environment as an ecologist and freshwater scientist, and to research and professional organisations.”
Professor Barry Ninham, AO – “For distinguished service to physical sciences through landmark theoretical and practical advances in colloids and surfaces, and as an academic, educator and mentor.”
Professor Ian Ritchie, AO – “For distinguished service to science in the field of chemistry and hydrometallurgy, as an academic and educator, and to fostering technical innovation in business and industry.”

Revisiting Artificial Anasazi: a NetLogo tutorial (part 1)

Posted on June 7, 2014 by Tony

I have previously uploaded some NetLogo tutorials, and for the next one I thought I would revisit the famous “Artificial Anasazi” model of Axtell et al. (2002). This model was ported to NetLogo by Marco Janssen in 2008 (see his 2009 JASSS paper and his model). I have rewritten the model from scratch for explanatory purposes (though very much inspired by Janssen’s work). It makes a good example of how NetLogo can assist land-use studies, both archaeological and present-day.

The model studies the population of Anasazi farmers living in Long House Valley (near Black Mesa in Arizona) between 800 and 1350 AD. The map below shows part of the valley, through which Highway 160 now runs (there are more maps here):

Much has been preserved by the dry Arizona climate. Fragments of pottery (like the bowl at the top of the page) are found routinely, and these well-preserved ruined Anasazi dwellings below are not far away. Extensive archaeological investigation has told us both the climate that existed between 800 and 1350 AD, and the population of the valley during that time.

For purposes of simplicity, my version of the model does not include the availability of drinking water and the locations of dwellings. I am only modelling the location of farms. Consequently, although I hope to accurately model population numbers, I do not hope to accurately model the locations at which people lived, as Axtell et al. did. In addition, this first half of the tutorial will only look at code for the patches in the model.

The model requires three data files in a folder called “adata” (zipped version here). One of these is a map of the valley, formatted as an ASCII GIS raster file, which can be read by the NetLogo GIS extension. Such use of GIS datasets is common in land-use models. The following procedure reads in the raster file and sets a numerical value attribute on each patch.

to gis-map-load [ fname ]
  ifelse (file-exists? fname)
    [ let dataset gis:load-dataset fname
      gis:set-world-envelope (gis:envelope-of dataset)
      gis:apply-raster dataset value
    ]
    [ file-error fname ]
end

to file-error [ fname ]
  user-message (word "Cannot find the file \"" fname "\"")
end
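For readers who have not met ASCII raster files before, the following Python sketch shows the general idea. It assumes the common Esri-style layout (a few keyword header lines followed by rows of cell values); the sample text is invented, and the real Map.asc may differ in its details. The function name parse_ascii_grid is hypothetical.

```python
def parse_ascii_grid(text):
    """Parse Esri-style ASCII grid text into (header dict, rows of numbers)."""
    header = {}
    rows = []
    for line in text.strip().splitlines():
        parts = line.split()
        if not parts:
            continue
        if parts[0][0].isalpha():      # header lines start with a keyword
            header[parts[0].lower()] = float(parts[1])
        else:                          # data lines are rows of cell values
            rows.append([float(p) for p in parts])
    return header, rows

# Invented sample: a 3 x 2 grid using some of the zone codes from the tutorial.
sample = """ncols 3
nrows 2
xllcorner 0
yllcorner 0
cellsize 1
NODATA_value -9999
0 10 15
20 25 30
"""
header, rows = parse_ascii_grid(sample)
print(header["ncols"], header["nrows"], rows)
```

In NetLogo none of this parsing is needed, since the GIS extension handles it; the sketch is only meant to show what kind of data ends up in each patch's value attribute.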

These numerical value attributes encode different regions of the valley. The following two reporters (functions) transform numerical codes to names (which will be stored in a zone attribute), and transform names to colours (which will be stored in the standard patch colour attribute):

to-report patch-zone-name [ i ]
  ifelse (i = 0) [ report "General" ] [      ;; General valley floor
  ifelse (i = 10) [ report "North" ] [       ;; North valley floor
  ifelse (i = 15) [ report "North Dunes" ] [ ;; North valley dunes
  ifelse (i = 20) [ report "Mid" ] [         ;; Mid valley floor
  ifelse (i = 25) [ report "Mid Dunes" ] [   ;; Mid valley dunes
  ifelse (i = 30) [ report "Natural" ] [     ;; Natural (non-arable)
  ifelse (i = 40) [ report "Uplands" ] [     ;; Uplands (arable)
  ifelse (i = 50) [ report "Kinbiko" ] [     ;; Kinbiko Canyon
  ifelse (i = 60) [ report "Empty" ] [ 
  report "?" ] ] ] ] ] ] ] ] ]               ;; there should be no "?" patches
end

to-report patch-zone-color [ z ]
  ifelse (z = "General") [ report brown ] [
  ifelse (z = "North") [ report red ] [
  ifelse (z = "Mid") [ report gray ] [
  ifelse (z = "Natural") [ report yellow ] [
  ifelse (z = "Uplands") [ report orange ] [
  ifelse (z = "Kinbiko") [ report pink ] [ 
  ifelse (z = "North Dunes") [ report green ] [ 
  ifelse (z = "Mid Dunes") [ report green ] [ 
  ifelse (z = "Empty") [ report white ] [ 
  report magenta ] ] ] ] ] ] ] ] ]  ;; there should be no magenta patches
end

I have used a formatting style here which is suitable for highly nested ifelse structures in NetLogo (the NetLogo editor helps ensure the correct number of right brackets). The following utility code adds a label to a patch, and loading the GIS file followed by some labelling produces the map below:

to make-region-label [ x y txt clr ]
  ask patch x y [
    set plabel-color clr
    set plabel txt
  ]
end

to setup  ;; this code will be updated later in the tutorial
  clear-all
  gis-map-load "adata/Map.asc"
  ask patches [
    set zone (patch-zone-name value)
    if (zone = "?") [ user-message (word "Error in patch data load for " self) ]
    set pcolor (patch-zone-color zone)
  ]
  
  make-region-label 45 98  "Non-arable uplands" black
  make-region-label 28 47  "General valley" brown
  make-region-label 23 73  "North valley" red
  make-region-label 50 68  "Mid valley" gray
  make-region-label 25 84  "Dunes" green
  make-region-label 20 102 "Kinbiko Canyon" pink
  make-region-label 39 11  "Arable uplands" orange
  ...
end

Here is the colour-coded NetLogo map of the valley:

The colour-coding is important, because different parts of the valley have had different fertility levels over time. Fortunately, we have a good idea what those levels were. One of the data files contains estimates of the Palmer Drought Severity Index (PDSI) for each zone of the valley, over the 800–1350 period of the simulation. This file is formatted as a sequence of lists, one per zone, with the first element of each list being the zone name, and the rest of the list being PDSI numbers (it’s handy that NetLogo can read in an entire list in one gulp). The following code transforms the data file into a table called pdsi-table (obviously this also requires the NetLogo “table” extension).

to-report load-zone-file [ fname ]
  let tbl table:make
  ifelse (file-exists? fname)
    [ file-open fname
      while [ not file-at-end? ] [
        let lst file-read
        table:put tbl (item 0 lst) (but-first lst)  ;; split first item (zone name) from rest of list
      ]
      file-close
    ]
    [ file-error fname ]
  report tbl
end

to setup
  ...
  set pdsi-table load-zone-file "adata/ZoneAdjPDSI.txt"
  set start-year 800
  set year start-year
  ...
end

The data file used here was extracted from that used in the original model. Having read it, the following simple reporter (function) can then look up the PDSI value for a particular zone and year:

to-report get-pdsi [ z y ]
  ifelse (table:has-key? pdsi-table z)
    [ report item (y - start-year) (table:get pdsi-table z) ]
    [ report 0 ]
end
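The same table-and-offset idea can be sketched outside NetLogo. The Python below is only an illustration: the records are made-up numbers rather than real PDSI data, and the names build_pdsi_table and START_YEAR are hypothetical.

```python
START_YEAR = 800

def build_pdsi_table(records):
    """Split each record into zone name (first item) and yearly values (the rest)."""
    return {record[0]: list(record[1:]) for record in records}

def get_pdsi(table, zone, year):
    """Look up the value for a zone and year; zones without data report 0."""
    if zone not in table:
        return 0
    offset = year - START_YEAR          # year 800 is stored at position 0
    if offset < 0:
        raise IndexError("year is before the start of the data")
    return table[zone][offset]

# Made-up records in the same shape as the data file: zone name, then yearly values.
records = [["General", 0.5, -1.2, 2.0], ["Uplands", 0.1, 0.0, -0.4]]
table = build_pdsi_table(records)
print(get_pdsi(table, "General", 801))
print(get_pdsi(table, "Natural", 801))
```

Zones with no entry in the table (such as the non-arable "Natural" zone in this toy example) simply report 0, as in the NetLogo reporter.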

For each zone, the PDSI determines an estimated crop yield for the farmers. Although the Anasazi grew a range of crops (corn, beans, and squash – see coin above), the model expresses crop yields as if only corn were grown. In the original model, the relationship between PDSI and crop yield is expressed as a step function, but I have smoothed this out using a cubic polynomial interpolation (the red curve below, rather than the dashed step function):

The following code calculates the crop yields for each patch (in kilograms per hectare), using these cubic interpolations:

to-report thresholded-cubic [ x a b c d lo hi ]
  let res (x * x * x * a + x * x * b + x * c + d)
  if (res < lo) [ set res lo ]
  if (res > hi) [ set res hi ]
  report res
end

to calculate-patch-yields  ;; calculate the crop yield for each patch based on the PDSI data, using a cubic interpolation of published numbers
  ask patches [
    let pdsi get-pdsi zone year
    let theoretical-yield 0

    if (zone = "North" or zone = "Kinbiko" or (zone = "Mid" and pxcor <= 34)) [
      set theoretical-yield (thresholded-cubic pdsi 4.4167 6.9456 49.583 823.48 617 1153)
    ]
    if (zone = "General" or (zone = "Mid" and pxcor > 34)) [
      set theoretical-yield (thresholded-cubic pdsi 3.65 5.7925 41.65 686.28 514 961)
    ]
    if (zone = "Uplands") [
      set theoretical-yield (thresholded-cubic pdsi 2.9333 4.6599 33.267 548.77 411 769)
    ]
    if (zone = "North Dunes" or zone = "Mid Dunes") [
      set theoretical-yield (thresholded-cubic pdsi 4.5833 7.1871 51.917 858.03 642 1201)
    ]

    set base-yield (theoretical-yield * ...)  ;; missing code to be supplied later
    ...
  ]
end
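As a quick check on how the cubic and its thresholds interact, the reporter can be re-expressed in Python and evaluated at a few PDSI values. The coefficients below are the ones used above for the General valley floor; the choice of sample PDSI values is arbitrary.

```python
def thresholded_cubic(x, a, b, c, d, lo, hi):
    """Evaluate a*x^3 + b*x^2 + c*x + d, clamped to the range [lo, hi]."""
    res = a * x ** 3 + b * x ** 2 + c * x + d
    return max(lo, min(hi, res))

# Coefficients and thresholds for the General valley floor, copied from the NetLogo code.
GENERAL = (3.65, 5.7925, 41.65, 686.28, 514, 961)

for pdsi in (-4, -3, 0, 3, 4):
    print(pdsi, round(thresholded_cubic(pdsi, *GENERAL), 2))
```

With these coefficients the unclamped cubic gives about 514.9 at a PDSI of -3 and about 961.9 at a PDSI of 3, so the thresholds only take effect for PDSI values at or beyond roughly ±3.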

The base-yield attribute of each patch is based on the theoretical yield calculated in this way, but it is modified in two ways, using the lower two of the following sliders:

First, we introduce some spatial variability in crop yields, using a normally distributed patch-quality attribute:

to-report apply-variability [ var ]
  let res random-normal 1 var  ;; random variation around a mean of 1
  if (res < 0) [ set res 0 ]   ;; because random-normal can return negative values
  report res
end

to setup
  ...
  ask patches [
    set patch-quality (apply-variability crop-yield-variability)
  ]
  ...
end

Because the normal distribution (the “bell curve”) has infinitely long tails, the calls to random-normal can sometimes return negative or very large crop yield values. Excessively large values have little effect on the simulation, because the model incorporates a rule that stored corn more than two years old goes bad and is thrown away. However, negative values must be explicitly tested for (negative crop yields are likely to give some rather strange behaviour!).
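How often negative values turn up depends on the variability setting. The following Python sketch computes the probability directly from the normal distribution; the three variability values are hypothetical slider settings chosen for illustration, not values recommended by the tutorial.

```python
from statistics import NormalDist

def fraction_negative(variability):
    """Probability that a normal draw with mean 1 and the given standard deviation is below 0."""
    return NormalDist(mu=1.0, sigma=variability).cdf(0.0)

for variability in (0.2, 0.4, 0.6):   # hypothetical slider settings
    print(variability, fraction_negative(variability))
```

For a standard deviation of 0.4, zero lies 2.5 standard deviations below the mean, so well under 1% of draws are negative, but with thousands of patches that is still enough to need the explicit test.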

The second adjustment is that the entire crop yield is discounted using the harvest-adjustment slider:

to calculate-patch-yields
  ask patches [
    ...
    set base-yield (theoretical-yield * patch-quality * harvest-adjustment) 
    ...
  ]
end

In the models of Axtell et al. and Janssen, low values of harvest-adjustment are used to avoid unrealistically high population levels. The histogram below shows a typical run, for the year 1200 AD (a good year) on the General valley floor. Here, the theoretical yield of 961 is discounted to a mean of 560 kg (dashed line), well below the 800 kg (dotted line) required to support a typical five-person household for a year:

The effect of the harvest-adjustment is thus to restrict the population of the General valley floor to 53 good farms (8.3% of the patches), but also to impose a rather unrealistic triangular crop-yield distribution on the patches being farmed (i.e. the right-hand tail of the “bell curve”). It seems to me better to explicitly provide a usable-farm-fraction slider that restricts farming to a subset of patches. This permits a more realistic harvest-adjustment of 0.9, corresponding to “wastage” of only 10% of the crop.
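The arithmetic behind this argument can be sketched in Python. This is a back-of-envelope calculation under stated assumptions: patch quality is treated as exactly normal (ignoring the clipping at zero), corn storage is ignored, and the variability values are hypothetical, since the slider setting used for the histogram is not given here.

```python
from statistics import NormalDist

def fraction_feasible(theoretical_yield, harvest_adjustment, variability, need=800.0):
    """Share of patches whose base yield reaches `need`, for normally distributed patch quality."""
    mean_yield = theoretical_yield * harvest_adjustment
    spread = mean_yield * variability   # standard deviation of the yield in kg
    return 1.0 - NormalDist(mu=mean_yield, sigma=spread).cdf(need)

# The discount that turns a theoretical 961 kg into a mean of 560 kg.
implied_adjustment = 560 / 961
print(round(implied_adjustment, 3))

for variability in (0.2, 0.3, 0.4):   # hypothetical slider values
    print(variability, fraction_feasible(961, implied_adjustment, variability))

# With a mild 10% discount, the mean yield itself exceeds 800 kg.
print(fraction_feasible(961, 0.9, 0.3))
```

Under these assumptions, a harvest-adjustment of 0.9 leaves more than half of the patches able to support a household in a good year, which is why some other mechanism is then needed to keep the number of farms realistic.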

It turns out that 0.21 is a good value for usable-farm-fraction. But why would 79% of the patches be unavailable for farming? One reason is the factors I have excluded in this tutorial version of the model – water supply and housing. Some potential farms are just too far away from suitable housing sites. The Anasazi penchant for cliff dwellings also suggests that potential housing sites had to be defensible, and this also rules out some potential farms as being too far away from suitable housing. Additionally, some patches would be ruled out by various terrain factors, and there would also be social pressures that limited the population level. Furthermore, the patches in the model are a little less than a hectare in size, and so one-hectare farms will occupy a little more than one patch. The usable-farm-fraction slider compensates for all of these effects.

We package the various patch initialisation activities together as follows. This includes using usable-farm-fraction to select a set of patches which will have non-zero patch-quality (as usual, we obtain a certain probability by testing for random-float 1 being less than that probability). The code also includes a historical-base-yield attribute, which will hold a list of the three most recent base yields for a patch:

to initialise-patch
  set zone (patch-zone-name value)  ;; as described above
  if (zone = "?") [ user-message (word "Error in patch data load for " self) ]
  set pcolor (patch-zone-color zone)
  set being-farmed? false
  set historical-base-yield [ ]  ;; this will be expanded to a list of the last 3 yields
  
  ifelse (random-float 1 < usable-farm-fraction)
    [ set patch-quality (apply-variability crop-yield-variability) ]
    [ set patch-quality 0 ]
end
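The probability test used here can be checked with a small stand-alone simulation. The Python sketch below mirrors the last three lines of initialise-patch; the variability of 0.4, the seed, and the patch count are arbitrary choices for illustration.

```python
import random

def initial_patch_quality(usable_farm_fraction, variability, rng):
    """Return 0 for an unusable patch, otherwise a non-negative multiplier around 1."""
    if rng.random() < usable_farm_fraction:
        return max(0.0, rng.gauss(1.0, variability))
    return 0.0

rng = random.Random(2014)   # fixed seed so the sketch is repeatable
qualities = [initial_patch_quality(0.21, 0.4, rng) for _ in range(20000)]
usable = sum(1 for q in qualities if q > 0)
print(usable / len(qualities))
```

The printed fraction should come out close to 0.21, very slightly reduced by the few selected patches whose quality is clipped to zero.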

We call this initialisation in setup, and we update the historical-base-yield list in the calculate-patch-yields procedure. We can also average the list to smooth out temporal variation in crop yields, compensating for the corn-storage ability of the Anasazi. The code count patches with [ mean historical-base-yield >= 800 ] will count feasible farms, taking corn storage into account.

to setup
  ...
  ask patches [
    initialise-patch
  ]
  ...
end

to calculate-patch-yields
  ask patches [
    ...  ;; calculate theoretical-yield as described above
    set base-yield (theoretical-yield * patch-quality * harvest-adjustment)
    set historical-base-yield (fput base-yield historical-base-yield)  ;; put new base-yield on the front of the list
    if (length historical-base-yield > 3) [ set historical-base-yield (but-last historical-base-yield) ]  ;; truncate to 3 entries, if necessary
  ]
end
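The effect of the three-year list can be seen in a small Python sketch, where a bounded deque plays the role of the fput and but-last combination. The yields are made-up numbers, and the helper names are hypothetical.

```python
from collections import deque

def update_history(history, base_yield):
    """Put the newest yield on the front; a full deque drops its oldest entry."""
    history.appendleft(base_yield)
    return history

def is_feasible(history, need=800):
    """A patch can support a household if its recent mean yield reaches `need`."""
    return sum(history) / len(history) >= need

history = deque(maxlen=3)             # keeps at most the last 3 yields
for base_yield in (900, 700, 850):    # made-up yields in kg
    update_history(history, base_yield)
print(list(history), is_feasible(history))

update_history(history, 600)          # a bad year pushes out the oldest value (900)
print(list(history), is_feasible(history))
```

In this toy sequence a single year of 700 kg does not rule the patch out, because the three-year mean is still above 800 kg, but the later 600 kg year drags the mean below the threshold.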

The main additional code in setup is creation of a list of historical population levels:

to setup
  clear-all  ;; as previously described
  gis-map-load "adata/Map.asc"
  set pdsi-table load-zone-file "adata/ZoneAdjPDSI.txt"
  ask patches [ initialise-patch ]
  set start-year 800
  set year start-year

  set historical-sequence
    [ 14 14 14 14 14 14 14 14 14 14 13 13 13 12 12 12 12 12 12 12 11 11 11 11 11 10 10 10 10 10 9 9 9 9 9 9 9 9 8 8 7 7 7 7 7 7 7 7 7 7
      28 29 29 29 29 29 29 29 29 29 29 30 30 31 31 31 31 32 32 32 32 33 33 33 33 33 37 37 37 37 37 39 39 39 40 40 40 40 41 41 41 42 42
      42 42 42 42 42 42 42 56 58 58 58 58 58 58 58 58 58 58 60 60 60 60 60 60 61 61 61 60 61 61 61 61 59 61 61 61 61 61 62 62 62 62 62
      62 62 62 62 61 63 63 63 63 63 63 63 63 63 66 67 67 67 67 67 67 67 67 67 66 67 67 67 67 67 67 66 66 66 66 69 69 69 69 65 68 68 68
      68 66 67 67 67 66 66 66 66 66 66 66 68 68 68 68 68 68 68 68 68 87 87 87 87 87 87 87 86 86 87 85 86 86 87 87 87 87 88 88 88 87 87
      87 87 87 88 90 90 90 90 89 91 92 92 95 95 95 95 97 97 94 94 94 94 95 95 95 95 95 95 134 139 139 139 139 139 139 142 142 142 142
      143 143 146 146 146 146 151 151 153 151 151 151 151 151 156 164 164 164 164 163 163 163 163 165 165 165 165 164 164 163 164 164
      164 161 162 162 162 162 162 161 164 164 165 165 166 166 166 167 166 166 166 166 159 159 160 160 158 159 160 160 161 162 162 162
      148 150 151 151 151 151 151 149 149 147 148 148 148 148 150 150 150 151 151 152 152 152 153 153 153 116 117 118 118 119 119 120
      121 122 126 124 124 124 126 127 127 129 131 131 133 132 134 134 136 137 133 138 138 139 139 138 140 140 142 142 142 143 143 144
      145 145 147 146 147 147 149 149 150 151 151 145 146 146 147 148 149 149 151 153 154 155 157 159 160 161 163 164 166 167 167 169
      169 171 171 173 170 173 176 176 178 178 180 182 184 184 185 188 189 190 192 191 193 192 194 194 194 195 196 199 200 192 193 194
      196 196 197 200 202 202 204 201 209 208 211 212 212 213 214 215 216 210 208 206 201 196 188 181 176 172 167 159 156 148 146 141
      120 118 114 106 103 95 90 88 83 74 70 68 60 58 56 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
      0 0 0 0 0 0 0 0 0 0 0 0 ]
  
  make-region-label 45 98  "Non-arable uplands" black  ;; as previously described
  make-region-label 28 47  "General valley" brown
  make-region-label 23 73  "North valley" red
  make-region-label 50 68  "Mid valley" gray
  make-region-label 25 84  "Dunes" green
  make-region-label 20 102 "Kinbiko Canyon" pink
  make-region-label 39 11  "Arable uplands" orange
 
  reset-ticks
  tick-advance start-year  ;; start counting from the year 800
end

The code for a step just plots crop yields (smoothed, using the historical-base-yield list) and calculates new crop yields:

to go  ;; code for one step
  calculate-patch-yields
  plot-interesting-data
  if (year = 1350) [ stop ]
  set year year + 1
  tick
end

to plot-interesting-data  ;; plotting code
  set-current-plot "Population (households)"
  set-current-plot-pen "Potential"
  plotxy year (count patches with [ mean historical-base-yield >= 800 ])
  set-current-plot-pen "Historical"
  plotxy year (item (year - start-year) historical-sequence)  ;; index list starting at 0
end

Plotting (see graph below) shows a good fit between predicted carrying capacity of the valley and actual population levels. Even better results are obtained when actual households are modelled, as we will see in part 2 of this tutorial.

The keys to defining this NetLogo model, so far, have been loading of a GIS map and various archaeologically established datasets, together with number-crunching to calculate crop yields. When households are modelled, however, some code for human decision-making will also be needed. The model also illustrates how much functionality can be achieved with even a small NetLogo program!

See part 2 of this tutorial for the human side of the model. The full NetLogo program is at Modeling Commons.

Caenorhabditis elegans: a model organism

Posted on June 4, 2014 by Tony

Caenorhabditis elegans (photo above by Bob Goldstein, diagram below by “KDS444”) is a transparent nematode worm, about 1 mm in length. It lives naturally in the soil, where it eats bacteria, but it is also quite happy to make its home in a Petri dish. A 1963 suggestion by Sydney Brenner led to C. elegans becoming the focal point of a vast collaborative effort to understand the worm in detail. Brenner shared the 2002 Nobel Prize in Physiology or Medicine for this work.

The cellular development of C. elegans has been mapped in detail, and its genome had been largely mapped by 1998. The diagram below shows the neural network of the worm, drawn using R, based on data from here (from this paper via this one). In this diagram, colour shows the centrality of neurons in the network. Other information on C. elegans is available at wormbase.org.

Because of the effort that has gone into understanding this humble worm as a whole, rather than as just parts, a great deal has been learned about biology in general. Brenner was on to a good thing!

Three lovely scanning electron microscope (SEM) images

Posted on June 2, 2014 by Tony

A wonderful image of pollen from sunflower, morning glory, hollyhock, lily, primrose, and castor bean plants (Dartmouth Electron Microscope Facility 2011, colourised by William Crochot).

Image of a strawberry by Annie Cavanagh and David McCarthy of the School of Pharmacy, University of London. This beautifully detailed image was created in 2010 by stitching several different SEM images together. I have mentioned this image before. It comes via Wellcome Images, and more of the story is here.

A human T-lymphocyte (white blood cell), colourised blue (NIAID 2010).

Scientific Gems

Facts, ideas, and images from the shoreline of science.

Monthly Archives: June 2014

