Personality and Gender

The so-called “Big Five” personality traits are often misunderstood. They have catchy names, summarised by the acronym CANOE (or OCEAN), but in fact they are simply summaries of answers to certain kinds of personality questions:

  • Conscientiousness: I pay attention to details; I follow a schedule; …
  • Agreeableness: I am interested in people; I feel the emotions of others; …
  • Neuroticism: I get upset easily; I worry about things; …
  • Openness to experience: I am full of ideas; I am interested in abstractions; …
  • Extraversion: I am the life of the party; I start conversations; … (this last one is also measured by the MBTI test)

These tests work in multiple cultures. In this article, I am using data from the Dutch version of the test, the “Vijf PersoonlijkheidsFactoren Test” developed by Elshout and Akkerman. Specifically, I am using data from 8,954 psychology freshmen at the University of Amsterdam during 1982–2007 (Smits, I.A.M., Dolan, C.V., Vorst, H.C., Wicherts, J.M. and Timmerman, M.E., 2013. Data from ‘Cohort Differences in Big Five Personality Factors Over a Period of 25 Years’. Journal of Open Psychology Data, 1(1), p.e2). In my analysis, I have compensated for missing data and for the fact that the sample was 69% female.

The Dutch test consists of 70 items, in 5 groups of 14. The following tree diagram (click to zoom) is the result of UPGMA hierarchical clustering on pairwise correlations between all 70 items. It can be seen that they naturally cluster into 5 groups corresponding almost perfectly to the “Big Five” personality traits – the exception being item A11, which fits extraversion slightly better (r = 0.420) than its own cluster of agreeableness (r = 0.406). This lends support to the idea that the test is measuring five independent things, and that these five things are real.
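Clustering of this kind is straightforward to reproduce in R: `hclust` with `method = "average"` performs UPGMA, applied here to a 1 − correlation distance matrix. The sketch below uses random data standing in for the real 70 items (the item names and counts are invented, so no real structure should emerge):

```r
# UPGMA hierarchical clustering on pairwise item correlations
# (sketch: random responses stand in for the real 70-item data)
set.seed(1)
items <- matrix(sample(1:7, 200 * 6, replace = TRUE), ncol = 6,
                dimnames = list(NULL, c("C1", "C2", "A1", "A2", "N1", "N2")))
d <- as.dist(1 - cor(items))           # distance = 1 - correlation
tree <- hclust(d, method = "average")  # "average" linkage is UPGMA
plot(tree, main = "Item clustering (UPGMA)")
```

With the real data, items belonging to the same trait have high mutual correlations, and therefore small mutual distances, so they merge early in the tree.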

On tests like this, women consistently score, on average, a little higher than men in conscientiousness, agreeableness, neuroticism, and extraversion (and in this dataset, on average, a little lower in openness to experience). Mean values for conscientiousness in this dataset (on a scale of 14 to 98) were 60.3 for women and 56.1 for men (a difference of 4.2). For agreeableness, they were 70.6 for women and 67.6 for men (a difference of 3.0). There are also small age effects for conscientiousness, agreeableness, and openness to experience (over the 18–25 age range), which I have ignored.

The chart below (click to zoom) shows distributions of conscientiousness and agreeableness among men and women, and the relative frequency of different score ranges (compensating for the fact that the sample was 69% female). Thus, based on this data, a random sample of people with both scores in the range 81 to 90 would be 74% female. With both scores in the range 41 to 50, the sample would be 72% male. This reflects a simple mathematical truth – small differences in group means can produce substantial differences at the tails of the distribution.
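The tail effect is easy to check with a normal approximation. The sketch below assumes equal numbers of men and women, the conscientiousness means quoted above, and an illustrative standard deviation of 10 (the real distributions are not exactly normal, so this is only indicative):

```r
# Share of women among high scorers, given a small difference in means
p_f <- pnorm(81, mean = 60.3, sd = 10, lower.tail = FALSE)  # women scoring above 81
p_m <- pnorm(81, mean = 56.1, sd = 10, lower.tail = FALSE)  # men scoring above 81
p_f / (p_f + p_m)  # about 0.75: a 4.2-point mean gap gives a 3:1 tail imbalance
```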

Belief in God in the US

In another fascinating example of social statistics, Pew have just released a survey of US beliefs about God. The study included multiple questions about the nature and attributes of God, but my mosaic plot below only looks at the first one. The composition of each column is based on the recent survey, while the width of each column is based on religious composition data from a 2014 study by Pew.

In dark blue, 62% of the US believes in God “as described in the Bible.” A further 30% (in light blue) believes in some other god or higher power (or would not describe their belief in God in more detail). In red, 7% believe in no God at all, and in grey, 1% gave no response.

Columns correspond to denominations: Evangelical Protestant, Mainline Protestant, Historically Black Protestant (HBP), Catholic, Other Christian (OC), Jewish (J), Other Religion (Oth), “Nothing in Particular,” Agnostic (Ag), and Atheist (Ath). Numbers in the “OC” and “Oth” categories were not directly provided by Pew, and were estimated using totals provided (these two columns should therefore be taken with a grain of salt).

Among Christians, 92% of Historically Black Protestants and 91% of Evangelical Protestants believe in God “as described in the Bible,” but only 72% of Mainline Protestants and 69% of Catholics do. What’s more, 1% of Mainline Protestants, 2% of Catholics, and 10% of Jews say that they believe in no God at all (i.e. they adhere to their religion only culturally, and are actually atheists).

On the other hand, 90% of those who describe their religion as “nothing in particular” believe in some kind of God or higher power. So do 67% of agnostics and 18% of atheists (clearly, many who claim to be “nothing in particular” are in fact Christians of some form, and many who claim to be atheists are in fact not).

Part of the explanation for this presumably lies in the fact that religion is in flux for many people in the US. Christians switch between the four main groups, some Christians lose their faith, while other people gain faith in Christianity or in another religion. Religious reality is more complex than a handful of numbers might suggest.

Religious knowledge in the United States

Part of the US religious landscape. Clockwise from top left: Evangelical Protestant, Mainline Protestant, Jewish, Catholic, Other Christian, Other

Readers of this blog know that I really love social statistics, and among the masters of that field are the people at Pew Forum. Back in 2010, they ran an interesting survey of religious knowledge. A simple 15-question version of the survey can be found online [if you want to try it, do so now, since this post has spoilers]. A total of 3,412 adults were interviewed (in English and Spanish). The focus of the survey was on the religious knowledge of different religious groups in the United States:

I was a little frustrated with the survey, since it mixed religion, history, and politics, with questions at quite different levels – ranging from “Where was Jesus born?” (multiple choice: Bethlehem, Jericho, Jerusalem, or Nazareth) to “What religion was Maimonides?” There was, however, an interesting subset of five easy questions about the Hebrew Bible (Old Testament), which Christians and Jews have in common, and I decided to do my own analysis of these questions:

  1. What is the first book of the Bible? (Genesis/Bereishit)
  2. Which of the following is NOT one of the Ten Commandments? (Do unto others as you would have them do unto you)
  3. Which Bible figure is most closely associated with remaining obedient to God despite suffering? (Job)
  4. Which Bible figure is most closely associated with leading the exodus from Egypt? (Moses)
  5. Which Bible figure is most closely associated with willingness to sacrifice his son for God? (Abraham)

Since these questions are closely related and of similar difficulty, it makes sense to add them together. Notice also that Pew’s interviewers were instructed to accept both English and Hebrew answers to question 1. The last four questions were multiple-choice, with “Do not commit adultery,” “Do not steal,” and “Keep the Sabbath holy” as the other options for question 2, and with Job, Elijah, Moses, and Abraham as the options for questions 3 to 5. I would expect a bright child in Sunday School to get 5 out of 5 on these questions, and pure guessing should average around 1 out of 5.
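The guessing baseline follows directly: the first question is free-response (a random guess essentially never hits “Genesis”), while each of the four multiple-choice questions has four options:

```r
# Expected score from pure guessing: ~0 for the free-response question,
# plus 1/4 for each of the four 4-option multiple-choice questions
0 + 4 * (1/4)  # = 1
```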

Which Bible figure is most closely associated with leading the exodus from Egypt?

Answers to these questions in fact depended quite substantially on education level, and this complicates analysis, because average education level in the US itself varies between religious groups. I coded education numerically as follows:

  • Level 0: No High School (grades 1 to 8)
  • Level 1: Partial High School (grades 9 to 11)
  • Level 2: High School graduate
  • Level 3: High School Plus: technical, trade, vocational, or college education after High School, but less than a 4-year college degree
  • Level 4: College (university) graduate with 4-year degree
  • Level 5: Post-graduate training

The chart below shows the 11 religious groups I looked at, and their average (mean) education level. Note that Jews are the best-educated (presumably for cultural reasons), followed by Atheists/Agnostics (possibly because many people in the US become Atheists/Agnostics while at university). The lowest average education levels were for Other Protestants (which includes Black Protestants) and for Hispanic Catholics. Each coloured bar has an “error range,” which is the 95% confidence interval (calculated using bootstrapping). Religious groups with overlapping error ranges can’t really be distinguished statistically:
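A percentile bootstrap interval of this kind takes only a few lines of R. The sketch below uses made-up education levels for a single group (the sample size and probabilities are invented, not Pew’s):

```r
# 95% bootstrap confidence interval for a group's mean education level
set.seed(42)
edu <- sample(0:5, 300, replace = TRUE,
              prob = c(0.02, 0.08, 0.30, 0.28, 0.20, 0.12))
boot_means <- replicate(10000, mean(sample(edu, replace = TRUE)))
quantile(boot_means, c(0.025, 0.975))  # percentile bootstrap interval
```

Resampling the data with replacement and recomputing the mean each time gives an empirical sampling distribution, whose 2.5% and 97.5% quantiles form the error range shown on each bar.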

I “chunked” these education levels into two groups: less-educated (0 to 2, everything up to a High School diploma) and more-educated (3 to 5, everything beyond a High School diploma, be it trade school or a PhD). The chart below shows the average number of correct answers for the five questions, by religious group / education group combination. Each religious group has two coloured bars, the first (marked with +) being for the more-educated subgroup, and the second for the less-educated subgroup:

The more-educated group gets more questions right (on average, 3.4 compared to 2.3), and within both education groups, there is a similar ordering of religious groups:

  • Mormons do best (4.5 or 3.4 questions right, depending on education subgroup).
  • White Evangelical Protestants come next (4.0 or 3.1). Both Mormons and Evangelicals put great weight on Bible study, so this makes sense.
  • Then comes a group of three with similar results: Jews, Other Protestants (including Black Protestants), and Atheists/Agnostics. Orthodox Jews put great weight on studying the Torah, but many Jews in the US are in fact fairly secular. More interesting is the high score for Atheists and Agnostics – they do seem to have some knowledge of the beliefs they are rejecting (Atheists and Agnostics also scored highest on the complete survey).
  • Then comes a group of five: Other Christians, Unknown/Other, White (non-Hispanic) Catholics, White Mainline Protestants, and Unaffiliated (“nothing in particular”). Notice that White Mainline Protestants (ABCUSA, UMC, ELCA, PCUSA, UCC, RCA, Episcopal, etc.) get about one question less right (3.1 or 2.1) than their Evangelical counterparts, reflecting less of an emphasis on the Bible in mainline denominations.
  • The lowest scores were for Hispanic Catholics (2.7 or 1.5 questions right, depending on education subgroup). Given that guessing gives an average score of 1, this suggests that many Hispanic Catholics in the US have a rather tenuous link to their faith (many of them appear to strengthen this connection by becoming Protestants).

Thus if the Hebrew Bible (Old Testament) is a religious meeting place, it is a meeting place between Mormons, Evangelical Protestants, Jews, and (ironically) Atheists and Agnostics.

It is also interesting to see what happens when we add two simple questions about the New Testament: “Where was Jesus born?” and “Tell me the names of the first four books of the New Testament of the Bible, that is, the Four Gospels.” Not surprisingly, Jews now do worse, since the New Testament is specifically Christian. Atheists and Agnostics also do a little worse: apparently they know a little less about the New Testament than about the Old. In spite of the interviews being conducted in both English and Spanish, Hispanic Catholics continued to do poorly, with less-educated Jews and Hispanic Catholics giving the wrong answer to “Where was Jesus born?” more than half the time.

Has Wikipedia stabilised?

I have previously blogged about Wikipedia being in trouble. However, revisiting the English Wikipedia statistics page, I see that (at least in purely quantitative terms) the decline of Wikipedia may have halted.

The chart above shows the number of new articles per day on the English Wikipedia. During its first five years, this number grew exponentially, but switched to a linear decline in mid-2007. Recently that decline seems to have halted, with an average of 866 new articles per day over the past two and a half years (see here for examples of recent articles). The statistics on the number of active editors tell a similar story.

Whether Wikipedia’s quality problems have stabilised is another story, of course, but it looks like Wikipedia will not be vanishing any time soon.

Archaeology and Statistics

Statistics can be a useful tool in archaeology, as the 1996 book Statistics for Archaeologists: A Common Sense Approach by Robert Drennan points out. Quantifying Archaeology by Stephen Shennan is another book on the subject.

Elsewhere I have discussed the benefits of the R statistical toolkit. The image below uses R to plot some data from Drennan’s book. Specifically, it is a histogram of the lengths of stone scrapers found at two sites (from his Tables 1.9 and 1.10). It can be seen that there is no significant difference between the two archaeological sites involved (red vs blue), but a very clear difference between scrapers made from flint (light, mean length 42.9 mm) vs chert (dark, mean length 18.4 mm). The visual plot summarises the numbers better than the tables can, and R’s statistical tests for significance (which I used to confirm the visual impression) are critically important for testing hypotheses.

The R code for this plot is:

#Group means
m.pc <- mean(Scrapers$Length[Scrapers$Site == "Pine Ridge Cave" & Scrapers$Material == "Chert"])
m.pf <- mean(Scrapers$Length[Scrapers$Site == "Pine Ridge Cave" & Scrapers$Material == "Flint"])
m.wc <- mean(Scrapers$Length[Scrapers$Site == "Willow Flats" & Scrapers$Material == "Chert"])
m.wf <- mean(Scrapers$Length[Scrapers$Site == "Willow Flats" & Scrapers$Material == "Flint"])

#Histograms for each group, using common break points
bks <- 2.5 + 5 * (0:18)
h.pc <- hist(Scrapers$Length[Scrapers$Site == "Pine Ridge Cave" & Scrapers$Material == "Chert"], breaks=bks, plot=FALSE)
h.pf <- hist(Scrapers$Length[Scrapers$Site == "Pine Ridge Cave" & Scrapers$Material == "Flint"], breaks=bks, plot=FALSE)
h.wc <- hist(Scrapers$Length[Scrapers$Site == "Willow Flats" & Scrapers$Material == "Chert"], breaks=bks, plot=FALSE)
h.wf <- hist(Scrapers$Length[Scrapers$Site == "Willow Flats" & Scrapers$Material == "Flint"], breaks=bks, plot=FALSE)

#Matrix of histogram counts, one row per site/material group
mat <- rbind(h.pc$counts, h.pf$counts, h.wc$counts, h.wf$counts)

#Plot matrix as stacked bars, with group means in the legend
legnd <- c(paste("Pine Ridge Cave, Chert (mean ", round(m.pc, digits=1), " mm)", sep=""),
           paste("Pine Ridge Cave, Flint (mean ", round(m.pf, digits=1), " mm)", sep=""),
           paste("Willow Flats, Chert (mean ", round(m.wc, digits=1), " mm)", sep=""),
           paste("Willow Flats, Flint (mean ", round(m.wf, digits=1), " mm)", sep=""))
barplot(mat, space=0, col=c("darkred", "pink", "navy", "skyblue"), legend.text=legnd, names.arg=5*(1:18), ylim=c(0,12),
        cex.names=0.7, ylab="Number of Scrapers", xlab="Scraper Length (mm)", cex.lab=1.3, args.legend=list(cex=0.8))

#Statistical tests: linear model of length on site and material
summary(lm(Length ~ Site + Material, data=Scrapers))

Whither Wikipedia?

Wikipedia is in trouble. The chart above shows why (data from here). During its first five years, the number of active editors on English Wikipedia (those with 5 or more monthly edits) grew exponentially (see the dashed curve, with R² = 0.98). However, these numbers peaked in March 2007, and since then, Wikipedia has experienced a linear decline, with a net loss of about 2,100 editors each year (see the dashed line, with R² = 0.90). Extrapolating this loss suggests that the number of Wikipedia editors will reach zero in January 2028 – although in reality Wikipedia will die much sooner unless something changes.
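The extrapolation is just the zero crossing of a fitted line (a net loss of 2,100 editors per year is about 175 per month). The sketch below uses invented monthly counts, not the real Wikipedia statistics:

```r
# Zero crossing of a fitted linear decline (invented data, for illustration)
set.seed(7)
months  <- 0:119                                  # months since the peak
editors <- 44000 - 175 * months + rnorm(120, sd = 500)
fit <- lm(editors ~ months)
-coef(fit)[["(Intercept)"]] / coef(fit)[["months"]]  # month where the fit reaches zero
```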

Wikipedia’s problems are fundamentally human, and relate to internal conflicts which have caused thousands of editors to leave. Wikipedia’s system of governance has proven unable to deal with this problem, for a variety of reasons.

However, this raises the question of whether technology solutions can ameliorate Wikipedia’s problems. A recent paper on arXiv, discussed also on the MIT Technology Review, suggests automated tools for assessing article quality, based on edit-longevity and contributor-centrality measures.

This is certainly an intriguing idea, but one that fails to catch some spectacular quality failures. For example, one Wikipedia user made more than 87,000 article edits over a period of many years, but was found to have systematically added false information to articles on a wide range of topics (including history and video-games). Simplistic quality measures are likely to view these articles positively – the real indicator of poor quality is that the articles contain references which do not support (or, in many cases, contradict) the statements to which they are attached. In theory an automated tool could detect that, but it would not be easy.

See also this article on Wikipedia’s decline, which suggests that the technology applied to date has been part of the problem [Halfaker, A., Geiger, R. S., Morgan, J., & Riedl, J. (2013). The Rise and Decline of an Open Collaboration System: How Wikipedia’s reaction to sudden popularity is causing its decline. American Behavioral Scientist, 57(5), 664–688. DOI:10.1177/0002764212469365].