Topic Analysis on the New Testament

I have been experimenting recently with latent Dirichlet allocation (LDA) for automatically determining the topics in documents. This is a popular technique, although it works better for some kinds of document than for others. Above is a topic matrix for the Greek New Testament (using the stemmed 1904 Nestle text, removing 47 common words before analysis, and specifying 14 as the number of topics in advance). The size of the coloured dots in the matrix shows the degree to which a given topic can be found in a given book. The topics (and the most important words associated with them) are:
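For anyone who wants to try the technique, here is a minimal sketch of LDA fitted by collapsed Gibbs sampling, using only the Python standard library. It is not the code behind the matrix above: the function name `lda_gibbs`, the hyperparameters `alpha` and `beta`, the stop-word set, and the toy English documents are all assumptions made for illustration. A real analysis would feed in the stemmed Greek text, one token list per book, with 14 topics.

```python
import random

def lda_gibbs(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling on tokenised documents.

    Returns (doc_topic, topic_word_counts, vocab), where doc_topic[d][k]
    is the estimated share of topic k in document d.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    n_words = len(vocab)

    ndk = [[0] * n_topics for _ in docs]             # topic counts per document
    nkw = [[0] * n_words for _ in range(n_topics)]   # word counts per topic
    nk = [0] * n_topics                              # total tokens per topic
    z = []                                           # topic assignment per token

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        assignments = []
        for w in doc:
            k = rng.randrange(n_topics)
            assignments.append(k)
            ndk[d][k] += 1
            nkw[k][word_id[w]] += 1
            nk[k] += 1
        z.append(assignments)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wi = word_id[w]
                k = z[d][i]
                # Remove the token from the counts, then resample its topic.
                ndk[d][k] -= 1
                nkw[k][wi] -= 1
                nk[k] -= 1
                weights = [
                    (ndk[d][t] + alpha) * (nkw[t][wi] + beta) / (nk[t] + n_words * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][wi] += 1
                nk[k] += 1

    doc_topic = [
        [(ndk[d][k] + alpha) / (len(doc) + n_topics * alpha) for k in range(n_topics)]
        for d, doc in enumerate(docs)
    ]
    return doc_topic, nkw, vocab

# Toy documents and stop words, invented for illustration only.
stop_words = {"the", "and", "of"}
raw_docs = [
    "the angel and the throne of the lamb".split(),
    "love of the brother and love of the father".split(),
    "the angel of the throne and the lamb".split(),
]
docs = [[w for w in doc if w not in stop_words] for doc in raw_docs]
doc_topic, topic_word_counts, vocab = lda_gibbs(docs, n_topics=2, n_iter=50)
```

Each row of `doc_topic` plays the role of one column of dots in a topic matrix: the larger the value, the more of that topic the sampler found in the document. With so little toy data the topics themselves are not meaningful.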

A better set of topics can probably be obtained with a bit more experimentation. Alternatively, here (as a simpler form of analysis) are the relative frequencies of some Greek words or sets of words, scaled to the range 0 to 1 for each word set (with the bar chart showing the total number of words in each New Testament book). Not surprisingly, angels appear more frequently in Revelation than anywhere else, while love is particularly frequent in 1 John.
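A minimal sketch of this kind of scaled frequency calculation follows. It assumes that "scaled to the range 0 to 1" means dividing each book's relative frequency by the largest value for that word set, which the post does not spell out. The function name `scaled_frequencies`, the book labels, and the token lists are invented for illustration and are not real counts.

```python
def scaled_frequencies(books, word_set):
    """Relative frequency of a word set in each book, divided by the maximum.

    books maps a book name to its list of (stemmed) tokens.
    """
    relative = {}
    for name, tokens in books.items():
        hits = sum(1 for t in tokens if t in word_set)
        relative[name] = hits / len(tokens) if tokens else 0.0
    top = max(relative.values(), default=0.0)
    # Assumption: scale by the largest value so the top book scores 1.
    return {name: (f / top if top else 0.0) for name, f in relative.items()}

# Toy token lists, invented for illustration (not taken from the text).
books = {
    "Book A": ["αγγελ", "λογ", "αγγελ", "θε"],
    "Book B": ["αγαπ", "θε", "αγαπ", "αγαπ", "λογ", "αγγελ"],
}
angels = scaled_frequencies(books, {"αγγελ"})
love = scaled_frequencies(books, {"αγαπ"})
```

Scaling each word set separately makes rare and common words comparable on one chart, at the cost of hiding how frequent each word is in absolute terms.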


Would y’all like to see a map?

The word y’all is used as a second person plural pronoun in the United States (although in my travels I have also heard it used as a polite singular). The map above shows the average frequency of use by state, according to the 2003 Harvard Dialect Survey. The usage is primarily Southern.

English needs a second person plural pronoun, it seems to me. What do all-y’all think?

Image produced using the maps package of R. Other visualisations of the survey exist.


Bridges, gender, and Benjamin Lee Whorf

I’ve long been fascinated by the Sapir-Whorf hypothesis – the idea that the structure of language determines (or at least influences) the way that you think. I first read Whorf’s book several decades ago.

A friend recently pointed me at this TED talk by Lera Boroditsky. After years of being sneered at, it seems that Whorf is back in fashion.

And there’s certainly something to Whorf’s ideas. For example, there is solid evidence that the way that you name colours influences the way that you see them (slightly, anyway). There is some exaggeration in the TED talk, though. Australian Aboriginal speakers of Kuuk Thaayorre have a unique way of describing directions (in absolute rather than relative terms, e.g. “there is an ant on your northern leg”). They also navigate well across their tribal lands. But is there a causal relationship? Do Aboriginal people with this linguistic feature navigate better than those without it? No, they don’t.

Even stranger is the idea that Spanish speakers, for whom a bridge is masculine (el puente), are less likely to describe a bridge as “beautiful,” and more likely to describe it as “strong,” than German speakers, for whom a bridge is feminine (die Brücke). There really are way too many confounding factors there – people who speak different languages differ in other ways too. So I thought I’d try a quick-and-dirty experiment of my own.

For a set of 17 languages, I counted Google hits for the phrases “beautiful bridge” (e.g. French: beau pont, German: schöne Brücke) and “strong bridge” (e.g. Greek: ισχυρή γέφυρα, Dutch: sterke brug), divided one set of numbers by the other, and took the logarithms of those ratios. The chart below summarises the results. Languages in pink have a feminine bridge, languages in blue have a masculine bridge, and languages in grey have a bridge which is neither (for example, English has no gender, while Dutch and Swedish have merged masculine and feminine into a “common” gender).
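The arithmetic is simple enough to sketch. The example below assumes natural logarithms (the post does not state the base) and uses invented hit counts for two unnamed languages. The function name `log_hit_ratio` is hypothetical.

```python
import math

def log_hit_ratio(beautiful_hits, strong_hits):
    """Log of the ratio of "beautiful bridge" hits to "strong bridge" hits.

    Positive values mean "beautiful bridge" is the more common phrase.
    Assumption: natural logarithm; another base only rescales the values.
    """
    if beautiful_hits <= 0 or strong_hits <= 0:
        raise ValueError("hit counts must be positive")
    return math.log(beautiful_hits / strong_hits)

# Invented hit counts, for illustration only: (beautiful, strong).
hits = {
    "language X": (2000, 1000),
    "language Y": (500, 500),
}
scores = {lang: log_hit_ratio(b, s) for lang, (b, s) in hits.items()}
```

Taking logarithms makes the scale symmetric: twice as many "beautiful" hits and twice as many "strong" hits give scores of equal size and opposite sign.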

The mean values there are 0.95, 1.14, and 1.60, where positive numbers mean more hits with “beautiful bridge” (i.e. the trend runs the opposite way from the prediction), but none of the differences are statistically significant (p > 0.4). Gender does not seem to influence perceptions of bridges.
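One way to check whether two group means differ by more than chance is a permutation test, sketched below. This is not necessarily the test behind the p-value quoted above, which the post does not name. The function name `permutation_p_value` and the group values are invented for illustration.

```python
import random

def permutation_p_value(a, b, n_resamples=5000, seed=0):
    """Two-sided permutation test for a difference in group means."""
    def mean(values):
        return sum(values) / len(values)

    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    at_least_as_extreme = 0
    for _ in range(n_resamples):
        # Shuffle the group labels and recompute the difference in means.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:len(a)]) - mean(pooled[len(a):]))
        if diff >= observed - 1e-12:
            at_least_as_extreme += 1
    # Add one to both counts so the estimate is never exactly zero.
    return (at_least_as_extreme + 1) / (n_resamples + 1)

# Invented log-ratio values for two gender groups, for illustration only.
feminine = [0.2, 1.1, 1.5, 0.9]
masculine = [1.3, 0.4, 1.8, 1.0]
p = permutation_p_value(feminine, masculine)
```

A large `p` means that shuffling the gender labels produces differences as big as the observed one quite often, which is the sense in which a difference is "not statistically significant".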

Interestingly, if we exclude the international languages English and Spanish, there is actually a statistically significant (but weak) correlation with GDP of the relevant nation (p = 0.029, r = 0.58). On the whole, poorer countries are more likely to describe a bridge as “strong,” and wealthier countries as “beautiful.” That makes sense, if you think about it (although Iceland is an exception to this pattern).
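For reference, a correlation coefficient of this kind can be computed as in the sketch below, which implements Pearson's r and the usual t statistic for testing it with n − 2 degrees of freedom. Whether the figure above is a Pearson correlation, and which GDP measure was used, are assumptions. The data values and function names are invented for illustration.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def t_statistic(r, n):
    """t statistic for testing r against zero, with n - 2 degrees of freedom."""
    if abs(r) >= 1.0:
        raise ValueError("t is undefined for a perfect correlation")
    return r * math.sqrt((n - 2) / (1.0 - r * r))

# Invented values, for illustration only: GDP figures and log ratios.
gdp = [10.0, 20.0, 30.0, 40.0, 50.0]
log_ratios = [0.5, 0.9, 0.8, 1.4, 1.6]
r = pearson_r(gdp, log_ratios)
t = t_statistic(r, len(gdp))
```

Turning `t` into a p-value needs the Student t distribution, which the standard library does not provide, so that step is left out here.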

How about you? Is the bridge beautiful, or strong?