Topic Analysis on the New Testament

I have been experimenting recently with latent Dirichlet allocation (LDA) for automatic determination of topics in documents. This is a popular technique, although it works better for some kinds of document than for others. Above is a topic matrix for the Greek New Testament (using the stemmed 1904 Nestle text, removing 47 common words before analysis, and specifying 14 as the number of topics in advance). The size of the coloured dots in the matrix shows the degree to which a given topic can be found in a given book. The topics (and the most important words associated with them) are:
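For readers who want to try the technique, here is a minimal sketch of LDA fitted by collapsed Gibbs sampling, using only the Python standard library. It is not the code behind the matrix above: the function name `lda_gibbs`, the hyperparameters `alpha` and `beta`, the iteration count, and the toy English "documents" are all assumptions made for illustration. A real analysis would use the stemmed Greek text and probably a dedicated library.

```python
import random
from collections import Counter


def lda_gibbs(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Toy collapsed Gibbs sampler for LDA (illustrative, not optimised).

    docs: list of token lists (already stemmed, common words removed).
    Returns (proportions, topic_word): smoothed topic proportions for each
    document, and a Counter of word counts for each topic.
    """
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})

    # Count tables used by the sampler.
    doc_topic = [[0] * n_topics for _ in docs]
    topic_word = [Counter() for _ in range(n_topics)]
    topic_total = [0] * n_topics

    # Start by assigning every token to a random topic.
    assignments = []
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            z = rng.randrange(n_topics)
            zs.append(z)
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assignments.append(zs)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove this token's current assignment from the counts.
                z = assignments[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                # Unnormalised probability of each topic for this token.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(n_topics)
                ]
                # Resample the topic and put the token back.
                z = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1

    # Smoothed share of each topic in each document (rows sum to 1).
    proportions = [
        [(c + alpha) / (sum(row) + n_topics * alpha) for c in row]
        for row in doc_topic
    ]
    return proportions, topic_word


# Made-up toy corpus: three tiny "books" and two topics.
docs = [
    "angel throne lamb angel seal".split(),
    "love brother love commandment".split(),
    "angel seal throne".split(),
]
proportions, topic_word = lda_gibbs(docs, n_topics=2)
for row in proportions:
    print([round(p, 2) for p in row])
for counts in topic_word:
    print(counts.most_common(3))
```

Each row of `proportions` plays the role of one column of dots in a topic matrix, and `most_common` gives the words most strongly associated with each topic. As with the real analysis, the number of topics has to be chosen in advance.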

A better set of topics can probably be obtained with a bit more experimentation. Alternatively, here (as a simpler form of analysis) are the relative frequencies of some Greek words or sets of words, scaled to the range 0 to 1 for each word set (with the bar chart showing the total number of words in each New Testament book). Not surprisingly, angels appear more frequently in Revelation than anywhere else, while love is particularly frequent in 1 John:
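The scaled relative frequencies can be sketched in a few lines. This is only one plausible reading of "scaled to the range 0 to 1": it assumes each book's relative frequency is divided by the largest value for that word set, so that the book with the highest frequency gets 1. The function name `scaled_relative_frequencies` and the toy English tokens are hypothetical.

```python
def scaled_relative_frequencies(book_tokens, word_set):
    """Relative frequency of a word set in each book, scaled so the peak is 1.

    book_tokens: dict mapping book name to a list of (stemmed) tokens.
    word_set: set of word forms to count together.
    Assumes scaling by the maximum; empty books are skipped.
    """
    relative = {
        book: sum(1 for t in tokens if t in word_set) / len(tokens)
        for book, tokens in book_tokens.items()
        if tokens
    }
    peak = max(relative.values(), default=0)
    if peak == 0:
        # The word set never occurs, so every book scores zero.
        return {book: 0.0 for book in relative}
    return {book: freq / peak for book, freq in relative.items()}


# Made-up toy data standing in for the tokenised books.
books = {
    "Revelation": ["angel", "throne", "angel", "lamb"],
    "1 John": ["love", "brother", "love", "love"],
    "Mark": ["angel", "love", "boat", "boat"],
}
scaled = scaled_relative_frequencies(books, {"angel"})
print(scaled)  # {'Revelation': 1.0, '1 John': 0.0, 'Mark': 0.5}
```

Dividing by the length of each book is what makes short and long books comparable, which is why the total word counts are shown separately in the bar chart.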


The 2019 Eurovision Song Contest is on right now. Above is a combined word cloud for the songs (or English translations of the songs).

From the point of view of getting into the final, it seems to be bad to sing about Heaven (Montenegro, Portugal), war (Croatia, Finland), cell phones (Belgium, Portugal), or cold (Latvia, Poland, Romania). On the other hand, it’s good to sing about lights (Germany, Norway, Sweden).

Good luck to everyone for the final!

A Brief History of Science in English Words

Inspired by this book, here is a brief history of science in ten English words:

Alembic (14th century). The word “alembic” comes to us from the Greek word ἄμβιξ (ambix) via the Arabic الأنبيق (al-anbīq). See this Google ngram and this dictionary entry. As with “algebra,” “Alnitak,” and “alizarin,” the Arabic definite article “al” in the name of this forgotten item of laboratory equipment is a reminder of the debt which medieval European science owes to the Islamic world.

Atom (15th century). The word “atom” also comes from Greek. A school of Greek philosophers used “a-tomos” (“un-cuttable”) as the name for hypothetical indivisible units of matter. The word was revived in 1805 by the English chemist John Dalton, giving it the meaning it still has in modern chemistry (though without any knowledge of atomic structure). See this Google ngram and this dictionary entry. A later age was to give us “atom bomb.”

Fossil (1610s). Originally referring to anything dug up from the ground, and coming to us from Latin via French, the word “fossil” gradually transformed itself into the modern meaning as people became more and more interested in digging up fossilized plants and animals. Geological theories about the formation of these fossils then gave us the verb “fossilize.” See this Google ngram and this dictionary entry.

Microscope (1650s). The microscope was invented around 1590 in the Netherlands, and pioneering microscopic work was done by Antonie van Leeuwenhoek (1632–1723) and Robert Hooke (1635–1703). The word itself comes from the Greek μικρός (mikrós, small) and σκοπεῖν (skopeîn, to see). See this Google ngram and this dictionary entry. From the same era, thanks to Galileo, we get “telescope.”

Stamen (1660s). The word “stamen” was adopted from Latin to refer to the (male) pollen-producing organ of a flower. The tip of the stamen is called an “anther” (from Greek via French). See this Google ngram and this dictionary entry. The increasing scientific interest in the internal structure of flowers led to the enormously important taxonomic work of Carl Linnaeus.

Metre (1797). The “metre,” as a new unit of measurement, was proposed by the French Academy of Sciences in 1791, and defined to be 1/10,000,000 of the distance between the Equator and the North Pole (measured via Paris). Today, the metre is defined to be the distance travelled by light in a vacuum during 1/299,792,458 of a second. See this Google ngram and this dictionary entry. The Système International d’Unités has also given us “litre,” “gram,” and many units of measurement named after scientists.

Burette (1836). The word “burette” (and likewise “pipette”) comes to us from French, specifically from an 1824 paper by the French chemist Joseph Louis Gay-Lussac (see this Google ngram and this dictionary entry). The design of the instrument we use today is due to Karl Friedrich Mohr, but the name serves as a reminder of the significant French contributions to chemistry.

Nova (1877). The adjective “nova” (Latin feminine singular for “new”) has a long history. After being applied as an adjective to new stars, it became a noun in its own right around 1877 (see this Google ngram and this dictionary entry). The word “supernova” followed in 1934.

Transistor (1948). The transistor was invented at Bell Labs in 1947. John R. Pierce suggested the word the following year, by analogy with words such as “resistor,” and it was adopted after a survey of selected Bell Labs staff (see this Google ngram and this dictionary entry). A decade later, “transistor radio” appeared, as the word “transistor” began to represent a new electronic age.

Laser (1959). The word “laser” was coined in 1957 by Gordon Gould and first used publicly in 1959. It was originally an acronym for “Light Amplification by Stimulated Emission of Radiation,” by analogy with “maser” (see this Google ngram and this dictionary entry). The first working laser was developed by Theodore Maiman and Irnee D’Haenens in 1960. A few years later, this noun spawned the verb “lase.”

Perhaps as a result of what C. P. Snow called “The Two Cultures,” the past century seems to have seen a movement away from Greek and Latin borrowings. The increasing dominance of English has also meant fewer borrowings from modern languages (like “burette”). And with the development of totally new devices and totally new concepts, invented words like “laser” and “gluon” seem to have become more common.

Three little words explained


The word “traffic” means “all the cars except mine.” As the old slogan goes, “you’re not stuck in traffic, you are traffic.”


The word “away” (as in “I’ll throw it away”) means “where someone else can deal with it.”


The word “bug” (as in “software bug”) means “my stupid mistake,” but suggests that an error somehow crawled (or flew) into my program without my being responsible, which is sometimes a convenient fantasy.