The genetic code and its variations

The discovery of the structure of DNA by Watson and Crick in 1953 and the Crick, Brenner, Barnett, Watts-Tobin experiment of 1961, which showed that the code is read in triplets, paved the way for the elucidation of the genetic code, shown above. Triplets of the RNA letters “U,” “C,” “A,” and “G” (or the DNA letters “T,” “C,” “A,” and “G”) encode the 20 amino acids Alanine (Ala/A), Arginine (Arg/R), Asparagine (Asn/N), Aspartic acid (Asp/D), Cysteine (Cys/C), Glutamic acid (Glu/E), Glutamine (Gln/Q), Glycine (Gly/G), Histidine (His/H), Isoleucine (Ile/I), Leucine (Leu/L), Lysine (Lys/K), Methionine (Met/M), Phenylalanine (Phe/F), Proline (Pro/P), Serine (Ser/S), Threonine (Thr/T), Tryptophan (Trp/W), Tyrosine (Tyr/Y), and Valine (Val/V), plus “Stop” signals. Consequently, long strings of DNA letters map to long strings of amino acids (that is, to proteins).
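The mapping from triplets to amino acids can be sketched in a few lines of Python. This is only an illustration: the 64-letter string below writes out the standard code with the bases taken in the order T, C, A, G, and the names `CODON_TABLE` and `translate`, as well as the example sequence, are made up for this sketch.

```python
from itertools import product

BASES = "TCAG"
# Amino acids of the standard code, one letter per triplet, with triplets
# listed in TCAG order (TTT, TTC, TTA, TTG, TCT, ...). "*" marks a Stop.
STANDARD_AAS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(triplet): aa
    for triplet, aa in zip(product(BASES, repeat=3), STANDARD_AAS)
}

def translate(dna: str) -> str:
    """Translate a DNA (or RNA) string triplet by triplet, up to the first Stop."""
    dna = dna.upper().replace("U", "T")  # accept RNA letters as well
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)

print(translate("ATGGCCGAATGGTAA"))  # Met-Ala-Glu-Trp, then Stop: prints MAEW
```

The sketch assumes that reading starts at the first letter and ignores any incomplete triplet at the end of the string.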

The genetic code is largely standard, but nevertheless comes in several variations, as shown above. For example, the “Stop” triplet UGA codes for Cysteine (Cys/C) in the nuclei of certain ciliate protozoa, and for Tryptophan (Trp/W) within most mitochondria (mitochondria not only have their own DNA, but also their own genetic codes for interpreting it). In the table above, colour indicates the number of alternative meanings for each triplet, and “+” signs give a rough indication of how common an alternative is. The table shows several intriguing patterns.
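One way to picture a variant code is as the standard code plus a short list of reassigned triplets. The Python sketch below takes that view for two variants, the vertebrate mitochondrial code and the nuclear code of euplotid ciliates. The names `VARIANTS`, `code_table` and `translate` are invented for the illustration, and the example sequence is made up.

```python
from itertools import product

BASES = "TCAG"
CODONS = ["".join(triplet) for triplet in product(BASES, repeat=3)]
# Standard code in TCAG order; "*" marks a Stop.
STANDARD_AAS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
STANDARD = dict(zip(CODONS, STANDARD_AAS))

# Each variant is described by its differences from the standard code.
VARIANTS = {
    "standard": {},
    "vertebrate mitochondrial": {"TGA": "W", "ATA": "M", "AGA": "*", "AGG": "*"},
    "euplotid nuclear": {"TGA": "C"},
}

def code_table(name: str) -> dict:
    """Return the full 64-triplet table for the named code."""
    return {**STANDARD, **VARIANTS[name]}

def translate(dna: str, name: str = "standard") -> str:
    """Translate DNA under the named code, up to the first Stop."""
    table = code_table(name)
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = table[dna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)

# The same DNA gives three different results, because TGA (UGA in RNA)
# is read as Stop, Trp or Cys depending on the code.
for name in VARIANTS:
    print(f"{name:26s} {translate('ATGTGAGCCTAA', name)}")
```

Storing only the differences keeps each variant short, since most triplets keep their standard meaning.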

The diagram below uses multi-dimensional scaling (with R) to visualise the differences between the various genetic codes, with the standard code in blue. The mitochondrial codes (yellow) show substantial variation compared with the nuclear codes (green). These variations in the mitochondrial codes are believed to be the result of random genetic drift.
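The diagram itself was produced with R. As a rough illustration of the technique only, here is a Python sketch using NumPy that rests on two assumptions not stated above: that the distance between two codes is the number of triplets they translate differently, and that the scaling is classical (metric) multi-dimensional scaling, which is what R's `cmdscale` performs. Only a handful of codes are included, and all names in the code are made up for the sketch.

```python
from itertools import product
import numpy as np

BASES = "TCAG"
CODONS = ["".join(triplet) for triplet in product(BASES, repeat=3)]
STANDARD_AAS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)

# A small sample of codes, each given as its differences from the standard.
REASSIGNMENTS = {
    "standard": {},
    "vertebrate mito": {"AGA": "*", "AGG": "*", "ATA": "M", "TGA": "W"},
    "yeast mito": {"ATA": "M", "CTT": "T", "CTC": "T", "CTA": "T",
                   "CTG": "T", "TGA": "W"},
    "mold/protozoan mito": {"TGA": "W"},
    "invertebrate mito": {"AGA": "S", "AGG": "S", "ATA": "M", "TGA": "W"},
    "ciliate nuclear": {"TAA": "Q", "TAG": "Q"},
    "euplotid nuclear": {"TGA": "C"},
}

def code_string(changes: dict) -> str:
    """Return a code as a 64-letter string in TCAG triplet order."""
    table = dict(zip(CODONS, STANDARD_AAS))
    table.update(changes)
    return "".join(table[c] for c in CODONS)

names = list(REASSIGNMENTS)
codes = [code_string(REASSIGNMENTS[n]) for n in names]

# Assumed distance: number of triplets with different meanings (Hamming).
D = np.array([[sum(a != b for a, b in zip(x, y)) for y in codes]
              for x in codes], dtype=float)

def classical_mds(D: np.ndarray, k: int = 2) -> np.ndarray:
    """Classical MDS: double-centre the squared distances, then keep the
    k leading eigenvectors scaled by the square roots of their eigenvalues."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (D ** 2) @ J
    eigvals, eigvecs = np.linalg.eigh(B)
    order = np.argsort(eigvals)[::-1][:k]
    lam = np.clip(eigvals[order], 0.0, None)  # guard against negative values
    return eigvecs[:, order] * np.sqrt(lam)

coords = classical_mds(D)
for name, (x, y) in zip(names, coords):
    print(f"{name:20s} {x:7.2f} {y:7.2f}")
```

The two output columns are the coordinates one would plot. With a different distance measure or a different set of codes the layout would change, so the result should not be expected to match the diagram.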

The tree below (the result of neighbour-joining with R) offers a somewhat less informative view of the differences between the various genetic codes: