
Wednesday, February 07, 2007

The Real Genetic Code

This is the genetic code. It shows the relationship between a sequence of nucleotides in messenger RNA (mRNA), or DNA, and the amino acids that are inserted into a growing polypeptide chain.

Each codon consists of three nucleotides and you read them from 5ʹ ("five prime") to 3ʹ ("three prime"). The first one is on the left of the box, the second one is at the top, and the third one is along the right-hand edge. The genetic code tells you that codon CUU encodes leucine (Leu), and so do codons CUC, CUA, and CUG. (The Genetic Code is redundant.)

The three STOP codons tell the protein synthesis machine to stop making protein. The methionine (Met) codon (AUG) is usually the start codon that tells the machinery to start making a protein. There are a few unusual variants of the genetic code that aren't shown in the figure.
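For readers who like to see this sort of thing spelled out, here is a minimal Python sketch of the same lookup table. It assumes the standard genetic code only (none of the unusual variants), uses one-letter amino acid symbols (L for Leu, M for Met) with `*` marking STOP, and the names `GENETIC_CODE` and `codons_for` are made up for this illustration.

```python
from collections import defaultdict
from itertools import product

# Standard genetic code, with codons ordered U, C, A, G at each position.
# One-letter amino acid symbols; "*" marks a STOP codon.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# The code itself: a lookup table from each of the 64 codons to its meaning.
GENETIC_CODE = {
    "".join(triplet): amino_acid
    for triplet, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Invert the table to see the redundancy: several codons per amino acid.
codons_for = defaultdict(list)
for codon, amino_acid in GENETIC_CODE.items():
    codons_for[amino_acid].append(codon)

print(GENETIC_CODE["CUU"])      # leucine (L)
print(sorted(codons_for["L"]))  # every codon that encodes leucine
print(sorted(codons_for["*"]))  # the three STOP codons
```

Note that leucine is encoded by UUA and UUG as well as the four CUN codons mentioned above, so it has six codons in all.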

The Genetic Code was cracked in the early 1960s when the meaning of each codon was worked out. Since then it has become routine to decode any message in the coding regions of DNA and RNA by simply referring to the genetic code shown above. For example, you can decode the following sequence of RNA if you know that it starts on the left at the initiation codon AUG.

This is the same procedure that we use to translate a string of dots and dashes sent over a telegraph line. The string of dots and dashes is the message, the Morse Code is the lookup table that we use to decode the message. We do not say that the string of dots and dashes is the Morse Code. We say that it's a message encrypted using the Morse Code. Similarly, we do not say that a string of nucleotides is the genetic code. It's the message that's translated using the Genetic Code.
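The distinction between the message and the code can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the RNA string below is invented for the example (it is not the sequence from this post), the function name `translate` is hypothetical, and the sketch simply starts at the first AUG it finds, which is cruder than what real cells do to pick the initiation codon.

```python
from itertools import product

# The CODE: a lookup table (standard genetic code, "*" marks STOP).
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
GENETIC_CODE = {
    "".join(triplet): amino_acid
    for triplet, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna, code=GENETIC_CODE):
    """Decode an mRNA message, read 5' to 3', using the lookup table."""
    start = mrna.find("AUG")  # simplification: take the first AUG as the start
    if start == -1:
        return ""
    protein = []
    # Step through the message one codon (three nucleotides) at a time.
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = code[mrna[i:i + 3]]
        if amino_acid == "*":  # an in-frame STOP codon ends the message
            break
        protein.append(amino_acid)
    return "".join(protein)


# The MESSAGE: a made-up string of nucleotides.
message = "CCAUGCUUGGAUGGUAACC"
print(translate(message))  # MLGW, i.e. Met-Leu-Gly-Trp
```

The dictionary is the code and the string is the message. Swapping in a different string changes the message but leaves the code untouched.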


  1. I'm not a biologist, but I'm curious - what determines how the nucleotides are grouped for decoding (which groups of three), and what determines the decoding direction (maybe the asymmetric nucleotides in the start codon?). Since there are separate start and stop codons, I'm assuming the direction matters.

  2. Great blog; thanks for the effort...

    My favourite visual representation of the genetic code is Ben Fry's, available in a number of versions here:

    It is simple, clear and economical, and makes some of the patterns inherent in the code (such as hydrophobicity) readily apparent. This page does not link to an interactive version I have seen him demonstrate, unfortunately....

    Another great article (to a dilettante like me, anyway) is Brian Hayes' popular review in American Scientist on the optimality of the code's "design":

  3. Blogger cut off the American Scientist URL... You can find the Hayes article by searching for "Brian Hayes genetic code" on the American Scientist site; it's from the Nov-Dec. 2004 issue...

  4. I'm not a biologist, but I'm curious - what determines how the nucleotides are grouped for decoding (which groups of three), and what determines the decoding direction (maybe the asymmetric nucleotides in the start codon?). Since there are separate start and stop codons, I'm assuming the direction matters.

    Yes, the direction matters. The gene is copied into single-stranded messenger RNA and this mRNA has a definite orientation based on how the nucleotides are joined up.

    The part of the mRNA that encodes amino acids for protein synthesis is called the coding region. It consists of adjacent nucleotide triplets (codons) that begin near the 5' end of the mRNA. The other end is called the 3' end.

    The correct frame for reading the codons is determined by the initiation codon, which is located near the beginning of the mRNA. Eukaryotes and prokaryotes have different mechanisms for identifying the true initiation codon but the basic idea is similar. The protein synthesis machinery will only assemble at the correct initiation codon.

    From then on, protein synthesis takes place in a progressive stepwise manner by translating each codon then moving on to the next one. This continues until one of the in-frame termination codons is encountered. At that point the disassembly of the protein synthesis machinery is triggered by special protein factors that bind to the termination codon.

  5. How about a report on the alternative genetic code used by the U-boat corps for extra security?

  6. Encryption or encode? Encode is the correct term to use.

  7. I know this was posted many years ago, but I wanted to ask if you think that the genetic code is in fact a real code. I was in a philosophical discussion with an ID proponent who argued that the genetic code is a real code and therefore must have been coded by a designer. My view as a Biochemist is that, strictly speaking, the term "code" refers to a system of symbolic representations that have no meaning outside human minds and without a convention about the link between the symbols and what they represent. Thus, even though we can assign each codon of the table to the amino acid that it represents, my understanding is that nature doesn't have this ability: the correct tRNA is selected by the strength of tRNA:mRNA pairing, whereas the remaining tRNAs are "seen" indiscriminately as incorrect (poor binding), as no mechanism exists to tell those incorrect tRNAs apart.

    1. It is entirely a matter of definition. What defines a code? The problem with the ID argument is that it is semantic. It attempts to establish that something must have been designed by what label you use to describe it. ID proponents like to insist that if something is a "code", then "it must have been coded by an intelligent designer". Simply point out the elementary logical fact that that simply doesn't follow. It could have been coded by evolution instead.

      You can simply elect to agree to call it a code so you avoid the semantic argument entirely, and just proceed to discuss whether there's any evidence that the genetic code was designed. The label isn't itself evidence for anything, nor are definitions.

      There is good evidence that the genetic code derives from a process of evolution (multiple independent lines of evidence converge on a similar history of code evolution from simpler to more complex), both with respect to the amino acid repertoire it encoded at various stages (from fewer and simpler, to more and more complex amino acids), and in the complexity and attributes of individual translation system components (the molecules that in effect implement the coding system, such as tRNA, the ribosome, aminoacyl-tRNA synthetases, etc.).

      In contrast, there's no attribute of the code or translation system that indicates it was designed. It being a "literal code" isn't evidence for how it came to exist.