Friday, March 14, 2008

Evolution and Variation in Folded Proteins

As a general rule, the primary structure of a protein (amino acid sequence) ultimately determines its final three-dimensional shape¹. Proteins fold spontaneously to adopt a specific structure that minimizes free energy. The folded protein occupies the bottom of a free energy well [How Proteins Fold, The Anfinsen Experiment in Protein Folding, Disulfide Bridges Stabilize Folded Proteins, Heat Shock and Molecular Chaperones].

Each protein has a characteristic shape associated with its function. When we discuss the evolution of proteins, we like to divide the residues into three categories as shown below for the structure of myoglobin from sperm whale (Physeter catodon) [PDB 1A6M].

Myoglobin is a small protein with a bound heme group (shown as a space-filling molecule). It binds and stores oxygen in muscle tissue. The oxygen molecule binds to the active site of the protein near one side of the heme group. There are specific amino acid residues at the active site that are absolutely required for binding oxygen. As you might expect, these amino acids are highly conserved: they will be found at that position in myoglobin from humans or any other species.

The second category of amino acid residues makes up the hydrophobic interior of the protein. Myoglobin is an all-α-helical protein and several of the helices group together to form a helix bundle. The interior of that bundle consists largely of hydrophobic amino acid residues. This is what stabilizes the three-dimensional structure and causes the polypeptide chain to spontaneously fold after it is synthesized.

The third category of residues is the surface residues. These are usually hydrophilic residues that interact with the surrounding water. The surface residues don't make as much of a contribution to the overall three-dimensional structure so their exact composition can be quite variable.

The class of proteins to which myoglobin belongs is called "globins." There are two other globins that you are probably familiar with: α-globin and β-globin are the two polypeptides that come together to form an α₂β₂ hemoglobin tetramer.

The three proteins (myoglobin, α-globin, and β-globin) descended by gene duplication from a common ancestral globin several hundred million years ago. Today their amino acid sequences are quite different due to the accumulation of random mutations and fixation by random genetic drift. In spite of the differences in primary structure, the three-dimensional structures of the three proteins are very similar. This can easily be shown by superimposing the three structures as shown in the figure (myoglobin=green, α-globin=blue, β-globin=purple).

Most people don't appreciate the amount of variation that underlies this conserved three-dimensional structure. It's worth taking a look at a bunch of aligned globin sequences from different species to see exactly which amino acids are highly conserved and which positions can tolerate almost any amino acid.

Let's go to the Pfam (protein family) database at the Sanger Institute in Cambridge (UK). The entry for the globin family is Globin PF00042. Click on "Alignments" in the left sidebar. This link takes you to the alignment page where you can create an alignment of all the known globin sequences. Choose 75 seeds (default) in the first table and select "Pfam viewer" from the pull-down menu under "Viewer." Click "View" to see the alignments.

Highly conserved amino acid residues are highlighted by vertical shading in the Pfam view. The first thing you should notice is that there are very few amino acids that are invariant. The conserved residue on the left (blue) is tryptophan (W). It's present in most of the globins from different species but not all. Look at the other positions and note that in most cases a variety of different amino acid residues can be substituted. Sometimes only hydrophobic residues (blue) can be found at a particular site and sometimes there are other restricted choices. Lots of insertions and deletions (dots) can be tolerated without major disruption to the overall three-dimensional structure.

Data like these reveal that the amino acid residues in the active site are usually conserved. Residues in the hydrophobic core are moderately conserved. And residues on the surface are hardly conserved at all.
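If you'd rather put a number on "conserved" than eyeball the shading, one common approach is to compute the Shannon entropy of each alignment column. Here is a minimal sketch in Python. The eight-column alignment is a made-up toy example, not real globin sequences, and the names `ALIGNMENT`, `column_entropy`, and `conservation_profile` are my own. An entropy of 0 bits means the column is invariant; higher values mean more amino acids are tolerated at that position.

```python
from collections import Counter
from math import log2

# Hypothetical toy alignment (NOT real globin sequences); '.' marks a gap.
ALIGNMENT = [
    "WHLKE.GA",
    "WHVRDSGA",
    "WHIQE.GS",
    "WHLTNKGT",
    "FHVEDAGA",
]

def column_entropy(column):
    """Shannon entropy (bits) of the residues in one column, ignoring gaps."""
    residues = [aa for aa in column if aa != "."]
    counts = Counter(residues)
    total = len(residues)
    # An empty column (all gaps) yields an empty sum, i.e. 0.
    return sum(-(n / total) * log2(n / total) for n in counts.values())

def conservation_profile(alignment):
    """Return one entropy value per alignment column (0.0 = invariant)."""
    return [column_entropy(col) for col in zip(*alignment)]

profile = conservation_profile(ALIGNMENT)
for position, entropy in enumerate(profile, start=1):
    print(f"column {position}: {entropy:.2f} bits")
```

To try this on real data you would replace the toy list with sequences exported from an alignment; the gap character may differ depending on the export format.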

The point is that there are literally billions of different proteins that have the same shape as globins and still function as carriers of oxygen. This is an important point. Opponents of evolution often take a single globin from a single species and calculate the probability that such a structure will form. They assume that only one out of twenty amino acids can be found at each position and the resulting probability (e.g., 200^20) is enormous. Thus, they conclude, such a protein could never form by chance. They don't seem to appreciate the fact that we already know of billions of different proteins that can function as globins.
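The arithmetic is easy to sketch. For a chain of 200 residues with 20 possible amino acids at each position there are 20^200 possible sequences, and the naive calculation assumes exactly one of them works. The Python snippet below compares that with a toy model in which most positions tolerate several residues. The split into 10 invariant, 90 core, and 100 surface positions, and the tolerance values of 5 and 15, are illustrative assumptions I made up for the example, not measurements from any real alignment.

```python
from math import log10

CHAIN_LENGTH = 200  # residues in the hypothetical chain
ALPHABET = 20       # standard amino acids

def log10_sequence_count(tolerated_per_position):
    """log10 of the number of sequences when position i accepts
    tolerated_per_position[i] different amino acids."""
    return sum(log10(n) for n in tolerated_per_position)

# Every possible sequence of this length: 20^200.
all_sequences = log10_sequence_count([ALPHABET] * CHAIN_LENGTH)

# Naive assumption: one acceptable amino acid per position -> 1 sequence.
one_exact = log10_sequence_count([1] * CHAIN_LENGTH)

# Assumed toy model: 10 invariant active-site positions, 90 core positions
# accepting 5 residues each, 100 surface positions accepting 15 each.
tolerated = [1] * 10 + [5] * 90 + [15] * 100
functional = log10_sequence_count(tolerated)

print(f"all sequences:        about 10^{all_sequences:.0f}")
print(f"naive 'functional':   10^{one_exact:.0f}")
print(f"toy-model functional: about 10^{functional:.0f}")
```

Whatever the real tolerances are, the number of acceptable sequences is the product of the choices at each position, so even modest flexibility moves the count from one sequence to an astronomically large family.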

There are many other examples of this observation. The four structures below show the conformation of the cytochrome c polypeptide chain from tuna, rice, yeast, and a bacterium. The amino acid sequences have diverged considerably from their common ancestor of 3 billion years ago but the structures are very similar.

We conclude that the amino acid sequence of a polypeptide determines how it will fold in three-dimensional space but there are billions of different amino acid sequences that will adopt the same structure.

Finally, let's look at a more complicated example. The enzymes lactate dehydrogenase (below left) and malate dehydrogenase (below right) share a common ancestor even though they are different enzymes. This is a case where substitutions of amino acid residues in the active site gave rise to a new activity. Today the amino acid sequence similarity is barely above the threshold for defining homology but the structures are still very similar.




1. Other factors that contribute are bound ligands, such as heme groups, and interactions with other proteins, as in multimeric proteins with different subunits.

7 comments:

  1. In contrast, one can consider the recent transitive homology studies of Cro proteins carried out by the Cordes lab (see PNAS for their paper). They found two proteins with 40% sequence identity and identical functions, but very different folds. That this is surprising essentially proves the rule that we expect enormous sequence flexibility for a given structure.

    Temperature and ionic strength have also been found to occasionally dictate structure, and not just in terms of denaturation or aggregation. Brian Volkman's work on lymphotactin indicates that these conditions can cause transitions between different stably folded structures.

  2. I've been waiting for this post. This point needs following up:

    The point is that there are literally billions of different proteins that have the same shape as globins and still function as carriers of oxygen. This is an important point. Opponents of evolution often take a single globin from a single species and calculate the probability that such a structure will form. They assume that only one out of twenty amino acids can be found at each position and the resulting probability (e.g., 200^20) is enormous. Thus, they conclude, such a protein could never form by chance. They don't seem to appreciate the fact that we already know of billions of different proteins that can function as globins.

    I'll keep tabs on it!

  3. Richard Dawkins makes the same point in The Ancestor's Tale, p.589 (paperback)

    Indeed, there are lots of different amino acid sequences that will yield the same shape, which is one reason to doubt naive calculations of the astronomical 'improbability' of a particular protein chain, obtained by raising 20 to the power of its length.

    If I remember correctly that is exactly how Durston derives his estimates of improbability, which is why I was interested in knowing if the spring smackdown was still in the works.

    It's as if I lost a hand holding 3 deuces to someone who had a four of clubs, a four of hearts, a four of diamonds, a King of spades and an eight of hearts; and concluded that I needed exactly that hand to beat three deuces, when in fact there is a whole family of hands containing three of something that could do the job.

  4. Larry, have you any opinions on the hypothesis of Drummond that expression levels are important in the rate of evolution of coding sequences, principally through the negative effects of high levels of misfolded proteins?
    http://www.pubmedcentral.nih.gov/picrender.fcgi?tool=pmcentrez&artid=1242296&blobtype=pdf

  5. A paper that describes a genetic basis for the malate/lactate change you describe in the post: http://mbe.oxfordjournals.org/cgi/content/full/21/3/489

  6. Martinc asks,

    Larry, have you any opinions on the hypothesis of Drummond that expression levels are important in the rate of evolution of coding sequences, principally through the negative effects of high levels of misfolded proteins?

    Of course I have an opinion! :-)

    Silent Mutations and Neutral Theory

    Biology is messy. For every generality there will be many exceptions. The trick is not to let the existence of rare exceptions distort one's view of the big picture.

    Most silent mutations are neutral and so are most amino acid substitutions. The fact that some of these will affect protein folding and, therefore, the rate of gene expression does not mean that most of them will. Similarly, it is well known that some codons are translated faster than others, but this doesn't seem to be very important in the vast majority of cases.

  7. I think you mean 20^200 (20 possible residues in each of 200 different positions), not 200^20 (200 possible residues in 20 positions). 20^200 is 2^20 x 20^160 times bigger! Not that that helps the creationists much.
