Sunday, January 22, 2012

The Modern Molecular Clock

The first molecular phylogenetic trees were constructed from the amino acid sequences of small proteins. One of those proteins was cytochrome c and it turned out to be very useful because homologues could be found in all species, including bacteria.

The original trees were published by Emanuel Margoliash, but I'm showing a later version here from Fitch and Margoliash (1967). This is a very famous tree that's found in many textbooks. (The version shown here is from Mulligan (2008).)

From the very beginning, the authors of these molecular phylogenetic trees noted that the rate of change in each lineage was approximately constant. You can see that in the tree shown here. The number of changes in the lineage leading to yeast (Saccharomyces) is 17+10+2=29 from the common root. The number of changes in the lineage leading to insects is 31 or 28, depending on the species. The number leading to humans and monkeys is 32.

All modern species appear to be at the ends of lineages that have evolved at a relatively constant rate since they diverged from a common ancestor. This result was surprising since most biochemists thought of natural selection as the main mechanism of evolution. How is it possible that the environments of insects, yeast, and primates change at a constant rate, causing natural selection to make the same number of changes in each lineage over millions of years?

The same result was observed when bacteria were added to the tree a few years later. There's an approximate molecular clock. Of course by that time Kimura and others (including Fitch) had published on Neutral Theory and that explained the approximate molecular clock. The changes in amino acid sequence are neutral and they become fixed by random genetic drift. Since drift is a stochastic process, the rate of fixation of these neutral alleles is approximately constant over time.[1] The astonishing conclusion, which most people still don't grasp, is that the vast majority of all evolutionary change is by random genetic drift, not natural selection.
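
The footnoted claim, that the neutral substitution rate equals the mutation rate regardless of population size, can be checked with a small simulation. What follows is only a minimal sketch under a textbook Wright-Fisher model of a diploid population, which this post doesn't itself specify. The function names, population sizes, and trial counts are illustrative assumptions, not anything taken from the papers cited here.

```python
import random


def fixation_probability(n_diploid, trials, rng):
    """Estimate the chance that a single new neutral allele drifts to fixation.

    Assumes a Wright-Fisher population of n_diploid individuals, i.e.
    2 * n_diploid gene copies, with no selection and no further mutation.
    """
    copies_total = 2 * n_diploid
    fixed = 0
    for _ in range(trials):
        copies = 1  # the new mutation starts as one copy
        while 0 < copies < copies_total:
            freq = copies / copies_total
            # Each gene copy in the next generation is drawn independently
            # from the current allele frequency (binomial resampling).
            copies = sum(rng.random() < freq for _ in range(copies_total))
        if copies == copies_total:
            fixed += 1
    return fixed / trials


def neutral_substitution_rate(n_diploid, mu):
    """Expected neutral substitutions per generation.

    New neutral mutations per generation (2N * mu) times the fixation
    probability of each one (1 / 2N); the population size cancels.
    """
    new_mutations = 2 * n_diploid * mu
    p_fix = 1 / (2 * n_diploid)
    return new_mutations * p_fix


if __name__ == "__main__":
    rng = random.Random(2012)  # fixed seed so reruns give the same estimate
    n = 10  # deliberately tiny population so the simulation finishes quickly
    print("simulated fixation probability:", fixation_probability(n, 4000, rng))
    print("expected 1/(2N):", 1 / (2 * n))
    for size in (10, 1_000, 1_000_000):
        # mu = 1e-8 is an arbitrary illustrative mutation rate
        print(size, neutral_substitution_rate(size, 1e-8))
```

The simulated fixation probability should land close to 1/(2N), and the computed substitution rate stays at μ whether the population has ten individuals or a million.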

The discovery of an approximate molecular clock led immediately to attempts to calibrate it with respect to time. Margoliash (1963) was the first to do this using cytochrome c sequences. Here's a table from his paper ...


Note that the number of changes from yeast to all of the animals is approximately 44 amino acid substitutions. (The Fitch and Margoliash distances in the chart above are percentages, but the point is still valid.) If you take known calibration points from the fossil record, such as the divergence of horses and humans or of horses and chickens, you can extrapolate to a predicted yeast-animal split at about 500 million years ago.
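
The extrapolation described above is simple proportional arithmetic, and a rough sketch may make it concrete. The calibration numbers below are hypothetical placeholders (they are not the values from Margoliash's table, which isn't reproduced here); only the figure of roughly 44 substitutions comes from the text. The sketch also assumes a strictly linear clock with no correction for multiple substitutions at the same site.

```python
def per_lineage_rate(differences, divergence_myr):
    """Substitutions per million years along a single lineage."""
    # Two species that split divergence_myr ago have each been evolving for
    # that long, so the observed differences accumulated along two branches.
    return differences / (2 * divergence_myr)


def extrapolate_divergence(target_differences, rate):
    """Divergence time (Myr) implied by a difference count and a clock rate."""
    return target_differences / (2 * rate)


if __name__ == "__main__":
    # HYPOTHETICAL calibration point, chosen as round numbers for illustration:
    # a pair of species differing at 10 sites that split 100 million years ago.
    calib_differences = 10
    calib_age_myr = 100

    rate = per_lineage_rate(calib_differences, calib_age_myr)
    # 44 is the approximate yeast-to-animal difference quoted in the text.
    yeast_animal = extrapolate_divergence(44, rate)

    print(f"rate: {rate} substitutions/Myr per lineage")
    print(f"yeast-animal split: about {yeast_animal:.0f} Myr ago")
```

With these made-up inputs the estimate is near 440 million years. Doubling the assumed calibration age doubles the estimate, so a wrong calibration point shifts the dates without affecting the clock-like pattern itself.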

We now know that the calibration points are incorrect and that fungi and animals diverged about one billion years ago. That doesn't change the fact that there's an approximate molecular clock. It just affects the calibration of that clock with time (years).

There are many problems with molecular clocks, including the fact that they tick at different rates for different proteins as shown below. This is because the sequences of some proteins, like cytochrome c, are highly constrained by natural selection (i.e. conserved). Other proteins, like fibrinopeptides, can tolerate many more changes.


There are two recent reviews that are well worth reading if you're interested in molecular clocks (Bromham and Penny, 2003; Kumar, 2005). They discuss the problems with calibration (evident in the figure above) and the problems with relating the molecular clock to generation time and not years. The bottom line is that the molecular clock does not correspond exactly to the prediction of neutral theory but it's close enough to be used to estimate times of divergence. It's still powerful evidence that most changes in gene/protein sequences are neutral changes that have been fixed by random genetic drift. Natural selection is a minor player in molecular evolution.


1. It's equal to the mutation rate, μ, and independent of population size.

Bromham, L. and Penny, D. (2003) The modern molecular clock. Nat. Rev. Genet. 4:216-224. [doi:10.1038/nrg1020]

Fitch, W.M. and Margoliash, E. (1967) Construction of phylogenetic trees. Science 155:279–284.

Kumar, S. (2005) Molecular clocks: four decades of evolution. Nat. Rev. Genet. 6:654-662.

Margoliash, E. (1963) Primary structure and evolution of cytochrome c. Proc. Natl. Acad. Sci. USA 50:672-679.

Mulligan, P.K. (2008) Proteins, evolution of in AccessScience, ©McGraw-Hill Companies.

21 comments:

  1. Creationists have been seen arguing that because molecular clocks are calibrated with fossil dates, this somehow constitutes cheating because, as they say, the fossil dates estimated with radiometric dating then don't support the molecular clock dates at all. So somehow, if a fossil dates to 140 million years and an "uncalibrated" molecular clock dates to 120 million years, that means both the fossil AND the molecular clock dates are wrong and unreliable. They're completely impervious to the argument that a molecular clock isn't a precision instrument but that the dates still blow gigantic holes through YEC.

    Replies
    1. In most cases the molecular clock predicts deeper divergence times than the fossil record. We know why that's the case but it requires far more intelligence than most IDiots possess.

      You don't need to put dates on a molecular phylogeny to understand why it offers powerful independent support for an evolutionary explanation of the history of life. I'm sure that Intelligent Design Creationism has a wonderful explanation for the amazing congruence of the molecular, fossil, and morphological data—I just haven't come across it yet.

      Could someone post a link?

    2. Not sure about a pure ID explanation (there are so many different views), but I know old-fashioned YEC argues that fossil ages are inventions by lying scientists who "throw away data that doesn't fit with an evolutionary perspective" or some such nonsense. In other words, the congruence between fossil ages and phylogenetics is the result of a conspiratorial construction. That's in regards to the ages, though. They also completely reject radiometric dating and stratigraphy for estimating dates of fossils of course.

      Their explanation for morphological and genetic similarities is some kind of hyper-speed, post-Noah's-flood "within kinds" evolution, by which they mean multiple independent ancestry for a number of proto-species (the "kinds") that were on the ark (cats, dogs, monkeys, humans, etc.).
      It's hard to get any kind of consensus from them, because they all seem to be arguing different things all the time, since they just invent whatever explanation they think they need at the moment. The genetics thing is usually just hand-waved away with the "common design - common designer" argument.

      I think the more "sophisticated" IDiots like Behe and Axe would argue that the congruence is real, but that goddiddit all in the end. That evolution was guided by the unnamed designer (god, the christian one of course).

    3. Interestingly, Schwartz published a paper a couple of years ago attacking 'the molecular clock' as a means of supporting his 'orangutans and humans are the closest sister group' schtick. I was pretty surprised that his paper made it into print as it did, for it was quite illogical in its attack.

    4. Just to add a few thoughts... While deep coalescence can be a problem for shallow comparisons - especially when single markers are used - a recent simulation study has provided ways of quantifying the bias and fixing it: adding a mt marker to a handful of nuclear markers tends to sort this out provided your phylogenetic model is adequate.

      Also, the apparent acceleration amongst recently diverged species ('time-dependent' molecular rates, sensu Ho et al. 2005) will overestimate divergence times in an uncalibrated tree. However, the acceleration is not apparent in synonymous sites (Subramanian & Lambert, 2011).

      Further, there is some evidence that any incongruence between molecular and fossil dating is reduced when large numbers of loci are used. In their review paper Nei et al. (2010) demonstrate that little if any bias exists in a vertebrate molecular clock based on ~4200 gene orthologues for divergences dating from several million years (human-chimp) to several hundred million years (human-zebrafish) (p 271). There was little difference when dN, dS or amino acid changes were used.

      Overall, a molecular clock applied with forethought to its assumptions can be quite robust.

  2. Isn't there still a big question about what percent of amino acid differences between species are maintained by selection? (e.g., nonsynonymous selection on gene expression)

    Replies
    1. Yes. I think that as much as 1% of the amino acid substitutions might be adaptive and others think it could be 2%! :-)

      Seriously, there is some controversy. Some scientists argue that a majority of amino acid substitutions conferred some selective advantage and the allele was fixed because the mutated protein worked better than the ancestral protein in that particular species.

      Unfortunately, they are wrong but they just don't realize it yet.

    2. Gillt --

      Nei et al. (2010) - the review paper I mention above - discuss quite a bit on this topic. Amongst other things, they show that amino acid substitutions are still clock-like, indicating the majority of such substitutions are effectively neutral.

    3. I have two questions:

      1) Negative selection affects mutation rate - therefore for different proteins there are different rates.
      But that would mean that negative selection occurs quite often?
      For example mutation rate of Fibrinopeptides is about(*) 5 times higher than of Hemoglobin. That would mean that 4 of 5 mutations in Hemoglobin are purged out by negative selection (compared with Fibrinopeptides)?

      (*) This value is taken from the graph. I couldn't find exact rates on the internet, but I think it doesn't matter for my question.

      2) Negative selection affects (slows down) mutation rate, but mutation rate is still quite constant over time.
      Would that mean that on average(?) negative selection is also a stochastic process?

  3. FWIW, the chart above (actually the earlier one, with Snakes separating from Turtles, which then give rise to Birds...) I recognize from a textbook I just got for a molecular systematics course (the text is Lemey et al. 2009, Phylogenetic Handbook).

  4. The other fun feature of the cytochrome c tree is that it's exactly what Michael Denton used in Evolution: A Theory in Crisis to show that molecular data don't fit evolutionary expectations, because he thought they ought to show a scala naturae rather than a branching tree with extant species at the tips. Of course, he later fixed that.

    There's a literature on why the rattlesnake is out of place on the tree: some seriously rapid evolution. But in fact snakes might be farther from birds than turtles are, though they clearly shouldn't be farther than mammals too.

    Replies
    1. John, you are misrepresenting Denton. Denton was concerned about the constant rate of change and the apparent molecular clock. He said,
      What sort of mutational mechanism might have generated uniform rates of evolution over vast periods of time in vastly dissimilar types of organisms? Basically, there are only two types of changes that can occur to the sequences of the genes specifying for functional proteins: neutral mutations which have no effect on function and are substituted by drift; and advantageous mutations which have a positive effect on function and are substituted by selection.

      Unfortunately, neither evolution by genetic drift nor evolution by positive selection is likely to have generated anything remotely resembling a uniform rate of evolution in a family of homologous proteins.

      The rate of genetic drift in any gene is directly related to and determined by the mutation rate. This is not controversial.


      He then goes on to point out that the various lineages have vastly different generation times so it's difficult to imagine how the rate of mutation could be constant in, say, yeast and humans. This rules out any explanation based on neutral mutations and drift, according to Denton.

      Denton actually knew much more about evolution in 1985 than most of the people who were fighting him. They didn't understand the point he was making so they made up a story about him not understanding phylogenetic trees. That false story, promoted on talk.origins, has persisted for more than two decades.

    2. I'll admit I read the book quite some time ago and don't have a copy handy. As I recall, your position is that Denton didn't change his mind about anything between his two books and so wasn't rejecting common descent even in the first book. I really have to find a copy and look again, because I find that very hard to believe based on my perhaps fallible memory.

      The reason for this confusion, if it's confusion, is that creationists universally interpreted Denton's argument that way, and it has become a common creationist claim, regardless of the original source. I suspect most people first encountered it second-hand, from some creationist.

    3. I side with J.H. on this one. Denton spent the overwhelming majority of chapter 12 of Evolution: ATIC attacking the Scala Naturae view. Criticism of other views is relatively brief. In making sequence comparisons, Denton consistently picks up the stick by the wrong end, and chooses the least informative graphics to illustrate his point; ray diagrams and Venn diagrams (plus, the Venn diagrams are just plain wrong.) It's the sort of performance which leads an informed reader to ask, was Denton dishonest or just incompetent?

    4. I agree with you that Denton spends a lot of time arguing against the Scala Naturae view. He explains why it is wrong to think like that.

    5. To a nonspecialist, it seems to me that the most unfortunate aspect of Denton's Chapter 12 is that he argues that common descent implies that there should be amino acid sequences for cytochrome c intermediate between prokaryotes and eukaryotes. This would be analogous to expecting to find a crocoduck intermediate between crocodiles and ducks.

  5. Can someone either explain to this community college teacher how genetic drift can fix an allele in a large population, or point me to a concise explanation (one that can realistically be read during a busy semester)? Small populations I can see, but I just don't have an intuitive sense for how it works for larger populations. Thanks

    Replies
    1. If you think about a particular neutral allele it's more difficult for it to be fixed in a large population than in a small one. In fact, the probability of fixation is 1/2N, where N is the population size.

      However, if you think about fixing any neutral allele the situation changes. The frequency of mutation (μ) determines how many neutral mutations arise in a population. The number of mutations per generation in a population is 2Nμ and that means that there will be more mutations in a large population than in a small one.

      The rate at which neutral alleles are fixed is therefore 2Nμ × 1/(2N) = μ per generation. Thus, the number of neutral alleles that become fixed is equal to the mutation rate and it's independent of population size.

    2. And neutral alleles being far more common than adaptive or maladaptive ones, this is why you say that "the vast majority of evolution is by genetic drift"? Every semester I aspire to have my non-majors understand the idea of neutral mutations in more than a "my instructor said that the vast majority of mutations don't change anything"... but I must admit to struggling with the role it actually plays in evolution.

  6. Where does the figure of the clock and 3 proteins come from? Where is it originally published? Thanks

  7. Found it, sorry for bothering
    Rate of molecular evolution
    An Introduction to Genetic Analysis. 7th edition.
    Griffiths AJF, Miller JH, Suzuki DT, et al.
