Friday, January 20, 2012

Understanding Mutation Rates and Evolution

The recent article by physician Joseph A. Kuhn contains a lot of errors and misunderstandings [Physicians Can Be IDiots]. Today I want to focus on one paragraph.
The complexity of creating two sequential or simultaneous mutations that would confer improved survival has been studied in the malaria parasite when exposed to chloroquine. The actual incidence of two base-pair mutations leading to two changed amino acids leading to resistance has been shown to be 1 in 10^20 cases (42). To better understand this incidence, the likelihood that Homo sapiens would achieve any single mutation of the kind required for malaria to become resistant to chloroquine (a simple shift of two amino acids) would be 100 million times 10 million years (many times the age of the universe). This example has been used to further explain the difficulty in managing more than one mutation to achieve benefit.
The reference is to The Edge of Evolution by Michael Behe. His book was published in 2007 but I never got around to reviewing it thoroughly—partly because it's so difficult to explain where he goes wrong.1 Here's my take on one part of the book: The Two Binding Sites Rule. This post covers "chloroquine-complexity clusters" (CCC).

Behe discusses chloroquine resistance in the malaria parasite Plasmodium falciparum. He notes that one of the common mutations requires two amino acid changes in a gene that encodes an ion transporter. This double mutation is rare: he estimates that it arose fewer than ten times since chloroquine started to be used as a treatment for malaria. Given the size of the Plasmodium population, this corresponds to a frequency of about 10^-20. That's the frequency of two mutations occurring simultaneously, since the error frequency of DNA replication is about 10^-10 (10^-10 × 10^-10 = 10^-20). This is what you would expect if each of the single mutations is deleterious and the only way the double mutation could arise is when the two mutations occur simultaneously in the same individual.2

This double mutation frequency, 10^-20, is called a "chloroquine-complexity cluster" or "CCC." A CCC can arise in species with large populations and very short generation times, but if you need two of them to occur simultaneously, then it will never happen (probability = 10^-40).
Put more pointedly, a double CCC is a reasonable first place to draw a tentative line marking the edge of evolution for all life on earth. We would not expect such an event to happen in all of the organisms that have ever lived over the entire history of life on this planet. So if we do find features of life that would have required a double CCC or more, then we can infer that they likely did not arise by a Darwinian process. [p. 63]
Behe is correct, provided that all his assumptions are valid.
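The CCC figures come from multiplying independent probabilities. Here is a minimal sketch, assuming (as Behe does) that the two changes are independent and that each arises at roughly 10^-10 per replication; the helper name `combined_log10` is hypothetical. Working with base-10 exponents keeps the arithmetic exact.

```python
# Combine independent probabilities by adding their base-10 exponents,
# which keeps the arithmetic exact (no floating-point rounding).
def combined_log10(*log10_probs: int) -> int:
    """Hypothetical helper: log10 of the product of independent probabilities."""
    return sum(log10_probs)

LOG10_ERROR_RATE = -10  # assumed error frequency of DNA replication, about 10^-10

# One CCC: two specific mutations arising together in one genome.
log10_ccc = combined_log10(LOG10_ERROR_RATE, LOG10_ERROR_RATE)

# A double CCC: two independent CCCs required at the same time.
log10_double_ccc = combined_log10(log10_ccc, log10_ccc)

print(f"single CCC: 10^{log10_ccc}")         # single CCC: 10^-20
print(f"double CCC: 10^{log10_double_ccc}")  # double CCC: 10^-40
```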

For some species, the edge of evolution is a single CCC. Humans are an example. Assume that over the course of human evolution the average population size is one million individuals (a generous assumption) and that the average generation time is 10 years (a reasonable assumption). Then a CCC will arise only once every 10^20/10^6 = 10^14 generations. This corresponds to 10^15 years.
On average, for humans to achieve a mutation like this by chance, we would need to wait a hundred million times ten million years. Since that is many times the age of the universe, it's reasonable to conclude the following: No mutation that is of the same complexity as chloroquine resistance in malaria arose by Darwinian evolution in the line leading to humans in the past ten million years. [p. 61, Behe's emphasis]
This is where Joseph A. Kuhn gets his information in the paragraph quoted at the top of this posting. I assume that IDiots like Dr. Kuhn will accept without question the word of a scientist like Michael Behe but not the word of thousands of other expert scientists who study evolution. Isn't that strange?
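The waiting-time estimate for the human lineage is simple arithmetic. Here is a sketch using the same assumed values as the text: an average population of one million and a ten-year generation time. These are assumptions, not measurements.

```python
# Waiting time for one CCC in the human lineage under the text's assumptions.
CCC_ODDS = 10**20             # one CCC per 10^20 new genomes
POPULATION_SIZE = 10**6       # assumed average population size
GENERATION_TIME_YEARS = 10    # assumed average generation time

# Each generation contributes roughly POPULATION_SIZE new genomes.
generations_needed = CCC_ODDS // POPULATION_SIZE
years_needed = generations_needed * GENERATION_TIME_YEARS

print(f"{generations_needed:.0e} generations")  # 1e+14 generations
print(f"{years_needed:.0e} years")              # 1e+15 years
```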

Now let's return to the assumption that Behe makes. He assumes that there are cases where two mutations must occur simultaneously in order to achieve some complex adaptation. He argues that each individual mutation is disadvantageous so it will be eliminated from the population by negative selection before a second mutation can occur in the same gene. There are two problems with this assumption.

First, the actual example he uses, chloroquine resistance in the malaria parasite Plasmodium falciparum, is wrong. There are strains of Plasmodium that carry one of the mutations and not the other (Cooper et al. 2007). It looks like the single mutation is not deleterious but neutral. This is thought to be a common mechanism of acquiring complex adaptations that require two or more mutations in the same gene. There are plenty of examples of such intragenic epistasis.

I hope most readers understand that neutral mutations can persist in populations for long periods of time. They can even become fixed.

But let's not quibble about the actual example that Behe uses. The principle is sound; namely, that there almost certainly are examples where a double mutation will be beneficial but each individual mutation is detrimental. The probability that the two mutations will arise simultaneously in the same individual is 10^-20, as Behe says. If that's what has to happen then this really is beyond the edge of evolution ... or is it?

No, it's not. It's not true that deleterious mutations are quickly eliminated from a population. According to the modern Nearly Neutral Theory, slightly deleterious mutations can persist for hundreds of generations and can even become fixed in the population. It depends on the strength of negative selection (i.e. the selection coefficient) and the size of the population. In small populations, slightly deleterious alleles are invisible to selection and their frequency is entirely controlled by random genetic drift.

Thus, even if we accept Behe's assumption that each of the mutations is deleterious, it does NOT follow that they will be quickly eliminated by natural selection. It's quite possible to have a population carrying a slightly deleterious mutation and then have a second mutation occur that makes the allele beneficial. In this case the probability of getting the double mutant is much greater than 10^-20, and such an allele (double mutation) is well inside the edge of evolution.

Mutations like this are known to occur in the evolving E. coli strains of Richard Lenski [Evolution in Action and Michael Behe's Reaction] (Woods et al. 2011). For more information on the presence of deleterious alleles in a population and what it means for Behe's argument see Mutations and Complex Adaptations where I quote Michael Lynch, the leading expert on this sort of thing.

It's important to understand that there's a difference between the probability that a mutation will occur and the probability that the new allele will become fixed in the population. They are not the same thing.

It's important to understand that beneficial alleles won't always be fixed; in fact, the vast majority will be lost before they ever reach significant frequency in a population.

It's important to understand that neutral alleles can persist for a very long time and may even become fixed in the population. That's why populations contain so much variability. Additional mutations may convert a neutral allele to a highly beneficial allele and this is likely a route to many complex adaptations.

It's important to understand that random genetic drift can cause slightly deleterious alleles to persist in a population. Subsequent mutations can convert a deleterious allele to a beneficial one. This is how you get beneficial double-mutant alleles even though both single mutations might be harmful. The persistence of neutral and/or deleterious alleles in a population is not Darwinian evolution so if that's the only kind of evolution you know about then you will never understand evolution.
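The point about slightly deleterious alleles can be illustrated with Kimura's standard diffusion approximation for the fixation probability of a new mutation. The post does not spell this formula out. Here is a minimal sketch, assuming a diploid population whose effective size equals N, additive selection, and a starting frequency of 1/(2N); the values of N and s below are hypothetical. When 4N|s| is small, a deleterious allele fixes almost as often as a neutral one. When 4N|s| is large, fixation is vanishingly rare.

```python
import math

def fixation_probability(N: int, s: float) -> float:
    """Diffusion approximation for a new mutation at frequency 1/(2N):
    (1 - e^{-2s}) / (1 - e^{-4Ns}). Negative s means deleterious."""
    if s == 0:
        return 1 / (2 * N)  # neutral: fixation probability equals starting frequency
    x = -4 * N * s
    if x > 700:
        # Strongly deleterious: the denominator is ~e^x, so multiply by e^{-x}
        # instead of calling expm1(x), which would overflow.
        return math.expm1(-2 * s) * math.exp(-x)
    return math.expm1(-2 * s) / math.expm1(x)

s = -0.001  # hypothetical slightly deleterious allele
for N in (100, 10_000):  # hypothetical small and large populations
    p = fixation_probability(N, s)
    print(f"N={N}: P(fix)={p:.3g}, relative to neutral={p * 2 * N:.3g}")
```

With N = 100 the allele fixes at roughly 80% of the neutral rate. With N = 10,000 the same allele is effectively never fixed.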

1. Michael Behe is the best Intelligent Design Creationist. Most of his arguments are based on a reasonable knowledge of biochemistry and evolution. He accepts common descent. Many critics of Behe fail to appreciate his arguments. There are quite a few bad reviews of The Edge of Evolution (e.g. Another Bad Review of The Edge of Evolution, Chalk Up One for the Intelligent Design Creationists).

2. Behe ignores the probability of fixation. He assumes that as soon as a beneficial mutation occurs it will be fixed in the population. That's not correct, of course, because the probability of fixation depends on the selection coefficient (s). It's approximately 2s. For a very beneficial mutation, as in chloroquine resistance, the selection coefficient might be 0.1 and the probability of fixation would be 20%. That still means that the mutation will be lost 80% of the time! This omission doesn't hurt Behe's argument because including the probability of fixation only makes his case stronger!
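The 2s figure is Haldane's small-s approximation. The sketch below compares it with the large-population limit of the diffusion formula, 1 - e^{-2s}, at the footnote's s = 0.1. Treat it as an illustration under those textbook assumptions. The two values are close (about 20% versus about 18%), and either way most such mutations are lost.

```python
import math

s = 0.1  # selection coefficient from the chloroquine example

haldane = 2 * s                  # rule of thumb: P(fix) ~ 2s
diffusion = -math.expm1(-2 * s)  # large-N limit: 1 - e^{-2s}

print(f"2s approximation: {haldane:.0%} fixed, {1 - haldane:.0%} lost")
print(f"1 - exp(-2s):     {diffusion:.1%} fixed, {1 - diffusion:.1%} lost")
```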

Cooper, R.A., Lane, K.D., Deng, B., Mu, J., Patel, J.J., Wellems, T.E., Su, X., and Ferdig, M.T. (2007) Mutations in transmembrane domains 1, 4 and 9 of the Plasmodium falciparum chloroquine resistance transporter alter susceptibility to chloroquine, quinine and quinidine. Mol. Microbiol. 63:270-278. [doi: 10.1111/j.1365-2958.2006.05511.x]

Woods, R.J., Barrick, J.E., Cooper, T.F., Shrestha, U., Kauth, M.R., and Lenski, R.E. (2011) Second-order selection for evolvability in a large Escherichia coli population. Science 331:1433-1436. [doi: 10.1126/science.1198914]


  1. Larry, I think you give Behe too much credit. This passage (from White, N. J. 2004. Antimalarial drug resistance. J. Clin. Invest. 113:1084-92.) is where Behe got the value of 10^-20:

    “Chloroquine resistance in P. falciparum may be multigenic and is initially conferred by mutations in a gene encoding a transporter (PfCRT) (13). In the presence of PfCRT mutations, mutations in a second transporter (PfMDR1) modulate the level of resistance in vitro, but the role of PfMDR1 mutations in determining the therapeutic response following chloroquine treatment remains unclear (13). At least one other as-yet unidentified gene is thought to be involved. Resistance to chloroquine in P. falciparum has arisen spontaneously less than ten times in the past fifty years (14). This suggests that the per-parasite probability of developing resistance de novo is on the order of 1 in 10^20 parasite multiplications.”

    White is talking about what he suspects is a multigenic trait, and Behe "interprets" this to mean a double point mutation in a single gene.

    For this reason, I am of the opinion that this value, and the part of Behe's argument that relies on this value (that would be most of The Edge of Evolution) is just plain wrong.

    1. The mutation rate is pretty close to 10^-10 according to our understanding of DNA replication and this has been confirmed by several studies on mutations in many species. That means that the probability of a double mutation occurring spontaneously is 10^-20. Behe is right about that.

    2. But consider that the mutation rate can be dramatically elevated when individuals in a population are under strong selective pressure. This is best studied as the adaptive response in E. coli, where downregulation of the mismatch repair system can elevate the mutation rate roughly 1000-fold. Under these conditions the rate of spontaneous double mutation would be about 10^6-fold higher than predicted from studies of DNA replication under non-stress conditions.
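    The scaling argument here is that a double mutation depends on the square of the mutation rate. A k-fold increase in the rate therefore gives roughly a k²-fold increase in double mutations. Here is a minimal sketch, assuming the ~1000-fold elevation applies to both mutations in the same replication; the word "potentially" in the comment matters here.

```python
BASELINE_RATE = 1e-10  # per-site error rate under normal conditions
ELEVATION = 1_000      # assumed stress-induced increase (~1000-fold)

stressed_rate = BASELINE_RATE * ELEVATION

# Two independent mutations in the same genome: multiply the per-site rates.
baseline_double = BASELINE_RATE ** 2
stressed_double = stressed_rate ** 2
fold_change = stressed_double / baseline_double

print(f"baseline double mutation rate: {baseline_double:.1e}")
print(f"stressed double mutation rate: {stressed_double:.1e}")
print(f"fold change: {fold_change:.3g}")
```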

  2. Let me provide a little math to back up Larry's argument (I am not addressing the comment by "aghunt" as it is a separate issue). If we have a population of N (diploid) individuals and we need one mutation to be followed by another to gain a fitness advantage of t, where the first mutation has a fitness disadvantage of s, and if the mutation rate is u per haploid genome, counting mutations to one of these sites, then the total number of mutations per generation is 2Nu, on average. Each such mutation, it can be shown, dies out but before it does it exists in about 1/s individuals. Each of those could have a second mutant occur, so there will be about 2Nu(1/s)u such rescuing mutations per generation.

    Each of those has a probability of fixation of about twice its selective advantage so about 2t. Thus, per generation, the number of cases where a first mutation occurs, hangs around and is ultimately rescued is about 2Nu(1/s)u(2t) or the product of 4Nt/s u^2 (that's 4Nt/s times the square of the mutation rate). If s is small and N and t are large, that could make the sequence of two mutations more plausible.

    This is approximate: in cases where 4Ns is small, one has to do a more elaborate diffusion-equation analysis. That would alter these probabilities, since the deleterious mutation might sometimes not get lost.

    There is the complication of whether N is the number of people or the number of malaria parasites in this case. Isn't it going to be the latter? That could be a much bigger number.
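    The estimate above can be written out step by step. This sketch follows the comment's own approximations, which hold when 4Ns is large; the parameter values are hypothetical. The last lines add a comparison the comment does not make explicitly. A simultaneous double mutation arises in about 2Nu^2 genomes per generation and fixes with probability about 2t, so the sequential route is larger by a factor of about 1/s.

```python
def sequential_rescues(N: float, u: float, s: float, t: float) -> float:
    """Per-generation rate of 'deleterious first mutation, later rescued and fixed'."""
    new_first_mutants = 2 * N * u      # copies of the first mutation arising
    carriers_before_loss = 1 / s       # individuals carrying each copy before it dies out
    rescuing_mutations = new_first_mutants * carriers_before_loss * u
    return rescuing_mutations * 2 * t  # each rescue fixes with probability ~2t

def simultaneous_doubles(N: float, u: float, t: float) -> float:
    """Per-generation rate of both mutations arising together and fixing."""
    return 2 * N * u**2 * 2 * t

# Hypothetical parameter values, for illustration only.
N, u, s, t = 1e4, 1e-9, 0.01, 0.05

seq = sequential_rescues(N, u, s, t)
closed_form = 4 * N * t / s * u**2     # the comment's 4Nt/s times u^2
sim = simultaneous_doubles(N, u, t)

print(f"sequential: {seq:.2e} per generation (closed form {closed_form:.2e})")
print(f"simultaneous: {sim:.2e} per generation, ratio {seq / sim:.0f} = 1/s")
```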

  3. In any case, I think it's logically fallacious to speak about events that happened in the past and then declare on the basis that "those two mutations" are too improbable to have happened by chance. Any two single mutations are, and plenty of double mutations happen all the time. The question is whether they lead to anything "specific". What Behe is doing is basically injecting teleology into evolution, by a hidden assumption that humans must have been a goal from the get-go.

    Behe's argument would work going forward: if you pick out two single nucleotides in the entire human genome in advance and calculate the odds, those two specific mutations have an incomprehensibly low chance of happening. But double mutations happen often, and the ones that did happen also had the same unfathomably low odds of occurring, before they did. But they did.
    For Behe's argument to apply to events that happened in the past, he would have to show that some two specific mutations constituted some kind of "bottleneck" that the molecular history of life had to find, or all life would have gone extinct. He would have to show that no other evolutionary trajectory would have been possible and therefore those two were the only two mutations that could have "kept all the species alive".

    No, simply declaring now after the event, even if he manages to find two specific and therefore very unlikely mutations, that evolution couldn't have produced it, is logically fallacious. It's question begging, nothing more. With the planet covered in living organisms, most of them microbes, evolution is bound to find solutions to almost any biochemical problem it runs into. These chance arguments only work if you think some given lineage is "special" and was a "goal" in the process.

    Going forward, if we do in fact specify two specific mutations, then it might be the case that evolution will never produce them. But it will then just produce something else instead, because there is no such thing, and there never was such a thing, as a two-mutation bottleneck that the history of life had to pass through. In the land of proteins and molecular evolution, solutions are all over the place and none of them are "special". The fitness landscape of molecules is rugged, possibly even noisy in places, but it isn't flat, as people like Behe and Axe are essentially trying to argue.

    1. In any case, I think it's logically fallacious to speak about events that happened in the past and then declare on the basis that "those two mutations" are too improbable to have happened by chance. Any two single mutations are, and plenty of double mutations happen all the time. The question is whether they lead to anything "specific". What Behe is doing is basically injecting teleology into evolution, by a hidden assumption that humans must have been a goal from the get-go.

      You are not being fair to Behe. He is postulating that IF two simultaneous mutations are required for a complex function then this is beyond the edge of evolution in most cases. He is correct about that. (Assuming that both of the single mutations are highly deleterious on their own.)

      He then tries to make the case that such CCC's exist—that's why he discusses binding sites and the "Two Binding Sites Rule."

      The logic of his argument is sound. If, as you say, there are plenty of examples where a true CCC allele has become fixed in the population then we have a problem explaining how that could have happened using simple Darwinian adaptationist models. (In species with small populations and long generation times.)

      The problem with Behe's argument is not that it's a logical fallacy—although I understand your point—it's that there's no such thing as a CCC using Behe's definition.

    2. "He is postulating that IF two simultaneous mutations are required for a complex function then this is beyond the edge of evolution in most cases. He is correct about that."

      Ummm. Wouldn't that rather read "beyond the edge of natural selection"?!

  4. I find this kind of reasoning to be a game rather than a basis for solid arguments. Evolutionary histories are not simple probabilistic random trials.
    For instance, take the case of a single CCC in the human line: one expected every 10^15 years. But that is an expected waiting time, not the time needed to make the event nearly certain. You expect one such change in those years, but it may actually occur in year 1 or in year 10^15 - 1. In fact, it may occur twice in those years. On the other hand, we have real examples of several independent origins of a CCC in Plasmodium in a matter of years.

    So, in principle you cannot rule out an explanation by mutation histories based solely in how improbable it seems. Paraphrasing Sherlock Holmes, no matter however improbable, if you have an extant example it must be true. Seriously, though, the threshold for ruling out a mechanism solely on probabilistic grounds should be many, many times (not just 10, 100, or 1000 times) the age of the universe. And even then the mechanism would only be improbable, not impossible.

    The main philosophical problem here is, obviously, "purpose". If all life is seen as a simple accident, probabilistic arguments lose most of their appeal. Back at the origin of the universe, the odds that I would be here now, deciding to sip a little tea at this precise second, were essentially 0. But the odds that "something happens" are basically 1 in 1.
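    The point about waiting times can be made concrete by treating the origin of a CCC as a Poisson process. This is a modeling assumption, not something stated in the post. With an expected waiting time of 10^15 years, the chance of at least one event within that time is about 63%, not near certainty. The chance of two or more is about 26%.

```python
import math

MEAN_WAIT_YEARS = 1e15  # expected waiting time for one CCC in the human line

def p_at_least_one(years: float, mean_wait: float = MEAN_WAIT_YEARS) -> float:
    """Poisson process: P(N >= 1) = 1 - e^{-lam}, with lam = years / mean_wait."""
    return -math.expm1(-years / mean_wait)

def p_two_or_more(years: float, mean_wait: float = MEAN_WAIT_YEARS) -> float:
    """Poisson process: P(N >= 2) = 1 - e^{-lam} * (1 + lam)."""
    lam = years / mean_wait
    # Written as (1 - e^{-lam}) - lam * e^{-lam} to stay accurate for tiny lam.
    return -math.expm1(-lam) - lam * math.exp(-lam)

for years in (1e7, 1e10, 1e15):
    print(f"{years:.0e} years: P(>=1) = {p_at_least_one(years):.3g}, "
          f"P(>=2) = {p_two_or_more(years):.3g}")
```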

    1. So, in principle you cannot rule out an explanation by mutation histories based solely in how improbable it seems. Paraphrasing Sherlock Holmes, no matter however improbable, if you have an extant example it must be true.

      If you have thousands of examples of true CCC's that have become fixed in thousands of species, as Behe claims, then Sherlock is right. It must be true.

      The problem is explaining how it could have happened. Since we cannot come up with a reasonable explanation using our current models, it follows that there has to be another explanation that can account for such seemingly impossible events. Even Sherlock would recognize that this is not an accident and he would start looking for a perpetrator.

      He would soon discover that looks can be deceiving and the alleles are not true CCC's. He would also discover that the old Darwinian models have been replaced by modern models that can explain the fixation of such pseudo-CCC alleles.

  5. And ... there is a link between aghunt's 'polygene' and Joe's single-gene model: recombination cannot see gene boundaries. If we are talking of a sexual diploid, the only difference between a double mutation in a polygene and in a single gene is a diminished likelihood of intervening recombination/gene conversion boundary in the latter. Even the two 'bits' of a double-mutation in a single gene must physically reside some chromosome distance apart from each other, so even in a single gene, recombination can create a gene with advantage t from two versions of it each with disadvantage s. I won't embarrass myself further by doing the math, but another potential source of 'unlikely double-mutation' in a single gene is thus provided by the chance that, somewhere in the population, two mutations in need of 'rescuing' recombine.

    This chance increases if there is more than one way to rescue a particular mutant, which would seem to invoke the "Birthday" principle - multiple subpopulations capable of mutual rescue with several others will greatly increase the overall probability that a match will be found between two of them.

    1. The probability that a single individual will carry two slightly disadvantageous alleles of the same gene, and that recombination will occur between the mutated sites, is likely to be significant compared to the probability of an independent second mutation.

      Good point.

  6. Alan Miller argues that having sites arise in different loci and be brought together by recombination increases the likelihood that both mutations will be fixed when each is individually deleterious. I think not. Crow and Kimura (in their paper in American Naturalist, 1965) considered such a case with two loci which have deleterious mutations with selection coefficient s against each of them, but selection coefficient t in favor of a haplotype that has both of them. This is the case I considered above. In my analysis I assumed no recombination, the sites being within the same locus, in effect. Here the recombination fraction between them is r.

    The upshot is that the haplotype containing both mutants can increase if r < t/(1+t). That is shown by Crow and Kimura and is also explained on pages 288-290 of my free e-book Theoretical Evolutionary Genetics.

    So recombination makes the fixation of both mutants less likely. In the rough calculation of fixation probabilities in my previous comment, you could just replace t by t - r - rt and be close.
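    Because t - r - rt = t - r(1 + t), the rough substitute is positive exactly when r < t/(1 + t), so the two statements above agree. The small sketch below checks this. The function names and parameter values are hypothetical.

```python
def double_mutant_can_increase(r: float, t: float) -> bool:
    """Crow-Kimura condition as quoted: the AB haplotype can increase if r < t/(1+t)."""
    return r < t / (1 + t)

def effective_advantage(r: float, t: float) -> float:
    """Rough replacement for t once recombination fraction r is included."""
    return t - r - r * t

t = 0.05  # hypothetical advantage of the double mutant
print(f"threshold r < {t / (1 + t):.4f}")
for r in (0.0, 0.01, 0.04, 0.05, 0.1):
    print(r, double_mutant_can_increase(r, t), round(effective_advantage(r, t), 4))
```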

    1. Without claiming to be much of a population geneticist, my argument was that the effect of recombination is to make the double-mutation combination more likely to arise. Serial mutation remains an additional possible source, even in the recombining population. While recombination may disrupt the combination, reducing the fixation likelihood of any given double-mutant, I think there will be more of them.

      I would further argue that r will be low in the case I suggested - two mutations in a single gene recombining together. Having been created (by recombination, or serial mutation in either order), the combination will be sundered rarely. As it increases in the population, recombination will destroy the combination even less frequently, since an appreciable fraction of inter-site recombination boundaries will result in no change.

      Essentially, we do have recombination, in most realistic sexual populations. We have mutation too, which can give either of the "s" subpopulations the complementary second mutation.

      What the overall probability might be for origination and fixation of a t mutant with recombination and mutation, I don't know - it may indeed be diminished, but we wouldn't want to be accused of stacking the deck!

    2. Certainly recombination does have the effects Alan Miller describes. There is the whole Fisher-Muller theory of the evolution of recombination, in which genes that do not interact nevertheless show natural selection to be impeded by too-close linkage (too little recombination).

      But the cases we are discussing here have strong gene interaction, of a type that would make it deleterious to break up the haplotype that has both mutations. The phenomena I am discussing (and which I think Behe is envisaging, right?) will become important in such cases.

    3. Joe,

      But the cases we are discussing here have strong gene interaction, of a type that would make it deleterious to break up the haplotype that has both mutations.

      For sure - once you achieve a beneficial epistasis, you don't want recombination breaking it up. The "breakup of adaptive gene combinations" is frequently cited as a minor 'cost of sex'. But it strikes me that this is damning the very process that frequently produces such combinations, simply because it may occasionally break one up again! Without it, they would occur more rarely, by serial mutation alone. If linkage is tight, the combination occurs rarely but is also disrupted rarely; if loose, it occurs more often and is disrupted more often. And of course this disruption is strongest while rare – AB may indeed generate Ax or xB, but as it increases in frequency, another AB is more likely to be on the homologue.

      I think both pictures, the mutational and the recombinational, need to be incorporated in an overall assessment of ‘likelihood’ for a sexual species.

      In two-locus models – the minimum for a recombinant picture – the effect is perhaps minor, and the additional source of the beneficial combination may be balanced by reduction of the probability of fixation. But if there are ‘pools’ of several single-site deleterious mutations, and some may be rescued by more than one of the available possibilities, we get the “pigeonhole” effect, where the probability of at least one favourable combination increases sharply. Mutationally, ‘rescue’ is still linearly dependent on the overall size of the single-mutation pool. But recombinationally, overall chances increase with the number of different combinations available.

      This isn’t to say that this is what recombination is 'for', but (among its many effects) it does increase the likelihood of a seemingly improbable double mutation. If Behe doesn’t account for all the possibilities, he can’t determine how unlikely it was.

      Eukaryote-style recombination has many effects, of course, but its place at the base of the eukaryotes seems to be a key discontinuity between 2-3 billion years of comparative stasis and a genuine transcending of an evolutionary ‘edge’.
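      The "pigeonhole" (or birthday) effect is about counting. With k distinct single-mutant classes there are k(k - 1)/2 possible pairings, so the number of opportunities grows much faster than k. The toy sketch below assumes that each pairing independently yields a successful rescue with a small hypothetical probability q. It illustrates the counting argument and is not a population-genetic model.

```python
from math import comb

def pairings(k: int) -> int:
    """Number of distinct pairs that recombination could bring together."""
    return comb(k, 2)

def p_any_rescue(k: int, q: float) -> float:
    """P(at least one of the k-choose-2 pairings works), each with probability q."""
    return 1 - (1 - q) ** pairings(k)

q = 0.01  # hypothetical per-pair chance of a beneficial combination
for k in (2, 5, 10, 50):
    print(f"k={k}: {pairings(k)} pairings, P(at least one) = {p_any_rescue(k, q):.3f}")
```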

  7. > Prof. Moran: Thus, even if we accept Behe's assumption that each of the mutations is deleterious, it does NOT follow that they will be quickly eliminated by natural selection.

    But where does Behe assume the mutations are deleterious? Certainly they can't be beneficial, but what Behe does, and what people seem to miss, is to observe mutation rates, specifically those for two required mutations. Behe is not modeling! At least not in gathering this initial number.

    He is observing the rate at which evolution produces this result, so no amount of proposing this route or that exception is going to refute him. If Moran and Felsenstein and others are right, the observed rate of development of chloroquine resistance should be higher.

    So arguing that the rate should be higher than what is actually observed is, well, more than a bit problematic.

    Any modeling comes afterwards, and Behe does some, but the numbers do not come from it.