Thursday, January 08, 2015

Evolutionary biochemistry and the importance of random genetic drift

I urge you to read an important paper that has just been published in PNAS.
Lynch, M., Field, M.C., Goodson, H.V., Malik, H.S., Pereira-Leal, J.B., Roos, D.S., Turkewitz, A.P., and Sazer, S. (2014) Evolutionary cell biology: Two origins, one objective. Proc. Natl. Acad. Sci. (USA) 111:16990–16994. [doi: 10.1073/pnas.1415861111]
Here's the bit on random genetic drift. It will be of interest to readers who have been discussing the importance of drift and natural selection in a previous thread [How to think about evolution].

Do you think Lynch et al. are correct? I do. I think it's important to emphasize the role of random genetic drift and I think it's true that most biochemists and cell biologists are stuck in an adaptationist mode of thinking.
A commonly held but incorrect stance is that essentially all of evolution is a simple consequence of natural selection. Leaving no room for doubt on the process, this narrow view leaves the impression that the only unknowns in evolutionary biology are the identities of the selective agents operating on specific traits. However, population-genetic models make clear that the power of natural selection to promote beneficial mutations and to remove deleterious mutations is strongly influenced by other factors. Most notable among these factors is random genetic drift, which imposes noise in the evolutionary process owing to the finite numbers of individuals and chromosome architecture. Such stochasticity leads to the drift-barrier hypothesis for the evolvable limits to molecular refinement (28, 29), which postulates that the degree to which natural selection can refine any adaptation is defined by the genetic effective population size. One of the most dramatic examples of this principle is the inverse relationship between levels of replication fidelity and the effective population sizes of species across the Tree of Life (30). Reduced effective population sizes also lead to the establishment of weakly harmful embellishments such as introns and mobile element insertions (7). Thus, rather than genome complexity being driven by natural selection, many aspects of the former actually arise as a consequence of inefficient selection.

Indeed, many pathways to greater complexity do not confer a selective fitness advantage at all. For example, due to pervasive duplication of entire genes (7) and their regulatory regions (31) and the promiscuity of many proteins (32), genes commonly acquire multiple modular functions. Subsequent duplication of such genes can then lead to a situation in which each copy loses a complementary subfunction, channeling both down independent evolutionary paths (33). Such dynamics may be responsible for the numerous cases of rewiring of regulatory and metabolic networks noted in the previous section (34, 35). In addition, the effectively neutral acquisition of a protein–protein-binding interaction can facilitate the subsequent accumulation of mutational alterations of interface residues that would be harmful if exposed, thereby rendering what was previously a monomeric structure permanently and irreversibly heteromeric (8, 36–39)1. Finally, although it has long been assumed that selection virtually always accepts only mutations with immediate positive effects on fitness, it is now known that, in sufficiently large populations, trait modifications involving mutations with individually deleterious effects can become established in large populations when the small subset of maladapted individuals maintained by recurrent mutation acquire complementary secondary mutations that restore or even enhance fitness (40, 41).

1. Note the brief description of how irreducibly complex structures can evolve. This refutes Michael Behe's main point, which is that irreducibly complex structures can't have arisen by natural processes and must have been designed. We've known this even before Darwin's Black Box was published.


  1. "One of the most dramatic examples of this principle is the inverse relationship between levels of replication fidelity and the effective population sizes of species across the Tree of Life (30)."

    So, if a species has a small population size, replication is more accurate, and vice versa?

    1. Basically, yes. But I question the number of causal relationships that Lynch seems to attach to population size. It is inevitable that cells which are 10,000 times as big as others (Eukarya vs prokaryote) will have smaller population sizes, especially if they also build huge somas. But they will also be under significantly different mechanistic constraints. There is likely little advantage to increasing fidelity in prokaryotes, when the constraint is on getting replication out the way asap, your energetic membrane is also your boundary, and you lack the ability to 'eat' or to efficiently store and transport nutrients.

      I'm particularly puzzled by Lynch's statement that this is a 'dramatic example' of the distinction between large and smaller populations wrt drift. Assuming an advantage to fidelity, and all else being equal (which it isn't of course), the larger population should have got further up the adaptive slope.

    2. Something puzzles me. Can you define a mutation as detrimental based on a single obvious dimension? Is it worth investigating whether a detrimental mutation that survives and propagates may have some compensating effects?

    3. It is indeed the net effect on offspring numbers, relative to the competition, that determines benefit, neutrality or detriment. Actual causality may integrate many factors. This was kind of the point I was aiming at. If you take two vastly different organisms and focus on one aspect of their overall set of constraints, you can easily err if you determine that a parameter value in one 'should' have anything to do with that in the other.

      I display my 'adaptationist' prejudices by finding it probable that there is a significant element of optimisation in the replication fidelity encountered independently in prokaryotes and eukaryotes, due to their very different modes of life and the nature of replication itself (and error correction). Reproductive rate involves integration of complex cost-benefit sums.

      Of course, the picture will also involve constraint - the accessibility of an optimisation path.

    4. Is it worth investigating whether a detrimental mutation that survives and propagates may have some compensating effects?

      Probably not. You'd be wasting a lot of time and money with very little chance of finding out anything interesting. You might give it a shot if you have tons of grant money, dozens of slaves, and don't care about publications and grant renewals.

      Did I mention that your adaptationist bias is showing? :-)

    5. Re Petrushka

      The recessive gene for sickle cell anemia is a textbook example because it strengthens resistance to malaria. Thus, it is most prevalent in areas with malaria-carrying mosquitoes.

    6. Did I mention that your adaptationist bias is showing? :-)

      So do you agree with Lynch that replication fidelity is influenced more by drift (and is hence sensitive to population size) than by adaptive factors (which would be sensitive, among other things, to the correlated variable functional genome size)? Unfortunately I cannot access the full text, so don't know to what level of granularity his reference [30] subdivides the Tree of Life.

    7. @Allan Miller

      The reference #30 is to a paper by Lynch from 2012 [Sung et al. doi: 10.1073/pnas.1216223109]

      I'm skeptical about the variation in mutation rates but Lynch has published a lot of data to support his claims.

    8. Would Koonin's thinking re the role of horizontal gene transfer, if true, have any impact on the discussion of variation in mutation rates?

    9. I don't really know enough to have an adaptationist bias. I just ask questions. I'm not trying to stir the pot.

    10. The reference #30 is to a paper by Lynch from 2012

      Ah, OK thanks. The paper itself is a lot more tentative on the relationship than his own later reference to it! It is based upon a fairly small data set, but then the difficulties in getting good data for u, Ge and Ne are significant. In many of these cross-domain analyses, I find myself wondering about the existence of controls for confounding variables. Lynch addresses this to some degree, still finds Ne to be the most significant factor, but I'm not so sure. More convincing would be analysis taking sets of paired, related populations differing in Ne alone, Ge alone, and both together, and determining the variation in u. If there is an optimal u for a given Ge, but drift prevents small populations from achieving it, it is not necessarily the case that one would therefore expect the smaller populations to consistently undershoot this optimum; they may simply increase the variance.

    11. Diogenes: So, if a species has a small population size, replication is more accurate, and vice versa?

      Me: Basically, yes

      Or: basically, no. That is, the per-base mutation rate is lower in eukaryotes, but the per-genome rate is higher, and the per-generation rate (in multicellular forms) higher still.

    12. @ Allan (& Larry, Diogenes or anybody)

      Clearly, I am hopelessly over my head here. Could you recommend a text I could purchase to hopefully bring me up to speed?

    13. Hi again Allan,

      Lenski (of long-term E. coli evolution experiment fame) would seem (to my reading) to contradict your contention regarding the effect of population sizes.

      But like I said - I am hopelessly out of my depth and like I mentioned earlier would appreciate your recommendation for a current population genetics text that could bring me up to speed.

    14. Lenski (of long-term E. coli evolution experiment fame) would seem (to my reading) to contradict your contention regarding the effect of population sizes.


      I think Lenski is talking about variation in mutational robustness - the ability of populations to cope with the mutational load imposed by u - rather than population-dependent variation in the mutation rate itself. Of course, there will possibly be a 3-way relationship, as the degree of robustness encountered will affect the fitness benefit of changes to u.

      However, the problem with using digital organisms, as with using pop-genetic models, is that they ignore the significant mechanistic constraints applying between kingdoms. I'm inclined to think that those, rather than population size per se, account for a significant part of the differences. If adding fidelity costs base pairs, the prokaryote will suffer a greater cost than the eukaryote, because of immediate ecological competition with neighbours rather than summed small fitness effects in a notional 'stirred' monoculture.

    15. Tom,

      As for a text, I got most of my population genetics from Maynard Smith's primer Evolutionary Genetics and Futuyma's Evolutionary Biology (now in its 3rd edition as simply Evolution). Joe Felsenstein speaks very highly of this online course ;)

      I confess to stubbing my toe on anything but the most basic mathematics - most of my understanding is of a qualitative rather than a quantitative nature, but a grasp of the role of sampling (and hence of chance) in biology is important IMO. The central process is one of sampling, and will fix or eliminate alleles all by itself with a surprising inevitability.

  2. Don't have access to full text. From the abstract:

    future findings will significantly influence applications in agriculture, medicine, environmental science, and synthetic biology.

    Good to know this time and money wasn't wasted on pure research!


    1. Pure research provides the neutral mutations. Technology does the selecting. Or something like that.

  3. This passage is also bad for Behe, and shows he lost the argument about evolution of chloroquine resistance in malaria:

    it is now known that, in sufficiently large populations, trait modifications involving mutations with individually deleterious effects can become established in large populations when the small subset of maladapted individuals maintained by recurrent mutation acquire complementary secondary mutations that restore or even enhance fitness

    The death of irreducible complexity.

    1. And one of the most beautiful insights of modern evolutionary biology, IMHO.

    2. What happened Quest, meds ran out and you're now forgetting you already posted the same shit less than 5 minutes ago?

    3. Also, I have to say I'm sure it will go down well in your local church that the best thing in life for you is to watch two men lick each other's asses.

  4. There is one line missing from your post:
    H/T Alex Palazzo ;)

  5. I'll add one more thing. When cell biologists apply concepts from evolution to their topic of interest, they can start to generate insight - (and now for the shameless plug) - like our current commentary on ncRNAs: Non-coding RNA: what is functional and what is junk?

  6. I really enjoyed your paper and will forward it to colleagues who work with lncRNAs.

    1. @ apalazzo

      Out of curiosity, what proportion of ncRNA could you speculate as possibly proving to be eventually "functional"?

    2. So that question is tricky - I will give 2 answers.

      1) Abundance. Let's exclude tRNA and rRNA, which make up >95% of all the ncRNAs in a cell ... then let's omit snRNA, snoRNA, miRNAs, 7SL ... well, at the end there isn't much left! We estimate that there are on the order of 3x10^4 real lncRNA molecules in a typical mammalian tissue culture cell (+/- 1 order of magnitude). In contrast there are 10^6 rRNAs and 10^7 tRNAs. Most of these lncRNAs are represented by only a few types of lncRNA species, and these are probably most of the functional ones. Keep in mind that fewer than 1000 lncRNAs have been found to be present at >1 copy per cell. So my answer for #1 is that most ncRNAs in a cell are functional.

      2) Species. There is currently no clear list of lncRNAs. Various compilations range from 5k-50k different types of lncRNAs generated by the human genome. For real documented cases we can look to the LncRNA Database, which keeps track of functionally validated lncRNAs. Their current list for humans is 166. But even if we take a look at the most optimistic prediction, this only accounts for 2% of the genome. So when you read that 80% of the genome is transcribed, we are talking about ncRNAs that are present at 1 copy / 10^4 cells! (There are also other types of ncRNAs like eRNAs and circular RNA, but if you are interested read our paper - they again add up to very little of the total %.)

      So then you might ask what % of the genome is converted to functional ncRNA, and the answer is I can't give you a hard number - but likely less than 1%. That's my answer for question #2.
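      To make the abundance argument in answer #1 concrete, here is a rough back-of-the-envelope sketch in Python. It uses only the order-of-magnitude copy numbers quoted above; the dictionary and the `fraction` helper are illustrative names, and the smaller classes (snRNA, snoRNA, miRNA, 7SL) are left out, which is a simplifying assumption rather than a measurement.

```python
# Order-of-magnitude copy numbers per cell, taken from the comment above.
# Leaving out snRNA, snoRNA, miRNA and 7SL is a simplifying assumption.
copies_per_cell = {
    "rRNA": 1e6,
    "tRNA": 1e7,
    "lncRNA": 3e4,  # quoted as +/- 1 order of magnitude
}

def fraction(kind, counts):
    """Share of all counted RNA molecules that belong to `kind`."""
    return counts[kind] / sum(counts.values())

# Sweep the lncRNA estimate across the quoted uncertainty range.
for lnc_copies in (3e3, 3e4, 3e5):
    counts = dict(copies_per_cell, lncRNA=lnc_copies)
    share = fraction("lncRNA", counts)
    print(f"lncRNA copies {lnc_copies:.0e}: {share:.2%} of the counted pool")
```

      Even at the top of the stated range, lncRNA molecules stay a small minority of the counted pool, which is the point of the abundance answer.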

  7. P Z Myers just recently posted astute insights regarding striking psychological antipathy to random stochastic processes as a driving force in Biology.

    To summarize, evolution of so-called irreducible greater complexity can occur despite negative selection, irrespective of positive selection and in the absence of any selection (although given Joe's citation of 1/4N , I remain uncertain how selection can ever be determined to be absent)

    1. I had thought the paper discussed and the discussions it spawned would come up as its own post by Larry, but it illustrates the issue I brought up regarding selection and drift in the other thread in another context. Science Daily described the results this way:
      "By this measure, two-thirds of adult cancer incidence across tissues can be explained primarily by “bad luck,” when these random mutations occur in genes that can drive cancer growth, while the remaining third are due to environmental factors and inherited genes."
      Here again a random variable is split into components and these components are then treated as separate things. There is a baseline mutational process that produces cancer cells at a particular rate. Given some risk factors this rate increases (whether a cancer cell actually leads to a tumor is a matter of resampling again). But this is not a different process. I flip a biased coin 100 times and get 67 heads. How many of these heads are due to the bias and how many are due to the coin tossing as such? This question is not very useful (and has no answer). We can ask serious questions on how large the bias we estimate is, how secure we can be that there is one, etc.
      Worst of all the idea that two thirds of cancers are due to bad luck and one third is due to environmental factors reads as if it were possible to attribute any particular instance to one of the two categories.
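      The coin example can be made concrete with a short sketch of the "serious questions" mentioned above: how surprising 67 heads would be for a fair coin, and what range of bias is compatible with the result. This is only an illustration. The function names are made up here, and the Wilson score interval is one common choice of interval, not something the comment specifies.

```python
from math import comb, sqrt

def upper_tail(k, n, p=0.5):
    """Exact P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def wilson_interval(k, n, z=1.96):
    """Approximate 95% Wilson score interval for the heads probability."""
    phat = k / n
    denom = 1 + z**2 / n
    centre = (phat + z**2 / (2 * n)) / denom
    half = z * sqrt(phat * (1 - phat) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half

heads, tosses = 67, 100
print("P(at least 67 heads | fair coin):", upper_tail(heads, tosses))
print("range of bias compatible with the data:", wilson_interval(heads, tosses))
```

      Both numbers describe the coin as a whole. Nothing in the calculation assigns any individual head to "the bias" or to "the tossing", which matches the objection to the two-thirds/one-third split.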

    2. @Tom Mueller
      I have noticed the same problem, mostly, though not exclusively, with creationists. People have some strange psychological problem with chance. One problem I often run into in discussions is people take chance to be synonymous with uncaused. As in, there is no underlying explanation for why it happened the way it did.

      Take the common term "random mutations". Even if we ignore creationists, there are still many people who think the random part of that term means there is no underlying cause for why a particular mutation happened at some particular location in the genome.

      Why did that A turn into a C in the next generation? It was random. Oh, so you're saying it just happened for no reason? No, that is not what random means. What it means is we can only predict that it will happen in terms of probability distributions, it doesn't mean there was no cause of that A changing into a C. There are well-known chemical reasons for why A's are replaced with C's. Sooner or later, maybe a pollutant of some sort that moves around in the cell nucleus through Brownian motion makes its way to a piece of DNA and causes a mutation. Is the path it took without cause? No, obviously not. It jitters around because other molecules are bumping into it, forcing it in the direction it takes.
      Why did it take that direction and not another? Well because that's how the other molecules bumped into it.

      There's clearly an actual causal chain that links the individual mutation to a whole host of events and interactions before it. It's just that they can't be predicted because we can't know the state of the system to such an accuracy that we can actually make such predictions. As a consequence, we will always be forced to model it in terms of probabilities instead of Newtonian billiard-ball mechanics. We describe the outcome as "random" or "stochastic", possibly associated with certain biases in the outcome (that transitions happen more frequently than transversions).

    3. @Simon Gunkel
      "Worst of all the idea that two thirds of cancers are due to bad luck and one third is due to environmental factors reads as if it were possible to attribute any particular instance to one of the two categories."

      I think it's even worse, because it seems a distinction is being made between something having an actual physical and material cause (environmental factors) and the other having no cause at all, as if luck were some kind of force of nature that just makes things happen in some way.

      All mutations have a chemical and physical basis. Short of certain quantum effects, DNA doesn't just sit and spontaneously alter itself without there being a physical cause of such changes.

    4. @ Mikkel Rumraket Rasmussen & Simon Gunkel

      re: P Z Myers' "the importance of luck"

      I am experiencing this dawning and horrible realization that I must be suffering from this very same psychological antipathy which would explain some of my current difficulties to wrap my head around Neutral Theory.

      I am very grateful for the patience and indulgence offered me by others present.

    5. I've got an opposing bias. Everything in nature is both stochastic and chaotic. If you have a model that is deterministic and/or non-chaotic, then you have used LLN and linear approximations. It's on you to show me that the errors arising from these approximations are not too large.

      It's worth noting that evolution is stochastic, no matter whether neutral theory holds or not (the evidence points towards it).

      Here's a simple way to start considering this: Let's say you have a population of N haploid clonal organisms. And we start out with no variation and we assume no novel mutations. As time goes on some of the individuals will die, leaving no offspring, while some will split, leaving two. We look at two events and combine them: The next individual to die and the next individual to split. If population size is roughly constant these steps won't be that far apart in time.
      Since all the individuals are identical by definition, we pick an individual to die from the population with a probability of 1/N. And we pick an individual to split with a probability of 1/N.
      So if we choose one individual, let's call him Bob, we can wonder how likely it is that all individuals at some point will be descendants of Bob. Of course we can find the answer simply from symmetry considerations (at some point all individuals will be descendants of one of our N individuals and they are all the same, so 1/N). But we can look somewhat closer. We will ignore cases where either the next individual to die and the next individual to split are both descendants of Bob, or where neither is. If n individuals are descendants of Bob, i.e. Bobness has a frequency x=n/N then the probability of picking two Bobs is x² and that of picking two non-Bobs is (1-x)². That of picking a Bob to die and a non-Bob to split is x(1-x) and that of picking a Bob to split and a non-Bob to die is x(1-x) as well! So if we discard the cases as discussed above we find that the probability of increasing x by 1/N is 0.5 and that of decreasing x by 1/N is 0.5. Unless x is 0 or 1, in which case it stays the same. We have basically described the population genetics in this case as a series of fair coin tosses.
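      The fair-coin description above can be checked with a small simulation. What follows is a minimal sketch in Python, not anyone's published code: the function names, the population size of 10 and the number of trials are all arbitrary choices made for illustration. It follows the number of Bob's descendants, skips the events that leave it unchanged, and counts how often the lineage takes over.

```python
import random

def step_probabilities(x):
    """Chance that Bob's frequency x goes up or down in one death+split pair."""
    up = x * (1 - x)    # a Bob splits, a non-Bob dies
    down = x * (1 - x)  # a Bob dies, a non-Bob splits
    return up, down

def bob_takes_over(n_total, rng):
    """Follow Bob's lineage, ignoring no-change events, until it hits 0 or n_total."""
    n_bob = 1
    while 0 < n_bob < n_total:
        # Once no-change events are discarded, each step is a fair coin toss.
        n_bob += 1 if rng.random() < 0.5 else -1
    return n_bob == n_total

def estimate_fixation(n_total, trials, seed=0):
    """Fraction of simulated histories in which everyone descends from Bob."""
    rng = random.Random(seed)
    wins = sum(bob_takes_over(n_total, rng) for _ in range(trials))
    return wins / trials

print("simulated:", estimate_fixation(n_total=10, trials=20000))
print("symmetry argument, 1/N:", 1 / 10)
```

      The simulated fraction should land close to 1/N, in line with the symmetry argument.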

      Now, so far we have assumed that all individuals are equal, but we can of course introduce selection here. If we note that the fitness of any individual is twice the probability that it will eventually split rather than die and introduce s=ln (mean fitness of some class of individuals we are interested in/mean fitness of all other individuals) then we find that the probability of an increase is 1/(1+e^-s) and likewise for a decrease 1/(1+e^s). We have now found a model using a biased coin rather than a fair coin, with s telling us how strong the bias is.
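      The biased-coin version can be sketched the same way. The step probabilities below are the ones given above. The takeover formula is the standard gambler's-ruin result for a random walk with a constant bias, which is an addition here and assumes s stays the same throughout. The function names and the example values of s and N are arbitrary.

```python
from math import exp

def step_up_probability(s):
    """Chance the class of interest gains one individual at a step."""
    return 1 / (1 + exp(-s))

def step_down_probability(s):
    """Chance the class of interest loses one individual at a step."""
    return 1 / (1 + exp(s))

def fixation_probability(s, n_total, n_start=1):
    """Gambler's-ruin chance of reaching n_total before 0, assuming constant s."""
    if s == 0:
        return n_start / n_total  # fair coin: the neutral 1/N result
    ratio = exp(-s)  # equals step_down_probability(s) / step_up_probability(s)
    return (1 - ratio**n_start) / (1 - ratio**n_total)

for s in (0.0, 0.001, 0.01, 0.1):
    print(f"s={s}: coin lands 'up' with p={step_up_probability(s):.4f}, "
          f"takeover chance from one copy in N=1000: "
          f"{fixation_probability(s, 1000):.5f}")
```

      For small s the coin is very nearly fair, yet the takeover chance can still differ noticeably from 1/N once s times N is no longer small.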


    6. It's worth noting that the bias tends to be small. Casino dice are tested prior to being used, so that they are rather fair. Because there are constraints to how often you can roll them prior to shipping, small biases can't be eliminated. Most of evolution hinges on biases that are so small that they wouldn't disqualify a die from Vegas (and that's not a contentious statement, the break-off point of 4Ne is generally far below what's accepted there - even cases of strong selection are usually closer to a fair coin than casinos require).

      Neutral theory is about how s is distributed among mutations that actually get fixed. Both neutral theory and previous views were stochastic, but neutral theory states that for a large chunk of substitutions s is so small that its effect can be neglected. For this we have to diverge a bit. Mutations produce novel variation and through resampling that variation is eventually lost. Selection generally speeds this up (although for diploids, heterozygote advantage can slow it down). That means that the amount of variation found in populations tells us something about the distribution of values for s. Before molecular data was available, estimates of how much variation there was relied on identifiable phenotypic traits. When molecular data became available these estimates were revised and s values tended to get closer to 0 - simply because a lot of variation exists for synonymous sites or junk DNA, which do not produce phenotypic effects and therefore weren't taken into account before.

      Now the alternatives to neutral theory basically boil down to heterozygote advantage (and since that doesn't help with haploids, we can have some frequency-dependent selection scenarios that do the same thing). But the cases where this is well documented are few and far between and it makes a lot of sense to attribute variation at a synonymous site to the alleles having s close enough to 0 to make no noticeable difference - after all the synonymous site doesn't produce a difference in the protein coded for.

    7. Simon, I'm a little confused about your statements. If "neutral theory is about how s is distributed among mutations that actually get fixed", but "the alternatives to neutral theory basically boil down to heterozygote advantage ...." Then we seem to have an apples-oranges problem.

      Heterozygote advantage may be a plausible explanation for molecular polymorphism. But not for substitutions (at least not much). Which are we talking about?

    8. I don't see the problem. One of these statements is about how we generally describe the theory and one is a statement about predictions we can derive from it. After all a distribution of s for fixed mutations implies a distribution of s for novel mutations via our models and a distribution of s for novel mutations allows estimates for how much polymorphism there should be.
      IIRC the primary data that led to neutral theory was the observed amount of polymorphism. I'm not sure whether we have data on substitution rates that does not rely on the effective neutrality of at least a part of the genome.

  8. Invoking P Z Myers just now provided pause to reconsider EvoDevo.

    I feel an embarrassing rewrite of one of my high-school assignments coming on.

    Glen’s recent correction of the Demuth Plos paper just cited on this blog provided my “aha moment”.

    This part I get:
    “…most of the [genetic] changes are non-adaptive. First, since most (~90%) of the genome is junk, and most of the differences are located in junk DNA, it follows that most of the new alleles had no effect on function.”

    Now please allow me to play devil’s advocate here.

    advocatus diaboli ON

    Yes - ~90% of genomic change occurs within junk DNA. But wouldn’t that mean 90% of C/H differences are entirely uninteresting and unworthy of further consideration?

    Of the remaining ~10% of genomic change, most (we have no way of really determining how much, due to Joe’s citation of 1/4N) is neutral.

    OK… so far so good. So why not say that these presumed neutral alleles are similarly uninteresting from an evolutionary perspective.

    For example, humans can be blonde or red-head, Chimpanzees can’t, so who cares?! Aren’t such kinds of neutral variation only passingly interesting but unimportant when considering what makes a Chimpanzee different from a Human? (Remember I am playing Devil’s advocate here!)

    That leaves precious little of these genome differences (let’s say less than 1%) that can be attributed to Natural Selection or adaptation.

    So what makes a Chimpanzee different from a Human? Well most of the difference is about junk DNA (uninteresting) and most of the remainder is due to neutral alleles (similarly uninteresting).

    Does that mean mathematical considerations would oblige us to concede that the majority of crucial differences between Chimpanzees and Humans (that made them different species) are due to random stochastic processes and non-adaptionist forces as described by Neutral Theory?

    A Devil’s advocate would declare “no!”

    Crucial differences between Chimpanzees and Humans, such as opposable thumbs, and possibly also modifications in the ankle or foot that allow humans to walk on two legs, not to mention modifications to the throat allowing vocalization, could be attributed to changes caused by Natural Selection in just ONE enhancer, HACNS1, that clearly were due to selection and not drift.

    Those 13 nucleotide changes in an 81 nucleotide stretch of enhancer represent quite an anomalous mutation rate that could not be attributed to drift.

    In other words, the champions of EvoDevo make a very convincing adaptationist point! Neutral Theory is important for getting some evolutionary ball to start rolling – the first few changes to HACNS1 were random and stochastic. But those few changes had a fortuitous selection advantage which brings us right back to an adaptationist full circle!

    Now repeat the above logic with an incredibly small repertoire of enhancer sequences and you have the adaptationist difference between a Chimpanzee and a Human.

    What is even more remarkable, the genetic differences distinguishing Chimpanzees from Humans becomes remarkably small and humbling from an anthropocentric perspective.

    advocatus diaboli OFF

    Such thinking would justify the following activity I designed for my students a few years back before I was even aware of this blog and was even aware of the significance of Neutral Theory.

    Like I said, I feel an embarrassing rewrite of this assignment coming on.

    I was hoping for some input from others more adept than I.

  9. ps - I did not include, but should have repeated, that Neutral Theory is correct:

    Alleles can be fixed despite negative selection, irrespective of positive selection and in the absence of any selection.

    That said, the Devil's advocate above would remain undeterred, that is the part I am having difficulty with.

  10. Alex should have sent his paper to Nature Structural and Molecular Biology before they published a whole issue on non-coding RNAs.

    1. Thanks for the heads up. Reading that review next to ours is very mind numbing. You'll note that they never address issues concerning WHETHER a lncRNA is functional, but then midway through their manuscript they write:
      "The next phase in the annotation of lncRNAs is to determine structure-function relationships" - i.e., what the function IS.

      Another funny paragraph is this one:
      "Subcellular localization of lncRNAs provides important circumstantial information for hypothesis generation. For example, lncRNAs that are associated with chromatin or specific subnuclear domains and have low coding potential are less likely to encode proteins and more likely to have regulatory or structural functions. Where possible, it will be useful to determine and record the cell- and tissue-specific locations of lncRNAs, preferably by in situ hybridization methods as a first step toward determining the distributions and exploring the functional roles of individual lncRNAs."

      The assumption being made here is that subcellular localization and tissue-specific expression imply function. This is exactly what we caution AGAINST.

    2. And to be clear, I'm referring to the review by Mattick and Rinn in that issue of NSMB.

  11. Hi Larry. Did you see William Provine's new book?