Friday, March 22, 2013

Estimating the Human Mutation Rate: Direct Method

This is the fourth in a series of posts on human mutation rates and their implication(s). The first three were ...

What Is a Mutation?
Estimating the Human Mutation Rate: Biochemical Method
Estimating the Human Mutation Rate: Phylogenetic Method

There are basically three ways to estimate the mutation rate in the human lineage. I refer to them as the Biochemical Method, the Phylogenetic Method, and the Direct Method.

The Biochemical Method is based on our knowledge of biochemistry and DNA replication as well as estimates of the number of cell divisions between zygote and gametes (egg and sperm). It gives a value of 130 mutations per generation. The Phylogenetic Method depends on the fact that most mutations are neutral and that the rate of fixation of neutral alleles is equal to the mutation rate. It also relies on a correct phylogeny. The Phylogenetic Method gives values between 112 and 160 mutations per generation. These two methods are pretty much in agreement.
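The Phylogenetic Method's key assumption, that neutral alleles are fixed at a rate equal to the mutation rate, follows from a standard neutral-theory argument. A sketch, assuming a diploid population of constant size N and a per-site neutral mutation rate μ (symbols introduced here for illustration, not taken from the papers):

```latex
k \;=\; \underbrace{2N\mu}_{\text{new neutral mutations per generation}} \;\times\; \underbrace{\frac{1}{2N}}_{\text{probability that each one is fixed}} \;=\; \mu
```

Population size cancels out, which is why the method can work from divergence times without knowing N.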

The Direct Method involves sequencing the entire genomes of related individuals (e.g. mother, father, child) and simply counting the new mutations in the offspring. You might think that the Direct Method gives a definitive result that doesn't rely on any assumptions and should therefore be the most accurate, making the other two methods irrelevant.
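In principle the counting step really is that simple: a new mutation shows up as an allele in the child that neither parent carries. The Python sketch below illustrates only that naive logic. The genotype dictionaries, site IDs, and the function name naive_de_novo_sites are hypothetical, and real pipelines must also cope with sequencing errors, low coverage, and regions that can't be mapped, which is exactly where the difficulties lie.

```python
from typing import Dict, List, Tuple

Genotype = Tuple[str, str]  # the two alleles called at one site, e.g. ("A", "G")


def naive_de_novo_sites(mother: Dict[str, Genotype],
                        father: Dict[str, Genotype],
                        child: Dict[str, Genotype]) -> List[str]:
    """Return site IDs where the child carries an allele seen in neither parent.

    Hypothetical helper: it only catches alleles absent from both parents and is
    not a full Mendelian-consistency check.
    """
    hits = []
    for site, kid in child.items():
        # Skip sites that were not called in both parents; nothing can be concluded.
        if site not in mother or site not in father:
            continue
        parental = set(mother[site]) | set(father[site])
        if any(allele not in parental for allele in kid):
            hits.append(site)
    return hits


# Hypothetical toy genotypes for a trio.
mother = {"site1": ("A", "A"), "site2": ("C", "C"), "site3": ("G", "T")}
father = {"site1": ("A", "G"), "site2": ("C", "C"), "site3": ("G", "G")}
child = {"site1": ("A", "G"), "site2": ("C", "T"), "site3": ("T", "G"), "site4": ("A", "C")}

hits = naive_de_novo_sites(mother, father, child)
print(hits)  # ['site2']
```

Only site2 is flagged: the child's T is absent from both parents, while site4 is skipped because the parents have no calls there.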

This would be true if the Direct Method were as easy as it sounds, but things are more complicated.

The first paper to be published was by Xue et al. (2009). They looked at the sequences of Y chromosomes from two men separated by 13 generations (6 generations in one lineage and 7 in the other). The Y chromosomes differed by four mutations in 10.15 × 10⁶ bp.¹ These are neutral mutations, and the rate works out to 3.0 × 10⁻⁸ mutations per base pair per generation.
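The arithmetic behind that figure, as a minimal Python sketch: the four mutations accumulated over the 6 + 7 = 13 generations of descent separating the two Y chromosomes, so the count is divided by the analysed length times the number of generations. The function name is illustrative only.

```python
def rate_per_bp_per_generation(mutations: int, bp_analysed: float, generations: int) -> float:
    """Mutations per base pair per generation from a pedigree comparison."""
    return mutations / (bp_analysed * generations)


# Xue et al. (2009): 4 mutations in 10.15 Mb of usable Y sequence, 6 + 7 = 13 generations.
mu = rate_per_bp_per_generation(4, 10.15e6, 6 + 7)
print(f"{mu:.2e}")  # 3.03e-08
```

The unrounded value, 3.03 × 10⁻⁸, is what the text rounds to 3.0 × 10⁻⁸.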

If we assume an average of 400 cell divisions per generation (male lineage), then this gives a mutation rate of 0.75 × 10⁻¹⁰ mutations per bp per replication. This isn't far from the value of 1.0 × 10⁻¹⁰ that we used in the Biochemical Method.

If we apply this mutation rate to the entire genome, there will be 96 mutations in each sperm cell and 7 in each egg cell, for a total of ...
103 mutations per generation
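A sketch of how the per-replication rate and the 96 + 7 = 103 total follow from the per-generation rate. The 400 male-line divisions come from the text; the haploid genome size of 3.2 × 10⁹ bp and the figure of about 30 divisions in the female line are assumptions made here (chosen because they reproduce the numbers above), not values taken from Xue et al.

```python
HAPLOID_GENOME_BP = 3.2e9  # assumption: approximate haploid genome size
MALE_DIVISIONS = 400       # cell divisions per generation in the male line (from the text)
FEMALE_DIVISIONS = 30      # assumption: approximate divisions per generation in the female line

mu_gen = 3.0e-8                   # mutations per bp per generation (male line, Xue et al.)
mu_rep = mu_gen / MALE_DIVISIONS  # mutations per bp per replication

sperm = mu_gen * HAPLOID_GENOME_BP                   # mutations carried by each sperm
egg = mu_rep * FEMALE_DIVISIONS * HAPLOID_GENOME_BP  # mutations carried by each egg

print(f"{mu_rep:.2e} mutations per bp per replication")
print(round(sperm), round(egg), round(sperm + egg))  # 96 7 103
```

Because the egg figure reuses the per-replication rate, any error in the male division count propagates into both halves of the total.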



The problems with this calculation have to do with deciding how many real mutations there are. In this particular experiment, the Y chromosomes were extracted from cells in culture. The authors actually found 23 differences between the two Y chromosomes, but only 12 of these were confirmed by resequencing. Of these, only four were confirmed by sequencing DNA taken directly from the donors. (The other eight mutations occurred during growth of the cell lines.) The authors are confident that they have not missed any mutations, and I suspect that the number of false negatives is, in fact, close to zero.

This value (103 mutations per generation) is on the low end of the values calculated previously, but the error bars are large because only four mutations were counted.
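To get a feel for how wide those error bars are, one option (a choice made here for illustration, not the authors' analysis) is an exact Poisson confidence interval on the count of four mutations, found by bisection on the Poisson CDF and then scaled linearly to the 103-mutations-per-generation estimate. This sketch treats the four mutations as a Poisson count and ignores uncertainty in the analysed length and the number of generations.

```python
import math


def poisson_cdf(k: int, lam: float) -> float:
    """P(X <= k) for X ~ Poisson(lam)."""
    term = math.exp(-lam)
    total = term
    for i in range(1, k + 1):
        term *= lam / i
        total += term
    return total


def bisect(f, lo: float, hi: float, iters: int = 200) -> float:
    """Root of f on [lo, hi], assuming f(lo) and f(hi) have opposite signs."""
    f_lo = f(lo)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def exact_poisson_ci(k: int, conf: float = 0.95):
    """Exact (Garwood) two-sided confidence interval for a Poisson mean, given count k."""
    alpha = 1.0 - conf
    hi_bound = 10.0 * k + 10.0  # comfortably above both limits for small k
    lower = 0.0 if k == 0 else bisect(
        lambda lam: (1.0 - poisson_cdf(k - 1, lam)) - alpha / 2, 0.0, hi_bound)
    upper = bisect(lambda lam: poisson_cdf(k, lam) - alpha / 2, 0.0, hi_bound)
    return lower, upper


lower, upper = exact_poisson_ci(4)
print(f"count: {lower:.2f} to {upper:.2f}")                    # count: 1.09 to 10.24
print(f"per generation: {103 * lower / 4:.0f} to {103 * upper / 4:.0f}")  # per generation: 28 to 264
```

Under these assumptions, four observed mutations are compatible with anything from about 28 to about 264 mutations per generation, which covers every other estimate in this post.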

Three other papers have appeared recently.²

1. Roach et al. (2010) sequenced genomes from a family of four (mother, father, two children). They found 33,937 potential mutations but confirmed only 28 mutations in the two children. After making some adjustments for false negatives, they estimate that the average number of mutations per diploid genome per generation was ...
70 mutations per generation
This is about half the value estimated by the Biochemical and Phylogenetic Methods. It's not clear to me how they estimated the true number of mutations. What is clear is that it is not easy to count mutations when dealing with sloppy sequences.

2. Conrad et al. (2011) looked at two sets of parents and offspring (trios). They used cell lines so they had to distinguish between germline mutations and somatic cell mutations. One of the offspring had 49 mutations and the other had 35 mutations. There were 1,586 somatic cell mutations that had to be eliminated. After correcting for false negatives, they estimate 60 mutations in one child and 45 mutations in the other. Since only 2.555 Gb were analyzed, this works out to ...
75 mutations per generation
56 mutations per generation
These values are lower than what we expected from previous studies. The authors determined that 92% of the mutations in one offspring came from the father, but only 36% of the mutations in the other offspring did. This is not reasonable, and neither is the discrepancy in total mutations between the two offspring. It suggests that there are a lot of errors in this study.

3. The most comprehensive study so far is from Kong et al. (2012). These authors looked at 78 Icelandic families whose genealogies were well known. They sequenced the genomes of 219 distinct individuals and found an average of 63.2 mutations in each child. Since they only looked at 2.63 Gb, this translates to ...
77 mutations per generation
Individual values vary over a wide range: the lowest count reported is 58 and the highest is 129. This study suffers from the same problem as the other direct sequencing experiments; namely, it's difficult to decide which of the differences are real mutations and which are artifacts. The authors claim that their false negative rate is only 2%.
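The genome-wide figures quoted for Conrad et al. and Kong et al. come from extrapolating each count from the portion of the genome that was actually analysed. A minimal sketch, assuming a haploid genome size of 3.2 Gb (an assumption made here because it reproduces the quoted numbers; the function name is illustrative):

```python
HAPLOID_GENOME_GB = 3.2  # assumption: approximate haploid genome size used for scaling


def scale_to_genome(observed: float, analysed_gb: float) -> float:
    """Extrapolate a de novo mutation count from the analysed fraction to the whole genome."""
    return observed * HAPLOID_GENOME_GB / analysed_gb


studies = {
    "Conrad et al. (2011), child 1": (60, 2.555),
    "Conrad et al. (2011), child 2": (45, 2.555),
    "Kong et al. (2012), mean": (63.2, 2.63),
}

scaled = {name: scale_to_genome(count, gb) for name, (count, gb) in studies.items()}
for name, value in scaled.items():
    print(f"{name}: {value:.0f} mutations per generation")
# Conrad et al. (2011), child 1: 75 mutations per generation
# Conrad et al. (2011), child 2: 56 mutations per generation
# Kong et al. (2012), mean: 77 mutations per generation
```

This linear scaling assumes the unanalysed regions mutate at the same rate as the analysed ones, which may not hold for repetitive or poorly covered sequence.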

The whole genome sequencing papers have been widely reported as giving a result that is half the mutation rate we estimated previously. This is a problem because the mutation rate is used in many calculations. We'll discuss the implications in later posts.

1. The Y chromosome is 24 Mb but they couldn't analyze regions of repeats and some other regions weren't well covered.

2. Please let me know if I missed any papers.

Conrad, D.F., et al. (2011) Variation in genome-wide mutation rates within and between human families. Nature Genetics 43:712-715. [doi: 10.1038/ng.862]

Kong, A., et al. (2012) Rate of de novo mutations and the importance of father's age to disease risk. Nature 488:471-475. [doi: 10.1038/nature11396]

Roach, J.C., et al. (2010) Analysis of genetic inheritance in a family quartet by whole-genome sequencing. Science 328:636-639. [doi: 10.1126/science.1186802]

Xue, Y., et al. (2009) Human Y chromosome base-substitution mutation rate measured by direct sequencing in a deep-rooting pedigree. Current Biology 19:1453-1457. [doi: 10.1016/j.cub.2009.07.032]


  1. There were also estimates in one of the 1000 Genomes Project papers (Nature 467, 1061–1073, October 2010): 49 and 35 detected SNPs, giving 1.2 × 10⁻⁸ and 1.0 × 10⁻⁸ mutations/bp/generation.

  2. Larry, I thought you banned this troll. He is polluting an interesting post with comments that just consist of negative tone and almost no content.

    But let's take the phrase "when genetic mutations go against evolution" as a starting point. This sounds like a non sequitur, but I think the intended meaning was "reversions" (i.e. when a 2nd mutation at the same site reverts the nucleotide back to its original state) - which would make it a valid question (despite the extremely rude phrasing) that can actually contribute to the discussion:

    Reversions are a well understood phenomenon and are for instance important in HIV evolution, where the mutation rate is high and selection is strong. In the studies Larry cited, any reversions that occur will be missed, resulting in underestimation of rate, but the question is whether reversions are common enough to affect the numerical estimates. From the biochemical approach we know that they cannot be _very_ common, so the effect will be minimal (i.e. will not affect the estimates at the level of precision given) unless there is a mechanism causing the rates to be hugely elevated in specific regions. Regions in which the mutation rate is elevated are called mutational hotspots and quantitative estimates of this effect are well established. Perhaps someone has the numbers handy; I would be _very_ surprised if mutational hotspots are strong enough to cause enough reversions to alter these estimates.

    So nice try, John, but the answer is that reversions are not something that any of these researchers will have failed to think about, or that creates a problem for these analyses. Now if only you'd learn to phrase your questions a little more respectfully you might get serious answers from those of us who investigate these things for a living more often. But of course I understand that serious answers are the last thing you are after.

  3. Ah, I see the troll post I was responding to has disappeared in the meantime. My post above is an answer to a potentially legitimate question about reversions. I thought I'd post it in case other readers are wondering about that.

    1. The comments are apparently hand-deleted, so his stuff persists for a bit.

  4. A new article that you all might find interesting:

    1. I'm not going to discuss mutation rates in mitochondria. It's clear that they are unreliable and they've become irrelevant.

  5. There is a danger of circularity in feeding mutation rates from some of these methods into other calculations (which is why it is useful that there is more than one). The phylogenetic method, for example, uses a fossil-based time for divergence, but if the mutation rate from that is then used to revise the time of the Homo-Pan split ...

  6. it's difficult to decide which of the differences are real mutations and which ones are artifacts

    Not a real issue. Resequencing 100 or so sites by a different method is easy and completely eliminates the problem.

    1. In a typical experiment there are over one million potential differences. The most obvious sequencing errors can be eliminated if you have extensive coverage (6 X). But this isn't always the case with short reads.

      One is left with about 40,000 good candidates. I'm glad you think it's easy to resequence all those sites using a different method.

      Of course this doesn't help at all with false negatives.

    2. One is left with about 40,000 good candidates

      Can you elaborate? I admit I'm not an expert, but I haven't come across a figure this high in any publication or personal conversation. Seriously? "Good candidates"? Good coverage still leaves you with ~40,000??? On what basis is that number reduced to the typical ~100?

  7. Thanks, Larry, for this useful series of posts. I assume that in the phylogenetic method the authors are allowing for coalescent effects which will lead to a divergence of the gene copies that is greater than the time back to the fork on the species trees. If they do not do that they will get too high a mutation rate.

    Another interesting issue is whether human mutation rates in recent years are higher than they have been in the longer term. The estimated mutation rates would seem to imply too high a mutational load (and creationists have noted this and are crowing that this shows that humans are deteriorating rapidly and could not have been around longer than, oh say, 6008 years). An elevated mutation rate could be due to being in an industrial society, or perhaps even just to having more of our reproduction done by older males than used to be the case. Comparison of mutation rates on branches of the phylogeny that do not include humans would be interesting as a check on whether human mutation rates are elevated over their (pre-)historic levels.

    1. Yet - taken at face value - the direct measures, necessarily recent, give a lower rate than the 'overall' methods that take the full divergent period.

    2. You're right. Still, all these estimates are somewhat too high for the mutational load implied.

    3. I don't know the derivation in detail, but 'harmful' mutations of the order of 2 per individual are often quoted, which gives unreasonable figures for the proportion of the population that should fail to reproduce, and the numbers of offspring viable females would need to produce to offset those losses. However, it's not clear why as many directly harmful mutations should be thought to get through the filter of gametogenesis and early post-fertilisation expression. The proportion of harmful genes expressed late enough to allow the individual to exist as a counted non-reproducer in the population must be small?

    4. I think that most of the "harmful" mutations are often recessives. So I'm not sure that these estimates are too decoupled from what we observe. One study found that 31% of all pregnancies spontaneously terminated (22% occurring before clinical confirmation of pregnancy). 95% of the couples in the study went on to have a child within 2 years, so it was not an effect specific to low-fertility couples.

      Another study claims that ~60% are due to chromosomal abnormalities, leaving 40% undiagnosed. There is a lot of room where deleterious mutations could be involved.

    5. Larry is missing an important study "Estimating the human mutation rate using autozygosity in a founder population (2012)," which I'm surprised Joe Felsenstein didn't mention because it's from his department.

      I would say this paper is more robust because it didn't just look at the bottom of the tree. I think there were regions of 100Mb autozygosity in the Hutterites studied but still some uncertainty as to what constitutes the common ancestor with respect to these regions.

    6. I think that most of the "harmful" mutations are often recessives

      Perhaps - recessives are certainly capable of getting through the filter; I don't think there is a mechanism constraining new mutations to be recessive?

      But I think the assumption may nonetheless be a little high. On a rate of 130 per individual, 1/75th of mutations being adjudged 'harmful' would indicate that 1.33% of bases cannot tolerate a SNP without harm, or, spreading the risk out (and assuming 96% junk) that 1/3rd of all SNPs in non-junk are harmful.

    7. 1. Wouldn't recessives be far more common, though, since if you still have one working copy for a gene....

      2. Isn't a big portion of that 'junk' structural, meaning that mutations to it could be deleterious?

  8. We had a discussion about evolution at my work lunch the other day. An evolution "skeptic" made a claim that there are never mutations in humans. A quick Google search on my phone brought up this article so I was able to rebut him.
    Great read, thanks!

  9. Wow! This material relates directly to my research paper about Genetic Diversity among Living Humans! The most interesting thing about your review Dr. Moran is citing Kong's 2012 study and the age dependence of the mutation rate. Very riveting and state of the art work.