Monday, November 24, 2025

Evolution explains the differences between the human and chimpanzee genomes

If you align similar regions of the human and chimpanzee genomes, they turn out to be about 98.6% identical in nucleotide sequence. The total number of differences amounts to about 44 million base pairs (bp). If the differences are due to mutations that have occurred since divergence from a common ancestor, then there would be about 22 million mutations in each lineage.

The mutation rate is approximately 100 new mutations per generation. Most of these will be neutral mutations that have no effect on the survival of the individual and almost all of them will be lost within a few generations. A small number of these neutral mutations will become fixed in the population and it's these fixed mutations that produce most of the changes in the genome of evolving populations. According to the neutral theory of population genetics, the number of fixed neutral mutations corresponds to the mutation rate. Thus, in every evolving population there will be 100 new fixed mutations per generation.

This means that fixation of 22 million mutations would take 220,000 generations. The average generation time of humans and chimps is 27.5 years, so this corresponds to about 6 million years. That's close to the time that humans and chimps diverged according to the fossil record. What this means is that evolutionary theory is able to explain the differences between the human and chimp genomes: it has explanatory power. It could have been falsified if the number of differences between the two genomes had been very different.
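The arithmetic above can be checked with a few lines of Python. This is only a back-of-the-envelope sketch using the figures quoted in this post; the variable and function names are made up for illustration.

```python
# Back-of-the-envelope divergence-time estimate using the figures in the post.
PERCENT_DIFFERENCE = 0.014       # 100% - 98.6% identity in aligned regions
GENOME_SIZE_BP = 3.1e9           # approximate genome size used in the post
MUTATIONS_PER_GENERATION = 100   # taken as the neutral fixation rate per generation
GENERATION_TIME_YEARS = 27.5     # average generation time used in the post


def divergence_time_years(total_differences, mutations_per_generation,
                          generation_time_years):
    """Years since divergence, assuming the differences are split evenly
    between the two lineages and fix at a constant rate."""
    per_lineage = total_differences / 2
    generations = per_lineage / mutations_per_generation
    return generations * generation_time_years


raw_differences = PERCENT_DIFFERENCE * GENOME_SIZE_BP  # about 43.4 million
rounded_differences = 44e6                             # the post rounds up

years = divergence_time_years(rounded_differences,
                              MUTATIONS_PER_GENERATION,
                              GENERATION_TIME_YEARS)
print(f"Raw differences: {raw_differences / 1e6:.1f} million bp")
print(f"Divergence time: {years / 1e6:.2f} million years")
```

With the rounded value of 44 million differences, the function returns 6.05 million years, which is the "about 6 million years" quoted above.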

There is no other explanation that accounts for the data.

Background

The technology for sequencing proteins was developed by Fred Sanger in the 1950s and got him his first Nobel Prize in 1958. By the beginning of the 1960s homologous proteins such as hemoglobin and cytochrome c had been sequenced from a number of different species. The amino acid sequences of these proteins could be aligned and it soon became evident that the proteins from some species were much more similar than the proteins from other species. Furthermore, the similarities seemed to correspond to the inferred evolutionary relationships.

It was possible to construct trees showing the relationship between those amino acid sequences. One of the earliest trees was published by Emanuel Margoliash using cytochrome c proteins (Margoliash, 1963). The figure (above) illustrates the relationship of the various sequences from different species. The numbers on the branches represent the number of different amino acids between the sequences at the tips of the branches and the closest node. (I'm showing a later version here from Fitch and Margoliash (1967). This is a very famous tree that's found in many textbooks. The version shown here is from Mulligan (2008).)

Note that there's only one difference between the sequence of the human protein and that of the monkey, and there are many more differences between the human and other mammals. The differences between humans and insects and fungi are even greater. This strongly suggests that what we're looking at is an evolutionary relationship between species. The remarkable and unexpected result is that the number of changes seems to correspond to the time of divergence of these various species, and Margoliash noted that this relatively constant rate of change over time is what makes it possible to construct a robust tree.

Similar trees were constructed from hemoglobin sequences by Emile Zuckerkandl and Linus Pauling, and they were the first ones to use the term "molecular clock" to identify the relatively constant rate of change in amino acid sequences over time. The history of this idea is fascinating and I strongly recommend reading Gregory Morgan's article from 1998 (Morgan, 1998). The molecular clock concept is clearly one of the most important discoveries in evolution in the last half of the 20th century.

Explaining the molecular clock was challenging in the early 1960s since it didn't seem to be consistent with evolution by natural selection. If all the amino acid substitutions are due to beneficial alleles that become fixed by natural selection in each species, then there didn't seem to be any obvious reason why such changes should be constant over long periods of time. The explanation came from the development of the neutral theory of evolution by Kimura and others in the late 1960s.

The neutral theory was based on observations that most changes in the amino acid sequences of proteins were neutral with respect to fitness. This meant that fixation of neutral alleles was due to random genetic drift. Population genetics had shown that the rate of fixation was dependent only on the mutation rate, so that as long as the mutation rate per generation is relatively constant over time, there should be a relatively constant rate of change giving rise to an approximate molecular clock. [The Modern Molecular Clock]
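The statement that the rate of fixation depends only on the mutation rate can be written out as a short derivation. This is the standard textbook argument for neutral alleles in a diploid population, sketched here under the usual simplifying assumptions; the symbols $N$ (population size), $\mu$ (neutral mutation rate per genome copy per generation), and $k$ (rate of fixation) are notation introduced for this illustration.

```latex
\begin{align}
\text{new neutral mutations arising per generation} &= 2N\mu \\
\text{probability that any one of them is eventually fixed by drift} &= \frac{1}{2N} \\
k &= 2N\mu \times \frac{1}{2N} = \mu
\end{align}
```

The population size cancels out, which is why a roughly constant mutation rate per generation gives a roughly constant rate of fixation.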

By the late 1970s, the concept of a molecular clock had been extended to RNA sequences, especially ribosomal RNAs (rRNA). Since rRNA sequences are very similar in all species, this enabled scientists to construct very large trees that included all known species. It was this data that led to the discovery of two different Kingdoms of bacteria: Bacteria and Archaea.

Later on the comparisons used whole genomes and the extra data enabled more precise estimations of divergence times.

Now let's look at the data used to explain the differences between the human and chimp genomes.

Percent Similarity

I used 98.6% similar in the calculation. This value is based on aligning about 2.1 billion base pairs of similar regions in the two genomes. It does not account for regions that do not align because of duplications, insertions, and deletions. There are about 26,000 regions that don't align and they range in size from just a few base pairs to over 1000 bp. If you count all the differences in those duplications, insertions, and deletions, then you can get percent similarities that are much lower than 98.6%.

This is deceptive because what we are interested in is the number of mutations that have occurred, and a 1000 bp insertion is not 1000 different mutations; it's only one mutation. This is why the percent similarity in aligned regions is much closer to a true estimate of the number of mutations. [What's the Difference Between a Human and Chimpanzee?]
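The distinction between counting bases and counting mutation events can be illustrated with a toy example. The list of differences below is entirely hypothetical and is not taken from any real alignment; it only shows how a single large insertion dominates a per-base count while contributing just one event.

```python
from dataclasses import dataclass


@dataclass
class Difference:
    kind: str        # "substitution", "insertion", or "deletion"
    length_bp: int   # number of base pairs affected


def count_bases(differences):
    """Total base pairs that differ (the count that inflates percent difference)."""
    return sum(d.length_bp for d in differences)


def count_events(differences):
    """Number of mutation events, treating each indel as a single mutation."""
    return len(differences)


# Hypothetical toy data: two point substitutions and two indels.
toy_differences = [
    Difference("substitution", 1),
    Difference("substitution", 1),
    Difference("insertion", 1000),
    Difference("deletion", 12),
]

print("Bases that differ:", count_bases(toy_differences))
print("Mutation events:  ", count_events(toy_differences))
```

In this toy case 1014 bases differ but only four mutations occurred, which is why the per-base count is the wrong quantity to plug into a mutation-rate calculation.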

The earliest data on the difference between the human and chimp genomes depended on the rate of hybridization of DNA from each species and gave rise to the common view that the DNA from the two species is 98.5% similar (see Britten, 2002). The first direct comparison of substantial amounts of human and chimp genome sequences indicated a difference of 1.4% (Britten, 2002). The first sequenced chimpanzee genome showed a difference of 1.23% (The Chimpanzee Sequencing and Analysis Consortium, 2005). Subsequent analyses indicated a range of 1.1-1.4% (Rogers and Gibbs, 2014).

Genome Size

In order to calculate the total number of mutations you have to multiply the 1.4% difference by the size of the genome. The best current estimate of the human genome size is 3.1 billion bp and it's reasonable to assume that the final version of the chimpanzee genome will be close to this size. (0.014 × 3.1 × 10⁹ = 43.4 × 10⁶; I rounded up to 44 million.)

Mutation Rate

There's a lot of data on the mutation rate in humans. Different papers give values that cluster around 100 mutations per generation or slightly less. I used 100 mutations per generation for simplicity. The calculated time of divergence doesn't change very much with slightly different values. [Parental age and the human mutation rate] [Human mutation rates] [Human mutation rates - what's the right number?]
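To see how the estimate responds to the choice of mutation rate, here is a small sketch that recomputes the divergence time for a few nearby rates. The 22 million fixed mutations per lineage and the 27.5-year generation time come from the calculation in this post; the alternative rates of 90 and 110 are arbitrary values picked for illustration, not measurements.

```python
def divergence_time_millions(mutations_per_generation,
                             fixed_per_lineage=22e6,
                             generation_time_years=27.5):
    """Divergence time in millions of years for a given per-generation rate."""
    generations = fixed_per_lineage / mutations_per_generation
    return generations * generation_time_years / 1e6


# 100 is the value used in the post; 90 and 110 are illustrative only.
for rate in (90, 100, 110):
    print(f"{rate} mutations/generation -> "
          f"{divergence_time_millions(rate):.2f} million years")
```

Because the time is inversely proportional to the rate, a 10% change in the assumed mutation rate shifts the estimate by roughly 10% in the opposite direction.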

Generation Time


1. Assuming a genome size of 3.1 billion base pairs (bp) in each species. 1.4% × 3.1 × 10⁹ bp = 43.4 million bp, rounded up to 44 million to simplify calculations.

2. A mutation rate of about 100 mutations per generation is consistent with all the data from various sources, but many scientists emphasize the direct measurements, which tend to give a somewhat lower value. The difference doesn't seriously affect the overall calculation; any reasonable rate is still consistent with a divergence time of 5-7 million years. [See Parental age and the human mutation rate.]

Britten, R.J. (2002) Divergence between samples of chimpanzee and human DNA sequences is 5%, counting indels. Proceedings of the National Academy of Sciences 99:13633-13635. [doi: 10.1073/pnas.172510699]

Fitch, W.M. and Margoliash, E. (1967) Construction of phylogenetic trees. Science 155:279–284.

Margoliash, E. (1963) Primary structure and evolution of cytochrome c. Proc. Natl. Acad. Sci. USA 50:672-679.

Morgan, G. (1998) Emile Zuckerkandl, Linus Pauling, and the Molecular Evolutionary Clock, 1959-1965. J. Hist. Biol. 31:155-178. [PDF]

Mulligan, P.K. (2008) Proteins, evolution of in AccessScience, ©McGraw-Hill Companies.

Rogers, J. and Gibbs, R.A. (2014) Comparative primate genomics: emerging patterns of genome content and dynamics. Nature Reviews Genetics 15:347-359. [doi: 10.1038/nrg3707]

The Chimpanzee Sequencing and Analysis Consortium (2005) Initial sequence of the chimpanzee genome and comparison with the human genome. Nature 437:69-87. [doi: 10.1038/nature0407]

17 comments :

Joe Felsenstein said...

... except for 5% or so of the genome which is subject to purifying selection, which greatly reduces the rate of substitution, and sometimes advantageous mutations, which can increase the rate of substitution.

Mehrshad said...

Neutral theory has never had (and still doesn't have) any good scientific evidence. Kimura just proposed it to save Darwin's theory from Haldane's dilemma (or the waiting time problem), which predicts an enormous amount of time for cooperative mutations to arise and become fixed by mutation and natural selection, especially for humans, who have a low population size.

Anonymous said...

Light gonna be here any minute now talkin some nonsense bout humans evolving from Neanderthals and their 99.7% similar DNA.

Light said...

How does a constant mutation rate square with punctuated equilibrium?

John Harshman said...

You might as well ask how apples square with aardvarks. With every post, you reveal more of your ignorance of evolution.

Light said...

Nice try John.

John Harshman said...

You keep using those words. I do not think they mean what you think they mean.

Light said...

How does a constant mutation rate square with punctuated equilibrium?
Is it because the mutation rate is at the molecular level and punctuated equilibrium is at the organism level? And different rules apply at each?

Larry Moran said...

@Joe Felsenstein: Let's assume that about 10% of the genome is functional and therefore it is under purifying selection. This means that the rate of substitution in those regions is less than the neutral rate.

It does not mean that there are no substitutions at all in those regions. I don't know the average neutral substitution rate in functional regions of the genome. Do you have a good estimate of that value? For example, what percentage of mutations in typical coding regions are effectively neutral?

I get your point that less than 100% of the human and chimp genomes are evolving at the neutral rate but I question whether this "greatly reduces the rate of substitution." Do you have data on this? It seems to me that this is a minor effect that doesn't make much difference to the big picture given the probable error rates associated with the other variables.

What I'm more worried about is whether the sequences of the standard reference genomes really represent the alleles that have become fixed in the populations. I'm certain that this isn't correct because both the human and chimp populations carry an enormous amount of variation at millions of sites. But can we use the standard reference genomes anyway because the "noise" cancels out in the end?

I'd like to incorporate the answer to that question in the post. In fact, that's why I've delayed putting up this post for more than 10 months. I posted the incomplete version because many creationists are making a big deal about the difference between the chimp and human genomes and they (and most defenders of evolution) seem to be unaware of the correct explanation for most of the differences.

Larry Moran said...

From time to time I learn some new facts about biology that seem to conflict with what I thought I knew. My first reaction to that apparent conflict is to assume that my understanding was incorrect so I do some research to see whether that's true.

For example, let's say I learn for the first time that there's a relatively constant rate of mutation due to the intrinsic error rate of DNA replication and repair. This seems to be in conflict with my understanding of punctuated equilibria. My reaction would be to go back and study punctuated equilibria to see if I understood it correctly.

The creationist reaction is to assume that scientists are stupid and evolution is wrong.

John Harshman said...

Just to clarify: is a constant rate of mutation really in conflict with your understanding of punctuated equilibria? And when you say "mutation" do you really mean "substitution" or "fixation"? Of course there's no conflict in any case, given that most of your genome is junk, and a punctuation event involves only a small number of alleles. And of course most evolutionary biologists don't think PE is a thing anyway.

Joe Felsenstein said...

Larry, to your concern: "I get your point that less than 100% of the human and chimp genomes are evolving at the neutral rate but I question whether this 'greatly reduces the rate of substitution.'"

Sorry, I meant greatly reduces the rate of substitution in those (functional) regions of the genome, not in the whole genome. I wasn't clear enough.

Light said...

How does a constant mutation rate square with punctuated equilibrium?
Is it because the mutation rate is at the molecular level and punctuated equilibrium is at the organism level? And different rules apply at each?
Selection applies at the organism level and genetic drift applies at the molecular level.

Light said...

As a sidenote: that is what I would predict. Different levels with different rules.

John Harshman said...

I detect a slight glimmer of understanding here, which ought to be encouraged. It's not the mutation rate that should concern us but the fixation rate, and those are the same only under neutrality. Selection is primarily at the organism level, with the entire genotype contributing. Punctuated equilibria, if that were really a thing, would be a population-level phenomenon but would also involve selection at the individual level. Genetic drift also happens at the population level, since it's about changes in allele frequencies, and of course those changes are the summations of individual reproductive success at the individual level. Anyway, it's the same rules at the same level, more or less. It's just that only a small part of the genome is under selection, and even in those parts, changes are mostly neutral.

Larry Moran said...

I will continue to delete all irrelevant comments. Don't bother commenting unless you have something substantive to contribute.

Larry Moran said...

There is no mechanism on blogspot for banning Light/Doug Dobney. I can't prevent him from posting comments so what I will do is delete all of his comments as soon as I see them. There's no point in retaining the comments of anyone who replies to him because the context is missing.

I strongly recommend that you do not feed the troll or encourage him by responding to his ridiculous comments.