How Many Differences?
You can estimate the total number of single nucleotide differences by measuring the rate of hybridization of human and chimpanzee DNA, using a technique developed by Dave Kohne and Roy Britten over forty years ago. When this technique was applied to human and chimp DNA, the results indicated that the two genomes differed by about 1.5% (reviewed in Britten, 2002). That corresponds to 45 million bp in a genome of 3 billion bp.
This value of 1.5%, rounded up to 2%, gave rise to the widely quoted statement that humans and chimps are 98% identical. Britten (2002) challenged that number by pointing out that the human and chimp genomes differ by a large number of insertions and deletions (indels) that could not have been detected in hybridization studies. He claimed that an additional 3.4% of the genome differs due to indels. That means the real difference between humans and chimps is closer to 5% and we are only 95% identical!
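The arithmetic behind these percentages is easy to check. Here's a minimal Python sketch, assuming a round genome size of 3 billion bp and the two figures quoted above (1.5% from single nucleotide differences and 3.4% from indels). The variable names are mine, chosen just for illustration.

```python
# Rough check of the percentages quoted above.
# Assumption: a round genome size of 3 billion bp, as used in the text.
GENOME_SIZE_BP = 3_000_000_000

SUBSTITUTION_PERCENT = 1.5  # single nucleotide differences (hybridization estimate)
INDEL_PERCENT = 3.4         # additional difference attributed to indels (2002 paper)

# Base pairs that differ by single nucleotide substitutions
substitution_bp = GENOME_SIZE_BP * SUBSTITUTION_PERCENT / 100

# Total difference and the resulting identity
total_difference_percent = SUBSTITUTION_PERCENT + INDEL_PERCENT
identity_percent = 100 - total_difference_percent

print(f"Substitution differences: {substitution_bp:,.0f} bp")
print(f"Total difference: {total_difference_percent:.1f}%")
print(f"Identity: {identity_percent:.1f}%")
```

The total comes out a little under 5%, which is where the "95% identical" figure comes from.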
Much of the difference is due to insertion and deletion of members of gene families. One study shows that the human genome has 689 genes not present in the chimp genome and chimps have 729 genes not present in humans [Mammalian Gene Families: Humans and Chimps Differ by 6%]. That's a total of 1,418 complete genes that are only found in one of the species.
At first glance this looks like 689 completely new genes have evolved in the human lineage since it diverged from our common ancestor with chimpanzees, but looks can be deceiving. These genes are members of gene families and all that's happened is that 689 orthologous genes have
Much better data is available today than in 2002, when Britten wrote his paper. We now know by direct comparison that there are at least 30 million single nucleotide differences between the human and chimp genomes. About 90 million base pairs differ because of insertions and deletions (Marques-Bonet et al., 2009). These indels may represent only 90,000 mutational events if the average length of an insertion/deletion is 1 kb (1000 bp). In fact, more than 75% of indels are less than 5 bp (Britten, 2002), so the actual number of mutational events is in the millions. Many of these are undoubtedly due to sequence errors. The latest studies indicate that humans and chimps differ by only 26,500 large indels (>80 bp) (Polavarapu et al., 2011). To a first approximation, the single nucleotide differences are a good measure of the total number of mutational events that have occurred in the two lineages. (underlined portion added on Jan. 25, 2012 - LAM)
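To see how sensitive the event count is to indel length, here's a quick Python sketch. Only the 1 kb average comes from the text; the other mean lengths are hypothetical values picked for illustration, and the function name is mine.

```python
# How many mutational events do 90 million bp of indels represent?
# The answer depends entirely on the assumed average indel length.
INDEL_BP = 90_000_000  # total bp in indels, the figure quoted in the text


def indel_events(total_bp, mean_length_bp):
    """Estimated number of indel events for a given average indel length."""
    return total_bp / mean_length_bp


# 1000 bp is the average used in the text; 100 bp and 10 bp are
# hypothetical values, included only to show the trend.
for mean_length_bp in (1000, 100, 10):
    events = indel_events(INDEL_BP, mean_length_bp)
    print(f"mean length {mean_length_bp:>4} bp: about {events:,.0f} events")
```

A 1 kb average gives the 90,000 events mentioned above, while shorter averages push the count into the millions.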
It's worth noting that many of the differences between the human and chimp genomes are polymorphic within their respective populations. In other words, the variant alleles have not become fixed in the population. This affects the calculations of mutation rate since that calculation assumes that an allele has become fixed in the population by random genetic drift.
The polymorphisms include SNPs, of course, and that's the basis of many studies that look for specific haplotypes associated with disease. At least one of the variants at a given polymorphic locus in humans will be different from the nucleotide in the chimp reference genome. Deletions in the human and chimp genomes can also be polymorphic. Copy number variants (CNVs) in humans have been characterized in a number of studies (Campbell et al., 2011). In terms of total nucleotides, there is more variation in copy number than in single nucleotide polymorphisms (Alkan et al., 2011).
Are the Differences Neutral?
We would like to know if the differences between the human and chimp genomes are neutral alleles or if natural selection has played an important role in fixing these differences. Nobody doubts that many of the changes we see are adaptive in one or the other of the lineages, but can we recognize those important adaptive changes in a sea of possible neutral changes?
Several lines of evidence suggest that most of the changes are non-adaptive. First, since most (~90%) of the genome is junk, and most of the differences are located in junk DNA, it follows that most of the new alleles had no effect on function.
Second, if we look at the pattern of changes this is what we see for one of the human chromosomes.
The percent identity between humans and chimps fluctuates between 98% and 99%, and the differences are pretty evenly scattered throughout chromosome 7. Remember, most of that DNA is junk.
Calculating the rate of evolution in terms of nucleotide substitutions seems to give a value so high that many of the mutations must be neutral ones.
Motoo Kimura (1968)
The third line of evidence has to do with the mutation rate and fixation in the two lineages. The mutation rate in humans is about 130 mutations per generation based on our knowledge of the biochemistry of DNA replication [Mutation Rates], a value that's consistent with recent direct measurements [Human Y Chromosome Mutation Rates] [Direct Measurement of Human Mutation Rate]. Michael Lynch (2010) bases his estimate of human mutation rates on a number of other studies. He comes up with a value of about 80 new mutations per generation.
In an evolving population the rate of fixation of neutral alleles is equal to the mutation rate [Random Genetic Drift and Population Size]. How many mutations would we expect in the human lineage since it diverged from a common ancestor with chimpanzees if all of the fixed alleles were neutral? The two species diverged about 5 million years ago. The average generation time in the human lineage is about ten years, so that means 500,000 generations. If the rate of mutation is about 100 new mutations per generation, then we would expect to see about 50 million new mutations in the human lineage. The actual number is about 22.5 million (half of 45 million). We're certainly in the right ballpark.
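Here's the same back-of-the-envelope calculation as a short Python sketch. It uses the numbers from the text (5 million years, ten-year generations, 45 million observed differences) and tries the three mutation rates mentioned above: 80, 100, and 130 per generation. The function name is mine, and the calculation assumes that every fixed allele is neutral, so that the rate of fixation equals the mutation rate.

```python
# Expected number of neutral substitutions in one lineage since divergence.
# All inputs are the round numbers used in the text.
DIVERGENCE_YEARS = 5_000_000
GENERATION_TIME_YEARS = 10
OBSERVED_DIFFERENCES = 45_000_000  # total differences between the two genomes


def expected_neutral_fixations(mutations_per_generation,
                               divergence_years=DIVERGENCE_YEARS,
                               generation_time_years=GENERATION_TIME_YEARS):
    """Neutral fixation rate equals mutation rate, so count mutations per lineage."""
    generations = divergence_years // generation_time_years
    return generations * mutations_per_generation


# Half of the observed differences are assigned to each lineage.
observed_per_lineage = OBSERVED_DIFFERENCES // 2

for rate in (80, 100, 130):
    expected = expected_neutral_fixations(rate)
    ratio = expected / observed_per_lineage
    print(f"{rate:>3} mutations/generation: expected {expected:,}, "
          f"observed {observed_per_lineage:,}, ratio {ratio:.1f}")
```

For all three rates the expected number lands within a factor of about two to three of the observed 22.5 million, which is what "in the right ballpark" means here.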
The actual mutation rate may be lower than we calculate.
We're certainly safe in concluding that the number of differences between humans and chimps is consistent with Neutral Theory and we should accept this as the null hypothesis.
Alkan, C., Coe, B.P., and Eichler, E.E. (2011) Genome structural variation discovery and genotyping. Nat. Rev. Genet. 12:363-376. [PubMed]
Britten, R.J. (2002) Divergence between samples of chimpanzee and human DNA sequences is 5%, counting indels. Proc. Natl. Acad. Sci. (USA) 99:13633-13636.
Campbell, C.D., Sampas, N., Tsalenko, A., Sudmant, P.H., Kidd, J.M., Malig, M., Vu, T.H., Vives, L., Tsang, P., Bruhn, L., and Eichler, E.E. (2011) Population-genetic properties of differentiated human copy-number polymorphisms. Am J Hum Genet. 88:317-32. [PubMed]
Lynch, M. (2010) Rate, molecular spectrum, and consequences of human mutation. Proc. Natl. Acad. Sci. (USA) 107:961-968. [PubMed]
Marques-Bonet, T., Ryder, O.A., and Eichler, E.E. (2009) Sequencing primate genomes: what have we learned? Annu. Rev. Genomics Hum. Genet. 10:355-386. [PubMed]
Polavarapu, N., Arora, G., Mittal, V.K., and McDonald, J.F. (2011) Characterization and potential functional significance of human-chimpanzee large INDEL variation. Mob. DNA 2:13. [PubMed] [doi:10.1186/1759-8753-2-13]