In addition to this basic science, the analysis of multiple human genomes can be used to map genetic disease loci through association of various haplotypes with disease. The technique is called a genome-wide association study (GWAS). The same technology can be used to map other phenotypes to identify the genes responsible.
The 1000 Genomes Project Consortium has just published their latest efforts in a recent issue of Nature (Oct. 1, 2015) (The 1000 Genomes Project Consortium, 2015; Sudmant et al., 2015). They looked at the genomes of 2,504 individuals from 26 different populations in Africa, East Asia, South Asia, Europe, and the Americas.
The idea is to identify variants that are segregating in humans. Single nucleotide polymorphisms (SNPs) are difficult to identify because the error rate of sequencing is significant. When comparing a new genome sequence to the reference genome, you don't know whether a single base change is due to sequencing error or a genuine variant unless you have a high quality sequence. Most of the 2,504 genome sequences are not of sufficiently high quality to be certain that the false positive rate is low, but by sequencing multiple genomes it becomes feasible to identify variants that are shared by more than one individual within a population.
Recall that every human genome has about 100 new mutations so that even brothers and sisters will differ at 200 sites. The 1000 Genomes Consortium looks at the frequency of alleles in a population to determine whether the genetic variation is significant. They use a preliminary cutoff of 0.5%, which means that a variant (mutation) has to be present in 5 out of 1000 genomes in order to count as a variant that's segregating within the population. They estimate that 95% of SNPs meeting this threshold are true variants. For small insertions and deletions the accuracy is about 80%.
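To make the cutoff concrete, here is a minimal sketch of a frequency filter. It is not the Consortium's pipeline; the function names are hypothetical, and it assumes frequency is counted per chromosome copy (two per individual), which is an assumption that goes beyond the "5 out of 1000 genomes" wording above.

```python
def allele_frequency(alt_copies: int, n_individuals: int, ploidy: int = 2) -> float:
    """Fraction of sampled chromosome copies that carry the variant allele."""
    total_copies = n_individuals * ploidy
    if total_copies <= 0:
        raise ValueError("need at least one sampled chromosome copy")
    return alt_copies / total_copies


def passes_cutoff(alt_copies: int, n_individuals: int,
                  cutoff: float = 0.005, ploidy: int = 2) -> bool:
    """True if the variant reaches the 0.5% frequency cutoff from the text."""
    return allele_frequency(alt_copies, n_individuals, ploidy) >= cutoff


# The example from the text: 5 copies among 1000 genomes, one copy counted per genome.
print(passes_cutoff(5, 1000, ploidy=1))

# Hypothetical counts for 2,504 diploid individuals (5,008 chromosome copies).
print(passes_cutoff(25, 2504))
print(passes_cutoff(26, 2504))
```

Under that per-chromosome assumption, a variant would need to be seen in at least 26 of the 5,008 sampled copies to clear 0.5%.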
For variants at lower frequency, additional sequencing to a depth of >30X coverage was done and the putative variant was compared against other databases of genetic variation. The predicted accuracy of variants at 0.1% frequency is about 75%.
Given those limitations, the results of the studies are very informative. Looking at single base pair changes and small indels (insertions and deletions), the typical human genome (yours and mine) differs from the standard reference genome at about 4.5 million sites. That's about 0.14% of our genomes. Humans and chimpanzees differ by about 1.4% or ten times more.
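The percentages above are simple arithmetic. The sketch below reproduces them, assuming a haploid genome size of roughly 3.2 billion base pairs; that figure is an assumption brought in for illustration and is not stated in the text.

```python
# Assumed haploid human genome size in base pairs (not given in the text).
GENOME_SIZE_BP = 3_200_000_000

# Sites where a typical genome differs from the reference (from the text).
variant_sites = 4_500_000

# Percentage of the genome that differs from the reference.
percent_different = 100 * variant_sites / GENOME_SIZE_BP

# Human-chimpanzee difference quoted in the text, in percent.
chimp_percent = 1.4
fold_difference = chimp_percent / percent_different

print(f"{percent_different:.2f}% of sites differ from the reference")
print(f"human-chimp difference is about {fold_difference:.0f} times larger")
```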
SNPs and small indels account for 99.9% of variants. The others are "structural variants" consisting of large deletions, copy number variants, Alu insertions, LINE L1 insertions, other transposon insertions, mitochondrial DNA insertions (NUMTs), and inversions. The typical human genome has about 2,300 of these structural variants, of which about 1,000 are large deletions.
Most of these variants are in junk DNA regions, but the typical human genome carries about 10,000-12,000 variants that affect the sequence of a protein. Many of these will be neutral, and some of the ones that have detrimental effects will be heterozygous and recessive. The average person has 24-30 variants that are associated with genetic disease. (These are known detrimental alleles. If you get your genome sequenced, you will learn that you carry about 30 harmful alleles that you can pass on to your children.)
The Consortium reports that the typical genome has variants at about 500,000 sites mapping to untranslated regions of mRNA (UTRs), insulators, enhancers, and transcription factor binding sites. I assume they are using the ENCODE data here so we need to take it with a large grain of salt. Most of these sites are not biologically relevant.
As expected, common variants are distributed in populations all over the world. These are the result of mutations that arose several hundred thousand years ago and reached significant frequencies before the present-day populations separated. However, 86% of all variants are restricted to a single continental group. These are the result of mutations that occurred after the present-day populations split.
The African populations contain more genetic variation than the Asian and European populations. Again, this is expected since the European and Asian groups split from within the African group after Africans had been evolving on that continent for thousands of years. The differences are not great: Africans differ at about 4.3 million SNPs while the typical European and Asian differ at only 3.5 million SNPs.
Only a small number of loci show evidence of selective sweeps, or recent selection (adaptation). This indicates that most of the differences between local ethnic groups are not associated with adaptation. The exceptions are SLC24A5 (skin pigmentation), HERC2 (eye color), LCT (lactose tolerance), and FADS (fat metabolism).
Sudmant, P.H., Rausch, T., Gardner, E.J., Handsaker, R.E., Abyzov, A., Huddleston, J., Zhang, Y., Ye, K., Jun, G., Hsi-Yang Fritz, M., Konkel, M.K., Malhotra, A., Stutz, A.M., Shi, X., Paolo Casale, F., Chen, J., Hormozdiari, F., Dayama, G., Chen, K., Malig, M., Chaisson, M.J.P., Walter, K., Meiers, S., Kashin, S., Garrison, E., Auton, A., Lam, H.Y.K., Jasmine Mu, X., Alkan, C., Antaki, D., Bae, T., Cerveira, E., Chines, P., Chong, Z., Clarke, L., Dal, E., Ding, L., Emery, S., Fan, X., Gujral, M., Kahveci, F., Kidd, J.M., Kong, Y., Lameijer, E.-W., McCarthy, S., Flicek, P., Gibbs, R.A., Marth, G., Mason, C.E., Menelaou, A., Muzny, D.M., Nelson, B.J., Noor, A., Parrish, N.F., Pendleton, M., Quitadamo, A., Raeder, B., Schadt, E.E., Romanovitch, M., Schlattl, A., Sebra, R., Shabalin, A.A., Untergasser, A., Walker, J.A., Wang, M., Yu, F., Zhang, C., Zhang, J., Zheng-Bradley, X., Zhou, W., Zichner, T., Sebat, J., Batzer, M.A., McCarroll, S.A., The 1000 Genomes Project Consortium, Mills, R.E., Gerstein, M.B., Bashir, A., Stegle, O., Devine, S.E., Lee, C., Eichler, E.E., and Korbel, J.O. (2015) An integrated map of structural variation in 2,504 human genomes. Nature, 526(7571), 75-81. [doi: 10.1038/nature15394]
The 1000 Genomes Project Consortium (2015) A global reference for human genetic variation. Nature, 526(7571), 68-74. [doi: 10.1038/nature15393]