Tuesday, May 12, 2009
The Human Genome Sequence Is not Complete
The latest version of the human genome sequence is called Build 37 or GRCh37. Here's an overview from the Genome Reference Consortium.
The large red triangles represent regions where there is a lot of variability so that no single representation of the genome sequence will describe a majority of humans.
The black regions represent parts of the chromosomes that have not been sequenced and assembled into long stretches (contigs) of reliable sequence. Most of the unsequenced regions are at centromeres, or telomeres, or on the Y chromosome. These regions consist of thousands of copies of highly repetitive DNA. It is impossible to assemble these repetitive sequences.
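To see why repeats defeat assembly, here is a minimal Python sketch. It is a toy model, not how a real assembler works: the two "genomes", the CAG repeat unit, the flanking sequences, and the function name distinct_reads are all made up for illustration, and sequencing is modeled as simply collecting every distinct substring of a fixed read length. Two genomes that differ only in the number of repeat copies give exactly the same set of short reads, so nothing in the reads tells you which arrangement is the right one.

```python
def distinct_reads(genome, read_len):
    """Return the set of all distinct substrings of length read_len.

    This is a deliberately idealized stand-in for sequencing: error-free
    reads starting at every position, with duplicates collapsed.
    """
    return {genome[i:i + read_len] for i in range(len(genome) - read_len + 1)}


# Hypothetical unique flanking sequences around a tandem repeat.
LEFT_FLANK = "ACGTTG"
RIGHT_FLANK = "TTGACA"

# Two toy genomes that differ only in repeat copy number (10 vs. 25 copies).
genome_a = LEFT_FLANK + "CAG" * 10 + RIGHT_FLANK
genome_b = LEFT_FLANK + "CAG" * 25 + RIGHT_FLANK

# Reads shorter than the repeat array: both genomes yield identical read sets,
# so the copy number cannot be recovered from the reads alone.
short_reads_match = distinct_reads(genome_a, 12) == distinct_reads(genome_b, 12)

# Reads long enough to span the whole array in genome_a: the sets now differ.
long_reads_match = distinct_reads(genome_a, 40) == distinct_reads(genome_b, 40)

print(short_reads_match)  # True
print(long_reads_match)   # False
```

In real data the number of reads covering a repeat gives a rough hint about copy number, but the sketch ignores that on purpose. The point is only that reads shorter than a repeat array cannot pin down how the array is arranged.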
Scientists are urging that more attention be focused on completing the chimpanzee and macaque genome sequences. We have been waiting a long time for the draft sequences of those genomes to be finished. The explosion of data on the human genome can only be realistically evaluated by comparing it to our closest relatives. (For example, are human non-coding RNAs conserved in primates?)
The fact that the human genome is not complete is not a problem. We know what's in the repetitive sequence regions even though we don't know exactly how it is arranged. The effort required to finish off the last bit is probably not as important as getting a final draft of other sequences.
Sandra Porter wonders, "Why don't we finish the human genome first?"