
Wednesday, February 10, 2021

The 20th anniversary of the human genome sequence:
4. Functional DNA in our genome

We know a lot more about the human genome than we did when the draft sequences were published 20 years ago. One of the most important discoveries is the recognition of truly functional sequences in the genome and of how much of the genome they occupy. Genes are one example of such functional sequences but they are only a minor component (about 1.4%). Most of the functional regions of the genome are not genes.

Here's a list of functional DNA in our genome other than the functional part of genes.

  • Centromeres: There are 24 different centromeres and the average size is four million base pairs. Most of this is repetitive DNA and it adds up to about 3% of the genome. The total amount of centromeric DNA ranges from 2%-10% in different individuals. It's unlikely that all of the centromeric DNA is essential; about 1% seems to be a good estimate.
  • Telomeres: Telomeres are repetitive DNA sequences at the ends of chromosomes. They are required for the proper replication of DNA and they take up about 0.1% of the genome sequence.
  • Origins of replication: DNA replication begins at origins of replication. The size of each origin has not been established with certainty, but it's safe to assume that 100 bp is a good estimate. There are about 100,000 origin sequences but it's unlikely that all of them are functional or necessary. It's reasonable to assume that only 30,000 - 50,000 are real origins. At 100 bp each, that means roughly 0.1%-0.16% of the genome is devoted to origins of replication; even counting all 100,000 candidates would only bring it to about 0.3%.
  • Regulatory sequences: The transcription of every gene is controlled by sequences that lie outside of the genes, usually at the 5′ end. The total amount of regulatory sequence is controversial but it seems reasonable to assume about 200 bp per gene for a total of five million bp or less than 0.2% of the genome (0.16%). The most extreme claim is about 2,400 bp per gene or 1.8% of the genome.
  • Scaffold attachment regions (SARs): Human chromatin is organized into about 100,000 large loops. The base of each loop consists of particular proteins bound to specific sequences called anchor loop sequences. The nomenclature is confusing; the original term (SAR) isn't as popular today as it was 35 years ago but that doesn't change the fact that about 0.3% of the genome is required to organize chromatin.
  • Transposons: Most of the transposon-related sequences in our genome are just fragments of defective transposons but there are a few active ones. They account for only a tiny fraction of the genome.
  • Viruses: Functional virus DNA sequences account for less than 0.1% of the genome.
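Several of the percentages in the list come from multiplying a count by an average length and dividing by the size of the genome. The sketch below redoes that arithmetic in Python. The genome size (3.2 billion bp) and the gene count (25,000, which is what "200 bp per gene for a total of five million bp" implies) are assumptions not stated in the post, and the variable and function names are hypothetical.

```python
# A minimal sketch of the percentage arithmetic behind the list above.
# GENOME_BP and GENE_COUNT are assumptions, not figures given in the post.

GENOME_BP = 3.2e9     # assumed haploid genome size (base pairs)
GENE_COUNT = 25_000   # assumed gene count implied by "200 bp per gene ... five million bp"


def percent_of_genome(bp: float) -> float:
    """Express a length in base pairs as a percentage of GENOME_BP."""
    return 100 * bp / GENOME_BP


# Centromeres: 24 centromeres averaging four million bp each.
centromeric = percent_of_genome(24 * 4e6)

# Origins of replication: about 100 bp each.
origins_low = percent_of_genome(30_000 * 100)    # 30,000 real origins
origins_high = percent_of_genome(50_000 * 100)   # 50,000 real origins
origins_all = percent_of_genome(100_000 * 100)   # every candidate origin

# Regulatory sequences: 200 bp per gene, and the extreme claim of 2,400 bp per gene.
regulatory = percent_of_genome(GENE_COUNT * 200)
regulatory_extreme = percent_of_genome(GENE_COUNT * 2_400)

for label, value in [
    ("centromeric DNA", centromeric),
    ("origins (30,000)", origins_low),
    ("origins (50,000)", origins_high),
    ("origins (all 100,000)", origins_all),
    ("regulatory (200 bp/gene)", regulatory),
    ("regulatory (2,400 bp/gene)", regulatory_extreme),
]:
    print(f"{label}: {value:.2f}% of the genome")
```

Under these assumptions the centromere (3%) and regulatory (0.16%) figures come out as stated, and the extreme regulatory claim lands near 1.9%. For origins of replication, 30,000-50,000 sites at 100 bp work out to roughly 0.1%-0.16%, while all 100,000 candidate origins would come to about 0.3%.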

If you add up all the functional DNA from this list, you get to somewhere between 2% and 3% of the genome.
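The total above is a simple sum of the per-category estimates. The Python sketch below adds them up, entering "less than" figures as upper bounds; the dictionary itself and the zero used for transposons ("only a tiny fraction") are assumptions for illustration, not values from the post.

```python
# A minimal sketch that totals the per-category estimates from the list
# (percent of the genome). "Less than" figures are entered as upper bounds,
# and the transposon value is a hypothetical placeholder for "a tiny fraction".

estimates = {
    "centromeres": 1.0,
    "telomeres": 0.1,
    "origins of replication": 0.3,       # upper figure: ~100,000 origins at ~100 bp
    "regulatory sequences": 0.16,        # 200 bp per gene
    "scaffold attachment regions": 0.3,
    "transposons": 0.0,                  # placeholder: "only a tiny fraction"
    "viruses": 0.1,                      # upper bound: "less than 0.1%"
}

total = sum(estimates.values())
print(f"Functional DNA outside genes: about {total:.1f}% of the genome")
```

With these point estimates the sum is just under 2%, at the bottom of the stated range; choosing larger values for the uncertain categories, such as centromeric or regulatory DNA, moves it upward.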

Image credit: Wikipedia.


  1. If I understand correctly, when you say "functional" you mean "functional but non-coding"? You say "Genes are one example of such functional sequence but only a minor component." The protein-coding DNA is 1.8% according to your post "What's in your genome?" (2011). 1.8% does not sound like a "minor component" compared to 2-3%. Am I missing something?

    1. The latest estimate for protein-coding regions is about 0.8% and a generous estimate for noncoding genes is 0.6% for a total of 1.4%. I added that estimate to the post.

      I'm proposing that the total amount of functional DNA might be as high as 10%. That's why known genes are only a minor percentage, although, admittedly, we could quibble about whether 14% (1.4% out of 10%) is minor.

  2. I assume that the extra centromere among 23 chromosomes is the second centromere on chromosome #2. I was wondering if there were more of these, but I guess not.
    Some might balk at referring to virus and transposon insertions as being 'functional'. 'Active' might be a better term for those.

    1. There are 22 autosomal centromeres plus one X-chromosome centromere and one Y-chromosome centromere. The reference genome contains a copy of each sex chromosome.

      There are about a dozen degenerate centromeres in our genome. That's part of the reason why not all centromeric-type DNA is counted as functional.

      It's difficult to draw an exact line between functional and non-functional. If we assume, as I usually do, that DNA is junk if it can be deleted without harming the organism, then the reverse transcriptase genes in many transposons and viruses are junk. So are many other protein-coding genes in our genome. I'm a little uncomfortable with labeling an active gene as junk so I'm trying to avoid that particular battle in the function wars.

  3. It would seem to me that a working transposon isn't a functional sequence, not being under purifying selection, and that a few broken transposons probably are, just as any mutation might result in a functional sequence.