Thursday, February 07, 2008

Junk in Your Genome: Pseudogenes

Pseudogenes are non-functional DNA sequences that resemble genes. Much of the DNA related to transposable elements falls into this category. There are ribosomal RNA and tRNA pseudogenes but the term usually refers to sequences that resemble protein-encoding genes.


[Figure: Genomes & Junk DNA, total junk so far]

There are two kinds of pseudogenes derived from protein-encoding genes. Those derived from reverse transcription of mRNA and the re-integration of double-stranded DNA into the genome are called "processed" pseudogenes because the mRNA precursor was processed to give mature mRNA before being copied. Consequently, processed pseudogenes do not have introns. They also don't have promoters so they cannot be transcribed.

The other kind of pseudogene arises following a gene duplication event. One of the copies acquires a mutation that inactivates it. This is usually not harmful because the other copy remains intact. It is the fate of most duplicated genes to become a pseudogene by inactivation.

The original meaning of "junk" DNA referred to pseudogenes (reviewed in Gregory 2005) but the term is now used frequently to mean any non-functional DNA. That's the definition I use here.

Ensembl lists 2,081 pseudogenes in the human genome but that's very low compared to other studies [Human Genome]. Estimates of the number of processed pseudogenes range from several thousand up to 17 thousand (Drouin 2006). The ENCODE project found 118 pseudogenes in their detailed analysis of 1% of the genome (Solovyev et al. 2006). This suggests that there are about 11,800 pseudogenes in the entire genome.
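The ENCODE extrapolation is simple scaling, and a minimal Python sketch makes the assumption behind it explicit: the surveyed 1% is treated as representative of the whole genome, which may not hold exactly. The function and variable names are only illustrative.

```python
# Figures from the text: 118 pseudogenes found in the ENCODE regions,
# which cover about 1% of the genome.
encode_pseudogenes = 118
encode_fraction = 0.01


def extrapolate(count, fraction_surveyed):
    """Scale a count from a surveyed fraction up to the whole genome.

    Assumes the surveyed regions have the same pseudogene density
    as the rest of the genome.
    """
    return round(count / fraction_surveyed)


genome_estimate = extrapolate(encode_pseudogenes, encode_fraction)
print(f"Estimated pseudogenes genome-wide: {genome_estimate:,}")
```

If the sampled regions are richer or poorer in pseudogenes than average, the genome-wide figure shifts in proportion.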

A number of studies suggest that the number of processed pseudogenes is approximately the same as the number of inactivated duplicated genes (reviewed in Taylor and Raes 2005). In the case of processed pseudogenes, there are many copies of a relatively small subset of the total number of genes. In other words, lots of genes do not spawn pseudogenes and those that do have many offspring. This is because there is a bias in favor of genes that are highly expressed in the germ line.

The total number of pseudogenes in the genome is likely to be close to the number of genes based on extrapolations from detailed analyses of small segments of the genome or single chromosomes.

If we assume that there are 10,000 processed pseudogenes averaging 2 kb each then this represents 20 Mb or 0.6% of the genome. If there are an equal number of other pseudogenes then this is 10,000 × 60 kb = 600 Mb or 18% of the genome. This is all junk DNA but it overlaps extensively with the junk DNA from transposable elements. It is further evidence that substantial parts of the genome are non-functional but since most of that sequence would be introns in an active gene, it would count as junk DNA even if the gene were active. It's best to just count the inactive exons in order to avoid double counting. If the exons of a duplicated pseudogene add up to about 2 kb, as in a processed pseudogene, that is another 20 Mb or 0.6% of the genome.
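The arithmetic in this estimate can be checked with a short Python sketch. Two inputs are assumptions rather than figures from the text: a genome size of about 3,300 Mb (roughly what "600 Mb or 18%" implies) and about 2 kb of exon sequence per duplicated pseudogene (taken to match the processed ones).

```python
# Assumption: haploid human genome of ~3,300 Mb (not stated in the text).
GENOME_MB = 3300.0


def percent_of_genome(n_elements, avg_kb, genome_mb=GENOME_MB):
    """Return (total size in Mb, percent of genome) for n elements of avg_kb each."""
    total_mb = n_elements * avg_kb / 1000.0
    return total_mb, 100.0 * total_mb / genome_mb


# Figures from the text: 10,000 processed pseudogenes of 2 kb each,
# and 10,000 duplicated pseudogenes of 60 kb each (introns included).
processed = percent_of_genome(10_000, 2)
duplicated_full = percent_of_genome(10_000, 60)

# Assumption: counting only exons, a duplicated pseudogene is also ~2 kb.
duplicated_exons = percent_of_genome(10_000, 2)

exon_total_pct = processed[1] + duplicated_exons[1]

print(f"Processed:              {processed[0]:.0f} Mb, {processed[1]:.1f}%")
print(f"Duplicated (full):      {duplicated_full[0]:.0f} Mb, {duplicated_full[1]:.1f}%")
print(f"Duplicated (exons):     {duplicated_exons[0]:.0f} Mb, {duplicated_exons[1]:.1f}%")
print(f"Total, exons only:      {exon_total_pct:.1f}%")
```

Under these assumptions the processed pseudogenes come to about 0.6% of the genome, the full-length duplicated pseudogenes to about 18%, and the exon-only total to about 1.2%.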

Thus, pseudogenes are about 1.2% of the genome and all of it is junk.1,2

1. A small number of former pseudogenes have been reactivated. They are no longer pseudogenes so they don't count as junk. A small number of pseudogenes have acquired a separate function so they don't count as junk. There do not appear to be very many examples.

2. There are many scientists who have tried to make the case for pseudogenes having some sort of function. The most common speculation is that they serve as an important reservoir of sequence information that can be accessed by recombination and/or re-activation (e.g., Balakirev and Ayala 2003).

Balakirev, E.S. and Ayala, F.J. (2003) PSEUDOGENES: Are They “Junk” or Functional DNA? Ann. Rev. Genet. 37:123-151. [doi:10.1146/annurev.genet.37.040103.103949]

Drouin, G. (2006) Processed pseudogenes are more abundant in human and mouse X chromosomes than in autosomes. Mol. Biol. Evol. 23:1652-1655. [PubMed]

Gregory, T.R. (2005) "Genome Size Evolution in Animals" in The Evolution of the Genome. Elsevier Academic Press, New York (USA).

Solovyev, V., Kosarev, P., Seledsov, I. and Vorobyev, D. (2006) Automatic annotation of eukaryotic genes, pseudogenes and promoters. Genome Biol. 7 Suppl 1:S10.1-12. [PubMed]

Taylor, J.S. and Raes, J. (2005) "Small-Scale Gene Duplications" in The Evolution of the Genome. Elsevier Academic Press, New York (USA).


  1. Where you and I part company is your insistence on calling this material "junk". It already has a name: "pseudogene". A pubmed search (all years) on "junk dna" returns 82 records total. A similar pubmed search on "pseudogene(s)" returns 5019 records. In all of 2007 there were 747935 papers indexed by pubmed, and SIX of them use the phrase "junk dna" in the title or abstract. The term is not used because it is not useful. In 2007 there were 17636 papers that use the term "evolution"* in the title or abstract. 5 of them also use the term "junk dna", which is 0.03%. (The one paper from 2007 that uses the term "junk DNA" and not the term evolution* was published in Scientific American.) Of the 5 remaining 2007 papers that use the term "junk dna", a cursory examination of the abstracts indicates that 4 of the 5 use the term to argue against the "junk" actually being "junk".

  2. What word do you use to describe the DNA in our genome that is non-functional and can be deleted without any significant effect on the organism?

    We are not debating whether "junk" is a suitable synonym for "pseudogene." It is not. What we're debating is whether pseudogenes are junk or essential for the organism/species.

    Would you like to debate the idea that much of our genome is junk or are you trying to stifle the debate on the grounds that nobody else thinks it's debatable?

    PubMed searches can be a lot of fun. Here are some other results for the number of papers published in 2007.

    random genetic drift 10
    natural selection 489
    Neutral Theory 26
    adaptationism 0
    Central Dogma 11
    sociobiology 2
    punctuated equilibrium 2
    species sorting 7
    junk + genome 12
    evolution + genome ~3000
    spandrels 0
    Archaea 1164

  3. On the subject of searches, I tend to prefer Web of Knowledge. A search for "junk DNA" turns up 245 papers, 15 of which were in 2007 (one in New Scientist). 12 have "evolution" and "junk DNA" in title or abstract. "Pseudogene" provides 3648 and "pseudogenes" 4293.

    Carry on.

  4. "The other kind of pseudogene arises following a gene duplication event."

    "Gene duplication" is a large class, of which processed duplicated genes (or retroposed duplicated genes) are a subclass. I think the term you were looking for, rather than "gene duplication", is DNA-based duplication event. These non-retroposed duplications probably arise via non-allelic recombination.

  5. Love this theme. Makes me think. However, I have an issue with Larry being a bit of a know-it-all on this subject. The tone to me suggests that Larry is fanatical about junk DNA because it goes against intelligent design and he likes that. However, to prove that this DNA has no function is difficult and I find his stance anti-science.
    I was just reading some recent papers about endo-siRNAs and how some of them are derived from pseudogenes. Genes with complementary pseudogene derived siRNAs are enriched for microtubule associated functions, strongly suggesting that these endo-siRNAs (in this case trans-NAT-siRNAs) have a regulatory function.
    So some pseudogenes serve a regulatory function.

  6. Am I wrong or should it say "600 Mb or 1.8% of the genome" instead of 18%?