
Sunday, October 15, 2023

Only 10.7% of the human genome is conserved

The Zoonomia project aligned the genome sequences of 240 mammalian species and determined that only 10.7% of the human genome is conserved. This is consistent with the idea that about 90% of our genome is junk.

The April 28, 2023 issue of Science contains eleven papers reporting the results of a massive study comparing the genomes of 240 mammalian species. The issue also contains a couple of "Perspectives" that comment on the work.

There's much to learn in these papers but I'm going to concentrate on what they tell us about the amount of functional DNA in the human genome and how that relates to junk DNA. The important paper is:

Christmas, M.J., Kaplow, I.M., Genereux, D.P., Dong, M.X., Hughes, G.M., Li, X., Sullivan, P.F., Hindle, A.G., Andrews, G. and Armstrong, J.C. et al. (2023) Evolutionary constraint and innovation across hundreds of placental mammals. Science 380:366. doi: 10.1126/science.abn3943

Evolutionary constraint and acceleration are powerful, cell-type agnostic measures of functional importance. Previous studies in mammals were limited by species number and reliance on human-referenced alignments. We explore the evolution of placental mammals, including humans, through reference-free whole-genome alignment of 240 species and protein-coding alignments for 428 species. We estimate 10.7% of the human genome is evolutionarily constrained. We resolve constraint to single nucleotides, pinpointing functional positions, and refine and expand by over seven-fold the catalog of ultraconserved elements. Overall, 48.5% of constrained bases are as yet unannotated, suggesting yet-to-be-discovered functional importance. Using species-level phenotypes and an updated phylogeny, we associate coding and regulatory variation with olfaction and hibernation. Focusing on biodiversity conservation, we identify genomic metrics that predict species at risk of extinction.

The authors aligned the genomes of 240 different species and discovered that only 11% of the human genome aligned to at least 95% of the mammalian genomes. They estimate that 332Mb of human DNA sequence is constrained by purifying selection. This corresponds to 10.7% of the genome using 3.1Gb as the size of the human genome.1
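As a quick sanity check on that arithmetic, here is a minimal Python sketch. It uses only the figures quoted above (332Mb constrained, a 3.1Gb genome, and the abstract's statement that 48.5% of constrained bases are unannotated). The variable names are mine, and the unannotated totals are values I derived from those figures, not numbers reported in the paper.

```python
# Figures quoted in the post and the abstract; the names are illustrative.
GENOME_SIZE_MB = 3100.0    # 3.1Gb human genome, expressed in Mb
CONSTRAINED_MB = 332.0     # sequence estimated to be under purifying selection
UNANNOTATED_SHARE = 0.485  # share of constrained bases that are unannotated


def percent(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


constrained_pct = percent(CONSTRAINED_MB, GENOME_SIZE_MB)
unconstrained_pct = 100.0 - constrained_pct

# Derived here, not stated in the paper: the unannotated part of the
# constrained sequence, in Mb and as a share of the whole genome.
unannotated_mb = UNANNOTATED_SHARE * CONSTRAINED_MB
unannotated_pct = percent(unannotated_mb, GENOME_SIZE_MB)

print(f"constrained:   {constrained_pct:.1f}% of the genome")
print(f"unconstrained: {unconstrained_pct:.1f}% of the genome")
print(f"unannotated constrained: {unannotated_mb:.0f}Mb ({unannotated_pct:.1f}% of the genome)")
```

The result depends on the genome size used as the denominator, so a different assembly size would shift the percentage slightly.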

This is not a particularly surprising result since previous studies have shown that somewhere between 5% and 12% of the human genome is conserved.2

There's a lot of controversy over the number of functional non-coding genes in the human genome. The latest Ensembl annotation lists 25,959 non-coding genes of which 18,882 are presumed to be genes for lncRNAs. There is very little evidence to support the claim that all these genes are functional non-coding genes and most of them are not conserved (Ponting, 2017). It would have been very helpful if the authors of this paper had discussed the conservation of potential non-coding genes but they don't mention it. They don't even talk about the alignment of the many clusters of ribosomal RNA genes. I wonder why they didn't address this very important issue?

Junk DNA

Let's remember what Elizabeth Pennisi wrote in Science when the 2012 ENCODE results were published [Science Writes Eulogy for Junk DNA].

This week, 30 research papers, including six in Nature and additional papers published by Science, sound the death knell for the idea that our DNA is mostly littered with useless bases. A decadelong project, the Encyclopedia of DNA Elements (ENCODE), has found that 80% of the human genome serves some purpose, biochemically speaking. “I don't think anyone would have anticipated even close to the amount of sequence that ENCODE has uncovered that looks like it has functional importance,” says John A. Stamatoyannopoulos, an ENCODE researcher at the University of Washington, Seattle.

Beyond defining proteins, the DNA bases highlighted by ENCODE specify landing spots for proteins that influence gene activity, strands of RNA with myriad roles, or simply places where chemical modifications serve to silence stretches of our chromosomes. These results are going “to change the way a lot of [genomics] concepts are written about and presented in textbooks,” Stamatoyannopoulos predicts.

Given all the hype about junk DNA back in 2012, you would expect that the scientists who did this study would be well aware of the controversy over junk DNA. They are well aware of the fact that their results are compatible with a human genome that's 90% junk and not compatible with a genome that's mostly functional. In spite of this history, the word "junk" does not appear in their paper and there is no attempt to explain why 90% of the genome is not conserved. None of the commentaries mention the junk DNA controversy either. Isn't that strange?

1. I'm pretty sure that the multiple sequence alignment was done by some computer program and I'm skeptical about the quality of such an alignment. There has been a lot of shuffling of DNA during the evolution of mammals so alignment will involve non-contiguous stretches of DNA from most species. I think it would be hard to determine the various breakpoints since most of them will occur in junk DNA. Also, the number and size of gaps are important and I don't know what criteria were used. For example, if two 8bp transcription factor binding sites are located within 10-20bp in two different species, are gaps introduced to bring them into register and count the binding site as conserved?

2. I'm using the word "conserved" in the strict sense where it's a synonym for evolutionarily constrained. "Constrained" is better but most readers won't be familiar with that term. Some people use the word "conserved" as a synonym for highly similar as in "92% of the human genome sequence is conserved in chimpanzees." Strictly speaking, this is not due to conservation in the constrained sense since it merely reflects the fact that 92% of the genome hasn't yet had time to diverge (Haerty and Ponting, 2014; Ponting, 2017).

Haerty, W. and Ponting, C.P. (2014) No Gene in the Genome Makes Sense Except in the Light of Evolution. Annual Review of Genomics and Human Genetics 15:71-92. doi: 10.1146/annurev-genom-090413-025621

Ponting, C.P. (2017) Biological function in the twilight zone of sequence conservation. BMC Biology 15:1-9. doi: 10.1186/s12915-017-0411-5
