Friday, August 12, 2022

The surprising (?) conservation of noncoding DNA

We've known for more than half a century that a lot of noncoding DNA is functional. Why are some people still surprised? It's a puzzlement.

A paper in Trends in Genetics caught my eye as I was looking for something else. The authors review the various functions of noncoding DNA such as regulatory sequences and noncoding genes. There's nothing wrong with that, but the context is a bit shocking for a paper that was published in 2021 in a highly respected journal.

Leypold, N.A. and Speicher, M.R. (2021) Evolutionary conservation in noncoding genomic regions. Trends in Genetics 37:903-918. [doi: 10.1016/j.tig.2021.06.007]

Humans may share more genomic commonalities with other species than previously thought. According to current estimates, ~5% of the human genome is functionally constrained, which is a much larger fraction than the ~1.5% occupied by annotated protein-coding genes. Hence, ~3.5% of the human genome comprises likely functional conserved noncoding elements (CNEs) preserved among organisms, whose common ancestors existed throughout hundreds of millions of years of evolution. As whole-genome sequencing emerges as a standard procedure in genetic analyses, interpretation of variations in CNEs, including the elucidation of mechanistic and functional roles, becomes a necessity. Here, we discuss the phenomenon of noncoding conservation via four dimensions (sequence, regulatory conservation, spatiotemporal expression, and structure) and the potential significance of CNEs in phenotype variation and disease.

We've known about overall sequence conservation for a long time. We can quibble about the exact percentage (I prefer to use 8-10%) but there's no quibbling about the fact that the amount of conserved sequence is considerably higher than the amount of coding DNA (Rands et al., 2014; Ponting, 2017). Although we didn't know the amount of conservation until the genome sequences of several mammals were published, I think it's fair to say that the conservation of some noncoding DNA was not a surprise. Knowledgeable scientists knew about noncoding genes, regulatory sequences, origins of replication, centromeres, and telomeres long before the human genome was sequenced.
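The arithmetic behind these percentages is simple enough to sketch. The snippet below is only an illustration: it takes the ~1.5% coding figure and the ~5% constrained figure from the abstract quoted above, and the 8.2% figure from the title of Rands et al. (2014). It also assumes, as the abstract implicitly does, that all protein-coding sequence falls within the constrained fraction. The function name and the dictionary labels are made up for this example.

```python
def noncoding_constrained_pct(constrained_pct, coding_pct):
    """Percent of the genome that is constrained but not protein-coding.

    Assumption (not a measured fact): every protein-coding base counts as
    constrained, so the noncoding part is a plain subtraction.
    """
    if coding_pct > constrained_pct:
        raise ValueError("coding fraction cannot exceed the constrained fraction")
    return constrained_pct - coding_pct


CODING_PCT = 1.5  # approximate figure from the quoted abstract

# Estimates of the constrained fraction mentioned in this post
estimates = {
    "Leypold and Speicher (2021)": 5.0,
    "Rands et al. (2014)": 8.2,
}

for source, constrained in estimates.items():
    noncoding = noncoding_constrained_pct(constrained, CODING_PCT)
    ratio = noncoding / CODING_PCT
    print(f"{source}: ~{noncoding:.1f}% conserved noncoding "
          f"({ratio:.1f}x the coding fraction)")
```

Under either estimate, the conserved noncoding fraction comes out at more than twice the coding fraction.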

In fact, there was even a review published in Trends in Genetics back in 2008 noting that some noncoding DNA is conserved (Elgar and Vavouri, 2008).

This makes it somewhat surprising to see Leypold and Speicher make a point of saying that "it is generally assumed that most functional DNA is contained within coding regions" (p. 903) and noting that many genetic diseases are associated with mutations in noncoding DNA (e.g. regulatory sequences). They then state the point of their review.

Here, we review the mysterious and enigmatic evolutionary conservation of noncoding genomic regions ...

Surely we've gone beyond that sort of naivety? Why would reviewers allow comments like that to be published?

I can only think of one logical answer. A huge number of scientists really believe that the only functional parts of the human genome are the protein-coding regions, so the announcement of conserved noncoding DNA is a surprise to them. But then they must have thought that 98% of our genome was junk, and I doubt very much that scientists like Leypold and Speicher believed that.

It's a puzzlement.



Elgar, G. and Vavouri, T. (2008) Tuning in to the signals: noncoding sequence conservation in vertebrate genomes. Trends in Genetics 24:344-352. [doi: 10.1016/j.tig.2008.04.006]

Rands, C.M., Meader, S., Ponting, C.P., and Lunter, G. (2014) 8.2% of the Human Genome Is Constrained: Variation in Rates of Turnover across Functional Element Classes in the Human Lineage. PLOS Genetics 10:e1004525. [doi: 10.1371/journal.pgen.1004525]

Ponting, C.P. (2017) Biological function in the twilight zone of sequence conservation. BMC Biology 15:1-9. [doi: 10.1186/s12915-017-0411-5]

10 comments:

  1. Because biologists don't appreciate that there is error correction in the DNA codes.

    I suspect that the 'error correction feature' is there to make the actual functional DNA resistant to breaks, whereas stuff that is not 'error corrected', like homing endonucleases and viruses, gets degraded over evolutionary time while 'important' stuff does not.

    https://en.wikipedia.org/wiki/Noisy-channel_coding_theorem

  2. It seems odd that most biologists haven't heard of, for example, 28S rRNA.

    1. A guy in what was then our Genetics Department was the one who discovered 28S rRNA (and 18S rRNA). So I suspect his colleagues had heard of it.

    2. It's also odd that they haven't heard of tRNA, miRNA, snRNA, snoRNA, or 3' and 5' UTRs. That's even before we get to gene promoters. Surely they learned about the lac promoter at some point in their schooling?

    3. Correction: a guy who I knew in our Genetics Department had *earlier* discovered ...

    4. Joe, if his colleagues had heard of it, and if we believe that review, they apparently forgot to tell their students about it.

    5. ... and must have forgotten to have their students read molecular biology or cell biology textbooks that talk about it.

    6. Hey, if that article got through review by, presumably, molecular biologists, that means that they must not have read the textbooks. Go figure. (Or perhaps the article was reviewed by physicists or dermatologists.)

  3. Here's just one example of published work by former colleagues from the early 1990s making the observation of conserved non-coding regulatory regions (their model was globin gene clusters) part of the argument for fully sequencing the mouse genome:
    Hardison, RC, Oeltjen, J and Miller, W. "Long human-mouse sequence alignments reveal novel regulatory elements: a reason to sequence the mouse genome." Genome Res. 1997. 7: 959-966 (doi: 10.1101/gr.7.10.959).
    As is clear in their work from this time, conservation of non-coding regions flanking genes became apparent from their pioneering multi-pairwise alignment methods. Regulatory and transcription factor binding activities of these regions were then confirmed experimentally. This approach developed apace for a decade to follow. How this knowledge was forgotten is a mystery.
