Monday, September 22, 2014

Are lncRNAs really mRNAs in waiting?

Biology News Net has become a joke. It's rare to see a paper that it hasn't mangled or a press release that it hasn't fallen for, hook, line, and sinker. I read it for amusement.

A recent report began with ... [Parts of genome without a known function may play a key role in the birth of new proteins]
Researchers in Biomedical Informatics at IMIM (Hospital del Mar Medical Research Institute) and at the Universitat Politècnica de Catalunya (UPC) have recently published a study in eLife showing that RNA called non-coding (lncRNA) plays an important role in the evolution of new proteins, some of which could have important cell functions yet to be discovered.
That sounds intriguing. Maybe I should read the paper even though it's in eLife.

It took a little more work than I expected, but eventually I found the paper (Ruiz-Orera et al., 2014). Here's the abstract.
Deep transcriptome sequencing has revealed the existence of many transcripts that lack long or conserved open reading frames (ORFs) and which have been termed long non-coding RNAs (lncRNAs). The vast majority of lncRNAs are lineage-specific and do not yet have a known function. In this study, we test the hypothesis that they may act as a repository for the synthesis of new peptides. We find that a large fraction of the lncRNAs expressed in cells from six different species is associated with ribosomes. The patterns of ribosome protection are consistent with the translation of short peptides. lncRNAs show similar coding potential and sequence constraints than evolutionary young protein coding sequences, indicating that they play an important role in de novo protein evolution.
The study suggests that a lot of "noncoding" RNAs are being translated. The products appear to be short polypeptides of less than 100 residues.

New protein encoding genes do arise from time to time although the number of proven examples is very small. Let's assume, for the sake of argument, that a new gene arises about once every million years in a given lineage. That would mean about five new genes in humans since they split from chimpanzees and that seems about right for an upper limit.

Now, if you make a lot of junk RNAs by randomly transcribing junk DNA, then some of them will undoubtedly make short polypeptides. There's a chance that random mutations will create a peptide that takes on a functional role of some kind. There's an even smaller chance that this function will confer a selective advantage on the individual carrying the mutation. That's one way new genes are born.

Is this a reason for carrying a huge amount of junk DNA in your genome and making thousands of lncRNAs? Is the potential to make a new gene one million years in the future sufficient explanation for the preservation of junk DNA? The answer is "no."

You don't have junk DNA because it might prove useful in the future. You have it because you can't get rid of it. You don't transcribe your junk DNA because it might be useful; you transcribe it because the general properties of RNA polymerase and transcription factors don't allow for perfect discrimination between real genes and junk DNA. Junk transcripts aren't translated because they contain potential coding regions; they are sometimes translated because they must, by chance, contain some open reading frames.

Sloppiness might, by accident, lead to new genes but that's not why things are sloppy. If having junk DNA were a clear advantage for future evolution then the genomes of all extant lineages should have lots of junk DNA and should make lots of lncRNAs.

Ruiz-Orera, J., Messeguer, X., Subirana, J.A., and Alba, M.M. (2014) Long non-coding RNAs as a source of new peptides. eLife 2014;3:e03523 [doi: 10.7554/eLife.03523]


  1. If sloppiness is never useful, then how did bacterial mutator alleles evolve? If accuracy is paramount, then these alleles would have been eliminated. Yes, evolution can't "think ahead", but it doesn't have to if, during crises, the "sloppier" organisms simply preferentially survive. I'm as tired as you are of ENCODE-style defenses of the utility of junk DNA, but arguing that sloppiness is always a mistake doesn't agree with the data.

    1. Are there any species or wild-type strains of bacteria that have fixed mutator alleles?

    2. They certainly exist in nature and aren't just a laboratory curiosity. A study back in the 1990s by a group at the FDA found high rates of mutation and the mutator form of mutS in strains related to foodborne illness (LeClerc et al., Science 274:1208-1211, 1996), and more recently I've been to a couple of microbiome talks that have shown that cystic fibrosis patients have a higher rate of mutator strains in their lungs as well.

  2. Note that if you translate random DNA, you get a stop codon with probability 3/64 each time, so the resulting proteins should have a mean length of (64/3)-1 ≈ 20 amino acids. It changes a bit if the bases do not have equal frequency, but not by much. So any such proteins should be short.

    Or do I misunderstand?

    1. No, I think that's about right. Of course they are selecting for those lncRNAs that have "long" open reading frames because they are looking at ribosome protection. If you have 10,000 lncRNAs and each one is 1000 bp in length, then how many are going to encode a 100 residue protein?

      That sounds like something you (but not me) could calculate.

    2. Let me try. If the start is at the beginning of the lncRNA, then we just have to calculate the chance that none of the first 100 codons is a stop codon. If the bases are assumed equal in frequency (OK, an oversimplification) and we have 3 possible stop codons, we just need to compute (61/64) raised to the 100th power. That is 0.008222163, so on average about 8 of every 1,000 lncRNAs (roughly 82 of the 10,000) would code for a protein of length 100 or more.

  3. Off thread but a quickie.
    If this bio mag is a joke, as it probably is, then why is it not an accurate sample for creationists to use in saying there is a general problem in understanding and investigating science matters, especially in origin matters?
    Why just this mag? A lot of them miss creationists' excellent points?

    1. I imagine it would probably come down to what is meant by "creationists' excellent points".

  4. Prof. Moran, in your previous post, you wrote "what if 90% of all 10,000 lncRNAs have no function". Just to clarify, was that just a hypothetical figure, or do you think that something like Struhl's 2007 estimate of 90% of lncRNAs in Saccharomyces cerevisiae resulting from the inefficiency of RNA polymerase II is also going to be a close estimate for human lncRNAs?

    1. It's just my best guess. The important point is that too many scientists are assuming that the value is much closer to 0%. It's bad enough to make unjustified assumptions but it's even worse when you don't realize that you are making an assumption.

  5. The problem with all these " ... in-waiting" arguments is, what protects the sequence from being deleted in the meantime? I suppose that if they are just random sequences waiting to be expressed as random polypeptides, point mutation wouldn't hurt them, but random deletion would not be opposed. Saying that the species would not survive unless sequences like that were around is invoking a group selection (or even species-selection) mechanism. Which doesn't mean that it is wrong, but means that we have to think about the strength of that selection.

    There is the same issue with "front-loading" arguments, with the additional weakness that the genes that are set up in order to be expressed billions of years later would be eroded continually by point mutation as well as eliminated by deletion, and nothing would oppose that.

  6. Ah, yes. There was a reason I eventually started most biology classes with a list of assumptions we would make. Most assumptions were from physics. One was "In a time series, causes happen before the events they cause." Students would look at me like I was crazy; of course causes come before effects!

    And here we have what skates very close to "Future useful genes cause lncRNAs." Random, unavoidable variation and erroneous transcription occasionally throwing up useful products seems more probable to me. lncRNAs aren't useful because some small percent of them may in the future become useful; that's just a nice side effect of their existence.

  7. Hi Larry – forgive me for resorting to an argumentum ad absurdum…

    If having junk DNA were a clear advantage for future evolution then the genomes of all extant lineages should have lots of junk DNA and should make lots of lncRNAs.

    Hmmm… Does it then follow that if having nerve cells underlie the retina, resulting in sharper vision with no blind spot, were a clear advantage, then all metazoans should have eyes just like squids?

    I thought we were calling Creationists "IDiots" because there in fact is no evident teleology and evolution is often jerry-rigged.

    I must be missing something here. What am I not following?

    1. Hi Larry,

      I agree with you that the squid eye statement is a non sequitur.


      To my understanding, your statement:

      If having junk DNA were a clear advantage for future evolution then the genomes of all extant lineages should have lots of junk DNA and should make lots of lncRNAs.

      ... represents exactly the same category of non sequitur. That's the part I don't get and clearly I must be missing something.

  9. It seems substantially less likely that random intergenic sequence can provide novel functional peptides than existing translated genes, introns or untranslated pseudogenes. These already possess the upstream and downstream sites necessary for transcription, cleavage, capping, export to the ribosome, and translation initiation - plus the kinds of motif that appear to make for successful enzymes. All of this would somehow have to come together in an 'intergene' before it can even get out the starting gate.

    Only about 0.02% of positions in a random genome would begin the hexamer AAUAAA necessary for binding CPSF, for example, which would also need a randomly GU-rich region, and an initiator methionine triplet with a random STOP sufficiently far from it to actually make something, with a consistently foldable product. Beyond these passive barriers, one would expect that the production of random catalysts would be suppressed, if anything, rather than being an evolutionary force selecting for retention of the potential.

    1. I agree with you that creation of a new gene by this mechanism is extremely unlikely. Furthermore, there are very few examples (none?) so this is mostly speculation. Some scientists, who should know better, are being influenced by reports of "orphan" genes. These are actually putative genes that almost always turn out to be false alarms. If you were to believe all those false positive claims then new genes are popping up from junk DNA all over the place.

  10. OK Reality check

    Some lncRNAs have been found to participate in the regulation of such diverse activities as
    • splicing,
    • translation,
    • imprinting, and
    • transcription.
    Two examples:
    • XIST. XIST RNA, which contains thousands of nucleotides, inactivates one of the two X chromosomes in female mammals.
    • Some lncRNAs participate in bringing the enhancer and promoter regions of genes close together ("looping") to regulate gene transcription.

    I am fascinated by XIST lncRNA-mediated Barr body formation and wonder out loud whether lncRNA is in general crucial for another level of gene control often not considered in introductory textbooks… namely chromatin architecture in the nucleus.

    I realize I am rehashing, but I am going to float this balloon again, quite deliberately, in order to have my exuberant naiveté reined in.

    I thank any and all in advance for their patience and indulgence.

    Perhaps chromosomes have their equivalent to tertiary and quaternary structure. Otherwise how does one explain constancy of karyotypes across primate lineages unless invoking positive selection?

    This makes intuitive sense to me - Check out this link:

    1. Otherwise how does one explain constancy of karyotypes across primate lineages unless invoking positive selection?

      Is it that constant (compared to other equivalent groups)? Genuine question; I don't know either way and would be interested in your data.

      One of the primary drivers of gross karyotype rearrangement in mammals appears to be the polarity of female meiosis, which appears to be stronger than conservative selection in those groups subject to frequent reversal.
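
The back-of-envelope numbers traded in the comments above (the mean length of a random open reading frame, the chance of a 100-codon stretch with no stop, and the frequency of the AAUAAA hexamer) can be checked with a short sketch. It makes the same simplifying assumptions the commenters state: all four bases are equally frequent and positions are independent. The function names are illustrative only and do not come from the paper or the thread.

```python
from fractions import Fraction

# Assumption (same as in the comments): equal base frequencies and
# independent codons, so 3 of the 64 possible codons are stops.
P_STOP = Fraction(3, 64)


def mean_orf_length(p_stop: Fraction) -> Fraction:
    """Expected number of sense codons read before the first stop codon.

    The count of sense codons before a stop follows a geometric
    distribution, whose mean is (1 - p) / p.
    """
    return (1 - p_stop) / p_stop


def prob_no_stop(p_stop: Fraction, n_codons: int) -> float:
    """Probability that the first n_codons codons contain no stop codon."""
    return float((1 - p_stop) ** n_codons)


def kmer_frequency(k: int) -> float:
    """Chance that a given position in random sequence starts one specific k-mer."""
    return 0.25 ** k


if __name__ == "__main__":
    p100 = prob_no_stop(P_STOP, 100)
    print(f"mean random ORF length: {float(mean_orf_length(P_STOP)):.2f} codons")
    print(f"P(no stop in first 100 codons): {p100:.9f}")
    print(f"expected among 10,000 lncRNAs: {10_000 * p100:.1f}")
    print(f"AAUAAA frequency per position: {100 * kmer_frequency(6):.4f}%")
```

Under these assumptions the mean comes out at 61/3, or about 20.3 codons; the chance of 100 stop-free codons is about 0.0082, or roughly 82 of 10,000 transcripts; and a specific hexamer starts at about 0.024% of positions. Real genomes have skewed base composition, so these figures are only rough guides.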