Thursday, April 11, 2013

Educating an Intelligent Design Creationist: Rare Transcripts

I'm replying to a post by andyjones: (More and more) Function, the evolution-free gospel of ENCODE. That was the fourth post in his series, and I'm working my way through five issues that Intelligent Design Creationists need to understand.

Educating an Intelligent Design Creationist: Introduction
Educating an Intelligent Design Creationist: Pervasive Transcription

Andyjones says he didn't know that many of the unusual transcripts are very rare. That's a shame because it's one of the most important things you need to know in order to have an intelligent opinion about junk DNA. Here's a question from andyjones ...
The second point is interesting, but I have to ask the question: given the fact that we don’t know everything about the genome, isn’t it precisely those parts that are rarely transcribed that would give most difficulty when it comes to determining their functions?
The simple answer to your question is "yes" but that doesn't mean we don't have clues. The best explanation depends on how rare the transcripts are and on whether there's another, equally reasonable, explanation that accounts for their existence. What we can say right now is that the presence of these rare transcripts is consistent with junk DNA. We can also say that there's no reasonable functional explanation for huge numbers of transcripts that are present at less than one copy per cell. Think about that for a minute. It means that right now there are only two scientifically reasonable explanations: (1) junk DNA/RNA, and (2) we don't know if they have a function. It is scientifically incorrect to say that these transcribed regions are functional and therefore junk DNA is refuted.1
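Here's a quick way to see what "less than one copy per cell" implies. In this sketch I assume copy numbers follow a Poisson distribution; the mean values are made-up numbers chosen for illustration, not measurements:

```python
import math

def fraction_of_cells_with_transcript(mean_copies_per_cell):
    """Poisson model: P(at least one copy) = 1 - exp(-mean)."""
    return 1.0 - math.exp(-mean_copies_per_cell)

# A transcript averaging 0.1 copies per cell (hypothetical value)
# is entirely absent from roughly 90% of cells at any given moment.
for mean in (0.1, 0.5, 1.0):
    frac = fraction_of_cells_with_transcript(mean)
    print(f"mean {mean:>4} copies/cell -> present in {frac:.1%} of cells")
```

In other words, a transcript averaging a fraction of a copy per cell simply doesn't exist in most of the cells where it would supposedly be doing its job.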

Let's review the evidence for transcript abundance. The number of copies of a specific mRNA per mammalian cell ranges from tens of thousands, to several hundred, to ten or fewer. These three classes are quite common. It's been known since experiments in the 1970s that there are very few highly abundant mRNAs (e.g. ovalbumin mRNA in oviducts, globin mRNA in erythrocytes). Most mRNAs fall into the intermediate class and a small number are low abundance mRNAs. It's unlikely that a steady-state level of only a few mRNA molecules could support enough protein synthesis to make much of a difference, but it can't be ruled out. It's even more unlikely that such rare transcripts could perform a regulatory function since they would have to be ten times more abundant than the mRNA they regulate.

One of the most widely-read discussions of transcript abundance comes from a paper written by my colleagues Ben Blencowe and Tim Hughes (van Bakel et al. 2010). They have a nice figure illustrating the difference between the mass of RNA being analyzed and its complexity (= the number of different sequences). In this case they are looking at poly A+ RNA—that's supposed to be almost exclusively mRNA that encodes protein. [I wrote up some things about this paper when it first came out: Junk RNA or Imaginary RNA.]

You would expect that the bulk of this RNA would correspond to exons and that's exactly what they find. In the experiment with human RNA they show that 88% of the mass of RNA is transcribed from exons. That figure is also 88% in the mouse experiment. Now look at the parts of the genome that are covered by these preparations of RNA. That's shown on the right in the figure below. In this case, 51% of the sequences represented in the preparation are complementary to introns. There shouldn't be any introns in mRNA so this represents mostly contamination or artifact. What it says is that 5.8% of the bulk RNA (red bars) covers more than half of the total complexity of the RNA preparation.

Each individual bit of intron RNA is extremely rare. The intron fraction is high complexity, low abundance.

The other three fractions (EST exon, EST intron, and other) reveal a similar pattern. These are unannotated sequences that are most likely junk DNA. They make up 6.4% of the bulk of the RNA (human) but account for 26% of the total complexity. These are very rare bits of RNA from all over the genome. They are probably present at much less than one copy per cell.
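The contrast between mass and complexity can be made concrete with the numbers above. Dividing each fraction's share of the RNA mass by its share of the complexity gives a rough relative abundance per distinct sequence. (The exon complexity share of ~23% is my inference from the remainder, not a figure taken from the paper.)

```python
# Mass and complexity fractions for human poly A+ RNA, taken from
# the van Bakel et al. (2010) figures quoted above. The exon
# complexity share (~23%) is inferred as the remainder.
fractions = {
    "exon":        {"mass": 0.88,  "complexity": 0.23},
    "intron":      {"mass": 0.058, "complexity": 0.51},
    "unannotated": {"mass": 0.064, "complexity": 0.26},
}

for name, f in fractions.items():
    # Mass per unit of complexity: a proxy for average copies per
    # distinct sequence, relative to the preparation as a whole.
    ratio = f["mass"] / f["complexity"]
    print(f"{name:>12}: {ratio:.2f}")
```

By this crude measure an average exonic sequence is represented at something like thirty times the level of an average intronic sequence, which is exactly what you expect if the intronic and unannotated material is rare noise.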

If one looks at poly A+ RNA from many different tissues—as the ENCODE project did—then what you find is that you eventually begin to saturate the known protein-encoding genes, but the rare transcripts from intergenic regions continue to cover more and more of the genome until, eventually, it looks like almost all of the genome is transcribed in one tissue or another. This is "pervasive transcription."
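You can illustrate that saturation behavior with a toy simulation (every number here is hypothetical, chosen only to make the pattern visible): sample reads from a population where a few thousand "genes" are transcribed often and a large background is transcribed rarely and randomly. Gene coverage saturates almost immediately while background coverage keeps growing with sequencing depth.

```python
import random

random.seed(1)

N_GENES = 1_000          # abundant annotated transcripts (hypothetical)
N_BACKGROUND = 100_000   # rare, randomly transcribed regions (hypothetical)
GENE_WEIGHT = 500        # each gene transcribed ~500x as often per region

gene_mass = N_GENES * GENE_WEIGHT
total_mass = gene_mass + N_BACKGROUND

seen_genes, seen_background = set(), set()
sampled = 0
for depth in (10_000, 100_000, 1_000_000):
    while sampled < depth:
        sampled += 1
        if random.random() < gene_mass / total_mass:
            seen_genes.add(random.randrange(N_GENES))
        else:
            seen_background.add(random.randrange(N_BACKGROUND))
    print(f"depth {depth:>9}: genes {len(seen_genes)/N_GENES:.0%} saturated, "
          f"background {len(seen_background)/N_BACKGROUND:.0%} covered")
```

The point is that ever-growing genome coverage with depth is exactly what random background transcription predicts; it isn't evidence of function.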

As van Bakel et al. (2010) point out ...

... the fact that such pervasive transcription would only be detected at sequencing depths more than two orders of magnitude above current levels suggests that these transcripts may largely be attributed to biological and/or technical background. Indeed, the vast majority of intergenic and intronic seqfrags have very low sequence coverage (Figure 2E, 2F), exemplified by the fact that 70% (human) to 80% (mouse) of the transcribed area in these regions is detected by a single RNA-Seq read in only one sample, much of which is consistent with random placement.
If you can only detect one single transcript of a particular region then transcription of that part of the genome must be exceedingly rare.
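The "random placement" point is easy to check with a toy model (the region size, read count, and read length below are hypothetical, picked to give a mean coverage well below one): scatter a modest number of reads over a large region and almost every covered position is touched by exactly one read—just as van Bakel et al. report for intergenic and intronic sequences.

```python
import random

random.seed(0)

REGION = 1_000_000   # length of an intergenic region (hypothetical)
READS = 2_000        # reads landing in it (hypothetical, low depth)
READ_LEN = 50

# Count how many reads overlap each position.
hits = {}
for _ in range(READS):
    start = random.randrange(REGION - READ_LEN)
    for pos in range(start, start + READ_LEN):
        hits[pos] = hits.get(pos, 0) + 1

covered = len(hits)
single = sum(1 for c in hits.values() if c == 1)
print(f"covered: {covered/REGION:.1%} of region")
print(f"singly-covered: {single/covered:.0%} of covered positions")
```

At this depth the mean coverage is 0.1x, and the vast majority of covered positions are hit by a single read—which is what the genuine data look like for most of the genome.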

This does not mean that all intergenic transcription is rare because van Bakel et al. (2010) go to great lengths to identify some 16,000 sites where transcription is frequent enough to suggest functionality. However, this is only a small fraction of the genome. The rest of the transcripts are consistent with models of random transcription.

The same rationale applies to bulk RNA (i.e. not poly A+ RNA) except that you detect a lot more intergenic transcription.

Because our genomes have introns, Alu elements, and endogenous retroviruses, these things must be doing us some good. Because a region is transcribed, its transcript must have some fitness benefit, however remote. Because residue N of protein P is leucine in species A and isoleucine in species B, there must be some selection-based explanation. This approach enshrines “panadaptationism,” which was forcefully and effectively debunked by Gould and Lewontin in 1979 but still informs much of molecular and evolutionary genetics, including genomics.

Ford Doolittle (2013)
Van Bakel et al. (2010) suggest that the observed results can be more satisfactorily explained by accidental or spurious transcription producing rare transcripts from intergenic regions of the genome. ("Pervasive transcription of intergenic regions as described in previous studies occurs at a significantly reduced level and is of a random character.")

They also suggest that the appropriate null model in these studies should be accidental transcription ("To be conservative, a null hypothesis should perhaps be that novel transcripts—particularly those that are small and low-abundance—are a by-product rather than an independent functional unit. Searching for phenotypes caused by genetic perturbation may be the most useful approach to disproving the null hypothesis.") The onus should be on those who claim function to support their case. It's not up to the opponents of pervasive functional transcription to prove lack of function. Function is not the default option as long as you understand that transcription is not perfect.2

There's an important point here. It's not sufficient just to show that one's RNA prep covers a large part of the genome. You also have to include quantitative data so the result can be realistically evaluated. When van Bakel et al. (2010) challenge "pervasive transcription" they are not challenging the data showing RNA hybridization to the bulk of the genome. What they are pointing out is that much of this coverage is consistent with rare, random transcription. That's not "pervasive transcription" in their minds.

John Mattick challenged the conclusions of van Bakel et al. (2010) (Clark et al. 2011) and my colleagues responded (van Bakel et al. 2011). This debate is well known3 but the controversy was completely ignored in the summary papers from the ENCODE project last September.

Some opponents of junk DNA (i.e. proponents of function for all/most transcripts) are aware of the problem and own up to the difficulty of defining function. I want to close with a quotation from Willingham and Gingeras (2006) to prove that good scientists discuss both sides of a controversy.
Noncoding RNAs and Their Functions

A key question hangs like an ominous cloud over these observations of widespread transcription: are these transcripts biologically functional, or are they the transcriptional noise of a less than precise set of biological processes? Recent experiments in mice in which megabase “gene desert” regions have been deleted underscore the relevance of this question. Deletion of 1.5 Mb and 0.8 Mb genomic intervals, which together contain 1243 noncoding sequences conserved between rodent and primate, resulted in viable mice with no obvious deleterious phenotypes (Nobrega et al., 2004). However, if history is our guide, then the answer to this question may be complex.


1. That's why The Myth of Junk DNA is not a science book. It's basically an argument from ignorance where "we don't know" is translated to mean "it must be functional."

2. This concept is not new. Michael White wrote about the proper null hypothesis some years ago in Genomic Junk And Transcriptional Noise. I elaborated a little bit in my blog post about his paper [see How to Frame a Null Hypothesis].

3. Jonathan Wells devotes several pages to attacking the reputation of Hughes and Blencowe and their colleagues.

Clark, M.B., Amaral, P.P., Schlesinger, F.J., Dinger, M.E., Taft, R.J., Rinn, J.L., Ponting, C.P., Stadler, P.F., Morris, K.V. and Morillon, A. (2011) The reality of pervasive transcription. PLoS Biology 9, e1000625. [doi: 10.1371/journal.pbio.1000625]

Doolittle, W.F. (2013) Is junk DNA bunk? A critique of ENCODE. Proc. Natl. Acad. Sci. (USA) 110:5294-5300. [doi: 10.1073/pnas.1221376110]

Gould, S.J. and Lewontin, R.C. (1979) The spandrels of San Marco and the Panglossian paradigm: a critique of the adaptationist programme. Proc. Royal Soc. (London) Series B. Biological Sciences 205:581-598.

Willingham, A.T. and Gingeras, T.R. (2006) TUF love for “junk” DNA. Cell 125:1215-1220. [doi: 10.1016/j.cell.2006.06.009]

van Bakel, H., Nislow, C., Blencowe, B.J. and Hughes, T.R. (2010) Most “dark matter” transcripts are associated with known genes. PLoS Biology 8(5): e1000371. [doi: 10.1371/journal.pbio.1000371]

van Bakel, H., Nislow, C., Blencowe, B.J. and Hughes, T.R. (2011) Response to “the reality of pervasive transcription”. PLoS Biology 9, e1001102. [doi: 10.1371/journal.pbio.1001102]

14 comments :

  1. I think you have mistyped "we don't know" in your 1. footnote.

    ReplyDelete
  2. Anyway, very informative post Larry, thank you. Not just ID proponents or creationists are receiving education here.

    ReplyDelete
  3. Creationists really have a problem with the concept of the null hypothesis, don't they? Theists do in general, come to think of it.

    ReplyDelete
  4. Is that Willingham and Gineras or Gingeras?

    ReplyDelete
  5. Larry, have you interpreted the bar graph correctly?

    Now look at the parts of the genome that are covered by these preparations of RNA. That's shown on the right in the figure below. In this case, 51% of the sequences represented in the preparation are complementary to introns. There shouldn't be any introns in mRNA so this represents mostly contamination or artifact. What it says is that 3% of the bulk RNA (red bars) covers more than half of the total complexity of the RNA preparation.

    If by red bars you mean orange bars, it looks like 5.8% to me.

    And this:

    The other three fractions (EST exon, EST intron, and other) reveal a similar pattern. These are unannotated gene sequences that are most likely junk DNA. They make up 3% of the bulk of RNA (human) but cover 26% of the total genome coverage.

    To me it looks like 3.3% + 0.9% + 2.2% = 6.4%.

    ReplyDelete
  6. Jonathan Wells devotes several pages to attacking the reputation of Hughes and Blencowe and their colleagues.

    Ooh ooh ooh! Can you copy in some choice bits? I love bitchy Wells ad hominems!

    ReplyDelete
  7. What could be the role of rare transcripts? Following up on my previous comment, one host RNA molecule with a sequence complementary to a viral transcript (enough for two helical turns), should suffice to activate intracellular alarms (e.g. see Marcus, P. (1983) Interferon induction by viruses: one molecule of dsRNA as the threshold for induction. Interferon 5, 115-180).

    The interferon signal can alert other cells not yet infected by the virus. Thus, even if, on average, there were less than one specific antiviral host RNA molecule/cell, if the initially infected cell had that RNA, the host would be alerted. This could be adaptive (http://post.queensu.ca/~forsdyke/theorimm4.htm).

    ReplyDelete
    Replies
    1. That's interesting, but if only two helical turns are needed, then that would be a small fraction of the nucleotides within pervasive transcription.

      Delete
    2. For the actual calculation see section 13 of our 2001 paper at http://post.queensu.ca/~forsdyke/EBV.htm

      Delete
    3. Don says,

      one host RNA molecule with a sequence complementary to a viral transcript (enough for two helical turns), should suffice to activate intracellular alarms

      Have you done some calculations to see how long it would take for this single RNA molecule to find the viral transcript in a typical mammalian cell with probably a BILLION other RNA binding sites? I'm thinking many days at 37°C.

      Delete
    4. Perhaps someone out there would attempt the calculation, taking into account the crowded nature of the cytosol, which the hand of Nature is likely to have optimized (pH, salt concentration, etc.) to favour what is perhaps the kinetically most important reaction in cells - that between tRNA anticodon loops and mRNA codons.

      When we biochemists carry out RNA-RNA hybridizations in our plastic Eppendorf tubes, we try to simulate these reaction conditions by adding crowding agents, such as polyethylene glycol, and fine-tuning pH and salt concentrations. How closely the results obtained from such systems correspond to what would obtain in real cells is problematic. Fancy biophysical techniques (FRET analysis) are beginning to cast light on this, but pending such studies I vote for minutes or hours, not days!

      Delete