Friday, September 27, 2013

Dark Matter Is Real, Not Just Noise or Junk

UPDATE: The title is facetious. I don't believe for one second that most so-called "dark matter" has a function. In fact, there's no such thing as "dark matter." Most of our genome is junk. I mention this because one of the well-known junk DNA kooks is severely irony-impaired and thought that I had changed my mind.
A few hours ago I asked you to evaluate the conclusion of a paper by Venters and Pugh (2013) [Transcription Initiation Sites: Do You Think This Is Reasonable?].

Now I want you to look at the Press Release and tell me what you think [see Scientists Discover the Origins of Genomic "Dark Matter"].

It seems pretty clear to me that Pugh (and probably Venters) actually think they are on to something. Here's part of the press release quoting Franklin "Frank" Pugh, a Professor in the Department of Molecular Biology at Penn State.
The remaining 150,000 initiation machines -- those Pugh and Venters did not find right at genes -- remained somewhat mysterious. "These initiation machines that were not associated with genes were clearly active since they were making RNA and aligned with fragments of RNA discovered by other scientists," Pugh said. "In the early days, these fragments of RNA were generally dismissed as irrelevant since they did not code for proteins." Pugh added that it was easy to dismiss these fragments because they lacked a feature called polyadenylation -- a long string of genetic material, adenosine bases -- that protect the RNA from being destroyed. Pugh and Venters further validated their surprising findings by determining that these non-coding initiation machines recognized the same DNA sequences as the ones at coding genes, indicating that they have a specific origin and that their production is regulated, just like it is at coding genes.

"These non-coding RNAs have been called the 'dark matter' of the genome because, just like the dark matter of the universe, they are massive in terms of coverage -- making up over 95 percent of the human genome. However, they are difficult to detect and no one knows exactly what they all are doing or why they are there," Pugh said. "Now at least we know that they are real, and not just 'noise' or 'junk.' Of course, the next step is to answer the question, 'what, in fact, do they do?'"

Pugh added that the implications of this research could represent one step towards solving the problem of "missing heritability" -- a concept that describes how most traits, including many diseases, cannot be accounted for by individual genes and seem to have their origins in regions of the genome that do not code for proteins. "It is difficult to pin down the source of a disease when the mutation maps to a region of the genome with no known function," Pugh said. "However, if such regions produce RNA then we are one step closer to understanding that disease."
I'm puzzled by such statements. It's been one year since the ENCODE publicity fiasco and there have been all kinds of blogs and published papers pointing out the importance of junk DNA and the distinct possibility that most pervasive transcription is, in fact, noise.

It's possible that Pugh and his postdoc are not aware of the controversy. That would be shocking. It's also possible that they are aware of the controversy but decided to ignore it and not reference any of the papers that discuss alternate explanations of their data. That would be even more shocking (and unethical).

Are there any other possibilities that you can think of?

And while we're at it: What excuse can you imagine that lets the editors of Nature off the hook?

P.S. The IDiots at Evolution News & Views (sic) just love this stuff: As We Keep Saying, There's Treasure in "Junk DNA".


Venters, B.J. and Pugh, B.F. (2013) Genomic organization of human transcription initiation complexes. Nature. Published online 18 September 2013 [doi: 10.1038/nature12535] [PubMed] [Nature]

22 comments:

SPARC said...

IDiots love such stuff although they can't even summarize it correctly.

Mikkel Rumraket Rasmussen said...

It is truly frustrating to have gone ahead and bothered to educate myself on junk DNA and molecular biology, only to discover that, time and again, there are professional scientists who aren't even aware of the developments in their own field, and who continue to get their facts wrong.
Something has gone completely wrong somewhere in these people's education. They're completely oblivious to the history and the science behind junk.

Considering that I'm only a lab technician and have only recently come to understand the subject, I can barely imagine how frustrating it must be for someone like Larry.

Konrad said...

"What excuse can you imagine that lets the editors of Nature off the hook?"

Well, at least they're consistent - they publish stuff of this quality all the time.

Mikkel Rumraket Rasmussen said...

Are there any other possibilities that you can think of?
Well, if they're truly unaware of the controversy, they're probably also falling victim to the selectionist fallacy.

If they are aware of it, they have seemingly chosen to ignore it completely, possibly because they've decided which side of the fence they come down on, and might even think the matter is settled (however horrifying that possibility is).

Georgi Marinov said...

Think of how frustrating it is for people in the functional genomics field who do understand the basics of molecular evolution. Over the last few days Larry has repeatedly dismissed the whole field as basically a pointless exercise in technical wizardry that does not contribute anything to our understanding of biology. And he's not alone in that view.

That is both wrong and deeply hurtful to those who are trying to put the power of the technology to good use.

Joe Felsenstein said...

I think that there is another error in the quote. It approvingly quotes Pugh and Venters, who manage to get the definition of "dark matter of the genome" wrong. I always thought that when people use the phrase "dark matter of the genome" they are referring to whatever accounts for the "missing heritability". The large amount of the genome that changes without making any difference to the phenotype is fascinating, but it then cannot be that "dark matter". It may be a lot of DNA that has no known function, but that does not make it account for the missing heritability.

So when Pugh and Venters refer to noncoding RNAs that do not do anything as the "dark matter of the genome", they are changing the definition of that phrase.

Mikkel Rumraket Rasmussen said...

Oh, someone hurt their feelings? Stop the press.

This is science, criticism is an intrinsic property of the game. If they want to avoid harsher criticism, maybe they should take some time to accurately report their findings instead of stooping to unsupportable sensationalism.

Unknown said...

I think the missing heritability quote is the most egregious aspect of the press release. It seems that people use these terms without properly understanding why they were coined, or what they actually mean. I agree that non-coding RNA has no significance in terms of missing heritability.

Someone, I can't recall who, once lamented that molecular biology was unlicensed biochemistry. It seems that for many people genomics is unlicensed molecular evolution.

Larry Moran said...

@Georgi

I'm perfectly willing to admit that all those expensive mapping studies might contribute to our understanding of genomes and gene expression. I'm sure they have. I'm sure there are lots of solid conclusions that are going to make it into the next editions of the textbooks.

I apologize for forgetting about the important insights that have come from mapping transcription factor binding sites and methylation sites. Could you just briefly remind me what they are?

Georgi Marinov said...

What is your definition of important insight?

Georgi Marinov said...

Also, it is not at all as expensive to do these experiments as it used to be. At this point in time, if you want to profile a few dozen sites by ChIP-qPCR, the cost of the primers and the reagents can easily run into the neighborhood of the cost of going genome-wide. You might as well do the whole genome and generalize your results. Whole-genome bisulphite is of course expensive, but it will become cheaper.

It's just a tool that has become a normal part of doing the science in the field. I don't know why you are under the impression that there is a separation between the people who do "expensive mappings" and those who do traditional biology. There isn't - the latter group simply doesn't exist; everybody has adopted functional genomic methods because they have a lot of advantages.

Now, there is one thing I have myself repeatedly pointed out, and it is that the laboratories that have well-defined biological problems they are trying to address, and are using genomics to answer them in a focused way, have indeed contributed more "important insights" (at least what I would call an important insight) than the large-scale consortia. But the large-scale consortia weren't really set up to provide biological insights; their goal is primarily to produce data. And they drove the development of the technology. Which, I would expect you to agree, is a positive thing - otherwise we could just go back 30 years and say that there was no point in developing genome sequencing because it was just a technical advance that wouldn't provide any "important insights".

Finally, none of this conversation would be happening if there were enough funding for both the large-scale efforts and the individual labs, and no competition between them for a limited resource. But that's a different topic.

Larry Moran said...

Georgi asks,

What is your definition of important insight?

I'll accept anything you want to offer. For example, have the studies clarified the number of genes in the human genome? Have they led to any new (true) insights on the regulation of gene expression?

As you know, many of your colleagues in the field have CLAIMED that they have answers to those questions. They claim that their data shows that most of the genome is functional and they claim that their data shows that each gene is regulated by a host of transcription factors and noncoding RNAs.

Do you agree with them? If not, what is the most important contribution in your opinion?

Georgi Marinov said...

Laurence A. Moran
Georgi asks,

What is your definition of important insight?

I'll accept anything you want to offer. For example, have the studies clarified the number of genes in the human genome? Have they led to any new (true) insights on the regulation of gene expression?


OK, let's go back to the very early days of high-throughput sequencing.

This (piRNAs, ping-pong mechanism) does not happen without it:

http://www.ncbi.nlm.nih.gov/pubmed/16751777
http://www.ncbi.nlm.nih.gov/pubmed/17346786
http://www.ncbi.nlm.nih.gov/pubmed/17446352
http://www.ncbi.nlm.nih.gov/pubmed/17975059

If you don't consider that a fundamental insight, you must have impossible standards, and if we were to follow them we would be better off just stopping doing science altogether.

As far as who claimed what, that's completely orthogonal to the question of whether the technological advances are useful. It's more of a sociology-of-science question. My position has been made clear many times.

Larry Moran said...

@Georgi,

First, "expensive" includes the time an effort of graduate students and postdocs who could be doing something more important.

Second, I agree with you when you say, "But the large-scale consortia weren't really set up to provide biological insights; their goal is primarily to produce data." That being the case, the consortia should stop pretending that they have discovered any biological insights.

Maybe it's time to stop collecting more data and actually try to use it to gain some biological insights. If I were in charge of one of those consortia I would definitely be investigating the binding of various factors to random DNA in order to try to sort out function from accident. I'd also put some of my people on projects to look at specific "promoters" and "enhancers" to try to find out whether they have a biological function.

The last thing I'd do is waste them on pumping out a dozen more data sets. Enough is enough, already. The questions are there. It's time to start answering them.

Georgi Marinov said...

The last thing I'd do is waste them on pumping out a dozen more data sets. Enough is enough, already. The questions are there. It's time to start answering them.

Well, that's not exactly true. Eventually every transcription factor will have to be ChIP-ped if we are to understand its biology. We're not talking fundamental principles here, just characterizing the transcription factors (from which something interesting might pop out, who knows, but that's not necessarily the expectation).

There is indeed a tendency to go in the direction of "Let's go get more data" rather than to dig deep into the data that already exists. It's easier. I've made that mistake myself. And we indeed have more data than we can fully make sense of already. But that statement is not incompatible with the statement that we don't have all the data we need.

Larry Moran said...

Georgi says,

If you don't consider that a fundamental insight, you must have impossible standards, and if we were to follow them we would be better off just stopping doing science altogether.

Well, I'm not an expert on Piwi RNAs and I haven't studied all the examples of small regulatory RNAs and anti-sense RNAs that have been discovered in the past decade.

However, I've been teaching students about small regulatory RNAs and antisense RNAs since 1980 and I put several examples in my first textbook published in 1987.

We have known for almost 40 years that transcription factors and RNA polymerases bind to random sequences of DNA. We've known for all that time that there MUST be spurious binding sites in a large genome. What's the point of mapping all those spurious binding sites for every known transcription factor?

The important question is not whether there are fortuitous binding sites (there are) but how many of them represent real, biologically functional promoters and enhancers. Isn't it about time to start addressing that question?

Georgi Marinov said...

Well, I'm not an expert on Piwi RNAs and I haven't studied all the examples of small regulatory RNAs and anti-sense RNAs that have been discovered in the past decade.

However, I've been teaching students about small regulatory RNAs and antisense RNAs since 1980 and I put several examples in my first textbook published in 1987.


Nobody disputes that.

But we did not know that there is a dedicated small-RNA-based system for silencing transposons, or that it is so important. I consider that a major discovery.

We have known for almost 40 years that transcription factors and RNA polymerases bind to random sequences of DNA. We've known for all that time that there MUST be spurious binding sites in a large genome. What's the point of mapping all those spurious binding sites for every known transcription factor?

The important question is not whether there are fortuitous binding sites (there are) but how many of them represent real, biologically functional promoters and enhancers. Isn't it about time to start addressing that question?


I think we have a slightly different definition of what a spurious binding site is. The binding sites we see are quite robust and reproducible. Why does the fact that there are tens of thousands of them have to bother you? There are at least 20K functional binding sites for the general transcription factors. I don't think you would dispute that. And I don't think you would dispute that there are thousands of enhancers in the genome. And when I say that, please note what I also said above about the claims in the paper that's the topic of this thread.

Of course that doesn't mean that all of them are functional. But it's not true that everyone is assuming they are and that nobody is testing them. It's just that it's a lot easier to map TF binding sites than to test their function. But it is something that's being worked on, and results will be coming out in the coming years. We've only had (relatively) easy-to-use genome editing for less than a year now.

Larry Moran said...

Please see the UPDATE at the top of this post.

Larry Moran said...

Georgi says,

I think we have a slightly different definition of what a spurious binding site is.

Apparently we do. My definition is that a spurious binding site is where a transcription factor binds by accident because a random piece of junk DNA just happens to have a sequence that resembles the binding site consensus sequence. If the binding site is six (6) nucleotides then there will be one million of them in the genome. Only a handful are functional.
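
To put a rough number on that, here is a back-of-the-envelope sketch (assuming a ~3.2 Gb genome and equal base frequencies; the figures are illustrative, not taken from the paper):

```python
# Back-of-the-envelope check of the "one million" figure: expected number of
# chance matches to a specific 6 bp recognition sequence in the human genome.
# Assumes a ~3.2 Gb genome and equal base frequencies, so the result is only
# an order-of-magnitude estimate.
genome_size = 3.2e9                 # approximate haploid genome length in bp
site_length = 6
p_match = 0.25 ** site_length       # probability a given position matches the 6-mer
expected_sites = genome_size * p_match
print(f"{expected_sites:,.0f}")     # ~781,000 per strand, i.e. on the order of a million
```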

What's your definition?

The binding sites we see are quite robust and reproducible.

Of course they are. That's what I expect.

Why does the fact that there are tens of thousands of them have to bother you?

Like I said, it doesn't bother me in the least that there are tens of thousands of spurious binding sites in the human genome. What bothers me is those workers who interpret them as functional enhancers.

There are at least 20K functional binding sites for the general transcription factors. I don't think you would dispute that.

Do you mean that there's at least one functional binding site for each gene? If so, I agree.

And I don't think you would dispute that there are thousands of enhancers in the genome.

I would not dispute that. There are 25,000 genes (approximately) and almost all of them will have an enhancer of some sort. Many will have several enhancers.

Georgi Marinov said...

I think there isn't really anything we're actually arguing over.

Except for the definition of "spurious"

As you yourself pointed out, TF motifs are not very information-rich in metazoans, but you do not see all of them occupied; i.e., there is more than just the motif that makes a binding site, whether it's chromatin structure, cofactors, etc. That's why I don't like the word "spurious" to refer to those, while, once again, I stress that I am not in the "If you see it, it's biologically meaningful" camp.
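
A back-of-the-envelope calculation illustrates why the motif alone cannot be the whole story (a rough sketch; the 6 bp motif length and ~3.2 Gb genome size are illustrative assumptions, not numbers from this thread):

```python
# Rough information-content argument: a short motif cannot by itself specify
# which genomic sites are occupied. Illustrative numbers only (6 bp motif,
# ~3.2 Gb genome); the shortfall must be made up by chromatin, cofactors, etc.
import math

genome_size = 3.2e9
motif_bits = 6 * 2                                      # at most 2 bits per fully conserved position
bits_to_address_one_site = math.log2(2 * genome_size)   # ~32.6 bits to pick one site on either strand

print(motif_bits, round(bits_to_address_one_site, 1))   # 12 vs 32.6
```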

When I said 20K sites for GTFs, I meant the TBPs and TFIIs. Each gene has a promoter after all.

Joe Felsenstein said...

Your update still uses "dark matter" to signify junk DNA. See my terminological complaint above. You're accepting Pugh and Venters' mistaken terminology.

Larry Moran said...

Understood. I changed it.