Sandwalk: Transcription Initiation Sites: Do You Think This Is Reasonable?

Friday, September 27, 2013

Transcription Initiation Sites: Do You Think This Is Reasonable?

I'm interested in how scientists read the scientific literature and in how they distinguish good science from bad science. I know that when I read a paper I usually make a pretty quick judgement based on my knowledge of the field and my model of how things work. In other words, I look at the conclusions first to see whether they conflict with or agree with my model.

Many of my colleagues do it differently. They focus on the actual experiments and reach a conclusion based on how they perceive the data. If the experiments look good and the data seems reliable then they tentatively accept the conclusions even if they conflict with the model they have in their mind. They are much more likely to revamp their model than I am.

I'm about to give you the conclusions from a recently published paper in Nature. I'd like to hear from all graduate students, postdocs, and scientists on how you react to those conclusions. Do you think the conclusions are reasonable (as long as the experiments are valid) or do you think that the conclusions are unreasonable, indicating that there has to be something wrong somewhere?

The paper is Venters and Pugh (2013). Its title is Genomic organization of human transcription initiation complexes. You don't need to read the paper unless you want to get into a more detailed debate. All I want to hear about is your initial reaction to their final two paragraphs.

Consolidated genomic view of initiation

...The discovery that transcription of the human genome is vastly more pervasive than what produces coding mRNA raises the question as to whether Pol II initiates transcription promiscuously through random collisions with chromatin as biological noise or whether it arises specifically from canonical Pol II initiation complexes in a regulated manner. Our discovery of ~150,000 non-coding promoter initiation complexes in human K562 cells and more in other cell lines suggests that pervasive non-coding transcription is promoter-specific, regulated, and not much different from coding transcription, except that it remains nuclear and non-polyadenylated. An important next question is the extent to which transcription factors regulate production of ncRNA.

We detected promoter transcription initiation complexes at 25% of all ~24,000 human coding genes, and found that there were 18-fold more non-coding complexes than coding. We therefore estimate that the human genome potentially contains as many as 500,000 promoter initiation complexes, corresponding to an average of about one every 3 kilobases (kb) in the non-repetitive portion of the human genome. This number may vary more or less depending on what constitutes a meaningful transcription initiation event. The finding that these initiation complexes are largely limited to locations having well-defined core promoters and measured TSSs indicates that they are functional and specific, but it remains to be determined to what end. Their massive numbers would seem to provide an origin for the so-called dark matter RNA of the genome, and could house a substantial portion of the missing heritability.

Looking forward to hearing from you.

Keep in mind that this is a Nature paper that has been rigorously reviewed by leading experts in the field. Does that influence your opinion?

Venters, B.J. and Pugh, B.F. (2013) Genomic organization of human transcription initiation complexes. Nature Published online 18 September 2013 [doi: 10.1038/nature12535] [PubMed] [Nature]

21 comments :

Matt G said...: I was wondering when you'd get around to this paper!

My first thought was: they seem to assume that because these initiation sites are there, they must have a purpose. It has an almost IDC feel to it.; Friday, September 27, 2013 9:23:00 AM
Bryan said...: Messed up my comment - what I meant to write was:

My knowledge of transcriptional start sites is limited, so this may reflect my ignorance more than anything, but from what I recall:

1) High-level transcription generally requires a large number of sequential (ignoring trans-acting elements) transcription factor binding sites, and
2) Individual transcription factor binding sites are often small and have degenerate sequences

It seems like no stretch to me (assuming my memory served me correctly about the above facts) that random mutations should occasionally produce DNA sequences in our junk that match these transcription factor binding sites. This would account for a) the high number of non-promoter initiating complexes observed in this paper, and b) the "over-abundance" of pervasive, low-level transcriptions we observe.

This part bugs me: and could house a substantial portion of the missing heritability

This seems to go back to the idea that the 18-24K genes we have aren't "enough" to make a human. Basically, a bit of hubris that doesn't pass the onion test...; Friday, September 27, 2013 10:20:00 AM
Georgi Marinov said...: If you just do regular TBP ChIP-seq you get nowhere near 150,000 peaks.

1. wget http://hgdownload-test.cse.ucsc.edu/goldenPath/hg19/encodeDCC/wgEncodeAwgTfbsUniform/wgEncodeAwgTfbsSydhK562TbpIggmusUniPk.narrowPeak.gz
2. gunzip -c wgEncodeAwgTfbsSydhK562TbpIggmusUniPk.narrowPeak.gz | wc -l
17558

18K reproducible TBP ChIP-seq sites in K562.

Now, I haven't had time to compare the TBP ChIP-seq and TBP ChIP-exo data in detail, but my alarm bells went off when I read the paper last week - I have a strong suspicion they drastically overcalled. Binding sites are always distributed on a continuum and you can get as many as you want depending on where you draw the threshold (which does not mean all of them will be real, that's why we do things like IDR). But ChIP-exo is relatively new and there is no well-established way to call ChIP-exo peaks. So the lack of discussion on how exactly those 150,000 sites were called is disturbing.

That said, the results of the paper overall are by no means surprising and by no means should be interpreted in the way the authors do (i.e. these things matter) - for all I know they show evidence for how the loose definition of TSSs makes it possible for many spurious initiation sites to exist in a large genome. That's why you probably need additional mechanisms for distinguishing true TSSs from the spurious ones. This paper from a few months ago and a few similar ones I can't immediately recall probably point in the direction of what's really making the system work:

Almada AE, Wu X, Kriz AJ, Burge CB, Sharp PA. 2013. Promoter directionality is controlled by U1 snRNP and polyadenylation signals. Nature. 2013 Jul 18;499(7458):360-3.; Friday, September 27, 2013 10:23:00 AM
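
The point above about thresholds can be made concrete with a small sketch. It assumes the standard ENCODE narrowPeak layout, in which column 7 holds signalValue, and it counts how many peaks survive at several cutoffs. The function names, the cutoff values, and the demo records are all hypothetical and made up for illustration; they are not taken from the ENCODE file or from the paper.

```python
import gzip
from typing import Dict, Iterable, Sequence


def count_peaks_at_thresholds(lines: Iterable[str],
                              thresholds: Sequence[float]) -> Dict[float, int]:
    """Count narrowPeak records whose signalValue (column 7) is >= each threshold."""
    counts = {t: 0 for t in thresholds}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 7:
            continue  # skip blank or malformed lines
        signal = float(fields[6])
        for t in thresholds:
            if signal >= t:
                counts[t] += 1
    return counts


def count_peaks_in_file(path: str, thresholds: Sequence[float]) -> Dict[float, int]:
    """Same count, reading a gzipped narrowPeak file from disk."""
    with gzip.open(path, "rt") as handle:
        return count_peaks_at_thresholds(handle, thresholds)


# Hypothetical records, invented for this example (not real ENCODE data).
demo_records = [
    "chr1\t100\t300\t.\t1000\t.\t55.0\t-1\t-1\t100",
    "chr1\t900\t1100\t.\t800\t.\t21.5\t-1\t-1\t90",
    "chr2\t500\t700\t.\t400\t.\t8.2\t-1\t-1\t110",
    "chr2\t2000\t2200\t.\t300\t.\t5.1\t-1\t-1\t95",
    "chr3\t40\t240\t.\t100\t.\t2.0\t-1\t-1\t80",
]

if __name__ == "__main__":
    # The peak count drops as the cutoff rises; the data never change.
    print(count_peaks_at_thresholds(demo_records, [5.0, 20.0, 50.0]))
```

Running `count_peaks_in_file` on the downloaded file with a few cutoffs would show how steeply the site count depends on the chosen threshold, which is the quantity a peak-calling description needs to pin down.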
Claudio said...: Those two paragraphs are nothing bad. Consider the press release for the paper instead: http://www.genengnews.com/gen-news-highlights/origins-of-genomic-dark-matter-uncovered/81248868/; Friday, September 27, 2013 11:05:00 AM
Blas said...: "I know that when I read a paper I usually make a pretty quick judgement based on my knowledge of the field and my model of how things work. In other words, I look at the conclusions first to see whether they conflict with or agree with my model. "

So instead of the data, "your model" is the measure of reality.; Friday, September 27, 2013 11:41:00 AM
Unknown said...: A recent paper from Michael Eisen's lab should prove an antidote to the one you've discussed:

Paris et al. (2013). Extensive Divergence of Transcription Factor Binding in Drosophila Embryos with Highly Conserved Gene Expression. http://dx.plos.org/10.1371/journal.pgen.10037

The authors examined the evolutionary conservation of transcription factor binding sites across four Drosophila species. They found mRNA levels were remarkably well conserved, but transcription factor binding was not. Their conclusion:

"That two thirds of the regions bound by any of the four TFs we examined are poorly conserved and thus are probably under weak or no purifying selection supports the emerging view that a large fraction of measurable biochemical events are not functional (in contrast to claims made by ENCODE [48])."; Friday, September 27, 2013 12:01:00 PM
Larry Moran said...: Yes. That's exactly what I'm saying. My "model" is based on decades of previous work. I also know from experience that "data" isn't always what it seems.

Why didn't you answer the question?; Friday, September 27, 2013 12:07:00 PM
judmarc said...: Nice, Jonathan. Almost makes one think that nothing in biology makes sense except in the light of evolution - oh, wait.... ;-)

(Hat tip to T. Dobzhansky.); Friday, September 27, 2013 12:45:00 PM
Anonymous said...: So last question first: The fact that it's Nature does influence my opinion. I'm more likely to accept it.
I could swear I saw this paper 2-3 weeks ago and read the abstract. My impression was that they made a good case for more pervasive functional transcription. I didn't look at the details but I thought that rather than look at individual binding of proteins, which could be nonspecific, they looked for correlations that would indicate functionality beyond a certain expected level. That seems reasonable to me....and it IS Nature of course.
I skimmed through it to see if it supported the "80% functionality" claim of ENCODE and though I missed the part you quote above I got the impression they've only accounted for a small fraction of the genome.
Disclaimer: I was a grad student 13 years ago; Friday, September 27, 2013 12:52:00 PM
Georgi Marinov said...: That paper cannot support or reject an 80% functionality claim (which wasn't even really made in the ENCODE paper, at least in the sense that most people understand functionality; it was all the writing and interviews about the ENCODE paper that did the damage).

It's a TBP ChIP-exo paper. The ENCODE claim is based on vastly more data and of many other different kinds.

Finally, there is no way that paper could be about "functional" pervasive transcription because there is really almost nothing about function in it to begin with.; Friday, September 27, 2013 12:57:00 PM
John Harshman said...: Well, the way you say it does make it sound like a bad thing. What you're really doing is being a good Bayesian. If your prior probability is low, it takes a lot to raise the posterior probability to any reasonable level. Or, to put it another way, if we have good reasons to believe X is not true, it takes better reasons before we should believe X is true. Or, still another way, extraordinary claims demand extraordinary proof.; Friday, September 27, 2013 1:31:00 PM
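
The Bayesian point above can be sketched numerically. This is only an illustration: the priors and the likelihood ratio below are invented, and `posterior` is a hypothetical helper name. It applies Bayes' rule in odds form, where posterior odds equal prior odds times the likelihood ratio of the evidence.

```python
def posterior(prior: float, likelihood_ratio: float) -> float:
    """Posterior probability from a prior and a likelihood ratio (Bayes' rule, odds form)."""
    prior_odds = prior / (1.0 - prior)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1.0 + posterior_odds)


if __name__ == "__main__":
    # Same evidence (assumed 10x more likely if the claim is true), two different priors.
    for prior in (0.5, 0.01):
        print(f"prior={prior:.2f} -> posterior={posterior(prior, 10.0):.3f}")
```

With the same evidence, a reader who starts at even odds ends up fairly confident, while a reader who starts with a low prior remains doubtful; that is the sense in which a low prior "takes a lot" to overcome.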
John Harshman said...: Really? The fact that it's in Nature tends to make me *less* likely to accept it. They seem too often to be going for sensationalism at the expense of science.; Friday, September 27, 2013 1:34:00 PM
Mikkel Rumraket Rasmussen said...: Yes, exactly. When you have accumulated a large body of knowledge that all implies that a specific model is accurate, it takes more than a single observation to overturn the whole thing again. We need to make sure the new observation cannot be accounted for under the previous model. This is correct empirical reasoning. The large body of knowledge previously collected justifies remaining skeptical of new contradictory observations until they can be independently verified as truly constituting contradictory evidence.

As you say, extraordinary claims require extraordinary evidence. What determines the "extraordinarity" of the claim is whether it contradicts already extremely well established facts.
In this respect, a 'mundane' claim is one that is already compatible with, or supports the established facts. Such a claim would only require mundane evidence.

Creationists emphatically do NOT understand Bayesian empirical reasoning, which is why they also make horrible skeptics and have so many general issues with science.

In my opinion, Bayesian reasoning as a basis for all empirical reasoning (and how and why expressions like "extraordinary claims require extraordinary evidence" are so important, and what this really means both in theory and in practice) and as a rigorous approach to skepticism should be taught already in primary school and should most definitely be part of any and all critical thinking courses.; Friday, September 27, 2013 1:54:00 PM
Anonymous said...: OK, well I thought that finding transcription initiation complexes is a bit more suggestive of function than individual protein binding, most of which is probably nonspecific. It suggests that more locations could be described as functional than are currently annotated, but far short of the ENCODE hyperbole and far short of what the DI would hope for.
I knew the Nature comment wouldn't go over well but I decided to be honest. I do trust them more, despite the fact that either Nature or Science has published stuff on: cold fusion, martian meteorite bacteria, arsenic containing bacteria.....and I think I remember some homeopathy crap being published in the early 80s.
But wasn't this less about the paper and more of a 'sociology of science' survey by LarryM?; Friday, September 27, 2013 2:50:00 PM
SPARC said...: I am not a specialist in sequence comparisons but I found the consensus sequences suspicious. They used the following sequences

TATA-Box: TATAWAWR
BREu: SSRCGCC
BREd: RTDKKKK
INR: YYANWYY

In addition, they allowed for up to 3 mismatches per site.
I may be wrong but I guess that this means that one can find a TATA-box like sequence every 128 bps. Through defining spacing between the sites they added some further constraints, though.
However, IIRC it takes even less to obtain basal transcription. E.g., there are TATA-less promoters with or without initiators (INRs); OTOH, INRs are dispensable from TATA containing promoters. Just combine a single SP1 site in either orientation with a TATA-box or an INR and you will obtain some transcription (I don't have time to re-read the paper but I guess O'Shea-Greenfield A and Smale ST 1992 had some data on this). BTW, SP1 is quite redundant as well: It's just a string of a few G interrupted by a single H, preferably a C. In addition, one should keep in mind that SP1 can be replaced by other TF binding sites. Thus, why should anybody be surprised by spurious background transcription?; Friday, September 27, 2013 4:09:00 PM
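
The frequency guess in the comment above can be checked with a short calculation. The sketch below makes strong simplifying assumptions that are not from the paper: a single strand, independent positions, equal frequencies for A, C, G and T, and mismatches permitted at any position of the motif. The function name `match_probability` is hypothetical. Under those assumptions, TATAWAWR matches exactly at about 1 position in 8,192, and with up to three mismatches at about 9% of positions (roughly one in 11). Real genomes are not uniform, and the paper's actual matching rule may have been stricter, so these figures are orders of magnitude only.

```python
# IUPAC nucleotide codes mapped to the bases each code accepts.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def match_probability(motif: str, max_mismatches: int) -> float:
    """Probability that a random position matches `motif` with at most
    `max_mismatches` mismatches, assuming uniform, independent bases."""
    # dist[k] = probability of exactly k mismatches over the positions seen so far
    dist = [1.0]
    for code in motif:
        p_match = len(IUPAC[code]) / 4.0
        updated = [0.0] * (len(dist) + 1)
        for k, p in enumerate(dist):
            updated[k] += p * p_match              # this position matches
            updated[k + 1] += p * (1.0 - p_match)  # this position mismatches
        dist = updated
    return sum(dist[: max_mismatches + 1])


if __name__ == "__main__":
    for motif in ("TATAWAWR", "SSRCGCC", "RTDKKKK", "YYANWYY"):
        exact = match_probability(motif, 0)
        loose = match_probability(motif, 3)
        print(f"{motif}: exact 1 in {1 / exact:.0f}, <=3 mismatches 1 in {1 / loose:.1f}")
```

The design choice here is a small dynamic program over the number of mismatches, which handles motifs whose positions have different match probabilities without enumerating sequences.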
Anonymous said...: Biological slop is cell type specific, who cares? When I spill two buckets of paint, each will produce different patterns, but comparing those two patterns doesn't tell me much about the buckets of paint. Unfortunately, I see crap like this get forwarded around among teachers.; Friday, September 27, 2013 7:15:00 PM
Peter said...: My impression is that the very first sentence sets up a false dichotomy. I presume the rest of the paper proceeds to pound the straw man into the ground.

The discovery that transcription of the human genome is vastly more pervasive than what produces coding mRNA raises the question as to whether Pol II initiates transcription promiscuously through random collisions with chromatin as biological noise or whether it arises specifically from canonical Pol II initiation complexes in a regulated manner.

My mental model for biological noise does not involve Pol II binding at random to DNA and initiating transcription. Rather, it assumes that in an extremely long sequence of random nucleotides, there will be a large number of sites that coincidentally resemble canonical Pol II initiation sites and thus acquire many or all of the same modifications as "true" promoter sites. Obviously, therefore, they will show up in ChIP experiments as more or less indistinguishable from the promoters of protein-coding genes.

This experiment therefore adds almost nothing to the sum of human knowledge. Potentially a close study of all the identified transcription start sites would be able to refine our understanding of the minimal sequence elements necessary to create a TSS. It tells you nothing whatsoever about whether the transcripts are functionally relevant to the cell.; Saturday, September 28, 2013 5:54:00 PM
whimple said...: It's interesting to hear the authors' opinions. Since the conclusion one way or the other isn't central to either my expertise or to how I design my next experiment, I'm perfectly happy to let the field sort it all out with data, check back in on the evolving story from time to time and get the final word from the field in about 10 years time or so.

Don't confuse a paper in Nature for a chapter in a textbook. Part of the value of these types of provocative (to those in the field) concluding statements is to spur the field on to reproduce/extend/refute the results, which is the essence of the scientific process.; Saturday, September 28, 2013 6:24:00 PM
Unknown said...: With ChIP-exo, they're looking for peak pairs, which probably means that you can call many more peaks with the same significance threshold, right? Whether it's meaningful to do so is another question...

One analysis from this paper seems to be missing - how many occurrences of their extended core promoter element motif are in the genome? If it occurs much more frequently than a typical transcription factor motif, then maybe 150,000 peaks wouldn't be so surprising.; Monday, September 30, 2013 11:36:00 AM
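
The missing analysis asked for above, counting motif occurrences in a sequence, could be sketched roughly as follows. This is a naive single-strand scan, not the paper's method; `count_motif_hits` is a hypothetical name, and the mismatch allowance is a parameter rather than a value confirmed from the paper. A real analysis would also scan the reverse strand, handle the spacing constraints between elements, and run over the whole genome.

```python
# IUPAC nucleotide codes mapped to the bases each code accepts.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def count_motif_hits(sequence: str, motif: str, max_mismatches: int = 0) -> int:
    """Count windows of `sequence` that match the IUPAC `motif`
    with at most `max_mismatches` mismatching positions (forward strand only)."""
    seq = sequence.upper()
    width = len(motif)
    hits = 0
    for start in range(len(seq) - width + 1):
        mismatches = 0
        for base, code in zip(seq[start:start + width], motif):
            if base not in IUPAC[code]:
                mismatches += 1
                if mismatches > max_mismatches:
                    break  # too many mismatches, abandon this window
        else:
            hits += 1  # loop finished without exceeding the allowance
    return hits


if __name__ == "__main__":
    toy = "GGTATAAAAGGG"  # made-up sequence containing one TATA-like site
    print(count_motif_hits(toy, "TATAWAWR", 0))
    print(count_motif_hits(toy, "TATAWAWR", 3))
```

Comparing such counts for the core promoter motifs against counts for a typical transcription factor motif would show whether 150,000 peaks is a surprising number or simply what loose motifs yield in a large genome.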
Georgi Marinov said...: As I said, I have not yet looked at the data. But there is the basic observation that you don't see nearly as many in regular ChIP-seq. That means that the order of magnitude difference is due to the ability to detect low-signal interactions that fall below the threshold of ChIP-seq. Which is not a bad thing on its own but it raises the question of how frequent and important they are.; Monday, September 30, 2013 11:44:00 AM
Unknown said...: I guess you were right in having doubts about this paper: it just got retracted this month - http://www.nature.com/nature/journal/v502/n7469/full/nature12535.html; Monday, July 28, 2014 6:40:00 AM
