tag:blogger.com,1999:blog-37148773.post6732039959575869411..comments2024-03-27T14:50:47.345-04:00Comments on <center>Sandwalk</center>: The "duon" delusion and why transcription factors MUST bind non-functionally to exon sequencesLarry Moranhttp://www.blogger.com/profile/05756598746605455848noreply@blogger.comBlogger22125tag:blogger.com,1999:blog-37148773.post-79416260506867100602014-04-12T21:11:31.093-04:002014-04-12T21:11:31.093-04:00I think the problem with the paper can be summariz...I think the problem with the paper can be summarized by noting that they did not show that TFs actually bind to these duons and consequently didn't show such binding had any effect on gene expression. They only showed that TF binding sites exist inside coding regions of genes, which is not news by any stretch as mentioned repeatedly above. <br /><br />A ChIP-Seq experiment involves pulling down a particular factor and sequencing the DNA that comes along with it, then mapping it to the genome. If they had done that and showed that significant peaks of ChIP-Seq data overlap the TF recognition site within an exon, they will have shown that the TF binds to this intra-exonic region, but it would still say nothing about the effects of this binding on the expression of the gene. To show this, they could split the sample such that some of it is analyzed for expression and then compare the two data sets.<br /><br />Personally, I wouldn't dismiss the possibility of seeing misregulation of gene expression based on TF binding to intra-exonic regions, but this just hasn't been shown yet, certainly not in the ENCODE paper. <br /><br />Since they did none of the experiments I proposed or other similar ones to show actual binding and/or misregulation of gene expression as a result of such binding, I completely agree with the criticism above, but I will say that I don't think it's fair to criticize ENCODE as a whole for the far-reaching and unjustified conclusions of the paper. 
After all, ENCODE is an enormous project with so many researchers, they can't be held accountable for the works of individual labs or even several labs, least of all for conclusions made in a paper published by one of these labs, just because the authors also happen to be involved in other ENCODE projects. The reviewers of Science deserve most of the criticism for this misleading article and the even more misleading hype around it. Ronahttps://www.blogger.com/profile/11788871697392812828noreply@blogger.comtag:blogger.com,1999:blog-37148773.post-7179448107465527822014-01-10T21:28:57.958-05:002014-01-10T21:28:57.958-05:00I'm happy to concede that the data is correct....I'm happy to concede that the data is correct. It's the interpretation that's wrong.<br /><br />Genome-wide analyses may uncover something new but when authors claim that their genome-wide experiments are showing something that four decades of work with individual genes never showed, I have a right to be skeptical. Larry Moranhttps://www.blogger.com/profile/05756598746605455848noreply@blogger.comtag:blogger.com,1999:blog-37148773.post-71742427755757006792014-01-10T20:39:53.657-05:002014-01-10T20:39:53.657-05:00Larry Moran:
"This corresponds to 1.8% of the...Larry Moran:<br />"This corresponds to 1.8% of the total. In other words, 98.2% of the binding sites were in noncoding DNA and 1.8% were in coding DNA"<br /><br />And yet we distinguish between DNA sequence: exon vs. intron. What's the difference? <br /><br />Larry Moran<br />"Undergraduates learn it in introductory courses. It's not a big deal and nobody has tried to make up a new word (like "duon") to describe this dual function."<br /><br />You always seem to overlook that the main advancement made with these genome-wide analyses is one of extent. Yes, we've known about TFs in exons but now we're getting at an accurate sense of scale and pervasiveness across the genome. And just because the analysis and methods used in these papers are over your head doesn't mean that they're wrong. caynazzohttps://www.blogger.com/profile/11263280738905977688noreply@blogger.comtag:blogger.com,1999:blog-37148773.post-54134485832448070872014-01-10T00:16:36.079-05:002014-01-10T00:16:36.079-05:00Just out of curiosity: How reliable a method is ge...Just out of curiosity: How reliable a method is genomic footprinting considered nowadays? Figure S3 in the supplement suggests that DNAseq yields clear signals with full protection of the occupied TF binding sites. However, I wonder if full signals are displayed or if some lower part of the signal has been omitted? From what I remember from those days when genomic footprinting was done by running genomic DNA on and blotting from sequencing gels and hybridisation with radioactive single strand probes, and from the later established PCR-based methods, genomic footprinting is quite a fuzzy business. Even in in vitro footprinting, DNAse I treatment is critical because the result heavily depends on the concentration of the enzyme and the duration of the treatment. I.e.
one will either miss binding sites due to over-digestion or consider DNAse I-resistant sequences as being bound by a TF.SPARChttps://www.blogger.com/profile/09563722742249547887noreply@blogger.comtag:blogger.com,1999:blog-37148773.post-74954362483307113272014-01-09T17:21:29.132-05:002014-01-09T17:21:29.132-05:00Where by "binding sites" I mean "oc...Where by "binding sites" I mean "occupied binding sites", I always use those interchangeably and it's wrong :(Georgi Marinovhttps://www.blogger.com/profile/12226357993389417752noreply@blogger.comtag:blogger.com,1999:blog-37148773.post-63075552027476858922014-01-09T17:20:30.876-05:002014-01-09T17:20:30.876-05:00The expectation would also be that you would see m...The expectation would also be that you would see more binding sites in intronic sequence in highly expressed genes. I don't know if anyone has looked at that. Of course, housekeeping genes have fewer and shorter introns so one has to correct for that too. It's something to investigate further. Georgi Marinovhttps://www.blogger.com/profile/12226357993389417752noreply@blogger.comtag:blogger.com,1999:blog-37148773.post-1270441400074223942014-01-09T17:18:53.157-05:002014-01-09T17:18:53.157-05:00The "neutral" model is that TFs bi...The "neutral" model is that TFs bind to DNA because they can, not because they are necessarily needed. The expectation would then be that you would see more binding sites in coding sequence in highly expressed genes because highly expressed genes have a more open chromatin structure (because they are transcribed more often) and more binding sites are accessible. It does not have to be a very strong effect though, it's just a general positive correlation you expect to see.
Thus my post above.Georgi Marinovhttps://www.blogger.com/profile/12226357993389417752noreply@blogger.comtag:blogger.com,1999:blog-37148773.post-16107153384768833872014-01-09T17:11:32.771-05:002014-01-09T17:11:32.771-05:00Thanks for the reply Georgi, but the plots you poi...Thanks for the reply Georgi, but the plots you pointed to are a bit underwhelming.<br /><br /><i>highly expressed genes should have more footprints under the ``neutral'' model.</i><br /><br />I'm not sure I understand that.<br /><br />We refer to Fig. S4 here: http://www.sciencemag.org/content/suppl/2013/12/11/342.6164.1367.DC1/Stergachis-SM.pdf<br /><br />Fig S4-C is really underwhelming. It shows average gene expression for exonic regions in three categories: having 0, 1-4, and 5+ TF binding sites. The boxes are at nearly the same height, and have big error bars. I worry that the binning values could have been chosen to make the effect seem larger. <br /><br />S4-C shows a very weak correlation such that more TF binding sites corresponds to slightly more gene expression, which according to Georgi rejects or weakly supports the "neutral" model.<br /><br />S4-D is a table of pearson's r-values, and they're weak, in the range 0.15-0.22. Again, rejects or weakly supports the "neutral" model.<br /><br />I'm not sure I get the principle, though. Do you mean that, if more TF binding were only due to, say, coding genes being in "open" regions of the genome (not bound to chromatin), then we should expect their expression level to be higher?Diogeneshttps://www.blogger.com/profile/15551943619872944637noreply@blogger.comtag:blogger.com,1999:blog-37148773.post-21897430396124216512014-01-08T20:17:48.116-05:002014-01-08T20:17:48.116-05:001) Undergraduates are sometimes taught about the d...1) Undergraduates are <b>sometimes</b> taught about the details of Pol III transcription. It's by no means universally taught everywhere - I personally was never taught that.<br /><br />2) I can't comment on who knows what. 
First, I don't personally know the authors of that paper at all. Second, you have to spend a really large amount of time with someone to really get an idea what they know and what they don't. Very few people spend that much time together.<br /><br />3) In general, if you want my bleak, cynical, pessimistic view of life, completely unrelated to the subject here, one does not actually need to know much about anything to produce papers in biology today. You only need to know enough to do the research and write it up. The system certainly does not force you to learn much and there are no checks that even whatever little is learned is retained long-term. Unless you invest a great deal of effort on your own, most of it completely unrelated to your research, you leave graduate school knowing a lot less than you did when you entered it, because you will have forgotten a lot of what you learned as an undergrad (I have noticed this many times with myself, to my great dissatisfaction - things I had completely mastered years ago but have never touched since then and as a result I only have a vague recollection right now) and you will have only learned things in one narrow area. And you can actually be very successful following this model, while trying to keep up with the literature on a wide variety of topics and broadening your horizons by exploring other fields can actually hurt you because it's time not spent directly doing research. Note that when I say this, I do not have anyone specific in mind, it's just how the system is set up. Georgi Marinovhttps://www.blogger.com/profile/12226357993389417752noreply@blogger.comtag:blogger.com,1999:blog-37148773.post-54261377390818923582014-01-08T17:25:08.594-05:002014-01-08T17:25:08.594-05:00Top 10 alternate meanings for the ENCODE acronym:
...Top 10 alternate meanings for the ENCODE acronym:<br /><br />#10 Enabling Numerous Claims Of Designer(ID) Efficiency<br /><br />#9 I can't actually think of any more. I suppose this is why I'm not a comedy writerAnonymousnoreply@blogger.comtag:blogger.com,1999:blog-37148773.post-55408082413522006432014-01-08T16:56:01.037-05:002014-01-08T16:56:01.037-05:00In the paper, they do look at conservation too ......<i>In the paper, they do look at conservation too ...</i><br /><br />Hmmm .... they find that coding regions of genes are conserved. But they also seem to find that the transcription factor binding sites are "significantly younger;" whatever that means. I've read that section of the paper several times and tried very hard to understand Figure 1 but I don't get it. I guess I'm too stupid to appreciate the sophistication of their assays. <br /><br /><i> there is nothing too surprising about regulatory elements located in coding sequence </i><br /><br />Exactly, I was going to bring up Pol III promoters in my next post. We have known for over thirty years that there are transcription factor binding sites within the genes for transfer RNA and 5S RNA. Thus, these genes contain nucleotides that play a dual role: they determine part of the functional region of their RNA products AND they are the sites of transcription factor binding. <br /><br />This is in all of the textbooks. Undergraduates learn it in introductory courses. It's not a big deal and nobody has tried to make up a new word (like "duon") to describe this dual function. (And in this case it really is dual function, not just speculation.) <br /><br />Georgi, how many of the authors on this paper know about Pol III transcription? Do you think all of them do? Could you have written that paper without mentioning that internal promoters are old hat?<br /><br />I don't think I'm being unfair when I criticize ENCODE workers. That doesn't mean that ALL of you are ignorant.
:-)<br />Larry Moranhttps://www.blogger.com/profile/05756598746605455848noreply@blogger.comtag:blogger.com,1999:blog-37148773.post-42972982651217807732014-01-08T16:48:46.561-05:002014-01-08T16:48:46.561-05:00It has been checked but not exactly in the way you...It has been checked but not exactly in the way you want it. People have been expressing exogenous DNA binding proteins in eukaryotic cells for a very long time, and they do bind to DNA. It's just that nobody has expressed a bacterial TF and then done ChIP-seq on itGeorgi Marinovhttps://www.blogger.com/profile/12226357993389417752noreply@blogger.comtag:blogger.com,1999:blog-37148773.post-88389491876365810702014-01-08T16:21:10.760-05:002014-01-08T16:21:10.760-05:00Don't want to demonise anyone. Let me be more ...Don't want to demonise anyone. Let me be more precise: I find ChIP-seq work on TF binding and chromatin marks wonderful in the sense of opening hypothesis about how gene regulation is going on. What I don't like in ENCODE and similar works is the automatic link between binding event/chromatin modification and biological function. In this regard, observing a bacterial TF binding in thousands of places in a mammalian genome would illustrate that the binding/function link should not be done until one gets more evidence, and that's what I find would be difficult to reconcile with the conclusions of ENCODE-like papers. I don't think ENCODE people are being dishonest or anything, it's just an interesting scientific debate that needs to be done. About pioneer x "conventional" factors, it's true a bacterial TF should behave more like a "conventional" TF, binding previously open chromatin (including exons), it would be great to check this. 
Anonymoushttps://www.blogger.com/profile/03815336520462671330noreply@blogger.comtag:blogger.com,1999:blog-37148773.post-79511185909485231992014-01-08T15:54:49.246-05:002014-01-08T15:54:49.246-05:001) There's been an entirely unnecessary demoni...1) There's been an entirely unnecessary demonization of ENCODE people - they happen to be a lot more reasonable than what you might be led to believe by what goes on in the blogosphere. What I would classify as really egregious claims (because they are backed by a long history of repeatedly making them before that) have been coming from people who are not part of ENCODE. <br /><br />2) I don't see why anything would be difficult to explain. Current thinking is that you can separate TFs into two broad categories - pioneer factors that can bind to compact chromatin and open it, and others that preferentially bind to open chromatin. There is a flaw in this, which is that the pioneer factors are not much different in their sequence preferences from the others - it is still 6-8bp motifs and they do not open the chromatin everywhere those sequences are found, so there has to be another source of specificity, whether it is combinatorial occupancy or something else, but the distinction is useful for the discussion here. There is no histone code in bacteria because there are no histones, so it is unlikely that sigma32 would act as a pioneer factor. Therefore the expectation is that it will bind to regions of already open chromatin that contain the recognition sequence. Georgi Marinovhttps://www.blogger.com/profile/12226357993389417752noreply@blogger.comtag:blogger.com,1999:blog-37148773.post-78252025409076398462014-01-08T15:41:30.669-05:002014-01-08T15:41:30.669-05:00Ok, I agree that it would be cheaper with a cell l...Ok, I agree that it would be cheaper with a cell line, it's just that usually I don't find work with cell lines very reliable :) and transgenic mouse lines could be deemed more "physiological".
And yes, everyone knows the results will be "positive", perhaps that's why it's not done. Imagine if the binding pattern of sigma32 in neurons or pancreas were qualitatively similar to the endogenous binding of Otx2 or Ngn3? ENCODE people would find it quite difficult to explain.Anonymoushttps://www.blogger.com/profile/03815336520462671330noreply@blogger.comtag:blogger.com,1999:blog-37148773.post-7296101447695425492014-01-08T15:25:56.603-05:002014-01-08T15:25:56.603-05:00Everyone can do it - it's a <$2K experiment...Everyone can do it - it's a <$2K experiment. You don't have to make a transgenic mouse - just express a GFP-tagged version of it in a cell line and ChIP. <br /><br />It has indeed not been done and the reason it has not been done is that everyone knows you will get a positive result. This is the same principle that ZFN, TALEN and CRISPR-based genome engineering technologies use and they work quite well. Georgi Marinovhttps://www.blogger.com/profile/12226357993389417752noreply@blogger.comtag:blogger.com,1999:blog-37148773.post-76521931914061449842014-01-08T15:13:28.218-05:002014-01-08T15:13:28.218-05:00Typically, one does not calculate the expected num...Typically, one does not calculate the expected number of occurrences based on the k-mer length of the recognition sequence, but in bits based on PWMs (Position Weight Matrices). And you do not do this against a uniform base composition background model but against whatever the sequence composition of the genome you actually work with is. <br /><br />Restriction enzymes are actually different from eukaryote TFs in that respect - they have very strong sequence specificity even if within a short k-mer. All positions carry 2 bits of information in most cases. Transcription regulators differ too between bacteria and eukaryotes - bacterial ones tend to be a lot more specific than eukaryotic ones, which have very loose recognition preferences. Which is very interesting but is a long subject on its own.
<br /><br />But all of that does not really matter here - there is the distinction between transcribed vs non-transcribed genes that Larry mentioned in the OP, with transcribed genes being more accessible to DNA binding proteins. That would account for the overrepresentation. It is not easy to disentangle cause from effect here though - one can argue that TFs bind to these genes because chromatin is more accessible, but on the other hand, one does expect to see expressed genes being regulated by TFs (because expression in eukaryotes is generally positively driven as you have to overcome the chromatin barrier to start transcribing). <br /><br />There is one prediction that can be tested and that can address the functional vs non-functional question (as a statistical trend, not for each individual instance) and it is that highly expressed genes should have more footprints under the ``neutral'' model. They did look at this - Figure S4 here:<br /><br />http://www.sciencemag.org/content/suppl/2013/12/11/342.6164.1367.DC1/Stergachis-SM.pdf<br /><br />And they do find correlation between expression levels and the number of footprints.<br /><br />This can be further refined into housekeeping and non-housekeeping genes - generally housekeeping genes tend to be subject to less regulatory complexity than tissue-specific genes. They did not look at that and it would be interesting to parse this further.Georgi Marinovhttps://www.blogger.com/profile/12226357993389417752noreply@blogger.comtag:blogger.com,1999:blog-37148773.post-54592493543789122802014-01-08T14:59:40.088-05:002014-01-08T14:59:40.088-05:00Nice analogy with restriction enzymes. So I guess ...Nice analogy with restriction enzymes. So I guess we need someone to do the following experiment: make a transgenic mouse line that expresses a bacterial TF with no vertebrate counterpart (say, a sigma factor) and then do some ChIP-seq analyses. 
As you say, I bet they should find thousands of totally spurious binding sites wherever there is accessible chromatin, in a pattern resembling the binding of mammalian TFs. But who would have the money (and the courage) to do this? Anonymoushttps://www.blogger.com/profile/03815336520462671330noreply@blogger.comtag:blogger.com,1999:blog-37148773.post-30263939138575766422014-01-08T14:56:51.575-05:002014-01-08T14:56:51.575-05:00In the paper, they do look at conservation too, in...In the paper, they do look at conservation too, including at 4-fold degenerate sites, and that is a strong case for functionality of these binding sites. That said, there is nothing too surprising about regulatory elements located in coding sequence - enhancers located in introns are well known, some of the Pol3 promoters are specified by sequences quite downstream of the transcription start site, etc. etc. From the perspective of the transcription apparatus there is no coding vs noncoding sequence distinction - that is only made much further downstream by the translation machinery. <br /><br />The important thing here is that it was explicitly shown (well, OK, not quite, a lot of it is based on footprints and motifs plus some ChIP-seq, but is still strong evidence) that it is TFs that constrain some degenerate positions. Which is by no means a new idea but is still an important contribution.<br /><br />How the PR was handled is an entirely different subject...Georgi Marinovhttps://www.blogger.com/profile/12226357993389417752noreply@blogger.comtag:blogger.com,1999:blog-37148773.post-36780631380241608932014-01-08T14:29:02.354-05:002014-01-08T14:29:02.354-05:00The case that Stamatoyannopoulos and colleagues ma...The case that Stamatoyannopoulos and colleagues make in Science builds on ideas advanced by Tamar Schaap in 1971. They cite the many fine papers of Richard Grantham but, perhaps due to space pressure from the Editors, omit reference to Schaap’s seminal study. 
His paper in the Journal of Theoretical Biology (32, 293-298), and the work of Grantham et al., form the basis of webpages that deal extensively with the issues raised (e.g. Schaap http://post.queensu.ca/~forsdyke/bioinfo3.htm ). <br /><br />Weatheritt and Babu comment on the paper in the same issue of Science. They point out that the ‘duon’ notion refers only to two of the many forms of information that compete for genome space. As to whether these forms of information can “harmoniously exist,” there has long existed evidence on the intragenomic conflicts requiring the “possible tradeoffs” to which they refer. Much of this was dealt with in my textbook – Evolutionary Bioinformatics – which is now in its second edition (2011) - and in my contribution to Lewin’s GENES XI (2014).<br /><br />Consistent with the Stamatoyannopoulos thesis, very strong selection acting at synonymous coding positions is becoming more widely recognized, for example in HIV-1 (Mayrose et al. 2013; Forsdyke 2014) and in the fruit fly genome (Lawrie et al. 2013). <br /><br />Lawrie et al. (2013) Strong purifying selection at synonymous sites in D. melanogaster. PLOS Genetics 9 e1003527.<br />Mayrose et al. (2013) Synonymous site conservation in the HIV-1 genome. BMC Evolutionary Biology 13:164.<br />Forsdyke DR (2014) Microbes and Infection (in press; DOI : 10.1016/j.micinf.2013.10.017)<br />Donald Forsdykehttps://www.blogger.com/profile/18038104286639798795noreply@blogger.comtag:blogger.com,1999:blog-37148773.post-24308277256551661922014-01-08T14:27:22.224-05:002014-01-08T14:27:22.224-05:00Let's run a few numbers. If a TF recognizes 6 ...Let's run a few numbers. If a TF recognizes 6 bp, then it will occur at random every 4000 bp, as Larry pointed out.<br /><br />Let's assume these guys tested all coding exons. Say there are 20,000 coding genes [leaving out RNA genes] and the average is, say, 1500 bp. 
That's 30 million bps.<br /><br />If a TF recognizes 6 bp and would bind at random every 4000 bps, they should have found 30 million/4,000 = <b>7500 binding sites per TF they tested.</b><br /><br />In fact they found 24,842 TF binding sites, about 3.31 times as many as you would expect at random for one TF that recognizes 6 bps.<br /><br />So how many TFs did they test for?<br /><br />This result would be affected by: <br /><br />1. If they did not test all coding exon regions;<br /><br />2. If their TFs recognize more than 6 bps (in which case the number of random hits should go down).<br /><br />Anyone have stats on this? Georgi perhaps?<br />Diogeneshttps://www.blogger.com/profile/15551943619872944637noreply@blogger.comtag:blogger.com,1999:blog-37148773.post-49217141858116473372014-01-08T14:20:00.487-05:002014-01-08T14:20:00.487-05:00Although TF binding within exons may serve multipl...<i>Although TF binding within exons may serve multiple functional roles, our analyses above is agnostic to these roles</i><br /><br />So you have no evidence of functionality. IOW, you got squat.Diogeneshttps://www.blogger.com/profile/15551943619872944637noreply@blogger.com
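
The back-of-the-envelope estimate in Diogenes's "Let's run a few numbers" comment above can be written out as a short Python sketch. It is only an illustration of that arithmetic, not the paper's method: the function and variable names are made up here, the gene count (20,000) and average coding length (1,500 bp) are the rough figures from the comment, and the model assumes one fixed 6-bp motif on a single strand with all four bases equally likely. As noted elsewhere in the thread, real analyses score motifs with PWMs against the genome's actual base composition, so these numbers are a crude baseline at best.

```python
def expected_random_sites(motif_length: int, sequence_length: int) -> float:
    """Expected chance matches to one fixed motif of the given length,
    assuming independent, equally frequent bases (uniform background)
    and ignoring the reverse strand and edge effects."""
    return sequence_length / 4 ** motif_length


# Rough figures taken from the comment (assumptions, not measured values).
coding_genes = 20_000
average_coding_bp = 1_500
coding_bp = coding_genes * average_coding_bp      # total coding sequence

observed_sites = 24_842                           # count quoted in the thread

# The comment rounds 4**6 = 4096 down to 4000; compute both versions.
expected_rounded = coding_bp / 4_000
expected_exact = expected_random_sites(6, coding_bp)

fold_rounded = observed_sites / expected_rounded
fold_exact = observed_sites / expected_exact

if __name__ == "__main__":
    print(f"coding sequence: {coding_bp:,} bp")
    print(f"expected sites (1 per 4000 bp): {expected_rounded:,.0f}")
    print(f"expected sites (1 per 4096 bp): {expected_exact:,.0f}")
    print(f"observed/expected (rounded): {fold_rounded:.2f}")
    print(f"observed/expected (exact):   {fold_exact:.2f}")
```

Changing `motif_length` shows how quickly the chance expectation falls for longer recognition sequences, which is the second caveat listed in the comment.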