Thursday, September 13, 2012

Groupthink Science And That 'Junk DNA'

The IDiots (e.g. Tom Bethell) over at Evolution News & Views are gloating about a comment made on The Wall Street Journal website [Why ENCODE Is a Significant Defeat for Darwinism].

The WSJ article is: Groupthink Science And That 'Junk DNA'.
Anyone with even the slightest understanding of the evolutionary process knows that evolution is too relentlessly efficient to have allowed most, or even large sections, of DNA to be "junk" ("'Junk DNA' Theory Debunked," U.S. News, Sept. 6). Any intelligent scientist would have simply said, "I don't know."

Unfortunately, this says something important about the quality of contemporary Ph.Ds. Groupthink has become pervasive in part because of how research is now financed: grants. The disillusioning sociological aspects of scientific research that Thomas Kuhn identified more than four decades ago have become more pronounced, not less.

Tom Shillock

Portland, Ore.
This is exactly backwards, in my opinion. The real problem is that many scientists think, incorrectly, that natural selection would have removed all junk DNA so they are looking for reasons why it isn't junk. If they can't find evidence then they just make up a story or re-define the word "function." They don't have even the slightest understanding of evolution, just like Tom Shillock.

UPDATE: Shapiro and Sternberg Anticipated the Fall of Junk DNA.


61 comments :

Jud said...

This would be that beacon of intelligent science, the Wall Street Journal? The one that lacks any "sociological aspects" to its evaluations of scientific research on subjects such as oh, let me see, climate change?

So they've got no axe to grind here at all, do they?

Curious Wavefunction said...

I find it remarkable that even now, in the twenty first century, scientists are trying to find purpose in every part of the genome. Except that instead of God, they are looking for natural selection's fingerprints in every single DNA base.

T Ryan Gregory said...

In other words, "Darwinism" predicts no junk DNA.

konrad said...

The comment is based on what appears to be a widespread perception that most noncoding DNA was assumed to be nonfunctional based merely on absence of evidence for function. The idea that there can be _positive_ reasons for thinking specific segments of DNA are non-functional just doesn't seem to occur to many commenters (perhaps because to non-specialists it's not obvious how such reasons can exist at all). That's a key point that the field of molecular evolution has obviously not succeeded in communicating to the rest of the world.

Curious Wavefunction said...

I agree. From a positive evidence standpoint, the presence of junk DNA makes perfect evolutionary sense if you realize that evolution is a messy, inefficient, haphazard process.

Allan Miller said...

... and on bass, let's hear it for Thomas Kuhn ... (ba-dum-dum-dum-dum).

Love how creationists invoke Kuhn as some kind of Law - 'Everything We Know Is Wrong' - then further mangle him by effectively suggesting that the hoped-for 'revolution' is a return to a 150-year-old-paradigm.

Anyone with even the slightest understanding of the evolutionary process knows that evolution is too relentlessly efficient to have allowed most, or even large sections, of DNA to be "junk"

So ... some selection coefficients to back that up, perhaps?

Larry Moran said...

Exactly!

I didn't realize the hypocrisy until after I had posted. The IDiots have been telling us for years that Darwinism predicted junk DNA and when we finally discovered that there was no such thing it would refute Darwinism.

It's hard to square that with their latest post.

Shawn said...

Godism predicts no junk DNA/Darwinism predicts no junk DNA....one of these things is not like the other (nod to Sesame Street). Seems to me one of these views would be much more susceptible to evidence than the other. Naturally.

Joe Felsenstein said...

Anyone with even the slightest understanding of the evolutionary process knows that evolution is too relentlessly efficient to have allowed most, or even large sections, of DNA to be "junk"

I teach one of the few professional-level courses in theoretical population genetics in North America, and I don't know that evolution is that relentlessly efficient. As Allan said: show us the selection coefficients.

Psi Wavefunction said...

Sigh... this whole 'debate' has entered the same realm as creationism as far as I'm concerned, where the opposing side is not even wrong, and there's just no point in engaging them.

"Anyone with even the slightest understanding of the evolutionary process knows that evolution is too relentlessly efficient"
Thanks, random writer, glad you have so much more understanding of the evolutionary process than the bulk of people actively studying it. Wish I had so much 'confidence'.

The Thought Criminal said...

Shawn, I would guess that roughly 98% of religion has never expressed a single thing about "junk" DNA or DNA of any kind.

There was a time when what you just did would be called, informally, constructing a "straw man", to be knocked down for your own edification. But, since you are both setting it up and knocking it down, anyone who doesn't hold with it can rightly say it's got nothing to do with what they believe. But that was before pop atheists destroyed the usefulness of the term "straw man".

Shawn said...

Shawn, I would guess that roughly 98% of religion has never expressed a single thing about "junk" DNA or DNA of any kind.

heh heh. Sure, religion had to wait for DNA and its role in life to be discovered by science.

But anyway the comments above pertain to that view held by some, that something designed by god would be "perfect" and not have useless components. This is in the spirit of the OP after all, that an intelligently designed thing would not have junk. The statements were not meant to imply that all religions or all religious people share that view. It's not the "straw man" you are looking for.

The Thought Criminal said...

Back up your statement. If it covers the modern period that should be easy enough.

Anyone who believes God created the universe would hold that God created everything, including DNA, no matter what character it is believed to have at any point. As this week's discussion shows there isn't total unanimity on what that is among scientists who have the slightest idea what this argument is. I'd guess they're pretty much limited to those who specialize in the relevant topics.

In one of my most satisfying moments in a discussion on how soon physics would arrive at a theory of everything I got Sean Carroll to admit that there is not a single object in the universe for which physics knows comprehensively and exhaustively. Not any atom or molecule, crystal, .... not one. My point in getting him to admit that, and it was like trying to get blood from a stone, was to point out that expecting a theory of everything in the universe when there wasn't a theory of everything about a single subatomic particle seemed a bit premature. Given that, anyone who thinks biology, studying something as extremely complex as DNA, collectively, including many different kinds of DNA and in the context of complex cellular chemistry and biology, asserting anything as definite as that 80% of it is non-functioning "junk" is probably going a bit farther on a limb by a factor of many times over ten.

I'd advise talking a little less tall.

Anonymous said...

Shawn,

Please look carefully at TTC's comment before you try and answer. Notice that he did not actually answer to what you said, but to what he thinks you said. It is a waste of time to answer him. He just won't get it. He understands whatever he wants, and answers to that. When I say "whatever he wants to understand" I don't mean "of everything you say," I mean whatever he wants it to mean.

After a few "misunderstandings" he will tell you something about Darwin quote mining and advocating for eugenics, hating women, or something about PZ Myers, or something about materialists and logical positivists as if any of that had anything to do with whatever you say.

I think this is not out of malice, but out of utter stupidity. Mental incompetence. Others think it is trolling. Whatever.

sk said...

Larry, can you reconcile these two statements? John A. Stamatoyannopoulos, sitting next to Ewan, said

By scanning the genome in hundreds of different cell and tissue samples, we have annotated regulatory regions that -- when tallied together -- account for around 40% of the genome sequence.

while Ewan Birney a week ago wrote

using very strict, classical definitions of “functional” like bound motifs and DNaseI footprints; places where we are very confident that there is a specific DNA:protein contact, such as a transcription factor binding site to the actual bases – we see a cumulative occupation of 8% of the genome.

Georgi Marinov said...

These numbers depend on how you define things - it has a lot to do with the resolution of high-throughput sequencing-based functional genomics assays. A ChIP-seq peak is always much larger than the actual binding sites for the TF, a DNase hypersensitive region is larger than the actual meaningful sequences inside it.

The one assay that ENCODE has done that really gives you base-pair resolution is DGF (Digital Genomic Footprinting) and it was John Stam's group but I don't think that's what he is referring to - he is talking about regular DNase.

These are the papers you may want to read to see what the results from it are:

http://www.ncbi.nlm.nih.gov/pubmed/22959076
http://www.ncbi.nlm.nih.gov/pubmed/22955618

The 8% number is the relevant one

The Thought Criminal said...

Shorter Negative Entropy: I didn't like what he said.

You do know, don't you, that a question has to be asked before an answer is given to it, don't you, NE? I see no questions in Shawn's comments I addressed. Addressed, not answered. You atheist boys seem to miss some of these subtler aspects of basic rhetoric and literacy.

Claudiu Bandea said...

Joe: I don't know that evolution is that relentlessly efficient

That’s a very important point and, unfortunately, it has been rarely discussed in the published literature on the junk DNA (jDNA).

In the introduction of my hypothesis that jDNA functions as a sink for the integration of proviruses, transposons and other inserting elements, thereby protecting coding sequences from insertional inactivation or alteration of their expression (see the article at Larry’s post: A Tribute to Stephen Jay Gould), I pointed out that:

It is not known if secondary DNA has accumulated simply because its rate of deletion has been lower than that of origin, or because individuals possessing secondary DNA (i.e. jDNA) have a selective advantage (parenthesis added)

However, clearly, some organisms, such as bacteria, have very efficient mechanisms of discarding jDNA. Don’t you think that the organisms that have lots of jDNA would have maintained similar mechanisms if the jDNA did not provide some kind of advantage?

Larry Moran said...

The human genome is 3,200 Mb in size. If 8% is directly bound by transcription factors then this means that transcription factor binding sites account for 256 Mb of DNA. That's 256,000,000 base pairs.

Most binding sites are 6-8 bp in length but let's be generous and assume that all sites are sites of dimer interaction and they are the maximum size we've ever seen, say 25 bp in length.

That means 10,000,000 transcription factor binding sites in the genome.

We know for sure that there are about 25,000 genes consisting of 20,500 protein encoding genes and about 4,500 genes that make functional RNAs (tRNA, ribosomal RNAs, small RNAs, regulatory RNAs etc). Let's be generous and assume that there are another 5,000 undiscovered regulatory RNAs out there for a total of 30,000 genes.

What is the implication? It means that if ENCODE is right there are, on average, 333 transcription factor binding sites regulating every gene. The regulation of hundreds of mammalian genes has been studied intensely over the past three decades. I don't know of a single example where the known regulatory sites even begin to approach this number.

The second paper you referenced says that there are 45,000,000 "occupancy events." Those workers claim that this represents 8,400,000 binding sites for an average of 280 per gene. The total amount of DNA in those sites would correspond to about 84,000,000 bp of DNA if we use a more realistic value of 10 bp per binding site. That's only about 3% of the genome and it's a lot less than the 8% that Birney quotes.

Do the majority of ENCODE PIs actually believe that it takes about 300 transcription factor binding sites to regulate transcription of a typical gene?

Georgi, in your group meetings has anyone ever done a back-of-the-envelope calculation like this so you can discuss the implications of the claims being made on behalf of the consortium?
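The back-of-the-envelope numbers in this comment can be rechecked with a few lines of Python. This is only a sketch of the arithmetic as stated above: the variable and function names are made up for illustration, and the inputs (3,200 Mb genome, 8% bound, 25 bp per site, 30,000 genes) are the comment's own assumptions rather than measured values. Without rounding the site count down to 10,000,000, the first estimate comes out slightly above 333 sites per gene.

```python
# Back-of-the-envelope check of the binding-site arithmetic in the comment.
GENOME_BP = 3_200_000_000   # human genome size used in the comment (3,200 Mb)
PERCENT_BOUND = 8           # the 8% figure attributed to Birney
MAX_SITE_BP = 25            # generous upper bound on binding-site length
GENES = 30_000              # 25,000 known genes + 5,000 hypothetical extras


def sites_per_gene(genome_bp, percent_bound, site_bp, genes):
    """Return (bound bp, implied number of sites, implied sites per gene)."""
    bound_bp = genome_bp * percent_bound // 100
    n_sites = bound_bp // site_bp
    return bound_bp, n_sites, n_sites / genes


bound_bp, n_sites, per_gene = sites_per_gene(
    GENOME_BP, PERCENT_BOUND, MAX_SITE_BP, GENES
)
print(f"{bound_bp:,} bp bound -> {n_sites:,} sites -> {per_gene:.0f} sites per gene")

# Second estimate: 8.4 million footprinted sites at a more realistic 10 bp each.
FOOTPRINT_SITES = 8_400_000
REALISTIC_SITE_BP = 10
footprint_bp = FOOTPRINT_SITES * REALISTIC_SITE_BP
footprint_pct = 100 * footprint_bp / GENOME_BP
print(f"{FOOTPRINT_SITES / GENES:.0f} sites per gene, "
      f"{footprint_bp:,} bp, {footprint_pct:.1f}% of the genome")
```

Changing `MAX_SITE_BP` or `GENES` shows how sensitive the "sites per gene" figure is to those two guesses.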

Georgi Marinov said...

Eukaryotes have very elaborate genome defense mechanisms (piRNA for example)

Georgi Marinov said...

Those are binding events - it does not mean they are functional. The biggest challenge in the field is to model the relationship between TF binding and gene expression and incorporate that into predictive models. However, one thing that has become abundantly clear (and we've struggled a lot with) is that whatever way you look at it, there is pretty much no correlation between those things - you see activators bound to genes that don't do anything as a result, repressors on active genes, etc. Based on those observations one is forced to accept that many of these sites have to be irrelevant - but which are the ones that matter is an open question, and sometimes it is some of the strongest ChIP peaks that fall in the "mysterious" category of sites either in the middle of nowhere or where they should not be. Probably a TF binds if it can - the question is why it can bind to some places and not to others given how degenerate most motifs are, and why it has effect on some sites and most likely no effect on others.

In the Cell paper they make the assumption that binding within 10kb of the TSS means regulation and they exclude everything else - that's not a valid assumption and they themselves admit that at the very end but they have to make it if they are to be able to do the network analysis (I myself wouldn't pick 10kb as the value for that parameter - it seems too loose, but it is arbitrary any way you do it). Other than that what one can claim is occupancy, not necessarily function. That data is fairly recent, BTW, and the Cell paper was not part of the official package - it was in fact news to me when I saw it the day after the whole package was released.

Anonymous said...

Sure. "Addressing" instead of "answering" makes your misunderstandings understandable.

Perfectly in line with your mental challenges TTC. Please keep showing off. Makes evidencing your stupidity much easier.

SLC said...

I would point out that Tom Bethell is a world class moron who also denies the Theory of Relativity, in addition to evolution. A nincompoop of the first order.

The Thought Criminal said...

Negative Entropy, did you ever take 8th grade English? Or the equivalent? Because your thinking is about on a 12-year-old's level. Shawn asked no questions, I addressed what he said. I was under no obligation to render an answer because no question was asked.

You remind me more of "gilt" every time I read one of your dumps of distraction.

Shawn said...

Back up your statement. If it covers the modern period that should be easy enough.

Huh? Isn't that what a component of this discussion is about, that Intelligent Design advocates doubt that junk DNA would exist because "you know who" wouldn't generate junk? Isn't it the same creationist argument offered with respect to vestigial organs, that they are merely organs for which function has yet to be found, because an intelligent designer ("you know who") would not create non-functional elements? Hasn't it always been a dominant view that all of the universe is perfect and therefore clearly designed by "you know who"?
There was no straw man argument being made as I was referring only to the issue of intelligent design and junk DNA, but I could have been broader with my brush and still not be guilty, by and large, of such an argument.

Diogenes said...

TTC: In one of my most satisfying moments in a discussion on how soon physics would arrive at a theory of everything I got Sean Carroll to admit that there is not a single object in the universe for which physics knows comprehensively and exhaustively. Not any atom or molecule, crystal, .... not one. My point in getting him to admit that, and it was like trying to get blood from a stone, was to point out that expecting a theory of everything in the universe when there wasn't a theory of everything about a single subatomic particle seemed a bit premature.

I sincerely doubt the truth of this. Quantum field theory is by far the most accurate, most predictive, most self-consistent and complete branch of knowledge that humans have ever devised in any field, period.

We sure as hell understand exactly what's going on in electrons and quarks.

So I'd like to see a hyperlink where Sean Carroll says we don't understand the electron!

The Thought Criminal said...

Oh, but I can show you where I did it.

At comment 7 Anthony McCarthy asked:

I’d like to repose that question. Is there a single object of which physics has a comprehensive and exhaustive knowledge?

At comment 21 I sweetened the pot:

I’ll make a deal, if Sean will answer the question I put to him, I won’t post another comment here. Is there a single object that physics knows comprehensively and exhaustively?

At comment 25, Sean Carroll took the deal:

Anthony @ 21: “No.”
Thanks for commenting.


http://blogs.discovermagazine.com/cosmicvariance/2010/09/23/the-laws-underlying-the-physics-of-everyday-life-are-completely-understood/

I'd been trying for days and days to get him to answer the question on another comment thread:

At 136 Anthony McCarthy asked:

If Sean Carroll was participating in this discussion I’d like to ask him if he is asserting that there is a single object studied by physics about which, literally, everything is known to science. And I really do mean a comprehensive and exhaustive knowledge.

http://blogs.discovermagazine.com/cosmicvariance/2010/09/02/stephen-hawking-settles-the-god-question-once-and-for-all/

I first asked it on Sept. 6th and finally bribed Carroll into answering it on Sept. 23. I kept my side of the bargain and haven't posted another comment on his blog.

I knew he'd hate to answer it and I knew the answer he'd have to give. But it was relevant to the discussion.

Now, how much would Larry Moran guesstimate is the percentage of that same level of exhaustive and comprehensive information that molecular biology currently knows about DNA and the entire context in which it exists and operates? More than physics knows about electrons, to use Diogenes' example?

The Thought Criminal said...

I suppose it would be immodest of me to speculate but I think Sean Carroll might have posted that second piece in response to my pertinent question. It does seem to me that, in light of his confession, the idea that physics is going to have a real Theory of Everything any time soon is a bit silly. To assert it about DNA is ridiculous.

Anonymous said...

TTC,

If you think that 8th grade English supports your attempt at playing the semantics between "answering" and "addressing" to justify your reading comprehension disabilities, then you are in much worse shape than I would have anticipated.

Playing vocabulary tapes all night to learn such words as "gilt," while thinking that putting it between quotes makes you look knowledgeable, does not help your case much either.

Poor TTC indeed.

The Thought Criminal said...

Huh? Isn't that what a component of this discussion is about, that Intelligent Design advocates doubt that junk DNA would exist because....

It's the New Atheist Two-Step. You said:

Godism predicts no junk DNA/Darwinism predicts no junk DNA....one of these things is not like the other (nod to Sesame Street). Seems to me one of these views would be much more susceptible to evidence than the other. Naturally.

I doubted that c. 98% of religion had ever said anything about DNA, you began walking it back at that point but you didn't admit that you were talking about a small faction of ID Industry creationists, which is hardly all of religion.

"Perfection" is hardly something that people are equipped to identify in the physical universe or anywhere else. I don't think people would know it if they saw it.

The Thought Criminal said...

NE, if you want to continue demonstrating that the new atheism is a shallow, bigoted, pseudo-intellectual fad that depends on the ignorance and conceit of its adherents and their indifference to the truth, I'm powerless to stop you.

Claudiu Bandea said...

Georgi Marinov: Eukaryotes have very elaborate genome defense mechanisms (piRNA for example)

Please, let’s keep this quiet! If the proponents of ENCODE’s 80% functional DNA paradigm hear about piRNA they might use it as a ‘get out of jail free card’, or might even surprise us with a new paradigm that 110% of the human DNA is ‘functional DNA’ (fDNA).

On a science note, I think that piRNA is strong evidence for the extraordinary selective pressure imposed by inserting elements, which supports my hypothesis on the role of jDNA. As I discussed before, it is very likely that the entire RISC machinery has evolved as a defense mechanism against endogenous and exogenous viral elements, and only later it has acquired the secondary function of regulating the expression of some host genes.

Georgi, because you brought into discussion piRNA, I want to expand on the fact that they are expressed primarily in the germ-line tissues. The question is why? Obviously, these are the tissues where the endogenization of exogenous viruses and the amplification of existing endogenous elements occur, so it makes sense. However, there might be more to this story.

Clearly, there are many transposition events by endogenous transposable elements as well as numerous proviral insertion events by exogenous viruses (e.g. retroviruses, such as HIV) in somatic cells. So, why isn’t piRNA expressed at high level in these cells?

Possibly because the vast majority of insertional mutagenesis events in somatic cells, even if they lead to cellular death, are of little significance, as these events are covered up by normal cellular turnover. There is an exception, however, a big exception, and that is the danger of neoplastic transformation, cancer, induced by insertional mutagenesis events. This is one of the main tenets of my hypothesis about the selective forces behind jDNA as a protective mechanism in organisms such as humans.

I’m fairly sure you will follow up with this question: if that is the case, and the selection for protection mechanisms against cancer is so strong, why not also have the piRNA system working in the somatic cells?

Anonymous said...

"Sigh... this whole 'debate' has entered the same realm as creationism as far as I'm concerned ..."

I agree. It's an embarrassment to science. Lots of hand-waving and speculation, lots of bias, lots of mean-spirited critiques about the way other people use words (e.g., "junk"), lots of big egos being threatened by other people's efforts.

Diogenes said...

Fuck, I'm so jealous of Sean Carroll. Carroll could make TTC disappear just by saying physics don't understand an electron.

If I could make TTC disappear, I'd say DNA can't tell OJ Simpson from an eggplant.

Diogenes said...

And weaseling out when asked a simple question!

At the Science Live Chat, Stamatoyannopoulos was asked, "How does work performed by "junk DNA" differ from epigenetics?" and he dodged the question-- he and Ewan did not acknowledge the existence of the word "Junk."

At the REDDIT interview, I asked some pointed questions intended to reveal contradictions in how they re-define "function." However, no one responded and my questions were voted down-- I'm now at zero, slightly above BornAgain77.

Diogenes said...

Georgi, thanks for all your help.

You understand the "Junk DNA" hypothesis; you understand there are positive arguments for "Junk DNA".

Do you agree that the Junk DNA hypothesis requires a definition of "function" that is different from, and much stricter than, the one used in the 80% number in the abstract of the ENCODE summary and the press releases?

Do you agree that the Junk DNA hypothesis requires a definition of "function" that is different from, and much stricter than, the one used in the 40-50% number given by John Stamatoyannopoulos in the Science Live Chat?

Do you agree that 8-9% is a more appropriate number for "function" using the definition relevant to the Junk DNA hypothesis?

The Thought Criminal said...

Ah, Diogenes, for some reason your comment reminds me of Truman Capote's answer on being asked what it was like to have sex with Errol Flynn. "If it wasn't Errol Flynn, I don't think I'd have remembered it". It was Sean Carroll who I got to admit that point. You've got nothing I want.

And it was the refutation of the pretense that physics was near having a TOE, a rather more important point.

Diogenes said...

Indeed he is a moron-- he wrote "The Politically Incorrect Guide to Science." Horrible.

That includes arguments like: cancer isn't caused by mutations (probably got that from Jonathan Wells) and a moderate amount of pollution is good for you. I hope every morning he enjoys a nice mug of mercury and dioxin.

Diogenes said...

Oh, gimme a break. Let's suppose it takes 100 years to unite QM with gravity. So what? The fact remains we have a total theory of electrons and quarks right now, no matter what gravity is up to.

We can now calculate the mass of the proton, from its constituent quarks + QCD, using lattice gauge methods. Sure, you need years on supercomputers, but it can be done. What exactly do we not understand about ANY subatomic particles? What, exactly, is the question?

Name one interaction that an electron or quark participates in that can't be predicted, at least on a probabilistic QM basis.

Diogenes said...

@Larry and Georgi,

The second paper you referenced says that there are 45,000,000 "occupancy events." Those workers claim that this represents 8,400,000 binding sites for an average of 280 per gene. The total amount of DNA in those sites would correspond to about 84,000,000 bp of DNA if we use a more realistic value of 10 bp per binding site. That's only about 3% of the genome and it's a lot less than the 8% that Birney quotes.

Let's reverse engineer Stamato---'s 40% regulatory genome number to get what number he's using for the size of each binding site. Wanna bet it's bigger than 6-8 bp?

40% x 3,200 Mbp in genome = 1,280 Mbp
1,280 Mbp / 8.4 million binding sites = 152.4 bp / site.

Hmm, a bit bigger than 6-8 bp.

But the NY Times article by Gina Kolata said:

“Now scientists have discovered a vital clue to unraveling these riddles. The human genome is packed with at least four million gene switches that reside in bits of DNA that once were dismissed as “junk” but that turn out to play critical roles in controlling how cells, organs and other tissues behave."

Let's try that again, this time using those 4+ million switches, every one of which plays "critical roles."

40% x 3,200 Mbp in genome = 1,280 Mbp
1,280 Mbp / 4+ million binding sites = < 320 bp / site.

Georgi posted a nice graphic before showing the resolution of the binding assay. The resolution looked like a gaussian-- I don't know what they took as its width. [http://www.nature.com/nmeth/journal/v5/n9/images/nmeth.1246-F1.jpg]

Georgi, do you have any sense as to the width number Stamato--- is using for each peak? Or conversely, what number of binding sites he used?

Before you wrote:

Georgi: Because you're size-selecting at ~200bp, the peak is always much much bigger than the 8bp of the binding site. So when you do peak calling you get something much bigger as that's what the resolution of the assay is; you have to bring in a lot of orthogonal evidence to say where the actual binding site is. You can have other binding sites very close by, etc, it's not at all that straightforward to get the actual binding site and count just that,

So let's rerun it, in reverse this time:

40% x 3,200 Mbp in genome = 1,280 Mbp
1,280 Mbp / 200 bp = 6.4 million binding sites.

Not the 8.4 million Larry quoted-- they might overlap.
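The three reverse calculations above can be reproduced with a short Python sketch. The function names are invented for illustration, and the inputs (a 3,200 Mbp genome, 40% regulatory, 8.4 million or 4 million sites, 200 bp per peak) are simply the figures quoted in this comment, not independent data.

```python
# Reverse-engineering the 40% "regulatory" figure, using the comment's inputs.
GENOME_MBP = 3_200
REGULATORY_PERCENT = 40

regulatory_mbp = GENOME_MBP * REGULATORY_PERCENT / 100  # Mbp called regulatory


def bp_per_site(region_mbp, sites_in_millions):
    """Mbp divided by millions of sites gives bp per site."""
    return region_mbp / sites_in_millions


def sites_in_millions(region_mbp, site_width_bp):
    """Mbp divided by bp per site gives the number of sites, in millions."""
    return region_mbp / site_width_bp


per_site_footprints = bp_per_site(regulatory_mbp, 8.4)  # 8.4 million footprinted sites
per_site_switches = bp_per_site(regulatory_mbp, 4.0)    # "at least four million gene switches"
implied_sites = sites_in_millions(regulatory_mbp, 200)  # ~200 bp size selection per peak

print(f"{regulatory_mbp:,.0f} Mbp regulatory")
print(f"{per_site_footprints:.1f} bp/site with 8.4 million sites")
print(f"{per_site_switches:.0f} bp/site with 4 million sites")
print(f"{implied_sites:.1f} million sites at 200 bp each")
```

Because the switch count is "at least" four million, the 320 bp figure is an upper bound on the implied width per site.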

Georgi Marinov said...

For DGF, the size of the footprint is whatever it is. It's going to be 6-8 bp for an HLH factor, but for CTCF it is going to be quite a bit more than that. And then the footprint that's calculated need not be exactly the size of the binding site, it can be longer than that. The "four million gene switches" is a metaphor for the public - obviously we've known about these things for a long time, we just did not know where they are and now we went out after them and identified them.

John Stam is not talking about ChIP-seq, his specialty is DNase-seq - with regular DNase you get more spread out regions of read enrichment, it is only when you go deep into DGF territory that you get to actual footprinting. If you download the actual hotspot calls from UCSC, you can check the length distribution yourself:

http://hgdownload-test.cse.ucsc.edu/goldenPath/hg19/encodeDCC/wgEncodeUwDnase/

What you're quoting I said with regards to ChIP-seq - there you would usually call a region of at least a 100bp, the binding site is of course a tenth of that at most except for the odd factor with very long recognition motifs. It need not be the only binding site in that region though.

The 8.4 million Larry quoted do not overlap, here is the direct quote:

Collectively, we detected 45,096,726 footprints, representing cell-selective binding to ∼8.4 million distinct 6–40 bp genomic sequence elements.

They have "elements" going all the way to 40bp. The calls are available (through the Nature companion paper), but I haven't had a chance to look at them so I don't know what the actual length distribution is.

Georgi Marinov said...

Just to add - ultimately, it does not matter much how and what you are counting as functional. These are numbers that matter only if you are hell bent on showing there is no junk DNA; if you accept there is junk DNA, what bed files were used for what calculation would not be something you would want to invest too much of your time in.

The important thing is that we gain a quantitative and predictive understanding of how the genome functions, i.e. if you know cell state A and some set of external inputs, you would want to know what the resulting cell state B would look like in terms of gene expression and how the change happens. Having the catalogue of transcription factor binding sites is a necessary step towards that goal (and we don't have that yet - ENCODE has a lot of cell type diversity space to sample), but not a sufficient one.

SLC said...

Re Diogenes

Quantum electrodynamics predicts a value for the anomalous magnetic moment of the electron that agrees with experiments to 10 significant digits. That would seem to indicate that we understand electrons pretty well and that QED provides an extremely accurate predictive model for them.

The Thought Criminal said...

Diogenes, have you gone over to Sean Carroll's and given him the benefit of your superior knowledge of his subject?

My question got the answer I was looking for, one I was pretty certain he'd have to give if I persisted long enough. It took 17 days. The longer it went on the more certain I was that he didn't want to say it.

There will be no Theory of Everything.

Michael M said...

What is the 20% of the human genome that has no observably reproducible "biochemical function" according to the ENCODE definition?

Centromeres?

Telomeres?

Both are bound by proteins, though, if I recall correctly, not necessarily histones or RNA polymerases and associated proteins. The issue that has puzzled me from the beginning of this whole fracas is that, while 20% is most certainly not even close to a majority of the genome, it seems like a "significant" portion of the genome, and I simply don't understand how the lack of reproducible biochemical activity in 20% of the genome spells The Death of Junk DNA™.

Joe Felsenstein said...

Claudiu Bandea said:

However, clearly, some organisms, such as bacteria, have very efficient mechanisms of discarding jDNA. Don’t you think that the organisms that have lots of jDNA would have maintained similar mechanisms if the jDNA did not provide some kind of advantage?

I think it might depend on how much junk DNA could be eliminated by an improvement of functioning of (some sort of) removal machinery. If the machinery can remove in one generation in one individual 0.01% of the junk DNA, and the presence of all the junk DNA depresses fitness by only, say 0.01, and a new allele improves the removal to 0.011%, then the selection coefficient is very small.

Furthermore if the region from which the junk DNA is removed is far from the place where these alleles are, the improved-removal allele will not continue to be associated with the cleaner part of the genome, which will segregate from it and show little association at the population level. So selection would be brief and episodic, in addition to being weak.
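
The numbers in the example above can be worked through in a few lines. This is only a rough sketch: it assumes the fitness cost of junk DNA scales linearly with the amount present, which the comment does not state, and the function name is hypothetical.

```python
# Hypothetical helper: assumes the fitness cost of junk DNA is proportional
# to the amount of junk present (an illustrative assumption only).
def removal_allele_advantage(total_cost, old_rate, new_rate):
    """Per-generation fitness gain from removing slightly more junk DNA."""
    extra_fraction_removed = new_rate - old_rate
    return total_cost * extra_fraction_removed

TOTAL_COST = 0.01    # fitness depression from all the junk DNA (from the comment)
OLD_RATE = 0.0001    # 0.01% of junk removed per generation
NEW_RATE = 0.00011   # 0.011% removed by the improved allele

s = removal_allele_advantage(TOTAL_COST, OLD_RATE, NEW_RATE)
print(f"selection coefficient is roughly {s:.1e}")
```

Under that linear assumption the advantage comes out on the order of one in ten million per generation, which is the sense in which the selection coefficient is "very small".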

A similar problem has long been known to exist for selection of alleles modifying the mutation rate in outcrossing species. It is hard to see how alleles that optimize the mutation rate could be selected for,

Michael M said...

My last comment seems to have disappeared into the internet black hole.

Diogenes said...

Getting into an intellectual contest with TTC is like getting into a big dick contest with a Ken doll.

Diogenes, have you gone over to Sean Carroll's and given him the benefit of your superior knowledge of his subject?

Why not? You did. And you also came here and gave me the benefit of your superior knowledge of my subject...

Anonymous said...

The Bacterial tendency towards elimination of DNA is not specific. They delete DNA somewhat randomly. They can afford the process because of their large effective populations, and because of the ample possibilities for horizontal gene transfer. Eukaryotes are another story.

Diogenes said...

Georgi,
Thanks, but I don't understand the 40%, not at all.

Collectively, we detected 45,096,726 footprints, representing cell-selective binding to ∼8.4 million distinct 6–40 bp genomic sequence elements.

But but but, 8.4 million even times 40 bp is 336 Mbp or 10%.
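
The arithmetic can be checked directly. In this sketch the genome size of about 3.1 billion bp is an assumed round figure for the haploid human genome, not a number taken from the paper, and the variable names are illustrative.

```python
# Footprint counts and element lengths come from the quoted sentence;
# the genome size is an assumed round figure for the haploid human genome.
DISTINCT_ELEMENTS = 8_400_000
MIN_ELEMENT_BP = 6
MAX_ELEMENT_BP = 40
GENOME_BP = 3_100_000_000  # assumption

def covered_fraction(n_elements, element_bp, genome_bp):
    """Fraction of the genome covered if every element had this length."""
    return n_elements * element_bp / genome_bp

upper_bp = DISTINCT_ELEMENTS * MAX_ELEMENT_BP   # 336,000,000 bp
lower_bp = DISTINCT_ELEMENTS * MIN_ELEMENT_BP   # 50,400,000 bp
upper_frac = covered_fraction(DISTINCT_ELEMENTS, MAX_ELEMENT_BP, GENOME_BP)
lower_frac = covered_fraction(DISTINCT_ELEMENTS, MIN_ELEMENT_BP, GENOME_BP)

print(f"upper bound: {upper_bp / 1e6:.0f} Mbp ({upper_frac:.1%})")
print(f"lower bound: {lower_bp / 1e6:.1f} Mbp ({lower_frac:.1%})")
```

Even if every element were the maximum 40 bp, the footprints would cover only about a tenth of the genome.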

roger shrubber said...

I am disturbed that people don't understand why bacteria tend to eliminate junk DNA. Single cell organisms that live in boom bust ecological niches need to replicate quickly. Replicating a bacteria genome can become a rate limiting step. It isn't the energy cost, it's the time cost to replication with a compounding pay off. Trained biologists should know this.
Few multicellular Eukaryotes have similar constraints on DNA replication time.

The Thought Criminal said...

Well, Diogenes, you see, I got Carroll to admit that there is a huge problem with the sciency faith that physics is on the verge of a "theory of everything", it doesn't know everything about any, single constituent part of that very "everything" they would include that constituent part in it. And "everything" includes everything. That's not an especially hard concept to deal with once you've questioned the concept of a ToE from a more humble POV. Just like my skepticism about "junk" DNA on the basis that DNA is an extremely complicated idea, comprising DNA from many different species and individuals that exists and operates in an extremely complex and, I'd guess, still largely unknown range of circumstances. Making broad, sweeping statements out of a basis of not knowing leads to the first question that the ones making the statements should ask of themselves, "how do I know that when I don't know what I'd need to know to know it".

Otherwise, I really got you on your <> claim the other day, didn't I. You see, you haven't read your heroes' stuff, other people have. You've got to know what he said in order to know what he said. And I got Carroll to make that concession after repeatedly asking the relevant question. I have no confidence that he and other overconfident sci-guys - many of them atheists who use that claim in their ideological nonsense - won't continue to make an absurd claim that they're about to find their "theory of everything". If they knew the history of that quest they'd have read that a number of the previous greatest minds of physics thought they'd gotten that far in the late 19th century.

It's not all that hard to find problems with many overblown claims. All it takes is finding the basic flaw in the idea. Science can't overcome the requirement to be logically consistent. If something's wrong with A then the results that depend on A will have something wrong with them.

The universe is big, really big, and how it is can be extremely subtle. And, if any of you had ever read Eddington, you'd have read his idea that there could be physical law that people are unequipped to imagine and which, so, would always elude us. I will state that I believe that condition is almost certainly real, that peoples' abilities are rather limited, that man is not the measure of all things, that man can't even take the measure of all things. You see, I'm not anthropocentric like the "Humanists" are.

This stuff isn't hard, it's just more subtle than the terms you're used to thinking in.

The Thought Criminal said...

Damned html, that should read I really got you on your creationists are the ones who say "intelligent design" claim the other day.... The one you made with the assertion that you sci-guys were in the know about such things.

http://sandwalk.blogspot.com/2012/09/james-shapiro-claims-credit-for.html?showComment=1347563346946#c8544944790945099313

Anonymous said...

Of course replication time contributes. But, do you think that effective populations have nothing to do with it?

"Nothing in evolution makes sense except in the light of population genetics"
-M. Lynch

Larry Moran said...

Replication time is irrelevant. Typical generation times in bacteria are on the order of two to three days (in the real world) and most genomes can be replicated in less than an hour even with a single origin of replication.

Under ideal laboratory conditions E. coli doubles in less than thirty minutes and this is even faster than the time it takes to replicate the entire genome. Clearly, replication time is not limiting.
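
A back-of-the-envelope version of this comparison is sketched below. The genome size (about 4.6 Mbp) and fork speed (about 1,000 bp per second) are commonly cited approximate values for E. coli and are assumptions here, not figures from this thread.

```python
# Rough estimate of the time to copy a circular bacterial chromosome from a
# single origin with two replication forks moving in opposite directions.
# Both constants are approximate, assumed values for E. coli.
GENOME_BP = 4_600_000        # assumption: ~4.6 Mbp chromosome
FORK_SPEED_BP_PER_S = 1_000  # assumption: ~1,000 bp/s per fork
FORKS = 2                    # bidirectional replication from one origin

def replication_minutes(genome_bp, fork_speed, forks):
    """Minutes needed for the forks to copy the whole chromosome."""
    return genome_bp / (fork_speed * forks) / 60

minutes = replication_minutes(GENOME_BP, FORK_SPEED_BP_PER_S, FORKS)
print(f"estimated replication time: about {minutes:.0f} minutes")
```

With these assumed values the estimate lands a little under 40 minutes: well under an hour, yet longer than the fastest laboratory doubling times, which matches the comparison made above.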

Trained biologists should know these things. I'm also disturbed by this lack of knowledge.

Most species of bacteria have huge population sizes. As Negative Entropy points out, that's the key.

Claudiu Bandea said...

Joe Felsenstein: It is hard to see how alleles that optimize the mutation rate could be selected for

Negative Entropy: Bacterial tendency towards elimination of DNA is not specific. They delete DNA somewhat randomly. They can afford the process because of their large effective populations, and because of the ample possibilities for horizontal gene transfer. Eukaryotes are another story

roger shrubber: Replicating a bacteria genome can become a rate limiting step… Few multicellular Eukaryotes have similar constraints on DNA replication time

These are all valid, excellent points. Indeed, it appears that the deletion of DNA is more or less random. It seems also that there are only a few mechanisms for deleting DNA, such as those occurring during genome replication or during recombinational events, and that these mechanisms are rather ‘non-specific,’ in the sense that they have not evolved primarily as mechanisms for deleting DNA sequences. However, there is no doubt that there are slight differences in the rate of DNA deletion per cell cycle, both between various regions of the genome and between various genomes.

Here, I want to go back to the hypothesis about the function of junk DNA (jDNA) as a sink for the integration of proviruses, transposons and other inserting elements, thereby protecting coding sequences from insertional inactivation or alteration of their expression. And, I want to discuss it in the context of Joe’s definition of jDNA given in one of his previous comments, which I think is fundamental in understanding the significance of ENCODE’s conclusion that 80% or more of the human genome contains functional DNA (fDNA), as well as that of my hypothesis on the role of jDNA:

Junk DNA, by which we should mean DNA whose variation is not constrained by natural selection, is likely to be most of our genome, despite the ENCODE designation of much of it as "functional"

In my hypothesis, the function of jDNA is not based on the sequence per se, but on its bare, bulk presence in the genome. Therefore, this DNA is not under the constraints of natural selection based on its sequence, but based on its presence as a sink for the integration of inserting elements (i.e. as a protective mechanism against insertional mutagenesis)

So, we need to expand the role of natural selection to cover not only the function of DNA as an informational molecule, a function that is based primarily on its sequence specificity (i.e. the order of nucleotide in the chain), but also to cover the non-informational functions of DNA, such as a sink for the integration of inserting elements.

Therefore, we might want to classify DNA as informational DNA (iDNA), which codes for proteins, functional RNAs and regulatory elements, and non-informational DNA (niDNA), which fulfills other functions such as the protective mechanism I proposed. Surely, some of this protective DNA might have both types of functions, as a similarity of its sequence with that of actively inserting elements, such as retroviruses, might enhance its protective function.

Indeed, the protective role of jDNA in somatic cells, in which it prevents neoplastic transformations, or cancer, can be experimentally addressed: for example, transgenic mice carrying DNA sequences homologous to infectious retro-viruses, such as murine leukemia viruses (MuLV) might be more resistant to cancer induced by experimental MuLV infections as compared to controls.

Joe, I know that you and I have similar views on the role of natural selection in shaping the evolution of organisms, so I hope you agree with its expanded role on jDNA. Now, we have to work on convincing Larry, as without him we are not going to go too far on this blog!

Mike White said...

Yes, trained biologists should know these things, but we tend not to require most biology students to learn anything about population genetics.

Speaking of Michael Lynch, he's written an excellent discussion of these issues in his 2007 textbook, The Origins of Genome Architecture. Anyone who is confused about the relationship between the efficiency of natural selection and genome size should read the first few chapters & refs to the literature therein.

Here's a key passage:

"A key theme that will appear repeatedly in the following pages... is that population size is a central determinant of the efficiency of natural selection...

By reducing the efficiency of natural selection, diminished population size magnifies the tendency for mildly deleterious insertions to accumulate in the genome, while also reducing the ability of selection to promote advantageous deletions... purifying selection eliminates deleterious genomic elements from large populations... from a population genetic perspective, the uniformly simple genomes of prokaryotes are not surprising at all - they are the expectation." - p. 40-41


Lynch walks you through the math and the results in the literature. Lynch argues pretty strongly against replication time having any relationship to genome size.
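
One way to see the population-size effect is with Kimura's standard diffusion approximation for the fixation probability of a new mutation. This is a minimal sketch, not Lynch's own calculation: it assumes a diploid population whose effective size equals its census size, additive fitness effects, and purely illustrative values for the selection coefficient and the two population sizes.

```python
import math

def fixation_probability(s, n):
    """Kimura's approximation for a new mutation at frequency 1/(2N).

    Assumes a diploid population with effective size equal to n and
    additive selection coefficient s (negative means deleterious).
    """
    if s == 0:
        return 1 / (2 * n)  # neutral expectation
    # (1 - exp(-2s)) / (1 - exp(-4Ns)), written with expm1 for precision
    return math.expm1(-2 * s) / math.expm1(-4 * n * s)

S_INSERTION = -1e-6      # illustrative cost of a mildly deleterious insertion
SMALL_POP = 10_000       # illustrative small effective population
LARGE_POP = 100_000_000  # illustrative large effective population

for n in (SMALL_POP, LARGE_POP):
    relative = fixation_probability(S_INSERTION, n) / fixation_probability(0, n)
    print(f"N = {n:>11,}: fixation rate relative to neutral = {relative:.3g}")
```

With these illustrative numbers the insertion fixes at nearly the neutral rate in the small population and essentially never in the large one, which is the pattern the quoted passage describes.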

Claudiu Bandea said...

What does Lynch say about junk DNA (jDNA)? Any function? I’m referring to a function(s) for all jDNA, not a few percent here and there.

Diogenes said...

You're the same Mike White who blogs at HuffPo? I read your piece about Junk DNA, thanks for that.

Diogenes said...

@Mike:

bottom line: every biology grad student MUST take a class in population genetics. Why is this not mandatory?

Here's why the ENCODE debacle happened: too many damn molecular biologists don't know population genetics. I love molecular biologists. They are the smartest people in the world (besides physicists.) I married one. But, they don't know goddamn population genetics.

When scientists chosen as spokespeople of the ENCODE consortium, over at the REDDIT clusterfuck, sit there and say Junk DNA is based on the fallacy that "if I don't know its function it's junk", there is something seriously wrong with the educational system.

Anonymous said...

This was just another case of atheism dictating research. These guys are in the field (origins) in the first place because they don't believe in God and have a compulsion toward anything that will confirm that belief.

So all the nonsense about this junk will have to be edited out of the atheist arsenal. It's funny - they wanted this to be junk, so it *was* junk. They ridiculed those who said code needs instructions and demanded that no such process could square with probability.
Well, now it's here and what do they do? They see the dart on the wall and they draw a bulls-eye around it as if it strengthens their views.

Look, we're already at multiverse in physics. They've already given up. It's time to stop denying what everyone else can see in all of 3 seconds. The world is designed. All we have left is a bunch of emperors with no clothes on.