Friday, March 15, 2013

On the Meaning of the Word "Function"

A lot of the debate over ENCODE's publicity campaign concerns the meaning of the word "function." In the summary article published in Nature last September the authors said, "These data enabled us to assign biochemical functions for 80% of the genome ...." (The ENCODE Project Consortium, 2012).

Here's how they describe function.
Operationally, we define a functional element as a discrete genome segment that encodes a defined product (for example, protein or non-coding RNA) or displays a reproducible biochemical signature (for example, protein binding, or a specific chromatin structure).
What, exactly, do the ENCODE scientists mean? Do they think that junk DNA might contain "functional elements"? If so, that doesn't make a lot of sense, does it?

Ewan Birney tried to address this definitional morass on his blog [ENCODE: My own thoughts] where he says ....
It’s clear that 80% of the genome has a specific biochemical activity – whatever that might be. This question hinges on the word “functional” so let’s try to tackle this first. Like many English language words, “functional” is a very useful but context-dependent word. Does a “functional element” in the genome mean something that changes a biochemical property of the cell (i.e., if the sequence was not here, the biochemistry would be different) or is it something that changes a phenotypically observable trait that affects the whole organism? At their limits (considering all the biochemical activities being a phenotype), these two definitions merge. Having spent a long time thinking about and discussing this, not a single definition of “functional” works for all conversations. We have to be precise about the context. Pragmatically, in ENCODE we define our criteria as “specific biochemical activity” – for example, an assay that identifies a series of bases. This is not the entire genome (so, for example, things like “having a phosphodiester bond” would not qualify). We then subset this into different classes of assay; in decreasing order of coverage these are: RNA, “broad” histone modifications, “narrow” histone modifications, DNaseI hypersensitive sites, Transcription Factor ChIP-seq peaks, DNaseI Footprints, Transcription Factor bound motifs, and finally Exons.
That's about as clear as mud.

We all know what the problem is. It's whether all binding sites have a biological function or whether many of them are just noise arising as a property of DNA binding proteins. It's whether all transcripts have a biological function or whether many of those detected by ENCODE are just spurious transcripts or junk RNA. These questions were debated extensively when the ENCODE pilot project was published in 2007. Every ENCODE scientist should know about this problem so you might expect that they would take steps to distinguish between real biological function and nonfunctional noise.

Their definition of "function" is not helpful. In fact, it seems deliberately designed to obfuscate.

Let's see how other scientists interpret the ENCODE results. In a News & Views article published in Nature last September, Joseph R. Ecker (Salk Institute scientist) said ...
One of the more remarkable findings described in the consortium's 'entré' paper is that 80% of the genome contains elements linked to biochemical function, dispatching the widely held view that the human genome is mostly 'junk DNA.'
That makes at least one genomics worker who thinks that "biochemical function" and junk DNA are mutually exclusive.

Recently a representative of GENCODE responded to Dan Graur's criticism [On the annotation of functionality in GENCODE (or: our continuing efforts to understand how a television set works)]. This person (JM) says ...
Q1: Does GENCODE believe that 80% of the genome is functional?

As noted, we will only discuss here the portion of the genome that is transcribed. According to the main ENCODE paper, while 80% of the genome appears to have some biological activity, only “62% of genomic bases are reproducibly represented in sequenced long (>200 nucleotides) RNA molecules or GENCODE exons”. In fact, only 5.5% of this transcription overlaps with GENCODE exons. So we have two things here: existing GENCODE models largely based on mRNA / EST evidence, and novel transcripts inferred from RNAseq data. The suggestion, then, is that there is extensive transcription occurring outside of currently annotated GENCODE exons.
There's another scientist who thinks that 80% of the genome has some biological activity in spite of the fact that the ENCODE paper says it has "biochemical function." I don't think "biological activity" is compatible with "junk DNA," but who knows what they think?

Since this person is part of the ENCODE team, we can assume that at least some of the scientists on the team are confused.

The Sanger Institute (Cambridge, UK) was an important player in the ENCODE Consortium. It put out a press release on the day the papers were published [Google Earth of Biomedical Research]. The opening paragraph is ...
The ENCODE Project, today, announces that most of what was previously considered as 'junk DNA' in the human genome is actually functional. The ENCODE Project has found that 80 per cent of the human genome sequence is linked to biological function.
It looks like the Sanger Institute equates "biochemical function" and "biological function" and it looks like neither one is compatible with junk DNA.

I think the ENCODE leaders, including Ewan Birney, knew exactly what they were doing when they defined function. They meant "biological function" even though they equivocated by saying "biochemical function." And they meant for this to be interpreted as "not junk" even though they are attempting to backtrack in the face of criticism.

Function Wars
(My personal view of the meaning of function is described at the end of Part V.)

The ENCODE Project Consortium (2012) An integrated encyclopedia of DNA elements in the human genome. Nature 489: 57-74. (E. Birney, corresponding author)

25 comments :

NickM said...

Good post! But:

"I don't think "biological activity" is compatible with "junk DNA," but who knows what they think?"

<-- Mere "activity" certainly is compatible with "junk DNA", did you mean to say that or is this a typo?

John Harshman said...

I believe Larry is distinguishing biological activity from biochemical activity, in which biological activity is limited to activity that has some effect on the biology of the cell or organism. Noise would be expected generally to have no such effect.

Markk said...

I guess what I would like to ask would be something like this:

What part of the DNA sequence that ENCODE calls functional would cause any change to an active cell, other than responding to the assays differently, if it was changed to a different sequence with similar physical properties? Would the cell function differently? Say for example a stretch that codes for "RNA" as said above, suppose the RNA could even be identified as some small piece of a gene? Suppose a different variety of that gene was coded for instead. Would there be any change to the activity of the cell except for this expression?

Obviously even non-functional DNA physically takes up space and so causes some effects on a cell, but evidently the ENCODE Team drew the line there.

If their assays, or something similar, are the only way to tell apart cells carrying different varieties of the "Functional" stretches, and there is no other expressed effect, then I would call their usage misleading, and it should be called out.

Diogenes said...

I agree with Nick. Recall that in David Comings' book from 1972, he knew that at least 25% of the mouse genome was transcribed, much more than the coding regions, and Comings explicitly stated in 1972 that much Junk DNA will be transcribed.

This means that transcription is compatible with Junk, and always was. Anyone who says otherwise is an idiot, possibly small i.

If transcription = activity, then activity is compatible with Junk.

khms said...

As I see it, if they define function as biochemical activity, and transcription counts, then the answer has to be all of it, because then mitosis has to count.

If, to avoid this, mitosis shouldn't count, then neither should transcription.

Having it both ways is dishonest.

John Harshman said...

Diogenes: So far all we're arguing about is what "biological activity" means. Larry appears to define it to exclude non-functional transcription. This seems reasonable to me. Calm down and actually read what people say, rather than interpreting every comment as an assault on the reality of junk DNA.

Diogenes said...

John: Calm down and actually read what people say

Your comment was published while I was composing mine, so it's not like I skipped over it.

In Larry's post he mentions
"biochemical function"
"biological function"
"biochemical activity"
"biological activity"

This will get confusing very fast.

Junk DNA may be transcribed into RNA, as Comings defined it.

Biochemical activity is a superset of DNA transcribed into RNA, as Birney defines it.

Thus Junk DNA overlaps biochemical activity, neither subset nor superset. Venn diagram.

Is biological activity non-compatible with Junk?

Is biological activity also a subset of biochemical activity?

Then the Junk that is biochemically active and the DNA that is biologically active are two non-overlapping subsets of biochemical activity.

What could be simpler?

John Harshman said...

Sorry, thought you were responding to me. I hope "What could be simpler?" was meant ironically, but I can't tell for sure. But yeah, I think Larry was using "biologically active" to refer to functional DNA.

SPARC said...

Is there any information about the number of transcripts derived from a single junk sequence per cell compared to the number of transcripts of a lowly expressed, well-defined gene?

Joe Felsenstein said...

Most scientists, hearing phrases like "biochemical activity", "biological function", etc. conclude one thing -- that the site is doing something that makes a difference to the fitness of the organism, and hence is not "junk". The fundamental fact is that the way ENCODE publicized their work (as excellent as most of the work was) has persuaded the popular science press, and also the public, and also most other scientists, that "junk DNA" was a mistaken notion, that essentially all of the genome is doing something that matters to the fitness of the organism.

It will take probably about 10 years for people to realize that the notion of junk DNA was not a delusion. If the ENCODE consortium wants to help in that process, it can. But so far all I have seen from their publicity machine is elaborate circumlocution.

Claudiu Bandea said...

Joe Felsenstein says: “It will take probably about 10 years for people to realize that the notion of junk DNA was not a delusion”

I think most scientists, including many members of the ENCODE project, have realized already that the data produced by this project, as valuable as it might be, do not indicate that 80% of the human genome is functional. However, this does not mean that the notion of junk DNA (jDNA) is not a delusion.

As I discussed at length here and elsewhere (e.g., see http://comments.sciencemag.org/content/10.1126/science.337.6099.1159), the so-called jDNA provides a defense mechanism against insertional mutagenesis and, therefore, it’s functional. This is an indisputable, statistical fact and, therefore, the concept of jDNA is indeed a delusion.

There is no way, Joe, that you will dispute the fact that jDNA provides a defense mechanism against insertional mutagenesis. Apparently, betting is popular here at Sandwalk, so I’ll bet that your answer, Joe, will confirm my assertion; obviously, choosing not to respond to this challenge will confirm my claim.

Joe Felsenstein said...

I pointed out 6 months ago, in an earlier discussion on Sandwalk, what is wrong with your argument. Your response did not deal with the issue I raised -- that natural selection to retain a piece of junk DNA would usually be extremely weak. So I will just point you there again. Obviously failing once again to deal with my counterargument will confirm my claim! (Pointing out natural cases with little junk DNA such as hummingbirds and pufferfish does not deal with my argument).

Georgi Marinov said...

That's not a sequence-specific function, and given the vastness of the genome, it's not really a function at all because even you can delete many megabases of junk DNA and the genome will still be big and protected against insertional mutagenesis.

Claudiu Bandea said...

Joe Felsenstein: “Your response did not deal with the issue I raised -- that natural selection to retain a piece of junk DNA would usually have extremely weak selection”

The reason I didn’t specifically address your response was that, just like your response here, your previous comment allowed for some “quite small” or “extremely weak selection” on retaining “a piece of junk DNA.” I completely agree with that: indeed, the selection acting on each piece of jDNA is extremely small.

However, the selection to retain a large population of jDNA pieces as a defense mechanism against insertional mutagenesis is extremely high. Take for example our immune system which is made up of hundreds of different components, including dozens of antimicrobial peptides. The selection force to retain a specific antimicrobial peptide is probably very small, but that for retaining a large population of antimicrobial peptides is very high. You are an expert in ‘quantitative biology’ and I think you and other experts in this field would be able to confirm or refute my argument analytically.

Although we both look at biological phenomena primarily from a selection perspective, let’s consider that jDNA evolved purely by genetic drift, in the absence of any selection for a particular benefit to the host. My question to you is: does jDNA serve currently as a protective mechanism against insertional mutagenesis by endogenous and exogenous viral elements?

I think you agree with me that the answer to this question is clearly: YES!

Georgi Marinov: “That's not a sequence-specific function, and given the vastness of the genome, it's not really a function at all because even you can delete many megabases of junk DNA and the genome will still be big and protected against insertional mutagenesis”

Apparently, you are not familiar with the notion that in addition to its informational role, DNA can have other functions (please see http://comments.sciencemag.org/content/10.1126/science.337.6099.1159, or the recent PNAS paper by W. Ford Doolittle).

Regarding your comment on the phenotypic effects of deleting jDNA, see my answer to Joe above.

Joe Felsenstein said...

[Claudiu Bandea]: The reason I didn’t specifically address your response was that, just like your response here, your previous comment allowed for some “quite small” or “extremely weak selection” on retaining “a piece of junk DNA.” I completely agree with that: indeed, the selection acting on each piece of jDNA is extremely small.

Good to see that you agree with me on that. That's progress.

[Claudiu Bandea]: However, the selection to retain a large population of jDNA pieces as a defense mechanism against insertional mutagenesis is extremely high.

Yes, it could be moderate-to-high. That might explain selection for a mechanism that brings a lot of junk DNA into existence at one go. But of course if the junk DNA was itself copies of transposons, and they were active, that would also be bringing into existence more insertional mutagenesis, as well as protecting against some of it.

On the other hand if the junk DNA came and went in small pieces, then my argument would apply -- the strength of natural selection on those changes will be very weak. Much weaker than the strength of selection for an individual antimicrobial peptide, obviously. There may be hundreds of antimicrobial peptides but the number of pieces (of similar length) of junk DNA is millions.

You can't just wave your hand and say that junk DNA is, as a whole, good for you, and think that you have explained its presence. As far as I can see you're not accounting for its dynamics.

Claudiu Bandea said...

Joe, I think I have reasonably addressed the broad dynamics of jDNA origin and retention in the model I proposed 2 decades ago. Of course, the model requires additional development and experimentation, but that's true of any model.

However, the question that I'm trying to get a straight answer from you is this: does jDNA serve as a protective mechanism against insertional mutagenesis or not? My answer is yes. What is yours?

Georgi Marinov said...

I'm curious what your theory says about introns - were they selected for because they provided defense against insertional mutagenesis, and if yes, are you seriously trying to say that when they were originally inserted for the first time, the benefits of having more DNA around to protect against insertions of new TEs outweighed the fitness cost of their insertion, which is the very thing your theory is saying they exist to defend against...

Joe Felsenstein said...

[Claudiu Bandea]: However, the question that I'm trying to get a straight answer from you is this: does jDNA serve as a protective mechanism against insertional mutagenesis or not? My answer is yes. What is yours?

Yes if it does not itself bring in a higher rate of insertional mutagenesis. And ... saying that it "serves as" a protective mechanism is ambiguous. It implies that it was evolved for that reason. I'm not sure I'd accept "served as", I'd say that the presence of jDNA makes for fewer negative effects of insertional mutagenesis.

But even if we were to accept "serves as", that is not enough to say that this "service" is why it is there. For that you need a full quantitative treatment. And since you are the one pushing the idea, that is up to you.

Claudiu Bandea said...

Georgi Marinov: “I'm curious what your theory says about introns - were they selected for because they provided defense against insertional mutagenesis…”

Yes, that’s what the model predicts. Here is a quote in which I discuss this paradigm:

“...gene splicing evolved to allow for the presence of ncDNA within transcribed regions, which are preferred targets for the integration of viral genomes (65-68). It is well known that much of the ncDNA is composed of remnants of viral sequences, and that the eukaryal introns resemble the group II self-splicing introns, which in turn resemble retroviral elements. Likely, these elements are evolutionary related (51;57) and it is highly probable that the spliceosomal machinery originated from symbiotic endogenous viral species that coevolved with their host to protect the coding regions from insertional damage (more on the selective forces leading to the evolutionary origin of introns and spliceosomal machinery in the next section).”

Georgi Marinov said...

That makes no sense.

If they are so beneficial, why are spliceosomal introns absent from prokaryotes and why were so many of them lost in unicellular eukaryotes?

People have done detailed analyses of the positions of introns across all eukaryotes - most of them seem to have been present in the last common eukaryotic ancestor, which means those small eukaryotes with compressed genomes lost a lot of the introns they originally had. Why did they do that?

Spliceosomal introns indeed evolved from group II introns - but it wasn't because this was beneficial, it was because it was either that or extinction, due to the deleterious effect of those same group II introns.

Claudiu Bandea said...

Joe Felsenstein: Yes, if…

I appreciate your YES, even with an ‘if’ attached to it.

Counting Georgi (quote: “even you can delete many megabases of junk DNA and the genome will still be big and protected against insertional mutagenesis”) there are 2 people here at Sandwalk accepting this idea. Now, that’s progress!

As critical as the protective role of jDNA might be in the germline, the protection against insertional mutagenesis in the somatic cells might be much more critical. In humans, for example, given the enormous number of somatic cells and their high turnover rate during our reproductive span, the number of insertion events that would potentially lead to cancer in the absence of protective mechanisms would be evolutionarily drowning. Think, for example, about the number of insertions in the somatic cells caused by an exogenous retrovirus, such as HIV.

Joe, maybe you and your colleagues in the field of quantitative biology and mathematical modeling can help develop this model. What do you think?

Georgi Marinov said...

That does not answer my objection at all - I said that you can delete a lot of dead TE genomic junk and the genome will still be huge and protected against insertional mutagenesis in your theory. Somatic or germline, it does not matter. The selective coefficient will be tiny at best.

It also does not pass the onion test.

Why do closely related species with very similar lifestyles and life cycles need different amounts of junk DNA?

Claudiu Bandea said...

Georgi Marinov: Spliceosomal introns indeed evolved from group II introns - but it wasn't because this was beneficial, it was because it was either that or extinction, due to the deleterious effect of those same group II introns.

Maybe I misunderstand, but I think you are confirming my model by saying that without the evolution of spliceosomal machinery as a protective mechanism against inserting group II and other types of viral insertional elements, the eukaryal hosts would have gone extinct.

Species with compact or compressed genomes, such as Bacteria and Archaea, evolved other protective mechanisms against insertional mutagenesis, such as specific integration sites. The evolution of these protective mechanisms in organisms that have strong constraints on genome size is strong testimony for the extraordinary selective pressure imposed by inserting elements on their host.

Claudiu Bandea said...

Georgi, maybe I misunderstood you. So here is a straight question, which I think deserves a straight answer: does jDNA serve as a protective mechanism against insertional mutagenesis or not?

My answer is yes. What is yours?

Joe Felsenstein said...

[Claudiu Bandea]: Joe, maybe you and your colleagues in the field of quantitative biology and mathematical modeling can help develop this model. What do you think?

I am busy with other matters. Since you think that you have a selective mechanism that explains the presence of junk DNA, I think the onus is on you to recruit someone to see if they can work out why. And do that before you continue posting assertions that you have an explanation.