
Tuesday, July 29, 2014

The Function Wars: Part III

This is Part III of several "Function Wars"1 posts.

How much of the human genome is conserved?

The first post in this series covered the various definitions of "function" [Quibbling about the meaning of the word "function"]. In the second post I tried to create a working definition of "function" and I discussed whether active transposons count as functional regions of the genome or junk [The Function Wars: Part II]. I claim that junk DNA is nonfunctional DNA that can be deleted from the genome of an organism without affecting its survival or the survival of its descendants.

The best way to define "function" is to rely on evolution. DNA that is under selection is functional. But how can you determine whether a given stretch of DNA is being preserved by natural selection? The easiest way is to look at sequence conservation. If the sequence has changed more slowly than expected for neutral changes fixed by random genetic drift, then it is under negative selection. Unfortunately, sequence conservation only applies to regions of the genome where the sequence is important. It doesn't apply to DNA that is selected for its bulk properties.
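As a toy illustration (my own made-up numbers, not taken from any particular study), here is one way to flag a window as conserved by asking how unlikely its substitution count would be if the window were evolving neutrally:

import math

def binom_cdf(k, n, p):
    # P(X <= k) for X ~ Binomial(n, p): chance of seeing k or fewer changes.
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def looks_conserved(substitutions, window_len, neutral_rate, alpha=0.01):
    # Flag the window if this few changes would be very surprising under neutrality.
    return binom_cdf(substitutions, window_len, neutral_rate) < alpha

# A 100 bp window with only 2 substitutions where ~40 would be expected neutrally.
print(looks_conserved(2, 100, 0.40))  # True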

Let's look at how much of the human genome shows sequence conservation. Keep in mind that this value has to be less than 10% based on genetic load arguments. It should be less than 5%.

A recent paper by Rands et al. (2014) is informative. They start their paper by asking "What proportion of the human genome is functional?" and they propose a definition in the introduction ...
... evolutionary studies often equate functionality with signatures of selection. While it is undisputed that many functional regions have evolved under complex selective regimes including selective sweeps [7] or ongoing balancing selection [8], [9], and it appears likely that loci exist where recent positive selection or reduction of constraint has decoupled deep evolutionary patterns from present functional status [10], [11], it is widely accepted that purifying selection persisting over long evolutionary times is a ubiquitous mode of evolution [12], [13]. While acknowledging the caveats, this justifies the definition of functional nucleotides used here, as those that are presently subject to purifying selection.

This is of course not useful as an operational definition, as selection cannot be measured instantaneously. Instead, most studies define functional sites as those subject to purifying selection between two (or more) particular species. Studies that follow this definition have estimated the proportion of functional nucleotides in the human genome, denoted as αsel [14], [15], between 3% and 15% ([3] and references therein, [16]). Since each species' lineage gains and loses functional elements over time, αsel needs to be understood in the context of divergence between species.
By emphasizing "presently," they eliminate pseudogenes and defective transposons that look conserved because they have descended from a common ancestor but aren't currently subject to purifying selection. The authors look at the fraction of the human genome whose sequence is conserved. That's not the only DNA that might be under constraint but it's still an important number.

The proportion of functional nucleotides (αsel) can be narrowed down by restricting the analysis to those sequences that can be aligned. That's the part between deletions and insertions (indels). This value is αselIndel.

--------------------------------------
UPDATE: I assumed that the authors were looking at sequence conservation but I was wrong, according to Chris Rands, the first author on the paper (see comments). He says, "In brief, we have a background model of what we expect the distribution of indels to be in neutral sequence that is not under constraint." Now I'm completely confused because what they seem to be describing is the amount of DNA that doesn't have any insertions or deletions in the genomes of other mammals. I cannot recommend this paper and I dismiss the conclusion of 8% constraint until I have a better understanding of what they mean.
--------------------------------------

This paper uses a complicated algorithm that I do not understand, and I do not intend to try. I don't really know how any of the estimates of constraint/conservation are obtained but I assume it involves looking at sequence similarity within windows of a fixed size as one scans along the aligned sequences. This paper looks at conservation between humans and several species of mammal including rhino, panda, and cattle. I'm under the impression that these other genomes are far from finished so I don't know how reliable that data is. I assume that the alignments are done by computer and I'm very skeptical of such alignments. I don't know how they eliminated pseudogenes and degenerate transposon sequences and I don't know how they deal with gene families.

Nevertheless, the estimate of constrained sequence is between 220 and 286 Mb. This corresponds to 7.1-9.2% of the genome. (The authors are using 3,100 Mb as the total size of the sequenced portion of the human genome.) Let's say it's 8%. About 1% of the genome is conserved coding sequence and the rest (7%) is noncoding.
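A quick check of that arithmetic, using the authors' figure of 3,100 Mb:

genome_size_mb = 3100          # sequenced portion of the human genome, per the authors
constrained_mb = (220, 286)    # estimated constrained sequence, in Mb

low, high = (m / genome_size_mb * 100 for m in constrained_mb)
print(f"{low:.1f}% to {high:.1f}% of the genome")  # 7.1% to 9.2%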

This estimate lies between some estimates that are as low as 5% and others that are about 15%. Here's what Rands et al. have to say about that ...
Our estimate that 7.1–9.2% of human genomes is subject to contemporaneous selective constraint considerably exceeds previous estimates and falls short of others [3], [23]. We have shown that our method's previous estimates for specific species pairs, as well as the calculation that suggested 10–15% of the human genome is currently under negative selection were inflated [3], in large part owing to inaccuracies in whole genome alignments upon which our estimates were based.
So, the best we can say is that about 8% of the sequences in the human genome appear to be under purifying selection. If you assume that this defines functional sequences then >90% of our genome is junk.

One thing is clear.
Our estimate that 7.1%–9.2% of the human genome is functional is around ten-fold lower than the quantity of sequence covered by the ENCODE defined elements [1], [5], [6]. This indicates that a large fraction of the sequence comprised by elements identified by ENCODE as having biochemical activity can be deleted without impacting on fitness. By contrast, the fraction of the human genome that is covered by coding exons, bound motifs and DNase1 footprints, all elements that are likely to contain a high fraction of nucleotides under selection, is 9%. While not all of the elements in these categories will be functional, and functional elements will exist outside of these categories, this figure is consistent with the proportion of sequence we estimate as being currently under the influence of selection.
This is one more paper that disputes the ENCODE conclusions. At some point, the major journals like Nature and Science are going to have to admit that they were duped by the ENCODE Consortium. (Are you listening, Elizabeth Pennisi? (@epennisi))

Function Wars
(My personal view of the meaning of function is described at the end of Part V.)

1. Alex Palazzo suggested that we call these the "function wars." Thanks, Alex.

Rands, C.M., Meader, S., Ponting, C.P., and Lunter, G. (2014) 8.2% of the Human Genome Is Constrained: Variation in Rates of Turnover across Functional Element Classes in the Human Lineage. PLoS Genetics 10:e1004525. [doi: 10.1371/journal.pgen.1004525]

56 comments:

Claudiu Bandea said...

As I mentioned in a comment to the second post of “The Function Wars,” we have known for almost half a century that most of the human genome, and that of other organisms with relatively high C-value, does not have informational functions, period. This is a fact.

In that comment I also mentioned that some of the researchers in the field prefer to keep genome biology and evolution in confusion, so they can continue to obtain funds and perform nonsensical research.

What I didn’t say, until now, is that some other people, such as our host Larry Moran, apparently agree with this fact yet love to keep this confusion alive. Why? Why keep talking about issues that are irrelevant to the main question remaining in the field of genome biology and evolution: does most of the genome in organisms with relatively high C-value have non-informational functions, or is most of it non-functional, metaphorically speaking, junk?

Claudiu Bandea said...

Just to clarify, obviously, it is critical to study the informational functions of genomic DNA. It is also relevant to know exactly how much informational DNA (iDNA) is present in the human genome and that of other species. However, we have known for a very long time that iDNA cannot represent more than a small fraction of the genome. But knowing exactly what this fraction is, whether 1%, 2% or 10%, doesn’t help with the question about the rest of the genome: is it functional or junk?

SPARC said...

I am just wondering if informational DNA or iDNA are established terms because most results of a Google search link to comments you left at different blogs. Pubmed didn't give any results for the term informational DNA and only 29 hits for DNA which is used as abbreviations for iron deficiency without anemia and immunization DNA in Pubmed abstracts. In addition, it appears as the name of a bioinformatics program and even as a part of a company's name (iDNA genetics Ltd.). Even in papers that deal with DNA the terms are not used in the sense as you used them above: It rather seems that they have been occasionally used to distinguish the codogenic and the coding strand of a DNA molecule during the early days of molecular biology. More recently, iDNA analysis has been coined for a method to assess mammalian diversity by the analysis of DNA isolated from carrion feeding flies. Beside some obscure esoteric article on iDNA (why am I not surprised that it is introduced by a picture of left-handed DNA?).
Thus I suggest being reluctant about using either term.

Larry Moran said...

Just to clarify, one of the most important points I am trying to make is that there are no simple definitions of "function" and "junk." There are some clear examples of spacer DNA whose sequence doesn't matter, so you can't identify function with sequence conservation.

If any of the bulk DNA hypotheses are correct then most of our genome will be functional even though it contains no relevant sequence information. It's wrong to rule out these hypotheses by defining "function" in a way that dismisses them by fiat.

As it turns out, the bulk DNA hypotheses can be dismissed for lots of other reasons but that's not the point.

SPARC said...

Sorry, I had to go to work and didn't finish the second-to-last sentence, and I can't remember what else I wanted to link to. I also forgot the "i" in "29 hits for iDNA". So here's my comment again with some minor modifications.

I am just wondering if informational DNA or iDNA are established terms because most results of a Google search link to comments you left at different blogs. Pubmed didn't give any results for the term informational DNA and only 29 hits for iDNA, which is used as an abbreviation for iron deficiency without anemia and immunization DNA in Pubmed abstracts. In addition, it appears as the name of a bioinformatics program and even as part of a company's name (iDNA genetics Ltd.). Even in papers that deal with DNA the terms are not used in the sense you used them above: it rather seems that they have been occasionally used to distinguish the codogenic and the coding strand of a DNA molecule during the early days of molecular biology. More recently, the term iDNA analysis has been used for a method that assesses mammalian diversity in a given habitat by the analysis of DNA isolated from carrion-feeding flies. And then there is some obscure esoteric article on iDNA (why am I not surprised that it is introduced by a picture of left-handed DNA?).
Thus I suggest being reluctant about using either term.

Unknown said...

Thank you, Laurence, for the interesting article. One point: you say "There are some clear examples of spacer DNA whose sequence doesn't matter so you can't identify function with sequence conservation". In fact, the paper examines sequence constraint with respect to insertion and deletion (indel) mutations, so for this method only the lengths of the ungapped alignment blocks are important and not the particular nucleotides present.
https://twitter.com/c_rands

Georgi Marinov said...

Do you have examples where very long spacers (as in tens of kb) are required and their length is very tightly constrained? That's an honest question, I am not trying to start an argument, I am just not aware of any myself.

Diogenes said...

Hi Casey & UDites!

You read this blog right?

So you all said "no junk DNA" was a prediction of Intelligent Design. A prediction.

Your dichotomy for today:

1. Is Intelligent Design falsified, or

2. Were you lying, & it's actually unfalsifiable?

Diogenes said...

A crucial point. This method can test the "bulk DNA" hypothesis.

Larry Moran said...

@Georgi Marinov,

No, I don't have any examples. The only example I use is the minimal length of an intron.

Larry Moran said...

@Chris Rands,

Thanks for responding. You do realize, I hope, that the methods you use are very poorly explained in the paper. I thought your paper was about sequence conservation because you were talking about purifying selection and "constrained sequence."

Are you now saying that what your paper actually identifies is 8% of the human genome that has not been interrupted by insertions and deletions in any other mammalian species? What is the minimum length that counts as a hit (window size)? Is it 100bp?

Unknown said...

I apologize if it was not clear. We are looking at a form of sequence conservation, but conservation with respect to indel mutations rather than point mutations.

In brief, we have a background model of what we expect the distribution of indels to be in neutral sequence that is not under constraint. Then we look at the distribution of indels in the real biological data from processed whole genome pairwise alignments. We estimate sequence constraint with respect to indels by quantifying the excess of long segments without indels observed from the data over those expected under the background model.

The methods are given in detail in the supplementary material and the work builds on previously established methods described in these 2 papers:
http://genome.cshlp.org/content/20/10/1335.long
http://www.ploscompbiol.org/article/info%3Adoi%2F10.1371%2Fjournal.pcbi.0020005

Anonymous said...

Diogenes,

ID is falsifiable in the same way that evolution is not...

Here is what Behe had to say on the subject:

“The National Academy of Sciences has objected that intelligent design is not falsifiable, and I think that’s just the opposite of the truth. Intelligent design is very open to falsification. I claim, for example, that the bacterial flagellum could not be produced by natural selection; it needed to be deliberately intelligently designed. Well, all a scientist has to do to prove me wrong is to take a bacterium without a flagellum, or knock out the genes for the flagellum in a bacterium, go into his lab and grow that bug for a long time and see if it produces anything resembling a flagellum. If that happened, intelligent design, as I understand it, would be knocked out of the water. I certainly don’t expect it to happen, but it’s easily falsified by a series of such experiments.

Now let’s turn that around and ask, How do we falsify the contention that natural selection produced the bacterial flagellum? If that same scientist went into the lab and knocked out the bacterial flagellum genes, grew the bacterium for a long time, and nothing much happened, well, he’d say maybe we didn’t start with the right bacterium, maybe we didn’t wait long enough, maybe we need a bigger population, and it would be very much more difficult to falsify the Darwinian hypothesis.

I think the very opposite is true. I think intelligent design is easily testable, easily falsifiable, although it has not been falsified, and Darwinism is very resistant to being falsified. They can always claim something was not right.”
All I can say is: show Behe he is wrong.

I certainly don’t expect you or any other scientist to even try to prove Behe wrong, so ALL you and others have left is bullying… presumptuousness…and such…

Diogenes said...

So Quest, you acknowledge that this research falsifies Intelligent Design?

Diogenes said...

An objection: sequence is never "neutral". As Motoo Kimura explained, changes to a sequence may be neutral, if the fitness is the same before and after that particular change in the sequence, but the sequence is not "neutral". When people say sequence is "neutral", they either mean it is functionless or that no change to it can affect fitness.

Neutral changes can also occur in functional, even essential sequences. For a particular sequence, some changes may be neutral while others are not.

Larry Moran said...

Chris Rands says,

In brief, we have a background model of what we expect the distribution of indels to be in neutral sequence that is not under constraint.

That's about as clear as mud. I've added a note to my post.

Unknown said...

Sorry Laurence, let me try to explain more.

We have a neutral indel model that estimates the quantity of sequence constrained with respect to indels between a species pair. The model examines the distribution of inter-gap segments (IGSs; that is, ungapped alignment blocks) from a set of whole genome pairwise alignments, using a regression approach over a range of medium IGS lengths to estimate the parameters of a predicted geometric distribution of IGSs in neutral sequence. The quantity of constrained sequence is then estimated by summing the quantity x - 2K over all the long IGSs inferred to be in excess of predictions under neutral evolution, where x is the length of the overrepresented IGS and K is the estimated mean spacing between indels (“neutral overhang”). 20 equally populated G+C content bins are analysed separately to account, in part, for mutational variation that correlates with G+C content. The X chromosome is also analysed separately.
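A rough sketch of just that final counting step (toy numbers only, not the actual code; in the real pipeline K and the length cutoff come out of the geometric regression):

def constrained_bases(igs_lengths, K, neutral_cutoff):
    # Sum x - 2K over inter-gap segments (IGSs) longer than expected under the
    # neutral model; K is the "neutral overhang" trimmed from each end.
    return sum(x - 2 * K for x in igs_lengths if x > neutral_cutoff)

# Toy example: three long ungapped blocks, mean indel spacing K = 40 bp,
# treating IGSs over 200 bp as in excess of the neutral expectation.
print(constrained_bases([350, 520, 1200], K=40, neutral_cutoff=200))  # 1830 bp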

Full details are provided in the paper

https://twitter.com/c_rands

Unknown said...

To add, this constraint with respect to indels does agree very closely with constraint with respect to point mutations. In fact between human and mouse, 97% of our 'constrained sequence' also shows sequence constraint by the more conventional measure of point mutations that you refer to (Supplementary Text S6 gives the details)

TomSeanSmith said...

Larry, could you clarify what it is you don't understand about the methods? From what I understand from Chris's explanation and the paper, they have a predicted distribution of indels if all mutations were neutral, and they looked for deviation from this null distribution. This deviation was then used to estimate the proportion of the DNA which is under purifying selection between pairs of species with a range of sequence divergences. Extrapolating from these pairs to the present day, they reached an estimate of ~8%.

Comparing pairs overcomes the problem of a low number of informative mutations between primates as well as the issue that more distant relatives only possess a portion of the genomic sequences under purifying selection in humans (see the comments in the paper regarding 2.2% sequence constraint in the human:mouse comparison).

Having enjoyed your blog for a number of years, I'm aware you'd rather engage with people who are open about their affiliations, so I should state that I'm part of Chris Ponting's group (corresponding author). I had no involvement whatsoever in this publication, but I was thinking of sending it to you before I saw that you'd already picked up on it. I assumed you would be interested in the findings given your previous posts.

Larry Moran said...

Chris Rands says,

We have a neutral indel model that estimates the quantity of sequence constrained with respect to indels between a species pair.

Sorry. That doesn't help much. I understand all the words but not the sense of what you are saying. Let's say you align the human and chimp genomes. If you subtract all the insertions and deletions, is what's left the amount of "constrained" DNA?

Full details are provided in the paper

I figure that there are only a few dozen people in the entire world who could figure out what you did from what's written in the paper.

Larry Moran said...

TomSeanSmith says,

Larry, could you clarify what it is you don't understand about the methods?

Just about everything.

From what I understand from Chris's explanation and the paper, they have a predicted distribution of indels if all mutations were neutral and they looked for deviation from this null distribution.

What is the "predicted distribution of insertions and deletions" and how in the world do you predict them? There are lots of insertions and deletions in the coding regions of genes. Does this mean they are not constrained?

Unknown said...

Imagine that indels fall randomly across the genome; this is (approximately) what we expect. Regions of the genome that are functional will be preferentially purged of indels compared to those regions that are nonfunctional. The result is that there are fewer indels (and hence longer sequences that don't contain gaps) in functional sequence than under the random expectation for nonfunctional sequence. This PDF slide illustrates this:
http://wwwfgu.anat.ox.ac.uk/~chrisr/twitter/IGS_ilustration_slide.pdf
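The same idea in a toy simulation (an illustration only, not the paper's pipeline): scatter indels uniformly at random and the gaps between neighbouring indels (IGS lengths) come out roughly geometric, so constrained regions stand out as an excess of unusually long IGSs.

import random

random.seed(1)
genome_len, n_indels = 1_000_000, 5_000
positions = sorted(random.sample(range(genome_len), n_indels))
igs = [b - a for a, b in zip(positions, positions[1:])]  # inter-gap segment lengths

mean_igs = sum(igs) / len(igs)
long_igs = sum(1 for x in igs if x > 5 * mean_igs)
print(f"mean IGS ~{mean_igs:.0f} bp; IGSs longer than 5x the mean: {long_igs}")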

Unknown said...

Those coding regions that do contain lots of indels will not be predicted as constrained. But this is the exception rather than the rule; we predict that about 85% of coding sequence bases are constrained with our indel approach.

Note that approaches looking at the sequence bases for constraint also predict, at best, about this percentage of coding sequence to be constrained. Sometimes a lower percentage is predicted, since the 3rd position of codons is often unimportant for the amino acid produced (e.g. the codons TGT and TGC both produce the amino acid Cys), so selection does not act to prevent this sequence change. By contrast, a frameshift mutation in protein coding sequence (that is, an indel whose length is not a multiple of 3 bp) is normally quite damaging, as it completely changes the downstream protein sequence.
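A small illustration of that last point (toy sequence, not real data): a 3 bp deletion removes one codon, while a 1 bp deletion shifts every codon downstream of it.

def codons(seq):
    # Split a sequence into consecutive 3 bp codons (dropping any leftover bases).
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]

original = "ATGTGTGCTAAAGGC"
print(codons(original))                     # ['ATG', 'TGT', 'GCT', 'AAA', 'GGC']
print(codons(original[:3] + original[6:]))  # 3 bp deletion: one codon lost, frame intact
print(codons(original[:3] + original[4:]))  # 1 bp deletion: every downstream codon shifted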

Claudiu Bandea said...

Laurence A. Moran : “one of the most important points I am trying to make is that there are no simple definitions of "function" and "junk."

You have made this point quite well in your post suggestively entitled “Quibbling about the meaning of the word 'function'”: We’re interested in the big picture—whether most of our genome is junk—and that’s not going to be resolved by settling on a definition of “function.”

Although we should encourage scientists and philosophers to keep refining the concept of biological function, I think that our common sense can guide us towards a sensible understanding and use of the concept of biological function. As a matter of fact, nobody thinks that most of our genome has informational functions, and that includes the ENCODE scientists (see my comment at: http://www.ncbi.nlm.nih.gov/pubmed/23479647):

”After all, the ENCODE ‘function fiasco’ was not the result of misunderstanding the concept of biological function, nor was it due to scientific incompetence as suggested by others (2). On the contrary, because it conflicted with some of the project’s objectives and with its significance, there was a concerted effort not to bring this concept forward (3); indeed, as clearly shown in a recent ENCODE publication (4), at least some ENCODE members seem well aware of the scientific rationale and criteria for addressing putative biological functions for genomic DNA.”

So, now that everyone agrees that informational DNA (iDNA) represents only a fraction of the human genome, it would make sense to address the “BIG PICTURE,” just as you suggested.

The problem is that apparently you don’t want to address it. The question is WHY?

Claudiu Bandea said...

@SPARC

You are right, “informational DNA (iDNA)" is not an established term, although I’m working hard at it. I have used this term, along with its counterpart, non-informational DNA (niDNA), mostly in blogs, particularly here at Sandwalk, which I think is a fine place to introduce new ideas and concepts, don’t you?

As you well know, the term “onion test,” a sensible metaphor for the C-value paradox, was also first introduced in blogs before making it into PubMed.

BTW, thanks for the information about the other uses of this term, which is very interesting.

roger shrubber said...

@Chris, Allow me to sum up your method so you can correct me.
You assume that most of the genome is not under purifying selection. This allows you to bootstrap a model of the distances between adjacent indels sites in a pairwise alignment. That produces a distribution which is your model. You then sweep across the alignment looking for zones where, locally, the observed distribution of distances between indels is significantly different from the typical distribution of indels. The actual distribution used is somewhat more sophisticated as it is a function of GC content.
Zones that show a skewed distribution toward larger average distances between indels have presumably been subject to purifying selection.
If that's correct, I would not trust the model unless you get the same result omitting 1 bp indels to avoid the most common sequence assembly errors.

Unknown said...

@roger, you have the right idea, although we do not assume most of the genome is not under purifying selection, but rather demonstrate that this appears to be the case by showing that the lengths of the medium-length ungapped alignments can be modeled by a geometric distribution. (We use medium lengths because shorter lengths are contaminated with sequence errors, as you point out, and longer sequences contain constrained elements.)

The alignments are processed to remove poorly aligning sequence prior to application of the model (using a log-odds approach to trim off sequence), and the results are fairly robust to different assembly builds.

roger shrubber said...

Thanks for the clarification. I maintain, however, that you do assume most of the genome is not under selection or else you could not bootstrap your model distribution and it would not make sense to take your approach.
Alternatively, you are just determining relative local window constraints of purifying selection. I wonder if the whole of it wouldn't be better done using Fourier transforms of the intergap sequence distances.

Unknown said...

The only word I object to is "assume," as I feel we demonstrate rather than assume this, but I'm sure there are improvements that could be made to the model (models, in fact, as we take two different approaches). If you're interested, the first of these models has been described previously in this paper (although we have modified the approach):
http://www.ploscompbiol.org/article/info%3Adoi%2F10.1371%2Fjournal.pcbi.0020005

Larry Moran said...

@Chris Rands

re: http://wwwfgu.anat.ox.ac.uk/~chrisr/twitter/IGS_ilustration_slide.pdf

After looking at your slide, I have several questions. First, the two sequences have to be aligned so it's obvious that sequence similarity is a key component of "constrained" sequences. In order to do the alignment you have to make some decisions about cutoff values. What are those values? Is it something like >35% sequence identity over 100bp allowing for some insertions and deletions?

How do you do the alignment and make decisions about gaps? The figure you show in your slide has an obvious error surrounding the 3bp segment. You can easily eliminate both the gaps on either side while giving up only one identity (out of two). You could also eliminate one of the gaps surrounding the second 2bp sequence (CG) and create a single 3bp gap. This suggests that your gap opening penalties are way too low. What is the penalty in your algorithm?

How do you decide which part of the intergap sequences are constrained? There appears to be a 9bp conserved block on the left-hand side but you've only counted 7bp as "constrained." Why? What is the window size for determining constrained sequence and what is the percent sequence identity that you use to conclude that the sequences are homologous? (If they aren't homologous then you can't really say that they are constrained, can you?)

When you align the human and mouse genomes, what total percentage of the human genome meets your criteria for homology? In other words, how much of it aligns using your alignment algorithm? How does this compare to your calculation of constrained sequence?

I'm trying to figure out the difference between "constrained" and "conserved." I'm also trying to figure out what you mean by "functional" in Figure 4.



Peter Perry said...

"A group of researchers from the University of Helsinki and the Universitat Autònoma de Barcelona have been able experimentally to reproduce in mice morphological changes which have taken millions of years to occur. Through small and gradual modifications in the embryonic development of mice teeth, induced in the laboratory, scientists have obtained teeth which morphologically are very similar to those observed in the fossil registry of rodent species which separated from mice millions of years ago."

"The researchers observed that the teeth formed with different degrees of complexity in their crown. The more primitive changes observed coincide with those which took place in animals of the Triassic period, some two hundred million years ago. The development of more posterior patterns coincides with the different stages of evolution found in rodents which became extinct already in the Paleocene Epoch, some 60 million years ago. Researchers have thus achieved experimentally to reproduce the transitions observed in the fossil registry of mammal teeth."

http://www.eurekalert.org/pub_releases/2014-07/uadb-sre073014.php

Claudiu Bandea said...

Laurence A. Moran: “Now I'm completely confused because what they seem to be describing is the amount of DNA that doesn't have any insertions or deletions in the genomes of other mammals. I cannot recommend this paper and I dismiss the conclusion of 8% constraint until I have a better understanding of what they mean.”

Larry,

I’m a fan of your blog, and like many other readers I appreciate your contributions and effort; it is admirable that you cover so many subjects, day after day, and we thank you for that.

When openly (i.e. ‘live’) writing about so many issues, like you do, it’s only normal to make mistakes; this might explain why so many scientists prefer to have their opinions and ideas screened by a closed peer review system before making them public, which lowers the chances of embarrassment. So, again, I admire your contributions.

However, as pointed out by your colleague and friend T. Ryan Gregory (“Sorry, Larry -- you know I appreciate your blog posts, but it's clear that you didn't understand the paper(s) you criticize”; http://sandwalk.blogspot.com/2014/07/the-function-wars-part-ii.html), it seems that you are not reading the papers you write about carefully enough; case in point, the Rands et al. (2014) paper, which you describe in this post.

As clearly shown in the quotes below (first quote from the Abstract), even a casual reader would have realized what the paper was about. This is not fair to the authors of the paper, so I think you owe them an apology, particularly to the first author, Chris Rands, who was kind enough to respond to your post in detail.

“Here, we address these issues by identifying and characterising sequence that has been constrained with respect to insertions and deletions for pairs of eutherian genomes over a range of divergences.”

”NIM1 is a quantitative model describing the distribution of distances between neighbouring indels (intergap segments; IGSs) in neutrally evolving sequence…”

“To provide further assurance of the accuracy of the derivation we introduced a new likelihood neutral indel model (NIM2)…”

“We applied the neutral indel model to estimate αselIndel on trimmed whole genome alignments between a wide range of eutherian species pairs for which high quality genome assemblies are available.”

“We therefore estimate that 8.2% of the human genome (253 Mb; 95% CI 7.1%–9.2%, 220–286 Mb) is presently under purifying selection with respect to indels.”

Claudiu Bandea said...

Diogenes:

“An objection: sequence is never "neutral". As Motoo Kimura explained.…”

Good point. I’m working on a paper that covers this and some other issues regarding the neutral theory, and I would love to send it to you for review when ready (if open for that, please send me an e-mail).

”A crucial point. This method can test the "bulk DNA" hypothesis”

The models employed by Rands et al. (2014) can only test some "bulk DNA" hypotheses (there are many of them, and they are very different from each other), and only if we really, really stretch them. For example, we could hypothesize that most of the genomic DNA serves as spacers between various informational DNA (iDNA) segments, for example between various promoter elements and the regions coding for proteins and functional RNAs, and that this ‘spacer DNA’ is highly constrained in regard to its length.

However, the current data clearly show that only a very small percentage (likely, less than 1% of the human genome) might function as highly specific (in regard to length) ‘spacer DNA.’

Unknown said...

@Laurence, the slide is merely an illustrative schematic and does represent a real alignment, it is too small to provide any meaningful information about sequence constraint. The gap penalties (determined mainly by UCSC genome informatics) are provided in the paper as are the answers to your other questions.

Unknown said...

@Claudiu Bandea, thank you for your posts. In regard to Motoo Kimura's point about neutrality, you may be interested in a short article I wrote about defining functional sequence:
http://wwwfgu.anat.ox.ac.uk/~chrisr/twitter/Defining_functional_sequence.pdf

Unknown said...

Correction: 'does NOT represent a real alignment'

Larry Moran said...

Chris Rands says,

the slide is merely an illustrative schematic and does NOT represent a real alignment, it is too small to provide any meaningful information about sequence constraint.

Okay. But I recommend that you make it more accurate. It's not that hard.

The gap penalties (determined mainly by UCSC genome informatics) are provided in the paper as are the answers to your other questions.

Thanks. I don't have time to try and extract the answers to my questions from the supplements and references but I understand why you don't want to put the answers here.

Larry Moran said...

@Claudiu Bandea

You have no idea how much I appreciate your constant criticisms of my reading and comprehension abilities.

In spite of what Chris Rands said, I'm now pretty sure that I was right the first time. They are only looking at conserved DNA where the sequences have significant similarity. That's because the only way they can actually recognize deletions and insertions is when they occur in regions they can align. I'm not sure about the difference between "conserved" sequence and "constrained" sequence but Chris tells me it's buried somewhere in the paper or the supplements. (I can't see most of the supplements.)

Claudiu Bandea said...

@Chris Rands

Thanks for your interesting paper. Do you plan to publish it?

Claudiu Bandea said...

Laurence A. Moran says: “I can't see most of the supplements”

They are openly available at the journal site, but if for whatever reasons you can’t open them, I’m pretty sure Chris will send them if you really want to study them.

Unknown said...

@Claudiu

I do not have plans to publish, but it would be nice to promote the ideas. You are welcome to use my ideas in your own paper if you wish, but a mention in the acknowledgments would be appreciated if you do.

If you feel I could contribute significantly to your work then we could even consider a joint publication, I guess (although this may not interest you, of course); do email me if you are interested: chris.rands@dpag.ox.ac.uk

Anonymous said...

Diogenes,

For more than 2 years or so you have been accusing ID proponents and others of lying, quote-mining, etc… You have also made many claims that many, many theories explain the theory of evolution… I’m not sure if you ever provided any evidence for your claims. So, after this long wait, I would like you to set the matter straight. You, Larry, Jeffrey S and Nick M are so confident that “the TRUTH” will eventually win that you are almost willing to put your lives on it…
All I want is proof; knock out the genes for the bacterial flagellum and make it grow one. I will give you more than Behe: make the gene defective, so that the bacterium grows a defective flagellum that doesn’t propel it but can instead be used for something else; paddling, sailing, or steering the bacterium. Just make sure that natural selection can provide evidence for your claims, that it is falsifiable as you have claimed all along…

Claudiu Bandea said...

Thanks Chris, I'll contact you.

Anonymous said...

Diogenes,

I think you missed my comment...;-)

judmarc said...

I will give you more than Behe: make the gene defective, so that the bacterium grows a defective flagellum that doesn’t propel it but can instead be used for something else; paddling, sailing, or steering the bacterium. Just make sure that natural selection can provide evidence for your claims, that it is falsifiable as you have claimed all along…

Already been done. I believe Nick Matzke is co-author of the peer reviewed journal article on this.

judmarc said...

Providing additional information on my last comment: The actual forerunner identified in the article by Matzke et al. is not for locomotion purposes, but is part of a system used for secreting or injecting.

judmarc said...

Oh, and here's more, from, y'know, like, real scientists: http://jb.asm.org/content/189/19/7098.full.pdf+html

Anonymous said...

Oh yeah...? I've been waiting for Nick Matzke to respond to this and other challenges I presented to him a few months ago, but he has been avoiding me... I think Nick and others who failed to respond to my challenges hope that I will just go away... They simply don’t want these issues to be real…

You see, when you blindly believe that something is real, because… say… your career, your status or your reputation hangs on it… you refuse to see beyond this veil or filter.... You continue blindly to look past the obstacles and issues, ignoring them as if they were nonexistent even if they are real and true… I like to call it “scientific blindness” that is calculated and deliberate… shameless…

Anonymous said...

judmarc,

Stop embarrassing yourself!!! Let Nick and others do that.... they are already used to it lol

judmarc said...

Your considering citation of peer reviewed scientific research to be embarrassing certainly explains a lot.

Anonymous said...

Judmarc,

Maybe you can explain this issue to me or provide me with peer-reviewed experiments that explain it:

Enzymes are needed to produce ATP, but energy from ATP is needed to produce enzymes. Likewise, DNA is required to make enzymes, but enzymes are required to make DNA.
And proteins can be made only by a cell, but a cell can be made only with specific proteins. So, how is this ALL possible from an evolutionary perspective?

Also, please don't embarrass yourself with links to speculative papers that try to explain how, in a magical way, a syringe became a flagellum with tens of additional parts appearing due to the magic of natural selection....

While you're at it, link me to actual experiments that prove all of this magic... lol

judmarc said...

Also, please don't embarrass yourself with links to speculative papers that try to explain how, in a magical way, a syringe became a flagellum with tens of additional parts

You mean tens of parts that were already present in much the same form? I'm not surprised it's difficult for you to understand what's actually being explained if your world view depends on the non-existence of any natural explanation.

There's a table in the Pallen/Matzke paper of the parts necessarily involved in the flagellum, and those for which pre-existing homologous structures have been found. There are a total of 23 parts that are necessary for functional flagella. Of these, pre-existing homologous structures have been found for 21. Guess it's impossible that we just haven't found the other 2, and thus completely necessary and logical that the Creator cooked them up, right?

Re "in a magical way," the paper I linked provides some science on the subject, no magic involved or necessary. Do they have the complete answer yet? Nope. Guess that means again that it's impossible we'll ever know more, so we should just admit right now the Creator did it. Not that that would constitute "in a magical way" - it's not magic, it's religion!

Diogenes said...

Quest, I asked you many direct questions and you never answered any. E.g. I asked you 100 times how you knew Witton was in jail and you never answered. Me asking you questions always made YOU shut up, not the other way around.

As for your "which came first, proteins or a cell" question, that was answered in great detail by one of us-- I believe Colnago/SLC-- who went to a lot of trouble referencing scientific data showing that your question is based on false premises, e.g. you do NOT need a cell to make DNA, so all that is false. And the RNA world hypothesis describes simpler versions of biochemical reactions without the interdependency that exists today.

This was explained to you in detail by Colnago/SLC, yet you keep copying the question and you falsely say no one answered it. That is Gish Gallop dishonesty. You have to point out flaws in Colnago's answer.


For more than 2 years or so you have been accusing ID proponents and others of lying, quote- mining, etc… You have also made many claims that many, many theories explain the theory of evolution… I’m not sure if you ever provided any evidence for your claims.


Bull. I include detailed references & hyperlinks for the things I say. $%&! you, who do you think you are fooling?

CatMat said...

"[Y]ou do NOT need a cell to make DNA, so all that is false. And the RNA world hypothesis describes simpler versions of biochemical reactions without the interdependency that exists today."

AIUI, the one thing that makes this easier to grasp is that now it is somewhat hard to construct a DNA generating pipeline precisely because the needed enzymes and raw materials tend to be locked in cells.

This was not the case for the scenario of the RNA world hypothesis, where the availability of raw materials would have been bound mostly by their chemical stability and hydrophobicity. This makes even surface catalysis a viable source of selection pressure until the game really started with the emergence of something like ribozymes.

Anonymous said...

The bacterial flagellum has 40 parts... If Nick Matzke claims that the same flagellum will work with 23 parts, I will sponsor such an experiment.... I'm serious.... I will not even take credit for it if it is true... set it up!

Anonymous said...

Diogenes,

You are such a coward... I can't believe some people here think you may be a scientist... You always look for a way out instead of addressing the real issue... your religion "science" is going down and people like me and your favorite John Witon will spend every penny we have so that your religion is treated with what it deserves...