Saturday, April 19, 2014

Branko Kozulic responds: Part II

Branko Kozulic has asked me to post some additional comments. I'm happy to do so since it reflects a sincere attempt to learn about evolution and come to grips with the concepts of neutral alleles and random genetic drift. We should encourage the creationists to continue this effort whenever possible.
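The discussion turns on neutral alleles and random genetic drift. The sketch below is an editorial illustration, not a calculation from the post or the comments. It uses the standard Wright-Fisher model and assumes a diploid population of constant size N (so 2N gene copies), no selection, and binomial resampling each generation. A single new neutral mutant fixes with probability 1/(2N). Because about 2Nμ such mutants arise per generation, the expected substitution rate is 2Nμ × 1/(2N) = μ, whatever the population size. The function names and parameter values are hypothetical.

```python
import random


def neutral_fixes(two_n: int, rng: random.Random) -> bool:
    """Follow one new neutral mutant under Wright-Fisher drift.

    Starts with a single copy among two_n gene copies and resamples the
    population each generation until the mutant is lost (0) or fixed (two_n).
    """
    count = 1
    while 0 < count < two_n:
        freq = count / two_n
        # Binomial sampling: each of the two_n copies in the next generation
        # independently descends from a mutant copy with probability freq.
        count = sum(1 for _ in range(two_n) if rng.random() < freq)
    return count == two_n


def fixation_fraction(n: int, replicates: int, seed: int = 1) -> float:
    """Fraction of new neutral mutants that fix in a diploid population of size n."""
    rng = random.Random(seed)
    two_n = 2 * n
    fixed = sum(neutral_fixes(two_n, rng) for _ in range(replicates))
    return fixed / replicates


def neutral_substitution_rate(n: int, mu: float) -> float:
    """Expected neutral substitutions per generation.

    2N*mu new mutants arise per generation (mu per gene copy), and each
    fixes with probability 1/(2N), so the population size cancels out.
    """
    return (2 * n * mu) * (1 / (2 * n))


if __name__ == "__main__":
    n = 5  # tiny hypothetical population so the simulation runs quickly
    print("simulated fixation fraction:", fixation_fraction(n, replicates=20_000))
    print("expected 1/(2N):           ", 1 / (2 * n))
    for size in (10, 10_000):  # hypothetical sizes; mu is also hypothetical
        print(size, neutral_substitution_rate(size, mu=1e-8))
```

The small population keeps the simulation fast. The point of the last function is that the population size cancels out of the neutral substitution rate.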

As usual, my policy is not to comment on posts from guests, especially those that disagree with my views. So here's Branko Kozulic's latest take on the "evidence" as he sees it.
I thank Professor Moran for posting my text.

This reply to the comments has grown much longer than I initially anticipated. At the outset, to prevent misunderstandings, I declare my complete ignorance of paleontology and, more generally, that my ignorance is far greater than my knowledge. And I am prepared to repeat this as many times as the readers are willing to read it.

From the discussion to date, it seems that we do not have a simple model that could show the fixation of 22,000,000 mutations. That no simple model will work becomes evident from even a glance at, for example, the Beerli & Felsenstein paper.

Our colleague McBride's reference to the Li & Durbin paper provides an opportunity to touch upon two other topics. In addition to the Li & Durbin article, I have also carefully read the paper by Gronau et al., which deals with the same topic. Both research groups started with data derived from the sequenced genomes of several humans (7 and 6, respectively) from various sub-populations, with the goal of illuminating past fluctuations in effective population size. Both groups arrived at similar results: the first paper reports an Ne of 15,313 ± 559 for the Yoruba population, while the second gives a range of 7,500 to 10,500. Given this congruence and the quality of the journals in which the results were published, I have no doubts about their correctness: I trust in the internal consistency of these results. But, like all other results that come out of scientific models, they depend on certain starting assumptions. Now it's time to look more closely at the starting assumptions of these models, because there is an external inconsistency, with another set of experimental data, as will become apparent from three other papers.

Figure 1 of the Nielsen et al. paper shows multiple (up to 21) synonymous and non-synonymous substitutions in thousands of chimp proteins compared with the corresponding human proteins. In the second paper, Behe & Snoke conclude that generating a new function that requires mutations at two amino acids, such as forming a disulfide bridge, takes 10^8 generations with a population size of at least 10^9. In the third paper, Lynch countered that much smaller populations could reach this goal in less time. Now, if we take the human population size from the two studies above (Ne about 10^4) as the population size in Figure 3 of the Lynch article, we can see that it would take 10^8 generations for a new function to arise, even if the two changed amino acids could be any 2 of 50 (with a high s = 0.01). For humans, 10^8 generations means 2 billion years: an impossibly long period. Needless to say, a new function requiring 3, 4 … up to 21 amino acid changes would take much longer than 10^8 generations. And yet there are about 3,000 proteins with 3 amino acid substitutions, over 1,000 proteins with 5 substitutions, etc. A contradiction is thus evident, by a wide margin, between the experimental data on the one side and the Lynch model plus the Li & Durbin and Gronau et al. modeled results on the other. Therefore, either the starting assumptions of the Lynch model, or those of the human population size models, or both, are false.
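The step from generations to years in the paragraph above is plain multiplication. A minimal sketch, assuming a generation time of 20 years. That is the value implied by the figure of 10^8 generations for roughly 2 billion years; the text does not state it explicitly. The function name is hypothetical.

```python
def generations_to_years(generations: float, years_per_generation: float = 20.0) -> float:
    """Convert a number of generations to calendar years.

    The default of 20 years per generation is an assumption, chosen because
    it reproduces the 10^8 generations ~ 2 billion years figure in the text.
    """
    return generations * years_per_generation


if __name__ == "__main__":
    years = generations_to_years(1e8)
    print(f"{years:.1e} years")  # 2.0e+09 years
```

A different assumed generation time (say 25 years) scales the result proportionally.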

Furthermore, we should not lose sight of the singletons/orphans found in all sequenced genomes. Population genetics, which deals with changes of allele frequencies in populations, will necessarily remain "agnostic" in relation to the singletons: they are, simply put, beyond its horizon.

Some of the comments referred to religious implications. I deny the existence of direct contact between scientific conclusions and religion. In my view, each particular scientific conclusion must pass through the filter of philosophy and emerge on the other side as part of a general statement (a universal); universals are then incorporated into a philosophical system, and that philosophical system may or may not accord with the philosophy (or theology) of a religion. In my opinion, the only level at which the implications of scientific conclusions can be meaningfully discussed is the philosophical level.

Allow me to conclude with some additional philosophical thoughts. Science is not a democracy but a dictatorship: a dictatorship of experimental data, and only of experimental data. Can a scientist ignore the experimental data that contradict his favorite theory without betraying his vocation as a scientist? I do not think so.

Moreover, I do not expect other scientists to provide answers to all my questions; nor can I answer all their questions: this is the normal state of affairs.

I wish to extend my thanks to all participants, especially Professor Felsenstein, for the courtesy they showed during this discussion.
Beerli, P. and Felsenstein, J. (2001) Maximum likelihood estimation of a migration matrix and effective population sizes in n subpopulations by using a coalescent approach. Proc. Natl. Acad. Sci. (USA) 98:4563–4568. [doi: 10.1073/pnas.081068098]

Behe, M.J. and Snoke, D.W. (2004) Simulating evolution by gene duplication of protein features that require multiple amino acid residues. Protein Science 13:2651–2664. [doi: 10.1110/ps.04802904]

Gronau, I., Hubisz, M.J., Gulko, B., Danko, C.G., and Siepel, A. (2011) Bayesian inference of ancient human demography from individual genome sequences. Nature Genetics 43:1031–1034. [doi: 10.1038/ng.937]

Li, H. and Durbin, R. (2011) Inference of human population history from individual whole-genome sequences. Nature 475:493–496. [doi: 10.1038/nature10231]

Lynch, M. (2005) Simple evolutionary pathways to complex proteins. Protein Science 14:2217–2225. [doi: 10.1110/ps.041171805]

Nielsen, R., Bustamante, C., Clark, A.G., Glanowski, S., Sackton, T.B., Hubisz, M.J., Fledel-Alon, A., Tanenbaum, D.M., Civello, D., White, T.J., Sninsky, J.J., Adams, M.D., and Cargill, M. (2005) A scan for positively selected genes in the genomes of humans and chimpanzees. PLoS Biology 3:e170. [doi: 10.1371/journal.pbio.0030170]


14 comments :

  1. "For humans, 108 generations mean 2 Billion years"

    What what what now? Is he having timesing trouble? It seems the IDers are again tripped up by that (for them) insurmountable obstacle, multiplication.

    I would like to know if 108 generations is time to mutation, time to fixation, or both.

    Note that Kozulic is trying the old "but the organisms must stop mutating & wait until modern-day mutation X happens before they're allowed to start mutations Y, Z, etc." argument.

    1. He evidently meant 10^8, but some formatting must have been lost in the process of copying the letter.

    2. Apologies for misunderstanding.

      Kozulic's problem is trusting Behe's shit math. Molecular biologists doing random mutagenesis create disulfide bridges *by accident* all the time.

      Overheard in a lunchroom:

      "Hi Bob, why so glum?"

      "Was doing random mutagenesis on my protein and made a disulfide bridge by accident."

      "Well Bob, IDer Michael Behe says blind chance can only do that once every 2 billion years, therefore, when it happens it can only be God $%&#ing with you. It's not like IDiots bias their bullshit probability calculations. Only possible explanation: Middle Eastern war deity $%@king with your test tubes."

  2. "Needless to say, a new function requiring 3, 4 … up to 21 amino acid changes would take much longer than 10^8 generations."

    Isn't Branko misapplying the Behe vs Lynch debate? If I remember correctly, Behe was talking about 2 or more simultaneously required mutations. If so, Branko is implicitly assuming that there's no (nearly) neutral road of mutations to any of these proteins from their common ancestral state, and that up to as many as all 21 substitutions are required to happen simultaneously to get any kind of functional product. That's absurd.

    I'm afraid Branko is the one who should be checking his assumptions at the door.

  3. I find it uncharitable of him not to acknowledge the fact that his misunderstanding of paleoanthropology didn't constitute a challenge to population genetics. His ignorance of paleontology wasn't the issue under discussion. The issue was that he relied on a misunderstanding of paleoanthropology to construct a case against modern evolutionary theory, and when that misunderstanding was exposed he just moved to another topic without acknowledging that his first challenge was null.

    As for this response, I think that he's going on a tangent by not addressing Larry's main points in the original post. The first paper that he cited (Nielsen et al.) is talking about *positively selected* genes and the Behe & Snoke paper (regardless of its merits) is talking about evolutionary innovation and the rise of novel functions at the genetic level. All the while Larry and others have been talking about neutral substitutions that neither harm nor benefit the organism in any measurable or direct manner.

    I think it's time that Dr. Kozulic addresses the main points that Larry discussed in his original post. Is neutral theory broadly consistent with the data we have from the genomes of chimps and humans? If not, then what parameters in Larry's calculation are wrong, inaccurate or perhaps misleading?

  4. Lynch already discussed Kozulic's errors in the very paper Kozulic cites. I'd like to naively try to restate them, if only to test my understanding:

    Kozulic calculates that it is unlikely a protein with a "new function" would have arisen in 5 million years. Then he claims that, in contradiction, we *do* see proteins with new functions: 3000 proteins have 3 nonsynonymous mutations. But:

    a. nonsynonymous mutations do not equal "new function": even nonsynonymous mutations are generally neutral, as Lynch extensively discusses. Both facts Kozulic cites are therefore consistent and suggest there were very few proteins with "new functions" in the last 5myr. Indeed, many of the differences between us are thought to be due to regulatory changes.

    b. Kozulic missed Lynch's discussions of the many, many ways his estimate of the neofunctionalization time may be a significant overestimate, and that the evolutionary pathway Behe and Lynch are discussing is highly atypical and difficult anyway. New functions are much easier to evolve.

    So Kozulic significantly underestimates the predicted number of "new functions", while significantly overestimating the observed number of "new functions".

    Kozulic also cites his vixra paper on singletons, which is embarrassingly wrong. He seems to assume that 'singleton' genes in humans spontaneously appeared from nothing at the moment of ape divergence!

  5. One issue frequently missed, and not just by opponents of evolutionary theory, is the contribution of mechanisms that change multiple residues at a time. Frame shifts, inversions (giving a portion of antisense transcription), wholesale deletions or translocations of segments within and between ORFs, and chimeric recombinations between independent genomes provide a massive increase in the dimensionality of sequence space over that explorable by point mutation alone. They can also (depending on the algorithm and evolutionary model used in alignment) lead to an obscuring of phylogenetic signal.

    Recombination itself substantially increases the likelihood of a particular multi-position change arising, which Behe omits from calculations. If A-only and B-only subpopulations persisting through drift can interbreed, they will produce an A+B sub-sub-population with greater likelihood than if A and B remain reproductively isolated.

    Another issue is the ability of serial substitution in many regions to completely replace sequence while retaining the same structure, due to the dependence upon property, not absolute side-chain fixity, at many sites. Further, this latitude in substitution over multiple sites provides a more complete exploration of the protein neighbourhood than that allowed by a model that changes only one site against a constrained background. Resolution of many apparent 'singletons' may simply await finer-grade sequential and structural analysis.

    Proteins are modular, as is the relationship between regulatory and exon regions, they are plastic, and function is more important than sequence.

  6. A lot of thoughtful discussion of protein evolution here. But let me ask a different question:

    Branko Kozulić says that

    And that no simple model will work becomes evident, for example, with just a glance at the Beerli & Felsenstein paper.

    That paper is about constructing a maximum likelihood method (using Markov Chain Monte Carlo methods) for inferring migration rates among populations. (More precisely, for inferring the products N_i m_ij, where N_i is the local population size of population i and m_ij is the rate of migration from population j to population i.)

    I'm kind-of familiar with that paper, and I don't recall where it says that no simple model will work for evolution of the difference between the human and chimpanzee reference sequences. It does use a data example which shows that the human species is not all one big random-mating population. Not a very new or surprising conclusion. Could that have been what Kozulić found relevant?

    1. Don't you hate it when they quote your own papers at you and then tell you you don't understand what you meant?

  7. Not sure that I understand why creationists fixate on "new functions" in proteins for evolution (usu. of humans) to have occurred. It seems more likely to me that phenotypic changes are primarily due to alterations of developmental "programs", not the production of functionally 'new' proteins.

    1. Since we have roughly the same number of protein-coding genes as Caenorhabditis elegans (with its ~1000 somatic cells), it should be obvious that the relationship between the number of functional proteins and phenotypic "sophistication" is anything but straightforward.

  8. Kozulic says:

    "Furthermore, we should not lose sight of singletons/orphans found in all sequenced genomes. Population genetics – which deals with changes of allele frequencies in populations – necessarily will remain “agnostic” in relation to the singletons: they are, simply put, beyond its horizon."

    Unfortunately for ID proponents, orphan genes are not a mystery anymore. There's increasing evidence for de novo gene synthesis from intergenic ancestral sequences, as the following new studies report:

    1. Emergence of a New Gene from an Intergenic Region (in Mouse)

    http://www.cell.com/current-biology/abstract/S0960-9822(09)01475-4?cc=y?cc=y

    2. Origin and Spread of de Novo Genes in Drosophila melanogaster Populations

    http://www.sciencemag.org/content/343/6172/769.abstract

    3. Female fly genomes also populated with de novo genes derived from ancestral sequences

    http://phys.org/news/2014-03-female-genomes-populated-de-novo.html

  9. It's great that Branko Kozulic is continuing the dialogue here, but I don't really see what the purpose of this post is other than to change topics. We have a situation where a fairly nicely focussed discussion about genetic drift has been replaced with some blurry ideas about deterministic protein changes of types that are not and have never been predicted to occur in evolutionary theory.

    Kozulic states: "Needless to say, a new function requiring 3, 4 … up to 21 amino acid changes would take much longer than 10^8 generations. And yet there about 3,000 proteins with 3 amino acid substitutions, over 1,000 proteins with 5 substitutions, etc. A contradiction is thus evident between the experimental data on the one side and the Lynch model plus Li & Durbin and Gronau et al., modeled results on the other, by a wide margin. Therefore, either the starting assumptions of the Lynch model, or of the human population size models, or of both, are false."

    This is exactly the type of conclusion that shows the fallacious reasoning from the ID camp. Lynch shows that complex protein adaptations can certainly occur in reasonable timeframes under a neo-Darwinian evolutionary model. What he also shows and what Kozulic is extrapolating from (e.g. Fig 3) is that it is not reasonable to expect pre-specified pairs of mutations to arise in populations of the size of humans. Because evolution is not theorised to work this way, this tells us nothing about the starting assumptions of Lynch's model nor that used by Li and Durbin.

    As ealloc points out above, Lynch's paper gives plenty of detail about the numbers of neutral amino acid substitutions that can occur in proteins. With this in mind, most of the differences between human and chimp genes should have occurred by drift, in line with all of the discussion that has gone before and with the limited evidence for positive selection found by Nielsen et al.

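Comment 5 argues that recombination makes an A+B combination far more likely once A-carrying and B-carrying lineages interbreed. The sketch below illustrates that point with the textbook two-locus result. It is a deliberate simplification and not a model taken from the comment: an infinitely large, randomly mating population with no selection, drift, or new mutation. If A and B start on separate chromosomes, the A+B haplotype begins at frequency 0, so the linkage disequilibrium is D_0 = -p_A p_B. Recombination at rate r shrinks D by a factor (1 - r) each generation, so the A+B frequency rises toward p_A p_B as p_A p_B (1 - (1 - r)^t). With r = 0 it stays at zero. The function name, allele frequencies, and recombination rates are hypothetical.

```python
def ab_haplotype_frequency(p_a: float, p_b: float, r: float, generations: int) -> float:
    """Frequency of the A+B haplotype after `generations` rounds of random mating.

    Assumes an infinite population (no drift), no selection, no new mutation,
    and that A and B start on separate chromosomes (A+B frequency 0), so the
    initial linkage disequilibrium is D0 = -p_a * p_b.
    """
    d0 = -p_a * p_b
    d_t = d0 * (1 - r) ** generations  # recombination shrinks D by (1 - r) per generation
    return p_a * p_b + d_t


if __name__ == "__main__":
    p_a, p_b = 0.1, 0.1  # hypothetical frequencies of the A-only and B-only variants
    for r in (0.0, 0.01, 0.5):
        freqs = [ab_haplotype_frequency(p_a, p_b, r, t) for t in (1, 10, 100)]
        print(f"r={r}: " + ", ".join(f"{f:.5f}" for f in freqs))
```

Without recombination (r = 0) the double combination can only appear through a fresh mutation on an existing A or B background. With any r > 0 it is assembled from variants already present.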