Monday, May 02, 2016

The Encyclopedia of Evolutionary Biology revisits junk DNA

The Encyclopedia of Evolutionary Biology is a four-volume set of articles by leading evolutionary biologists. An online version is available at ScienceDirect. Many universities will have free access.

I was interested in what they had to say about junk DNA and the evolution of large complex genomes. The only article that directly addressed the topic was "Noncoding DNA Evolution: Junk DNA Revisited" by Michael Z. Ludwig of the Department of Ecology and Evolution at the University of Chicago. Ludwig is a Research Associate (Assistant Professor) who works with Martin Kreitman on "Developmental regulation of gene expression and the genetic basis for evolution of regulatory DNA."

As you could guess from the title of the article, Michael Ludwig divides the genome into two fractions: protein-coding genes and noncoding DNA. The fact that organismal complexity doesn't correlate with the number of protein-coding genes is a problem that requires an explanation, according to Ludwig. He assumes that the term "junk DNA" was used in the past to account for our lack of knowledge about noncoding DNA.
Eukaryotic genomes mostly consist of DNA that is not translated into protein sequence. However, noncoding DNA (ncDNA) has been little studied relative to proteins. The lack of knowledge about its functional significance has led to hypotheses that much nongenic DNA is useless "junk" (Ohno, 1972) or that it exists only to replicate itself (Doolittle and Sapienza, 1980; Orgel and Crick, 1980).
Ludwig says that we now know some of the functions of noncoding DNA and one of them is regulation of gene expression.
These regulatory sequences are distributed among selfish transposons and middle or short repetitive DNAs. The genome is an extremely complex machine; functionally as well as structurally it is generally not possible to disentangle the regulatory function from the junk selfish activity. The idea of junk DNA needs to be revisited.
Of course we all know about regulatory sequences. We've known about this function of noncoding DNA for half a century. The question that interests us is not whether noncoding DNA has a function but whether a large proportion of noncoding DNA is junk.

Ludwig seems to be arguing that a significant fraction of the mammalian genome is devoted to regulation. He doesn't ever specify what this fraction is but apparently it's large enough to "revisit" junk DNA.

The biggest obstacle to his thesis is the fact that only 8% of the human genome is conserved (Rands et al., 2014). Ludwig says that 1% of the genome is coding DNA and 7% "has a functional regulatory gene expression role" according to the Rands et al. study. This is somewhat misleading since Rands et al. specifically mention that not all of this conserved DNA will be regulatory.

All of this is consistent with a definition of function specifying that it must be under negative selection (i.e. conserved). It leads to the conclusion that about 90% of the human genome is junk. That doesn't require a re-evaluation of junk.

In order to "revisit" junk DNA, the proponents of the "complex machine" view of evolution must come up with plausible reasons why lack of sequence conservation does not rule out function. Ludwig offers up the standard rationales ...
  1. Some ultra-conserved sequences don't seem to have a function and this "shows that the extent of sequence conservation is not a good predictor of the functional importance of a sequence."
  2. The amount of conserved sequence depends on the alignment and alignment is difficult.
  3. About 40%-70% of the noncoding DNA in Drosophila melanogaster is under functional constraint within the species but not between D. melanogaster and D. simulans. Therefore, some large fraction of functional regulatory sequences might only be conserved in the human lineage and it won't show up in comparisons between species. (Does this explain onions?)
The idea here is that there is rapid turnover of functional DNA binding sites required for regulation but the overall fraction of DNA devoted to regulation remains large. This explains why there doesn't seem to be a correlation between the amount of conserved DNA and the amount that can possibly be devoted to regulating gene expression. The argument implies that much more than 7% of the genome is required for regulation. The amount has to be >50% or so in order to justify overthrowing the concept of junk DNA.

That's a ridiculous number, but so is 7%. Imagine that "only" 7% of the genome is functionally involved in regulating expression of the protein-coding genes. That's 224 million base pairs of DNA or approximately 10 thousand base pairs of cis-regulatory elements (CREs) for every protein-coding gene.
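
The arithmetic above can be checked with a short sketch. The genome size of 3.2 billion base pairs is implied by the 224 million figure; the gene count of 20,000 is an assumed round number that does not appear in the post.

```python
# Rough check of the "7% of the genome" arithmetic.
GENOME_SIZE_BP = 3.2e9         # implied by 224 million bp being 7% of the genome
REGULATORY_FRACTION = 0.07     # fraction attributed to regulation in the post
PROTEIN_CODING_GENES = 20_000  # ASSUMPTION: round number, not stated in the post

regulatory_bp = GENOME_SIZE_BP * REGULATORY_FRACTION
bp_per_gene = regulatory_bp / PROTEIN_CODING_GENES

print(f"regulatory DNA: {regulatory_bp / 1e6:.0f} million bp")
print(f"per protein-coding gene: {bp_per_gene:,.0f} bp")
```

Under these assumptions the per-gene figure comes out a little above 10 thousand base pairs, in line with the "approximately 10 thousand" quoted above.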

There is no evidence whatsoever that even this amount (7%) of DNA is required for regulation but Ludwig would like to think that the actual amount is much greater. The lack of conservation is dismissed by assuming rapid turnover while conserving function and/or stabilizing selection on polymorphic sequences.

The problem here is that Ludwig is constructing a just-so evolutionary story to explain something that doesn't require an explanation. If there's no evidence that a large fraction of the genome is required for regulation then there's no problem that needs explaining. Ludwig does not tell us why he believes that most of our genome is required for regulation. Maybe it's because of ENCODE?

Since this is published in the Encyclopedia of Evolutionary Biology, I assume that this sort of evolutionary argument resonates with many evolutionary biologists. That's sad.


Rands, C. M., Meader, S., Ponting, C. P., and Lunter, G. (2014) 8.2% of the Human Genome Is Constrained: Variation in Rates of Turnover across Functional Element Classes in the Human Lineage. PLoS Genetics, 10(7), e1004525. [doi: 10.1371/journal.pgen.1004525]

96 comments:

  1. Hmm. Even at only 1140€ I don't think I'll be buying myself a set.

    1. what about the corelation between the species complexity and the anoumt of junk in this species?

      secondly- how is that that some species have so small anoubt of junk when other have a lot?

    2. what about the corelation between the species complexity and the anoumt of junk in this species?

      There isn't much of a correlation. Drosophila melanogaster, for example, is very complex but its genome is twenty times smaller than our genome. The loblolly pine genome is eight times larger than our genome.

      secondly- how is that that some species have so small anoubt of junk when other have a lot?

      We don't know for sure. It seems as though excess junk DNA is slightly deleterious but it can't be eliminated by negative selection in species with small effective population sizes. (These species tend to be large complex species.) In species with large effective population sizes, the junk can be purged by natural selection. This doesn't explain everything.

      Some species may have evolved more effective ways of eliminating mobile genetic elements (e.g. transposons) before they become established.

    3. secondly- how is that that some species have so small anoubt of junk when other have a lot?

      Some sleek cars carry a puncture repair kit. Others with room to spare, just carry 100 spare tyres ...

      introns, which are mostly junk

      For an alternative view try 3rd Edition of Evolutionary Bioinformatics

    4. Not sure I'm getting the analogy here. What makes some genomes "sleek" and gives others "room to spare"? In what way is junk DNA equivalent to spare tires?

    5. hi prof moran. i found this paper interesting:

      http://www.ncbi.nlm.nih.gov/pubmed/23759593

      its true that some arent fit with this but the interesting thing is that *most of them* do fit with the complexity.

      you said:

      "Some species may have evolved more effective ways of eliminating mobile genetic elements (e.g. transposons) before they become established. "

      i think that i found evidence that most of the mutations on coding genes are actually not neutral.

    6. @dcscccc

      That paper is from John Mattick's lab. He's one of the very few people who believe in a significant correlation between "complexity" and genome size. His data depends very much on how you define complexity. In order to see the major flaw in his argument look at ...

      Dog's Ass Plots

    7. dcsccc says,

      i think that i found evidence that most of the mutations on coding genes are actually not neutral.

      Who are you and why haven't you published this in Nature or Science?

    8. " In order to see the major flaw in his argument look at "-

      the problem is that the paper check no less than 153 eukaryotic genomes. so it's not just few examples here and there:

      "Here we extend on that work and, using data from a total of 1,627 prokaryotic and 153 eukaryotic complete and annotated genomes, show that the proportion of ncDNA per haploid genome is significantly positively correlated with a previously published proxy of biological complexity, the number of distinct cell types"-

      so complexity may not mean the number of genes but the number of cell types.


      "Who are you and why haven't you published this in Nature or Science?"-

      i just an amateur in the creation-evolution topic. i just think about this example: according to evolution fly and mosquito split off about 250 my ago. fly generation is about less than one month. so even if one generation mean only 1 new mutation we will need only 10^8 month to change his entire genome. or about less than 10^7 years. so fly and mosquito are suppose to be different in about their entire genomes from each other. far from reality.

    9. OMG, that last paragraph is just too hilarious, dcscccc. "Just an amateur"? That's putting it lightly.

    10. dcscccc,

      "i just an amateur in the creation-evolution topic. i just think about this example: according to evolution fly and mosquito split off about 250 my ago. fly generation is about less than one month. so even if one generation mean only 1 new mutation we will need only 10^8 month to change his entire genome. or about less than 10^7 years. so fly and mosquito are suppose to be different in about their entire genomes from each other. far from reality."

      This assumes no selection, which is unrealistic. You would have to fix that and reframe your hypothesis.

      When you do, mosquitoes provide you with a great way to test your new hypothesis. Many tropical mosquitoes do complete a life cycle every month or so. In contrast, there are a great many species of temperate and subarctic mosquito species that are univoltine, and would require 10^8 years to get to those other tropical species. I look forward to your results some day.

    11. "This assumes no selection, which is unrealistic. You would have to fix that and reframe your hypothesis."-

      it may be even worse. im talking about neutral mutations. beneficial mutations are more rare. so i only take about 1 new neutral mutation that will fix in the population. so again- something doesnt fit with the data. but what? my conclusion is that most of the mutations arent neutral. if they was- flyes and mosquitoes need to be much more different.

    12. so again- something doesnt fit with the data. but what?

      Your understanding of it.

      From your understanding of population genetics, how likely is it that any particular neutral mutation will be fixed in the population?

      Also, like Chris B says, you are pretending that natural selection does not exist. Even the most stubbornly uninformed creationists accept natural selection. But you don't, it seems.

    13. "From your understanding of population genetics, how likely is it that any particular neutral mutation will be fixed in the population?"-

      as far as i know any generation add about 100 new mutations to the genome.

      and again- how natural selection make any different when we are talking about 100 new mutations per one generation? of course that there is more neutral mutations then beneficial ones. so the molecular clock need to base about neutral mutations.

    14. I'll ask my question again, and maybe you'll actually answer it this time:

      From your understanding of population genetics, how likely is it that any particular neutral mutation will be fixed in the population?

      Your problem may be that you do not understand what is meant when a mutation is said to have been "fixed." Is that the cause of your confusion?

    15. Poor dcscccc. He's trying to disprove the mathematics of population genetics about a century too late.

      The math that governs population genetics is the same well understood math that governs lotteries, games of chance, etc. If you want to disprove that, you'll need to find me a whole bunch of lotteries and casinos that are losing lots of money.

      Let me know when Vegas crumbles into the desert, dcscccc.

    16. Moreover he's wrong about some crucial numbers. Keightley et al. (2014) estimate a mutation rate for Drosophila which leads to roughly 1 mutation per diploid genome per generation. That's off by a factor of 100 from his estimate.
      The precise timing of the basal divergences in crown Diptera is subject to quite a few open questions, with dates that are quite far apart and large errors. Hopefully this can be addressed with a more complete taxon sampling and additional calibration points.

    17. "how likely is it that any particular neutral mutation will be fixed in the population?"-

      if we are talking about a specific base then the chance is one per the whole population.


      " estimate a mutation rate for Drosophila which leads to roughly 1 mutation per diploid genome per generation. That's off by a factor of 100 from his estimate"-

      its actually what i have said in my first claim. here it again:

      "fly generation is about less than one month. so even if one generation mean only 1 new mutation we will need only 10^8 month to change his entire genome. or about less than 10^7 years. so fly and mosquito are suppose to be different in about their entire genomes from each other. far from reality.


      "The precise timing of the basal divergences in crown Diptera is subject to quite a few open questions, with dates that are quite far apart and large errors."-

      the simple test is to find the first fossils of both flyes and mosquitoes. both are very old in the fossil record.

    18. When you say "entire genome" are you really referring to the entire genome or just the fraction that isn't under selection? Pretty sure you wouldn't be able to align the junk DNA of a fruit fly with that of a mosquito, so your prediction holds true for that, as far as we can tell. Of course sequence alignments that include both fruit flies and mosquitos are always of highly conserved sequences which would not be expected to meet your prediction.

    19. hi john. i do refer to the whole genome. fly have only a small amount of "junk". so i do talk about coding regions. i said that among coding regions most of the mutations cant be neutral.

    20. That all depends on what "most" means. Every single third position could be evolving neutrally and still leave the sequences easily alignable.

    21. the simple test is to find the first fossils of both flyes and mosquitoes. both are very old in the fossil record.

      You are talking to a paleoentomologist involved in a number of projects where finding the oldest fossil representative of crown groups is key. If you think it's simple, you are frankly ignorant of what that entails - you are often looking at partially preserved fossils, which may or may not show some of the apomorphies for the clade in question and placement within stem or crown groups is often problematic (see for instance the discussion about the inclusion of "Roachoid" fossils to date the insect tree with Misof et al. (2014), Tong et al. (2015) and Kjer et al. (2015)). Blagoderov et al. (2007) have a number of likely crown Dipterans at ~220Ma and while there are older possible Dipterans they might be stem group representatives.

    22. But those are roachoids, while he is talking about "flyes" and mosquitoes, which were separately designed and created by God, who so loved malaria that he gave it to a good part of the world.

    23. " you are often looking at partially preserved fossils,"-

      i check out and found that the oldest mosquito is date about 95my when the oldest fly is even older. this fact alone give us the data the they split off not less then 100my ago. very simple.

    24. Burmaculex antiquus from the Burmese amber is the oldest known member of the Culicidae (see Borkent and Grimaldi, 2016). The best date for Burmese Amber is 99Ma. But Metarchilimonia krzeminskorum is very likely to be a Nematoceran and dated to ~220Ma, while Prosechamyia trimedia is a likely Brachyceran from the same locality (both from Blagoderov, 2007). Not so simple.

    25. why not? we have both flyes and mosquitoes in at least 90 my layers. we can conclude the *minimum* time of spliting event.

    26. Again, what is your claim? Is it just that most non-silent substitutions in protein-coding regions can't be neutral? Is anyone claiming otherwise? (I would amend that to "most can't be neutral at the same time", which is a bit different.)

    27. ammm. i think it may also include the silent substitute.

    28. I confess that I have no idea what "ammm. i think it may also include the silent substitute." means.

    29. im talk about "silent mutations". that may not be silent at all.

    30. Of course, the non-silent silent mutations! Why didn't we think of that? (Slaps forehead.)

      You'll of course want to distinguish those from the silent non-silent mutations, right?

    31. Again, no clue as to what you're talking about.

    32. @ dcscccc

      Sorry, I seem to have overlooked your earlier attempt to answer my question:

      "how likely is it that any particular neutral mutation will be fixed in the population?"-

      if we are talking about a specific base then the chance is one per the whole population.


      I think that response goes into the category of "Not even wrong."

    33. @ John Harshman,

      My guess is that dcscccc is hoping (perhaps even praying) that some new technology will be developed in the future that will find functions in things like pseudogenes. I don't know why he doesn't just pray for someone to invent a camera that can take pictures of God.

  2. BE PATIENT. Please note that according to the Elsevier site, the release date of this volume, edited by Richard M. Kliman, is 13th May 2016 ( See Here ).

    Although Larry at the University of Toronto has acquired access, my Canadian institution (like many others I suspect) has been less forthcoming.

  3. Does Ludwig suggest a reason for ultraconserved elements to be, um, ultraconserved if selection isn't it? I can't think of one myself, but Ludwig is clearly a very clever fellow.

  4. Larry,
    I know very well that you believe it and you also think that you can prove it that roughly 90% of human genome, according to your best knowledge is junk; right?

    You have also expressed your "feelings" when some criticized that chimp and human genomes as nearly identical. Right?

    If you stand by all or most of those statements, what do you think makes the phenotypical differences between the 2 species, humans and chimps that is?

    1. There's a simple answer if you care to think. "Nearly identical" isn't the same as "identical". A few thousand out of the 40 million or so differences between human and chimp are enough to account for the phenotypic differences.

    2. I thought the difference was due to the different gene expression.

    3. Hint: gene expression differences between species depend on DNA sequence differences.

    4. Don't be silly. Everyone knows gene expression is controlled by Our Lord and Saviour Jesus Christ, who shrinks himself down to an itsy bitsy Saviour and then goes around controlling which genes are expressed, and how, in every single cell. You and your education.

    5. John Harshman: Hint: gene expression differences between species depend on DNA sequence differences.

      Does it ? epigenetic mechanisms like dna methylation and histone modifications play no role ?

      Ever heard about repressor proteins that attach to silencer regions of the DNA ?

    6. Where are repressor proteins encoded, how are silencer regions identified?

      Think!

    7. Everyone knows gene expression is controlled by Our Lord and Saviour Jesus Christ, who shrinks himself down to an itsy bitsy Saviour and then goes around controlling which genes are expressed, and how, in every single cell.

      That is during his time off from cramming propellers into the rear ends of bacteria.

    8. And think some more: differences among species must be in something that is stably inherited over many generations. What do you know that's stably inherited over many generations?

    9. What do you know that's stably inherited over many generations?

      A: Not epigenetic changes.

    10. Asking Otangelo to think? That's just not fair.

    11. John Harshman,

      "Nearly identical" (human and chimp genomes) isn't the same as "identical". A few thousand out of the 40 million or so differences between human and chimp are enough to account for the phenotypic differences."

      If the genomes in question were identical, would we really have this discussion John?

      If yes, would you still hold your ground?

      If the(genomes) between chimp and human are "nearly identical", WHY can't the evolutionary experimental biologist tweak some of the "master genes" in chimps, so that they can develop at least SOME of the features that humans do possess?

      If it was so easy for random processes, why can't the opposite happen; why can't intelligent processes can do better? What is the stumbling block? It must be the creationists again...They believe in miracles too much...

    12. If chimp and human genomes were identical nobody would be claiming that all the differences were genetic. I suppose one could tweak a few chimp sequences. But why? It sounds unethical. I know you think you're making some kind of point, but no, you are not.

      Now, as for "easy". First, natural selection is not a random process, though it has a random component. Second, who said it was easy? We are a good 10 million years (5 million each way) separated from chimps. How is something that took 10 million years to be considered "easy"?

    13. First, natural selection is not a random process, though it has a random component.

      As written this is not even wrong. It wants to say something like "natural selection is a random process where all the RVs are Dirac distributed", which then would turn it into a statement that is merely wrong. I struggle to try envisioning what a process that is "not a random process, but has a random component" would even look like, but the first thing to note is that random processes are very general and include the subset of deterministic processes.

    14. I know all the words in your reply, but none of the sentences. I'm a biologist, not a mathematician. But I do know what Eric means by "random process", and it isn't what natural selection is.

      I would be somewhat interested to know what "random processes are very general and include the subset of deterministic processes" means.

    15. See, I don't know what Eric means by "random process". In my experience creationists aren't very good at using mathematical terms correctly. But the term has a precise definition - it's a collection of random variables X_t, where t in T and T is a totally ordered set.
      A deterministic process is a stochastic process where all X_t are deterministic (so you could state equivalently: X_t independent from X_t for all t in T, or X_t ~ Dirac(x(t))), i.e. where each X_t takes a particular value with probability 1 and the set of all other values has a probability of 0.
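
In symbols, the definition in the comment above can be sketched as follows. The point-mass notation $\delta_{x(t)}$ and the function name $x(t)$ for the fixed value at index $t$ are conventions assumed for this sketch rather than something the commenter spelled out.

```latex
\{X_t\}_{t \in T} \text{ is deterministic}
\iff \forall t \in T:\; P\bigl(X_t = x(t)\bigr) = 1,
\qquad\text{i.e.}\qquad
X_t \sim \delta_{x(t)}, \quad
\delta_{x(t)}(A) =
\begin{cases}
1 & \text{if } x(t) \in A \\
0 & \text{if } x(t) \notin A
\end{cases}
```

Read this way, a deterministic process is the special case of a stochastic process in which every distribution puts all of its probability on a single value.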

    16. Did you seriously intend that as an explanation that would communicate something to me, or are you just demonstrating your mathematical superiority? If the former, it didn't work.

      I believe that by "random process" Eric means a series of tornados in junkyards.

    17. It is a bit sketchy, but I'm really not sure where to start from. If you want to build this up from the ground, we'd have to start with the Kolmogorov axioms for a probability space, then define random variables and using that we can define a random process. But that itself could easily be the syllabus for an introductory course in probability theory if you include a couple of corollaries. The key idea here is that a random variable can take values from a particular set and it has a probability distribution that allows you to assign probabilities to some subsets of that set. If you use a set containing only one element, you get the special case of a deterministic variable.

    18. Simon Gunkel,

      Would you be comfortable with the statement that natural selection is not a random process but has a stochastic component to it?

    19. No, I wouldn't. That statement simply doesn't make any sense at all. Natural selection is a stochastic process (and that's not a particularly interesting statement, because stochastic processes are a very inclusive category). I don't know what a "stochastic component" is. It seems like it could be a term from probability theory, but it isn't.

    20. ...random processes are very general and include the subset of deterministic processes.
      Are you saying that evolution by ID is a random process? I would laugh so hard.

    21. Simon Gunkel says,

      I don't know what a "stochastic component" is. It seems like it could be a term from probability theory, but it isn't.

      You're nitpicking again. You know damn well what he meant.

      Here's how Dan Graur describes the situation in his latest book Molecular and Genome Evolution.

      Stochastic models assume that changes in allele frequencies occur in a probabilistic manner. That is, from knowledge of the conditions in one generation, one cannot predict unambiguously the allele frequencies in the next generation; one can only determine the probabilities with which certain allele frequencies are likely to be attained.

      Obviously, stochastic models are preferable to deterministic ones, since they are based on more realistic assumptions. However, deterministic models are much easier to treat mathematically, and under certain circumstances, they yield sufficiently accurate approximations.


      Chris B was clearly referring to the fact that a beneficial allele won't always be fixed in a population. Population genetics models can predict the probability of becoming fixed but for an individual allele it can't tell you for sure whether it will be lost or become fixed.

      That's what he meant by a "stochastic component."

      But you knew that, didn't you?
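
The point in the comment above, that a model gives only a probability of fixation and not the fate of an individual allele, can be illustrated with a small simulation. This is a minimal sketch assuming a haploid Wright-Fisher population; the function name, population size, selection coefficient and trial count are all hypothetical choices made for illustration, not values from the discussion.

```python
import random

def fixation_fraction(pop_size, s, trials, seed=0):
    """Fraction of trials in which one new allele copy reaches fixation.

    ASSUMPTION: haploid Wright-Fisher model with constant population size
    and relative fitness 1 + s for the new allele.
    """
    rng = random.Random(seed)
    fixed = 0
    for _ in range(trials):
        count = 1  # a single new mutant copy
        while 0 < count < pop_size:
            p = count / pop_size
            # Expected frequency after selection.
            p_sel = p * (1 + s) / (p * (1 + s) + (1 - p))
            # Binomial sampling of the next generation (genetic drift).
            count = sum(rng.random() < p_sel for _ in range(pop_size))
        if count == pop_size:
            fixed += 1
    return fixed / trials

neutral = fixation_fraction(pop_size=50, s=0.0, trials=4000)
beneficial = fixation_fraction(pop_size=50, s=0.1, trials=4000, seed=1)
print(f"neutral: {neutral:.3f}  beneficial: {beneficial:.3f}")
```

In this model a new neutral allele is expected to fix with probability 1/pop_size, and a beneficial allele fixes more often but is still lost by chance in most trials.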

    22. why can't intelligent processes...do better?

      It's a matter of speed and information.

      Let's take a random process like acquisition of resistance to a particular antibiotic by some strain of bacteria.

      The intelligent humans working for drug companies don't know about this random mutation (or acquisition via horizontal gene transfer) until antibiotics are used in patients and don't work. (This also performs selection for the resistant bacteria, since the non-resistant ones won't survive the antibiotic.) The advantage of the resistant bacteria will last until the humans figure out which if any other current antibiotics will work, or they develop a new antibiotic.

      The random processes of nature are constantly developing such mutations or performing such HGTs among trillions upon trillions of bacteria and viruses, which can provide an advantage until humans acquire sufficient information to counteract that advantage.

    23. @Larry:
      You're nitpicking again. You know damn well what he meant.

      When I write something like I don't know what a "stochastic component" is, I bloody well mean that I don't know what a "stochastic component" is. Chris B's statement is like saying "a blue car is not a car, it just has a cary component".

    24. Simon-
      A ‘stochastic component’ is a fancy way of saying ‘random element’, which is a fancy way of saying there is an aspect of the situation that is unpredictable (or too difficult to calculate) given our current models.

      People use the math without really understanding the definitions that the math is based on.
      It works reasonably well most of the time;
      But I would suggest the vast majority of the time what are called ‘confidence’ intervals are actually ‘false confidence’ intervals given the differences between the mathematical principles and the actual application to the data set at hand.

      A personal peeve.

    25. Simon Gunkel,

      "a blue car is not a car, it just has a cary component".

      No. For your statement above to be analogous, I would have had to have said:

      'Natural selection is not a selection, it just has a selectiony component.' Your analogy is all wrong, and if I had said such a thing, it would indeed have not made any sense. Your analogy also suggests random and stochastic are equivalent. Are you ok with having said that? Feel free to nitpick.

      What I actually said was:
      "Natural selection is not a random process but it has a stochastic component."

      I am not a mathematician, so I may have used those terms in a way mathematicians would not agree with. Allow me to explain (from my perspective as a biologist):

      Natural selection is not a random process, but it is not deterministic either. If it were, we would be able to predict the fate of every mutant allele under selection with 100% accuracy. In my undergraduate ecology and evolution core course in biology (1988) we used computer simulation to consider deterministic and stochastic models of allele frequency changes over time with selection coefficients as part of the input. Stochastic here meant random sampling from a probability distribution.

      In real life, beneficial alleles that arise through mutation can be lost by chance. A variety of biotic and abiotic processes impinge on the survival probability of alleles from when a new allele appears until its fixation or extinction. These impose stochastic effects on allele survival and therefore on evolution.

      So you see, when you went on to say:

      "Natural selection is a stochastic process (and that's not a particularly interesting statement, because stochastic processes are a very inclusive category)."

      You misunderstood. Natural selection has an element of chance to it (which I described as a stochastic component), but is most certainly not random. And I think it is in fact a very interesting statement, because it is the reason you are sitting at a computer typing with five fingers on each hand.

    26. Your analogy also suggests random and stochastic are equivalent.

      Well "random process" and "stochastic process" are synonyms. So are "random variable" and "stochastic variable". I admit that there's a tradition among biologists to use the term "random" in some willy nilly fashion, without ever clearly defining it (and without any real need I might add - in every instance there is a way to make these statements in an unambiguous fashion by using other words that are clearly defined).

      But you are still missing the key point: the set of stochastic processes includes the set of deterministic processes. There is no deterministic process that is not also a stochastic process. Hence saying "natural selection is not a random process" is something I find very odd, because I can't really think what that might look like (if I were a mathematician I'd probably know some weird process that's outside of the scope of random processes, but I'm not, so I don't).

    27. If random process and stochastic process are synonyms and deterministic processes are a subset of stochastic processes, then deterministic processes are stochastic:
      "There is no deterministic process that is not also a stochastic process."

      Explain how.

    28. "Hence saying "natural selection is not a random process" is something I find very odd, because I can't really think what that might look like "

      It would look like adaptive features in living creatures. There is a reason why an insect population with variation in insecticide resistance shows increased resistance to that insecticide after prolonged exposure to it, rather than as if resistance alleles had no differential survival and were inherited essentially stochastically.

    29. Explain how.
      Let me simplify this and just discuss how deterministic variables are random variables (you can treat any stochastic process as a random variable, so that's even WLOG). A random variable can take values in some set S and there is a subset of subsets of S (which is a sigma-algebra) for which a probability is defined. For a deterministic variable X there is a value x, so that for all sets A_i for which we have defined a probability p(A_i)=1 iff x in A_i and p(A_i)=0 iff x !in A_i.

      It would look like adaptive features in living creatures.

      Can you do this without handwaving? Because Fisher-Wright is a stochastic process. Moran is a stochastic process. The diffusion approximation is a stochastic process. The LLN approximation leading to logistic growth of allele frequencies is a stochastic process... I'm not aware of any model of selection that is not a stochastic process. Since these models do explain adaptations, I see no need to come up with "nonrandom" processes, which would require maths beyond what is currently used in population genetics.

      I also note that when you write "no differential survival and were inherited essentially stochastically" you are basically describing strict neutrality, which is quite a far shot from Larry's interpretation: "Chris B was clearly referring to the fact that a beneficial allele won't always be fixed in a population". If you intended to say "case where s=0" and Larry found that to clearly refer to the case where s>0 then I find vindication for my claim that your use of these terms is at least ambiguous.

    30. Simon, when you start by saying "Let me simplify this," and wind up a mere two sentences later with "for which we have defined a probability p(A_i)=1 iff x in A_i and p(A_i)=0 iff x !in A_i," I would be reluctant to claim vindication. Perhaps you are making a very precise logical/mathematical formulation, but I wouldn't know. As a layperson I've found your recent explanations and "simplifications" obscure, and Larry's translation to much plainer English a welcome relief.

    31. "I also note that when you write "no differential survival and were inherited essentially stochastically" you are basically describing strict neutrality,"

      No, not a neutral model. I am referring to a scenario where different alleles have differential survival (i.e., are under selection) but through reproduction and various environmental factors are also subject to sampling from a frequency distribution. Not every gamete produces a reproductive individual, nor does every member of a population produce offspring. Therefore while certain alleles may have a greater probability of survival, because the real world is not deterministic, these alleles do not necessarily and inexorably proceed to fixation.

    32. Although it is right above here for all to see, I guess I have to restore the part of the sentence you focused on to its original context:

      "There is a reason why an insect population with variation in insecticide resistance shows increased resistance to that insecticide after prolonged exposure to it, rather than as if resistance alleles had no differential survival and were inherited essentially stochastically."

      I was not referring to the strictly neutral situation, in fact I was drawing a distinction between selection and the strictly neutral model.

      So when Larry said:"Chris B was clearly referring to the fact that a beneficial allele won't always be fixed in a population". He was exactly right. That's what I said, and it's a rather simple concept.

    33. As a layperson I've found your recent explanations and "simplifications" obscure.

      The problem is that I'm making a point about the incorrect use of a mathematical term. It's hard to explain this without getting technical. But maybe an example helps:
      Let's start with a coin toss. There are two possible outcomes of a coin toss, Heads (H) or Tails (T). We construct a sample space, which is the set of possible outcomes:
      O={H,T}
      Now, we find that there are 4 possible events (sets of outcomes):
      {H},{T},{H,T} and {}. If we give heads a probability of P({H})=p, we find:
      P({T})=1-p
      P({H,T})=1
      P({})=0
      A random variable is a function from O to some other set S (there are further conditions, but I'll skip them). So we could define a random variable X that is 1 if the coin toss is heads and 0 if it is tails. But we could just as well define a random variable that is 1 if the coin is heads and also 1 if the coin is tails. In other words, that random variable is always equal to 1, no matter how the coin lands. It is also deterministic.
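
      [Editorial sketch] The coin-toss construction above can be written out directly. This is a minimal sketch: the bias p = 1/3 and the names `X`, `Y` and `distribution` are assumptions made for illustration, and the measurability conditions skipped in the comment are skipped here too.

```python
from fractions import Fraction

# Sample space O = {H, T} with P({H}) = p; the value of p is an assumed example.
p = Fraction(1, 3)
prob = {"H": p, "T": 1 - p}

def X(outcome):
    """Random variable that is 1 on heads and 0 on tails."""
    return 1 if outcome == "H" else 0

def Y(outcome):
    """Random variable that is 1 on heads and also 1 on tails."""
    return 1

def distribution(rv):
    """Push the probabilities on O forward to the values taken by rv."""
    dist = {}
    for outcome, pr in prob.items():
        value = rv(outcome)
        dist[value] = dist.get(value, 0) + pr
    return dist

dist_x = distribution(X)  # two possible values, probabilities p and 1 - p
dist_y = distribution(Y)  # a single value carrying all the probability
print(dist_x)
print(dist_y)
```

      Both `X` and `Y` are functions on the same sample space, so both qualify as random variables in this sense; `Y` simply puts probability 1 on a single value.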

      Chris B: It's still not clear to me what you mean by a non-random process precisely, and your recent posts have been of no help there. So something that is deterministic is not a random process in your view. Nor is a process that is not deterministic in some circumstances, apparently. But it's unclear where you'd draw the line.

    34. The unpredictable and the predetermined unfold together to make everything the way it is. It's how nature creates itself, on every scale, the snowflake and the snowstorm.

      –Tom Stoppard, Arcadia, Act 1, Scene 4

    35. Simon,

      I don't know how I could re-describe a nonrandom process when you try so very hard not to get the point.

      Let's start real simple. Would you agree that alleles under positive selection have a higher probability of being inherited than a purely neutral allele?

    36. Let's start real simple. Would you agree that alleles under positive selection have a higher probability of being inherited than a purely neutral allele?

      Of course I would. You can take it as a given that I accept population genetics and I have at least some competence with the commonly used models. My argument is with the unclear use of terms like "random" in a verbal description of what these models say.

    37. "My argument is with the unclear use of terms like "random" in a verbal description of what these models say."

      Good. So allow me to define some terms, then, as I meant them. In a deterministic model of allelic inheritance, if Aa mates with Aa, 25% of offspring will be AA, 25% aa and 50% Aa. There is no stochasticity, no sampling from a probability distribution; we can iterate generation after generation, and always have the same allele frequency in the population. That is not random. That is what I meant by deterministic.
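
      [Editorial sketch] The deterministic model described in this paragraph can be spelled out as follows. It is a minimal sketch assuming one locus with two alleles, random mating, and no selection or sampling; the function names are hypothetical.

```python
from fractions import Fraction
from itertools import product

def gametes(genotype):
    """Gamete frequencies produced by a diploid genotype such as 'Aa'."""
    return {allele: Fraction(genotype.count(allele), 2) for allele in set(genotype)}

def cross(g1, g2):
    """Expected offspring genotype frequencies from one mating."""
    offspring = {}
    for (a1, p1), (a2, p2) in product(gametes(g1).items(), gametes(g2).items()):
        child = "".join(sorted(a1 + a2))  # 'A' sorts before 'a', giving 'AA', 'Aa', 'aa'
        offspring[child] = offspring.get(child, 0) + p1 * p2
    return offspring

def next_generation(freqs):
    """Genotype frequencies after one round of random mating, with no sampling."""
    out = {}
    for (g1, f1), (g2, f2) in product(freqs.items(), repeat=2):
        for child, pc in cross(g1, g2).items():
            out[child] = out.get(child, 0) + f1 * f2 * pc
    return out

def freq_A(freqs):
    """Frequency of allele A in the population."""
    return freqs.get("AA", 0) + freqs.get("Aa", 0) / 2

aa_cross = cross("Aa", "Aa")
population = {"Aa": Fraction(1)}
history = []
for _ in range(5):
    population = next_generation(population)
    history.append(freq_A(population))

print(aa_cross)
print(history)
```

      Because the model uses expected proportions rather than sampled offspring, iterating it returns the same allele frequency every generation.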

      So when you say:
      "There is no deterministic process that is not also a stochastic process."
      I don't know what you mean and your recent posts have been of no help.

      On the other hand, in real life the survival of alleles is not deterministic, but subject to stochastic processes. With neutral alleles, this equates to random genetic drift. Perhaps your problem with my terminology is that I was making a distinction between truly random inheritance of neutral alleles, and inheritance of alleles under selection, which are affected by their selection coefficient. Both neutral alleles and alleles subject to a selection coefficient are subject to stochastic effects, but they differ in the probability distributions they are sampled from.

      So when I said, many moons ago, it seems, that:

      "natural selection is not a random process but has a stochastic component to it"

      what I meant was that natural selection is not random like neutral allele inheritance, but has a different probability distribution that is sampled from, and cannot avoid stochastic effects because in real populations these alleles, no matter how advantageous, are part of a frequency distribution and can be lost due to chance. Likewise, deleterious alleles can also increase in frequency due to chance. It is not deterministic in the sense I defined above.

      Are we making any progress here?

    38. Chance and necessity, to coin a phrase.

    39. Chris B: The problem is that deterministic isn't the same as not random. When you state that there is no sampling from a probability distribution, that is technically incorrect - you can express this as sampling from a Dirac distribution. In fact deterministic events are a key part of defining probabilities in the first place - a sigma-algebra on some set S by definition includes the deterministic event S, and one of the axioms for probability measures is that the probability of S is 1 (that's the reason you cannot get probabilities greater than 1). Without this part of the definition your probability measure isn't a probability measure, but just a measure.
      As an analogy, consider rational, irrational and real numbers. 1/3 is a rational number and it's not an irrational number, but it is a real number. Likewise a deterministic process is not a non-deterministic process, but it is a random process.

      Both neutral alleles and alleles subject to a selection coefficient are subject to stochastic effects, but they differ in the probability distributions they are sampled from.
      But that's not really true, is it? Under a Fisher-Wright model the case s=0 has a binomial distribution and in the case s!=0 it has a binomial distribution. Under the diffusion approximation the case s=0 has a normal distribution and the case s!=0 has a normal distribution. In both cases there are 3 biologically relevant parameters to the model (N, s and p) but the distributions are fully described by 2 parameters, which means that you can get the same distribution with the same parameters from different values of the biological parameters. For instance
      N=10000, s=0, p=.5
      and
      N=10000, s=.693 (well, ln(1.5) to be precise), p=.4
      and
      N=10000, s=.04, p=.49
      have the same distributions.
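
      [Editorial sketch] The three parameter sets above can be checked numerically. This is a minimal sketch that assumes a haploid Wright-Fisher model in which the selected allele has relative fitness e^s (suggested by the "ln(1.5)" remark, but not stated explicitly in the comment), so that the next generation is a binomial sample of size N with success probability p' = p·e^s / (p·e^s + 1 - p).

```python
import math

def sampling_probability(p, s):
    """Post-selection frequency p', assuming relative fitness exp(s) for the allele."""
    w = math.exp(s)
    return p * w / (p * w + (1 - p))

def binomial_mean_var(n, p_prime):
    """Mean and variance of the allele count in the next generation."""
    return n * p_prime, n * p_prime * (1 - p_prime)

# (N, s, p) triples quoted in the comment above.
cases = [
    (10000, 0.0, 0.5),
    (10000, math.log(1.5), 0.4),
    (10000, 0.04, 0.49),
]

results = [sampling_probability(p, s) for _, s, p in cases]
moments = [binomial_mean_var(n, pp) for (n, _, _), pp in zip(cases, results)]

for (n, s, p), pp, (mean, var) in zip(cases, results, moments):
    print(f"N={n}, s={s:.3f}, p={p}: p'={pp:.5f}, mean={mean:.1f}, var={var:.1f}")
```

      Under this assumed model all three cases give p' of 0.5 (the third only to within rounding of s and p), so the binomial distribution of the next generation is the same in each case.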

      It's also worth noting that neutral models still have states in which the transition is deterministic. If s=0 the transition probabilities from p=0 to p=0 and from p=1 to p=1 are 1.

      With neutral alleles, this equates to random genetic drift.
      That's not quite correct. While most textbooks only treat drift explicitly in the neutral case (which leads to people confusing neutrality with drift), drift in the general case is the centered component of your model, i.e. its expectation is 0. But it's worth noting that it generally still depends on N, s and p.

    40. Simon Gunkel,

      From your response, it's clear this 'discussion' was effectively over a while ago. Thanks for your input.

  5. Junk DNA=non-functional DNA whose sequences are not conserved
    Selfish DNA=non-functional DNA whose sequences are nevertheless conserved
    non-coding DNA=DNA sequence whose expression does not produce a protein but which serves a role in regulation of those which do...aka regulatory gene
    coding DNA=DNA sequence whose expression produces a protein...aka gene...often mistaken as the only genes
    spacer DNA=DNA sequence which is not conserved but whose length is essential to controlling binding and gene regulation...functional but not a gene

    Are these definitions feasible for a coherent discussion?

    1. I don't think Selfish DNA is necessarily non-functional. I could imagine active transposons that besides being active also contribute to some important developmental or regulatory function. Besides that I think your list is a good starting point.

    2. Junk DNA: DNA that has no function. It may or may not be conserved.

      Selfish DNA: DNA that is capable of propagating itself. Selfish DNA is functional. It is not junk.

      Non-coding DNA: DNA that is not transcribed to produce mRNA that is subsequently translated to produce a functional protein. Some coding DNA is junk (i.e. pseudogenes). Most of the functional DNA in a large genome is noncoding.

      Coding DNA: DNA that encodes a protein that's produced in some cell. Some of these proteins are junk. An ORF is DNA that could potentially produce a protein but none has been found. ORFs are not genes and not coding DNA.

      Spacer DNA: Functional DNA required to separate other functional DNA sequences or to position functional DNA sequences within the three-dimensional structure of chromatin. The sequences of spacer DNA are not conserved.

      Coding DNA makes up about 1.25% of the human genome and almost all of it is functional. About 3.25% of the rest of the genome (noncoding) is functional. About 8% of the genome is conserved, which suggests that there's more functional DNA to be discovered.

      About 0.1% of the genome consists of active functional selfish DNA elements (transposons and viruses).

      Genes make up about 25-27% of the genome but most of that is introns, which are mostly junk.

    3. But selfish DNA isn't conserved; at least, individual copies aren't conserved. It's just that active elements make more copies while inactivated elements don't. Where's the sequence conservation?

      And what mechanism is there to conserve the sequence of junk DNA?

    4. But selfish DNA isn't conserved; at least, individual copies aren't conserved.

      If by "conserved" you mean under strong negative selection then you are correct.

      However, when I'm using "conserved" I mean that the sequences of these active transposons are very similar to those of active transposons in chimps and gorillas. Perhaps it would be better to discuss this by avoiding the word "conserved" since that's a loaded word. Maybe I should say "high degree of sequence similarity"? This could be due to chance.

      And what mechanism is there to conserve the sequence of junk DNA?

      Same problem. Lots of defective transposons and pseudogenes are very similar in the chimp and human genomes because they were once functional. Those sequences are "conserved" by most analyses even though they are now junk. You can't always tell whether some stretch of DNA is junk or not just by looking at how similar it is in the chimp and human genomes.

    5. I mean that the sequences of these active transposons are very similar to those of active transposons in chimps and gorillas. Perhaps it would be better to discuss this by avoiding the word "conserved" since that's a loaded word. Maybe I should say "high degree of sequence similarity"? This could be due to chance.

      Might I suggest that it's due to the rate of neutral evolution being slow enough that there hasn't been a lot of time to change? It's the same reason that other shared regions of junk DNA are similar among humans, chimps, and gorillas. If you're talking about anything beyond that, you have to talk about families of transposons rather than individual insertions. The families undergo selection at that level, because active transposons that experience the wrong mutations stop being active.

      And I would distinguish between functional, as in useful to the organism and so conserved by selection at the level of the human population, and whatever word you want to apply to selfish DNA that experiences selection of a sort as a population of insertions within each genome.

      I would also reserve the word "conserved" for sequences that are more similar to each other than would be expected from evolution at the neutral rate. Otherwise, isn't just about everything "conserved" between humans and chimps?

  6. Coding DNA makes up about 1.25% of the human genome and almost all of it is functional. About 3.25% of the rest of the genome (noncoding) is functional. About 8% of the genome is conserved, which suggests that there's more functional DNA to be discovered.

    Candidates for the other 3.5% that is conserved? Any references or even blog posts (yours or others) discussing this?

    1. ~0.2% regulatory sequences
      ~0.3% SARs
      ~0.1% virus
      ~0.1% active transposons
      ~0.3% origins of replication
      ~1% centromeres
      <1% telomeres

      These are revised estimates but see...

      What's in Your Genome?

  7. Ludwig paraphrased by Larry: Some ultra-conserved sequences don't seem to have a function and this "shows that the extent of sequence conservation is not a good predictor of the functional importance of a sequence."

    If the paraphrase is accurate, it's a classic non sequitur.

    1. Some conserved sequences don't have a function. Therefore, some functional sequence is not conserved.

    Same as

    2. Some mammals are not bats. Therefore, some bats are not mammals.

    To be fair, just among protein molecules, there are non-conserved residues that are functional. But they are specifically correlated with changes in substrate specificity. E.g. if enzyme X interacts only with ATP and has an aspartate at position 100, and its sister, enzyme X' interacts only with GTP and instead *always* (across many species) has a glutamate at position 100, that's a good clue that the residue at position 100 causes the flip from interacting with ATP to interacting with GTP. But you have to show it's a consistent pattern across more than one species. If the residue at position 100 is not consistently the same thing in the sister enzymes (i.e. across many species, it's some higgledy-piggledy mix of all amino acids) then 100 is probably non-functional.

    Some years ago, Olivier Lichtarge wrote a program called Evolutionary Trace which analyzes 1. conserved residues, 2. non-conserved residues that are correlated with changes in substrate specificity, and 3. non-conserved residues that aren't correlated with anything. Categories 1 and 2 can be identified as functional, category 3 can't be.

    It must be emphasized that category 2 residues (functional but not conserved), while real, are a tiny fraction of all non-conserved residues, like in the low single digit percentages, or less. The more diverse a set of species you analyze, the more confidence you have in identifying 2-- BUT the number of non-conserved residues goes up, so the relative fraction of non-conserved, functional residues goes down as you analyze more species.

    Short answer: for coding regions, you can be > 95% sure that non-conserved codons aren't functional. If they were functional, there would be some kind of non-random signature you could detect.

    Non-coding nucleotides I don't know about, I haven't seen an analogous analysis for non-coding. I'd assume the same general principles apply.

  8. Larry: About 40%-70% of the noncoding DNA in Drosophila melanogaster is under functional constraint within the species but not between D. melanogaster and D. simulans. Therefore, some large fraction of functional regulatory sequences might only be conserved in the human lineage and it won't show up in comparisons between species.

    Erp, some of the papers totting up how much of the human genome is conserved have looked at conservation within the human species and get about the same results, 7% or 8% conservation overall, IIRC. We here all know that more than one human genome has been sequenced and conservation within the species can be analyzed now. Is Ludwig aware of this?

    As I've mentioned before, when we have sequenced 50+ million human genomes (possible near-term) we'll have a map of all non-lethal human point mutations and in principle, we'll know, for all nucleotides in the genome, which can be singly mutated to what other nucleotide, and which can't be.

    A possible pitfall is that genome sequencing is error-prone and I don't know if technology will reduce the error rate low enough. It could take more than 50+ million human genomes to handle the signal-to-noise problem.

    1. I'm math impaired, but election polling seems to argue that even a few thousand samples can lift the signal above the noise.

  9. If I printed up a bunch of T-shirts that said "IT DON'T EXPLAIN ONIONS" which of you would buy one?

    1. You're too late. At the SMBE meeting in Chicago in 2013 there was a session debating junk DNA. For that session Nick Matzke printed up a T-shirt saying "Keep calm and ask about onions" with a drawing of an onion.

      You can see them here.

  10. Who should we believe, Moran the educator or Ludwig the researcher?

    Smart money's on the latter.

    Sorry Larry, you are out of date. You used to rock in the free world. Now you scrounge in the junkyard.

    Whowudathunk? Right? Well, there ya go.

    The world is truly random and unpredictable. In this case, anyway.

    1. "Who should we believe, Moran the educator or Ludwig the researcher?"

      So researchers are a better source of up to date knowledge? I guess that explains why ID proponents don't have anything to offer but armchair superficial critiques of evolution rather than any actual evidence to support their fantasies (Grasso being a prime example).

      At least Grasso has a reasonable grasp of the naturalistic science he spams about. You don't even have that. ID is circling the drain, not gaining ground. To gain ground, they would have to be accumulating empirical evidence to support their intelligent designer, which they aren't doing.

      I have to hand it to you in one respect, though, Steve: your self assurance is unflagging, regardless of how weak and content-free your position becomes. Now there's true faith (in the religious sense).

  11. Nature is so far ahead of us technologically. Why would anyone think they KNOW most of ncDNA doesn't do anything?

    That's like my mini-poodle looking at my iphone and barking "it doesn't do anything because well...do you SEE the iphone doing anything? Sheesh humans are dumb."

    Larry, tell me you are not one of those who has a "mini-poodle that thinks iphones don't do anything" kind of mind.

    1. Steve asks: "Why would anyone think they KNOW most of ncDNA doesn't do anything?"

      Because, you idiot, it's commonly mutated. Every human baby born has 100+ mutations its parents didn't have. If all mutations in functional DNA are deleterious as creationists claim, then most of those 100+ mutations must be in non-functional DNA.

      Moreover, it's mostly not conserved and evolves like neutral DNA. We can compare the genomes of humans against those of other species AND can compare multiple genomes from WITHIN the human species. Even creationists have to concede that all those human genomes descend from a common ancestor and had to collect variations very fast. If most of the genome can vary like it's neutral, then most of it CANNOT be functional.

      Try to BS your way out of that one.
