Saturday, April 12, 2008

What's Wrong with Modern Science?

 
Yesterday and today I'm hanging out with some people who care about science education. We've had some wonderful conversations. I'm pleased to learn that there are some very smart people who think there's something seriously wrong with the way modern science is progressing. I was delighted to learn that there is a growing number of scientists who think the peer review system is broken. A lot of junk is being published.

Speaking of junk, there's an essay in this week's Nature that qualifies in more ways than one [Rise of the Digital Machine]. Mark Pagel is a biologist at Reading University (UK). He's one of those people who just can't accept the fact that humans don't have several times more genes than an insect or a nematode.
Humans are almost unimaginably complex, with trillions of cells organized into hundreds of different tissues. But we have scarcely more genes than a fruitfly or a worm, and only about four or five times as many as brewers' yeast or some bacteria. Surprising then that the human genome is 250 times larger than the yeast's. It comprises about 99% 'junk DNA' — genetic code that is not used to make the protein building blocks of life.
You know what's coming next, don't you? We're going to hear about one of the seven silly excuses for why we don't really have junk DNA (see The Deflated Ego Problem). Here's how Mark Pagel sets up his choice of excuse.
Junk DNA gives every appearance of fulfilling the metaphor of the selfish gene. It accumulates in organisms' genomes simply because it is good at accumulating; it can even be harmful. Why we put up with it has long been a mystery.

Increasingly, it seems that the genes that do code for proteins may recruit some or all of this junk DNA to regulate when, where and how much they are expressed. Because nearly every cell in the body carries a complete copy of the genome, something has to tell the genes that make eyes not to switch on in the back of the head, or genes for teeth to stay silent in our toes. Something has to instruct genes to team up to produce complex structures such as hearts and kidneys, or the chemical networks that create our metabolism and physiology.
Astute readers will see where this is going: he's going to use the "regulatory DNA" excuse. All this will accomplish is to demonstrate (a) Mark Pagel's inability to reason like a scientist by considering evidence that has been accumulated over four decades, and (b) Nature's inability to distinguish good science from bad science.
Genes, in effect, use regulation to promote their interests within the bodily phenotype: it is how they vary their exposure to the outside world. Regulation is how we can have over 98.5% similarity to chimpanzees in the sequences of our coding genes, yet differ so utterly from them.

Indeed, the huge quantity of junk DNA in the genomes of most complex organisms may act as a vast digital regulatory mechanism. Until recently many common machines, such as aeroplanes, clocks, and even computers were analogue devices, regulated by levers, springs, heat or pressure. Aeroplanes were flown with a stick, springs drove clocks. Digital regulation — instructions encoded in strings of binary numbers arbitrarily long, and hence precise — enabled complexity to increase. Stealth fighters and space shuttles are so complex that they can be flown only by digital computers, not (analogue) human pilots.

Similarly, the emergence of digital regulation derived from unused stretches of junk DNA may have precipitated the transition from single cells to complex multicellular organisms. Long runs of the four chemical bases that make up DNA can easily act like binary strings. How these stretches bind to a gene can regulate exquisitely the degree and timing of that gene's expression. Tellingly, bacteria and some other single-celled organisms have negligible amounts of junk DNA. They rely far more on analogue systems of gene regulation that are protein-based and less precise.
This is, of course, complete nonsense. We know for a fact that large amounts of the human genome are really junk. We know for a fact that you can have complex regulation by using only a small percentage of the genome (1000 bp per gene, or less than 1% of the genome in total, is more than sufficient) [Junk in Your Genome: Protein-Encoding Genes]. We know for a fact that some single-celled species (amoebae) have huge amounts of junk DNA and that some complex multicellular species have genomes that are much smaller than mammalian genomes (Drosophila melanogaster).
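
As a rough check on that "less than 1%" figure, here is a back-of-envelope sketch. Only the 1000 bp per gene comes from the text above. The genome size (about 3.2 billion bp) and the gene count (about 20,000) are round numbers assumed for illustration, and the function name is made up for this example.

```python
# Back-of-envelope estimate of how much of the genome gene regulation needs.
# ASSUMPTIONS (not from the post): round figures for genome size and gene count.
GENOME_SIZE_BP = 3_200_000_000   # assumed haploid human genome size, ~3.2 Gb
GENE_COUNT = 20_000              # assumed number of protein-coding genes
REGULATORY_BP_PER_GENE = 1_000   # figure quoted in the post


def regulatory_fraction(genome_bp: int, genes: int, bp_per_gene: int) -> float:
    """Return the fraction of the genome taken up by regulatory sequence."""
    return genes * bp_per_gene / genome_bp


fraction = regulatory_fraction(GENOME_SIZE_BP, GENE_COUNT, REGULATORY_BP_PER_GENE)
print(f"Regulatory DNA: {fraction:.1%} of the genome")
```

Under these assumptions the total comes to well under 1% of the genome, so even a several-fold more generous allowance per gene would leave the overwhelming majority of the genome unaccounted for by regulation.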

All these facts can be found in basic introductory textbooks. In addition, there is an abundant scientific literature on junk DNA, explaining why defective transposons (for example) really are junk. Why can't scientists like Mark Pagel, and the Nature reviewers, learn about junk DNA before spouting off? What's wrong with science today?

Science is a process and that process involves collecting evidence and making hypotheses that explain the data. In this case the author has ignored the data showing that much of our genome is junk. He has ignored the evidence that the regulation of gene expression can be easily accomplished without invoking huge amounts of (non-conserved) DNA. He has constructed an hypothesis to explain something that doesn't need explaining; namely, why humans have the same number of genes as other mammals. He has failed to read the literature and failed to consider alternative explanations.


20 comments:

  1. I also don't quite "get" this digital/analogue dichotomy he's making. As far as I can see, it's got nothing to do with the terms as I understand them from the electronics and computer field. It sounds like a clumsy analogy that fails in its purpose of making him look smart.

  2. We know for a fact that you can have complex regulation by using only a small percentage of the genome (1000 bp, or less than 1% of the genome, per gene is more than sufficent

    I doubt that very much. Check your numbers.

  3. I think he means there are complex multicellular "higher eukaryotes" with genomes that are 1% the size of the human genome. Heck, the genome of Drosophila is 3.7% the size of the human genome.

    Not sure what the 1000 bp bit was about, probably a typo.

  4. If what we are talking about is the length of a promoter region, 1000bp does not strike me as excessively small.

  5. Regulation is how we can have over 98.5% similarity to chimpanzees in the sequences of our coding genes, yet differ so utterly from them.

    Is he being serious?

    I can't believe he said that.

  6. I think Mark Pagel is a very good researcher, and he has done a lot of interesting work with phylogenetic comparative analysis, including with morphological and molecular data and even in language evolution (Pagel et al. 2006; Organ et al. 2007; Pagel et al. 2007; Atkinson et al. 2008). I have talked briefly with him a few times about genome size, though not in any real detail. He's someone with whom I would be happy to collaborate sometime.

    On the specific topic of junk DNA, however, he seems to argue one extreme view or another -- i.e., it's all junk or none of it is -- and although I disagree with both assessments, neither is uncommon.

    In 1992, Pagel and Johnstone published a widely-cited paper arguing that non-coding DNA accumulated neutrally and was limited by the tolerance of various organisms for it, and I used this as more or less the defining example of the "junk DNA" (vs. "selfish DNA" or "optimal DNA") hypothesis in a review in 2001. I re-did some of the analyses and argued that their purely neutralist explanation was not sufficient (Gregory 2003).

    This more recent case is the opposite, and seems to suggest that most or all noncoding DNA is functional in regulatory capacities -- but again, this does not provide a satisfying answer unless it can address the onion test.

    There is a middle ground that is pluralistic rather than either neutralist or selectionist, but it is less frequently argued: most noncoding DNA is not functional but a significant portion of it is, and even the stuff that is not functional is not necessarily inconsequential.

    Atkinson, Q., Meade, A., Venditti, C., Greenhill, S., Pagel, M. 2008. Languages evolve in punctuational bursts. Science 319: 588.

    Gregory, T.R. 2001. Coincidence, coevolution, or causation? DNA content, cell size, and the C-value enigma. Biological Reviews 76: 65-101.

    Gregory, T.R. 2003. Variation across amphibian species in the size of the nuclear genome supports a pluralistic, hierarchical approach to the C-value enigma. Biological Journal of the Linnean Society 79: 329-339.

    Organ, C., Shedlock, A., Meade, A., Pagel, M., Edwards, E. 2007. Origin of avian genome size and structure in non-avian dinosaurs. Nature 446: 180-184.

    Pagel, M. and Johnstone, R.A. 1992. Variation across species in the size of the nuclear genome supports the junk-DNA explanation for the C-value paradox. Proceedings of the Royal Society of London, Series B: Biological Sciences 249: 119-124.

    Pagel, M., Venditti, C., Meade, A. 2006. Large punctuational contribution of speciation to evolutionary divergence at the molecular level. Science 314: 119-121.

    Pagel, M., Atkinson, Q., Meade, A. 2007. Frequency of word-use predicts rates of lexical evolution throughout Indo-European history. Nature 449: 717-720.

  7. I still don’t get the “human genome doesn’t have enough proteins” argument. 20,000 or so protein-coding genes by themselves provide much more than enough to “produce” a fantastic variety of life, even immensely complicated life (such as humans). The way I explain it to students is this:

    Suppose, to be simplistic, that each and every cell in a human body was specified by a unique combination of transcription factors, in a binary (present or absent) fashion. If this were the case, then the minimum number of transcription factors that would be needed to “assemble” a trillion or so cells is remarkably small – on the order of 40 or so.

    Of course, mammalian genomes probably have two orders of magnitude more transcription factors than this. And (to be rough here) three orders of magnitude more protein-coding genes. This is enough transcriptional regulatory capacity for more than 10^100 different cell types. That’s just in the transcription factors, and just for a binary mode of regulation. Factor in more subtle aspects of regulation (concentration dependence of binding, alternative RNA processing, temporal aspects, epigenetic control, to name a few), and the possibilities in what mammalian genomes have are mind-boggling. These possibilities render the “not enough proteins” (as well as the related “not enough genes”) argument meaningless.

    The fact is, nature has sampled a minuscule fraction of the possibilities afforded by even this small suite of genes.

  8. I doubt that very much. Check your numbers.

    Sorry, I meant 1000 bp per gene as explained in the reference I linked to.

  9. Ryan Gregory says,

    This more recent case is the opposite, and seems to suggest that most or all noncoding DNA is functional in regulatory capacities -- but again, this does not provide a satisfying answer unless it can address the onion test.

    Exactly. Surely someone like Pagel should have known this before publishing a paper claiming that most noncoding DNA was involved in regulation?

    You know that's a stupid claim, I know that's a stupid claim, thousands of scientists know that's a stupid claim. How does such nonsense get published in Nature?

  10. Well, Nature likes snappy and short reviews that are not complicated or nuanced. I would have written a different one-page piece, but it would have been messier conceptually and less amenable to hyping new 'discoveries' (note that news about regulatory roles for noncoding DNA has shown up in Nature since the 1990s). They like news, not in-depth discussions. I agree that Pagel should have tried to better summarize current information, and it's a bit surprising that he's writing as an authority on the topic, but he still does good work in areas where he has stronger expertise.

  11. I guess Nature isn't such a prestigious journal after all. I'm really disappointed.

  12. It is funny that you are talking just now about junk DNA.

    So please also read my idea about it, which is on my page:
    http://koti.welho.com/hvirkkun/artikkeli/noncodinge1.html

    You can also search Google for the words
    rufimitrella espoo velu
    to find a link to my page.

    Without dwelling too much on the important “language-reality relationship” (junk DNA, non-coding, etc.), I believe there is a quite good general explanation for that DNA, or at least for why there cannot be only “coding DNA” plus a small amount of so-called regulatory DNA.

    If you accept the idea on my page, I can try to suggest something for the onion test too:

    The onion cell must live in much harsher temperature (and perhaps pressure?) conditions than the human cell inside the body.

    The consequence may be that in the onion cell the DNA, as a large biomolecule, cannot take on three-dimensional structures elaborate enough to support the needed expression in a specific cell. Instead, in the onion cell (nucleus) the DNA perhaps exists more as a lump or clod, and the genes in use are “biochemically visible” for transcription more by good luck. It is also true that the expression does not need to be achieved in every single cell, because onion cells can transport quite big proteins (and also RNA) through channels (plasmodesmata) between them. This supports the idea. It can also somehow explain why polyploidy can be useful for plants, because then the lumps or clods can be bigger and more genes are visible.

    Of course it is difficult to give any exact answer for the genome sizes of different organisms and for the amount of “junk DNA” in them, for example because of full or partial genome duplications that have occurred in evolution.

    It must also be remembered that multicellularity developed separately in plants and animals. It is then possible that DNA has adopted different kinds of structural and functional solutions in plants and animals, as I suggested.

    So space (“junk”) is needed in the DNA of a complex multicellular organism because the same DNA must support different kinds of cell types in the organism. Actually, as is known, cell types are a little like different single-celled organisms (they use different parts of the DNA), but they must live in symbiosis because of the common path of inheritance.

    Sorry for my non-native English.

    Best regards,
    Heikki Virkkunen
    Finland

  13. So how can we measure and decide that the peer review system is "broken"? A certain percentage of junk has always been published. And the growth of areas with subsequent fragmentation has likely increased that percentage.

    But when is it "broken"? Unless there is an observable qualitative and/or practical threshold, I think the answer will remain subjective.

    Besides, an essay should be personal and subjective, preferably provocative, and in so doing most likely wrong. I see them as supportive for the review system, because they constantly point out the need for review. :-P

    It sounds like a clumsy analogy that fails in its purpose of making him look smart.

    Perhaps a clumsy formulation failed Pagel before that. Seems to me he wants to point out that as opposed to the practical difficulty with inevitable digital discreteness you can in principle reach the same precision as analog systems, but that this gets confused with enabling complexity.

    It is true that digital algorithms are a facile method for enabling complex and/or precise regulation, especially where, for example, practically realizable sampled dead-beat regulators or Kalman filters have tremendous advantages compared to earlier continuous-time methods.

    But considering that molecular chemical systems have both discrete and analog aspects, I would agree that the analogy fails too.

    Btw, he tops this off with confusing the causality - the exemplified systems are complex precisely because they can be so, not because they have to be. IIRC the SST equivalent Buran could be manually piloted, as it was constructed to be robust. [Instead Pagel could have mentioned inherently unstable planes where agility is a premium. No way to fly those manually (except in those rare cases where you can uncouple the responsible wing surfaces to swivel freely).]

  14. You could help by refuting the mistake in _Nature_ Communications Arising.

    http://www.nature.com/authors/editorial_policies/corrections.html

    I would do it if I had access to the article.

  15. Well, Nature likes snappy and short reviews that are not complicated or nuanced.

    I can sympathize that you want to defend Pagel, but why should we give Nature a pass because it likes "snappy and short" reviews, if we don't give a pass to press releases which surely need to be even snappier and shorter?

  16. I remember several "birds are not dinosaurs" articles in Nature, Science and PNAS in the 90's that were not based on phylogenetic analysis (the usual method for establishing questions of ancestry, which squarely supports that birds are dinosaurs). They were based on arguments of "mechanistic impossibility".
    Certain arguments I find to be "intrinsically flawed" still have appeal to non-specialists; that way Nature may get to the "broad public" with some BS argument, while almost every single TRUE specialist in the field can tell how and why that's nothing but bullcrap.

  17. The cover of this very issue of Nature, for instance. That article says its data suggest that either the epitheliozoan grade of organization was acquired independently in ctenophores, cnidarians and Bilateria, or that sponges are a reversion (down to the siliceous skeleton and choanocytes of choanoflagellate ancestors). Yet they did their analysis without placozoa (which ARE basal, monophyletic metazoa) and with only two demosponges (no hexactinellid or calcarean sponges). The authors caution that their taxon sampling is not good for this question, yet they decided to "go for it", which is even a bit of a contradiction, since the title's POINT is "better taxon sampling" (which it certainly provides for other aspects of the tree that seem quite consistent with morphology).

  18. In other words: I don't think any true specialist should be very enthusiastic about that hypothesis.

  19. Maybe it's a good way to create some suspense to carry out the next step in the completion of the taxon sampling. Maybe a little tongue-in-cheek though hehe.

    I don't believe it. If molecular results do not change upon better phylogenetic sampling (as they usually DO), such a stretch beyond morphological parsimony will be very problematic for me. It would not be a matter of simply assuming that the molecular phylogeny is the true one.

    I can definitely say that there are cases in which the morphological transformations implied are ridiculous (remember that PNAS article, saying that sharks were teleosts?) and that it is safe to assume that there must be an artifact in the molecular data that has not been properly accounted for (sure enough, teleost sharks is already "gone with the wind" on account of better molecular data).

  20. Good points, Sanders (now, to check my medication...)
