More Recent Comments

Wednesday, October 21, 2015

The quality of the modern scientific literature leaves much to be desired

Lately I've been reading a lot of papers on genomes and I've discovered some really exceptional papers that discuss the existing scientific literature and put their studies in proper context. Unfortunately, these are the exceptions, not the rule.

I've discovered many more authors who seem to be ignorant of the scientific literature and far too willing to rely on the opinions of others instead of investigating for themselves. Many of these authors seem to be completely unaware of controversy and debate in the fields they are writing about. They act, and write, as if there were only one point of view worth considering: theirs.

How does this happen? It seems to me that it can only happen if they find themselves in an environment where skepticism and critical thinking are suppressed. Otherwise, how do you explain the way they write their papers? Are there no colleagues, post-docs, or graduate students who looked at the manuscript and pointed out the problems? Are there no referees who raised questions?

Let's look at a paper on functional elements in the human genome (Milligan and Lipovich, 2015). It wasn't published in a front-line journal but that shouldn't matter for the points I'd like to make. This is a review article so special rules apply. As a scientist, you are obliged to represent the field fairly and honestly when writing a review. Here's the abstract ...
In the more than one decade since the completion of the Human Genome Project, the prevalence of non-protein-coding functional elements in the human genome has emerged as a key revelation in post-genomic biology. Highlighted by the ENCODE (Encyclopedia of DNA Elements) and FANTOM (Functional Annotation of Mammals) consortia, these elements include tens of thousands of pseudogenes, as well as comparably numerous long non-coding RNA (lncRNA) genes. Pseudogene transcription and function remain insufficiently understood. However, the field is of great importance for human disease due to the high sequence similarity between pseudogenes and their parental protein-coding genes, which generates the potential for sequence-specific regulation. Recent case studies have established essential and coordinated roles of both pseudogenes and lncRNAs in development and disease in metazoan systems, including functional impacts of lncRNA transcription at pseudogene loci on the regulation of the pseudogenes’ parental genes. This review synthesizes the nascent evidence for regulatory modalities jointly exerted by lncRNAs and pseudogenes in human disease, and for recent evolutionary origins of these systems.
The authors are, of course, entitled to their opinion but they are not entitled to state it as if it were a fact. I do not believe that the prevalence of non-coding functional elements is a key "revelation" of the past 15 years.

For one thing, those elements that truly are functional were known BEFORE the human genome was sequenced. For another, it's not true, in my opinion, that there are huge amounts of functional DNA in the human genome. Any scientist who has kept up with the literature will know that the conclusions of the ENCODE Consortium and FANTOM are not universally accepted so they should not be quoted in an abstract as if they were necessarily true.

It would be okay to say something like this, "We believe that ENCODE and FANTOM have demonstrated that much of the human genome is functional but we will review and report contrary evidence and opinions."

The authors say that "tens of thousands of pseudogenes" are functional but there's no evidence at all that this is true. They also say that a similar number of lncRNA elements are functional but, again, there is no evidence that this is true. There may be lots of people who like to think that tens of thousands of DNA elements are functional (i.e. genes) because they produce functional RNAs, but wishing is not evidence.

It would be okay to say, "After an extensive review of the literature we conclude that tens of thousands of pseudogenes, and a similar number of lncRNAs, are functional although we recognize that most scientists will disagree with our opinion."

There's a more fundamental problem with this abstract and it has to do with the connections between genome activities and disease. The implicit assumption in this paper, and in many other papers, is that the locus of disease-causing mutations pinpoints functional regions of the genome. This is not correct. You could easily have a mutation that enhances transcription in a junk DNA region and the aberrant transcription interferes with the expression of a nearby gene. An example might be a spurious mutation that leads to transcription of an adjacent pseudogene from the opposite strand and the resulting antisense RNA blocks translation of the mRNA from the active gene. That does not mean that the junk DNA and the pseudogene now have a function.

You can also have a mutation in the junk DNA part of a large intron creating a new splice site, leading to splicing errors that shut down proper gene expression. This does not mean that the site of the mutation has a function and can no longer be considered junk. We need to recognize that many disease-causing mutations might occur in junk DNA. These go by the unfortunate name of "gain-of-function" mutations.

The Milligan & Lipovich paper begins with ....
Redefining the Human Gene Count

Classical definitions of genes focus on heritable sequences of nucleic acids which can encode a protein (White et al., 1994).
You can guess where this is going. The authors are going to make the case that new data has forced us to recognize that there are genes for functional RNAs that don't encode proteins. This is a standard approach for a certain group of scientists who want to defend ENCODE and the functionality of most of our genome.

The set-up requires you to believe that during the 1990s everyone thought that the only kind of genes were those that encoded proteins. It is not true; it is a misrepresentation that nevertheless seems to be widely believed. I can assure you that knowledgeable scientists have known about genes for ribosomal RNAs and tRNAs for half a century, and we've known about a host of other genes for functional RNAs for thirty years.

It may be the case that Michael Milligan and Leonard Lipovich were ignorant of non-protein-coding genes until very recently but it's not fair to imply that this misconception was shared by most knowledgeable scientists.

The reference (White et al., 1994) was not something I recognized so I tried to look it up. After a bit of searching I realized that the order of authors was incorrect and the real reference is Fields et al. (1994). It's a News & Views article in Nature Genetics entitled "How many genes in the human genome?" The authors are from Craig Venter's private research institute (The Institute for Genomic Research, TIGR) and they include Craig Venter. At the time, TIGR was trying to determine the sequences of human genes.

Fields et al. know that defining the word "gene" is important so they say ...
Counting genes requires being clear about what counts as a gene. "Gene" is a notoriously slippery concept, and differing notions about what it means to identify one can lead to heated disagreements. Some define a gene physically as a region of DNA sequence containing a transcription unit and the associated regulatory sequences.
They refer to genes for small regulatory RNAs but, for the rest of their discussion, decide to focus on transcription units that can be translated into proteins.

It's not clear to me why Milligan & Lipovich use this reference to bolster their claim that "classical" definitions of genes focus on genes that encode proteins, unless they mean that Fields et al. were aware of the proper definition of gene but decided to restrict their count to protein-coding genes. (See What Is a Gene? for a more thorough discussion.)

Milligan & Lipovich continue the Introduction with ...
The question of how many genes the human genome contains has been an evolving point of contention since before the Human Genome Project. In 1994, the estimated total human protein-coding gene count was 64,000–71,000 genes (White et al., 1994). The higher gene estimate was based on partial genome sequencing, GC content, and genome size. The lower bound of 64,000 took into account expressed sequence tags (ESTs) and CpG islands as additional prediction factors. In 2000, a new count of actively transcribed genes was estimated at 120,000 using the TIGR Gene Index, based on ESTs, with the results from the Chromosome 22 Sequencing Consortium (Liang et al., 2000). 1 year later, Celera arrived at only 26,500–38,600 protein-coding genes using their completed human genome and comparative mouse genomics (Venter et al., 2001). The Human Genome Project, which used tiling-path sequencing as opposed to Celera’s shotgun sequencing, converged on a similar estimate (Lander et al., 2001).
Stories like this have become standard fare in many papers these days. It's an example of the fallacy that if you repeat a lie often enough it becomes accepted as truth. Here's the truth ...
The Milligan & Lipovich paper is also a clear example of laziness. The authors are satisfied with repeating a myth instead of doing their own research into what knowledgeable scientists really thought about the number of genes in the human genome.

At least in this case the authors have read an "ancient" paper from 1994. It's the Fields et al. paper that I talked about above, only they refer to it as White et al. (1994). It's actually a pretty good paper on the number of genes. They discuss estimates ranging from 14,000 to 100,000, recognizing that the problem was difficult. Unfortunately, they don't discuss any of the genetic load predictions.

Fields et al. (1994) figure there are between 60,000 and 70,000 protein-coding genes in the human genome. But just because some people thought that there were so many genes doesn't mean that this was the value universally accepted by all knowledgeable scientists.

By the time the complete draft human genome sequence was published we already knew the sequences of chromosomes 21 and 22, and the gene frequency in these chromosomes gave rise to predictions of 40,000 to 45,000 genes in the whole genome (see Aparicio, 2000). These were likely to be overestimates since both of these small chromosomes are rich in genes compared to the rest of the genome. (At the time we didn't know that the algorithms for counting genes returned many false positives.) That means that the gene count was approaching the numbers estimated earlier (about 30,000, if you only count knowledgeable scientists).

I find it interesting that Milligan & Lipovich take a different view of the history, saying that the estimates from chromosomes 21 and 22 predicted 120,000 genes. Their reference is Liang et al. (2000). It's true that Liang et al. worked at TIGR and it's true that their estimate was 120,000. However, that paper is in the same issue of Nature Genetics as the Aparicio (2000) paper I just quoted and two papers by Ewing and Green (2000) and Roest Crollius et al. (2000). The Ewing and Green estimate is 35,000 genes. The Roest Crollius et al. estimate is 28,000-34,000 genes. The papers were part of an issue on "The Nature of the Number."

So even if your version of ancient history only extends back to 1994, it's clear that by 2000 (one year before publication of the draft human genome sequence) most knowledgeable scientists—even those who were ignorant of the real ancient history from the 1960s—were thinking that the human genome had about 30,000 genes.

You may be wondering, as I did, why Milligan & Lipovich want to make a point about historical estimates of gene number when we already know the correct answer. I'm not sure why they think it's important. Clearly it's not important enough for them to have done a critical job of describing that history. Based on what I've seen in other papers, this sort of introduction seems designed to show you that there is a lot of "missing information" in the genome since scientists were expecting many more genes.

These are estimates of protein-coding genes. That's not because knowledgeable scientists didn't know about any other genes, it's because recognizing genes for functional RNAs is much more difficult. Samuel Aparicio explained it very nicely 15 years ago (Aparicio, 2000) ...
Although the tendency (especially in a pay-per-sequence access mode) is to assume that any transcript represents a gene, classical genetics demands some evidence of associated function. Crucially, what is not yet established (but is implied to be relatively abundant by these studies) is the extent of biological "noise" in the transcriptome of any given cell. In other words, what fraction of transcripts which can be isolated have any meaningful function? What fraction might be mere by-products of spurious transcription, spuriously fired off, perhaps on the antisense strand from promoters or CpG islands associated with protein coding genes (as seems to be the case with a number of imprinted genes)?
Lots and lots of scientists have expressed this cautionary view but no matter how many times it's published there are many more scientists who ignore the warning and continue to do so to this day. It's not a question of whether, in your opinion, the transcripts are functional in spite of the potential problems; it's that too many scientists won't even recognize that there's a problem.

Let's see the next paragraph in the Milligan & Lipovich paper.
Following the sequencing of the human genome, focus has shifted toward understanding gene function. In 2005, the FANTOM (Functional Annotation of Mammals) Consortium determined that the mouse genome harbored more non-coding genes than coding genes (Carninci and Hayashizaki, 2007). In a parallel project to FANTOM, the ENCODE (Encyclopedia of DNA Elements) Consortium began exhaustively surveyed the epigenetics and regulation of the whole genome (Birney et al., 2007; Consortium ENCODE Project, 2012). ENCODE’s continuing effort to recount human genes (GENCODE) using the study of genetic landmarks indicative of transcription and next generation sequencing has allowed them to arrive at a current total of just under 58,000 genes as of 2013 (gencodegenes.org). Of these 58,000 genes ENCODE only defines approximately 20,000 genes as coding, with almost all of the other genes being classified as pseudogenes and non-coding RNA (ncRNA). Early studies of the mouse transcriptome by the FANTOM Consortium first motivated the redefinition of a gene into a transcriptional unit as a consequence of large numbers of lncRNA genes discovered (Carninci et al., 2005).
Things are beginning to fall into place in this paper. The authors want you to believe that historical gene number estimates were much higher than the actual number of genes observed when the human genome sequence was published. That's because scientists thought that the only kind of genes were those that encode proteins, according to the myth. However, recent discoveries by ENCODE and FANTOM show that those scientists were wrong and there are actually genes for noncoding RNAs. Furthermore, those RNA genes outnumber the protein-coding genes by a large margin (38,000 to 20,000).

The caution expressed by Aparicio, and many others, is ignored. The rest of the paper consists of reviews of lncRNA functions and pseudogene functions. With respect to lncRNAs, there's no discussion of whether these lncRNAs represent "noise" and no critical review of the case for function. Even lack of conservation doesn't faze Milligan & Lipovich because these nonconserved genes for lncRNAs are still exaptive: they can easily become important functioning genes. As reservoirs for future change, they are "not disposable even when adaptation doesn't govern their existence."

Contrast this biased review with a review of lncRNAs published by my colleagues Alex Palazzo and Eliza Lee in the same journal a month earlier (Palazzo and Lee, 2015). They review the literature with a critical eye and conclude that ...
The genomes of large multicellular eukaryotes are mostly comprised of non-protein coding DNA. Although there has been much agreement that a small fraction of these genomes has important biological functions, there has been much debate as to whether the rest contributes to development and/or homeostasis. Much of the speculation has centered on the genomic regions that are transcribed into RNA at some low level. Unfortunately these RNAs have been arbitrarily assigned various names, such as “intergenic RNA,” “long non-coding RNAs” etc., which have led to some confusion in the field. Many researchers believe that these transcripts represent a vast, unchartered world of functional non-coding RNAs (ncRNAs), simply because they exist. However, there are reasons to question this Panglossian view because it ignores our current understanding of how evolution shapes eukaryotic genomes and how the gene expression machinery works in eukaryotic cells. Although there are undoubtedly many more functional ncRNAs yet to be discovered and characterized, it is also likely that many of these transcripts are simply junk. Here, we discuss how to determine whether any given ncRNA has a function. Importantly, we advocate that in the absence of any such data, the appropriate null hypothesis is that the RNA in question is junk.
I know for a fact that the Palazzo and Lee manuscript was reviewed by a number of knowledgeable and skeptical scientists before it was sent off. They even sent it to an old curmudgeon who criticizes everything.1

The question is, why didn't the Milligan & Lipovich paper get the same scrutiny before they sent it off to the journal?

The other part of the Milligan & Lipovich paper discusses possible functions of pseudogenes. Again, there's a remarkable lack of critical thinking. The only case presented is the case for function. There's no attempt whatsoever to critically analyze and defend their claim in the abstract and introduction that "... the prevalence of non-protein-coding functional elements in the human genome has emerged as a key revelation in post-genomic biology." It's a classic case of confirmation bias and this isn't supposed to happen in the scientific literature, especially in reviews.


1. They didn't need to change any of their main points in response to reviewers because they already knew how to read and interpret the literature correctly.

Aparicio, S.A.J.R. (2000) How to count… human genes. Nature Genetics, 25:129-130. [doi:10.1038/75949]

Ewing, B., and Green, P. (2000) Analysis of expressed sequence tags indicates 35,000 human genes. Nature Genetics, 25:232-234. [doi:10.1038/76115]

Fields, C., Adams, M.D., White, O., and Venter, J.C. (1994) How many genes in the human genome? Nature Genetics, 7:345-346. [PDF]

Liang, F., Holt, I., Pertea, G., Karamycheva, S., Salzberg, S.L., and Quackenbush, J. (2000) Gene Index analysis of the human genome estimates approximately 120,000 genes. Nature Genetics, 25:239-240. [doi:10.1038/76126]

Milligan, M.J., and Lipovich, L. (2015) Pseudogene-derived lncRNAs: emerging regulators of gene expression. Frontiers in Genetics, 5:476. [doi: 10.3389/fgene.2014.00476]

Palazzo, A.F., and Lee, E.S. (2015) Non-coding RNA: what is functional and what is junk? Frontiers in Genetics, 6:2 [doi: 10.3389/fgene.2015.00002]

Pertea, M., and Salzberg, S. (2010) Between a chicken and a grape: estimating the number of human genes. Genome Biology, 11:206. [doi:10.1186/gb-2010-11-5-206]

Roest Crollius, H., Jaillon, O., Bernot, A., Dasilva, C., Bouneau, L., Fischer, C., Fizames, C., Wincker, P., Brottier, P., Quetier, F., Saurin, W., and Weissenbach, J. (2000) Estimate of human gene number provided by genome-wide analysis using Tetraodon nigroviridis DNA sequence. Nature Genetics, 25:235-238. [doi:10.1038/76118]

60 comments :

Jonathan Badger said...

The authors are from Craig Venter's company (The Institute for Genomic Research, TIGR) and they include Craig Venter. At the time, TIGR was trying to determine the sequences of human genes.

TIGR wasn't a company -- you may be confusing it with the later Celera, which was. TIGR was a non-profit research institute (it was the forerunner of the current JCVI).

Larry Moran said...

Thanks. I wasn't confusing it with Celera but I thought it was organized as a company.

Donald Forsdyke said...

Given bases A, C, G, T,
All linked up to infinity,
And given, little, little, me,
Confined to a set, not free.
There’s a need for strategy,
Predatory others for to see.

Predators’ sequences differ from me
One might read G,T,C,C,A,C.
All I need is G,T,G,G,A,G.
Complement catches, now you, I see.

No use burying head in sand,
With you I’ve formed a double strand!
Now I’ll put you in your place,
Then bestow the coup de grace.

Lucky I had GTGGAG around
The bell of doom for you to sound.
With me you found no bonne homie,
But GTGGAG’s where in economy?

My genome set is far from free,
Minute compared with the infinity,
From which predators unpredictably,
Emerge to challenge, challenge, me!

Sometime GTGGAG perchance,
Plays a role in my life’s dance.
But given foes’ infinity,
Need more to serve my liberty.

Seems a burden, but I’m glad,
My genome set, for to add,
Multitudes like GTGGAG,
To double-strand with enemy.

Tortoise-like, I’m no more nimble,
Forsaken genome that’s like thimble.
No longer genome lean and svelte,
For stalling pathogen, my cards were dealt.

Lumbering genome, quite a hunk,
Extra sequence some scoff as junk.
But I know that one day,
Thankful I’ll be f’that DNA.

Shuffled and variable within nations,
Self DNA tuned o’er generations,
Not to double-strand with me,
Only with those that don’t agree.

And subtle, subtle, even more,
Pathogen strategy does not ignore,
It morphs along with stepwise stealth,
To close approach that which is self.

DNA tuned for most hostilities,
Needs match finite possibilities.
Sequences near, but not quite, self,
Here’s where junk has mostest wealth.

Pathogen that inward flies,
Adopting a near-self guise,
Rapid brought to its senses
Cannot o’erleap host defences,

peer said...

"The authors are, of course, entitled to their opinion but they are not entitled to state it as if it were a fact. "

Why not? How many authors out there, knowing nothing about biology, claim evolution is a fact? Still they, and you, cannot even point out the ancestor for chimp and man.

Of course I fully agree with your opinion, but it should apply to all disciplines of science.

Mikkel Rumraket Rasmussen said...

What was the name of your great great great great great great great great great great great great great great grandfather, where did he live and how many children did he have?

peer said...

This simply illustrates that you really do not have a clue, right?

judmarc said...

Still they, and you, cannot even point out the ancestor for chimp and man.

Ah, so you mean they don't pretend to know the answer when the evidence is not there?

What a novel idea. Someone should really take this up and call it something, like maybe "science."

Eelco van Kampen said...

Of course Mikkel does have a clue. You can't even name your great great great great great great great great great great great great great great grandfather, so what about a million times 'great' ?

And indeed: to date, no fossil has been identified as a potential candidate for the CHLCA. Not surprising, and a pity too, of course.

peer said...

Eelco is the mind-reader of Mikkel, just as he was able to review my book before it had been released.

He's a psychic...

peer said...

As a matter of fact, I described the ancestor of chimp and man in my book...Eelco. You missed it while mind-reading it?

peer said...

Frontloading and the new biology...

With 100 mutations per genome, the copying fidelity (CF) is 1 - 1/30,000,000 ≈ 1 - 0.00000003 = 0.99999997

If copying fidelity had to evolve gradually, as Moran and his followers here believe, what would happen with organisms having a CF of, say, 0.90 or 0.99?

They would die because of genomic meltdown within fewer than 10 and fewer than 25 generations, respectively.

Genomes had to start with nearly perfect CF.

There is no way out. Genomes were frontloaded.

Frontloading is the new scientific theory from which we can understand biology.

Eelco van Kampen said...

You wrote a book, Peter ? No way !

Eelco van Kampen said...

Review? I did not write a review; you keep on saying that.
I wrote a one-line warning, which still very much applies.

peer said...

As you know Eelco, my book was published originally in English, serialized in the JoC, a peer-reviewed science journal.

Then in Dutch.

Soon it will also be available in German. Kutschera will be my next opponent, for sure.



Eelco van Kampen said...

I had no idea, Peter. No idea at all ! You never ever told anyone about your book, as you are such a modest person.

" ... the JoC, a peer-reviewed science journal"
Ah, your sense of humour is still alive !

peer said...

A warning not to buy the book...! Isn't that incredible, dear readers?

Why would he do such folly?

Because my book scientifically demonstrated where and how biology falsifies Darwinian philosophies, such as universal common descent and random mutations as the engine for evolution.

In other words, Eelco was afraid that my book might lead to more Darwin-scepticism (which is indeed what followed after its release).

So he warned potential readers...

Pathetic !

Eelco van Kampen said...

Wrong again: it was not a warning not to buy the book (I couldn't care less).

It was a helpful warning about the category, as the bookseller listed it as a science book, which of course it isn't.

Have you actually read my warning?

Piotr Gąsiorowski said...

There has been a small misunderstanding. JOC (the Journal of Organic Chemistry) is a peer-reviewed science journal (IF=4.721). JoC (the Journal of Creation) is a cargo-cult imitation of a journal (IF=0.000). Not a very convincing imitation, but it has bright colours and a video on its website.

Eelco van Kampen said...

JoC seems cursed, though - here is another one of those: http://journalofcosmology.com/

Mikkel Rumraket Rasmussen said...

Peer, it is time to take your meds. Please let the nearest adult handler know your "condition" has been making you make a mess on the internet again.

Anonymous said...

Transforming the errors per genome into a percent is a distraction. Given an error rate of 1/30,000,000 per bp, how many errors per genome should we expect if the genome is 100,000 bp long? 50,000 bp? How big do you think genomes would have been in primitive life forms? How big would the populations be if primitive life forms were mostly self-replicators? Would population sizes have anything to do with whether a genome "melts down" or not? What about the amount of such genomes that has functions?

How's that working for you now?

Frontloading is self-delusion, not a scientific theory.

I suspect that you think that the theory of evolution proposes that everything evolved "gradually," that such a thing means from absolute zero to whatever we have now, and that evolutionary theory proposes that everything, even replication fidelity, evolves independently from nothing for each species. I suspect that you think one way about what evolutionary theory is one second, and a very different way the next second. Typical of creationists. I might be wrong, but the "simplicity" of your "argument" shows that you might have some, if not all, of these assumptions in mind.
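
To put rough numbers on those questions, here is a quick back-of-the-envelope sketch (the per-base error rate is the 1/30,000,000 figure quoted above; the genome sizes are purely illustrative):

```python
# Expected copying errors per replication for genomes of different sizes,
# at a fixed per-base error rate. The rate is the 1/30,000,000 per bp figure
# from the comment above; the genome sizes are illustrative, not data.
per_base_error_rate = 1 / 30_000_000   # ~3.3e-8 errors per bp per replication

for genome_size in (3_000_000_000, 100_000, 50_000, 10_000):
    expected_errors = per_base_error_rate * genome_size
    print(f"{genome_size:>13,} bp genome -> "
          f"{expected_errors:.5f} expected errors per replication")
```

At that fidelity a 3 Gb genome picks up about 100 errors per replication, while a 50 kb genome picks up about 0.002, so almost every copy of a small genome is error-free. The same fidelity figure means very different things depending on genome size, population size, and how much of the genome is functional.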

judmarc said...

Frontloading is the new scientific theory from which we can understand biology.

But apparently not grade school arithmetic.

Let us say in accordance with your, ehm, "theory," that we have bacteria whose genomes must contain not only all the information the bacteria themselves need to function, but all the "frontloaded" information specifying all the species to come. As we go through the species, we need progressively less and less frontloading, until we come to humans, who need least of all. If we are to believe you, humans have no junk DNA. Thus bacteria would need genomes much, much larger than humans. Therefore we look at genome size and - oh, whoops!

My condolences on the sad demise of your theory.

Anonymous said...

Larry said
"Unfortunately, these are the exceptions, not the rule"
Really? The majority of scientific papers on this topic are poor?

Decades ago when I was an undergrad I took a grad class on bad papers. Most of those dealt with technique. For example, a paper that claimed to show DNA in peroxisomes neglected to do a simple DNase control. I'm thinking now that one could do such a class but expanded to cover bad scholarship and poor writing. Sometimes one can learn more about a process by studying bad examples rather than brilliant ones.

Anonymous said...

Photosynth said
" How big do you think genomes would have been in primitive life forms? How big would the populations be if primitive life forms were mostly self-replicators? Would population sizes have anything to do with whether a genome "melts down" or not? What about the amount of such genomes that has functions?"

I think more important than all of that is whether the primitive life forms would have had recombination. Then they could have recombined out bad mutations from a lineage.

Anonymous said...

I really don't think recombination would have been important in early life forms. Replication would sometimes lead to exact copies, sometimes different copies that still worked, and sometimes different copies that didn't work. Those last would die (or fail to reproduce), weeding out bad mutations from the lineage.

Diogenes said...

Journal of Creation will be a "peer-reviewed science journal" when monkeys fly out my ass. Name one discovery, just one, first published in that tripe.

Like all creationist journals, they state outright that they will not publish articles that challenge the "creation model." All articles are thus organized around a hypothesis they seek to make non-falsifiable. Nope, not science.

Diogenes said...

No Peer, it's really very simple. The earliest self-replicators, being small and simple, could have huge population sizes and reproduce very quickly. With huge population sizes and super-fast replication speeds, super-high copy fidelity would not be needed.

Diogenes said...

Larry, wonderful post, well-written. I'm just sorry that you've been forced to write basically the same post over and over criticizing terrible post-ENCODE papers.

steve oberski said...

Hey Peer,

For what it's worth, I wouldn't have bought the book, with or without a "review".

This is based on your reputation as an IDiot troll, as you have so amply demonstrated and continue to demonstrate.

Anonymous said...

Diogenes,
It seems to me that replication speed, population size and number of progeny are not relevant for this problem; that just leads to more copying errors faster. I think the key is how tolerant the early replicators were to mutation (how large was the functional space for those replicators) and whether they could recombine, which I don't think is far-fetched in an RNA world. If they could recombine there would always be some lineages that would recombine out harmful mutations, no matter how high the mutation rate.

Anonymous said...

Iantog,

It seems to me that replication speed, population size and number of progeny are not relevant for this problem; that just leads to more copying errors faster.

Nope. I agree that tolerance to mutations, functional/sequence space, and recombination are important, but recombination becomes important mostly after selection purges out a lot of the crap (recombining damaged genomes with advantageous ones might not be very productive). Population genetics theory helps a lot in understanding why population size is an important factor, besides genome size, etc.

The Other Jim said...

Also, in a small "genome", back-mutation of specific sites would become more common.

Eelco van Kampen said...

Worse - 'Journal of Creation' has a clear 'statement of faith' that the editors (and thus the authors) need to adhere to (http://creation.com/journal-of-creation-writing-guidelines). That makes it a religious publication, not a science journal.

peer said...

"Transforming the errors per genome into a percent is a distraction."

Of course not. It is a mathematical presentation of genome replication fidelity.

And because it does not serve the Darwinian-evolutionary paradigm, it is called a distraction.

Copying fidelity can simply be presented as errors per total sequence per generation. It depends heavily upon proofreading and repair mechanisms, and CF will decrease with decreasing proofreading and repair enzymes.

CF must have been almost perfect from the start.

peer said...

The earliest self-replicators, being small and simple, could have huge population sizes and reproduce very quickly. With huge population sizes and super-fast replication speeds, super-high copy fidelity would not be needed.

We can summarize this as: blablabla. Non-science. Not a skerrick of scientific evidence for this blablabla. Are you, as a scientist, sticking to scientific facts? Surely not. Just blabla.

peer said...

"Let us say in accordance with your, ehm, "theory," that we have bacteria whose genomes must contain not only all the information the bacteria themselves need to function, but all the "frontloaded" information specifying all the species to come. As we go through the species, we need progressively less and less frontloading, until we come to humans, who need least of all. If we are to believe you, humans have no junk DNA. Thus bacteria would need genomes much, much larger than humans. Therefore we look at genome size and - oh, whoops!"

Bacteria strive for simple and streamlined genomes. Loss of information and unused genes is what is evident from Lenski's experiments. Frontloading predicts this.

Further, you describe the general frontloading theory. I personally stick to the special frontloading theory, which holds independent origin of several forms of life.

The hypothetical LUCA is also perfectly in accord with frontloading, having the same genes manyfold over (Whitfield, “Origins of life: Born in a watery commune,” Nature 427:674-676, 19 February 2004).

Frontloading is the only viable evolutionary theory.

peer said...

Furthermore, as evident from PCR experiments, the most simple replicator will win the reproduction race for survival, not the more complex.

Do you even understand the primary principle of your belief system?

AllanMiller said...

I've never quite understood the 'meltdown' argument on the grand scale. It seems to assume that the only things that happen are deleterious mutations and Muller's ratchet. If replicators continue to be produced, selection will deal with the lineages that aren't very good at it, leaving the rest. If (say) 5 mutations is a threshold, and the world has ratcheted up to be full of 4-mutation individuals, then in that genomic context, a 5th is lethal, when it would have been merely detrimental had it happened sooner. So ... the world belongs to those that don't produce that lethal mutation, or compensate in some other way.

A mechanism that can extinguish one lineage does not have the power to extinguish them all assuming the replication exponent was ever >1 - it's not as if we'd ever be likely to get to a stage where every mutation in every lineage is lethal.
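
A toy simulation illustrates the point. This is only a sketch; the population size, mutation rate, and the 5-mutation lethality threshold are arbitrary numbers taken from the example above, not measurements:

```python
import math
import random

# Toy mutation-accumulation model: each individual is just a count of
# deleterious mutations. Anyone at or above THRESHOLD leaves no offspring;
# everyone else has an equal chance of being a parent.
POP_SIZE = 1000      # constant population size (arbitrary)
MUT_RATE = 0.5       # mean new deleterious mutations per offspring (arbitrary)
THRESHOLD = 5        # mutation count at which an individual is inviable
GENERATIONS = 200

def poisson(lam):
    """Poisson-distributed random integer (Knuth's algorithm)."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        k += 1
        p *= random.random()
        if p <= limit:
            return k - 1

def next_generation(pop):
    viable = [m for m in pop if m < THRESHOLD]   # selection removes threshold-crossers
    if not viable:
        return []                                # the whole population melted down
    return [random.choice(viable) + poisson(MUT_RATE) for _ in range(POP_SIZE)]

pop = [0] * POP_SIZE
for gen in range(1, GENERATIONS + 1):
    pop = next_generation(pop)
    if not pop:
        print(f"population extinct at generation {gen}")
        break
else:
    print(f"after {GENERATIONS} generations: {len(pop)} individuals, "
          f"mean load {sum(pop) / len(pop):.2f}")
```

With these settings the mean load climbs toward the threshold and then sits just below it: lineages that pick up the fatal mutation disappear, but they are replaced by descendants of lineages that did not, so the population as a whole persists.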

peer said...

"selection purges out a lot of the crap (recombining damaged genomes with advantageous ones might not be very productive"

...without a nearly perfect CF, you cannot talk about genomes. Genomes depend on a nearly perfect CF.

Why don't you guys set up an experiment, instead of all the blabla.

Take a plasmid containing one functional gene. Amplify it using PCR with and without proofreading enzymes...

Check the functionality of the gene after every tenth amplification round.

Surely, the one without proofreading degenerates with the speed of light.

The other less fast.

You may as well do the reverse. Take a plasmid containing a nonsense code...and amplify it in a PCR. Will we ever observe any functionality of the nonsense gene?
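
For what it's worth, the expected error load in that kind of protocol can be written down directly. Here is a minimal sketch; the per-base, per-copy error rates below are round placeholder numbers chosen only to illustrate the arithmetic, not measured values for any particular polymerase:

```python
# Error load for a copy-of-a-copy lineage: a molecule that is c copying
# events deep carries, in expectation, about e * L * c errors and is
# error-free with probability (1 - e) ** (L * c).
# The two rates stand in for "without" and "with" proofreading; both are
# placeholders, not data.
L = 1_000   # length of the hypothetical gene, in bp

for label, e in (("without proofreading (assumed 1e-4 /bp/copy)", 1e-4),
                 ("with proofreading    (assumed 1e-6 /bp/copy)", 1e-6)):
    for copies in (10, 20, 30):
        expected = e * L * copies
        intact = (1 - e) ** (L * copies)
        print(f"{label}, {copies} rounds: ~{expected:.3f} errors/molecule, "
              f"{intact:.1%} error-free")
```

Whatever rates are plugged in, the calculation only describes error accumulation in the absence of selection, which is the objection raised in the replies below.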

AllanMiller said...

Furthermore, as evident from PCR experiments, the most simple replicator will win the reproduction race for survival, not the more complex.

In a selective environment that rewards shorter genomes, shorter genomes will have the advantage. So what?

AllanMiller said...

Take a plasmid containing one functional gene. Amplify it using PCR with and without proofreading enzymes...

Check the functionality of the gene after every tenth amplification round.


So you completely remove selection? You think that's a valid protocol?

peer said...

Allan Miller,

selective extinguishing of lineages loaded with slightly deleterious mutations is known as truncated selection. It does not work, because the input of nearly neutral mutations far exceeds the power of truncated selection.
The point is also redundancy, in particular in slowly reproducing diploid (2n) organisms, so that selection simply cannot see the nearly neutral mutations (compensation).

You must believe that selection is a sort of god (it is a God-substitute indeed), having the power to do everything and anything. For a belief that is okay, however. But as scientists we would like to have some proof. Isn't it?

The scientific evidence shows that selection can only mitigate the decay of genomes. It is like copying a page from a book. After several rounds of making copies of copies, the text will become less and less readable. Of course you are allowed to pick out (select) the most readable copy and start again making copies using this one as the original, but eventually copies of copies will not be readable.

Loss of information through random mutations is a law of life.

It is so easy to understand. Only those committed to the selection god will be blind to the obvious.

Anyway...it's your life.

peer said...


"So you completely remove selection? You think that's a valid protocol?"

We were discussing copying fidelity (CF). Yes, for CF this is a valid protocol.

Selection has to be determined in another way. Such as what I proposed on making copies of copies. Select the best and then make copies of copies again. Select the best, etc. It slows down genomic entropy, the loss of information.

peer said...

Larry, wonderful post, well-written. I'm just sorry that you've been forced to write basically the same post over and over criticizing terrible post-ENCODE papers.

I get your point. Before ENCODE there was hope for the Darwinians and a belief for the atheist. After ENCODE there is no hope for Darwinians and the atheist's belief system is shattered.

Or as Graur said: If ENCODE is true, then evolution is wrong.

That is what it is all about. A belief system.

peer said...

"In a selective environment that rewards shorter genomes, shorter genomes will have the advantage. So what?"

Apparently, you do not understand that selection is nothing but differential reproduction?

The entity that leaves the most copies will always make up the population. This is another law of living systems. It does not tell us anything about information, neither about complexity.

Only if you link an increase of information to increased reproduction is Darwinian evolution possible. Biology shows us the opposite. Increased complexity is linked to reduced reproduction.

Get it? Nothing in Darwinism makes sense in the light of biology.

Ed said...

Peer says:
"I personally stoch to the special frontloading theory, which holds independent origin of several forms of life."

LOL Special frontloading... now the religious monkey comes out of the sleeve, special creation of humans obviously. YEC meets ID.

AllanMiller said...

You must believe that selection is a sort of god (it is a God-substitute indeed), having the power to do everything and anything. For a belief that is okay, however. But as scientists we would like to have some proof. Isn't it?

Don't be silly. Selection is differential reproduction correlated with genotype. How much proof of that do you need?

Selection will remove any lineage that exceeds some imaginary 'meltdown' threshold. That leaves all the rest. There is no case for saying that all lineages everywhere should wink out.

AllanMiller said...

"So you completely remove selection? You think that's a valid protocol?"

We were discussing copying fidelity (CF). Yes, for CF this is a valid protocol.


So you can show that non-faithful replication without selection produces an excess of non-faithful copies? Stunning. Get on the phone to Nature; they are gagging for material of this quality.

judmarc said...

Bacteria strive for simple and streamlined genomes.

Do they wear t-shirts and spandex while they "strive"?

Further, you describe the general frontloading theory. I personally stick to the special frontloading theory, which holds independent origin of several forms of life.

Ah, of course - independent origin, so there is no necessity for frontloading!...oh, wait.

AllanMiller said...

"In a selective environment that rewards shorter genomes, shorter genomes will have the advantage. So what?"

Apparently, you do not understand that selection is nothing but differential reproduction?


What makes you think I don't appreciate that fact? In an environment where shorter genomes produce more descendants ... guess what will happen?

AllanMiller said...

Loss of information and unused genes is what is evident from Lenski's experiments. Frontloading predicts this.

So gradually, the frontloaded genome has been shorn of its information to make modern [insert arbitrary clade here]? Must have had a sodding big genome once. Easily outcompeted by its striving descendants.

peer said...

The frontloaded genomes have by now all fallen apart into several different species. Check out the cichlids in the African Rift Valley lakes...

About 12,000 years ago Lake Victoria was dried up. Within 6,000 generations 500 species emerged, filling all niches, from vegetarians to parasites to predators. New genetic information is not involved. We only need a novel regulatory context of preexisting information.

That is explained by frontloading theory. There is no need for millions of years of Darwinian selection and accumulation of genetic noise. Darwinian evolution of novel codes is not even possible within such time frames.

Frontloading will soon be the new standard.

peer said...

"Must have had a sodding big genome once. Easily outcompeted by its striving descendants. "

Ask Moran about the onion genome...

peer said...

"LOL Special frontloading... now the religious monkey comes out of the sleeve, special creation of humans obviously. YEC meets ID."

Ed, I was referring to "general" and "special frontloading" in the same way as "general" and "special relativity"...

Considering the biological facts, special frontloading better suits the observations.


Eelco van Kampen said...

"Further, you describe the general frontloading theory. I personally stoch to the special frontloading theory, which holds independent origin of several forms of life."

and

"Ed, I was refering to "general" and "special frontloading" in the same way as "general" and "special relativity"..."

This is getting sillier by the day ... how on Earth would that be 'in the same way' ???

Please elaborate - it's Friday, so you are allowed to make a fool of yourself (again). You do that so well, after all.

Ed said...

"Ed, I was refering to "general" and "special frontloading" in the same way as "general" and "special relativity"..."

Yes, tbh this explanation really doesn't surprise me. With your bloated ego, it's no wonder you like to identify yourself with the BIG names in science. General and special frontloading... sheesh.
The publisher of your book in the Netherlands has this banner on its site where they compare you with Einstein and Darwin. The banner is about a campaign on 'fair science', i.e. schools should be allowed to teach (drums please)... creationism instead of evolution science. (I bet you didn't see that one coming!)
So yeah, it doesn't surprise me that your ego is as bloated as it is, and your religious agenda was also very obvious.

Anonymous said...

Did you notice guys? "Peer" ignored everything explained to him just to keep believing that he's right. He's a self-deluded fool. Nothing will make him listen because he doesn't want to.

Diogenes said...

Peer, some people here *worked for* or advised ENCODE. I know people who advised ENCODE. Our own Georgi Marinov, to name one, *works for* ENCODE and he's the first to admit that ENCODE got it wrong: that the claim of 80% functionality was false by every pre-existing definition of "function."

Graur was right: if ENCODE is true, evolution is wrong. But people who *work for* or advised ENCODE acknowledge that ENCODE's functionality claim was wrong.

Should I believe them or a nobody religious fanatic pushing creationism, which the publisher of your book in the Netherlands says is what you're selling?

(Georgi might add two caveats: the ENCODE database might be useful to scientists somewhere down the road, even if the 80% functionality was bullshit; and the 80% would be true if we redefine "function" = interacts with any molecule at any time.)

Your creationism never had any scientific hopes to begin with. All the hopes of creationism were pinned on religiously-motivated political successes. Everything "scientific" in IDcreationism was fraud, hoaxes and non-falsifiable statements like the "special theory of frontloading."