Monday, November 26, 2018

Deflated egos and the G-value paradox

The Deflated Ego Problem refers to the fact that many scientists were very disappointed to learn we had less than 30,000 genes. Those scientists were expecting that the human genome would contain many more genes in line with their belief that humans must be genetically more complex than the "lower" animals. They should have known better since knowledgeable experts were predicting fewer than 30,000 genes and these same experts knew that humans don't need many more genes than other animals [see: Revisiting the deflated ego problem].

Disappointed scientists don't use the term "deflated ego;" instead they refer to their problem as the G-value paradox. This makes it seem like a real problem instead of just a mistaken view of evolution.

I just read the original paper on the G-value paradox and I think it's worth quoting some of it because it clearly states the issues. The paper is by Matthew Hahn and Gregory Wray (Hahn and Wray, 2002), written when they were both at Duke University in Durham, North Carolina, USA. (Hahn was a graduate student in Wray's lab; he is now a professor at Indiana University.) It's important to note that these are respected scientists and their views are still very popular.

They begin their brief review by expressing surprise that the human genome had "a surprisingly modest 31,000 genes." Then they explain why this is a problem.
Even though sequencing the human genome may be merely a first pass at a deeper understanding of our biology, one fact stands out as demanding an immediate explanation: Why do humans have so few genes?

The assumptions and chauvinism implicit in this question—that humans are vastly more complex than the other fully sequenced eukaryotes and should therefore have a commensurately larger suite of genes—are difficult to argue clearly and may be even more difficult to justify biologically. Still, it is hard to deny our intuitive perception that the number of genes in a genome should be roughly correlated with complexity and that organismal complexity can be ranked as yeast < nematodes < flies < humans (we reserve judgment on the relative position of the “green fly,” Arabidopsis). However, the number of genes in the genomes of these organisms does not match our naive expectation.
This is a perfect description of the Deflated Ego Problem: the number of genes did not match their naive expectation. Normally, when a scientific result doesn't match your expectations (and you recognize that your expectations were "naive"), that would cause you to reevaluate your expectations, especially when you learn that other scientists did not share them.

That's not what happened in most cases. What usually happened is that disappointed scientists attempted to justify their naive expectations and then proposed solutions that still make humans more genetically complex even though they have the same number of genes as other animals.

Here's what the justification looks like.
This disjunction between the number of genes and organismal complexity, what we call the “G‐value paradox,” parallels the finding during the 1950s that the physical size of genomes does not correlate with organismal complexity, a relationship known as the C‐value paradox. The finding that much of the genome contains noncoding repeats and “junk” DNA seemed to resolve the C‐value paradox. Implicitly, this resolution rested on the assumption that once noncoding DNA was taken into account, the total number of genes would then correlate with organismal complexity (Cavalier‐Smith 1985). However, the published G values of the completely sequenced eukaryotes make it clear that we have not yet resolved the C‐value paradox—it has merely given way to the G‐value paradox.
Do you see what they just did? They tried to convince you that other scientists were also expecting there to be more genes because humans are more complex. This "problem" even has a name: it's called the G-value paradox.

What they did not do in their paper was to mention that many knowledgeable scientists were predicting fewer genes and their predictions were backed up by solid evidence (e.g. Ewing and Green, 2000; Roest Crollius et al., 2000). Nor did they mention the idea coming from evo-devo that complexity doesn't correlate with the number of genes but with differential control of a core set of genes. I'm struggling to understand why there's so much resistance to this key concept that's strongly supported by data from a number of model organisms.

Here's what the "solution" to the G-value paradox looks like.
Just as the discovery of noncoding DNA seemed to resolve the C‐value paradox, so a few simple observations may in time resolve the G‐value paradox. These observations all attempt to give more value to each of our genes and thus to give us a more accurate genomic predictor of organismal complexity by identifying the true measure of information encoded by a genome, the "I‐value." Some of the observations we discuss here have been offered as the answer to explaining our modest number of genes, whereas some have been invoked in combination. These observations indicate that the evolution of organismal complexity will typically involve changes in the genome that are subtler than simply adding genes. The C‐value paradox was resolved by a plea to the G value; a resolution of the G‐value paradox may be offered by a plea to the I value.
The most common ways of giving "more value" to existing genes, according to Hahn and Wray, are alternative splicing and posttranslational modifications. Both of these possibilities give you more bang for the buck with the same number of genes because each gene can produce multiple products.

We now know that neither process causes a significant increase in the number of gene products but, even in 2018, it's still widely believed that this is the answer to the G-value paradox.

As I mentioned above, I'm having difficulty understanding why the G-value paradox was created in the first place but that difficulty pales in comparison to my difficulty in understanding why it remains so popular in 2018. Perhaps my readers can help me out. Do you have a problem accepting that humans have about the same number of genes as other mammals, fish, or insects? If so, can you explain to me why you think this is a problem that demands a solution?

Ewing, B., and Green, P. (2000) Analysis of expressed sequence tags indicates 35,000 human genes. Nat Genet, 25:232-234. [doi:10.1038/76115]

Hahn, M.W., and Wray, G.A. (2002) The g-value paradox. Evolution and Development, 4:73-75. [doi:10.1046/j.1525-142X.2002.01069.x]

Roest Crollius, H., Jaillon, O., Bernot, A., Dasilva, C., Bouneau, L., Fischer, C., Fizames, C., Wincker, P., Brottier, P., Quetier, F., Saurin, W., and Weissenbach, J. (2000) Estimate of human gene number provided by genome-wide analysis using Tetraodon nigroviridis DNA sequence. Nat Genet, 25:235-238. [doi:10.1038/76118]


  1. When I was a lowly undergraduate in the late 1970s, we discussed this issue in my genetics class…and the professor was quite clear that we might have a few more genes than Drosophila, but not by much, and there was no good reason to expect humans to have significantly more genes than other animals.

    I wonder where the people who are surprised by reality went to school.

  2. Note that Hahn & Wray also confuse "noncoding" with "junk", using the former term when they mean the latter.

  3. Modern creationism would welcome people having the same number as critters because it makes sense from a common design stance. God would do it that way. Not give different numbers just to make a point. All in biology works well with the same number. It's a good idea.
    It shows biology is just twisted/tweaked into its diversity but is still from a (simple?) blueprint.
    In fact, having different numbers of genes etc. would be more welcome in an idea of evolutionism being random and selecting and drifting, and generally as if from chance.
    As research/people do better, things look better for a thinking being behind nature. I'm sure more to come.

    1. A convincing argument. I'm sure the one explaining the wildly varying amounts of junk DNA in different species is just as convincing.

    2. "Modern creationism would welcome people having the same number as critters because it makes sense from a common design stance. God would do it that way. "

      Okay, so since he clearly didn't, God can't have been responsible. Right?

    3. Whenever evolutionary biologists discover some interesting new feature the response of creationists is always the same. They know the mind of god(s) so they are not surprised. After all, that's clearly the way god(s) would do it.

      You can be certain they would give the same response if it turned out that each species has a different set of genes. Apparently their god(s) can be very flexible.

      On the other hand, when it comes to things like the existence of evil, the standard response is that god(s) work in mysterious ways and we can't possibly understand his/her/its purpose.

    4. You can see that inconsistency when they are confronted with bad-design arguments. No, they say, one cannot say that god(s) would not do it that way, one cannot presume to know what they intend. Which is fine, but in the case of junk DNA they make an argument totally inconsistent with that. It cannot be there, they say. Pressed as to why, it's because the Designer they have in mind would not do it that way.

    5. A creationist can only answer it makes sense biology follows laws/order like in physics. So why not a better idea to have a common blueprint for biology and then it fits fine with a common number of genes etc for everyone?
      I think Einstein once said he asked himself WHAT would God do when trying to figure out physics stuff.
      Creationists don't discover the equal numbers of genes but we can say AHA that makes sense as opposed to other options.
      I think evolutionism would easily welcome a diversity in gene numbers etc. They would say it fits with a mechanism that's founded on chance, or almost.

      Bad-design claims are just wrong ideas about biology.
      There was no bad design but then there was after problems came.
      Every bad-design can be explained especially from a YEC stance.
      This is, by the way, a strange way to preserve evolutionism. You're debunking the opposite option only here.
      Is it true that already thoughtful evolutionists realize they can't make a biological scientific case to justify evolution as a theory or hypothesis?

  4. 31,000 genes permit 31,000! simple gene interactions, not to mention higher order interactions and the variable non-coding substrate on which these interactions may be modulated. There is enough complexity in flies, snails, or men to occupy molecular biologists at least several centuries.

  5. Do these guys define the 'complexity' they want to explain, or do they just take an 'I know it when I see it' stance? Having an explicit definition might make the weaknesses of their arguments clearer.

    1. I'm not so sure. Which makes the weakness of their arguments clearer -- having one or not having one?

    2. The most common definition they use is the number of different types of cells and tissues. Using this definition they claim that humans are among the most complex organisms. They even seem to be more complex than other mammals but I’m not sure how that works.

      The nice thing about their definition of complexity is that humans are more complex than trees so it highlights the problem because we don’t have a lot more genes than trees. The inconvenient truth in their analysis is that plants have just as many transcript variants per gene as humans. I guess they’re all junk in plants but in humans they are examples of alternative splicing that increases diversity and complexity.

    3. I think I read that Norwegian pine trees have >20 gigabasepair genome, and it's mostly retrotransposons.

    4. How unambiguous are the criteria for 'different types of cells' across different kinds of organisms? Is our understanding of tree cell biology sufficient for us to be sure we're applying the same standards?

    5. I haven’t looked closely at the references but my impression is that the data on cell types is okay but not great. It seems to be tilted in favour of humans ‘cause we know a lot more about how many distinct cells are present in human organs than in, say, bird organs or fish.

  6. If you mean all possible pairwise combinations of 31,000 genes, that’s 31,000C2, which is (31,000!)/(2!((31,000-2)!)), which is only 480,484,500. If I remember my math, anyway.

  7. Did you read the paper "Testing the retroelement invasion hypothesis for the emergence of the ancestral eukaryotic cell" (Lee et al., 2018)?

    Here is the abstract:

    Phylogenetic evidence suggests that the invasion and proliferation of retroelements, selfish mobile genetic elements that copy and paste themselves within a host genome, was one of the early evolutionary events in the emergence of eukaryotes. Here we test the effects of this event by determining the pressures retroelements exert on simple genomes. We transferred two retroelements, human LINE-1 and the bacterial group II intron Ll.LtrB, into bacteria, and find that both are functional and detrimental to growth. We find, surprisingly, that retroelement lethality and proliferation are enhanced by the ability to perform eukaryotic-like nonhomologous end-joining (NHEJ) DNA repair. We show that the only stable evolutionary consequence in simple cells is maintenance of retroelements in low numbers, suggesting how retrotransposition rates and costs in early eukaryotes could have been constrained to allow proliferation. Our results suggest that the interplay between NHEJ and retroelements may have played a fundamental and previously unappreciated role in facilitating the proliferation of retroelements, elements of which became the ancestors of the spliceosome components in eukaryotes.

    And here is a particular statement:

    " In some animals, for example, the spliceosome can generate multiple mRNAs through alternative splicings of a single primary transcript, allowing access to additional complexity without a concomitant increase in the amount of coding DNA."

    Link for the paper:
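
On the arithmetic raised in comments 4 and 6 above (31,000! versus "31,000 choose 2"), here is a minimal Python sketch that checks both figures. The function names pairwise_interactions and factorial_digit_count are hypothetical helpers made up for illustration, and the gene count of 31,000 is simply the figure quoted from Hahn and Wray. It treats a "simple interaction" as an unordered pair of genes, which is an assumption and not a biological claim.

```python
import math


def pairwise_interactions(n_genes: int) -> int:
    """Number of unordered gene pairs, i.e. n choose 2."""
    return math.comb(n_genes, 2)


def factorial_digit_count(n: int) -> int:
    """Approximate number of decimal digits in n!, using log-gamma.

    lgamma(n + 1) equals ln(n!), so dividing by ln(10) gives log10(n!).
    This is a floating-point estimate, not an exact big-integer count.
    """
    return math.floor(math.lgamma(n + 1) / math.log(10)) + 1


if __name__ == "__main__":
    n = 31_000  # gene count quoted in the post; illustrative only
    print("unordered pairs:", pairwise_interactions(n))
    print("approx. digits in n!:", factorial_digit_count(n))
```

The pair count matches the 480,484,500 given in comment 6. The factorial counts orderings of all the genes, not pairs of them, so it is vastly larger and answers a different question.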