Wednesday, October 23, 2024

Philip Ball doesn't understand sloppy genomes

... anything found to be true of E.coli must also be true of Elephants.
                                                         Jacques Monod (1961)

This version of the famous statement by Jacques Monod comes from 1961, but he said similar things much earlier, and other scientists even predate Monod's earliest use of the phrase (Friedman, 2004). He echoed this same idea in Chance and Necessity (p. 102):

The diversity of types remained even so, and there was no getting around the fact that a great many macroscopic structural patterns, radically unlike one another, coexist in the biosphere. A blue alga, an infusorium, an octopus, and a human being—what had they in common? With the discovery of the cell and the advent of cellular theory a new unity could be seen under this diversity. But it was some time before advances in biochemistry, mainly during the second quarter of this century, revealed the profound and strict oneness, on the microscopic level, of the whole of the living world. Today we know that from a bacterium to man the chemical machinery is essentially the same, in both its structure and its functioning.

Monod was making a case for life as a chemical process and he reflected the view of the 'phage group who were studying bacteria and bacteriophage. He argued that all living things would consist of the same basic chemicals such as lipids, nucleic acids, proteins, and carbohydrates. He also assumed that all living things would have similar networks of metabolic enzymes and contain similar pathways. These enzymes would be regulated by similar mechanisms, such as allosteric regulation, and they would be composed of the same 20 amino acids. He expected all living cells would have similar mechanisms for capturing energy and they would obey the fundamental laws of thermodynamics.

He assumed that the genetic code would be universal and that the process of protein synthesis would be essentially the same in all species. He assumed that the fundamentals of transcription and DNA replication would be the same in all species. He imagined that the basic principles of gene regulation that were worked out in bacteria would apply to eukaryotes. This included the action of transcription factors and more unusual regulatory molecules such as the regulatory RNAs discovered in 'phage and bacteria. He expected that genes, regulatory sequences, origins of replication, and other important genetic elements would be found in the DNA molecules of the genome.

This theme of unity of life at the microscopic level was very important but it did not mean that all living things would be identical. Monod was a firm proponent of evolution, and since evolution depends on the random occurrence of mutations, the actual history of life is unpredictable. There's nothing profoundly upsetting about the fact that elephants have trunks and E. coli doesn't because that's not the point.

I'm sure that Monod was not upset to learn that some genes had introns or that eukaryotic chromatin is more complicated than the DNA-protein complexes found in bacteria. He would not have been shocked to learn that many eukaryotes have more functional RNAs than E. coli or bacteriophage λ. Junk DNA was not a problem for someone who understood evolution.

I think Monod reflected the dominant view of most knowledgeable biochemists and molecular biologists of the 1960s and 1970s.

Over the next 50 years we learned a lot more about complex eukaryotes and the dominant theme at the molecular level is that they contain lots of junk DNA and lots of overly complex structures that only make sense in light of evolution. There's a lot of sloppiness in eukaryotes, including genomes full of transposon fossils, aberrant transcription, pseudogenes, inefficient splicing, and promiscuous enzymes. A lot of this sloppiness was apparent in the 1970s, including the fact that junk DNA must contain thousands of ineffective transcription factor binding sites. We learned in the 1980s that some structures, such as the spliceosome, could only have arisen by evolution since no designer in their right mind would have built such a thing.

I would be quite proud to have served on the committee that designed the E. coli genome. There is, however, no way that I would admit to serving on a committee that designed the human genome. Not even a university committee could botch something that badly.
                                                         David Penny

I got this quote from Dan Graur who credits it to David Penny as a personal communication. Graur used it in his scathing criticism of ENCODE researchers after they declared the death of junk DNA (Graur et al., 2013). The meaning is clear. The E. coli genome is compact and carries all the information needed to ensure the survival and evolution of the bacterium. It has one copy of most protein-coding genes, two copies of ribosomal RNA genes, and a minimal number of tRNA genes. The regulatory sequences are just big enough for efficient transcription under the appropriate conditions. Many genes are clustered in operons to save space. There's only one origin of replication and one terminator sequence. There's only one chromosome and it is efficiently segregated to each daughter cell after DNA replication and cell division. There are only a small number of regulatory RNA genes in E. coli.

The human genome is a mess. 90% of it is junk and it requires complicated features like centromeres and telomeres. There are 100,000 origins of replication and tens of thousands of pseudogenes. The protein-coding genes are full of useless introns and they take up 40% of the genome even though the functional parts only occupy 1%. Every cell has thousands of incorrectly spliced transcripts. The genome is littered with fossil transposons and viruses and many of them still have partially active promoters churning out junk RNA. Useless transcription factor binding sites and chromatin alterations are ubiquitous. The abundance of junk DNA means that you need tens of thousands of copies of every transcription factor just to make sure the right genes are regulated. A large part of the genome is transcribed but the vast majority of those transcripts are useless junk.

This is why David Penny would not be proud to have served on the committee that designed the human genome. Neither would I, and that's why I spent so much time explaining sloppy genomes in my book. The idea of a sloppy genome is a difficult concept to grasp so I devoted the final chapter (Chapter 11) to the art of coping with this issue.

Now let's look at how Philip Ball handles this information on pages 116-117 of his book How Life Works.

These differences in the relative proportions of coding and non-coding DNA for simpler and more complex organisms reflect fundamental distinctions in how these organisms work. The problem has been delightfully, if inadvertently, stated by theoretical biologist David Penny. "I would be quite proud to have served on the committee that designed the E. coli genome" he has said. "There is, however, no way that I would admit to serving on a committee that designed the human genome. Not even a university committee could botch something that badly."

I'd suggest that can be rephrased: "I can understand how the E. coli genome works. I cannot make any sense of how the human genome works." So the corollary of Penny's comment is rather profound: how E. coli works is not how humans work. But his quip betrays an understandable frustration that the workings of the human genome are inscrutable to us. And I fear that the remark carries the same bias as that which leads us to insist that a foreign language we find difficult to learn is unnecessarily perverse and even absurd.

This shift in perspective challenges a famous statement by Jacques Monod: "What is true for E. coli is true for the elephant." In fairness, Monod had in mind here the notion of how DNA encodes proteins—for indeed it does so in (roughly) the same way in bacteria as in pachyderms, insofar as it uses the same genetic code. But the implication in Monod's comment is that this is what really matters in the same spirit as Crick's Central Dogma. We can now see that Monod's quote is misleading in an important sense, because what matters for E. coli is not the same as what matters for an elephant. The bacterium has a genome dedicated mostly to making proteins. The elephant has a genome dedicated mostly to making noncoding RNAs with regulatory functions. To truly understand how the elephant—and the human—works, we need to untangle the mechanisms governing this regulation.

As Morris and Mattick say,

It appears that we may have fundamentally misunderstood the nature of the genetic programming in complex organisms because of the assumption that most genetic information is transacted by proteins. This may be largely true in simpler organisms, but is turning out not to be the case in more complex organisms, whose genomes appear to be progressively dominated by regulatory RNAs that orchestrate the epigenetic trajectories of differentiation and development.

Or as biochemist Danny Licatalosi and neuroscientist Robert Darnell put it, biological complexity "has RNA at its core."

I think this is an excellent illustration of the differing viewpoints of Philip Ball and many biochemists and molecular biologists. David Penny and the rest of us don't disparage the human genome because we don't understand it. Quite the contrary. We think we DO understand evolution and the basic principles of molecular biology and that's why we recognize a sloppy genome when we see it. Philip Ball just can't get his head around the fact that we aren't ignorant of functional non-coding RNAs ... we just don't believe Mattick and ENCODE when they claim, without evidence, that the human genome is full of non-coding genes modulating some sophisticated regulation of the protein-coding genes.

Not only does such a model lack support but it doesn't make any sense. Why would all the 10,000 or so housekeeping genes require such regulation in humans and not in yeast? Why would evolution have selected for regulatory RNAs acting on the genes for the glycolytic enzymes? What kind of selective advantage would there have to be in order to evolve a regulatory RNA gene that could tweak expression by a few percent?

"Ball is one of the most meticulous, precise science writers out there. He is the antithesis of hypey, "dumb-it-down" reporting. He is MUCH more credible than you are, Laurence."

John Horgan, July 2024

Philip Ball even wants to twist the Monod quote to fit his agenda on the importance of proteins. That's not what Monod meant. But let's think about this for a minute. The biochemists of the last century discovered a complex network of metabolic pathways with reactions that were catalyzed almost exclusively by protein enzymes. That hasn't changed. It's true of E. coli and it's true of elephants.

They also discovered that the expression of genes, especially at the level of transcription, was mostly controlled and regulated by proteins; namely, RNA polymerase and transcription factors. That hasn't changed. The expression of elephant and human genes is also regulated by transcription factors and RNA polymerase. Hundreds of studies of particular mammalian genes have demonstrated beyond a doubt that we can explain most regulation by such a model.

That doesn't mean that proteins are the only players in regulation. Over the past several decades we've discovered a variety of regulatory RNAs and we now know that there are more of these non-coding genes in humans than in bacteria. We don't know how many but so far the number of well-characterized examples amounts to fewer than 2000 genes and probably less than 1000. Note that I said "well-characterized" examples and that means that the individual RNA molecule has been studied and its biologically relevant function has been confirmed. That's not the same as a genomics study that simply identifies candidate transcripts that may or may not have a function.

Proteins still play the most important functional roles in metabolism and gene expression but they are not the only players. We've known that for 50 years. The only thing that's changed is that there may be as many as two thousand non-coding genes in humans and only a dozen or so in E. coli and the human genome may be a lot more sloppy than bacterial genomes. That's not a paradigm shift.

Note: Philip Ball was an editor at Nature and that's ironic because it's the failure of Nature editors to do their job in 2012 that got us into the mess we're in today. The editors not only allowed ENCODE researchers to make exaggerated claims about junk DNA but they actively supported and participated in the publicity campaign that sold those false claims to the general public. Nature editors have never apologized for their behavior in 2012; in fact, one of them, Magdalena Skipper, has been promoted to editor-in-chief. [The 10th anniversary of the ENCODE publicity campaign fiasco]


Friedman, H.C. (2004) From Butyribacterium to E. coli: An Essay on Unity in Biochemistry. Perspectives in Biology and Medicine 47:47-66. doi: 10.1353/pbm.2004.0007

Graur, D., Zheng, Y., Price, N., Azevedo, R.B., Zufall, R.A. and Elhaik, E. (2013) On the immortality of television sets: “function” in the human genome according to the evolution-free gospel of ENCODE. Genome Biology and Evolution 5:578-590. doi: 10.1093/gbe/evt028

7 comments:

  1. I find it very difficult to see how Dr Ball can reconcile statements like this: "The elephant has a genome dedicated mostly to making noncoding RNAs with regulatory functions" with his acknowledgement on Sandwalk that between 70% and 90% of the genome is junk. The word "mostly" is highly inappropriate here.

    How the genome is able to untangle the genomic wheat from junk chaff is a wonder of nature and is an amazing story. Prof Moran's sloppy, error prone, spuriously transcribing genome makes far more sense to me and I very much enjoyed, and recommend, his book.

    However, I have been debating pre-purchasing the paperback edition of Dr Ball's book, though I must be honest the reviews on Amazon are VERY off-putting:

    "It seems to me that Philip Ball has – exhaustively and conclusively – set out the evidence for believing that (1) there is some sort of purpose or teleology in nature akin to Aristotle’s final causes and (2) consciousness must be more mysterious and widespread than simply being an epiphenomenon exclusive to the animal mind."

    Oh dear. Is that an accurate summation?

  2. "Not only does such a model lack support but it doesn't make any sense. Why would all the 10,000 or so housekeeping genes require such regulation in humans and not in yeast? Why would evolution have selected for regulatory RNAs acting on the genes for the glycolytic enzymes? What kind of selective advantage would there have to be in order to evolve a regulatory RNA gene that could tweak expression by a few percent?"

    I have a hard time aligning this comment with an author who is, by all accounts, quite comfortable with the concept of neutral or near-neutral evolution. Why are the tens of thousands of human genes subject to RNA-based modes of regulation? Because they can be. Once the system of small RNA-based regulation appeared, drift - the appearance of new small RNAs, the evolution of extant RNAs and their targets, etc. - ran rampant. Why would one expect the Small RNA World (or, for that matter, the lncRNA World) to be apart from genetic drift and neutral evolution when so much (most, nearly all) of life is governed by the same?

  3. @Arthur Hunt If the small RNAs are functional then their genes must be preserved by natural selection. They must be subject to purifying selection. [Nils Walter disputes junk DNA: (3) Defining 'gene' and 'function']

    Selection is the opposite of random genetic drift.

  4. One thing I do agree with here is this article’s title. I don’t understand sloppy genomes. (I don’t think “sloppy” is quite the right word, however: I think the fuzzier operational principles of the human genome are there for a reason, and in my book I explain why.) The reason I don’t understand them is not because I am ill-informed or stupid but because I am not deluded. It is because I have spent a lot of time speaking to folk who are trying to understand how these genomes work, and so I know that they know they don’t understand them either. There is clearly so much that remains unknown or unclear yet of vital importance to the question: how the 3D structure of chromatin is controlled, how specificity of regulation is achieved from all those transcription factors and binding sites, how TFs and ncRNAs collaborate in transcriptional hubs, what those hubs even are, and so on and so on – and yes, also exactly what ncRNA does and doesn’t do or how much of it does anything.

    Larry explains what David Penny means by his comment, and it’s a good summary of what I mean, and what I say. What he takes issue with is the suggestion that dismay at the “messiness” of our genome comes from not understanding it. “We think we DO understand evolution and the basic principles of molecular biology.” Sure, I’ll buy that – and it doesn’t for one moment mean we fully understand how or why the human genome does what it does, or that what we don’t understand is mere detail. It is possible that Larry has figured it all out and is mysteriously being denied his Nobel prize, but I’d venture to say that most “knowledgeable biochemists and molecular biologists” working on these problems would disagree.

    More misdirection: “There's nothing profoundly upsetting about the fact that elephants have trunks and E. coli doesn't because that's not the point.” Yes, that is not the point, so why mention it, unless you want to imply that I think it is the point?

    Finally:
    “Philip Ball was an editor at Nature and that's ironic because it's the failure of Nature editors to do their job in 2012 that got us into the mess we're in today.” I was not a Nature editor in 2012, so what point is being made here? Even if we all agreed that the ENCODE paper was utterly wrong, this comment is totally otiose.

  5. Neil Taylor: If you read my book, you will see that that comment from an Amazon review is – I was going to say nonsense, but it is probably fairer to say, a reflection of that reader’s preoccupations. Of course, whether you read it or not is your decision – but personally, I choose to read books because I want to find out what the author says, and not only when I have checked in advance what others think the author says.

  6. @Philip Ball You stated earlier that you can accept the evidence showing that 90% of our genome is junk although you also think that, contrary to the evidence, that value could be as low as 70%.

    Assuming that you can live with 90% junk, that must mean that a large percentage of our genome is transcribed to produce junk RNA. It must also mean that there are millions of transcription factor binding sites that have no functional relevance. It means that you can accept the idea that 90% of our genome is fixing alleles at the neutral rate.

    I refer to such a genome as a "sloppy" genome. Do you have another word that you think is more suitable?

    You seem to be implying that there's some mysterious function or purpose hidden in our genome - something that we don't understand. Are you referring to the junk DNA (90%) or are you only referring to the 5% or so that hasn't been assigned a function?

  7. @Philip Ball I know that you weren't an editor in 2012 and, furthermore, I know that you were never directly involved in editing biology articles since that's not your area of expertise.

    I was referring to the fact that being an editor at Nature isn't nearly as important as people used to think. It does not mean that you understand the papers that are being published.
