Showing posts sorted by date for query junk dna. Sort by relevance Show all posts

Saturday, March 26, 2022

Science communication in the modern world

Science editors asked young scientists to imagine what kind of course they would have created if they could go back to a time before the pandemic [A pandemic education]. Three of the courses were about science communication.

COM 145: Identification, analysis, and communication of scientific evidence

This course focuses on developing the skills required to translate scientific evidence into accessible information for the general public, especially under circumstances that lead to the intensification of fear and misinformation. Discussions will cover the principles of the scientific method, as well as its theoretical and practical relevance in counteracting the dissemination of pseudoscience, particularly on social media. This course discusses chapters from Carl Sagan’s book The Demon-Haunted World, certain peer-reviewed and retracted papers, and materials related to key science issues, such as the anti-vaccine movement. For the final project, students will comprehensibly communicate a scientific topic to the public.

Camila Fonseca Amorim da Silva University of Sao Paulo, Sao Paulo, Brazil

COM 198: Everyday science communication

As scientific discoveries become increasingly specialized, the lack of understanding by the general public undermines trust in scientists and causes the spread of misinformation. This course will be taught by scientists and communication specialists who will provide students with a toolset to explain scientific concepts, as well as their own research projects, to the general public. Upon completion of this course, students will be able to explain to their grandparents that viruses exist even though they can’t see them, convince their neighbors that vaccines don’t contain tracking devices, and explain the concept of exponential growth to governmental officials.

Anna Uzonyi Department of Molecular Genetics, Weizmann Institute of Science, Rehovot, Israel.

COM 232: Introduction to talking to regular people

Communicating science is difficult. Many scientists, having immersed themselves in the language of their field, have completely forgotten how to talk to regular people. This course hones introductory science communication skills, such as how to talk about scary things without generating mass panic, how to calmly discourage the hoarding of paper hygiene products, and how to explain why scientific knowledge changes over time. The final project will include cross examination from law school faculty, who are otherwise completely uninvolved with the course and possess minimal scientific training. Recommended for science majors who are unable to discuss impactful scientific findings without citing a P value.

Joseph Michael Cusimano Bernard J. Dunn School of Pharmacy, Shenandoah University, Winchester, VA, USA.

They sound like interesting courses but my own take on science communication is somewhat different. I think it's very difficult for practicing scientists to communicate effectively with the general public so I tend to view science communication at several different levels. My goal is to communicate with an audience of scientists, science journalists, and people who are already familiar with science. The idea is to make sure that this intermediate group understands the scientific facts in my field and to make sure they are familiar with the major controversies.

My hope is that this intermediate group will disseminate this information to their less-informed friends and relatives and, more importantly, stop the spread of misinformation whenever they hear it.

Take junk DNA for example. It's very difficult to convince the average person that 90% of our genome is junk because the idea is so counter-intuitive and contrary to the popular counter-narratives. However, I have a chance of convincing the intermediate group, including science journalists and other scientists, who can follow the scientific arguments. If I succeed, they will at least stop spreading misinformation and false narratives and start presenting alternatives to their audiences.


Monday, March 14, 2022

Junk DNA

My book manuscript has been reviewed by some outside experts and they seem to have convinced my editor that my book is worth publishing. I hope we can get it finished soon. It would be nice to publish it in September on the 10th anniversary of the ENCODE disaster.

Meanwhile, I keep scanning the literature for mentions of junk DNA to see if scientists are finally coming to their senses. Apparently not, and that's a good thing because it means that my book is still needed. Here's the opening paragraph from a recent review of lncRNAs. The authors are in the Department of Medicine at the Medical College of Georgia, in Augusta, Georgia (USA).

Ghanam, A.R., Bryant, W.B. and Miano, J.M. (2022) Of mice and human-specific long noncoding RNAs. Mammalian Genome:1-12. [doi: 10.1007/s00335-022-09943-2]

Approximately ninety-eight percent of our genome is noncoding. Contrary to initial descriptions of this vast sea of sequence comprising “junk DNA” (Ohno 1972), comparative genomics and various next-generation sequencing studies have revealed millions of transcription factor binding sites (TFBS) (Vierstra et al. 2020) and tens of thousands of noncoding genes, most notably the class of long noncoding RNAs (LncRNAs), defined currently as processed transcripts of length > 200 base pairs with no protein-coding capacity (Rinn and Chang 2020; Statello et al. 2021). The widespread transcription of LncRNAs and abundance of regulatory sequences such as enhancers support the concept of a genome that is largely functional (ENCODE Project Consortium 2012). Such a dynamic genome should not be surprising given the complex nature of gene expression and gene function necessary for embryonic and postnatal development as well as disease processes.

  • No reasonable scientist, especially Susumu Ohno, ever said that all noncoding DNA was junk.
  • There are millions of transcription factor binding sites but most of them are spurious binding sites that have nothing to do with regulation. They simply reflect the expected behavior of typical DNA binding proteins in a large genome full of junk DNA.
  • Nobody has demonstrated that there are tens of thousands of noncoding genes. There may be tens of thousands of transcripts but that's not the same thing since you have to prove that those transcripts are functional before you can say that they come from genes.
  • There is currently no evidence to support the concept of a genome that is largely functional in spite of what the ENCODE researchers might have said ten years ago.
  • Such a genome would be very surprising, if it were true, given what we know about genomes, evolution, and basic biochemistry.

Except for those few minor details—I hope I'm not being too picky—that's a pretty good way to start a review of lncRNAs. :-)
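The point about spurious binding sites can be made concrete with a back-of-the-envelope calculation. The sketch below is only an illustration and is not taken from the review: it assumes a genome of roughly 3.2 billion base pairs, equal frequencies of the four nucleotides, independent positions, and a hypothetical DNA binding protein that recognizes one exact motif of a given length on a single strand. The function name and the constants are my own choices.

```python
# Back-of-the-envelope sketch: expected chance occurrences of a short
# DNA motif in a large genome. All numbers are illustrative assumptions.

GENOME_SIZE_BP = 3_200_000_000  # assumed approximate haploid human genome size


def expected_random_sites(motif_length: int, genome_size: int = GENOME_SIZE_BP) -> float:
    """Expected number of exact matches to one specific motif on one strand.

    Assumes independent positions and equal base frequencies (0.25 each),
    which real genomes only approximate.
    """
    probability_per_position = 0.25 ** motif_length
    # Number of windows of this length that fit in a linear sequence.
    positions = genome_size - motif_length + 1
    return positions * probability_per_position


if __name__ == "__main__":
    for length in (6, 8, 10, 12):
        sites = expected_random_sites(length)
        print(f"{length}-bp motif: about {sites:,.0f} chance occurrences")
```

Under these simplified assumptions a 6-bp motif is expected to turn up by chance hundreds of thousands of times, and counting both strands would roughly double that. Real binding proteins tolerate mismatches, which pushes the number higher still, so finding millions of binding sites says very little about how many of them are functional.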


Wednesday, November 03, 2021

What's in your genome?: 2021

This is an updated version of what's in your genome based on the latest data. The simple version is ...

about 90% of your genome is junk

Sunday, October 24, 2021

Style vs substance in science communication: The role of science writers in major science journals

Science writers have always had articles published in the leading science journals such as Science and Nature but over the past few decades their role seems to have increased so that now even lesser journals employ them to write articles, commentary, and press releases. I recently posted an example of where this can go horribly wrong [Society for Molecular Biology and Evolution (SMBE) spreads misinformation about junk DNA].

The role of science writers has come to dominate the pages of Science and Nature so that we now have a situation where only two thirds of the pages in a typical issue are devoted to actual science publications and most readers are concentrating on the news and opinions in the front part of the journal. In some cases, the science writers control the image of these journals as happened at Nature during the ENCODE publicity campaign in 2012. Over at Science, Elizabeth Pennisi has done more to spread misinformation than any scientist in the field of molecular biology.

These are cases where science writers have sacrificed substance for style. They write nice readable articles that promote the image of their journal but are scientifically incorrect.

Let's look at a specific example. Back in 2005 Science celebrated its 125th anniversary by publishing "125 Questions: What We Don't Know." One of those questions was "Why Do Humans Have So Few Genes?"—a question that scientists had adequately answered in 2005 but you wouldn't know that from the short article written by Elizabeth Pennisi [SCIENCE Questions: Why Do Humans Have So Few Genes?]. The article was full of untruths and misinformation. There were lots of other questions in that issue that were just as ridiculous if you knew the topics.

Now, you might imagine that these questions were posed by the leading researchers in their fields but you would be wrong. The list of questions was drawn up by editors and science writers as described in the anniversary issue [SCIENCE Questions: Asking the Right Question].

We began by asking Science’s Senior Editorial Board, our Board of Reviewing Editors, and our own editors and writers to suggest questions that point to critical knowledge gaps. The ground rules: Scientists should have a good shot at answering the questions over the next 25 years, or they should at least know how to go about answering them. We intended simply to choose 25 of these suggestions and turn them into a survey of the big questions facing science. But when a group of editors and writers sat down to select those big questions, we quickly realized that 25 simply wouldn’t convey the grand sweep of cutting-edge research that lies behind the responses we received. So we have ended up with 125 questions, a fitting number for Science’s 125th anniversary.

Isn't it remarkable that editors and writers are being asked to evaluate science (substance) as if their opinions were more important than those of the scientists?

Has Science learned from these mistakes? No, because a few months ago they published a new list of 125 questions in collaboration with the 125th anniversary of Shanghai Jiao Tong University: 125 Questions: Exploration and Discovery. The list of questions hasn't gotten any better; it includes questions like, "How do organisms evolve?"; "What genes make us uniquely human?"; and "How are biomolecules organized in cells to function orderly and effectively?" Many of you can imagine what the short accompanying explanation looks like and you would be right.

Pennisi's original question has disappeared but there's a very similar question in the 2021 list.

Why are some genomes so big and others very small?

Genome size, which is the amount of DNA in a cell nucleus, is extremely diverse across animals and plants, and varies more than 64,000-fold. The smallest genome recorded exists in the microsporidian Encephalitozoon intestinalis (a parasite in certain mammals), and the largest genome belongs to a flowering plant known as Paris japonica, which has 150 billion base pairs of DNA per cell (50 times larger than that of a human). Plants are interesting in that their genome size plays an important role in their biology and evolution. But as the authors of a 2017 paper in Trends in Plant Sciences wrote: “Although we now know the major contributors to genome size diversity are non-protein coding, often highly repetitive DNA sequences, why their amounts vary so much still remains enigmatic.”

Sandwalk readers know that knowledgeable scientists came up with good answers to that question about 50 years ago. One answer is that different species have different amounts of junk DNA because some species don't have large enough populations to eliminate it by natural selection. In other cases, the differences are due to polyploidization.

You would think that, after all the criticism of Science over their past coverage of genomes and junk DNA, the writers and editors would know this. But they don't, and that's because science writers and editors seem to be remarkably immune to scientific criticism. (The topic probably doesn't come up when they get together at their science writers' conventions.) I'm making the case that they are so focused on style (science writing) that they just don't care about substance (scientific accuracy).

The major journals have a serious problem that they don't recognize. A lot of the stuff that appears in their journals is not scientifically accurate or, at the very least, is misleading. They're not going to fix this problem if their editorial staff is dominated by science journalists.


Tuesday, October 19, 2021

Society for Molecular Biology and Evolution (SMBE) spreads misinformation about junk DNA

The Society for Molecular Biology and Evolution (SMBE) is a prestigious society of workers in the field of molecular evolution. I am a member and I have attended many of their conferences. SMBE sponsors several journals including Genome Biology and Evolution (GBE), which is published by Oxford Academic Press.

The latest issue of GBE has a paper by Stitz et al. (2021) that describes some repetitive elements in the platyhelminth Schistosoma mansoni. The authors conclude that some of these elements might have a function and this prompts them to begin their discussion with the following sentences.

The days of “junk DNA” are over. When the senior authors of this article studied genetics at their respective universities, the common doctrine was that the nonprotein coding part of eukaryotic genomes consists of interspersed, “useless” sequences, often organized in repetitive elements such as satDNA. The latter might have accumulated during evolution, for example, as a consequence of gene duplication events to separate and individualize gene function (Britten and Kohne 1968; Comings 1972; Ohno 1999). This view has fundamentally changed (Biscotti, Canapa, et al. 2015), and our study is the first one addressing this issue with structural, functional, and evolutionary aspects for the genome of a multicellular parasite.

It is unfortunate that the senior authors didn't receive a good undergraduate education but one might think that they would rectify that problem by learning about genomes and junk DNA before publishing in a good journal devoted to genomes and evolution. Alas, they didn't and, even worse, the journal published their paper with those sentences intact.

As you might imagine, these statements were seized upon by Intelligent Design Creationists who wasted no time in posting on their creationist blog [Oxford Journal: “The Days of ‘Junk DNA’ Are Over ”].

But that's not the worst of it. The same issue contains an editorial written by Casey McGrath who self-identifies as an employee of the Society for Molecular Biology and Evolution in Lawrence, Kansas (USA). She is the Social Media Editor for Genome Biology and Evolution. The title of her editorial is "Highlight—“Junk DNA” No More: Repetitive Elements as Vital Sources of Flatworm Variation" (McGrath, 2021). She starts off by repeating and expanding upon the words of the senior authors of the study that I referred to above.

“The days of ‘junk DNA’ are over,” according to Christoph Grunau and Christoph Grevelding, the senior authors of a new research article in Genome Biology and Evolution. Their study provides an in-depth look at an enigmatic superfamily of repetitive DNA sequences known as W elements in the genome of the human parasite Schistosoma mansoni (Stitz et al. 2021). Titled “Satellite-like W elements: repetitive, transcribed, and putative mobile genetic factors with potential roles for biology and evolution of Schistosoma mansoni,” the analysis reveals structural, functional, and evolutionary aspects of these elements and shows that, far from being “junk,” they may exert an enduring influence on the biology of S. mansoni.

“When we studied genetics at university in the 1980s, the common doctrine was that the non-protein coding parts of eukaryotic genomes consisted of interspersed, ‘useless’ sequences, often organized in repetitive elements like satellite DNA,” note Grunau and Grevelding. Since then, however, the common understanding of such sequences has fundamentally changed, revealing a plethora of regulatory sequences, noncoding RNAs, and sequences that play a role in chromosomal and nuclear structure. With their article, Grunau and Grevelding, along with their coauthors from Justus Liebig University Giessen, University of Montpellier, and Leipzig University, contribute further evidence to a growing consensus that such sequences play critical roles in evolution.

There's no rational excuse for publishing the Stitz et al. paper with those ridiculous statements and there's no rational excuse for compounding the error by highlighting them in an editorial comment. The Society for Molecular Biology and Evolution should be ashamed and embarrassed and they should issue a retraction and a clarification. They should state clearly that junk DNA is alive and well and supported by so much evidence that it would be perverse to deny it.


McGrath, C. (2021) Highlight—“Junk DNA” No More: Repetitive Elements as Vital Sources of Flatworm Variation. Genome Biology and Evolution 13:evab217. [doi: 10.1093/gbe/evab217]

Stitz, M., Chaparro, C., Lu, Z., Olzog, V.J., Weinberg, C.E., Blom, J., Goesmann, A., Grunau, C. and Grevelding, C.G. (2021) Satellite-Like W-Elements: Repetitive, Transcribed, and Putative Mobile Genetic Factors with Potential Roles for Biology and Evolution of Schistosoma mansoni. Genome Biology and Evolution 13:evab204. [doi: 10.1093/gbe/evab204]

Monday, September 27, 2021

The biggest mistake in the history of molecular biology (not!)

The creationists are committed to proving that most of our genome is functional because otherwise the idea of an intelligent designer doesn't make a lot of sense. They reject all of the evidence that supports junk DNA and they vehemently reject the notion that 90% of our genome is junk.

I was recently alerted to a video on junk DNA produced by Creation Ministries International in which they quote John Mattick.

A leading figure in genetics, Prof. John Mattick said ...'the failure to recognize the implications of the non-coding DNA will go down as the biggest mistake in the history of molecular biology'.

The creationists are making the common mistake of equating noncoding DNA and junk DNA but the quotation sounded accurate to me since John Mattick makes similar mistakes in his publications. I decided to try and find the exact quotation and reference and the closest I could come to a direct quote was in a paper by Mattick from 2007 (Mattick, 2007). He's referring to introns—here's the exact quotation.

It should be noted that the power and precision of digital communication and control systems has only been broadly established in the human intellectual and technological experience during the past 20–30 years, well after the central tenets of molecular biology were developed and after introns had been discovered. The latter was undoubtedly the biggest surprise (Williamson, 1977), and its misinterpretation possibly the biggest mistake, in the history of molecular biology. Although introns are transcribed, since they did not encode proteins and it was inconceivable that so much non-coding RNA could be functional, especially in an unexpected way, it was immediately and almost universally assumed that introns are non-functional and that the intronic RNA is degraded (rather than further processed) after splicing. The presence of introns in eukaryotic genomes was then rationalized as the residue of the early assembly of genes that had not yet been removed and that had utility in the evolution of proteins by facilitating domain shuffling and alternative splicing (Crick, 1979; Gilbert, 1978; Padgett et al., 1986). Interestingly, while it has been widely appreciated for many years that DNA itself is a digital storage medium, it was not generally considered that some of its outputs may themselves be digital signals, communicated via RNA.

However, the idea of the biggest mistake in molecular biology predates that reference. Mattick is quoted in a Scientific American article by W. Wayt Gibbs where Gibbs is discussing the "surprising" fact that regulatory sequences are conserved and that some genes are noncoding genes (Gibbs, 2003).

“I think this will come to be a classic story of orthodoxy derailing objective analysis of the facts, in this case for a quarter of a century,” Mattick says. “The failure to recognize the full implications of this—particularly the possibility that the intervening noncoding sequences may be transmitting parallel information in the form of RNA molecules—may well go down as one of the biggest mistakes in the history of molecular biology.”

The discovery of introns in the mid-1970s was definitely a surprise but it's not true, as Mattick implies, that they were immediately assumed to be junk. In fact, as he points out, there was a lot of debate over the possible role of introns in the evolution of protein-coding genes where they could stimulate exon shuffling. Later on, the presence of introns was recognized to be an essential component of alternative splicing.

Once more and more sequences were published it became apparent that neither their size nor their sequences were conserved except for the spliceosome recognition sequences. It soon became obvious that their sequences were evolving at the neutral rate demonstrating that they were mostly junk. Mattick assumes that this conclusion—that introns are mostly junk—is one of the biggest mistakes in molecular biology. I think the opposite is true. I think that the failure of most molecular biologists to understand junk DNA is a huge mistake.

The creationists are misquoting Mattick when they say that the classification of all noncoding DNA as junk is the biggest mistake in molecular biology. In the quotations above, Mattick is specifically referring to introns but I'm sure he won't be upset to be misquoted in that manner since he firmly believes that most noncoding DNA is functional.

There's a bit of an ironic twist here. If it were true that knowledgeable scientists in the 1970s actually believed that all noncoding DNA was junk then I'd have to agree that this would have been a big (biggest?) mistake. But they didn't and it wasn't a big mistake. As I've said many times, no knowledgeable scientist ever said that all noncoding DNA was junk since they (we) all knew about noncoding genes, regulatory sequences, centromeres, and origins of replication, all of which are functional noncoding DNA. We now know that about 1% of our genome is coding sequences and about 9% is functional noncoding DNA. The other 90% is junk.
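The arithmetic behind those percentages is simple, but a short sketch makes the difference between "noncoding" and "junk" explicit. It uses the rounded fractions quoted above; the function and key names are my own, and the figures are approximate.

```python
# Rounded genome fractions quoted in the post (approximate values).
CODING = 0.01                # coding sequences
FUNCTIONAL_NONCODING = 0.09  # noncoding genes, regulatory sequences, centromeres, origins, etc.
JUNK = 0.90


def summarize_genome(coding: float, functional_noncoding: float, junk: float) -> dict:
    """Derive the noncoding and functional totals from the three fractions."""
    total = coding + functional_noncoding + junk
    if abs(total - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    noncoding = functional_noncoding + junk
    return {
        "noncoding": noncoding,                      # everything that is not coding
        "functional": coding + functional_noncoding,
        "junk": junk,
        # Share of noncoding DNA that is nevertheless functional.
        "functional_share_of_noncoding": functional_noncoding / noncoding,
    }


if __name__ == "__main__":
    for name, value in summarize_genome(CODING, FUNCTIONAL_NONCODING, JUNK).items():
        print(f"{name}: {value:.1%}")
```

With these rounded inputs, noncoding DNA comes to about 99% of the genome while junk is about 90%, so the two terms are not interchangeable: roughly a tenth of the noncoding DNA is functional.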

[Stop Using the Term "Noncoding DNA:" It Doesn't Mean What You Think It Means]


Mattick, J.S. (2007) A new paradigm for developmental biology. Journal of Experimental Biology 210:1526-1547. [doi: 10.1242/jeb.005017]

Gibbs, W.W. (2003) The unseen genome: gems among the junk. Scientific American 289:46-53.

Monday, May 31, 2021

Nessa Carey talks about epigenetics

Nessa Carey wrote a horrible book about junk DNA where she completely misunderstood the science. It's one of many examples of bad science journalism [Nessa Carey doesn't understand junk DNA].

I recently became aware of a talk given in 2015 by Nessa Carey on epigenetics so I'm posting it here. (She also wrote a book about epigenetics.) She is an entertaining speaker and gives a very good presentation but that's a problem if the science is misleading. Judge for yourselves.


Monday, May 10, 2021

MIT Professor Rick Young doesn't understand junk DNA

Richard ("Rick") Young is a Professor of Biology at the Massachusetts Institute of Technology and a member of the Whitehead Institute. His area of expertise is the regulation of gene expression in eukaryotes.

He was interviewed by Jorge Conde and Hanne Winarsky on a recent podcast (Feb. 1, 2021) where the main topic was "From Junk DNA to an RNA Revolution." They get just about everything wrong when they talk about junk DNA including the Central Dogma, historical estimates of the number of genes, confusing noncoding DNA with junk, alternative splicing, the number of functional RNAs, the amount of regulatory DNA, and assuming that scientists in the 1970s were idiots.

In this episode, a16z General Partner Jorge Conde and Bio Eats World host Hanne Winarsky talk to Professor Rick Young, Professor of Biology and head of the Young Lab at MIT—all about “junk” DNA, or non-coding DNA.

Which, it turns out—spoiler alert—isn’t junk at all. Much of this so-called junk DNA actually encodes RNA—which we now know has all sorts of incredibly important roles in the cell, many of which were previously thought of as only the domain of proteins. This conversation is all about what we know about what that non-coding genome actually does: how RNA works to regulate all kinds of different gene expression, cell types, and functions; how this has dramatically changed our understanding of how disease arises; and most importantly, what this means we can now do—programming cells, tuning functions up or down, or on or off. What we once thought of as “junk” is now giving us a powerful new tool in intervening in and treating disease—bringing in a whole new category of therapies.

Here's what I don't understand. How could a prominent scientist at one of the best universities in the world be so ignorant of a topic he chooses to discuss on a podcast? Perhaps you could excuse a busy scientist who doesn't have the time to research the topic but what excuse can you offer to explain why the entire culture at MIT and the Whitehead must also be ignorant? Does nobody there ever question their own ideas? Do they only read the papers that support their views and ignore all those that challenge those views?

This is a very serious question. It's the most difficult question I discuss in my book. Why has the false narrative about junk DNA, and many other things, dominated the scientific literature and become accepted dogma among leading scientists? Something is seriously wrong with science.


Friday, May 07, 2021

More misinformation about junk DNA: this time it's in American Scientist

Emily Mortola and Manyuan Long have just published an article in American Scientist about Turning Junk into Us: How Genes Are Born. The article contains a lot of misinformation about junk DNA that I'll discuss below.

Emily Mortola is a freelance science writer who worked with Manyuan Long when she was an undergraduate (I think). Manyuan Long is the Edna K. Papazian Distinguished Service Professor of Ecology and Evolution in the Department of Ecology and Evolution at the University of Chicago. His main research interest is the origin of new genes. It's reasonable to suspect that he's an expert on genome structure and evolution.

The article is behind a paywall so most of you can't see anything more than the opening paragraphs so let's look at those first. The second sentence is ...

As we discovered in 2003 with the conclusion of the Human Genome Project, a monumental 13-year-long research effort to sequence the entire human genome, approximately 98.8 percent of our DNA was categorized as junk.

This is not correct. The paper on the finished version of the human genome sequence was published in October 2004 (Finishing the euchromatic sequence of the human genome) and the authors reported that the coding exons of protein-coding genes covered about 1.2% of the genome. However, the authors also noted that there are many genes for tRNAs, ribosomal RNAs, snoRNAs, microRNAs, and probably other functional RNAs. Although they don't mention it, the authors must also have been aware of regulatory sequences, centromeres, telomeres, origins of replication and possibly other functional elements. They never said that all noncoding DNA (98.8%) was junk because that would be ridiculous. It's even more ridiculous to say it in 2021 [Stop Using the Term "Noncoding DNA:" It Doesn't Mean What You Think It Means].

The part of the article that you can see also lists a few "Quick Takes" and one of them is ...

Close to 99 percent of our genome has been historically classified as noncoding, useless “junk” DNA. Consequently, these sequences were rarely studied.

This is also incorrect as many scientists have pointed out repeatedly over the past fifty years or so. At no time in the past 50 years has any knowledgeable scientist ever claimed that all noncoding DNA is junk. I'm sorely tempted to accuse the authors of this article of lying because they really should know better, especially if they're writing an article about junk DNA in 2021. However, I reluctantly defer to Hanlon's razor.

Mortola and Long claim that mammalian genomes have between 85% and 99% junk DNA and wonder if it could have a function.

To most geneticists, the answer was that it has no function at all. The flow of genetic information—the central dogma of molecular biology—seems to leave no role for all of our intergenic sequences. In the classical view, a gene consists of a sequence of nucleotides of four possible types--adenine, cytosine, guanine, and thymine--represented by the letters A, C, G, and T. Three nucleotides in a row make up a codon, with each codon corresponding to a specific amino acid, or protein subunit, in the final protein product. In active genes, harmful mutations are weeded out by selection and beneficial ones are allowed to persist. But noncoding regions are not expressed in the form of a protein, so mutations in noncoding regions can be neither harmful nor beneficial. In other words, "junk" mutations cannot be steered by natural selection.

Those of you who have read this far will cringe when reading that. There are so many obvious errors in that paragraph that applying Hanlon's razor seems very complimentary. Imagine saying in the 21st century that the Central Dogma leaves no role at all for regulatory sequences or ribosomal RNA genes! But there's more; the authors double down on their incorrect understanding of "gene" in order to fit their misunderstanding of the Central Dogma.

What Is a Gene, Really?

In our de novo gene studies in rice, to truly assess the potential significance of de novo genes, we relied on a strict definition of the word "gene" with which nearly every expert can agree. First, in order for a nucleotide sequence to be considered a true gene, an open reading frame (ORF) must be present. The ORF can be thought of as the "gene itself"; it begins with a starting mark common for every gene and ends with one of three possible finish line signals. One of the key enzymes in this process, the RNA polymerase, zips along the strand of DNA like a train on a monorail, transcribing it into its messenger RNA form. This point brings us to our second important criterion: A true gene is one that is both transcribed and translated. That is, a true gene is first used as a template to make transient messenger RNA, which is then translated into a protein.

Five Things You Should Know if You Want to Participate in the Junk DNA Debate

The authors admit in the next paragraph that some pseudogenes may produce functional RNAs that are never translated into proteins but they don't mention any other types of gene. I can understand why you might concentrate on protein-coding genes if you are studying de novo genes but why not just say that there are two types of genes and either one can arise de novo? But there's another problem with their definition: they left out a key property of a gene. It's not sufficient that a given stretch of DNA is transcribed and the RNA is translated to make a protein: the protein has to have a function before you can say that the stretch of DNA is a gene [What Is a Gene?]. We'll see in a minute why this is important.

The main point of the paper is the birth of de novo genes and the authors discuss their work with the rice genome. They say they've discovered 175 de novo genes but they don't say how many have a real biological function. This is an important problem in this field and it would have been fascinating to see a description of how they go about assigning a function to their, mostly small, peptides [The evolution of de novo genes]. I'm guessing that they just assume a function as soon as they recognize an open reading frame in a transcript.
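To see why an open reading frame on its own is weak evidence for a gene, here is a minimal sketch of an ORF scan. It is an illustration only, not the method used in the rice study: the function name `find_orfs`, the `min_codons` threshold, and the toy sequence are all hypothetical, and the scan covers only the three forward reading frames using the standard start codon (ATG) and the three standard stop codons.

```python
# Hypothetical illustration: scan the forward strand for open reading frames.
START_CODON = "ATG"
STOP_CODONS = {"TAA", "TAG", "TGA"}  # the three standard stop codons


def find_orfs(seq, min_codons=10):
    """Return (start, end, frame) for each forward-strand ORF.

    An ORF here runs from an ATG to the first in-frame stop codon.
    `end` is the index just past the stop codon. ORFs with fewer than
    `min_codons` codons before the stop are ignored.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == START_CODON:
                start = i
            elif start is not None and codon in STOP_CODONS:
                n_codons = (i - start) // 3  # codons before the stop
                if n_codons >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None
    return orfs


# Toy sequence (made up): ATG, four GCT codons, then a TAA stop.
toy = "CC" + "ATG" + "GCT" * 4 + "TAA" + "GG"
print(find_orfs(toy, min_codons=3))
```

A scan like this only reports that a stretch of DNA could be translated. It says nothing about whether the sequence is transcribed, whether the RNA is translated, or whether the resulting peptide does anything, and that last step is the one the gene definition above leaves out.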

As you can see from the title of the article, the emphasis is on the idea that de novo genes can arise from junk DNA—a concept that's not seriously disputed. The one good thing about the article is that the authors do not directly state that the reason for junk DNA is to give rise to new genes but this caption is troubling.

The Human Genome Project was a 13-year-long research effort aimed at mapping the entire human genetic sequence. One of its most intriguing findings was the observation that the number of protein-coding genes estimated to exist in humans--approximately 22,300--represents a mere 1.2 percent of our whole genome, with the other 98.8 percent being categorized as noncoding, useless junk. Analyses of this presumed junk DNA in diverse species are now revealing its role in the creation of genes.

Why do science writers continue to spread misinformation about junk DNA when there's so much correct information out there? All you have to do is look [More misconceptions about junk DNA - what are we doing wrong?].


Monday, May 03, 2021

More illusions/delusions of James Shapiro and Denis Noble

It was just a few weeks ago that I discussed short articles by Denis Noble and James Shapiro that were published in the journal Biosemiotics [The illusions of Denis Noble] [The illusions of James Shapiro].

Several readers questioned whether Biosemiotics is a real science journal and they were right: it's a kooky journal and that's why it publishes papers by kooks. However, we now have a new paper by Shapiro and Noble that's about to appear in a legitimate scientific journal; albeit, one that has seen better days. This would normally raise red flags concerning peer review but we're long past the time when we can count on peer review to weed out the kooks.

Here's the paper. I'm not going to discuss all the main points because they were covered in my previous posts. I'll just concentrate on the most ridiculous part in order to illustrate the (lack of) quality of this paper.1

Shapiro, J. and Noble, D. (2021) What prevents mainstream evolutionists teaching the whole truth about how genomes evolve? Progress in Biophysics and Molecular Biology. [doi: 10.1016/j.pbiomolbio.2021.04.004]

The common belief that the neo-Darwinian Modern Synthesis (MS) was buttressed by the discoveries of molecular biology is incorrect. On the contrary those discoveries have undermined the MS. This article discusses the many processes revealed by molecular studies and genome sequencing that contribute to evolution but nonetheless lie beyond the strict confines of the MS formulated in the 1940s. The core assumptions of the MS that molecular studies have discredited include the idea that DNA is intrinsically a faithful self-replicator, the one-way transfer of heritable information from nucleic acids to other cell molecules, the myth of “selfish DNA,” and the existence of an impenetrable Weismann Barrier separating somatic and germ line cells. Processes fundamental to modern evolutionary theory include symbiogenesis, biosphere interactions between distant taxa (including viruses), horizontal DNA transfers, natural genetic engineering, organismal stress responses that activate intrinsic genome change operators, and macroevolution by genome restructuring (distinct from the gradual accumulation of local microevolutionary changes in the MS). These 21st Century concepts treat the evolving genome as a highly formatted and integrated Read-Write (RW) database rather than a Read-Only Memory (ROM) collection of independent gene units that change by random copying errors. Most of the discoverers of these macroevolutionary processes have been ignored in mainstream textbooks and popularizations of evolutionary biology, as we document in some detail. Ironically, we show that the active view of evolution that emerges from genomics and molecular biology is much closer to the 19th century ideas of both Darwin and Lamarck. 
The capacity of cells to activate evolutionary genome change under stress can account for some of the most negative clinical results in oncology, especially the sudden appearance of treatment-resistant and more aggressive tumors following therapies intended to eradicate all cancer cells. Knowing that extreme stress can be a trigger for punctuated macroevolutionary change suggests that less lethal therapies may result in longer survival times.

The section on "selfish DNA" is the one that seems to have the highest number of misleading and false statements per paragraph.

1.4. The end of “selfish” or “junk” DNA

A major shortcoming of the MS is that it was based on a “gene-centric” view, which assumed that the genome is basically a collection of “genes” that are the protein-coding units of heredity and heritable variation. As we saw in the quotation from Goldschmidt's 1940 book, this view failed to take the evolutionary importance of chromosome structure into account (Goldschmidt, 1940). It also blinded evolutionary biologists to the importance of McClintock's mid- 20th Century discovery of mobile “controlling elements” (McClintock, 1987). Both the ideas of genetic transposition and control of gene expression by these non-coding mobile elements did not fit within the narrow confines of the MS concepts of genome function and variation. A further empirical assault on the limited MS conceptual framework came in the late 1960s when Britten and Kohne discovered that a significant fraction of genomic DNA from complex eukaryotes consists of highly repetitive sequences rather than the unique coding sequences expected to make up the hereditary material (Britten and Kohne, 1968).

  • The title is ridiculous since no respectable scientist ever equated selfish DNA with junk DNA [Selfish genes and transposons].

  • The Modern Synthesis (MS) was not based on a "gene-centric" view.
  • For the past 50 years, no respectable scientist, and no knowledgeable expert in molecular evolution, has restricted the definition of "gene" to just protein-coding genes.
  • For the past 50 years, no expert in molecular evolution has ever thought that the genome is just a collection of protein-coding genes.
  • For the past 50 years, experts in molecular biology have known about transposons and have considered the view that some of them might be "controlling elements." They have concluded that most transposon-related sequences are just fragments of defective transposons with no biological function.
  • Nobody cares whether mobile genetic elements fit within the narrow confines of the Modern Synthesis as described by Huxley and others in the 1940s because no expert in molecular evolution has believed in that view of evolution since the late 1960s.
  • The Britten and Kohne paper established that the genomes of most multicellular eukaryotes contain large amounts of repetitive DNA. This was an attempt to resolve the C-value paradox. Britten and Kohne didn't like the idea that this could be junk DNA so they offered some speculation about function. However, further data established that most of this repetitive DNA is, indeed, junk and Britten and Kohne's speculations have been discredited. Britten and Kohne were attempting to interpret their result within the context of the adaptationist views that characterized the Modern Synthesis back then. The correct interpretation of their results came with the overthrow of the Modern Synthesis and the adoption of a new view of evolutionary theory that focused on Neutral Theory, Nearly-Neutral Theory, and the importance of random genetic drift. Shapiro and Noble missed that revolution so they continue to attack an old-fashioned strawman version of evolutionary theory.

Before continuing, it's important to realize that by the early 1970s selectionist thinking had been abandoned by the experts in genome evolution. By 1978 Gould and Lewontin tried, unsuccessfully, to convince all other biologists to abandon the old selectionist way of thinking [The Spandrels of San Marco and the Panglossian Paradigm]. James Shapiro and Denis Noble are among those other biologists who didn't get the message.

In order to apply selectionist thinking to explain the presence of so much non-coding DNA, evolutionary biologists called this unexpected portion of the genome “junk DNA” (Ohno, 1972) or “selfish DNA” (Orgel and Crick, 1980). Richard Dawkins used an extreme view of these “selfish genes” to erect a whole philosophy of strictly passive evolutionary gradualism (Dawkins, 1976). Today we know that the human genome contains at least 30X as much repetitive non-coding DNA as protein-coding sequences (Lander et al., 2001). Repetitive DNA provides formatting signals for transcription, epigenetic modification and chromosome mechanics and also is the most variable component in the evolutionary diversification of complex genomes (Symonová and Howell, 2018; Subirana et al., 2015; Matsubara et al., 2016; CioffiMde et al., 2015; Chalopin et al., 2015; Shao et al., 2019; Böhne et al., 2008; Li et al., 2016; Oliver et al., 2013). A 2013 plot of organismal complexity against protein-coding and non-coding DNA showed that coding DNA peaked at approximately ∼3 × 10^7 bp, while the non-coding DNA increased linearly with growing complexity up to ∼2–3 × 10^10 bp (Liu et al., 2013). In other words, non-coding DNA tracked organismal complexity better than the protein-coding genes. The “encyclopedia of DNA elements” (ENCODE) project, which largely abandoned the term “gene,” revealed that the large majority of the so-called junk DNA is actively transcribed in a regulated manner, indicating that it is functional (Consortium, 2012; Pennisi, 2012).

  • It is completely, totally, ridiculous to say that the idea of junk DNA was due to selectionist thinking. The first statement in this paragraph is powerful evidence that Shapiro and Noble don't know what they are talking about. The concept of junk DNA is a rejection of selectionist thinking.
  • The use of "noncoding DNA" is what's called a "tell."
  • Again, equating junk DNA with selfish DNA is stupid. If all the excess DNA were selfish then it isn't junk because it has a function.
  • Richard Dawkins' view on evolution is closer to the old-fashioned adaptationist view that was abandoned by the experts by the time he wrote The Selfish Gene. Dawkins' book is not really about "genes," however, as is clear to anyone who has read it. He's talking about any piece of DNA that confers a fitness advantage. The Dawkins strawman is a favorite target of the Third Way types but it's just a strawman.
  • No significant proportion of repetitive DNA has a function in spite of the references quoted above.
  • There is no significant correlation between organismal complexity and noncoding DNA. Lots of very similar species, such as onions, have very different genome sizes.
  • No knowledgeable scientist since the 1980s thinks there should be a significant correlation between the number of genes and organismal complexity. We know that most of the phenotypic differences between multicellular species are due to changes in the timing and amount of expression of a standard set of genes. This is the main discovery of evolutionary-developmental biology (evo-devo), another revolution that Shapiro and Noble missed. They should educate themselves by reading Sean B. Carroll's books.
  • The ENCODE researchers did lots of silly things but they did NOT abandon the term "gene."
  • The idea that most of our genome is functional because of ENCODE is laughable in 2021. The fact that Shapiro and Noble would bring this up is another "tell" and the fact that they would reference Elizabeth Pennisi is even more revealing. These guys are incapable of thinking critically.

Shapiro and Noble then describe a few examples of repetitive DNA sequences that have a known function and they point out that a number of noncoding genes have been identified. They imply that these functional sequences make up a significant fraction of the genome, thus calling the concept of junk DNA into question. They close the section with,

Clearly, none of the eminent scientists who wrote about junk or selfish DNA could possibly have imagined the wide range of cellular functionalities that we know today are executed by ncRNA molecules. The idea that a genome was just a collection of protein coding sequences has proved completely inadequate.

  • I don't know about you, dear reader, but I'll match those "eminent scientists" against Shapiro and Noble any day. I'd love to see them try to defend their views in a public debate against some of the leading proponents of junk DNA. I know where my money would be.

Let me close by quoting the last chapter of this paper. I don't intend to comment on it except to say that it gives new meaning to the word "irony."

The campaign to sustain the Modern Synthesis causes real harm in a number of different ways. Among doctors treating bacterial infections, ignorance of real-world evolutionary processes has led to a situation in which the available antibiotics have lost their effectiveness against many life-threatening conditions (CDC et al., 2019). Among the general public, the inability to comprehend the potential all living organisms possess for transferring and reorganizing genomic configurations makes them unprepared to form sound judgements about how society should utilize its growing arsenal of biotechnology tools acquired from our microbial neighbors, like CRISPR (Doudna, 2020). Among oncologists, MS thinking prevents the practitioners treating cancer patients from recognizing the dangers of overtreating tolerable tumors in ways that may provoke a macroevolutionary transition to a far more lethal and untreatable disease (Heng, 2019). Finally, in the battle against obscurantism and anti-evolution prejudice, insistence on an outdated set of assertions about how life can change itself leaves the defenders of rigorous scientific inquiry without satisfactory responses to critics. Clearly, the time has come for the mainstream evolution community to recognize and join the scientific reality of the 21st Century.

Finally, one of the most important properties of kooks is that they find each other and they tend to hang out together, either physically or virtually. I'm not sure why this happens since they often espouse mutually exclusive views. I'm guessing that we can explain it in two different ways: (1) they are all outsiders fighting against a common enemy; namely, real science, and (2) they lack critical thinking skills so they don't see the flaws in each other's arguments.


1. In case you didn't recognize the quality from the title.

Thursday, April 29, 2021

Chromatin organization at promoters in yeast cells

Our genome is very large and very complicated because it is full of junk DNA. It contains thousands of sites where DNA binding proteins can bind just by chance. This leads to the reorganization of nucleosomes in a way that mimics functional sites. It's difficult to distinguish these spurious sites from real functional sites and that has led to much confusion in the scientific literature.1

The yeast genome is much simpler and it's safe to assume that almost all of the sites detected by the standard chromatin assays are genuine, biologically relevant, sites. In that sense, it serves as a model for what functional sites look like. A recent paper in Nature (April 8, 2021) reports on the mapping of most of the sites in the yeast genome where DNA binding proteins are found.

Rossi, M.J., Kuntala, P.K., Lai, W.K., Yamada, N., Badjatia, N., Mittal, C., Kuzu, G., Bocklund, K., Farrell, N.P., Blanda, T.R.M., Joshua D, V, B.A., Mistretta, K.S., Rocco, D.J., Perkinson, E.S., Kellogg, G.D., Mahony, S. and Pugh, B.F. (2021) A high-resolution protein architecture of the budding yeast genome. Nature 592:309-314. [doi: 10.1038/s41586-021-03314-8]

Origins of replication

Origins of replication are also called autonomously replicating sequence consensus sequences (ACS). There are 253 of them in the yeast genome and they are characterized by a 300 bp nucleosome-free region that's occupied by the origin recognition complex (ORC) and the helicase MCM.

Telomeres

Telomeres are bound by a number of proteins including silent information regulators (SIRs). There's a nucleosome-free region of about 300 bp where these proteins are located.

Centromeres

The nucleosome-free region at centromeres covers only 170 bp where a number of centromere binding proteins are located. The absence of nucleosomes at the centromere is a surprise since it was thought that centromere DNA was bound by modified nucleosomes containing a specific histone variant.

Wednesday, April 21, 2021

The illusions of James Shapiro

James A. Shapiro is a professor in the Department of Biochemistry and Molecular Biology at the University of Chicago (Chicago, USA). He made significant contributions to our understanding of the function and structure of transposons but in later years he has become a vocal opponent of evolution, culminating in his 2011 book Evolution: A View from the 21st Century. He is one of the founding members of The Third Way of Evolution.

I wrote a critical review of Evolution: A View from the 21st Century for the National Center for Science Education (NCSE) Reports but the issue is no longer visible on the web. Shapiro didn't like my review so NCSE published his rebuttal and that's also unavailable. You can see my response at: James Shapiro Responds to My Review of His Book.

Monday, April 19, 2021

The illusions of Denis Noble

Denis Noble was a Professor of Physiology at Oxford University in the United Kingdom until he retired. He had a distinguished career as a physiologist making significant contributions to our understanding of the heart and its relationship to the whole organism.

In recent years, Noble has dabbled in philosophy and evolution. He has become a vocal opponent of modern evolution (sensu Noble) and the way science is currently conducted. Some of his criticisms have made it into two popular books: The Music of Life and Dance to the Tune of Life. He is one of the leading proponents of the "Extended Evolutionary Synthesis" (EES) and he is one of the founders of The Third Way of Evolution, a wishy-washy and scientifically inaccurate way of attacking a strawman version of evolution and providing a safe haven for religious scientists.

Tuesday, April 13, 2021

How do you explain evolution to non-experts?

I spent a lot of time explaining evolution in my book. The goal is to educate readers to the level where they can understand the drift-barrier hypothesis and why slightly deleterious mutations can accumulate in species with small populations. This requires some knowledge of random genetic drift and some knowledge of Neutral Theory and Nearly-Neutral Theory. The emphasis is on population genetics as the most important way of understanding evolution.

You can't understand genomes and junk DNA unless you have a firm understanding of evolution. In fact, you can't make sense of anything about genes and gene expression without such knowledge ... what the heck, nothing in all of biology makes sense if you don't know about evolution.

My approach hasn't been copied by popular websites. They usually misrepresent evolution by presenting it as adaptation; natural selection is the only game in town. I'll put in a link to Francis Collins describing evolution in a truly bizarre narration but my question for Sandwalk readers is whether this is useful or not. Is it better to dumb down evolution on the NIH: National Human Genome Website [Evolution] or is this a bad idea?


Friday, April 09, 2021

Should we teach genomics and evolution to medical students?

Rama Singh,1 a biology professor at McMaster University in Hamilton (Ontario, Canada), has just published an interesting article on The Conversation website. It's about Medical schools need to prepare doctors for revolutionary advances in genetics. You can read the full article yourself but let me highlight the last few paragraphs to start the discussion.

Future physicians will be part of health networks involving medical lab technicians, data analysts, disease specialists and the patients and their family members. The physician would need to be knowledgeable about the basic principles of genetics, genomics and evolution to be able to take part in the chain of communication, information sharing and decision-making process.

This would require a more in-depth knowledge of genomics than generally provided in basic genetics courses.

Much has changed in genetics since the discovery of DNA, but much less has changed how genetics and evolution are taught in medical schools.

In 2013-14 a survey of course curriculums in American and Canadian medical schools showed that while most medical schools taught genetics, most respondents felt the amount of time spent was insufficient preparation for clinical practice as it did not provide them with sufficient knowledge base. The survey showed that only 15 per cent of schools covered evolutionary genetics in their programs.

A simple viable solution may require that all medical applicants entering medical schools have completed rigorous courses in genetics and genomics.

Here's the problem. I've just finished research on a book about modern evolution and genomics so I think I know a little bit about the subject. I'm also on the editorial board of a journal that publishes research on biochemistry and molecular biology education. I've written a biochemistry textbook and I have far too many years of experience trying to teach this material to graduate students and undergraduates at the University of Toronto. I can safely say that we (university teachers) have done a horrible job of teaching evolution and genomics to our students. We have turned out an entire generation of students who don't understand modern molecular evolution and don't understand what's in your genome.

What this means is that there's an extremely small pool of students who have completed "rigorous courses in genetics and genomics." Nobody will be able to apply to medical school. I doubt that we could teach this material to medical students with or without the appropriate background.

But you don't have to take my word for it. Some people have tried to teach this material to health science workers so we can see how it's working at that level. Take a look at The Genomics Education Programme supported by the NHS in the United Kingdom. They have a series of short videos and longer lessons that are designed to educate health care specialists. Here's the blurb that defines their objective.

Rapid advances in technology and understanding mean that genomics is now more relevant than ever before. As genomics increasingly becomes a part of mainstream NHS care, all healthcare professionals, and not just genomics specialists, need to have a good understanding of its relevance and potential to impact the diagnosis, treatment and management of people in our care.

In 2014, Health Education England (HEE) launched a four-year £20 million Genomics Education Programme (GEP) to ensure that our 1.2 million-strong NHS workforce has the knowledge, skills and experience to keep the UK at the heart of the genomics revolution in healthcare.

Funding for the programme has since been extended to enable us to continue our work in providing co-ordinated national direction of education and training in genomics and developing resources for a wide range of professionals.

They describe genes as 'coding' genes that build proteins. There's no mention of noncoding genes. They define a genome as "both genes (coding) and non-coding DNA." They also say that your genome is all of the DNA in our cells (46 chromosomes, 23 pairs). I don't see anything in their education packages that covers modern molecular evolution. In one of the packages they say,

The term ‘junk DNA’ has been used since the 1970s to describe non-coding regions of the genome, but today it is considered inaccurate and misleading. The term ‘junk’ suggests that 98% of the genome has no use, but in recent years, studies and projects have used advances in technology to shed light on these regions and have come to different conclusions about how much of the genome has a biological function.

Here's a link to a short video called What is a genome?. I recommend that you watch it to see the level that these experts think is suitable for health care professionals in the UK and to see the level of expertise of those who made the video. This is what seven years of work by experts and £20 million will get you.

All of this tells me that teaching genomics and evolution to medical students is going to be a lot more difficult than Rama Singh imagines. Not only would we have to counter several years of misinformation but we would have to rely on teachers who probably don't understand either topic.

Let's start by teaching these things correctly to biology and biochemistry majors. That's going to be hard enough for now.


1. Full disclosure: Rama and I shared an NSERC grant in 1981 on genetic variation in Drosophila.

Wednesday, April 07, 2021

Bold predictions for human genomics by 2030

After spending several years working on a book about the human genome I've come to the realization that the field of genomics is not delivering on its promise to help us understand what's in your genome. In fact, genomics researchers have by and large impeded progress by coming up with false claims that need to be debunked.

My view is not widely shared by today's researchers who honestly believe they have made tremendous progress and will make even more as long as they get several billion dollars to continue funding their research. This view is nicely summarized in a Scientific American article from last fall that's really just a precis of an article that first appeared in Nature. The Nature article was written by employees of the National Human Genome Research Institute (NHGRI) at the National Institutes of Health in Bethesda, MD, USA (Green et al., 2020). Its purpose is to promote the work that NHGRI has done in the past and to summarize its strategic vision for the future. At the risk of oversimplifying, the strategic vision is "more of the same."

Green, E.D., Gunter, C., Biesecker, L.G., Di Francesco, V., Easter, C.L., Feingold, E.A., Felsenfeld, A.L., Kaufman, D.J., Ostrander, E.A. and Pavan, W.J. and 20 others (2020) Strategic vision for improving human health at The Forefront of Genomics. Nature 586:683-692. [doi: 10.1038/s41586-020-2817-4]

Starting with the launch of the Human Genome Project three decades ago, and continuing after its completion in 2003, genomics has progressively come to have a central and catalytic role in basic and translational research. In addition, studies increasingly demonstrate how genomic information can be effectively used in clinical care. In the future, the anticipated advances in technology development, biological insights, and clinical applications (among others) will lead to more widespread integration of genomics into almost all areas of biomedical research, the adoption of genomics into mainstream medical and public-health practices, and an increasing relevance of genomics for everyday life. On behalf of the research community, the National Human Genome Research Institute recently completed a multi-year process of strategic engagement to identify future research priorities and opportunities in human genomics, with an emphasis on health applications. Here we describe the highest-priority elements envisioned for the cutting-edge of human genomics going forward—that is, at ‘The Forefront of Genomics’.

What's interesting are the predictions that the NHGRI makes for 2030—predictions that were highlighted in the Scientific American article. I'm going to post those predictions without comment other than saying that I think they are mostly bovine manure. I'm interested in hearing your comments.

Bold predictions for human genomics by 2030

Some of the most impressive genomics achievements, when viewed in retrospect, could hardly have been imagined ten years earlier. Here are ten bold predictions for human genomics that might come true by 2030. Although most are unlikely to be fully attained, achieving one or more of these would require individuals to strive for something that currently seems out of reach. These predictions were crafted to be both inspirational and aspirational in nature, provoking discussions about what might be possible at The Forefront of Genomics in the coming decade.

  1. Generating and analysing a complete human genome sequence will be routine for any research laboratory, becoming as straightforward as carrying out a DNA purification.
  2. The biological function(s) of every human gene will be known; for non-coding elements in the human genome, such knowledge will be the rule rather than the exception.
  3. The general features of the epigenetic landscape and transcriptional output will be routinely incorporated into predictive models of the effect of genotype on phenotype.
  4. Research in human genomics will have moved beyond population descriptors based on historic social constructs such as race.
  5. Studies that involve analyses of genome sequences and associated phenotypic information for millions of human participants will be regularly featured at school science fairs.
  6. The regular use of genomic information will have transitioned from boutique to mainstream in all clinical settings, making genomic testing as routine as complete blood counts.
  7. The clinical relevance of all encountered genomic variants will be readily predictable, rendering the diagnostic designation ‘variant of uncertain significance (VUS)’ obsolete.
  8. An individual’s complete genome sequence along with informative annotations will, if desired, be securely and readily accessible on their smartphone.
  9. Individuals from ancestrally diverse backgrounds will benefit equitably from advances in human genomics.
  10. Breakthrough discoveries will lead to curative therapies involving genomic modifications for dozens of genetic diseases.

I predict that nine years from now (2030) we will still be dealing with scientists who think that most of our genome is functional; that most human protein-coding genes produce many different proteins by alternative splicing; that epigenetics is useful; that there are more noncoding genes than protein-coding genes; that the leading scientists in the 1960s and 70s were incredibly stupid to suggest junk DNA; that almost every transcription factor binding site is biologically relevant; that most transposon-related sequences have a mysterious (still unknown) function; that it's still a mystery why humans are so much more complex than chimps; and that genomics will eventually solve all problems by 2040.

Why in the world, you might ask, would we still be dealing with issues like that? Because of genomics.


Saturday, April 03, 2021

"Dark matter" as an argument against junk DNA

Opponents of junk DNA have been largely unsuccessful in demonstrating that most of our genome is functional. Many of them are vaguely aware of the fact that "no function" (i.e. junk) is the default hypothesis and the onus is on them to come up with evidence of function. In order to shift, or obfuscate, this burden of proof they have increasingly begun to talk about the "dark matter" of the genome. The idea is to pretend that most of the genome is a complete mystery so that you can't say for certain whether it is junk or functional.

One of the more recent attempts appears in the "Journal Club" section of Nature Reviews Genetics. It focuses on repetitive DNA.

Before looking at that article, let's begin by summarizing what we already know about repetitive DNA. It includes highly repetitive DNA consisting of multiple tandem repeats of short sequences such as ATATATATAT... or CGACGACGACGA... or even longer repeats. Much of this is located in centromeric regions of the chromosome and I estimate that functional highly repetitive regions make up about 1% of the genome. [see Centromere DNA and Telomeres]

The other part of repetitive DNA is middle repetitive DNA, which is largely composed of transposons and endogenous viruses, although it includes ribosomal RNA genes and origins of replication. Most of these sequences are dispersed as single copies throughout the genome. It's difficult to determine exactly how much of the genome consists of these middle repetitive sequences but it's certainly more than 50%.

Almost all of the transposon- and virus-related sequences are defective copies of once active transposons and viruses. Most of them are just fragments of the originals. They are evolving at the neutral rate so they look like junk and they behave like junk.1 That's not selfish DNA because it doesn't transpose and it's not "dark matter." These fragments have all the characteristics of nonfunctional junk in our genome.

We know that the C-value paradox is mostly explained by differing amounts of repetitive DNA in different genomes and this is consistent with the idea that they are junk. We know that less than 10% of our genome is conserved and this fits in with that conclusion. Finally, we know that genetic load arguments indicate that most of our genome must be impervious to mutation. Combined, these are all powerful bits of evidence and logic in favor of repetitive sequences being mostly junk DNA.
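The genetic load argument can be put in rough numbers. What follows is only a sketch under assumptions that are not taken from this post: about 100 new mutations per person per generation, a hypothetical fraction of hits in functional DNA that are deleterious, and the textbook simplification that mean fitness relative to a mutation-free individual is roughly e^(-U), where U is the number of new deleterious mutations per generation. The function names are made up for illustration.

```python
import math


def deleterious_mutations_per_generation(
    new_mutations: float,
    functional_fraction: float,
    deleterious_fraction: float,
) -> float:
    """U: new deleterious mutations per individual per generation.

    All three inputs are illustrative assumptions:
    new_mutations        - total new mutations per generation
    functional_fraction  - share of the genome assumed to be functional
    deleterious_fraction - share of hits in functional DNA assumed harmful
    """
    return new_mutations * functional_fraction * deleterious_fraction


def relative_mean_fitness(u: float) -> float:
    """Approximate mean fitness e^(-U), assuming independent mutation effects."""
    return math.exp(-u)


# Compare a mostly-junk genome with a mostly-functional one (assumed values).
for functional_fraction in (0.1, 0.8):
    u = deleterious_mutations_per_generation(100, functional_fraction, 0.4)
    print(functional_fraction, u, relative_mean_fitness(u))
```

With these assumed inputs, a genome that is 10% functional gives U of about 4, while a genome that is 80% functional gives U of about 32, and the implied mean fitness drops by many orders of magnitude. Different inputs change the numbers but not the direction of the argument.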

Now let's look at what Neil Gemmell says in this article.

Gemmell, N.J. (2021) Repetitive DNA: genomic dark matter matters. Nature Reviews Genetics:1-1. [doi: 10.1038/s41576-021-00354-8]

"Repetitive DNA sequences were found in hundreds of thousands, and sometimes millions, of copies in the genomes of most eukaryotes. While widespread and evolutionarily conserved, the function of these repeats was unknown. Provocatively, Britten and Kohne concluded 'a concept that is repugnant to us is that about half of the DNA of higher organisms is trivial or permanently inert.'"

That's from Britten and Kohne (1968) and it's true that more than 50 years ago those workers didn't like the idea of junk DNA. Britten argued that most of this repetitive DNA was likely to be involved in regulation. Gemmell goes on to describe centromeres and telomeres and mentions that most repetitive DNA was thought to be junk.

"... the idea that much of the genome is junk, maintained and perpetuated by random chance, seemed as broadly unsatisfactory to me as it had to the original authors. Enthralled by the mystery of why half our genome is repetitive DNA, I have followed this field ever since."

Gemmell is not alone. In spite of all the evidence for junk DNA, the majority of scientists don't like the fact that most of our genome is junk. Here's how he justifies his continued skepticism.

"But it was not until the 2000s, as full eukaryotic genome sequences emerged, that we discovered that the repetitive non-coding regions of our genome harbour large numbers of promoters, enhancers, transcription factor binding sites and regulatory RNAs that control gene expression. More recently, the importance of repetitive DNA in both structural and regulatory processes has emerged, but much remains to be discovered and understood. It is time to shine further light on this genomic dark matter."

This appears to be the ENCODE publicity campaign legacy rearing its ugly head once more. Most Sandwalk readers know that the presence of transcription factor binding sites, RNA polymerase binding sites, and junk RNA is exactly what one would predict from a genome full of defective transposons. Most of us know that a big fat sloppy genome is bound to contain millions of spurious binding sites for transcription factors so this says nothing about function.
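Here's a back-of-the-envelope sketch of why spurious sites are expected. The assumptions are mine, not anything from Gemmell's article or the ENCODE papers: the genome is treated as a random sequence of about 3.1 billion base pairs with equal base frequencies, and a "binding site" is treated as an exact match to one short motif. The function name and the motif lengths are illustrative only.

```python
def expected_random_matches(genome_size_bp: int, motif_length_bp: int) -> float:
    """Expected exact matches to one fixed motif in a random sequence.

    Assumes independent bases at equal frequency (0.25 each) and counts
    both strands. Palindromic motifs would be double-counted, and real
    binding sites tolerate mismatches, so this is only a rough guide.
    """
    positions = genome_size_bp - motif_length_bp + 1
    if positions <= 0:
        return 0.0
    match_probability = 0.25 ** motif_length_bp
    return 2 * positions * match_probability


GENOME_SIZE_BP = 3_100_000_000  # assumed approximate human genome size

for motif_length in (6, 8, 10):
    print(motif_length, round(expected_random_matches(GENOME_SIZE_BP, motif_length)))
```

Under these assumptions a single 6 bp motif turns up about 1.5 million times by chance and an 8 bp motif about 95,000 times. Multiply that by hundreds of different transcription factors and you get millions of matches without any biology being involved.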

Apparently Gemmell's skepticism doesn't apply to the ENCODE results so he still thinks that all those bits and pieces of transposons are mysterious bits of dark matter that could be several billion base pairs of functional DNA. I don't know what he imagines they could be doing.


Photo Credit: The photo shows human chromosomes labelled with a telomere probe (yellow), from Christopher Counter at Duke University.

1. In my book, I cover this in a section called "If it walks like a duck ..." It's a form of abductive reasoning.

Britten, R. and Kohne, D. (1968) Repeated Sequences in DNA. Science 161:529-540. [doi: 10.1126/science.161.3841.529]

Friday, March 12, 2021

The bad news from Ghent

A group of scientists, mostly from the University of Ghent1 (Belgium), have posted a paper on bioRxiv.

Lorenzi, L., Chiu, H.-S., Cobos, F.A., Gross, S., Volders, P.-J., Cannoodt, R., Nuytens, J., Vanderheyden, K., Anckaert, J. and Lefever, S. et al. (2019) The RNA Atlas, a single nucleotide resolution map of the human transcriptome. bioRxiv:807529. [doi: 10.1101/807529]

The human transcriptome consists of various RNA biotypes including multiple types of non-coding RNAs (ncRNAs). Current ncRNA compendia remain incomplete partially because they are almost exclusively derived from the interrogation of small- and polyadenylated RNAs. Here, we present a more comprehensive atlas of the human transcriptome that is derived from matching polyA-, total-, and small-RNA profiles of a heterogeneous collection of nearly 300 human tissues and cell lines. We report on thousands of novel RNA species across all major RNA biotypes, including a hitherto poorly-cataloged class of non-polyadenylated single-exon long non-coding RNAs. In addition, we exploit intron abundance estimates from total RNA-sequencing to test and verify functional regulation by novel non-coding RNAs. Our study represents a substantial expansion of the current catalogue of human ncRNAs and their regulatory interactions. All data, analyses, and results are available in the R2 web portal and serve as a basis to further explore RNA biology and function.

They spent a great deal of effort identifying RNAs from 300 human samples in order to construct an extensive catalogue of five kinds of transcripts: mRNAs, lncRNAs, antisense RNAs, miRNAs, and circular RNAs. The paper goes off the rails in the first paragraph of the Results section where they immediately equate transcripts with genes. They report the following:

  • 19,107 mRNA genes (188 novel)
  • 18,387 lncRNA genes (13,175 novel)
  • 7,309 asRNA genes (2,519 novel)
  • 5,427 miRNAs
  • 5,427 circRNAs

Tuesday, February 16, 2021

The 20th anniversary of the human genome sequence:
6. Nature doubles down on ENCODE results

Nature has now published a series of articles celebrating the 20th anniversary of the publication of the draft sequences of the human genome [Genome revolution]. Two of the articles are about free access to information and, unlike a similar article in Science, the Nature editors aren't shy about mentioning an important event from 2001; namely, the fact that Science wasn't committed to open access.

By publishing the Human Genome Project’s first paper, we worked with a publicly funded initiative that was committed to data sharing. But the journal acknowledged there would be challenges to maintaining the free, open flow of information, and that the research community might need to make compromises to these principles, for example when the data came from private companies. Indeed, in 2001, colleagues at Science negotiated publishing the draft genome generated by Celera Corporation in Rockville, Maryland. The research paper was immediately free to access, but there were some restrictions on access to the full data.