Sunday, September 01, 2024

Scite Assistant (AI) answers the question "How much of the human genome consist of junk DNA?"

Scite Assistant is billed as "your AI research partner" and as "ChatGPT for researchers." It's supposed to draw on peer-reviewed published scientific papers for its information and it will give you an answer with genuine citations.

That sounds like a good idea until you realize that the scientific literature is full of misinformation and conflicting information. What we need is an AI assistant that can help us sort through the misinformation and give us a genuine, well-informed answer on controversial issues.

Let's pick the question of junk DNA as a completely random (!) example of such an issue. The scientific literature is full of false information about the origin of the term "junk DNA" and what it was originally intended to describe. It's also full of false information about recent results and how they pertain to junk DNA.

I suspected that Scite Assistant would just find that false information and spew it back at me as an authoritative answer based on the scientific literature, but I decided to find out if that were true, so I asked it, "How much of the human genome consist of junk DNA?"

See the answer below. No surprises. This version of artificial intelligence is no more capable of finding the correct answer than ChatGPT. [ChatGPT lies about junk DNA] The only difference is that, unlike ChatGPT, it doesn't make up references or quotations. [ChatGPT is still making up quotations from scientists] It's a lot worse than Copilot, which relies heavily on Wikipedia. [Copilot answers the question, "What is junk DNA?"]

The concept of "junk DNA" has evolved significantly since its inception, reflecting ongoing debates in the scientific community regarding the functional significance of non-coding regions of the human genome. Historically, it was posited that approximately 98% of the human genome consists of non-coding DNA, often referred to as "junk DNA" due to its perceived lack of function (Li, 2016). However, the ENCODE project, launched in the early 2000s, claimed that around 80% of the human genome is biochemically functional, a statement that has sparked considerable controversy and critique regarding the criteria used to define "functionality" (Doolittle, 2013; Palazzo & Gregory, 2014). Critics argue that the methods employed by ENCODE may overestimate the functional elements within the genome, leading to confusion about the true nature of non-coding DNA (Graur et al., 2013).

Recent studies indicate that while a significant portion of the human genome is indeed non-coding, this does not imply that it is devoid of function. For instance, it has been shown that non-coding DNA includes regulatory elements that play crucial roles in gene expression and cellular function (Antonellis et al., 2010). The complexity of non-coding sequences is underscored by findings that suggest up to 12% of the human genome may be under evolutionary constraint, indicating potential functional significance (Robyr et al., 2011). Moreover, the distinction between "junk DNA" and "garbage DNA" has been made, where "junk DNA" is seen as neutral in terms of evolutionary fitness, while "garbage DNA" actively detracts from it (Graur et al., 2015).

The ENCODE project's findings have been both supported and challenged. While some researchers assert that a large portion of non-coding DNA is functional, others maintain that many of these regions may not contribute to the organism's fitness and could be classified as "junk" (Fagundes et al., 2022; Sundaram et al., 2014). For example, it has been estimated that only about 20% of the human genome is under selective pressure, suggesting that a substantial amount of non-coding DNA could indeed be non-functional or "junk" (Kellis et al., 2014). Furthermore, ongoing research into transposable elements and their roles in genome evolution suggests that these regions, often considered junk, may have more complex functions than previously understood (Sundaram et al., 2014).

In summary, while the notion of "junk DNA" persists, it is increasingly recognized that non-coding regions of the genome may harbor significant regulatory functions, challenging the simplistic categorization of these sequences as entirely non-functional. The debate continues, with ongoing research aiming to clarify the roles of these enigmatic portions of the genome.

I decided to ask a follow-up question to see what references Scite Assistant identified in answering the question, "How do you define junk DNA?"

The term "junk DNA" has historically been used to describe portions of the genome that do not code for proteins and were thought to lack any functional significance. This concept emerged in the early days of genomic research, particularly following the Human Genome Project, which revealed that only about 1-2% of the human genome is composed of protein-coding sequences (Mathews (2017); Barrett, 2009). Consequently, the remaining non-coding DNA was labeled as "junk," implying that it was largely irrelevant to biological processes (Johnson, 2014).

However, the understanding of junk DNA has evolved significantly over the past two decades. Recent research has highlighted that many non-coding regions of DNA, previously dismissed as junk, may actually play critical roles in regulating gene expression and maintaining genomic integrity (Gryder et al., 2021; Fritz et al., 2019). For instance, transposable elements (TEs), once considered mere remnants of evolutionary history, are now recognized as important contributors to the regulatory landscape of the genome, acting as sources of diverse cis-regulatory sequences (Sundaram & Wysocka, 2020). This shift in perspective is largely attributed to findings from large-scale projects like ENCODE, which suggested that a substantial portion of non-coding DNA is biochemically active and may have functional roles (Gryder et al., 2021).

Despite these advancements, the definition of junk DNA remains contentious. Some researchers argue that junk DNA should be defined strictly as sequences that do not contribute to the organism's fitness or functionality at any level (Graur et al., 2015; Doolittle & Brunet, 2017). Others propose a more nuanced view, suggesting that while some non-coding DNA may not have a direct functional role, it could still influence evolutionary processes or serve as a reservoir for genetic variation (Fagundes et al., 2022; Havstad & Palazzo, 2022). This perspective aligns with the notion that junk DNA may not be entirely devoid of significance; rather, it could play indirect roles in shaping the genome's evolutionary trajectory (Díaz-Castillo, 2017).

In summary, while the traditional view of junk DNA as non-functional remains prevalent, ongoing research continues to uncover the complexities and potential roles of these sequences. The evolving definitions reflect a growing recognition that what was once considered "junk" may indeed have important biological implications, necessitating a reevaluation of how we classify and understand these genomic regions.

Two of those citations are interesting. The Mathews (2017) reference is a review of Nessa Carey's book Junk DNA, where her definition of junk DNA is quoted. This is not a peer-reviewed paper from the scientific literature.

The Havstad & Palazzo (2022) paper is cited as an example of a possible role for non-coding DNA thus casting doubt on whether it is junk. I think this is a correct interpretation of that paper but my colleague, Alex Palazzo, claims that it was not their intent to cast doubt on the concept of junk DNA.

I should note that these answers are not reproducible. If you ask the same question again you get slightly different answers, some of which are closer to the truth while others are even more wrong.


16 comments:

apalazzo said...

"The Havstad & Palazzo (2022) paper is cited as an example of a possible role for non-coding DNA thus casting doubt on whether it is junk. I think this is a correct interpretation of that paper but my colleague, Alex Palazzo, claims that it was not their intent to cast doubt on the concept of junk DNA."

All we say in the paper is that non-functional DNA (i.e. junk DNA) can still be consequential. If you live in a house full of junk, you may need a housemaid. It's not that the junk itself has a function, but if it is there in sufficient quantity it will have effects.

Our cells are full of processes whose function is to deal with, and manage, junk DNA and junk RNA. In fact, the existence of these processes is a strong argument (but not the only one) in favour of junk DNA/RNA.

Anonymous said...

My experience with it - less systematic than this admittedly - is that AI has a difficult time parsing literature where there is no clearly established core paradigm. Questions about the key regulators of specialized metabolism yielded an essentially random set of candidate genes with no prioritization on any possible indicator of credibility. That is obviously a tough thing to ask a computer to do but I would have thought that the number of citations might be one way it could proceed. No such luck.

lantog said...

I just finished WIYG. Excellent, excellent book and unique among popular science books in many ways. I wasn't going to read it because I felt I knew all the evidence but after the Luskin/Cardinale debate I realized I didn't know much of the history and the specific papers on which the evidence is based.
I was going to ask how it's selling but I guess this post answers that. It should be widely read. You need a new publicist.

Larry Moran said...

@lantog Thank-you for the kind words about my book.

I'm disappointed that my book has not been reviewed in Nature or Science. I don't know for sure whether that's because it is not seen as a "popular" science book or whether my publisher is partly to blame for the lack of publicity.

Larry Moran said...

Alex, I have read your paper very carefully. The main point you are making is that junk DNA may not be functional in the selected effect sense but it is still "significant." You argue that eliminating large amounts of junk DNA could negatively affect the fitness of the organism (p. 29). You also favorably discuss many bulk DNA arguments that attribute function to the amount of junk DNA rather than its sequence. It's very easy for non-experts to skip over the footnotes and the nuances and see your paper as a defense of an important role for junk DNA.

For the benefit of other readers, here's a reference to the paper and its abstract.

Havstad, J.C. and Palazzo, A.F. (2022) Not functional yet a difference maker: junk DNA as a case study. Biology & Philosophy 37:29. doi: 10.1007/s10539-022-09854-1

"It is often thought that non-junk or coding DNA is more significant than other cellular elements, including so-called junk DNA. This is for two main reasons: (1) because coding DNA is often targeted by historical or current selection, it is considered functionally special and (2) because its mode of action is uniquely specific amongst the other actual difference makers in the cell, it is considered causally special. Here, we challenge both these presumptions. With respect to function, we argue that there is previously unappreciated reason to think that junk DNA is significant, since it can alter the cellular environment, and those alterations can influence how organism-level selection operates. With respect to causality, we argue that there is again reason to think that junk DNA is significant, since it too (like coding DNA) is remarkably causally specific (in Waters’, in J Philos 104:551–579, 2007 sense). As a result, something is missing from the received view of significance in molecular biology—a view which emphasizes specificity and neglects something we term ‘reach’. With the special case of junk DNA in mind, we explore how to model and understand the causal specificity, reach, and corresponding efficacy of difference makers in biology. The account contains implications for how evolution shapes the genome, as well as advances our understanding of multi-level selection."

apalazzo said...

"You argue that eliminating large amounts of junk DNA could negatively affect the fitness of the organism (p. 29)."

True, but that is not the same as functional. If you live in a house full of junk, and employ someone to clean your house but who did so in a "brainless" way, and if you eliminated all the junk, then the cleaner will start throwing away all your useful stuff. I guess you can say that the junk "serves" a purpose, in that it occupies the time of the cleaner, but that's not really what it does. The same argument for cell size. All this junk is there not because of selection, but other evolutionary forces (TE insertion).

Larry Moran said...

Alex, I understand YOUR perspective and I recognize the parts of the paper that you wrote. You are making the point that the presence of junk DNA has CONSEQUENCES.

But the main thrust of the paper is about the meaning of "function" and the dominant impression that any neutral reader will get is that junk DNA has some significance that makes philosophers uneasy about dismissing it as completely useless. Have you read the Waters paper that your co-author refers to nine times? She claims that junk DNA has "causal reach" and, although that might be a low-efficiency cause, it is still significant.

IMHO, saying that something has "causal significance" is quite a bit different than saying that its presence has consequences. That's because "cause" is a loaded word that sounds like function.

apalazzo said...

"That's because "cause" is a loaded word that sounds like function."

You and I both know that a causal relation is not the same as functional. This has been the source of so much debate, and as I recall you stated that the debate is over. ;)

Junk DNA causes stuff like the production of garbage RNA and larger cell sizes, but it does not have a function (in the origin or maintenance sense). It is not being conserved. Our paper makes this absolutely clear. The only way that changes in junk DNA would impact fitness is if a large swath of it were eliminated (or added) in one fell swoop. Where the tipping point is, I'm not sure ... 20%, 50% change ... but the sudden reduction (or increase) in cell size alone would likely have deleterious effects.

The way I see it is that selection has little to no impact on cell size in most vertebrates. Cell size is largely dictated by other evolutionary forces that are much stronger than selection (TE insertion, biased mutation). Our cellular machinery is under selection to operate within the parameters of the size of our cells. In other words, cell size is an environmental factor to which cellular machinery is adapted, just like a polar bear's metabolism is adapted to polar environments. The environment has an impact on polar bears, cell size has an impact on cellular machinery. I think that the impact is very weak - variances in genome size in the human population do not make a difference (i.e. these differences are neutral).

In birds, where cell size likely negatively impacts metabolism, there is some data suggesting that very large genomic deletions have been under selection (https://www.pnas.org/doi/abs/10.1073/pnas.1616702114) - this gives a hint that even when cell size is under selection, it must be changed by quite a bit for it to be subject to selection.

Similar arguments can be made about junk RNA.

Larry Moran said...

Alex, you and I know that the maintenance function (purifying selection) is the only valid definition of function in the junk DNA debate. Many philosophers don't agree with us, including, perhaps, your co-author. That's why the paper is confusing and why some people read it as an attack on the idea of junk (non-functional) DNA.

Kye Goodwin said...

I've always been sceptical of non-functional DNA for reasons related to natural selection. DNA is expensive to make. If it's not doing anything, an individual organism that doesn't make it in useless quantities gains a selective advantage. We just don't know what all of the DNA is doing YET.

John Harshman said...

Kye: you need to read Larry's book. It's exactly the absence of selection that's one of the strongest arguments for junk DNA. You also overestimate the metabolic cost of making extra DNA, much less the incremental cost or benefit of any single indel, which is all selection could see.

Larry Moran said...

@Kye Goodwin The nearly-neutral theory was published more than 50 years ago. It explains why losing or gaining some junk DNA offers no selective advantage in small populations. This was one of the key steps in developing the concept of junk DNA.

You can read about the concept in an old post on Learning about modern evolutionary theory: the drift-barrier hypothesis. I also cover it in my book.

Kye Goodwin said...

Larry, I'm surprised and pleased that I can so easily connect with many who share my interest. I have taken up biochemistry recently because I've become aware that some scientists are making progress on the Origin question and I can only follow it if I work hard. I found the Sandwalk Blog mentioned in a paper by Markus Ralser. Another Moran, Joseph Moran who works in France, is making a lot of progress catalyzing metabolism without protein enzymes. He hasn't got to nucleic acids yet, but has had some success with the nucleobases. I'm learning from the bottom up and don't have near the knowledge to keep up. I may be back when I've learned more. In the meantime I won't be volunteering to have my junk DNA removed as an experiment.

Kye Goodwin said...

Hi again Larry. Well I've had a look around the blogspot and not found much interest in the wide swath currently being cut in the Origins field by the Metabolism First contingent with the Serpentining Systems idea (I've been calling it the Alkaline Vent Theory but Martina Preiner says we shouldn't.) I did see Nick Lane's latest book mentioned and your Kudos for Bill Martin who has become a science hero to me too. The junk DNA debate is mostly about Eucaryotes and they showed up about two billion years late for the Origin. I'll keep checking to see if there's growing interest here in the topic, one of the few aspects of Biochemistry that I'm starting to know something about.

Larry Moran said...

@Kye Goodwin

Where did the glucose come from?

Changing Ideas About The Origin Of Life

Was the Origin of Life a Lucky Accident?

Metabolism First and the Origin of Life

More primordial soup nonsense



Kye Goodwin said...

Thanks Larry. I guess my look around here was pretty superficial. I'll follow your leads and comment in those threads when I've got something to say.