Sandwalk: ChatGPT lies about junk DNA

Saturday, March 25, 2023

ChatGPT lies about junk DNA

I asked ChatGPT some questions about junk DNA and it made up a Francis Crick quotation and misrepresented the view of Susumu Ohno.

We have finally restored the Junk DNA article on Wikipedia. (It was deleted about ten years ago when Wikipedians decided that junk DNA doesn't exist.) One of the issues on Wikipedia is how to deal with misconceptions and misunderstandings while staying within the boundaries of Wikipedia culture. Wikipedians have an aversion to anything that looks like editorializing so you can't just say something like, "Nobody ever said that all non-coding DNA was junk." Instead, you have to find a credible reference to someone else who said that.

I've been trying to figure out how far the misunderstandings of junk DNA have spread so I asked ChatGPT (from OpenAI) again.

Q: What is junk DNA?
ChatGPT: Junk DNA is a term used to refer to non-coding DNA sequences in an organism's genome that do not code for proteins and whose function, if any, is not yet fully understood. These sequences were once thought to have no biological function and to be "junk" or remnants of evolutionary processes. However, recent studies have shown that some non-coding DNA sequences play important regulatory roles in gene expression and have other biological functions. The term "junk DNA" is therefore controversial and is no longer used frequently in scientific literature.

This misconception seems to be quite common. Many people think that the term 'junk DNA' was a synonym for 'non-coding DNA' and that the recent (?) discovery of functional non-coding DNA means that there's no such thing as junk DNA.

Q: Which scientists thought that all non-coding DNA was junk?

ChatGPT: The idea that non-coding DNA, especially repetitive DNA sequences, was "junk" and had no functional significance was proposed by molecular evolutionary biologist Susumu Ohno in 1972, in his book "Evolution by Gene Duplication." This idea was further popularized by British geneticist John Mattick in the 1990s. However, as more research has been conducted on the genome, it has become clear that non-coding DNA plays important structural and regulatory roles in gene expression, and the term "junk DNA" is no longer considered an accurate description.

Susumu Ohno never said that all non-coding DNA was junk. He published several papers where he discussed regulatory DNA and whether a large proportion of the genome could be devoted to regulation. At no point did he ever suggest that regulatory DNA doesn't exist and that therefore all non-coding DNA is junk.

John Mattick is a big fan of regulatory sequences—he thinks that a large percentage of the genome is dedicated to producing non-coding regulatory RNAs. He is partially responsible for spreading the rumor that scientists in the 1970s said that all non-coding DNA was junk but he has never documented that claim because it's false. It's very difficult to show on Wikipedia that Mattick is distorting the truth.

ChatGPT concludes, again, that the discovery of functional non-coding sequences, such as ribosomal RNA genes (1960s), means that the term "junk DNA" is no longer an accurate description of non-coding DNA. That statement could only be true if someone had claimed that all non-coding DNA is junk.

Q: Can you find a direct quotation by Susumu Ohno saying that all non-coding DNA was junk?

ChatGPT: Yes, in his 1972 book "Evolution By Gene Duplication", Susumu Ohno stated the following: "The earthworm, being a more primitive animal, should have a more primitive genome, just as it has a more primitive anatomy. But we know from our studies on nucleic acids and proteins that this sweeping generalization is not quite correct; some of the most advanced animals seem to have many apparently useless DNA sequences... the earthworm probably has no room for 'junk' DNA in its genome."

This is one of the earliest examples of the hypothesis that non-coding DNA is non-functional or "junk", which was proposed by Ohno in the context of comparing the genome of the earthworm to that of more advanced animals.

I don't have Ohno's book but I don't need it since the quotation does not answer the question.

Q: Are there any other scientists who said that all non-coding DNA was junk?

ChatGPT: Yes, there were other scientists who shared the view that much of non-coding DNA was useless, non-functional and even redundant. One of the prominent scientists who argued for the idea that non-coding DNA was "junk" was Francis Crick, co-discoverer of the structure of DNA. In his 1970 article "Central Dogma of Molecular Biology", he suggested that "the information content of the nucleus has been shown to be almost entirely independent of the cytoplasmic environment, which emphasizes the isolation of the genetic material from the operational proteins of the cell... My own belief is that these sequences are in fact junk, partly to allow mutations to occur without lethal effect, and partly as merely redundant copies of functional sequences elsewhere in the DNA."

In addition to Crick, scientists like Leslie Orgel and Walter Gilbert also proposed the idea that much of non-coding DNA was non-functional in the 1970s and 1980s. However, it is worth noting that as more genomic data has become available, our understanding of the functions and roles of non-coding DNA has evolved and expanded beyond the simplistic "junk" label.

I have Crick's 1970 paper and there is nothing in that paper that's remotely similar to the quotation that ChatGPT attributes to Crick. I was unable to find any source for that "quotation." It's true that there were scientists (Leslie Orgel, Walter Gilbert) who thought that most of the non-coding DNA was non-functional but that's not the question that I asked and it's not the position that ChatGPT is defending.

There are two lessons here: (1) Don't trust ChatGPT, and (2) ChatGPT is probably spewing back the common misunderstanding of junk DNA. If Wikipedia is ever going to be a reliable encyclopedia, then one of its primary goals must be to refute misunderstandings and misconceptions, but that doesn't jibe with the Wikipedia culture.

22 comments:

Anonymous, Saturday, March 25, 2023 8:30:00 PM
Perhaps it means that I am becoming a curmudgeon a few months short of my half century mark, but I’m not terribly impressed by these programs yet. Like their close cousin, Dr. Google, I find them somewhat sketchy characters who fib frequently.
Anonymous, Saturday, March 25, 2023 9:44:00 PM
I always understood junk DNA as being nothing or non-coding. I know there’s more evidence to the contrary. So all you’re disagreeing with is ChatGPT’s definition of junk DNA? Is that right? Maybe ask ChatGPT how it defines “junk DNA” just to clarify definitions?
Steve Watson, Saturday, March 25, 2023 10:57:00 PM
My wife had a conversation with ChatGPT one day, about who runs CFI Canada (a subject with which she is intimately acquainted). It said a lot of silly things, to the point of making up people who don't exist. Since what ChatGPT relies on is pretty much just whatever it finds on the internet, it's basically just a fancy example of the GIGO principle.
Larry Moran, Saturday, March 25, 2023 11:15:00 PM
@Anonymous

I asked ChatGPT how it defines junk DNA. Here’s the answer.

“Junk DNA is a term used to describe sections of an organism's DNA that do not code for proteins or functional RNA molecules and were thought to have no biological function. These regions of DNA were once considered to be evolutionary remnants, accumulated over time through random mutations and genetic drift, with no adaptive significance. However, recent research has shown that some of these non-coding regions may have important regulatory functions and play a role in gene expression and genome evolution. Therefore, the term "junk DNA" is no longer used as it implies that these regions have no biological function, which is not accurate.”

The correct answer is that junk DNA is nonfunctional DNA and that functional regulatory sequences have been known for 60 years. No knowledgeable scientist ever said that functional regulatory sequences don’t exist therefore the only functional regions of the genome are those that specify proteins and functional RNAs.

And it is simply not true that the term “junk DNA” is no longer used. Somebody even wrote a book about it.
Larry Moran, Saturday, March 25, 2023 11:30:00 PM
@Steve

I could tell you and Seanna who is President of CFI Canada but I suspect you already know the answer. :-)

I couldn’t tell you who “runs” CFI Canada because nobody really knows the answer to that question. :-)
gert korthof, Sunday, March 26, 2023 5:08:00 AM
Someone said "but I’m not terribly impressed by these programs yet" (ChatGPT).

I heard from someone who works in the AI field that ChatGPT is not designed to deliver the truth. If that is correct, then one should not expect it to deliver the truth.
Having said that, I think one should really be impressed by the ability of ChatGPT to deliver grammatically correct sentences which look very much like those of an average human being. ChatGPT is considered a milestone in AI. The thing is: one should not expect truth and judge it as 'someone' who read the whole internet and remembers all of it. That is something no human being is able to do.
gert korthof, Sunday, March 26, 2023 6:13:00 AM
correction: "The thing is: one should not expect truth BUT judge it as 'someone' who ..."
Larry Moran, Sunday, March 26, 2023 11:14:00 AM
@gert korthof

So we should be impressed with ChatGPT because, unlike Donald Trump, it uses grammatically correct sentences to lie and spread misinformation?

This is why it's considered a milestone in AI?
Anonymous, Sunday, March 26, 2023 5:54:00 PM
ChatGPT is basically a search engine; the only differences are that, one, it summarizes results in sentences/paragraphs and, two, the results are inferior. With a search engine you get vastly more information. I asked it if a certain argument for a deity's existence has been refuted (it has) and it replied saying it can't take sides, then summarized two objections. I did not ask it to take sides; I thought it would at least say some believe the argument has been refuted and others not, then summarize both sides.
-César D
SPARC, Monday, March 27, 2023 1:45:00 AM
Do you think that Wikipedia will allow you to cite your upcoming book?
gert korthof, Monday, March 27, 2023 4:29:00 AM
Larry said: "it uses grammatically correct sentences to lie and spread misinformation?"
By asking ChatGPT questions, you expect it to 1) understand your question and 2) to answer your question. And because it answers your questions like a human it passed the Turing Test (designed by Alan Turing in 1950). https://en.wikipedia.org/wiki/Turing_test

Furthermore, the very fact that you say "ChatGPT lies about..." in your blog title and "spread misinformation" proves you interact with ChatGPT as if it were a person: only persons can lie. Computer software and dogs cannot lie. Only humans can lie. The dictionary definition of lying is "to make a false statement with the intention to deceive," and you ascribe that to a computer program! ChatGPT has the intention to spread misinformation, as if it were a conscious human being?!

Larry asked: "This is why it's considered a milestone in AI?"
In a certain way: yes. It passed the Turing Test for the first time since Turing proposed it 73 years ago. You proved it.
Larry Moran, Monday, March 27, 2023 10:51:00 AM
@SPARC

No, Wikipedians will not let me cite my own book. That sort of thing fits right into the common narrative that scientists are only interested in promoting their own work on Wikipedia and they are incapable of fairly representing the views of other scientists. They believe that amateur Wikipedians with no expertise in the subject matter can do a far better job of presenting the scientific consensus because they are unbiased.

The ENCODE article is a very good example of a well-written and well-researched article that, nevertheless, has a strong built-in bias in favor of the ENCODE conclusion that most of the human genome is functional.
Mark Sturtevant, Monday, March 27, 2023 11:00:00 AM
I have also experimented with ChatGPT since I have students write a term paper on fruit fly development in our senior capstone class. What I've learned from this, and elsewhere, is that the AI apparently does not know what it is saying. It is mainly spitting out sentences, with a seemingly logical point of view and progression, but it cannot fact-check. So it will:
> Generate literature citations when asked, but those citations appear to be fake. They are made up!
> On occasion produce breathtakingly wrong descriptions about fruit fly development (and about other subjects).
> And now I've learned that when asked to give a direct quote from someone, it seems to make up the quote!

Someone who seemed knowledgeable online recently commented that they believe that the AI puts together its answers from a kind of online "word cloud" of terms and phrases that are most often associated with a given query. That would at once explain the quirks about ChatGPT. It does not really have this slanted and wrong view about junk and ncDNA. It’s not lying. It’s just that it most frequently finds word strings that lead it to make those sentences because that viewpoint is frequently encountered online. It did not look up references or quotes, it simply makes 'em up once again, bc of the algorithm it uses to generate answers.
Some have accused ChatGPT of having a "woke" viewpoint or other political viewpoints. But now I suspect that it is similarly innocent of that. There are simply a lot of woke viewpoints out there now, and so that is what it samples from.

SPARC, Tuesday, March 28, 2023 1:22:00 AM
Mark Sturtevant said
>Generate literature citations when asked, but those citations appear to be fake. They are made up!
> On occasion produce breathtakingly wrong descriptions about fruit fly development (and about other subjects).
> And now I've learned that when asked to give a direct quote from someone, it seems to make up the quote!

How could anybody tell that something like this wasn't the work of a bad student?
apalazzo, Tuesday, March 28, 2023 9:00:00 AM
The fact that ChatGPT does not answer your questions and cites irrelevant information is just a reflection of what is published - many critics of junk DNA mis-cite and mis-quote the literature. ChatGPT is just echoing that. Garbage in garbage out.
Mark Sturtevant, Tuesday, March 28, 2023 9:28:00 AM
@SPARC: I've never seen a student attempt such fakery. At the least, they would earn a very low grade. But there is another program out there called GPTZero which is designed to spot text likely built from ChatGPT. It apparently uses the same or similar algorithm as the original. I've tried it against text made by ChatGPT, and for me it identified it about 50% of the time.
Ironically, I had to prove I'm not a robot with this comment. :/
Bayesian Bouffant, FCD, Thursday, March 30, 2023 9:58:00 AM
Check out my new book:
"What's in your AI?
90% of your AI is junk."
judmarc, Monday, April 03, 2023 6:22:00 PM
You've described one of the two big problems I know of with the large language model (LLM) variant of AIs that are so much in the news these days. The other (and again, these are just the two I know of as a layperson who tries to stay reasonably informed - I'm sure there are plenty more) is that it reflects majoritarian prejudices against people as well as ideas. Indeed, how could it not, considering the (mis)information it's trained on?
Graham Jones, Friday, April 21, 2023 3:31:00 AM
The BBC lies about junk DNA:
The mystery of the human genome's dark matter
https://www.bbc.com/future/article/20230412-the-mystery-of-the-human-genomes-dark-matter
Anonymous, Saturday, April 22, 2023 10:25:00 PM
Off topic. But I thought you might like to know that Uncommon Descent has been shut down.
J. Oakley, Wednesday, April 26, 2023 10:47:00 PM
Kind of late to this conversation, but I put the first question into the Bing Chatbot and got:

Junk DNA is a term used to describe DNA that has no known biological function. It mostly consists of pseudogenes and fragments of transposons and viruses [1]. However, recent research suggests that some of this so-called “junk” DNA may play a critical role in mammalian development [2].

Citations:
1. https://en.wikipedia.org/wiki/Junk_DNA
2. https://news.berkeley.edu/2021/10/18/so-called-junk-dna-plays-critical-role-in-mammalian-development/