Sunday, November 26, 2023

ChatGPT gets two-thirds of science textbook questions wrong: time to bring it into the classroom!

The November 16th issue of Nature has an article about ChatGPT: "ChatGPT has entered the classroom: how LLMs could transform education." It reports that the latest version (GPT-4) can answer only one third of questions correctly in physical chemistry, physics, and calculus. Nevertheless, the article promotes the idea that ChatGPT should be brought into the classroom!

An editorial in the same issue explains Why teachers should explore ChatGPT’s potential — despite the risks.

Many students now use AI chatbots to help with their assignments. Educators need to study how to include these tools in teaching and learning — and minimize pitfalls.

I don't get it. It seems to me that the problems with ChatGPT far outweigh the advantages and the best approach for now is to warn students that using AI tools may be terribly misleading and could lead to them failing a course if they trust the output. That doesn't mean that there's no potential for improvement in the future, but this can only happen if the sources of information used by these tools become much more reliable. No improvements in the algorithms are going to help with that.


Monday, November 20, 2023

Two Heidelberg graduate students reject junk DNA

Science in School is a magazine for European science teachers. Two graduate students [1] have just published an article in the November issue: Not junk after all: the importance of non-coding RNAs.

Note: The article has been edited to remove some of the references to junk DNA and the editor has added the following disclaimer to the end of the article: Editor’s note: Some parts of the introduction and conclusion were rephrased to avoid any misunderstanding concerning the nature of ‘junk DNA’, which is not the focus of this article. Here's a link to the revised article: Not junk after all: the importance of non-coding RNAs. More changes are expected.

Not junk after all: the importance of non-coding RNAs

Originally assumed to be useless ‘junk DNA’, sections of the genome that don’t encode proteins have been revealed as a source of many important non-coding RNA structures.

The central dogma of molecular biology is that DNA is used as a template to create messenger RNA (mRNA), which in turn is translated into proteins that build the tissues in our bodies and carry out the main functions of our cells and organs. In other words, DNA → mRNA → proteins. Interestingly, though, only 2% of the DNA in our whole genome codes for proteins! So, what does the other 98% of the human genome do? In the mid-1900s, it was widely believed that a great part of our genome was useless, repetitive ‘junk DNA’. However, this belief goes against the evolution theory, which suggests that useless sequences would be eliminated from the genome since their maintenance requires energy. In the late 20th century and the early 21st century, this junk DNA has been shown to not only contain important regulatory elements for transcription, but also sequences that encode various non-coding RNAs that have functions in many cellular mechanisms.

I just finished a podcast interview with Kat Arney and one of the questions she asked was what is the most important thing I'd like scientists to know about this topic. I picked evolution: I'd like modern researchers to understand that there's more to evolution than natural selection. You can see the problem in this example where two students who are working toward a Ph.D. at a top lab in Europe think that junk DNA "goes against the evolution theory."

That's sad. It's also sad that these two students think that 98% of our genome might be devoted to regulation and non-coding genes.

We need to focus on educating the next generation of scientists and that starts with educating science teachers. This is not the way to do it.

Here's the contact information for Science in School. I've written to the editor at editor@scienceinschool.org. Please send a message if you are as concerned about the spread of scientific misinformation as I am.


[1] Zuzana Koskova at the European Molecular Biology Laboratory in Heidelberg (Germany) and Miguel Hernandez at the University Hospital, Heidelberg. I tried sending an email message to Zuzana Koskova but got no reply. I was unable to find contact information for Miguel Hernandez.

Sunday, November 12, 2023

ChatGPT is still making up quotations from scientists

I wondered whether ChatGPT had improved in the last six months, so I asked it again about junk DNA. The answers reveal that ChatGPT is still lying and spreading false information.

What is junk DNA?

ChatGPT: "Junk DNA" is a term that was historically used to describe portions of the DNA sequence that do not code for proteins or have any known function. These non-coding regions were once thought to be vestiges of evolutionary processes or simply genetic "leftovers" with no apparent purpose.

However, it's important to note that the term "junk DNA" is now considered outdated and somewhat misleading. Advances in genomics and molecular biology have revealed that many non-coding regions of DNA have important functions, even if they do not encode proteins. These regions may contain regulatory elements that control the activity of genes, play a role in the three-dimensional structure of the genome, or have other functional significance.

The Human Genome Project and subsequent genomic research efforts have shown that a significant portion of the genome is involved in the regulation of gene expression, development, and other cellular processes. The term "non-coding DNA" or "non-coding regions" is now preferred over "junk DNA" to emphasize that these sections of the genome have functions that are not immediately apparent and are still an active area of research.

In summary, while some regions of the genome were once thought to be functionless "junk," ongoing scientific research is uncovering the complexity and functional significance of non-coding DNA.

So, the answer is "no," ChatGPT hasn't gotten any better. Even after we worked on a Wikipedia entry for junk DNA, ChatGPT is still responding with the standard non-scientific perspective on junk DNA. It implies that the term junk DNA originally meant all non-coding DNA and that the discovery of function in non-coding regions is due to recent advances in genomics and molecular biology.

I asked for some specific evidence that this was, indeed, the historical definition of junk DNA.

Can you give me some direct quotes from scientists who said that all non-coding DNA was junk?

ChatGPT: The idea that non-coding DNA was "junk DNA" was a historical perspective that gained traction in the early days of molecular biology when researchers primarily focused on protein-coding genes. It's important to note that while the term "junk DNA" was used, scientists were aware that there were likely unknown functions in non-coding regions. Over time, advances in genomics and molecular biology have revealed the complexity and functionality of non-coding DNA.

Here are a couple of historical quotes that reflect the perspective on non-coding DNA from earlier times:

Francis Crick (1970): In a 1970 paper, Francis Crick, one of the co-discoverers of the structure of DNA, stated, "The ultimate aim of the DNA investigators is to identify, with some precision, the genetic material. At present, it looks as if at least 90% of the DNA in any organism is useless."

Susumu Ohno (1972): Geneticist Susumu Ohno, in his book "Evolution by Gene Duplication" (1972), stated, "The failure to recognize this [the abundance of 'junk DNA'] for so long has misled the investigators studying the higher order chromosomal structure... No doubt the junk DNA sequences in the heterochromatin and at the centromeres are examples of 'selfish' DNA."

It's important to understand that these quotes represent a historical perspective, and scientific understanding has evolved significantly since then. The term "junk DNA" has since been reconsidered as researchers have discovered various functions within non-coding regions of the genome. Scientists today emphasize the importance of non-coding DNA in regulatory processes, genome stability, and other essential biological functions.

Neither of these quotes supports the idea that all non-coding DNA was thought to be junk. The direct quote from Crick in 1970 seemed suspicious, so I asked for clarification.

Can you supply the complete reference to the Francis Crick quotation that you just gave me?

ChatGPT: I apologize for any confusion, but I made an error in my previous response. After a thorough review, I couldn't find a direct quote from Francis Crick expressing the sentiment that "at least 90% of the DNA in any organism is useless." It appears there might have been a misunderstanding or misattribution in the information I provided.

Francis Crick and James Watson, in their groundbreaking work, were primarily focused on understanding the structure of DNA, and while there were discussions about non-coding regions, the notion of "junk DNA" as we understand it today wasn't explicitly articulated by Crick in the manner suggested.

I appreciate your diligence in seeking accurate information, and I apologize for any confusion caused by the error in my response. If you have any further questions or if there's anything else I can assist you with, please let me know.

The only way you would know that ChatGPT was making up direct quotations would be if you asked for confirmation. The average person would just assume that the fake quote was accurate. It would be fun to grade a student essay that used such a quotation!

When I asked these questions last March, ChatGPT also made up false quotations that didn't even support its claim that all non-coding DNA was thought to be junk [ChatGPT lies about junk DNA].

Can you supply the complete reference to the Susumu Ohno quotation that you just gave me?

ChatGPT: Certainly, Susumu Ohno's quote about "junk DNA" is attributed to his book "Evolution by Gene Duplication," published in 1970. I must clarify that the quote I provided earlier is a paraphrased version based on the general sentiment expressed in his work rather than an exact wording. Unfortunately, I couldn't locate the precise quote within the text.

If you're interested in exploring Ohno's thoughts on this matter in more detail, you may want to refer to his book "Evolution by Gene Duplication," which was influential in discussions about the evolutionary significance of gene duplication and, by extension, the role of non-coding DNA.

This is another example of a made-up quotation and it doesn't address the main issue. ChatGPT is unable to provide any evidence that all non-coding DNA was thought to be junk DNA.

The behavior of ChatGPT should be a major concern. If we are going to rely on artificial intelligence in the future, then we had better make sure that the information it gathers is correct.


Wednesday, November 08, 2023

The Purple Blog

Raphaël Champeimont has a blog called The Purple Blog: Freedom and Technology. His latest post is called The great Pufferfish Genome and it's well worth a read. Here's an excerpt ...

Human: I am the mighty human, pinnacle of the evolution: I have the most advanced and complex genome with 25,000 genes and an impressive 3 billion base pairs in my DNA, you know these letters like A, T, G, C which make my genome. 3 billion of them!

Pufferfish: Come on. Your genome is just full of junk, 90% of it is completely useless! It’s full of dead viruses that infected your ancestors long ago and you never cleaned it up. Look at my genome, I have just as many genes as you, but I don’t need to waste 3 billion base pairs of DNA for that, just 400 million is well enough. Yes, I pack as many genes as you in a genome 10 times smaller! That’s what I call optimization!

I met Raphaël a few months ago at a Café Scientific meeting in Mississauga, Ontario (Canada) and he came to our meeting last night. Turns out, he read my book and that's why he posted an article about genomes.

I recently read a very interesting new book “What's in Your Genome? 90% of Your Genome Is Junk” by Laurence A. Moran, in which he argues that our knowledge of genomics points to the fact that 90% of the human genome is useless junk.

This idea is not new, but it has become unfashionable in the last 20 years, without good evidence, the author argues. Most of our genome is still junk, and a central argument is that many other species don’t need that much DNA, or have much more without any “good” reason like the organism’s complexity.

I've lost count of how many people have read my book. I think this makes six or maybe seven!