The November 16th issue of Nature has an article about ChatGPT: ChatGPT has entered the classroom: how LLMs could transform education. It reports that the latest version (GPT4) can only answer one third of questions correctly in physical chemistry, physics, and calculus. Nevertheless, the article promotes the idea that ChatGPT should be brought into the classroom!
Many students now use AI chatbots to help with their assignments. Educators need to study how to include these tools in teaching and learning — and minimize pitfalls.
I don't get it. It seems to me that the problems with ChatGPT far outweigh the advantages and the best approach for now is to warn students that using AI tools may be terribly misleading and could lead to them failing a course if they trust the output. That doesn't mean that there's no potential for improvement in the future but this can only happen if the sources of information used by these tools were to become much more reliable. No improvements in the algorithms are going to help with that.
Note: The article has been edited to remove some of the references to junk DNA and the editor has added the following disclaimer to the end of the article: Editor’s note: Some parts of the introduction and conclusion
were rephrased to avoid any misunderstanding concerning the nature of ‘junk DNA’, which is not the focus of this article. Here's a link to the revised article: Not junk after all: the importance of non-coding RNAs. More changes are expected.
Not junk after all:
the importance of non-coding RNAs
Originally assumed to be useless ‘junk DNA’, sections of the genome that don’t encode
proteins have been revealed as a source of many important non-coding RNA structures.
The central dogma of molecular biology is that DNA is used as a template to create messenger RNA (mRNA), which in turn is translated into proteins that build the tissues in our bodies and carry out the main functions of our cells and organs. In other words, DNA → mRNA → proteins. Interestingly, though, only 2% of the DNA in our whole genome codes for proteins! So, what does the other 98% of the human genome do? In the mid-1900s, it was widely believed that a great part of our genome was useless, repetitive ‘junk DNA’. However, this belief goes against the evolution theory, which suggests that useless sequences would be eliminated from the genome since their maintenance requires energy. In the late 20th century and the early 21st century, this junk DNA has been shown to not only contain important regulatory elements for transcription, but also sequences that encode various non-coding RNAs that have functions in many cellular mechanisms.
I just finshed a podcast interview with Kat Arney and one of the questions she asked was what is the most important thing I'd like scientists to know about this topic. I picked evolution—I'd like modern researchers to understand that there's more to evolution than natural selection. You can see the problem in this example where two students who are working toward a Ph.D. at a top lab in Europe think that junk DNA "goes against the evolution theory."
That's sad. It's also sad that these two students think that 98% of our genome might be devoted to regulation and non-coding genes.
We need to focus on educating the next generation of scientists and that starts with educating science teachers. This is not the way to do it.
Here's the contact information for Science in School. I've written the editor at editor@scienceinschool.org. Please send a message if you are as concerned about the spread of scientific misinformation as I am.
Zuzana Koskova at the European Molecular Biology Laboratory in Heidelberg (Germany) and Miguel Hernandez at the University Hospital, Heidelberg. I tried sending an email message to Zuzana Koskova but got no reply. I was unable to find contact information for Miguel Hernandez.
I wondered whether ChatGPT had improved in the last six months so I asked it again about junk DNA. The answers reveal that ChatGPT is still lying and spreading false information.
What is junk DNA?
ChatGPT: "Junk DNA" is a term that was historically used to describe portions of the DNA sequence that do not code for proteins or have any known function. These non-coding regions were once thought to be vestiges of evolutionary processes or simply genetic "leftovers" with no apparent purpose.
However, it's important to note that the term "junk DNA" is now considered outdated and somewhat misleading. Advances in genomics and molecular biology have revealed that many non-coding regions of DNA have important functions, even if they do not encode proteins. These regions may contain regulatory elements that control the activity of genes, play a role in the three-dimensional structure of the genome, or have other functional significance.
The Human Genome Project and subsequent genomic research efforts have shown that a significant portion of the genome is involved in the regulation of gene expression, development, and other cellular processes. The term "non-coding DNA" or "non-coding regions" is now preferred over "junk DNA" to emphasize that these sections of the genome have functions that are not immediately apparent and are still an active area of research.
In summary, while some regions of the genome were once thought to be functionless "junk," ongoing scientific research is uncovering the complexity and functional significance of non-coding DNA.
So, the answer is "no," ChatGPT hasn't gotten any better. Even after we worked on a Wikipedia entry for junk DNA, ChatGPT is still responding with the standard non-scientific perspective on junk DNA. It implies that the term junk DNA originally meant all non-coding DNA and that the discovery of function in non-coding regions is due to recent advances in genomics and molecular biology.
I asked for some specific evidence that this was, indeed, the historical definition of junk DNA.
Can you give me some direct quotes from scientists who said that all non-coding DNA was junk?
ChatGPT: The idea that non-coding DNA was "junk DNA" was a historical perspective that gained traction in the early days of molecular biology when researchers primarily focused on protein-coding genes. It's important to note that while the term "junk DNA" was used, scientists were aware that there were likely unknown functions in non-coding regions. Over time, advances in genomics and molecular biology have revealed the complexity and functionality of non-coding DNA.
Here are a couple of historical quotes that reflect the perspective on non-coding DNA from earlier times:
Francis Crick (1970): In a 1970 paper, Francis Crick, one of the co-discoverers of the structure of DNA, stated, "The ultimate aim of the DNA investigators is to identify, with some precision, the genetic material. At present, it looks as if at least 90% of the DNA in any organism is useless."
Susumu Ohno (1972): Geneticist Susumu Ohno, in his book "Evolution by Gene Duplication" (1972), stated, "The failure to recognize this [the abundance of 'junk DNA'] for so long has misled the investigators studying the higher order chromosomal structure... No doubt the junk DNA sequences in the heterochromatin and at the centromeres are examples of 'selfish' DNA."
It's important to understand that these quotes represent a historical perspective, and scientific understanding has evolved significantly since then. The term "junk DNA" has since been reconsidered as researchers have discovered various functions within non-coding regions of the genome. Scientists today emphasize the importance of non-coding DNA in regulatory processes, genome stability, and other essential biological functions.
Neithe of these quotes support the idea that all non-coding DNA was thought to be junk. The direct quote from Crick in 1970 seems suspicious so I asked for clarification.
Can you supply the complete reference to the Francis Crick quotation that you just gave me?
ChatGPT: I apologize for any confusion, but I made an error in my previous response. After a thorough review, I couldn't find a direct quote from Francis Crick expressing the sentiment that "at least 90% of the DNA in any organism is useless." It appears there might have been a misunderstanding or misattribution in the information I provided.
Francis Crick and James Watson, in their groundbreaking work, were primarily focused on understanding the structure of DNA, and while there were discussions about non-coding regions, the notion of "junk DNA" as we understand it today wasn't explicitly articulated by Crick in the manner suggested.
I appreciate your diligence in seeking accurate information, and I apologize for any confusion caused by the error in my response. If you have any further questions or if there's anything else I can assist you with, please let me know.
The only way you would know that ChatGPT was making up direct quotations would be if you asked for confirmation. The average person would just assume that the fake quote was accurate. It would be fun to grade a student essay that used such a quotation!
When I asked these question last March, ChatGPT also made up false quotations that didn't even support it's claim that all non-coding DNA was thought to be junk [ChatGPT lies about junk DNA].
Can you supply the complete reference to the Susumu Ohno quotation that you just gave me?
ChatGPT: Certainly, Susumu Ohno's quote about "junk DNA" is attributed to his book "Evolution by Gene Duplication," published in 1970. I must clarify that the quote I provided earlier is a paraphrased version based on the general sentiment expressed in his work rather than an exact wording. Unfortunately, I couldn't locate the precise quote within the text.
If you're interested in exploring Ohno's thoughts on this matter in more detail, you may want to refer to his book "Evolution by Gene Duplication," which was influential in discussions about the evolutionary significance of gene duplication and, by extension, the role of non-coding DNA.
This is another example of a made-up quotation and it doesn't address the main issue. ChatGPT is unable to provide any evidence that all non-coding DNA was thought to be junk DNA.
The behavior of ChatGPT should be a major concern. If we are going to rely on artificial intelligence in the future then we better make sure that the information it gathers is correct.
Raphaël Champeimont has a blog called The Purple Blog: Freedom and Technology. His latest post is called The great Pufferfish Genome and it's well worth a read. Here's an excerpt ...
Human: I am the mighty human, pinnacle of the evolution: I have the most advanced and complex genome with 25,000 genes and an impressive 3 billion base pairs in my DNA, you know these letters like A, T, G, C which make my genome. 3 billion of them!
Pufferfish: Come on. Your genome is just full of junk, 90% of it is completely useless! It’s full of dead viruses that infected your ancestors long ago and you never cleaned it up. Look at my genome, I have just as many genes as you, but I don’t need to waste 3 billion base pairs of DNA for that, just 400 million is well enough. Yes, I pack as many genes as you in a genome 10 times smaller! That’s what I call optimization!
I met Raphaël a few months ago at a Café Scientific meeting in Mississauga, Ontario (Canada) and he came to our meeting last night. Turns out, he read my book and that's why he posted an article about genomes.
I recently read a very interesting new book “What's in Your Genome? 90% of Your Genome Is Junk” by Laurence A. Moran, in which he argues that our knowledge of genomics points to the fact that 90% of the human genome is useless junk.
This idea is not new, but it has become unfashionable in the last 20 years, without good evidence, the author argues. Most of our genome is still junk, and a central argument is that many other species don’t need that much DNA, or have much more without any “good” reason like the organism’s complexity.
I've lost count of how many people have read my book. I think this makes six or maybe seven!
A junk-filled genome
Review of: What’s In Your Genome? 90% of Your Genome Is Junk, by Laurence A. Moran, 2023. University of Toronto Press. 392 pp. ISBN: 9781487508593
Norman A Johnson
Evolution, qpad176, [10.1093/evolut/qpad176]
"The subject matter in this book is not easy. Molecular biologists might well be challenged by the population genetics theory, while the biochemistry details may vex evolutionary biologists. Moran does an excellent job at presenting both of these aspects. I am also glad that he provided a historical perspective, showing that many of the current debates have a long history.
In the Preface, Moran states that he was motivated to write this book in part due to what he views as failures in science communication regarding the nature of the genome. He reminds us about the importance of accuracy in science communication: “No matter how good your style, if the substance of what you are communicating is flawed, then you are not a good science communicator” (xiii). Narratives are useful in communicating science, but when they (or the hype) get in the way of telling the truth, the science, and the science communication suffer."
I know Stephen Meyer and I have discussed his views on creationism many times. Some of the issues he raises are quite interesting and they aren't easy to refute. In this video from 2020, he presents two standard creationist objections to evolution: the Cambrian explosion, and the probability of evolving a gene.1
The Zoonomia project aligned the genome sequences of 240 mammalian species and determined that only 10.7% of the human genome is conserved. This is consistent with the idea that about 90% of our genome is junk.
The April 28, 2023 issue of science contains eleven papers reporting the results of a massive study comparing the genomes of 240 mammalian species. The issue also contains a couple of "Perspectives" that comment on the work.
There are a million potential transcription regulatory sites in the human genome. How many of these function as true regulatory sites?
One of the important questions about the human genome concerns how gene expression is regulated. The main controversy is over the number of functional regulatory sites and how that relates to abundant junk DNA. Here's how one group addresses the problem by looking at the conservation of regulatory sites in mammals. Sequence conservation is best genomics proxy for identifying functional sites.
Andrews, G., Fan, K., Pratt, H.E., Phalke, N., Zoonomia Consortium, Karlsson, E.K., Lindblad-Toh, K., Gazal, S., Moore, J.E. and Weng, Z. (2023) Mammalian evolution of human cis-regulatory elements and transcription factor binding sites. Science 380:eabn7930. [doi: 10.1126/science.abn7930]
Understanding the regulatory landscape of the human genome is a long-standing objective of modern biology. Using the reference-free alignment across 241 mammalian genomes produced by the Zoonomia Consortium, we charted evolutionary trajectories for 0.92 million human candidate cis-regulatory elements (cCREs) and 15.6 million human transcription factor binding sites (TFBSs). We identified 439,461 cCREs and 2,024,062 TFBSs under evolutionary constraint. Genes near constrained elements perform fundamental cellular processes, whereas genes near primate-specific elements are involved in environmental interaction, including odor perception and immune response. About 20% of TFBSs are transposable element–derived and exhibit intricate patterns of gains and losses during primate evolution whereas sequence variants associated with complex traits are enriched in constrained TFBSs. Our annotations illuminate the regulatory functions of the human genome.
The authors introduce the issue by pointing out two different views of functional regulatory sites. First, there's the ENCODE view, which maps the binding sites of 1600 transcription factors and the associated methylation and histone modification patterns. This analysis creates a database of almost one million candidate cis-regulatory elements (cCREs). Second, there's the evolutionary perspective, which looks at conservation of regulatory sites as the prime indicator of function. Only a fraction of candidate sites are conserved. Does this mean that most of the cCREs are not functional?
Andrews et al. set out to identify all of the cCRE's and transcription factor binding sites (TFBSs) that show evidence of conservation using an alignment of 241 mammalian genomes from the Zoonomia database and a program called phyloP.
They began with more than 920,000 human cCREs from the ENCODE Consortium results. Their results indicate that 47.5% of all CREs are highly conserved because they align to almost all of the 240 non-human mammalian genomes. (I have no idea how the phyloP program calculates "conservation.") That means approximately 439,000 sites that are likely to be genuine regulatory sequences covering 4% of the human genome. If there are 25,000 genes then this means that each gene requires about 17 regulatory sequences.
The next step was to examine 15.6 million TFBSs with a median length of 10 bp covering 5.7% of the human genome. They classified 32.5% of these sequences as highly conserved using the mysterious phyloP program. That means about 5.1 million functional transcription factor binding sites, but later on they reduce this to 2 million covering 0.8% of the genome. This is equivalent to an average of 80 per gene.
I don't believe that the authors have identified functional sites. There is no critical analysis of the results or the methodology and no attempt to rationalize the extraordinary claim that every gene requires so many regulatory sites. About 10,000 genes are regular housekeeping genes, such as those encoding the standard metabolic enzymes, and it's difficult to imagine that those genes require such complex regulation.
Most of the genes in eukaryotes contain introns that are removed by splicing during processing of the primary transcript. In some cases the gene produces two different functional RNAs due to differential splicing of the introns. If the product is mRNA then two different versions of the protein can be made as shown in the figure from my book What's in Your Genome? This mechanism is known as alternative splicing.
True alternative splicing is rare—less than 5% of all genes are alternatively spliced.1 However, when you analyze all of the transcripts in a tissue you will invariably detect many transcripts from junk DNA and many low abundance splice variants. Those transcripts and splice variants are due to transcription errors and splicing errors. Splicing errors arise from the presence of weak splice sites that are occasionally recognized by the normal spliceosome or by the splice factors responsible for true alternative splicing.
The latest summary of the number of genes in the human genome gets the number of protein-coding genes correct but their estimate of the number of known non-coding genes is far too high.
In order to have a meaningful discussion about molecular genes, we have to agree on the definition of a molecular gene. I support the following definition (see What Is a Gene?).
Jonathan McLatchie is an Intelligent Design Creationist who now blogs frequently on the Discovery website Evolution News [sic]. His latest post is How NOT to Argue Against Irreducible Complexity where he defends the claim that the human male reproductive apparatus is irreducibly complex and therefore must be a product of intelligent design.
You can read the post yourself to see how ID proponents argue. I want to make another point.
McLatchie thinks that the irreducible complexity argument is very powerful evidence for intelligent design. He writes,
The argument from irreducible complexity against evolution and for design has always held strong intuitive appeal for me, and it has hence become my argument of choice in discussions about the scientific merits of evolution versus design.
Let's look at the logic of the argument from irreducible complexity. Assume that we have identified a structure that's irreducibly complex. There are three possible ways to deal with its origin.
There is a plausible naturalistic explanation for the evolution of the irreducibly complex structure.
There is currently no detailed naturalistic explanation that accounts for the evolution of the irreducibly complex structure.
It is impossible for there to ever be a naturalistic explanation; therefore, god did it.
We know that there are good naturalistic explanations for the evolution of irreducibly complex structures. In fact, McLatchie mentions some of them that refuted his earlier claims. Behe has also backed off some of his claims in light of evidence that irreducibly complex structures can evolve without the help of god(s). This establishes that the mere existence of an irreducibly complex structure is not evidence for intelligent design.
Here's how Behe explains it on page 40 of Darwin's Black Box.
Even if a system is irreducibly complex (and thus cannot have been produced directly), however, one cannot definitively rule out the possibility of an indirect, circuitous route.
In some cases there is currently no good naturalistic explanation for the evolution of an irreducibly complex structure. This could be due to a real difficulty in coming up with a plausible scenario or it could be due to the fact that no scientist has bothered to do the investigation required because they don't care. In either case, the current lack of an explanation is not, in itself, evidence for an intelligent designer.
The third possibility is the one that counts. If you can prove that a naturalistic explanation is impossible then there must be a non-naturalistic explanation such as aliens, or god(s). McLatchie says the the sperm flagellum is irreducibly complex and that he cannot imagine how it could have evolved naturally. According to creationist logic, it follows that some alien, or some god, must have designed the original sperm flagellum.
McLatchie won't tell us when this happened or why the intelligent designer was so interested in sperm, but that's typical of Intelligent Design Creationsts—they require detailed explanations from scientists but not from their fellow creationists.
Evelyn Fox Keller died a few days ago (Sept. 22, 2023). She was a professor of History and Philosopher of Science at the Massachusetts Institute of Technology (Boston, MA, USA). Most of the obituaries praise her for her promotion of women scientists and her critiques of science as a male-dominated discipline. More recently, she turned her attention to molecular biology and genomics and many philosophers (and others) seem to think that she made notable contributions in that area as well.
In his new book "What’s in Your Genome?", Laurence A. Moran shows that all the research done in the decades since that letter was written has confirmed what Crick, Jukes, and other specialists believed all along: much of our DNA is indeed a kind of “junk.”
This is a video of a debate that took place in Kraków, Poland on June 2, 2023. The topic was "Intelligent design in nature—illusion or reality?" (Spoiler alert! - the answer is "illusion.") The participants were Michael Behe and Richard v. Sternberg for the creationists and Michael Ruse and Malgorzata Moczydlowska-Vidal for the science/philosophy side. The video is almost three hours long and I don't recommend watching the whole thing.
Ruse, as usual, is incoherant and more focused on religion and telling Christians how they should behave. The Polish paleontologist didn't do a very good job of addressing the claims of the creationists.1 Michael Behe gave his standard pitch about irreducible complexity and the bacterial flagellum.
The interesting part was Sternberg's defense of intelligent design. I hadn't seen him before although I've been familiar with his writings over the past twenty years. His opening presentation begins at 17:50 and it's worth watching to see how important the junk DNA debate is to the ID crowd.
Sternberg begins by noting that he was skeptical of the arguments put forward by Richard Dawkins in "The Selfish Gene" where Dawkins says that 98% of our DNA is noncoding junk. (Dawkins never said any such thing!) Sternberg says that when he started looking for function in this part of the genome he found that it was replete with function. Then he brings up the ENCODE results and claims that they challenged the concept of a gene (not true). Sternberg says that the new definition of a gene is that it is polyfunctional and "constantly changing in real time." He says,
... how can you have a theory based on an entity that you cannot define and how can you discuss the evolution of something that is kind of this amorphous notion ...
Sternberg seems to think that redefining the gene shows that evolutionary biology is out of touch with reality. He claims that the discovery of the epigenome is futher evidence that there are multiple layers of information that take us far beyond the theory of neo-Darwinism that was crafted in the nineteen teens and the 1920s.
Sternberg reflects the views of many Intelligent Design Creationists who tout the "debunking" of junk DNA as one of their greatest intellectual achievements because they predicted all along that there couldn't be large amounts of junk DNA in our genome because that's incompatible with intelligent design. What's different in the case of Richard Sternberg is that the discovery of function in most of our genome is what led him to the position that design is the best explanation.
I find it strange that Intelligent Design Creationists are relying so heavily on the so-called debunking of junk DNA, especially since in Sternberg's case he is well aware of the fact that some prominent scientists have criticized ENCODE. It's a risky strategy to put so much emphasis on a result that may turn out to be wrong. If our genome is mostly junk DNA (it is!) then the major part of their argument for design falls apart.
From reading the ID literature, it seems that they are supremely confident that most of our genome will turn out to be full of function. It will be interesting to see how they respond when the scientific community concludes that 90% of our genome is junk. From my perspective, they are digging themselves into a deep hole that will be very difficult to climb out of. Maybe it's time to stop digging?
Sternberg made one quip that's worth highlighting. At about 1:46:20 he talks about a saying that he learned in the air force; you don't receive flak unless you're over a significant target. That's cute. He uses it to explain why intelligent design is coming under such heavy attack. He is, of course, correct. When you drop bombs on people you can expect them to get upset. When you attack some of the most important concepts in science you can expect some pushback. That doesn't mean your bombing is justified. If it were justified then scientists would embrace your criticisms instead of shooting them down.
Sternberg scores big at 2:51:11 when he asks, "Can there be Darwinian evolution ... or any evolution in general, without natural selection?" The correct answer is yes. Malgorzata Moczydlowska-Vidal says no and so does Michael Ruse. Ruse then goes on to explain why he dismisses random genetic drift. Sternberg then explains neutral evolution and Michael Lynch's drift-barrier hypothesis and why some biologists use them to explain some of the ID challenges. Sternberg (and Behe) appear to know more about evolution than their opponents.
1. She concentrated on presenting evidence for the history of life but both Behe and Sternberg accept common descent and the correct age of the Earth.
This is a very slick video from the Discovery Institute. It shows you what we are up against. Anyone who thinks they can easily refute the claims in this video hasn't tried.
Intelligent Design Creationists know exactly what they are doing and they are very good at it. There are so many thing wrong with this video that it would take a book to correct them all and, furthermore, you would have to convince people that their entire worldview has to change in order to really understand biology. I bet there are many scientists who couldn't deal with a video like this and that's a problem.
Real biology is messy and sloppy. Things do not look as neatly designed as Richard Dawkins and the creationists would have you believe. I've tried to present the case for a sloppy worldview in my latest book.