Here's the problem. I can only access the cheap versions of AI such as ChatGPT and Scite Assistant but I can also see the results of Google's Generative AI whenever I do a Google search. Chris has access to more sophisticated versions so that's what he might be referring to when he says they operate at the Ph.D. level of intelligence.
The ones I see are not very intelligent. They are mostly sophisticated aggregators that comb the internet for information published by humans and then summarize that information, with an emphasis on how frequently a given human opinion appears in their databases. Since there's a lot of misinformation out there, this often means that the AI algorithm just propagates that misinformation, especially if it's a very popular misconception (see question #8, below).
The greatest danger of AI right now is not that it is smarter than humans—it's that it's as stupid as the humans that it relies on to give it the correct answers.
There have been some attempts to improve the current crop of AI algorithms by having them put more emphasis on reliable data. For example, some of them seem to weigh Wikipedia opinions more than typical press releases and some of them mistakenly believe that opinions published in the scientific literature are the gold standard of accuracy. This attempt is laudable but it hardly counts as true artificial intelligence since the results are being guided by humans who can independently evaluate the accuracy of the information that the program is aggregating.
I've seen articles that extol the virtues of AI by claiming that an AI algorithm can now pass all undergraduate exams in the sciences and that AI-generated essays are getting high grades in university courses. To me this says more about the quality of essays and exams than it does about the power of AI. I've yet to see an AI algorithm that could have passed an exam in one of my undergraduate biochemistry or molecular biology courses. And the idea that an AI program could have written a decent essay in one of my courses is ludicrous because all of my essay topics were based on controversial and contentious issues in the field and I've never seen an AI app that can deal effectively with controversy.
I've published lots of examples where the common AI programs botch the definition of junk DNA and fail to report the current views of many knowledgeable experts. Let's look at another example produced by Google's Generative AI when you search for "gene." Here's the opening statement that you see at the top of the page.
A gene is the fundamental unit of heredity, a segment of DNA that contains the instructions for building a specific protein or a functional RNA molecule. These instructions are passed down from parents to offspring, determining traits and characteristics.
This definition is misleading because it doesn't distinguish between a gene as a hereditary object and the molecular gene. With reference to the molecular gene, it doesn't tell us whether to include regions such as UTRs in protein-coding genes and introns in all genes. Contrast this with the Wikipedia definition, written by human beings.
In biology, the word gene has two meanings. The Mendelian gene is a basic unit of heredity. The molecular gene is a sequence of nucleotides in DNA that is transcribed to produce a functional RNA. There are two types of molecular genes: protein-coding genes and non-coding genes.
Note that the concise definition is a DNA sequence that is transcribed.1
It's true that if you scroll down below the opening statement on Google you'll encounter more information, but it's just a hodgepodge of bits of information retrieved from the internet. There's no coherence of thought and no attempt at coming up with a definition that accounts for the main areas of contention, such as whether regulatory elements are part of the gene or whether introns count. Google's Generative AI doesn't look very intelligent to me.

How will we know when AI reaches the intelligence level that's expected of a Ph.D. student? Obviously it will depend on the field of study, since the level of intelligence expected in some fields might be quite different from that expected in other fields. It will also depend on the university, since some universities in my field clearly don't have the same standards that I'm used to.
I can only evaluate the result for areas where I think I'm knowledgeable enough to serve on Ph.D. exam committees so I made a list of thirteen questions that I would expect a Ph.D. student to answer in an intelligent manner.2
(I note that in some cases, the current AI answers are quite a bit better than the answers from a few years ago (e.g. #1, #6 and #10). I suspect this is because programmers are restricting the sources that the AI program is using in order to come up with better answers from human experts.)
If you think you've come across an AI algorithm that can perform at the Ph.D. level on most of these questions then please post the answers here. I'm especially interested in the more expensive programs that I can't afford.
- What is a gene?
- How much junk DNA is there in the human genome?
- What percentage of human genes can produce multiple functional products by alternative splicing?
- What is the most common mechanism of evolution at the molecular level?
- What is the oldest biochemical pathway?
- Many biochemical reactions have a positive standard Gibbs free energy change. How do those reactions proceed in a typical cell?
- Why is most of the human genome transcribed?
- What's the best definition of photosynthesis?
- What's the best current explanation of the origin of life?
- Do you think the Three Domain Hypothesis is a valid description of the tree of life?
- Why is there an approximate molecular clock?
- Why can humans synthesize proline but not valine?
- What is the Central Dogma of Molecular Biology?
Image Credit: The image was created by ChatGPT in response to the request, "draw an image of the role of AI in biology."
1. Don't quibble about RNA genes. That's covered later on in the Wikipedia article.
2. I'm not demanding that the answer is the correct answer (where "correct" is the answer that I would give!). What I'm asking is that the answer address the important issues and present the controversies in a fair and rational manner. It should then draw a conclusion that makes logical sense. Students would be given the question in advance so they could demonstrate their ability to do research.
35 comments:
Whatever the percentage, all alternative transcripts are functional. Even noisy transcripts are part of the regulation process known as regulated unproductive splicing (RUS).
I asked Copilot (Microsoft, free) about proline and valine (your Q12). It briefly described the synthetic pathways and noted that humans lack the acetolactate synthase and isomeroreductase enzymes. So (if accurate) that answers one aspect of "why?"
Copilot then introduced essential and non-essential amino acids and suggested that the ready availability of valine in a protein-rich diet meant there was no pressure on humans to maintain the synthetic pathway for valine. That's a different aspect of "why?"
I'm no biochemist, and that's obviously a short undergraduate level answer, but I'm happy with that answer … until you tell me why I shouldn't be!
Also, AI chatbots target ordinary questions; the requirements of examination questions are different.
I tried out Elon Musk's xAI Grok, which I find gives the most long-winded answers of the AIs that don't require a subscription!
I asked the following question:
Using expert biochemical knowledge, address the important issues and present the controversies in a fair and rational manner and then draw a conclusion that makes logical sense to answer the question: What percentage of human genes can produce multiple functional products by alternative splicing?
The answer is a bit repetitive and comes out with a very high estimate (70-80%) on the key issue, namely what proportion of the alternative splices are functional, though it does try to address the issue of functionality.
The answer is too long to post as a single comment so I will place it in two comments:
Part 1
Grok answered:
Addressing the Question: What Percentage of Human Genes Can Produce Multiple Functional Products by Alternative Splicing?
Alternative splicing is a fundamental biochemical process in eukaryotic gene expression, allowing a single gene to produce multiple mRNA isoforms, which can translate into distinct protein products with diverse functions. This mechanism significantly expands the proteomic diversity without requiring an increase in the number of genes. To estimate the percentage of human genes capable of producing multiple functional products through alternative splicing, we need to consider biochemical mechanisms, genomic data, and ongoing controversies in the field. Below, I will address the key issues, present relevant controversies, and provide a reasoned conclusion.
Key Issues in Alternative Splicing and Functional Products
Definition of Alternative Splicing and Functional Products:
Alternative splicing involves the selective inclusion or exclusion of exons or introns in the final mRNA transcript, resulting in isoforms that may differ in coding sequences, untranslated regions (UTRs), or regulatory elements.
A "functional product" typically refers to a protein isoform with a distinct biological role, such as altered enzymatic activity, binding affinity, localization, or stability. However, some isoforms may affect mRNA stability or translation efficiency without producing a functional protein, complicating the definition.
Scale of Alternative Splicing in Humans:
The human genome contains approximately 20,000–25,000 protein-coding genes, based on current annotations from databases like Ensembl and GENCODE (as of 2025).
High-throughput sequencing technologies (e.g., RNA-seq) have revealed that most human genes undergo some form of alternative splicing, but not all isoforms are functional.
Detection and Quantification:
RNA-seq and transcriptomic studies provide evidence of splicing events, but detecting low-abundance isoforms or tissue-specific variants remains challenging.
Bioinformatic pipelines (e.g., STAR, Kallisto) and databases (e.g., Ensembl, RefSeq) annotate splice variants, but discrepancies in annotation criteria can affect estimates.
Functional Relevance:
Not all splice variants produce functional proteins. Some isoforms may be non-coding, degraded via nonsense-mediated decay (NMD), or produce truncated proteins with no biological role.
Experimental validation (e.g., proteomics, functional assays) is required to confirm functionality, but such studies lag behind transcriptomic data due to technical limitations.
Controversies in Estimating the Extent of Alternative Splicing
Estimates of Genes Undergoing Alternative Splicing:
Early microarray-based studies suggested that 40–60% of human genes undergo alternative splicing. However, RNA-seq studies have increased estimates to 90–95% or higher, as reported in landmark studies (e.g., Pan et al., 2008; Wang et al., 2008).
Controversy arises over whether all detected isoforms are biologically relevant. Some researchers argue that many splice variants are "transcriptional noise" with no functional role, while others contend that even low-abundance isoforms may have context-specific functions (e.g., in development or disease).
Part 2:
Defining "Functional" Products:
A major debate centers on what constitutes a "functional" product. Some isoforms may subtly modulate protein function (e.g., altering a single domain), while others may be non-functional or deleterious. For example, alternative splicing in the TP53 gene can produce isoforms with opposing roles in cancer progression.
The lack of comprehensive proteomic data makes it difficult to confirm whether all mRNA isoforms translate into functional proteins. Mass spectrometry studies often detect fewer isoforms than predicted by transcriptomics, suggesting a gap between transcription and translation.
Tissue-Specific and Context-Dependent Splicing:
Alternative splicing is highly tissue-specific and regulated by developmental stage, environmental signals, or disease states. This variability complicates global estimates of genes producing multiple functional products.
For example, neuronal genes (e.g., Dscam in Drosophila, though less extreme in humans) can produce thousands of isoforms, but only a subset may be functional in specific contexts.
Bioinformatics and Annotation Challenges:
Different databases (Ensembl, RefSeq, GENCODE) report varying numbers of splice variants due to differences in annotation pipelines and criteria for defining a "gene" or "transcript."
Overestimation of functional isoforms may occur if bioinformatic tools include low-abundance or artefactual transcripts, while underestimation may result from incomplete sequencing depth or tissue sampling.
Current Evidence and Estimates
Based on recent genomic and transcriptomic studies (up to 2025), the following points provide a foundation for estimating the percentage of human genes producing multiple functional products via alternative splicing:
High Prevalence of Splicing: Studies using RNA-seq (e.g., ENCODE, GTEx) estimate that 90–95% of human multi-exon genes undergo alternative splicing, producing multiple mRNA isoforms. This is supported by data from Wang et al. (2008) and subsequent analyses in GENCODE (release 44, 2023), which annotate multiple transcripts for most protein-coding genes.
Functional Isoforms: While most genes produce multiple mRNA isoforms, not all translate into functional proteins. Proteomic studies (e.g., Kim et al., 2014; Wilhelm et al., 2014) suggest that 50–70% of multi-exon genes produce at least two distinct protein isoforms detectable by mass spectrometry, though this is likely an underestimate due to limitations in proteomic sensitivity.
Conservative Estimates: Accounting for tissue-specificity and functional validation, a reasonable estimate is that 60–80% of human genes produce multiple functional protein products through alternative splicing. This range aligns with reviews (e.g., Kelemen et al., 2013; Baralle & Giudice, 2017) that balance transcriptomic and proteomic evidence.
Conclusion
Based on the available evidence, approximately 70–80% of human genes are capable of producing multiple functional products through alternative splicing. This estimate reflects the high prevalence of alternative splicing (90–95% of multi-exon genes) tempered by the fact that not all isoforms are functional or translated into proteins. The range accounts for ongoing controversies regarding the definition of "functional" and limitations in proteomic validation. Advances in single-cell RNA-seq, long-read sequencing (e.g., PacBio, Oxford Nanopore), and high-sensitivity proteomics will likely refine this estimate in the future, potentially increasing it as more tissue-specific and low-abundance isoforms are characterized.
This conclusion balances the optimism of transcriptomic data with the caution required by proteomic and functional validation, providing a rational and evidence-based answer to the question.
Oh, boy, that is not a smart answer.
The first part is only true if you define "functional" as "has some effect on the cell (measurable or not)". However, only a small percentage of conserved alternative transcripts produce protein isoforms that are useful for the cell.
The vast majority of splice events are not conserved and the proteins they generate are little more than biological noise. There are a lot of these alternative transcripts (and proteins) because neither translation initiation nor alternative splicing is particularly efficient.
Regulated unproductive splicing does not apply to all transcript noise, just a small proportion of conserved NMD transcripts.
To clarify, my reply was to the first comment by anonymous.
Sequence conservation is not a good indication of transcript functionality, as there are many examples of non-coding RNAs that show practically no conservation, like Xist RNA. And also, the spliceosome is an efficient and flexible machine; how dare you say it's inefficient?
@mehrshad.
Last time I looked, non-coding RNA wasn't (by definition) coding.
Will I get struck down for pointing out the obvious inefficiency then?
@mehrshad
The absolute numbers are irrelevant (it doesn't matter that there are "many" functional but poorly conserved transcripts), it's the degree of correlation that matters. If among 10 million poorly conserved transcripts 99.9% are nonfunctional, it doesn't matter that the remaining 0.1% are functional (0.1% of 10 million would still be 10 000, aka "many"). And if 99.9% of highly conserved transcripts are functional, then sequence conservation is in fact a perfectly good INDICATION of transcript functionality.
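To make that arithmetic concrete, here is a minimal sketch using the hypothetical counts from the argument above (the 20,000 figure for highly conserved transcripts is an illustrative assumption, not a measured number):

```python
# Illustrative numbers only, taken from the argument above (not real data).
poorly_conserved = 10_000_000                                  # hypothetical pool of poorly conserved transcripts
functional_poorly_conserved = int(poorly_conserved * 0.001)    # 0.1% functional = 10,000 ("many")

highly_conserved = 20_000                                      # hypothetical pool of highly conserved transcripts
functional_highly_conserved = int(highly_conserved * 0.999)    # 99.9% functional = 19,980

# What matters for using conservation as an indicator is the conditional probability,
# not the absolute number of functional-but-unconserved transcripts.
print(f"P(functional | conserved)     = {functional_highly_conserved / highly_conserved:.3f}")
print(f"P(functional | not conserved) = {functional_poorly_conserved / poorly_conserved:.3f}")
print(f"Functional but poorly conserved transcripts: {functional_poorly_conserved:,}")
```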
By the way, this answer:
"Proteomic studies (e.g., Kim et al., 2014; Wilhelm et al., 2014) suggest that 50–70% of multi-exon genes produce at least two distinct protein isoforms detectable by mass spectrometry"
... is an interesting example.
ChatGPT initially invented its references, but both of these are real and large-scale studies. Unfortunately, neither detected multiple splice isoforms for anywhere close to 50-70% of multi-exon genes.
Kim et al. claimed 2,450 genes with isoform-specific peptides and Wilhelm et al. didn't give a number, but the figures they show mean that it could not have been more than 1,300 and was probably much lower.
Which means that ChatGPT has invented the 50-70% figure.
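As a rough sanity check, assuming something like 20,000 multi-exon protein-coding genes (the ballpark quoted in the Grok answer above), the figures cited for those two papers work out to far less than 50-70%:

```python
# Rough sanity check of the 50-70% claim, using the figures cited in this thread.
# The gene count is the ballpark quoted in the Grok answer above; treat it as an assumption.
multi_exon_genes = 20_000

kim_2014_genes = 2_450       # genes with isoform-specific peptides, as cited above
wilhelm_2014_max = 1_300     # upper bound inferred above from their figures

print(f"Kim et al. (2014):      {kim_2014_genes / multi_exon_genes:.1%} of genes")     # ~12%
print(f"Wilhelm et al. (2014): <={wilhelm_2014_max / multi_exon_genes:.1%} of genes")   # <= ~7%
# Either way, nowhere near the 50-70% asserted in the AI answer.
```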
@Zammo See: It's trivial to note that animals have lost some of the genes required to synthesize valine. That's just a fact that can be found in any undergraduate biochemistry textbook.
Why have animals lost these genes? If it's just because there's plenty of valine in the diet of mammals then why haven't they also lost the ability to synthesize the other amino acids because they are also abundant in the diet?
A good biochemistry student could address this question by looking at all the data on amino acid pathways and by understanding the role of chance and contingency in evolution. I don't see any AI program that's capable of that kind of intelligent response.
You said, "AI chatbots target ordinary questions, the requirements of examination questions are different." Are you suggesting that a true artificial intelligence algorithm can't come up with the correct answer to examination questions that intelligent human students can answer?
I put the questions to "o3", the most advanced agent available via ChatGPT, in the context of its Deep Research feature, which conducts extensive web searches for approximately fifteen minutes.
https://chatgpt.com/share/6831a0b2-3244-8001-a7b2-3a111007a87e
The answers in the link are an interesting mix! As a layman who's tried to educate myself via reading Sandwalk for many years, I find that some answers aren't bad, but others are poorer quality. I'd love to see a detailed "Fisking" of the answers... or just a mark and a comment for each answer from an expert like Prof. Moran. But I suspect it would be a lot of work to get right into the details.
One area of entertainment with ChatGPT is to ask it to render a labelled picture of something. It can take a while, but the results are absolutely bizarre. One of my favorites is labelled pictures of eukaryotic cells. To quickly see examples, one can do a Google search for such images. Actually, they would make lovely posters.
@Mitchell: Thanks for those answers. Most of them are very similar to the ones that the standard version of ChatGPT would give but some are better. I wonder what sources the “Deep Research” version is looking at?
The general impression I get is that the best answers are similar to what you would find in the best undergraduate textbooks. I see very little evidence of significant intelligence or critical thinking but in some cases the answer has avoided the most common misconceptions and that’s an improvement.
I’ll give you a grade for each question assuming that the answer was submitted by a graduate student in biochemistry/molecular biology. Most of the answers are somewhat incoherent and many of them include contradictory statements. I expect better writing from graduate students but I’ll try and ignore that in order to focus on the information in the answer.
@Mitchell: Question #1 (What is a gene?) F
The answer doesn’t distinguish between the Mendelian definition of a gene and the molecular gene. It doesn’t define the boundaries of a molecular gene - the better definition is that a gene is a transcribed region.
In the beginning and at the end it says that a gene includes regulatory sequences but in the middle it says that distant regulatory elements aren’t included. There’s no justification for including (or excluding) regulatory sequences from the definition.
The main problem is that the answer devotes too much attention to alternative splicing and puts far too much emphasis on Mark Gerstein’s 2007 paper. I don’t think there are very many knowledgeable scientists who quote that paper. We usually don’t think that ENCODE researchers are reliable sources.
Gerstein, M.B., Bruce, C., Rozowsky, J.S., Zheng, D., Du, J., Korbel, J.O., Emanuelsson, O., Zhang, Z.D., Weissman, S. and Snyder, M. (2007) What is a gene, post-ENCODE? History and updated definition. Genome Research 17:669-681. https://doi.org/10.1101/gr.6339607
@Mitchell: Question #2 (How much junk DNA?) F
No graduate student is going to get a passing grade in my course if they don’t know the difference between junk DNA and non-coding DNA and the true history of the subject. Statements such as “It’s important to note that “junk” is not a precise term – some non-coding DNA is critical (e.g. regulatory enhancers, noncoding RNA genes), and some is likely neutral or nearly neutral sequence that persists without function.” are not acceptable. I expect graduate students to come up with a definition of junk DNA that they can defend.
There’s no critical discussion of the evidence for and against the idea that most of our genome is junk. The good news is that this version of ChatGPT acknowledges that most of our genome is probably junk.
Also, Quanta Magazine is not a reliable source.
@Mitchell: Question #3 (How much alternative splicing?) F
The answer is wrong. I’d give a barely passing grade to a student who intelligently discusses the possibility that most splice variants are junk but gives reasons for tentatively rejecting that idea. But you do not get a pass if you don’t even realize that there’s a controversy.
@Mitchell: Question #4 (Most common mechanism of evolution?) D
The answer is correct although it should have discussed whether it applies to all organisms or just those that have a lot of junk DNA. Also, the answer should have made clear what definition of evolution it was using.
It doesn’t get more than a “D” because the answer is rambling and because it clearly doesn’t understand the difference between mutation and random genetic drift and which one counts as a mechanism of evolution. It gets it right in one part of the answer but not another.
@Mitchell: Question #5 (Oldest biochemical pathway?) ?
I don’t know how to grade this answer. It makes sense that a carbon fixing pathway such as the Wood–Ljungdahl pathway (the acetyl-CoA pathway) is one of the oldest pathways. I’d give high marks to any student who named this pathway and defended the logic.
However, the ChatGPT response also talks about glycolysis as one of the oldest pathways. This is the standard misconception in the scientific literature and biochemistry textbooks but there's a problem. You can't have a pathway for degrading glucose until you have a pathway for making glucose (gluconeogenesis). The ChatGPT answer seems to appreciate the problem but doesn't follow through with the logic.
Also, the ChatGPT answer seems to think that glycolysis is some sort of magical net energy producing pathway in bacteria and the earliest forms of life. This is a misconception based on the unfortunate emphasis on animal metabolism (human) in undergraduate biochemistry courses.
@Mitchell: Question #6 (Free energy change.) B-
This is a pretty good answer. ChatGPT seems to understand the difference between the standard free energy change and the actual free energy change inside the cell. It’s a little confused about “coupling” and it adds extraneous information about enzymes - that’s mainly why it doesn’t get an “A.” Also, I’m not happy about the way ChatGPT refers to exergonic and endergonic reactions. How do you define these terms inside a cell where reactions can go in both directions?
The sources are chemistry texts and Wikipedia. I wonder how this version of ChatGPT knows which sources to rely on for correct information?
(Students whose main area of study is metabolism would not get as high a grade.)
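For anyone who wants the distinction between the standard and the actual free energy change made explicit, here is a minimal numerical sketch with made-up values (illustrative only, not measured cellular concentrations):

```python
import math

# Actual free energy change: deltaG = deltaG0' + R*T*ln(Q)
# All values below are hypothetical, chosen only to illustrate the principle.
R = 8.314                  # gas constant, J/(mol*K)
T = 310.0                  # ~37 degrees C, in kelvin
delta_G0_prime = 7_000.0   # a positive *standard* free energy change, J/mol

# If the cell keeps the mass-action ratio Q (products over reactants) low,
# the R*T*ln(Q) term is negative and can outweigh the positive standard term.
Q = 0.01
delta_G = delta_G0_prime + R * T * math.log(Q)

print(f"deltaG0' = {delta_G0_prime / 1000:+.1f} kJ/mol")
print(f"deltaG   = {delta_G / 1000:+.1f} kJ/mol  (negative, so the reaction proceeds as written)")
```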
@Mitchell: Question #7 (Why is most of the human genome transcribed?) F
The most important part of the answer is that the transcription of known genes accounts for more than 40% of the human genome if you define a gene as a DNA sequence that’s transcribed. Most of this is introns but it’s important to note this point.
The question then becomes what about the remaining 40% or so of the human genome that's transcribed? ChatGPT says that some unspecified amount of this is due to "tens of thousands" of non-coding genes but that's not true. The second reason for pervasive transcription is that regulatory sequences are bidirectionally transcribed, but no amount is given. ChatGPT also notes that there is a certain (unspecified) amount of transcriptional noise.
The answer is a mix of bits of information gleaned from the scientific literature with no attempt to put it together into a coherent answer. It also gets the history wrong - we’ve known since the early 1970s that most of the genome is transcribed.
@Mitchell: Question #8 (What’s the best definition of photosynthesis?) F
Carbon fixation (e.g. the Calvin cycle) is not a necessary part of photosynthesis. Water is not the only electron donor, so oxygen production is not a necessary part of photosynthesis. The best definition of photosynthesis has to apply to all photosynthetic organisms.
National Geographic and the Khan Academy are not reliable sources of information.
@Mitchell: Question #9 (What’s the best current explanation of the origin of life?) F
ChatGPT can’t explain the differences between RNA world, primordial soup, hydrothermal vents, Miller-Urey, meteorites, and warm little ponds in order to come up with a coherent and critical examination of the leading explanations.
@Larry Very interesting to see your assessments. I hope to make a follow-up comment on AI capabilities and cognitive styles, a few days from now.
Like Mitchell I am finding your grades fascinating. I hope you continue for all 13 questions. Thank you.
@Mitchell: Question #10 (Is the Three Domain Hypothesis valid?) B
ChatGPT recognizes that eukaryotes can’t be a separate domain of life because the first eukaryote arose from WITHIN the Archaea branch of the tree. This fact has been documented in the scientific literature for about 25 years but the idea of Three Domains is still widespread in textbooks and in popular science literature. I’m pleased that the AI program is able to give more credence to reliable sources.
However, that’s only part of the story so I’m being a bit generous by awarding a B grade. The other half of the story is that half of eukaryotic genomes are derived from the alphaproteobacteria lineage that’s within the Bacteria Domain. Eukaryotes arose from a fusion of and archaeal cell and a bacterial cell. The bacterial ancestor survives as mitochondria but many of its genes were transferred to the nucleus.
The fact that eukaryotes have two different ancestors is another nail in the coffin for the Three Domain Hypothesis. This is mentioned in the ChatGPT answer but the AI program clearly doesn’t recognize its significance.
@Mitchell: Question #11 (why is there an appropriate molecular clock?) B
This is a pretty good explanation but it could be much better written.
I'm assuming that "appropriate" is a typo for "approximate", but did the typo happen before or after submitting the question to the AI?
@Mitchell: Question #12 (Why can humans synthesize proline but not valine?) ?
This is one of those questions that could be answered in at least two ways. I try to avoid such questions but this time I messed up.
The simple answer is that humans can’t make valine because they are missing some of the enzymes in the valine biosynthesis pathway but they have all the enzymes required to make proline. ChatGPT answers this part correctly.
My intent was to go beyond that simple answer and ask students to explain why humans have lost the ability to synthesize some amino acids and not others. The ChatGPT answer attempts to answer that part of the question but doesn't really come to grips with the problem. The answer states correctly that the branched-chain biosynthesis pathway is more complex than some others but then goes on to state that animals might have found it to be "metabolically cheaper" to obtain valine from the diet rather than synthesize it directly. That doesn't make sense because the same logic applies to all the other amino acid biosynthetic pathways.
I’d give a B+ grade to student lawyers who insist that the simple answer must be counted as correct. I’d give a C- grade to the answer that I expected. There is no “correct” answer to the deeper “why” question - it’s a way of evaluating the ability to think critically.
@Mitchell: Question #13 (What is the Central Dogma of Molecular Biology?) F
You can’t get a passing grade on this question unless you correctly define the Central Dogma in the way Francis Crick did in his 1958 and 1970 papers AND explain why it differs from the common DNA → RNA → protein definition.
There’s enough information out there to alert ChatGPT to the problem of conflicting definitions and it’s interesting to note that it quotes a correct Crick version. However, it’s clear that the current algorithm isn’t capable of handling the conflict and mostly reverts to defending the popular version.
It’s interesting that ChatGPT extracts a quote from Wikipedia as “often stated as ‘DNA makes RNA, and RNA makes protein’” when the intent of that phrase was to illustrate the INCORRECT version of the Central Dogma. (I know this because I wrote much of the Wikipedia article.) This tells us that the current version of ChatGPT can find information and can, to some extent, recognize which sources are more reliable but it still has trouble dealing with controversy and nuance. It’s still a long way from exhibiting a Ph.D. level of intelligence.
Wikipedia: Central dogma of molecular biology
https://en.wikipedia.org/wiki/Central_dogma_of_molecular_biology
Larry - Sorry it's taken me so long to get back to this.
I just saw a comment by a mathematician on GPT-o3's performance when discussing mathematics:
https://thezvi.substack.com/p/ai-120-while-o3-turned-pro?open=false#%C2%A7on-your-marks
"The model often resembles a keen graduate student who has read extensively and can name-drop many results and authors. While this can seem impressive initially, it quickly becomes apparent to experts that the student hasn’t deeply internalized the material and is mostly regurgitating what they’ve read. The model exhibits a similar behavior—good at recognizing relevant material, but unable to extend or apply it in a novel way."
Does that sound like what you saw in these answers?
@Mitchell
Yes, that pretty much sums up the problem with current AI programs. They can retrieve information but they aren't "intelligent" in the sense that I use the word.
I think one way to get at the difference is that current AIs don't reflect on what they "learn". They "memorize" all kinds of things, and they even develop templates of generalization which allow them to produce novelty, but all of that happens unreflectively. In these newer AIs that are called reasoning models, a kind of reflection happens during the "chain of thought" process, in that they plan what to say and weigh up whether it's a good plan before they say it. So reflection happens when they answer questions, but not when they are trained - and perhaps this just isn't enough reflection to equal the performance of someone who truly learns and thinks about the material as they learn.
If they were reflecting throughout the training process, they might actually develop individual opinions, which could be a good thing or a bad thing... There's a new AI architecture from China called Absolute Zero in which it's supposed to guide its own training (the "zero" refers to zero presuppositions); that *might* produce something closer to the human capacity to reflect as you're learning, and not just when you're applying that knowledge.