Sandwalk: Genes

Showing posts with label Genes. Show all posts

Saturday, July 19, 2025

The genes for all seven of Mendel's traits have now been identified

The seven traits that Gregor Mendel worked with were: seed shape (R/r), cotyledon color (I/i), seed and flower color (A/a), pod shape (V/v), pod color (Gp/gp), flower position (Fa/fa), and stem length (Le/le). The last trait is also known as Tall (T) and short (t).

The genes for four of these traits (seed shape, cotyledon color, flower color, and stem length) were identified and characterized many years ago. The genes for the remaining three traits have now been identified. The results were published in the June 25, 2025 issue of Nature (Feng et al., 2025).

Kostas Kampourakis cautions us against teaching Mendelian genetics

A few weeks ago I put up a post on Were you lied to in your genetics class?. At the time I thought it was just a fringe view being expressed by a graduate student who didn't understand genetics but now I realize that it's much more important than that.

Kostas Kampourakis is a respected scientist who is being promoted by the National Center for Science Education (NCSE) as an excellent communicator of evolution. Here's a short video (below) where he explains why teachers are making a mistake by saying that Mendel is the father of genetics and that simple Mendelian genetics can explain complex traits.

I'm struggling to understand his point. Here's what I think he means.

Kampourakis uses the example of eye color in Drosophila. He agrees that segregation of the allele responsible for eye color may follow the Mendelian rules¹ but it's wrong to assume that there's a single gene responsible for eye color in fruit flies. He thinks that the goal is to understand the complexities of development and standard Mendelian genetics gives a completely distorted view of that subject, assuming, of course, that teachers can't separate genetics from understanding development.

Part of the problem is that we use the word "trait" differently. Take Mendel's example of pea color as another example. [Identity of the Product of Mendel's Green Cotyledon Gene (Update)] What Mendel was studying was the segregation of alleles in a gene called sgr (stay-green). It codes for an enzyme involved in the degradation of chlorophyll during senescence. When the enzyme is defective, chlorophyll isn't degraded and one of the visible phenotypes is that peas stay green instead of turning yellow.

I believe that the fundamental trait is the lack of an enzyme for degrading chlorophyll and this is what I would teach my students. I would also show them that the phenotype can be easily explained once you understand the biochemistry. It shows you that the connection between the fundamental trait and the visible phenotype can be mysterious so you should be careful about jumping to conclusions.

I think that Kampourakis sees this differently. He thinks that the example of green vs yellow peas is used to teach students that the color of peas is completely determined by a single gene. He thinks that the "trait" is the develpment of seed color in peas.

Kampourakis believes that "people looking for explanations and whatever happens to them in terms of disease and their own features, they find hard to reconcile the simplistic model that they have been taught with the realities of life." I think what he means is that students are being taught that single genes will always determine complex characteristics. He attributes that to the teaching of Mendelian genetics.

If he is correct, then that kind of teaching has to stop but I don't think it's the fault of Mendelian genetics. Mendelian genetics—indeed the entire field of genetics sensu stricto—is about the segregation of alleles. It is not about development even though we have come to learn a lot about development through genetics and the phenotype of mutants. I think that genetics and development are separate topics.

Kampourakis disagrees. He says, "we need, I think, to teach genetics from a developmental perspective. We need to show that genes do not determine traits but they are implicated in development." I believe this perspective comes from a more fundamental bias that distinguishes his worldview from mine. I tend to see genetics as a subject that covers all of biology and that includes all species such as bacteria, viruses, and single-cell eukaryotes. He tends to see things from a human perspective, which is much more complex than dealing with simple organisms. I think we should concentrate on teaching students about simple well-understood model organisms and then move on to explaining how this applies to more complex organisms. Kampourakis seems to be implying that we should jump right into reaching high school students about the most challenging issues in biology.

1. Actually, the common allele for white eye in Drosophila (see image) is X-linked so it doesn't follow the standard rules for Mendelian segregation!

Friday, October 11, 2024

Philip Ball says RNA may rule our genome

Philip Ball is on a roll. He has published a new book plus several articles in popular magazines and he has appeared in a bunch of podcasts and YouTube videos. The message is all the same, he claims that it's time for a revolution in biology.

Ball's ideas are complicated and I won't go into all of them in this article. Instead, I want to focus on one of his more scientific claims; namely, the claim that genomic data has overthrown the fundamental principles of molecular biology. Let's look at his recent (May 14, 2024) article in Scientific American: Revolutionary Genetics Research Shows RNA May Rule Our Genome.¹

The subtile of the article is "Scientists have recently discovered thousands of active RNA molecules that can control the human body" and that's the issue that I want to discuss here.

Science misinformation is being spread in the lecture halls of top universities

Should universities remove online courses that contain incorrect or misleading information?

There are lots of scientific controversies where different scientists have conflicting views. Eventually these controversies will be solved by normal scientific means involving evidence and logic but for the time being there isn't enough data to settle a genuine scientific controversy. Many of us are interested in these controversies and some of us have chosen to invest time and effort into defending one side or the other.

But there's a dark side of science that infects these debates—false or misleading information used to support one side of a legitimate controversy. To give just one example, I'm frustrated at the constant reference to junk DNA being defined as non-coding DNA. Many scientists believe that this was the way junk DNA was defined by its earliest proponents and then they go on to say that the recent discovery of functional non-coding DNA refutes junk.

I don't know where this idea came from because there's nothing in the scientific literature from 50 years ago to support such a ridiculous claim. It must be coming from somewhere since the idea is so widespread.

Where does misinformation come from and how is it spread?

Philip Ball's new book: "How Life Works"

Philip Ball has just published a new book "How Life Works." The subtitle is "A User’s Guide to the New Biology" and that should tell you all you need to know. This is going to be a book about how human genomics has changed everything.

How many genes in the human genome (2023)?

The latest summary of the number of genes in the human genome gets the number of protein-coding genes correct but their estimate of the number of known non-coding genes is far too high.

In order to have a meaningful discussion about molecular genes, we have to agree on the definition of a molecular gene. I support the following definition (see What Is a Gene?).

Definition of a gene (again)

The correct definition of a molecular gene isn't difficult but getting it recognized and accepted is a different story.

When writing my book on junk DNA I realized that there was an issue with genes. The average scientist, and consequently the average science writer, has a very confused picture of genes and the proper way to define them. The issue shouldn't be confusing for Sandwalk readers since we've covered that ground many times in the past. I think the best working definition of a gene is, "A gene is a DNA sequence that is transcribed to produce a functional product" [What Is a Gene?]

David Allis (1951 - 2023) and the "histone code"

C. David Allis died on January 8, 2023. You can read about his history of awards and accomplishments in the Nature obituary with the provocative subtitle Biologist who revolutionized the chromatin and gene-expression field. This refers to his work on histone acetyltransferases (HATs) and his ideas about the histone code.

The key paper on the histone code is,

Strahl, B. D., and Allis, C. D. (2000) The language of covalent histone modifications. Nature, 403:41-45. [doi: 10.1038/47412]

Histone proteins and the nucleosomes they form with DNA are the fundamental building blocks of eukaryotic chromatin. A diverse array of post-translational modifications that often occur on tail domains of these proteins has been well documented. Although the function of these highly conserved modifications has remained elusive, converging biochemical and genetic evidence suggests functions in several chromatin-based processes. We propose that distinct histone modifications, on one or more tails, act sequentially or in combination to form a ‘histone code’ that is, read by other proteins to bring about distinct downstream events.

They are proposing that the various modifications of histone proteins can be read as a sort of code that's recognized by other factors that bind to nucleosomes and regulation gene expression.

This is an important contribution to our understanding of the relationship between chromatin structure and gene expression. Nobody doubts that transcription is associated with an open form of chromatin that correlates with demethylation of DNA and covalent modifications of histone and nobody doubts that there are proteins that recognize modified histones. However, the key question is what comes first; the binding of transcription factors followed by changes to the DNA and histones, or do the changes to DNA and histones open the chromatin so that transcription factors can bind? These two models are referred to as the histone code model and the recruitment model.

Strahl and Allis did not address this controversy in their original paper; instead, they concentrated on what happens after histones become modified. That's what they mean by "downstream events." Unfortunately, the histone code model has been appropriated by the epigenetics cult and they do not distinguish between cause and effect. For example,

The “histone code” is a hypothesis which states that DNA transcription is largely regulated by post-translational modifications to these histone proteins. Through these mechanisms, a person’s phenotype can change without changing their underlying genetic makeup, controlling gene expression. (Shahid et al. (2022)

The language used by fans of epigenetics strongly implies that it's the modification of DNA and histones that is the primary event in regulating gene expression and not the sequence of DNA. The recruitment model states that regulation is primarily due to the binding of transcription factors to specific DNA sequences that control regulation and then lead to the epiphenomenon of DNA and histone modification.

The unauthorized expropriation of the histone code hypothesis should not be allowed to diminish the contribution of David Allis.

Saturday, August 20, 2022

Editing the 'Intergenic region' article on Wikipedia

Just before getting banned from Wikipedia, I was about to deal with a claim on the Intergenic region article. I had already fixed most of the other problems but there is still this statement in the subsection labeled "Properties."

According to the ENCODE project's study of the human genome, due to "both the expansion of genic regions by the discovery of new isoforms and the identification of novel intergenic transcripts, there has been a marked increase in the number of intergenic regions (from 32,481 to 60,250) due to their fragmentation and a decrease in their lengths (from 14,170 bp to 3,949 bp median length)"[2]

The source is one of the ENCODE papers published in the September 6 edition of Nature (Djebali et al., 2012). The quotation is accurate. Here's the full quotation.

As a consequence of both the expansion of genic regions by the discovery of new isoforms and the identification of novel intergenic transcripts, there has been a marked increase in the number of intergenic regions (from 32,481 to 60,250) due to their fragmentation and a decrease in their lengths (from 14,170 bp to 3,949 bp median length.

What's interesting about that data is what it reveals about the percentage of the genome devoted to intergenic DNA and the percentage devoted to genes. The authors claim that there are 60,250 intergenic regions, which means that there must be more than 60,000 genes.¹ The median length of these intergenic regions is 3,949 bp and that means that roughly 204.5 x 10⁶ bp are found in intergenic DNA. That's roughly 7% of the genome depending on which genome size you use. It doesn't mean that all the rest is genes but it sounds like they're saying that about 90% of the genome is occupied by genes.

In case you doubt that's what they're saying, read the rest of the paragraph in the paper.

Concordantly, we observed an increased overlap of genic regions. As the determination of genic regions is currently defined by the cumulative lengths of the isoforms and their genetic association to phenotypic characteristics, the likely continued reduction in the lengths of intergenic regions will steadily lead to the overlap of most genes previously assumed to be distinct genetic loci. This supports and is consistent with earlier observations of a highly interleaved transcribed genome, but more importantly, prompts the reconsideration of the definition of a gene.

It sounds like they are anticipating a time when the discovery of more noncoding genes will eventually lead to a situation where the intergenic regions disappear and all genes will overlap.

Now, as most of you know, the ENCODE papers have been discredited and hardly any knowledgeable scientist thinks there are 60,000 genes that occupy 90% of the genome. But here's the problem. I probably couldn't delete that sentence from Wikipedia because it meets all the criteria of a reliable source (published in Nature by scientists from reputable universities). Recent experience tells me that the ~~Wikipolice~~ Wikipedia editors would have blocked me from deleting it.

The best I could do would be to balance the claim with one from another "reliable source" such as Piovasan et al. (2019) who list the total number of exons and introns and their average sizes allowing you to calculate that protein-coding genes occupy about 35% of the genome. Other papers give slightly higher values for protein-coding genes.

It's hard to get a reliable source on the real number of noncoding genes and their average size but I estimate that there are about 5,000 genes and a generous estimate that they could take up a few percent of the genome. I assume in my upcoming book that genes probably occupy about 45% of the genome because I'm trying to err on the side of function.

An article on Intergenic regions is not really the place to get into a discussion about the number of noncoding genes but in the absence of such a well-sourced explanation the audience will be left with the statement from Djebali et al. and that's extremely misleading. Thus, my preference would be to replace it with a link to some other article where the controversy can be explained, preferably a new article on junk DNA.²

I was going to say,

The total amount of intergenic DNA depends on the size of the genome, the number of genes, and the length of each gene. That can vary widely from species to species. The value for the human genome is controversial because there is no widespread agreement on the number of genes but it's almost certain that intergenic DNA takes up at least 40% of the genome.

I can't supply a specific reference for this statement so it would never have gotten past the ~~Wikipolice~~ Wikpipedia editors. This is a problem that can't be solved because any serious attempt to fix it will probably lead to getting blocked on Wikipedia.

There is one other statement in that section in the article on Intergenic region.

Scientists have now artificially synthesized proteins from intergenic regions.[3]

I would have removed that statement because it's irrelevant. It does not contribute to understanding intergenic regions. It's undoubtedly one of those little factoids that someone has stumbled across and thinks it needs to be on Wikipedia.

Deletion of a statement like that would have met with fierce resistance from the Wikipedia editors because it is properly sourced. The reference is to a 2009 paper in the Journal of Biological Engineering: "Synthesizing non-natural parts from natural genomic template."

1. There are no intergenic regions between the last genes on the end of a chromosome and the telomeres.

2. The Wikipedia editors deleted the Junk DNA article about ten years ago on the grounds that junk DNA had been disproven.

Djebali, S., Davis, C. A., Merkel, A., Dobin, A., Lassmann, T., Mortazavi, A. et al. (2012) Landscape of transcription in human cells. Nature 489:101-108. [doi: 10.1038/nature11233]

Piovesan, A., Antonaros, F., Vitale, L., Strippoli, P., Pelleri, M. C., and Caracausi, M. (2019) Human protein-coding genes and gene feature statistics in 2019. BMC research notes 12:315. [doi: 10.1186/s13104-019-4343-8]

Wednesday, June 30, 2021

Richard Dawkins talks about the genetic code and information

This is a video published a few weeks ago where Jon Perry interviews Richard Dawkins. Jon Perry is the author of animations posted on his website Stated Clearly. He (Perry) has a very adaptaionist view of evolution—a view that he got from Richard Dawkins.

The main topic of the interview concerns DNA as information and the genetic code. Both Dawkins and Perry give the impression that the only kind of information in the genome is the genetic code (sensu stricto); in other words, the code that specifies a sequence of amino acids using the sequence of nucleotides in a coding region [The Real Genetic Code]. Dawkins makes the same point he has often made; namely, that this is a real code just like any other code.

Perry points out that most people don't understand this, including many atheists who argue that the "code" is merely an analogy and not to be taken literally. Atheists, and others, also argue that the information content of DNA includes lots of other things such as genes that specify functional RNAs and sites that bind proteins. It's hard to argue that a gene for tRNA functions as any kind of a code and it's hard to argue that the DNA binding sites in origins of replication are codes even though you could argue that they carry information.

I don't get excited about arguments over whether DNA carries "information" because there's not much to be gained by such arguments. Who cares whether the genetic code falls under the definition of "information theory"? However, I do get annoyed when people say that the ONLY information in DNA is in the form of the genetic code.

Watch the video and let me know what you think. Jerry Coyne watched it and he wasn't the least bit bothered by the things that bothered me [A discussion on genetics, evolution, and information with Richard Dawkins].

Friday, March 12, 2021

The bad news from Ghent

A group of scientists, mostly from the University of Ghent¹ (Belgium), have posted a paper on bioRxiv.

Lorenzi, L., Chiu, H.-S., Cobos, F.A., Gross, S., Volders, P.-J., Cannoodt, R., Nuytens, J., Vanderheyden, K., Anckaert, J. and Lefever, S. et al. (2019) The RNA Atlas, a single nucleotide resolution map of the human transcriptome. bioRxiv:807529. [doi: 10.1101/807529]

The human transcriptome consists of various RNA biotypes including multiple types of non-coding RNAs (ncRNAs). Current ncRNA compendia remain incomplete partially because they are almost exclusively derived from the interrogation of small- and polyadenylated RNAs. Here, we present a more comprehensive atlas of the human transcriptome that is derived from matching polyA-, total-, and small-RNA profiles of a heterogeneous collection of nearly 300 human tissues and cell lines. We report on thousands of novel RNA species across all major RNA biotypes, including a hitherto poorly-cataloged class of non-polyadenylated single-exon long non-coding RNAs. In addition, we exploit intron abundance estimates from total RNA-sequencing to test and verify functional regulation by novel non-coding RNAs. Our study represents a substantial expansion of the current catalogue of human ncRNAs and their regulatory interactions. All data, analyses, and results are available in the R2 web portal and serve as a basis to further explore RNA biology and function.

They spent a great deal of effort identifying RNAs from 300 human samples in order to construct an extensive catalogue of five kinds of transcripts: mRNAs, lncRNAs, antisenseRNAs, miRNAs, and circularRNAs. The paper goes off the rails in the first paragraph of the Results section where they immediately equate transcripts wiith genes. They report the following:

19,107 mRNA genes (188 novel)
18,387 lncRNA genes (13,175 novel)
7,309 asRNA genes (2,519 novel)
5,427 miRNAs
5,427 circRNAs

Of mice and Michael

Michael Behe has published a book containing most of his previously published responses to critics. I was anxious to see how he dealt with my criticisms of The Edge of Evolution but I was disappointed to see that, for the most part, he has just copied excerpts from his 2014 blog posts (pp. 335-355).

I think it might be worthwhile to review the main issues so you can see for yourself whether Michael Behe really answered his critics as the title of his most recent book claims. You can check out the dueling blog posts at the end of this summary to see how the discussion evolved in real time more than four years ago.

Many Sandwalk readers participated in the debate back then and some of them are quoted in Behe's book although he usually just identifies them as commentators.

My Summary

Michael Behe has correctly indentified an extremely improbably evolution event; namely, the development of chloroquine resistance in the malaria parasite. This is an event that is close to the edge of evolution, meaning that more complex events of this type are beyond the edge of evolution and cannot occur naturally. However, several of us have pointed out that his explanation of how that event occurred is incorrect. This is important because he relies on his flawed interpretation of chloroquine resistance to postulate that many observed events in evolution could not possibly have occurred by natural means. Therefore, god(s) must have created them.

In his response to this criticism, he completely misses the point and fails to understand that what is being challenged is his misinterpretation of the mechanisms of evolution and his understanding of mutations.

The main point of The Edge of Evolution is that many of the beneficial features we see could only have evolved by selecting for a number of different mutations where none of the individual mutations confer a benefit by themselves. Behe claims that these mutations had to occur simultaneously or at least close together in time. He argues that this is possible in some cases but in most cases the (relatively) simultaneous occurrence of multiple mutations is beyond the edge of evolution. The only explanation for the creation of these beneficial features is god(s).

Why is the Central Dogma so hard to understand?

The Central Dogma of molecular biology states ...

... once (sequential) information has passed into protein it cannot get out again (F.H.C. Crick, 1958).

The central dogma of molecular biology deals with the detailed residue-by-residue transfer of sequential information. It states that such information cannot be transferred from protein to either protein or nucleic acid (F.H.C. Crick, 1970).

This is not difficult to understand since Francis Crick made it very clear in his original 1958 paper and again in his 1970 paper in Nature [see Basic Concepts: The Central Dogma of Molecular Biology]. There's nothing particularly complicated about the Central Dogma. It merely states the obvious fact that sequence information can flow from nucleic acid to protein but not the other way around.

So, why do so many scientists have trouble grasping this simple idea? Why do they continue to misinterpret the Central Dogma while quoting Crick? I seems obvious that they haven't read the paper(s) they are referencing.

I just came across another example of such ignorance and it is so outrageous that I just can't help sharing it with you. Here's a few sentences from a recent review in the 2020 issue of Annual Reviews of Genomics and Human Genetics (Zerbino et al., 2020).

Once the role of DNA was proven, genes became physical components. Protein-coding genes could be characterized by the genetic code, which was determined in 1965, and could thus be defined by the open reading frames (ORFs). However, exceptions to Francis Crick's central dogma of genes as blueprints for protein synthesis (Crick, 1958) were already being uncovered: first tRNA and rRNA and then a broad variety of noncoding RNAs.

I can't imagine what the authors were thinking when they wrote this. If the Central Dogma actually said that the only role for genes was to make proteins then surely the discovery of tRNA and rRNA would have refuted the Central Dogma and relegated it to the dustbin of history. So why bother even mentioning it in 2020?

Crick, F.H.C. (1958) On protein synthesis. Symp. Soc. Exp. Biol. XII:138-163. [PDF]

Crick, F. (1970) Central Dogma of Molecular Biology. Nature 227, 561-563. [PDF file]

Zerbino, D.R., Frankish, A. and Flicek, P. (2020) "Progress, Challenges, and Surprises in Annotating the Human Genome." Annual review of genomics and human genetics 21:55-79. [doi: 10.1146/annurev-genom-121119-083418]

Tuesday, September 22, 2020

The Function Wars Part VIII: Selected effect function and de novo genes

Discussions about the meaning of the word "function" have been going on for many decades, especially among philosphers who love that sort of thing. The debate intensified following the ENCODE publicity hype disaster in 2012 where ENCODE researchers used the word function an entirely inappropriate manner in order to prove that there was no junk in our genome. Since then, a cottege indiustry based on discussing the meaning of function has grown up in the scientific literature and dozens of papers have been published. This may have enhanced a lot of CV's but none of these papers has proposed a rigorous definition of function that we can rely on to distinguish functional DNA from junk DNA.

The world is not inhabited exclusively by fools and when a subject arouses intense interest and debate, as this one has, something other than semantics is usually at stake.
Stephen Jay Gould (1982)

That doesn't mean that all of the papers have been completely useless. The net result has been to focus attention on the one reliable definition of function that most biologists can accept; the selected effect function. The selected effect function is defined as ...

Structure and expression of the SARS-CoV-2 (coronavirus) genome

Coronaviruses are RNA viruses, which means that their genome is RNA, not DNA. All of the coronaviruses have similar genomes but I'm sure you are mostly interested in SARS-CoV-2, the virus that causes COVID-19. The first genome sequence of this virus was determined by Chinese scientists in early January and it was immediately posted on a public server [GenBank MN908947]. The viral RNA came from a patient in intensive care at the Wuhan Yin-Tan Hospital (China). The paper was accepted on Jan. 20th and it appeared in the Feb. 3rd issue of Nature (Zhou et al. 2020).

By the time the paper came out, several universities and pharmaceutical companies had already constructed potential therapeutics and several others had already cloned the genes and were preparing to publish the structures of the proteins.¹

By now there are dozens and dozens of sequences of SARS-CoV-2 genomes from isolates in every part of the world. They are all very similar because the mutation rate in these RNA viruses is not high (about 10^-6 per nucleotide per replication). The original isolate has a total length of 29,891 nt not counting the poly(A) tail. Note that these RNA viruses are about four times larger than a typical retrovirus; they are the largest known RNA viruses.

Where did your chicken come from?

Scientists have sequenced the genomes of modern domesticated chickens and compared them to the genomes of various wild pheasants in southern Asia. It has been known for some time that chickens resemble a species of pheasant called red jungle fowl and this led Charles Darwin to speculate that chickens were domesticated in India. Others have suggested Southeast Asia or China as the site of domestication.

The latest results show that modern chickens probably descend from a subspecies of red jungle fowl that inhabits the region around Myanmar (Wang et al., 2020). The subspecies is Gallus gallus spadiceus and the domesticated chicken subspecies is Gallus gallus domesticus. As you might expect, the two subspecies can interbreed.

The authors looked at a total of 863 genomes of domestic chickens, four species of jungle fowl, and all five subspecies of red jungle fowl. They identified a total of 33.4 million SNPs, which were enough to genetically distinguish between the various species AND the subspecies of red jungle fowl. (Contrary to popular belief, it is quite possible to assign a given genome to a subspecies (race) based entirely on genetic differences.)

The sequence data suggest that chickens were domesticated from wild G. g. spadiceus about 10,000 years ago in the northern part of Southeast Asia. The data also suggest that modern domesticated chickens (G. g. domesticus) from India, Pakistan, and Bangladesh interbred with another subspecies of red jungle fowl (G. g. murghi) after the original domestication. These chickens from South Asia contain substantial contributions from G. g. murghi ranging from 8-22%.

Next time you serve chicken, if someone asks you where it came from you won't be lying if you say it came from Myanmar.

Image credits: BBQ chicken, Creative Common License [Chicken BBQ]
Red Jungle Fowl, Creative Commons License [Red_Junglefowl_-Thailand]
Map: Lawler, A. (2020) Dawn of the chicken revealed in Southeast Asia, Science: 368: 1411.

Wang, M., Thakur, M., Peng, M. et al. (2020) 863 genomes reveal the origin and domestication of chicken. Cell Res (2020) [doi: 10.1038/s41422-020-0349-y]

Thursday, June 11, 2020

Dan Graur proposes a new definition of "gene"

I've thought a lot about how to define the word "gene." It's clear that no definition will capture all the possibilities but that doesn't mean we should abandon the term. Traditionally, the biochemical definition attempts to describe the part of the genome that produces a functional product. Most scientists seem to think that the only possible product is a protein so it's common to see the word "gene" defined as a DNA sequence that produces a protein.

But from the very beginning of molecular biology the textbooks also talked about genes for ribosomal RNAs and tRNAs so there was never a time when knowledgeable scientists restricted their definition of a gene to protein-coding regions. My best molecular definition is described in What Is a Gene?.

A gene is a DNA sequence that is transcribed to produce a functional product.
Dan Graur has also thought about the issue and he comes up with a different definition in a recent blog post: What Is a Gene? A Very Short Answer with a Very Long Footnote

A gene is a sequence of genomic material (DNA or RNA) that has a selected effect function.

This is obviously an attempt to equate "function" with "gene" so that all functional parts of the genome are genes, by definition. You might think this is rather silly because it excludes some obvious functional regions but Dan really does want to count them as genes.

Performance of the function may or may not require the gene to be translated or even transcribed.

Genes can, therefore, be classified into three categories:

(1) protein-coding genes, which are transcribed into RNA and subsequently translated into proteins.

(2) RNA-specifying genes, which are transcribed but not translated

(3) nontranscribed genes.

Really? Is it useful to think of centromeres and telomeres as genes? Is it useful to define an origin of replication as a gene? And what about regulatory sequences? Should each functional binding site for a transcription factor be called a gene?

The definition also leads to some other problems. Genes (my definition) occupy about 30% of the human genome but most of this is introns, which are mostly junk (i.e. no selected effect function). How does that make sense using Dan's definition?

Monday, October 21, 2019

The evolution of de novo genes

De novo genes are new genes that arise spontaneously from junk DNA [De novo gene birth]. The frequency of de novo gene creation is important for an understanding of evolution. If it's a frequent event, then species with a large amount of junk DNA might have a selective advantage over species with less junk DNA, especially in a changing environment.

Last week I read a short Nature article on de novo genes [Levy, 2019] and I think the subject deserves more attention. Most new genes in a species appear to arise by gene duplication and subsequent divergence but de novo genes are genes that are unrelated to genes in any other clade so we can assume that they are created from junk DNA that accidentally becomes associated with a promoter causing the DNA to be transcribed. A new gene is formed if the RNA acquires a function. If the transcript contains an open reading frame then it may be translated to produce a polypeptide and if the polypeptide performs a new function then the resulting de novo gene is a new protein-coding gene.

The important question is whether the evolution of de novo genes is a common event or a rare event.

How many protein-coding genes in the human genome? (2)

It's difficult to know how many protein-coding genes there are in the human genome because there are several different ways of counting and the counts depend on what criteria are used to identify a gene. Last year I commented on a review by Abascal et al. (2018) that concluded there were somewhere between 19,000 and 20,000 protein-coding genes. Those authors discussed the problems with annotation and pointed out that the major databases don't agree on the number of gene [How many protein-coding genes in the human genome?].

Gerald Fink promotes a new definition of a gene

This is the 2019 Killian lecture at MIT, delivered in April 2019 by Gerald Fink. Fink is an eminent scientist who has done excellent work on the molecular biology of yeast. He was director of the prestigious Whitehead Institute at MIT from 1990-2001. With those credentials you would expect to watch a well-informed presentation of the latest discoveries in molecular genetics. Wouldn't you?

Subscribe to: Comments ( Atom )

Quotations

The old argument of design in nature, as given by Paley, which formerly seemed to me to be so conclusive, fails, now that the law of natural selection has been discovered. We can no longer argue that, for instance, the beautiful hinge of a bivalve shell must have been made by an intelligent being, like the hinge of a door by man. There seems to be no more design in the variability of organic beings and in the action of natural selection, than in the course which the wind blows.Charles Darwin (c1880)

Although I am fully convinced of the truth of the views given in this volume, I by no means expect to convince experienced naturalists whose minds are stocked with a multitude of facts all viewed, during a long course of years, from a point of view directly opposite to mine. It is so easy to hide our ignorance under such expressions as "plan of creation," "unity of design," etc., and to think that we give an explanation when we only restate a fact. Any one whose disposition leads him to attach more weight to unexplained difficulties than to the explanation of a certain number of facts will certainly reject the theory.

Charles Darwin (1859)

Science reveals where religion conceals. Where religion purports to explain, it actually resorts to tautology. To assert that "God did it" is no more than an admission of ignorance dressed deceitfully as an explanation...

Peter Atkins

Quotations

The world is not inhabited exclusively by fools, and when a subject arouses intense interest, as this one has, something other than semantics is usually at stake. Stephen Jay Gould (1982)
I have championed contingency, and will continue to do so, because its large realm and legitimate claims have been so poorly attended by evolutionary scientists who cannot discern the beat of this different drummer while their brains and ears remain tuned to only the sounds of general theory. Stephen Jay Gould (2002) p.1339
The essence of Darwinism lies in its claim that natural selection creates the fit. Variation is ubiquitous and random in direction. It supplies raw material only. Natural selection directs the course of evolutionary change. Stephen Jay Gould (1977)
Rudyard Kipling asked how the leopard got its spots, the rhino its wrinkled skin. He called his answers "just-so stories." When evolutionists try to explain form and behavior, they also tell just-so stories—and the agent is natural selection. Virtuosity in invention replaces testability as the criterion for acceptance. Stephen Jay Gould (1980)
Since 'change of gene frequencies in populations' is the 'official' definition of evolution, randomness has transgressed Darwin's border and asserted itself as an agent of evolutionary change. Stephen Jay Gould (1983) p.335
The first commandment for all versions of NOMA might be summarized by stating: "Thou shalt not mix the magisteria by claiming that God directly ordains important events in the history of nature by special interference knowable only through revelation and not accessible to science." In common parlance, we refer to such special interference as "miracle"—operationally defined as a unique and temporary suspension of natural law to reorder the facts of nature by divine fiat. Stephen Jay Gould (1999) p.84

Quotations

My own view is that conclusions about the evolution of human behavior should be based on research at least as rigorous as that used in studying nonhuman animals. And if you read the animal behavior journals, you'll see that this requirement sets the bar pretty high, so that many assertions about evolutionary psychology sink without a trace.

Jerry Coyne
Why Evolution Is True

I once made the remark that two things disappeared in 1990: one was communism, the other was biochemistry and that only one of them should be allowed to come back.

Sydney Brenner
TIBS Dec. 2000

It is naïve to think that if a species' environment changes the species must adapt or else become extinct.... Just as a changed environment need not set in motion selection for new adaptations, new adaptations may evolve in an unchanging environment if new mutations arise that are superior to any pre-existing variations

Douglas Futuyma

One of the most frightening things in the Western world, and in this country in particular, is the number of people who believe in things that are scientifically false. If someone tells me that the earth is less than 10,000 years old, in my opinion he should see a psychiatrist.

Francis Crick

There will be no difficulty in computers being adapted to biology. There will be luddites. But they will be buried.

Sydney Brenner

An atheist before Darwin could have said, following Hume: 'I have no explanation for complex biological design. All I know is that God isn't a good explanation, so we must wait and hope that somebody comes up with a better one.' I can't help feeling that such a position, though logically sound, would have left one feeling pretty unsatisfied, and that although atheism might have been logically tenable before Darwin, Darwin made it possible to be an intellectually fulfilled atheist

Richard Dawkins

Another curious aspect of the theory of evolution is that everybody thinks he understand it. I mean philosophers, social scientists, and so on. While in fact very few people understand it, actually as it stands, even as it stood when Darwin expressed it, and even less as we now may be able to understand it in biology.

Jacques Monod

The false view of evolution as a process of global optimizing has been applied literally by engineers who, taken in by a mistaken metaphor, have attempted to find globally optimal solutions to design problems by writing programs that model evolution by natural selection.

Richard Lewontin

More Recent Comments

Saturday, July 19, 2025

Friday, November 22, 2024

Friday, October 11, 2024

Thursday, March 21, 2024

Wednesday, February 07, 2024

Tuesday, October 10, 2023

Wednesday, March 01, 2023

Wednesday, February 15, 2023

Saturday, August 20, 2022

Wednesday, June 30, 2021

Friday, March 12, 2021

Tuesday, December 01, 2020

Sunday, November 15, 2020

Tuesday, September 22, 2020

Thursday, July 09, 2020

Wednesday, July 08, 2020

Thursday, June 11, 2020

Monday, October 21, 2019

Tuesday, September 24, 2019

Wednesday, September 11, 2019