
Wednesday, March 01, 2023

Definition of a gene (again)

The correct definition of a molecular gene isn't difficult but getting it recognized and accepted is a different story.

When writing my book on junk DNA I realized that there was an issue with genes. The average scientist, and consequently the average science writer, has a very confused picture of genes and the proper way to define them. The issue shouldn't be confusing for Sandwalk readers since we've covered that ground many times in the past. I think the best working definition of a gene is, "A gene is a DNA sequence that is transcribed to produce a functional product" [What Is a Gene?].

One key point about this definition is that it defines a gene as a transcription unit, so it covers both protein-coding genes and noncoding genes. The product of the gene is RNA. The other key point is that it puts an emphasis on function, so the definition can be used to eliminate transcribed regions that don't have a function [Must a Gene Have a Function?]. Of course, that raises a question about how to define biologically relevant function, but that's covered elsewhere.

The definition isn't meant to include all the exceptions; there are very few definitions in biology that do that.[1] So, what's the problem? The problem is that many people don't understand this definition and its implications. I was reminded of this when I tried to edit the Wikipedia article on Gene.[2] It's important to nail down an acceptable definition of a gene on Wikipedia because so many other articles rely on having a common understanding of this important concept. It will also be extremely important when (if?) we ever succeed in getting an article on junk DNA accepted.

I didn't expect there to be much of a problem but I was wrong. Here are the main issues that I have encountered with other editors.

  1. There's still a lot of confusion about the difference between the Mendelian gene and the molecular gene. The gene article is mostly about the molecular gene and there's general agreement that this should be the focus. However, some editors can't keep this distinction straight.
  2. The majority of editors think that up until recently all genes were defined as DNA that encodes a protein. They insist that noncoding genes have only recently been discovered by ENCODE and this needs to be noted in the Wikipedia article. I have references that support my claim that both protein-coding genes and noncoding genes have been recognized for 50 years but they have many more references that say otherwise [Paradigm shifting].
  3. Many editors don't think the "function" part of the definition is important. They believe that any transcribed region counts as a gene and they have plenty of references that support that ridiculous claim. Wikipedia is really big on "neutrality" and "fairness" so as long as there are reliable sources for a statement it has to be in the article.
  4. Most editors don't understand the difference between coding region and gene. They have learned that only 1% of the human genome is devoted to coding regions so they assume that genes account for only 1% of our genome. I have tried to correct this with references stating the correct value (35-45%) but that meets with a lot of resistance since the vast majority of scientific papers say otherwise.
  5. Many editors think that the discovery of alternative splicing has refuted all definitions of a gene because a single gene can make many different proteins. Attempts to explain what the definition actually says don't work because these editors have references that support their view.
  6. The formation of de novo genes is a problem for some editors. They think this is an exception to the definition I use because the literature is full of examples of de novo genes that don't (yet) have a well-defined function. I've tried unsuccessfully to explain that the scientific literature is wrong when researchers say they've discovered a new gene but they're not sure what the product does. By definition, it's not a de novo gene if it doesn't have a function.
  7. One active editor recently made changes to accommodate the idea of synthetic genes, which he thinks are an exception to the standard definition.
  8. I also encountered a fair number of scientists who think that ENCODE discovered hundreds of overlapping genes that negate any reasonable definition of a gene. Pointing out that we've been describing overlapping genes in the textbooks for forty years without having to redefine a gene doesn't seem to impress them.
  9. Several editors want regulatory sequences to be part of a gene and since, according to their references, genes can be controlled by regulatory sequences on different chromosomes, the standard definition of a gene as a transcription unit is wrong.
  10. Epigenetics is a serious problem for a lot of Wikipedia editors. Apparently it casts doubt on all definitions of a gene and requires prominent mention in just about every Wikipedia article.

You can see some of these problems in the Wikipedia gene article under Functional definitions. I would like to delete that entire section but I suspect that will meet with considerable resistance.

Philosophers have tried to help out with descriptions of genes but so far they haven't been successful [Debating philosophers: The molecular gene] [Philosophers talking about genes] [Stanford Encyclopedia of Philosophy: Gene]. Their main contribution has been to provide ammunition for Wikipedia editors who want to quibble about how to define a gene.

1. The main exceptions are split genes and bacterial operons.

2. My account on Wikipedia is no longer blocked thanks to the helpful intervention of a friendly editor.


gert korthof said...

Larry: "A gene is a DNA sequence that is transcribed to produce a functional product"

So, RNA viruses like SARS-CoV-2 don't have genes?

Wade said...

I confess to a knee-jerk bristling at the 'molecular gene' definition displacing the 'Mendelian gene' definition as the ranking default. Some of that is curmudgeonly thinking that students should learn this stuff the hard way like I did, including the history of how the abstract concepts of trans and cis factors in genetics resolved into molecularly understood functions and functional units through a history of clever experiments (and hard work).

To defend my reaction, if the editors you complain about had that background, they might bristle a bit too but would sigh then acquiesce in accepting the nice/robust molecular definition (but missing the label of gene for some binding sites, a bit like missing Pluto as the 9th planet).

Wade said...

@gert korthof

see a prior answer from Larry

Mark Sturtevant said...

Commiserations. It's like trying to argue with ChatGPT.

Unknown said...

Thanks Larry. Your comments on biology are always very helpful.