Thursday, June 11, 2020

Dan Graur proposes a new definition of "gene"

I've thought a lot about how to define the word "gene." It's clear that no definition will capture all the possibilities but that doesn't mean we should abandon the term. Traditionally, the biochemical definition attempts to describe the part of the genome that produces a functional product. Most scientists seem to think that the only possible product is a protein so it's common to see the word "gene" defined as a DNA sequence that produces a protein.

But from the very beginning of molecular biology, the textbooks also talked about genes for ribosomal RNAs and tRNAs, so there was never a time when knowledgeable scientists restricted their definition of a gene to protein-coding regions. My best molecular definition is described in What Is a Gene?

A gene is a DNA sequence that is transcribed to produce a functional product.

Dan Graur has also thought about the issue and he comes up with a different definition in a recent blog post: What Is a Gene? A Very Short Answer with a Very Long Footnote

A gene is a sequence of genomic material (DNA or RNA) that has a selected effect function.

This is obviously an attempt to equate "function" with "gene" so that all functional parts of the genome are genes, by definition. You might think this is rather silly because it excludes some obvious functional regions but Dan really does want to count them as genes.

Performance of the function may or may not require the gene to be translated or even transcribed.

Genes can, therefore, be classified into three categories:

(1) protein-coding genes, which are transcribed into RNA and subsequently translated into proteins.

(2) RNA-specifying genes, which are transcribed but not translated.

(3) nontranscribed genes.

Really? Is it useful to think of centromeres and telomeres as genes? Is it useful to define an origin of replication as a gene? And what about regulatory sequences? Should each functional binding site for a transcription factor be called a gene?

The definition also leads to some other problems. Genes (my definition) occupy about 30% of the human genome but most of this is introns, which are mostly junk (i.e. no selected effect function). How does that make sense using Dan's definition?


  1. Not sure I see what the problem with Dan's definition is. I could just say yes, telomeres, centromeres, and regulatory elements are types of genes.

    1. That completely changes the definition of a gene and leaves us with no suitable word for the pieces of DNA that are transcribed to make a functional product. It means that debates over the number of genes in the human genome are worthless because there are about 100,000 origins of replication and about 100,000 SARs (scaffold attachment regions). I don't think Dan's definition is helpful.

    2. Well, we could have sub-categories of genes. In a way we already do. We have protein-coding genes (transcribed and translated) and RNA genes (transcribed only). And then we could have non-transcribed genes.

      I think Dan's definition is fine, as it seems broad enough to capture all functional DNA sequences. It then shifts the question to categorization, having established that the basis for a gene is its selected-effect function.

      If that requires us to adjust the number of "functional genes" in the human genome up or down, then so be it. Okay, so there will be a lot of confusion about conflicting numbers depending on how the terms have been used traditionally, but aren't we already in that situation? It seems to me we already have to constantly spend lots of time defining terms, explaining concepts, and giving historical contexts in discussions of these matters.

      I suppose your goal seems different from Dan's. I guess you're more concerned about clearing up all this confusion about functional vs junk-DNA, and Dan is more concerned about having a definition broad enough to capture all currently known, and yet to be discovered functional DNA sequences.

    3. "All currently known, and yet to be discovered functional DNA sequences" are called "functional DNA sequences." Why do we need to call them "genes"? What do we gain by changing the definition of "gene" and creating more confusion?

    4. I take your meaning; that's a much better point. There is no reason why the term gene has to encompass all possible functional DNA sequences.

    5. Why not return to a classical definition: if a DNA mutation caused an (ahem) change in phenotype, then a “gene” just got mutated!

      That’s how Jacob & Monod could distinguish dominance in “cis” vs dominance in “trans”

      ... and how the Lac Operator region was identified as a crucial “gene”

    6. @Tom Mueller

      Imagine a mutation in junk DNA upstream of a gene where the mutation creates a new promoter that interferes with the normal expression of the gene. This is a mutation that causes a phenotype but it makes no sense to label the junk DNA as a gene.

      It’s true that the lac operator region was sometimes referred to as a gene but that mistake didn’t last very long and there’s no reason to resurrect it.

  2. Problem 1: Most of what we call a structural gene is not in fact a gene, for example most introns are not genes, nor are UTRs, nor is every third base in most open reading frames. "Genes" are individual motes of dust scattered throughout genomic regions that encode functional products.

    Problem 2: Endlings like Lonesome George have no genes. Their ancestors had genes, but they themselves do not. There is not a single nucleotide in their genomes that has any bearing on their ability to reproduce.

    Problem 3: If an asteroid wipes out the USA, some of my genes will cease to be genes. Whether or not a sequence has a selected effect depends on population size in addition to the magnitude of the selective coefficient. If the selection coefficient is small enough then the sequence will evolve neutrally. When a bottleneck reduces the population size, sequences that were previously under selection stop being under selection. Genes stop being genes.

    1. Problem 1: UTRs and introns are part of a gene by my definition.

      Problem 2: Lonesome George had plenty of genes by anyone's definition, including Dan Graur's.

      Problem 3: You are correct. When a given DNA sequence is not currently subject to purifying selection then it is not a functional gene by most definitions. It doesn't matter whether it was previously functional. Pseudogenes are a good example.

      This is related to the differences between two different kinds of selected effect (SE) functions: origin SE functions, and maintenance SE functions. It's possible to have currently functional sequences that did not arise by traditional natural selection.

      This point is discussed in ...

      Linquist, S., W. F. Doolittle and A. F. Palazzo (2020). "Getting clear about the F-word in genomics." PLOS Genetics 16(4): e1008702. [doi: 10.1371/journal.pgen.1008702]

  3. My own preference is to define it without any reference to phenotype at all. Since 'allele' is not restricted to functional variants - any autosomal DNA stretch of whatever length can be homozygous or heterozygous - then 'gene' shouldn't be, either. It has subclasses - protein-coding, transcriptional, promoter, repressor.

    1. Then you would define a gene as any DNA sequence of any length and any or no function? How would you decide where a gene starts or ends?

    2. That would be the reductio ad absurdum, yes. But all such stretches are, nonetheless, legitimate character states. A character state does not have to have a phenotypic manifestation beyond genotype. It's just that we started at the back end, with phenotypic states, where these terms had their genesis, and are now retrofitting to include genomic states.

      To reflect the question back, if we define a 'gene' as necessarily discrete, delimited by (for example) an ORF, what would the variants of the genome between such addresses be called? They can be homozygous or heterozygous, i.e. can have alleles, but they can't be genes? That seems unsatisfactory - especially when many such regions can affect phenotype in their own right.

    3. I don't see the problem. Why does all genetic variation have to be contained in genes? Why does all genetic function have to be contained in genes? I like Larry's definition because it gives the term "gene" a useful meaning that differentiates a gene from a randomly delimited sequence. Junk DNA doesn't consist of genes. Promoters are not parts of genes, though introns and UTRs are. Functional microRNAs are transcribed from genes; nonfunctional ones are still transcribed, but not from genes. It ain't broke, so let's not fix it.