Sandwalk: What Is a Gene?

Sunday, January 28, 2007

What Is a Gene?

(Other definitions are at Discovering Biology in a Digital World, Pharyngula, and Greg Laden.)

The concept of a gene is a fundamental part of the fields of genetics, molecular biology, evolution and all the rest of biology. Gene concepts can be divided into two main categories: abstract and physical. Abstract genes are the kind we refer to when we talk about genes “for” a certain trait, including many genetic diseases. Most geneticists and many evolutionary biologists use an abstract gene concept.

Philosophers have coined the term “Gene-P” for the abstract gene concept. The “P” stands for “phenotype”, indicating that this gene concept defines a gene by its phenotypic effects and not by its physical structure.

Physical genes consist of stretches of DNA with a beginning and an end. These are molecular genes that can be cloned and sequenced. Philosophers call them “Gene-D” where “D” stands for “development”—a very unfortunate choice.

This essay describes various modern definitions of physical genes (Gene-D). I like to define a gene as “a DNA sequence that’s transcribed”, but that’s a bit too brief for a formal definition. We need to include something that restricts the definition of gene to those entities that are biologically significant. Hence,

A gene is a DNA sequence that is transcribed to produce a functional product.

This eliminates those parts of the chromosome that are transcribed by accident or error. These regions are significant in large genomes; in fact, the confusion between accidental transcripts and real transcripts is responsible for the overestimates of gene number in many genome projects. (In technical parlance, most expressed sequence tags, or ESTs, are artifacts, and the sequences they come from are not genes.)
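The two-part test in this definition (transcribed, and yielding a functional product) can be written as a simple filter. This is a minimal Python sketch, not a gene-finding method: the `TranscribedRegion` record, its field names, and the example regions are all hypothetical, and it assumes each transcript has already been labeled functional or not, which is exactly the hard part in real genome projects.

```python
from dataclasses import dataclass


@dataclass
class TranscribedRegion:
    """Hypothetical record for a stretch of DNA that RNA polymerase copies."""
    name: str
    start: int        # initiation site (i), 0-based
    end: int          # termination site (t), end-exclusive
    functional: bool  # assumed known: does the transcript (or its product) do anything?


def is_gene(region: TranscribedRegion) -> bool:
    # Being transcribed is necessary but not sufficient; the transcript
    # must also give rise to a functional product.
    return region.end > region.start and region.functional


regions = [
    TranscribedRegion("geneA", 1000, 2500, functional=True),
    TranscribedRegion("accidental_transcript", 52000, 52400, functional=False),
]
genes = [r.name for r in regions if is_gene(r)]
print(genes)  # ['geneA']
```

The accidental transcript is copied into RNA just like the gene, so only the functional flag separates the two.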

We could refine the definition to include RNA genes (genes made of RNA rather than DNA, as in some viruses), but they make up such an insignificant percentage of all genes that the refinement is hardly worth it. As we shall see, there are more significant limitations to the definition.

This "DNA sequence that's transcribed" definition describes a physical entity. Let’s examine a simple molecular gene to see how the definition applies.

This is a simple bacterial protein-encoding gene. The horizontal line represents a stretch of double-stranded DNA with the rectangular part being the gene. The gene is copied into RNA as shown by the arrow below the gene. This process is called transcription. Transcription begins when the transcription enzyme (RNA polymerase) binds to a promoter region (P) and starts copying the DNA beginning at the initiation site (i). The DNA is copied until a termination site (t) is reached at the end of the gene. According to my preferred definition of a gene, it starts at “i” and ends at “t.”

The part of the gene that’s transcribed includes the coding region, shown in black. This is the part of the gene that contains sequential codons specifying the amino acid sequence of the protein. At the beginning of the gene, called the 5ʹ (5-prime) end, there’s a short stretch of sequence that will be transcribed but not translated into protein. This 5ʹ untranslated region (5ʹ UTR) will contain various signals for starting protein synthesis.

The other end of the gene is called the 3ʹ (3-prime) end and there’s almost always a stretch that’s transcribed but not translated (3ʹ UTR). The 3ʹ UTR contains signals that cause transcription termination and also signals that regulate translation.
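The layout just described (promoter, initiation site, 5ʹ UTR, coding region, 3ʹ UTR, termination site) maps naturally onto coordinates. This is a minimal Python sketch with invented, 0-based, end-exclusive coordinates; the `BacterialGene` class and its field names are hypothetical. It shows that under the "transcribed" definition the gene runs from i to t, so the promoter is excluded while both UTRs are included.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class BacterialGene:
    """Hypothetical coordinates on one DNA strand (0-based, end-exclusive)."""
    promoter: tuple[int, int]  # P: where RNA polymerase binds; outside the gene
    start: int                 # i: first transcribed base
    cds_start: int             # first base of the start codon
    cds_end: int               # one past the last base of the stop codon
    end: int                   # t: one past the last transcribed base

    def __post_init__(self) -> None:
        # The promoter sits upstream of i, and the coding region lies inside i..t.
        if not (self.promoter[1] <= self.start <= self.cds_start < self.cds_end <= self.end):
            raise ValueError("regions are out of order")
        if (self.cds_end - self.cds_start) % 3 != 0:
            raise ValueError("coding region must be a whole number of codons")

    def parts(self) -> dict[str, tuple[int, int]]:
        return {
            "5' UTR": (self.start, self.cds_start),
            "coding region": (self.cds_start, self.cds_end),
            "3' UTR": (self.cds_end, self.end),
        }

    def length(self) -> int:
        # The gene runs from i to t; the promoter is not counted.
        return self.end - self.start


g = BacterialGene(promoter=(60, 100), start=100, cds_start=130, cds_end=1030, end=1080)
for name, (a, b) in g.parts().items():
    print(f"{name}: {a}-{b} ({b - a} bp)")
print("gene length:", g.length())  # gene length: 980
```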

There are regions upstream of the promoter that control whether or not the gene is transcribed. These regions are called regulatory regions. They may contain binding sites for various proteins that will attach there in order to enhance the binding of RNA polymerase to the promoter. One of the differences between my preferred definition of a gene and others is that some other definitions include the promoter and the regulatory region.

There are two problems with such definitions. First, they’re not consistent with standard usage when we talk about the regulation of gene expression. We don’t say that only “part” of a gene is transcribed, which would be correct if we included the regulatory region in our definition of a gene. How often have we heard anyone say that regulatory sequences control the expression of part of the gene? That doesn’t make sense.

Second, by including regulatory sequences in the definition of a gene the actual extent of the gene becomes ill-defined. For most genes, we don’t know where all the regulatory sequences are located so we don’t know for sure where the gene begins or ends. Furthermore, there are some regulatory sequences, especially in eukaryotes, that are not contiguous with the gene and this leads to “genes” that are split into various pieces. It’s much easier to use a definition like “a DNA sequence that’s transcribed” because it defines a start and an end.
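The difference in how well each definition pins down a gene's extent can be made concrete with intervals. This minimal Python sketch uses invented coordinates and a hypothetical distant enhancer; it simply merges whatever regions a definition counts as "the gene" into contiguous pieces.

```python
def merge_intervals(intervals):
    """Merge overlapping or touching half-open intervals; return sorted pieces."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


transcribed = (5000, 7000)       # i to t: one well-defined interval
promoter = (4960, 5000)          # adjacent to the initiation site
distant_enhancer = (1200, 1500)  # hypothetical regulatory element far upstream

gene_by_transcription = merge_intervals([transcribed])
gene_with_regulation = merge_intervals([transcribed, promoter, distant_enhancer])

print(gene_by_transcription)  # [(5000, 7000)]
print(gene_with_regulation)   # [(1200, 1500), (4960, 7000)]
```

The first answer is a single interval with a clear start and end. The second is already split into pieces, and discovering another regulatory element would change it again.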

The organization of a typical eukaryote gene is shown below.

The main difference between this type of gene and a typical bacterial gene is the presence of introns and exons. These genes are transcribed from an initiation site to a termination site just like bacterial genes. When the RNA transcript is finished, it undergoes an additional step called RNA processing. In that step, parts of the original transcript are spliced out and discarded. These parts correspond to the introns in the gene, shown as thinner rectangular regions within the gene.

Note that the coding region (black) can be interrupted by these introns so the final messenger RNA (mRNA) cannot be translated until RNA processing is completed. The important point for our purposes is that the introns are part of the gene since they are transcribed.
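A minimal Python sketch of transcription followed by intron removal. The sequences and the `transcribe` and `splice` helpers are hypothetical; the intron simply uses the common GT...AG boundaries. Note that the gene and the primary transcript have the same length because the intron is transcribed, while the mRNA is shorter.

```python
from __future__ import annotations


def transcribe(dna: str) -> str:
    """RNA copy of the coding-strand sequence: same bases, with U in place of T."""
    return dna.replace("T", "U")


def splice(primary: str, introns: list[tuple[int, int]]) -> str:
    """Remove introns (0-based, end-exclusive, relative to i) from a primary transcript."""
    pieces, pos = [], 0
    for start, end in sorted(introns):
        pieces.append(primary[pos:start])  # keep the exon before this intron
        pos = end                          # skip over the intron
    pieces.append(primary[pos:])           # keep the last exon
    return "".join(pieces)


# Hypothetical gene: exon 1 + intron + exon 2. The coding region is
# interrupted by the intron.
gene = "ATGGCC" + "GTAAGTTTTAG" + "TTCTGA"
primary = transcribe(gene)          # the whole gene, intron included, is transcribed
mrna = splice(primary, [(6, 17)])   # RNA processing removes the intron
print(len(gene), len(primary), mrna)  # 23 23 AUGGCCUUCUGA
```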

My preferred definition has been used by molecular biologists for many decades but there are several other definitions that have been popular over the years. All of them have good points and bad points. I’ve already dealt with the definition that includes regulatory regions.

Some people still prefer a gene definition that corresponds to one used over half a century ago; namely, a gene is a sequence that encodes a polypeptide. This is the so-called one gene:one protein definition. It’s very old-fashioned. We’ve known for years that there are genes that do not encode proteins in spite of the fact that we commonly show protein-encoding genes whenever we describe typical genes. (As I did above.) There are genes for transfer RNA (tRNA), genes for ribosomal RNA, and genes for a large heterogeneous class of small RNAs. None of them have coding regions. The transcript is the functional product, often after RNA processing.

Because modern definitions treat the transcript, not the protein, as the important product, examples of alternative splicing, in which one gene gives rise to several different proteins, pose no problem for them. They would only be a problem for the old-fashioned one gene:one protein definition, which is now rarely used.
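To see why alternative splicing leaves the gene itself well defined, here is a minimal Python sketch with an invented three-exon gene. Exon skipping is only one kind of alternative splicing, and the `mature_rna` helper is hypothetical. Both variants come from the same transcribed region and the same primary transcript; only the processing differs.

```python
from __future__ import annotations

# One hypothetical gene: three exons separated by two introns (coding-strand DNA).
segments = [
    ("exon", "ATGGCC"),
    ("intron", "GTAAGTTTTAG"),
    ("exon", "AAAGGG"),
    ("intron", "GTGAGTCCCAG"),
    ("exon", "TTCTGA"),
]
gene = "".join(seq for _, seq in segments)
primary_transcript = gene.replace("T", "U")  # every variant starts from this one transcript
exons = [seq.replace("T", "U") for kind, seq in segments if kind == "exon"]


def mature_rna(keep: list[int]) -> str:
    """Join the chosen exons in gene order; everything else is spliced out."""
    return "".join(exons[i] for i in sorted(keep))


variant_a = mature_rna([0, 1, 2])  # all three exons
variant_b = mature_rna([0, 2])     # middle exon skipped
print(len(primary_transcript))     # 40
print(variant_a)                   # AUGGCCAAAGGGUUCUGA
print(variant_b)                   # AUGGCCUUCUGA
```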

There are exceptions to every generality in biology. Here’s a short list of gene examples that do not conform to my preferred definition.

Operons: In some cases adjacent “genes” are transcribed together to produce a large initial transcript containing several coding regions. In other cases the primary transcript is subsequently cleaved to produce multiple functional RNAs. In these cases it doesn’t make sense to refer to the co-transcribed genes as a single “gene.” Instead, we identify the stretches of DNA that correspond to a single functional unit as the “gene.” Thus, the lac operon contains three “genes” and the ribosomal RNA operons contain two, three, or four genes.
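Counting genes by functional unit rather than by transcript can be sketched in a few lines of Python. The three names follow the genes of the lac operon, but the coordinates are invented for illustration, and the `genes_in` helper is hypothetical.

```python
from __future__ import annotations

# Illustrative coordinates only (0-based, end-exclusive); the names follow the
# three genes of the lac operon, but the numbers are invented.
operon_transcript = (0, 5200)  # one primary transcript, i to t
functional_units = {
    "lacZ": (30, 3100),
    "lacY": (3150, 4400),
    "lacA": (4450, 5070),
}


def genes_in(transcript: tuple[int, int], units: dict[str, tuple[int, int]]) -> list[str]:
    """Names of the functional units that lie entirely inside one transcript."""
    t_start, t_end = transcript
    return [name for name, (s, e) in units.items() if t_start <= s and e <= t_end]


genes = genes_in(operon_transcript, functional_units)
print(f"1 transcript, {len(genes)} genes: {genes}")
```

A strict "one transcript, one gene" count would report a single gene here; counting functional units gives three, matching standard usage.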

Trans-splicing: There are examples of “genes” that are split into pieces. The transcript from one piece is joined to the transcript from another to produce a functional RNA.
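A minimal Python sketch of trans-splicing, assuming two hypothetical DNA pieces that are transcribed separately. The `trans_splice` helper and its cut positions are illustrative only; real trans-splicing depends on splice signals that this sketch reduces to two index arguments.

```python
def transcribe(dna: str) -> str:
    """RNA copy of the coding-strand sequence: same bases, with U in place of T."""
    return dna.replace("T", "U")


def trans_splice(rna_1: str, donor: int, rna_2: str, acceptor: int) -> str:
    """Join the 5' part of one transcript (up to `donor`) to the 3' part of
    another (from `acceptor` on); the remaining pieces are discarded."""
    return rna_1[:donor] + rna_2[acceptor:]


rna_1 = transcribe("ATGAAACCCGTAAGT")  # hypothetical: 9 bases kept + 6 discarded
rna_2 = transcribe("TTTCAGGGGTTCTAA")  # hypothetical: 6 bases discarded + 9 kept
functional = trans_splice(rna_1, 9, rna_2, 6)
print(functional)  # AUGAAACCCGGGUUCUAA
```

Neither DNA piece alone contains the full functional sequence, which is why a "DNA sequence that's transcribed" definition has trouble assigning it to a single gene.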

Overlapping Genes: Some “genes” overlap. This means that a single stretch of DNA can be part of two, and in at least one case, three genes.
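Overlap between genes is just overlap between intervals. This minimal Python sketch uses invented coordinates and ignores strand and reading frame, both of which matter for real overlapping genes.

```python
from itertools import combinations


def shared_bp(a, b):
    """Base pairs shared by two half-open intervals (0 if they do not overlap)."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))


# Hypothetical genes (0-based, end-exclusive); strand and reading frame are ignored.
genes = {"geneA": (100, 900), "geneB": (850, 1400), "geneC": (1500, 2000)}

overlaps = {
    (x, y): shared_bp(genes[x], genes[y])
    for x, y in combinations(genes, 2)
    if shared_bp(genes[x], genes[y]) > 0
}
print(overlaps)  # {('geneA', 'geneB'): 50}
```

The 50 bp shared by geneA and geneB belong to both genes at once.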

RNA Editing: In some cases the primary transcript is extensively edited before it becomes functional. In the most extreme cases nucleotides are inserted and deleted. What this means is that the information content of the “gene” is insufficient to ensure a functional product and the assistance of other “genes” is required.
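A minimal Python sketch of insertion/deletion editing, with a hypothetical transcript and hypothetical edits. Here the editing instructions are just a dictionary and a set passed in from outside; the point is that they are not in the gene's own sequence. In real systems that information is supplied by other molecules, such as the guide RNAs used for uridine insertion and deletion editing in trypanosome mitochondria.

```python
from __future__ import annotations


def apply_edits(rna: str, insertions: dict[int, str], deletions: set[int]) -> str:
    """Return an edited copy of `rna`.

    insertions: {i: "UU"} inserts the string before original position i
                (i == len(rna) appends at the end)
    deletions:  original positions to remove
    """
    out = []
    for i, base in enumerate(rna):
        out.append(insertions.get(i, ""))
        if i not in deletions:
            out.append(base)
    out.append(insertions.get(len(rna), ""))
    return "".join(out)


primary = "AGGAGCAG"  # hypothetical transcript, exactly as encoded by the gene
edited = apply_edits(primary, insertions={1: "U", 3: "UU"}, deletions={6})
print(primary, "->", edited)  # AGGAGCAG -> AUGGUUAGCG
```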

17 comments :

John S. Wilkins said...: One philosopher - Lenny Moss - uses (and coined) the Gene-P and Gene-D terminology. It hasn't really caught on (yet). It's based on the old preformationist/epigeneticist distinction of the 17thC, in ways I don't entirely understand.

And Lenny trained as a molecular biologist.; Monday, January 29, 2007 12:00:00 AM
SPARC said...: Just to make clear my identity: Unfortunately my comments here appear under my google account name (is there a way to change this?). Normally I comment as SPARC. So, you already know my concerns about your definition. I would summarize them like this:

Nothing in transcription makes sense except in the light of regulatory sequences.; Monday, January 29, 2007 3:38:00 AM
Larry Moran said...: John Wilkins says,

One philosopher - Lenny Moss - uses (and coined) the Gene-P and Gene-D terminology. It hasn't really caught on (yet).

I thought it was good to mention the distinction between the two basic gene concepts. I was basing it on the papers by Moss and also on those by Paul Griffiths and Karola Stotz, who also discuss the terms Gene-P and Gene-D. Perhaps you've heard of Paul Griffiths? :-)

(For the benefit of others, John Wilkins works with Paul Griffiths in Brisbane, Australia. Griffiths is a leading expert on the gene concept.); Monday, January 29, 2007 6:26:00 AM
Larry Moran said...: Martin (SPARC) says,

Nothing in transcription makes sense except in the light of regulatory sequences.

I agree with this sentiment. Regulatory sequences control the expression of the gene. Or, do you think of them as controlling the expression of part of the gene?

Just because regulatory sequences are important does not mean they have to be included in the definition of a gene.; Monday, January 29, 2007 6:31:00 AM
Larry Moran said...: Peter Ellis says,

I don't see why you exclude RNA genes though - you could simply alter the definition to read "nucleic acid sequence" rather than "DNA sequence". Currently you're left with the rather odd proposition that the smallpox virus has genes, but the SARS virus doesn't, for example.

I'm not "excluding" RNA genes—I'm simply relegating them to the category of exception to the rule. It's true that we could substitute "nucleic acid" for "DNA" in the definition but I think that weakens the definition considerably.

The tricky part about definitions in biology is that they can almost never be airtight. What we're usually looking for is a generality that conveys the truth about most of the things we're defining. In this case we're trying to describe a typical gene and in 99.99% of the cases, that gene is made of DNA.

The other problem is to reconcile a "definition" with general usage. While I agree with you that we could call an entire operon a "gene" this doesn't really make a lot of sense in light of the fact that nobody would ever agree with us. Like it or not, molecular biologists will continue to refer to the β-galactosidase gene and not the β-galactosidase fragment of the lac gene.; Monday, January 29, 2007 6:41:00 AM
Greg Laden said...: Larry,

This is a good definition of a gene. I like much of it.

I started to comment on the question of regulatory bits. (I agree with you) and ended up with comments extensive enough that I made my own post:

http://www.gregladen.com/wordpress/

I do think regulatory regions are not genetic any more than the boardroom at the Ford Plant over in Saint Paul is a pickup truck. You need lots of things to make a pickup truck, but those things do not become the pickup truck.

This is why the Verizon commercial is funny and not real life. All those people in "the network" following you around really do work for Verizon (well, they are actors, but...) but they are not part of your cell phone.

I happen to think the same of non-coding RNA consequences in the DNA, an idea that is either terribly old fashioned or very very modern. I'm still thinking about it.; Tuesday, January 30, 2007 9:27:00 AM
Anonymous said...: you guys are so clever :| im an undergraduate at the university of nottingham, England, studying BSc Biochemistry. my tutor set me an essay to write : "WHAT IS A GENE?". so confused :-(; Wednesday, October 15, 2008 12:38:00 AM
solitarybee said...: In defining a gene, it is easy to focus on the coding sequence that results in the functional protein. However, perhaps we should consider it as a system, where the trigger and feedback systems that modulate its activity are taken into account. It does, after all, sit in a context, and if you do take the gene out of its context, as in a naive attempt to GM an organism, the chances are you are heading for a fail. Of course, defining the full context is a bit of an art.; Monday, October 12, 2009 10:38:00 AM
Physeter said...: @Anonymous,
Thanks for your valuable and spammy advice.

Larry,
I like your definition, but I have a further question (hope you're still reading comments in old posts): what is a "function" in biology?; Tuesday, January 11, 2011 6:12:00 AM
Larry Moran said...: I don't know if I can come up with a catchy definition of "function." What I mean is that the transcript or its product has to do some biochemical duty in order to qualify. It doesn't have to be an essential function but it has to make a difference of some sort.; Tuesday, January 11, 2011 12:21:00 PM
Physeter said...: Thanks Larry. I understood what you meant, didn't want to imply you were ambiguous. I think the concept of biological function is an interesting issue by itself. Most of the definitions I've heard around are restricted to evolution, and more specifically to traits evolved by NS. That doesn't convince me.
If you find it to be interesting, take a look at the Wikipedia article. I can't make sense of the first paragraph ("part of a question"???).; Tuesday, January 11, 2011 5:26:00 PM
Paul said...: My two cents worth, the "raison d'être" for any maintained and active gene system is purely and simply an environmental 'intervention'.
Be it providing this indirectly as part of a support structure for a higher function (e.g. in photosynthesis), or directly in coding the actual active site as an enzyme.; Wednesday, January 12, 2011 9:48:00 AM
Tim Tyler said...: Genes should be what genetics studies - and genetics is the science of inheritance and variation in living organisms. Organisms do not *have* to use nucleic acids for inheritance - that is simply a local historical accident - so genes should not have to be made out of DNA.; Wednesday, October 12, 2011 10:43:00 PM
John McKenzie said...: Help please ... I am a complete amateur but interested. I am teaching myself about "Life" in general and how it came to be ... and am at a very basic level.

My question at the moment is from the studying I have done so far. I have ended up thinking of alleles as a collection of physical entities and genes as descriptive of the physical manifestation of the alleles' properties (i.e. not physical but descriptive of a pattern). Am I adrift?; Thursday, April 19, 2012 5:57:00 AM
PhillsBlog said...: A gene is something between a nucleotide and a chromosome. Do genes have well-defined boundaries? If not, then what is an allele?; Thursday, February 19, 2015 4:52:00 AM
Jathro said...: So am I to understand that the UTRs are part of the gene. They are both transcribed and even though not translated they have regulatory aspects that govern transcription and protein synthesis - thus fulfilling both parts of the definition - transcription and a functional product?

Thanks!; Wednesday, February 07, 2024 3:22:00 PM
Larry Moran said...: @Jathro: Yes. They are part of the gene whether they have regulatory functions or not.; Thursday, February 08, 2024 1:50:00 PM

