
Wednesday, February 08, 2012

Must a Gene Have a Function?

Biology is such a messy subject.[1] It's impossible to come up with simple definitions of fundamental concepts in biology because there are exceptions to everything. In the case of "gene," there are so many exceptions that it seems hopeless to propose a general definition of such an important term. Nevertheless, we need some basic ground rules to prevent the situation from getting out of hand.

In an earlier posting from 2007 [What Is a Gene?], I suggested the following ...
This essay describes various modern definitions of physical genes (Gene-D). I like to define a gene as “a DNA sequence that’s transcribed” but that’s a bit too brief for a formal definition. We need to include something that restricts the definition of gene to those entities that are biologically significant. Hence,

A gene is a DNA sequence that is transcribed to produce a functional product.

This eliminates those parts of the chromosome that are transcribed by accident or error. These regions are significant in large genomes; in fact, the confusion between accidental transcripts and real transcripts is responsible for the overestimates of gene number in many genome projects. (In technical parlance, most ESTs are artifacts and the sequences they come from are not genes.)
Let's not quibble about all of the exceptions. Most of them are covered in my original article and in the comments there. I want to concentrate here on the idea that a gene has to have a "function" of some sort. As I explained in the comments ....
I don't know if I can come up with a catchy definition of "function." What I mean is that the transcript or its product has to do some biochemical duty in order to qualify. It doesn't have to be an essential function but it has to make a difference of some sort.
This is important because there's a growing tendency to label all kinds of things as "genes" just because they produce small RNA molecules or, in some cases, a small protein. In most cases the products have no known biological function.

Here are a couple of examples.

De Novo Protein-Encoding Genes

It's plainly obvious that new genes must arise from time to time in various lineages. Lots of people are interested in the evolution of humans and in particular the changes that distinguish us from our closest cousins. Almost all of the changes can be explained by alterations in the timing or location of orthologous gene expression but that doesn't exclude the possibility that entirely new genes might arise de novo in some lineages.

Let's just think about genes that encode proteins. There are three steps required for the de novo creation of a new protein-encoding gene. (1) A part of the ancestral genome must be transcribed. (2) The transcript must contain an open reading frame with a start and stop codon. (3) The new protein must have a function.
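Step 2 is the only one of the three that can be checked from sequence alone. Here is a minimal sketch of that check, assuming a DNA-alphabet transcript, the standard stop codons, and the forward strand only. The function name `longest_orf` is made up for illustration and is not taken from any of the papers discussed here.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


def longest_orf(seq: str) -> str:
    """Return the longest ATG...stop open reading frame found in the
    three forward reading frames, or '' if there is none."""
    seq = seq.upper()
    best = ""
    for frame in range(3):
        start = None  # index of the first ATG since the last stop codon
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                orf = seq[start:i + 3]  # include the stop codon
                if len(orf) > len(best):
                    best = orf
                start = None
    return best


# A toy transcript with one short ORF in the third reading frame.
print(longest_orf("CCATGAAATTTTAGCC"))  # ATGAAATTTTAG
```

Passing this test says nothing about step 3. Any sufficiently long stretch of random sequence will contain open reading frames by chance, which is exactly why an ORF by itself can't establish that something is a gene.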

That last step needs explaining. If the new protein doesn't have a function then the putative new gene is no different than a pseudogene or a mutant gene that produces a truncated protein because of a premature stop codon. It's also indistinguishable from bits of the genome that are accidentally transcribed and just happen to have an open reading frame.

Wu et al. (2011) looked at the evolution of new genes in the lineage leading to humans. The title of their paper is: "De Novo Origin of Human Protein-Coding Genes." I want to challenge their definition of "gene" by suggesting that what they've really discovered are "potential" or "candidate" genes that don't deserve to be called "genes" until one discovers a biological function for their products.

The authors searched the human genome (build 56) for annotated "genes" with small open reading frames greater than 100 codons long. Then they examined the corresponding loci in the chimpanzee and orangutan genomes looking for cases where there was no open reading frame in the other apes. Various expressed RNA databases and two expressed peptide databases were screened to see if the candidate genes were expressed as RNA and protein. They found 27 examples. These are the candidates for de novo genes in humans.
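The screening logic described above can be written as a simple filter. This is only a sketch of the criteria as I've summarized them, not the pipeline Wu et al. actually ran. The `Locus` record, its field names, and the example entries are all hypothetical, and requiring both RNA and peptide evidence is my reading of "expressed as RNA and protein."

```python
from dataclasses import dataclass


@dataclass
class Locus:
    """Hypothetical summary of one annotated human locus."""
    name: str
    human_orf_codons: int
    chimp_has_orf: bool
    orangutan_has_orf: bool
    has_rna_evidence: bool
    has_peptide_evidence: bool


def is_de_novo_candidate(locus: Locus, min_codons: int = 100) -> bool:
    """True if the locus meets the screening criteria: a human ORF longer
    than min_codons, no ORF at the corresponding chimpanzee or orangutan
    locus, and expression evidence at both the RNA and peptide level."""
    return (
        locus.human_orf_codons > min_codons
        and not locus.chimp_has_orf
        and not locus.orangutan_has_orf
        and locus.has_rna_evidence
        and locus.has_peptide_evidence
    )


# Made-up example loci, for illustration only.
loci = [
    Locus("locusA", 150, False, False, True, True),   # passes every filter
    Locus("locusB", 150, True, False, True, True),    # ORF present in chimp
    Locus("locusC", 90, False, False, True, True),    # ORF too short
    Locus("locusD", 150, False, False, True, False),  # no peptide evidence
]
candidates = [locus.name for locus in loci if is_de_novo_candidate(locus)]
print(candidates)  # ['locusA']
```

Notice that nothing in the filter asks whether the product does anything. A locus that passes is a candidate, which is the point of the argument that follows.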

Their collection did not contain some of the de novo "genes" reported by others. As it turns out, those "genes" were annotated in previous versions of the human genome (builds 40-55) but were dropped from the latest versions because there were no homologues in the other ape genomes. By using those older builds, Wu et al. discovered another 33 candidates for a total of 60 putative new protein-encoding genes in the human genome.

Wu et al. concede that the expression levels of these candidate genes are "very low" but unfortunately they don't give us any specific levels. This is important because there's plenty of evidence that the expressed RNA databases contain spurious transcripts [How to Evaluate Genome Level Transcription Papers].

I wonder how many spurious peptides are in the peptide databases? Wu et al. report that one of the peptides used to identify an earlier example of a de novo gene (Knowles and McLysaght, 2009) has been removed from the current build of PeptideAtlas. What happened to it?

The authors are aware of the fact that function is important, especially if they want to argue that these new genes conferred some selective advantage on our hominid ancestors. The only "evidence" they offer is that the putative genes are expressed at a low level in testis and brains but at an even lower level in other tissues. This is no evidence at all since we've known for fifty years that the complexity of RNA sequences in brain and testis is much higher than in other tissues. We still don't know whether that's due to elevated spurious transcription in those tissues or whether it is biologically significant.

Are these 60 candidates really new "protein-coding genes"? I don't think so. I don't think they can be called "genes" until it has been demonstrated that the products have a biological function. Guerzoni and McLysaght (2011) seem to agree because they write,
The observation by Wu et al. that some of the candidate de novo genes are expressed at their highest in brain tissues and testis is interesting, but by no means proves they are functional. A major challenge remains to demonstrate functionality of the de novo genes.

Genes that Encode Functional RNAs

The people who annotate the human genome are somewhat skeptical of these new genes and that's why so many putative genes have disappeared from the more recent builds. (But the Ensembl group still lists 434 "novel protein-coding genes.")

However, they don't seem to be as skeptical when it comes to genes that produce small RNAs. The most recent Ensembl build (GRCh37.p5, Feb 2009), for example, lists 12,523 RNA genes [Ensembl: Human Genome].

What are the criteria they use to prove that these are really genes? It can't have anything to do with biological function since it's simply not true that the human genome contains more than twelve thousand genes that produce an RNA whose function has been demonstrated.

Should that be a requirement before declaring that a bit of transcribed DNA is a gene? You're damn right it should because otherwise every bit of DNA that's accidentally transcribed in some tissue at some time during development qualifies as a gene. That makes no sense [What is a gene, post-ENCODE?].

1. That's why it's much more difficult than physics where there's talk about unifying the entire discipline under a single theory of everything. :-)

Guerzoni, D. and McLysaght, A. (2011) De novo origins of human genes. PLoS Genet. 7(11):e1002381. [PLoS Genetics]

Knowles, D.G. and McLysaght, A. (2009) Recent de novo origin of human protein-coding genes. Genome Res. 19:1752-1759. [doi: 10.1101/gr.095026.109]

Wu, D.D., Irwin, D.M., and Zhang, Y.P. (2011) De novo origin of human protein-coding genes. PLoS Genet. 7(11):e1002379. [PLoS Genetics]


FitzRoy said...

To qualify as a "function" within this definition, must a character be subject to natural selection? For example, there are evidently so many bits of transcribed RNA floating around that no doubt there are some that, just by chance, will have a measurable biochemical effect on something else somewhere in the cell. However, if the effect of the RNA was not selected for in the first place, and if it is not subject to selection because the effect in question has no impact on reproductive success -- then does this piece of RNA have a "function" within the contemplation of the definition?

Larry Moran said...

Everything is subject to natural selection but that's probably not what you meant.

You meant to ask whether it has to be adaptive. The answer is no. There are many identical copies of some genes and it's likely that the extra copies are redundant. That's why they usually "die" by mutation.

Peter said...

Given this very stringent definition of "function" (with which I largely agree), is your earlier challenge here really a fair one?

I suspect that even if you restrict yourself to the primary splice isoforms of genes conserved across several species, there just isn't enough evidence out there yet to demonstrate function for more than a fraction of them, unless you allow yourself to infer function from sequence conservation.

Conversely, even where there's a demonstrable biological effect, it can be arguable as to whether that constitutes a function. Consider the case of a newly-evolved miRNA that alters the expression levels of a few dozen protein-coding genes and has some phenotypic consequence when deleted. Can we say that this miRNA truly has a function? Or is it rather that the miRNA is a bit of rubbish left over from RNA processing, which interferes with the expression levels of a number of genuinely functional genes? The fact that the functional genes subsequently evolved workarounds to maintain their function in the presence of the interfering detritus doesn't mean that the detritus itself plays a meaningful role.

gillt said...

But it's always the exceptions to the rule--human-specific duplications with implications for disease and evolution--that make for good research projects and likely create a bias among researchers about function.

John Harshman said...

Since the best (or at least the easiest) test for functionality is evolutionary conservation, it would probably be a good idea if you're looking for de novo genes not to look at a terminal node but at some internal node, old enough that we could distinguish neutral evolution from purifying selection. I would suggest looking for sequences present and conserved in hominids (sensu lato) but not present in other primates.

Anonymous said...


"Genes that Encode Functional RNAs"