Sandwalk: Michael White's misleading history of the human gene

Saturday, May 03, 2014

There are many ways of defining the gene but only some of them are reasonable in the 20th and 21st centuries [What Is a Gene?]. By the 1980s most knowledgeable biologists were thinking of a gene as a DNA sequence that's transcribed to produce a functional product.

They were familiar with genes that encode proteins and with a wide variety of genes that produce functional RNAs, such as ribosomal RNA, transfer RNA, regulatory RNAs, and various catalytic RNAs. It would have been difficult to find many knowledgeable biologists who thought that all genes encoded proteins.

By the 1980s, most knowledgeable biologists were aware of RNA processing. They knew that the primary transcripts of genes could be modified in various ways to produce the final functional form. They knew about alternative splicing. All these things were taught in undergraduate courses and written in the textbooks.

Here's how Michael White views that history in Your Genes Are Obsolete.

As the century progressed, biologists came to see genes as real physical objects. They discovered that genes have a definite size, that they are linearly arrayed on chromosomes, that individual genes are responsible for specific chemical events in the cell, and that they are made of DNA and written in the language of the Genetic Code. By the time the Human Genome Project was initiated in 1988, researchers knew that a gene was a segment of DNA with a clear beginning and end and that it acted by directing the production of a particular enzyme or other molecule that did a specific job in the cell. As real things, genes are countable, and in 1999 biologists estimated that humans had "80,000 or so" of them.

If he means that knowledgeable researchers knew about genes for functional RNAs (e.g. ribosomal RNA genes) then he is right. If he thinks that knowledgeable researchers thought that all genes encoded proteins, then he's wrong.

As for the number of genes, I've addressed this in False History and the Number of Genes, and Facts and Myths Concerning the Historical Estimates of the Number of Genes in the Human Genome.

There may have been researchers who speculated about the number of genes in the human genome but surely the only estimates that count are those from scientists who were knowledgeable about the subject. Those experts expected about 30,000 genes based on genetic load arguments and data from the early 1970s on the amount of DNA that was unique. Most of those researchers were expecting humans to have about the same number of genes as fruit flies, or maybe a few thousand more.

Michael White continues ...

Yet, when the dust from the Human Genome Project cleared, we didn’t have nearly as many genes as we thought. By the latest count, we have 20,805 conventional genes that encode enzymes and other proteins. Our inflated gene count, though, wasn’t the only casualty of the Human Genome Project. The very idea of a gene as a well-defined segment of DNA with a clear functional role has also taken a hit, and as a result, our understanding of our relationship with our genes is changing.

There are about 21,000 protein-encoding genes in our genome and several thousand more genes that produce functional RNAs. The numbers may be a bit lower than most experts thought, but not by much. No great surprises there unless you count those people who made speculative guesses without knowing the data from the 1960s and 1970s.

And, there weren't many surprises about defining a gene either.

One major challenge to the concept of a gene is the growing evidence that many genes are shapeshifters. Instead of a well-defined segment of DNA that encodes a single protein with a clear function, we should view a gene as "a polyfunctional entity that assumes different forms under different cellular states," according to University of Washington biologist John Stamatoyannopoulos. While researchers have long known that genes are made up of discrete subunits called "exons," they hadn’t realized until recently the degree to which exons are assembled—like Legos—into sometimes thousands of different combinations. With new technologies, biologists are cataloging these various combinations, but in most cases they don’t know whether those combinations all serve the same function, different functions, or no function at all.

Maybe some people didn't know about RNA processing and alternative splicing but many of us did. No surprises there.

We don't know how many genes in the human genome are "shapeshifters" but there's a growing realization that many splice variants are just biological noise due to errors in splicing. Those variants have no biological function. The point is that the definition of "gene" wasn't affected by any discoveries by those who sequenced the human genome or by the ENCODE Consortium.

Our concept of a gene is also challenged by the fact that much of the function in our DNA is located outside of conventionally defined genes. These "non-coding" functional DNA segments regulate when and where conventional protein-coding genes operate. For our biology, non-coding regulatory DNA elements are as consequential as genes, but their properties are even more difficult to define because their function isn’t based on the well-understood Genetic Code and their boundaries are even fuzzier than gene boundaries.

No surprises there either. Knowledgeable researchers have known about regulatory sites since the 1960s. Most of them don't incorporate regulatory regions into the definition of "gene." Every gene is going to be associated with regulatory regions that regulate transcription.

I don't see why this well-known fact makes the definition of "gene" obsolete.

As a result, non-coding regulatory DNA elements are much more difficult to count. One consortium of researchers put the number of regulatory DNA segments in the human genome between 580,000 and 2.9 million, while just last month a different consortium claimed that there are only 43,000. Regardless of how you count them, it’s clear that these non-gene regulatory DNA elements far outnumber conventional genes. It is hard not to wonder, then, what good is the concept of a gene if it doesn’t include most of our functional DNA?

I think it's totally unreasonable to speculate that every gene would have 20 different regulatory sites scattered around the genome as some of those numbers suggest. If there were only a few near the promoter then this is exactly what we've known for decades and there's no reason to redefine a gene.
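As a rough check on the ratios implied by the figures quoted above, here is a minimal sketch. It assumes the rounded count of about 21,000 protein-encoding genes and ignores the several thousand RNA genes, which would lower each ratio somewhat. The names `ESTIMATES`, `RATIOS`, and `elements_per_gene` are hypothetical, chosen only for illustration.

```python
# Figures as quoted in the post. The gene count is the rounded
# "about 21,000" protein-encoding genes; RNA genes are left out (assumption).
PROTEIN_CODING_GENES = 21_000

# Quoted counts of regulatory DNA segments in the human genome.
ESTIMATES = {
    "first consortium, low end": 580_000,
    "first consortium, high end": 2_900_000,
    "second consortium": 43_000,
}


def elements_per_gene(elements, genes=PROTEIN_CODING_GENES):
    """Average number of regulatory elements per gene for a given estimate."""
    return elements / genes


# Ratio for each estimate, rounded to one decimal place.
RATIOS = {
    label: round(elements_per_gene(count), 1)
    for label, count in ESTIMATES.items()
}

for label, ratio in RATIOS.items():
    print(f"{label}: about {ratio} regulatory elements per gene")
```

On these assumptions the larger estimates imply dozens of regulatory elements per gene or more, while the smallest implies roughly two, which is much closer to the long-standing picture of a few sites near each promoter.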

Finally, I don't know what Michael White was thinking but I've never heard any knowledgeable scientist say that all functional DNA has to be in "genes." So, what's the problem? If the definition of "gene" wasn't made obsolete with the decades-old discoveries of origins of replication, regulatory sequences, telomeres, and centromeres then what's changed in the past decade?

I don't get it. Why are so many prominent scientists saying that we need to redefine the word "gene"?

106 comments:

Jonathan Badger, Saturday, May 03, 2014 10:16:00 PM
Personally I blame the human genetics types. Because of practical and ethical reasons, in many ways the field is not as advanced as it is in model plants, animals and microbes. So the human data is noisy and confusing. Because people like to think they are special, they assume this noise must be all meaningful signal making us way more complicated than other organisms. To accept that we are not would be devastating to human exceptionalism.

SPARC, Sunday, May 04, 2014 2:45:00 AM
Unfortunately, splice variant databases contain many dubious entries that may be caused by reverse-transcribed, incompletely processed pre-mRNAs. Thus, the relevance of alternative splicing appears to be overrated (at least to someone who remembers single identical Northern bands from every tissue he analyzed and is now confronted with allegedly more than 10 alternative transcripts of the very same gene).
A recent paper in Genome Biology points in the other direction:
"Background
RNA sequencing has opened new avenues for the study of transcriptome composition. Significant evidence has accumulated showing that the human transcriptome contains in excess of a hundred thousand different transcripts. However, it is still not clear to what extent this diversity prevails when considering the relative abundances of different transcripts from the same gene.

Results
Here we show that, in a given condition, most protein coding genes have one major transcript expressed at significantly higher level than others, that in human tissues the major transcripts contribute almost 85 percent to the total mRNA from protein coding loci, and that often the same major transcript is expressed in many tissues. We detect a high degree of overlap between the set of major transcripts and a recently published set of alternatively spliced transcripts that are predicted to be translated utilizing proteomic data. Thus, we hypothesize that although some minor transcripts may play a functional role, the major ones are likely to be the main contributors to the proteome. However, we still detect a non-negligible fraction of protein coding genes for which the major transcript does not code a protein.

Conclusions
Overall, our findings suggest that the transcriptome from protein coding loci is dominated by one transcript per gene and that not all the transcripts that contribute to transcriptome diversity are equally likely to contribute to protein diversity. This observation can help to prioritize candidate targets in proteomics research and to predict the functional impact of the detected changes in variation studies."

SRM, Sunday, May 04, 2014 9:16:00 AM
Finally, I don't know what Michael White was thinking but I've never heard any knowledgeable scientist say that all functional DNA has to be in "genes."

Maybe it's as simple as that. For decades a somewhat sloppy conceptualization regarding the molecular basis of life called genes the "blueprint" for making the organism. When it dawns on writers that there is more to phenotype than genes (coding regions), they think the concept of the gene must then be expanded to include all functional regions of the genome, hence a blurring of the concept of gene. This would seem to have the eventual effect of rendering the word "gene" rather meaningless except perhaps to distinguish between functional and non-functional regions of DNA. But there are already words for that and for non-coding regions of DNA such as promoters, TF binding sites, origins of replication, etc.

Another possible factor is that anytime a firm definition is proffered in biology, one will eventually encounter a circumstance that suggests a need for caveats or qualifications to the definition. These exceptions seem to trouble people excessively at times.

Unknown, Sunday, May 04, 2014 7:40:00 PM
Larry, I am new to the Sandwalk but I must admit that I like it. I have commented on many blogs, but I have also been blocked from many (e.g. Lifesitenews) simply because I presented evidence that disagreed with them. However, you continue to allow people like Quest to comment (to his intellectual embarrassment) and when you do delete a comment, you are transparent about it. The ID and religious sites are never that open about it.

Rosie Redfield, Sunday, May 04, 2014 8:37:00 PM
After reading James Gleick's The Information, I now teach that genes are primarily informational constructs, not structures. (Well, of course, all of genetics is understanding the intersections between physical molecules and information.)

Robert Byers, Sunday, May 04, 2014 10:23:00 PM
The error here in classification is that, according to evolutionists, the only option for genetic change is evolution.
The Bible says there are kinds, fixed and settled.
Yet within the kinds, as shown by people, diversity can be great.
So a species is just a moment in time. There are no species, just as people are not species, though they often differ more than species do in biological classifications.
That's why the terms don't work.
There are KINDS but no species. There are no species changing until they become a very new thing.
No reason to see it that way.

Unknown, Monday, May 05, 2014 12:07:00 PM
A few points:

1) Gene counts: It's great that some people had better estimates of the number of human genes before the Human Genome Project, but the larger estimates were widely believed, particularly in the molecular biology circles I was in when the draft genome sequence was published. It sounds like those with better ties to the classical genetics/evolution community believed more accurate estimates, while the molecular bio people, including the leaders of the HGP, were going with the inflated number. My 1998 Lodish et al text puts the number at 60,000-100,000.

2) The point of my piece is that the definition of a gene was broad at first (something that would have included cis-regulatory elements) but it developed into something much more restricted by the end of the 20th century, at least in the community associated with genome sequencing projects. That more restricted definition, the molecular concept of a gene as an ORF or RNA gene, was a big motivator behind genome sequencing projects: we would solve the organism largely by assigning a function to each gene, first in yeast and other model organisms, and eventually in humans, as described here: http://www.ncbi.nlm.nih.gov/pubmed/15451511

3) Maybe someone here knows the answer: did anyone writing before 2000 expect that there was substantially more conserved regulatory DNA than coding DNA? These days, there is a big emphasis on regulatory genomics, which, as far as I can tell, was not expected 20 years ago.

4) You may not like John Stam’s estimate of 580,000 functional regulatory elements, but it’s not the result of idle speculation, it’s the result of associating DHS with specific genes. You may not believe the result (personally I favor the FANTOM estimate of 43,000), but it’s a serious attempt to put a number on regulatory DNA.

Tom Mueller, Tuesday, May 06, 2014 10:13:00 AM
Hi John, Hi Piotr

Let’s return to Piotr’s original question:

Tom, what's the advantage of using the term basal outgroup rather than simply sister clade?

To recap: what do I mean by “basal”? I am referring to nodes… and please remember I commenced this line of inquiry with the caveat “advocatus diaboli”.

I notice that august publications no less than PNAS can publish articles entitled
Evaluating hypotheses of basal animal phylogeny using complete sequences of large and small subunit rRNA

... which seem to use the term “basal” in conjunction with “clades” along the same lines I do.

I hope I am not confused… that said, perhaps I should in future be more circumspect when bandying technical vocabulary.

Let’s reboot:

Is it possible to construct a phylogenetic tree from molecular data with nested nodes? Yes!

Do some nodes represent earlier branch points than others? Yes!

As a matter of fact, so-called “outgroups” used to “root” phylogenetic trees often cannot be presumed but rather need to be deduced. False presumptions about outgroups can really mess up data analysis, if I understand the theory correctly. In other words, rooting trees using outgroups is problematic if our assumptions about the status of the outgroups are incorrect.

I hope we agree so far.

I employed the term “basal outgroup” exactly along these lines. Perhaps, I need to be a little more careful how I express myself in class.

I refer you to an activity provided by Professor David Hillis for his textbook Principles of Life.

I also have my students complete Hillis’ “Working with Data” worksheet in class: http://tinyurl.com/o6bfp2f

Let’s try this one more time using Vertebrate cladograms.

Humans would be an appropriate “out group” to root Echinoderm Phylogenetic Trees.

Sea Cucumbers would be an appropriate “out group” to root Vertebrate Phylogenetic Trees.

I guess my question would be:

Can jawless fish represent an appropriate “out group” to root a Vertebrate phylogenetic tree that included the “remaining” vertebrate clades, Gnathostomata, Osteichthyes, Choanata & Tetrapoda?

BUT on the other hand

Gnathostomata would NOT be an appropriate “out group” to root a phylogenetic tree that included all the “remaining” vertebrate clades as well as the clade that comprised Jawless Fish?!

That was my intent when stating basal outgroup rather than simply sister clade as Piotr suggested.

Furthermore, I humbly suggest that yes, it is reasonable to speculate which differences between deuterostomes, lophotrochozoans, and ecdysozoans are derived and in which taxon…

Please tell me I am not confused, or I will need to seek the nearest brick wall to bounce my head against.

I thank you all for your patience and for your indulgence.

Tom Mueller, Tuesday, May 06, 2014 2:53:00 PM
Just to set this discussion in historical context: Jacob and Monod in their original paper employed “operator locus” and “operator gene” interchangeably.

https://www.pasteur.fr/ip/resource/filecenter/document/01s-000046-03t/genetic-regulatory.pdf

The 1968 edition of Levine’s Genetics specifically refers to “operator gene”:

http://tinyurl.com/o55pumj

I found 23 hits for “operator gene” in PNAS, the last one being from 1973:
http://tinyurl.com/mbg3hl8

Searching Google Scholar, I can find the specific term “operator gene” (albeit with decreasing frequency) in the 1970s, 1980s, 1990s and even beyond.

I can even search for the term under the restriction “since 2010”: http://tinyurl.com/l2u8mmp

Finally, the current Wikipedia article on Operon cites “operator gene” no less than 3 times.

http://en.wikipedia.org/wiki/Operon

I recognize that Larry is correct when he explains that current consensus favors a more modern vernacular, invoking cis-Regulatory Elements (CREs) as opposed to “genes”.

http://en.wikipedia.org/wiki/Cis-regulatory_element

My contention is unambitious: I merely state that it wasn’t always so, and that consensus on this change in terminology does not yet appear unanimous.