Sunday, July 10, 2016

What is a "gene" and how do genes work according to Siddhartha Mukherjee?

It's difficult to explain fundamental concepts of biology to the average person. That's why I'm so interested in Siddhartha Mukherjee's book "The Gene: an intimate history." It's a #1 bestseller so he must be doing something right.

My working definition of a gene is based on a blog post from several years ago [What Is a Gene?].
A gene is a DNA sequence that is transcribed to produce a functional product.
This covers two types of genes: those that eventually produce proteins (polypeptides); and those that produce functional noncoding RNAs. This distinction is important when discussing what's in our genome.

My definition of a gene, which is shared by many scientists, includes introns. In the case of protein-coding genes it includes the parts of the gene specifying untranslated sequences at the ends of an mRNA molecule (5′-UTRs and 3′-UTRs). Thus, protein-coding genes make up 25-30% of our genome. Most of that fraction is noncoding and most of it is junk. Coding regions are only 1.25% of the human genome. It is misleading to say that genes make up only 2% of our genome.
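To put those percentages into base pairs, here is a rough back-of-the-envelope sketch in Python. It assumes a genome size of 3.2 billion bp (the figure used later in this post). The function name and the choice to evaluate both ends of the 25-30% range are illustrative only.

```python
GENOME_BP = 3_200_000_000  # assumed genome size: 3.2 billion base pairs


def fraction_to_bp(percent, genome_bp=GENOME_BP):
    """Convert a percentage of the genome into a number of base pairs."""
    return round(genome_bp * percent / 100)


coding_bp = fraction_to_bp(1.25)    # coding regions only
genes_low_bp = fraction_to_bp(25)   # protein-coding genes, low end of range
genes_high_bp = fraction_to_bp(30)  # protein-coding genes, high end of range

print(f"coding regions: {coding_bp:,} bp")
print(f"protein-coding genes (with introns and UTRs): "
      f"{genes_low_bp:,} to {genes_high_bp:,} bp")
# Share of gene sequence that is actually coding, at each end of the range
print(f"coding share of gene sequence: "
      f"{coding_bp / genes_high_bp:.1%} to {coding_bp / genes_low_bp:.1%}")
```

Under these assumptions only about 4-5% of the sequence inside protein-coding genes is coding, which is why "genes" and "coding regions" give such different percentages.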

My definition of a gene does not include the regulatory regions that control gene expression. These sequences play an essential role in the genome and they are largely responsible for the differences between closely related species. Mutations in those non-gene regions often cause genetic diseases in humans—a major topic in Mukherjee's book.

His book has a short glossary where you can find the following definition of a gene (p. 499) ...
Gene: A unit of inheritance, normally comprised of a stretch of DNA that codes for a protein or for an RNA chain (in special cases, it might be carried in RNA form).
This isn't as clear as I would like but maybe it's okay for a general audience. There are lots of units of inheritance that aren't genes—regulatory regions are prime examples—but maybe the average person doesn't need to know this? Unfortunately, the discussion in the main text doesn't stick to the glossary definition. It focuses almost exclusively on protein-coding genes. There are several figures based on a diagram like this ...

This view, according to Mukherjee, is Crick's Central Dogma. He writes on page 169,
Crick was referring to the striking universality of the flow of genetic information throughout biology. From bacteria to elephants—from red-eyed flies to blue-blooded princes—biological information flowed through living systems in a systematic, archetypal manner: DNA provided instructions to build RNA. RNA provided instructions to build proteins. Proteins ultimately enabled structure and function—bringing genes to life.
This is, of course, an incorrect description of Crick's Central Dogma1 but, more importantly, it's a restricted definition of genes and information. Is this misrepresentation excusable when writing for a general audience? Does the average reader need to know that there's information outside of genes and that some genes don't encode proteins?

Mukherjee doesn't ignore introns. He describes them on pages 219-220 but it's not clear that he considers them to be a part of a gene. What is clear is that he has bought into the idea that introns have a purpose. According to Mukherjee, introns allow shuffling of protein-coding regions to create "a vast number of variant messages—called isoforms—out of a single gene." This is consistent with the message in most of the book. Like many scientists, Mukherjee adores adaptive explanations. You will not find anything in this book that suggests evolution by accident.

This includes the DNA between genes. Mukherjee says they are there to regulate genes (p. 220).

You might be wondering if Mukherjee addresses the human genome and the controversy over junk DNA. The answer is "no." He doesn't give his readers much information on this topic. The relevant chapter2 is the one beginning on page 322 "The Book of Man (in Twenty-Three Volumes)."3 It's five pages of bullet points.

Let's look at some of them.
  • It has 3,088,286,401 letters of DNA (give or take a few).
The actual size is 3.2 billion base pairs [How Big Is the Human Genome?]. The amount of DNA that has actually been sequenced and organized into scaffolds will depend on the build—the latest ones cover about 92% of the genome [How Much of Our Genome Is Sequenced?].

The exact number of bases isn't important to the average reader but if you are going to include it in your book shouldn't it be the correct value?
  • It encodes about 20,687 genes in total—only 1,796 more than worms, 12,000 fewer than corn, and 25,000 fewer than rice or wheat. The difference between "human" and "breakfast cereal" is not a matter of gene numbers, but of sophistication of gene networks. It is not what we have; it is how we use it.
It sounds like Siddhartha Mukherjee might have a mild case of The Deflated Ego Problem. Both humans and breakfast cereal have sophisticated gene networks so that's not really a significant difference. The difference is in how and when genes are expressed but also, in this case, in the types of genes in the genome. Humans and rice plants have thousands of different genes that aren't shared.

There are about 25,000 genes in the human genome.4 All mammals have about the same number of genes and they all have pretty much the same genes. The differences between whales, bats, elephants, and humans are largely due to differences in when and where developmental genes are expressed during embryogenesis. It's not what all these species have, it's how they use it that makes most of the difference. Humans are not special.

I think it's time to stop being surprised by the fact that some species might have more genes than we do and time to explain why some plants might have more genes. And it's time to stop saying that humans might have a more sophisticated way of controlling their genes. Non-experts5 might have been surprised by the low number of genes back in 2001 but that was 15 years ago. Get over it.

If your ego has been deflated by the fact that we don't have lots more genes than breakfast cereal, then you'd better come up with an explanation other than the fact that you just don't understand evolution. I listed the seven most common rationalizations. One of them is alternative splicing. Another is "sophisticated" and highly precise gene regulation. Mukherjee goes part way down the path of using some of these rationalizations to explain his disappointment at our low number of genes ....
  • It [the human genome] is fiercely inventive. It squeezes complexity out of simplicity. It orchestrates the activation or repression of certain genes in only certain cells and at certain times, creating unique contexts and partners for each gene in time and space, and thus produces near-infinite functional variation out of its limited repertoire. [all multicellular species do this - LAM] And it mixes and matches gene modules—called exons—within single genes to extract even further combinational diversity out of its gene repertoire. These two strategies—gene regulation and gene splicing—appear to be used more extensively in the human genome than in the genomes of most organisms. More than the enormity of gene numbers, the diversity of gene types, or the originality of gene function, it is the ingenuity of our genome that is the secret to our complexity.
This is false. First, we are not significantly more complex than whales, bats, and elephants and not more complex than fruit flies that can fly and can exist in two very different forms: adult and larva.

Second, gene regulation in humans is no different than gene regulation in other species.

Third, alternative splicing exists but it only affects a small number of genes and, for the most part, those genes are also alternatively spliced in all other mammals. The idea that most human genes are alternatively spliced to produce different functional proteins is certainly false. And the idea that only humans can do this is even more false!

What about junk DNA? Here's another bullet point ...
  • Genes, oddly, comprise only a minuscule fraction of it. An enormous proportion—a bewildering 98 percent—is not dedicated to genes per se, but to enormous stretches that are interspersed between genes (intergenic DNA) or within genes (introns). These long stretches encode no RNA [introns? - LAM], and no protein; they exist in the genome either because they regulate gene expression, or for reasons that we do not yet understand, or because of no reason whatsoever (i.e. they are "junk" DNA).
Remember that expert scientists have known that most of our genome is junk for over 40 years. Isn't it time we stopped telling the general public that this is "odd" or "bewildering"?

This is the only attempt at explaining junk DNA and the idea that much of our genome could be there for "no reason." I wonder what the average person thinks when they are told, once again, that genes make up only 2% of our genome. I bet they focus on the idea that much of the rest is devoted to regulation and that we just don't understand what else is going on. This is misleading.

It's 2016 and we know a lot about noncoding DNA and a lot about how much of our genome is junk. Isn't it time we explain this to the general public?

Why doesn't Siddhartha Mukherjee do this when he's got the chance?
  • Although we fully understand the genetic code—i.e., how the information in a single gene is used to build a protein—we comprehend virtually nothing about the genomic code—i.e., how multiple genes spread across the human genome coordinate gene expression in space and time to build, maintain, and repair a human organism. The genetic code is simple: DNA is used to build RNA, and RNA is used to build a protein. A triplet of bases in DNA specifies one amino acid in a protein. The genomic code is complex: appended to a gene are sequences of DNA that carry information on when and where to express the gene. We do not know why certain genes are located in particular geographical locations in the genome, and how tracts of DNA that lie between genes regulate and coordinate gene physiology. There are codes beyond codes, like mountains beyond mountains. [my emphasis - LAM]
I've been writing textbooks on biochemistry and molecular biology for 30 years and I've been reading textbooks for much longer than that. All those books contain plenty of information on the regulation of gene expression. We know a heck of a lot about transcription factors and DNA binding and we know a heck of a lot about why some genes are expressed in some cells and not in others.

Why would Siddhartha Mukherjee give his readers the impression that this is a big mystery? Do you agree with him?

I'm interested in whether the general public—and most science journalists—are being told the truth about the human genome. I think they are being given a very false view of the modern state of knowledge about the regulation of gene expression and the amount of junk in our genome. Here's how Mukherjee sums up his views in the epilogue (pp. 486-487) ...
Three enormous projects lie ahead for human genetics. All three concern discrimination, division, and eventual reconstruction. The first is to discern the exact nature of information in the human genome. The Human Genome Project provided the starting point for this inquiry, but it raised a series of intriguing questions about what, precisely, is "encoded" by the 3 billion nucleotides of human DNA. What are the functional elements in the genome? There are protein-coding genes, of course—about twenty-one to twenty-four thousand in all—but also regulatory sequences of genes, and stretches of DNA (introns) that split genes into modules. There is information to build tens of thousands of RNA molecules that do not get translated into proteins but seem to perform diverse roles in cellular physiology. There are long highways of "junk" DNA that are unlikely to be junk after all and may encode hundreds of yet-unknown functions. There are kinks and folds that allow one part of the chromosome to associate with another in three-dimensional space.

To understand the role of each of these elements, a vast international project, launched in 2013 (sic), hopes to create a compendium of every functional element in the human genome—i.e., any part of any sequence in any chromosome that has a coding or instructional function. Ingeniously termed the Encyclopedia of DNA Elements (ENC-O-DE), this project will cross-annotate the sequence of the human genome against all the information contained within it.

Once these functional "elements" have been identified, biologists can move to the second challenge: understanding how the elements can be combined in time and space to enable human embryology and physiology, the specification of anatomical parts, and the development of an organism's features and characteristics. One humbling fact about our understanding of the human genome is how little we know of the human genome: much of our knowledge of our genes and their functions is inferred from similar-looking genes in yeast, worms, flies, and mice.
I think this is very misleading. Perhaps it's just a case of Mukherjee seeing the glass half empty whereas I see it as half full. He focuses on all the things we don't know whereas I think he is giving short shrift to everything we do know.

Is this all, or is there something else going on? Is it possible that Mukherjee doesn't know enough about genomes and gene regulation to have an informed opinion?

No matter what the reason, the public is being misinformed about the state of knowledge in biochemistry, molecular biology, developmental biology, and genomics. This book is being bought—and presumably read—by a huge number of people. Most reviews are glowing.

Some reviewers have enthusiastically embraced Mukherjee's point of view. For example, here's what Nathaniel Comfort wrote in The Atlantic [Genes Are Overrated].
Ironically, the more we study the genome, the more “the gene” recedes. A genome was initially defined as an organism’s complete set of genes. When I was in college, in the 1980s, humans had 100,000; today, only about 20,000 protein-coding genes are recognized. Those that remain are modular, repurposed, mixed and matched. They overlap and interleave. Some can be read forward or backward. The number of diseases understood to be caused by a single gene is shrinking; most genes’ effects on any given disease are small. Only about 1 percent of our genome encodes proteins. The rest is DNA dark matter. It is still incompletely understood, but some of it involves regulation of the genome itself. Some scientists who study non-protein-coding DNA are even moving away from the gene as a physical thing. They think of it as a “higher-order concept” or a “framework” that shifts with the needs of the cell. The old genome was a linear set of instructions, interspersed with junk; the new genome is a dynamic, three-dimensional body—as the geneticist Barbara McClintock called it, presciently, in 1983, a “sensitive organ of the cell.”

The point is not that this is the correct way to understand the genome. The point is that science is not a march toward truth. Rather, as the author John McPhee wrote in 1967, “science erases what was previously true.” Every generation of scientists mulches under yesterday’s facts to fertilize those of tomorrow.
It's going to take a lot of work to convince the readers of The Atlantic that a lot of "old science" is still valid and there's nothing wrong with the old definition of a gene.


1. The Central Dogma of Molecular Biology.

2. Chapters are not numbered.

3. That would be 22 autosomes, plus one X chromosome plus one Y chromosome = 23!!!

4. We don't have a very good estimate for the total number of genes that specify noncoding RNAs.

5. False History and the Number of Genes 2010 and Facts and Myths Concerning the Historical Estimates of the Number of Genes in the Human Genome

38 comments :

Tim Tyler said...

RNA viruses mean that any definition of "gene" based on DNA is already obsolete. Past, future, and alien organisms might have other, different genetic substrates. At this stage, tying your definition of "gene" and "genetic" to nucleic acids seems very parochial.

Larry Moran said...

I'm happy to be taught the meaning of "parochial" from an expert like you.

Did you bother to read my blog post?

... I didn't think so.

SRM said...

At this stage, tying your definition of "gene" and "genetic" to nucleic acids seems very parochial.

What stage are we in, exactly? Did I miss another paradigm-shifting transition?

Marcoli said...

I rather like your definition of a gene, but here is my somewhat different definition of a gene:
A gene is a region of DNA that is used to make a functioning molecule of RNA.
This definition of a gene includes regions of DNA that regulate transcription (promoters and so on). What I like about this aspect of the definition is that it permits one to include mutations affecting regulatory regions in the list of mutations of genes. This is something that people do anyway, and so we have consistency here. What I don't like about this definition again has to do with gene regulatory regions. These can be hard to identify, they can be dispersed, and they can control more than one 'gene'. I must allow that some things are just going to be hard to define in concrete terms.

Robert Byers said...

I only recently discovered most of biology had the same number of genes and then it was said it was how they were used.
This fits a common blueprint concept from a creator quite well.
In fact one would think evolutionism would welcome or easily accommodate if biological entities had as many different number of genes. Not a common number.

Anyways its how they are used. or rather the issue of information.
so information systems is the glory of genes and not the genes themselves.
this welcome again to ID/YEC because the information origin is suggestive of the creator.
anyways.
I speculate, repeat speculate, why is it not a better concept to say genes are memory bytes like in computers? I mean why not genes be seen as just pieces of memory. Whether information or systems it all comes down to them having a fixed conclusion for operation/function. This to me is just memory.
so from these tiny memories bits then mans memory is just another manifestation of this organization.
We are very defined by our memory and so possibly its just a simple extension of the memory bits called genes.

DK said...

@Tim Tyler
All known functional transcription happens from DNA. Hence, if it never makes it into DNA, it's not a gene.

Also, go easy on "different genetic substrates" fantasies. Wild imagination =/= science.

Fré Hoogendoorn said...

I must admit that I have been hesitating over whether I want to read Mukherjee's book, as it does not seem to represent the current state of knowledge of what a gene is and does very well. I've been reading Larry's blog for quite some time, and he has certainly shaped my view of the subject to quite an extent.

One thing in this post that was new to me, however, was that Larry's definition of a gene "does not include the regulatory regions that control gene expression". I would have thought that regulatory regions could only function if they are expressed as RNA, which surely would be a functional product and would hence fall under the definition of a gene. Could someone explain this, or give a link to where this is explained?

Corneel said...

Hi Fré,

A good place to start learning about regulation of gene expression is the lac-operon (which was the first instance of gene regulation to be characterised). You can find it in any textbook, but of course there's a page on Wikipedia as well.

Graham Jones said...

See https://en.wikipedia.org/wiki/Promoter_(genetics)

Tim Tyler said...

Not much attempt at a defense, instead, ad hominem. The RNA world is not "wild imagination". We can be pretty sure that our ancestors had a non-DNA genetic substrate at some point. It would be nice to say that these ancestors had genes. However, we can't do that if biochemists keep promoting archaic notions of what a "gene" is.

Larry Moran said...

@Tim Tyler

I asked if you had bothered to read my post on What Is a Gene?. The reason I asked is because it contains the following sentence ...

We could refine the definition by including RNA genes but that’s such an insignificant percentage of all genes that the refinement is hardly worth it. As we shall see, there are more significant limitations to the definition.

Larry Moran said...

Here's what I say in my post ...

There are regions upstream of the promoter that control whether or not the gene is transcribed. These regions are called regulatory regions. They may contain binding sites for various proteins that will attach there in order to enhance the binding of RNA polymerase to the promoter. One of the differences between my preferred definition of a gene and others is that some other definitions include the promoter and the regulatory region.

There are two problems with such definitions. First, they’re not consistent with standard usage when we talk about the regulation of gene expression. We don’t say that only “part” of a gene is transcribed, which would be correct if we included the regulatory region in our definition of a gene. How often have we heard anyone say that regulatory sequences control the expression of part of the gene? That doesn’t make sense.

Larry Moran said...

The lac Operon

Repression of the lac Operon

The Lactose Paradox

How RNA Polymerase Binds to DNA

DNA Binding Proteins

Brian said...

Fre: Regulatory regions function by containing clusters of discrete (5-20 bp, with exceptions) specific recognition sequences to which various DNA-binding transcription factors bind, which drives the site-specific recruitment of chromatin remodeling complexes and ultimately the RNA polymerase holocomplex that actually transcribes the gene from a discrete start site. That is, they function as DNA, not as transcribed RNA.

Jonathan Badger said...

Non-experts might have been surprised by the low number of genes...

In regard to the idea that classical geneticists had it "right" in regard to the number of human genes before the genome era and that the molecular biologists with their 100,000 estimate had it wrong, you have to consider that it is quite possible to get a "right" answer for the wrong reasons, which is still a wrong answer scientifically. Is there any evidence that reasoning used in estimating the 20,000 human genes prior to the genomic era was generally applicable? Did classical geneticists also correctly estimate the number of E. coli genes? How about the number of Daphnia genes?

Tim Tyler said...

I'm happy to hear you recognize that there are some problems. The main problem I have with these narrow versions of 'gene' and 'genetics' is that they open the flood gates to those who would criticize and discredit these concepts whenever some new form of heredity comes along. Grandfather clocks, food boluses, location, the environment and lots of other things are inherited along with DNA. I keep hearing that there's more to heredity than genes - or that there's such a thing as 'epigenetic' inheritance. Genes are supposed to be the units of heredity - and they were until middle of the last century. Then a bunch of physical "gene" concepts gained currency - and because the same word was used, people muddled up the biochemical gene and the evolutionary gene - and much confusion ensued. Using the term 'gene' was OK, but redefining the term to exclude any other type of heritable material was a short-sighted mistake. Mistakes happen, but then they need to be corrected. We can't have generation on generation of students learning that 'gene' means one thing in biochemistry class and another thing in evolution class. It is confusing - and completely unnecessary.

Tim Tyler said...

Here's Steven Pinker on the narrow molecular biology gene:

"Molecular biologists have appropriated the term "gene" to refer to stretches of DNA that code for a protein. Unfortunately, this sense differs from the one used in population genetics, behavioral genetics, and evolutionary theory, namely any information carrier that is transmissible across generations and has sustained effects on the phenotype. This includes any aspect of DNA that can affect gene expression, and is closer to what is meant by "innate" than genes in the molecular biologists' narrow sense."

IMO, the molecular biologists should recant. They got confused about what the term "gene" means - and then confused a lot of other people in the process. However, it's not too late. The sooner the molecular biologists throw in the towel, the less confusion they will cause.

Unknown said...

The definition Pinker alludes to is Williams definition and apart from a very narrow section of evolutionary biology it has never had much traction and even there you would specifically note that you are using Williams definition and preferably even use a term like Williams-gene. The molecular definition of a gene is far more widespread and has obvious utility in the field. It's unlikely to go away. On the other hand I think the Williams gene has some nice properties and the worst part of it is that it is called gene. We need the concept, but since the name is far more commonly used in a different way another term would be preferable.

It's also worth noting that Mukherjee provides a definition that is somewhat of a trainwreck because he has obviously read things that refer to Williams genes, not understood how these are different from molecular genes and then produced a mashup that works in neither case.

Jonathan Badger said...

And I suppose physicists should recant because the "atoms" they talk about today aren't the solid geometric solids of Democritus or even just the indivisible elemental units of Dalton.

Unknown said...

@Jonathan: Well, in these cases you can at least point to some historical development that led to a shift in meaning as the properties of matter became clearer. The Williams definition includes a lot of things that aren't even DNA sequences (for instance a protein coding DNA sequence and the amino acid sequence it codes for are different Williams genes). And depending on what you are looking at molecular genes may not qualify as Williams genes (few protein coding sequences are conserved enough to be WGs on the species level for instance). That both concepts share a word is a problem (see also species vs. types, where a lot of things that make perfect sense as types are introduced as alternative species concepts. See also the general proximate vs. ultimate causes distinction in biology. I'm sure there's more. For some reason we like to introduce novel concepts and instead of giving them a new name so they can easily be differentiated from existing concepts, we just reuse an existing term and call our concept an alternative concept of that term. Oh, also see about 20 or so different concepts of diversity).

Tim Tyler said...

The terms 'gene' and 'genetics' flourished for 50 years at the start of the last century before the molecular biologists came on the scene. 'Gene' meant 'hereditary factor' back then - and genetics was all about tracking and forecasting inherited variation. Much of this was before George Williams was born. The idea that all genes in modern living organisms are made out of nucleic acid would have been a conjecture during that era - one now long-ago disproven. To evolutionists, the molecular biologist's "gene" looks like an attempt to impose this mistaken dogma as fact by definitional fiat.

Larry Moran said...

There's plenty of evidence that geneticists and evolutionary biologists were making accurate predictions of the number of genes back in the early 1970s. The reasoning based on genetic load is solid and the understanding of mutation rates was accurate.

The biochemists and molecular biologists were basing their predictions on hybridization studies showing that there were only 10-20 thousand mRNAs in most species. It wasn't just the "classical geneticists" who made accurate predictions.

By the 1980s there was even more evidence from studying fruit fly, yeast, and bacterial genomes. Developmental biologists were telling us that humans didn't need to have very many more genes than these other species.

This is why Benjamin Lewin could say with some confidence that humans have 30,000 - 40,000 genes when he wrote the 2nd edition of his textbook in 1983. That was thought to be a generous estimate even back then.

The Alberts et al. textbook (Molecular Biology of the Cell) estimated 30,000 in 1983.

In my own book (Moran, Scrimgeour, et al. 1994) I said that mammals have 20,000 - 50,000 genes. This was based on an extensive review of the literature. I believed that the number was closer to 20,000 but this was a time when much larger numbers were being quoted in the popular press and you can't get too far ahead of the average biochemistry lecturer.

These historical estimates were not just lucky guesses that were right for the wrong reasons. Instead it was the inflated estimates of the uninformed that were wrong for the right reasons [False History and the Number of Genes 2010].

I realize that Craig Venter and the people at Celera and TIGR were among the group predicting many more genes. You may have been influenced by their views but, believe me, they did not represent the people I hung out with who had studied the problem.

Jonathan Badger said...

The problem is the "molecular biologist's gene" is the only thing that's really there in the cell and can actually have a biological effect. "Hereditary factors" were simply crude ways of understanding what was going on before we knew better.

Tim Tyler said...

Jonathan, that's a mistaken perspective. Organisms inherit all kinds of things from their parents in ways that don't involve coding it in DNA, including diet preferences, stress levels, resources, money, tattoos, dress styles - and so on. This is widely acknowledged and supported by extensive evidence. All of these things are biologically active and some of them even have high-fidelity transmission and exhibit cumulative adaptive evolution. The idea that inheritance boils down to DNA transmission is a fundamentally mistaken one.

Jonathan Badger said...

1) Almost all of that is only "inheritance" in a purely poetical sense. Tattoos and dress styles are not the stuff of (natural) science.
2) Some things you mention like stress levels may have some actual biological cross-generational inheritance through epigenetics. But the thing about epigenetics (despite being overhyped) is it is all due to actual molecular biological genes involved in methylation and the like.

Tim Tyler said...

Um, it's not "poetic", it's the inheritance that is needed for Darwinian evolution. It produces adaptations and adaptive fitness. If you think that cultural inheritance is merely "poetic", you need to hit the library. There's now an extensive literature of Darwinian cultural evolution. Perhaps start with Mesoudi, A (2011) Cultural Evolution, and Richerson, P. J. and Christiansen, M. H. (2013) Cultural Evolution. Or for something online, maybe start with: Mesoudi, A, Whiten, A, Laland, K. N. (2004), Perspective: is human cultural evolution Darwinian? Evidence reviewed from the perspective of the Origin of Species.

Of course, tattoos and dress styles can be studied scientifically. They are part of biology, the study of living systems. If you think human beings are somehow walled off from the natural sciences, you need another reading list addressing that issue.

SRM said...

From Mesoudi et al: "The claim that human culture evolves through the differential adoption of cultural variants, in a manner analogous to the evolution of biological species, has been greeted with much resistance and confusion."

Well, the word "analogous" in that sentence is key and just how analogous is another matter for debate. The extent to which cultural effects track with molecular genetics is another.

The conflation, potentially confused at times, of potentially similar things may be useful for the philosopher and may be useful in general. But it cannot be imposed upon the molecular biologist who strives to understand biochemical structure and function. The extent to which you can know much about the molecular biology of genetics and the cell stems from the work of generations of focused molecular biologists who did not indulge too excessively in fuzzy thinking.

Graham Jones said...

Larry, referring to protein-coding genes, said "Coding regions are only 1.25% of the human genome." Michael Lynch says 0.8%.

1.25% of 3.2 billion is 40 million, so given 20k protein-coding genes, that's 2000 bp per gene on average, which seems too high to me (though I am vague about average gene copy numbers). So Lynch's figure seems more reasonable to me.

Larry Moran said...

I think the average MW of human proteins is about 70,000 daltons so 2000 bp sounds about right to me. The actual data varies all over the map depending on how you count pseudogenes and presumptive splice variants.
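For anyone who wants to check the arithmetic in this exchange, here is a rough sketch in Python. The genome size, coding fractions, gene count, and 70,000 dalton figure come from the comments above; the average residue mass of about 110 daltons per amino acid is an assumed rule of thumb, not a number anyone in the thread cited, and the function names are purely illustrative.

```python
GENOME_BP = 3.2e9            # human genome size used in the thread
N_GENES = 20_000             # protein-coding gene count used in the thread
AVG_RESIDUE_DA = 110         # ASSUMPTION: approximate average mass of one amino acid residue


def coding_bp_per_gene(coding_fraction, genome_bp=GENOME_BP, n_genes=N_GENES):
    """Average coding bp per gene implied by a genome-wide coding fraction."""
    return genome_bp * coding_fraction / n_genes


def coding_bp_from_mass(polypeptide_da, residue_da=AVG_RESIDUE_DA):
    """Approximate coding bp for a polypeptide of the given mass (3 bp per residue)."""
    residues = polypeptide_da / residue_da
    return residues * 3


if __name__ == "__main__":
    print(round(coding_bp_per_gene(0.0125)))   # the 1.25% figure
    print(round(coding_bp_per_gene(0.008)))    # the 0.8% figure attributed to Lynch
    print(round(coding_bp_from_mass(70_000)))  # from a 70,000 dalton average polypeptide
```

Under these assumptions the 1.25% figure implies about 2000 bp of coding sequence per gene, the 0.8% figure implies about 1280 bp, and a 70,000 dalton polypeptide corresponds to roughly 1900 bp. None of this settles which input numbers are right; it only shows that the 1.25% and 70,000 dalton figures are consistent with each other.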

Graham Jones said...

When they measure that 70000, how do they deal with dimers, trimers, etc?

Larry Moran said...

I should have said "polypeptide" instead of "protein." I thought it was clear from the context that I was talking about polypeptides - the primary product of a protein-coding gene. Apparently I was wrong.

Some recent results have suggested a somewhat lower number but they are based on older data that included many small genes that have disappeared from the latest genome builds.

SRM said...

Average bacterial genes usually cited as around 1000 nt, and I was under impression that avg eukaryotic coding region slightly larger (maybe 1200 nt or so, but don't have source in front of me right now). Is it true more recent data indicates that avg eukaryotic proteins twice the mass of bacterial? I wonder to what extent this is skewed by a few extremely large eukaryotic proteins.

Larry Moran said...

It is definitely skewed by some very large polypeptides. That's okay because the goal was to calculate the total amount of coding DNA.

David B said...

Are there any major problems with this book? I am a layman interested in the subject but am reading this critique as if it's somewhat of a nitpick.

Larry Moran said...

1. Mukherjee fails to explain properly what a gene is. It's the subject of his book.

2. He tells his readers that most of the genome probably has a function when the best scientific evidence says that it's 90% junk.

3. He doesn't describe correctly how genes are expressed. Instead, he mumbles something incoherent about epigenetics claiming that it overthrows modern evolutionary theory.

4. He claims that alternative splicing explains how humans can make one hundred thousand different proteins from only 20,000 genes. This is not true.

5. Throughout the book he emphasizes our lack of knowledge of genetics and molecular biology when, in fact, the real problem is HIS lack of knowledge.

6. Throughout the book he emphasizes the supposedly "new" information that's come out in the past decade—information that has changed the way we look at genes. In fact, very little new information has changed the way we look at genes.

He gets the history of genetics wrong and his warnings about the dangers of modifying genes are 50 years old.

Other than that, there are no problems with the book.

Rolf Aalberg said...

I bought the book, I read it, and am none the wiser for that. Guess I'll do more research before buying my next book.

David B said...

Lol, okay so I guess this book will go straight to the used bookstore without being read. Any recommendations for an alternative?

Andrew said...

FYI Siddhartha Mukherjee is coming to town.

http://www.torontopubliclibrary.ca/detail.jsp?Em=1&Entt=RDMEVT252529&R=EVT252529