Showing posts with label Genes. Show all posts

Friday, July 14, 2017

Revisiting the genetic load argument with Dan Graur

The genetic load argument is one of the oldest arguments for junk DNA and it's one of the most powerful arguments that most of our genome must be junk. The concept dates back to J.B.S. Haldane in the late 1930s but the modern argument traditionally begins with Hermann Muller's classic paper from 1950. It has been extended and refined by him and many others since then (Muller, 1950; Muller, 1966).

Sunday, July 02, 2017

Confusion about the number of genes

My last post was about confusion over the sizes of the human and mouse genomes based on a recent paper by Breschi et al. (2017). Their statements about the number of genes in those species are also confusing. Here's what they say about the human genome.
[According to Ensembl86] the human genome encodes 58,037 genes, of which approximately one-third are protein-coding (19,950), and yields 198,093 transcripts. By comparison, the mouse genome encodes 48,709 genes, of which half are protein-coding (22,018 genes), and yields 118,925 transcripts overall.
The very latest Ensembl estimates (April 2017) for Homo sapiens and Mus musculus are similar. The difference in gene numbers between mouse and human is not significant according to the authors ...
The discrepancy in total number of annotated genes between the two species is unlikely to reflect differences in underlying biology, and can be attributed to the less advanced state of the mouse annotation.
This is correct but it doesn't explain the other numbers. There's general agreement on the number of protein-coding genes in mammals. They all have about 20,000 genes. There is no agreement on the number of genes for functional noncoding RNAs. In its latest build, Ensembl says there are 14,727 lncRNA genes, 5,362 genes for small noncoding RNAs, and 2,222 other genes for noncoding RNAs. The total number of non-protein-coding genes is 22,311.

There is no solid evidence to support this claim. It's true there are many transcripts resembling functional noncoding RNAs but claiming these identify true genes requires evidence that they have a biological function. It would be okay to call them "potential" genes or "possible" genes but the annotators are going beyond the data when they decide that these are actually genes.

Breschi et al. mention the number of transcripts. I don't know what method Ensembl uses to identify a functional transcript. Are these splice variants of protein-coding genes?

The rest of the review discusses the similarities between human and mouse genes. They point out, correctly, that about 16,000 protein-coding genes are orthologous. With respect to lncRNAs they discuss all the problems in comparing human and mouse lncRNA and conclude that "... the current catalogues of orthologous lncRNAs are still highly incomplete and inaccurate." There are several studies suggesting that only 1,000-2,000 lncRNAs are orthologous. Unfortunately, there's very little overlap between the two most comprehensive studies (189 lncRNAs in common).

There are two obvious possibilities. First, it's possible that these RNAs are just due to transcriptional noise and that's why the ones in the mouse and human genomes are different. Second, all these RNAs are functional but the genes have arisen separately in the two lineages. This means that about 10,000 genes for biologically functional lncRNAs have arisen in each of the genomes over the past 100 million years.

Breschi et al. don't discuss the first possibility.
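The second possibility implies a rate of gene birth that can be worked out directly. The sketch below only uses the two figures quoted above (about 10,000 lncRNA genes per lineage, about 100 million years); the constant names, the helper function, and the assumption of a steady rate with no gene loss are illustrative choices, not something taken from Breschi et al.

```python
# Back-of-envelope rate implied by "10,000 new functional lncRNA genes
# in each lineage over the past 100 million years".
# Assumption (not from the paper): genes arise at a constant rate and none are lost.
LNCRNA_GENES_PER_LINEAGE = 10_000
DIVERGENCE_TIME_YEARS = 100_000_000


def implied_origination_rate(genes: int, years: int) -> float:
    """Return new functional genes per million years in one lineage."""
    return genes / (years / 1_000_000)


rate_per_myr = implied_origination_rate(LNCRNA_GENES_PER_LINEAGE, DIVERGENCE_TIME_YEARS)
years_per_gene = DIVERGENCE_TIME_YEARS / LNCRNA_GENES_PER_LINEAGE

print(f"{rate_per_myr:.0f} new lncRNA genes per million years")
print(f"one new gene every {years_per_gene:,.0f} years")
```

Under those assumptions each lineage would have to gain a new, biologically functional lncRNA gene roughly every 10,000 years.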


Breschi, A., Gingeras, T.R., and Guigó, R. (2017) Comparative transcriptomics in human and mouse. Nature Reviews Genetics [doi: 10.1038/nrg.2017.19]

Wednesday, May 10, 2017

Debating philosophers: Pierrick Bourrat responds to my criticism of his paper

I recently criticized a paper by Lu and Bourrat on the extended evolutionary synthesis [Debating philosophers: The Lu and Bourrat paper]. Pierrick Bourrat responds in this guest post.


by Pierrick Bourrat
Research Fellow, Department of Philosophy
Macquarie University
Sydney, Australia

Both Qiaoying Lu and I are grateful to Professor Moran for the copious attention he has bestowed on our paper. We are early career researchers and didn’t expect our paper to receive so much attention from a senior academic in a public forum. Moran claims that our work is out of touch with science (and more generally works in philosophy of biology), that the paper is weakly argued and that some of what we write is false. But in the end, he puts forward a similar position to ours.

Thursday, May 04, 2017

Debating philosophers: The molecular gene

This is my fifth post on the Lu and Bourrat paper [Debating philosophers: The Lu and Bourrat paper]. The authors are attempting to justify the inclusion of epigenetics into current evolutionary theory by re-defining the concept of "gene," specifically the evolutionary gene concept. So far, I've discussed their understanding of current evolutionary theory and why I think it is flawed [Debating philosophers: The Modern Synthesis]. I described their view of "genes" and pointed out the confusion between "genes" and "alleles" and why I think "alleles" is the better term [Debating philosophers: The difference between genes and alleles]. In my last post I discussed their definition of the evolutionary gene and why it is too adaptationist to serve a useful function [Debating philosophers: The evolutionary gene].

Wednesday, May 03, 2017

Debating philosophers: The evolutionary gene

This is the fourth post on the Lu and Bourrat paper [Debating philosophers: The Lu and Bourrat paper]. The philosophers are attempting to redefine the word "gene" in order to make epigenetics compatible with current evolutionary theory.

I define a gene in the following way: "A gene is a DNA sequence that is transcribed to produce a functional product" [What Is a Gene?]. This is a biochemical/molecular definition and it's not the same as the definition used in traditional evolution.

Lu and Bourrat discuss the history of the evolutionary gene and conclude,

Debating philosophers: The difference between genes and alleles

This is my third post on the Lu and Bourrat (2017) paper [Debating philosophers: The Lu and Bourrat paper]. Part of their argument is to establish that modern evolutionary theory is a gene-centric theory. They need to make this connection because they are about to re-define the word "gene" in order to accommodate epigenetics.

In my last post I referred to their defense of the Modern Synthesis and quoted them as saying that the major tenets of the Modern Synthesis (MS) are still the basis of modern evolutionary theory. They go on to say,

Tuesday, May 02, 2017

Debating philosophers: The Lu and Bourrat paper

John Wilkins posted a link on Facebook to a recent paper by his colleagues in Australia. The authors are Qiaoying Lu of the Department of Philosophy at Macquarie University in Sydney, Australia and Pierrick Bourrat of the Department of Philosophy at The University of Sydney in Sydney, Australia.

Lu, Q., and Bourrat, P. (2017) The evolutionary gene and the extended evolutionary synthesis. The British Journal for the Philosophy of Science, (advanced article) April 20, 2017. [doi: 10.1093/bjps/axw035] [PhilSci Archive]

Abstract: Advocates of an ‘extended evolutionary synthesis’ have claimed that standard evolutionary theory fails to accommodate epigenetic inheritance. The opponents of the extended synthesis argue that the evidence for epigenetic inheritance causing adaptive evolution in nature is insufficient. We suggest that the ambiguity surrounding the conception of the gene represents a background semantic issue in the debate. Starting from Haig’s gene-selectionist framework and Griffiths and Neumann-Held’s notion of the evolutionary gene, we define senses of ‘gene’, ‘environment’, and ‘phenotype’ in a way that makes them consistent with gene-centric evolutionary theory. We argue that the evolutionary gene, when being materialized, need not be restricted to nucleic acids but can encompass other heritable units such as epialleles. If the evolutionary gene is understood more broadly, and the notions of environment and phenotype are defined accordingly, current evolutionary theory does not require a major conceptual change in order to incorporate the mechanisms of epigenetic inheritance.

1 Introduction
2 The Gene-centric Evolutionary Theory and the ‘Evolutionary Gene’
      2.1 The evolutionary gene
      2.2 Genes, phenotypes, and environments
3 Epigenetic Inheritance and the Gene-Centred Framework
      3.1 Treating the gene as the sole heritable material?
      3.2 Epigenetics and phenotypic plasticity
4 Conclusion

Saturday, April 08, 2017

Somatic cell mutation rate in humans

A few years ago, Tomasetti and Vogelstein (2015) published a paper where they noted a correlation between rates of cancer and the number of cell divisions. They concluded that a lot of cancers could be attributed to bad luck. This conclusion didn't sit well with most people for two reasons. (1) There are many well-known environmental effects that increase cancer rates (e.g. smoking, radiation), and (2) there's a widespread belief that you can significantly reduce your chances of getting cancer by "healthy living" (whatever that is). The first objection is based on solid scientific evidence but the second one is not as scientific.

Some of the objections to the original Tomasetti and Vogelstein paper were based on the mathematical models they used to reach their conclusions. The authors have now followed up on their original study with more data. The paper appears in the March 24, 2017 issue of Science (Tomasetti and Vogelstein, 2017). If you're interested in the debate over "bad luck" you should read the accompanying review by Nowak and Waclaw (2017). They conclude that the math is sound and many cancer-causing mutations are, in fact, due to chance mutations in somatic cells. They point out something that should be obvious but bears repeating.

Saturday, January 07, 2017

What the heck is epigenetics?

"Epigenetics" is the (relatively) new buzzword. Old-fashioned genetics is boring so if you want to convince people (and grant agencies) that you're on the frontlines of research you have to say you're working on epigenetics. Even better, you can tell them that you are on the verge of overthrowing Darwinism and bringing back Jean-Baptiste Lamarck.

But you need to be careful if you adopt this strategy. Don't let anyone pin you down by defining "epigenetics." It's best to leave it as ambiguous as possible so you can adopt the Humpty-Dumpty strategy.1 Sarah C.P. Williams made that mistake a few years ago and incurred the wrath of Mark Ptashne [Core Misconcept: Epigenetics].

Friday, January 06, 2017

Genetic variation in the human population

With a current population size of over 7 billion, the human population should contain a huge amount of genetic variation. Most of it resides in junk DNA so it's of little consequence. We would like to know more about the amount of variation in functional regions of the genome because it tells us something about population genetics and evolutionary theory.

A recent paper in Nature (Aug. 2016) looked at a large dataset of 60,706 individuals. They sequenced the protein-coding regions of all these people to see what kind of variation existed (Lek et al., 2016) (ExAC). The group included representatives from all parts of the world although it was heavily weighted toward Europeans. The authors used a procedure called "principal component analysis" (PCA) to cluster the individuals according to their genetic characteristics. The analysis led to the typical clustering by "population clusters." (That term is used to avoid the words "race" and/or "subspecies.")


Thursday, January 05, 2017

Birth and death of genes in a hybrid frog genome

De novo genes1 are quite rare but genome duplications are quite common. Sometimes the duplicated regions contain genes so the new genome contains two copies of a gene that was formerly present in only one copy. "Common" in this sense means on a scale of millions of years. Michael Lynch and his colleague have calculated that the rate of fixed gene duplication is about 0.01 per gene per million years (Lynch and Conery, 2003 a,b; Lynch 2007). Since a typical vertebrate has more than 20,000 genes, this means that 200 genes will be duplicated and fixed every million years.
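The arithmetic behind the "200 genes" figure is a single multiplication. Here is a minimal sketch, assuming the rate applies uniformly to every gene; the function and constant names are made up for illustration.

```python
# Expected number of fixed gene duplications per million years.
# Figures from the text: 0.01 fixed duplications per gene per million years,
# and roughly 20,000 genes in a typical vertebrate.
# Assumption: the same rate applies to every gene.
DUPLICATION_RATE = 0.01   # per gene per million years
GENE_COUNT = 20_000


def expected_fixed_duplications(rate_per_gene_per_myr: float,
                                gene_count: int,
                                million_years: float = 1.0) -> float:
    """Expected count of duplicated-and-fixed genes over the given time span."""
    return rate_per_gene_per_myr * gene_count * million_years


per_myr = expected_fixed_duplications(DUPLICATION_RATE, GENE_COUNT)
print(f"about {per_myr:.0f} fixed duplications per million years")
```

Passing a different `million_years` value scales the estimate linearly, which is only reasonable as long as the rate stays roughly constant.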


The initial duplication event is likely to be deleterious since there will now be redundant DNA in the genome. The slightly deleterious allele (duplication) can be purged by negative selection in species with large population sizes (e.g. bacteria). But in species with smaller populations, natural selection is not powerful enough to eliminate slightly deleterious alleles so the duplication persists and may become fixed in the population.

Tuesday, December 20, 2016

Is the high frequency of blood type O in native Americans due to random genetic drift?

The frequency of blood type O is very high in some populations of native Americans. In many North American tribes, for example, the frequency is over 90% and often approaches 100%. A majority of individuals in those populations have blood type O (homozygous for the O allele). [see Theme: ABO Blood Types]
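Because type O individuals are homozygous for the O allele, a type O frequency can be converted into a rough O allele frequency. The sketch below assumes Hardy-Weinberg proportions (random mating, no selection), which is an assumption added here and not a claim from the post; the function name is hypothetical.

```python
import math


def o_allele_frequency(type_o_fraction: float) -> float:
    """Estimate the O allele frequency q from the fraction of type O people.

    Assumes Hardy-Weinberg proportions, so the type O (OO homozygote)
    fraction equals q squared.
    """
    if not 0.0 <= type_o_fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    return math.sqrt(type_o_fraction)


for phenotype in (0.90, 0.95, 0.99):
    q = o_allele_frequency(phenotype)
    print(f"type O in {phenotype:.0%} of people -> O allele frequency about {q:.3f}")
```

Under that assumption, a population where 90% of people are type O has an O allele frequency of about 0.95, so the A and B alleles together make up only about 5% of the gene copies.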

Since there's no solid evidence that blood types are adaptive,1 the standard explanation is random genetic drift.

Jerry Coyne explains it in Why Evolution Is True.
One example of evolution by drift may be the unusual frequencies of blood types (as in the ABO system) in the Old Order Amish and Dunker religious communities in America. These are small, isolated, religious groups whose members intermarry—just the right circumstances for rapid evolution by genetic drift.

Accidents of sampling can also happen when a population is founded by just a few immigrants, as occurs when individuals colonize an island or a new area. The almost complete absence of genes producing the B blood type in Native American populations, for example, may reflect the loss of this gene in a small population of humans that colonized North America from Asia around twelve thousand years ago.
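The founder-effect argument can be illustrated with a simple simulation. This is a minimal Wright-Fisher sketch of pure drift, assuming no selection, mutation, or migration; the population sizes, the starting frequency of 0.1, and the function names are illustrative assumptions, not estimates for any real founding population.

```python
import random


def wright_fisher(pop_size, start_freq, generations, rng):
    """Follow one allele's frequency under pure drift in a diploid population."""
    copies_total = 2 * pop_size
    freq = start_freq
    trajectory = [freq]
    for _ in range(generations):
        # Each gene copy in the next generation is sampled at random
        # from the current gene pool.
        copies = sum(1 for _ in range(copies_total) if rng.random() < freq)
        freq = copies / copies_total
        trajectory.append(freq)
        if freq in (0.0, 1.0):  # allele lost or fixed; drift cannot change it further
            break
    return trajectory


def fraction_lost(pop_size, start_freq, generations, replicates, seed=1):
    """Fraction of replicate populations in which the allele has been lost."""
    rng = random.Random(seed)
    lost = sum(
        wright_fisher(pop_size, start_freq, generations, rng)[-1] == 0.0
        for _ in range(replicates)
    )
    return lost / replicates


# Hypothetical scenario: a rare allele (10%) in a small versus a larger population.
small = fraction_lost(pop_size=25, start_freq=0.1, generations=100, replicates=100)
large = fraction_lost(pop_size=250, start_freq=0.1, generations=100, replicates=100)
print(f"allele lost in {small:.0%} of small populations, {large:.0%} of larger ones")
```

The point of the comparison is qualitative: a rare allele is lost by chance much more often in the small population than in the larger one over the same number of generations.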

Tuesday, August 23, 2016

Splice variants of the human triose phosphate isomerase gene: is alternative splicing real?

Triose phosphate isomerase (TIM) is one of the enzymes in the gluconeogenesis pathway leading to the synthesis of glucose from simple precursors. It also plays a role in the degradation of glucose (glycolysis). The enzyme catalyzes the following reaction ....


Triose phosphate isomerase is found in almost all species. The structure and sequence of the enzyme are well conserved. It is a classic β-barrel enzyme that usually forms a dimer. The overall structure of a single subunit is a classic example of an αβ-barrel known as a TIM-barrel in reference to this enzyme.

To the best of my knowledge, no significant variants of this enzyme due to alternative promoters, alternative splicing, or proteolytic cleavage are known.1 The enzyme has been actively studied in biochemistry laboratories for at least eighty years.

Thursday, July 28, 2016

False history and the number of genes: 2016

There's an article about junk DNA in the latest issue of New Scientist. The title is: You are junk: Why it’s not your genes that make you human. The author is Colin Barras, a science writer from Michigan with a Ph.D. in paleontology.

He begins with .....
IT WAS a discovery that threatened to overturn everything we thought about what makes us human. At the dawn of the new millennium, two rival teams were vying to be the first to sequence the human genome. Their findings, published in February 2001, made headlines around the world. Back-of-the-envelope calculations had suggested that to account for the sheer complexity of human biology, our genome should contain roughly 100,000 genes. The estimate was wildly off. Both groups put the actual figure at around 30,000. We now think it is even fewer – just 20,000 or so.

"It was a massive shock," says geneticist John Mattick. "That number is tiny. It’s effectively the same as a microscopic worm that has just 1000 cells."

Sunday, July 10, 2016

What is a "gene" and how do genes work according to Siddhartha Mukherjee?

It's difficult to explain fundamental concepts of biology to the average person. That's why I'm so interested in Siddhartha Mukherjee's book "The Gene: an intimate history." It's a #1 bestseller so he must be doing something right.

My working definition of a gene is based on a blog post from several years ago [What Is a Gene?].
A gene is a DNA sequence that is transcribed to produce a functional product.
This covers two types of genes: those that eventually produce proteins (polypeptides); and those that produce functional noncoding RNAs. This distinction is important when discussing what's in our genome.

Wednesday, June 15, 2016

What does a person's genome reveal about their ethnicity and their appearance?

If you knew the complete genome sequence of someone could you tell where they came from and their ethnic background (race)? The answer is confusing according to Siddhartha Mukherjee writing in his latest book "The Gene: an intimate history." The answer appears to be "yes" but then Mukherjee denies that knowing where someone came from tells us anything about their genome or their phenotype. He writes the following on page 342.

... the genetic diversity within any racial group dominates the diversity between racial groups. This degree of intraracial variability makes "race" a poor surrogate for nearly any feature: in a genetic sense, an African man from Nigeria is so "different" from another man from Namibia that it makes little sense to lump them into the same category.

For race and genetics, then, the genome is strictly a one-way street. You can use the genome to predict where X or Y came from. But knowing where A or B came from, you can predict little about the person's genome. Or: every genome carries a signature of an individual's ancestry—but an individual's racial ancestry predicts little about the person's genome. You can sequence DNA from an African-American man and conclude that his ancestors came from Sierra Leone or Nigeria. But if you encounter a man whose great-grandparents came from Nigeria or Sierra Leone, you can say little about the features of this particular man. The geneticist goes home happy; the racist returns empty-handed.
I find this view very strange. Imagine that you were an anthropologist who was an expert on humans and human evolution. Imagine you were told that there's a woman in the next room whose eight great-grandparents all came from Japan. According to Mukherjee, such a scientist could not predict anything about the features of that woman. Does that make any sense?

I suspect this is just a convoluted way of reconciling science with political correctness.

Steven Monroe Lipkin has a different view. He's a medical geneticist who recently published a book with Jon R. Luoma titled "The Age of Genomes: tales from the front lines of genetic medicine." Here's how they explain it on page 6.
Many ethnic groups carry distinct signatures. For example, from a genome sequence you can usually tell if an individual is African-American, Caucasian, Asian, Satnami, or Ashkenazi Jew, even if you've never laid eyes on the patient. A well-regarded research scientist whom I had never met made his genome sequence publically available as part of a research study. I remember scrolling through his genetic variant files and trying, more successfully than I had expected, to guess what he would look like before I peeked at his webpage photo. The personal genome is more than skin deep.
This makes more sense to me. If you know what to look for, and Steven Monroe Lipkin certainly does, then many of the features of a particular person can be deduced from their genome sequence. And if you know which variants are more common in certain ethnic groups then you can certainly predict what a person might look like just by knowing where their ancestors came from.

What's wrong with that?


Tuesday, May 24, 2016

University of Toronto press release distorts conclusions of RNA paper

My colleague, Ben Blencowe, just published a paper ...

Sharma, E., Sterne-Weiler, T., O’Hanlon, D., and Blencowe, B.J. (2016) Global Mapping of Human RNA-RNA Interactions. Molecular Cell, [doi: 10.1016/j.molcel.2016.04.030]

ABSTRACT (Summary)

The majority of the human genome is transcribed into non-coding (nc)RNAs that lack known biological functions or else are only partially characterized. Numerous characterized ncRNAs function via base pairing with target RNA sequences to direct their biological activities, which include critical roles in RNA processing, modification, turnover, and translation. To define roles for ncRNAs, we have developed a method enabling the global-scale mapping of RNA-RNA duplexes crosslinked in vivo, ‘‘LIGation of interacting RNA followed by high-throughput sequencing’’ (LIGR-seq). Applying this method in human cells reveals a remarkable landscape of RNA-RNA interactions involving all major classes of ncRNA and mRNA. LIGR-seq data reveal unexpected interactions between small nucleolar (sno) RNAs and mRNAs, including those involving the orphan C/D box snoRNA, SNORD83B, that control steady-state levels of its target mRNAs. LIGR-seq thus represents a powerful approach for illuminating the functions of the myriad of uncharacterized RNAs that act via base-pairing interactions.

Thursday, January 28, 2016

"The Selfish Gene" turns 40

Richard Dawkins published The Selfish Gene 40 years ago and Matt Ridley notes the anniversary in a Nature article published today (Jan. 28, 2016): In retrospect: The selfish gene.

I don't remember when I first read it—probably the following year when the paperback version came out. I found it quite interesting but I was a bit put off by the emphasis on adaptation (taken from George Williams) and the idea of inclusive fitness (from W.D. Hamilton). I also didn't much like the distinction between vehicles and replicators and the idea that it was the gene, not the individual, that was the unit of selection ("selection" not "evolution").
It is finally time to return to the problem with which we started, to the tension between individual organism and gene as rival candidates for the central role in natural selection...One way of sorting this whole matter out is to use the terms ‘replicator’ and ‘vehicle’. The fundamental units of natural selection, the basic things that survive or fail to survive, that form lineages of identical copies with occasional random mutations, are called replicators. DNA molecules are replicators. They generally, for reasons that we shall come to, gang together into large communal survival machines or ‘vehicles’.

Richard Dawkins

Sunday, January 17, 2016

Origin of de novo genes in humans

We know quite a lot about the origin of new genes (Carvunis et al., 2012; Kaessmann, 2010; Long et al., 2003; Long et al., 2013; Näsvall et al., 2012; Neme and Tautz, 2013; Schlötterer, 2015; Tautz and Domazet-Lošo, 2011; Wu et al., 2011). Most of them are derived from gene duplication events and subsequent divergence. A smaller number are formed de novo from sequences that were not part of a gene in the ancestral species.

In spite of what you might have read in the popular literature, there are not a large number of newly formed genes in most species. Genes that appear to be unique to a single species are called "orphan" genes. When a genome is first sequenced there will always be a large number of potential orphan genes because the gene prediction software tilts toward false positives in order to minimize false negatives. Further investigation and annotation reduce the number of potential genes.

Thursday, December 10, 2015

How many human protein-coding genes are essential for cell survival?

The human genome contains about 20,000 protein-coding genes and about 5,000 genes that specify functional RNAs. We would like to know how many of those genes are essential for the survival of an individual and for long-term survival of the species.

It would be almost as interesting to know how many are required for just survival of a particular cell. This set is the group of so-called "housekeeping genes." They are necessary for basic metabolic activity and basic cell structure. Some of these genes are the genes for ribosomal RNA, tRNAs, the RNAs involved in splicing, and many other types of RNA. Some of them are the protein-coding genes for RNA polymerase subunits, ribosomal proteins, enzymes of lipid metabolism, and many other enzymes.

The ability to knock out human genes using CRISPR technology has opened the door to testing for essential genes in tissue culture cells. The idea is to disrupt every gene and screen to see if it's required for cell viability in culture.

Three papers using this approach have appeared recently:
Blomen, V.A., Májek, P., Jae, L.T., Bigenzahn, J.W., Nieuwenhuis, J., Staring, J., Sacco, R., van Diemen, F.R., Olk, N., and Stukalov, A. (2015) Gene essentiality and synthetic lethality in haploid human cells. Science, 350:1092-1096. [doi: 10.1126/science.aac7557]

Wang, T., Birsoy, K., Hughes, N.W., Krupczak, K.M., Post, Y., Wei, J.J., Lander, E. S., and Sabatini, D.M. (2015) Identification and characterization of essential genes in the human genome. Science, 350:1096-1101. [doi: 10.1126/science.aac7041]

Hart, T., Chandrashekhar, M., Aregger, M., Steinhart, Z., Brown, K.R., MacLeod, G., Mis, M., Zimmermann, M., Fradet-Turcotte, A., and Sun, S. (2015) High-Resolution CRISPR Screens Reveal Fitness Genes and Genotype-Specific Cancer Liabilities. Cell 163:1515-1526. [doi: 10.1016/j.cell.2015.11.015]
Each group identified between 1500 and 2000 protein-coding genes that are essential in their chosen cell lines.

One of the annoying things about all three papers is that they use the words "gene" and "protein-coding gene" as synonyms. The only genes they screened were protein-coding genes but the authors act as though that covers ALL genes. I hope they don't really believe that. I hope it's just sloppy thinking when they say that their 1800 essential "genes" represent 9.2% of all genes in the genome (Wang et al. 2015). What they meant is that they represent 9.2% of protein-coding genes.
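The difference between the two denominators is easy to quantify. The sketch below uses only figures quoted in this post (1800 essential genes, 9.2%, about 20,000 protein-coding genes, about 5,000 genes for functional RNAs); the variable names are illustrative and the gene counts are the approximate ones given above, not values taken from Wang et al.

```python
# What denominator does "1800 essential genes = 9.2% of all genes" imply?
ESSENTIAL_GENES = 1_800
STATED_FRACTION = 0.092

# Approximate counts given at the start of this post.
PROTEIN_CODING_GENES = 20_000
FUNCTIONAL_RNA_GENES = 5_000

implied_total = ESSENTIAL_GENES / STATED_FRACTION
share_of_all_genes = ESSENTIAL_GENES / (PROTEIN_CODING_GENES + FUNCTIONAL_RNA_GENES)

print(f"implied denominator: about {implied_total:,.0f} genes")
print(f"share of protein-coding plus RNA genes: {share_of_all_genes:.1%}")
```

The implied denominator is a little under 20,000, which matches the protein-coding count. Measured against all 25,000 or so genes, the same 1800 would be closer to 7%.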

By looking only at genes that are essential for cell survival, they are ignoring all those genes that are specifically required in other cell types. For example, they will not identify any of the genes for olfactory receptors or any of the genes for keratin or collagen. They won't detect any of the genes required for spermatogenesis or embryonic development.

What they should detect is all of the genes required in core metabolism.

The numbers seem too low to me so I looked for some specific examples.

The HSP70 gene family encodes the major heat shock protein of molecular weight 70,000. The protein functions as a chaperone to help fold other proteins. The HSP70 genes are among the most highly conserved genes in all of biology and they are essential. The three genes for the normal cellular proteins are HSPA5 (Bip, the ER protein); HSPA8 (the cytoplasmic version); and HSPA9 (mitochondrial version). All three are essential in the Blomen et al. paper. Only HSPA5 and HSPA9 are essential in Hart et al. (This is an error.) (I can't figure out how to identify essential genes in the Wang et al. paper.)

There are two inducible genes, HSPA1A and HSPA1B. These are the genes activated by heat shock and other forms of stress and they churn out a lot of HSP70 chaperone in order to save the cells. These are not essential genes in the Blomen et al. paper and they weren't tested in the Hart et al. paper. This is an example of the kind of gene that will be missed in the screen because the cells were not stressed during the screening.

I really don't like these genomics papers because all they do is summarize the results in broad terms. I want to know about specific genes so I can see if the results conform to expectations.

I looked first at the genes encoding the enzymes for gluconeogenesis and glycolysis. The results are from the Blomen et al. paper. In the figure below, the gene names in red are essential and the ones in blue are not.


As you can see, at least one of the genes for the six core enzymes is essential. But none of the other genes is essential. This is a surprise since I expect both pathways (gluconeogenesis and glycolysis) to be active and essential in those cells. Perhaps the cells can survive for a few days without making these enzymes. It means they can't take up glucose because one of the hexokinase enzymes should be essential.

These results suggest that the Blomen et al. study is overlooking some important essential genes.

Now let's look at the citric acid cycle. All of the enzymes should be essential.


That's very strange. It's hard to imagine that cells in culture can survive without any of the genes for the subunits of the pyruvate dehydrogenase complex or the subunits of the succinyl-CoA synthetase complex. Or malate dehydrogenase, for that matter.

Something is wrong here. The study must be missing some important essential genes. I wish the authors had looked at some specific sets of genes and told us the results for well-known genes. That would allow us to evaluate the results. Perhaps this sort of thing isn't done when you are in "genomics" mode?

The "core fitness" protein-coding genes that were identified are more highly conserved than the other genes and they tend to be more highly expressed. They also show lower levels of variation within the human population. This is consistent with basic housekeeping features.

Each group identified several hundred unannotated genes in their core sample. These are genes with no known function (yet).

The results of the three studies do not overlap precisely but most of the essential genes were common to all three analyses.