Tuesday, June 27, 2017

Debating alternative splicing (Part IV)

In Debating alternative splicing (Part III) I discussed a review published in the February 2017 issue of Trends in Biochemical Sciences. The review examined the data on detecting predicted protein isoforms and concluded that there was little evidence they existed.

My colleague at the University of Toronto, Ben Blencowe, is a forceful proponent of massive alternative splicing. He responded in a letter published in the June 2017 issue of Trends in Biochemical Sciences (Blencowe, 2017). It's worth looking at his letter in order to understand the position of alternative splicing proponents. He begins by saying,
It is estimated that approximately 95% of multiexonic human genes give rise to transcripts containing more than 100 000 distinct AS events [3,4]. The majority of these AS events display tissue-dependent variation and 10–30% are subject to pronounced cell, tissue, or condition-specific regulation [4].

Monday, June 26, 2017

Debating alternative splicing (Part III)

Proponents of massive alternative splicing argue that most human genes produce many different protein isoforms. According to these scientists, this means that humans can make about 100,000 different proteins from only ~20,000 protein-coding genes. They tend to believe humans are considerably more complex than other animals even though we have about the same number of genes. They think alternative splicing accounts for this complexity [see The Deflated Ego Problem].

Opponents (I am one) argue that most splice variants are due to splicing errors and most of those predicted protein isoforms don't exist. (We also argue that the differences between humans and other animals can be adequately explained by differential regulation of 20,000 protein-coding genes.) The controversy can only be resolved when proponents of massive alternative splicing provide evidence to support their claim that there are 100,000 functional proteins.

Saturday, June 24, 2017

Debating alternative splicing (part II)

Mammalian genomes are very large, and it looks like about 90% of a typical mammalian genome is junk DNA. These genomes are pervasively transcribed, meaning that almost 90% of the bases are complementary to a transcript produced at some time during development. I think most of those transcripts are due to inappropriate transcription initiation. They are mistakes in transcription. The genome is littered with transcription factor binding sites, but only a small percentage of them are directly involved in regulating gene expression. The rest are due to spurious binding, a well-known property of DNA-binding proteins. These conclusions are based, I believe, on a proper understanding of evolution and basic biochemistry.

If you add up all the known genes, they cover about 30% of the genome sequence. Most of this (>90%) is intron sequence, and introns are mostly junk. The standard mammalian gene is transcribed to produce a precursor RNA that is subsequently processed by splicing out introns to produce a mature RNA. If it's a messenger RNA (mRNA), it will be translated to produce a protein (technically, a polypeptide). So far, the vast majority of protein-coding genes produce a single protein, but there are some classic cases of alternative splicing where a given gene produces several different protein isoforms, each of which has a specific function.
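
As a quick sanity check on those percentages, here is a rough back-of-the-envelope calculation in Python. The inputs are just the approximate figures used in this post, not precise measurements:

  # Back-of-the-envelope: how much of the genome can be coding sequence if
  # known genes cover ~30% of it and >90% of gene length is intron?
  genome_size = 3.2e9      # bp, haploid human genome (approximate)
  gene_fraction = 0.30     # fraction of the genome covered by known genes
  intron_fraction = 0.90   # fraction of gene length that is intron (at least)

  exon_fraction = gene_fraction * (1 - intron_fraction)   # upper bound on exons
  print(f"Genes cover about {gene_fraction * genome_size / 1e9:.1f} Gb")
  print(f"Exons are at most ~{exon_fraction:.0%} of the genome "
        f"({exon_fraction * genome_size / 1e6:.0f} Mb)")
  # Coding regions are a subset of exons, which is consistent with the usual
  # estimate that protein-coding sequence makes up only ~1-2% of the genome.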

Friday, June 23, 2017

Debating alternative splicing (part I)

I recently had a chance to talk science with my old friend and colleague Jack Greenblatt. He has recently teamed up with some of my other colleagues at the University of Toronto to publish a paper on alternative splicing in mouse cells. Over the years I have had numerous discussions with these colleagues since they are proponents of massive alternative splicing in mammals. I think most splice variants are due to splicing errors.

There's always a problem with terminology whenever we get involved in this debate. My position is that it's easy to detect splice variants but they should be called "splice variants" until it has been firmly established that the variants have a biological function. This is not a distinction that's acceptable to proponents of massive alternative splicing. They use the term "alternative splicing" to refer to any set of processing variants regardless of whether they are splicing errors or real examples of regulation. This sometimes makes it difficult to have a discussion.

In fact, most of my colleagues seem reluctant to admit that some splice variants could be due to meaningless errors in splicing. Thus, they can't be pinned down when I ask them what percentage of variants are genuine examples of alternative splicing and what percentage are splicing mistakes. I usually ask them to pick out a specific gene, show me all the splice variants that have been detected, and explain which ones are functional and which ones aren't. I have a standing challenge to do this with any one of three sets of genes [A Challenge to Fans of Alternative Splicing].
  1. Human genes for the enzymes of glycolysis
  2. Human genes for the subunits of RNA polymerase with an emphasis on the large conserved subunits
  3. Human genes for ribosomal proteins
I realize that proponents of massive alternative splicing are not under any obligation to respond to my challenge but it sure would help if they did.

Thursday, June 22, 2017

Are most transcription factor binding sites functional?

The ongoing debate over junk DNA often revolves around data collected by ENCODE and others. The idea that most of our genome is transcribed (pervasive transcription) seems to indicate that genes occupy most of the genome. The opposing view is that most of these transcripts are accidental products of spurious transcription. We see the same opposing views when it comes to transcription factor binding sites. ENCODE and their supporters have mapped millions of binding sites throughout the genome and they believe this represents abundant and exquisite regulation. The opposing view is that most of these binding sites are spurious and non-functional.

The spurious-binding view is supported by many studies on the biophysical properties of transcription factor binding. These studies show that any DNA-binding protein has a low, but non-negligible, affinity for random-sequence DNA. It will also bind with much higher affinity to sequences that resemble, but do not precisely match, its specific binding site [How RNA Polymerase Binds to DNA; DNA Binding Proteins]. If you take a species with a large genome, like us, then a typical transcription factor binding site of 6 bp will be present, by chance alone, at roughly 800,000 positions. Not all of those sites will be bound by the transcription factor in vivo because some of the DNA will be tightly wrapped up in dense chromatin domains. Nevertheless, an appreciable percentage of the genome will be available for binding, so typical ENCODE assays detect thousands of binding sites for each transcription factor.

This information appears in all the best textbooks and it used to be a standard part of undergraduate courses in molecular biology and biochemistry. As far as I can tell, the current generation of new biochemistry researchers wasn't taught this information.
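
For anyone who wants to check the ~800,000 figure, here is a minimal back-of-the-envelope sketch in Python. It assumes a haploid genome of 3.2 billion bp and equal base frequencies, which is obviously a simplification of real genomes:

  # Expected number of chance occurrences of a 6 bp recognition sequence in a
  # genome our size, assuming equal base frequencies (0.25 each). Real genomes
  # are GC-biased and repetitive, so this is an order-of-magnitude sketch only.
  genome_size = 3.2e9          # bp, haploid human genome
  site_length = 6              # typical core recognition sequence for a TF

  p_match = 0.25 ** site_length            # probability of a match at one position
  one_strand = genome_size * p_match
  both_strands = 2 * one_strand            # non-palindromic sites can occur on either strand

  print(f"P(match at a given position): {p_match:.2e}")
  print(f"Expected matches on one strand: {one_strand:,.0f}")
  print(f"Expected matches on both strands: {both_strands:,.0f}")
  # ~780,000 on a single strand -- essentially the ~800,000 figure quoted above.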

Jonathan Wells talks about junk DNA

Watch this video. It dates from this year. Almost everything Wells says is either false or misleading. Why? Is he incapable of learning about genomes, junk DNA, and evolutionary theory?



Some of my former students

Some of my former students were able to come to my retirement reception yesterday: Sean Blaine (left), Anna Gagliardi, Marc Perry.




Hot slash buns

I love hot "cross" buns but now we buy the atheist version.



Wednesday, June 21, 2017

John Mattick still claims that most lncRNAs are functional

Most of the human genome is transcribed at some time or another in some tissue or another. The phenomenon is now known as pervasive transcription. Scientists have known about it for almost half a century.

At first the phenomenon seemed really puzzling since it was known that coding regions accounted for less than 1% of the genome and genetic load arguments suggested that only a small percentage of the genome could be functional. It was also known that more than half the genome consists of repetitive sequences that we now know are bits and pieces of defective transposons. It seemed unlikely back then that transcripts of defective transposons could be functional.
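
The genetic load argument can be made concrete with a cartoon calculation along the following lines. This is a sketch in Python; every parameter is an assumed round number chosen for illustration, not a measured value, and the real argument is more careful about mutation-selection balance:

  # Cartoon version of the genetic load argument: if a fraction of the genome
  # is functional, mutations landing in that fraction can be deleterious, and
  # the number of new deleterious mutations per generation must stay low enough
  # for selection to remove them. All values below are assumed round numbers.
  mutation_rate = 1.0e-8       # new mutations per bp per generation (assumed)
  genome_size = 3.2e9          # bp, haploid human genome
  new_mutations = 2 * mutation_rate * genome_size   # per diploid zygote, ~64

  deleterious_fraction = 0.5   # assumed fraction of mutations hitting functional
                               # DNA that are deleterious
  tolerable_load = 1.0         # assumed tolerable new deleterious mutations per
                               # zygote per generation

  max_functional = tolerable_load / (new_mutations * deleterious_fraction)
  print(f"New mutations per zygote: ~{new_mutations:.0f}")
  print(f"Functional fraction consistent with that load: <= {max_functional:.0%}")
  # ~3% with these assumptions -- the same ballpark as the conclusion that only
  # a small percentage of the genome can be functional.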

Part of the problem was solved with the discovery of RNA processing, especially splicing. It soon became apparent (by the early 1980s) that a typical protein-coding gene was stretched out over 37,000 bp, of which only about 1,300 bp was coding region. The rest was introns, and intron sequences appeared to be mostly junk.

Tuesday, June 20, 2017

On the evolution of duplicated genes: subfunctionalization vs neofunctionalization

New genes can arise by gene duplication. These events are quite common on an evolutionary time scale. In the current human population, for example, there are about 100 examples of polymorphic gene duplications. These are cases where some of us have two copies of a gene while others have only one copy (Zarrei et al., 2015). Humans have gained about 700 new genes by duplication and fixation since we diverged from chimpanzees (Demuth et al., 2006). The average rate of duplication in eukaryotes is about 0.01 events per gene per million years and the half-life of a duplicated gene is about 4 million years (Lynch and Conery, 2003).
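
Those numbers can be combined into a rough consistency check. This is a back-of-the-envelope sketch in Python using the approximate figures above; it treats the loss of duplicates as simple exponential decay, which is a crude assumption:

  import math

  # Back-of-the-envelope check on the duplication figures quoted above.
  n_genes = 20_000         # protein-coding genes
  birth_rate = 0.01        # duplications per gene per million years
  half_life_my = 4.0       # half-life of a new duplicate, million years
  divergence_my = 6.0      # assumed time since the human-chimp split, million years

  births_per_my = n_genes * birth_rate              # ~200 new duplicates per My
  total_births = births_per_my * divergence_my      # ~1200 since the split

  # Crude survival estimate: exponential decay at the quoted half-life,
  # averaged over duplicates arising at a uniform rate since the split.
  decay = math.log(2) / half_life_my
  surviving_fraction = (1 - math.exp(-decay * divergence_my)) / (decay * divergence_my)
  survivors = total_births * surviving_fraction

  print(f"New duplicates per million years: ~{births_per_my:.0f}")
  print(f"Duplicates arising since the split: ~{total_births:.0f}")
  print(f"Crude estimate still surviving today: ~{survivors:.0f}")
  # A few hundred survivors -- the same order of magnitude as the ~700 new genes
  # gained since the human-chimp divergence mentioned above.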

The typical fate of these duplicated genes is to "die" by mutation or deletion. There are five possible fates [see Birth and death of genes in a hybrid frog genome]:
  1. One of the genes will "die" by acquiring fatal mutations. It becomes a pseudogene.
  2. One of the genes will die by deletion.
  3. Both genes will survive because having extra gene product (e.g. protein) will be beneficial (gene dosage).
  4. One of the genes acquires a new beneficial mutation that creates a new function and at the same time causes loss of the old function (neofunctionalization). Now both genes are retained by positive selection and the complexity of the genome has increased.
  5. Both genes acquire mutations that diminish function so the genome now needs two copies of the gene in order to survive (subfunctionalization).

Monday, June 19, 2017

Austin Hughes and Neutral Theory

Austin Hughes (1949–2015) died a few years ago. He was one of my favorite evolutionary biologists.

Chase Nelson has written a nice summary of Hughes' work at: Austin L. Hughes: The Neutral Theory of Evolution. It's worth reading the first few pages if you aren't clear on the concept. Here's an excerpt ...
When the technology enabling the study of molecular polymorphisms—variations in the sequences of genes and proteins—first arose, a great deal more variability was discovered in natural populations than most evolutionary biologists had expected under natural selection. The neutral theory made the bold claim that these polymorphisms become prevalent through chance alone. It sees polymorphism and long-term evolutionary change as two aspects of the same phenomenon: random changes in the frequencies of alleles. While the neutral theory does not deny that natural selection may be important in adaptive evolutionary change, it does claim that natural selection accounts for a very small fraction of genetic evolution.

A dramatic consequence now follows. Most evolutionary change at the genetic level is not adaptive.

It is difficult to imagine random changes accomplishing so much. But random genetic drift is now widely recognized as one of the most important mechanisms of evolution.
I don't think there's any doubt that this claim is correct as long as you stick to the proper definition of evolution. The vast majority of fixations of alleles are likely due to random genetic drift and not natural selection.

If you don't understand this then you don't understand evolution.

The only quibble I have with the essay is the reference to "Neutral Theory of Evolution" as the antithesis of "Darwinian Evolution" or evolution by natural selection. I think "Neutral Theory" should be restricted to the idea that many alleles are neutral or nearly neutral. These alleles can change in frequency in a population by random genetic drift. The key anti-Darwinian idea combines that fact with two other important facts:
  1. New beneficial alleles can be lost by drift before they ever become fixed. In fact, this is the fate of most new beneficial alleles. It's part of the drift-barrier hypothesis.
  2. Detrimental alleles can occasionally become fixed in a population due to drift.
In both cases, the alleles are not neutral. The key to understanding the overall process is random genetic drift, not the idea of neutral alleles, although neutrality is also important.
Originally proposed by Motoo Kimura, Jack King, and Thomas Jukes, the neutral theory of molecular evolution is inherently non-Darwinian. Darwinism asserts that natural selection is the driving force of evolutionary change. It is the claim of the neutral theory, on the other hand, that the majority of evolutionary change is due to chance.
I would just add that it's Neutral Theory PLUS the other effects of random genetic drift that make evolution much more random than most people believe.
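
For readers who want to see these effects rather than take them on faith, here is a minimal Wright-Fisher simulation. It's my own illustrative sketch (not from the essay), assuming a small population of N = 100 diploids and simple genic selection, and it estimates the probability that a single new allele copy eventually reaches fixation under neutrality, with a small advantage, and with a small disadvantage:

  import numpy as np

  rng = np.random.default_rng(1)

  def fixation_probability(N=100, s=0.0, trials=20_000):
      """Fraction of trials in which a single new allele copy reaches fixation
      in a Wright-Fisher population of N diploids (2N gene copies)."""
      two_n = 2 * N
      fixed = 0
      for _ in range(trials):
          count = 1                                      # one new mutant copy
          while 0 < count < two_n:
              p = count / two_n
              w = p * (1 + s) / (p * (1 + s) + (1 - p))  # frequency after selection
              count = rng.binomial(two_n, w)             # drift to the next generation
          fixed += (count == two_n)
      return fixed / trials

  print("neutral     (s =  0.00):", fixation_probability(s=0.00))   # expect ~1/(2N) = 0.005
  print("beneficial  (s = +0.01):", fixation_probability(s=0.01))   # expect ~2s   = 0.02
  print("deleterious (s = -0.01):", fixation_probability(s=-0.01))  # rare, but not zero
  # Most new beneficial alleles are lost by drift (~98% of trials here), neutral
  # alleles fix at a rate equal to their starting frequency, and even mildly
  # deleterious alleles occasionally drift all the way to fixation.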

Austin Hughes was a skeptic and a creative thinker who often disagreed with the prevailing dogma in the field of evolutionary biology. He was also very religious, a fact I find very puzzling.

His scientific views were often correct, in my opinion.
In 2013, the ENCODE (Encyclopedia of DNA Elements) Project published results suggesting that eighty per cent of the human genome serves some function. This was considered a rebuttal to the widely held view that a large part of the genome was junk, debris collected over the course of evolution. Hughes sided with his friend Dan Graur in rejecting this point of view. Their argument was simple. Only ten per cent of the human genome shows signs of purifying selection, as opposed to neutrality.


Saturday, June 17, 2017

I coulda been an astronomer

A long time ago I used to belong to the Royal Astronomical Society (amateur astronomers) in Ottawa (Canada). That's me on the right with some of my friends. We were testing our sun filters and getting ready to see Venus when the sun went down.

In spite of this promising beginning, I decided to go into biology because it was harder and more interesting.





Tuesday, June 06, 2017

June 6, 1944

Today is the anniversary of D-Day, the day British, Canadian, and American troops landed on the beaches of Normandy.1

For baby boomers it means a day of special significance for our parents. In my case, it was my father who took part in the invasion. That's him on the right as he looked in 1944. He was an RAF pilot flying rocket-firing Typhoons in close support of the ground troops. During the initial days, his missions were limited to quick strikes and reconnaissance since Normandy was at the limit of their range from southern England. During the second week of the invasion (June 14th), his squadron landed in Crépon, Normandy, and things became very hectic from then on, with several close-support missions every day.

Stephen Meyer "predicts" there's no junk DNA

Here's an interview with Stephen Meyer on the Evolution 2.0 website: Stephen Meyer Debates Perry Marshall – Intelligent Design vs. Evolution 2.0. I'm posting some remarks by Stephen Meyer in order to preserve them for posterity. Meyer should know by now that the evidence for junk DNA is very solid and the ENCODE declarations are wrong. The fact that he persists in spreading false information about the ID "prediction" is revealing.