Showing posts with label Gene Expression. Show all posts

Sunday, August 25, 2024

Some transcription factors can be both activators and repressors! Textbooks have been saying this for decades

This is another post about a bad press release based on a lack of knowledge of the history of the field.

Here's the press release from Washington State University as reported in SciTechDaily

Scientists Discover “Spatial Grammar” in DNA: Breakthrough Could Rewrite Genetics Textbooks

“Contrary to what you will find in textbooks, transcription factors that act as true activators or repressors are surprisingly rare,” said WSU assistant professor Sascha Duttke, who led much of the research at WSU’s School of Molecular Biosciences in the College of Veterinary Medicine.

Rather, the scientists found that most activators can also function as repressors.

“If you remove an activator, your hypothesis is you lose activation,” said Bayley McDonald, a WSU graduate student who was part of the research team. “But that was true in only 50% to 60% of the cases, so we knew something was off.”

Looking closer, researchers found the function of many transcription factors was highly position-dependent.

They discovered that the spacing between transcription factors and their position relative to where a gene’s transcription began determined the level of gene activity. For example, transcription factors might activate gene expression when positioned upstream or ahead of where a gene’s transcription begins but inhibit its activity when located downstream, or after a gene’s transcription start site.

... By integrating this newly discovered ‘spatial grammar,’ Christopher Benner, associate professor at UC San Diego, anticipates scientists can gain a deeper understanding of how mutations or genetic variations can affect gene expression and contribute to disease.

”The potential applications are vast,” Benner said. “At the very least, it will change the way scientists study gene expression.”

Friday, April 01, 2022

Illuminating dark matter in human DNA?

A few months ago, the press office of the University of California at San Diego issued a press release with a provocative title ...

Illuminating Dark Matter in Human DNA - Unprecedented Atlas of the "Book of Life"

The press release was posted on several prominent science websites and Facebook groups. According to the press release, much of the human genome remains mysterious (dark matter) even 20 years after it was sequenced. According to the senior author of the paper, Bing Ren, we still don't understand how genes are expressed and how they might go awry in genetic diseases. He says,

A major reason is that the majority of the human DNA sequence, more than 98 percent, is non-protein-coding, and we do not yet have a genetic code book to unlock the information embedded in these sequences.

We've heard that story before and it's getting very boring. We know that 90% of our genome is junk, about 1% encodes proteins, and another 9% contains lots of functional DNA sequences, including regulatory elements. We've known about regulatory elements for more than 50 years so there's nothing mysterious about that component of noncoding DNA.
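As a rough back-of-the-envelope illustration, the percentages above can be converted into base pairs. This is only a sketch: the genome size of 3.1 billion bp is an assumed round figure that doesn't appear in this post, and the category names are just labels for the three fractions quoted above.

```python
# Assumed haploid human genome size in base pairs (round figure, not from the post).
GENOME_BP = 3_100_000_000

# Fractions quoted in the paragraph above.
FRACTIONS = {
    "junk": 0.90,
    "protein-coding": 0.01,
    "other functional (incl. regulatory)": 0.09,
}

def partition(genome_bp, fractions):
    """Return the number of base pairs assigned to each category."""
    return {name: round(genome_bp * frac) for name, frac in fractions.items()}

sizes = partition(GENOME_BP, FRACTIONS)
for name, bp in sizes.items():
    print(f"{name}: {bp / 1e6:,.0f} Mb")
```

Even on these rough numbers, the "other functional" fraction is several times larger than the protein-coding fraction, which is the point of the paragraph: noncoding is not the same thing as junk.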

Monday, May 10, 2021

MIT Professor Rick Young doesn't understand junk DNA

Richard ("Rick") Young is a Professor of Biology at the Massachusetts Institute of Technology and a member of the Whitehead Institute. His area of expertise is the regulation of gene expression in eukaryotes.

He was interviewed by Jorge Conde and Hanne Winarsky on a recent podcast (Feb. 1, 2021) where the main topic was "From Junk DNA to an RNA Revolution." They get just about everything wrong when they talk about junk DNA including the Central Dogma, historical estimates of the number of genes, confusing noncoding DNA with junk, alternative splicing, the number of functional RNAs, the amount of regulatory DNA, and assuming that scientists in the 1970s were idiots.

In this episode, a16z General Partner Jorge Conde and Bio Eats World host Hanne Winarsky talk to Professor Rick Young, Professor of Biology and head of the Young Lab at MIT—all about “junk” DNA, or non-coding DNA.

Which, it turns out—spoiler alert—isn’t junk at all. Much of this so-called junk DNA actually encodes RNA—which we now know has all sorts of incredibly important roles in the cell, many of which were previously thought of as only the domain of proteins. This conversation is all about what we know about what that non-coding genome actually does: how RNA works to regulate all kinds of different gene expression, cell types, and functions; how this has dramatically changed our understanding of how disease arises; and most importantly, what this means we can now do—programming cells, tuning functions up or down, or on or off. What we once thought of as “junk” is now giving us a powerful new tool in intervening in and treating disease—bringing in a whole new category of therapies.

Here's what I don't understand. How could a prominent scientist at one of the best universities in the world be so ignorant of a topic he chooses to discuss on a podcast? Perhaps you could excuse a busy scientist who doesn't have the time to research the topic but what excuse can you offer to explain why the entire culture at MIT and the Whitehead must also be ignorant? Does nobody there ever question their own ideas? Do they only read the papers that support their views and ignore all those that challenge those views?

This is a very serious question. It's the most difficult question I discuss in my book. Why has the false narrative about junk DNA, and many other things, dominated the scientific literature and become accepted dogma among leading scientists? Something is seriously wrong with science.


Saturday, August 01, 2020

ENCODE 3: A lesson in obfuscation and opaqueness

The Encyclopedia of DNA Elements (ENCODE) is a large-scale, and very expensive, attempt to map all of the functional elements in the human genome.

The preliminary study (ENCODE 1) was published in 2007 and the main publicity campaign surrounding that study focused on the fact that much of the human genome was transcribed. The implication was that most of the genome is functional. [see: The ENCODE publicity campaign of 2007].

The ENCODE 2 results were published in 2012 and the publicity campaign emphasized that up to 80% of our genome is functional. Many stories in the popular press touted the death of junk DNA. [see: What did the ENCODE Consortium say in 2012]

Both of these publicity campaigns, and the published conclusions, were heavily criticized for not understanding the distinction between fortuitous transcription and real genes and for not understanding the difference between fortuitous binding sites and functional binding sites. Hundreds of knowledgeable scientists pointed out that it was ridiculous for ENCODE researchers to claim that most of the human genome is functional based on their data. They also pointed out that ENCODE researchers ignored most of the evidence supporting junk DNA.

ENCODE 3 has just been published and the hype has been toned down considerably. Take a look at the main publicity article just published by Nature (ENCODE 3). The Nature article mentions ENCODE 1 and ENCODE 2 but it conveniently ignores the fact that Nature heavily promoted the demise of junk DNA back in 2007 and 2012. The emphasis now is not on how much of the genome is functional—the main goal of ENCODE—but on how much data has been generated and how many papers have been published. You can read the entire article and not see any mention of previous ENCODE/Nature claims. In fact, they don't even tell you how many genes ENCODE found or how many functional regulatory sites were detected.

The News and Views article isn't any better (Expanded ENCODE delivers invaluable genomic encyclopedia). Here's the opening paragraph of that article ...
Less than 2% of the human genome encodes proteins. A grand challenge for genomic sciences has been mapping the functional elements — the regions that determine the extent to which genes are expressed — in the remaining 98% of our DNA. The Encyclopedia of DNA Elements (ENCODE) project, among other large collaborative efforts, was established in 2003 to create a catalogue of these functional elements and to outline their roles in regulating gene expression. In nine papers in Nature, the ENCODE consortium delivers the third phase of its valuable project.1
You'd think with such an introduction that you would be about to learn how much of the genome is functional according to ENCODE 3 but you will be disappointed. There's nothing in that article about the number of genes, the number of regulatory sites, or the number of other functional elements in the human genome. It's almost as if Nature wants to tell you about all of the work involved in "mapping the functional elements" without ever describing the results and conclusions. This is in marked contrast to the Nature publicity campaigns of 2007 and 2012 where they were more than willing to promote the (incorrect) conclusions.

In 2020 Nature seems to be more interested in obfuscation and opaqueness. One other thing is certain: the Nature editors and writers aren't the least bit interested in discussing their previous claims about 80% of the genome being functional!

I guess we'll have to rely on the ENCODE Consortium itself to give us a summary of their most recent findings. The summary paper has an intriguing title (Perspectives on ENCODE) that almost makes you think they will revisit the exaggerated claims of 2007 and 2012. No such luck. However, we do learn a little bit about the human genome.
  • 20,225 protein-coding genes [almost 1000 more than the best published estimates - LAM]
  • 37,595 noncoding genes [I strongly doubt they have evidence for that many functional genes]
  • 2,157,387 open chromatin regions [what does this mean?]
  • 1,224,154 transcription factor binding sites [how many are functional?]
That's it. The ENCODE Consortium seems to have learned only two things in 2012. They learned that it's better to avoid mentioning how much of the genome is functional in order to avoid controversy and criticism and they learned that it's best to ignore any of their previous claims for the same reason. This is not how science is supposed to work but the ENCODE Consortium has never been good at showing us how science is supposed to work.
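For a sense of scale, the counts in the list above can be turned into simple per-gene ratios. This is just arithmetic on the reported numbers; it says nothing about how many of the sites are functional, and the variable names are mine, not ENCODE's.

```python
# Counts reported in the ENCODE 3 summary, as listed above.
PROTEIN_CODING_GENES = 20_225
NONCODING_GENES = 37_595
OPEN_CHROMATIN_REGIONS = 2_157_387
TF_BINDING_SITES = 1_224_154

def per_gene(count, genes):
    """Average number of reported elements per gene."""
    return count / genes

all_genes = PROTEIN_CODING_GENES + NONCODING_GENES

sites_per_coding_gene = per_gene(TF_BINDING_SITES, PROTEIN_CODING_GENES)
sites_per_any_gene = per_gene(TF_BINDING_SITES, all_genes)
open_regions_per_coding_gene = per_gene(OPEN_CHROMATIN_REGIONS, PROTEIN_CODING_GENES)

print(f"TF binding sites per protein-coding gene: {sites_per_coding_gene:.1f}")
print(f"TF binding sites per gene (all reported): {sites_per_any_gene:.1f}")
print(f"Open chromatin regions per protein-coding gene: {open_regions_per_coding_gene:.1f}")
```

The ratios are raw averages. Whether dozens of sites per gene are biologically meaningful is exactly the question the bracketed comments raise.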

Note: I've looked at some of the papers to try and find out if ENCODE stands by its previous claim that most of the genome is functional but they all seem to be written in a way that avoids committing to such a percentage or addressing the criticisms from 2007 and 2012. The only exception is a paper stating that cis-regulatory elements occupy 7.9% of the human genome (Expanded encyclopaedias of DNA elements in the human and mouse genomes). Please let me know if you come across anything interesting in those papers.


1. Isn't it about time to stop dwelling on the fact that 2% (actually less than 1%) of our genome encodes protein? We've known for decades that there are all kinds of other functional regions of the genome. No knowledgeable scientist thinks that the remaining 98% (99%) has no function.

Saturday, July 11, 2020

The coronavirus life cycle

The coronavirus life cycle is depicted in a figure from Fung and Liu (2019). See below for a brief description.
The virus particle attaches to receptors on the cell surface (mostly ACE2 in the case of SARS-CoV-2). It is taken into the cell by endocytosis and then the viral membrane fuses with the host membrane releasing the viral RNA. The viral RNA is translated to produce the 1a and 1ab polyproteins, which are cleaved to produce 16 nonstructural proteins (nsps). Most of the nsps assemble to form the replication-transcription complex (RTC). [see Structure and expression of the SARS-CoV-2 (coronavirus) genome]

RTC transcribes the original (+) strand creating (-) strands that are subsequently copied to make more viral (+) strands. RTC also produces a cluster of nine (-) strand subgenomic RNAs (sgRNAs) that are transcribed to make (+) sgRNAs that serve as mRNAs for the production of the structural proteins. N protein (nucleocapsid) binds to the viral (+) strand RNAs to help form new viral particles. The other structural proteins are synthesized in the endoplasmic reticulum (ER) where they assemble to form the protein-membrane virus particle that engulfs the viral RNA.

New virus particles are released when the vesicles fuse with the plasma membrane.

The entire life cycle takes about 10-16 hours and about 100 new virus particles are released before the cell commits suicide by apoptosis.


Fung, T.S. and Liu, D.X. (2019) Human coronavirus: host-pathogen interaction. Annual review of microbiology 73:529-557. [doi: 10.1146/annurev-micro-020518-115759]


Thursday, July 09, 2020

Structure and expression of the SARS-CoV-2 (coronavirus) genome


Coronaviruses are RNA viruses, which means that their genome is RNA, not DNA. All of the coronaviruses have similar genomes but I'm sure you are mostly interested in SARS-CoV-2, the virus that causes COVID-19. The first genome sequence of this virus was determined by Chinese scientists in early January and it was immediately posted on a public server [GenBank MN908947]. The viral RNA came from a patient in intensive care at the Wuhan Yin-Tan Hospital (China). The paper was accepted on Jan. 20th and it appeared in the Feb. 3rd issue of Nature (Zhou et al. 2020).

By the time the paper came out, several universities and pharmaceutical companies had already constructed potential therapeutics and several others had already cloned the genes and were preparing to publish the structures of the proteins.1

By now there are dozens and dozens of sequences of SARS-CoV-2 genomes from isolates in every part of the world. They are all very similar because the mutation rate in these RNA viruses is not high (about 10⁻⁶ per nucleotide per replication). The original isolate has a total length of 29,891 nt not counting the poly(A) tail. Note that these RNA viruses are about four times larger than a typical retrovirus; they are the largest known RNA viruses.
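To see why the isolates are so similar, here's a quick sketch of what that mutation rate implies for a genome of this length. It uses only the two numbers quoted above and assumes, for simplicity, that errors at different nucleotides are independent; the function names are just for illustration.

```python
MUTATION_RATE = 1e-6    # per nucleotide per replication (from the text)
GENOME_LENGTH = 29_891  # nt, original isolate, excluding the poly(A) tail (from the text)

def expected_mutations(rate, length):
    """Expected number of new mutations per genome copy."""
    return rate * length

def prob_error_free(rate, length):
    """Probability that a genome copy has no new mutations (independent sites assumed)."""
    return (1 - rate) ** length

mu = expected_mutations(MUTATION_RATE, GENOME_LENGTH)
p_clean = prob_error_free(MUTATION_RATE, GENOME_LENGTH)

print(f"Expected mutations per replication: {mu:.3f}")
print(f"Fraction of copies with no new mutation: {p_clean:.3f}")
```

Under these assumptions the great majority of genome copies carry no new mutation at all, which fits the observation that isolates from around the world are nearly identical.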

Monday, April 01, 2019

The frequency of splicing errors reflects the balance between selection and drift

Splice variants are very common in eukaryotes. We know that it's possible to detect dozens of different splice variants for each gene with multiple introns. In the past, these variants were thought to be examples of differential regulation by alternative splicing but we now know that most of them are due to splicing errors. Most of the variants have been removed from the sequence databases but many remain and they are annotated as examples of alternative splicing, which implies that they have a biological function.

I have blogged about splice variants many times, noting that alternative splicing is a very real phenomenon but it's probably restricted to just a small percentage of genes. Most of the splice variants that remain in the databases are probably due to splicing errors. They are junk RNA [The persistent myth of alternative splicing].

The ongoing controversy over the origin of splice variants is beginning to attract attention in the scientific literature although it's fair to say that most scientists are still unaware of the controversy. They continue to believe that abundant alternative splicing is a real phenomenon and they don't realize that the data is more compatible with abundant splicing errors.

Some molecular evolution labs have become interested in the controversy and have devised tests of the two possibilities. I draw your attention to a paper that was published 18 months ago.

Friday, March 29, 2019

Are multiple transcription start sites functional or mistakes?

If you look in the various databases you'll see that most human genes have multiple transcription start sites. The evidence for the existence of these variants is solid—they exist—but it's not clear whether the minor start sites are truly functional or whether they are just due to mistakes in transcription initiation. They are included in the databases because annotators are unable to distinguish between these possibilities.

Let's look at the entry for the human triosephosphate isomerase gene (TPI1; Gene ID 7167).


The correct mRNA is NM_0003655, third from the top. (Trust me on this!). The three other variants have different transcription start sites: two of them are upstream and one is downstream of the major site. Are these variants functional or are they simply transcription initiation errors? This is the same problem that we dealt with when we looked at splice variants. In that case I concluded that most splice variants are due to splicing errors and true alternative splicing is rare.

Saturday, December 08, 2018

The persistent myth of alternative splicing

I'm convinced that widespread alternative splicing does not occur in humans or in any other species. It's true that the phenomenon exists but it's restricted to a small number of genes—probably fewer than 1000 genes in humans. Most of the unusual transcripts detected by modern technology are rare and unstable, which is consistent with the idea that they are due to splicing errors. Genome annotators have rejected almost all of those transcripts.

You can see links to my numerous posts on this topic at: Alternative splicing and the gene concept and Are splice variants functional or noise?.

Wednesday, December 05, 2018

The textbook view of alternative splicing

As most of you know, I'm interested in the problem of alternative splicing. I believe that the number of splice variants that have been detected is perfectly consistent with the known rate of splicing errors and that there's no significant evidence to support the claim that alternative splicing leading to the production of biologically relevant protein variants is widespread. In fact, there's plenty of evidence for the opposite view; namely, splicing errors (lack of conservation, low abundance, improbable protein predictions, inability to detect the predicted proteins).

My preferred explanation is definitely the minority view. What puzzles me is not the fact that the majority is wrong but the fact that they completely ignore any other explanation of the data and consider the case for abundant alternative splicing to be settled.

Tuesday, March 13, 2018

Making Sense of Genes by Kostas Kampourakis

Kostas Kampourakis is a specialist in science education at the University of Geneva, Geneva (Switzerland). Most of his book is an argument against genetic determinism in the style of Richard Lewontin. You should read this book if you are interested in that argument. The best way to describe the main thesis is to quote from the last chapter.

Here is the take-home message of this book: Genes were initially conceived as immaterial factors with heuristic values for research, but along the way they acquired a parallel identity as DNA segments. The two identities never converged completely, and therefore the best we can do so far is to think of genes as DNA segments that encode functional products. There are neither 'genes for' characters nor 'genes for' diseases. Genes do nothing on their own, but are important resources for our self-regulated organism. If we insist in asking what genes do, we can accept that they are implicated in the development of characters and disease, and that they account for variation in characters in particular populations. Beyond that, we should remember that genes are part of an interactive genome that we have just begun to understand, the study of which has various limitations. Genes are not our essences, they do not determine who we are, and they are not the explanation of who we are and what we do. Therefore we are not the prisoners of any genetic fate. This is what the present book has aimed to explain.

Friday, February 09, 2018

Are splice variants functional or noise?

This is a post about alternative splicing. I've avoided using that term in the title because it's very misleading. Alternative splicing produces a number of different products (RNA or protein) from a single intron-containing gene. The phenomenon has been known for 35 years and there are quite a few very well-studied examples, including several where all of the splice regulatory factors have been characterized.

Monday, February 05, 2018

ENCODE's false claims about the number of regulatory sites per gene

Some beating of dead horses may be ethical, where here and there they display unexpected twitches that look like life.

Zuckerkandl and Pauling (1965)

I realize that most of you are tired of seeing criticisms of ENCODE but it's important to realize that most scientists fell hook-line-and-sinker for the ENCODE publicity campaign and they still don't know that most of the claims were ridiculous.

I was reminded of this when I re-read Brendan Maher's summary of the ENCODE results that were published in Nature on Sept. 6, 2012 (Maher, 2012). Maher's article appeared in the front section of the ENCODE issue.1 With respect to regulatory sequences he said ...
The consortium has assigned some sort of function to roughly 80% of the genome, including more than 70,000 ‘promoter’ regions — the sites, just upstream of genes, where proteins bind to control gene expression — and nearly 400,000 ‘enhancer’ regions that regulate expression of distant genes ... But the job is far from done, says [Ewan] Birney, a computational biologist at the European Molecular Biology Laboratory’s European Bioinformatics Institute in Hinxton, UK, who coordinated the data analysis for ENCODE. He says that some of the mapping efforts are about halfway to completion, and that deeper characterization of everything the genome is doing is probably only 10% finished.

Saturday, February 03, 2018

What's in Your Genome?: Chapter 5: Regulation and Control of Gene Expression

I'm working (slowly) on a book called What's in Your Genome?: 90% of your genome is junk! The first chapter is an introduction to genomes and DNA [What's in Your Genome? Chapter 1: Introducing Genomes]. Chapter 2 is an overview of the human genome. It's a summary of known functional sequences and known junk DNA [What's in Your Genome? Chapter 2: The Big Picture]. Chapter 3 defines "genes" and describes protein-coding genes and alternative splicing [What's in Your Genome? Chapter 3: What Is a Gene?]. Chapter 4 is all about pervasive transcription and genes for functional noncoding RNAs [What's in Your Genome? Chapter 4: Pervasive Transcription].

Chapter 5 is Regulation and Control of Gene Expression.
Chapter 5: Regulation and Control of Gene Expression

What do we know about regulatory sequences?
The fundamental principles of regulation were worked out in the 1960s and 1970s by studying bacteria and bacteriophage. The initiation of transcription is controlled by activators and repressors that bind to DNA near the 5′ end of a gene. These transcription factors recognize relatively short sequences of DNA (6-10 bp) and their interactions have been well-characterized. Transcriptional regulation in eukaryotes is more complicated for two reasons. First, there are usually more transcription factors and more binding sites per gene. Second, access to binding sites depends on the state of chromatin. Nucleosomes forming higher-order structures create a "closed" domain where DNA binding sites are not accessible. In "open" domains the DNA is more accessible and transcription factors can bind. The transition between open and closed domains is an important addition to regulating gene expression in eukaryotes.
The limitations of genomics
By their very nature, genomics studies look at the big picture. Such studies can tell us a lot about how many transcription factors bind to DNA and how much of the genome is transcribed. They cannot tell you whether the data actually reflects function. For that, you have to take a more reductionist approach and dissect the roles of individual factors on individual genes. But working on single genes can be misleading ... you may miss the forest for the trees. Genomic studies have the opposite problem: they may see a forest where there are no trees.
Regulation and evolution
Much of what we see in evolution, especially when it comes to phenotypic differences between species, is due to differences in the regulation of shared genes. The idea dates back to the 1930s and the mechanisms were worked out mostly in the 1980s. It's the reason why all complex animals should have roughly the same number of genes—a prediction that was confirmed by sequencing the human genome. This is the field known as evo-devo or evolutionary developmental biology.
           Box 5-1: Can complex regulation evolve by accident?
Slightly harmful mutations can become fixed in a small population. This may cause a gene to be transcribed less frequently. Subsequent mutations that restore transcription may involve the binding of an additional factor to enhance transcription initiation. The result is more complex regulation that wasn't directly selected.
Open and closed chromatin domains
Gene expression in eukaryotes is regulated, in part, by changing the structure of chromatin. Genes in domains where nucleosomes are densely packed into compact structures are essentially invisible. Genes in more open domains are easily transcribed. In some species, the shift between open and closed domains is associated with methylation of DNA and modifications of histones but it's not clear whether these associations cause the shift or are merely a consequence of the shift.
           Box 5-2: X-chromosome inactivation
In females, one of the X-chromosomes is preferentially converted to a heterochromatic state where most of the genes are in closed domains. Consequently, many of the genes on the X chromosome are only expressed from one copy as is the case in males. The partial inactivation of an X-chromosome is mediated by a small regulatory RNA molecule and this inactivated state is passed on to all subsequent descendants of the original cell.
           Box 5-3: Regulating gene expression by rearranging the genome

In several cases, the regulation of gene expression is controlled by rearranging the genome to bring a gene under the control of a new promoter region. Such rearrangements also explain some developmental anomalies such as the growth of legs instead of antennae on the heads of fruit flies. They also account for many cancers.
ENCODE does it again
Genomic studies carried out by the ENCODE Consortium reported that a large percentage of the human genome is devoted to regulation. What the studies actually showed is that there are a large number of binding sites for transcription factors. ENCODE did not present good evidence that these sites were functional.
Does regulation explain junk?
The presence of huge numbers of spurious DNA binding sites is perfectly consistent with the view that 90% of our genome is junk. The idea that a large percentage of our genome is devoted to transcriptional regulation is inconsistent with everything we know from studies of individual genes.
           Box 5-4: A thought experiment
Ford Doolittle asks us to imagine the following thought experiment. Take the fugu genome, which is very much smaller than the human genome, and the lungfish genome, which is very much larger, and subject them to the same ENCODE analysis that was performed on the human genome. All three genomes have approximately the same number of genes and most of those genes are homologous. Will the number of transcription factor binding sites be similar in all three species or will the number correlate with the size of the genomes and the amount of junk DNA?
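One way to frame the junk-DNA side of this question is to ask how many matches to a short binding motif you'd expect by chance alone. The sketch below assumes a random sequence with equal base frequencies and counts both strands. The genome sizes are placeholder round figures chosen for illustration (the box only says "very much smaller" and "very much larger"), and the function name is hypothetical.

```python
def expected_chance_sites(genome_bp, motif_len):
    """Expected chance matches to one fixed motif in random sequence.

    Assumes equal base frequencies and independent positions, counts both
    strands, and ignores edge effects and palindromic motifs.
    """
    return 2 * genome_bp / 4 ** motif_len

# Placeholder genome sizes in bp (assumptions, for illustration only).
GENOMES = {
    "fugu": 400_000_000,
    "human": 3_100_000_000,
    "lungfish": 40_000_000_000,
}

# Motif lengths spanning the 6-10 bp range typical of binding sites.
for name, size in GENOMES.items():
    for k in (6, 8, 10):
        print(f"{name:9s} k={k:2d}: {expected_chance_sites(size, k):,.0f}")
```

Under these assumptions the expected number of chance matches scales linearly with genome size and doesn't depend on the number of genes at all. That is what you'd predict if most binding sites are spurious.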
Small RNAs—a revolutionary discovery?
Does the human genome contain hundreds of thousands of genes for small non-coding RNAs that are required for the complex regulation of the protein-coding genes?
A “theory” that just won’t die
"... we have refuted the specific claims that most of the observed transcription across the human genome is random and put forward the case over many years that the appearance of a vast layer of RNA-based epigenetic regulation was a necessary prerequisite to the emergence of developmentally and cognitively advanced organisms." (Mattick and Dinger, 2013)
What the heck is epigenetics?
Epigenetics is a confusing term. It refers loosely to the regulation of gene expression by factors other than differences in the DNA. It's generally assumed to cover things like methylation of DNA and modification of histones. Both of these effects can be passed on from one cell to the next following mitosis. That fact has been known for decades. It is not controversial. The controversy is about whether the heritability of epigenetic features plays a significant role in evolution.
           Box 5-5: The Weismann barrier
The Weismann barrier refers to the separation between somatic cells and the germ line in complex multicellular organisms. The "barrier" is the idea that changes (e.g. methylation, histone modification) that occur in somatic cells can be passed on to other somatic cells but in order to affect evolution those changes have to be transferred to the germ line. That's unlikely. It means that Lamarckian evolution is highly improbable in such species.
How should science journalists cover this story?
The question is whether a large part of the human genome is devoted to regulation thus accounting for an unexpectedly large genome. It's an explanation that attempts to refute the evidence for junk DNA. The issue is complex and very few science journalists are sufficiently informed to do it justice. They should, however, be making more of an effort to inform themselves about the controversial nature of the claims made by some scientists and they should be telling their readers that the issue has not yet been resolved.


Wednesday, January 31, 2018

Herding Hemingway's Cats by Kat Arney

Kat Arney has written a very good book on genes and gene expression. She covers all the important controversies in a thorough and thoughtful manner.

Kat Arney is a science writer based in the UK. She has a Ph.D. from the University of Cambridge where she worked on epigenetics and regulation in mice. She also did postdoc work at Imperial College in London. Her experience in the field of molecular biology and gene expression shows up clearly in her book where she demonstrates the appropriate skepticism and critical thinking in her coverage of the major advances in the field.

Tuesday, October 31, 2017

Escape from X chromosome inactivation

Mammals have two sex chromosomes: X and Y. Males have one X chromosome and one Y chromosome and females have two X chromosomes. Since females have two copies of each X chromosome gene, you might expect them to make twice as much gene product as males of the same species. In fact, males and females often make about the same amount of gene product because one of the female X chromosomes is inactivated by a mechanism that causes extensive chromatin condensation.

The mechanism is known as X chromosome inactivation. The phenomenon was originally discovered by Mary Lyon (1925-2014) [see Calico Cats].

Friday, August 25, 2017

How much of the human genome is devoted to regulation?

All available evidence suggests that about 90% of our genome is junk DNA. Many scientists are reluctant to accept this evidence—some of them are even unaware of the evidence [Five Things You Should Know if You Want to Participate in the Junk DNA Debate]. Many opponents of junk DNA suffer from what I call The Deflated Ego Problem. They are reluctant to concede that humans have about the same number of genes as all other mammals and only a few more than insects.

One of the common rationalizations is to speculate that while humans may have "only" 25,000 genes they are regulated and controlled in a much more sophisticated manner than the genes in other species. It's this extra level of control that makes humans special. Such speculations have been around for almost fifty years but they have gained in popularity since publication of the human genome sequence.

In some cases, the extra level of regulation is thought to be due to abundant regulatory RNAs. This means there must be tens of thousands of extra genes expressing these regulatory RNAs. John Mattick is the most vocal proponent of this idea and he won an award from the Human Genome Organization for "proving" that his speculation is correct! [John Mattick Wins Chen Award for Distinguished Academic Achievement in Human Genetic and Genomic Research]. Knowledgeable scientists know that Mattick is probably wrong. They believe that most of those transcripts are junk RNAs produced by accidental transcription at very low levels from non-conserved sequences.

Monday, June 26, 2017

Debating alternative splicing (Part III)

Proponents of massive alternative splicing argue that most human genes produce many different protein isoforms. According to these scientists, this means that humans can make about 100,000 different proteins from only ~20,000 protein-coding genes. They tend to believe humans are considerably more complex than other animals even though we have about the same number of genes. They think alternative splicing accounts for this complexity [see The Deflated Ego Problem].

Opponents (I am one) argue that most splice variants are due to splicing errors and most of those predicted protein isoforms don't exist. (We also argue that the differences between humans and other animals can be adequately explained by differential regulation of 20,000 protein-coding genes.) The controversy can only be resolved when proponents of massive alternative splicing provide evidence to support their claim that there are 100,000 functional proteins.

Wednesday, February 22, 2017

Sloppiness in translation initiation

There are two competing worldviews in the fields of biochemistry and molecular biology. The distinction was captured a few years ago by Laurence Hurst commenting on pervasive transcription when he said, "So there are two models; one, the world is messy and we're forever making transcripts we don't want. Or two, the genome is like the most exquisitely designed Swiss watch and we don't understand its working. We don't know the answer—which is what makes genomics so interesting." (Hopkins, 2009).

I refer to these two worldviews as the Swiss watch analogy and the Rube Goldberg analogy.

The distinction is important because, depending on your worldview, you will interpret things very differently. We see it in the debate over junk DNA where those in the Swiss watch category have trouble accepting that we could have a genome full of junk. Those in the Rube Goldberg category (I am one) tend to dismiss a lot of data as just noise or sloppiness.

Sunday, February 12, 2017

ENCODE workshop discusses function in 2015

A reader directed me to a 2015 ENCODE workshop with online videos of all the presentations [From Genome Function to Biomedical Insight: ENCODE and Beyond]. The workshop was sponsored by the National Human Genome Research Institute in Bethesda, Md (USA). The purpose of the workshop was ...

  1. Discuss the scientific questions and opportunities for better understanding genome function and applying that knowledge to basic biological questions and disease studies through large-scale genomics studies.
  2. Consider options for future NHGRI projects that would address these questions and opportunities.

The main controversy concerning the human genome is how much of it is junk DNA with no function. Since the purpose of ENCODE is to understand genome function, I expected a lively discussion about how to distinguish between functional elements and spurious nonfunctional elements.