
Friday, July 14, 2017

Revisiting the genetic load argument with Dan Graur

The genetic load argument is one of the oldest arguments for junk DNA and one of the most powerful reasons to conclude that most of our genome must be junk. The concept dates back to J.B.S. Haldane in the late 1930s, but the modern argument traditionally begins with Hermann Muller's classic paper from 1950. It has been extended and refined by him and many others since then (Muller, 1950; Muller, 1966).

Saturday, June 24, 2017

Debating alternative splicing (part II)

Mammalian genomes are very large. It looks like about 90% of a typical mammalian genome is junk DNA. These genomes are pervasively transcribed, meaning that almost 90% of the bases are complementary to a transcript produced at some time during development. I think most of those transcripts are due to inappropriate transcription initiation. They are mistakes in transcription. The genome is littered with transcription factor binding sites but only a small percentage are directly involved in regulating gene expression. The rest are due to spurious binding—a well-known property of DNA binding proteins. These conclusions are based, I believe, on a proper understanding of evolution and basic biochemistry.

If you add up all the known genes, they cover about 30% of the genome sequence. Most of this (>90%) is intron sequence and introns are mostly junk. The standard mammalian gene is transcribed to produce a precursor RNA that is subsequently processed by splicing out introns to produce a mature RNA. If it's a messenger RNA (mRNA) then it will be translated to produce a protein (technically, a polypeptide). So far, the vast majority of protein-coding genes produce a single protein but there are some classic cases of alternative splicing where a given gene produces several different protein isoforms, each of which has a specific function.

Thursday, June 22, 2017

Are most transcription factor binding sites functional?

The ongoing debate over junk DNA often revolves around data collected by ENCODE and others. The idea that most of our genome is transcribed (pervasive transcription) seems to indicate that genes occupy most of the genome. The opposing view is that most of these transcripts are accidental products of spurious transcription. We see the same opposing views when it comes to transcription factor binding sites. ENCODE and their supporters have mapped millions of binding sites throughout the genome and they believe this represents abundant and exquisite regulation. The opposing view is that most of these binding sites are spurious and non-functional.

The messy view is supported by many studies on the biophysical properties of transcription factor binding. These studies show that any DNA binding protein has a low affinity for random-sequence DNA. They will also bind with much higher affinity to sequences that resemble, but do not precisely match, the specific binding site [How RNA Polymerase Binds to DNA; DNA Binding Proteins]. If you take a species with a large genome, like us, then a typical protein-binding site of 6 bp will be present, by chance alone, at about 800,000 sites. Not all of those sites will be bound by the transcription factor in vivo because some of the DNA will be tightly wrapped up in dense chromatin domains. Nevertheless, an appreciable percentage of the genome will be available for binding, so typical ENCODE assays detect thousands of binding sites for each transcription factor.
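For anyone who wants to check the arithmetic, here's a minimal back-of-the-envelope sketch in Python. It assumes a random 3.2 Gb genome with equal base frequencies; real genomes are biased, so the exact number will differ, but the order of magnitude is the point.

```python
# Expected chance occurrences of a specific 6 bp sequence in a genome
# our size (assumes random sequence with equal base frequencies).
genome_size = 3.2e9            # human haploid genome, in base pairs
site_length = 6                # typical transcription factor binding site
p_match = 0.25 ** site_length  # chance that any given position matches

expected_sites = genome_size * p_match
print(f"{expected_sites:,.0f} sites expected by chance")  # ~781,000
```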

This information appears in all the best textbooks and it used to be a standard part of undergraduate courses in molecular biology and biochemistry. As far as I can tell, the current generation of new biochemistry researchers wasn't taught this information.

Wednesday, June 21, 2017

John Mattick still claims that most lncRNAs are functional

Most of the human genome is transcribed at some time or another in some tissue or another. The phenomenon is now known as pervasive transcription. Scientists have known about it for almost half a century.

At first the phenomenon seemed really puzzling since it was known that coding regions accounted for less than 1% of the genome and genetic load arguments suggested that only a small percentage of the genome could be functional. It was also known that more than half the genome consists of repetitive sequences that we now know are bits and pieces of defective transposons. It seemed unlikely back then that transcripts of defective transposons could be functional.

Part of the problem was solved with the discovery of RNA processing, especially splicing. It soon became apparent (by the early 1980s) that a typical protein-coding gene was stretched out over 37,000 bp, of which only 1,300 bp were coding region. The rest was introns, and intron sequences appeared to be mostly junk.

Thursday, May 18, 2017

Jonathan Wells illustrates zombie science by revisiting junk DNA

Jonathan Wells has written a new book (2017) called Zombie Science: More Icons of Evolution. He revisits his famous Icons of Evolution from 2000 and tries to show that nothing has changed in 17 years.

I wrote a book in 2000 about ten images, ten "icons of evolution," that did not fit the evidence and were empirically dead. They should have been buried, but they are still with us, haunting our science classrooms and stalking our children. They are part of what I call zombie science.
I won't bore you with the details. The icons fall into two categories: (1) those that were meaningless and/or trivial in 2000 and remain so today, and (2) those that Wells misunderstood in 2000 and are still misunderstood by creationists today.

Saturday, February 11, 2017

What did ENCODE researchers say on Reddit?

ENCODE researchers answered a bunch of questions on Reddit a few days ago. I asked them to give their opinion on how much junk DNA is in our genome but they declined to answer that question. However, I think we can get some idea about the current thinking in the leading labs by looking at the questions they did choose to answer. I don't think the picture is very encouraging. It's been almost five years since the ENCODE publicity disaster of September 2012. You'd think the researchers might have learned a thing or two about junk DNA since that fiasco.

The question and answer session on Reddit was prompted by the award of a new grant to ENCODE. They just received 31.5 million dollars to continue their search for functional regions in the human genome. You might have guessed that Dan Graur would have a few words to say about giving ENCODE even more money [Proof that 100% of the Human Genome is Functional & that It Was Created by a Very Intelligent Designer @ENCODE_NIH].

Thursday, February 09, 2017

NIH and UCSF ENCODE researchers are on Reddit right now!

Check out Science AMA Series: We’re Drs. Michael Keefer and James Kobie, infectious .... (Thanks to Paul Nelson for alerting me to the discussion.)

Here's part of the introduction ...
Yesterday NIH announced its latest round of ENCODE funding, which includes support for five new collaborative centers focused on using cutting edge techniques to characterize the candidate functional elements in healthy and diseased human cells. For example, when and where does an element function, and what exactly does it do.

UCSF is host to two of these five new centers, where researchers are using CRISPR gene editing, embryonic stem cells, and other new tools that let us rapidly screen hundreds of thousands of genome sequences in many different cell types at a time to learn which sequences are biologically relevant — and in what contexts they matter.

Today’s AMA brings together the leaders of NIH’s ENCODE project and the leaders of UCSF’s partner research centers.

Your hosts today are:

Nadav Ahituv, UCSF professor in the department of bioengineering and therapeutic sciences. Interested in gene regulation and how its alteration leads to morphological differences between organisms and human disease. Loves science and juggling.
Elise Feingold: Lead Program Director, Functional Genomics Program, NHGRI. I’ve been part of the ENCODE Project Management team since its start in 2003. I came up with the project’s name, ENCODE!
Dan Gilchrist, Program Director, Computational Genomics and Data Science, NHGRI. I joined the ENCODE Project Management team in 2014. Interests include mechanisms of gene regulation, using informatics to address biological questions, surf fishing.
Mike Pazin, Program Director, Functional Genomics Program, NHGRI. I’ve been part of the ENCODE Project Management team since 2011. My background is in chromatin structure and gene regulation. I love science, learning about how things work, and playing music.
Yin Shen: Assistant Professor in Neurology and Institute for Human Genetics, UCSF. I am interested in how genetics and epigenetics contribute to human health and diseases, especially for the human brain and complex neurological diseases. If I am not doing science, I like experimenting in the kitchen.

Thursday, January 19, 2017

The pervasive transcription controversy: 2002

I'm working on a chapter about pervasive transcription and how it relates to the junk DNA debate. I found a short review in Nature from 2002 so I decided to see how much progress we've made in the past 15 years.

Most of our genome is transcribed at some time or another in some tissue. That's a fact we've known about since the late 1960s (King and Jukes, 1969). We didn't know it back then, but it turns out that a lot of that transcription is intron sequence. In fact, the observation of abundant transcription led to the discovery of introns. We have about 20,000 protein-coding genes and the average gene is 37.2 kb in length. Thus, the total amount of the genome devoted to these genes is about 23%. That's the amount that's transcribed to produce primary transcripts and mRNA. There are about 5000 noncoding genes that contribute another 2%, so genes occupy about 25% of our genome.
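Here's the arithmetic behind those percentages, as a minimal sketch assuming a 3.2 Gb genome (the gene counts and average gene length are the ones quoted above).

```python
# Fraction of the genome occupied by genes, using the numbers in the post.
genome_size = 3.2e9                  # base pairs (standard estimate)
protein_coding_bp = 20_000 * 37_200  # 20,000 genes x 37.2 kb each

print(f"protein-coding genes: {protein_coding_bp / genome_size:.0%}")  # ~23%
print(f"plus ~2% noncoding genes: {protein_coding_bp / genome_size + 0.02:.0%}")  # ~25%
```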

Wednesday, December 14, 2016

The ENCODE publicity campaign of 2007

ENCODE1 published the results of a pilot project in 2007 (Birney et al., 2007). They looked at 1% (30 Mb) of the genome with a view to establishing their techniques and dealing with large amounts of data from many different groups. The goal was to "provide a more biologically informative representation of the human genome by using high-throughput methods to identify and catalogue the functional elements encoded."

The most striking result of this preliminary study was the confirmation of pervasive transcription. Here's what the ENCODE Consortium leaders said in the abstract,
Together, our results advance the collective knowledge about human genome function in several major areas. First, our studies provide convincing evidence that the genome is pervasively transcribed, such that the majority of its bases can be found in primary transcripts, including non-protein-coding transcripts, and those that extensively overlap with one another.
ENCODE concluded that 93% of the genome is transcribed in one tissue or another. There are two possible explanations that account for pervasive transcription.

Friday, December 09, 2016

Using conservation to determine whether splice variants are functional

We've been having a discussion about function and how to recognize it. This is important when it comes to determining how much junk is in our genome [see Restarting the function wars (The Function Wars Part V)]. There doesn't seem to be any consensus on how to define "function" although there's general agreement on using sequence conservation as a first step. If some sequence under investigation is conserved in other species then that's a good sign that it's under negative selection and has a biological function. What if it's not conserved? Does that rule out function? The correct answer is "no" because one can always come up with explanations/excuses for such an observation. We discussed the example of de novo genes, which, by definition, are not conserved.

Let's look at another example: splice variants. Splice variants are different forms of RNA produced from the same gene. If they are biologically relevant then they will produce different forms of the protein (for protein-coding genes). This is an example of alternative splicing if, and only if, relevance has been proven.

Tuesday, December 06, 2016

Restarting the function wars (The Function Wars Part V)

The term "function wars" refers to debates over the meaning of the word "function" in biology. It refers specifically to the discussion about junk DNA because junk DNA is defined as DNA that does not have a biological function. The wars were (re-)started when the ENCODE Consortium decided to use a stupid definition of function in order to prove that most of our genome was functional. This prompted a number of papers attempting to create a more meaningful definition.

None of them succeeded, in my opinion, because biology is messy and doesn't lend itself to precise definitions. Look how difficult it is to define a "gene," for example. Or "evolution."

Nevertheless, some progress was made. Dan Graur has recently posted a summary of the two most important definitions of function [What does “function” mean in the context of evolution & what absurd situations may arise by using the wrong definition?]. The two definitions are "selected-effect" and "causal-role" (there are synonyms).

Wednesday, August 03, 2016

More junk science in Science

The latest issue of the journal Science (Aug. 1, 2016) has an article on a recent paper by Aires et al. (2016) published in Developmental Cell. Here's the abstract of the paper ...

Vertebrates exhibit a remarkably broad variation in trunk and tail lengths. However, the evolutionary and developmental origins of this diversity remain largely unknown. Posterior Hox genes were proposed to be major players in trunk length diversification in vertebrates, but functional studies have so far failed to support this view. Here we identify the pluripotency factor Oct4 as a key regulator of trunk length in vertebrate embryos. Maintaining high Oct4 levels in axial progenitors throughout development was sufficient to extend trunk length in mouse embryos. Oct4 also shifted posterior Hox gene-expression boundaries in the extended trunks, thus providing a link between activation of these genes and the transition to tail development. Furthermore, we show that the exceptionally long trunks of snakes are likely to result from heterochronic changes in Oct4 activity during body axis extension, which may have derived from differential genomic rearrangements at the Oct4 locus during vertebrate evolution.
... those ignorant of history are not condemned to repeat it; they are merely destined to be confused.

Stephen Jay Gould
Ontogeny and Phylogeny (1977)
The results were written up by a freelance journalist named Diana Crow [‘Junk DNA’ tells mice—and snakes—how to grow a backbone]. She writes ...
‘Junk DNA’ tells mice—and snakes—how to grow a backbone

Why does a snake have 25 or more rows of ribs, whereas a mouse has only 13? The answer, according to a new study, may lie in "junk DNA," large chunks of an animal’s genome that were once thought to be useless. The findings could help explain how dramatic changes in body shape have occurred over evolutionary history.

Scientists began discovering junk DNA sequences in the 1960s. These stretches of the genome—also known as noncoding DNA—contain the same genetic alphabet found in genes, but they don’t code for the proteins that make us who we are. As a result, many researchers long believed this mysterious genetic material was simply DNA debris accumulated over the course of evolution. But over the past couple decades, geneticists have discovered that this so-called junk is anything but. It has important functions, such as switching genes on and off and setting the timing for changes in gene activity.
Sandwalk readers will see all the mistakes and misconceptions in these paragraphs. She's talking about regulatory sequences that were never, ever, thought to be junk. The paper being discussed has nothing to do with junk DNA and the results do not in any way alter our understanding of developmental gene regulation.

If you look carefully at the abstract, you'll see the word "heterochronic." This is one of Stephen Jay Gould's favorite words. He wrote about it in Ontogeny and Phylogeny.
I wish to emphasize one other distinction. Evolution occurs when ontogeny is altered in one of two ways: when new characters are introduced at any stage of development with varying effects upon subsequent stages, or when characters already present undergo changes in developmental timing. Together, these two processes exhaust the formal concept of phyletic change; the second process is heterochrony. [my emphasis ... LAM] If change in developmental timing is important in evolution, then this second process must be very common.
This was written in 1977—that's almost 40 years ago! These ideas were around for decades before Gould wrote his book1 and they have been shown to be correct by numerous studies in the 1980s.

What's going on here? Science is supposed to be one of the leading science journals. How could it publish an article that misrepresents the field so badly? Do the editors send these "Latest News" articles out for review?


1. Ed Lewis shared the Nobel Prize in 1995 for his contribution to "the genetic control of early embryonic development" [The Nobel Prize in Physiology or Medicine 1995].

Monday, July 11, 2016

A genetics professor who rejects junk DNA

Praveen Sethupathy is a genetics professor at the University of North Carolina in Chapel Hill, North Carolina, USA.

He explains why he is a Christian and why he is "more than his genes" in Am I more than my genes? Faith, identity, and DNA.

Here's the opening paragraph ...
The word “genome” suggests to many that our DNA is simply a collection of genes from end-to-end, like books on a bookshelf. But it turns out that large regions of our DNA do not encode genes. Some once called these regions “junk DNA.” But this was a mistake. More recently, they have been referred to as the “dark matter” of our genome. But what was once dark is slowly coming to light, and what was once junk is being revealed as treasure. The genome is filled with what we call “control elements” that act like switches or rheostats, dialing the activation of nearby genes up and down based on whatever is needed in a particular cell. An increasing number of devastating complex diseases, such as cancer, diabetes, and heart disease, can often be traced back, in part, to these rheostats not working properly.

Thursday, June 30, 2016

Do Intelligent Design Creationists still think junk DNA refutes ID?

I'm curious about whether Intelligent Design Creationists still think their prediction about junk DNA has been confirmed.


Here's what Stephen Meyer wrote in Darwin's Doubt (p. 400).
The noncoding regions of the genome were assumed to be nonfunctional detritus of the trial-and-error mutational process—the same process that produced the functional code in the genome. As a result, these noncoding regions were deemed "junk DNA," including by no less a scientific luminary than Francis Crick.

Because intelligent design asserts that an intelligent cause produced the genome, design advocates have long predicted that most of the nonprotein-coding sequences in the genome should perform some biological function, even if they do not direct protein synthesis. Design theorists do not deny that mutational processes might have degraded some previously functional DNA, but we have predicted that the functional DNA (the signal) should dwarf the nonfunctional DNA (the noise), and not the reverse. As William Dembski, a leading design proponent, predicted in 1998, "On an evolutionary view we expect a lot of useless DNA. If, on the other hand, organisms are designed, we expect DNA, as much as possible, to exhibit function."
I'm trying to write about this in my book and I want to be as fair as possible.

Do most ID proponents still believe this is an important prediction from ID theory?

Do most ID proponents still think that most of the human genome is functional?


Monday, May 16, 2016

Tim Minchin's "Storm," the animated movie, and another not-so-good Minchin cartoon

I've mentioned this before but it bears repeating. If you haven't listened to "Storm" then you are in for a treat because now you can listen AND watch. If you've heard it before, then hear it again. The message never gets old.


A word of caution. Minchin may be very good at recognizing pseudoscience and quacks but he can be a bit gullible when listening to scientists. He was completely taken in by the ENCODE hype back in 2012. This cartoon is also narrated by Tim Minchin but it's not so good.



Monday, May 02, 2016

The Encyclopedia of Evolutionary Biology revisits junk DNA

The Encyclopedia of Evolutionary Biology is a four-volume set of articles by leading evolutionary biologists. An online version is available at ScienceDirect. Many universities will have free access.

I was interested in what they had to say about junk DNA and the evolution of large complex genomes. The only article that directly addressed the topic was "Noncoding DNA Evolution: Junk DNA Revisited" by Michael Z. Ludwig of the Department of Ecology and Evolution at the University of Chicago. Ludwig is a Research Associate (Assistant Professor) who works with Martin Kreitman on "Developmental regulation of gene expression and the genetic basis for evolution of regulatory DNA."

As you could guess from the title of the article, Michael Ludwig divides the genome into two fractions: protein-coding genes and noncoding DNA. The fact that organismal complexity doesn't correlate with the number of genes (protein-coding) is a problem that requires an explanation, according to Ludwig. He assumes that the term "junk DNA" was used in the past to account for our lack of knowledge about noncoding DNA.
Eukaryotic genomes mostly consist of DNA that is not translated into protein sequence. However, noncoding DNA (ncDNA) has been little studied relative to proteins. The lack of knowledge about its functional significance has led to hypotheses that much nongenic DNA is useless "junk" (Ohno, 1972) or that it exists only to replicate itself (Doolittle and Sapienza, 1980; Orgel and Crick, 1980).
Ludwig says that we now know some of the functions of non-coding DNA and one of them is regulation of gene expression.
These regulatory sequences are distributed among selfish transposons and middle or short repetitive DNAs. The genome is an extremely complex machine; functionally as well as structurally it is generally not possible to disentangle the regulatory function from the junk selfish activity. The idea of junk DNA needs to be revisited.
Of course we all know about regulatory sequences. We've known about this function of non-coding DNA for half a century. The question that interests us is not whether non-coding DNA has a function but whether a large proportion of noncoding DNA is junk.

Ludwig seems to be arguing that a significant fraction of the mammalian genome is devoted to regulation. He doesn't ever specify what this fraction is but apparently it's large enough to "revisit" junk DNA.

The biggest obstacle to his thesis is the fact that only 8% of the human genome is conserved (Rands et al., 2014). Ludwig says that 1% of the genome is coding DNA and 7% "has a functional regulatory gene expression role" according to the Rands et al. study. This is somewhat misleading since Rands et al. specifically mention that not all of this conserved DNA will be regulatory.

All of this is consistent with a definition of function specifying that it must be under negative selection (i.e. conserved). It leads to the conclusion that about 90% of the human genome is junk. That doesn't require a re-evaluation of junk.

In order to "revisit" junk DNA, the proponents of the "complex machine" view of evolution must come up with plausible reasons why lack of sequence conservation does not rule out function. Ludwig offers up the standard rationales ...
  1. Some ultra-conserved sequences don't seem to have a function and this "shows that the extent of sequence conservation is not a good predictor of the functional importance of a sequence."
  2. The amount of conserved sequence depends on the alignment and alignment is difficult.
  3. About 40%-70% of the noncoding DNA in Drosophila melanogaster is under functional constraint within the species but not between D. melanogaster and D. simulans. Therefore, some large fraction of functional regulatory sequences might only be conserved in the human lineage and it won't show up in comparisons between species. (Does this explain onions?)
The idea here is that there is rapid turnover of functional DNA binding sites required for regulation but the overall fraction of DNA devoted to regulation remains large. This explains why there doesn't seem to be a correlation between the amount of conserved DNA and the amount that can possibly be devoted to regulating gene expression. The argument implies that much more than 7% of the genome is required for regulation. The amount has to be >50% or so in order to justify overthrowing the concept of junk DNA.

That's a ridiculous number, but so is 7%. Imagine that "only" 7% of the genome is functionally involved in regulating expression of the protein-coding genes. That's 224 million base pairs of DNA, or approximately 10,000 base pairs of cis-regulatory elements (CREs) for every protein-coding gene.
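The arithmetic is easy to check. A minimal sketch, assuming a 3.2 Gb genome and 20,000 protein-coding genes:

```python
# Regulatory DNA implied by the 7% figure, per protein-coding gene.
genome_size = 3.2e9          # base pairs (standard estimate)
regulatory_fraction = 0.07   # the 7% figure under discussion
n_genes = 20_000             # protein-coding genes

regulatory_bp = genome_size * regulatory_fraction
print(f"total regulatory DNA: {regulatory_bp / 1e6:.0f} Mb")          # 224 Mb
print(f"per protein-coding gene: {regulatory_bp / n_genes:,.0f} bp")  # ~11,200 bp
```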

There is no evidence whatsoever that even this amount (7%) of DNA is required for regulation but Ludwig would like to think that the actual amount is much greater. The lack of conservation is dismissed by assuming rapid turnover while conserving function and/or stabilizing selection on polymorphic sequences.

The problem here is that Ludwig is constructing a just-so evolutionary story to explain something that doesn't require an explanation. If there's no evidence that a large fraction of the genome is required for regulation then there's no problem that needs explaining. Ludwig does not tell us why he believes that most of our genome is required for regulation. Maybe it's because of ENCODE?

Since this is published in the Encyclopedia of Evolutionary Biology, I assume that this sort of evolutionary argument resonates with many evolutionary biologists. That's sad.


Rands, C. M., Meader, S., Ponting, C. P., and Lunter, G. (2014) 8.2% of the Human Genome Is Constrained: Variation in Rates of Turnover across Functional Element Classes in the Human Lineage. PLoS Genetics, 10(7), e1004525. [doi: 10.1371/journal.pgen.1004525]

Sunday, March 27, 2016

Georgi Marinov reviews two books on junk DNA

The December issue of Evolution: Education and Outreach has a review of two books on junk DNA. The reviewer is Georgi Marinov, a name that's familiar to Sandwalk readers. He is currently working with Michael Lynch at Indiana University in Bloomington, Indiana, USA. You can read the review at: A deeper confusion.

The books are ...
The Deeper Genome: Why there is more to the human genome than meets the eye, by John Parrington, (Oxford, United Kingdom: Oxford University Press), 2015. ISBN:978-0-19-968873-9.

Junk DNA: A Journey Through the Dark Matter of the Genome, by Nessa Carey, (New York, United States: Columbia University Press), 2015. ISBN:978-0-23-117084-0.
You really need to read the review for yourselves but here's a few teasers.
If taken uncritically, these texts can be expected to generate even more confusion in a field that already has a serious problem when it comes to communicating the best understanding of the science to the public.
Parrington claims that noncoding DNA was thought to be junk and Georgi replies,
However, no knowledgeable person has ever defended the position that 98 % of the human genome is useless. The 98 % figure corresponds to the fraction of it that lies outside of protein coding genes, but the existence of distal regulatory elements, as nicely narrated by the author himself, has been at this point in time known for four decades, and there have been numerous comparative genomics studies pointing to a several-fold larger than 2% fraction of the genome that is under selective constraint.
I agree. That's a position that I've been trying to advertise for several decades and it needs to be constantly reiterated since there are so many people who have fallen for the myth.

Georgi goes on to explain where Parrington goes wrong about the ENCODE results. This critique is devastating, coming, as it does, from an author of the most relevant papers.1 My only complaint about the review is that Georgi doesn't reveal his credentials. When he quotes from those papers—as he does many times—he should probably have mentioned that he is an author of those quotes.

Georgi goes on to explain four main arguments for junk DNA: genetic load, the C-value Paradox, transposons (selfish DNA), and modern evolutionary theory. I like this part since it's similar to the Five Things You Should Know if You Want to Participate in the Junk DNA Debate. The audience of this journal is teachers and this is important information that they need to know, and probably don't.

His critique of Nessa Carey's book is even more devastating. It begins with,
Still, despite a few unfortunate mistakes, The Deeper Genome is well written and gets many of its facts right, even if they are not interpreted properly. This is in stark contrast with Nessa Carey’s Junk DNA: A Journey Through the Dark Matter of the Genome. Nessa Carey has a PhD in virology and has in the past been a Senior Lecturer in Molecular Biology at Imperial College, London. However, Junk DNA is a book not written at an academic level but instead intended for very broad audience, with all the consequences that the danger of dumbing it down for such a purpose entails.
It gets worse. Nessa Carey claims that scientists used to think that all noncoding DNA was junk but recent discoveries have discredited that view. Georgi sets her straight with,
Of course, scientists have had a very good idea why so much of our DNA does not code for proteins, and they have had that understanding for decades, as outlined above. Only by completely ignoring all that knowledge could it have been possible to produce many of the chapters in the book. The following are referred to as junk DNA by Carey, with whole chapters dedicated to each of them (Table 3).


The inclusion of tRNAs and rRNAs in the list of “previously thought to be junk” DNA is particularly baffling given that they have featured prominently as critical components of the protein synthesis machinery in all sorts of basic high school biology textbooks for decades, not to mention the role that rRNAs and some of the other noncoding RNAs on that list play in many “RNA world” scenarios for the origin of life. How could something that has so often been postulated to predate the origin of DNA as the carrier of genetic information (Jeffares et al. 1998; Fox 2010) and that must have been of critical importance both before and after that be referred to as “junk”?
You would think that this is something that doesn't have to be explained to biology teachers but the evidence suggests otherwise. One of those teachers recently reviewed Nessa Carey's book very favorably in the journal The American Biology Teacher and another high school teacher reveals his confusion about the subject in the comments to my post [see Teaching about genomes using Nessa Carey's book: Junk DNA].

It's good that Georgi Marinov makes this point forcibly.

Now I'm going to leave you with an extended quote from Georgi Marinov's review. Coming from a young scientist, this is very potent and it needs to be widely disseminated. I agree 100%.
The reason why scientific results become so distorted on their way from scientists to the public can only be understood in the socioeconomic context in which science is done today. As almost everyone knows at this point, science has existed in a state of insufficient funding and ever increasing competition for limited resources (positions, funding, and the small number of publishing slots in top scientific journals) for a long time now. The best way to win that Darwinian race is to make a big, paradigm shifting finding. But such discoveries are hard to come by, and in many areas might actually never happen again—nothing guarantees that the fundamental discoveries in a given area have not already been made. ... This naturally leads to a publishing environment that pretty much mandates that findings are framed in the most favorable and exciting way, with important caveats and limitations hidden between the lines or missing completely. The author is too young to have directly experienced those times, but has read quite a few papers in top journals from the 1970s and earlier, and has been repeatedly struck by the difference between the open discussion one can find in many of those old articles and the currently dominant practices.

But that same problem is not limited to science itself, it seems to be now prevalent at all steps in the chain of transmission of findings, from the primary literature, through PR departments and press releases, and finally, in the hands of the science journalists and writers who report directly to the lay audience, and who operate under similar pressures to produce eye-catching headlines that can grab the fleeting attention of readers with ever decreasing ability to concentrate on complex and subtle issues. This leads to compound overhyping of results, of which The Deeper Genome is representative, and to truly surreal distortion of the science, such as what one finds in Nessa Carey’s Junk DNA.

The field of functional genomics is especially vulnerable to these trends, as it exists in the hard-to-navigate context of very rapid technological changes, a potential for the generation of truly revolutionary medical technologies, and an often difficult interaction with evolutionary biology, a controversial for a significant portion of society topic. It is not a simple subject to understand and communicate given all these complexities while in the same time the potential and incentives to mislead and misinterpret are great, and the consequences of doing so dire. Failure to properly communicate genomic science can lead to a failure to support and develop the medical breakthroughs it promises to deliver, or what might be even worse, to implement them in such a way that some of the dystopian futures imagined by sci-fi authors become reality. In addition, lending support to anti-evolutionary forces in society by distorting the science in a way that makes it appear to undermine evolutionary theory has profound consequences that given the fundamental importance of evolution for the proper understanding of humanity's place in nature go far beyond making life even more difficult for teachers and educators or even the general destruction of science education. Writing on these issues should exercise the needed care and make sure that facts and their best interpretations are accurately reported. Instead, books such as The Deeper Genome and Junk DNA are prime examples of the negative trends outlined above, and are guaranteed to only generate even deeper confusion.
It's not easy to explain these things to a general audience, especially an audience that has been inundated with false information and false ideas. I'm going to give it a try but it's taking a lot more effort than I imagined.


1. Georgi Marinov is an author on the original ENCODE paper that claimed 80% of our genome is functional (ENCODE Project Consortium, 2012) and the paper where the ENCODE leaders retreated from that claim (Kellis et al., 2014).

ENCODE Project Consortium (2012) An integrated encyclopedia of DNA elements in the human genome. Nature, 489:57-74. [doi: 10.1038/nature11247]

Kellis, M., Wold, B., Snyder, M.P., Bernstein, B.E., Kundaje, A., Marinov, G.K., Ward, L.D., Birney, E., Crawford, G.E., and Dekker, J. (2014) Defining functional DNA elements in the human genome. Proc. Natl. Acad. Sci. (USA) 111:6131-6138. [doi: 10.1073/pnas.1318948111]

Tuesday, March 22, 2016

How do you characterize these scientists?

We've been having a discussion on another thread about ID proponents. Are some of them acting in good faith or are they all lying and deceiving their followers?

I have similar problems about many scientists. I've been reading up on pervasive transcription and the potential number of genes for noncoding, functional, RNAs in the human genome. As far as I can tell, there are only a few hundred examples that have any supporting evidence. There are good scientific reasons to believe that most of the detected transcripts are junk RNA produced as the result of accidental, spurious, transcription.

There are about 20,000 protein-coding genes in the human genome. I think it's unlikely that there are more than a few thousand genes for functional RNAs for a total of less than 25,000 genes.

Here's one of the papers I found.
Guil, S. and Esteller, M. (2015) RNA–RNA interactions in gene regulation: the coding and noncoding players. Trends in Biochemical Sciences 40:248-256. [doi: 10.1016/j.tibs.2015.03.001]
Trends in Biochemical Sciences is a good journal and this is a review of the field by supposed experts. The authors are from the Department of Physiological Sciences II at the University of Barcelona School of Medicine in Barcelona, Catalonia, Spain. The senior author, Manel Esteller, has a Wikipedia entry [Manel Esteller].

Here's the first paragraph of the introduction.
There are more genes encoding regulatory RNAs than encoding proteins. This evidence, obtained in recent years from the sum of numerous post-genomic deep-sequencing studies, give a good clue of the gigantic step we have taken from the years of the central dogma: one gene gives rise to one RNA to produce one protein.
The first sentence is not true by any stretch of the imagination. The best that could be said is that there "may" be more genes for regulatory RNAs (> 20,000) but there's no strong consensus yet. Since the first sentence is an untruth, it follows that it is incorrect to say that the evidence supports such a claim.

It's also untrue to distort the real meaning of the Central Dogma of Molecular Biology, which never said that all genes have to encode proteins. The authors don't understand the history of their field in spite of the fact they are writing a review of that field.

Here's the problem. Are these scientists acting in good faith when they say such nonsense? Does acting in "good faith" require healthy criticism and critical thinking or is "honesty" the only criterion? The authors are clearly deluded about the controversy since they assume that it has been resolved in favor of their personal biases but they aren't lying. Can we distinguish between competent science and bad science based on such statements? Can we say that these scientists are incompetent or is that too harsh?

Furthermore, what ever happened to peer review? Isn't the system supposed to prevent such mistakes?


Wednesday, March 09, 2016

A 2004 kerfuffle over pervasive transcription in the mouse genome

The first drafts of the human genome sequence were published in 2001. There was still work to do on "finishing" the sequence but a lot of the International Human Genome Project (IHGP) team shifted to work on the mouse genome. The FANTOM Consortium and the RIKEN Genome Exploration Groups (I and II) published an analysis of mouse transcripts in December 2002.
Okazaki, Y., Furuno, M., Kasukawa, T., Adachi, J., Bono, H., Kondo, S., Nikaido, I., Osato, N., Saito, R., Suzuki, H. et al. (2002) Analysis of the mouse transcriptome based on functional annotation of 60,770 full-length cDNAs. Nature, 420:563-573. [doi: 10.1038/nature01266]

Only a small proportion of the mouse genome is transcribed into mature messenger RNA transcripts. There is an international collaborative effort to identify all full-length mRNA transcripts from the mouse, and to ensure that each is represented in a physical collection of clones. Here we report the manual annotation of 60,770 full-length mouse complementary DNA sequences. These are clustered into 33,409 ‘transcriptional units’, contributing 90.1% of a newly established mouse transcriptome database. Of these transcriptional units, 4,258 are new protein-coding and 11,665 are new non-coding messages, indicating that non-coding RNA is a major component of the transcriptome. 41% of all transcriptional units showed evidence of alternative splicing. In protein-coding transcripts, 79% of splice variations altered the protein product. Whole-transcriptome analyses resulted in the identification of 2,431 sense–antisense pairs. The present work, completely supported by physical clones, provides the most comprehensive survey of a mammalian transcriptome so far, and is a valuable resource for functional genomics.

Wednesday, November 25, 2015

Selfish genes and transposons

Back in 1980, the idea that large fractions of animal and plant genomes could be junk was quite controversial. Although the idea was consistent with the latest developments in population genetics, most scientists were unaware of these developments. They were looking for adaptive ways of explaining all the excess DNA in these genomes.

Some scientists were experts in modern evolutionary theory but still wanted to explain "junk DNA." Doolittle & Sapienza, and Orgel & Crick, published back-to-back papers in the April 17, 1980 issue of Nature. They explained junk DNA by claiming that most of it was due to the presence of "selfish" transposons that were being selected and preserved because they promoted their own replication and transmission to future generations, while having little or no effect on the fitness of the organisms they inhabit. This is natural selection acting at a different level.

This prompted some responses in later editions of the journal and then responses to the responses.

Here's the complete series ...