
Showing posts sorted by relevance for query encode.

Friday, December 13, 2019

The "standard" view of junk DNA is completely wrong

I was browsing the table of contents of the latest issue of Cell and I came across this ....
For decades, the miniscule protein-coding portion of the genome was the primary focus of medical research. The sequencing of the human genome showed that only ∼2% of our genes ultimately code for proteins, and many in the scientific community believed that the remaining 98% was simply non-functional “junk” (Mattick and Makunin, 2006; Slack, 2006). However, the ENCODE project revealed that the non-protein coding portion of the genome is copied into thousands of RNA molecules (Djebali et al., 2012; Gerstein et al., 2012) that not only regulate fundamental biological processes such as growth, development, and organ function, but also appear to play a critical role in the whole spectrum of human disease, notably cancer (for recent reviews, see Adams et al., 2017; Deveson et al., 2017; Rupaimoole and Slack, 2017).

Slack, F.J. and Chinnaiyan, A.M. (2019) The Role of Non-coding RNAs in Oncology. Cell 179:1033-1055 [doi: 10.1016/j.cell.2019.10.017]
Cell is a high-impact, refereed journal so we can safely assume that this paper was reviewed by reputable scientists. This means that the view expressed in the paragraph above did not raise any alarm bells when the paper was reviewed. The authors clearly believe that what they are saying is true and so do many other reputable scientists. This seems to be the "standard" view of junk DNA among scientists who do not understand the facts or the debate surrounding junk DNA and pervasive transcription.

Here are some of the obvious errors in the statement.
  1. The sequencing of the human genome did NOT show that only ~2% of our genome consisted of coding regions. That fact was known almost 50 years ago and the human genome sequence merely confirmed it.
  2. No knowledgeable scientist ever thought that the remaining 98% of the genome was junk—not in 1970 and not in any of the past fifty years.
  3. The ENCODE project revealed that much of our genome is transcribed at some time or another, but it is almost certainly true that the vast majority of these low-abundance, non-conserved transcripts are junk RNA produced by accidental transcription.
  4. The existence of noncoding RNAs such as ribosomal RNA and tRNA was known in the 1960s, long before ENCODE. The existence of snoRNAs, snRNAs, regulatory RNAs, and various catalytic RNAs was known in the 1980s, long before ENCODE. Other RNAs such as miRNAs and siRNAs were well known in the 1990s, long before ENCODE, and piRNAs were described in 2006, before the ENCODE pilot project reported its results.
How did this false view of our genome become so widespread? It's partially because of the now highly discredited ENCODE publicity campaign orchestrated by Nature and Science but that doesn't explain everything. The truth is out there in peer-reviewed scientific publications but scientists aren't reading those papers. They don't even realize that their standard view has been seriously challenged. Why?


Friday, December 14, 2012

Fallout from the ENCODE Fiasco Makes It into the Globe & Mail

Most of us are aware of the ENCODE publicity fiasco. The leaders of the project made some outlandish claims about the function of most of our genome in order to attract attention and make their work seem much more significant than it really is [see Sean Eddy on Junk DNA and ENCODE].

Many scientists tried to set the record straight and they pretty much succeeded, at least in the scientific community. Most scientists now know that the case for junk DNA is a lot stronger than they thought.

Unfortunately, the criticisms didn't get much publicity and the average person is left with the impression that most of our genome has an important function, even if we don't know exactly what that function is. This means that good science writers have to work harder to educate the public about the true state of our genome.

Timothy Caulfield is a Professor in the Faculty of Law and the School of Public Health at the University of Alberta in Edmonton, Alberta, Canada. He's also Canada Research Chair in Health Law and Policy, a very prestigious award. He writes the following in today's issue of The Globe & Mail, Canada's most important newspaper [We’re overselling the health-care 'revolution' of personal genomics].
The relationship between our genome and disease is far more complicated than originally anticipated. Indeed, the more we learn about the human genome, the less we seem to know. For example, results from a major international initiative to explore all the elements of our genome (the ENCODE project) found that, despite decades-old conventional wisdom that much of our genome was nothing but “junk DNA,” as much as 80 per cent of our genome likely has some biological function. This work hints that things are much more convoluted than expected. So much so that one of ENCODE’s lead researchers, Yale’s Mark Gerstein, was quoted as saying that it’s “like opening a wire closet and seeing a hairball of wires.”
Statements like this from someone who is supposed to be knowledgeable about such issues show us that the ENCODE fiasco has far-reaching consequences. The misleading statements by Ewan Birney and others will take years to undo. It's all the more reason to criticize Nature and Science for aiding and abetting the spread of this false information.

How can we expect people like Timothy Caulfield to understand the science if the leading journals get it wrong?


[Hat Tip: Ryan Gregory, "The Bullshit Continues to Spread" on Facebook.]

Tuesday, July 01, 2014

The Function Wars: Part II

This is Part II of several "Function Wars"1 posts. The first one is on Quibbling about the meaning of the word "function" [The Function Wars: Part I].

The ENCODE legacy

I addressed the meaning of "function" in Part I. It is apparent that philosophers and scientists are a long way from agreeing on an acceptable definition. There has been a mini-explosion of papers on this topic in the past few years, stimulated by the ENCODE Consortium publicity campaign where the ENCODE leaders clearly picked a silly definition of "function" in order to attract attention.

Unfortunately, the responses to this mistake have not clarified the issue at all. Indeed, some philosophers have even defended the ENCODE Consortium definition (Germain et al., 2014). Some have opposed the ENCODE definition but have come under attack from other scientists and philosophers for using the wrong definition (see Elliott et al., 2014). The net effect has been to lend credence to the ENCODE Consortium's definition, if only because it becomes one of many viable alternatives.

Thursday, December 31, 2020

On the importance of controls

When doing an experiment, it's important to keep the number of variables to a minimum and it's important to have scientific controls. There are two types of controls. A negative control covers the possibility that you will get a signal by chance; for example, if you are testing an enzyme to see whether it degrades sugar then the negative control will be a tube with no enzyme. Some of the sugar may degrade spontaneously and you need to know this. A positive control is when you deliberately add something that you know will give a positive result; for example, if you are doing a test to see if your sample contains protein then you want to add an extra sample that contains a known amount of protein to make sure all your reagents are working.

Lots of controls are more complicated than the examples I gave but the principle is important. It's true that some experiments don't appear to need the appropriate controls but that may be an illusion. The controls might still be necessary in order to properly interpret the results but they're not done because they are very difficult. This is often true of genomics experiments.

Friday, July 03, 2015

The fuzzy thinking of John Parrington: The Central Dogma

My copy of The Deeper Genome: Why there's more to the human genome than meets the eye has arrived and I've finished reading it. It's a huge disappointment. Parrington makes no attempt to describe what's in your genome in more than general hand-waving terms. His main theme is that the genome is really complicated and so are we. Gosh, golly, gee whiz! Re-write the textbooks!

You will look in vain for any hard numbers such as the total number of genes or the amount of the genome devoted to centromeres, regulatory sequences etc. etc. [see What's in your genome?]. Instead, you will find a wishy-washy defense of ENCODE results and tributes to the views of John Mattick.

John Parrington is an Associate Professor of Cellular & Molecular Pharmacology at the University of Oxford (Oxford, UK). He works on the physiology of calcium signalling in mammals. This should make him well-qualified to write a book about biochemistry, molecular biology, and genomes. Unfortunately, his writing leaves a great deal to be desired. He seems to be part of a younger generation of scientists who were poorly trained as graduate students (he got his Ph.D. in 1992). He exhibits the same kind of fuzzy thinking as many of the ENCODE leaders.

Let me give you just one example.

Thursday, September 13, 2012

James Shapiro Claims Credit for Predicting That Junk DNA Is Actually Part of a "highly sophisticated information storage organelle"

Do you remember James Shapiro? He's the University of Chicago scientist who claims to have discovered a new theory of evolution in his book Evolution: A View from the 21st Century [see my review in NCSE Reports]. The book criticizes the old hardened version of the Modern Synthesis and never mentions things like random genetic drift or Nearly-Neutral Theory. It's difficult to imagine how someone could criticize evolutionary theory without understanding population genetics but he managed to pull it off.

You might also recall that he's the scientist who criticized the Central Dogma of Molecular Biology when he clearly didn't understand it [Revisiting the Central Dogma in the 21st Century]. I was shocked to learn that he had published a paper with the title "Revisiting the Central Dogma in the 21st Century" without ever bothering to read the literature to find out how Francis Crick actually defined the Central Dogma. (In fact, Shapiro misrepresented Crick's view.) It goes to show you how silly you look when you criticize something you don't understand.

Tuesday, May 24, 2011

Junk & Jonathan: Part 6—Chapter 3

This is part 6 of my review of The Myth of Junk DNA. For a list of other postings on this topic see the link to Genomes & Junk DNA in the "theme box" below or in the sidebar under "Themes."

We learn in Chapter 9 that Wells has two categories of evidence against junk DNA. The first covers evidence that sequences probably have a function and the second covers specific known examples of functional sequences. In the first category there are two lines of evidence: transcription and conservation. Both of them are covered in Chapter 3, making this one of the most important chapters in the book. The remaining category of specific examples is described in Chapters 4-7.

The title of Chapter 3 is Most DNA Is Transcribed into RNA. As you might have anticipated, the focus of Wells' discussion is the ENCODE pilot project that detected abundant transcription in the 1% of the genome that they analyzed (ENCODE Project Consortium, 2007). Their results suggest that most of the genome is transcribed. Other studies support this idea and show that transcripts often overlap and many of them come from the opposite strand in a gene giving rise to antisense RNAs.

The original Nature paper says,
... our studies provide convincing evidence that the genome is pervasively transcribed, such that the majority of its bases can be found in primary transcripts, including non-protein-coding transcripts, and those that extensively overlap one another.
The authors of these studies firmly believe that evidence of transcription is evidence of function. This has even led some of them to propose a new definition of a gene [see What is a gene, post-ENCODE?]. There's no doubt that many molecular biologists take this data to mean that most of our genome has a function and that's the same point that Wells makes in his book. It's evidence against junk DNA.

What are these transcripts doing? Wells devotes a section to "Specific Functions of Non-Protein-Coding RNAs." These RNAs may be news to most readers but they are well known to biochemists and molecular biologists. This is not the place to describe all the known functional non-coding RNAs but keep in mind that there are three main categories: ribosomal RNA (rRNA), transfer RNA (tRNA), and a heterogeneous category called small RNAs. There are dozens of different kinds of small RNAs including unique ones such as the 7SL RNA of the signal recognition particle, the P1 RNA of RNase P and the guide RNA in telomerase. Other categories include the spliceosome RNAs, snoRNAs, piRNAs, siRNAs, and miRNAs. These RNAs have been studied for decades. It's important to note that the confirmed examples are transcribed from genes that make up less than 1% of the genome.

One interesting category is called "long noncoding RNAs" or lncRNAs. As the name implies, these RNAs are longer than the typical small RNAs. Their functions, if any, are largely unknown although a few have been characterized. If we add up all the genes for these RNAs and assume they are functional it will account for about 0.1% of the genome so this isn't an important category in the discussion about junk DNA.

So, we're left with a puzzle. If more than 90% of the genome is transcribed but we only know about a small number of functional RNAs then what about the rest?

Opponents of junk DNA—both creationists and scientists—would have you believe that there's a lot we don't know about genomes and RNA. They believe that we will eventually find functions for all this RNA and prove that the DNA that produces them isn't junk. This is a genuine scientific controversy. What do their scientific opponents (I am one) say about the ENCODE result?

Criticisms of the ENCODE analysis take two forms ...
  • The data is wrong and only a small fraction of the genome is transcribed
  • The data is mostly correct but the transcription is spurious and accidental. Most of the products are junk RNA.
Criticisms of the Data

Several papers have appeared that call into question the techniques used by the ENCODE consortium. They claim that many of the identified transcribed regions are artifacts. This is especially true of the repetitive regions of the genome that make up more than half of the total content. If any one of these regions is transcribed then the transcript will likely hybridize to the remaining repeats giving a false impression of the amount of DNA that is actually transcribed.

Of course, Wells doesn't mention any of these criticisms in Chapter 3. In fact, he implies that every published paper is completely accurate in spite of the fact that most of them have never been replicated and many have been challenged by subsequent work. The readers of The Myth of Junk DNA will assume, intentionally or otherwise, that if a paper appears in the scientific literature it must be true.

But criticisms of the ENCODE results are so widespread that they can't be ignored, so Wells is forced to deal with them in Chapter 8. (Why not in Chapter 3 when they are first mentioned?) In particular, Wells has to address the van Bakel et al. (2010) paper from Tim Hughes' lab here in Toronto. This paper was widely discussed when it came out last year [see: Junk RNA or Imaginary RNA?]. We'll deal with it when I cover Chapter 9 but, suffice to say, Wells dismisses the criticism.

Criticisms of the Interpretation

The other form of criticism focuses on the interpretation of the data rather than its accuracy. Most of us who teach transcription take pains to point out to our students that RNA polymerase binds non-specifically to DNA and that much of this binding will result in spurious transcription at a very low frequency. This is exactly what we expect from a knowledge of transcription initiation [How RNA Polymerase Binds to DNA]. The ENCODE data shows that most of the genome is "transcribed" at a frequency of once every few generations (or days) and this is exactly what we expect from spurious transcription. The RNAs are non-functional accidents due to the sloppiness of the process [Useful RNAs?].

Wells doesn't mention any of this. I don't know if that's because he's ignorant of the basic biochemistry and hasn't read the papers or whether he is deliberately trying to mislead his readers. It's probably a bit of both.

It's not as if this is some secret known only to the experts. The possibility of spurious transcription has come up frequently in the scientific literature in the past few years. For example, Guttman et al. (2009) write,
Genomic projects over the past decade have used shotgun sequencing and microarray hybridization to obtain evidence for many thousands of additional non-coding transcripts in mammals. Although the number of transcripts has grown, so too have the doubts as to whether most are biologically functional. The main concern was raised by the observation that most of the intergenic transcripts show little to no evolutionary conservation. Strictly speaking, the absence of evolutionary conservation cannot prove the absence of function. But the remarkably low rate of conservation seen in the current catalogues of large non-coding transcripts (less than 5% of cases) is unprecedented and would require that each mammalian clade evolves its own distinct repertoire of non-coding transcripts. Instead, the data suggest that the current catalogues may consist largely of transcriptional noise, with a minority of bona fide functional lincRNAs hidden amid this background.
This paper is in the Wells reference list so we know that he has read it.

What these authors are saying is that the data is consistent with spurious transcription (noise). Part of the evidence is the lack of any sequence conservation among the transcripts. It's as though they were mostly derived from junk DNA.

Sequence Conservation

Recall that the purpose of Chapter 3 is to show that junk DNA is probably functional. The first part of the chapter purportedly shows that most of our genome is transcribed. The second part addresses sequence conservation.

Here's what Wells says about sequence conservation.
Widespread transcription of non-protein-coding DNA suggests that the RNAs produced from such DNA might serve biological functions. Ironically, the suggestion that much non-protein-coding DNA might be functional also comes from evolutionary theory. If two lineages diverge from a common ancestor that possesses regions of non-protein-coding DNA, and these regions are really nonfunctional, then they will accumulate random mutations that are not weeded out by natural selection. Many generations later, the sequences of the corresponding non-protein-coding regions in the two descendant lineages will probably be very different. [Due to fixation by random genetic drift—LAM] On the other hand, if the original non-protein-coding DNA was functional, then natural selection will tend to weed out mutations affecting that function. Many generations later, the sequences of the corresponding non-protein-coding regions in the two descendant lineages will still be similar. (In evolutionary terminology, the sequences will be "conserved.") Turning the logic around, Darwinian theory implies that if evolutionarily divergent organisms share similar non-protein-coding DNA sequences, those sequences are probably functional.
Wells then references a few papers that have detected such conserved sequences, including the Guttman et al. (2009) paper mentioned above. They found "over a thousand highly conserved large non-coding RNAs in mammals." Indeed they did, and this is strong evidence of function.1 Every biochemist and molecular biologist will agree. One thousand lncRNAs represent 0.08% of the genome. The sum total of all other conserved sequences is also less than 1%. Wells forgets to mention this in his book. He also forgets to mention the other point that Guttman et al. make; namely, that the lack of sequence conservation suggests that the vast majority of transcripts are non-functional. (Oops!)
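The 0.08% figure can be turned into base pairs with a quick calculation. This Python sketch assumes a haploid human genome of about 3.2 billion base pairs (a round number that is not given in this post) and simply asks what total length, and what average length per transcript, that percentage implies for one thousand lncRNAs.

```python
# Back-of-envelope check of the "0.08% of the genome" figure.
# GENOME_BP is an assumed round number, not a value given in the post.
GENOME_BP = 3.2e9        # approximate haploid human genome size, in base pairs
N_LNCRNAS = 1000         # "over a thousand" conserved lncRNAs (Guttman et al., 2009)
FRACTION = 0.08 / 100    # 0.08% of the genome, as stated above


def implied_lengths(genome_bp: float, n: int, fraction: float) -> tuple:
    """Return (total bp covered, average bp per lncRNA) implied by the fraction."""
    total = genome_bp * fraction
    return total, total / n


covered_bp, avg_len = implied_lengths(GENOME_BP, N_LNCRNAS, FRACTION)
print(f"total: {covered_bp:,.0f} bp; average: {avg_len:,.0f} bp per lncRNA")
```

Under that assumption the 0.08% works out to roughly 2.6 Mb in total, or an average of about 2.6 kb per lncRNA; a different genome-size estimate shifts both numbers proportionally.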

There's irony here. We know that the sequences of junk DNA are not conserved and this is taken as evidence (not conclusive) that the DNA is non-functional. The genetic load argument makes the same point. We know that the vast majority of spurious RNA transcripts are also not conserved from species to species and this strongly suggests that those RNAs are not functional. Wells ignores this point entirely; it never comes up anywhere in his book. On the other hand, when a small percentage of DNA (and transcripts) is conserved, this gets prominent mention.

Wells doesn't believe in common ancestry so he doesn't believe that sequences are "conserved." (Presumably they reflect common design or something like that.) Nevertheless, when an evolutionary argument of conservation suits his purpose he's happy to invoke it, while, at the same time, ignoring the far more important argument about lack of conservation of the vast majority of spurious transcripts. Isn't that strange behavior?

The bottom line here is that Jonathan Wells is correct to point to the ENCODE data as a problem for junk DNA proponents. This is part of the ongoing scientific controversy over the amount of junk in our genome. Where I fault Wells is his failure to explain to his readers that this is disputed data and interpretation. There's no slam-dunk case for function here. In fact, the tide seems to be turning more and more against the original interpretation of the data. Most knowledgeable biochemists and molecular biologists do not believe that >90% of our genome is transcribed to produce functional RNAs.

UPDATE: How much of the genome do we expect to be transcribed on a regular basis? Protein-encoding genes account for about 30% of the genome, including introns (mostly junk). They will be transcribed. Other genes produce functional RNAs and together they cover about 3% of the genome. Thus, we expect that roughly a third of the genome will be transcribed at some time during development. We also expect that a lot more of the genome will be transcribed on rare occasions just because of spurious (accidental) transcription initiation. This doesn't count. Some pseudogenes, defective transposons, and endogenous retroviruses have retained the ability to be transcribed on a regular basis. This may account for another 1-2% of the genome. They produce junk RNA.
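The UPDATE's arithmetic can be tallied in a few lines. This is a minimal Python sketch that only adds up the percentages given in that paragraph; the variable names are illustrative, and the comparison with the ">90% transcribed" figure uses the number quoted earlier in this post.

```python
# Tally of the UPDATE paragraph's percentages (all values are % of the genome).
PROTEIN_CODING = 30.0          # protein-coding genes, including introns
FUNCTIONAL_RNA_GENES = 3.0     # other genes that produce functional RNAs
JUNK_TRANSCRIBED = (1.0, 2.0)  # pseudogenes, defective transposons, ERVs (range in the text)
OBSERVED_MIN = 90.0            # "more than 90% of the genome is transcribed"

# Genes alone: roughly a third of the genome.
expected_genic = PROTEIN_CODING + FUNCTIONAL_RNA_GENES

# Add the junk that is still transcribed on a regular basis.
low = expected_genic + JUNK_TRANSCRIBED[0]
high = expected_genic + JUNK_TRANSCRIBED[1]

# Minimum share of the observed transcription left over for rare, spurious initiation.
spurious_share_min = OBSERVED_MIN - high

print(f"genes alone: {expected_genic:.0f}% of the genome")
print(f"regularly transcribed, including junk RNA: {low:.0f}-{high:.0f}%")
print(f"left for spurious transcription: at least {spurious_share_min:.0f} percentage points")
```

On these numbers, regular transcription covers about 33-35% of the genome, so at least 55 percentage points of the ">90%" would have to come from the rare, accidental transcription that the UPDATE says "doesn't count."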


1. Conservation is not proof of function. In an effort to test this hypothesis Nóbrega et al. (2004) deleted two large regions of the mouse genome containing large numbers of conserved non-coding sequences. They found that the mice with the deleted regions showed no phenotypic effects, indicating that the DNA was junk. Jonathan Wells forgot to mention this experiment in his book.

Guttman, M. et al. (2009) Chromatin signature reveals over a thousand highly conserved large non-coding RNAs in mammals. Nature 458:223-227. [NIH Public Access]

Nóbrega, M.A., Zhu, Y., Plajzer-Frick, I., Afzal, V. and Rubin, E.M. (2004) Megabase deletions of gene deserts result in viable mice. Nature 431:988-993. [Nature]

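The ENCODE Project Consortium (2007) Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project. Nature 447:799-816. [PDF]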

Saturday, January 16, 2016

Brandeis professor demonstrates his ignorance about junk DNA

Judge Starling (Dan Graur) has alerted me to yet another young biologist who hasn't bothered to study the subject of genomes and junk DNA [An Ignorant Assistant Professor at @BrandeisU Explains “Junk DNA”].

This time it's Assistant Professor of Biology Nelson Lau. He studies Piwi proteins and piRNAs.

Lau was interviewed by Lawrence Goodman, a science communication officer at Brandeis University: DNA dumpster diving. The subject is junk DNA and you will be astonished at how ignorant Nelson Lau is about a subject that's supposed to be important in his work.

How does this happen? Aren't scientists supposed to be up-to-date on the scientific literature before they pass themselves off as experts? How can an Assistant Professor make such blatantly false and misleading statements about his own area of research expertise? Has he never encountered graduate students, post-docs, or mentors who would have corrected his misconceptions?

Here's the introduction to the interview,
Since the 1960s, it's largely been assumed that most of the DNA in the human genome was junk. It didn't encode proteins -- the main activity of our genes-- so it was assumed to serve no purpose. But Assistant Professor of Biology Nelson Lau is among a new generation of scientists questioning that hypothesis. His findings suggest we've been wrong about junk DNA and it may be time for a reappraisal. If we want to understand how our bodies work, we need to start picking through our genetic garbage.

BrandeisNow sat down with Lau to ask him about his research.
There's nothing wrong with being a "new generation" who questions the wisdom of their elders. That's what all scientists are supposed to do.

But there are certain standards that apply. The most important standard is that when you are challenging other experts you'd better be an expert yourself.
First off, what is junk DNA?
About two percent of our genome carries out functions we know about, things like building our bones or keeping the heart beating. What the rest of our DNA does is still a mystery. Twenty years ago, for want of a better term, some scientists decided to call it junk DNA.
Dan has already addressed this response but let me throw in my own two cents.

There was never, ever, a time when knowledgeable scientists said that all 98% of the DNA that wasn't part of a gene was junk. Not today, not twenty years ago (1996), and not 45 years ago.

There has never been a time since the 1960s when all non-gene DNA was a mystery. It certainly isn't a mystery today. If you don't know this then you'd better do some reading ... quickly. Google could be your friend, Prof. Lau; it will save you from further embarrassment. Search on "junk DNA" and read everything ... not just the entries that you agree with.

I added a bunch of links at the bottom of this post to help you out.
Is it really junk?
There’s two camps in the scientific community, one that believes it doesn’t do anything and another that believes it’s there for a purpose.

And you’re in the second camp?
Yes. It's true that sometimes organisms carry around excess DNA, but usually it is there for a purpose. Perhaps junk DNA has been coopted for a deeper purpose that we have yet to fully unravel.
It is possible that the extra DNA in our genome has an unknown deeper purpose but right now we have more than enough information to be confident that it's junk. You have to refute or discredit all the work that's been done in the past 40 years in order to be in the second camp.

I strongly suspect that Prof. Lau has not done his homework and he doesn't know the Five Things You Should Know if You Want to Participate in the Junk DNA Debate.

What possible "deep purpose" could this DNA have?
Maybe when junk DNA moves to the right place in our DNA, this could cause better or faster evolution. Maybe when junk genes interacts with the non-junk ones, it causes a mutation to occur so humans can better adapt to changes in the environment.
Most of the undergraduates who took my course could easily refute that argument. I'm guessing that undergraduates in biology at Brandeis aren't as smart. Or maybe they're just too complacent to challenge a professor?

We've got a serious problem here folks. There are scientists being hired at respectable universities who aren't keeping up with the scientific literature in their own field. How does this happen? Are there newly hired biology professors who don't understand evolution?

Eddy, S.R. (2012) The C-value paradox, junk DNA and ENCODE. Current Biology 22:R898. [doi: 10.1016/j.cub.2012.10.002]

Niu, D.K. and Jiang, L. (2012) Can ENCODE tell us how much junk DNA we carry in our genome? Biochemical and Biophysical Research Communications 430:1340-1343. [doi: 10.1016/j.bbrc.2012.12.074]

Doolittle, W.F. (2013) Is junk DNA bunk? A critique of ENCODE. Proc. Natl. Acad. Sci. (USA) published online March 11, 2013. [PubMed] [doi: 10.1073/pnas.1221376110]

Graur, D., Zheng, Y., Price, N., Azevedo, R.B., Zufall, R.A. and Elhaik, E. (2013) On the immortality of television sets: "function" in the human genome according to the evolution-free gospel of ENCODE. Genome Biology and Evolution published online February 20, 2013. [doi: 10.1093/gbe/evt028]

Eddy, S.R. (2013) The ENCODE project: missteps overshadowing a success. Current Biology 23:R259-R261. [doi: 10.1016/j.cub.2013.03.023]

Hurst, L.D. (2013) Open questions: A logic (or lack thereof) of genome organization. BMC Biology 11:58. [doi: 10.1186/1741-7007-11-58]

Kellis, M., Wold, B., Snyder, M.P., Bernstein, B.E., Kundaje, A., Marinov, G.K., Ward, L.D., Birney, E., Crawford, G.E. and Dekker, J. (2014) Defining functional DNA elements in the human genome. Proc. Natl. Acad. Sci. (USA) 111:6131-6138. [doi: 10.1073/pnas.1318948111]

Morange, M. (2014) Genome as a Multipurpose Structure Built by Evolution. Perspectives in Biology and Medicine 57:162-171. [doi: 10.1353/pbm.2014.000]

Palazzo, A.F. and Gregory, T.R. (2014) The Case for Junk DNA. PLoS Genetics 10:e1004351. [doi: 10.1371/journal.pgen.1004351]


Sunday, April 06, 2014

The American Society of Plant Biologists embarrasses itself by publishing "New functions for 'junk' DNA?"

The American Society of Plant Biologists has put out a press release with the title New functions for 'junk' DNA?.
Non-coding DNA sequences found in all plants may have undiscovered roles in basic plant development and response to the environment.

DNA is the molecule that encodes the genetic instructions enabling a cell to produce the thousands of proteins it typically needs. The linear sequence of the A, T, C, and G bases in what is called coding DNA determines the particular protein that a short segment of DNA, known as a gene, will encode. But in many organisms, there is much more DNA in a cell than is needed to code for all the necessary proteins. This non-coding DNA was often referred to as "junk" DNA because it seemed unnecessary. But in retrospect, we did not yet understand the function of these seemingly unnecessary DNA sequences.

We now know that non-coding DNA can have important functions other than encoding proteins. Many non-coding sequences produce RNA molecules that regulate gene expression by turning them on and off. Others contain enhancer or inhibitory elements. Recent work by the international ENCODE (Encyclopedia of DNA Elements) Project (1, 2) suggested that a large percentage of non-coding DNA, which makes up an estimated 95% of the human genome, has a function in gene regulation. Thus, it is premature to say that "junk" DNA does not have a function—we just need to find out what it is!
I've sent a link to this post to Tyrone Spady [tspady@aspb.org] who is listed as the contact person at The American Society of Plant Biologists and to Gregory Bertoni [gbertoni@aspb.org] who is listed as Science Editor, The Plant Cell.

I'll keep it simple for them.
  1. "This non-coding DNA was often referred to as "junk" DNA ..." No reputable group of scientists ever said that all non-coding DNA is junk. No scientist who understands genomes would ever say that today. [Stop Using the Term "Noncoding DNA:" It Doesn't Mean What You Think It Means]
  2. "We now know that non-coding DNA can have important functions other than encoding proteins." We have known that for fifty years. Is that what American plant biologists think of as a recent discovery worthy of mention in a 2014 press release? [What's in Your Genome?]
  3. "Recent work by the international ENCODE (Encyclopedia of DNA Elements) Project (1, 2) suggested that a large percentage of non-coding DNA, which makes up an estimated 95% of the human genome, has a function in gene regulation." It is true that the ENCODE Consortium claimed that most of our genome is functional. However, good scientists know that this claim is disputed and the best scientists know that it is wrong. Where does that leave American plant biologists? [Science still doesn't get it] [Ford Doolittle's Critique of ENCODE]
  4. "Thus, it is premature to say that "junk" DNA does not have a function—we just need to find out what it is!" There is abundant evidence that most of that extra DNA in our genome really is junk. It is not some mysterious black box as you imply. [Non-Darwinian Evolution in 1969: The Case for Junk DNA] [Five Things You Should Know if You Want to Participate in the Junk DNA Debate]
It's bad enough having to teach biology to creationists but when you also have to teach it to biologists, you know we're in big trouble.


Sunday, October 15, 2023

Only 10.7% of the human genome is conserved

The Zoonomia project aligned the genome sequences of 240 mammalian species and determined that only 10.7% of the human genome is conserved. This is consistent with the idea that about 90% of our genome is junk.

The April 28, 2023 issue of Science contains eleven papers reporting the results of a massive study comparing the genomes of 240 mammalian species. The issue also contains a couple of "Perspectives" that comment on the work.

Monday, December 30, 2013

1001 Ideas that Changed the Way We Think - "Not-junk DNA"

Glenn Branch of NCSE alerted me to this book: 1001 Ideas That Changed the Way We Think. He wrote some of the articles [see Creationism and Evolution in 1001 Ideas].

The last great idea that changed the way we think (#1001) is written by Simon Adams, a "historian and writer living and working in London." Simon Adams thinks that the discovery that most of our genome is not junk counts as a big idea. To his credit, Glenn Branch realizes that this is somewhat controversial.

That's putting it mildly. Knowledgeable scientists agree that most (~90%) of our DNA is junk in spite of what the ENCODE publicity campaign might have said back in September 2012. I'm reproducing the article that Simon Adams wrote to show you just how successful that publicity campaign was and how difficult it is for the corrections and rebuttals to make an impact on a gullible public. With apologies to Glenn, whose articles are probably accurate, you should not buy a book that makes such a serious mistake by allowing an amateur to write about genomes, a subject he knows nothing about.
NOT-JUNK DNA
Far more of the human genome has vital functions than was first realized


The ribbons of DNA (deoxyribonucleic acid) in our cells carry instructions for building proteins and thus continuing life, but it was long believed that stretches of them are useless. The idea of "junk DNA" was first formulated by the Japanese-American geneticist Susumu Ohno (1928-2000), writing in the Brookhaven Symposium in Biology in 1972. He argued that the human genome can only sustain a very limited number of genes and that, for the rest, "the importance of doing nothing" was crucial. In effect, he dismissed 98 percent of the total genetic sequence that lies between the 20,000 or so protein-coding genes.

Yet scientists always thought that such junk must have a purpose. And indeed, a breakthrough in 2012 revealed that this junk is in fact crucial to the way our human genome, that is the complete set of genetic information in our cells, actually works.

After mapping of the entire human genome was completed in 2003, scientists focused on the so-called junk DNA. Nine years later, in 2012, the international ENCODE (Encyclopedia of DNA Elements) project published the largest single genome update in Nature and other journals. It found that, far from useless, the so-called junk contained 10,000 genes—around 18% of the total—that help control how the protein-coding genes work. Also found were 4 million regulatory switches that turn genes on and off (it is the failure of these switches that leads to diseases such as type 2 diabetes and Crohn's disease). In total, ENCODE predicted that up to 80 percent of our DNA has some sort of biochemical function.

The discovery of these functioning genes will help scientists to understand common diseases and also to explain why diseases affect some people and not others. If that can be achieved, drugs can be devised to treat those diseases. Much work still needs to be done, but the breakthrough has been made.
The editor of this book is Robert Arp, a philosopher specializing in the philosophy of biology and evolutionary psychology. I assume that he approved of the article by Simon Adams, which means that even philosophers of biology were duped by the ENCODE leaders.1


The book was published on Oct. 29, 2013. That means there was plenty of time to read the critiques of the ENCODE publicity campaign and even the scientific articles that were published last winter and early spring. There's really no excuse for making such a mistake.

Wednesday, September 05, 2012

ENCODE Leader Says that 80% of Our Genome Is Functional

Ed Yong is a science journalist and usually he's a very good one. This time, however, he should have gotten the other side of the story.

Ed interviewed Ewan Birney for a story on the function of sequences in the human genome [ENCODE: the rough guide to the human genome].
According to ENCODE’s analysis, 80 percent of the genome has a “biochemical function”. More on exactly what this means later, but the key point is: It’s not “junk”. Scientists have long recognised that some non-coding DNA probably has a function, and many solid examples have recently come to light. But, many maintained that much of these sequences were, indeed, junk. ENCODE says otherwise. “Almost every nucleotide is associated with a function of some sort or another, and we now know where they are, what binds to them, what their associations are, and more,” says Tom Gingeras, one of the study’s many senior scientists.

And what’s in the remaining 20 percent? Possibly not junk either, according to Ewan Birney, the project’s Lead Analysis Coordinator and self-described “cat-herder-in-chief”. He explains that ENCODE only (!) looked at 147 types of cells, and the human body has a few thousand. A given part of the genome might control a gene in one cell type, but not others. If every cell is included, functions may emerge for the phantom proportion. “It’s likely that 80 percent will go to 100 percent,” says Birney. “We don’t really have any large chunks of redundant DNA. This metaphor of junk isn’t that useful.”
The creationists are going to love this.

You blew it, Ed Yong. Why didn't you ask him about the 50% of our genome containing DEFECTIVE transposons and the 2% that's pseudogenes, just for starters? Then you could ask him why he believes that all intron sequences (about 20% of our genome) are functional [What's in Your Genome?].

"Almost every nucleotide ..."? Gimme a break. Don't these guys read the scientific literature?

This is going to make my life very complicated.


Friday, March 27, 2015

Plant biologists are confused about the meanings of junk DNA and genes

A recent issue of Nature contains a report on plant micro-RNAs (Lauressergues et al., 2015). The authors found that certain genes for plant micro-RNAs encoded short peptides in the micro-RNA precursors and those peptides seemed to have a biological function. What this means is that part of the longer precursor RNA that is cleaved to produce the final micro-RNA may have a function that wasn't recognized. If you assumed that the part of the precursor discarded during processing was useless junk, then you were wrong, at least for some genes.

This is not a big deal and the authors of the paper don't even mention junk DNA.

The paper was reviewed by Peter M. Waterhouse and Roger P. Hellens in the same issue (Waterhouse and Hellens, 2015). They think it's a big deal. Here's what they say,

Friday, August 25, 2017

How much of the human genome is devoted to regulation?

All available evidence suggests that about 90% of our genome is junk DNA. Many scientists are reluctant to accept this evidence—some of them are even unaware of the evidence [Five Things You Should Know if You Want to Participate in the Junk DNA Debate]. Many opponents of junk DNA suffer from what I call The Deflated Ego Problem. They are reluctant to concede that humans have about the same number of genes as all other mammals and only a few more than insects.

One of the common rationalizations is to speculate that while humans may have "only" 25,000 genes they are regulated and controlled in a much more sophisticated manner than the genes in other species. It's this extra level of control that makes humans special. Such speculations have been around for almost fifty years but they have gained in popularity since publication of the human genome sequence.

In some cases, the extra level of regulation is thought to be due to abundant regulatory RNAs. This means there must be tens of thousands of extra genes expressing these regulatory RNAs. John Mattick is the most vocal proponent of this idea and he won an award from the Human Genome Organization for "proving" that his speculation is correct! [John Mattick Wins Chen Award for Distinguished Academic Achievement in Human Genetic and Genomic Research]. Knowledgeable scientists know that Mattick is probably wrong. They believe that most of those transcripts are junk RNAs produced by accidental transcription at very low levels from non-conserved sequences.

Thursday, December 22, 2022

Junk DNA, TED talks, and the function of lncRNAs

Most of our genome is transcribed but so far only a small number of these transcripts have a well-established biological function.

The fact that most of our genome is transcribed has been known for 50 years but that fact only became widely known with the publication of ENCODE's preliminary results in 2007 (ENCODE, 2007). The ENCODE scientists referred to this as "pervasive transcription" and this label has stuck.

By the end of the 1970s we knew that much of this transcription was due to introns. The latest data shows that protein coding genes and known noncoding genes occupy about 45% of the genome and most of that is intron sequences that are mostly junk. That leaves 30-40% of the genome that is transcribed at some point producing something like one million transcripts of unknown function.

Thursday, April 11, 2013

Educating an Intelligent Design Creationist: The Specificity of DNA Binding Proteins

I'm replying to a post by andyjones (More and more) Function, the evolution-free gospel of ENCODE. This was the fourth post in a series and I'm working my way through five issues that Intelligent Design Creationists need to understand. The first two were "Pervasive Transcription" and "Rare Transcripts."

Educating an Intelligent Design Creationist: Introduction
Educating an Intelligent Design Creationist: Pervasive Transcription
Educating an Intelligent Design Creationist: Rare Transcripts

The Specificity of DNA Binding Proteins

It is absolutely essential that you understand the basic biochemistry of DNA binding proteins if you want to interpret the ENCODE results and the controversy surrounding junk DNA. You might think this is a given since almost everyone involved in the discussion has had some exposure to biochemistry in undergraduate courses. Unfortunately, most of these courses don't teach that stuff anymore1 so we've raised a generation of scientists who were never exposed to the facts.

Friday, May 08, 2015

Ford Doolittle talks about transposons, junk DNA, ENCODE, and how science should work

Here's more from the interview with Ford Doolittle [The Philosophical Approach: An Interview with Ford Doolittle].
Gitschier: I want to close with what you describe as your “latest rant.” How did you get on function?

Doolittle: Well, I’ve always been on that.

Back in 1980, people were talking about transposable elements as if their function was to speed evolution; that they exist because of their future utility. And I’ve never liked that kind of idea. I didn’t like it in terms of introns. And Dawkins had just published The Selfish Gene in 1978.

Carmen Sapienza, a student of mine who now works on eukaryotic imprinting, and I wrote a paper which was rejected by Science after seven referees. But we heard that Leslie Orgel and Francis Crick were working on something like this, so we sent it to them. They said, “If you submit it to Nature, we will tell Nature not to publish ours without publishing yours, and to publish yours first,” etc., which was very nice.

That paper, seemingly now very simplistic, said you don’t need to suppose that transposable elements are there for the purpose of speeding evolution. These are selfish things, and natural selection will favor such elements that can make copies of themselves in genomes and then spread horizontally to other genomes within the species. These are basically parasites. I think many people would now accept this, but it was radical at the time.

People don’t like to think that the human genome has junk in it. This came back when the ENCODE papers came out a few years ago and were touted as spelling the “demise of junk DNA.” That got my dander up.

I wrote a perspective in PNAS, and Dan Graur had a much more vituperative thing in Genome Biology and Evolution. I don’t think the ENCODE people have given up; they had a kind of semi-apology in PNAS, which wasn’t really an apology.

It is the same as the tree of life issue, but until we actually have some agreement about what we mean by words we are going to get into these arguments, and in my mind, there are two devastating things you can say about the ENCODE people.

One is that they completely ignored all that history about junk DNA and selfish DNA. There was a huge body of evidence that excess DNA might serve some structural role in the chromosomes, but not informational. They also ignored what philosophers of biology have spent a lot of time asking: what do you mean by “function?” And you can mean one of two things: we might mean either what natural selection favored, which is what I think most biologists mean, or we might mean what it does. Some people might say, “Well the function of this gene is in the development of cancer,” but they don’t really mean that natural selection put it there so that it would cause cancer. These are not-so-subtle differences.

I think many molecular biologists and genomicists, in particular, think that each and every nucleotide is there for a reason, that we are perfect organisms. It is almost as if we were still theists thinking God doesn’t make junk; we just now think natural selection doesn’t make junk. I think there is a deep issue about the extent to which we are noisy creatures and the extent to which we are finely honed machines. I think the latter view informs much of genomics, and I think it is false.

ENCODE wouldn’t have got funded had they said 80% of the human genome is just junk, transposable elements.

Gitschier: It is justifying itself, post hoc. They are the big players with a lot of money. It’s like a machine—“We can do it, so let’s just do it!”

Doolittle: It’s a juggernaut is what you are saying.

My other objection is that it is false ontology. I think all of our science suffers not only from the big science motivation, but from what I call “positivism.”

A couple of times we submitted papers saying, “Everybody’s doing something this way, and it doesn’t work, and it is wrong to do it this way.” And Nature would write back, “We’re not interested in negative reports like this. What does work?” And we say, “We don’t give a damn what does work, it is important to know that what people are doing now is not working.”

There is no critique in science, very little. You can’t actually say, “This doesn’t mean what people say it means.” You’ve got to be “positive;” you’ve got to be moving the program forward all the time. I don’t think that is right.

Now, and down the road, we’re going to tackle directly relevant questions, like what is the meaning of function in the concept of genomics? There are legitimate evolutionary constructs in which you can address transposable elements, and people have not really explored that. Questions about the tree of life, again, and some of the questions we’ve been through are things that continue to interest me and which have a strong philosophical component as well as a data-related component. That’s what I’m interested in pursuing.


Thursday, June 22, 2017

Are most transcription factor binding sites functional?

The ongoing debate over junk DNA often revolves around data collected by ENCODE and others. The idea that most of our genome is transcribed (pervasive transcription) seems to indicate that genes occupy most of the genome. The opposing view is that most of these transcripts are accidental products of spurious transcription. We see the same opposing views when it comes to transcription factor binding sites. ENCODE and their supporters have mapped millions of binding sites throughout the genome and they believe this represents abundant and exquisite regulation. The opposing view is that most of these binding sites are spurious and non-functional.

The messy view is supported by many studies on the biophysical properties of transcription factor binding. These studies show that any DNA-binding protein has a low affinity for random-sequence DNA. It will also bind with much higher affinity to sequences that resemble, but do not precisely match, its specific binding site [How RNA Polymerase Binds to DNA; DNA Binding Proteins]. If you take a species with a large genome, like us, then a typical 6-bp binding site will be present, by chance alone, at about 800,000 sites. Not all of those sites will be bound by the transcription factor in vivo because some of the DNA will be tightly wrapped up in dense chromatin domains. Nevertheless, an appreciable percentage of the genome will be available for binding, so typical ENCODE assays detect thousands of binding sites for each transcription factor.
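The "by chance alone" figure is simple arithmetic: a specific k-bp sequence is expected once every 4^k positions in random DNA. The sketch below is only an illustration under stated assumptions that are not from the post: a haploid genome of 3.2 billion bp, equal frequencies of A, T, C and G, and independent bases. Real genomes are not random (human DNA is AT-rich, for example), so treat the numbers as rough estimates. The name `expected_sites` is hypothetical.

```python
# Rough estimate of how often one specific k-bp motif occurs by chance.
# Assumptions (illustrative only): random sequence, equal base frequencies,
# independent positions, genome size of 3.2e9 bp.

GENOME_SIZE_BP = 3_200_000_000  # approximate haploid human genome (assumption)


def expected_sites(genome_size: int, motif_length: int, both_strands: bool = False) -> float:
    """Expected number of exact matches to one motif of the given length.

    Each start position matches a specific k-mer with probability (1/4)**k,
    and one strand offers genome_size - k + 1 start positions.
    """
    positions = genome_size - motif_length + 1
    per_strand = positions / 4 ** motif_length
    # Scanning the complementary strand roughly doubles the count
    # (ignoring palindromic motifs, which would be counted twice).
    return 2 * per_strand if both_strands else per_strand


if __name__ == "__main__":
    for k in (6, 8, 10, 12):
        print(f"{k:>2} bp motif: ~{expected_sites(GENOME_SIZE_BP, k):,.0f} sites per strand")
```

Under these assumptions a 6-bp motif is expected about 781,000 times on one strand, which is where a figure of roughly 800,000 comes from. Each extra base pair in the recognition site cuts the expected number of chance matches by a factor of four.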

This information appears in all the best textbooks and it used to be a standard part of undergraduate courses in molecular biology and biochemistry. As far as I can tell, the current generation of new biochemistry researchers wasn't taught this information.

Saturday, April 27, 2013

DNA: Nature Celebrates Ignorance

Some freelance science writer named Philip Ball has published an article in the April 25, 2013 issue of Nature: Celebrate the Unknowns.

The main premise of the article is revealed in the short blurb under the title: "On the 60th anniversary of the double helix, we should admit that we don't fully understand how evolution works at the molecular level, suggests Philip Ball."

What nonsense! We understand a great deal about how evolution works at the molecular level. Perhaps Philip Ball meant to say that we don't understand the historical details of how a particular genome evolved, but even that's misleading.

I've commented before on articles written by Philip Ball. In the past, he appeared to be in competition with Elizabeth Pennisi of Science for some kind of award for misunderstanding the human genome.

SEED and the Central Dogma of Molecular Biology - I Take Back My Praise
Shoddy But Not "Junk"?

Let's look at what the article says ...

Sunday, November 01, 2015

Florabama speaks

I've been trying to argue a few points on the creationist blogs but I have to admit that I'm not making any progress at all. Even the simplest, most obvious, points are vigorously contested by the ID crowd over there.

My latest attempt was on the post, Suzan Mazur’s Paradigm Shifters is now available from Amazon, where I tried to explain that Denyse O'Leary's version of Darwinism is not the best description of evolutionary theory and that many of Suzan Mazur's "Paradigm Shifters" have missed the revolution that occurred in the late 1960s.

Didn't work.

Now someone named "Florabama" has posted a comment that illustrates the problem we're up against. I thought I'd share it with Sandwalk readers. It may not be possible to teach such a person anything about science.