Sean Eddy is an old (well, not too old) talk.origins fan. Hi Sean!
Because he's had all that training in how to think correctly, he gets the difference between junk DNA and functional DNA. Read his post at: ENCODE says what? (C'est what?).
Think about your answer to the Random Genome Project thought experiment.
So a-ha, there’s the real question. The experiment that I’d like to see is the Random Genome Project. Synthesize a hundred million base chromosome of entirely random DNA, and do an ENCODE project on that DNA. Place your bets: will it be transcribed? bound by DNA-binding proteins? chromatin marked?
Of course it will.
The Random Genome Project is the null hypothesis, an essential piece of understanding that would be lovely to have before we all fight about the interpretation of ENCODE data on genomes. For random DNA (not transposon-derived DNA, not coding, not regulatory), what’s our null expectation for all these “functional” ENCODE features, by chance alone, in random DNA?
(Hat tip to The Finch and Pea blog, a great blog that I hadn’t seen before the last few days, where you’ll find essentially the same idea.)
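To get a feel for what that null expectation might look like, here is a minimal Python sketch. It is not part of Sean's post. It assumes uniform random DNA with equal base frequencies and one exact 8-bp motif counted on a single strand. The motif TGACTCAG and all the function names are made up for illustration. Real binding sites are degenerate and can sit on either strand, so chance matches would presumably be more common than this toy model suggests.

```python
import random


def expected_motif_hits(genome_length: int, motif_length: int) -> float:
    """Expected exact matches to one fixed motif on one strand of uniform random DNA."""
    positions = genome_length - motif_length + 1
    return positions / 4 ** motif_length


def count_motif_hits(sequence: str, motif: str) -> int:
    """Count exact occurrences of motif in sequence, allowing overlaps."""
    count = 0
    start = sequence.find(motif)
    while start != -1:
        count += 1
        start = sequence.find(motif, start + 1)
    return count


def random_dna(length: int, rng: random.Random) -> str:
    """Return a random DNA string with equal base frequencies (an assumption)."""
    return "".join(rng.choices("ACGT", k=length))


if __name__ == "__main__":
    # The 100-million-base random chromosome from the thought experiment
    print(expected_motif_hits(100_000_000, 8))

    # A smaller simulated sequence, to compare an observed count with the expectation
    rng = random.Random(0)
    seq = random_dna(1_000_000, rng)
    print(count_motif_hits(seq, "TGACTCAG"), expected_motif_hits(1_000_000, 8))
```

Under these assumptions, a 100-million-base random chromosome is expected to carry roughly 1,500 exact copies of any given 8-bp sequence by chance alone.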
Most of a person’s genetic risk for common diseases such as diabetes, asthma and hardening of the arteries appears to lie in the shadowy part of the human genome once disparaged as “junk DNA.”
Indeed, the vast majority of human DNA seems to be involved in maintaining individuals’ well being — a view radically at odds with what biologists have thought for the past three decades.
Those are among the key insights of a nine-year project to study the 97 percent of the human genome that’s not, strictly speaking, made up of genes.
The Encyclopedia of DNA Elements Project, nicknamed Encode, is the most comprehensive effort to make sense of the totality of the 3 billion nucleotides that are packed into our cells.
The project’s chief discovery is the identification of about 4 million sites involved in regulating gene activity. Previously, only a few thousand such sites were known. In all, at least 80 percent of the genome appears to be active at least sometime in our lives. Further research may reveal that virtually all of the DNA passed down from generation to generation has been kept for a reason.
“This concept of ‘junk DNA’ is really not accurate. It is an outdated metaphor,” said Richard Myers of the HudsonAlpha Institute for Biotechnology in Alabama.
Myers is one of the leaders of the project, involving more than 400 scientists at 32 institutions.
Another Encode leader, Ewan Birney of the European Bioinformatics Institute in Britain, said: “The genome is just alive with stuff. We just really didn’t realize that beforehand.”
“What I am sure of is that this is the science for this century,” he said. “In this century, we will be working out how humans are made from this instruction manual.”
This is wrong. Most of our genome is still junk in spite of what the ENCODE Consortium says.
Who is Richard Myers and where did he get the idea that the concept of junk DNA is an outdated metaphor? Does he have an explanation for all the evidence that contradicts his statement?
Here's the important question. Who is going to take responsibility for this PR fiasco?
Brendan Maher is a Feature Editor for Nature. He wrote a lengthy article for Nature when the ENCODE data was published on Sept. 5, 2012 [ENCODE: The human encyclopaedia]. Here's part of what he said,
After an initial pilot phase, ENCODE scientists started applying their methods to the entire genome in 2007. Now that phase has come to a close, signalled by the publication of 30 papers, in Nature, Genome Research and Genome Biology. The consortium has assigned some sort of function to roughly 80% of the genome, including more than 70,000 ‘promoter’ regions — the sites, just upstream of genes, where proteins bind to control gene expression — and nearly 400,000 ‘enhancer’ regions that regulate expression of distant genes.
I expect encyclopedias to be much more accurate than this.
As most people know by now, there are many of us who challenge the implication that 80% of the genome has a function (i.e., it's not junk).1 We think the Consortium was not being very scientific by publicizing such a ridiculous claim.
The main point of Maher's article was that the ENCODE results reveal a huge network of regulatory elements controlling expression of the known genes. This is the same point made by the ENCODE researchers themselves. Here's how Brendan Maher expressed it.
The real fun starts when the various data sets are layered together. Experiments looking at histone modifications, for example, reveal patterns that correspond with the borders of the DNaseI-sensitive sites. Then researchers can add data showing exactly which transcription factors bind where, and when. The vast desert regions have now been populated with hundreds of thousands of features that contribute to gene regulation. And every cell type uses different combinations and permutations of these features to generate its unique biology. This richness helps to explain how relatively few protein-coding genes can provide the biological complexity necessary to grow and run a human being.
I think that much of this hype comes from a problem I've called The Deflated Ego Problem. It arises because many scientists were disappointed to discover that humans have about the same number of genes as many other species yet we are "obviously" much more complex than a mouse or a pine tree. There are many ways of solving this "problem." One of them is to postulate that humans have a much more sophisticated network of control elements in our genome. Of course, this ignores the fact that the genomes of mice and trees are not smaller than ours.
For decades we've known that less than 2% of the human genome consists of exons and that protein-encoding genes represent more than 20% of the genome. (Introns account for the difference between exons and genes.) [What's in Your Genome?]. There are about 20,500 protein-encoding genes in our genome and about 4,000 genes that encode functional RNAs for a total of about 25,000 genes [Humans Have Only 20,500 Protein-Encoding Genes]. That's a little less than the number predicted by knowledgeable scientists over four decades ago [False History and the Number of Genes]. The definition of "gene" is somewhat open-ended but, at the very least, a gene has to have a function [Must a Gene Have a Function?].
We've known about all kinds of noncoding DNA that's functional, including origins of replication, centromeres, genes for functional RNAs, telomeres, and regulatory DNA. Together these functional parts of the genome make up almost 10% of the total. (Most of the DNA giving rise to introns is junk in the sense that it is not serving any function.) The idea that all noncoding DNA is junk is a myth propagated by scientists (and journalists) who don't know their history.
We've known about the genetic load argument since 1968 and we've known about the C-Value "Paradox" and its consequences since the early 1970s. We've known about pseudogenes and we've known that almost 50% of our genome is littered with dead transposons and bits of transposons. We've known that about 3% of our genome consists of highly repetitive DNA that is not transcribed or expressed in any way. Most of this DNA is functional and a lot of it is not included in the sequenced human genome [How Much of Our Genome Is Sequenced?]. All of this evidence indicates that most of our genome is junk. This conclusion is consistent with what we know about evolution and it's consistent with what we know about genome sizes and the C-Value "Paradox." It also helps us understand why there's no correlation between genome size and complexity.
I'm really interested in science education and I'd love to see improvements so that we can begin to create a scientifically literate society. Although I'm not an American, I'm quite interested in the views of American politicians because they can have a huge influence on science education.
That's why I was looking forward to seeing what Barack Obama and Mitt Romney had to say about science. Do they personally believe in evolution? Do they understand that homeopathy is useless? Do they think that science conflicts with their religious beliefs? Do they personally believe that the universe began almost 14 billion years ago with a Big Bang? Do they understand what causes earthquakes? Can they tell us why the discovery of the Higgs boson was important? Do they know what a gene is? Can they personally tell us in a few sentences how an eclipse of the sun occurs? Do they understand the concept of a chemical reaction?
I don't agree with everything Richard Dawkins says in this video but he's got the important parts right. Notice that he doesn't stoop to calling them IDiots, like I do. He uses other words.
David Ropeik identifies himself as an "international consultant in risk perception and risk communication, and an Instructor in the Environmental Management Program at the Harvard University Extension School." His blog is soapbox science on Nature Blogs.
In what should be another blow to the hubris of human intellect, we have a new entry in the long and ever growing list of “Really Big Things Scientists Believed” that turned out be wrong. This one is about DNA, that magical strand of just four amino acids, Adenine paired with Thymine, Cytosine paired with Guanine, millions of those A-T and C-G pairs linked together in various combinations to make the genes that spit out the blueprints for the proteins that make us. Or so science believed.
The problem was that, the ‘genes’ sections of DNA that coded for proteins only came to about 1.5% of the whole 2 meter-long strand. For decades molecular biologists didn’t know what the rest of the DNA…as in, nearly all of it…does. So, in a remarkable stroke of intellectual arrogance, they dismissed it as ‘junk’. Actually, the drier academics simply called it ‘non-coding DNA’. A Japanese scientist named Susumu Ohno called it junk, and the word stuck because, basically, scientists had no explanation for what most of DNA was for. So they assumed it was left over from evolution, had no current function, and was, literally, junk. As Francis Crick, one of the Nobel Prize winners for helping discover the structure of DNA, put it, non-coding DNA has “little specificity and conveys little or no selective advantage to the organism”. Right. As though nature would waste that much energy.
Well, there’s going to be a lot of editing on Wikipedia in the days and weeks to come, and it’s time to reprint the basic biology textbooks, because extensive research into the mystery of what most of DNA is doing there has discovered that the ‘junk’ isn’t junk at all. Most of it has all sorts of jobs. Science Journalist Ed Yong has written a wonderful summary of this work here.
As I said earlier, this is making my life very complicated. It's going to take a lot of effort to undo the damage caused by the ENCODE scientists and the science writers who fell for their scam.
Here's an excellent example of what's wrong with the way the ENCODE Consortium is interpreting their data. Congratulations to Michael Eisen! I wish I had said this: A neutral theory of molecular function.1
Read the whole thing very carefully and heed the lesson. Here's an excerpt,
I think a lot about Kimura, the neutral theory, and the salutary effects of clear null models every time I get involved in discussions about the function, or lack thereof, of biochemical events observed in genomics experiments, such as those triggered this week by publications from the ENCODE project.
It is easy to see the parallels between the way people talk about transcribed RNAs, protein-DNA interactions, DNase hypersensitive regions and what not, and the way people talked about sequence changes PK (pre Kimura). While many of the people carrying out RNA-seq, ChIP-seq, CLIP-seq, etc… have been indoctrinated with Kimura at some point in their careers, most seem unable to apply his lesson to their own work. The result is a field suffused with implicit or explicit thinking along the following lines:
I observed A bind to B. A would only have evolved to bind to B if it were doing something useful. Therefore the binding of A to B is “functional”.
One can understand the temptation to think this way. In the textbook view of molecular biology, everything is highly regulated. Genes are transcribed with a purpose. Transcription factors bind to DNA when they are regulating something. Kinases phosphorylate targets to alter their activity or sub-cellular location. And so on. Although there have always been lots of reasons to dismiss this way of thinking, until about a decade ago, this is what the scientific literature looked like. In the day where papers described single genes and single interactions, who would bother to publish a paper about a non-functional interaction they observed?
But experimental genomics blew this world of Mayberry molecular biology wide open. For example, when Mark Biggin and I started to do ChIP-chip experiments in Drosophila embryos, we found that factors were binding not just to their dozen or so known targets, but to thousands, and in some cases tens of thousands, of places across the genome. Having studied my Kimura, I just assumed that the vast majority of these interactions had evolved by chance – a natural, essential, consequence of the neutral fixation of nucleotide changes that happened to create transcription factor binding sites. And so I was shocked that almost everyone I talked to about this data assumed that every one of these binding events was doing something – we just hadn’t figured out what yet.
.....
Rather than assuming – as so many of the ENCODE researchers apparently do – that the millions (or is it billions?) of molecular events they observe are a treasure trove of functional elements waiting to be understood, they should approach each and every one of them with Kimurian skepticism. We should never accept the existence of a molecule or the observation that it interacts with something as prima facie evidence that it is important. Rather we should assume that all such interactions are non-functional until proven otherwise, and develop better, compelling, ways to reject this null hypothesis.
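One very simple way to put a number on "by chance" is a Poisson tail probability. The Python sketch below is not from Michael's post. It asks how likely it is to see at least a given number of motif matches in a region when a random-sequence model predicts only a few. The function name and the example numbers are hypothetical. Note what this does and does not test: a small probability only says the count is hard to explain by random sequence. It says nothing about whether the binding does anything useful, which is the harder null hypothesis the excerpt is talking about.

```python
import math


def poisson_tail(observed: int, expected: float) -> float:
    """Return P(X >= observed) for X ~ Poisson(expected)."""
    if observed <= 0:
        return 1.0
    # Sum P(X = k) for k = 0 .. observed - 1, then take the complement
    term = math.exp(-expected)  # P(X = 0)
    cumulative = 0.0
    for k in range(observed):
        cumulative += term
        term *= expected / (k + 1)  # step from P(X = k) to P(X = k + 1)
    return max(0.0, 1.0 - cumulative)


if __name__ == "__main__":
    # Hypothetical region where the random-sequence model predicts 2 matches
    print(poisson_tail(3, 2.0))   # about 0.32: unremarkable
    print(poisson_tail(12, 2.0))  # very small: not explained by chance matches alone
```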
Read the comments, especially the one from former colleague Chris Hogue on how to interpret phosphorylation of proteins and signal transduction. That's not going to be popular in my department!
I just have one small quibble with Michael's post. Not all textbooks describe the cell as if it were a finely tuned Swiss watch and not all textbooks take an adaptationist approach to evolution. Mine doesn't.
1. As a result of this post I've now relegated Jonathan Eisen to "brother of Michael Eisen" rather than the other way around. Sorry, Jonathan.
The Nature issue containing the latest ENCODE Consortium papers also has a News & Views article called "Genomics: ENCODE explained" (Ecker et al., 2012). Some of these scientists comment on junk DNA.
For example, here's what Joseph Ecker says,
One of the more remarkable findings described in the consortium's 'entrée' paper is that 80% of the genome contains elements linked to biochemical functions, dispatching the widely held view that the human genome is mostly 'junk DNA'. The authors report that the space between genes is filled with enhancers (regulatory DNA elements), promoters (the sites at which DNA's transcription into RNA is initiated) and numerous previously overlooked regions that encode RNA transcripts that are not translated into proteins but might have regulatory roles.
And here's what Inês Barroso says,
The vast majority of the human genome does not code for proteins and, until now, did not seem to contain defined gene-regulatory elements. Why evolution would maintain large amounts of 'useless' DNA had remained a mystery, and seemed wasteful. It turns out, however, that there are good reasons to keep this DNA. Results from the ENCODE project show that most of these stretches of DNA harbour regions that bind proteins and RNA molecules, bringing these into positions from which they cooperate with each other to regulate the function and level of expression of protein-coding genes. In addition, it seems that widespread transcription from non-coding DNA potentially acts as a reservoir for the creation of new functional molecules, such as regulatory RNAs.
If this were an undergraduate course I would ask for a show of hands in response to the question, "How many of you thought that there did not seem to be "defined gene-regulatory elements" in noncoding DNA?"
I would also ask, "How many of you have no idea how evolution could retain "useless" DNA in our genome?" Undergraduates who don't understand evolution should not graduate in a biological science program. It's too bad we don't have similar restrictions on senior scientists who write News & Views articles for Nature.
Jonathan Pritchard and Yoav Gilad write,
One of the great challenges in evolutionary biology is to understand how differences in DNA sequence between species determine differences in their phenotypes. Evolutionary change may occur both through changes in protein-coding sequences and through sequence changes that alter gene regulation.
There is growing recognition of the importance of this regulatory evolution, on the basis of numerous specific examples as well as on theoretical grounds. It has been argued that potentially adaptive changes to protein-coding sequences may often be prevented by natural selection because, even if they are beneficial in one cell type or tissue, they may be detrimental elsewhere in the organism. By contrast, because gene-regulatory sequences are frequently associated with temporally and spatially specific gene-expression patterns, changes in these regions may modify the function of only certain cell types at specific times, making it more likely that they will confer an evolutionary advantage.
However, until now there has been little information about which genomic regions have regulatory activity. The ENCODE project has provided a first draft of a 'parts list' of these regulatory elements, in a wide range of cell types, and moves us considerably closer to one of the key goals of genomics: understanding the functional roles (if any) of every position in the human genome.
The problem here is the hype. While it's true that the ENCODE project has produced massive amounts of data on transcription factor binding sites etc., it's a bit of an exaggeration to say that "until now there has been little information about which genomic regions have regulatory activity." Twenty-five years ago, my lab published some pretty precise information about the parts of the genome regulating activity of a mouse hsp70 gene. There have been thousands of other papers on the subject of gene regulatory sequences since then. I think we actually have a pretty good understanding of gene regulation in eukaryotes. It's a model that seems to work well for most genes.
The real challenge from the ENCODE Consortium is that they question that understanding. They are proposing that huge amounts of the genome are devoted to fine-tuning the expression of most genes in a vast network of binding sites and small RNAs. That's not the picture we have developed over the past four decades. If true, it would not only mean that a lot less DNA is junk but it would also mean that the regulation of gene expression is fundamentally different than it is in E. coli.
Retroviruses are RNA viruses that go through a stage where their RNA genomes are copied into DNA by reverse transcriptase. The virus may integrate into the host genome and be carried along for many generations producing low levels of virus particles [Retrotransposons/Endogenous Retroviruses]. Most of these events will occur in somatic cells so the integrated virus is not passed along to progeny but from time to time the virus integrates into germ line DNA and this is heritable.
There are 31 such events in our lineage, meaning that we have copies of 31 different retroviruses in our genome. The retroviruses may have produced copies in germ line DNA such that each of the 31 retroviruses is now represented by a family of sequences scattered throughout the genome. Today, these retrovirus sequences represent a total of 8% of our genome! That's over 200,000,000 base pairs of DNA. There are about 100 thousand different sites.1
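The arithmetic is easy to check. This small Python sketch assumes a round figure of 3 billion base pairs for the whole genome (my assumption, the exact total depends on which assembly you count) and uses the 8% and 100,000-site figures from above. The function names are just for illustration.

```python
def retroviral_bp(genome_bp: int, percent: int) -> int:
    """Base pairs covered by retrovirus-derived sequence, given a whole-number percentage."""
    return genome_bp * percent // 100


def mean_site_length(total_bp: int, sites: int) -> float:
    """Average length per insertion site, in base pairs."""
    return total_bp / sites


if __name__ == "__main__":
    GENOME_BP = 3_000_000_000  # assumption: round figure for the human genome
    ERV_PERCENT = 8            # from the text above
    ERV_SITES = 100_000        # from the text above

    total = retroviral_bp(GENOME_BP, ERV_PERCENT)
    print(f"{total:,} bp of retrovirus-derived DNA")
    print(f"{mean_site_length(total, ERV_SITES):,.0f} bp per site on average")
```

With those inputs the total comes to 240,000,000 bp, which fits "over 200,000,000," and the average works out to about 2,400 bp per site.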
There's no selective pressure to maintain the functionality of these retrovirus sequences so, as you might have guessed, most of them have accumulated mutations over millions of years. (The original insertion events took place at various times ranging from 100 million years ago to only a few million years ago.) Almost all of the 8% consists of defective retrovirus sequences. It's junk.2
But it's a special kind of junk because retrovirus DNA has strong promoters that bind various transcription factors and the flanking enhancers ensure that the region around these promoters will be in open chromatin regions that have all the characteristics of real promoter sites. A substantial proportion of the defective retroviruses will still produce transcripts because the promoter region may not be mutated even though there may be lethal mutations elsewhere in the sequence.
What does this mean? It means that there will be thousands of junk DNA sites that bind transcription factors and RNA polymerase and may even be transcribed. When you're doing whole genome analyses, like those in the ENCODE study, you need to be careful to distinguish between functional promoters and non-functional promoters.
1. The typical retrovirus genome is about 3,000 bp in length but many of the defective retrotransposon sequences have been truncated by deletions.
2. Except for an extremely small number that might have acquired a secondary function such as enhancing expression of a nearby gene.
Welcome to the 51st Carnival of Evolution, henceforth known as Darwin’s Restaurant. You may have noticed that the motto of my blog is ‘Science—there’s something for everyone.’ Well, that’s also true of evolution. Whether your passion is transitional fossils or reducible complexity, you’ll find something tasty on this menu.
There's some cool stuff this month.
The next Carnival of Evolution (September) will be hosted by The Genealogical World of Phylogenetic Networks. If you want to volunteer to host others, contact Bjørn Østman. Bjørn is always looking for someone to host the Carnival of Evolution. He would prefer someone who has not hosted before. Contact him at the Carnival of Evolution blog. You can send articles directly to him or you can submit your articles at Carnival of Evolution.
CFI is sponsoring a conference in Ottawa (Ontario, Canada) on "Celebrating Reason at the End of the World." The title of the conference is Eschaton 2012. Go to the website to find out what "eschaton" means and how to pronounce it.
The meetings will take place from Friday Nov. 30 to Sunday Dec. 2 at a hotel in downtown Ottawa. The list of prominent speakers includes ...
PZ Myers (Biologist and author of the Pharyngula Blog)
Eugenie Scott (Executive Director of National Center for Science Education)
Ophelia Benson (Columnist for Free Inquiry magazine and author of Butterflies and Wheels Blog)
Christopher DiCarlo (Philosopher of Science and author of How to Become a Really Good Pain in the Ass)
There's a whole bunch of less prominent speakers as well. My talk will be on Saturday morning. It's titled "Scientists vs IDiots."
Later on I'll be on a panel about science education with PZ Myers and Eugenie Scott. This should be lots of fun.
See you there! We'll eat poutine and beaver tails.
Readers will likely recall the ENCODE project, published in a series of papers in 2007, in which (among other interesting findings) it was discovered that, even though the vast majority of our DNA does not code for proteins, the human genome is nonetheless pervasively transcribed into mRNA. The science media and blogosphere is now abuzz with the latest published research from the ENCODE project, the most recent blow to the “junk DNA” paradigm. Since the majority of the genome being non-functional (as has been claimed by many, including notably Larry Moran, P.Z. Myers, Nick Matzke, Jerry Coyne, Kenneth Miller and Richard Dawkins) would be surprising given the hypothesis of design, ID proponents have long predicted that function will be identified for much of our DNA that was once considered to be useless. In a spectacular vindication of this hypothesis, six papers have been released in Nature, in addition to a further 24 papers in Genome Research and Genome Biology, plus six review articles in The Journal of Biological Chemistry.
...
This new research places a dagger through the heart of the junk DNA paradigm, and should give adherents to this out-dated assumption yet further cause for caution before they write off DNA, for which function has yet to be identified, as “junk”.
Not much I can say right now. I'm up to my ears trying to convince sane people that the ENCODE papers are wrong. The IDiots are just going to have to wait.