
Showing posts with label Genome. Show all posts

Tuesday, February 09, 2016

Junk DNA doesn't exist according to "Conceptual Revolutions in Science"

The blog "Conceptual Revolutions in Science" only publishes "evidence-based, paradigm-shifting scientific news" according to their home page.

The man behind the website is Adam B. Dorfman (@DorfmanAdam). He has an MBA from my university and he currently works at a software company. Here's how he describes himself on the website.

Thursday, January 28, 2016

"The Selfish Gene" turns 40

Richard Dawkins published The Selfish Gene 40 years ago and Matt Ridley notes the anniversary in a Nature article published today (Jan. 28, 2016): In retrospect: The selfish gene.

I don't remember when I first read it—probably the following year when the paperback version came out. I found it quite interesting but I was a bit put off by the emphasis on adaptation (taken from George Williams) and the idea of inclusive fitness (from W.D. Hamilton). I also didn't much like the distinction between vehicles and replicators and the idea that it was the gene, not the individual, that was the unit of selection ("selection" not "evolution").
It is finally time to return to the problem with which we started, to the tension between individual organism and gene as rival candidates for the central role in natural selection...One way of sorting this whole matter out is to use the terms ‘replicator’ and ‘vehicle’. The fundamental units of natural selection, the basic things that survive or fail to survive, that form lineages of identical copies with occasional random mutations, are called replicators. DNA molecules are replicators. They generally, for reasons that we shall come to, gang together into large communal survival machines or ‘vehicles’.

Richard Dawkins

Tuesday, January 19, 2016

All about Craig

The sequence of the human genome was announced on June 26, 2000 although the actual sequence wasn't published until a year later. There were two sequences. One was the product of the International Human Genome Project, led by Francis Collins, who said,
"It is humbling for me and awe-inspiring to realize that we have caught the first glimpse of our own instruction book, previously known only to God."
The sequence was a composite of a number of individuals.

The second sequence was from Celera Genomics, led by Craig Venter. It was mostly his genome, making him the second being to know his own instruction book ... right after God.

It took another seven years to finish and publish the complete sequence of all of Craig Venter's chromosomes. The paper was published in PLoS Biology (Levy et al., 2007) and highlighted in a Nature News article: All about Craig: the first 'full' genome sequence.

What's unique about this genome sequence—other than the fact that it's God's, er, Craig Venter's—is that all 46 chromosomes were sequenced. In other words, enough data was generated to put together separate sequences for each chromosome of every pair. That produces some interesting data.

There were 4.1 million differences between homologous chromosomes (22 autosomes). 78% of these variants were single nucleotide polymorphisms (SNPs). The rest, about 0.9 million variants, were mostly indels (insertions and deletions). Although indels were a minority of the events, each one spans multiple nucleotides, so indels made up 74% of the total number of variant nucleotides.

In addition, there were 62 copy number variants (duplications) accounting for an additional 10 Mb of variation between the haploid sets of chromosomes. Adding up all the indels, SNPs, and duplications gives a total of 13.9 Mb of nucleotide differences. By this calculation, the two haploid genomes differ by about 0.5% (the total amount sequenced was 2,895 Mb).
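A quick back-of-the-envelope check of the figures above (the event counts and percentages are from the text; treating the numbers this way is my own arithmetic, not a calculation from the paper):

```python
# Rough consistency check of the variation figures quoted above.
total_events = 4_100_000    # differences between homologous chromosomes
snp_fraction = 0.78         # 78% of the events are SNPs

snp_events = total_events * snp_fraction
indel_events = total_events - snp_events    # the ~0.9 million remaining events

print(f"SNP events:   {snp_events:,.0f}")
print(f"Indel events: {indel_events:,.0f}")

# Total nucleotide difference between the two haploid genomes,
# including the ~10 Mb of copy number variation (all values in Mb).
total_mb = 13.9
sequenced_mb = 2895
print(f"Haploid genomes differ by ~{100 * total_mb / sequenced_mb:.2f}%")
```

The 0.9 million non-SNP events fall out of the 78% figure, and 13.9 Mb out of 2,895 Mb gives the quoted ~0.5%.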

When the two copies of all annotated genes were compared, it turned out that 44% were heterozygous—the two copies were not identical.

Craig Venter's genome sequence differs from the composite human reference genome at 4,118,889 positions. Most of these were already known as variants in the human population but 31% were new variants (in 2007).

Venter has written about his genome sequence in A Life Decoded. He has variants in his APOE gene that are associated with Alzheimer's and cardiovascular disease. He also has variants in his SORL1 gene that put him at risk for Alzheimer's, according to 2007 data. Just about everyone who gets their genome sequenced will find variants that put them at greater risk for some genetic disease.


Levy, S., Sutton, G., Ng, P.C., Feuk, L., Halpern, A.L., Walenz, B.P., Axelrod, N., Huang, J., Kirkness, E.F., Denisov, G., Lin, Y., MacDonald, J.R., Pang, A.W.C., Shago, M., Stockwell, T.B., Tsiamouri, A., Bafna, V., Bansal, V., Kravitz, S.A., Busam, D.A., Beeson, K.Y., McIntosh, T.C., Remington, K.A., Abril, J.F., Gill, J., Borman, J., Rogers, Y.-H., Frazier, M.E., Scherer, S.W., Strausberg, R.L., and Venter, J.C. (2007) The diploid genome sequence of an individual human. PLoS Biol, 5(10), e254. [doi: 10.1371/journal.pbio.0050254]

Sunday, January 17, 2016

Origin of de novo genes in humans

We know quite a lot about the origin of new genes (Carvunis et al., 2012; Kaessmann, 2010; Long et al., 2003; Long et al., 2013; Näsvall et al., 2012; Neme and Tautz, 2013; Schlötterer, 2015; Tautz and Domazet-Lošo, 2011; Wu et al., 2011). Most of them are derived from gene duplication events and subsequent divergence. A smaller number are formed de novo from sequences that were not part of a gene in the ancestral species.

In spite of what you might have read in the popular literature, there are not a large number of newly formed genes in most species. Genes that appear to be unique to a single species are called "orphan" genes. When a genome is first sequenced there will always be a large number of potential orphan genes because the gene prediction software tilts toward false positives in order to minimize false negatives. Further investigation and annotation reduces the number of potential genes.

Nelson Lau responds to my criticism of his comments about junk DNA

I criticized Nelson Lau for comments he made about the junk DNA debate [Brandeis professor demonstrates his ignorance about junk DNA].

Here is his response,
Dear Dr. Graur and Dr. Moran,

Thanks for reading the commentary on my university’s communication page, hastily written for brevity and digestibility by me and our science communication officer, Lawrence Goodman. I was originally hoping the piece could focus on my latest research, but it turned into this sort of general Q&A chat. The commentary was written rather quickly and meant for a general audience perusing Brandeis research, so it is obviously not a peer-reviewed scientific publication.

I am well aware of both your reputations as fiery critics and experts of evolutionary biology, and you have somewhat of a following on the internet. Some of your earlier blog posts have been entertaining and even on point regarding how big projects like ENCODE have over-hyped the functional proportions of our genomes. So, it does NOT surprise me one bit that I would become your latest vitriolic target in your posts here, and here.

Could I learn more from you two about evolutionary biology theory? Indeed, I could. Can we revise our Q&A commentary to be more scientifically accurate while still being digestible to a general audience? Perhaps, if we have the time and I survive my tenure review, we may do so and take your input into consideration. Why respond and risk another snarky post from you guys? I could care less about your trivial blog critiques when I’ve received plenty of grants and paper rejections that cut much deeper into my existence as a young academic struggling to survive when the academic track has never been more challenging (<10% grant success rates at NIH, NSF, CIHR, etc).

I’m responding to ask that both of you reflect on the message your posts are sending to students and postdocs. As a young scientist, having a chat with my university PR rep, I have to now think twice about two senior tenured professors slamming my scientific credibility on your internet soapbox without a single direct email to me. How passive-aggressive!

Your message is saying that Academic science even less inviting to young scientists as it is, with faculty positions and grants falling way short of demand, and the tough sacrifices every young scientist is already making for the craft that we love. If we condone this type of sniping behavior, why would any young scientist want to learn and discuss with the older scientists of your generation?

The Science Blogosphere, Twittersphere, and the Open Data movements are the next generation of platforms for science communication, and I commend you two for being vocal contributors to these platforms quite early on. However, I also recently wrote a guest post on Bjorn Bremb’s blog arguing that for open data and discussion to work, we scientists need to uphold decorum and civility.

A direct email from you to me expressing your scientific concerns of our commentary would have been a better way to go. I am willing to stand corrected. Your blog posts, however, are disappointing and appear petty to me. Let’s all set a better example here for our trainees.

If you wish to post this response verbatim on your blogs, go ahead, since I had thought of posting this response on your blog’s comments section. But to follow my own advice, I’ll try a direct email to you first. And if I don’t hear back from you, I may then ask my friend Bjorn to help me post this on his blog.

Thank you for reading this till the end,

Nelson

Nelson Lau, Ph.D.
Assistant Professor - Biology


Saturday, January 16, 2016

Brandeis professor demonstrates his ignorance about junk DNA

Judge Starling (Dan Graur) has alerted me to yet another young biologist who hasn't bothered to study the subject of genomes and junk DNA [An Ignorant Assistant Professor at @BrandeisU Explains “Junk DNA”].

This time it's Assistant Professor of Biology Nelson Lau. He studies Piwi proteins and PiRNAs.

Lau was interviewed by Lawrence Goodman, a science communication officer at Brandeis University: DNA dumpster diving. The subject is junk DNA and you will be astonished at how ignorant Nelson Lau is about a subject that's supposed to be important in his work.

How does this happen? Aren't scientists supposed to be up-to-date on the scientific literature before they pass themselves off as experts? How can an Assistant Professor make such blatantly false and misleading statements about his own area of research expertise? Has he never encountered graduate students, post-docs, or mentors who would have corrected his misconceptions?

Here's the introduction to the interview,
Since the 1960s, it's largely been assumed that most of the DNA in the human genome was junk. It didn't encode proteins—the main activity of our genes—so it was assumed to serve no purpose. But Assistant Professor of Biology Nelson Lau is among a new generation of scientists questioning that hypothesis. His findings suggest we've been wrong about junk DNA and it may be time for a reappraisal. If we want to understand how our bodies work, we need to start picking through our genetic garbage.

BrandeisNow sat down with Lau to ask him about his research.
There's nothing wrong with being a "new generation" who questions the wisdom of their elders. That's what all scientists are supposed to do.

But there are certain standards that apply. The most important standard is that when you are challenging other experts you'd better be an expert yourself.
First off, what is junk DNA?
About two percent of our genome carries out functions we know about, things like building our bones or keeping the heart beating. What the rest of our DNA does is still a mystery. Twenty years ago, for want of a better term, some scientists decided to call it junk DNA.
Dan has already addressed this response but let me throw in my own two cents.

There was never, ever, a time when knowledgeable scientists said that all of the 98% of our DNA that isn't part of a gene is junk. Not today, not twenty years ago (1996), and not 45 years ago.

There has never been a time since the 1960s when all non-gene DNA was a mystery. It certainly isn't a mystery today. If you don't know this then you'd better do some reading ... quickly. Google could be your friend, Prof. Lau; it will save you from further embarrassment. Search on "junk DNA" and read everything ... not just the entries that you agree with.

I added a bunch of links at the bottom of this post to help you out.
Is it really junk?
There’s two camps in the scientific community, one that believes it doesn’t do anything and another that believes it’s there for a purpose.

And you’re in the second camp?
Yes. It's true that sometimes organisms carry around excess DNA, but usually it is there for a purpose. Perhaps junk DNA has been coopted for a deeper purpose that we have yet to fully unravel.
It is possible that the extra DNA in our genome has an unknown deeper purpose but right now we have more than enough information to be confident that it's junk. You have to refute or discredit all the work that's been done in the past 40 years in order to be in the second camp.

I strongly suspect that Prof. Lau has not done his homework and he doesn't know the Five Things You Should Know if You Want to Participate in the Junk DNA Debate.

What possible "deep purpose" could this DNA have?
Maybe when junk DNA moves to the right place in our DNA, this could cause better or faster evolution. Maybe when junk genes interacts with the non-junk ones, it causes a mutation to occur so humans can better adapt to changes in the environment.
Most of the undergraduates who took my course could easily refute that argument. I'm guessing that undergraduates in biology at Brandeis aren't as smart. Or maybe they're just too complacent to challenge a professor?

We've got a serious problem here folks. There are scientists being hired at respectable universities who aren't keeping up with the scientific literature in their own field. How does this happen? Are there newly hired biology professors who don't understand evolution?

Eddy, S.R. (2012) The C-value paradox, junk DNA and ENCODE. Current Biology, 22:R898. [doi: 10.1016/j.cub.2012.10.002]

Niu, D.K., and Jiang, L. (2012) Can ENCODE tell us how much junk DNA we carry in our genome? Biochemical and Biophysical Research Communications 430:1340-1343. [doi: 10.1016/j.bbrc.2012.12.074]

Doolittle, W.F. (2013) Is junk DNA bunk? A critique of ENCODE. Proc. Natl. Acad. Sci. (USA) published online March 11, 2013. [PubMed] [doi: 10.1073/pnas.1221376110]

Graur, D., Zheng, Y., Price, N., Azevedo, R.B., Zufall, R.A., and Elhaik, E. (2013) On the immortality of television sets: "function" in the human genome according to the evolution-free gospel of ENCODE. Genome Biology and Evolution published online: February 20, 2013 [doi: 10.1093/gbe/evt028]

Eddy, S.R. (2013) The ENCODE project: missteps overshadowing a success. Current Biology, 23:R259-R261. [doi: 10.1016/j.cub.2013.03.023]

Hurst, L.D. (2013) Open questions: A logic (or lack thereof) of genome organization. BMC biology, 11:58. [doi:10.1186/1741-7007-11-58]

Kellis, M., Wold, B., Snyder, M.P., Bernstein, B.E., Kundaje, A., Marinov, G.K., Ward, L.D., Birney, E., Crawford, G. E., and Dekker, J. (2014) Defining functional DNA elements in the human genome. Proc. Natl. Acad. Sci. (USA) 111:6131-6138. [doi: 10.1073/pnas.1318948111]

Morange, M. (2014) Genome as a Multipurpose Structure Built by Evolution. Perspectives in biology and medicine, 57:162-171. [doi: 10.1353/pbm.2014.000]

Palazzo, A.F., and Gregory, T.R. (2014) The Case for Junk DNA. PLoS Genetics, 10:e1004351. [doi: 10.1371/journal.pgen.1004351]


Thursday, December 10, 2015

How many human protein-coding genes are essential for cell survival?

The human genome contains about 20,000 protein-coding genes and about 5,000 genes that specify functional RNAs. We would like to know how many of those genes are essential for the survival of an individual and for long-term survival of the species.

It would be almost as interesting to know how many are required for just survival of a particular cell. This set is the group of so-called "housekeeping genes." They are necessary for basic metabolic activity and basic cell structure. Some of these genes are the genes for ribosomal RNA, tRNAs, the RNAs involved in splicing, and many other types of RNA. Some of them are the protein-coding genes for RNA polymerase subunits, ribosomal proteins, enzymes of lipid metabolism, and many other enzymes.

The ability to knock out human genes using CRISPR technology has opened the door to testing for essential genes in tissue culture cells. The idea is to disrupt every gene and screen to see if it's required for cell viability in culture.

Three papers using this approach have appeared recently:
Blomen, V.A., Májek, P., Jae, L.T., Bigenzahn, J.W., Nieuwenhuis, J., Staring, J., Sacco, R., van Diemen, F.R., Olk, N., and Stukalov, A. (2015) Gene essentiality and synthetic lethality in haploid human cells. Science, 350:1092-1096. [doi: 10.1126/science.aac7557]

Wang, T., Birsoy, K., Hughes, N.W., Krupczak, K.M., Post, Y., Wei, J.J., Lander, E. S., and Sabatini, D.M. (2015) Identification and characterization of essential genes in the human genome. Science, 350:1096-1101. [doi: 10.1126/science.aac7041]

Hart, T., Chandrashekhar, M., Aregger, M., Steinhart, Z., Brown, K.R., MacLeod, G., Mis, M., Zimmermann, M., Fradet-Turcotte, A., and Sun, S. (2015) High-Resolution CRISPR Screens Reveal Fitness Genes and Genotype-Specific Cancer Liabilities. Cell 163:1515-1526. [doi: 10.1016/j.cell.2015.11.015]
Each group identified between 1500 and 2000 protein-coding genes that are essential in their chosen cell lines.

One of the annoying things about all three papers is that they use the words "gene" and "protein-coding gene" as synonyms. The only genes they screened were protein-coding genes but the authors act as though that covers ALL genes. I hope they don't really believe that. I hope it's just sloppy thinking when they say that their 1800 essential "genes" represent 9.2% of all genes in the genome (Wang et al. 2015). What they meant is that they represent 9.2% of protein-coding genes.
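The sloppiness is easy to see in the arithmetic (a sketch using the numbers quoted above; the implied denominator is my own calculation, not a figure from the papers):

```python
# If 1,800 essential "genes" are 9.2% of the total, what denominator
# did Wang et al. (2015) have in mind?
essential = 1800
implied_total = essential / 0.092
print(f"Implied denominator: {implied_total:,.0f}")   # roughly 19,600

# That matches the ~20,000 protein-coding genes, not the ~25,000 genes
# (protein-coding plus RNA genes) mentioned at the top of this post.
protein_coding = 20_000
all_genes = 25_000
print(f"As % of protein-coding genes: {100 * essential / protein_coding:.1f}%")
print(f"As % of all genes:            {100 * essential / all_genes:.1f}%")
```

In other words, 9.2% only works if "genes" means protein-coding genes; against all ~25,000 genes the figure would be closer to 7%.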

By looking only at genes that are essential for cell survival, they are ignoring all those genes that are specifically required in other cell types. For example, they will not identify any of the genes for olfactory receptors or any of the genes for keratin or collagen. They won't detect any of the genes required for spermatogenesis or embryonic development.

What they should detect is all of the genes required in core metabolism.

The numbers seem too low to me, so I looked for some specific examples.

The HSP70 gene family encodes the major heat shock proteins of molecular weight 70,000. The proteins function as chaperones to help fold other proteins. They are among the most highly conserved genes in all of biology and they are essential. The three genes for the normal cellular proteins are HSPA5 (BiP, the ER version), HSPA8 (the cytoplasmic version), and HSPA9 (the mitochondrial version). All three are essential in the Blomen et al. paper. Only HSPA5 and HSPA9 are essential in Hart et al. (This is an error.) (I can't figure out how to identify essential genes in the Wang et al. paper.)

There are two inducible genes, HSPA1A and HSPA1B. These are the genes activated by heat shock and other forms of stress, and they churn out a lot of HSP70 chaperone in order to save the cells. These are not essential genes in the Blomen et al. paper and they weren't tested in the Hart et al. paper. This is an example of the kind of gene that will be missed in the screen because the cells were not stressed during the screening.

I really don't like these genomics papers because all they do is summarize the results in broad terms. I want to know about specific genes so I can see if the results conform to expectations.

I looked first at the genes encoding the enzymes of gluconeogenesis and glycolysis. The results are from the Blomen et al. paper. In the figure below, the gene names in RED are essential and the ones in blue are not.


As you can see, at least one of the genes for each of the six core enzymes is essential, but none of the other genes is. This is a surprise since I expect both pathways (gluconeogenesis and glycolysis) to be active and essential in those cells. Perhaps the cells can survive for a few days without making these enzymes. It also suggests they aren't relying on glucose uptake, because if they were, one of the hexokinase genes should be essential.

These results suggest that the Blomen et al. study is overlooking some important essential genes.

Now let's look at the citric acid cycle. All of the enzymes should be essential.


That's very strange. It's hard to imagine that cells in culture can survive without any of the genes for the subunits of the pyruvate dehydrogenase complex or the subunits of the succinyl-CoA synthetase complex. Or malate dehydrogenase, for that matter.

Something is wrong here. The study must be missing some important essential genes. I wish the authors had looked at some specific sets of genes and told us the results for well-known genes. That would allow us to evaluate the results. Perhaps this sort of thing isn't done when you are in "genomics" mode?

The "core fitness" protein-coding genes that were identified are more highly conserved than the other genes and they tend to be more highly expressed. They also show lower levels of variation within the human population. This is consistent with basic housekeeping features.

Each group identified several hundred unannotated genes in their core sample. These are genes with no known function (yet).

The results of the three studies do not overlap precisely but most of the essential genes were common to all three analyses.


Wednesday, November 25, 2015

Selfish genes and transposons

Back in 1980, the idea that large fractions of animal and plant genomes could be junk was quite controversial. Although the idea was consistent with the latest developments in population genetics, most scientists were unaware of these developments. They were looking for adaptive ways of explaining all the excess DNA in these genomes.

Some scientists were experts in modern evolutionary theory but still wanted to explain "junk DNA." Doolittle & Sapienza, and Orgel & Crick, published back-to-back papers in the April 17, 1980 issue of Nature. They explained junk DNA by proposing that most of it consists of "selfish" transposons that are selected and preserved because they promote their own replication and transmission to future generations, even though they have little or no effect on the fitness of the organism they inhabit. This is natural selection at a different level.

This prompted some responses in later editions of the journal and then responses to the responses.

Here's the complete series ...

Friday, November 20, 2015

The truth about ENCODE

A few months ago I highlighted a paper by Casane et al. (2015) where they said ...
In September 2012, a batch of more than 30 articles presenting the results of the ENCODE (Encyclopaedia of DNA Elements) project was released. Many of these articles appeared in Nature and Science, the two most prestigious interdisciplinary scientific journals. Since that time, hundreds of other articles dedicated to the further analyses of the Encode data have been published. The time of hundreds of scientists and hundreds of millions of dollars were not invested in vain since this project had led to an apparent paradigm shift: contrary to the classical view, 80% of the human genome is not junk DNA, but is functional. This hypothesis has been criticized by evolutionary biologists, sometimes eagerly, and detailed refutations have been published in specialized journals with impact factors far below those that published the main contribution of the Encode project to our understanding of genome architecture. In 2014, the Encode consortium released a new batch of articles that neither suggested that 80% of the genome is functional nor commented on the disappearance of their 2012 scientific breakthrough. Unfortunately, by that time many biologists had accepted the idea that 80% of the genome is functional, or at least, that this idea is a valid alternative to the long held evolutionary genetic view that it is not. In order to understand the dynamics of the genome, it is necessary to re-examine the basics of evolutionary genetics because, not only are they well established, they also will allow us to avoid the pitfall of a panglossian interpretation of Encode. Actually, the architecture of the genome and its dynamics are the product of trade-offs between various evolutionary forces, and many structural features are not related to functional properties. In other words, evolution does not produce the best of all worlds, not even the best of all possible worlds, but only one possible world.
How did we get to this stage where the most publicized result of papers published by leading scientists in the best journals turns out to be wrong, but hardly anyone knows it?

Back in September 2012, the ENCODE Consortium was preparing to publish dozens of papers on their analysis of the human genome. Most of the results were quite boring but that doesn't mean they were useless. The leaders of the Consortium must have been worried that science journalists would not give them the publicity they craved so they came up with a strategy and a publicity campaign to promote their work.

Their leader was Ewan Birney, a scientist with valuable skills as a herder of cats but little experience in evolutionary biology and the history of the junk DNA debate.

The ENCODE Consortium decided to add up all the transcription factor binding sites—spurious or not—and all the chromatin marks—whether or not they meant anything—and all the transcripts—even if they were junk. With a little judicious juggling of numbers they came up with the following summary of their results (Birney et al., 2012) ...
The human genome encodes the blueprint of life, but the function of the vast majority of its nearly three billion bases is unknown. The Encyclopedia of DNA Elements (ENCODE) project has systematically mapped regions of transcription, transcription factor association, chromatin structure and histone modification. These data enabled us to assign biochemical functions for 80% of the genome, in particular outside of the well-studied protein-coding regions. Many discovered candidate regulatory elements are physically associated with one another and with expressed genes, providing new insights into the mechanisms of gene regulation. The newly identified elements also show a statistical correspondence to sequence variants linked to human disease, and can thereby guide interpretation of this variation. Overall, the project provides new insights into the organization and regulation of our genes and genome, and is an expansive resource of functional annotations for biomedical research.
See What did the ENCODE Consortium say in 2012? for more details on what the ENCODE Consortium leaders said, and did, when their papers came out.

The bottom line is that these leaders knew exactly what they were doing and why. By saying they have assigned biochemical functions for 80% of the genome they knew that this would be the headline. They knew that journalists and publicists would interpret this to mean the end of junk DNA. Most of ENCODE leaders actually believed it.

That's exactly what happened ... aided and abetted by the ENCODE Consortium, the journals Nature and Science, and gullible science journalists all over the world. (Ryan Gregory has published a list of articles that appeared in the popular press: The ENCODE media hype machine.)

Almost immediately, knowledgeable scientists and science writers tried to expose the publicity campaign hype. The first criticisms appeared on various science blogs and this was followed by a series of papers in the published scientific literature. Ed Yong, an experienced science journalist, interviewed Ewan Birney and blogged about ENCODE on the first day. Yong reported the standard publicity hype that most of our genome is functional, an interpretation confirmed by Ewan Birney and other senior scientists. Two days later, Ed Yong started adding updates to his blog post after reading the blogs of many scientists, including some who were well-recognized experts on genomes and evolution [ENCODE: the rough guide to the human genome].

Within a few days of publishing their results the ENCODE Consortium was coming under intense criticism from all sides. A few journalists, like John Timmer, recognized right away what the problem was ...
Yet the third sentence of the lead ENCODE paper contains an eye-catching figure that ended up being reported widely: "These data enabled us to assign biochemical functions for 80 percent of the genome." Unfortunately, the significance of that statement hinged on a much less widely reported item: the definition of "biochemical function" used by the authors.

This was more than a matter of semantics. Many press reports that resulted painted an entirely fictitious history of biology's past, along with a misleading picture of its present. As a result, the public that relied on those press reports now has a completely mistaken view of our current state of knowledge (this happens to be the exact opposite of what journalism is intended to accomplish). But you can't entirely blame the press in this case. They were egged on by the journals and university press offices that promoted the work—and, in some cases, the scientists themselves.


[Most of what you read was wrong: how press releases rewrote scientific history]
Nature may have begun to realize that it made a mistake in promoting the idea that most of our genome was functional. Two days after the papers appeared, Brendan Maher, a Feature Editor for Nature, tried to get the journal off the hook but only succeeded in making matters worse [see Brendan Maher Writes About the ENCODE/Junk DNA Publicity Fiasco].

Meanwhile, two private for-profit companies, Illumina and Nature, teamed up to promote the ENCODE results. They even hired Tim Minchin to narrate a video. This is what hype looks like ...


Soon articles began to appear in the scientific literature challenging the ENCODE Consortium's interpretation of function and explaining the difference between an effect—such as the binding of a transcription factor to a random piece of DNA—and a true biological function.

Eddy, S.R. (2012) The C-value paradox, junk DNA and ENCODE. Current Biology, 22:R898. [doi: 10.1016/j.cub.2012.10.002]

Niu, D.K., and Jiang, L. (2012) Can ENCODE tell us how much junk DNA we carry in our genome? Biochemical and Biophysical Research Communications 430:1340-1343. [doi: 10.1016/j.bbrc.2012.12.074]

Doolittle, W.F. (2013) Is junk DNA bunk? A critique of ENCODE. Proc. Natl. Acad. Sci. (USA) published online March 11, 2013. [PubMed] [doi: 10.1073/pnas.1221376110]

Graur, D., Zheng, Y., Price, N., Azevedo, R.B., Zufall, R.A., and Elhaik, E. (2013) On the immortality of television sets: "function" in the human genome according to the evolution-free gospel of ENCODE. Genome Biology and Evolution published online: February 20, 2013 [doi: 10.1093/gbe/evt028]

Eddy, S.R. (2013) The ENCODE project: missteps overshadowing a success. Current Biology, 23:R259-R261. [doi: 10.1016/j.cub.2013.03.023]

Hurst, L.D. (2013) Open questions: A logic (or lack thereof) of genome organization. BMC biology, 11:58. [doi:10.1186/1741-7007-11-58]

Morange, M. (2014) Genome as a Multipurpose Structure Built by Evolution. Perspectives in biology and medicine, 57:162-171. [doi: 10.1353/pbm.2014.000]

Palazzo, A.F., and Gregory, T.R. (2014) The Case for Junk DNA. PLoS Genetics, 10:e1004351. [doi: 10.1371/journal.pgen.1004351]

By March 2013—six months after publication of the ENCODE papers—some editors at Nature decided that they had better say something else [see Anonymous Nature Editors Respond to ENCODE Criticism]. Here's the closest thing to an apology that they have ever written ....
The debate over ENCODE’s definition of function retreads some old battles, dating back perhaps to geneticist Susumu Ohno’s coinage of the term junk DNA in the 1970s. The phrase has had a polarizing effect on the life-sciences community ever since, despite several revisions of its meaning. Indeed, many news reports and press releases describing ENCODE’s work claimed that by showing that most of the genome was ‘functional’, the project had killed the concept of junk DNA. This claim annoyed both those who thought it a premature obituary and those who considered it old news.

There is a valuable and genuine debate here. To define what, if anything, the billions of non-protein-coding base pairs in the human genome do, and how they affect cellular and system-level processes, remains an important, open and debatable question. Ironically, it is a question that the language of the current debate may detract from. As Ewan Birney, co-director of the ENCODE project, noted on his blog: “Hindsight is a cruel and wonderful thing, and probably we could have achieved the same thing without generating this unneeded, confusing discussion on what we meant and how we said it”.
Oops! The importance of junk DNA is still an "important, open and debatable question" in spite of what the video sponsored by Nature might imply.

(To this day, neither Nature nor Science has actually apologized for misleading the public about the ENCODE results. [see Science still doesn't get it])

The ENCODE Consortium leaders responded in April 2014—eighteen months after their original papers were published.

Kellis, M., Wold, B., Snyder, M.P., Bernstein, B.E., Kundaje, A., Marinov, G.K., Ward, L.D., Birney, E., Crawford, G. E., and Dekker, J. (2014) Defining functional DNA elements in the human genome. Proc. Natl. Acad. Sci. (USA) 111:6131-6138. [doi: 10.1073/pnas.1318948111]

In that paper they acknowledge that there are multiple meanings of the word function and their choice of "biochemical" function may not have been the best choice ....
However, biochemical signatures are often a consequence of function, rather than causal. They are also not always deterministic evidence of function, but can occur stochastically.
This is exactly what many scientists have been telling them. Apparently they did not know this in September 2012.

They also include in their paper a section on "Case for Abundant Junk DNA." It summarizes the evidence for junk DNA, evidence that the ENCODE Consortium did not acknowledge in 2012 and certainly didn't refute.

In answer to the question, "What Fraction of the Human Genome Is Functional?" they now concede that ENCODE hasn't answered that question and that more work is needed. They now claim that the real value of ENCODE is to provide "high-resolution, highly-reproducible maps of DNA segments with biochemical signatures associated with diverse molecular functions."
We believe that this public resource is far more important than any interim estimate of the fraction of the human genome that is functional.
There you have it, straight from the horse's mouth. The ENCODE Consortium now believes that you should NOT interpret their results to mean that 80% of the genome is functional and therefore not junk DNA. There is good evidence for abundant junk DNA and the issue is still debatable.

I hope everyone pays attention and stops referring to the promotional hype saying that ENCODE has refuted junk DNA. That's not what the ENCODE Consortium leaders now say about their results.


Casane, D., Fumey, J., and Laurenti, P. (2015) L'apophénie d'ENCODE ou Pangloss examine le génome humain. Med. Sci. (Paris) 31:680-686. [doi: 10.1051/medsci/20153106023]

Different kinds of pseudogenes: Polymorphic pseudogenes

There are three main kinds of pseudogenes: processed pseudogenes, duplicated pseudogenes, and unitary pseudogenes [Different kinds of pseudogenes - are they really pseudogenes?].

There's one sub-category of pseudogenes that deserves mention: "polymorphic pseudogenes." These are pseudogenes that have not become fixed in the population, so they segregate as an allele alongside the functional gene at the same locus. Some defective alleles are detrimental; these are loss-of-function alleles that compromise the survival of the organism, and lots of genes for genetic diseases fall into this category. That's not what we mean by polymorphism. The term usually applies to alleles that have reached a substantial frequency in the population, so there's good reason to believe that all the alleles are roughly equivalent with respect to natural selection.

Polymorphic pseudogenes can be caught in the act of replacing the functional gene, which indicates that the functional gene is not under strong selection. For example, a newly formed processed pseudogene can be polymorphic at the insertion site, and newly duplicated loci may have some alleles that are still functional and others that are inactive. The fixation of a pseudogene takes a long time.
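Why fixation takes so long can be illustrated with a toy Wright-Fisher simulation of a neutral allele (a minimal sketch; the function and its parameters are my own, not from any cited paper). Under neutrality, an allele fixes with probability equal to its starting frequency, so a brand-new pseudogene allele at frequency 1/2N usually drifts back out of the population, and when it does fix it takes on the order of 4N generations.

```python
import random

def wright_fisher(pop_size, init_freq, rng):
    """Drift of a neutral allele (e.g. a segregating pseudogene) in a
    diploid Wright-Fisher population. Returns (fixed, generations)."""
    n = 2 * pop_size                  # 2N allele copies in a diploid population
    count = round(init_freq * n)
    generations = 0
    while 0 < count < n:              # loop until fixation or loss
        p = count / n
        # Next generation: binomial sampling of 2N copies at frequency p.
        count = sum(rng.random() < p for _ in range(n))
        generations += 1
    return count == n, generations

# Example: a polymorphic pseudogene at 50% frequency in a small population.
fixed, gens = wright_fisher(pop_size=100, init_freq=0.5, rng=random.Random(42))
```

Run many times, an allele starting at 50% fixes about half the time; started at 1/2N, it fixes only about 1 time in 2N, which is why most new pseudogene alleles never replace the functional gene.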

Different kinds of pseudogenes: Unitary pseudogenes

The most common types of pseudogenes are processed pseudogenes and those derived from gene duplication events [duplicated pseudogenes].

The third type of pseudogene is the "unitary" pseudogene. Unitary pseudogenes are genes that have no parent gene. There is no functional gene in the genome that's related to the pseudogene.

Unitary pseudogenes arise when a normally functional gene becomes inactivated by mutation and the loss of function is not detrimental to the organism. Thus, the mutated, inactive gene can become fixed in the population by random genetic drift.

The classic example is the gene for L-gulono-γ-lactone oxidase (GULO), a key enzyme in the synthesis of vitamin C (L-ascorbate, ascorbic acid). This gene is functional in most vertebrate species because vitamin C is required as a cofactor in several metabolic reactions; notably, the processing of collagen [Vitamin C]. The gene has become inactive in primates, so primates cannot synthesize vitamin C and must obtain it from the food they eat.

A pseudogene can be found at the locus for the L-gulono-γ-lactone oxidase gene [GULOP = GULO Pseudogene]. It is a highly degenerate pseudogene with multiple mutations and deletions [Human GULOP Pseudogene].


This is a unitary pseudogene. Unitary pseudogenes are rare compared to processed pseudogenes and duplicated pseudogenes but they are distinct because they are not derived from an existing, functional, parent gene.

Note: Intelligent design creationists will go to great lengths to discredit junk DNA. They will even attempt to prove that the GULO pseudogene is actually functional. Jonathan Wells devoted an entire chapter of The Myth of Junk DNA to challenging the idea that the GULO pseudogene is actually a pseudogene. A few years ago, Jonathan McLatchie proposed a mechanism for creating a functional enzyme from the bits and pieces of the human GULOP pseudogene, but that proved embarrassing and he retracted it [How IDiots Would Activate the GULO Pseudogene]. Although some scientists are skeptical about the functionality of some pseudogenes, they all accept the evidence showing that most pseudogenes are nonfunctional.


Different kinds of pseudogenes: Duplicated pseudogenes

Of the three different kinds of pseudogenes, the easiest kind of pseudogene formation to understand is simple gene duplication followed by inactivation of one copy. [see: Processed pseudogenes for another type]

I've assumed, in the example shown below, that the gene duplication event happens by unequal recombination between misaligned homologous chromosomes when they pair during meiosis. That's not the only possibility, but it's easy to understand.

These sorts of gene duplication events appear to be quite common judging from the frequency of copy number variations in complex genomes (Redon et al., 2006; MacDonald et al., 2013).


Wednesday, November 18, 2015

Different kinds of pseudogenes: Processed pseudogenes

Let's look at the formation of a "processed" pseudogene. They are called "processed" because they are derived from the mature RNA produced by the functional gene. These mature RNAs have been post-transcriptionally processed so the pseudogene resembles the RNA more closely than it resembles the parent gene.

This is most obvious in the case of processed pseudogenes derived from eukaryotic protein-coding genes so that's the example I'll describe first.

In the example below, I start with a simple, hypothetical, protein-coding gene consisting of two exons and a single intron. The gene is transcribed from a promoter (P) to produce the primary transcript containing the intron. This primary transcript is processed by splicing to remove the intron sequence and join up the exons into a single contiguous open reading frame that can be translated by the protein synthesis machinery (ribosomes plus factors etc.).1 [See RNA Splicing: Introns and Exons.]
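The steps above can be sketched with toy sequences (the sequences here are invented for illustration, not taken from any real gene):

```python
# Hypothetical two-exon gene, as in the example above.
exon1  = "ATGGCCAAA"
intron = "GTAAGTCCCTTTTTCAG"   # begins GT, ends AG, like a typical intron
exon2  = "GGGTGGTAA"

primary_transcript = exon1 + intron + exon2   # transcription copies everything
mature_mrna = exon1 + exon2                   # splicing removes the intron

# A processed pseudogene is a reverse-transcribed copy of the mature mRNA
# inserted back into the genome, so it contains no introns and usually
# carries a remnant of the poly(A) tail. It also lacks the parent gene's
# promoter, which is one reason processed pseudogenes are rarely expressed.
processed_pseudogene = mature_mrna + "A" * 12

assert intron in primary_transcript
assert intron not in processed_pseudogene
```

The absence of introns and the poly(A) remnant are exactly the signatures that let genome annotators recognize a processed pseudogene and identify its parent gene.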

Friday, November 13, 2015

Cornelius Hunter predicts that there's going to be function found for the vast majority of the genome according to the intelligent design paradigm

Listen to this Podcast where Casey Luskin interviews Dr. Cornelius Hunter.1 It's only 9 minutes long.

Dr. Cornelius Hunter on ENCODE and "Junk" DNA, Part 2

Here's part of the transcript.
Casey Luskin: ... and, as we all know, or many ID the Future listeners probably know, for years Darwinian theorists and evolutionary scientists have said that our genomes ought to be full of junk if evolution is true

.....

Casey Luskin: So, Dr. Hunter, you think, just for the record, that in the long term there is going to be function found for probably the vast majority of the genome and so, maybe, you might call it a prediction you would make coming out of an intelligent design paradigm. Is that correct?

Cornelius Hunter: Yes, that's correct Casey, I'll definitely go on the record on that. Not to say I have a hundred percent confidence and also I wanna be clear that from a metaphysical perspective, from my personal belief, I don't have a problem wherever it lands. It doesn't matter to me whether it's 10% or 90% or any where in between or 100% ... just from the scientific perspective and just from the history of science and the history of what we've found in biology, it really does look like it's gonna be closer to 100 than zero.

Casey Luskin: Okay, great, I always like it when people put clear and concrete predictions out there, and I think that's very helpful.

I predict that about 90% of our genome will turn out to be junk DNA—DNA with no function. I base my prediction on the scientific perspective and the history of what we've found in biology. That's interesting because Cornelius Hunter and I are apparently reaching opposite conclusions based on the same data.

I also love it when people make predictions. Will the intelligent design paradigm be falsified if I turn out to be right?

Does it sound to you that Cornelius Hunter personally doesn't care if the prediction coming out of the intelligent design paradigm is correct or not?


1. How come in the podcast they never refer to Dan Graur as Dr. Graur? Isn't that strange?

Monday, November 09, 2015

How many proteins do humans make?

There are several different kinds of genes. Some of them encode proteins, some of them specify abundant RNAs like tRNAs and ribosomal RNAs, some of them are responsible for making a variety of small catalytic RNAs, and some unknown fraction may specify regulatory RNAs (e.g. lncRNAs).

This jumble of different kinds of genes makes it difficult to estimate the total number of genes in the human genome. The current estimates are about 20,000 protein-coding genes and about 5,000 genes for functional RNAs.

Aside from the obvious highly conserved genes for ubiquitous RNAs (rRNA, tRNAs etc.), protein-coding genes are the easiest to recognize from looking at a genome sequence. If the protein is expressed in many different species then the exon sequences will be conserved and it's easy for a computer program to identify the gene. The tough part comes when the algorithm predicts a new protein-coding gene based on an open reading frame spanning several presumed exons. Is it a real gene?
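The computational step described above can be sketched as a naive open-reading-frame scan (my own simplified, single-strand sketch; real annotation pipelines also use cross-species conservation, splice signals, codon bias, and transcript evidence before calling something a gene):

```python
def find_orfs(dna, min_codons=50):
    """Naive ORF scan on the forward strand: ATG ... stop, in each of the
    three reading frames. Returns (start, end) coordinates, end exclusive.
    A long ORF found this way is only a PUTATIVE gene."""
    stops = {"TAA", "TAG", "TGA"}
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if codon == "ATG" and start is None:
                start = i                      # first start codon in this frame
            elif codon in stops and start is not None:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None                   # reset after a stop codon
    return orfs
```

For example, `find_orfs("ATG" + "GCT" * 60 + "TAA")` reports one ORF, while a short `ATG...TAA` fragment falls below the length threshold and is ignored; choosing that threshold is itself a judgment call that real gene finders must make.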

Monday, November 02, 2015

The birth and death of salmon genes

Modern Salmonidae (salmon and its relatives) have genomes that show clear evidence of an ancient duplication event. Berthelot et al. (2014) sequenced the rainbow trout genome and constructed a phylogenetic tree of all teleost fish. The genome duplication event in the Salmonidae lineage can be dated to approximately 96 million years ago (96 ± 5.5 Mya).

This event provides an opportunity to track the fate of the duplicated protein-coding genes. How many of the original duplicates are left and what happened to them?

They were able to get reliable data on 9,040 of the original duplicated genes in the ancestral genome. (That's about one third of the estimated 31,000 genes in the genome of the original species.) Of those 9,040 genes, 4,728 (52%) are now single-copy genes because one of the duplicates has been lost. Many of the lost copies are still detectable as pseudogenes at the expected position in the genome.

By combining these results with studies of more ancient genome duplications in the vertebrate lineage, it looks like the average rate of gene loss is about 170 genes per million years (Berthelot et al., 2014). It's likely that in the majority of cases one of the duplicates will eventually be inactivated by mutation and that allele will become fixed in the genome by random genetic drift. (Some early inactivation events may be selected.)

4,312 (48%) of the original duplications have been retained in the trout genome as a small family consisting of two paralogues. In some cases the two paralogues have diverged in sequence, and in some cases they are expressed in different tissues or at different stages of development. This suggests that the two copies have evolved different functions.

However, most of the duplicated genes seem to be performing similar functions and it's likely that there is no selective pressure to retain two copies. There just hasn't been enough time to inactivate one copy.
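The bookkeeping in the paragraphs above is easy to check (a back-of-the-envelope sketch; the variable names are mine, the counts come from Berthelot et al., 2014):

```python
# Duplicated gene pairs with reliable data in the rainbow trout genome.
ancestral_pairs = 9040
singletons      = 4728   # one copy lost, often still visible as a pseudogene
retained_pairs  = 4312   # both paralogues still present

assert singletons + retained_pairs == ancestral_pairs

lost_fraction     = singletons / ancestral_pairs       # ~0.52
retained_fraction = retained_pairs / ancestral_pairs   # ~0.48

# Within the salmonid lineage alone (~96 My since the duplication):
genes_lost_per_my = singletons / 96                    # ~49 genes per My
# Note: the ~170 genes per My figure quoted above comes from combining
# these data with older vertebrate genome duplications, not from this
# ratio alone.
```

The within-salmonid rate of roughly 50 genes lost per million years gives a feel for how slowly, and how steadily, duplicated genes decay into pseudogenes.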

The trout genome contains 241 ancient microRNA genes and 233 of them still have two copies, one from each of the duplicated genomes. The authors suggest that this is significant and indicates that multiple copies of these microRNA genes are needed. I'm not sure that's true, since these genes are much smaller than the average protein-coding gene; they present a smaller target for inactivating mutations, so it will take longer for one copy to be inactivated.

In any case, the big picture provides us with lots of data on the birth of new genes by duplication and death of genes by pseudogene formation.


Berthelot, C., Brunet, F., Chalopin, D., Juanchich, A., Bernard, M., Noël, B., Bento, P., Da Silva, C., Labadie, K., and Alberti, A. (2014) The rainbow trout genome provides novel insights into evolution after whole-genome duplication in vertebrates. Nature communications, 5:3657 April 22, 2014 [doi:10.1038/ncomms4657]

Sunday, November 01, 2015

More stupid hype about lncRNAs

I've just posted an article about a group of scientists at UCLA who claimed to have discovered 3,000 new genes in the human genome [3,000 new genes discovered in the human genome - dark matter revealed].

They did no such thing. What they discovered was about 3,000 previously unidentified transcripts expressed at very low levels in human B cells and T cells. They declared that these low-level transcripts are lncRNAs and they assumed that the complementary DNA sequences were genes. Their actual result identifies 3,000 bits of the genome that may or may not turn out to be genes. They are PUTATIVE genes.

None of that deterred Karen Ring who blogs at The Stem Cellar: The Official Blog of CIRM, California's Stem Cell Agency. Her post on this subject [UCLA Scientists Find 3000 New Genes in “Junk DNA” of Immune Stem Cells] begins with ...

3,000 new genes discovered in the human genome - dark matter revealed

Let's look at a recent paper published by a large group of medical researchers at the University of California, Los Angeles (USA). The paper was published online a few days ago (Oct. 26, 2015) in Nature Immunology.

The authors claim to have discovered 3,000 previously unknown genes in the human genome.

The complete reference is ...

Wednesday, October 21, 2015

The quality of the modern scientific literature leaves much to be desired

Lately I've been reading a lot of papers on genomes and I've discovered some really exceptional papers that discuss the existing scientific literature and put their studies in proper context. Unfortunately, these are the exceptions, not the rule.

I've discovered many more authors who seem to be ignorant of the scientific literature and far too willing to rely on the opinions of others instead of investigating for themselves. Many of these authors seem to be completely unaware of controversy and debate in the fields they are writing about. They act, and write, as if there were only one point of view worth considering: theirs.

How does this happen? It seems to me that it can only happen if they find themselves in an environment where skepticism and critical thinking are suppressed. Otherwise, how do you explain the way they write their papers? Are there no colleagues, post-docs, or graduate students who looked at the manuscript and pointed out the problems? Are there no referees who raised questions?

Friday, October 16, 2015

Human mutation rates

I was excited when I saw the cover of the Sept. 25th (2015) issue of Science because I'm very interested in human mutation rates. I figured there would have to be an article that discussed current views on the number of new mutations per generation even though I was certain that the focus would be on the medical relevance of mutations. I was right. There was one article that discussed germline mutations and the overall mutation rate.

The article by Shendure and Akey (2015) is the only one that addresses human mutation rates in any meaningful way. They begin their review with ...
Despite the exquisite molecular mechanisms that have evolved to replicate and repair DNA with high fidelity, mutations happen. Each human is estimated to carry on average ~60 de novo point mutations (with considerable variability among individuals) that arose in the germline of their parents (1–4). Consequently, across all seven billion humans, about 10¹¹ germline mutations—well in excess of the number of nucleotides in the human genome—occurred in just the last generation (5). Furthermore, the number of somatic mutations that arise during development and throughout the lifetime of each individual human is potentially staggering, with proliferative tissues such as the intestinal epithelium expected to harbor a mutation at nearly every genomic site in at least one cell by the time an individual reaches the age of 60 (6).
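The arithmetic in that passage is easy to reproduce (round numbers, my own back-of-the-envelope sketch):

```python
de_novo_per_person = 60       # ~60 new point mutations per person per generation
population         = 7.0e9    # approximate human population
genome_size        = 3.2e9    # haploid human genome length, base pairs

total_new_mutations = de_novo_per_person * population     # 4.2e11

# "well in excess of the number of nucleotides in the human genome":
assert total_new_mutations > genome_size

# On average, each position in the genome was hit by a new mutation in
# over a hundred people somewhere in the world in just the last generation.
hits_per_site = total_new_mutations / genome_size         # ~131
```

That's the sense in which every possible point mutation compatible with life is probably present in someone alive today.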