Sandwalk: Non-coding half of human genome unlocked

This is another story about press releases. In this case, it's an article published by ScienceDaily: Non-coding half of human genome unlocked with novel sequencing technique. It's almost a direct copy of a press release put out by Texas A&M University (Texas, USA): Texas A&M Biologists Unlock Non-Coding Half of Human Genome with Novel DNA Sequencing Technique.

Let's begin by looking at the actual paper (Aldrich and Maggert, 2014). Here's the abstract.

Heterochromatin is a significant component of the human genome and the genomes of most model organisms. Although heterochromatin is thought to be largely non-coding, it is clear that it plays an important role in chromosome structure and gene regulation. Despite a growing awareness of its functional significance, the repetitive sequences underlying some heterochromatin remain relatively uncharacterized. We have developed a real-time quantitative PCR-based method for quantifying simple repetitive satellite sequences and have used this technique to characterize the heterochromatic Y chromosome of Drosophila melanogaster. In this report, we validate the approach, identify previously unknown satellite sequence copy number polymorphisms in Y chromosomes from different geographic sources, and show that a defect in heterochromatin formation can induce similar copy number polymorphisms in a laboratory strain. These findings provide a simple method to investigate the dynamic nature of repetitive sequences and characterize conditions which might give rise to long-lasting alterations in DNA sequence.

The authors looked at highly repetitive DNA on the Drosophila melanogaster Y chromosome. Most of this DNA consists of multiple tandem copies of simple sequences like AAGAG, AATAT, or AAGAGAG. The number of copies in each block ranges from dozens to hundreds or even thousands. A lot of this DNA isn't included in published genomes because it's very difficult to determine the exact number of copies.

Aldrich and Maggert developed a PCR technique that allows them to estimate the number of copies of each repetitive sequence in a block. They discovered that different strains of Drosophila melanogaster have different numbers of copies of repetitive sequences. This is not news—it's exactly what you expect. But it's nice to have better data.
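For readers who haven't worked with quantitative PCR, here's a minimal sketch of how a relative copy number estimate can be pulled out of qPCR data. It uses the generic delta-delta-Ct calculation, which is not necessarily the exact calculation in the paper. The function name, the choice of a single-copy reference gene, and all of the Ct values are hypothetical, and the sketch assumes the amount of product doubles every cycle.

```python
def relative_copy_number(ct_repeat_sample, ct_ref_sample,
                         ct_repeat_control, ct_ref_control,
                         efficiency=2.0):
    """Estimate repeat copy number in a sample relative to a control strain.

    Ct is the cycle at which fluorescence crosses a threshold; a lower Ct
    means more starting template. The reference is assumed to be a
    single-copy gene used to normalize for the amount of input DNA.
    efficiency=2.0 assumes perfect doubling per cycle (an idealization).
    """
    delta_sample = ct_repeat_sample - ct_ref_sample
    delta_control = ct_repeat_control - ct_ref_control
    delta_delta_ct = delta_sample - delta_control
    return efficiency ** (-delta_delta_ct)


# Hypothetical Ct values: the repeat crosses the threshold one cycle
# earlier in the sample strain than in the control strain.
fold = relative_copy_number(20.0, 25.0, 21.0, 25.0)
print(f"Relative copy number: {fold:.1f}x")
```

Note that this gives a ratio between strains, not a position in the genome, so it measures how much of a repeat there is and nothing about where it sits.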

Now let's look at the press release on ScienceDaily.

Non-coding half of human genome unlocked with novel sequencing technique

An obscure swatch of human DNA once thought to be nothing more than biological trash may actually offer a treasure trove of insight into complex genetic-related diseases such as cancer and diabetes, thanks to a novel sequencing technique developed by biologists at Texas A&M University.

The game-changing discovery was part of a study led by Texas A&M biology doctoral candidate John C. Aldrich and Dr. Keith A. Maggert, an associate professor in the Department of Biology, to measure variation in heterochromatin. This mysterious, tightly packed section of the vast, non-coding section of the human genome, widely dismissed by geneticists as "junk," previously was thought by scientists to have no discernable function at all.

The first thing you notice is that the press release talks about the human genome but the study focused on the small Y chromosome of Drosophila. In fairness, the abstract of the paper also mentions the human genome.

The second thing you might notice is that the repetitive sequences are dismissed as "junk" even though we've known for decades that many of them are part of the centromere and therefore required for chromosome segregation. That's not junk.

The third thing you notice is that there's no connection between what's published in the paper (on the Drosophila Y chromosome) and human diseases such as cancer and diabetes.

In the course of his otherwise routine analysis of DNA in fruit flies, Aldrich was able to monitor dynamics of the heterochromatic sequence by modifying a technique called quantitative polymerase chain reaction (QPCR), a process used to amplify specific DNA sequences from a relatively small amount of starting material. He then added a fluorescent dye, allowing him to monitor the fruit-fly DNA changes and to observe any variations.

Aldrich's findings, published today in the online edition of the journal PLOS ONE, showed that differences in the heterochromatin exist, confirming that the junk DNA is not stagnant as researchers originally had believed and that mutations which could affect other parts of the genome are capable of occurring.

"We know that there is hidden variation there, like disease proclivities or things that are evolutionarily important, but we never knew how to study it," Maggert said. "We couldn't even do the simplest things because we didn't know if there was a little DNA or a lot of it.

"This work opens up the other non-coding half of the genome."

It's true that the authors developed a clever technique to measure the number of repeats. But even though the block of sequence has a function, the exact number of copies is not fixed. That much has been known for a long time. If some of it is junk then it follows that during DNA replication you will invariably get deletions and duplications. Different populations will have variable numbers of repeats, although there's probably a minimum number that's required to maintain fitness.

This is what you expect and this is what they found. The press release makes it seem as though junk DNA was supposed to be "stagnant."

Maggert is quoted as saying, "This work opens up the other non-coding half of the genome." Highly repetitive DNA makes up about 1% of the human genome [What's in Your Genome]. By my rough calculation, that's 50 times less than half of the genome. I don't know what he means by "opens up."
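Here's the rough calculation spelled out as a small sketch. The 1% and "half" figures come from the text above. The haploid genome size of about 3.2 billion base pairs is my own round-number assumption, used only to give a sense of scale.

```python
# Figures from the post: highly repetitive DNA is about 1% of the genome,
# while the press release talks about "half" (50%).
highly_repetitive_percent = 1
claimed_percent = 50

fold_difference = claimed_percent / highly_repetitive_percent

# Assumed round number for the haploid human genome size (illustrative only).
genome_size_bp = 3_200_000_000
repetitive_bp = genome_size_bp * highly_repetitive_percent // 100
claimed_bp = genome_size_bp * claimed_percent // 100

print(f"'Half the genome' overstates the repetitive fraction {fold_difference:.0f}-fold")
print(f"About {repetitive_bp:,} bp of highly repetitive DNA vs {claimed_bp:,} bp claimed")
```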

Maggert explains that chromosomes are located in the nuclei of all human cells, and the DNA material in these chromosomes is made up of coding and non-coding regions. The coding regions, known as genes, contain the information necessary for a cell to make proteins, but far less is known about the non-coding regions, beyond the fact that they are not directly related to making proteins.

"Believe it or not, people still get into arguments over the definition of a gene," Maggert said. "In my opinion, there are about 30,000 protein-coding genes. The rest of the DNA -- greater than 90 percent -- either controls those genes and therefore is technically part of them, or is within this mush that we study and, thanks to John, can now measure. The heterochromatin that we study definitely has effects, but it's not possible to think of it as discrete genes. So, we prefer to think of it as 30,000 protein-coding genes plus this one big, complex one that can orchestrate the other 30,000."

I leave it to readers to draw their own conclusions.

In my opinion there are about 19,000 protein-encoding genes [How many genes do we have and what happened to the orphans?]. There may be as many as 10,000 other genes but Maggert seems to DEFINE genes as protein-encoding so those don't count. I agree that there's no perfect definition of a gene but most biologists accept the existence of genes that specify functional RNAs and most biologists don't include all regulatory sequences in their definition of a gene [see What Is a Gene?].

In light of the ENCODE publicity hype disaster and the subsequent debate, I'm surprised that a seemingly knowledgeable scientist would claim that 98% of the genome is required to "orchestrate" the known genes. We know that such a claim is almost certainly false.

Although other methods of measuring DNA are technically available, Aldrich notes that, as of yet, none has proven to be as cost-effective nor time-efficient as his modified-QPCR-fluorescence technique.

"There's some sequencing technology that can also be used to do this, but it costs tens of thousands of dollars," Aldrich said. "This enables us to answer a very specific question right here in the lab."

The uncharted genome sequences have been a point of contention in scientific circles for more than a decade, according to Maggert, a Texas A&M faculty member since 2004. It had long been believed that the human genome -- the blueprint for humanity, individually and as a whole -- would be packed with complex genes with the potential to answer some of the most pressing questions in medical biology.

Who were these scientists that believed the human genome was "packed with complex genes"? And what does this have to do with the Drosophila Y chromosome?

When human DNA was finally sequenced with the completion of the Human Genome Project in 2003, he says that perception changed. Based on those initial reports, researchers determined that only two percent of the genome (about 21,000 genes) represented coding DNA. Since then, numerous other studies have emerged debating the functionality, or lack thereof, of non-coding, so-called "junk DNA."

It's 2014. The debate is practically over. The three most important conclusions are:

1. Anyone who continues to refer to "non-coding" DNA as equivalent to "junk" DNA doesn't deserve to be taken seriously.
2. Anyone who still thinks that the small number of genes was a surprise to the experts hasn't been keeping up. [False History and the Number of Genes] [Facts and Myths Concerning the Historical Estimates of the Number of Genes in the Human Genome]
3. Most of our genome is junk.

Now, thanks to Aldrich's and Maggert's investigation of heterochromatin, the groundwork has been laid to study the rest of the genome. Once all of it is understood, scientists may finally find the root causes and possibly treatments for many genetic ailments.

"There is so much talk about understanding the connection between genetics and disease and finding personalized therapies," Maggert said. "However, this topic is incomplete unless biologists can look at the entire genome. We still can't -- yet -- but at least now, we're a step closer."

I think it's time to stop issuing silly press releases that have nothing to do with the paper that's published. Usually we can blame the press office of the university but in this case the lead author is complicit. He should be ashamed of this performance.

John C. Aldrich, Keith A. Maggert. Simple Quantitative PCR Approach to Reveal Naturally Occurring and Mutation-Induced Repetitive Sequence Variation on the Drosophila Y Chromosome. PLoS ONE, 2014; 9 (10): e109906. doi: 10.1371/journal.pone.0109906

15 comments:

Ian Bosdet, Friday, October 10, 2014 6:26:00 PM
Since one cannot infer any positional information from the qPCR data, I fail to see how this is a new "sequencing" method. But maybe I'm just being overly picky about semantics.

Also, "blueprint" is a facile description for the genome but it's not very accurate, as I think Martin describes nicely here: http://blogs.scientificamerican.com/sa-visual/2014/08/19/a-monkeys-blueprint/

Anonymous, Friday, October 10, 2014 10:06:00 PM
I know very little about junk DNA... I admit it... but I'm willing to bet few bucks that gradually the so called junk DNA will be shrinking to the point that some people will be shamed to a point that they will have to retire....

Marcoli, Saturday, October 11, 2014 9:33:00 AM
I am continually conflicted about Science Daily. I use it as my browser home page b/c it provides a quick way to scan the latest science news. But I sure wish their people were less gullible about posting hype like this.

Diogenes, Saturday, October 11, 2014 10:53:00 AM
Should scientists boycott universities whose press offices issue press releases full of bullshit?

I'm serious. There must be some kind of accountability to stop press release sociopaths.

Rolf Aalberg, Monday, October 13, 2014 5:36:00 PM
Quest, always eager to jump on any streetcar to *ID'esire"".

Friday, October 10, 2014
