Saturday, December 16, 2023

What is the "dark matter of the genome"?

The phrase "dark matter of the genome" is used by scientists who are skeptical of junk DNA and want to convey the impression that most of the genome consists of important DNA whose function is just waiting to be discovered. Not surprisingly, the term is often used by researchers who are looking for funding and investors to support their efforts to use the latest technology to discover this mysterious function that has eluded other scientists for over 50 years.

The term "dark matter" is often applied to the human genome, but what does it mean? We get a clue from a BBC article published by David Cox last April, "The mystery of the human genome's dark matter." He begins the article by saying,

Twenty years ago, an enormous scientific effort revealed that the human genome contains 20,000 protein-coding genes, but they account for just 2% of our DNA. The rest was written off as junk – but we are now realising it has a crucial role to play.

That's a clue to where he's headed. He repeats the myth that all non-coding DNA was once declared to be junk DNA by the leading experts in molecular biology.1 Cox then explains what he means by "dark matter."

The remaining 98% of our DNA became known as dark matter, or the dark genome, a mysterious melee of letters with no obvious meaning or purpose. Initially some geneticists suggested that the dark genome was simply junk DNA or the rubbish bin of human evolution – the remnants of broken genes which had long ceased to be relevant.

For others though, it was always obvious that the dark genome was crucial to our understanding of humanity. "Evolution has absolutely no tolerance for junk," says Kári Stefánsson, chief executive of the Icelandic company deCODE genetics, which has sequenced more whole genomes than any other institution in the world. "There must be an evolutionary reason to maintain the size of the genome."

From statements like this, we can assume that people who use the term "dark matter" are referring mostly to non-coding DNA. I don't know who invented the phrase but it was promoted by the Harvard philosopher of biology Evelyn Fox Keller.2 She explained her view in a 2015 essay (see "Evelyn Fox Keller (1936 - 2023) and junk DNA").

When the [Human Genome Project] was first launched, it was widely assumed that extragenic DNA was "junk" and need not be taken into account. And indeed, it was not. Ten years later, a new metaphor began to make its appearance. Instead of "junk," extragenic DNA became the "dark matter of the genome," with the clear implication that its exploration promised discoveries that would revolutionize biology just as the study of the dark matter of the universe had revolutionized cosmology.

This shift in metaphor—from junk DNA to dark matter—well captures the transformation in conceptual framework that is at the heart of my subject. It was neatly described in a 2003 article on "The Unseen Genome" in Scientific American, where the author, W. Wayt Gibbs, wrote, "Journals and conferences have been buzzing with the new evidence that contradicts conventional notions that genes, those sections of DNA that encode proteins, are the sole mainstay of heredity and the complete blueprint for all life. Much as dark matter influences the fate of galaxies, dark parts of the genome exert control over the development and the distinctive traits of all organisms, from bacteria to humans. The genome is home to many more actors than just the protein-coding genes." Of course, changes in conceptual framework do not occur overnight, nor do they proceed without controversy, and this case is no exception. The question of just how important non-protein-coding DNA is to development, evolution, or medical genetics remains under dispute. For biologists as for physicists, the term "dark matter" remains a placeholder for ignorance. Yet reports echoing, updating, and augmenting Gibbs's brief summary seem to be appearing in the literature with ever-increasing frequency.

Evelyn Fox Keller is correct when she says that the term "dark matter" is a placeholder for ignorance but not in the sense she means. She thinks that non-coding DNA may be full of undiscovered regulatory sequences waiting to be identified and, in that sense, dark matter is just DNA whose function is currently unknown. However, many of us think that we already know a great deal about that non-coding DNA and most of it is junk. In that sense, the term "dark matter" reflects the ignorance of those who use the term—they haven't done their homework.

"Dark matter" was also used extensively by Elizabeth Pennisi in the articles she wrote for Science. She wanted to convey the notion that the discovery of 'only' 20,000 genes in the human genome was shocking and, according to her, it must mean that there was a huge amount of unknown function in the human genome that made us special compared to other animals with the same number of genes. Here's how she explained it in 2010 in an article titled "Shining a light on the genome's 'dark matter'" (Pennisi, 2010).

The scope of this “dark genome” became apparent in 2001, when the human genome was first published. Scientists expected to find as many as 100,000 genes packed into the 3 billion bases of human DNA; they were startled to learn that there were fewer than 35,000. (The current count is 21,000.) Protein-coding regions accounted for just 1.5% of the genome. Could the rest of our DNA really just be junk?

The deciphering of the mouse genome in 2002 showed that there must be more to the story. Mice and people turned out to share not only many genes but also vast stretches of noncoding DNA. To have been “conserved” throughout the 75 million years since the mouse and human lineages diverged, those regions were likely to be crucial to the organisms' survival.

This is another case where use of the term "dark matter" reveals the ignorance of the writer and not the mysterious genome. We know from decades of work that the non-coding fraction of the genome contains all kinds of functional DNA such as centromeres, regulatory sequences, non-coding genes, scaffold attachment regions, telomeres, and origins of replication. Together these functional elements account for about 4% of the genome. We know that only 10% of the human genome is conserved, so that leaves only 6% that is conserved but has no known function. It's highly likely the rest (90%) is junk. None of this should have been a surprise when the human genome sequence was published in 2001 and it certainly shouldn't be a surprise to anyone in 2023. The only reason to use the term "dark matter" is to avoid mentioning junk DNA, thus giving the impression that the junk DNA isn't junk after all; instead, it's mysterious dark matter hiding a huge amount of functional DNA just waiting to be discovered.
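
For readers who want to check the arithmetic in that paragraph, here is a minimal sketch in Python. It uses only the two round figures quoted above (about 10% of the genome conserved, about 4% accounted for by known functional elements); the function name and the dictionary labels are illustrative assumptions, not anything from the literature.

```python
# Round figures taken from the paragraph above, as percentages of the human genome.
CONSERVED_PCT = 10         # fraction of the genome that is conserved
KNOWN_FUNCTIONAL_PCT = 4   # centromeres, regulatory sequences, non-coding genes, etc.


def genome_budget(conserved_pct, known_functional_pct):
    """Split the genome into three bins (hypothetical helper, for illustration only)."""
    if not 0 <= known_functional_pct <= conserved_pct <= 100:
        raise ValueError("expected 0 <= known functional <= conserved <= 100")
    return {
        "known_functional": known_functional_pct,
        # conserved DNA whose function has not yet been identified
        "conserved_unknown": conserved_pct - known_functional_pct,
        # everything that is not conserved
        "likely_junk": 100 - conserved_pct,
    }


budget = genome_budget(CONSERVED_PCT, KNOWN_FUNCTIONAL_PCT)
for label, pct in budget.items():
    print(f"{label}: {pct}%")
```

The three bins sum to 100%, so with these inputs the "dark" part that could still hide undiscovered function is at most the conserved-but-unexplained bin, not the 98% implied by the dark matter metaphor.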

Image credit: The image is from a Cold Spring Harbor podcast on Dark Matter of the Genome. It's mostly an interview with two ENCODE researchers: "One scientist’s junk is apparently everyone’s treasure! They just haven’t realized it yet. . . In this episode of Base Pairs, we question the mythos that is “junk DNA” and explore how and why scientists are becoming enthralled by the mysterious non-coding portions of the genome."

1. He also makes the mistake of equating coding DNA in exons with genes. Protein-coding genes take up almost 40% of the human genome, not 2%. Most of those genes consist of introns.

2. She died recently (Sept. 22, 2023).

Pennisi, E. (2010) Shining a light on the genome's 'dark matter'. Science 330: 1614. doi: 10.1126/science.330.6011.1614


Joe Felsenstein said...

But if it is considered "dark matter", think of all the billions of dollars of grants we all get to apply for, to find out what important function each and every base is carrying out. In the light of this delightful possibility, your skepticism is impeding science!

Larry Moran said...

@Joe Felsenstein: True, but a lot of science needs impeding.

Anonymous said...

I think there's a typo when you say that Keller wrote her essay in "2105".

John Harshman said...

Creationists are going in on this "dark matter" thing in a big way. Just today, I had a creationist explain to me that the amount of non-coding DNA in a genome is strongly correlated with the complexity of the organism.

Larry Moran said...

@ César: Thank-you.

Anonymous said...

Just out of curiosity, where do you meet these creationists? I have yet to meet a creationist who follows any of these arguments.

John Harshman said...

Anonymous: "Follows" is perhaps too strong. I'm assuming that some creationist web site, perhaps ENV, presented some garbled view of Mattick's dog's-ass plot and that the creationist is then parroting that. Creationists commonly mention ENCODE too. Maybe you need to meet a better class of creationist. But the interactions in question have occurred at Peaceful Science.