The misinformation covers all aspects of science but my particular bugaboos are evolution, genomes, and junk DNA.
I'm going to quote the first few paragraphs from an article on the Knowable Magazine website. It seems to be associated with Annual Reviews and it certainly looks like it should be a credible source of science information.
The article is "The silent majority: RNAs that don’t make proteins." The author is Christina Szalinski and here's how she describes herself on her website.
I know science.
I became a science writer in 2013 after finishing my PhD in cell biology at the University of Pittsburgh. So when it comes to writing, I can shake out the molecular tangles, unravel the cellular threads, and wade through the formidable details of scientific studies.
Is it wrong to specifically identify science writers who are spreading misinformation? Is it cruel or mean to imply that they don't understand their subject?
Do other science writers and their organizations have any obligation to police their own discipline to ensure scientific accuracy?
Does anybody have any good ideas on how to clean up this mess?
Here's an excerpt from the article. I don't think I need to explain what's wrong.
When scientists first cracked the genetic code, they expected a simple story: DNA makes RNA, and that RNA, known as messenger RNA, makes proteins. Proteins would do all the important work — building tissues, fighting infections, digesting food.
But when the DNA of our genome was finally sequenced, researchers encountered a head-scratcher: The 20,000-plus genes that carry instructions for making our proteins account for less than 2 percent of our DNA. What was the rest of it good for?
For years, the remaining 98 percent was dismissed as “junk DNA” — evolutionary debris, filler. But as sequencing technology improved, a startling picture emerged. Our cells were busy making RNA copies of all those expanses, not just making messenger RNA — or mRNA — from the protein-coding genes. They were churning out vast quantities of RNA molecules with no known purpose.
The question became: Why would cells waste so much energy on copying that junk?
Today, however, the importance of this non-coding RNA — the catchall term for RNA molecules that don’t carry instructions for proteins — is undeniable. Non-coding RNAs turn out to regulate everything from embryonic development to immune responses to brain function. They help determine which genes get turned on and off, and when. They can promote cancer or suppress it.
I contacted the author last week to warn her that I was about to publish this post. I asked if she wished to comment or to provide the source of her information on the history of the field. I did not get a reply.
The problem with this kind of description is that it misrepresents the way science is done. Most scientific models are due to slow and steady, incremental advances building on previous studies. That kind of science is (usually) self-correcting—when new information becomes available, the old models are revised.
The picture that is being presented to the general public is that old scientists were pretty stupid because they thought there was only one kind of gene (protein coding) and that everything else in the genome (98%) had to be junk. According to that false history, the old fuddy-duddies were shown to be totally wrong when the human genome was sequenced and thousands of non-coding genes were discovered for the first time. That disproves junk DNA according to the false history.
Is there a way of writing the true history in a way that's accessible to the general public? I don't know, but I thought I would give it a try to show modern science writers how it could be done.
It's not easy. Read my attempt below and let me know if it works.
Scientists were actively working out the functions of DNA back in the 1950s and 1960s. By the mid-1960s they had discovered two kinds of genes. The majority encoded proteins but there were also non-coding genes that specified important RNAs such as ribosomal RNA (rRNA) and transfer RNA (tRNA) that were used in protein synthesis.
Scientists also established that DNA contained regulatory elements that controlled the expression of those two types of genes. Other functional DNA elements were also identified at this time.
Most of this work was done in bacteria and their viruses, where genes took up a very large percentage of the DNA in their chromosomes. However, it soon became apparent that this was not the case in humans, where the coding regions of the protein-coding genes seemed to account for only 2% of the genome. (The genome is the total amount of DNA in all chromosomes.) Other functional elements, such as non-coding genes and regulatory sequences, only accounted for a bit more of the genome.
This gave rise to a model developed by the leading experts of the time, including several Nobel Laureates. They proposed that only 10% of the human genome is functional and 90% is junk DNA. Based on a lot of experimental data, they estimated that there were about 30,000 genes in the human genome.
Additional non-coding genes specifying regulatory RNAs were identified at this time (early 1970s) but the biggest advance in this area occurred in the 1980s with the discovery of a host of genes specifying various new RNAs. Some of these new non-coding genes specified RNAs that acted like protein enzymes to catalyze biochemical reactions. Others were involved in regulating gene expression and still others were structural components of large cellular complexes.
These results, and others from the 1990s, raised the number of non-coding genes in humans to as many as several thousand but they still only accounted for a fraction of the total number of protein-coding genes.
The first draft of the human genome was published 25 years ago and it confirmed the model developed more than 50 years ago. There were about 30,000 genes, just as the experts had predicted, and most of the human genome was junk.
Subsequent work on identifying features of the human genome has, by and large, confirmed this model but there are scientists who are skeptical.
Most of the human genome is transcribed into RNAs—a fact that was known 50 years ago—but many of the leading experts concluded that most of those RNAs were probably junk RNA of various sorts. The idea here is that the human genome is very messy and it gives rise to lots of spurious, accidental RNAs that are not biologically relevant. Most of those RNAs are present in small amounts and they are rapidly degraded. They are not conserved in our closest relatives. (Sequence conservation is a good indication of function and lack of sequence conservation is a good indication of junk.)
The skeptics, on the other hand, argue that most of those RNAs have a function and there are far more non-coding genes than protein-coding genes. The debate continues to this day.
1. And, unfortunately, in the legitimate scientific literature.

Larry, you wrote "Most of the human genome is transcribed into RNAs—a fact that was known 50 years ago"
How did they know 25 years before the completion of the Human genome sequence that most of the human genome was transcribed?
The Human Genome Project did not establish transcription levels across the genome; it was a sequencing project, not a gene expression project. Not sure if they knew that the human genome was pervasively transcribed 50 years ago, but perhaps they had established enough knowledge of the basic mechanisms of eukaryotic transcription to confidently predict that outcome.
I would put greater emphasis on how we know it's junk, explaining what conservation is and how we recognize it. That acts to defuse the "dark matter" trope. You might also mention that junk can turn into functional elements through mutation, and that this has happened to a few members of various classes of junk, but that doesn't mean that the rest of the class is functional.
Question for anyone: do we know the functions of any or even most UCEs?
Scientists in the late 1960s isolated mRNA and hybridized it to genomic DNA. They learned that mRNA sequences only covered about 2% of human DNA leading to the idea that human protein coding genes represented only a small fraction of the genome.
When they purified total cellular RNA, then known as heterogeneous nuclear RNA or hnRNA, it covered a majority of the genome, including a lot of repetitive DNA. There were strong suggestions that mRNA was derived from longer transcripts in the hnRNA fraction and this was confirmed by the discovery of introns and splicing in the mid-1970s.
All biochemists and molecular biologists should know this.
Majority-wise there is some truth in The silent majority: RNAs that don’t make proteins if you separate RNAs on a gel. 18S and 28S are so abundant that they are visible as bands whereas mRNAs just form some background smear. One may argue that rRNAs are heavily involved in making protein, though.
I agree that it’s long past the time when we should have moved on from the “2% coding plus 98% junk – but aha, no it isn’t!” narrative. That’s why I prefer the way your history starts Larry. But I don’t think it is free from misrepresentation either. It won’t do to suggest that the consensus, pre-HGP, was that we have 30,000 genes. This is cherry-picking. There was a clear message sent to the public from the research community that the commonly expected gene count was more like 50,000-100,000. The HGP itself clearly stated at the outset that there were an estimated 100,000 human genes. You might say “Yes, but people who really knew their stuff said otherwise.” But it would be irresponsible science writing that failed to convey the common view within the HGP and more broadly.
Can you give a reference for where it was said 50 years ago that most of the human genome is transcribed into RNAs? I’m puzzled about how that could be known before extensive sequencing was possible. I am not suggesting there wasn’t the evidence you imply, but I’m not sure where it would be found. In your responses above you say that it was widely known by the 1970s that the amount of hnRNA considerably exceeds the amount of mRNA. That is of course true. Estimates of that excess seem to vary quite a bit, but a figure of 10% is commonly given for how much hnRNA was thought to be converted to mRNA. In any event, that history is generally focused on the way that the excess of hnRNA was resolved with the discovery of splicing. Given that of course pre-mRNAs are typically about 2-6 times bigger than mRNAs, it’s not clear that all that hnRNA could be accounted for with introns and other excised parts of coding regions – but it seems that quite a lot of it can be. It certainly seems hard to find evidence of extensive discussion about there being masses of ncRNA before the late 1980s, and even less so to find suggestions that ncRNA (beyond tRNA & rRNA) might be functional – that is surely why those who discovered the first long and short functional ncRNAs were surprised by it. This is not simply my view – it is the view of other experts who know the history.
And there are arguments for why we should not necessarily expect strong sequence conservation in some functional ncRNAs. You might not be persuaded by those arguments, but it is misleading to ignore them. (1/2)
So I think that the later part of your history is misleading. Perhaps it was the history you experienced yourself, but the narrative of “oh it’s all pretty much as we thought all along” does not fit with that told by many people who did the pioneering work.
And this is the point. If the general narrative within science is that, for example, we have fewer coding genes than widely expected, or that people are surprised by the extent of functional ncRNAs, it is right for science writers to say that. Writers are not spreading misinformation or being ignorant just because they don’t tell the particular story that you believe.
I don’t know Christina Szalinski, and I’d not have told the RNA story in quite the way she did, but you do her a disservice. As for science writers “policing their own discipline”: you’re making a meal out of this. Sometimes science writing gets things wrong – sometimes in little ways, sometimes in big ways. That’s not ideal, but it is inevitable. When it happens, it’s important to point out the errors. But this isn’t a “mess” – it’s just how things are for all nonfiction writing, everywhere and always. That includes the scientific literature. You have a little footnote saying “oh yes, and sometimes scientists get things wrong”, but often the things you object to are reporters reporting accurately and in good faith what other scientists have told them. And often that’s just because there are disagreements in a field, not because science writers (or scientists) are churning out misinformation.
The fact is, whenever you yourself read science writing outside your field, you are relying on the writer having fairly portrayed the prevailing views and the disagreements. You don’t want them to have decided that this scientist is the one who’s right and all those others must be wrong. But that is what you seem to want in areas on which you have strong views. In short, while I recognize that you genuinely think the history you have presented is the correct one, I really don’t think you would want science writers to be presenting such a partisan view in other fields as you seem to want them to do in this one. (2/2)
@Philip Ball: The proper narrative is that a number of knowledgeable scientists who studied the problem back in the 1960s and 1970s predicted that there would be about 30,000 genes.
There might have been other scientists who speculated about larger numbers but they did not present or discuss the evidence for such speculations.
Along comes Wally Gilbert who came up with a back-of-the-envelope estimate of 100,000 genes at the beginning of the human genome project. He estimated that a typical gene was 30,000 bp (he knew the proper definition of a gene and he knew about introns) and he made the assumption that the human genome was packed full of genes. (3 billion/30,000 = 100,000).
But the idea that lots of the human genome is junk was also widely discussed so anyone with a brain knew that this was almost certainly a huge overestimate. Any science writer who could do a bit of critical thinking should have seen that. (And any science writer who could do a bit of critical thinking, or read my textbook, would know the difference between "gene" and "coding DNA.")
That big estimate got a lot of publicity even though most of us knew that it was an inflated value. It's okay for science writers to report that there was a controversy but the data pointed to a number closer to 30,000.
It would be even better if good science writers were able to admit that they might not have been as skeptical as they should have been.
In your book on page 81 you write, "What of that 2 percent of functional DNA though? Most geneticists estimated that it contained around fifty thousand to a hundred thousand genes." When the actual number came out to be only 30,000 genes, as the experts had predicted, you write on page 82, "This overestimate, by a factor of three or more, of the number of genes in the human genome is often presented as an amusing example of how wrong the "experts" can be."
I think you could have done a much better job of explaining the real history.
@Philip Ball: I might be sympathetic to science writers who didn't do their homework and were surprised by the "low" number of genes in 2001.
That was 25 years ago. There's no excuse today for not knowing the proper history.
For example, you say on page 82 that, "Crick's definition of a gene as part of DNA that encodes a protein structure is no longer adequate, for reasons that will become more clear in the next two chapters." Do you really mean to imply that Crick didn't know about ribosomal RNA genes, tRNA genes, and introns?
@Philip Ball: I still have my copies of Benjamin Lewin's early textbooks - the ones that became "Genes" in later editions. The 1980 version has an extensive summary of all the Rot data from the previous 10 years (hybridize RNA to DNA and calculate the amount of DNA that is double-stranded).
Based on the amount of mRNA he concludes on page 718, "The general conclusion suggested by these results is that the somatic cells of higher eukaryotes, or individual cell lines perpetuated in culture, have of the order of 10,000-20,000 active structural genes."
Then he looked at the data of total RNA and published lots of tables showing that most of the genome of complex eukaryotes hybridized to these RNAs. In other words, most of the genome is transcribed.
I would hope that any science writer who wanted to get the history right would have consulted the early textbooks and the early literature on those types of experiments.
“Anyone with a brain knew that this was almost certainly a huge overestimate… Any science writer who could do a bit of critical thinking should have seen that.” This is where you undermine yourself Larry. The official forward-looking report of the HGP in 1990 stated baldly that there are an estimated 100,000 human genes. Are you saying that all the scientists involved in preparing that report were stupid? I suppose you must be, if you think they “had no brain”. And to suppose that a science writer is going to figure out that all those experts who made such suggestions at that time were incompetent, so that their numbers should be disregarded, is absurd. Many researchers in the field don’t even do that. To conceal that this is what the public were told at the time, meanwhile, would be bad science reporting.
You quote my book in misleading ways. When I said “This overestimate, by a factor of three or more, of the number of genes in the human genome is often presented as an amusing example of how wrong the "experts" can be”, I went on to say that I don’t consider that this is the best way to frame it. (All the same, many people who can reasonably be considered experts (you really think Wally Gilbert could not?) evidently were wrong.)
And my comment about Crick clearly refers back to an earlier comment from him in 1958 that I quoted. So no, he really didn’t know about introns (or for that matter regulatory sequences) when he said it, of course. And the quote itself makes it clear that Crick was stating what he considered a generality, not an absolute, blanket truth. The question is then why you omitted that context.
As for the “pervasive transcription” question, I had previously read Lewin’s review papers from the 70s, which did not push this idea, and have previously discussed such matters with someone who works in this field, who cites Lewin’s work and others from that time (late 70s) as “providing the idea that transcriptional output in the nucleus is broader than the set of translated messages” and leading to “conjectures that dual stranded RNAs might be used as regulators of sense strand genes.” He adds “They did not have a sense of long non-coding RNAs (lncRNAs) and how much of the genome was transcribed. They [merely] concluded that the areas transcribed was larger than just the coding gene regions.” Now, doubtless one could get into a detailed debate about how convincing any such evidence was of truly pervasive transcription, which would surely get into technical issues that I certainly lack the expertise to assess. But this is the point. The idea that science writers writing brief articles for popular outlets should not take on trust what specialists tell them but should (and should be able to) reach back themselves into the technical literature of 50 years ago to work out if what they are being told is true is just silly and shows no awareness of what that business is about. As I say, there is certainly a lot more nuance than is conveyed in Szalinski’s piece, but it is utterly unreasonable to portray her as a know-nothing who is spreading misinformation on the basis of an issue like this one.
@Philip Ball: You know full well that there were many opponents of the Human Genome Project back in 1990. You know full well (or you should know) that many of those opponents thought that it was a waste of time to sequence a lot of junk DNA.
If the human genome was full of junk DNA then obviously Gilbert's calculation was incorrect because genes do not take up the entire genome.
You also know (or should know) that it was common knowledge that more than half of the human genome consisted of repetitive DNA of various sorts and it was highly unlikely that all this repetitive DNA was located within genes (introns). Back then we already knew about centromeres, telomeres, origins of replication, and regulatory sequences so anyone with a brain knew that the genome was not chock full of genes.
The National Human Genome Research Institute (NIH) has published some of the criticism of the HGP from 1990 [The Human Genome Project is simply a bad idea]. Here are some quotes from some of the letters.
"Never mind that 95% of the DNA doesn't code for proteins and is thought by many, including some of its advocates, to be 'junk.'"
"Most of the genome (about 95%) doesn't encode proteins and is junk as far as we can tell."
"The HGP is projected to cost at least $3,000,000,000 over 15 years. Of that amount, no less than $600,000 is expected to be required for computer databases to computer-warehouse sequences most of which will be those of so-called "junk" DNA!"
"It is well-known that about 95% of the genetic material in the human genome is basically 'filler' and a total sequencing of the human genome would involve a great deal of wasted time and effort."
Philip, are you telling me that you were completely unaware of those criticisms back then?
Here's the actual statement from the initial HGP proposal, "The human genome consists of 50,000 to 100,000 genes located on 23 pairs of chromosomes."
Anyone seriously interested in writing about the history of the human genome might want to say that the media picked up on the upper boundary of this estimate and ran with it but there were plenty of knowledgeable experts who thought there were closer to 30,000 genes.
In your book, you could have taken the opportunity to distinguish between the hype and the scientific reality. That would have been an important contribution to correcting the misinformation that was spread by the media back then, and now.
That is not the path you chose and now you are trying to defend it while at the same time posing as an expert who deeply understands the science.
You said, "The official forward-looking report of the HGP in 1990 stated baldly that there are an estimated 100,000 human genes. Are you saying that all the scientists involved in preparing that report were stupid? I suppose you must be, if you think they 'had no brain'."
I know some of the people who wrote that proposal and I know that they are not stupid. I also know they thought the genome was mostly junk and probably had less than 50,000 genes. This was a political document designed to convince Congress that it was worth funding the project. Hype was as normal back then as it is now.
What I'm saying is that some of the best science writers were able to see past the hype and recognize the real science but there weren't very many excellent science writers back then. Still aren't.
Philip Ball says, "The idea that science writers writing brief articles for popular outlets should not take on trust what specialists tell them but should (and should be able to) reach back themselves into the technical literature of 50 years ago to work out if what they are being told is true is just silly and shows no awareness of what that business is about."
I'm a big fan of Carl Zimmer. He makes a point of emphasizing that science writers should be skeptical of what "experts" are telling them. They should make sure to consult a wide variety of "experts" in order to get a true perspective on the subject. In particular, they should definitely consult experts who disagree with the ones you first encounter.
I assume, therefore, that it is not necessarily the job of everyday science writers to delve into the old literature but it is their job to at least read Wikipedia and to consult experts on the other side of the controversy. It is certainly their job to be aware of the fact that there IS a controversy.
There's a bigger onus on science writers who publish a book. They don't have the same excuses as science writers trying to meet a deadline for a popular magazine. When you write a book you better know your stuff and do the research and that means finding out and reporting the actual science instead of the hype.
In your case, there are dozens of papers in the scientific literature that challenge your scientific "facts" and opinions but many of those references don't appear in your reference list. You don't mention the views of those experts. Isn't that strange?
OK, so we’re passing over the issue of you misrepresenting what I said in my book? So be it.
“Philip, are you telling me that you were completely unaware of those criticisms back then?” Well, back then (in 1990) I was two years out of completing a PhD in theoretical physics and was an editor for physical sciences at Nature, so while I can’t be sure what I knew then, I doubt if I was fully up to speed on the debates around the HGP. But I certainly knew about such objections to the HGP many years ago, and in fact have myself written several somewhat sceptical articles about its narratives and motivations.
It seems that the only reason you don’t in fact think the scientists behind the HGP were brainless idiots is that you think they were liars. (Now at least I can see why you kept asserting that I have such a low opinion of biologists, despite my insistence that this could not be further from the truth: it’s a textbook case of projection!) And that it’s the science writers who are idiots for not realising they are liars. (Again, I’ve been criticising the hype of the HGP for years, but whatever.) I’m not sure now what your complaint is – that science writers didn’t divine that all the statements these scientists were making on the number of genes were not what they really thought? (But of course this wasn’t just HGP hype, as you well know. I know you disagree with Steven Salzberg, but he cites some of the other sources for such figures here: https://pmc.ncbi.nlm.nih.gov/articles/PMC2898077/).
“It is the job of everyday science writers to consult experts on the other side of the controversy [and] to be aware of the fact that there IS a controversy.” Yes, we agree on that! And so, when talking about pre-HGP estimates of gene tallies, we should ideally note that, while estimates in the 50,000-100,000 range were pretty normal by the mid-1990s, some researchers put the number lower. That’s why I mentioned this fact in my book – something that folks reading your comments might be surprised to hear (more misrepresentation from which I guess I’ll be expected to move on). I did also express scepticism that we should privilege those lower estimates. I see no compelling reason to think that we should. And isn’t this what you want science writers to do – to read widely on the topic, be aware of different opinions, and form their own view? Or is that only OK when that view concurs with yours?
I too am a huge fan of Carl Zimmer. He is one of the best science writers in the business. When he says that science writers should be sceptical of what "experts" are telling them and should make sure to consult a wide variety of "experts" in order to get a true perspective on the subject, he is absolutely right. And that’s science writing 101. To clarify what I meant, this is certainly what we should do when for example writing about a new discovery or finding – not to simply trust the authors of the paper, but to get a range of views. And as you know, I myself was aware of (and mentioned) a range of views both on the gene count and on the ENCODE results that you also dispute, so again I’m not too sure what your point is. (Rest assured, by the way, that I take advice from others on the claims you’ve made here and don’t take your word for them.) All I wanted to say in this context is that of course we can’t be expected to do a deep dive on every single thing we read as background for the issue we’re writing about – I know and you know that scientists don’t generally do that either. So if Christina Szalinski saw the figure of 50,000-100,000 genes mentioned in several scientific articles, and perhaps in the HGP position paper of 1990, I’m not sure it was such an egregious mistake that she assumed it was a reliable estimate. It would have been preferable if she’d known about the debate, but it’s not fair to accuse her of spreading misinformation in this context.
(Incidentally, I was amused at your determination to rehabilitate Carl after your disappointment at his article in the NYT on AlphaGenome – not least because what you were evidently hoping for from him was much closer to what I wrote on the topic. But we all like to maintain our narratives, I guess.)
“In your case, there are dozens of papers in the scientific literature that challenge your scientific "facts" and opinions but many of those references don't appear in your reference list. Isn't that strange?” What’s strange is to expect dozens of references for a remark that occupies two or so sentences in the book. If that’s the ratio you’re looking for, the bibliography would be way longer than the text. I mention the disagreements – I’m not going to exhaustively cite the literature on the matter.
I have never “posed as an expert”. It’s a frankly rather peculiar thing to say, albeit perhaps revealing.
Larry, I’ll continue to look at your blog from time to time in any event, because I find it points to some very useful things, and because I even sometimes agree with what you say.