Monday, November 21, 2022

How not to write a Nature abstract

A friend recently posted a figure on Facebook that instructs authors in the correct way to prepare a summary paragraph (abstract) for publication in Nature. It uses a specific example and the advice is excellent [How to construct a Nature summary paragraph].

I thought it might be fun to annotate a different example so I randomly selected a paper on genomics to see how it compared. The one that popped up was An integrated encyclopedia of DNA elements in the human genome.


Sunday, November 20, 2022

Saturday, November 19, 2022

How many enhancers in the human genome?

In spite of what you might have read, the human genome does not contain one million functional enhancers.

The Sept. 15, 2022 issue of Nature contains a news article on "Gene regulation" [Two-layer design protects genes from mutations in their enhancers]. It begins with the following sentence.

The human genome contains only about 20,000 protein-coding genes, yet gene expression is controlled by around one million regulatory DNA elements called enhancers.

Sandwalk readers won't need to be told the reference for such an outlandish claim because you all know that it's the ENCODE Consortium summary paper from 2012, the one that kicked off their publicity campaign to convince everyone of the death of junk DNA (ENCODE, 2012). ENCODE identified several hundred thousand transcription factor (TF) binding sites and in 2012 they estimated that the total number of base pairs involved in regulating gene expression could account for 20% of the genome.

How many of those transcription factor binding sites are functional and how many are due to spurious binding to sites that have nothing to do with gene regulation? We don't know the answer to that question but we do know that there will be a huge number of spurious binding sites in a genome of more than three billion base pairs [Are most transcription factor binding sites functional?].

The scientists in the ENCODE Consortium didn't know the answer either but what's surprising is that they didn't even know there was a question. It never occurred to them that some of those transcription factor binding sites have nothing to do with regulation.

Fast forward ten years to 2022. Dozens of papers have been published criticizing the ENCODE Consortium for their lack of knowledge of the basic biochemical properties of DNA binding proteins. Surely nobody who is interested in this topic believes that there are one million functional regulatory elements (enhancers) in the human genome?

Wrong! The authors of this Nature article, Ran Elkon at Tel Aviv University (Israel) and Reuven Agami at the Netherlands Cancer Institute (Amsterdam, Netherlands), didn't get the message. They think it's quite plausible that the expression of every human protein-coding gene is controlled by an average of 50 regulatory sites even though there's not a single known example of any such gene.

Not only that, for some reason they think it's only important to mention protein-coding genes in spite of the fact that the reference they give for 20,000 protein-coding genes (Nurk et al., 2022) also claims there are an additional 40,000 noncoding genes. This is an incorrect claim since Nurk et al. have no proof that all those transcribed regions are actually genes but let's play along and assume that there really are 60,000 genes in the human genome. That reduces the average to "only" 17 enhancers per gene. I don't know of a single gene that has 17 or more proven enhancers, do you?
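For anyone who wants to check the averages, here is a minimal sketch of the arithmetic. It uses only the figures quoted above; the variable and function names are just illustrative, and the 40,000 noncoding genes are the disputed Nurk et al. claim, included only to play along.

```python
# Figures quoted in the post; nothing here is new data.
CLAIMED_ENHANCERS = 1_000_000      # "around one million" regulatory elements
PROTEIN_CODING_GENES = 20_000      # "about 20,000 protein-coding genes"
CLAIMED_NONCODING_GENES = 40_000   # disputed claim attributed to Nurk et al. (2022)


def enhancers_per_gene(enhancers: int, genes: int) -> float:
    """Average number of enhancers per gene if every enhancer were functional."""
    return enhancers / genes


per_coding_gene = enhancers_per_gene(CLAIMED_ENHANCERS, PROTEIN_CODING_GENES)
per_any_gene = enhancers_per_gene(
    CLAIMED_ENHANCERS, PROTEIN_CODING_GENES + CLAIMED_NONCODING_GENES
)

print(f"Protein-coding genes only: {per_coding_gene:.0f} enhancers per gene")
print(f"All 60,000 claimed genes:  {per_any_gene:.0f} enhancers per gene")
```

Either way, the claim requires every gene, on average, to have far more enhancers than have been demonstrated for any single gene.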

Why would two researchers who study gene regulation say that the human genome contains one million enhancers when there's no evidence to support such a claim and it doesn't make any sense? Why would Nature publish this paper when surely the editors must be aware of all the criticism that arose out of the 2012 ENCODE publicity fiasco?

I can think of only two answers to the first question. Either Elkon and Agami don't know of any papers challenging the view that most TF binding sites are functional (see below) or they do know of those papers but choose to ignore them. Neither answer is acceptable.

I think that the most important question in human gene regulation is how much of the genome is devoted to regulation. How many potential regulatory sites (enhancers) are functional and how many are spurious non-functional sites? Any paper on regulation that does not mention this problem should not be published. All results have to be interpreted in light of conflicting claims about function.

Here are some examples of papers that raise the issue. The point is not to prove that these authors are correct - although they are correct - but to show that there's a controversy. You can't just state that there are one million regulatory sites as if it were a fact when you know that the results are being challenged.

"The observations in the ENCODE articles can be explained by the fact that biological systems are noisy: transcription factors can interact at many nonfunctional sites, and transcription initiation takes place at different positions corresponding to sequences similar to promoter sequences, simply because biological systems are not tightly controlled." (Morange, 2014)

"... ENCODE had not shown what fraction of these activities play any substantive role in gene regulation, nor was the project designed to show that. There are other well-studied explanations for reproducible biochemical activities besides crucial human gene regulation, including residual activities (pseudogenes), functions in the molecular features that infest eukaryotic genomes (transposons, viruses, and other mobile elements), and noise." (Eddy, 2013)

"Given that experiments performed in a diverse number of eukaryotic systems have found only a small correlation between TF-binding events and mRNA expression, it appears that in most cases only a fraction of TF-binding sites significantly impacts local gene expression." (Palazzo and Gregory, 2014)

One surprising finding from the early genome-wide ChIP studies was that TF binding is widespread, with thousands to tens of thousands of binding events for many TFs. These numbers do not fit with existing ideas of the regulatory network structure, in which TFs were generally expected to regulate a few hundred genes, at most. Binding is not necessarily equivalent to regulation, and it is likely that only a small fraction of all binding events will have an important impact on gene expression. (Slattery et al., 2014)

Detailed maps of transcription factor (TF)-bound genomic regions are being produced by consortium-driven efforts such as ENCODE, yet the sequence features that distinguish functional cis-regulatory sites from the millions of spurious motif occurrences in large eukaryotic genomes are poorly understood. (White et al., 2013)

One outstanding issue is the fraction of factor binding in the genome that is "functional", which we define here to mean that disturbing the protein-DNA interaction leads to a measurable downstream effect on gene regulation. (Cusanovich et al., 2014)

... we expect, for example, accidental transcription factor-DNA binding to go on at some rate, so assuming that transcription equals function is not good enough. The null hypothesis after all is that most transcription is spurious and alternative transcripts are a consequence of error-prone splicing. (Hurst, 2013)

... as a chemist, let me say that I don't find the binding of DNA-binding proteins to random, non-functional stretches of DNA surprising at all. That hardly makes these stretches physiologically important. If evolution is messy, chemistry is equally messy. Molecules stick to many other molecules, and not every one of these interactions has to lead to a physiological event. DNA-binding proteins that are designed to bind to specific DNA sequences would be expected to have some affinity for non-specific sequences just by chance; a negatively charged group could interact with a positively charged one, an aromatic ring could insert between DNA base pairs and a greasy side chain might nestle into a pocket by displacing water molecules. It was a pity the authors of ENCODE decided to define biological functionality partly in terms of chemical interactions which may or may not be biologically relevant. (Jogalekar, 2012)


Nurk, S., Koren, S., Rhie, A., Rautiainen, M., Bzikadze, A. V., Mikheenko, A., et al. (2022) The complete sequence of a human genome. Science, 376:44-53. [doi:10.1126/science.abj6987]

The ENCODE Project Consortium (2012) An integrated encyclopedia of DNA elements in the human genome. Nature, 489:57-74. [doi:10.1038/nature11247]

Academic workers on strike at University of California schools

Graduate students, postdocs, and other "academic workers" are on strike for higher wages and better working conditions at University of California schools but it's very difficult to understand what's going on.

Several locals of the United Auto Workers union are on strike. The groups include Academic Researchers, Academic Student Employees (ASEs), Postdocs, and Student Researchers. The list of demands can be found on the UAW website: UAW Bargaining Highlights (All Units).

Here's the problem. At my university, graduate students can make money by getting a position as a TA (teaching assistant). This is a part-time job at an hourly rate. This may be a major source of income for humanities students but for most science students it's just a supplement to their stipend. The press reports on this strike keep referring to a yearly income and they make it sound like part-time employment as a TA should pay a living wage. For example, a recent Los Angeles Times article says,

The workers are demanding a base salary of $54,000 for all graduate student workers, child-care subsidies, enhanced healthcare for dependents, longer family leave, public transit passes and lower tuition costs for international scholars. The union said the workers earn an average current pay of about $24,000 a year.

I don't understand this concept of "base salary." In my experience, most TAs work part time. If they were paid $50 per hour then they would have to work about 30 hours per week over two semesters in order to earn $54,000 per year. That doesn't seem to leave much time for working on a thesis. Perhaps it includes a stipend that doesn't require teaching?
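The back-of-the-envelope estimate above can be sketched as follows. The $50 hourly rate is the hypothetical one used in the text, and the 36-week length of two semesters is an assumption made for this sketch, not a number from the article or the union.

```python
TARGET_SALARY = 54_000        # base salary demanded, in US dollars per year
HOURLY_RATE = 50              # hypothetical TA pay rate used in the text
WEEKS_IN_TWO_SEMESTERS = 36   # assumption: two semesters of roughly 18 weeks each

# Total hours of TA work needed to reach the target at this hourly rate.
hours_per_year = TARGET_SALARY / HOURLY_RATE

# Spread those hours evenly over the teaching weeks.
hours_per_week = hours_per_year / WEEKS_IN_TWO_SEMESTERS

print(f"{hours_per_year:.0f} hours per year, about {hours_per_week:.0f} hours per week")
```

A shorter teaching year or a lower hourly rate would push the weekly hours even higher.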

Our graduate students are paid a living allowance (currently about $28,000 Cdn) and their tuition and fees are covered by an extra $8,000. Most of them don't do any teaching. Almost all of this money comes from research grants and not directly from the university.

The University of California system seems to be very different from the one I'm accustomed to. Is the work of TAs obvious to most Americans? Do you understand the issues?

I also don't get the situation with postdocs. The union is asking for a $70,000 salary for postdocs and the university is offering an 8% increase in the first year and smaller increases in subsequent years. In Canada, postdocs are mostly paid from research grants and not from university funds. The average postdoc salary at the University of Toronto is $51,000 (Cdn) but the range is quite large ($40K - $100K). I don't think the University of Toronto can dictate to PIs the amount of money that they have to pay a postdoc but it does count them as employees and ensures that postdocs have healthcare and suitable working conditions. These postdocs are members of a union (CUPE 3902) and there is a minimum stipend of $36,000 (Cdn).

Can someone explain the situation at the University of California schools? Are they asking for a minimum salary of $70,000 (US) ($93,700 Cdn)? Will PIs have to pay postdocs more from their research grants if the union wins a wage increase but the postdocs are already earning more than $70,000?

It's all very confusing and the press doesn't seem to have a good handle on the situation.

Note: I know that the union doesn't expect the university to meet its maximum demands. I'm sure they will settle for something less. That's not the point I'm trying to make. I'm just trying to understand how graduate students and postdocs are paid in University of California schools.


Friday, November 18, 2022

Higher education for all?

I discuss a recent editorial in Science that advocates expanding university education in order to prepare students for jobs.

I believe that the primary goal of a university education is to teach students how to think (critical thinking). This goal is usually achieved within the context of an in-depth study in a particular field such as history, English literature, geology, or biochemistry. The best way of achieving that goal is called student-centered learning. It involves, among other things, classes that focus on discussion and debate under the guidance of an experienced teacher.

Universities and colleges also have job preparation programs such as business management, medicine, computer technology, and, to a lesser extent, engineering. These programs may, or may not, teach critical thinking (usually not).

About 85% of students who enter high school will graduate. About 63% of high school graduates go to college or university. The current college graduation rate is 65% (within six years). What this means is that for every 100 students that begin high school roughly 35 will graduate from college.
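The "roughly 35" figure is just the product of the three rates. Here is a minimal sketch of the calculation using only the percentages quoted above; the names are illustrative.

```python
# Rates quoted in the paragraph above.
HIGH_SCHOOL_GRADUATION_RATE = 0.85   # fraction of entering students who finish high school
COLLEGE_ENROLLMENT_RATE = 0.63       # fraction of high school graduates who go on to college
COLLEGE_GRADUATION_RATE = 0.65       # fraction of college entrants who graduate within six years


def college_graduates(cohort_size: int = 100) -> float:
    """Expected number of college graduates from a cohort entering high school."""
    return (
        cohort_size
        * HIGH_SCHOOL_GRADUATION_RATE
        * COLLEGE_ENROLLMENT_RATE
        * COLLEGE_GRADUATION_RATE
    )


graduates = college_graduates(100)
print(f"About {graduates:.1f} of every 100 entering high school students finish college")
```

The exact product is a little under 35, which rounds to the figure in the text.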

Now let's look at an editorial written by Marcia McNutt, President of the United States National Academy of Sciences. The editorial appears in the Nov. 11 issue of Science [Higher education for all]. She begins by emphasizing the importance of a college degree in terms of new jobs and the wealth of nations.

Currently, 75% of new jobs require a college degree. Yet in the US and Europe, only 40% of young adults attend a 2-year or 4-year college—a percentage that has either not budged or only modestly risen in more than two decades— despite a college education being one of the proven ways to lift the socioeconomic status of underprivileged populations and boost the wealth of nations.

There's no question that well-educated graduates will contribute to society in many ways but there is a question about what "well-educated" really means. Is it teaching specific job skills or is it teaching students how to think? I vote for teaching critical thinking and not for job training. I think that creating productive citizens who can fill a variety of different jobs is a side-benefit of preparing them to cope with a complex society that requires critical thinking. I don't think my view is exactly the same as Marcia McNutt's because she emphasizes training as a main goal of college education.

Universities, without building additional facilities, could expand universal and life-long access to higher education by promoting more courses online and at satellite community-college campuses.

Statements like that raise two issues that don't get enough attention. The first one concerns the number of students who should graduate from college in an ideal world. What is that number and at what stage should it be enhanced? Should there be more high school graduates going to college? If so, does that require lowering the bar for admission or is the cost of college the main impediment? Is there a higher percentage of students entering college in countries with free, or very low, tuition? Should there be more students graduating? If so, one easy way to do that is to make university courses easier. Is that what we want?

The question that's never asked is what percentage of the population is smart enough to get a college degree? Is it much higher than 40%?

The second issue concerns the quality of education. The model that I suggested above is not consistent with online courses and there's a substantial number of papers in the pedagogical literature showing that student-centered education doesn't work very well online. Does that mean that we should adopt a different way of teaching in order to make college education more accessible? If so, at what cost?

McNutt gives us an example of the kind of change she envisages.

At Colorado College, students complete a lab science course in only 4 weeks, attending lectures in the morning and labs in the afternoon. This success suggests that US universities could offer 2-week short courses that include concentrated, hands-on learning and teamwork in the lab and the field for students who already mastered the basics through online lectures. Such an approach is more common in European institutions of higher education and would allow even those with full-time employment elsewhere to advance their skills during vacations or employer-supported sabbaticals for the purpose of improving the skills of the workforce. Opportunities abound for partnerships with industry for life-long learning. The availability of science training in this format could also be a boon for teachers seeking to fill gaps in their science understanding.

This is clearly a form of college education that focuses on job skills and even goes as far as suggesting that industry could be a "partner" in education. (As an aside, it's interesting that government employers, schools, and nonprofits are never asked to be partners in education even though they hire a substantial number of college graduates.)

Do you agree that the USA should be expanding the number of students who graduate from college and do you agree that the goal is to give them the skills needed to get a job?


Tuesday, November 08, 2022

Science education in an age of misinformation

I just read an annoying article in Boston Review: The Inflated Promise of Science Education. It was written by Catarina Dutilh Novaes, a Professor of Philosophy, and Silvia Ivani, a teaching fellow in philosophy. Their main point was that the old-fashioned way of teaching science has failed because the general public mistrusts scientists. This mistrust stems, in part, from "legacies of scientific or medical racism and the commercialization of biomedical science."

The way to fix this, according to the authors, is for scientists to address these "perceived moral failures" by engaging more with society.

"... science should be done with and for society; research and innovation should be the product of the joint efforts of scientists and citizens and should serve societal interests. To advance this goal, Horizon 2020 encouraged the adoption of dialogical engagement practices: those that establish two-way communication between experts and citizens at various stages of the scientific process (including in the design of scientific projects and planning of research priorities)."

Clearly, scientific education ought to mean the implanting of a rational, sceptical, experimental habit of mind. It ought to mean acquiring a method – a method that can be used on any problem that one meets – and not simply piling up a lot of facts.

George Orwell

This is nonsense. It has nothing to do with science education; instead, the authors are focusing on policy decisions such as convincing people to get vaccinations.

The good news is that the Boston Review article links to a report from Stanford University that's much more intelligent: Science Education in an Age of Misinformation. The philosophers think that this report advocates "... a well-meaning but arguably limited approach to the problem along the lines of the deficit model ...." where "deficit model" refers to a mode of science communication where scientists just dispense knowledge to the general public who are supposed to accept it uncritically.

I don't know of very many science educators who think this is the right way to teach. I think the prevailing model is to teach the nature of science (NOS) [The Nature of Science (NOS)]. That requires educating students and the general public about the way science goes about creating knowledge and why evidence-based knowledge is reliable. It's connected to teaching critical thinking, not teaching a bunch of scientific facts. The "deficit model" is not the current standard in science education and it hasn't been seriously defended for decades.

"Appreciating the scientific process can be even more important than knowing scientific facts. People often encounter claims that something is scientifically known. If they understand how science generates and assesses evidence bearing on these claims, they possess analytical methods and critical thinking skills that are relevant to a wide variety of facts and concepts and can be used in a wide variety of contexts.”

National Science Foundation, Science and Technology Indicators, 2008

An important part of the modern approach as described in the Stanford report is teaching students (and the general public) how to gather information and determine whether or not it's reliable. That means you have to learn how to evaluate the reliability of your sources and whether you can trust those who claim to be experts. I taught an undergraduate course on this topic for many years and I learned that it's not easy to teach the nature of science and critical thinking.

The Stanford Report is about the nature of science (NOS) model and how to implement it in the age of social media. Specifically, it's about teaching ways to evaluate your sources when you are inundated with misinformation.

The main part of this approach is going to seem controversial to many because it emphasizes the importance of experts and there's a growing reluctance in our society to trust experts. That's what the Boston Review article was all about. The solutions advocated by the authors of that article are very different from the ones presented in the Stanford report.

The authors of the Stanford report recognize that there's a widespread belief that well-educated people can make wise decisions based entirely on their own knowledge and judgement, in other words, that they can be "intellectually independent." They reject that false belief.

The ideal envisioned by the great American educator and philosopher John Dewey—that it is possible to educate students to be fully intellectually independent—is simply a delusion. We are always dependent on the knowledge of others. Moreover, the idea that education can educate independent critical thinkers ignores the fact that to think critically in any domain you need some expertise in that domain. How then, is education to prepare students for a context where they are faced with knowledge claims based on ideas, evidence, and arguments they do not understand?

The goal of science education is to teach students how to figure out which source of information is supported by the real experts and that's not an easy task. It seems pretty obvious that scientists are the experts but not all scientists are experts so how do you tell the difference between science quacks and expert scientists?

The answer requires some knowledge about how science works and how scientists behave. The Stanford report says that this means acquiring an understanding of "science as a social practice." I think "social practice" is a bad choice of terms and I would have preferred that they stick with "nature of science" but that was their choice.

The mechanism for recognizing the real experts relies on critical thinking but it's not easy. Two of the lead authors[1] of the Stanford Report published a short synopsis in Science last month (October 2022) [https://doi.org/10.1126/science.abq8-93]. Their heuristic is shown on the right.

The idea here is that you can judge the quality of scientific information by questioning the credentials of the authors. This is how "outsiders" can make judgements about the quality of the information without being experts themselves. The rules are pretty good but I wish there had been a bit more on "Unbiased scientific information" as a criterion. I think that you can make a judgement based on whether the "experts" take the time to discuss alternative hypotheses and explain the positions of those who disagree with them but this only applies to genuine scientific controversies and if you don't know that there's a controversy then you have no reason to apply this filter.

For example, if a paper is telling you about the wonderful world of regulatory RNAs and points out that there are 100,000 newly discovered genes for these RNAs, you would have no reason to question that if the scientists have no conflict of interest and come from prestigious universities. You would have to rely on the reviewers of the paper, and the journal, to insist that alternative explanations (e.g. junk RNA) were mentioned. That process doesn't always work.

There's no easy way to fix that problem. Scientists are biased all the time but outsiders (i.e. non-experts) have no way of recognizing bias. I used to think that we could rely on science journalists to alert us to these biases and point out that the topic is controversial and no consensus has been reached. That didn't work either.

At least the heuristic works some of the time so we should make sure we teach it to students of all ages. It would be even nicer if we could teach scientists how to be credible and how to recognize their own biases.


1. The third author is my former supervisor, Bruce Alberts, who has been interested in science education for more than fifty years. He did a pretty good job of educating me! :-)

Saturday, November 05, 2022

Nature journalist is confused about noncoding RNAs and junk

Nature Methods is one of the journals in Nature Portfolio published by Springer Nature. Its focus is novel methods in the life sciences.

The latest issue (October, 2022) highlights the issues with identifying functional noncoding RNAs and the editorial, Decoding noncoding RNAs, is quite good—much better than the comments in other journals. Here's the final paragraph.

Despite the increasing prominence of ncRNA, we remind readers that the presence of a ncRNA molecule does not always imply functionality. It is also possible that these transcripts are non-functional or products from, for example, splicing errors. We hope this Focus issue will provide researchers with practical advice for deciphering ncRNA’s roles in biological processes.

However, this praise is mitigated by the appearance of another article in the same journal. Science journalist Vivien Marx has written a commentary with a title that was bound to catch my eye: How noncoding RNAs began to leave the junkyard. Here's the opening paragraph.

Junk. In the view of some, that’s what noncoding RNAs (ncRNAs) are — genes that are transcribed but not translated into proteins. With one of his ncRNA papers, University of Queensland researcher Tim Mercer recalls that two reviewers said, “this is good” and the third said, “this is all junk; noncoding RNAs aren’t functional.” Debates over ncRNAs, in Mercer’s view, have generally moved from ‘it’s all junk’ to ‘which ones are functional?’ and ‘what are they doing?’

This is the classic setup for a paradigm shaft. What you do is create a false history of a field and then reveal how your ground-breaking work has shattered the long-standing paradigm. In this case, the false history is that the standard view among scientists was that ALL noncoding RNAs were junk. That's nonsense. It means that these old scientists must have dismissed ribosomal RNA and tRNA back in the 1960s. But even if you grant that those were exceptions, it means that they knew nothing about Sidney Altman's work on RNase P (Nobel Prize, 1989), or 7SL RNA (Alu elements), or the RNA components of spliceosomes (snRNAs), or PiWiRNAs, or snoRNAs, or microRNAs, or a host of regulatory RNAs that have been known for decades.

Knowledgeable scientists knew full well that there are many functional noncoding RNAs and that includes some that are called lncRNAs. As the editorial says, these knowledgeable scientists are warning about attributing function to all transcripts without evidence. In other words, many of the transcripts found in human cells could be junk RNA in spite of the fact that there are also many functional noncoding RNAs.

So, Tim Mercer is correct, the debate is over which ncRNAs are functional and that's the same debate that's been going on for 50 years. Move along folks, nothing to see here.

The author isn't going to let this go. She decides to interview John Mattick, of all people, to get a "proper" perspective on the field. (Tim Mercer is a former student of Mattick's.) Unfortunately, that perspective contains no information on how many functional ncRNAs are present and on what percentage of the genome their genes occupy. It's gonna take several hundred thousand lncRNA genes to make a significant impact on the amount of junk DNA but nobody wants to say that. With John Mattick you get a twofer: a false history (paradigm strawman) plus no evidence that your discoveries are truly revolutionary.

Nature Methods should be ashamed, not for presenting the views of John Mattick—that's perfectly legitimate—but for not putting them in context and presenting the other side of the controversy. Surely at this point in time (2022) we should all know that Mattick's views are on the fringe and most transcripts really are junk RNA?