
Tuesday, April 21, 2009

How to Evaluate Genome Level Transcription Papers

It's often very difficult to evaluate the results of large-scale genome studies. Part of the problem is that the technology is complicated and the controls are not obvious. Part of the problem is that the results depend a great deal on the software used to analyze the data and the limitations of the software are often not described.

But those aren't the only problems. We also have to take into consideration the biases of the people who write the papers. Some of those biases are the same ones we see in other situations except that they are less obvious in the case of large-scale genome studies.

Laurence Hurst has written up a nice summary of the problem and I'd like to quote from his recent paper (Hurst, 2009).
In the 1970s and 80s there was a large school of evolutionary biology, much of it focused on understanding animal behavior, that to a first approximation assumed that whatever trait was being looked at was the product of selection. Richard Dawkins is probably the most widely known advocate for this school of thought, John Maynard Smith and Bill (WD) Hamilton its main proponents. The game played in this field was one in which ever more ingenious selectionist hypotheses would be put forward and tested. The possibility that selection might not be the answer was given short shrift.

By contrast, during the same period non-selectionist theories were gaining ground as the explanatory principle for details seen at the molecular level. According to these models, chance plays an important part in determining the fate of a new mutation – whether it is lost or spreads through a population. Just as a neutrally buoyant particle of gas has an equal probability of diffusing up or down, so too in Motoo Kimura's neutral theory of molecular evolution an allele with no selective consequences can go up or down in frequency, and sometimes replace all other versions in the population (that is, it reaches fixation). An important extension of the neutral theory (the nearly-neutral theory) considers alleles that can be weakly deleterious or weakly advantageous. The important difference between the two theories is that in a very large population a very weakly deleterious allele is unlikely to reach fixation, as selection is given enough opportunity to weed out alleles of very small deleterious effects. By contrast, in a very small population a few chance events increasing the frequency of an allele can be enough for fixation. More generally then, in large populations the odds are stacked against weakly deleterious mutations and so selection should be more efficient in large populations.

In this framework, mutations in protein-coding genes that are synonymous – that is, that replace one codon with another specifying the same amino acid and, therefore, do not affect the protein – or mutations in the DNA between genes (intergene spacers) are assumed to be unaffected by selection. Until recently, a neutralist position has dominated thinking at the genomic/molecular level. This is indeed reflected in the use of the term 'junk DNA' to describe intergene spacer DNA.

These two schools of thought then could not be more antithetical. And this is where genome evolution comes in. The big question for me is just what is the reach of selection. There is little argument about selection as the best explanation for gross features of organismic anatomy. But what about more subtle changes in genomes? Population genetics theory can tell you that, in principle, selection will be limited when the population comprises few individuals and when the strength of selection against a deleterious mutation is small. But none of this actually tells you what the reach of selection is, as a priori we do not know what the likely selective impact of any given mutation will be, not least because we cannot always know the consequences of apparently innocuous changes. The issue then becomes empirical, and genome evolution provides a plethora of possible test cases. In examining these cases we can hope to uncover not just what mutations selection is interested in, but also to discover why, and in turn to understand how genomes work. Central to the issue is whether our genome is an exquisite adaption or a noisy error-prone mess.
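The nearly-neutral argument in the quoted passage can be illustrated numerically with Kimura's diffusion approximation for the fixation probability of a new mutant. This is only a sketch: the population sizes and selection coefficient below are arbitrary illustrative values, not estimates for any real species.

```python
import math

def fixation_prob(N, s):
    """Kimura's diffusion approximation for the fixation probability of a
    new semidominant mutant (initial frequency 1/(2N)) in a diploid
    population of effective size N, with selection coefficient s.
    In the limit s -> 0 this reduces to the neutral result 1/(2N)."""
    if s == 0:
        return 1.0 / (2 * N)
    return (1 - math.exp(-2 * s)) / (1 - math.exp(-4 * N * s))

# A weakly deleterious allele (s = -0.0001) in small vs. large populations:
for N in (100, 1_000, 100_000):
    neutral = fixation_prob(N, 0)
    deleterious = fixation_prob(N, -1e-4)
    print(f"N={N:>7}: neutral={neutral:.2e}  "
          f"deleterious={deleterious:.2e}  ratio={deleterious/neutral:.3f}")
```

At N = 100 the weakly deleterious allele fixes almost as often as a neutral one, while at N = 100,000 its fixation probability collapses toward zero: exactly the sense in which selection is "more efficient in large populations."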
Sandwalk readers will be familiar with this problem. In the context of genome studies, the adaptationist approach most often shows up as a bias toward treating every observation as evidence of functionality. If you detect it, it must have been selected. If it was selected, it must be important.

As Hurst points out, the real question in evaluating genome studies boils down to a choice between an exquisitely adapted genome or one that is messy and full of mistakes. The battlefields are studies on the frequency of alternative splicing, transcription, the importance of small RNAs, and binding sites for regulatory proteins.

Let's take transcription studies as an example.
Consider, for example, the problem of transcription. Although maybe only 5% of the human genome comprises genes encoding proteins, the great majority of the DNA in our genome is transcribed into RNA [1]. In this the human genome is not unusual. But is all this transcription functionally important? The selectionist model would propose that the transcription is physiologically relevant. Maybe the transcripts specify previously unrecognized proteins. If not, perhaps the transcripts are involved in RNA-level regulation of other genes. Or the process of transcription may be important in keeping the DNA in a configuration that enables or suppresses transcription from closely linked sites.

The alternative model suggests that all this excess transcription is unavoidable noise resulting from promiscuity of transcription-factor binding. A solid defense can be given for this. If you take 100 random base pairs of DNA and ask what proportion of the sequence matches some transcription factor binding site in the human genome, you find that upwards of 50% of the random sequence is potentially bound by transcription factors and that there are, on average, 15 such binding sites per 100 nucleotides. This may just reflect our poor understanding of transcription factor binding sites, but it could also mean that our genome is mostly transcription factor binding site. If so, transcription everywhere in the genome is just so much noise that the genome must cope with.
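Hurst's back-of-envelope claim about random DNA and binding sites can be mimicked with a toy Monte Carlo simulation. Everything here is invented for illustration: the twenty degenerate 6-bp "motifs" are randomly generated, not real transcription-factor sites, so the exact numbers mean nothing. The point is only how quickly a panel of short, degenerate motifs saturates random sequence.

```python
import random
import re

random.seed(2)
BASES = "ACGT"

# Build a panel of hypothetical degenerate 6-bp motifs.  Each position
# tolerates two of the four bases, crudely mimicking the degeneracy of
# real transcription-factor binding sites.  These are NOT real motifs.
def random_motif(k=6):
    return "".join("[" + "".join(random.sample(BASES, 2)) + "]"
                   for _ in range(k))

motifs = [re.compile(random_motif()) for _ in range(20)]

def scan(seq):
    """Count overlapping motif matches and the fraction of positions
    falling inside at least one match."""
    covered = set()
    n_hits = 0
    for pat in motifs:
        for i in range(len(seq) - 5):
            if pat.match(seq, i):
                n_hits += 1
                covered.update(range(i, i + 6))
    return n_hits, len(covered) / len(seq)

# Average over many random 100-bp sequences.
trials = 500
hits = cov = 0.0
for _ in range(trials):
    seq = "".join(random.choice(BASES) for _ in range(100))
    h, c = scan(seq)
    hits += h
    cov += c
print(f"mean matches per 100 bp: {hits/trials:.1f}")
print(f"mean fraction of positions inside a match: {cov/trials:.2f}")
```

With these toy parameters, random sequence is riddled with matches and most positions sit inside at least one "binding site," which is the flavor of Hurst's observation: short degenerate motifs are essentially unavoidable in random DNA.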
There is no definitive solution to this conflict. Both sides have passionate advocates and right now you can't choose one over the other. My own bias is that most of the transcription is just noise—it is not biologically relevant.

That's not the point, however. The point is that as a reader of the scientific literature you have to make up your mind whether the data and the interpretation are believable.

Here are two criteria that I use to evaluate a paper on genome level transcription.
  1. I look to see whether the authors are aware of the adaptation vs. noise controversy. If they completely ignore the possibility that what they are looking at could be transcriptional noise, then I tend to dismiss the paper. It is not good science to ignore alternative hypotheses. Furthermore, such papers will hardly ever have controls or experiments that attempt to falsify the adaptationist interpretation, because the authors are unaware that a controversy exists.1
  2. Does the paper give details about the abundance of individual transcripts? If the paper is making the case for functional significance, then one important piece of evidence is the abundance of the rare transcripts. If the authors omit this information, or skim over it quickly, then you should be suspicious. Many of these rare transcripts are present at less than one or two copies per cell, and that's perfectly consistent with transcriptional noise—even if only one cell type expresses the RNA. There aren't many functional roles for an RNA whose concentration is in the nanomolar range. Critical thinkers will have thought about the problem and be prepared to address it head-on.
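The concentration argument in point 2 is easy to check with a back-of-envelope calculation. The cell volumes below are rough order-of-magnitude assumptions, not measurements; the point is simply that one transcript per cell works out to roughly nanomolar concentration in a bacterium and only picomolar in a much larger mammalian cell.

```python
AVOGADRO = 6.022e23  # molecules per mole

def molar_concentration(copies_per_cell, cell_volume_litres):
    """Convert an absolute copy number per cell into a molar concentration."""
    return copies_per_cell / (AVOGADRO * cell_volume_litres)

# Illustrative cell volumes (order-of-magnitude assumptions only):
volumes = {
    "bacterium (~1 fL)": 1e-15,
    "yeast cell (~50 fL)": 50e-15,
    "mammalian cell (~2 pL)": 2e-12,
}
for label, v in volumes.items():
    c = molar_concentration(1, v)
    print(f"1 transcript/cell in a {label}: {c:.1e} M")
```

Either way, a transcript at one or two copies per cell sits at the very bottom of the concentration range at which most known RNA functions operate.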

1. Or, maybe they know there's a controversy but they don't want you to be thinking about it as you read their paper. Or, maybe they think the issue has been settled and the "messy" genome advocates have been routed. Either way, these are not authors you should trust.

Hurst, L.D. (2009) Evolutionary genomics and the reach of selection. Journal of Biology 8:12 [DOI:10.1186/jbiol113]


ERV said...

Question: What about 'translational' noise? Does that exist? Or do you think once you get to the protein stage, the cell has 'invested' too much for it to be waste, thus it's 'doing something'?

Eric Pedersen said...

Has anyone tried building a null model of expected rates of accidental transcription, given positional information on a chromosome or something similar? My background in molecular biology is limited to a couple of courses, but it still seems like this would be the kind of tool someone would have worked on...
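A minimal null model along the lines Eric suggests can at least be sketched at the back-of-envelope level. Assume, purely for illustration, that spurious initiation events land uniformly and independently at random along the genome and each produces a transcript of fixed length; the genome size and transcript length below are round assumed numbers, not data.

```python
def expected_coverage(genome_size, n_spurious_starts, transcript_len):
    """Expected fraction of the genome covered by at least one transcript,
    assuming start sites fall uniformly and independently at random.
    A given base is covered by one transcript with probability L/G, so the
    chance it is missed by all N transcripts is (1 - L/G)**N."""
    per_base_miss = (1 - transcript_len / genome_size) ** n_spurious_starts
    return 1 - per_base_miss

# Human-genome-scale numbers, 2-kb transcripts (both are assumptions):
G, L = 3.0e9, 2_000
for n in (1e5, 1e6, 1e7):
    print(f"{int(n):>10,} spurious transcripts -> "
          f"{expected_coverage(G, n, L):.1%} of genome covered")
```

Under these toy assumptions, a few million spurious transcripts accumulated across many cell types and conditions suffice to make "most of the genome is transcribed" come out true, which is why that observation by itself does not discriminate between function and noise.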

DK said...

Question: What about 'translational' noise? Does that exist?

Of course it exists. Noise always exists, everywhere; it's just a matter of degree and of how much it matters.

For normal cell physiology, translational noise is a non-issue; it is at least as insignificant as polymerase errors.

Luckily, technology does not yet allow 'omics folks to quantify every mistranslated peptide at 10^(-6) abundance and claim its functional importance.


Anonymous said...

Hi, I don't have access to the article: Hurst, L.D. (2009) Evolutionary genomics and the reach of selection. Journal of Biology 8:12.

This sentence from the article is interesting:

"Although maybe only 5% of the human genome comprises genes encoding proteins, the great majority of the DNA in our genome is transcribed into RNA [1]".

Could anyone let me know what is that reference ([1])?

This kind of information turns my ideas about genes and genomes upside down. I have always thought that junk DNA is not transcribed.

-DG said...

Since I come at molecular evolution from a protist angle, I think an important point often missed in regards to neutrality is that not all synonymous mutations are necessarily neutral by definition. Especially in genomes undergoing reduction (or small genomes in general, like viruses) there is a clear codon usage bias and even codon pair bias. These biases also seem to correlate, in many cases, with the relative abundance of certain tRNAs, suggesting an effect on translational efficiency.

Right now I'm in the camp that leans towards much of this being transcriptional noise, but I don't think the prevalence and relative importance of small RNAs can be overlooked. Look at RNA editing as an example beyond just alternative splicing. Evolution has taken some bizarre twists and turns at the molecular level, but interestingly enough the existence of all of these RNAs just pushes me further in the "messy" direction as opposed to the "perfectly adapted" direction.

-DG said...

Oh, forgot to add for Abbie: I'm sure translational noise exists, just as transcriptional noise does.

Larry Moran said...

lazyelephant asks,

Could anyone let me know what is that reference ([1])?

The reference is Kapranov et al. (2007) Nat. Rev. Genet. 8:413.

There are lots of other papers published since then. We've known since the 1970s that significant amounts of the genome, including repetitive sequences, are transcribed at low levels.

The recent flurry of activity is based on chip technology and not Rot analysis. The interpretation of the results is very different from the consensus in the 1970s.

Many modern scientists seem to be worried about the low number of genes in the human genome, and they are looking for ways to explain what they think is a paradox; namely, that the complexity of humans isn't reflected in the number of genes. That's what's behind many of the claims of massive alternative splicing, an adaptive role for transposons, and abundant functional non-coding RNAs.

It's what I call The Deflated Ego Problem.

Anonymous said...

I thought the Deflated Ego Problem was older scientists pooh-poohing new ideas and data as "stuff we've known for the last 30 or 40 years" because the older scientists didn't come up with it 30 or 40 years ago and the thought that science goes on in important directions that differ from their pre-conceived expectations is difficult to bear. :)

"New scientific truth usually becomes accepted, not because opponents become convinced, but because opponents die, and because the rising generation is familiar with the new truth at the outset"
-- Max Planck, Naturwissenschaften 33 (1946), p. 230.

Larry Moran said...

Anonymous says,

I thought the Deflated Ego Problem was older scientists pooh-poohing new ideas and data as "stuff we've known for the last 30 or 40 years" because the older scientists didn't come up with it 30 or 40 years ago and the thought that science goes on in important directions that differ from their pre-conceived expectations is difficult to bear. :)

30 or 40 years ago we thought that noise was a property of living systems so we weren't all that surprised to find that much of the genome was transcribed at low levels every so often.

The data isn't new, except that now we can identify the exact regions that are being transcribed.

What's new is the interpretation. I don't object to a different interpretation (i.e. the rare RNAs are functional). What I object to is the fact that in advocating their pet hypothesis, many of the scientists are completely ignoring any other possibility—such as messy biology. That's not resisting change, that's objecting to bad science.

zumb said...

How difficult would it be to engineer a genome (smaller than the human genome, but still containing many UTR sequences) with most of its "junk DNA" deleted, and then test whether it remains viable?
I believe such an experiment would answer once and for all whether "transcriptional noise" is necessary for the proper function of the organism.