But those aren't the only problems. We also have to take into consideration the biases of the people who write the papers. Some of those biases are the same ones we see in other situations, but they are less obvious in the case of large-scale genome studies.
Laurence Hurst has written up a nice summary of the problem and I'd like to quote from his recent paper (Hurst, 2009).
In the 1970s and 80s there was a large school of evolutionary biology, much of it focused on understanding animal behavior, that to a first approximation assumed that whatever trait was being looked at was the product of selection. Richard Dawkins is probably the most widely known advocate for this school of thought, John Maynard Smith and Bill (WD) Hamilton its main proponents. The game played in this field was one in which ever more ingenious selectionist hypotheses would be put forward and tested. The possibility that selection might not be the answer was given short shrift.

Sandwalk readers will be familiar with this problem. In the context of genome studies, the adaptationist approach is most often reflected as a bias in favor of treating all observations as evidence of functionality. If you detect it, then it must have been selected. If it was selected, it must be important.
By contrast, during the same period non-selectionist theories were gaining ground as the explanatory principle for details seen at the molecular level. According to these models, chance plays an important part in determining the fate of a new mutation – whether it is lost or spreads through a population. Just as a neutrally buoyant particle of gas has an equal probability of diffusing up or down, so too in Motoo Kimura's neutral theory of molecular evolution an allele with no selective consequences can go up or down in frequency, and sometimes replace all other versions in the population (that is, it reaches fixation). An important extension of the neutral theory (the nearly-neutral theory) considers alleles that can be weakly deleterious or weakly advantageous. The important difference between the two theories is that in a very large population a very weakly deleterious allele is unlikely to reach fixation, as selection is given enough opportunity to weed out alleles of very small deleterious effects. By contrast, in a very small population a few chance events increasing the frequency of an allele can be enough for fixation. More generally then, in large populations the odds are stacked against weakly deleterious mutations and so selection should be more efficient in large populations.
In this framework, mutations in protein-coding genes that are synonymous – that is, that replace one codon with another specifying the same amino acid and, therefore, do not affect the protein – or mutations in the DNA between genes (intergene spacers) are assumed to be unaffected by selection. Until recently, a neutralist position has dominated thinking at the genomic/molecular level. This is indeed reflected in the use of the term 'junk DNA' to describe intergene spacer DNA.
These two schools of thought then could not be more antithetical. And this is where genome evolution comes in. The big question for me is just what is the reach of selection. There is little argument about selection as the best explanation for gross features of organismic anatomy. But what about more subtle changes in genomes? Population genetics theory can tell you that, in principle, selection will be limited when the population comprises few individuals and when the strength of selection against a deleterious mutation is small. But none of this actually tells you what the reach of selection is, as a priori we do not know what the likely selective impact of any given mutation will be, not least because we cannot always know the consequences of apparently innocuous changes. The issue then becomes empirical, and genome evolution provides a plethora of possible test cases. In examining these cases we can hope to uncover not just what mutations selection is interested in, but also to discover why, and in turn to understand how genomes work. Central to the issue is whether our genome is an exquisite adaptation or a noisy error-prone mess.
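The population-genetics point buried in that passage is easy to make concrete. Kimura's standard diffusion approximation gives the probability that a new mutation eventually reaches fixation, and plugging in a weakly deleterious selection coefficient shows how quickly drift loses out to selection as population size grows. Here's a minimal sketch; the population sizes and the selection coefficient are arbitrary values chosen for illustration, not numbers from Hurst's paper.

```python
import math

def fixation_probability(N, s):
    """Kimura's diffusion approximation for the probability that a single new
    mutation (initial frequency 1/(2N)) with selection coefficient s eventually
    fixes in a diploid population of effective size N."""
    if abs(4 * N * s) < 1e-8:          # effectively neutral: probability is 1/(2N)
        return 1.0 / (2 * N)
    return (1 - math.exp(-2 * s)) / (1 - math.exp(-4 * N * s))

# A weakly deleterious mutation (s = -0.0001) behaves almost like a neutral one
# in a small population but is essentially never fixed in a large one.
for N in (100, 1_000, 10_000, 100_000):
    print(f"N = {N:>7,}   neutral: {1 / (2 * N):.2e}   "
          f"weakly deleterious: {fixation_probability(N, -0.0001):.2e}")
```

With these numbers, the weakly deleterious allele fixes almost as often as a neutral one when the effective population size is a few hundred, but its fixation probability is essentially zero at an effective size of 100,000. That's the sense in which selection "reaches" further in large populations.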
As Hurst points out, the real question in evaluating genome studies boils down to a choice between an exquisitely adapted genome and one that is messy and full of mistakes. The battlefields are studies of the frequency of alternative splicing, the extent of transcription, the importance of small RNAs, and the significance of binding sites for regulatory proteins.
Let's take transcription studies as an example.
Consider, for example, the problem of transcription. Although maybe only 5% of the human genome comprises genes encoding proteins, the great majority of the DNA in our genome is transcribed into RNA [1]. In this the human genome is not unusual. But is all this transcription functionally important? The selectionist model would propose that the transcription is physiologically relevant. Maybe the transcripts specify previously unrecognized proteins. If not, perhaps the transcripts are involved in RNA-level regulation of other genes. Or the process of transcription may be important in keeping the DNA in a configuration that enables or suppresses transcription from closely linked sites.
The alternative model suggests that all this excess transcription is unavoidable noise resulting from promiscuity of transcription-factor binding. A solid defense can be given for this. If you take 100 random base pairs of DNA and ask what proportion of the sequence matches some transcription factor binding site in the human genome, you find that upwards of 50% of the random sequence is potentially bound by transcription factors and that there are, on average, 15 such binding sites per 100 nucleotides. This may just reflect our poor understanding of transcription factor binding sites, but it could also mean that our genome is mostly transcription factor binding site. If so, transcription everywhere in the genome is just so much noise that the genome must cope with.
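Hurst's "random DNA looks like binding sites" argument is easy to get a feel for with a toy scan. The sketch below generates random 100 bp sequences and counts matches to a few made-up consensus motifs. It will not reproduce the numbers quoted above, because real analyses scan with the full repertoire of short, degenerate binding-site models, which is exactly why the real coverage is so much higher; everything in the script, including the motifs, is invented for illustration.

```python
import random

random.seed(1)

# A handful of made-up consensus motifs standing in for transcription factor
# binding sites. These are NOT real binding-site models; a serious analysis
# would scan with hundreds of short, degenerate motifs.
TOY_MOTIFS = ["TATAA", "CAAT", "CACGTG", "GGGCGG", "TGACTCA"]

def scan(seq, motifs):
    """Count motif matches in seq and the fraction of positions they cover."""
    hits, covered = 0, set()
    for motif in motifs:
        start = seq.find(motif)
        while start != -1:
            hits += 1
            covered.update(range(start, start + len(motif)))
            start = seq.find(motif, start + 1)
    return hits, len(covered) / len(seq)

n_seqs, length = 1000, 100
total_hits = total_cov = 0.0
for _ in range(n_seqs):
    seq = "".join(random.choice("ACGT") for _ in range(length))
    h, c = scan(seq, TOY_MOTIFS)
    total_hits += h
    total_cov += c

print(f"average motif hits per {length} bp of random DNA: {total_hits / n_seqs:.2f}")
print(f"average fraction of positions inside a 'binding site': {total_cov / n_seqs:.1%}")
```

The only point of the exercise is that short motifs inevitably turn up by chance in random sequence. Scale the motif collection up to hundreds of transcription factors with sloppy binding preferences and spurious binding sites, and hence spurious transcription, become very hard to avoid.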
There is no definitive solution to the conflict between these two models. Both sides have passionate advocates and right now you can't choose one over the other. My own bias is that most of the transcription is just noise—it is not biologically relevant.

That's not the point, however. The point is that as a reader of the scientific literature you have to make up your mind whether the data and the interpretation are believable.
Here are two criteria that I use to evaluate a paper on genome-level transcription.
- I look to see whether the authors are aware of the adaptation vs. noise controversy. If they completely ignore the possibility that what they are looking at could be transcriptional noise, then I tend to dismiss the paper. It is not good science to ignore alternative hypotheses. Furthermore, such papers will hardly ever have controls or experiments that attempt to falsify the adaptationist interpretation. That's because the authors are unaware that a controversy exists.¹
- Does the paper give details about the abundance of individual transcripts? If the paper is making the case for functional significance, then one of the important bits of evidence is the abundance of the rare transcripts. If the authors omit this information, or skim over it quickly, then you should be suspicious. Many of these rare transcripts are present at less than one or two copies per cell, and that's perfectly consistent with transcriptional noise—even if it's only one cell type that's expressing the RNA. There aren't many functional roles for an RNA whose concentration is in the nanomolar range or below (see the rough calculation after this list). Critical thinkers will have thought about the problem and be prepared to address it head-on.
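To see why copy number matters, here's a rough back-of-the-envelope conversion from copies per cell to concentration. The cell volume is an assumption (about 2 picolitres, in the range of a typical mammalian cell); any reasonable choice gives the same qualitative answer.

```python
# Back-of-the-envelope: what concentration is "one or two copies per cell"?
# The cell volume is an assumption (~2 picolitres, roughly a mammalian cell);
# the qualitative conclusion doesn't depend on the exact number chosen.
AVOGADRO = 6.022e23            # molecules per mole
CELL_VOLUME_L = 2e-12          # assumed cell volume in litres (~2 pL)

for copies in (1, 2, 10, 100):
    molar = copies / (AVOGADRO * CELL_VOLUME_L)
    print(f"{copies:>4} copies/cell  ≈  {molar * 1e12:6.2f} pM")
```

One or two copies per cell works out to a picomolar or two, well below the concentrations typical of well-characterized functional RNAs, which is exactly the kind of number you'd expect from transcriptional noise.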
1. Or, maybe they know there's a controversy but they don't want you to be thinking about it as you read their paper. Or, maybe they think the issue has been settled and the "messy" genome advocates have been routed. Either way, these are not authors you should trust.
Hurst, L.D. (2009) Evolutionary genomics and the reach of selection. Journal of Biology 8:12 [DOI:10.1186/jbiol113]