Wednesday, November 25, 2015

Selfish genes and transposons

Back in 1980, the idea that large fractions of animal and plant genomes could be junk was quite controversial. Although the idea was consistent with the latest developments in population genetics, most scientists were unaware of these developments. They were looking for adaptive ways of explaining all the excess DNA in these genomes.

Some scientists were experts in modern evolutionary theory but still wanted to explain "junk DNA." Doolittle & Sapienza, and Orgel & Crick, published back-to-back papers in the April 17, 1980 issue of Nature. They explained junk DNA by claiming that most of it was due to the presence of "selfish" transposons that were being selected and preserved because they benefited their own replication and transmission to future generations. They have no effect on the fitness of the organism they inhabit. This is natural selection at a different level.

This prompted some responses in later editions of the journal and then responses to the responses.

Here's the complete series ...
Doolittle, W.F., and Sapienza, C. (1980) Selfish genes, the phenotype paradigm and genome evolution. Nature, 284:601-603. [PDF]

Orgel, L.E., and Crick, F.H.C. (1980) Selfish DNA: the ultimate parasite. Nature, 284:604-607. [doi: 10.1038/284604a0]

Dover, G. (1980) Ignorant DNA? Nature, 285:618-619.

Cavalier-Smith, T. (1980) How selfish is DNA? Nature, 285:617-618. [doi: 10.1038/285617a0]

Orgel, L.E., Crick, F.H.C., and Sapienza, C. (1980) Selfish DNA. Nature, 288:645-646. [doi: 10.1038/288645a]

Dover, G., and Doolittle, W.F. (1980) Modes of genome evolution. Nature, 288:646-647.

Jain, H.K. (1980) Incidental DNA. Nature, 288:647-648.
These papers are widely misunderstood by people who have not read them carefully. The two main papers (top) are NOT arguing in favor of junk DNA and they are not the first papers to promote the idea of junk DNA. They are actually arguing AGAINST junk DNA. They propose that a large part of the extra DNA in the human genome actually consists of transposons that are there for a reason. They are examples of selfish genes.

The authors explain that there are multiple levels of selection. Transposons may not confer selective advantage at the level of the organism but they are selected at the level of the gene. They have a function. Most of them have genes and regulatory sequences and they make proteins. They are not junk, at least by my definition of junk.

All of the authors were aware of the fact that the selfish gene idea promoted by Richard Dawkins was inconsistent with junk DNA so they were looking for a way out.

We can see this most clearly in the article by Orgel and Crick who say,
In summary, there is a large amount of evidence which suggests, but does not prove, that much DNA in higher organisms is little better than junk. We shall assume, for the rest of this article, that this hypothesis is true. We therefore need to explain how such DNA arose in the first place and why it is not speedily eliminated, since, by definition, it contributes little or nothing to the fitness of the organism.
In other words, it looks like there might be a lot of junk DNA but this conflicts with a particular view of evolution so we need an explanation.

Orgel and Crick continue ...
The theory of natural selection, in its more general formulation, deals with the competition between replicating entities. It shows that, in such a competition, the more efficient replicators increase in number at the expense of their less efficient competitors. After a sufficient time, the most efficient replicators survive. The idea of selfish DNA is based firmly on this idea of natural selection, but it deals with selection in an unfamiliar context.

The familiar neo-darwinian theory is concerned with the competition between organisms in a population. At the level of molecular genetics it provides an explanation of the spread of 'useful' genes or DNA sequences within a population. ...

The idea of selfish DNA is different [i.e. non-Darwinian LAM]. It is again concerned with the spread of a given DNA within the genome. However, in the case of selfish DNA, the sequence which spreads makes no contribution to the phenotype of the organism, except insofar as it is a slight burden to the cell that contains it. Selfish DNA sequences may be transcribed in some cases and not in others. The spread of selfish DNA sequences within the genome can be compared to the spread of a not-too-harmful parasite within its host.
They are proposing an explanation of excess DNA based on the idea that much of it is selfish DNA and not junk. That idea is not part of traditional neo-Darwinism but it does involve natural selection, albeit at a different level.

Doolittle & Sapienza make pretty much the same point. Here's what they say in the abstract of their paper ...
Natural selection operating within genomes will inevitably result in the appearance of DNAs with no phenotypic expression whose only "function" is survival within genomes.
I don't know whether Crick, Orgel, and Sapienza thought that active transposons counted as junk DNA but I do know that's the view of Ford Doolittle. Less than 1% of the human genome consists of active transposons that are capable of transposing and preserving themselves in the genome as selfish parasites. The 1980 papers can be interpreted as evidence that accounts for the "junk" of that 1%.

I disagree with this interpretation of "junk." I think that as long as the DNA has a function at any level then it's not junk. I think that the small percentage of active, functional, transposons count as functional DNA and not junk [The Function Wars: Part II].

If the genes are preserved by natural selection (selfish DNA) then that's an argument AGAINST junk DNA, in my opinion. But no matter how you define function, the main point is that the presence of active transposons in the human genome does not explain junk DNA. It may explain where 50% of the junk came from since 50% of the genome is bits and pieces of defective (broken) transposons, but that doesn't explain why the junk is retained in some genomes and removed in others. And it certainly doesn't explain the origin and preservation of the remaining 40% of the genome that's junk but not related to transposons.

This point was made by Gabriel Dover (Dover, 1980) in a letter published in June 1980.
Considering the data in toto, we cannot escape the notion that the genome is subjected to a range of random processes that are capable of generating any conceivable pattern of organisation. Thus there is no clear predictive distinction of the outcome between selfish and stochastic processes. In addition, if the majority of the families have no effect on the phenotype and are also not 'parasitic' (sensu Orgel and Crick) then they are simply neutral and spread (or not) according to statistical processes of drift similar to neutral alleles.
Dover advocates that we use the term "ignorant" DNA to reflect the fact that there are many possible explanations for excess DNA and right now (1980) we are 'ignorant' of the correct explanation.

In another letter in the same issue, Thomas Cavalier-Smith argues for a different kind of adaptive explanation of "excess" DNA (Cavalier-Smith, 1980).
While it may be futile to look for a 'special function for every piece of DNA', it is a mistake to imply that selection within the genome for 'selfish DNA' (intragenomic selection) explains the C-value paradox. One should also not neglect existing evidence for the idea that the overall amount of DNA in the genome has definite (nucleotypic) effects on cellular and organismal phenotypes, which are of profound adaptive significance.
A few years later, Ford Doolittle commented on the motive behind his selfish gene paper ...
Having recklessly abandoned one long-held prejudice (that eukaryotic nuclear genomes are necessarily more "advanced" than eubacterial genomes), we were emboldened to discard another: that most or all of the sometimes huge amount of DNA which eukaryotes carry around with them arose through and is maintained by natural selection operating through phenotype, either at the level of the individual or the population. The suggestion that much of this excess DNA is "junk" (the product of no form of natural selection at all) had of course been made before. But C. Sapienza and I suggested, as did L. Orgel and F. Crick, that natural selection of a hitherto little-appreciated type ("nonphenotypic" or "intragenomic" selection) perforce must operate within genomes to produce and maintain DNAs which, by virtue of their sequences (or the products of those sequences) alone, are more likely to be perpetuated within genomes, regardless of their effect on phenotype. We described such DNAs as "selfish" or "parasitic," and considered transposable elements (which may comprise most or all of the so-called "middle-repetitive" component of most or all eukaryotic genomes) as the best examples, although there may well be others.
(Doolittle, 1982 p. 86)
Note how this explanation differs from the current explanation of junk DNA, which supposes that junk DNA has no function (it is neutral) and is maintained not by natural selection but by random genetic drift. Note also that we now distinguish between functional transposons (which are truly selfish, and not junk IMHO) and the vast majority of transposon-related sequences that are not functional by any definition.

The idea that active, functional, transposons could be maintained in a genome by natural selection of the selfish kind was fleshed out by Tomoko Ohta. She published a paper with the details according to population genetics (Ohta, 1983). The theory is complicated because too many transposons in a genome have to be deleterious, owing to the mutations caused by insertions. Also, the ability to act as a functional, active, transposon will gradually be lost by mutation and drift over the course of thousands of generations. The only way to maintain active transposons is by introducing new copies by duplication or import but there has to be a limit to how many functional transposons can be added to a genome.
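
The tension between losing active copies and adding new ones can be seen in a toy calculation. The sketch below is not Ohta's model. It is a deliberately simplified, deterministic recursion with made-up rates (the names `dup_rate` and `inactivation_rate` are hypothetical), and it ignores drift, deletion, and host selection. It is only meant to show why duplication has to roughly balance inactivation.

```python
def simulate_copies(active, defective, dup_rate, inactivation_rate, generations):
    """Toy recursion for the mean number of transposon copies per genome.

    Assumptions (illustrative only, not taken from Ohta 1983):
      - each active copy spawns dup_rate new active copies per generation
      - each active copy is inactivated by mutation at inactivation_rate
      - defective copies are never deleted and never transpose
    """
    for _ in range(generations):
        newly_duplicated = dup_rate * active
        newly_inactivated = inactivation_rate * active
        active = active + newly_duplicated - newly_inactivated
        defective = defective + newly_inactivated
    return active, defective


# Case 1: duplication is slower than inactivation.
# Active copies dwindle while defective copies pile up.
a1, d1 = simulate_copies(10.0, 0.0, dup_rate=1e-4,
                         inactivation_rate=1e-3, generations=5000)
print(f"slow duplication: active={a1:.2f}, defective={d1:.2f}")

# Case 2: duplication is faster than inactivation.
# Active copies grow without bound unless something else limits them.
a2, d2 = simulate_copies(10.0, 0.0, dup_rate=2e-3,
                         inactivation_rate=1e-3, generations=5000)
print(f"fast duplication: active={a2:.2f}, defective={d2:.2f}")
```

In the first case the active lineage fades away and leaves only defective copies behind. In the second case copy number keeps climbing, which is where the deleterious effect of insertions would have to come in to set a limit.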

The various theories are reviewed by Le Rouzic and Deceliere (2005). They conclude that the existence of transposons and their invasion of a genome has "a robust theoretical base," but there are still problems. According to Le Rouzic and Deceliere there is still no sound theory on why genomes reach an equilibrium and what prevents selfish transposons from taking over a genome and driving the species extinct.

Some people think that the presence of transposons helps a species evolve by providing an extra source of variation. This variation consists of beneficial effects of new insertions, such as providing new transcription sites, or of promoting gene rearrangements by providing recombination hotspots (or both). Those ideas have been around since the 1970s. For example, Barbara McClintock—the person who discovered eukaryotic transposable elements—had long believed that their purpose was to save genomes when organisms were exposed to a "shock" or a "challenge" (see McClintock, 1984).

It was partly in order to counter these adaptationist models that Doolittle and Sapienza published their paper as Ford Doolittle explained in a recent interview.
Back in 1980, people were talking about transposable elements as if their function was to speed evolution; that they exist because of their future utility. And I’ve never liked that kind of idea. I didn’t like it in terms of introns. And Dawkins had just published The Selfish Gene in 1978.

Carmen Sapienza, a student of mine who now works on eukaryotic imprinting, and I wrote a paper which was rejected by Science after seven referees. But we heard that Leslie Orgel and Francis Crick were working on something like this, so we sent it to them. They said, "If you submit it to Nature, we will tell Nature not to publish ours without publishing yours, and to publish yours first," etc., which was very nice.

That paper, seemingly now very simplistic, said you don’t need to suppose that transposable elements are there for the purpose of speeding evolution. These are selfish things, and natural selection will favor such elements that can make copies of themselves in genomes and then spread horizontally to other genomes within the species. These are basically parasites. I think many people would now accept this, but it was radical at the time.
The idea that some stretch of DNA could be present for one reason but subsequently co-opted for another purpose was a long-standing interest of Stephen Jay Gould. He and Elisabeth Vrba decided those sequences needed a name so they called them "exaptations" (Gould and Vrba, 1982). Transposons were one of their examples ...
The uses of repetitive DNA

For a few years after Watson and Crick elucidated the structure of DNA, many evolutionists hoped that the architecture of genetic material might fit all their presuppositions about evolutionary processes. The linear order of nucleotides might be the beads on a string of classical genetics: one gene, one enzyme; one nucleotide substitution, one minute alteration for natural selection to scrutinize. We are now, not even 20 years later, faced with genes in pieces, complex hierarchies of regulation and, above all, vast amounts of repetitive DNA. Highly repetitive, or satellite, DNA can exist in millions of copies; middle-repetitive DNA, with its tens to hundreds of copies, forms about one quarter of the genome in both Drosophila and Homo. What is all the repetitive DNA for (if anything)? How did it get there?

A survey of previous literature (Doolittle and Sapienza 1980; Gould 1981) reveals two emerging traditions of argument, both based on the selectionist assumption that repetitive DNA must be good for something if so much of it exists. One tradition (see Britten and Davidson 1971 for its locus classicus) holds that repeated copies are conventional adaptations, selected for an immediate role in regulation (by bringing previously isolated parts of the genome into new and favorable combinations, for example, when repeated copies disperse among several chromosomes). We do not doubt that conventional adaptation explains the preservation of much repeated DNA in this manner.

But many molecular evolutionists now strongly suspect that direct adaptation cannot explain the existence of all repetitive DNA: there is simply too much of it. The second tradition therefore holds that repetitive DNA must exist because evolution needs it so badly for a flexible future—as in the favored argument that "unemployed," redundant copies are free to alter because their necessary product is still being generated by the original copy (see Cohen 1976; Lewin 1975; and Kleckner 1977, all of whom also follow the first tradition and argue both sides). While we do not doubt that such future uses are vitally important consequences of repeated DNA, they simply cannot be the cause of its existence, unless we return to certain theistic views that permit the control of present events by future needs. [my emphasis - LAM]

This second tradition expresses a correct intuition in a patently nonsensical (in its nonpejorative meaning) manner. [my emphasis - LAM] The missing thought that supplies sense is a well articulated concept of exaptation. Defenders of the second tradition understand how important repetitive DNA is to evolution, but only know the conventional language of adaptation for expressing this conviction. But since utility is a future condition (when the redundant copy assumes a different function or undergoes secondary adaptation for a new role), an impasse in expression develops. To break this impasse, we might suggest that repeated copies are nonapted features, available for cooptation later, but not serving any direct function at the moment. [my emphasis - LAM] When coopted, they will be exaptations in their new role (with secondary adaptive modifications if altered).

What then is the source of these exaptations? According to the first tradition, they arise as true adaptations and later assume their different function. The second tradition, we have argued, must be abandoned. A third possibility has recently been proposed (or, rather, better codified after previous hints): perhaps repeated copies can originate for no adaptive reason that concerns the traditional Darwinian level of phenotypic advantage (Orgel and Crick 1980; Doolittle and Sapienza 1980). Some DNA elements are transposable; if these can duplicate and move, what is to stop their accumulation as long as they remain invisible to the phenotype (if they become so numerous that they begin to exert an energetic constraint upon the phenotype, then natural selection will eliminate them)? Such "selfish DNA" may be playing its own Darwinian game at a genic level, but it represents a true nonaptation at the level of the phenotype. Thus, repeated DNA may often arise as a nonaptation. Such a statement in no way argues against its vital importance for evolutionary futures. When used to great advantage in that future, these repeated copies are exaptations.
My view, for what it's worth, is that the existence of active transposons is adaptive at the level of the gene (selfish DNA) and those transposons are NOT junk DNA. They are part of the functional fraction of the genome.

Over time, these active transposons acquire mutations so that they are no longer capable of self-catalyzed transposition. At that point they become pseudogenes and they qualify as junk DNA. This DNA, which accounts for almost 50% of the human genome, is not maintained by natural selection acting on selfish DNA. Because it is slightly deleterious, it will likely be purged from the genome by negative selection in species with large population sizes (e.g. bacteria). In species with smaller population sizes (e.g. mammals) it will eventually be purged by random genetic drift, but this process is very slow, too slow to counter the creation of new defective transposon sequences.
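
One way to put rough numbers on the population-size argument is Kimura's diffusion approximation for the probability that a new mutation reaches fixation. This is a hedged sketch, not a calculation from any of the papers cited here. The selection coefficient and the two population sizes are illustrative assumptions, the formula assumes a diploid population whose effective size equals its census size (which bacteria are not), and the function name is hypothetical.

```python
import math


def fixation_probability(s, N):
    """Kimura's approximation for a new mutation at initial frequency 1/(2N).

    s: selection coefficient (negative means deleterious)
    N: population size, assumed equal to the effective population size
    """
    if s == 0:
        return 1.0 / (2.0 * N)  # neutral case: pure drift
    exponent = -4.0 * N * s
    if exponent > 700.0:
        # exp() would overflow here; the probability is effectively zero
        return 0.0
    # (1 - e^(-2s)) / (1 - e^(-4Ns)), written with expm1 for accuracy
    return math.expm1(-2.0 * s) / math.expm1(exponent)


s = -1e-5  # assumed cost of one defective insertion (illustrative)
for label, N in [("small population, N = 1e4", 1e4),
                 ("large population, N = 1e8", 1e8)]:
    relative = fixation_probability(s, N) / fixation_probability(0.0, N)
    print(f"{label}: fixation probability relative to neutral = {relative:.3g}")
```

With these assumed values the insertion behaves almost like a neutral allele in the small population, while in the large population selection keeps it from ever fixing. That is the sense in which the same slightly deleterious sequence can accumulate in one kind of genome and be removed from another.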

The opposing view is that these sequences of defective transposons act as the raw material of innovative evolution since they still may contain active promoters (e.g. LTRs) or enhancers. Because of their frequency in certain genomes, they can enhance genome rearrangements. One of the most vocal proponents of this idea is James Shapiro who claims that it's "Natural Genetic Engineering" [see James Shapiro Responds to My Review of His Book].

I agree with Gould and Vrba that this view expresses its point in a "patently nonsensical (in its nonpejorative meaning) manner," although I don't know too many people who think that it's "nonpejorative" when their ideas are called "patently nonsensical."

As usual, if you want to learn more about this subject consult Michael Lynch and his book The Origins of Genome Architecture. (This is where my personal views come from.) (See also Lynch et al. 2011.) Chapter 7 covers all the theories and summarizes all the data on the evolution of transposons. I really like his argument against the beneficial effect of transposons. The point he's making is that if transposons are there to help evolution then it seems like a very sloppy way to do it. It would be much easier to select for host mutations that do the job.
Summarizing to this point, three independent sets of observations point to the net deleterious nature of mobile elements: direct observations on the average effects of insertions; population surveys of insertion frequencies and element age distributions; and the paucity of such elements in hosts with large population sizes. Nonetheless, a number of investigators prefer to emphasize the beneficial side of such elements (e.g. Kidwell and Lisch 2000, 2002; Wessler 2006), a view derived in part from the ideas promoted by the discoverer of mobile elements, who suggested that element activation during times of stress enables host genomes to modify themselves in potentially favorable ways (McClintock 1984). Perhaps the simplest and most compelling argument opposing this view derives from the fact that transposition and retrotransposition factors are virtually always encoded by the elements themselves, not by the host genome. Because numerous nonautonomous elements are known to be activated in trans, it is clear that host-encoded mobilization proteins could be relied on to regulate overall element activity if selection favored such a mutation.

Doolittle, W.F. (1982) Evolutionary molecular biology: where is it going? Can. J. Biochem. 60:83-90.

Gould, S.J., and Vrba, E.S. (1982) Exaptation-A Missing Term in the Science of Form. Paleobiology, 8:4-15. [PDF]

Lynch, M., Bobay, L.-M., Catania, F., Gout, J.-F., and Rho, M. (2011) The repatterning of eukaryotic genomes by random genetic drift. Annu. Rev. Genomics Hum. Genet., 12:347-366. [doi: 10.1146/annurev-genom-082410-101412]

McClintock, B. (1984) The Significance of Responses of the Genome to Challenge. Science, 226:792-801. [PDF]

Ohta, T. (1983) Theoretical study on the accumulation of selfish DNA. Genetical Research, 41:1-15. [doi: 10.1017/S0016672300021029]

Le Rouzic, A., and Deceliere, G. (2005) Models of the population genetics of transposable elements. Genetical Research, 85:171-181. [doi: 10.1017/S0016672305007585]


  1. @Larry
    My understanding is that non-autonomous elements (like Alu), although defective, can certainly proliferate by hijacking the endonuclease/RT activity of functional elements. They are, in effect, secondary parasites. I would guess that elements that can still be mobilised this way are still functional by your definition, right?

    1. No, Alu elements are not selfish genes and they do not have a function. It's true that some of them are transcribed and it's true that the transcript can be copied into DNA and reintegrated into the genome but that's also true of the transcripts of every gene and of many pseudogenes. That process is an accident. It doesn't make the pseudogene functional in any meaningful sense of the word "function."

      The best way to think about function is to view it as something that is subject to positive natural selection at some level.

    2. But not many pseudogenes are populating the genomes of primates in quite the numbers that Alu elements do. That doesn't look like random genetic drift to me. I wouldn't be surprised at all if Alu is in fact subject to positive selection for increased amplification efficiency. Is the Alu sequence evolving at neutral rates?

    3. OK, some googling turned up this (old) paper, that suggests that at least the secondary structure is conserved:

      All this points to the Alu RNA as a target of selection during Alu evolution and suggests that its secondary structure is an important factor in proliferation of Alu sequences, consistent with their amplification by retroposition.

    4. I'm not aware of any convincing evidence that Alu RNAs have a biological function. Only a small percentage of Alu elements are transcribed (about 10%) and many of those are insertions within introns.

      Only about 100 different Alu's are actively generating new inserts and as far as I know there's nothing special about those families other than the fact they produce RNA. All of the older families appear to be acquiring and fixing mutations at a rate that's consistent with nonfunctional sequences although with such a small sequence (300 bp) there's a great deal of variation.

      The original Alu insertion occurred long before the primates split from rodents but there was a massive burst of Alu insertions about 60 Myr ago correlating with a burst of retrotransposon insertions. There's nothing about that event that makes me suspect natural selection was involved. Why do YOU think so?

    5. "Alu elements are not selfish genes and they do not have a function" - surely you mean that nontransposing _Alu_ elements have no function. We see _Alu_ in large quantities in primate genomes because of a property of its base-pair sequence, which means that this particular sequence is replicated more often than another arbitrary sequence of 300bp. Surely that's the *definition* of having a selective advantage?

    6. @Larry
      You chose to include selfish genetic elements as functional sequences, and I think that non-autonomous elements, like Alu, qualify. Even though individual elements suffer mutational inactivation, the lineage has spread and survived a long time. The secondary structure of the transcripts, required for mobilization, appears to be conserved, which suggests the action of purifying selection.
      In fact, I don't see why you chose to distinguish between autonomous and non-autonomous elements. Both need to parasitize on the replication machinery present in the cell, and as long as they are capable of mobilisation they will be subject to natural selection for increased transmission rates. If the former qualify as functional DNA, so do the latter.

    7. @Corneel, @Yan Wong

      You both make a good point. If active transposons are functional by virtue of the fact they act as selfish DNA then those non-autonomous elements that are propagating in the genome should also be "functional." That would include the 100 or so active AluY sequences.

      I tentatively agree with you that this is a gray area in the function wars and there's no correct answer.

      But let's not lose sight of the fact that AluY elements make up only a minuscule percentage of the genome so we're quibbling about details.

      BTW, I don't think there's any good evidence that Alu elements are being selected to preserve the secondary structure of their RNA. Do you have a reference?

    8. The link to the reference is embedded in my previous post from November 26, 11:31:00 AM (under the word "this"). Here is the naked link too:
      True, this makes only a small dent in the amount of junk-DNA. Personally, I would include all selfish elements in the junk-DNA anyway, but I guess that's a matter of taste.