Sandwalk: Vertebrate Complexity Is Explained by the Evolution of Long-Range Interactions that Regulate Transcription?

Friday, November 01, 2013

Vertebrate Complexity Is Explained by the Evolution of Long-Range Interactions that Regulate Transcription?

The Deflated Ego Problem is a very serious problem in molecular biology. It refers to the fact that many molecular biologists were puzzled and upset to learn that humans have about the same number of genes as all other multicellular eukaryotes. The "problem" is often introduced by stating that the experts working on the human genome project expected at least 100,000 genes but were "shocked" when the first draft of the human genome showed only 30,000 genes (now down to about 25,000). This story is a myth as I document in: Facts and Myths Concerning the Historical Estimates of the Number of Genes in the Human Genome. Truth is, most knowledgeable experts expected that humans would have about the same number of genes as other animals. They realized that the differences between fruit flies and humans, for example, didn't depend on a host of new human genes but on the timing and expression of a mostly common set of genes.

This isn't good enough for many human chauvinists. They are still looking for something special that sets humans apart from all other animals. I listed seven possibilities in my post on the deflated ego problem:

1. Alternative Splicing: We may not have many more genes than a fruit fly but our genes can be rearranged in many different ways and this accounts for why we are much more complex. We have only 25,000 genes but through the magic of alternative splicing we can make 100,000 different proteins. That makes us almost ten times more complex than a fruit fly. (Assuming they don't do alternative splicing.)
2. Small RNAs: Scientists have miscalculated the number of genes by focusing only on protein encoding genes. Our genome actually contains tens of thousands of genes for small regulatory RNAs. These small RNA molecules combine in very complex ways to control the expression of the more traditional genes. This extra layer of complexity, not found in simple organisms, is what explains the Deflated Ego Problem.
3. Pseudogenes: The human genome contains thousands of apparently inactive genes called pseudogenes. Many of these genes are not extinct genes, as is commonly believed. Instead, they are genes-in-waiting. The complexity of humans is explained by invoking ways of tapping into this reserve to create new genes very quickly.
4. Transposons: The human genome is full of transposons but most scientists ignore them and don't count them in the number of genes. However, transposons are constantly jumping around in the genome and when they land next to a gene they can change it or cause it to be expressed differently. This vast pool of transposons makes our genome much more complicated than that of the simple species. This genome complexity is what's responsible for making humans more complex.
5. Regulatory Sequences: The human genome is huge compared to those of the simple species. All this extra DNA is due to increases in the number of regulatory sequences that control gene expression. We don't have many more protein-encoding regions but we have a much more complex system of regulating the expression of proteins. Thus, the fact that we are more complex than a fruit fly is not due to more genes but to more complex systems of regulation.
6. The Unspecified Anti-Junk Argument: We don't know exactly how to explain the Deflated Ego Problem but it must have something to do with so-called "junk" DNA. There's more and more evidence that junk DNA has a function. It's almost certain that there's something hidden in the extra-genic DNA that will explain our complexity. We'll find it eventually.
7. Post-translational Modification: Proteins can be extensively modified in various ways after they are synthesized. The modifications, such as phosphorylation, glycosylation, editing, etc., give rise to variants with different functions. In this way, the 25,000 primary protein products can actually be modified to make a set of enzymes with several hundred thousand different functions. That explains why we are so much more complicated than worms even though we have similar numbers of genes.

You can see how many of these positions are related to ongoing controversies in molecular biology, especially the debate over junk DNA. Keep in mind that what's behind that debate (junk DNA) is not only differing views about the strength and importance of natural selection but also a sense on behalf of some scientists that the "specialness" of humans requires a special explanation.

The latest contribution is in a recent issue of Nature that contains a series of "Insight" reviews on "Transcription and Epigenetics." The review I want to discuss is by de Laat and Duboule (2013). The abstract outlines a new hypothesis concerning the evolution of high-order chromatin structure.

How a complex animal can arise from a fertilized egg is one of the oldest and most fascinating questions of biology, the answer to which is encoded in the genome. Body shape and organ development, and their integration into a functional organism all depend on the precise expression of genes in space and time. The orchestration of transcription relies mostly on surrounding control sequences such as enhancers, millions of which form complex regulatory landscapes in the non-coding genome. Recent research shows that high-order chromosome structures make an important contribution to enhancer functionality by triggering their physical interactions with target genes.

The opening paragraph (below) makes it clear that they are discussing the deflated ego problem.

Access to animal genome sequences has revealed that the level of complexity of an organism does not relate to its number of genes. Mammals are more complex in morphology and behaviour than roundworms, but their genomes both contain around 20,000 genes. Various parameters can contribute to increased complexity, such as the extent of protein modifications or the diversity of splicing patterns. Pleiotropy is another possible contributor, whereby genes acquire multiple functional tasks at different times and places either during development or in adult life. In this case, gene regulation, rather than function, had to evolve to associate regulatory alternatives to particular genes. Although gene transcription is initiated at promoters, which recruit the basal transcription machinery, these sequences have little impact on transcription control during development and hence this latter task mostly relies on enhancers.

The authors claim that there are millions of enhancers in the human genome. If we take "millions" to mean just two million then there are, on average, one hundred enhancers per gene. This means that expression of each gene in our genome is regulated, on average, by the binding of 100 transcription factors to 100 transcription factor binding sites (= enhancers). Thus, a lot of our genome (40%) is devoted to regulation.

Enhancers are sequence modules that contain binding motifs for transcription factors. They are preferentially located in the non-coding part of the genome, at various distances from their target genes. In mammals, more than 95% of the genome is non-coding and large gene deserts can sometimes span several megabases. The recent development of high-throughput methods has made it possible to systematically search for enhancers; millions of such regulatory modules have been predicted, with 40% of our genome now estimated to carry some regulatory potential.
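The per-gene figure above is simple division, and a minimal Python sketch makes it easy to see how it shifts with the gene count. The gene counts come from the text (about 20,000 in the review, about 25,000 earlier in this post), and "two million" is the deliberately low reading of "millions" used above. The function name is an illustrative choice, and the even spread of enhancers across genes is an assumption of the back-of-envelope estimate, not a biological claim.

```python
def enhancers_per_gene(total_enhancers: int, gene_count: int) -> float:
    """Average enhancers per gene, assuming enhancers are spread evenly over genes."""
    return total_enhancers / gene_count


if __name__ == "__main__":
    # "Millions" read conservatively as two million enhancers.
    for genes in (20_000, 25_000):
        ratio = enhancers_per_gene(2_000_000, genes)
        print(f"{genes} genes: {ratio:.0f} enhancers per gene")
```

Either gene count leaves the average in the range of 80 to 100 enhancers per gene, so the argument does not hinge on which estimate is used.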

So far, there's nothing that makes humans, or vertebrates, special. After all, fruit flies and roundworms may also have 100 enhancers per gene. Here's where it gets interesting because the authors are proposing that vertebrates (mammals?) have evolved something special.

Evolution of mammalian enhancer landscapes

Vertebrate genomes are unique in that they contain large gene deserts with enhancers acting over distances in the megabase range (see ref. 9 for a review). Invertebrate species studied so far tend to have more local regulatory controls, which can often be recapitulated by short transgenes, as has been shown for the roundworm Caenorhabditis elegans. Admittedly, in Drosophila, gene regulation during development is complex, with multiple enhancers acting on individual genes and some loci controlled by series of intricate enhancers. However, these enhancer–promoter interactions generally occur over distances shorter than 50 kb.

The authors illustrate their claim with the figure shown below. I've modified it to focus on the main point; namely, that mammals have evolved the special ability to control gene expression using many transcription factors that can function at great distances from the promoter.

The diagram is a bit deceptive because it's not to scale. A typical mammalian gene has about 2000bp (2kb) of coding region (~exons¹). It includes about 20kb of intron sequences.² The authors are suggesting that expression of these genes is regulated by 100 enhancers that can act on the promoter at distances of more than 1000kb (1Mb). Bound transcription factors can contact the transcription initiation complex (i.e. RNA polymerase holoenzyme) at the promoter by forming a large loop of DNA. According to the review, the average loop size in mammals is 120kb or six times the size of a typical gene. The biggest loop discovered so far is 1,300kb.
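To put the scale problem in numbers, here is a small Python sketch that compares the loop sizes with the size of a typical gene. All four figures are taken from the paragraph above; the constant and function names are illustrative, and treating "typical gene size" as exons plus introns is an assumption of this sketch.

```python
EXON_KB = 2               # ~2 kb of exon sequence in a typical gene (from the text)
INTRON_KB = 20            # ~20 kb of intron sequence (from the text)
AVERAGE_LOOP_KB = 120     # average loop size quoted from the review
LARGEST_LOOP_KB = 1_300   # largest loop reported so far


def loop_to_gene_ratio(loop_kb: float, gene_kb: float = EXON_KB + INTRON_KB) -> float:
    """How many typical gene lengths fit into a loop of the given size."""
    return loop_kb / gene_kb


if __name__ == "__main__":
    print(f"average loop: {loop_to_gene_ratio(AVERAGE_LOOP_KB):.1f} gene lengths")
    print(f"largest loop: {loop_to_gene_ratio(LARGEST_LOOP_KB):.1f} gene lengths")
```

With a 22 kb gene the average loop comes out at between five and six gene lengths, which is consistent with the "six times" figure if the typical gene is taken as roughly 20 kb.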

It's true that genomes (prokaryotic and eukaryotic) are organized into large loops formed by proteins binding at Scaffold Attachment Regions (SARs) or Topologically Associating Domains (TADs). It's true that a transcription factor could bind at an enhancer near the base of the loop and a promoter could be found at the base of the other side of the loop. If the loop is large, the two binding sites could be hundreds of kb apart. There are a few examples of such long-range interactions but they are more likely to be exceptions than rules.

In most cases the loops are more local. They form when a bound transcription factor contacts the transcription initiation complex. The idea is that binding the transcription factor anywhere in the vicinity of the promoter increases the local concentration making it likely that there will be contact between the transcription factor and the transcription complex. The classic example of a loop is the one formed by lac repressor when it contacts two separate operators (O₁ and O₂) [Repression of the lac Operon]. It's a model for all similar local loops. The various parameters were worked out about 25 years ago. Loop formation depends on the strength of the various protein-protein interactions and the strength of the DNA-protein interactions. The probability of forming a loop depends on the distance between the O₁ and O₂ binding sites, as you might imagine.

In the case of activators and transcription complexes, the further the enhancer is away from the promoter the lower the probability that a bound transcription factor will be able to find and recognize the promoter region (in a given length of time). When the sites are far apart, it's more likely that the transcription factor will interact with other proteins that are accidentally bound in the same region. It can't be a general rule that functional sites can be 120,000 base pairs from the transcriptional start site. That's too far for serious effects in the absence of other topological constraints.
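One way to get a feel for this distance dependence is a toy polymer model. The Python sketch below assumes an ideal (random-walk) chain, for which the probability that two sites separated by s base pairs come into contact falls off roughly as s^(-3/2). That scaling is a textbook result for ideal chains; it is not taken from the review or from the lac repressor work, and real chromatin does not behave as an ideal chain. The function name, the 1 kb reference distance and the default exponent are all illustrative assumptions.

```python
def relative_contact_probability(
    distance_bp: float,
    reference_bp: float = 1_000.0,
    exponent: float = 1.5,
) -> float:
    """Contact probability at distance_bp relative to reference_bp.

    Assumes a power-law decay P(s) ~ s**(-exponent); exponent=1.5 is the
    ideal-chain value and is an assumption, not a measured property of chromatin.
    """
    if distance_bp <= 0 or reference_bp <= 0:
        raise ValueError("distances must be positive")
    return (distance_bp / reference_bp) ** (-exponent)


if __name__ == "__main__":
    for distance in (1_000, 10_000, 50_000, 120_000, 1_000_000):
        rel = relative_contact_probability(distance)
        print(f"{distance:>9} bp: {rel:.2e} relative to 1 kb")
```

Under this toy model a site 120 kb from the promoter makes contact roughly a thousand-fold less often than a site 1 kb away. That is the kind of penalty that would have to be overcome by "other topological constraints" for long-range enhancers to be the rule rather than the exception.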

The authors of this paper argue that higher-order chromosome structure mediates long-range interactions but I'm not convinced that this applies to most genes.

It also can't be a general rule that the average gene is regulated by one hundred transcription factor binding sites. That doesn't make any sense. Why would most genes need this level of control? Furthermore, if mammals have evolved a higher order chromatin structure that allows for one hundred different enhancers to act at a promoter even though they are spread out over 1,000kb, then how does that work? And how do genes distinguish between genuine enhancers and spurious sites that must be littered all over that region?

Drosophila melanogaster is at least as complex as a typical mammal. Or at least it's in the same ballpark. Its genome is only 5% the size of the human genome. Why would mammals have needed to evolve so many more enhancers and so much more functional DNA in order to do something that small fruit fly genomes can do very well?

I suggested that authors should insert the following statement at the end of their papers in order to make it clear how their proposals solve the problem they are addressing.

(I/we/the authors) believe that the Deflated Ego Problem is a real scientific problem. (I/we/the authors) propose that explanation number (1/2/3/4/5/6/7) will account for the fact that we have too few genes.

In this case it's #5 with a bit of a twist.

1. Exons also include 5′ and 3′ untranslated regions.

2. This value depends on a lot of assumptions and conflicting data but it's in the right ballpark.

de Laat, W. and Duboule, D. (2013) Topology of mammalian developmental enhancers and their regulatory landscapes. Nature 502:499-506. [doi: 10.1038/nature12753]

11 comments :

Anonymous said...: I don't think it's a sign of an inflated ego to think mammals are more complex than flies. There are objective measures one could use such as cell type number, number of neurons or neural connectivity. If this is true then one or more of the seven possibilities listed above must be the case.
One hundred enhancers per gene seems a bit much but I think it's reasonable to think there are many higher levels of regulation leading to complex combinatorial control of genes/TADs etc. I skimmed a review a few weeks ago on the tremendous complexity in TAD regulation in a single cell type and I got the impression it couldn't be explained entirely by activation/repression of loci within the TADs.
I'm trying to think of observations that could confirm or refute this view. If it's valid perhaps we should expect the genome to be refractory to even small perturbations. So all inversions or translocations should disrupt gene expression, even in regions that haven't moved, and for no obvious reason.; Friday, November 01, 2013 2:11:00 PM
Jonathan Badger said...: The problem is "biological complexity" is not a well defined idea. Basically "complex" organisms are defined as ones most like humans and "simple" ones less like humans. Sure, you can find numbers like number of neurons that are higher in mammals than flies, but there are other numbers (such as numbers of individuals) that are higher in flies than mammals. It's entirely arbitrary which number is considered most important or "advanced".; Friday, November 01, 2013 4:30:00 PM
Fukuda said...: "I don't think it's a sign of an inflated ego to think mammals are more complex than flies. There are objective measures one could use such as cell type number, number of neurons or neural connectivity. If this is true then one or more of the seven possibilities listed above must be the case."

What are you really measuring by counting the number of cell types? Complexity is a concept that appeals to human beings, but what is its biological significance? Mammals seem to have a much larger and more structurally complex brain than insects. Despite this, honeybees perform complex dancing rituals using a symbolic language among a complex social hierarchy, display visual and other types of memory, and are able to plan their behavior beforehand (see Zhang, S et al. 2006 for instance). "Simpler" insects like Drosophila display long-term memory and complex rituals and are heavily used models in behavioral genetics. These are behavioral traits that require a far larger brain to develop in mammals.

But the most interesting part of this is that most of these behaviors are directed by the same pathways and transcription factors we mammals use. As a relatively simple example, the PKA-CREB axis is used by both flies and mammals to store long-term memory, while a more interesting and complex one is the use of FoxP transcription factors in social communication. FoxP2 is required for language development in humans, song development in songbirds, ultrasound emission in bats and honeybee dancing (its insect homolog, FoxP).

The same goes for other traits, as we share the same basic gene circuits for heart (Tinman in flies, Nkx2-5 in humans) and eye development (Pax6) with Drosophila. And the list just goes on and on. We share a lot of our basic transcriptional building blocks and their related traits with all bilateral animals since we split in the Cambrian explosion.

There is no surprise there. We are way closer genetically to other animals than most people think, and most changes between animal clades probably lie in differential transcriptional regulation rather than in brand new genetic mechanisms that appeared outta nowhere.

From an evolutionary point of view, the number of cell types needed for expressing a trait or accomplishing a task is frankly irrelevant as long as the trait is adaptive. Trying to assess the complexity of behavioral and ecological interactions would probably be a more correct way imho.; Friday, November 01, 2013 5:09:00 PM
Marcoli said...: I was chuckling over your list of 'serious' reasons to explain differences in complexity. Perhaps readers could offer a few more. I got one:
8. 'Higher' groups are peramorphic: Simpler animals are programmed to go through embryogenesis relatively quickly, and so have less time to develop additional levels of complexity. More complex groups like vertebrates may develop over days, but most take weeks or months. They add additional stages of development that are not identifiable in simpler forms of animals. If a nematode were genetically programmed to add additional developmental stages, taking a month to hatch from an egg, how much more complex might it be? What political party might it join?; Friday, November 01, 2013 6:18:00 PM
Robert Byers said...: From a YEC creationist view we are more complex because we are made in God's image. Our soul. Yet we would want our bodies to be just off the same rack as animals in these matters.
It confirms a common blueprint from a creator for biology. Exactly what I would do if I was God (remember Einstein's quote).
If from a creator then we should predict like atomic numbers for like needs in biology.
All biological nature is from common laws in nature with a twist at the high end.; Saturday, November 02, 2013 1:34:00 AM
SPARC said...: IMO the argument that "invertebrate species studied so far tend to have more local regulatory controls, which can often be recapitulated by short transgenes" isn't cogent because the majority of published transgenic mice were produced with relatively short promoter segments. In certain cases differences between transgene expression patterns and those of the corresponding endogenous genes may be caused by missing additional regulatory sequences. However, more often they are caused by position effect variegation, i.e. the transgenes are inserted in the vicinity of enhancers or silencers that gain influence on the regulation of the transgene. To a degree this can be avoided by flanking transgenes with insulators. Longer constructs like BAC and YAC transgenes are less prone to such effects but IMO this is likely caused by isolation from distant effects caused by sequences around the insertion site. However, in many cases the same result could be achieved by inserting the transgene into a locus (e.g. LC1) that doesn't cause positional effects.; Saturday, November 02, 2013 2:26:00 AM
SPARC said...: The main problem with conventional transgenic mice is the fact that many mice derived from pro-nucleus injected oocytes don't express the transgene at all. In many cases this will be due to shortening of the transgene product before or during its insertion into the genome, which employs the endogenous repair mechanisms of the cell. Thus, the fact that longer transgene constructs like BACs require fewer injections to obtain functional transgenes that behave like the corresponding endogenous gene may be due to the fact that shortening of longer constructs just happens further away from the sequences required for proper regulation of transgene expression.; Saturday, November 02, 2013 5:35:00 AM
Jem said...: "Yet we would want our bodies to be just off the same rack as animals in these matters."

If you're speaking as a Biblical literalist, wouldn't you have to concede that the Bible says the exact opposite, that humans were not created at the same time and as part of the same process as the animals?

A great deal of theology has been expended on the idea that mankind has a capacity for ensoulment lacking in (other) animals. If you're right, God could just as easily have picked the fruit flies instead of the human race - picking would be a capricious act, not one dictated by our natures.; Sunday, November 03, 2013 8:54:00 AM
Robert Byers said...: We were created from the same blueprint/program and our separate creation is no more relevant than bugs from birds.
The only process is a quick common design and twists to make different kinds within a single equation of biology.
Like in physics. That guy doing the same thing.
Creationism would predict and welcome a shared atomic structure with all biology or most of it.

Ensoulment was put simply in the best body for a soul. Fruitflies can't drive or skate!
It could only be we were given the best body type as we can't have our own body type unique to us. A soul has no physical manifestation.

The world seeing such diversity and believing in evolution would imagine at the atomic level a superior/inferior score in DNA.
However no difference, because no evolution, but only common design twisting at the end biology.
Of course the thread is not making a creationist point. Just me.; Monday, November 04, 2013 12:53:00 AM
TheOtherJim said...: ...and so in that mega-complex organism known as the amoeba....; Monday, November 04, 2013 3:20:00 AM
imr90 said...: #9. Only humans use credit cards.; Monday, November 04, 2013 2:37:00 PM

