
Friday, June 22, 2007

John Dennehy's Citation Classic for this Week

 
Check out The Evilutionary Biologist for This Week's Citation Classic. This paper is one of the most important papers in molecular biology. For most people it was the proof that DNA is the genetic material.

Hershey, A.D. and Chase, M. (1952) Independent functions of viral protein and nucleic acid in growth of bacteriophage. J. Gen. Physiol. 36:39-56.

Sandwalk Rating

 
What's My Blog Rated? From Mingle2 - Online Dating

Toronto City Council Approves Yellow Ribbons.

 
Yesterday's report of the deaths of three Canadian soldiers in Afghanistan stampeded Toronto City Council into approving the yellow ribbon decals on fire trucks and ambulances [What Does the "Support Our Troops" Ribbon mean to You?].

The press release from City Council says it all [Statement by the City of Toronto “Support Our Troops” ribbons to remain on Fire, EMS vehicles]. There's talk on the radio that the decals will also be placed on police cars. That's insane—I hope it's a false rumour.
Toronto City Council today voted unanimously to continue the ribbon campaign on Toronto Fire and EMS vehicles for an indefinite period of time as a show of support for the Canadian Forces Personnel Support Agency.

In recommending that City Council unanimously endorse the continuation of the ribbon campaign, Mayor David Miller said that all of Canada’s men and women serving in the military have the unwavering support of all Torontonians.

Toronto’s emergency services train and work closely with the Canadian Armed Forces. They share the common thread of personal sacrifice, dedication and professionalism with Canadian forces and the two services will continue to support and work with Canada’s military.

Circular University of Toronto

 
The main front campus of the University of Toronto is a popular place these days as parades of graduating students walk across it on their way to Convocation Hall.

This photograph was taken by Sam Javanrouh at daily dose of imagery [King's College Circle]. My building (Medical Sciences Building) is at 1 o'clock. The science library is at 10 o'clock and Convocation Hall is at 3 o'clock.

Evolution Ad Wins Award

 
No, unfortunately it's not that kind of evolution. This famous commercial, produced by Ogilvy & Mather of Toronto, won the Grand Prix for viral marketing at Cannes, as reported in today's Toronto Star [Dove's `Evolution' ad wins at Cannes]. It may not be biological evolution but the commercial sends a powerful message anyway.

Canadians Not So Sure About Evolution

 
A recent Angus Reid poll shows that only 59% of Canadians accept evolution [What does that Darwin know anyway?]. In Ontario that number drops to only 51%.

While the influence of Young Earth Creationism remains low (~25%), there are some surprising differences among the various regions in Canada. Only 9% of the people in Quebec are creationists but that number rises to 33% on the Prairies.

Evolution: Human beings evolved from less advanced life forms over millions of years
Creation: God created human beings in their present form within the last 10,000 years

Region                 Evolution   Creation   Not sure
British Columbia       65%         21%        15%
Alberta                58%         28%        14%
Manitoba/Saskatchewan  56%         33%        11%
Ontario                51%         26%        23%
Quebec                 71%         9%         20%
Atlantic               53%         26%        21%
Canada                 59%         22%        19%
United States*         53%         66%        3%

*USA Today/Gallup Poll (separate questions)



It's clear we have our work cut out for us if we are to convince Canadians that evolution is the correct view. We need to work on converting that 19% who still aren't sure.

[Hat Tip: a gloating John Pieret]

Thursday, June 21, 2007

Tangled Bank #82

 
The latest version of Tangled Bank has been posted on Greg Laden's blog.

This is a review by Derwin Darwin II with the title: Various Proofs of the Theory of Evolution presented in original form by my uncle, the honorable Charles Darwin in the year 1859 and in subsequent years.

Very entertaining. Thanks Greg.

Wednesday, June 20, 2007

Evolutionary Biologists Flunk Religion Poll

 
In a follow-up to previous studies, Gregory W. Graffin and William B. Provine surveyed prominent evolutionary biologists to find out what they thought about religion. The results are summarized in the latest issue of American Scientist [Evolution, Religion and Free Will]. (Click on the figure to see a larger version where you can read the fine print.)
Our study was the first poll to focus solely on eminent evolutionists and their views of religion. As a dissertation project, one of us (Graffin) prepared and sent a detailed questionnaire on evolution and religion to 271 professional evolutionary scientists elected to membership in 28 honorific national academies around the world, and 149 (55 percent) answered the questionnaire. All of them listed evolution (specifically organismic), phylogenetics, population biology/genetics, paleontology/paleoecology/paleobiology, systematics, organismal adaptation or fitness as at least one of their research interests. Graffin also interviewed 12 prestigious evolutionists from the sample group on the relation between modern evolutionary biology and religion.

A primary complaint of scientists who answered the earlier polls was that the concept of God was limited to a "personal God." Leuba considered an impersonal God as equivalent to pure naturalism and classified advocates of deism as nonbelievers. We designed the current study to distinguish theism from deism—that is to say a "personal God" (theism) versus an "impersonal God" who created the universe, all forces and matter, but does not intervene in daily events (deism). An evolutionist can be considered religious, in our poll, if he calls himself a deist. ...

Perhaps the most revealing question in the poll asked the respondent to choose the letter that most closely represented where her views belonged on a ternary diagram. The great majority of the evolutionists polled (78 percent) chose A, billing themselves as pure naturalists. Only two out of 149 described themselves as full theists (F), two as more theist than naturalist (D) and three as theistic naturalists (B). Taken together, the advocacy of any degree of theism is the lowest percentage measured in any poll of biologists' beliefs so far (4.7 percent).

No evolutionary scientists in this study chose pure deism (I), but the deistic side of the diagram is heavy compared to the theistic side. Eleven respondents chose C, and 10 chose other regions on the right side of the diagram (E, H or J). Most evolutionary scientists who billed themselves as believers in God were deists (21) rather than theists (7).
When asked directly whether they believe in God, almost 80% said no. I wonder how many of them think of themselves as atheists as opposed to agnostics?

Here's the bad news. 79% of these eminent evolutionary biologists say they believe in free will (option A on the question). Even the authors of the study were surprised by that one.
We anticipated a much higher percentage for option B and a low percentage for A, but got just the opposite result. One of us (Provine) has been thinking about human free will for almost 40 years, has read most of the philosophical literature on the subject and polls his undergraduate evolution class (200-plus students) each year on belief in free will. Year after year, 90 percent or more favor the idea of human free will for a very specific reason: They think that if people make choices, they have free will. The professional debate about free will has moved far from this position, because what counts is whether the choice is free or determined, not whether human beings make choices. People and animals both certainly choose constantly. Comments from the evolutionists suggest that they were equating human choice and human free will. In other words, although eminent, our respondents had not thought about free will much beyond the students in introductory evolution classes. Evolutionary biology is increasingly applied to psychology. Belief in free will adds nothing to the science of human behavior.
There's one other surprise. 72% think that religion is part of evolution—it's an adaptation. One can only wonder what these evolutionary biologists think of themselves. Are they able to overcome their deterministic predisposition to God or are they mutants who lack the gene(s)? Maybe it explains why they believe in free will?

[Hat Tip: Denyse O'Leary]

Denyse O'Leary Has Advice for the Fans of Francis Collins

 
From Me?: Something against Francis Collins? No!.
On the other hand, I admit to deep disappointment in the intellectual substance of Collins’ arguments, which I unpack in the multipart review at Access Research Network.

Note to all, especially Collins fans: C. S. Lewis is not a security blanket, and the debate over the origin of free will, morality, altruism, and consciousness has moved on from his day. Today's atheist is not usually a genial, classical God-denier; he is a radical materialist who honestly believes that we are all just robots replicating our selfish genes. And he cannot wait to get his gospel onto the curriculum of publicly funded schools, as "evolutionary psychology," forcing everyone's nose into his nonsense.
Hmmm ... I don't know of very many radical materialist atheists who fall for evolutionary psychology. I wonder who she could be talking about? Maybe it's PZ?

Nobel Laureates: Richard Roberts and Phillip Sharp

 
The Nobel Prize in Physiology or Medicine 1993.

"for their discoveries of split genes"



Richard J. Roberts (1943 - ) and Phillip A. Sharp (1944 - ) received the Nobel Prize in Physiology or Medicine for their discovery of interrupted genes and splicing in eukaryotes [see RNA Splicing: Introns and Exons and Monday's Molecule #31].

Roberts and Sharp discovered that the genes in adenovirus were split into various segments that were combined during RNA processing. The results started to become widely known in 1975-76 and the key papers were published in 1977. Later this gene organization was found to be common in chromosomal eukaryotic genes. Unlike many Nobel Prize discoveries, this one really was revolutionary. Here's the presentation speech by Professor Bertil Daneholt of the Nobel Assembly of the Karolinska Institute.
Your Majesties, Your Royal Highnesses, Ladies and Gentlemen,

Why do children resemble their parents? This question has probably always fascinated humans, but not until the advent of natural science have we arrived at an increasingly satisfactory answer.

In the middle of the last century, the Austrian monk Gregor Mendel conducted his famous breeding experiments with the garden pea. He concluded that every trait of an individual plant is determined by a set of two genes, one obtained from each parental plant. To Mendel a gene was an abstract concept, which he used to interpret his breeding experiments. He had no idea of the physical properties of genes.

Only in the mid-1940s could it be established that in terms of chemistry, genetic material is composed of the nucleic acid DNA. About ten years later the double helical structure of DNA was revealed. Ever since then, progress within the field of molecular biology has been very rapid, and several Nobel prizes have been awarded in this area of research.

Initially, genetic material was studied mainly in simple organisms, particularly in bacteria and bacterial viruses. It was shown that a gene occurs in the form of a single continuous segment of the long, thread-like DNA, and it was generally assumed that the genes in all organisms looked this way. Therefore, it was a scientific sensation when this year's Nobel Laureates, Richard Roberts and Phillip Sharp, in 1977, independently of each other, observed that a gene in higher organisms could be present in the genetic material as several distinct and separate segments. Such a gene resembles a mosaic. Both Roberts and Sharp analyzed an upper respiratory virus, which is particularly suitable for studies of the genetic material in complex organisms. It soon became apparent that most genes in higher organisms, including ourselves, exhibited this mosaic structure.

Roberts' and Sharp's discovery opened up a new perspective on evolution, that is, on how simple organisms develop into more complex ones. Earlier it was believed that genes evolve mainly through the accumulation of small discrete changes in the genetic material. But their mosaic gene structure also permits higher organisms to restructure genes in another, more efficient way. This is because during the course of evolution, gene segments - the individual pieces of the mosaic - are regrouped in the genetic material, which creates new mosaic patterns and hence new genes. This reshuffling process presumably explains the rapid evolution of higher organisms.

Roberts and Sharp also predicted that a specific genetic mechanism is required to enable split genes to direct the synthesis of proteins and thereby to determine the properties of the cell. Researchers had known for many years that a gene contains detailed instructions on how to build a protein. This instruction is first copied from DNA to another type of nucleic acid, known as messenger RNA. Subsequently, the RNA instruction is read, and the protein is synthesized. What Roberts and Sharp were now stating was that the messenger RNA in higher organisms has to be edited. The required process, called splicing, resembles the work that a film editor performs: the unedited film is scrutinized, the superfluous parts are cut out and the remaining ones are joined to form the completed film. Messenger RNA treated in this manner contains only those parts that match the gene segments. It later turned out that the same parts of the original messenger RNA are not always saved during the editing- there are choices. This implies that splicing can regulate the function of the genetic material in a previously unknown way.

Roberts' and Sharp's discovery also helps us understand how diseases arise. One example is a form of anemia called thalassemia, which is due to inherited defects in the genetic material. Several of these defects cause errors in the editing process during splicing; thus, an abnormal messenger RNA is formed and subsequently also a protein that functions poorly or not at all.

The discovery of split genes was revolutionary, triggering an explosion of new scientific contributions. Today this discovery is of fundamental importance for research in biology as well as in medicine.

Dr. Richard Roberts and Dr. Phillip Sharp,

Your discovery of split genes led to the prediction of a new genetic process, that of RNA splicing. The discovery also changed our view of how genes in higher organisms develop during evolution. On behalf of the Nobel Assembly of the Karolinska Institute I wish to convey to you our warmest congratulations, and I now ask you to step forward to receive the Nobel Prize from the hands of His Majesty the King.

What Does the "Support Our Troops" Ribbon mean to You?

 
Since last October emergency vehicles in Toronto have been displaying a decal in support of our troops in Afghanistan. The decals were placed on the vehicles at the request of firefighters and paramedics, whose unions are strong supporters of the soldiers. The original deal was that the decals would stay on for one year and then be removed when the vehicles came in for routine maintenance this Fall.

The issue has turned into a hot political fight that will be decided today at a City Council meeting [Time limit for 'Support Our Troops' ribbons is up].

As you might imagine, there are some city councilors who want the decals to stay on the ambulances and fire trucks.
Some councilors believe the decision to remove the decals is a black mark on the city.

"I was stunned this morning to hear on the radio that some official at the city had ordered emergency services, particularly ambulances, to take off the decal that supports our troops in Afghanistan," city councilor Brian Ashton told CTV News on Tuesday.

"These decals are on there and it makes a very strong statement. To take them off, Toronto is the largest city, would just be an outrage. It would be a black eye on the reputation of our city," Ashton said.
It should also come as no surprise that some councilors want to stick to the original agreement and remove the decals in September.
Coun. Janet Davis said just as many councillors want to see the decals removed as those who support their presence on emergency vehicles.

Mayor David Miller said while emergency crews should continue to support Canadian troops, the one-year time limit for the decals was enough time.

"It's controversial on both sides. There are people who see it as support for the troops and there are people who see it as support for war," Miller said.
I'm one of those who believe that the "Support the Troops" ribbon is a political statement. I don't know very many people who are opposed to the war but have this sticker on their car. On the surface it seems like a no-brainer to offer support to our troops while opposing the mission. But, in fact, the term "no-brainer" is quite appropriate in this case. By blindly advertising support for the military you obscure the true difficulty in making rational decisions about how to deploy our army. It's no secret that most people who "support our troops" are also conservatives who are in favour of the war.

The idea that the "Support Our Troops" yellow ribbons are politically neutral is something that only a supporter of the war would say. It's ridiculous. It would be like putting peace symbols on the trucks on the grounds that surely everyone supports peace.

I am very supportive of individual soldiers who are posted to Afghanistan. It's not their fault that our government is insane. They have to follow orders. But that does not mean that I "support our troops" in the way that the decal signifies. As a matter of fact, I do not support our mission in Afghanistan and I would withdraw the troops tomorrow if I could. Every soldier who dies in Afghanistan will have died in vain. That's hardly a way to offer support to our troops.

Having those decals on city vehicles sends the wrong message. For those of us who oppose the war it signifies that the fire fighters and paramedics are on the other side of the issue. That makes me uncomfortable since these are people who deserve my respect and admiration but they're not going to get it if they push a political agenda through advertising on their vehicles.

Take the decals off. It's no place for politics.

Tuesday, June 19, 2007

RNA Splicing: Introns and Exons

Most eukaryotic protein-encoding genes are interrupted. The coding regions are divided into numerous blocks called "exons" and the exons are separated by "introns."

An example is shown below. The triose phosphate isomerase (TPI) gene from maize is composed of 9 exons and 8 introns. (Triose phosphate isomerase is one of the enzymes in the glycolysis/gluconeogenesis pathway.)


The top line is a cartoon representation of the TPI gene with each exon in a different color. The thick gray lines between them represent the introns. The gene is transcribed from left (5′) to right (3′) beginning at the promoter (P). The long primary RNA transcript contains both intron and exon sequences. Subsequent processing of this primary transcript results in modification of the 5′ end by addition of an m7 GTP cap and modification of the 3′ end by addition of adenylate (A) residues to form the poly A tail. More importantly, the introns are spliced out and the exon sequences are fused to form the mature mRNA. This mRNA is then transported to the cytoplasm where it is translated into protein.

Note that all the coding regions in the exons (hatched) are contiguous in the mature mRNA. The relationship between the exons and the structure of the protein is shown on the right where the color of each segment of the protein corresponds to the color of the exons in the upper figure. There is no correlation between the exons and any protein domains or motifs. (It used to be thought that exons corresponded to domains in the protein.)

The splicing reaction is complicated. The cell must cleave the primary transcript at each end of the intron while holding on to the flanking exons so the chopped RNA transcript does not come apart. Then the two exons have to be joined together. For protein-encoding genes the splicing reactions are catalyzed by an RNA/protein complex called a spliceosome. In some cases, the introns can be thousands of nucleotides long—much longer than the exons.

Let's look at a simplified version of this reaction. The various components of the spliceosome have to assemble at the 5′ (left) end of an intron and at the 3′ end. There's a third site in the middle called the branch site. All three sites are identified by specific short sequences in the primary transcript as shown below.


These are the consensus sequences for vertebrates, including us. The splice site and branch site sequences in other species are similar but not identical.

In the first step of the splicing reaction, the various components of the spliceosome bind to the 5′ splice site, the 3′ splice site, and the branch site. Then the three complexes interact with each other to draw together the ends of the intron and position them near the branch site. This forms the spliceosome.

The first reaction involves an attack of the 2′ -OH group of the branch point adenylate residue on the 5′ splice site. This forms an intermediate where the branch site A residue is attached to three different ends of the primary transcript. The structure resembles a lariat or lasso. This is the structure depicted in Monday's Molecule #31.

Meanwhile, the 5′ end of the transcript is still bound to the spliceosome. This is important because it's about to be joined to the next exon and the reaction wouldn't work if the 5′ end were released following the first cleavage reaction.

In the next step, the spliceosome catalyzes the attack of the -OH group at the end of the 5′ exon on the 3′ splice site. This results in cleavage of the 3′ intron/exon junction and joining of the 5′ exon to the 3′ exon. The intron sequence (dark brown) is released as a lariat (looped) structure.

The two reactions are known as transesterification reactions because they require the breaking of one strand of RNA and formation of a new ester linkage. The details are not very important. What's important is to recognize that splicing depends on the correct interaction between the components of the spliceosome and the 5′ and 3′ splice site sequences (and the branch site).
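The net effect of the two reactions can be sketched in a few lines of Python. This is only a toy model of the outcome (introns removed, exons joined, cap and poly A tail added), not of the chemistry. The function names, the sequence, and the intron coordinates are all made up for illustration.

```python
# Toy model of RNA processing. The real reactions are catalyzed by the
# spliceosome; here an intron is just a (start, end) slice of a string.
# All names, sequences, and coordinates are hypothetical.

def splice(primary_transcript, introns):
    """Remove introns (0-based, end-exclusive intervals) and join the exons."""
    exons = []
    position = 0
    for start, end in sorted(introns):
        if start < position:
            raise ValueError("introns must not overlap")
        exons.append(primary_transcript[position:start])  # exon upstream of this intron
        position = end                                    # skip over the intron
    exons.append(primary_transcript[position:])           # final exon
    return "".join(exons)


def process(primary_transcript, introns, poly_a_length=10):
    """Splice, then mark the 5' cap and append a poly A tail (string labels only)."""
    return "m7G-" + splice(primary_transcript, introns) + "A" * poly_a_length


# Made-up transcript: exon 1, one intron beginning with GU and ending with AG, exon 2.
exon1, intron, exon2 = "AUGGCC", "GUAAGUCCUACUAACUUUCAG", "GAAUAA"
pre_mrna = exon1 + intron + exon2
intron_span = (len(exon1), len(exon1) + len(intron))

mature = splice(pre_mrna, [intron_span])
print(mature)                            # exon 1 joined directly to exon 2
print(process(pre_mrna, [intron_span]))
```

The sketch takes the intron boundaries as given. In the cell, finding those boundaries is the hard part, which is why the splice site and branch site sequences matter so much.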

These interactions are mediated by small RNAs that are bound to the spliceosome proteins. These RNAs are called small nuclear RNAs (snRNAs) and they're one example of a host of small RNAs produced by non-protein encoding genes. The snRNA/protein complexes are called small nuclear ribonucleoproteins or snRNPs (snurps).

The snRNAs are complementary to the splice sites and branch sites and that's how the various snRNPs recognize them. This interaction is very weak since it depends on only three or four base pairs. It can be even less since there are many splice sites that are not perfect matches to the consensus sequences shown above. The relative lack of significant sequence similarity makes splicing a very error-prone reaction.
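To make the idea of "a few base pairs" concrete, here is a minimal sketch that counts Watson-Crick pairs between a short snRNA segment and a candidate site. The function name and both site sequences are illustrative choices for this example. They are not taken from the consensus figure, and the model ignores G-U wobble pairs and everything the snRNP proteins contribute.

```python
# Count Watson-Crick base pairs between two short RNA segments.
# Toy model: G-U wobble pairs and protein contacts are ignored.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def count_base_pairs(snrna_segment, site):
    """Align snrna_segment (5'->3') antiparallel to site (5'->3') and count pairs."""
    if len(snrna_segment) != len(site):
        raise ValueError("sequences must be the same length")
    return sum((a, b) in PAIRS for a, b in zip(snrna_segment, reversed(site)))


# Illustrative sequences chosen for this example only.
snrna = "ACUUAC"
perfect_site = "GUAAGU"  # fully complementary to snrna
weak_site = "GUGAGC"     # differs from perfect_site at two positions

print(count_base_pairs(snrna, perfect_site))
print(count_base_pairs(snrna, weak_site))
```

With so few positions involved, losing one or two pairs removes a large share of the recognition signal, which fits the point that splicing is error-prone.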

U1 snRNP recognizes 5′ splice sites, U2 snRNP binds to the branch site, and U5 snRNP binds to the 3′ splice site. A more detailed description of the formation of the spliceosome is shown below.







What is a gene, post-ENCODE?

Back in January we had a discussion about the definition of a gene [What is a gene?]. At that time I presented my personal preference for the best definition of a gene.
A gene is a DNA sequence that is transcribed to produce a functional product.
This is a definition that's widely shared among biochemists and molecular biologists but there are competing definitions.

Now, there's a new kid on the block. The recent publication of a slew of papers from the ENCODE project has prompted many of the people involved to proclaim that a revolution is under way. Part of the revolution includes redefining a gene. I'd like to discuss the paper by Mark Gerstein et al. (2007) [What is a gene, post-ENCODE? History and updated definition] to see what this revolution is all about.

The ENCODE project is a large scale attempt to analyze and annotate the human genome. The first results focus on about 1% of the genome spread out over 44 segments. These results have been summarized in an extraordinarily complex Nature paper with massive amounts of supplementary material (The Encode Project Consortium, 2007). The Nature paper is supported by dozens of other papers in various journals. Ryan Gregory has a list of blog references to these papers at ENCODE links.

I haven't yet digested the published results. I suspect that, as for most bloggers, there's just too much there for me to comment on without investing a great deal of time and effort. I'm going to give it a try but it will require a lot of introductory material, beginning with the concept of alternative splicing, which is this week's theme.

The most widely publicized result is that most of the human genome is transcribed. It might be more correct to say that the ENCODE Project detected RNAs that are either complementary to much of the human genome or lead to the inference that much of it is transcribed.

This is not news. We've known about this kind of data for 15 years and it's one of the reasons why many scientists over-estimated the number of human genes in the decade leading up to the publication of the human genome sequence. The importance of the ENCODE project is that a significant fraction of the human genome has been analyzed in detail (1%) and that the group made some serious attempts to find out whether the transcripts really represent functional RNAs.

My initial impression is that they have failed to demonstrate that the rare transcripts of junk DNA are anything other than artifacts or accidents. It's still an open question as far as I'm concerned.

It's not an open question as far as the members of the ENCODE Project are concerned and that brings us to the new definition of a gene. Here's how Gerstein et al. (2007) define the problem.
The ENCODE consortium recently completed its characterization of 1% of the human genome by various high-throughput experimental and computational techniques designed to characterize functional elements (The ENCODE Project Consortium 2007). This project represents a major milestone in the characterization of the human genome, and the current findings show a striking picture of complex molecular activity. While the landmark human genome sequencing surprised many with the small number (relative to simpler organisms) of protein-coding genes that sequence annotators could identify (~21,000, according to the latest estimate [see www.ensembl.org]), ENCODE highlighted the number and complexity of the RNA transcripts that the genome produces. In this regard, ENCODE has changed our view of "what is a gene" considerably more than the sequencing of the Haemophilus influenza and human genomes did (Fleischmann et al. 1995; Lander et al. 2001; Venter et al. 2001). The discrepancy between our previous protein-centric view of the gene and one that is revealed by the extensive transcriptional activity of the genome prompts us to reconsider now what a gene is.
Keep in mind that I personally reject the premise and I don't think I'm alone. As far as I'm concerned, the "extensive transcriptional activity" could be artifact and I haven't had a "protein-centric" view of a gene since I learned about tRNA and ribosomal RNA genes as an undergraduate in 1967. Even if the ENCODE results are correct my preferred definition of a gene is not threatened. So, what's the fuss all about?

Regulatory Sequences
Gerstein et al. are worried because many definitions of a gene include regulatory sequences. Their results suggest that many genes have multiple large regions that control transcription and these may be located at some distance from the transcription start site. This isn't a problem if regulatory sequences are not part of the gene, as in the definition quoted above (a gene is a transcribed region). As a matter of fact, the fuzziness of control regions is one reason why most modern definitions of a gene don't include them.
Overlapping Genes
According to Gerstein et al.
As genes, mRNAs, and eventually complete genomes were sequenced, the simple operon model turned out to be applicable only to genes of prokaryotes and their phages. Eukaryotes were different in many respects, including genetic organization and information flow. The model of genes as hereditary units that are nonoverlapping and continuous was shown to be incorrect by the precise mapping of the coding sequences of genes. In fact, some genes have been found to overlap one another, sharing the same DNA sequence in a different reading frame or on the opposite strand. The discontinuous structure of genes potentially allows one gene to be completely contained inside another one’s intron, or one gene to overlap with another on the same strand without sharing any exons or regulatory elements.
We've known about overlapping genes ever since the sequences of the first bacterial operons and the first phage genomes were published. We've known about all the other problems for 20 years. There's nothing new here. No definition of a gene is perfect—all of them have exceptions that are difficult to squeeze into a one-size-fits-all definition of a gene. The problem with the ENCODE data is not that they've just discovered overlapping genes, it's that their data suggests that overlapping genes in the human genome are more the rule than the exception. We need more information before accepting this conclusion and redefining the concept of a gene based on analysis of the human genome.
Splicing
Splicing was discovered in 1977 (Berget et al. 1977; Chow et al. 1977; Gelinas and Roberts 1977). It soon became clear that the gene was not a simple unit of heredity or function, but rather a series of exons, coding for, in some cases, discrete protein domains, and separated by long noncoding stretches called introns. With alternative splicing, one genetic locus could code for multiple different mRNA transcripts. This discovery complicated the concept of the gene radically.
Perhaps back in 1978 the discovery of splicing prompted a re-evaluation of the concept of a gene. That was almost 30 years ago and we've moved on. Now, many of us think of a gene as a region of DNA that's transcribed and this includes exons and introns. In fact, the modern definition doesn't have anything to do with proteins.

Alternative splicing does present a problem if you want a rigorous definition with no fuzziness. But biology isn't like that. It's messy and you can't get rid of fuzziness. I think of a gene as the region of DNA that includes the longest transcript. Genes can produce multiple protein products by alternative splicing. (The fact that the definition above says "a" functional product shouldn't mislead anyone. That was not meant to exclude multiple products.)

The real problem here is that the ENCODE project predicts that alternative splicing is abundant and complex. They claim to have discovered many examples of splice variants that include exons from adjacent genes as shown in the figure from their paper. Each of the lines below the genome represents a different kind of transcript. You can see that there are many transcripts that include exons from "gene 1" and "gene 2" and another that includes exons from "gene 1" and "gene 4." The combinations and permutations are extraordinarily complex.

If this represents the true picture of gene expression in the human genome, then it would require a radical rethinking of what we know about molecular biology and evolution. On the other hand, if it's mostly artifact then there's no revolution under way. The issue has been fought out in the scientific literature over the past 20 years and it hasn't been resolved to anyone's satisfaction. As far as I'm concerned the data overwhelmingly suggests that very little of that complexity is real. Alternative splicing exists but not the kind of alternative splicing shown in the figure. In my opinion, that kind of complexity is mostly an artifact due to spurious transcription and splicing errors.
Trans-splicing
Trans-splicing refers to a phenomenon where the transcript from one part of the genome is attached to the transcript from another part of the genome. The phenomenon has been known for over 20 years—it's especially common in C. elegans. It's another exception to the rule. No simple definition of a gene can handle it.
Parasitic and mobile genes
This refers mostly to transposons. Gerstein et al say, "Transposons have altered our view of the gene by demonstrating that a gene is not fixed in its location." This isn't true. Nobody has claimed that the location of genes is fixed.
The large amount of "junk DNA" under selection
If a large amount of what we now think of as junk DNA turns out to be transcribed to produce functional RNA (or proteins) then that will be a genuine surprise to some of us. It won't change the definition of a gene as far as I can see.
The paper goes on for many more pages but the essential points are covered above. What's the bottom line? The new definition of an ENCODE gene is:
There are three aspects to the definition that we will list below, before providing the succinct definition:
  1. A gene is a genomic sequence (DNA or RNA) directly encoding functional product molecules, either RNA or protein.
  2. In the case that there are several functional products sharing overlapping regions, one takes the union of all overlapping genomic sequences coding for them.
  3. This union must be coherent—i.e., done separately for final protein and RNA products—but does not require that all products necessarily share a common subsequence.
This can be concisely summarized as:
The gene is a union of genomic sequences encoding a coherent set of potentially overlapping functional products.
On the surface this doesn't seem to be much different from the definition of a gene as a transcribed region but there are subtle differences. The authors describe how their new definition works using a hypothetical example.

How the proposed definition of the gene can be applied to a sample case. A genomic region produces three primary transcripts. After alternative splicing, products of two of these encode five protein products, while the third encodes for a noncoding RNA (ncRNA) product. The protein products are encoded by three clusters of DNA sequence segments (A, B, and C; D; and E). In the case of the three-segment cluster (A, B, C), each DNA sequence segment is shared by at least two of the products. Two primary transcripts share a 5' untranslated region, but their translated regions D and E do not overlap. There is also one noncoding RNA product, and because its sequence is of RNA, not protein, the fact that it shares its genomic sequences (X and Y) with the protein-coding genomic segments A and E does not make it a co-product of these protein-coding genes. In summary, there are four genes in this region, and they are the sets of sequences shown inside the orange dashed lines: Gene 1 consists of the sequence segments A, B, and C; gene 2 consists of D; gene 3 of E; and gene 4 of X and Y. In the diagram, for clarity, the exonic and protein sequences A and E have been lined up vertically, so the dashed lines for the spliced transcripts and functional products indicate connectivity between the proteins sequences (ovals) and RNA sequences (boxes). (Solid boxes on transcripts) Untranslated sequences, (open boxes) translated sequences.
This isn't much different from my preferred definition except that I would have called the region containing exons C and D a single gene with two different protein products. Gerstein et al (2007) split it into two different genes.
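
The grouping rule in the Gerstein et al. definition can be made concrete with a short Python sketch. This is only an illustration under stated assumptions, not anything taken from the paper: the coordinates, the product names, and the assignment of segments A, B, and C to three protein products are all hypothetical, chosen so that each of those segments is shared by two products as the caption describes. Products are linked only when they are of the same kind (protein or RNA) and share genomic sequence, and each gene is the union of the segments in one linked group.

```python
from itertools import combinations

# Hypothetical half-open coordinates for the segments named in the caption.
# X and Y are placed inside A and E to mimic the shared genomic sequence.
SEGMENTS = {
    "A": (100, 200), "B": (300, 400), "C": (500, 600),
    "D": (700, 800), "E": (900, 1000),
    "X": (150, 200), "Y": (900, 950),
}

# (name, kind, segments). The split of A, B, C among products is an assumption.
PRODUCTS = [
    ("protein1", "protein", ["A", "B"]),
    ("protein2", "protein", ["B", "C"]),
    ("protein3", "protein", ["A", "C"]),
    ("protein4", "protein", ["D"]),
    ("protein5", "protein", ["E"]),
    ("ncRNA1", "rna", ["X", "Y"]),
]


def overlaps(a, b):
    """True if two half-open intervals share at least one position."""
    return a[0] < b[1] and b[0] < a[1]


def products_overlap(segs_a, segs_b):
    """True if any segment of one product overlaps any segment of the other."""
    return any(overlaps(SEGMENTS[s], SEGMENTS[t]) for s in segs_a for t in segs_b)


def call_genes(products):
    """Group products of the same kind that overlap, then take the union."""
    genes = []
    for kind in sorted({k for _, k, _ in products}):
        same_kind = [p for p in products if p[1] == kind]
        parent = list(range(len(same_kind)))  # union-find over products

        def find(i):
            while parent[i] != i:
                parent[i] = parent[parent[i]]
                i = parent[i]
            return i

        for i, j in combinations(range(len(same_kind)), 2):
            if products_overlap(same_kind[i][2], same_kind[j][2]):
                parent[find(i)] = find(j)

        groups = {}
        for i, (_, _, segs) in enumerate(same_kind):
            groups.setdefault(find(i), set()).update(segs)
        genes.extend((kind, sorted(segs)) for segs in groups.values())
    return sorted(genes, key=lambda g: g[1])


GENES = call_genes(PRODUCTS)
for kind, segs in GENES:
    print(kind, segs)
```

With these made-up inputs the sketch reports four genes: A, B, and C together, D alone, E alone, and the RNA gene made of X and Y. The three proteins built from A, B, and C end up in one gene even though no single segment is common to all of them, and X and Y stay separate from A and E because protein and RNA products are never linked. Changing the linking rule, for example to group products that come from the same transcribed region, is the kind of change that would merge segments this definition keeps apart.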

The bottom line is that, in spite of all the rhetoric, the "new" definition of a gene isn't much different from the old one that some of us have been using for a couple of decades. It's different from some old definitions that other scientists still prefer, but this isn't revolutionary. That discussion has been going on since 1980.

Let me close by making one further point. The "data" produced by the ENCODE consortium is intriguing but it would be a big mistake to conclude that everything they say is a proven fact. Skepticism about the relevance of those extra transcripts is quite justified as is skepticism about the frequency of alternative splicing.


Gerstein, M.B., Bruce, C., Rozowsky, J.S., Zheng, D., Du, J., Korbel, J.O., Emanuelsson, O., Zhang, Z.D., Weissman, S. and Snyder, M. (2007) What is a gene, post-ENCODE? History and updated definition. Genome Res. 17:669-681.

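The ENCODE Project Consortium (2007) Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project. Nature 447:799-816. [PDF]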

[Hat Tip: Michael White at Adaptive Complexity]

Monday, June 18, 2007

Gene Genie #9

 
The latest issue of the carnival Gene Genie has just been posted on DNAdirect talk.

Skepticism About "Out-of-Africa"

 
Alan R. Templeton has long been a critic of those who would over-interpret the genetic data on human origins. He's not alone. There are a surprisingly large number of biologists who refuse to jump on the "Out-of-Africa" bandwagon. This group does not get the same amount of publicity as the advocates of a recent (<100,000 years) wave of migration out of Africa. I think it's because skepticism of a new theory is seen as sour grapes. That, plus the fact that it's hard to publish criticisms of work that's already in the scientific literature.

An upcoming issue of the journal Evolution will contain a review by Templeton on human origins and the Out-of-Africa theory. Right now it's only available online [GENETICS AND RECENT HUMAN EVOLUTION]. Here's the abstract:
Starting with "mitochondrial Eve" in 1987, genetics has played an increasingly important role in studies of the last two million years of human evolution. It initially appeared that genetic data resolved the basic models of recent human evolution in favor of the "out-of-Africa replacement" hypothesis in which anatomically modern humans evolved in Africa about 150,000 years ago, started to spread throughout the world about 100,000 years ago, and subsequently drove to complete genetic extinction (replacement) all other human populations in Eurasia. Unfortunately, many of the genetic studies on recent human evolution have suffered from scientific flaws, including misrepresenting the models of recent human evolution, focusing upon hypothesis compatibility rather than hypothesis testing, committing the ecological fallacy, and failing to consider a broader array of alternative hypotheses. Once these flaws are corrected, there is actually little genetic support for the out-of-Africa replacement hypothesis. Indeed, when genetic data are used in a hypothesis-testing framework, the out-of-Africa replacement hypothesis is strongly rejected. The model of recent human evolution that emerges from a statistical hypothesis-testing framework does not correspond to any of the traditional models of human evolution, but it is compatible with fossil and archaeological data. These studies also reveal that any one gene or DNA region captures only a small part of human evolutionary history, so multilocus studies are essential. As more and more loci became available, genetics will undoubtedly offer additional insights and resolutions of human evolution.

[Hat Tip: Gene Expression]