
Thursday, June 21, 2007

Tangled Bank #82

 
The latest version of Tangled Bank has been posted on Greg Laden's blog.

This is a review by Derwin Darwin II with the title: Various Proofs of the Theory of Evolution presented in original form by my uncle, the honorable Charles Darwin in the year 1859 and in subsequent years.

Very entertaining. Thanks Greg.

Wednesday, June 20, 2007

Evolutionary Biologists Flunk Religion Poll

 
In a follow-up to previous studies, Gregory W. Graffin and William B. Provine surveyed prominent evolutionary biologists to find out what they thought about religion. The results are summarized in the latest issue of American Scientist [Evolution, Religion and Free Will]. (Click on the figure to see a larger version where you can read the fine print.)
Our study was the first poll to focus solely on eminent evolutionists and their views of religion. As a dissertation project, one of us (Graffin) prepared and sent a detailed questionnaire on evolution and religion to 271 professional evolutionary scientists elected to membership in 28 honorific national academies around the world, and 149 (55 percent) answered the questionnaire. All of them listed evolution (specifically organismic), phylogenetics, population biology/genetics, paleontology/paleoecology/paleobiology, systematics, organismal adaptation or fitness as at least one of their research interests. Graffin also interviewed 12 prestigious evolutionists from the sample group on the relation between modern evolutionary biology and religion.

A primary complaint of scientists who answered the earlier polls was that the concept of God was limited to a "personal God." Leuba considered an impersonal God as equivalent to pure naturalism and classified advocates of deism as nonbelievers. We designed the current study to distinguish theism from deism, that is to say a "personal God" (theism) versus an "impersonal God" who created the universe, all forces and matter, but does not intervene in daily events (deism). An evolutionist can be considered religious, in our poll, if he calls himself a deist. ...

Perhaps the most revealing question in the poll asked the respondent to choose the letter that most closely represented where her views belonged on a ternary diagram. The great majority of the evolutionists polled (78 percent) chose A, billing themselves as pure naturalists. Only two out of 149 described themselves as full theists (F), two as more theist than naturalist (D) and three as theistic naturalists (B). Taken together, the advocacy of any degree of theism is the lowest percentage measured in any poll of biologists' beliefs so far (4.7 percent).

No evolutionary scientists in this study chose pure deism (I), but the deistic side of the diagram is heavy compared to the theistic side. Eleven respondents chose C, and 10 chose other regions on the right side of the diagram (E, H or J). Most evolutionary scientists who billed themselves as believers in God were deists (21) rather than theists (7).
When asked directly whether they believe in God, almost 80% said no. I wonder how many of them think of themselves as atheists as opposed to agnostics?
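The percentages in the quoted passage can be checked from the raw counts it reports. Here's a minimal sketch in Python; the variable names and the `percent` helper are my own labels, and the only numbers used are the ones given in the quote.

```python
# Counts as reported in the quoted passage from Graffin and Provine.
SENT = 271       # questionnaires sent
ANSWERED = 149   # questionnaires returned

# Regions of the ternary diagram chosen by believers (labels from the quote).
theists = {"F": 2, "D": 2, "B": 3}
deists = {"C": 11, "E/H/J": 10}

def percent(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)

response_rate = percent(ANSWERED, SENT)
theist_share = percent(sum(theists.values()), ANSWERED)
deist_share = percent(sum(deists.values()), ANSWERED)

print(response_rate, theist_share, deist_share)
```

The theist share comes out at the 4.7 percent quoted in the article, and the 21 deists outnumber the 7 theists three to one.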

Here's the bad news. 79% of these eminent evolutionary biologists say they believe in free will (option A on the question). Even the authors of the study were surprised by that one.
We anticipated a much higher percentage for option B and a low percentage for A, but got just the opposite result. One of us (Provine) has been thinking about human free will for almost 40 years, has read most of the philosophical literature on the subject and polls his undergraduate evolution class (200-plus students) each year on belief in free will. Year after year, 90 percent or more favor the idea of human free will for a very specific reason: They think that if people make choices, they have free will. The professional debate about free will has moved far from this position, because what counts is whether the choice is free or determined, not whether human beings make choices. People and animals both certainly choose constantly. Comments from the evolutionists suggest that they were equating human choice and human free will. In other words, although eminent, our respondents had not thought about free will much beyond the students in introductory evolution classes. Evolutionary biology is increasingly applied to psychology. Belief in free will adds nothing to the science of human behavior.
There's one other surprise. 72% think that religion is part of evolution—it's an adaptation. One can only wonder what these evolutionary biologists think of themselves. Are they able to overcome their deterministic predisposition to God or are they mutants who lack the gene(s)? Maybe it explains why they believe in free will?

[Hat Tip: Denyse O'Leary]

Denyse O'Leary Has Advice for the Fans of Francis Collins

 
From Me?: Something against Francis Collins? No!.
On the other hand, I admit to deep disappointment in the intellectual substance of Collins’ arguments, which I unpack in the multipart review at Access Research Network.

Note to all, especially Collins fans: C. S. Lewis is not a security blanket, and the debate over the origin of free will, morality, altruism, and consciousness has moved on from his day. Today's atheist is not usually a genial, classical God-denier; he is a radical materialist who honestly believes that we are all just robots replicating our selfish genes. And he cannot wait to get his gospel onto the curriculum of publicly funded schools, as "evolutionary psychology," forcing everyone's nose into his nonsense.
Hmmm ... I don't know of very many radical materialist atheists who fall for evolutionary psychology. I wonder who she could be talking about? Maybe it's PZ?

Nobel Laureates: Richard Roberts and Phillip Sharp

 
The Nobel Prize in Physiology or Medicine 1993.

"for their discoveries of split genes"



Richard J. Roberts (1943 - ) and Phillip A. Sharp (1944 - ) received the Nobel Prize in Physiology or Medicine for their discovery of interrupted genes and splicing in eukaryotes [see RNA Splicing: Introns and Exons and Monday's Molecule #31].

Roberts and Sharp discovered that the genes in adenovirus were split into various segments that were combined during RNA processing. The results started to become widely known in 1975-76 and the key papers were published in 1977. Later this gene organization was found to be common in chromosomal eukaryotic genes. Unlike many Nobel Prize discoveries, this one really was revolutionary. Here's the presentation speech by Professor Bertil Daneholt of the Nobel Assembly of the Karolinska Institute.
Your Majesties, Your Royal Highnesses, Ladies and Gentlemen,

Why do children resemble their parents? This question has probably always fascinated humans, but not until the advent of natural science have we arrived at an increasingly satisfactory answer.

In the middle of the last century, the Austrian monk Gregor Mendel conducted his famous breeding experiments with the garden pea. He concluded that every trait of an individual plant is determined by a set of two genes, one obtained from each parental plant. To Mendel a gene was an abstract concept, which he used to interpret his breeding experiments. He had no idea of the physical properties of genes.

Only in the mid-1940s could it be established that in terms of chemistry, genetic material is composed of the nucleic acid DNA. About ten years later the double helical structure of DNA was revealed. Ever since then, progress within the field of molecular biology has been very rapid, and several Nobel prizes have been awarded in this area of research.

Initially, genetic material was studied mainly in simple organisms, particularly in bacteria and bacterial viruses. It was shown that a gene occurs in the form of a single continuous segment of the long, thread-like DNA, and it was generally assumed that the genes in all organisms looked this way. Therefore, it was a scientific sensation when this year's Nobel Laureates, Richard Roberts and Phillip Sharp, in 1977, independently of each other, observed that a gene in higher organisms could be present in the genetic material as several distinct and separate segments. Such a gene resembles a mosaic. Both Roberts and Sharp analyzed an upper respiratory virus, which is particularly suitable for studies of the genetic material in complex organisms. It soon became apparent that most genes in higher organisms, including ourselves, exhibited this mosaic structure.

Roberts' and Sharp's discovery opened up a new perspective on evolution, that is, on how simple organisms develop into more complex ones. Earlier it was believed that genes evolve mainly through the accumulation of small discrete changes in the genetic material. But their mosaic gene structure also permits higher organisms to restructure genes in another, more efficient way. This is because during the course of evolution, gene segments - the individual pieces of the mosaic - are regrouped in the genetic material, which creates new mosaic patterns and hence new genes. This reshuffling process presumably explains the rapid evolution of higher organisms.

Roberts and Sharp also predicted that a specific genetic mechanism is required to enable split genes to direct the synthesis of proteins and thereby to determine the properties of the cell. Researchers had known for many years that a gene contains detailed instructions on how to build a protein. This instruction is first copied from DNA to another type of nucleic acid, known as messenger RNA. Subsequently, the RNA instruction is read, and the protein is synthesized. What Roberts and Sharp were now stating was that the messenger RNA in higher organisms has to be edited. The required process, called splicing, resembles the work that a film editor performs: the unedited film is scrutinized, the superfluous parts are cut out and the remaining ones are joined to form the completed film. Messenger RNA treated in this manner contains only those parts that match the gene segments. It later turned out that the same parts of the original messenger RNA are not always saved during the editing- there are choices. This implies that splicing can regulate the function of the genetic material in a previously unknown way.

Roberts' and Sharp's discovery also helps us understand how diseases arise. One example is a form of anemia called thalassemia, which is due to inherited defects in the genetic material. Several of these defects cause errors in the editing process during splicing; thus, an abnormal messenger RNA is formed and subsequently also a protein that functions poorly or not at all.

The discovery of split genes was revolutionary, triggering an explosion of new scientific contributions. Today this discovery is of fundamental importance for research in biology as well as in medicine.

Dr. Richard Roberts and Dr. Phillip Sharp,

Your discovery of split genes led to the prediction of a new genetic process, that of RNA splicing. The discovery also changed our view of how genes in higher organisms develop during evolution. On behalf of the Nobel Assembly of the Karolinska Institute I wish to convey to you our warmest congratulations, and I now ask you to step forward to receive the Nobel Prize from the hands of His Majesty the King.

What Does the "Support Our Troops" Ribbon mean to You?

 
Since last October emergency vehicles in Toronto have been displaying a decal in support of our troops in Afghanistan. The decals were placed on the vehicles at the request of firefighters and paramedics, whose unions are strong supporters of the soldiers. The original deal was that the decals would stay on for one year and then be removed when the vehicles came in for routine maintenance this Fall.

The issue has turned into a hot political fight that will be decided today at a City Council meeting [Time limit for 'Support Our Troops' ribbons is up].

As you might imagine, there are some city councilors who want the decals to stay on the ambulances and fire trucks.
Some councilors believe the decision to remove the decals is a black mark on the city.

"I was stunned this morning to hear on the radio that some official at the city had ordered emergency services, particularly ambulances, to take off the decal that supports our troops in Afghanistan," city councilor Brian Ashton told CTV News on Tuesday.

"These decals are on there and it makes a very strong statement. To take them off, Toronto is the largest city, would just be an outrage. It would be a black eye on the reputation of our city," Ashton said.
It should also come as no surprise that some councilors want to stick to the original agreement and remove the decals in September.
Coun. Janet Davis said just as many councillors want to see the decals removed as those who support their presence on emergency vehicles.

Mayor David Miller said while emergency crews should continue to support Canadian troops, the one-year time limit for the decals was enough time.

"It's controversial on both sides. There are people who see it as support for the troops and there are people who see it as support for war," Miller said.
I'm one of those who believe that the "Support the Troops" ribbon is a political statement. I don't know very many people who are opposed to the war but have this sticker on their car. On the surface it seems like a no-brainer to offer support to our troops while opposing the mission. But, in fact, the term "no-brainer" is quite appropriate in this case. By blindly advertising support for the military you obscure the true difficulty in making rational decisions about how to deploy our army. It's no secret that most people who "support our troops" are also conservatives who are in favour of the war.

The idea that the "Support Our Troops" yellow ribbons are politically neutral is something that only a supporter of the war would say. It's ridiculous. It would be like putting peace symbols on the trucks on the grounds that surely everyone supports peace.

I am very supportive of individual soldiers who are posted to Afghanistan. It's not their fault that our government is insane. They have to follow orders. But that does not mean that I "support our troops" in the way that the decal signifies. As a matter of fact, I do not support our mission in Afghanistan and I would withdraw the troops tomorrow if I could. Every soldier who dies in Afghanistan will have died in vain. That's hardly a way to offer support to our troops.

Having those decals on city vehicles sends the wrong message. For those of us who oppose the war it signifies that the fire fighters and paramedics are on the other side of the issue. That makes me uncomfortable since these are people who deserve my respect and admiration but they're not going to get it if they push a political agenda through advertising on their vehicles.

Take the decals off. It's no place for politics.

Tuesday, June 19, 2007

RNA Splicing: Introns and Exons

Most eukaryotic protein-encoding genes are interrupted. The coding regions are divided into numerous blocks called "exons" and the exons are separated by "introns."

An example is shown below. The triose phosphate isomerase (TPI) gene from maize is composed of 9 exons and 8 introns. (Triose phosphate isomerase is one of the enzymes in the glycolysis/gluconeogenesis pathway.)


The top line is a cartoon representation of the TPI gene with each exon in a different color. The thick gray lines between them represent the introns. The gene is transcribed from left (5′) to right (3′) beginning at the promoter (P). The long primary RNA transcript contains both intron and exon sequences. Subsequent processing of this primary transcript results in modification of the 5′ end by addition of an m7 GTP cap and modification of the 3′ end by addition of adenylate (A) residues to form the poly A tail. More importantly, the introns are spliced out and the exon sequences are fused to form the mature mRNA. This mRNA is then transported to the cytoplasm where it is translated into protein.
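The processing steps described above (capping, splicing, and polyadenylation) can be sketched as simple string operations. This is a toy model only: the sequence, the exon coordinates, the `m7G-` label, and the tail length are all invented for illustration and have nothing to do with the real maize TPI gene.

```python
def splice(primary_transcript, exons):
    """Join the exon segments of a primary transcript, dropping the introns.

    `exons` is a list of (start, end) pairs, 0-based and end-exclusive,
    listed in 5' to 3' order.
    """
    return "".join(primary_transcript[start:end] for start, end in exons)

def process(primary_transcript, exons, poly_a_length=10):
    """Toy model of RNA processing: cap the 5' end, splice, add a poly A tail."""
    cap = "m7G-"                  # placeholder label standing in for the 5' cap
    tail = "A" * poly_a_length    # hypothetical tail length
    return cap + splice(primary_transcript, exons) + tail

# Hypothetical transcript: upper case = exon, lower case = intron.
pre_mrna = "AUGGCC" + "guaaguuucag" + "UUCGAA" + "guaugucuuag" + "GGAUAA"
exon_coords = [(0, 6), (17, 23), (34, 40)]

mature = process(pre_mrna, exon_coords, poly_a_length=5)
print(mature)
```

The point of the sketch is that the exons end up contiguous in the mature mRNA even though they were separated in the primary transcript.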

Note that all the coding regions in the exons (hatched) are contiguous in the mature mRNA. The relationship between the exons and the structure of the protein is shown on the right where the color of each segment of the protein corresponds to the color of the exons in the upper figure. There is no correlation between the exons and any protein domains or motifs. (It used to be thought that exons corresponded to domains in the protein.)

The splicing reaction is complicated. The cell must cleave the primary transcript at each end of the intron while holding on to the flanking exons so the chopped RNA transcript does not come apart. Then the two exons have to be joined together. For protein-encoding genes the splicing reactions are catalyzed by an RNA/protein complex called a spliceosome. In some cases, the introns can be thousands of nucleotides long—much longer than the exons.

Let's look at a simplified version of this reaction. The various components of the spliceosome have to assemble at the 5′ (left) end of an intron and at the 3′ end. There's a third site in the middle called the branch site. All three sites are identified by specific short sequences in the primary transcript as shown below.


These are the consensus sequences for vertebrates, including us. The splice site and branch site sequences in other species are similar but not identical.

In the first step of the splicing reaction, the various components of the spliceosome bind to the 5′ splice site, the 3′ splice site, and the branch site. Then the three complexes interact with each other to draw together the ends of the intron and position them near the branch site. This forms the spliceosome.

The first reaction involves an attack of the 2′ -OH group of the branch point adenylate residue on the 5′ splice site. This forms an intermediate where the branch site A residue is attached to three different ends of the primary transcript. The structure resembles a lariat or lasso. This is the structure depicted in Monday's Molecule #31.

Meanwhile, the 5′ end of the transcript is still bound to the spliceosome. This is important because it's about to be joined to the next exon and the reaction wouldn't work if the 5′ end were released following the first cleavage reaction.

In the next step, the spliceosome catalyzes the attack of the -OH group at the end of the 5′ exon on the 3′ splice site. This results in cleavage of the 3′ intron/exon junction and joining of the 5′ exon to the 3′ exon. The intron sequence (dark brown) is released as a lariat (looped) structure.
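The two steps just described can be written out as a toy model. This sketch only tracks which pieces of RNA exist after each step; it says nothing about the chemistry. The sequence and all the names (`Step1Result`, `step1`, `step2`) are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Step1Result:
    free_5_exon: str          # cut loose, but still held by the spliceosome
    lariat_intermediate: str  # intron (looped at the branch A) plus the 3' exon

def step1(pre_mrna, five_ss, branch_index):
    """Branch site A attacks the 5' splice site: cut between the 5' exon and the intron."""
    if pre_mrna[branch_index] not in "Aa":
        raise ValueError("branch point must be an adenylate residue")
    if not five_ss < branch_index:
        raise ValueError("branch site must lie downstream of the 5' splice site")
    return Step1Result(pre_mrna[:five_ss], pre_mrna[five_ss:])

def step2(result, intron_length):
    """End of the 5' exon attacks the 3' splice site: join the exons, release the lariat."""
    intron = result.lariat_intermediate[:intron_length]
    three_exon = result.lariat_intermediate[intron_length:]
    return result.free_5_exon + three_exon, intron

# Hypothetical pre-mRNA: upper case = exon, lower case = intron.
exon1, intron, exon2 = "AUGGCC", "guaaguacuaacuuucag", "UUCGAA"
branch = len(exon1) + intron.index("cuaac") + 3   # position of the branch A

intermediate = step1(exon1 + intron + exon2, len(exon1), branch)
mrna, released_lariat = step2(intermediate, len(intron))
print(mrna, released_lariat)
```

Notice that after `step1` the 5′ exon is a separate piece. If the spliceosome let go of it at that stage, `step2` would have nothing to join to the 3′ exon.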

The two reactions are known as transesterification reactions because they require the breaking of one strand of RNA and formation of a new ester linkage. The details are not very important. What's important is to recognize that splicing depends on the correct interaction between the components of the spliceosome and the 5′ and 3′ splice site sequences (and the branch site).

These interactions are mediated by small RNAs that are bound to the spliceosome proteins. These RNAs are called small nuclear RNAs (snRNAs) and they're one example of a host of small RNAs produced by non-protein encoding genes. The snRNA/protein complexes are called small nuclear ribonucleoproteins or snRNPs (snurps).

The snRNAs are complementary to the splice sites and branch sites and that's how the various snRNPs recognize them. This interaction is very weak since it depends on only three or four base pairs. It can be even less since there are many splice sites that are not perfect matches to the consensus sequences shown above. The relative lack of significant sequence similarity makes splicing a very error-prone reaction.
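A back-of-the-envelope calculation shows why such short recognition sequences invite errors. Assuming, purely for illustration, that the four bases are equally frequent and independent, any specific sequence of k bases turns up by chance about once every 4^k nucleotides. The motifs below are illustrative stand-ins, not the full consensus sequences.

```python
import random

def expected_spacing(k):
    """Average distance between chance occurrences of one specific k-base sequence,
    assuming equal and independent base frequencies."""
    return 4 ** k

def count_occurrences(sequence, motif):
    """Count (possibly overlapping) occurrences of motif in sequence."""
    width = len(motif)
    return sum(1 for i in range(len(sequence) - width + 1)
               if sequence[i:i + width] == motif)

random.seed(0)  # fixed seed so the sketch is repeatable
random_rna = "".join(random.choice("ACGU") for _ in range(100_000))

for motif in ("GUA", "GUAA"):   # illustrative motifs only
    print(motif, expected_spacing(len(motif)), count_occurrences(random_rna, motif))
```

A three-base match is expected every 64 nucleotides and a four-base match every 256, so an intron thousands of nucleotides long will contain many sequences that look like splice sites.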

U1 snRNP recognizes 5′ splice sites, U2 snRNP binds to the branch site, and U5 snRNP binds to the 3′ splice site. A more detailed description of the formation of the spliceosome is shown below.







What is a gene, post-ENCODE?

Back in January we had a discussion about the definition of a gene [What is a gene?]. At that time I presented my personal preference for the best definition of a gene.
A gene is a DNA sequence that is transcribed to produce a functional product.
This is a definition that's widely shared among biochemists and molecular biologists but there are competing definitions.

Now, there's a new kid on the block. The recent publication of a slew of papers from the ENCODE project has prompted many of the people involved to proclaim that a revolution is under way. Part of the revolution includes redefining a gene. I'd like to discuss the paper by Mark Gerstein et al. (2007) [What is a gene, post-ENCODE? History and updated definition] to see what this revolution is all about.

The ENCODE project is a large scale attempt to analyze and annotate the human genome. The first results focus on about 1% of the genome spread out over 44 segments. These results have been summarized in an extraordinarily complex Nature paper with massive amounts of supplementary material (The Encode Project Consortium, 2007). The Nature paper is supported by dozens of other papers in various journals. Ryan Gregory has a list of blog references to these papers at ENCODE links.

I haven't yet digested the published results. I suspect that like most bloggers there's just too much there to comment on without investing a great deal of time and effort. I'm going to give it a try but it will require a lot of introductory material, beginning with the concept of alternative splicing, which is this week's theme.

The most widely publicized result is that most of the human genome is transcribed. It might be more correct to say that the ENCODE Project detected RNAs that are either complementary to much of the human genome or lead to the inference that much of it is transcribed.

This is not news. We've known about this kind of data for 15 years and it's one of the reasons why many scientists over-estimated the number of human genes in the decade leading up to the publication of the human genome sequence. The importance of the ENCODE project is that a significant fraction of the human genome has been analyzed in detail (1%) and that the group made some serious attempts to find out whether the transcripts really represent functional RNAs.

My initial impression is that they have failed to demonstrate that the rare transcripts of junk DNA are anything other than artifacts or accidents. It's still an open question as far as I'm concerned.

It's not an open question as far as the members of the ENCODE Project are concerned and that brings us to the new definition of a gene. Here's how Gerstein et al. (2007) define the problem.
The ENCODE consortium recently completed its characterization of 1% of the human genome by various high-throughput experimental and computational techniques designed to characterize functional elements (The ENCODE Project Consortium 2007). This project represents a major milestone in the characterization of the human genome, and the current findings show a striking picture of complex molecular activity. While the landmark human genome sequencing surprised many with the small number (relative to simpler organisms) of protein-coding genes that sequence annotators could identify (~21,000, according to the latest estimate [see www.ensembl.org]), ENCODE highlighted the number and complexity of the RNA transcripts that the genome produces. In this regard, ENCODE has changed our view of "what is a gene" considerably more than the sequencing of the Haemophilus influenza and human genomes did (Fleischmann et al. 1995; Lander et al. 2001; Venter et al. 2001). The discrepancy between our previous protein-centric view of the gene and one that is revealed by the extensive transcriptional activity of the genome prompts us to reconsider now what a gene is.
Keep in mind that I personally reject the premise and I don't think I'm alone. As far as I'm concerned, the "extensive transcriptional activity" could be artifact and I haven't had a "protein-centric" view of a gene since I learned about tRNA and ribosomal RNA genes as an undergraduate in 1967. Even if the ENCODE results are correct my preferred definition of a gene is not threatened. So, what's the fuss all about?

Regulatory Sequences
Gerstein et al. are worried because many definitions of a gene include regulatory sequences. Their results suggest that many genes have multiple large regions that control transcription and these may be located at some distance from the transcription start site. This isn't a problem if regulatory sequences are not part of the gene, as in the definition quoted above (a gene is a transcribed region). As a matter of fact, the fuzziness of control regions is one reason why most modern definitions of a gene don't include them.
Overlapping Genes
According to Gerstein et al.
As genes, mRNAs, and eventually complete genomes were sequenced, the simple operon model turned out to be applicable only to genes of prokaryotes and their phages. Eukaryotes were different in many respects, including genetic organization and information flow. The model of genes as hereditary units that are nonoverlapping and continuous was shown to be incorrect by the precise mapping of the coding sequences of genes. In fact, some genes have been found to overlap one another, sharing the same DNA sequence in a different reading frame or on the opposite strand. The discontinuous structure of genes potentially allows one gene to be completely contained inside another one’s intron, or one gene to overlap with another on the same strand without sharing any exons or regulatory elements.
We've known about overlapping genes ever since the sequences of the first bacterial operons and the first phage genomes were published. We've known about all the other problems for 20 years. There's nothing new here. No definition of a gene is perfect—all of them have exceptions that are difficult to squeeze into a one-size-fits-all definition of a gene. The problem with the ENCODE data is not that they've just discovered overlapping genes, it's that their data suggests that overlapping genes in the human genome are more the rule than the exception. We need more information before accepting this conclusion and redefining the concept of a gene based on analysis of the human genome.
Splicing
Splicing was discovered in 1977 (Berget et al. 1977; Chow et al. 1977; Gelinas and Roberts 1977). It soon became clear that the gene was not a simple unit of heredity or function, but rather a series of exons, coding for, in some cases, discrete protein domains, and separated by long noncoding stretches called introns. With alternative splicing, one genetic locus could code for multiple different mRNA transcripts. This discovery complicated the concept of the gene radically.
Perhaps back in 1978 the discovery of splicing prompted a re-evaluation of the concept of a gene. That was almost 30 years ago and we've moved on. Now, many of us think of a gene as a region of DNA that's transcribed and this includes exons and introns. In fact, the modern definition doesn't have anything to do with proteins.

Alternative splicing does present a problem if you want a rigorous definition with no fuzziness. But biology isn't like that. It's messy and you can't get rid of fuzziness. I think of a gene as the region of DNA that includes the longest transcript. Genes can produce multiple protein products by alternative splicing. (The fact that the definition above says "a" functional product shouldn't mislead anyone. That was not meant to exclude multiple products.)

The real problem here is that the ENCODE project predicts that alternative splicing is abundant and complex. They claim to have discovered many examples of splice variants that include exons from adjacent genes as shown in the figure from their paper. Each of the lines below the genome represents a different kind of transcript. You can see that there are many transcripts that include exons from "gene 1" and "gene 2" and another that includes exons from "gene 1" and "gene 4." The combinations and permutations are extraordinarily complex.

If this represents the true picture of gene expression in the human genome, then it would require a radical rethinking of what we know about molecular biology and evolution. On the other hand, if it's mostly artifact then there's no revolution under way. The issue has been fought out in the scientific literature over the past 20 years and it hasn't been resolved to anyone's satisfaction. As far as I'm concerned the data overwhelmingly suggests that very little of that complexity is real. Alternative splicing exists but not the kind of alternative splicing shown in the figure. In my opinion, that kind of complexity is mostly an artifact due to spurious transcription and splicing errors.
Trans-splicing
Trans-splicing refers to a phenomenon where the transcript from one part of the genome is attached to the transcript from another part of the genome. The phenomenon has been known for over 20 years—it's especially common in C. elegans. It's another exception to the rule. No simple definition of a gene can handle it.
Parasitic and mobile genes
This refers mostly to transposons. Gerstein et al. say, "Transposons have altered our view of the gene by demonstrating that a gene is not fixed in its location." This isn't true. Nobody has claimed that the location of genes is fixed.
The large amount of "junk DNA" under selection
If a large amount of what we now think of as junk DNA turns out to be transcribed to produce functional RNA (or proteins) then that will be a genuine surprise to some of us. It won't change the definition of a gene as far as I can see.
The paper goes on for many more pages but the essential points are covered above. What's the bottom line? The new definition of an ENCODE gene is:
There are three aspects to the definition that we will list below, before providing the succinct definition:
  1. A gene is a genomic sequence (DNA or RNA) directly encoding functional product molecules, either RNA or protein.
  2. In the case that there are several functional products sharing overlapping regions, one takes the union of all overlapping genomic sequences coding for them.
  3. This union must be coherent—i.e., done separately for final protein and RNA products—but does not require that all products necessarily share a common subsequence.
This can be concisely summarized as:
The gene is a union of genomic sequences encoding a coherent set of potentially overlapping functional products.
On the surface this doesn't seem to be much different from the definition of a gene as a transcribed region but there are subtle differences. The authors describe how their new definition works using a hypothetical example.

How the proposed definition of the gene can be applied to a sample case. A genomic region produces three primary transcripts. After alternative splicing, products of two of these encode five protein products, while the third encodes for a noncoding RNA (ncRNA) product. The protein products are encoded by three clusters of DNA sequence segments (A, B, and C; D; and E). In the case of the three-segment cluster (A, B, C), each DNA sequence segment is shared by at least two of the products. Two primary transcripts share a 5' untranslated region, but their translated regions D and E do not overlap. There is also one noncoding RNA product, and because its sequence is of RNA, not protein, the fact that it shares its genomic sequences (X and Y) with the protein-coding genomic segments A and E does not make it a co-product of these protein-coding genes. In summary, there are four genes in this region, and they are the sets of sequences shown inside the orange dashed lines: Gene 1 consists of the sequence segments A, B, and C; gene 2 consists of D; gene 3 of E; and gene 4 of X and Y. In the diagram, for clarity, the exonic and protein sequences A and E have been lined up vertically, so the dashed lines for the spliced transcripts and functional products indicate connectivity between the proteins sequences (ovals) and RNA sequences (boxes). (Solid boxes on transcripts) Untranslated sequences, (open boxes) translated sequences.
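The three-part definition and the sample case can be sketched in code. This is my reading of the definition, not the authors' own procedure: products of the same kind (protein or RNA) whose genomic segments overlap are grouped together, and each group's gene is the union of its segments. The coordinates are invented and chosen only to mimic the layout in the caption.

```python
from itertools import combinations

def overlaps(a, b):
    """True if any (start, end) interval of a overlaps any interval of b."""
    return any(s1 < e2 and s2 < e1 for s1, e1 in a for s2, e2 in b)

def merge(intervals):
    """Union of (start, end) intervals, merging any that overlap or touch."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged

def call_genes(products):
    """products: list of (name, kind, intervals). Returns sorted (kind, union) pairs."""
    parent = list(range(len(products)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Group products only when they are the same kind AND share genomic sequence.
    for i, j in combinations(range(len(products)), 2):
        if products[i][1] == products[j][1] and overlaps(products[i][2], products[j][2]):
            parent[find(i)] = find(j)

    groups = {}
    for i, (_, kind, intervals) in enumerate(products):
        groups.setdefault(find(i), (kind, []))[1].extend(intervals)
    return sorted((kind, merge(ivs)) for kind, ivs in groups.values())

# Hypothetical coordinates; X lies inside A and Y lies inside E.
A, B, C, D, E = (100, 200), (300, 400), (500, 600), (700, 800), (900, 1000)
X, Y = (150, 180), (920, 960)
products = [
    ("P1", "protein", [A, B]), ("P2", "protein", [B, C]), ("P3", "protein", [A, C]),
    ("P4", "protein", [D]), ("P5", "protein", [E]),
    ("ncRNA", "rna", [X, Y]),
]
genes = call_genes(products)
print(len(genes), genes)
```

With these inputs the sketch reports four genes, matching the caption: one made of A, B and C, one of D, one of E, and one of X and Y. The three proteins in the first cluster are grouped even though no single segment is common to all three, and the noncoding RNA stays separate because it is a different kind of product.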
This isn't much different from my preferred definition except that I would have called the region containing exons D and E a single gene with two different protein products. Gerstein et al. (2007) split it into two different genes.
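For readers who like to see a definition as a procedure, here is a minimal Python sketch of how the Gerstein et al. rule could be applied to the sample case. It is my own illustration, not code from the paper. The coordinates, the product names (P1 to P5, ncRNA) and the function `call_genes` are all hypothetical. Products are linked only when they are of the same kind (protein or RNA) and their coding segments overlap; each linked group becomes one gene whose sequence is the union of its segments.

```python
from itertools import combinations

def overlaps(a, b):
    """True if half-open intervals a=(start, end) and b share at least one base."""
    return a[0] < b[1] and b[0] < a[1]

def merge(intervals):
    """Merge overlapping or adjacent intervals into a sorted, minimal list."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged

def call_genes(products):
    """products: list of (name, kind, [intervals]); returns one dict per gene."""
    parent = list(range(len(products)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(products)), 2):
        _, kind_i, segs_i = products[i]
        _, kind_j, segs_j = products[j]
        if kind_i != kind_j:
            continue  # protein and RNA products are never pooled
        if any(overlaps(a, b) for a in segs_i for b in segs_j):
            parent[find(i)] = find(j)

    groups = {}
    for i, (name, kind, segs) in enumerate(products):
        g = groups.setdefault(find(i), {"kind": kind, "products": [], "segments": []})
        g["products"].append(name)
        g["segments"].extend(segs)

    genes = [
        {"kind": g["kind"], "products": sorted(g["products"]), "segments": merge(g["segments"])}
        for g in groups.values()
    ]
    return sorted(genes, key=lambda g: g["segments"][0])

# Hypothetical coordinates for the segments named in the figure legend.
A, B, C = (100, 200), (300, 400), (500, 600)
D, E = (800, 900), (1000, 1100)
X, Y = (150, 250), (1050, 1150)  # ncRNA segments overlapping A and E

products = [
    ("P1", "protein", [A, B]),
    ("P2", "protein", [B, C]),
    ("P3", "protein", [A, C]),
    ("P4", "protein", [D]),
    ("P5", "protein", [E]),
    ("ncRNA", "rna", [X, Y]),
]

genes = call_genes(products)
for number, gene in enumerate(genes, start=1):
    print(number, gene["kind"], gene["products"], gene["segments"])
```

Under these assumptions the sketch reports four genes, matching the count in the legend. Only sequence that ends up in the final product is used for linking, so a shared untranslated region does not merge two protein products, and the ncRNA stays separate even though its segments overlap A and E.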

The bottom line is that in spite of all the rhetoric the "new" definition of a gene isn't much different from the old one that some of us have been using for a couple of decades. It's different from some old definitions that other scientists still prefer but this isn't revolutionary. That discussion has already been going on since 1980.

Let me close by making one further point. The "data" produced by the ENCODE consortium is intriguing but it would be a big mistake to conclude that everything they say is a proven fact. Skepticism about the relevance of those extra transcripts is quite justified as is skepticism about the frequency of alternative splicing.


Gerstein, M.B., Bruce, C., Rozowsky, J.S., Zheng, D., Du, J., Korbel, J.O., Emanuelsson, O., Zhang, Z.D., Weissman, S. and Snyder, M. (2007) What is a gene, post-ENCODE? History and updated definition. Genome Res. 17:669-681.

The ENCODE Project Consortium (2007) Nature 447:799-816. [PDF]

[Hat Tip: Michael White at Adaptive Complexity]

Monday, June 18, 2007

Gene Genie #9

 
The latest issue of the carnival Gene Genie has just been posted on DNAdirect talk.

Skepticism About "Out-of-Africa"

 
Alan R. Templeton has long been a critic of those who would over-interpret the genetic data on human origins. He's not alone. There are a surprisingly large number of biologists who refuse to jump on the "Out-of-Africa" bandwagon. This group does not get the same amount of publicity as the advocates of a recent (<100,000 years) wave of migration out of Africa. I think it's because skepticism of a new theory is seen as sour grapes. That, plus the fact that it's hard to publish criticisms of work that's already in the scientific literature.

An upcoming issue of the journal Evolution will contain a review by Templeton on human origins and the Out-of-Africa theory. Right now it's only available online [GENETICS AND RECENT HUMAN EVOLUTION]. Here's the abstract,
Starting with "mitochondrial Eve" in 1987, genetics has played an increasingly important role in studies of the last two million years of human evolution. It initially appeared that genetic data resolved the basic models of recent human evolution in favor of the "out-of-Africa replacement" hypothesis in which anatomically modern humans evolved in Africa about 150,000 years ago, started to spread throughout the world about 100,000 years ago, and subsequently drove to complete genetic extinction (replacement) all other human populations in Eurasia. Unfortunately, many of the genetic studies on recent human evolution have suffered from scientific flaws, including misrepresenting the models of recent human evolution, focusing upon hypothesis compatibility rather than hypothesis testing, committing the ecological fallacy, and failing to consider a broader array of alternative hypotheses. Once these flaws are corrected, there is actually little genetic support for the out-of-Africa replacement hypothesis. Indeed, when genetic data are used in a hypothesis-testing framework, the out-of-Africa replacement hypothesis is strongly rejected. The model of recent human evolution that emerges from a statistical hypothesis-testing framework does not correspond to any of the traditional models of human evolution, but it is compatible with fossil and archaeological data. These studies also reveal that any one gene or DNA region captures only a small part of human evolutionary history, so multilocus studies are essential. As more and more loci become available, genetics will undoubtedly offer additional insights and resolutions of human evolution.

[Hat Tip: Gene Expression]

Skepticism About Evo-Devo

 
The May issue of Evolution contains an article by Hoekstra and Coyne on Evolutionary-Developmental Biology or Evo-Devo. There are many evolutionary biologists who have serious doubts about the claims of evo-devo but these doubts don't often make it into the scientific literature because it's very hard to publish critiques. The Hoekstra and Coyne (2007) article is a welcome contribution to the debate.

Here's the abstract,
An important tenet of evolutionary developmental biology (“evo devo”) is that adaptive mutations affecting morphology are more likely to occur in the cis-regulatory regions than in the protein-coding regions of genes. This argument rests on two claims: (1) the modular nature of cis-regulatory elements largely frees them from deleterious pleiotropic effects, and (2) a growing body of empirical evidence appears to support the predominant role of gene regulatory change in adaptation, especially morphological adaptation. Here we discuss and critique these assertions. We first show that there is no theoretical or empirical basis for the evo devo contention that adaptations involving morphology evolve by genetic mechanisms different from those involving physiology and other traits. In addition, some forms of protein evolution can avoid the negative consequences of pleiotropy, most notably via gene duplication. In light of evo devo claims, we then examine the substantial data on the genetic basis of adaptation from both genome-wide surveys and single-locus studies. Genomic studies lend little support to the cis-regulatory theory: many of these have detected adaptation in protein-coding regions, including transcription factors, whereas few have examined regulatory regions. Turning to single-locus studies, we note that the most widely cited examples of adaptive cis-regulatory mutations focus on trait loss rather than gain, and none have yet pinpointed an evolved regulatory site. In contrast, there are many studies that have both identified structural mutations and functionally verified their contribution to adaptation and speciation. Neither the theoretical arguments nor the data from nature, then, support the claim for a predominance of cis-regulatory mutations in evolution. Although this claim may be true, it is at best premature. 
Adaptation and speciation probably proceed through a combination of cis-regulatory and structural mutations, with a substantial contribution of the latter.

Hoekstra, H.E. and Coyne, J.A. (2007) The locus of evolution: evo devo and the genetics of adaptation. Evolution 61:995-1016.

Monday's Molecule #31

 
Today's molecule is complicated but it makes a lot of sense if you know your basic biochemistry. We don't need a long complicated name this time. It's sufficient to simply describe what you're looking at and why it's significant. You have to identify the key residue to get credit for the answer.

As usual, there's a connection between Monday's molecule and this Wednesday's Nobel Laureate(s). This one is an obvious direct connection. Once you have identified the molecule you should be able to name the Nobel Laureate(s).

The reward (free lunch) goes to the person who correctly identifies the molecule and the reaction and the Nobel Laureate(s). Previous free lunch winners are ineligible for one month from the time they first collected the prize. There are no ineligible candidates for this Wednesday's reward since recent winners (including last week's winner, "Kyo") have declined the prize on the grounds that they live in another country and can't make it for lunch on Thursday (a feeble excuse, in my opinion, haven't you heard of airplanes?).

Comments are now open.

UPDATE: The molecule is the lariat structure of the RNA splicing intermediate. The key residue is the adenylate residue that's joined through its 2′ hydroxyl group to the 5′ end of the intron [see RNA Splicing: Introns and Exons]. The Nobel Laureates are Rich Roberts and Phil Sharp. (See the comments for an interesting anecdote concerning the discovery of this molecule.)

Saturday, June 16, 2007

Cellular Respiration Ninja Enzymes

 
Here's a student video presentation of glycolysis and respiration. It's much better than most [e.g., An Example of High School Biochemistry]. However, there are two errors in the video. The first one is fairly serious. The second one is less serious but it's something we cover in my class and it helps illustrate a fundamental concept about how certain reactions work. The second error was very common in most biochemistry textbooks in the past but it's been eliminated from the majority of 21st century textbooks. Can you spot both errors?



[Hat Tip: Greg Laden]
[Hint: What are the products produced by glycolysis and by the Krebs cycle?]

Friday, June 15, 2007

Penicillin Resistance in Bacteria: After 1960

 
The widespread appearance of penicillin-resistant bacteria by 1960 prompted the introduction of new drugs that could not be degraded by newly evolved β-lactamases [see Penicillin Resistance in Bacteria: Before 1960].

The most important of these new drugs are the cephalosporins, modified β-lactams with bulky side chains at two different positions. These drugs still inhibit the transpeptidases and prevent cell wall formation but because of the bulky side chains they cannot be hydrolyzed by β-lactamases. Thus, they are effective against most of the penicillin-resistant strains that arose before 1960.

Other drugs, such as methicillin, were modified penicillins. They also had modified side chains that prevented degradation by the β-lactamases.

It wasn't long before cephalosporin- and methicillin-resistant strains began to appear in hospitals. As a general rule, these strains were not completely resistant to high doses of the new class of drugs but as time went on the resistant strains became more and more immune to the drugs.

The new version of drug resistance also involves the transpeptidase target, but instead of developing into β-lactamases the transpeptidases evolve into enzymes that can no longer bind the cephalosporins. Usually the development of resistance takes place in several stages.

There are many different transpeptidases in most species of bacteria. They are usually referred to as penicillin-binding proteins or PBPs. Often the first sign of non-lactamase drug resistance is a mutant version of one PBP (e.g., PBP1a) and subsequent development of greater resistance requires the evolution of other PBPs that don't bind the drug. In the most resistant strains there will be one particular PBP (e.g., PBP2a) that is still active at high drug concentrations while the other transpeptidases will be inhibited.

Resistant enzymes have multiple mutations, which explains the slow, stepwise acquisition of drug resistance. An example is shown in the figure. This is PBP1a from Streptococcus pneumoniae (Contreras-Martel et al., 2006) and the mutant amino acids are displayed as gold spheres. Most of the mutations do not affect the binding of the drug but those surrounding the entry to the active site are crucial. The necessary amino acid substitutions are numbered in the figure. You can see that they line the groove where the cephalosporin drug (purple) is bound. The effect of the mutations is to prevent the bulky β-lactam from inhibiting the enzyme. This is a very different form of drug resistance than the evolution of degradation enzymes that characterized the first stage of penicillin-resistant bacteria.


Chambers, H.F. (2003) Solving staphylococcal resistance to beta-lactams. Trends Microbiol. 11:145-148.

Contreras-Martel, C., Job, V., Di Guilmi, A.M., Vernet, T., Dideberg, O. and Dessen, A. (2006) Crystal structure of penicillin-binding protein 1a (PBP1a) reveals a mutational hotspot implicated in beta-lactam resistance in Streptococcus pneumoniae. J. Mol. Biol. 355:684-696.

Livermore, D.M. (2000) Antibiotic resistance in staphylococci. Int. J. Antimicrob. Agents 16:s3-s10.

Penicillin Resistance in Bacteria: Before 1960

 
The Nobel Prize for the discovery and analysis of penicillin was awarded in 1945 [Nobel Laureates: Sir Alexander Fleming, Ernst Boris Chain, Sir Howard Walter Florey]. It was about this time that penicillin became widely available in Europe and North America.

By 1946 6% of Staphylococcus aureus strains were resistant to penicillin. Resistance in other species of bacteria was also detected in the 1940's. By 1960 up to 60% of Staphylococcus aureus strains were resistant with similar levels of resistance reported in other clinically relevant strains causing a wide variety of diseases (Livermore, 2000).

Penicillins are a class of antibiotics with a core structure called a β-lactam. The different types of penicillin have different R groups on one end of the core structure. A typical example of a penicillin is penicillin G [Monday's Molecule #30]. Other common derivatives are ampicillin and amoxicillin.

The original resistance to this entire class of drugs was caused mostly by the evolution of bacterial enzymes that could degrade them before they could block cell wall synthesis. (Recall that bacteria have cell walls and penicillin blocks cell wall synthesis [How Penicillin Works to Kill Bacteria].)
It seems strange that the evolution of penicillin resistance would require a totally new enzyme for degrading the drug. Where did this enzyme come from? And how did it arise so quickly in so many different species?

The degrading enzyme is called penicillinase, β-lactamase, or oxacillinase. They all refer to the same class of enzyme that binds penicillins and then cleaves the β-lactam unit releasing fragments that are inactive. The enzymes are related to the cell wall transpeptidase that is the target of the drug. The inhibition of the transpeptidase is effective because penicillin resembles the natural substrate of the reaction: the dipeptide, D-alanine-D-alanine.

In the normal reaction, D-Ala-D-Ala binds to the enzyme and the peptide bond is cleaved, causing release of one of the D-Ala residues. The other one, which is part of the cell wall peptidoglycan, remains bound to the enzyme. In the second part of the reaction, the peptidoglycan product is transferred from the enzyme to a cell wall crosslinking molecule. This frees the enzyme for further reactions (see How Penicillin Works to Kill Bacteria for more information).

Penicillin binds to the transpeptidase as well and the β-lactam bond is cleaved, resulting in the covalent attachment of the drug to the enzyme. However, unlike the normal substrate, the drug moiety cannot be released from the transpeptidase so the enzyme is permanently inactivated. This leads to disruption of cell wall synthesis and death.

Resistant strains have acquired mutations in the transpeptidase gene that allow the release of the cleaved drug. Thus, the mutant enzyme acts like a β-lactamase by binding penicillins, cleaving them, and releasing the products. Although the β-lactamases evolved from the transpeptidase target enzymes, the sequence similarity between them is often quite low in any given species. This is one of the cases where structural similarity reveals the common ancestry [see the SCOP Family beta-Lactamase/D-ala carboxypeptidase]. It's clear that several different β-lactamases have evolved independently but, in many cases, a particular species of bacteria seems to have picked up a β-lactamase gene by horizontal transfer from another species. The transfer can be mediated by bacteriophage or plasmids.
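The difference between a transpeptidase that is killed by penicillin and a β-lactamase that survives it comes down to whether the covalently attached drug can be released. Here is a toy Python sketch of that idea. It assumes a simple two-state model (free enzyme and acyl-enzyme) and ignores the initial noncovalent binding step. The function name and the rate constants are hypothetical and chosen only for illustration; they are not measured values.

```python
import math

def free_enzyme_fraction(t, k_acyl, k_deacyl):
    """Fraction of enzyme that is free (not acylated) at time t, starting all free.

    Two-state model: free --k_acyl--> acyl-enzyme --k_deacyl--> free.
    k_acyl is a pseudo-first-order rate (drug concentration folded in).
    """
    total = k_acyl + k_deacyl
    if total == 0:
        return 1.0  # no drug reaction at all, enzyme stays free
    steady = k_deacyl / total  # long-time fraction of free enzyme
    return steady + (1.0 - steady) * math.exp(-total * t)

# Hypothetical rate constants in arbitrary units.
# Sensitive transpeptidase: the drug is never released (k_deacyl = 0).
sensitive = free_enzyme_fraction(10.0, k_acyl=1.0, k_deacyl=0.0)
# Beta-lactamase-like mutant: the cleaved drug is released quickly.
lactamase = free_enzyme_fraction(10.0, k_acyl=1.0, k_deacyl=100.0)

print(f"sensitive transpeptidase, free fraction: {sensitive:.5f}")
print(f"beta-lactamase-like enzyme, free fraction: {lactamase:.5f}")
```

With no release step the free enzyme decays toward zero, which is the permanent inactivation described above. Once release is allowed, the enzyme settles at a steady state where most of it is free, and it keeps turning over the drug.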


Livermore, D.M. (2000) Antibiotic resistance in staphylococci. Int. J. Antimicrob. Agents 16:s3-s10.

Thursday, June 14, 2007

Catherine Shaffer Responds to My Comments About Her WIRED Article

 
Over on the WIRED website there's a discussion about the article on junk DNA [One Scientist's Junk Is a Creationist's Treasure]. In the comments section, the author Catherine Shaffer responds to my recent posting about her qualifications [see WIRED on Junk DNA]. She says,
You might be interested to learn that I contacted Larry Moran while working on this article and after reading the archives of his blog. I wanted to ask him to expand upon his assertion that junk DNA disproves intelligent design. His response was fairly brief, did not provide any references, and did not invite further discussion. It's interesting that he's now willing to write a thousand words or so about how wrong I am publicly, but was not able to engage this subject privately with me.
Catherine Shaffer sent me a brief email message where she mentioned that she had read my article on Junk DNA Disproves Intelligent Design Creationism. She wanted to know more about this argument and she wanted references to those scientists who were making this argument. Ms. Shaffer mentioned that she was working on an article about intelligent design creationism and junk DNA.

I responded by saying that the presence of junk DNA was expected according to evolution and that it was not consistent with intelligent design. I also said that, "The presence of large amounts of junk DNA in our genome is a well established fact in spite of anything you might have heard in the popular press, which includes press releases." She did not follow up on my response.
His blog post is inaccurate in a couple of ways. First, I did not make the claim, and was very careful to avoid doing so, that “most” DNA is not junk. No one knows how much is functional and how much is not, and none of my sources would even venture to speculate upon this, not even to the extent of “some” or “most.”
Her article says, "Since the early '70s, many scientists have believed that a large amount of many organisms' DNA is useless junk. But recently, genome researchers are finding that these "noncoding" genome regions are responsible for important biological functions." Technically she did not say that most DNA is not junk. She just strongly implied it.

I find it difficult to believe that Ryan Gregory would not venture to speculate on the amount of junk DNA but I'll let him address the validity of Ms. Shaffer's statement.
Moran also mistakenly attributed a statement to Steven Meyer that Meyer did not make.
I can see why someone might have "misunderstood" my reference to what Meyer said so I've edited my posting to make it clear.
Judmarc and RickRadditz—Here is a link to the full text of the genome biology article on the opossum genome: Regulatory conservation of protein coding and microRNA genes in vertebrates: lessons from the opossum genome. We didn't have space to cover this in detail, but in essence what the researchers found was that upstream intergenic regions were more highly conserved in the possum compared to coding regions, but also represented a greater area of difference between possums and humans.
This appears to be a reference to the paper she was discussing in her article. It wasn't at all clear to me that this was the article she was thinking about in the first few paragraphs of her WIRED article.

Interested readers might want to read the comment by "Andrea" over on the WIRED site. He doesn't pull any punches in demonstrating that Catherine Shaffer failed to understand what the scientific paper was saying. Why am I not surprised? (Recall that this is a science writer who prides herself on being accurate.)
So, yes, this does run counter to the received wisdom, which makes it fascinating. You are right that the discussion of junk vs. nonjunk and conserved vs. nonconserved is much more nuanced, and we really couldn't do it justice in this space. Here is another reference you might enjoy that begins to deconstruct even our idea of what conservation means: “Conservation of RET regulatory function from human to zebrafish without sequence similarity.” Science. 2006 Apr 14;312(5771):276-9. Epub 2006 Mar 23. Revjim—If you have found typographical errors in the copy, please do point them out to us. The advantage of online publication is that we do get a chance to correct these after publication.
Sounds to me like Catherine Shaffer is grasping at straws (or strawmen).
For Katharos and others—I interviewed five scientists for this article. Dr. Francis Collins, Dr. Michael Behe, Dr. Steve Meyers, Dr. T. Ryan Gregory, and Dr. Gill Bejerano. Each one is a gentleman and a credentialed expert either in biology or genetics. I am grateful to all of them for their time and kindness.
I think we all know just how "credentialed" Stephen Meyer is. He has a Ph.D. in the history and philosophy of science. Most of us are familiar with the main areas of expertise of Michael Behe and none of them appear to be science.