
Showing posts sorted by relevance for query domains. Sort by date Show all posts

Saturday, February 03, 2018

What's in Your Genome?: Chapter 5: Regulation and Control of Gene Expression

I'm working (slowly) on a book called What's in Your Genome?: 90% of your genome is junk! The first chapter is an introduction to genomes and DNA [What's in Your Genome? Chapter 1: Introducing Genomes]. Chapter 2 is an overview of the human genome. It's a summary of known functional sequences and known junk DNA [What's in Your Genome? Chapter 2: The Big Picture]. Chapter 3 defines "genes" and describes protein-coding genes and alternative splicing [What's in Your Genome? Chapter 3: What Is a Gene?]. Chapter 4 is all about pervasive transcription and genes for functional noncoding RNAs [What's in Your Genome? Chapter 4: Pervasive Transcription].

Chapter 5 is Regulation and Control of Gene Expression.
Chapter 5: Regulation and Control of Gene Expression

What do we know about regulatory sequences?
The fundamental principles of regulation were worked out in the 1960s and 1970s by studying bacteria and bacteriophage. The initiation of transcription is controlled by activators and repressors that bind to DNA near the 5′ end of a gene. These transcription factors recognize relatively short sequences of DNA (6-10 bp) and their interactions have been well-characterized. Transcriptional regulation in eukaryotes is more complicated for two reasons. First, there are usually more transcription factors and more binding sites per gene. Second, access to binding sites depends on the state of chromatin. Nucleosomes forming higher-order structures create a "closed" domain where DNA binding sites are not accessible. In "open" domains the DNA is more accessible and transcription factors can bind. The transition between open and closed domains is an important additional layer of gene regulation in eukaryotes.
The limitations of genomics
By their very nature, genomics studies look at the big picture. Such studies can tell us a lot about how many transcription factors bind to DNA and how much of the genome is transcribed. They cannot tell you whether the data actually reflects function. For that, you have to take a more reductionist approach and dissect the roles of individual factors on individual genes. But working on single genes can be misleading ... you may miss the forest for the trees. Genomic studies have the opposite problem: they may see a forest where there are no trees.
Regulation and evolution
Much of what we see in evolution, especially when it comes to phenotypic differences between species, is due to differences in the regulation of shared genes. The idea dates back to the 1930s and the mechanisms were worked out mostly in the 1980s. It's the reason why all complex animals should have roughly the same number of genes—a prediction that was confirmed by sequencing the human genome. This is the field known as evo-devo or evolutionary developmental biology.
           Box 5-1: Can complex evolution evolve by accident?
Slightly harmful mutations can become fixed in a small population. This may cause a gene to be transcribed less frequently. Subsequent mutations that restore transcription may involve the binding of an additional factor to enhance transcription initiation. The result is more complex regulation that wasn't directly selected.
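The claim that slightly harmful mutations can become fixed in a small population can be made concrete with a little arithmetic. The sketch below is not from the chapter. It uses Kimura's standard diffusion approximation for the fixation probability of a new mutation in a diploid population, and it assumes that the effective population size equals the census size N. The function name and the example values of s and N are hypothetical choices for illustration.

```python
import math


def fixation_probability(s, n):
    """Approximate fixation probability of a single new mutation.

    s: selection coefficient (negative means harmful).
    n: diploid population size (assumed equal to the effective size).
    Uses Kimura's diffusion approximation with initial frequency 1/(2n).
    """
    if s == 0:
        # A neutral mutation fixes with probability equal to its frequency.
        return 1.0 / (2 * n)
    exponent = -4 * n * s
    if exponent > 700:
        # exp() would overflow; the probability is effectively zero.
        return 0.0
    # (1 - e^(-2s)) / (1 - e^(-4ns)), written with expm1 for accuracy.
    return math.expm1(-2 * s) / math.expm1(exponent)


if __name__ == "__main__":
    s = -0.001  # hypothetical slightly harmful mutation
    for n in (100, 10_000):
        neutral = fixation_probability(0, n)
        harmful = fixation_probability(s, n)
        print(f"N={n}: neutral={neutral:.3g}, slightly harmful={harmful:.3g}")
```

With these illustrative numbers the harmful mutation fixes almost as often as a neutral one when N is 100, but has essentially no chance when N is 10,000, which is the sense in which population size matters for the argument in the box.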
Open and closed chromatin domains
Gene expression in eukaryotes is regulated, in part, by changing the structure of chromatin. Genes in domains where nucleosomes are densely packed into compact structures are essentially invisible. Genes in more open domains are easily transcribed. In some species, the shift between open and closed domains is associated with methylation of DNA and modifications of histones but it's not clear whether these associations cause the shift or are merely a consequence of the shift.
           Box 5-2: X-chromosome inactivation
In females, one of the X chromosomes is preferentially converted to a heterochromatic state where most of the genes are in closed domains. Consequently, many of the genes on the X chromosome are only expressed from one copy, as is the case in males. The partial inactivation of an X chromosome is mediated by a regulatory RNA molecule and this inactivated state is passed on to all subsequent descendants of the original cell.
           Box 5-3: Regulating gene expression by rearranging the genome

In several cases, the regulation of gene expression is controlled by rearranging the genome to bring a gene under the control of a new promoter region. Such rearrangements also explain some developmental anomalies such as the growth of legs on the heads of fruit flies in place of antennae. They also account for many cancers.
ENCODE does it again
Genomic studies carried out by the ENCODE Consortium reported that a large percentage of the human genome is devoted to regulation. What the studies actually showed is that there are a large number of binding sites for transcription factors. ENCODE did not present good evidence that these sites were functional.
Does regulation explain junk?
The presence of huge numbers of spurious DNA binding sites is perfectly consistent with the view that 90% of our genome is junk. The idea that a large percentage of our genome is devoted to transcriptional regulation is inconsistent with everything we know from the studies of individual genes.
           Box 5-4: A thought experiment
Ford Doolittle asks us to imagine the following thought experiment. Take the fugu genome, which is very much smaller than the human genome, and the lungfish genome, which is very much larger, and subject them to the same ENCODE analysis that was performed on the human genome. All three genomes have approximately the same number of genes and most of those genes are homologous. Will the number of transcription factor binding sites be similar in all three species or will the number correlate with the size of the genomes and the amount of junk DNA?
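One way to get a feel for the thought experiment is to ask how many matches to a short binding site you would expect by chance alone. The sketch below is an illustration, not something from the chapter. It assumes a random sequence with equal base frequencies, counts both strands, and treats each site as one exact sequence of 6-10 bp (the range given earlier for transcription factor recognition sequences). The genome sizes are rough round figures chosen for illustration, and the function name is hypothetical.

```python
def expected_chance_sites(genome_bp, site_bp):
    """Expected number of exact matches to one specific site of length
    site_bp in a random genome of genome_bp base pairs.

    Assumes equal base frequencies and counts both DNA strands.
    """
    positions = genome_bp - site_bp + 1
    return 2 * positions / 4 ** site_bp


# Illustrative round numbers (assumptions, not values from the text).
GENOME_SIZES_BP = {
    "fugu": 400_000_000,
    "human": 3_200_000_000,
    "lungfish": 40_000_000_000,
}

if __name__ == "__main__":
    for species, size in GENOME_SIZES_BP.items():
        for k in (6, 8, 10):
            count = expected_chance_sites(size, k)
            print(f"{species:9s} {k:2d} bp site: about {count:,.0f} chance matches")
```

Under these assumptions the number of chance matches scales directly with genome size, which is the outcome you would predict if most binding sites are spurious. Real genomes are not random sequences and real transcription factors tolerate mismatches, so the absolute numbers should not be taken literally.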
Small RNAs—a revolutionary discovery?
Does the human genome contain hundreds of thousands of genes for small non-coding RNAs that are required for the complex regulation of the protein-coding genes?
A “theory” that just won’t die
"... we have refuted the specific claims that most of the observed transcription across the human genome is random and put forward the case over many years that the appearance of a vast layer of RNA-based epigenetic regulation was a necessary prerequisite to the emergence of developmentally and cognitively advanced organisms." (Mattick and Dinger, 2013)
What the heck is epigenetics?
Epigenetics is a confusing term. It refers loosely to the regulation of gene expression by factors other than differences in the DNA. It's generally assumed to cover things like methylation of DNA and modification of histones. Both of these effects can be passed on from one cell to the next following mitosis. That fact has been known for decades. It is not controversial. The controversy is about whether the heritability of epigenetic features plays a significant role in evolution.
           Box 5-5: The Weismann barrier
The Weismann barrier refers to the separation between somatic cells and the germ line in complex multicellular organisms. The "barrier" is the idea that changes (e.g. methylation, histone modification) that occur in somatic cells can be passed on to other somatic cells but in order to affect evolution those changes have to be transferred to the germ line. That's unlikely. It means that Lamarckian evolution is highly improbable in such species.
How should science journalists cover this story?
The question is whether a large part of the human genome is devoted to regulation, thus accounting for an unexpectedly large genome. It's an explanation that attempts to refute the evidence for junk DNA. The issue is complex and very few science journalists are sufficiently informed to do it justice. They should, however, be making more of an effort to inform themselves about the controversial nature of the claims made by some scientists and they should be telling their readers that the issue has not yet been resolved.


Friday, February 09, 2024

Open and closed chromatin domains (and epigenetics)

Gene expression in eukaryotes is influenced by the state of chromatin. Tightly packed nucleosomes inhibit the binding of transcription factors and RNA polymerase so that genes in these regions are "repressed." From time to time these regions loosen up a bit allowing access to transcription complexes and subsequent transcription.

The tightly packed regions are known as closed domains and the accessible regions are open domains. Some authors add an intermediate domain called a permissive domain. This model of eukaryotic gene expression has been around for 50 years and the important mechanisms controlling the switch were worked out in the 1980s. I found a recent review that covers this issue in the context of epigenetics and the image below comes from that paper (Klemm et al., 2019).

Thursday, March 13, 2008

Levels of Protein Structure

There are four levels of protein structure. The primary structure refers to the sequence of amino acid residues in the polypeptide chain written left-to-right from the N-terminus to the C-terminus.

Secondary structures are ordered structures formed by internal hydrogen bonding between amino acid residues. The common secondary structures are the α helix, the β strand, and various loops and turns. The β sheet is often counted as secondary structure although, strictly speaking, it is a motif (see below).


The tertiary structure of a polypeptide is the three-dimensional conformation. Typical proteins contain α helices, β strands, and turns, although there are some proteins that only have α helices and turns, and others that have only β sheets and turns. In many cases, the final structure consists of distinct, independently folded regions called domains.

An example of a protein with multiple domains is shown on the left. This protein is the enzyme pyruvate kinase from cat (Felis domesticus). There are three separate domains indicated by the square brackets on the side. Note that each of the domains is connected to another by a short stretch of unordered polypeptide chain.

In some cases, a particular domain is shared by several proteins suggesting that different proteins can be formed by combining various domains that evolved separately. In other cases, similar domain structures might arise independently by convergent evolution.

Quaternary structure only applies to proteins that are composed of more than one polypeptide chain. Each of the polypeptides is called a subunit. The subunits might be identical, as in the example shown above, or they might be very different as in my favorite enzyme ubiquinone:cytochrome c oxidoreductase (complex III).

There are certain motifs that occur over and over again in different proteins. The helix-loop-helix motif, for example, consists of two α helices joined by a reverse turn. The Greek key motif consists of four antiparallel β strands in a β sheet where the order of the strands along the polypeptide chain is 4, 1, 2, 3. The β sandwich is two layers of β sheet [see β Strands and β Sheets].

The vast majority of motifs do not have a common evolutionary origin in spite of many claims to the contrary. They arise independently and converge on a common stable structure. The fact that these same motifs occur in hundreds of different proteins indicates that there are a limited number of possible folds in the universe of protein structures. The original primitive protein may have been relatively unstructured but over time there will be selection for more and more stable structures. This selection will favor the common motifs.

Larger motifs are often called domain folds because they make up the core of a domain. The parallel twisted sheet is found in many domains that have no obvious relationship other than the fact that they share this very stable core structure. The β barrel structure is found in many membrane proteins. There are dozens of enzymes that have adapted to an α/β barrel. These enzymes are not evolutionarily related. (The β helix is much less common.)


[Figure Credit: The figures are from Horton et al. (2006)]

Horton, H.R., Moran, L.A., Scrimgeour, K.G., Perry, M.D. and Rawn, J.D. (2006) Principles of Biochemistry. Pearson/Prentice Hall, Upper Saddle River, N.J. (USA)

Tuesday, January 14, 2020

The Three Domain Hypothesis: RIP

The Three Domain Hypothesis died about twenty years ago but most people didn't notice.

The original idea was promoted by Carl Woese and his colleagues in the early 1980s. It was based on the discovery of archaebacteria as a distinct clade that was different from other bacteria (eubacteria). It also became clear that some eukaryotic genes (e.g. ribosomal RNA) were more closely related to archaebacterial genes and the original data indicated that eukaryotes formed another distinct group separate from either the archaebacteria or eubacteria. This gave rise to the Three Domain Hypothesis where each of the groups, bacteria (Eubacteria), archaebacteria (Archaea), and eukaryotes (Eucarya, Eukaryota), formed a separate clade that contained multiple kingdoms. These clades were called Domains.

Thursday, March 13, 2008

Examples of Protein Structure

Here's a slideshow of figures from Horton et al. (2006) showing cartoons of various proteins. At first it seems as though every protein is completely different but after a while you begin to notice that there are recurring motifs—especially β sheets in the hydrophobic core of a domain.



More adventuresome readers might want to visit the actual structures on the Protein Data Bank (PDB). Here are the links to each PDB record. References to the people who solved the structure can be found in the PDB record.
  • Human (Homo sapiens) serum albumin [PDB 1BJ5] (class: all-α). This protein has several domains consisting of layered α helices and helix bundles.
  • Escherichia coli cytochrome b562 [PDB 1QPU] (class: all-α). This is a heme-binding protein consisting of a single four-helix bundle domain.
  • Escherichia coli UDP N-acetylglucosamine acyl transferase [PDB 1LXA] (class: all-β). The structure of this enzyme shows a classic example of a β-helix domain.
  • Jack bean (Canavalia ensiformis) concanavalin A [PDB 1CON] (class: all-β). This carbohydrate-binding protein (lectin) is a single-domain protein made up of a large β-sandwich fold.
  • Human (Homo sapiens) peptidylprolyl cis/trans isomerase [PDB 1VBS] (class: all-β). The dominant feature of the structure is a β-sandwich fold.
  • Cow (Bos taurus) γ crystallin [PDB 1A45] (class: all-β). This protein contains two β-barrel domains.
  • Jellyfish (Aequorea victoria) green fluorescent protein [PDB 1GFL] (class: all-β). This is a β-barrel structure with a central α helix. The strands of the sheet are antiparallel.
  • Pig (Sus scrofa) retinol-binding protein [PDB 1AQB] (class: all-β). Retinol binds in the interior of a β-barrel fold.
  • Brewer’s yeast (Saccharomyces carlsbergensis) old yellow enzyme (FMN oxidoreductase) [PDB 1OYA] (class: α/β). The central fold is an α/β barrel with parallel β strands connected by α helices. Two of the connecting regions are highlighted in yellow.
  • Escherichia coli enzyme required for tryptophan biosynthesis [PDB 1PII] (class: α/β). This is a bifunctional enzyme containing two distinct domains. Each domain is an example of an α/β barrel. The left-hand domain contains the indolglycerol phosphate synthetase activity, and the right-hand domain contains the phosphoribosylanthranilate isomerase activity.
  • Pig (Sus scrofa) adenylyl kinase [PDB 3ADK] (class: α/β). This single-domain protein consists of a five-stranded parallel β sheet with layers of α helices above and below the sheet. The substrate binds in the prominent groove between helices.
  • Escherichia coli flavodoxin [PDB 1AHN] (class: α/β). The fold is a five-stranded parallel twisted sheet surrounded by α helices.
  • Human (Homo sapiens) thioredoxin [PDB 1ERU] (class: α/β). The structure of this protein is very similar to that of E. coli flavodoxin except that the five-stranded twisted sheet in the thioredoxin fold contains a single antiparallel strand.
  • Escherichia coli L-arabinose-binding protein [PDB 1ABE] (class: α/β). This is a two-domain protein where each domain is similar to that in E. coli flavodoxin. The sugar L-arabinose binds in the cavity between the two domains.
  • Escherichia coli DsbA (thiol-disulfide oxidoreductase/disulfide isomerase) [PDB 1A23] (class: α/β). The predominant feature of this structure is a (mostly) antiparallel β sheet sandwiched between α helices. Cysteine side chains at the end of one of the α helices are shown (sulfur atoms are yellow).
  • Neisseria gonorrhoeae pilin [PDB 2PIL] (class: α + β). This polypeptide is one of the subunits of the pili on the surface of the bacteria responsible for gonorrhea. There are two distinct regions of the structure: a β sheet and a long α helix.
  • Chicken (Gallus gallus) triose phosphate isomerase [PDB 1TIM]. This protein has two identical subunits with α/β barrel folds.
  • HIV-1 aspartic protease [PDB 1DIF]. This protein has two identical all-β subunits that bind symmetrically. HIV protease is the target of many new drugs designed to treat AIDS patients.
  • Streptomyces lividans potassium channel protein [PDB 1BL8]. This membrane-bound protein has four identical subunits, each of which contributes to a membrane-spanning eight-helix bundle.
  • Bacteriophage MS2 capsid protein [PDB 2MS2]. The basic unit of the MS2 capsid is a trimer of identical subunits with a large β sheet.
  • Human (Homo sapiens) hypoxanthine-guanine phosphoribosyl transferase (HGPRT) [PDB 1BZY]. HGPRT is a tetrameric protein containing two different types of subunit.
  • Rhodopseudomonas viridis photosystem [PDB 1PRC]. This complex, membrane-bound protein has two identical subunits (orange, blue) and two other subunits (purple, green) bound to several molecules of photosynthetic pigments.


Horton, H.R., Moran, L.A., Scrimgeour, K.G., Perry, M.D. and Rawn, J.D. (2006) Principles of Biochemistry. Pearson/Prentice Hall, Upper Saddle River, N.J. (USA)

Monday, September 22, 2014

What are lncRNAs?

Many genes encode proteins and many other genes specify functional RNAs that do not encode proteins. The "RNA genes" include the classic genes for ribosomal RNAs and tRNAs as well as genes for very well-studied RNAs that carry out catalytic roles in the cell. There are a myriad of small RNAs required for things like splicing and regulation. All species, both prokaryotes and eukaryotes, contain genes for a wide variety of functional RNAs.

Eukaryotes seem to have an abundance of genes for small RNAs that perform a number of specific roles in regulation etc. They also have a lot of DNA regions complementary to long noncoding RNAs or lncRNAs (also lincRNA). The definition of long noncoding RNAs seems arbitrary and ambiguous [see Long Noncoding RNA]. Some of them might even encode proteins!

As a general rule, these RNAs are longer than 200 bp and some scientists put the cutoff at 1000 bp. Simple eukaryotes, such as yeast, don't have a lot of lncRNAs but eukaryotes with large complex genomes that are full of junk DNA seem to have a lot of different lncRNAs. The DNA regions¹ that specify these lncRNAs are not conserved. This strongly suggests that many of the lncRNAs are spurious nonfunctional transcripts even though some of them have well-characterized functions [see On the function of lincRNAs].

As usual, we have a definition problem. Are "lncRNAs" just a generic class of long noncoding RNAs that include thousands of nonfunctional molecules that are nothing more than junk RNA? Or, does the term "lncRNA" refer only to the subset that has a function? If it's the latter, then we should probably be referring to "putative" lncRNAs most of the time since the vast majority have not been shown to have a function. (There are about 10,000 of these RNAs in humans.)

I don't see how you can avoid the elephant in the room whenever you talk about lncRNAs. The most important question is NOT whether some of them have a function—that was demonstrated 30 years ago. The important question is whether the majority, or even a substantial minority, have a function.

That's why I was eager to read a short review by Rinn and Guttman in a recent issue of Science (Rinn and Guttman, 2014). They describe two lncRNAs that probably play a role in organizing chromatin within the nucleus (Xist and Neat1, both from mammals). That's cool.

Then they say,
Collectively, these studies suggest that lncRNAs may shape nuclear organization by using the spatial proximity of their transcription locus as a means to target preexisting local neighborhoods. lncRNAs can in turn modify and reshape the organization of these local neighborhoods to establish new nuclear domains by interacting with various protein complexes, including chromatin regulators. Once established, a lncRNA can act to maintain these nuclear domains through active transcription and recruitment of interacting proteins to these domains. While the mechanism for how lncRNAs establish these domains is not fully understood, it is becoming increasingly clear that lncRNAs are important at all levels of nuclear organization—exploiting, driving, and maintaining nuclear compartmentalization.
It sure sounds like they are ascribing a particular function (nuclear organization) to the majority of lncRNAs. But what if 90% of all 10,000 lncRNAs have no function and what if only 100 of the remaining 1,000 functional lncRNAs are involved in nuclear organization? That would mean there are 900 functional lncRNAs that play a different role in the cell.
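The arithmetic behind that hypothetical can be spelled out in a few lines. This is only a sketch of the "what if" scenario in the paragraph above. The 10,000 total, the 90% nonfunctional share, and the 100 nuclear organizers are the numbers used in the text, and they are suppositions, not measurements. The function name is made up for illustration.

```python
def lncrna_breakdown(total, percent_nonfunctional, nuclear_organizers):
    """Split a hypothetical pool of lncRNAs into the categories used
    in the 'what if' scenario. All inputs are suppositions."""
    # Integer arithmetic keeps the counts exact.
    functional = total * (100 - percent_nonfunctional) // 100
    return {
        "nonfunctional": total - functional,
        "functional": functional,
        "nuclear_organization": nuclear_organizers,
        "other_functions": functional - nuclear_organizers,
    }


if __name__ == "__main__":
    counts = lncrna_breakdown(10_000, 90, 100)
    for category, number in counts.items():
        share = 100 * number / 10_000
        print(f"{category:22s} {number:6d}  ({share:g}% of all lncRNAs)")
```

In this scenario the RNAs involved in nuclear organization would be 1% of all lncRNAs and 10% of the functional ones, which is a long way from "important at all levels."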

If that were true, you would write that last paragraph very differently. If you recognize the elephant, you might say something like this ....
Very few lncRNAs have been shown to have a function and there's a very good chance that most of them are spurious transcripts that have no function. However, a small percentage do seem to have a function. In this review we have identified some long noncoding RNAs that appear to be involved in nuclear organization. We propose to call these RNAs "noRNAs" for "nuclear organizer RNAs" on the grounds that once a function has been identified we should stop referring to them as lncRNAs.
But that doesn't sound nearly as exciting as the subtitle of the article, "Long noncoding RNAs may function as organizing factors that shape the cell nucleus" or the quotation that's prominently displayed in a box in the center of the page, "... it is becoming increasingly clear that lncRNAs are important in all levels of nuclear organization—exploiting, driving, and maintaining nuclear compartmentalization." When did science become so dedicated to hype over substance? I must have missed the memo.


1. I use "DNA regions" instead of "genes" because the definition of a gene requires that the gene product be functional. You can't call them genes unless you have demonstrated that the RNA has a function.

Rinn, J. and Guttman, M. (2014) RNA and dynamic nuclear organization. Science 345:1240-1241 [doi: 10.1126/science.1252966]

Saturday, October 13, 2007

HSP90 Structure

 
Hsp90 is a molecular chaperone that plays a role in the folding and assembly of other proteins. Current ideas suggest that it binds to substrate proteins at a "client" site and this either encourages folding into the proper conformation or prevents aggregation. The binding and release of polypeptides is accompanied by hydrolysis of ATP to ADP + Pi.

The bacterial version of hsp90 is called HtpG. Eukaryotes have several different members of the Hsp90 family including one that resides in the endoplasmic reticulum. The cytosolic protein is called Hsp90 and the ER version is called GRP94. Hsp90 is a highly conserved protein showing significant sequence identity between prokaryotic and eukaryotic proteins. The HSP90 family shares many of the same characteristics of the more highly conserved HSP70s [Heat Shock and Molecular Chaperones, Gene HSPA5 Encodes BiP-a Molecular Chaperone, The Evolution of the HSP70 Gene Family].

Daniel Gewirth and his colleagues have just published the complete structure of GRP94 from dog (Canis familiaris). The article appears in Molecular Cell and their structure is on the cover of the journal (Dollins et al. 2007). This is the endoplasmic reticulum version of Hsp90 and it's the only ER version of this protein whose structure is known. Gewirth has been working on the structure since 2001 and he deposited the first structural coordinates of a fragment of this protein back in February 2004. (See the Protein Data Bank (PDB) for the structures. Search for "hsp90".)

The complete protein is a dimer of two identical subunits. Each monomer has three distinct domains: an N-terminal domain (N), a middle domain (M), and a C-terminal domain (C). The ATP hydrolysis site sits at the interface between the N and M domains. The C domains interact to form the dimer. The presumed site of binding for misfolded proteins (the "client" site) is in the V-shaped pocket formed when the C domains come together.

The mechanism of action of Hsp90 proteins is not known although it presumably involves a conformational change induced by ATP hydrolysis. This paper provides an important clue to that mechanism because the dimer structure differs from that seen with the yeast protein (Hsp82) and the E. coli protein (HtpG) (below).
Each of the structures seems to identify a protein in one of the conformations adopted in vivo. The most likely explanation is that the wings of the protein open and close to capture and release the substrate protein. This conformational change is induced by binding and hydrolysis of ATP.

Now that we have a structure for GRP94 from dog we can compare the structures of proteins from different species to see how closely they resemble each other. Let's look at the N-terminal domain to get an idea of how protein structure is conserved over billions of years. The four structures below are, from left to right, yeast (1zwh), dog (2fyp), human (1us7) and E. coli (2ior).


Aren't they remarkably similar! This is exactly the sort of thing you expect with a highly conserved protein.

By the way, anyone can create these images by going to the PDB site [2ior] and viewing the structures with the MBT SimpleViewer. If you haven't already installed this viewer it will automatically install in your browser and it only takes a few minutes.


Dollins, E.D., Warren, J.J., Immormino, R.M. and Gewirth, D.T. (2007) Structures of GRP94-Nucleotide Complexes Reveal Mechanistic Differences between the hsp90 Chaperones. Molec. Cell 28:41-56.

Tuesday, January 19, 2016

Massimo Pigliucci tries to defend accommodationism (again): result is predictable

Massimo Pigliucci is an atheist who thinks that science and religion are compatible because they rule in different domains. He takes a very narrow view of "science"— one that excludes the work of historians and philosophers who are presumably using some other way of knowing. (He doesn't tell us what that is.)

I prefer the broad view of science as a way of knowing that relies on evidence, rational thinking, and healthy skepticism. This broad view of science is not universal—but it's not uncommon. In fact, Alan Sokal has defended this view on Massimo Pigliucci's own blog: [What is science and why should we care? — Part III]. According to this view, any attempt to gain knowledge should employ the scientific worldview. Historians and philosophers should follow this path if they hope to be successful. Pigliucci should know that there are different definitions and any discussion of the compatibility of science and religion must take these differences into account.

Sunday, December 31, 2006

The Three Domain Hypothesis (part 6)

[Part 1][Part 2][Part 3][Part 4][Part 5]

Evolving Biological Organization

Carl Woese discovered archaebacteria and he made them fit into a separate super-kingdom, or “domain.” He is the man behind the claim that archaebacteria are so different from other bacteria that they deserve equal taxonomic status with eukaryotes. Woese is the father of the Three Domain Hypothesis, which not only claims domain-level recognition for archaebacteria, but also claims that eukaryotes descend from a primitive archaebacterium.

Back in 1995, when evidence against the Three Domain Hypothesis was mounting, I made a bet with Steven LaBonne that Woese would recant by January 1997.

I lost that bet, but eight years later Woese has finally come to his senses ... at least partly ....

I’m reviewing articles that appeared in Microbial Phylogeny and Evolution edited by Jan Sapp. Carl Woese’s contribution (“Evolving Biological Organization”) describes his current thoughts about the emergence of defined species from the pool of primitive gene-swapping cells that characterized the early history of life.

Woese’s idea, which has been evolving over a period of ten years, is that primitive life existed as a community of cells that freely exchanged genes. They shared a basic translation system for making proteins, but had little else in common. These cells evolved as a community and not as distinct lineages.

Woese refers to this time as the “progenote era” where the word “progenote” refers to a cell that has not yet established a definite link between a stable genotype and a heritable phenotype. At some point in time, certain cells make the transition from progenote to the founders of a stable lineage. The transition point is known as the “Darwinian threshold.”
The real mystery, however, is how this incredibly simple, unsophisticated, imprecise communal progenote—cells with only ephemeral genealogical traces—evolved to become the complex, precise, integrated, individualized modern cells, which have stable genealogical records. This shift from a primitive genetic free-for-all to modern organisms must by all accounts have been one of the most profound happenings in the whole of evolutionary history. Although we do not yet understand it, the transition needs to be appropriately marked and named. “Darwinian threshold” (or “Darwinian Transition”) seems appropriate: crossing that threshold means entering a new stage, where organismal lineages and genealogies have meaning, where evolutionary descent is largely vertical, and where the evolutionary course can begin to be described by tree representation. (p. 109)
According to Woese, bacteria were the first species to emerge from the pool. From that point onwards, the evolution of bacteria was “Darwinian” and could be represented by a bifurcating tree.

What about archaebacteria and eukaryotes? They emerged later ...
At that point, though, both the archaeal and eukaryotic designs remain in the pre-Darwin progenote, condition: still heavily immersed in the universal HGT field, still in the throes of shaping major features of their representative designs; and so, their evolutions cannot be represented in tree form. In other words, the node in the conventional phylogenetic tree that denotes a common ancestor of the archaea and eukaryotes does not actually exist. The two cell designs are not specifically related; it is just that the tree representation made them “sisters by default.” (p.111)
Woese suggests that the archaebacteria were the next to cross the Darwinian threshold followed by eukaryotes. This explains why archaebacteria have simpler cell components and eukaryotes are more complex. (The precursors of the eukaryote lineage spent more time in the progenote era and accumulated more innovative structures, such as nuclear membranes.)

The progenote community may have spawned other “domains” but these are now extinct, although Woese suggests there are some clues pointing to their previous existence. I assume that the progenote community itself petered out shortly after the emergence of eukaryotes.

This new theory of Woese is not very satisfying. I find the explanation somewhat confusing. Woese is trying to preserve the distinctiveness of the Three Domains while denying that their relationship can be discerned. In other words, he wants to have his cake and eat it too.

In order to defend the monophyletic domains, especially archaebacteria, he has to postulate that each one descends from a single cell, or lineage, that pops out of the progenote community. That’s why each domain has a defined root (i.e. monophyletic). But in order to account for the massive amounts of data that show eukaryotes closer to bacteria than to archaebacteria, he postulates an extended period of evolution where cells exchanged genes in a communal pool. This is not unlike the ideas of many other workers in the field except that for Woese it represents a denial of one of the basic tenets of the original Three Domain Hypothesis.

Woese is very clear about this. He makes the case that the branches at the base of the ribosomal RNA tree are not meaningful. It is wrong to assume that archaebacteria and eukaryotes share a common ancestor. I’ll close this part with an extended quote from Woese to show you just how far he’s willing to go to make the case. (Note how much he has come to agree with people like Ford Doolittle [Part 5] who have been challenging the Three Domain Hypothesis for over a decade.)
Classical biology has also saddled us with the phylogenetic tree, an image the biologist invests with a deep and totally unwarranted significance. The tree is no more than a representational device, but to the biologist it is some God-given truth. Thus, for example, we agonize over how the tree can accommodate horizontal gene transfer events, when it should simply be a matter of when (and to what extent) the evolution course can be usefully represented by a tree diagram. Evolution defines the tree, not the reverse. Tree imagery has locked the biologist into a restricted way of looking at ancestors. It is the tree image, almost certainly, that has caused us to turn Darwin’s conjecture that all organisms might have descended from a simple primordial form into doctrine: the doctrine of common descent. As we shall discuss below, it is also the tree image that has caused biologists (incorrectly) to take the archaea and the eukaryotes to be sister lineages. Much of the current “discussion/debate” about the evolutionary course is couched in the shallow but colorful and cathected rhetoric of “shaking,” “rerooting,” “uprooting,” or “chopping down” the universal phylogenetic tree. (p.102)


Microbial Phylogeny and Evolution: Concepts and Controversies Jan Sapp, ed., Oxford University Press, Oxford UK (2005)

Jan Sapp The Bacterium’s Place in Nature

Norman Pace The Large-Scale Structure of the Tree of Life.

Wolfgang Ludwig and Karl-Heinz Schleifer The Molecular Phylogeny of Bacteria Based on Conserved Genes.

Carl Woese Evolving Biological Organization.

W. Ford Doolittle If the Tree of Life Fell, Would it Make a Sound?.

William Martin Woe Is the Tree of Life.

Radhey Gupta Molecular Sequences and the Early History of Life.

C. G. Kurland Paradigm Lost.


Monday, August 10, 2015

Insulators, junk DNA, and more hype and misconceptions

The folks at Evolution News & Views (sic) can serve a very useful purpose. They are constantly scanning the scientific literature for any hint of evidence to support their claim about junk DNA. Recall that Intelligent Design Creationists have declared that if most of our genome is junk then intelligent design is falsified since one of the main predictions of intelligent design is that most of our genome will be functional.

THEME

Genomes & Junk DNA
They must be getting worried because their most recent posts sound quite desperate. The last one is: The Un-Junk Industry. It quotes a popular press report on a paper published recently in Proceedings of the National Academy of Sciences (USA). The creationists concede that the paper itself doesn't even mention junk DNA but the article in EurekAlert does.

Wednesday, March 05, 2014

The crystal structure of E. coli RNA polymerase σ70 holoenzyme

THEME:
Transcription

The Journal of Biological Chemistry (JBC) publishes a little booklet of the "best of jbc." The latest copy arrived in the mail a few days ago and it alerted me to a paper published one year ago on the structure of Escherichia coli RNA polymerase σ70 holoenzyme (Murakami, 2013).1

The control of transcription initiation is a very important topic in biochemistry and molecular biology and the events in E. coli are the model for transcription initiation in all other species. We know more about RNA polymerase and promoter sites in E. coli than in any other species.

Monday, April 02, 2007

Blood Clotting: The Basics

 
When blood vessels are damaged the leak must be sealed as rapidly as possible to prevent excess blood loss. The first response is formation of a blood clot at the site of damage. The clot is made up of cross-linked fibers made from a protein called fibrin.

The fibrin network is formed from a precursor of fibrin called fibrinogen. Fibrinogen is a large protein that circulates freely in the blood stream. The key to understanding the mechanism of blood clot formation is in understanding how fibrinogen is converted to fibrin and why this only occurs at the site of damage to the lining of the blood vessel.

The activation mechanism is very complicated and highly regulated. The disruption of a blood clot when it is no longer needed is also complicated and highly regulated.

We'll start by looking at the basics of clot formation and dissolution.

Fibrinogen is composed of three different polypeptide chains or subunits. Each one is present in two copies (α2β2γ2). The α, β, and γ chains wrap around each other to form a coiled coil triple helix. Two of these coiled coil complexes are joined head-to-head at the N-terminal ends of the polypeptides to make the complete molecule.


The complete fibrinogen molecule, which is very large as far as proteins go, consists of two types of domains. The central region where the N-terminal ends (N) are located forms the E domain. The two outer regions where the C-terminal ends (C) are found are called the D domains.

Fibrinogen is soluble in blood plasma and the molecules show very little tendency to aggregate to form blood clots. Aggregation is prevented in large part by the N-terminal tails of the α (red) and β (blue) subunits projecting out of the central E domain. Blood clotting is initiated when these tails are chopped off by a specific protein-cutting enzyme (protease) called thrombin. Thrombin converts fibrinogen to fibrin and fibrin spontaneously aggregates to form a clot.

The activation takes place in two stages. In the first stage thrombin cleaves the α subunit releasing fibrinopeptide A (FpA) and creating fibrin. The resulting fibrins can interact through their E domains to form filaments.

In the slower second step, the β subunit is cleaved releasing fibrinopeptide B (FpB) and this permits aggregation of filaments to form complex networks. The resulting clot is called a soft clot. It is converted to a hard clot by Factor XIIIa (the "a" stands for "activated"). FXIIIa catalyzes the formation of covalent cross-links between fibrin molecules. The activated cross-link enzyme (FXIIIa) is formed from an inactive precursor (FXIII) by the action of thrombin. Thrombin not only cleaves fibrinogen, it also cleaves a number of clotting factors, like FXIII, to create active forms.

The initiation of clotting depends on thrombin activity. Thrombin is formed by proteolytic cleavage of inactive prothrombin to create the active protease (thrombin). This activation of prothrombin takes place at the site of injury and it's the way clotting is regulated. We'll cover it later on. You're probably getting the idea—blood clotting is controlled and regulated by a cascade of protein cleavages.

After a clot has formed, it eventually has to be dissolved once the injury is healed. This step is called fibrinolysis. The enzyme that dissolves clots is called plasmin. It chops aggregated fibrin fibers in the coiled coil region thus breaking up the clot. Can you guess how active plasmin is formed?

That's right. It's formed from a precursor called plasminogen by proteolytic cleavage. The enzyme that activates plasminogen is called tissue plasminogen activator (TPA). Plasminogen has a high affinity for fibrin clots but not for free fibrinogen. TPA also binds to fibrin and it only cleaves plasminogen when a complex of fibrin clot+plasminogen+TPA forms. The scheme on the right is a summary of what we've covered so far.
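For readers who like to see the cascade written down, here is a minimal sketch of the activation steps covered so far. The table and the function name `activation_chain` are my own illustrative choices, not anything from the textbooks, and the activator of prothrombin is deliberately left blank because it has not been described yet.

```python
# Each entry maps an active product to (inactive precursor, activating protease).
# Only the steps named in the text are included. The protease that activates
# prothrombin is left as None because it is covered later.
ACTIVATIONS = {
    "thrombin": ("prothrombin", None),
    "fibrin": ("fibrinogen", "thrombin"),
    "FXIIIa": ("FXIII", "thrombin"),
    "plasmin": ("plasminogen", "TPA"),
}


def activation_chain(product):
    """Follow the proteolytic cleavages back from `product`.

    Returns a list of (precursor, product, protease) steps, starting with
    the requested product and walking back through each activating protease
    that is itself listed in ACTIVATIONS.
    """
    chain = []
    current = product
    while current in ACTIVATIONS:
        precursor, protease = ACTIVATIONS[current]
        chain.append((precursor, current, protease))
        if protease is None:
            break
        current = protease
    return chain


if __name__ == "__main__":
    for step in activation_chain("fibrin"):
        print(step)
```

Asking for the chain behind fibrin walks back through thrombin to prothrombin, which is the point of the post: each active protein is made by cleaving an inactive precursor. The sketch ignores the requirement that TPA only acts in a complex with the fibrin clot.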

You may have heard of TPA. It's an enzyme that's given to heart attack patients but it must be delivered as soon as possible in order to prevent death. Here's what The American Heart Association says about TPA.
We strongly urge people to seek medical attention as soon as possible if they believe they're having a stroke or heart attack. The sooner tPA or other appropriate treatment is begun, the better the chances for recovery.

Tissue plasminogen activator (tPA) is a thrombolytic agent (clot-busting drug). It's approved for use in certain patients having a heart attack or stroke. The drug can dissolve blood clots, which cause most heart attacks and strokes.

Studies have shown that tPA and other clot-dissolving agents can reduce the amount of damage to the heart muscle and save lives. However, to be effective, they must be given within a few hours after symptoms begin. Administering tPA or other clot-dissolving agents is complex and is done through an intravenous (IV) line in the arm by hospital personnel.
There are very few textbooks that do a good job of summarizing and simplifying blood clotting. One of the better ones is Textbook of Biochemistry with Clinical Correlations 6th ed. edited by Thomas M. Devlin. This is an excellent book for those interested in biochemistry with a medical slant. I recommend it very highly as a reference text. It ain't cheap.

Devlin, T.M. (ed.) (2006) Textbook of Biochemistry with Clinical Correlations 6th ed., Wiley-Liss, Hoboken, N.J. (USA)

Wolberg, A.S. (2007) Thrombin generation and fibrin clot structure. Blood Reviews Jan. 5 2007. [PubMed]

Monday, August 31, 2015

The origin of eukaryotes and the ring of life

The latest issue of Philosophical Transactions of the Royal Society B (Sept. 26, 2015) is devoted to Eukaryotic origins: progress and challenges. There are 16 articles and anyone interested in this subject has to read all of them.

Many (most) of you aren't going to do that so let me try and summarize the problem and the best current ideas on how to solve it. We begin with the introduction to the issue by the editors, Tom Williams and Martin Embley (Williams and Embley, 2015). Here's the abstract ...

Friday, October 12, 2007

Eugene Koonin and the Biological Big Bang Model of Major Transitions in Evolution

 
Eugene Koonin runs a large laboratory at the National Center for Biotechnology Information (NCBI) in Bethesda, MD (USA) [Evolutionary Genomics Research Group]. His group tends to focus on new ways of analyzing sequence databases and on interesting findings from database mining expeditions.

Koonin and his coworkers are strong supporters of the Three Domain Hypothesis and they usually interpret their data in terms of three domains of life (Bacteria, Archaea, Eukaryotes) with eukaryotes being derived from archaea. As for other evolutionary relationships, Koonin tends to be a "lumper" rather than a "splitter." He will sometimes conclude that two genes or proteins are homologous based on evidence that others find inconclusive.

Creationist Link

Darwin Doubting Heretic Reveals Himself at National Center for Biotechnology
Evolution News & Views
According to Koonin, proteins with similar architecture (folds) are related by common descent even if there's no significant sequence similarity. This is not an uncommon position but it's controversial. Some scientists are not willing to accept that structural similarity alone is sufficient to establish homology. There are too many cases where this assumption leads to awkward and unreasonable implications. Convergence is a possibility that must be entertained.

Koonin has recently published a paper in Biology Direct where he attempts to explain a number of real, or imagined, problems in evolution. In case you're not familiar with this journal, it's a new "open access" journal with an unusual policy. The reviewers must identify themselves and their comments are posted at the end of the paper along with responses from the author(s). Koonin is one of the editors of this journal and he explains the basic philosophy in an editorial [A community experiment with fully open and published peer review].
In Biology Direct, we seek to live by the realities of the 21st century while addressing the issue of information overflow in a constructive fashion and offering a remedy for the ills of anonymous peer review. The journal will publish "essentially anything", even papers that receive three unanimously negative reviews, the only conditions being that three Editorial Board members agree to review (or solicit a review for) the manuscript and that the work qualifies as scientific (not pseudoscientific as is the case for intelligent design or creationism) – and, of course, that the author wants his/her paper published alongside the reviews it receives. Everything in Biology Direct will be completely in the open: the author will invite the referees without any mediation by the Editors or Publisher, and the reviews will be signed and published together with the article. The idea is that any manuscript, even a seriously flawed one, that is interesting enough for three respected scientists to invest their time in reading and reviewing will do more good than harm if published – along with candid reviews written by those scientists. Under the Biology Direct rules, an author is free to solicit as many members of the Editorial Board as s/he has patience for. The philosophy behind this approach is that what really matters is not how many scientists are uninterested in a paper (or even assess it negatively, which could be the underlying reason for declining to review) but that there are some qualified members of the scientific community who do find it worthy of attention. A manuscript will be, effectively, rejected only after the author gives up on finding three reviewers or exhausts the entire Editorial Board. We believe this is fair under the rationale that work that fails, after a reasonable effort from the author, to attract three reviewers is probably of no substantial interest, even if technically solid.
The paper that concerns me here is Koonin (2007) The Biological Big Bang model for the major transitions in evolution. In all cases, the "problems" that Koonin addresses seem to be the rapid and unexplained appearance of novel characteristics, especially those that count as major transitions in evolution. Examples are the origin of cells and the Cambrian explosion.

These events can all be explained by the Biological Big Bang (BBB) model of evolution as Koonin describes in his paper. The concept of the Biological Big Bang is explained in the abstract ...
I propose that most or all major evolutionary transitions that show the "explosive" pattern of emergence of new types of biological entities correspond to a boundary between two qualitatively distinct evolutionary phases. The first, inflationary phase is characterized by extremely rapid evolution driven by various processes of genetic information exchange, such as horizontal gene transfer, recombination, fusion, fission, and spread of mobile elements. These processes give rise to a vast diversity of forms from which the main classes of entities at the new level of complexity emerge independently, through a sampling process. In the second phase, evolution dramatically slows down, the respective process of genetic information exchange tapers off, and multiple lineages of the new type of entities emerge, each of them evolving in a tree-like fashion from that point on.
The BBB model is clearly not gradualistic. Koonin attempts to ally himself with other advocates of episodic change such as Niles Eldredge and Stephen J. Gould. (Where necessary, I have converted Koonin's number references to ones in which the author and date is displayed.)
However, the evolution of life is, obviously, a non-uniform process as described, e.g., in Simpson's classic book [3,4], and captured, more formally, in the punctuated equilibrium concept of Gould and Eldredge [Eldredge and Gould, 1997; Gould and Eldredge, 1993]. Lengthy intervals of gradualist modification are punctuated by brief bursts of innovation that are often called transitions, to emphasize the fact that they culminate in the emergence of new levels of organizational and functional complexity [Maynard Smith and Szathmáry, 1997]
I take issue with Koonin's description of punctuated equilibria (PE). PE is a pattern of evolution that describes speciation events. The morphological changes that characterize a new species are locked in place rapidly during cladogenesis (speciation by splitting). For the most part, these morphological changes are subtle and it often takes an expert to recognize them in the best documented cases. Furthermore, these small changes occur repeatedly in a number of distinct cladogenesis events spread out over tens of millions of years. The result is a very distinctive and well-defined phylogenetic tree that defines the punctuated equilibrium pattern.

Science Link

Examine macroevolutionary concepts carefully
Nick Matzke
PE has nothing to do with the major transitions that Maynard Smith and Szathmáry discuss in their books (The Major Transitions in Evolution, The Origins of Life). Nor is PE related in any way to the big bang problems that Koonin is addressing in this paper. It's difficult to decide whether Koonin misunderstands punctuated equilibria or whether he is just stretching an analogy. I suspect the former.

It is no coincidence that the Biological Big Bang model borrows terminology from cosmology. Koonin is very explicit about the similarities between his evolutionary model and the cosmological model for the origin of the universe. As a matter of fact, he explicitly addresses this issue in response to one of the reviewers (William Martin) who challenges the comparison. To me, the comparison between biology and cosmology seems forced and I think he weakens his case considerably by comparing the origin of the universe to the Cambrian explosion. Comparisons like that—and the false analogy with punctuated equilibria—contribute to the sense of unease that I had on finishing the paper. One has the distinct impression that Koonin is grasping at straws in order to knock down a strawman.

What are the major problems with evolution according to Koonin? Are they strawmen? He identifies six major problems and claims that they can all be explained by a model where "evolutionary transitions follow a general principle that is distinct from regular cladogenesis." The BBB is characterized by a phase of rapid evolution with extensive exchange of genetic information between organisms. This phase is followed by a slow phase of evolution of the sort that generates the typical tree pattern.

Before describing the six examples, we need to address the differences, if any, between Koonin's Biological Big Bang and the "net of life" model that is replacing the traditional tree of life at the deepest levels [The Three Domain Hypothesis (Part 5, Part 6)]. The similarities are obvious. In the net of life model the early stages of evolution involved massive exchanges of genetic information such that it is now impossible to construct a traditional tree relating the major groups of species such as prokaryotes and eukaryotes. What Koonin is doing is to generalize this event "by proposing that a phase of rapid, promiscuous evolution might underlie many, if not most of the major transitions in the history of life."

The six transitions are listed below. Each one is followed by a brief comment where I attempt to evaluate its significance.

1. Origin of protein folds
There seem to exist ~1,000 or, by other estimates, a few thousand distinct structural folds the relationships between which (if existent) are unclear.
There is no reason to postulate that all proteins sharing a common fold will share a common ancestor. Some of these proteins might well have arisen entirely independently and evolved to a common fold by convergence. In other words, the underlying assumption that the origin of protein folds represents evolution of some sort may be false. Furthermore, there is even less reason to think that groups with different folds have an evolutionary relationship. In order for this to be true there would have to have been a primordial protein with one kind of fold that gave rise to a protein with another kind of fold. Instead different polypeptides with little three-dimensional structure (random coils) may have independently evolved into proteins with particular folds.

2. Origins of Viruses
For several major classes of viruses, notably, positive-strand RNA viruses and nucleo-cytoplasmic large DNA viruses (NCLDV) of eukaryotes, substantial evidence of monophyletic origin has been obtained. However, there is no evidence of a common ancestry for all viruses.
The reason why there's no evidence of a common ancestor for all viruses may be because there is no common ancestor for all viruses.

3. Origin of Cells
The two principal cell types (the two prokaryotic domains of life), archaea and bacteria, have chemically distinct membranes, largely, non-homologous enzymes of membrane biogenesis, and also, non-homologous core DNA replication enzymes. This severely complicates the reconstruction of a cellular ancestor of archaea and bacteria and suggests alternative solutions.
The existence of distinct bacterial and archaeal domains is hotly disputed. Many groups of bacteria have distinctive features that distinguish them from other groups. There is no need to postulate a radically new mechanism of evolution that accounts for bacteria and archaea if they really aren't much different than the major branches described below.

4. Origin of the major branches (phyla) of bacteria and archaea
Although both bacteria and archaea show a much greater degree of molecular coherence within a domain than is seen between the domains (in particular, the membranes and the replication machineries are homologous throughout each domain), the topology of the deep branches in the archaeal and, especially, bacterial phylogenetic trees remains elusive. The trees conspicuously lack robustness with respect to the gene(s) analyzed and methods employed, and despite the considerable effort to delineate higher taxa of bacteria, a consensus is not even on the horizon. The division of the archaea into two branches, euryarchaeota and crenarchaeota is better established but even this split is not necessarily reproduced in trees, and further divisions in the archaeal domain remain murky.
In addition to euryarchaeota and crenarchaeota there are other groups of prokaryotes that are easily resolved by current techniques (e.g., cyanobacteria, proteobacteria). It may be difficult to resolve the base of the tree because of extensive horizontal gene transfer as postulated in the "net of life" scenarios. This is the one case, along with #3, where the concept of an unusual type of evolution may be correct. I still don't like the term Biological Big Bang to describe it.

5. Origin of the major branches (supergroups) of eukaryotes
Despite many ingenious attempts to decipher the branching order near the root of the phylogenetic tree of eukaryotes, there has been little progress, and an objective depiction of the state of affairs seems to be a "star" phylogeny, with the 5 or 6 supergroups established with reasonable confidence but the relationship between them remaining unresolved.
Substantial progress has been made but the problem is very difficult because of the lack of reliable phylogenetic markers. That, plus the fact that we are trying to sort out events that took place more than one billion years ago. It is too early to conclude that our inability to reach consensus means that something strange must have been going on. That's a cop-out at this time.

6. Origin of the animal phyla
The Cambrian explosion in animal evolution during which all the diverse body plans appear to have emerged almost in a geological instant is a highly publicized enigma [32-35]. Although molecular clock analysis has been invoked to propose that the Cambrian explosion is an artifact of the fossil record whereas the actual divergence occurred much earlier [36,37], the reliability of these estimates appears to be questionable [38]. In an already familiar pattern, the relationship between the animal phyla remains controversial and elusive.
Actually, the relationships between animal phyla are quite well understood at the molecular level. The lineages do not appear to be scrambled by excessive recombination between species as Koonin's hypothesis would require.

The Cambrian explosion is an interesting example of fairly rapid morphological evolution. It may be true that dozens of independent animal lineages simultaneously acquired a new mechanism of evolution during the Cambrian but this does not seem to be the most parsimonious explanation of the data.

To sum up, I don't think that Koonin's examples cry out for explanation in the way he thinks they do. Some of them may not even be examples of evolution. It seems reasonable to attribute the origins of the major groups of bacteria, and the first eukaryotic cells, to a rapid exchange of genes in the beginning phase of life on Earth, but there's no need to postulate that this promiscuous phase was ever repeated at other stages and certainly no reason to assume that there were repeated waves of promiscuity followed by quiescent phases of stabilization.

The Biological Big Bang is not so much wrong as it is unnecessary.


Koonin, E. (2007) The Biological Big Bang model for the major transitions in evolution. Biology Direct 2:21. doi:10.1186/1745-6150-2-21

[Photo Credit: The Tree of life is from Ford Doolittle's Scientific American article "Uprooting the Tree of Life" (February 2000). © Scientific American.]

Friday, April 25, 2014

ASBMB Core Concepts in Biochemistry and Molecular Biology: Molecular Structure and Function

Theme

Better Biochemistry
The American Society for Biochemistry and Molecular Biology (ASBMB) has decided that the best way to teach undergraduate biochemistry is to concentrate on fundamental principles rather than facts and details. This is an admirable goal—one that I strongly support.

Over the past few months, I've been discussing the core concepts proposed by Tansey et al. (2013) [see Fundamental Concepts in Biochemistry and Molecular Biology]. The five concepts are:
  1. evolution [ASBMB Core Concepts in Biochemistry and Molecular Biology: Evolution ]
  2. matter and energy transformation [ASBMB Core Concepts in Biochemistry and Molecular Biology: Matter and Energy Transformation]
  3. homeostasis [ASBMB Core Concepts in Biochemistry and Molecular Biology: Homeostasis]
  4. biological information [ASBMB Core Concepts in Biochemistry and Molecular Biology: Biological Information]
  5. macromolecular structure and function [ASBMB Core Concepts in Biochemistry and Molecular Biology: Molecular Structure and Function]

Wednesday, May 30, 2007

Do You Trust Scientists?

 
Last September (2006) John Wilkins wrote a series of postings on why Creationists reject evolution/science. I highly recommend that you read all four essays right now.
  1. Why are creationists creationist?
  2. Why are creationists creationist? 2 - conceptual spaces
  3. Why are creationists creationist? 3: compartments and coherence
  4. Why are creationists creationist? 4: How to oppose anti-science
John explains that much of science is not intuitively obvious and children have a natural tendency to resist notions that go against what they see as common sense. They will encounter serious problems if the authority figures in their lives, such as parents and pastors, are telling them stories that conflict with what they hear in school—especially if these anti-science authority figures are reinforcing their naive common sense notions of the natural world. John outlines the various defensive mechanisms that people adopt when faced with such a dilemma.

Part of the problem is how we present science in a culture that is pre-disposed to mistrust it. As John points out in essay #4, we need to work on making science more trustworthy.
The crucial way to get people to trust science is to show them, by letting them do it, that science is the premier way to learn about the world. Science is a learning process that relies on no single person, but which each individual can engage in. I'm sure science teachers have been trying to get this message across for years, but have been swamped by the demands of curricula designed to make students tertiary ready. A better bet would be to educate the population first, and offer ways in which those who are really committed to science, and are therefore much more likely to actually become scientists or otherwise benefit from it, can become ready for the later education.

This will have a benefit - the policy makers, usually elected from the general population of non-scientists, will understand that even if they do not understand the particular discipline that is cognitively relevant to a given social issue, like global warming or HIV AIDS, that the reasons why the specialists assert these claims is not a matter of simple social construction or dogmatic faith. They may even be better able to assess these claims on their merit, and to critically reject those that are fashionable among scientists but lack the necessary evidentiary support.
I agree that this is a problem. It's more of a problem in some cultures than in others but everyone who is interested in promoting rationalism over superstition should pay attention. Where I might disagree slightly with John is that I think we need a two-pronged attack. Not only do we need to increase the status of science but we need to weaken the hold of religion.

In today's posting, John reiterates these themes [Antiscience is learned in childhood] by referring to a recently published article by psychologists Paul Bloom and Deena Skolnick Weisberg. Here's the link to their article in The Edge [WHY DO SOME PEOPLE RESIST SCIENCE?].

Bloom and Weisberg make some of the same points that John makes about how children learn. Those are interesting points but I want to focus on whether scientists can be trusted. Here's what Bloom and Weisberg say,
In sum, the developmental data suggest that resistance to science will arise in children when scientific claims clash with early emerging, intuitive expectations. This resistance will persist through adulthood if the scientific claims are contested within a society, and will be especially strong if there is a non-scientific alternative that is rooted in common sense and championed by people who are taken as reliable and trustworthy. This is the current situation in the United States with regard to the central tenets of neuroscience and of evolutionary biology. These clash with intuitive beliefs about the immaterial nature of the soul and the purposeful design of humans and other animals — and, in the United States, these intuitive beliefs are particularly likely to be endorsed and transmitted by trusted religious and political authorities. Hence these are among the domains where Americans' resistance to science is the strongest.

We should stress that this failure to defer to scientists in these domains does not necessarily reflect stupidity, ignorance, or malice. In fact, some skepticism toward scientific authority is clearly rational. Scientists have personal biases due to ego or ambition—no reasonable person should ever believe all the claims made in a grant proposal. There are also political and moral biases, particularly in social science research dealing with contentious issues such as the long-term effects of being raised by gay parents or the explanation for gender differences in SAT scores. It would be naïve to ignore all this, and someone who accepted all "scientific" information would be a patsy. The problem is exaggerated when scientists or scientific organizations try to use their authority to make proclamations about controversial social issues. People who disagree with what scientists have to say about these issues might reasonably infer that it is not safe to defer to them more generally.

But this rejection of science would be mistaken in the end. The community of scientists has a legitimate claim to trustworthiness that other social institutions, such as religions and political movements, lack. The structure of scientific inquiry involves procedures, such as experiments and open debate, that are strikingly successful at revealing truths about the world. All other things being equal, a rational person is wise to defer to a geologist about the age of the earth rather than to a priest or to a politician.

Given the role of trust in social learning, it is particularly worrying that national surveys reflect a general decline in the extent to which people trust scientists. To end on a practical note, then, one way to combat resistance to science is to persuade children and adults that the institute of science is, for the most part, worthy of trust.
So here's the problem. How do we convince people that scientists are worthy of trust? It's clear that the front lines are at the interface between what scientists know and what the general public knows about science. This is often framed as an issue about communicating science. Many non-scientists think that scientists need to do a better job. Is this really the problem?

The "burden" of communicating science is often assumed to fall on the shoulders of science writers and science journalists. They are the ones who write the press releases and increasingly they are the ones who write about science in newspapers and magazines. The leading "science" figures on television today are not scientists but science journalists. Even in the leading science journals such as Nature and Science it's the science journalists and not scientists who write the articles that will be read by a wide audience.

In today's world we have a rather paradoxical situation where non-scientists who write about science are proclaiming themselves to be experts on science communication, yet they call upon scientists to learn from them how to manipulate the media to get the science message across. But if science journalists are doing such a good job then why do we need scientists? Is it possible that the failure to make science a trustworthy enterprise is due, in part, to the failure of science journalism?

I'd like to explore this question further.

Tuesday, June 19, 2007

RNA Splicing: Introns and Exons

Most eukaryotic protein-encoding genes are interrupted. The coding regions are divided into numerous blocks called "exons" and the exons are separated by "introns."

An example is shown below. The triose phosphate isomerase (TPI) gene from maize is composed of 9 exons and 8 introns. (Triose phosphate isomerase is one of the enzymes in the glycolysis/gluconeogenesis pathway.)


The top line is a cartoon representation of the TPI gene with each exon in a different color. The thick gray lines between them represent the introns. The gene is transcribed from left (5′) to right (3′) beginning at the promoter (P). The long primary RNA transcript contains both intron and exon sequences. Subsequent processing of this primary transcript results in modification of the 5′ end by addition of an m7 GTP cap and modification of the 3′ end by addition of adenylate (A) residues to form the poly A tail. More importantly, the introns are spliced out and the exon sequences are fused to form the mature mRNA. This mRNA is then transported to the cytoplasm where it is translated into protein.

Note that all the coding regions in the exons (hatched) are contiguous in the mature mRNA. The relationship between the exons and the structure of the protein is shown on the right where the color of each segment of the protein corresponds to the color of the exons in the upper figure. There is no correlation between the exons and any protein domains or motifs. (It used to be thought that exons corresponded to domains in the protein.)
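The net effect of processing on the sequence can be sketched in a few lines of Python. This is only an illustration, not a model of the real TPI gene: the toy transcript, the intron coordinates, and the function name `splice` are all made up for the example, and capping and polyadenylation are ignored.

```python
def splice(primary_transcript, introns):
    """Remove introns and join the flanking exons.

    `introns` is a list of (start, end) intervals, 0-based and end-exclusive.
    """
    exons = []
    position = 0
    for start, end in sorted(introns):
        if start < position or end > len(primary_transcript):
            raise ValueError("introns must not overlap and must lie inside the transcript")
        exons.append(primary_transcript[position:start])  # exon before this intron
        position = end                                    # skip over the intron
    exons.append(primary_transcript[position:])           # final exon
    return "".join(exons)


# Hypothetical toy transcript: exons in upper case, introns in lower case.
toy_transcript = (
    "AUGGCC" + "guaaguccuuacuuuuucag" + "GAUUAC" + "guaugucuaacuuucccag" + "UAA"
)
toy_introns = [(6, 26), (32, 51)]  # assumed coordinates for the two toy introns

mature_mrna = splice(toy_transcript, toy_introns)
print(mature_mrna)
```

The exon sequences that were separated in the primary transcript end up contiguous in the output, which is the point of the figure.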

The splicing reaction is complicated. The cell must cleave the primary transcript at each end of the intron while holding on to the flanking exons so the chopped RNA transcript does not come apart. Then the two exons have to be joined together. For protein-encoding genes the splicing reactions are catalyzed by an RNA/protein complex called a spliceosome. In some cases, the introns can be thousands of nucleotides long—much longer than the exons.

Let's look at a simplified version of this reaction. The various components of the spliceosome have to assemble at the 5′ (left) end of an intron and at the 3′ end. There's a third site in the middle called the branch site. All three sites are identified by specific short sequences in the primary transcript as shown below.


These are the consensus sequences for vertebrates, including us. The splice site and branch site sequences in other species are similar but not identical.

In the first step of the splicing reaction, the various components of the spliceosome bind to the 5′ splice site, the 3′ splice site, and the branch site. Then the three complexes interact with each other to draw together the ends of the intron and position them near the branch site. This forms the spliceosome.

The first reaction involves an attack of the 2′ -OH group of the branch point adenylate residue on the 5′ splice site. This forms an intermediate where the branch site A residue is attached to three different ends of the primary transcript. The structure resembles a lariat or lasso. This is the structure depicted in Monday's Molecule #31.

Meanwhile, the 5′ end of the transcript is still bound to the spliceosome. This is important because it's about to be joined to the next exon and the reaction wouldn't work if the 5′ end were released following the first cleavage reaction.

In the next step, the spliceosome catalyzes the attack of the 3′ -OH group at the end of the 5′ exon on the 3′ splice site. This results in cleavage of the 3′ intron/exon junction and joining of the 5′ exon to the 3′ exon. The intron sequence (dark brown) is released as a lariat (looped) structure.

The two reactions are known as transesterification reactions because they require the breaking of one strand of RNA and formation of a new ester linkage. The details are not very important. What's important is to recognize that splicing depends on the correct interaction between the components of the spliceosome and the 5′ and 3′ splice site sequences (and the branch site).
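For readers who like to see the bookkeeping, here is a rough Python sketch of the two steps. It tracks only which pieces of RNA exist after each reaction, not the chemistry. The class name, function names, toy sequence, and coordinates are all invented for this example.

```python
from dataclasses import dataclass


@dataclass
class LariatIntermediate:
    five_prime_exon: str  # cut free, but still held by the spliceosome
    intron_and_exon: str  # intron plus 3' exon; intron 5' end is joined to the branch A
    branch_offset: int    # index of the branch A within intron_and_exon


def first_step(transcript, five_ss, branch, three_ss):
    """Branch A attacks the 5' splice site: the 5' exon is cut free and a lariat forms."""
    if not 0 <= five_ss < branch < three_ss <= len(transcript):
        raise ValueError("expected 5' splice site < branch site < 3' splice site")
    if transcript[branch].upper() != "A":
        raise ValueError("the branch point must be an A residue")
    return LariatIntermediate(transcript[:five_ss], transcript[five_ss:], branch - five_ss)


def second_step(intermediate, intron_length):
    """The 5' exon attacks the 3' splice site: exons are joined, the lariat intron is released."""
    lariat_intron = intermediate.intron_and_exon[:intron_length]
    three_prime_exon = intermediate.intron_and_exon[intron_length:]
    return intermediate.five_prime_exon + three_prime_exon, lariat_intron


# Hypothetical toy transcript: exons in upper case, one intron in lower case.
toy = "AUGGCC" + "guaaguccuuacuuuuucag" + "GAUUAC"
intermediate = first_step(toy, five_ss=6, branch=16, three_ss=26)
mrna, lariat = second_step(intermediate, intron_length=26 - 6)
print(mrna, lariat)
```

Note that `first_step` keeps the 5′ exon in the same object as the rest of the transcript. That mirrors the requirement that the spliceosome hold on to the 5′ exon between the two reactions.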

These interactions are mediated by small RNAs that are bound to the spliceosome proteins. These RNAs are called small nuclear RNAs (snRNAs) and they're one example of a host of small RNAs produced by non-protein encoding genes. The snRNA/protein complexes are called small nuclear ribonucleoproteins or snRNPs (snurps).

The snRNAs are complementary to the splice sites and branch sites and that's how the various snRNPs recognize them. This interaction is very weak since it depends on only three or four base pairs. It can be even fewer since there are many splice sites that are not perfect matches to the consensus sequences shown above. The relative lack of significant sequence similarity makes splicing a very error-prone reaction.
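To make the idea of "imperfect matches" concrete, here is a small Python sketch that counts base pairs between a site in the transcript and a stretch of snRNA. The sequences are hypothetical placeholders rather than real snRNA or splice site sequences, and the function counts only standard Watson-Crick pairs (G-U wobble pairs are deliberately left out).

```python
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def count_base_pairs(site, guide):
    """Count Watson-Crick pairs between two RNA strands of equal length.

    Both strands are written 5'->3'. They pair antiparallel, so the first
    base of `site` is compared with the last base of `guide`.
    """
    if len(site) != len(guide):
        raise ValueError("sequences must be the same length")
    return sum((s, g) in WATSON_CRICK for s, g in zip(site, reversed(guide)))


guide = "ACUUAC"         # hypothetical snRNA stretch
perfect_site = "GUAAGU"  # hypothetical site that pairs at every position
weak_site = "GUGCGU"     # hypothetical site with two mismatches

print(count_base_pairs(perfect_site, guide))  # 6
print(count_base_pairs(weak_site, guide))     # 4
```

A site that loses only a couple of pairs from an already short match has very little left to distinguish it from random sequence.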

U1 snRNP recognizes 5′ splice sites, U2 snRNP binds to the branch site, and U5 snRNP binds to the 3′ splice site. A more detailed description of the formation of the spliceosome is shown below.