
Thursday, January 18, 2007

Gap Penalties

Reed A. Cartwright (De Rerum Natura) has just posted a summary of his recently published paper on the effect of gap costs in sequence alignment [Logarithmic gap costs decrease alignment accuracy].

It sounds esoteric but, in fact, it's a very important problem. Computer-driven sequence alignments are behind a great deal of the bioinformatics that's being published today. Surprisingly, no computer program can do as good a job at global sequence alignment as a competent student. This should be cause for concern since it means that all the published work is known to be sub-optimal because the algorithms aren't up to the task. Most workers don't acknowledge this; I suspect they simply don't realize that the alignment programs are inefficient.

Reed looked at a particular problem in sequence alignment. The only difficult part about sequence alignment is placing the gaps that are due to insertions and deletions (indels) that have arisen since two sequences diverged from a common ancestor. During automated sequence alignment the program has to assign a penalty, or cost, for inserting gaps in the alignment. If there were no penalty associated with indels then the program would insert gaps willy-nilly to bring every position into perfect alignment. The idea is to limit the placement of gaps to only those locations where they truly represent an evolutionary event.
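Here's a minimal sketch of why the penalty matters. It scores two hand-made alignments of the same pair of sequences. The scoring values (+1 for a match, -1 for a mismatch) and the sequences are made up for illustration; they aren't taken from Reed's paper or from any particular alignment program.

```python
def alignment_score(row_a, row_b, match=1, mismatch=-1, gap_penalty=2):
    """Score a pairwise alignment column by column.

    row_a and row_b are the two aligned rows, with '-' marking a gap.
    All scoring values are illustrative assumptions.
    """
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    score = 0
    for x, y in zip(row_a, row_b):
        if x == "-" or y == "-":
            score -= gap_penalty      # a gap position costs gap_penalty
        elif x == y:
            score += match            # identical residues
        else:
            score += mismatch         # aligned but different residues
    return score


# Two candidate alignments of ACGT and AGGT.
ungapped = ("ACGT", "AGGT")     # 3 matches, 1 mismatch
gappy = ("AC-GT", "A-GGT")      # 3 matches, 2 gap positions

for penalty in (0, 2):
    print(penalty,
          alignment_score(*ungapped, gap_penalty=penalty),
          alignment_score(*gappy, gap_penalty=penalty))
```

With a gap penalty of zero the gappy alignment scores higher than the ungapped one, because it trades the mismatch for two free gaps. With a penalty of 2 the ungapped alignment wins.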

The standard penalty is represented by the formula $G_k = a + bk$, where $G_k$ is the penalty for a gap of length "k". There are two components to the penalty: "a" is the penalty for creating a gap, and "b" is the penalty for each of the "k" residues in the gap.
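The standard penalty is easy to write down as a function. The sketch below also includes one possible logarithmic penalty for comparison. The logarithmic form and all of the parameter values are assumptions chosen for illustration; see Reed's paper for the penalties he actually tested.

```python
import math


def affine_gap_cost(k, a=10.0, b=1.0):
    """Standard (affine) penalty for a gap of k residues: a + b*k.

    a is the cost of creating the gap, b the cost per residue.
    The default values are hypothetical.
    """
    if k < 1:
        raise ValueError("gap length must be at least 1")
    return a + b * k


def log_gap_cost(k, a=10.0, c=1.0):
    """One possible logarithmic penalty: a + c*ln(k).

    This form is an assumption made for this sketch.
    """
    if k < 1:
        raise ValueError("gap length must be at least 1")
    return a + c * math.log(k)


for k in (1, 5, 50):
    print(k, affine_gap_cost(k), round(log_gap_cost(k), 2))
```

With these made-up parameters, the logarithmic cost grows much more slowly than the affine cost as the gap gets longer, which is the kind of difference Reed's comparison is about.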

Reed tested several other types of gap penalties to see if they did a better job at aligning sequences. You should read his posting to see the surprising result. His paper is available here.

Here's an example of a computer-generated multiple sequence alignment from the Pfam database [HSP70 alignments]. The protein is HSP70, the major protein chaperone. If you look at the right-hand side of the first page you can see how the algorithm placed the gaps (represented by dots). Most of you could do a better job with just a little practice.

Wednesday, January 17, 2007

Doomsday Clock Advances

 
The Bulletin of the Atomic Scientists has moved the Doomsday Clock two minutes closer to midnight ["Doomsday Clock"].

I think they're right. The world is a much more dangerous place now that the nation with the most weapons of mass destruction is threatening to use them.

Australians Debate "The God Delusion"

 
Listen to four Australians discuss The God Delusion. The show was broadcast last October on Australia's ABC Television.

I found the "debate" unsatisfying and somewhat troubling. Germaine Greer comes across as a bit of a kook and the two agnostics just don't get it. The only person who makes any sense is a Jesuit priest!

[Hat Tip: Richard Dawkins' website]

More Little Mosque on the Prairie

 


See part of an episode of Little Mosque on the Prairie. The rest are here.

[Hat Tip: Alex Palazzo]

What Would You Sequence If the Price Were Only $1000?

 
Nature Genetics has a Question of the Year.
The sequencing of the equivalent of an entire human genome for $1,000 has been announced as a goal for the genetics community, and new technologies suggest that reaching this goal is a matter of when, rather than if. What then? In celebration of its upcoming 15th anniversary, Nature Genetics is asking prominent geneticists to weigh in on this question: what would you do if this sequencing capacity were available immediately?
That's an easy one to answer.

My students are involved in several projects that try to figure out the evolution of our favorite gene family [HSP70]. Many of the projects are limited by the lack of complete information on every member of the gene family in certain key species. (See The Evolution of Gene Families for an explanation of why you need to have sequences of every copy.)

So here's a short list of genome sequences that we desperately need in order to address some important issues:
  • any snake; rattlesnake would be good
  • any turtle
  • any bird other than chicken; ostrich or emu would be good, penguin would be awesome
  • lamprey
  • octopus
  • lobster or crab
  • maple tree and dandelion; or any other pair of flowering plants (except rice or Arabidopsis)
  • ginkgo
  • any bryophyte
  • any moss
  • horsetail

[Hat Tip: Hsien Hsien Lei]

Nobel Laureates: Furchgott, Ignarro, and Murad

The Nobel Prize in Physiology or Medicine 1998

"for their discoveries concerning nitric oxide as a signalling molecule in the cardiovascular system"


Robert F. Furchgott, Louis J. Ignarro, and Ferid Murad received the Nobel Prize in Physiology or Medicine in 1998 for their discovery that nitric oxide (NO) is a signalling molecule responsible for dilation of blood vessels. (See How Viagra Works.)

Furchgott and Ignarro independently established that nitric oxide was the active stimulatory molecule in vasodilation. Murad recognized that the stimulatory effect of nitroglycerine on cGMP levels was due to the fact that nitroglycerine produced nitric oxide inside the cell. Nitroglycerine had long been used to treat heart conditions such as angina. In fact, Alfred Nobel, who built his fortune on nitroglycerine-based explosives and founded the Nobel Prizes, was himself treated with nitroglycerine for a heart condition.

Tuesday, January 16, 2007

How Viagra Works

 
Monday's Molecule was sildenafil (5-[2-ethoxy-5-(4-methylpiperazin-1-ylsulfonyl)phenyl]-1-methyl-3-propyl-1,6-dihydro-7H-pyrazolo[4,3-d]pyrimidin-7-one), better known as its citrate salt, Viagra®.

Viagra® is most often used in the treatment of erectile dysfunction. The way it works is to inhibit a specific enzyme called phosphodiesterase-5 located in the smooth muscle of the arteries that supply blood to the penis. In order to understand the significance of this inhibition, we need a little background.

Nitric oxide (NO) is a chemical produced by special nerve cells called NANC nerve cells. (NANC stands for nonadrenergic-noncholinergic.) Under certain, rather special, conditions the brain sends a signal down the axon of a NANC nerve cell located in the penis. This causes NO to be released into the blood stream in the arteries of the penis.

One of the main roles of NO is to trigger the relaxation of the smooth muscle that lines the arteries. This leads to vasodilation and the lowering of blood pressure. In the penis this causes engorgement as the arteries expand and fill up with blood. The result is an erection that's stimulated by NO.

Nitric oxide acts locally. It diffuses into adjacent cells and binds to an enzyme called guanylyl cyclase. The binding of NO activates the enzyme, stimulating it to produce cyclic guanosine monophosphate or cGMP. The substrate for this reaction is guanosine triphosphate (GTP), a molecule that's similar to ATP except that the base is guanine instead of adenine.

ATP can also be cyclized to form cAMP, a compound analogous to cGMP. cAMP is a common signal in many hormone-induced signal transduction pathways (and in creating a sense of smell). Like cAMP, cGMP is a signalling molecule. It activates specific enzymes that add phosphate to various proteins causing them to become more, or perhaps less, active. During an erection, the cGMP signal leads to changes in phosphorylation of muscle proteins causing the muscles to relax and the arteries to expand.

As you might expect, cGMP is not infinitely stable; otherwise a man might have an erection forever. cGMP is removed by the action of cGMP phosphodiesterase, which converts it to GMP. The turnover of cGMP in the penis is quite rapid leading to lack of signal unless NO is continually produced by the NANC nerve cells in order to replenish the supply of cGMP by reactivating guanylyl cyclase. This production of NO requires the attention of the brain, which has to keep focused on the task at hand.

The smooth muscle cells in the penis contain a special cGMP phosphodiesterase called phosphodiesterase-5 (PDE5). Sometimes the degradation of cGMP by PDE5 outpaces the production of cGMP by guanylyl cyclase. In such cases, the steady-state levels of cGMP aren't sufficient to signal muscle relaxation and no erection occurs. This is a common cause of erectile dysfunction.

Viagra® works by inhibiting PDE5 thus blocking the breakdown of cGMP. This causes levels of cGMP to increase and an erection is prolonged. The structure of the PDE5 enzyme has been solved by Sung et al. (2003) in the presence of bound sildenafil (Viagra®) and two other inhibitors, tadalafil (Cialis®) and vardenafil (Levitra®). The structures are shown as stereo images in the figure below.

The upper image is the PDE5 protein with overlapping molecules of sildenafil (red) and tadalafil (green) bound to the enzyme. The bottom images show the structures of the three inhibitors. Viagra® binds to the site where cGMP would normally bind, thus blocking the degradation of cGMP. The structure of Viagra® is similar to cGMP and this explains why it is such a potent inhibitor.
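The balance between cGMP production and breakdown can be put into a toy calculation. The sketch below assumes a deliberately simple model that isn't from Sung et al.: cGMP is made at a constant rate and degraded at a rate proportional to its concentration, so the steady-state level is the synthesis rate divided by the degradation constant. The numbers and units are hypothetical.

```python
def steady_state_cgmp(synthesis_rate, degradation_constant,
                      fraction_inhibited=0.0):
    """Steady-state cGMP level in a toy first-order model.

    Assumed model: d[cGMP]/dt = synthesis_rate - k * [cGMP],
    where k is the degradation constant reduced by the fraction
    of phosphodiesterase activity that an inhibitor blocks.
    """
    if not 0.0 <= fraction_inhibited < 1.0:
        raise ValueError("fraction_inhibited must be in [0, 1)")
    effective_k = degradation_constant * (1.0 - fraction_inhibited)
    return synthesis_rate / effective_k


# Hypothetical values in arbitrary units.
print(steady_state_cgmp(2.0, 4.0))                          # no inhibitor
print(steady_state_cgmp(2.0, 4.0, fraction_inhibited=0.5))  # half of PDE5 blocked
```

In this toy model, blocking half of the phosphodiesterase activity doubles the steady-state level of cGMP without any change in the rate of production by guanylyl cyclase.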

Sung, B-J., Hwang, K.Y., Jeon, Y.H., Lee, J.I., Heo, Y.S., Kim, J.H., Moon, J., Yoon, J.M., Hyun, Y.L., Kim, E., Eum, S.J., Park, S.Y., Lee, J.O., Lee, T.G., Ro, S., and Cho, J.M. (2003) Structure of the catalytic domain of human phosphodiesterase 5 with bound drug molecules. Nature 425:98-102.

The Best Writing on Science Blogs 2006

 
Coturnix (A Blog Around the Clock) has put together an anthology of the best science writing on blogs for 2006. It's being published in a book called "the open laboratory." Read about The Great Unveiling and buy the book. Most of your friends are in it.

None of my contributions made the cut. Maybe next year. I'm still going to buy a copy when I'm in Chapel Hill next weekend.

Ethical Issues in Science

 
One of the things I have to do this week is deal with the teaching of so-called "ethics" in genetics and biochemistry courses. Let me give you two examples in order to focus the debate: genetically modified foods, and a proper diet.

It's almost a requirement these days that introductory genetics courses include a section on genetically modified crops. This invariably leads to tutorials, or labs, or essays, about whether GM-foods are a good thing or not. These discussions are usually lots of fun and the students enjoy this part of the course. Professors are convinced they are teaching ethics and that it's a good thing to show students that ethics is an important part of science.

In introductory biochemistry courses we often have a section on fuel metabolism. That's the part of biochemistry that deals specifically with how your food is converted to energy. It's human biochemistry. In that section of the course the Professor often raises the question of proper diet. Is it okay to eat meat? Are trans fatty acids bad for you? Should you be eating carbohydrates? Our experience is that Professors who teach this section often have very strong opinions and their personal ethical stance is portrayed as scientific fact.

These are two different cases. In the first one, the question is whether the value of debating controversial "ethical" issues outweighs the disadvantages. The biggest downside, in my opinion, is the emphasis on technology as opposed to pure basic science. By giving prominence to "ethical" issues we are emphasizing the consequences of genetic knowledge as it relates to the human condition.

I prefer to spend my time trying to convince students that knowledge for its own sake is valuable. It's hard to do that if the fun part of the course has to do with the application of genetic technology in the creation of genetically modified foods.

The second case involves a different kind of ethics. Here, the students aren't debating whether you should eat trans fatty acids or not. They are being given an ethical perspective disguised as a scientific fact. I don't think this is a good idea. At the very least, the issue should be presented as controversial and students should be encouraged to read the medical literature; which, by the way, has very little to do with the biochemistry being taught in class.

Should students be discussing the benefits of the Atkins diet? Perhaps, but it should be a discussion and not a lecture, right? And does a focus on human eating behavior detract from the importance of basic scientific knowledge? I think it does.

Part of the problem arises from a desire to please the students. How often do we hear the complaint that students aren't interested in biochemistry and genetics? The students are bored by science so we have to add sections on genetically modified foods and genetic screening to our introductory genetics courses. Isn't this strange? Rather than concentrate on making the basic science as interesting and exciting as possible, we cater to the students by giving them the topics they think are interesting. That's no way to educate.

There's another problem; what is ethics? Sometimes it's hard to see the difference between simple controversy and ethics. Sometimes it's hard to define exactly what "ethics" is all about in spite of the fact that "bioethics" is one of the biggest growth industries in science. Here's where a philosopher or two could weigh in.

Monday, January 15, 2007

Plastic Duckies

 
On January 29, 1992 a 40-foot container fell off a ship in the middle of the Pacific Ocean. Inside the container were 29,000 "Floatees," small bathtub toys. There were blue turtles, yellow ducks, red beavers, and green frogs. Over the next few years, these toys washed up on shores all around the Pacific, especially in Indonesia, Australia, and South America [Friendly Floatees].

The story of these Floatees has been told many times. For years beachcombers around the world have been talking about Beachcombing Science from Bath Toys.

Thousands of the Floatees drifted north where they passed through the Bering Strait and became locked in the pack ice north of the Arctic Circle. The prediction was that they would emerge into the Atlantic in 2003 and, sure enough, Floatees started to turn up in New England and Great Britain. More are expected this year [Drake's other armada].

There's a picture of a plastic duckie on the cover of this month's Harper's magazine. The feature story is MOBY-DUCK: Or, the Synthetic Wilderness of Childhood by Donovan Hohn. It's a wonderful read. Donovan Hohn has woven together the story of the Floatees and his personal voyage of discovery. As you follow along you will learn about ocean currents, flotsam and jetsam, beachcombing, childhood, and so much more.

Astrobiology: A Null Set

 
Phil Plait of Bad Astronomy recently lost out to PZ Myers of Pharyngula in the contest for best blog. His penalty for not getting enough astronomy enthusiasts to cast ballots is to write something about biology.

So naturally he chooses Astrobiology as his example— a discipline without a single living example. Typical astronomer, taking the easy way out.

As I tell my students, biology is much harder than physics and astronomy. Any biologist can handle physics with their eyes closed but physics students (and Professors) are afraid of biology. It's way too messy for them.

The Logic of Irreducible Complexity

 
Ross Thomas (HALFaCANUCK) uses predicate calculus to analyze whether the following argument,
the irreducibly complex nature of the eye proves God's existence
is logically correct. [Irreducible illogicality] The answer will surprise you.

P.S. Don't tell the IDiots about this one!

Alanis Morissette Doesn't Get Irony

 
Guy Kawasaki interviews Jon Winokur [Ten Questions with Jon Winokur: How to Heighten Your Sense of the Absurd]. In response to a question about what he's working on now (Q12) Winokur says he's writing a book called The Big Curmudgeon. Winokur then goes on to say,
It drives me crazy when people say “ironic” when they mean “coincidental.” The classic example is Morissettian Irony, which I define in the book as “irony based on a misapprehension of irony, i.e., no irony at all.” It’s named for the pop singer Alanis Morissette, whose hit single, “Ironic” mislabels coincidence and inconvenience as irony.

In the song, situations purporting to be ironic are merely sad, random, or annoying (“It's a traffic jam when you're already late/It's a no-smoking sign on your cigarette break”). In other words, “Ironic” is an un-ironic song about irony. Which, of course, is ironic in itself. But wait, there’s more, a “bonus irony” if you will: “Ironic” has been cited as an example of how Americans don’t get irony, despite the fact that Alanis Morissette is Canadian!
I hate it when people don't get irony ... or sarcasm.

[To see the video, go to Alanis Morissette, click on "music" then on "ironic" at the bottom, third from the left.]

[Hat Tip: Jim Lippard]

[Photo Credit: Agência Brasil makes images and photos available free of charge. To comply with current legislation, we ask our users to kindly credit them as in the example: photographer's name, via Wikipedia.]

Basic Concepts: The Central Dogma of Molecular Biology

The demise of the Central Dogma of Molecular Biology is becoming an annual event. Most recently, it was killed by non-coding RNA (ncRNA) (Mattick, 2003; 2004). In previous years the suspects included alternative splicing, reverse transcriptase, introns, junk DNA, epigenetics, RNA viruses, trans-splicing, transposons, prions, and gene rearrangements. (I’m sure I’ve forgotten some.)

What’s going on? The Central Dogma sounds like the backbone of an entire discipline. If it’s really a “dogma” how come it gets refuted on a regular basis? If it’s really so “central” to the field of molecular biology then why hasn’t the field collapsed?

In order to answer these questions we need to understand what the Central Dogma actually means. It was first proposed by Francis Crick in a talk given in 1957 and published in 1958 (Crick, 1958). In the original paper he described all possible directions of information flow between DNA, RNA, and protein. Crick concluded that once information was transferred from nucleic acid (DNA or RNA) to protein it could not flow back to nucleic acids. In other words, the final step in the flow of information from nucleic acids to proteins is irreversible.

Fig. 1. Information flow and the sequence hypothesis. These diagrams of potential information flow were used by Crick (1958) to illustrate all possible transfers of information (left) and those that are permitted (right). The sequence hypothesis refers to the idea that information encoded in the sequence of nucleotides specifies the sequence of amino acids in the protein.
Crick restated the Central Dogma of Molecular Biology in a famous paper published in 1970 at a time when the premature slaying of the Central Dogma by reverse transcriptase was being announced (Crick, 1970). According to Crick, the correct, concise version of the Central Dogma is ...
... once (sequential) information has passed into protein it cannot get out again (F.H.C. Crick, 1958)
The central dogma of molecular biology deals with the detailed residue-by-residue transfer of sequential information. It states that such information cannot be transferred from protein to either protein or nucleic acid. (F.H.C. Crick, 1970)
Announcing the (Premature) Death of the Central Dogma

The central dogma of biology holds that genetic information normally flows from DNA to RNA to protein. As a consequence it has been generally assumed that genes generally code for proteins, and that proteins fulfil not only most structural and catalytic but also most regulatory functions, in all cells, from microbes to mammals. However, the latter may not be the case in complex organisms. A number of startling observations about the extent of non-protein coding RNA (ncRNA) transcription in the higher eukaryotes and the range of genetic and epigenetic phenomena that are RNA-directed suggests that the traditional view of genetic regulatory systems in animals and plants may be incorrect.

Mattick, J.S. (2003) Challenging the dogma: the hidden layer of non-protein-coding RNAs in complex organisms. BioEssays 25:930-939.


The central dogma, DNA makes RNA makes protein, has long been a staple of biology textbooks.... Technologies based on textbook biology will continue to generate opportunities in bioinformatics. However, more exciting prospects may come from new discoveries that extend or even violate the central dogma. Consider developmental biology. The central dogma says nothing about the differences between the cells in a human body, as each one has the same DNA. However, recent findings have begun to shed light on how these differences arise and are maintained, and the biochemical rules that govern these differences are only being worked out now. The emerging understanding of developmental inheritance follows a series of fundamental discoveries that have led to a realization that there is more to life than the central dogma.

Henikoff, S. (2002) Beyond the central dogma. Bioinformatics 18:223-225.


It will take years, perhaps decades, to construct a detailed theory that explains how DNA, RNA and the epigenetic machinery all fit into an interlocking, self-regulating system. But there is no longer any doubt that a new theory is needed to replace the central dogma that has been the foundation of molecular genetics and biotechnology since the 1950s.

The central dogma, as usually stated, is quite simple: DNA makes RNA, RNA makes protein, and proteins do almost all of the work of biology.


Gibbs, W.W. (2003) The unseen genome: gems among the junk. Sci. Am. 289:26-33.
Unfortunately, there’s a second version of the Central Dogma that’s very popular even though it’s historically incorrect. This version is the simplistic DNA → RNA → protein pathway that was published by Jim Watson in the first edition of The Molecular Biology of the Gene (Watson, 1965). Watson’s version differs from Crick’s because Watson describes the two-step (DNA → RNA and RNA → protein) pathway as the Central Dogma. It has long been known that these conflicting versions have caused confusion among students and scientists (Darden and Tabery, 2005; Thieffry, 1998). I argue that as teachers we should teach the correct version, or, at the very least, acknowledge that there are conflicting versions of the Central Dogma of Molecular Biology.

The pathway version of the Central Dogma is the one that continues to get all the attention. It’s the version that is copied by almost all textbooks of biochemistry and molecular biology. For example, the 2004 edition of the Voet & Voet biochemistry textbook says,
In 1958, Crick neatly encapsulated the broad outlines of this process in a flow scheme he called the central dogma of molecular biology: DNA directs its own replication and its transcription to yield RNA, which, in turn, directs its translation to form proteins. (Voet and Voet, 2004)
If the Watson pathway version of the Central Dogma really was the one true version then it would have been discarded or modified long ago. In his original description, Watson drew single arrows from DNA to RNA and from RNA to protein and stated ....
The arrow encircling DNA signifies that it is the template for its self-replication; the arrow between DNA and RNA indicates that all cellular RNA molecules are made on DNA templates. Most importantly, both these latter arrows are unidirectional, that is, RNA sequences are never copied on protein templates; likewise, RNA never acts as a template for DNA.
Fig. 2. Watson’s version of the Central Dogma. This figure is taken from the first edition of The Molecular Biology of the Gene (p. 298).
Watson's statement is clearly untrue, as the discovery of reverse transcriptase demonstrated only a few years after his book was published. Furthermore, there are now dozens of examples of information flow pathways that are more complex than the simple scheme shown in Watson’s 1965 book. (Not to mention the fact that many information flow pathways terminate with functional RNAs and never produce protein.)

Watson’s version of the Central Dogma is the one scientists most often refer to when they claim that the Central Dogma is dead. The reason the Central Dogma refuses to die is that Watson’s version is not the correct one. The correct version has not been refuted.
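The difference between the two versions can be spelled out as a pair of simple rules. This is only a sketch: the function names and the encoding of transfers as (source, target) pairs are made up for illustration. The rules follow the statements quoted in this post: Crick forbids any sequence transfer out of protein, while Watson's pathway permits only DNA → DNA, DNA → RNA, and RNA → protein.

```python
MOLECULES = {"DNA", "RNA", "protein"}

# Watson's pathway version: only these transfers are permitted.
WATSON_PATHWAY = {("DNA", "DNA"), ("DNA", "RNA"), ("RNA", "protein")}


def _check(source, target):
    if source not in MOLECULES or target not in MOLECULES:
        raise ValueError("molecules must be DNA, RNA, or protein")


def violates_crick(source, target):
    """Crick: sequence information cannot get out of protein."""
    _check(source, target)
    return source == "protein"


def violates_watson(source, target):
    """Watson: anything outside the two-step pathway is forbidden."""
    _check(source, target)
    return (source, target) not in WATSON_PATHWAY


# Reverse transcription copies RNA into DNA.
print(violates_watson("RNA", "DNA"))   # True
print(violates_crick("RNA", "DNA"))    # False
```

Reverse transcription and RNA replication both break Watson's pathway, but neither one is a transfer out of protein, so neither touches Crick's version.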

Crick was well aware of the difference between his (correct) version and the Watson version. In his original 1958 paper, Crick referred to the standard information flow pathway as the sequence hypothesis. In his 1970 paper he listed several common misunderstandings of the Central Dogma including ....
It is not the same, as is commonly assumed, as the sequence hypothesis, which was clearly distinguished from it in the same article (Crick, 1958). In particular, the sequence hypothesis was a positive statement, saying that the (overall) transfer nucleic acid → protein did exist, whereas the central dogma was a negative statement saying that transfers from protein did not exist.
The Sequence Hypothesis and the Central Dogma in 1957

My own thinking (and that of many of my colleagues) is based on two general principles, which I shall call the Sequence Hypothesis and the Central Dogma. The direct evidence for both of them is negligible, but I have found them to be of great help in getting to grips with these very complex problems. I present them here in the hope that others can make similar use of them. Their speculative nature is emphasized by their names. It is an instructive exercise to attempt to build a useful theory without using them. One generally ends in the wilderness.

The Sequence Hypothesis. This has already been referred to a number of times. In its simplest form it assumes that the specificity of a piece of nucleic acid is expressed solely by the sequence of its bases, and that this sequence is a (simple) code for the amino acid sequence of a particular protein.

This hypothesis appears to be rather widely held. Its virtue is that it unites several remarkable pairs of generalizations: the central biochemical importance of proteins and the dominating role of genes, and in particular of their nucleic acid; the linearity of protein molecules (considered covalently) and the genetic linearity within the functional gene, as shown by the work of Benzer and Pontecorvo; the simplicity of the composition of protein molecules and the simplicity of nucleic acids. Work is actively proceeding in several laboratories, including our own, in an attempt to provide more direct evidence for this hypothesis.

The Central Dogma. This states that once “information” has passed into protein it cannot get out again. In more detail, the transfer of information from nucleic acid to nucleic acid, or from nucleic acid to protein may be possible, but transfer from protein to protein, or from protein to nucleic acid is impossible. Information means here the precise determination of sequence, either of bases in the nucleic acid or of amino acid residues in the protein.


Crick, F.H.C. (1958) On protein synthesis. Symp. Soc. Exp. Biol. XII:138-163 quoted in Judson, H.F. The Eighth Day of Creation, Expanded Edition (1979, 1996) p. 332.
So, how do we explain the current state of the Central Dogma? The Watson version is the one presented in almost every textbook, even though it is not the correct version according to Francis Crick. The Watson version has become the favorite whipping boy of any scientist who lays claim to a revolutionary discovery, even though a tiny bit of research would uncover the real meaning of the Central Dogma of Molecular Biology. The Watson version has been repeatedly refuted or shown to be incomplete, and yet it continues to be promoted as the true Central Dogma. This is very strange.

The Crick version is correct—it has never been seriously challenged—but few textbooks refer to it. One exception is Lewin’s GENES VIII (Lewin, 2004) (and earlier editions). Lewin defines the Central Dogma of Molecular Biology as,
The central dogma states that information in nucleic acid can be perpetuated or transferred but the transfer of information into protein is irreversible. (B. Lewin, 2004)
I recommend that all biochemistry and molecular biology teachers adopt this definition—or something very similar—and teach it in their classrooms.

Crick, F.H.C. (1958) On protein synthesis. Symp. Soc. Exp. Biol. XII:138-163. [PDF]
Crick, F. (1970) Central Dogma of Molecular Biology. Nature 227, 561-563. [PDF file]
Darden, L. and Tabery, J. (2005) Molecular Biology
Lewin, B. (2004) GENES VIII Pearson/Prentice Hall
Mattick, J.S. (2003) Challenging the dogma: the hidden layer of non-protein-coding RNAs in complex organisms. BioEssays 25:930-939
Mattick, J.S. (2004) The hidden genetic program of complex organisms. Sci. Am. 291:60-67.
Thieffry, D. (1998) Forty years under the central dogma. Trends Biochem. Sci. 23:312-316.
Watson, J.D. (1965) The Molecular Biology of the Gene. W.A. Benjamin, Inc., New York

Chapel Hill, North Carolina



I'm going to Chapel Hill next weekend. It's one of my favorite places in the world. My daughter lives there. (Hi Jane, are you ready?)

I'm going to meet some bloggers.