Tuesday, January 15, 2008

Greg Laden Gets Suckered by John Mattick

 
Oh dear. Greg Laden reviews a paper from John Mattick's group and he falls for the hype, hook, line, and sinker. Here's what Greg says [Genes are only part of the story: ncRNA does stuff].
The "Junk DNA" story is largely a myth, as you probably already know. DNA does not have to code for one of the few tens of thousands of proteins or enzymes known for any given animal, for example, to have a function. We know that. But we actually don't know a lot more than that, or more exactly, there is not a widely accepted dogma for the role of "non-coding DNA." It does really seem that scientists assumed for too long that there was no function in the DNA.
I hate to break it to you, Greg, but junk DNA is not a myth. It really is true that a huge amount of our genome is junk. It's mostly defective transposons like SINEs and LINEs [Junk in your Genome: LINEs]. It's a lie that we don't know what most non-coding DNA is doing. We do know. It's not doing anything because it's mostly screwed-up transposons, like Alu's, and pseudogenes.

Mattick may have found a few bits of DNA that encode regulatory RNAs but that's only a small part of the total genome. He, and you, have fallen for excuse #5 of The Deflated Ego Problem.

Ryan Gregory has already tried to teach Greg some real science about junk DNA so I won't pile on any more than I have [Signs of function in non-coding RNAs in mouse brain.].

UPDATE: RPM chimes in to expose the flawed thinking of Greg Laden [How Easy is it to Write About Junk DNA?]


Humans Have Only 20,500 Protein-Encoding Genes

The first drafts of the human genome indicated about 30,000 genes, a number that was very much in line with many predictions that had been made over the years by scientists who were studying the topic. (Other scientists, and most science writers, thought there were about 100,000 genes [Facts and Myths Concerning the Historical Estimates of the Number of Genes in the Human Genome]).

Since the publication of the first draft, the number of genes has been dropping as annotators eliminate sequences that were falsely attributed to protein-encoding genes. Current estimates suggest there are about 28,000 different genes altogether, with about 4,000 of them encoding RNA products such as ribosomal RNA, tRNA, and the small RNAs involved in a number of metabolic processes [Ensembl: Homo sapiens].

A gene encoding a protein will have an open reading frame (ORF) consisting of multiple codons, usually more than 100. Some of these potential protein-encoding genes appear to be unique to humans. They weren't found in the other mammalian genomes that had been sequenced (e.g., mouse, dog). Quite a few scientists took this as evidence for genes that distinguish humans from other mammals. According to them, these unique genes arose during the recent evolution of Homo sapiens and that's why there are no homologues in the other mammalian genomes.
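For readers who like to see the idea of an ORF concretely, here is a minimal sketch of how a scan for open reading frames might work. It is only an illustration and not the method used by any annotation pipeline: the function name `find_orfs` and the `min_codons` parameter are hypothetical, it looks only at the forward strand, and it ignores introns entirely.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(seq, min_codons=100):
    """Return (start, end, n_codons) for forward-strand ORFs.

    An ORF here runs from an ATG to the next in-frame stop codon.
    n_codons counts the ATG and every codon before the stop.
    This is a hypothetical helper for illustration only.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):                      # three forward reading frames
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i                       # first ATG opens the frame
            elif start is not None and codon in STOP_CODONS:
                n_codons = (i - start) // 3     # codons before the stop
                if n_codons >= min_codons:
                    orfs.append((start, i + 3, n_codons))
                start = None                    # look for the next ATG
    return orfs

# A toy sequence: ATG, four alanine codons, then a stop.
toy = "CC" + "ATG" + "GCT" * 4 + "TAA" + "GG"
short_orfs = find_orfs(toy, min_codons=5)
long_orfs = find_orfs(toy)  # default threshold of 100 codons
```

With the default threshold of 100 codons the toy sequence yields nothing, which is the point of a length cutoff: it filters out short accidental ORFs.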

Other scientists looked at the data in a different light. They suspected that these "unique" or "orphan" genes were more likely to be artifacts because they were not conserved. In other words, they reached exactly the opposite conclusion based on their understanding of evolution. Their prediction was that these orphan genes resulted from spurious ORF's and not real genes.

This problem has been examined by Eric Lander's group in Boston, MA (USA) and the results were published in PNAS (Clamp et al., 2007). Their careful analysis has eliminated most of the orphan genes and the new gene count for protein-encoding genes is now 20,488.

Here's how the authors describe the purpose of their study,

The purpose of this article is to test whether the nonconserved human ORFs represent bona fide human protein-coding genes or whether they are simply spurious occurrences in cDNAs. Although it is broadly accepted that ORFs with strong cross-species conservation to mouse or dog are valid protein-coding genes (7), no work has addressed the crucial issue of whether nonconserved human ORFs are invalid. Specifically, one must reject the alternative hypothesis that the nonconserved ORFs represent (i) ancestral genes that are present in our common mammalian ancestor but were lost in mouse and dog or (ii) novel genes that arose in the human lineage after divergence from mouse and dog.
To begin the study they chose to analyze the 21,895 protein-encoding genes in the Ensembl database. They looked for genes that were related to similar sequences in the mouse and dog genomes. (These are the only two well-characterized non-human, mammalian genomes.) After visual inspection of low-scoring sequences they were able to eliminate about 1600 potential genes because they were pseudogenes, transposons, or artifacts of various sorts.

They were left with 19,108 verified genes and 1177 orphan "genes"—human ORF's that were not similar to any gene in the mouse and dog genomes. These genes could be newly evolved genes in the human/primate lineage or ancient genes that had been lost in mice and dogs.

The next step was to categorize the orphan "genes" to see if they looked like real protein-encoding genes. The results indicated that in terms of sequence similarity to the same regions in the mouse and dog genomes, the orphan ORF's were indistinguishable from random sequences. Similarly, the characteristics of the presumed codons of these genes were very different from conserved genes and very similar to random sequences with short accidental reading frames. Thus, the orphan sequences look like artifacts.
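A back-of-the-envelope calculation shows why accidental reading frames in random sequence tend to be short. This sketch is mine, not the authors'. It assumes all four bases are equally likely and independent, which is not true of real genomes, so treat the numbers as rough.

```python
# Assumption: uniform, independent bases, so all 64 codons are equally likely.
STOP_CODON_COUNT = 3
TOTAL_CODON_COUNT = 64

p_not_stop = (TOTAL_CODON_COUNT - STOP_CODON_COUNT) / TOTAL_CODON_COUNT  # 61/64

def prob_open(n_codons):
    """Probability that n_codons consecutive random codons contain no stop."""
    return p_not_stop ** n_codons

# Expected number of non-stop codons before the first stop (geometric distribution).
expected_run = p_not_stop / (1 - p_not_stop)  # 61/3, about 20 codons

print(f"P(100 codons stay open) = {prob_open(100):.4f}")
print(f"Expected open run = {expected_run:.1f} codons")
```

Under these assumptions a random frame stays open for 100 codons a bit less than 1% of the time. That is rare for any single start position, but a large collection of cDNAs offers many start positions, so some long spurious ORFs are expected.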

To confirm this conclusion, the authors compared the sequences to the macaque and chimpanzee genomes. They were not found in those genomes either.
If the orphans represent valid human protein-coding genes, we would have to conclude that the vast majority of the orphans were born after the divergence from chimpanzee. Such a model would require a prodigious rate of gene birth in mammalian lineages and a ferocious rate of gene death erasing the huge number of genes born before the divergence from chimpanzee. We reject such a model as wholly implausible. We thus conclude that the vast majority of orphans are simply randomly occurring ORFs that do not represent protein-coding genes.
This analysis was extended to the other gene catalogs (Vega and RefSeq) as well as an updated version of the Ensembl catalog (v38). This resulted in the identification of an additional 1271 valid genes. Adding in the genes in the mitochondrial genome (13) and the Y chromosome (78) gives a total of 20,470 genes.

Finally, reanalysis of the transposons and pseudogenes revealed 18 cases where a real gene had evolved from an inactive pseudogene. This gives a grand total of 20,488 protein-encoding genes in the human genome.
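The bookkeeping is easy to lose track of, so here is the tally using only the numbers quoted above. The variable names are mine and purely illustrative.

```python
# Starting catalog and the first round of sorting (Ensembl).
ensembl_start = 21_895
verified = 19_108          # conserved in mouse and/or dog
orphans = 1_177            # no counterpart in mouse or dog
eliminated = ensembl_start - (verified + orphans)  # the "about 1600" removed

# Building the final count: orphans are excluded.
additional_valid = 1_271   # from Vega, RefSeq, and Ensembl v38
mitochondrial = 13
y_chromosome = 78
from_pseudogenes = 18      # real genes that evolved from inactive pseudogenes

subtotal = verified + additional_valid + mitochondrial + y_chromosome
total = subtotal + from_pseudogenes

print(eliminated, subtotal, total)
```

The arithmetic gives 1,610 eliminated entries, a subtotal of 20,470, and a grand total of 20,488, matching the figures in the paper.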

There are several conclusions that can be drawn from this excellent study.
We show that the vast majority of ORFs without cross-species counterparts are simply random occurrences. The exceptions appear to represent a sufficiently small fraction that the best course would be to consider such ORFs as noncoding in the absence of direct experimental evidence.
This is going to be a major challenge for many workers who prefer to see evolution in a different manner. There are a number of papers that view these orphan sequences as direct evidence that human-specific genes had arisen in the recent past. Clamp et al. (2007) are saying that if the sequences aren't present in the macaque and chimpanzee then one should conclude that they are artifacts.

Remember, many of the artifactual genes are supported by EST/cDNA data suggesting that they are transcribed. This study calls that evidence into question—correctly in my opinion—indicating that we should be skeptical of the EST data.
One important biological implication of our results is that truly novel protein-coding genes (encoding at least 100 amino acids) arise only rarely in mammalian lineages. With the current gene catalogs, there are only 168 "human-specific" genes (<1% of the total; only 11 are manually reviewed entries in RefSeq; see SI Table 4). These genes lack clear orthologs or paralogs in mouse and dog, but are recognizable because they belong to small paralogous families within the human genome (2 to 9 members) or contain Pfam domains homologous to other proteins. These paralogous families shows a range of nucleotide identities, consistent with their having arisen over the course of ~75 million years since the divergence from the mouse lineage.
This is an important conclusion and I think it is accurate. There are very few "new" genes in the human genome, and, by implication, in other mammalian genomes. This conclusion is consistent with what we know about evolution but it contradicts studies that purport to show rapid evolution of novel genes and novel regulatory mechanisms in humans.


[Image Credit: The human karyotype is from the Ensembl website.]

Clamp, M., Fry, B., Kamal, M., Xie, X., Cuff, J., Lin, M.F., Kellis, M., Lindblad-Toh, K. and Lander, E.S. (2007) Distinguishing protein-coding and noncoding genes in the human genome. Proc. Natl. Acad. Sci. (USA) 104:19428-19433. [DOI 10.1073/pnas.0709013104]

Digital Object Identifier (DOI)

 
The digital object identifier, or DOI, is a unique identifier that's given to electronic documents. The idea is that it serves as a permalink to the item. An item can be moved to a different webpage but the DOI will always point to it as long as the DOI is updated when the item is moved.

We often encounter these DOI identifiers in online journal articles. For example, a recently published PNAS article has the following DOI 10.1073/pnas.0709013104. I usually forget how to resolve those DOI's. In case I'm not the only one, I thought I'd post the information.

The resolver is located at http://dx.doi.org/. So if you want to see the PNAS article you type in the following URL: http://dx.doi.org/10.1073/pnas.0709013104. Try it.
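If you'd rather not remember the resolver address, the recipe is simple enough to script. This is a minimal sketch: the function name `doi_to_url` is hypothetical, and the handling of a leading "doi:" label is my own assumption about how DOIs are sometimes printed. It only builds the URL and does not contact the resolver.

```python
from urllib.parse import quote

RESOLVER = "http://dx.doi.org/"  # the resolver address given in this post

def doi_to_url(doi):
    """Build a resolver URL from a bare DOI string (illustrative helper)."""
    doi = doi.strip()
    # Assumption: accept an optional leading "doi:" label before the identifier.
    if doi.lower().startswith("doi:"):
        doi = doi[4:].strip()
    # Percent-encode unusual characters but keep the slash between prefix and suffix.
    return RESOLVER + quote(doi, safe="/")

url = doi_to_url("10.1073/pnas.0709013104")
print(url)
```

For the PNAS article this produces the same URL shown above.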


Best Canadian Sci/Tech Blog 2007

 
Nominations have closed and the voting has begun for the best Sci/Tech blog in Canada. Here's the ballot.

The nominees are ....
The only blogs that I've read before today are Eastern Blot and The World's Fair. Should I be reading the others? Please let me know if you think any of these are science blogs worth reading. Several of the Canadian science blogs that I read every day are not on the list.

Is there an easy way of finding out how popular those blogs are? There must be some tool out there that will tell me the average number of visits per day/week/month for each of the nominees.


Monday, January 14, 2008

What Is this Dog Thinking?

 
If you think you know what's going on in the mind of this dog, get over to Friendly Atheist and enter the contest [Friendly Atheist Contest #14: Dog Prays to God].

Remember the rules. According to Hemant Mehta, "Funny and creative answers will have a shot at winning."

You can enter as often as you like.

Here's what the boy is thinking, "This is so embarrassing. I'm soon gonna have to break it to him that I'm an atheist."


Insurance Against Alien Abductions

 
According to some studies, up to four million Americans may have been abducted by aliens [Abduction by Aliens or Sleep Paralysis?]. I often use this information when questioning religious people about the rationality of their inner convictions. As it turns out, most theists reject the silly beliefs of alien abductees without seeing any connection between this and their own proof of God by religious experience.

A group of people have banded together to exploit, er, help those who fear being abducted by aliens. They have prepared special dog tags [www.earthbounddog.com].
Picture yourself lost in the galaxy...UFO sightings and Alien Abductions are on the rise...Will you return to tell the story?

In case of alien abduction these dog tags may save your life. The crucial data an alien will need to get you back to Earth is die stamped into these dog tags.

The design is based on NASA research for the Pioneer 10 Space Mission that used a gold plaque attached to the craft to inform any Extraterrestrials of it's Earthly origin.
You can buy them for only $12.99 (US). I suggest you buy several sets of dog tags for all your close friends. Do not buy them for other people.


[Hat Tip: Bad Astronomy]

Convocation 2007

 
A few months ago I told you about my first convocation as a Professor [Bruce Alberts in Toronto]. Here's the formal photograph of the main participants. Don't we look pretty?




Monday's Molecule #58

 
This is one example of a very common molecule found in every cell. You have to give us the common name of this molecule and identify the species. You'll be pleased to know that I don't need the systematic IUPAC name for this one.

There's a direct connection between this molecule and Wednesday's Nobel Laureate. Your task is to figure out the significance of today's molecule and identify the Nobel Laureate who studied its function. (Hint: The Nobel Laureate is a Canadian—there aren't very many Canadian Nobel Laureates so this is a very big hint.)

The reward goes to the person who correctly identifies the molecule and the Nobel Laureate. Previous winners are ineligible for one month from the time they first collected the prize. There is one ineligible candidate for this week's reward because Sandwalk readers were not very successful in December. The prize is a free lunch at the Faculty Club.

THEME:

Nobel Laureates
Send your guess to Sandwalk (sandwalk(at)bioinfo.med.utoronto.ca) and I'll pick the first email message that correctly identifies the molecule and the Nobel Laureate. Note that I'm not going to repeat Nobel Laureates so you might want to check the list of previous Sandwalk postings.

Correct responses will be posted tomorrow along with the time that the message was received on my server. I may select multiple winners if several people get it right.

Comments will be blocked for 24 hours. Comments are now open.

UPDATE: We have a winner! This one proved to be far more difficult than I imagined. Everyone got the Nobel Laureate (Sidney Altman) but very few people got the molecule correct. Some people failed to identify the species correctly even though I specifically asked for the species. Most people said that the molecule is RNase P but that isn't quite correct.

The molecule is the M1 RNA subunit of RNase P from E. coli. The other subunit is a small protein called the C5 protein cofactor. This RNA is sometimes called RNA P and that would have been an acceptable answer.

Only one person got everything right and that response just arrived a few minutes ago. Congratulations to PonderingFool for knowing that the molecule was the M1 RNA component of E. coli RNase P and the Nobel Laureate is Sidney Altman.



Creation Science Papers

 
Phil Plait of Bad Astronomy must have had a great deal of free time on his hands now that the asteroids have missed us and the galaxy isn't going to be consumed by a hydrogen cloud for at least 40 million years. He was thinking of writing a paper for a new journal sponsored by Answers in Genesis [Creationists: publish and perish].

Phil was interested in the first two papers that were published in the Answers Research Journal. One was a geology paper and one was about microbiology. Phil wanted to know how good they were.

Being as relieved as he is that the Earth survived the near miss, I decided I could spare a few minutes to read the microbiology article. It's by Alan L. Gillen from that famous center of research called Liberty University. Here's the abstract.
The world of germs and microbes has received much attention in recent years. But where do microbes fit into the creation account? Were they created along with the rest of the plants and animals in the first week of creation, or were they created later, after the Fall. These are some questions that creation microbiologists have been asking in recent years. Ongoing research, based on the creation paradigm, appears to provide some answers to these puzzling questions. The answers to these questions are not explicit in Scripture, so the answers cannot be dogmatic. However, a reasonable extrapolation from biological data and Scripture can be made about the nature of microbes in a fully mature creation. This article attempts to provide reasonable answers to when microbes were created and is meant to stimulate discussion and further research in this area.

Very little has been written in Bible commentaries or in creation literature on the subject of when microbes were created. Some have postulated that microbes were created on a single day of Creation, such as Day Three—when the plants were made. This is partially due to the “seed-like” characteristics that bacteria and fungi have—therefore classifying microbes as plants. In addition, we observe microbes (such as Escherichia coli) isolated in the lab and we tend to think of microbes as individual entities much like birds or fish or animals and, therefore, created on a single day. However, in nature, the vast majority of microbes live in biological partnerships, not in total isolation. The natural symbiosis of microbes with other creatures is the norm. Therefore, we postulate that microbes were created as “biological systems” with plants, animals, and humans on multiple days, as supporting systems in mature plants, animals, and humans. This idea is further supported by the work of Francis (2003). Francis calls microbial symbiotic systems a biomatrix, or organosubstrate. He proposes that microbes were created as a link between macroorganisms and a chemically rich but inert physical environment, providing a surface (i.e., substrate) upon which multicellular creatures can thrive and persist in intricately designed ecosystems. From the beginning, God made His creation fully mature, and complex forms fully formed. This would insure continuity and stability for the times to come. Although we cannot be certain as to specifically when the Creator made microbes, it is within His character to make entire interwoven, “packaged” systems to sustain and maintain life.
I didn't read any further.

Phil, the bad news is that this is a pile of crap. The good news is that you won't have to waste very much time writing a paper for this journal. You can probably knock it off in an afternoon.


[Image Credit: The Complete Idiot’s Guide to Just Doing It]

Scientific Illiteracy About Death Rates

 
Here's part of an article on ScienceDaily about death rates in New York City [New York City Death Rate Reaches Historic Low].
The death rate in New York City reached an all-time low in 2006, the Health Department reported today, as the number of deaths fell to 55,391 -- down from 57,068 in 2005 and 60,218 in 2001. Mortality declined in eight leading categories, including diabetes, HIV, chronic lung disease and kidney failure. The only leading killer that increased significantly was substance use (up 8%). Heart disease and cancer remained the city's biggest killers, claiming 21,844 lives and 13,116 lives, respectively. The figures come from the latest Annual Summary of Vital Statistics, the definitive registry of births and deaths in New York City.
The numbers of deaths are not death rates. This is one of my pet peeves. I get angry when newspaper reporters screw it up but this is much worse. It's from a website that's supposed to specialize in science ("Your Source for the Latest Research News").

The raw numbers are available at Summary of Vital Statistics 2006: The City of New York. They show that the death rate did, indeed, fall from 7.0 per 1000 citizens in 2004 to 6.7 per 1000 citizens in 2006. In 1916 it was 14.0 while in 1980, 1990, and 2000 it was 10.0, 10.1, and 7.6 respectively.

The absolute number of deaths tells you nothing about death rates. For all we know, the population of New York City could have fallen from 2004 to 2006 and the death rate could have gone up. (Incidentally, if you look at the raw data you'll see an interesting footnote. The rates in 2004-2006 were revised downwards when the 2007 census data for population was used. Previous estimates were based on the population according to the 2000 census.)
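To make the distinction concrete, here is a small sketch of the calculation. The function name is mine, and the two "years" below use made-up numbers chosen only to illustrate the point; they are not New York City data.

```python
def crude_death_rate(deaths, population):
    """Deaths per 1000 people in a given year."""
    return deaths / population * 1000

# Hypothetical city, invented figures: deaths fall, but so does the population.
deaths_a, population_a = 60_000, 8_000_000
deaths_b, population_b = 57_000, 7_000_000

rate_a = crude_death_rate(deaths_a, population_a)
rate_b = crude_death_rate(deaths_b, population_b)

print(f"Year A: {deaths_a} deaths, {rate_a:.1f} per 1000")
print(f"Year B: {deaths_b} deaths, {rate_b:.1f} per 1000")
```

In this invented example the number of deaths drops by 3,000 while the death rate rises, which is exactly why a headline about rates has to report rates.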


[Image Credit: New York City in 1916 from The University of Texas at Austin]

Sunday, January 13, 2008

Scientific Mistakes

 
During a recent discussion with undergraduates, they mentioned that it would be a good idea to discuss the more recent scientific papers in class. They seemed to be very impressed with a course that presented papers published within the past few months.

I pointed out that there's a problem with that kind of course. If the goal of a course is to teach fundamental principles and concepts then it's very unlikely that recent papers are going to advance that goal. Why is that? Because much of the scientific literature is either trivial or incorrect. You don't know that it's trivial or incorrect until some time has passed and other scientists react.1

If the goal of a course is to teach how science is done on a day-to-day basis, then a key part of that course should be to drive home the concept of skepticism. Don't believe everything you read in the latest journals. An important part of that teaching goal is to pick examples of important mistakes in older literature.

John Dennehy has helped us out this week by posting a "citation classic" that turned out to be wrong [This Week's Citation Classic: Being Wrong]. In my opinion, it's far more important to look at examples like this than to expose undergraduates to several dozen hot new papers that are supposedly at the cutting edge.

The paper that John chose is by Paul Boyer, who subsequently won the Nobel Prize in Chemistry [Nobel Laureates: Paul Boyer and John Walker] for his work on the mechanism of ATP synthase [How Cells Make ATP: ATP Synthase].


1. Sometimes it takes a long time for scientists to react to mistakes in the literature. Wrong ideas can be perpetuated for decades after they've been refuted, especially if the original papers were widely referenced. I was reminded of this the other day when listening to a graduate student seminar (coincidentally, on the structure of ATP synthase). The student posted an old-fashioned, out-of-date view of the citric acid cycle as an introduction to the function of ATP synthase. I have challenged my undergraduate biochemistry class to find a single example of a web site that gets the entire citric acid cycle correct. There's a prize. They can't use the IUBMB site, they can't use my sites, and they can't make one of their own. So far nobody has collected the prize.

How Much Junk in the Human Genome?

Ryan Gregory has another contribution to this question that's well worth a read [Is most of the human genome functional?].

Among other things, Ryan picks on the views of John Mattick who has got to be one of the worst scientists in the field. Whenever I read a paper by Mattick I revise my opinion of the value of peer-reviewed literature. It's bad enough that Mattick has silly ideas but it's even sadder that his "peer" reviewers don't recognize it.

Here's a quote from Mattick that I discussed in my article on The Central Dogma of Molecular Biology. It's obvious that he doesn't understand the real meaning of the central dogma. Can you pick out the other conceptual flaws in this paragraph? [Hints: Worst Figure Ever and Dog Ass Plots.]
The central dogma of biology holds that genetic information normally flows from DNA to RNA to protein. As a consequence it has been generally assumed that genes generally code for proteins, and that proteins fulfil not only most structural and catalytic but also most regulatory functions, in all cells, from microbes to mammals. However, the latter may not be the case in complex organisms. A number of startling observations about the extent of non-protein coding RNA (ncRNA) transcription in the higher eukaryotes and the range of genetic and epigenetic phenomena that are RNA-directed suggests that the traditional view of genetic regulatory systems in animals and plants may be incorrect.

Mattick, J.S. (2003) Challenging the dogma: the hidden layer of non-protein-coding RNAs in complex organisms. BioEssays 25:930-939.


Graduate Students Need to Publish Papers

 
In my field it takes five, or even six, years to complete a Ph.D. program. This time could be significantly reduced if there weren't pressure on students to produce publishable work. The reduction in time is even more obvious at the M.Sc. level where it often takes far more than two years to get a degree.

One could make a case for an M.Sc. degree that was not a "research" oriented degree. These programs would be useful for high school teachers, for example, or patent attorneys, or even physicians.

But those are exceptions. In most research departments the thesis is based on scientific research. Does that research have to produce results that can be published in the scientific literature? Yes it does.

T. Ryan Gregory explains why [Why would advisors encourage students to publish?]. (This is a repost of an article that he published earlier on Genomicron but it's still relevant and topical, especially in our department where we are grappling with the issue of long times to completion.)


[Photo Credit: Graduate students in the Department of Biochemistry 2007-2008.]

Friday, January 11, 2008

Test Your God Logic

 
Here's a quiz you can try to see if your positions on atheism and religion are consistent [Battleground God]. Be careful, this quiz has many pitfalls. I took three hits and a bullet but it's not because I'm illogical, in my opinion. It's because questions can be interpreted in several different ways.

Here's one of the questions that caused me trouble.
It is foolish to believe in God without certain, irrevocable proof that she exists.
I answered "true." What I mean by that is that you require evidence to believe in something. What the authors of the study mean is that "certain, irrevocable proof" is inconsistent with my answer about evolution! They say,
You stated earlier that evolutionary theory is essentially true. However, you have now claimed that it is foolish to believe in God without certain, irrevocable proof that she exists. The problem is that there is no certain proof that evolutionary theory is true - even though there is overwhelming evidence that it is true. So it seems that you require certain, irrevocable proof for God's existence, but accept evolutionary theory without certain proof. So you've got a choice:

Bite a bullet and claim that a higher standard of proof is required for belief in God than for belief in evolution.

Take a hit, conceding that there is a contradiction in your responses.
I did, indeed, reply "true" to the statement that evolutionary theory is essentially true. But that's only because I wasn't given the option of replying to the statement that evolution is a fact. I accept evolution because there is certain proof that it exists. I assumed, incorrectly as it turns out, that they were using "evolutionary theory" as a synonym for "evolution."

In order to be consistent I guess I should have replied that "evolutionary theory" is not essentially true.

Watch out for Question 15. It's also a trap.

Read the comments on FriendlyAtheist. Quite a few people got through the test with no hits. I wonder how they answered the question about evolution.


Vegetarian Humor

 
Today after class we were having a wide-ranging discussion about all kinds of issues when one of my students announced that she was a vegan. She claimed that all us meat-eaters were ignoring the slaughter of animals required to justify our habit.

I retorted with the standard reply that she was conveniently ignoring all the plants that had to die for her. Her response caught me off-guard—it was new to me but may be old hat to you. Anyway, she said, "I'm not vegan because I love animals, I'm a vegan because I hate plants!"

The original quote is ...
I am not a vegetarian because I love animals; I am a vegetarian because I hate plants.

                                             A. Whitney Brown
I like it. From now on I'll say that I'm an omnivore because I hate all living things!






The photo honors National Meatloaf Appreciation Day. At least one animal, and several plants, were seriously injured during the making of this photo.