
Tuesday, February 14, 2012

The Cost of Introns

Michael Lynch estimates that the cost of adding an intron to an intronless gene is equivalent to adding about 31 bp of essential target (Lynch, 2010). This is roughly the number of base pairs in an average intron that have to be preserved in order for the intron to be properly spliced. Adding an intron increases the chances that a gene will be inactivated by mutation.

In spite of this deleterious cost, introns have spread in certain genomes, notably in mammals and flowering plants. How do we explain the spread of introns? Is it consistent with the null hypothesis of random genetic drift?

According to Lynch the answer could be yes. Here's what he says in his book The Origins of Genome Architecture.
For newly arisen introns having no functional significance for the products of their host genes, the primary force opposing their ability to spread throughout a population is their excess mutation rate to defective allele(s), and because this force is expected to be quite weak, selection will be ineffective in preventing intron colonization in populations experiencing substantial levels of random genetic drift.
The selection coefficient for intron deletion has to be above a certain threshold in order to prevent introns from spreading. This threshold depends on the population size: in large populations the deleterious effect of introns is sufficient to ensure that they will be kept to a minimum, or eliminated entirely.

For species with small populations there will be a cutoff where the selection coefficient cannot overcome the effect of random genetic drift and intron insertion is effectively neutral.
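One way to see why population size matters is Kimura's diffusion approximation for the probability that a new mutation eventually becomes fixed. The sketch below is an illustration, not Lynch's own calculation. It assumes additive selection, an effective population size equal to N, a mutation rate of 1e-8 per site per generation, and a cost of s = 31 × μ per intron; the function names and parameter values are hypothetical.

```python
import math

def fixation_probability(s, n):
    """Kimura's approximation for a new mutation in a diploid population.

    s: selection coefficient (negative if deleterious), additive effects.
    n: population size, assumed here to equal the effective size.
    """
    if s == 0:
        return 1.0 / (2 * n)      # neutral case: the initial frequency
    exponent = -4 * n * s
    if exponent > 700:            # exp() would overflow; probability is ~0
        return 0.0
    # (1 - e^(-2s)) / (1 - e^(-4Ns)), written with expm1 for accuracy
    return math.expm1(-2 * s) / math.expm1(exponent)

MU = 1e-8              # assumed mutation rate per site per generation
S_INTRON = -31 * MU    # assumed selective cost of carrying one intron

def relative_to_neutral(s, n):
    """Fixation probability divided by the neutral expectation 1/(2N)."""
    return fixation_probability(s, n) * 2 * n

for n in (1e4, 1e6, 1e8):
    print(f"N = {n:.0e}: {relative_to_neutral(S_INTRON, n):.3g}")
```

Under these assumptions the ratio stays close to 1 in small populations, where the intron behaves as if it were neutral, and collapses toward 0 in very large ones.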

Lynch calculates the cost of the extra target nucleotides as a function of the mutation rate (μ) and explains why the cutoff is 2Ngμ = 0.04 (Ng is the effective number of genes ~ 2Ne). You can estimate 2Ngμ by counting the nucleotide diversity at silent sites in protein-encoding genes (πs). Thus, a plot of number of introns vs πs [2Ngμ] is a test of the hypothesis.
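The arithmetic behind a cutoff of this kind can be sketched as follows. This is a simplified reading, not Lynch's full derivation. It assumes that the selective cost of an intron is s = nμ, where n is the number of extra essential sites (31 is the estimate quoted at the top of this post), and that drift overwhelms selection when 2Ng·s < 1. The function names are hypothetical.

```python
def threshold_two_ng_mu(n_sites):
    """Value of 2*Ng*mu below which 2*Ng*s < 1, assuming s = n_sites * mu."""
    return 1.0 / n_sites

def drift_dominates(pi_s, n_sites=31):
    """True if silent-site diversity (used as an estimate of 2*Ng*mu)
    falls below the assumed threshold."""
    return pi_s * n_sites < 1.0

print(round(threshold_two_ng_mu(31), 3))
print(drift_dominates(0.001), drift_dominates(0.1))
```

With n = 31 the cutoff comes out near 0.03, the same order as the 0.04 quoted above. The exact figure depends on the value of n and on details of the derivation that are not reproduced here.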

Here's the figure from Lynch's book.


The data indicate that in species with small values of πs the spread of introns cannot be prevented even though introns may be deleterious. The cutoff is about 0.04 as predicted.

This does not prove that intron proliferation in some species is due to random genetic drift but it does show that the hypothesis cannot be ruled out. There's no need to invoke adaptive explanations for the initial spread of introns in vertebrate and plant genomes.


Lynch, M. (2010) Rate, molecular spectrum, and consequences of human mutation. Proc. Natl. Acad. Sci. USA 107:961-968. [doi: 10.1073/pnas.0912629107]

Monday, February 13, 2012

Monday's Molecule #159

 
This is a very important molecule for many reasons. Versions of it are found in all living species. Its function is essential in most (all?) species.

Pieces of this molecule are found in many viruses. The molecule also plays a prominent role in the debate about junk DNA.

Identify the molecule—the common name will do. You don't have to specify the species but anyone who does will get an honorable mention if the winner doesn't. Post your answer in the comments. I'll hold off releasing any comments for 24 hours. The first one with the correct answer wins. I will only post correct answers to avoid embarrassment.

There could be two winners. If the first correct answer isn't from an undergraduate student then I'll select a second winner from those undergraduates who post the correct answer. You will need to identify yourself as an undergraduate in order to win. (Put "undergraduate" at the bottom of your comment.)

Some past winners are from distant lands so their chances of taking up my offer of a free lunch are slim. (That's why I can afford to do this!)

In order to win you must post your correct name. Anonymous and pseudonymous commenters can't win the free lunch. I'm about to leave for Los Angeles for a couple of weeks but I promise to organize lunch dates as soon as I get back at the end of February.

Winners will have to contact me by email to arrange a lunch date.

UPDATE: It's 7SL RNA from humans. The winner is Joseph C. Somody.

Winners
Nov. 2009: Jason Oakley, Alex Ling
Oct. 17: Bill Chaney, Roger Fan
Oct. 24: DK
Oct. 31: Joseph C. Somody
Nov. 7: Jason Oakley
Nov. 15: Thomas Ferraro, Vipulan Vigneswaran
Nov. 21: Vipulan Vigneswaran (honorary mention to Raul A. Félix de Sousa)
Nov. 28: Philip Rodger
Dec. 5: 凌嘉誠 (Alex Ling)
Dec. 12: Bill Chaney
Dec. 19: Joseph C. Somody
Jan. 9: Dima Klenchin
Jan. 23: David Schuller
Jan. 30: Peter Monaghan
Feb. 7: Thomas Ferraro, Charles Motraghi


Sunday, February 12, 2012

Is "Out-of-Africa" Dead or Just Severely Wounded?

 
Here's a good summary from the New York Times [DNA Turning Human Story Into a Tell-All]. The story is written by Alanna Mitchell. John Hawks links to it [Denisova in the news], suggesting that he thinks it's pretty accurate.

The opening paragraphs of the New York Times story emphasize the controversy ...
The tip of a girl’s 40,000-year-old pinky finger found in a cold Siberian cave, paired with faster and cheaper genetic sequencing technology, is helping scientists draw a surprisingly complex new picture of human origins.

The new view is fast supplanting the traditional idea that modern humans triumphantly marched out of Africa about 50,000 years ago, replacing all other types that had gone before.

Instead, the genetic analysis shows, modern humans encountered and bred with at least two groups of ancient humans in relatively recent times: the Neanderthals, who lived in Europe and Asia, dying out roughly 30,000 years ago, and a mysterious group known as the Denisovans, who lived in Asia and most likely vanished around the same time.

Their DNA lives on in us even though they are extinct. “In a sense, we are a hybrid species,” Chris Stringer, a paleoanthropologist who is the research leader in human origins at the Natural History Museum in London, said in an interview.


[Image Credit: The map is from The Human Journey.]

Is Neanderthal Your Distant Cousin or Your Ancestor?

 
The answer to the question is "mostly cousin" or "both," depending on whether you are African, Asian, or European. Lots of evidence suggests that people who migrated out of Africa in the past few hundred thousand years met and mated with indigenous populations of other humans who were already living in Asia and Europe.

The availability of many genome sequences coupled with our sophisticated understanding of population genetics allows workers to estimate how much Neanderthal DNA entered the modern European and Asian populations. It won't be long before similar studies are done with the Denisovan genome.

There's lots of discussion about whether the strict "Out-of-Africa" scenario is still valid [Out of who knows where] [The scientists behind Mitochondrial Eve tell us about the "lucky mother" who changed human evolution forever]. There's little doubt that some version of the multiregional hypothesis is correct although it may not be as thorough as the original proponents argued.

John Hawks is an expert on this sort of thing and he's just posted some of his work on his blog [Which population in the 1000 Genomes Project samples has the most Neandertal similarity?]. He doesn't allow comments so you can ask questions here—he's been known to read Sandwalk when he gets bored doing real science.

I'll ask the first questions. John, what percentage of the genomes of Africans, Europeans, and Chinese are derived from the Neanderthal population? Your figure shows the amount of Neanderthal (Vi33.16) intrusion as something called "shared derived variants" but how much of the total genome does that represent? Is any of it in parts of the genome where there are genes?




[Image Credit: I don't know where this picture came from. I got it on Just Another Brooklyn Blog.]

Happy Birthday Charles Darwin

[Reposted from 2008.]

Charles Robert Darwin was born on this day in 1809. Darwin was the greatest scientist who ever lived.

In honor of his birthday, and given that this is a year of politics in America, I thought it would be fun to post something about Darwin's interactions with politicians. The historical account is from Janet Browne's excellent biography (Browne 2002).

William Gladstone (photo below) was an orthodox Christian. He was not a fan of evolution. In March 1877 Gladstone was leader of the Liberal party and a former Prime Minister of the most powerful country in the world. He was spending the weekend with John Lubbock, a well-known liberal, and a few other friends, including Thomas Huxley.

They decided to walk over to Darwin's house in Downe. This was 18 years after the publication of the Origin and Darwin was a famous guy. The guests were cordially received by Darwin and his wife Emma. Darwin and Emma were life-long liberals and they were honored by Gladstone's visit. A few days later, Darwin wrote a note to his friend saying,

Our quiet, however, was broken a couple of days ago by Gladstone calling here.—I never saw him before & was much pleased with him: I expected a stern, overwhelming sort of man, but found him as soft & smooth as butter, & very pleasant. He asked me whether I thought that the United States would hereafter play a much greater part in the history of the world than Europe. I said that I thought it would, but why he asked me, I cannot conceive & I said that he ought to be able to form a far better opinion,—but what that was he did not at all let out.
A few years later Gladstone sent Darwin one of his essays on Homer. Darwin gratefully acknowledged the gesture.

In 1881, when Gladstone was Prime Minister again, Darwin and some of his friends petitioned Gladstone to award a pension to Alfred Russel Wallace, who was in dire financial straits at the time. Gladstone granted the request. Two months later Gladstone offered Darwin a position as trustee of the British Museum but Darwin declined. (Remember, Gladstone did not agree with Darwin about evolution, or religion.)

When Darwin died, Gladstone was instrumental in arranging for him to be buried in Westminster Abbey. The funeral was held on April 26, 1882. William Gladstone was too busy to attend. He went to a dinner at Windsor.


Browne, J. (2002) Charles Darwin: The Power of Place (Vol. II). Alfred A. Knopf, New York (USA)

Saturday, February 11, 2012

If I was in L.A. ...

 
It's windy, snowy, and -14°C. Ms. Sandwalk gets it exactly right on her blog [If I was in L.A.].

We'll soon be in Los Angeles for a couple of weeks visiting our granddaughter (and her parents). I hope to meet up with some of you while I'm there. Leave a comment or email me. My email address is "l" period "moran" and the domain name is "utoronto" period "ca".


Note to grammar police. I know the proper use of subjunctive mood. I suspect John Phillips does as well. Denny Doherty was Canadian and was almost certainly taught proper grammar in school. Mama Cass attended university so it's likely she too knew about subjunctives. As for Michelle ... well, three out of four ain't bad.

Friday, February 10, 2012

How to Turn a University Into a Glorified High School

 
Ian D. Clark is Professor of Public Policy and Governance here at the University of Toronto. He has attracted a lot of attention lately because he and his colleagues advocate the creation of Teaching-only Universities in Ontario. The scary part of this ridiculous idea is that it might soon become official policy of the Ontario government as described in a recent article by Louise Brown in The Toronto Star [Teaching-only universities would cut education costs, author says].
Undergraduate universities that focus on teaching only would create cosier classes, cut salary costs and boost student satisfaction, argues Ian Clark, the former head of the Council of Ontario Universities.

Moreover, he says professors at these new universities should be required to teach twice as many courses as usual — a full 80 per cent of their time with 10 per cent left for research and 10 per cent for administration.

Clark and professor David Trick are co-authors of a controversial new book that calls for new teaching-oriented universities where profs would have much higher course-loads. Simply by doubling the number of courses a professor teaches each semester to four from two could cut the operating cost of educating a student to $9,800 from $14,300 at a campus of 10,000, Clark noted Tuesday at a conference sponsored by the University of Toronto’s Ontario Institute for Studies in Education.

Having profs teach more courses is one cost-saving tip rumoured to be part of economist Don Drummond’s report next week to Premier Dalton McGuinty.

How Did the Zebra Get Its Stripes?

We've been discussing the adaptationist approach to biology on another thread and this is a good example to illustrate the issues. If I were to ask you how the zebra got its stripes, what would you think?

Would you immediately assume that it could be an evolutionary accident with no adaptive significance then start to wonder if you could rule out such an explanation? Can random genetic drift of neutral alleles explain the zebra's stripes?

Or would you immediately start thinking of adaptive explanations for why all three extant species of zebras have stripes but no other large mammals in the same environment are striped? Most other horses don't have prominent stripes but many have faint stripes on some parts of their bodies (Darwin, 1859).

I argue that you have to rule out the null hypothesis (drift) before invoking adaptationist explanations. In other words, the first question you need to ask is whether zebra stripes are adaptive. But that's not the adaptationist approach. Adaptationists begin with the assumption that stripes are adaptive, then they start looking for adaptive explanations.

What if the favorite adaptive explanation is refuted? What does an adaptationist do next? Gould and Lewontin (1979) provide the answer ...

Thursday, February 09, 2012

Remember Chris Mooney?

 
Chris Mooney has achieved some remarkable goals in the past ten years or so. He's an atheist accommodationist who doesn't understand atheism or accommodation. He's a science journalist who doesn't understand science or the important elements of science journalism (it's all about spin framing). He's a supporter of evolution but he's only interested in American politics—he doesn't actually understand evolution.

Nisbet & Mooney Reveal Their True Colors
Matthew Nisbet and Chris Mooney Video on Framing Science
Changing Minds Through Science Communication
For Once, Chris Mooney Talks Sense
The Future of Science Journalism
Science Journalism in Decline
Some scientists are astrologers, therefore science and astrology are compatible
Chris Mooney Changed His Mind
The Doctrine of Joint Belief
Chris Mooney and Sheril Kirshenbaum in Newsweek
Boring ....
The Difference between Truth and Framing
Chris Mooney vs Atheists: Part XXXIV
Chris Mooney Asks a Hard Question
The Great Accommodationist Dud

Chris is about to publish a new book called The Republican Brain: The Science of Why They Deny Science--and Reality. Like most professional writers he knows that he has to hype the book in order to get people to buy it so he's started the campaign with an article on HuffPost under their new "Science" category [Want to Understand Republicans? First Understand Evolution].

The main theme is that Republicans are genetically different from Democrats and that difference is due to evolution. No, I'm not kidding ....

Jerry Coyne says "huh?" [Chris Mooney, evolution, and politics]. Get on over to Coyne's blog website and join the fun.1 Can Mooney, the science journalist, screw up evolution? ... let's count the ways.


1. What would we do for fun if we didn't have Chris Mooney? We'd have to pick on the creationists all the time and that gets boring.

Was Newton the Greatest Scientist Who Ever Lived?

Most of us know that Charles Darwin was the greatest scientist who ever lived but one still finds the occasional misguided physicist/mathematician who thinks that the honor should go to an eighteenth century Englishman named Isaac Newton (1642-1727) [Top Five Dead Scientists] [Westminster Abbey: Darwin vs Newton] [Books by Charles Darwin] [Why I'm Not a Darwinist].

Now we have more direct evidence.1 The Israel National Library has just put a pile of Newton's writings online [Israel National Library uploads trove of Newton's theological tracts]. We get to see direct examples of how Newton thinks like a scientist.

My favorite is Newton's predictions about when the apocalypse will take place. He starts his calculation with the crowning of Charlemagne as Holy Roman Emperor in 800 AD and goes downhill from there [Newton on the date 2060 (early 18th century)].
In the instance displayed on this manuscript folio, Newton calculates a tentative date using the 1260 days (taken to be years) from Daniel in part to counter the claims of some of his contemporaries, who claimed that the end would come in the seventeenth or eighteenth century. Newton stood apart from contemporary interpreters who were predicting the imminent restoration of the Jews, the fall of the Catholic Church and the Second Coming of Christ. Nevertheless, Newton’s own fervent belief in these prophetic events is not in doubt. The abbreviation “A.C.” stands for Anno Christi (“the year of Christ”).
So then the time times and half a time are 42 months or 1260 days or three years and an half, reckoning twelve months to a year and 30 days to a month as was done in the Calendar of the primitive year. And the days of short lived Beasts being put for the years of lived [sic] kingdoms, the period of 1260 days, if dated from the complete conquest of the three kings A.C. 800, will end A.C. 2060. It may end later, but I see no reason for its ending sooner. This I mention not to assert when the time of the end shall be, but to put a stop to the rash conjectures of fanciful men who are frequently predicting the time of the end, and by doing so bring the sacred prophesies into discredit as often as their predictions fail. Christ comes as a thief in the night, and it is not for us to know the times and seasons which God hath put into his own breast.
There's lots more where this came from but I don't want to embarrass the Newton supporters any further.

By way of contrast, the real greatest scientist who ever lived was a non-believer who never would have treated the Bible as a scientific authority.


1. The information isn't new. It's just that we can now see for ourselves that Isaac Newton was remarkably unscientific in most of his writings.

P.S. Some losers are going to argue that Newton was still the greatest scientist and we should ignore the fact that his religious beliefs made him write many stupid anti-science treatises. That's like saying that Young Earth Creationists (like Newton) can be good scientists even though they believe the Earth is only 6000 years old.

Wednesday, February 08, 2012

The Mysterious Epigenome

 
Tom Woodward is the founder of the C.S. Lewis Society and the apologetics.org website. He has written a book (with James Gills) called The Mysterious Epigenome: What Lies Beyond DNA. You can tell from the title that this is another "evolution revolution" book capitalizing on the re-invention of a new word that means everything, and nothing.

Woodward kindly posted an article on apologetics.org that helps us decide whether this is a book worth reading.
The Avalanche


God’s love—a vast oceanic expanse? An aggressive love, “lavished” on mankind? Coming at us like an avalanche?

These ideas came to Dr. James Gills and me as we were working on our book The Mysterious Epigenome: What Lies Beyond DNA. As we tweaked the final manuscript, we were haunted over and over by a powerful, pivotal thought. As scientists continue pulling back curtain after curtain that had previously shrouded the chemical master-codes that control our DNA system (that is, the multiple integrated layers of our epigenetic “computer codes”), they were also revealing something in the realm of spirit. They had opened up a new kind of vista on the greatness of the Creator’s overwhelming intelligence—his boundless genius--which is placed on display in this bizarre biochemical landscape. The more we thought about and discussed the latest discoveries of the genome and epigenome, the more we were confronted with this sense of the cosmic architect’s “unlimited, off-the-chart wisdom” in creating and sustaining the micro-cosmos of life. At the same time, seeing the Creator’s engineering intelligence in this new light also made us ponder the striking parallel with other “overwhelming/incalculable/infinite” qualities that the Bible attributes to the Creator, such as his power, knowledge and love.
That's pretty much all you need to know but if you are a sucker for punishment anxious to know more you can read some excerpts on Tom Woodward's The Mysterious Epigenome: Effectively Popularizing Richard Sternberg's Revolutionary Thesis.

I was going to say that creationists like Woodward give the epigenome a bad name but then I realized that it isn't true. Epigenomics and epigenetics had bad names long before the creationists got wind of them.

UPDATE: Several readers noted that the DNA on the cover of the book is a left-handed helix. This doesn't inspire confidence, does it?

In case you thought that Disney World had the only fantasyland in Florida, check this out.




Michael Lynch on Evo-Devo

Michael Lynch had some cogent (and provocative, and true) words on adaptationism in his book The Origins of Genome Architecture [Michael Lynch on Adaptationism].

Here's what he has to say about evo-devo.
Consider the steady stream of recent books by authors striving to define a new field called evolutionary developmental biology (e.g., Arthur 1997; Gerhart and Kirschner 1997; Davidson 2001, 2006; Carroll et al. 2001; West-Eberhard 2003; Carroll 2005a; Kirschner and Gerhart 2005). The plots of all these books are similar: first, it is claimed that observations from developmental biology demonstrate major inadequacies in current evolutionary theory, and then a new view of evolution that eliminates many of the central shortcomings of the field is promised. Developmental biologists are correct in pointing out that evolutionary theory has not yet specifically connected genotype to phenotype in a molecular/cell biological sense. However, extraordinary claims call for extraordinary evidence, and none of these treatises provide any formal example of the fundamental inability of evolutionary theory to explain patterns of morphological diversity. Those who argue that microevolutionary theory has made no contributions to our understanding of the evolution of form may wish to consult the substantial body of quantitative genetics literature on multivariate evolution. Such work is by no means fully satisfactory, as it is couched in terms of statistics (variances and covariances) rather than the molecular features of individual genes, but a more precise evolutionary framework for linking genes and morphology will not be possible until a critical mass of generalities on the matter has emerged at the molecular, cellular and developmental levels.

For the vast majority of biologists, evolution is nothing more than natural selection. This view reduces the study of evolution to the simple documentation of differences between species, proclamation of a belief in Darwin, and concoction of a superficially reasonable tale of adaptive divergence (...). A common stance in cellular and developmental biology is that the elucidation of differences in molecular genetic pathways between two species (usually very distant species) completes the evolutionary story. No need to dig any deeper; because natural selection surely produced the end products, the population genetic details do not matter. In individual cases, this type of informal thinking may do little harm, but in the long run it undermines the very scientific basis of evolutionary biology.

There are two fundamental issues here. First, the notion that interspecific differences at the molecular level reveal the mechanism of evolution ignores the fundamental distinction between the outcome of evolution and the events that lead to such changes. For example, although most animal developmental biologists argue that it was shocking to discover that the development of all animals is based on modifications of the same sets of ancient genes, many evolutionary biologists regard this view with some surprise. It is, of course, easy to criticize based on 20/20 hindsight, but we have known for decades that all eukaryotes share most of the same genes for transcription, translation, replication, nutrient uptake, core metabolism, cytoskeletal structure, and so forth. Why would we expect anything different for development? Although knowing that HOX genes play a central role in the development of all animals provides insights into the genetic scaffold from which body plans are built, it does not advance our knowledge of the evolutionary process much beyond noting that all vertebrates share a heritage of calcified skeletons. It need not even tell us that such genes were involved in the initial stages of differentiation (Alonso and Wilkins 2005). A vast chasm of stepwise (and partially overlapping) changes may separate today's products of evolution, and understanding those steps is what distinguishes evolutionary biology from comparative biology.


Michael Lynch on Adaptationism

I've been studying Michael Lynch's book The Origins of Genome Architecture. It's a marvelous book; I wish everyone interested in evolution could read it and understand it.

The last chapter is very interesting. Lynch talks about the importance of understanding modern population genetics.
... I will comment on the current state of affairs in evolutionary biology, particularly the perception of softness in the field that has been encouraged by the propagation of evolutionary ideas by those with few intentions of being confined by the constraints of prior knowledge.
He also talks about adaptationism/panselectionism and about evolutionary-developmental biology. I'll get to evo-devo in another post [Michael Lynch on Evo-Devo] but here are some choice words about adaptationism.
Despite the tremendous theoretical and physical resources now available, the field of evolutionary biology continues to be widely perceived as a soft science. Here I am referring not to the problems associated with those pushing the view that life was created by an intelligent designer, but to a more significant internal issue: a subset of academics who consider themselves strong advocates of evolution but who see no compelling reason to probe the substantial knowledge base of the field. Although this is a heavy charge, it is easy to document. For example, in his 2001 presidential address to the Society for the Study of Evolution, Nick Barton presented a survey that demonstrated that about half of the recent literature devoted to evolutionary issues is far removed from mainstream evolutionary biology.

With the possible exception of behavior, evolutionary biology is treated unlike any other science. Philosophers, sociologists, and ethicists expound on the central role of evolutionary theory in understanding our place in the world. Physicists excited about biocomplexity and computer scientists enamored with genetic algorithms promise a bold new understanding of evolution, and similar claims are made in the emerging field of evolutionary psychology (and its derivatives in political science, economics, and even the humanities). Numerous popularizers of evolution, some with careers focused on defending the teaching of evolution in public schools, are entirely satisfied that a blind adherence to the Darwinian concept of natural selection is a license for such activities. A commonality among all these groups is the near-absence of an appreciation of the most fundamental principles of evolution. Unfortunately, this list extends deep within the life sciences.

....

... the uncritical acceptance of natural selection as an explanatory force for all aspects of biodiversity (without any direct evidence) is not much different than invoking an intelligent designer (without any direct evidence). True, we have actually seen natural selection in action in a number of well-documented cases of phenotypic evolution (Endler 1986; Kingsolver et al. 2001), but it is a leap to assume that selection accounts for all evolutionary change, particularly at the molecular and cellular levels. The blind worship of natural selection is not evolutionary biology. It is arguably not even science. Natural selection is just one of several evolutionary mechanisms, and the failure to realize this is probably the most significant impediment to a fruitful integration of evolutionary theory with molecular, cellular, and developmental biology.

It should be emphasized here that the sins of panselectionism are by no means restricted to developmental biology, but simply follow the tradition embraced by many areas of evolutionary biology itself, including paleontology and evolutionary ecology (as cogently articulated by Gould and Lewontin in 1979). The vast majority of evolutionary biologists studying morphological, physiological, and/or behavioral traits almost always interpret the results in terms of adaptive mechanisms, and they are so convinced of the validity of this approach that virtually no attention is given to the null hypothesis of neutral evolution, despite the availability of methods to do so (Lande 1976; Lynch and Hill 1986; Lynch 1994). For example, in a substantial series of books addressed to the general public, Dawkins (e.g., 1976, 1986, 1996, 2004) has deftly explained a bewildering array of observations in terms of hypothetical selection scenarios. Dawkins's effort to spread the gospel of the awesome power of natural selection has been quite successful, but it has come at the expense of reference to any other mechanisms, and because more people have probably read Dawkins than Darwin, his words have in some ways been profoundly misleading. To his credit, Gould, who is also widely read by the general public, frequently railed against adaptive storytelling, but it can be difficult to understand what alternative mechanisms of evolution Gould had in mind.
I agree with everything Lynch writes except that I have a pretty good idea what alternative mechanisms Gould proposed.


Must a Gene Have a Function?

Biology is such a messy subject.1 It's impossible to come up with simple definitions of fundamental concepts in biology because there are exceptions to everything. In the case of "gene," there are so many exceptions that it seems hopeless to propose a general definition of such an important term. Nevertheless, we need some basic ground rules to prevent the situation from getting out-of-hand.

In an earlier posting from 2007 [What Is a Gene?], I suggested the following ...
This essay describes various modern definitions of physical genes (Gene-D). I like to define a gene as “a DNA sequence that’s transcribed” but that’s a bit too brief for a formal definition. We need to include something that restricts the definition of gene to those entities that are biologically significant. Hence,

A gene is a DNA sequence that is transcribed to produce a functional product.

This eliminates those parts of the chromosome that are transcribed by accident or error. These regions are significant in large genomes; in fact, the confusion between accidental transcripts and real transcripts is responsible for the overestimates of gene number in many genome projects. (In technical parlance, most ESTs are artifacts and the sequences they come from are not genes.)
Let's not quibble about all of the exceptions. Most of them are covered in my original article and in the comments there. I want to concentrate here on the idea that a gene has to have a "function" of some sort. As I explained in the comments ....
I don't know if I can come up with a catchy definition of "function." What I mean is that the transcript or its product has to do some biochemical duty in order to qualify. It doesn't have to be an essential function but it has to make a difference of some sort.
This is important because there's a growing tendency to label all kinds of things as "genes" just because they produce small RNA molecules or, in some cases, a small protein. In most cases the products have no known biological function.

Here are a couple of examples.

De Novo Protein-Encoding Genes

It's plainly obvious that new genes must arise from time to time in various lineages. Lots of people are interested in the evolution of humans and in particular the changes that distinguish us from our closest cousins. Almost all of the changes can be explained by alterations in the timing or location of orthologous gene expression but that doesn't exclude the possibility that entirely new genes might arise de novo in some lineages.

Let's just think about genes that encode proteins. There are three steps required for the de novo creation of a new protein-encoding gene. (1) A part of the ancestral genome must be transcribed. (2) The transcript must contain an open reading frame with a start and stop codon. (3) The new protein must have a function.

That last step needs explaining. If the new protein doesn't have a function then the putative new gene is no different than a pseudogene or a mutant gene that produces a truncated protein because of a premature stop codon. It's also indistinguishable from bits of the genome that are accidentally transcribed and just happen to have an open reading frame.

Wu et al. (2011) looked at the evolution of new genes in the lineage leading to humans. The title of their paper is: "De Novo Origin of Human Protein-Coding Genes." I want to challenge their definition of "gene" by suggesting that what they've really discovered are "potential" or "candidate" genes that don't deserve to be called "genes" until one discovers a biological function for their products.

The authors searched the human genome (build 56) for annotated "genes" with small open reading frames greater than 100 codons long. Then they examined the corresponding loci in the chimpanzee and orangutan genomes looking for cases where there was no open reading frame in the other apes. Various expressed RNA databases and two expressed peptide databases were screened to see if the candidate genes were expressed as RNA and protein. They found 27 examples. These are the candidates for de novo genes in humans.
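The open reading frame criterion can be illustrated with a few lines of code. This is only a minimal sketch of a forward-strand ORF scan, not the pipeline used by Wu et al. The function name `find_orfs` and the parameter `longer_than` are made up for illustration, and the scan ignores the reverse strand, introns, and alternative start codons.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(seq, longer_than=100):
    """Scan the three forward reading frames of a DNA string.

    Returns a list of (start, end, n_codons) tuples, one for each
    ATG...stop stretch with more than `longer_than` codons.
    n_codons counts the start codon but not the stop codon;
    `end` is the index just past the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None  # index of the first ATG since the last stop codon
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                n_codons = (i - start) // 3
                if n_codons > longer_than:
                    orfs.append((start, i + 3, n_codons))
                start = None
    return orfs

# Toy example: a start codon, three alanine codons, and a stop codon.
toy = "ATG" + "GCT" * 3 + "TAA"
print(find_orfs(toy, longer_than=3))
```

Note what the scan does not tell you. Any sufficiently long stretch of transcribed DNA can contain an open reading frame by chance, so passing this test says nothing about whether the product has a function.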

Their collection did not contain some of the de novo "genes" reported by others. As it turns out, those "genes" were annotated in previous versions of the human genome (builds 40-55) but were dropped from the latest versions because there were no homologues in the other ape genomes. By using those older builds, Wu et al. discovered another 33 candidates for a total of 60 putative new protein-encoding genes in the human genome.

Wu et al. concede that the expression levels of these candidate genes are "very low" but unfortunately they don't give us any specific levels. This is important because there's plenty of evidence that the expressed RNA databases contain spurious transcripts [How to Evaluate Genome Level Transcription Papers].

I wonder how many spurious peptides are in the peptide databases? Wu et al. report that one of the peptides used to identify an earlier example of a de novo gene (Knowles and McLysaght, 2009) has been removed from the current build of PeptideAtlas. What happened to it?

The authors are aware of the fact that function is important, especially if they want to argue that these new genes conferred some selective advantage on our hominid ancestors. The only "evidence" they offer is that the putative genes are expressed at a low level in testis and brains but at an even lower level in other tissues. This is no evidence at all since we've known for fifty years that the complexity of RNA sequences in brain and testis is much higher than in other tissues. We still don't know whether that's due to elevated spurious transcription in those tissues or whether it is biologically significant.

Are these 60 candidates really new "protein-coding genes"? I don't think so. I don't think they can be called "genes" until it has been demonstrated that the products have a biological function. Guerzoni and McLysaght (2011) seem to agree because they write,
The observation by Wu et al. that some of the candidate de novo genes are expressed at their highest in brain tissues and testis is interesting, but by no means proves they are functional. A major challenge remains to demonstrate functionality of the de novo genes.

Genes that Encode Functional RNAs

The people who annotate the human genome are somewhat skeptical of these new genes and that's why so many putative genes have disappeared from the more recent builds. (But the Ensembl group still lists 434 "novel protein-coding genes.")

However, they don't seem to be as skeptical when it comes to genes that produce small RNAs. The most recent Ensembl build (GRCh37.p5, Feb 2009), for example, lists 12,523 RNA genes [Ensembl: Human Genome].

What are the criteria they use to prove that these are really genes? It can't have anything to do with biological function since it's simply not true that the human genome contains more than twelve thousand genes that produce an RNA whose function has been demonstrated.

Should that be a requirement before declaring that a bit of transcribed DNA is a gene? You're damn right it should because otherwise every bit of DNA that's accidentally transcribed in some tissue at some time during development qualifies as a gene. That makes no sense [What is a gene, post-ENCODE?].


1. That's why it's much more difficult than physics where there's talk about unifying the entire discipline under a single theory of everything. :-)

Guerzoni, D. and McLysaght, A. (2011) De novo origins of human genes. PLoS Genet. 7(11):e1002381. [PLoS Genetics]

Knowles, D.G. and McLysaght, A. (2009) Recent de novo origin of human protein-coding genes. Genome Res. 19:1752-1759. [doi: 10.1101/gr.095026.109]

Wu, D.D., Irwin, D.M., and Zhang, Y.P. (2011) De novo origin of human protein-coding genes. PLoS Genet. 7(11):e1002379. [PLoS Genetics]

Monday, February 06, 2012

How Much of Our Genome Is Sequenced?

I'm getting ready for a class on the size and composition of the human genome so I thought I'd check to see the latest estimate of its size. Recall that in an earlier posting I concluded that the size of the human genome was 3,200,000,000 bp (3,200,000 kb, 3,200 Mb, 3.2 Gb) [How Big Is the Human Genome?].

You might think that all you have to do is check out the human genome websites and look up the exact size. That doesn't work because not all of the human genome has been sequenced and organized into a contiguous assembly of 24 different strands (one for each chromosome). So that prompts the question, how much of the human genome has actually been sequenced?1

The latest assembly is GRCh37 Patch Release 7 (GRCh37.p7), released on Feb. 3, 2012. If you look at the data for this assembly you will see an estimate of the "Total Sequenced Bases in the Assembly." The number is 3,173,036,847 bp or 3.17 Gb. This value is close to estimates of the genome size from the years before the first draft of the genome sequence was published.

I was suspicious of this number since we know that there are many gaps in the human genome sequence. The largest gaps cover highly repetitive parts of the genome—mostly around the centromeres and other heterochromatic regions. There were also gaps at the locations of several gene clusters (e.g. ribosomal RNA genes) where it's impossible to determine the exact number of copies. In the case of ribosomal RNA gene clusters, these gaps have now been closed.

Deanna Church posted a few comments on my earlier posting. She's with the Genome Reference Consortium (GRC). That's the group responsible for updating the human genome. Deanna explained that "Total Sequenced Bases in the Assembly" is not an accurate representation of the truth.2 What it actually means is total sequenced bases plus estimated sizes of the gaps. In other words, it's a good estimate of the size of the genome.

So, how much of the genome is actually sequenced and organized into "scaffolds," or contiguous stretches of DNA? You can see the actual numbers by clicking on Ungapped Lengths on the NCBI website.

The total number of sequenced base pairs that have been organized into scaffolds and placed on a particular chromosome is 2,861,332,606 bp. An additional 6,110,758 bp have been sequenced but the blocks of sequence cannot be placed in the assembly. Most of this unassigned sequence is on chromosomes 1, 4, 9, and 17 but some of it can't even be associated with a particular chromosome.

If we assume that the true haploid genome size is 3.2 Gb, or 3,200 Mb, then the sequenced and assigned part of the genome represents 89.4% and the unassigned sequenced part is 0.2%, for a total of 89.6%.
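Here's a quick way to check the arithmetic. It's just a sketch that uses the two ungapped lengths quoted above and the assumed genome size of 3.2 Gb. The variable and function names are mine, not anything from NCBI.

```python
GENOME_SIZE_BP = 3_200_000_000  # assumed true haploid genome size (3.2 Gb)
PLACED_BP = 2_861_332_606       # sequenced and placed on a chromosome
UNPLACED_BP = 6_110_758         # sequenced but not placed in the assembly

def percent_of_genome(bp, genome_size=GENOME_SIZE_BP):
    """Return bp as a percentage of the assumed genome size."""
    return 100 * bp / genome_size

placed_pct = percent_of_genome(PLACED_BP)
unplaced_pct = percent_of_genome(UNPLACED_BP)
total_pct = percent_of_genome(PLACED_BP + UNPLACED_BP)

print(f"placed:   {placed_pct:.1f}%")
print(f"unplaced: {unplaced_pct:.1f}%")
print(f"total:    {total_pct:.1f}%")
```

All sequenced bases taken together come to about 89.6% of a 3.2 Gb genome. The result depends entirely on the assumed genome size. If you change `GENOME_SIZE_BP` the percentages shift accordingly.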

We can say that only 90% of the human genome has been sequenced and the remaining 10% falls into 357 gaps scattered throughout the genome. (Every chromosome has unsequenced gaps but some have more than others and it doesn't depend on the size of the chromosome.)

The Wellcome Trust Sanger Institute is part of the Genome Reference Consortium but it maintains its own website on the human genome [Whole Genome]. The data on the e!Ensembl page refers to build GRCh37.p5 from Feb. 2009 but it also says the data was updated in Dec. 2011.

According to the Sanger Institute, the size of the sequenced genome is 3,283,984,159 bp and the "golden path length" is 3,101,804,739 bp. I've tried to find out what these numbers mean but if the information is present on the Ensembl website then it's very well hidden.

Are you interested in the number of genes? Here's the data from Ensembl. It indicates that the human genome contains 33,399 genes! [What Is a Gene?] [What is a gene, post-ENCODE?] This value includes 12,523 genes that make an RNA product that's not translated, so it's almost certainly highly inflated.

The data indicates that there are 181,744 gene transcripts or between 5 and 9 transcripts per gene depending on how you count the genes. I don't believe there are this many biologically functional transcripts per gene. I think the actual number is much closer to one (1) [Genes and Straw Men].
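One plausible way to get a range of 5 to 9 is to divide the transcript count by the gene count with and without the RNA genes. That reading of "how you count the genes" is an assumption on my part. The sketch below just does the division using the Ensembl numbers quoted above.

```python
TRANSCRIPTS = 181_744  # gene transcripts listed by Ensembl
ALL_GENES = 33_399     # all genes listed by Ensembl
RNA_GENES = 12_523     # genes whose RNA product is not translated

# Assumption: the lower gene count simply leaves out the RNA genes.
genes_without_rna = ALL_GENES - RNA_GENES

per_gene_all = TRANSCRIPTS / ALL_GENES
per_gene_without_rna = TRANSCRIPTS / genes_without_rna

print(f"counting all genes:      {per_gene_all:.1f} transcripts per gene")
print(f"leaving out RNA genes:   {per_gene_without_rna:.1f} transcripts per gene")
```

Counting all genes gives a bit more than five transcripts per gene and leaving out the RNA genes gives almost nine.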


1. It certainly doesn't "beg the question." That means something else entirely [Begging the Question].

2. That's a euphemism for "It's a lie!"