We've been discussing the adaptationist approach to biology on another thread and this is a good example to illustrate the issues. If I were to ask you how the zebra got its stripes, what would you think?
Would you immediately assume that it could be an evolutionary accident with no adaptive significance then start to wonder if you could rule out such an explanation? Can random genetic drift of neutral alleles explain the zebra's stripes?
Or would you immediately start thinking of adaptive explanations for why all three extant species of zebras have stripes but no other large mammals in the same environment are striped? Most other horses don't have prominent stripes but many have faint stripes on some parts of their bodies (Darwin, 1859).
I argue that you have to rule out the null hypothesis (drift) before invoking adaptationist explanations. In other words, the first question you need to ask is whether zebra stripes are adaptive. But that's not the adaptationist approach. Adaptationists begin with the assumption that stripes are adaptive, then they start looking for adaptive explanations.
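To make the null hypothesis concrete, here is a minimal sketch of a Wright-Fisher style simulation in Python. It is not taken from any of the sources cited here. It assumes a single neutral allele in a small population of fixed size with no mutation, and the function name and parameter values are illustrative only. Under those assumptions a neutral allele should reach fixation in a fraction of replicates close to its starting frequency, without any selection at all.

```python
import random


def fixation_fraction(copies=40, start_count=4, replicates=1000, seed=1):
    """Fraction of replicate populations in which a neutral allele becomes fixed.

    copies      -- number of gene copies in the population (hypothetical value)
    start_count -- how many of those copies carry the allele at the start
    """
    rng = random.Random(seed)
    fixed = 0
    for _ in range(replicates):
        count = start_count
        # Resample each generation until the allele is lost (0) or fixed (copies).
        while 0 < count < copies:
            freq = count / copies
            # Every copy in the next generation draws its parent at random,
            # so the new count is a binomial sample with probability freq.
            count = sum(1 for _ in range(copies) if rng.random() < freq)
        if count == copies:
            fixed += 1
    return fixed / replicates


if __name__ == "__main__":
    # Starting frequency is 4/40 = 0.1, so the result should be near 0.1.
    print(fixation_fraction())
```

The point of the sketch is only that fixation without selection is an expected outcome, which is why drift has to be ruled out before an adaptive story is accepted.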
What if the favorite adaptive explanation is refuted? What does an adaptationist do next? Gould and Lewontin (1979) provide the answer ...
Friday, February 10, 2012
Thursday, February 09, 2012
Remember Chris Mooney?
Chris Mooney has achieved some remarkable goals in the past ten years or so. He's an atheist accommodationist who doesn't understand atheism or accommodation. He's a science journalist who doesn't understand science or the important elements of science journalism (it's all about
Nisbet & Mooney Reveal Their True Colors
Matthew Nisbet and Chris Mooney Video on Framing Science
Changing Minds Through Science Communication
For Once, Chris Mooney Talks Sense
The Future of Science Journalism
Science Journalism in Decline
Some scientists are astrologers, therefore science and astrology are compatible
Chris Mooney Changed His Mind
The Doctrine of Joint Belief
Chris Mooney and Sheril Kirshenbaum in Newsweek
Boring ....
The Difference between Truth and Framing
Chris Mooney vs Atheists: Part XXXIV
Chris Mooney Asks a Hard Question
The Great Accommodationist Dud
Chris is about to publish a new book called The Republican Brain: The Science of Why They Deny Science--and Reality. Like most professional writers he knows that he has to hype the book in order to get people to buy it so he's started the campaign with an article on HuffPost under their new "Science" category [Want to Understand Republicans? First Understand Evolution].
The main theme is that Republicans are genetically different from Democrats and that difference is due to evolution. No, I'm not kidding ....
Jerry Coyne says "huh?" [Chris Mooney, evolution, and politics]. Get on over to Coyne's
1. What would we do for fun if we didn't have Chris Mooney? We'd have to pick on the creationists all the time and that gets boring.
Was Newton the Greatest Scientist Who Ever Lived?
Most of us know that Charles Darwin was the greatest scientist who ever lived but one still finds the occasional misguided physicist/mathematician who thinks that the honor should go to an eighteenth century Englishman named Isaac Newton (1642-1727) [Top Five Dead Scientists] [Westminster Abbey: Darwin vs Newton] [Books by Charles Darwin] [Why I'm Not a Darwinist].
Now we have more direct evidence.1 The Israel National Library has just put a pile of Newton's writings online [Israel National Library uploads trove of Newton's theological tracts]. We get to see a direct example of how Newton thinks like a scientist.
My favorite is Newton's prediction about when the apocalypse will take place. He starts his calculation with the crowning of Charlemagne as Holy Roman Emperor in 800 AD and goes downhill from there [Newton on the date 2060 (early 18th century)].
In the instance displayed on this manuscript folio, Newton calculates a tentative date using the 1260 days (taken to be years) from Daniel in part to counter the claims of some of his contemporaries, who claimed that the end would come in the seventeenth or eighteenth century. Newton stood apart from contemporary interpreters who were predicting the imminent restoration of the Jews, the fall of the Catholic Church and the Second Coming of Christ. Nevertheless, Newton’s own fervent belief in these prophetic events is not in doubt. The abbreviation “A.C.” stands for Anno Christi (“the year of Christ”).
So then the time times and half a time are 42 months or 1260 days or three years and an half, reckoning twelve months to a year and 30 days to a month as was done in the Calendar of the primitive year. And the days of short lived Beasts being put for the years of lived [sic] kingdoms, the period of 1260 days, if dated from the complete conquest of the three kings A.C. 800, will end A.C. 2060. It may end later, but I see no reason for its ending sooner. This I mention not to assert when the time of the end shall be, but to put a stop to the rash conjectures of fanciful men who are frequently predicting the time of the end, and by doing so bring the sacred prophesies into discredit as often as their predictions fail. Christ comes as a thief in the night, and it is not for us to know the times and seasons which God hath put into his own breast.
There's lots more where this came from but I don't want to embarrass the Newton supporters any further.
By way of contrast, the real greatest scientist who ever lived was a non-believer who never would have treated the Bible as a scientific authority.
1. The information isn't new. It's just that we can now see for ourselves that Isaac Newton was remarkably unscientific in most of his writings.
P.S. Some losers are going to argue that Newton was still the greatest scientist and we should ignore the fact that his religious beliefs made him write many stupid anti-science treatises. That's like saying that Young Earth Creationists (like Newton) can be good scientists even though they believe the Earth is only 6000 years old.
Wednesday, February 08, 2012
The Mysterious Epigenome
Tom Woodward is the founder of the C.S. Lewis Society and the apologetics.org website. He has written a book (with James Gills) called The Mysterious Epigenome: What Lies Beyond DNA. You can tell from the title that this is another "evolution revolution" book capitalizing on the re-invention of a new word that means everything—and nothing.
Woodward kindly posted an article on apologetics.org that helps us decide whether this is a book worth reading.
The Avalanche
God’s love—a vast oceanic expanse? An aggressive love, “lavished” on mankind? Coming at us like an avalanche?
These ideas came to Dr. James Gills and me as we were working on our book The Mysterious Epigenome: What Lies Beyond DNA. As we tweaked the final manuscript, we were haunted over and over by a powerful, pivotal thought. As scientists continue pulling back curtain after curtain that had previously shrouded the chemical master-codes that control our DNA system (that is, the multiple integrated layers of our epigenetic “computer codes”), they were also revealing something in the realm of spirit. They had opened up a new kind of vista on the greatness of the Creator’s overwhelming intelligence—his boundless genius--which is placed on display in this bizarre biochemical landscape. The more we thought about and discussed the latest discoveries of the genome and epigenome, the more we were confronted with this sense of the cosmic architect’s “unlimited, off-the-chart wisdom” in creating and sustaining the micro-cosmos of life. At the same time, seeing the Creator’s engineering intelligence in this new light also made us ponder the striking parallel with other “overwhelming/incalculable/infinite” qualities that the Bible attributes to the Creator, such as his power, knowledge and love.
That's pretty much all you need to know but if you are
I was going to say that creationists like Woodward give the epigenome a bad name but then I realized that it isn't true. Epigenomics and epigenetics had bad names long before the creationists got wind of them.
UPDATE: Several readers noted that the DNA on the cover of the book is a left-handed helix. This doesn't inspire confidence, does it?
In case you thought that Disney World had the only fantasyland in Florida, check this out.
Michael Lynch on Evo-Devo
Michael Lynch had some cogent (and provocative, and true) words on adaptationism in his book The Origins of Genome Architecture [Michael Lynch on Adaptationism].
Here's what he has to say about evo-devo.
Consider the steady stream of recent books by authors striving to define a new field called evolutionary developmental biology (e.g., Arthur 1997; Gerhart and Kirschner 1997; Davidson 2001, 2006; Carroll et al. 2001; West-Eberhard 2003; Carroll 2005a; Kirschner and Gerhart 2005). The plots of all these books are similar: first, it is claimed that observations from developmental biology demonstrate major inadequacies in current evolutionary theory, and then a new view of evolution that eliminates many of the central shortcomings of the field is promised. Developmental biologists are correct in pointing out that evolutionary theory has not yet specifically connected genotype to phenotype in a molecular/cell biological sense. However, extraordinary claims call for extraordinary evidence, and none of these treatises provide any formal example of the fundamental inability of evolutionary theory to explain patterns of morphological diversity. Those who argue that microevolutionary theory has made no contributions to our understanding of the evolution of form may wish to consult the substantial body of quantitative genetics literature on multivariate evolution. Such work is by no means fully satisfactory, as it is couched in terms of statistics (variances and covariances) rather than the molecular features of individual genes, but a more precise evolutionary framework for linking genes and morphology will not be possible until a critical mass of generalities on the matter has emerged at the molecular, cellular and developmental levels.
For the vast majority of biologists, evolution is nothing more than natural selection. This view reduces the study of evolution to the simple documentation of differences between species, proclamation of a belief in Darwin, and concoction of a superficially reasonable tale of adaptive divergence (...). A common stance in cellular and developmental biology is that the elucidation of differences in molecular genetic pathways between two species (usually very distant species) completes the evolutionary story. No need to dig any deeper—because natural selection surely produced the end products, the population genetic details do not matter. In individual cases, this type of informal thinking may do little harm, but in the long run it undermines the very scientific basis of evolutionary biology.
There are two fundamental issues here. First, the notion that interspecific differences at the molecular level reveal the mechanism of evolution ignores the fundamental distinction between the outcome of evolution and the events that lead to such changes. For example, although most animal developmental biologists argue that it was shocking to discover that the development of all animals is based on modifications of the same sets of ancient genes, many evolutionary biologists regard this view with some surprise. It is, of course, easy to criticize based on 20/20 hindsight, but we have known for decades that all eukaryotes share most of the same genes for transcription, translation, replication, nutrient uptake, core metabolism, cytoskeletal structure, and so forth. Why would we expect anything different for development? Although knowing that HOX genes play a central role in the development of all animals provides insights into the genetic scaffold from which body plans are built, it does not advance our knowledge of the evolutionary process much beyond noting that all vertebrates share a heritage of calcified skeletons. It need not even tell us that such genes were involved in the initial stages of differentiation (Alonso and Wilkins 2005). A vast chasm of stepwise (and partially overlapping) changes may separate today's products of evolution, and understanding those steps is what distinguishes evolutionary biology from comparative biology.
Michael Lynch on Adaptationism
I've been studying Michael Lynch's book The Origins of Genome Architecture. It's a marvelous book; I wish everyone interested in evolution could read it and understand it.
The last chapter is very interesting. Lynch talks about the importance of understanding modern population genetics.
... I will comment on the current state of affairs in evolutionary biology, particularly the perception of softness in the field that has been encouraged by the propagation of evolutionary ideas by those with few intentions of being confined by the constraints of prior knowledge.
He also talks about adaptationism/panselectionism and about evolutionary-developmental biology. I'll get to evo-devo in another post [Michael Lynch on Evo-Devo] but here are some choice words about adaptationism.
Despite the tremendous theoretical and physical resources now available, the field of evolutionary biology continues to be widely perceived as a soft science. Here I am referring not to the problems associated with those pushing the view that life was created by an intelligent designer, but to a more significant internal issue: a subset of academics who consider themselves strong advocates of evolution but who see no compelling reason to probe the substantial knowledge base of the field. Although this is a heavy charge, it is easy to document. For example, in his 2001 presidential address to the Society for the Study of Evolution, Nick Barton presented a survey that demonstrated that about half of the recent literature devoted to evolutionary issues is far removed from mainstream evolutionary biology.
With the possible exception of behavior, evolutionary biology is treated unlike any other science. Philosophers, sociologists, and ethicists expound on the central role of evolutionary theory in understanding our place in the world. Physicists excited about biocomplexity and computer scientists enamored with genetic algorithms promise a bold new understanding of evolution, and similar claims are made in the emerging field of evolutionary psychology (and its derivatives in political science, economics, and even the humanities). Numerous popularizers of evolution, some with careers focused on defending the teaching of evolution in public schools, are entirely satisfied that a blind adherence to the Darwinian concept of natural selection is a license for such activities. A commonality among all these groups is the near-absence of an appreciation of the most fundamental principles of evolution. Unfortunately, this list extends deep within the life sciences.
....
... the uncritical acceptance of natural selection as an explanatory force for all aspects of biodiversity (without any direct evidence) is not much different than invoking an intelligent designer (without any direct evidence). True, we have actually seen natural selection in action in a number of well-documented cases of phenotypic evolution (Endler 1986; Kingsolver et al. 2001), but it is a leap to assume that selection accounts for all evolutionary change, particularly at the molecular and cellular levels. The blind worship of natural selection is not evolutionary biology. It is arguably not even science. Natural selection is just one of several evolutionary mechanisms, and the failure to realize this is probably the most significant impediment to a fruitful integration of evolutionary theory with molecular, cellular, and developmental biology.
It should be emphasized here that the sins of panselectionism are by no means restricted to developmental biology, but simply follow the tradition embraced by many areas of evolutionary biology itself, including paleontology and evolutionary ecology (as cogently articulated by Gould and Lewontin in 1979). The vast majority of evolutionary biologists studying morphological, physiological, and/or behavioral traits almost always interpret the results in terms of adaptive mechanisms, and they are so convinced of the validity of this approach that virtually no attention is given to the null hypothesis of neutral evolution, despite the availability of methods to do so (Lande 1976; Lynch and Hill 1986; Lynch 1994). For example, in a substantial series of books addressed to the general public, Dawkins (e.g., 1976, 1986, 1996, 2004) has deftly explained a bewildering array of observations in terms of hypothetical selection scenarios. Dawkins's effort to spread the gospel of the awesome power of natural selection has been quite successful, but it has come at the expense of reference to any other mechanisms, and because more people have probably read Dawkins than Darwin, his words have in some ways been profoundly misleading. To his credit, Gould, who is also widely read by the general public, frequently railed against adaptive storytelling, but it can be difficult to understand what alternative mechanisms of evolution Gould had in mind.
I agree with everything Lynch writes except that I have a pretty good idea what alternative mechanisms Gould proposed.
Must a Gene Have a Function?
Biology is such a messy subject.1 It's impossible to come up with simple definitions of fundamental concepts in biology because there are exceptions to everything. In the case of "gene," there are so many exceptions that it seems hopeless to propose a general definition of such an important term. Nevertheless, we need some basic ground rules to prevent the situation from getting out-of-hand.
In an earlier posting from 2007 [What Is a Gene?], I suggested the following ...
This essay describes various modern definitions of physical genes (Gene-D). I like to define a gene as “a DNA sequence that’s transcribed” but that’s a bit too brief for a formal definition. We need to include something that restricts the definition of gene to those entities that are biologically significant. Hence,
A gene is a DNA sequence that is transcribed to produce a functional product.
This eliminates those parts of the chromosome that are transcribed by accident or error. These regions are significant in large genomes; in fact, the confusion between accidental transcripts and real transcripts is responsible for the overestimates of gene number in many genome projects. (In technical parlance, most ESTs are artifacts and the sequences they come from are not genes.)
Let's not quibble about all of the exceptions. Most of them are covered in my original article and in the comments there. I want to concentrate here on the idea that a gene has to have a "function" of some sort. As I explained in the comments ....
I don't know if I can come up with a catchy definition of "function." What I mean is that the transcript or its product has to do some biochemical duty in order to qualify. It doesn't have to be an essential function but it has to make a difference of some sort.
This is important because there's a growing tendency to label all kinds of things as "genes" just because they produce small RNA molecules or, in some cases, a small protein. In most cases the products have no known biological function.
Here's a couple of examples.
De Novo Protein-Encoding Genes
It's plainly obvious that new genes must arise from time to time in various lineages. Lots of people are interested in the evolution of humans and in particular the changes that distinguish us from our closest cousins. Almost all of the changes can be explained by alterations in the timing or location of orthologous gene expression but that doesn't exclude the possibility that entirely new genes might arise de novo in some lineages.
Let's just think about genes that encode proteins. There are three steps required for the de novo creation of a new protein-encoding gene. (1) A part of the ancestral genome must be transcribed. (2) The transcript must contain an open reading frame with a start and stop codon. (3) The new protein must have a function.
That last step needs explaining. If the new protein doesn't have a function then the putative new gene is no different than a pseudogene or a mutant gene that produces a truncated protein because of a premature stop codon. It's also indistinguishable from bits of the genome that are accidentally transcribed and just happen to have an open reading frame.
Wu et al. (2011) looked at the evolution of new genes in the lineage leading to humans. The title of their paper is: "De Novo Origin of Human Protein-Coding Genes." I want to challenge their definition of "gene" by suggesting that what they've really discovered are "potential" or "candidate" genes that don't deserve to be called "genes" until one discovers a biological function for their products.
The authors searched the human genome (build 56) for annotated "genes" with small open reading frames greater than 100 codons long. Then they examined the corresponding loci in the chimpanzee and orangutan genomes looking for cases where there was no open reading frame in the other apes. Various expressed RNA databases and two expressed peptide databases were screened to see if the candidate genes were expressed as RNA and protein. They found 27 examples. These are the candidates for de novo genes in humans.
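To show what "an open reading frame greater than 100 codons long" means in practice, here is a minimal Python sketch. It is not Wu et al.'s pipeline, which started from annotated genes and compared ape genomes. It only scans the forward strand of a DNA string for ATG-to-stop stretches in each of the three reading frames. The function name, the threshold parameter, and the decision to count the start codon but not the stop codon are all assumptions made for illustration.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=100):
    """Return (start, end, codon_count) for forward-strand ORFs.

    An ORF here runs from an ATG to the first in-frame stop codon.
    codon_count includes the ATG but not the stop codon, and only ORFs
    with more than min_codons codons are reported.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first start codon since the last stop
            elif start is not None and codon in STOP_CODONS:
                codon_count = (i - start) // 3
                if codon_count > min_codons:
                    orfs.append((start, i + 3, codon_count))
                start = None  # look for the next ATG in this frame
    return orfs


if __name__ == "__main__":
    toy = "ATG" + "GCT" * 100 + "TAA"  # made-up sequence, 101 codons plus a stop
    print(find_orfs(toy))
```

Note that passing a test like this covers only step (2) of the three steps listed above. It says nothing about whether the sequence is transcribed or whether the product has a function.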
Their collection did not contain some of the de novo "genes" reported by others. As it turns out, those "genes" were annotated in previous versions of the human genome (builds 40-55) but were dropped from the latest versions because there were no homologues in the other ape genomes. By using those older builds, Wu et al. discovered another 33 candidates for a total of 60 putative new protein-encoding genes in the human genome.
Wu et al. concede that the expression levels of these candidate genes are "very low" but unfortunately they don't give us any specific levels. This is important because there's plenty of evidence that the expressed RNA databases contain spurious transcripts [How to Evaluate Genome Level Transcription Papers].
I wonder how many spurious peptides are in the peptide databases? Wu et al. report that one of the peptides used to identify an earlier example of a de novo gene (Knowles and McLysaght, 2009) has been removed from the current build of PeptideAtlas. What happened to it?
The authors are aware of the fact that function is important, especially if they want to argue that these new genes conferred some selective advantage on our hominid ancestors. The only "evidence" they offer is that the putative genes are expressed at a low level in testis and brains but at an even lower level in other tissues. This is no evidence at all since we've known for fifty years that the complexity of RNA sequences in brain and testis is much higher than in other tissues. We still don't know whether that's due to elevated spurious transcription in those tissues or whether it is biologically significant.
Are these 60 candidates really new "protein-coding genes"? I don't think so. I don't think they can be called "genes" until it has been demonstrated that the products have a biological function. Guerzoni and McLysaght (2011) seem to agree because they write,
The observation by Wu et al. that some of the candidate de novo genes are expressed at their highest in brain tissues and testis is interesting, but by no means proves they are functional. A major challenge remains to demonstrate functionality of the de novo genes.
Genes that Encode Functional RNAs
The people who annotate the human genome are somewhat skeptical of these new genes and that's why so many putative genes have disappeared from the more recent builds. (But the Ensembl group still lists 434 "novel protein-coding genes.")
However, they don't seem to be as skeptical when it comes to genes that produce small RNAs. The most recent Ensembl build (GRCh37.p5, Feb 2009), for example, lists 12,523 RNA genes [Ensembl: Human Genome].
What are the criteria they use to prove that these are really genes? It can't have anything to do with biological function since it's simply not true that the human genome contains more than twelve thousand genes that produce an RNA whose function has been demonstrated.
Should that be a requirement before declaring that a bit of transcribed DNA is a gene? You're damn right it should because otherwise every bit of DNA that's accidentally transcribed in some tissue at some time during development qualifies as a gene. That makes no sense [What is a gene, post-ENCODE?].
1. That's why it's much more difficult than physics where there's talk about unifying the entire discipline under a single theory of everything. :-)
Guerzoni, D. and McLysaght, A. (2011) De novo origins of human genes. PLoS Genet. 7(11):e1002381. [PLoS Genetics]
Knowles, D.G. and McLysaght, A. (2009) Recent de novo origin of human protein-coding genes. Genome Res. 19:1752-1759. [doi: 10.1101/gr.095026.109]
Wu, D.D., Irwin, D.M., and Zhang, Y.P. (2011) De novo origin of human protein-coding genes. PLoS Genet. 7(11):e1002379. [PLoS Genetics]
Monday, February 06, 2012
How Much of Our Genome Is Sequenced?
I'm getting ready for a class on the size and composition of the human genome so I thought I'd check to see the latest estimate of its size. Recall that in an earlier posting I concluded that the size of the human genome was 3,200,000,000 bp (3,200,000 kb, 3,200 Mb, 3.2 Gb) [How Big Is the Human Genome?].
You might think that all you have to do is check out the human genome websites and look up the exact size. That doesn't work because not all of the human genome has been sequenced and organized into a contiguous assembly of 24 different strands (one for each chromosome). So that prompts the question, how much of the human genome has actually been sequenced?1
The latest assembly is GRCh37 Patch Release 7 (GRCh37.p7), released on Feb. 3, 2012. If you look at the data for this assembly you will see an estimate of the "Total Sequenced Bases in the Assembly." The number is 3,173,036,847 bp or 3.17 Gb. This value is close to estimates of the genome size from the years before the first draft of the genome sequence was published.
I was suspicious of this number since we know that there are many gaps in the human genome sequence. The largest gaps cover highly repetitive parts of the genome—mostly around the centromeres and other heterochromatic regions. There were also gaps at the locations of several gene clusters (e.g. ribosomal RNA genes) where it's impossible to determine the exact number of copies. In the case of ribosomal RNA gene clusters, these gaps have now been closed.
Deanna Church posted a few comments on my earlier posting. She's with the Genome Reference Consortium (GRC). That's the group responsible for updating the human genome. Deanna explained that "Total Sequenced Bases in the Assembly" is not an accurate representation of the truth.2 What it actually means is total sequenced bases plus estimated sizes of the gaps. In other words, it's a good estimate of the size of the genome.
So, how much of the genome is actually sequenced and organized into "scaffolds," or contiguous stretches of DNA? You can see the actual numbers by clicking on Ungapped Lengths on the NCBI website.
The total number of sequenced base pairs that have been organized into scaffolds and placed on a particular chromosome is 2,861,332,606 bp. An additional 6,110,758 bp have been sequenced but the blocks of sequence cannot be placed in the assembly. Most of this unassigned sequence is on chromosomes 1, 4, 9, and 17 but some of it can't even be associated with a particular chromosome.
If we assume that the true haploid genome size is 3.2 Gb, or 3,200 Mb, then the sequenced and assigned part of the genome represents 89.4% and the unassigned sequenced part is 0.2%, for a total of 89.6%.
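Here's a minimal sketch of that arithmetic, using only the base-pair counts quoted above and the assumed 3.2 Gb genome size. The variable and function names are mine, chosen for illustration. With these inputs the placed sequence alone works out to about 89.4% of the genome, and placed plus unplaced sequence to about 89.6%.

```python
# Figures quoted in the post for GRCh37.p7.
# The 3.2 Gb haploid genome size is an assumption, not a measured value.
GENOME_SIZE_BP = 3_200_000_000
PLACED_BP = 2_861_332_606    # sequenced and placed on a chromosome
UNPLACED_BP = 6_110_758      # sequenced but not placed in the assembly


def percent_of_genome(bases: int, genome_size: int = GENOME_SIZE_BP) -> float:
    """Return `bases` as a percentage of the assumed genome size."""
    return 100 * bases / genome_size


placed_pct = percent_of_genome(PLACED_BP)
unplaced_pct = percent_of_genome(UNPLACED_BP)
total_pct = percent_of_genome(PLACED_BP + UNPLACED_BP)

print(f"placed:   {placed_pct:.1f}%")
print(f"unplaced: {unplaced_pct:.1f}%")
print(f"total:    {total_pct:.1f}%")
```

Changing GENOME_SIZE_BP shows how sensitive the "90% sequenced" claim is to the assumed total size.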
We can say that only 90% of the human genome has been sequenced and the remaining 10% falls into 357 gaps scattered throughout the genome. (Every chromosome has unsequenced gaps but some have more than others and it doesn't depend on the size of the chromosome.)
The Wellcome Trust Sanger Institute is part of the Genome Reference Consortium but it maintains its own website on the human genome [Whole Genome]. The data on the e!Ensembl page refers to build GRCh37.p5 from Feb. 2009 but it also says the data was updated in Dec. 2011.
According to the Sanger Institute, the size of the sequenced genome is 3,283,984,159 bp and the "golden path length" is 3,101,804,739 bp. I've tried to find out what these numbers mean but if the information is present on the Ensembl website then it's very well hidden.
Are you interested in the number of genes? Here's the data from Ensembl. It indicates that the human genome contains 33,399 genes! [What Is a Gene?] [What is a gene, post-ENCODE?] This value is calculated by including 12,523 genes that make an RNA product that's not translated, and that count is almost certainly highly inflated.
The data indicates that there are 181,744 gene transcripts or between 5 and 9 transcripts per gene depending on how you count the genes. I don't believe there are this many biologically functional transcripts per gene. I think the actual number is much closer to one (1) [Genes and Straw Men].
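The "between 5 and 9" range can be reproduced with a quick calculation. This sketch assumes that "how you count the genes" means either counting all 33,399 Ensembl genes or leaving out the 12,523 RNA genes. That reading is my assumption, and the variable names are illustrative.

```python
# Counts quoted from Ensembl in the post.
TOTAL_GENES = 33_399     # all annotated genes, including RNA genes
RNA_GENES = 12_523       # genes whose RNA product is not translated
TRANSCRIPTS = 181_744    # annotated gene transcripts

# Assumed interpretation: divide by all genes, or by the genes
# that remain once the RNA genes are excluded.
per_gene_all = TRANSCRIPTS / TOTAL_GENES
per_gene_without_rna = TRANSCRIPTS / (TOTAL_GENES - RNA_GENES)

print(f"transcripts per gene (all genes):        {per_gene_all:.1f}")
print(f"transcripts per gene (RNA genes removed): {per_gene_without_rna:.1f}")
```

Under that assumption the two ratios come out at roughly 5.4 and 8.7.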
1. It certainly doesn't "beg the question." That means something else entirely [Begging the Question].
2. That's a euphemism for "It's a lie!"
Monday's Molecule #158
This molecule is responsible for one of the distinguishing features of an entire group of species. Sadly, most undergraduates have never heard of this molecule and they never study the fundamental process that it represents. In my experience, about 90% of all introductory biochemistry courses skip the relevant chapter(s) in the textbooks. There's no reasonable excuse for that omission. It's just bad teaching.
Identify the molecule—the common name will do. Post your answer in the comments. I'll hold off releasing any comments for 24 hours. The first one with the correct answer wins. I will only post correct answers to avoid embarrassment.
There could be two winners. If the first correct answer isn't from an undergraduate student then I'll select a second winner from those undergraduates who post the correct answer. You will need to identify yourself as an undergraduate in order to win. (Put "undergraduate" at the bottom of your comment.)
Some past winners are from distant lands so their chances of taking up my offer of a free lunch are slim. (That's why I can afford to do this!)
In order to win you must post your correct name. Anonymous and pseudonymous commenters can't win the free lunch.
Winners will have to contact me by email to arrange a lunch date.
UPDATE: The molecule is phycocyanobilin, the light-absorbing pigment in cyanobacteria (and some other species). This blue pigment is found in large structures called phycobilisomes and it is the reason why cyanobacteria were called blue-green algae. The winners are Thomas Ferraro and Charles Motraghi (undergraduate).
Winners
Nov. 2009: Jason Oakley, Alex Ling
Oct. 17: Bill Chaney, Roger Fan
Oct. 24: DK
Oct. 31: Joseph C. Somody
Nov. 7: Jason Oakley
Nov. 15: Thomas Ferraro, Vipulan Vigneswaran
Nov. 21: Vipulan Vigneswaran (honorary mention to Raul A. Félix de Sousa)
Nov. 28: Philip Rodger
Dec. 5: 凌嘉誠 (Alex Ling)
Dec. 12: Bill Chaney
Dec. 19: Joseph C. Somody
Jan. 9: Dima Klenchin
Jan. 23: David Schuller
Jan. 30: Peter Monaghan
Saturday, February 04, 2012
An Ode to λ
I grew up in the phage group and spent many summers at the phage meetings in Cold Spring Harbor. Back then (late 1960s, early 1970s), the best scientists worked on bacteriophage λ (lambda) and the rest of us just tried to keep up.
A number of key insights in molecular biology came from studying this small virus that infects Escherichia coli and if you didn't know about that research you were really out of the loop.
But by 1990 it was already apparent that a new generation of students was growing up in ignorance of the fundamental concepts learned from studying bacteriophage and bacteria. I remember asking a class what they knew about the genetic switch in bacteriophage λ and getting nothing but blank looks! Everyone worked on eukaryotes by then and the knowledge acquired by the phage group was not relevant.
I tried to teach that knowledge in my classes. In my textbook I devoted 27 pages to describing the regulation of phage genes (in a chapter on "Gene Expression and Development"). Other instructors didn't care.
Here's a short list of things we learned from studying λ. How many have you learned?
Friday, February 03, 2012
Carnival of Evolution #44
This month's Carnival of Evolution (44th version) is hosted by The Atavism, a blog written by David Winter, a PhD student in evolutionary genetics [Proceedings of the 44th Carnival of Evolution].
The next Carnival of Evolution (March) needs a host. Contact Bjørn Østman at Carnival of Evolution if you want to volunteer. Meanwhile, you can submit your articles for next month's carnival at Carnival of Evolution.
Welcome to the 44th monthly meeting of the Society for the Blogging of Evolution. As you can see below, we had a large number of submissions this month and, in order to have only a single track of talks and get people to the banquet with sufficient energy to enjoy themselves, some submissions have been included in a poster session following the last of the talks. Submissions were grouped purely on their subject material, and a submission included in the poster-session shouldn't be viewed as inferior to any featured as a talk.
I hope you enjoy a day's worth of reading, and remind you that a host is still required for next month's meeting. Sign up with Bjørn (bjorn[at]bjornostman.com) if you are interested in helping out.
The Arsenic Affair: No Arsenic in DNA!
The "arsenic affair" began with a NASA press conference on Dec. 2, 2010 announcing that a new species of bacteria had been discovered. The species was named GFAJ-1 (Get Felisa a Job), by the lead author Felisa Wolfe-Simon. GFAJ-1 was grown in a medium that lacked phosphate and contained high concentrations of arsenic. The paper, published that day on the Science website, claimed that arsenic was replacing phosphorus in many of the cell's molecules, including nucleic acids.
Here's a (bad) video of the press conference. The high quality version from NASA is no longer available and some other YouTube videos don't allow embedding.
Thursday, February 02, 2012
A Mormon Tale: The Romney Connection
My wife and our children are cousins of Mitt Romney. This is the story of their common ancestor James Hood and his Mormon descendants.
A Mormon Tale
The Romney Connection
Hannah Hood Hill arrived in Salt Lake City when she was eight years old. She lived there with her father Archibald Newell Hill and his four wives. (Hannah’s mother, Isabella Hood, died at Winter Quarters in 1847.)
On May 10, 1862 Hannah Hood Hill married Miles Park Romney. Miles was born on August 18, 1843 in Nauvoo. His parents had been converted to the Church of the Latter Day Saints while living in England.
Miles Romney (1806-1877) and his wife Elizabeth Gaskell (1809-1884) lived in the Liverpool area. Following their baptism, they sailed for New Orleans and made their way up the Mississippi by steamboat arriving at Nauvoo in 1841. This was a year before the Hill family arrived with Hannah Hood Hill.
The Hill family moved directly to Utah when Nauvoo was evacuated but the Romney family went to Missouri where they moved around from town to town until finally settling in St. Louis. In 1850, they were able to afford the move to Salt Lake City, Utah where they became reacquainted with the Hill family. Miles Park Romney was seven years old and Hannah Hood Hill was eight or nine.
Miles and Hannah had eleven children including Gaskell Romney (1871-1955). Miles Park Romney was sent on a mission to England before their first child (Isabell 1863-1919) was born. While in England he preached for several years in the area around Liverpool (former home of his parents). He came back to Salt Lake City with a boatload of new English converts.
- Glasgow to Ontario
- Ontario to Nauvoo
- Nauvoo to Utah
- The Romney Connection
Wednesday, February 01, 2012
A Mormon Tale: Nauvoo to Utah
My wife and our children are cousins of Mitt Romney. This is the story of their common ancestor James Hood and his Mormon descendants.
A Mormon Tale
Nauvoo to Utah
It was 1846 and the Mormons were preparing to leave Nauvoo for Utah. Many of them had crossed the Mississippi the previous year to prepare for the trip west. The Mormon town of Montrose, Iowa, had been settled some years earlier but now its population swelled to several thousand. Many blacksmiths, carpenters, and wainwrights set up shops to build wagons and carts.
The main exodus from Nauvoo began on February 4, 1846 with an advance party under Brigham Young. Archibald Newell Hill and his brother, Alexander Hill, were part of this advance party. The plan was to make it to Utah and establish a colony to receive the main body that would arrive later in the year. Here’s the description of what happened from the Wikipedia article on The Mormon Trail.
- Glasgow to Ontario
- Ontario to Nauvoo
- Nauvoo to Utah
- The Romney Connection
Tuesday, January 31, 2012
Monday's Molecule #157
You need to pay close attention in order to identify this molecule correctly.
Post your answer in the comments. I'll hold off releasing any comments for 24 hours. The first one with the correct answer wins. I will only post correct answers to avoid embarrassment.
There could be two winners. If the first correct answer isn't from an undergraduate student then I'll select a second winner from those undergraduates who post the correct answer. You will need to identify yourself as an undergraduate in order to win. (Put "undergraduate" at the bottom of your comment.)
Some past winners are from distant lands so their chances of taking up my offer of a free lunch are slim. (That's why I can afford to do this!)
In order to win you must post your correct name. Anonymous and pseudonymous commenters can't win the free lunch.
Winners will have to contact me by email to arrange a lunch date.
UPDATE: The molecule is L-sedoheptulose 1,7-bisphosphate or L-altro-hept-2-ulose 1,7-bisphosphate. The D isomer is part of the pentose phosphate cycle and the Calvin cycle. The winner is Peter Monaghan.
Winners
Nov. 2009: Jason Oakley, Alex Ling
Oct. 17: Bill Chaney, Roger Fan
Oct. 24: DK
Oct. 31: Joseph C. Somody
Nov. 7: Jason Oakley
Nov. 15: Thomas Ferraro, Vipulan Vigneswaran
Nov. 21: Vipulan Vigneswaran (honorary mention to Raul A. Félix de Sousa)
Nov. 28: Philip Rodger
Dec. 5: 凌嘉誠 (Alex Ling)
Dec. 12: Bill Chaney
Dec. 19: Joseph C. Somody
Jan. 9: Dima Klenchin
Jan. 23: David Schuller