Here's what he said ....
We discovered some pretty surprising things in reading out the human genome sequence. Here are four highlights.

I don't know if Francis Collins still believes these four things and still believes they were discovered by those who sequenced the human genome. However, I think it's worthwhile for me to give you a slightly different perspective on what I did NOT think were highlights of the human genome sequence.
1. Humans have fewer genes than expected. My definition of a gene here—because different people use different terminology—is a stretch of DNA that codes for a particular protein. There are probably stretches of DNA that code for RNAs that do not go on to make proteins. That understanding is only now beginning to emerge and may be fairly complicated. But the standard definition of “a segment of DNA that codes for a protein” gives one a surprisingly small number of about 30,000 for the number of human genes. Considering that we’ve been talking about 100,000 genes for the last fifteen years (that’s what most of the textbooks still say), this was a bit of a shock. In fact, some people took it quite personally. I think they were particularly distressed because the gene count for some other simpler organisms had been previously determined. After all, a roundworm has 19,000 genes, and mustard weed has 25,000 genes, and we only have 30,000? Does that seem fair? Even worse, when they decoded the genome of the rice, it looks as if rice has about 55,000 genes. So you need to have more respect for dinner tonight! What does that mean? Surely, an alien coming from outer space looking at a human being and looking at a rice plant would say the human being is biologically more complex. I don’t think there’s much doubt about that. So gene count must not be the whole story. So what is going on?
2. Human genes make more proteins than those of other critters. One of the things going on is that we begin to realize that one gene does not just make one protein in humans and other mammals. On the average, it makes about three, using the phenomenon of alternative splicing to create proteins with different architectures. One is beginning to recover some sense of pride here in our genome, which was briefly under attack, because now we can say, “Well, we don’t have very many genes but boy are they clever genes. Look what they can do!”
3. The male mutation rate is twice that of females. We also discovered that simply by looking at the Y chromosome and comparing it to the rest of the genome—of course, the Y chromosome only passes from fathers to sons, so it only travels through males—you can get a fix on the mutation rate in males compared to females. This was not particularly good news for the boys in this project because it seems that we make mistakes about twice as often as the women do in passing our DNA to the next generation. That means, guys, we have to take responsibility for the majority of genetic disease. It has to start somewhere; the majority of the time, it starts in us. If you are feeling depressed about that, let me also point out we can take credit for the majority of evolutionary progress, which after all is the same phenomenon.
4. “Junk” DNA may not be junk after all. I have been troubled for a long time about the way in which we dismissed about 95% of the genome as being junk because we didn’t know what its function was. We did not think it had one because we had not discovered one yet. I found it quite gratifying to discover that when you have the whole genome in front of you, it is pretty clear that a lot of the stuff we call “junk” has the fingerprints of being a DNA sequence that is actually doing something, at least, judging by the way evolution has treated it. So I think we should probably remove the term “junk” from the genome. At least most of it looks like it may very well have some kind of function.
- Number of Genes: I don't question the fact that Francis Collins and many of his colleagues were surprised at the low number of genes in the human genome when the draft sequence first came out. Nor do I question his explanation; namely, hubris. But he should also add lack of knowledge of the scientific literature. Knowledgeable scientists knew that there should be no more than 30,000 genes and, furthermore, the discoveries in developmental biology led them to be quite comfortable with the idea that all complex eukaryotes would have about the same number of genes. I've written about the misconceptions of leading scientists over the number of genes in False History and the Number of Genes and Facts and Myths Concerning the Historical Estimates of the Number of Genes in the Human Genome. It astonishes me that scientists working on the human genome were not aware of the scientific literature on the number of genes in the human genome.
It also astonishes me that scientists in the 21st century could still have been defining a gene as "a stretch of DNA that codes for a particular protein." I'm gradually becoming aware of the fact that this is/was a common mistake that persists to this day. It means that undergraduate biochemistry and molecular biology lecturers are not doing their job.
I've also written about The Deflated Ego Problem and how human chauvinists are trying to cope with the idea that they may just be animals like all other animals. In that post I outlined seven ways that scientists with deflated egos are going to try to restore humans to the top spot in the evolutionary tree [see also: Vertebrate Complexity Is Explained ...]. The seven ways are:
- Alternative Splicing
- The Abundance of Small RNAs
- The Function of Pseudogenes
- Regulatory Sequences
- The Unspecified Anti-Junk Argument
- Post-translational Modification
- Alternative Splicing: Right on cue! There are two main problems with the idea that alternative splicing is going to assuage your deflated ego. The first is that the logic of the argument is questionable. It only works if alternative splicing is common in humans—giving rise to lots more proteins—and uncommon in those "lower" organisms that we want to be superior to. But the same data that's used to infer alternative splicing in humans also shows multiple transcripts for Drosophila, mouse, and Arabidopsis genes. This argument isn't going to save you unless you invoke special pleading [An Example of Faulty Logic from Cold Spring Harbor].
The second problem is that alternative splicing is just as spurious as pervasive transcription and transcription factor binding. It's true that you can detect lots of different transcripts from most human genes. Many of them contain different combinations of introns and exons and they start and stop at many locations. But the "variants" are almost always extremely rare [The most important rule for publishing a paper on alternative splicing]. That's consistent with the high error rate of splicing [Splicing Error Rate May Be Close to 1%]. These are nonfunctional mistakes. There are genuine examples of alternative splicing producing different proteins, but these genes make up fewer than 5% of all genes in the human genome and they usually are not unique to humans or mammals.
The argument that transcripts of most human genes are alternatively spliced does not make any sense on many levels [Two Examples of "Alternative Splicing" and Making Sense in Biology and A Challenge to Fans of Alternative Splicing].
- High Mutation Rate in Males: We knew about this long before the human genome sequence was determined. The original discovery is attributed to J.B.S. Haldane in 1947 (see Crow, 1997 for a review). I don't know why Francis Collins didn't know about this before 2000.
- “Junk” DNA may not be junk after all: He's wrong about that. We know that about 90% of our genome is junk. Most of us knew that before the human genome sequence was published and the sequence provided solid evidence that we were right. For example, it turned out that half the genome was bits and pieces of defective transposons just as predicted. It turned out that there were only about 30,000 genes, just as expected. It turned out that there were large stretches of highly repetitive DNA, just as expected. It turned out that 20% of the genome consisted of highly variable intron sequences (junk), just as expected. And we soon learned that much of the rest of the genome was not conserved, just as expected.
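To put a rough number on the splicing-error point made earlier, here is a back-of-the-envelope sketch. It assumes, purely for illustration, that the roughly 1% error rate applies independently to each intron, and the intron counts are hypothetical values, not figures from the post. Under those assumptions, a noticeable fraction of transcripts from a multi-intron gene would be expected to carry at least one splicing mistake, which is the kind of rare "variant" that shows up in transcript surveys.

```python
def fraction_with_error(per_intron_error: float, n_introns: int) -> float:
    """Chance that a transcript has at least one mis-spliced intron.

    Assumption (not from the post): each intron is spliced independently
    with the same error probability.
    """
    if not 0.0 <= per_intron_error <= 1.0:
        raise ValueError("per_intron_error must be between 0 and 1")
    if n_introns < 0:
        raise ValueError("n_introns must be non-negative")
    # P(at least one error) = 1 - P(every intron spliced correctly)
    return 1.0 - (1.0 - per_intron_error) ** n_introns


if __name__ == "__main__":
    error_rate = 0.01  # "close to 1%", treated here as a per-intron rate
    for n in (1, 5, 10, 20):  # hypothetical intron counts
        share = fraction_with_error(error_rate, n)
        print(f"{n:2d} introns: {share:.1%} of transcripts with an error")
```

The point of the sketch is only that spurious transcripts are what you would expect from error alone; it says nothing about which specific transcripts are functional.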