
Wednesday, November 21, 2007

Bacteria Genomes Are Degrading

 
At one point in his talk last night Kirk Durston mentioned the bacterial flagella. He acknowledged that the "Darwinists" have proposed an evolutionary pathway from a Type III secretory structure to flagella.

This pathway is improbable, according to Durston, because flagella are more complicated than secretory pores so flagella have to evolve first.

What? Yes, that's right. Scientists have now shown that the most primitive bacteria were very complex and evolution has been all downhill from then on. Modern bacteria are less complex. Thus the type III secretory apparatus had to evolve from the more complex bacterial flagella. (The actual situation is complicated [Evolution in (Brownian) space: a model for the origin of the bacterial flagellum]. What I'm addressing here is the claim of general loss of information in bacterial evolution.)

I suggested that this was not correct, and Durston responded with a slide showing the scientific papers that proved it. The most important paper was:
Mira, A., Ochman, H. and Moran N.A. (2000) Deletional bias and the evolution of bacterial genomes. Trends Genet. 17:589-96. [PubMed]
I asked Durston what would happen if I called Nancy Moran (no relation, that's her above) and asked her whether she agreed that primitive bacteria were complex and all modern bacterial lineages are losing information. He affirmed that she would and that's what modern evolutionary biologists are saying. There are other papers that say the same thing. He accused me of not being aware of them.

This is the abstract of the Mira et al. (2000) paper.
Although bacteria increase their DNA content through horizontal transfer and gene duplication, their genomes remain small and, in particular, lack nonfunctional sequences. This pattern is most readily explained by a pervasive bias towards higher numbers of deletions than insertions. When selection is not strong enough to maintain them, genes are lost in large deletions or inactivated and subsequently eroded. Gene inactivation and loss are particularly apparent in obligate parasites and symbionts, in which dramatic reductions in genome size can result not from selection to lose DNA, but from decreased selection to maintain gene functionality. Here we discuss the evidence showing that deletional bias is a major force that shapes bacterial genomes.
I think it's pretty obvious from the abstract that they're discussing a particular problem in bacterial evolution; namely selection for small compact genomes. This point is clear in the paper as well.

At no point in the paper do the authors suggest anything close to what Durston says. There's no mention of primitive bacteria having the full complexity of all modern species including the myxobacteria and photosynthetic bacteria etc. Why in the world do the Intelligent Design Creationists have to lie about things like this? (I assume it's a lie because the only other possibility is ignorance and a Ph.D. student in biophysics can't be stupid enough to misunderstand such a key principle of evolution, right?)

Naturally in a forum like this Durston had me at a disadvantage. He was displaying the scientific papers and I had to admit that I had not read them recently enough to comment. The point was not lost on some members of the audience. The atheist scientist was trumped by the religious graduate student who was more aware of the scientific literature.

"Frustrating," doesn't begin to cover it ...


44 comments:

Anonymous said...

This sounds like an increasingly common creationist theme. For example, YEC John Sanford wrote Genetic Entropy and the Mystery of the Genome:

http://www.amazon.com/Genetic-Entropy-Mystery-Genome-Sanford/dp/1599190028

Basically, since the fall, genomes have been deteriorating. It's Muller's Ratchet gone mad.

Timothy V Reeves said...

Sounds interesting. I've made a promise to myself to look into it a bit more. Hope I can keep the promise.

Steve LaBonne said...

This is why scientist vs. creationist "debates" are nearly always a losing proposition for science. The creationists will just lie for Jesus, all the day long (think of Behe and that stack of papers in Dover: shameless), and the yahoo audience will be quite satisfied that they "won".

Anonymous said...

Larry: Your Evolution is a Fact and a Theory event at CFI should be promoted at the next lecture of this class. Perhaps you and/or some of the people from UTSA and FAC can stop by and give a quick plug to the event before class and/or post up ads and write the details on the board. Invite O'Leary herself--that would make this a very interesting inaugural Cafe Inquiry!

T Ryan Gregory said...

Moran's work has shown that genome size variation in bacteria is probably not a result of selection for faster cell division (at least, not in the comparison of culturable species) but rather is based on loss of genes among endosymbionts and parasites. This occurs primarily due to the influence of drift in combination with the well-known bias toward deletion over insertion on small scales. When bacteria become endosymbiotic, two things happen that are particularly relevant. First, they come to make use of host proteins and therefore do not use all of the genes that a free-living species uses, and second, they experience a population bottleneck such that genes that are at most under weak selection can be lost by drift. Thus, endosymbiotic bacteria exhibit small genomes without selection being necessary as the explanation.

It is evolutionary through and through and has nothing to do with what Durston said.

Anonymous said...

I'm a little surprised that a crusty old curmudgeon like you let him off the hook so easily. Why not just call his bluff and say flat out that you thought he was lying? The only way out for him would have been to put up or shut up. He would've had to discuss Mira et al (2000) in detail.

You could've arranged with any creationists in the audience that one of their number would contact your namesake to find out if the paper really said what Durston claimed it was saying, the proviso being that they would have to publish the answer on some agreed website so all could see.

The thing about these debates is that they are as much if not more about entertainment than information. People were disappointed with the debate between Christopher Hitchens and Dinesh D'Souza because there was no clear-cut win by Hitchens. They wanted to see D'Souza reduced to stuttering incoherence. They wanted to see the little twerp squelched. They wanted to see blood on the carpet.

In this case, it needed to be his blood not yours. You should've gone for the jugular and said that either he'd read the paper and misunderstood it - in which case he was incompetent - or he'd read it, understood it but was misrepresenting it - in which case he was a liar. When he came back at you about your reading of the paper you could then have said this was now a case of word-against-word and the best solution was to go to the horse's mouth - ask the paper's authors what they meant it to say.

Whatever his other failings, Hitchens is good in these debates because he's not inhibited by the scientist's natural caution about overstating a case. He can be as outrageous as he likes because he knows it's just theatre and an audience likes a good show. Any scientist who wants to get into a debate with these people could learn a thing or two from him.

Sigmund said...

Ian, Larry's point was that he hadn't read the paper in question. He cannot have an opinion about the data presented in a particular paper when he hasn't read it yet. That was the whole reason this paper was introduced, and why many other papers are introduced and purposefully misinterpreted by creationists.
Nobody has read all the papers on any one subject, so having a list of random papers handy is an effective tactic to counter any scientific attack. Remember, there's only one real commandment for scientists: "Thou shalt not lie". For creationists this one is apparently optional.

Timothy V Reeves said...

Where have I gone wrong here Larry? As a result of your encounter with Kirk Durston, I found my way to this Starter’s Paper. For most of his points ‘falsifying’ evolution I could see a possible way out, but the following one stumped me (Editor’s note: ‘stumped’ is a cricketing term).

Durston, using what I assume are valid and correctly interpreted references, suggests that if we take genes encoding functioning proteins and imagine them to be embedded in ‘sequence space’ (that is, the network formed by linking all possible DNA sequences separated by single base changes), then the regions surrounding these genes do not, as a rule, encode stable folded proteins, and therefore do not encode proteins that do anything useful. He claims that computer simulations suggest that the shufflings of evolution are very unlikely to be able to ‘jump’ across the desert of rubbish that isolates genes encoding functioning proteins. (In fact he says that the best you’ll get out of evolution is at most 100 bits of DNA-encoded information, when in fact you need at least 500 bits for the average protein.)
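To make the ‘sequence space’ picture concrete, here is a minimal sketch of what "linked by single base changes" means. The function name and the toy sequence are illustrative assumptions only; they are not taken from Durston's paper.

```python
BASES = "ACGT"

def single_base_neighbors(seq):
    """Return every DNA sequence that differs from seq by exactly one base change."""
    neighbors = []
    for i, base in enumerate(seq):
        for alt in BASES:
            if alt != base:
                # Swap the base at position i and keep the rest unchanged.
                neighbors.append(seq[:i] + alt + seq[i + 1:])
    return neighbors

# A sequence of length n has 3 * n immediate neighbors in this network.
neighbors = single_base_neighbors("ACG")
```

Each sequence is a node, and its neighbors are the nodes one point mutation away. The question in this thread is whether paths of such steps can connect sequences that encode folded proteins.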

So where’s the wool being pulled over my eyes? Now, even if Durston’s point is correct, that doesn’t mean we then jump on the ID bandwagon, but ostensibly there seems to me to be a problem here for the current mechanism of evolutionary theory. Could there be ‘fibrils’ of structural viability through ‘sequence space’ that are difficult to pick up? (A bit like the connectivity of the Mandelbrot set.) After all, sequence space is a highly complex beast. Or what?

Has Durston bowled me a googly? (Editor’s note: apologies for the cricketing terms – try google=bowl+googly for meaning)

Anonymous said...

In the discussion you mention Timothy, Durston references the work of Douglas Axe. Larry points to this research in the thread below and Arthur Hunt links to a discussion he wrote of these papers at Pandas Thumb:

http://pandasthumb.org/archives/2007/01/92-second-st-fa.html

The Blanco paper is not one I'm familiar with (Art - maybe a PT post is in order!?). For completeness' sake this is the reference. I've bolded what I assume to be the crucial point from the IDists' perspective (also note that WoK reveals that this paper was referenced by Meyer in his now infamous article, and also by Axe):

We have examined the conformational properties of 27 polypeptides whose sequences are hybrids of two natural protein domains with 8% sequence identity and different structures. One of the natural sequences (spectrin SH3 domain) was progressively mutated to get closer to the other sequence (protein G B1 domain), with the only constraint of maintaining the residues at the hydrophobic core. Only two of the mutants are folded, each of them having a large sequence identity with one of the two natural proteins. The rest of the mutants display a wide range of structural properties, but they lack a well-defined three-dimensional structure, a result that is not recognized by computational tools commonly used to evaluate the reliability of structural models. Interestingly, some of the mutants exhibit cooperative thermal denaturation curves and a signal in the near-ultraviolet circular dichroism spectra, both typical features of folded proteins. However, they do not have a well-dispersed nuclear magnetic resonance spectrum indicative of a defined tertiary structure. The results obtained here show that both the hydrophobic core residues and the surface residues are important in determining the structure of the proteins, and suggest that the appearance of a completely new fold from an existing one is unlikely to occur by evolution through a route of folded intermediate sequences.

Durston, funnily enough, doesn't discuss the conclusions of the paper in their entirety (haven't checked to see if Meyer and Axe do). It's less a googly, more of an outright beamer:

It is argued that for highly optimized sequences (high foldability) the sequence space allowed could be large enough to overlap with other spaces determining another fold, so that it could be reached within a valid (all the sequences being foldable) evolutionary trajectory (Govindarajan & Goldstein, 1997b). Our experimental results suggest that this is unlikely to happen. However, nature could have other means to preserve evolving sequences not able to fold. For example, whenever the function of the original folded sequence is maintained by gene duplication, one gene will be able to become non-functional (pseudogene) as long as the other remains functional (Haldane, 1933). The fact that most of the hybrid sequences survived in the bacterial environment, and that there are several examples of proteins that exist in "native unfolded states" (Weinreb et al., 1996), indicates that the avoidance of aggregation or degradation is not a major obstacle to create protein structure diversity through random sequence drifting.


There are also a whole bunch of papers dealing with protein evolution and fitness landscapes and whatnot. I can't be arsed going into them right now. However, a couple that caught my eye are:

Alexander, P.A. et al. (2007) The design and characterization of two proteins with 88% sequence identity but different structure and function. PNAS, 104, 11963-11968.

These authors conclude that:

The fact that 49 aa in these proteins are compatible with both folds shows that the essential information determining a fold can be highly concentrated in a few amino acids and that a very limited subset of interactions in the protein can tip the balance from one monomer fold to another. This delicate balance helps explain why protein structure prediction is so challenging. Furthermore, because a few mutations can result in both new conformation and new function, the evolution of new folds driven by natural selection for alternative functions may be much more probable than previously recognized.

A slightly earlier paper is:

Aita, T. et al. (2003) An in silico Exploration of the Neutral Network in Protein Sequence Space. Journal of Theoretical Biology, 221, 599-613.

Designating amino-acid sequences that fold into a common main-chain structure as “neutral sequences” for the structure, regardless of their function or stability, we investigated the distribution of neutral sequences in protein sequence space. For four distinct target structures (α, β,α/β and α+β types) with the same chain length of 108, we generated the respective neutral sequences by using the inverse folding technique with a knowledge-based potential function. We assumed that neutral sequences for a protein structure have Z scores higher than or equal to fixed thresholds, where thresholds are defined as the Z score for the corresponding native sequence (case 1) or much greater Z score (case 2). An exploring walk simulation suggested that the neutral sequences mapped into the sequence space were connected with each other through straight neutral paths and formed an inherent neutral network over the sequence space. Through another exploring walk simulation, we investigated contiguous regions between or among the neutral networks for the distinct protein structures and obtained the following results. The closest approach distance between the two neutral networks ranged from 5 to 29 on the Hamming distance scale, showing a linear increase against the threshold values. The sequences located at the “interchange” regions between the two neutral networks have intermediate sequence-profile-scores for both corresponding structures. Introducing a “ball” in the sequence space that contains at least one neutral sequence for each of the four structures, we found that the minimal radius of the ball that is centered at an arbitrary position ranged from 35 to 50, while the minimal radius of the ball that is centered at a certain special position ranged from 20 to 30, in the Hamming distance scale. 
The relatively small Hamming distances (5–30) may support an evolution mechanism by transferring from a network for a structure to another network for a more beneficial structure via the interchange regions.
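For anyone unfamiliar with the term, the Hamming distance quoted in that abstract is simply the number of positions at which two equal-length sequences differ. A minimal sketch follows; the two short peptide strings are made up for illustration and are not from the paper.

```python
def hamming_distance(a, b):
    """Count the positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs sequences of equal length")
    return sum(x != y for x, y in zip(a, b))

# Two hypothetical 8-residue peptides that differ at positions 3 and 6.
distance = hamming_distance("MKTAYIAK", "MKSAYLAK")
```

On this scale, a closest approach of 5 to 29 between two neutral networks of chain length 108 means that between 5 and 29 of the 108 residues would have to change to move from one network to the other.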


In short, I'm not massively knowledgeable in this area, but it seems that Durston is presenting a somewhat biased look. People more familiar with this research (plus more informed about evolution generally) than I would no doubt be able to fully enlighten you as to the significance of this supposed barrier to evolution.

Timothy V Reeves said...

Thanks for that Stevef, I'll give it some study.

Kirk Durston said...

After reading Larry's outrageous misrepresentation of my lecture, I have to confess that I'm upset, so permit me to rant a bit, and then I'll calm down. There is a Shakespearean phrase that goes something like, 'methinks the lady protests too loudly'. It is often the case that the person who is most free with their accusations is the one who is most guilty. Larry's version of my lecture is a massive misrepresentation. I note that some of those who offer comments to Larry's post demonstrate a thoughtful and collegial attitude. That's more like it. Forget Larry's version of Mira. Read the entire paper for yourself. My point was that there is evidence that the functional information in bacterial genomes is degrading. There are a few other papers that suggest it may also be happening in certain eukaryotes as well. You can call this evolution if you wish, but the trend is going in the wrong direction. This is not 'proof' of information degradation over time, it is merely evidence for it. If we step out of the creative story telling world of the Darwinist and into the real world, this should not be such a shock. That is the way things go in the real world. It just means that the Darwinist has to be a little more creative in his story telling ... maybe even argue that this is predicted by Darwin's theory. However, we must not confuse creative story telling with doing science.

Now that I have calmed down somewhat after seeing myself massively misrepresented, let me try to respond to one or two points raised in some of the comments.

1. stevef makes a number of thoughtful comments re. Blanco's paper and others. By way of response, if regions of stable folded sequence space seem to be surrounded by non-folding sequence space which has no phenotypic effect, then, as Blanco suggests, these regions will have to be crossed via a random walk since natural selection cannot operate in regions with no phenotypic effect. Since the same point in sequence space can be crossed more than once in a random walk, the probability of crossing a non-folding region decreases by orders of magnitude, and the appropriate equations can be derived. That is not to say that it cannot be crossed by a redundant, duplicate gene evolving its way through sequence space. If the next area of folding sequence space is not too far away, it would be reasonable to expect that it could be crossed. Axe's work suggests the regions are not close. What Larry did not mention is that I suggested that one way life could evolve in spite of folding sequence space being quantized is if there are 'chains' of islands. One can imagine an archipelago of the necessary islands of folding sequence space for organic life which could make it relatively easy for even a random walk to perform an evolutionary search and successfully find them all. Whether this is the case or not is a research problem (which I also mentioned in my lecture). The tools and data are already available, at least for the protein families, to test this hypothesis out.

2. stevef also makes the very good point that more than one fold and function can be coded for by sequences that have a high degree of sequence similarity. Larry neglected to mention that I pointed this out in my lecture. In the lecture, I referred to these islands of folding sequence space as coding for 'fold/function' sets. There is evidence that the same sequence can produce two entirely different folds. There is also evidence for meta-stable states in protein folding, where two or more folds may be possible even though energy is not minimized for one of them. Closely related is the evidence that with only a small change, sometimes as little as one amino acid, an entirely different conformation can result. I also hold that the 'distance' in functional information between folds/functions within the same fold/function set is very low and can easily be attained by evolutionary processes. This is not the problem. The problem is crossing the non-folding regions of sequence space to find other locations that code for a different fold/function set. My argument is that if one downloads one or two thousand aligned sequences from PFAM, what one is looking at is a record of an evolutionary search that organic life has performed for that region of sequence space. My hypothesis is that the sequences that produce stable, functional folds are preserved and those that don't are not. Thus, life has been 'mapping' sequence space in terms of the folded functional regions. That map is available to us through data bases such as PFAM. Of course, the universe will not last long enough for life to map all of sequence space, but I have found that 500 sequences start to level out at a particular functional information level. Typically, I like to use at least 1,000 to get a more accurate estimate of the size of sequence space for any particular protein family. One can then use computational methods to measure the evolutionary distance between protein domains. 
Using this method, one can sort domains into fold sets and get a very good idea of how far apart different fold sets are, when it comes to structural domains.

Larry Moran said...

Kirk Durston says,

My point was that there is evidence that the functional information in bacterial genomes is degrading.

That's not what you said. You said that scientists have concluded that primitive bacteria were advanced and scientists have shown that evolution is just leading to a loss of information.

I challenged you on that very point. I asked you whether Nancy Moran would agree if I phoned her. I specifically asked you whether she would agree that ancient bacteria were complex and everything is downhill from then on.

You said "yes" that's what the paper said and then you pointed to the reference on the screen and asked me if I had read the paper.

Like a typical Intelligent Design Creationist you lie about the real science. And like a typical Intelligent Design Creationist you try to wiggle your way out of it whenever someone points out your lies.

Now that I have calmed down somewhat after seeing myself massively misrepresented, let me try to respond to one or two points raised in some of the comments.

I have not misrepresented you on this point. You were dead wrong about your claim on Tuesday night. Why not pretend to be a scientist and admit it?

Kirk Durston said...

Larry, it is dishonest of you to leave out the part where I specifically addressed you and stated that I would assume that N. Moran would likely hold to the idea that the information contained in ancient bacterial genomes had been achieved through normal evolutionary processes, but were now undergoing a 'net deletional bias'. It is important that you not misrepresent people by conveniently leaving out the bits that might dull your point. By leaving that part out, it paints a dishonest picture of what I was suggesting. Frankly, I do not know or care what she actually believes, but she put her name on that paper as a co-author. Larry, science is not about who believes what or how many people believe this or that. As I already suggested to those who read your version of my lecture, they need to read the paper for themselves. It would be more productive if you did not waste your time personally attacking people and thoughtfully discussed the issues like some of the others who commented do.

Larry Moran said...

Kirk Durston says,

Larry, it is dishonest of you to leave out the part where I specifically addressed you and stated that I would assume that N. Moran would likely hold to the idea that the information contained in ancient bacterial genomes had been achieved through normal evolutionary processes, but were now undergoing a 'net deletional bias'. It is important that you not misrepresent people by conveniently leaving out the bits that might dull your point. By leaving that part out, it paints a dishonest picture of what I was suggesting.

You are correct. You went out of your way to state that Nancy Moran did not dispute evolution. According to you, she merely believes that ancient bacteria were very complex and all their genomes have degraded since then.

I did not address this issue. I don't think anyone thought you were claiming that Nancy Moran was an Intelligent Design Creationist.

As I already suggested to those who read your version of my lecture, they need to read the paper for themselves. It would be more productive if you did not waste your time personally attacking people and thoughtfully discussed the issues like some of the others who commented do.

I have read the paper and I posted the reference so anyone else can read it as well. It should be perfectly obvious that she is talking about genomic deletions in a particular context. There's nothing in the paper that addresses the point you were claiming—namely that ancient bacteria were more complex than modern bacteria.

You said that "scientists" have now concluded that ancient bacteria were complex and ever since then bacterial lineages have been losing information. That's a lie. By now anyone who has read the paper will see that it's a lie.

Larry Moran said...

Kirk Durston says,

What Larry did not mention is that I suggested that one way life could evolve in spite of folding sequence space being quantized is if there are 'chains' of islands.

Right. Since I didn't even discuss the details of your hypothesis it's not surprising that I didn't mention this point.

However, now that you bring it up, I did ask you during the lecture to tell us how many possible islands there are in the adaptive landscape. I wanted to know what proof you had that the islands of known folds are the only possible folds that could exist. It's an important point. Are there "chains" of islands throughout the landscape? How would you know?

You looked very puzzled when I asked this question, like you had never thought of it before. After listening to you mumble something that made no sense I told you to proceed and forget about it.

stevef also makes the very good point that more than one fold and function can be coded for by sequences that have a high degree of sequence similarity. Larry neglected to mention that I pointed this out in my lecture.

Again, it's hardly surprising that I didn't point this out since I never described the specifics of your hypothesis. I was only addressing the flaws in the logic.

Kirk, did you actually read my postings or are you just imagining what I said? Why not address the actual points that I raised instead of trying to move the goalposts?

Kirk Durston said...

I've certainly thought about how many different protein folds there might be and have several papers on my hard drive that discuss it. It is an area of interest in my own research. I wasn't 'puzzled', but instead surprised that you would suggest there were 'millions' in light of the literature that is out there on this subject. There is ongoing research into the classification of protein folds. The general trend seems to be towards thousands or a few tens of thousands, but certainly not millions as you suggested. For example, see 'A periodic table for protein structures' by Taylor, Nature, vol. 416, 11 April 2002, page 657. See also 'Estimating the Number of Protein Folds and Families from Complete Genome Data' by Yuri I. Wolf, Nick V. Grishin and Eugene V. Koonin in JMB, Vol 299, 897-905. Your thinking was that if sequence space is filled with 'millions' of folds, then finding novel fold sets via an evolutionary search is not that difficult. However, current science suggests that it isn't millions, but thousands. Finding a novel fold in sequence space is a genuine problem that Koonin suggests can be surmounted if there are an infinite number of worlds (see Eugene Koonin, ‘The cosmological model of eternal inflation and the transition from chance to biological evolution in the history of life’, Biology Direct, 6/27/2007). I also pointed this out in my lecture.

NickM said...

That last comment assumes that you know what papers the creationist is going to talk about before the talk, which you usually don't.

NickM said...

Hmm. My previous comment was actually referring to this post which is no longer the last:

In this case, it needed to be his blood not yours. You should've gone for the jugular and said that either he'd read the paper and misunderstood it - in which case he was incompetent - or he'd read it, understood it but was misrepresenting it - in which case he was a liar. When he came back at you about your reading of the paper you could then have said this was now a case of word-against-word and the best solution was to go to the horse's mouth - ask the paper's authors what they meant it to say.

RPM said...

You can call this evolution if you wish, but the trend is going in the wrong direction.

This is a tell-tale sign that you do not understand evolutionary biology. There is no direction. There is no march of progress. There is no goal. It's just change over time.

Larry Moran said...

Kirk Durston says,

I've certainly thought about how many different protein folds there might be and have several papers on my hard drive that discuss it. It is an area of interest in my own research. I wasn't 'puzzled', but instead surprised that you would suggest there were 'millions' in light of the literature that is out there on this subject.

Just as I expected. You didn't understand the question.

I wasn't questioning how many different folds are known in biology. I was questioning your assumption that these are the only possible folds. I asked you whether there could be thousands of other peaks in the landscape that don't happen to be part of the sample that survived.

In other words, how do you know there aren't many connecting chains between each peak that's represented by a modern protein family?

Larry Moran said...

Kirk Durston says,

Finding a novel fold in sequence space is a genuine problem that Koonin suggests can be surmounted if there are an infinite number of worlds (see Eugene Koonin, ‘The cosmological model of eternal inflation and the transition from chance to biological evolution in the history of life’, Biology Direct, 6/27/2007). I also pointed this out in my lecture.

I was amused that you quoted Koonin's work and even more amused that you specifically referred to the comments of his reviewers.

I'm sure others will not be surprised that Intelligent Design Creationists have homed in on Koonin's ideas like a moth to a flame.

As a student of this field you certainly must know that Koonin's ideas about evolution are far from mainstream. In fact, it seems likely that Koonin doesn't understand the basic principles of evolution [Eugene Koonin and the Biological Big Bang Model of Major Transitions in Evolution].

Many people predicted that the Intelligent Design Creationists would gleefully quote Koonin in support of the idea that evolution is in trouble. You guys are so predictable.

ERV said...

hehehehe!

PUBJACK!!!

Kirk Durston said...

Well Larry, since you are so interested in honesty, you did suggest 'millions' in my lecture, not thousands like you just claimed here. Current research in classifying protein folds is not merely concerned with just known protein folds, contrary to what you just implied in your response. Rather, we are concerned with how many different folds are permitted by physics. The general consensus is that regions of sequence space that code for stable folded proteins are exceedingly rare. Your hopeful suggestion is that there are bridges or filaments that may help connect these exceedingly rare regions. That would be a massive stroke of luck indeed. Some work suggests that regions of fast folding sequence space are surrounded by slow-folding, low stability shells (see PNAS -- Nelson and Onuchic 95 (18): 10682). Quite apart from this, in my lecture I presented a method to test what you suggest, that consisted in mapping sequence space with the data from PFAM. In general, what I see from my own work is that the larger the protein, the more sparse stable, folding sequence space seems to be. It may be that two islands of fold/function sets are close together, but there is no reason at present to suspect that all protein folds necessary for biological life are fortuitously close together in sequence space.

I am aware that various Darwinists are now trying to oust Koonin from the fold with the idea that 'real' scientists will not ask embarrassing questions about Darwinian theory, or point out problems that could potentially bring the entire 19th century theory crashing down around the ears of its devotees. Koonin isn't the only person pointing out the extremely low probabilities involved in 'finding' folding proteins. In my lecture, I pointed out how you can find out for yourself how scarce stable folded proteins are. Simply download the entire aligned set of proteins for a family. Choose one that gives you at least 1,000 sequences. You can then go down each column in the aligned set and compute the frequency of occurrence of each amino acid at each site. I've written some software that will do just that. Once you've done that, you are in a position to estimate the size of sequence space for that particular protein family and, from that, the probability that a sequence will fall into that region. I've done that for 35 proteins. Bottom line: Koonin is pointing out a serious problem that Darwinists do not want to acknowledge and the problem is thinly veiled in the PFAM data base for the whole world to see. It's time for Darwinists to pull their heads out of the sand and do the work. Ostracizing those people who are asking embarrassing questions is not good science. You need to do the work. If you think there are filaments joining different island fold sets, do the work. The computational capability and the data are already available. Do the work. Those that are, are starting to ask embarrassing questions that you probably do not want to hear. So just do the work yourself if that is what it takes.
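For readers who want to try the column-counting step described in the comment above, here is a minimal sketch. It is not Kirk's software, which is not shown in this thread. The function name `column_frequencies`, the toy alignment, and the choice to ignore gap characters are all assumptions made for illustration.

```python
from collections import Counter


def column_frequencies(alignment):
    """Return one dict per alignment column mapping residue -> frequency.

    `alignment` is a list of equal-length aligned sequences (strings).
    Gap characters ('-' and '.') are skipped, which is an assumption of
    this sketch rather than anything stated in the thread.
    """
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")

    freqs = []
    for col in range(length):
        counts = Counter(
            seq[col].upper() for seq in alignment if seq[col] not in "-."
        )
        total = sum(counts.values())
        # An all-gap column yields an empty dict.
        freqs.append({aa: n / total for aa, n in counts.items()} if total else {})
    return freqs


# Hypothetical four-sequence alignment, far smaller than the 1,000 suggested.
toy = ["MKV-A", "MRV-A", "MKIGA", "MKVGS"]
for site, f in enumerate(column_frequencies(toy), start=1):
    print(site, f)
```

A real family alignment would have to be parsed from whatever format it is downloaded in. That step is left out here.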

ERV said...

I am aware that various Darwinists are now trying to oust Koonin from the fold with the idea that 'real' scientists will not ask embarrassing questions about Darwinian theory...
**VOMIT**

Excuse me, do you even know who Koonin is, other than the fact he is the owner of a publication you want to *commandeer*? No one is trying to *oust* him from any *folds*. He's one of the supporters of one of my favorite ideas. My mentors have always encouraged thinking games, whether you're ultimately right or wrong -- they're fun, and you learn things (I'm sure Koonin learned some things from his reviewers... that's why we have peer review).

The problem with professional scientists writing their thinking games for others to read NOW is you Creationists coming in and pubjacking and quote-mining like a bunch of jerks and ruining the fun for the rest of us.

I repeat, **VOMIT**

Anonymous said...

There is a Shakespearean phrase that goes something like, 'methinks the lady protests too loudly'.

Thanks. That was a work of fiction, IIRC.

It is often the case that the person who is most free with their accusations is the one who is most guilty.

Thanks. Looks like another ID theory in the making. You guys should formalise that or something.

Watch out for the mean peer reviewers though. They might try to oust it or something. Then they will try to cover their tracks of course. Watch out for bigfoot though. Kudos.

Anonymous said...

Kirk:

Personally, I would agree with you that stable, and especially functional, regions of sequence space are fairly isolated. I've seen a lot of published and unpublished work to that effect.

However, your larger conclusion is entirely unfounded. Even if it is true that the regions are highly quantized, it may simply be that they are so only from the perspective of current biochemistry and evolutionary fitness demands.

There is also much research showing simpler peptides have much more common stable folds. In a precursor to the modern cell, proteins with a reduced amino acid composition were very likely.

And further, the demands upon specificity and efficiency of these proteins were likely far less. It's even possible that initial production of protein was a non-functional by-product of the metabolic system.

In such cases, the distance between stable regions was far more accessible, and the modern, but inaccessible regions of stable sequence space were originally populated by "primitive" precursors.

So it's not that we once had a more "advanced" system that has since degraded, but that we had a more generic system that has since specialized. And that last sentence is not a terrible definition of evolutionary processes.

Anonymous said...

Mr. Durston (Kirk, if I may),

Not to get off topic here, but I think this is at least somewhat relevant. You are the director of the New Scholars Society. http://www.newscholars.com/ The statement of faith of your society reads, in part:

"The sole basis of our beliefs is the Bible, God's infallible written Word, the 66 books of the Old and New Testaments. We believe that it was uniquely, verbally and fully inspired by the Holy Spirit, and that it was written without error (inerrant) in the original manuscripts. It is the supreme and final authority in all matters on which it speaks."

I have heard you speak, and as best as I can summarize it (please correct me if I am mistaken), your exegetical understanding of the Bible, which you apparently believe to be inerrant and the supreme final authority, is that God created the universe and everything in it in 6 days about 6000 years ago. Furthermore, you believe that at least some science supports this claim. I’d like to know what science you think supports such a young age for the earth. Is the Mira, et al. paper part of this “supporting evidence”?

Torbjörn Larsson said...

IANAB, but as there is so much independent evidence that supports evolution beyond reasonable doubt, it is IMHO putting the cart before the horse to ask whether there are any difficulties for the process pathways.

The right question would be, how are any observed difficulties surmounted? And if there is now a reasonable effort to classify protein folding and stability, this looks like an exciting area.

Btw, Timothy links to New Scholars Society, which purports to be "... an affiliation of Canadian, christian university professors. Our motto is "Petere Veritas", which means, "pursue truth"." But instead of pursuing facts and other truths by science, its library of "Available papers" is filled with apologetics (mostly by Durston), which, as we all know, can at most be assured to be valid logic (it fails at that, though) but never truth, as there is no evidence.

Durston says,

a Shakespearean phrase that goes something like, 'methinks the lady protests too loudly'.

"Hamlet: Madam, how like you this play? Queen: The lady doth protest too much, methinks. [Hamlet Act 3, scene 2, 222–230]"

Which is an ironic way to start a rant.

Finding a novel fold in sequence space is a genuine problem that Koonin suggests can be surmounted if there are an infinite number of worlds (see Eugene Koonin, ‘The cosmological model of eternal inflation and the transition from chance to biological evolution in the history of life’, Biology Direct, 6/27/2007).

Koonin's cosmological thinking is IMO confused, in much the same way that creationists confuse a priori probabilities with a posteriori likelihoods.

The weak anthropic principle is used in cosmology to predict likely values (conditional on the weak anthropic principle). It is not preferred as an open unfalsifiable “just so” description for finetunings or other low likelihood scenarios.

I don't think it does anyone credit to refer to these speculative and, from a physics view, IMHO odd papers of Koonin. (He seems to intersperse a spate of solid papers with speculative ones, much as Max Tegmark does in cosmology. And like Tegmark, maybe Koonin sometimes strikes out.)

If we step out of the creative story telling world of the Darwinist

Either you haven't read Moran on evolution much, or you are in creationist mode.

Torbjörn Larsson said...

Hes one of the supporters of one of my favorite ideas.

Contrary to what my previous comment may suggest, as a naive layman I like this particular paper too. It builds a large speculative model, perhaps without (yet) suggesting much in the way of testable predictions, compared to say Forterre's more minimal "three viruses, three domains" hypothesis. But I find it really stimulating, and I'm glad professionals react the same way. Perhaps something eventually comes out of this or similar suggestions.

And besides, anything that attempts to embed viruses and other non-cellular evolutionary biological replicators into the old domain description must be lauded, right? :-P

Anonymous said...

I pointed out how you can find out for yourself how scarce stable folded proteins are. Simply download the entire aligned set of proteins for a family. Choose one that gives you at least 1,000 sequences. You can then go down each column in the aligned set and compute the frequency of occurrence of each amino acid at each site. I've written some software that will do just that. Once you've done that, you are now in a position to estimate the size of sequence space for that particular protein family and, from that, the probability that a sequence will fall into that region.

No, you are not in a position to estimate that from your phylogenetic exercise! You'd have been better off sticking with your "protein families are few and far between" argument. Looking within single protein families, you are looking at a phylogenetically biased, related sample! Therefore lots of the proteins will be similar because they inherited similar forms from ancestors, not necessarily because they have explored the entire available sequence space for that family!

This is kiddie phylogenetics - no offense, kiddie stuff has its place - but this is like estimating the probability of limb numbers changing through evolution by looking at the variation in limb number within mammals.

Timothy V Reeves said...

Reading and re-reading ‘The Hulk vs The Thing’ exchange between Larry and Kirk and endeavoring to boil it down to something definite I conclude that my original query doesn’t have an unequivocal answer: Larry suggests that chains/filaments/fibrils may run through sequence space facilitating standard evolutionary mechanisms. Kirk suggests that this is an area of active research, although he feels that the direction this research is going in favors his views. (That is, he is extrapolating what he sees as a trend).

As a theist who favors evolution, (but I try not to be dogmatic), let me ask Kirk this theological question: there is a measure of uncertainty as to whether the set of stable/functional proteins are sufficiently connected to facilitate evolution, but how would Kirk react if it was discovered that sequence space was so resourced to favor evolution? What if God created a contingent world with the necessary features and potential to ‘seek and find’ the complex adaptive systems of life? Why is the absence of a sequence space connected with fibrils/chains/filaments so important to Kirk? In our state of knowledge a physical world resourced to evolve complex adaptive systems seems at least to have the status of mathematical possibility. If this possibility should eventually become unequivocal (some might regard the paleontological evidence in favour of evolution as unequivocal) would Kirk lose his killer evidence for a Creative deity? I assume Kirk thinks of complex configurations of life as necessarily ‘given’ just as the backdrop of some more basic physical regime is inevitably a ‘given’. ‘Givens’ raise philosophical questions about their ultimate origin. True, until some hard irreducible given is arrived at we can put off the challenging day when we have to face the intractable epistemological problems that the ultimate ontology presents. But as a theist who favors evolution let me express the opinion that it causes needless provocation to raise the epistemological barriers presented by ‘given’ contingences too early in our probings toward ultimate origins.

lee_merrill said...

> Windy: Looking within single protein families, you are looking at a phylogenetically biased, related sample! Therefore lots of the proteins will be similar because they inherited similar forms from ancestors, not necessarily because they have explored the entire available sequence space for that family!

Not to speak for Kirk, but if I'm understanding correctly, the point is not that the entire space has been explored, but that a sample was explored comparable to what other protein families explored.

The question then becomes whether this sample was representative. The general papers on this subject seem to indicate that it is, so empirical results matching theoretical ones give reason to accept the conclusion.

Kirk Durston said...

A couple quick comments and my apologies to all those who raised points that I won't have time to respond to. I can't even afford to take the time I have been taking to respond to these threads, and I'll likely have to bow out of this discussion after today .... but please at least see comment (1) below:

1. Please see my suggestion for a live event as proposed in my response to 'Kirk's argument for God' thread that Larry started.

2. I would appreciate it if individuals could abstain from speculating about what Kirk Durston believes and then posting their assumptions. For example, the post about how old Kirk Durston thinks the universe is. There are different exegetical views on Genesis 1 & 2, and I'm not about to bet the farm on any particular view, nor do I feel qualified or knowledgeable enough to do so. The NSS permits its members to work through their own exegetical approach to the issue. I do want to note, however, that Genesis 1:1 states that 'in the beginning, God created the heavens (cosmos) and the earth and the earth was without form (presumably not yet accreted) and empty.' There is no indication whatsoever when this occurred. Therefore, I think it is exceedingly risky for theologians to decide when this happened. As for me, if the origin of the cosmos was 15 billion years ago, so be it.

2. Re. Windy's and lee merrill's comments re. exploring sequence space: Windy, I look at entire protein families (no phylogenetic restrictions). I tend to agree with Lee but I want to add a few things. First, I've looked at several universal proteins which, presumably, have been in place since the LUCA. Presumably, they've been around for a long time in all known life forms and have had plenty of time to sample their area of stable folding sequence space. How can we test this? One way is to plot the functional information vs. sample size. Functional information can be defined as the information required to find any sequence that produces the given functional fold. As mutations, indels, etc. occur, sequence space is sampled, those areas that still enable the same fold/function can be preserved in the record of the genomes of life. If just one sequence will work, then the information required to find it is extremely high. However, if 10 sequences will work, then the functional information required to find a working sequence is reduced. For an average protein of 300 amino acids, there will not be time enough in the history of the universe to completely sample all of 300 amino acid sequence space. The question is, however, 'has there been an adequate sampling of sequence space to give us a decent idea as to its size?' A fair amount of sampling seems to have gone on, inferring from our observations that, at many sites, all common 20 amino acids are observed to occur. We can do better than this, however. What I did was to write an algorithm that computed functional information for various sample sizes and then I plotted the results. If there is an adequate sampling of sequence space, the curve should approach a horizontal asymptote after an adequate sample size has been reached. I found that one needed at least 500 sequences before the curve started to level out, although this is also dependent upon the size of the protein. 
For this reason, if PFAM cannot supply me with at least 500 sequences for a protein family, then I have not analyzed it. My preference is for at least 1,000 aligned sequences. Of course, PFAM is not error free either. It uses, among other things, a HMM for a significant part of its sorting. False positives will lower the horizontal asymptote. Also, assuming site independence will lower the asymptote as well. Bottom line: according to this method, we have many protein families which appear to offer a good representative search of sequence space such that we can get a reasonable estimate of its size. By the way, my current phase of research involves locating pairwise and higher order relationships within an entire protein family structural domain. Initial results with ubiquitin suggest that this is a more effective way to expose the 'secrets' of the structure for a given domain. In other words, I may be able to estimate the size of a given fold-set area with far fewer samples than 1,000. This, by the way, is incidental to my research. My goal is to be able to computationally predict 3-D structure of domains that have not yet had their structure determined through experimental methods such as NMR or X-ray crystallography.
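The saturation test described above (functional information plotted against sample size, looking for a horizontal asymptote) can be sketched roughly as follows. The comment does not give the formula or the algorithm, so this sketch makes its own assumptions: sites are treated as independent, the alignment has no gaps, and each site contributes log2(20) minus its observed Shannon entropy. The names `site_entropy`, `functional_bits` and `saturation_curve`, and the synthetic three-site "family", are hypothetical.

```python
import math
import random
from collections import Counter


def site_entropy(column):
    """Shannon entropy (bits) of the residues observed in one column."""
    counts = Counter(column)
    total = sum(counts.values())
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def functional_bits(alignment):
    """Sum over sites of log2(20) - H(site); assumes independent, ungapped sites."""
    length = len(alignment[0])
    return sum(
        math.log2(20) - site_entropy([seq[i] for seq in alignment])
        for i in range(length)
    )


def saturation_curve(alignment, sample_sizes, repeats=20, seed=0):
    """Mean estimate for random subsamples of each requested size."""
    rng = random.Random(seed)
    curve = []
    for n in sample_sizes:
        values = [functional_bits(rng.sample(alignment, n)) for _ in range(repeats)]
        curve.append((n, sum(values) / repeats))
    return curve


# Synthetic "family": site 1 invariant, site 2 is K or R, site 3 is any residue.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
gen = random.Random(1)
family = ["M" + gen.choice("KR") + gen.choice(AMINO_ACIDS) for _ in range(2000)]

for n, bits in saturation_curve(family, [5, 20, 100, 500]):
    print(f"{n:4d} sequences: {bits:.2f} bits")
```

In this toy case small samples underestimate the entropy of the variable site, so the estimate starts high and levels off as the sample grows. Note that the subsamples here are drawn independently, which real related sequences are not.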

2. Re. Larry's suggestion of fibrils, bridges, etc.: within a given region of stable folding sequence space, fibrils and bridges seem to exist. Think of a sponge floating on an ocean of non-stable folding sequence space. One or two, or maybe even three entirely distinct folds may be encoded within this fold-set region. I would describe it as a fold family, the exact fold which will be determined by the meta-stable states permitted by that area of sequence space. I've seen a more detailed graphical illustration of Doug Axe's work, which portrays a fold-set region as a tight cluster of islands joined by short bridges or even not joined, but still very closely situated to the main island. Darwinian processes can readily take place in a fold-set 'island' or 'cluster' to fine tune a protein for the given organism. The problem as I see it is that these fold-set clusters or islands appear to be extremely rare in sequence space such that finding another fold-set island or cluster for an average structural domain will require a random walk search across non-stable-folding sequence space that will take more trials than what we have available in the entire mutational history of organic life (in general). I doubt that the islands are uniformly distributed in sequence space so it is possible that some are attainable within a reasonable search time and others are not.

2. re. anonymous's point that simple proteins are easier to obtain than those that are not: I certainly agree. In fact, the degree of simplicity can be estimated by computing the functional information required to find them and my own results support this. As a result, instead of concerning myself with entire proteins, I am beginning to focus on structural domains, which are the simplest, stand-alone component of a protein. My working hypothesis is, 'if the structural domains can be found within a reasonable search of sequence space, then the origin of entire proteins should be vastly simplified.' I've only begun the work in that area and have only looked in detail at ubiquitin so far. I'd like to look at 50 different structural domains over the next year and only then will I be in a position to start drawing some good inferences.

3. I really cannot afford to spend this kind of time, as much as I am enjoying civil discussion on these issues. I'll have to bow out today, but please see (1) for my proposal of a live seminar chaired by Larry, within which everyone's questions and objections could be raised and, hopefully, responded to.

Anonymous said...

Btw, Timothy links to New Scholars Society, which purports to be "... an affiliation of Canadian, christian university professors. Our motto is "Petere Veritas", which means, "pursue truth"."

That's some rich irony. I love how the ID folks are desperate for the trappings of legitimacy, if not the substance. Their "Latin" motto is not only grammatically incorrect -- "to pursue truth" should be "petere veritatem" -- but the primary meaning of the verb petere is usually a hostile sense. Their motto might as well mean "to attack the truth".

SteveF said...

In terms of wandering around through sequence space, I glanced at a paper quickly today. Haven't entirely got to grips with it yet (I'm a geologist so this is way out of my area of expertise), so I might be wrong. Anyway, what the authors did was:

To each of the 1000 arbitrarily chosen sequence segments the following simple procedure was first applied. The initial segment was consecutively compared with all 20 aa fragments of the proteomic database. After the first similar sequence fragment was encountered (60% identity), the same search with this new fragment was conducted and continued with each succeeding fragment. The process stops when no new fragments could be found. The sequence segments, similar to their immediate neighbors, make a “walk”, a pair-wise connected list of 20 aa long sequences. The same procedure was applied to 1000 fragments taken from shuffled sequences.

Resulting in the conclusion that:

The results of this work suggest simple procedures for exploring the space of natural protein sequence fragments. Contrary to random sequence space, the walks in natural space are significantly longer. This means, that the natural space, for the same number of sequence fragments, has substantially more connections between similar fragments

My tentative first reading is that these connections will facilitate movement around sequence space. Could be misreading it though.

Frenkel, Z.M. and Trifonov, E.N. (2007) Walking through protein sequence space. Journal of Theoretical Biology, 244, 77-80
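The walk procedure quoted above can be sketched in a few lines. This is only one reading of the quoted description, not the authors' code. It assumes that "60% identity" means at least 60% identical positions and that each fragment is visited at most once. The names `identity`, `fragments` and `walk` are hypothetical, and the toy example uses 5-residue fragments instead of 20 to keep it readable.

```python
def identity(a, b):
    """Fraction of positions at which two equal-length fragments match."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def fragments(sequences, k=20):
    """Yield every overlapping k-residue fragment of every sequence."""
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            yield seq[i:i + k]


def walk(start, database, threshold=0.6):
    """Greedy walk: repeatedly step to the first unvisited fragment that is
    at least `threshold` identical to the current one; stop when none is found."""
    path = [start]
    visited = {start}
    current = start
    while True:
        for frag in database:
            if frag not in visited and identity(current, frag) >= threshold:
                path.append(frag)
                visited.add(frag)
                current = frag
                break
        else:
            # No new similar fragment was found: the walk ends here.
            return path


# Toy database of 5-residue fragments (the paper used 20-residue fragments).
toy_db = ["AAAAC", "AAACC", "CCCCC", "AACCC"]
print(walk("AAAAA", toy_db))
```

The comparison in the quoted passage is between walk lengths from natural fragments and from shuffled sequences. That comparison would be made by running `walk` on both sets and comparing `len(path)`.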

Larry Moran said...

Kirk Durston says,

Well Larry, since you are so interested in honesty, you did suggest 'millions' in my lecture, not thousands like you just claimed here.

I don't remember if I said millions or thousands. It doesn't matter. The point is whether you have any evidence for the total number of *possible* stable folded proteins as opposed to the total number of observed folds that have survived selection and drift.

Current research in classifying protein folds is not merely concerned with just known protein folds, contrary to what you just implied in your response.

Current research is focused almost exclusively on classifying known folds and in trying to discover whether they are evolutionarily related. There's some work on trying to identify the universe of all possible protein folds but not much compared to the work on existing folds.

I notice in your responses you refer frequently to Pfam for data in support of your case. That database only includes existing proteins, as you well know.

Finally, I am quite familiar with analyzing the amino acid sequences of protein families. In a modern protein there are many restraints on those sequences that have nothing to do with the "fold" and everything to do with function. For example, some amino acid residues are invariant (or nearly so) because they make up part of the active site of the enzyme. Others are constrained because they serve as binding sites for other enzymes or ligands.

It is perfectly reasonable to imagine that these sites were more relaxed in the past when the protein was first evolving. Thus, the amount of sequence space taken up by existing proteins is considerably less than the amount taken up by that protein before it was selected for specific functions like binding allosteric effectors or other proteins.

How do you account for this in your analysis of existing gene families?

Kirk, I'm not pretending that I have the answer to the problem. What I object to is your conclusion. You've done a few calculations that make a prediction and your prediction is falsified by the data. Therefore, you conclude that proteins are intelligently designed. That's not science.

In another posting I'm going to think about how you would present this result in your thesis. I hope you'll respond.

Anonymous said...

Not to get off topic here, but I think this is at least somewhat relevant. You are the director of the New Scholars Society. http://www.newscholars.com/ The statement of faith of your society reads, in part:

"The sole basis of our beliefs is the Bible, God's infallible written Word, the 66 books of the Old and New Testaments. We believe that it was uniquely, verbally and fully inspired by the Holy Spirit, and that it was written without error (inerrant) in the original manuscripts. It is the supreme and final authority in all matters on which it speaks."

Who would ever think such a thing. There is no possible way to know that. Even if they pretend like they talk to voices in their heads (which they do), there is no way to know that their imaginary voice friends even know what they're talking about. That's just kooky.

Anonymous said...

I look at entire protein families (no phylogenetic restrictions). I tend to agree with Lee but I want to add a few things. First, I've looked at several universal proteins which, presumably, have been in place since the LUCA. Presumably, they've been around for a long time in all known life forms and have had plenty of time to sample their area of stable folding sequence space.

No, I didn't mean that you sample only a limited amount of organisms. I meant the relatedness of the protein family. It has its own phylogenetic contingency and constraints in its evolution, as pointed out above.

A fair amount of sampling seems to have gone on, inferring from our observations that, at many sites, all common 20 amino acids are observed to occur.

Stable folded proteins aren't that rare, then?

CC said...

The more one reads Mr. Durston, the more one gets just plain weirded out by his bizarre positions on certain topics.

Here's Kirk from an earlier comment on making assumptions:

"2. I would appreciate it if individuals could abstain from speculating about what Kirk Durston believes and then posting their assumptions."

On the other hand, individuals would have no need to speculate about what Kirk believes if he just explained it clearly and concisely. Typically, the only time one needs to speculate is when the speaker/writer is maddeningly vague and evasive, as is Kirk when he continues:

"For example, the post about how old Kirk Durston thinks the universe is. There are different exegetical views on Genesis 1 & 2, and I'm not about to bet the farm on any particular view, nor do I feel qualified or knowledgeable enough to do so. The NSS permits its members to work through their own exegetical approach to the issue. I do want to note, however, that Genesis 1:1 states that 'in the beginning, God created the heavens (cosmos) and the earth and the earth was without form (presumably not yet accreted) and empty.' There is no indication whatsoever when this occurred. Therefore, I think it is exceedingly risky for theologians to decide when this happened. As for me, if the origin of the cosmos was 15 billion years ago, so be it."

Um ... wha? So how old does Kirk think the universe is? Scientifically speaking, that is. And, keep in mind, this is an eminently fair question.

If proponents of ID want to be taken seriously, then they have an obligation to demonstrate that they understand and accept the basic findings of science. And one of those basic findings is that the universe appears to be around 15 billion years old.

But notice that Kirk doesn't explicitly take that position. Instead, he makes it clear that he's not taking any position whatever, and simply admitting that, if that's the value, he's good with that.

But that's an amazingly inadequate answer and, besides, one is probably not asking Kirk what he thinks the age is from a religious perspective. Instead, one might want to know how old Kirk thinks the universe is based on his understanding of the scientific evidence, and it's really not acceptable for him to just brush it off with, "Hey, if you want 15 billion, that's fine with me."

But it's not fine with us, Kirk, as it shows an amazing contempt for what the scientific evidence shows. So if someone gets the chance, they might want to put the question directly to Kirk, "Based purely on the scientific evidence, what is your opinion of the age of the universe?"

And if the answer is anything but a firm and unambiguous, "Oh, about 15 billion years," then we have real credibility problems here.

Anonymous said...

Kurt writes: "My point was that there is evidence that the functional information in bacterial genomes is degrading."

Does Kurt mean net? Generically? Universally? Or sporadically? What's the context of that comment? The context certainly matters because the response is either "In some lineages under a particular set of conditions? Well, duh." or "An irreversible trend seen across all bacteria? Stop the presses!"

And what of the opposite? Aren't there mechanisms that reverse the degradation and increase "information" (such as duplication and horizontal transfer)?

Why not mention some of Howard's other papers on the subject:

H. Ochman, J. G. Lawrence & E. A. Groisman, Nature 405, 299-304 (18 May 2000) "Lateral gene transfer and the nature of bacterial innovation"

J. G. Lawrence and H. Ochman, PNAS Vol 95, Issue 16, 9413-9417 (August 4, 1998) "Molecular archaeology of the Escherichia coli genome"

It seems our little friends, E. coli and S. typhimurium have been sampling megabases of newly acquired sequences since their split.

Anonymous said...

Oops, sorry. I meant Kirk.

Anonymous said...

And what about the fact that almost all proteins evolve not from a random sequence but by exaptation from another protein with another function? That eliminates most of the need for "walking", doesn't it?

Anonymous said...

... and that it [the Bible] was written without error (inerrant) in the original manuscripts. It is the supreme and final authority in all matters on which it speaks."

Could someone remind me again how many books of the Bible are available in the original manuscripts? Thank you.

Torbjörn Larsson said...

Functional information can be defined as the information required to find any sequence that produces the given functional fold. As mutations, indels, etc. occur, sequence space is sampled, those areas that still enable the same fold/function can be preserved in the record of the genomes of life. If just one sequence will work, then the information required to find it is extremely high. However, if 10 sequences will work, then the functional information required to find a working sequence is reduced.

So, bar technical reasons, "functional information" is simply the likelihood for a "a working sequence"? Why not say so?

Oh, I forgot, creationists want to infer "information" from agency instead of likelihood from process.
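To put rough numbers on the "one sequence versus ten sequences" example quoted above, here is a small sketch. It assumes the quantity is the negative log2 of the fraction of all length-L sequences that work, which is one way to read the quoted definition and may not be the exact formula Kirk uses. The function name `functional_information_bits` is hypothetical.

```python
import math


def functional_information_bits(length, n_functional, alphabet=20):
    """-log2(n_functional / alphabet**length), computed in log space so that
    very long sequences do not underflow to zero."""
    if n_functional < 1:
        raise ValueError("at least one functional sequence is required")
    return length * math.log2(alphabet) - math.log2(n_functional)


one = functional_information_bits(300, 1)
ten = functional_information_bits(300, 10)
print(f"1 working sequence:   {one:.1f} bits")
print(f"10 working sequences: {ten:.1f} bits")
print(f"difference:           {one - ten:.2f} bits")
```

On this reading, going from one working sequence to ten lowers the figure by log2(10) bits regardless of protein length, which is just the log of a likelihood ratio.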