Friday, August 03, 2012

On the Evolution of New Enzymes: Completely Different Enzymes Can Catalyze Similar Reactions

It's often quite difficult to imagine how a new enzyme activity could have evolved "from scratch." After all, aren't enzymes highly complex proteins with very specific folds? What's the probability of stringing together just the right amino acids by chance in order to get a new enzyme?

In many cases, new enzymes evolve from primitive enzymes that catalyzed similar reactions [see The Evolution of Enzymes from Promiscuous Precursors]. It's quite easy to see how this could happen by gene duplication and there are tons of examples.

But what about the first primitive enzymes themselves? Presumably, they evolved all on their own. When scientists think of this problem, they usually think in terms of evolving a specific modern enzyme. This looks like a long shot, similar to the probability that a specific person will win the lottery tomorrow. What they don't realize is that this is an unnecessarily restrictive scenario.

There are many possible ways of catalyzing a given metabolic reaction. What we see today is the "lucky winner"—the one enzyme that happened to win the lottery. There were many other possible enzymes that could have evolved and that makes the overall probabilities much more reasonable. There could be a million structurally distinct proteins capable of catalyzing a particular reaction. What you should be thinking about is the probability that any one of those possible enzymes will evolve and not the probability that a specific enzyme will evolve. It's like calculating the probability that anyone in a large city will win the lottery—a much more reasonable number.
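To put rough numbers on the lottery analogy, here is a minimal Python sketch. The per-candidate probability `p_single` is a made-up value chosen only for illustration, and treating the candidates as independent is a simplifying assumption. The count of a million candidates is the "could be a million" figure from the paragraph above.

```python
import math

# Assumed, purely illustrative values (not measurements).
p_single = 1e-8           # chance that one particular candidate "wins"
n_candidates = 1_000_000  # the post's "could be a million" distinct proteins


def p_at_least_one(p: float, n: int) -> float:
    """Chance that at least one of n independent candidates wins."""
    # 1 - (1 - p)**n, written with log1p/expm1 to stay accurate for tiny p.
    return -math.expm1(n * math.log1p(-p))


print(f"one pre-specified enzyme: {p_single:.2e}")
print(f"any of the candidates:    {p_at_least_one(p_single, n_candidates):.2e}")
```

Whatever value is plugged in for `p_single`, the second number is larger by roughly the number of candidates, which is the whole point of the analogy.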

Is there any evidence that many different enzymes can catalyze a similar reaction? Yes, there are plenty of examples of completely different enzymes, existing in different species, that catalyze the same reactions. Some of these examples are very well known, such as the two different aldolase enzymes of the glycolysis pathway [Aldolase in Gluconeogenesis & Glycolysis]. That example is covered in every biochemistry textbook.

There are also examples of parallel evolution in the citric acid cycle. At the beginning of the cycle, for example, there are two completely different enzymes catalyzing the formation of acetyl-CoA [Some Bacteria Don't Need Pyruvate Dehydrogenase]. Several other reactions in the pathway are catalyzed by different enzymes in different species of bacteria.

This suggests that early forms of life evolved several different enzymes for the same reaction although one of them might have taken over because it was more efficient (or lucky). Eugene Koonin calls this non-orthologous gene displacement (NOGD) and it's one of the reasons why the set of genes common to all species is surprisingly small [The Core Genome].

The reason I decided to write about this is the discovery of some new enzymes in cyanobacteria. Cyanobacteria are photosynthetic bacteria that have the same complex photosynthesis pathway as algae and plants. In fact, the chloroplasts of algae and plants are derived from cyanobacteria.

It's been known for a long time that cyanobacteria are missing one of the common enzymes of the citric acid cycle. It's enzyme #4 in the figure at the top of the post. The common name of this enzyme is α-ketoglutarate dehydrogenase but nowadays it's called by its more formal name in the scientific literature: 2-oxoglutarate dehydrogenase. Cyanobacteria also exhibit low levels of enzyme #5 in the pathway: succinyl-CoA synthetase.

Together, these two enzymes catalyze these reactions.


A recent paper by Zhang and Bryant (2011) reports on the discovery of two new enzymes in cyanobacteria: α-ketoglutarate decarboxylase and succinic semialdehyde dehydrogenase. Together these two new enzymes catalyze the conversion of α-ketoglutarate to succinate. What this means is that cyanobacteria can complete the citric acid cycle using two enzymes that are completely different from the ones used in most other species. It's another example of parallel evolution.


This is one more example of the evolution of different enzymes catalyzing the same reaction. It's further evidence that the earliest forms of life may have evolved lots of different enzymes with similar functions and it suggests that the specific enzymes we see today are just the lucky ones that arose first. There were many other possible enzymes that could have done the job just as well.


[Photo Credit: Cyanobacteria]

Zhang, S., and Bryant, D.A. (2011) The tricarboxylic acid cycle in cyanobacteria. Science 334:1551-1553. [PubMed] [DOI: 10.1126/science.1210858]

76 comments :

Barbara said...

Really cool!

Atheistoclast said...

Don't try and weasel out of this one, Larry.

The first paragraph still holds no matter how much you want to claim that different enzymes can end up performing similar reactions. Just to follow up, there are 3.78*10^97 ways of forming a single protein domain consisting of 75 residues from all 20 amino acids. But only a tiny fraction are likely to prove functional in any way. That's what the theory of evolution is up against when accounting for the elusive origins of these peptide sequences.
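The figure quoted in this comment is simply 20 raised to the 75th power. A quick check in Python follows; the variable names are mine. Note that this only counts the size of the sequence space. It says nothing about what fraction of those sequences is functional, which is the point the rest of the thread argues about.

```python
# Count every possible sequence of 75 residues drawn from 20 amino acids.
n_amino_acids = 20
length = 75

n_sequences = n_amino_acids ** length  # exact integer arithmetic
print(f"{n_sequences:.2e}")  # 3.78e+97
```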

Mikkel Rumraket Rasmussen said...

Not this dumb shit again. Functional you say, functional in what fucking environment? To state that only a tiny fraction are going to be functional is to state you have an absolute knowledge of all possible intra- and extra-cellular environments. Further, what may be a misfolding, non-functional piece of junk in an E. coli at 37 degrees C may be a vitally important, highly functional enzyme in a bacterium at a hydrothermal vent and 70 degrees C.

There is no such thing as an absolutist view in terms of function at protein sequence space. Take a simple example like a protein that binds a specific sequence of DNA. If that protein simply has to prevent transcription, if that is its "function", then an almost inconceivable number of possible proteins can do the job, because there's an almost inconceivable number of possible combinations of bases in some given stretch of DNA whereto the protein binds. And don't even get me started on protein-to-protein binding spots.

TL/DR: You're talking bullshit again clastie.

Atheistoclast said...

In any environment. A string of 100 glycine residues is going to be junk no matter what. Why do you think peptide sequences are evolutionarily conserved?

I'll tell you, shall I? Because any variation in the amino acid characters is limited due to functional constraints. And, yes, correct folding is an important factor. The enzymes in living organisms are not made of highly specific arrangements of amino acids that cannot be produced by chance - as Larry correctly states. Deal with it!

Allan Miller said...

A string of 100 glycine residues is going to be junk no matter what.

You sure?

http://pubs.rsc.org/en/Content/ArticleLanding/2011/SM/c1sm05726j

and

http://gozips.uakron.edu/~mattice/ps674/helix.html

Repetitive acids form some of the basic scaffolding modules out of which many proteins both catalytic and structural are made. Once a module exists, it can be duplicated and inserted, ad lib. It's not just a series of hopeful pulls on an amino-acid one-armed bandit.

Richard Edwards said...

That's really interesting. Do chaperones have any tendency to increase the likelihood of different protein sequences to converge on similar structures? (I know that they can buffer the effects of mutations.) I'm not suggesting that they somehow squash completely different sequences into the same structure but wondering whether they act to reduce the total structure space that the sequence space feeds into?

chemicalscum said...

Listen IDiot, any random string of amino acids forming a peptide will have some enzymatic activity for a wide range of reactions. These activities will be very very low compared to modern functional enzymes. However, for cells in the early biotic environment of 4M years ago (you know 4M-6K years before the creation and long before humans invented your god) such low level activities were metabolically valuable. If the enzyme is replicated by a nucleic acid template, then any point mutation in the template that produces a peptide with a small improvement for a specific metabolic function that is enough to enable the cells to reproduce faster than before, will give that cell a selective advantage. Natural selection will operate on this. Slowly bit by bit over deep time modern highly specific and highly active enzymes will evolve. Now go away.

Anonymous said...

Is there anyone on the Internet that is as wrong as often as Atheistoclast? My god, it's impressive.

Diogenes said...

But Atheistoclast is invoking what I call the "fallacy that fallacies don't matter." He's already been informed that it's a fallacy to compute the probability of a pre-specified outcome (this is sometimes called the Lottery Winner fallacy, or the bridge hand fallacy.)

What you have to do is, you have to compute the cumulative probability over all possible peptide or protein sequences that have any function, not this particular function.

And yet, while he knows this is a fallacy, still Atheistoclast responds with a typical bullprob calculation:

3.78*10^97 ways of forming a single protein domain consisting of 75 residues from all 20 amino acids

Which has so many problems, I don't know where to begin. That number isn't even right even if it was correct to pre-specify a highly specific enzymatic reaction. Even in the case of modern enzymes of specific functions, you can mutate maybe two-thirds of their amino acids without affecting function (this varies from protein to protein). So even for a pre-specified function, that's way, way off.

And of course it's Lottery Winner fallacy to pre-specify the enzyme function. To sum up, you need to compute cumulative probability over:

1. All functions that are beneficial
2. All sequences that produce each function from 1.

Which are huge corrections to Atheistoclast's bullprob.
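The two-step correction listed above can be sketched as a sum over functions. Everything in this snippet is hypothetical: the peptide length, the function names and the counts of sequences per function are invented toy values, not data. It only shows the shape of the calculation.

```python
from fractions import Fraction

# Hypothetical toy numbers: none of these counts come from real data.
total_sequences = 20 ** 10  # all peptides of length 10

# Number of distinct sequences assumed to perform each beneficial function.
sequences_per_function = {
    "function_A": 5_000,
    "function_B": 120_000,
    "function_C": 40,
}

# Pre-specified outcome: one exact sequence.
p_one_sequence = Fraction(1, total_sequences)

# Cumulative outcome: any sequence with any beneficial function
# (assumes no sequence is counted under two functions).
p_any_function = Fraction(sum(sequences_per_function.values()), total_sequences)

print(float(p_one_sequence), float(p_any_function))
print(p_any_function / p_one_sequence)  # how many times larger
```

The ratio printed at the end is just the total number of acceptable sequences, so the size of the correction depends entirely on counts that nobody in this thread actually knows.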

Moreover, it's completely arbitrary that he pulled out 75 residues for the length of his protein. Protein domains can be functional with 35 residues.

Moreover, proteins are often built up from repeating motifs or elements, like secondary structure elements. A four-helix bundle fold or a beta barrel are all made out of repeating secondary structures. They could be assembled from many copies of short peptides. So even 35 is too high a threshold for a peptide of minimal function.

Atheistoclast knows that his bullprob is based on fallacious arguments. But he invokes the final fallacy, the "Fallacy that fallacies don't matter."

Sure, his probability is off by an astronomically huge number. But-- it's very small! So, a very small number must disprove evolution, right? Even if it's computed with astronomically large errors built in.

Very bad math doesn't matter when it gives big numbers!

Billy said...

Why do you think peptide sequences are evolutionarily conserved?

(Rolls eyes and facepalm)

Because they are related by descent?

I'll tell you, shall I? Because any variation in the amino acid characters is limited due to functional constraints

Tell that to a colleague who sequentially mutated the first 40 aa's of IKBa to alanine. Only 4 mutations had any effect on function.

Atheistoclast said...

Diogenes, let's do some math then!

Let us take your protein domain consisting of 35 residues. Let us also make it variable such that for every residue, 4 amino acids can be substituted there but with no effect on function. The probability of finding a functional sequence at random then becomes the inverse of: (20/4)^35 = 2.91 * 10^24.

As for the variability of enzyme sequences, you are correct. Up to 2/3 of the amino acids of the Jingwei alcohol dehydrogenase gene (derived from retroposed Adh) have mutated. But the core reactive chemistry consisting of about 90 residues has been stringently preserved. Hence, I am justified in using the example of conserved protein domains.

It is true that within protein motifs, there are sub-motifs. But they all have to synergistically function together. You can reduce the function of a protein only so much.
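The toy calculation in this comment works out to 5 raised to the 35th power, since 4 of the 20 amino acids are allowed at each of 35 positions. A short Python check, using the comment's own assumptions (the variable names are mine):

```python
# Toy model from the comment: 35 positions, each tolerating 4 of 20 amino acids.
length = 35
allowed_per_position = 4
n_amino_acids = 20

total = n_amino_acids ** length            # all possible sequences
functional = allowed_per_position ** length  # sequences the model accepts

# Odds against a single random draw matching at every position.
print(f"1 in {total / functional:.2e}")  # 1 in 2.91e+24
```

This is the chance for one random draw under one pre-specified function. It does not account for the number of draws, for other functions, or for positions that tolerate any residue.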

Atheistoclast said...

If peptide sequences were so variable in their amino acid characters, we would see big differences between species. Why is something like the homeodomain, or actin for that matter, evolutionarily conserved to the extent that 80% of the sequence is the same in all taxa? I find it very difficult to believe that a sequence can tolerate substitutions up to 90%.

Billy said...

I find it very difficult to believe that a sequence can tolerate substitutions up to 90%.

What you have trouble believing is irrelevant. The fact is, the experiments have been done and that is the result.

Atheistoclast said...

Did your "colleague" publish his "finding"? Was this some in vitro test that has no real-world connection?

Allan Miller said...

Why is something like the homeodomain, or actin for that matter, evolutionarily conserved to the extent that 80% of the sequence is the same in all taxa? I find it very difficult to believe that a sequence can tolerate substitutions up to 90%.

These are not catalytic proteins, enzymes, but regulatory/structural. Such proteins are typically more highly conserved. Among the most conserved of all, for example, are histones. Both they and the homeodomain have fundamental roles in DNA management. Once established, and further functions develop in their presence, there is very little scope for amendment without fatally disabling DNA maintenance. They were not necessarily tightly specified on Day 1, but have become central and conserved. Likewise actin, a fundamental cytoskeletal component.

There is much more lability possible with catalytic enzymes, generally. But not to the point where descendants should have obliterated all trace of ancestry, which is a separate signal you are confusing with constraint. As in "If peptide sequences were so variable in their amino acid characters, we would see big differences between species."

Billy said...

Why do you use colleague and finding in inverted commas?

Negative results like this are of no general interest, so tend not to be published (which means there are more examples like this that you will never know of). It is however in his thesis (Control of NF-kappa B Dependent Transcription by I Kappa B Alpha and P53, University of St Andrews), which effectively has been peer reviewed. You can even request a copy of it if you wish, but I doubt you'll read it.

The 36 non disruptive mutants all worked in cells and showed identical behaviour to wild type protein - so yes, it had "real world connection"

Billy said...

These are not catalytic proteins....

Actually, 20% is a big difference. I wonder if the creationist can tell us how long it would take to acquire that level of mutation? What he also omitted to mention was that the homeodomain can bind DNA nonspecifically too (as can lots of DNA binding proteins).
A quick BLAST search on the homeodomain sequence reveals that there are not hits available for all taxa, so either he is using Wikipedia or just making stuff up - like the stuff others refuted above.

Atheistoclast said...

True enough. But you will find that the really important part of enzymes - the active site located in the catalytic cleft - is even more conserved than DNA-binding domains. Like I say, take the case of the Jingwei gene that codes for an alcohol dehydrogenase and see for yourself.

Billy said...

As in Jingwei the protein that is the result of a gene fusion event and acquired different functions because of this?

Mikkel Rumraket Rasmussen said...

Hence, I am justified in using the example of conserved protein domains.
No you aren't, because your conserved domain constitutes a "hilltop" that selection originally either had to climb to from some nearby valley (meaning that the reason it's conserved is that it's best at the job, not that it's the ONLY possible choice), or could reach by drift from an alternative, possibly entirely different and unrelated catalytic activity "nearby".

It is true that within protein motifs, there are sub-motifs. But they all have to synergistically function together. You can reduce the function of a protein only so much.
Yes, and then some of them will become non-functional in their extant environments, or as they lose their original activity they gain a new one.

The recurrent theme in ALL your posts is that you keep assuming an absolutist view of protein sequence space where if the current function is broken, then the protein is completely dead and useless. This is wrong and everyone here has been telling you how and why now several times. I've been telling you this for ages, they've been telling you this for even longer on talkrational and you've been ignoring it on pandasthump for at least as long. Stop being so fucking dishonest.

Richard Edwards said...

Atheistoclast, why stop at actin? Ubiquitin is almost 100% identical across all eukaryotes. Clearly, then, NO substitutions are ever allowed.

Proteins vary wildly in their level of functional constraint and it is very rare for any protein to be 80% conserved "across all taxa". Indeed, many human proteins aren't even 80% conserved across all mammals. (Once you go out to all vertebrates, most are well under 80% identity - and that represents maybe 1/7 of life's history.)

Obviously, if you only look at the "important parts of enzymes" those will be more conserved - but how much is important? Often, for enzymatic activity, only a handful of residues are key for activity and the rest of the conservation is structural - and structures can be conserved across vastly divergent proteins at the sequence level to the extent that sometimes you only realise they are homologous when you can align based on structure.

The thing that really seems to constrain sequence evolution a lot is protein-protein interactions, as these frequently involve a lot more sites than small-molecule interactions. Very highly conserved proteins - such as ubiquitin and actin - are probably so highly constrained because they interact with so many proteins. Something that does not - or has recently evolved from non-coding DNA - would not have this level of constraint. Neither do largely unstructured proteins. It's analogous to why the Genetic Code is largely fixed - any changes would adversely affect too many things at once.

Richard Edwards said...

1. Maths where you pluck assumptions from thin air is pretty meaningless.

2. In your toy example, I think you mean the probability of finding an identically functioning sequence, which is very, very, different to finding "a functional sequence" (i.e. any). (See comment above that you are "replying" to!)

3. One example protein does not extrapolate to all proteins. In fact, to have any weight, you need to take the LEAST constrained enzyme we know about and extrapolate from that. You can't point at a 5' person, do some rough maths and conclude that it is impossible for NBA players to dunk. (Well, you can but guess how you would be received.)

4. Why does the random sequence have to appear fully functional?

5+. What everyone else has said. Just read Rumraket's comment and realised I am probably wasting my time...

Allan Miller said...

The recurrent theme in ALL your posts is that you keep assuming an absolutist view of protein sequence space where if the current function is broken, then the protein is completely dead and useless.

A common feature of such arguments, trotted out and refuted ad nauseam from Hoyle onwards, is that people seem to mislead themselves by their mental model of 'protein space'. Protein and nucleic acid monomers are given letters, we know how unlikely it would be to get long meaningful English sentences to mutate into others, or arise from scratch, ergo evolution impossible. Someone has to write the sentences, and they have to be spelt rite.

But there is no requirement that the notional spaces containing all possibilities of n-letters-from-v-variants have any relationship in terms of their density of 'well-formed strings', nor the clustering of such strings. All such spaces have v^n positions, and the probability of randomly hitting a particular one is 1 in v^n. Simply by increasing v or n, one can increase the post-hoc wonderment massively. The chance of hitting that sequence is so tiny that if you wrote all the zeroes out ... yadda yadda yadda. The real statistic of interest is the proportion of 'well-formed strings' in the whole space. And of relevance to evolution is the local clustering of other functional strings. Who gives a shit if most of long-string protein space gives an amorphous blob?

If there were one amino acid in the world, and you picked n 'randomly', the chance of getting a specific string n bits long would be 1. But what use is such a peptide? Well, as Atheistoclast inadvertently illustrated, polymeric acids have properties. They can perform structural roles. Add an amino acid to the library, and you have 2^n possibilities. Another makes 3^n, and so on. These spaces rapidly get bigger, but they are never explored by lottery. So how small was the first peptide with catalytic functional value to the organism that made it? How much catalytic function (of some kind) was there in its neighbourhood?

That is, of course, the $64,000 question(s). But determination of phase space based upon a 20-acid library - because modern proteins incorporate 20 acids - is misleading. How does adding acids to the library - making v^n bigger - make evolution less likely?
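The v^n bookkeeping in this comment is easy to tabulate. In the sketch below, the length `n = 10` is an arbitrary choice for illustration and `space_size` is a name I made up. It shows how the space grows as monomers are added to the library, which is all the raw number tells you.

```python
def space_size(v: int, n: int) -> int:
    """Number of distinct strings of length n over a library of v monomers."""
    return v ** n


n = 10  # hypothetical peptide length
for v in (1, 2, 3, 20):
    size = space_size(v, n)
    print(f"v={v:>2}: {size} possible strings, "
          f"chance of one specific string = 1 in {size}")
```

The "1 in v^n" figure balloons with v, yet nothing about the proportion or clustering of useful strings has been measured, which is the statistic the comment says actually matters.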

Atheistoclast said...

The point I am making is that the core reactive chemistry behind enzymes is always conserved in terms of the amino acid characters. We see this in even genes that have undergone "rapid evolution" such as Jingwei. Yes, sometimes just one residue is key, but the other residues provide the context that allows it to behave in the way that it does.

Thanks for pointing out the case of ubiquitin. Yes, it is ultra-conserved in all eukaryotes and protein-protein interactions do indeed constrain evolution. But you still need to explain how such a highly specific sequence of around 75 residues could have emerged. Are you really suggesting it was fortuitously fashioned from non-coding DNA?

Atheistoclast said...

1. The Math is good.

2. Well, in order to find any functional sequence, we just need to consider the number of all known peptide sequences for a given length. That lowers things somewhat, but not where it becomes anything other than highly improbable.

3. You still need to be able to account for the evolutionary origination of each and every specific functional sequence, not just one in general.

muhaha.

Richard Edwards said...

No. Read the comment of chemicalscum. Do you even understand how evolution works? Or how many millions of years (and generations and populations) it has been going to reach its current state?

Allan Miller said...

How essential is ubiquitin's sequence for its function? It's a molecular label. It has generic function, not specific. But it is now embedded deep within cellular mechanisms, so options for non-catastrophically changing that label are very limited. But how many other 75-residue peptides could have done a similar job, at the time it first arose? I don't know, but neither do you, yet your demand is equivalent to claiming that unless someone can show the number was x, it must have been 1.

It's a bit like http://, although not entirely as arbitrary. Try changing that tag and see what happens to your web connectivity. "But it's highly conserved. Explain how that one string out of 45^7 possibilities could have emerged".

Once again, you are picking fundamental structural/regulatory proteins and arguing that their constraints can be applied to the world of enzymes.

Richard Edwards said...

1. The maths is not good.

2. The chances of stumbling across a handful of residues in the right conformation for activity is nothing like the scenario you present. I study convergently evolved protein motifs and they occur a lot. (These are interaction motifs and not catalytic sites but a lot of functionality is driven by protein interactions so it is still relevant.) In fact, the longer a peptide sequence gets, the higher the chance that it will contain a small subset of functional residues. If this is beneficial, selection and imperfect replication can do the rest.

3. No, I do not. I need to demonstrate that, in principle, your objection is flawed. I'm not even sure I need to do that. You need to demonstrate how an alternative explanation of the observed data makes more sense. Any sense would be a start.

The good news for Larry, I guess, is that if a Creator god or Intelligent Designer exists, it must really love biochemists. Designing some similar proteins with different (or promiscuous) function and some very different proteins with similar function can only be driven by a desire to make sure that to be sure of protein function, we have to do the biochemistry and we cannot just tell by looking at it. You should definitely remove the "Intelligent" from ID - such a random and haphazard approach is not intelligent design. (Unless it was a committee working to a tight deadline who started cutting corners at the end.)

Richard Edwards said...

Oooh, I like the http analogy. (Very timely too, with Tim Berners-Lee at the Olympic opening ceremony.) I'll be using that one in future.

Mikkel Rumraket Rasmussen said...

Nice http:// analogy Allan, I'm going to use that in the future :)

It's really spot-on. It's not that there's something special and unique about the symbol combination h-t-t-p-:-/-/ that makes it possible for it to do its "job", that's not the reason it's "universally conserved". It simply got there first and going back and trying to change it now would require reworking so much underlying software to get it all working again.

Atheistoclast said...

1. Then what's your math? Even getting 6 residues in position together is on average 1 in 20^6, i.e. 1 in 64 million. If a methionine or tryptophan amino acid is involved, this could be higher.

2. And it isn't just a matter of having a few key residues in the right combination. They need to be in the right part of the sequence, particularly given how proteins fold. You are being overly reductionist in your thinking. You need to consider context specificity and how everything relates to each other.

3. You fail to understand that life needs to be robust. If an enzyme can do the work of another, should there be some problem, then that is a good thing. Understand redundancy.
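For reference, the 20^6 figure and the effect of peptide length raised earlier in this exchange can be put side by side. This is only a sketch under strong assumptions: the motif is one specific contiguous run of 6 residues, all 20 amino acids are equally likely and independent, and folding context is ignored. The function names are mine.

```python
def motif_odds(k: int, alphabet: int = 20) -> int:
    """Number of equally likely k-residue strings (the '1 in N' figure)."""
    return alphabet ** k


def expected_hits(seq_len: int, k: int, alphabet: int = 20) -> float:
    """Expected count of one specific contiguous k-mer in a random sequence.

    Assumes uniform, independent residues; by linearity of expectation this
    is (number of windows) / alphabet**k.
    """
    windows = max(seq_len - k + 1, 0)
    return windows / motif_odds(k, alphabet)


print(motif_odds(6))  # 64000000
for seq_len in (6, 100, 1000):
    print(seq_len, expected_hits(seq_len, 6))
```

The expected count scales with the number of windows, so a longer sequence gives proportionally more chances for the same short motif to turn up somewhere.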

Larry Moran said...

Is there anyone on the Internet that is as wrong as often as Atheistoclast? My god, it's impressive.

lol

Maybe we should start ranking the IDiots using the Bozorgmehr scale where you get a score of 10 if you are wrong as often as he is.

I think there are some Idiots out there who might get a 9.5 or even a 9.6 but that's as close as they get.

Larry Moran said...

Chaperones can affect the RATE of folding but not the final shape of the protein.

Larry Moran said...

@Joe Bozorgmehr,

I know I'm probably wasting my time, but do you have an explanation for the origin of proteins? Please share with us your views on how the first proteins formed in primitive prokaryotic cells about 3.5 billion years ago.

Richard Edwards said...

I thought they also helped to prevent mis-folding? E.g. http://www.ncbi.nlm.nih.gov/pubmed/19494908

Atheistoclast said...

Larry, I assume the problem of the primordial origins of proteins is something best left for the origin of life research. LUCA is estimated to have had at the very least 250 genes, with most of them coding for proteins. Frankly, speculating on this elusive subject is well beyond my ken.

If you want to know how new classes of proteins, and in particular new protein motifs, originated then I have a paper right now under review which discusses this. I believe that these domains can be easily created through directed evolution by artificial selection, involving a lot of intragenic recombination and gene flow within a population.

Mikkel Rumraket Rasmussen said...

Is there anyone on the Internet that is as wrong as often as Atheistoclast? My god, it's impressive.
You have to love the 100xglycine can NEVER be functional blanket blind assertion.

Then Allan digs up a paper with proteins with over 200 repeats of glycine. This is comedy.

I like the idea of the Bozorgmehr scale of CreIDiocy, abbreviated BOZO. "David Klinghoffer has 7.3 BOZOs"

Anonymous said...

Yes, but preventing misfolding is about preventing the protein from aggregating, and/or from sticking in some local minimum. Not about different sequences converging to a structure determined by the chaperone.

Diogenes said...

@Atheistoclast-
How many times do we have to repeat this: every part of your math is bad? And if just one part of your math is bad, we don't care a bit what your bullprob value is?

What you are doing is the typical creationist trick of taking the conservation of one kind of protein (small, DNA-binding) and then asserting that level of conservation applies to all proteins, even very big ones, even ones that don't interact with big substrates like DNA-- which is invalid for many reasons-- for one thing, experimental evidence shows that no big proteins are conserved as highly as small DNA-binding proteins (here by "as conserved" I mean the fraction of conserved residues / all residues.)

As I already explained, the fraction of conserved residues in proteins varies a great deal from one protein to the next.

It's apples vs. oranges. You can't compare small protein domains to big ones, and you can't compare proteins that interact with big things (like DNA) to proteins that interact with, say, a sugar or a calcium ion.

Moreover, it's completely vague what evolutionary step or steps Atheistoclast is computing the probability for. The emergence of the first protein? OK, then you must sum over all sequences that would confer all functions that would be beneficial to all forms of self-replicating chemistry.

But he's applying the level of conservation seen in highly specific proteins some 550 million years after the Cambrian explosion, and saying that level of conservation must have existed in the first protein that was beneficial to a self-replicating system-- in what? The RNA world, or what?

A small protein that interacts with a big substrate (e.g. DNA) may be highly conserved. But a big protein that interacts with a small substrate will be mostly variable.

Let us also make it variable such that for every residue, 4 amino acids can be substituted there but with no effect on function.

Nonsense-- most proteins are not conserved at every single residue, and in fact I don't know of any 35-residue proteins that are conserved at every residue, except those which interact with very large substrates like DNA, and maybe those which interact with multiple proteins like SH3 domains. (The reason why is explained below.) If I'm wrong give me a counter-example with a reference to a paper showing a sequence alignment. I'd like to see the sequence alignment.

Let's be clear that I'm using standard nomenclature here.

Non-conserved - any residue will do at that position
Conserved - only certain residues will do
Invariant - only one residue will do at that position

So Atheistoclast was wrong to start out by assuming that all residues in a 75-residue protein were invariant. Now, he is wrong to assume that every single position in a 35-residue protein must be conserved to a subset of 4 amino acid types. But such proteins are almost unheard of-- except those which interact with very large substrates like DNA, and maybe those which interact with multiple proteins like SH3 domains.

While such proteins exist, they're rare, and there's no reason to believe the first proteins would necessarily have to have that property.

As I already mentioned, size of protein vs. size of substrate will affect the fraction of conserved residues.

Diogenes said...

Another factor is the overall size of the protein domain. Why? The fraction of surface residues vs. core residues will affect level of conservation. In a small protein, there will be a high surface-to-core ratio. In a large protein, the ratio of surface-to-core will be lower.

A. Residues on the surface that do not interact with substrates are least likely to be conserved.

B. Residues on the surface that interact with substrates are most likely to be conserved.

C. For residues buried in the core, it's more complicated, but if they're in a salt bridge (paired charges), they're highly conserved. If a change produces a large change in the volume of a side chain within the core, it will not be allowed.
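
The size argument can be illustrated with a crude geometric sketch: treat the folded domain as a sphere and ask what fraction of its volume lies within a thin outer shell. Everything numeric here is an assumption for illustration only (a spherical protein, roughly 130 cubic angstroms per residue, a 5 angstrom "surface" shell); real proteins are not spheres, so only the trend should be taken from it.

```python
import math

def surface_fraction(n_residues, residue_volume=130.0, shell_thickness=5.0):
    """Fraction of a spherical protein's volume lying in its outer shell.

    residue_volume (cubic angstroms) and shell_thickness (angstroms) are
    rough illustrative assumptions, not measured values.
    """
    volume = n_residues * residue_volume
    radius = (3.0 * volume / (4.0 * math.pi)) ** (1.0 / 3.0)
    core_radius = max(radius - shell_thickness, 0.0)
    return 1.0 - (core_radius / radius) ** 3

# Surface share shrinks as the domain gets larger.
for n in (35, 75, 272, 1000):
    print(n, round(surface_fraction(n), 2))
```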

Anyway, Atheistoclast is wrong to assume that 100% of all residues in a protein will be conserved.

Atheistoclast is also wrong to take the conservation level of small proteins that interact with big substrates (like homeobox) and assert that it applies to all proteins, including the earliest proteins.

Many experimental results make hash of Atheistoclast's claims of 100% conservation in proteins.

No one here has yet mentioned Wells' research into alanine scanning mutagenesis of protein-protein binding sites. Wells systematically replaced every single residue on the surfaces of protein-binding proteins with alanine (basically amputating every side chain, one by one) and typically he found that only one or two residues-- the "hot spot"-- were responsible for most of the binding energy.

I would like a reference to Jingwei alcohol dehydrogenase, because I simply do not believe that one protein domain of an enzyme can have 90 invariant residues, as claimed. I've seen hundreds of sequence alignments and I've never seen that. I'd like a reference to a paper containing a sequence alignment.

Allan Miller said...

1. Then what's your math? Even getting 6 residues in position together is on average 1 in 20^6, i.e. 1 in 64 million.

Do you find that people just end up telling you where to go? I'm sure someone has been through this. Texas sharpshooter fallacy. And ... ratcheting. A more detailed response depends what you are using as your evidence of specificity - evolutionary conservation or actual experimental substitution and assay. You'd have to be more specific.

Just pulling that 20^n rabbit out the hat - numeroproctology, as someone memorably called it on talk.origins

2. And it isn't just a matter of having a few key residues in the right combination. They need to be in the right part of the sequence, particularly given how proteins fold.

Yes, incredible, isn't it? So the functional density of the region of sequence space accessible by all known means of mutational modification to peptide x is ... ?

3. You fail to understand that life needs to be robust. If an enzyme can do the work of another, should there be some problem, then that is a good thing. Understand redundancy.

Should there be a problem, the usual response is that the cell dies, and others that don't have a problem carry on regardless. Redundancy needs to be regularly exercised to beneficial effect to retain its place in the genome - otherwise it would just decay, effectively neutral.

Frankly, speculating on this elusive subject is well beyond my ken.

You aren't usually so coy!

The whole truth said...

atheistoclast,

Are you claiming that a supernatural designer-god is actively directing evolution by tinkering with proteins, recombination, gene flow, etc.?

You said that speculating on primordial origins of proteins is well beyond your ken, but haven't you heard that "ID is all about origins"? That's what joe g says, and he claims to know all about ID. It's also what you say, whether you'll admit it or not. Everything you say boils down to questioning or asserting ultimate origins. You use a lot of sciency sounding words but what you're really saying is 'You evolutionists can't prove where everything came from originally.'

For instance, this comment of yours:

"3. You still need to be able to account for the evolutionary origination of each and every specific functional sequences, not just one in general."

In other words, 'Evolutionists must show the ultimate origin of the very first thing that ever happened and absolutely every single teeny tiny step that has occurred since.'

And of course it's obvious that you are really really really trying to find gaps that you can force fit your chosen god into, and especially the ultimate origins 'gap'.

Richard Edwards said...

But therefore they DO affect the final shape of the protein? By promoting stable folding, are chaperones not biasing the structure space towards globular folds, thereby increasing the likelihood that a protein of a given sequence will adopt certain conformations over others?

Whether or not this has a big impact on the probability that sequences can converge on function, the other thing that Atheistoclast and friends always seem to forget is that proteins are dynamic entities that are frequently changing conformation. They are not rigid structures stuck in a single pose - the fact that we get this in crystal structures is just an artefact of growing crystals. The more flexible a given protein is, the more it will explore its local structure space and the more likely it is to bind a substrate. Many proteins then change conformation and/or are stabilised upon binding.

The idea that a monomeric protein would have to stumble upon an exact shape is just nonsense.

Atheistoclast said...

All I am prepared to say is that directed evolution by artificial selection overcomes the problem of a rough fitness landscape within sequence space. I refuse to speculate on who or what the director or artificer is. But I have my suspicions.

Anonymous said...

If by affect you mean avoid local minima for "optimal" minima then yes. But not the same as making different proteins converge into a single structure.

Agreed about structure. Measurements of permanence time for water molecules buried well within protein structures are expressed in milliseconds.

Atheistoclast is just an imbecile who thinks that research consists of reading Wikipedia, a few abstracts, especially if it is old scientific literature, and looking at the sentences rather than at the data, while avoiding at all costs looking at articles, and the data, that prove him wrong. If he is truly a PhD student, then the educational system is collapsing. We have too many incompetents in graduate school.

Atheistoclast said...

@Diogenes:

I never said that all residues in a peptide sequence will be conserved! That is hardly ever the case. But it is true that critical sub-motifs of between 3-6 residues tend to be ultra-conserved.

As for Jingwei, I suggest reading these papers:

Wang, W.; Zhang, J.; Alvarez, C.; Llopart, A.; Long, M. The origin of the Jingwei gene and the complex modular structure of its parental gene, yellow emperor, in Drosophila melanogaster. Mol Biol Evol 2000, 17, 1294–1301.

Zhang, J.; Yang, H.; Long, M.; Li, L.; Dean, A.M. Evolution of enzymatic activities of testis-specific short-chain dehydrogenase/ reductase in Drosophila. J Mol Evol 2010, 71, 241–249.

Any BLAST search will show that 92 of the original 272 residues of retroposed Adh have been conserved.

Richard Edwards said...

"Any BLAST search will show that 92 of the original 272 residues of retroposed Adh have been conserved."

Are you saying that if you compare these two Drosophila proteins, they share 92 identical residues? This just means that 92 residues have not yet changed. Observed "conservation" is determined by both functional constraint and TIME. Just because 99% of a human protein is conserved in chimp, that does not mean that 99% of that protein is under functional constraint.
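
The role of time can be shown with a minimal sketch. Assume, purely for illustration, that every site is completely free to change, that substitutions arrive as a Poisson process at the same rate at every site, and that reversions are ignored. The expected fraction of sites that still look "conserved" is then exp(-k), where k is the expected number of substitutions per site.

```python
import math

def expected_unchanged_fraction(subs_per_site):
    """Expected fraction of sites with zero substitutions (Poisson model).

    Assumes every site is free to change at the same rate and ignores
    reversions, so any identity here reflects elapsed time alone.
    """
    return math.exp(-subs_per_site)

for k in (0.01, 0.1, 0.5, 1.0, 3.0):
    frac = expected_unchanged_fraction(k)
    print(f"{k:>4} substitutions/site -> {frac:.1%} identical")
```

Under these assumptions, a pair of sequences with very little divergence shows very high identity with no functional constraint at all, which is why identity by itself cannot be read as constraint.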

It is a large (and erroneous) jump from the observation that a pair of homologues share 92/272 amino acids to stating that "the core reactive chemistry consisting of about 90 residues has been stringently preserved". (From your previous use of "preserved", I am assuming that by "conserved" you mean totally conserved, i.e. invariant.)

Richard Edwards said...

Sorry, I did not mean to imply convergence on a single structure. I simply meant that if chaperones promoted a certain type of folding, it might increase the tendency for different sequences to adopt similar kinds of structures (e.g. globular folds), which might in turn increase the likelihood of functional convergence. The increase might be trivially small and irrelevant, though. (I don't know that much about protein folding, to be honest - the bits of proteins I work on tend to be intrinsically unstructured.)

Anonymous said...

So you are another victim of Keith's! (An indirect one it would seem.) :-)

Richard Edwards said...

I suppose I am - although I would say "beneficiary" rather than "victim". :op My training was in Genetics rather than Biochemistry so I have always had more of a general focus on how sequence relates to function rather than any universal notion of structure=function. It never even occurred to me that people would be resistant to the notion that parts of proteins - particularly the ends and the linkers between globular domains - were unstructured or highly flexible.

I suppose the idea of fully intrinsically unstructured proteins is a bit more surprising, given the importance of the Unfolded Protein Response in stress etc. but then these are proteins with very different properties to an unfolded globular protein. Whatever the underlying biology, the concept is certainly very useful in the field of protein-peptide interactions. Predicted disorder filters are very effective at improving the prediction and detection of short, linear motifs in proteins.

The whole truth said...

atheistoclast said:

"All I am prepared to say is that directed evolution by artificial selection overcomes the problem of a rough fitness landscape within sequence space. I refuse to speculate on who or what the director or artificer is. But I have my suspicions."

So, you have your "suspicions", eh? And you say that it would be speculation for you to reveal your suspicions, and that you refuse to do that, even though you speculate about plenty of other things.

Are you afraid to reveal your "suspicions" about who or what the alleged director or artificer is because you believe it's the imaginary christian god or some other imaginary god and your real agenda will then be apparent? Do you really believe that it's not apparent already?

Why do you IDiots hide behind a sciency sounding facade? Why are you so afraid of honestly revealing your actual beliefs and agenda? Why do you all play immature, dishonest, sneaky games and why do you think that anyone with a clue will buy the snake oil you're trying to sell?

You and the other IDiots/creationists can argue against science or evolution or certain aspects of science or evolution until the end of time but you'll never get respect from rational, intelligent, scientifically literate people because you're fundamentally dishonest, and it shows. Your biggest mistake is thinking that you're fooling anyone with a clue.

It's hard to think of anything more maddening than trying to deal with people who constantly lie and/or try to hide their religious beliefs and agenda, and especially when those people go on and on and on about how they have the corner on good morals.

And even IF you or another IDiot were right about some particular thing regarding evolution you'd have a hard time convincing anyone and you should understand why. Rational, honest people don't trust or respect liars and sneaky game players.

If you IDiots want respect you have to earn it, and that means that at the very least you'll all have to be open and honest about your beliefs and agenda at all times. Even if you were to do that you still may not get respect but you might have a chance at it. If you IDiots keep up the sciency sounding but completely dishonest and bogus 'Intelligent Design inference/hypothesis/theory' crap (which is just creationism relabeled) you'll be thought of and treated as lying, arrogant religious zealots who will say and do anything to destroy science and control the world.

Allan Miller said...

All I am prepared to say is that directed evolution by artificial selection overcomes the problem of a rough fitness landscape within sequence space.

Fitness landscapes and sequence space don't map very comfortably onto each other. But let's try it. Picture a sequence space with one pin-prick - a current functional peptide. It holds its place by being exactly replicated. Now turn on the lights in the rest of sequence space. Every member of the space that performs the current function to some extent is illuminated. Now colour the lights in relation to the selective advantage wrt the starting peptide - shades of red for worse, shades of green for better, white for neutral. What's the pattern? You, I suspect, see a set of isolated dots, separated by vast swathes of empty space, and most of those dots are red - even if hit, fitness would be lower. But on what grounds?

The accessible space could as readily be a wonderland of green, ripe for exploration. And if our starting peptide had simply happened to stumble into this region of function, that is a more likely scenario. A leap from its current position to a 'green' spot would instantly reduce the amount of green in the landscape, and turn up the red. As the green region inexorably shrinks, the landscape gets more patchy - more rugged, in the more conventional geographic model. The peptide is getting as good as it is likely to. Most amendments make it worse - and especially those to certain key sites. It still changes, by exploration of the white area, which may lead it to stumble into a more distant region of green. And of course, the colouring is never static - advantage is always in a state of flux, depending on what is happening 'out there'.

And my long-winded point would be: you can't determine the actual structure of the space just by thinking about it. Your designer would need to be able to mentally fold the proteins, determine their role within the wider cellular biochemistry and the world at large, determine the advantage that would accrue, ensure that a population of such amended individuals built up, and deal with the ecological consequences of the change ... a huge computational task, and an awful lot of trouble to go to, for a job that mutation/recombination and selection/drift are entirely capable of achieving all by their little selves.

The ultimate 'objective' is survival in the environment. Letting the environment choose between variants is a good design approach to that ... ummm ... goal.

Atheistoclast said...

@the whole truth:

I don't want to play games. What I personally believe, whether I am an atheist or a fundamentalist, doesn't matter.

Some might claim that there is a self-organizing principle within Nature that can account for any creativity. Others prefer a more supernatural explanation. Others still insist that the laws of physics and chemistry are sufficient to explain all novelty. But that is a philosophical debate which is tangential to the issue at hand.

But I see no evidence that differential reproduction and chance can alone explain the enormous degree of complexity and specificity evident in molecular biology and that some other explanation is required. That is perfectly reasonable.

Atheistoclast said...

@cabbagesofdoom:

It isn't just jingwei. It is other genes of retroposed origin such as Adh-Twain and Adh-Finnegan that share these 90 or so residues - most of which are clustered together. If you bother to read the papers I cited, it is clear that these residues contribute to the core reactive chemistry of the alcohol dehydrogenase/reductase. Tests with dN/dS reveal that this region has indeed been subject to purifying selection.

Richard Edwards said...

OK, perhaps I am confused by what you are saying. The SDR family to which jingwei and other insect ADH proteins belongs does not show anywhere near this level of conservation. (See http://pfam.sanger.ac.uk//family/PF00106.20.) I presume you are referring to some more specific chemistry? (Is that still "core"?)

You are right, though, I should read the papers and see what they actually say. (They're not Open Access and I don't have VPN set up on this machine, unfortunately.)

Either way, from the abstract, the family seems to be a nice example of protein evolution under the standard "duplication" model, where a protein duplicates and then specialises. (I can't tell from the abstract whether this is a case of pre-duplication promiscuity or post-duplication neofunctionalisation. Larry had a nice post about this a couple of days ago.) As a result, it has very little to tell us about de novo origins of new enzymes, which is what the discussion appeared to be about. If this is not what you are talking about then perhaps this accounts for the confusion.

Richard Edwards said...

OK, I tried looking at: Zhang, J.; Yang, H.; Long, M.; Li, L.; Dean, A.M. Evolution of enzymatic activities of testis-specific short-chain dehydrogenase/ reductase in Drosophila. J Mol Evol 2010, 71, 241–249.

It does not directly support any of your statements. Indeed, Table 3 indicates that the protein has been subject to quite strong positive selection in the D. yakuba lineage (dN/dS > 2.5) and that - due to the relative youth of the duplication - there have not actually been that many substitutions.

Perhaps you would be gracious enough to explicitly state which section/figure of which paper leads you to conclude that "the core reactive chemistry consisting of about 90 residues has been stringently preserved"? I'm afraid that the current level of clarity and understanding you have demonstrated does not incline me to go hunting to try and work out what you actually mean.

Atheistoclast said...

@cabbagesofdoom:

Just so we are referring to the same paper:

http://yakuba.uchicago.edu/longlab/sites/default/files/Evolution_enzymatic_JME.pdf

You will just have to do the sequence alignment yourself but it is available on page 8, Figure 3 of my own paper:

http://onlinelibrary.wiley.com/doi/10.1002/cplx.20365/abstract

From the Long paper, I provide the following excerpt beginning on page 247:


Our analyses show that like all SDRs, JGW, and its parental Drosophila ADH share a common protein structure with a typical Rossmann fold containing a canonical GlyXaaXaaGlyXaaGly NAD+/NADP+ binding motif, a conserved aspartate residue (Asp38) that regulates coenzyme binding, and a Ser-Tyr-Lys catalytic triad (Benach et al. 1998, 1999). The observed enzymatic properties of JGW, such as the preference for short-chain secondary alcohols and the inability to use methanol or alcohols with negatively charged groups, further confirm that JGW has preserved the SDR’s active site and reaction mechanism.

Richard Edwards said...

Did that paper go through actual peer review from biologists? The formatting of Figure 3 is pretty poor. In future, you might like to make it clearer. (It's not divided into blocks, so it looks like you are aligning the first 100 aa of ADH with the second 100 aa etc. You also fail to mark any indicators of conservation, which makes it hard to analyse in the context of your claims. Try splitting into blocks and adding shading or Clustal-style indicators of conservation to help make your point. You also need to specify accession numbers somewhere (maybe I missed those) and indicate the actual species in the legend. Saying "different species" is not sufficient.)

Happily, I am very used to looking at awkward alignments from students and it is still obvious that this does not support your statement. Again, this alignment (and the JME paper, which is the one I looked at) are just dealing with the evolution since duplication and are therefore constrained by both function and time.

These are all fairly recent duplication events in restricted taxa: jingwei = teissieri/yakuba; Adh-twain = obscura group; Adh-Finnegan = repleta group. They all probably occurred in the last 20 million years or so - much less for jingwei.

To jump from the observation that "JGW has preserved the SDR’s active site and reaction mechanism" to the statement that "the core reactive chemistry consisting of about 90 residues has been stringently preserved" is not only unsupported but is indicative of a fundamental misunderstanding of the data.

Atheistoclast said...

Yes, of course it was peer-reviewed by biologists! The annotation is perfectly correct. It doesn't require much inspection to see the conserved residues/regions of Jingwei and its sister genes.

Pray, tell me, what the difference in your opinion is between the "active site/reaction mechanism" and the "core reactive chemistry"? This is a semantic point you are raising and nothing more. The key residues referred to in the Long paper have been conserved, not just in Jingwei but in 3 other Adh-derived retroposed genes.

The point about time is specious. Whether jingwei is old or recent is largely irrelevant. 2/3 of the original peptide sequence has changed, either through substitution or deletion. What remains are the residues related to the catalytic functionality of the enzyme - which is to be expected.

Richard Edwards said...

My query is not how "active site/reaction mechanism" becomes "core reactive chemistry" but how "a common protein structure with a typical Rossmann fold containing a canonical GlyXaaXaaGlyXaaGly NAD+/NADP+ binding motif, a conserved aspartate residue (Asp38) that regulates coenzyme binding, and a Ser-Tyr-Lys catalytic triad (Benach et al. 1998, 1999)" becomes "about 90 residues [that have] been stringently preserved".

The point about time is far from specious. Unless you are sure that saturation point has been reached, you cannot interpret all observed conserved residues as products of purifying selection, which you appear to be doing - apparently in direct contrast to the literature on SDR domains. I asked for the data that lead you to your conclusion and you pointed to the alignment, so it is far from specious indeed. If you have functional data, cite it. (The Zhang et al. paper discusses the 19 substitutions, not the core conservation. NB. "Core" in the paper refers to the physical core versus surface, not core versus accessory.)

BTW, it's not obvious it was reviewed by biologists - it's in a maths journal and contains at least one figure that I would not accept from my undergrad students, which suggests that the reviewers did not really know what they were looking at. It's also a commentary and not a piece of original research and the review system is often different for opinion pieces.

Atheistoclast said...

Long et al. discuss only the key residues and folds. They don't discuss every single residue! But if you look at Adh-Twain and Adh-Finnegan (shown in my figure), which are also derived from retroposed Adh, you will see that the 90 or so residues preserved in Jingwei are also preserved in them. Do you suppose that this is entirely coincidental?

I think that Jingwei has evolved to its greatest possible extent. That is what Long and other researchers have also concluded. They agree that the core catalytic functionality has been conserved but that the substrate specificity has changed. In other words, Jingwei is a "variation on the same biochemical theme" just like all other gene duplicates.

Complexity is a cross-disciplinary journal. It publishes many papers on the subject of biology which are, of course, reviewed by competent biologists. You are upset only over aesthetics and presentation rather than the content. The sequence alignment is both correct and valid.
If you don't like how I present alignments, then perhaps you should read another of my papers with a similar figure:

http://www.sciencedirect.com/science/article/pii/S0303264711000797

It too was extensively peer-reviewed. Sorry.

Richard Edwards said...

"I think that Jingwei has evolved to its greatest possible extent. That is what Long and other researchers have also concluded."

I think you see what you want to see. Please quote the bit of the paper that leads you to conclude that the authors of that paper think what you claim they do. I did not see any such claim when I skimmed through the paper.

(And no need to be sorry - the reviewers let you down.)

Richard Edwards said...

Actually, given your fondness for maths, it is easy to demonstrate why I think that your belief that "Jingwei has evolved to its greatest possible extent" is so wrong.

Zhang et al. report the Amino acid replacements (NS) and Synonymous changes (SS) for D. teissieri and D. yakuba. I am not 100% sure what these are relative to - it might be the predicted ancestor - but it does not really matter. The figures they give are D. teissieri: NS=9, SS=10 and D. yakuba: NS=28, SS=11.

Without their DNA alignment it is hard to be sure exactly what dN and dS are. (I was a bit lazy before and made a mistake when reporting it as > 2.5 for yakuba because I was not accounting for the number of synonymous and non-synonymous sites.) It is easy to come up with an extremely conservative estimate of synonymous sites, though.

Just so that any mutation is synonymous, let's only consider four-fold degeneracy: that's P, T, V, A and G. Using the XP_002089306.1 sequence for alcohol dehydrogenase from yakuba, this gives 103 possible 4-fold degenerate sites. Without any codon usage bias, full saturation of mutations would converge on 3/4 of these being different, i.e. about 77 substitutions. Even assuming all 11 observed substitutions occur at these positions, that is still only about 1/7 of the way towards saturation, and that's a massive under-estimate of the number of synonymous sites.
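
For anyone who wants to check the arithmetic, here is a minimal sketch of both calculations: the raw NS/SS ratio versus a per-site dN/dS, and the saturation estimate for four-fold degenerate sites. The change counts and the 103-site figure are the ones quoted above. The nonsynonymous and synonymous site totals are hypothetical placeholders (a rough 3:1 split), since the real values need the DNA alignment, and no correction for multiple hits is applied.

```python
# Figures quoted above for D. yakuba.
NONSYN_CHANGES = 28
SYN_CHANGES = 11
FOURFOLD_SITES = 103

def dn_ds(nonsyn_changes, syn_changes, nonsyn_sites, syn_sites):
    """Uncorrected dN/dS: changes per site in each class, then the ratio."""
    dn = nonsyn_changes / nonsyn_sites
    ds = syn_changes / syn_sites
    return dn / ds

# Raw ratio of counts; ignores how many sites of each kind exist.
raw_ratio = NONSYN_CHANGES / SYN_CHANGES

# Hypothetical site counts (assumption): about three nonsynonymous
# sites for every synonymous site.
ASSUMED_NONSYN_SITES = 600
ASSUMED_SYN_SITES = 200
per_site_ratio = dn_ds(NONSYN_CHANGES, SYN_CHANGES,
                       ASSUMED_NONSYN_SITES, ASSUMED_SYN_SITES)

# With four equally likely bases, a fully randomised site differs from
# its starting base 3/4 of the time.
expected_at_saturation = FOURFOLD_SITES * 3 / 4
fraction_of_saturation = SYN_CHANGES / expected_at_saturation

print(f"raw NS/SS:                   {raw_ratio:.2f}")
print(f"per-site dN/dS (assumed):    {per_site_ratio:.2f}")
print(f"differences at saturation:   {expected_at_saturation:.1f}")
print(f"fraction of saturation:      {fraction_of_saturation:.3f}")
```

The raw ratio is where a figure above 2.5 comes from; once changes are divided by the number of available sites the ratio drops sharply, and how far depends entirely on the site counts you plug in.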

Of course, there is codon usage bias in Drosophila. If you want to see how much, you can read this paper: http://www.biomedcentral.com/1471-2148/7/226/. I don't have the time to do that maths now but I would be very surprised if the codon usage bias is so strong that this would account for having only 11 synonymous substitutions in a protein-coding gene of this length. This protein has not been saturated with mutations, therefore you cannot justify the claim that "Jingwei has evolved to its greatest possible extent" - there is no way to know. (Looking at other SDR family proteins, however, would suggest it is far from true.)

Now, please explain how you rationalise it.

Anonymous said...

I don't think there's ever been a resistance to the idea of unstructured sections, since any work I did examining protein structures would show me that the crystals were missing pieces, particularly at both ends of the protein, because those were unstructured even in the crystals. Often also loops around binding sites. So when I chatted with Keith I thought he was exaggerating a bit about resistance to the idea. I told him so, then he agreed that people with experience in structure were more open to these ideas, but not that unstructured stuff could improve specificity. I disagree with improved specificity (I told him so too), but I think that might be quite hard to test. For that we had mostly anecdotal examples, rather than real numbers in calories and all comparing unstructured to structured and highly discriminating proteins. But I have not followed the area too much since I moved into genomics. What's your take? Is there better demonstration of this specificity claim?

Richard Edwards said...

I don't think there's been resistance to disordered sections per se but I have certainly encountered surprise/resistance to the idea that there are functional elements in those regions and that they are not just spacers.

I am not as familiar as I should be with the arguments for "improved specificity". In my experience, it's more a case of different modes of binding and different types of interaction. I work on domain-motif interactions, which are very different in nature to domain-domain interactions - I think it's more about binding affinity than specificity. Keith Dunker's "MoRFs" are something I keep meaning to read up on as I am not entirely sure whether they are just a sub-class of linear motif that bind through an "induced fit" mechanism, or whether they are something different.

There are ideas about "fly-catching" (I think), where a motif in a flexible tail/loop makes first contact and then instigates higher affinity (but lower specificity) binding but I am not sure what the evidence for that is. I guess in this sense, the disordered regions could be said to improve specificity.

My gut feeling is that a lot of "specificity" actually comes down to localisation. Domain-motif interactions seem to be enriched in dynamic scaffolding and probably play a very large part in cooperative binding and bringing the right proteins together at the right time.

Atheistoclast said...

You need to read the paper in its entirety. Like I say, the core reactive chemistry has been preserved and the 90 or so residues that have also been preserved contribute to this essential functionality. If any changes are made to these then the enzyme will not work properly. Gene duplicates rarely ever fall below 30% homology wrt the peptide sequences because of functional constraints on sequence evolution.

Jingwei has indeed experienced strong positive selection but, as I have pointed out in my own paper, these were likely to have been of a compensatory nature in response to degeneration such as the premature truncation at the C-terminus and other deleterious changes. This can be based on account of parallel evolution evident in Adh-Twain and Adh-Finnegan that you seem to ignore for some reason. I strongly recommend you read this paper on the subject:

Parallel evolution of chimeric fusion genes

http://www.pnas.org/content/102/32/11373.full

All three genes underwent rapid adaptive amino acid evolution shortly after they were formed, followed by later quiescence and functional constraint. These genes also show striking parallels in which amino acids change in the Adh region.

In summary, jingwei differs in its substrate specificity but not in its catalytic chemistry.

Allan Miller said...

On functional specificity and sequence space, this paper is very interesting:

http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0015364

The authors constructed a library of 1.5 million peptides, designed for structure but not for function. The peptides were 102 amino acids in length. 4 'natural' genes of considerably greater length and structural complexity in E. coli were deleted, and the library assayed on the basis of ability to get these mutants growing again.

And of their library of 1.5 million, they found 18 artificial sequences, with no designed homology to the deleted proteins, that were nonetheless able to rescue function. Bearing in mind that the search was restricted to functional analogues of just 4 proteins, this is quite an impressive hit rate. It would be interesting to know how many of E. coli's genes could be individually replaced from this tiny segment of overall protein space.
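
As a quick sanity check on "hit rate", here is the arithmetic using only the figures given above. The per-function average assumes the 18 hits were spread evenly across the 4 deletions, which is an assumption made for illustration and not something stated here.

```python
LIBRARY_SIZE = 1_500_000   # peptides in the library, as described above
RESCUERS = 18              # artificial sequences that rescued growth
DELETIONS_TESTED = 4       # deleted E. coli genes

hit_rate = RESCUERS / LIBRARY_SIZE
one_in = LIBRARY_SIZE / RESCUERS

# Crude per-function average; assumes hits are spread evenly (assumption).
per_function_rate = hit_rate / DELETIONS_TESTED

# For scale: the "6 residues in position" figure quoted earlier in the thread.
six_exact_residues = 20 ** 6

print(f"overall hit rate: {hit_rate:.1e} (about 1 in {one_in:,.0f})")
print(f"average per deleted function: {per_function_rate:.1e}")
print(f"20^6 = {six_exact_residues:,}")
```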

I think this demonstrates that 'random' protein space is much more function-rich than many believe. The proteins were restricted in their sequential composition only in terms of residue polarity, which is important for folding functional protein catalysts. This is weighting the game slightly - but only in the same way evolution does. It takes working structures and rejigs them; it does not shake random amino acids up in a bag.

Diogenes said...

Wow, Atheistoclast is deeply ignorant of enzymology and protein structure. Nobody would trust that in a sequence alignment over only Drosophila genes, or for that matter closely related species, conservation indicates function. That's nuts.

No wonder he got 90 conserved (or invariant?) residues, when he does a sequence alignment of closely related species. This tells you little about the function.

Alcohol is a small substrate. There is no way you need 90 residues to interact with something that small. The 90 residues that Atheistoclast got that are conserved are because of common descent, that's all. The structure of alcohol dehydrogenase is typical of this sort of cofactor-assisted, two domain enzyme and that kind of thing never needs 90 residues for function.

"Pray, tell me, what the difference in your opinion is between the "active site/reaction mechanism" and the "core reactive chemistry"?"

The difference is that no real molecular biology grad student would say something stupid like "core reactive chemistry."

In protein molecular structure, core means core-- the inside of the protein. Core does not mean essential or conserved, it's a spatial location. No mol. bio. grad student would say "core reactive chemistry" unless the protein had its active site in its core-- which happens, for small substrates, but is rare. It is not the case with alcohol dehydrogenase, which has a standard two-domain structure with the active site on one domain and NAD cofactor binding on the other domain.

That kind of structure is common, and it never has 90 functionally conserved residues. That never happens. If you're lucky you'll get a dozen (!) residues binding NAD on one domain and 3 or 4 conserved in the triad on the other domain.

"This is a semantic point you are raising and nothing more."

As I said already, core means core, a physical location. The fact that Atheistoclast talks this way makes me think that the grad. program at his school sucks and does not prepare its students for the real world.

I would like to know what school has such a lousy grad. school program that it permits its grad students to blather in this fashion. I never met a molecular biologist who talked this baby talk.

"The point about time is specious. Whether jingwei is old or recent is largely irrelevant."

Bullshit. No molecular biologist would believe that a sequence alignment between species that externally look identical is by itself a trustworthy guide to what's functionally essential.

I don't know any molecular biologist who would defend such an argument. Again, I want to know what school has a grad. school that permits a grad. student to blather like this.

"2/3 of the original peptide sequence has changed, either through substitution or deletion. What remains are the residues related to the catalytic functionality of the enzyme - which is to be expected."

Bullshit. No enzyme that interacts with a small substrate like alcohol is going to need 90 residues to do it. That never happens. Like I said, you need maybe a dozen to bind NAD and 3 or 4 for catalysis, total.

I've read hundreds of papers with sequence alignments for enzymes, trying to find the essential, functional residues and I've never seen this ignorance. I want to know what school trains up grad. students like Atheistoclast.

The whole truth said...

atheistoclast said:

"I don't want to play games. What I personally believe, whether I am an atheist or a fundamentalist, doesn't matter."

Wow.

When it comes to your 'side' (IDiot creationism), ALL that matters is your beliefs and agenda. You IDiots don't really care about doing science, you only want to destroy it and replace it with your creationist/dominionist agenda.

EVERYTHING you IDiots do is "play[ing] games".

Like I said, you IDiots (and yes, this goes for you) are fundamentally dishonest, and your statements above prove it.

I used to wonder if you just might have something worthwhile to offer in your arguments regarding biology, but your recent comments on this site have thoroughly convinced me that you are just like all the other IDiots. And no, that is not a compliment.

Richard Edwards said...

@Atheistoclast. Thank you for confirming that you see what you want to see and apparently have no actual data to back up your claims. "Read the whole paper" is just short-hand for "this is what I understood when I read the whole paper" - you should be able to identify specific parts that support specific conclusions - especially when these are not conclusions reached by the original authors.

Another pointer if you want to be taken seriously in biological discussions: there is no such thing as "30% homology". Homology either exists, or it doesn't. Perhaps you mean 30% identity or 30% similarity? (If the latter, you should really also indicate how you define similarity.) In some scenarios of gene fusion or exon shuffling, it might be appropriate to say that 30% of sequence X is homologous to sequence Y but that would imply that they shared a homologous region that comprised 30% of sequence X, not that sequence X had 30% identity/similarity. It is certainly not a generic statement. This is quite a basic error for someone trying to school professional biologists in molecular evolution.
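To make the identity-versus-homology distinction concrete, here is a minimal sketch of how a "percent identity" figure can be computed from two rows of a pairwise alignment. The function name, the toy sequences, and the choice of denominator (columns where neither sequence has a gap) are all assumptions made for illustration; other tools divide by the full alignment length or by the shorter sequence, which is exactly why the definition should be stated.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity between two rows of a pairwise alignment.

    Assumption: only columns where both sequences have a residue are
    counted; gap-containing columns are ignored entirely.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only the columns in which neither row is a gap.
    columns = [(x, y) for x, y in zip(aligned_a, aligned_b)
               if x != gap and y != gap]
    if not columns:
        return 0.0
    matches = sum(1 for x, y in columns if x == y)
    return 100.0 * matches / len(columns)


# Hypothetical toy alignment, not taken from any real protein.
toy_a = "MKT-AYIAKQ"
toy_b = "MKSGAYLAKQ"
print(f"{percent_identity(toy_a, toy_b):.1f}% identity")
```

The result is a number describing two sequences; whether those sequences are homologous (share common ancestry) is a separate yes-or-no inference.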

Please also source your bold claim (once you have clarified it) that "Gene duplicates rarely ever fall below 30% homology wrt the peptide sequences because of functional constraints on sequence evolution". It is certainly true that below 30% identity it becomes hard to identify them from sequence analysis but the SCOP database is (to my knowledge) full of ancient gene families where the sequence similarity is lower. (Ditto domain/family databases like Pfam.) Furthermore, there might be many duplicates for which we have no structural data that have diverged so far in terms of sequence that we do not recognise them as duplicates - there is a massive ascertainment bias here. Again, I think you see what you want to see and then make bold statements of assumption as if they were fact.

Lastly, your statements on parallel evolution and compensatory evolution have no bearing on the question as to whether there has been enough time for all the non-essential sites to accumulate neutral mutations - except to highlight the small number of mutations that have accumulated in total. Again, you are demonstrating that you lack understanding of the basics of protein evolution.

@Diogenes. Your general complaint about the closeness of the relationship is sound but, to help Atheistoclast understand the problem, I think it is fair to point out that it is not the restriction of analysis to Drosophila that is the problem - Atheistoclast is talking about several paralogues and so we could indeed see a strong signal of evolution. The problem is that these paralogues themselves are the products of very recent duplications. (It is in effect the same problem as the one you outline for orthologues within closely-related species.)

Allan Miller said...

No-one? Oh, OK then ...

Calls to mind the Family Guy episode where, in the last seconds, Peter says "ooh look, a magic lamp ... I wonder what happens if ... ". Cut to credits. "Oh, OK ... uh ... maybe next week, then?".

:0)

Richard Edwards said...

It looks really interesting but I've not got round to reading it yet! (From your summary, I was actually wondering why it wasn't published in Science or Nature!)

It would be interesting to know how many of the 18 sequences actually function by mimicry of the protein they are compensating versus something altogether different but it is certainly an interesting result.

Back of envelope maths assuming ~4000 genes and the same hit rate per gene implies about 1% of the protein space they explore has function, and that's without even considering new functions. I think we need a lot more experiments like this before anyone can start doing those probability calculations that Atheistoclast and the ID crowd seem to love so much and think that they mean anything.
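The back-of-envelope estimate above can be written out as a short sketch. The function and parameter names are hypothetical, and the calculation assumes (as the comment does) that the hit rate per gene is the same across the genome and that each hit is a distinct library member.

```python
def implied_functional_fraction(hits: int, genes_tested: int,
                                genome_genes: int, library_size: int) -> float:
    """Extrapolate the fraction of a library that rescues some gene.

    Assumption: every gene in the genome attracts the same number of
    rescuing sequences as the genes actually tested, with no overlap.
    """
    hits_per_gene = hits / genes_tested            # 18 / 4 = 4.5
    projected_hits = hits_per_gene * genome_genes  # 4.5 * 4000 = 18000
    return projected_hits / library_size


# Figures quoted in the thread: 18 hits, 4 genes, ~4000 genes, 1.5 million peptides.
fraction = implied_functional_fraction(18, 4, 4000, 1_500_000)
print(f"{fraction:.1%}")  # prints "1.2%"
```

Changing `genes_tested` shows how sensitive the figure is to the number of genes the hits are spread over, which is one reason more experiments of this kind are needed before such numbers carry much weight.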

Allan Miller said...

Cheers! One erratum - there were actually 27 knockouts tried, and their library contained members that rescued 4 of them, not just 4 deletions and all rescued, as I implied. Still remarkable, IMO.

Diogenes said...

@Cabbages:

Thanks for the clarification, but does Atheistoclast use the word "paralog"? No. I guess that is what he means, but where does he use standard terminology known to every grad student?

As for the rest of my post, I stand by it, and I add the following.

Atheistoclast wrote:
"I never said that all residues in a peptide sequence will be conserved!"

Yes, you did. That is the hypothetical "primordial protein" that Atheistoclast considered.

It is possible that Atheistoclast is contradicting himself because he does not understand the distinction between "conserved" and "invariant", a distinction that I already clarified, and may have to clarify over and over and over.

Atheistoclast: Let us take your protein domain consisting of 35 residues. Let us also make it variable such that for every residue, 4 amino acids can be substituted there but with no effect on function. The probability of finding a functional sequence at random then becomes the inverse of: (20/4)^35 = 2.91 * 10^24.

If only 4 amino acids are tolerated at a position, it is conserved. It is not invariant. Thus Atheistoclast's hypothetical example is that a protein domain has 35 residues and all 35 residues are conserved.
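As a rough check on the arithmetic in the quoted example, the sketch below computes the odds for a 35-residue domain under two assumptions: every site conserved (4 of 20 amino acids tolerated) and every site invariant (only 1 tolerated). The function name is hypothetical, and the model simply treats all sites as independent, as the quoted calculation does.

```python
from fractions import Fraction


def one_in_n_odds(length: int, tolerated_per_site: int, alphabet: int = 20) -> Fraction:
    """Return N such that a random sequence is functional with probability 1/N.

    Assumption: every site independently tolerates the same number of
    amino acids out of an alphabet of 20.
    """
    return Fraction(alphabet, tolerated_per_site) ** length


conserved = one_in_n_odds(35, 4)   # (20/4)^35 = 5^35
invariant = one_in_n_odds(35, 1)   # 20^35

print(f"conserved: 1 in {float(conserved):.2e}")  # 1 in 2.91e+24
print(f"invariant: 1 in {float(invariant):.2e}")  # 1 in 3.44e+45
```

The first figure matches the 2.91 * 10^24 in the quote, which confirms that the quoted example constrains all 35 sites to 4 tolerated amino acids each, i.e. treats every site as conserved rather than invariant.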

But now Atheistoclast tells us:

Atheistoclast: "I never said that all residues in a peptide sequence will be conserved!"

Yes, you did. But it might not be that Atheistoclast is lying-- he might be too dumb to know the distinction between conserved and invariant. It's like arguing with a twelve-year-old.