Sandwalk: Don't misuse the word "homology"

Thursday, March 05, 2015

Don't misuse the word "homology"

Here's the latest science news from The Allium: "Evolutionist Loses It As Colleague Conflates Homology and Similarity Yet Again."

Evolutionary biologist Dr. Constance Noring shot and killed her microbiology colleague and formerly good friend, Dr. Dan Deline, when, for the umpteenth time, he used the word homology when he really should have said similarity.

Read the rest. I sympathize with Professor Noring. This could have been me if Canadians were allowed to buy handguns.

37 comments :

Anonymous said...: When a speaker says homology when they meant similarity or identity, my colleagues already know why I'm so insistently raising my hand. Sigh!; Thursday, March 05, 2015 4:26:00 PM
John Harshman said...: How about if they used a corrected similarity value and called it homology? Say, 75% raw similarity corrected (let's use Jukes-Cantor); hmmm, I get 30 changes per 100 sites, or 5% multiple hits, of which around a third are reversals. Would it be OK to say, in that case, 73% homology, i.e. about 73% of sites are identical by descent?

Just asking.; Thursday, March 05, 2015 4:52:00 PM
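
The arithmetic in this comment can be checked with a minimal sketch, assuming the standard Jukes-Cantor correction d = -(3/4) ln(1 - 4p/3), where p is the observed fraction of differing sites. The function name is illustrative and does not come from the thread.

```python
import math

def jukes_cantor_distance(p):
    """Estimated substitutions per site, given the observed fraction p of differing sites.

    Assumes the Jukes-Cantor model: equal base frequencies and equal rates
    for every kind of substitution.
    """
    if not 0 <= p < 0.75:
        raise ValueError("p must be in [0, 0.75) for the correction to be defined")
    return -0.75 * math.log(1 - 4 * p / 3)

observed_difference = 0.25  # 75% raw similarity
d = jukes_cantor_distance(observed_difference)
multiple_hits = d - observed_difference  # changes hidden by later changes at the same site

print(f"estimated changes per site: {d:.3f}")              # about 0.30, i.e. 30 per 100 sites
print(f"hidden by multiple hits:    {multiple_hits:.3f}")  # about 0.05
```

The correction only estimates how many changes occurred. It does not say which of the identical sites are identical by descent, so turning it into a percentage of homologous sites takes the further assumption about reversals that the comment makes.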
Anonymous said...: The 30% changed sites would still be homologous. Only variant homologs. The sites would play the role of "characters," and their conservation/change the role of "character states." The "characters" are homologous.

Of course, the history can get muddled by indwells, which makes it hard to tell precisely which positions can still be considered homologous characters, so then it's better to refer to the sequences as the "homologous characters" under consideration.

Percent homology might work, though, when referring to domains. As in a domain making 20% of a protein would be homologous to the same domain that makes 50% of another. But then, which percent would you use? I'd rather say that both proteins share a homologous domain and avoid confusion.; Thursday, March 05, 2015 5:28:00 PM
Anonymous said...: "indels" not "indwells." No idea how that word got there.; Thursday, March 05, 2015 5:29:00 PM
Larry Moran said...: Homology is a conclusion based on evidence. For example, you may conclude that two genes descend from a common ancestor (i.e. they are homologous) based on the evidence that their aligned sequences are 30% identical.

Homology is a word like "pregnant"—you either are, or you aren't, based on evidence. You can't be 30% pregnant.; Thursday, March 05, 2015 5:49:00 PM
Larry Moran said...: "indwells" got there because either gods or gremlins infect this blog. They attack me almost every day. (Those spelling mistakes and typos are not my fault.); Thursday, March 05, 2015 5:56:00 PM
John Harshman said...: photosynthesis: I'd say there are two sorts of homology working here: site homology and base homology. You're talking about the first, and I'm talking about the second. The same applies to morphology. Different states of a character are homologous in one way, but the same state in different taxa can be homologous (or homoplasious) in the other way.

Larry: I would consider "percent homology" to be a summing of individual homologies. If 80% of the bases in a sequence are identical by descent, it makes sense to call the sequence 80% homologous. This is not the same thing you were complaining about before.; Thursday, March 05, 2015 6:02:00 PM
judmarc said...: So is this "The Onion test"?; Thursday, March 05, 2015 6:38:00 PM
Anonymous said...: I was talking about each position being considered the character under investigation. So bases or amino-acid residues or complete sequences makes no difference. My point applies, each base, conserved or not, would be a character. Each base can be considered homologous whether they are conserved or not. But we rarely go at the base level for characters because the thing can easily get muddled with semantic/philosophical problems.

In your example you're implying that you know that the non-conserved positions are not homologous. That you know that only conserved ones are. Again, character states here would mean that the homologous character is conserved or not conserved, not that the characters are not homologous. To make matters more confusing, for example, apparently conserved positions can also have gone through homoplasy without having lost their homologous status. The positions/bases would be homologous by descent because, despite having changed states, they are related by common descent.

But it's easy to get all muddled and confused. So I prefer to stay at the sequence level, rather than at the base/residue/position level. And I'd rather not allow the use of the word homology when identity is the word that best avoids confusion. When you say identity you don't have to explain that much more. If you want to say percent homology you have lots of assumptions and explaining to do.; Thursday, March 05, 2015 7:01:00 PM
Anonymous said...: Did the smell betray us?; Thursday, March 05, 2015 7:03:00 PM
John Harshman said...: It isn't clear to me what you're trying to say, but I don't think you understand what I'm saying. Let me try again. There are two sorts of homology in DNA or protein sequences. The first is site or positional homology, i.e. sites that we align as the same site even if their bases/residues are different. The second is base/residue homology, e.g. two glycines at position 34 that are glycine because they were glycine in the common ancestor and were never replaced. Positions and their contents are not the same thing.

Again, we do exactly the same thing with morphological characters. Characters are homologous, and character states are homologous too. If two taxa have the same character state, that's homologous if they got it by inheritance of that state from their common ancestor.

Identity doesn't have to be homologous, given that homoplasy happens. The two concepts are different and need different words. You just have to be aware which one you're really talking about.; Thursday, March 05, 2015 8:19:00 PM
Anonymous said...: John,

I understand what you're saying (or I think I do). My attempt was at showing that going your way only adds to confusion.

Why not leave the two glycines at position 34, which are both glycine because they were so in the common ancestor, as being identical character states of a homologous character?

Why would we want to define (by contrast) the tryptophan/phenylalanine pair at position 64 as being non-homologous because one (or both) of them changed from their state in the common ancestor? Why not think of them also as character states of a homologous character, only states that did change?

You're making character states into characters, which is confusing and does not help us understand each other. Of course, you could justify it, but it would still be confusing. Just see how much explanation has gone between us, and I might still be unable to explain my point, while you keep thinking that I don't understand yours?; Thursday, March 05, 2015 9:16:00 PM
Anonymous said...: If two taxa have the same character state, that's homologous if they got it by inheritance of that state from their common ancestor.

Agreed. But if you define percent homology from that, then you're ignoring the homologous characters that did not remain in the same state.; Thursday, March 05, 2015 9:20:00 PM
John Harshman said...: You may not like it, but the commonly understood meaning of homology applies both to characters and to character states, separately. Homology is defined as similarity due to common ancestry. Different states of one character are homologous characters if that character was found in the ancestor; but identical states are homologous states if that state was found in the ancestor. You may want to apply a special, molecular meaning to the term, but I don't see why. There are levels of homology; always have been.

Tryptophan at position 64 is not homologous to phenylalanine at position 64, but position 64 is (or may be) a homologous site in two species even if occupied by non homologous residues.; Thursday, March 05, 2015 10:14:00 PM
Larry Moran said...: If 80% of the bases in a sequence are identical by descent, it makes sense to call the sequence 80% homologous.

No, that makes no sense at all.

If you have decided that the stretches of nucleotides share a common ancestor then they are homologous. You describe their relatedness by saying that the sequences are 80% identical. In most cases, that's the evidence that you used to reach the conclusion in the first place.

When you align any two DNA sequences you'll find that roughly 25% of the base pairs are identical. In that case, it makes no sense to say that each of those "characters" is homologous and the sequences are 25% homologous.

Once you've decided that the two sequences are homologous that's the end of the story. The sequences are usually genes but if they're not then it has to be a significant stretch of DNA. It makes no sense to examine small regions of that stretch and say that this 10 bp stretch is 90% homologous while that 10 bp stretch is only 60% homologous.; Friday, March 06, 2015 7:16:00 AM
Anonymous said...: Gun held side ways?! Oh that's a kill shot right there! (RIP Dr. Dan Deline); Friday, March 06, 2015 7:35:00 AM
Marcoli said...: OMG, how did I not know about this web site? Well, looks like I have another reason to get less stuff done!; Friday, March 06, 2015 8:41:00 AM
John Harshman said...: It makes no sense to examine small regions of that stretch and say that this 10 bp stretch is 90% homologous while that 10 bp stretch is only 60% homologous.

Why not? Would it make sense to say that 90% of the bases in a 10bp stretch are homologous? (Of course, mere identity doesn't equal homology, given that there is homoplasy too.); Friday, March 06, 2015 9:03:00 AM
Larry Moran said...: Why not?

Because we have perfectly good ways of saying the same thing without abusing the word "homology." We can say that the genes in two species are homologous and certain segments are more highly conserved than others. We can even say that there's a short segment in the two genes where the sequences are 90% identical in divergent species.

Why do you think we have to use the word "homology" in this context?; Friday, March 06, 2015 10:08:00 AM
Petrushka said...: Dumb question. If "random" DNA sequences are 25% identical, are they all likely to share a common ancestor?; Friday, March 06, 2015 10:13:00 AM
AllanMiller said...: Not on those grounds alone, no - it's just that there are 4 bases, so any two drawn at random will be the same 25% of the time.; Friday, March 06, 2015 10:19:00 AM
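
A minimal sketch of that chance-identity figure, assuming the two bases are drawn independently from the same composition. With four equally frequent bases the match probability is 0.25; a skewed composition raises it. The function name and the AT-rich frequencies are hypothetical, chosen only for illustration.

```python
def chance_identity(freqs):
    """Probability that two independently drawn bases are the same.

    freqs maps each base to its frequency; the frequencies should sum to 1.
    """
    return sum(f * f for f in freqs.values())

uniform = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
at_rich = {"A": 0.35, "C": 0.15, "G": 0.15, "T": 0.35}  # hypothetical composition

print(round(chance_identity(uniform), 3))  # 0.25
print(round(chance_identity(at_rich), 3))  # 0.29
```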
John Harshman said...: Why do you think we have to use the word "homology" in this context?

Well, of course we don't have to do anything. A better question is this: Why are we forbidden to use it? At any rate, the homologies of individual characters and character states are not invalid questions in morphological studies. Why are molecules to be considered different?; Friday, March 06, 2015 12:41:00 PM
Petrushka said...: Allan Miller: I misunderstood. I didn't realize they were being matched one at a time. But I'm glad I asked the question.; Friday, March 06, 2015 12:57:00 PM
Unknown said...: When you align any two DNA sequences you'll find that roughly 25% of the base pairs are identical.

Are you sure of that? If you have two random sequences of length l, sure we expect 25% identity, with an SD of 43%/l^.5. But even the most simple alignment algorithm will tend to produce greater sequence identity. How much would depend on the precise method used and the length of sequences involved, but aligning a 100BP sequence to a 10k BP sequence assuming no indels gives me about 43% sequence identity for instance. That's quite a bit higher than 25% and would go further up if indels were allowed.
25% identity is not a very useful baseline. We've been over this when you were arguing that micro RNAs were not generally highly conserved. But for a sequence of 30BP aligning it to a sequence of 100kBP we get about 60% as a baseline, rising to 70% for 1MBP. Detecting homologs requires a high degree of sequence conservation for these short sequences.; Friday, March 06, 2015 1:10:00 PM
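
The baseline effect described in this comment can be explored with a rough simulation. This is a sketch under stated assumptions: uniformly random sequences, no indels, and the best of all ungapped placements of the short sequence inside the long one. The helper names are hypothetical, and the result varies with the random seed, so no particular output is claimed.

```python
import math
import random

def identity(a, b):
    """Fraction of positions at which two equal-length sequences match."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def best_ungapped_identity(query, target):
    """Highest identity over every ungapped placement of query inside target."""
    n = len(query)
    return max(identity(query, target[i:i + n]) for i in range(len(target) - n + 1))

def random_dna(length, rng):
    """Random sequence with equal base frequencies."""
    return "".join(rng.choice("ACGT") for _ in range(length))

rng = random.Random(0)           # fixed seed so a rerun gives the same sequences
query = random_dna(100, rng)     # 100 bp query, as in the comment
target = random_dna(10_000, rng) # 10 kbp target

# A single fixed placement: mean 25%, SD sqrt(0.25 * 0.75 / l), about 43% / sqrt(l).
sd = math.sqrt(0.25 * 0.75 / len(query))
print(f"fixed placement: mean 0.250, SD {sd:.3f}")

# The best of many placements is the maximum of many such draws, so it sits well above 25%.
print(f"best ungapped placement: {best_ungapped_identity(query, target):.2f}")
```

Shortening the query or lengthening the target pushes the best-placement figure higher, which is the point of the comment: the useful null baseline is the best score expected by chance, not the 25% of a single placement.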
Larry Moran said...: Are you sure of that?

No, of course not. Everything you say is true and the problems even apply to amino acid sequences. (Although I would argue that you need to correct identity calculations by subtracting gap penalties.)

I didn't think it was important to quibble in order to make the point.; Friday, March 06, 2015 1:18:00 PM
Larry Moran said...: At any rate, the homologies of individual characters and character states are not invalid questions in morphological studies. Why are molecules to be considered different?

It would be pretty silly to say that the wings of a bird and the flippers of a seal are 42% homologous.

Why are molecules to be considered different?; Friday, March 06, 2015 1:32:00 PM
John Harshman said...: The reason it's silly to say that wings and flippers are 42% homologous is that we have no objective measure of percent homology, since character scoring is a subjective process. Our judgments would depend on what particular characters we had abstracted from the anatomy. For molecular sequences, on the other hand, scoring (once you've aligned them, that is) is simple and objective.

Would you consider it odd to say that 42% of the bases in a given sequence are homologous between two taxa?; Friday, March 06, 2015 2:17:00 PM
roger shrubber said...: I'm confused about the non-homologous parts. I get sequence identity. I get reversals. But for a SNP, it still shares an ancestor, your 'site homology'.
Now indels are different. If a gene or protein were to be called 70% homologous, I would want that to mean that you can align 70% of the sites with the rest being recent indels (since LCA) but I'm still thinking about bits that were in the LCA and deleted in one.
Do you really use homologous to describe character identity? Should you? I'm scratching my head.; Friday, March 06, 2015 2:39:00 PM
John Harshman said...: Do you really use homologous to describe character identity?

Of course you do. To take a gross example, let's consider a character we might call "tetrapod forelimb". Now of course that character is homologous throughout tetrapods. Now consider a few character states, and let's naively code it as "leg" or "wing". "Leg" is of course the ancestral state, and both birds and bats have the derived state "wing". But those states are not homologous.

It's the same with the bases at any given site, except that we have no hope of telling whether two A's are homologous or homoplasious just by examining them.; Friday, March 06, 2015 3:51:00 PM
Larry Moran said...: Would you consider it odd to say that 42% of the bases in a given sequence are homologous between two taxa?

Yes, because we have a far better word for it. We can say that the sequences are 42% identical. That's the raw data that leads us to the conclusion that the genes/sequences are homologous.; Friday, March 06, 2015 4:04:00 PM
roger shrubber said...: That doesn't address my question. I get "homologous as limbs" "Not homologous as wings". Wingness was not shared.
But does anything consider the Aness or Tness of a specific site? It seems severely contrived outside of anything other than artificial algorithmic accountancy.; Friday, March 06, 2015 4:10:00 PM
Anonymous said...: John Harshman, normally I either agree with you or wish I had agreed with you because I was either wrong or ignorant when I didn't. Here, however, I disagree with you and I think you're wrong. (Yes, I do understand your distinction between characters and character states -- I just don't think it is useful for communication.)

In a writing class long ago, I learned that if a writer wants to be understood, he has a responsibility to write clearly, not to bitch about readers who fail to understand. There is a sense in which an entire DNA sequence can reasonably be treated as homologous even if some of the bases have mutated and are no longer identical. When discussing sequences in that sense, saying that two non-identical bases are not homologous would just be wrong. Of course, if you switch to a base-centered frame of reference and somewhat redefine "homologous" to mean identical by descent rather than similar because they are descended from a common ancestor, you can reasonably say that non-identical bases are not homologous. However, you can't expect your readers to come along with you on this little mental side track, unless you explain a lot.

We'd be stuck with these multiple definitions of homologous if that were all we have (think of the chromosome / chromatid mess in meiosis that is guaranteed to confuse students), but we have a way to express what you mean much more clearly (for your audience) if you use percent identity, rather than percent homology.

Of course, percent homology is a common phrase, but it's a confusing phrase and should simply be discouraged. Forget arguments on fine shades of meanings; percent homology simply does not communicate well.; Friday, March 06, 2015 4:34:00 PM
John Harshman said...: bwilson: These are not multiple definitions of "homologous"; it's all the same definition, applied to different features. And this is how the term is used by systematists; I didn't invent it.

Larry: Identity and homology are not the same thing. Some of that identity is homoplasy, as in my example that started this little argument.

Roger: The analogy was intended to be crude just to make it understandable. "Wingness" is indeed shared, it just isn't homologous. Now in the example we can easily tell that bats and birds do not have homologous wings. But in thousands of real cases in morphology (and always in molecular sequences) the non-homologous states look similar enough -- or in fact identical -- that the only reason they're known as homoplasious is after the fact, i.e. because that character doesn't match the tree. This is a routine statement in morphological systematics, and the molecular case differs in no significant way.; Friday, March 06, 2015 6:33:00 PM
DK said...: Oh, those silly semantics warriors - they sure get tiring after a while.

And the funny thing is, in most cases they are dead wrong when they object to the use of "homology" because high similarity almost invariably means homology.; Friday, March 06, 2015 10:32:00 PM
AllanMiller said...: So, when RecA (or one of its ... er ... homologs such as RAD51) does its ... er ... homology search, we should say something different? This usage has become pretty much embedded in certain areas. Which may grate, but language evolves, even scientific language. Personally, I deplore beginning sentences with 'so', and pointless use of ellipses ...; Saturday, March 07, 2015 4:54:00 PM
Larry Moran said...: You do a search for possible HOMOLOGY based on sequence similarity or structural similarity.; Saturday, March 07, 2015 5:28:00 PM
AllanMiller said...: You do, yes, but RecA itself is frequently described as doing a 'homology search'!; Sunday, March 08, 2015 6:10:00 AM

