Sunday, January 22, 2012

Margoliash on "Homology" (1969)

Emanuel Margoliash (1920 - 2008) is famous for his studies of the evolution of cytochrome c genes/proteins. His lab sequenced dozens of them and he published some of the first molecular phylogenetic trees back in the early 1960s.

I recently stumbled on a letter he published in Science back in 1969 (Margoliash, 1969). It's about how you define "homology." This is one of my pet peeves. I've been trying to teach people for years that homology refers to the fact that two genes share a common ancestor. It is a conclusion based on evidence such as sequence similarity. For example, if two genes/proteins are more than 30% identical over their entire length then you can conclude that they are homologous: they descend from a common ancestor. The conclusion is based on evidence, such as 30% sequence identity. Don't confuse "similarity" and "homology" because they are two different things.1

Homology is like being pregnant. Either you are or you aren't. You can't be 30% pregnant and you can't be 30% homologous.

I knew that the definition of homology had changed over the years but I didn't know that the dispute over its usage in molecular phylogeny started in the 1960s. Here's the Margoliash letter.
I regret the error in citation (the journal name was given as Nature, rather than Science), which crept in among the 462 references of the review (1) to which Winter, Walsh, and Neurath take exception (Letters, 27 Dec.). In that review, the term homologous was taken to imply, in parallel to universal biological usage, "that the genes coding for the polypeptide chains considered, in all the species carrying these proteins, had at one time a common ancestral gene," and we stated that when this concept is not intended "it would be best to use any of the numerous synonyms of 'similar' and 'similarity' and not appear to be prejudging the issue of evolutionary relations." The "pointed and specific criticism" followed, and was entirely contained in the sentence: "Other definitions may cause confusion and are unlikely to supplant well established biological usages." The "other definitions" referred to the article by Neurath, Walsh, and Winter (2), in which they state, "The term homology as applied to proteins refers to similarity in amino acid sequence," and later, that comparisons of protein structures "must be interpreted on a statistical basis lest we misinterpret random similarities."

On this last score there is no argument. Winter, Walsh, and Neurath will surely agree that in this field erroneous conclusions are likely to arise from the lack of an appropriate statistical distinction between random similarities and similarities of structure greater than can result from random phenomena. An excellent method of performing just such a distinction was published by Fitch (3), and although Neurath, Walsh, and Winter acknowledge it in their article (2), they do not use any acceptable statistical techniques in their comparisons of proteases. Thus, even by their own definition they fail to show "homology."

Homology, in any biological evolutionary context has a generally understood and well-defined meaning, namely the one we have adopted for use in protein primary structure comparisons. One cannot argue that such comparisons represent an area of knowledge separate from evolutionary biology, and that therefore one may use the same words for other meanings, since such protein studies obtain their interest largely in terms of evolutionary concepts and have their major impact in the taxonomic-evolutionary field. Winter, Walsh, and Neurath justify their novel definition of "homology" by maintaining that, without fossil remains, it is not possible to decide whether the structural genes corresponding to a set of present-day proteins are or are not ancestrally related. Apart from the inherent danger of assuming that a problem is insoluble, it may be pointed out that six pages after the definition of "homology," the paper (1) reviewed a statistical method for demonstrating just such ancestral homology. One requires enough primary structures to derive a "statistical phylogenetic tree," as has been possible in the case of cytochrome c (4). From such a tree a simple statistical calculation permits one to approximate the number of residues in a set of proteins that will remain invariant, because of biological necessity, no matter how many species are examined (5). If, in the comparison of any two proteins of this set, the number of identical residues is substantially in excess of the number that remain invariant in the entire set of proteins, then clearly this excess cannot result from functional convergence from different phylogenetic origins, a process yielding analogous structures, and, therefore, it can only be attributed to ancestral homology. In such a procedure, the assumption of the constancy of the genetic code has replaced the fossils of the morphological evolutionist.

Even if one does not accept the validity of such a demonstration, it is difficult to understand why there is an insistence on using the word "homology" for "similarities of protein primary structure greater than random." Any of the over 30 synonyms of "similarity" (6) or a variety of elegant neologisms would do, and prevent an insidious misunderstanding likely to arise in biological literature. Rather than take Alice in her confused trip in Wonderland as a model for logical scientific nomenclature, I prefer to follow the 17th-century poet reacting against a form of debasement of the language then prevalent, and "call a cat a cat" (7).

Department of Molecular Biology,
Abbott Laboratories,
North Chicago, Illinois 60064

1. C. Nolan and E. Margoliash, Ann. Rev. Biochem. 37, 727 (1968).
2. H. Neurath, K. A. Walsh, W. P. Winter, Science 158, 1638 (1967).
3. W. M. Fitch, J. Mol. Biol. 16, 9 (1966).
4. W. M. Fitch and E. Margoliash, Science 155, 279 (1967).
5. W. M. Fitch and E. Margoliash, Biochem. Genet. 1, 65 (1967).
6. Roget's Thesaurus (St. Martin's Press, New York, 1965).
7. N. Boileau, Satires 1, line 52 (1660). "J'appelle un chat un chat, et Rolet un fripon."

1. Very few people pay attention to me. I appear to be fighting for a lost cause.

Margoliash, E. (1969) Homology: A Definition. Science 163:127


  1. "I appear to be fighting for a lost cause." This seems a surprising thing to say - I haven't come across any definition that does not require shared ancestry - see for example the Wikipedia entry, which discusses homology in a broader sense than just sequence homology, and makes no mention of any alternate usage. Ditto for any number of textbooks. Are you saying that this example of Winter et al represents more than just an isolated and appropriately ignored/forgotten attempt to redefine an already well established term?

    Using "homology" to mean "similarity" is a common enough (and very understandable) mistake of the sort one looks out for and corrects when made by students. It is an important mistake to watch out for, because it may indicate that the student in question has failed to understand the distinction between a direct observation and an inference based on observed evidence - which is fundamental in any empirical science.

    1. The scientific literature is full of examples of the misuse of "homology." The most glaring errors are "30% homology" or "highly homologous." More subtle errors are "homology modeling" and "homology searches."

    2. There are definitions that do not require common ancestry. Richard Owen, the man who originated the term in biology, defined it simply (and vaguely) as "sameness". Pattern cladists refuse definitions from common ancestry as well.

      John Wilkins blogged on this:

  2. @konrad, agree wholeheartedly

    I have one lingering question about the example of homology provided: is the percentage value arbitrary (for the sake of presenting the argument) or does this value emerge from some statistical method?

    1. If you're referring to my choice of 30%, that's arbitrary. The real cut-off is closer to 20% but that depends on the length of the sequence.

    2. In detail, it depends on the background frequencies of the various nucleotides or amino acids, the size of the database you are searching against, etc. Basically you are assessing the probability that the observed match would occur by chance in a database of size X. The "e-value" is the number of hits expected by chance with a score at least as good as that of the observed match. If e > 0.01, you can't be very sure of homology. If e < 1x10^-100 or whatever, you're pretty darn sure.

    3. And the length is important because the very same percent can be achieved with one amino acid in common in an alignment five amino acids long, which means nothing. That's why identity cutoffs are silly. Cutoffs have to be based on the stats. (I always have trouble explaining that to some of my colleagues ...)
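
To make the point in the replies above concrete, here is a minimal Python sketch of why the same percent identity means very different things at different alignment lengths. It rests on loud simplifying assumptions that are not from the thread: every position matches independently with probability 0.05 (20 equally frequent amino acids), and the alignment is fixed rather than chosen as the best of many candidates. Real search tools use score-based statistics, so treat the numbers as illustrative only. The function names are hypothetical.

```python
from math import comb


def prob_at_least(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p): chance of k or more identical positions."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def expected_chance_hits(p_value, db_size):
    """Expected number of unrelated database sequences matching this well by chance."""
    return p_value * db_size


# Assumption: 20 equally frequent amino acids, so a random position matches 1 time in 20.
P_MATCH = 0.05

short = prob_at_least(1, 5, P_MATCH)     # 20% identity over 5 residues
long_ = prob_at_least(60, 300, P_MATCH)  # 20% identity over 300 residues

print(f"1 of 5 identical:    p = {short:.3f}")
print(f"60 of 300 identical: p = {long_:.3e}")
print(f"chance hits expected in 1,000,000 sequences (short case): "
      f"{expected_chance_hits(short, 1_000_000):.0f}")
```

Both cases are "20% identical", but under these assumptions the short match is what you would expect from unrelated sequences, while the long one is extremely unlikely to arise by chance. Multiplying the per-comparison probability by the database size gives a rough analogue of the expected number of chance hits.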

  3. I've been defining homology to my biology and genetics classes as 'similar because of common ancestry'.

    If similarities are strong enough to not be explainable by chance, and arbitrary enough to not be explainable by convergence, then we infer they are due to shared ancestry and that they are thus homologous.

    This definition seems to work at all levels, for genes as well as for phenotypes.

    1. But homologues do not need to be similar. This is more evident in non-molecular characters, such as the human incus and the quadrate bone of turtles.

    2. If they aren't similar, we don't recognize them as homologies. The human incus and turtle quadrate are similar in a great many ways, in particular in their contacts with other bones and their embryological origins. If this were not the case, we would have no idea they were homologous.

      Anyway, the proper definition of homology was given by Colin Patterson, in one word: synapomorphy.

    3. Well, yes. You could well say those are similarities as well. However, what makes something recognisable is not the same as what the thing is in itself. Similarity in homologues is merely a side-effect of common ancestry.
      (Sorry for the delay)

  4. "Very few people pay attention to me. I appear to be fighting for a lost cause."

    Yeah... That's because most people tend to settle for common sense and you like being a prude. The whole binary concept of "homology" is only, in practical terms of added value, good for insisting that it is binary.

    "The real cut-off is closer to 20% but that depends on the length of the sequence."

    There is no real cut-off.

  5. "The whole binary concept of 'homology' is only, in practical terms of added value, good for insisting that it is binary."

    So you would support a definition of homology that would confuse convergent evolution and shared descent?

    1. Why would I insist on confusion? Nope. All I am saying is that the word homology means "sameness" and that's how it is used in a great many contexts. And there is nothing wrong with that. When the question of descent is considered, it is alright to explicitly use the binary meaning (although I personally would prefer some better chosen term). What I find extremely silly is insisting that this poorly coined term can only have a single meaning, the one that's not actually common sense.

      The thing is, never in my life have I seen anyone truly confusing convergent evolution and shared descent, but I have heard at least a few gazillion times this "but homology is binary, you moron!". As I pointed out in the earlier thread on a similar subject, it's like people get an orgasm or something out of pointing out that ~ 90% of others use the term "incorrectly".

    2. I wish I was in your shoes. People confuse the two all the time, in my experience. And make assumptions of common descent with little or no real evidence...

    3. "Sameness" in homology was never supposed to mean simple similarity, that's the point of Owen's distinction between analogy and homology.

      As I pointed out in the earlier thread on a similar subject, it's like people get an orgasm or something out of pointing out that ~ 90% of others use the term "incorrectly".

      Perhaps (I hope not) 90% of people who work with molecular data only, but anyone who also works with non-molecular data would know better. We should aim to keep a consistent terminology to facilitate communication between the different areas of biology, especially for transversal concepts.

  6. What is the evidence for evolution? Homology, …(lots of other things).
    If homology is to be evidence for evolution, common ancestry cannot be included in the definition of homology.
    Therefore, homology should be restricted to its original (Owen’s) definition as having similar relative position, as then homology can be used as evidence for evolution.

    1. The evidence for evolution is similarity, in this case, sequence similarity.

      If the similarity is sufficient, we conclude that evolution occurred. The short-hand way of stating this conclusion is to say that two sequences are homologous.

      "Homology" is not the evidence for evolution. "Similarity" is the evidence for evolution. You are making the same mistake that Jonathan Wells made in "Icons of Evolution."

  7. Elliott Sober, 2008. Evidence and Evolution: the logic behind the science.
    Chapter 4: ‘Common Ancestry’.
    Paragraph ‘Homology’ (page 283 in my paperback copy).

    “My focus has been on how similarity (or a dissimilarity) that characterizes a pair of species provides evidence that discriminates between the common-ancestry and the separate-ancestry hypothesis. Isn’t this to ignore the fundamental biological point that it is homologies that provide evidence for common ancestry? There is a large literature on how the concept of homology should be understood, but the question at hand in fact has a simple answer. Homologies are usually taken to be similarities that are present because of inheritance from a common ancestor. The wings of sparrows and robins are homologies in this sense. A homoplasy, in contrast, is a similarity that is not due to inheritance from a common ancestor but instead arose because of independent origination events occurred in separate lineages; the wings of birds and bats are an example. So defined, the concept of homology already has built into it the claim of common ancestry. If our goal is to test the common-ancestry hypothesis against the separate ancestry hypothesis by looking at data, then it would beg the question to say that our data consist of “homologies” in this sense (8). What counts as an observation in this problem must be knowable without one’s having already an opinion as to which of two competing hypotheses is true. That is why similarities are the right place to begin.”
    “(8) Sober (1988) argues that if synapomorphies are to be evidence for one phylogenetic tree over another, then the concept of a synapomorphy should not be defined to mean that the trait is a homology.”

  8. “…. homology refers to the fact that two genes share a common ancestor. It is a conclusion based on evidence such as sequence similarity”. If so, common ancestry cannot be part of the definition of homology.

    We’ve been here before.

    The problem started when Larry Moran (Thursday, December 08, 2011 12:02:00 PM in the line) gave a definition of homology that included ancestry. Not as a conclusion, mind!
    “Here's the definition from Evolution (2009) by Douglas Futuyma.
    Under the phylogenetic concept of homology, which is fundamental to all of comparative biology and systematics, homologous features are those that have been inherited, with more or less modification, from a common ancestor in which the feature first evolved. That is, homologous structures are synapomorphies.”

    However, on (Thursday, December 08, 2011 3:46:00 PM, same line) Larry Moran said:
    "Homology is a conclusion based on evidence. The evidence is based on significant structural, or sequence, similarity, shared developmental pathways, and shared genes.
    Evolution is the explanation for why structures (or genes) are homologous”

    Larry Moran moreover said: (Dec 11, 2011 04:52 PM)
    “I'll stick with the definition in the best textbook on evolutionary biology.
    Everything I said is consistent with that definition.”

    No, it isn’t.

    1. Are the following statements inconsistent with each other?

      "The expansion of the universe is the increase of distance between parts of the universe with time"

      "The expansion of the universe is a conclusion based on evidence"

      If your answer is "no", you should be able to see why Larry wasn't being inconsistent. If your answer is "yes", you're a hopeless idiot.

  9. Here, let me try. Some years ago, I wrote a response to Icons of Evolution. Here's the part on homology:

    "Why do textbooks define homology as similarity due to common ancestry, then claim that it is evidence for common ancestry -- a circular argument masquerading as scientific evidence?"

    This question stems from confusion on Wells' part between how something is defined and how it is recognized, which are two quite different things. Homology is indeed defined as similarity due to common ancestry. But we don't just label any similarity a homology and call it evidence for common ancestry. That would indeed be circular. What we really do is quite different. Similarity between the characteristics of two organisms is an observation. If the similarity is sufficiently detailed ("both are big" or "both are green" won't do) we consider it a candidate for homology.

    Homologies can be tested to some degree by predicting that the characters will be similar in ways we haven't yet checked. For example, if we propose that similar-looking bones in two animals are homologous, we might predict that they would arise from similar precursors in the embryo, have similar spatial relationships to other bones in the organism, and have their development influenced by similar genes. And this is commonly the case.

    But the main way of testing candidate homologies is by congruence with other proposed homologies. By congruence we mean that the two characters can plausibly belong to the same history. If the history of life looks like a tree, with species related by branching from common ancestors, then all true homologies should fit that tree; that is, each homology should arise once and only once on the tree. If a large number of functionally and genetically independent candidate homologies fit the same evolutionary tree, we can infer both that the candidates really are homologies and that the tree reflects a real evolutionary history.

    And in fact that's what we commonly find. Mammals, for example, are inferred to descend from a common ancestor because they all have hair, mammary glands, and other more obscure characteristics like seven neckbones and three earbones. All these characteristics go together: mammals have all of them and no other animals have any of them. Further, other characters support consistent groupings within mammals, and groupings within those groupings. Within most of life, groups are organized in a very special way called a hierarchy. In a hierarchy, every group is related to every other group in one of two ways: either one group is entirely contained within the other (as in a below), or they share no members at all (as in b below). No two groups can partially overlap (as in c below).

    [Sorry, you will have to imagine the graphic. a. is a pair of concentric circles; b. is a pair of non-overlapping circles; and c. is a pair of overlapping circles.]

    What we see if we try to organize species using candidate homologies is that groups organized according to different characters fit together like a and b, but not c, so we get a pattern like this:

    [This is a Venn diagram showing a set of characters diagnosing mammals and various groups within mammals, e.g. a cow and a whale with a circle around them labeled "double-pulley astragalus".]

    Why should these and many other characters all go together in this consistent way? Evolutionary biology explains these characters as homologies, all evolved on a single tree of descent, like this:

    [This is a cladogram with the characters from the Venn diagram above optimized onto the branches.]

    Wells gives no alternative explanation for such patterns, and indeed they are hard to explain in any other way than as reflections of an evolutionary history. Wells has it all wrong. Homology isn't a circular argument, it's a branching tree of evidence.
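
The nested-or-disjoint test described in the comment above can be sketched in a few lines of Python. This is only an illustration: the species lists are toy data, the "lives in water" character is a hypothetical example of a convergent trait that is not in the original comment, and the function name `is_hierarchy` is made up here.

```python
from itertools import combinations


def is_hierarchy(groups):
    """True if every pair of groups is nested or disjoint (no partial overlap)."""
    for a, b in combinations(groups, 2):
        shared = a & b
        # Partial overlap: some shared members, but neither group contains the other.
        if shared and shared != a and shared != b:
            return False
    return True


# Toy data: each character is represented by the set of species that have it.
hair = frozenset({"cow", "whale", "human", "mouse"})
double_pulley_astragalus = frozenset({"cow", "whale"})
feathers = frozenset({"sparrow", "robin"})

# Hypothetical convergent character, added only to show a conflict.
lives_in_water = frozenset({"whale", "shark"})

print(is_hierarchy([hair, double_pulley_astragalus, feathers]))
print(is_hierarchy([hair, double_pulley_astragalus, lives_in_water]))
```

The first three characters fit together like cases a and b in the comment. The aquatic grouping partially overlaps the mammals (case c), which is the kind of conflict that flags a candidate homology as a likely convergence instead.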