Monday, March 30, 2015

Interdisciplinary Research

I'm at the Experimental Biology meetings in Boston and yesterday I dropped into a session on "Training the Mind of an Interdisciplinary Scientist." There were talks on how to resolve disputes among members of the interdisciplinary team, and on how to choose a problem that your customers want solved (from an engineer). There was also a talk from the University of Missouri-Kansas City about their graduate program. Every single graduate student has to choose an interdisciplinary problem for their thesis topic and they have to take a half dozen courses in each of two disciplines (at least).

The only experience I've had with being interdisciplinary is when I tried to understand what computer scientists were interested in and whether we could work together on some problems. We couldn't. The gap was too large. So biochemists have just adopted the tools and techniques of computational sciences and moved on.

Very few of my colleagues are doing interdisciplinary research and they seem to be getting along just fine. Is this whole "interdisciplinary" thing just a fad? Do you know anyone whose main area of investigation spans two distinct disciplines?

I got the distinct impression from the session that there's pressure from university administrations and granting agencies to become interdisciplinary. Is this true?


  1. You seem not to understand what interdisciplinary science is. If a biochemist really "adopted the tools and techniques of computational sciences" rather than just using an off-the-shelf piece of software as a "black box" with no real understanding of how it works, then they are doing interdisciplinary science. However, such a person would be unlikely to find the gap between their work and computer science to be "too large".

    These days one of the most important branches of biology is computational biology. To do computational biology you have to know a fair amount of computer science in addition to biology. Yes, some of the older people in the field (such as myself) are more or less self-taught on the computational side, but the undergraduates today are taught computer science as well as biology and that's a good thing in my opinion.

    1. These days one of the most important branches of biology is computational biology.

      That's ridiculous. You must have a very narrow view of biology.

      To do computational biology you have to know a fair amount of computer science in addition to biology.

      That depends what you mean by "doing computational biology." I've been working with computers in my research since 1968 and I've never found that I had to learn what NP-hard really means or algorithm theory. I've never, ever considered publishing in a computational science journal. Most of my younger colleagues are in the same position. We use computers but we don't "do computer science."

      ... the undergraduates today are taught computer science as well as biology and that's a good thing in my opinion.

      No, they aren't and no it isn't.

    2. I've been working with computers in my research since 1968 and I've never found that I had to learn what NP-hard really means or algorithm theory.

      You are interested in phylogeny, aren't you? If you want to conduct research on new methods of phylogenetic inference (as I did in my postdoc), you certainly *do* have to learn about things like NP-hard and algorithms. If you don't, you may *use* computers and programs but you can't really understand them.

      Your frequent poster Joe Felsenstein is in the National Academy largely because he didn't just stick to his background in genetics but learned computing and statistics to develop maximum likelihood phylogeny.

      No, they aren't and no it isn't.

      Evidently you are out of touch with what is going on in your own department! Please consult:

      "The Bioinformatics and Computational Biology Specialist Program provides a balance between computer science, mathematics and statistics, and biochemistry, molecular and cellular biology and genetics"

    3. "The Bioinformatics and Computational Biology Specialist Program

      There are three or fewer students in each year of that program and all are computer science majors who want to learn some biology so they can become bioinformaticians. They do poorly in the biochemistry courses so many of them drop out.

      You must be really stupid to think that I don't know what courses university undergraduates take and what goes on in my own department. I taught bioinformatics for several years but what students need to know about bioinformatics is trivial compared to the discipline of computer science.

      We tried to set up a bioinformatics program in the 1990s but we decided that the cultures were so different that students could not become competent in either discipline. Some of my colleagues gave it a go in 2004. Turns out I was right.

    4. Toronto’s loss then. The rest of the world understands that real bioinformatics (involving algorithmic design and not just running BLAST and what not) and the related field of biostatistics is the key to making sense of the big data we are facing these days.

    5. I've written a few programs in my time but I don't pretend to be knowledgeable about algorithm theory. I don't know anything about biostatistics. In spite of these major deficiencies in my interdisciplinary education, I was able to make sense of the ENCODE data.

      Unfortunately, the "sense" that I made was very different from that of the experts in algorithms and biostatistics. Isn't that strange?

    6. Jonathan Badger,

      Surely interdisciplinary science cannot just mean a biologist adopting a new tool? Hardly any science can work without statistics, so any science would be interdisciplinary, and then the word is empty. (If everybody is special, nobody is, etc.)

      Bioinformatics is a funny one, by the way. At my alma mater I got the impression that a bioinformatician is somebody who does computer science (programming, preferably in several languages) and at the same time understands enough biology to put their programming skills to use to solve biological problems. But at my current institution I have run into people who call themselves bioinformaticians because they know how to use one (1) statistical software package and have no knowledge of biology whatsoever. Is that a local thing or has the meaning of the term changed since I was at uni?

    7. Well, over time interdisciplinary programs do become mainstream. Molecular biology was originally an interdisciplinary program with elements from physics, chemistry and genetics, but eventually it was expected that all biologists (and not just those in "molecular biology" departments) could handle them. But it wasn't that way even as late as the 1980s, where "naturalists" even proudly declared their ignorance of molecular methods.

      But, say, biostatistics is decades from really being integrated into biology proper. It isn't just the ability to do a t-test in Excel or what not -- it's the ability to understand how experiments need to be designed (and with how many samples) in order to get a meaningful result. This really is an interdisciplinary mixture of statistics and biology.

      No, the meaning of bioinformatician hasn't changed, but yes, you can find people with very limited computational or biological skills calling themselves bioinformaticians because the demand for bioinformatics is large.

    8. In my experience, the problem with comp sci people working with biologists is that many want to solve comp sci problems with "spherical cows", and the same with mathematicians, statisticians and physicists. Some 10% are exceptions. The exceptions are actual problem solvers instead of comp scientists, mathematicians, etc. They put the effort into understanding the problems which are seldom as simple as initially related by biologists.
      We used to talk about bridging scientists. Sometimes I think they are translators: people capable of speaking biology and math, or biology and algorithms or such.
      I've seen many examples of failed collaborations. It's easy to blame the mathematicians, comp. scientists, statisticians for their failure to learn enough to understand the actual problem that requires understanding (and they are sometimes to blame) but I have often seen biologists incapable of expressing the problem of interest in an intelligible way. There is sadly a great deal of incompetence to go around.

    9. OK, I can do that. In my experience, the problem with comp sci people working with scientists is that they want to solve a cool problem, but the cool problem turns out not to be the problem you needed to solve for biological research, because that problem wasn't cool enough. The people who have helped out my particular field through computer science have been almost exclusively biologists, e.g. Dave Swofford, John Huelsenbeck, and, dare I say it, Joe Felsenstein.

    10. @John, yes. Mathematicians tend to want to solve problems that are mathematically interesting instead of biologically interesting --- if they are at heart mathematicians. If they are at heart problem solvers they reach out into biology to better understand the problem(s) and mathematics is just a tool in their belt. These are sadly rare.
      Regarding comp.sci, I think there's a dichotomy of problem solvers and controllers. The controllers restrict things as much as possible, have a love of logical purity. The problem solvers are cowboys. The best are special hybrids.

  2. From the name, I would guess that biochemistry was once an interdisciplinary science among biologists and chemists. This was presumably so successful that it is now a discipline.

    My own area, climate science, is chock-full of interdisciplinary research. As one example, reconstructing past climates requires collaboration among meteorologists, oceanographers, geologists, biologists, chemists, historians, and archeologists.

    1. That sounds like an excellent example. Is there a difference between collaborating and doing interdisciplinary work? Does a meteorologist have to learn much biology and history or do they just have to find experts in those fields to collaborate with?

    2. It's hard to collaborate without learning anything about the other field. Often the senior people provide their expertise (old dogs and all) while the graduate students doing the work gain expertise in multiple areas, and take courses from multiple departments. If something is important and has depth, an interdisciplinary area can become its own sub-field. Ocean biogeochemistry is an example that developed out of the need to understand the ocean carbon cycle.

  3. As someone in the software development field my first comment is that there is no such thing as computer science. It's not science, it's not engineering, at best it's a craft with no theoretical underpinnings and no methodologies.

    There is a hodge-podge of cargo cult type practices such as structured programming, object oriented and agile development, scrums and a boat load of faddish practices glommed onto by desperate management willing to do anything short of sacrificing a chicken to gain some measure of control over the software development behemoth*.

    No doubt a good computer "science" program consists of a grounding in mathematics such as calculus, algebra, discrete mathematics, numerical analysis and so on but make no mistake, your typical software developer abandons this knowledge soon after graduation.

    I was lucky enough to take some undergraduate chemistry and biology courses, which while not making me "interdisciplinary" in any sense of the word did make me a sort of dumpster diver of science.

    * Not to be confused with Michael Behe

    1. Yes. Programming/software development alone is not computer science. Things like time complexity analysis are what make it scientific. I've seen self-taught programmers write a script that would be O(n^4) on millions of datapoints and wonder why it never finishes.
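
To put a rough number on the O(n^4) remark above, here is a minimal back-of-the-envelope sketch. The function name `estimated_years` and the rate of 10^9 simple operations per second are illustrative assumptions, not measurements from anyone's script.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600

def estimated_years(n, exponent, ops_per_second=1e9):
    """Rough wall-clock estimate, in years, for about n**exponent simple steps.

    ops_per_second is an assumed throughput, not a benchmark.
    """
    return (n ** exponent) / ops_per_second / SECONDS_PER_YEAR

if __name__ == "__main__":
    n = 10**6  # "millions of datapoints"
    for exponent in (1, 2, 4):
        print(f"O(n^{exponent}) with n={n}: ~{estimated_years(n, exponent):.3g} years")
```

Under those assumptions the quartic case comes out in the tens of millions of years, which is why such a script "never finishes" while a linear or quadratic pass over the same data is quite feasible.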

    2. Algorithms? No science there either? I guess it depends on how science is defined. Never mind.

  4. Strikes me that practice at interdisciplinary work of any kind, including research, is an opportunity to improve one's skills at communicating effectively with those in other disciplines. This is necessary every day as part of my current job (and of most jobs in my profession that I can think of).

  5. I've looked into the IPhD in Mathematics program at UMKC. While it looks good on paper and I understand the purpose of the program, I can't justify the expense. The last time I checked I would have had to take 6 computer sci courses (most logical choice of what's available) in addition to the Math PhD workload. Honestly, it seems like a way the university can bleed more money from the grad students.

  6. There are big communications problems between computer scientists and biologists (I speak from my experience as former head of my university's Computational Molecular Biology program). Years ago experts in complexity of algorithms used to talk to biologists and have some problem suggested to them. They would analyze it a bit and then declare that "it is NP-complete!" Then they would leave. Leaving the biologist with the problem.

    There was also a tendency of computer scientists to ignore statistics and probability. They would try to find a deterministic algorithm that would go from Data to Answer without worrying about how uncertain that answer would be. My colleague Phil Green incorporated statistics and probabilities when he wrote his sequence assembly programs Phred and Phrap. That turned out to be a big win. But generally one needs to get a statistician to stand between the biologist and the computer scientist in order to have a really fruitful collaboration.

    1. Yes, there was certainly a lot of biological naivety among computer scientists trying to get into computational biology in the late 1990s-early-2000s when I was a postdoc.

      There would be computer scientists who thought sequence assembly was a trivial problem because it was just a matter of finding the shortest common superstring, with no understanding that sequencing error made that impossible. Statisticians understand that data has errors.
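
The point about sequencing error can be illustrated with a toy sketch. This is a greedy exact-overlap heuristic for the shortest common superstring, not a real assembler and not how Phred/Phrap work; the genome string, the reads, and the function names are all made up for illustration.

```python
def overlap(a, b):
    """Length of the longest suffix of a that is also a prefix of b."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0

def greedy_superstring(reads):
    """Greedy heuristic: repeatedly merge the pair with the largest exact overlap."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    # Drop reads that are wholly contained in another read.
    reads = [r for r in reads if not any(r != s and r in s for s in reads)]
    while len(reads) > 1:
        best = (-1, None, None)
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    k = overlap(a, b)
                    if k > best[0]:
                        best = (k, i, j)
        k, i, j = best
        merged = reads[i] + reads[j][k:]
        reads = [r for idx, r in enumerate(reads) if idx not in (i, j)] + [merged]
    return reads[0]

if __name__ == "__main__":
    genome = "ATGCGTACGTTAGC"                              # hypothetical toy genome
    clean_reads = ["ATGCGTAC", "CGTACGTT", "ACGTTAGC"]     # error-free reads
    noisy_reads = ["ATGCGTAC", "CGTTCGTT", "ACGTTAGC"]     # one substituted base
    print(greedy_superstring(clean_reads) == genome)       # exact overlaps line up
    print(greedy_superstring(noisy_reads) == genome)       # the error breaks them
```

With error-free reads the exact overlaps chain together and the toy genome is recovered. A single substituted base is enough to destroy those overlaps, and any superstring of the noisy reads necessarily contains the erroneous read, so it cannot equal the true sequence. Handling that requires a model of the errors, which is where the statistics comes in.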

    2. It's not my intent to get into a turf war here. My reply is motivated, in part, by my respect for what UW has accomplished over the years in the area of computational biology.

      If a computer scientist tells you that a problem is NP-complete then that computer scientist has told you something very valuable. Namely, the problem as formulated is computationally intractable. There are two ways to proceed: 1) look for approximate solutions or 2) reformulate the problem into something that is computationally tractable. Failing that, it is appropriate for the (theoretical) computer scientist to walk away.

      There is a communications problem.

      On the computational side, there are computer scientists, mathematicians, probabilists and statisticians, each with their own sub-fields, interests and overlaps. These days, the term "machine learning" is an umbrella term for what I would otherwise call computational statistics. Machine learning researchers can be found in multiple academic departments, not the least of which is computer science.

      If you're interested (and have not already done so) talk with Ed Lazowska at UW. He's a computer systems guy, but a strong proponent of the notion that computer science is no longer a discipline that can assume error free behaviour.

    3. Mathematicians and physicists were particularly prone to make another mistake. They had mathematical skills far beyond those of most biologists. So they would waltz into biology, expecting to clean up by applying standard techniques from their field.

      And every time they tried to do this, they would discover that the method they proposed was already known. Because theoretically-inclined biologists had already been ransacking the mathematical literature, desperately looking for useful techniques.

      Branch-and-bound? Oh yes, that was used by Hendy and Penny, 1982. Hidden Markov Models in molecular biology? Gary Churchill, 1989. Belief propagation on Bayesian networks? That is equivalent to what statistical geneticists discovered in 1970 and called "peeling", and I think those phylogeny folks did some of this as well. EM algorithm? Wasn't that "pre-invented" by the statistical geneticist Cedric Smith in 1954 and called "gene counting" (and reviewed by him rather broadly in 1957)?

      Invading mathematicians and physicists have had a few successes, but far fewer than they expected.

    4. Bob Woodham: I have met Ed Lazowska, but more often communicate with Martin Tompa and Larry Ruzzo, from Ed's department. They are computer scientists (algorithmists) who have learned a lot of biology and know that statistics is important. They got started about the time that Richard Karp was lured to Seattle for a few years by promises (that could not in the end be redeemed). He was and is interested in biology, and he used to lecture the computer scientists on how important statistics was. And since this was Richard Karp Himself, they sat up and listened.

      And yes, knowing a problem is NP-nasty is important. But having the computer scientist act as if that is the end of the story is unhelpful. Some of them did act that way, at least in the 1980s and 1990s. I think that they have learned better now.

    5. Joe: Tompa, Ruzzo and Karp are great representatives of my discipline. You identify them as algorithmists who know that statistics is important. I just did a quick check of UW's CSE web site on the machine learning group

      "Computational Biology" is listed as one of the group's interests. Of the 11 "core faculty" listed under "people," 3 include some flavour of computational biology as among their interests.

      It might be worth checking out what these people are doing to see if collaboration possibilities with CS have improved (from your perspective).

    6. Joe: Sorry, my attempt to embed a URL seems not to have worked. Here it is as raw text:

    7. "If a computer scientist tells you that a problem is NP-complete then that computer scientist has told you something very valuable. Namely, the problem as formulated is computationally intractable. There are two ways to proceed: 1) look for approximate solutions or 2) reformulate the problem into something that is computationally tractable."

      The thing is that finding a problem to be NP hard tells you nothing about practical computation. Last week I was talking to a molecular biologist. He had a problem that was O(2^n) and considered not bothering with the problem, because it was obviously NP hard. But for his datasets the maximum value of n was 20. It took us about half an hour to write a program that went through an n=8 dataset in less than a second, which of course means that the n=20 case would take 2 hours at worst. Sure it won't help if n=80, but that's not a practical problem he was facing.
      Another example comes from a recent paper of mine. The general problem is O(2^m), but for n<24 or so it works in practice. For some special cases it is possible to split up your data and get an O(2^n) and an O(2^(m-n)) problem. You can sometimes do this multiple times. Doing this makes each calculation slower by a factor of about 1000, but when you go from m=60 to 3 n=20 runs, you have a massive tempo gain. This doesn't work on all possible datasets, but it always works for the special case the method was first developed for and while we're looking at somewhat more general datasets, we haven't come across one where we couldn't do it.
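
The arithmetic behind both examples can be sketched as follows. The helper names are hypothetical, and the model assumes the running time is exactly proportional to 2^n, which is a simplification of the O(2^n) statement.

```python
def extrapolated_seconds(measured_seconds, measured_n, target_n):
    """Scale a measured runtime, assuming cost doubles with each extra unit of n."""
    return measured_seconds * 2 ** (target_n - measured_n)

def brute_force_cost(n, per_step=1.0):
    """Abstract cost of enumerating 2**n cases at per_step cost units each."""
    return per_step * 2 ** n

if __name__ == "__main__":
    # First example: at most 1 second for n=8, extrapolated to n=20 and n=80.
    print(extrapolated_seconds(1.0, 8, 20) / 3600, "hours at n=20")
    print(extrapolated_seconds(1.0, 8, 80) / (3600 * 24 * 365.25), "years at n=80")

    # Second example: one m=60 run versus three n=20 runs,
    # each step of which is assumed to be 1000 times slower.
    whole = brute_force_cost(60)
    split = 3 * brute_force_cost(20, per_step=1000.0)
    print("speed-up from splitting: about", round(whole / split))
```

An upper bound of one second at n=8 scales to 2^12 = 4096 seconds at n=20, comfortably inside the "2 hours at worst" figure. For the split, 2^60 steps against 3 x 1000 x 2^20 steps is a gain of roughly 3.7 x 10^8, even after paying the thousandfold penalty per calculation.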

    8. Simon: You make a valid point. If the size of the problem you're dealing with is small enough, then the asymptotic complexity as n increases is not the essential issue. My comments made the (unstated) assumption that computer science researchers aren't typically invited to collaborate with the biologists until the size of the problem has become a constraint on practical computation. Nor did I mean to imply that proclaiming a problem NP-complete is the end of the story. Indeed, I did suggest ways to proceed. I would characterize the subsequent examples you cited as reformulating the problem into something that is computationally tractable.

      Let me use Sudoku as my example. Sudoku is NP-complete. That's not a useful insight if you play standard 9x9 Sudoku. Trivial trial-and-error search algorithms can be written to solve 9x9 Sudoku puzzles in essentially real-time. Indeed, there is a proliferation of Sudoku solvers on the internet written in every computer language imaginable, more to illustrate idiomatic differences in the programming languages than to reveal insight into the nature of Sudoku. But, what about 16x16 Sudoku with hexadecimal "digits," 25x25 Sudoku with "digits" drawn from an alphabet of 25 characters, nxn Sudoku for arbitrarily large n? That's where the NP-completeness kicks in. But, that's not the end of the story. Suppose you had a large n Sudoku puzzle to solve (and invited me to consult). I would point you to a class of algorithms known as network consistency algorithms. Some of these algorithms are linear time. I would apply network consistency to your Sudoku puzzle. Network consistency sometimes is powerful enough to solve a particular puzzle. In general, it won't, leaving you with a (reduced) problem that is still (theoretically) NP-complete. But, the reduced problem may now be computationally tractable (in practice), using the same trial-and-error search methods used for the 9x9 puzzle.

      I think it is a miscommunication of the first order to think that computer scientists, even theoretical computer scientists, are not relevant when it comes to practical computation.
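
The Sudoku strategy described in this comment (consistency first, then trial-and-error on whatever is left) can be sketched roughly as below. This is one simple reading of "network consistency", namely arc consistency on the pairwise "these two cells differ" constraints; it is not claimed to be the linear-time variant mentioned, and all names here are illustrative.

```python
from math import isqrt

SYMBOLS = "123456789ABCDEFGHIJKLMNOP"  # enough symbols for grids up to 25x25

def peers_of(size):
    """Map each cell (row, col) to the cells sharing its row, column, or box."""
    box = isqrt(size)
    peers = {}
    for r in range(size):
        for c in range(size):
            same = {(r, k) for k in range(size)} | {(k, c) for k in range(size)}
            br, bc = box * (r // box), box * (c // box)
            same |= {(br + i, bc + j) for i in range(box) for j in range(box)}
            peers[(r, c)] = same - {(r, c)}
    return peers

def parse(grid):
    """grid: list of equal-length strings; any non-symbol (e.g. '.') is empty."""
    symbols = set(SYMBOLS[:len(grid)])
    return {(r, c): ({ch} if ch in symbols else set(symbols))
            for r, row in enumerate(grid) for c, ch in enumerate(row)}

def propagate(domains, peers):
    """Remove each decided value from its peers until nothing changes.

    Returns False if some cell is left with no candidates.
    """
    queue = [cell for cell, d in domains.items() if len(d) == 1]
    while queue:
        cell = queue.pop()
        (value,) = domains[cell]
        for p in peers[cell]:
            if value in domains[p]:
                domains[p] = domains[p] - {value}
                if not domains[p]:
                    return False
                if len(domains[p]) == 1:
                    queue.append(p)
    return True

def solve(domains, peers):
    """Propagate, then branch on the undecided cell with the fewest candidates."""
    domains = dict(domains)
    if not propagate(domains, peers):
        return None
    undecided = [c for c, d in domains.items() if len(d) > 1]
    if not undecided:
        return domains
    cell = min(undecided, key=lambda c: len(domains[c]))
    for value in sorted(domains[cell]):
        trial = dict(domains)
        trial[cell] = {value}
        result = solve(trial, peers)
        if result is not None:
            return result
    return None

if __name__ == "__main__":
    puzzle = ["1..4", ".4..", "..4.", "4..1"]  # small 4x4 example
    solution = solve(parse(puzzle), peers_of(4))
    for r in range(4):
        print("".join(next(iter(solution[(r, c)])) for c in range(4)))
```

The same code runs unchanged on 9x9, 16x16 or 25x25 grids. The worst case is still exponential, as the NP-completeness result says it must be, but propagation often shrinks the search enough that plain backtracking finishes on the puzzle in hand.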

    9. I think it is a miscommunication of the first order to think that computer scientists, even theoretical computer scientists, are not relevant when it comes to practical computation.

      I did not make that statement nor would I endorse it. But in the particular case of being told that a problem is NP, it's not always a helpful one (and in the first example my colleague had consulted with a computer scientist who had told him the problem was NP and left it at that). I've had very productive interactions with computer scientists and I've had very unproductive ones as well and my impression is that the productive ones occur when a problem is interesting to both parties.

  7. Crossing boundaries sometimes fosters,
    Escape from assumed paternosters.
    New ways of thought on information
    That squeezes through each generation.
    Blinkered biochemists come to see,
    Little room for redundancy.
    Give 'em time, they'll come to say,
    Is there junk in DNA?

  8. This comment has been removed by the author.

  9. Fascinating. Some consider biochemistry interdisciplinary as you need to understand chemistry well, much better than many biologists, and you need to understand biology as well. I would add you need to understand kinetics, which is really in addition to what many chemists or biologists learn. Personally, I happened to pick up a good deal of physics, information theory, and computer science because of the problems I was working on. The computer bit came easy to me. I tried to teach it to people who are smarter than me but they didn't think the right way and can't write an algorithm worth running.
    Interdisciplinary is important. For example, you might begin with a biological question, have an experimental tool rooted in physics, that involves some chemistry and signals that wind up in data that needs interpretation. Really understanding the physics of a detector helps you identify "peaks" from a mass spectrometer, understanding gas-phase ion chemistry and the physics of the machine helps understand how to make a better machine and to optimally tune the one you have. And puzzling through the data requires understanding all the types and sources of noise. And only knowing all that allows one to build interpretive algorithms worth a damn. It's quite interdisciplinary. You don't always have to be expert at all of it, but skilled enough to converse effectively with experts, especially being able to ask the right questions.
    One thing that happens when you take the interdisciplinary route is that you get a great deal of grief from some who think that doing so is a bad thing. Maybe that's from seeing the approach fail so badly at times. There's some horrible bioinformatics out there, including where both the biology and the computer science are poor. Just mixing multiple disciplines isn't enough, you have to do it well.
    One more example, the breakthroughs in looking at single neurons firing required that biologists become experts in electronics and data processing.

  10. I understand they think cross breeding enhances the herd. They think that learning other people's studies will sharpen one up with new imagination etc.
    It might.
    Largely not I think.
    It's all about affecting intelligence curves.

  11. Is this whole "interdisciplinary" thing just a fad? Do you know anyone whose main area of investigation spans two distinct disciplines?

    To some degree interdisciplinary is a buzzword attached to research and that makes it a fad. The key question is whether there are interesting research questions that require background knowledge in multiple disciplines. To some degree interdisciplinary research is marred by the fact that often there aren't and the selling point is that the research is interdisciplinary. There are additional issues which arise from how science is done in practice, which I'll discuss below.

    I know people doing interdisciplinary research. Me for a start. Deriving divergence date estimates from relaxed molecular clock models requires aligned sequences, a molecular phylogeny derived from these sequences, calibration dates taken from fossils, which require the fossils to be placed phylogenetically and assigned absolute ages.
    Now, if you combine these you end up with the following things to do:
    - Collecting samples
    - Sequencing
    - Alignment
    - Molecular phylogeny
    - Morphological phylogeny to assign fossils
    - Searching for relevant fossils
    - Assigning ages to the fossils and placing them in the phylogeny
    - Running the relaxed clock model
    This process requires morphologists, molecular biologists, biomathematicians and paleontologists for the various steps. My advisory board has one of each...
    For quite some time the review process for this was inadequate, because journals would not look for reviewers covering all disciplines. For a review on how bad this can go, there's the excellent Graur and Martin paper (2004, "Reading the entrails of chickens: molecular timescales of evolution and the illusion of precision", Trends in Genetics, 20:80-86). An interdisciplinary paper should be reviewed by reviewers from all relevant disciplines. If there are methods taken from some discipline, these need to be checked by people who understand them...
    My main focus is on taking paleontological data that so far has not been used to inform this process to improve it. I.e. I'm looking at molecular questions and looking for ways that fossil data can be employed to get better answers.

    1. SG
      actually your inter exchange is geology paradigms and THEN biology data.
      I say these two disciplines are not compatible to drawing conclusions aimed at one of the disciplines.
      its geologists that you are relying on though you don't list them.
      Indeed geology has no place in biology research. its an absurdity if a claim of science is made in these things.

  12. In considering this topic, we should not forget that Francis Crick, co-discoverer of the structure of DNA, was a physicist by training.