Tuesday, January 11, 2011

Secret Alien Messages in Your Genome

Today is the first day of my course on molecular evolution and I want the students to experience the give-and-take of scientific—and not so scientific—debate in the blogosphere.

Their first assignment is to read the following quotation from an article by Paul Davies and answer the question that follows.

Paul Davies is a professor at Arizona State University. He was trained as a physicist and he lists his interests as cosmology, quantum field theory, and astrobiology. The quotation is from an article he wrote last April in the Wall Street Journal [Is Anybody Out There?: After 50 years, astronomers haven't found any signs of intelligent life beyond Earth. They could be looking in the wrong places.]

Another physical object with enormous longevity is DNA. Our bodies contain some genes that have remained little changed in 100 million years. An alien expedition to Earth might have used biotechnology to assist with mineral processing, agriculture or environmental projects. If they modified the genomes of some terrestrial organisms for this purpose, or created their own micro-organisms from scratch, the legacy of this tampering might endure to this day, hidden in the biological record.

Which leads to an even more radical proposal. Life on Earth stores genetic information in DNA. A lot of DNA seems to be junk, however. If aliens, or their robotic surrogates, long ago wanted to leave us a message, they need not have used radio waves. They could have uploaded the data into the junk DNA of terrestrial organisms. It would be the modern equivalent of a message in a bottle, with the message being encoded digitally in nucleic acid and the bottle being a living, replicating cell. (It is possible—scientists today have successfully implanted messages of as many as 100 words into the genome of bacteria.) A systematic search for gerrymandered genomes would be relatively cheap and simple. Incredibly, a handful of (unsuccessful) computer searches have already been made for the tell-tale signs of an alien greeting.

Here's the question. Assume that the aliens inserted a 1000 bp message in the same place in the genomes of every member of our ancestral population from five million years ago. At that point every organism in the species had exactly the same message in a region of junk DNA.

If you were to sequence that very same region of your own genome what would the message look like today? Would it be different from the original message of five million years ago? Is there a way of reconstructing the original message and interpreting it?

Comments will be held until tomorrow evening in order to give everyone a fair shot at coming up with an answer.

Photo Credit: Lieutenant Ellen Ripley communicates with aliens.


  1. Theoretically, it should be possible. I know that a future mutation can build upon previously inactive genes, so it might be possible for some of this junk DNA to have changed, but I don't think so.

    It seems at least somewhat plausible to me.

  2. Since I have no measurable knowledge of biochemistry I won't embarrass myself by delving into how fast 2 kilobits would deteriorate for any given rate and which statistical method should be used to salvage information from a population sample. I'll probably make a severe methodological mistake early on by sheer ignorance of elementary biochemical facts.

    So instead I will just say that if I had the job of encoding some message in non-coding DNA, I would try for erosion resistance and message distinctiveness.

    Therefore, to detect anomalies I would run some SETI tests on junk DNA regions with significant preservation (and only those of unknown function).

    Then again I'm just clueless, so I'm looking forward to informed comments.

  3. fair shot at coming up with an answer

    You mean a nanosecond is not enough?

  4. Human generation time of 20 years (from http://www.genetics.org/cgi/content/full/156/1/297).

    Stated interval is 5x10^6 years which is 2.5x10^5 generations.

    The 1000 base pair (bp) message is assumed to be in a non coding area of the genome ("junk" DNA) so errors in replication are not selected against.

    Each symbol/bp in the message is either A-T or G-C so we have a binary/1 bit encoding of each base pair.

    The human mutation rate is from 1.3x10^-8 to 2.7x10^-8 (from http://www.genetics.org/cgi/content/full/156/1/297).

    Use a human mutation rate 2.5x10^-8 (from http://www.genetics.org/cgi/content/full/156/1/297).

    The human diploid genome contains 7x10^9 bp (MARSHALL 1999) and thus ~175 new mutations per generation.

    Chance of a mutation in any base pair of the message over 5x10^6 years is .00625 (about 0.6%).

    We can expect 6.25 bp to be modified in the message over 5x10^6 years.

    Use of forward error correction (FEC) in the message would allow the message to be reconstructed with a reallocation of some of the bp of the message to error detection and correction purposes.

    Selection of the FEC code would depend on whether errors during replication of the message occurred in bursts or were random single bp errors.

    In data storage and transmission applications ~12% of the data stream is used for FEC so there would be ~120 bp used for FEC purposes and ~880 bp available for the actual message.
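
The arithmetic in this comment can be checked with a few lines of Python. This is only a minimal sketch: it uses the figures quoted in the comment (20-year generations, 2.5x10^-8 mutations per base pair per generation, a 1000 bp message), the variable names are made up for illustration, and it assumes every site mutates independently at a constant rate along a single lineage.

```python
# Figures quoted in the comment above; variable names are illustrative only.
YEARS = 5_000_000
GENERATION_TIME_YEARS = 20
MUTATION_RATE = 2.5e-8      # per base pair per generation
MESSAGE_LENGTH_BP = 1000

generations = YEARS / GENERATION_TIME_YEARS

# Linear approximation: reasonable because rate * generations is far below 1.
p_site_changed = MUTATION_RATE * generations
expected_changed_bp = p_site_changed * MESSAGE_LENGTH_BP

print(f"generations: {generations:,.0f}")
print(f"chance a given bp changed: {p_site_changed:.5f} ({p_site_changed:.3%})")
print(f"expected changed bp in the message: {expected_changed_bp:.2f}")
```

Under these assumptions the per-site chance works out to 0.00625, which is 0.625%, and the expected number of changed base pairs is 6.25.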

  5. The real question is, what kind of message could possibly be encoded in DNA by Aliens? The Aliens would first have to know how we would name nucleotides in the future. (i.e., that we call Adenosine A, and thus give the 'message' a much needed vowel) Then, as if that isn't unrealistic enough, they would have to know what languages we speak and presumably how to give short form versions of the words in these languages (they don't have the entire alphabet at their disposal).

    To answer the original question:
    If the message was in a piece of junk DNA, and hence it provides no inheritable advantage, it should be mutated over time so that it no longer resembles the original message. However, if you know the mutation rate and you know where the insertion took place, you may be able to use bioinformatics to line up the sequences of a sample population and decipher the message based on the mutation rate and probabilities.

  6. 'Junk' DNA would have mutated over the 250k or so generations that passed since the aliens' code was inserted in our genome. As a result, there would be significant changes (point mutations, insertions, deletions, duplication etc.) from the original sequence.

    It would be difficult to reconstruct the original message, even if there were thousands of available individual sequences of this genomic locus. Assume, for instance, there was a bottleneck some 100k years ago: one population of human ancestors survived. That population acquired 4.9m years' worth of mutations at our locus of interest, and we have since obtained 100k years' worth of mutation. However, the current genomic record cannot go back further than 100k years, since that is the oldest common ancestor. We might be able to determine the locus sequences at that point in time, but we could not look back further and determine, with certainty, what sort of mutations occurred before.

  7. You didn't specify whether the sequence is functional. One could argue that it would be ridiculously implausible for the aliens to implant a functional, highly conserved, sequence, and also pointless because the source of such a sequence could not be distinguished from ordinary evolution (I'm guessing this is the answer you have in mind). Or one could argue that, given the intent of the aliens to leave a lasting message, they wouldn't bother leaving anything but a functional, highly conserved, sequence. Proponents of intelligent design would presumably argue that this is not only possible, but that the existence of conserved sequences constitutes evidence for exactly this sort of tampering :-)

  8. [Cough] That picture is not of "Lieutenant Ellen Ripley". It's from Aliens, when she was a "civilian consultant".

    Just sayin' we have to keep our science fiction straight ... right Dr. Davies?

  9. Assuming that the region the 1000 bp message is in is actual junk DNA and thus under no constraints by natural selection then I would think that the message today that I have would look very different from the original message five million years ago. It is possible for all 1000 base pairs to have mutated and the mutation would be passed on from parent to child over successive generations. Of course it is very unlikely that everyone would have the same mutations and thus it would be possible to reconstruct the original message by sequencing the genome of all human populations and then by looking at the relative frequencies of each base pair in each human population (as defined by geographic region) one would be able to deduce that the highest frequency base across the largest number of human populations would most likely be the one which was in the original message. If this is done for all bases in the 1000 bp message then a close estimate of the original message could be made.

    This of course assumes that the far more technologically advanced aliens would just leave the message in a piece of junk DNA without any means of self-preservation. I find it more likely that the aliens would have preserved the message by making it critical to the survival of the organism by making any mutation to the message lethal to the organism. There are many examples of this already in modern medicine and I think that instead of looking in junk DNA people should be looking for messages in the genes which encode functions that are essential to life instead. Or possibly genes which enable higher cognitive processes which would be necessary if the aliens meant for the organisms they planted the message in to see it.
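
The "highest frequency base" idea in this comment amounts to a per-position majority vote over aligned sequences. The sketch below is a toy illustration, not a real population-genetics method: the function name `consensus` and the short sequences are invented, and it assumes the samples are already aligned and differ only by point substitutions.

```python
from collections import Counter

def consensus(sequences):
    """Return the most common base at each position across aligned sequences."""
    if not sequences:
        raise ValueError("need at least one sequence")
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("sequences must be aligned to the same length")
    # zip(*sequences) yields one column of bases per position.
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*sequences))

# Made-up data: an "original" message and four present-day copies.
original = "ACGTACGTAC"
samples = [
    "ACGTACGTAC",
    "ACGTTCGTAC",  # private substitution at position 4
    "ACGAACGTAC",  # private substitution at position 3
    "ACGTACGTAA",  # private substitution at position 9
]
print(consensus(samples) == original)

# A substitution carried by every sample (fixed in the population) is invisible
# to the vote: the consensus simply reports the mutated base.
fixed = ["G" + s[1:] for s in samples]
print(consensus(fixed)[0])
```

The second case is the weak point: a majority vote can only undo mutations that are still rare in the population, which is the same limitation other comments raise about changes that have gone to fixation.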

  10. The message would certainly not look the same. Some changes would likely have been fixed by drift over the 5 million years, and a number of variants would exist in the population as well. With the statistical tools of population genetics, we may be able to sort through the existing variation of the message, but without the ability to compare the sequence in our genomes with other species, we would probably not be able to reconstruct the true ancestral sequence. Whether or not we could interpret the message would depend on how many mutations had actually been fixed in our lineage.

  11. The message would be different, because neutral mutations could have been incorporated into the 1kbp region at any point in time. However, reconstructing the alien DNA would be a hard task. Although one might be able to store the regions of every single human into a database and sequence it to find conserved regions, the conserved regions may have descended from an earlier mutation as well (assuming alien DNA is subject to mutations and they had not developed machinery to ensure that region remains intact throughout history). Reconstructing this region, therefore, is similar to a phylogenetic tree - although we infer the evolutionary relationships based on similarities and differences, there is also a chance that all of our inferences are wrong.

  12. Steve, you also need to consider how many rounds of cellular replication occur from the initial zygote to the cell that ultimately undergoes meiosis in each individual human generation. If the pre-meiotic mass were 1e6 cells, that would be 20 more rounds of replication.

  13. I'm led to understand that the sequence in question decodes to:


  14. @Anonymous you also need to consider how many rounds of cellular replication occur from the initial zygote to the cell that ultimately undergoes meiosis in each individual human generation.

    It should be painfully obvious that I have no biochemistry/molecular genetic background.

    My take on the MARSHALL 1999 paper, which is the source of the 175 mutations per generation in a 7x10^9 bp genome and thus of the 2.5x10^-8 mutation rate, is that this is an inter-generational mutation rate and should account for all the rounds of cellular replication between generations.

  15. I estimate that there are 129 new mutations in the human genome each generation. See: Mutation Rates.

    At this rate, given that our lineage has been through several bottlenecks, the alien message would be corrupted and probably impossible to read.

  16. @Larry At this rate, given that our lineage has been through several bottlenecks, the alien message would be corrupted and probably impossible to read.

    Strictly hypothetical, if the aliens had the wherewithal to implant a message in our genome 5 mya presumably they had some idea of mutation rates and took steps to surmount this problem, otherwise why bother in the first place?

  17. @LM: Could you comment on the (apparently sourced) claim on Wikipedia of "highly conserved" regions of junk DNA?

    I don't know how junk DNA could achieve conservation. If those replicators were a program, ie a genetic heuristic, a quick-and-dirty solution would be to surround every encoding of a letter in the message alphabet with a lethal letter. So I would have error detection with scrapping.

    A sophisticated approach would use error correction by copying any mutated letter back to the original, plus layered error detection to catch higher level errors. The latter would either be lethal or activate an overwrite of the faulty portion by the mirror (redundancy).

    Also note that the message has to be learnable rather than arbitrary, otherwise we wouldn't understand it; maybe even be unable to distinguish it from a random string.

    Therefore there wouldn't be a letter "A" in the message. What names you assign to letters doesn't matter. Also putting in "Sorry for the inconvenience" would require maybe an entire book preceding the message that explains the language. And by book I'm not referring to literature, not even Adams's.

    Like Carl Sagan said: If you want to make an apple pie from scratch, you must first invent the universe.
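
The "overwrite of the faulty portion by the mirror" idea in this comment resembles a simple repetition code. The sketch below swaps in that well-known scheme purely as an illustration: it is not a biological mechanism and not necessarily what the commenter had in mind. The names `encode`, `decode` and `COPIES`, and the choice of three copies, are assumptions, and only point substitutions are handled (an insertion or deletion would shift the copies out of register).

```python
from collections import Counter

COPIES = 3  # assumed redundancy level, chosen only for illustration

def encode(message):
    """Store the message as COPIES back-to-back copies."""
    return message * COPIES

def decode(stored):
    """Recover each position by majority vote across the stored copies."""
    n = len(stored) // COPIES
    copies = [stored[i * n:(i + 1) * n] for i in range(COPIES)]
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*copies))

message = "GATTACA"
stored = encode(message)

# Simulate one point substitution (A -> T) inside the second copy.
damaged = stored[:8] + "T" + stored[9:]

print(damaged != stored)           # the stored text really was altered
print(decode(damaged) == message)  # the vote still recovers the message
```

The cost is a threefold increase in length, and the scheme fails as soon as two of the three copies are hit at the same position.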

  18. I'm an engineer, not a biologist, so I'm pretty ignorant when it comes to this. I tried going through similar calculations to Steve Oberski, and don't get that high of a chance that the message could be corrupted. Could someone point out where I'm going wrong?

    I started at the same point as Steve - 250,000 generations. Given 129 mutations per generation yields 32,250,000 mutations. Assuming a genome size of 3,000,000,000,000 base pairs, the odds of any given base pair having mutated over that time is 1.075e-5. Multiplying that times the 1000 base pairs for the message gives a less than 1 percent chance that the message would be corrupted.

    That just doesn't seem right. Where's my mistake?

    Hmm. I just followed Larry Moran's link to mutation rates. I see that my mistake is that in assuming only 129 mutations per generation, when in fact it's 129 mutations per individual. But then, how would you go about calculating how corrupted the message would be? It seems to me that following any particular lineage would still give a chance of mutation similar to what I calculated, so I still must be missing something.

  19. Oops. One small correction to my previous comment - the chance of corruption to the 1000 bp message is just over 1% by my (probably erroneous) calculation.
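
One thing worth checking in the calculation above is the genome size. The comment writes 3,000,000,000,000 (3 trillion) base pairs, whereas comment 20 works with a 3 billion bp genome. A minimal sketch of how much the answer depends on that figure, assuming mutations land uniformly across the genome and following a single lineage (the function name `expected_hits` is invented, and the question of haploid versus diploid genome size is left aside):

```python
GENERATIONS = 250_000
MUTATIONS_PER_GENERATION = 129   # new mutations per individual, as quoted in the thread
MESSAGE_LENGTH_BP = 1000

def expected_hits(genome_size_bp):
    """Expected number of mutations landing in the message along one lineage."""
    total_mutations = GENERATIONS * MUTATIONS_PER_GENERATION
    per_bp = total_mutations / genome_size_bp
    return per_bp * MESSAGE_LENGTH_BP

# Genome size exactly as written in the comment (3 trillion bp).
print(expected_hits(3_000_000_000_000))
# Genome size used in comment 20 (3 billion bp).
print(expected_hits(3_000_000_000))
```

With the 3 trillion figure the expected count is about 0.011, matching the "just over 1%" in the comment; with 3 billion it is about 10.75 expected hits in the 1000 bp message.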

  20. Well, let’s see: taking your estimate of 129 mutations per offspring, but randomly spread out over a 3 billion bp genome, yields an estimate of the probability that a given offspring will have a mutation in a particular 1000 bp region of 0.000043. That means the probability a mutation won’t be in this 1000 bp message to be 0.999957. Further, assuming 5 million years and 20 year generation time, that’s 250,000 generations. 0.999957 to the 250,000 power is .0000214. So it is very unlikely that this region won’t suffer at least one (neutral) mutation in 5 million years.

    I haven’t thought about it too deeply, but maybe you need to divide the probability of a mutation in half if it’s autosomal DNA (but maybe not: you need another for this one individual to mate with). And I think the population size is not really relevant (cancels out when factoring population mutation rate vs fixation rate).

    And once fixed in the population, no hope (other than access to ancient DNA sequences) to recover the original message (assuming the message was coded such that errors are not identifiable, unlike an erlor in an English sentence).
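
The estimate in this comment can be reproduced directly. This is a minimal sketch using only the numbers quoted above (129 mutations per offspring, a 3 billion bp genome, a 1000 bp region, 250,000 generations); the variable names are illustrative, and it assumes each generation is an independent trial along one lineage.

```python
import math

MUTATIONS_PER_OFFSPRING = 129
GENOME_SIZE_BP = 3_000_000_000
MESSAGE_LENGTH_BP = 1000
GENERATIONS = 250_000

# Chance that a given offspring carries a new mutation inside the region.
p_hit = MUTATIONS_PER_OFFSPRING * MESSAGE_LENGTH_BP / GENOME_SIZE_BP

# Chance that the region escapes mutation in every one of the generations.
p_untouched = (1 - p_hit) ** GENERATIONS

# Poisson-style approximation of the same quantity, for comparison.
p_untouched_approx = math.exp(-p_hit * GENERATIONS)

print(f"p_hit = {p_hit:.6f}")
print(f"p_untouched = {p_untouched:.3e}")
print(f"approximation = {p_untouched_approx:.3e}")
```

Both forms give roughly 2.1 in 100,000, in line with the .0000214 quoted in the comment.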

  21. DR asks,

    Could you comment on the (apparently sourced) claim on Wikipedia of "highly conserved" regions of junk DNA? [http://bit.ly/hcBh83]

    The Wikipedia article discusses functional elements in noncoding DNA. It lists a bunch of them. They are well-known, see: Genomes and Junk DNA for a partial list.

    Some functional regions of noncoding DNA may not have been identified. One way of detecting them is to compare sequences in different genomes and look for conservation of regions that don't have a known function. Such regions have been detected. They're probably not junk.

    The flip side of this coin is regions that are not conserved. Those regions are likely to be junk (with some exceptions). Most of the mammalian genome outside of transposons falls into this category. That's part of the evidence for non-functionality (i.e., junk).

  22. steve oberski says,

    Strictly hypothetical, if the aliens had the wherewithal to implant a message in our genome 5 mya presumably they had some idea of mutation rates and took steps to surmount this problem, otherwise why bother in the first place?

    I think you're missing the point. The point is that an astronomer, Paul Davies, claims that aliens could leave a message in junk DNA. He says that DNA sequences are "little changed" in 100 million years.

    I'm simply pointing out that his knowledge of genomes and evolution is about as good as my knowledge of quantum field theory.

    If I were an alien wanting to leave a message for humans I'd probably build a big shiny black slab and hide it in the desert.

  23. Larry:

    If I were an alien wanting to leave a message for humans I'd probably build a big shiny black slab and hide it in the desert.

    A friend of mine suggested the moon for something like this. It's geologically inactive, and a signal from there sent in response to radio signals from us would certainly spur technological advance to get us there.

    The problem with setting it in DNA is having to know in advance which line is likely to develop to the point it can gaze at its own deoxyribonucleic navel, and, of course, ensuring the integrity of that message for the indeterminate length of time that would take. Like you, I don't think it's a likely choice. Just an intriguing suggestion.

  24. Nobody mentioned endogenous retroviruses so far?

    They left their messages in the human genome, some of them about five million years ago. As far as I know, you can still recognize the genes and LTRs, but not every single letter.

  25. @Larry I think you're missing the point.

    I found the problem intriguing as an exercise in the reliable transmission of data through a channel that I don't deal with in my day job. This is a classic simplex data transmission scenario where there is no reverse channel that allows the receiver to signal the sender. If nothing else it was a chance to learn a bit about biochemistry and molecular genetics.

    @Larry If I were an alien wanting to leave a message for humans I'd probably build a big shiny black slab and hide it in the desert.

    Which reminds me, 2001: A Space Odyssey is being shown at the TIFF Lightbox theater in its original 70mm glory.

  26. @Larry If I were an alien wanting to leave a message for humans I'd probably build a big shiny black slab and hide it in the desert.

    When we finally get round the minor technical hurdles of traversing distances that start at 100 million trips to the moon and back, we will, of course, need to leave some kind of calling card if we only find unicellular slime. Rude not to! The message will read: "Called but you were out. Here's a big slab, plus we made some pyramids, and perched a few big stones on top of each other. Got bored waiting for you to evolve. The reading frame, by the way, is 4 bits."

  27. Certainly, a billion-year-old alien message encoded into our DNA would be easy to crack, too. Surely it couldn't be more difficult than figuring out the workings of a mathematics based on the square root of a 4th quadrant rutabaga?

  28. There are messages in every single painting. I have seen images that have been implanted in every kind of picture and painting that I have clicked on, including snaps taken from Google Earth. Most of the images are of different kinds of alien-looking creatures, and I have also found what I believe 100% is God's word, or words, letters, etc. These words/letters are in the shape of ancient creatures such as the giant sloth, woodpecker, rat, etc.
    These words/letters you can find anywhere and everywhere. I am no linguist, so I do not know what letters or words they represent, but I would say there are over 30 different ones. That's how I believe Jesus knew the name of Judas: he read his name from his face.

  29. If you're going to leave a message in DNA for hundreds of thousands of generations to decode, why not just preserve samples of the DNA in something gooey that hardens (like amber) without worrying about the mutation rate?