Sandwalk: Revisiting the genetic load argument with Dan Graur

Friday, July 14, 2017

The genetic load argument is one of the oldest arguments for junk DNA and it's one of the most powerful arguments that most of our genome must be junk. The concept dates back to J.B.S. Haldane in the late 1930s but the modern argument traditionally begins with Hermann Muller's classic paper from 1950. It has been extended and refined by him and many others since then (Muller, 1950; Muller, 1966).

Several prominent scientists have used the genetic load data to argue that most of our genome must be junk (King and Jukes, 1969; Ohta and Kimura, 1971; Ohno, 1972). Ohno concluded in 1972 that ...

... all in all, it appears that calculations made by Muller, Kimura and others are not far off the mark in that at least 90% of our genome is 'junk' or 'garbage' of various sorts.

It's important to keep in mind that the genetic load argument is one of the Five Things You Should Know if You Want to Participate in the Junk DNA Debate. It's also very important to understand that this is positive evidence for junk DNA based on fundamental population genetics. It refutes the popular view that the idea of junk DNA is just based on not knowing all the functions of our genome. There's delicious irony in being accused of argumentum ad ignorantiam by those who are ignorant.

I've discussed genetic load several times on this blog (e.g. Genetic Load, Neutral Theory, and Junk DNA) but a recent paper by Dan Graur provides a good opportunity to explain it once more. The basic idea of genetic load is that a population can only tolerate a finite number of deleterious mutations before going extinct. The theory is sound but many of the variables are not known with precision.

Let's see how Dan handles them in his paper (Graur, 2017). In order to calculate the genetic load (or mutation load), we need to know the size of the genome, the mutation rate, and the percentage of mutations that are deleterious. Dan Graur assumes that the diploid genome size is 6.114 × 10⁹ bp based on accurate cytology measurements from 2010. I think the DNA sequence data is more accurate so I would use 6.4 Gb. The difference isn't important.

There's a huge literature on mutation rates in humans. We don't know the exact value because there's a fair bit of controversy in the scientific literature. The values range from about 70 new mutations per generation to about 150 [see: Human mutation rates - what's the right number?]. Graur uses a range of mutation rates covering these values. He expresses them as mutations per site per generation, which translates to values from 1.0 × 10⁻⁸ to 2.5 × 10⁻⁸. As we shall see, he calculates the genetic load for a range of mutation rates in order to get an upper limit to the amount of functional DNA in our genome.
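The two ways of expressing the mutation rate are related by a simple multiplication by genome size. Here is a minimal sketch in Python, assuming the diploid genome size of 6.114 × 10⁹ bp quoted above; the function name is made up for illustration.

```python
# Convert a per-site, per-generation mutation rate into the expected
# number of new mutations per diploid genome per generation.

DIPLOID_GENOME_BP = 6.114e9  # diploid genome size quoted in the post


def mutations_per_generation(rate_per_site, genome_bp=DIPLOID_GENOME_BP):
    """Expected number of new mutations in one newborn (illustrative helper)."""
    return rate_per_site * genome_bp


# The two ends of the range of per-site rates mentioned in the post.
for rate in (1.0e-8, 2.5e-8):
    count = mutations_per_generation(rate)
    print(f"{rate:.1e} per site per generation -> about {count:.0f} new mutations")
```

With this genome size, the two per-site rates correspond to roughly 61 and 153 new mutations per generation, which brackets the 70 to 150 range from the literature.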

The most difficult part of these calculations is estimating the percentage of mutations that are beneficial, neutral, and deleterious. Population geneticists have rightly assumed that the number of beneficial (selected) mutations is insignificant so they concentrate on the number of deleterious mutations. The estimates range from about 4% of the total mutations to about 40% of the total based on the analysis of mutations in coding regions.

Most scientists assume that the correct value is about 10% of the total. What this means is that if there are 100 new mutations in every newborn there will be about 10 deleterious mutations if the entire genome is functional. If only 10% is functional then there will be only 1 deleterious mutation per generation. A mutation load of about one deleterious mutation per generation is the limit that a population can tolerate. Graur assumes 0.99. Others have proposed that the mutation load could be higher (Lynch, 2010; Agrawal and Whitlock, 2012) but it's unlikely to be more than 1.5. The difference isn't important.
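The arithmetic in that paragraph can be written out as a small sketch. It assumes, as the paragraph does, that mutations land uniformly across the genome and that only mutations in functional DNA can be deleterious; the function and parameter names are made up for illustration.

```python
# Expected number of deleterious mutations per newborn, given the total
# number of new mutations, the fraction of mutations in functional DNA that
# are deleterious, and the fraction of the genome that is functional.


def deleterious_per_newborn(total_mutations, fraction_deleterious, functional_fraction):
    """Illustrative helper: assumes mutations are spread evenly over the genome."""
    return total_mutations * fraction_deleterious * functional_fraction


# 100 new mutations, 10% deleterious, entire genome functional.
all_functional = deleterious_per_newborn(100, 0.10, 1.0)

# Same mutation rate, but only 10% of the genome functional.
tenth_functional = deleterious_per_newborn(100, 0.10, 0.10)

print(f"Whole genome functional: about {all_functional:.0f} deleterious mutations")
print(f"10% of genome functional: about {tenth_functional:.0f} deleterious mutation")
```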

Graur calculates a range of deleterious mutation rates (μ_del) based on multiplying the percentage of deleterious mutations times the total number of mutations.
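As a rough illustration, the extremes of that range can be obtained by pairing the lowest and highest values quoted above. This sketch assumes the endpoints are simply the products of the extreme values; the intermediate values in Graur's table are not reproduced here, and the names are made up for illustration.

```python
# Deleterious mutation rate per site per generation (mu_del), computed as
# total mutation rate multiplied by the fraction of mutations that are deleterious.

MUTATION_RATES = (1.0e-8, 2.5e-8)     # per site per generation (from the post)
DELETERIOUS_FRACTIONS = (0.04, 0.40)  # 4% to 40% (from the post)


def deleterious_rate(mutation_rate, fraction_deleterious):
    """Illustrative helper: mu_del = mu * fraction of mutations that are deleterious."""
    return mutation_rate * fraction_deleterious


lowest = deleterious_rate(min(MUTATION_RATES), min(DELETERIOUS_FRACTIONS))
highest = deleterious_rate(max(MUTATION_RATES), max(DELETERIOUS_FRACTIONS))

print(f"lowest  mu_del: {lowest:.1e} per site per generation")
print(f"highest mu_del: {highest:.1e} per site per generation")
```

Under these assumptions μ_del spans about 4 × 10⁻¹⁰ to 1 × 10⁻⁸ per site per generation.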

The other variable is the replacement level fertility of humans (F). Think of it this way: if every child has a significant number of deleterious mutations then the population can still survive if every couple has a huge number of children. Statistically, some of them will have fewer deleterious mutations and those ones will survive. If F = 50 then in order to get one survivor each person needs to have 50 children (or each couple needs to have 100 children).

Historical data suggests that the range of values goes from 1.05 to 1.75 per person (2.1 to 3.5 children per couple). Graur makes the reasonable assumption that the maximum sustainable replacement level fertility rate is 1.8 per person in human populations over the past million years or so.

The important part of the Graur paper is the table he constructs where he estimates the number of deleterious mutations by combining the mutation rate and the percentage of deleterious mutations on the y-axis and the fraction of the genome that may be functional on the x-axis. At the intersection of each value he calculates the minimum replacement level fertility values required to sustain the population.

Let's look at the first line in this table. The deleterious mutation rate is calculated using the lowest possible mutation rate and the smallest percentage of deleterious mutations (4%). Under these conditions, the human population could survive with a fertility value of 1.8 as long as less than 25% of the genome is functional (i.e. 75% junk) (red circle). That's the UPPER LIMIT on the functional fraction of the human genome.
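To see roughly where a number like 25% comes from, here is a sketch that uses a standard textbook approximation rather than the exact expression in Graur's paper. It assumes that mean fitness at mutation-selection balance is about e^(-U), where U is the number of new deleterious mutations per diploid genome per generation, so that the fertility needed to compensate is F = e^U. The function names are made up for illustration.

```python
import math

DIPLOID_GENOME_BP = 6.114e9  # diploid genome size quoted in the post
MAX_FERTILITY = 1.8          # maximum sustainable fertility per person (from the post)


def required_fertility(mu_del, functional_fraction, genome_bp=DIPLOID_GENOME_BP):
    """Fertility needed to offset the mutation load.

    Assumption: mean fitness is approximately exp(-U), so F = exp(U),
    where U is the deleterious mutation rate per diploid genome.
    """
    u = genome_bp * functional_fraction * mu_del
    return math.exp(u)


def max_functional_fraction(mu_del, max_fertility=MAX_FERTILITY,
                            genome_bp=DIPLOID_GENOME_BP):
    """Largest functional fraction compatible with the fertility ceiling."""
    fraction = math.log(max_fertility) / (genome_bp * mu_del)
    return min(1.0, fraction)


# First row of the table: lowest mutation rate and 4% deleterious.
mu_del_lowest = 1.0e-8 * 0.04
limit = max_functional_fraction(mu_del_lowest)
print(f"Upper limit on the functional fraction: about {limit:.2f}")
```

Under this approximation the limit for the first row comes out a little under 0.25, in line with the upper limit described above. Higher values of μ_del push the limit lower.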

But that limit is quite unreasonable. It's more reasonable to assume about 100 new mutations per generation with about 10% deleterious. Using these assumptions, only 10% of the genome could be functional with a fertility value of 1.8 (green circle).

Whatever the exact percentage of junk DNA, it's clear that the available data and population genetics point to a genome that's mostly junk DNA. If you want to argue for more functionality then you have to refute this data.

Note: Strictly speaking, the genetic load argument only applies to sequence-specific DNA where mutations have a direct effect on function. Some DNA serves as necessary spacers between functional sequences and this DNA will only be affected by deletion mutations. From what we know right now, this is a small percentage of the genome. However, there are bulk DNA hypotheses that attribute non-sequence specific function to most of the genome and if they are correct the genetic load argument carries no weight. So far, there is no good evidence that these bulk DNA hypotheses are valid and most objections to junk DNA are based on sequence-specific functions.

Agrawal, A. F., and Whitlock, M. C. (2012) Mutation load: the fitness of individuals in populations where deleterious alleles are abundant. Annual Review of Ecology, Evolution, and Systematics, 43:115-135. [doi: 10.1146/annurev-ecolsys-110411-160257]

Graur, D. (2017) An upper limit on the functional fraction of the human genome. Genome Biol Evol evx121 [doi: 10.1093/gbe/evx121]

King, J.L., and Jukes, T.H. (1969) Non-darwinian evolution. Science, 164:788-798. [PDF]

Lynch, M. (2010) Rate, molecular spectrum, and consequences of human mutation. Proceedings of the National Academy of Sciences, 107:961-968. [doi: 10.1073/pnas.0912629107]

Muller, H.J. (1950) Our load of mutations. American Journal of Human Genetics, 2:111-175. [PDF]

Muller, H.J. (1966) The gene material as the initiator and the organizing basis of life. American Naturalist, 100:493-517. [PDF]

Ohno, S. (1972) An argument for the genetic simplicity of man and other mammals. Journal of Human Evolution, 1:651-662. [doi: 10.1016/0047-2484(72)90011-5]

Ohta, T., and Kimura, M. (1971) Functional organization of genetic material as a product of molecular evolution. Nature, 233:118-119. [PDF]

84 comments:

Anonymous, Friday, July 14, 2017 12:42:00 PM
Although this https://www.quantamagazine.org/missing-mutations-suggest-a-reason-for-sex-20170713/ doesn't use the phrase "genetic load," it seems to be addressing the same topic to, I think, greatly different effect. Any comment?
Markk, Friday, July 14, 2017 2:11:00 PM
To me this shows that the word junk is bad in describing these parts of the genome. It should be called the buffer area or something like that.
Eric, Friday, July 14, 2017 5:37:00 PM
If I am understanding the data correctly, what it actually points to is the maximum size of human functional genome. Humans could have a genome that is nearly >90% functional if the genome was about 600 kbp.

On the flip side, if the size of the human functional genome was larger, say 1.2 Mbp, then this would tend to select for lower mutation rates. There might be a balancing act between the size of the functional genome and the fidelity of DNA replication during meiosis.
daedalus2u, Saturday, July 15, 2017 11:26:00 AM
Humans have a haploid life stage, which may be important in removing deleterious mutations. Data on other eukaryotes seems to indicate that selection in the haploid life stage does have F1 and longer effects.

Presumably these effects would be important in humans; in both haploid gametes.

http://www.pnas.org/content/early/2017/07/10/1705601114.short

If I may speculate, senescence in haploid gametes may be a low-cost “feature” that culls deleterious mutations from the F1 generation. Senescence in the adult (observed in essentially all organisms with a haploid life stage) may be an unavoidable consequence of senescence of haploid gametes.
Faizal Ali, Saturday, July 15, 2017 1:57:00 PM
Why are people so averse to the idea of junk DNA? Other than creationists, of course. I know why they don't like the idea.
Bill Cole, Sunday, July 16, 2017 12:42:00 PM
Transcriptional landscape of repetitive elements in normal and cancer human cells

Article · July 2014
DOI: 10.1186/1471-2164-15-583

Cancer cells are acting similar to how cells behave during embryo development. This paper shows increased activity of repeat sequences in cancer cells. The DNA that appears to be junk when measured in adult cells may be very active during embryo development. If DNA was degrading due to generational mutation why would there be conservation of repetitive sequences?
Bill Cole, Sunday, July 16, 2017 5:54:00 PM
"Cancer cell lines display increased RNA Polymerase II binding to retrotransposons than cell lines derived from normal tissue. Consistent with increased transcriptional activity of retrotransposons in cancer cells we found significantly higher levels of L1 retrotransposon RNA expression in prostate tumors compared to normal-matched controls.

Conclusions
Our results support increased transcription of retrotransposons in transformed cells, which may explain the somatic retrotransposition events recently reported in several types of cancers."

Increased expression levels show a change based on the cell being in cancer or rapid cell division mode. This is usually accompanied by activation of embryonic pathways. The question is whether increased expression levels are indicative of function during cell division.
Bill Cole, Sunday, July 16, 2017 9:04:00 PM
This paper shows increased activity during embryo development. Again, embryo pathways are activated in cancer cells.

"RESEARCH ARTICLE OPEN ACCESS
Exploratory bioinformatics investigation reveals importance of “junk” DNA in early embryo development
Steven Xijin Ge
BMC Genomics 2017, 18:200
DOI: 10.1186/s12864-017-3566-0
Received: 13 October 2016; Accepted: 7 February 2017; Published: 23 February 2017
Abstract

Background
Instead of testing predefined hypotheses, the goal of exploratory data analysis (EDA) is to find what data can tell us. Following this strategy, we re-analyzed a large body of genomic data to study the complex gene regulation in mouse pre-implantation development (PD).

Results
Starting with a single-cell RNA-seq dataset consisting of 259 mouse embryonic cells derived from zygote to blastocyst stages, we reconstructed the temporal and spatial gene expression pattern during PD. The dynamics of gene expression can be partially explained by the enrichment of transposable elements in gene promoters and the similarity of expression profiles with those of corresponding transposons. Long Terminal Repeats (LTRs) are associated with transient, strong induction of many nearby genes at the 2-4 cell stages, probably by providing binding sites for Obox and other homeobox factors. B1 and B2 SINEs (Short Interspersed Nuclear Elements) are correlated with the upregulation of thousands of nearby genes during zygotic genome activation. Such enhancer-like effects are also found for human Alu and bovine tRNA SINEs. SINEs also seem to be predictive of gene expression in embryonic stem cells (ESCs), raising the possibility that they may also be involved in regulating pluripotency. We also identified many potential transcription factors underlying PD and discussed the evolutionary necessity of transposons in enhancing genetic diversity, especially for species with longer generation time.

Conclusions
Together with other recent studies, our results provide further evidence that many transposable elements may play a role in establishing the expression landscape in early embryos. It also demonstrates that exploratory bioinformatics investigation can pinpoint developmental pathways for further study, and serve as a strategy to generate novel insights from big genomic data.

Keywords

Single-cell RNA-seq; Exploratory data analysis; Pre-implantation development; Early embryogenesis; Transposons; Repetitive DNA"
Aceofspades, Monday, July 17, 2017 6:47:00 AM
Larry writes:

> Let's look at the first line in this table. The deleterious mutation rate is calculated using the lowest possible mutation rate and the smallest percentage of deleterious mutations (4%).

If μdel in this row is calculated based on the smallest percentage of deleterious mutations (4%) then aren't we already assuming for this row that the functional fraction of the genome is 4%? Why do we then go on to compare this value of μdel to other functional fractions of the genome?

That is the one thing I don't understand about this paper - it seems that the variable μdel already incorporates the functional fraction of the genome into it - yet in the table, it is plotted against the functional fraction of the genome.
judmarc, Monday, July 17, 2017 1:14:00 PM
[T]here are bulk DNA hypotheses that attribute non-sequence specific function to most of the genome and if they are correct the genetic load argument carries no weight. So far, there is no good evidence that these bulk DNA hypotheses are valid and most objections to junk DNA are based on sequence-specific functions.

Is there a way to tell the difference between "bulk" DNA that serves a function, albeit non-sequence-specific, and junk that is there by accident (and because effective population sizes aren't large enough) but serves at least to give mutations a somewhat safer place to go?

I can think of widely varying genome size in reasonably closely related species. Anything else?
David, Monday, July 17, 2017 2:32:00 PM
I blame ENCODE for offering up such provocative findings that are contingent upon the interpretability of such an ambiguous term. And, while I generally agree with Graur's thesis, I think in the end he (like ENCODE) is arguing more about semantics rather than biology. His points are good but their significance will be lost because the words being used are contextually defined and re-defined from study-to-study.

In common parlance, I don't think that even SJ Gould would deny that spandrels (literal or biologic) served *some* "function" at *some* level (i.e., you cannot have a functional arch without the spandrels). However, I believe Gould's larger caution is very much in play in the current debate--namely that trying to ascribe "function" to genomic regions in order to then *deduce* function is destined to result in an untestable tautology rather than a testable hypothesis.
unknowing, Tuesday, July 18, 2017 9:15:00 PM
On the contrary, while authors may quibble over the specifics of the exact criteria used to categorize sequences, there is a clear distinction between function as dependent on sequence specificity and the generic "functionality" of ENCODE.

Such a broad definition of functionality renders the term meaningless.

Also junk was chosen as a descriptor precisely because the lay meaning is an apt metaphor. Junk DNA has the potential to see use someday, but at present sits idle taking up genomic space.

Unknown, Thursday, July 27, 2017 11:53:00 AM
What Larry doesn't seem to understand is that much of the genome consists of functionally redundant copies of genes. As such, a deleterious mutation can be compensated for by another intact copy. So the genetic load can be tolerated by the buffering effect of gene duplicates.
Unknown, Thursday, July 27, 2017 12:05:00 PM
One more thing, Larry. Many harmful mutations have RECESSIVE phenotypes and their effects are masked by the other allele.
Aceofspades, Monday, July 31, 2017 9:46:00 AM
An experiment was conducted earlier this year to see if researchers could come up with some functional de novo genes from randomised DNA sequences. The result was that they were able to create hundreds of functional genes from random strings of DNA of length 150 bp (this should put to bed any argument from ID advocates which states that new, useful genes cannot arise from junk DNA).

In an article about the study, one of the researchers recounts:

"During my early months in the Tautz lab, while still a Master’s student, I contemplated the possibility of doing an experiment that could support de novo evolution as a general process, and so I came up with a thought experiment. I would insert random sequences in living cells, together with enough regulatory machinery to make sure they would be transcribed and translated by the host. Then, I would wait until any of those would mutate enough to “acquire a function.” It occurred to me that starting with a sufficiently large pool of random sequences would reduce the waiting time, because some would exhibit some biochemical activity upon their introduction."

https://natureecoevocommunity.nature.com/posts/16396-exploring-random-sequence-space-in-the-name-of-de-novo-genes

The results were surprising - 25% of the random sequences they generated were beneficial to the bacteria that received them and 52% inhibited growth.

Here is the paper: https://www.nature.com/articles/s41559-017-0127

My question to Prof. Moran then is: If 25% of the transcripts from these random bits of DNA were able to promote growth in E. coli then wouldn't that imply that the same might be possible for us?

Maybe 25% of our own transcripts which are not evolutionarily conserved are also conferring some small transient advantage? It might be that this advantage is not strong enough to be selected for and that is why it is transient - a few tens of thousands of generations from now perhaps most will be mutated out of existence, but by then other new transcripts that are somewhat beneficial will have popped up in the meantime.
Eli, Saturday, November 16, 2019 5:33:00 PM
You and Graur might be interested in this recent preprint (https://www.biorxiv.org/content/biorxiv/early/2019/09/30/785865.1.full.pdf)
João, Monday, April 13, 2020 12:24:00 AM
Larry, do you know if Graur responded or commented on the following paper by Galeota-Sprung et al., 2019?

Mutational Load and the Functional Fraction of the Human Genome

"We find that the functional fraction is not very likely to be limited substantially by mutational load, and that any such limit, if it exists, depends strongly on the selection coefficients of new deleterious mutations."

https://academic.oup.com/gbe/article/12/4/273/5762616
João, Monday, April 13, 2020 4:35:00 PM
Larry, here is what Ben Galeota-Sprung commented on their paper:

https://twitter.com/SprungBen/status/1249790349725368321

João, Monday, June 05, 2023 5:55:00 PM
Michaeljf said:
"Joao (and Larry) - I am working my way through this Galeota-Sprung et al and comparing to Graur (2017)."

Thank you for your comment, Michael! I revived these comments as I read "What's in your genome". I think I brought Larry's attention to the Galeota-Sprung paper, and I'm happy this is somewhat in the book. Larry wrote:

"Using these values, Dan Graur estimated that at least 75 percent of our
genome has to be junk, and it’s likely that the actual amount of junk DNA is
closer to 90 percent. However, a more recent analysis shows that calculating
the fraction of junk DNA is a lot more difficult than Graur thought and
certainly a lot more complicated than the simplistic calculations that I
presented earlier[4]."

Footnote 14 refers to Graur (2017) and Galeota-Sprung et al. (2020).

:)