A reader pointed me to the ThermoFisher Scientific website. ThermoFisher Scientific is a major supplier of scientific equipment and supplies. They created their life sciences website to help inform their customers and sell more products. The page I'm interested in is: Overview of Post-Translational Modifications (PTMs). It begins with,
Within the last few decades, scientists have discovered that the human proteome is vastly more complex than the human genome. While it is estimated that the human genome comprises between 20,000 and 25,000 genes (1), the total number of proteins in the human proteome is estimated at over 1 million (2). These estimations demonstrate that single genes encode multiple proteins. Genomic recombination, transcription initiation at alternative promoters, differential transcription termination, and alternative splicing of the transcript are mechanisms that generate different mRNA transcripts from a single gene (3).
The increase in complexity from the level of the genome to the proteome is further facilitated by protein post-translational modifications (PTMs). PTMs are chemical modifications that play a key role in functional proteomics, because they regulate activity, localization and interaction with other cellular molecules such as proteins, nucleic acids, lipids, and cofactors.
The article gives a reference for the claim that the human proteome consists of over one million proteins. The reference (#2) is Jensen (2004). That's a review discussing the characterization of post-translational modifications by mass spectrometry. The only information on the number of possible proteins comes from the caption to Figure 1, which is similar to the figure shown above. The caption says,
The human genome is predicted to contain on the order of 30 000 open reading frames, each of which, on average, may produce five or six different mRNA species. Each of these mRNA species are in turn translated into proteins that are processed in various ways, generating on the order of 8–10 different modified forms of each polypeptide chain. Thus, the human genome may potentially produce on the order of (30 000 × 6 × 10) 1.8 million different protein species.
This is not an appropriate scientific reference. You don't support your own speculations by referencing others who make the same data-free speculations.
Let's pause for a moment and do a little fact-checking. It's simply not true that scientists have discovered that the human proteome is "vastly more complex than the human genome" no matter how you want to interpret that phrase. It's certainly not true that this non-fact was discovered after 1996 (past two decades).
Perhaps what the company representatives meant to say was that there are more different proteins than genes. Some of them are produced by alternative splicing. That fact has been known since the early 1980s. However, the great majority of human genes produce a single functional polypeptide chain. That doesn't count as a recent discovery that the proteome is "vastly more complex."
This figure shows 100,000 "transcripts" from 25,000 genes, implying there are 100,000 different polypeptide chains. There's no evidence to support such a claim and plenty of evidence against it.
It's true that the polypeptide chains produced by translation can be subsequently modified in several ways. That's also a fact that's been known since long before 1996. Read the textbooks from the 1980s to see descriptions of phosphorylation, glycosylation, and a host of other post-translational modifications. Some of these (e.g. phosphorylation) are involved in regulating enzyme activity, so there will always be two different versions of the protein (phosphorylated and nonphosphorylated) in the cell. That still doesn't count as "vastly more complex."
In the case of glycosylation, there will be multiple forms, from nascent unglycosylated polypeptide through half a dozen intermediates to the final glycosylated version. That's quite a few but not "vast." Besides, most of those intermediates are transient.
Now, if you combine glycosylated and phosphorylated forms you can double the number of variants, but that's very misleading. What we're really interested in is the total number of functional proteins in a cell and not the number of transient intermediates that might occur during post-translational modification.
Almost all human polypeptides are modified by removing the N-terminal methionine residue, so I suppose you could honestly say that every gene produces at least two different proteins: one with an initial methionine and one without. Technically, that makes 40,000 different proteins produced by 20,000 protein-coding genes. I don't think that's what people mean when they talk about a complex proteome. What they mean is one million proteins doing different things inside the cell. It's an attempt to explain how humans can be so complex with only (gasp!) 20,000 protein-coding genes.
There are several hundred (~300) different types of post-translational modification known. Some have been discovered recently but many have been known for a long time. I don't know how many different polypeptides (the primary product of translation) are post-translationally modified beyond removing the N-terminal methionine. I suspect it may not be a majority.
Furthermore, I don't know how many post-translational modifications actually contribute to function and how many are just due to "noise." A recent study shows that about 25% of yeast proteins are phosphorylated but fewer than 10% of phosphorylation sites are well conserved (Studer et al., 2016). The authors conclude that rapidly evolving phosphorylation sites "can contribute strongly to phenotypic diversity."
The authors of the accompanying Insights article have a slightly different take on the results (Matalon et al., 2016). They say,
Such a lack of conservation appears to contradict the textbook view that phosphorylation is strictly controlled and regulates important functions. Whereas certain phosphorylation events do surely regulate function, many may not. Edwin Krebs himself, who received the 1992 Nobel Prize with Edmond Fischer for the characterization of "reversible protein phosphorylation," noted that there likely exists a degree of phosphorylation noise.
Noise—phosphorylation events not selected to carry out a specific function—can provide a simple explanation for the weak evolutionary conservation of phosphosites. Mechanistically, the low degree of sequence specificity required for phosphorylation implies that new kinase recognition motifs can frequently emerge by chance, without having been selected for, and hence need not be conserved. Kinase promiscuity means that even noncanonical substrates may be phosphorylated occasionally, so that abundant proteins can yield subpopulations detectable with mass spectrometry.
I agree with Matalon et al. that noise is probably the explanation for many post-translational modifications.[1] This doesn't diminish the claim that the proteome is diverse but it challenges the unstated assumption that proteome complexity is deeply meaningful.
When I refer to the "myth" of proteome complexity, I'm not challenging the idea that there are many more protein variants than there are genes. We could quibble about the exact number but it's almost certainly not true that each of 25,000 protein-coding genes produces, on average, 40 different functional proteins. That's what would have to be true if there were one million different proteins in the proteome.
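For anyone who wants to check the arithmetic, here is a minimal back-of-the-envelope sketch in Python. It uses only the figures quoted in this post (the one million protein claim, the 20,000 to 25,000 gene range, and the multipliers in Jensen's figure caption). The function name variants_per_gene is my own invention for illustration and doesn't come from any of the sources.

```python
# Back-of-the-envelope check of the numbers quoted in this post.
# All inputs are the figures cited above; nothing here is new data.

def variants_per_gene(total_proteins: int, gene_count: int) -> float:
    """Average number of distinct proteins each gene would have to produce."""
    return total_proteins / gene_count

claimed_proteome = 1_000_000       # "over 1 million" proteins (ThermoFisher page)
gene_counts = (20_000, 25_000)     # range of gene estimates quoted above

# Jensen (2004), Figure 1 caption: ORFs x mRNAs per ORF x modified forms per chain
jensen_total = 30_000 * 6 * 10

for genes in gene_counts:
    per_gene = variants_per_gene(claimed_proteome, genes)
    print(f"{genes:,} genes -> {per_gene:.0f} variants per gene")

print(f"Jensen caption estimate: {jensen_total:,} protein species")
```

With 25,000 genes the claim requires an average of 40 variants per gene, and with 20,000 genes it requires 50. The caption's multiplication gives 1.8 million, but that is only the product of three assumed multipliers and not a count of observed proteins.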
I challenge the people who make this claim to show me the 40 different variants of each of the glycolytic enzymes and the enzymes of the citric acid cycle. Or the 40 different variants for each of the subunits of RNA polymerase or complexes I, II, III, and IV in the mitochondrial membrane.
1. Note the use of conservation as a proxy for function.
Jensen, O.N. (2004) Modification-specific proteomics: characterization of post-translational modifications by mass spectrometry. Current Opinion in Chemical Biology, 8:33-41. [doi: 10.1016/j.cbpa.2003.12.009]
Matalon, O., Dubreuil, B., and Levy, E.D. (2016) Young phosphorylation is functionally silent. Science, 354:176-177. [doi: 10.1126/science.aai8833]
Studer, R.A., Rodriguez-Mias, R.A., Haas, K.M., Hsu, J.I., Viéitez, C., Solé, C., Swaney, D.L., Stanford, L.B., Liachko, I., and Böttcher, R. (2016) Evolution of protein phosphorylation across 18 fungal species. Science, 354:229-232. [doi: 10.1126/science.aaf2144]