In spite of what you might have read in the popular literature, there are not a large number of newly formed genes in most species. Genes that appear to be unique to a single species are called "orphan" genes. When a genome is first sequenced there will always be a large number of potential orphan genes because the gene prediction software tilts toward false positives in order to minimize false negatives. Further investigation and annotation reduces the number of potential genes.
The human genome has a number of duplicated genes that aren't found in our closest relatives but, for the most part, these are transient duplications and the extra gene will soon be deleted or disabled (it becomes a pseudogene). The interesting new genes are de novo genes—new genes that are not derived from a gene duplication event. There are about 60 potential de novo genes in our genome [see How many genes do we have and what happened to the orphans?].
New potential genes tend to be small and if they have an open reading frame then the potential protein is less than 100 amino acids long. They are usually transcribed at very low levels and often the transcripts are only detected in a few types of tissue; notably, brain or testes. Both of these tissues are famous for having lots of transcripts.
Only a small number of these potential genes will turn out to be actual genes with a functional RNA [What Is a Gene?]. The problem is that it's hard to prove that a potential de novo gene is really a gene (i.e. has a biological function).
Ruiz-Orera et al. (2015) are the latest workers to give it a try. They analyzed the human, chimpanzee, macaque, and mouse genomes for regions that were frequently transcribed to produce a transcript that was at least 300 nucleotides long. This identifies potential genes. They compared the four genomes to find examples that were only expressed in humans and/or chimpanzees but where similar nontranscribed sequences were present in macaque or macaque and mouse genomes.
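The filtering logic described above can be sketched in a few lines of Python. This is only an illustration of the criteria as summarized here, not the authors' pipeline: the `Locus` record, its field names, and the `classify` function are hypothetical, and the real analysis involves transcript assembly and genome alignment steps that are not shown.

```python
from dataclasses import dataclass

MIN_LENGTH_NT = 300  # minimum transcript length mentioned in the text


@dataclass
class Locus:
    """Hypothetical record for one transcribed region (not the paper's data model)."""
    name: str
    length_nt: int
    expressed_in: frozenset  # species in which a transcript was detected
    sequence_in: frozenset   # species in which a similar DNA sequence exists


def classify(locus: Locus) -> str:
    """Return a lineage label, or 'excluded' if the locus fails the filters."""
    if locus.length_nt < MIN_LENGTH_NT:
        return "excluded"
    hominoids = {"human", "chimpanzee"}
    # Expression must be confined to human and/or chimpanzee.
    if not locus.expressed_in or not locus.expressed_in <= hominoids:
        return "excluded"
    # A similar, untranscribed sequence must be present in macaque
    # (mouse is optional under the criteria as summarized above).
    if "macaque" not in locus.sequence_in:
        return "excluded"
    if locus.expressed_in == {"human"}:
        return "human-specific"
    if locus.expressed_in == {"chimpanzee"}:
        return "chimpanzee-specific"
    return "hominoid-specific"


# Made-up example loci, for illustration only.
primates = frozenset({"human", "chimpanzee", "macaque"})
example = Locus("locus_A", 450, frozenset({"human"}), primates)
print(classify(example))
```

A locus that is also transcribed in macaque or mouse is excluded, because it is then not specific to the human/chimpanzee lineage.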
The result was 634 human-specific transcribed regions, 780 that were chimpanzee-specific, and 1,300 that were only found in humans and chimps. Only 51% of these transcribed regions were found in intergenic regions. The other 49% were found within known genes: 38% within introns and 11% overlapping exons. (Recall that about 70% of our genome is DNA that's between genes [What's in Your Genome?].)
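As a back-of-the-envelope check on those numbers, the snippet below totals the three categories and compares the observed share of transcripts inside genes with the share you would expect if new transcripts arose uniformly across the genome. The uniformity assumption is mine, added only for illustration; the function name is hypothetical and the paper does not frame the result this way.

```python
# Counts of lineage-specific transcribed regions quoted in the text.
counts = {
    "human-specific": 634,
    "chimpanzee-specific": 780,
    "human and chimpanzee": 1300,
}


def genic_enrichment(observed_intergenic: float, expected_intergenic: float) -> float:
    """Ratio of the observed to the expected fraction of transcripts within genes."""
    observed_genic = 1 - observed_intergenic
    expected_genic = 1 - expected_intergenic
    return observed_genic / expected_genic


total = sum(counts.values())
# 51% observed in intergenic DNA versus about 70% of the genome being intergenic.
ratio = genic_enrichment(observed_intergenic=0.51, expected_intergenic=0.70)
print(total, round(ratio, 2))  # prints: 2714 1.63
```

Under that simplifying assumption, the new transcripts fall within known genes about 1.6 times more often than expected by chance.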
We want to know if these are real genes or just spurious transcripts. The first clue is that 94% of these transcribed regions are expressed in testes. That's a tissue where chromatin is being reformed and DNA is much more exposed than in other tissues. You expect more spurious transcription in testes cells.
These potential de novo genes are not conserved, by definition, so you can't use sequence conservation as evidence of function. Instead, the authors looked for evidence of proteins/peptides encoded by the transcripts. This eliminates all possible genes that may have a functional noncoding RNA but it's a start.
They found one human-specific peptide and 6 hominoid-specific peptides by mass spectrometry. By looking at ribosome-associated RNAs they identified 5 additional human-specific and 10 hominoid-specific transcripts. Thus, there are 21 potential de novo protein-coding genes. The median size of the peptides is 76 amino acid residues.
Next they looked for signatures of purifying selection by comparing the number of substitutions in the transcribed regions in the macaque lineage and the human/chimpanzee lineages. The sequences in the macaque lineage are expected to accumulate mutations at the rate of mutation (neutral rate) but there should be fewer mutations in the human and chimpanzee lineages if the sequences have a new function. They conclude, "... in de novo genes in general there was not a significant decrease in the number of substitutions in the longest ORF when compared to neutrally evolving sequences, suggesting that the majority of these transcripts do not encode functional protein."
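The logic of that comparison can be illustrated with a simple two-proportion z-test on substitution counts. This is a simplified stand-in that I am assuming for illustration; it is not the statistical procedure used in the paper, and the function name and the counts in the example are made up.

```python
import math


def two_proportion_z(subs_a: int, sites_a: int, subs_b: int, sites_b: int):
    """Compare substitutions per site in two lineages.

    Returns (z, two-sided p-value) using the pooled normal approximation.
    A negative z means lineage A has fewer substitutions per site than B.
    """
    rate_a = subs_a / sites_a
    rate_b = subs_b / sites_b
    pooled = (subs_a + subs_b) / (sites_a + sites_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / sites_a + 1 / sites_b))
    z = (rate_a - rate_b) / std_err
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts: hominoid lineage (A) versus macaque lineage (B),
# each measured over the same number of aligned sites.
z, p = two_proportion_z(subs_a=48, sites_a=1000, subs_b=50, sites_b=1000)
print(f"z = {z:.2f}, p = {p:.2f}")
```

If the human/chimpanzee sequences were under purifying selection you would expect a clearly negative z and a small p-value. Similar rates in both lineages, as in this made-up example, are what neutral evolution looks like.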
The conclusion is that there are very few de novo protein-coding genes in the human genome but,
"Our results indicate that the expression of new loci in the genome takes place at a very high rate and is probably mediated by random mutations that generate new active promoters. These newly expressed transcripts would form the substrate for the evolution of new genes with novel functions."
This is important because it shows us that generation of new genes from "random" sequences is not difficult.
Carvunis, A.-R., Rolland, T., Wapinski, I., Calderwood, M.A., Yildirim, M.A., Simonis, N., Charloteaux, B., Hidalgo, C.A., Barbette, J., Santhanam, B., Brar, G.A., Weissman, J.S., Regev, A., Thierry-Mieg, N., Cusick, M.E., and Vidal, M. (2012) Proto-genes and de novo gene birth. Nature, 487:370-374. [doi: 10.1038/nature11184]
Kaessmann, H. (2010) Origins, evolution, and phenotypic impact of new genes. Genome Research, 20:1313-1326. [doi: 10.1101/gr.101386.109]
Long, M., Betran, E., Thornton, K., and Wang, W. (2003) The origin of new genes: glimpses from the young and old. Nat Rev Genet, 4:865-875.
Long, M., VanKuren, N. W., Chen, S., and Vibranovski, M. D. (2013) New gene evolution: little did we know. Annual Review of Genetics, 47:307. [doi: 10.1146/annurev-genet-111212-133301]
Näsvall, J., Sun, L., Roth, J. R., and Andersson, D. I. (2012) Real-time evolution of new genes by innovation, amplification, and divergence. Science, 338:384-387. [doi: 10.1126/science.1226521]
Neme, R., and Tautz, D. (2013) Phylogenetic patterns of emergence of new genes support a model of frequent de novo evolution. BMC Genomics, 14:117. [doi: 10.1186/1471-2164-14-117]
Ruiz-Orera, J., Hernandez-Rodriguez, J., Chiva, C., Sabidó, E., Kondova, I., Bontrop, R., Marqués-Bonet, T., and Albà, M. (2015) Origins of de novo genes in human and chimpanzee. PLoS Genet, 11:e1005721. [doi: 10.1371/journal.pgen.1005721]
Schlötterer, C. (2015) Genes from scratch–the evolutionary fate of de novo genes. Trends in Genetics, 31:215-219. [doi: 10.1016/j.tig.2015.02.007]
Tautz, D., and Domazet-Lošo, T. (2011) The evolutionary origin of orphan genes. Nature Reviews Genetics, 12:692-702. [doi: 10.1038/nrg3053]
Wu, D.-D., Irwin, D.M., and Zhang, Y.-P. (2011) De novo origin of human protein-coding genes. PLoS Genet, 7:e1002379. [doi: 10.1371/journal.pgen.1002379]