I've thought a lot about how to define the word "gene." It's clear that no definition will capture all the possibilities but that doesn't mean we should abandon the term. Traditionally, the biochemical definition attempts to describe the part of the genome that produces a functional product. Most scientists seem to think that the only possible product is a protein so it's common to see the word "gene" defined as a DNA sequence that produces a protein.
But from the very beginning of molecular biology the textbooks also talked about genes for ribosomal RNAs and tRNAs so there was never a time when knowledgeable scientists restricted their definition of a gene to protein-coding regions. My best molecular definition is described in What Is a Gene?.
A gene is a DNA sequence that is transcribed to produce a functional product.
Dan Graur has also thought about the issue and he comes up with a different definition in a recent blog post:
What Is a Gene? A Very Short Answer with a Very Long Footnote
A gene is a sequence of genomic material (DNA or RNA) that has a selected effect function.
This is obviously an attempt to equate "function" with "gene" so that all functional parts of the genome are genes, by definition. You might think this is rather silly because it excludes some obvious functional regions but Dan really does want to count them as genes.
Performance of the function may or may not require the gene to be translated or even transcribed.
Genes can, therefore, be classified into three categories:
(1) protein-coding genes, which are transcribed into RNA and subsequently translated into proteins.
(2) RNA-specifying genes, which are transcribed but not translated
(3) nontranscribed genes.
Really? Is it useful to think of centromeres and telomeres as genes? Is it useful to define an origin of replication as a gene? And what about regulatory sequences? Should each functional binding site for a transcription factor be called a gene?
The definition also leads to some other problems. Genes (my definition) occupy about 30% of the human genome but most of this is introns, which are mostly junk (i.e. no selected effect function). How does that make sense using Dan's definition?