Scientists first began to get a glimpse of the organization of mammalian genomes about 40 years ago when they looked at the overall complexity using hybridization technology. It soon became apparent that most of the genome was made up of short stretches of DNA that were repeated thousands of times. One major component of this repetitive DNA was about 6000 bp in length. These sequences were called Long Interspersed Elements or LINEs. The other component was much shorter, about 300 bp. These were called Short Interspersed Elements or SINEs.
We now know that LINEs are a form of retrotransposon. The major human LINE is called L1 and it has two open reading frames (ORFs) that are similar to the gag and pol genes of typical retrotransposons [Retrotransposons].
The LINE sequence (blue, above) is organized like a typical gene with a 5′ untranslated region (5′ UTR) and a 3′ untranslated region (3′ UTR). There are two open reading frames (ORFs) that together encode an RNA binding protein, a reverse transcriptase, and an endonuclease similar to the retrovirus integrase. Like most transposons, L1 is flanked by a short repeated section of genomic DNA.
The role of the RNA binding protein has not been fully worked out but the roles of the reverse transcriptase and endonuclease proteins are known. When the L1 sequence is transcribed, the RNA can be copied into double-stranded DNA and this copy can be integrated into the genome at a site cleaved by the endonuclease.
The copy-integration scheme is shown in the figure on the left from Current Genetics: junk DNA - repetitive sequences.
The net effect of this mechanism is to spread a copy of L1 to another part of the genome. Thus, L1 is a typical selfish DNA transposon.
The human genome contains about 500,000 copies of L1 but the vast majority are fragments of various sizes. Most of the fragments are missing the 5′ end and they presumably arose when the copying mechanism failed to completely copy the L1 mRNA from the 3′ end. About 10,000 copies are full length (6000 bp) and of these 80-100 are known to be "active." Active L1s have intact ORFs and they are regularly transcribed.
About 17% of your genome is composed of L1 LINEs and fragments. They are one of the major sources of junk DNA in your genome.
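The copy number and the genome fraction can be checked against each other with some quick arithmetic. The sketch below is only a rough consistency check, not a measurement: it takes the 17%, 500,000 copies, 10,000 full-length copies, and 6000 bp figures from the text, and it assumes a haploid human genome of roughly 3.2 billion bp, a number that does not appear above. The function name `l1_summary` is made up for illustration.

```python
# Rough consistency check on the L1 figures quoted in the text.
GENOME_SIZE_BP = 3_200_000_000  # ASSUMPTION: approximate haploid human genome size
L1_FRACTION = 0.17              # from the text: about 17% of the genome
L1_COPIES = 500_000             # from the text: total copies, mostly fragments
FULL_LENGTH_COPIES = 10_000     # from the text: full-length copies
FULL_LENGTH_BP = 6_000          # from the text: length of a complete L1


def l1_summary(genome_size_bp=GENOME_SIZE_BP):
    """Return the total L1 sequence, the mean copy length, and the
    share of L1 sequence that sits in full-length copies."""
    total_l1_bp = L1_FRACTION * genome_size_bp
    mean_copy_bp = total_l1_bp / L1_COPIES
    full_length_total_bp = FULL_LENGTH_COPIES * FULL_LENGTH_BP
    return {
        "total_l1_bp": total_l1_bp,
        "mean_copy_bp": mean_copy_bp,
        "full_length_share": full_length_total_bp / total_l1_bp,
    }


if __name__ == "__main__":
    for name, value in l1_summary().items():
        print(f"{name}: {value:,.2f}")
```

Under these assumptions the average copy comes out at roughly 1,100 bp, far short of the 6000 bp of a complete element, and the full-length copies account for only about a tenth of all L1 sequence. That fits the description of most copies as truncated fragments.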
The important point to remember is that the active L1 LINEs are constantly producing reverse transcriptase in human cells. This enzyme can copy any available RNA into double-stranded DNA. It is responsible for most of the pseudogenes that litter our genome, contributing to the mass of functionless DNA known as junk.