Sunday, October 07, 2007

Junk in your Genome: LINEs

Scientists first began to get a glimpse of the organization of mammalian genomes about 40 years ago when they looked at the overall complexity using hybridization technology. It soon became apparent that most of the genome was made up of short stretches of DNA that were repeated thousands of times. One major component of this repetitive DNA was about 6000 bp in length. These sequences were called Long Interspersed Elements or LINEs. The other component was much shorter, about 300 bp. These were called Short Interspersed Elements or SINEs.

We now know that LINEs are a form of retrotransposon. The major human LINE is called L1 and it has two open reading frames (ORFs) that are similar to the gag and pol genes of typical retrotransposons [Retrotransposons].


The LINE sequence (blue, above) is organized like a typical gene with a 5′ untranslated region (5′ UTR) and a 3′ untranslated region (3′ UTR). There are two open reading frames (ORFs) encoding an RNA-binding protein, a reverse transcriptase, and an endonuclease similar to the retrovirus integrase. Like most transposons, L1 is flanked by a short repeated section of genomic DNA.

The role of the RNA-binding protein has not been fully worked out but the roles of the reverse transcriptase and endonuclease proteins are known. When the L1 sequence is transcribed, the resulting RNA can be copied into double-stranded DNA and this copy can be integrated into the genome at a site cleaved by the endonuclease.

The copy-integration scheme is shown in the figure on the left from Current Genetics: junk DNA - repetitive sequences.

The net effect of this mechanism is to spread a copy of L1 to another part of the genome. Thus, L1 is a typical selfish DNA transposon.

The human genome contains about 500,000 copies of L1 but the vast majority are fragments of various sizes. Most of the fragments are missing the 5′ end and they presumably arose when the copying mechanism failed to completely copy the L1 mRNA from the 3′ end. About 10,000 copies are full length (6000 bp) and of these 80-100 are known to be "active." Active L1s have intact ORFs and they are regularly transcribed.
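The 5′ truncation described above can be pictured with a toy model. This is only a minimal sketch under simplifying assumptions: the function name `five_prime_truncated_copy` and the four-part toy element are hypothetical, and the code tracks which part of the element ends up in the copy rather than the actual complementary bases. It assumes copying starts at the 3′ end of the RNA and simply stops after some number of bases.

```python
def five_prime_truncated_copy(l1_rna: str, bases_copied: int) -> str:
    """Return the part of the element represented in the new copy.

    Toy assumption: copying starts at the 3' end of the RNA and proceeds
    toward the 5' end, stopping after `bases_copied` bases. Whatever was
    not reached (the 5' side) is missing from the inserted copy.
    """
    if not 0 <= bases_copied <= len(l1_rna):
        raise ValueError("bases_copied must be between 0 and the RNA length")
    # Keep only the last `bases_copied` characters, i.e. the 3' portion.
    return l1_rna[len(l1_rna) - bases_copied:]


# Hypothetical toy element: 5' UTR, ORF1, ORF2, 3' UTR (4 characters each).
TOY_L1 = "5555" + "AAAA" + "BBBB" + "3333"

full_copy = five_prime_truncated_copy(TOY_L1, len(TOY_L1))  # copying completed
fragment = five_prime_truncated_copy(TOY_L1, 6)             # copying stopped early
```

In this toy run, `fragment` keeps the 3′ UTR and part of ORF2 but has lost the 5′ UTR and ORF1, which is the pattern seen in most genomic L1 fragments.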

About 17% of your genome is composed of L1 LINEs and fragments. L1 is one of the major sources of junk DNA in your genome.
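As a rough sanity check, the figures quoted in the post (17% of the genome, about 500,000 copies, about 10,000 full-length copies of 6000 bp) can be combined in a few lines of Python. The genome size of about 3.2 billion bp is an assumption that is not stated in the post, and the function name `l1_summary` is made up for this sketch.

```python
GENOME_BP = 3_200_000_000   # assumption: approximate haploid human genome size
L1_FRACTION = 0.17          # from the post: about 17% of the genome
L1_COPIES = 500_000         # from the post: total L1 copies, mostly fragments
FULL_LENGTH_COPIES = 10_000 # from the post: full-length copies
FULL_LENGTH_BP = 6_000      # from the post: length of a full-length L1


def l1_summary(genome_bp, l1_fraction, copies, full_copies, full_bp):
    """Back-of-envelope averages implied by the quoted L1 numbers."""
    total_l1_bp = genome_bp * l1_fraction
    mean_copy_bp = total_l1_bp / copies
    # Subtract the full-length copies to estimate the average fragment size.
    fragment_bp = total_l1_bp - full_copies * full_bp
    mean_fragment_bp = fragment_bp / (copies - full_copies)
    return {
        "total_l1_bp": total_l1_bp,
        "mean_copy_bp": mean_copy_bp,
        "mean_fragment_bp": mean_fragment_bp,
    }


summary = l1_summary(GENOME_BP, L1_FRACTION, L1_COPIES,
                     FULL_LENGTH_COPIES, FULL_LENGTH_BP)
```

Under these assumptions the average L1 copy comes out at roughly 1,100 bp, far shorter than the 6000 bp full-length element, which is consistent with the vast majority of copies being truncated fragments.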

The important point to remember is that the active L1 LINEs are constantly producing reverse transcriptase in human cells. This enzyme can copy any available RNA into double-stranded DNA. It is responsible for most of the pseudogenes that litter our genome, contributing to the mass of functionless DNA known as junk.

5 comments :

CedricF said...

Nice summary, Larry. The LINE1 machinery is also known to be responsible for the propagation of over 1 million copies of Alu elements, which account for another 10% of our genome. Alu is the most abundant family of transposable elements in the human genome and its spread is ongoing (although at a much lower rate than in the distant past). It has been estimated that there is one new Alu insertion for every 100 human births.

What's conceptually interesting is that Alu is a parasite's parasite, which has been even more successful at propagating than LINE1 itself. It shows that the genome is a dynamic ecosystem, with many fascinating creatures.

Larry Moran said...

I've been heading for a posting about Alus ever since I named signal recognition particle as Monday's Molecule. Please be patient. There are several articles in the works.

CedricF said...

Yes, it has always been fascinating to me to think that Alu RNA has retained the ability to bind SRP. I am looking forward to reading your posts on the topic.

Sigmund said...

Don't forget that pseudogenes and endogenous retroviral sequences can only enter via the germ line. There's a lot of interesting data coming out now concerning piwi RNAs, germline-specific small functional RNAs that may work in repressing the expression of the repetitive elements.

Anonymous said...

LINEs also have regulatory elements that can alter the ways in which our genes are expressed. (my own research bias rears its ugly head again...)