Comments on Sandwalk: Confusion about the number of genes
Blog author: Larry Moran

[2017-07-04 09:36]
Larry, you asked how they decided there were 14,727 lncRNA genes (now more than 15,000). From the link: "lincRNA (Long intergenic non-coding RNAs) Ensembl gene annotation, cDNA alignments and chromatin-state map data from the Ensembl regulatory build are used to predict lincRNAs for human and mouse. We do not import the lincRNAs identified by Guttman et al [1], but their publication guided us to our current approach for automatically annotating lincRNAs. First, regions of chromatin methylation (H3K4me3 and H3K36me3) outside known protein-coding loci are identified. Next, cDNAs which overlap with H3K4me3 or H3K36me3 features are identified as candidate lincRNAs. A final evaluation step investigates if each candidate lincRNA has any protein-coding potential. Any candidate lincRNA containing a substantial open reading frame (ORF) covering 35% or more of its length and containing PFAM/tigrfam protein domains will be rejected. Candidate lincRNAs that pass the final evaluation step are included in the human or mouse gene set as lincRNA genes."

All the lincRNAs are predictions. Once they make their way into the annotation, they can be validated by various means such as RT-PCR-seq (Howald, 2012). At present, 4,609 of the lincRNA transcripts (not genes) are tagged as experimentally validated.

How many of these have a function is a whole new question.
And, especially if you are talking about known function, I suspect the answer is very few.
Posted by Michael Tress

[2017-07-04 09:12]
"This means that ... genes ... have arisen in each of the genomes over the past [...] 100 million years."

That 100 million years assumes unchanging or slowly changing clocks. There is much confusion over the difference between the generation-to-generation mutation rate and the long-term genetic clock, about how fast the latter changes, and about what makes it change. What are your views about the differences and the variability?
Posted by strangetruther

[2017-07-04 06:47]
That link is also useless. At best, it helps us understand a bit about which RNAs they are going to consider, but it says nothing about how they decide which ones have a function (= gene) and which ones are spurious transcripts.
Posted by Larry Moran

[2017-07-04 05:40]
Try this link. As you can see from the last paragraph (lincRNA), they are predicted using cDNA alignments and chromatin-state maps:
http://www.ensembl.org/info/genome/genebuild/ncrna.html
Posted by Michael Tress

[2017-07-03 17:21]
I don't find that very helpful. Do you?
If you can figure out how they decided there are 14,727 lncRNA genes, then please share it with me.
Posted by Larry Moran

[2017-07-03 15:47]
You will find the different procedures Ensembl uses for genome annotation at http://www.ensembl.org/info/genome/genebuild/genome_annotation.html
Posted by SPARC

[2017-07-03 06:35]
"I don't know what method Ensembl uses to identify a functional transcript. Are these splice variants of protein-coding genes?"

GENCODE (the manual annotation arm of Ensembl) now has 128,000 transcripts annotated in mouse (https://www.gencodegenes.org/mouse_stats/current.html), so either the paper is a bit out of date or there is a discrepancy between Ensembl and GENCODE for some reason.

The transcripts are always predictions (the correct term is transcript models) and are based on the balance of the available evidence. Obviously, some transcript models have better evidence than others. They aren't all coding transcripts or transcripts from coding genes; only 55,000 of the models are predicted to be coding. The full (long) list of annotated transcript types is at the bottom of the page.
Posted by Michael Tress
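Read literally, the three lincRNA steps in the Ensembl description quoted in the first comment amount to a simple filter. The Python below is a minimal sketch of that reading, not Ensembl's actual pipeline. The record types (Interval, CDNA), their fields, the half-open coordinates, and the interpretation of the rejection rule (reject only when the ORF covers at least 35% of the transcript and at least one Pfam/TIGRFAM domain is present) are all assumptions made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Interval:
    """Hypothetical genomic interval on one chromosome, half-open [start, end)."""
    chrom: str
    start: int
    end: int

    def overlaps(self, other: Interval) -> bool:
        # Two half-open intervals overlap if they share a chromosome
        # and each one starts before the other ends.
        return (self.chrom == other.chrom
                and self.start < other.end
                and other.start < self.end)


@dataclass
class CDNA:
    """Hypothetical record for an aligned cDNA; field names are illustrative."""
    name: str
    locus: Interval
    length: int                 # transcript length in nt (assumed > 0)
    longest_orf: int            # longest open reading frame in nt
    protein_domains: list[str] = field(default_factory=list)  # Pfam/TIGRFAM hits


def candidate_lincrnas(cdnas: list[CDNA],
                       histone_marks: list[Interval],
                       coding_loci: list[Interval],
                       orf_cutoff: float = 0.35) -> list[CDNA]:
    # Step 1: keep H3K4me3/H3K36me3 regions outside known protein-coding loci.
    intergenic_marks = [m for m in histone_marks
                        if not any(m.overlaps(c) for c in coding_loci)]
    # Step 2: cDNAs overlapping any remaining mark become candidate lincRNAs.
    candidates = [d for d in cdnas
                  if any(d.locus.overlaps(m) for m in intergenic_marks)]
    # Step 3: reject a candidate only if its ORF covers >= orf_cutoff of its
    # length AND it contains at least one protein domain (literal reading).
    return [d for d in candidates
            if not (d.longest_orf / d.length >= orf_cutoff and d.protein_domains)]


# Toy usage with made-up coordinates.
coding = [Interval("chr1", 1000, 5000)]
marks = [Interval("chr1", 2000, 2500),   # inside the coding locus: discarded
         Interval("chr1", 8000, 8600)]   # intergenic: kept
cdnas = [
    CDNA("tx_a", Interval("chr1", 8100, 9000), 900, 120),                 # short ORF: passes
    CDNA("tx_b", Interval("chr1", 8200, 9200), 1000, 600, ["domain_x"]),  # 60% ORF + domain: rejected
    CDNA("tx_c", Interval("chr1", 2100, 2400), 300, 30),                  # only overlaps a discarded mark
]
print([d.name for d in candidate_lincrnas(cdnas, marks, coding)])  # ['tx_a']
```

Note that no step in the sketch tests whether a transcript does anything: a cDNA passes simply by overlapping an intergenic histone mark and lacking a long, domain-bearing ORF.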