
Monday, February 10, 2014

The importance of RNA-Seq and next generation

I want to draw your attention to: Genomics researchers astonished to learn microarrays still exist. I especially like this comment from the author (jovialscientist) ...
In recent years, RNA sequencing (RNA-Seq) has been favoured over microarrays. This new technology, using next-generation sequencing, is slightly more accurate, and nations have recently declared war over which is the best aligner to use.

In a recent poll, 98% of researchers answered "next-generation sequencing" to every single question – even their name, age and job title. The new science of "sequence first, think later" has been coined "nextgenomics".


38 comments :

John Harshman said...

"Sequence now, think later" seems like a fine protocol to me, just as long as you don't omit the second step. You?

Jonathan Badger said...

Yeah -- I'm tired of the argument that claims that science has to be "hypothesis-driven" as if the simplified "scientific method" taught in middle school was the law as opposed to just a description of how science *can* be done. Quite a few scientific discoveries happen from just analyzing data and finding trends in it that you wouldn't have thought to look for prior to gathering it. Of course, yes, you have to be careful about statistical significance and so forth.

Larry Moran said...

John Harshman asks,

"Sequence now, think later" seems like a fine protocol to me, just as long as you don't omit the second step. You?

Call me old-fashioned but I prefer to do my thinking BEFORE I design an experiment.

I understand the importance of data collection but the papers coming out of the ENCODE labs show us that there should be a lot more to science than just that.

How many more RNA-Seq datasets do you think we need in order to understand whether the RNAs have a function?

Larry Moran said...

Jonathan Badger says,

Quite a few scientific discoveries happen from just analyzing data and finding trends in it that you wouldn't have thought to look for prior to gathering it.

What's the most important scientific discovery that came from analyzing data from a massive RNA-Seq experiment?

If there were lots of examples then the experiments would be valuable and my criticism would be unjust.

John Harshman said...

That would seem a problem with "think later" rather than "sequence now". Of course, I'm not really thinking about RNA sequencing or "transcriptome" sequencing -- not my thing -- but about old-fashioned genome sequencing. It's just getting so cheap these days that you almost might as well do a genome as target some specific region. You can always recover the region in silico.

Joe Felsenstein said...

Just a complaint about terminology. Why are people still calling the sequencing method "next generation" sequencing? It is the dominant approach of the current generation. When another generation of sequencing methods comes along, will we find ourselves saying "back in the days of next-generation sequencing"?

This is marketing terminology, adopted uncritically. I think that some embarrassment is called for.

OK, back to you-all so you can resume your discussion of which comes first, sequencing or thinking ...

Jonathan Badger said...

I've been involved in RNA-Seq experiments in various algae, and while I wouldn't claim that any Nobel-level conclusions have come from them, they have been extremely helpful for a variety of reasons.

The first thing is single gene identification. Many algae have extremely large genomes and so sequencing them isn't very practical. But sequencing their transcriptomes *is*. We can learn a lot about the metabolic makeup of organisms this way and we have encountered many pathways that weren't thought to be in the organism.

Then of course there is differential expression. Sometimes it isn't obvious why a given pathway would be up or down regulated when the organism is, say, starved for nitrogen. We would have never looked for these cases from the beginning, but when we see them a number of times it is unlikely that they are artifacts. Instead they are telling us that biology works differently from how we thought.

It's kind of like in the early days of genomics when we would be stunned to find genes in bacterial genomes that were thought to be animal-only. Each individual observation may not be that important, but put together they change how we think biology works.

Matt Talarico said...

I've been a little confused about this. I often hear online that microarrays are obsolete due to RNA-seq. But there seems to be a consensus among my profs here at McMaster that they are indeed still used. In fact, I'm studying for a test right now and just "used" a microarray to answer an experimental design practice question. Speaking of that, I probably shouldn't be writing this post right now.

Jonathan Badger said...

For organisms like human and mouse where there are well supported gene models and arrays have already been made, they are still useful in many ways. But there isn't much point to designing new arrays for other organisms anymore, so in that sense they are dead.

Jonathan Badger said...

Yes. It's unfortunate. Particularly since there have been at least three generations of NGS and the first generation of it (454 pyrosequencing) is scheduled for retirement in a few months.

In regard to "back in the days of next-generation sequencing", I've been to a restaurant with a Space Age theme. What they meant by it wasn't that it was futuristic, but that it evoked the 1960s. So "back in the Space Age" as it were.

Georgi Marinov said...

How many more RNA-Seq datasets do you think we need in order to understand whether the RNAs have a function?

This is not why RNA-seq experiments are done the majority of the time. ENCODE was an exception. It is also done for the purpose of annotating newly sequenced genomes, but most of the RNA-seq profiling that goes on these days is simply for the purpose of measuring gene expression levels. Nobody in their right mind goes for microarrays these days, because RNA-seq is indeed vastly superior in almost everything.

So if you are questioning why we are doing RNA-seq in general, you are questioning the need for measuring gene expression levels and for annotating newly sequenced genomes. Which I am sure you would agree is absurd.

Georgi Marinov said...

Not many people in genomics are using the term "next-generation sequencing" anymore - it applied to 454, Solexa/Illumina, ABI, the polonator and Helicos around 2005-2010, but at this point the variety of instruments out there is quite large so people just refer to the particular platform. Technically, "next-generation sequencing" is synonymous with "second generation sequencing", is characterized by a large number of short reads, and is contrasted with "first-generation" Sanger sequencing. The true third-generation systems should be able to deliver a large number of very long reads and preferably be both single-molecule and with low error rates (though those two things are somewhat incompatible with each other). PacBio defies accurate classification but I would call it 2 1/2. Hopefully nanopores do deliver real third-generation capabilities; we will know soon. The potential is there.

I don't think anyone at any point will talk about 4th generation sequencing so no need to worry about that into the future.

Georgi Marinov said...

Arrays are still cheaper so they do find some uses. Not so much in gene expression studies, where they have indeed been largely supplanted, but the array genotyping market is still big, and ironically, Illumina holds a large portion of it. If you are doing a GWAS on 30,000 subjects, it is still extremely expensive to do whole-genome resequencing, while arrays can do a not-so-good but still useful job for a small fraction of that cost. And this is without considering the hardware and informatics infrastructure you would need to secure to be able to handle that much data. But as costs drop, sequencing will replace arrays in that area too, and then they will be truly dead for good.

SPARC said...

My impression is that most people still do qPCR rather than chips or RNAseq, and if they have chip data they still run qPCRs to confirm them. Just as they did Northerns to confirm qPCRs back in the 90s. Unfortunately, not too many try to confirm their RNA data on the protein level. Especially when estimating the expression of genes encoding secreted proteins like cytokines. And how many would consider that the activity of a protein might not necessarily correlate with its amount?

Joe Felsenstein said...

I wonder whether the phrase "next-generation sequencing" is dying out. I see a lot of evolutionary biologists coming through my office and saying "we're doing next-generation sequencing and ..."

So let's do a search. I used Web Of Science and looked for papers having the phrase "next-generation sequencing" in the title. It seems also to include papers where the hyphen is replaced by a blank. Results:

2007 : 2
2008 : 19
2009 : 62
2010 : 177
2011 : 274
2012 : 531
2013 : 615

So still going up. And not all 2013 papers are yet in their database.
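
Purely as an illustration of the trend reported above (the counts are copied from the comment; the script itself is just a sketch, not part of the original), here is how one might tabulate those numbers and compute year-over-year growth in Python:

    # Web of Science title hits for "next-generation sequencing", per year,
    # as reported in the comment above.
    counts = {2007: 2, 2008: 19, 2009: 62, 2010: 177, 2011: 274, 2012: 531, 2013: 615}

    for year in sorted(counts)[1:]:
        prev = counts[year - 1]
        growth = (counts[year] - prev) / prev * 100  # percent change vs previous year
        print(f"{year}: {counts[year]} papers ({growth:+.0f}% vs {year - 1})")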

Larry Moran said...

@Georgi Marinov,

I do, in fact, question the "need" for more measuring of gene expression levels and trying to annotate yet another genome by doing RNA-Seq experiments.

Unknown said...

Nice editorial in Current Op Chem Bio on arrays this week highlighting their continued uses :~) http://www.sciencedirect.com/science/article/pii/S136759311400009X

DAK said...

I hate the phrase "high throughput". With a passion.

Joe Felsenstein said...

Is that because one decade's "high" throughput is another's wimpy throughput? Kind of like "supercomputer". A 1980 supercomputer would look very slow today.

Anonymous said...

Darwin said without an hypothesis a 'man might as well go down into a gravel pit and count pebbles'. Seems to me most of the RNA-seq papers I've seen had a vague hypothesis and were targeting specific data sets.
BTW, for all you working scientists, is 'RNA-seq' pronounced 'RNA sequencing' or 'RNA seek'?

zumb said...

Does "wholesale sequencing" fit the terminology?

Georgi Marinov said...

Joe Felsenstein asks,
Is that because one decade's "high" throughput is another's wimpy throughput? Kind of like "supercomputer". A 1980 supercomputer would look very slow today.


Not necessarily - more computing power is always good. More sequencing reads are not always needed. For genome sequencing, if we found some magical way of sequencing whole chromosomes from one telomere to the other with low error rates in a single read, then we would only need to do a few passes over each one and a genome would be sequenced in a few hundred reads. And for functional genomics and metagenomics applications, there are only so many reads you need to achieve an accurate statistical representation of what's going on.

There will be an end point to this.
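
To illustrate the saturation point being made here, a rough back-of-the-envelope sketch (the genome size and read length below are generic assumptions, not numbers from the comment): under the simple Lander-Waterman model, the expected fraction of bases covered is 1 - exp(-c) at c-fold coverage, so extra reads quickly stop adding new information.

    import math

    # Hypothetical, human-scale numbers chosen purely for illustration.
    G = 3_000_000_000   # genome size (bp)
    L = 150             # read length (bp)

    for N in (10_000_000, 100_000_000, 1_000_000_000):
        c = N * L / G                      # fold coverage
        covered = 1 - math.exp(-c)         # expected fraction of bases covered
        print(f"{N:>13,} reads -> {c:5.1f}x coverage, ~{covered:.1%} of bases seen")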

Georgi Marinov said...

I don't understand why.

You would have to argue that we have sufficiently sampled the diversity of life, with high-quality genomes available for everything important (completely false) and that we don't need to know what happens to gene expression during development, upon various treatments and stimulations, etc.

Peter Perry said...

I totally agree.

Peter Perry said...

"RNA-seek" is shortening for "RNA-sequencing". You can pronounce it any of those two ways, but people usually just say "RNA-seek" for short.

Joe Felsenstein said...

(Sorry for all the Deleted comments above, finally got this in the right subthread.)

As a population biologist I feel impelled to remind you that we want not just a sequence for each species, but population samples as well. That may mean that higher speed and lower cost will be prioritized for some time.

Georgi Marinov said...

Sure, but most such studies these days are RNA-seq-based because RNA-seq gives you a direct look at allelic variants.

Unknown said...

Joe

I just wanted to apologize to you... I never should have presumed to put words into your mouth - esp re: Rambam etc.

mea culpa

I was struck that somehow your gentle suggestions that we need not confront each and every audience with an admonition to abandon their faith when presenting the cogency of Evolution seemed to give Larry and others pause.

I wanted to strike that iron while it was hot... I had no right to use you as an anvil.

again apologies.

Georgi Marinov said...

You won't hear any argument in defense of the "Let's generate some data, we will think about what to do with it later" approach to doing science from me. I have in the past been guilty of this myself, when the technology was new and cool, and I am all the more aware of its pitfalls as a result.

But just because people commit that sin some, or even a large fraction, of the time, it does not follow that it is the case all of the time and that nothing useful has come out of sequencing. The study of transcriptional regulation and RNA biology is almost unthinkable without deep sequencing in 2014, and some areas (for example, all the extremely cool small RNA stuff) simply could not have developed without it.

Peter Perry said...

I think it should be noted that, although it sounds cool to say that all experimentation/observation should be hypothesis driven, one only needs to look at the history of science to see that stamp-collecting has been, and still is, extremely important. Darwin wasn't driven by any hypothesis when he went on the Beagle. There would be no Biogeography, Taxonomy, Mineralogy, Petrology, Astronomy, Biochemistry, etc. if people hadn't stamp-collected before. This is still true. Many interesting hypotheses have come out of analyzing data for patterns after the fait accompli. Just "looking at what's there" has been extremely useful in all branches of science.

I understand we are talking about a more specific case here, where money is poured into experiments that may (or may not) be questionable in terms of scientific return and which are sometimes done purely for their "coolness" and marketing power, especially when vanity journals are involved. But statements like "Call me old-fashioned but I prefer to do my thinking BEFORE I design an experiment" sound a bit patronizing and correspond to an idealization of science, just like the so called scientific method, that don't really correspond to what happens in reality.

As a disclaimer, I don't work with RNAseq, nor with human or other eukaryotic genomics, so I have no personal pride to defend in this particular debate.

SRM said...

Well, there is also this little thing that in any scientific endeavour there will always be some sort of basic hypothesis unavoidably built in, even if it's not very complex initially: for example, the hypothesis that there will be something to count/quantify and that it will be possible to do so.

For something like RNA-seq, the basic hypothesis might be: for the given organism/cell type and/or environmental conditions, there will be differential expression of genes and it will be possible to quantify this differential expression.

Nowadays, this of course is no risky hypothesis but even before the experiment begins the secondary questions, anticipating the validation of the first hypothesis, are already beginning to pile up: e.g. what genes are being expressed, what genes are not, and why. And of course there could be questions and answers that emerge from careful analysis of the data that simply would not have been anticipated prior to the experiment.
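
As a sketch of the kind of per-gene question described here (the counts are hypothetical, and real analyses use dedicated count-based tools such as DESeq2 or edgeR rather than this crude test), the basic shape of a differential-expression calculation looks something like:

    import math

    # Hypothetical read counts for one gene and the total mapped reads
    # in two conditions.
    gene_a, total_a = 250, 10_000_000      # condition A: gene count, library size
    gene_b, total_b = 900, 12_000_000      # condition B: gene count, library size

    # Normalize to counts per million, then take a log2 fold change.
    cpm_a = gene_a / total_a * 1e6
    cpm_b = gene_b / total_b * 1e6
    log2_fc = math.log2(cpm_b / cpm_a)

    # Crude two-proportion z-test as a stand-in for a proper count-based test.
    p_pool = (gene_a + gene_b) / (total_a + total_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / total_a + 1 / total_b))
    z = (gene_b / total_b - gene_a / total_a) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))   # two-sided p-value

    print(f"log2 fold change = {log2_fc:.2f}, z = {z:.1f}, p = {p_value:.2g}")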

Larry Moran said...

Peter Perry says,

But statements like "Call me old-fashioned but I prefer to do my thinking BEFORE I design an experiment" sound a bit patronizing and correspond to an idealization of science, just like the so called scientific method, that don't really correspond to what happens in reality.

I understand the general thrust of your criticism and I agree with most of it. However, I really do think we are witnessing a time when "data collecting" is being abused. It's not always good science.

We've got more than enough information on RNA levels in various cells and various species. It's time to find out what it means. How many of those RNAs are functional?

I'm not denying the REALITY of what's being published. I'm questioning whether it's good science.

Peter Perry said...

However, I really do think we are witnessing a time when "data collecting" is being abused. It's not always good science.


Yes, I generally agree. I'm glad you understood in what sense I meant what I said. It wasn't an attack.

Mike said...

Larry, I respectfully disagree. RNA-seq is absolutely vital in the study of disease-associated transcriptomics, where the goal is not to characterize the transcriptome of various organisms or to do comparative biology, but rather to link particular RNA signatures with specific clinical phenotypes.

I run a very large cancer genomics initiative, and RNA-seq is a key component of the program, whereby we can assess if and how changes at the DNA level are associated with changes at the RNA level. RNA-seq is also useful for assessing chromosomal rearrangements, which is difficult to do using DNA sequencing unless large amounts of input DNA are available for mate-pair sequencing.

We collect all of these data (DNA and RNA), and while we have a pre-defined hypothesis (i.e. 'there are important genomic and transcriptomic signatures that define particular subtypes of cancer'), that hypothesis is atypical in that, because of the size of the data sets, we cannot approach the problem from the typical "gene-X-controls-function-Y" standpoint.

Oh, and we use microarrays, too. They're cheap and valuable tools for discovery and (particularly) for validation of RNA-seq data, if only because the informatics required to handle microarray data is vastly better developed than that required to effectively handle RNA-seq.

Larry Moran said...

Mike says,

Larry, I respectfully disagree. RNA-seq is absolutely vital in the study of disease-associated transcriptomics, where the goal is not to characterize the transcriptome of various organisms or to do comparative biology, but rather to link particular RNA signatures with specific clinical phenotypes.

Hi Mike. How's that working out? What's the most important thing you've learned from doing those experiments other than the fact that every cell and every tissue (cancerous or not) contains a different set of RNAs—most of which have no function?

Mike said...

Larry, it's working out very well, in fact.

We've learned a tremendous amount about the role of RNA editing in cancer progression.

We've used RNA-seq to identify novel gene fusions and rearrangements that can (and do) disrupt tumour suppressor genes and contribute to tumour development without affecting genome sequence, copy number, or promoter methylation.

We've used RNA-seq to identify RNA expression signatures that are associated with clinical outcomes.

It goes way beyond saying "this tumour has a different set of RNAs than that tumour".

Larry Moran said...

Interesting. Could you give me a reference to the published work on RNA editing and its role in cancer progression?

Mike said...

Our RNA editing data are unpublished (as are our data on RNA expression signatures and clinical outcome...watch this space, though...), but there are published reports on differential (benign vs malignant) RNA editing in breast cancer (http://www.ncbi.nlm.nih.gov/pubmed/19812674).

There are numerous published reports on novel tumour-associated rearrangements that would be difficult or impossible to detect using DNA sequencing alone (http://www.ncbi.nlm.nih.gov/pubmed/22745232; http://www.ncbi.nlm.nih.gov/pubmed/21478487).

Finally, there are several RNA expression-based "signatures" in use today in the clinic for improved patient stratification over standard clinical and pathological features. See, e.g., Decipher (http://genomedx.com/decipher/overview/) and Prolaris (https://www.myriad.com/products/prolaris/), which are in use in the US for prostate cancer, and provide some degree of improved patient stratification over the standard clinical variables of T category, Gleason Score, and PSA. The tests are not perfect, but they are a step in the right direction, and certainly improve patient prognostication.