Monday, July 11, 2016

Science journal tries to fix problems with transparency and trustworthiness

The editors of Science recognize that they have a problem. They aren't very transparent or trustworthy. This is true. These same editors have been guilty of publishing and promoting lots of poor quality science over the past few years. Three examples come to mind ...

  • Arseniclife: Science published a ridiculous claim that arsenic could replace phosphorus in DNA. That paper has been refuted but never retracted.
  • Ardipithecus ramidus: Science fell for the authors' hype.
  • ENCODE: Science falls for the hype promoted by ENCODE leaders. Editorial and feature writers announce the death of junk DNA.

Don't worry. The editors have been working hard to fix the problem. After a year of study they announce their solution in the June 3, 2016 issue in the lead editorial: Taking up TOP. The author is the current Editor-in-Chief, Marcia McNutt.

She begins with ...
Nearly 1 year ago, a group of researchers boldly suggested that the standards for research quality, transparency, and trustworthiness could be improved if journals banded together to adopt eight standards called TOP (Transparency and Openness Promotion).* Since that time, more than 500 journals have been working toward their implementation of TOP. The editors at Science have held additional retreats and workshops to determine how best to adapt TOP to a general science journal and are now ready to announce our new standards, effective 1 January 2017.
So, what is TOP and how is it going to make Science more trustworthy? Does it involve firing some well-known writers and editors? Does it involve better reviewers?

Nope. TOP is just a way of making sure that raw data is available to other researchers.
... we believe the benefits of requiring the availability of data, code, and samples on which the authors' interpretations rest are worth the effort in compliance (and in some cases in adjusting data ownership expectations), while acknowledging that some special circumstances will require exemptions. This practice increases transparency, enables reproducibility, promotes data reuse, and is increasingly in line with funder mandates. We are also requiring the citation of all data, program code, and other methods not contained in the paper, using DOIs (digital object identifiers), journal citations, or other persistent identifiers, for the same reason. Citations reward those who originated the data, samples, or code and deposited them for reuse. Such a policy also allows accurate accounting for exactly which specific data, samples, or code were used in a given study.
That's not going to fix the main problem.


  1. "That's not going to fix the main problem."

    Still, it might help to fix a problem.

    As for the problem you mention, editors are certainly part of it, but so are reviewers. I don't see a simple solution.

    1. Also, data transparency is a *practical* solution. It is easy to say "get better editors and reviewers", but that's a bit like the old argument that we need "better politicians". We can easily complain about the current and past ones, but there's no guarantee that future ones will be better (or even an objective way to measure whether they are).

    2. I agree. Data transparency won't fix lots of problems, but it's a necessary step.

  2. Reading the standards there are some issues. The Data standards seem to make no exception for proprietary data, which as far as I understand it is vital to climate science as raw weather data is collected in many regions by private institutions and climatologists can gain access for a fee. In papers this raw data would be referenced as something like "weather station raw data 1972-1985, company X" and other researchers willing to replicate the particular study would obtain a license. That is an unfortunate situation, but the alternative is ignoring the proprietary data (and of course climate change deniers get it both ways: If the data is included they get to complain that it is not public and if it is not included they get to complain that the data used is incomplete).

    As far as code transparency goes, the big issue is distinguishing between code that is actually part of the analytical methods and other code. If I color my graphs in R rather than Illustrator do I now have to conform to a higher standard? Likewise quite a bit of code would just bloat SOMs - If I give a mean for some parameter and the data is in the supplement, should I really have to write something like: The mean of parameter so and so was calculated by this code in R:
    It's not awfully precise, which means that people will likely err on the side of including all kinds of useless information, just in case a reviewer isn't clear on how you perform some basic task. As a result I foresee supplements becoming less readable, with the relevant information hidden between "the script that makes the y-axis look as cool as it does in figure 3" and "just in case the readership of Science includes some curious preschoolers: to obtain the sum of parameters 6 and 8 we used the command 'parameter.6 + parameter.8'".

    1. Is proprietary "data" really data though? How do you know that the data can be trusted if nobody can see it? Anyway, it's not like governments don't have public weather stations. Even in ultra-capitalist countries like the USA.

      As for the code issue, I don't think they are saying your code should be included in the text of the supplemental, just that the source code needs to be accessible -- which many computational biology journals already require -- no more "data was processed by an in-house script" which isn't described further.

    2. Yes, proprietary data is really data as long as it's clear how to obtain it. I don't have qualms about referencing a single fossil, giving its inventory number and the institution it resides at - if somebody wants to check they will have to visit that institution. If you want the raw data from that particular source, you obtain a license from them. It's not as if these datasets weren't known in the field - NASA has a license, the MPI has a license, etc.

      Using data from these private companies is done to complement data from public sources. As I noted above, climate "sceptics" have been heavily criticising studies that did not include them for ignoring data.

      Take another example - there are some genome assemblies running on a local cluster. If you don't have access to a computing cluster, you can't replicate these. But the raw reads will be made public as will the control files used.

      I don't have an issue with "in-house script" as long as it's described well. I've just split up a large alignment into partitions. There is a file that tells you which positions are in each partition (and that'll be SI). The individual partitions will also be part of the SI (or more likely deposited on Dryad). Do you really need the piece of code that reads in the file that has the positions as well as the FASTA file and then spits out FASTA files for each partition? The most interesting bit in there is how I name my files... Most of my scripts don't do spectacular things, they just batch process datasets. Yup, I ran a citable tool for each of the partitions (or rather, I am running it). So my script just dumps a control file with a different filename for the output and the FASTA file to disk and then runs that piece of software. Again, that's not something obscure and "we ran the analysis for each partition" should be transparent enough. I'm not sure who would benefit from reading the script. If there was something more interesting to see than a loop, I could see a point here, but I can't.

      If a script has some novel functionality, yes, I think it should be available. But a lot of the things you just write a short script for are absolutely mundane and there's no real point in publishing them. There's nothing the script adds to the description of "we ran tool X for each of the partitions with the following settings:...". There's in fact nothing stopping somebody from replicating this by manually editing the control file and running tool X.

      I think code should be made public if it:
      a) is vital to replicating the results.
      b) does something which can not be put into plain language in a shorter way - I want to understand what you are doing, if you are using a language I don't use a lot (I'm fluent in R, can follow C++ and Python easily enough and if I never had to look at a PERL script ever again I would be very happy).
      c) does something in a novel fashion. If you've got a clever way to reimplement a known method that runs 20% faster, please share.
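For readers who have not written this kind of batch script, here is a minimal sketch of the partition-splitting step described in the comment above. It is not the commenter's actual code. The partition file format (`label = start-end`, 1-based inclusive columns), the function names, and the toy alignment are all assumptions made for illustration. The final check mirrors the kind of manual verification the commenter describes: the partitions, joined back together, should reproduce the original alignment.

```python
def read_fasta(text):
    """Parse FASTA text into {name: sequence}, keeping record order."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:]
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {n: "".join(parts) for n, parts in records.items()}


def read_partitions(text):
    """Parse lines like 'gene1 = 1-4' (hypothetical format) into {label: (start, end)}."""
    partitions = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        label, span = (part.strip() for part in line.split("=", 1))
        start, end = (int(x) for x in span.split("-"))
        partitions[label] = (start, end)
    return partitions


def split_alignment(records, partitions):
    """Cut every aligned sequence down to the columns of each partition."""
    return {
        label: {name: seq[start - 1:end] for name, seq in records.items()}
        for label, (start, end) in partitions.items()
    }


# Toy data standing in for the real alignment and partition file.
alignment = read_fasta(">taxonA\nACGTACGT\n>taxonB\nACGAACGA\n")
parts = read_partitions("gene1 = 1-4\ngene2 = 5-8\n")
split = split_alignment(alignment, parts)

# Sanity check: contiguous partitions joined in order give back the original.
rebuilt = {
    name: "".join(split[label][name] for label in parts)
    for name in alignment
}
```

The check only makes sense here because the toy partitions are contiguous and cover every column; a real partition scheme with gaps or overlaps would need a different comparison.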

    3. Increasingly there is a consensus that scientific analysis should be replicable by the reader (and skeptical reviewers). That means all the data and software should be available to all readers. There is a place for proprietary data and methods -- industry and their secret ways leading to private profits. But that isn't acceptable for publishable science. There have been papers that have had to be retracted because of honest bugs in analysis code. Wouldn't it be better if these could be found at the peer review stage?

    4. Yes, but in the case I gave above I think it's far more reasonable for a reviewer or reader to check whether the FASTA file for the 18th partition contains the sequences from the complete alignment that the file detailing the partitions gives. Checking the results is quicker than trying to check the code here and running the code would of course result in the same output FASTA files, without answering the key question of whether they contain the correct sequences.
      Of course I manually checked this to avoid a bug in the first place. And again, my key problem is not that code should be kept secret, just that the code that gets released isn't clogged by irrelevant bits. The release everything version is like asking people to put "letters labeling subfigures were added using Adobe Illustrator" in their Material and Methods section. That's not relevant information for the reader or the reviewer. "Figure 3 was formatted using Inkscape" isn't important. So why should "Figure 3 was formatted using R" be in, and worse, require published code? lines(data,lwd=.7,col="#5555FFFF") is literally the same as selecting the lines in a vector graphics program after plotting and changing the line width and color of the graph. You wouldn't put an instruction on how to format a vector Figure with a particular piece of software into a repository. You wouldn't require authors to supply that type of information, because it actually detracts from the paper. And the same is of course true for that bit of R code. You can check whether a graph faithfully represents the data given the raw data, which you can plot in any way you want.

    5. Apparently something ate my reply. I don't think every piece of code serves that purpose. In the example I gave above, running my code would produce the same FASTA files. To check whether these are correct, the best option would be to manually check a few of them. Likewise I produce most of my figures by writing R code to format axes, add labels etc. I've got a tree plotter that I think looks nicer than the tree plotters that are generally used. But that's merely cosmetics and you could plot the tree with FigTree just as well. You could also make the format adjustments by editing the FigTree output in a vector graphics program. How vital is it that I publish my code? My main issue here is clarity - some code is important to understand and evaluate a paper, some code is completely irrelevant for both and at worst distracts from the paper itself. As noted above, I'd much prefer a detailed description of a new analytic method to published code for it in a programming language I'm not that secure in. That's particularly true in bioinformatics, where so many people use PERL, which is probably the most obscurantist language out there. It's easier to follow lolcode.

    6. Probably the best way to handle code is to use a "notebook" format such as IPython/Jupyter, Rmarkdown, and the like. This is what leading bioinformaticians like Titus Brown have been recommending for the past few years as part of the push towards replicable bioinformatics. In such a system, all the code for analysis and generation of figures is in one place, with none of the manual steps that often lead to errors. In principle, you can even write the entire paper's text as a notebook, although I've only seen that done in a couple of cases.

    7. That seems great as long as you are not dealing with any computationally heavy task.

  3. Here is another wonderful thing Science did (though I know about it second-hand, and might be wrong). In 1993 a guy I knew, a good guy, thought he had sequenced 14-million-year-old Magnolia DNA. (He was wrong, as was true with many claims of sequenced ancient DNA in that era).

    He sent the paper to Science, which accepted it. We know that they have high standards, and base their decisions only on scientific considerations.

    And then they delayed publication for some weeks, to make sure that publication coincided with the release of the film Jurassic Park!

    Strictly for scientific reasons, folks.

    1. There's a well known effect of Jurassic Park on publications in paleontology. There were 3 peaks in terms of paleontological papers published in the big transdisciplinary journals. One is associated with Punctuated equilibrium and the onset of paleobiology as a quantitative discipline. One is associated with the Alvarez hypothesis and the follow-up work including the Raup and Sepkoski paper on periodicity in mass extinctions. And the 3rd peak is associated with the release of the first Jurassic Park and it primarily affected dinosaur paleontology. Possibly of interest is that the two older peaks are also associated with increased citation numbers - both the PE+paleobio and the KT impact peaks had a number of papers that are relevant outside the strict confines of the discipline - but the Jurassic Park Peak isn't. There is still a lingering effect in that a paper on dinosaurs is usually something the authors can expect to publish at a higher impact than a paper of comparable quality and scope for any other clade. There are these run-of-the-mill "oldest member of clade X found in locality Y" papers. Well, a Tyrannosaur made PNAS this year while 2 years ago the putative oldest beetle made the respectable J Sys Pal, but well, one is the oldest documented evidence for a huge chunk of terrestrial metazoan diversity, the other is a Tyrannosaur (as seen on TV).

  4. I'm really fascinated by all this but this is nothing new.

    We've been over this before. My question has always been: Are you prepared to be found wrong?

    If professor Moran and his supporters happen to be wrong, what are the consequences?

    Will they return the taxpayers' money they had wasted?

    Nuh. They will move on as if nothing happened.

    1. Pretty sure everyone is prepared to be found wrong. But about what, exactly? The consequence of being wrong in science is that eventually you have to change your opinion.

      That doesn't mean that taxpayer money has been wasted, though I'm not clear on just what taxpayer money you're talking about. Still, any data and/or results that would have to be reinterpreted are still valuable data and results.

      All this is hypothetical, of course, and will almost certainly remain so, as I doubt Larry is wrong about whatever it is you're not saying. (Larry is wrong about some things, but not any of the things that nutjobs complain about.)

    2. In science, it is as important to prove what is wrong as to support what is true. Proving something is wrong -- and publishing that result -- helps stop other people from going down that road, thus increasing the chance we'll figure out what's true. Being wrong is embarrassing as anything -- I know, I've been wrong before -- but it's part of doing science well.

    3. John Harshman,

      Everyone knows that neo-Darwinism doesn't explain what it promised it would: the mechanism(s) of evolution.

      It was the same when Darwinism went down the drain. (Wanna details?)

      If anyone still sticks to this notion (neo-Darwinism or Larry's new option), they should retire. I mean it. There is no way to continue.

      In other news, professor Coyne has retired. It was well overdue. Thanks Jerry. You made my life easy now.

      I will find out the details of his (Coyne's) retirement soon, but blogs and the public often refer to Coyne as an "evolution bully". Would anyone like to say "why"?

      I hope this is wrong opinion of professor Coyne that will be vindicated and his public hate toward religion, and especially muslims be tolerated. It has to be verified...

    4. So what is the mechanism of evolution?

    5. Jerry Coyne turned 65. That might have something to do with his retirement.

    6. @Joe

      Shhhh ... let's just keep that bit of information to ourselves, okay?

      It's fun to watch IDiots like Velhovsky make up some elaborate conspiracy theory to explain Coyne's retirement.

    7. It's also fun to hear them declare that Coyne's retirement will make life easier on them. Actually retirement gives Jerry more time to write popular science books, and to post devastating responses to bizarre creationist arguments.

    8. What "evolution" are you referring to? If it is 'change over time', Shapiro and Noble have some ideas. You really should ask them directly.

      If you are referring to an "evolution" you would like to have happened or to be observed, or replicated, you really need to make the first move and provide some reliable evidence that IT actually happened and more than one piece of evidence that has not been falsified.

    9. Are you retired Joe? You are .......4 as per evolutionarily count

    10. If you can't be rational, at least be coherent.

  5. Nothing wrong with a critical approach. Fred Hoyle didn't like the expanding universe and created a workaround by suggesting an ongoing, minute addition of matter would restore the static albeit expanding universe. So what? Other theories of expansion dominate today. I don't think we have heard the last word on expansion yet.

  6. John Harshman,

    Do you think your statements here meet the criteria of being rational and coherent?

    1. Yup. Can you say the same for yourself?

    2. Really? If you think your statements are coherent and rational? Why are you asking me then what the mechanism of evolution is? Don't you think that 150 years isn't enough to finally find it? How many more years do you need to figure it out? The longer it goes on, the less likely it looks like evolution by your standards and less likely to be accidental. I'm sure you will have no problem with that in the end when it goes beyond randomness.

    3. I was asking for your opinion, since you reject the standard mechanisms. I didn't say I expected your opinion to be sensible. So do you think you can answer the question?

      By the way, that wasn't a very coherent or rational comment.

    4. "I'm sure you will have no problem with that in the end when it goes beyond randomness."

      We're quite happy with non-randomness. That's what selection is. And mutation may also have non-random aspects. Some mutations may be more likely to occur than others. But none of it needs any guidance. If you know how to do probability math correctly, any serious question about the adequacy of unguided processes to explain evolution was answered a long time ago.

    5. "I was asking for your opinion, since you reject the standard mechanisms."

      Fair enough. However, I would like you to come up with a sensible standard mechanism of evolution and then prove that it's really the standard, and where. Do you know what I mean? How many people do you think would be able to support your view of the "standard"?

      "I didn't say I expected your opinion to be sensible. So do you think you can answer the question?"

      When you sensibly manage the above, I will try to do my best from then on.

      "By the way, that wasn't a very coherent or rational comment."

      Would it matter if you lose the next round?

    6. "Would it matter if you lose the next round?"

      There are rounds? I asked a question, you refuse to answer. That's about all.

    7. We both know that the "standard evolutionary mechanism"-whatever that is in your mind-is under fire. You are running a dangerous course of being left behind (if you care) if you stick to it.

      Some "appear" not to care about the upcoming changes in the evolutionary thinking. Some mock it. Others outwardly scorn those who dared to stir the pot.

      Well, in the end, I very well know that there will be changes as to what the mechanism of evolution is. How drastic? I don't really care.

      All I know is that people like you will deny the change at first and then... you will have to embrace the changes, like epigenetic influence, the non-Darwinian changes that have been documented for a while now.

      Here is the new one if you didn't have enough to contemplate: quantum mechanics. I hope you are good at that. If not, maybe educating yourself in this field is not a bad idea. It looks not only exciting but also controversial.

    8. Quantum mechanics and non-Darwinian evolution: Welcome to the 20th century! :-)

    9. "All I know is that people like you will deny the change at first and then... you will have to embrace the changes, like epigenetics..."

      heh heh. Notice the connection

      ... the connection between religious end times predictions (that never come true, as it happens) and end times predictions regarding science (that is, the end of evolution, naturally).


    10. So, to summarize, you once again refuse to answer.

    11. judmarc,

      Are you suggesting that quantum mechanics was already tested by Darwinists in the 20th century and possibly rejected due to whatever?

      Link me to some papers.

      I'm just curious how Darwinism aligned with quantum mechanics in the 20th century, because I tell you right now it doesn't align with 21st century quantum mechanics at all.

    12. Spare it John. If evolution was a fact and its mechanism was well known you wouldn't be so reserved, would you? You, and your buddies, would be in my face...

      Well, you don't have that luxury and you never will. I guarantee it.

      BTW; Have you ever considered a more peaceful environment rather than trying to push people to believe what you believe?

    13. To update: you still have no answer.

    14. John Harshman,

      The answer you are looking for is not forthcoming for the following reasons:

      If the evolution you believe in were true, we would have at least some evidence for it, including the mechanism(s).

      Unfortunately, no such evidence exists, so why would you expect me to have answers for something that is your delusion, not mine? Don't you think that if the mechanism of evolution was a well-known fact, I and others would have mentioned it here more than once?

      Here is an example as to why your theory stinks:

      How do you evolve a land-walking-mammal into an aquatic one?

      You, and your buddies would probably say natural selection acting on mutations. Larry would say neutral theory, genetic drift etc.
      This all may sound great, but only to an untrained eye, because there is a huge difference between coming up with a possible mechanism for a land-walking mammal evolving into an aquatic one and reality. Scientific reality, to be exact.
      For evolutionists it may look like a piece of cake, on paper only, but proving their claims is another thing.

      So, to sum this up John, I don't have an answer you are looking for, because there is no plausible evidence that the evolution you believe in has happened, could have happened and is definitely not happening right now.

      So, unless you provide evidence that the "standard mechanism of evolution is true" or meets one criterion of scientific requirement, we might as well call your standard mechanism of evolution a wild guess or a naturalist miracle.

      Which one do you prefer, John?

    15. "I'm just curious how Darwinism aligned with quantum mechanics in the 20th century, because I tell you right now it doesn't align with 21st century quantum mechanics at all."

      Would you mind explaining what you think the issue is more precisely? What are the new findings in quantum mechanics that cause a problem for evolution?

    16. Ah, so you're a creationist. I hadn't realized that. What do you think of otters?

    17. @John: Is that directed at me, or at Velhovsky? Because I'm certainly not and Velhovsky is so obviously one that I find the question puzzling either way.

    18. I really didn't know he was a creationist. There are many flavors of loon: IDiot, Third Way, etc.

    19. I'm just curious as to what evidence convinced you that the natural processes in nature do not require design?

    20. "I'm just curious how Darwinism aligned with quantum mechanics in the 20th century, because I tell you right now it doesn't align with 21st century quantum mechanics at all."

      Before we even get to evolution, why don't you tell us all what you think the fundamental differences are between quantum mechanics in the 20th century and the same in the 21st?

    21. "For evolutionists it may look like a piece of cake, on paper only, but proving their claims is another thing."

      So you don't take the fossil record or genetics as evidence? If you're completely unfamiliar with logical scientific concepts of evidence and proof, as it appears you are, then I suppose you could feel that way.

    22. Cruglers, when western science began, the assumption was design. Within that design framework, people tried to figure out what the world is like and how it works. They were very successful in explaining these things without reference to a designer. Also, they learned many things that contradicted the design story they knew (Genesis). Gradually, design was shoved aside; it wasn't necessary to explain anything.

      At this point, design continues to be an unnecessary hypothesis. Therefore, if one wants to propose that the universe is designed, one has to provide positive evidence that it is. So far, nobody has done a convincing job of that.

    23. I'd suggest that for clarity as to who is talking to who, all of us should quote the comment we are responding to or the name of the person we are responding to. Or both. I'm guilty of forgetting that too. Sorry.

      @John Harshman
      "Ah, so you're a creationist. I hadn't realized that."

      I don't care who you think I am. If you are smart enough, which I have no doubt you are, you will figure out soon enough what I represent. I hope.

      "What do you think of otters?"

      So, is this the best you can do?

      What are you trying to tell me John? That otters used to be land mammals and, due to evolutionary processes you can't explain and a mechanism of evolution you have no clue about, they became aquatic otters? Is that your theory? Please tell me John Harshman that your theory has teeth; more grip on reality.

    24. Why can't you just tell us what you represent? Why should we have to search for subtle clues?

      The point of the otters was that you can find mammals in all conceivable intermediate conditions between terrestrial and fully aquatic. Otters (and I'm thinking of freshwater otters here) are just a little way along that trajectory. If you can't believe that otters are weasels that decided to swim a lot, I don't know what to do with you.

    25. Velhovsky, you've managed pretty quickly to reach the point of simply being boring. You answer no questions, just keep asking roughly the same ones while pretending you haven't read the answers (I was going to say "read and understood," but that might be a bridge too far). Yeah, we get it, your "thing" is to dismiss without comment or thought all the evidence as if it doesn't exist.

      So will you answer any of the questions John or I have asked? (Not even sure why I'm asking, as I don't expect your answers to be any more rational or interesting than your questions. Still, hope springs eternal.)

    26. Jon Scott was one of the top creationists in the early days of the internet. Unusually, he was well versed in science. He talks movingly of the day his world fell apart. A dinosaur with feathers!

      "I kept updating the archive and working on it straight through 1998, the year in which Caudipteryx zouii and Protarchaeopteryx robusta - two creatures which scientists described as obviously non-avian dinosaurs (which means they weren't birds), but which had feathers! I simply emphasized their avian qualities and either explained away or dismissed as unimportant their reptilian characteristics, and went on happily spreading the myth of creationism.

      Yes - I had the evidence, the information, and the knowledge of how evolutionary biology works - yet I did not have the intellectual integrity to admit to the truthfulness of evolutionary theory and kept denying that this incredibly intricate law and set of 'trends' in nature could possibly have any validity.

      Then, in september of 1999, the bomb dropped. I picked up my issue of the National Geographic and saw what else on a page advertising an upcoming issue; but Sinornithosaurus millenii! It had long steak-knife-shaped teeth like a T. rex, a long, muscular tail, hyper-extendable "switchblade" claws on the hind legs like Velociraptor mongoliensis, a narrow snout that looked almost like a bill, a bird-like pubic structure, and worst of all - feathers!

      I simply stared at the page for a few moments, muttered "oh shit!" to myself a few times, and got up to check the N.G.News web site. This wasn't just some artistic depiction of what a reptile/bird might look like - and it was no hoax. It was a small dromaeosaurid ("raptor") with killing claws, razor-sharp teeth, and a pair of wing-like arms complete with plumage. My heart sank, and my gut churned. This was it - the one proof of evolution I had always asked for but never thought would come to light. In my mind, I was betting that even if evolution were true, the chances of finding such a beautiful example of transition would be slim enough to be dismissed as impossible. And yet here it was - proof.

      I stepped outside to compose myself, and stood there looking at the world around me."
      ' "This is it..." I spoke to myself softly, "Welcome to the real world." '

  8. "I'm just curious as to what evidence convinced you that the natural processes in nature do not require design?"

    There is a huge pile of such evidence. Since we don't have all year, I'll just pick one: quantum physics. All the experiments that have ever been done, down to more decimal points than you can count, show the universe functions on probabilities. So you can't design anything and be assured of the outcome; there's a roll of the dice fundamentally at the bottom of everything. Just to repeat for emphasis: The way the universe works, you can't plan or design in advance for any particular outcome. So any "theory" that relies on planning or design is wrong.