
Structure of the paper #4

rougier opened this issue May 6, 2016 · 11 comments

Comments

@rougier

rougier commented May 6, 2016

Introduction

  • The replication crisis today
    → Medicine, Biomedicine, Psychology, Political Science, etc.
  • Computational science is no exception
    → Software is still a second-class citizen in Science
    → Missing/unavailable code, not compilable, not replicable, etc.
  • Pre-publication solutions (for new work)
    → Good practices, notebooks, active formats, virtual containers, etc.
  • Post-publication solutions (for old work)
    → Mostly no solution but things are starting to change

Motivation

  • Use-cases (see Miscellaneous ideas #1)
    → J. Stachelek
    → N. Rougier
    → B. Girard
  • ReRun, Repeatable, Replicable, Reproducible, Reusable or Remixable?
    → ReRun (variation on experiment and set-up)
    → Repeatable (same experiment, same set-up, same lab)
    → Replicable (same experiment, same set-up, independent lab)
    → Reproducible (variations on experiments, on set-up, independent labs)
    → Reusable (different experiment)
    → Remixable
  • Reproducibility criteria
    → Quantitative
    → Qualitative
    → Other?

Editorial process

(see http://rescience.github.io/write/ and http://rescience.github.io/read/)

  • The editorial board
  • Submission
    → Code
    → Article
    → Data
  • Review
    → Public
    → Interactive
  • Editing (criteria for accept & reject)
  • Publication (Github / Zenodo)

Conclusion

  • The added value is the article, not the code
    → Original article + ReScience article should be sufficient for future replication
  • What about failed replication?
    → See http://rescience.github.io/faq/
  • Expanding the model (the CoScience journal)
    → Instead of post-reproduction, publish articles including independent replication
@rougier

rougier commented May 6, 2016

This is a proposal for the structure of the paper.
Based on this, I can try to write a first draft if everyone's ok.

@MehdiKhamassi

Great! Sounds good.
I tend to consider that the open-source code is also part of the added value here.
Maybe we could also note that a publication in ReScience is an opportunity not just to reproduce previous results but also to run a few additional simulations that the original authors did not include in the original article: for instance, plotting the performance of the model as a function of different parameter values. Such optional additional simulations would further stress that replicating a model brings another kind of added value: new insights, gained from a fresh look through new lenses.
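
A minimal sketch in Python of what such an optional parameter sweep could look like; `run_model`, the parameter name, and its range are purely illustrative stand-ins, not taken from any actual submission:

```python
import numpy as np
import matplotlib.pyplot as plt

def run_model(learning_rate):
    """Hypothetical stand-in for the replicated model; returns a scalar
    performance measure (an illustrative closed form plus a little noise)."""
    rng = np.random.default_rng(seed=int(learning_rate * 1e6))
    return float(np.exp(-((learning_rate - 0.1) ** 2) / 0.02)
                 + 0.01 * rng.standard_normal())

# Sweep a parameter the original authors kept fixed, and plot the curve.
learning_rates = np.linspace(0.01, 0.5, 50)
performance = [run_model(lr) for lr in learning_rates]

plt.plot(learning_rates, performance)
plt.xlabel("learning rate")
plt.ylabel("model performance")
plt.title("Performance as a function of a swept parameter")
plt.savefig("parameter_sweep.pdf")
```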

@khinsen

khinsen commented May 6, 2016

@rougier That's a very good start!

The one aspect I would do differently is the "pre-/post-publication" discussion in the introduction and the associated "the added value is the article" in the conclusions.

The technology listed under "pre-publication" works for post-publication as well, so that's not the difference. The real difference is research done reproducibly from the start vs. everything else. There is even a (probably small) category of "research done reproducibly but published classically, without code and data". That's the kind of work we explicitly exclude from ReScience at the moment, in order not to encourage that behavior.

In terms of "added value" (in the Conclusions), the main added value for the huge category of research not done reproducibly is that someone else has attempted a replication/reproduction and reports (reproducibly, of course) on that attempt. Article and code go together in that case, so I wouldn't put them in contrast to each other.

This also motivates the "CoScience" idea, and it provides a link to replication initiatives in experimental sciences. In all these cases, the emphasis is on independent replication.

@rougier

rougier commented May 6, 2016

Version 2, trying to take the comments into account. My point with the added-value argument was to underline that there is little hope that the newly produced code will still run 10 or 20 years from now. However, the fact that someone actually tried to reproduce the original results and documented this effort (by reporting errors, missing information, etc.) would ensure longer-term reproducibility.

Introduction

  • The replication crisis today
    → Medicine, Biomedicine, Psychology, Political Science, etc.
  • Computational science is no exception
    → Software is still a second-class citizen in Science
    → Missing/unavailable code, not compilable, not replicable, etc.
  • Reproducible from the start
    → Good practices, notebooks, active formats, virtual containers, etc.
  • What do we do when that is not the case?
    → Replicate/reproduce it and publish it

Motivation

  • Use-cases (see Miscellaneous ideas #1)
    → J. Stachelek
    → N. Rougier
    → B. Girard
  • ReRun, Repeatable, Replicable, Reproducible, Reusable or Remixable?
    → ReRun (variation on experiment and set-up?)
    → Repeatable (same experiment, same set-up, same lab?)
    → Replicable (same experiment, same set-up, independent lab?)
    → Reproducible (variations on experiments, on set-up, independent labs?)
    → Reusable (different experiment?)
    → Remixable
    → Reimplementation
  • Reproducibility criteria
    → Quantitative
    → Qualitative
    → Other?

Editorial process

(see http://rescience.github.io/write/ and http://rescience.github.io/read/)

  • The editorial board
  • Submission
    → Code
    → Article
    → Data
  • Review
    → Public
    → Interactive
  • Editing (criteria for accept & reject)
  • Publication (Github / Zenodo)

Conclusion

  • New (open source) code
    → new collaboration, new horizon, new results
  • The added value is also the article
    → Original article + ReScience article should be sufficient for future replication
  • What about failed replication?
    → See http://rescience.github.io/faq/
  • Expanding the model (the CoScience journal)
    → Instead of post-reproduction, publish articles including independent replication

@oliviaguest

Is the list with the various R-words specifically about software as an instantiation of a theory or hypothesis? I think it would be good to define some of these words; even "experiment" and "lab" can be ambiguous. I'd also like to see reimplementation on the list, if possible: in my opinion, and in my field, that is actually the most important R-word.

@rougier

rougier commented May 6, 2016

@oliviaguest I think it should be part of the discussion in this specific part. I took the experiment/set-up distinction from Carole Goble (see #2), but this will certainly change since I imagine we all have different definitions in mind. The distinction you're making between theory and hypothesis is not totally clear to me, so it's certainly worth discussing further. I will open a new issue on R-words to see if we can converge on some definitions (and on which words exactly, since I don't know whether we need all of them in this paper; at the same time, it would be an opportunity to gather different opinions from different fields).
Maybe you can also open an issue on the theory/hypothesis distinction?

@oliviaguest

I'm not crazy about starting a discussion on that, to be honest. It's a philosophy-of-science point that has been belaboured by people significantly more qualified than me/us. My question works just as well as I intended it, without the conjunction. Does that make sense?

@oliviaguest

oliviaguest commented May 6, 2016

Do we care about the type of model in terms of qualitative aspects? E.g., is it a vague(r) model or a proof of concept (e.g., showing that certain computations are possible given certain constraints), or is it actually capturing/simulating/modelling/learning some specific data?

@rougier

rougier commented May 6, 2016

We have to adapt to the model/experiment that is being replicated. If the model/experiment gives qualitative results, the replication should do the same. This was the case for the first article, where a bifurcation was expected in some specific parts of the model. Other papers might want to explain some experimental data very precisely, and in such a case the replication should do the same. But if you consider noise, you might want to allow some tolerance in the replication (see the sketch below).
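
As an illustration of the quantitative case, a replication could compare its numbers to the original ones with an explicit tolerance that absorbs noise. A minimal sketch in Python; the result arrays and tolerance values are placeholders, not taken from any actual submission:

```python
import numpy as np

# Placeholder values: original_results would be digitized from the
# original paper, replicated_results produced by the new implementation.
original_results = np.array([0.82, 0.91, 0.78])
replicated_results = np.array([0.80, 0.93, 0.77])

# Quantitative criterion: element-wise agreement within an explicit
# tolerance chosen to absorb stochastic variation (values illustrative).
if np.allclose(replicated_results, original_results, rtol=0.05, atol=0.01):
    print("Replication within tolerance")
else:
    print("Replication outside tolerance -- investigate")
```

A qualitative criterion (e.g., "a bifurcation occurs in the same region of the model") cannot be reduced to such a check and would have to be assessed during review.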

@tpoisot

tpoisot commented May 6, 2016

Following up on this, would we accept "fake" data for data-based replications? I.e., generating a dataset with the same properties, and re-implementing over it, as opposed to the original one?

@rougier

rougier commented May 6, 2016

I would say it depends on the goal of the original paper. For example, if it introduces a new method for data processing, it might make sense to use fake data (with the same statistical properties?) when the original data are not available (or if they were already fake). But if the paper uses some experimental data to show something, I imagine it would be hard to show the same thing if the original data are not available.
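
A minimal sketch of how such fake data could be generated when only summary statistics of an unavailable dataset are reported; the statistics and the Gaussian assumption are both illustrative placeholders:

```python
import numpy as np

# Suppose the original paper only reports the sample mean, standard
# deviation, and size of an unavailable dataset (values illustrative).
reported_mean, reported_std, n_samples = 5.2, 1.3, 200

# Draw a surrogate dataset with matching first two moments, under the
# (explicit) assumption that the original data were roughly Gaussian.
rng = np.random.default_rng(seed=0)
surrogate = rng.normal(loc=reported_mean, scale=reported_std, size=n_samples)

print(f"surrogate: mean={surrogate.mean():.2f}, std={surrogate.std(ddof=1):.2f}")
```

Whether the replicated method behaves the same on such a surrogate would then itself be part of what the replication article reports.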
