R-words #5
I will only attempt one because the rest seem too similar/unknown to me. Reimplementable: Is there enough information in the specification (i.e., the journal article, or material referenced within it) to recreate the model (within the theory or account) from scratch? If yes, then the model is reimplementable. If no, then even if the experiments can be carried out within the original (presumably opaque) codebase, the model cannot be reimplemented (given the current specification).
Rerunable: Is it possible to re-run the model (same computer, same system, same program) and get the exact same results? It may seem obvious that the answer is yes, but it is actually not that obvious. For example, if you're using a random generator and did not set or record the seed, then you cannot guarantee a re-run. The same applies if you fed some parameters manually when starting your model without a mechanism to save them, or read them from a file that was changed after the run.
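To make that point concrete, here is a minimal sketch of what recording the seed and the manually supplied parameters could look like; the file name and parameter names are made up for illustration, not taken from any particular model.

```python
import json
import random

# Record everything a re-run will need: the RNG seed and the parameters
# that would otherwise be typed in by hand (names are illustrative only).
params = {"seed": 12345, "learning_rate": 0.01, "n_trials": 1000}
with open("run_parameters.json", "w") as f:
    json.dump(params, f, indent=2)

random.seed(params["seed"])
results = [random.random() for _ in range(params["n_trials"])]

# A later re-run reads the same file back and regenerates the same stream.
with open("run_parameters.json") as f:
    saved = json.load(f)
random.seed(saved["seed"])
rerun = [random.random() for _ in range(saved["n_trials"])]
assert rerun == results  # exact match on the same machine and Python version
```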
What's the history/reason behind the capitalisation of rerun?
Replicable: Is it possible to re-run the model (same program) on a different computer, using a different system or a different version? Does your specification give enough information concerning the required libraries and their respective version numbers? Does your model rely on system-specific libraries? Does it correctly handle system-specific features (float precision, endianness, etc.)?
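As a rough illustration of the kind of information such a specification could record automatically, here is a sketch that writes the library versions and the system-specific numeric features next to the results; the file name and the choice of recorded fields are assumptions made for the example, not a prescription.

```python
import json
import platform
import sys

import numpy as np

# Capture what a replication on another machine would need to compare:
# interpreter and library versions, the platform, endianness, and the
# floating-point precision actually in use.
environment = {
    "python": sys.version,
    "platform": platform.platform(),
    "numpy": np.__version__,
    "byteorder": sys.byteorder,                      # endianness of the host
    "float64_eps": float(np.finfo(np.float64).eps),  # float precision
}

with open("environment.json", "w") as f:
    json.dump(environment, f, indent=2)
```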
@oliviaguest None, just correct it.
Reproducible = Reimplementable (for me)
Ah, so for me it's more complex (although I might need to think about it more): (Rerun + Reimplement) ~ Reproduce = Replicate, where the R-words on the LHS are somehow weighted. Edited: OK, not sure. But I think maybe it's best to nest them? I have seen other definitions out there.
Something needs to be said about replicating the data also, in my opinion. If you are modelling data, then surely the experiment that produced the training data and the testing data has to also be replicable.
@oliviaguest Your definition of reimplementable looks fine to me, but it should be made clear that it applies to a human-readable document (such as a paper), not to software or computed results like the other R-words. I also think that this term matters to us, because it defines the ideal candidate paper for a replication to be published in ReScience.
@rougier I am fine with your definition of "rerunable". The tricky part is the transition from there to "replicable". The idea is that there are aspects of a computation that should be modifiable without affecting the results, according to expectations shared in the community. A computation is then called "replicable" if it satisfies those expectations. Typically the expectations include results that are independent of minor version changes in everything, and of the use of different compilers and operating systems. The big problem is of course that these expectations are never written down explicitly, and it is unlikely that there is complete consensus about them in any community. But without a clear list of criteria, it is impossible to verify whether a computation is replicable. To make matters worse, some people's expectations are about obtaining the exact same results at the bit level, whereas others consider it normal that "small" variations happen, though nobody ever seems able to define "small" in this context.
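The two notions of "same results" mentioned above can be made concrete in a few lines; the arrays and the tolerance below are invented for illustration, and choosing the tolerance is exactly the unresolved question.

```python
import numpy as np

# Stand-ins for the outputs of the original run and of a run on a
# different system (values are made up).
original = np.array([0.1, 0.2, 0.3])
rerun = original.copy()              # same bits
replicated = original + 1e-12        # a "small" variation

# Bit-level identity: one community's expectation.
print(np.array_equal(original, rerun))       # True
print(np.array_equal(original, replicated))  # False

# Identity up to a tolerance: the other expectation, but who fixes atol?
print(np.allclose(original, replicated, rtol=0.0, atol=1e-9))  # True
```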
Repeatable: = rerunable.
Reproducible: the result of a computational study is reproducible if its human-readable description is reimplementable and if a reimplementation leads to results whose scientific interpretation is the same as for the original results. Note that reproducibility can change over time, for two reasons:
[...]
Reusable: a piece of code or a dataset is reusable if its characteristics are sufficiently well described that it can safely be transferred to another context. For a piece of code, there are two interesting relations to other R-words:
[...]
Where do changes that are not central to the theory, and so can be abstracted away in ideal circumstances, get relegated to? For example, I have come across cases where a non-theoretically-important implementation detail that should not affect the model (e.g., quicksort vs another sorting algorithm) ends up affecting it because the authors were not careful. The type of sorting algorithm used is categorically not part of the theory and should not be, but it was still integral to the replication of the results (because careful, consistent modelling was not carried out). Ideally it should be part of the spec, but it was neither part of the spec nor abstracted away enough during investigations of the model, so the results ended up depending on a theoretically irrelevant point. And - very relatedly - where do details that are important to the theory but have not yet been discovered as such belong in this r-hierarchy of words? It is a similar but importantly different case, in which an implementation detail needs to be promoted to the theory level because it is actually theoretically important, e.g., it is important for quicksort to be mentioned because the theory depends on it, and not just because of sloppy modelling.
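For readers who have not met this kind of failure, here is a toy sketch of how a sorting algorithm, which no theory cares about, can leak into the results; the array and the "pick the best-ranked item" step are invented for the example.

```python
import numpy as np

# Scores with ties: the theory only says "take the best-scoring item".
scores = np.array([0.5, 0.9, 0.5, 0.9, 0.5])

# A stable sort guarantees that tied items keep their input order; quicksort
# makes no such guarantee, so which of the tied 0.9s ends up ranked first
# (and anything computed from it downstream) may change between
# implementations or library versions, though nothing theoretical changed.
order_stable = np.argsort(scores, kind="stable")
order_quick = np.argsort(scores, kind="quicksort")

best_item = order_stable[-1]  # well defined only because the sort is stable
```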
PS: I mentioned "r-word" as a slightly flippant comment. I am now a little sorry it has caught on, as it makes me feel conflicted.
@oliviaguest I'd say that the cases you describe are outside of the R-word universe. They are well covered by traditional terms such as "mistake", "oversight", etc. Their symptom is usually non-reproducibility. In fact, I'd say that a major motivation to test for reproducibility is to catch situations such as those you describe.
Some general comments about the R-word definitions:
[...]
I don't understand; if they are outside the words we are defining, then I'm really confused. 😕
@oliviaguest Don't worry. It's a good shorthand for this discussion. I hope it won't end up in the text of our paper!
@oliviaguest Perhaps "outside" isn't the best term. They are of course related, being specific cases of non-reproducibility. But I don't think we need a specific new term for each cause of non-reproducibility. We don't want to blow up the cost of future editions of the Oxford Dictionary.
Aha! Now I see the confusion, @khinsen. |
@oliviaguest The common category is "reproducibility", in my opinion. Your second case is almost the textbook definition of a cause of non-reproducibility. Scientist A publishes a study. Scientist B tries to reproduce the scientific conclusion using a modified study, and fails. Comparison of the two studies then shows that something everybody considered a technical detail actually is important and should be promoted to a part of the theory. Your first case is very similar, except that the comparison of the two studies shows that study A was not designed carefully enough. The theory has survived another round. So the common point is that a reproduction attempt fails, and the analysis of the failure improves everybody's understanding. Just the happy ending that we need to keep our funders happy.
My feeling is that the term remixable is very dependent on the details of the license assigned to the work. However, it also has a practical aspect. It is very difficult to remix a model if the codebase is not made up of modular pieces (functions).
@jsta My main question concerning remixable is: what is it about? Mixing suggests a large collection of things. What are those things? Functions in a library? If so, what is the mix resulting from mixing functions?
@khinsen I am not sure. A subset of the original? remixable may be a tough one!
Is everything open source remixable? |
If you take "mixing" from a legal point of view, probably yes. Otherwise, we need to decide first what "remixable" really means!
It's not always the case. Imagine that you have a hybrid code (open source and proprietary); then you have to acquire the proprietary license as well, otherwise you cannot mix the hybrid code with any other code. The use of the NAG library would be an example. So I think you have to verify that all of the mixed parts are under an "open source" license.
Perhaps more important than, or equally important to, the definitions: a metric?
Very important indeed, but in my opinion this is a research topic for many years to come. At this time, we can do no more than mention the problem and refer to papers such as the one by Mesnard and Barba. Another reference along these lines is a recent paper in Science about the reproducibility of DFT computations in materials science. Note also that the problem concerns only computational models derived from continuous mathematics. That's of course a huge part of computational science, but not all of it. As a consequence, all the R-words can be defined independently, pretending that all of science can be done using discrete maths. All science is based on simplifying assumptions, so this could be ours.
What about the more general point of criteria?
I'd say that at the level of generality we work at, these criteria follow from the definitions of each R-word, with one option being "domain-specific, we can't say any more here". We should probably say something about the criteria in each definition. As an example, rerunable makes sense only if the criterion is bitwise identical results, not counting metadata such as time stamps. At the other extreme, the criteria for being reproducible are necessarily domain-specific.
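One way to operationalise "bitwise identical results, not counting metadata" is sketched below; the output layout, the file names, and the idea that metadata lives in a "timestamp" field are assumptions made for the example.

```python
import hashlib
import json

def content_digest(path):
    """Hash the scientific content of a JSON output, ignoring metadata."""
    with open(path) as f:
        output = json.load(f)
    output.pop("timestamp", None)  # metadata, not part of the result
    canonical = json.dumps(output, sort_keys=True)
    return hashlib.sha256(canonical.encode()).hexdigest()

# Under this criterion, two runs count as "the same" iff the digests match:
# assert content_digest("run1.json") == content_digest("run2.json")
```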
I think some general meta-principles might be required, though... I might be talking at cross-purposes with you, but I have a feeling that criteria, or at least meta-criteria (criteria for criteria), can be nailed down.
This article gives definitions for replication and reproduction very clearly: http://biostatistics.oxfordjournals.org/content/10/3/405.full
And here is another: Replicability is not Reproducibility: Nor is it Good Science
@oliviaguest Thanks for those references! I remember the second one well, because I disagree with its conclusion, but its definitions of replicability vs reproducibility are indeed very clear. The first one seems to use the exact inverse definitions, and defines in detail only the one we call replicability. Unfortunately, in the criteria for replicability, there's again the "reasonable bounds for numerical tolerance", which is what Ian wrote about in his blog post.
Slight tangent, but I don't personally think any definition from any paper is Gospel, nor do I like the idea of a prescriptive/normative definition war. Principally because I think modelling and non-modelling have differences that transcend these words, and a modeller telling everybody what word to use just won't work anyway. The best we can do is define terms when we use them.
I know that's not what's being attempted here; it's just that (being Cypriot, and reading above about the OED as if it's dictating as opposed to describing) I'm keenly aware of language centralisation.
I am not interested in prescribing anything either, assuming that we have the power to do so, which I seriously doubt. I would like to see some more standardization of vocabulary, but that's beyond my influence. In the meantime, I just want to be clear about the definitions we use ourselves.
The ACM just issued an announcement Result and Artifact Review and Badging where they proposed some definitions:
[...]
The ACM adoption is unfortunate and ahistorical in the computing community. My students and I have been working for a long time on a literature review to sort through the disarray of terminology on reproducibility. Here I will share some notes. (I have a draft of a blog post or essay, but it's been abandoned for a few months now. These tidied-up notes will help.) The phrase "reproducible research" in computational science is traced back to geophysicist Jon Claerbout at Stanford, who in the '90s started the tradition in his lab that all the figures and tables in their papers should be easily re-created, even by running just one command. The oldest published paper we found that addresses their method is:
Some of the content of that paper is very outdated (it’s 1992 after all), but the way the “goals” of reproducible research are presented is interesting:
Claerbout relates some of the story of “reproducible research” coming out of Stanford in an essay on his website:
He mentions that with Matthias Schwab, they submitted an article to “Computers in Physics” about the reproducible-research concept, but it was rejected—the magazine was later bought by IEEE and turned into “Computing in Science and Engineering,” where it was eventually published years later as:
At Stanford, statistics professor David Donoho learned of Claerbout’s methods in the early 1990s, and began adopting (and later promoting) them. A well-cited early paper from his group is:
This paper is often cited for the quote [...]. Buckheit and Donoho make the commitment [...]. Citing the work of Claerbout, they say [...] and later [...]. See also:
This last article has this interesting response to the imagined argument "True reproducibility means reproducibility from first principles": [...] The influence of Claerbout and Donoho permeates through a large portion of the recent reproducibility movement in computational science. Victoria Stodden, de facto spokeswoman for reproducibility on the conference circuit, was Donoho's PhD student. I mentioned above a paper by Schwab, Karrenbach and Claerbout (2000), published in CiSE, a joint publication of the IEEE Computer Society and AIP (American Institute of Physics). The publication ran a Special Issue on Reproducible Research in 2009 that included several more-or-less well-cited papers:
The use of the term "reproducible research" is consistent throughout them. Enter Roger Peng. His paper makes a clear distinction between reproducible research and a full replication study. The distinction also appears in his earlier publication:
which says [...] and the distinction is accentuated in:
(343 citations, checked today on Google Scholar) But why do we often see an emphasis on reproducing the figures, tables, etc. in a published computational study? I found a nice explanation in:
She writes [...]. OK, so why do we have the terms completely swapped in the ACM adoption? At least within computational fields, the "swapped" terms are traced back to a (frankly, misguided and irate) paper by C. Drummond—already mentioned above in this Issue thread.
For background, bear in mind that the computing community publishes primarily in conferences, which are peer-reviewed. But within the conferences, there are workshops that have less selective review. This is a workshop paper, aimed at the machine-learning community. Drummond admits that he is swapping one term for another: "I use X for what others call Y." He's arbitrarily renaming "replication" … It seems that a lot of people have been influenced by the swapping of terms Drummond made in 2009—I will speculate that the ranting quality of his paper gave it some magnetism (like political news headlines these days). See also:
and commentary by R. Peng in [...]. But especially, read this:
This is an essay by Mark Liberman, Christopher H. Browne Distinguished Professor of Linguistics at the University of Pennsylvania. He teaches introductory linguistics, as well as big data in linguistics, and computational analysis and modeling of biological signals and systems (among other topics). I found this blog post where the author corrected the swapped terminology after becoming aware of this! Additional references using a terminology that is consistent with Claerbout/Donoho/Peng are:
I have more! But I will leave it there for now, as this Issue comment is already 2,000+ words.
On the question of definitions of 'reproducibility' and 'replicability', I think the idea of convergence on definitions noted above might be impossible, because these terms have totally opposite definitions in different fields. @labarba's comprehensive summary of the literature captures what I think is the common and widespread use, outside of the ACM, political science, and one or two other areas. Incidentally, it seems like the two terms are used synonymously in this paper, in this sentence: "However, good intention are not sufficient and a given computational results can be declared reproducible if and only if it has been actually replicated in a the sense of a brand new open-source and documented implementation." The article What does research reproducibility mean? similarly summarises the prevailing definitions for most researchers in my field and related areas. They present reproducibility as
This is distinct from replicability:
They further define some new terms: methods reproducibility, results reproducibility, and inferential reproducibility. But, as Lorena has noted (I'm looking forward to seeing the rest of her review!), the definitions in this Science paper, which are also consistent with a long history of discussions of scientific reproducibility, as noted in the linguistic analysis on the Language Log blog, are totally opposite to the ACM's, which takes its definitions from the International Vocabulary of Metrology. Here are the ACM definitions:
The problem with these definitions is that the IVM is the wrong place to look for modern definitions of these terms, because it is exclusively concerned with the measurement of physical properties. It does not engage at all with computational contexts. Computational contexts are a big part of the contemporary reproducibility discussion, thanks largely to the work of @victoriastodden. We can also note with interest the recent Nature News articles Muddled meanings hamper efforts to fix reproducibility crisis and 1,500 scientists lift the lid on reproducibility. These report on the general problem of a lack of a common definition of reproducibility, despite a widespread recognition that it's a problem. They are helpful for demonstrating that there is a range of definitions in common use. The main point here is that any discussion of definitions of these terms needs to acknowledge this diversity as part of the challenge of promoting these values and behaviours in science broadly. If this diversity is neglected, and you are writing for audiences spanning many fields (I hope this for ReScience!), there is a risk of being irrelevant to researchers in fields that have different definitions from the ones you've adopted. I understand that you do have to present some kind of definitions in this paper, and I guess that the ones you choose will depend on which research community you want to signify your affiliations with. There's no problem with that, so long as you note (perhaps with a brief comment and carefully chosen citations) that there is substantial diversity in how the terms are used across the sciences. It's great for science generally that more people are concerned with these issues, even if they don't agree on the definitions!
@benmarwick writes:
[...]
I wholeheartedly agree—going to IVM for inspiration on what definitions to adopt was misguided. (It is possible, too, that some folks in that committee were influenced by the Drummond papers. SIGH.) The ACM is the Association for Computing Machinery. Although we may resign ourselves to the impossibility of a convergence of terminology across all disciplines, within computational disciplines there is a clear history of adoption. I have given more than a dozen references above, spanning 25 years, and there are many more. (If anyone is adding a counter-example—like, "chemists use the opposite meaning"—please, do include a reference, rather than leaving it as hearsay.) The Science paper @benmarwick cited:
... recognizes that: “… basic terms—reproducibility, replicability, reliability, robustness, and generalizability—are not standardized”, while clearly adopting the Claerbout/Donoho/Peng usage. They say: “ … the modern use of ‘reproducible research’ was originally applied not to corroboration, but to transparency, with application in the computational sciences. Computer scientist [mistake: geophysicist] Jon Claerbout coined the term and associated it with a software platform and set of procedures that permit the reader of a paper to see the entire processing trail from the raw data and code to figures and tables.” [used in] “epidemiology, computational biology, economics and clinical trials…" [refs. provided] Goodman et al. propose a new lexicon as a way out of the confusion:
A good portion of this article derives from a talk given by Goodman at a workshop of the National Academy of Sciences, titled: "Statistical Challenges in Assessing and Fostering the Reproducibility of Scientific Results." Goodman gave there a useful clustering of disciplines into "groups with similar cultures," as follows:
When discussing the diversity of definitions that @benmarwick mentions, we can look at this clustering and see which group each usage falls into. I already gave a dozen-plus references for computational sciences. In epidemiology and social science, the meaning is consistent with Claerbout/Donoho/Peng—cf. Peng, Dominici & Zeger (2006), on epidemiology, and the NSF 2015 [PDF] report for social sciences. In clinical research, there is "no clear consensus as to what constitutes a reproducible study" (Goodman et al., 2016), but the usage of the terms is consistent: one replicates the findings (while reproducibility refers to the process of investigation). For the group of natural-world-based sciences, I don't have in my notes references for astronomy or ecology (yet), but we heard from @benmarwick that the usage in archaeology is consistent with the above. The pattern of usage is clear: reproducible study and replicable findings.
While I agree that the IVM is not the last word on terminology for science at large, I don't consider it absurd either to turn to it for "prior art" in choosing terms. Computational science has different issues than experimental science, but in the end, both are forms of doing science and their practitioners should be able to talk to each other. It makes more sense to me to extend the traditional terms from experimental science to computational scenarios where this is possible.
I wanted to follow on from @labarba's most recent comment with a reference from the natural-world sciences (ecology):
It seems that they follow the most common and widespread use of the terms detailed in @labarba's comprehensive summary, except that they switch out the term Replicability for Repeatability.
I published this on Medium: "Barba group reproducibility syllabus". It's not addressing terminology, but rather is a summary of the top-10 references chosen in my group as the basic reading list on reproducibility. Topical to this thread as a complement to the lit review I started above.
I just added a link on: http://rescience.github.io/about/
I only became aware of this discussion a few days ago. Even though the train left the station a while ago, I would like to add a few comments. That terminology (reproducibility: re-run the same code; replicability: independent implementation) jars with my general (intuitive) understanding of the terms, and was probably the reason why I based the terminology proposed in Crook, Davison and Plesser (2013) on Drummond (2009), while otherwise disagreeing with Drummond. Merriam-Webster differentiates "reproduction" and "replication" as follows (see the Synonym Discussion section of the entry): "reproduction implies an exact or close imitation of an existing thing. ... replica implies the exact reproduction of a particular item in all details ... but not always in the same scale." A reproduction is an "exact or close imitation", while a replica is an "exact reproduction in all details"; thus a replica is a kind of reproduction that is particularly close to the original. Since re-doing the same study by running the same software on the same data is closer to the original than an independent implementation, common usage, in my opinion, suggests that "replication" fits better for re-running using the same software as the original. Furthermore, a quick Google search turned up 18.4 million hits for "reproducible research", but only 0.5 million for "replicable research". So "reproducible" seems to be the far more common term. Now I would think that the (scientific) public at large is first and foremost interested in whether we can trust scientific results, whether they are robust overall, reveal the laws of nature---whether they can be corroborated by independent experimentation. In view of this, is it really sensible to narrow "reproducing" to mean "running the same software on the same input data and obtaining the same results"? In the pioneering work of Claerbout's group (Claerbout and Karrenbach, 1992; Claerbout, undated; Schwab et al., 2000), I haven't found any discussion of why they chose the term "reproducible" for their approach. I wish they had chosen differently, so that the rather young (25 years) reproducible research movement had not ended up with a terminology at odds with the significantly older metrology.
While you place the emphasis on the phrase "in all details," I could place it on "exact" to make the same argument for "reproduction" instead of "replication." In the end, it is seldom practical to try to get help from the dictionary for discussions about terms of art. Reproducibility is a spectrum of concerns. A most basic question is: can you run my code with my data and get my results? This is the minimum requirement, and often referred to as "reproducible research." I would wager that's why the search results for "reproducible research" are most numerous. Replications, in the sense of Peng and others, are (unfortunately) quite rare. But reproducible research is a pre-requisite for replication studies, because if replication fails—as Donoho points out—only if both author teams worked reproducibly is it possible to find the source of any discrepancies.
Personally I don't care much about vague analogies to dictionary definitions that were clearly not written with research in mind. But I do agree with @heplesser's argument about the use of "reproducible" in a wider sense, applying to science in general rather than to the specific problems of computer-aided research. I suspect that Claerbout's choice of the term "reproducible research" was meant to be provocative. All scientific research is supposed to be reproducible (in an ideal world), so what he was arguing for was "merely" to adopt this criterion for computer-aided research as well. Back then, 25 years ago, nobody discussed non-reproducibility in experimental contexts. Today, "reproducibility" in a less well-defined sense has become a widespread concern. Most modern uses of the term can clearly not be interpreted in Claerbout's sense, because they don't refer to computation. In the long run, I doubt computational scientists will be able to claim the "reproducibility" label for their particular, rather technical issue. But this question won't be settled before the scientific community at large, which is dominated by experimentalists, understands the various issues and agrees on a common terminology. I expect this to take at least another decade, during which ReScience can become anything from a mainstream journal to a relic of the past. In the meantime, the definitions we currently use in ReScience are clear and have some historical justification, which is good enough for me.
At the risk of being argumentative by going off-topic, I can't help but react to this statement with a bit of skepticism, given the results of: S.J. Hettrick et al. (2014), UK Research Software Survey doi:10.5281/zenodo.14809
[...]
@labarba The fact that most scientists use software does not imply that the use of software is seen by them as a major cause of non-reproducibility. My impression of the discussion of the "reproducibility crisis" in the literature is that it focuses on statistical issues, such as insufficient sample size in experiments or p-hacking in data analysis.
Prompted by @oliviaguest on this Twitter discussion, I would like to bring to attention more recent (2019) literature Reproducibility and Replicability in Science (https://doi.org/10.17226/25303). This "Consensus Study Report" tackles the terminological confusion on p. 42-46 and arrives at the following definition on p. 46:
While the report itself admits varied usage of the terms, such consensus reports, which have been compiled across many disciplines, are good rallying points for anchoring terminology. The lack of consistency in terminology hinders progress on the topic, and if the ReScience journal takes a stance, it may further help the community by stabilizing the volatile terminology.
@hlageek ReScience adopted the definitions you quote a while ago. I hope we use them consistently. If you find an incompatible use, please open an issue!
@khinsen Might it be useful to link to some of these on the website where we discuss/define it? http://rescience.github.io/faq/
The idea is to converge on definitions of these words according to our respective scientific domains. At this point it is not even certain that all words are relevant to all domains, and this may also depend on the kind of software we consider (see discussion in #4).
Rerunable:
Repeatable:
Replicable:
Reproducible:
Reusable:
Remixable:
Reimplementable: