Documentation checklist for scientific code repositories #2
The codecheck project has quite a formalized process, with a so-called manifest file: https://codecheck.org.uk/guide/community-workflow#requirements

Here's a checklist for machine learning papers: https://medium.com/paperswithcode/ml-code-completeness-checklist-e9127b168501

And here's a guide specific to Python: https://docs.python-guide.org/writing/structure/
Thank you! The Papers with Code checklist and template are amazing; they really make it easy to find the relevant files and re-run the code. The codecheck process reminded me of three software journals/collections: ROpenSci, pyOpenSci, and JOSS.
The important thing is that the repositories shared via ROpenSci/pyOpenSci/JOSS are likely a bit better built (they are aimed at re-use by others, as this is how they earn citations) and are usually used by researchers with sufficient background to get by and find their way around a weirdly/inconveniently structured repo (e.g. where GitHub search fails, they can clone and grep easily).

The bigger problem is repositories accompanying analysis papers, where the target audience is a PhD student/postdoc who often knows only one (statistical) programming language, and maybe only to a degree that allows them to do their own analysis, but not necessarily to understand someone else's code or wrap their head around current software-dev practices/tooling (I know a lot of excellent scientists who are like that).

There is also a group of repositories which lies in between re-usable code and analysis code. I call it MatLab, but really it covers other languages too; these are specialised languages which are unlikely to be re-used by researchers outside of departments with the relevant licence and expertise, yet the authors seem to think that many researchers will re-use them (but oftentimes do not document them sufficiently). I believe there is a small group of methodologists who will in fact use that (MatLab or other) code, and a larger group who just want to use it as a basis for re-implementing/understanding the algorithm by analysing the code (for which MatLab is often a good choice!).
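The clone-and-grep workaround mentioned above can be sketched as below. The repository URL is a placeholder, so the clone command is shown commented out; a tiny stand-in repo is created instead so the grep step is concrete:

```shell
# Hypothetical example: when a repository's web search falls short, cloning
# locally and grepping is often enough to locate the relevant script.
# git clone https://github.com/user/paper-analysis.git   # placeholder URL
# cd paper-analysis

# Simulate a small analysis repo so the grep below has something to find:
mkdir -p paper-analysis/scripts
printf 'data <- read.csv("input.csv")\n' > paper-analysis/scripts/01-load.R

# Recursively search the repo, printing file name and line number:
grep -rn "read.csv" paper-analysis
```

Even a researcher who knows only one language can use this to trace where a given dataset or function is used, without understanding the repo's structure first.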
I wonder if anyone has come across a checklist describing how to prepare a code repository before sharing it in a paper? I know of the Ten simple rules for documenting scientific software list, which touches on some good practices that could reduce the problem of being unable to work out what is happening in others' code, but it is oriented towards re-usable software, while a lot of the worst examples of repositories accompany papers where the author does not expect their code to be re-used (i.e. it is there only to document that they did an analysis/performed a simulation, etc.).
Certainly, https://the-turing-way.netlify.app/ has made a lot of effort to make research reproducible and to encourage minimal reasonable practices, such as file naming, linting and, importantly, repository organization.
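To make "repository organization" concrete, here is a minimal sketch of a layout for paper-analysis code. The directory and file names are illustrative assumptions, not prescribed by The Turing Way:

```shell
# A hedged sketch of a small, navigable analysis-repo layout
# (all names below are made up for illustration):
mkdir -p my-paper/data my-paper/scripts my-paper/results
touch my-paper/README.md           # what the paper is, how to re-run everything
touch my-paper/environment.yml     # pinned dependencies for reproducibility
touch my-paper/scripts/01-clean.R  # numbered scripts document the run order
touch my-paper/scripts/02-model.R
ls my-paper
```

The point is not this particular layout but that a README, pinned dependencies, and an obvious run order remove most of the guesswork for a reader who only wants to verify the analysis.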
Do you know of other resources targeted at researchers sharing their small software/analysis code which would encourage best practices such as: