
Using GitHub for research projects

Pete Bachant edited this page May 6, 2016 · 5 revisions

Pros

  • Track changes to all files: software, notes, etc. (really just a benefit of Git)

  • Issue tracker provides a history of the thought processes and conversations behind solving problems

  • Easy to create a website for each project by pushing an index.html file to the gh-pages branch

  • Can use Git Large File Storage (LFS) to version large files without bloating the repository: Git tracks small pointer files, while the file contents are stored on a separate server

  • README.md provides a nice "home base" for each project, stating its purpose, how it works, etc.

  • Issue tracker can be made into a board using https://waffle.io (Waffle has since shut down; GitHub's built-in project boards now fill this role)

  • Pull requests allow others to suggest improvements without needing write access

  • Can experiment with software changes (using branches) without losing any work

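  • Issue tracker can be used to automatically document how issues were solved by linking to commits: mentioning an issue number (e.g., #52) in a commit message links the commit to that issue, and a keyword such as "resolves" closes the issue once the commit reaches the default branch: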

    git commit -am "Fix bug in calculating C_P; resolves #52"
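The branch-related pros above take only a few Git commands. Here is a minimal sketch, run in a throwaway repository so nothing real is touched: it creates a gh-pages branch containing only an index.html (an orphan branch is one common way to do this, not the only one), then tries out a change on a separate branch and merges it once it works. The branch name `try-new-fit` and the file contents are made up for illustration, and the push is commented out because it assumes a GitHub remote named `origin`.

```shell
set -eu

# Throwaway repo so nothing real is touched
demo=$(mktemp -d)
cd "$demo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "# My experiment" > README.md
git add README.md
git commit -q -m "Add README"
main_branch=$(git rev-parse --abbrev-ref HEAD)

# Website: an orphan branch shares no history with the project files
git checkout -q --orphan gh-pages
git rm -r -f -q .                  # remove the project files from this branch
echo "<h1>My experiment</h1>" > index.html
git add index.html
git commit -q -m "Add project website"
# git push origin gh-pages         # assumes a GitHub remote named origin

# Experimenting: try a change on its own branch...
git checkout -q "$main_branch"
git checkout -q -b try-new-fit
echo "alternative curve fit" > analysis.txt
git add analysis.txt
git commit -q -m "Try alternative curve fit"

# ...and merge it only once it works (or just delete the branch)
git checkout -q "$main_branch"
git merge -q --ff-only try-new-fit
```

If the experiment doesn't pan out, `git branch -D try-new-fit` discards it and the main branch is exactly as it was.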

Cons

  • Can't diff/merge binary files, e.g., SolidWorks models, Word documents, Excel spreadsheets

Methods

Dealing with CAD files

These can be kept on some other cloud drive, e.g., Dropbox, and linked in the README. Once Onshape has a few more features, it will be a viable alternative to local CAD software.

Raw data and simulation results

These usually shouldn't be committed to the repo unless they are very small. Instead, they can be compressed into an archive and uploaded to a cloud drive or Figshare, and the repo can contain a script or function that downloads the raw data.
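A download function along these lines could work. This is a minimal sketch, assuming the data are packed as a single .tar.gz archive reachable at a direct-download URL (a .zip archive would need `unzip` in place of `tar`). The function name `download_raw_data` and its arguments are hypothetical, and Figshare or cloud-drive share links may need adjusting to give a direct download.

```shell
set -eu

# download_raw_data URL DEST  (hypothetical helper)
# Fetch a .tar.gz archive of raw data and unpack it into DEST,
# skipping the download if DEST already contains files.
download_raw_data() {
    url=$1
    dest=$2
    if [ -d "$dest" ] && [ -n "$(ls -A "$dest")" ]; then
        echo "Raw data already present in $dest; skipping download"
        return 0
    fi
    mkdir -p "$dest"
    archive=$(mktemp)
    # -f: fail on HTTP errors; -sS: quiet but show errors; -L: follow redirects
    curl -fsSL -o "$archive" "$url"
    tar -xzf "$archive" -C "$dest"
    rm -f "$archive"
}
```

A project's setup script would then call it once, e.g. `download_raw_data "https://example.com/raw-data.tar.gz" data/raw` (placeholder URL), with `data/raw/` listed in `.gitignore` so the downloaded files never get committed.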

Experiment repos versus paper repos

Currently, I create a repo for each experiment and another for each paper about that experiment. Others may prefer to keep everything in one repo. I haven't done this because one experiment has sometimes led to multiple papers, and including them all would detract from the experiment repo's purpose of disseminating the data and software.

One technique I have experimented with a little is adding experiment repos as submodules of paper repos. This is especially helpful when a paper draws on multiple experiments. Submodules also automatically record which version (commit) of each experiment repo a paper used, which is convenient. See https://github.com/petebachant/CFT-Re-dep-paper for an example of submodule usage in a paper repo.
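A minimal sketch of the submodule workflow, run against two throwaway local repos so it is self-contained. The names `experiment` and `paper` are hypothetical; in practice the submodule URL would be the experiment repo's GitHub URL, and the `-c protocol.file.allow=always` option would not be needed (recent Git versions block submodules added from local paths by default, which is the only reason it appears here).

```shell
set -eu

root=$(mktemp -d)
cd "$root"

# Create a repo with a local identity so commits work on any machine
new_repo() {
    git init -q "$1"
    git -C "$1" config user.name "Example User"
    git -C "$1" config user.email "user@example.com"
}

# Hypothetical experiment repo containing data-processing code
new_repo experiment
echo "print('processing v1')" > experiment/process.py
git -C experiment add process.py
git -C experiment commit -q -m "Add processing script"

# Hypothetical paper repo that pulls in the experiment as a submodule
new_repo paper
cd paper
git -c protocol.file.allow=always submodule add "$root/experiment" experiment
git commit -q -m "Add experiment repo as submodule"

# The commit stores the experiment's commit hash (mode 160000 = submodule)
git ls-tree HEAD experiment

# Later, the experiment repo moves on...
echo "print('processing v2')" > "$root/experiment/process.py"
git -C "$root/experiment" commit -q -am "Update processing script"

# ...but the paper keeps the old commit until it is deliberately updated
git -C experiment pull -q --ff-only
git add experiment
git commit -q -m "Use updated experiment code"
```

Anyone cloning the paper repo with `git clone --recursive` then gets the experiment code at exactly the recorded commit.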

Figures

If figure files are small, there isn't much harm in committing them, though it's probably not best practice. A better alternative is to include the projects used to create the figures as submodules, so the figures can be regenerated automatically.
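A minimal sketch of regenerating figures from submodules, assuming (hypothetically) that each experiment repo provides an executable `make_figures.sh` that writes its figures into a directory passed as the first argument. That convention, the function name `make_figures`, and the `figures` directory are illustrative, not an existing tool. The function checks out the experiment commits recorded by the paper repo, then runs each submodule's script.

```shell
set -eu

# make_figures FIGDIR  (hypothetical helper)
# Rebuild a paper's figures from the code in its submodules rather than
# committing image files. Assumes (hypothetical convention) that each
# submodule may contain an executable make_figures.sh taking an output dir.
make_figures() {
    mkdir -p "$1"
    # Absolute path, so it stays valid after foreach changes directory
    figdir=$(cd "$1" && pwd)
    # Check out exactly the experiment commits recorded by the paper repo
    git submodule update --init --recursive
    # Run each submodule's figure script, skipping submodules without one
    FIGDIR=$figdir git submodule --quiet foreach --recursive \
        '[ ! -x make_figures.sh ] || ./make_figures.sh "$FIGDIR"'
}
```

Running `make_figures figures` from the paper repo's root would then rebuild every figure, and `figures/` can be listed in `.gitignore` instead of committed.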
