hyperSpec
Package hyperSpec provides "infrastructure" for working with spectroscopic data in R, e.g.
- import functions for various proprietary file formats
- plotting functions
- functions that allow seamless or almost seamless use of hyperSpec objects with models such as `pls::plsr()`, `MASS::lda()`, etc.
- arithmetic functions that allow typical preprocessing done with spectra such as intensity normalization.
Over the years, some parts of hyperSpec have grown steadily, in particular the file import functions. Unfortunately, this has led to hyperSpec having many dependencies as well as a large base of test data (i.e. spectra files in a wide variety of proprietary [often binary] file formats), making hyperSpec not as easy to maintain as it could and should be:
- We had to switch to `git-lfs` for the test files as the git repo became too large; this causes a steep learning curve for potential contributors.
- File import tests and related vignettes are built offline, and not checked on CRAN.
- Having a large number of dependencies makes hyperSpec vulnerable:
We recently had a situation where a test on CRAN failed and, as we were fixing this and getting ready to submit an update to CRAN, a dependency for a file import function became orphaned. We could not submit a fix for the first issue as long as the second was not fixed. Checking with our users that the proposed change in the dependency would not break their code took a while, so hyperSpec was archived on CRAN (thrown out) for some weeks. It would have been easily possible to deal with each of the two issues separately within the time frame granted by CRAN, but the combination caused a "lockdown".
The aim of this GSoC proposal is three-fold:
- Making hyperSpec easier to maintain by outsourcing e.g. file import functionality into specialized small packages.
- Shielding hyperSpec against breaking due to changes in a dependency.
- Providing better integration with other relevant existing packages through "bridging" packages.
We'd like to better integrate some of the following packages with hyperSpec:
- Packages providing preprocessing for spectra: baseline and EMSC. Claudia is in contact with their creator/maintainer Kristian.
- ggplot2 and tidyverse: hyperSpec has rudimentary `qplot()` functionality, and we recently started a hyperSpec.tidyverse package to fortify hyperSpec for use with dplyr and magrittr functions.
- File import: readJDX (maintained by Bryan)
There are a few packages that one may use instead of hyperSpec, but they are less extensive and specialize in particular applications or particular types of spectroscopy. Bryan maintains a long list of FOSS packages for spectroscopy.
There are several possibilities from which the student can choose:
- Move file import functions out of hyperSpec into new packages and possibly implement import filters for new file formats.
- Fortify hyperSpec so that it integrates well with tidyverse (dplyr, magrittr).
- Provide spectroscopy-related functionality, i.e. integration with baseline and EMSC packages
- Provide integration with matrixStats
As this is quite modular, different parts can also be combined.
This project should produce several small packages which provide two enhancements to the spectroscopy community:
- Small packages are easier to install, use and maintain than one big hyperSpec with lots of dependencies: they "shield" hyperSpec from dependency changes.
- Enhanced functionality for packages that "bridge" hyperSpec with other packages such as baseline or EMSC.
Students, please contact us in the hyperSpec GSoC 2020 issue after completing at least one of the tests below.
- Claudia Beleites ([email protected]) - creator of hyperSpec, chemist/spectroscopist, mentored with R/GSoC several times
- Bryan Hanson, EVALUATING, creator of readJDX and ChemoSpec, mentored with R/GSoC several times (incl. together with Claudia)
- Roman Kiselev ([email protected]) - contributor to hyperSpec, engineer/spectroscopist, also Python expert.
Please contact us if you are stuck with your task. These tests are unlike an exam in that there is no penalty for communicating with us mentors: on the contrary, good communication is one of the key aspects of a good Google Summer of Code.
- Install hyperSpec, covr and lintr from CRAN and hyperSpec.tidyverse from github.
- hyperSpec.tidyverse has some issues marked as good first issue. Note in the issue thread that you'll tackle this, fork the repo, write code, documentation, a unit test and a brief explanation of how to use this in the vignette, and submit a pull request.
- Fork hyperSpec or any of the packages in `r-hyperspec/` from github, use covr to find some function that does not yet have unit tests and write a unit test for one of these functions. These packages have their unit tests in the `.R` files after the respective function definition, using a custom function `test<-` to attach them to the function in question. Unit tests for file import functions count as very hard, see below, but for other functions the testing can be done without the need to set up `make` etc.
- Set up a github repo and a package skeleton and copy file import code, test files and if available also unit test code for one particular file format into the new package. Write one (additional) unit test.
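The unit test task mentions that tests live right after the function definition and are attached with a custom `test<-` function. The following is only a minimal base R sketch of that pattern, not the packages' actual code: the body of `test<-` shown here (storing the test as an attribute) is an assumption, and `normalize_max()` and `run_test()` are hypothetical names made up for illustration. The real packages may well use a testing framework inside the attached function; plain `stopifnot()` is used here to keep the sketch self-contained.

```r
# Assumed stand-in for the custom `test<-` helper: a replacement function
# that stores a test function as the "test" attribute of the tested function.
`test<-` <- function(f, value) {
  attr(f, "test") <- value
  f
}

# Hypothetical toy function (not part of hyperSpec):
# scale a numeric vector so that its maximum becomes 1.
normalize_max <- function(x) {
  x / max(x)
}

# The unit test sits directly after the function definition
# and is attached to the function itself.
test(normalize_max) <- function() {
  res <- normalize_max(c(1, 2, 4))
  stopifnot(isTRUE(all.equal(res, c(0.25, 0.5, 1))))
  stopifnot(max(res) == 1)
  TRUE
}

# Hypothetical helper: fetch the attached test and run it.
run_test <- function(f) {
  attr(f, "test")()
}

run_test(normalize_max)
```

Keeping the test next to the definition means a function without a test is easy to spot when reading the source, which complements what a coverage report shows.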
Students, please post a link to your test results here.
- EXAMPLE STUDENT 1 NAME, LINK TO GITHUB PROFILE, LINK TO TEST RESULTS.