ruptures for change point detection
ruptures (1.7k stars on GitHub as of Mar 2025) is a popular Python implementation of classic and state-of-the-art change-point detection algorithms, which are reviewed in "Selective review of offline change point detection methods". Rebecca Killick maintains a helpful web page with references.
The Time Series CRAN Task View has a section reviewing R packages for change-point detection. However, existing R and Python packages have drawbacks.
- changepoint and ruptures: several common algorithms, but current implementations do not achieve asymptotically optimal time complexity.
- gfpop, PeakSegOptimal, binsegRcpp: efficient C++ code, but R API not consistent with other tools.
The goal of this project is to re-implement classic (binary segmentation) and state-of-the-art (PELT, FPOP) change-point detection in modern C++ (using, for instance, Armadillo), which can be interfaced with R (and eventually Python).
- Re-implement core algorithms in C++
- common distributions (normal, Poisson, ...)
- Dynamic programming
- PELT for multivariate data.
- FPOP for univariate data.
- Binary segmentation
- R package with consistent API to access each algorithm.
- Documentation
- vignettes
- tests
ruptures is extremely popular, so this project could have a large impact.
Contributors, please contact the mentors below after completing at least one of the tests below.
- EVALUATING MENTOR: Charles Truong is the maintainer of ruptures in Python.
- Toby Hocking [email protected] is the author of numerous R packages, and has been a mentor/admin for R-GSOC since 2013.
Contributors, please do one or more of the following tests before contacting the mentors above.
- Easy: Write an R function `getCumsum`.
  - Input is $X$, an array of size TxD containing doubles.
  - Output is $Y$, an array of size (T+1)xD such that the first row is filled with 0s and $Y_{i+1} = \sum_{j=1}^{i} X_j$. Here, $X_i$ is the i-th row of $X$.
  - Write a C++ implementation using Armadillo and an R wrapper using RcppArmadillo.
- Medium: Write a C++ class `Cost`.
  - This class is constructed from a double array $X$ of size TxD.
  - Add a member function `double eval(int start, int end)` which returns $\sum_{t} \|X_t - \bar{X}\|^2$, summed over the rows $t$ of the segment from `start` to `end`, where $\|\cdot\|$ is the Euclidean norm and $\bar{X}$ is the empirical mean of $X$ on that segment.
- Hard: Write an efficient `Cost::eval` function with constant time complexity. Use the following relations to precompute relevant quantities.
and
and
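As a rough illustration of how the three tests fit together, the sketch below uses plain C++ (`std::vector` instead of Armadillo, so that it compiles on its own) and assumes that `eval` sums squared Euclidean deviations over rows `start` to `end - 1`, i.e. that `end` is exclusive. Under that assumption, with $n = end - start$, the identity $\sum_t \|X_t - \bar{X}\|^2 = \sum_t \|X_t\|^2 - \frac{1}{n} \|\sum_t X_t\|^2$ holds, so two prefix-sum tables are enough to evaluate a segment without looping over its rows. The member names `S_` and `Q_` and the checker `evalNaive` are hypothetical and are not part of the test statement.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// T x D matrix stored as a vector of rows (a stand-in for arma::mat).
using Matrix = std::vector<std::vector<double>>;

// Returns a (T+1) x D matrix: row 0 is all zeros, row i+1 is the sum of
// rows 0..i of X (0-based indexing).
Matrix getCumsum(const Matrix& X) {
    const std::size_t T = X.size();
    const std::size_t D = T ? X[0].size() : 0;
    Matrix Y(T + 1, std::vector<double>(D, 0.0));
    for (std::size_t i = 0; i < T; ++i)
        for (std::size_t d = 0; d < D; ++d)
            Y[i + 1][d] = Y[i][d] + X[i][d];
    return Y;
}

class Cost {
public:
    // Precomputes prefix sums of the rows (S_) and of their squared norms (Q_).
    explicit Cost(const Matrix& X)
        : X_(X), S_(getCumsum(X)), Q_(X.size() + 1, 0.0) {
        for (std::size_t t = 0; t < X.size(); ++t) {
            double sq = 0.0;
            for (double v : X[t]) sq += v * v;
            Q_[t + 1] = Q_[t] + sq;
        }
    }

    // Assumed convention: segment is rows start, ..., end-1 (end exclusive).
    // Cost is independent of the segment length (still linear in D).
    double eval(int start, int end) const {
        const int n = end - start;
        if (n <= 0) return 0.0;
        double sumNormSq = 0.0;  // squared norm of the segment's row sum
        for (std::size_t d = 0; d < S_[0].size(); ++d) {
            const double s = S_[end][d] - S_[start][d];
            sumNormSq += s * s;
        }
        return (Q_[end] - Q_[start]) - sumNormSq / n;
    }

    // Direct evaluation of the definition, linear in the segment length;
    // only here to check eval against.
    double evalNaive(int start, int end) const {
        const int n = end - start;
        if (n <= 0) return 0.0;
        const std::size_t D = S_[0].size();
        std::vector<double> mean(D, 0.0);
        for (int t = start; t < end; ++t)
            for (std::size_t d = 0; d < D; ++d) mean[d] += X_[t][d] / n;
        double cost = 0.0;
        for (int t = start; t < end; ++t)
            for (std::size_t d = 0; d < D; ++d) {
                const double diff = X_[t][d] - mean[d];
                cost += diff * diff;
            }
        return cost;
    }

private:
    Matrix X_;               // copy of the signal, used only by evalNaive
    Matrix S_;               // (T+1) x D prefix sums of rows
    std::vector<double> Q_;  // T+1 prefix sums of squared row norms
};
```

For the 3x2 input with rows (1, 2), (3, 4), (5, 6), both `eval(0, 3)` and `evalNaive(0, 3)` give 16. With Armadillo, the inner loop of `getCumsum` could presumably be replaced by `arma::cumsum(X, 0)` written into the rows below a zero first row. Subtracting large prefix sums can lose precision on long signals, which a real implementation may need to address.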
Contributors, please post a link to your test results here.
Minh Long Nguyen | Github | Solutions for all 3 tests
Dev Goel | Github | Solutions
Olivier Boulant | Github | Solutions
Jiayi Qian - Github link