ruptures for change point detection

Background

ruptures (1.7k stars on github as of Mar 2025) is a popular python implementation of classic and state-of-the-art change-point detection algorithms, which are reviewed in Selective review of offline change point detection methods. Rebecca Killick maintains a helpful web page with references.

Related work

The Time Series CRAN Task View has a section reviewing R packages for change-point detection. However, there are drawbacks to existing R/python packages.

changepoint and ruptures: several common algorithms, but current implementations do not achieve asymptotically optimal time complexity.
gfpop, PeakSegOptimal, binsegRcpp: efficient C++ code, but R API not consistent with other tools.

Details of your coding project

The goal of this project is to re-implement classic (binary segmentation) and state-of-the-art (PELT, FPOP) change-point detection in modern C++ (using, for instance, Armadillo), which can be interfaced with R (and eventually Python).

Re-implement core algorithms in C++
- common distributions (normal, Poisson, ...)
- Dynamic programming
  - PELT for multi-variate data.
  - FPOP for uni-variate data.
- Binary segmentation
R package with consistent API to access each algorithm.
Documentation
vignettes
tests

Expected impact

Ruptures is extremely popular so this project could have a large impact.

Mentors

Contributors, please contact the mentors below after completing at least one of the tests below.

EVALUATING MENTOR: Charles Truong is the maintainer of ruptures in Python.
Toby Hocking [email protected] is the author of numerous R packages, and has been a mentor/admin for R-GSOC since 2013.

Tests

Contributors, please do one or more of the following tests before contacting the mentors above.

Easy: Write an R function getCumsum
- Input is $X$ , an array of size TxD containing doubles.
- Output is $Y$ , an array of size (T+1)xD such that the first row is filled with 0s and $Y [i] = \sum_{j < i} X [j]$ . Here, $Y [i]$ is the i-th row of $Y$ .
- Write a C++ implementation using Armadillo and a R wrapper using RcppArmadillo.
Medium: Write a C++ class Cost
- This class is constructed from a double array $X$ of size TxD.
- Add a member function double eval(int start, int end) which returns $\sum_{t = start}^{end - 1} {| X [t] - m |}^{2}$ where $| \cdot |$ is the Euclidean norm and $m$ is the empirical mean of $X$ on the segment $[i, j [$ .
Hard: Write an efficient Cost::eval function with constant time complexity. Use the following relations to precompute relevant quantities.

$\sum_{t = start}^{end - 1} {| X [t] - m |}^{2} = (\sum_{t} {| X [t] |}^{2}) - ({| \sum_{t} X [t] |}^{2} / (end - star))$

and

$\sum_{t = start}^{end - 1} {| X [t] |}^{2} = (\sum_{t = 1}^{end - 1} {| X [t] |}^{2}) - (\sum_{t = 1}^{start - 1} {| X [t] |}^{2})$

and

$\sum_{t = start}^{end - 1} X [t] = (\sum_{t = 1}^{end - 1} X [t]) - (\sum_{t = 1}^{start - 1} X [t]) .$

Solutions of tests

Contributors, please post a link to your test results here.

Jiayi Qian - Github link

Provide feedback

Saved searches

Use saved searches to filter your results more quickly