Skip to content

ruptures for change point detection

Olivier Boulant edited this page Apr 7, 2025 · 8 revisions

Background

ruptures (1.7k stars on github as of Mar 2025) is a popular python implementation of classic and state-of-the-art change-point detection algorithms, which are reviewed in Selective review of offline change point detection methods. Rebecca Killick maintains a helpful web page with references.

Related work

The Time Series CRAN Task View has a section reviewing R packages for change-point detection. However, there are drawbacks to existing R/python packages.

Details of your coding project

The goal of this project is to re-implement classic (binary segmentation) and state-of-the-art (PELT, FPOP) change-point detection in modern C++ (using, for instance, Armadillo), which can be interfaced with R (and eventually Python).

  • Re-implement core algorithms in C++
    • common distributions (normal, Poisson, ...)
    • Dynamic programming
      • PELT for multi-variate data.
      • FPOP for uni-variate data.
    • Binary segmentation
  • R package with consistent API to access each algorithm.
  • Documentation
  • vignettes
  • tests

Expected impact

Ruptures is extremely popular so this project could have a large impact.

Mentors

Contributors, please contact the mentors below after completing at least one of the tests below.

  • EVALUATING MENTOR: Charles Truong is the maintainer of ruptures in Python.
  • Toby Hocking [email protected] is the author of numerous R packages, and has been a mentor/admin for R-GSOC since 2013.

Tests

Contributors, please do one or more of the following tests before contacting the mentors above.

  • Easy: Write an R function getCumsum
    • Input is X , an array of size TxD containing doubles.
    • Output is Y , an array of size (T+1)xD such that the first row is filled with 0s and Y [ i ] = j < i X [ j ] . Here, Y [ i ] is the i-th row of Y .
    • Write a C++ implementation using Armadillo and a R wrapper using RcppArmadillo.
  • Medium: Write a C++ class Cost
    • This class is constructed from a double array X of size TxD.
    • Add a member function double eval(int start, int end) which returns t = start end 1 | X [ t ] m | 2 where | | is the Euclidean norm and m is the empirical mean of X on the segment [ i , j [ .
  • Hard: Write an efficient Cost::eval function with constant time complexity. Use the following relations to precompute relevant quantities.

t = start end 1 | X [ t ] m | 2 = ( t | X [ t ] | 2 ) ( | t X [ t ] | 2 / ( end star ) )

and

t = start end 1 | X [ t ] | 2 = ( t = 1 end 1 | X [ t ] | 2 ) ( t = 1 start 1 | X [ t ] | 2 )

and

t = start end 1 X [ t ] = ( t = 1 end 1 X [ t ] ) ( t = 1 start 1 X [ t ] ) .

Solutions of tests

Contributors, please post a link to your test results here.

Minh Long Nguyen | Github | Solutions for all 3 tests
Dev Goel | Github | Solutions
Olivier Boulant | Github | Solutions

Clone this wiki locally