Raising the ClaSS of Streaming Time Series Segmentation

This is the supporting website for the VLDB paper "Raising the ClaSS of Streaming Time Series Segmentation". It contains the source code, data sets, raw results, and analysis notebooks. It reflects the state of the paper for reproducibility and is purposely not updated further.

Ubiquitous sensors today emit high-frequency streams of numerical measurements that reflect properties of human, animal, industrial, commercial, and natural processes. Shifts in such processes, e.g., caused by external events or internal state changes, manifest as changes in the recorded signals. The task of streaming time series segmentation (STSS) is to partition the stream into consecutive variable-sized segments that correspond to states of the observed processes or entities. The partitioning operation itself must be fast enough to keep up with the input frequency of the signals. We introduce ClaSS, a novel, efficient, and highly accurate algorithm for STSS. ClaSS assesses the homogeneity of potential partitions using self-supervised time series classification and applies statistical tests to detect significant change points (CPs). In our experimental evaluation using two large benchmarks and six real-world data archives, we found ClaSS to be significantly more precise than eight state-of-the-art competitors. Its space and time complexity is independent of segment sizes and linear only in the sliding window size. We also provide ClaSS as a window operator with an average throughput of 538 data points per second for the Apache Flink streaming engine.
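
To make the STSS setting concrete, the following is a minimal toy sketch of a sliding-window segmenter. It is not ClaSS: instead of self-supervised classification and a significance test, it swaps in a simple Welch-style z-score on the difference of means between the two sides of each candidate split. The class name ToyStreamSegmenter and the parameters window_size, min_seg, and z_threshold are hypothetical and chosen only for illustration.

```python
from collections import deque
from math import sqrt
from statistics import mean, variance


class ToyStreamSegmenter:
    """Toy sliding-window change point detector (an illustration, not ClaSS)."""

    def __init__(self, window_size=100, min_seg=20, z_threshold=8.0):
        self.window = deque(maxlen=window_size)  # bounded memory
        self.min_seg = min_seg                   # minimum points on each side of a split
        self.z_threshold = z_threshold           # hypothetical decision threshold
        self.n_seen = 0                          # total number of points consumed
        self.change_points = []                  # global indices of reported CPs

    def _best_split(self):
        """Return (split, z) for the window split with the largest z-score."""
        values = list(self.window)
        best_split, best_z = None, 0.0
        for split in range(self.min_seg, len(values) - self.min_seg + 1):
            left, right = values[:split], values[split:]
            # Standard error of the difference of the two segment means
            se = sqrt(variance(left) / len(left) + variance(right) / len(right))
            z = abs(mean(left) - mean(right)) / se if se > 0 else 0.0
            if z > best_z:
                best_split, best_z = split, z
        return best_split, best_z

    def update(self, value):
        """Consume one point; return a global change point index or None."""
        self.window.append(value)
        self.n_seen += 1
        if len(self.window) < 2 * self.min_seg:
            return None
        split, z = self._best_split()
        if split is None or z < self.z_threshold:
            return None
        # Translate the window-local split into a global stream index
        cp = self.n_seen - len(self.window) + split
        self.change_points.append(cp)
        # Forget everything before the change point so it is reported once
        for _ in range(split):
            self.window.popleft()
        return cp


# Synthetic stream: the level shifts from 0 to 5 at index 100
stream = [(0.0 if i < 100 else 5.0) + (0.5 if i % 2 else -0.5) for i in range(200)]
segmenter = ToyStreamSegmenter()
for x in stream:
    segmenter.update(x)
print(segmenter.change_points)
```

As in the description above, memory is bounded by the window size rather than by segment lengths. Because the toy requires min_seg points on each side of a split and fires as soon as the threshold is crossed, it reports the change point a few positions away from the true shift.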

student_commute.mp4

Benchmark Results

We have evaluated ClaSS and eight competitors on 107 benchmark and 485 data archive time series from experimental studies. The following table summarises the average Covering performance (higher is better) and the corresponding wins / ties. More details are in the paper. The raw measurements are in the experiments folder, and the analysis Jupyter notebooks are in the notebooks folder.

Segmentation Algorithm	Average (in %)	Std. Dev. (in %)	Wins & Ties (in %)
ClaSS	81.2 / 51.5	19.0 / 17.1	72.9 / 46.8
ChangeFinder	47.3 / 42.3	23.5 / 19.7	11.2 / 19.6
FLOSS	52.1 / 35.6	22.7 / 13.0	11.2 / 9.3
Window	46.1 / 29.1	24.7 / 27.7	11.2 / 13.4
DDM	53.5 / 26.2	16.9 / 24.5	9.3 / 8.5
BOCD	48.1 / -	19.0 / -	7.5 / -
ADWIN	38.3 / 26.2	20.6 / 20.5	3.7 / 5.2
HDDM	36.5 / 24.6	24.8 / 18.5	4.7 / 4.3
NEWMA	43.4 / 21.5	20.6 / 26.2	8.4 / 9.3
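
For readers unfamiliar with the Covering score, here is a minimal sketch of how it can be computed, assuming the standard definition from the change point detection literature: each ground-truth segment is matched to the predicted segment with the highest Jaccard overlap, and the matches are weighted by segment length. The function names are hypothetical and this is not the evaluation code used for the table above.

```python
def to_segments(change_points, n):
    """Turn change point indices into half-open (start, end) segments of [0, n)."""
    bounds = [0] + sorted({cp for cp in change_points if 0 < cp < n}) + [n]
    return list(zip(bounds[:-1], bounds[1:]))


def jaccard(seg_a, seg_b):
    """Jaccard index of two half-open integer intervals."""
    inter = max(0, min(seg_a[1], seg_b[1]) - max(seg_a[0], seg_b[0]))
    union = (seg_a[1] - seg_a[0]) + (seg_b[1] - seg_b[0]) - inter
    return inter / union


def covering(true_cps, pred_cps, n):
    """Length-weighted best Jaccard overlap per ground-truth segment, in [0, 1]."""
    pred_segs = to_segments(pred_cps, n)
    total = 0.0
    for start, end in to_segments(true_cps, n):
        best = max(jaccard((start, end), p) for p in pred_segs)
        total += (end - start) * best
    return total / n


print(covering([50], [50], 100))  # 1.0, perfect segmentation
print(covering([50], [], 100))    # 0.5, no change point predicted
print(covering([50], [40], 100))  # about 0.817, change point off by 10
```

Multiplying the result by 100 gives a percentage on the same scale as the table.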

Organisation

This repository is structured in the following way:

benchmark contains the source code used for running the paper experiments.
datasets consists of the TSSB benchmark data sets.
experiments contains the raw measurement results for ClaSS and the competitors.
figures includes the paper plots, generated by the Jupyter notebooks.
videos includes the paper videos, generated by the animation code.
notebooks consists of Jupyter notebooks, used to download data sets and analyse results.
src contains the source code for ClaSS, the competitors, and utility methods.

Installation

You can download this repository (by clicking the download button in the upper right corner). As this repository is a supporting website and not an updated library, we do not recommend installing it. Instead, extract or adapt code snippets of interest. We are currently working on integrating ClaSS into the maintained and updated claspy library.

Citation

If you want to reference our work in your scientific publication, we would appreciate the following citation:

@article{Ermshaus2024ClaSS,
  title={Raising the ClaSS of Streaming Time Series Segmentation},
  author={Arik Ermshaus and Patrick Sch{\"a}fer and Ulf Leser},
  journal={Proceedings of the VLDB Endowment},
  volume={17},
  number={8},
  pages={1953--1966},
  year={2024},
  publisher={VLDB Endowment}
}

Resources

The source code for the competitors in the benchmark evaluation comes from multiple authors and projects. We list here the resources we used (and adapted) for our experiments:

Evaluation Metrics (https://github.com/alan-turing-institute/TCPDBench)
ChangeFinder (https://github.com/nel215/change_finder)
FLOSS (https://stumpy.readthedocs.io/)
Window (https://centre-borelli.github.io/ruptures-docs/)
BOCD (https://github.com/y-bar/bocd)
DDM, ADWIN & HDDM (https://scikit-multiflow.readthedocs.io/)
NEWMA (https://github.com/lightonai/newma)

