
imputeTestbench for multivariate time series

Pranay0495 edited this page Mar 25, 2023 · 3 revisions

Background

Data cleaning is one of the most important and time-consuming steps in data science and data analytics. There are numerous methods, models, and algorithms for data cleaning. They can be broadly grouped into imputation, outlier detection, formatting, and visualization, among others. Evaluating these methods on a given dataset is challenging because of the volume of the data and the time and space complexity of the methods. Automating this evaluation can significantly reduce human effort and time, and it also provides an unbiased environment for comparison.

GSoC 2021 produced an R package named cleanTS (Publication). It has great potential for cleaning large time series efficiently, accurately, and without bias, while reducing human effort and intervention. The package is gaining popularity as a handy tool because it can handle several anomalies in a time series simultaneously. Handling missing values and missing-data patterns is one of the crucial steps in cleanTS, and this step relies mostly on the imputeTestbench (Publication) package.

imputeTestbench is an AutoML tool that automates the evaluation and comparison of imputation methods for a given time series under different scenarios. Several research teams have used it because of its capabilities, such as artificially generating missing-data patterns in a time series and evaluating multiple imputation methods simultaneously. In its present form, however, imputeTestbench applies only to univariate time series. It has great potential to be extended to multivariate and multi-domain time series datasets.
The computational performance of imputeTestbench can also be improved by introducing parallel processing and high-performance computing techniques. With this motivation, we invite a contributor to extend the imputeTestbench package to multivariate time series datasets with better computational capabilities.

Related work

The proposed package will be a modification of the imputeTestbench package, designed so that it can be integrated with the cleanTS package.

Details of your coding project

The goal of this coding project is to upgrade 'imputeTestbench', an R package. The expected tasks for this project are as follows:

  • Understand the concepts behind the existing 'imputeTestbench' package and other time series AutoML tools.
  • Adapt the package to work with multivariate time series datasets from multiple domains.
  • Currently, the imputeTestbench package uses base R functions and data structures. Its performance can be improved by switching to data.table and integrating it with Apache Spark (or a similar system). Parallel processing can improve performance further.
  • Introduce or embed several state-of-the-art time series imputation methods into the existing 'imputeTestbench' package.
  • Develop a dashboard using R's shiny package, a framework for building web applications. A dashboard makes the package easier to use and displays its output more intuitively. It also removes the need for programming knowledge to use the package.
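The multivariate and parallel-processing tasks above can be sketched together. The simplest extension treats each variable of a multivariate series independently and evaluates the variables concurrently. This Python sketch is illustrative only. In the R package, the same idea would more likely build on something like R's parallel package. All names (`linear_fill`, `evaluate_column`, `evaluate_multivariate`) are hypothetical. The sketch assumes every variable is a complete, equal-length series of at least three points. For brevity, it uses linear interpolation as the only imputation method.

```python
import math
import random
from concurrent.futures import ThreadPoolExecutor


def linear_fill(x):
    """Linearly interpolate None gaps between observed neighbours."""
    out = list(x)
    known = [i for i, v in enumerate(x) if v is not None]
    for a, b in zip(known, known[1:]):
        for i in range(a + 1, b):
            out[i] = x[a] + (i - a) / (b - a) * (x[b] - x[a])
    return out


def evaluate_column(job):
    """Hide a fraction of one variable's interior points, impute, return (name, RMSE)."""
    name, values, fraction, seed = job
    rng = random.Random(seed)
    n = len(values)
    k = max(1, int(round(fraction * n)))
    hidden = rng.sample(range(1, n - 1), min(k, n - 2))
    gappy = list(values)
    for i in hidden:
        gappy[i] = None
    filled = linear_fill(gappy)
    err = math.sqrt(sum((values[i] - filled[i]) ** 2 for i in hidden) / len(hidden))
    return name, err


def evaluate_multivariate(data, fraction=0.2, seed=42, max_workers=4):
    """Evaluate each variable independently, spreading the columns over a worker pool.

    `data` maps variable names to equal-length lists of floats.
    """
    jobs = [(name, col, fraction, seed + j) for j, (name, col) in enumerate(data.items())]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(evaluate_column, jobs))


# Usage: a toy two-variable series.
data = {
    "temperature": [20 + 5 * math.sin(t / 8) for t in range(200)],
    "load": [100 + 0.5 * t for t in range(200)],
}
errors_by_variable = evaluate_multivariate(data)
print(errors_by_variable)
```

`ThreadPoolExecutor` keeps the sketch portable, but Python threads do not speed up CPU-bound pure-Python work. `ProcessPoolExecutor` exposes the same `map` interface and would run the columns in separate processes. Evaluating variables independently also ignores correlations between them, which genuinely multivariate imputation methods could exploit.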

Expected impact

This project will introduce a new R package that can serve as a stepping stone for AutoML applications in time series data imputation.

Mentors

EVALUATING MENTOR: Neeraj Dhanraj Bokde, Assistant Professor, Center for Quantitative Genetics and Genomics, Aarhus University, Denmark. [email protected]. Neeraj holds a Ph.D. in Data Science and has contributed several R packages for time series analysis, testbenches, and domain-specific applications. Neeraj has been a GSoC mentor since 2020. https://www.neerajbokde.in/

Tests

Contributors, please do one or more of the following tests before contacting the mentors above.


  • Easy: Download the imputeTestbench package and demonstrate it with a naturally occurring time series. Document it with RMarkdown.

  • Medium: Suggest possible updates or a new feature you would like to include in the next version of the imputeTestbench package.

  • Hard: Develop a dummy package with 5 functions and a vignette, and make it pass the checks at https://win-builder.r-project.org/ with no errors, warnings, or notes.

Solutions of tests

Contributors, please post a link to your test results here.

  • EXAMPLE CONTRIBUTOR 1 NAME, LINK TO GITHUB PROFILE, LINK TO TEST RESULTS.
Contributor Name    GitHub Profile    Test Results
----------------    --------------    ------------
Avinab Neogy        GitHub Profile    Test Results
Pranay Jajodia      GitHub Profile    Test Results