Updates in CleanTS Package
Time series analysis is an important technique for analyzing data that changes over time. Univariate time series analysis, which focuses on analyzing the behavior of a single time series, has been well-studied and is widely used in many fields, including finance, economics, and engineering. However, in many real-world applications, it is often necessary to analyze multiple time series together. Multivariate time series analysis, which focuses on analyzing the relationships between multiple time series, is, therefore, an important area of research.
R is a popular programming language for time series analysis, with many packages available for both univariate and multivariate time series analysis. However, there is still a need for more packages that can handle complex and large-scale multivariate time series data.
Several R packages already support multivariate time series analysis. "mtsdi" and "MTS" are two examples, although they may not be suitable for large-scale datasets. "tsDyn" and "vars" offer a broader range of multivariate functionality, including dynamic regression models, vector autoregressive (VAR) models, and Granger causality tests. Even so, there is still a need for a package designed specifically for large-scale multivariate time series analysis.
The CleanTS R package is a package for univariate time series analysis that provides functions for data cleaning, time series decomposition, and anomaly detection, among others. It is designed to be user-friendly and to provide fast and accurate results, but it does not currently support multivariate time series analysis.
Another relevant package is "imputeTestbench", which provides a suite of functions for imputing missing values in time series data. While this package is not specifically designed for multivariate time series analysis, it provides useful functionality for handling missing data in time series, which is often a challenge in multivariate time series analysis.
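As a rough illustration of the kind of missing-data handling a multivariate version would need, the sketch below fills gaps column by column with linear interpolation. It assumes the series is stored as a numeric matrix (one column per variable) and uses only base R (`stats::approx`). The function name `impute_linear_mts` is hypothetical; this is not the imputeTestbench or cleanTS API.

```r
# Hypothetical helper: fill NA gaps in each column of a multivariate series
# (numeric matrix, one column per variable) by linear interpolation.
impute_linear_mts <- function(x) {
  stopifnot(is.matrix(x), is.numeric(x))
  idx <- seq_len(nrow(x))
  for (j in seq_len(ncol(x))) {
    col <- x[, j]
    ok <- !is.na(col)
    # approx() needs at least two observed points to interpolate
    if (sum(ok) >= 2) {
      # rule = 2 carries the nearest observed value into leading/trailing gaps
      x[, j] <- stats::approx(idx[ok], col[ok], xout = idx, rule = 2)$y
    }
  }
  x
}

m <- cbind(a = c(1, NA, 3, NA, 5), b = c(NA, 2, NA, 4, 4))
res <- impute_linear_mts(m)
print(res)
```

Interpolating each column independently ignores relationships between variables; a multivariate method could instead borrow information across correlated series, which is one reason a dedicated multivariate design is worth exploring.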
The proposed project, to modify the CleanTS package to handle multivariate time series data, will fill this gap by extending an existing, user-friendly package with fast and accurate multivariate time series analysis that also handles missing data. The extended package will provide a comprehensive set of functionality for multivariate time series analysis, including data cleaning, decomposition, anomaly detection, and other useful techniques for analyzing multivariate time series data.
The goal of this project is to extend the functionality of the CleanTS package to support multivariate time series analysis. The specific tasks involved in this project include:
- Designing and implementing a new data structure to store multivariate time series data. This should include metadata for each time series, such as the variable name, units, and description.
- Modifying the existing functions in the package to work with the new multivariate time series data structure. This may involve changing the arguments and return values of the functions, as well as adding new functions to handle multivariate time series data.
- Creating new functions for multivariate time series analysis, such as cross-correlation, cross-spectral analysis, and Granger causality analysis.
- Updating the package documentation to include information on how to use the new multivariate time series functionality, as well as any changes to existing functions.
- Testing the modified package to ensure that it works correctly with both univariate and multivariate time series data. This may involve creating new test cases and updating existing ones.
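One possible shape for the data structure in the first task, together with a cross-correlation helper from the third, is sketched below. This is an assumption-laden sketch, not the planned cleanTS design: the class name `mts_meta` and the functions `new_mts_meta()` and `mts_ccf()` are hypothetical. The values are kept as a base R `ts` matrix and the metadata as a data frame with one row per series; cross-correlation is delegated to base R `stats::ccf()`.

```r
# Hypothetical constructor: values as a "ts" matrix plus per-series metadata
# (variable name, units, description), bundled in an S3 object.
new_mts_meta <- function(values, start = 1, frequency = 1,
                         units = NULL, description = NULL) {
  values <- as.matrix(values)
  if (is.null(colnames(values))) {
    colnames(values) <- paste0("V", seq_len(ncol(values)))
  }
  n <- ncol(values)
  meta <- data.frame(
    name = colnames(values),
    units = if (is.null(units)) rep(NA_character_, n) else units,
    description = if (is.null(description)) rep(NA_character_, n) else description,
    stringsAsFactors = FALSE
  )
  structure(
    list(data = stats::ts(values, start = start, frequency = frequency),
         meta = meta),
    class = "mts_meta"
  )
}

# Hypothetical helper: cross-correlation between two named series.
mts_ccf <- function(obj, x, y, lag.max = 10) {
  stats::ccf(obj$data[, x], obj$data[, y], lag.max = lag.max, plot = FALSE)
}

# Simulated example: "load" follows "temp" with a delay of 2 time steps.
set.seed(1)
temp <- rnorm(100)
load <- c(0, 0, head(temp, -2)) + rnorm(100, sd = 0.1)
obj <- new_mts_meta(cbind(temp = temp, load = load),
                    units = c("degC", "MW"),
                    description = c("Air temperature", "Electricity load"))
cc <- mts_ccf(obj, "temp", "load")
best_lag <- cc$lag[which.max(cc$acf)]
print(best_lag)
```

With `ccf(x, y)`, the estimate at lag k is the correlation between `x[t+k]` and `y[t]`, so a peak at a negative lag indicates that `temp` leads `load`. Keeping the values in a plain `ts` matrix means existing univariate functions could still be applied to a single column such as `obj$data[, "temp"]`.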
By the end of this project, we expect to have a modified version of the CleanTS package that can handle multivariate time series data, along with updated documentation and test cases. The new functionality should be fully integrated with the existing CleanTS package, and provide users with a powerful tool for multivariate time series analysis and cleaning.
EVALUATING MENTOR: Neeraj Dhanraj Bokde, Assistant Professor, Center for Quantitative Genetics and Genomics, Aarhus University, Denmark, and Senior Researcher at DEWA R&D, Dubai. [email protected]. Neeraj holds a Ph.D. in Data Science and has contributed several R packages related to time series analysis, testbenches, and domain-specific applications. Neeraj has been a GSoC mentor since 2020. https://www.neerajbokde.in/
Contributors, please do one or more of the following tests before contacting the mentors above.
- Easy: Download the cleanTS package and demonstrate it on a naturally occurring time series. Document the demonstration with R Markdown.
- Medium: Suggest possible updates or a new feature you would like to see in the next version of the cleanTS package.
- Hard: Develop a dummy package with 5 functions and a vignette, and get it through https://win-builder.r-project.org/ with no errors, warnings, or notes.
Contributors, please post a link to your test results here.
Contributor Name | GitHub Profile | Test Results |
---|---|---|
Siddharth Pathak | https://github.com/SiddharthanilPathak | https://github.com/SiddharthanilPathak/GSoC--2023-Test-Results |
Pranay Agrawal | https://github.com/pranayx | https://github.com/pranayx/Pranay_Agrwala_GSoC_2023_Tests |