Develop single-dimensional matrix profile for multi-dimensional time series #634
Replies: 8 comments 7 replies
-
@NimaSarajpoor There's a lot to unpack here but here are a couple of points off of the top of my head:
Maybe you can elaborate on what is currently missing as I may not be understanding your point clearly?
-
Adding @SaVoAMP @mihailescum to this conversation as they have thought about this a lot and may have some comments to contribute!
-
Agree. I just wanted to share my idea before I forget about it.
Thanks for sharing the link! I took a look and there are some similarities for sure. As provided in here:
IF I UNDERSTAND CORRECTLY: What I am proposing is to simply keep
Instead of
Maybe I should find some 2D or 3D data to test it out and see if it gives me new or interesting insight into the data. I believe it gives me something new! However, I would like to test it on some real-world data. @seanlaw @SaVoAMP @mihailescum do you have any suggestions for 2D or 3D data?
That is not my proposal. I should have been clearer. Let's say D1 and D2 are two distance profiles of S1 and S2 at an index
Note that we can use this for query matching as well! So, if I have a 2D query
Is this reasonable when we have a huge number of dimensions? Probably not, because, as mentioned in Eamonn's paper, there might be noise in some dimensions of the data. Is it better than the multi-dimensional matrix profile? I do not want to use the term "better". It may have its own advantages. I need to use it to see what kind of insight I can get from the data that is different than
@seanlaw
-
I was working with a three-dimensional boxing data set (consisting of acceleration data of 8 different boxers) that is also labeled. I have found here that relatively similar results emerge when examining the data in one, two, or all three dimensions for a punch motif. I could obtain nice results, especially when concatenating all the punches of a boxer of the same type (for example, only frontal punches with the left hand), so that I only had to look for a single motif. However, even with a 30-dimensional data set for analyzing human motion I could obtain relatively good results with different choices of
-
@SaVoAMP
-
@NimaSarajpoor I think this is probably better suited for the Discussion section as it isn't quite an issue with the existing code and is, instead, an atypical and yet-to-be-confirmed case.
-
@seanlaw
-
Motivation
I have noticed that there is some interest in using the matrix profile on multi-dimensional time series. We can think of it as:
I believe the first one is tackled in Matrix Profile VI, and its tutorial is in progress in PR #557.
In the paper Matrix Profile VI, we can see the following note:
There are a few things to notice:
So, I think it is worth it to have support for single-dimensional matrix profile for multi-dimensional time series data.
Challenge-(I)
(this is based on what I read about generalizing single-dimensional matrix profile to multi-dimensional time series data. I do not remember the source though. I think it was from Eamonn Keogh.)
One of the main challenges is how to combine m distances across m-dimensional time series data.
Example:
Let's say we have two-dimensional data: T1 (first dimension) and T2 (second dimension). And, let's focus on subsequences at index i and j. Therefore:
But what is that function `f` for combining the two distances `d1` and `d2` (and giving one single value)? Two common options are:
These two are equal only when `p=1`. Otherwise, they are different. Both seem like reasonable approaches. However, we can go with the second approach and provide a module for that in `stumpy` (see section "Implementation" below).
Challenge-(II)
Another challenge that I remember reading about in the same source is how to avoid one dimension dominating the others. This is probably an issue in the non-normalized version. However, as in many machine learning problems, it is usually up to the user how to normalize the time series. Maybe they normalize `T1` and `T2` by their maximum. Or they may standardize the WHOLE of `T1` and the WHOLE of `T2`. So, I believe this shouldn't be our concern.
Implementation
I think we can easily do $(d^{p} + d'^{p})^{1/p}$ for the p-norm non-normalized matrix profile. The challenge might be in using the Pearson correlation in the normalized version. However, there is a nice solution for that!
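A minimal sketch of the non-normalized p-norm combination $(d^{p} + d'^{p})^{1/p}$, in plain Python so it runs without any dependencies. The helper names `distance_profile` and `combine_p_norm` and the toy values of `T1` and `T2` are made up for illustration; none of this is existing `stumpy` API.

```python
import math

def distance_profile(query, series, p=2.0):
    """Non-normalized p-norm distance from `query` to every window of `series`."""
    m = len(query)
    return [
        sum(abs(q - x) ** p for q, x in zip(query, series[j:j + m])) ** (1.0 / p)
        for j in range(len(series) - m + 1)
    ]

def combine_p_norm(profiles, p=2.0):
    """Combine per-dimension distance profiles elementwise as (sum_k d_k^p)^(1/p)."""
    return [sum(d ** p for d in ds) ** (1.0 / p) for ds in zip(*profiles)]

# Toy two-dimensional time series (hypothetical values).
T1 = [0.0, 1.0, 3.0, 2.0, 1.0, 3.0, 2.5]
T2 = [5.0, 4.0, 4.5, 6.0, 4.0, 4.5, 6.5]

m, i = 3, 1  # window length and index of the query subsequence
D1 = distance_profile(T1[i:i + m], T1)  # distance profile in the first dimension
D2 = distance_profile(T2[i:i + m], T2)  # distance profile in the second dimension
D = combine_p_norm([D1, D2], p=2.0)     # one combined profile for both dimensions
```

With this choice of `f`, the combined value at each index equals the p-norm distance between the two windows concatenated across dimensions, which is one reason it seems like a natural option.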
Note that the factor 2 can be eliminated because if we scale all pairwise distances by the same number, it does not change the result.
Therefore:
$D = \sqrt{2m ( 1 - P)}$, where P is the average of the Pearson correlations across dimensions. So, we can reuse all the rolling/running variance machinery and simply take the average of the Pearson correlations!
Cool!
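One way to check this numerically, assuming `p=2`, two dimensions, and population standard deviation for z-normalization: since each z-normalized distance satisfies $d_k^{2} = 2m(1 - \rho_k)$, we get $d_1^{2} + d_2^{2} = 2 \cdot 2m(1 - P)$, so dropping the factor 2 gives $D = \sqrt{2m(1 - P)}$. The sketch below uses made-up windows and hypothetical helper names (`znorm`, `znorm_dist`, `pearson`); it is not `stumpy` code.

```python
import math

def znorm(x):
    """Z-normalize a window using the population standard deviation."""
    mu = sum(x) / len(x)
    sigma = math.sqrt(sum((v - mu) ** 2 for v in x) / len(x))
    return [(v - mu) / sigma for v in x]

def znorm_dist(a, b):
    """Euclidean distance between two z-normalized windows."""
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(znorm(a), znorm(b))))

def pearson(a, b):
    """Pearson correlation, computed from the z-normalized windows."""
    return sum(u * v for u, v in zip(znorm(a), znorm(b))) / len(a)

# Hypothetical windows of length m = 4: (a1, b1) in dimension 1, (a2, b2) in dimension 2.
a1, b1 = [1.0, 3.0, 2.0, 5.0], [2.0, 2.5, 4.0, 3.0]
a2, b2 = [0.5, 0.1, 0.9, 0.4], [1.0, 0.2, 0.8, 0.9]
m = len(a1)

d1, d2 = znorm_dist(a1, b1), znorm_dist(a2, b2)
P = (pearson(a1, b1) + pearson(a2, b2)) / 2   # average Pearson correlation
D = math.sqrt(2 * m * (1 - P))                # combined distance from the average
D_from_distances = math.sqrt((d1 ** 2 + d2 ** 2) / 2)  # 2-norm with the factor 2 dropped
```

Here `D` and `D_from_distances` agree up to floating-point error, so averaging the per-dimension Pearson correlations gives the same ordering of neighbours as the 2-norm of the per-dimension z-normalized distances.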