Cannot Reproduce (By Hand) the Matrix Profile Values Generated with the Stump Algorithm #105
Replies: 5 comments
-
| @ssget2sumit Thank you for your question. Can you please provide some code to demonstrate what you are trying to do? STUMPY takes two equal-length chunks (aka “subsequences”) from a longer time series and computes the z-normalized Euclidean distance between them. This is repeated via a sliding window process. A more thorough explanation can be found in this Matrix Profile Tutorial (https://stumpy.readthedocs.io/en/latest/Tutorial_The_Matrix_Profile.html), but note that, for simplicity, the tutorial only shows/calculates the plain Euclidean distance, so you will need to apply the z-normalization to each chunk yourself. Also note that self-joins (comparing a time series to itself) involve an exclusion zone in order to avoid trivial self-matches. | 
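A minimal sketch of the distance described above, assuming NumPy and z-normalization with the population standard deviation (ddof=0). The helper names `z_normalize`, `z_norm_euclidean` and `z_norm_euclidean_via_corr` are hypothetical, not STUMPY functions, and the arrays are made-up toy values. The second helper uses the identity d = sqrt(2 * m * (1 - rho)), where rho is the Pearson correlation of the two subsequences, which is a handy cross-check when reproducing values by hand.

```python
import numpy as np

def z_normalize(x):
    """Z-normalize a 1-D sequence with its own mean and population standard deviation."""
    x = np.asarray(x, dtype=np.float64)
    return (x - x.mean()) / x.std()

def z_norm_euclidean(a, b):
    """Euclidean distance between two equal-length subsequences, each z-normalized independently."""
    return float(np.linalg.norm(z_normalize(a) - z_normalize(b)))

def z_norm_euclidean_via_corr(a, b):
    """Same distance via the identity d = sqrt(2 * m * (1 - rho)), rho = Pearson correlation."""
    m = len(a)
    rho = np.corrcoef(a, b)[0, 1]
    # Clamp tiny negative values caused by floating-point rounding when rho is ~1
    return float(np.sqrt(max(0.0, 2 * m * (1 - rho))))

# Toy subsequences of length m = 5 (illustrative values only)
a = [1.0, 3.0, 2.0, 5.0, 4.0]
b = [2.0, 1.0, 4.0, 3.0, 6.0]
print(z_norm_euclidean(a, b), z_norm_euclidean_via_corr(a, b))  # the two forms agree

# Z-normalization removes offset and scale, so a*2 + 5 is a perfect match for a
print(z_norm_euclidean(a, [2 * x + 5 for x in a]))
```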
-
| For completeness, here's a super slow and naive implementation taken from tests/test_stump.py and compared with the output from  You should be able to see that this assertion succeeds. Let me know if you have any further questions and I'd be happy to clarify/help where possible. | 
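A minimal brute-force sketch of the same idea, written independently (it is not the code from tests/test_stump.py). It assumes NumPy, no constant windows (their standard deviation would be zero), and an exclusion zone of half-width ceil(m / 4) around each window, which is my understanding of STUMPY's default. The function name `naive_matrix_profile` is hypothetical.

```python
import numpy as np

def naive_matrix_profile(T, m, excl_zone=None):
    """Brute-force self-join matrix profile using z-normalized Euclidean distance.

    Returns (P, I): P[i] is the distance from window i to its nearest
    non-trivial neighbour and I[i] is that neighbour's start index.
    """
    T = np.asarray(T, dtype=np.float64)
    n_windows = len(T) - m + 1
    if excl_zone is None:
        # Assumption: half-width ceil(m / 4), intended to mirror STUMPY's default
        excl_zone = int(np.ceil(m / 4))

    # Stack every length-m window and z-normalize each row with its own mean/std
    windows = np.array([T[i:i + m] for i in range(n_windows)])
    Z = (windows - windows.mean(axis=1, keepdims=True)) / windows.std(axis=1, keepdims=True)

    P = np.full(n_windows, np.inf)
    I = np.full(n_windows, -1, dtype=np.int64)
    for i in range(n_windows):
        # Distance profile of window i against every window
        D = np.linalg.norm(Z - Z[i], axis=1)
        # Exclusion zone: ignore windows too close to i (trivial self-matches)
        D[max(0, i - excl_zone):min(n_windows, i + excl_zone + 1)] = np.inf
        j = int(np.argmin(D))
        if np.isfinite(D[j]):
            P[i], I[i] = D[j], j
    return P, I

# Example: plant a scaled and shifted copy of one window elsewhere in random data
rng = np.random.default_rng(0)
T = rng.normal(size=50)
T[30:40] = 3 * T[5:15] + 1
P, I = naive_matrix_profile(T, m=10)
print(I[5], P[5])  # window 5 should find its planted copy at index 30 with distance ~0
```

To check STUMPY itself, `P` can be compared against the first column of `stumpy.stump(T, m)` (the second column holds the nearest-neighbour indices); expect small floating-point differences, and possibly different indices where two neighbours tie.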
-
| Thanks Sean for your quick support.
Wishing you a very Happy New Year 2020!
Case 1:
Suppose we have a long time series from 02-July-2018 to 08-Sep-2018.
It contains 69 observations.
I need to find the pattern of my input data, shown below, within that series.
The input data contains 10 observations:
07-Jul-18 10852.9
08-Jul-18 10947.25
09-Jul-18 10948.3
10-Jul-18 11023.2
11-Jul-18 11018.9
12-Jul-18 10936.85
13-Jul-18 11008.05
14-Jul-18 10980.45
15-Jul-18 10957.1
16-Jul-18 11010.2
Business use case: find where the pattern of my input occurs in the output
data set [demo_data.csv].
First question: when we say z-normalization, which of the following do we mean?
1. We z-normalize the whole data set, which contains 69
observations (PFA demo_Data.csv).
After z-normalization, we take the normalized input data set (10
observations) and compare it with the normalized output data set in chunks of 10.
My concern is that, if we z-normalize the whole data set, some parts of the data may
have higher values and therefore a larger variance.
or
2. We z-normalize each split chunk of 10 observations
separately.
Example: normalize the first 10 observations, i.e.
02-Jul-18 10657.3
03-Jul-18 10699.9
04-Jul-18 10769.9
05-Jul-18 10749.75
06-Jul-18 10772.65
07-Jul-18 10852.9
08-Jul-18 10947.25
09-Jul-18 10948.3
10-Jul-18 11023.2
Then normalize the next 10 observations, i.e.
11-Jul-18 11018.9
12-Jul-18 10936.85
13-Jul-18 11008.05
14-Jul-18 10980.45
15-Jul-18 10957.1
16-Jul-18 11010.2
17-Jul-18 11084.75
18-Jul-18 11134.3
19-Jul-18 11132
20-Jul-18 11167.3
Then calculate the Euclidean distance between the z-normalized chunks
obtained.
Second question:
Why do we use Euclidean distance, given that it is a one-to-one (point-by-point)
mapping? Couldn't we use the Dynamic Time Warping (DTW) distance
instead?
Please advise.
Regards,
Sumit
 | 
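A small sketch contrasting the two options in the first question, using the 19 consecutive observations quoted above (02-Jul-18 to 20-Jul-18) and assuming NumPy with population standard deviations. The function names are hypothetical. Under option 1 the global mean cancels out, so the distance is just the raw Euclidean distance divided by the whole series' standard deviation: level and amplitude differences between windows still dominate, which is exactly the concern raised. Under option 2 each window is normalized on its own, so only the shape is compared.

```python
import numpy as np

# The 19 consecutive observations quoted above (02-Jul-18 to 20-Jul-18)
T = np.array([10657.3, 10699.9, 10769.9, 10749.75, 10772.65, 10852.9, 10947.25,
              10948.3, 11023.2, 11018.9, 10936.85, 11008.05, 10980.45, 10957.1,
              11010.2, 11084.75, 11134.3, 11132.0, 11167.3])
m = 10

def option1_global(a, b, T):
    """Option 1: z-normalize the whole series once, then compare windows directly."""
    mu, sigma = T.mean(), T.std()
    return float(np.linalg.norm((a - mu) / sigma - (b - mu) / sigma))

def option2_per_window(a, b):
    """Option 2: z-normalize each window with its own mean and standard deviation."""
    za = (a - a.mean()) / a.std()
    zb = (b - b.mean()) / b.std()
    return float(np.linalg.norm(za - zb))

a = T[0:m]          # first 10 observations
shifted = a + 300   # same shape, shifted to a higher level (illustrative)
print(option1_global(a, shifted, T))   # clearly non-zero: the level shift dominates
print(option2_per_window(a, shifted))  # ~0: only the shape is compared
```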
-
| Regarding question 1, it is the latter. We z-normalize each 10-observation chunk separately (independently, relative to its own mean and standard deviation computed from those 10 observations) and then compute the Euclidean distance between the two z-normalized chunks. This is performed as a sliding window across the entire time series. Regarding question 2, STUMPY is typically used for finding patterns and anomalies within your time series (many local comparisons, and the output is many, many values). Research has shown that z-normalized Euclidean distance is sufficient for this job. Now, there is something called MPdist for comparing two time series globally and returning a single distance metric, but it has not been implemented yet. | 
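For the use case in this thread (finding where a 10-observation input best matches a longer series), the sliding-window comparison described above is a distance profile. A minimal sketch, assuming NumPy and population standard deviations; `distance_profile` is a hypothetical name. The data are the 19 observations and the 10-observation input quoted in the thread; because the input happens to be part of that series, its own location (index 5, 07-Jul-18) is a perfect match. In real use the query would come from elsewhere, and no exclusion zone is applied because this is a query-versus-series comparison rather than a self-join. If your STUMPY release provides `stumpy.mass(Q, T)`, it should compute the same profile much faster; check the API reference for your installed version.

```python
import numpy as np

def distance_profile(Q, T):
    """Z-normalized Euclidean distance between query Q and every same-length window of T."""
    Q = np.asarray(Q, dtype=np.float64)
    T = np.asarray(T, dtype=np.float64)
    m = len(Q)
    zq = (Q - Q.mean()) / Q.std()
    D = np.empty(len(T) - m + 1)
    for i in range(len(D)):              # slide the window one step at a time
        w = T[i:i + m]
        zw = (w - w.mean()) / w.std()    # each window normalized on its own
        D[i] = np.linalg.norm(zq - zw)
    return D

# Series: 02-Jul-18 to 20-Jul-18, as quoted in the thread
T = [10657.3, 10699.9, 10769.9, 10749.75, 10772.65, 10852.9, 10947.25,
     10948.3, 11023.2, 11018.9, 10936.85, 11008.05, 10980.45, 10957.1,
     11010.2, 11084.75, 11134.3, 11132.0, 11167.3]
# Query: the 10-observation input (07-Jul-18 to 16-Jul-18)
Q = [10852.9, 10947.25, 10948.3, 11023.2, 11018.9,
     10936.85, 11008.05, 10980.45, 10957.1, 11010.2]

D = distance_profile(Q, T)
best = int(np.argmin(D))
print(best, D[best])  # start index of the best-matching window and its distance
```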
-
| @ssget2sumit Closing this for now. Feel free to re-open if you have any further questions. | 
-
I am not able to match the matrix profile values.
Apart from this, I need to know whether z-normalization needs to be done on the whole data set,
or whether it needs to run on chunks of data based upon the window size (z-normalization per chunk).
I tried to match the matrix profile values by normalizing the whole data set and then doing the pairwise distance calculation.
I also tried normalizing each chunk of data and then doing the pairwise distance calculation.