Another question on multidimensional series pattern matching #686
-
@PatrickKudo Thanks for the question. First off, when you want to put code in a comment, rather than wrapping each line in single backticks, you can put three consecutive backticks (```) on their own line at the start and at the end of the code block.
Now, on to your question.
So, it depends on your goal. Right now, you are comparing the query subsequence with each column separately and each column will return a set of "top matches". It is completely possible that
Of course, you'll have to do some simple bookkeeping to keep track of what the "new" indices mean relative to the original un-concatenated time series but, hopefully, you get the point. Note that we add an
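The concatenate-and-search idea, including the bookkeeping, can be sketched like this. It is a dependency-free illustration, not STUMPY code. `znorm_distance_profile`, `concat_with_nan` and `to_original` are hypothetical helpers. The single `np.nan` between neighbouring columns is my own assumption for keeping matches from straddling two columns. In practice you would likely swap the slow loop for `stumpy.mass(Q, T_all)`.

```python
import numpy as np

# Hypothetical helper: naive z-normalized Euclidean distance profile.
# stumpy.mass(Q, T) computes this kind of profile much faster (via FFT);
# the explicit loop just keeps this sketch dependency-free.
def znorm_distance_profile(Q, T):
    m = len(Q)
    q = (Q - Q.mean()) / Q.std()
    D = np.full(len(T) - m + 1, np.inf)
    for i in range(len(D)):
        w = T[i : i + m]
        # Windows touching a NaN separator (or constant windows) stay at +inf.
        if np.all(np.isfinite(w)) and w.std() > 0:
            D[i] = np.linalg.norm(q - (w - w.mean()) / w.std())
    return D

# Hypothetical helper (assumption: one NaN between neighbouring columns).
# Returns the joined array and the start offset of each column in it.
def concat_with_nan(columns):
    pieces, offsets, pos = [], [], 0
    for k, col in enumerate(columns):
        if k > 0:
            pieces.append(np.array([np.nan]))
            pos += 1
        offsets.append(pos)
        pieces.append(np.asarray(col, dtype=float))
        pos += len(col)
    return np.concatenate(pieces), offsets

# Bookkeeping: map an index in the concatenated series back to
# (column number, index within that column).
def to_original(idx, offsets):
    col = int(np.searchsorted(offsets, idx, side="right")) - 1
    return col, int(idx - offsets[col])

rng = np.random.default_rng(0)
cols = [rng.standard_normal(200) for _ in range(3)]
Q = cols[2][50:70].copy()              # query planted in the third column
T_all, offsets = concat_with_nan(cols)
D = znorm_distance_profile(Q, T_all)   # one search instead of one per column
best = int(np.argmin(D))
print(to_original(best, offsets))      # (2, 50)
```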
-
When you say "false positives", it sounds like you are looking for exact matches (i.e., subsequences in a longer time series that match your query subsequence exactly). Is that right?
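If exact matches really are the goal, a z-normalized distance may not be the right tool. Z-normalization removes offset and scale, so a scaled or shifted copy of the query also lands at distance zero, which is one possible source of "false positives". A minimal sketch of a raw, value-for-value check follows. `exact_match_indices` and `znorm` are hypothetical helpers, and `sliding_window_view` needs NumPy 1.20 or newer.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def exact_match_indices(Q, T, atol=1e-8):
    """Start indices where T matches Q value-for-value (within atol)."""
    windows = sliding_window_view(T, len(Q))       # shape (len(T) - m + 1, m)
    hits = np.all(np.abs(windows - Q) <= atol, axis=1)
    return np.flatnonzero(hits)

def znorm(x):
    return (x - x.mean()) / x.std()

T = np.array([0.0, 1.0, 2.0, 3.0, 1.0, 2.0, 3.0, 10.0, 20.0, 30.0])
Q = np.array([1.0, 2.0, 3.0])

print(exact_match_indices(Q, T))                   # [1 4]
# [10, 20, 30] is a scaled and shifted copy of Q: after z-normalization it
# is indistinguishable from Q, even though it is not an exact match.
print(np.allclose(znorm(T[7:10]), znorm(Q)))       # True
```

Setting `normalize=False` in `stumpy.mass` gives a non-normalized distance profile, where an exact match should show up as a distance of (nearly) zero.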
At 10^6 data points, this is just long enough that you may start to notice a difference in computing time. I think what matters more is: how many columns do you have in total, and how many different target subsequences will you be querying for?
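A back-of-the-envelope way to see why those two numbers matter. This is my own rough estimate, assuming one FFT-based MASS call per column per query, each costing O(n log n) for a series of length n:

```math
\text{total work} \;\approx\; k \cdot d \cdot \mathcal{O}(n \log n), \qquad n = 10^{6},\; \log_2 n \approx 20
```

Here d is the number of columns and k is the number of query subsequences. So 9 columns and a single query means 9 distance profiles over about 10^6 points each, while 100 queries over the same 9 columns means 900.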
-
Hello, I was reading this conversation as well as the fast pattern matching tutorial and I think I have a basic understanding of how to use the MASS algorithm. However, I am wondering if there's a more effective approach to what I'm trying to do.
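For reference, the core of MASS is short enough to sketch with plain NumPy. It computes a sliding dot product with an FFT and rolling means and standard deviations with cumulative sums, then applies the usual identity between Pearson correlation and z-normalized Euclidean distance. This is a simplified re-implementation for illustration only. `mass_numpy` is a hypothetical name, and the sketch assumes T has no NaN/inf values and no constant windows. `stumpy.mass` is the tested function you would actually call.

```python
import numpy as np

def mass_numpy(Q, T):
    """Sketch of MASS: z-normalized Euclidean distance between Q and every
    length-m window of T, in O(n log n) via an FFT sliding dot product.
    Assumes T contains no NaN/inf values and no constant windows."""
    Q = np.asarray(Q, dtype=float)
    T = np.asarray(T, dtype=float)
    m, n = len(Q), len(T)
    # Linear convolution of T with reversed Q; entries m-1..n-1 are Q . T[i:i+m].
    size = n + m - 1
    qt = np.fft.irfft(np.fft.rfft(T, size) * np.fft.rfft(Q[::-1], size), size)[m - 1 : n]
    # Rolling mean and standard deviation of every window of T via cumulative sums.
    c1 = np.concatenate(([0.0], np.cumsum(T)))
    c2 = np.concatenate(([0.0], np.cumsum(T * T)))
    mu_t = (c1[m:] - c1[:-m]) / m
    sig_t = np.sqrt(np.maximum((c2[m:] - c2[:-m]) / m - mu_t**2, 0.0))
    mu_q, sig_q = Q.mean(), Q.std()
    # d^2 = 2m * (1 - corr), with corr the Pearson correlation of Q and the window.
    corr = (qt - m * mu_q * mu_t) / (m * sig_q * sig_t)
    return np.sqrt(np.maximum(2 * m * (1 - corr), 0.0))

rng = np.random.default_rng(1)
T = rng.standard_normal(1000)
Q = T[300:332].copy()
D = mass_numpy(Q, T)
print(int(np.argmin(D)))   # 300
```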
Basically, assume I have a multidimensional dataframe df with 10 variables, and I want to find matches for a subsequence of the 1st time series (T1) in the other 9 time series. I set up a subsequence of T1 as Q_df, then iterate through the columns and calculate a distance profile for each one. For each distance profile, I find the index of its lowest value and write it to an array (idx_start_array).
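One small change to the loop described above is to keep each column's minimum distance alongside its start index, so you can also see how good each column's best match actually is. A minimal sketch, assuming the data is a NumPy array such as `df.to_numpy()`, so pandas isn't needed here. `znorm_profile`, `best_match_per_column` and `dist_array` are hypothetical names, and `znorm_profile` is a vectorized stand-in for `stumpy.mass`.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def znorm_profile(Q, T):
    """Vectorized z-normalized Euclidean distance profile (stand-in for stumpy.mass)."""
    W = sliding_window_view(T, len(Q))
    Wz = (W - W.mean(axis=1, keepdims=True)) / W.std(axis=1, keepdims=True)
    qz = (Q - Q.mean()) / Q.std()
    return np.linalg.norm(Wz - qz, axis=1)

def best_match_per_column(Q, X):
    """For each column of X, return the start index and distance of its best match to Q."""
    idx_start_array = np.empty(X.shape[1], dtype=int)
    dist_array = np.empty(X.shape[1])
    for j in range(X.shape[1]):
        D = znorm_profile(Q, X[:, j])
        idx_start_array[j] = np.argmin(D)
        dist_array[j] = D[idx_start_array[j]]
    return idx_start_array, dist_array

rng = np.random.default_rng(2)
X = rng.standard_normal((500, 9))    # stand-in for the 9 "other" columns
Q = rng.standard_normal(30)          # stand-in for Q_df (a subsequence of T1)
X[120:150, 4] = 3.0 * Q + 1.0        # plant a scaled/shifted copy in column 4
idx, dist = best_match_per_column(Q, X)
print(int(np.argmin(dist)), int(idx[np.argmin(dist)]))   # 4 120
```

Ranking by `dist` tells you which column contains the most similar subsequence. The start index by itself carries no similarity information.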
For picking the best candidate match, I was thinking I would compare the values in idx_start_array against the start index of Q_df (which I arbitrarily set to 25), and whichever is closest to that index would be the best match. But I was wondering whether it would be better to find a match by comparing the z-normalized distance profiles across all of the column variables. I also considered using stumpy.match, but I am still trying to understand it, especially the no-threshold application involving the stumpy.config.STUMPY_EXCL_ZONE_DENOM setting.
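On the stumpy.match question: conceptually, pulling several matches out of one distance profile means repeatedly taking the smallest distance and then masking an exclusion zone around it, so that neighbouring, nearly identical windows are not reported as separate matches. The sketch below does that greedily. It is not STUMPY's actual implementation. `top_k_matches` is a hypothetical helper. The `excl_denom=4` default assumes STUMPY's default `stumpy.config.STUMPY_EXCL_ZONE_DENOM` of 4, i.e. an exclusion zone of ceil(m / 4). The `k` and `max_distance` arguments play roles analogous to `stumpy.match`'s `max_matches` and `max_distance` parameters.

```python
import math
import numpy as np

def top_k_matches(D, m, k, excl_denom=4, max_distance=np.inf):
    """Greedy top-k matches from a distance profile D for a query of length m.
    After each pick, distances within +/- ceil(m / excl_denom) of it are
    masked so neighbouring (trivial) matches are not reported again."""
    D = np.array(D, dtype=float)        # copy so the caller's array is untouched
    excl = math.ceil(m / excl_denom)
    matches = []
    while len(matches) < k:
        i = int(np.argmin(D))
        if not np.isfinite(D[i]) or D[i] > max_distance:
            break
        matches.append((float(D[i]), i))
        D[max(0, i - excl) : i + excl + 1] = np.inf
    return matches

D = np.array([5.0, 0.1, 0.2, 4.0, 3.0, 0.3, 6.0, 0.15, 7.0])
print(top_k_matches(D, m=8, k=3))                    # [(0.1, 1), (0.15, 7), (3.0, 4)]
print(top_k_matches(D, m=8, k=3, max_distance=1.0))  # [(0.1, 1), (0.15, 7)]
```

Leaving `max_distance` at infinity is the "no threshold" case: you simply get up to k non-overlapping matches, best first. Note how index 2 (distance 0.2) is suppressed because it falls inside the exclusion zone of index 1.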
Is my approach reasonable, or is there a way to make stumpy.match work for my data? Thanks!
-Patrick