How to avoid finding all-zero motifs (and also all-zero motif matches)? #618
-
@soja-soja Thank you for your question and welcome to the STUMPY community. Are you able to provide a real example with code? In STUMPY, we try very hard not to make too many assumptions about your data, as it is impossible to know whether those "all-zero" values are "important" or not. It depends on the use case.
However, if you have some domain expertise and know that there are certain regions that you don't want to match, then you may consider setting those regions of your time series to a value that prevents them from ever being matched. Now, I'm not saying that you should do this for all zero values! Only for the subset of regions in your time series that you want to NEVER find a match for.
Another option is to consider using annotation vectors, but I don't know if this will be relevant to your use case, as it is a post-processing step.
Let me know if that makes sense or if you can provide a real data + code example that we can work through. Otherwise, it's a bit hard to communicate and understand what exactly the problem is.
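For the post-processing route, here is a minimal sketch, assuming NumPy 1.20 or newer (for `sliding_window_view`) and that `P` is a 1-D matrix profile, for example the first column of `stumpy.stump(T, m)`. The helpers `all_zero_windows` and `mask_all_zero_windows` are hypothetical, not part of STUMPY: they flag every length-`m` window that contains only zeros and set its profile value to `np.inf`, so that window can never be chosen as the smallest-distance motif. This is a blunt, all-or-nothing variant of the annotation-vector idea, not STUMPY's own API.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def all_zero_windows(T, m):
    """Hypothetical helper: True for each length-m window of T that contains only zeros."""
    T = np.asarray(T, dtype=np.float64)
    windows = sliding_window_view(T, m)  # shape: (len(T) - m + 1, m)
    return np.all(windows == 0, axis=1)


def mask_all_zero_windows(P, T, m):
    """Hypothetical helper: copy of matrix profile P with all-zero windows set to +inf.

    An infinite distance means the window can never be picked as the smallest-distance motif.
    """
    P = np.array(P, dtype=np.float64)  # copy (and cast), so the caller's profile is untouched
    P[all_zero_windows(T, m)] = np.inf
    return P


# Toy example: a short burst of non-zero values surrounded by zeros.
T = np.array([0.0] * 10 + [1.0, 2.0, 3.0] + [0.0] * 10)
m = 4
P = np.zeros(len(T) - m + 1)  # placeholder profile; in practice e.g. stumpy.stump(T, m)[:, 0]
P[7:13] = 0.5                 # pretend the windows touching the burst have distance 0.5

P_masked = mask_all_zero_windows(P, T, m)
print(int(np.argmin(P_masked)))  # 7, the first window that reaches the burst
```

Because this only edits `P` after the fact, it does not make the matrix profile computation itself any faster; it only changes which windows are considered as motif candidates afterwards.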
-
Thanks a lot for your prompt response and warm welcome :) To clarify the situation, please consider this code as an example (I wrote it quickly, so it is rough):
All 10 motifs that are found contain only zero values, as they are perfect matches. What I want to focus on instead are the non-zero values. Even if I set them exactly 100 data points apart:
and furthermore, set them all to the same value:
Still, the algorithm only catches all-zero motifs. And now the points:
I think having a
-
Hi,
I noticed that in a time series with a relatively large number of consecutive zero values, either at the beginning or right before any other meaningful motif emerges, the algorithm gets stuck finding motifs made up entirely of zeros (as they are perfect matches, right?).
The parameters of the `stumpy.motifs` function focus on eliminating the worst motifs (by setting an upper limit on the distance), whereas these perfect motifs are the ones that seem worth excluding (by setting a minimum cut-off threshold or, ideally, by ignoring all-zero subsequences). In my use case, excluding them is vital, both for performance (not spending time on finding all-zero motifs would save a lot of time) and because the motifs of interest get missed when a large number of all-zero motifs are found :(
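A toy series shows why a minimum distance cut-off on its own may not be enough. This sketch assumes raw (non-normalized) Euclidean distance, which is closer to STUMPY's non-normalized `stumpy.aamp` than to the default z-normalized `stumpy.stump`. It computes a brute-force matrix profile with an assumed exclusion zone of ceil(m / 4) windows; `raw_matrix_profile` is a hypothetical illustration, not a STUMPY function. With identical spikes placed 100 samples apart, as in the follow-up comment, the spike windows and the all-zero windows both have a distance of 0. Any threshold that removes one kind therefore removes the other, while an explicit all-zero mask separates them.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def raw_matrix_profile(T, m):
    """Brute-force, non-normalized Euclidean matrix profile (illustration only, O(n^2 * m)).

    Windows within ceil(m / 4) positions of each other are treated as trivial
    self-matches and skipped (an assumed exclusion zone).
    """
    windows = sliding_window_view(np.asarray(T, dtype=np.float64), m)
    n = len(windows)
    excl = int(np.ceil(m / 4))
    P = np.full(n, np.inf)
    for i in range(n):
        d = np.sqrt(np.sum((windows - windows[i]) ** 2, axis=1))
        d[max(0, i - excl):min(n, i + excl + 1)] = np.inf  # exclusion zone around i
        P[i] = d.min()
    return P


# Identical spikes 100 samples apart in an otherwise all-zero series.
T = np.zeros(400)
T[[100, 200, 300]] = 5.0
m = 20

P = raw_matrix_profile(T, m)
is_all_zero = np.all(sliding_window_view(T, m) == 0, axis=1)

# A minimum-distance cut-off cannot separate the two kinds of window:
# both the spike windows and the all-zero windows have distance 0 here.
print(np.allclose(P[is_all_zero], 0.0), np.allclose(P[~is_all_zero], 0.0))  # True True

# Excluding all-zero windows directly leaves only the spike windows as candidates.
P_masked = np.where(is_all_zero, np.inf, P)
print(int(np.argmin(P_masked)))  # 81, the first window that contains the spike at index 100
```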
I was wondering whether I'm missing something and there is already a way to avoid this, or whether it could be added to the great package you are working on?
(This is also an issue for the instances (matches) of a motif: even when the motif itself is not all zeros, it can be matched with a number of all-zero subsequences, especially when the motif contains only one or a few non-zero values.)
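For the matches, here is a minimal sketch, assuming the match start indices are already available as integers (for example, collected from a motif search over `T` with window length `m`). The hypothetical `drop_all_zero_matches` keeps only matches whose window has at least `min_nonzero` non-zero values. Raising `min_nonzero` above 1 also filters out the near-empty matches described above.

```python
import numpy as np


def drop_all_zero_matches(T, m, match_indices, min_nonzero=1):
    """Hypothetical helper: keep match start indices whose length-m window in T has
    at least `min_nonzero` non-zero values, preserving their original order."""
    T = np.asarray(T, dtype=np.float64)
    last_start = len(T) - m
    kept = []
    for idx in match_indices:
        idx = int(idx)
        if idx < 0 or idx > last_start:
            continue  # skip anything that is not a valid window start in T
        if np.count_nonzero(T[idx:idx + m]) >= min_nonzero:
            kept.append(idx)
    return np.array(kept, dtype=np.int64)


T = np.array([0.0] * 10 + [1.0, 2.0, 3.0] + [0.0] * 10)
matches = [0, 7, 10, 15]
print(drop_all_zero_matches(T, 4, matches))                 # [ 7 10]
print(drop_all_zero_matches(T, 4, matches, min_nonzero=2))  # [10]
```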