How to avoid finding all-zero motifs (and also all-zero motif matches)? #618

soja-soja · 2022-06-07T17:45:26Z

Hi,

I noticed that in a time series with a relatively large number of consecutive zero values, either at the beginning or right before any other meaningful motif emerges, the algorithm gets stuck finding motifs made up entirely of zeros (since they are perfect matches, right?).

The parameters of the stumpy.motifs function focus on eliminating the worst-case motifs (by setting an upper limit on distance), but these perfect motifs also seem worth excluding, either by setting a minimum cut-off threshold or, ideally, by ignoring all-zero subsequences. In my use case it is vital to exclude them, both for performance (not spending time on finding all-zero motifs is a real time saver) and to avoid missing the motifs of interest because a large number of all-zero motifs are found first :(

I was wondering whether I'm missing something and there is already a way to avoid that, or whether it could be added to the great package you are working on.

(This is also an issue for the instances (matches) of a motif: even if the motif itself is not all-zero, it can be matched with a number of all-zero subsequences, especially when the motif has only one or a few non-zero values.)

seanlaw · 2022-06-07T22:19:59Z

Maintainer

@soja-soja Thank you for your question and welcome to the STUMPY community. Are you able to provide a real example with code?

In STUMPY, we try very hard not to make too many assumptions about your data as it is impossible to know whether those "all-zero" values are "important" or not. It depends on the use case. However, if you have some domain expertise and know that there are certain regions that you don't want to match for then you may consider setting those regions of your time series to np.nan. Hopefully, I'm not misunderstanding your point (this is where a real example with data and code would help).

Now, I'm not saying that you should set all zero values to np.nan! Only the subset of regions in your time series that you want to NEVER find a match for. Another option is to consider using annotation vectors, but I don't know if this will be relevant to your use case as it is a post-processing step.

Let me know if that makes sense or if you can provide a real data+code example that we can work through. Otherwise, it's a bit hard to communicate and understand what exactly the problem is.
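A minimal sketch of the np.nan suggestion above, assuming you already know which index ranges should never be matched. The helper name mask_regions and the example ranges are made up for illustration and are not part of STUMPY. Only NumPy is used here; the masked copy would then be passed to stumpy.stump in place of the original series.

```python
import numpy as np

def mask_regions(ts, regions):
    """Return a float copy of ts with each (start, stop) range set to NaN.

    `regions` is a hypothetical list of half-open index ranges chosen from
    domain knowledge; the input array is left untouched.
    """
    out = np.array(ts, dtype=np.float64)  # np.array copies by default
    for start, stop in regions:
        out[start:stop] = np.nan
    return out

# Toy series: mostly zeros with a few spikes
ts = np.zeros(6000)
ts[[89, 663, 715]] = [4, 4, 3]

# Assumed example: ignore the leading and trailing stretches only
masked = mask_regions(ts, [(0, 80), (5800, 6000)])
print(int(np.isnan(masked).sum()), "samples masked")
```

Note that this only helps when the unwanted stretches can be named up front; zeros that sit between events of interest are left alone.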


soja-soja · 2022-06-07T23:46:10Z

Author

Thanks a lot for your prompt response and warm welcome :)

To clarify the situation, please consider this example code (I wrote it quickly, so it is rough):

import stumpy
import numpy as np
from matplotlib.patches import Rectangle
import matplotlib.pyplot as plt

%matplotlib inline

window = 60
ts = np.zeros(6000)
values = [4,4,3,5,2,4,2,5,3,4,4,3,5]
for i, v in enumerate([89, 663, 715, 1362, 1932, 2315, 2617,3485,3763,4160, 4942,5502,5723]):
    ts[v] = values[i]

mp = stumpy.stump(ts, m=window)

motif_idx = np.argsort(mp[:, 0])[0]

motif_distances , motif_indices = stumpy.motifs(ts, mp[: , 0], cutoff = np.inf, max_distance= np.inf, max_matches = 10, max_motifs = 10)
fig, axs = plt.subplots(2, sharex=True, gridspec_kw = {'hspace': 0}, dpi=200, figsize=(14,4))
axs[0].plot(ts)

for motif_id in range(len(motif_indices)):
    for indx in motif_indices[motif_id]:
        # Shade each matched window at its own start index (not at the single best motif_idx)
        rect = Rectangle((indx, 0), window, max(ts), facecolor = 'lightgrey')
        axs[0].add_patch(rect)
        axs[1].axvline(x= indx, linestyle='dashed',  color=['lightcoral', 'lightblue', 'lightgreen', 'lightyellow'][motif_id % 4])
        

for motif_id in range(len(motif_indices)):
    any_non_zero_match = False
    for indx in motif_indices[motif_id]:
        if sum(ts[indx:indx+window]) != 0:
            # Record that this motif has at least one match containing a non-zero value
            any_non_zero_match = True

    if any_non_zero_match:
        print("found at least one non-zero match")

All 10 motifs that are found contain only zero values, as they are perfect matches. What I wanted to focus on instead are the non-zero values. Even if I set them exactly 100 data points apart:

for i , v in enumerate(range( 100, 1300, 100)):
    ts[v] = values[i]

and furthermore set them all to the same value:

for i , v in enumerate(range( 100, 1300, 100)):
    ts[v] = 10

Still, all the algorithm catches is all-zero motifs.
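To make the scale of the problem concrete, here is a rough sketch (plain NumPy, not part of STUMPY) that counts how many length-60 windows in the first version of the example series contain no non-zero value. The helper name all_zero_windows is hypothetical.

```python
import numpy as np

def all_zero_windows(ts, m):
    """Boolean mask: True where the length-m window starting at that index is all zeros."""
    nonzero = (np.asarray(ts) != 0).astype(int)
    # A window is all-zero exactly when it contains no non-zero samples
    return np.convolve(nonzero, np.ones(m, dtype=int), mode="valid") == 0

window = 60
ts = np.zeros(6000)
positions = [89, 663, 715, 1362, 1932, 2315, 2617, 3485, 3763, 4160, 4942, 5502, 5723]
values = [4, 4, 3, 5, 2, 4, 2, 5, 3, 4, 4, 3, 5]
ts[positions] = values

mask = all_zero_windows(ts, window)
print(f"{mask.sum()} of {mask.size} windows are all-zero")
```

The large majority of candidate windows are all-zero here, which is consistent with the all-zero motifs reported above crowding out everything else.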

And now to your points:

I agree with the point about not making too many assumptions about one's data, but I'm thinking about having the ability to exclude trivial motifs based on domain knowledge.
Changing the zeros in my time series was my first attempt as well, but that changes the nature of the time series. I can remove the zeros at the beginning and the end of the time series, but if there are enough zeros between non-zero data points, they will still be picked up as motifs. The zeros are also important to me, as I care about the delay between each non-zero data point.
The annotation vector is interesting and I think it can be used to address the issue (at least partially), but given the number of time series I'm processing, it seems to be a patch for something that could have been avoided much earlier, don't you think?

I think having a min_distance parameter or, even better, flags such as ignore_allZero_motifs=True|False and ignore_allZero_matches=True|False would be a great addition that addresses the above-mentioned issue much closer to the source of the problem 🤔 What do you think?
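Until something like the proposed flags exists, the "matches" half of the request can be approximated after the fact. The sketch below is a hypothetical post-filter, not a STUMPY function: it assumes the motif indices come as a 2D array of window start positions with one row per motif, and that negative values mark unused padding slots. Both are assumptions to check against the actual output.

```python
import numpy as np

def drop_all_zero_matches(ts, motif_indices, m):
    """For each motif (row), keep only start indices whose length-m window has a non-zero value.

    Hypothetical helper: negative indices are treated as padding and skipped.
    Returns a list of lists because rows may end up with different lengths.
    """
    ts = np.asarray(ts)
    kept = []
    for row in np.atleast_2d(motif_indices):
        starts = [int(i) for i in row if i >= 0]
        kept.append([i for i in starts if np.any(ts[i:i + m] != 0)])
    return kept

# Toy data: two spikes in an otherwise flat series
ts = np.zeros(100)
ts[10] = 5
ts[50] = 5
fake_indices = np.array([[5, 45, 70, -1], [20, 30, -1, -1]])  # made-up indices for illustration
filtered = drop_all_zero_matches(ts, fake_indices, m=10)
print(filtered)
```

This does nothing for the performance concern, since the all-zero matches are still computed before being discarded.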


soja-soja Jun 8, 2022
Author

In case it helps anyone else who has the same issue:

For now, based on your suggestion to use annotation vectors, I can do:

mps = mp[: , 0]
new_max = 3 + max(mps)
for i in range(len(mps)):
    if ts[i] ==0:
        mps[i] = new_max

and then pass that to the stumpy.motifs function. I tested it and can confirm that it helps to ignore the all-zero motifs, although I believe the performance gain is still a valid point.
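A vectorized variant of the workaround above, as a sketch only. It differs from the snippet in two ways that are assumptions rather than something stated in the thread: it penalizes an entry only when the entire window is zero (the loop above checks just the first sample of each window), and it works on a float copy so that mp itself is left unchanged. The helper name penalize_all_zero_windows and the offset default are hypothetical.

```python
import numpy as np

def penalize_all_zero_windows(ts, profile, m, offset=3.0):
    """Return a float copy of `profile` with every all-zero window pushed above the finite maximum."""
    ts = np.asarray(ts, dtype=np.float64)
    out = np.asarray(profile, dtype=np.float64).copy()

    # Running count of non-zero samples; differences give the per-window count
    counts = np.concatenate(([0], np.cumsum(ts != 0)))
    all_zero = (counts[m:] - counts[:-m]) == 0

    # Same idea as new_max = 3 + max(mps), but ignoring any infinite entries
    finite = out[np.isfinite(out)]
    penalty = (finite.max() if finite.size else 0.0) + offset
    out[all_zero] = penalty
    return out

# Toy check with a made-up profile
ts = np.zeros(20)
ts[5] = 4
profile = np.arange(17, dtype=np.float64)  # 20 - 4 + 1 entries
adjusted = penalize_all_zero_windows(ts, profile, m=4)
print(adjusted)
```

The adjusted array would then be passed to stumpy.motifs in place of mp[:, 0], as described above.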
