use MERLIN to get motifs #550
NimaSarajpoor
started this conversation in
Ideas
Replies: 1 comment 1 reply
-
@ninimama Can you help me understand why we would want to do it this way? What is the benefit compared with |
Beta Was this translation helpful? Give feedback.
1 reply
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
-
Hi,
In issue #417, the MERLIN algorithm is mentioned and in [WIP] PR #505, as of now, it is being implemented.
Here, I would like to share my idea on using MERLIN to get top-motif (not discord). This is a draft and, for now, I focus on top-motif. It needs to be improved for returning
top-k
motif.Main Idea:
Let us assume we have a time series
T
and we are looking for the top-motif of lengthm
(so we havek = n - m + 1
subsequences). Can we agree that if, for a givenmin_dist
, the algorithm MERLIN returnsk-1
candidates, the one that is left is motif? (because it has at least one NN to which its distance is less thanmin_dist
, so already it has priority compared to the ones that are selected as candidates for discords). And, that's it! That is the core idea. Below, I explain how we can implement it.Implementation:
We can start with
min_dist
and find some candidates of discords (recall that these cannot be top-motif. So, we just need to scan their right/left neighbors in the first phase very quickly). We store them in a variable that I call itis_cands_cache
. It is aBoolean
vector that keep track of already-discovered candidates of discords. Then, we can exclude them and search among the remaining ones with lowermin_dist
. So, we are trying to exclude the candidates of discords!... please see below:And,
Let us test it on random data:
They give same answer. In stumpy, it takes
0.15
seconds and in the proposed MURLIN-based approach, the running time is65
second.Please let me know what you think.
Beta Was this translation helpful? Give feedback.
All reactions