Replies: 14 comments
-
@nsankar Can you please provide a few details first?
-
@seanlaw It is a simulated time series:

    import numpy as np
    random_ts = np.random.uniform(size=size)
    for _ in range(5):
        data.append(np.arange(100))
-
I guess what I am trying to understand is what your real-world use case is and why you need to use Dask. If your use case is truly a fictitious example with five short, randomly generated time series, then you don't need Dask. To start, I recommend that you go through this The
-
In short, it would be relatively straightforward to compute the pairwise distance matrix by writing two nested for loops that iterate over each pair of time series and then computing and storing the MPdist for each pair.
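As a rough illustration of the nested-loop idea above, here is a minimal sketch. It is not code posted in this thread: the helper names `pairwise_distance_matrix` and `mpdist_matrix` are hypothetical, and it assumes each time series is a 1-D float64 NumPy array. Because MPdist is built from both the AB-join and the BA-join, the sketch assumes the distance is symmetric and only computes the upper triangle.

```python
import numpy as np


def pairwise_distance_matrix(series, m, dist_fn):
    """Fill an n x n matrix with dist_fn(series[i], series[j], m).

    `series` is a list of 1-D arrays, `m` is the subsequence window size,
    and `dist_fn` is any callable with the signature (T_A, T_B, m) -> float.
    The distance is assumed to be symmetric, so each pair is computed once.
    """
    n = len(series)
    dist = np.zeros((n, n), dtype=np.float64)
    for i in range(n):
        for j in range(i + 1, n):
            dist[i, j] = dist[j, i] = dist_fn(series[i], series[j], m)
    return dist


def mpdist_matrix(series, m):
    """Hypothetical convenience wrapper that plugs in stumpy.mpdist."""
    import stumpy  # imported lazily so the generic helper works without it

    return pairwise_distance_matrix(series, m, stumpy.mpdist)
```

For a 2-D array `T` with one column per metric, a call might look like `mpdist_matrix([T[:, k].astype(np.float64) for k in range(T.shape[1])], m=50)`, where the window size `m=50` is only a placeholder. The resulting matrix could then be passed to a clustering method that accepts precomputed distances.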
-
@seanlaw What I have is a large multivariate dataset that I use for anomaly detection with HDBSCAN clusters. In this case, I was looking at using MPDist, which I wanted to compute using Dask. But I wanted to first try simulated time series data with stumpy to check that MPDist works. That is the scenario.
-
@seanlaw Alright. Thanks
-
@nsankar I understand but depending on your real data and the answers to the aforementioned questions:
my response would be drastically different and I don't want to steer you in a direction that is less favorable.
-
@seanlaw I have 6 dimensions in the time series, each representing a metric. The length is 28,800 (1-minute sample frequency), which is a 20-day log. This data is on my local system (private data).
-
@seanlaw I am trying to use the code given in issue #149 as a reference to compute MPDist on sample data of 10,000 rows, using a 10-worker Coiled/Dask cluster to compute the profile, i.e.
-
@nsankar Thank you for the added context. Given that you have 6 x 28,800 time series, I do not recommend using Dask as these time series are pretty short and all of our Dask-supported functions are really targeted for computing matrix profiles for much longer time series lengths (i.e., >100,000 data points in length). Instead, you should be able to compute your pairwise distance using something like this (untested code):
On my 2-core (4 hyperthreads) Macbook Pro, this took around two minutes (102 seconds to be precise) to complete for this small 6 x 28,800 dataset. Note that since we are using Of course, for longer time series lengths, you may play around with replacing the
with
but note that Dask won't help you if you are on your local machine. Dask only helps for longer time series and if you have access to a Dask distributed cluster. And, if you have access to a GPU or are on Google Colab, you may try replacing

WARNING: We are currently in the middle of updating our definition of AB-joins (i.e., computing the matrix profile between
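The inline snippets in the comment above did not survive, so the following is only a hedged sketch of what swapping the distance function could look like, not the original code. It assumes the STUMPY functions `stumpy.mpdist`, `stumpy.mpdisted` (which takes a Dask distributed client as its first argument), and `stumpy.gpu_mpdist`; the helper name `select_mpdist` and the backend labels are hypothetical.

```python
from functools import partial

_BACKENDS = ("cpu", "dask", "gpu")


def select_mpdist(backend="cpu", dask_client=None):
    """Return a callable with the signature (T_A, T_B, m) for the backend.

    "cpu"  -> stumpy.mpdist
    "dask" -> stumpy.mpdisted bound to an existing Dask distributed client
    "gpu"  -> stumpy.gpu_mpdist (requires a CUDA-capable GPU)
    """
    if backend not in _BACKENDS:
        raise ValueError(f"backend must be one of {_BACKENDS}, got {backend!r}")
    if backend == "dask" and dask_client is None:
        raise ValueError("the 'dask' backend needs a Dask distributed client")

    import stumpy  # imported lazily, after the arguments have been validated

    if backend == "cpu":
        return stumpy.mpdist
    if backend == "dask":
        # Bind the client so the result can be called like stumpy.mpdist.
        return partial(stumpy.mpdisted, dask_client)
    return stumpy.gpu_mpdist
```

Any code that calls `stumpy.mpdist(T_A, T_B, m)` could call `select_mpdist("dask", dask_client=client)(T_A, T_B, m)` instead, where `client` is a `dask.distributed.Client` connected to a real cluster. As noted above, this is unlikely to pay off for short series or on a single local machine.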
-
@seanlaw Thank you for the excellent guidance. I am going to study and try these, and I will get back to you. Questions: (1) Why do you say Dask only helps for longer time series? I would like to know if there is anything you are referring to beyond distributing a long time series (divide and conquer). (2) As to the definition of AB-joins, is this a new API being added to stumpy or is it being added to an existing API? Thanks again.
-
Well, Dask can be used for a lot of things but, currently in STUMPY, the most time consuming task is computing the matrix profile and the total computational time is
No, this is not a new API. AB-joins have been in STUMPY since day one (but not well advertised). However, recently, I had noticed that our definition of an AB-join was actually flipped. So, when you did
-
Thanks, @seanlaw, for enlightening me!
-
@nsankar If you don't mind, I'm going to close this for now but feel free to re-open if you have any more questions.
-
@seanlaw Is it possible to get a pairwise distance matrix using stumpy
I want to try the same procedure using stumpy (Dask/Coiled) and HDBSCAN. Kindly provide the steps for this, along with the stumpy code snippet. I believe it would require using the mstumped API for multidimensional data? Thank you in advance.