Distance matrix calculation for time series clustering #774
Replies: 2 comments
-
Q1: In your definition of distance matrix, the distance matrix contains the pairwise distance between whole sequences.
Q2: If you want to compare whole sequences, then set m equal to the sequence length.
General: Read the papers. There is also a paper about distance matrices:
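To make the Q2 point concrete, here is a minimal sketch in plain NumPy (not a STUMPY API). It assumes that with m equal to the sequence length each series contributes exactly one subsequence, so the z-normalized distance reduces to an ordinary Euclidean distance between the z-normalized series. The function name `znorm_distance_matrix` is hypothetical, and the sketch assumes that no series is constant (a zero standard deviation would divide by zero).

```python
import numpy as np


def znorm_distance_matrix(X):
    """Pairwise z-normalized Euclidean distances between the rows of X.

    X has shape (n_series, m); every row is treated as one whole sequence.
    Assumes no row is constant (its standard deviation would be zero).
    """
    X = np.asarray(X, dtype=float)
    m = X.shape[1]
    # z-normalize each row using the population standard deviation
    Z = (X - X.mean(axis=1, keepdims=True)) / X.std(axis=1, keepdims=True)
    # For z-normalized rows, ||a - b||^2 = 2 * m * (1 - corr(a, b))
    corr = Z @ Z.T / m
    np.clip(corr, -1.0, 1.0, out=corr)  # guard against rounding drift
    D = np.sqrt(2.0 * m * (1.0 - corr))
    np.fill_diagonal(D, 0.0)
    return D


# Small stand-in for the real data: 5 random series of length 39
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 39))
D = znorm_distance_matrix(X)
print(D.shape)
```

Because everything is a single matrix product, this should scale to a few thousand short series without an explicit Python loop over pairs, at the cost of holding the full n-by-n matrix in memory.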
-
@rsangole Thank you for your question and welcome to the STUMPY community. I don't know the answer but I'd like to share a few observations and possible answers to your questions:
Sadly, this is the nature of the beast. As I'm sure you are already aware, any
A couple of notes/reminders:
In theory, this would/should be much faster, though, again, it will be costly. Where STUMPY shines is when you want to compare the distance between subsequences and also for longer time series. With
Yes, this is correct. However, if you read the original MPdist paper, they explain that one rarely really wants to compare the full time series. If you look at the recommended MPdist tutorial, this is related to the question of whether or not a phase-shifted time series is the "same" as its unshifted counterpart. Unfortunately, this is a question that can only be answered by the user as it can differ depending on your use case.
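The phase-shift question can be illustrated with a small brute-force sketch in plain NumPy. This is not STUMPY's implementation: `naive_mpdist` is a hypothetical helper that only mimics the MPdist idea (take a small order statistic of the nearest-subsequence distances in both directions), and the window length of 13, the 5% setting, and the synthetic sine waves are all assumptions chosen for illustration.

```python
import numpy as np


def znorm_rows(W):
    """z-normalize each row of a 2-D array (assumes no constant rows)."""
    return (W - W.mean(axis=1, keepdims=True)) / W.std(axis=1, keepdims=True)


def whole_distance(a, b):
    """z-normalized Euclidean distance between two whole sequences."""
    Z = znorm_rows(np.vstack([a, b]))
    return float(np.linalg.norm(Z[0] - Z[1]))


def windows(t, m):
    """All length-m subsequences of t, z-normalized, one per row."""
    return znorm_rows(np.array([t[i:i + m] for i in range(len(t) - m + 1)]))


def naive_mpdist(a, b, m, percentage=0.05):
    """Brute-force sketch of the MPdist idea (not STUMPY's algorithm)."""
    A, B = windows(a, m), windows(b, m)
    # Distances between every subsequence of a and every subsequence of b
    D = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)
    # Nearest-neighbor distances in both directions, sorted ascending
    P = np.sort(np.concatenate([D.min(axis=1), D.min(axis=0)]))
    k = min(int(np.ceil(percentage * (len(a) + len(b)))), len(P) - 1)
    return float(P[k])


t = np.arange(39)
a = np.sin(2 * np.pi * t / 13)        # three full periods
b = np.sin(2 * np.pi * (t + 6) / 13)  # same wave, shifted by 6 samples

d_whole = whole_distance(a, b)
d_sub = naive_mpdist(a, b, m=13)
print(d_whole, d_sub)
```

In this toy setup the whole-sequence distance is large because the two waves are nearly in antiphase, while the subsequence-based distance is close to zero because every window of one series has a near-identical window somewhere in the other. Which of the two behaviors is "right" depends on whether a shift matters for the use case.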
-
@seanlaw
(Continuing the discussion here from our slack thread...)
Background
I'm trying to develop some time-series clustering for ~3k time-series, each 39 long.
After normalizing each time series using StandardScaler, here's a glimpse of the dataset:
Attempts
Getting inspiration from your link, my attempt to create the full distance matrix is:
The computation takes (figuratively) forever: 10+ hours before I called it quits on my 2.4 GHz 8-core MacBook Pro.
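For a sense of scale, a full pairwise matrix needs one distance per unordered pair of series. A quick count, assuming exactly 3,000 series (the figure above is only approximate):

```python
from math import comb

n_series = 3000  # assumption: "~3k" taken as exactly 3,000
n_pairs = comb(n_series, 2)  # unordered pairs, n * (n - 1) / 2
print(n_pairs)  # 4498500
```

So any fixed per-pair overhead is paid roughly 4.5 million times.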
Questions
m to 39?