Anomaly Detection on multivariate (redundant) sensor data #992
Replies: 2 comments 1 reply
-
@alex-stefitz Welcome to the STUMPY community and thank you for your kind words. I was wondering if it would be possible to share a 14-day sample of your multidimensional data (probably a CSV with 14 columns that we can read in).

From what I understand, you are trying to find a situation where one (or a few) sensors don't behave/look like the majority of the other sensors within a given day. Does that sound correct? So, in your first plot, you are trying to identify the orange line and, in the second plot, the blue line? And in the third plot, there are no anomalies in any of the sensors, but there is some variation that is "acceptable"?

Without fully understanding your problem, I don't think that computing the 1D matrix profile on each 1D time series will be helpful, since it has no knowledge of the other time series. I also don't think that
-
Thank you for your quick response! The time series currently has about 9 months (≈ 9 × 30 × 96 = 25,920) of observations and 10 columns. This will most likely be extended to 1.5 years, and I will get another one, also 1.5 years long, with ~25 sensors.

Unfortunately, I can't share the original data I'm working on, but with the help of the PV-Live dataset I quickly created some data which is really close to the one I'm working with. I attached three tables: the clean data, the data including anomalies, and a table describing the anomalies. (For my experiment setup, I work with artificially added errors; that's why I can provide the clean file now, but generally my problem is an unsupervised one without ground truth.)

_pv-artificial-irr_clean.csv

And yes, I think your understanding is correct. Of course it would also be nice if the approach recognized behaviour which is generally impossible (e.g. significantly negative values, no drop to 0 in the night, ...), but my main focus is the comparison of sensors among each other, so if all show the same behaviour (no matter what it is), it could be considered correct and not an anomaly.

And yes, your interpretations of the plots are also correct! In the first two plots, I want to detect the orange and the blue sensor. In the third plot, there is no anomaly, but you can nicely see different weather situations (clear sky on the first day, some clouds on the second) as well as the difference between the two orientations.

Thank you for your help and have a nice day/evening!
-
Hello,
First of all thanks for this cool tool and the active support, it looks really promising!
I have the following scenario: I'm writing my master's thesis on anomaly detection in redundant irradiance sensor data. I have data from about 15 irradiance sensors at a specific location. They face two different orientations, so the expectation is that the ones facing the same direction measure more or less the same (small deviations are totally fine due to local shading or just measurement inaccuracies), while the ones facing different directions still behave similarly, but there can be a shift in when they reach their peak.
Generally, the data should follow quite a strict schema: always ~0 during the night, rising in the morning, a peak around noon, then falling again until sunset. However, due to all sorts of possible weather phenomena, basically any behaviour during the day is possible (a clear-sky day; sunny until noon, then a thunderstorm and clouds; fast-moving clouds and therefore high fluctuation; a completely bad day with less than 10% of the theoretically possible irradiance...), and this is alright as long as all sensors agree. Weather is random, so there are not too many recurring patterns to expect.
The data set has some known errors: while the whole data set is multivariate, the errors usually are not. It is possible that two (or more) errors exist at the same time, but they should be considered independent. The errors can show up in different ways; the most common are that the broken sensor (= one variable)
The following pictures show data with errors (first two pictures) and different variations of correct data.
The task is to detect the errors as well as possible. I created an algorithm which is able to detect the errors quite well using pairwise regression, but I need a comparison technique and would love to use the Matrix Profile for that, since I like that it has proven to perform well in many cases even though it is generally quite a simple approach.
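To illustrate what I mean by pairwise regression, here is a stripped-down toy sketch (this is illustrative only, not my actual algorithm; the function name and threshold-free scoring are made up for this example):

```python
import numpy as np

def pairwise_residual_scores(X):
    """Score each sensor (column of X) by how badly least-squares line fits
    against every other sensor explain it; a sensor that disagrees with the
    majority of its peers gets a high average residual. Illustrative only."""
    n, d = X.shape
    scores = np.zeros(d)
    for i in range(d):
        resid = []
        for j in range(d):
            if i == j:
                continue
            # Least-squares fit: X[:, i] ≈ a * X[:, j] + b
            a, b = np.polyfit(X[:, j], X[:, i], 1)
            resid.append(np.mean(np.abs(X[:, i] - (a * X[:, j] + b))))
        scores[i] = np.mean(resid)
    return scores

# Toy example: four sensors tracking the same daily "bump", sensor 2 broken.
rng = np.random.default_rng(0)
base = np.sin(np.linspace(0, np.pi, 96))
X = np.stack([base + 0.05 * rng.standard_normal(96) for _ in range(4)], axis=1)
X[:, 2] = rng.random(96)  # sensor 2 outputs noise unrelated to the others
scores = pairwise_residual_scores(X)
```

The broken sensor's residuals are high against every peer, while a healthy sensor's residuals are high only against the broken one, so averaging over peers separates them.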
I have spent quite some time on MP in the last weeks, but unfortunately I was not able to get it working. The problem is that applying MP to one sensor does not seem to work: when doing this, I get really high values for days with interesting weather phenomena (which makes sense, since this behaviour has not been seen before), but not even all days with very obvious errors (e.g. a random walk) get high MP values.
(I set m to 96, since that's the number of observations per day, and added a NaN between consecutive days so that only whole days are compared with each other. I saw in one of the unofficial tutorials that it does not change much to have a continuous graph.)
So then I tried to expand it to the multivariate case, following the paper Matrix Profile XXVIII. However, this also does not seem to work, since I still mainly find special days (which makes sense, since this behaviour can be seen in all sensors).
So I don't really know how to proceed. I have spent quite some time on MP, adapted my library for it, and would really love to use it, but I don't know how to adapt it to my case. To summarize: I need to detect when one variable behaves differently than the others, but it's totally fine to have weird behaviour if all sensors show it.
If you have any questions, feel free to ask. I would be really happy for your support!
Alex