I'm exploring outlier detection algorithms, and I found a newly published journal paper proposing a new cross-density-distance outlier detection algorithm. From my reading, the method mainly performs the following steps, using human-chosen parameters to allocate weights that effectively manipulate the thresholds:
after applying what the paper calls the kernel height estimation function (KHE):

$$KHE(x \mid Z)=\sum_{k=1}^{n} z_k^{(j)} \cdot K\!\left(\frac{x-z_k^{(i)}}{h}\right), \qquad K(u)=e^{-0.5\,u^2}$$

(note that the superscripts $(i)$ and $(j)$ on $z_k$ do not match, and the paper does not explain the discrepancy)
... A substantial body of academic literature exists on the optimal selection of the bandwidth ($h$) (Scott, 2015). However, throughout our experiments, we utilized Silverman's rule of thumb (Silverman, 2018):
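(The rule itself did not survive the extraction above, but its standard form is $h = 0.9\,\min(\hat\sigma,\ \mathrm{IQR}/1.34)\,n^{-1/5}$.) For comparison against the paper's KHE, a plain unweighted Gaussian KDE with Silverman's bandwidth is only a few lines of NumPy — note this sketch omits the paper's $z_k^{(j)}$ weighting, which is exactly the part that is unclear:

```python
import numpy as np

def silverman_bandwidth(z):
    """Silverman's rule of thumb: h = 0.9 * min(std, IQR/1.34) * n**(-1/5)."""
    n = len(z)
    iqr = np.subtract(*np.percentile(z, [75, 25]))
    sigma = min(np.std(z, ddof=1), iqr / 1.34)
    return 0.9 * sigma * n ** (-0.2)

def gaussian_kde(x, z, h=None):
    """Evaluate a standard Gaussian KDE of sample z at the points x."""
    z = np.asarray(z, dtype=float)
    x = np.atleast_1d(np.asarray(x, dtype=float))
    if h is None:
        h = silverman_bandwidth(z)
    u = (x[:, None] - z[None, :]) / h   # pairwise scaled distances
    k = np.exp(-0.5 * u ** 2)           # Gaussian kernel K(u)
    return k.sum(axis=1) / (len(z) * h * np.sqrt(2 * np.pi))

rng = np.random.default_rng(0)
sample = rng.normal(0.0, 1.0, size=500)
density = gaussian_kde([0.0, 3.0], sample)
# density near x=0 should be much larger than in the tail at x=3
```

This is the textbook baseline the KHE resembles; without the paper defining its per-point weights, reproducing the actual KHE is not possible.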
In this work, we propose another transform function $C_w \in [d, 1]$ to derive a confidence weight for an outlier score, given the total number of observations under the corresponding histogram bin. The fundamental idea is to start from the minimum weight ($d$) and elevate the outlier score as we approach the anticipated minimum baseline size:
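The paper's exact formula for $C_w$ is not reproduced here, but a transform with the stated properties — starting at a minimum weight $d$ and growing toward 1 as the bin count approaches an expected baseline size — might look like the following sketch (the names `count`, `baseline`, the default `d=0.1`, and the linear ramp are all my assumptions, not the paper's):

```python
def confidence_weight(count, baseline, d=0.1):
    """Map a histogram bin count to a confidence weight in [d, 1].

    Hypothetical sketch: ramps linearly from d (for an empty bin)
    up to 1 once `count` reaches the expected baseline size.
    """
    if baseline <= 0:
        return 1.0
    return min(1.0, d + (1.0 - d) * (count / baseline))

# sparse bins get weights near d; bins at/above the baseline get 1.0
weights = [confidence_weight(c, baseline=50) for c in (0, 25, 50, 100)]
```

Any monotone map into $[d, 1]$ satisfies the quoted description, which is part of why the claim is hard to evaluate without the concrete formula.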
Compute the final ensemble score $O_i$ for data point $X_i$ passed through all submodels.
Despite some vague mathematical relationships and figures that try to justify the claimed scoring approach, the one idea I can clearly extract is that the method tries to measure outlierness with respect to contextualized features.
The author states, after equation (#15) in the paper (page 7):
Throughout our experiments, we set the density score weight ($\omega_{\text{density}}$) to 0.8 and the distance score weight ($\omega_{\text{distance}}$) to 0.2.
It's not clear how the author arrives at these coefficients, even empirically — they weight and mix the scores however the user prefers!
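As quoted, the final mixing in equation (#15) appears to be nothing more than a fixed convex combination; assuming both per-model score vectors are already normalized to $[0, 1]$, it amounts to:

```python
import numpy as np

W_DENSITY, W_DISTANCE = 0.8, 0.2   # the paper's hand-picked weights

def combine_scores(density_scores, distance_scores):
    """Convex combination of two normalized outlier-score vectors."""
    density_scores = np.asarray(density_scores, dtype=float)
    distance_scores = np.asarray(distance_scores, dtype=float)
    return W_DENSITY * density_scores + W_DISTANCE * distance_scores

final = combine_scores([0.9, 0.1], [0.2, 0.8])
# a point scored high by the density model dominates the mix:
# 0.8*0.9 + 0.2*0.2 = 0.76  vs.  0.8*0.1 + 0.2*0.8 = 0.24
```

With an 0.8/0.2 split, the distance model can barely change a ranking — which is why the choice of these weights matters so much and deserves a justification.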
The impact of this approach is shown in Fig. 4, comparing density-based and distance-based scoring using the empirical coefficients from equation (#15) as weights — in other words, the impact of human assistance!
Fig. 4: picture credit to the author of the algorithm in the paper.
But the soundness of the claimed algorithm is questionable: there is no open-source code to reproduce the results, and some stages seem unnecessary or purely empirical, such as reformulating ensemble models like IsolationForest or ensembled HBOS detectors. The authors show that the expert-tuned version of this algorithm has very good performance.
I would be happy if someone familiar with outlier detection mathematics could check the math and share insight into whether this algorithm is as valid as claimed, since there is no open-source GitHub repo to reproduce the results or investigate further. In my opinion, the math is neither clear nor sound; it uses:
- Kernel Density Estimation
- quantile transformer
to modify the scoring via a human factor that manipulates the scores to get the best results, while the state of the art is to find non-parametric detection that minimizes human parameterization. Maybe I'm not fully informed; as an enthusiast, I want to understand the reliability and validity of this promising detector.
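For what it's worth, the quantile-transformer step by itself is standard and reproducible — it maps raw scores through their empirical CDF onto uniform $[0, 1]$ ranks, e.g.:

```python
import numpy as np

def quantile_transform(scores):
    """Map raw outlier scores to [0, 1] via their empirical CDF (rank-based)."""
    scores = np.asarray(scores, dtype=float)
    ranks = scores.argsort().argsort()       # rank of each score within the sample
    return ranks / max(len(scores) - 1, 1)   # normalize ranks to [0, 1]

transformed = quantile_transform([5.0, 1.0, 100.0, 2.0])
# ranks: 5.0 -> 2, 1.0 -> 0, 100.0 -> 3, 2.0 -> 1  =>  [2/3, 0, 1, 1/3]
```

So the non-reproducible parts are not these standard building blocks, but the hand-tuned weights that glue them together.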
clevilll changed the title from "this new published OD algorithm is valid or comparable with PyOD detection modales?" to "Is this new published OD algorithm valid or comparable with PyOD detection modales?" on Nov 22, 2024.
(For reference: the ensembling stage is also described in sub-section 2.2, "Ensemblers", and in equation (#18) of the paper.)