Exploring weighted-balanced tree as the alternative solution for detecting anomaly. It's worth to notice, the anomaly detection will employ the isolation forest algorithm as its core, with a binary search tree utilized for the anomaly classification process. Currently, im suspecting and have an assumption that the binary search tree may encounter unsuccessful operations when the tree nodes become unbalanced / skeewed tree operation (let's say we assume there are 5 tree nodes which of them weren't sorted by default), making it challenging to maintain balance between the two nodes.
This is just proof-of-concept for my research for a master's degree, it may have biased and unxepected results since it wasn't properly tested and evaluated
Eventually, you need to generate the .so
files first if you want to run and ported the WeightBalancedTree
function in Python. You need to compile the main code first by execute this command
gcc -c -fPIC weight_balanced_tree.c
And afterwards, create the shared library so that can be execute
gcc -shared -o weight_balanced_tree.so weight_balanced_tree.o
Ensures that the shared library was generated and converted successfully
ls -l weight_balanced_tree.so
APIs benchmarking
You can do the benchmarking using make benchmark
command to get the comparison between Binary Tree and Weighted Binary Tree (it's worth to noting that both of the algorithm is using similar numbers of dataset). At my machine, both of them resulting with different performances with the binary tree it's slightly faster than Weighted Binary Tree
Algorithm | Method | Result Time (seconds) |
---|---|---|
Binary Tree | Searching | 0.000036 seconds |
Insertion | 0.000437 seconds | |
Weight Binary Tree | Searching | 0.001664 seconds |
Insertion | 0.003228 seconds |
Detection Anomaly benchmarking
As you can see, for use in anomaly detection using the same dataset, the Weighted binary tree is relatively faster than the binary tree due of in the weighted binary tree there's a rebalancing tree process which tries to balance the nodes in the two objects. Thus, from here it leads to the faster performance when searching find anomaly, while the binary tree doesn't use the rebalance method, when the tree node dataset increases, the search process will gradually be slower due to an unbalanced occurrence between the two objects in the tree node
Algorithm | Result Time (seconds) |
---|---|
Weighted Binary Tree (non-HTTP request) | 0.000083 seconds |
Weighted Binary Tree - Constant Time | 0.000054 seconds |
Binary Tree (non-HTTP request) | 0.000331 seconds |
This project also represents how the process of finding anomalies uses constant time and linear time, if you look at the comparison, with constant time the search for anomalies will be faster but what needs to be noted is: in the constantDetectAnomaly
method itself there are no additional parameters for the classification feature for identifying anomalies, this is what makes the search process a little faster, but there may be a bias in the results to find whether the anomaly is true or not