This repository contains the Python 3 source code used to produce the results of the master thesis (in the main directory) and the LaTeX source of the thesis itself (in the thesis folder).
Cyber-physical systems are now widely used in different application domains. In parallel, machine learning algorithms are widely used to detect anomalies in the behaviour of these systems. However, this detection is limited to two states: normal behaviour and faulty functioning. This master thesis aims to extend the detection to differentiate between attacks and normal faults. First, a power system is described as a working example. Then, various machine learning algorithms are evaluated on the given datasets using two machine learning toolkits, scikit-learn and Weka. Next, various tools for feature analysis are presented, and an algorithm for finding the features that contributed the most to the false predictions is described. Finally, three solutions to the initial problem are presented and evaluated.
The full text of the master thesis can be found in this PDF file. The source code used for each chapter of the thesis is listed below.
- files_calc.ipynb: conversion of the dataset from `.arff` to `.csv` and analysis of the distribution of classes across files.
- ai_all.py: script that calculates the comparison metrics for all the classifiers on the 3 available datasets (multiclass, binary, three classes). As output it creates `pickle` files containing the results to be processed afterwards,
- plot.ipynb: tool for creating plots for all comparison metrics using the `pickle` files created by the previous script,
- roc.py: script that creates ROC curves for classifiers running on binary data (not displayed in the thesis),
- ai.py: legacy script for calculating the comparison metrics for all the classifiers on the 3 class dataset. It also creates the ROC curve and the confusion matrix,
- ai_binary.py: legacy script for calculating the comparison metrics for all the classifiers on the binary dataset,
- ai_multiclass.py: legacy script for calculating the comparison metrics for all the classifiers on the multiclass dataset,
- proc.py: script for converting `csv` to `arff` in order to run tests in Weka,
- plotting.py: legacy script for creating plots for all comparison metrics,
- param_optim.ipynb: script for finding the best set of parameters for the discussed classifiers.
- features.ipynb: checking the capabilities of the LIME and YellowBrick packages,
- trees_visualisation.ipynb: checking the capabilities of dtreeviz,
- lime_features_classification.py: getting the feature importances using the LIME package (algorithm discussed on page 53 of the thesis).
- featfun.ipynb: attempt to enhance the predictions of the Decision Tree classifier,
- featfun_rf.ipynb: attempt to enhance the predictions of the Random Forest classifier,
- featfun_mlp.ipynb: attempt to enhance the predictions of the Multilayer Perceptron classifier.
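
For readers unfamiliar with the ARFF format that Weka expects, the conversion from `csv` to `arff` mentioned above can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the code in proc.py. The function name `csv_to_arff`, the column names (`voltage`, `current`, `marker`) and the labels (`Normal`, `Attack`) are hypothetical, and the sketch assumes that every column except the class column is numeric and that attribute names contain no spaces.

```python
import csv
import io


def csv_to_arff(csv_text, relation="dataset", class_column="marker"):
    """Convert CSV text (with a header row) into an ARFF string.

    Assumption: every column except `class_column` is numeric, and
    `class_column` holds a nominal class label.
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]
    class_idx = header.index(class_column)

    # A nominal attribute must list its allowed values in the ARFF header.
    labels = sorted({row[class_idx] for row in data})

    lines = [f"@RELATION {relation}", ""]
    for i, name in enumerate(header):
        if i == class_idx:
            lines.append(f"@ATTRIBUTE {name} {{{','.join(labels)}}}")
        else:
            lines.append(f"@ATTRIBUTE {name} NUMERIC")
    lines += ["", "@DATA"]
    lines += [",".join(row) for row in data]
    return "\n".join(lines) + "\n"


# Hypothetical three-row sample, for illustration only.
sample_csv = (
    "voltage,current,marker\n"
    "1.0,0.5,Normal\n"
    "0.2,3.1,Attack\n"
    "0.9,0.4,Normal\n"
)
arff_text = csv_to_arff(sample_csv, relation="power_system")
print(arff_text)
```

The class labels are collected from the data so that the nominal attribute declaration always matches the values present in the file.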