Jeff-67/Compare-classifiers-with-Random-Forest-to-build-a-motor-false-prediction-and-classification-system

Using random forest to design an online electrical motor fault classification and prediction system

Abstract

Electric motors are an important power source for intelligent manufacturing; however, motor eccentricity is a serious fault that can develop after robots or machines have operated for a while. This fault can damage power modules, yet traditional solutions mostly depend on expensive sensors and cannot make precise predictions. In this project, we focus on the eccentricity fault by collecting large amounts of operating data and performing classification and prediction based on random forest.

Here is the code!!

Outline

Data introduction

  • Real-time raw data

  • Feature extraction

Model training

  • Random Forest algorithm introduction

  • Training and test data preprocessing

  • Feature filtering

  • Data size choosing

  • Hyperparameter tuning

  • Status Voting Method

Why Random Forest?

  • KNN

  • SVM

  • Decision Tree

  • Comparison Result

Future work

Data introduction

A complete data set for electrical motor fault classification consists of real-time operating raw data and controlling commands of rolling element bearings captured via the acquisition station (shown in Fig.1), followed by data processing, feature extraction from the data sets, and classification into functional (State0) or defective (State1, State2, State3) status (shown in Fig.2) of the rolling element bearing. To be more specific, the data come from the motor of a real packing machine.

Fig.1 Entire process of collecting big data | Fig.2 Status definitions
  • Real-time raw data

    The rolling element bearing's real-time raw data and the controlling commands from the servo motor driver were acquired with the oscilloscope shown in Fig.3. It provides 8 channels, each with 16 bytes of memory and a 4 kHz sampling frequency. Fig.4 displays the acquisition rule for the training data, which specifies the experimental temperature, working time, and running speed of the experimental station.

    Fig.3 Oscilloscope and command interface | Fig.4 Acquisition rule
  • Feature extraction

    The servo driver provides many real-time signals and commands that can be read from the oscilloscope. I chose 8 motor-related real-time signals (all in the time domain) as the features for model learning (shown in Fig.5).

    Fig.5 Description of the 8 channels used for data acquisition

Model training

Scikit-learn is an easy-to-use open-source machine learning library that provides classification, regression, clustering, and dimensionality-reduction tools for Python. I used scikit-learn's random forest algorithm to classify the rolling element bearing and motor condition and to make predictions.

  • Random Forest algorithm introduction

    Here is the code!!

    Random forest adds an extra layer of randomness to bagging: each tree is built from a different bootstrap sample of the data. Bagging is a well-known ensemble method for classification trees in which successive trees do not depend on earlier trees; each tree is constructed independently from its bootstrap sample, and a simple majority vote is taken for the final prediction. In a standard decision tree, each node is split using the best split among all variables; in a random forest, each node is split using the best variable among a randomly chosen subset of predictors at that node. This strategy lets random forests outperform many other classifiers, including decision trees, discriminant analysis, and support vector machines, and it also makes them robust against overfitting.


  • Training and test data preprocessing

    Here is the code!!

    The training data used in this part come from 10 groups of original training data sets covering the four motor states. From every state I randomly selected 2,500 rows of data containing the CH1~CH8 information, giving a training set of 10,000 rows covering all four states. The purpose of this step is to make the training data more varied, and therefore more credible.

    (Note: the test data are handled in the same way; the test set also contains 10,000 rows at first.)

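A sketch of this shuffling step, assuming a pandas DataFrame with CH1~CH8 columns and a state label; the column names and synthetic values are illustrative, not the project's actual data:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Stand-in for the raw acquisitions: 40,000 rows across the four states.
raw = pd.DataFrame(rng.normal(size=(40000, 8)),
                   columns=[f"CH{i}" for i in range(1, 9)])
raw["state"] = np.repeat([0, 1, 2, 3], 10000)

# Draw 2,500 random rows per state -> 10,000 mixed training rows.
train = raw.groupby("state").sample(n=2500, random_state=0)
print(train.shape)  # (10000, 9)
```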

    In addition to the normal data preprocessing, I apply an advanced preprocessing step (shown in Fig.6) that yields better prediction accuracy (shown in Fig.7).

    CH1max: the maximum value of CH1 in the training data set

    CH1min: the minimum value of CH1 in the training data set

    CH1peak: the peak value of CH1 in the training data set

    CH1peak = CH1max - (CH1max - CH1min) * 2%

    Fig.6 Advanced data preprocessing focusing on the peak value | Fig.7 Comparison of different data preprocessing methods
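The peak-value formula above can be written as a small helper. The 2% margin follows the formula exactly; how the peak is then applied (e.g. clipping or filtering rows) is left open, as in the text:

```python
import numpy as np

def peak_value(channel, margin=0.02):
    """CHpeak = CHmax - (CHmax - CHmin) * 2%, per the formula above."""
    ch_max, ch_min = channel.max(), channel.min()
    return ch_max - (ch_max - ch_min) * margin

# Example on a toy CH1 trace: peak = 10.0 - (10.0 - 0.0) * 0.02 = 9.8
ch1 = np.array([0.0, 1.0, 2.0, 10.0])
print(peak_value(ch1))
```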
  • Feature filtering

    In this part, I separate all the features into four groups (shown in Fig.8): speed-related features, location-related features, torque-related features, and the remaining feature. I then combine these feature groups in various ways (shown in Fig.9); the results indicate that the combination of all features gets the highest OOB_Score, so I use CH1~CH8 as the data features.

    Fig.8 Feature clusters | Fig.9 Results
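A sketch of this experiment: train one forest per feature-group combination and compare OOB_Score. The mapping of channels to groups below is a placeholder assumption, and the data are synthetic:

```python
from itertools import combinations

import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 8)) + np.repeat(np.arange(4), 100)[:, None]
y = np.repeat(np.arange(4), 100)

# Hypothetical channel-to-group assignment (not the project's real one).
groups = {"speed": [0, 1], "location": [2, 3], "torque": [4, 5], "other": [6, 7]}

scores = {}
for r in range(1, len(groups) + 1):
    for combo in combinations(groups, r):
        cols = sorted(c for g in combo for c in groups[g])
        clf = RandomForestClassifier(n_estimators=50, oob_score=True,
                                     random_state=0).fit(X[:, cols], y)
        scores[combo] = clf.oob_score_

best = max(scores, key=scores.get)
print(best, round(scores[best], 3))
```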
  • Data size choosing

    The results for six training data sizes are shown below. The prediction accuracy is highest when the amount of training data is 100K, which suggests that, within this range, more training data yields a better model.

    Comparison of different data sizes
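The experiment can be sketched by training on growing subsets and scoring a held-out set; the sizes here are scaled down from the README's (which reach 100K rows) so the example runs quickly, and the data are synthetic:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(3000, 8)) + np.repeat(np.arange(4), 750)[:, None] * 0.8
y = np.repeat(np.arange(4), 750)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3,
                                          stratify=y, random_state=0)

accs = {}
for n in (100, 500, 2000):
    clf = RandomForestClassifier(n_estimators=50, random_state=0)
    clf.fit(X_tr[:n], y_tr[:n])  # train on the first n shuffled rows
    accs[n] = clf.score(X_te, y_te)
    print(n, round(accs[n], 3))
```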
  • Hyperparameter tuning

    Here is the code!!

    After performing feature extraction, preprocessing the training data, and choosing a proper training data size, the model is improved by hyperparameter tuning, with the best setting chosen by OOB_Score. I use sklearn.model_selection.GridSearchCV to search over the parameters first and then narrow down to 'n_estimators'. Fig.10 shows that with n_estimators=70 the OOB_Score is 0.9389, the best score for n_estimators=10~80. Finally, Fig.11 shows that with n_estimators=70 the prediction accuracy for each state is 0.998, 0.9981, 0.9989, and 0.998, all better than with n_estimators=10. Consequently, the learning model of this project uses the 8 features of real-time raw data and controlling commands from the servo motor driver, 100K rows of training data, the advanced CH1 peak-value preprocessing, and n_estimators=70.

    Fig.10 OOB_Score for different n_estimators | Fig.11 Prediction accuracy for n_estimators=10 and 70
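A sketch of this tuning procedure on synthetic stand-in data: GridSearchCV narrowed to n_estimators, then an OOB check on the winner, mirroring the steps above:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 8)) + np.repeat(np.arange(4), 100)[:, None]
y = np.repeat(np.arange(4), 100)

# Cross-validated search over n_estimators only.
grid = GridSearchCV(RandomForestClassifier(random_state=0),
                    param_grid={"n_estimators": [10, 30, 50, 70]},
                    cv=3)
grid.fit(X, y)
best_n = grid.best_params_["n_estimators"]

# Confirm the chosen value with the out-of-bag estimate (OOB_Score).
clf = RandomForestClassifier(n_estimators=best_n, oob_score=True,
                             random_state=0).fit(X, y)
print(best_n, round(clf.oob_score_, 3))
```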
  • Status Voting Method

    After constructing the learning model with the random forest algorithm, a status voting method is designed to make the diagnosis predictions more reliable. Fig.12 displays how the status voting method works; the algorithm is:

    w0 = number of State0 votes

    w1 = number of State1 votes

    w2 = number of State2 votes

    w3 = number of State3 votes

    w0 + w1 + w2 + w3 = 1000 (total number of votes)

    wmax = max(w0, w1, w2, w3)

    State_result = the state corresponding to wmax

    Fig.12 The status voting method
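The voting algorithm above can be sketched as follows; the 1,000-sample window is drawn from a synthetic State2-like distribution as a stand-in for a real acquisition window:

```python
from collections import Counter

import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 8)) + np.repeat(np.arange(4), 100)[:, None]
y = np.repeat(np.arange(4), 100)
clf = RandomForestClassifier(n_estimators=70, random_state=0).fit(X, y)

# 1,000 new samples from a State2-like distribution (an assumption).
window = rng.normal(size=(1000, 8)) + 2

votes = Counter(clf.predict(window))           # w0..w3
state_result, w_max = votes.most_common(1)[0]  # majority vote wins
print(f"State{state_result}: {w_max}/1000 votes")
```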

Why Random Forest?

Here is the code!!

I compare random forest with three other classifiers, namely KNN, decision tree, and SVM, and I give a brief introduction to each algorithm.

Here is the code!!

  • KNN

  • SVM

  • Decision Tree

Fig.13 and Fig.14 show the results of comparing random forest with the three classifiers mentioned above; as I predicted, random forest is the best classifier for this project.

Fig.13 | Fig.14 Comparison results
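A sketch of this comparison on synthetic stand-in data, scoring each classifier with 5-fold cross-validation (the README compares them on the real motor data instead):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 8)) + np.repeat(np.arange(4), 100)[:, None]
y = np.repeat(np.arange(4), 100)

models = {"KNN": KNeighborsClassifier(),
          "SVM": SVC(),
          "Decision Tree": DecisionTreeClassifier(random_state=0),
          "Random Forest": RandomForestClassifier(n_estimators=70,
                                                  random_state=0)}

# Mean 5-fold cross-validated accuracy for each classifier.
scores = {name: cross_val_score(model, X, y, cv=5).mean()
          for name, model in models.items()}
for name, s in scores.items():
    print(f"{name}: {s:.3f}")
```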

Last but not least, decision tree and random forest are quite similar in some cases, so I list some key differences between them.


Future work

Future work could include implementing the presented approach on a cloud system, IoT, or embedded computation platform, as well as extracting more non-real-time motor fault features to make the prediction system more complete.

About

In this project, I am using random forest to predict and diagnose the failure of electrical motor
