Electric motors are an important power source for intelligent manufacturing; however, motor eccentricity is a serious fault that can develop after robots or machines have operated for a while. This fault can damage power modules, yet traditional solutions mostly depend on expensive sensors and cannot make precise predictions. In this project, we focus on the eccentricity fault by collecting big data and performing classification and prediction based on Random Forest.
Here is the code!!
- Real time raw data
- Features extraction
- Random Forest algorithm introduction
- Training and test data preprocessing
- Feature filtering
- Data size choosing
- Hyperparameter tuning
- Status Voting Method
- KNN
- SVM
- Decision Tree
- Comparison Result
A proper electric motor fault classification pipeline consists of acquiring the real-time operating raw data and control commands of the rolling element bearings via the acquisition station (shown in Fig.1), data processing, feature extraction from the data sets, and classification into functional (State0) or defective (State1, State2, State3) bearing status (shown in Fig.2). To be more specific, the data comes from a real packing machine motor.
Entire process of collecting big data | Status definitions |
---|---|
![]() | ![]() |
Fig.1 | Fig.2 |

---

## Real time raw data
The rolling element bearing's real-time raw data and control commands from the servo motor driver were acquired with the oscilloscope shown in Fig. 3. It provides 8 channels, each with 16 bytes of memory and a 4 kHz sampling frequency. Fig. 4 displays the acquisition rule for the training data, which sets the appropriate experimental temperature, working time and experimental station running speed.
Oscilloscope and command interface | Acquisition's rule |
---|---|
Fig.3 | Fig.4 |

---

## Features extraction
The servo driver provides many real-time signals and commands that can be obtained from the oscilloscope. I chose 8 motor-related real-time signals (all in the time domain) as the features for model learning (shown in Fig.5).
The description of the 8 channels for data acquisition
Fig.5
Scikit-learn is an easy-to-use, open-source machine learning library that provides classification, regression, clustering and dimensionality reduction tools for Python programming. I used scikit-learn's random forest algorithm to classify the rolling element bearing and motor condition and to make predictions.

---

## Random Forest algorithm introduction
Here is the code!!
The characteristic of random forest is that it adds an additional layer of randomness to bagging by constructing each tree with a different bootstrap sample of the data. Bagging is a well-known ensemble method for classification trees in which successive trees do not depend on earlier trees: each tree is independently constructed from a bootstrap sample of the data set, and in the end a simple majority vote is taken for prediction. In standard decision trees, each node is split using the best split among all variables; in random forests, each node is split using the best among a subset of predictors randomly chosen at that node. This strategy makes random forest perform better than many other classifiers, including decision trees, discriminant analysis and support vector machines. Random forest is also robust against overfitting.
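The two layers of randomness described above (bootstrap samples plus a random feature subset at each split) map directly onto scikit-learn's `RandomForestClassifier`. The sketch below uses a synthetic 4-class data set as a stand-in for the real bearing data, so the numbers are illustrative only:

```python
# Sketch: a random forest with out-of-bag scoring on synthetic data.
# The data set and parameters are illustrative, not the project's real data.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=8, n_informative=6,
                           n_classes=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Each tree is grown on a bootstrap sample; each split considers only a
# random subset of the features (max_features), the extra layer of randomness.
rf = RandomForestClassifier(n_estimators=100, max_features="sqrt",
                            oob_score=True, random_state=0)
rf.fit(X_train, y_train)
print("OOB score:", rf.oob_score_)
print("Test accuracy:", rf.score(X_test, y_test))
```

Because each sample is left out of roughly a third of the bootstrap draws, the OOB score gives a built-in validation estimate without a separate hold-out set.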

---

## Training and test data preprocessing
Here is the code!!
The training data used in this part are collected from the 10 groups of original training data sets, which contain the four motor states; I randomly selected 2500 rows of data (containing CH1 ~ CH8) from every state. Finally, a group of 10000 training samples covering the four states was obtained. The purpose of this step is to make the training data more varied, so as to improve its credibility.
(Note: the test data are handled in the same way; the test set also contains 10000 samples at first.)
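The balanced resampling step above (2500 random rows per state, 10000 in total) can be sketched with pandas. The column names (`CH1`..`CH8`, `state`) and the random stand-in data are assumptions for illustration:

```python
# Sketch of the balanced resampling: draw 2500 rows per state from the pooled
# raw recordings to build a 10000-row training set. Data is synthetic.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Stand-in for the pooled raw data from the 10 original recording groups.
raw = pd.DataFrame(rng.normal(size=(40000, 8)),
                   columns=[f"CH{i}" for i in range(1, 9)])
raw["state"] = rng.integers(0, 4, size=len(raw))

# 2500 random rows from each of the four states -> 10000 balanced samples.
train = raw.groupby("state").sample(n=2500, random_state=0)
train = train.sample(frac=1, random_state=0).reset_index(drop=True)  # shuffle
print(train.shape)
print(train["state"].value_counts().to_dict())
```

The final shuffle removes any ordering by state so that later train/validation splits stay balanced.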
In addition to normal data preprocessing, I apply an advanced preprocessing step (shown in Fig.6) and obtain better prediction accuracy (shown in Fig.7).
CH1max: the maximum value of CH1 in the training data set
CH1min: the minimum value of CH1 in the training data set
CH1peak: the peak value of CH1 in the training data set

CH1peak = CH1max - (CH1max - CH1min) * 2%
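One plausible reading of the CH1 peak rule is as a threshold: samples whose CH1 value lies within the top 2% of the channel's range are treated as peak samples. The sketch below follows that reading on random stand-in data; the variable names are illustrative:

```python
# Sketch of the CH1 peak-value rule: CH1peak = CH1max - (CH1max - CH1min) * 2%
# applied as a threshold on synthetic data (an assumed interpretation of Fig.6).
import numpy as np

ch1 = np.random.default_rng(0).normal(size=10000)

ch1_max = ch1.max()
ch1_min = ch1.min()
ch1_peak = ch1_max - (ch1_max - ch1_min) * 0.02  # the formula above

peak_mask = ch1 >= ch1_peak  # samples near the peak value of the channel
print("threshold:", ch1_peak, "samples kept:", int(peak_mask.sum()))
```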
Advanced data preprocessing by focusing on the peak value | The comparison of different data preprocessing |
---|---|
Fig.6 | Fig.7 |

---

## Feature filtering
In this part, I separate all the features into four groups (shown in Fig.8 below): speed-related features, location-related features, torque-related features, and the other feature. I then combine those feature clusters in various ways (shown in Fig.9 below); the results indicate that the combination of all features gets the highest OOB_Score, so I keep CH1~CH8 as the data features.
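The experiment above can be sketched as a loop over every combination of feature groups, scoring each by OOB_Score. The channel-to-group mapping and the synthetic data below are placeholders, not the project's actual assignment:

```python
# Sketch of the feature-filtering experiment: fit a forest on each combination
# of feature groups and compare OOB scores. Group assignments are assumed.
from itertools import combinations
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=2000, n_features=8, n_informative=6,
                           n_classes=4, random_state=0)
# Placeholder mapping of channel indices to the four feature groups.
groups = {"speed": [0, 1], "location": [2, 3], "torque": [4, 5], "other": [6, 7]}

scores = {}
names = list(groups)
for r in range(1, len(names) + 1):
    for combo in combinations(names, r):
        cols = sum((groups[g] for g in combo), [])
        rf = RandomForestClassifier(n_estimators=50, oob_score=True,
                                    random_state=0)
        rf.fit(X[:, cols], y)
        scores["+".join(combo)] = rf.oob_score_

best = max(scores, key=scores.get)
print("best combination:", best, "OOB score:", scores[best])
```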
Features cluster | Result |
---|---|
Fig.8 | Fig.9 |

---

## Data size choosing
The results for six training data sizes are shown below. The prediction accuracy is highest when the amount of training data is 100K, which suggests that the more training data there is, the better the trained model becomes.
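The data-size experiment can be sketched by training the same forest on growing slices of the data and scoring on a fixed hold-out set. The sizes here are scaled down from the project's real 100K for a quick, synthetic illustration:

```python
# Sketch of the data-size comparison: same model, increasing training sizes,
# fixed test set. Data and sizes are illustrative stand-ins.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=6000, n_features=8, n_informative=6,
                           n_classes=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=1000,
                                                    random_state=0)

scores = {}
for n in (200, 500, 1000, 2000, 5000):
    rf = RandomForestClassifier(n_estimators=50, random_state=0)
    rf.fit(X_train[:n], y_train[:n])
    scores[n] = rf.score(X_test, y_test)
print(scores)
```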
The comparison of different data size

---

## Hyperparameter tuning
Here is the code!!
After proper feature extraction, training data preprocessing, and choice of training data size, the model is further improved through hyperparameter tuning, with the proper value selected by OOB_Score. I use the powerful function

sklearn.model_selection.GridSearchCV

to search over all the parameters first and then narrow down to the parameter 'n_estimators'. Fig. 10 shows that with n_estimators=70 the OOB_Score is 0.9389, the best score among n_estimators=10~80. Finally, Fig. 11 shows that the per-state prediction accuracies with n_estimators=70 are 0.998, 0.9981, 0.9989 and 0.998, all better than with n_estimators=10. Consequently, the learning model of this project is based on 8 features of real-time raw data and control commands from the servo motor driver, 100K training samples, advanced preprocessing of the CH1 peak value, and n_estimators=70.

OOB_Score of different n_estimators | Prediction accuracy of n_estimators=10,70 |
---|---|
Fig.10 | Fig.11 |
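The n_estimators search can be sketched with GridSearchCV; the grid mirrors the 10~80 range above, while the data set is a synthetic stand-in for the real 100K samples:

```python
# Sketch of the n_estimators search with GridSearchCV over the 10~80 range,
# followed by an OOB check of the chosen model. Data is illustrative.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=2000, n_features=8, n_informative=6,
                           n_classes=4, random_state=0)

grid = GridSearchCV(RandomForestClassifier(random_state=0),
                    param_grid={"n_estimators": [10, 20, 30, 40, 50, 60, 70, 80]},
                    cv=3)
grid.fit(X, y)
print("best n_estimators:", grid.best_params_["n_estimators"])

# Refit the winner with oob_score=True to read its OOB_Score directly.
best = RandomForestClassifier(n_estimators=grid.best_params_["n_estimators"],
                              oob_score=True, random_state=0).fit(X, y)
print("OOB score:", best.oob_score_)
```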

---

## Status Voting Method
After constructing the learning model with the random forest algorithm, a status voting method is designed to make the diagnosis prediction results more reliable. Fig. 12 displays how the status voting method works; the algorithm is:
w0 = State0 vote count
w1 = State1 vote count
w2 = State2 vote count
w3 = State3 vote count
w0 + w1 + w2 + w3 = 1000 (total vote count)
wmax = max(w0, w1, w2, w3)
State_result = State(wmax)
The status voting method
Fig.12
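The voting rule above is a majority vote over a window of 1000 per-sample predictions. A minimal sketch, with simulated predictions standing in for the model's real-time output:

```python
# Sketch of the status voting method: count votes per state over a window of
# 1000 per-sample predictions and report the majority state.
import numpy as np

def status_vote(predictions, n_states=4):
    """Majority vote over a window of predicted state labels (0..n_states-1)."""
    w = np.bincount(np.asarray(predictions), minlength=n_states)  # w0..w3
    return int(np.argmax(w)), w

# 1000 simulated per-sample predictions, mostly State2 with some noise.
rng = np.random.default_rng(0)
window = rng.choice(4, size=1000, p=[0.05, 0.05, 0.85, 0.05])
state, votes = status_vote(window)
print("votes:", votes.tolist(), "-> State", state)
```

Because occasional misclassified samples are outvoted by the window, the reported state is far more stable than any single prediction.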
Here is the code!!
I compare random forest with three other classifiers, namely KNN, decision tree and SVM, and give a brief introduction to each algorithm.
Here is the code!!
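The comparison can be sketched by evaluating all four classifiers on the same train/test split; the data set and hyperparameters below are illustrative stand-ins for the project's real setup:

```python
# Sketch of the classifier comparison: KNN, SVM, a single decision tree, and
# random forest on the same synthetic train/test split.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=2000, n_features=8, n_informative=6,
                           n_classes=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

models = {
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "SVM": SVC(kernel="rbf"),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=70, random_state=0),
}
results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    results[name] = model.score(X_test, y_test)
    print(f"{name}: {results[name]:.4f}")
```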
Fig.13 and Fig.14 show the results of comparing random forest with the three classifiers mentioned above; as I predicted, random forest is the best classifier in this project.
![]() | ![]() |
---|---|
Fig.13 | Fig.14 |
Last but not least, decision tree and random forest are quite similar in some cases, so I list some key differences between them.
![]() |
Future work could include implementing the presented approach on a cloud system, IoT, or embedded computing platform, as well as extracting more non-real-time motor fault features to make the prediction system more complete.