A COMPARISON OF ARTIFICIAL NEURAL NETWORKS AND OTHER STATISTICAL METHODS FOR ROTATING MACHINE CONDITION CLASSIFICATION

A. C. McCormick and A. K. Nandi

Abstract

Statistical estimates of vibration signals such as the mean and variance can provide an indication of faults in rotating machinery. Using these estimates jointly can give a more robust classification than using each individually. Artificial neural network architectures and some statistical algorithms are compared with emphasis on training requirements and real-time implementation as well as overall performance.

Introduction

Analysis of vibrations can indicate fault conditions in rotating machinery[1] such as shaft unbalance or rubbing. One common approach is to estimate time-invariant features from the vibration time series which change when a fault occurs in the machine. These features can then be input into some form of classification system to decide the machine's condition. Artificial neural networks such as multi-layer perceptrons (MLPs) provide a system which can theoretically provide Bayes-optimal classification of the condition based upon many features[2]. Training networks can, however, take a substantial length of time and is not guaranteed to find the optimal solution. Radial basis function (RBF) neural networks provide an alternative architecture which can be trained in a much shorter period of time. Traditional statistical discriminant analysis[3] algorithms can be very simple to implement and do not require a time-consuming training algorithm. But in many cases they require certain assumptions to be made about the input data. If these assumptions are not valid, they may not provide as good a solution as a neural network.

Experimental Set-Up

The vibrations were measured from a small experimental machine set. This consisted of an electric motor which drove a shaft with a flywheel. Small weights could be attached to the flywheel, unbalancing the shaft, and rubbing could be applied using a screw attached to a frame.
The vibrations were measured horizontally and vertically using accelerometers attached to a bearing block. This set-up allowed the creation of four machine conditions: NN - no faults applied; NR - only the rub fault is applied; WN - only the unbalance fault is applied; WR - both rub and unbalance faults are applied.

(Department of Electronic and Electrical Engineering, University of Strathclyde, 204 George Street, Glasgow, G1 1XW)

The shaft
rotation speed was varied between 77 rev/s and 100 rev/s, and vibration time series were recorded for all four conditions at a variety of speeds in this range.

Input Features

The horizontal and vertical time series can be combined to produce a complex time series. The magnitude of this time series gives an indication of the distance the shaft has been displaced from its central position. The mean value of this magnitude gives a good indication of the unbalance conditions, as this fault causes the shaft to move in a circular orbit[4]. The variance can indicate rubbing, as this fault causes the shaft to follow a highly erratic path. Moments can also be computed very quickly and are therefore well suited to real-time implementation. The erratic motion caused by rub gives rise to a rapid variation in the speed of this motion. Consequently the statistics of the derivative can help in deciding if a rub fault exists. Integrating the signal attenuates the high-frequency components associated with the rub faults, leaving only a low-frequency fundamental peak whose amplitude depends on the degree of unbalance. These signal processing operations act as crude filters. However, as they do not have well-defined pass bands, they may be susceptible to small amounts of noise at extreme frequencies. Therefore additional sets of features were produced from low-pass filtered and high-pass filtered versions of the signal. Moments were calculated from the magnitudes of all these time series.

Artificial Neural Networks

Two types of artificial neural networks are considered: multi-layer perceptrons and radial basis function networks[5]. Both are feed-forward networks: they have no memory and the output depends only on the current inputs, which therefore have to be time-invariant. Multi-layer perceptrons consist of perceptron neurons, each of which computes a weighted sum followed by a nonlinearity which approximates a thresholding function.
If the non-linearity is assumed to be a threshold, then each neuron divides the input hyperspace into two regions with a linear boundary. By using many (n) neurons, the input space can be divided into 2^n regions. A second layer can group these regions so that the input space can be mapped to the output through a complicated transform which can be the Bayes-optimal classification. Approximations to a thresholding function are used for the non-linearities, as these allow the network to be trained using an LMS optimization algorithm. Backpropagation allows multi-layer networks to be trained by attributing errors in the output neurons to the outputs of earlier-layer neurons. A third layer is often used as this can make it easier to train the network. Optimization algorithms can get stuck at local optima, and therefore enhancements to the standard backpropagation algorithm such as momentum and adaptive learning can be used. Radial basis function neural networks have the same overall ability as multi-layer perceptrons: they can approximate any arbitrary function to any arbitrary accuracy. They consist of two layers: the first layer consists of radial basis function neurons, the second of linear neurons. Radial basis neurons compute the distance between their weight values and the input vector. This distance is then operated on by a radial basis function which has a value of 1 if the input is zero and tends to zero as the input tends to infinity. The radial basis function used was the Gaussian function.
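The forward computations of the two architectures can be illustrated with a minimal sketch (the function names, the sigmoid choice for the MLP non-linearity, and a single shared RBF width are assumptions for illustration, not the authors' implementation):

```python
import numpy as np

def mlp_forward(x, weights, biases):
    """One pass through a multi-layer perceptron: each layer computes a
    weighted sum followed by a sigmoid, a smooth approximation to a
    thresholding function."""
    a = x
    for W, b in zip(weights, biases):
        a = 1.0 / (1.0 + np.exp(-(W @ a + b)))  # sigmoid non-linearity
    return a

def rbf_forward(x, centres, width, W2, b2):
    """A radial basis layer followed by linear output neurons.  Each RBF
    neuron responds with 1 when the input coincides with its centre (its
    weight vector) and tends to 0 as the distance tends to infinity."""
    d = np.linalg.norm(centres - x, axis=1)  # distance to each centre
    phi = np.exp(-(d / width) ** 2)          # Gaussian basis function
    return W2 @ phi + b2                     # linear second layer
```

In this paper the networks would have 10 inputs (the moment features) and 4 outputs (one per machine condition).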
Statistical Methods

The simplest form of discriminant analysis is nearest-centroid classification. In this method, a centroid is calculated for each condition by averaging known features. The Euclidean distance is calculated between the unknown feature set and all the centroids. The condition is then classified as that of the nearest centroid. This method assumes that each feature is of equal importance and is of the same order of magnitude. Linear discriminant analysis involves calculating a weighted sum of the features and is similar to using an individual neuron. The weights are chosen to maximize the ratio of the difference between the means of two groups to the variance. This method can be used to independently detect the weight and rub faults. The method can also be extended to multiple-group problems using the canonical vector approach, in which the weights are chosen by maximizing the ratio of the between-groups covariance to the within-groups covariance. Both these methods assume common covariance between groups and normality. Nearest-neighbour classification uses a large number of known sets of input features and consequently makes no assumption about the distribution of the features. The condition is classified as that of the known feature vector which has the shortest Euclidean distance from the unknown feature vector.

Results

Eleven different 2-second time series were recorded for each condition at various speeds between 77 rev/s and 100 rev/s. The signal was sampled at 12 kHz, giving time series of 24000 samples. Moments were calculated using non-overlapping windows of 1000 samples, giving 24 estimates of the mean and variance for each time series. This data was divided into a training set consisting of the first 8 estimates, with the remaining 16 estimates being used for testing. To gauge the usefulness of each feature, the training data was used to set thresholds for detecting the rub and weight faults separately.
The percentage of estimates from the test data classified correctly using this method is shown in table 1.

Pre-processing        Moment     Rub Fault   Weight Fault
None                  mean       56.2%       97.7%
                      variance   84.8%       63.1%
Differentiation       mean       88.8%       58.2%
                      variance   80.0%       48.6%
Integration           mean       54.5%       100%
                      variance   48.6%       92.6%
Low Pass Filtering    mean       52.3%       100%
                      variance   61.4%       88.6%
High Pass Filtering   mean       88.2%       57.1%
                      variance   81.2%       51.6%

Table 1: Classification using thresholding

Clearly the weight fault can be detected using just the mean of the integrated or low-pass filtered signal. Detection of rubbing using thresholding is, however, less certain, achieving a success rate of at best 88.8%.
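A minimal sketch of this single-feature test is given below, for the unbalance case where the faulty orbit has the larger mean magnitude. The midpoint rule for setting the threshold is an assumption for illustration; the paper does not state how its thresholds were chosen.

```python
import numpy as np

def magnitude_moments(horizontal, vertical):
    """Mean and variance of the orbit magnitude: the horizontal and vertical
    accelerometer series form a complex time series whose magnitude indicates
    the shaft's displacement from its central position."""
    mag = np.abs(horizontal + 1j * vertical)
    return mag.mean(), mag.var()

def fit_threshold(values_healthy, values_faulty):
    # Hypothetical rule: midpoint between the two class means.
    return 0.5 * (np.mean(values_healthy) + np.mean(values_faulty))

def detect_unbalance(mean_magnitude, threshold):
    # Unbalance enlarges the circular orbit, raising the mean magnitude.
    return mean_magnitude > threshold
```

For the rub fault, the same test would be applied to the variance (or to moments of the differentiated or high-pass filtered signal) instead.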
Multi-layer perceptrons were trained using a variety of architectures. The number of neurons in the network has to be determined by trial and error, which means that a large number of networks have to be trained for different numbers of neurons. All the networks have 10 inputs and 4 output neurons. They were trained for either 10000 epochs or until a target sum-squared error (SSE) of 20 was achieved; if the errors are assumed to be normally and equally distributed, then this target gives a probability of false classification of less than 10^-6. This assumption is generally not valid, but this error target was found to allow classification of the training data almost perfectly. Networks with one hidden layer were trained for up to 23 hidden neurons, and a variety of two-hidden-layer networks were also trained. The classification success rates, number of epochs, SSEs and training times on a 33 MHz 486DX PC are shown below in table 2 for a selection of the networks trained.

Architecture   Training Success   Training Time/s   No. Epochs   SSE   Test Success
10:2:4         75.0%              2066              10000        92    75.0%
10:5:4         99.7%              2556              7475         20    98.6%
10:10:4        99.7%              1402              2435         20    99.9%
10:15:4        100%               4035              5081         20    99.0%
10:17:4        100%               6262              7067         20    100%
10:20:4        100%               6965              6861         20    99.1%
10:23:4        100%               4255              3672         20    98.7%
10:3:3:4       99.9%              992               2605         20    99.7%
10:5:4:4       100%               1049              2010         20    98.6%
10:5:6:4       99.4%              724               1181         20    98.9%
10:6:6:4       97.4%              1488              2043         20    97.0%
10:8:7:4       100%               1066              1293         20    99.6%
10:9:9:4       99.2%              1954              2028         20    99.3%
10:10:10:4     99.2%              1751              1650         20    98.0%

Table 2: Classification using MLP networks

When given enough neurons, networks with one or two hidden layers could classify the condition successfully in almost all cases. A tighter SSE target could possibly have improved the result. The time taken to train the networks with two hidden layers was significantly less than the time taken to train networks with a single hidden layer.
There is, however, a much larger number of possible two-hidden-layer architectures to test, which would increase the overall training time significantly. Radial basis function neural networks were trained using the same SSE target. The training algorithm for radial basis function networks consists of adding neurons until the SSE target is achieved. After each iteration, the network was tested. The classification success rates and the cumulative time taken for training are shown for every 5 iterations in table 3, up to the 23 neurons at which the algorithm reached the SSE target.

No. of Neurons   Training Success   Training Time/s   Test Success
5                88.1%              87                90.2%
10               95.2%              114               95.3%
15               97.7%              162               94.9%
20               100%               196               98.4%
23               100%               240               100%

Table 3: Classification using RBF networks
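The constructive RBF training loop described above can be sketched as follows. The exact centre-selection rule is not stated in the paper, so this version (an assumption) places each new centre at the worst-fitted training input and re-solves the linear output layer by least squares after every addition:

```python
import numpy as np

def train_rbf(X, Y, width, sse_target, max_neurons):
    """Add Gaussian radial basis neurons one at a time until the
    sum-squared error over the training set reaches sse_target."""
    centres = []
    preds = np.zeros_like(Y, dtype=float)
    W = None
    for _ in range(max_neurons):
        # Place a new centre at the training input with the largest error.
        row_err = ((Y - preds) ** 2).sum(axis=1)
        centres.append(X[int(np.argmax(row_err))])
        C = np.asarray(centres)
        # Hidden-layer responses of all training inputs to all centres.
        Phi = np.exp(-(np.linalg.norm(X[:, None, :] - C[None, :, :], axis=2) / width) ** 2)
        # Re-solve the linear output layer by least squares.
        W, *_ = np.linalg.lstsq(Phi, Y, rcond=None)
        preds = Phi @ W
        if ((Y - preds) ** 2).sum() <= sse_target:
            break
    return np.asarray(centres), W
```

Because the output layer is linear, each iteration needs only a least-squares solve rather than an iterative optimization, which is why this training is so much faster than backpropagation.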
Clearly the RBF network was trained in a significantly shorter time than the MLP networks. There is also the added advantage that the number of neurons was chosen automatically by the training algorithm, although this number was larger than for the MLP networks. The training data was used to set parameter estimates for the statistical algorithms, which were then used to classify the test data. The success of each method, along with the length of time taken to classify the data set, is shown in table 4.

Method                         Test Time/s   Test Success
Nearest Centroid               2.20          84.1%
Linear Discriminant Analysis   0.77          92.9%
Canonical vectors              4.12          90.3%
Nearest neighbour              73.98         100%

Table 4: Classification using statistical techniques

Of the statistical methods, only the nearest-neighbour technique could classify all the data successfully. However, this algorithm was by far the slowest. This can be compared to the time it took for the neural networks to classify the test data: 1.9 s at worst for the MLPs and 5.22 s for the RBF networks. The test data took about 59 s to record, and therefore the nearest-neighbour algorithm may not be suitable for real-time implementation. Confidence in any system trained using measured data is increased by measuring more data. This will increase the training times of neural networks but will not affect the system in recall mode. For the nearest-neighbour algorithm, however, each new point of measured data adds an extra distance calculation, increasing its already poor test time.

Conclusions

Multi-layer perceptron and radial basis function neural networks have been used to classify the condition of a small rotating machine using statistical parameters estimated from the vibration time series as inputs. It was found that both networks achieved similar classification success; however, the RBF networks could be trained in a significantly shorter length of time. MLPs, on the other hand, required fewer neurons and were faster in recall operation.
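The cost difference between the fastest and slowest distance-based methods in table 4 is visible in a minimal sketch (names illustrative): nearest centroid computes one distance per condition regardless of how much training data exists, while nearest neighbour computes one distance per stored training vector, so its test time grows with every added measurement.

```python
import numpy as np

def nearest_centroid_fit(X, y):
    """Average the known feature vectors of each condition."""
    classes = sorted(set(y))
    centroids = np.array([X[np.asarray(y) == c].mean(axis=0) for c in classes])
    return classes, centroids

def nearest_centroid_predict(x, classes, centroids):
    # One Euclidean distance per condition, independent of training-set size.
    return classes[int(np.argmin(np.linalg.norm(centroids - x, axis=1)))]

def nearest_neighbour_predict(x, X, y):
    # One Euclidean distance per stored training vector.
    return y[int(np.argmin(np.linalg.norm(X - x, axis=1)))]
```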
Statistical techniques were also tested: methods such as nearest centroid and linear discriminant analysis made assumptions about the distributions of the inputs which were not valid for the data used, and therefore performed poorly. The nearest-neighbour algorithm assumed only that the distributions do not overlap, and therefore performed as well as the neural networks. However, the algorithm is time-consuming and may not be suitable for real-time implementation.

Acknowledgement

It is a pleasure to thank Dr J. R. Dickie for obtaining the data on which these results are based. The authors, especially A. C. M., would like to acknowledge the assistance of both the EPSRC and DRA Winfrith in the form of the CASE award. Also the loan of the machine set from Solatron Instruments and the financial assistance for experimental support from the University of Strathclyde are acknowledged.
References

[1] J. T. Renwick. Vibration analysis - a proven technique as a predictive maintenance tool. IEEE Transactions on Industry Applications, 21:324-332, March 1985.
[2] E. A. Wan. Neural network classification: A Bayesian interpretation. IEEE Transactions on Neural Networks, 1:303-305, 1990.
[3] P. A. Lachenbruch. Discriminant Analysis. Hafner Press, 1975.
[4] A. C. McCormick and A. K. Nandi. Rotating machine condition classification using artificial neural networks. In R. B. K. N. Rao, R. A. Smith, and J. L. Wearing, editors, Proceedings of COMADEM '96, pages 85-94, University of Sheffield, July 1996. Sheffield Academic Press.
[5] S. Haykin. Neural Networks: A Comprehensive Foundation. Macmillan, 1994.