Acoustic emission signal filtering based on artificial intelligence

Introduction
Acoustic emission (AE) is one of the most effective structural health monitoring methods because of its high sensitivity and its ability to detect damage in a variety of materials [1]. The AE method provides important information about the performance of a structure by determining the time of origination, location, size, and direction of the damage [2]. AE has been widely used across a range of engineering applications, materials, material compositions, and infrastructures [3]. However, several issues with the technique limit the development of automated, real-time health monitoring of structures. In common AE practice, the primary waves (P-waves) are the first part of the signal to arrive at the sensor and are used to determine the location of the damage. Recorded AE signals, however, are commonly clouded by spurious events or noise, which leads to misinterpretation and inaccurate evaluation [4]. Environmental noise (electrical effects) and internal noise (friction and reflection) are the two main sources of AE noise. Hence, the collected AE signals must be filtered before AE parameters are extracted.

Previous studies have addressed filtering and denoising before the damage assessment of a structure; however, machine learning algorithms have not yet been applied at the early stage of data collection. Yu et al. studied the noise that can be removed by traditional filtering methods such as Swansong filters [5], [6]. The most basic level of filtering AE data is typically accomplished by setting a threshold for data acquisition, which eliminates low-amplitude signals. Various studies [7], [8] investigating or developing damage detection algorithms for concrete structures have relied on these settings alone. Some recent studies have also improved AE signals through a soft filtering model based on the combined use of the Huang transform and wavelet analysis for accurate source location of damage [9].

However, data reduction to decrease the cost of data handling and analysis is also an important consideration in the early stages. Massive amounts of data are acquired during the monitoring of a structure (for example, 30000-40000 AE signals from a simple bending test of an RC beam), so interpreting all the sources of noise is difficult and time-consuming, even in carefully controlled laboratory environments. Therefore, an AE system that automatically learns essential features from all the AE data may provide insights for discriminating between true (crack-related) signals and spurious (noise-related) signals, enhancing AE efficiency and minimizing false alarms. Accurate and robust identification using adequate signal processing and machine learning methodologies allows enhanced monitoring of damage in structures, including the ability to raise alerts automatically if structural integrity is seriously compromised.

Several studies on machine learning methodologies in the field of non-destructive testing have been reported in the literature. Jierula et al. used machine learning to detect and monitor the damage locations of AE signals in RC structures [10]. To classify the column source locations, significant AE parameters were identified using a support vector machine classifier and neighbourhood component analysis. Hoshyar et al. proposed a hybrid approach to identify damage in concrete beams, in which data filtration and normalization are used to extract input features for a self-advising support vector machine (SA-SVM) model that classifies the damage [11]. Yapar diagnosed the damage states of bridges by applying wavelet and Fourier transform techniques to the recorded signals and denoising the AE signals with an artificial neural network (ANN) approach [12]. The AE data were successfully processed for noise elimination, event source location, and source characterization. Abdelrahman et al. investigated wavelet analysis of AE signals to develop empirical algorithms that differentiate between clean and degraded signals in the time-frequency domain [4]. The focus was on developing a robust filtering technique for the acoustic emission dataset, and the developed filter was validated using acoustic emission data collected during a load test.

The approaches reported in the literature have proved successful, but their use to improve filtering at an early stage, so that AE data reduction is robust for machine learning models, remains limited. Therefore, in this study an effort was made to perform more reliable signal filtration using time- and frequency-domain parameters in order to broaden the use of AE monitoring. For this purpose, a machine learning approach using classification algorithms was used to exclude noise and wave reflections from true signals in concrete.

Methodology
One of the challenges in structural damage assessment using AE techniques is processing the large data sets generated during data acquisition. Machine learning techniques, such as supervised and unsupervised learning based on classification algorithms, offer a solution to this problem. In a data-driven approach, both the machine learning method and the extracted features are critical. The approach is fundamentally composed of four steps: data pre-processing, training, validation, and testing. Cleaning the raw data and selecting and extracting important features before training are very important for achieving better model accuracy. Machine learning methods are classified as supervised, semi-supervised, unsupervised, or reinforcement learning based on the nature of the input data. The targets (outputs) for the dataset are unknown in unsupervised algorithms and known in supervised algorithms. Density estimation is typically an unsupervised problem, whereas classification and regression are supervised approaches [13].

Logistic regression, decision tree, support vector machine (SVM), and K-nearest neighbour (KNN) methods are used in this study. Logistic regression is one of the most common machine learning methods for binary classification problems; it uses the sigmoid function to map real numbers to the range 0 to 1 for a given set of features or independent variables [14]. The decision tree classifier is another classification algorithm, which sorts observations from the root node down to the leaf (terminal) nodes. A sub-node that splits into further sub-nodes is known as a decision node. Based on criteria such as the Gini diversity index, information gain, and entropy, the decision tree assigns observations to classes according to their features [15]. The support vector machine (SVM) is considered a powerful algorithm for both linear and non-linear classification problems, the latter being handled through kernel functions. The algorithm finds a line or hyperplane that divides the dataset into the output classes [16]. Another popular classification algorithm is K-nearest neighbours (KNN), which relies on feature similarity: an unknown input feature vector is compared with the vectors in the training dataset, and the predicted class is the one with the most observations among the nearest neighbours, weighted by the squared inverse of the Euclidean distance [16].

In this study, for all these algorithms, the AE signals labelled as "clean" and "noise" were used to predict two output classes using time- and frequency-domain parameters. Of the dataset, 70% was used for training and 30% for testing, with 5-fold cross-validation. The test results of all models were compared in terms of confusion matrix, accuracy, receiver operating characteristic (ROC), and accuracy-based feature importance.
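As a minimal sketch of the building blocks named above (not the implementation used in this study), the sigmoid mapping, the Gini diversity index of a node, a KNN vote weighted by the squared inverse Euclidean distance, and a 70/30 hold-out split can be written in plain Python. All function names, the neighbour count k, the random seed, and the small eps term that avoids division by zero are assumptions introduced for illustration.

```python
import math
import random
from collections import defaultdict


def sigmoid(z):
    """Map any real number into the range 0 to 1 (numerically stable form)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def gini_impurity(labels):
    """Gini diversity index of a node: 1 - sum of squared class fractions."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = defaultdict(int)
    for y in labels:
        counts[y] += 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def knn_predict(train_X, train_y, x, k=3, eps=1e-12):
    """Predict the class of x from its k nearest training vectors.

    Each neighbour votes with weight 1 / d^2, where d is its Euclidean
    distance to x; eps (an assumption) avoids division by zero.
    """
    nearest = sorted(
        (math.dist(row, x), y) for row, y in zip(train_X, train_y)
    )[:k]
    votes = defaultdict(float)
    for d, y in nearest:
        votes[y] += 1.0 / (d * d + eps)
    return max(votes, key=votes.get)


def train_test_split(X, y, test_fraction=0.3, seed=0):
    """Shuffle the indices and hold out test_fraction of the rows for testing."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    n_test = round(len(idx) * test_fraction)
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    return (
        [X[i] for i in train_idx],
        [y[i] for i in train_idx],
        [X[i] for i in test_idx],
        [y[i] for i in test_idx],
    )
```

In practice, an established library would normally replace these hand-written functions; the sketch only makes the underlying calculations explicit.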

AE Data Acquisition
Because damage localization is usually carried out using the Hsu-Nielsen calibration method, and the waveform of a pencil lead break (PLB) is similar to that of a crack signal, a series of Hsu-Nielsen tests was conducted on a concrete specimen with dimensions of 600 x 100 x 100 mm. AE signals were generated by breaking 0.5 mm 2H pencil leads at different locations on the specimen, and an equal set of pencil leads was broken while holding the pencil at a 45° angle to the surface of the specimen. Four AE sensors with a resonance frequency of 150 kHz (R15α) were attached to the test specimen with silicone grease to detect the AE data. They were connected to an 8-channel SAMOS AE system by MISTRAS and to pre-amplifiers with 40 dB gain. The threshold was set to 40 dB. Tests were repeated four times at five different locations. The sensors recorded all hits from these events, which include both clean (PLB-related) and noise (reflection-related) waveforms. The locations of the sensors and PLB sources are shown in Figure 1.

Signal Labelling for Filtration
The AE signals acquired from the PLBs were first pre-processed with a 40 dB threshold filter. This step reduces the data by discarding signals with an amplitude below 40 dB. The remaining signals were then classified based on their time of arrival at the differently located sensors. In each event, the first wave emission travelling from the PLB to each sensor was treated as a "clean" signal, and the others, which were composed of secondary waves or reflections from the boundaries of the concrete specimen, were treated as "noise". To prepare the dataset for the machine learning models, the signals were labelled 0 and 1 for the binary classification problem.
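The labelling rule described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the hit record keys (event, sensor, arrival_us, amplitude_db) are hypothetical, and the mapping clean = 0, noise = 1 is assumed because the text does not state which class receives which value.

```python
THRESHOLD_DB = 40.0    # amplitude threshold stated in the text
CLEAN, NOISE = 0, 1    # assumed encoding; the text does not say which class is 0


def label_hits(hits, threshold_db=THRESHOLD_DB):
    """Return (hit, label) pairs for all hits at or above the threshold.

    Each hit is a dict with the hypothetical keys
    'event', 'sensor', 'arrival_us' and 'amplitude_db'.
    """
    # Step 1: threshold filter, discarding hits below the amplitude threshold.
    kept = [h for h in hits if h["amplitude_db"] >= threshold_db]

    # Step 2: find the earliest retained arrival for every (event, sensor) pair.
    first_arrival = {}
    for h in kept:
        key = (h["event"], h["sensor"])
        if key not in first_arrival or h["arrival_us"] < first_arrival[key]:
            first_arrival[key] = h["arrival_us"]

    # Step 3: the first arrival is "clean"; later hits at the same sensor
    # in the same event (secondary waves, reflections) are "noise".
    labelled = []
    for h in kept:
        key = (h["event"], h["sensor"])
        label = CLEAN if h["arrival_us"] == first_arrival[key] else NOISE
        labelled.append((h, label))
    return labelled
```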

Feature Extraction
This section describes the feature extraction applied to the AE waveforms before the machine learning algorithms were evaluated. Feature extraction and selection play an important role in improving accuracy and reducing computational load, because they reduce dimensionality and improve classifier performance. To estimate damage level, source locations, and similar quantities reliably, signals without reflections or noise are needed. Establishing such an approach has the potential to expand the use of AE monitoring and increase its reliability. In this study, features of all the labelled AE signals were extracted in both the time domain and the frequency domain for classification. In the statistical time domain, the parameters (a) mean, (b) standard deviation, (c) kurtosis, (d) skewness, (e) peak-to-peak value, and (f) crest factor were computed from the data points of the waveform using equations (1) to (6), respectively.
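For orientation, the six time-domain statistics can be computed as in the sketch below. It uses common textbook definitions, which are an assumption here: population (divide-by-n) moments, non-excess kurtosis, and crest factor taken as peak absolute value over RMS. The normalization in the authors' equations (1) to (6) may differ, and the function name is hypothetical.

```python
import math


def time_domain_features(x):
    """Compute six statistical time-domain features of a waveform x.

    Assumed definitions (population moments, non-excess kurtosis):
      mean         = sum(x) / n
      std          = sqrt(m2),  m_k = sum((x - mean)^k) / n
      kurtosis     = m4 / m2^2
      skewness     = m3 / m2^1.5
      peak_to_peak = max(x) - min(x)
      crest_factor = max(|x|) / rms,  rms = sqrt(sum(x^2) / n)
    """
    n = len(x)
    if n == 0:
        raise ValueError("waveform is empty")
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n
    if m2 == 0:
        raise ValueError("waveform is constant; shape statistics are undefined")
    m3 = sum((v - mean) ** 3 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    rms = math.sqrt(sum(v * v for v in x) / n)
    peak = max(abs(v) for v in x)
    return {
        "mean": mean,
        "std": math.sqrt(m2),
        "kurtosis": m4 / m2 ** 2,
        "skewness": m3 / m2 ** 1.5,
        "peak_to_peak": max(x) - min(x),
        "crest_factor": peak / rms,
    }
```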
In the frequency domain, the peak amplitude obtained with the Fast Fourier Transform (FFT) and the peak intensity of the power spectral density (PSD), which measures the distribution of power over frequency by decomposing the time-series signal, were calculated and used as input features in the machine learning models. The FFTs of clean and noisy signals are shown in Figure 2. It can be observed that the intensity of the signals depends on the peak amplitude: the clean signal shows a much higher amplitude than the noisy signals around the frequency of 150 kHz. This clear difference in peak amplitude makes the parameter an important candidate for feature selection in the machine learning models.

In evaluating the performance of the classification algorithms, it was observed that model no. 2, which uses a decision tree, categorized the observations with 100% accuracy (0% error), while model no. 4, which uses the KNN algorithm, performed worst of all the models, with the largest number of false observations. The accuracy of the models is also shown in Figure 4, which identifies the decision tree as the best classification algorithm for AE signal filtration with 100% accuracy. SVM, logistic regression, and KNN showed accuracies of 95%, 90%, and 85%, respectively.

To evaluate feature importance, Figure 6 compares the performance of the models with the maximum and minimum accuracy, namely the decision tree and KNN, respectively. Performance was evaluated by removing a single feature from the dataset each time the model was trained. The accuracy of the decision tree remains in the range of 99%-99.5%, with no clear indication of an important feature, whereas the accuracy of the KNN model changes significantly with feature selection. Only in the KNN model did removing the skewness feature from the input vector increase the accuracy, from 85% to 90%. Similarly, removing the standard deviation from the statistical time domain and the peak intensity amplitude from the frequency domain reduces the KNN accuracy from 85% to 82%. Based on these results, the skewness and standard deviation from the statistical time domain, as well as the peak intensity amplitude from the frequency domain, were found to be the most important features when training the model for signal filtering.
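The remove-one-feature procedure described above can be sketched as a short routine that retrains a classifier with each feature column dropped in turn and reports the change in test accuracy. The sketch is illustrative only: it uses a small distance-weighted KNN written for this example, and the function names, the neighbour count k, and the eps term are assumptions rather than details taken from the study.

```python
import math


def knn_predict(train_X, train_y, x, k=3, eps=1e-12):
    """Distance-weighted KNN vote (weight 1 / d^2); eps avoids division by zero."""
    nearest = sorted(
        (math.dist(row, x), y) for row, y in zip(train_X, train_y)
    )[:k]
    votes = {}
    for d, y in nearest:
        votes[y] = votes.get(y, 0.0) + 1.0 / (d * d + eps)
    return max(votes, key=votes.get)


def accuracy(train_X, train_y, test_X, test_y, k=3):
    """Fraction of test rows whose predicted class matches the true class."""
    correct = sum(
        knn_predict(train_X, train_y, x, k) == y
        for x, y in zip(test_X, test_y)
    )
    return correct / len(test_y)


def drop_column(X, j):
    """Return a copy of X with feature column j removed from every row."""
    return [tuple(row[:j]) + tuple(row[j + 1:]) for row in X]


def leave_one_feature_out(train_X, train_y, test_X, test_y, names, k=3):
    """Return the baseline accuracy and, per feature, accuracy change when dropped."""
    baseline = accuracy(train_X, train_y, test_X, test_y, k)
    change = {}
    for j, name in enumerate(names):
        acc = accuracy(
            drop_column(train_X, j), train_y,
            drop_column(test_X, j), test_y, k,
        )
        change[name] = acc - baseline
    return baseline, change
```

A negative change means the model relied on the removed feature, while a positive change, as reported for skewness with KNN, means the feature was degrading that model.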

Conclusions
In this study, a machine learning approach using classification algorithms was used to exclude noise and wave reflections from true signals in concrete. For this purpose, pencil lead break AE activities were generated on a concrete specimen. The performance of the different machine learning models shows that the decision tree is the best classification algorithm for the filtration of AE signals among the logistic regression, support vector machine (SVM), and K-nearest neighbour (KNN) classification algorithms, since the decision tree supports automatic feature interaction. The decision tree classified the AE signals in the test set with 100% accuracy, while the lowest accuracy, 85%, was obtained with the KNN classifier. Furthermore, the skewness and standard deviation of the statistical time domain, as well as the peak intensity amplitude of the frequency domain, proved to be the most significant features when training the model for AE signal filtering.

Figure 1: Schematic representation of the test setup

A confusion matrix was used to evaluate whether the observations of clean and noise signals were properly classified, based on the output class (true class) and the target class (predicted class). Figure 3 illustrates the confusion matrices obtained on the testing dataset with the different classification algorithms, showing the number of observations for the two output classes from the clean and noise datasets. The blue diagonal cells represent observations for which the true class and the predicted class are the same, whereas the light brown and white cells represent false and true observations of the categorized AE signals, respectively.
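The tallying behind such a matrix can be sketched in a few lines. The layout (rows for the true class, columns for the predicted class) and the function names are assumptions made for this illustration; the orientation used in Figure 3 may differ.

```python
def confusion_matrix(y_true, y_pred, labels=(0, 1)):
    """Count observations per (true class, predicted class) pair.

    Rows index the true class and columns the predicted class
    (an assumed layout), in the order given by labels.
    """
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


def accuracy_from_matrix(matrix):
    """Accuracy is the sum of the diagonal cells divided by all observations."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total
```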

Figure 3: Confusion matrix of the testing models

Figure 5: ROC of the testing models

Figure 6: Feature importance of the testing models based on accuracy