## Probability of detection for magnetic rubber inspections of F-111 steel components

C A Harding and G R Hugo

Defence Science and Technology Organisation, 506 Lorimer Street, Fishermans Bend, Victoria, 3207, Australia.

Magnetic rubber inspection is used extensively on F-111 aircraft to detect cracks in critical steel components. Scheduled inspection intervals are based on a durability and damage tolerance analysis which requires as input data an assessment of the reliability of the technique. DSTO and the Royal Australian Air Force (RAAF) recently conducted an experimental program to determine the probability of detection for magnetic rubber inspection of F-111 components. This program involved a series of inspections performed on coupon specimens by RAAF technicians under conditions simulating those experienced during in-service inspections. A program of Monte Carlo simulations was used to demonstrate the validity of different statistical methods for analysis of the relatively small experimental data set obtained from the field trial.

**Keywords:** Magnetic rubber inspection, reliability, probability of detection

Magnetic rubber inspection (MRI) is a nondestructive evaluation (NDE) technique that is used extensively on F-111 aircraft to detect cracks in D6ac steel components, including the wing-pivot fitting, the wing carry-through box and several other critical structures within the airframe. The magnetic rubber technique is a variation on magnetic particle methods, in which a liquid rubber containing suspended magnetic particles is poured into a dam surrounding the area to be inspected on a magnetised component. After the rubber sets, the cast is removed and examined for evidence of cracks or other discontinuities, which appear as dark lines on the surface of the cast. Inspections can be performed using either an applied magnetic field, which is maintained whilst the rubber sets (active field), or the residual field from magnetisation of the component prior to pouring the rubber. A key feature of the inspections conducted on the F-111 D6ac steel structure is the need to reliably detect very small fatigue cracks down to 0.010 inch (0.25 mm) in length [Imperial units are conventionally used in connection with the F-111]. MRI is labour intensive but capable of detecting such small defects.

Scheduled inspection intervals for nondestructive inspection of F-111 are based on a durability and damage tolerance analysis (DADTA), which incorporates as input data an assessment of the reliability of the NDE technique. The DADTA assumes that the airframe contains pre-existing flaws at critical locations, and then models the growth of these flaws given a representative flight loading spectrum. The initial flaw size is assumed to be the smallest that can be *reliably* detected by the NDE technique. It is taken to be the minimum crack size (*a*_{9095}) for which a 90% probability of detection (POD) has been demonstrated experimentally with 95% statistical confidence. Here, the 95% statistical confidence level accounts for the uncertainty inherent in determining the POD from a finite statistical sample. The DADTA modelling predicts the number of flight hours for the defect(s) to grow to a size that could cause failure of the component. The inspection interval for periodic NDE is then taken to be a fraction (typically one-half) of the number of flight hours for the assumed defect to grow to a critical size.
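The two definitions in the preceding paragraph can be written compactly as follows. The symbols are assumptions introduced here for illustration and are not taken from the DADTA documentation: POD_{L,95}(*a*) denotes the lower 95% confidence bound on the POD curve, *N*_{crit} the predicted flight hours for a flaw of the given initial size to reach critical size, and *f* the fraction applied to obtain the inspection interval *T*.

```latex
a_{9095} = \min\bigl\{\, a : \mathrm{POD}_{L,95}(a) \ge 0.90 \,\bigr\},
\qquad
T = f \, N_{\mathrm{crit}}(a_{9095}), \quad f \approx \tfrac{1}{2}
```

Read this way, a smaller demonstrated *a*_{9095} gives a smaller assumed initial flaw, a longer predicted growth time and hence a longer permissible interval between inspections.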

The withdrawal of all USAF F-111 from service circa 1998 left Australia as the only nation operating this aircraft type, with the expectation that the F-111 would remain in Royal Australian Air Force (RAAF) service for a further 20 years until its planned withdrawal date in 2020. As part of a coordinated package of research to support the RAAF as sole-operator of the F-111, DSTO conducted a review of all available information concerning the reliability of magnetic rubber inspections. In this review, insufficient documentary evidence could be located to satisfactorily demonstrate the required POD (Hugo & Scala 2001). Consequently, DSTO commenced an experimental program to determine the POD (including a_{9095}) for RAAF magnetic rubber inspection of F-111. It was anticipated that this information could allow inspection intervals for magnetic rubber inspections to be increased, thereby achieving significant savings on maintenance costs and reducing aircraft unavailability due to scheduled inspections.

During the initial review, it was noted that the published methodologies for analysis of POD data were generally developed and demonstrated for the analysis of relatively large data sets. Since the experimental program for RAAF MRI would necessarily be a relatively small trial, a number of possible analysis algorithms were examined to assess their applicability for the modest quantity of data to be obtained.

The experimental program involved simulated field inspections of a series of coupon specimens by RAAF technicians. Two specimen types were used (Figure 1): a 'bolthole' specimen representative of typical cracks occurring in boltholes, and a 'mousehole' specimen representing cracks in more general structure, including radii. The latter specimen was designed to be similar to the fuel-flow vent holes (mouseholes) within the F-111 wing-pivot fitting. Field inspections were conducted with the specimens inserted inside a scrap wing-pivot fitting in order to realistically simulate the effects on reliability of the restricted access typically encountered by RAAF technicians.

Fig 1: Specimen types used for field trial, shown mounted inside wing-pivot fitting.

The coupon specimens were fabricated from D6ac steel, heat-treated to the same condition as components in the F-111 airframe. Fatigue cracks were generated in the specimens at DSTO using a 'DADTA2b' spectrum loading, representative of flight loading at a typical location in the lower plate of the wing-pivot fitting. Small corrosion pits (up to 50 µm in size) were electrochemically generated in the specimens prior to fatiguing to act as fatigue crack initiators. The use of corrosion pits as crack initiators was necessary in order to reduce the scatter in crack initiation times sufficiently to be able to successfully generate the very small fatigue cracks (down to 0.004" in length) required for the trial. Both bore and quadrant (corner) cracks with lengths ranging from 0.002" (0.05 mm) to 0.090" (2.3 mm) were successfully generated using this procedure. For the mousehole specimens, the mousehole shape was produced by electric discharge machining from an initial keyhole notch shape after generating the fatigue cracks, taking care not to remove the cracks during the machining process.

From a total of 103 specimens prepared, a set of 21 bolthole and 28 mousehole specimens were selected to achieve as uniform a distribution of crack sizes as possible. Uncracked specimens were used as placebos, some of which had been fatigued but were uncracked, whilst others were as machined. The trial included a total of 360 inspections on cracked holes and placebos, in roughly equal proportion, according to a randomised schedule of inspections. Six RAAF technicians of different experience levels participated in the trial and were drawn from the Base NDT section at 501 Wing RAAF Amberley. Trials were conducted at RAAF Amberley under similar levels of pressure due to workload as those encountered by technicians for on-aircraft inspections. To prevent collusion between technicians, they reported their results by session number and by the station number of each of four coupons located inside the wing-pivot fitting for each session.

Two different methods were used to magnetise the specimens in the field trial. The mousehole specimens were inspected using an active field from a horseshoe magnet spanning the mousehole, whilst the bolthole specimens were inspected using the residual field following magnetisation using a central conductor inserted through the hole (applied current of 500 A for 5 s).

Technician results were reported using defect codes adapted from those used in service. Technician reports were compared to the results of 'master' magnetic rubber inspections in order to determine a 'hit' or 'miss' result for each inspection of each confirmed crack. The master inspections were performed at DSTO with the specimens under an applied tensile load in a mechanical testing machine, which was found to give significantly clearer indications because the crack mouths were opened by the applied load. Master inspections were performed both before and after the field trials. A subset of specimens were broken open for fractographic examination in order to confirm

- the reliability of the master magnetic rubber inspection results and
- the accuracy of the crack lengths measured from the master inspection casts.

Methodologies for determining POD fall into two basic categories:

- verification of required POD at a single specified crack size and
- determination of POD as a function of crack size.

Determination of POD as a function of crack size requires a series of inspections on specimens containing a range of crack lengths, *a*, to infer a curve, POD(*a*), which plots the POD as a function of crack size. Statistical methods may be further differentiated into

- those which assume a particular functional form for POD(*a*) (curve fitting methods), and
- those which make no assumption about the relationship between POD and crack size.

Two concerns arise when curve fitting methods are applied to small data sets:

- The magnitude of the POD at large crack sizes is dependent upon the form of the mathematical function assumed for the POD curve. In particular, for the functional forms most favoured in the literature (log-normal and log-logistic), the POD asymptotes to 1 at large crack sizes. This may be inconsistent with reality, for which human factors might cause the POD to be less than 1 even for very large cracks. It is not possible to test the appropriateness of the assumed mathematical function using a small data set, as it requires a much larger experimental data set to test the shape of a curve than to fit unknown parameters in an assumed function.
- The methodology conventionally used to infer the 95% confidence limit POD curve is derived from the properties of statistics in the limit that the sample size tends to infinity (Cheng & Iles, 1983). It is therefore appropriate for large data sets, but may be less applicable for smaller data sets.

Other methods, which make no assumptions about the functional relationship between POD and crack size, include the range interval method (RIM) and optimised probability method (OPM) (Berens & Hovey 1981, Bruce 1998). These methods can be applied to data sets of any size and have the advantage that the POD curves inferred by them cannot be compromised by an inappropriate choice of functional form for POD(*a*). However, the confidence limits derived from these methods are almost always more conservative than those obtained from curve fitting methods and may be very conservative when applied to small data sets.
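The conservatism of distribution-free confidence limits on small samples can be illustrated with a one-sided binomial (Clopper-Pearson) lower bound on POD for a single group of inspections at similar crack sizes. This is a minimal sketch and not the RIM or OPM algorithm itself; the function names and the bisection solver are assumptions introduced here for illustration.

```python
import math


def binomial_tail(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(k, n + 1))


def pod_lower_bound(hits, n, confidence=0.95):
    """One-sided lower confidence bound on POD from `hits` detections in `n` inspections.

    Solves P(X >= hits | n, p) = 1 - confidence for p by bisection
    (the Clopper-Pearson construction).
    """
    if hits == 0:
        return 0.0
    alpha = 1.0 - confidence
    lo, hi = 0.0, 1.0
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if binomial_tail(n, hits, mid) < alpha:
            lo = mid  # tail probability too small, so the bound lies higher
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Even a perfect record needs 29 inspections to demonstrate 90% POD at 95% confidence.
print(pod_lower_bound(28, 28))  # just below 0.90
print(pod_lower_bound(29, 29))  # just above 0.90
print(pod_lower_bound(27, 29))  # two misses pull the bound well below 0.90
```

With only a handful of inspections available at each crack size, a bound of this kind stays far below the observed hit rate, which is consistent with the conservatism described above.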

The applicability of various statistical methods for relatively small data sets was assessed using a program of Monte Carlo simulations. Synthetic hit / miss results were generated for a random set of crack lengths according to an assumed true POD(*a*). The functional form used for the true POD(*a*) curve was a cumulative log-normal distribution, which is commonly used for analysis of POD data.

Several analysis algorithms were applied to this synthetic data set. Curve fitting methods provide a 'best fit' (MLE) POD curve and a lower 95% confidence limit curve, as well as key parameters such as the *a*_{9095} crack length (minimum crack length which gives 90% POD with 95% statistical confidence). The simulation procedure was repeated 1000 times to determine the distributions of the fitted POD curves and parameters and to assess the conservatism or non-conservatism of the fitted curves and parameters with respect to the assumed true POD curve.
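The simulate-then-fit loop described above can be sketched as follows. Several details are assumptions made for illustration rather than the authors' procedure: crack lengths are drawn log-uniformly between 0.002" and 0.090", the true curve is parameterised by 50% and 90% POD crack lengths of 0.005" and 0.011", and the likelihood is maximised with a simple zooming grid search. The sketch covers only the 'best fit' curve; the 95% confidence limit of Cheng & Iles (1983) is not implemented.

```python
import math
import random

Z90 = 1.2815515655446004  # 90th percentile of the standard normal distribution


def pod_lognormal(a, mu, sigma):
    """Cumulative log-normal POD curve: Phi((ln a - mu) / sigma)."""
    z = (math.log(a) - mu) / sigma
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def params_from_a50_a90(a50, a90):
    """Convert the 50% and 90% POD crack lengths into (mu, sigma)."""
    mu = math.log(a50)
    return mu, (math.log(a90) - mu) / Z90


def simulate_inspections(n, mu, sigma, rng, a_min=0.002, a_max=0.090):
    """Draw n crack lengths (log-uniform, an assumption) with synthetic hit/miss results."""
    data = []
    for _ in range(n):
        a = math.exp(rng.uniform(math.log(a_min), math.log(a_max)))
        data.append((a, rng.random() < pod_lognormal(a, mu, sigma)))
    return data


def log_likelihood(data, mu, sigma):
    """Bernoulli log-likelihood of (crack_length, hit) pairs under the log-normal POD."""
    total = 0.0
    for a, hit in data:
        p = min(max(pod_lognormal(a, mu, sigma), 1e-12), 1.0 - 1e-12)
        total += math.log(p) if hit else math.log(1.0 - p)
    return total


def fit_mle(data, rounds=5, steps=12):
    """Maximise the likelihood over (mu, sigma) by repeatedly zooming a grid."""
    mu_lo, mu_hi = math.log(0.0005), math.log(0.1)
    sg_lo, sg_hi = 0.05, 3.0
    mu_best, sg_best = mu_lo, sg_lo
    for _ in range(rounds):
        best_ll = -math.inf
        mu_step = (mu_hi - mu_lo) / steps
        sg_step = (sg_hi - sg_lo) / steps
        for i in range(steps + 1):
            for j in range(steps + 1):
                mu, sg = mu_lo + i * mu_step, sg_lo + j * sg_step
                ll = log_likelihood(data, mu, sg)
                if ll > best_ll:
                    best_ll, mu_best, sg_best = ll, mu, sg
        # Shrink the search window to two grid steps either side of the best point.
        mu_lo, mu_hi = mu_best - 2 * mu_step, mu_best + 2 * mu_step
        sg_lo, sg_hi = max(sg_best - 2 * sg_step, 1e-3), sg_best + 2 * sg_step
    return mu_best, sg_best


if __name__ == "__main__":
    rng = random.Random(1)
    true_mu, true_sigma = params_from_a50_a90(0.005, 0.011)
    a90_fits = []
    for _ in range(10):  # the study repeated the procedure 1000 times
        data = simulate_inspections(100, true_mu, true_sigma, rng)
        mu_hat, sigma_hat = fit_mle(data)
        a90_fits.append(math.exp(mu_hat + Z90 * sigma_hat))
    print("mean fitted a90 (inch):", sum(a90_fits) / len(a90_fits))
```

A gradient-based optimiser would be faster, but the grid search keeps the sketch dependency-free and returns a bounded answer even when a small data set is perfectly separated by crack length.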

Table 1 presents the results for 1000 simulations, each comprising 100 inspections, with the true POD curve chosen to give *a*_{50,true} = 0.005" (0.13 mm) and *a*_{90,true} = 0.011" (0.28 mm), where *a*_{50} and *a*_{90} denote the crack lengths corresponding to 50% and 90% POD. Methods 'MLE method 1' and 'MLE method 2' denote two different formulations for determining the 95% confidence limit on the maximum likelihood estimation of POD. Method 1 was based on a general procedure described by Cheng & Iles (1983), using their parameter *Q*_{2} to define the confidence region. Method 2 implemented a much simpler closed-form solution proposed by Bullock, Forsyth & Fahr (1994).

For a 95% confidence limit, there is only a 5% chance of obtaining a data set which gives a confidence limit that is non-conservative with respect to the true value. Thus, we would expect the 95% confidence limit to be non-conservative (*a*_{9095} < *a*_{90,true}) at most 5% of the time, on average [The 95% confidence limits derived by these methods are for the whole POD curve as a function of crack length; i.e. there is a 95% confidence that the confidence limit curve will be conservative with respect to the true POD at all points. The chance that any single point on the curve (e.g. *a*_{9095}) will be non-conservative should be significantly less than 5%]. From Table 1, this expectation is easily satisfied for OPM and MLE method 1. However, for MLE method 2, the *a*_{9095} value was non-conservative for 15.9% of the simulations. This result could not have occurred by chance (probability < 10^{-10}) and indicates a serious problem with this analysis method. Indeed, a review of the derivation of the key formulae in the paper from which the method was taken revealed an incorrect assumption which, when corrected, renders the method invalid for determining confidence limits on POD curves. Thus MLE method 2 should not be used for the analysis of POD data.
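The probability bound quoted for MLE method 2 can be checked with a binomial tail calculation. The sketch assumes that the 1000 simulations are independent and that a valid method is non-conservative with probability at most 5% in each one, so that 15.9% corresponds to 159 non-conservative results; the function names are illustrative.

```python
import math


def log_binom_pmf(k, n, p):
    """Natural log of the Binomial(n, p) probability of exactly k successes."""
    return (
        math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
        + k * math.log(p) + (n - k) * math.log(1.0 - p)
    )


def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), with each term evaluated in log space."""
    return sum(math.exp(log_binom_pmf(i, n, p)) for i in range(k, n + 1))


n_sims = 1000
observed = 159  # 15.9% of 1000 simulations
tail = binomial_upper_tail(observed, n_sims, 0.05)
print(f"P(at least {observed} non-conservative results) = {tail:.1e}")
```

The result is many orders of magnitude below 10^{-10}, so the stated bound holds with a large margin under these assumptions.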

| Method | Mean(*a*_{90}), inch (mm) | Mean(*a*_{9095}), inch (mm) | % Non-conservative |
|---|---|---|---|
| OPM | Not determined | 0.025 [see note] | 0.1% |
| MLE method 1 | 0.011 (0.28) | 0.019 (0.48) | 1.0% |
| MLE method 2 | 0.011 (0.28) | 0.014 (0.36) | 15.9% |

[Note: For 30% of simulations, OPM failed to reach 90% at 95% confidence level for any crack length. The mean value for …]

Table 1: Simulation results comparing the *a*_{90} and *a*_{9095} values from different analysis methods to the true value *a*_{90,true} = 0.011" (0.28 mm). Results are for 1000 simulations, each comprising 100 inspections.

The other results in Table 1 are consistent with expectations and show that OPM and MLE method 1 are both acceptable for determining POD curves from data sets as small as 100 inspections. The results are consistent with the curve fitting (MLE) method being much more efficient than OPM. For MLE method 1, the confidence limit *a*_{9095} was on average 70% greater than the true value *a*_{90,true}, whilst for OPM, *a*_{9095} was on average 2.3 times the true value. OPM was unable to determine an *a*_{9095} confidence limit for 30% of simulations as the curve did not reach a POD of 90% within the range of crack sizes considered.

The results of the experimental trial, analysed using MLE method 1 and OPM, are shown in Figures 2 and 3 for inspections on mousehole and bolthole specimens respectively. The bolthole inspections have a somewhat higher POD at small crack lengths but a significantly poorer POD at large crack lengths. This is consistent with the proportion of cracks detected (hits) at each crack length in the field data. The bolthole data includes two very significant misses at 0.021" (0.53 mm) and 0.018" (0.46 mm). For the mousehole specimens, the 'best fit' or maximum likelihood estimate of the crack length at which the POD reaches 90% is *a*_{90} = 0.009" (0.23 mm), whilst the 95% confidence limit crack length for 90% POD is *a*_{9095} = 0.012" (0.30 mm). By comparison, the more conservative OPM gives *a*_{9095} = 0.028" (0.71 mm). For the bolthole specimens, the maximum likelihood estimate is *a*_{90} = 0.015" (0.38 mm). However, due to the relatively poorer POD and the limited quantity of inspection data, the 95% confidence limit does not reach a POD of 90% within the range of the field inspection data and an *a*_{9095} value cannot be reported.

The bolthole and mousehole inspections differ significantly in the magnetisation method (central conductor vs active field) and in the surface condition of the areas inspected. The mouseholes were highly polished, consistent with the surface condition during RAAF inspections of mouseholes and stiffener runouts in the F-111 wing pivot fitting. The surface in boltholes was less well polished. The POD results obtained for the mousehole specimens are thus considered to be applicable only to inspections conducted on highly polished surfaces using an active field.

It is possible that the significant misses and the poorer POD for bolthole inspections are related to the fact that, for a central conductor inspection with no defects present, there is normally no sign of the magnetic field on the cast, as the field lines run circumferentially with no leakage field in the absence of a defect. By comparison, for the active field technique, the cast always shows a 'halo' at the edges, which provides post-inspection confirmation that the field was correctly applied. Thus, human error in applying the central-conductor procedure resulting in a lack of magnetisation could easily pass undetected, whereas for the active field technique inadequate magnetisation would easily be detected when inspecting the casts and the inspection would be repeated. This could explain the poorer reliability (lower POD) for the bolthole inspections at larger crack lengths, since inadequate magnetisation could cause a failure to detect cracks of any size. In spite of this, the best estimate (MLE) of *a*_{90} for the bolthole inspections is 0.015" (0.38 mm). This is still significantly better than could be achieved using other NDE techniques. It is likely that a usable lower confidence limit *a*_{9095} value could be obtained by extending the field trial to obtain more data for the bolthole inspections.

Fig 2: Results of Experimental Trial for Mousehole Specimens. The open circles plot the proportion of hits obtained for the field inspection data at each crack size, with their area being proportional to the total number of inspections performed at each crack size.

Fig 3: Results of Experimental Trial for Bolthole Specimens. The open circles plot the proportion of hits obtained for the field inspection data at each crack size, with their area being proportional to the total number of inspections performed at each crack size.

The POD for magnetic rubber inspections of F-111 D6ac steel components has been determined as a function of crack length based on field trials completed by RAAF technicians at RAAF Amberley. It was found that inspections of boltholes using central conductor magnetisation were less reliable (lower POD) than the inspection of a mousehole geometry using an active field technique. The applicability of different statistical methods for analysis of relatively small data sets was examined using Monte Carlo simulations. A significant error was identified in one previously published analysis method. The simulations demonstrated that a variant of the standard MLE method (MIL-HDBK-1823 1999), utilising parameter *Q*_{2} of Cheng & Iles (1983) to define the confidence limit, is acceptable for determining POD curves from data sets as small as 100 inspections.

The authors gratefully acknowledge the valuable contributions made by staff in the RAAF Non-Destructive Testing Standard Laboratory, who contributed to the planning and supervision of the field trial, and the technicians and supervising officers from 501 Wing Base NDT section whose willing cooperation was vital to the trial's success.

- Berens, A.P. & Hovey, P.W. (1981), *Evaluation of NDE Reliability Characterization*, Air Force Wright Aeronautical Laboratories, AFWAL-TR-81-4160, USA.
- Bruce, D.A. (1998), 'NDT Reliability Estimation From Small Samples and In-Service Experience', *Airframe Inspection Reliability under Field/Depot Conditions*, NATO RTO, RTO-MP-10 AC/323(AVT)TP/2, pp 3-1 to 3-22.
- Bullock, M., Forsyth, D. & Fahr, A. (1994), *Statistical Functions and Computational Procedures for the POD Analysis of Hit/Miss NDI Data*, National Research Council Canada, LTR-ST-1964.
- Cheng, R.C.H. & Iles, T.C. (1983), 'Confidence Bands for Cumulative Distribution Functions of Continuous Random Variables', *Technometrics*, vol. 25, no. 1, pp 77-86.
- Hugo, G.R. & Scala, C.M. (2001), *An assessment of existing magnetic rubber inspection probability of detection data for F-111 D6ac steel structure*, Defence Science and Technology Organisation, DSTO-TN-0355, Australia.
- MIL-HDBK-1823 (1999), *Nondestructive Evaluation System Reliability Assessment*, Department of Defense Handbook, United States.
- Petrin, C., Annis, C. & Vukelich, S.I. (1993), *A Recommended Methodology for Quantifying NDE/NDI Based on Aircraft Engine Experience*, NATO AGARD, AGARD-LS-190.

© AINDT, created by NDT.net