NDT.net • May 2006 • Vol. 11 No.5

Introduction to the Statistics of NDT

E. Ginzel - Materials Research Institute


NDT inspectors are increasingly required to produce quantitative results using tools that are not optimal for the task. As a result, uncertainties in the quantities they provide can be expected. These uncertainties can be broadly grouped into two categories: detection uncertainty and dimensional uncertainty.


Detection uncertainty relates to whether or not a flaw has been found. This is in itself a controversial topic. What constitutes a flaw? If we consider that any deviation from a pure condition constitutes a "flawed" item, then even the smallest crystal dislocation in a metal structure could be interpreted as a "flaw". Not many (probably not any) components would be considered "unflawed" if the definition were taken to this extreme. Flaw detection must therefore take on some practical considerations. Detection of flaws in NDT is by indirect means, so whether or not a flaw is detected depends on a variety of factors: the nature of the flaw itself, the test method used, the sensitivity of that method and human factors.

Having agreed upon the test method and test protocol, there are four possible outcomes in an inspection of a component. These four outcomes constitute the probability matrix of detection.

  1. An item is flawed and the NDT method detects it (True Positive)
  2. No flaw exists and the NDT method indicates a flaw present (False Positive)
  3. An item is flawed and the NDT method does not detect it (False Negative)
  4. No flaw exists and the NDT method has no indication of a flaw (True Negative)

These are the foundations of the concept of "Probability of Detection" or POD. In NDT, this concept was developed mainly at NASA in the USA during the 1970s. It has since spread to other areas of NDT inspection but is not yet as widely adopted as might be expected.
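The four outcomes above can be tallied directly from inspection trial data to form the matrix. A minimal Python sketch (function and data names are illustrative, not from any standard library for NDT):

```python
# Tally the four detection outcomes from paired (flaw_present, ndt_called) trials.
def detection_matrix(trials):
    """trials: iterable of (flaw_present: bool, ndt_called: bool) pairs."""
    counts = {"TP": 0, "FP": 0, "FN": 0, "TN": 0}
    for flawed, called in trials:
        if flawed and called:
            counts["TP"] += 1   # true positive: flaw present and detected
        elif not flawed and called:
            counts["FP"] += 1   # false positive: "false call"
        elif flawed and not called:
            counts["FN"] += 1   # false negative: flaw missed
        else:
            counts["TN"] += 1   # true negative: no flaw, no indication
    return counts

# Example trial set (made-up data)
trials = [(True, True), (True, False), (False, True), (False, False), (True, True)]
m = detection_matrix(trials)
pod_point = m["TP"] / (m["TP"] + m["FN"])  # simple point estimate of detection rate
```

The point estimate `pod_point` considers only the flawed specimens; the false-call rate, `FP / (FP + TN)`, is assessed separately from the unflawed ones.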

POD analysis tries to establish a minimum flaw size that will be reliably detected by the NDT technique. This is best done by plotting the proportion of flaws detected (those producing a response over some threshold) against flaw size. Ideally, all flaws over some critical size would be detected and all smaller flaws would not. The tool most commonly used for POD description is the POD curve. Figure 1 illustrates a POD curve indicating an 80% chance of detecting a flaw 2.2 mm high; the associated 90% confidence bound shows that the flaw size detected with that same 80% probability could be as large as 3.3 mm. Conversely, we may state with 90% confidence that the POD for a flaw 2.2 mm high is not less than 65%.

Fig 1: Sample POD curve
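A curve of this general shape can be sketched numerically. One common hit/miss model treats POD as a logistic function of the logarithm of flaw size; the parameters below are purely illustrative and are not fitted to Figure 1:

```python
import math

# Hit/miss POD model: logistic in log flaw size (mu, sigma are assumed values).
def pod(a, mu=0.5, sigma=0.3):
    """POD(a) = 1 / (1 + exp(-(ln a - mu) / sigma)); a = flaw height in mm."""
    return 1.0 / (1.0 + math.exp(-(math.log(a) - mu) / sigma))

# Invert the logistic to find the flaw size detected with a given probability.
def a_for_pod(p, mu=0.5, sigma=0.3):
    return math.exp(mu + sigma * math.log(p / (1.0 - p)))

a80 = a_for_pod(0.80)  # flaw height detected with 80% probability under this model
```

Fitting `mu` and `sigma` to real hit/miss data (e.g. by maximum likelihood) is what produces the actual curve; the confidence bounds in Figure 1 come from the uncertainty in those fitted parameters.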

The best way to describe a confidence curve (e.g. 95 %) is to state that: If the actual POD curve were to be reconstructed over and over using the same method and data, then 95% of those constructed curves would be above the confidence curve (i.e. 5% would be below). In other words, we are 95% sure that the REAL POD curve is above the confidence curve.

The confidence level must also consider the effects of the full POD matrix, where we have the potential for false calls. In a set of specimens with known flaws over the minimum size required for detection, a single miss can destroy our confidence in the system if a high confidence is required. For example, with a selection of 30 specimens, 28 samples would need to be tested to provide a 95% confidence that the results are indicative of the detection capabilities of the test system. However, if we missed a flaw in one of those 28 samples, we would no longer have 95% confidence in the probability of detection.
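The arithmetic behind such zero-miss demonstrations is binomial: if all n flawed specimens are detected, the confidence that the true POD is at least p is 1 − pⁿ. A short sketch of this standard calculation (not taken from the article itself):

```python
import math

def specimens_needed(pod_target, confidence):
    """Smallest n such that detecting all n flaws demonstrates
    POD >= pod_target at the given confidence (zero-miss binomial).
    Requires pod_target**n <= 1 - confidence."""
    return math.ceil(math.log(1.0 - confidence) / math.log(pod_target))

n = specimens_needed(0.90, 0.95)  # the widely cited "29 of 29" demonstration
```

A single miss breaks the zero-miss condition, and either more specimens must be tested or a lower POD/confidence must be claimed, which matches the point made above.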

An example of the sample size required to assess reliability is found at the website http://www.surveysystem.com/sscalc.htm. There it is explained that the larger your sample, the more sure you can be that the test results truly reflect the population distribution. For a given confidence level, the larger your sample size, the smaller your confidence interval can be. However, the relationship is not linear (i.e., doubling the sample size does not halve the confidence interval).

Conversely, if we have a large population we need to test many samples to ensure that the results are indicative of the entire population (the calculator at the referenced website indicates that 278 samples are required in a population of 1000 to ensure a 95% confidence level).
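The calculator's figure can be reproduced with the standard sample-size formula for estimating a proportion, with a finite-population correction (assuming the worst-case proportion of 0.5 and a 5% margin of error, which are the calculator's defaults):

```python
import math

def sample_size(population, z=1.96, margin=0.05, p=0.5):
    """Sample size for estimating a proportion in a finite population.
    z = 1.96 for 95% confidence; p = 0.5 is the worst (largest-sample) case."""
    n0 = (z ** 2) * p * (1 - p) / margin ** 2      # infinite-population size
    return math.ceil(n0 / (1 + (n0 - 1) / population))  # finite-population correction

n = sample_size(1000)  # 278, matching the calculator cited above
```

Note how the correction term illustrates the non-linearity mentioned above: for small populations the required fraction is large, while for very large populations the required count levels off near n0 ≈ 385.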


Quite a different set of uncertainties exists when an NDT operator is required to provide dimensional information about the flaws detected. Dimensions usually requested include: length, maximum vertical extent, depth below a test surface and position with respect to some X-Y coordinate system.

NDT uses indirect measurement methods to assess flaw dimensions. Magnetic particle inspection uses the pattern of small particles entrapped in a magnetic field. Eddy current uses coil impedance phase and amplitude information to gauge flaw depth and size. Ultrasonic testing uses arrival times, amplitude and phase information to provide depth, area and nature of a flaw.

However, even more direct methods can suffer from variation in estimates of dimensions. Handing a vernier calliper to ten people and asking them to provide the three dimensions (length, width, height) of a steel block will still result in a scatter of recorded dimensions. Variations will result from: different skill levels with the calliper, measurements at different places along the block surfaces, different pressures applied to the calliper jaws and even rounding of readings if the vernier is mechanical instead of digital.

Analysis of the variations of sizing in NDT becomes important when the results are used with fracture mechanics to assess if a flaw can be allowed to remain in the part or not. Materials' properties are studied in mechanical laboratories to determine the maximum flaw size that will not result in the component failing. This is an important concept for NDT users as it stresses the opposite to what most people ask of NDT. Instead of "What is the smallest flaw a test method can find?" the real concern is, "What is the largest flaw that could be missed by the NDT method?"

When an NDT operator is asked to make a judgment about the size of a flaw, the result is then compared to an allowable-dimension chart. But if the NDT method is not perfectly accurate, there is some possibility that the dimension estimated by the operator is smaller than the actual one, i.e. the operator undersizes the flaw. The operator's estimate then results in an underestimate of the severity of the flaw. By analysing the sizing technique, the fracture mechanics engineer can incorporate a safety factor in the calculations that allows for the possibility that the sizing technique might undersize a flaw.

As with POD calculations, sizing estimates are analysed by comparison with results obtained from destructive testing. One such analysis is illustrated in Figure 2, a plot of the vertical extent of flaws sized by destructively examining the test sample against the NDT estimates of vertical extent. Destructive testing involves cutting and polishing, or splitting the sample by means of a "nick break", and examining the flaws under a microscope to determine the maximum flaw size. This method, too, has some variability associated with it. It is the maximum overall value that is generally taken as the reference to which the NDT size is compared. An ideal NDT method would plot all estimates along the line where the NDT size equals the destructive size. A scatter of estimates about this ideal, as indicated in Figure 2, is more typical of NDT sizing methods.

Figure 2 also incorporates sizing tolerance lines. These are lines parallel to the ideal indicating the error in the size as estimated by NDT compared to the ideal or "assumed" correct size.

Fig 2: Plot of NDT Size versus Destructive Size of Flaws
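The scatter in a plot like Figure 2 is typically summarized by the mean and standard deviation of the sizing error (NDT size minus destructive size): the mean gives the systematic bias, the standard deviation the random spread, and together they suggest an undersizing allowance of the kind a fracture mechanics engineer might apply. A minimal sketch with made-up data:

```python
import statistics

# Paired sizes in mm: (destructive "true" size, NDT estimate). Data are illustrative.
pairs = [(2.0, 1.8), (3.1, 3.4), (4.0, 3.5), (2.5, 2.6), (5.2, 4.6), (3.8, 3.9)]

errors = [ndt - true for true, ndt in pairs]  # negative = undersizing
bias = statistics.mean(errors)                # systematic offset of the NDT method
spread = statistics.stdev(errors)             # random scatter about that offset

# A simple allowance: assuming roughly normal errors, cover ~95% of possible
# one-sided undersizing (z = 1.645) beyond the observed bias.
undersize_allowance = max(0.0, -(bias - 1.645 * spread))
```

Here the allowance would be added to the reported NDT size before comparison with the allowable-dimension chart; real procedures would base it on far more than six paired measurements.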


Probability of detection and sizing tolerances are now commonly required in several industries. The aircraft industry, and in particular the military, use this type of information for damage tolerance analysis of their components and scheduling of inspection intervals. Nuclear industries are adapting this form of analysis to assess the reliability of NDT to detect flaws in components during in-service inspections. More recently the pipeline industry has relied on this type of analysis to develop fitness for purpose acceptance criteria for the construction of pipelines.

The statistical and probabilistic assessments of the NDT technique are useful tools. However, the process is time-consuming, costly and not absolute. Statistical assessment relies on the comparison of limited samples that must be verified using destructive methods. Although it is assumed that the destructive methods are the "true" values, here too there can be uncertainty and variation. Was the macro section made at the peak vertical extent? How precise was the measurement reported by the destructive test lab? If the flaw was long and there was depth variation to the upper edge does the destructive test reflect this? When length is reported what is the limit to which the destructive test can assess the flaw?

Probability of detection is also a function of what is defined as a flaw, and since variations in determining "actual" size of a flaw can exist due to the destructive testing methods, it follows that the POD results of NDT assessments are not solely a function of the NDT technique.

© NDT.net