The paper exposes an inconsistency in the field of non-destructive examination (NDE) and points out the dangers of the confusion which is created by it. The physical objects which occupy a statistical population occupy a partition of the complete set of physical objects under consideration. However, many of NDE's procedures test physical objects which do not occupy such a partition. The inconsistency is set up when NDE's scientists represent physical objects which do not occupy partitions as the elements of populations in their studies of NDE's reliability. The bogus populations invalidate the definition of probability as a measure of an event whose value on a certain event is 1.
With probability invalidated as a measure of a test's reliability, NDE's scientists have proceeded by representing a different measure of an event as a "probability." The value of this measure on a certain event varies between 0 and 2 in one U. S. Nuclear Regulatory Commission study. This makes the reader who interprets the study's measure as a probability wrong on a number of counts. For example, the apparently perfect certainty which accompanies a value of 1 for the study's Probability of Detection is actually perfect ambiguity because this measure's value on the certain event of a defect is 2. Dire consequences are imaginable if a reactor engineer were to act on the USNRC's representation.
The authors recommend avoidance of the consequences of this and other instances of confusion by the translation of past reporting. The translated reports would use a language which discriminated proper from improper probabilities and populations. For the longer term, they recommend the refocusing of NDE to yield only proper probabilities and populations.
Probability is a measure of an event whose value on a certain event is 1 (Halmos, 1950). However, certain irregularities in the design of a study's statistical population can yield a "probability" whose value on a certain event varies significantly from 1. Like the yardstick whose "yard" varies significantly from the orthodox 0.9144 meters, this "probability" has the capacity to seriously mislead people. That it has been the unsuspected measure of the reliability of non-destructive examination (NDE) in error-sensitive areas of engineering prompts the following warning.
Engineers use the radiographic testing, ultrasonic testing and other methods of NDE to diagnose problems with structures which function under mechanical stress. They are particularly apt to use it in situations in which the failure of a structure might cause harm. However, NDE can itself cause harm when it is in error. Thus, engineers have been moved to establish the probabilities of error in the various methods of NDE. These "probabilities" are sometimes not probabilities.
This paper has three parts. The first part, Probability versus Pseudoprobability, establishes and contrasts two measures of an event. Probability is consistent with population while pseudoprobability is consistent with irregular or pseudo-population. While both measures are legitimate, they are different measures. Therefore, when the literature labels a study's pseudoprobability as a "probability," its reader can be seriously misled. The second part, Nuclear Confusion, exposes this kind of confusion in a U. S. Nuclear Regulatory Commission study. In relation to dangerous levels of damage to a nuclear reactor component, its Probability of Detection is a pseudoprobability whose value on the certain event of a defect is 2. Thus, the value of 1 which the agency reports for the "probability" of detecting a dangerous condition in a nuclear reactor suggests perfect certainty that the test is valid but is actually perfect ambiguity regarding whether the test is valid.
The third part, Conclusions and Recommendations, warns readers to expect similar surprises throughout NDE's literature. It recommends that NDE's literature be translated into the language of pseudoprobabilistics so that it does not mislead people. The paper closes by urging that NDE be restructured to define actual populations and probabilities.
In Sampling Techniques, the statistician William Cochran provides a rule which yields a population when it is followed.
Before a study's sample may be drawn, its population must be divided into the physical objects which are called units. The units must "... cover the whole of the population and they must not overlap, in the sense that every element in the population belongs to one and only one unit" (Cochran, 1977). Cochran is stating that a study's units must occupy a partition of their population. In the remainder of the paper, we will endeavor to explain what this means.
Fig 1: A partition. The class of two triangles is a partition of the rectangle.
Fig 2: A pseudopartition. The class of two circles is a pseudopartition of the rectangle.
A class {A1, ..., An} of sets A1, ..., An is said to be a partition of set A if every element of A belongs to a set in {A1, ..., An} and no two sets in {A1, ..., An} have an element in common. Figure 1 shows a class of two triangles which is a partition of a rectangle. In this example of a Venn diagram, a class is represented by a set of geometrical objects, a set by a single geometrical object and an element of a set by a point within a geometrical object's boundary.
We will reference a class {A1, ..., An} as a pseudopartition of a set A if it is not a partition of A. Figure 2 shows an example of a pseudopartition. The class of two circles is a pseudopartition of the rectangle. Note the region of overlap between the two circles and the region of the rectangle that is not covered by any circle.
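The two definitions can be checked mechanically. The following is a minimal sketch in Python, offered as an illustration only; the function name `is_partition` and the integer-labelled elements standing in for the points of Figures 1 and 2 are our assumptions and appear nowhere in the cited literature.

```python
from itertools import combinations

def is_partition(cls, whole):
    """Return True if the class of sets `cls` is a partition of `whole`.

    Follows the definition in the text: every element of `whole` belongs
    to some set in the class, and no two sets share an element.
    """
    sets = [set(s) for s in cls]
    covers = set(whole) <= set().union(*sets)
    disjoint = all(a.isdisjoint(b) for a, b in combinations(sets, 2))
    return covers and disjoint

# Hypothetical stand-ins for the figures: six labelled points in a rectangle.
rectangle = {1, 2, 3, 4, 5, 6}
triangles = [{1, 2, 3}, {4, 5, 6}]  # like Figure 1: full cover, no overlap
circles = [{1, 2, 3}, {3, 4}]       # like Figure 2: overlap at 3; 5 and 6 uncovered

print(is_partition(triangles, rectangle))  # True
print(is_partition(circles, rectangle))    # False
```

Any class for which the function returns False is a pseudopartition in the sense used here.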
In this paper, the entity which Cochran terms an "element in the population" will be termed a data point. In the study of testing reliability, each data point is a pair of numbers representing the tested and true values of the property of a physical object which is measured by the test. The complete set of possible data points is termed a study's sample space. An event is a subset of a sample space.
Fig 3: The sample space which pertains to the study of testing reliability and the most popular partition of it.
In characterizing the reliability of a test, the test's investigator selects a partition of the sample space. Then he estimates a value for the probability of each event in the partition. Though a variety of partitions are available, investigators usually select the simplest. Its events are termed a true positive, a false negative, a true negative and a false positive. Figure 3 shows the sample space which pertains to the study of testing reliability and this partition of it.
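As a hedged illustration of this four-event partition, the sketch below assigns a data point to its event. It assumes, for simplicity, that the tested and true values have each been reduced to a yes/no answer; the function name `classify` and the Boolean encoding are ours.

```python
from itertools import product

EVENTS = ("true positive", "false negative", "true negative", "false positive")

def classify(tested_positive: bool, truly_positive: bool) -> str:
    """Assign a (tested, true) data point to one of the four events."""
    if truly_positive:
        return "true positive" if tested_positive else "false negative"
    return "false positive" if tested_positive else "true negative"

# Every possible data point lands in exactly one event, so the four
# events form a partition of this simplified sample space.
outcomes = {pair: classify(*pair) for pair in product([True, False], repeat=2)}
for pair, event in outcomes.items():
    print(pair, "->", event)
```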
The relative frequency of an event is the number of physical objects which participate in the event divided by the number of data points in the sample space. The probability of an event models the relative frequency of the same event. Conversely, the value of the relative frequency of an event provides empirical verifiability for the value which some theory claims for the probability of this event. Empirical verifiability is the hallmark of science (Popper, 1959).
From the standpoint of empirical verifiability, a certain event has particular significance, for the value of its probability is 1 by the definition of probability. The relative frequency of a certain event is the number of physical objects which participate in the event divided by the number of data points in the event. For example, in the testing of patients for the virus Hepatitis B, the relative frequency of the certain event of a true positive is the number of true positive patients divided by the number of true positive outcomes. A relative frequency of a certain event whose value is 1 is consistent with the definition of probability. A relative frequency of a certain event whose value is not 1 is inconsistent with the definition of probability.
The relative frequency of a certain event is a measure of the degree of coverage of this event by physical objects. When the relative frequency of every event in a partition of a study's sample space is 1, this signifies that each physical object in the complete set of them participates in one and only one event. If this is true, the class of sets of physical objects which participate in the various events is a partition of the complete set of physical objects.
There are two other possibilities. In the first possibility, there are physical objects which participate in more than one event in a partition of the sample space. One says, in this case, that the sets of physical objects corresponding to these events overlap. In the second possibility, fewer physical objects participate in an event than there are data points. One says, in this case, that the set of physical objects corresponding to this event under covers the event. If the class of sets of physical objects corresponding to the partition of a sample space exhibits under coverage or overlap, it is a pseudopartition of the complete set of physical objects.
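The three cases can be told apart by the value of the relative frequency of a certain event. The sketch below is illustrative only: the function names and the counts in the examples are hypothetical, and the reading of a value above 1 as overlap and a value below 1 as under coverage is our summary of the two possibilities just described.

```python
from fractions import Fraction

def relative_frequency(n_objects: int, n_data_points: int) -> Fraction:
    """Physical objects participating in a certain event, divided by
    the number of data points in that event."""
    return Fraction(n_objects, n_data_points)

def diagnose(f: Fraction) -> str:
    """Interpret the relative frequency of a certain event."""
    if f == 1:
        return "consistent with a partition"
    return "overlap" if f > 1 else "under coverage"

# Hypothetical counts, chosen only to exercise each case.
for n_objects, n_data_points in [(40, 40), (80, 40), (0, 40)]:
    f = relative_frequency(n_objects, n_data_points)
    print(n_objects, n_data_points, f, diagnose(f))
```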
A pseudopartition of the complete set of physical objects is inconsistent with the empirical verifiability of a probabilistic model whereas a partition is consistent with the empirical verifiability. As empirical verifiability is the hallmark of science, great scientific importance attaches to differentiating the pseudopartition from the partition. In normal statistics, one terms a physical object which belongs to a partition a unit, a set of them a population and a subset of a population a subpopulation. No uniform terminology has emerged in abnormal statistics so we will coin a terminology. We will term a physical object which belongs to a pseudopartition a pseudounit, a set of them a pseudopopulation and a subset of a pseudopopulation a pseudosubpopulation. To complete the analogy, we define pseudoprobability as a measure of an event whose value on a certain event equals the value of the relative frequency of this event. The value of the pseudoprobability of a certain event is empirically verified by the relative frequency of the same event. Thus, a pseudopartition of a complete set of physical objects is consistent with the empirical verifiability of a pseudoprobabilistic model.
The physicist Lazar Mayants has noted that the discipline which includes the concepts of probability, partition, population, subpopulation and unit is empirically verifiable and can therefore be considered a science (Mayants, 1984). He terms this science probabilistics. It may be noted that the discipline which includes the concepts of pseudoprobability, pseudopartition, pseudopopulation, pseudosubpopulation and pseudounit is also empirically verifiable and thus also a science. We will term this science pseudoprobabilistics.
Though probabilistics and pseudoprobabilistics are each internally consistent, their concepts cannot be mixed. Thus, it is important to keep the concepts separate linguistically.
One way in which the concepts can be confused is by the representation of pseudoprobability as "probability." This kind of confusion has arisen in a study of the U. S. Nuclear Regulatory Commission (USNRC).
E. R. Bradley and his colleagues studied the reliability of NDE in a nuclear power reactor's steam generator tubes (Bradley et al., 1988). The tubes have the import of containing a reactor's cooling water while transferring the reactor's heat to the outside. Possible consequences from the rupture of tubes include the meltdown of the affected reactor's core and release of its huge inventory of radioactive materials into the biosphere.
Nonetheless, the tubes have proven susceptible to weakening by corrosion. A number of tubes have burst during reactor operations and the USNRC has reacted by imposing periodic inspections of the tubes by NDE. Bradley's has been the only substantial study of the reliability of these inspections.
Each tube's inspector scans it with a remote sensor. The sensor is inserted into the bore of each tube which has been slated for inspection. Then it is translated along this tube's axis. As it moves, the sensor emits a stream of data. The data stream is decoded by the inspector.
Decoding produces a set of discrete indications of damage. Each indication contains an estimate of the radially directed penetration of the tube by corrosion, the identity of the tube and the axial position of the sensor. The indications are recorded.
Bradley asked various teams of inspectors to test a prescribed set of tubes under the rules by which they would be inspected in practice. These rules had been written by the American Society of Mechanical Engineers (ASME) and enacted by the USNRC. Then he and his colleagues dismantled the tubes in an attempt at establishing the reliability of their inspection. In modeling the reliability, Bradley selected the usual sample space partition, that is, the one containing a true positive, a false negative, a true negative and a false positive.
In the mechanical trades and in statistics, the event of a true positive or a false negative is often termed a defect. The event of a true negative or a false positive may be termed a non-defect by extension. The two events of a defect and a non-defect form a partition of the sample space in the study of testing reliability.
A simplification arises when a defect and a non-defect are taken to be certain. With a defect certain, the probability or pseudoprobability of a true positive and the probability or pseudoprobability of a false negative sum to the relative frequency of the certain event of a defect. As the two probabilities or pseudoprobabilities are dependent, the reliability of a test may be characterized in terms of only one of them. Bradley bases his characterization on the probability or pseudoprobability of a true positive. He terms this measure the Probability of Detection.
Similarly, with a non-defect certain, the probability or pseudoprobability of a true negative and the probability or pseudoprobability of a false positive sum to the relative frequency of the certain event of a non-defect. As the two probabilities or pseudoprobabilities are dependent, the reliability of a test may be characterized in terms of only one of them.
In reading Bradley's report, one notices an oddity in the use of terminology which clouds understanding of whether the complete set of physical objects under the ASME-USNRC test contains units or pseudounits. The physical objects are identified as "defects" but a defect is the name of an event and not a physical object, as several statisticians take pains to tell their readers (Bowker et al., 1972; Juran, 1974).
However, in a recent paper Perdijon observes that in NDE, "defect" often designates an object which has a definite boundary in space. He suggests that the term "discontinuity" be reserved for such an object (Perdijon, 1993). In certain cases, a discontinuity's boundary surrounds some material. In these cases, a physical object is defined by the boundary.
This conception of the "defect" fails, however, to resolve the question, for Bradley's "defects" are nearly all the voids which are formed on the exterior of a tube by corrosion. A void is a definite boundary surrounding no material and is not physical. Such an object possesses no properties of its own for NDE to measure.
If they have no properties, however, Bradley's "defects" do not support the establishment of the property values which Bradley reports. Thus, we gather that Bradley is imputing to the discontinuities the properties of the physical objects in which they are embedded. However, we conclude, there is so little difference between these physical objects and the discontinuities that Bradley does not feel obliged to observe a distinction. According to this theory, Bradley's "defect" is a discontinuity's boundary plus a thin layer of material which adheres to its exterior. The layer is thin enough that the dimensions of a "defect" are essentially the same as the dimensions of the discontinuity which is embedded in it. However, this "defect" is a physical object because of the material. We will adopt this theory of Bradley's "defect" in proceeding with our analysis of his findings, while placing quotation marks around the word to remind our readers that it designates a physical object and not an event. When the word designates an event, it will be represented in italics as a defect.
Bradley's "defects" were defined by metallurgists performing destructive tests after the tubes had been inspected. The 108 "defects" were quite short, with a median, axially directed length of 1 inch. At ± 3 inches, the axially directed positional uncertainty of each indication was quite long by comparison.
Thus, every indication falling on a "defect" might not have referenced this "defect." Conversely, every indication falling off a "defect," but within 3 inches of it, might have referenced this "defect." Because of the great positional uncertainty, then, the question of whether an indication falling within 3 inches of the end of a "defect" referenced this "defect" or did not reference it had a completely ambiguous answer.
Bradley dealt with this problem in a way which is highly significant to the theme of this paper. If a "defect" had an indication within 3 inches of it, he assigned it to a true positive.
Otherwise, he assigned it to a false negative. He then divided the number of "defects" he had assigned to a true positive by the total number of "defects" and assigned this ratio to the Probability of Detection.
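The rule of evidence just described can be sketched as follows. This is our reading of the rule and not Bradley's code: the single-tube layout, the axial positions, the function names and the choice to treat a distance of exactly 3 inches as "within" are all assumptions made for illustration.

```python
TOLERANCE_IN = 3.0  # axial positional uncertainty, in inches

def distance_to_defect(position, start, end):
    """Axial distance from an indication to the nearest point of a "defect"."""
    if position < start:
        return start - position
    if position > end:
        return position - end
    return 0.0

def probability_of_detection(defects, indications, tol=TOLERANCE_IN):
    """Apply the proximity rule and return (n_tp, n_fn, ratio)."""
    n_tp = sum(
        1
        for start, end in defects
        if any(distance_to_defect(p, start, end) <= tol for p in indications)
    )
    n_fn = len(defects) - n_tp
    return n_tp, n_fn, n_tp / (n_tp + n_fn)

# Hypothetical tube: four 1-inch "defects" and three indications (inches).
defects = [(10.0, 11.0), (30.0, 31.0), (50.0, 51.0), (70.0, 71.0)]
indications = [12.5, 33.0, 90.0]

n_tp, n_fn, pod = probability_of_detection(defects, indications)
print(n_tp, n_fn, pod)  # 2 2 0.5
```

Note that the rule never asks whether an indication actually referenced the "defect" it lies near; proximity alone decides the assignment.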
A result of this methodology is presented in Bradley's report as a graph with the Probability of Detection as the ordinate and Metallurgical Wall Loss as the abscissa. Metallurgical Wall Loss is the degree of penetration of a tube's wall thickness by a "defect." This graph is presented as this paper's Figure 4.
Fig 4: Graph of Probability of Detection versus Metallurgical Wall Loss (Bradley et al., 1988).
Bradley's probability estimate results from a methodology in which he assigns a "defect" to a false negative if he does not assign it to a true positive. This methodology is consistent with the assumption that the ASME-USNRC test defines a partition of the complete set of physical objects under testing. Bradley suggests this to be so, for he states that
An inspected unit of material can always be placed into one of the following categories:
This is a description of a partition.
However, Bradley's rule of evidence produces a conflict with this assumption. It assigns every "defect" to a true positive if it is within 3 inches of an indication. The same rule assigns every such "defect" to a false negative also. Let us examine the consequences of this ambiguity.
Bradley's Probability of Detection (see Figure 4) is either the probability of a true positive or the pseudoprobability of a true positive. It is a probability if the relative frequency of the certain event of a defect is 1 and a pseudoprobability otherwise. Let n_{tp} designate the number of "defects" Bradley assigned to a true positive and n_{fn} the number he assigned to a false negative. Let f_{d} designate the relative frequency of the certain event of a defect. Then
$$f_{d} = \frac{2 n_{tp} + n_{fn}}{n_{tp} + n_{fn}} \tag{1}$$
We have doubled the number of "defects" Bradley assigned to a true positive in computing the number of physical objects corresponding to a true positive or a false negative because Bradley's rule of evidence assigns a "defect" to a false negative each time it assigns one to a true positive but Bradley did not make the assignment to a false negative.
Bradley estimates the Probability of Detection in Figure 4 from
$$\text{Probability of Detection} = \frac{n_{tp}}{n_{tp} + n_{fn}} \tag{2}$$
Combining Equations (1) and (2), one concludes that the relative frequency of the certain event of a defect is given by
$$f_{d} = 1 + \text{Probability of Detection}$$
One sees that the value of the relative frequency of the certain event of a defect rises from 1 (when the Probability of Detection is at its minimum value of 0) to 2 (when the Probability of Detection is at its maximum value of 1). When the Probability of Detection has a value of 0, it is a probability. When the Probability of Detection has a value which is not 0, it is a pseudoprobability.
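The relationship can be checked numerically. The sketch below assumes that the relative frequency counts each "defect" assigned to a true positive twice, as described above, over the total number of "defects"; the total of 108 is Bradley's, while the particular splits between true positives and false negatives are hypothetical.

```python
from fractions import Fraction

N_DEFECTS = 108  # total number of "defects" in Bradley's study

def pod(n_tp: int, n_fn: int) -> Fraction:
    """Probability of Detection: true positives over all "defects"."""
    return Fraction(n_tp, n_tp + n_fn)

def f_d(n_tp: int, n_fn: int) -> Fraction:
    """Relative frequency of the certain event of a defect, with each
    true-positive "defect" counted a second time as a false negative."""
    return Fraction(2 * n_tp + n_fn, n_tp + n_fn)

# Hypothetical splits of the 108 "defects".
for n_tp in (0, 27, 54, 108):
    n_fn = N_DEFECTS - n_tp
    assert f_d(n_tp, n_fn) == 1 + pod(n_tp, n_fn)
    print(n_tp, n_fn, pod(n_tp, n_fn), f_d(n_tp, n_fn))
```

The relative frequency equals 1 only in the first row, where no "defect" is assigned to a true positive.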
The condition that the Probability of Detection has a value of 1 is particularly pertinent as this is its value, according to Figure 4, at dangerous levels of damage. Under this condition, the Probability of Detection shown in Figure 4 is a pseudoprobability with a value of 2 on the certain event of a defect. Thus, the value of 1 for the Probability of Detection at dangerous levels of damage should be viewed as 50% of the value of the pseudoprobability of the certain event of a defect and not the 100% of the value of the probability of the certain event of a defect which Figure 4's choice of language implies.
50% of a pseudoprobability with a value of 2 on the certain event of a defect tells a far different story than 100% of a probability. In particular, the value of 2 for the pseudoprobability of the certain event of a defect indicates complete overlap of the pseudosubpopulations corresponding to a true positive and a false negative. The value of 1 for the Probability of Detection at dangerous levels of damage indicates nothing more than that 50% of the pseudounits corresponding to a defect are assigned by the ASME-USNRC test to a true positive. The other 50% are assigned by this test to a false negative. There is complete ambiguity regarding whether this test is valid in diagnosing dangerous levels of damage, but there seems from the use of language to be complete certainty that the test is valid. A reactor engineer who took the complete ambiguity of Figure 4 to be the complete certainty its language implies might well commit a disastrous error.
The remaining portion of Bradley's methodology relates to the certain event of a non-defect, that is, a true negative or a false positive. Bradley finds that there are numerous indications without "defects" within 3 inches and assigns them to a false positive. He assigns no object at all to a true negative. He suggests no value for any probability of a false positive or a true negative but suggests this to be inconsequential because "...the safety issue involved in NDE reliability does not depend on the amount of nondefective material correctly passed, but whether the defective material is identified."
Let us analyze these results from the standpoint of probabilistics and pseudoprobabilistics. Though Bradley reports no value for any probability or pseudoprobability of a true negative or a false positive, his data provide the basis for establishing exact values for them. The analysis which leads to these results starts with the calculation of the relative frequency of the certain event of a non-defect.
Bradley's assignment of his count of indications beyond 3 inches from a "defect" to a false positive means that there are countable data points in a non-defect. While there are data points, there are no physical objects. The relative frequency, however, is given by the ratio of physical objects to data points.
It follows that the value of the relative frequency of the certain event of a non-defect is 0.
As the value of this relative frequency is not 1, any measure of a subset of a non-defect must be a pseudoprobability and not a probability. It follows that a true negative and a false positive have pseudoprobabilities. The values of these pseudoprobabilities sum to 0.
One knows from the definition of a measure (Halmos, 1950) that the two pseudoprobabilities have values greater than or equal to 0. It follows that the pseudoprobabilities of a true negative and a false positive have the value of 0.
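The steps of this argument may be written compactly. The symbols are our own shorthand and not Bradley's: f_{nd} is the relative frequency of the certain event of a non-defect, m_{nd} is the number of data points in a non-defect, and p_{tn} and p_{fp} are the pseudoprobabilities of a true negative and a false positive.

```latex
\begin{aligned}
f_{nd} &= \frac{0}{m_{nd}} = 0, \qquad m_{nd} > 0, \\
p_{tn} + p_{fp} &= f_{nd} = 0, \\
p_{tn} \ge 0,\ p_{fp} \ge 0 \;&\Longrightarrow\; p_{tn} = p_{fp} = 0 .
\end{aligned}
```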
While Bradley dismisses the absence of any probability of a true negative or a false positive from his report as unimportant in a safety study, this expresses the opinion that there is absolutely no value in keeping nuclear power generating equipment operating, for it is well known that any test's probability of a true positive may be improved arbitrarily, at the expense of increasing its probability of a false positive. This can be accomplished, for example, by issuing indications at randomly chosen locations of a steam generator tube. A less value-laden explanation of the absence would be that the ASME-USNRC test is inconsistent with the definition of probability.
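The trade-off can be illustrated with a small sketch. To keep the example reproducible it issues indications at evenly spaced locations rather than randomly chosen ones; the tube layout, the positions and the function names are hypothetical, and the 3-inch proximity rule is applied as we read it from Bradley's report.

```python
def within(position, defect, tol=3.0):
    """True if an indication lies within `tol` inches of a "defect"."""
    start, end = defect
    return start - tol <= position <= end + tol

def score(defects, indications, tol=3.0):
    """Return (detection ratio, count of indications with no "defect" nearby)."""
    detected = sum(any(within(p, d, tol) for p in indications) for d in defects)
    false_pos = sum(not any(within(p, d, tol) for d in defects) for p in indications)
    return detected / len(defects), false_pos

# Hypothetical 100-inch tube with four 1-inch "defects".
defects = [(10.0, 11.0), (30.0, 31.0), (50.0, 51.0), (70.0, 71.0)]
careful = [12.5]                        # one genuine indication
blanket = [5.0 * k for k in range(21)]  # an indication every 5 inches

pod_careful, fp_careful = score(defects, careful)
pod_blanket, fp_blanket = score(defects, blanket)
print(pod_careful, fp_careful)  # 0.25 0
print(pod_blanket, fp_blanket)  # 1.0 17
```

Blanketing the tube with indications raises the detection ratio to 1 while conveying no information about where the damage lies, which is why a measure of false positives cannot be omitted.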
The irregularities in Bradley's pseudopopulation may be represented by Figure 2's Venn diagram. In this representation, the two circles represent the "defects" which correspond to a defect. One of the two circles represents those "defects" which are assigned to a true positive by the ASME-USNRC test. The other circle represents those "defects" which are assigned to a false negative by the test. The area of the rectangle which is not covered by a circle represents the event of a non-defect. It is not covered by physical objects. The region of overlap between the two circles is nil when the level of damage in a steam generator tube is nil. When the level of damage becomes dangerous, the two circles overlap totally.
The "defects" of the ASME-USNRC test, then, form a "population" which combines complete lack of coverage with complete overlap. Yet this extremely irregular population has been the support for estimates of "probabilities" which have been published for the consumption of nuclear reactor engineers by the USNRC! This incident prompts our interest in where else this kind of confusion may lie hidden.
There is reason to believe that this is not an isolated instance of a severely irregular population. The potential for them lies in the persistent definition of NDE as a field which detects "defects" in materials (Weismantel, 1975). Under pressure to establish NDE's reliability, NDE's scientists have taken these "defects" to occupy the populations of their studies. For example, the authors of The Reliability of Non-destructive Inspection have stated only that "defects" occupy the populations (Silk et al., 1987). Others have offered the equivalent view that flaws occupy the populations. The USNRC's research director has joined a former president of the ASME in stating this position (Beckjord, 1991; Fernandes, 1994). A scientific advisor to the USNRC on the reliability of NDE in nuclear reactors has offered this same view (Bush, 1991). So have the authors of a study of NDE's reliability in a British nuclear reactor (Cartwright et al., 1988). However, these views have definitely been wrong with respect to the kind of test which assigns "defects" to data points by the proximity of "defects" to indications. This has been the usual kind of test in the non-destructive examination of nuclear reactor components. The ASME-USNRC test of nuclear steam generator tubes provides an example.
Each such test assigns a "defect" to a false negative when it assigns it to a true positive or assigns no physical object to a non-defect. The degree of under coverage of a non-defect may be decreased by loosening the rule of evidence which binds an indication to a "defect," but this increases the degree of overlap between the sets of physical objects corresponding to a true positive and a false negative. Whether there is under coverage, overlap or both, the "populations" of these "defects" are pseudopopulations.
The architects of tests of this kind might have responded to calls for establishing NDE's reliability by abandoning these tests or by writing the reports of studies of their reliability in the language of pseudoprobabilistics. However, they have responded in neither way. Instead, they have mixed the incompatible concepts of probabilistics and pseudoprobabilistics.
The result of this confusion can be dangerously misleading. Thus, we recommend its elimination by the translation of the reports of past studies in pseudoprobabilistics into the language of pseudoprobabilistics. The translation should be accomplished quickly, in view of the obvious dangers of continued confusion.
For the future, pseudoprobabilistic studies should be clearly discriminated from the probabilistic ones by the language of their reporting. However, there would be a problem in stopping with this reform only. Pseudoprobabilistics is not a substitute for probabilistics because probability measures risk but pseudoprobability does not. When the value for the pseudoprobability of a certain event falls below 1, this signifies a lack of evidence for the establishment of a level of risk. When the value for the pseudoprobability of a certain event falls above 1, this signifies ambiguity about the level of risk. Only probability measures the risks of NDE.
It is a pleasure to recognize Jean Perdijon of France's Commissariat à l'Énergie Atomique for engaging in a correspondence that illuminated the paper's theme. Barbara Von Haunalter was gracious enough to donate her time to the preparation of graphic art.
NDT.net