06:52 Jul-03-2003 Ed Ginzel R & D, - Materials Research Institute, Canada, Joined Nov 1998 ^{1208}

Request for references on POD

I am directing this question to Terry Oldberg, as he has made some previous remarks on the matter, but other contributions are welcome too. Mr. Oldberg: I have noted that you have made several contributions to NDT.net concerning probability and NDT. Recent discussions in one of my present areas of activity (ultrasonic inspection of pipeline girth welds) seem to be dominated by concerns about the statistical assessment of our techniques.

My observations indicate that several people extrapolate Probability of Detection concepts to the Probability of Sizing correctly. I find it difficult to relate thresholds of "detection" to what would be a subsequent process, i.e. sizing.

Could you please recommend some SIMPLE references that would help me get a better background to the principles.

07:27 Jul-03-2003 Terry Oldberg Engineering, Mechanical Electrical Nuclear Software Consultant, USA, Joined Oct 1999 ^{42}

Re: Request for references on POD

Hi Ed

I'm not exactly sure what you are looking for. However, if you seek to understand the statistics of NDT reliability, I recommend that you reject almost everything that is in print on this topic and build up your understanding from first principles. This is not intellectually difficult. The difficult part for people seems to be freeing their minds of the notion that people in positions of power and authority in NDT have handled this problem correctly.

I would start with an understanding of elementary measure theory. The key definitions appear on page 30 of Paul Halmos's book "Measure Theory" (Springer-Verlag, New York, 1974). As it is a graduate text in mathematics, Halmos's book might seem intimidating. However, the content of page 30 requires an understanding only of some elementary concepts from set theory and modern algebra that are developed on pages 1-29.

Probability is an example of a measure. You need to understand that its value on the set called the "sample space" is 1. The sample space is the set of all possible outcomes of a trial. For example, in a coin flip, it is the set {heads, tails}. In studies of the reliability of tests, it is often the set {false positive, true positive, false negative, true negative}.
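To make the Unit Measure idea concrete, here is a minimal Python sketch of a probability measure on the four-outcome sample space. The per-outcome probabilities are made up for illustration; the only point is that a measure assigns a value to sets of outcomes by adding up their masses, and that a probability measure must give the whole sample space the value 1.

```python
from fractions import Fraction

# The four outcomes of a single trial in a test-reliability study.
SAMPLE_SPACE = frozenset(
    {"false positive", "true positive", "false negative", "true negative"}
)


def make_measure(point_masses):
    """Build a finite measure P on subsets of SAMPLE_SPACE from per-outcome masses."""
    if set(point_masses) != SAMPLE_SPACE:
        raise ValueError("point masses must be given for exactly the sample space")
    if any(m < 0 for m in point_masses.values()):
        raise ValueError("a measure cannot take negative values")

    def P(event):
        # Additivity: the measure of a set of outcomes is the sum of their masses.
        return sum((point_masses[o] for o in set(event)), Fraction(0))

    return P


def satisfies_unit_measure(P):
    """Unit Measure: the probability of the whole sample space is exactly 1."""
    return P(SAMPLE_SPACE) == 1


# Illustrative (made-up) masses; they sum to 1, so P is a probability measure.
P = make_measure({
    "false positive": Fraction(1, 20),
    "true positive": Fraction(9, 20),
    "false negative": Fraction(1, 20),
    "true negative": Fraction(9, 20),
})
print(satisfies_unit_measure(P))               # True
print(P({"true positive", "false negative"}))  # 1/2
```

Exact fractions are used instead of floats so that the check P(sample space) == 1 is not disturbed by rounding.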

This then leads into the topic of classical statistics. In the empirical world, there are physical objects, called "sampling units," each of which has a count of 1. This count of 1 is the physical counterpart of the value of 1 for the probability of the sample space. In a coin flip, the sampling unit is the coin. For the "detect and measure" test of NDT, the question of what the sampling unit is often lacks a satisfactory answer. Lazar Mayants's book "The Enigma of Probability and Physics" might help you better understand the connection between probability theory and the empirical world.
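The link between "each sampling unit has a count of 1" and "the sample space has probability 1" can be sketched in a few lines of Python. The coin-flip record below is hypothetical; because every flip (sampling unit) contributes exactly one outcome, the relative frequencies necessarily sum to 1.

```python
from collections import Counter
from fractions import Fraction


def relative_frequencies(outcome_per_unit):
    """Relative frequency of each outcome, where every sampling unit counts exactly once.

    outcome_per_unit holds one observed outcome label per sampling unit.
    """
    n_units = len(outcome_per_unit)
    if n_units == 0:
        raise ValueError("no sampling units observed")
    counts = Counter(outcome_per_unit)
    return {label: Fraction(c, n_units) for label, c in counts.items()}


# Hypothetical record of five coin flips; the coin is the sampling unit.
flips = ["heads", "tails", "heads", "heads", "tails"]
freqs = relative_frequencies(flips)
print(freqs)                     # {'heads': Fraction(3, 5), 'tails': Fraction(2, 5)}
print(sum(freqs.values()) == 1)  # True: the empirical counterpart of Unit Measure
```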

People working in NDT have undermined their field by assuming that probability theory applies in situations in which the count of each sampling unit is not 1. In this situation, the axiom that says the probability of the sample space is 1 is empirically invalidated. To understand how this can happen in general, I would read Christensen and Reichert's paper "Unit Measure Violations in Pattern Recognition: Ambiguity and Irrelevancy," in the journal Pattern Recognition, Oct. 1976. The phrase "Unit Measure" refers to the axiom of probability theory that says probability's value on a sample space is 1. This axiom, Unit Measure, is what is violated by NDT. To understand how Unit Measure violations arise in NDT, read Oldberg and Christensen's paper "Erratic Measure" at http://www.ndt.net/article/v04n05/oldberg/oldberg.htm ; so far as I know, it is the only published work on this topic.

In the period since "Erratic Measure" was published, I've learned that people (including at least one university professor with a PhD in physics) generally need a more tutorial approach than is available in the published form of the paper. If you need a tutorial, I'd be happy to send you a cost proposal for providing one.

By the way, as your main interest is inspection of girth welds, I should mention that the ASME Section XI Committee has done something interesting on inspection of girth welds in the austenitic piping of nuclear plants. The Section XI Code specifies that, in trials of the inspection techniques, the weld shall be divided into "grading units" and that the purpose of the inspector shall be to assign a value to a property of each grading unit. "Grading unit" is the ASME's term for sampling unit.

This test preserves Unit Measure and generates a statistically valid portrait of the reliability of the inspection. It might be useful for you to set the goal of understanding why this is so and why it is not so for the typical "detect and measure" test. If everyone understood this, we would be well on our way toward placing NDT on a sound, statistical foundation.
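A minimal Python sketch of why a grading-unit design keeps the counts consistent. It is not the ASME Section XI procedure or its acceptance criteria; it assumes, for illustration only, that the property assigned to each grading unit is simply "flawed or not", and the trial data are hypothetical. Each grading unit lands in exactly one of the four categories, so the category counts always add up to the number of units, and POD and probability of false call (PFC) have well-defined denominators.

```python
from collections import Counter
from fractions import Fraction


def classify(truly_flawed, called_flawed):
    """Assign one grading unit to exactly one of the four outcome categories."""
    if truly_flawed:
        return "true positive" if called_flawed else "false negative"
    return "false positive" if called_flawed else "true negative"


def score_grading_units(truth, calls):
    """truth[k], calls[k]: actual state and inspector's call for grading unit k."""
    if len(truth) != len(calls):
        raise ValueError("need exactly one call per grading unit")
    # Each unit contributes a count of exactly 1 to exactly one category,
    # so the counts sum to the number of grading units (Unit Measure).
    return Counter(classify(t, c) for t, c in zip(truth, calls))


def pod_and_pfc(counts):
    """POD = TP / (TP + FN); PFC = FP / (FP + TN); None where a denominator is 0."""
    tp, fn = counts["true positive"], counts["false negative"]
    fp, tn = counts["false positive"], counts["true negative"]
    pod = Fraction(tp, tp + fn) if tp + fn else None
    pfc = Fraction(fp, fp + tn) if fp + tn else None
    return pod, pfc


# Hypothetical trial: a weld divided into six grading units.
truth = [True, True, False, False, False, True]
calls = [True, False, False, True, False, True]
counts = score_grading_units(truth, calls)
print(pod_and_pfc(counts))  # (Fraction(2, 3), Fraction(1, 3))
```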

Terry Oldberg


03:21 Jul-03-2003 David Forsyth R & D TRI/Austin, USA, Joined Nov 2001 ^{41}

Re: Request for references on POD

I have some disagreements with Terry Oldberg's paper, or at least with my understanding of it. Since this is a relevant topic, I would like to take up the discussion, in the hope that this can be enlightening.

The paper is available at: http://www.ndt.net/article/v04n05/oldberg/oldberg.htm

I have no concerns with anything up to the section entitled "Nuclear Confusion". In this section, I agree that the word "defect" has been used incorrectly by many. "Discontinuity" is more appropriate for the description of an NDI "indication" or "hit". A discontinuity is a defect if the structures engineer says so: that is, if the discontinuity renders the object under inspection unable to perform its intended function, such that the object is defective, then the discontinuity is a defect.

This makes the rest of this section difficult to read. Paragraph 14 mixes the word "defect" with the word "indication". My understanding of this paragraph is:

"...If an NDI indication of a defect was within 3 inches of an actual defect (as determined by metallography), this NDI indication was assigned to a true positive. Otherwise, he assigned it to a false negative. He then divided the number of NDI indications he had assigned to a true positive by the total number of defects (as determined by metallography) and assigned this ratio to the Probability of Detection."

Then in paragraph 18, again I find the terminology difficult. My understanding of this paragraph is as follows.

"However, Bradley's rule of evidence produces a conflict with this assumption. It assigns every NDI indication to a true positive if it is within 3 inches of a defect (as determined by metallography). The same rule assigns every such indication to a false negative also."

This is where I do not follow the chain of reasoning: I do not see why the single NDI indication is assigned to both categories. I have not seen a POD study where this is the case. I do not have a copy of Bradley's study at hand to examine.

It is my understanding that this double counting is what Oldberg and Christensen identify as the problem with most POD studies. I do not think this is done in most POD studies.

I hope this stimulates some discussion!

Regards, Dave


02:41 Jul-04-2003 Terry Oldberg Engineering, Mechanical Electrical Nuclear Software Consultant, USA, Joined Oct 1999 ^{42}

Response to Dave's questions about "Erratic Measure"

Hi Dave

Thanks for the opportunity to clarify "Erratic Measure."

On the semantic problems, the paper uses "defect" in the sense that you use "discontinuity." It uses "indication" to designate a subset of the inspection data that is felt by the inspector to constitute an image of a defect.

From now on, we need to adopt a common usage of these terms in order to communicate without confusion. I'll assume you are adopting the usage of "Erratic Measure" unless you tell me otherwise.

I gather that something in addition to semantic confusion is bothering you. I think you are wondering why, in its analysis of Bradley's data, the paper assigns each detected defect to both a true positive and a false negative. Bradley did not do this: he assigned each detected defect only to a true positive. The double counting is a feature of "Erratic Measure" and not of Bradley's methodology. My co-author and I are the culprits in the double counting, not Bradley.

The reason lies in the rule which Bradley and his colleagues used in classifying events. They assigned a defect to a true positive if it was within 3 inches of an indication and to a false negative otherwise. However, it can be determined from Bradley's data that, in each case in which a defect was within 3 inches of an indication, this same indication was within 3 inches of at least one more defect.
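The classification rule just described can be sketched in Python to show how a Unit Measure violation shows up in the data. The axial positions are hypothetical, and treating "within 3 inches" as an inclusive bound is an assumption. The second function counts how many defects each indication could be credited with under the same rule; any count above 1 means one indication is standing in for several sampling units.

```python
WINDOW_IN = 3.0  # the 3-inch tolerance described in the thread


def classify_defects(defects, indications, window=WINDOW_IN):
    """3-inch rule as described above: a defect is a true positive if it lies
    within `window` inches of any indication, otherwise a false negative."""
    return [
        "true positive" if any(abs(d - i) <= window for i in indications)
        else "false negative"
        for d in defects
    ]


def defects_near_each_indication(defects, indications, window=WINDOW_IN):
    """How many defects each indication could be credited with under the same rule."""
    return [sum(1 for d in defects if abs(d - i) <= window) for i in indications]


# Hypothetical axial positions (inches) of defects found by metallography and
# of eddy current indications.
defects = [10.0, 12.5, 30.0, 50.0]
indications = [11.0, 48.0]
print(classify_defects(defects, indications))
# ['true positive', 'true positive', 'false negative', 'true positive']
print(defects_near_each_indication(defects, indications))
# [2, 1]: the indication at 11.0 in. "detects" two defects at once
```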

That a single indication detected two or more defects is a violation of the Unit Measure axiom of probability theory. Thus, although Bradley used it, probability theory cannot be used as a description of the inspection reliability in this context.

There are many solutions to the problem of describing a statistical system in the presence of Unit Measure violations. The solution that is used in "Erratic Measure" borrows an idea from Christensen and Reichert's paper "Unit Measure Violations in Pattern Recognition: Ambiguity and Irrelevancy" (in the journal Pattern Recognition, Oct. 1976).

One abandons Unit Measure but retains measure theory plus the feature of probability theory in which the probability of an event varies between 0 and 1. The new measure, called "pseudoprobability" in "Erratic Measure," expands and contracts in length like a rubber yardstick. With this particular yardstick, each detected defect generates a count of 1 for a true positive and a count of 1 for a false negative.
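A toy tally in Python of the counting rule just stated. It is not the full pseudoprobability formalism of "Erratic Measure"; it only shows the arithmetic consequence of double counting. The assumption that an undetected defect adds 1 to "false negative" only is this sketch's, and the data are hypothetical. With n defects of which d are detected, the counts total n + d, so the total per defect is 1 + d/n rather than 1.

```python
from collections import Counter
from fractions import Fraction


def double_count_tally(detected_flags):
    """Tally per the counting rule described above: a detected defect adds 1 to
    'true positive' AND 1 to 'false negative'; an undetected defect adds 1 to
    'false negative' only (the latter is an assumption of this sketch)."""
    tally = Counter()
    for detected in detected_flags:
        if detected:
            tally["true positive"] += 1
        tally["false negative"] += 1
    return tally


# Hypothetical study: four defects, three of them detected.
flags = [True, True, False, True]
tally = double_count_tally(flags)
total = Fraction(sum(tally.values()), len(flags))
print(dict(tally))  # {'true positive': 3, 'false negative': 4}
print(total)        # 7/4: the categories no longer sum to 1 per defect
```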

If you think this sounds illogical, you are correct: While probability theory is consistent with classical logic, pseudoprobability theory is not. The source of the illogic is not the authors of "Erratic Measure," however. The source is the designers of the test.

"Erratic Measure" exposes another way in which tests violate Unit Measure that you didn't mention. I'd be glad to discuss this with you if you wish.

One or the other or both ways of violating Unit Measure are present in many and perhaps all defect detection tests. This is very undesirable, for it limits what we can know about these tests' reliability. The best long-term solution is to redesign NDT's tests so they all preserve Unit Measure.

Terry


08:06 Jul-08-2003 Dave Forsyth R & D TRI/Austin, USA, Joined Nov 2001 ^{41}

to POD or not to POD

Thanks Terry for the quick and detailed response. I have some points which you could clarify for me. I have not quoted your response for the sake of brevity; it is available on the forum at http://www.ndt.net/wshop/forum/messages-1/5684.html

I will use the term "defect" below to remain consistent with your paper, although I prefer "discontinuity". As an example, the Airbus A300 operating American Airlines Flight 587 had a number of known discontinuities in its vertical stabilizer, but it was not defective.

I have not read Bradley's paper, but it sounds like they were not very stringent in deciding whether an NDI indication actually corresponded to a defect. That certainly can artificially inflate POD. I am not sure what can be gained from the pseudoprobability analysis; I do not know how it could be used in a risk assessment, which is usually the goal of the whole exercise of estimating NDI reliability via POD.

If the problem is this rule for assigning NDI indications to physical defects, do you feel it is possible to make the rule more rigorous and so satisfy probability theory?

You do imply there are other issues: "'Erratic Measure' exposes another way in which tests violate Unit Measure that you didn't mention. I'd be glad to discuss this with you if you wish."

I would like to hear more on this, I think it is relevant to the forum.

09:54 Jul-08-2003 Terry Oldberg Engineering, Mechanical Electrical Nuclear Software Consultant, USA, Joined Oct 1999 ^{42}

Re: to POD or not to POD

Dave:

The situation in Bradley's research was that the position of the eddy current probe was known to the researchers only to within plus or minus 3 inches, measured along the axis of the tube being inspected. Bradley assumed (quite illogically) that any defect that could have been detected by an indication was detected by it. However, whenever a single large defect was within 3 inches of an indication, this indication was within 3 inches of at least one other large defect, plus many small defects and unknown numbers of defects left undiscovered by the limited amount of metallography that Bradley performed. Thus, corresponding to every certain event of a true positive was not the single sampling unit required by probability theory but many sampling units.

The violation of probability theory would have been averted had the inspectors been required to estimate the x,y,z coordinates of a point within the detected defect and had the researchers counted a defect as detected if and only if it had contained such a point. This, however, would have resulted in a POD for large defects of close to 0. Bradley's methodology extracted a value for the POD of a large defect that was close to 1. However, this "probability" was not a probability because of the violation of probability theory.
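The stricter rule suggested above (a defect counts as detected if and only if it contains a point reported by the inspector) can be sketched in Python. Approximating each defect as an axis-aligned box, and the coordinates themselves, are assumptions made only for illustration; real flaw geometry is more complicated. As long as defects do not overlap, each reported point lies inside at most one defect, so no point is credited to more than one sampling unit.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


@dataclass(frozen=True)
class DefectBox:
    """Hypothetical defect volume, approximated as an axis-aligned box (x, y, z)."""
    lo: Point
    hi: Point

    def contains(self, p: Point) -> bool:
        return all(l <= x <= h for l, x, h in zip(self.lo, p, self.hi))


def detected(defects: Sequence[DefectBox], reported: Sequence[Point]) -> List[bool]:
    """A defect counts as detected if and only if it contains a reported point."""
    return [any(d.contains(p) for p in reported) for d in defects]


def defects_per_point(defects: Sequence[DefectBox], reported: Sequence[Point]) -> List[int]:
    """With non-overlapping defects, each reported point lies in at most one defect."""
    return [sum(d.contains(p) for d in defects) for p in reported]


defects = [DefectBox((0, 0, 0), (1, 1, 1)), DefectBox((5, 0, 0), (6, 1, 1))]
reported = [(0.5, 0.5, 0.5), (3.0, 0.5, 0.5)]  # second point misses both defects
print(detected(defects, reported))           # [True, False]
print(defects_per_point(defects, reported))  # [1, 0]
```

As the post notes, a rule this strict can drive the measured POD for large defects close to 0 when the reported positions are imprecise; the sketch only shows that it keeps the counting consistent.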

The seeds of this same category of error are present in trials of defect detection methods in general, because real materials contain a very high density of defects (most of them small and non-threatening) and modern instrumentation often lacks a mechanism for determining which defect is being detected.

The other way in which Bradley's results violate probability theory relates to the certain event of a false positive OR a true negative (logical OR implied). Such events were present but the number of sampling units corresponding to each such event was 0 and not the 1 that was required by probability theory. Thus, in this case also, probability theory was empirically invalidated.

This latter type of violation seems to exist in all published research on the reliability of defect detection methods. How to define the sampling units underlying the probability of false call seems to be an unsolved problem.
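A short Python sketch of why the missing sampling units matter for the probability of false call. It assumes, hypothetically, that a study could state how many defect-free sampling units it contains, which, as noted above, is exactly what is usually missing. When that number is 0, false calls may still be observed, but there is no denominator and the ratio is not a probability.

```python
from fractions import Fraction
from typing import Optional


def probability_of_false_call(false_calls: int, defect_free_units: int) -> Optional[Fraction]:
    """Estimate PFC as false calls per defect-free sampling unit.

    Returns None when the test design defines no defect-free sampling units:
    false calls may still be observed, but there is no denominator.
    """
    if defect_free_units == 0:
        return None
    if not 0 <= false_calls <= defect_free_units:
        raise ValueError("each defect-free unit can yield at most one false call")
    return Fraction(false_calls, defect_free_units)


print(probability_of_false_call(5, 0))   # None: no sampling units defined
print(probability_of_false_call(2, 40))  # 1/20, e.g. with 40 hypothetical grading units
```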
