AI-Supported Segmentation of Industrial CT Data
11th Conference on Industrial Computed Tomography, Wels, Austria (iCT 2022), www.ict-conference.com/2022

The segmentation of datasets has always been a vital part of many image and volume processing applications, especially for tomography data. Common approaches use either manually tuned methods or rely on techniques like deep learning. However, such methods are often only suitable for segmenting components of one well-defined type, which mostly suffices for clinical data but no longer for industrial applications. In order to overcome these limitations we propose an interactive approach to volumetric segmentation encompassing robust classifiers and localized volume processing. The resulting algorithm is flexible enough to be used for a broad variety of different segmentation tasks while still generating high-quality results. The local processing further enables the segmentation of larger volumes which cannot be handled by existing applications.


Introduction
In the field of image processing for computed tomography (CT), segmentation is a crucial step in many applications. Segmentation means the partitioning of a volumetric dataset into non-overlapping regions. Either the entire dataset is split into different parts, or only selected components are extracted from the (three-dimensional) image; we focus on the latter. Prominent examples include the detection and extraction of organs from human CT scans and their subsequent analysis for diseases. Detecting the exact location of organs or other biological material is also very important for planning minimally invasive operations. In industrial computed tomography, the segmentation of objects is of interest as well, e.g., for visual inspection of deformed components, comparisons with simulations or applications in dimensional metrology. One of the major challenges in the industrial use case is the size of the scans: For a clinical dataset, the size typically ranges from several hundred Megabytes up to a few Gigabytes. For industrial CT, the sizes can be significantly bigger due to higher resolutions and higher radiation levels, and with the ever-increasing effort invested in improving the scanners, they are likely to keep growing. In this field, entire cars or small airplanes may be scanned at once [1], with scan sizes ranging from several hundred Gigabytes up to Terabytes. Yet another challenge is the much higher variability of objects of interest in industrial scans compared to use cases where only a specific organ shall be segmented. Here, a multitude of different objects may be of interest to users, which necessitates new and very general segmentation methods. Therefore, algorithms developed for clinical applications are unlikely to be applicable to such scans, due to memory requirements not accounted for in clinical scenarios and due to the large variety of different objects.
Moreover, these algorithms may also explicitly depend on the object class, such as tumors, which of course should not be searched for in, e.g., a scan of a car. In order to overcome some of these problems, machine learning is applied to create robust segmentation models. However, classical algorithms like adaptive thresholding, cf. [2,3], often do not produce good results due to imaging artifacts. Others, including works on atlas-based models [4,5], graph-based methods [6,7], evolving contours [8,9] or neural networks [10-12], require either big training datasets, a lot of memory at once, or both. Since the sizes of CT scans are growing much faster than hardware capabilities, and since annotated training data is hardly ever available, there is a trend towards interactive procedures, e.g., [13-15]. The main focus of such publications, however, lies on flexibility rather than the processing of large volumes. This paper introduces a segmentation procedure which combines interactive user input, local processing and machine learning methods. The interactivity allows the same algorithm to be applied flexibly across a variety of domains without the need to change the algorithm, while robust classifiers enable stable results and the purely local processing ensures that the method stays applicable to volumes without size restrictions. To the best of our knowledge, this work is one of the first to be explicitly designed for generality and for processing large and unique computed tomography scans, i.e., scenarios where no training data is available.
The organization of this paper is as follows: In Section 2, we describe our proposed segmentation procedure and apply it to an example dataset to demonstrate the interactive improvement over several iterations. Section 3 presents an experimental evaluation of our method on selected datasets as well as qualitative results from different domains, and Section 4 concludes our contributions.

Method Description
In our approach, we apply pool-based active learning, which is depicted in Figure 1. There, an oracle, e.g., a human operator, is required to select a set of seed voxels, i.e., voxels which the oracle deems useful for distinguishing between the object that shall be segmented and other regions. Additionally, the oracle needs to label them accordingly, i.e., either positively (belongs to the components under investigation) or negatively (does not belong to said components). In a next step, parameters can be set, which include the environment size K1 ∈ 2ℕ + 1 (i.e., odd) and the set of features to compute. Alternatively, the most suitable subset of features could be selected automatically, e.g., by the method presented in [16]. Then, local features are computed. In detail, a local environment of size K1 × K1 × K1 is extracted around each selected seed voxel. From the voxels contained in this region, features are extracted and concatenated into a feature vector. The feature set can be application-dependent; ours includes but is not limited to grayscale value statistics [17], a slightly modified version of the popular Local Binary Pattern [18] and inertia tensor features [19]. In practice we suggest that a domain expert selects a subset of such local features based on a priori knowledge. Using the overall set of obtained feature vectors, a model is trained with the goal of discriminating between the positive and negative feature instances. In our application, we chose a Support Vector Machine for this purpose since it is a binary classifier with beneficial properties, including an inherent sparsity and the option of producing calibrated probability estimates as predictions. The training procedure also includes a K-fold cross validation step for hyperparameter tuning. After training completes, the segmentation phase starts.
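The training stage described above can be sketched as follows. This is a minimal illustration, not the Fraunhofer implementation: it assumes scikit-learn, uses plain grayscale statistics as a stand-in for the full feature set, and the function names and hyperparameter grid are hypothetical.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

def local_features(volume, center, k):
    """Extract simple grayscale statistics from the k x k x k
    environment centered at `center` (assumed fully inside the volume)."""
    z, y, x = center
    r = k // 2
    env = volume[z - r:z + r + 1, y - r:y + r + 1, x - r:x + r + 1]
    return np.array([env.mean(), env.std(), env.min(), env.max()])

def train_seed_classifier(volume, seeds, labels, k=5):
    """Train a probability-calibrated SVM on labeled seed voxels,
    tuning hyperparameters via K-fold cross validation (grid search)."""
    X = np.stack([local_features(volume, s, k) for s in seeds])
    grid = {"C": [0.1, 1.0, 10.0], "gamma": ["scale", 0.1]}  # illustrative grid
    clf = GridSearchCV(SVC(probability=True), grid, cv=3)
    clf.fit(X, np.asarray(labels))
    return clf
```

Choosing `SVC(probability=True)` yields the calibrated probability estimates that the later uncertainty-sampling step relies on; any feature set of fixed length can be substituted in `local_features`.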
There, we iterate in parallel over each voxel in the input volume and extract a local environment of a given size around it. For voxels where no full environment can be extracted, i.e., at the border regions of the volume, we set the value at these voxel positions in the output volume to zero. Given a local environment, we compute the same set of features, and the trained model predicts a probability estimate indicating the likelihood that the currently processed voxel shall be segmented. This confidence value is assigned to the voxel position in the output volume. When the segmentation phase finishes, i.e., after one linear pass over the volume, the intermediate result can be evaluated, e.g., by means of visual inspection. If the result is satisfactory, the procedure ends. If not, the oracle can provide additional seed voxels which, together with the previously selected ones, form the new training dataset, and the method starts again. An open question is how to choose the samples the oracle shall label. A variety of methods exist, including uncertainty sampling or estimated error reduction [20]. However, given our use case of large industrial CT scans, only uncertainty sampling seems computationally feasible, i.e., we need to identify samples or regions the model is uncertain about. Fortunately, the probability estimates produced by the Support Vector Machine already hint at the certainty of the model about each voxel. Furthermore, given a confidence value c ∈ [0, 1], an uncertainty value u ∈ [0, 1] can be computed by considering that the critical confidence lies at 1/2, i.e., when the SVM prediction lies close to the trained separating hyperplane. In our setting, we compute u = exp(−g(|c − 1/2|)) for some continuous and monotonically increasing function g : [0, 1/2] → ℝ≥0 such that g(0) = 0 and lim x→1/2 g(x) = ∞. An example of the interplay of interactive seed voxel selection, predicted confidences and uncertainties is depicted in Figure 2.
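One admissible choice for g, used here purely as an illustration (the paper does not fix a particular g), is g(x) = −ln(1 − 2x): it is continuous and monotonically increasing, satisfies g(0) = 0, diverges as x → 1/2, and makes the uncertainty collapse to the closed form u = 1 − 2|c − 1/2|:

```python
import numpy as np

def uncertainty(c):
    """Uncertainty u = exp(-g(|c - 1/2|)) for the choice g(x) = -ln(1 - 2x).

    Since exp(-(-ln(1 - 2x))) = 1 - 2x, this reduces to u = 1 - 2|c - 1/2|:
    maximal (u = 1) at the critical confidence c = 1/2 and minimal (u = 0)
    for fully confident predictions c = 0 or c = 1.
    """
    c = np.asarray(c, dtype=float)
    return 1.0 - 2.0 * np.abs(c - 0.5)
```

Applied voxelwise to the confidence volume, this yields an uncertainty map of the kind discussed above; other choices of g merely reshape how quickly u decays away from c = 1/2.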
There, we applied our interactive procedure to a Mahle motor piston dataset where our goal was to segment the steel ring from the aluminium piston body. Initially, we selected two seed voxels inside the ring region and marked them positively, and two outside of it which were marked negatively. The selection is shown in the form of blue diamonds. After applying the scheme described above, we obtain an output volume consisting of probability estimates which show that the model needs more samples, as the confidences are higher outside the ring than in it. The uncertainty estimations also reflect this, as they are quite high all over the volume. In the next iteration (cf. Figure 2b) we selected additional seed voxels (blue diamonds) and combined them with the previous selection (red circles). The results improve considerably, as shown by the confidences and uncertainties. The next two iterations stabilize the result and the uncertainty is nearly gone, thus we can stop labeling voxels at this point. Simple postprocessing on the confidences yields the desired result.

Results
In this section, we apply our interactive segmentation method to selected datasets and assess the segmentation performance both quantitatively and qualitatively. All datasets were generated by the Fraunhofer Development Center for X-ray Technology in Fürth, which also holds the courtesy for the data. Exceptions to this are the scans labeled "Ring" and "Rumpf", which belong to a scan of a Komet Me-163 jet, courtesy of Deutsches Museum München, Germany, and parts of a Peruvian mummy, courtesy of the Lindenmuseum in Stuttgart, Germany.

Quantitative Results
For a quantitative evaluation, we selected seven datasets for which ground truth data was available. These are depicted in Figure 3 and Figure 5a. In case of the piston dataset, we aimed to segment the steel ring from the aluminium piston as in our example above. The same was attempted for the "Big Piston" scan, consisting of two copies of the same piston and an additional region of interest from another scan. This scan was created with the intent of enlarging the volume size while also inserting components with a similar grayscale distribution, so that components are not selected by their grayscale value alone. For the "Tasterwald" scan, the goal was to extract the small spheres, while for "TPA" the shown part should be segmented in the presence of high noise variance (not displayed for visualization purposes). Finally, for the "Ring" and "Rumpf" parts we aimed to segment the components overlaid in white. For each volume, we additionally selected a set of seed voxels just as we would interactively, and fixed that set as well as a dataset-dependent set of features. Using this, we applied our proposed segmentation procedure, after which we applied speckle removal. That is, we iterate over the confidence value result and set to zero all voxels which have fewer than η nonzero neighbors in a K2 × K2 × K2 environment centered around them. The used parameters are listed in Table 1. Finally, a connected components analysis groups voxels, and the components overlapping with positively labeled voxels are extracted. These components are then evaluated against the ground truth data using the well-known performance metrics of Intersection over Union (IoU), precision, recall and the F1 score. The quantitative results are listed in Table 2. We see that, according to these metrics, good segmentation results can be obtained using our approach. At this point, we want to remind the reader that voxelwise comparison metrics are sensitive to individually misclassified voxels.
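The speckle-removal step and the voxelwise metrics can be sketched as follows (numpy only; the parameter names k and eta follow the text's K2 and η, everything else, including counting the center voxel among its neighbors, is an illustrative assumption):

```python
import numpy as np

def remove_speckles(conf, k=3, eta=5):
    """Zero out voxels that have fewer than `eta` nonzero voxels in the
    k x k x k environment centered around them (center counted as well)."""
    nz = (conf > 0).astype(np.int32)
    r = k // 2
    padded = np.pad(nz, r)
    counts = np.zeros_like(nz)
    for dz in range(k):  # sum the k^3 shifted copies of the indicator volume
        for dy in range(k):
            for dx in range(k):
                counts += padded[dz:dz + nz.shape[0],
                                 dy:dy + nz.shape[1],
                                 dx:dx + nz.shape[2]]
    out = conf.copy()
    out[counts < eta] = 0
    return out

def voxel_metrics(pred, truth):
    """IoU, precision, recall and F1 between binary voxel masks."""
    tp = np.sum(pred & truth)
    fp = np.sum(pred & ~truth)
    fn = np.sum(~pred & truth)
    iou = tp / (tp + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return iou, precision, recall, f1
```

The shifted-copy summation keeps memory usage at a few integer volumes, in line with the local-processing theme; a dedicated 3D filter would serve equally well.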
For further datasets we measured the execution times of our approach, which was implemented purely CPU-based inside a Fraunhofer framework and executed on a Dell Precision Tower 3620 with an Intel® Core™ processor with eight logical cores and an average clock rate of 4.2 GHz. The volume sizes are specified in Table 3. The measurements are depicted in Figure 4a. Additionally, we also specified dataset-dependent thresholds below which all voxels are skipped, so that no classification computations need to be made there. That way, the processing can be accelerated significantly in interactive applications, as reflected in Figure 4b. In both images, the feature set encompassed grayscale value statistics and inertia tensor features [19]; the runtime may increase or decrease depending on the computational complexity of the features, as they have to be computed for every processed voxel environment. The bar plots per volume coarsely indicate that our algorithm scales linearly with the number of voxels. However, note that additional implementation details might influence the measurements, including caching done by the operating system or scheduling operations for memory access. Overall, we achieve good runtime efficiency given the large dataset sizes on a purely CPU-based system.
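The skip-threshold acceleration can be sketched as follows. This is a minimal, single-threaded illustration of the idea (the actual implementation iterates in parallel), and `predict_fn` is a hypothetical stand-in for feature computation plus SVM prediction:

```python
import numpy as np

def segment_with_skip(volume, predict_fn, skip_threshold, k=5):
    """Voxelwise segmentation with a grayscale skip threshold: border
    voxels without a full k x k x k environment and voxels below the
    threshold receive confidence 0 without invoking the classifier."""
    out = np.zeros(volume.shape, dtype=np.float32)
    r = k // 2
    for z in range(r, volume.shape[0] - r):
        for y in range(r, volume.shape[1] - r):
            for x in range(r, volume.shape[2] - r):
                if volume[z, y, x] < skip_threshold:
                    continue  # skipped: no features computed here
                env = volume[z - r:z + r + 1,
                             y - r:y + r + 1,
                             x - r:x + r + 1]
                out[z, y, x] = predict_fn(env)
    return out
```

Since feature computation dominates the per-voxel cost, the speedup is roughly proportional to the fraction of voxels falling below the threshold, e.g., background air in a CT scan.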

Qualitative results
In addition to the quantitative evaluation, we now apply our method to unique scans where no training data is present. As before, we selected seed voxels interactively and defined a dataset-dependent skip threshold interactively as well. The configuration is listed in Table 4. For each result, we applied speckle removal and a connected component analysis as postprocessing. The results are visualized in Figure 5.
In contrast to segmenting the springs in the "Ford" scan, we now apply the same algorithm with a similar parametrization, yet with different seed voxels, on that scan in order to segment the windshield. The result shows that this thin planar structure can be detected well by our algorithm. In a similar fashion, we applied the method to segmenting a deformed part of a CT scan of a Honda Accord. Although that part has a very complex global structure, local patches can be successfully recognized and discriminated from other regions in the scan, thus allowing a good result. Outside of the industrial domain, we further used our interactive method to extract a wheat plant from a scan of the plant inside a pot filled with earth. The challenge there lies in the complex geometries of the roots and the low contrast between the roots and the surrounding earth. Even without dedicated models tracking root geometry, we were able to capture most of the finer roots well by just interactively selecting voxels inside and outside of them. In the final example, we used our proposed scheme to segment the ropes which hold a Peruvian mummy together; the result shows both the coarse ropes forming a net around the body and also the fine ones wrapped around its neck.

Comparisons to Related Works
In addition to our quantitative and qualitative results, we compared our algorithm to similar segmentation methods implemented in the freely available tools open_iA, FIJI and Slicer3D. Specifically, we tried to segment the springs of the "Ford" scan, but downsampled by a factor of two in each direction, thus processing a volume of roughly 700 Megabytes, since none of these tools were able to process volumes of the original 5 Gigabyte size on the evaluation hardware described above. With our method (in conjunction with postprocessing as described earlier), we were able to segment all four springs by just marking positive instances in one spring and some negative instances around it. The resulting evaluation yields a precision of 0.912, a recall of 0.893, an F1 score of 0.902 and an Intersection over Union of 0.822. In the tool open_iA, we chose the probabilistic voxelwise segmentation via SVM. Their implementation uses the current voxel value exclusively, and hyperparameters have to be set manually, which is no easy task on its own. We varied the latter over a predefined set similar to the grid of parameters from which our hyperparameters are chosen automatically using a grid search approach. In addition, we used both the same seed voxel set as in our experiment and also newly chosen ones. Regardless of the combination of settings, the achieved result was an unstructured set of nonzero probability estimates located all over the surface of the object, not identifying any springs at all. Next, we applied the region growing algorithm inside Slicer3D, where we marked seeds inside the springs and outside with respective class labels. Contrary to our approach, a user needs to specify seeds for each spring, instead of finding all four with just the information learned from one.
The implementation itself has trouble processing the growing process for larger volumes, which is the primary reason we downsampled the input volume to 700 Megabytes in the first place. For this tool, we obtained a precision of 0.939, a recall of 0.699, an F1 score of 0.801 and an IoU of 0.668. These results are not bad, but inferior to our segmentation performance in this case. Finally, we used the SVM-based classifier inside FIJI, which interprets the volume as a stack of images. The classifier allows for selecting a feature set interactively, including grayscale statistics and, e.g., Hessian eigenvalue features. The major drawback of FIJI is that an image stack of the same size as the input volume is constructed in memory and filled with feature vectors. Thus, a massive memory overhead is incurred: FIJI required 4.7 Gigabytes of main memory for a 78 Megabyte volume, and our already downsampled Ford scan was not processable. Overall, all three popular and freely available software tools are either not well suited for segmenting industrial computed tomography data or have deficiencies when processing large voxel volumes.

Conclusion
This work introduced an interactive segmentation approach based on a Support Vector Machine. Its key characteristic is a purely local processing scheme which enables the segmentation of volumes without size restrictions. Additionally, the interactivity allows passing the problem description and domain knowledge ad hoc into the system, and thus the very same method can be applied to a variety of domains. A quantitative and qualitative evaluation demonstrated that very good results can be obtained while the user interaction stays intuitive, and that repeated application of our algorithm can improve segmentation results continuously. We finally want to emphasize that our method is especially well suited to segment scans for which no ground truth data exists, and the good results indicate that this method may very well be used for efficient ground truth data generation requiring only minor manual correction.