Synthetic X-ray Image Generation for Non-Destructive Testing using Generative Adversarial Networks

Developing and optimizing a Machine Learning (ML) based Non-Destructive Testing (NDT) tool for automotive manufacturing is challenging due to the need for a significant amount of annotated training data. Obtaining such a quantity of real X-ray images covering different defect types is difficult and expensive. Generating synthetic images with realistic defects is therefore a viable option to fill this gap. This contribution presents a novel approach that harnesses Generative Adversarial Networks (GANs) to simulate industrial X-ray images of specific parts of a car wheel (rim and spokes) with defects and their annotations. The presented method draws inspiration from techniques used in medical radiography as well as in the NDT field for simulating synthetic X-ray images. We propose a novel deep neural network architecture for generating high-resolution X-ray images of 1024 x 512 pixels with porous defects at specified locations. These images are generated automatically from edge masks of the desired structure. We conducted separate evaluations for both the generated images and the included defects. To assess the generated image quality, we used image properties and structural similarity metrics; the generated images achieved an MSSIM index of over 0.90. We analyzed the local gray-scale profile of synthetic images to evaluate the quality of the generated defects. A state-of-the-art defect detector (ISAR, developed by Fraunhofer EZRT) was used to estimate defect detectability at various IoU thresholds.


Introduction
The process of die casting involves the injection of molten metal alloy under high pressure to fabricate diverse products, ranging from structural components of vehicles to wheels. During manufacturing, defects such as gas porosity, shrinkage, and cavities can compromise product quality. Industrial Radiography (IR) is a widely employed NDT method for assessing product quality. Fig. 2 depicts an example of an X-ray projected, contrast-enhanced image of a wheel spoke, with small red boxes indicating defect areas identified through automated NDT inspection. Automated inspection utilizing classical image processing algorithms has so far been standard practice in the casting industry [1]. However, this approach has limitations, including a steep learning curve for end-users and the time-intensive process of parameterizing algorithms. Recent years have witnessed a paradigm shift from image-processing-based analysis toward Machine Learning (ML) based solutions, including Deep Learning (DL), which exhibit remarkable capabilities in addressing the complexities of defect detection. ML-based solutions promise to overcome the challenges posed by classical image processing in NDT. However, their implementation relies heavily on the availability of appropriately labeled image data, which is often not readily accessible. Transitioning from classical image processing to ML-based solutions necessitates a high-quality dataset that accurately captures the location, size, and shape of defects in scanned images. Unfortunately, publicly available datasets meeting these criteria are scarce, and proprietary datasets are typically inaccessible. This scarcity underscores the need for a method to generate synthetic image datasets suitable for training ML-based models in the NDT domain. Synthetic images, artificially generated by computer software or computational methods, have already gained traction in various fields, including agriculture, healthcare, and NDT. The challenge lies in generating these synthetic datasets through computer graphics, classical simulations, image augmentation techniques, or generative ML/DL methods. While classical simulation-based methods offer finer details (e.g., Monte-Carlo simulations [2]), they have limitations such as high time consumption and limited variability. In recent years, researchers have explored the potential of Generative Adversarial Networks (GANs) to generate synthetic image datasets, particularly in the field of medical image processing. This approach has shown promise for various tasks, such as augmenting chest X-rays for the detection and classification of lung lesions [3]. Notably, GANs have been used to address class imbalance in chest X-ray classification tasks [3] as well as to detect malignancy in mammography images [4]. Recent advancements in GANs, particularly in generating high-quality and diverse X-ray images with anomalies in medical imaging, demonstrate the potential of generating high-quality image datasets for X-ray-based NDT. This introduces an exciting prospect for enhancing the robustness and breadth of datasets used in ML-based model development for NDT applications.

Principles and methods
Introduced by Ian Goodfellow et al. (2014), GANs have significantly advanced synthetic image generation [5]. GANs comprise a generator and a discriminator that engage in a two-player minimax game, forming an adversarial relationship in which one network creates images while the other learns to distinguish real from fake ones. Despite the realism of their outputs, GANs face challenges such as training instability and mode collapse. Ongoing research has led to specialized variants, such as the Pix2PixHD GAN, an enhanced version developed by Wang et al. [6]. Notable modifications include a coarse-to-fine generator, a multi-scale discriminator, improved training stability, and instance normalization. The Pix2PixHD GAN outperforms its predecessors, generating high-resolution images up to 2048 x 2048 pixels with finer details. The DetectionGAN by Posilović et al. [7], a derivative of Pix2PixHD, excels, e.g., in the creation of synthetic ultrasonic scans and thereby improves object detector performance for defect detection. This image-to-image GAN utilizes a binary position mask to determine defect position and size in the synthetic images, which also serves as ground truth for defect detector evaluation. Inspired by these works, our study employs a similar architecture.
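The two-player minimax game mentioned above can be written as the value function from the original GAN formulation [5], where G is the generator, D the discriminator, and z a latent noise vector:

```latex
\min_G \max_D V(D, G) =
  \mathbb{E}_{y \sim p_{\mathrm{data}}(y)}\!\left[\log D(y)\right]
  + \mathbb{E}_{z \sim p_z(z)}\!\left[\log\!\left(1 - D(G(z))\right)\right]
```

The discriminator maximizes this value by assigning high scores to real images y and low scores to generated images G(z), while the generator minimizes it by producing images the discriminator cannot tell apart from real ones.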

Proposed Architecture
The GAN architecture, as depicted in Fig. 1, integrates two generators designed for distinct tasks. The spoke generator takes an edge image as input, producing a detailed spoke image, and the defect generator utilizes the spoke image and a mask to generate a final image with defects in the specified region. Four discriminators are strategically employed, comprising medium and large patch discriminators for the spoke generator and small and medium patch discriminators for the defect generator.
The discriminators are finely tuned to their respective tasks to optimize performance. Spoke discriminators exclusively evaluate real and generated images, emphasizing the generation of detailed spoke images. Conversely, defect discriminators assess real, generated, and mask images, focusing on finer details for precise defect generation. The training process involves generating loss values from the discriminators for real and generated images, weighted accordingly to train the generators effectively. Drawing inspiration from Pix2PixHD, multiple PatchGAN-based discriminators with varying receptive field sizes (34 x 34, 70 x 70, and 142 x 142 pixels) are employed, referred to as small, medium, and large patch discriminators, each serving a distinct purpose. Aligned with the principles of the U-Net [8], the generator architecture employs strided and transposed convolutions, presenting advantages over traditional max-pooling and up-convolution techniques. This deliberate design enhances the overall efficiency of the generators, contributing to the success demonstrated across diverse experiments with the proposed GAN architecture.
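The stated receptive field sizes can be reproduced with the standard backward recursion for stacked convolutions. The layer configurations below are an illustrative assumption (4 x 4 kernels in the pix2pix/PatchGAN style, not taken from the paper), chosen so that the resulting receptive fields match the reported 34 x 34, 70 x 70, and 142 x 142 pixel patches:

```python
def receptive_field(layers):
    """Receptive field of a conv stack, given as a list of (kernel, stride).

    Works backwards from a single output unit: r <- s * r + (k - s).
    """
    r = 1
    for k, s in reversed(layers):
        r = s * r + (k - s)
    return r

# Hypothetical PatchGAN-style stacks of 4x4 convolutions (assumed, for illustration):
small  = [(4, 2), (4, 2), (4, 1), (4, 1)]                   # -> 34
medium = [(4, 2), (4, 2), (4, 2), (4, 1), (4, 1)]           # -> 70
large  = [(4, 2), (4, 2), (4, 2), (4, 2), (4, 1), (4, 1)]   # -> 142

print(receptive_field(small), receptive_field(medium), receptive_field(large))
```

Each additional stride-2 layer roughly doubles the patch size a discriminator judges, which is why combining several patch scales lets the architecture penalize both local texture errors and larger structural errors.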

Dataset
This study utilizes an internal dataset from Fraunhofer EZRT, comprising X-ray projected images focusing on the spokes of a specific wheel design (Fig. 2), with each image containing at least one annotated defect. The dataset encompasses 2,170 gray-scale images, each with dimensions of 1024 x 512 pixels, recorded using a flat-panel X-Eye4020 detector with a pixel size of 300 μm and a bit depth of 16 bits. Annotations include defect locations, bounding box information, and defect sizes, predominantly featuring porosities. Fig. 3 provides an illustrative representation of patches with various types of defects, emphasizing the dataset's porosity-centric nature. Fig. 4 shows the distribution of defect sizes (blue) and bounding box areas (orange), highlighting that most defects are smaller than 80 pixels.

Loss Function
The discriminator model is trained using the Binary Cross Entropy (BCE) loss function given in Equation (1), denoted as the 'adversarial loss', to distinguish between real and fake images. In addition to the adversarial loss, the generator is also trained using the mean absolute error, also known as the L1 loss (Equation (2)) and denoted as the reconstruction loss, which penalizes differences between real and generated images. Combining the adversarial and reconstruction losses yields the generator's overall loss function, as shown in Equation (3).
Here, y and ŷ represent the training image and the generated image ŷ = G(x, z), where G is the generator conditioned on the edge image x and the defect mask image z.
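The referenced equations are not reproduced in this excerpt; a plausible reconstruction from the definitions above, with an assumed weighting factor λ for the reconstruction term as in Pix2Pix-style objectives, is:

```latex
\mathcal{L}_{\mathrm{adv}}(G, D) =
  \mathbb{E}_{y}\!\left[\log D(y)\right]
  + \mathbb{E}_{x,z}\!\left[\log\!\left(1 - D(G(x, z))\right)\right]
  \tag{1}

\mathcal{L}_{L1}(G) =
  \mathbb{E}_{x,y,z}\!\left[\lVert y - G(x, z) \rVert_{1}\right]
  \tag{2}

\mathcal{L}_{G} =
  \mathcal{L}_{\mathrm{adv}}(G, D) + \lambda \, \mathcal{L}_{L1}(G)
  \tag{3}
```

The adversarial term pushes the generator toward realistic textures, while the L1 term anchors the output to the ground-truth image so that structure and defect placement are preserved.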

Evaluation Metrics
We evaluated the quality of the generated images using the following metrics:
• Image quality: Assessment based on pixel properties such as the mean and standard deviation, comparing these properties between generated and real images.
• Structural Similarity (SSIM): The SSIM [9] and MSSIM [10] metrics measure image similarity. They are given in Equations (4) and (5), respectively, where x and y are the compared images, μ and σ represent the mean and variance, respectively, and C1 and C2 are constants. The resulting metric values range from -1.0 to 1.0, with 1.0 indicating complete similarity, 0 representing no similarity, and -1.0 denoting inverse similarity.
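Equations (4) and (5) are not reproduced in this excerpt; the standard formulation of SSIM by Wang et al. [9], with σ_xy the covariance between x and y and the mean taken over M local windows x_j, y_j, reads:

```latex
\mathrm{SSIM}(x, y) =
  \frac{(2\mu_x \mu_y + C_1)(2\sigma_{xy} + C_2)}
       {(\mu_x^2 + \mu_y^2 + C_1)(\sigma_x^2 + \sigma_y^2 + C_2)}
  \tag{4}

\mathrm{MSSIM}(X, Y) =
  \frac{1}{M} \sum_{j=1}^{M} \mathrm{SSIM}(x_j, y_j)
  \tag{5}
```

MSSIM thus summarizes local window-wise SSIM scores into a single value per image pair, which is the quantity reported as exceeding 0.90 in the abstract.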
• Intersection over Union (IoU): A key metric for object detection performance, the IoU is calculated from the bounding box of the predicted object B and that of the ground-truth object A according to Equation (6). We used the IoU to assess the position and size of the generated defects.
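Equation (6) is the standard ratio IoU = |A ∩ B| / |A ∪ B|. A minimal sketch of this computation for axis-aligned bounding boxes (box format (x_min, y_min, x_max, y_max) is an assumption, not the paper's annotation format):

```python
def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes (x_min, y_min, x_max, y_max)."""
    # Intersection rectangle; width/height are clamped to zero when boxes do not overlap.
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

print(iou((0, 0, 10, 10), (5, 0, 15, 10)))  # half-overlapping boxes -> 1/3
```

Identical boxes yield 1.0 and disjoint boxes yield 0.0, matching the thresholds used in the evaluation below.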

Results and discussion
Our developed methods can generate high-quality X-ray images with spokes and rims. Fig. 6 shows two sample images inferred from the saved model after training for 210 epochs. In Fig. 7, gray-scale line profiles serve as defect indicators. The spoke generator creates spoke images, and the defect generator adds defects at predefined locations and sizes. The spoke generator's image lacks defect-like properties in the profile, while the defect generator's corresponding image exhibits distinct defect characteristics. The gray-scale profile of a defect area shows an increase in brightness compared to neighboring pixels, signifying material loss. Table 1 lists the mean and standard deviation of the gray-level values, which are comparable between generated and real images. The synthetically generated images demonstrate a high similarity, as the SSIM and MSSIM values indicate. To further elucidate, SSIM and MSSIM values were averaged across 244 images generated by the GAN, and their distribution is illustrated in Fig. 8.
The proposed GAN generates images containing defects at predetermined locations, and their detection is assessed using ISAR (Intelligent System for Automatic X-ray inspection) [1] as an evaluation tool. A set of 128 synthetically generated images featuring 1,025 defects was analyzed by ISAR, yielding a list of detected defects and their locations. Subsequently, the IoU was computed between the GAN-generated and ISAR-detected defect lists. Table 2 summarizes the results, giving the percentage of GAN-generated defects recognized at various IoU threshold values. The findings indicate that 54% of the GAN-generated defects are identified by ISAR at an IoU threshold of 0.25. However, the recognition percentage gradually diminishes with increasing IoU thresholds, implying substantial disparities between ISAR-detected and generated defects in terms of size and shape. While providing insights into the spatial distribution of generated defects, this outcome suggests the potential for further optimization of the ISAR image processing parameters to achieve improved IoU values, albeit at the risk of increased false detections.
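The percentages in Table 2 correspond to a recall-at-IoU-threshold computation. A minimal sketch, assuming each generated defect has already been assigned the best IoU among the ISAR detections (the sample values below are purely illustrative, not the paper's data):

```python
def recall_at_iou(best_ious, threshold):
    """Fraction of generated defects whose best-matching detection reaches the IoU threshold.

    best_ious: one value per ground-truth (generated) defect; 0.0 if undetected.
    """
    if not best_ious:
        return 0.0
    return sum(v >= threshold for v in best_ious) / len(best_ious)

# Illustrative best-match IoU values for five hypothetical defects:
best_ious = [0.60, 0.30, 0.10, 0.50, 0.00]
for t in (0.25, 0.50):
    print(t, recall_at_iou(best_ious, t))
```

Sweeping the threshold as in Table 2 shows how quickly recall decays, which separates "defect found roughly in the right place" from "defect found with matching size and shape".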

Conclusions
In conclusion, this research leverages a GAN to generate a synthetic dataset of X-ray images, specifically incorporating defect annotations. The primary objective is to address the scarcity of appropriately annotated training data for developing ML-based NDT systems in the casting industry. The proposed GAN architecture successfully produces high-resolution X-ray images with defect annotations, offering adaptability in controlling spoke orientation and defect quantity. Furthermore, the spoke generator demonstrates the capability to generate defect-free images, enhancing its utility in tasks requiring such datasets. While the generated images exhibit realism in terms of visual perception, pixel distribution, and structural similarity, closer scrutiny under contrast variations reveals discernible differences in fine structural details on the spokes compared to real images. The evaluation of the generated defects using the ISAR system demonstrates that the proposed GAN-based approach to simulating realistic defects is quite powerful, challenging the detection capability of a traditional image-processing-based approach. This work addresses immediate challenges in NDT, laying the groundwork for advancements in ML-based systems within the casting industry. The proposed methodology holds promise for enhancing dataset availability and supporting further research endeavors in the broader domain of NDT applications.

Fig. 1
Fig. 1 Simplified architecture of proposed GAN

Fig. 2 Fig. 3 Fig. 4
Fig. 2 Two sample images from the dataset depicting projective X-rays of the rim-spoke connection of an aluminum wheel. The red labels indicate defects.

Fig. 5
Fig. 5 Gray-scale line profile of defects in real images

Fig. 6
Fig. 6 Example of inferred images from the trained model, red boxes indicate defects

Fig. 7
Fig. 7 Gray-scale profile of defects on images inferred from the spoke and defect generator