Towards automatic crack segmentation in 3d concrete images

Concrete is one of the most commonly used construction materials. A deeper insight into its mechanical properties, in particular cracking behaviour, can be gained from stress tests. Computed tomography captures the microstructure of building materials, including crack initiation and propagation in a fully three-dimensional manner. However, the complex microstructure of concrete renders crack segmentation a very challenging task. Both, the validation of segmentation methods and the training of machine learning approaches, are hindered by the lack of reliable ground truth segmentations for real data sets. To overcome this problem, a novel procedure for generating pairs of semi-synthetic images and ground truth was introduced by the authors in a previous study. Using this semi-synthetic data, Hessian-based percolation and 3d U-net were identiﬁed as the most promising of eight approaches for crack segmentation. Here, we discuss adaptions of the methods that allow for a handling of additional features observed in real computed tomography data of concrete, in particular local variations in crack thickness.


Introduction
Being a standard construction material in civil engineering, concrete needs to be of high quality and durability.These key properties are typically investigated via stress tests [1,2]: a controlled stress is applied to a specimen until damage or failure are observed.A commonly used testing technique to study concrete strength is the tensile test: both ends of the test specimen are fixed by the testing machine and tension is applied.When exceeding the tensile strength of the material, cracks emerge and propagate through the concrete.Visual inspection of the concrete samples only allows for an investigation of cracks at the sample surface.A natural approach to analyze crack structures in 3d and non-destructively is via micro-computed tomography (µCT) imaging.A characterization of crack characteristics such as shape, length or thickness requires a segmentation of the cracks in the 3d images.This is complicated by the fact that concrete, as a composite material, features a very heterogeneous structure consisting of cement matrix, aggregates, reinforcement fibers, and air pores.Depending on the concrete composition and test setup, cracks are of varying size and shape and exhibit a complex morphology.Due to this variability, methods for automatic crack segmentation have to be particularly robust.Obviously, simple thresholding is not sufficient as air pores and cracks typically have similar grey value ranges.There is a rich set of crack segmentation methods for 2d images of concrete.Most of them can be categorized into gray value distribution-based approaches [3,4], edge detection filters [5,6], graph-based algorithms [7,8], region-growing algorithms [9], and machine learning methods [10,11].An overview over 2d crack segmentation methods can be found in [12].The availability of 3d CT images of concrete is still limited.Therefore, crack segmentation in 3d images of concrete is a rather novel field of research.Due to the increase of dimension, mostly thin, lamellar (2d) structures with a complex connectivity need to be considered.Therefore, adaption of 2d methods is not straightforward.Yet, several methods have been developed.For example, template matching, sheet filter, Hessian percolation, and minimal paths algorithms have successfully been applied to crack structures in 3d images of concrete [13][14][15][16].Furthermore, methods from the field of medical image processing can be adapted to planar structures.These include edge detection filters such as the sheet filter [17] and the Frangi filter [18] as well as 3d convolutional neural networks such as the 3d U-Net [19,20].In a previously conducted simulation study [21], eight 3d crack segmentation methods were evaluated and compared.These include classical image processing methods as well as machine learning approaches.The methods were tested on semi-synthetic image data with cracks simulated via fractional Brownian surfaces [22], a novel way for simulating 2d crack structures in 3d.The corresponding ground truths enable an objective method comparison.Very thin cracks and low absorbing air pores were the major challenges.The two best-performing methods are Hessian-based percolation and 3d U-net.In [21], the test data set was limited to fixed crack widths to investigate the effect of parameter choice for the methods.Here, the approaches are validated based on 3d CT data of real-world concrete samples with locally varying crack structures.The test data set includes two normal concrete (NC) samples and two high-performance concrete (HPC) samples.It turns out that both methods have to be extended in order to deal with the increased complexity of the real data compared to the simulated samples.The paper is structured as follows.In Section 2, the concrete samples are described.Concrete composition and the experimental 11th Conference on Industrial Computed Tomography, Wels, Austria (iCT 2022), www.ict-conference.com/2022setup for crack initiation are summarized.Section 3 provides details on µCT imaging and the reconstructed image data.We proceed with introducing the two image processing pipelines in Section 4. Segmentation results for the real data are discussed in Section 5. Section 6 serves as conclusion and gives an outlook on future research.

Concrete samples
The cylindrical test specimens with a diameter of 48.0 mm consist of a normal-strength (normal concrete -NC) and a highstrength (high performance concrete -HPC) fibre-reinforced concrete.An amount of 2.0 % by volume of fibres made from glass fibre reinforced polymer (GFRP) are added to each concrete mix.The load is transferred via glued-in steel pipe cross-sections with an internal diameter of 49.6 mm (Figure 1, left).A hole is drilled above the pipe cross-sections.A bolt with an M10 joint head is fixed in the hole.The test specimens are clamped in a tensile testing machine with a maximum tensile load of 10 kN via the rod end (Figure 1, right).The objectives of the test series are to open the initial crack force of the different clay compositions and to achieve a defined crack width between 0.1 and 0.4 mm.The tests are run at a speed of 0.05 mm/min.The tensile force is recorded by the testing machine at a measuring rate of 5 Hz.The displacement is recorded by 4 inductive displacement transducers with a measuring rate of 10 Hz.The mean displacement is calculated from two opposing displacement transducers.The average crack width is calculated from the difference between the two average displacement measurements.Figure 2 presents the force-deformation curves of both fibre-reinforced concretes.Up to the initial crack, the material behaviour is linear elastic.After the initial crack, the force falls abruptly until the GFRP fibres absorb the force (linear drop).The separation crack opens depending on the fibre slip.After the force is absorbed by the fibres, all tests are stopped.With the HPC, initial cracking forces are between 5.6 kN -7.1 kN.After fibre slip, an average crack width of 0.1 -0.37 mm is observed.In NC, initial cracking forces between 3.39 -5.30 kN are obtained.After fibre slip, average crack widths are between 0.13 -0.72 mm.The results presented in Figure 2 indicate that average crack widths in the range of 0.1 -0.4 mm can be set for a fibre-reinforced high-performance concrete and that the fibres can absorb the tensile force after a maximum fibre slip of 0.25 mm.In the case 11th Conference on Industrial Computed Tomography, Wels, Austria (iCT 2022), www.ict-conference.com/2022 of a normal-strength concrete, five out of nine test specimens show the desired crack width range of 0.1 -0.4 mm.In four test specimens, the fibre slip led to the target crack width range being exceeded.

CT imaging and image data
Each sample is scanned by the laboratory µCT device at Fraunhofer ITWM, Kaiserslautern, Germany.A Feinfocus FXE 225.51 X-ray tube with maximum acceleration voltage 180.3 kV and maximum power of 12.7 W, and a Perkin Elmer flat-bed detector XRD 1621 with 2 048 × 2 048 pixels were used.The tube voltage is set to 180 kV and the integration time is 1 second.Tomographic reconstructions are obtained from 1 200 projections.Size and gray value range of the resulting images are given in Table 1.To evaluate our methods, cropped versions of the images that contain most of the crack structure are considered.Figure 3 contains example slices of the 3d images.Min-max normalization to an 8-bit gray value range is applied on each sample as a preprocessing step.

Gray value range
Size  For NC12 we observe a connected crack through the xz-plane.Its width varies from 1 to 30 voxels across all z-slices.Numerous crack branches emerge at locations were an abrupt change in direction is made.HPC11 contains a connected crack in the xz-plane.In a fixed slice, the crack width is rather constant.However, it continuously decreases from front (z = 0: 7 voxels) to back (z = 880: 1 voxel).Only few crack branches exist.Sample NC2 includes a connected crack in the xy-plane.The crack width ranges from 1 to 10 voxels, continuously increasing from front (y = 0) to back (y = 1 000).Few crack branches exist.HPC1 contains a connected crack in the xy-plane with varying width of 1-20 voxels.The crack is rather thin at the front (y = 0) and its width increases continuously until reaching the back (y = 1 050).Numerous crack branches appear, especially at locations where fibers and aggregates intersect crack structures.

Methods and procedures
In [21], crack segmentation methods are validated on a set of semi-synthetic images.These have been designed to resemble real images very well.Nevertheless, there remain many aspects in which the real 3d CT images of concrete differ from the semi-synthetic test images.Thus, new challenges arise and the methods need to be adapted.
11th Conference on Industrial Computed Tomography, Wels, Austria (iCT 2022), www.ict-conference.com/2022 In particular, our sample data set suggests that crack widths vary strongly.Sometimes they transition from one scale to another, e.g. when rather thick cracks branch and very thin cracks (1-2 voxels) emerge at these locations.Therefore, our methods need to be adapted to multi-scale cracks.
In addition, the samples contain very small, highly absorbing artifacts distorting the gray value histogram.Thus, the methods need to be robust with respect to image contrast.We now summarize the two methods and describe the adaptions applied to accommodate to the varying thicknesses.In the following, let I : R 3 → R be a 3d image.

Hessian-based percolation
Hessian-based percolation (HP) [9,13] is a region growing algorithm consisting of two steps: First, the Hessian matrix of the input image is computed.Based on a geometric interpretation of its eigenvalues, filters for edge detection are derived.Thresholding preselects a set of candidate voxels.Second, a percolation process is started from every candidate voxel.The process iteratively collects a set of neighboring voxels with respect to a dynamic threshold.The voxels are then considered to be crack voxels if they match a fixed shape criterion.Candidate voxels have been computed successfully via the sheet filter [13,17] and the Frangi filter [18].We obtained results of similar goodness in a large-scale simulation study [21].The results were evaluated in terms of several goodness measures such as recall and precision.We found that the Frangi filter yields slightly superior results with respect to the recall values.A high recall means that the number of missed voxels is minimized but background noise may be falsely detected.As Hessian-based percolation is able to remove these falsely detected voxels, we choose the Frangi filter for the computation of candidate voxels.The filter is defined as follows.Let H(p, σ ) be the Hessian matrix of image I at voxel p, obtained from the convolution of I with the 2nd derivatives of the Gaussian kernel with scaling parameter σ ∈ R ≥0.5 .Further let with and weighting parameters α, β , η σ > 0. It is often recommended to choose α = β = 0.5, independently of the image content and the choice of σ .The parameter η σ depends on scale by setting η σ = 2(max p R(p, σ )) 2 .To account for multiple scales, the Frangi filter is defined as where 0.5 ≤ σ min ≤ σ max ∈ R. The values of F(p) lie in [0, 1).The higher F(p), the more likely p belongs to a crack structure.We consider voxel p to be part of the preselection set H if F(p) ≥ t 1 for some threshold t 1 ∈ R.
Hessian-based percolation then consists of the following steps: 1. Set P = {p} for some p ∈ H and t = I(p) + ε for some ε ∈ R.
2. For each neighbor q of P: Add q to P if I(q) ≤ t.
3. Update t via t = max (max q∈P I(q),t) + ε. 6. Repeat 1-5 for every p ∈ H. Discard voxels that have been detected less than t 2 ∈ N times.

Neural networks: 3d U-net
Neural networks have proven to solve various segmentation tasks in image processing and computer vision precisely and robustly.One of the most popular types of neural networks are convolutional neural networks (CNN).For a more detailed introduction to the world of CNNs, we refer to [23].Convolutional neural networks have been applied to CT data of concrete in [15].Yet, the focus of this work lies in the separation of concrete into distinct components: cement matrix, aggregates, and pores.Generally, data with available ground truth is required for training and evaluation.Here, the training set consists of realistic semi-synthetic images, each one of size 256 3 .These images are created by simulating cracks of fixed width (1, 3, and 5 voxels) 11th Conference on Industrial Computed Tomography, Wels, Austria (iCT 2022), www.ict-conference.com/2022using fractionial Brownian surfaces and integrating them into CT scans of concrete without prior cracks [21].These images also include crack junctions and branching in various planes, to mimic the same effects found in real data.Among many available architectures of neural networks, we choose to use the U-net [19] in its 3d version [20].This neural network has originally been developed for biomedical images and has also been applied in materials science [24].It has a characteristic symmetrical U-shape consisting of an encoder for image compression, a bottle-neck for capturing context with dropout probability 0.5, and a decoder for image decompression.Another characteristic of U-net is that the features from each level of the encoder are utilized in its counterpart level of the decoder during the decompression step in order to improve crack localization.The network's output is a probability map of the same size as the input image with values between 0 (almost surely background) and 1 (almost surely crack) which is usually thresholded with τ = 0.5 to get the segmentation.The parameters of the network are selected during the training process which aims at minimizing a loss/error function on a given set of semi-synthetic training data.We choose the binary weighted-cross entropy as loss function across all voxels, where D(I) is the domain of the images, and I p ∈ [0, 1] and I t ∈ {0, 1} represent U-net prediction and ground truth, respectively.Weights w are assigned to each voxel to compensate for underrepresented classes or to help the U-net learn certain (usually rare) phenomena better.We give larger weights to the cracks and pores compared to regular cement matrix and aggregates.Pores have been segmented for this purpose by simply thresholding with a user-defined threshold.Finally, weights for cracks and pores are set based on their proportions in the training set as n voxels /(n cracks + n pores ), while the remaining weights are set to 1.Note that pores still have the same label as cement matrix and aggregates, but since pores and crack share a similar gray value distribution, this helps the U-net differentiate between them.Due to memory restrictions, the U-net is applied to image patches of size 64 3 .Images are thus tiled into patches of this size.
To reduce possible edge effects, patches overlap by 14 voxels.Training data is augmented with rotation, flip and downscaling (zoom 0.5 and 0.25) operations to ensure robustness.In total, the U-net was trained on 3 456 images for 20 epochs with batch size 2. The Adam optimization algorithm [25] was used with a learning rate of 0.001 which is cut in half every 5 epochs.

Post-processing strategies
Segmentation methods may not produce perfect results in all cases.This is especially true for the cases where a range of varying concrete types is analyzed.We discuss two strategies to reduce noise or wrongly segmented voxels based on connectivity exploiting the assumption that cracks are connected continuous structures.Hysteresis thresholding is based on two thresholds τ 1 > τ 2 .First, τ 1 is applied to the image to obtain an initial set of foreground voxels.Then, the set is grown iteratively such that neighboring voxels satisfying I(p) ≥ τ 2 are also included in the foreground.This idea can be useful for both approaches described in this paper.In particular, the output of both the U-net and the Frangi filter step of Hessian percolation is a probability map with values in [0, 1] i.e. the higher the value of the map, the more likely it is that the voxel belongs to the crack.Hence, two thresholds can be selected intuitively to improve the quality of the segmentation and reduce noise and artifacts.Furthermore, if the number of cracks in an image is known and the corresponding segmentations are connected structures, this knowledge can be used to remove erroneously segmented structures by extracting the largest connected components.

Application to real concrete data
Due to the absence of ground truths, we restrict to a qualitative inspection of results.Parameters are chosen according to the findings from [21].Here, we focus on the extensions of the methods needed to process cracks in real CT images.

Hessian-based percolation
Note that the Frangi filter is, per definition in Equation ( 2), constructed to detect structures with widths in [2σ min , 2σ max ] [18].A voxel preselection is obtained via the Frangi filter with parameters α = β = 0.5, σ min = 0.5, and σ max ∈ [3.5, 15], depending on the specimen's (approximate) maximal crack width determined manually.In practice, the Frangi filter can only be computed for a finite number of values for σ and we set the σ -step size to 0.5.We found that η σ = 2(max p R(p, σ )) 2 used in [18] is not a robust choice with respect to outliers in the histogram of R. In our CT images, these outliers coincide with highly absorbing artifacts as in Figure 4, left.These regions have comparatively high gray values distorting image contrast.Hence, we observe λ i ≪ 0 within these regions.Consequently, the term based on η σ in Equation ( 1) would suppress thin crack structures too much.Hence, we choose η σ = 2(R 0.99 (p, σ )) 2 where R 0.99 (p, σ ) denotes the 99%-quantile of the empirical distribution of R. For a visual comparison of both approaches, see Figure 4.It shows that thinner structures are detected more accurately when using the quantile-based approach.Then, hysteresis thresholding is applied with τ 1 ∈ [0.7, 0.9] and τ 2 ∈ [0.25, 0.3] to obtain candidate voxels.The percolation parameters are chosen as ε = 0, F 3D = 0.2,W ∈ [2, 4] and t 2 = 0.

3d U-net: Pyramidal image representation for handling a wide range of scales
Image pyramids [26] are a faster alternative to scale-space theory and multiresolution approaches when processing multiscale structures.This image representation is based on the operator REDUCE which reduces/downsamples the image size by a fixed positive scalar M by using interpolation.As a result, we get a sequence of images (S k ) k=0,...,n called pyramid levels for which S 0 = I and S k = REDUCE(S k−1 ) for k > 0. In our case M = 2, spline interpolation of order 3 is used, and the number of pyramid levels n is set to 2 (see Figure 5).Since our U-net has been designed to perform crack segmentation on a 64 3 cube and is trained on cracks of widths 1, 3, and 5, the pyramid representation comes as a natural tool to adapt the U-net for larger variations in crack thickness (between 1 and 20 voxels).The REDUCE operator shrinks thicker cracks to thinner ones and suppresses thin and fine cracks on higher levels of the pyramid (Figure 5).For example, a crack of width 10 in the input image I will have a thickness of 2.5 in S 2 .In this way, fine and thin cracks will be segmented at the lower levels, while thicker cracks are segmented at the higher levels of the pyramid.Hence, the strategy is the following: Apply U-net to each level of the pyramid, upsample the results to the initial image size, and fuse the upsampled images into the final crack segmentation image.For the fusion step, we use the voxelwise maximum across all pyramid levels.This means that a voxel needs to be labelled as a crack voxel at only one pyramid level in order to be classified as a crack voxel in the final segmentation.

Results
Hessian percolation and the U-net are applied to all four available µCT images.To suppress false detections in the background, we extract the largest 26-connected component to obtain the segmented crack structure.Volume renderings are given in Figure 6.We can see that both methods extract the crack shapes equally well.However, on some crack surfaces, we observe holes which originate from abrupt changes in crack orientation or air pores (Figure 6, 2nd row).HP seems to handle this effect better than U-net.However, U-net seems to be better at handling very thin cracks (Figure 6, 1st row).Furthermore, the renderings show that the low absorbing fiber edges are often segmented as well.This holds true especially for long fibers.Locally, these fibers are hard to distinguish from crack structures.In addition, they often intersect cracks structures.Therefore, they cannot be discarded using connectivity.Air pores are segmented rarely.Due to their shape, pores are typically not considered candidate voxels by the Frangi filter.Moreover, most of them do not intersect crack structures and, if segmented previously, are discarded by extracting the largest connected component.Slice views (Figure 7) confirm these observations.Here, we can see cracks of multiple widths being detected by both methods   11th Conference on Industrial Computed Tomography, Wels, Austria (iCT 2022), www.ict-conference.com/2022(e.g. Figure 7, 1st row).Furthermore, abrupt changes in direction (e.g. Figure 7, 3rd row, 4th row) as well as crack branching (e.g. Figure 7, 1st row, 4th row) appear as striking characteristics.The methods perform considerably well in these cases.However, problems arise when these phenomena occur in combination with very thin cracks (e.g. Figure 7, 3rd row, image center).Individual sections from NC12 which illustrate strengths and weaknesses of HP and compare it to U-net are given in Figures 8  and 9.They can be considered representative for the whole data set.For HP, the Frangi filter applies Gaussian pre-smoothing on the image.Therefore, most parts of the crack are segmented but the roughness of its edges is, in some parts, not captured well.This happens especially for thicker structures with abrupt changes in direction.Some parts can be improved by the percolation step (Figure 8) but problems still remain for thicker cracks that do not propagate straight through the concrete (Figure 9, right).For U-net, this effect is less pronounced.However, at points of abrupt change in direction, cracks are sometimes not segmented correctly (Figure 9, right).
As it turns out, the Frangi filter tends to miss out on single parts of branching structures.Holes appear.The percolation procedure in HP is able to fill these holes.This holds true for thinner structures as well, see Figure 8.However, some structures with a width below 1 voxel are still not segmented (Figure 9, left).U-net performs better at this task, but thin cracks are still not necessarily detected as connected components (Figure 9, left).This is also apparent in full slice views and renderings.Thin cracks (1-2 voxels) are, as expected, hardest to segment.In some cases they are too small to be imaged well with the chosen resolution.
As a result, they end up being brighter than thicker cracks.In addition, they are often disconnected which makes it hard for the methods to segment accurately.Crack volumes and surface areas per method are summarized in Table 2.It suggests that, in most cases, crack volumes are higher for U-net than for HP while surface areas are lower.This means that HP segmentations are rougher.Note that filter responses in Equation ( 1) can be computed simultaneously for multiple σ .Thus, run-times have been optimized by parallelizing the Frangi filter.Total runtimes for each input image vary between 16 and 29 minutes.

Conclusion
Challenges are numerous when segmenting 3d images of concrete.In a previous setup with semi-synthetic data, the most promising methods and suggestions for their parametrization have been identified.In this work, we observe that these methods need to be adapted to real CT images of concrete.In particular, we recognize that varying crack widths are one of the major challenges when transitioning from artificial to real data.To overcome this challenge, first adaptions of the algorithms are made.As it turns out, these adaptions ensure that cracks of multiple widths are segmented successfully.Still, we found that problems arise especially for thin cracks, background artifacts, and cracks with abrupt changes in direction.Mostly small sections of the images are affected.Yet, the methods need to be refined to account for these problems.Further attention should be directed to analyzing holes in the segmented crack surfaces and to potentially consider closing them.Note that the fibers used in our samples are GFRP fibers.In particular, their edges consist of low absorbing polymer.Thus, fiber edges locally appear as dark, plate-like structures in the reconstructed images.This makes it hard to distinguish them from crack structures.The necessity of their segmentation depends on the individual application.In practice, GFRP fibers are only used occasionally while steel fibers are abundant.Our methods yield promising results for the majority of crack structures.This is confirmed visually by 3d renderings and 2d slices.However, ground truths are needed for analyzing the goodness of results objectively.In the future, a quantitative study based on semi-synthetic images containing cracks of varying widths and featuring the observed sudden changes in orientation needs to be conducted.In addition, segmented cracks need to be characterized regarding thickness, direction, and -if applicable -their relation to the system of reinforcing fibers.

Figure 2 :
Figure 2: Force-deformation curves of tensile test for HPC (left) and NC (right)

Figure 8 :
Figure 8: Section from NC12 with a curved crack structure and thin branching (left).Sections after applying the Frangi filter with hysteresis binarization (middle) and HP (right).
For U-net, the run-time should be analyzed in 2 steps: training and prediction step.The training step is the most time consuming part which depends on the size of the data-set.In our case, the training procedure lasts around 5 days.For the prediction step, parallelization is easily achievable through the pyramid representation of the image: levels of the pyramid can be simultaneously processed by the U-net.Hence, run-times become proportional to the processing time of level S 0 of the pyramid i.e. the original image I. Depending on the size of the image, run-times for U-net vary between 1 and 2 hours.

Figure 9 :
Figure 9: Sections of NC12 (left) and corresponding sections obtained from Hessian-based percolation (center) and U-net (right).Left: segmentation of a thin (1-2 voxels) crack.Right: Thick crack (up to 30 voxels) with abrupt changes in direction.

Table 1 :
Information on the µCT image data.

Table 2 :
Segmented crack volume V (in mm 3 ) and crack surface area S (in mm 2 ) for each sample and method.