US 12,146,838 B2
Deep learning-based crack segmentation through heterogeneous image fusion
Wei Song, Tuscaloosa, AL (US); and Shanglian Zhou, Tuscaloosa, AL (US)
Assigned to THE BOARD OF TRUSTEES OF THE UNIVERSITY OF ALABAMA, Tuscaloosa, AL (US)
Filed by THE BOARD OF TRUSTEES OF THE UNIVERSITY OF ALABAMA, Tuscaloosa, AL (US)
Filed on May 28, 2021, as Appl. No. 17/333,252.
Claims priority of provisional application 63/031,862, filed on May 29, 2020.
Prior Publication US 2021/0372938 A1, Dec. 2, 2021
Int. Cl. G06T 7/55 (2017.01); G01N 21/88 (2006.01); G06N 3/045 (2023.01); G06N 3/08 (2023.01); G06T 7/00 (2017.01)
CPC G01N 21/8851 (2013.01) [G06N 3/045 (2023.01); G06N 3/08 (2013.01); G06T 7/0004 (2013.01); G06T 7/97 (2017.01); G01N 2021/889 (2013.01); G01N 2203/0062 (2013.01); G06T 2207/20084 (2013.01); G06T 2207/20221 (2013.01)] 12 Claims
OG exemplary drawing
 
1. A method for detecting cracks in road segments comprising:
receiving raw 3D range data for a first image by a computing device from an imaging system, wherein the first image comprises a plurality of pixels;
receiving raw 2D intensity data for the first image by the computing device from the imaging system, wherein the raw 2D intensity data have pixel-to-pixel location correspondence with the raw 3D range data, wherein the pixel-to-pixel location correspondence between the raw 2D intensity data and the raw 3D range data generates spatial co-location features that each uniquely identify the portion of a road segment to which each pixel of the plurality of pixels corresponds;
generating fused data containing the spatial co-location features for the first image by the computing device by integrating the raw 3D range data and the raw 2D intensity data, wherein the fused data is generated directly from the raw 3D range data and the raw 2D intensity data without any preprocessing or filtering of the raw 3D range data or the raw 2D intensity data;
providing the spatial co-location features to a deep convolutional neural network (“DCNN”) by the computing device; and
receiving a label for each pixel of the plurality of pixels from the DCNN by the computing device, wherein a received label for a pixel indicates whether or not the pixel is associated with a crack,
wherein the raw 2D intensity data and the raw 3D range data each comprise a same number of pixels,
wherein the DCNN is an encoder-decoder network that receives the raw 2D intensity data and the raw 3D range data with the pixel-to-pixel location correspondence,
wherein the encoder-decoder network comprises an encoder module and a decoder module,
wherein the encoder module comprises two branches, wherein each branch of the encoder module contains a plurality of convolutional layers,
wherein the decoder module contains a same number of transposed convolutional layers as either branch of the encoder module,
wherein a first branch of the encoder module receives the raw 2D intensity data and produces intensity-based features,
wherein a second branch of the encoder module receives the raw 3D range data and produces range-based features,
and further wherein the output features from both branches are integrated through an addition operation.
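
Editor's note: the fusion step recited in the claim can be illustrated with a minimal sketch. The patent does not specify an implementation; the NumPy usage, the 512x512 image size, and the channel ordering below are assumptions for illustration only. The claim requires only that the two raw modalities have the same number of pixels with pixel-to-pixel location correspondence and that they be fused directly, without preprocessing or filtering.

    # Minimal sketch (assumed NumPy arrays; sizes are illustrative, not from the patent).
    import numpy as np

    intensity = np.random.rand(512, 512).astype(np.float32)  # raw 2D intensity data
    range_3d = np.random.rand(512, 512).astype(np.float32)   # raw 3D range (elevation) data

    # Pixel-to-pixel location correspondence: same number of pixels in each modality.
    assert intensity.shape == range_3d.shape

    # Fuse the raw channels directly, with no preprocessing or filtering. The values
    # at index (row, col) in both channels describe the same point on the road
    # surface, which is what the claim calls a spatial co-location feature.
    fused = np.stack([intensity, range_3d], axis=0)  # shape (2, H, W)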
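
Editor's note: the claimed two-branch encoder-decoder DCNN can likewise be sketched. The framework (PyTorch), the channel widths, the kernel sizes, and the layer count of three per branch are all assumptions; the claim requires only two convolutional encoder branches (one per modality) whose output features are integrated by addition, and a decoder with the same number of transposed convolutional layers as either branch, producing a per-pixel crack / non-crack label.

    # Minimal PyTorch sketch under the assumptions stated above.
    import torch
    import torch.nn as nn

    def make_branch(in_channels, widths):
        # One encoder branch: a stack of stride-2 convolutional layers.
        layers, prev = [], in_channels
        for w in widths:
            layers += [nn.Conv2d(prev, w, kernel_size=3, stride=2, padding=1),
                       nn.ReLU(inplace=True)]
            prev = w
        return nn.Sequential(*layers)

    class TwoBranchCrackNet(nn.Module):
        def __init__(self, widths=(16, 32, 64)):
            super().__init__()
            self.intensity_branch = make_branch(1, widths)  # receives raw 2D intensity data
            self.range_branch = make_branch(1, widths)      # receives raw 3D range data
            # Decoder: same number of transposed convolutional layers as either branch.
            layers, prev = [], widths[-1]
            for w in list(reversed(widths[:-1])) + [2]:     # 2 classes: crack / non-crack
                layers += [nn.ConvTranspose2d(prev, w, kernel_size=4, stride=2, padding=1)]
                layers += [nn.ReLU(inplace=True)] if w != 2 else []
                prev = w
            self.decoder = nn.Sequential(*layers)

        def forward(self, intensity, depth):
            # Intensity-based and range-based features are integrated by addition.
            fused = self.intensity_branch(intensity) + self.range_branch(depth)
            return self.decoder(fused)  # per-pixel class logits

    # Usage: co-registered intensity and range images yield a per-pixel label map.
    net = TwoBranchCrackNet()
    intensity = torch.rand(1, 1, 64, 64)
    depth = torch.rand(1, 1, 64, 64)
    labels = net(intensity, depth).argmax(dim=1)  # shape (1, 64, 64); 1 marks a crack pixel

The stride-2 convolutions halve the spatial resolution at each encoder layer, and each transposed convolution doubles it again, so the label map returned by the decoder has the same number of pixels as the raw inputs, consistent with the claim's per-pixel labeling.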