| CPC G06T 7/12 (2017.01) [G01S 17/89 (2013.01); G06T 5/30 (2013.01); G06T 7/11 (2017.01); G06V 10/771 (2022.01); G06V 10/82 (2022.01); G06T 7/174 (2017.01); G06T 2207/10028 (2013.01); G06T 2207/20081 (2013.01); G06T 2207/20084 (2013.01); G06T 2207/30261 (2013.01)] | 5 Claims |

1. A 3D semantic segmentation method comprising the steps of:
receiving an image photographed by a camera and point cloud data acquired from a LiDAR sensor;
generating, by a learning data generation device, a projection image, of a size equal to that of the image, expressing the point cloud data in polar coordinates, wherein the generating step includes generating the projection image through a multiplication operation of calibration matrix information between the LiDAR sensor and the camera and coordinates of the point cloud data, and generating the image and the projection image having the same height and width by truncating a preset area from the projection image and by equally truncating the image;
inputting the image and the projection image into an artificial intelligence (AI) machine trained in advance to estimate a 2D segment map and a 3D segment map, each having as many dimensions as the number of classes to be predicted; and
training the artificial intelligence, before the 2D segment map and the 3D segment map are estimated, based on a synthesis loss function that simultaneously calculates and sums loss values for estimating the 2D segment map and the 3D segment map,
wherein the synthesis loss function is expressed as shown in the following equation:
$L_{total} = L_{2D}(pred_{2D}, label_{2D}) + L_{3D}(pred_{3D}, label_{3D})$

here, L2D denotes a first loss value for estimating the 2D segment map, L3D denotes a second loss value for estimating the 3D segment map, pred2D denotes a predicted answer for estimating the 2D segment map, pred3D denotes a predicted answer for estimating the 3D segment map, label2D denotes a first correct answer value for estimating the 2D segment map, and label3D denotes a second correct answer value for estimating the 3D segment map,
wherein the learning step includes a step of assigning a same label to pixels neighboring within a preset distance from each point included in the first correct answer value,
wherein the first correct answer value and the second correct answer value used to calculate each of the loss values are sparse data, as the first correct answer value and the second correct answer value are generated by assigning a 3D correct answer value of the point cloud data to 2D projective points projected through the multiplication operation of the calibration matrix information,
wherein the artificial intelligence (AI) machine includes a neural network for images and a neural network for projection images symmetrical to each other, and wherein each of the neural network for images and the neural network for projection images includes:
an encoder including three contextual blocks and four residual blocks (res blocks) for learning a structure and context information of the image and the projection image;
a decoder including four dilation blocks (up blocks) for dilating data output from the encoder, and an output layer for outputting the 2D segment map and the 3D segment map; and
an attention fusion module including eight attention fusion blocks for fusing feature maps output from the three contextual blocks, the four residual blocks, and the four dilation blocks, and
wherein the AI machine generates two feature maps with emphasized features of important locations by finding first locations and reflection rates of important features from a feature map of the image through a spatial attention module, multiplying the first locations and the reflection rates with a feature map of the projection image and the feature map of the image, and connecting the feature map of the projection image and the feature map of the image to the original feature maps through a residual path.
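The projection step of claim 1 maps the LiDAR points onto the camera image plane through a multiplication with the calibration matrix and then truncates both images to a common size. Below is a minimal Python sketch of that step; the 3x4 projection matrix `P`, the depth-valued raster, and the crop bounds are assumptions, since the claim does not fix the calibration format or the preset truncation area.

```python
import numpy as np

def project_point_cloud(points_xyz, P, img_h, img_w):
    """Project Nx3 LiDAR points onto the camera image plane.

    points_xyz : (N, 3) LiDAR coordinates.
    P          : (3, 4) LiDAR-to-image projection matrix -- an assumed form
                 of the claim's "calibration matrix information".
    Returns a (img_h, img_w) depth raster aligned with the camera image.
    """
    # Homogeneous coordinates, then the claim's "multiplication operation".
    ones = np.ones((points_xyz.shape[0], 1))
    pts_h = np.hstack([points_xyz, ones])        # (N, 4)
    proj = pts_h @ P.T                           # (N, 3)

    # Perspective divide to pixel coordinates; keep points in front of the camera.
    z = proj[:, 2]
    valid = z > 0
    u = (proj[valid, 0] / z[valid]).astype(int)
    v = (proj[valid, 1] / z[valid]).astype(int)
    d = z[valid]

    # Rasterize depth into an image of the same size as the camera image.
    depth_img = np.zeros((img_h, img_w), dtype=np.float32)
    inside = (u >= 0) & (u < img_w) & (v >= 0) & (v < img_h)
    depth_img[v[inside], u[inside]] = d[inside]
    return depth_img

def truncate_pair(image, projection_image, top, bottom, left, right):
    """Truncate the same preset area from both inputs so they keep
    an identical height and width, as the claim requires."""
    return (image[top:bottom, left:right],
            projection_image[top:bottom, left:right])
```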
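The synthesis loss sums the two branch losses. A PyTorch sketch follows; cross-entropy and the `IGNORE` index for unlabeled (sparse) pixels are assumptions, as the claim only requires that the two loss values be calculated simultaneously and summed.

```python
import torch.nn.functional as F

IGNORE = 255  # hypothetical label for pixels with no projected 3D ground truth

def synthesis_loss(pred2d, pred3d, label2d, label3d):
    """L_total = L2D(pred2D, label2D) + L3D(pred3D, label3D).

    pred2d, pred3d   : (B, C, H, W) logits, C = number of classes.
    label2d, label3d : (B, H, W) integer labels; sparse pixels carry IGNORE.
    Cross-entropy is an assumption -- the claim does not name the
    per-branch loss functions.
    """
    l2d = F.cross_entropy(pred2d, label2d, ignore_index=IGNORE)
    l3d = F.cross_entropy(pred3d, label3d, ignore_index=IGNORE)
    return l2d + l3d
```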
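The learning step assigns each projected ground-truth point's label to its neighboring pixels. A NumPy sketch under the assumption of a square (Chebyshev-distance) neighborhood of radius `dist`; the claim leaves both the neighborhood shape and the preset distance open.

```python
import numpy as np

def densify_labels(sparse_label, dist=2, ignore=255):
    """Assign each labeled point's class to all pixels within `dist`
    (Chebyshev distance, an assumed neighborhood shape)."""
    h, w = sparse_label.shape
    dense = np.full_like(sparse_label, ignore)
    ys, xs = np.nonzero(sparse_label != ignore)
    for y, x in zip(ys, xs):
        y0, y1 = max(0, y - dist), min(h, y + dist + 1)
        x0, x1 = max(0, x - dist), min(w, x + dist + 1)
        dense[y0:y1, x0:x1] = sparse_label[y, x]
    return dense
```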
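Each of the two symmetric branches stacks three contextual blocks and four res blocks in the encoder, then four up blocks and an output layer in the decoder. The skeleton below is a sketch: channel widths, strides, and the internals of each block type are assumptions not specified in the claim, and the eight attention fusion blocks between the branches are omitted here (see the next sketch).

```python
import torch
import torch.nn as nn

class ContextBlock(nn.Module):
    """Hypothetical contextual block: a dilated conv widens the receptive field."""
    def __init__(self, c_in, c_out):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(c_in, c_out, 3, padding=2, dilation=2),
            nn.BatchNorm2d(c_out), nn.ReLU(inplace=True))
    def forward(self, x):
        return self.body(x)

class ResBlock(nn.Module):
    """Residual block; downsamples by 2, so four of them shrink the map 16x."""
    def __init__(self, c_in, c_out):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(c_in, c_out, 3, stride=2, padding=1),
            nn.BatchNorm2d(c_out), nn.ReLU(inplace=True),
            nn.Conv2d(c_out, c_out, 3, padding=1), nn.BatchNorm2d(c_out))
        self.skip = nn.Conv2d(c_in, c_out, 1, stride=2)
    def forward(self, x):
        return torch.relu(self.conv(x) + self.skip(x))

class UpBlock(nn.Module):
    """Dilation (up) block: upsamples by 2 to undo one res block."""
    def __init__(self, c_in, c_out):
        super().__init__()
        self.body = nn.Sequential(
            nn.ConvTranspose2d(c_in, c_out, 2, stride=2),
            nn.BatchNorm2d(c_out), nn.ReLU(inplace=True))
    def forward(self, x):
        return self.body(x)

class Branch(nn.Module):
    """One of the two symmetric networks: 3 contextual + 4 res blocks in the
    encoder, then 4 up blocks and an output layer in the decoder."""
    def __init__(self, c_in, n_classes, width=32):
        super().__init__()
        self.context = nn.ModuleList(
            [ContextBlock(c_in, width),
             ContextBlock(width, width),
             ContextBlock(width, width)])
        self.res = nn.ModuleList(
            [ResBlock(width * 2**i, width * 2**(i + 1)) for i in range(4)])
        self.up = nn.ModuleList(
            [UpBlock(width * 2**(4 - i), width * 2**(3 - i)) for i in range(4)])
        self.out = nn.Conv2d(width, n_classes, 1)  # one channel per class
    def forward(self, x):
        for blk in self.context:
            x = blk(x)
        for blk in self.res:
            x = blk(x)
        for blk in self.up:
            x = blk(x)
        return self.out(x)
```

Two instances (e.g., a 3-channel image branch and a 1-channel projection branch) realize the symmetric pair; with these assumed strides, the input height and width must be divisible by 16 so that the four stride-2 res blocks are exactly undone by the four up blocks.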
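The spatial attention fusion of the final clause finds locations and reflection rates of important features from the image branch's feature map, multiplies them with both branches' feature maps, and connects the results back to the originals through residual paths. A minimal sketch, with a 1x1 convolution plus sigmoid as an assumed realization of the spatial attention module:

```python
import torch.nn as nn

class SpatialAttentionFusion(nn.Module):
    """Fuse an image feature map and a projection feature map.

    An attention map computed from the image features (the claim's locations
    and reflection rates) is multiplied with both feature maps, and each
    product is connected back to its original feature map through a residual
    path, yielding two feature maps with important locations emphasized.
    """
    def __init__(self, channels):
        super().__init__()
        # 1x1 conv + sigmoid -> per-pixel reflection rate in [0, 1];
        # an assumed internal form not fixed by the claim.
        self.attn = nn.Sequential(nn.Conv2d(channels, 1, 1), nn.Sigmoid())

    def forward(self, feat_img, feat_proj):
        a = self.attn(feat_img)                  # (B, 1, H, W) attention map
        fused_img = feat_img + a * feat_img      # residual path
        fused_proj = feat_proj + a * feat_proj   # residual path
        return fused_img, fused_proj
```

Per the claim, eight such fusion blocks fuse the feature maps produced by the contextual, residual, and dilation blocks along the two branches.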