US 12,205,292 B2
Methods and systems for semantic segmentation of a point cloud
Ran Cheng, Markham (CA); Ryan Razani, North York (CA); and Bingbing Liu, Markham (CA)
Assigned to HUAWEI TECHNOLOGIES CO., LTD., Shenzhen (CN)
Filed by Huawei Technologies Co., Ltd., Shenzhen (CN)
Filed on Jul. 16, 2021, as Appl. No. 17/378,155.
Prior Publication US 2023/0035475 A1, Feb. 2, 2023
Int. Cl. G06T 7/11 (2017.01); G01S 17/89 (2020.01); G01S 17/931 (2020.01); G06F 18/24 (2023.01); G06F 18/25 (2023.01)

CPC G06T 7/11 (2017.01) [G01S 17/89 (2013.01); G01S 17/931 (2020.01); G06F 18/24 (2023.01); G06F 18/253 (2023.01); G06T 2207/10028 (2013.01); G06T 2207/20016 (2013.01); G06T 2207/20084 (2013.01)]

20 Claims

1. A method for semantic segmentation of a 3D point cloud, the method comprising:

processing a 3D point cloud to produce a sparse tensor;

feeding the sparse tensor as an input to each of a plurality of branches of an encoder of a neural network to produce a plurality of branch feature maps, N being a number of the plurality of branches, N being equal to or greater than 3, each ith branch respectively comprising i sequentially chained different encoder blocks to produce an ith branch feature map, i being an integer between 1 and N;

feeding the plurality of branch feature maps to a plurality of hierarchical attention blocks to generate a plurality of emphasized feature maps, wherein, for each pth branch of a 3rd to Nth branches, a pth branch feature map and a (p−2)th emphasized feature map are fed to a corresponding (p−1)th hierarchical attention block, the (p−2)th emphasized feature map is output by a (p−2)th hierarchical attention block, and wherein a first branch feature map and a second branch feature map are fed to a first hierarchical attention block;

feeding each emphasized feature map output by the plurality of hierarchical attention blocks to a spatial feature transformer to fuse each emphasized feature map of the plurality of hierarchical attention blocks and generate a fused feature map; and

processing the fused feature map and a final decoder block of a decoder to predict a class label for a plurality of points in the 3D point cloud.
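
The index relationships recited in claim 1 can be easier to follow when written out. The sketch below is illustrative only and is not the patented implementation: it enumerates the encoder-block count per branch and the inputs of each hierarchical attention block for a given N, and it shows one common way to obtain a sparse, coordinate-keyed layout from a point cloud. The voxel-grid quantization is an assumption, since the claim does not say how the sparse tensor is produced, and the names voxelize, branch_depths, and attention_wiring are hypothetical.

```python
from collections import defaultdict
from math import floor


def voxelize(points, voxel_size):
    """Group point indices by integer voxel coordinate.

    Assumption: voxel quantization stands in for "processing a 3D point
    cloud to produce a sparse tensor"; only occupied voxels are stored.
    """
    voxels = defaultdict(list)
    for idx, (x, y, z) in enumerate(points):
        key = (floor(x / voxel_size), floor(y / voxel_size), floor(z / voxel_size))
        voxels[key].append(idx)
    return dict(voxels)


def branch_depths(n):
    """Map branch index i to its number of chained encoder blocks (i blocks)."""
    if n < 3:
        raise ValueError("claim 1 requires N >= 3")
    return {i: i for i in range(1, n + 1)}


def attention_wiring(n):
    """Map each hierarchical attention block index to its two inputs.

    Block 1 takes branch feature maps 1 and 2. For p = 3..N, block (p-1)
    takes branch feature map p and the emphasized map from block (p-2).
    """
    if n < 3:
        raise ValueError("claim 1 requires N >= 3")
    wiring = {1: [("branch", 1), ("branch", 2)]}
    for p in range(3, n + 1):
        wiring[p - 1] = [("branch", p), ("emphasized", p - 2)]
    return wiring


if __name__ == "__main__":
    pts = [(0.1, 0.2, 0.3), (0.4, 0.1, 0.2), (1.2, 0.0, -0.1)]
    print(voxelize(pts, 0.5))
    print(branch_depths(4))
    for block, inputs in attention_wiring(4).items():
        print(block, inputs)
```

Under this reading, N branches yield N−1 hierarchical attention blocks, and therefore N−1 emphasized feature maps are passed to the spatial feature transformer for fusion.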