US 12,315,083 B2
Performing point cloud tasks using multi-scale features generated through self-attention
Pei Sun, Palo Alto, CA (US); Mingxing Tan, Newark, CA (US); Weiyue Wang, Sunnyvale, CA (US); Fei Xia, Sunnyvale, CA (US); Zhaoqi Leng, Milpitas, CA (US); Dragomir Anguelov, San Francisco, CA (US); and Chenxi Liu, Santa Clara, CA (US)
Assigned to Waymo LLC, Mountain View, CA (US)
Filed by Waymo LLC, Mountain View, CA (US)
Filed on Mar. 13, 2023, as Appl. No. 18/120,989.
Claims priority of provisional application 63/323,914, filed on Mar. 25, 2022.
Claims priority of provisional application 63/319,228, filed on Mar. 11, 2022.
Prior Publication US 2023/0351691 A1, Nov. 2, 2023
Int. Cl. G06T 17/20 (2006.01); G06F 30/27 (2020.01); G06T 5/60 (2024.01); G06T 15/20 (2011.01); G06T 17/05 (2011.01)
CPC G06T 17/20 (2013.01) [G06F 30/27 (2020.01); G06T 5/60 (2024.01); G06T 15/20 (2013.01); G06T 17/05 (2013.01); G06T 2210/56 (2013.01)]
20 Claims
 
1. A method performed by one or more computers, the method comprising:
  obtaining a set of point clouds captured by one or more sensors, wherein each point cloud comprises a respective plurality of three-dimensional points;
  assigning the three-dimensional points to respective voxels in a voxel grid of voxels;
  generating multi-scale features of the voxel grid, the multi-scale features comprising, for each of a plurality of scales, respective features for each non-empty voxel in a scaled voxel grid corresponding to the scale, the generating comprising:
    processing respective features for each non-empty voxel in the voxel grid through a hierarchical sequence of self-attention neural network blocks, the processing comprising, for each scale:
      obtaining initial features for each non-empty voxel in the scaled voxel grid corresponding to the scale; and
      processing the initial features for the non-empty voxels in the scaled voxel grid corresponding to the scale using a self-attention neural network to generate the respective features for the non-empty voxels in the scaled voxel grid corresponding to the scale; and
  generating an output for a point cloud processing task using the multi-scale features of the voxel grid.
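
The data flow recited in claim 1 can be illustrated with a minimal NumPy sketch. It is not the patented implementation, and every name in it (voxelize, downsample, self_attention, multi_scale_features, voxel_size, num_scales) is hypothetical. The sketch assumes mean pooling to obtain the initial features at each scale, a factor-of-2 coarsening between scales, randomly initialized (untrained) projection weights, and a single dense attention layer over all non-empty voxels of a scale; the claim does not specify any of these choices.

```python
import numpy as np


def _group_mean(keys, values):
    """Average `values` rows that share the same integer `keys` row."""
    uniq, inverse = np.unique(keys, axis=0, return_inverse=True)
    inverse = inverse.reshape(-1)  # keep 1-D across NumPy versions
    sums = np.zeros((len(uniq), values.shape[1]))
    np.add.at(sums, inverse, values)
    counts = np.bincount(inverse, minlength=len(uniq))[:, None]
    return uniq, sums / counts


def voxelize(points, voxel_size):
    """Assign 3-D points to voxels; only non-empty voxels are returned."""
    coords = np.floor(points / voxel_size).astype(np.int64)
    # Assumption: the initial voxel feature is the mean of its points.
    return _group_mean(coords, points)


def downsample(coords, feats, factor=2):
    """Build the next (coarser) scaled voxel grid by mean pooling."""
    return _group_mean(np.floor_divide(coords, factor), feats)


def self_attention(x, wq, wk, wv):
    """Single-head scaled dot-product self-attention with a residual."""
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(q.shape[1])
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)
    return x + weights @ v


def multi_scale_features(points, voxel_size=0.5, num_scales=3, seed=0):
    """Return one (voxel coords, voxel features) pair per scale."""
    rng = np.random.default_rng(seed)
    coords, feats = voxelize(points, voxel_size)
    outputs = []
    for scale in range(num_scales):
        if scale > 0:
            # Initial features for this scale come from the previous one.
            coords, feats = downsample(coords, feats)
        d = feats.shape[1]
        # Assumption: untrained random weights stand in for learned ones.
        wq, wk, wv = (rng.normal(scale=d ** -0.5, size=(d, d)) for _ in range(3))
        feats = self_attention(feats, wq, wk, wv)
        outputs.append((coords, feats))
    return outputs


if __name__ == "__main__":
    cloud = np.random.default_rng(1).random((200, 3)) * 4.0
    for level, (c, f) in enumerate(multi_scale_features(cloud)):
        print(level, c.shape, f.shape)
```

A task head (for example, a detection or segmentation head) would then consume the per-scale outputs to generate the output for the point cloud processing task. Dense attention over all voxels is used here only for brevity; it scales quadratically with the number of non-empty voxels.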