CPC G06T 17/20 (2013.01) [G06F 30/27 (2020.01); G06T 5/60 (2024.01); G06T 15/20 (2013.01); G06T 17/05 (2013.01); G06T 2210/56 (2013.01)] | 20 Claims |
1. A method performed by one or more computers, the method comprising:
obtaining a set of point clouds captured by one or more sensors, wherein each point cloud comprises a respective plurality of three-dimensional points;
assigning the three-dimensional points to respective voxels in a voxel grid of voxels;
generating multi-scale features of the voxel grid, the multi-scale features comprising, for each of a plurality of scales, respective features for each non-empty voxel in a scaled voxel grid corresponding to the scale, the generating comprising:
processing respective features for each non-empty voxel in the voxel grid through a hierarchical sequence of self-attention neural network blocks, the processing comprising, for each scale:
obtaining initial features for each non-empty voxel in the scaled voxel grid corresponding to the scale; and
processing the initial features for the non-empty voxels in the scaled voxel grid corresponding to the scale using a self-attention neural network to generate the respective features for the non-empty voxels in the scaled voxel grid corresponding to the scale; and
generating an output for a point cloud processing task using the multi-scale features of the voxel grid.
|
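The steps recited in claim 1 can be illustrated with a minimal NumPy sketch. It is not the claimed implementation, and everything beyond the claim language is an assumption made for illustration: the function names (`voxelize`, `downsample`, `self_attention`, `multi_scale_features`), the mean-of-points initial features, the factor-2 coarsening between scales, the random untrained projection weights, and the use of a single global attention layer per scale in place of a full block of a hierarchical network.

```python
import numpy as np

def _group_mean(keys, values):
    """Group rows of `values` by identical integer rows of `keys` and average them."""
    uniq, inverse = np.unique(keys, axis=0, return_inverse=True)
    inverse = inverse.reshape(-1)  # keep a flat index across NumPy versions
    sums = np.zeros((len(uniq), values.shape[1]))
    np.add.at(sums, inverse, values)  # unbuffered scatter-add per group
    counts = np.bincount(inverse, minlength=len(uniq))
    return uniq, sums / counts[:, None]

def voxelize(points, voxel_size):
    """Assign 3D points to voxels; return non-empty voxel coords and features.

    Assumption: the feature of a voxel is the mean of the points inside it.
    """
    coords = np.floor(points / voxel_size).astype(np.int64)
    return _group_mean(coords, points)

def downsample(coords, feats, factor=2):
    """Build the next (coarser) scaled voxel grid and its initial features.

    Assumption: coarse features are the mean of the child voxel features.
    """
    return _group_mean(np.floor_divide(coords, factor), feats)

def self_attention(x, wq, wk, wv):
    """Single-head scaled dot-product self-attention with a residual connection."""
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(q.shape[1])
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)
    return x + weights @ v

def multi_scale_features(points, voxel_size=0.5, num_scales=3, seed=0):
    """Return a list of (coords, feats) pairs, one per scale, finest first."""
    rng = np.random.default_rng(seed)
    coords, feats = voxelize(points, voxel_size)
    dim = feats.shape[1]
    outputs = []
    for scale in range(num_scales):
        if scale > 0:
            # Initial features for this scale come from the previous scale.
            coords, feats = downsample(coords, feats)
        # Hypothetical untrained weights; a real model would learn these.
        wq, wk, wv = (rng.normal(scale=0.1, size=(dim, dim)) for _ in range(3))
        feats = self_attention(feats, wq, wk, wv)
        outputs.append((coords, feats))
    return outputs

def task_output(outputs):
    """Placeholder task head: concatenate the mean feature of every scale."""
    return np.concatenate([feats.mean(axis=0) for _, feats in outputs])
```

Calling `task_output(multi_scale_features(points))` on an `(N, 3)` array of points yields one fixed-length descriptor. Only non-empty voxels are ever stored, which mirrors the claim's restriction of features to non-empty voxels at each scale.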