| CPC B60W 60/0011 (2020.02) [B60W 40/04 (2013.01); G01S 17/89 (2013.01); G01S 17/931 (2020.01); G06V 20/58 (2022.01); G06V 20/70 (2022.01); B60W 2300/145 (2013.01); B60W 2420/403 (2013.01); B60W 2420/408 (2024.01); B60W 2556/20 (2020.02)] | 18 Claims |

1. A method of autonomous vehicle operation, comprising:
obtaining, by a computer located in an autonomous vehicle, combined point cloud data that describes a plurality of areas of an environment in which the autonomous vehicle is operating,
wherein the combined point cloud data is obtained by performing a signal processing technique on multiple sets of point cloud data obtained from a plurality of light detection and ranging sensors located on the autonomous vehicle;
wherein the combined point cloud data comprises a first set of combined point cloud data combined with a second set of combined point cloud data, wherein the second set of combined point cloud data is obtained or scanned later than the first set of combined point cloud data, wherein the first set of combined point cloud data is obtained by combining a first set of point cloud data of at least two light detection and ranging sensors of the plurality of light detection and ranging sensors;
determining, from the combined point cloud data, a first set of points located within a plurality of fields of view of a plurality of cameras located on the autonomous vehicle;
determining, from the first set of points, a second set of points located within one or more bounding boxes around one or more objects in images obtained from the plurality of cameras;
assigning one or more labels to the second set of points, wherein the one or more labels include information that identifies the one or more objects;
causing the autonomous vehicle to operate based on one or more characteristics of the one or more objects determined from the second set of points;
enlarging the one or more bounding boxes by a deep fusion encoder to include more contextual points;
adding, by the deep fusion encoder, virtual points to each bounding box as size-aware point features; and
extracting and normalizing the points within the enlarged bounding boxes in a canonical coordinate system to form input data of the deep fusion encoder;
wherein the deep fusion encoder comprises a Multi-Layer Perceptron (MLP) module with a plurality of fully connected layers and a max-pooling operator for feature aggregation.
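The sequence of operations recited in claim 1 — combining LiDAR point clouds, keeping points within a camera's field of view, selecting points inside (enlarged) bounding boxes, normalizing them in a canonical coordinate system, and aggregating per-point features with an MLP and a max-pooling operator — can be sketched as below. This is an illustrative reconstruction only: the pinhole camera model, the enlargement ratio, and every function name and parameter here are assumptions for the sketch, not the patented implementation.

```python
import numpy as np

def combine_point_clouds(cloud_a, cloud_b):
    """Combine point clouds from two LiDAR sensors (assumed already in a
    common vehicle frame); simple stacking stands in for the claimed
    signal processing technique."""
    return np.vstack([cloud_a, cloud_b])

def points_in_camera_fov(points, K, image_size):
    """Project 3-D points with assumed pinhole intrinsics K and keep those
    that land inside the image (the claimed 'first set of points')."""
    w, h = image_size
    in_front = points[:, 2] > 0                    # camera looks along +z
    uvw = (K @ points.T).T                         # homogeneous pixel coords
    uv = uvw[:, :2] / uvw[:, 2:3]
    in_img = (uv[:, 0] >= 0) & (uv[:, 0] < w) & (uv[:, 1] >= 0) & (uv[:, 1] < h)
    keep = in_front & in_img
    return points[keep], uv[keep]

def points_in_box(uv, box, enlarge=1.0):
    """Select projected points inside a 2-D box (x0, y0, x1, y1), optionally
    enlarged about its centre to include more contextual points."""
    x0, y0, x1, y1 = box
    cx, cy = (x0 + x1) / 2.0, (y0 + y1) / 2.0
    hw, hh = (x1 - x0) / 2.0 * enlarge, (y1 - y0) / 2.0 * enlarge
    return (np.abs(uv[:, 0] - cx) <= hw) & (np.abs(uv[:, 1] - cy) <= hh)

def canonical_normalize(points):
    """Normalize box points into a canonical frame: zero mean, unit max radius."""
    centred = points - points.mean(axis=0)
    scale = np.linalg.norm(centred, axis=1).max()
    return centred / max(scale, 1e-9)

def mlp_max_pool(features, layers):
    """Per-point fully connected layers with ReLU, then max-pooling across
    points to aggregate a variable-size set into one feature vector."""
    x = features
    for W, b in layers:
        x = np.maximum(x @ W + b, 0.0)
    return x.max(axis=0)
```

The max-pooling step makes the aggregated feature invariant to point ordering, which is why MLP-plus-max-pool encoders are a common choice for unordered point sets.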