CPC G06V 20/58 (2022.01) [G06N 3/0454 (2013.01); G06N 3/08 (2013.01); G06V 10/757 (2022.01)] | 20 Claims |
1. A method comprising:
obtaining a temporal sequence comprising a respective point cloud input corresponding to each of a plurality of time points, each point cloud input comprising point cloud data generated from sensor data captured by one or more sensors of a vehicle at the respective time point;
processing each point cloud input using a first neural network to generate a respective spatial embedding that characterizes the point cloud input, comprising, for each point cloud input:
dividing the point cloud data into a plurality of voxels,
generating a feature representation that includes features for each voxel, and
processing the feature representation using the first neural network to generate the spatial embedding;
processing the spatial embeddings of the point cloud inputs using a second neural network to generate a spatio-temporal embedding that characterizes the point cloud inputs in the temporal sequence; and
processing the spatio-temporal embedding that characterizes the point cloud inputs in the temporal sequence using a task-specific neural network, wherein the task-specific neural network is configured to process the spatio-temporal embedding to generate a predicted output for a prediction task.
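For illustration only, the voxelization and per-voxel feature steps recited in claim 1 might be sketched as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: the function names `voxelize` and `voxel_features`, the cubic voxel of side `voxel_size`, and the choice of point count plus centroid as the per-voxel features are all hypothetical, since the claim does not specify them.

```python
from collections import defaultdict
from math import floor


def voxelize(points, voxel_size):
    """Group (x, y, z) points by the integer index of the voxel containing them.

    Assumption: voxels are axis-aligned cubes of side `voxel_size`.
    """
    voxels = defaultdict(list)
    for x, y, z in points:
        key = (floor(x / voxel_size), floor(y / voxel_size), floor(z / voxel_size))
        voxels[key].append((x, y, z))
    return dict(voxels)


def voxel_features(voxels):
    """Build one feature tuple per voxel: (point count, centroid x, y, z).

    Assumption: count and centroid are illustrative features only.
    """
    features = {}
    for key, pts in voxels.items():
        n = len(pts)
        centroid = tuple(sum(p[i] for p in pts) / n for i in range(3))
        features[key] = (n,) + centroid
    return features


# A toy point cloud for a single time point (made-up coordinates).
points = [(0.1, 0.2, 0.0), (0.3, 0.4, 0.1), (1.2, 0.1, 0.0), (-0.5, 0.2, 0.3)]
features = voxel_features(voxelize(points, voxel_size=1.0))
```

In a pipeline of the kind claimed, a feature representation like `features` would be the input to the first neural network, computed once per point cloud input in the temporal sequence.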
|
10. The system of claim 9, wherein the operations further comprise:
processing the spatio-temporal embedding using one or more additional task-specific neural networks, wherein each additional task-specific neural network is configured to generate a respective predicted output for a corresponding prediction task that is different from the prediction task.
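The data flow of claims 1 and 10 (one spatial embedding per time point, one spatio-temporal embedding per sequence, and several task-specific networks reading that same embedding) might be sketched as below. The names `run_pipeline`, `toy_first_net`, `toy_second_net`, `task_a`, and `task_b` are hypothetical, and the toy functions are plain arithmetic stand-ins rather than trained neural networks; only the wiring between stages is meant to reflect the claims.

```python
def run_pipeline(frames, first_net, second_net, heads):
    """Compute one shared spatio-temporal embedding and apply every task head to it."""
    # One spatial embedding per time point.
    spatial = [first_net(frame) for frame in frames]
    # One spatio-temporal embedding for the whole temporal sequence.
    spatio_temporal = second_net(spatial)
    # Each task-specific head consumes the same shared embedding.
    return {task: head(spatio_temporal) for task, head in heads.items()}


# Hypothetical stand-ins; real systems would use learned models here.
def toy_first_net(frame):
    """Map a per-frame feature list to a 2-dimensional 'embedding': [mean, max]."""
    return [sum(frame) / len(frame), max(frame)]


def toy_second_net(spatial):
    """Average the per-frame embeddings over time, dimension by dimension."""
    return [sum(column) / len(column) for column in zip(*spatial)]


heads = {
    "task_a": lambda e: e[0] > 1.0,   # stand-in for one prediction task
    "task_b": lambda e: e[1] - e[0],  # stand-in for a different prediction task
}

# Three time points, each with a made-up two-value feature representation.
frames = [[0.0, 2.0], [1.0, 3.0], [2.0, 4.0]]
outputs = run_pipeline(frames, toy_first_net, toy_second_net, heads)
```

The design point this sketch illustrates is that the first and second networks run once per sequence, while adding a further prediction task only adds another entry to `heads`.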
|
16. The non-transitory computer storage media of claim 14, wherein the first neural network and the second neural network have been trained jointly on a first prediction task, and wherein the first prediction task is not the same as the prediction task.
|