US 11,657,291 B2
Spatio-temporal embeddings
Jiyang Gao, San Jose, CA (US); Zijian Guo, Sunnyvale, CA (US); and Congcong Li, Cupertino, CA (US)
Assigned to Waymo LLC, Mountain View, CA (US)
Filed by Waymo LLC, Mountain View, CA (US)
Filed on Oct. 5, 2020, as Appl. No. 17/063,553.
Claims priority of provisional application 62/911,181, filed on Oct. 4, 2019.
Prior Publication US 2021/0103744 A1, Apr. 8, 2021
Int. Cl. G06V 20/58 (2022.01); G06N 3/08 (2006.01); G06N 3/04 (2006.01); G06V 10/75 (2022.01)

CPC G06V 20/58 (2022.01) [G06N 3/0454 (2013.01); G06N 3/08 (2013.01); G06V 10/757 (2022.01)]

20 Claims

1. A method comprising:

obtaining a temporal sequence comprising a respective point cloud input corresponding to each of a plurality of time points, each point cloud input comprising point cloud data generated from sensor data captured by one or more sensors of a vehicle at the respective time point;

processing each point cloud input using a first neural network to generate a respective spatial embedding that characterizes the point cloud input, comprising, for each point cloud input:

dividing the point cloud data into a plurality of voxels,

generating a feature representation that includes features for each voxel, and

processing the feature representation using the first neural network to generate the spatial embedding;

processing the spatial embeddings of the point cloud inputs using a second neural network to generate a spatio-temporal embedding that characterizes the point cloud inputs in the temporal sequence; and

processing the spatio-temporal embedding that characterizes the point cloud inputs in the temporal sequence using a task-specific neural network, wherein the task-specific neural network is configured to process the spatio-temporal embedding to generate a predicted output for a prediction task.
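The data flow recited in claim 1 can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the patent: the voxel size, the per-voxel features (point count and mean coordinates), the max-pooling and recurrent layers standing in for the first and second neural networks, and all function names are hypothetical. The weights are random placeholders rather than trained parameters, and each point cloud is assumed to be non-empty.

```python
import math
import random
from collections import defaultdict


def voxel_features(points, voxel_size=1.0):
    """Divide a point cloud into voxels and build one feature vector per voxel.

    Assumed features: [point count, mean x, mean y, mean z].
    """
    buckets = defaultdict(list)
    for x, y, z in points:
        key = (math.floor(x / voxel_size),
               math.floor(y / voxel_size),
               math.floor(z / voxel_size))
        buckets[key].append((x, y, z))
    features = {}
    for key, pts in buckets.items():
        n = len(pts)
        features[key] = [float(n)] + [sum(p[i] for p in pts) / n for i in range(3)]
    return features


def make_linear(rng, n_in, n_out):
    """Random weight matrix standing in for trained parameters."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]


def matvec(weights, vec):
    return [sum(w * v for w, v in zip(row, vec)) for row in weights]


def spatial_embedding(weights, features):
    """Stand-in for the first network: embed each voxel, then max-pool over voxels."""
    per_voxel = [[math.tanh(a) for a in matvec(weights, f)] for f in features.values()]
    return [max(column) for column in zip(*per_voxel)]


def spatio_temporal_embedding(w_in, w_rec, spatial_embeddings):
    """Stand-in for the second network: a simple recurrent update over time."""
    state = [0.0] * len(w_rec)
    for emb in spatial_embeddings:
        pre = [a + b for a, b in zip(matvec(w_in, emb), matvec(w_rec, state))]
        state = [math.tanh(p) for p in pre]
    return state


def task_head(weights, embedding):
    """Stand-in for a task-specific network: one sigmoid output."""
    return 1.0 / (1.0 + math.exp(-matvec(weights, embedding)[0]))


rng = random.Random(0)
# Synthetic temporal sequence: three time points, 50 random points each.
sequence = [[(rng.uniform(0, 4), rng.uniform(0, 4), rng.uniform(0, 2))
             for _ in range(50)] for _ in range(3)]
w_spatial = make_linear(rng, 4, 8)
w_in = make_linear(rng, 8, 6)
w_rec = make_linear(rng, 6, 6)
w_task = make_linear(rng, 6, 1)

spatial = [spatial_embedding(w_spatial, voxel_features(pc)) for pc in sequence]
st_embedding = spatio_temporal_embedding(w_in, w_rec, spatial)
prediction = task_head(w_task, st_embedding)
```

The sketch keeps the three stages separate so that each maps to one limitation of the claim: per-time-point spatial embeddings, a single embedding for the whole sequence, and a task output computed only from that embedding.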

10. The system of claim 9, wherein the operations further comprise:

processing the spatio-temporal embedding using one or more additional task-specific neural networks, wherein each additional task-specific neural network is configured to generate a respective predicted output for a corresponding prediction task that is different from the prediction task.
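As a rough illustration of claim 10, the sketch below feeds one shared embedding to several task-specific heads, each producing its own output. The head names, the weights, and the choice of one classification head and one regression head are hypothetical; the claim does not name particular tasks or architectures.

```python
import math


def linear(weights, bias, vec):
    """Affine layer: one output per weight row."""
    return [sum(w * v for w, v in zip(row, vec)) + b for row, b in zip(weights, bias)]


def softmax(logits):
    peak = max(logits)
    exps = [math.exp(l - peak) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Placeholder parameters; a real system would learn these.
CLASS_W, CLASS_B = [[0.5, -0.2, 0.1], [-0.3, 0.8, 0.0]], [0.0, 0.1]
REG_W, REG_B = [[1.0, 0.5, -0.5]], [0.2]


def classification_head(embedding):
    """Hypothetical head for one task: class probabilities."""
    return softmax(linear(CLASS_W, CLASS_B, embedding))


def regression_head(embedding):
    """Hypothetical head for a different task: a single real value."""
    return linear(REG_W, REG_B, embedding)


def run_heads(embedding, heads):
    """Apply every task-specific head to the same shared embedding."""
    return {name: head(embedding) for name, head in heads.items()}


shared_embedding = [0.1, -0.4, 0.7]  # stands in for the spatio-temporal embedding
outputs = run_heads(shared_embedding, {
    "classification": classification_head,
    "regression": regression_head,
})
```

Because the embedding is computed once and reused, adding a task in this sketch only adds a head; the upstream embedding computation is unchanged.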

16. The non-transitory computer storage media of claim 14, wherein the first neural network and the second neural network have been trained jointly on a first prediction task, and wherein the first prediction task is not the same as the prediction task.