US 12,340,520 B2
System and method for motion prediction in autonomous driving
Alan Sullivan, Middleton, MA (US); Siheng Chen, Cambridge, MA (US); Jun Wang, Greenbelt, MD (US); and Xiaolong Li, Blacksburg, VA (US)
Assigned to Mitsubishi Electric Research Laboratories, Inc., Cambridge, MA (US)
Filed by Mitsubishi Electric Research Laboratories, Inc., Cambridge, MA (US)
Filed on Apr. 20, 2022, as Appl. No. 17/659,889.
Prior Publication US 2023/0342944 A1, Oct. 26, 2023
Int. Cl. G06T 7/20 (2017.01); B60W 60/00 (2020.01); G06V 10/80 (2022.01); H04N 19/43 (2014.01)
CPC G06T 7/20 (2013.01) [B60W 60/0027 (2020.02); G06V 10/806 (2022.01); H04N 19/43 (2014.11); G06T 2207/10028 (2013.01); G06T 2207/20084 (2013.01); G06T 2207/30252 (2013.01)] 20 Claims
OG exemplary drawing
 
1. A point-cloud motion sensor for estimating motion information of at least some points of an environment, the point-cloud motion sensor comprising:
a depth sensor configured to sense a dynamic environment to collect a temporal sequence of three-dimensional (3D) point clouds of the environment including a current 3D point cloud and a previous 3D point cloud;
a memory configured to store computer executable instructions; and
a processor configured to iteratively process the sequence of 3D point clouds with a neural network, the neural network including:
an encoder providing a spatiotemporal encoding of each point in each of the 3D point clouds; and
a decoder decoding the spatiotemporal encodings to generate motion information for each point of each of the 3D point clouds,
wherein, to encode a current point of the current 3D point cloud, the encoder is configured to:
extract features of neighboring points in the current 3D point cloud located in proximity to a location of the current point to produce a current spatial encoding of the current point in a current frame;
extract features of neighboring points in the previous 3D point cloud located in proximity to a location in the previous 3D point cloud corresponding to the location of the current point to produce a previous spatial encoding of the current point in a previous frame; and
combine the current spatial encoding and the previous spatial encoding to produce a spatiotemporal encoding of the current point; and
wherein the neural network includes a contractive branch that sequentially downsamples its input and an expansive branch that sequentially upsamples its input, wherein the contractive branch includes one or multiple pairs of the encoder and a downsampling layer, and wherein the expansive branch includes one or multiple pairs of the decoder and an upsampling layer.
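The following is a minimal sketch, not the patented implementation, of the per-point spatiotemporal encoding recited in claim 1: for each current point, features of its neighbors in the current cloud and of neighbors near the corresponding location in the previous cloud are extracted and combined. The shared MLP, k-nearest-neighbor grouping, max-pooling, and all names (PointSpatioTemporalEncoder, feat_dim, k) are illustrative assumptions, as is using the same coordinates as the "corresponding location" in the previous frame.

```python
import torch
import torch.nn as nn


class PointSpatioTemporalEncoder(nn.Module):
    """Hypothetical per-point encoder; layer choices are assumptions."""

    def __init__(self, feat_dim: int = 64, k: int = 16):
        super().__init__()
        self.k = k
        # Shared MLP applied to relative neighbor coordinates (assumption).
        self.mlp = nn.Sequential(
            nn.Linear(3, feat_dim), nn.ReLU(),
            nn.Linear(feat_dim, feat_dim), nn.ReLU(),
        )

    def _spatial_encoding(self, query: torch.Tensor, cloud: torch.Tensor) -> torch.Tensor:
        # query: (N, 3) query locations; cloud: (M, 3) point cloud to search.
        dists = torch.cdist(query, cloud)                  # (N, M) pairwise distances
        idx = dists.topk(self.k, largest=False).indices    # (N, k) nearest neighbors
        neighbors = cloud[idx]                             # (N, k, 3)
        rel = neighbors - query.unsqueeze(1)               # relative offsets
        return self.mlp(rel).max(dim=1).values             # (N, feat_dim) pooled feature

    def forward(self, current: torch.Tensor, previous: torch.Tensor) -> torch.Tensor:
        # Current spatial encoding: neighbors of each point in the current frame.
        enc_now = self._spatial_encoding(current, current)
        # Previous spatial encoding: neighbors near the corresponding location
        # in the previous frame (here simply the same coordinates, an assumption).
        enc_prev = self._spatial_encoding(current, previous)
        # Combine the two spatial encodings into a spatiotemporal encoding.
        return torch.cat([enc_now, enc_prev], dim=-1)      # (N, 2 * feat_dim)
```

Usage would pair consecutive frames from the temporal sequence, e.g. `PointSpatioTemporalEncoder()(cloud_t, cloud_t_minus_1)` for two (N, 3) and (M, 3) tensors.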
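The claim's final limitation pairs each encoder with a downsampling layer in a contractive branch and each decoder with an upsampling layer in an expansive branch. Below is a minimal sketch of that arrangement under stated assumptions: random subsampling and nearest-neighbor feature interpolation stand in for whatever sampling the patented network actually uses, skip connections are omitted, and the names (MotionNet, downsample, upsample, dims) are hypothetical.

```python
import torch
import torch.nn as nn


def downsample(points: torch.Tensor, feats: torch.Tensor, ratio: float = 0.5):
    # Keep a random subset of points and their features (illustrative only).
    n_keep = max(1, int(points.shape[0] * ratio))
    idx = torch.randperm(points.shape[0])[:n_keep]
    return points[idx], feats[idx]


def upsample(coarse_pts: torch.Tensor, coarse_feats: torch.Tensor, fine_pts: torch.Tensor):
    # Copy each coarse feature to its nearest fine point (illustrative only).
    idx = torch.cdist(fine_pts, coarse_pts).argmin(dim=1)   # (N_fine,)
    return coarse_feats[idx]                                 # (N_fine, C)


class MotionNet(nn.Module):
    """Contractive branch of (encoder, downsampling) pairs followed by an
    expansive branch of (decoder, upsampling) pairs; details are assumptions."""

    def __init__(self, dims=(64, 128)):
        super().__init__()
        # One encoder per downsampling stage.
        self.encoders = nn.ModuleList(
            nn.Sequential(nn.Linear(d_in, d_out), nn.ReLU())
            for d_in, d_out in zip((3,) + dims[:-1], dims)
        )
        # One decoder per upsampling stage; the last decoder emits a
        # 3-D motion vector per point.
        self.decoders = nn.ModuleList(
            nn.Sequential(nn.Linear(d_in, d_out),
                          nn.ReLU() if d_out != 3 else nn.Identity())
            for d_in, d_out in zip(reversed(dims), tuple(reversed(dims[:-1])) + (3,))
        )

    def forward(self, points: torch.Tensor) -> torch.Tensor:
        pts, feats = points, points
        pyramid = []
        for enc in self.encoders:                     # encoder + downsampling pairs
            feats = enc(feats)
            pyramid.append(pts)
            pts, feats = downsample(pts, feats)
        for dec, finer_pts in zip(self.decoders, reversed(pyramid)):
            feats = upsample(pts, feats, finer_pts)   # upsampling + decoder pairs
            feats = dec(feats)
            pts = finer_pts
        return feats                                  # (N, 3) per-point motion
```

Running `MotionNet()(torch.randn(1024, 3))` returns a (1024, 3) tensor of per-point motion estimates, illustrating how the expansive branch restores the original point count that the contractive branch reduced.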