US 12,330,691 B2
Motion prediction in an autonomous vehicle using fused synthetic and camera images
Eric McKenzie Wolff, Zephyr Cove, NV (US); Oscar Beijbom, Santa Cruz, CA (US); Alex Lang, Culver City, CA (US); Sourabh Vora, Marina del Rey, CA (US); Bassam Helou, Santa Monica, CA (US); Elena Corina Grigore, Redwood City, CA (US); and Cheng Jiang, West Bloomfield, MI (US)
Assigned to Motional AD LLC
Filed by Motional AD LLC, Boston, MA (US)
Filed on Jun. 13, 2022, as Appl. No. 17/806,707.
Claims priority of provisional application 63/365,590, filed on May 31, 2022.
Prior Publication US 2023/0382427 A1, Nov. 30, 2023
Int. Cl. B60W 60/00 (2020.01); G06N 3/045 (2023.01); G06N 3/08 (2023.01)
CPC B60W 60/0027 (2020.02) [G06N 3/045 (2023.01); G06N 3/08 (2013.01); B60W 2420/403 (2013.01); B60W 2420/408 (2024.01); B60W 2554/404 (2020.02); B60W 2554/80 (2020.02); B60W 2556/45 (2020.02)] 20 Claims
OG exemplary drawing
 
1. A computer system comprising:
one or more computer readable storage devices configured to store computer executable instructions; and
one or more computer processors configured to execute the computer executable instructions, wherein execution of the computer executable instructions causes the computer system to:
obtain a set of data pairs, each data pair comprising:
first data corresponding to a synthetic image representing a bird's-eye view of an area generated based on sensor data of a vehicle in the area, wherein the synthetic image identifies an object in the area; and
second data corresponding to a camera image representing a viewpoint of the vehicle in the area, wherein the camera image depicts the object;
train a machine learning model based on the set of data pairs to result in a trained model, wherein the machine learning model includes:
a first portion trained to:
generate a set of annotations based on a set of synthetic image features extracted from first data of a data pair, and a set of raw image features extracted from second data of a data pair, and
fuse the first data and the set of annotations to generate a combined image, and
a second portion trained to:
generate a predicted motion of the object, based on the combined image and a set of input state information; and
transmit the trained model to a destination vehicle, wherein the destination vehicle is configured to apply the trained model to sensor data of the destination vehicle to predict motion of a target object identified within the sensor data.
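The two-portion architecture recited in the claim (a first portion that derives annotations from bird's-eye-view and camera features and fuses them with the synthetic image, and a second portion that predicts motion from the combined image plus state information) can be illustrated with a minimal sketch. All shapes, layer choices, and function names below are illustrative assumptions for exposition; they are not the patented implementation, which would use learned deep-network components rather than fixed linear projections.

```python
import numpy as np

def extract_features(image, weight):
    # Stand-in feature extractor: a single linear projection of the
    # flattened image (a real system would use a trained CNN backbone).
    return image.reshape(-1) @ weight

def first_portion(bev_image, camera_image, w_bev, w_cam):
    """First portion: generate annotations from synthetic-image (BEV)
    features and raw camera-image features, then fuse the annotations
    with the synthetic image into a combined image."""
    bev_feats = extract_features(bev_image, w_bev)
    cam_feats = extract_features(camera_image, w_cam)
    # Illustrative "annotations": per-cell scores laid out over the BEV grid.
    annotations = np.outer(bev_feats, cam_feats).reshape(bev_image.shape)
    # Fuse by stacking the synthetic image and its annotations channel-wise.
    return np.stack([bev_image, annotations], axis=0)

def second_portion(combined_image, state, w_out):
    """Second portion: predict object motion (here a 2-D displacement)
    from the combined image and input state information."""
    features = np.concatenate([combined_image.reshape(-1), state])
    return features @ w_out

# Toy demonstration with a 4x4 BEV grid and 4-dim feature vectors.
rng = np.random.default_rng(0)
bev = rng.standard_normal((4, 4))
cam = rng.standard_normal((4, 4))
w_bev = rng.standard_normal((16, 4))
w_cam = rng.standard_normal((16, 4))
w_out = rng.standard_normal((34, 2))  # 2*4*4 combined cells + 2 state dims

combined = first_portion(bev, cam, w_bev, w_cam)
motion = second_portion(combined, np.array([1.0, 0.0]), w_out)
```

In deployment per the claim, the analogue of `w_bev`, `w_cam`, and `w_out` would be learned during training on the data pairs and then transmitted to the destination vehicle, which applies the trained model to its own sensor data.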