US 12,434,725 B1
Generating scene descriptions by a machine learned model
David Casterton, San Carlos, CA (US); and Sean Konz, San Francisco, CA (US)
Assigned to Zoox, Inc., Foster City, CA (US)
Filed by Zoox, Inc., Foster City, CA (US)
Filed on Mar. 10, 2023, as Appl. No. 18/120,350.
Int. Cl. B60W 50/06 (2006.01); G06N 20/00 (2019.01)
CPC B60W 50/06 (2013.01) [G06N 20/00 (2019.01)]
22 Claims
 
1. A system comprising:
one or more processors; and
one or more non-transitory computer-readable media storing instructions executable by the one or more processors, wherein the instructions, when executed, cause the system to perform operations comprising:
receiving text data comprising an initial description of a scene;
receiving image data of the scene, the image data including a vehicle and an object in the scene;
inputting the text data and the image data into a machine learned model;
determining, by the machine learned model, a text description for the scene that is more descriptive than the initial description;
outputting, based at least in part on the text description, the scene to a computing device associated with the vehicle to perform a simulation using the scene; and
controlling the vehicle in an environment based at least in part on an outcome of the simulation.
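
The operations recited in claim 1 can be read as a data flow: text and image go into a model, a richer description comes out, the resulting scene is simulated, and the simulation outcome informs vehicle control. The following is a minimal illustrative sketch of that flow only, not the patented implementation. Every name in it (Scene, stub_model, stub_simulation, run_pipeline, and the "proceed"/"yield" commands) is hypothetical, and the model and simulator are trivial stand-ins rather than a trained model or a real simulator.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Scene:
    """Hypothetical container for a scene handed to the simulator."""
    description: str  # text description determined by the model
    image: bytes      # image data including the vehicle and an object


def stub_model(text: str, image: bytes) -> str:
    """Stand-in for the machine learned model (assumption, not a real model).

    It only appends fixed detail to the initial description so the output
    carries more information than the input text.
    """
    return f"{text}; image ({len(image)} bytes) shows a vehicle and a pedestrian"


def stub_simulation(scene: Scene) -> bool:
    """Stand-in simulator: reports True when no pedestrian is in the scene."""
    return "pedestrian" not in scene.description


def run_pipeline(
    initial_text: str,
    image: bytes,
    model: Callable[[str, bytes], str],
    simulate: Callable[[Scene], bool],
) -> str:
    # Input the text data and the image data into the model.
    detailed = model(initial_text, image)
    # Build the scene from the model's text description and output it
    # to the (hypothetical) simulator.
    scene = Scene(description=detailed, image=image)
    outcome_ok = simulate(scene)
    # Choose a control command based on the simulation outcome.
    return "proceed" if outcome_ok else "yield"


command = run_pipeline(
    "vehicle approaches crosswalk", b"\x00" * 16, stub_model, stub_simulation
)
```

Passing the model and simulator as callables keeps the sketch independent of any particular model architecture or simulation backend, neither of which is specified in the claim.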