US 12,236,705 B1
Pedestrian attribute and gesture detection
Oytun Ulutan, Buena Park, CA (US); Xin Wang, Sunnyvale, CA (US); Kratarth Goel, Albany, CA (US); Vasiliy Karasev, San Francisco, CA (US); Sarah Tariq, Palo Alto, CA (US); and Yi Xu, Pasadena, CA (US)
Assigned to Zoox, Inc., Foster City, CA (US)
Filed by Zoox, Inc., Foster City, CA (US)
Filed on May 14, 2021, as Appl. No. 17/320,678.
Claims priority of provisional application 63/117,263, filed on Nov. 23, 2020.
Claims priority of provisional application 62/028,377, filed on May 21, 2020.
Int. Cl. G06K 9/00 (2022.01); G01S 13/89 (2006.01); G01S 17/89 (2020.01); G05D 1/00 (2006.01); G06N 5/04 (2023.01); G06N 20/00 (2019.01); G06V 10/22 (2022.01); G06V 20/56 (2022.01); G06V 40/10 (2022.01)

CPC G06V 40/107 (2022.01) [G01S 13/89 (2013.01); G01S 17/89 (2013.01); G05D 1/0276 (2013.01); G06N 5/04 (2013.01); G06N 20/00 (2019.01); G06V 10/22 (2022.01); G06V 20/56 (2022.01)]

20 Claims

1. A system comprising:

one or more processors; and

one or more non-transitory computer-readable media storing instructions that, when executed by the one or more processors, cause the system to perform operations comprising:

receiving first image data from an image sensor associated with a vehicle, the first image data representing a pedestrian that is proximate the vehicle at a first time;

receiving second image data from the image sensor, the second image data representing the pedestrian at a second time after the first time;

inputting the first image data and the second image data into a machine-learned model;

receiving a first output from the machine-learned model, the first output comprising an indication of a gesture of the pedestrian and a status of the pedestrian, wherein the status comprises an indication of an authorized agent status of the pedestrian;

based at least in part on the status of the pedestrian, at least one of:

inputting the first output of the gesture into a prediction component of the vehicle that is configured to generate a prediction based at least in part on the first output; or

inputting the first output into a planning component of the vehicle based at least in part on the first output further indicating that the pedestrian is an authorized agent, the planning component being configured to determine vehicle trajectories based at least in part on the first output; and

controlling the vehicle to follow a trajectory to traverse an environment based at least in part on a second output from at least one of the prediction component or the planning component.
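The data flow recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patentee's implementation. It assumes a hypothetical model that takes two frames and returns a gesture label and an authorized-agent flag. It also assumes hypothetical `predict` and `plan` functions standing in for the prediction and planning components, and string trajectory labels in place of real trajectories. The claim allows the output to go to either component or to both. This sketch picks one reading: an authorized agent's gesture goes straight to the planner, and any other pedestrian's gesture goes to the prediction component.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

# Stand-in for image data; a real system would use camera frames.
Frame = Sequence[float]


@dataclass
class PedestrianOutput:
    """Hypothetical 'first output': a gesture label plus authorized-agent status."""
    gesture: str               # illustrative labels: "stop", "proceed", "wave", "none"
    is_authorized_agent: bool  # e.g. a traffic officer or construction flagger


def predict(output: PedestrianOutput) -> str:
    """Hypothetical prediction component: anticipates the pedestrian's behavior."""
    return "pedestrian_may_cross" if output.gesture == "wave" else "pedestrian_stationary"


def plan(output: PedestrianOutput) -> str:
    """Hypothetical planning component: treats an authorized agent's gesture as an instruction."""
    if output.gesture == "stop":
        return "stop_trajectory"
    if output.gesture == "proceed":
        return "proceed_trajectory"
    return "yield_trajectory"


def route_and_control(
    first_frame: Frame,
    second_frame: Frame,
    model: Callable[[Frame, Frame], PedestrianOutput],
) -> str:
    """Runs the model on two time steps, routes by status, and returns a trajectory label."""
    output = model(first_frame, second_frame)
    if output.is_authorized_agent:
        # Authorized agent: the planner consumes the gesture directly.
        return plan(output)
    # Other pedestrians: the gesture only informs a behavior prediction,
    # which a simple illustrative rule maps to a trajectory.
    prediction = predict(output)
    return "yield_trajectory" if prediction == "pedestrian_may_cross" else "nominal_trajectory"
```

For example, with a stub model that always reports an authorized agent signaling "stop", `route_and_control([0.0], [0.0], lambda a, b: PedestrianOutput("stop", True))` returns `"stop_trajectory"`. The same gesture from a pedestrian who is not an authorized agent goes only to prediction and yields `"nominal_trajectory"`.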