CPC G06V 40/107 (2022.01) [G01S 13/89 (2013.01); G01S 17/89 (2013.01); G05D 1/0276 (2013.01); G06N 5/04 (2013.01); G06N 20/00 (2019.01); G06V 10/22 (2022.01); G06V 20/56 (2022.01)] | 20 Claims |
1. A system comprising:
one or more processors; and
one or more non-transitory computer-readable media storing instructions that, when executed by the one or more processors, cause the system to perform operations comprising:
receiving first image data from an image sensor associated with a vehicle, the first image data representing a pedestrian that is proximate the vehicle at a first time;
receiving second image data from the image sensor, the second image data representing the pedestrian at a second time after the first time;
inputting the first image data and the second image data into a machine-learned model;
receiving a first output, from the machine-learned model, the first output comprising an indication of a gesture of the pedestrian and a status of the pedestrian, wherein the status comprises an indication of an authorized agent status of the pedestrian;
based at least in part on the status of the pedestrian, at least one of:
inputting the first output of the gesture into a prediction component of the vehicle that is configured to generate a prediction based at least in part on the first output; or
inputting the first output into a planning component of the vehicle based at least in part on the first output further indicating that the pedestrian is an authorized agent, the planning component being configured to determine vehicle trajectories based at least in part on the first output; and
controlling the vehicle to follow a trajectory to traverse an environment based at least in part on a second output from at least one of the prediction component or the planning component.
|
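The status-dependent routing recited in claim 1 can be illustrated with a minimal sketch. This is one possible reading, not the claimed implementation: the claim recites "at least one of" the two branches, whereas the sketch picks exactly one for simplicity. The names `FirstOutput`, `route_first_output`, `demo_prediction`, and `demo_planning` are hypothetical and do not appear in the claim, and the machine-learned model itself is not modeled.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class FirstOutput:
    """Hypothetical container for the model's first output."""
    gesture: str             # indication of the pedestrian's gesture
    authorized_agent: bool   # status: is the pedestrian an authorized agent?


def route_first_output(
    output: FirstOutput,
    prediction_component: Callable[[str], str],
    planning_component: Callable[[FirstOutput], str],
) -> str:
    """Route the first output based on the pedestrian's status.

    Assumption: an authorized agent's output goes to the planning
    component; otherwise only the gesture goes to the prediction component.
    """
    if output.authorized_agent:
        return planning_component(output)
    return prediction_component(output.gesture)


# Stand-in components used only for illustration.
def demo_prediction(gesture: str) -> str:
    return f"prediction:{gesture}"


def demo_planning(output: FirstOutput) -> str:
    return f"plan:{output.gesture}"
```

In this sketch the returned string stands in for the "second output" on which the vehicle trajectory would be based; a real system would return a prediction or a set of candidate trajectories instead.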