| CPC G06V 20/582 (2022.01) [G06V 10/764 (2022.01); G06V 20/584 (2022.01)] | 20 Claims |

|
1. A system comprising:
one or more processors; and
one or more non-transitory computer-readable media storing instructions executable by the one or more processors, wherein the instructions, when executed, cause the system to perform operations comprising:
receiving vision data associated with a vehicle traversing an environment;
determining, based at least in part on the vision data, a two-dimensional sensor perspective image representing a portion of the environment;
determining, by a first machine-learned (ML) model, based at least in part on the two-dimensional sensor perspective image, a first image comprising a traffic light and traffic lane association label for a first pixel and a second pixel of the two-dimensional sensor perspective image, wherein:
the first pixel represents at least a portion of a traffic light in the environment;
the second pixel represents at least a portion of a traffic lane in the environment; and
the traffic light and traffic lane association label indicates that the first pixel is associated with the second pixel;
determining, by a second ML model, based at least in part on mapping data associated with the environment, traffic lane features in the environment;
determining, based at least in part on determining that a first confidence score associated with the first ML model is greater than a second confidence score associated with the second ML model, that the first ML model is to be used for the environment; and
controlling the vehicle based at least in part on the traffic light and traffic lane association label.
|
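The model-selection and control steps of claim 1 can be illustrated with a minimal sketch. Every name here (`ModelResult`, `use_first_model`, `associated_lane_pixel`, the pixel coordinates, and the confidence values) is hypothetical and chosen for illustration only. The claim does not state how the confidence scores are computed, nor what happens when the first score is not greater than the second, so the sketch simply returns `None` in that case as an assumption.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

# (row, column) position in the two-dimensional sensor perspective image.
Pixel = Tuple[int, int]


@dataclass
class ModelResult:
    """Hypothetical container for one model's output and its confidence score."""
    source: str        # e.g. "vision" (first ML model) or "map" (second ML model)
    confidence: float  # how this score is produced is not specified in the claim
    # Association label: traffic-light pixel -> associated traffic-lane pixel.
    associations: Dict[Pixel, Pixel] = field(default_factory=dict)


def use_first_model(first: ModelResult, second: ModelResult) -> bool:
    """True only when the first confidence score is strictly greater than the second."""
    return first.confidence > second.confidence


def associated_lane_pixel(
    first: ModelResult, second: ModelResult, light_pixel: Pixel
) -> Optional[Pixel]:
    """Return the lane pixel tied to a traffic-light pixel if the first model is selected.

    Returning None when the first model is not selected is an assumption of
    this sketch; the claim recites only the case where it is selected.
    """
    if not use_first_model(first, second):
        return None
    return first.associations.get(light_pixel)


# Illustrative values only.
vision = ModelResult("vision", confidence=0.91, associations={(12, 40): (200, 45)})
mapped = ModelResult("map", confidence=0.78)
lane_pixel = associated_lane_pixel(vision, mapped, (12, 40))
```

In this sketch a downstream controller would consume `lane_pixel` to decide which traffic light governs the vehicle's lane. The strict `>` comparison mirrors the claim's "greater than" wording, so equal scores do not select the first model.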