| CPC G06T 7/20 (2013.01) [G06V 10/40 (2022.01); G06V 20/584 (2022.01); G06V 20/588 (2022.01); G06T 2207/20081 (2013.01); G06T 2207/20084 (2013.01); G06T 2207/30256 (2013.01)] | 20 Claims |

|
1. A method performed by one or more computers, the method comprising:
obtaining, by an autonomous vehicle navigating through an environment, an input characterizing a scene in the environment at a current time point;
generating, by the autonomous vehicle and from the input, an encoded representation that includes respective features for each grid cell in a top-down representation of the environment;
processing, by the autonomous vehicle, the encoded representation using a decoder neural network that is configured to process the encoded representation to generate a decoder output that comprises, for each future time point in a sequence of future time points and for each agent type in a set of one or more agent types:
a feature map that specifies, for each grid cell, (i) an occupancy score that represents a likelihood that any agent of the agent type will occupy the grid cell at the future time point, and (ii) a motion vector that represents predicted motion of agents of the agent type within the grid cell at the future time point, wherein the agents of the agent type are different from the autonomous vehicle.
|
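
The structure of the decoder output recited in claim 1 can be illustrated with a minimal sketch. It is not the claimed implementation: `toy_decoder` is a hypothetical stand-in for the decoder neural network and uses hand-written arithmetic in place of learned weights. The names `CellPrediction`, `toy_decoder`, and `encoded`, the agent types `"vehicle"` and `"pedestrian"`, the grid size, and all feature values are assumptions made for illustration. The sketch only shows the shape of the output: one feature map per (future time point, agent type) pair, holding an occupancy score and a motion vector for every grid cell.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class CellPrediction:
    occupancy: float  # likelihood in (0, 1) that any agent of the type occupies the cell
    motion: tuple     # (dx, dy) predicted motion of agents of the type within the cell


def toy_decoder(encoded, num_future_steps, agent_types):
    """Hypothetical placeholder for the decoder neural network (not a trained model).

    encoded: H x W grid (list of lists) holding a feature list for each grid cell.
    Returns a dict mapping (future_step, agent_type) to an H x W feature map.
    """
    output = {}
    for t in range(1, num_future_steps + 1):
        for k, agent_type in enumerate(agent_types):
            feature_map = []
            for row in encoded:
                out_row = []
                for feats in row:
                    # Assumed scoring rule: squash a feature sum into (0, 1).
                    logit = sum(feats) - k
                    occupancy = 1.0 / (1.0 + math.exp(-logit))
                    # Assumed motion rule: first two features scaled by the horizon.
                    motion = (feats[0] * t, feats[1] * t)
                    out_row.append(CellPrediction(occupancy, motion))
                feature_map.append(out_row)
            output[(t, agent_type)] = feature_map
    return output


# A 2 x 3 top-down grid with 2 features per cell (made-up values).
encoded = [[[0.1 * (r + c), -0.2 * c] for c in range(3)] for r in range(2)]
decoded = toy_decoder(encoded, num_future_steps=4, agent_types=["vehicle", "pedestrian"])

# One feature map per (future time point, agent type) pair: 4 * 2 = 8.
print(len(decoded))
```

Keying the output by `(future_step, agent_type)` mirrors the claim language "for each future time point ... and for each agent type". A real system would more likely emit a single dense tensor with time, agent type, and channel axes.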