CPC G06V 20/56 (2022.01) [B60W 60/0027 (2020.02); G06N 3/08 (2013.01); G06V 10/774 (2022.01); G06V 10/82 (2022.01); G06V 20/41 (2022.01); G06V 20/46 (2022.01); G06V 20/49 (2022.01); B60W 2420/403 (2013.01)] | 20 Claims |
1. A computer-implemented method for providing an agent action anticipative transformer comprising:
receiving image data associated with a video of a surrounding environment of an ego agent;
analyzing the image data and extracting short range clips from the image data;
analyzing the short range clips and extracting clip-level features associated with each of the short range clips; and
executing self-supervision using causal masking with respect to the extracted clip-level features to output action predictions and feature predictions to enable ego-centric action anticipation with respect to at least one target agent to autonomously control the ego agent.
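The steps recited in claim 1 (clip extraction, clip-level feature extraction, and causally masked prediction of actions and features) can be illustrated with a minimal NumPy sketch. This is not the claimed implementation: every function name, dimension, and weight matrix below is a hypothetical chosen for illustration, mean pooling with a random projection stands in for a learned video backbone, and a single untrained attention layer stands in for the transformer. The sketch only shows how a causal mask restricts each clip to past and present clips, and how predicting the next clip's feature yields a self-supervised training signal alongside the action output.

```python
import numpy as np

def extract_clips(frames, clip_len, stride):
    """Split a (T, H, W, C) frame array into short-range clips of clip_len frames."""
    starts = range(0, frames.shape[0] - clip_len + 1, stride)
    return np.stack([frames[s:s + clip_len] for s in starts])

def clip_features(clips, proj):
    """Hypothetical stand-in for a video backbone: mean-pool each clip, then project."""
    pooled = clips.reshape(clips.shape[0], clips.shape[1], -1).mean(axis=1)
    return pooled @ proj  # shape (num_clips, feature_dim)

def causal_attention(x):
    """Single-head self-attention in which clip t attends only to clips 0..t."""
    n, d = x.shape
    scores = (x @ x.T) / np.sqrt(d)
    future = np.triu(np.ones((n, n), dtype=bool), k=1)  # True where key is in the future
    scores = np.where(future, -np.inf, scores)
    scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)
    return weights @ x, weights

rng = np.random.default_rng(0)

# Assumed toy input: 32 frames of 8x8 RGB video from the ego agent's camera.
frames = rng.random((32, 8, 8, 3))
clips = extract_clips(frames, clip_len=8, stride=4)

# Assumed sizes: 16-dimensional clip features and 5 candidate action classes.
feature_dim, num_actions = 16, 5
proj = rng.standard_normal((8 * 8 * 3, feature_dim)) * 0.1
w_action = rng.standard_normal((feature_dim, num_actions)) * 0.1
w_feature = rng.standard_normal((feature_dim, feature_dim)) * 0.1

feats = clip_features(clips, proj)
context, weights = causal_attention(feats)

# Two outputs per clip position: action logits and a predicted next-clip feature.
action_logits = context @ w_action
feat_pred = context @ w_feature

# Self-supervised target: the prediction at clip t is compared with the
# actual feature of clip t+1, so no manual labels are needed for this term.
feature_loss = float(np.mean((feat_pred[:-1] - feats[1:]) ** 2))

# The anticipated action is read from the most recent clip position.
anticipated_action = int(np.argmax(action_logits[-1]))
```

In a trained system the feature loss would be combined with a supervised action loss and minimized by gradient descent; here the weights are random, so the outputs show only the data flow and tensor shapes, not meaningful predictions.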