CPC G11B 27/034 (2013.01) [G06N 3/08 (2013.01); H04N 19/172 (2014.11)] | 21 Claims |
1. A system for semantically grounded video generation, the system comprising:
one or more processors and associated memory, the memory being a non-transitory computer-readable medium having executable instructions encoded thereon, such that upon execution of the instructions, the one or more processors perform operations of:
receiving a raw video frame of a scene from one or more sensors on an autonomous platform;
encoding the raw video frame into a low-dimensional representation of the scene;
decoding the low-dimensional representation into a raw observation space;
decoding the low-dimensional representation into a corresponding semantic segmentation map for the scene;
feeding the low-dimensional representation into a controller model for the autonomous platform;
extracting semantic concepts in the low-dimensional representation that are related to an action selection by the controller model;
feeding the extracted semantic concepts into a world model to predict state and action dynamics of the autonomous platform;
feeding the raw observation space into discriminator networks that operate on frames and videos to determine between real and synthetically generated content;
modifying a generative capability of one or more encoders and decoders such that the discriminator networks are unable to distinguish between real and synthetically generated content; and
recursively generating semantically grounded videos using a conjunction of the world model and controller model.
|
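For illustration only, the data flow recited in claim 1 can be sketched as a toy pipeline. This is a minimal sketch under stated assumptions, not the claimed implementation: every name, dimension, and the random linear "models" below are hypothetical stand-ins, the frame is a flat list of floats, and the sketch omits both the semantic-concept extraction step and the adversarial training against the frame and video discriminators. It only shows how an encoder, two decoders, a controller model, and a world model could be chained to generate frames recursively.

```python
import math
import random

# Hypothetical sizes chosen for illustration; none come from the claim.
LATENT_DIM = 4     # size of the low-dimensional representation
FRAME_DIM = 16     # number of "pixels" in a flattened toy frame
NUM_CLASSES = 3    # semantic classes in the segmentation map
NUM_ACTIONS = 2    # discrete actions available to the controller


def make_matrix(rows, cols, rng):
    """Random weights standing in for a trained network."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    """Plain matrix-vector product."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class ToyPipeline:
    """Untrained stand-ins for the encoder, decoders, controller, and world model."""

    def __init__(self, seed=0):
        rng = random.Random(seed)
        self.enc = make_matrix(LATENT_DIM, FRAME_DIM, rng)
        self.dec_frame = make_matrix(FRAME_DIM, LATENT_DIM, rng)
        self.dec_seg = make_matrix(FRAME_DIM * NUM_CLASSES, LATENT_DIM, rng)
        self.ctrl = make_matrix(NUM_ACTIONS, LATENT_DIM, rng)
        self.world = make_matrix(LATENT_DIM, LATENT_DIM + NUM_ACTIONS, rng)

    def encode(self, frame):
        """Raw frame -> low-dimensional representation."""
        return matvec(self.enc, frame)

    def decode_frame(self, z):
        """Low-dimensional representation -> raw observation space."""
        return matvec(self.dec_frame, z)

    def decode_segmentation(self, z):
        """Low-dimensional representation -> per-pixel class labels."""
        logits = matvec(self.dec_seg, z)
        return [
            max(range(NUM_CLASSES), key=lambda c: logits[p * NUM_CLASSES + c])
            for p in range(FRAME_DIM)
        ]

    def act(self, z):
        """Controller model: pick the highest-scoring action."""
        scores = matvec(self.ctrl, z)
        return max(range(NUM_ACTIONS), key=lambda a: scores[a])

    def step(self, z, action):
        """World model: predict the next representation from state and action."""
        one_hot = [1.0 if a == action else 0.0 for a in range(NUM_ACTIONS)]
        return [math.tanh(x) for x in matvec(self.world, z + one_hot)]


def rollout(pipeline, first_frame, horizon):
    """Recursively generate (frame, segmentation, action) tuples."""
    z = pipeline.encode(first_frame)
    video = []
    for _ in range(horizon):
        action = pipeline.act(z)
        video.append(
            (pipeline.decode_frame(z), pipeline.decode_segmentation(z), action)
        )
        z = pipeline.step(z, action)  # feed the prediction back in
    return video
```

Calling `rollout(ToyPipeline(seed=0), [0.5] * FRAME_DIM, 5)` returns five generated steps, each pairing a decoded frame with its segmentation labels and the action the controller chose. In a trained system the random matrices would be replaced by learned networks, and the discriminator feedback recited in the claim would shape the encoder and decoders.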