US 12,106,779 B1
System and method for semantically grounded video generation
Sasha Strelnikoff, Seattle, WA (US); Nicholas A. Ketz, Topanga, CA (US); and Praveen K Pilly, West Hills, CA (US)
Assigned to HRL LABORATORIES, LLC, Malibu, CA (US)
Filed by HRL Laboratories, LLC, Malibu, CA (US)
Filed on Feb. 15, 2023, as Appl. No. 18/110,270.
Int. Cl. G11B 27/034 (2006.01); G06N 3/08 (2023.01); H04N 19/172 (2014.01); H04N 19/17 (2014.01)
CPC G11B 27/034 (2013.01) [G06N 3/08 (2013.01); H04N 19/172 (2014.11)]
21 Claims
1. A system for semantically grounded video generation, the system comprising:
one or more processors and associated memory, the memory being a non-transitory computer-readable medium having executable instructions encoded thereon, such that upon execution of the instructions, the one or more processors perform operations of:
receiving a raw video frame of a scene from one or more sensors on an autonomous platform;
encoding the raw video frame into a low-dimensional representation of the scene;
decoding the low-dimensional representation into a raw observation space;
decoding the low-dimensional representation into a corresponding semantic segmentation map for the scene;
feeding the low-dimensional representation into a controller model for the autonomous platform;
extracting semantic concepts in the low-dimensional representation that are related to an action selection by the controller model;
feeding the extracted semantic concepts into a world model to predict state and action dynamics of the autonomous platform;
feeding the raw observation space into discriminator networks that operate on frames and videos to determine between real and synthetically generated content;
modifying a generative capability of one or more encoders and decoders such that the discriminator networks are unable to distinguish between real and synthetically generated content; and
recursively generating semantically grounded videos using a conjunction of the world model and controller model.
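The data flow recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation: the class name ToyPipeline, all dimensions, and the random linear maps standing in for trained networks are assumptions made for illustration only. The sketch covers the encoder, the two decoders (raw observation and semantic segmentation), the controller, the world model, and the recursive rollout. It omits the discriminator networks and the adversarial update of the encoders and decoders, and it simplifies the concept-extraction step by passing the whole low-dimensional representation to the world model.

```python
import math
import random

# All sizes below are hypothetical; the claim does not specify them.
LATENT_DIM = 4    # size of the low-dimensional representation
FRAME_DIM = 16    # flattened raw video frame
NUM_CLASSES = 3   # semantic classes per pixel
NUM_ACTIONS = 2   # discrete controller actions


def make_matrix(rows, cols, rng):
    """Random weights standing in for a trained network layer."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    """Plain matrix-vector product."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def argmax(values):
    """Index of the largest entry."""
    return max(range(len(values)), key=lambda i: values[i])


class ToyPipeline:
    """Hypothetical stand-in for the encoder, decoders, controller and world model."""

    def __init__(self, seed=0):
        rng = random.Random(seed)
        self.enc = make_matrix(LATENT_DIM, FRAME_DIM, rng)
        self.dec_frame = make_matrix(FRAME_DIM, LATENT_DIM, rng)
        # One class-score head per pixel for the segmentation decoder.
        self.dec_seg = [make_matrix(NUM_CLASSES, LATENT_DIM, rng)
                        for _ in range(FRAME_DIM)]
        self.ctrl = make_matrix(NUM_ACTIONS, LATENT_DIM, rng)
        self.world = make_matrix(LATENT_DIM, LATENT_DIM + NUM_ACTIONS, rng)

    def encode(self, frame):
        """Raw frame -> low-dimensional representation."""
        return matvec(self.enc, frame)

    def decode_frame(self, z):
        """Low-dimensional representation -> raw observation space."""
        return matvec(self.dec_frame, z)

    def decode_segmentation(self, z):
        """Low-dimensional representation -> per-pixel class labels."""
        return [argmax(matvec(head, z)) for head in self.dec_seg]

    def act(self, z):
        """Controller: pick the highest-scoring action for this latent."""
        return argmax(matvec(self.ctrl, z))

    def step(self, z, action):
        """World model: predict the next latent from latent and action."""
        one_hot = [1.0 if i == action else 0.0 for i in range(NUM_ACTIONS)]
        # tanh keeps the toy latent bounded over long rollouts.
        return [math.tanh(v) for v in matvec(self.world, z + one_hot)]

    def rollout(self, first_frame, horizon):
        """Recursively generate (frame, segmentation, action) triples."""
        z = self.encode(first_frame)
        video = []
        for _ in range(horizon):
            action = self.act(z)
            z = self.step(z, action)
            video.append((self.decode_frame(z),
                          self.decode_segmentation(z),
                          action))
        return video
```

As a usage example under the same assumptions, ToyPipeline(seed=0).rollout([0.5] * FRAME_DIM, horizon=5) returns five generated steps, each pairing a decoded frame with its segmentation map and the action the controller chose. Decoding both outputs from the same latent is what ties each generated frame to a semantic map in this sketch.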