US 11,941,080 B2
	System and method for learning human activities from video demonstrations using video augmentation
Quoc-Huy Tran, Redmond, WA (US); Muhammad Zeeshan Zia, Sammamish, WA (US); Andrey Konin, Redmond, WA (US); Sanjay Haresh, Karachi (PK); and Sateesh Kumar, Karachi (PK)
Assigned to Retrocausal, Inc., WA (US)
Filed by Retrocausal, Inc., Sammamish, WA (US)
Filed on May 20, 2021, as Appl. No. 17/325,759.
Prior Publication US 2022/0374653 A1, Nov. 24, 2022
Int. Cl. G06F 18/214 (2023.01); G06N 20/00 (2019.01); G06T 3/00 (2006.01); G06T 7/194 (2017.01); G06V 20/40 (2022.01); G06V 40/20 (2022.01); H04N 5/272 (2006.01); H04N 7/01 (2006.01); H04N 13/111 (2018.01)

CPC G06F 18/214 (2023.01) [G06N 20/00 (2019.01); G06T 3/00 (2013.01); G06T 7/194 (2017.01); G06V 20/41 (2022.01); G06V 40/20 (2022.01); H04N 5/272 (2013.01); H04N 7/0127 (2013.01); H04N 7/0135 (2013.01); H04N 13/111 (2018.05); G06T 2207/10016 (2013.01); G06T 2207/20081 (2013.01)]

22 Claims

1. A deep learning based system for learning human activities from video demonstrations using video augmentation, the deep learning based system comprising:

one or more hardware processors; and

a memory coupled to the one or more hardware processors, wherein the memory comprises a plurality of subsystems in the form of programmable instructions executable by the one or more hardware processors, wherein the plurality of subsystems comprises:

a receiver subsystem configured for receiving one or more original videos from one or more data sources, wherein the one or more original videos comprises one or more human activities;

a video augmentation subsystem configured for processing the received one or more original videos using one or more video augmentation techniques to generate a set of augmented videos, wherein the one or more video augmentation techniques comprises: performing one or more image transformation configurations, foreground synthesis, background synthesis, speed variation, motion variation, viewpoint variation, segment editing, and obfuscation rendering;

a training video generator subsystem configured for generating a set of training videos by combining the received one or more original videos with the generated set of augmented videos;

a deep learning model generator subsystem configured for generating a deep learning model for the received one or more original videos based on the generated set of training videos;

a learning subsystem configured for learning the one or more human activities performed in the received one or more original videos by deploying the generated deep learning model; and

an output subsystem configured for outputting the learnt one or more human activities performed in the received one or more original videos on a user interface.
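Claim 1 names the augmentation techniques and says that augmented videos are combined with the originals to form the training set. It does not say how any technique is computed. The following is a minimal, hypothetical Python sketch of that augment-then-combine step. It is not the patentee's implementation, and every function name and parameter in it is an illustrative assumption. It represents a video as a list of frames and each frame as a 2D grid of grayscale values. It covers four of the listed techniques: a horizontal flip as one possible image transformation, speed variation by temporal resampling, segment editing by dropping a run of frames, and obfuscation rendering by occluding a square region. Foreground synthesis, background synthesis, motion variation, and viewpoint variation are omitted because they would require segmentation masks or 3D information that this toy representation does not have.

```python
import random
from typing import List

# Hypothetical representation (assumption, not from the patent):
# a frame is a 2D grid of grayscale pixel values; a video is a list of frames.
Frame = List[List[int]]
Video = List[Frame]


def horizontal_flip(video: Video) -> Video:
    """Image transformation (one possible choice): mirror every frame left to right."""
    return [[row[::-1] for row in frame] for frame in video]


def speed_variation(video: Video, factor: float) -> Video:
    """Resample frames in time; factor > 1 speeds up (drops frames), factor < 1 slows down (repeats frames)."""
    n_out = max(1, round(len(video) / factor))
    return [video[min(len(video) - 1, int(i * factor))] for i in range(n_out)]


def segment_editing(video: Video, start: int, end: int) -> Video:
    """Remove frames in [start, end), e.g. to simulate a skipped step in an activity."""
    return video[:start] + video[end:]


def obfuscation_rendering(video: Video, top: int, left: int, size: int) -> Video:
    """Occlude a square region of every frame with black (0) pixels; the input is not modified."""
    out = []
    for frame in video:
        new_frame = [row[:] for row in frame]
        for r in range(top, min(top + size, len(frame))):
            for c in range(left, min(left + size, len(frame[r]))):
                new_frame[r][c] = 0
        out.append(new_frame)
    return out


def augment(video: Video, rng: random.Random) -> List[Video]:
    """Produce one randomly parameterized variant per hypothetical technique."""
    height, width = len(video[0]), len(video[0][0])
    variants = [
        horizontal_flip(video),
        speed_variation(video, rng.choice([0.5, 2.0])),
        obfuscation_rendering(video, rng.randrange(height), rng.randrange(width),
                              size=max(1, min(height, width) // 2)),
    ]
    if len(video) >= 2:
        # Drop roughly a quarter of the clip, always keeping at least one frame.
        start = rng.randrange(len(video) - 1)
        end = start + max(1, len(video) // 4)
        variants.append(segment_editing(video, start, end))
    return variants


def build_training_set(originals: List[Video], seed: int = 0) -> List[Video]:
    """Training set = original videos followed by their augmented variants."""
    rng = random.Random(seed)
    augmented = [aug for video in originals for aug in augment(video, rng)]
    return list(originals) + augmented


def toy_video(num_frames: int = 8, size: int = 4) -> Video:
    """Synthetic clip for demonstration: one bright pixel moving along the diagonal."""
    frames = []
    for t in range(num_frames):
        frame = [[0] * size for _ in range(size)]
        frame[t % size][t % size] = 255
        frames.append(frame)
    return frames


if __name__ == "__main__":
    originals = [toy_video(), toy_video(num_frames=6)]
    training = build_training_set(originals)
    # 2 originals + 4 variants each = 10 training videos
    print(len(training))
```

The sketch keeps the originals alongside their variants, mirroring the claim's "combining the received one or more original videos with the generated set of augmented videos". In practice the resulting set would be fed to a video model rather than counted.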