US 12,266,157 B2
Temporal augmentation for training video reasoning system
Farley Lai, Santa Clara, CA (US); and Asim Kadav, Mountain View, CA (US)
Assigned to NEC Corporation, Tokyo (JP)
Filed by NEC Laboratories America, Inc., Princeton, NJ (US)
Filed on Apr. 4, 2022, as Appl. No. 17/712,617.
Claims priority of provisional application 63/171,215, filed on Apr. 6, 2021.
Prior Publication US 2022/0319157 A1, Oct. 6, 2022
Int. Cl. G06V 10/774 (2022.01); G06V 10/62 (2022.01); G06V 10/764 (2022.01)

CPC G06V 10/7747 (2022.01) [G06V 10/62 (2022.01); G06V 10/764 (2022.01)]

20 Claims

1. A method for augmenting video sequences in a video reasoning system, the method comprising:

randomly subsampling a sequence of video frames captured from one or more video cameras;

randomly reversing the subsampled sequence of video frames to define a plurality of sub-sequences of randomly reversed video frames;

training, in a training mode, a video reasoning model with temporally augmented input, including the plurality of sub-sequences of randomly reversed video frames, to make predictions over temporally augmented target classes;

updating parameters of the video reasoning model by a machine learning algorithm by discarding a final prediction to retain the temporally augmented target classes with corresponding original classes; and

deploying, in an inference mode, the video reasoning model, with overfitting suppression through a learned temporal order of video frames to determine the original classes, in the video reasoning system to make the final prediction related to classifying a human action in the sequence of video frames.
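
The subsampling, reversal, and class-mapping steps recited in claim 1 might be sketched as follows. This is a minimal illustration, not the patented implementation. The claim does not say how the temporally augmented target classes are encoded, so the sketch assumes a doubled label space in which a reversed clip of class c is assigned class c + num_classes. The function names, parameters, and the 0.5 reversal probability are likewise hypothetical.

```python
import random


def temporally_augment(frames, label, num_classes, num_samples, rng=random):
    """Randomly subsample a frame sequence, then randomly reverse it.

    Assumption (not stated in the claim): a reversed clip is given the
    augmented target class `label + num_classes`, so the model has to
    learn temporal order in order to separate the two classes.
    """
    if num_samples > len(frames):
        raise ValueError("num_samples exceeds the number of frames")

    # Random subsampling: pick distinct frame indices, keep temporal order.
    indices = sorted(rng.sample(range(len(frames)), num_samples))
    clip = [frames[i] for i in indices]

    # Random reversal: the 0.5 probability is an assumed value.
    if rng.random() < 0.5:
        clip.reverse()
        return clip, label + num_classes
    return clip, label


def to_original_class(augmented_class, num_classes):
    """Map an augmented target class back to its original class."""
    return augmented_class % num_classes


if __name__ == "__main__":
    rng = random.Random(0)
    frames = list(range(30))  # stand-in for decoded video frames
    clip, target = temporally_augment(
        frames, label=3, num_classes=10, num_samples=8, rng=rng
    )
    print(clip, target, to_original_class(target, 10))
```

Under these assumptions, a model trained on such pairs would predict over 2 * num_classes outputs, and a helper like `to_original_class` would recover the original action class at inference time.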