CPC H04N 19/20 (2014.11) [G06N 3/045 (2023.01); G06N 3/082 (2013.01)] | 20 Claims |
1. A computer-implemented method for processing a video, the method comprising:
receiving a plurality of video frames of the video;
generating a plurality of first input features based on the plurality of video frames;
generating a plurality of second input features based on reversing a temporal order of the plurality of first input features;
generating a first set of joint attention features based on the plurality of first input features;
generating a second set of joint attention features based on the plurality of second input features; and
concatenating the first set of joint attention features and the second set of joint attention features to generate a final set of joint attention features.
|
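The steps recited in claim 1 can be sketched as follows. This is only an illustrative reading, not the patented implementation: the claim does not specify how input features are extracted or how joint attention features are computed, so `extract_features` (per-frame spatial averaging) and `self_attention` (plain scaled dot-product self-attention) are hypothetical stand-ins, as are the function name `bidirectional_attention_features` and the array layout `(T, H, W, C)`.

```python
import numpy as np

def extract_features(frames):
    # Hypothetical feature extractor (assumption): average each frame over
    # its spatial dimensions, giving one C-dimensional vector per frame.
    # frames: (T, H, W, C) -> features: (T, C)
    return frames.mean(axis=(1, 2))

def self_attention(x):
    # Stand-in for the claim's "joint attention" (assumption): unparameterized
    # scaled dot-product self-attention over the temporal axis.
    d = x.shape[-1]
    scores = x @ x.T / np.sqrt(d)
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ x

def bidirectional_attention_features(frames):
    # First input features, in the original temporal order.
    first = extract_features(frames)
    # Second input features: the first features in reversed temporal order.
    second = first[::-1]
    # One set of attention features per temporal direction.
    first_attn = self_attention(first)
    second_attn = self_attention(second)
    # Concatenate along the feature axis to form the final set.
    return np.concatenate([first_attn, second_attn], axis=-1)

# Usage: 4 random frames of size 8x8 with 3 channels.
frames = np.random.default_rng(0).random((4, 8, 8, 3))
final = bidirectional_attention_features(frames)
print(final.shape)
```

Because the stand-in attention has no positional encoding or masking, its output for the reversed sequence is simply the reversed output for the original sequence. A model in which the two directions carry different information would need some order-dependent component, which the claim leaves unspecified.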