| CPC G06V 20/40 (2022.01) [G06V 10/774 (2022.01); G06V 10/82 (2022.01); G06V 20/41 (2022.01); G06V 20/44 (2022.01)] | 20 Claims |

|
1. A method for video recognition, comprising:
obtaining an original set of clip descriptors by providing a plurality of clips of a video as an input of a three-dimensional (3D) convolutional neural network (CNN) of a neural network, wherein the neural network comprises the 3D CNN and at least one first fully connected layer, and each of the plurality of clips comprises at least one frame;
determining an attention vector corresponding to the original set of clip descriptors;
obtaining an enhanced set of clip descriptors based on the original set of clip descriptors and the attention vector; and
inputting the enhanced set of clip descriptors into the at least one first fully connected layer and performing video recognition based on an output of the at least one first fully connected layer.
|
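Claim 1 does not specify how the attention vector is computed or how it is combined with the original clip descriptors. The following is a minimal, dependency-free Python sketch of the claimed data flow under stated assumptions. The 3D CNN is replaced by a hypothetical stand-in that mean-pools frames over time and applies a projection. The attention vector is assumed to be squeeze-and-excitation style: the clip descriptors are averaged, then passed through FC, ReLU, FC, and sigmoid. Enhancement is assumed to be channel-wise multiplication. All names (`stand_in_3d_cnn`, `attention_vector`, `recognize`, `W_feat`, `W1`, `W2`, `W_fc`) are hypothetical and do not come from the patent.

```python
import math
import random

def matvec(W, x):
    """Return W @ x for a matrix W given as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def relu(v):
    return [max(0.0, x) for x in v]

def sigmoid(v):
    return [1.0 / (1.0 + math.exp(-x)) for x in v]

def softmax(v):
    m = max(v)  # subtract the max for numerical stability
    e = [math.exp(x - m) for x in v]
    s = sum(e)
    return [x / s for x in e]

def rand_matrix(rng, rows, cols):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def stand_in_3d_cnn(clip, W_feat):
    """HYPOTHETICAL placeholder for the 3D CNN: mean-pool the clip's frames
    over time, then project to a clip descriptor. A real 3D CNN would apply
    spatio-temporal convolutions instead."""
    n = len(clip)
    pooled = [sum(frame[j] for frame in clip) / n for j in range(len(clip[0]))]
    return relu(matvec(W_feat, pooled))

def attention_vector(descriptors, W1, W2):
    """ASSUMED squeeze-and-excitation-style attention: average the clip
    descriptors, apply FC-ReLU-FC-sigmoid, and return one weight in (0, 1)
    per descriptor channel."""
    n = len(descriptors)
    squeezed = [sum(d[j] for d in descriptors) / n for j in range(len(descriptors[0]))]
    return sigmoid(matvec(W2, relu(matvec(W1, squeezed))))

def recognize(clips, params):
    # Original set of clip descriptors from the (stand-in) 3D CNN.
    original = [stand_in_3d_cnn(c, params["W_feat"]) for c in clips]
    # Attention vector corresponding to the original set.
    a = attention_vector(original, params["W1"], params["W2"])
    # Enhanced set: each descriptor rescaled channel-wise by the attention vector.
    enhanced = [[x * w for x, w in zip(d, a)] for d in original]
    # First fully connected layer applied to each enhanced descriptor;
    # class scores are averaged over clips and normalized with softmax.
    scores = [matvec(params["W_fc"], d) for d in enhanced]
    n_classes = len(params["W_fc"])
    avg = [sum(s[k] for s in scores) / len(scores) for k in range(n_classes)]
    return original, a, enhanced, softmax(avg)

# Toy setup: 4 clips x 3 frames x 8 features per frame, 6-dim descriptors,
# a 3-unit attention bottleneck, and 5 classes.
rng = random.Random(0)
F, D, R, K = 8, 6, 3, 5
params = {
    "W_feat": rand_matrix(rng, D, F),
    "W1": rand_matrix(rng, R, D),
    "W2": rand_matrix(rng, D, R),
    "W_fc": rand_matrix(rng, K, D),
}
clips = [[[rng.random() for _ in range(F)] for _ in range(3)] for _ in range(4)]
original, a, enhanced, probs = recognize(clips, params)
predicted_class = max(range(K), key=lambda k: probs[k])
```

With sigmoid weights in (0, 1), this particular choice can only attenuate channels. Other combinations, such as adding the reweighted descriptors back to the originals as a residual, would also fit the claim's wording "based on the original set of clip descriptors and the attention vector".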
|
9. A method for training a neural network, comprising:
obtaining an original set of clip descriptors by providing a plurality of clips of a video as an input of a three-dimensional (3D) convolutional neural network (CNN) of a neural network, wherein the neural network comprises the 3D CNN and at least one first fully connected layer, the 3D CNN comprises at least one convolutional layer and at least one second fully connected layer, and each of the plurality of clips comprises at least one frame;
determining an attention vector corresponding to the original set of clip descriptors;
obtaining an enhanced set of clip descriptors based on the original set of clip descriptors and the attention vector;
inputting the enhanced set of clip descriptors into the at least one first fully connected layer and obtaining an output of the neural network; and
training the neural network by updating parameters of the neural network based on a loss of the neural network, wherein the parameters of the neural network comprise a weight of the at least one first fully connected layer and a weight of the at least one second fully connected layer.
|
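Claim 9 adds a training step that updates, among other parameters, the weights of the first fully connected layer and the weights of the 3D CNN's second fully connected layer based on a loss. The sketch below is minimal and rests on several assumptions. It uses a cross-entropy loss and plain gradient descent. To stay dependency-free, it estimates gradients by central finite differences; a practical implementation would use automatic differentiation. The convolutional layers are replaced by a hypothetical temporal mean-pooling stand-in, `W_cnn_fc` plays the role of the second fully connected layer, and the attention vector takes the same assumed squeeze-and-excitation form. All names (`forward`, `train_step`, `W_cnn_fc`, `W1`, `W2`, `W_fc`) are hypothetical.

```python
import math
import random

def matvec(W, x):
    """Return W @ x for a matrix W given as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def relu(v):
    return [max(0.0, x) for x in v]

def mean_rows(rows):
    """Column-wise mean of a list of equal-length vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]

def forward(clips, p):
    """Forward pass returning class probabilities. HYPOTHETICAL stand-ins:
    temporal mean-pooling replaces the convolutional layers, p["W_cnn_fc"]
    is the 3D CNN's second FC layer (linear here), attention is ASSUMED SE-style."""
    original = [matvec(p["W_cnn_fc"], mean_rows(c)) for c in clips]
    hidden = relu(matvec(p["W1"], mean_rows(original)))
    a = [1.0 / (1.0 + math.exp(-z)) for z in matvec(p["W2"], hidden)]
    enhanced = [[x * w for x, w in zip(d, a)] for d in original]
    # First FC layer on each enhanced descriptor, averaged over clips.
    logits = mean_rows([matvec(p["W_fc"], d) for d in enhanced])
    m = max(logits)
    e = [math.exp(z - m) for z in logits]
    s = sum(e)
    return [v / s for v in e]

def cross_entropy(probs, label):
    return -math.log(probs[label])

def train_step(clips, label, p, lr=0.5, eps=1e-5, trainable=("W_fc", "W_cnn_fc")):
    """One gradient-descent step on the listed weight matrices, using
    central finite-difference gradient estimates of the loss."""
    grads = {}
    for name in trainable:
        W = p[name]
        g = [[0.0] * len(row) for row in W]
        for i, row in enumerate(W):
            for j in range(len(row)):
                saved = row[j]
                row[j] = saved + eps
                up = cross_entropy(forward(clips, p), label)
                row[j] = saved - eps
                down = cross_entropy(forward(clips, p), label)
                row[j] = saved  # restore the original weight
                g[i][j] = (up - down) / (2 * eps)
        grads[name] = g
    # Apply all updates only after every gradient has been estimated.
    for name, g in grads.items():
        W = p[name]
        for i in range(len(W)):
            for j in range(len(W[i])):
                W[i][j] -= lr * g[i][j]
    return cross_entropy(forward(clips, p), label)

# Toy setup: 4 clips x 3 frames x 8 features, 6-dim descriptors, 4 classes.
rng = random.Random(0)
F, D, R, K = 8, 6, 3, 4

def rand_matrix(rows, cols):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

params = {
    "W_cnn_fc": rand_matrix(D, F),
    "W1": rand_matrix(R, D),
    "W2": rand_matrix(D, R),
    "W_fc": rand_matrix(K, D),
}
clips = [[[rng.random() for _ in range(F)] for _ in range(3)] for _ in range(4)]
label = 2
initial_loss = cross_entropy(forward(clips, params), label)
losses = [train_step(clips, label, params) for _ in range(20)]
```

Passing a longer `trainable` tuple, for example one that adds `"W1"` and `"W2"`, would also update the assumed attention weights. Claim 9 requires only that the updated parameters include the two fully connected layers.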