US 11,928,893 B2
Action recognition method and apparatus, computer storage medium, and computer device
Donghao Luo, Shenzhen (CN); Yabiao Wang, Shenzhen (CN); Chenyang Guo, Shenzhen (CN); Boyuan Deng, Shenzhen (CN); Chengjie Wang, Shenzhen (CN); Jilin Li, Shenzhen (CN); Feiyue Huang, Shenzhen (CN); and Yongjian Wu, Shenzhen (CN)
Assigned to TENCENT TECHNOLOGY (SHENZHEN) COMPANY LIMITED, Shenzhen (CN)
Filed by TENCENT TECHNOLOGY (SHENZHEN) COMPANY LIMITED, Shenzhen (CN)
Filed on Nov. 18, 2021, as Appl. No. 17/530,428.
Application 17/530,428 is a continuation of application No. PCT/CN2020/120076, filed on Oct. 10, 2020.
Claims priority of application No. 201911143008.2 (CN), filed on Nov. 20, 2019.
Prior Publication US 2022/0076002 A1, Mar. 10, 2022
Int. Cl. G06V 20/40 (2022.01); G06F 18/213 (2023.01); G06T 7/246 (2017.01); G06V 40/20 (2022.01); G06N 3/02 (2006.01)

CPC G06V 40/20 (2022.01) [G06F 18/213 (2023.01); G06T 7/246 (2017.01); G06N 3/02 (2013.01); G06T 2207/20081 (2013.01)]

19 Claims

1. An action recognition method, performed by a computer device, the method comprising:

obtaining image data of video data in a plurality of different temporal frames;

obtaining original feature submaps of each of the temporal frames on a plurality of different convolutional channels by using a multi-channel convolutional layer;

calculating, by using each of the temporal frames as a target temporal frame, motion information weights of the target temporal frame on the convolutional channels according to the original feature submaps of the target temporal frame on the convolutional channels and the original feature submaps of a next temporal frame adjacent to the target temporal frame on each of the convolutional channels;

obtaining motion information feature maps of the target temporal frame on the convolutional channels according to the motion information weights and the original feature submaps of the target temporal frame on the convolutional channels;

performing temporal convolution on the motion information feature maps to obtain temporal motion feature maps of the target temporal frame on the convolutional channels; and

recognizing an action type of a moving object in image data of the target temporal frame according to the temporal motion feature maps.
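
For illustration only, the data flow recited in claim 1 can be sketched in NumPy. This is a minimal sketch under stated assumptions, not the claimed implementation. The claim does not fix how the motion information weights are computed, so the sketch assumes a sigmoid over the spatially averaged difference between the next frame and the target frame. It also assumes that the last frame is paired with itself, that the weights are applied by channel-wise multiplication, that the temporal convolution is a per-channel kernel of size 3 with zero padding, and that recognition is global average pooling followed by a linear classifier. The function names (`motion_weights`, `motion_feature_maps`, `temporal_conv`, `recognize`) are hypothetical, and a random array stands in for the output of the multi-channel convolutional layer.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def motion_weights(features):
    """Per-frame, per-channel weights from adjacent-frame differences.

    features: array of shape (T, C, H, W), the original feature submaps
    of T temporal frames on C convolutional channels.
    Returns an array of shape (T, C) with values in (0, 1).
    """
    # Assumption: the last frame has no next frame, so it is paired with itself.
    next_frames = np.concatenate([features[1:], features[-1:]], axis=0)
    diff = next_frames - features        # (T, C, H, W)
    pooled = diff.mean(axis=(2, 3))      # spatial average -> (T, C)
    return sigmoid(pooled)               # assumed weighting function


def motion_feature_maps(features, weights):
    """Scale each channel of each frame by its weight (assumed combination)."""
    return features * weights[:, :, None, None]


def temporal_conv(maps, kernel):
    """Per-channel 1D filtering along the time axis with zero padding.

    maps: (T, C, H, W); kernel: (C, 3), one size-3 kernel per channel.
    Applied as cross-correlation, as is conventional in deep learning.
    """
    num_frames = maps.shape[0]
    padded = np.pad(maps, ((1, 1), (0, 0), (0, 0), (0, 0)))
    out = np.zeros(maps.shape, dtype=float)
    for k in range(3):
        out += padded[k:k + num_frames] * kernel[:, k][None, :, None, None]
    return out


def recognize(temporal_maps, class_weights):
    """Assumed classifier head: global average pooling, linear layer, argmax.

    class_weights: (C, num_classes). Returns one class index per frame.
    """
    pooled = temporal_maps.mean(axis=(2, 3))   # (T, C)
    logits = pooled @ class_weights            # (T, num_classes)
    return logits.argmax(axis=1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Stand-in for the multi-channel convolutional layer output:
    # 8 frames, 16 channels, 14x14 feature submaps.
    feats = rng.standard_normal((8, 16, 14, 14))
    w = motion_weights(feats)
    motion = motion_feature_maps(feats, w)
    temporal = temporal_conv(motion, rng.standard_normal((16, 3)))
    labels = recognize(temporal, rng.standard_normal((16, 5)))
    print(labels.shape)
```

In a trained network the temporal kernels and classifier weights would be learned; the random values here only exercise the shapes at each step.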