US 12,087,069 B2
Artificial intelligence-based action recognition method and related apparatus
Wanchao Chi, Shenzhen (CN); Chong Zhang, Shenzhen (CN); Yonggen Ling, Shenzhen (CN); Wei Liu, Shenzhen (CN); Zhengyou Zhang, Shenzhen (CN); Zejian Yuan, Shenzhen (CN); Ziyang Song, Shenzhen (CN); and Ziyi Yin, Shenzhen (CN)
Assigned to TENCENT TECHNOLOGY (SHENZHEN) COMPANY LIMITED, Shenzhen (CN)
Filed by TENCENT TECHNOLOGY (SHENZHEN) COMPANY LIMITED, Shenzhen (CN)
Filed on Nov. 1, 2021, as Appl. No. 17/516,585.
Application 17/516,585 is a continuation of application No. PCT/CN2020/114130, filed on Sep. 9, 2020.
Claims priority of application No. 201911046992.0 (CN), filed on Oct. 30, 2019.
Prior Publication US 2022/0051061 A1, Feb. 17, 2022
Int. Cl. G06K 9/00 (2022.01); G06N 3/045 (2023.01); G06V 30/242 (2022.01); G06V 40/20 (2022.01)
CPC G06V 30/242 (2022.01) [G06N 3/045 (2023.01); G06V 40/20 (2022.01)] 19 Claims
OG exemplary drawing
 
1. An action recognition method, performed by a data processing device, the method comprising:
determining, according to video data comprising an interactive object, node sequence information corresponding to video frames in the video data, the node sequence information of each video frame including position information of nodes in a node sequence, the nodes in the node sequence being nodes of the interactive object that are moved to implement a corresponding interactive action;
determining action categories corresponding to the video frames in the video data, comprising: determining, according to the node sequence information corresponding to N consecutive video frames in the video data, action categories respectively corresponding to the N consecutive video frames; and
determining, according to the action categories corresponding to the video frames in the video data, a target interactive action made by the interactive object in the video data,
wherein determining the node sequence information corresponding to the video frames in the video data comprises:
extracting an image feature of the video frames in the video data; and
determining, according to the image feature, the node sequence information corresponding to the video frames in the video data by using a node recognition model,
wherein the node recognition model is a neural network model that comprises a plurality of layers of prediction submodels, each layer of prediction submodel being configured to determine position information of nodes in the video frames and determine link information between nodes.
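The claim recites no concrete implementation; as a rough illustrative sketch of the three recited steps (per-frame node sequence extraction, per-window action categories for N consecutive frames, and aggregation into a target action), the pipeline might be organized as below. Every function name, the identity stand-in for the node recognition model, the rule-based window classifier, and the majority-vote aggregation are assumptions for illustration only, not the patented method.

```python
from collections import Counter

def extract_node_sequences(frames, recognize):
    """Step 1 (sketch): obtain node sequence information per video frame.
    `recognize` stands in for the claim's neural node recognition model,
    which predicts node positions and links between nodes."""
    return [recognize(frame) for frame in frames]

def classify_windows(node_info, n, classify_window):
    """Step 2 (sketch): each run of N consecutive frames is mapped to
    N per-frame action categories."""
    categories = []
    for start in range(0, len(node_info) - n + 1, n):
        categories.extend(classify_window(node_info[start:start + n]))
    return categories

def target_action(categories):
    """Step 3 (sketch): derive the target interactive action from the
    per-frame categories (majority vote is an assumption)."""
    return Counter(categories).most_common(1)[0][0]

# Toy demo with made-up data: a frame is a dict of node -> (x, y), and the
# "classifier" labels a window "wave" whenever the hand node moves.
frames = [{"hand": (i % 2, 0)} for i in range(6)]   # hand oscillates
recognize = lambda frame: frame                      # identity stand-in

def classify_window(window):
    xs = [info["hand"][0] for info in window]
    label = "wave" if max(xs) != min(xs) else "idle"
    return [label] * len(window)

seqs = extract_node_sequences(frames, recognize)
cats = classify_windows(seqs, n=3, classify_window=classify_window)
print(target_action(cats))  # → wave
```

Chunking the video into non-overlapping windows of N frames is one simple reading of "N consecutive video frames"; a sliding window with overlap would also fit the claim language.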
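The recited node recognition model, with stacked prediction submodels that each emit node positions and inter-node link information, resembles stage-refinement pose estimators. The sketch below is purely illustrative: the random weights, the tanh stand-in for convolutional layers, and the vector shapes are all assumptions, showing only the claimed structure in which each stage consumes the shared image feature plus the previous stage's node and link estimates.

```python
import math
import random

random.seed(0)

def prediction_stage(image_feature, prev_nodes, prev_links, weights):
    """One layer of the (hypothetical) stacked prediction submodels: it takes
    the image feature concatenated with the previous stage's node-position
    and node-link estimates and emits refined estimates of both."""
    x = image_feature + prev_nodes + prev_links          # concatenation
    out = [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in weights]
    k = len(prev_nodes)
    return out[:k], out[k:k + len(prev_links)]

def node_recognition_model(image_feature, num_nodes, num_links, num_stages=3):
    """Sketch of the multi-layer node recognition model: each stage refines
    the node positions and the link information between nodes."""
    nodes = [0.0] * num_nodes
    links = [0.0] * num_links
    dim = len(image_feature) + num_nodes + num_links
    for _ in range(num_stages):
        weights = [[random.gauss(0, 0.1) for _ in range(dim)]
                   for _ in range(num_nodes + num_links)]
        nodes, links = prediction_stage(image_feature, nodes, links, weights)
    return nodes, links

feature = [random.gauss(0, 1) for _ in range(8)]
nodes, links = node_recognition_model(feature, num_nodes=5, num_links=4)
print(len(nodes), len(links))  # → 5 4
```

Feeding each stage's output back in alongside the image feature is what lets later layers correct earlier, coarser predictions; the claim only fixes that each layer outputs both node positions and link information.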