US 12,412,426 B2
Action recognition method and apparatus, and device and storage medium
Binquan Li, Guangzhou (CN)
Assigned to BIGO TECHNOLOGY PTE. LTD., Singapore (SG)
Appl. No. 17/999,284
Filed by BIGO TECHNOLOGY PTE. LTD., Singapore (SG)
PCT Filed Apr. 2, 2021, PCT No. PCT/CN2021/085386
§ 371(c)(1), (2) Date Nov. 18, 2022,
PCT Pub. No. WO2021/232969, PCT Pub. Date Nov. 25, 2021.
Claims priority of application No. 202010431706.9 (CN), filed on May 20, 2020.
Prior Publication US 2023/0196837 A1, Jun. 22, 2023
Int. Cl. G06K 9/62 (2022.01); G06N 3/045 (2023.01); G06N 3/08 (2023.01); G06V 10/764 (2022.01); G06V 10/77 (2022.01); G06V 10/80 (2022.01); G06V 10/82 (2022.01); G06V 20/40 (2022.01); G06V 40/20 (2022.01); H04N 19/172 (2014.01); H04N 19/513 (2014.01); H04N 19/52 (2014.01)
CPC G06V 40/20 (2022.01) [G06N 3/045 (2023.01); G06N 3/08 (2013.01); G06V 10/764 (2022.01); G06V 10/7715 (2022.01); G06V 10/806 (2022.01); G06V 10/82 (2022.01); G06V 20/40 (2022.01); G06V 20/41 (2022.01); G06V 20/44 (2022.01); G06V 20/46 (2022.01); G06V 20/49 (2022.01); H04N 19/172 (2014.11); H04N 19/513 (2014.11); H04N 19/52 (2014.11)]
18 Claims
1. An action recognition method, comprising:
acquiring grouped video data by grouping original compressed video data;
inputting the grouped video data into a first predetermined model and determining, based on an output result of the first predetermined model, target grouped video data containing an action;
acquiring to-be-recognized grouped video data by decoding the target grouped video data; and
inputting the to-be-recognized grouped video data into a second predetermined model and determining, based on an output result of the second predetermined model, a type of an action contained in the to-be-recognized grouped video data;
wherein acquiring the to-be-recognized grouped video data by decoding the target grouped video data comprises:
acquiring a to-be-recognized segmented video image by decoding the target grouped video data;
acquiring frequency domain information in the to-be-recognized segmented video image and generating a corresponding frequency domain graph based on the frequency domain information; and
determining the to-be-recognized segmented video image and the corresponding frequency domain graph as the to-be-recognized grouped video data.
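The flow recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation. Every name below (group_stream, frequency_graph, recognize, first_model, decode, second_model) is hypothetical, the fixed-size grouping rule is an assumption, and a log-magnitude 2D discrete Fourier transform stands in for the "frequency domain information", since the claim does not name a particular transform.

```python
import cmath
import math
from typing import Callable, List, Sequence

Frame = List[List[float]]  # grayscale image as rows of pixel values


def group_stream(packets: Sequence[bytes], group_size: int) -> List[List[bytes]]:
    """Split compressed data into groups (fixed-size grouping is an assumption)."""
    return [list(packets[i:i + group_size])
            for i in range(0, len(packets), group_size)]


def frequency_graph(frame: Frame) -> Frame:
    """Log-magnitude 2D DFT of a small frame (naive O(N^4), illustration only)."""
    h, w = len(frame), len(frame[0])
    graph: Frame = []
    for u in range(h):
        row = []
        for v in range(w):
            acc = 0j
            for y in range(h):
                for x in range(w):
                    phase = -2j * math.pi * (u * y / h + v * x / w)
                    acc += frame[y][x] * cmath.exp(phase)
            row.append(math.log1p(abs(acc)))
        graph.append(row)
    return graph


def recognize(
    packets: Sequence[bytes],
    group_size: int,
    first_model: Callable[[List[bytes]], bool],
    decode: Callable[[List[bytes]], List[Frame]],
    second_model: Callable[[List[Frame], List[Frame]], str],
) -> List[str]:
    """Two-stage flow: screen compressed groups, then decode and classify."""
    results = []
    for group in group_stream(packets, group_size):
        # Stage 1: the first model sees only compressed (undecoded) data.
        if not first_model(group):
            continue
        # Only groups flagged as containing an action are decoded.
        frames = decode(group)
        # Each decoded image is paired with its frequency domain graph.
        graphs = [frequency_graph(f) for f in frames]
        # Stage 2: the second model receives images plus graphs.
        results.append(second_model(frames, graphs))
    return results
```

In this sketch the decoder and the second model run only on groups that pass the first stage, so groups without an action are never decoded.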