US 12,243,314 B2
Method and system for recognizing human action in apron based on thermal infrared vision
Meng Ding, Nanjing (CN); Yuanyuan Ding, Nanjing (CN); Xianghao Kong, Nanjing (CN); Yiming Xu, Nanjing (CN); Yi Wu, Nanjing (CN); and Wei Lu, Nanjing (CN)
Assigned to Nanjing University of Aeronautics and Astronautics, Jiangsu (CN)
Appl. No. 18/007,599
Filed by Nanjing University of Aeronautics and Astronautics, Jiangsu (CN)
PCT Filed Dec. 6, 2021, PCT No. PCT/CN2021/135634
§ 371(c)(1), (2) Date Dec. 1, 2022,
PCT Pub. No. WO2023/087420, PCT Pub. Date May 25, 2023.
Claims priority of application No. 202111362718.1 (CN), filed on Nov. 17, 2021.
Prior Publication US 2024/0265703 A1, Aug. 8, 2024
Int. Cl. G06V 40/20 (2022.01); G06V 10/77 (2022.01); G06V 10/774 (2022.01); G06V 20/40 (2022.01); G06V 20/52 (2022.01); G06V 20/70 (2022.01)
CPC G06V 20/52 (2022.01) [G06V 10/7715 (2022.01); G06V 10/774 (2022.01); G06V 20/41 (2022.01); G06V 20/70 (2022.01); G06V 40/20 (2022.01)] 10 Claims
OG exemplary drawing
 
1. A method for recognizing a human action in an apron based on thermal infrared vision, comprising:
acquiring a plurality of video sequences from an infrared monitoring video, wherein the plurality of video sequences comprise a plurality of types of preset target actions;
labeling a set target in each image frame in each video sequence with a target box to obtain a target tracking result, wherein the target tracking result comprises position information of an image labeled with the target box, in each frame;
intercepting, for each image frame in each video sequence, a target-box enlarged area according to the labeled target box, wherein a side length of the target-box enlarged area is greater than a maximum side length of the corresponding labeled target box;
adding, for each image frame in each video sequence, the position information of the image labeled with the target box to the target-box enlarged area to obtain a three-channel sub-image, wherein the three-channel sub-image includes an abscissa channel image, an ordinate channel image and an image corresponding to the target-box enlarged area, and the three-channel sub-images are arranged in chronological order to form a three-channel sub-image sequence;
training an action recognition model by using a plurality of three-channel sub-image sequences corresponding to the plurality of video sequences as a training set, to obtain a trained action recognition model;
obtaining a to-be-recognized video sequence from another infrared monitoring video, and obtaining a three-channel sub-image sequence corresponding to the to-be-recognized video sequence; and
inputting the three-channel sub-image sequence corresponding to the to-be-recognized video sequence into the trained action recognition model to output a target action type.
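The intercepting and adding steps of claim 1 can be sketched in code. The following is a minimal illustration, not the patented implementation: the enlargement factor (here 1.5), the [0, 1] coordinate normalization, and the assumption that the enlarged square fits inside the frame are all choices the claim leaves open.

```python
import numpy as np

def make_three_channel_subimage(frame, box, scale=1.5):
    """Build a three-channel sub-image from one infrared frame and a target box.

    frame : 2-D uint8 array, one thermal-infrared grayscale frame (H x W).
    box   : (x, y, w, h) labeled target box in pixel coordinates.
    scale : illustrative enlargement factor; ensures the crop side length
            exceeds the maximum side length of the target box, as the
            claim requires.
    """
    H, W = frame.shape
    x, y, w, h = box
    side = int(np.ceil(scale * max(w, h)))        # side length > max(w, h)
    cx, cy = x + w / 2.0, y + h / 2.0             # center of the target box
    # Clip the square crop window so it stays inside the frame.
    x0 = int(max(0, min(cx - side / 2.0, W - side)))
    y0 = int(max(0, min(cy - side / 2.0, H - side)))
    crop = frame[y0:y0 + side, x0:x0 + side].astype(np.float32) / 255.0
    # Position information: each pixel's global frame coordinate, normalized.
    xs = (x0 + np.arange(side, dtype=np.float32)) / W     # abscissa channel
    ys = (y0 + np.arange(side, dtype=np.float32)) / H     # ordinate channel
    abscissa = np.tile(xs, (side, 1))
    ordinate = np.tile(ys[:, None], (1, side))
    # Stack abscissa, ordinate, and the cropped infrared patch.
    return np.stack([abscissa, ordinate, crop], axis=-1)  # (side, side, 3)
```

Applying this function to every frame of a video sequence, in chronological order, yields the three-channel sub-image sequence that the claim feeds to the action recognition model; the coordinate channels let the model see where in the apron scene each crop came from.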