US 11,967,175 B2
Facial expression recognition method and system combined with attention mechanism
Sannyuya Liu, Hubei (CN); Zongkai Yang, Hubei (CN); Xiaoliang Zhu, Hubei (CN); Zhicheng Dai, Hubei (CN); and Liang Zhao, Hubei (CN)
Assigned to CENTRAL CHINA NORMAL UNIVERSITY, Hubei (CN)
Filed by CENTRAL CHINA NORMAL UNIVERSITY, Hubei (CN)
Filed on May 23, 2023, as Appl. No. 18/322,517.
Application 18/322,517 is a continuation-in-part of application No. PCT/CN2021/128102, filed on Nov. 2, 2021.
Claims priority of application No. 202011325980.4 (CN), filed on Nov. 24, 2020.
Prior Publication US 2023/0298382 A1, Sep. 21, 2023
Int. Cl. G06K 9/00 (2022.01); G06V 10/24 (2022.01); G06V 10/62 (2022.01); G06V 10/77 (2022.01); G06V 10/80 (2022.01); G06V 10/82 (2022.01); G06V 20/40 (2022.01); G06V 40/16 (2022.01); G06V 10/774 (2022.01)
CPC G06V 40/165 (2022.01) [G06V 10/247 (2022.01); G06V 10/62 (2022.01); G06V 10/7715 (2022.01); G06V 10/806 (2022.01); G06V 10/82 (2022.01); G06V 20/41 (2022.01); G06V 40/171 (2022.01); G06V 40/174 (2022.01); G06V 10/774 (2022.01)] 6 Claims
OG exemplary drawing
 
1. A facial expression recognition method combined with an attention mechanism, comprising the following steps:
detecting a face comprised in each of video frames in a video sequence, and extracting a corresponding facial region of interest (ROI), so as to obtain a facial picture in each of the video frames;
correcting the facial picture in each of the video frames on the basis of location information of a facial feature point of the facial picture in each of the video frames, so that the facial picture in each of the video frames is aligned relative to a plane rectangular coordinate system;
inputting the aligned facial picture in each of the video frames of the video sequence into a residual neural network, and extracting a spatial feature of a facial expression corresponding to the facial picture;
calculating a feature weight of the facial expression through an attention mechanism using the spatial feature of the facial expression extracted from the video sequence, wherein a weight higher than a threshold is assigned to an ROI of a facial expression change and a weight lower than the threshold is assigned to a region irrelevant to the facial expression change, so as to correlate feature information of the facial expression between the video frames; a dependency relationship of the facial expression between the adjacent video frames is extracted, and irrelevant interference features are eliminated to acquire a fused feature of the facial expression;
inputting the fused feature of the facial expression acquired from the video sequence into a recurrent neural network, and extracting a temporal feature of the facial expression;
inputting the temporal feature of the facial expression extracted from the video sequence into a fully connected layer, and classifying and recognizing the facial expression in a video based on a facial expression template pre-stored in the fully connected layer.
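The six claimed steps can be sketched end-to-end in code. The patent text gives no equations or network weights, so everything below is an illustrative stand-in: random per-frame vectors replace the detector/alignment/ResNet stages, the attention step is a generic softmax over frame scores with a hypothetical learned scorer, the recurrent network is a minimal Elman-style recurrence, and the fully connected classifier uses random parameters. None of these choices are specified by the claim.

```python
import numpy as np

rng = np.random.default_rng(0)
T, D, C = 8, 16, 7  # video frames, feature dimension, expression classes

# Steps 1-3 (face detection, ROI extraction, landmark alignment, ResNet
# spatial features) are stood in for by random per-frame features (T, D).
spatial = rng.standard_normal((T, D))

# Step 4: attention over frames. A stand-in scorer rates each frame;
# softmax turns scores into weights, emphasizing expression-relevant
# frames and suppressing irrelevant ones; re-weighting yields the
# fused per-frame features.
scores = spatial @ rng.standard_normal(D)      # hypothetical learned scorer
weights = np.exp(scores - scores.max())
weights /= weights.sum()
fused = weights[:, None] * spatial             # (T, D) fused features

# Step 5: a minimal Elman-style recurrence extracts a temporal feature
# by carrying hidden state across the frame sequence.
Wx = rng.standard_normal((D, D)) * 0.1
Wh = rng.standard_normal((D, D)) * 0.1
h = np.zeros(D)
for x in fused:
    h = np.tanh(Wx @ x + Wh @ h)               # temporal feature (D,)

# Step 6: a fully connected layer plus softmax classifies the expression.
Wc = rng.standard_normal((C, D)) * 0.1
logits = Wc @ h
probs = np.exp(logits - logits.max())
probs /= probs.sum()
label = int(np.argmax(probs))
```

In a real system the random stand-ins would be replaced by a trained face detector, a landmark-based affine alignment, a pretrained residual network, and a trained recurrent layer; the control flow above only mirrors the order of the claimed steps.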