US 11,967,180 B1
Dynamic facial expression recognition (FER) method based on Dempster-Shafer (DS) theory
Minglei Shu, Jinan (CN); Zhenyu Liu, Jinan (CN); Zhaoyang Liu, Jinan (CN); Shuwang Zhou, Jinan (CN); and Pengyao Xu, Jinan (CN)
Assigned to QILU UNIVERSITY OF TECHNOLOGY (SHANDONG ACADEMY OF SCIENCES), Jinan (CN); and SHANDONG COMPUTER SCIENCE CENTER (NATIONAL SUPERCOMPUTING CENTER IN JINAN), Jinan (CN)
Filed by Qilu University of Technology (Shandong Academy of Sciences), Jinan (CN); and SHANDONG COMPUTER SCIENCE CENTER (NATIONAL SUPERCOMPUTING CENTER IN JINAN), Jinan (CN)
Filed on Oct. 18, 2023, as Appl. No. 18/381,195.
Claims priority of application No. 202211576932.1 (CN), filed on Dec. 8, 2022.
Int. Cl. G06V 40/16 (2022.01); G06T 5/20 (2006.01); G06V 10/44 (2022.01); G06V 10/764 (2022.01); G06V 10/80 (2022.01)

CPC G06V 40/176 (2022.01) [G06T 5/20 (2013.01); G06V 10/44 (2022.01); G06V 10/764 (2022.01); G06V 10/814 (2022.01); G06V 40/161 (2022.01)]

8 Claims

1. A dynamic facial expression recognition (FER) method based on a Dempster-Shafer (DS) theory, comprising the following steps:

a) preprocessing video data V in a dataset, extracting last N frames of the video data V to obtain consecutive video frames, and performing face detection, alignment, and clipping operations on the video frames to obtain a facial expression image P;

b) constructing a Dempster-Shafer theory Expression Recognition (DSER) network model, wherein the DSER network model comprises a same-identity inter-frame sharing module M_s, a space-domain attention module M_att, a time-domain fully connected (FC) unit V_FC, a time-domain multi-layer perceptron unit V_MLP, a spatio-temporal feature fusion module M_st, and a discriminator D_ds guided by a DS theory;

c) separately inputting the facial expression image P into the same-identity inter-frame sharing module M_s and the space-domain attention module M_att in the DSER network model, to obtain a same-identity inter-frame shared feature F_s^P and a space-domain attention feature F_att^P, and multiplying the same-identity inter-frame shared feature F_s^P by the space-domain attention feature F_att^P to obtain a space-domain feature F_satt^PS;

d) sequentially inputting the facial expression image P into the time-domain FC unit V_FC and the time-domain multi-layer perceptron unit V_MLP in the DSER network model to obtain a time-domain vector V_FCMLP^PT;

e) inputting the space-domain feature F_satt^PS and the time-domain vector V_FCMLP^PT into the spatio-temporal feature fusion module M_st in the DSER network model to obtain a spatio-temporal feature F_st^P;

f) inputting the spatio-temporal feature F_st^P into the discriminator D_ds guided by the DS theory in the DSER network model, to obtain a classification result R, and completing the construction of the DSER network model;

g) calculating a loss function l;

h) iterating the DSER network model by using the loss function l and an Adam optimizer, to obtain a trained DSER network model; and

i) processing to-be-detected video data by using the step a), to obtain a facial expression image, and inputting the facial expression image into the trained DSER network model to obtain the classification result R.
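
The following Python sketch is illustrative only and is not part of the claims. It shows three small operations that steps a), c), and f) refer to: keeping the last N frames of a video, multiplying a shared feature by an attention feature element by element, and Dempster's rule of combination from DS theory. All function names are hypothetical. Treating the multiplication in step c) as an element-wise product over plain nested lists is an assumption, and claim 1 does not state how the discriminator D_ds applies DS theory, so the third function shows only the standard combination rule as background, not the claimed discriminator.

```python
from itertools import product


def last_n_frames(frames, n):
    """Step a) sketch: keep the last n frames (all frames if fewer than n exist)."""
    if n <= 0:
        raise ValueError("n must be positive")
    return frames[-n:]


def multiply_features(shared, attention):
    """Step c) sketch: element-wise product of two equally shaped 2-D feature maps.

    Assumption: the multiplication in step c) is element-wise.
    """
    same_shape = len(shared) == len(attention) and all(
        len(row_s) == len(row_a) for row_s, row_a in zip(shared, attention)
    )
    if not same_shape:
        raise ValueError("feature maps must have the same shape")
    return [
        [s * a for s, a in zip(row_s, row_a)]
        for row_s, row_a in zip(shared, attention)
    ]


def dempster_combine(m1, m2):
    """Background sketch: Dempster's rule of combination for two mass functions.

    Each mass function maps a frozenset of class labels to a mass; masses sum to 1.
    Mass assigned to an empty intersection is conflict and is normalized away.
    """
    combined = {}
    conflict = 0.0
    for (set_a, mass_a), (set_b, mass_b) in product(m1.items(), m2.items()):
        intersection = set_a & set_b
        if intersection:
            combined[intersection] = combined.get(intersection, 0.0) + mass_a * mass_b
        else:
            conflict += mass_a * mass_b
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict; the rule is undefined")
    return {labels: mass / (1.0 - conflict) for labels, mass in combined.items()}


# Hypothetical example data: a 10-frame video and two evidence sources
# over the two-class frame {"happy", "sad"}.
frames = list(range(10))
clip = last_n_frames(frames, 4)

space_feature = multiply_features([[1.0, 2.0], [3.0, 4.0]], [[0.5, 0.5], [1.0, 0.0]])

HAPPY, SAD = frozenset({"happy"}), frozenset({"sad"})
EITHER = HAPPY | SAD  # mass on the whole frame expresses uncertainty
fused = dempster_combine({HAPPY: 0.6, EITHER: 0.4}, {SAD: 0.5, EITHER: 0.5})
```

In the example, the two sources disagree on 0.3 of their joint mass (one says "happy", the other "sad"); the rule discards that conflict and rescales the remaining mass so that it again sums to 1.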