US 12,213,767 B2
	Video-based method and system for accurately estimating human body heart rate and facial blood volume distribution
Hujun Bao, Hangzhou (CN); Xiaogang Xu, Hangzhou (CN); and Xiaolong Wang, Hangzhou (CN)
Assigned to ZHEJIANG UNIVERSITY, Hangzhou (CN)
Filed by ZHEJIANG UNIVERSITY, Zhejiang (CN)
Filed on Mar. 17, 2022, as Appl. No. 17/696,909.
Application 17/696,909 is a continuation of application No. PCT/CN2021/080905, filed on Mar. 16, 2021.
Claims priority of application No. 202010448368.X (CN), filed on May 25, 2020.
Prior Publication US 2022/0218218 A1, Jul. 14, 2022
Int. Cl. A61B 5/024 (2006.01); A61B 5/00 (2006.01); A61B 5/0295 (2006.01); G06T 7/00 (2017.01); G06T 7/11 (2017.01); G06T 7/73 (2017.01); G06V 10/25 (2022.01); G06V 10/62 (2022.01); G06V 10/774 (2022.01); G06V 10/80 (2022.01); G06V 10/82 (2022.01); G06V 40/16 (2022.01); G16H 50/20 (2018.01)

CPC A61B 5/02416 (2013.01) [A61B 5/0295 (2013.01); A61B 5/7232 (2013.01); A61B 5/725 (2013.01); A61B 5/7264 (2013.01); G06T 7/0012 (2013.01); G06T 7/11 (2017.01); G06T 7/73 (2017.01); G06V 10/25 (2022.01); G06V 10/62 (2022.01); G06V 10/774 (2022.01); G06V 10/806 (2022.01); G06V 10/82 (2022.01); G06V 40/161 (2022.01); G06V 40/171 (2022.01); G16H 50/20 (2018.01); G06T 2207/10016 (2013.01); G06T 2207/20081 (2013.01); G06T 2207/20084 (2013.01); G06T 2207/30201 (2013.01); G06V 2201/03 (2022.01)]

5 Claims

1. A video-based method for accurately estimating a human heart rate and facial blood volume distribution, comprising the following steps:

(1) detecting a human face region in a video frame, extracting a human face image sequence and face key position points in time dimension, extracting a global face signal and a set of face roi signals based on the face image sequence, and preprocessing the signals;

wherein the step (1) specifically comprises:

(1.1) using a convolutional neural network model to detect the human face region and the face key position points in the video frame, and respectively generating a human face image sequence and a face key position point sequence in time dimension;

(1.2) extracting the global face signal and the set of the face roi signals, respectively, based on the face image sequence, the global face signal being extracted as shown by Formula 3, where: face_sig is a compressed signal, PCompress( ) is a compression function which is used to calculate an average pixel intensity of a face image of the face image sequence, and face_seq is the face image sequence;

face_sig=PCompress(face_seq) (3)

segmenting the face image into roi blocks of R×R size to obtain roi block image sequences in time dimension, as shown in Formula 4, where: face_roi_i represents the i-th roi block image sequence, face_roi_seq is the set of roi block image sequences, and m×n is the total number of the roi blocks;

face_roi_seq={face_roi₁,face_roi₂, . . . ,face_roi_i, . . . ,face_roi_m×n} (4)

compressing each roi block image sequence, as shown in Formula 5, where: face_roi_seq is the set of roi block image sequences, PCompress( ) is the compression function for calculating the mean pixel intensity of each image of the sequence, and face_roi_sig is the result of PCompress( );

face_roi_sig=PCompress(face_roi_seq) (5)

where:

face_roi_sig={face_roi_sig₁, . . . ,face_roi_sig_i, . . . ,face_roi_sig_m×n} (6)

in Formula 6, face_roi_sig_i is the signal compressed from the i-th roi block image sequence, and m×n is the total number of the roi blocks;
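
As an illustration of Formulas 3 to 6, the following minimal sketch computes a global signal and per-block roi signals from a tiny synthetic image sequence. It is not the patented implementation: the names p_compress and split_roi_blocks are hypothetical, frames are assumed to be single-channel intensity grids of equal size, and partial blocks at the image border are assumed to be discarded.

```python
from typing import List

Frame = List[List[float]]  # one single-channel face image (assumption)


def p_compress(seq: List[Frame]) -> List[float]:
    """Mean pixel intensity of each image in a sequence (the role of PCompress)."""
    signal = []
    for frame in seq:
        pixels = [p for row in frame for p in row]
        signal.append(sum(pixels) / len(pixels))
    return signal


def split_roi_blocks(face_seq: List[Frame], r: int) -> List[List[Frame]]:
    """Cut every frame into R x R blocks and return one image sequence per block.

    Assumes all frames share one size; partial edge blocks are dropped.
    """
    height, width = len(face_seq[0]), len(face_seq[0][0])
    m, n = height // r, width // r  # block rows and block columns
    roi_seq = []
    for bi in range(m):
        for bj in range(n):
            block_frames = [
                [row[bj * r:(bj + 1) * r] for row in frame[bi * r:(bi + 1) * r]]
                for frame in face_seq
            ]
            roi_seq.append(block_frames)
    return roi_seq


# Two synthetic 4x4 frames with R = 2, so m x n = 4 blocks.
face_seq = [
    [[1, 1, 2, 2], [1, 1, 2, 2], [3, 3, 4, 4], [3, 3, 4, 4]],
    [[5, 5, 6, 6], [5, 5, 6, 6], [7, 7, 8, 8], [7, 7, 8, 8]],
]
face_sig = p_compress(face_seq)                       # Formula 3
face_roi_seq = split_roi_blocks(face_seq, r=2)        # Formula 4
face_roi_sig = [p_compress(s) for s in face_roi_seq]  # Formulas 5 and 6
```

Each entry of face_roi_sig is a time series with one sample per frame, so the set holds m×n signals of equal length, matching the structure of Formula 6.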

(1.3) preprocessing the global face signal and the set of the face roi signals to eliminate components outside a specified frequency range;
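
The claim does not name a particular filter for step (1.3). One way to eliminate components outside a frequency range is to zero the out-of-band bins of a discrete Fourier transform, as sketched below. The function name bandpass_dft, the 30 frames-per-second sampling rate, and the 0.7 to 4.0 Hz pass band are assumptions chosen for illustration only.

```python
import cmath
import math
from typing import List


def bandpass_dft(signal: List[float], fs: float,
                 low_hz: float, high_hz: float) -> List[float]:
    """Zero every DFT bin whose frequency lies outside [low_hz, high_hz]."""
    n = len(signal)
    # Forward DFT (naive O(n^2); adequate for short illustrative signals).
    spectrum = [
        sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]
    for k in range(n):
        # Bins above n/2 mirror negative frequencies, so use min(k, n - k).
        freq = min(k, n - k) * fs / n
        if not (low_hz <= freq <= high_hz):
            spectrum[k] = 0
    # Inverse DFT; the imaginary part is rounding noise for a real input.
    return [
        (sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
             for k in range(n)) / n).real
        for t in range(n)
    ]


# Synthetic signal: constant offset + 1.2 Hz pulse-like tone + 6 Hz disturbance.
fs = 30.0  # assumed frame rate
n = 150    # 5 seconds of samples
raw = [
    10.0
    + math.sin(2 * math.pi * 1.2 * t / fs)
    + 0.5 * math.sin(2 * math.pi * 6.0 * t / fs)
    for t in range(n)
]
filtered = bandpass_dft(raw, fs, low_hz=0.7, high_hz=4.0)
expected = [math.sin(2 * math.pi * 1.2 * t / fs) for t in range(n)]
max_error = max(abs(a - b) for a, b in zip(filtered, expected))
```

In this synthetic case the offset and the 6 Hz tone fall outside the band and are removed, leaving only the 1.2 Hz component. The same function would be applied to the global face signal and to every roi signal in the set.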

(2) estimating a heart rate value and the facial blood volume distribution based on a reference signal and the set of roi signals;

(3) estimating a heart rate value based on a heart rate distribution probability by using a heart rate estimation model based on a Long Short-Term Memory (LSTM) network and a residual convolutional neural network model;

(4) fusing the heart rate values of the step (2) and the step (3) based on Kalman filtering.
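
The fusion in step (4) can be illustrated with a scalar Kalman filter that treats the two heart rate estimates as two noisy measurements of one slowly varying state. This is a sketch under stated assumptions, not the claimed implementation: the function name kalman_fuse, the random-walk state model, the measurement variances, and the process variance are all hypothetical choices.

```python
from typing import List


def kalman_fuse(hr_a: List[float], hr_b: List[float],
                var_a: float, var_b: float,
                process_var: float = 1.0) -> List[float]:
    """Fuse two heart rate series with a scalar random-walk Kalman filter."""
    x = (hr_a[0] + hr_b[0]) / 2.0  # initial state estimate (bpm)
    p = max(var_a, var_b)          # initial state variance
    fused = []
    for z_a, z_b in zip(hr_a, hr_b):
        # Predict: the heart rate is modelled as a random walk.
        p += process_var
        # Update sequentially with each measurement and its own variance.
        for z, r in ((z_a, var_a), (z_b, var_b)):
            gain = p / (p + r)
            x += gain * (z - x)
            p *= 1.0 - gain
        fused.append(x)
    return fused


# Hypothetical estimates: source A (variance 1) and source B (variance 4).
hr_step2 = [70.0] * 20
hr_step3 = [74.0] * 20
fused = kalman_fuse(hr_step2, hr_step3, var_a=1.0, var_b=4.0)
```

With constant inputs the fused value settles at the inverse-variance weighted mean of the two sources (70.8 bpm here), so the estimate with the smaller assumed variance dominates the result.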