US 11,941,171 B1
Eye gaze tracking method, apparatus and system
Menglei Zhang, Beijing (CN); Jiankang Sun, Beijing (CN); Guixin Yan, Beijing (CN); Yaoyu Lv, Beijing (CN); Yachong Xue, Beijing (CN); and Xinkai Li, Beijing (CN)
Assigned to BOE Technology Group Co., Ltd., Beijing (CN)
Appl. No. 17/765,612
Filed by BOE Technology Group Co., Ltd., Beijing (CN)
PCT Filed May 28, 2021, PCT No. PCT/CN2021/096793
§ 371(c)(1), (2) Date Mar. 31, 2022,
PCT Pub. No. WO2022/246804, PCT Pub. Date Dec. 1, 2022.
Int. Cl. G06F 3/01 (2006.01); G06T 7/215 (2017.01); G06T 7/292 (2017.01); G06T 7/73 (2017.01); G06T 11/00 (2006.01); G06V 10/22 (2022.01); G06V 10/764 (2022.01); G06V 10/82 (2022.01); G06V 40/10 (2022.01)
CPC G06F 3/013 (2013.01) [G06T 7/215 (2017.01); G06T 7/292 (2017.01); G06T 7/74 (2017.01); G06T 11/00 (2013.01); G06V 10/235 (2022.01); G06V 10/764 (2022.01); G06V 10/82 (2022.01); G06V 40/10 (2022.01); G06T 2207/20084 (2013.01); G06T 2207/30201 (2013.01)]
20 Claims
 
1. An eye gaze tracking apparatus, wherein the eye gaze tracking apparatus comprises:
a memory and an executor, wherein the executor is configured to perform the following steps:
capturing, by at least two cameras, a plurality of frames of facial images when a viewer views a display screen;
segmenting a current frame of facial image with a pre-trained eye detection model to obtain an image for left and right eyes, wherein the eye detection model is based on a convolutional neural network, an input of the eye detection model is the facial image, and an output of the eye detection model is the image for left and right eyes that is segmented from the facial image; and
calculating a similarity between the current frame of facial image and each frame of facial image in the previous N frames of facial images; if a similarity between the current frame of facial image and a facial image n in the previous N frames of facial images is greater than a preset threshold, determining a prediction result of a position at which the eyes gaze for the facial image n to be a prediction result of a position at which the eyes gaze for the current frame of facial image; and if the similarity between the current frame of facial image and each frame of facial image in the previous N frames of facial images is not greater than the preset threshold, detecting a position on the display screen at which the eyes of the viewer gaze with a pre-trained eye gaze recognition model, wherein the eye gaze recognition model is based on a convolutional neural network, an input of the eye gaze recognition model is the image for left and right eyes that is segmented from the facial image and prediction results of positions at which the eyes gaze for the previous N frames of facial images, and an output of the eye gaze recognition model is the prediction result of the position at which the eyes gaze for the current frame of facial image, wherein N is a positive integer, and n is an integer greater than 1 and not greater than N.
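The segmentation step of the claim turns one facial image into a single "image for left and right eyes". The claim does not say how the CNN output is shaped, so the sketch below assumes, purely for illustration, that the eye detection model yields one bounding box per eye and that the two crops are placed side by side. The names `crop`, `segment_eyes`, `Image` and `Box` are hypothetical and do not come from the patent; the CNN itself is not shown.

```python
from typing import List, Tuple

Image = List[List[int]]          # grayscale image as rows of pixel values
Box = Tuple[int, int, int, int]  # (top, left, height, width); hypothetical detector output


def crop(image: Image, box: Box) -> Image:
    """Return the rectangular sub-image described by box.

    Python slicing silently truncates a box that extends past the image edge.
    """
    top, left, height, width = box
    return [row[left:left + width] for row in image[top:top + height]]


def segment_eyes(face: Image, left_box: Box, right_box: Box) -> Image:
    """Crop both eye regions and join them side by side into one image.

    The two boxes stand in for the output of the CNN eye detection model
    (an assumption); the crops must have the same height so that their
    rows can be concatenated.
    """
    left_eye = crop(face, left_box)
    right_eye = crop(face, right_box)
    if len(left_eye) != len(right_eye):
        raise ValueError("eye crops must have the same height")
    return [l_row + r_row for l_row, r_row in zip(left_eye, right_eye)]
```

For a 4 x 6 face image whose pixel at row r, column c is `10 * r + c`, the boxes `(1, 0, 2, 2)` and `(1, 4, 2, 2)` yield `[[10, 11, 14, 15], [20, 21, 24, 25]]`. A real system would more likely resize each crop to a fixed input size before feeding the gaze model.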
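The final step of the claim is a cache-like gate: the expensive gaze recognition model runs only when no frame in the last N frames is similar enough to the current one. Below is a minimal sketch of that control flow under several assumptions that the claim leaves open. Cosine similarity on flattened pixel vectors is used as the metric. When more than one past frame exceeds the threshold, the most similar one is chosen. Every frame, reused or not, is appended to the history. `GazeTracker`, `cosine_similarity` and the `model(eyes, previous_predictions)` callable are hypothetical names standing in for the pre-trained CNN.

```python
import math
from collections import deque
from typing import Any, Callable, Deque, List, Optional, Tuple

Frame = List[float]         # flattened facial image (assumption)
Gaze = Tuple[float, float]  # (x, y) gaze position on the display screen


def cosine_similarity(a: Frame, b: Frame) -> float:
    """Cosine similarity of two equal-length vectors; an assumed metric."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class GazeTracker:
    def __init__(self, model: Callable[[Any, List[Gaze]], Gaze], n_history: int,
                 threshold: float,
                 similarity: Callable[[Frame, Frame], float] = cosine_similarity):
        self.model = model            # stand-in for the gaze recognition CNN
        self.threshold = threshold    # the "preset threshold"
        self.similarity = similarity
        # Holds (frame, prediction) pairs for the previous N frames only.
        self.history: Deque[Tuple[Frame, Gaze]] = deque(maxlen=n_history)
        self.model_calls = 0

    def predict(self, frame: Frame, eyes: Any) -> Gaze:
        # Compare the current frame with each of the previous N frames and
        # remember the most similar one whose similarity exceeds the threshold.
        best: Optional[Tuple[float, Gaze]] = None
        for past_frame, past_gaze in self.history:
            s = self.similarity(frame, past_frame)
            if s > self.threshold and (best is None or s > best[0]):
                best = (s, past_gaze)
        if best is not None:
            gaze = best[1]  # reuse the earlier prediction
        else:
            # No similar frame: run the model on the eye image plus the
            # predictions for the previous N frames.
            previous = [g for _, g in self.history]
            gaze = self.model(eyes, previous)
            self.model_calls += 1
        self.history.append((frame, gaze))
        return gaze
```

With a threshold of 0.99, a frame `[1.0, 0.001]` following `[1.0, 0.0]` reuses the earlier prediction (cosine similarity of roughly 0.9999995), whereas `[0.0, 1.0]` falls below the threshold for both and triggers a second model call. The bounded `deque` keeps the history at N entries without manual trimming.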