US 11,797,084 B2
Method and apparatus for training gaze tracking model, and method and apparatus for gaze tracking
Zheng Zhou, Shenzhen (CN); Xing Ji, Shenzhen (CN); Yitong Wang, Shenzhen (CN); Xiaolong Zhu, Shenzhen (CN); and Min Luo, Shenzhen (CN)
Assigned to TENCENT TECHNOLOGY (SHENZHEN) COMPANY LIMITED, Shenzhen (CN)
Filed by Tencent Technology (Shenzhen) Company Limited, Shenzhen (CN)
Filed on May 18, 2021, as Appl. No. 17/323,827.
Application 17/323,827 is a continuation of application No. PCT/CN2020/083486, filed on Apr. 7, 2020.
Claims priority of application No. 201910338224.6 (CN), filed on Apr. 24, 2019.
Prior Publication US 2021/0271321 A1, Sep. 2, 2021
Int. Cl. G06F 3/01 (2006.01); G06K 9/62 (2022.01); G06T 5/00 (2006.01); G06V 10/82 (2022.01); G06V 40/00 (2022.01); G06V 40/18 (2022.01); G06F 18/214 (2023.01); G06V 40/19 (2022.01); G06F 18/241 (2023.01)

CPC G06F 3/013 (2013.01) [G06F 18/214 (2023.01); G06T 5/002 (2013.01); G06T 5/009 (2013.01); G06V 10/82 (2022.01); G06V 40/00 (2022.01); G06V 40/18 (2022.01); G06F 18/241 (2023.01); G06V 40/19 (2022.01); G06V 40/193 (2022.01); G06V 40/197 (2022.01)]

20 Claims

1. A method for training a gaze tracking model, comprising:

obtaining a training sample set, the training sample set comprising multiple training sample pairs, each training sample pair comprising an eye sample image and a labeled gaze vector corresponding to the eye sample image;

processing the eye sample images in the training sample set by using an initial gaze tracking model to obtain a predicted gaze vector of each eye sample image;

determining a model loss according to a cosine distance between the predicted gaze vector and the labeled gaze vector for each eye sample image;

iteratively adjusting one or more reference parameters of the initial gaze tracking model until the model loss meets a convergence condition, to obtain a target gaze tracking model;

processing a target eye image by using the target gaze tracking model to determine a predicted gaze vector of the target eye image;

determining, when the target eye image belongs to a video frame in a video stream, a first and a second reference eye images corresponding to the target eye image, the first and the second reference eye images and the target eye image being images in consecutive video frames in the video stream; and

performing smoothing on the predicted gaze vector corresponding to the target eye image according to a predicted gaze vector corresponding to the first and the second reference eye images using a second-order Bezier curve.
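
For illustration only, the cosine-distance loss and the second-order Bezier smoothing recited in claim 1 might be sketched as follows. This is a minimal sketch under stated assumptions, not the patented implementation. The claim does not fix the exact loss formula, the batch reduction, the ordering of control points, or the curve parameter. The sketch assumes a loss of 1 minus cosine similarity averaged over samples, treats the two reference-frame vectors as the first two control points and the target-frame vector as the last, and uses a hypothetical parameter `t`. All function names are hypothetical.

```python
import math


def cosine_distance(pred, label):
    """Return 1 - cos(angle) between two non-zero gaze vectors (assumed form)."""
    dot = sum(p * q for p, q in zip(pred, label))
    norm_pred = math.sqrt(sum(p * p for p in pred))
    norm_label = math.sqrt(sum(q * q for q in label))
    return 1.0 - dot / (norm_pred * norm_label)


def model_loss(preds, labels):
    """Mean cosine distance over all sample pairs (averaging is an assumption)."""
    distances = [cosine_distance(p, q) for p, q in zip(preds, labels)]
    return sum(distances) / len(distances)


def bezier_smooth(ref1, ref2, target, t=0.5):
    """Evaluate a second-order Bezier curve at parameter t.

    B(t) = (1 - t)^2 * P0 + 2 * (1 - t) * t * P1 + t^2 * P2,
    with P0 = ref1, P1 = ref2, P2 = target (assumed ordering).
    """
    w0 = (1.0 - t) ** 2
    w1 = 2.0 * (1.0 - t) * t
    w2 = t ** 2
    return tuple(w0 * a + w1 * b + w2 * c for a, b, c in zip(ref1, ref2, target))
```

In this sketch, `model_loss` would be the quantity minimized while adjusting the model parameters, and `bezier_smooth` would be applied to the predicted gaze vectors of three consecutive video frames. Setting `t=1.0` returns the target frame's vector unchanged, while smaller values pull the result toward the reference frames.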