US 12,487,664 B2
Eye tracking and gaze estimation using off-axis camera
Zhengyang Wu, Bellevue, WA (US); Srivignesh Rajendran, San Francisco, CA (US); Tarrence van As, New York, NY (US); Joelle Zimmermann, Los Angeles, CA (US); Vijay Badrinarayanan, Mountain View, CA (US); and Andrew Rabinovich, San Francisco, CA (US)
Assigned to Magic Leap, Inc., Plantation, FL (US)
Filed by Magic Leap, Inc., Plantation, FL (US)
Filed on Feb. 17, 2022, as Appl. No. 17/674,724.
Application 17/674,724 is a continuation of application No. PCT/US2020/047046, filed on Aug. 19, 2020.
Claims priority of provisional application 62/935,584, filed on Nov. 14, 2019.
Claims priority of provisional application 62/926,241, filed on Oct. 25, 2019.
Claims priority of provisional application 62/888,953, filed on Aug. 19, 2019.
Prior Publication US 2022/0244781 A1, Aug. 4, 2022
Int. Cl. G06F 3/01 (2006.01); G02B 27/00 (2006.01); G02B 27/01 (2006.01); G06V 10/26 (2022.01); G06V 10/774 (2022.01); G06V 10/82 (2022.01); G06V 40/18 (2022.01)

CPC G06F 3/013 (2013.01) [G02B 27/0093 (2013.01); G06V 10/267 (2022.01); G06V 10/774 (2022.01); G06V 10/82 (2022.01); G06V 40/193 (2022.01); G06V 40/197 (2022.01); G02B 2027/0138 (2013.01); G02B 27/0172 (2013.01)]

14 Claims

1. A method of training a neural network, the method comprising:

performing a first training step including:

providing a first image of a first eye to the neural network as input, the neural network having a set of feature encoding layers connected to a plurality of sets of task-specific layers, the plurality of sets of task-specific layers including at least three sets of task-specific layers that operate on an output generated by the set of feature encoding layers, the plurality of sets of task-specific layers including:

a first set of task-specific layers that output two-dimensional (2D) pupil data,

a second set of task-specific layers that output eye segmentation data that includes a segmentation of an eye into a plurality of regions including one or more of a background region, a sclera region, a pupil region, or an iris region, and

a third set of task-specific layers that output cornea center data;

generating, using the set of feature encoding layers and the second set of task-specific layers of the neural network and based on the first image of the first eye as input, eye segmentation data for the first eye that includes a segmentation of the first eye into the plurality of regions; and

training the set of feature encoding layers using the eye segmentation data for the first eye by modifying weights associated with the set of feature encoding layers; and

performing a second training step including:

providing a second image of a second eye to the neural network as input;

generating, using the set of feature encoding layers, the first set of task-specific layers, and the third set of task-specific layers of the neural network and based on the second image of the second eye as input, network output data including 2D pupil data corresponding to the second eye and cornea center data corresponding to the second eye; and

training the plurality of sets of task-specific layers using the network output data by modifying weights associated with the plurality of sets of task-specific layers;

wherein the neural network is trained such that the set of feature encoding layers are trained during the first training step but are held fixed during the second training step.
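
The two training steps recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation: it assumes a toy network in which the feature encoding layers and each set of task-specific layers are single linear maps, the loss is a squared error, and the eye images are short hypothetical feature vectors. The segmentation head emits four per-image region scores (background, sclera, pupil, iris) rather than a per-pixel map, purely to keep the example short. Updating the segmentation head alongside the encoder in the first step is also an assumption; the claim only requires that the encoder weights be modified there. All names (MultiTaskNet, train_step, update_encoder) are hypothetical.

```python
import random

def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def init(rows, cols, rng):
    """Small random weight matrix."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

class MultiTaskNet:
    """Shared encoder feeding three task-specific heads (toy, linear)."""
    def __init__(self, in_dim=4, feat_dim=3, seed=0):
        rng = random.Random(seed)
        self.encoder = init(feat_dim, in_dim, rng)    # feature encoding layers
        self.heads = {
            "pupil_2d": init(2, feat_dim, rng),       # first set: 2D pupil data
            "segmentation": init(4, feat_dim, rng),   # second set: region scores
            "cornea_center": init(2, feat_dim, rng),  # third set: cornea center
        }

    def forward(self, x, tasks):
        feat = matvec(self.encoder, x)
        return feat, {t: matvec(self.heads[t], feat) for t in tasks}

def train_step(net, x, targets, lr=0.05, update_encoder=True):
    """One SGD step on the summed squared error of the heads named in targets."""
    feat, outs = net.forward(x, list(targets))
    grad_feat = [0.0] * len(feat)
    loss = 0.0
    for task, target in targets.items():
        W = net.heads[task]
        err = [o - t for o, t in zip(outs[task], target)]
        loss += sum(e * e for e in err)
        # Gradient w.r.t. the features, taken before this head's weights change.
        for i, e in enumerate(err):
            for j in range(len(feat)):
                grad_feat[j] += 2 * e * W[i][j]
        # Update the task-specific layers that produced this output.
        for i, e in enumerate(err):
            for j in range(len(feat)):
                W[i][j] -= lr * 2 * e * feat[j]
    if update_encoder:  # skipped when the encoder is held fixed
        for j, g in enumerate(grad_feat):
            for k, xk in enumerate(x):
                net.encoder[j][k] -= lr * g * xk
    return loss

net = MultiTaskNet()
encoder_before = [row[:] for row in net.encoder]

# First training step: segmentation data trains the feature encoding layers.
first_eye = [0.2, -0.1, 0.4, 0.3]          # hypothetical stand-in for an image
seg_target = [0.0, 0.0, 1.0, 0.0]          # hypothetical region scores
for _ in range(50):
    train_step(net, first_eye, {"segmentation": seg_target}, update_encoder=True)
encoder_after_step1 = [row[:] for row in net.encoder]

# Second training step: pupil and cornea heads train; the encoder is frozen.
second_eye = [-0.3, 0.5, 0.1, 0.2]         # hypothetical stand-in for an image
step2_targets = {"pupil_2d": [0.1, -0.2], "cornea_center": [0.3, 0.05]}
losses_step2 = [
    train_step(net, second_eye, step2_targets, update_encoder=False)
    for _ in range(50)
]
```

In this sketch the freeze is expressed by the update_encoder flag: the gradient still flows through the encoder's output to compute the head updates, but the encoder weights themselves are left untouched in the second step, matching the wherein clause.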