US 12,254,414 B2
Autoencoding generative adversarial network for augmenting training data usable to train predictive models
Mårten Nilsson, Danderyd (SE)
Assigned to Tobii AB, Danderyd (SE)
Appl. No. 17/056,272
Filed by Tobii AB, Danderyd (SE)
PCT Filed May 13, 2019, PCT No. PCT/SE2019/050420
§ 371(c)(1), (2) Date Nov. 17, 2020,
PCT Pub. No. WO2019/221654, PCT Pub. Date Nov. 21, 2019.
Claims priority of provisional application 62/672,985, filed on May 17, 2018.
Prior Publication US 2021/0256353 A1, Aug. 19, 2021
Int. Cl. G06N 3/04 (2023.01); G06F 18/214 (2023.01); G06N 3/045 (2023.01); G06N 3/047 (2023.01); G06N 3/084 (2023.01); G06N 3/088 (2023.01)
CPC G06N 3/088 (2013.01) [G06F 18/214 (2023.01); G06N 3/045 (2023.01); G06N 3/047 (2023.01); G06N 3/084 (2013.01)] 16 Claims
OG exemplary drawing
 
1. A method for gaze prediction, the method being implemented on a computer system and comprising:
providing, to a gaze prediction model, a user image that shows at least a user eye; and
receiving, from the gaze prediction model, a prediction of a user gaze based on the user image,
wherein:
the gaze prediction model is trained based on an augmented training image that is generated by a generator network,
the generator network is trained to generate the augmented training image based on a training of an autoencoder network and a generative adversarial network,
the autoencoder network comprises the generator network and an encoder network,
the generative adversarial network comprises the generator network and a discriminator network,
a loss function for training the generator network comprises a first loss term associated with training the encoder network and a second loss term associated with training the generative adversarial network, and
the training of the autoencoder network and the generative adversarial network comprises:
providing, to the encoder network, a training image;
mapping, to a latent space, a code vector that is generated by the encoder network based on the training image;
computing a loss of the encoder network based on a comparison of the training image and a reconstructed image, wherein the reconstructed image is generated by the generator network based on the code vector;
updating parameters of the encoder network based on the loss of the encoder network;
providing, to the discriminator network, the training image and a fake image, wherein the fake image is generated by the generator network based on the latent space;
computing a loss of the discriminator network based on predictions of the discriminator network as to whether each of the training image and the fake image is real or fake; and
updating parameters of the discriminator network based on the loss of the discriminator network.
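The training recited in claim 1 can be illustrated with a minimal numerical sketch. This is not the patented implementation: the linear encoder/generator/discriminator, the dimensions, the learning rate `LR`, and the weighting `LAM` between the two generator loss terms are all hypothetical choices made for the sketch. It shows the claimed structure: the generator is shared between the autoencoder (encoder + generator, trained with a reconstruction loss) and the GAN (generator + discriminator, trained with an adversarial loss), and the generator's loss combines both terms.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM_X, DIM_Z = 4, 2                                # image / latent dims (toy)
We = rng.normal(scale=0.1, size=(DIM_Z, DIM_X))    # encoder network E
Wg = rng.normal(scale=0.1, size=(DIM_X, DIM_Z))    # generator network G (shared)
wd = rng.normal(scale=0.1, size=DIM_X)             # discriminator network D
LR, LAM = 0.05, 0.1                                # hypothetical hyperparameters

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def train_step(x):
    """One combined autoencoder + GAN update on training image x."""
    global We, Wg, wd
    # Autoencoder pass: encode the training image to a code vector z in
    # the latent space, then reconstruct it with the generator.
    z = We @ x
    x_rec = Wg @ z
    err = x_rec - x
    rec_loss = float(err @ err)                    # encoder (reconstruction) loss
    dWg_rec = 2.0 * np.outer(err, z)               # analytic grads (linear layers)
    dWe = 2.0 * np.outer(Wg.T @ err, x)
    # GAN pass: the generator makes a fake image from a latent sample;
    # the discriminator sees the real training image and the fake image.
    z_fake = rng.normal(size=DIM_Z)
    x_fake = Wg @ z_fake
    d_real, d_fake = sigmoid(wd @ x), sigmoid(wd @ x_fake)
    d_loss = -np.log(d_real) - np.log(1.0 - d_fake)   # real/fake cross-entropy
    dwd = -(1.0 - d_real) * x + d_fake * x_fake
    # Generator's adversarial loss term: -log D(G(z_fake)).
    dWg_adv = -(1.0 - d_fake) * np.outer(wd, z_fake)
    # Parameter updates: the generator's loss combines the first
    # (reconstruction) and second (adversarial) loss terms.
    We -= LR * dWe
    Wg -= LR * (dWg_rec + LAM * dWg_adv)
    wd -= LR * dwd
    return rec_loss, float(d_loss)

x = np.array([1.0, -0.5, 0.25, 0.8])               # toy "training image"
rec0, _ = train_step(x)
for _ in range(200):
    rec, d_loss = train_step(x)
```

After a few hundred steps on the toy example the reconstruction loss shrinks, while the discriminator and adversarial terms keep pushing the shared generator toward outputs the discriminator scores as real; in the patented use, such a trained generator produces the augmented training images used to train the gaze prediction model.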