US 12,450,477 B2
Adversarial information bottleneck strategy for improved machine learning
Qing Li, Milpitas, CA (US); Yongjune Kim, Daegu (KR); and Cyril Guyot, San Jose, CA (US)
Assigned to Western Digital Technologies, Inc., San Jose, CA (US)
Filed by Western Digital Technologies, Inc., San Jose, CA (US)
Filed on Aug. 23, 2021, as Appl. No. 17/409,667.
Claims priority of provisional application 63/107,044, filed on Oct. 29, 2020.
Prior Publication US 2022/0138565 A1, May 5, 2022
Int. Cl. G06N 3/08 (2023.01); G06N 3/04 (2023.01)
CPC G06N 3/08 (2013.01) [G06N 3/04 (2013.01)] 18 Claims
OG exemplary drawing
 
1. A method of performing machine learning, comprising:
processing a training data instance with a task model, wherein the task model is configured to generate an encoding and a task model output based on parameters of the task model, wherein the training data instance comprises one or more input variables and at least one target variable;
processing a discriminator input based on the encoding using a discriminator model, wherein the discriminator model is configured to generate an estimated mutual information between the encoding and the one or more input variables of the training data instance based on parameters of the discriminator model;
updating parameters of the discriminator model using a first iterative optimization algorithm to maximize a discriminator objective function based on the estimated mutual information;
updating parameters of the task model using a second iterative optimization algorithm to minimize a task objective function based on a sum of the estimated mutual information between the task model output and the one or more input variables of the training data instance and a conditional entropy of the target variable given an encoding generated by the task model; and
determining a weighted sum of: the conditional entropy of the target variable given the encoding generated by the task model, the conditional entropy being based on a cross-entropy between the task model output and a concatenation of the task model output and an output from a second-to-last layer of the task model; and the estimated mutual information between the encoding and the one or more input variables of the training data instance.
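The two quantities combined in the claim's weighted sum can be illustrated in code. The sketch below is a minimal, hypothetical illustration: it assumes a Donsker-Varadhan (MINE-style) lower bound as the discriminator's mutual-information estimate and a scalar weight `beta`; the function names and the use of a cross-entropy/conditional-entropy proxy are illustrative choices, not details taken from the patent.

```python
import numpy as np

def dv_mi_lower_bound(t_joint, t_marginal):
    """Donsker-Varadhan lower bound on mutual information I(X; Z).

    t_joint:    discriminator scores on truly paired (x, z) samples
    t_marginal: discriminator scores on mismatched (x, z') samples
    """
    return t_joint.mean() - np.log(np.mean(np.exp(t_marginal)))

def task_objective(cond_entropy_proxy, mi_estimate, beta):
    """Weighted sum minimized by the task model: a conditional-entropy
    term plus beta times the estimated mutual information."""
    return cond_entropy_proxy + beta * mi_estimate

# Hand-picked scores: paired samples score 1.0, mismatched score 0.0,
# so the bound is 1.0 - log(mean(exp(0))) = 1.0.
mi_hat = dv_mi_lower_bound(np.array([1.0, 1.0]), np.array([0.0, 0.0]))
print(mi_hat)                                   # → 1.0
print(task_objective(0.7, mi_hat, beta=0.5))    # → 1.2
```

A discriminator that assigns higher scores to genuine (input, encoding) pairs than to shuffled pairs drives the bound up, which is why maximizing the discriminator objective tightens the mutual-information estimate the task model is then penalized with.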
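The alternating updates in the claim (a first optimizer ascends the discriminator objective while a second descends the task objective) can be sketched end to end on a toy scalar problem. Everything below is a hypothetical stand-in: a linear encoding z = w·x for the task model, a bilinear score θ·x·z for the discriminator, mean-squared error as the conditional-entropy proxy, and finite-difference gradients in place of backpropagation.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=200)           # one input variable
y = 2.0 * x                        # target variable (toy regression task)
perm = rng.permutation(x.size)     # fixed shuffle for "marginal" pairings

def encode(w):                     # toy task model: scalar linear encoding
    return w * x

def dv_bound(theta, w):
    """Discriminator objective: DV lower bound on I(x; z)."""
    z = encode(w)
    t_joint = theta * x * z                   # scores on paired samples
    t_marg = theta * x * z[perm]              # scores on shuffled samples
    t_marg = np.clip(t_marg, -30.0, 30.0)     # numerical safety for exp
    return t_joint.mean() - np.log(np.mean(np.exp(t_marg)))

def task_loss(theta, w, beta=0.1):
    """Task objective: conditional-entropy proxy + beta * MI estimate."""
    mse = np.mean((encode(w) - y) ** 2)       # stands in for H(y | z)
    return mse + beta * dv_bound(theta, w)

def grad(f, p, eps=1e-4):                     # central finite difference
    return (f(p + eps) - f(p - eps)) / (2 * eps)

theta, w = 0.1, 0.0
for _ in range(200):
    # first optimizer: the discriminator ascends its objective ...
    theta += 0.02 * grad(lambda t: dv_bound(t, w), theta)
    # ... second optimizer: the task model descends the weighted sum
    w -= 0.05 * grad(lambda v: task_loss(theta, v), w)

print(round(w, 2))   # the encoding tracks the target, so w approaches 2
```

The MI penalty pulls `w` slightly away from the pure regression optimum, which is the information-bottleneck trade-off the claim's weighted sum encodes: fit the target while carrying as little information about the raw input as possible.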