US 12,223,432 B2
Using disentangled learning to train an interpretable deep learning model
Supriyo Chakraborty, White Plains, NY (US); Seraphin Bernard Calo, Cortlandt Manor, NY (US); and Jiawei Wen, State College, PA (US)
Assigned to INTERNATIONAL BUSINESS MACHINES CORPORATION, Armonk, NY (US)
Filed by INTERNATIONAL BUSINESS MACHINES CORPORATION, Armonk, NY (US)
Filed on Dec. 23, 2020, as Appl. No. 17/133,437.
Prior Publication US 2022/0198266 A1, Jun. 23, 2022
Int. Cl. G06N 3/088 (2023.01); G06F 18/2137 (2023.01); G06N 3/047 (2023.01); G06V 30/262 (2022.01)
CPC G06N 3/088 (2013.01) [G06F 18/2137 (2023.01); G06N 3/047 (2023.01); G06V 30/274 (2022.01)] 20 Claims
OG exemplary drawing
 
1. A method of training an interpretable deep learning model for a machine learning system, comprising:
receiving an input set of data;
providing the input set of data to a deep neural network model;
extracting features from the deep neural network model;
generating a latent space of vectors comprising the extracted features;
feeding the latent space of vectors generated from extracted features of the deep neural network model to a task-specific model, wherein the task-specific model is a low-complexity and linear learning model;
generating interpretable predictions of feature dimensions from the task-specific model;
reconstructing the input set of data using a decoder module;
determining a reconstruction error loss from reconstructing the input set of data;
determining a classification loss or a regression loss from a task-specific output set of data; and
training an autoencoder, the decoder module, and the low-complexity learning model, using a combination of (i) the reconstruction error loss and (ii) the classification loss or the regression loss.
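The claimed training procedure can be illustrated with a minimal sketch. This is a hypothetical toy implementation, not the patented system: it uses a purely linear encoder/decoder and a linear regression head as the low-complexity task-specific model, and minimizes the claimed combination of reconstruction error loss and task (here, regression) loss by gradient descent. All variable names and the loss weighting `lam` are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 64, 8, 3           # samples, input dimension, latent dimension
X = rng.normal(size=(n, d))  # "input set of data"
y = X @ rng.normal(size=d)   # synthetic regression target for the task

We = rng.normal(scale=0.1, size=(d, k))  # encoder weights (feature extractor)
Wd = rng.normal(scale=0.1, size=(k, d))  # decoder module weights
Wt = rng.normal(scale=0.1, size=(k,))    # low-complexity linear task model

lam, lr = 1.0, 0.01  # task-loss weight and learning rate (assumed values)

def forward():
    Z = X @ We                         # latent space of vectors
    Xr = Z @ Wd                        # reconstructed input set of data
    yp = Z @ Wt                        # interpretable task prediction
    rec = np.mean((Xr - X) ** 2)       # reconstruction error loss
    task = np.mean((yp - y) ** 2)      # regression loss from task output
    return Z, Xr, yp, rec, task

_, _, _, rec0, task0 = forward()       # combined loss before training
for _ in range(200):
    Z, Xr, yp, rec, task = forward()
    dXr = 2 * (Xr - X) / X.size        # grad of reconstruction MSE w.r.t. Xr
    dyp = 2 * (yp - y) / n             # grad of regression loss w.r.t. yp
    dZ = dXr @ Wd.T + lam * np.outer(dyp, Wt)  # combined grad into latent
    We -= lr * (X.T @ dZ)              # train encoder on combined loss
    Wd -= lr * (Z.T @ dXr)             # train decoder on reconstruction loss
    Wt -= lr * (Z.T @ dyp)             # train linear task-specific model

_, _, _, rec1, task1 = forward()       # combined loss after training
```

Because the encoder receives gradients from both loss terms, the latent dimensions are shaped jointly by reconstruction fidelity and task performance, and the linear task head keeps the per-dimension contributions interpretable as weights `Wt`.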