US 11,853,877 B2
Training transfer-focused models for deep learning
Patrick Watson, Montrose, NY (US); Bishwaranjan Bhattacharjee, Yorktown Heights, NY (US); Siyu Huo, White Plains, NY (US); Noel Christopher Codella, White Plains, NY (US); Brian Michael Belgodere, Fairfield, CT (US); Parijat Dube, Yorktown Heights, NY (US); Michael Robert Glass, Bayonne, NJ (US); John Ronald Kender, Leonia, NJ (US); and Matthew Leon Hill, Yonkers, NY (US)
Assigned to International Business Machines Corporation, Armonk, NY (US)
Filed by International Business Machines Corporation, Armonk, NY (US)
Filed on Apr. 2, 2019, as Appl. No. 16/373,149.
Prior Publication US 2020/0320379 A1, Oct. 8, 2020
Int. Cl. G06N 3/08 (2023.01); G06N 3/04 (2023.01)
CPC G06N 3/08 (2013.01) [G06N 3/04 (2013.01)]
20 Claims
 
8. A system comprising:
a hardware processor;
a memory device coupled with the hardware processor;
the hardware processor operable to at least:
receive a similarity estimate between a sample data set and a source data set, wherein the sample data set is associated with a target machine learning task, wherein the source data set is associated with a prior-trained neural network model and was used as a training data set used in training the prior-trained neural network model, wherein a plurality of similarity estimates is received corresponding to a plurality of source data sets associated with a plurality of prior-trained neural network models, the similarity estimate determined based on outputs of a hidden layer of the prior-trained neural network model generated using the sample data set and outputs of the hidden layer of the prior-trained neural network model generated using the source data set, each of the plurality of similarity estimates determined based on a distance between the sample data set and a corresponding one of the plurality of source data sets;
determine, at least based on the similarity estimates being above a predefined distance threshold, to train a new neural network model, the similarity estimates being above the predefined distance indicating a gap in areas covered by the source data sets used in training the plurality of prior-trained neural network models;
responsive to determining to train the new neural network model, create a cluster among the plurality of prior-trained neural network models by at least running the plurality of prior-trained neural network models using the sample data set, clustering the prior-trained neural network models into different clusters using activations of a hidden layer of the prior-trained neural network model generated using the sample data set, and selecting the cluster closest to the sample data set;
determine a set of training data based on the cluster, wherein source data sets used in training a plurality of prior-trained neural network models in the cluster are combined for use as at least part of the set of training data; and
train the new neural network model based on the set of training data.
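
The steps recited in claim 8 can be illustrated with a minimal sketch. It is not the patented implementation. It rests on several assumptions that do not appear in the claim: each prior-trained model is a hypothetical `PriorModel` record exposing a `hidden` callable that returns hidden-layer outputs; the similarity estimate is taken as the Euclidean distance between mean hidden-layer activations; "above the threshold" is read as every estimate exceeding it; clustering is a simple single-linkage grouping with a hypothetical `radius` parameter; and the closest cluster is the one with the lowest mean estimate. The final training step is omitted.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Optional

Vector = List[float]

@dataclass
class PriorModel:
    # Hypothetical record; names are illustrative, not from the patent.
    name: str
    hidden: Callable[[Vector], Vector]  # returns hidden-layer outputs for one input
    source_data: List[Vector]           # data set the model was trained on

def mean_activation(model: PriorModel, data: List[Vector]) -> Vector:
    """Average the model's hidden-layer outputs over a data set."""
    acts = [model.hidden(x) for x in data]
    return [sum(col) / len(acts) for col in zip(*acts)]

def distance_estimate(model: PriorModel, sample: List[Vector]) -> float:
    """Distance between sample and source data in the model's hidden space."""
    return math.dist(mean_activation(model, sample),
                     mean_activation(model, model.source_data))

def cluster_models(models, sample, radius):
    """Single-linkage grouping of models by their activations on the sample."""
    feats = [mean_activation(m, sample) for m in models]
    clusters: List[List[int]] = []
    for i in range(len(models)):
        # Find existing clusters that contain a model within `radius` of model i.
        linked = [c for c in clusters
                  if any(math.dist(feats[i], feats[j]) <= radius for j in c)]
        merged = [i]
        for c in linked:
            merged.extend(c)
            clusters.remove(c)
        clusters.append(merged)
    return [[models[i] for i in sorted(c)] for c in clusters]

def build_training_set(models, sample, threshold, radius) -> Optional[List[Vector]]:
    """Return combined source data of the closest cluster, or None if no gap."""
    estimates = {m.name: distance_estimate(m, sample) for m in models}
    # Assumption: a coverage gap exists only if every estimate exceeds the threshold.
    if not all(d > threshold for d in estimates.values()):
        return None
    clusters = cluster_models(models, sample, radius)
    closest = min(clusters,
                  key=lambda c: sum(estimates[m.name] for m in c) / len(c))
    # Combine the source data sets of every model in the selected cluster.
    return [x for m in closest for x in m.source_data]

# Toy models: hidden layers are simple hand-written maps, purely for illustration.
models = [
    PriorModel("a", lambda x: [x[0], x[1]], [[0.0, 0.0], [0.0, 2.0]]),
    PriorModel("b", lambda x: [x[0] + 0.1, x[1]], [[1.0, 1.0], [1.0, 3.0]]),
    PriorModel("c", lambda x: [10 * x[0], 10 * x[1]], [[9.0, 9.0]]),
]
sample = [[5.0, 5.0], [5.0, 7.0]]
training_set = build_training_set(models, sample, threshold=3.0, radius=1.0)
```

In this toy setup, models "a" and "b" produce nearly identical activations on the sample and fall into one cluster, so their source data sets are combined; raising the threshold above the smallest estimate makes the function return `None`, meaning no new model would be trained.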