US 12,437,190 B2
Automated fine-tuning of a pre-trained neural network for transfer learning
Parijat Dube, Yorktown Heights, NY (US); Bishwaranjan Bhattacharjee, Yorktown Heights, NY (US); Patrick Watson, Montrose, NY (US); and John Ronald Kender, Leonia, NJ (US)
Assigned to INTERNATIONAL BUSINESS MACHINES CORPORATION, Armonk, NY (US)
Filed by International Business Machines Corporation, Armonk, NY (US)
Filed on Dec. 5, 2019, as Appl. No. 16/704,804.
Prior Publication US 2021/0174191 A1, Jun. 10, 2021
Int. Cl. G06N 3/08 (2023.01); G06F 17/18 (2006.01); G06N 20/10 (2019.01)
CPC G06N 3/08 (2013.01) [G06F 17/18 (2013.01); G06N 20/10 (2019.01)] 15 Claims
OG exemplary drawing
 
1. A computer-implemented method for re-training a pre-trained neural network, comprising:
fetching a pre-trained neural network from a library of pre-trained neural networks, wherein the pre-trained neural network has been pre-trained on a source dataset;
adjusting a learning rate for each layer of a set of layers of the pre-trained neural network, wherein the pre-trained neural network comprises at least a first layer and a second layer, wherein adjusting the learning rate for each layer comprises:
setting, by a processor, a learning rate range for the pre-trained neural network, wherein the learning rate range comprises a largest learning rate;
obtaining, by the processor, a first target feature vector from the first layer of the pre-trained neural network responsive to a target dataset passing through the first layer;
obtaining, by the processor, a first source feature vector associated with the first layer of the pre-trained neural network responsive to the source dataset passing through the first layer;
obtaining, by the processor, a second target feature vector from the second layer of the pre-trained neural network responsive to a second target dataset passing through the second layer, wherein the second target dataset corresponds to an output dataset of the target dataset having passed through the first layer;
obtaining, by the processor, a second source feature vector from the second layer of the pre-trained neural network responsive to a second source dataset passing through the second layer, wherein the second source dataset corresponds to an output dataset of the source dataset having passed through the first layer;
obtaining, by the processor, a first divergence between the first target feature vector and the first source feature vector;
obtaining, by the processor, a second divergence between the second target feature vector and the second source feature vector;
obtaining, by the processor, a highest divergence and a non-highest divergence by comparing the first divergence to the second divergence;
identifying, by the processor, one of at least the first layer and the second layer corresponding to the highest divergence as a highest divergence layer;
setting, by the processor, a first learning rate for the highest divergence layer, wherein the first learning rate for the highest divergence layer is the largest learning rate of the learning rate range;
identifying, by the processor, one of at least the first layer and the second layer corresponding to the non-highest divergence as a non-highest divergence layer;
setting, by the processor, a second learning rate for the non-highest divergence layer, wherein the second learning rate for the non-highest divergence layer is proportional to the largest learning rate with a ratio of the non-highest divergence over the highest divergence;
re-training, by the processor, the pre-trained neural network for the target dataset, wherein the re-training comprises:
adjusting, by the processor, a first set of weights of the highest divergence layer according to the first learning rate upon inputting of the target dataset; and
adjusting, by the processor, a second set of weights of the non-highest divergence layer according to the second learning rate upon inputting of the target dataset.
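The steps of claim 1 can be sketched in code. The following NumPy sketch is illustrative only: the claim fixes neither the divergence measure, the network architecture, nor the datasets, so the squared-Euclidean divergence, the two-layer ReLU network, and the synthetic source/target data below are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# "Pre-trained" two-layer network; each layer's activations serve as
# the feature vectors of the claim (assumed shapes).
W1 = rng.normal(size=(8, 6))   # first-layer weights
W2 = rng.normal(size=(6, 4))   # second-layer weights

def layer(x, W):
    # Single layer with ReLU activation
    return np.maximum(x @ W, 0.0)

def mean_feature(X, W):
    # Feature vector for a layer: mean activation over a dataset
    return layer(X, W).mean(axis=0)

# Source dataset (used for pre-training) and target dataset (for re-training)
Xs = rng.normal(size=(100, 8))
Xt = rng.normal(loc=0.5, size=(100, 8))

# First-layer source/target feature vectors (datasets pass through layer 1)
fs1 = mean_feature(Xs, W1)
ft1 = mean_feature(Xt, W1)

# Second-layer feature vectors: the second source/target datasets are the
# outputs of the source/target datasets having passed through the first layer
fs2 = mean_feature(layer(Xs, W1), W2)
ft2 = mean_feature(layer(Xt, W1), W2)

def divergence(a, b):
    # Illustrative divergence between feature vectors (squared Euclidean)
    return float(np.sum((a - b) ** 2))

d1 = divergence(ft1, fs1)  # first divergence
d2 = divergence(ft2, fs2)  # second divergence

# Largest learning rate of the range goes to the highest-divergence layer;
# every other layer is scaled by (its divergence / highest divergence).
lr_max = 0.01
divs = np.array([d1, d2])
lrs = lr_max * divs / divs.max()

# Re-training step: each layer's weights are adjusted at its own rate
# (placeholder gradients stand in for backpropagation on the target data).
grad_W1 = rng.normal(size=W1.shape)
grad_W2 = rng.normal(size=W2.shape)
W1 = W1 - lrs[0] * grad_W1
W2 = W2 - lrs[1] * grad_W2
```

In a framework such as PyTorch, the per-layer rates `lrs` would typically be supplied as separate optimizer parameter groups rather than applied by hand as above.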