US 12,488,237 B2
Training neural networks using transfer learning
Joan Puigcerver i Perez, Zurich (CH); Basil Mustafa, Zurich (CH); André Susano Pinto, Zurich (CH); Carlos Riquelme Ruiz, Zurich (CH); Neil Matthew Tinmouth Houlsby, Zurich (CH); and Daniel M. Keysers, Stallikon (CH)
Assigned to Google LLC, Mountain View, CA (US)
Filed by Google LLC, Mountain View, CA (US)
Filed on Sep. 28, 2021, as Appl. No. 17/488,166.
Claims priority of provisional application 63/087,104, filed on Oct. 2, 2020.
Prior Publication US 2022/0108171 A1, Apr. 7, 2022
Int. Cl. G06N 3/08 (2023.01); G06N 3/045 (2023.01)
CPC G06N 3/08 (2013.01) [G06N 3/045 (2023.01)]
20 Claims
 
1. A method for training a neural network to perform a first prediction task, the method comprising:
obtaining trained model parameters for each of a plurality of candidate neural networks, wherein each candidate neural network has been pre-trained to perform a respective second prediction task that is different from the first prediction task;
obtaining a plurality of training examples corresponding to the first prediction task;
prior to fine-tuning any of the plurality of candidate neural networks for the first prediction task:
predicting, for each of the plurality of candidate neural networks and using the plurality of training examples, a respective performance of the candidate neural network on the first prediction task, and
selecting a proper subset of the plurality of candidate neural networks using the respective predicted performance on the first prediction task for each of the candidate neural networks;
after selecting the proper subset, fine-tuning only the candidate neural networks in the proper subset for the first prediction task by generating, for each candidate neural network in the proper subset, one or more fine-tuned neural networks, wherein each of the one or more fine-tuned neural networks is generated by updating the model parameters of the candidate neural network using the plurality of training examples; and
determining model parameters for the neural network using the one or more fine-tuned neural networks.
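
The steps recited in claim 1 can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the claim does not specify how performance is predicted or how the final parameters are determined. The leave-one-out nearest-neighbour proxy, the toy one-layer tanh "networks", the learning-rate sweep, selection by validation accuracy, and all names (proxy_score, fine_tune, train_by_transfer, keep, net_a, and so on) are hypothetical choices made for the example.

```python
import math
import random

def features(weights, x):
    """Frozen candidate body: one tanh layer (toy stand-in for a pre-trained network)."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in weights]

def proxy_score(weights, examples):
    """Assumed performance predictor: leave-one-out 1-nearest-neighbour accuracy
    on the candidate's frozen features. No parameters are updated here."""
    feats = [features(weights, x) for x, _ in examples]
    correct = 0
    for i, fi in enumerate(feats):
        nearest = min(
            (j for j in range(len(examples)) if j != i),
            key=lambda j: sum((a - b) ** 2 for a, b in zip(fi, feats[j])),
        )
        correct += examples[nearest][1] == examples[i][1]
    return correct / len(examples)

def fine_tune(weights, examples, lr, epochs=50):
    """Fine-tuning: copy the candidate, add a linear head, and update both by SGD
    on the logistic loss. Returns the fine-tuned parameters (body, head)."""
    body = [row[:] for row in weights]
    head = [0.0] * len(body)
    for _ in range(epochs):
        for x, y in examples:
            h = features(body, x)
            p = 1.0 / (1.0 + math.exp(-sum(a * b for a, b in zip(head, h))))
            err = p - y
            for k in range(len(body)):
                grad_pre = err * head[k] * (1.0 - h[k] ** 2)  # uses head before its update
                head[k] -= lr * err * h[k]
                for d in range(len(x)):
                    body[k][d] -= lr * grad_pre * x[d]
    return body, head

def accuracy(model, examples):
    body, head = model
    hits = 0
    for x, y in examples:
        z = sum(a * b for a, b in zip(head, features(body, x)))
        hits += (z > 0) == bool(y)
    return hits / len(examples)

def train_by_transfer(candidates, train, val, keep=2, learning_rates=(0.05, 0.2)):
    assert 0 < keep < len(candidates), "the selection must be a proper subset"
    # 1. Predict each candidate's performance before any fine-tuning.
    scores = {name: proxy_score(w, train) for name, w in candidates.items()}
    # 2. Select a proper subset using the predicted performance.
    subset = sorted(scores, key=scores.get, reverse=True)[:keep]
    # 3. Fine-tune only the selected candidates, one or more runs each.
    tuned = [(name, lr, fine_tune(candidates[name], train, lr))
             for name in subset for lr in learning_rates]
    # 4. Determine the final parameters (assumed here: best validation accuracy).
    best = max(tuned, key=lambda t: accuracy(t[2], val))
    return scores, subset, best

# Toy data for the "first prediction task": classify 2-D points by the sign of x0 + x1.
random.seed(0)
points = [[random.uniform(-1, 1), random.uniform(-1, 1)] for _ in range(40)]
data = [(x, int(x[0] + x[1] > 0)) for x in points]
train, val = data[:30], data[30:]

# Hypothetical candidates: random weights stand in for parameters pre-trained elsewhere.
candidates = {
    name: [[random.gauss(0, 1) for _ in range(2)] for _ in range(4)]
    for name in ("net_a", "net_b", "net_c")
}

scores, subset, (best_name, best_lr, best_model) = train_by_transfer(candidates, train, val)
print("predicted performance:", scores)
print("selected for fine-tuning:", subset)
print("final model from:", best_name, "lr =", best_lr)
```

In this sketch the expensive step (fine_tune) runs only keep × len(learning_rates) times rather than once per candidate per hyperparameter setting, which is the saving the selection step is meant to provide.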