| CPC G06N 3/08 (2013.01) [G06F 18/285 (2023.01); G06N 20/00 (2019.01)] | 6 Claims |

1. A computer-implemented optimizer learning method, comprising:
acquiring training data, the training data comprising a plurality of data sets each comprising neural network attribute information, neural network optimizer information, and optimizer parameter information; and
training a meta-learning model by taking the neural network attribute information and the neural network optimizer information in the data sets as input and taking the optimizer parameter information in the data sets as output, until the meta-learning model converges,
wherein the neural network attribute information comprises at least one of neural network structure information and neural network task information, the neural network optimizer information is information indicating a type of the optimizer, and the neural network task information comprises one of a classification task and a recognition task,
wherein the training a meta-learning model by taking the neural network attribute information and the neural network optimizer information in the data sets as input and taking the optimizer parameter information in the data sets as output comprises:
using a parameter of a current meta-learning model as a first parameter; jittering the first parameter of the meta-learning model by perturbing the first parameter with Gaussian noise, to acquire a plurality of jitter parameters;
replacing the first parameter with the plurality of jitter parameters to construct a plurality of jitter meta-learning models according to the plurality of jitter parameters;
training the plurality of jitter meta-learning models respectively by taking the neural network attribute information and the neural network optimizer information in the data sets as input and taking the optimizer parameter information in the data sets as output, wherein the data sets used in the training of the jitter meta-learning models are the same or different; and
selecting, according to training results, a jitter meta-learning model with the smallest loss function or a jitter meta-learning model with the fastest convergence speed as a final meta-learning model.
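The jitter-and-select procedure recited above can be illustrated with a minimal sketch. This is a hypothetical illustration, not the patented implementation: `theta` stands in for the first parameter of the current meta-learning model, `train_fn` is an assumed placeholder for training one jitter meta-learning model on the (attribute, optimizer) → parameter data sets and returning its trained parameters and final loss, and the noise scale and number of jitters are arbitrary choices.

```python
import numpy as np

def jitter_train_select(theta, train_fn, noise_scale=0.01, num_jitters=5, seed=0):
    """Sketch of the claimed jitter step: perturb the first parameter with
    Gaussian noise, train one jitter meta-learning model per perturbation,
    and select the model with the smallest loss as the final model."""
    rng = np.random.default_rng(seed)
    # Jitter the first parameter with Gaussian noise to acquire several
    # jitter parameters (one candidate parameter set per jitter model).
    jitters = [theta + rng.normal(0.0, noise_scale, size=theta.shape)
               for _ in range(num_jitters)]
    # Train a jitter meta-learning model from each candidate; train_fn is
    # assumed to return (trained_parameters, final_loss).
    results = [train_fn(j) for j in jitters]
    # Select the jitter model with the smallest loss function value.
    best_params, best_loss = min(results, key=lambda r: r[1])
    return best_params, best_loss
```

The claim also allows selecting by fastest convergence speed instead; that variant would have `train_fn` additionally report a convergence metric (e.g. epochs to reach a loss threshold) and take the minimum over that metric.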