| CPC G06N 3/08 (2013.01) [G06F 18/285 (2023.01); G06N 20/00 (2019.01)] | 6 Claims |

1. A computer-implemented optimizer learning method, comprising:
acquiring training data, the training data comprising a plurality of data sets each comprising neural network attribute information, neural network optimizer information, and optimizer parameter information; and
training a meta-learning model by taking the neural network attribute information and the neural network optimizer information in the data sets as input and taking the optimizer parameter information in the data sets as output, until the meta-learning model converges,
wherein the neural network attribute information comprises at least one of neural network structure information and neural network task information, the neural network optimizer information is information indicating a type of the optimizer, and the neural network task information comprises one of a classification task and a recognition task,
wherein the training a meta-learning model by taking the neural network attribute information and the neural network optimizer information in the data sets as input and taking the optimizer parameter information in the data sets as output comprises:
using a parameter of a current meta-learning model as a first parameter; jittering the first parameter of the meta-learning model by perturbing the first parameter with Gaussian noise, to acquire a plurality of jitter parameters;
replacing the first parameter with the plurality of jitter parameters to construct a plurality of jitter meta-learning models according to the plurality of jitter parameters;
training the plurality of jitter meta-learning models respectively by taking the neural network attribute information and the neural network optimizer information in the data sets as input and taking the optimizer parameter information in the data sets as output, wherein the data sets used in the training of the jitter meta-learning models are the same or different; and
selecting, according to training results, a jitter meta-learning model with the smallest loss function or a jitter meta-learning model with the fastest convergence speed as a final meta-learning model.
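The jitter-and-select procedure recited above can be illustrated with a minimal sketch. This is a hypothetical illustration, not the patented implementation: `theta` stands in for the first parameter of the current meta-learning model, `train_fn` is an assumed placeholder for training one jitter meta-learning model on the (attribute, optimizer) → parameter data sets and returning its trained parameters and final loss, and the noise scale and number of jitters are arbitrary choices.

```python
import numpy as np

def jitter_train_select(theta, train_fn, noise_scale=0.01, num_jitters=5, seed=0):
    """Sketch of the claimed jitter step: perturb the first parameter with
    Gaussian noise, train one jitter meta-learning model per perturbation,
    and select the model with the smallest loss as the final model."""
    rng = np.random.default_rng(seed)
    # Jitter the first parameter with Gaussian noise to acquire several
    # jitter parameters (one candidate parameter set per jitter model).
    jitters = [theta + rng.normal(0.0, noise_scale, size=theta.shape)
               for _ in range(num_jitters)]
    # Train a jitter meta-learning model from each candidate; train_fn is
    # assumed to return (trained_parameters, final_loss).
    results = [train_fn(j) for j in jitters]
    # Select the jitter model with the smallest loss function value.
    best_params, best_loss = min(results, key=lambda r: r[1])
    return best_params, best_loss
```

The claim also allows selecting by fastest convergence speed instead; that variant would have `train_fn` additionally report a convergence metric (e.g. epochs to reach a loss threshold) and take the minimum over that metric.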