US 12,450,497 B2
Model optimization method, electronic device, and computer program product
Jiacheng Ni, Shanghai (CN); Zijia Wang, WeiFang (CN); Jinpeng Liu, Shanghai (CN); and Zhen Jia, Shanghai (CN)
Assigned to EMC IP Holding Company LLC, Hopkinton, MA (US)
Filed by EMC IP Holding Company LLC, Hopkinton, MA (US)
Filed on Aug. 16, 2021, as Appl. No. 17/402,769.
Claims priority of application No. 202110838386.3 (CN), filed on Jul. 23, 2021.
Prior Publication US 2023/0025148 A1, Jan. 26, 2023
Int. Cl. G06N 3/0985 (2023.01); G06F 11/34 (2006.01); G06N 20/00 (2019.01)

CPC G06N 3/0985 (2023.01) [G06F 11/3447 (2013.01); G06N 20/00 (2019.01)]

20 Claims

1. A method, comprising:

determining an initial learning rate combination for a deep learning model in a processor-based machine learning system, wherein the initial learning rate combination comprises a plurality of learning rates, each learning rate being determined for one of a plurality of layers of the deep learning model, and the plurality of learning rates comprising static learning rates and dynamic learning rates;

generating a coded representation of the initial learning rate combination in the processor-based machine learning system, wherein the coded representation comprises a plurality of entries for respective ones of the layers of the deep learning model, with each of the entries being configured to include a value denoting a particular selected one of the static learning rates and the dynamic learning rates for its corresponding one of the layers;

adjusting the initial learning rate combination in the processor-based machine learning system to obtain a target learning rate combination, by application of an annealing algorithm of the processor-based machine learning system to the coded representation;

training the deep learning model in the processor-based machine learning system utilizing the target learning rate combination; and

performing a recognition task in the processor-based machine learning system utilizing the trained deep learning model;

wherein an accuracy rate achieved by the processor-based machine learning system for the recognition task when the target learning rate combination is used to train the deep learning model is higher than or equal to a first threshold accuracy rate.
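
The steps recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation. It assumes simulated annealing as the "annealing algorithm", an integer index per layer as the "coded representation", and a hypothetical pool of candidate rates (`CANDIDATES`). The names `decode`, `neighbor`, `anneal`, and `toy_accuracy` are illustrative, and `toy_accuracy` is a stand-in for actually training and validating the deep learning model.

```python
import math
import random

# Hypothetical candidate pool (an assumption, not from the claim):
# indices 0-2 denote static learning rates, indices 3-4 denote dynamic schedules.
CANDIDATES = [
    ("static", 0.1),
    ("static", 0.01),
    ("static", 0.001),
    ("dynamic", "step_decay"),
    ("dynamic", "cosine_decay"),
]


def decode(code):
    """Map each per-layer entry of the coded representation to its learning rate."""
    return [CANDIDATES[i] for i in code]


def neighbor(code, rng):
    """Return a copy of the code with one layer's entry switched to another candidate."""
    new_code = list(code)
    layer = rng.randrange(len(new_code))
    choices = [i for i in range(len(CANDIDATES)) if i != new_code[layer]]
    new_code[layer] = rng.choice(choices)
    return new_code


def anneal(initial_code, evaluate, threshold, rng,
           t_start=1.0, t_end=1e-3, cooling=0.95):
    """Simulated annealing over the coded representation.

    `evaluate(code)` is assumed to return an accuracy in [0, 1]. The search
    stops once the best accuracy reaches `threshold` or the temperature
    drops to `t_end`.
    """
    current, current_acc = list(initial_code), evaluate(initial_code)
    best, best_acc = current, current_acc
    t = t_start
    while t > t_end and best_acc < threshold:
        candidate = neighbor(current, rng)
        cand_acc = evaluate(candidate)
        delta = cand_acc - current_acc
        # Always accept improvements; accept regressions with Metropolis probability.
        if delta >= 0 or rng.random() < math.exp(delta / t):
            current, current_acc = candidate, cand_acc
        if current_acc > best_acc:
            best, best_acc = current, current_acc
        t *= cooling
    return best, best_acc


def toy_accuracy(code, target=(3, 1, 1, 4)):
    """Stand-in for train-and-validate: fraction of layers matching a hidden target."""
    return sum(a == b for a, b in zip(code, target)) / len(target)


if __name__ == "__main__":
    rng = random.Random(0)
    initial = [0, 0, 0, 0]  # initial learning rate combination for a 4-layer model
    best, acc = anneal(initial, toy_accuracy, threshold=0.75, rng=rng)
    print(decode(best), acc)
```

In a real system the `evaluate` callback would train the model with the decoded per-layer rates and return the accuracy measured on the recognition task. It is injected as a parameter here so the search logic can be run without any training framework.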