US 12,380,328 B2
Lightweight model training method, image processing method, electronic device, and storage medium
Ruoyu Guo, Beijing (CN); Yuning Du, Beijing (CN); Chenxia Li, Beijing (CN); Baohua Lai, Beijing (CN); and Yanjun Ma, Beijing (CN)
Assigned to Beijing Baidu Netcom Science Technology Co., Ltd., Beijing (CN)
Filed by Beijing Baidu Netcom Science Technology Co., Ltd., Beijing (CN)
Filed on Feb. 13, 2023, as Appl. No. 18/108,956.
Claims priority of application No. 202211059602.5 (CN), filed on Aug. 30, 2022.
Prior Publication US 2024/0070454 A1, Feb. 29, 2024
Int. Cl. G06N 3/08 (2023.01); G06V 10/82 (2022.01)
CPC G06N 3/08 (2013.01) [G06V 10/82 (2022.01)] 20 Claims
OG exemplary drawing
 
1. A lightweight model training method, comprising:
acquiring a first augmentation probability, a second augmentation probability and a target weight adopted in an e-th iteration, the target weight being a weight of a distillation loss value, e being a positive integer not greater than E, and E being a maximum quantity of iterations and being a positive integer greater than 1;
performing data augmentation on a data set based on the first augmentation probability and the second augmentation probability respectively, to obtain a first data set and a second data set;
obtaining a first output value of a student model and a second output value of a teacher model based on the first data set;
obtaining a third output value of the student model and a fourth output value of the teacher model based on the second data set, the student model being a lightweight model;
determining a distillation loss function based on the first output value and the second output value;
determining a truth-value loss function based on the third output value and the fourth output value;
determining a target loss function based on the distillation loss function and the truth-value loss function;
training the student model based on the target loss function; and
determining a first augmentation probability or a target weight to be adopted in an (e+1)-th iteration in a case where e is less than E.
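
The iteration recited in claim 1 can be illustrated with a minimal Python sketch in a PyTorch style. The claim does not fix concrete loss functions or interfaces, so everything beyond the claim language is an assumption here: the names student, teacher, aug, p1, p2 and target_weight are illustrative, KL divergence stands in for the distillation loss, and, following the claim, the truth-value loss is computed from the student and teacher outputs on the second data set.

    import torch
    import torch.nn.functional as F

    def train_iteration(student, teacher, batch, aug, p1, p2,
                        target_weight, optimizer):
        """One claimed iteration: two augmented views, then a target loss
        combining a distillation loss and a truth-value loss."""
        # Data augmentation with the first and second augmentation probabilities
        first_set = aug(batch, p=p1)    # first data set
        second_set = aug(batch, p=p2)   # second data set

        # First/second output values: student and teacher on the first data set
        student_out_1 = student(first_set)
        with torch.no_grad():
            teacher_out_1 = teacher(first_set)

        # Third/fourth output values: student and teacher on the second data set
        student_out_2 = student(second_set)
        with torch.no_grad():
            teacher_out_2 = teacher(second_set)

        # Distillation loss from the first and second output values
        distill_loss = F.kl_div(
            F.log_softmax(student_out_1, dim=-1),
            F.softmax(teacher_out_1, dim=-1),
            reduction="batchmean",
        )

        # Truth-value loss from the third and fourth output values; per the
        # claim, the teacher's output on the second data set plays the role
        # of the truth value here
        truth_loss = F.cross_entropy(student_out_2,
                                     F.softmax(teacher_out_2, dim=-1))

        # Target loss: truth-value loss plus distillation loss scaled by
        # the target weight of the claim
        target_loss = truth_loss + target_weight * distill_loss

        optimizer.zero_grad()
        target_loss.backward()
        optimizer.step()
        return target_loss.item()

In this reading, the target weight trades the distillation term off against the truth-value term, which is what allows the final claim step to shift that balance from one iteration to the next.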
 
10. An electronic device, comprising:
at least one processor; and
a memory connected in communication with the at least one processor;
wherein the memory stores an instruction executable by the at least one processor, and the instruction, when executed by the at least one processor, enables the at least one processor to execute operations, comprising:
acquiring a first augmentation probability, a second augmentation probability and a target weight adopted in an e-th iteration, the target weight being a weight of a distillation loss value, e being a positive integer not greater than E, and E being a maximum quantity of iterations and being a positive integer greater than 1;
performing data augmentation on a data set based on the first augmentation probability and the second augmentation probability respectively, to obtain a first data set and a second data set;
obtaining a first output value of a student model and a second output value of a teacher model based on the first data set;
obtaining a third output value of the student model and a fourth output value of the teacher model based on the second data set, the student model being a lightweight model;
determining a distillation loss function based on the first output value and the second output value;
determining a truth-value loss function based on the third output value and the fourth output value;
determining a target loss function based on the distillation loss function and the truth-value loss function;
training the student model based on the target loss function; and
determining a first augmentation probability or a target weight to be adopted in an (e+1)-th iteration in a case where e is less than E.
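
The operations of claim 10 mirror claim 1, so only the augmentation step is sketched here. The claims do not specify how the two probabilities gate augmentation; a per-sample reading, in which each sample is transformed with the given probability, is one plausible assumption. The names augment_with_probability and transform are hypothetical, not from the patent.

    import random
    from typing import Any, Callable, Iterable, List

    def augment_with_probability(data_set: Iterable[Any],
                                 transform: Callable[[Any], Any],
                                 p: float) -> List[Any]:
        """Apply `transform` to each sample with probability `p`."""
        return [transform(x) if random.random() < p else x for x in data_set]

    # Called once per probability, this yields the two data sets of the claims:
    # first_set = augment_with_probability(data_set, transform, p1)
    # second_set = augment_with_probability(data_set, transform, p2)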
 
16. A non-transitory computer-readable storage medium storing a computer instruction thereon, wherein the computer instruction is used to cause a computer to execute operations, comprising:
acquiring a first augmentation probability, a second augmentation probability and a target weight adopted in an e-th iteration, the target weight being a weight of a distillation loss value, e being a positive integer not greater than E, and E being a maximum quantity of iterations and being a positive integer greater than 1;
performing data augmentation on a data set based on the first augmentation probability and the second augmentation probability respectively, to obtain a first data set and a second data set;
obtaining a first output value of a student model and a second output value of a teacher model based on the first data set;
obtaining a third output value of the student model and a fourth output value of the teacher model based on the second data set, the student model being a lightweight model;
determining a distillation loss function based on the first output value and the second output value;
determining a truth-value loss function based on the third output value and the fourth output value;
determining a target loss function based on the distillation loss function and the truth-value loss function;
training the student model based on the target loss function; and
determining a first augmentation probability or a target weight to be adopted in an (e+1)-th iteration in a case where e is less than E.
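
The final step of each claim, re-determining the first augmentation probability or the target weight when e is less than E, suggests an outer loop of the following shape. The update rule below is purely a placeholder, since the claims require only that new values be determined, not how; the sketch also reuses the hypothetical train_iteration from the sketch after claim 1.

    def train(student, teacher, data_loader, aug, optimizer,
              E, p1, p2, target_weight):
        """Outer loop over the E iterations of the claims."""
        for e in range(1, E + 1):  # e is a positive integer not greater than E
            for batch in data_loader:
                train_iteration(student, teacher, batch, aug,
                                p1, p2, target_weight, optimizer)
            if e < E:
                # Placeholder update: determine a first augmentation
                # probability and a target weight for the (e+1)-th
                # iteration; the claims do not fix a particular rule
                p1 = min(1.0, p1 + 1.0 / E)
                target_weight = target_weight * (1.0 - 1.0 / E)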