| CPC G06N 3/08 (2013.01) [G06N 3/0895 (2023.01); G06N 3/09 (2023.01); G06N 3/098 (2023.01); G06N 3/0985 (2023.01)] | 19 Claims |

|
1. A processor-implemented neural network method, comprising:
setting a searching range of mask weights based on both a distribution of the mask weights of a binary mask corresponding to a filter of a pretrained model and a learning rate-related parameter set in an incremental learning model;
identifying a targeted mask weight in the searching range of the mask weights;
calculating a weight gradient corresponding to the targeted mask weight based on an input activation of an input channel of a masked filter obtained from forward propagation of a training epoch process and a loss gradient obtained from back propagation of the training epoch process, wherein the masked filter is obtained by activating or deactivating each weight included in the filter based on the binary mask;
updating the targeted mask weight based on the weight gradient;
updating a portion of the binary mask corresponding to the updated targeted mask weight based on the updated targeted mask weight and a preset reference value; and
updating the masked filter by applying the updated binary mask to the masked filter for a next training epoch process of the pretrained model,
wherein the setting of the searching range comprises:
when the distribution of the mask weights is obtained, setting the searching range based on a mean of the mask weights from the distribution of the mask weights; and
when it is determined that the learning rate-related parameter that determines a level of learning rate decay has changed according to predefined criteria, setting the searching range based on the level of learning rate decay,
wherein the training epoch process is included in a process for training the pretrained model, which is configured to perform a first task, to perform a second task,
wherein a non-targeted mask weight not in the searching range is not updated during the training epoch process and is fixed as a value determined in a previous training epoch, and
wherein a value of the binary mask corresponding to the non-targeted mask weight is fixed as a value determined in the previous training epoch.
|
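The steps recited in claim 1 can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the claimed implementation: the function names `set_search_range` and `epoch_update`, the `base_width` parameter, the rule that the range narrows as the level of learning rate decay grows, and the straight-through style gradient (loss gradient times input activation times filter weight) are all hypothetical choices made for the example. The filter is flattened to one weight per input channel, and the updated mask is applied to the stored filter weights so that a deactivated weight can be reactivated later.

```python
from statistics import mean


def set_search_range(mask_weights, decay_level, base_width=0.1):
    """Hypothetical rule: centre the range on the mean of the mask weights
    and narrow it as the level of learning rate decay increases."""
    center = mean(mask_weights)
    width = base_width / (1 + decay_level)
    return center - width, center + width


def epoch_update(filter_w, mask_w, binary_mask, activations, loss_grad,
                 search_range, lr=0.1, reference=0.0):
    """One training-epoch update of mask weights, binary mask and masked filter."""
    lo, hi = search_range
    new_mask_w = list(mask_w)
    new_mask = list(binary_mask)
    for i, m in enumerate(mask_w):
        if not (lo <= m <= hi):
            # Non-targeted mask weight: value and mask bit stay as in the
            # previous epoch.
            continue
        # Assumed weight gradient: loss gradient (back propagation) times
        # input activation (forward propagation) times the filter weight.
        grad = loss_grad * activations[i] * filter_w[i]
        new_mask_w[i] = m - lr * grad
        # Binarise against the preset reference value.
        new_mask[i] = 1 if new_mask_w[i] > reference else 0
    # Activate or deactivate each filter weight according to the updated mask.
    masked_filter = [w * b for w, b in zip(filter_w, new_mask)]
    return new_mask_w, new_mask, masked_filter


if __name__ == "__main__":
    mask_w = [0.02, -0.01, 0.9, -0.8]
    rng = set_search_range(mask_w, decay_level=0)
    result = epoch_update([0.5, -0.4, 0.3, 0.2], mask_w, [1, 0, 1, 0],
                          [1.0, 2.0, 1.0, 1.0], 1.0, rng)
    print(result)
```

In this sketch only the two mask weights near the mean fall inside the searching range, so only their mask bits can flip; the two mask weights far from the mean keep both their values and their mask bits from the previous epoch.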