CPC G06N 3/084 (2013.01) [G06F 17/18 (2013.01); G06F 18/10 (2023.01); G06N 5/046 (2013.01)] | 13 Claims |
1. A computer-implemented method for training a neural network, the computer-implemented method comprising:
initializing one or more model parameters for training the neural network;
performing a forward pass and back propagation for a minibatch of training data comprising a plurality of batches of training data;
determining a new weight value for each of a plurality of nodes of the neural network based on an optimization algorithm;
for each determined new weight value, determining whether to update a running mean corresponding to a weight of each node from the plurality of nodes, wherein determining whether to update the running mean is based on:
whether a current batch of training data falls within a predefined last subset of batches of the minibatch; and
performing a random determination having a probability based on a value included in the one or more model parameters;
based on a determination to update the running mean, calculating a new mean weight value for each node using the determined new weight value,
wherein when the current batch of training data does not fall within the predefined last subset of batches of the minibatch, the running mean is not updated using the determined new weight value;
updating weight parameters for all nodes based on the calculated new mean weight values corresponding to each node;
assigning the running mean as the weight for each node when training on K number of minibatches is completed, wherein K is a predefined number; and
reinitializing running means for all nodes in the neural network at a start of training a K+1 minibatch of the training data.
|
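The steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation. It makes the following assumptions, none of which come from the claim: the two update conditions are combined with a logical AND, the running mean is an incremental arithmetic mean, the optimizer is plain gradient descent, and a toy linear model stands in for the neural network. All names (`should_update_mean`, `update_running_mean`, `gradient`, `train`, `last_subset`, `p`, `lr`) are hypothetical. The claim's step of updating weight parameters from the mean values is folded into the assignment made after every K minibatches.

```python
import random


def should_update_mean(batch_idx, num_batches, last_subset, p, rng):
    """Return True only if the batch is among the last `last_subset` batches
    of the minibatch AND a random draw with probability `p` succeeds
    (the AND combination is an assumption)."""
    in_last_subset = batch_idx >= num_batches - last_subset
    return in_last_subset and rng.random() < p


def update_running_mean(mean, count, new_weight):
    """Incremental arithmetic mean; returns (new_mean, new_count)."""
    count += 1
    return mean + (new_weight - mean) / count, count


def gradient(weights, batch):
    """Mean-squared-error gradient of a linear model (toy stand-in for
    the forward pass and back propagation)."""
    grads = [0.0] * len(weights)
    for x, y in batch:
        err = sum(w * xi for w, xi in zip(weights, x)) - y
        for i, xi in enumerate(x):
            grads[i] += 2.0 * err * xi / len(batch)
    return grads


def train(minibatches, weights, lr, p, last_subset, K, seed=0):
    """Each minibatch is a list of batches; each batch is a list of (x, y)."""
    rng = random.Random(seed)
    weights = list(weights)  # do not mutate the caller's list
    means = list(weights)
    counts = [0] * len(weights)
    for m_idx, minibatch in enumerate(minibatches):
        if m_idx > 0 and m_idx % K == 0:
            # Reinitialize running means at the start of minibatch K+1.
            means = list(weights)
            counts = [0] * len(weights)
        for b_idx, batch in enumerate(minibatch):
            grads = gradient(weights, batch)
            for i, g in enumerate(grads):
                weights[i] -= lr * g  # new weight from the optimizer
                if should_update_mean(b_idx, len(minibatch), last_subset, p, rng):
                    means[i], counts[i] = update_running_mean(
                        means[i], counts[i], weights[i]
                    )
        if (m_idx + 1) % K == 0:
            # Assign the running mean as the weight once K minibatches are
            # done; weights whose mean was never updated are left as-is.
            weights = [m if c else w for w, m, c in zip(weights, means, counts)]
    return weights
```

In this sketch, `p = 0.0` reduces the procedure to ordinary gradient descent. With `p = 1.0`, every weight produced in the last `last_subset` batches of each minibatch contributes to the mean.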