CPC G06N 3/047 (2023.01) [G06N 3/063 (2013.01); G06N 3/08 (2013.01)]
11 Claims

1. A processor-implemented method for mitigating generalization loss in a deep neural network (DNN) for time series classification, the method comprising:
receiving, via one or more hardware processors, a time series training dataset comprising a plurality of training samples;
computing, via the one or more hardware processors, an entropy of the time series training dataset, and a mean and a variance of the entropy;
computing, via the one or more hardware processors, a regularization factor based on the mean and the variance of the entropy, and facilitating auto-tuning of regularization by setting the regularization factor, which relaxes the DNN to underplay higher learnability by using representation space augmentation that minimizes training error, wherein the representation space augmentation is performed by introducing model complexity and enriching residual learning through unsupervised feature space augmentation that refines a residual channel of a residual network, and wherein the residual learning is introduced to address the learning degradation problem in the time series dataset when the depth of the DNN is increased,
wherein computing the regularization factor comprises:
computing the entropy for each of the plurality of training samples to obtain a vector comprising a plurality of entropy values, one for each of the plurality of training samples;
computing an index of dispersion of the entropy based on the mean and the variance of the entropy; and
computing the regularization factor associated with the time series training dataset based on the index of dispersion and a scale factor of the entropy;
performing a plurality of iterations, via the one or more hardware processors, wherein each iteration of the plurality of iterations comprises:
dynamically adjusting the learning rate of the DNN using a Mod-Adam optimization, and obtaining a network parameter, wherein adjusting the learning rate of the DNN for a current instance from amongst a plurality of training instances using the Mod-Adam optimization comprises:
receiving the time series training dataset for the current instance;
computing, for a set of iterations in the current instance, the learning rate by using the Shannon entropy of the current instance with respect to a minimum observation window and a maximum observation window of the current instance;
determining whether, for J consecutive iterations, the generalization loss of the current iteration from amongst the set of iterations is greater than the generalization loss of a preceding iteration from amongst the set of iterations; and
updating the learning rate to obtain an updated learning rate based on the determination of the generalization loss of the current iteration being greater than the generalization loss of the preceding iteration; and
updating, based on the network parameter, the regularization factor to obtain an updated regularization factor,
wherein adjusting the learning rate in the plurality of iterations comprises repeatedly updating the network parameter based on a variation of a generalization loss during the plurality of iterations,
wherein the updated regularization factor of a current iteration is used for adjusting the learning rate in a subsequent iteration of the plurality of iterations.
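The entropy-driven regularization step recited above (per-sample Shannon entropy, its index of dispersion, and the resulting factor) can be read as the following minimal sketch. This is an illustration of the claim language, not the patented implementation: the histogram-based entropy estimator, the bin count, and the scale factor `kappa` are assumptions introduced here.

```python
import numpy as np

def shannon_entropy(series, bins=16):
    """Shannon entropy (bits) of one time series, estimated from a
    histogram of its values; the bin count is an illustrative choice."""
    counts, _ = np.histogram(series, bins=bins)
    p = counts / counts.sum()
    p = p[p > 0]                                 # drop empty bins to avoid log(0)
    return float(-np.sum(p * np.log2(p)))

def regularization_factor(train_samples, kappa=1.0):
    """Regularization factor from the index of dispersion (variance-to-mean
    ratio) of the per-sample entropies; `kappa` stands in for the scale
    factor of the entropy named in the claim."""
    entropies = np.array([shannon_entropy(x) for x in train_samples])
    mean, var = entropies.mean(), entropies.var()
    index_of_dispersion = var / (mean + 1e-12)   # epsilon guards a zero mean
    return kappa * index_of_dispersion
```

The factor returned here would typically weight an L2 penalty on the network parameters; per the claim, it is recomputed each iteration from the updated network parameter.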
|
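The learning-rate adjustment is the other moving part: the rate is seeded from the Shannon entropy of the current instance over a minimum and a maximum observation window, then decayed whenever the generalization (here, validation) loss rises for J consecutive iterations. Mod-Adam itself is not specified beyond the claim, so the sketch below shows only this outer schedule; the window sizes, the entropy-ratio seeding, and the halving decay `gamma` are assumptions, and `shannon_entropy` is reused from the previous sketch.

```python
def entropy_seeded_lr(series, base_lr=1e-3, w_min=16, w_max=128):
    """Seed the learning rate from the ratio of Shannon entropies over a
    minimum vs. a maximum observation window (illustrative form only)."""
    h_min = shannon_entropy(series[-w_min:])
    h_max = shannon_entropy(series[-w_max:])
    return base_lr * h_min / (h_max + 1e-12)     # epsilon guards a flat window

def adjust_learning_rate(val_losses, lr, J=3, gamma=0.5, min_lr=1e-6):
    """Decay the learning rate by `gamma` once the generalization loss has
    increased for J consecutive iterations; otherwise leave it unchanged."""
    rose_J_times = len(val_losses) > J and all(
        val_losses[-k] > val_losses[-k - 1] for k in range(1, J + 1)
    )
    return max(lr * gamma, min_lr) if rose_J_times else lr
```

In a training loop these pieces interleave with the regularization update: each iteration takes an optimizer step at the current rate, records the validation loss, calls `adjust_learning_rate`, and recomputes the regularization factor from the updated network parameter, which the next iteration's rate adjustment then uses.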