| CPC G06F 17/11 (2013.01) [G06F 17/16 (2013.01); G06F 17/18 (2013.01); G06F 18/10 (2023.01); G06F 18/217 (2023.01); G06N 3/08 (2013.01); G06N 7/01 (2023.01); G06V 10/72 (2022.01); G06V 10/764 (2022.01); G06V 10/776 (2022.01); G06V 10/82 (2022.01)] | 9 Claims |

|
1. A process for optimizing a loss function to train a convolutional neural network (CNN) model for solving a classification problem comprising:
(i) providing an initial mean solution vector to a multi-dimensional continuous value optimization process running on one or more processors;
(ii) generating a set of candidate loss function parameters using the initial mean solution vector for use in building a first set of candidate loss functions in accordance with a predetermined loss function representation;
wherein the predetermined loss function representation is a multivariate Taylor expansion;
wherein the multivariate Taylor expansion is
![]()
(iii) evaluating each of the candidate loss functions in the first set of candidate loss functions with the CNN model including:
(a) building each of the first candidate loss functions using the initial set of candidate loss function parameters;
(b) at least partially training the CNN model on a training data set related to the classification problem using each of the first candidate loss functions;
(c) evaluating the CNN model trained with each of the first candidate loss functions on a validation data set related to the classification problem;
(d) obtaining individual fitness values for each of the first candidate loss functions from the evaluation in (c);
(iv) ranking each of the first candidate loss functions in accordance with individual fitness values, wherein each of the first candidate loss functions includes a different set of candidate loss function parameters;
(v) repeating steps (ii) to (iv) for multiple generations to optimize the loss function for training the CNN model solving the classification problem, including replacing the initial mean vector solution with a new mean vector solution derived from a ranked first candidate loss function in accordance with fitness value; and
(vi) selecting an optimized loss function for training the CNN model solving the classification problem at a predetermined selection point,
wherein the selected optimized loss function trains the CNN model to solve the classification problem in fewer steps and with higher accuracy than a process for training the CNN model using cross-entropy loss or BaikalCMA loss function.
|
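Steps (i) to (vi) of claim 1 can be illustrated with a minimal sketch. It is not the claimed implementation and rests on several assumptions, all of them hypothetical: the expansion formula is not reproduced in this text, so `taylor_style_loss_grad` uses a stand-in four-parameter polynomial in the label `t` and the predicted probability `p`; a linear softmax classifier on synthetic data replaces the CNN; and the optimizer is a simplified evolution strategy that only updates the mean, with no covariance adaptation. The names `make_toy_data`, `partial_train_and_score`, and `evolve_loss` are invented for this example.

```python
import numpy as np


def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def taylor_style_loss_grad(theta, t, p):
    """Gradient dL/dp of a hypothetical stand-in loss (NOT the claimed expansion):
    L = -(1/n) * sum_i sum_k f(t_ik, p_ik),
    f(t, p) = th0*p + th1*p**2 + th2*t*p + th3*t*p**2.
    """
    th0, th1, th2, th3 = theta
    df_dp = th0 + 2 * th1 * p + th2 * t + 2 * th3 * t * p
    return -df_dp / len(t)


def make_toy_data(seed=0, n=300):
    """Synthetic 3-class problem standing in for the training/validation sets."""
    rng = np.random.default_rng(seed)
    y = rng.integers(0, 3, size=n)
    centers = np.array([[2.0, 0.0], [-2.0, 0.0], [0.0, 2.0]])
    x = centers[y] + 0.5 * rng.standard_normal((n, 2))
    split = 2 * n // 3
    return x[:split], y[:split], x[split:], y[split:]


def partial_train_and_score(theta, data, steps=30, lr=0.5):
    """Steps (iii)(a)-(d): build the loss from theta, partially train a linear
    softmax model (assumed stand-in for the CNN), return validation accuracy."""
    x_tr, y_tr, x_va, y_va = data
    n_classes = int(max(y_tr.max(), y_va.max())) + 1
    t = np.eye(n_classes)[y_tr]  # one-hot labels
    w = np.zeros((x_tr.shape[1], n_classes))
    for _ in range(steps):
        p = softmax(x_tr @ w)
        g = taylor_style_loss_grad(theta, t, p)  # dL/dp
        dz = p * (g - (g * p).sum(axis=1, keepdims=True))  # chain rule through softmax
        w -= lr * (x_tr.T @ dz)
    return float((np.argmax(x_va @ w, axis=1) == y_va).mean())


def evolve_loss(data, generations=5, pop_size=8, n_parents=3, sigma=0.5, seed=0):
    rng = np.random.default_rng(seed)
    mean = np.zeros(4)  # (i) initial mean solution vector
    best_theta, best_fit = mean.copy(), -np.inf
    for _ in range(generations):  # (v) repeat for multiple generations
        # (ii) sample candidate loss function parameters around the mean
        candidates = mean + sigma * rng.standard_normal((pop_size, 4))
        # (iii) fitness = validation accuracy after partial training
        fitness = np.array([partial_train_and_score(c, data) for c in candidates])
        order = np.argsort(-fitness)  # (iv) rank, best first
        # (v) new mean derived from the top-ranked candidates
        mean = candidates[order[:n_parents]].mean(axis=0)
        if fitness[order[0]] > best_fit:
            best_theta = candidates[order[0]].copy()
            best_fit = float(fitness[order[0]])
    return best_theta, best_fit  # (vi) selection after a fixed number of generations


if __name__ == "__main__":
    theta, acc = evolve_loss(make_toy_data())
    print("selected parameters:", theta, "validation accuracy:", acc)
```

Here a fixed generation count plays the role of the "predetermined selection point"; a time or fitness threshold would serve equally well in this sketch.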