US 12,292,944 B2
Loss function optimization using Taylor series expansion
Santiago Gonzalez, Denver, CO (US); and Risto Miikkulainen, Stanford, CA (US)
Assigned to Cognizant Technology Solutions U.S. Corp., College Station, TX (US)
Filed by Cognizant Technology Solutions U.S. Corporation, College Station, TX (US)
Filed on Sep. 14, 2020, as Appl. No. 17/019,766.
Claims priority of provisional application 62/902,458, filed on Sep. 19, 2019.
Prior Publication US 2021/0089832 A1, Mar. 25, 2021
Int. Cl. G06F 17/11 (2006.01); G06F 17/16 (2006.01); G06F 17/18 (2006.01); G06F 18/10 (2023.01); G06F 18/21 (2023.01); G06N 3/08 (2023.01); G06N 7/01 (2023.01); G06V 10/72 (2022.01); G06V 10/764 (2022.01); G06V 10/776 (2022.01); G06V 10/82 (2022.01)

CPC G06F 17/11 (2013.01) [G06F 17/16 (2013.01); G06F 17/18 (2013.01); G06F 18/10 (2023.01); G06F 18/217 (2023.01); G06N 3/08 (2013.01); G06N 7/01 (2023.01); G06V 10/72 (2022.01); G06V 10/764 (2022.01); G06V 10/776 (2022.01); G06V 10/82 (2022.01)]

9 Claims

1. A process for optimizing a loss function to train a convolutional neural network (CNN) model for solving a classification problem comprising:

(i) providing an initial mean solution vector to a multi-dimensional continuous value optimization process running on one or more processors;

(ii) generating a set of candidate loss function parameters using the initial mean solution vector for use in building a first set of candidate loss functions in accordance with a predetermined loss function representation;

wherein the predetermined loss function representation is a multivariate Taylor expansion;

wherein the multivariate Taylor expansion is

(iii) evaluating each of the candidate loss functions in the first set of candidate loss functions with the CNN model including:

(a) building each of the first candidate loss functions using the initial set of candidate loss function parameters;

(b) at least partially training the CNN model on a training data set related to the classification problem using each of the first candidate loss functions;

(c) evaluating the CNN model trained with each of the first candidate loss functions on a validation data set related to the classification problem;

(d) obtaining individual fitness values for each of the first candidate loss functions from the evaluation in (c);

(iv) ranking each of the first candidate loss functions in accordance with individual fitness values, wherein each of the first candidate loss functions includes a different set of candidate loss function parameters;

(v) repeating steps (ii) to (iv) for multiple generations to optimize the loss function for training the CNN model solving the classification problem, including replacing the initial mean solution vector with a new mean solution vector derived from a ranked first candidate loss function in accordance with fitness value; and

(vi) selecting an optimized loss function for training the CNN model solving the classification problem at a predetermined selection point,

wherein the selected optimized loss function trains the CNN model to solve the classification problem in fewer steps and with higher accuracy than a process for training the CNN model using cross-entropy loss or BaikalCMA loss function.
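
The generational loop of steps (i) to (vi) can be illustrated with a minimal sketch. It is not the claimed implementation: it assumes a simplified evolution strategy with fixed-width isotropic Gaussian sampling rather than any particular multi-dimensional continuous value optimization process, and it replaces the train-and-validate evaluation of step (iii) with a caller-supplied function. The names evolve_loss_parameters, fitness_fn, toy_fitness, and all default settings are hypothetical and chosen only for illustration.

```python
import random


def evolve_loss_parameters(fitness_fn, initial_mean, generations=20,
                           population=12, parents=4, sigma=0.5, seed=0):
    """Simplified evolution strategy over loss-function parameter vectors.

    fitness_fn stands in for step (iii): in the claimed process it would
    build a loss from the parameters, partially train the CNN with it, and
    return a validation-based fitness. Here it is any callable that maps a
    parameter list to a float, where higher is better.
    """
    rng = random.Random(seed)
    mean = list(initial_mean)                  # step (i): initial mean solution vector
    best_params, best_fitness = list(mean), fitness_fn(mean)
    history = [best_fitness]                   # best fitness seen so far, per generation

    for _ in range(generations):               # step (v): repeat for multiple generations
        # step (ii): sample candidate parameter vectors around the current mean
        candidates = [[m + sigma * rng.gauss(0.0, 1.0) for m in mean]
                      for _ in range(population)]
        # step (iii): obtain a fitness value for each candidate
        scored = [(fitness_fn(c), c) for c in candidates]
        # step (iv): rank candidates by fitness, best first
        scored.sort(key=lambda pair: pair[0], reverse=True)
        # step (v): derive the new mean from the top-ranked candidates
        top = [c for _, c in scored[:parents]]
        mean = [sum(column) / len(top) for column in zip(*top)]
        # keep the best candidate seen in any generation
        if scored[0][0] > best_fitness:
            best_fitness, best_params = scored[0]
        history.append(best_fitness)

    # step (vi): select the best parameters found at the stopping point
    return best_params, best_fitness, history


def toy_fitness(params):
    """Hypothetical stand-in for validation fitness: closeness to a fixed target."""
    target = [1.0, -2.0, 0.5]
    return -sum((p - t) ** 2 for p, t in zip(params, target))


if __name__ == "__main__":
    best, fitness, history = evolve_loss_parameters(toy_fitness, [0.0, 0.0, 0.0])
    print("best parameters:", best)
    print("best fitness:", fitness)
```

In a real setting, fitness_fn would be the expensive part, since each call involves at least partial training of the model, so the population size and number of generations would be chosen with that cost in mind.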