US 11,941,523 B2
Stochastic gradient boosting for deep neural networks
Oluwatobi Olabiyi, Arlington, VA (US); Erik T. Mueller, Chevy Chase, MD (US); and Christopher Larson, Washington, DC (US)
Assigned to Capital One Services, LLC, McLean, VA (US)
Filed by Capital One Services, LLC, McLean, VA (US)
Filed on Apr. 16, 2021, as Appl. No. 17/232,968.
Application 17/232,968 is a continuation of application No. 16/293,047, filed on Mar. 5, 2019, granted, now 10,990,878, issued on Apr. 27, 2021.
Application 16/293,047 is a continuation of application No. 16/276,306, filed on Feb. 14, 2019, granted, now 10,510,002, issued on Dec. 17, 2019.
Prior Publication US 2021/0232925 A1, Jul. 29, 2021
Int. Cl. G06N 3/08 (2023.01); G06N 3/047 (2023.01); G06N 20/00 (2019.01)
CPC G06N 3/08 (2013.01) [G06N 3/047 (2023.01); G06N 20/00 (2019.01)] 20 Claims
OG exemplary drawing
 
1. A computer-implemented method comprising:
initializing a deep neural network model to include an input layer, an output layer, a plurality of hidden layers, and a plurality of model parameters;
training, based on a training set that comprises a plurality of examples, the deep neural network model by
performing a plurality of first iterations that modifies the plurality of model parameters based on a first loss function, wherein the first loss function comprises a negative log-likelihood of the plurality of model parameters, wherein each respective first iteration of the plurality of first iterations is associated with a respective first mini-batch from the plurality of examples, wherein each respective first iteration of the plurality of first iterations is performed based on minimizing an average gradient of the first loss function for each example in the respective first mini-batch;
after performing the plurality of first iterations, performing a plurality of second iterations that adjusts the plurality of model parameters based on a second loss function that applies different weighting to examples based on whether the deep neural network classifies an example correctly or incorrectly, wherein the second loss function comprises a weighted negative log-likelihood of the plurality of model parameters, wherein each respective second iteration of the plurality of second iterations is associated with a respective second mini-batch from the plurality of examples, wherein each respective second iteration of the plurality of second iterations is performed based on minimizing an average gradient of the second loss function for each example in the respective second mini-batch and by giving zero weight to any example in the respective second mini-batch that is correctly classified by the deep neural network model,
wherein, for at least one second iteration of the plurality of second iterations, at least one example of the second mini-batch for the at least one second iteration is correctly classified by the deep neural network model, the at least one example is given zero weight based on being correctly classified, and the average gradient of the second loss function is determined based on a size of the second mini-batch that includes the at least one example, and
determining that the deep neural network model, after performing the plurality of first iterations and the plurality of second iterations, satisfies one or more stopping criteria; and
after the training, generating, based on an input data set, one or more predictions using the deep neural network model.
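The claimed two-phase training can be sketched in code. The following is a minimal, hypothetical illustration (not the patented implementation): a tiny NumPy multilayer perceptron trained first with ordinary mini-batch SGD on the negative log-likelihood (the "first iterations"), then with a weighted negative log-likelihood that gives zero weight to correctly classified examples while still dividing the gradient by the full mini-batch size (the "second iterations"). The network sizes, learning rate, toy data, and the choice of weight 1 for misclassified examples are all assumptions made for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy two-class training set: two Gaussian blobs (hypothetical stand-in
# for the "plurality of examples" in the claim).
X = np.vstack([rng.normal(-1, 1, (100, 2)), rng.normal(1, 1, (100, 2))])
y = np.array([0] * 100 + [1] * 100)

# Initialize model parameters: input layer -> two hidden layers -> output layer.
sizes = [2, 16, 16, 2]
W = [rng.normal(0, 0.5, (a, b)) for a, b in zip(sizes[:-1], sizes[1:])]
b = [np.zeros(n) for n in sizes[1:]]

def forward(x):
    """Return per-layer activations and softmax class probabilities."""
    acts = [x]
    h = x
    for Wi, bi in zip(W[:-1], b[:-1]):
        h = np.maximum(h @ Wi + bi, 0.0)          # ReLU hidden layers
        acts.append(h)
    logits = h @ W[-1] + b[-1]
    logits -= logits.max(axis=1, keepdims=True)   # numerically stable softmax
    p = np.exp(logits)
    p /= p.sum(axis=1, keepdims=True)
    return acts, p

def sgd_epoch(weight_fn, lr=0.1, batch=32):
    """One epoch of mini-batch SGD; weight_fn assigns a loss weight per example."""
    idx = rng.permutation(len(X))
    for s in range(0, len(X), batch):
        mb = idx[s:s + batch]
        acts, p = forward(X[mb])
        w = weight_fn(p, y[mb])                   # per-example loss weights
        # Gradient of the (weighted) NLL w.r.t. the logits, averaged over the
        # FULL mini-batch size -- zero-weight examples still count in the
        # denominator, as the claim requires.
        d = p.copy()
        d[np.arange(len(mb)), y[mb]] -= 1.0
        d *= w[:, None] / len(mb)
        for i in range(len(W) - 1, -1, -1):
            gW = acts[i].T @ d
            gb = d.sum(axis=0)
            if i > 0:                             # backprop through ReLU
                d = (d @ W[i].T) * (acts[i] > 0)
            W[i] -= lr * gW
            b[i] -= lr * gb

def accuracy():
    _, p = forward(X)
    return (p.argmax(axis=1) == y).mean()

# Phase 1 ("first iterations"): plain NLL -- every example weighted 1.
for _ in range(20):
    sgd_epoch(lambda p, t: np.ones(len(t)))

# Phase 2 ("second iterations"): boosted NLL -- correctly classified
# examples get zero weight, misclassified examples get weight 1 (assumed).
for _ in range(20):
    sgd_epoch(lambda p, t: (p.argmax(axis=1) != t).astype(float))

print(f"final training accuracy: {accuracy():.2f}")
```

In phase 2, examples the model already classifies correctly contribute nothing to the gradient sum, yet the division by the mini-batch size is unchanged, so the effective step size shrinks as fewer examples remain misclassified; this concentrates the remaining updates on the hard examples, in the spirit of gradient boosting. Stopping criteria (the claim's fixed epoch counts here) and the phase-2 weight value would vary by embodiment.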