US 11,727,274 B1
Deep learning model training system
Jarad Forristal, Marfa, TX (US); Joshua David Griffin, Harrisburg, NC (US); Seyedalireza Yektamaram, Rotterdam (NL); and Wenwen Zhou, Cary, NC (US)
Assigned to SAS Institute Inc., Cary, NC (US)
Filed by SAS Institute Inc., Cary, NC (US)
Filed on Aug. 17, 2022, as Appl. No. 17/820,342.
Claims priority of provisional application 63/344,501, filed on May 20, 2022.
Claims priority of provisional application 63/331,751, filed on Apr. 15, 2022.
Int. Cl. G06N 3/08 (2023.01); G06F 18/211 (2023.01)
CPC G06N 3/08 (2013.01) [G06F 18/211 (2023.01)]
30 Claims
OG exemplary drawing
 
1. A non-transitory computer-readable medium having stored thereon computer-readable instructions that when executed by a computing device cause the computing device to:
(A) select a batch of observation vectors, wherein the batch of observation vectors includes a mini-batch number of observation vectors selected from a plurality of observation vectors, wherein each observation vector of the plurality of observation vectors includes a value for each variable of a plurality of variables;
(B) execute a neural network with a weight vector to compute a gradient vector using the selected batch of observation vectors, wherein the neural network includes a layer type for each layer of a plurality of neural network layers, wherein the weight vector includes a weight value defined for each neuron of the neural network, wherein the gradient vector includes a gradient value defined for each neuron of the neural network;
(C) compute a vector of eigenvalues from a Hessian approximation matrix;
(D) compute a regularization parameter value using the computed gradient vector, the computed vector of eigenvalues, and a step-size value;
(E) compute a search direction vector using the computed vector of eigenvalues, the computed gradient vector, the Hessian approximation matrix, and the computed regularization parameter value;
(F) compute a reduction ratio value as a ratio of a decrease in an objective function value from the execution of the neural network to a predicted decrease in the objective function value from the execution of the neural network;
(G) compute an updated weight vector for each neuron of the neural network from the weight vector, a predefined learning rate value, and either the computed search direction vector or the computed gradient vector based on the computed reduction ratio value;
(H) compute an updated Hessian approximation matrix from the Hessian approximation matrix, the predefined learning rate value, and either the computed search direction vector or the computed gradient vector based on the computed reduction ratio value, wherein, when ρ≥η1, the updated Hessian approximation matrix is computed using the computed search direction vector and a predefined first learning rate value, and when ρ<η1, the updated Hessian approximation matrix is computed using the computed gradient vector and a predefined second learning rate value, where ρ indicates the computed reduction ratio value, and η1 indicates a predefined step success value;
(I) update the step-size value using the computed search direction vector;
(J) repeat (A) through (I) until a stop criterion is satisfied, wherein, on a next iteration of (A) through (I), the step-size value is the updated step-size value, and the weight vector is the computed, updated weight vector; and
output the computed, updated weight vector to describe a trained neural network model.
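The iteration in (A) through (J) reads as a regularized quasi-Newton scheme with a trust-region-style acceptance test. The NumPy sketch below is illustrative only, not the patented implementation: the toy linear least-squares "network," the BFGS-style Hessian-approximation update, the particular regularization formula σ = max(0, −λ_min) + ‖g‖/δ, and all constants (η1, the two learning rates, batch size, iteration count) are assumptions chosen to make the steps concrete.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for the neural network of the claim: a linear
# least-squares model, so each "weight vector" is w and the
# "observation vectors" are rows of A paired with targets b.
A = rng.normal(size=(200, 8))
w_true = rng.normal(size=8)
b = A @ w_true + 0.1 * rng.normal(size=200)
n, d = A.shape

def loss(w, idx):
    r = A[idx] @ w - b[idx]
    return 0.5 * float(np.mean(r * r))

def grad(w, idx):
    r = A[idx] @ w - b[idx]
    return A[idx].T @ r / len(idx)

w = np.zeros(d)
B = np.eye(d)               # Hessian approximation matrix
delta = 1.0                 # step-size value (trust-region-like radius)
eta1 = 0.1                  # predefined step success value (assumed)
alpha1, alpha2 = 1.0, 0.05  # first/second learning rates (assumed)

for it in range(50):
    # (A) select a mini-batch of observation vectors
    idx = rng.choice(n, size=32, replace=False)
    # (B) compute the gradient on the batch
    g = grad(w, idx)
    # (C) eigendecomposition of the Hessian approximation
    lam, Q = np.linalg.eigh(B)
    # (D) regularization parameter from gradient, eigenvalues, step size
    sigma = max(0.0, -float(lam.min())) + float(np.linalg.norm(g)) / delta
    # (E) search direction: p = -(B + sigma*I)^{-1} g via the eigenbasis
    p = -Q @ ((Q.T @ g) / (lam + sigma))
    # (F) reduction ratio: actual decrease over predicted decrease
    pred = -(g @ p + 0.5 * p @ B @ p)
    rho = (loss(w, idx) - loss(w + p, idx)) / max(pred, 1e-12)
    # (G)/(I) accept the search direction on success, else fall back
    # to the gradient; grow or shrink the step size accordingly
    if rho >= eta1:
        step = alpha1 * p
        delta *= 2.0
    else:
        step = -alpha2 * g
        delta *= 0.5
    w_new = w + step
    # (H) BFGS-style update of B using the step actually taken,
    # guarded by the curvature condition so B stays positive definite
    s = step
    y = grad(w_new, idx) - g
    if s @ y > 1e-10:
        Bs = B @ s
        B = B + np.outer(y, y) / (y @ s) - np.outer(Bs, Bs) / (s @ Bs)
    w = w_new

# (J) stop criterion here is simply the iteration cap; output w
print(loss(w, np.arange(n)))
```

Because B + σI is positive definite by construction, the search direction is a descent direction and the predicted decrease is positive, so the ratio test in (F) is well defined; the fallback gradient step with a smaller learning rate mirrors the ρ < η1 branch of (G) and (H).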