US 12,468,931 B2
Configuring a neural network using smoothing splines
Ulrich Alfons Finkler, Mahopac, NY (US); Michele Merler, New York City, NY (US); Mayoore Selvarasa Jaiswal, Austin, TX (US); Hui Wu, White Plains, NC (US); Rameswar Panda, Medford, MA (US); and Wei Zhang, Elmsford, NY (US)
Assigned to INTERNATIONAL BUSINESS MACHINES CORPORATION, Armonk, NY (US)
Filed by International Business Machines Corporation, Armonk, NY (US)
Filed on Oct. 21, 2020, as Appl. No. 17/075,963.
Prior Publication US 2022/0121924 A1, Apr. 21, 2022
Int. Cl. G06F 18/211 (2023.01); G06F 18/21 (2023.01); G06N 3/08 (2023.01); G06N 7/01 (2023.01)
CPC G06N 3/08 (2013.01) [G06F 18/211 (2023.01); G06F 18/217 (2023.01); G06N 7/01 (2023.01)]
13 Claims
 
1. A computer-implemented method comprising:
identifying a first set of support points comprising an initial plurality of sets of hyperparameter values at which to evaluate an objective function having coefficients that relate hyperparameter values of a neural network to respective performance values, wherein the respective performance values are representative of an error rate;
wherein the hyperparameter values comprise at least one of a dropout rate, a weight norm, a hidden layer size, a convolutional kernel size, a pooling size;
wherein the neural network is a convolutional neural network;
executing initial training processes on the neural network with hyperparameters set to the initial plurality of sets of hyperparameter values such that the initial training processes provide an initial set of the performance values, one for each set of hyperparameter values, for the objective function;
generating an approximation of the objective function using splines that pass through the first set of support points at selected performance values;
calculating a coefficient for the approximation of the objective function using QR decomposition, wherein the QR decomposition used for calculating the coefficient for the approximation of the objective function comprises a householder QR decomposition;
approximating a point at which the approximation of the objective function reaches a maximum value using a hierarchical Monte Carlo technique;
determining an updated set of support points comprising an updated set of hyperparameter values associated with the maximum value, wherein the updated set of hyperparameter values comprises a value of a network subcell layout parameter; and
executing an updated training process on the neural network with hyperparameters set to the updated set of hyperparameter values.
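
The steps recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation, and everything beyond the claim language is an assumption made for illustration: a single hyperparameter (a dropout rate), a hypothetical toy_performance function standing in for an actual training run of the convolutional neural network, a cubic truncated-power spline basis, and one plausible reading of "hierarchical Monte Carlo" as uniform random sampling over a search interval that is repeatedly narrowed around the best sample. All function and variable names are hypothetical.

```python
import math
import random

def toy_performance(dropout):
    """Hypothetical stand-in for a training run; returns 1 - error rate."""
    return 0.9 - (dropout - 0.3) ** 2

def spline_basis(x, knots):
    """Cubic truncated-power basis: 1, x, x^2, x^3, then (x - k)^3_+ per knot."""
    return [1.0, x, x * x, x ** 3] + [max(x - k, 0.0) ** 3 for k in knots]

def householder_qr_solve(A, b):
    """Least-squares solve of A c = b by Householder QR (assumes full column rank)."""
    m, n = len(A), len(A[0])
    R = [row[:] for row in A]
    y = b[:]
    for k in range(n):
        # Reflector v maps column k (rows k..m-1) onto a multiple of e_k.
        norm = math.sqrt(sum(R[i][k] ** 2 for i in range(k, m)))
        alpha = -norm if R[k][k] >= 0 else norm
        v = [0.0] * m
        for i in range(k, m):
            v[i] = R[i][k]
        v[k] -= alpha
        vnorm2 = sum(v[i] * v[i] for i in range(k, m))
        # Apply H = I - 2 v v^T / (v^T v) to the remaining columns and to y.
        for j in range(k, n):
            s = 2.0 * sum(v[i] * R[i][j] for i in range(k, m)) / vnorm2
            for i in range(k, m):
                R[i][j] -= s * v[i]
        s = 2.0 * sum(v[i] * y[i] for i in range(k, m)) / vnorm2
        for i in range(k, m):
            y[i] -= s * v[i]
    # Back substitution on the upper-triangular factor.
    c = [0.0] * n
    for k in reversed(range(n)):
        c[k] = (y[k] - sum(R[k][j] * c[j] for j in range(k + 1, n))) / R[k][k]
    return c

def hierarchical_mc_max(f, lo, hi, levels=6, samples=200, seed=0):
    """Sample uniformly, then halve the search interval around the best point."""
    rng = random.Random(seed)
    lo0, hi0 = lo, hi
    best_x = 0.5 * (lo + hi)
    best_v = f(best_x)
    for _ in range(levels):
        for _ in range(samples):
            x = rng.uniform(lo, hi)
            v = f(x)
            if v > best_v:
                best_x, best_v = x, v
        quarter = 0.25 * (hi - lo)
        lo, hi = max(lo0, best_x - quarter), min(hi0, best_x + quarter)
    return best_x, best_v

# First set of support points and their performance values (initial training).
support = [0.0, 0.1, 0.25, 0.45, 0.6, 0.8]
values = [toy_performance(x) for x in support]

# Spline approximation through the support points; coefficients via Householder QR.
knots = support[2:-2]  # two interior knots give a square 6 x 6 system
A = [spline_basis(x, knots) for x in support]
coef = householder_qr_solve(A, values)

def approx(x):
    """Evaluate the fitted spline approximation of the objective function."""
    return sum(c * phi for c, phi in zip(coef, spline_basis(x, knots)))

# Approximate the maximizer, then add it as an updated support point.
best_x, best_val = hierarchical_mc_max(approx, support[0], support[-1])
updated_support = sorted(support + [best_x])
updated_values = [toy_performance(x) for x in updated_support]  # updated training
```

In this sketch the loop from fitting to updated training runs once; in practice the fit, the maximum search, and the retraining would presumably be repeated with the enlarged set of support points until the performance value stops improving.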