| CPC G06N 3/08 (2013.01) [G06N 3/045 (2023.01)] | 20 Claims |

1. A method for tuning one or more hyperparameters of a large neural network model, wherein the large neural network model comprises an infinitely-wide neural network model, the method comprising:
receiving the large neural network model;
parameterizing the large neural network model according to a parameterization scheme, wherein, when a transformer is used in the large neural network model, the parameterization scheme comprises a dot-product attention logit scaler hyperparameter;
reducing a width of at least one layer of the large neural network model, resulting in a smaller neural network model, the smaller neural network model comprising at least one reduced-width layer of the infinitely-wide neural network model;
performing a hyperparameter tuning process using the smaller neural network model to identify a tuned hyperparameter, wherein a model scaling process is based on an estimated amount of computational resources, energy, and/or time used to tune the hyperparameter, and wherein the hyperparameter is tuned using a logit scaling parameter;
identifying an optimized tuple of hyperparameters based on the tuning process;
adjusting the tuning process based on the optimized tuple of hyperparameters, wherein the optimized tuple minimizes a predetermined loss function;
returning the smaller neural network model based on the adjusted tuning process; and
transferring the tuned hyperparameter to the large neural network model using an identified scaling factor.
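Claim 1 tracks a width-invariant ("muP"-style) hyperparameter-transfer workflow: parameterize the model so that hyperparameter optima are stable in width, tune on a reduced-width proxy, then map the tuned values to the full-width model. The Python sketch below is purely illustrative; the chosen widths, the 1/head_dim attention logit scaling, the synthetic proxy_loss, and the inverse-width learning-rate transfer rule are all assumptions for the example, not the claimed implementation.

```python
import math

# Widths are illustrative; the proxy keeps every other architectural detail
# of the large model and shrinks only layer width (the claim's "reducing a
# width of at least one layer").
BASE_WIDTH = 256       # reduced width used by the smaller proxy model
TARGET_WIDTH = 8192    # width of the large target model

def attention_logits(q, k, head_dim, alpha=1.0):
    # Dot-product attention logits with a tunable scaler `alpha` (the claim's
    # "dot-product attention logit scaler hyperparameter"). Dividing by
    # head_dim, rather than the conventional sqrt(head_dim), is one way to
    # keep the logit scale stable as width grows so a tuned `alpha` transfers.
    return [[alpha * sum(qi * ki for qi, ki in zip(qr, kr)) / head_dim
             for kr in k] for qr in q]

def proxy_loss(lr, alpha):
    # Placeholder for training the reduced-width proxy and returning its
    # validation loss; a real tuning process would launch short training jobs
    # here, budgeted by the estimated compute, energy, and/or time available.
    return (math.log2(lr) + 7) ** 2 + (alpha - 1.0) ** 2  # synthetic objective

def transfer(tuned_lr, tuned_alpha, base_width, target_width):
    # Assumed transfer rules: scale hidden-layer learning rates inversely with
    # the width multiplier (the "identified scaling factor") so per-step update
    # sizes stay width-invariant; the logit scaler is already width-invariant
    # under the 1/head_dim convention and transfers unchanged.
    width_mult = target_width / base_width
    return tuned_lr / width_mult, tuned_alpha

# Grid search over the proxy stands in for the tuning process; the winning
# tuple is the claim's "optimized tuple of hyperparameters" minimizing the
# (here synthetic) loss function.
grid = [(2.0 ** -k, a) for k in range(3, 12) for a in (0.5, 1.0, 2.0)]
tuned_lr, tuned_alpha = min(grid, key=lambda t: proxy_loss(*t))

# Transfer the tuned tuple to the large model via the width scaling factor.
large_lr, large_alpha = transfer(tuned_lr, tuned_alpha, BASE_WIDTH, TARGET_WIDTH)
print(f"tuned on proxy: lr={tuned_lr}, alpha={tuned_alpha}; "
      f"transferred: lr={large_lr}, alpha={large_alpha}")
```

In this sketch the proxy is 32x narrower than the target, so the tuned learning rate is divided by 32 at transfer while the logit scaler carries over as-is; only the cheap proxy model is ever trained during tuning.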