US 12,067,462 B2
Model training framework
Eric Mcvoy Dodds, Berkeley, CA (US); and Huy Xuan Nguyen, Dublin, CA (US)
Assigned to Yahoo Assets LLC, New York, NY (US)
Filed by Oath Inc., New York, NY (US)
Filed on Aug. 15, 2019, as Appl. No. 16/541,751.
Prior Publication US 2021/0049500 A1, Feb. 18, 2021
Int. Cl. G06N 20/00 (2019.01); G06F 11/36 (2006.01); G06F 17/18 (2006.01)
CPC G06N 20/00 (2019.01) [G06F 11/3664 (2013.01); G06F 17/18 (2013.01)]
20 Claims
 
1. A method, comprising:
executing, on a processor of a computing device, instructions that cause the computing device to perform operations, the operations comprising:
receiving, by a model training framework, a definition of a first model of a first type and a configuration of the first model input by a user;
receiving, by the model training framework, a second definition of a second model of a second type and a second configuration of the second model input by the user;
setting up, by the model training framework and without receiving custom code for training the first model from the user, first computations that the first model will perform during training of the first model based upon the definition and the configuration input by the user;
setting up, by the model training framework and without receiving custom code for training the second model from the user, second computations that the second model will perform during training of the second model based upon the second definition and the second configuration input by the user;
specifying, by the model training framework, summary statistics to be tracked during the training of the first model;
inputting, by the model training framework, a batch of training data into the first model for processing using the first computations to train the first model based upon hyper parameters specified in the configuration of the first model, wherein the summary statistics are tracked during the training, wherein the first computations are spread by the model training framework across a plurality of processing units and outputs by the plurality of processing units are aggregated to determine an output of the first model during the training;
updating, by the model training framework, parameters of the first model based upon a function corresponding to accuracy of the first model processing the training data;
outputting the summary statistics, wherein the summary statistics comprise a training loss function as the function used to update parameters of the first model during the training, a value of regularization loss added to the parameters during training, and a learning rate;
performing a plurality of training iterations to train the first model using batches of the training data;
generating a first checkpoint during the plurality of training iterations, wherein the first checkpoint comprises progress of the first model being trained during a training iteration;
generating a second checkpoint based upon receiving an exit command during the training, wherein the second checkpoint comprises progress of the first model being trained, and wherein the second checkpoint is used to restart the training of the first model from the second checkpoint;
entering into a debug mode during the training based upon the configuration indicating that the debug mode is to be activated; and
saving a record of the first model, the configuration of the first model, checkpoints created during the training, a training batch size, an initial learning rate value, a decay learning rate, and parameters of the first model into at least one of a first structure having serialized machine readable format and a second structure having textual human readable format.
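
Illustrative sketch (not part of the claim, and not the patented implementation): the claimed flow of configuration-driven training, tracked summary statistics, periodic and exit-triggered checkpoints, restart from a checkpoint, and a saved record in both serialized and textual form might look roughly like the following. Everything here is an assumption made for illustration: the function `train`, the configuration keys, the one-parameter linear model, the `exit_at` argument standing in for an exit command, and the file names. Debug mode and the spreading of computations across processing units are omitted.

```python
import json
import pickle
from pathlib import Path


def train(config, batches, out_dir, exit_at=None, resume_from=None):
    """Hypothetical config-driven trainer for a 1-D linear model y = w * x."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    # Training progress: iteration count, model parameter, current learning rate.
    state = {"step": 0, "w": 0.0, "lr": config["initial_learning_rate"]}
    if resume_from is not None:
        # Restart training from a previously written checkpoint.
        with open(resume_from, "rb") as f:
            state = pickle.load(f)

    summaries, checkpoints = [], []

    def save_checkpoint(reason):
        path = out_dir / f"ckpt_{state['step']}.pkl"
        with open(path, "wb") as f:
            pickle.dump(state, f)
        checkpoints.append(
            {"step": state["step"], "reason": reason, "file": path.name}
        )

    for batch in batches:
        w, lr, n = state["w"], state["lr"], len(batch)
        # Summary statistics: training loss, regularization loss, learning rate.
        train_loss = sum((w * x - y) ** 2 for x, y in batch) / n
        reg_loss = config["l2"] * w * w
        grad = sum(2 * (w * x - y) * x for x, y in batch) / n
        grad += 2 * config["l2"] * w
        summaries.append({
            "step": state["step"] + 1,
            "training_loss": train_loss,
            "regularization_loss": reg_loss,
            "learning_rate": lr,
        })
        # Gradient step on the total loss, then decay the learning rate.
        state["w"] = w - lr * grad
        state["lr"] = lr * config["learning_rate_decay"]
        state["step"] += 1

        if state["step"] % config["checkpoint_every"] == 0:
            save_checkpoint("periodic")
        if exit_at is not None and state["step"] >= exit_at:
            # Stand-in for an exit command received during training.
            save_checkpoint("exit")
            break

    record = {
        "model": config["model"],
        "configuration": config,
        "checkpoints": checkpoints,
        "training_batch_size": config["batch_size"],
        "initial_learning_rate": config["initial_learning_rate"],
        "learning_rate_decay": config["learning_rate_decay"],
        "parameters": {"w": state["w"]},
        "summaries": summaries,
    }
    # Serialized machine-readable structure.
    with open(out_dir / "record.pkl", "wb") as f:
        pickle.dump(record, f)
    # Textual human-readable structure.
    with open(out_dir / "record.json", "w") as f:
        json.dump(record, f, indent=2)
    return record
```

In this sketch the user supplies only a configuration dictionary and batches of `(x, y)` pairs; no training code is written by the caller. Passing the path of the `exit` checkpoint as `resume_from` continues the run from the iteration at which it stopped.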