| CPC G06F 40/30 (2020.01) [G06N 3/045 (2023.01); G06N 5/02 (2013.01); G06N 3/082 (2013.01)] | 18 Claims |

|
1. A method for creating a student model from a teacher model for use in knowledge distillation, the method comprising:
providing the teacher model, wherein:
the teacher model comprises a neural network having a plurality of layers; and
the teacher model is trained on a first training dataset;
generating candidate student models, each of the candidate student models comprising a model having a unique permutation of layers derived by randomly selecting one or more layers of the plurality of layers of the teacher model for removal;
generating a second training dataset, the second training dataset comprising randomly selected data from the first training dataset;
for each of the candidate student models:
providing the second training dataset as inputs to the candidate student model;
recording outputs generated by the candidate student model from the second training dataset; and
based on the recorded outputs, evaluating a performance of the candidate student model according to a predetermined model evaluation criterion;
determining which of the candidate student models performed best among the candidate student models based on the predetermined model evaluation criterion; and
identifying a preferred candidate student model as being the candidate student model that performed best.
|
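The steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation: the teacher is modeled as an ordered list of scalar functions, a candidate student is the tuple of teacher layer indices it keeps, and the "predetermined model evaluation criterion" is assumed here to be mean squared error against dataset labels (lower is better). All function names (`generate_candidates`, `sample_second_dataset`, `run_model`, `evaluate`, `select_student`) are hypothetical, and the sketch also assumes each candidate keeps at least one layer.

```python
import random
from typing import Callable, Dict, List, Sequence, Tuple

Layer = Callable[[float], float]
Example = Tuple[float, float]  # (input, label)


def generate_candidates(num_layers: int, num_candidates: int,
                        rng: random.Random) -> List[Tuple[int, ...]]:
    """Randomly remove one or more teacher layers; keep only unique results.

    Each candidate is the ordered tuple of teacher layer indices it retains.
    """
    # Assumption: at least one layer is kept, so there are 2^n - 2 possibilities.
    target = min(num_candidates, max(2 ** num_layers - 2, 0))
    seen = set()
    while len(seen) < target:
        n_remove = rng.randint(1, num_layers - 1)
        removed = set(rng.sample(range(num_layers), n_remove))
        seen.add(tuple(i for i in range(num_layers) if i not in removed))
    return sorted(seen)


def sample_second_dataset(first: Sequence[Example], size: int,
                          rng: random.Random) -> List[Example]:
    """Second training dataset: randomly selected data from the first."""
    return rng.sample(list(first), size)


def run_model(layers: Sequence[Layer], kept: Sequence[int], x: float) -> float:
    """Apply the retained layers in their original teacher order."""
    for i in kept:
        x = layers[i](x)
    return x


def evaluate(layers: Sequence[Layer], kept: Sequence[int],
             data: Sequence[Example]) -> float:
    """Record outputs on the dataset and score them (assumed criterion: MSE)."""
    outputs = [run_model(layers, kept, x) for x, _ in data]
    return sum((o - y) ** 2 for o, (_, y) in zip(outputs, data)) / len(data)


def select_student(layers: Sequence[Layer], first: Sequence[Example],
                   num_candidates: int, subset_size: int, seed: int = 0):
    """Return the best-scoring candidate and the scores of all candidates."""
    rng = random.Random(seed)
    candidates = generate_candidates(len(layers), num_candidates, rng)
    second = sample_second_dataset(first, subset_size, rng)
    scores: Dict[Tuple[int, ...], float] = {
        kept: evaluate(layers, kept, second) for kept in candidates
    }
    best = min(scores, key=scores.get)  # lower MSE is better
    return best, scores


# Toy teacher: the third layer is an identity, so removing it costs nothing.
teacher_layers: List[Layer] = [
    lambda x: x + 1.0,
    lambda x: x * 2.0,
    lambda x: x,
    lambda x: x - 3.0,
]
all_layers = range(len(teacher_layers))
first_dataset = [(float(x), run_model(teacher_layers, all_layers, float(x)))
                 for x in range(20)]
best_student, all_scores = select_student(
    teacher_layers, first_dataset, num_candidates=14, subset_size=8)
```

In this toy setup the labels are the teacher's own outputs, so the preferred candidate is the one that drops only the redundant identity layer. In a real system the layers would be neural network layers, and the criterion could instead combine accuracy with model size or latency; the claim leaves that choice open.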