US 12,333,254 B2
Systems and methods relating to knowledge distillation in natural language processing models
Pavan Buduguppa, Hyderabad (IN); Ramasubramanian Sundaram, Hyderabad (IN); and Veera Raghavendra Elluru, Hyderabad (IN)
Assigned to Genesys Cloud Services, Inc., Menlo Park, CA (US)
Filed by GENESYS CLOUD SERVICES, INC., Daly City, CA (US)
Filed on Dec. 21, 2021, as Appl. No. 17/557,245.
Prior Publication US 2023/0196024 A1, Jun. 22, 2023
Int. Cl. G06N 3/082 (2023.01); G06F 40/30 (2020.01); G06N 3/045 (2023.01); G06N 5/02 (2023.01)

CPC G06F 40/30 (2020.01) [G06N 3/045 (2023.01); G06N 5/02 (2013.01); G06N 3/082 (2013.01)]

18 Claims

1. A method for creating a student model from a teacher model for use in knowledge distillation, the method comprising:

providing the teacher model, wherein:

the teacher model comprises a neural network having a plurality of layers; and

the teacher model is trained on a first training dataset;

generating candidate student models, each of the candidate student models comprising a model having a unique permutation of layers derived by randomly selecting one or more layers of the plurality of layers of the teacher model for removing;

generating a second training dataset, the second training dataset comprising randomly selected data from the first training dataset;

for each of the candidate student models:

providing the second training dataset as inputs to the candidate student model;

recording outputs generated by the candidate student model from the second training dataset; and

based on the recorded outputs, evaluating a performance of the candidate student model according to a predetermined model evaluation criterion;

determining which of the candidate student models performed best among the candidate student models based on the predetermined model evaluation criterion; and

identifying a preferred candidate student model as being the candidate student model that performed best.
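
For illustration only, the steps recited in claim 1 can be sketched in Python. This is a minimal toy under stated assumptions, not the patented implementation: the teacher's "layers" are plain scalar functions rather than neural network layers, the "predetermined model evaluation criterion" is assumed to be mean squared error against the dataset targets, and every name used (forward, generate_candidates, evaluate, select_student) is hypothetical.

```python
import random
from typing import Callable, Dict, List, Sequence, Tuple

Layer = Callable[[float], float]
Example = Tuple[float, float]  # (input, target)


def forward(layers: Sequence[Layer], kept: Sequence[int], x: float) -> float:
    """Apply only the kept layers, in their original teacher order."""
    for i in kept:
        x = layers[i](x)
    return x


def generate_candidates(num_layers: int, n_candidates: int,
                        rng: random.Random) -> List[Tuple[int, ...]]:
    """Build unique candidates by randomly choosing teacher layers to remove.

    Each candidate is the tuple of layer indices that remain. At least one
    layer is removed and at least one is kept, so there are at most
    2**num_layers - 2 distinct candidates.
    """
    target = min(n_candidates, 2 ** num_layers - 2)
    seen = set()
    while len(seen) < target:
        n_remove = rng.randint(1, num_layers - 1)
        removed = set(rng.sample(range(num_layers), n_remove))
        seen.add(tuple(i for i in range(num_layers) if i not in removed))
    return sorted(seen)


def evaluate(layers: Sequence[Layer], kept: Sequence[int],
             dataset: Sequence[Example]) -> float:
    """Record the candidate's outputs, then score them (assumed criterion: MSE)."""
    outputs = [forward(layers, kept, x) for x, _ in dataset]
    return sum((o - y) ** 2 for o, (_, y) in zip(outputs, dataset)) / len(dataset)


def select_student(layers: Sequence[Layer], first_dataset: Sequence[Example],
                   n_candidates: int, sample_size: int,
                   seed: int = 0) -> Tuple[Tuple[int, ...], Dict[Tuple[int, ...], float]]:
    """Return the best-scoring candidate and the scores of all candidates."""
    rng = random.Random(seed)
    candidates = generate_candidates(len(layers), n_candidates, rng)
    # Second training dataset: a random sample of the first training dataset.
    second_dataset = rng.sample(list(first_dataset), sample_size)
    scores = {kept: evaluate(layers, kept, second_dataset) for kept in candidates}
    best = min(scores, key=scores.get)  # lower MSE is better; ties broken arbitrarily
    return best, scores


# Toy teacher: two layers that matter and two identity layers that do not.
teacher = [lambda x: x + 1.0, lambda x: x * 2.0, lambda x: x, lambda x: x]
first_dataset = [(float(x), (float(x) + 1.0) * 2.0) for x in range(20)]

best, scores = select_student(teacher, first_dataset, n_candidates=14, sample_size=8)
print("preferred candidate keeps layers", best, "with score", scores[best])
```

In this toy setup the preferred candidate is one that drops only identity layers, since removing them leaves the outputs unchanged. A real system would replace the scalar functions with actual network layers and substitute whatever evaluation criterion has been fixed in advance.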