US 12,288,140 B2
Classifier assistance using domain-trained embedding
Soyoung Peraud, Redmond, WA (US); Alexandre Rochette, Montreal (CA); Gabriel Arien Desgarennes, Issaquah, WA (US); Niel Chah, Toronto (CA); Abhishek Kumar, Redmond, WA (US); and Timothy James Hazen, Arlington, MA (US)
Assigned to Microsoft Technology Licensing, LLC, Redmond, WA (US)
Filed by Microsoft Technology Licensing, LLC, Redmond, WA (US)
Filed on Jan. 15, 2021, as Appl. No. 17/150,524.
Prior Publication US 2022/0230089 A1, Jul. 21, 2022
Int. Cl. G06N 3/09 (2023.01); G06F 18/22 (2023.01); G06F 18/23 (2023.01); G06F 18/2411 (2023.01); G06F 18/2431 (2023.01); G06N 3/084 (2023.01); G06N 20/00 (2019.01)
CPC G06N 20/00 (2019.01) [G06F 18/22 (2023.01); G06F 18/23 (2023.01); G06F 18/2411 (2023.01); G06F 18/2431 (2023.01); G06N 3/084 (2013.01); G06N 3/09 (2023.01)] 20 Claims
 
12. A method for training a classifier, comprising:
receiving embeddings produced from a natural language model;
receiving raw datasets and seed datasets; and
repeating one or more iterations of labeling epochs until a first condition is met, the labeling epochs including:
generating raw vectors corresponding to the raw datasets and seed vectors corresponding to the seed datasets, based on the embeddings;
assigning pseudo class labels to the raw datasets based on distances between the raw vectors and the seed vectors; and
repeating one or more iterations of classification epochs until a second condition is met, the classification epochs including:
updating the embeddings by performing classification tasks using the seed vectors and the raw vectors corresponding to the pseudo class labels which are assumed as ground-truth labels.
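The nested labeling and classification epochs of claim 12 can be illustrated with a minimal, hypothetical Python sketch. The claim does not specify how vectors are derived from the embeddings, which distance is used, what classifier performs the classification tasks, or what the first and second conditions are. This sketch therefore assumes the following: each dataset vector is the average of its token embeddings; pseudo class labels come from the nearest seed vector by Euclidean distance; the classification task is softmax regression whose gradients are also applied to the embeddings; the first condition is "pseudo labels stop changing"; and the second condition is "classification loss stops decreasing". The function names (`mean_vector`, `pseudo_label`, `classification_epoch`, `train`) are illustrative only and do not come from the patent.

```python
import math
import random


def mean_vector(tokens, emb):
    """Vector for one dataset: the average of its known token embeddings (assumed)."""
    dim = len(next(iter(emb.values())))
    known = [emb[t] for t in tokens if t in emb]
    if not known:
        return [0.0] * dim
    return [sum(col) / len(known) for col in zip(*known)]


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def pseudo_label(raw_vecs, seed_vecs, seed_labels):
    """Give each raw vector the class of its nearest seed vector."""
    labels = []
    for r in raw_vecs:
        nearest = min(range(len(seed_vecs)), key=lambda j: euclidean(r, seed_vecs[j]))
        labels.append(seed_labels[nearest])
    return labels


def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def classification_epoch(samples, emb, W, lr):
    """One SGD pass of softmax regression that also updates the embeddings in place."""
    total_loss = 0.0
    for tokens, label in samples:
        known = [t for t in tokens if t in emb]
        if not known:
            continue
        x = mean_vector(known, emb)
        p = softmax([sum(w * xi for w, xi in zip(row, x)) for row in W])
        total_loss -= math.log(p[label] + 1e-12)
        # Cross-entropy gradient w.r.t. the logits: p_k - 1[k == label].
        g = [pk - (1.0 if k == label else 0.0) for k, pk in enumerate(p)]
        # Gradient w.r.t. the input vector, W^T g, taken before W is updated.
        gx = [sum(g[k] * W[k][d] for k in range(len(W))) for d in range(len(x))]
        for k in range(len(W)):
            for d in range(len(x)):
                W[k][d] -= lr * g[k] * x[d]
        # x is a mean, so each token occurrence receives 1/len(known) of gx.
        for t in known:
            vec = emb[t]
            for d in range(len(vec)):
                vec[d] -= lr * gx[d] / len(known)
    return total_loss


def train(emb, raw, seeds, n_classes, max_label_epochs=5, max_cls_epochs=20,
          lr=0.1, rng_seed=0):
    """raw: list of token lists; seeds: list of (token list, class index) pairs."""
    emb = {k: list(v) for k, v in emb.items()}  # work on a copy of the embeddings
    rng = random.Random(rng_seed)
    dim = len(next(iter(emb.values())))
    W = [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(n_classes)]
    seed_labels = [y for _, y in seeds]
    labels = None
    for _ in range(max_label_epochs):  # labeling epochs
        raw_vecs = [mean_vector(t, emb) for t in raw]
        seed_vecs = [mean_vector(t, emb) for t, _ in seeds]
        new_labels = pseudo_label(raw_vecs, seed_vecs, seed_labels)
        if new_labels == labels:  # assumed first condition: labels are stable
            break
        labels = new_labels
        # Pseudo labels are treated as ground truth alongside the seed labels.
        samples = list(seeds) + list(zip(raw, labels))
        prev_loss = float("inf")
        for _ in range(max_cls_epochs):  # classification epochs
            loss = classification_epoch(samples, emb, W, lr)
            if prev_loss - loss < 1e-4:  # assumed second condition: loss stops decreasing
                break
            prev_loss = loss
    return emb, W, labels


if __name__ == "__main__":
    toy_emb = {"cat": [1.0, 0.0], "dog": [0.9, 0.1], "kitten": [0.8, 0.2],
               "car": [0.0, 1.0], "truck": [0.1, 0.9], "bus": [0.2, 0.8]}
    toy_seeds = [(["cat"], 0), (["car"], 1)]
    toy_raw = [["dog", "kitten"], ["truck", "bus"]]
    _, _, toy_labels = train(toy_emb, toy_raw, toy_seeds, n_classes=2)
    print(toy_labels)  # [0, 1]
```

The sketch uses only the standard library so that the nesting is easy to follow. The outer loop re-derives raw and seed vectors from the current embeddings, so each labeling epoch sees the embeddings as updated by the preceding classification epochs. A practical system would more likely use a neural network framework and a pretrained language model's embeddings.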