US 11,928,567 B2
System and method for improving machine learning models by detecting and removing inaccurate training data
Oren Elisha, Hertzeliya (IL); Ami Luttwak, Binyamina (IL); Hila Yehuda, Tel Aviv (IL); Adar Kahana, Natanya (IL); and Maya Bechler-Speicher, Tel Aviv (IL)
Assigned to MICROSOFT TECHNOLOGY LICENSING, LLC, Redmond, WA (US)
Filed by Microsoft Technology Licensing, LLC, Redmond, WA (US)
Filed on Mar. 17, 2023, as Appl. No. 18/185,679.
Application 18/185,679 is a continuation of application No. 16/795,353, filed on Feb. 19, 2020, granted, now 11,636,389.
Prior Publication US 2023/0229973 A1, Jul. 20, 2023
This patent is subject to a terminal disclaimer.
Int. Cl. H04L 9/40 (2022.01); G06N 5/04 (2023.01); G06N 20/00 (2019.01)
CPC G06N 20/00 (2019.01) [G06N 5/04 (2013.01)] 20 Claims
1. A method, comprising:
receiving a training set comprising training samples associated with a first category to train a first machine learning (ML) model;
generating a category score for the first category based on a total variance of the training samples associated with the first category;
identifying the first category as a suspect category subject to removal in response to the category score indicating excessive variance of the training samples associated with the first category relative to a threshold variance score; and
improving prediction accuracy of the first ML model by revising the training set by at least one of
refining the training samples associated with the first category, or
eliminating the training samples of the first category from the training set.
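The scoring and removal steps of claim 1 can be illustrated with a short sketch. It rests on several assumptions that the claim does not state. Each training sample is represented as a hypothetical `(feature_vector, category_label)` pair. "Total variance" is read as the sum of the per-feature population variances of a category's samples, which is the trace of that category's covariance matrix. The category score is that total variance, and the threshold value is illustrative. Of the two revision options, the sketch implements only elimination. The names `category_scores`, `suspect_categories`, and `eliminate_suspects` are hypothetical and do not come from the patent.

```python
from collections import defaultdict
from statistics import pvariance
from typing import Dict, List, Sequence, Tuple

# Hypothetical representation of a training sample: (feature vector, category label).
Sample = Tuple[Sequence[float], str]


def category_scores(samples: List[Sample]) -> Dict[str, float]:
    """Score each category by the total variance of its samples.

    Assumption: total variance = sum of per-feature population variances
    (the trace of the category's covariance matrix).
    """
    by_category: Dict[str, List[Sequence[float]]] = defaultdict(list)
    for features, category in samples:
        by_category[category].append(features)

    scores: Dict[str, float] = {}
    for category, vectors in by_category.items():
        # zip(*vectors) yields one feature column at a time across the category's samples.
        scores[category] = sum(pvariance(column) for column in zip(*vectors))
    return scores


def suspect_categories(scores: Dict[str, float], threshold: float) -> List[str]:
    """Flag categories whose score exceeds the (assumed) threshold variance score."""
    return sorted(c for c, s in scores.items() if s > threshold)


def eliminate_suspects(samples: List[Sample], threshold: float) -> List[Sample]:
    """Revise the training set by dropping every sample of a suspect category."""
    suspects = set(suspect_categories(category_scores(samples), threshold))
    return [sample for sample in samples if sample[1] not in suspects]


if __name__ == "__main__":
    # Illustrative data: "cat" samples are tightly clustered, "dog" samples are widely spread.
    data: List[Sample] = [
        ([0.0, 0.0], "cat"),
        ([0.2, 0.0], "cat"),
        ([0.0, 0.2], "cat"),
        ([-5.0, 4.0], "dog"),
        ([5.0, -4.0], "dog"),
    ]
    print(suspect_categories(category_scores(data), threshold=1.0))  # prints ['dog']
    print(len(eliminate_suspects(data, threshold=1.0)))  # prints 3
```

In this example the "dog" category scores 25 + 16 = 41, well above the assumed threshold of 1.0, so all of its samples are removed from the training set before the model is retrained. The "refining" alternative in the claim, such as relabeling or cleaning individual samples, would replace the filtering step in `eliminate_suspects` and is not shown here.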