US 11,720,808 B2
Feature removal framework to streamline machine learning
Yen-Jung Chang, Santa Clara, CA (US); Yunsong Meng, Cupertino, CA (US); Tie Wang, Cupertino, CA (US); Yang Yang, Fremont, CA (US); Bo Long, Palo Alto, CA (US); Boyi Chen, Santa Clara, CA (US); Yanbin Jiang, Sunnyvale, CA (US); and Zheng Li, San Jose, CA (US)
Assigned to Microsoft Technology Licensing, LLC, Redmond, WA (US)
Filed by Microsoft Technology Licensing, LLC, Redmond, WA (US)
Filed on May 28, 2020, as Appl. No. 16/886,316.
Prior Publication US 2021/0374562 A1, Dec. 2, 2021
Int. Cl. G06N 5/04 (2023.01); G06N 20/00 (2019.01)
CPC G06N 5/04 (2013.01) [G06N 20/00 (2019.01)]
20 Claims
 
1. A method, comprising:
creating a baseline version of a machine learning model by training the machine learning model with a first set of feature values for a first set of features;
deploying the baseline version in an environment, wherein the deployed baseline version outputs a first set of entity rankings;
creating a second set of feature values for the first set of features by replacing at least one first value of the first set of feature values with at least one second value different from the at least one first value, wherein the at least one second value is configured to minimize a contribution of the first set of features to output of the machine learning model;
generating a second set of entity rankings by applying the baseline version to the second set of feature values;
computing a similarity score between the first set of entity rankings and the second set of entity rankings;
identifying a set of high-importance features based on the similarity score, wherein the set of high-importance features comprises a subset of the first set of features;
creating a simplified version of the machine learning model by training the machine learning model with the set of high-importance features instead of the first set of features;
determining a resource overhead for the simplified version by executing the simplified version in the environment or in a replica of the environment; and
in response to the resource overhead satisfying a target resource overhead, replacing at least a portion of the deployed baseline version with the simplified version in the environment, wherein the simplified version outputs a second set of entity rankings different from the first set of entity rankings.
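By way of non-limiting illustration only, the perturbation, ranking-comparison, and feature-selection steps recited above may be sketched as follows. The sketch is not part of the claim and makes several assumptions that the claim does not specify: the model is a fixed linear scorer standing in for a trained baseline version, the replacement value is a default of 0.0, the similarity score is a Kendall rank correlation, features are perturbed one at a time, and a feature is treated as high-importance when the similarity falls below a hypothetical threshold of 0.9. All function names, weights, and data are hypothetical. Retraining the simplified version and measuring resource overhead are omitted.

```python
from itertools import combinations


def rank_entities(weights, rows):
    """Return entity indices ordered by descending linear score."""
    scores = [sum(w * x for w, x in zip(weights, row)) for row in rows]
    return sorted(range(len(rows)), key=lambda i: (-scores[i], i))


def kendall_tau(order_a, order_b):
    """Kendall rank correlation between two orderings of the same entities."""
    pos_a = {entity: i for i, entity in enumerate(order_a)}
    pos_b = {entity: i for i, entity in enumerate(order_b)}
    concordant = discordant = 0
    for x, y in combinations(order_a, 2):
        # A pair is concordant when both orderings agree on which comes first.
        if (pos_a[x] - pos_a[y]) * (pos_b[x] - pos_b[y]) > 0:
            concordant += 1
        else:
            discordant += 1
    total = concordant + discordant
    return (concordant - discordant) / total if total else 1.0


def neutralize(rows, feature_idx, default=0.0):
    """Copy the feature values, replacing one feature with a default value
    so that it contributes nothing to the linear score (an assumption)."""
    return [
        [default if j == feature_idx else value for j, value in enumerate(row)]
        for row in rows
    ]


def high_importance_features(weights, rows, threshold=0.9):
    """Keep features whose neutralization changes the ranking noticeably,
    i.e. whose similarity to the baseline ranking is below the threshold."""
    baseline_ranking = rank_entities(weights, rows)
    keep = []
    for j in range(len(weights)):
        perturbed_ranking = rank_entities(weights, neutralize(rows, j))
        if kendall_tau(baseline_ranking, perturbed_ranking) < threshold:
            keep.append(j)
    return keep


# Hypothetical baseline weights and feature values for four entities.
WEIGHTS = [2.0, 0.01, 1.0]
ROWS = [
    [1.0, 5.0, 0.2],
    [0.5, 1.0, 0.9],
    [0.1, 9.0, 0.4],
    [0.8, 2.0, 0.1],
]

selected = high_importance_features(WEIGHTS, ROWS)
```

With the toy data above, neutralizing the second feature leaves the ranking unchanged, so only the first and third features (indices 0 and 2) are retained. In a fuller embodiment, the retained features would then be used to train the simplified version, and the measured resource overhead would be compared against the target before any replacement in the environment.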