US 12,223,012 B2
Machine learning variable selection and root cause discovery by cumulative prediction
Richard Burch, McKinney, TX (US); Qing Zhu, Rowlett, TX (US); Jonathan Holt, Sachse, TX (US); and Tomonori Honda, Santa Clara, CA (US)
Assigned to PDF Solutions, Inc., Santa Clara, CA (US)
Filed by PDF Solutions, Inc., Santa Clara, CA (US)
Filed on Oct. 16, 2020, as Appl. No. 17/072,830.
Claims priority of provisional application 62/916,171, filed on Oct. 16, 2019.
Prior Publication US 2021/0117861 A1, Apr. 22, 2021
Int. Cl. G06N 20/00 (2019.01); G06F 17/18 (2006.01); G06F 18/2113 (2023.01); G06N 5/04 (2023.01)

CPC G06F 18/2113 (2023.01) [G06F 17/18 (2013.01); G06N 5/04 (2013.01); G06N 20/00 (2019.01)]

6 Claims

1. A computer-implemented method for predicting yield for semiconductor devices in a semiconductor process, wherein a plurality of process parameters are associated with the yield of semiconductor devices, comprising:

configuring and training a machine learning model to predict yield based on an input data set having a selected plurality of the process parameters;

selecting a first parameter of the plurality of process parameters for the input data set, providing the input data set to the machine learning model, predicting yield based on the input data set, and determining a first r-squared value for a first prediction by the machine learning model based on the input data set;

selecting a second parameter of the plurality of process parameters and adding the second parameter to the input data set, providing the input data set to the machine learning model, predicting yield based on the input data set, and determining a second r-squared value for a second prediction by the machine learning model based on the input data set;

repeating a step of selecting an additional parameter of the plurality of process parameters, adding the additional parameter to the input data set, providing the input data set to the machine learning model, predicting yield based on the input data set, and determining an additional r-squared value for another prediction by the machine learning model based on the input data set;

accumulating all determined r-squared values until the accumulation increases by less than a threshold value;

ranking the plurality of selected parameters on the basis of respective r-squared values; and

identifying as inputs to include or exclude from the machine learning model on the basis of the ranking of selected parameters.
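
The steps of claim 1 can be illustrated with a minimal Python sketch. It is one possible reading of the claim, not the patented implementation. Several choices are assumptions made only for illustration: the model is an ordinary least-squares fit rather than whatever machine learning model the patent contemplates, each added parameter is chosen greedily as the one giving the highest r-squared, r-squared is computed in-sample, and the stopping rule compares the gain in cumulative r-squared against the threshold. The function names (`fit_predict`, `r_squared`, `cumulative_selection`), the parameter names (`temp`, `pressure`, `noise_a`, `noise_b`), and the synthetic data are all hypothetical.

```python
import random


def fit_predict(columns, y):
    """Least-squares fit with intercept (stand-in model); returns in-sample predictions."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(rows[0])
    # Normal equations (X^T X) w = X^T y, stored as an augmented matrix.
    a = [[sum(r[p] * r[q] for r in rows) for q in range(k)]
         + [sum(r[p] * t for r, t in zip(rows, y))] for p in range(k)]
    for p in range(k):
        a[p][p] += 1e-9  # assumption: tiny ridge term to avoid singular systems
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        pivot = max(range(c, k), key=lambda r: abs(a[r][c]))
        a[c], a[pivot] = a[pivot], a[c]
        for r in range(k):
            if r != c:
                f = a[r][c] / a[c][c]
                a[r] = [x - f * z for x, z in zip(a[r], a[c])]
    w = [a[p][k] / a[p][p] for p in range(k)]
    return [sum(wi * xi for wi, xi in zip(w, r)) for r in rows]


def r_squared(y, pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y) / len(y)
    ss_res = sum((t - p) ** 2 for t, p in zip(y, pred))
    ss_tot = sum((t - mean) ** 2 for t in y)
    return 1.0 - ss_res / ss_tot


def cumulative_selection(params, y, threshold=0.01):
    """Add one parameter at a time until cumulative r-squared gains less than threshold.

    Returns (name, cumulative_r2, gain) tuples in the order parameters were added.
    """
    selected, ranking, prev_r2 = [], [], 0.0
    remaining = list(params)
    while remaining:
        # Score each candidate by the r-squared of the model that includes it.
        scores = {
            name: r_squared(y, fit_predict([params[s] for s in selected + [name]], y))
            for name in remaining
        }
        best = max(scores, key=scores.get)
        gain = scores[best] - prev_r2
        if gain < threshold:
            break  # accumulation increased by less than the threshold
        selected.append(best)
        remaining.remove(best)
        ranking.append((best, scores[best], gain))
        prev_r2 = scores[best]
    return ranking


# Synthetic, hypothetical data: yield depends on two of four parameters.
random.seed(0)
n = 200
params = {name: [random.gauss(0, 1) for _ in range(n)]
          for name in ("temp", "pressure", "noise_a", "noise_b")}
observed_yield = [3 * t + p + random.gauss(0, 0.1)
                  for t, p in zip(params["temp"], params["pressure"])]

ranking = cumulative_selection(params, observed_yield, threshold=0.01)
for name, r2, gain in ranking:
    print(f"{name}: cumulative r2={r2:.3f}, gain={gain:.3f}")
```

In this sketch, the parameters that appear in `ranking` would be the candidates to include as model inputs, and those left out when the loop stops would be the candidates to exclude. A held-out or cross-validated r-squared would likely be a more robust choice than the in-sample value used here.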