US 12,456,070 B2
System-level control using tree-based regression with outlier removal
Dung Tien Phan, Pleasantville, NY (US); Pavankumar Murali, Ardsley, NY (US); and Lam Nguyen, Ossining, NY (US)
Assigned to International Business Machines Corporation, Armonk, NY (US)
Filed by INTERNATIONAL BUSINESS MACHINES CORPORATION, Armonk, NY (US)
Filed on Aug. 20, 2020, as Appl. No. 16/998,748.
Prior Publication US 2022/0058515 A1, Feb. 24, 2022
Int. Cl. G06N 20/00 (2019.01); G06F 18/243 (2023.01); G06F 18/2433 (2023.01)
CPC G06N 20/00 (2019.01) [G06F 18/24323 (2023.01); G06F 18/2433 (2023.01)] 20 Claims
1. A computer-implemented method comprising:
receiving, using a processor, input data comprising time-series data;
simultaneously training, using a binary mixed-integer linear program of the processor, a network of optimal decision trees (“ODTs”) for regression based on the input data, the network of ODTs configured such that each ODT of the network of ODTs comprises at least one of an upstream ODT and a downstream ODT, wherein an output of an upstream ODT is coupled to an input of a downstream ODT;
wherein during the training of each respective downstream ODT:
a sample, output from a respective upstream ODT, is classified as either an outlier or a point in a distribution by minimizing a nonlinear loss function in which training loss and outlier loss are minimized together, the nonlinear loss function being determined according to the following formula:

[formula image not reproduced]
where $z_i \in \{0, 1\}$ is a selection variable for deciding whether a sample $(x_i, y_i)$ will be removed or not, $\alpha > 0$ is a weighting parameter to balance between the training error $z_i(c^T x_i - y_i)^2$ and the outlier loss [formula image not reproduced], $n$ is a total number of samples, $c$ is a learned model parameter vector for a linear regression at a leaf node of the respective ODT, and $T$ denotes the transpose, so that $c^T$ is the transpose of $c$; and
each sample classified as an outlier is removed from the respective input of the respective downstream ODT, thereby training the respective downstream ODT only on samples not classified as outliers; and
controlling a set point for a manufacturing process undergoing an upset condition using the trained network of ODTs, wherein characterization factors of an underlying decision tree of the network of ODTs are given by branching hyperplanes at each branch node and linear regressions at each leaf node.
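The claimed outlier-aware objective can be illustrated for a single leaf regression. The formula appears only as an image in this text. The sketch below therefore assumes, hypothetically, that the outlier loss is the removed fraction $\frac{1}{n}\sum_i (1 - z_i)$. Under that assumption the objective becomes $\sum_i z_i (c^T x_i - y_i)^2 + \frac{\alpha}{n}\sum_i (1 - z_i)$.

The sketch also swaps the claimed binary mixed-integer linear program for plain alternating minimization:

- a least-squares step on the samples currently kept;
- an exact per-sample update of $z$.

This never increases the assumed objective. However, unlike a MILP solver, it is not guaranteed to reach the global optimum. The function names are hypothetical.

```python
import numpy as np


def objective(c, z, X, y, alpha):
    """Assumed loss: sum_i z_i (c^T x_i - y_i)^2 + (alpha / n) * sum_i (1 - z_i)."""
    n = len(y)
    r2 = (X @ c - y) ** 2
    removed = np.count_nonzero(~z)  # number of samples with z_i = 0
    return float(np.sum(r2[z]) + alpha / n * removed)


def fit_with_outlier_removal(X, y, alpha, max_iter=50):
    """Alternating minimization of the assumed loss (a heuristic, not a MILP)."""
    n = len(y)
    z = np.ones(n, dtype=bool)  # start with every sample kept (z_i = 1)
    c = np.zeros(X.shape[1])
    for _ in range(max_iter):
        # c-step: ordinary least squares on the currently kept samples
        c, *_ = np.linalg.lstsq(X[z], y[z], rcond=None)
        # z-step: for fixed c, keeping sample i costs r_i^2 and removing it
        # costs alpha / n, so keep it iff r_i^2 <= alpha / n
        r2 = (X @ c - y) ** 2
        new_z = r2 <= alpha / n
        # stop if every sample would be removed or the selection is stable
        if not new_z.any() or np.array_equal(new_z, z):
            break
        z = new_z
    return c, z


if __name__ == "__main__":
    # Ten points on y = 2x + 1, with one injected outlier at x = 5
    X = np.column_stack([np.ones(10), np.arange(10.0)])
    y = 2 * np.arange(10.0) + 1
    y[5] += 50
    c, z = fit_with_outlier_removal(X, y, alpha=1000.0)
    print("coefficients:", c)
    print("removed indices:", np.flatnonzero(~z))
```

In the usage example, $\alpha = 1000$ and $n = 10$. A sample is therefore dropped once its squared residual exceeds 100. The first least-squares pass leaves the nine clean points with squared residuals well below 100 and the outlier far above it. The outlier is removed, and the second pass fits the remaining points exactly.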
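The claim characterizes each tree by two things: branching hyperplanes at branch nodes and linear regressions at leaf nodes. It also couples an upstream tree's output to a downstream tree's input. Below is a minimal prediction-only sketch built on three assumptions:

- every tree has depth 1, with a single hyperplane test $a^T x \le b$;
- each downstream tree receives the original features plus the upstream prediction as an extra column;
- training is not shown.

The class name, function name, and coupling scheme are illustrative, not taken from the patent.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class ObliqueRegressionTree:
    """Depth-1 tree: one branching hyperplane, a linear regression at each leaf."""
    a: np.ndarray        # hyperplane normal at the branch node
    b: float             # hyperplane offset: go left iff a^T x <= b
    c_left: np.ndarray   # left-leaf regression coefficients (prediction c^T x)
    c_right: np.ndarray  # right-leaf regression coefficients

    def predict(self, X):
        X = np.atleast_2d(X)
        go_left = X @ self.a <= self.b
        return np.where(go_left, X @ self.c_left, X @ self.c_right)


def predict_chain(trees, X):
    """Hypothetical coupling: each downstream tree sees [X, upstream output]."""
    out = trees[0].predict(X)
    for tree in trees[1:]:
        X_next = np.column_stack([X, out])  # append the upstream prediction
        out = tree.predict(X_next)
    return out


if __name__ == "__main__":
    # Upstream tree computes |x| (features are [1, x]); the downstream tree
    # doubles the upstream value when it is <= 1 and passes it through otherwise
    upstream = ObliqueRegressionTree(np.array([0.0, 1.0]), 0.0,
                                     np.array([0.0, -1.0]), np.array([0.0, 1.0]))
    downstream = ObliqueRegressionTree(np.array([0.0, 0.0, 1.0]), 1.0,
                                       np.array([0.0, 0.0, 2.0]),
                                       np.array([0.0, 0.0, 1.0]))
    X = np.array([[1.0, -3.0], [1.0, 0.5], [1.0, 2.0]])
    print(predict_chain([upstream, downstream], X))
```

Deeper trees would replace the single test with a path of hyperplane tests. Each leaf would still apply its own linear regression.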
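The claim does not say how the trained network is used to choose the set point during an upset condition. One hypothetical approach, shown only as an illustration, is a grid search:

1. Hold the other process features fixed.
2. Evaluate the model at each candidate set point.
3. Pick the candidate whose predicted output is closest to a target.

The `predict` callable, the feature layout (set point as the last column), and the target value are all assumptions.

```python
import numpy as np


def choose_set_point(predict, candidates, fixed_features, target):
    """Hypothetical grid search over candidate set points.

    predict: callable mapping an (m, d) feature matrix to m predictions,
             e.g. a trained tree network (assumed interface).
    """
    candidates = np.asarray(candidates, dtype=float)
    fixed = np.asarray(fixed_features, dtype=float)
    # One row per candidate: the fixed features followed by the set point
    X = np.column_stack([np.tile(fixed, (len(candidates), 1)), candidates])
    errors = np.abs(predict(X) - target)
    return candidates[np.argmin(errors)]


if __name__ == "__main__":
    # Stand-in model: output = feature_0 + 3 * set_point
    model = lambda X: X[:, 0] + 3.0 * X[:, 1]
    sp = choose_set_point(model, np.linspace(0.0, 5.0, 11), [1.0], target=10.0)
    print("chosen set point:", sp)
```

A grid search makes no smoothness assumption about the model. This matters here because trees with hyperplane splits and linear leaves give piecewise-linear, generally discontinuous predictions.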