US 11,989,657 B2
Automated machine learning pipeline for timeseries datasets utilizing point-based algorithms
Nikan Chavoshi, Redwood City, CA (US); Anatoly Yakovlev, Hayward, CA (US); Hesam Fathi Moghadam, Sunnyvale, CA (US); Venkatanathan Varadarajan, Seattle, WA (US); Sandeep Agrawal, San Jose, CA (US); Ali Moharrer, Belmont, CA (US); Jingxiao Cai, Fremont, CA (US); Sanjay Jinturkar, Santa Clara, CA (US); and Nipun Agarwal, Saratoga, CA (US)
Assigned to Oracle International Corporation, Redwood Shores, CA (US)
Filed by Oracle International Corporation, Redwood Shores, CA (US)
Filed on Oct. 15, 2020, as Appl. No. 17/071,285.
Prior Publication US 2022/0121955 A1, Apr. 21, 2022
Int. Cl. G06N 20/00 (2019.01); G06N 3/088 (2023.01)

CPC G06N 3/088 (2013.01) [G06N 20/00 (2019.01)]

20 Claims

1. A method comprising:

a) for each preprocessor configuration of a plurality of configurations of a window preprocessor for a training timeseries dataset for a machine learning model:

1) the window preprocessor performing based on said preprocessor configuration:

i) partitioning the training timeseries dataset into one or more sequences of windows, wherein each sequence of said one or more sequences of windows has a distinct window size;

ii) converting the training timeseries dataset into a configuration-specific point-based dataset that is based on a statistical summary operator of said preprocessor configuration, including performing for each window in each sequence of said one or more sequences of windows:

generating, for the window, a feature vector in the configuration-specific point-based dataset, and

the statistical summary operator calculating a value in the feature vector;

2) training the machine learning model based on said configuration-specific point-based dataset; and

3) calculating a score for said preprocessor configuration based on said training based on said configuration-specific point-based dataset;

b) selecting, based on said scores of the plurality of configurations of the window preprocessor, an optimal preprocessor configuration from the plurality of configurations; and

c) converting, by the window preprocessor and based on the optimal preprocessor configuration, a new timeseries dataset.
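
The steps of claim 1 can be illustrated with a minimal sketch. It is not the patented implementation. It rests on several assumptions that do not appear in the claim: a preprocessor configuration is modeled as a (window sizes, operator name) pair, windows do not overlap, the timeseries carries per-point 0/1 labels, a window is labelled 1 when any of its points is, the "machine learning model" is a toy threshold classifier, and the score is accuracy on a held-out tail of the series. All function names, the OPERATORS table, and the sample data are hypothetical.

```python
from statistics import mean, pstdev

# Hypothetical operator table; the claim only requires "a statistical summary operator".
OPERATORS = {"mean": mean, "std": pstdev, "max": max}


def to_point_dataset(series, window_sizes, operator, labels=None):
    """Steps a)1)i-ii: one sequence of non-overlapping windows per distinct size,
    and one (feature vector, label) row per window."""
    summarize = OPERATORS[operator]
    rows = []
    for size in window_sizes:
        for start in range(0, len(series) - size + 1, size):
            stop = start + size
            # The summary operator supplies one value of the feature vector.
            features = [summarize(series[start:stop]), float(size)]
            # Assumption: a window is labelled 1 if any point in it is labelled 1.
            label = max(labels[start:stop]) if labels is not None else None
            rows.append((features, label))
    return rows


def train(rows):
    """Step a)2: toy point-based model, a threshold halfway between the class means."""
    pos = [f[0] for f, y in rows if y == 1]
    neg = [f[0] for f, y in rows if y == 0]
    if not pos or not neg:
        return None  # a threshold cannot be fitted from a single class
    return (mean(pos) + mean(neg)) / 2, mean(pos) > mean(neg)


def predict(model, features):
    threshold, positive_above = model
    return int((features[0] > threshold) == positive_above)


def score(model, rows):
    """Step a)3: accuracy on held-out rows; 0.0 when no model could be trained."""
    if model is None or not rows:
        return 0.0
    return sum(predict(model, f) == y for f, y in rows) / len(rows)


def select_configuration(series, labels, configs, train_fraction=0.75):
    """Steps a) and b): score every configuration and keep the best one."""
    split = int(len(series) * train_fraction)
    scored = []
    for window_sizes, operator in configs:
        train_rows = to_point_dataset(series[:split], window_sizes, operator, labels[:split])
        valid_rows = to_point_dataset(series[split:], window_sizes, operator, labels[split:])
        scored.append((score(train(train_rows), valid_rows), (window_sizes, operator)))
    best_score, best_config = max(scored, key=lambda item: item[0])
    return best_config, best_score


# Synthetic training series: a quiet pattern with periodic labelled bursts.
series = ([1.0, 2.0] * 4 + [1.0, 9.0] * 2) * 4
labels = ([0] * 8 + [1] * 4) * 4
configs = [(sizes, op) for sizes in [(4,), (2, 4), (6,)] for op in OPERATORS]

best_config, best_score = select_configuration(series, labels, configs)

# Step c): convert a new, unlabelled timeseries with the selected configuration.
new_series = [1.0, 2.0, 1.0, 9.0, 1.0, 2.0, 1.0, 2.0]
new_points = to_point_dataset(new_series, *best_config)
```

Because each window size yields its own sequence of windows, a configuration with several sizes simply contributes more rows to the same point-based dataset. The window size is stored in the feature vector here so the rows stay distinguishable, which is a design choice of this sketch rather than something the claim states.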