US 12,265,920 B2
Code-free automated machine learning
Jonas Mueller, San Francisco, CA (US); Tatsuya Arai, Los Altos Hills, CA (US); Abhi Vinayaka Sharma, Mountain View, CA (US); Ryan Matthew Brand, New York, NY (US); Yohei Nakayama, San Francisco, CA (US); Nick Dean Erickson, Seattle, WA (US); Hang Zhang, San Carlos, CA (US); Mu Li, Union City, CA (US); and Alexander Johannes Smola, Sunnyvale, CA (US)
Assigned to Amazon Technologies, Inc., Seattle, WA (US)
Filed by Amazon Technologies, Inc., Seattle, WA (US)
Filed on Apr. 15, 2020, as Appl. No. 16/849,537.
Prior Publication US 2021/0326717 A1, Oct. 21, 2021
Int. Cl. H04L 29/06 (2006.01); G06N 5/04 (2023.01); G06N 20/00 (2019.01)

CPC G06N 5/04 (2013.01) [G06N 20/00 (2019.01)]

20 Claims

1. A computer-implemented method comprising:

deploying, within a multi-tenant service provider network, an object storage location for an account associated with a user;

deploying, within the multi-tenant service provider network, a machine learning (ML) orchestrator associated with the account;

detecting, by the ML orchestrator, that a training dataset comprising a plurality of columns of training data has been stored at the object storage location;

determining a target variable to infer corresponding to a particular column of the plurality of columns based on an identifier of the particular column in the training dataset or based on a user selection of the particular column, wherein each column of the plurality of columns comprises a respective set of values;

identifying a plurality of ML pipelines to evaluate, wherein each ML pipeline of the plurality of ML pipelines comprises a respective set of one or more preprocessing operations to apply to the training dataset;

evaluating the plurality of ML pipelines including: (a) for each ML pipeline of the plurality of ML pipelines, applying the respective set of one or more preprocessing operations to the training dataset, and (b) training, using a ML training service of the multi-tenant service provider network, a plurality of ML models corresponding to the plurality of ML pipelines to infer the target variable, wherein the plurality of ML models are generated according to a plurality of ML algorithms of the plurality of ML pipelines;

deploying at least one of the plurality of ML models via a ML hosting service of the multi-tenant service provider network;

utilizing the at least one of the plurality of ML models to generate one or more inferences; and

storing the one or more inferences at the object storage location.
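The flow recited in claim 1 can be sketched locally in a few dozen lines. This is a minimal, non-authoritative illustration under heavy assumptions, not the patented implementation and not any cloud provider's API. `ObjectStore` is a hypothetical in-memory stand-in for the account's object storage location. `Orchestrator` stands in for the ML orchestrator. The `training/`, `inference/`, and `predictions/` key prefixes, the `target`/`label` column-name convention, the two example pipelines, the 75/25 holdout split, and the mean-absolute-error metric are all illustrative choices. Local fitting replaces the ML training and hosting services.

```python
import csv
import io
import statistics
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

Matrix = List[List[float]]
Model = Callable[[Matrix], List[float]]

class ObjectStore:
    """Hypothetical in-memory stand-in for the account's object storage location."""
    def __init__(self) -> None:
        self.objects: Dict[str, str] = {}

    def put(self, key: str, body: str) -> None:
        self.objects[key] = body

def fit_standardize(X: Matrix) -> Callable[[Matrix], Matrix]:
    # Z-score each feature using statistics from the training split only.
    cols = list(zip(*X))
    means = [statistics.fmean(c) for c in cols]
    stds = [statistics.pstdev(c) or 1.0 for c in cols]  # guard constant columns
    return lambda rows: [[(v - m) / s for v, m, s in zip(r, means, stds)] for r in rows]

def fit_mean(X: Matrix, y: List[float]) -> Model:
    mu = statistics.fmean(y)  # baseline: always predict the training mean
    return lambda rows: [mu for _ in rows]

def fit_knn(X: Matrix, y: List[float], k: int = 2) -> Model:
    def predict(rows: Matrix) -> List[float]:
        preds = []
        for r in rows:  # average the targets of the k nearest training rows
            dists = sorted((sum((a - b) ** 2 for a, b in zip(r, x)), t) for x, t in zip(X, y))
            preds.append(statistics.fmean(t for _, t in dists[:k]))
        return preds
    return predict

@dataclass
class Pipeline:
    name: str
    preprocessing: List[Callable[[Matrix], Callable[[Matrix], Matrix]]]
    algorithm: Callable[[Matrix, List[float]], Model]

    def fit(self, X: Matrix, y: List[float]) -> Model:
        transforms = []
        for op in self.preprocessing:  # (a) apply the pipeline's preprocessing operations
            transforms.append(op(X))
            X = transforms[-1](X)
        model = self.algorithm(X, y)  # (b) train the pipeline's model on the result

        def predict(rows: Matrix) -> List[float]:
            for t in transforms:
                rows = t(rows)
            return model(rows)
        return predict

def determine_target(columns: List[str], user_selection: Optional[str] = None) -> str:
    if user_selection is not None:  # an explicit user selection wins
        return user_selection
    for c in columns:  # otherwise use a hypothetical column-name convention
        if c.lower() in ("target", "label"):
            return c
    raise ValueError("no target column identified")

def parse(body: str) -> Tuple[List[str], Matrix]:
    reader = csv.reader(io.StringIO(body))
    header = next(reader)
    return header, [[float(v) for v in row] for row in reader]

class Orchestrator:
    def __init__(self, store: ObjectStore, pipelines: List[Pipeline]) -> None:
        self.store, self.pipelines, self.seen = store, pipelines, set()
        self.leaderboard: Dict[str, float] = {}

    def poll(self, user_selection: Optional[str] = None) -> None:
        # Detect training datasets newly stored under the "training/" prefix.
        for key in sorted(self.store.objects):
            if key.startswith("training/") and key not in self.seen:
                self.seen.add(key)
                self.run(key.split("/", 1)[1], user_selection)

    def run(self, name: str, user_selection: Optional[str]) -> None:
        header, data = parse(self.store.objects["training/" + name])
        t = header.index(determine_target(header, user_selection))
        X = [[v for i, v in enumerate(r) if i != t] for r in data]
        y = [r[t] for r in data]
        cut = max(1, int(len(X) * 0.75))  # simple holdout split for evaluation
        best, best_err = None, float("inf")
        for p in self.pipelines:
            model = p.fit(X[:cut], y[:cut])
            err = statistics.fmean(abs(a - b) for a, b in zip(model(X[cut:]), y[cut:]))
            self.leaderboard[p.name] = err  # validation mean absolute error
            if err < best_err:
                best, best_err = model, err
        # "Deploy" the best model locally and store its inferences in the object store.
        key = "inference/" + name
        if best is not None and key in self.store.objects:
            _, rows = parse(self.store.objects[key])
            preds = best(rows)
            self.store.put("predictions/" + name, "\n".join(f"{v:.4f}" for v in preds))
```

To use the sketch, store a numeric CSV under `training/<name>` and the feature rows to score under `inference/<name>`. Then call `Orchestrator(store, pipelines).poll()`. Predictions land at `predictions/<name>`. Each pipeline fits its preprocessing on the training split only before training its model, so the holdout comparison does not leak validation statistics. In a real deployment, polling would more likely be replaced by storage event notifications.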