US 12,111,881 B2
	Item recommendation with application to automated artificial intelligence
Nico Stephan Gorbach, Kilchberg (CH); Adelmo Cristiano Innocenza Malossi, Schönenberg (CH); and Andrea Bartezzaghi, Rueschlikon (CH)
Assigned to International Business Machines Corporation, Armonk, NY (US)
Filed by International Business Machines Corporation, Armonk, NY (US)
Filed on Dec. 1, 2020, as Appl. No. 17/108,107.
Prior Publication US 2022/0171985 A1, Jun. 2, 2022
Int. Cl. G06F 17/00 (2019.01); G06F 17/18 (2006.01); G06F 18/2113 (2023.01); G06F 18/22 (2023.01); G06F 18/232 (2023.01); G06N 20/00 (2019.01)

CPC G06F 18/2113 (2023.01) [G06F 17/18 (2013.01); G06F 18/22 (2023.01); G06F 18/232 (2023.01); G06N 20/00 (2019.01)]

20 Claims

1. A computer-implemented method for selecting preferred machine learning pipelines for processing new datasets, the method comprising:

for a plurality of machine learning pipelines and a plurality N of datasets previously-processed by the pipelines, storing a plurality of rating values, each rating value corresponding to a performance of a pipeline of the plurality of machine learning pipelines and for a dataset of the plurality N of datasets;

for each pair u_{i=1 to N}, u_{j=1 to N}, i≠j of the plurality N of previously-processed datasets, determining a distance d_{i,j} from u_i to u_j in a latent space, wherein the distance d_{i,j} corresponds to an expected value of a regret incurred when the pipeline, selected in a predetermined manner based on a set of rating values for a dataset u_j, is rated for a performance of the selected pipeline for a dataset u_i, and wherein the regret for the selected pipeline includes a monotonically decreasing function of the rating value for the performance of the pipeline for the dataset u_i,
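
The distance step can be illustrated with a minimal sketch. The claim does not fix the selection rule or the regret function, so the following are assumptions made only for illustration: ratings lie in [0, 1]; the "predetermined manner" is a softmax over the ratings for u_j with a hypothetical sharpness parameter `beta`; and the regret is the gap between the best rating on u_i and the rating of the selected pipeline on u_i, which decreases as that rating rises. The function names are hypothetical.

```python
import math

def selection_probs(ratings_j, beta=5.0):
    """Assumed selection rule: softmax over the ratings stored for dataset u_j."""
    top = max(ratings_j)
    weights = [math.exp(beta * (r - top)) for r in ratings_j]  # shifted for stability
    total = sum(weights)
    return [w / total for w in weights]

def regret(ratings_i, p):
    """Assumed regret: gap to the best rating on u_i; falls as ratings_i[p] rises."""
    return max(ratings_i) - ratings_i[p]

def expected_regret_distance(ratings_i, ratings_j, beta=5.0):
    """d_{i,j}: expected regret on u_i when the pipeline is chosen from u_j's ratings."""
    probs = selection_probs(ratings_j, beta)
    return sum(q * regret(ratings_i, p) for p, q in enumerate(probs))

def distance_matrix(ratings, beta=5.0):
    """Pairwise distances for i != j; the diagonal is set to 0.0 by convention."""
    n = len(ratings)
    return [
        [0.0 if i == j else expected_regret_distance(ratings[i], ratings[j], beta)
         for j in range(n)]
        for i in range(n)
    ]
```

Under these assumptions the distance is generally asymmetric: d_{i,j} measures the loss on u_i from trusting the ratings of u_j, which need not equal the loss in the other direction.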

clustering the plurality N of previously-processed datasets in the latent space and identifying a representative dataset in each cluster for which each distance to the dataset from other datasets in the cluster is minimized over the cluster;
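
The clustering step can be sketched as follows. The claim does not name a clustering algorithm; this sketch swaps in a simple k-medoids alternation over a precomputed distance matrix `d`, where `d[i][m]` is the distance from dataset u_i to dataset u_m. Reading "minimized over the cluster" as minimizing the sum of distances to the candidate from the other cluster members is also an assumption, as are the function names and the caller-supplied `initial_medoids`.

```python
def medoid(cluster, d):
    """Member m of the cluster with the smallest total distance d[i][m] from the others."""
    return min(cluster, key=lambda m: sum(d[i][m] for i in cluster if i != m))

def assign(d, medoids):
    """Attach every dataset to the medoid it is closest to (distance from i to m)."""
    clusters = {m: [m] for m in medoids}
    for i in range(len(d)):
        if i not in clusters:
            nearest = min(medoids, key=lambda m: d[i][m])
            clusters[nearest].append(i)
    return clusters

def k_medoids(d, initial_medoids, max_iter=100):
    """Alternate assignment and medoid update until the medoids stop changing."""
    medoids = list(initial_medoids)
    for _ in range(max_iter):
        clusters = assign(d, medoids)
        updated = [medoid(members, d) for members in clusters.values()]
        if updated == medoids:
            break
        medoids = updated
    # Keys are the representative datasets; values are the cluster members.
    return assign(d, medoids)
```

For example, `k_medoids(d, initial_medoids=[0, 2])` returns a dictionary whose keys are the indices of the representative datasets. Because the distances may be asymmetric, the direction `d[i][m]` (from member to candidate) is used throughout.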

in response to receiving a new dataset, selecting a set of preferred pipelines from the plurality of machine learning pipelines for processing the new dataset, each preferred pipeline being selected according to a set of rating values for the representative dataset; and

processing the new dataset in the set of preferred pipelines.
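
The final two steps can be sketched in the same spirit. The claim says only that each preferred pipeline is selected according to the rating values for a representative dataset; taking the top-rated pipelines of each representative and merging them without duplicates is one assumed reading. Modelling pipelines as plain callables, and the names `preferred_pipelines`, `process` and `per_representative`, are likewise hypothetical.

```python
def preferred_pipelines(ratings, representatives, per_representative=1):
    """Pick the top-rated pipeline indices for each representative dataset."""
    chosen = []
    for rep in representatives:
        ranked = sorted(
            range(len(ratings[rep])),
            key=lambda p: ratings[rep][p],
            reverse=True,
        )
        for p in ranked[:per_representative]:
            if p not in chosen:  # keep each pipeline once, in order of first selection
                chosen.append(p)
    return chosen

def process(new_dataset, pipelines, chosen):
    """Run the new dataset through each preferred pipeline (assumed to be callables)."""
    return {p: pipelines[p](new_dataset) for p in chosen}
```

Note that nothing in this sketch inspects the new dataset before selection: the preferred set depends only on the stored ratings of the representatives, so it can be computed before any pipeline has been evaluated on the new data.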