US 12,248,863 B2
Identifying a finding in a dataset using a machine learning model ensemble
Aviad Zlotnick, Mitzpeh Netofah (IL)
Assigned to International Business Machines Corporation, Armonk, NY (US)
Filed by International Business Machines Corporation, Armonk, NY (US)
Filed on Feb. 4, 2021, as Appl. No. 17/167,113.
Prior Publication US 2022/0245427 A1, Aug. 4, 2022
Int. Cl. G06N 3/045 (2023.01); G06N 3/02 (2006.01); G06N 3/08 (2023.01); G06N 20/00 (2019.01); G06N 20/20 (2019.01)

CPC G06N 3/045 (2023.01) [G06N 3/02 (2013.01); G06N 3/08 (2013.01); G06N 20/00 (2019.01); G06N 20/20 (2019.01)]

19 Claims

1. A computer implemented method of identifying a finding in an input sub-dataset of an input dataset using an ensemble of machine learning (ML) models, comprising:

obtaining outcomes of an ensemble of a plurality of ML models generated in response to feeding each of a plurality of input sub-datasets of the input dataset into each ML model of the ensemble of the plurality of ML models trained to generate an outcome indicative of likelihood of a finding depicted in a respective input sub-dataset;

creating a two dimensional (2D) outcome dataset storing outcomes of the ensemble of the plurality of ML models, wherein a first dimension denotes the respective ML model of the ensemble of ML models and a second dimension denotes the respective input sub-dataset of the plurality of input sub-datasets of the input dataset, wherein ML models of the ensemble of the plurality of ML models are clustered into a plurality of clusters, each cluster including ML models of a same architecture, wherein clusters of architecture that have increasing statistical distance between the different architectures are arranged consecutively along the first dimension of the 2D outcome dataset, and within each cluster, ML models of the respective cluster that have increasing statistical distances between meta-parameters of the same architecture are arranged consecutively;

computing a filtered 2D dataset by applying a filter function to the 2D outcome dataset; and

analyzing the filtered 2D dataset to identify at least one specific input sub-dataset of the plurality of input sub-datasets of the input dataset likely depicting the finding.
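
The steps of claim 1 can be illustrated with a minimal sketch, which is not the patented implementation. The claim does not specify the filter function, the distance measures, or the analysis rule, so the following are all assumptions made for illustration: a mean (box) filter, precomputed `arch_distance` and `meta_distance` values used as sort keys, a per-column threshold, and every function and field name. Model outcomes are given as precomputed `scores` rather than produced by real ML models.

```python
from statistics import mean

def order_models(models):
    """Arrange models along the first dimension: clusters of the same
    architecture first by (assumed, precomputed) architecture distance,
    then by meta-parameter distance within each cluster."""
    return sorted(models, key=lambda m: (m["arch_distance"], m["meta_distance"]))

def build_outcome_grid(ordered_models):
    """2D outcome dataset: rows are models, columns are input sub-datasets."""
    return [list(m["scores"]) for m in ordered_models]

def box_filter(grid, radius=1):
    """Assumed filter function: mean over a square window, clipped at borders."""
    n_rows, n_cols = len(grid), len(grid[0])
    filtered = []
    for r in range(n_rows):
        row = []
        for c in range(n_cols):
            window = [
                grid[i][j]
                for i in range(max(0, r - radius), min(n_rows, r + radius + 1))
                for j in range(max(0, c - radius), min(n_cols, c + radius + 1))
            ]
            row.append(mean(window))
        filtered.append(row)
    return filtered

def flag_sub_datasets(filtered, threshold=0.5):
    """Assumed analysis rule: flag a sub-dataset (column) when its mean
    filtered outcome across all models exceeds the threshold."""
    n_cols = len(filtered[0])
    return [c for c in range(n_cols) if mean(row[c] for row in filtered) > threshold]

# Hypothetical outcomes of three models on four input sub-datasets.
models = [
    {"name": "vit_a", "arch_distance": 1.0, "meta_distance": 0.0,
     "scores": [0.1, 0.8, 0.9, 0.1]},
    {"name": "cnn_b", "arch_distance": 0.0, "meta_distance": 0.4,
     "scores": [0.0, 0.9, 0.9, 0.0]},
    {"name": "cnn_a", "arch_distance": 0.0, "meta_distance": 0.0,
     "scores": [0.1, 0.9, 0.8, 0.1]},
]

ordered = order_models(models)
grid = build_outcome_grid(ordered)
filtered = box_filter(grid, radius=1)
flagged = flag_sub_datasets(filtered, threshold=0.5)
print([m["name"] for m in ordered], flagged)
```

Ordering the rows so that similar models sit next to each other is what makes a neighborhood filter meaningful along the model dimension: each window then averages outcomes from models that are close in architecture and meta-parameters.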