US 12,248,446 B2
Data gap mitigation
Nianjun Zhou, Chappaqua, NY (US); Dhavalkumar C. Patel, White Plains, NY (US); Emmanuel Yashchin, Yorktown Heights, NY (US); Arun Kwangil Iyengar, Yorktown Heights, NY (US); Shrey Shrivastava, White Plains, NY (US); and Anuradha Bhamidipaty, Yorktown Heights, NY (US)
Assigned to INTERNATIONAL BUSINESS MACHINES CORPORATION, Armonk, NY (US)
Filed by International Business Machines Corporation, Armonk, NY (US)
Filed on Nov. 3, 2022, as Appl. No. 17/979,830.
Prior Publication US 2024/0152492 A1, May 9, 2024
Int. Cl. G06F 16/00 (2019.01); G06F 16/215 (2019.01); G06F 16/23 (2019.01); G06F 16/2457 (2019.01); G06F 16/2458 (2019.01); G06F 17/18 (2006.01); G06F 18/15 (2023.01)
CPC G06F 16/215 (2019.01) [G06F 16/2365 (2019.01); G06F 16/24578 (2019.01); G06F 16/2462 (2019.01); G06F 17/18 (2013.01); G06F 18/15 (2023.01)]
20 Claims
1. A method of facilitating processing within a computing environment, the method comprising:
providing a data gap mitigation system to interact with other components of the computing environment across a network, the data gap mitigation system to enhance data quality for a machine learning system of the computing environment;
operatively connecting the data gap mitigation system to the network, and to the machine learning system across the network;
receiving configuration data at the data gap mitigation system across the network from a client device;
running the data gap mitigation system using the configuration data to enhance a measurement dataset obtained from one or more sensors, the running the data gap mitigation system comprising:
executing each imputer algorithm of a plurality of imputer algorithms specified in the configuration data received from the client device, the executing including for each imputer algorithm:
applying the executing imputer algorithm to an imputer evaluation dataset (IED) to obtain an imputer algorithm output (IAO) dataset, the imputer evaluation dataset having been obtained from an imputer candidate dataset (ICD) by removing known values from the imputer candidate dataset, the imputer candidate dataset including a complete dataset that is representative of a data range, the data range being within the measurement dataset and being incomplete;
generating an imputer evaluation metric for the executing imputer algorithm based on a comparison between the imputer candidate dataset (ICD) and the imputer algorithm output (IAO) dataset;
ranking each executed imputer algorithm of the plurality of imputer algorithms based on the imputer evaluation metric of each executed imputer algorithm; and
generating a complete data range by executing a highest ranked imputer algorithm of the plurality of imputer algorithms with the data range to provide the machine learning system of the computing environment with a complete data range for enhanced data analysis; and
sending the complete data range across the network to the machine learning system to apply the complete data range to the machine learning system to enhance output data quality of the machine learning system from the measurement dataset obtained from the one or more sensors.
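The evaluate, rank, and impute steps recited in claim 1 can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the patent: the three imputers (mean, forward fill, linear interpolation), the use of RMSE as the imputer evaluation metric, the fixed masked positions, and all function and variable names. Network transport, configuration handling, and the machine learning system are omitted.

```python
import math
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Series = List[Optional[float]]  # None marks a missing value


def mean_imputer(values: Series) -> List[float]:
    """Fill every gap with the mean of the known values."""
    known = [v for v in values if v is not None]
    fill = sum(known) / len(known)
    return [fill if v is None else v for v in values]


def forward_fill_imputer(values: Series) -> List[float]:
    """Fill each gap with the most recent known value."""
    known = [v for v in values if v is not None]
    last = known[0]  # leading gaps fall back to the first known value
    out = []
    for v in values:
        if v is not None:
            last = v
        out.append(last)
    return out


def linear_imputer(values: Series) -> List[float]:
    """Fill each gap by linear interpolation between its known neighbours."""
    idx = [i for i, v in enumerate(values) if v is not None]
    out = list(values)
    for i, v in enumerate(values):
        if v is not None:
            continue
        left = max((j for j in idx if j < i), default=None)
        right = min((j for j in idx if j > i), default=None)
        if left is None:
            out[i] = values[right]
        elif right is None:
            out[i] = values[left]
        else:
            w = (i - left) / (right - left)
            out[i] = values[left] + w * (values[right] - values[left])
    return out


def make_ied(icd: Sequence[float], masked: Sequence[int]) -> Series:
    """Derive the imputer evaluation dataset by removing known ICD values."""
    return [None if i in masked else v for i, v in enumerate(icd)]


def rmse(icd: Sequence[float], iao: Sequence[float], masked: Sequence[int]) -> float:
    """Assumed evaluation metric: RMSE between ICD and IAO at masked positions."""
    return math.sqrt(sum((icd[i] - iao[i]) ** 2 for i in masked) / len(masked))


def rank_imputers(
    imputers: Dict[str, Callable[[Series], List[float]]],
    icd: Sequence[float],
    masked: Sequence[int],
) -> List[Tuple[str, float]]:
    """Run each imputer on the IED and rank by metric (lowest error first)."""
    ied = make_ied(icd, masked)
    scores = {name: rmse(icd, fn(ied), masked) for name, fn in imputers.items()}
    return sorted(scores.items(), key=lambda kv: kv[1])


IMPUTERS = {
    "mean": mean_imputer,
    "forward_fill": forward_fill_imputer,
    "linear": linear_imputer,
}

icd = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]  # complete, representative candidate data
ranking = rank_imputers(IMPUTERS, icd, masked=[2, 4])
best_name = ranking[0][0]

data_range = [10.0, None, 14.0, None, 18.0]  # incomplete range to repair
complete_range = IMPUTERS[best_name](data_range)
print(best_name, complete_range)
```

In this toy setup the candidate data is perfectly linear, so linear interpolation reconstructs the masked values exactly and ranks first. With real sensor data the ranking depends on the signal and on how the masked positions are chosen, which is why the claim evaluates each configured imputer rather than fixing one in advance.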