US 11,853,392 B2
Providing reduced training data for training a machine learning model
Lukasz G Cmielowski, Cracow (PL); Amadeusz Masny, Bialystok (PL); Daniel Jakub Ryszka, Cracow (PL); and Wojciech Sobala, Cracow (PL)
Assigned to INTERNATIONAL BUSINESS MACHINES CORPORATION, Armonk, NY (US)
Filed by INTERNATIONAL BUSINESS MACHINES CORPORATION, Armonk, NY (US)
Filed on Nov. 30, 2021, as Appl. No. 17/538,361.
Prior Publication US 2023/0169148 A1, Jun. 1, 2023
Int. Cl. G06F 18/214 (2023.01); G06F 18/211 (2023.01); G06F 18/2433 (2023.01); G06N 20/00 (2019.01)
CPC G06F 18/2148 (2023.01) [G06F 18/211 (2023.01); G06F 18/2433 (2023.01)]
20 Claims
 
1. A computer-implemented method, comprising:
reading a first batch of data records and a second batch of data records of an original training data;
generating a reduced training data for training a machine learning model (ML-model) using a computing device, wherein the computing device comprises a limited storage capacity for storing the reduced training data, and wherein the reduced training data is dependent at least on the first batch and the second batch of the data records of the original training data;
reading a further batch of the data records of the original training data;
generating an updated version of the reduced training data dependent on the reduced training data and the further batch, wherein a size of the updated version of the reduced training data is less than a size of the limited storage capacity of the computing device and less than a combined size of the reduced training data and the further batch;
repeating the reading on another further batch of the data records of the original training data;
repeating the generating of the updated version of the reduced training data based on the another further batch of the data records of the original training data; and
providing the updated version of the reduced training data for the training of the ML-model, wherein the size of the updated version of the reduced training data is smaller than a size of the original training data.
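
The steps of claim 1 can be illustrated with a minimal sketch. It is not the patented implementation. The claim does not say how the reduction is performed, so uniform random subsampling is swapped in here as a stand-in. Size is measured as a record count rather than bytes, which is also an assumption. The names `read_batches`, `reduce_records`, `build_reduced_training_data`, and `capacity` are hypothetical and do not come from the patent.

```python
import random
from typing import Iterable, Iterator, List, Sequence, Tuple

Record = Tuple[int, int]  # hypothetical record layout: (feature, label)


def read_batches(records: Sequence[Record], batch_size: int) -> Iterator[List[Record]]:
    """Yield consecutive batches of the original training data."""
    for start in range(0, len(records), batch_size):
        yield list(records[start:start + batch_size])


def reduce_records(candidates: List[Record], max_records: int,
                   rng: random.Random) -> List[Record]:
    """Keep at most max_records records.

    Assumption: uniform random subsampling stands in for the reduction
    step, which the claim leaves unspecified.
    """
    if len(candidates) <= max_records:
        return list(candidates)
    return rng.sample(candidates, max_records)


def build_reduced_training_data(batches: Iterable[List[Record]],
                                capacity: int, seed: int = 0) -> List[Record]:
    """Fold batches into a reduced set that always stays below capacity."""
    rng = random.Random(seed)
    it = iter(batches)
    # Read a first and a second batch and derive the initial reduced data.
    first, second = next(it), next(it)
    reduced = reduce_records(first + second, capacity - 1, rng)
    # Read each further batch and generate an updated reduced data set.
    for further in it:
        if not further:
            continue  # nothing to merge
        combined = reduced + further
        # Strictly smaller than both the capacity and the combined size.
        target = min(capacity - 1, len(combined) - 1)
        reduced = reduce_records(combined, target, rng)
    return reduced


# Hypothetical original training data: 100 records read in batches of 10.
original = [(i, i % 2) for i in range(100)]
reduced = build_reduced_training_data(read_batches(original, 10), capacity=25)
```

Two limitations of this sketch are worth noting. The combined list `reduced + further` is held in memory for a moment before it is reduced, so a real device would need working space beyond the storage reserved for the reduced data. Repeated uniform subsampling also favors records from later batches, so the result is not a uniform sample of the original training data.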