US 11,775,757 B2
Automated machine-learning dataset preparation
Willie Robert Patten, Jr., Hurdle Mills, NC (US); Eugene Irving Kelton, Mechanicsburg, PA (US); Arvin Bhatnagar, Cary, NC (US); Jason Howard Cornpropst, Raleigh, NC (US); and Jacob McPherson, Franklinton, NC (US)
Assigned to International Business Machines Corporation, Armonk, NY (US)
Filed by International Business Machines Corporation, Armonk, NY (US)
Filed on May 4, 2020, as Appl. No. 16/865,621.
Prior Publication US 2021/0342640 A1, Nov. 4, 2021
Int. Cl. G06F 40/279 (2020.01); G06N 20/00 (2019.01); G06F 40/205 (2020.01); G06F 16/901 (2019.01); G06F 18/214 (2023.01); G06F 18/21 (2023.01)
CPC G06F 40/279 (2020.01) [G06F 16/9024 (2019.01); G06F 18/2148 (2023.01); G06F 18/2193 (2023.01); G06F 40/205 (2020.01); G06N 20/00 (2019.01)] 20 Claims
 
1. A method of preparing a dataset for ingestion by a machine-learning model, the method comprising:
calculating a pattern relevance for a first field in the dataset;
modifying the first field based on the pattern relevance;
detecting a contextual cue in the first field;
retrieving contextual information for a value in the first field;
adding the contextual information to the dataset;
identifying a numerical scheme for the first field; and
parsing the first field into a number according to the numerical scheme.
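Claim 1 names the preparation steps but does not say how any of them is computed. The Python sketch below is one hypothetical reading, not the patented implementation. It assumes the dataset is a list of dicts. The relevance metric (share of values containing a digit), the currency-symbol cue, the `CURRENCY_INFO` lookup table and the decimal-separator vote are all illustrative assumptions.

```python
import re
from collections import Counter
from typing import Optional

# Hypothetical lookup table standing in for "contextual information";
# the claim does not say where such information is retrieved from.
CURRENCY_INFO = {"$": "USD", "€": "EUR", "£": "GBP"}


def pattern_relevance(values: list[str]) -> float:
    """Assumed metric: share of values that contain at least one digit."""
    if not values:
        return 0.0
    return sum(bool(re.search(r"\d", v)) for v in values) / len(values)


def modify_field(values: list[str], relevance: float, threshold: float = 0.5) -> list[Optional[str]]:
    """If the numeric pattern dominates the field, treat non-matching values as missing."""
    if relevance < threshold:
        return list(values)
    return [v if re.search(r"\d", v) else None for v in values]


def detect_cue(value: str) -> Optional[str]:
    """Assumed contextual cue: a leading currency symbol."""
    if value and value[0] in CURRENCY_INFO:
        return value[0]
    return None


def identify_scheme(values: list[Optional[str]]) -> str:
    """Vote across the field on whether ',' or '.' is the decimal separator."""
    votes: Counter = Counter()
    for v in values:
        if v is None:
            continue
        # A separator followed by one or two trailing digits is read as the decimal mark.
        m = re.search(r"([.,])\d{1,2}$", v)
        if m:
            votes["decimal_comma" if m.group(1) == "," else "decimal_point"] += 1
    return votes.most_common(1)[0][0] if votes else "decimal_point"


def parse_number(value: str, scheme: str) -> float:
    """Strip symbols, drop the grouping separator, and normalize the decimal mark."""
    digits = re.sub(r"[^\d.,-]", "", value)
    group_sep, dec_sep = (".", ",") if scheme == "decimal_comma" else (",", ".")
    return float(digits.replace(group_sep, "").replace(dec_sep, "."))


def prepare(records: list[dict], field: str) -> list[dict]:
    """Apply the steps in claim order; mutates and returns the records."""
    values = [r[field] for r in records]
    relevance = pattern_relevance(values)
    cleaned = modify_field(values, relevance)
    scheme = identify_scheme(cleaned)
    for record, value in zip(records, cleaned):
        cue = None if value is None else detect_cue(value)
        # Contextual information is added to the dataset as a new column.
        record[f"{field}_currency"] = CURRENCY_INFO.get(cue) if cue else None
        record[field] = None if value is None else parse_number(value, scheme)
    return records


if __name__ == "__main__":
    rows = [{"price": "€1.200,50"}, {"price": "€3.400,00"}, {"price": "€950,00"}, {"price": "unknown"}]
    print(prepare(rows, "price"))
```

On these sample rows, three of four values contain digits, so "unknown" becomes missing. The field votes for a decimal comma, and the remaining values parse to 1200.5, 3400.0 and 950.0, each tagged with `price_currency` "EUR". Identifying one scheme per field, rather than per value, follows the claim's wording. It assumes a field does not mix separator conventions.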