US 12,242,437 B2
Automated determination of accurate data schema
Yao Dong Liu, Xian (CN); Jiang Bo Kang, Xian (CN); Jun Wang, Xian (CN); Dong Hai Yu, Xian (CN); and Song Bo, Xian (CN)
Assigned to INTERNATIONAL BUSINESS MACHINES CORPORATION, Armonk, NY (US)
Filed by INTERNATIONAL BUSINESS MACHINES CORPORATION, Armonk, NY (US)
Filed on Jul. 9, 2021, as Appl. No. 17/371,508.
Prior Publication US 2023/0010147 A1, Jan. 12, 2023
Int. Cl. G06F 16/00 (2019.01); G06F 16/21 (2019.01); G06F 16/23 (2019.01); G06F 16/28 (2019.01)
CPC G06F 16/213 (2019.01) [G06F 16/2365 (2019.01); G06F 16/285 (2019.01)] 20 Claims
 
1. A method, comprising:
selecting, by a computing device, a subset of methods to generate data schemas for input data, from a list of methods for generating data schemas, based on an output of a regression model, wherein the output of the regression model comprises a numeric indicator of schema accuracy for each method in the set of methods associated with the determined data category;
generating, by the computing device, a candidate schema for each method in the subset of methods to generate data schemas; and
generating, by the computing device, a master data schema for the input data by merging the candidate schema for each method in the subset of methods to generate data schemas, utilizing predetermined rules, wherein the predetermined rules comprise selecting three methods having a highest numeric indicator of schema accuracy.
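
The three steps of claim 1 (select methods by predicted accuracy, generate one candidate schema per selected method, merge the candidates into a master schema) can be illustrated with a minimal Python sketch. Everything named here is hypothetical and not taken from the patent: the four inference functions, the `predicted_accuracy` dictionary standing in for the regression model's output, and the score-weighted vote used as the merge rule. The claim itself only states that the predetermined rules include selecting the three methods with the highest numeric indicator.

```python
from collections import Counter
from typing import Callable, Dict, List

Rows = List[Dict[str, object]]
Schema = Dict[str, str]  # column name -> inferred type name


# --- Hypothetical schema-generation methods (illustrative only) ---
def infer_by_first_row(rows: Rows) -> Schema:
    """Use the Python type of each value in the first row."""
    return {col: type(val).__name__ for col, val in rows[0].items()}


def infer_by_last_row(rows: Rows) -> Schema:
    """Use the Python type of each value in the last row."""
    return {col: type(val).__name__ for col, val in rows[-1].items()}


def infer_by_majority(rows: Rows) -> Schema:
    """Use the most frequent Python type seen in each column."""
    counts: Dict[str, Counter] = {}
    for row in rows:
        for col, val in row.items():
            counts.setdefault(col, Counter())[type(val).__name__] += 1
    return {col: c.most_common(1)[0][0] for col, c in counts.items()}


def infer_all_strings(rows: Rows) -> Schema:
    """Treat every column as a string."""
    return {col: "str" for col in rows[0]}


METHODS: Dict[str, Callable[[Rows], Schema]] = {
    "first_row": infer_by_first_row,
    "last_row": infer_by_last_row,
    "majority": infer_by_majority,
    "all_strings": infer_all_strings,
}


def select_top_methods(scores: Dict[str, float], k: int = 3) -> List[str]:
    """Return the k method names with the highest predicted accuracy."""
    return sorted(scores, key=scores.get, reverse=True)[:k]


def merge_schemas(candidates: Dict[str, Schema], scores: Dict[str, float]) -> Schema:
    """Assumed merge rule: per column, a vote weighted by each method's score."""
    votes: Dict[str, Dict[str, float]] = {}
    for method, schema in candidates.items():
        for col, type_name in schema.items():
            col_votes = votes.setdefault(col, {})
            col_votes[type_name] = col_votes.get(type_name, 0.0) + scores[method]
    return {col: max(v, key=v.get) for col, v in votes.items()}


def build_master_schema(rows: Rows, scores: Dict[str, float]) -> Schema:
    selected = select_top_methods(scores, k=3)                 # selection step
    candidates = {name: METHODS[name](rows) for name in selected}  # one candidate per method
    return merge_schemas(candidates, scores)                   # merge step


# Stand-in for the regression model's per-method accuracy indicators.
predicted_accuracy = {"first_row": 0.62, "last_row": 0.75,
                      "majority": 0.91, "all_strings": 0.20}
sample_rows: Rows = [
    {"id": 1, "price": "9.5"},   # price arrives as a string in this row
    {"id": 2, "price": 10.0},
    {"id": 3, "price": 12.5},
]
master = build_master_schema(sample_rows, predicted_accuracy)
print(master)  # {'id': 'int', 'price': 'float'}
```

In this sketch the lowest-scoring method (`all_strings`) is excluded by the top-three rule, and the one dissenting candidate for `price` (from `first_row`) is outvoted by the two higher-scoring methods. A real system would obtain the scores from a trained regression model rather than a hard-coded dictionary.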