US 11,887,010 B2
Data classification for data lake catalog
Marcio T. Moura, Wellington, FL (US); Qiqing C. Ouyang, Yorktown Heights, NY (US); Jo A. Ramos, Grapevine, TX (US); and Deepak Rangarao, Cupertino, CA (US)
Assigned to International Business Machines Corporation, Armonk, NY (US)
Filed by INTERNATIONAL BUSINESS MACHINES CORPORATION, Armonk, NY (US)
Filed on Dec. 15, 2017, as Appl. No. 15/842,965.
Application 15/842,965 is a continuation of application No. 15/823,771, filed on Nov. 28, 2017.
Prior Publication US 2019/0164063 A1, May 30, 2019
This patent is subject to a terminal disclaimer.
Int. Cl. G06N 5/02 (2023.01); G06N 20/00 (2019.01); G06F 16/9032 (2019.01); G06F 16/903 (2019.01); G06Q 10/067 (2023.01)
CPC G06N 5/02 (2013.01) [G06F 16/90332 (2019.01); G06F 16/90344 (2019.01); G06N 20/00 (2019.01); G06Q 10/067 (2013.01)] 8 Claims
 
1. A computer-implemented method for a data classifier, the method comprising executing on a computer processor:
extracting from a structured text business data input, via natural language understanding processing, training set data elements that are selected from the group consisting of training keywords, training concepts, training entities, and training taxonomy classifications;
identifying associations within the structured text business data of each of a plurality of business classes with respective ones of the extracted training set data elements, wherein the business classes comprise at least one of business terms, entity names, attribute names, and column names; and
building a logical relationship data classification training knowledge base ontology that connects ones of the business classes to respective associated ones of the extracted training data elements as questions, into a plurality of knowledge base ontology question-business class associations.
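The three steps recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation. The natural language understanding step is replaced here by a toy keyword tokenizer, and the record layout (`business_class`, `description`), the stop-word list, the function names, and the question template are all hypothetical choices made for illustration only.

```python
import re
from collections import defaultdict

# Hypothetical stop-word list; a real NLU service would do far more than this.
STOP_WORDS = {"a", "an", "the", "of", "for", "to", "and", "or", "in", "on", "by", "is"}


def extract_keywords(text):
    """Toy stand-in for NLU extraction: ordered, de-duplicated non-stop-word tokens."""
    keywords = []
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        if token not in STOP_WORDS and token not in keywords:
            keywords.append(token)
    return keywords


def build_knowledge_base(records):
    """Connect each extracted element, phrased as a question, to its business classes."""
    kb = defaultdict(set)
    for record in records:
        # Step 1: extract training set data elements (keywords only, in this sketch).
        for keyword in extract_keywords(record["description"]):
            # Steps 2 and 3: associate the element with the business class and
            # store it as a question-to-business-class association.
            # The question wording is an assumed template, not taken from the patent.
            question = f"Does the data relate to '{keyword}'?"
            kb[question].add(record["business_class"])
    return dict(kb)


# Hypothetical structured text business data: one business term per record.
records = [
    {"business_class": "Customer Name", "description": "The legal name of the customer"},
    {"business_class": "Customer Address", "description": "Postal address of the customer"},
]

kb = build_knowledge_base(records)
for question, classes in kb.items():
    print(question, sorted(classes))
```

In this sketch a keyword shared by several records, such as "customer", maps to more than one business class, while a keyword found in a single record maps to exactly one. Concepts, entities, and taxonomy classifications, which the claim also recites, are omitted.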