US 12,423,523 B2
Generating semantic triplets from unstructured text using named entities
Gaetano Rossiello, Brooklyn, NY (US); Alfio Massimiliano Gliozzo, Brooklyn, NY (US); Nandana Sampath Mihindukulasooriya, Dublin (IE); Faisal Mahbub Chowdhury, Woodside, NY (US); and Michael Robert Glass, Bayonne, NJ (US)
Assigned to INTERNATIONAL BUSINESS MACHINES CORPORATION, Armonk, NY (US)
Filed by International Business Machines Corporation, Armonk, NY (US)
Filed on Dec. 14, 2022, as Appl. No. 18/080,872.
Prior Publication US 2024/0202447 A1, Jun. 20, 2024
Int. Cl. G06F 40/295 (2020.01); G06F 16/334 (2025.01); G06F 40/103 (2020.01); G06F 40/126 (2020.01); G06F 40/30 (2020.01); G06F 40/40 (2020.01); G06N 3/0455 (2023.01); G06N 5/02 (2023.01)
CPC G06F 40/295 (2020.01) [G06F 16/3344 (2019.01); G06F 40/103 (2020.01); G06F 40/126 (2020.01); G06F 40/30 (2020.01); G06F 40/40 (2020.01); G06N 5/02 (2013.01)]
18 Claims
 
1. A computer-implemented method comprising:
generating a training dataset by aligning text from a document of a document database with a named entity from a knowledge base;
generating an enhanced training dataset by updating the training dataset to include a named entity type and a named entity label associated with the named entity;
training a natural language processing (NLP) model using the enhanced training dataset resulting in a trained NLP model;
identifying, using the trained NLP model, the named entity in a block of unstructured text;
generating, using the trained NLP model, a target sequence that includes a relationship between the named entity and a tail entity, wherein the target sequence includes the named entity type and the named entity label of the named entity and includes a tail entity type and a tail entity label of the tail entity;
validating the target sequence by translating the target sequence comprising semantic annotations into a natural language statement;
issuing a query to the document database, wherein the query comprises the natural language statement;
generating, responsive to receiving a query result from the document database in response to the query, a validation result based on a comparison of the natural language statement to the query result;
wherein the NLP model comprises a sequence-to-sequence model comprising an encoder and a decoder, generating, using the encoder, a numeric representation of a word from the block of unstructured text, generating, using the decoder, the target sequence as a prediction of the relationship based at least in part on the numeric representation of the word; and
further comprising validating the target sequence using a classifier model that analyzes the relationship and outputs an indication of an accuracy of the relationship.
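
The target-sequence and validation steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the claimed implementation: the marker tokens (`<head>`, `<type>`, `<label>`, `<rel>`, `<tail>`), the `Entity` and `Triplet` classes, the function names, the token-overlap comparison, and the 0.8 threshold are all assumptions chosen for illustration, and the example sentence stands in for a query result that a document database might return. The trained sequence-to-sequence model and the classifier model are out of scope here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    surface: str  # mention as it appears in the unstructured text
    type: str     # named entity type (hypothetical value, e.g. "Person")
    label: str    # named entity label from the knowledge base


@dataclass(frozen=True)
class Triplet:
    head: Entity
    relation: str
    tail: Entity


def to_target_sequence(t: Triplet) -> str:
    """Linearize a triplet so that head and tail each carry a type and a label.

    The marker tokens are illustrative; the claim does not specify a format.
    """
    def ent(tag: str, e: Entity) -> str:
        return f"<{tag}> {e.surface} <type> {e.type} <label> {e.label}"

    return f"{ent('head', t.head)} <rel> {t.relation} {ent('tail', t.tail)}"


def to_statement(t: Triplet) -> str:
    """Translate the annotated triplet into a plain natural language statement."""
    return f"{t.head.label} {t.relation.replace('_', ' ')} {t.tail.label}."


def validate(statement: str, query_result: str, threshold: float = 0.8) -> bool:
    """Compare the statement to a query result by token overlap.

    Token overlap is a stand-in for whatever comparison an implementation uses.
    """
    def tokens(s: str) -> set:
        return set(s.lower().replace(".", " ").replace(",", " ").split())

    stmt = tokens(statement)
    if not stmt:
        return False
    return len(stmt & tokens(query_result)) / len(stmt) >= threshold


triplet = Triplet(
    head=Entity("Curie", "Person", "Marie Curie"),
    relation="born_in",
    tail=Entity("Warsaw", "City", "Warsaw"),
)
sequence = to_target_sequence(triplet)
statement = to_statement(triplet)
supported = validate(statement, "Marie Curie was born in Warsaw, Poland, in 1867.")
unsupported = validate(statement, "Albert Einstein was born in Ulm.")
```

In this sketch `statement` plays the role of the query text issued to the document database, and `supported` and `unsupported` play the role of the validation result for a matching and a non-matching query result.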