| CPC G06F 40/295 (2020.01) [G06F 16/3344 (2019.01); G06F 40/103 (2020.01); G06F 40/126 (2020.01); G06F 40/30 (2020.01); G06F 40/40 (2020.01); G06N 5/02 (2013.01)] | 18 Claims |

|
1. A computer-implemented method comprising:
generating a training dataset by aligning text from a document of a document database with a named entity from a knowledge base;
generating an enhanced training dataset by updating the training dataset to include a named entity type and a named entity label associated with the named entity;
training a natural language processing (NLP) model using the enhanced training dataset resulting in a trained NLP model;
identifying, using the trained NLP model, the named entity in a block of unstructured text;
generating, using the trained NLP model, a target sequence that includes a relationship between the named entity and a tail entity, wherein the target sequence includes the named entity type and the named entity label of the named entity and includes a tail entity type and a tail entity label of the tail entity;
validating the target sequence by translating the target sequence comprising semantic annotations into a natural language statement;
issuing a query to the document database, wherein the query comprises the natural language statement;
generating, responsive to receiving a query result from the document database in response to the query, a validation result based on a comparison of the natural language statement to the query result;
wherein the NLP model comprises a sequence-to-sequence model comprising an encoder and a decoder, and wherein the method comprises generating, using the encoder, a numeric representation of a word from the block of unstructured text, and generating, using the decoder, the target sequence as a prediction of the relationship based at least in part on the numeric representation of the word; and
further comprising validating the target sequence using a classifier model that analyzes the relationship and outputs an indication of an accuracy of the relationship.
|
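
The validation steps recited in claim 1 (translating the target sequence into a natural language statement, querying the document database, and comparing the statement with the query result) might be illustrated with the following minimal Python sketch. It is not drawn from the patent. The tagged target-sequence format, the names `Entity`, `Triple`, `parse_target_sequence`, `to_statement`, and `validate`, the token-overlap comparison, and the 0.8 threshold are all hypothetical assumptions chosen for illustration. The trained sequence-to-sequence model and the document database are replaced by hard-coded strings.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    label: str  # named entity label, e.g. "Marie Curie"
    type: str   # named entity type, e.g. "PERSON"


@dataclass(frozen=True)
class Triple:
    head: Entity
    relation: str
    tail: Entity


# Hypothetical linearized target-sequence format; the claim does not fix one.
PATTERN = re.compile(
    r"<head type=(?P<ht>\w+)>(?P<hl>.+?)</head>\s*"
    r"<rel>(?P<rel>.+?)</rel>\s*"
    r"<tail type=(?P<tt>\w+)>(?P<tl>.+?)</tail>"
)


def parse_target_sequence(seq: str) -> Triple:
    """Parse an annotated target sequence into a head/relation/tail triple."""
    m = PATTERN.fullmatch(seq.strip())
    if m is None:
        raise ValueError(f"unrecognized target sequence: {seq!r}")
    return Triple(Entity(m["hl"], m["ht"]), m["rel"], Entity(m["tl"], m["tt"]))


def to_statement(t: Triple) -> str:
    """Translate the semantic annotations into a natural language statement."""
    return f"{t.head.label} {t.relation.replace('_', ' ')} {t.tail.label}."


def _tokens(text: str) -> set:
    """Lowercased word tokens of a string."""
    return set(re.findall(r"\w+", text.lower()))


def validate(statement: str, query_result: str, threshold: float = 0.8) -> bool:
    """Assumed comparison: share of statement tokens found in the query result."""
    st = _tokens(statement)
    if not st:
        return False
    return len(st & _tokens(query_result)) / len(st) >= threshold


# Stand-in for decoder output; a real system would obtain this from the model.
seq = "<head type=PERSON>Marie Curie</head> <rel>born_in</rel> <tail type=CITY>Warsaw</tail>"
triple = parse_target_sequence(seq)
statement = to_statement(triple)

# Stand-ins for results returned by the document database.
supported = validate(statement, "Marie Curie was born in Warsaw in 1867.")
unsupported = validate(statement, "Marie Curie worked in Paris.")
```

In this sketch `supported` evaluates to `True` and `unsupported` to `False`. A practical system would likely substitute a retrieval score or an entailment model for the token-overlap heuristic, and the classifier-based validation recited in the final limitation of the claim is not modeled here.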