US 12,217,007 B2
Providing a semantic encoding and language neural network
Thanh Lam Hoang, Maynooth (IE); Dzung Phan, Pleasantville, NY (US); Gabriele Picco, Dublin (IE); Lam Nguyen, Ossining, NY (US); Marco Luca Sbodio, Castaheany (IE); and Vanessa Lopez Garcia, Dublin (IE)
Assigned to INTERNATIONAL BUSINESS MACHINES CORPORATION, Armonk, NY (US)
Filed by INTERNATIONAL BUSINESS MACHINES CORPORATION, Armonk, NY (US)
Filed on Jul. 11, 2022, as Appl. No. 17/811,763.
Prior Publication US 2024/0013003 A1, Jan. 11, 2024
Int. Cl. G06F 40/30 (2020.01); G06F 40/126 (2020.01); G06F 40/205 (2020.01); G06F 40/279 (2020.01); G06F 40/40 (2020.01); G06N 3/045 (2023.01); G06N 3/08 (2023.01)
CPC G06F 40/30 (2020.01) [G06F 40/126 (2020.01); G06F 40/205 (2020.01); G06F 40/279 (2020.01); G06F 40/40 (2020.01); G06N 3/045 (2023.01); G06N 3/08 (2013.01)] 17 Claims
 
1. A method for providing semantic encoding and language generation in a computing system by a processor, comprising:
automatically parsing unstructured data into one or more knowledge graphs based on the unstructured data and a list of candidate relations using a first machine learning model;
encoding, using the first machine learning model, the unstructured data into a distribution of a plurality of triples based on the one or more knowledge graphs, wherein the encoding further comprises predicted probabilities of relations between entities in the unstructured data;
sampling, using a second machine learning model, a set of the plurality of triples from the unstructured data of the one or more knowledge graphs;
generating text data from the set of the plurality of triples using the second machine learning model;
computing a penalty score for the set of the plurality of triples based on a degree of difference between the unstructured data and the generated text data; and
adjusting at least one predicted probability from the first machine learning model based on the determined penalty score.
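
The steps of claim 1 can be illustrated with a minimal toy sketch. This is not the patented implementation: the claim does not specify model architectures, the difference measure, or the update rule. The sketch assumes, purely for illustration, that the first model is a table of per-triple logits passed through a sigmoid, that the second model samples triples by independent Bernoulli draws and generates text from a fixed template, that the penalty is one minus the Jaccard overlap of tokens, and that the adjustment is a score-function (REINFORCE-style) update. All function names, the baseline, and the learning rate are hypothetical.

```python
import math
import random


def candidate_triples(entities, relations):
    """Enumerate (head, relation, tail) candidates from entities and a relation list."""
    return [(h, r, t) for h in entities for t in entities if h != t for r in relations]


def encode(logits):
    """Stand-in for the first model: map per-triple logits to predicted probabilities."""
    return {t: 1.0 / (1.0 + math.exp(-v)) for t, v in logits.items()}


def sample_triples(probs, rng):
    """Stand-in for the sampling step: keep each triple with its predicted probability."""
    return {t for t, p in probs.items() if rng.random() < p}


def generate_text(triples):
    """Stand-in for the generator: render each sampled triple with a fixed template."""
    return " . ".join(f"{h} {r} {t}" for h, r, t in sorted(triples))


def penalty(source, generated):
    """Assumed difference measure: 1 - Jaccard overlap of lowercase tokens, in [0, 1]."""
    a, b = set(source.lower().split()), set(generated.lower().split())
    if not a and not b:
        return 0.0
    return 1.0 - len(a & b) / len(a | b)


def adjust(logits, probs, sampled, score, baseline=0.5, lr=1.0):
    """Assumed update rule: raise logits of sampled triples when the penalty is
    below the baseline, lower them when it is above (score-function estimator)."""
    reward = baseline - score
    return {
        t: v + lr * reward * ((1.0 if t in sampled else 0.0) - probs[t])
        for t, v in logits.items()
    }


def train(text, candidates, steps=200, seed=0):
    """Run the encode / sample / generate / penalize / adjust loop on one text."""
    rng = random.Random(seed)
    logits = {t: 0.0 for t in candidates}  # start with every triple at p = 0.5
    for _ in range(steps):
        probs = encode(logits)
        sampled = sample_triples(probs, rng)
        score = penalty(text, generate_text(sampled))
        logits = adjust(logits, probs, sampled, score)
    return encode(logits)


if __name__ == "__main__":
    text = "alice works_at ibm"
    cands = candidate_triples(["alice", "ibm"], ["works_at", "born_in"])
    for triple, p in sorted(train(text, cands).items(), key=lambda kv: -kv[1]):
        print(triple, round(p, 3))
```

In this toy setting, a sampled set whose generated text reproduces the input receives a penalty of zero, so the probabilities of its triples are pushed up. Sets that generate divergent text receive a larger penalty and are pushed down or left unchanged.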