CPC G06F 21/6254 (2013.01) [G06F 16/345 (2019.01); G06F 40/166 (2020.01); G06F 40/216 (2020.01); G06F 40/284 (2020.01); G06F 40/40 (2020.01); G06F 40/47 (2020.01); G06F 40/56 (2020.01); G06F 40/58 (2020.01); G06N 3/045 (2023.01); G06N 3/09 (2023.01); G06N 20/00 (2019.01)]
20 Claims
1. A computer-implemented method comprising:
populating a fake value for each entity within a set of entities, to generate a string of fake entity values that correspond to the entities, respectively;
inserting a sentinel token between adjacent fake values included in the string of fake entity values to generate first input data;
generating, by a natural language generation model, natural language sentences based on the first input data, wherein the natural language sentences comprise one or more fake values from the string of fake entity values;
performing pre-processing on the natural language sentences, to generate pre-processed natural language sentences;
analyzing the pre-processed natural language sentences to determine whether a fake value from the string of fake entity values is missing in the pre-processed natural language sentences;
in response to determining that the fake value is missing, summarizing, using a text summarization model, the pre-processed natural language sentences to generate a text summary;
concatenating the text summary with the fake value, to generate second input data;
generating, by a next sentence generation model, an additional natural language sentence, based on the second input data, wherein the additional natural language sentence comprises the fake value;
combining the additional natural language sentence with the pre-processed natural language sentences to generate a text portion comprising a first plurality of natural language sentences that are obtained as a result of the combining;
post-processing the result of the combining, to generate a collection of final natural language sentences; and
outputting the collection of final natural language sentences.
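The steps of claim 1 can be sketched roughly as follows. This is only an illustrative sketch, not the patented implementation: the sentinel string `<sep>`, the helper names, the placeholder pre- and post-processing, and the three toy model functions are all assumptions introduced here. In practice the three callables would wrap a natural language generation model, a text summarization model, and a next sentence generation model.

```python
from typing import Callable, Dict, List

SENTINEL = "<sep>"  # hypothetical sentinel token; the claim does not name one


def build_first_input(fake_values: List[str], sentinel: str = SENTINEL) -> str:
    """Insert the sentinel token between adjacent fake values."""
    return f" {sentinel} ".join(fake_values)


def preprocess(sentences: List[str]) -> List[str]:
    """Placeholder pre-processing: collapse whitespace, drop empty sentences."""
    cleaned = [" ".join(s.split()) for s in sentences]
    return [s for s in cleaned if s]


def find_missing(sentences: List[str], fake_values: List[str]) -> List[str]:
    """Return the fake values that appear in none of the sentences."""
    text = " ".join(sentences)
    return [v for v in fake_values if v not in text]


def postprocess(sentences: List[str]) -> List[str]:
    """Placeholder post-processing: drop exact duplicates, keep order."""
    return list(dict.fromkeys(sentences))


def generate_text(
    fake_by_entity: Dict[str, str],
    generate: Callable[[str], List[str]],   # stands in for the NLG model
    summarize: Callable[[str], str],        # stands in for the summarization model
    next_sentence: Callable[[str], str],    # stands in for the next sentence model
) -> List[str]:
    fake_values = list(fake_by_entity.values())
    first_input = build_first_input(fake_values)
    sentences = preprocess(generate(first_input))

    combined = list(sentences)
    for value in find_missing(sentences, fake_values):
        # Summarize the pre-processed sentences, then concatenate the
        # summary with the missing fake value to form the second input.
        summary = summarize(" ".join(sentences))
        second_input = f"{summary} {value}"
        combined.append(next_sentence(second_input))

    return postprocess(combined)


# Toy stand-ins for the three models (assumptions, for demonstration only).
def toy_generate(first_input: str) -> List[str]:
    values = [v.strip() for v in first_input.split(SENTINEL)]
    # Deliberately mentions only the first two values.
    return [f"{values[0]} lives in {values[1]}."]


def toy_summarize(text: str) -> str:
    return text[:60]


def toy_next_sentence(second_input: str) -> str:
    return f"Follow-up: {second_input}"


fakes = {"name": "Jane Roe", "city": "Springfield", "email": "jane@example.com"}
result = generate_text(fakes, toy_generate, toy_summarize, toy_next_sentence)
for sentence in result:
    print(sentence)
```

Because the toy generator omits the email value, the fallback path (summarize, concatenate, generate an additional sentence) runs once, so every fake value ends up in the final collection.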