CPC G06F 40/30 (2020.01) [G06F 16/3344 (2019.01); G06F 18/214 (2023.01); G06F 40/205 (2020.01); G06F 40/211 (2020.01); G06F 40/253 (2020.01); G06F 40/295 (2020.01); G06N 20/00 (2019.01); G06F 40/268 (2020.01); G06F 40/279 (2020.01); G06F 40/56 (2020.01); G10L 15/063 (2013.01); G10L 15/1822 (2013.01); G10L 15/22 (2013.01)] | 33 Claims |
1. A method of training a natural language generation system, the method comprising:
determining a prefix tree via a processor based on a training data source that includes a plurality of natural language sentences, the prefix tree linking one or more named entities included in the training data source to one or more attributes in the training data source;
determining a plurality of concepts via the processor based on the training data source;
parsing the plurality of natural language sentences via the processor based on an ontology to determine a plurality of parse tree structures representing the plurality of natural language sentences, the ontology including an ontological vocabulary class of linguistic features identifying words used to represent ontological entities and relationships within the training data source;
determining a plurality of concept expression templates via the processor by collapsing and parameterizing the plurality of parse tree structures based on the prefix tree, the ontology, and the plurality of concepts, the plurality of concept expression templates modeling how the plurality of natural language sentences express the plurality of concepts as applied to the named entities and the corresponding attributes; and
training the natural language generation system based on the plurality of concept expression templates to generate natural language output expressing a concept of the plurality of concepts about an input data set.
|
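The first and fourth steps of claim 1 (linking named entities to attributes in a prefix tree, then parameterizing sentences into concept expression templates) can be illustrated with a minimal sketch. It is not the claimed implementation: it operates on flat whitespace-split token sequences rather than parse tree structures, it omits the ontology and the concept determination step entirely, and every name in it (`TrieNode`, `build_prefix_tree`, `longest_entity_match`, `parameterize`, the sample entities and sentences) is hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class TrieNode:
    # Child nodes keyed by the next token of an entity name.
    children: dict = field(default_factory=dict)
    # Attributes linked to the entity whose token sequence ends at this node.
    attributes: dict = field(default_factory=dict)
    is_entity: bool = False


def build_prefix_tree(entities):
    """Build a token-level prefix tree from {entity name: {attribute: value}}."""
    root = TrieNode()
    for name, attrs in entities.items():
        node = root
        for token in name.split():
            node = node.children.setdefault(token, TrieNode())
        node.is_entity = True
        node.attributes.update(attrs)
    return root


def longest_entity_match(root, tokens, start):
    """Return (end_index, node) for the longest entity starting at `start`, else None."""
    node, best = root, None
    for i in range(start, len(tokens)):
        node = node.children.get(tokens[i])
        if node is None:
            break
        if node.is_entity:
            best = (i + 1, node)
    return best


def parameterize(root, sentence):
    """Replace entity mentions and their known attribute values with slots.

    Simplification (assumption): attribute values are single tokens and are
    only recognized after the owning entity has been seen in the sentence.
    """
    tokens = sentence.split()
    out = []
    known_values = {}  # attribute value -> attribute name
    i = 0
    while i < len(tokens):
        match = longest_entity_match(root, tokens, i)
        if match:
            end, node = match
            out.append("{ENTITY}")
            known_values.update({str(v): k for k, v in node.attributes.items()})
            i = end
        elif tokens[i] in known_values:
            out.append("{" + known_values[tokens[i]].upper() + "}")
            i += 1
        else:
            out.append(tokens[i])
            i += 1
    return " ".join(out)


# Hypothetical training data: two entities, each with one attribute.
root = build_prefix_tree({
    "Acme Corp": {"growth": "12%"},
    "Globex": {"growth": "7%"},
})
sentences = [
    "Acme Corp grew sales by 12% last quarter",
    "Globex grew sales by 7% last quarter",
]
templates = {parameterize(root, s) for s in sentences}
print(templates)  # {'{ENTITY} grew sales by {GROWTH} last quarter'}
```

Both sample sentences reduce to the same slot-bearing template, which is the sense in which the sketch "collapses" surface variation. A version closer to the claim would apply the same substitution to nodes of a parse tree and would use the ontology to decide which subtrees to collapse.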