| CPC G06F 40/40 (2020.01) [G06F 40/211 (2020.01); G06F 40/284 (2020.01)] | 20 Claims |

|
1. A computer-implemented method comprising:
accessing training data comprising a plurality of training examples comprising a first training example, wherein the first training example comprises a first natural language utterance and a first logical form for the first natural language utterance, and wherein the first natural language utterance comprises one or more keywords associated with a target robustness bucket;
replacing, in the first natural language utterance, the one or more keywords with one or more replacement terms sampled from a list of replacement terms to generate a second natural language utterance;
generating a second training example comprising the second natural language utterance and the first logical form;
accessing a data manufacturing template that defines structure of a system operation and constraints imposed on operators of the system operation;
generating a second logical form by filling slots for one or more of the operators in the data manufacturing template based on a schema and values for a system, wherein the one or more of the operators are associated with the target robustness bucket;
translating the second logical form into a third natural language utterance based on a grammar data structure that includes a custom grammar and a set of rules for translating logical form statements into corresponding natural language expressions comprising one or more target keywords, wherein the custom grammar comprises target keywords including the replacement terms from the list of replacement terms, which are used to control the translating such that natural language utterances will contain one or more of the target keywords;
generating a third training example comprising the second logical form and the third natural language utterance;
augmenting the training data by adding the second training example and the third training example to the plurality of training examples to generate an augmented training data set; and
training a machine learning model to generate logical forms for utterances using the augmented training data set.
|
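
The two augmentation paths recited in claim 1 (keyword replacement in an existing utterance, and template-driven generation of a new logical form that is then translated to an utterance) can be illustrated with a minimal Python sketch. It is not the claimed implementation: the keyword list, replacement terms, template, schema, grammar, SQL-like logical-form syntax, and all function names are hypothetical and chosen only for illustration. The final training step is omitted.

```python
import random
import re

# Hypothetical robustness bucket: utterances that contain the keyword "show".
BUCKET_KEYWORDS = ["show"]
REPLACEMENT_TERMS = ["display", "list", "give me"]

# Hypothetical data manufacturing template: the structure of an operation
# plus constraints on which operators may fill the "op" slot.
TEMPLATE = {
    "structure": "SELECT {column} FROM {table} WHERE {filter_column} {op} {value}",
    "constraints": {"op": [">", "<"]},
}

# Hypothetical schema and values for the target system.
SCHEMA = {
    "table": "employees",
    "columns": ["name", "salary"],
    "values": {"salary": [50000, 80000]},
}

# Hypothetical grammar: the replacement terms double as target keywords, so
# every generated utterance is guaranteed to contain one of them.
GRAMMAR = {
    "verbs": REPLACEMENT_TERMS,
    "ops": {">": "greater than", "<": "less than"},
}


def replace_keywords(utterance, keywords, replacements, rng):
    """Swap each bucket keyword for a replacement term sampled from the list."""
    out = utterance
    for kw in keywords:
        pattern = re.compile(r"\b" + re.escape(kw) + r"\b", re.IGNORECASE)
        out = pattern.sub(lambda _m: rng.choice(replacements), out)
    return out


def fill_template(template, schema, rng):
    """Pick slot values from the schema, honoring the operator constraints."""
    filter_column = rng.choice(list(schema["values"]))
    return {
        "column": rng.choice(schema["columns"]),
        "table": schema["table"],
        "filter_column": filter_column,
        "op": rng.choice(template["constraints"]["op"]),
        "value": rng.choice(schema["values"][filter_column]),
    }


def logical_form_to_utterance(slots, grammar, rng):
    """Translate filled slots into an utterance that starts with a target keyword."""
    verb = rng.choice(grammar["verbs"])
    op_phrase = grammar["ops"][slots["op"]]
    return (
        f"{verb} the {slots['column']} of {slots['table']} "
        f"whose {slots['filter_column']} is {op_phrase} {slots['value']}"
    )


def augment(training_data, rng):
    """Return the original examples plus keyword-swapped and templated ones."""
    augmented = list(training_data)
    # Path 1: new utterance, same logical form.
    for utterance, logical_form in training_data:
        swapped = replace_keywords(utterance, BUCKET_KEYWORDS, REPLACEMENT_TERMS, rng)
        if swapped != utterance:
            augmented.append((swapped, logical_form))
    # Path 2: new logical form from the template, translated via the grammar.
    slots = fill_template(TEMPLATE, SCHEMA, rng)
    new_logical_form = TEMPLATE["structure"].format(**slots)
    new_utterance = logical_form_to_utterance(slots, GRAMMAR, rng)
    augmented.append((new_utterance, new_logical_form))
    return augmented
```

Calling `augment([("show the salary of all employees", "SELECT salary FROM employees")], random.Random(0))` returns three pairs under these assumptions: the original example, a keyword-swapped example that reuses the first logical form, and a templated example whose utterance begins with one of the replacement terms. A model trained on the resulting list would correspond to the last step of the claim.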