| CPC G06F 40/289 (2020.01) [G06F 40/30 (2020.01); G06N 3/08 (2013.01); H04L 51/02 (2013.01)] | 20 Claims |

|
1. A computer-implemented method comprising:
receiving a training set of utterances comprising in-domain examples;
augmenting the training set of utterances with out-of-domain (OOD) examples to generate augmented batches of utterances for training a machine-learning model, wherein the augmenting comprises:
generating a data set of the OOD examples,
filtering out a plurality of OOD examples from the data set of the OOD examples, based on a determination that context of each of the plurality of OOD examples has a substantial similarity to context of one or more of the utterances of the training set of utterances, and
generating the augmented batches of utterances, each of the augmented batches of utterances comprising utterances from the training set of utterances and utterances from the filtered data set of the OOD examples; and
training the machine-learning model using the augmented batches of utterances, wherein the trained machine-learning model is configured to, based on one or more utterances provided as an input by a user, identify an intent from a set of predetermined intents,
wherein the substantial similarity between the context of OOD utterances of the data set of the OOD examples and the context of the utterances of the training set is determined based on a distance measure using a Multilingual Universal Sentence Encoder (MUSE) single embedding, and
wherein if $\min_{i} d_i$ is less than a predetermined threshold, then the context of an OOD utterance of the data set of the OOD examples and the context of an utterance of the training set of utterances are determined to be substantially similar,
where $d_i = \lVert v_i - u \rVert_2$ is the Euclidean distance between $v_i$ and $u$,
$v_i = \mathrm{MUSE}(x_i)$ is a vector representation of an utterance $x_i$ of the training set of utterances, for $i = 1, \ldots, n$, and
$u = \mathrm{MUSE}(\text{OOD utterance})$ is a vector representation of the OOD utterance of the data set of the OOD examples.
|
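A minimal Python sketch of the claimed filter-then-augment step follows. It assumes the MUSE embeddings have already been computed; the toy 2-D vectors are hypothetical stand-ins, not real MUSE outputs. The function names, the threshold value, and the per-batch mix of in-domain and OOD utterances (`in_domain_per_batch`, `ood_per_batch`) are illustrative assumptions that the claim does not specify.

```python
import math
import random
from typing import List, Sequence

Vector = Sequence[float]


def filter_ood(train_vecs: List[Vector], ood_texts: List[str],
               ood_vecs: List[Vector], threshold: float) -> List[str]:
    """Keep only OOD utterances whose nearest training embedding is NOT within threshold."""
    kept = []
    for text, u in zip(ood_texts, ood_vecs):
        # d_i = ||v_i - u||_2 for every training vector v_i; take the minimum over i.
        # With no training vectors, min_d is +inf and nothing is filtered.
        min_d = min((math.dist(v, u) for v in train_vecs), default=math.inf)
        if min_d < threshold:
            continue  # substantially similar to an in-domain utterance: filter it out
        kept.append(text)
    return kept


def augmented_batches(train: List[str], ood: List[str], in_domain_per_batch: int,
                      ood_per_batch: int, seed: int = 0) -> List[List[str]]:
    """Build batches mixing shuffled in-domain utterances with sampled filtered OOD ones."""
    rng = random.Random(seed)
    shuffled = train[:]
    rng.shuffle(shuffled)
    batches = []
    for start in range(0, len(shuffled), in_domain_per_batch):
        batch = shuffled[start:start + in_domain_per_batch]
        # Sample without replacement from the filtered OOD pool for this batch.
        batch += rng.sample(ood, min(ood_per_batch, len(ood)))
        batches.append(batch)
    return batches


# Toy usage: hypothetical 2-D vectors standing in for MUSE embeddings.
train_texts = ["book a flight", "cancel my flight"]
train_vecs = [[0.0, 0.0], [1.0, 0.0]]
ood_texts = ["reserve a plane ticket", "what's the weather", "tell me a joke"]
ood_vecs = [[0.1, 0.1], [5.0, 5.0], [-4.0, 3.0]]

kept = filter_ood(train_vecs, ood_texts, ood_vecs, threshold=0.5)
batches = augmented_batches(train_texts, kept, in_domain_per_batch=1, ood_per_batch=1)
```

Here "reserve a plane ticket" lies about 0.14 from "book a flight", below the 0.5 threshold, so it is dropped and only the two genuinely unrelated utterances enter the batches. One plausible rationale for filtering before batching is that near-paraphrases of in-domain utterances would otherwise be labeled out-of-domain, giving the intent classifier contradictory training signal.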