CPC G06F 40/35 (2020.01) [G06F 40/40 (2020.01); G06F 40/205 (2020.01); G10L 15/1815 (2013.01); G10L 15/22 (2013.01)] | 10 Claims |
1. A computer-implemented method comprising:
determining, by a chatbot system, a classification result for an utterance using a machine learning model, wherein the machine learning model is trained to, based on at least one input utterance, predict an intent;
determining, by the chatbot system, a plurality of sets of anchors based on the utterance, each of the plurality of sets of anchors corresponding to one or more anchor words of the utterance, wherein determining the plurality of sets of anchors includes selecting the plurality of sets of anchors in one or more rounds using a beam search technique, wherein each anchor word of the plurality of sets of anchors exceeds a uniqueness threshold;
for each anchor set of the plurality of sets of anchors:
generating, by the chatbot system, based on the utterance, one or more synthetic utterances corresponding to the anchor set, each synthetic utterance of the one or more synthetic utterances comprising the one or more anchor words of the anchor set,
determining, by the chatbot system, one or more classification results for the one or more synthetic utterances using the machine learning model, and
determining, by the chatbot system, a confidence value for the anchor set based on the one or more classification results;
selecting, by the chatbot system, a particular set of anchors among the plurality of sets of anchors based on a comparison of the confidence value for the particular set of anchors to a confidence threshold value;
generating, by the chatbot system, a report comprising a representation of one or more anchor words of the particular set of anchors and particular synthetic utterances corresponding to the particular set of anchors, the particular set of anchors corresponding to the confidence value greater than the confidence threshold value;
generating, by the chatbot system, a training dataset for retraining the machine learning model, the training dataset comprising the particular synthetic utterances; and
retraining, by the chatbot system, the machine learning model using the training dataset.
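The steps recited in claim 1 can be illustrated with a minimal sketch. It is not the claimed implementation. Everything named here is a hypothetical stand-in introduced for illustration: the keyword-rule `classify` function (in place of the trained intent model), the `BACKGROUND_FREQ` table and `uniqueness` score, the `FILLERS` replacement vocabulary, and all parameter names and default thresholds. Synthetic utterances are produced by keeping the anchor words and replacing every other word at random, and the confidence value is taken as the fraction of synthetic utterances that receive the same classification as the original utterance. Both choices are assumptions; the claim does not specify them.

```python
import random

# Hypothetical stand-in for the trained intent model: a keyword rule.
def classify(utterance):
    return "check_balance" if "balance" in utterance.lower().split() else "other"

# Hypothetical background frequencies; a rarer word scores as more unique.
BACKGROUND_FREQ = {"what": 0.9, "is": 0.95, "my": 0.8, "account": 0.3, "balance": 0.1}
# Hypothetical replacement vocabulary for the non-anchor positions.
FILLERS = ["please", "the", "today", "now", "order", "pizza"]

def uniqueness(word):
    return 1.0 - BACKGROUND_FREQ.get(word, 0.0)

def synthesize(words, anchor, n, rng):
    """Keep the words at anchor positions; replace every other word with a filler."""
    return [
        " ".join(w if i in anchor else rng.choice(FILLERS) for i, w in enumerate(words))
        for _ in range(n)
    ]

def explain(utterance, uniq_threshold=0.5, conf_threshold=0.95,
            beam_width=2, max_rounds=2, n_samples=20, seed=0):
    rng = random.Random(seed)
    words = utterance.lower().split()
    target = classify(utterance)  # classification result for the original utterance
    # Only words above the uniqueness threshold may become anchor words.
    candidates = [i for i, w in enumerate(words) if uniqueness(w) > uniq_threshold]

    scored = {}           # anchor set -> (confidence, synthetic utterances)
    beam = [frozenset()]  # anchor sets are stored as sets of word positions
    for _ in range(max_rounds):
        # Grow each anchor set in the beam by one candidate word.
        expanded = {a | {i} for a in beam for i in candidates if i not in a}
        if not expanded:
            break
        for a in sorted(expanded, key=sorted):
            if a not in scored:
                samples = synthesize(words, a, n_samples, rng)
                agree = sum(classify(s) == target for s in samples)
                scored[a] = (agree / n_samples, samples)
        # Keep only the highest-confidence anchor sets for the next round.
        beam = sorted(expanded, key=lambda a: scored[a][0], reverse=True)[:beam_width]

    # Report: anchor sets whose confidence exceeds the confidence threshold.
    report = [
        {"anchor_words": [words[i] for i in sorted(a)],
         "confidence": conf,
         "synthetic_utterances": samples}
        for a, (conf, samples) in scored.items() if conf > conf_threshold
    ]
    # Training dataset built from the reported synthetic utterances; pairing
    # each one with the predicted intent as its label is an assumption.
    dataset = [(s, target) for entry in report for s in entry["synthetic_utterances"]]
    return report, dataset

report, dataset = explain("what is my account balance")
for entry in report:
    print(entry["anchor_words"], entry["confidence"])
```

In this toy setting, only anchor sets containing "balance" clear the confidence threshold, since the stand-in classifier depends on that word alone. The retraining step is omitted because it depends on the model in use; the returned `dataset` is what would be passed to it.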