CPC G06F 40/289 (2020.01) [G06F 18/2431 (2023.01); G06F 40/284 (2020.01); G06N 20/00 (2019.01); G10L 15/063 (2013.01)]; 20 Claims
1. A system comprising:
one or more processors; and
one or more non-transitory computer-readable media storing computing instructions that, when executed on the one or more processors, cause the one or more processors to perform:
generating training data for an intent classification machine learning model by:
determining, via a text-to-text machine learning model, one or more respective paraphrases for each sample phrase of training phrases, wherein:
a respective quantity of the one or more respective paraphrases varies for the each sample phrase of the training phrases;
generating, via a label generating machine learning model, labeled data based on unlabeled live logs by:
determining live-log samples from the unlabeled live logs, comprising:
stratifying the unlabeled live logs into multiple data bins based on a respective timestamp of each of the unlabeled live logs; and
randomly selecting respective unlabeled live logs from each of the multiple data bins to add to the live-log samples;
wherein a respective quantity of the respective unlabeled live logs for the each of the multiple data bins is: (a) a predetermined number; or (b) proportional to a respective size of the each of the multiple data bins; and
generating, via the label generating machine learning model, the labeled data based on the live-log samples and one or more labeling functions; and
adding the one or more respective paraphrases for the each sample phrase of the training phrases and the labeled data to the training data; and
transmitting the training data, as generated, to the intent classification machine learning model for training.
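The data-generation steps recited in claim 1 can be illustrated with a minimal Python sketch. It rests on several assumptions that the claim does not state. Each live log is modeled as a `(timestamp, utterance)` pair. "Stratifying based on a respective timestamp" is implemented as equal-width time bins. A simple majority vote over labeling functions stands in for the claimed label generating machine learning model; a learned label model would be a drop-in replacement. A canned lookup table stands in for the text-to-text paraphrasing model. Each paraphrase is assumed to inherit the intent of its source phrase. Every function, labeling function, and the canned paraphrase table below is hypothetical and not taken from the patent.

```python
import random
from collections import Counter
from datetime import datetime, timedelta
from typing import Callable, Optional

# Hypothetical record shape: a live log is (timestamp, utterance).
LiveLog = tuple[datetime, str]
# A labeling function returns an intent label, or None to abstain.
LabelingFunction = Callable[[str], Optional[str]]


def stratify_by_time(logs: list[LiveLog], num_bins: int) -> list[list[LiveLog]]:
    """Split logs into num_bins equal-width time bins spanning the log timestamps."""
    bins: list[list[LiveLog]] = [[] for _ in range(num_bins)]
    if not logs:
        return bins
    start = min(ts for ts, _ in logs)
    width = (max(ts for ts, _ in logs) - start) / num_bins
    for ts, text in logs:
        # The latest log would compute to index num_bins, so clamp it into the last bin.
        idx = 0 if width == timedelta(0) else min(int((ts - start) / width), num_bins - 1)
        bins[idx].append((ts, text))
    return bins


def sample_bins(bins: list[list[LiveLog]], per_bin: Optional[int] = None,
                fraction: Optional[float] = None, seed: int = 0) -> list[LiveLog]:
    """Randomly draw from each bin either a fixed count (per_bin) or a count proportional to bin size (fraction)."""
    if (per_bin is None) == (fraction is None):
        raise ValueError("pass exactly one of per_bin or fraction")
    rng = random.Random(seed)
    samples: list[LiveLog] = []
    for b in bins:
        k = per_bin if per_bin is not None else round(fraction * len(b))
        samples.extend(rng.sample(b, min(k, len(b))))
    return samples


def apply_labeling_functions(samples: list[LiveLog],
                             lfs: list[LabelingFunction]) -> list[tuple[str, str]]:
    """Majority vote over non-abstaining labeling functions (a stand-in for a learned label model)."""
    labeled: list[tuple[str, str]] = []
    for _, text in samples:
        votes = [v for v in (lf(text) for lf in lfs) if v is not None]
        if votes:  # samples on which every function abstains stay unlabeled
            labeled.append((text, Counter(votes).most_common(1)[0][0]))
    return labeled


def build_training_data(training_phrases: list[tuple[str, str]],
                        paraphrase: Callable[[str], list[str]],
                        labeled: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Add each phrase's paraphrases (tagged with that phrase's intent), then the labeled live logs."""
    data: list[tuple[str, str]] = []
    for phrase, intent in training_phrases:
        data.extend((p, intent) for p in paraphrase(phrase))
    data.extend(labeled)
    return data


# Hypothetical labeling functions and paraphraser, for illustration only.
def lf_refund(text: str) -> Optional[str]:
    return "refund" if "refund" in text.lower() else None


def lf_order_status(text: str) -> Optional[str]:
    return "order_status" if "where is my" in text.lower() else None


CANNED = {
    "cancel my order": ["please cancel my order", "I'd like to cancel my order"],
    "check my balance": ["what is my balance"],
}


def toy_paraphrase(phrase: str) -> list[str]:
    # Stand-in for a text-to-text model; the number of paraphrases varies per phrase.
    return CANNED.get(phrase, [])
```

Passing `per_bin` corresponds to option (a) of the claim (a predetermined number per bin), and passing `fraction` corresponds to option (b) (a quantity proportional to each bin's size). Drawing from every time bin keeps quiet periods represented in the samples instead of letting peak-traffic periods dominate.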