CPC G06F 40/279 (2020.01) [G10L 25/30 (2013.01)] | 14 Claims |
1. A system, comprising a processor to:
pre-train a transformer-based language model on a general domain;
inter-train the pre-trained transformer-based language model using mask language modeling to generate a mask language modeling (MLM) inter-trained transformer-based pre-trained language model;
inter-train the MLM inter-trained transformer-based pre-trained language model using partitioning and classification to generate a doubly inter-trained transformer-based pre-trained language model, wherein the partitioning comprises a clustering based on bag of words representations on a stemmed text to partition, according to class labels, unlabeled training data into clusters of text instances and wherein the inter-training the MLM inter-trained transformer-based pre-trained language model comprises using the clusters of text instances as labeled data for an intermediate training task; and
fine-tune the doubly inter-trained transformer-based pre-trained language model on a target task to generate a fine-tuned transformer-based language model.
|
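The partitioning step recited in claim 1 (clustering bag-of-words representations of stemmed text, then treating the cluster assignments as labels for an intermediate classification task) can be illustrated with a minimal sketch. It is not the claimed implementation. The suffix-stripping `stem` function, the plain k-means loop with farthest-point initialization, the helper names `bow`, `cluster` and `pseudo_label`, and the caller-supplied cluster count `k` are all assumptions made for illustration. The claim does not name a stemmer, a clustering algorithm, or a way of choosing the number of clusters.

```python
import re
from collections import Counter


def stem(token):
    # Crude suffix stripper standing in for a real stemmer (assumption).
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def bow(text):
    # Bag-of-words counts over the stemmed, lower-cased tokens of one text.
    return Counter(stem(t) for t in re.findall(r"[a-z]+", text.lower()))


def sq_dist(a, b):
    # Squared Euclidean distance between two sparse count vectors.
    return sum((a.get(k, 0.0) - b.get(k, 0.0)) ** 2 for k in set(a) | set(b))


def mean(vectors):
    # Component-wise mean of a non-empty list of sparse count vectors.
    total = Counter()
    for v in vectors:
        total.update(v)
    return {k: c / len(vectors) for k, c in total.items()}


def cluster(texts, k, iters=10):
    # Plain k-means with deterministic farthest-point initialization
    # (an illustrative choice, not one taken from the claim).
    vecs = [bow(t) for t in texts]
    centroids = [vecs[0]]
    while len(centroids) < k:
        centroids.append(
            max(vecs, key=lambda v: min(sq_dist(v, c) for c in centroids))
        )
    labels = [0] * len(vecs)
    for _ in range(iters):
        labels = [
            min(range(k), key=lambda j: sq_dist(v, centroids[j])) for v in vecs
        ]
        for j in range(k):
            members = [v for v, lab in zip(vecs, labels) if lab == j]
            if members:  # keep the old centroid if a cluster empties out
                centroids[j] = mean(members)
    return labels


def pseudo_label(texts, k):
    # Pair each unlabeled text with its cluster id, so that the pairs can
    # serve as labeled data for an intermediate classification task.
    return list(zip(texts, cluster(texts, k)))


unlabeled = [
    "the team scored goals in the match",
    "players scoring a goal during matches",
    "the bank raised interest rates",
    "banks raising the interest rate",
]
for text, cluster_id in pseudo_label(unlabeled, k=2):
    print(cluster_id, text)
```

In a pipeline following the claim, pairs of this kind would be fed to a classification head on top of the MLM inter-trained model before the final fine-tuning on the target task. That training step is omitted here because it depends on the model and framework used.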