US 12,093,645 B2
Inter-training of pre-trained transformer-based language models using partitioning and classification
Eyal Shnarch, Tel Aviv (IL); Ariel Gera, Haifa (IL); Alon Halfon, Rishon Lezion (IL); Lena Dankin, Haifa (IL); Leshem Choshen, Haifa (IL); Ranit Aharonov, Ramat Hasharon (IL); and Noam Slonim, Beit HaKerem (IL)
Assigned to International Business Machines Corporation, Armonk, NY (US)
Filed by International Business Machines Corporation, Armonk, NY (US)
Filed on Sep. 14, 2021, as Appl. No. 17/474,364.
Prior Publication US 2023/0078698 A1, Mar. 16, 2023
Int. Cl. G06F 40/279 (2020.01); G10L 25/30 (2013.01)
CPC G06F 40/279 (2020.01) [G10L 25/30 (2013.01)] 14 Claims
OG exemplary drawing
 
1. A system, comprising a processor to:
pre-train a transformer-based language model on a general domain;
inter-train the pre-trained transformer-based language model using masked language modeling (MLM) to generate an MLM inter-trained transformer-based pre-trained language model;
inter-train the MLM inter-trained transformer-based pre-trained language model using partitioning and classification to generate a doubly inter-trained transformer-based pre-trained language model, wherein the partitioning comprises a clustering, based on bag-of-words representations of stemmed text, that partitions unlabeled training data into clusters of text instances serving as class labels, and wherein inter-training the MLM inter-trained transformer-based pre-trained language model comprises using the clusters of text instances as labeled data for an intermediate training task; and
fine-tune the doubly inter-trained transformer-based pre-trained language model on a target task to generate a fine-tuned transformer-based language model.
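The two inter-training steps recited in claim 1 can be illustrated with short sketches. The first sketch covers the initial inter-training step, masked language modeling over unlabeled target-domain text. It assumes, beyond the patent text, a Hugging Face BERT-style checkpoint standing in for the pre-trained general-domain model, the library's standard MLM data collator, and illustrative hyperparameters; it is a minimal sketch, not the patented implementation.

import torch
from torch.utils.data import DataLoader
from transformers import AutoTokenizer, AutoModelForMaskedLM, DataCollatorForLanguageModeling

def mlm_inter_train(model_name, texts, epochs=1, lr=3e-5, mask_prob=0.15):
    # Load the pre-trained general-domain model with its MLM head.
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForMaskedLM.from_pretrained(model_name)
    # The collator randomly masks tokens and supplies the MLM labels per batch.
    collator = DataCollatorForLanguageModeling(tokenizer, mlm_probability=mask_prob)
    enc = tokenizer(texts, truncation=True, padding=True)
    features = [{"input_ids": ids, "attention_mask": am}
                for ids, am in zip(enc["input_ids"], enc["attention_mask"])]
    loader = DataLoader(features, batch_size=16, shuffle=True, collate_fn=collator)
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
    model.train()
    for _ in range(epochs):
        for batch in loader:
            # batch contains masked input_ids, attention_mask, and labels.
            loss = model(**batch).loss
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return model  # the MLM inter-trained transformer-based pre-trained language model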
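The second sketch covers the partition-and-classify inter-training step: unlabeled text is stemmed, represented as bag-of-words vectors, clustered, and the resulting cluster ids are used as class labels for an intermediate classification task. The choice of scikit-learn KMeans, the NLTK Porter stemmer, and the number of clusters k are assumptions for illustration, not taken from the patent.

import torch
from torch.utils.data import DataLoader, TensorDataset
from nltk.stem import PorterStemmer
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.cluster import KMeans
from transformers import AutoTokenizer, AutoModelForSequenceClassification

def cluster_pseudo_labels(texts, k=10):
    # Partition unlabeled texts into k clusters over stemmed bag-of-words vectors.
    stemmer = PorterStemmer()
    base_analyzer = CountVectorizer().build_analyzer()
    vectorizer = CountVectorizer(
        analyzer=lambda doc: [stemmer.stem(tok) for tok in base_analyzer(doc)])
    bows = vectorizer.fit_transform(texts)  # sparse bag-of-words matrix
    return KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(bows)

def inter_train_with_clusters(model_name, texts, k=10, epochs=1, lr=3e-5):
    # Use the cluster ids as labels for an intermediate classification task.
    labels = cluster_pseudo_labels(texts, k)
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=k)
    enc = tokenizer(texts, truncation=True, padding=True, return_tensors="pt")
    dataset = TensorDataset(enc["input_ids"], enc["attention_mask"],
                            torch.tensor(labels, dtype=torch.long))
    loader = DataLoader(dataset, batch_size=16, shuffle=True)
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
    model.train()
    for _ in range(epochs):
        for input_ids, attention_mask, y in loader:
            loss = model(input_ids=input_ids, attention_mask=attention_mask, labels=y).loss
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return model  # the doubly inter-trained model, ready for fine-tuning on the target task

In this sketch, model_name would point at the output of the MLM inter-training step, and the returned model is what the final claim element fine-tunes on the labeled target task.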