US 12,288,550 B2
Framework for focused training of language models and techniques for end-to-end hypertuning of the framework
Poorya Zaremoodi, Melbourne (AU); Cong Duy Vu Hoang, Melbourne (AU); Duy Vu, Melbourne (AU); Dai Hoang Tran, Sydney (AU); Budhaditya Saha, Sydney (AU); Nagaraj N. Bhat, Bengaluru (IN); Thanh Tien Vu, Herston (AU); Tuyen Quang Pham, Springvale (AU); Adam Craig Pocock, Burlington, MA (US); Katherine Silverstein, Somerville, MA (US); Srinivasa Phani Kumar Gadde, Fremont, CA (US); Vishal Vishnoi, Redwood City, CA (US); Mark Edward Johnson, Sydney (AU); and Thanh Long Duong, Seabrook (AU)
Assigned to ORACLE INTERNATIONAL CORPORATION, Redwood Shores, CA (US)
Filed by Oracle International Corporation, Redwood Shores, CA (US)
Filed on Sep. 23, 2022, as Appl. No. 17/952,116.
Claims priority of provisional application 63/248,209, filed on Sep. 24, 2021.
Prior Publication US 2023/0098783 A1, Mar. 30, 2023
Int. Cl. G10L 15/06 (2013.01); G10L 15/183 (2013.01)
CPC G10L 15/063 (2013.01) [G10L 15/183 (2013.01); G10L 2015/0635 (2013.01)] 20 Claims
OG exemplary drawing
 
1. A computer-implemented method, comprising:
obtaining a machine learning model pre-trained for language modeling;
post-training the machine learning model for multiple tasks to generate a focused machine learning model, wherein the post-training comprises:
training the machine learning model using an unlabeled set of training data, wherein the unlabeled set of training data pertains to a task of the multiple tasks, the machine learning model is pre-trained for the task as part of the language modeling, and the unlabeled set of training data pertains to a target domain, a target task, or a target language,
wherein said training comprises performing iterative training operations to further optimize model parameters of the machine learning model to encode information related to the target domain, the target task, or the target language,
further training the machine learning model using a labeled set of training data, wherein the labeled set of training data pertains to another task of the multiple tasks, the another task being an auxiliary task that is related to a downstream task to be performed using the machine learning model or output from the machine learning model, and
wherein said further training comprises performing iterative training operations to further optimize the model parameters of the machine learning model to encode auxiliary information related to the downstream task; and
providing the focused machine learning model comprising the optimized model parameters.
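For illustration only, the claimed two-phase post-training can be sketched as a short training loop: phase one continues self-supervised training on unlabeled target-domain/task/language data, and phase two trains the same parameters on a labeled auxiliary task related to the downstream task. The sketch below is a minimal, hypothetical PyTorch rendering, not the patented implementation; the model methods `self_supervised_loss` and `auxiliary_task_loss` and the data-loader fields are assumed names introduced here for readability.

```python
# Minimal sketch of the claimed focused post-training (hypothetical names).
import torch

def post_train(model, unlabeled_loader, labeled_loader,
               lr=1e-5, epochs=1, device="cpu"):
    """Two-phase post-training of a pre-trained language model:
    (1) continued self-supervised training on unlabeled in-domain data,
    (2) supervised training on a labeled auxiliary task."""
    model.to(device)
    opt = torch.optim.AdamW(model.parameters(), lr=lr)

    # Phase 1: iterative training on unlabeled data to further optimize
    # model parameters to encode target-domain / target-task /
    # target-language information.
    for _ in range(epochs):
        for batch in unlabeled_loader:
            # Assumes the model computes a self-supervised (e.g., masked-LM)
            # loss from input token IDs alone -- an assumed interface.
            loss = model.self_supervised_loss(batch["input_ids"].to(device))
            opt.zero_grad()
            loss.backward()
            opt.step()

    # Phase 2: iterative training on labeled auxiliary-task data to further
    # optimize the same parameters to encode auxiliary information related
    # to the downstream task.
    for _ in range(epochs):
        for batch in labeled_loader:
            loss = model.auxiliary_task_loss(
                batch["input_ids"].to(device),
                batch["labels"].to(device))
            opt.zero_grad()
            loss.backward()
            opt.step()

    # The returned model is the "focused" model with optimized parameters.
    return model
```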