US 12,437,162 B2
Removing undesirable signals from language models using negative data
Michael Louis Wick, Lexington, MA (US); Jean-Baptiste Frederic George Tristan, Lexington, MA (US); Adam Craig Pocock, Burlington, MA (US); and Katherine Silverstein, Somerville, MA (US)
Assigned to ORACLE INTERNATIONAL CORPORATION, Redwood Shores, CA (US)
Filed by Oracle International Corporation, Redwood Shores, CA (US)
Filed on Jun. 2, 2020, as Appl. No. 16/890,097.
Prior Publication US 2021/0374361 A1, Dec. 2, 2021
Int. Cl. G06F 40/30 (2020.01); G06F 40/58 (2020.01)
CPC G06F 40/58 (2020.01) 19 Claims
OG exemplary drawing
 
1. A method for training a language model using negative data, the method comprising:
accessing a first training corpus comprising positive training data for training a first language model;
accessing a second language model, wherein the second language model comprises an n-gram model, or a transformer model that is inhibited by removing position information from the transformer model, and is configured to generate outputs that are less grammatically correct than outputs generated by the first language model;
generating output text from the second language model to use as a second training corpus of negative training data; and
training the first language model using at least the first training corpus, the second training corpus, and a maximum likelihood function, wherein the maximum likelihood function maximizes a likelihood of the first language model predicting the positive training data while minimizing a likelihood of the first language model predicting the negative training data.
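The training objective recited in the claim can be sketched as a contrastive maximum-likelihood update: raise the model's log-likelihood on the positive corpus while lowering it on the negative corpus sampled from the weaker second model. The toy below is an illustrative unigram model, not the patented implementation; the vocabulary, the `neg_weight` hyperparameter, and all function names are assumptions introduced for the sketch.

```python
import math

# Toy unigram "language model" over a tiny vocabulary, trained with the
# objective described in the claim: maximize likelihood of positive data
# while minimizing likelihood of negative data.  Illustrative only.

VOCAB = ["the", "cat", "sat", "sat the", "the the"]

def softmax(logits):
    # Numerically stable softmax over a list of logits.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

def train(positive, negative, steps=500, lr=0.5, neg_weight=0.5):
    """Gradient ascent on sum log p(pos) - neg_weight * sum log p(neg)."""
    idx = {w: i for i, w in enumerate(VOCAB)}
    logits = [0.0] * len(VOCAB)
    for _ in range(steps):
        probs = softmax(logits)
        grad = [0.0] * len(VOCAB)
        # d/dlogits of log softmax(target) is (onehot(target) - probs).
        for w in positive:   # push probability toward positive tokens
            for i in range(len(VOCAB)):
                grad[i] += (1.0 if i == idx[w] else 0.0) - probs[i]
        for w in negative:   # push probability away from negative tokens
            for i in range(len(VOCAB)):
                grad[i] -= neg_weight * ((1.0 if i == idx[w] else 0.0) - probs[i])
        logits = [l + lr * g for l, g in zip(logits, grad)]
    return softmax(logits)

# "sat the" / "the the" stand in for the less grammatical output that a
# position-ablated or n-gram second model would generate.
probs = train(positive=["the", "cat", "sat"], negative=["sat the", "the the"])
```

After training, the model assigns higher probability to the grammatical (positive) tokens than to the ungrammatical (negative) ones, which is the intended effect of combining the two corpora under a single maximum-likelihood objective.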