US 12,153,896 B2
Method and system for controlling distributions of attributes in language models for text generation
Marc Dymetman, Grenoble (FR); Hady Elsahar, Grenoble (FR); and Muhammad Khalifa, Grenoble (FR)
Assigned to Naver Corporation (KR)
Filed by Naver Corporation, Seongnam-si (KR)
Filed on Aug. 2, 2021, as Appl. No. 17/391,178.
Claims priority of application No. 2010054 (FR), filed on Oct. 1, 2020; and application No. 21305835 (EP), filed on Jun. 17, 2021.
Prior Publication US 2022/0108081 A1, Apr. 7, 2022
Int. Cl. G06F 40/40 (2020.01); G06F 40/10 (2020.01); G06F 40/284 (2020.01); G06F 40/30 (2020.01); G06N 3/08 (2023.01); G06N 7/01 (2023.01); G06N 20/00 (2019.01)
CPC G06F 40/40 (2020.01) [G06F 40/10 (2020.01); G06N 7/01 (2023.01); G06N 20/00 (2019.01)]
22 Claims
 
1. A method for generating, from a pre-trained language model, a target language model for controlled text generation, the target language model having minimal divergence with pre-trained language model distribution, comprising:
(a) receiving a pre-trained language model having attributes with existing probability distributions over the pre-trained language model;
(b) receiving at least one target constraint, the received target constraint specifying an expectation of a target attribute over the target language model, the target language model approximating the pre-trained language model;
(c) computing parameters of an energy based model by applying the received target constraint to the pre-trained language model;
(d) obtaining samples from a reference policy;
(e) updating parameters of a target policy using the obtained samples from the reference policy and the energy based model;
(f) updating the reference policy with the target policy if a first distance between the target policy and an implicit probability distribution, the implicit probability distribution being represented by the energy based model, is smaller than a second distance between the reference policy and the implicit probability distribution represented by the energy based model, the first and second distances being calculated as a divergence;
(g) repeating (d), (e) and (f) until the target policy converges with the target constraint; and
(h) outputting the target policy as the target language model having minimal divergence with pre-trained language model distribution and configured to generate controlled text with the target attribute over a probability distribution approximating a probability distribution specified by the target constraint.
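
Illustrative note (not part of the claim): steps (a) through (h) can be sketched on a toy problem in which the "language model" is a categorical distribution over four outcomes, so that every quantity can be computed exactly. The sketch rests on several assumptions that the claim does not state: the energy based model takes the exponential form P(x) = a(x) exp(lam * phi(x)), with lam found by bisection; the divergence in step (f) is KL(p || policy); the target policy is a softmax over logits updated with importance-weighted gradients; and step (g) is replaced by a fixed number of iterations. All function and parameter names (`fit_ebm`, `kl_adaptive_dpg`, `lr`, `batch`) are hypothetical.

```python
import math
import random


def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    z = sum(exps)
    return [e / z for e in exps]


def kl(p, q):
    # KL(p || q) for two categorical distributions over the same outcomes
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def fit_ebm(a, phi, target, iters=100):
    # Step (c), assumed form: P(x) = a(x) * exp(lam * phi(x)).
    # E_p[phi] increases monotonically with lam, so bisection finds the lam
    # whose normalised distribution meets the target expectation.
    lo, hi = -20.0, 20.0
    for _ in range(iters):
        lam = (lo + hi) / 2
        scores = [ai * math.exp(lam * f) for ai, f in zip(a, phi)]
        mean = sum(s * f for s, f in zip(scores, phi)) / sum(scores)
        if mean < target:
            lo = lam
        else:
            hi = lam
    return lam, scores


def kl_adaptive_dpg(a, phi, target, steps=3000, batch=32, lr=0.1, seed=0):
    rng = random.Random(seed)
    n = len(a)
    _, scores = fit_ebm(a, phi, target)           # steps (a)-(c)
    z = sum(scores)
    p = [s / z for s in scores]                   # exact only in this toy setting
    theta = [math.log(ai) for ai in a]            # target policy starts at a
    q = softmax(theta)                            # reference policy
    for _ in range(steps):                        # step (g), fixed budget here
        xs = rng.choices(range(n), weights=q, k=batch)   # step (d)
        pi = softmax(theta)
        grad = [0.0] * n
        for x in xs:
            w = scores[x] / q[x]                  # importance weight P(x)/q(x)
            for j in range(n):
                # gradient of log softmax: indicator(j == x) - pi[j]
                grad[j] += w * ((1.0 if j == x else 0.0) - pi[j])
        theta = [t + lr * g / batch for t, g in zip(theta, grad)]  # step (e)
        pi = softmax(theta)
        if kl(p, pi) < kl(p, q):                  # step (f)
            q = pi
    return q, p                                   # step (h): q is the output policy


if __name__ == "__main__":
    a = [0.4, 0.4, 0.1, 0.1]      # toy "pre-trained" distribution
    phi = [0, 0, 1, 1]            # binary attribute; E_a[phi] = 0.2
    q, p = kl_adaptive_dpg(a, phi, target=0.5)
    print("E_q[phi] =", sum(qi * f for qi, f in zip(q, phi)))
```

In a real setting the normaliser of P is intractable and the divergences in step (f) would have to be estimated from samples rather than computed exactly as they are here.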