US 12,443,846 B2
Dialogue training with rich reference-free discriminators
Linfeng Song, Palo Alto, CA (US)
Assigned to TENCENT AMERICA LLC, Palo Alto, CA (US)
Filed by TENCENT AMERICA LLC, Palo Alto, CA (US)
Filed on Apr. 23, 2024, as Appl. No. 18/643,120.
Application 18/643,120 is a continuation of application No. 17/181,475, filed on Feb. 22, 2021, granted, now 11,995,542.
Prior Publication US 2024/0273369 A1, Aug. 15, 2024
This patent is subject to a terminal disclaimer.
Int. Cl. G06N 3/047 (2023.01); G06F 40/20 (2020.01); G06N 3/08 (2023.01); G06N 5/04 (2023.01); G10L 15/06 (2013.01); G10L 15/18 (2013.01)
CPC G06N 3/08 (2013.01) [G06F 40/20 (2020.01); G06N 3/047 (2023.01); G06N 5/041 (2013.01); G10L 15/063 (2013.01); G10L 15/1815 (2013.01)]
15 Claims
1. A method of using a neural network based open-domain dialogue model for open-domain dialogue response generation, the method comprising:
receiving an input utterance from a device having a conversation with a trained neural network based open-domain dialogue model;
obtaining, from the trained neural network based open-domain dialogue model, a response, wherein the trained neural network based open-domain dialogue model is trained based on quality scores of candidate replies using a highest quality score and a quality score of a randomly selected candidate reply among the candidate replies, and wherein obtaining the response comprises:
obtaining at least one candidate reply to the input utterance;
obtaining at least one quality score corresponding to the at least one candidate reply, wherein the at least one quality score is based on a plurality of reference-free discriminators, and wherein the plurality of the reference-free discriminators evaluate at least one of:
a fluency of the at least one candidate reply by determining a perplexity of candidate replies based on contextual information,
a specificity of the at least one candidate reply by determining normalized inverse function document frequencies of the candidate replies, and
a consistency of the at least one candidate reply by calculating a probability that the at least one candidate reply contradicts previous candidate replies output by the trained neural network based open-domain dialogue model during the conversation; and
determining the response based on the at least one quality score corresponding to the at least one candidate reply.
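The scoring and selection steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation: the claim does not give formulas, so the mapping from perplexity to a fluency score, the normalization of inverse document frequency by log(n), the unweighted mean used to combine the three scores, and all function names (`perplexity`, `build_idf`, `specificity_score`, `quality_score`, `select_response`) are assumptions made for illustration. The per-token log-probabilities and the contradiction probability are taken as inputs that a language model and a contradiction classifier would supply.

```python
import math
from collections import Counter


def perplexity(token_logprobs):
    """Perplexity of a reply from its per-token natural-log probabilities."""
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def fluency_score(token_logprobs):
    # Assumption: fluency is 1 / perplexity, so lower perplexity scores higher.
    if not token_logprobs:
        return 0.0
    return 1.0 / perplexity(token_logprobs)


def build_idf(corpus):
    """Build an IDF table from a corpus given as a list of token lists."""
    n = len(corpus)
    doc_freq = Counter(tok for doc in corpus for tok in set(doc))
    idf = {tok: math.log(n / count) for tok, count in doc_freq.items()}
    return idf, math.log(n)  # log(n) is the largest IDF any seen token can have


def specificity_score(tokens, idf, max_idf):
    # Assumption: mean token IDF divided by max_idf, giving a value in [0, 1].
    # Tokens missing from the table are treated as maximally specific.
    if not tokens or max_idf == 0:
        return 0.0
    return sum(idf.get(t, max_idf) for t in tokens) / (len(tokens) * max_idf)


def consistency_score(p_contradiction):
    # p_contradiction: probability that the reply contradicts earlier replies,
    # assumed to come from an external classifier.
    return 1.0 - p_contradiction


def quality_score(token_logprobs, tokens, idf, max_idf, p_contradiction):
    # Assumption: the three discriminator scores are combined by a plain mean.
    parts = (
        fluency_score(token_logprobs),
        specificity_score(tokens, idf, max_idf),
        consistency_score(p_contradiction),
    )
    return sum(parts) / len(parts)


def select_response(candidates, scores):
    """Return the candidate reply with the highest quality score."""
    best_index = max(range(len(candidates)), key=lambda i: scores[i])
    return candidates[best_index]
```

In this sketch a generic reply made of very common tokens receives a low specificity score, while a reply containing rarer tokens scores closer to 1; other normalizations or weighted combinations would fit the claim language equally well.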