US 12,217,146 B2
Generating dual sequence inferences using a neural network model
Victor Zhong, Palo Alto, CA (US); Caiming Xiong, Menlo Park, CA (US); and Richard Socher, Menlo Park, CA (US)
Assigned to Salesforce, Inc., San Francisco, CA (US)
Filed by Salesforce, Inc., San Francisco, CA (US)
Filed on Oct. 20, 2021, as Appl. No. 17/506,033.
Application 17/506,033 is a continuation of application No. 15/881,582, filed on Jan. 26, 2018, granted, now 11,170,287.
Claims priority of provisional application 62/578,380, filed on Oct. 27, 2017.
Prior Publication US 2022/0044093 A1, Feb. 10, 2022
Int. Cl. G06N 3/04 (2023.01); G06F 16/2458 (2019.01); G06F 16/93 (2019.01); G06N 3/006 (2023.01); G06N 3/044 (2023.01); G06N 3/045 (2023.01); G06N 3/08 (2023.01); G06N 5/04 (2023.01); G06N 3/048 (2023.01)

CPC G06N 3/04 (2013.01) [G06F 16/2462 (2019.01); G06F 16/93 (2019.01); G06N 3/006 (2013.01); G06N 3/044 (2023.01); G06N 3/045 (2023.01); G06N 3/08 (2013.01); G06N 5/04 (2013.01); G06N 3/048 (2023.01)]

20 Claims

1. A computer-implemented method for training a neural network model, the method comprising:

generating, using the neural network model, a series of inferences from a first training sequence and a second training sequence;

generating a mixed learning objective from the series of inferences, wherein the generating further comprises:

determining, using a supervised learning objective, a first loss or a first reward for an inference in the series of inferences independently of other inferences in the series of inferences;

determining, using a reinforcement learning objective, a second loss or a second reward over the series of inferences by:

determining a baseline score using a scoring function, wherein the baseline score is based on baseline start and end positions of an answer span and ground truth start and end positions of the answer span;

determining a reinforcement learning reward function based on the scoring function and the baseline score, wherein the reinforcement learning reward function is based on the ground truth start and end positions of the answer span and start and end positions of an inference answer span; and

determining the second loss or the second reward based on the reinforcement learning reward function; and

combining the supervised learning objective and the reinforcement learning objective into a mixed objective using the first and second loss or the first and second reward; and

updating parameters of the neural network model based on a loss or a reward in the mixed learning objective.
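The training steps recited in claim 1 can be sketched in plain Python as follows. This is a minimal, non-authoritative illustration, not the patented implementation. Several choices are assumptions: the scoring function is taken to be a positional span F1, the baseline span is taken from greedy (argmax) decoding, the reward is assigned to the final inference of the series, and the two objectives are mixed with a fixed weight `lam`. Every function name (`span_f1`, `greedy_span`, `supervised_loss`, `rl_loss`, `mixed_objective`) is hypothetical. The sketch only computes the scalar mixed loss. The final parameter-update step would be performed by backpropagating that loss in an automatic-differentiation framework.

```python
import math
from typing import List, Sequence, Tuple

Span = Tuple[int, int]             # inclusive (start, end) token positions
Logits = Sequence[float]
Inference = Tuple[Logits, Logits]  # (start logits, end logits) for one inference


def log_softmax(logits: Logits) -> List[float]:
    """Numerically stable log-softmax over a list of scores."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def span_f1(pred: Span, gold: Span) -> float:
    """Assumed scoring function: F1 of the positional overlap of two spans."""
    (ps, pe), (gs, ge) = pred, gold
    pred_len, gold_len = pe - ps + 1, ge - gs + 1
    overlap = min(pe, ge) - max(ps, gs) + 1
    if pred_len <= 0 or gold_len <= 0 or overlap <= 0:
        return 0.0  # invalid span (end before start) or no overlap
    precision, recall = overlap / pred_len, overlap / gold_len
    return 2 * precision * recall / (precision + recall)


def greedy_span(inference: Inference) -> Span:
    """Assumed baseline: independent argmax of the start and end scores."""
    start_logits, end_logits = inference
    start = max(range(len(start_logits)), key=lambda i: start_logits[i])
    end = max(range(len(end_logits)), key=lambda i: end_logits[i])
    return start, end


def span_log_prob(inference: Inference, span: Span) -> float:
    """log p(start) + log p(end) under one inference's distributions."""
    start_logits, end_logits = inference
    return log_softmax(start_logits)[span[0]] + log_softmax(end_logits)[span[1]]


def supervised_loss(series: List[Inference], gold: Span) -> float:
    """Cross-entropy computed for each inference independently, then summed."""
    return sum(-span_log_prob(inf, gold) for inf in series)


def rl_loss(series: List[Inference], sampled: List[Span], gold: Span) -> float:
    """Policy-gradient surrogate over the whole series of inferences.

    Reward = score(final sampled span) - score(baseline span), where the
    baseline is the greedy span of the final inference (an assumption).
    """
    reward = span_f1(sampled[-1], gold) - span_f1(greedy_span(series[-1]), gold)
    log_prob = sum(span_log_prob(inf, s) for inf, s in zip(series, sampled))
    return -reward * log_prob


def mixed_objective(series, sampled, gold, lam=0.5):
    """Fixed-weight mix of both objectives; `lam` is a hypothetical weight."""
    ce = supervised_loss(series, gold)
    rl = rl_loss(series, sampled, gold)
    return lam * ce + (1 - lam) * rl


# Usage: two inferences over a 4-token context, ground truth span (1, 2).
series = [([0.0, 1.0, 0.5, 0.0], [0.0, 0.2, 1.0, 0.3]),
          ([0.0, 0.0, 5.0, 0.0], [0.0, 0.0, 0.0, 5.0])]
sampled = [(1, 2), (1, 2)]
print(mixed_objective(series, sampled, gold=(1, 2)))
```

Subtracting the baseline score makes the reinforcement term zero when the sampled span scores exactly as well as the baseline span. Minimizing it raises the probability of sampled spans that beat the baseline and lowers the probability of those that score worse. A practical system would compute these quantities as tensors, treating the reward as a constant, so that gradients flow only through the log-probabilities to the model parameters.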