| CPC G06N 3/04 (2013.01) [G06F 16/2462 (2019.01); G06F 16/93 (2019.01); G06N 3/006 (2013.01); G06N 3/044 (2023.01); G06N 3/045 (2023.01); G06N 3/08 (2013.01); G06N 5/04 (2013.01); G06N 3/048 (2023.01)] | 20 Claims | 

| 
               1. A computer-implemented method for training a neural network model, the method comprising:
                 generating, using the neural network model, a series of inferences from a first training sequence and a second training sequence;
                 generating a mixed learning objective from the series of inferences, wherein the generating further comprises:
                   determining, using a supervised learning objective, a first loss or a first reward for an inference in the series of inferences independently of other inferences in the series of inferences;
                   determining, using a reinforcement learning objective, a second loss or a second reward over the series of inferences by:
                     determining a baseline score using a scoring function, wherein the baseline score is based on baseline start and end positions of an answer span and a ground truth start and end positions of the answer span;
                     determining a reinforcement learning reward function based on the scoring function and the baseline score, wherein the reinforcement learning reward function is based on the ground truth start and end positions of the answer span and a start and an end positions of an inference answer span; and
                     determining the second loss or the second reward based on the reinforcement learning reward function; and
                   combining the supervised learning objective and the reinforcement learning objective into a mixed objective using the first and second loss or the first and second reward; and
                 updating parameters of the neural network model based on a loss or a reward in the mixed learning objective.
               |
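The steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the claimed implementation, and it rests on several assumptions the claim does not state: the scoring function is taken to be token-overlap F1 between spans, the baseline span is the greedy (argmax) prediction, the inference span is sampled from the predicted distributions, and the two objectives are combined by a fixed weighted sum. All function and parameter names (`span_f1`, `supervised_loss`, `rl_loss`, `mixed_objective`, `ce_weight`, `rl_weight`) are hypothetical.

```python
import math
import random


def span_f1(pred_start, pred_end, true_start, true_end):
    """Token-overlap F1 between two inclusive spans (assumed scoring function)."""
    overlap = min(pred_end, true_end) - max(pred_start, true_start) + 1
    if overlap <= 0:
        return 0.0
    precision = overlap / (pred_end - pred_start + 1)
    recall = overlap / (true_end - true_start + 1)
    return 2 * precision * recall / (precision + recall)


def supervised_loss(p_start, p_end, true_start, true_end):
    """First loss: cross-entropy of the ground-truth positions for one
    inference, computed independently of the other inferences."""
    return -(math.log(p_start[true_start]) + math.log(p_end[true_end]))


def rl_loss(p_start, p_end, true_start, true_end, rng):
    """Contribution of one inference to the second loss: reward times
    negative log-probability of the sampled span."""
    positions = range(len(p_start))
    # Baseline span: greedy argmax of each distribution (assumption).
    base_start = max(positions, key=p_start.__getitem__)
    base_end = max(positions, key=p_end.__getitem__)
    baseline = span_f1(base_start, base_end, true_start, true_end)
    # Inference answer span: sampled from the distributions (assumption).
    samp_start = rng.choices(positions, weights=p_start)[0]
    samp_end = rng.choices(positions, weights=p_end)[0]
    # Reward function: score of the sampled span minus the baseline score.
    reward = span_f1(samp_start, samp_end, true_start, true_end) - baseline
    log_prob = math.log(p_start[samp_start]) + math.log(p_end[samp_end])
    return -reward * log_prob


def mixed_objective(inferences, true_start, true_end, rng,
                    ce_weight=0.5, rl_weight=0.5):
    """Combine both losses over a series of (p_start, p_end) inferences.
    The fixed weights are an assumption; the claim only says 'combining'."""
    ce = sum(supervised_loss(ps, pe, true_start, true_end)
             for ps, pe in inferences)
    rl = sum(rl_loss(ps, pe, true_start, true_end, rng)
             for ps, pe in inferences)
    return ce_weight * ce + rl_weight * rl


if __name__ == "__main__":
    rng = random.Random(0)
    # Two inferences over a 4-token document; ground-truth span is (1, 2).
    series = [
        ([0.1, 0.6, 0.2, 0.1], [0.1, 0.2, 0.6, 0.1]),
        ([0.05, 0.8, 0.1, 0.05], [0.05, 0.1, 0.8, 0.05]),
    ]
    print(mixed_objective(series, 1, 2, rng))
```

The final step of the claim, updating the model parameters, is omitted here: in practice the mixed loss would be differentiated with respect to the parameters by an automatic differentiation framework, with the reward treated as a constant.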