| CPC G10L 15/16 (2013.01) [G06N 3/045 (2023.01); G10L 15/26 (2013.01)] | 20 Claims |

|
1. A computer-implemented method for training a neural network to compute a first output corresponding to a first input, comprising:
obtaining a corpus of training data;
initializing parameters of the neural network; and
training the parameters of the neural network with a plurality of update steps, wherein a first update step comprises:
determining a first future-context size by sampling a probability distribution to choose a value for the first future-context size with a probability defined by a density function of the probability distribution, wherein the first future-context size corresponds to an amount of input that is future to the first input to be used in computing the first output,
masking the neural network using the first future-context size to obtain a first masked neural network,
computing the first output of the neural network by processing a first sample of the training data with the first masked neural network,
computing a first loss value using the first output, and
updating the parameters of the neural network using the first loss value.
|
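For illustration only, the update step recited in claim 1 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the claimed implementation. The claim does not specify the distribution, the network, the form of the mask, or the loss, so everything here is hypothetical: the discrete distribution over candidate sizes, the one-parameter toy "network" that scales the mean of the visible inputs, the mean-squared-error loss, and the names `sample_future_context`, `future_context_mask`, `masked_forward`, and `train`.

```python
import random


def sample_future_context(sizes, probs, rng):
    """Draw one future-context size from a discrete distribution (assumed form)."""
    return rng.choices(sizes, weights=probs, k=1)[0]


def future_context_mask(seq_len, future_context):
    """mask[t][j] is True when output t may use input j.

    Every past and current input is visible, plus at most
    `future_context` inputs that lie after position t.
    """
    return [[j <= t + future_context for j in range(seq_len)]
            for t in range(seq_len)]


def masked_forward(weight, inputs, mask):
    """Toy stand-in for a masked network: weight times the mean of visible inputs."""
    contexts = []
    for row in mask:
        visible = [x for x, keep in zip(inputs, row) if keep]
        contexts.append(sum(visible) / len(visible))
    outputs = [weight * c for c in contexts]
    return outputs, contexts


def train(corpus, sizes, probs, steps=200, lr=0.05, seed=0):
    """Run update steps, resampling the future-context size at every step."""
    rng = random.Random(seed)
    weight = rng.uniform(-0.1, 0.1)  # initialize the single parameter
    for _ in range(steps):
        inputs, targets = rng.choice(corpus)           # one training sample
        k = sample_future_context(sizes, probs, rng)   # sampled future-context size
        mask = future_context_mask(len(inputs), k)     # mask the network
        outputs, contexts = masked_forward(weight, inputs, mask)
        n = len(outputs)
        # Mean-squared-error loss (assumed); tracked for clarity only.
        loss = sum((o - y) ** 2 for o, y in zip(outputs, targets)) / n
        # Analytic gradient of that loss with respect to the weight.
        grad = sum(2 * (o - y) * c
                   for o, y, c in zip(outputs, targets, contexts)) / n
        weight -= lr * grad                            # parameter update
    return weight


if __name__ == "__main__":
    corpus = [([1.0, 1.0, 1.0, 1.0], [3.0, 3.0, 3.0, 3.0])]
    print(train(corpus, sizes=[0, 1, 2], probs=[0.5, 0.3, 0.2]))
```

Because the future-context size is redrawn at each update step, the same parameters are trained under several look-ahead settings. In a real system the mask would more plausibly restrict attention or convolution context inside the network, which this toy model only approximates by restricting which inputs are averaged.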