| CPC G06N 3/08 (2013.01) [G06F 40/30 (2020.01); G06N 3/045 (2023.01)] | 18 Claims |

|
1. A method performed by one or more computers, wherein the method comprises:
pre-training a neural network that has a plurality of network parameters to generate a pre-trained neural network, wherein the pre-training comprises:
obtaining a text document comprising a plurality of text segments;
determining, for each of the plurality of text segments, an importance score of the text segment that characterizes a relative importance of the segment with respect to other text segments in the text document;
selecting one or more text segments based on the importance scores;
generating a masked text document that replaces the one or more text segments in the text document with mask tokens;
processing, using the neural network and in accordance with current values of the plurality of network parameters, the masked text document to generate a prediction of the one or more text segments; and
determining, based on a difference between the prediction and the one or more text segments, an update to the current values of the plurality of network parameters; and
providing the pre-trained neural network for adaptation to perform a specific text processing task using labeled text data.
|
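For illustration only, the data-preparation steps of claim 1 (scoring segments, selecting by importance score, and generating the masked text document) might be sketched as follows. The claim does not specify how segments are delimited, how the importance score is computed, or what the mask token looks like. This sketch assumes that segments are sentences, that importance is the unigram-overlap F1 between a segment and the rest of the document, and that the mask token is a hypothetical `[MASK]` string. The function names are likewise hypothetical. The neural network prediction and parameter update steps are not shown.

```python
import re
from collections import Counter

MASK_TOKEN = "[MASK]"  # hypothetical mask token; the claim does not name one


def split_segments(document):
    """Split a document into text segments (assumed here to be sentences)."""
    parts = re.split(r"(?<=[.!?])\s+", document.strip())
    return [p for p in parts if p]


def tokenize(text):
    """Lowercase and split into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def importance_score(index, segments):
    """Assumed scoring: unigram-overlap F1 between one segment and all others."""
    seg = Counter(tokenize(segments[index]))
    rest = Counter()
    for j, other in enumerate(segments):
        if j != index:
            rest.update(tokenize(other))
    overlap = sum((seg & rest).values())  # clipped count of shared unigrams
    if overlap == 0:
        return 0.0
    precision = overlap / sum(seg.values())
    recall = overlap / sum(rest.values())
    return 2 * precision * recall / (precision + recall)


def mask_document(document, num_to_mask=1):
    """Return (masked document, target segments, per-segment scores)."""
    segments = split_segments(document)
    scores = [importance_score(i, segments) for i in range(len(segments))]
    # Select the highest-scoring segments, then restore document order.
    ranked = sorted(range(len(segments)), key=lambda i: scores[i], reverse=True)
    selected = sorted(ranked[:num_to_mask])
    masked = [MASK_TOKEN if i in selected else s for i, s in enumerate(segments)]
    targets = [segments[i] for i in selected]
    return " ".join(masked), targets, scores
```

Under these assumptions, calling `mask_document` on a document yields the masked text that would be fed to the neural network and the removed segments that would serve as the prediction target when computing the parameter update.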