CPC G06F 40/126 (2020.01) [G06F 40/151 (2020.01); G06N 3/08 (2013.01)] | 20 Claims |
1. A system comprising:
processing circuitry;
a memory coupled to the processing circuitry, the memory including a program stored thereon that, when executed by the processing circuitry, causes the processing circuitry to perform operations comprising:
converting a string of words to a series of tokens;
generating, using the series of tokens as input, a local string-dependent embedding of each token of the series of tokens;
generating, using the series of tokens as input and independent of the local string-dependent embedding, a global string-dependent embedding of each token of the series of tokens;
combining the local string-dependent embedding and the global string-dependent embedding to generate an n-gram induced embedding of each token of the series of tokens;
generating, by a position embedder, a relative position embedding that includes a vector representation of a position of each token in the series of tokens;
combining the relative position embedding and the n-gram induced embedding resulting in a disentangled attention embedding for each token of the series of tokens;
obtaining a masked language model (MLM) previously trained to generate a masked word prediction; and
executing the MLM based on the disentangled attention embedding of each token to generate the masked word prediction.
|
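The operations recited in claim 1 can be read as a data-flow pipeline. The following is a minimal illustrative sketch of that flow, not the patented implementation. Every name, dimension, and formula in it is an assumption made for illustration: the toy vocabulary, the random lookup tables standing in for learned weights, a trigram average as the local embedder, dot-product attention over raw token vectors as the global embedder, element-wise addition as the combining step, clipped-offset vectors as the relative position embedding, and an untrained nearest-vector lookup standing in for the previously trained MLM.

```python
import math
import random

DIM = 8        # embedding width (assumed)
MAX_DIST = 2   # relative offsets are clipped to [-MAX_DIST, MAX_DIST] (assumed)
VOCAB = ["[MASK]", "the", "cat", "sat", "on", "mat"]  # toy vocabulary (assumed)

rng = random.Random(0)
def rand_vec():
    return [rng.uniform(-1.0, 1.0) for _ in range(DIM)]

# Random tables stand in for learned parameters.
TOKEN_TABLE = {w: rand_vec() for w in VOCAB}
REL_POS_TABLE = {d: rand_vec() for d in range(-MAX_DIST, MAX_DIST + 1)}

def add(a, b):
    return [x + y for x, y in zip(a, b)]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def weighted_sum(weights, vectors):
    out = [0.0] * DIM
    for w, v in zip(weights, vectors):
        out = [o + w * x for o, x in zip(out, v)]
    return out

def tokenize(text):
    # Converting a string of words to a series of tokens (whitespace split).
    return text.split()

def local_embeddings(tokens):
    # Local string-dependent embedding: average over a trigram window.
    vecs = [TOKEN_TABLE[t] for t in tokens]
    out = []
    for i in range(len(vecs)):
        window = vecs[max(0, i - 1): i + 2]
        out.append(weighted_sum([1.0 / len(window)] * len(window), window))
    return out

def global_embeddings(tokens):
    # Global string-dependent embedding: attention over the whole string,
    # computed from the tokens alone, never from the local embeddings.
    vecs = [TOKEN_TABLE[t] for t in tokens]
    return [weighted_sum(softmax([dot(q, k) for k in vecs]), vecs) for q in vecs]

def relative_position_embeddings(n):
    # One vector per (i, j) pair, indexed by the clipped offset j - i.
    def clip(d):
        return max(-MAX_DIST, min(MAX_DIST, d))
    return [[REL_POS_TABLE[clip(j - i)] for j in range(n)] for i in range(n)]

def disentangled_embeddings(content, rel):
    # Attention score = content-to-content term + content-to-position term.
    out = []
    for i, q in enumerate(content):
        scores = [dot(q, k) + dot(q, rel[i][j]) for j, k in enumerate(content)]
        out.append(weighted_sum(softmax(scores), content))
    return out

def predict_masked(text):
    tokens = tokenize(text)
    local = local_embeddings(tokens)
    glob = global_embeddings(tokens)
    ngram = [add(l, g) for l, g in zip(local, glob)]  # n-gram induced embedding
    rel = relative_position_embeddings(len(tokens))
    hidden = disentangled_embeddings(ngram, rel)
    i = tokens.index("[MASK]")
    # Stand-in for a trained MLM head: pick the closest vocabulary vector.
    candidates = [w for w in VOCAB if w != "[MASK]"]
    return max(candidates, key=lambda w: dot(hidden[i], TOKEN_TABLE[w]))
```

Calling `predict_masked("the cat sat on the [MASK]")` returns one word from the toy vocabulary. Because the tables are random rather than trained, the particular word is arbitrary; the sketch only shows the order of the recited operations and that the global embedding is computed without reference to the local one.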