| CPC G06F 40/284 (2020.01) [G06F 40/295 (2020.01); G06N 3/08 (2013.01); G06N 5/04 (2013.01)] | 20 Claims |

|
9. A language-processing service configured for natural language understanding (NLU), the language-processing service comprising:
a language model including:
an upstream sequence of transformer blocks configured to receive vectorized training data and emit modified vectorized training data during pretraining, the upstream sequence of transformer blocks including an upstream data embedding,
a downstream sequence of transformer blocks configured to receive the modified vectorized training data and emit pretraining output during the pretraining, the downstream sequence of transformer blocks including a downstream data embedding equivalent to the upstream data embedding,
wherein pretraining logic operative during the pretraining is configured to adjust the upstream data embedding and the downstream data embedding by computing a gradient of the upstream data embedding disentangled from a gradient of the downstream data embedding,
wherein the gradient of the upstream data embedding is computed based on a loss function of the upstream sequence of transformer blocks and not on a loss function of the downstream sequence of transformer blocks, and
wherein the gradient of the downstream data embedding is computed based on the loss function of the upstream sequence of transformer blocks and the loss function of the downstream sequence of transformer blocks,
wherein the upstream and downstream sequences of transformer blocks are configured to execute collectively a multitask pretraining problem;
an input module configured to convey language input to the language-processing model; and
an output module configured to expose an output of the language-processing model.
|
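
The gradient disentanglement recited in claim 9 can be illustrated with a toy sketch. It is not the claimed implementation. It assumes, purely for illustration, that the downstream embedding is parameterized as the upstream embedding (held constant, i.e. stop-gradient) plus a residual initialized to zero, and it substitutes simple quadratic losses for the real pretraining losses. The names `pretrain_step`, `e_up`, `e_delta`, `target_up`, and `target_down` are hypothetical and do not appear in the claim.

```python
def pretrain_step(e_up, e_delta, target_up, target_down, lr=0.1):
    """One toy update with disentangled embedding gradients.

    e_up:    upstream embedding vector (list of floats)
    e_delta: residual, so that e_down = e_up + e_delta
             (assumed parameterization, not taken from the claim)
    """
    # Downstream embedding as seen by the downstream blocks. e_up is treated
    # as a constant here (stop-gradient), so the downstream loss has no path
    # back into e_up.
    e_down = [u + d for u, d in zip(e_up, e_delta)]

    # Toy stand-in losses (assumptions):
    #   loss_up   = 0.5 * sum((e_up   - target_up)^2)
    #   loss_down = 0.5 * sum((e_down - target_down)^2)
    grad_up = [u - t for u, t in zip(e_up, target_up)]         # upstream loss only
    grad_delta = [x - t for x, t in zip(e_down, target_down)]  # downstream loss only

    new_e_up = [u - lr * g for u, g in zip(e_up, grad_up)]
    new_e_delta = [d - lr * g for d, g in zip(e_delta, grad_delta)]
    return new_e_up, new_e_delta


# Residual starts at zero, so both embeddings are initially equivalent.
e_up, e_delta = [1.0, 2.0], [0.0, 0.0]
new_e_up, new_e_delta = pretrain_step(
    e_up, e_delta, target_up=[0.0, 0.0], target_down=[1.0, 2.0]
)
# Effective downstream embedding after the step.
new_e_down = [u + d for u, d in zip(new_e_up, new_e_delta)]
```

In this sketch the update to `e_up` does not depend on `target_down` at all, while the effective downstream embedding `new_e_down` moves with both updates: through `e_up` (upstream loss) and through `e_delta` (downstream loss). That mirrors the asymmetry the two final wherein clauses describe, under the stated assumptions.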