US 11,902,628 B2
Masked model training of a prediction network
Pengyu Zhao, Beijing (CN); Chunxu Xu, Beijing (CN); Xianghui Mao, Beijing (CN); and Xiaohui Xie, Beijing (CN)
Assigned to HULU, LLC, Santa Monica, CA (US)
Filed by HULU, LLC, Santa Monica, CA (US)
Filed on Jul. 13, 2021, as Appl. No. 17/374,606.
Prior Publication US 2023/0019564 A1, Jan. 19, 2023
Int. Cl. G06N 3/045 (2023.01); G06N 3/084 (2023.01); G06N 3/0895 (2023.01); G06N 3/09 (2023.01); H04N 21/442 (2011.01); H04N 21/466 (2011.01); H04N 21/482 (2011.01); H04L 65/612 (2022.01)
CPC H04N 21/4826 (2013.01) [G06N 3/045 (2023.01); G06N 3/084 (2013.01); H04L 65/612 (2022.05)] 20 Claims
OG exemplary drawing
 
1. A method comprising:
receiving, by a computing device, a first sequence of inputs for processing via a sub-model of a plurality of sub-models, wherein the plurality of sub-models are part of a main model;
masking, by the computing device, an input in the first sequence of inputs with a masked value to generate a second sequence of inputs;
processing, by the computing device, the second sequence of inputs using the sub-model to generate a sequence of features that correspond to the second sequence of inputs, wherein the sub-model comprises a self-attention sub-model that applies attention to inputs in the second sequence of inputs based on relationships between the inputs;
generating, by the computing device, a first output of the main model based on the sequence of features; and
training, by the computing device, the sub-model based on a feature in the sequence of features that corresponds to the masked input and the first output, wherein training the sub-model comprises:
generating a first value based on a feature in the sequence of features that corresponds to the masked input;
generating a second value based on the first output; and
adjusting a parameter of the sub-model using the first value and the second value.
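
The claimed method above can be read as a masked-training loop over a self-attention sub-model inside a larger main model. The following is a minimal sketch, assuming PyTorch, of one way such a flow could be wired together; it is an illustration under stated assumptions, not the patented implementation. All names (SelfAttentionSubModel, MainModel, training_step, mask_index, mask_value) and the use of a mean-squared-error comparison between the two values are hypothetical and do not come from the patent.

    # Minimal sketch of the claimed training flow, assuming PyTorch.
    # All module, function, and variable names here are hypothetical illustrations.
    import torch
    import torch.nn as nn

    class SelfAttentionSubModel(nn.Module):
        """Sub-model that applies attention to inputs based on their relationships."""
        def __init__(self, dim: int, num_heads: int = 4):
            super().__init__()
            self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
            self.proj = nn.Linear(dim, dim)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # Self-attention: queries, keys, and values all come from the same sequence.
            attended, _ = self.attn(x, x, x)
            return self.proj(attended)  # sequence of features

    class MainModel(nn.Module):
        """Main model containing the sub-model; its output is pooled from the features."""
        def __init__(self, dim: int):
            super().__init__()
            self.sub_model = SelfAttentionSubModel(dim)
            self.head = nn.Linear(dim, dim)

        def forward(self, x: torch.Tensor):
            features = self.sub_model(x)              # features for each position
            output = self.head(features.mean(dim=1))  # first output of the main model
            return features, output

    def training_step(model: MainModel, first_sequence: torch.Tensor,
                      mask_index: int, mask_value: float,
                      optimizer: torch.optim.Optimizer) -> float:
        """One step: mask an input, run the sub-model, and adjust its parameters."""
        # Mask one input in the first sequence to produce the second sequence.
        second_sequence = first_sequence.clone()
        second_sequence[:, mask_index, :] = mask_value

        features, first_output = model(second_sequence)

        # First value: based on the feature that corresponds to the masked input.
        first_value = features[:, mask_index, :]
        # Second value: based on the first output of the main model.
        second_value = first_output

        # Adjust sub-model parameters using both values (here via an MSE-style loss).
        loss = nn.functional.mse_loss(first_value, second_value.detach())
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()

    # Usage sketch: a batch of 2 sequences, 8 positions, 16-dimensional inputs.
    model = MainModel(dim=16)
    optimizer = torch.optim.Adam(model.sub_model.parameters(), lr=1e-3)
    batch = torch.randn(2, 8, 16)
    print(training_step(model, batch, mask_index=3, mask_value=0.0, optimizer=optimizer))

In this sketch the second value is detached and only the sub-model's parameters are passed to the optimizer, which is one plausible reading of "adjusting a parameter of the sub-model using the first value and the second value"; other loss formulations and parameter selections would fit the claim language equally well.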