CPC G06F 40/284 (2020.01) [G06F 18/24 (2023.01); G06F 40/295 (2020.01); G06N 3/08 (2013.01)] · 20 Claims
1. A computer-implemented method, comprising:
initializing a model having a sequence-to-sequence network architecture, wherein the sequence-to-sequence network architecture comprises an encoder;
training, based on a training set, the model to process data associated with a first task and a second task, the training set comprising a plurality of encoder sequences, wherein each of the encoder sequences includes one or more elements, and relationships between the one or more elements and a plurality of classes of elements associated with a purpose, and wherein training the model comprises:
generating, for each element, a vector representation identifying relationships between each element and the plurality of classes of elements;
generating, based on the vector representation of each element in each sequence, an encoding of each encoder sequence in the training set, each encoding comprising an indication of the first task or the second task, and an encoder attention weight calculated based on the indication and a feed-forward analysis using a position of each element in the encoder sequence, wherein an encoding for an encoder sequence not having a corresponding indication of the first task comprises an encoder attention weight of zero for training the model for the first task, and wherein an encoding for an encoder sequence not having a corresponding indication of the second task comprises an encoder attention weight of zero for training the model for the second task;
applying, for the first task, loss masking to the encoder sequences not having the corresponding indication of the first task;
applying, for the second task, loss masking to the encoder sequences not having the corresponding indication of the second task; and
for each encoding of the encoder sequences, training the encoder using:
the encoding of the encoder sequence;
the encoder attention weight; and
the loss masking of the first task or the loss masking of the second task; and
generating, using the trained model having been trained for the first task and for the second task, a prediction based on an input data set.
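The per-task attention-weight zeroing and loss masking recited in the claim can be sketched as follows. This is an illustrative toy in Python/NumPy, not the claimed implementation; the function name `masked_task_loss` and all arrays and numeric values are hypothetical, chosen only to show how sequences lacking a task's indication contribute a zero attention weight and a masked (zero) loss when training for that task:

```python
import numpy as np

# Toy batch: each encoder sequence carries a task indication
# (0 = first task, 1 = second task). Values are illustrative.
task_ids = np.array([0, 1, 0, 1])

# One per-sequence loss and one per-sequence raw attention weight (toy scalars).
per_example_loss = np.array([0.7, 0.3, 0.5, 0.1])
raw_attention = np.array([0.9, 0.4, 0.6, 0.2])

def masked_task_loss(losses, task_ids, task):
    """Apply loss masking for `task`: zero the loss of every sequence
    whose task indication does not match, then average over the
    sequences that do carry the matching indication."""
    mask = (task_ids == task).astype(losses.dtype)
    return float((losses * mask).sum() / mask.sum())

# Loss masking for the first task: only sequences indicated as task 0 contribute.
loss_task0 = masked_task_loss(per_example_loss, task_ids, 0)
# Loss masking for the second task: only sequences indicated as task 1 contribute.
loss_task1 = masked_task_loss(per_example_loss, task_ids, 1)

# Attention weight of zero for sequences not indicated for the task being trained:
attn_for_task0 = raw_attention * (task_ids == 0)
attn_for_task1 = raw_attention * (task_ids == 1)
```

Multiplying by the boolean indication mask makes the masking differentiable-friendly in a real framework: mismatched sequences pass through the encoder but contribute neither attention weight nor gradient for that task's update.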