CPC G06F 40/284 (2020.01) [G06F 18/24 (2023.01); G06F 40/295 (2020.01); G06N 3/08 (2013.01)] · 20 Claims
1. A computer-implemented method, comprising:
initializing a model having a sequence-to-sequence network architecture, wherein the sequence-to-sequence network architecture comprises an encoder;
training, based on a training set, the model to process data associated with a first task and a second task, the training set comprising a plurality of encoder sequences, wherein each of the encoder sequences includes one or more elements, and relationships between the one or more elements and a plurality of classes of elements associated with a purpose, and wherein training the model comprises:
generating, for each element, a vector representation identifying relationships between each element and the plurality of classes of elements;
generating, based on the vector representation of each element in each sequence, an encoding of each encoder sequence in the training set, each encoding comprising an indication of the first task or the second task, and an encoder attention weight calculated based on the indication and a feed-forward analysis using a position of each element in the encoder sequence, wherein an encoding for an encoder sequence not having a corresponding indication of the first task comprises an encoder attention weight of zero for training the model for the first task, and wherein an encoding for an encoder sequence not having a corresponding indication of the second task comprises an encoder attention weight of zero for training the model for the second task;
applying, for the first task, loss masking to the encoder sequences not having the corresponding indication of the first task;
applying, for the second task, loss masking to the encoder sequences not having the corresponding indication of the second task; and
for each encoding of the encoder sequences, training the encoder using:
the encoding of the encoder sequence;
the encoder attention weight; and
the loss masking of the first task or the loss masking of the second task; and
generating, using the trained model having been trained for the first task and for the second task, a prediction based on an input data set.
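The per-task attention-weight zeroing and loss masking recited in the claim can be sketched as follows. This is an illustrative toy in Python/NumPy, not the claimed implementation; the function name `masked_task_loss` and all arrays and numeric values are hypothetical, chosen only to show how sequences lacking a task's indication contribute a zero attention weight and a masked (zero) loss when training for that task:

```python
import numpy as np

# Toy batch: each encoder sequence carries a task indication
# (0 = first task, 1 = second task). Values are illustrative.
task_ids = np.array([0, 1, 0, 1])

# One per-sequence loss and one per-sequence raw attention weight (toy scalars).
per_example_loss = np.array([0.7, 0.3, 0.5, 0.1])
raw_attention = np.array([0.9, 0.4, 0.6, 0.2])

def masked_task_loss(losses, task_ids, task):
    """Apply loss masking for `task`: zero the loss of every sequence
    whose task indication does not match, then average over the
    sequences that do carry the matching indication."""
    mask = (task_ids == task).astype(losses.dtype)
    return float((losses * mask).sum() / mask.sum())

# Loss masking for the first task: only sequences indicated as task 0 contribute.
loss_task0 = masked_task_loss(per_example_loss, task_ids, 0)
# Loss masking for the second task: only sequences indicated as task 1 contribute.
loss_task1 = masked_task_loss(per_example_loss, task_ids, 1)

# Attention weight of zero for sequences not indicated for the task being trained:
attn_for_task0 = raw_attention * (task_ids == 0)
attn_for_task1 = raw_attention * (task_ids == 1)
```

Multiplying by the boolean indication mask makes the masking differentiable-friendly in a real framework: mismatched sequences pass through the encoder but contribute neither attention weight nor gradient for that task's update.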