CPC G06F 16/739 (2019.01) [G06F 16/7844 (2019.01); G06F 40/284 (2020.01); G10L 13/08 (2013.01)]; 20 Claims
1. A method for identifying intents, comprising:
receiving a video and a transcript of the video;
encoding the video to obtain a sequence of video encodings;
encoding the transcript to obtain a sequence of text encodings;
applying a visual gate to the sequence of text encodings by performing a cross-attention mechanism on the sequence of text encodings and the sequence of video encodings to obtain gated text encodings; and
generating an intent label for the transcript based on the gated text encodings.
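The gating step in claim 1 can be illustrated with a small, dependency-free sketch. The claim does not specify the encoders, the exact gate function, or the classifier, so everything below is an assumption made for illustration. The video and transcript are taken to be already encoded into equal-width vectors. Cross-attention is plain scaled dot-product attention, with each text encoding as a query and the video encodings as keys and values. The "visual gate" is modeled as an element-wise sigmoid of the attended visual context multiplied into each text encoding. The intent label is chosen by comparing the mean-pooled gated encodings against hypothetical per-label prototype vectors. The names `visual_gate`, `predict_intent`, and the labels `"instruct"`/`"chat"` are hypothetical.

```python
import math
from typing import Dict, List

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    # Subtract the max score for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x: float) -> float:
    # Numerically stable logistic function, output in (0, 1).
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def cross_attend(text: List[Vector], video: List[Vector]) -> List[Vector]:
    # Scaled dot-product cross-attention: each text encoding is a query,
    # the video encodings act as both keys and values.
    d = len(text[0])
    contexts = []
    for t in text:
        scores = [sum(a * b for a, b in zip(t, v)) / math.sqrt(d) for v in video]
        weights = softmax(scores)
        # Attention-weighted sum of video encodings = visual context for this token.
        ctx = [sum(w * v[k] for w, v in zip(weights, video)) for k in range(d)]
        contexts.append(ctx)
    return contexts


def visual_gate(text: List[Vector], video: List[Vector]) -> List[Vector]:
    # Assumed gate form (not specified by the claim): an element-wise sigmoid
    # of the attended visual context scales each text encoding.
    contexts = cross_attend(text, video)
    return [
        [sigmoid(c) * x for c, x in zip(ctx, t)]
        for ctx, t in zip(contexts, text)
    ]


def predict_intent(gated: List[Vector], prototypes: Dict[str, Vector]) -> str:
    # Mean-pool the gated text encodings, then return the label whose
    # hypothetical prototype vector has the largest dot product with the pool.
    d = len(gated[0])
    pooled = [sum(g[k] for g in gated) / len(gated) for k in range(d)]
    return max(
        prototypes,
        key=lambda label: sum(p * q for p, q in zip(pooled, prototypes[label])),
    )


if __name__ == "__main__":
    text_encodings = [[1.0, 0.0], [0.0, 1.0]]   # toy transcript encodings
    video_encodings = [[2.0, 0.0], [0.0, 0.5]]  # toy video encodings
    labels = {"instruct": [1.0, 0.0], "chat": [0.0, 1.0]}  # hypothetical labels
    gated = visual_gate(text_encodings, video_encodings)
    print(predict_intent(gated, labels))
```

Because every gate value lies in (0, 1), the visual signal can only attenuate each text feature, never amplify it. In a trained model the query/key/value projections, the gate, and the classifier would carry learned weights; the sketch omits them to keep the data flow of the claim visible.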