US 12,124,508 B2
Multimodal intent discovery system
Adyasha Maharana, Cary, NC (US); Quan Hung Tran, San Jose, CA (US); Seunghyun Yoon, San Jose, CA (US); Franck Dernoncourt, San Jose, CA (US); Trung Huu Bui, San Jose, CA (US); and Walter W. Chang, San Jose, CA (US)
Assigned to ADOBE INC., San Jose, CA (US)
Filed by ADOBE INC., San Jose, CA (US)
Filed on Jul. 12, 2022, as Appl. No. 17/811,963.
Prior Publication US 2024/0020337 A1, Jan. 18, 2024
Int. Cl. G06F 16/73 (2019.01); G06F 16/738 (2019.01); G06F 16/783 (2019.01); G06F 40/284 (2020.01); G10L 13/08 (2013.01)
CPC G06F 16/739 (2019.01) [G06F 16/7844 (2019.01); G06F 40/284 (2020.01); G10L 13/08 (2013.01)]
20 Claims
 
1. A method for identifying intents, comprising:
receiving a video and a transcript of the video;
encoding the video to obtain a sequence of video encodings;
encoding the transcript to obtain a sequence of text encodings;
applying a visual gate to the sequence of text encodings by performing a cross-attention mechanism on the sequence of text encodings and the sequence of video encodings to obtain gated text encodings; and
generating an intent label for the transcript based on the gated text encodings.
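
The steps of claim 1 can be illustrated with a minimal sketch in plain Python. This is not the patented implementation. The claim does not state how the gate is derived from the cross-attention output, so the sketch assumes one plausible form: each text encoding acts as a query over the video encodings, the attended video context is passed through a sigmoid, and the result scales the text encoding element-wise. The function names, the mean-pooling classifier, the toy encodings, and the intent labels `crop_image` and `adjust_color` are all hypothetical.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def softmax(scores):
    # Subtract the max for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention(text_enc, video_enc):
    """Scaled dot-product attention: text encodings are queries,
    video encodings serve as both keys and values (an assumption)."""
    d = len(text_enc[0])
    contexts = []
    for q in text_enc:
        weights = softmax([dot(q, k) / math.sqrt(d) for k in video_enc])
        contexts.append(
            [sum(w * v[j] for w, v in zip(weights, video_enc)) for j in range(d)]
        )
    return contexts

def visual_gate(text_enc, video_enc):
    """Assumed gate: sigmoid of the attended video context,
    multiplied element-wise into each text encoding."""
    gated = []
    for t, ctx in zip(text_enc, cross_attention(text_enc, video_enc)):
        gate = [1.0 / (1.0 + math.exp(-c)) for c in ctx]
        gated.append([g * x for g, x in zip(gate, t)])
    return gated

def predict_intent(gated, label_weights):
    """Hypothetical classifier: mean-pool the gated text encodings,
    score each label with a dot product, and return the best label."""
    d = len(gated[0])
    pooled = [sum(row[j] for row in gated) / len(gated) for j in range(d)]
    scores = {label: dot(pooled, w) for label, w in label_weights.items()}
    return max(scores, key=scores.get)

# Toy inputs standing in for real encoder outputs.
text_enc = [[1.0, 0.0], [0.0, 1.0]]                    # two transcript tokens
video_enc = [[2.0, -2.0], [2.0, -2.0], [2.0, -2.0]]    # three video frames
label_weights = {"crop_image": [1.0, 0.0], "adjust_color": [0.0, 1.0]}

gated = visual_gate(text_enc, video_enc)
intent = predict_intent(gated, label_weights)
```

In this toy setup the video context opens the gate on the first feature dimension and mostly closes it on the second, so the pooled representation favors the first hypothetical label. A trained system would use learned projections for queries, keys, and values rather than the raw encodings used here.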