US 12,265,783 B2
Systems and methods for multi-modal conversation summarization on a conversation platform
Divyansh Agarwal, San Francisco, CA (US); Chien-Sheng Wu, Mountain View, CA (US); and Tian Xie, San Jose, CA (US)
Assigned to Salesforce, Inc., San Francisco, CA (US)
Filed by Salesforce, Inc., San Francisco, CA (US)
Filed on Nov. 10, 2022, as Appl. No. 18/054,511.
Prior Publication US 2024/0160837 A1, May 16, 2024
Int. Cl. G06F 16/00 (2019.01); G06F 16/334 (2025.01); G06F 16/34 (2019.01); G06F 40/166 (2020.01); G06F 40/205 (2020.01); H04L 51/216 (2022.01); G06V 20/40 (2022.01)
CPC G06F 40/166 (2020.01) [G06F 16/3344 (2019.01); G06F 16/345 (2019.01); G06F 40/205 (2020.01); H04L 51/216 (2022.05); G06V 20/47 (2022.01)] 15 Claims
 
1. A method of multi-modal summarization of communication on a messaging platform, the method comprising:
receiving, via a user interface, a user request for summarizing communication messages relating to a topic between a first user and a second user on a messaging platform;
searching, via a search engine, for messages between the first user and the second user on the messaging platform;
filtering the messages based on the topic from the user request by predicting, via a topic classification model, whether the messages are related to the topic and excluding a subset of messages that are predicted to be unrelated to the topic;
generating, by a text encoder of a multi-modal summarization model, a text representation from an input sequence corresponding to textual content from the filtered messages;
generating, by an image encoder of the multi-modal summarization model, an image representation of visual features in a multimedia attachment file from the filtered messages;
generating, via a decoder of the multi-modal summarization model, a text summary summarizing both the filtered messages and the multimedia attachment file based on a combination of the text representation generated by the text encoder and the image representation generated by the image encoder,
wherein the text summary comprises a text that references a time and/or a sender of the multimedia attachment file generated from metadata of the multimedia attachment file; and
transmitting, via the user interface, the generated text summary relating to the topic in response to the user request.
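The sequence of steps recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation. Every name below (`Message`, `Attachment`, `search_messages`, `keyword_topic_classifier`, `encode_text`, `encode_image`, `decode`, `summarize`) is hypothetical. The topic classification model, text encoder, image encoder, and decoder are replaced by trivial stand-ins (keyword matching, simple counts, a pixel mean, and a string template) so that the data flow can be run end to end: search, topic filtering, separate text and image representations, a combined representation passed to a decoder, and a summary that references the attachment's sender and time taken from its metadata.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence


@dataclass
class Attachment:
    filename: str
    sender: str             # metadata: who sent the file
    sent_at: str            # metadata: when the file was sent
    pixels: Sequence[float]  # stand-in for image content


@dataclass
class Message:
    sender: str
    recipient: str
    text: str
    attachment: Optional[Attachment] = None


def search_messages(store: List[Message], user_a: str, user_b: str) -> List[Message]:
    """Stand-in search engine: keep messages exchanged between the two users."""
    pair = {user_a, user_b}
    return [m for m in store if {m.sender, m.recipient} == pair]


def keyword_topic_classifier(text: str, topic: str) -> bool:
    """Stand-in for a topic classification model (assumption: keyword match)."""
    return topic.lower() in text.lower()


def encode_text(texts: List[str]) -> List[float]:
    """Toy text representation: [message count, total word count]."""
    words = [w for t in texts for w in t.split()]
    return [float(len(texts)), float(len(words))]


def encode_image(attachment: Attachment) -> List[float]:
    """Toy image representation: a single mean-pixel feature."""
    px = list(attachment.pixels)
    return [sum(px) / len(px) if px else 0.0]


def decode(combined: List[float], messages: List[Message], topic: str) -> str:
    """Toy decoder: reads the combined vector, then cites attachment metadata."""
    n_msgs, n_words = int(combined[0]), int(combined[1])
    n_images = len(combined) - 2  # one image feature per attachment
    parts = [f"{n_msgs} message(s), {n_words} word(s) and "
             f"{n_images} attachment(s) about '{topic}'."]
    for m in messages:
        a = m.attachment
        if a is not None:
            parts.append(f"{a.sender} shared {a.filename} at {a.sent_at}.")
    return " ".join(parts)


def summarize(store: List[Message], user_a: str, user_b: str, topic: str,
              is_on_topic: Callable[[str, str], bool] = keyword_topic_classifier) -> str:
    found = search_messages(store, user_a, user_b)
    # Exclude messages predicted to be unrelated to the requested topic.
    kept = [m for m in found if is_on_topic(m.text, topic)]
    text_rep = encode_text([m.text for m in kept])
    image_rep = [v for m in kept if m.attachment
                 for v in encode_image(m.attachment)]
    # Combination by concatenation is an assumption of this sketch.
    return decode(text_rep + image_rep, kept, topic)
```

In this sketch the classifier is passed in as a callable, so a trained model could replace the keyword stand-in without changing the surrounding flow. The same would apply to the encoders and the decoder in a fuller version.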