US 12,462,115 B2
System and method for temporal attention behavioral analysis of multi-modal conversations in a question and answer system
Rajasekhar Tumuluri, Bridgewater, NJ (US)
Assigned to Openstream Inc., Bridgewater, NJ (US)
Filed by Openstream Inc., Bridgewater, NJ (US)
Filed on Aug. 11, 2023, as Appl. No. 18/448,228.
Application 18/448,228 is a continuation of application No. 17/103,460, filed on Nov. 24, 2020, granted, now 11,769,018.
Prior Publication US 2023/0385560 A1, Nov. 30, 2023
This patent is subject to a terminal disclaimer.
Int. Cl. G06F 40/35 (2020.01); G06F 16/9032 (2019.01); G06F 16/906 (2019.01); G06F 40/56 (2020.01); G06N 3/04 (2023.01); G06N 5/04 (2023.01)

CPC G06F 40/56 (2020.01) [G06F 16/90328 (2019.01); G06F 16/90332 (2019.01); G06F 16/906 (2019.01); G06N 3/04 (2013.01); G06N 5/04 (2013.01)]

19 Claims

1. A method for processing a multi-modal conversation, the method comprising:

receiving sensor data from a plurality of sensors by a computer system, wherein the sensor data comprise a user request in multiple mode inputs associated with the plurality of sensors, wherein the user request includes a portion of a conversation;

vectorizing and embedding the multiple mode inputs with contextual data derived from an intra-query representation of the user request;

concatenating the vectorized multiple mode inputs in a prescribed format;

computing an attention weight using a gradient method to pass the vectorized multiple mode inputs;

determining, based on the concatenated and vectorized multiple mode inputs and the attention weight, an attention in the conversation;

identifying semantic relationships between one or more of the multiple mode inputs from the plurality of sensors;

extracting one or more hard-attentions and one or more soft-attentions from the attention, wherein the one or more hard-attentions are directly extracted from the multiple mode inputs and the one or more soft-attentions are derived based on the semantic relationships;

weighting, based on the one or more hard-attentions and the one or more soft-attentions, portions of the multiple mode inputs to determine a meaning of an input query wherein the weighting is based on the intra-query representation of the user request;

generating the input query at least based on the weighted portions of the multiple mode inputs;

determining one or more sequence attentions and temporal attentions for the input query and analyzing the sequence attentions and temporal attentions through a sequence stream and temporal stream attentional encoder-decoder active learning framework to determine a context of the conversation;

identifying an application class and data sources based on the context;

selecting a call-action-inference pattern based on the sequence attentions and temporal attentions;

transforming the input query into one or more candidate queries;

executing the one or more candidate queries against a respective data store and a respective application to generate candidate responses;

concatenating the candidate responses to generate a response; and

displaying the response in an interactive dashboard.
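
The concatenating, attention-weight, and weighting steps recited above can be illustrated with a minimal sketch. It is not the claimed implementation. The mode names, the fixed mode order, the query vector, and the dot-product scoring with softmax normalization are all assumptions made for illustration. The claim recites "a gradient method" for the attention weight, which this sketch does not reproduce.

```python
import math

# Hypothetical "prescribed format": a fixed mode order. The claim does not name one.
MODE_ORDER = ("speech", "gesture", "gaze")


def concatenate(mode_vectors):
    """Concatenate the per-mode vectors in the fixed MODE_ORDER."""
    out = []
    for mode in MODE_ORDER:
        out.extend(mode_vectors[mode])
    return out


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(mode_vectors, query):
    """Score each mode by its dot product with a query vector, then normalize.

    Assumption: dot-product scoring stands in for the claim's gradient method.
    """
    scores = [
        sum(a * b for a, b in zip(mode_vectors[mode], query))
        for mode in MODE_ORDER
    ]
    return dict(zip(MODE_ORDER, softmax(scores)))


def weight_inputs(mode_vectors, weights):
    """Scale each mode's vector by its attention weight."""
    return {
        mode: [weights[mode] * x for x in mode_vectors[mode]]
        for mode in MODE_ORDER
    }


# Toy vectorized inputs (hypothetical values).
mode_vectors = {
    "speech": [1.0, 0.0],
    "gesture": [0.0, 1.0],
    "gaze": [0.5, 0.5],
}
query = [1.0, 0.0]

concatenated = concatenate(mode_vectors)
weights = attention_weights(mode_vectors, query)
weighted = weight_inputs(mode_vectors, weights)
```

With these toy values the speech mode aligns most closely with the query vector, so it receives the largest weight and contributes most to the weighted portions from which an input query would be generated.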