| CPC G06F 40/56 (2020.01) [G06F 16/90328 (2019.01); G06F 16/90332 (2019.01); G06F 16/906 (2019.01); G06N 3/04 (2013.01); G06N 5/04 (2013.01)] | 19 Claims |

|
1. A method for processing a multi-modal conversation, the method comprising:
receiving sensor data from a plurality of sensors by a computer system, wherein the sensor data comprise a user request in multiple mode inputs associated with the plurality of sensors, wherein the user request includes a portion of a conversation;
vectorizing and embedding the multiple mode inputs with contextual data derived from an intra-query representation of the user request;
concatenating the vectorized multiple mode inputs in a prescribed format;
computing an attention weight using a gradient method to pass the vectorized multiple mode inputs;
determining, based on the concatenated and vectorized multiple mode inputs and the attention weight, an attention in the conversation;
identifying semantic relationships between one or more of the multiple mode inputs from the plurality of sensors;
extracting one or more hard-attentions and one or more soft-attentions from the attention, wherein the one or more hard-attentions are directly extracted from the multiple mode inputs and the one or more soft-attentions are derived based on the semantic relationships;
weighting, based on the one or more hard-attentions and the one or more soft-attentions, portions of the multiple mode inputs to determine a meaning of an input query, wherein the weighting is based on the intra-query representation of the user request;
generating the input query at least based on the weighted portions of the multiple mode inputs;
determining one or more sequence attentions and temporal attentions for the input query and analyzing the sequence attentions and temporal attentions through a sequence stream and temporal stream attentional encoder-decoder active learning framework to determine a context of the conversation;
identifying an application class and data sources based on the context;
selecting a call-action-inference pattern based on the sequence attentions and temporal attentions;
transforming the input query into one or more candidate queries;
executing the one or more candidate queries against a respective data store and a respective application to generate candidate responses;
concatenating the candidate responses to generate a response; and
displaying the response in an interactive dashboard.
|
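
The concatenating, attention-weighting, and input-weighting steps recited in claim 1 can be illustrated with a minimal sketch. It is not the claimed implementation. All function names, the mode names `speech` and `gesture`, and the `query` vector are hypothetical. The sketch assumes every mode vector has the same dimension as the query, scores each mode with a dot product followed by a softmax instead of the gradient method recited in the claim, and uses the conventional machine-learning sense of soft attention (a weight for every mode) and hard attention (selection of a single mode), which may differ from the definitions given in the claim.

```python
import math
from typing import Dict, List


def concatenate(mode_vectors: Dict[str, List[float]], order: List[str]) -> List[float]:
    """Join per-mode vectors in a fixed order (an assumed 'prescribed format')."""
    out: List[float] = []
    for mode in order:
        out.extend(mode_vectors[mode])
    return out


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def soft_attention(mode_vectors: Dict[str, List[float]],
                   query: List[float],
                   order: List[str]) -> Dict[str, float]:
    """One weight per mode: softmax over dot(query, mode vector)."""
    scores = [sum(q * v for q, v in zip(query, mode_vectors[mode])) for mode in order]
    return dict(zip(order, softmax(scores)))


def hard_attention(weights: Dict[str, float]) -> str:
    """Pick the single mode with the largest soft-attention weight."""
    return max(weights, key=weights.get)


def weight_inputs(mode_vectors: Dict[str, List[float]],
                  weights: Dict[str, float]) -> Dict[str, List[float]]:
    """Scale each mode vector by its attention weight."""
    return {mode: [weights[mode] * x for x in vec] for mode, vec in mode_vectors.items()}


# Hypothetical two-mode request: toy embeddings, not real sensor data.
order = ["speech", "gesture"]
mode_vectors = {"speech": [1.0, 0.0], "gesture": [0.0, 1.0]}
query = [1.0, 0.0]

concatenated = concatenate(mode_vectors, order)
weights = soft_attention(mode_vectors, query, order)
selected = hard_attention(weights)
weighted = weight_inputs(mode_vectors, weights)
```

In this toy example the query aligns with the `speech` vector, so `speech` receives the larger soft weight and is also the mode picked by the hard selection. A system following the claim would additionally derive the soft-attentions from semantic relationships between modes, which the sketch leaves out.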