US 12,190,906 B1
Systems and methods for predicting an emotion based on a multimodal input
Vinothkumar Venkataraman, Bangalore (IN); Ramesh Babu Sarvesetty, Bangalore (IN); Rahul Ignatius, Bangalore (IN); Naveen Gururaja Yeri, Bangalore (IN); and Saurav Kumar, Bangalore (IN)
Assigned to Wells Fargo Bank, N.A., San Francisco, CA (US)
Filed by Wells Fargo Bank, N.A., San Francisco, CA (US)
Filed on Dec. 17, 2021, as Appl. No. 17/645,031.
Int. Cl. G10L 25/63 (2013.01); G06N 20/00 (2019.01); G10L 15/26 (2006.01); G10L 15/32 (2013.01); H04L 51/02 (2022.01); H04M 3/493 (2006.01)
CPC G10L 25/63 (2013.01) [G06N 20/00 (2019.01); G10L 15/26 (2013.01); G10L 15/32 (2013.01); H04L 51/02 (2013.01); H04M 3/4936 (2013.01)] 23 Claims
OG exemplary drawing
 
1. A method for predicting an emotion based on a multimodal input, the method comprising:
receiving, by a communications circuitry, a multimodal input from a user including at least (i) an amount of keystrokes over a period of time, (ii) text, and (iii) speech;
converting, via automatic speech recognition circuitry, the speech to converted text;
causing, by a speech-context Bidirectional Long Short-Term Memory (BLSTM) of an emotion prediction circuitry and using the text and the converted text, generation of context hidden vectors;
causing, by the emotion prediction circuitry, generation of audio hidden vectors using the speech and an audio BLSTM;
generating, by a trained machine learning model of the emotion prediction circuitry and using the multimodal input, the context hidden vectors, and the audio hidden vectors, an EmotionPrint for the user; and
determining, by the emotion prediction circuitry and using the EmotionPrint, a next action.
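 
The claim describes a fusion pipeline: speech is transcribed by ASR, a speech-context BLSTM encodes the typed text together with the transcript into context hidden vectors, a separate audio BLSTM encodes the acoustic frames into audio hidden vectors, and a trained model combines these with the keystroke measurement to produce an "EmotionPrint" that drives a next action. The following is a minimal PyTorch sketch of that architecture, not the patent's actual implementation; all module names, feature dimensions, the emotion label set, the softmax output form, and the routing policy are illustrative assumptions.

import torch
import torch.nn as nn

class EmotionPrintModel(nn.Module):
    """Illustrative fusion model for the claimed pipeline (all dimensions assumed)."""

    def __init__(self, text_dim=300, audio_dim=40, hidden=128, n_emotions=6):
        super().__init__()
        # Speech-context BLSTM: encodes embeddings of the typed text plus the
        # ASR-converted text into context hidden vectors.
        self.context_blstm = nn.LSTM(text_dim, hidden, batch_first=True,
                                     bidirectional=True)
        # Audio BLSTM: encodes frame-level acoustic features (e.g. MFCCs)
        # into audio hidden vectors.
        self.audio_blstm = nn.LSTM(audio_dim, hidden, batch_first=True,
                                   bidirectional=True)
        # Fusion head: combines both hidden vectors with the keystroke-rate
        # scalar from the multimodal input to produce an emotion distribution
        # (the "EmotionPrint" here is assumed to be a probability vector).
        self.fusion = nn.Sequential(
            nn.Linear(4 * hidden + 1, hidden),
            nn.ReLU(),
            nn.Linear(hidden, n_emotions),
        )

    def forward(self, text_emb, audio_feats, keystroke_rate):
        # text_emb: (batch, tokens, text_dim) embeddings of text + converted text
        # audio_feats: (batch, frames, audio_dim) acoustic features of the speech
        # keystroke_rate: (batch, 1) keystrokes over the measured period
        _, (ctx_h, _) = self.context_blstm(text_emb)   # context hidden vectors
        _, (aud_h, _) = self.audio_blstm(audio_feats)  # audio hidden vectors
        # Concatenate the final forward and backward states of each BLSTM.
        ctx = torch.cat([ctx_h[0], ctx_h[1]], dim=-1)  # (batch, 2*hidden)
        aud = torch.cat([aud_h[0], aud_h[1]], dim=-1)  # (batch, 2*hidden)
        fused = torch.cat([ctx, aud, keystroke_rate], dim=-1)
        return torch.softmax(self.fusion(fused), dim=-1)  # EmotionPrint

def next_action(emotion_print):
    # Hypothetical routing policy: escalate on distressed emotions, otherwise
    # continue the automated flow. Label set and policy are assumptions.
    emotions = ["anger", "sadness", "fear", "joy", "surprise", "neutral"]
    top = emotions[emotion_print.argmax(dim=-1).item()]
    return "escalate_to_agent" if top in {"anger", "fear"} else "continue_bot_flow"

A usage sketch with dummy inputs, standing in for the received multimodal input:

model = EmotionPrintModel()
text_emb = torch.randn(1, 12, 300)     # typed text + ASR transcript embeddings
audio_feats = torch.randn(1, 200, 40)  # e.g. 40-dim MFCC frames of the speech
keystrokes = torch.tensor([[3.2]])     # keystrokes per second over the window
print(next_action(model(text_emb, audio_feats, keystrokes)))

Keeping two separate BLSTMs, as the claim does, lets the text and audio modalities be encoded at their natural granularities (tokens versus frames) before late fusion with the keystroke signal.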