CPC G10L 25/63 (2013.01) [G06N 20/00 (2019.01); G10L 15/26 (2013.01); G10L 15/32 (2013.01); H04L 51/02 (2013.01); H04M 3/4936 (2013.01)] | 23 Claims
1. A method for predicting an emotion based on a multimodal input, the method comprising:
receiving, by communications circuitry, a multimodal input from a user including at least (i) a number of keystrokes over a period of time, (ii) text, and (iii) speech;
converting, via automatic speech recognition circuitry, the speech to converted text;
causing, by a speech-context Bidirectional Long Short-Term Memory (BLSTM) of emotion prediction circuitry and using the text and the converted text, generation of context hidden vectors;
causing, by the emotion prediction circuitry, generation of audio hidden vectors using the speech and an audio BLSTM;
generating, by a trained machine learning model of the emotion prediction circuitry and using the multimodal input, the context hidden vectors, and the audio hidden vectors, an EmotionPrint for the user; and
determining, by the emotion prediction circuitry and using the EmotionPrint, a next action.
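For orientation, the following is a minimal sketch of the data flow recited in claim 1, assuming PyTorch and toy feature dimensions; the class name EmotionPrintModel, the layer sizes, and the late-fusion strategy are illustrative assumptions, not the patentee's disclosed implementation.

```python
# Hypothetical sketch of claim 1's pipeline; names and dimensions are assumed.
import torch
import torch.nn as nn

class EmotionPrintModel(nn.Module):
    def __init__(self, text_dim=300, audio_dim=40, hidden=128, n_emotions=6):
        super().__init__()
        # Speech-context BLSTM: consumes the typed text and the ASR-converted
        # text (here, pre-embedded token vectors) to yield context hidden vectors.
        self.context_blstm = nn.LSTM(text_dim, hidden, batch_first=True,
                                     bidirectional=True)
        # Audio BLSTM: consumes acoustic frames (e.g. MFCCs) of the speech
        # to yield audio hidden vectors.
        self.audio_blstm = nn.LSTM(audio_dim, hidden, batch_first=True,
                                   bidirectional=True)
        # Fusion head: combines both hidden vectors with the keystroke-rate
        # scalar to produce an emotion distribution (the "EmotionPrint").
        self.head = nn.Linear(4 * hidden + 1, n_emotions)

    def forward(self, text_emb, audio_feats, keystroke_rate):
        _, (h_ctx, _) = self.context_blstm(text_emb)   # (2, B, hidden)
        _, (h_aud, _) = self.audio_blstm(audio_feats)  # (2, B, hidden)
        ctx = torch.cat([h_ctx[0], h_ctx[1]], dim=-1)  # context hidden vector
        aud = torch.cat([h_aud[0], h_aud[1]], dim=-1)  # audio hidden vector
        fused = torch.cat([ctx, aud, keystroke_rate.unsqueeze(-1)], dim=-1)
        return torch.softmax(self.head(fused), dim=-1) # EmotionPrint

# Toy usage: batch of 2, 12 text tokens, 50 audio frames.
model = EmotionPrintModel()
emotion_print = model(torch.randn(2, 12, 300),
                      torch.randn(2, 50, 40),
                      torch.tensor([3.5, 7.0]))       # keystrokes per second
next_action = emotion_print.argmax(dim=-1)            # e.g. route or respond
```

A real embodiment would substitute learned text embeddings, an actual ASR front end, and a trained classifier; the sketch only shows how the context hidden vectors, audio hidden vectors, and keystroke feature could fuse into an emotion distribution from which a next action is selected.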