CPC G10L 15/22 (2013.01) [G05D 1/0061 (2013.01); G05D 1/0223 (2013.01); G10L 13/00 (2013.01); G10L 15/26 (2013.01); G10L 15/30 (2013.01); G10L 15/1822 (2013.01)]
20 Claims
1. A computer-implemented method for interacting with a user, the method comprising:
utilizing a processor associated with a virtual personal assistance system, wherein the processor is configured to:
obtain first sensor data from a first sensor included in a plurality of sensors, the first sensor data including visual data;
analyze the first sensor data using a first deep learning model to generate a first prediction value;
obtain second sensor data from a second sensor included in the plurality of sensors, the second sensor data including at least one of audio data, temperature data, or biological data;
analyze the second sensor data using a second deep learning model that is different from the first deep learning model to generate a second prediction value;
aggregate the first prediction value with the second prediction value to determine an aggregate prediction value; and
output, via an audio output device, a natural language audio output to the user based on the aggregate prediction value.
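The data flow recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation. The claim does not specify the model architectures, what the prediction values mean, how they are aggregated, or how the output is worded, so everything below is an assumption made for illustration: `visual_model` and `audio_model` are trivial stand-ins for the two distinct deep learning models, prediction values are taken to be scores in [0, 1], aggregation is a weighted average, and the `speak` callback stands in for the audio output device.

```python
from typing import Callable, Sequence


def _clip01(x: float) -> float:
    """Clamp a value to the range [0, 1]."""
    return min(1.0, max(0.0, x))


def visual_model(pixels: Sequence[float]) -> float:
    """Hypothetical stand-in for the first deep learning model.

    Returns the mean pixel intensity, clamped to [0, 1].
    """
    if not pixels:
        raise ValueError("visual data must not be empty")
    return _clip01(sum(pixels) / len(pixels))


def audio_model(samples: Sequence[float]) -> float:
    """Hypothetical stand-in for the second, different deep learning model.

    Returns the mean absolute amplitude, clamped to [0, 1].
    """
    if not samples:
        raise ValueError("audio data must not be empty")
    return _clip01(sum(abs(s) for s in samples) / len(samples))


def aggregate(first: float, second: float, w_first: float = 0.5) -> float:
    """Combine two prediction values by weighted average (an assumed rule)."""
    if not 0.0 <= w_first <= 1.0:
        raise ValueError("w_first must be in [0, 1]")
    return w_first * first + (1.0 - w_first) * second


def to_utterance(score: float, threshold: float = 0.5) -> str:
    """Map the aggregate value to a natural language sentence (assumed wording)."""
    if score >= threshold:
        return "I noticed something that may need your attention."
    return "Everything looks fine."


def interact(
    visual_data: Sequence[float],
    audio_data: Sequence[float],
    speak: Callable[[str], None] = print,
) -> float:
    """Run the claimed steps in order and return the aggregate prediction value."""
    first_prediction = visual_model(visual_data)    # first sensor, first model
    second_prediction = audio_model(audio_data)     # second sensor, second model
    combined = aggregate(first_prediction, second_prediction)
    speak(to_utterance(combined))                   # stand-in for the audio output device
    return combined
```

Calling `interact([1.0, 0.5], [-0.25, 0.25])` runs both stand-in models, averages their scores, and passes one sentence to `speak`. In a real system the callback would wrap a text-to-speech engine, and the second input could equally be temperature or biological data, as the claim allows.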