US 12,080,284 B2
Two-way in-vehicle virtual personal assistant
Gregory Bohl, Muenster, TX (US); Mengling Hettinger, Murphy, TX (US); Prithvi Kambhampati, Dallas, TX (US); Behrouz Saghafi Khadem, Dallas, TX (US); and Nikhil Patel, Plano, TX (US)
Assigned to Harman International Industries, Incorporated, Stamford, CT (US)
Filed by HARMAN INTERNATIONAL INDUSTRIES, INCORPORATED, Stamford, CT (US)
Filed on Dec. 23, 2019, as Appl. No. 16/726,216.
Claims priority of provisional application 62/786,247, filed on Dec. 28, 2018.
Prior Publication US 2020/0211553 A1, Jul. 2, 2020
Int. Cl. G10L 15/22 (2006.01); G05D 1/00 (2024.01); G10L 13/00 (2006.01); G10L 15/18 (2013.01); G10L 15/26 (2006.01); G10L 15/30 (2013.01)

CPC G10L 15/22 (2013.01) [G05D 1/0061 (2013.01); G05D 1/0223 (2013.01); G10L 13/00 (2013.01); G10L 15/26 (2013.01); G10L 15/30 (2013.01); G10L 15/1822 (2013.01)]

20 Claims

1. A computer-implemented method for interacting with a user, the method comprising:

utilizing a processor associated with a virtual personal assistance system, wherein the processor is configured to:

obtain first sensor data from a first sensor included in a plurality of sensors, the first sensor data including visual data;

analyze the first sensor data using a first deep learning model to generate a first prediction value;

obtain second sensor data from a second sensor included in the plurality of sensors, the second sensor data including at least one of audio data, temperature data, or biological data;

analyze the second sensor data using a second deep learning model that is different from the first deep learning model to generate a second prediction value;

aggregate the first prediction value with the second prediction value to determine an aggregate prediction value; and

output, via an audio output device, a natural language audio output to the user based on the aggregate prediction value.
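
The data flow recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation: the claim does not specify how the two prediction values are aggregated, so the weighted mean below is an assumption, and every name (`Prediction`, `aggregate`, `compose_message`, `run_assistant`, the `speak` callback, the 0.5 threshold) is hypothetical. The two deep learning models are represented by plain callables that return a score in [0, 1], and the audio output device is represented by a callback that receives the text to be spoken.

```python
from dataclasses import dataclass
from typing import Any, Callable, Sequence


@dataclass
class Prediction:
    """One model's output: a score in [0, 1] plus a relative weight."""
    value: float
    weight: float = 1.0


def aggregate(predictions: Sequence[Prediction]) -> float:
    """Combine prediction values with a weighted mean (assumed rule)."""
    total_weight = sum(p.weight for p in predictions)
    if total_weight <= 0:
        raise ValueError("need at least one prediction with positive weight")
    return sum(p.value * p.weight for p in predictions) / total_weight


def compose_message(aggregate_value: float, threshold: float = 0.5) -> str:
    """Turn the aggregate value into a natural language sentence.

    The wording and the threshold are placeholders for illustration.
    """
    if aggregate_value >= threshold:
        return f"Attention may be needed: combined confidence {aggregate_value:.2f}."
    return f"Everything looks normal: combined confidence {aggregate_value:.2f}."


def run_assistant(
    visual_data: Any,
    second_data: Any,
    first_model: Callable[[Any], float],
    second_model: Callable[[Any], float],
    speak: Callable[[str], None],
) -> float:
    """Run the claimed steps in order and return the aggregate value."""
    # Analyze visual data with the first model.
    first = Prediction(first_model(visual_data))
    # Analyze audio, temperature, or biological data with a different model.
    second = Prediction(second_model(second_data))
    # Aggregate the two prediction values.
    combined = aggregate([first, second])
    # Hand the natural language output to the (stand-in) audio device.
    speak(compose_message(combined))
    return combined
```

In use, `first_model` and `second_model` would wrap two separately trained networks, and `speak` would forward the sentence to a text-to-speech engine. Passing `print` as `speak` is enough to exercise the flow without audio hardware.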