CPC G10L 15/197 (2013.01) [G10L 15/063 (2013.01)] | 19 Claims |
1. A method for processing voice commands from a user, the method comprising:
receiving a first audio input acquired while the user utters a first utterance;
receiving a first video input including video of the user acquired in conjunction with acquiring the first audio input;
determining that the first utterance includes a command directed to a system based at least in part on
processing the first audio input, and
processing the first video input including identifying a visual characteristic associated with the user uttering the first utterance; and
causing the system to act on the command after determining that the first utterance includes the command directed to the system.
|
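The decision flow recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation. The claim does not say which visual characteristic is identified or how the audio and video results are combined, so the names `AudioResult`, `VideoResult`, `command_score`, `facing_device`, `lips_moving`, and the 0.5 threshold are all hypothetical assumptions made only for illustration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class AudioResult:
    """Hypothetical output of processing the first audio input."""
    transcript: str
    command_score: float  # assumed score in [0, 1] that the audio holds a command


@dataclass
class VideoResult:
    """Hypothetical output of processing the first video input."""
    facing_device: bool  # assumed visual characteristic: user faces the system
    lips_moving: bool    # assumed visual characteristic: user is visibly speaking


def is_directed_command(audio: AudioResult, video: VideoResult,
                        threshold: float = 0.5) -> bool:
    """Decide whether the utterance includes a command directed to the system.

    The decision depends on both the audio result and the visual
    characteristic. The AND-combination and the threshold are assumptions.
    """
    visual_cue = video.facing_device and video.lips_moving
    return audio.command_score >= threshold and visual_cue


def handle_utterance(audio: AudioResult, video: VideoResult,
                     act: Callable[[str], None]) -> bool:
    """Cause the system to act only after the determination succeeds."""
    if is_directed_command(audio, video):
        act(audio.transcript)
        return True
    return False
```

In this sketch an utterance with a high audio score is still ignored when the assumed visual cue is absent, for example when the user is speaking to another person and facing away from the device.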