CPC G10L 15/22 (2013.01) [G06F 3/013 (2013.01); G06V 20/59 (2022.01); G06V 40/166 (2022.01); G06V 40/19 (2022.01); G06V 40/20 (2022.01); G10L 15/20 (2013.01); G10L 15/25 (2013.01); G10L 15/26 (2013.01); G10L 21/0208 (2013.01); B60N 2/002 (2013.01); G06V 40/18 (2022.01); G10L 2015/223 (2013.01); G10L 2015/227 (2013.01); G10L 15/24 (2013.01); G10L 2021/02166 (2013.01); G10L 25/78 (2013.01); H04R 2430/20 (2013.01); H04R 2460/07 (2013.01); H04R 2499/11 (2013.01)] | 20 Claims |
11. A computing device associated with a user, the computing device comprising:
a plurality of microphones;
a camera;
data processing hardware; and
memory hardware in communication with the data processing hardware and storing instructions that when executed on the data processing hardware cause the data processing hardware to perform operations comprising:
receiving a first acoustic signal comprising first voice data corresponding to an utterance spoken by a user of the computing device;
determining, by processing the first voice data, that the utterance spoken by the user comprises a specific verbal command directed toward the computing device that instructs the computing device to process subsequent speech spoken by the user; and
in response to determining that the utterance spoken by the user comprises the specific verbal command:
receiving image data of a vicinity of the computing device, the image data captured by a camera of the computing device;
determining a direction of the user relative to the computing device based on the image data captured by the camera;
adjusting, based on the direction of the user relative to the computing device, a microphone beamform; and
receiving, using the microphone beamform, a second acoustic signal comprising second voice data corresponding to the subsequent speech spoken by the user.
|
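The operations recited in claim 11 (locating the user from camera image data, then steering the microphone beamform toward that direction before capturing the subsequent speech) can be illustrated with a minimal sketch. The claim does not specify how the direction is computed or how the beamform is adjusted, so everything below is an assumption made for illustration: a pinhole camera model that maps a detected face's horizontal pixel position to an azimuth, a linear microphone array along the x axis, a far-field source, and delay-and-sum beamforming with integer-sample delays. The function names `pixel_to_azimuth`, `steering_delays`, and `delay_and_sum` are hypothetical and do not come from the patent.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # approximate value for air at room temperature


def pixel_to_azimuth(face_x_px, image_width_px, horizontal_fov_deg):
    """Map a face's horizontal pixel position to an azimuth in radians.

    Assumes an ideal pinhole camera (illustrative assumption, not from the
    claim). 0 is straight ahead; positive angles are to the camera's right.
    """
    half_width = image_width_px / 2.0
    focal_px = half_width / math.tan(math.radians(horizontal_fov_deg) / 2.0)
    return math.atan2(face_x_px - half_width, focal_px)


def steering_delays(mic_x_m, azimuth_rad, sample_rate_hz):
    """Integer sample delays that align a far-field source at `azimuth_rad`.

    Assumes a linear array with microphones at positions `mic_x_m` (metres)
    along the x axis.
    """
    # A plane wave from azimuth theta reaches a microphone at position x
    # earlier than the origin by x * sin(theta) / c.
    arrival = [-x * math.sin(azimuth_rad) / SPEED_OF_SOUND_M_S for x in mic_x_m]
    latest = max(arrival)
    # Delay every channel so that it lines up with the latest arrival.
    return [round((latest - t) * sample_rate_hz) for t in arrival]


def delay_and_sum(channels, delays):
    """Delay each channel by its sample count and average the results."""
    n = len(channels[0])
    out = [0.0] * n
    for samples, d in zip(channels, delays):
        for i in range(d, n):
            out[i] += samples[i - d]
    return [v / len(channels) for v in out]


if __name__ == "__main__":
    # Hypothetical values: face detected at pixel column 480 of a
    # 640-pixel-wide frame from a camera with a 60 degree horizontal FOV.
    azimuth = pixel_to_azimuth(480, 640, 60.0)
    delays = steering_delays([0.0, 0.05, 0.10, 0.15], azimuth, 16000)
    print(f"azimuth (deg): {math.degrees(azimuth):.1f}, delays: {delays}")
```

In this sketch the "second acoustic signal" of the claim would correspond to the output of `delay_and_sum` applied to the microphone channels after the delays have been recomputed for the user's direction. A real device would likely use fractional delays or an adaptive beamformer; integer delays are used here only to keep the example short.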