US 11,942,087 B2
Method and apparatus for using image data to aid voice recognition
Robert A. Zurek, Antioch, IL (US); Adrian M. Schuster, West Olive, MI (US); Fu-Lin Shau, Lake Zurich, IL (US); and Jincheng Wu, Naperville, IL (US)
Assigned to Google Technology Holdings LLC, Mountain View, CA (US)
Filed by Google Technology Holdings LLC, Mountain View, CA (US)
Filed on Jan. 13, 2021, as Appl. No. 17/147,991.
Application 17/147,991 is a continuation of application No. 16/416,427, filed on May 20, 2019, granted, now 10,923,124.
Application 16/416,427 is a continuation of application No. 15/464,704, filed on Mar. 21, 2017, granted, now 10,311,868, issued on Jun. 4, 2019.
Application 15/464,704 is a continuation of application No. 14/164,354, filed on Jan. 27, 2014, granted, now 9,747,900, issued on Aug. 29, 2017.
Claims priority of provisional application 61/827,048, filed on May 24, 2013.
Prior Publication US 2021/0134293 A1, May 6, 2021
Int. Cl. G10L 15/22 (2006.01); B60N 2/00 (2006.01); G06F 3/01 (2006.01); G06V 20/59 (2022.01); G06V 40/16 (2022.01); G06V 40/18 (2022.01); G06V 40/19 (2022.01); G06V 40/20 (2022.01); G10L 15/20 (2006.01); G10L 15/24 (2013.01); G10L 15/25 (2013.01); G10L 15/26 (2006.01); G10L 21/0208 (2013.01); G10L 21/0216 (2013.01); G10L 25/78 (2013.01)

CPC G10L 15/22 (2013.01) [G06F 3/013 (2013.01); G06V 20/59 (2022.01); G06V 40/166 (2022.01); G06V 40/19 (2022.01); G06V 40/20 (2022.01); G10L 15/20 (2013.01); G10L 15/25 (2013.01); G10L 15/26 (2013.01); G10L 21/0208 (2013.01); B60N 2/002 (2013.01); G06V 40/18 (2022.01); G10L 2015/223 (2013.01); G10L 2015/227 (2013.01); G10L 15/24 (2013.01); G10L 2021/02166 (2013.01); G10L 25/78 (2013.01); H04R 2430/20 (2013.01); H04R 2460/07 (2013.01); H04R 2499/11 (2013.01)]

20 Claims

11. A computing device associated with a user, the computing device comprising:

a plurality of microphones;

a camera;

data processing hardware; and

memory hardware in communication with the data processing hardware and storing instructions that when executed on the data processing hardware cause the data processing hardware to perform operations comprising:

receiving a first acoustic signal comprising first voice data corresponding to an utterance spoken by a user of the computing device;

determining, by processing the first voice data, that the utterance spoken by the user comprises a specific verbal command directed toward the computing device that instructs the computing device to process subsequent speech spoken by the user; and

in response to determining that the utterance spoken by the user comprises the specific verbal command:

receiving image data of a vicinity of the computing device, the image data captured by a camera of the computing device;

determining a direction of the user relative to the computing device based on the image data captured by the camera;

adjusting, based on the direction of the user relative to the computing device, a microphone beamform; and

receiving, using the microphone beamform, a second acoustic signal comprising second voice data corresponding to the subsequent speech spoken by the user.
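
The direction-determining and beamform-adjusting operations recited above can be illustrated with a minimal sketch. The claim does not specify how the direction is derived from the image data or what kind of beamformer is used, so everything below is an assumption for illustration: a pinhole camera model that maps the horizontal pixel position of a detected face to an azimuth, and a delay-and-sum beamformer over a linear microphone array. The function names `direction_from_pixel` and `steering_delays`, the array geometry, and the field-of-view value are hypothetical and do not come from the patent.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # approximate value for air near 20 degrees C


def direction_from_pixel(face_x_px, image_width_px, horizontal_fov_deg):
    """Estimate the user's azimuth (degrees) from a face position in the image.

    Assumes a pinhole camera: 0 is straight ahead, positive is to the
    right of the optical axis. The face detector itself is out of scope.
    """
    half_width = image_width_px / 2.0
    # Focal length in pixels, derived from the horizontal field of view.
    focal_px = half_width / math.tan(math.radians(horizontal_fov_deg) / 2.0)
    return math.degrees(math.atan((face_x_px - half_width) / focal_px))


def steering_delays(mic_positions_m, azimuth_deg):
    """Per-microphone delays (seconds) for a delay-and-sum beamformer.

    Assumes a linear array along the x axis (positions in meters) and a
    far-field source at the given azimuth, measured from broadside.
    """
    s = math.sin(math.radians(azimuth_deg))
    # Relative arrival time of the wavefront at each microphone: a mic
    # farther toward the source (larger x for positive azimuth) hears it earlier.
    arrivals = [-x * s / SPEED_OF_SOUND_M_S for x in mic_positions_m]
    latest = max(arrivals)
    # Delay each channel so that all channels line up with the latest arrival.
    return [latest - a for a in arrivals]


# Hypothetical usage: a face detected at pixel column 480 of a 640-pixel-wide
# frame from a camera with a 90-degree field of view, two mics 10 cm apart.
azimuth = direction_from_pixel(480, 640, 90.0)
delays = steering_delays([-0.05, 0.05], azimuth)
```

In this sketch, summing the microphone channels after applying these delays would reinforce sound arriving from the estimated direction of the user, which is one way the second acoustic signal could be received "using the microphone beamform". Other beamforming techniques would fit the claim language equally well.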