US 12,456,469 B1
Beamforming using image data
Anshuman Ganguly, Cambridge, MA (US); Srivatsan Kandadai, Danville, CA (US); Trausti Thor Kristjansson, San Jose, CA (US); and Wontak Kim, Watertown, MA (US)
Assigned to Amazon Technologies, Inc., Seattle, WA (US)
Filed by Amazon Technologies, Inc., Seattle, WA (US)
Filed on Jun. 27, 2022, as Appl. No. 17/849,864.
Int. Cl. G10L 17/20 (2013.01); G06V 40/16 (2022.01); G10L 17/06 (2013.01); G10L 25/60 (2013.01); H04R 1/40 (2006.01); H04R 3/00 (2006.01)
CPC G10L 17/20 (2013.01) [G06V 40/161 (2022.01); G10L 17/06 (2013.01); G10L 25/60 (2013.01); H04R 1/406 (2013.01); H04R 3/005 (2013.01)]
20 Claims
 
1. A computer-implemented method, comprising:
receiving a plurality of audio signals from a microphone array of a device;
receiving image data from at least one image capture component of the device;
determining, using the image data, a human represented in the image data;
determining, using an angle determination component, position data corresponding to a first position of the device, the first position resulting from movement of the device;
determining, based at least in part on the position data, a first confidence value corresponding to a likelihood that the human is positioned in a first direction relative to the device;
determining a first plurality of filter coefficients corresponding to the first direction;
performing beamforming using the plurality of audio signals and the first plurality of filter coefficients to determine first processed audio data corresponding to the first direction; and
based at least in part on the first confidence value, causing speech processing to be performed using the first processed audio data.
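
As an illustration only, the last three steps of claim 1 (filter coefficients for a direction, beamforming over the microphone signals, and confidence-dependent speech processing) can be sketched as a time-domain filter-and-sum beamformer followed by a gate. The claim does not specify any of the following, so all of it is assumed here: the delay-and-sum choice of coefficients, integer sample delays, the fixed threshold comparison, and every function and parameter name (`delay_and_sum_coefficients`, `filter_and_sum`, `maybe_run_speech_processing`, `threshold`, `speech_processor`). The sketch is not the patented implementation.

```python
from typing import Callable, List, Optional, Sequence


def delay_and_sum_coefficients(delays: Sequence[int], num_taps: int) -> List[List[float]]:
    """Build one FIR filter per microphone for a single look direction.

    Assumption: each filter is a unit impulse at that microphone's integer
    sample delay, scaled by 1/M so the aligned channels average.
    """
    num_mics = len(delays)
    coeffs = []
    for d in delays:
        if not 0 <= d < num_taps:
            raise ValueError("delay must fit within the filter length")
        taps = [0.0] * num_taps
        taps[d] = 1.0 / num_mics
        coeffs.append(taps)
    return coeffs


def filter_and_sum(signals: Sequence[Sequence[float]],
                   coeffs: Sequence[Sequence[float]]) -> List[float]:
    """Filter each microphone signal with its FIR taps and sum the results."""
    n = len(signals[0])
    out = [0.0] * n
    for sig, taps in zip(signals, coeffs):
        for t in range(n):
            for k, h in enumerate(taps):
                if h != 0.0 and t - k >= 0:
                    out[t] += h * sig[t - k]
    return out


def maybe_run_speech_processing(processed: List[float],
                                confidence: float,
                                threshold: float,
                                speech_processor: Callable[[List[float]], object]) -> Optional[object]:
    """Hypothetical gate: run speech processing only if confidence is high enough."""
    if confidence >= threshold:
        return speech_processor(processed)
    return None


# Two microphones: the same pulse reaches mic 1 two samples before mic 0,
# so mic 1 is delayed by two samples to align it with mic 0.
mic_signals = [
    [0.0, 0.0, 1.0, 0.0, 0.0],
    [1.0, 0.0, 0.0, 0.0, 0.0],
]
first_coeffs = delay_and_sum_coefficients(delays=[0, 2], num_taps=3)
first_processed = filter_and_sum(mic_signals, first_coeffs)
```

In this toy input the two channels add coherently at sample index 2 once aligned. In the claim, the confidence value is derived from position data and the human detected in the image data; here it would simply be passed in as a number, for example `maybe_run_speech_processing(first_processed, 0.9, 0.5, len)`, where `len` stands in for a real speech processing component.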