US 12,288,566 B1
Beamforming using multiple sensor data
Anshuman Ganguly, Cambridge, MA (US); Srivatsan Kandadai, Danville, CA (US); Trausti Thor Kristjansson, San Jose, CA (US); and Wontak Kim, Watertown, MA (US)
Assigned to Amazon Technologies, Inc., Seattle, WA (US)
Filed by Amazon Technologies, Inc., Seattle, WA (US)
Filed on Jun. 27, 2022, as Appl. No. 17/849,823.
Int. Cl. G10L 21/0216 (2013.01); G10L 21/0264 (2013.01); G10L 25/51 (2013.01); G10L 25/78 (2013.01)
CPC G10L 21/0216 (2013.01) [G10L 21/0264 (2013.01); G10L 25/51 (2013.01); G10L 25/78 (2013.01); G10L 2021/02166 (2013.01)]
20 Claims
1. A computer-implemented method, comprising:
receiving a plurality of audio signals from a microphone array of a device;
performing beamforming using the plurality of audio signals to determine:
first audio data corresponding to a first direction relative to the device, and
second audio data corresponding to a second direction relative to the device, the second direction different from the first direction;
determining first signal quality data corresponding to the first audio data;
determining second signal quality data corresponding to the second audio data;
based at least in part on the first signal quality data and the second signal quality data, determining a first confidence value representing a likelihood that a user is positioned in the first direction relative to the device;
receiving first image data from at least one image capture component of a device;
processing the first image data to determine a second confidence value corresponding to a likelihood that the user is positioned in a second direction relative to the device;
determining, based at least in part on the first direction and the first confidence value, first adjusted position data;
determining, based at least in part on the second direction and the second confidence value, second adjusted position data;
determining, based at least in part on the first confidence value and the second confidence value, third adjusted position data;
based at least in part on the first adjusted position data, the second adjusted position data, and the third adjusted position data, determining that the user is positioned in the second direction relative to the device; and
causing speech processing to be performed using the second audio data.
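
The fusion steps recited in claim 1 can be illustrated with a minimal sketch. This is not the patented implementation; it is one possible reading under stated assumptions. All function names are hypothetical. The sketch assumes that signal quality data is a per-beam SNR in dB, that the first confidence value is a softmax over the two SNRs, that the first and second adjusted position data are direction unit vectors scaled by their confidence values, and that the third adjusted position data is a normalizer formed from the two confidence values. None of these choices is specified by the claim.

```python
import math


def audio_confidence(snr_first_db, snr_second_db):
    """Assumed form of the first confidence value: a softmax over the two
    beam SNRs, read as the likelihood that the user is in the first direction."""
    a = math.exp(snr_first_db / 10.0)
    b = math.exp(snr_second_db / 10.0)
    return a / (a + b)


def adjusted_position(direction_deg, confidence):
    """Assumed form of adjusted position data: the unit vector for a
    direction, scaled by the confidence attached to that direction."""
    rad = math.radians(direction_deg)
    return (confidence * math.cos(rad), confidence * math.sin(rad))


def angular_distance(a_deg, b_deg):
    """Smallest absolute angle between two bearings, in degrees."""
    d = abs(a_deg - b_deg) % 360.0
    return min(d, 360.0 - d)


def select_direction(first_dir, second_dir, conf_audio, conf_image):
    """Fuse the audio and image evidence and return the candidate direction
    closest to the fused estimate."""
    first_adj = adjusted_position(first_dir, conf_audio)
    second_adj = adjusted_position(second_dir, conf_image)
    # Assumption: the third adjusted position data is a normalizer
    # derived from both confidence values.
    third_adj = conf_audio + conf_image
    if third_adj == 0.0:
        raise ValueError("both confidence values are zero")
    x = (first_adj[0] + second_adj[0]) / third_adj
    y = (first_adj[1] + second_adj[1]) / third_adj
    estimate = math.degrees(math.atan2(y, x)) % 360.0
    if angular_distance(estimate, first_dir) <= angular_distance(estimate, second_dir):
        return first_dir
    return second_dir


if __name__ == "__main__":
    # Hypothetical inputs: beams at 0 and 90 degrees, image evidence
    # strongly favoring the second direction.
    conf_a = audio_confidence(5.0, 10.0)
    chosen = select_direction(0.0, 90.0, conf_a, 0.9)
    beams = {0.0: "first audio data", 90.0: "second audio data"}
    print("speech processing uses:", beams[chosen])
```

In this sketch the image-based confidence outweighs the audio-based confidence, so the second direction is selected and the second audio data would be passed to speech processing, mirroring the final two steps of the claim.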