| CPC G10L 17/20 (2013.01) [G06V 40/161 (2022.01); G10L 17/06 (2013.01); G10L 25/60 (2013.01); H04R 1/406 (2013.01); H04R 3/005 (2013.01)] | 20 Claims |

|
1. A computer-implemented method, comprising:
receiving a plurality of audio signals from a microphone array of a device;
receiving image data from at least one image capture component of the device;
determining, using the image data, a human represented in the image data;
determining, using an angle determination component, position data corresponding to a first position of the device, the first position resulting from movement of the device;
determining, based at least in part on the position data, a first confidence value corresponding to a likelihood that the human is positioned in a first direction relative to the device;
determining a first plurality of filter coefficients corresponding to the first direction;
performing beamforming using the plurality of audio signals and the first plurality of filter coefficients to determine first processed audio data corresponding to the first direction; and
based at least in part on the first confidence value, causing speech processing to be performed using the first processed audio data.
|
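The beamforming and confidence-gating steps of claim 1 can be illustrated with a minimal sketch. It is not the claimed implementation: it assumes a time-domain filter-and-sum beamformer (one FIR filter per microphone channel, outputs summed) and a simple threshold test on the confidence value. The names `fir_filter`, `filter_and_sum`, `process_if_confident`, `speech_processor`, and the `threshold` default are hypothetical and do not appear in the claim.

```python
from typing import Callable, List, Optional, Sequence


def fir_filter(signal: Sequence[float], taps: Sequence[float]) -> List[float]:
    """Causal FIR filter: y[n] = sum over k of taps[k] * signal[n - k]."""
    out = []
    for n in range(len(signal)):
        acc = 0.0
        for k, tap in enumerate(taps):
            if n - k >= 0:
                acc += tap * signal[n - k]
        out.append(acc)
    return out


def filter_and_sum(
    channels: Sequence[Sequence[float]],
    coefficients: Sequence[Sequence[float]],
) -> List[float]:
    """Filter each microphone channel with its own taps, then sum the results.

    `coefficients[i]` holds the (assumed) FIR taps for microphone i that
    steer the array toward one look direction.
    """
    if len(channels) != len(coefficients):
        raise ValueError("need one set of filter coefficients per channel")
    length = min(len(ch) for ch in channels)
    output = [0.0] * length
    for ch, taps in zip(channels, coefficients):
        filtered = fir_filter(ch[:length], taps)
        for n in range(length):
            output[n] += filtered[n]
    return output


def process_if_confident(
    channels: Sequence[Sequence[float]],
    coefficients: Sequence[Sequence[float]],
    confidence: float,
    speech_processor: Callable[[List[float]], object],
    threshold: float = 0.5,  # assumed cutoff; the claim does not specify one
) -> Optional[object]:
    """Beamform toward the first direction, then gate speech processing
    on the confidence that the human is in that direction."""
    processed_audio = filter_and_sum(channels, coefficients)
    if confidence >= threshold:
        return speech_processor(processed_audio)
    return None
```

For example, with two channels carrying the same impulse one sample apart, taps that delay the first channel by one sample align the two signals so that they add coherently. The claim only requires that speech processing depend "at least in part" on the confidence value, so the threshold comparison here is one of many possible gating rules.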