| CPC G10L 21/0232 (2013.01) [H04R 3/005 (2013.01); G10L 2021/02166 (2013.01)] | 20 Claims |

|
1. A method of processing an audio signal, comprising:
receiving a first audio signal via a plurality of microphones, the first audio signal including a number (B) of frames for each of the plurality of microphones, each of the B frames for each of the plurality of microphones including a number (N) of time-domain samples;
for a first microphone included in the plurality of microphones:
transforming the B*N time-domain samples into B*N/2 first frequency-domain samples based on an N-point fast Fourier transform (FFT);
transforming the B*N/2 first frequency-domain samples into B*N/2 second frequency-domain samples based on a B-point FFT; and
determining a probability of speech associated with the B*N/2 second frequency-domain samples based on a neural network model;
determining a minimum variance distortionless response (MVDR) beamforming filter based at least in part on the probability of speech for the first microphone; and
processing the first audio signal based on the MVDR beamforming filter.
|
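
The steps recited in claim 1 can be illustrated with a minimal NumPy sketch. It is not the patented implementation, and everything the claim leaves open is an assumption here: the function names are hypothetical, the N/2 bins kept per frame are taken to be the non-Nyquist half of a real FFT, `speech_probability` is a toy stand-in for the neural network model that returns one scalar per block, the probability is used to weight a noise covariance update, and the steering vector is a fixed all-ones placeholder.

```python
import numpy as np


def two_stage_fft(frames):
    """frames: real array (B, N) for one microphone.

    Returns (first, second), each of shape (B, N // 2).
    """
    B, N = frames.shape
    # Stage 1: N-point FFT of every frame. rfft yields N/2 + 1 bins; dropping
    # the Nyquist bin to keep N/2 per frame is an assumption of this sketch.
    first = np.fft.rfft(frames, n=N, axis=1)[:, : N // 2]
    # Stage 2: B-point FFT along the frame axis, separately for each bin.
    second = np.fft.fft(first, n=B, axis=0)
    return first, second


def speech_probability(second):
    """Hypothetical stand-in for the neural network model (NOT a trained model).

    Returns the share of energy outside the zero-index row of the second
    transform, which is simply a convenient number in [0, 1].
    """
    energy = np.abs(second) ** 2
    return float(1.0 - energy[0].sum() / (energy.sum() + 1e-12))


def update_noise_cov(noise_cov, spectra, p_speech, alpha=0.9):
    """spectra: (M, B, F) first-stage spectra; noise_cov: (F, M, M)."""
    B = spectra.shape[1]
    block_cov = np.einsum("mbf,nbf->fmn", spectra, spectra.conj()) / B
    # Assumed rule: a low speech probability gives a larger step towards
    # the covariance of the current block.
    step = (1.0 - alpha) * (1.0 - p_speech)
    return (1.0 - step) * noise_cov + step * block_cov


def mvdr_weights(noise_cov, steering):
    """w = R^{-1} d / (d^H R^{-1} d) per bin; returns shape (F, M)."""
    # Solve R x = d for each bin rather than forming an explicit inverse.
    x = np.linalg.solve(noise_cov, steering[..., None])[..., 0]
    denom = np.einsum("fm,fm->f", steering.conj(), x)
    return x / denom[:, None]


def apply_beamformer(weights, spectra):
    """y[b, f] = sum over m of conj(w[f, m]) * X[m, b, f]."""
    return np.einsum("fm,mbf->bf", weights.conj(), spectra)


# Small synthetic run: M microphones, B frames, N samples per frame.
rng = np.random.default_rng(0)
M, B, N = 2, 8, 16
audio = rng.standard_normal((M, B, N))
pairs = [two_stage_fft(audio[m]) for m in range(M)]
spectra = np.stack([first for first, _ in pairs])       # (M, B, N/2)
p = speech_probability(pairs[0][1])                      # first microphone only
noise_cov = np.tile(np.eye(M, dtype=complex), (N // 2, 1, 1))
noise_cov = update_noise_cov(noise_cov, spectra, p)
steering = np.ones((N // 2, M), dtype=complex)           # placeholder steering vector
w = mvdr_weights(noise_cov, steering)
out = apply_beamformer(w, spectra)                       # (B, N/2)
```

Starting the noise covariance at the identity keeps it invertible after the update. Whatever covariance estimate is used, the weights satisfy the distortionless constraint (each bin's weight vector has unit response to the steering vector), which is the property that gives MVDR its name.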