CPC G10L 15/20 (2013.01) [G01S 3/8006 (2013.01); G01S 3/802 (2013.01); G10L 15/08 (2013.01); G10L 15/22 (2013.01); G10L 21/0224 (2013.01); G10L 21/0232 (2013.01); G10L 25/51 (2013.01); G10L 2015/088 (2013.01); G10L 2021/02082 (2013.01); G10L 2021/02166 (2013.01)] | 20 Claims |
1. An audio recognition method, comprising:
obtaining audio signals collected in a plurality of directions in a space, the audio signals comprising a target-audio direct signal;
performing echo cancellation on the audio signals;
obtaining weights of a plurality of time-frequency points in the echo-canceled audio signals, a respective weight of each time-frequency point indicating a relative proportion of the target-audio direct signal in the echo-canceled audio signals at the time-frequency point;
weighting time-frequency components of the audio signals at the plurality of time-frequency points separately for each of the plurality of directions by using the weights of the plurality of time-frequency points, to obtain a weighted audio signal energy distribution of the audio signals in the plurality of directions, further including:
obtaining a weighted covariance matrix of each of the plurality of time-frequency points based at least in part on the obtained weights of the plurality of time-frequency points, starting and ending time points of a target wakeup word in the echo-canceled audio signals, and time-frequency domain expressions of the echo-canceled audio signals; and
performing weighted calculation on a spatial spectrum of the audio signals by using the weighted covariance matrix, to obtain the spatial spectrum of the audio signals weighted at the plurality of time-frequency points;
obtaining a sound source azimuth corresponding to the target-audio direct signal in the audio signals by using the weighted audio signal energy distribution of the audio signals in the plurality of directions; and
performing audio recognition on the audio signals based on the sound source azimuth corresponding to the target-audio direct signal.
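The weighting and localization steps of the claim (weighted covariance per time-frequency point, weighted spatial spectrum, azimuth as the spectrum peak) can be illustrated with a minimal NumPy sketch. This is one plausible realization in the style of a weighted steered-response-power (SRP) estimator, not the patented implementation; the function name, the linear-array geometry, and the steering-vector model are assumptions for illustration only.

```python
import numpy as np

def weighted_srp_azimuth(X, w, mic_pos, freqs, c=343.0, n_dirs=360):
    """Sketch of claim steps: weighted covariance -> weighted spatial spectrum -> azimuth.

    X       : (M, T, F) complex STFT of the echo-canceled mic signals (M mics).
    w       : (T, F) per-time-frequency-point weights (proportion of the
              target-audio direct signal); assumed nonzero only on frames
              between the wakeup word's starting and ending time points.
    mic_pos : (M,) mic x-coordinates in meters (hypothetical linear array).
    freqs   : (F,) STFT bin center frequencies in Hz.
    """
    # Weighted covariance matrix per frequency bin:
    #   R[f] = sum_t w[t, f] * X[:, t, f] X[:, t, f]^H
    R = np.einsum('tf,mtf,ntf->fmn', w, X, X.conj())

    # Candidate directions over [0, 180] degrees and far-field delays.
    thetas = np.linspace(0.0, np.pi, n_dirs)
    tau = np.outer(np.cos(thetas), mic_pos) / c              # (n_dirs, M)

    # Steering vectors a[d, f, m] = exp(-j 2 pi f tau[d, m]).
    a = np.exp(-2j * np.pi * freqs[None, :, None] * tau[:, None, :])

    # Weighted spatial spectrum P[d] = sum_f a^H R a (real-valued power).
    P = np.einsum('dfm,fmn,dfn->d', a.conj(), R, a).real

    # Sound source azimuth = direction maximizing the weighted spectrum.
    return np.degrees(thetas[np.argmax(P)]), P
```

With weights concentrated on time-frequency points dominated by the direct path, reverberant and residual-echo components contribute little to `R`, so the spectrum peak tracks the talker's azimuth rather than reflections, which is the stated purpose of the weighting in the claim.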