US 12,293,770 B2
Voice signal dereverberation processing method and apparatus, computer device and storage medium
Rui Zhu, Shenzhen (CN); Juan Juan Li, Shenzhen (CN); Yan Nan Wang, Shenzhen (CN); and Yue Peng Li, Shenzhen (CN)
Assigned to TENCENT TECHNOLOGY (SHENZHEN) COMPANY LIMITED, Shenzhen (CN)
Filed by TENCENT TECHNOLOGY (SHENZHEN) COMPANY LIMITED, Guangdong (CN)
Filed on Mar. 2, 2022, as Appl. No. 17/685,042.
Application 17/685,042 is a continuation of application No. PCT/CN2021/076465, filed on Feb. 10, 2021.
Claims priority of application No. 202010250009.3 (CN), filed on Apr. 1, 2020.
Prior Publication US 2022/0230651 A1, Jul. 21, 2022
Int. Cl. G10L 25/12 (2013.01); G10L 21/0232 (2013.01); G10L 25/18 (2013.01); G10L 25/21 (2013.01); G10L 25/30 (2013.01); G10L 21/0208 (2013.01)

CPC G10L 21/0232 (2013.01) [G10L 25/12 (2013.01); G10L 25/18 (2013.01); G10L 25/21 (2013.01); G10L 25/30 (2013.01); G10L 2021/02082 (2013.01)]

20 Claims

1. A speech signal dereverberation processing method, executed by at least one processor, the method comprising:

extracting an amplitude spectrum feature and a phase spectrum feature of a current frame in an original speech signal;

extracting subband amplitude spectrums from the amplitude spectrum feature corresponding to the current frame;

determining, based on the subband amplitude spectrums and a reverberation strength distribution associated with the current frame and by using a first model, a reverberation strength indicator corresponding to the current frame, the first model being a first neural network model that is trained using reverberated band amplitude spectrum, clean speech band amplitude spectrum, and a reverberation-to-clean-speech energy ratio, with the reverberation-to-clean-speech energy ratio used as a training target;

determining, based on the subband amplitude spectrums and the reverberation strength indicator, and by using a second model, a clean speech subband spectrum corresponding to the current frame, wherein the second model is a regressive reverberation strength prediction algorithm model based on a history frame; and

obtaining a dereverberated clean speech signal by performing signal conversion on the clean speech subband spectrum and the phase spectrum feature corresponding to the current frame.
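
The data flow of the steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation. The function names, the uniform band layout, the Hann window, the smoothing factor alpha, and the gain rule sqrt(1 / (1 + ratio)) are all assumptions introduced for illustration. In particular, first_model_stub only stands in for the trained first neural network model, and second_model_stub only stands in for the history-frame-based second model; neither reproduces the models described in the claim.

```python
import numpy as np


def analyze_frame(frame, window):
    """Return the amplitude and phase spectrum of one windowed frame."""
    spectrum = np.fft.rfft(frame * window)
    return np.abs(spectrum), np.angle(spectrum)


def subband_amplitudes(amplitude, band_edges):
    """Average bin amplitudes inside each band (plain averaging is an assumption)."""
    return np.array([amplitude[lo:hi].mean()
                     for lo, hi in zip(band_edges[:-1], band_edges[1:])])


def first_model_stub(subbands, strength_distribution):
    """Hypothetical placeholder for the trained neural network.

    It simply passes the supplied strength distribution through as a
    non-negative reverberation-to-clean-speech energy ratio per band.
    """
    return np.clip(strength_distribution, 0.0, None) * np.ones_like(subbands)


def second_model_stub(subbands, indicator, history_indicator, alpha=0.8):
    """Hypothetical placeholder for the history-frame-based second model.

    Smooths the indicator against the previous frame's value, then applies
    an assumed amplitude gain of sqrt(1 / (1 + ratio)) to each band.
    """
    smoothed = alpha * history_indicator + (1.0 - alpha) * indicator
    gain = np.sqrt(1.0 / (1.0 + smoothed))
    return subbands * gain, smoothed


def dereverb_frame(frame, window, band_edges, strength_distribution,
                   history_indicator):
    """Run one frame through the claim's steps; return (time frame, new history)."""
    amplitude, phase = analyze_frame(frame, window)
    subbands = subband_amplitudes(amplitude, band_edges)
    indicator = first_model_stub(subbands, strength_distribution)
    clean_subbands, new_history = second_model_stub(
        subbands, indicator, history_indicator)
    # Spread each band's gain over the bins that band covers.
    band_gain = clean_subbands / np.maximum(subbands, 1e-12)
    bin_gain = np.repeat(band_gain, np.diff(band_edges))
    # Signal conversion: recombine the cleaned amplitude with the original phase.
    clean_spectrum = amplitude * bin_gain * np.exp(1j * phase)
    return np.fft.irfft(clean_spectrum, n=len(frame)), new_history


# Example call with made-up sizes: 512-sample frames, 8 uniform bands.
frame_len, n_bands = 512, 8
band_edges = np.linspace(0, frame_len // 2 + 1, n_bands + 1).astype(int)
window = np.hanning(frame_len)
frame = np.random.default_rng(0).standard_normal(frame_len)
output, history = dereverb_frame(
    frame, window, band_edges, np.ones(n_bands), np.zeros(n_bands))
```

In a streaming setting the returned history value would be fed back as history_indicator for the next frame, and the output frames would be overlap-added; both details are left out of the sketch.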