US 12,451,112 B2
Acoustic signal enhancement device, acoustic signal enhancement method, and program
Tomohiro Nakatani, Tokyo (JP); Rintaro Ikeshita, Tokyo (JP); Keisuke Kinoshita, Tokyo (JP); Hiroshi Sawada, Tokyo (JP); Naoyuki Kamo, Tokyo (JP); and Shoko Araki, Tokyo (JP)
Assigned to NTT, Inc., Tokyo (JP)
Appl. No. 18/571,765
Filed by NTT, Inc., Tokyo (JP)
PCT Filed Sep. 30, 2021, PCT No. PCT/JP2021/036203 § 371(c)(1), (2) Date Dec. 19, 2023, PCT Pub. No. WO2023/276170, PCT Pub. Date Jan. 5, 2023.
Prior Publication US 2024/0312446 A1, Sep. 19, 2024
Int. Cl. G10K 11/178 (2006.01); H04R 3/00 (2006.01)

CPC G10K 11/17821 (2018.01) [G10K 11/17881 (2018.01); H04R 3/005 (2013.01)]

12 Claims

1. An acoustic signal enhancement device that receives, as an input, a recording sound obtained by frequency division and updates parameters, the acoustic signal enhancement device comprising:

processing circuitry configured to:

assuming that a switch weight is a weight indicating a ratio of a classification to which a recording sound at each timing belongs in classifications of spatial states where a recording sound temporally changes,

perform beamformer processing based on a weighted spatial covariance matrix which is updated and update an auxiliary estimation value of a target sound;

update the switch weight and power of a target sound based on the updated auxiliary estimation value and output an estimation value of the target sound; and

update the weighted spatial covariance matrix based on the updated switch weight and the power.
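
The three updates recited in claim 1 can be read as one pass of an iterative loop. The following is a minimal sketch of such a loop for a single frequency bin, not the patented implementation. It assumes, beyond what the claim states, that a steering vector `v` toward the target is known, that the beamformer is a distortionless minimum-power design, that each frame is hard-assigned to the spatial state with the lowest output power, and that the power is floored for numerical stability. All function and parameter names (`weighted_covariances`, `switching_beamformer`, `num_states`, `power_floor`, `diag_load`) are hypothetical.

```python
import numpy as np


def weighted_covariances(x, gamma, lam, diag_load=1e-6):
    """R[j] = sum_t gamma[j, t] / lam[t] * x_t x_t^H, plus diagonal loading (assumed)."""
    R = np.einsum("jt,ta,tb->jab", gamma / lam, x, x.conj())
    return R + diag_load * np.eye(x.shape[1])


def switching_beamformer(x, v, num_states=2, num_iters=5, power_floor=1e-3, seed=0):
    """x: (T, M) complex frames of one frequency bin; v: (M,) assumed steering vector."""
    T, M = x.shape
    frames = np.arange(T)
    rng = np.random.default_rng(seed)
    # Switch weights: share of each frame assigned to each spatial state (columns sum to 1).
    gamma = rng.random((num_states, T))
    gamma /= gamma.sum(axis=0, keepdims=True)
    # Initial target power: mean channel power per frame (assumed initialisation).
    lam = np.maximum(np.mean(np.abs(x) ** 2, axis=1), power_floor)
    R = weighted_covariances(x, gamma, lam)
    for _ in range(num_iters):
        # Step 1: one distortionless beamformer per state, then auxiliary estimates.
        W = np.empty((num_states, M), dtype=complex)
        for j in range(num_states):
            u = np.linalg.solve(R[j], v)
            W[j] = u / np.vdot(v, u)  # enforces w^H v = 1
        aux = W.conj() @ x.T  # aux[j, t] = w_j^H x_t
        # Step 2: update switch weights (hard assignment, assumed) and target power.
        best = np.argmin(np.abs(aux) ** 2, axis=0)
        gamma = np.zeros((num_states, T))
        gamma[best, frames] = 1.0
        y = aux[best, frames]  # estimation value of the target sound
        lam = np.maximum(np.abs(y) ** 2, power_floor)
        # Step 3: update the weighted spatial covariance matrices.
        R = weighted_covariances(x, gamma, lam)
    return y, gamma, W
```

Under these assumptions every per-state beamformer passes the steering direction with unit gain, so a noise-free input of the form `s[:, None] * v[None, :]` is returned unchanged. A soft switch weight (for example, a normalized likelihood per state) would fit the claim language equally well; hard assignment is used here only to keep the sketch short.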

4. An acoustic signal enhancement device that receives, as inputs, recording sounds from a plurality of microphones, the acoustic signal enhancement device comprising:

processing circuitry configured to,

assuming that a first switch weight is a weight indicating a ratio of a classification to which a recording sound at each timing belongs in classifications of spatial states where a recording sound temporally changes, and

assuming that a second switch weight is a weight indicating a ratio of a classification to which a recording sound at each timing belongs in classifications of spatial-temporal states where a recording sound temporally changes,

update a weighted spatial covariance matrix for estimating a coefficient for obtaining a target sound of a beamformer based on the first and second switch weights, power of each sound source, and an auxiliary reverberation-suppressed sound of each sound source;

update the coefficient of the beamformer which estimates a separation sound of a separation matrix based on the weighted spatial covariance matrix and update an auxiliary estimation value of each sound source based on the updated coefficient of the beamformer and the auxiliary reverberation-suppressed sound; and

update estimation values of all the sound sources based on the first and second switch weights, update power of each sound source based on the estimation values of all the sound sources, and update the first switch weight based on the power of each sound source.
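
The first update in claim 4 combines two switch weights, the per-source power, and the auxiliary reverberation-suppressed sound into a weighted spatial covariance matrix. The sketch below shows one plausible reading of that step, not the patented formula. It assumes that the two switch weights enter as a product, that the auxiliary reverberation-suppressed sound is indexed by source and by spatial-temporal state, and that a small diagonal loading is added. The function name, argument layout, and `diag_load` are hypothetical.

```python
import numpy as np


def weighted_spatial_covariances(z, gamma1, gamma2, lam, diag_load=1e-6):
    """One assumed form of the weighted spatial covariance update.

    z:      (N, J2, T, M) auxiliary reverberation-suppressed frames,
            per source n and spatial-temporal state j (assumed layout)
    gamma1: (J1, T) first switch weights (spatial states)
    gamma2: (J2, T) second switch weights (spatial-temporal states)
    lam:    (N, T) power of each sound source
    Returns R with shape (N, J1, M, M), where
    R[n, i] = sum_{j, t} gamma1[i, t] * gamma2[j, t] / lam[n, t] * z z^H.
    """
    # Combined weight per source, spatial state, spatial-temporal state, frame.
    weights = (
        gamma1[None, :, None, :] * gamma2[None, None, :, :] / lam[:, None, None, :]
    )
    R = np.einsum("nijt,njta,njtb->niab", weights, z, z.conj())
    return R + diag_load * np.eye(z.shape[-1])
```

Each returned matrix is Hermitian by construction, so it can be passed to a standard linear solver when the beamformer coefficient is updated in the next step of the claim.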