US 12,406,678 B2
	Sound signal purification using decoded monaural signals
Ryosuke Sugiura, Tokyo (JP); Takehiro Moriya, Tokyo (JP); and Yutaka Kamamoto, Tokyo (JP)
Assigned to NIPPON TELEGRAPH AND TELEPHONE CORPORATION, Tokyo (JP)
Appl. No. 18/032,792
Filed by NIPPON TELEGRAPH AND TELEPHONE CORPORATION, Tokyo (JP)
PCT Filed Nov. 5, 2020, PCT No. PCT/JP2020/041397 § 371(c)(1), (2) Date Apr. 19, 2023, PCT Pub. No. WO2022/097234, PCT Pub. Date May 12, 2022.
Prior Publication US 2023/0402044 A1, Dec. 14, 2023
Int. Cl. G10L 19/008 (2013.01); G10L 25/06 (2013.01); G10L 25/21 (2013.01)

CPC G10L 19/008 (2013.01) [G10L 25/06 (2013.01); G10L 25/21 (2013.01)]

13 Claims

1. A sound signal purification method for obtaining, for each frame, an n-th channel purified decoded sound signal ~X_n that is a sound signal of each channel of stereo by using at least an n-th channel decoded sound signal ^X_n (n is each integer of 1 or more and N or less) that is a decoded sound signal of the each channel of the stereo obtained by decoding a stereo code CS and a monaural decoded sound signal ^X_M that is a monaural decoded sound signal obtained by decoding a monaural code CM that is a code different from the stereo code CS, wherein

the n-th channel decoded sound signal ^X_n is obtained by decoding the stereo code CS without using either information obtained by decoding the monaural code CM or the monaural code CM, and

the sound signal purification method further comprises

a monaural decoded sound upmixing step of obtaining, for the each frame, an n-th channel upmixed monaural decoded sound signal ^X_Mn that is a signal obtained by upmixing the monaural decoded sound signal ^X_M for the each channel by an upmixing process using the monaural decoded sound signal ^X_M and inter-channel relationship information that is information indicating a relationship between the channels of the stereo, and

an n-th channel signal purification step of obtaining, for the each frame and for each corresponding sample t with respect to the each channel n, a sequence based on a value ~x_n(t) = (1 − α_n) × ^x_n(t) + α_n × ^x_Mn(t) obtained by adding a value α_n × ^x_Mn(t) obtained by multiplying an n-th channel purification weight α_n by a sample value ^x_Mn(t) of the n-th channel upmixed monaural decoded sound signal ^X_Mn and a value (1 − α_n) × ^x_n(t) obtained by multiplying a value (1 − α_n) obtained by subtracting the n-th channel purification weight α_n from 1 by a sample value ^x_n(t) of the n-th channel decoded sound signal ^X_n, as the n-th channel purified decoded sound signal ~X_n.
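
The purification step recited in claim 1 is a per-sample weighted sum of the n-th channel decoded sound signal and the n-th channel upmixed monaural decoded sound signal. The following is a minimal sketch of that weighted sum only, not the patented implementation. The function names `purify_channel` and `purify_frame` are hypothetical, the upmixed monaural signal is taken as an already computed input because the claim does not fix a particular upmixing process, and restricting the purification weight α_n to the range 0 to 1 is an assumption made for illustration.

```python
from typing import List, Sequence


def purify_channel(decoded: Sequence[float],
                   upmixed_mono: Sequence[float],
                   alpha: float) -> List[float]:
    """Blend one channel of one frame, sample by sample.

    decoded      : samples of the n-th channel decoded sound signal
    upmixed_mono : samples of the n-th channel upmixed monaural decoded signal
    alpha        : n-th channel purification weight (hypothetical range 0..1)
    """
    if len(decoded) != len(upmixed_mono):
        raise ValueError("both signals must cover the same frame length")
    # Assumption for this sketch: the weight lies between 0 and 1.
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha is assumed to lie in [0, 1]")
    # purified(t) = (1 - alpha) * decoded(t) + alpha * upmixed_mono(t)
    return [(1.0 - alpha) * x + alpha * m
            for x, m in zip(decoded, upmixed_mono)]


def purify_frame(decoded_channels: Sequence[Sequence[float]],
                 upmixed_channels: Sequence[Sequence[float]],
                 alphas: Sequence[float]) -> List[List[float]]:
    """Apply the weighted sum to every channel n = 1..N of one frame."""
    if not (len(decoded_channels) == len(upmixed_channels) == len(alphas)):
        raise ValueError("one upmixed signal and one weight per channel")
    return [purify_channel(x, m, a)
            for x, m, a in zip(decoded_channels, upmixed_channels, alphas)]


if __name__ == "__main__":
    # Two channels, two samples per frame, made-up values for illustration.
    left, right = [1.0, 2.0], [0.0, 4.0]
    up_left, up_right = [3.0, 4.0], [2.0, 2.0]
    print(purify_frame([left, right], [up_left, up_right], [0.25, 0.5]))
```

With a weight of 0 the output equals the decoded stereo channel, and with a weight of 1 it equals the upmixed monaural signal, so the weight controls how strongly the monaural decoding result corrects each stereo channel.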