US 12,424,227 B2
	Sound signal refinement method, sound signal decode method, apparatus thereof, program, and storage medium
Ryosuke Sugiura, Tokyo (JP); Takehiro Moriya, Tokyo (JP); and Yutaka Kamamoto, Tokyo (JP)
Assigned to NIPPON TELEGRAPH AND TELEPHONE CORPORATION, Tokyo (JP)
Appl. No. 18/031,595
Filed by NIPPON TELEGRAPH AND TELEPHONE CORPORATION, Tokyo (JP)
PCT Filed Nov. 5, 2020, PCT No. PCT/JP2020/041402 § 371(c)(1), (2) Date Apr. 12, 2023, PCT Pub. No. WO2022/097239, PCT Pub. Date May 12, 2022.
Prior Publication US 2023/0386481 A1, Nov. 30, 2023
Int. Cl. G10L 19/008 (2013.01)

CPC G10L 19/008 (2013.01)

13 Claims

1. A sound signal purification method for obtaining, for each frame, an n-th channel purified decoded sound signal X̃_n, which is a sound signal of each channel of a stereo signal, by using at least an n-th channel decoded sound signal X_n (n = 1, 2), which is a decoded sound signal of each channel of the stereo obtained by decoding a stereo code CS, and a monaural decoded sound signal X_M obtained by decoding a monaural code CM that is a code different from the stereo code CS, wherein

the n-th channel decoded sound signal X_n is obtained by decoding the stereo code CS without using either the monaural code CM or information obtained by decoding the monaural code CM, and

the sound signal purification method comprises

a decoded sound common signal estimation step of obtaining, for each frame, a decoded sound common signal Y_M, which is a signal common to all channels of the stereo, by using at least all of the n-th channel decoded sound signals X_n (n = 1, 2),

a decoded sound common signal upmixing step of obtaining, for each frame, an n-th channel upmixed common signal Y_Mn, which is a signal obtained by upmixing the decoded sound common signal Y_M for each channel by an upmixing process using the decoded sound common signal Y_M and inter-channel relationship information, which is information indicating a relationship between the channels of the stereo,

a monaural decoded sound upmixing step of obtaining, for each frame, an n-th channel upmixed monaural decoded sound signal X_Mn, which is a signal obtained by upmixing the monaural decoded sound signal X_M for each channel by an upmixing process using the monaural decoded sound signal X_M and information indicating a relationship between the channels of the stereo,

an n-th channel signal purification step of obtaining, for each frame, for each channel n, and for each corresponding sample t, as an n-th channel purified upmixed signal Ỹ_Mn, a sequence based on the value ỹ_Mn(t) = (1 − α_Mn) × y_Mn(t) + α_Mn × x_Mn(t), that is, the sum of a value α_Mn × x_Mn(t), obtained by multiplying an n-th channel purification weight α_Mn by a sample value x_Mn(t) of the n-th channel upmixed monaural decoded sound signal X_Mn, and a value (1 − α_Mn) × y_Mn(t), obtained by multiplying the value (1 − α_Mn), which is obtained by subtracting the n-th channel purification weight α_Mn from 1, by a sample value y_Mn(t) of the n-th channel upmixed common signal Y_Mn,

an n-th channel separation combination weight estimation step of obtaining, for each frame and for each channel n, as an n-th channel separation combination weight β_n, a normalized inner product value of the n-th channel decoded sound signal X_n with respect to the n-th channel upmixed common signal Y_Mn, and

an n-th channel separation combination step of obtaining, for each frame, for each channel n, and for each corresponding sample t, as the n-th channel purified decoded sound signal X̃_n, a sequence based on the value x̃_n(t) = x_n(t) − β_n × y_Mn(t) + β_n × ỹ_Mn(t), obtained by subtracting, from a sample value x_n(t) of the n-th channel decoded sound signal X_n, a value β_n × y_Mn(t) obtained by multiplying the n-th channel separation combination weight β_n by the sample value y_Mn(t) of the n-th channel upmixed common signal Y_Mn, and adding a value β_n × ỹ_Mn(t) obtained by multiplying the n-th channel separation combination weight β_n by a sample value ỹ_Mn(t) of the n-th channel purified upmixed signal Ỹ_Mn,

the inter-channel relationship information includes information indicating a number of samples |τ| corresponding to the inter-channel time difference between a first channel and a second channel, information indicating which of the first channel and the second channel is preceding, and an inter-channel correlation coefficient γ, which is a correlation coefficient between a first channel decoded sound signal and a second channel decoded sound signal, and

the decoded sound common signal upmixing step

uses the decoded sound common signal without change as a temporary first channel upmixed common signal Y′_M1 and uses a signal obtained by delaying the decoded sound common signal by |τ| samples as a temporary second channel upmixed common signal Y′_M2 in a case where the first channel is preceding,

uses a signal obtained by delaying the decoded sound common signal by |τ| samples as a temporary first channel upmixed common signal Y′_M1 and uses the decoded sound common signal without change as a temporary second channel upmixed common signal Y′_M2 in a case where the second channel is preceding, and

obtains, for each channel n, as the n-th channel upmixed common signal Y_Mn, a sequence based on y_Mn(t) = (1 − γ) × x_n(t) + γ × y′_Mn(t), computed from a sample value y′_Mn(t) of the temporary n-th channel upmixed common signal Y′_Mn, a sample value x_n(t) of the n-th channel decoded sound signal X_n, and the inter-channel correlation coefficient γ.
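The per-frame processing chain recited in claim 1 can be sketched as follows. This is a minimal, illustrative Python sketch under stated assumptions, not the patentee's implementation. The claim does not specify how Y_M is estimated (a plain sample-wise average of X_1 and X_2 is assumed), how the upmixed monaural signals X_Mn are produced (they are taken as inputs), how the weights α_Mn are chosen (they are passed in), or how samples delayed in from the previous frame are handled (a hypothetical `history` buffer, defaulting to zeros). The "normalized inner product" β_n is read here as the projection coefficient ⟨X_n, Y_Mn⟩ / ⟨Y_Mn, Y_Mn⟩. All function names, such as `refine_frame`, are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

Signal = List[float]


def delay(signal: Sequence[float], tau_abs: int,
          history: Optional[Sequence[float]] = None) -> Signal:
    """Delay one frame by tau_abs samples.

    `history` is a hypothetical buffer with the last tau_abs samples of the
    previous frame; zeros are assumed when it is not supplied.
    """
    if tau_abs == 0:
        return list(signal)
    past = list(history) if history is not None else [0.0] * tau_abs
    if len(past) != tau_abs:
        raise ValueError("history must hold exactly tau_abs samples")
    return (past + list(signal))[: len(signal)]


def upmix_common(y_m: Sequence[float], x1: Sequence[float], x2: Sequence[float],
                 tau_abs: int, first_leads: bool, gamma: float,
                 history: Optional[Sequence[float]] = None) -> Tuple[Signal, Signal]:
    """Y_Mn: delay Y_M for the lagging channel, then blend with X_n using gamma."""
    delayed = delay(y_m, tau_abs, history)
    # Temporary signals Y'_M1, Y'_M2: the preceding channel keeps Y_M unchanged.
    tmp1, tmp2 = (list(y_m), delayed) if first_leads else (delayed, list(y_m))
    y_m1 = [(1.0 - gamma) * x + gamma * t for x, t in zip(x1, tmp1)]
    y_m2 = [(1.0 - gamma) * x + gamma * t for x, t in zip(x2, tmp2)]
    return y_m1, y_m2


def purify_upmixed(y_mn: Sequence[float], x_mn: Sequence[float],
                   alpha_mn: float) -> Signal:
    """~y_Mn(t) = (1 - alpha_Mn) * y_Mn(t) + alpha_Mn * x_Mn(t)."""
    return [(1.0 - alpha_mn) * y + alpha_mn * x for y, x in zip(y_mn, x_mn)]


def separation_weight(x_n: Sequence[float], y_mn: Sequence[float]) -> float:
    """beta_n = <X_n, Y_Mn> / <Y_Mn, Y_Mn> (assumed reading of the claim)."""
    energy = sum(y * y for y in y_mn)
    if energy == 0.0:
        return 0.0  # assumption: nothing to replace when Y_Mn is silent
    return sum(x * y for x, y in zip(x_n, y_mn)) / energy


def separate_and_combine(x_n: Sequence[float], y_mn: Sequence[float],
                         y_mn_purified: Sequence[float], beta_n: float) -> Signal:
    """~x_n(t) = x_n(t) - beta_n * y_Mn(t) + beta_n * ~y_Mn(t)."""
    return [x - beta_n * y + beta_n * yp
            for x, y, yp in zip(x_n, y_mn, y_mn_purified)]


def refine_frame(x1: Sequence[float], x2: Sequence[float],
                 x_m1: Sequence[float], x_m2: Sequence[float],
                 tau_abs: int, first_leads: bool, gamma: float,
                 alphas: Tuple[float, float]) -> Tuple[Signal, Signal]:
    """Hypothetical per-frame driver returning (~X_1, ~X_2)."""
    if not (len(x1) == len(x2) == len(x_m1) == len(x_m2)):
        raise ValueError("all signals must share the frame length")
    # Assumed common-signal estimator: sample-wise average of X_1 and X_2.
    y_m = [0.5 * (a + b) for a, b in zip(x1, x2)]
    y_m1, y_m2 = upmix_common(y_m, x1, x2, tau_abs, first_leads, gamma)
    outputs = []
    for x_n, y_mn, x_mn, alpha in zip((x1, x2), (y_m1, y_m2), (x_m1, x_m2), alphas):
        y_tilde = purify_upmixed(y_mn, x_mn, alpha)   # purification step
        beta = separation_weight(x_n, y_mn)           # weight estimation step
        outputs.append(separate_and_combine(x_n, y_mn, y_tilde, beta))
    return outputs[0], outputs[1]


if __name__ == "__main__":
    x1, x2 = [1.0, 0.5, -0.5, -1.0], [0.8, 0.6, -0.2, -0.9]
    x_m1, x_m2 = [0.9, 0.55, -0.4, -0.95], [0.85, 0.5, -0.3, -0.9]
    print(refine_frame(x1, x2, x_m1, x_m2, tau_abs=1, first_leads=True,
                       gamma=0.7, alphas=(0.5, 0.5)))
```

With α_Mn = 0 the purified upmixed signal equals Y_Mn, so x̃_n(t) reduces to x_n(t); the weight therefore controls how much of the component of X_n along Y_Mn is replaced by its monaural-derived counterpart.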